Persian Today Corpus
The '''Persian Today Corpus''' or '''The Persian One-Million-word Corpus''' ({{PerB|واژه‌هاي پركاربرد فارسي امروز}}) is a book written in [[Persian language|Persian]] by [[Hamid Hassani]] and published in [[Tehran]], [[Iran]], in [[2005]]. The book is based on a 1,000,000-word [[text corpus|corpus]] that contains 80 "main texts" (over 500 "subtexts") of modern [[Persian language|Persian]], mostly written in the years [[1994]]–[[2004]]. By "main texts" the writer means [[publications]] such as books, magazines, and newspapers; "subtexts" are the chapters, articles, and essays, short or long, of which those books, magazines, and newspapers are composed. The usefulness of a [[corpus]] is judged primarily by its size and the variety of its sources.
The Persian Today Corpus is a [[corpus]], not a [[concordance]] dictionary. In a corpus, the words appear exactly as they are used in the source texts.
The first important advantage of a corpus is its usefulness for [[language]] description (morphological, lexical, orthographic, and [[phonetic]] features, to name a few). The second is that it provides accurate [[statistics]] for selecting basic vocabulary and compiling textbooks for [[language teaching]].
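The kind of statistics mentioned above can be illustrated with a minimal sketch in Python. It is not the method used to compile the book, which the source does not describe; it only shows how a ranked frequency list with percentages can be derived from a stream of word forms. The function name <code>frequency_list</code> and the transliterated toy tokens are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def frequency_list(tokens):
    """Return (rank, word, count, percent) rows, most frequent first.

    Words are counted exactly as they appear (surface forms), so a
    singular and its plural are treated as two different entries.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    rows = []
    for rank, (word, count) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, count, 100.0 * count / total))
    return rows


# Toy, made-up token stream (transliterated); NOT data from the corpus.
tokens = "ketaab va daftar va ghalam dar kelaas va ketaab-haa".split()
rows = frequency_list(tokens)

for rank, word, count, percent in rows:
    print(f"{rank:>2}  {word:<12} {count:>3}  {percent:.2f}%")

# The same percentage formula applied to a figure reported for the corpus:
# 49,758 occurrences out of 1,002,394 words.
share_of_va = round(100 * 49758 / 1002394, 2)
```

Because the counting is done on surface forms, the toy tokens <code>ketaab</code> and <code>ketaab-haa</code> end up as separate rows, which mirrors the statement that words in a corpus appear exactly as used in the source texts.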
There are different types of [[corpora]]: plain corpora, [[concordance]] dictionaries, and word indexes. Linguistic corpora have been compiled for decades by specialists in research centers, universities, and academies in several countries, especially developed ones. The best-known corpora, such as the [[Brown Corpus]], usually contain around 1,000,000 words, though some are made up of several hundred million words. The most famous are those prepared for [[English language|English]] ([[United States|American]] and [[United Kingdom|British]]), some of which, like the [[British National Corpus]], consist of over 100,000,000 words.
Sponsored by the [[Iran Language Institute]] (ILI), a learner’s dictionary of [[Persian language|Persian]] is being compiled by another [[Iran|Iranian]] scholar, [[Behruz Safarzadeh]], in collaboration with [[Hamid Hassani]], and is due to be published in 2006. The dictionary consists of over 5,000 entries, and the above-mentioned 1,000,000-word corpus is the basis for choosing some of the entries and the [[defining vocabulary]]. The learner’s dictionary, which is the first corpus-based Persian dictionary, is expected to be of interest to learners of Persian around the world.
The following table lists some Persian words with their original [[orthography]], [[pronunciation]] (capital letters mark the [[accented syllable|stressed syllable]] in each [[word]]), meaning in [[English language|English]], [[frequency]], and [[percentage]] of [[usage]] according to Hassani’s [[corpus]]:
{| class="wikitable" style="font-size:90%; float: right"
!No. !!Words !!Grammatical Categories (and Meanings) !!Frequencies and Percentages
|-
| 1 || و <''VA/ -O''> || a [[conjunction]] that means ''and'' || 49,758 times out of 1,002,394 (4.96%)
|-
| 2 || به <''BE''> || a [[preposition]] that means ''to'', ''at'', ''in'', or ''with'' || 32,478 times (3.24%)
|-
| 3 || را <''RAA''> || a [[Grammatical particle|particle]] serving as a sign of the [definite] [[direct object]] || 25,797 times (2.57%)
|-
| 4 || از <''AZ''> || a preposition that means ''from'', ''of'', ''since'', ''than'', ''out of'', or ''belonging to'' || 23,717 times (2.37%)
|-
| 5 || كه <''KE''> || a conjunction, a [[pronoun]], a [[relative]], or an [[interrogative]] that means ''that'', ''which''; ''who'', ''who?''; or used idiomatically || 22,593 times (2.25%)
|-
| 6 || در <''DAR''> || a preposition that means ''in'', ''at'', ''on'', or ''within''; a [[noun]] that means ''door'' || 21,671 times (2.16%)
|-
| 7 || اين <''IIN''> || an adjective or a pronoun that means ''this'' || 11,762 times (1.17%)
|-
| 8 || با <''BAA''> || a preposition that means ''with'' or ''by'' || 11,611 times (1.16%)
|-
| 9 || است /-ست <''AST/-ST''> || a [[verb]] that means ''is'' || 9,837 times (0.981%)
|-
| 10 || آن <''AAN''> || an adjective or a pronoun that means ''that'', or a noun that means ''[[moment]]'' || 6,999 times (0.698%)...
|-
| 30 || كار <''KAAR''> || a [[noun]] that means ''work'' || 2,535 times (0.253%)...
|-
| 50 || بيرون <''biiROON''> || an [[adverb]] that means ''out'' or ''outside'' || 1,551 times (0.155%)...
|-
| 70 || هيچ <''HIICH''> || an adjective, a noun, or an adverb that means ''any'', ''nothing'', ''ever'', ''at all'', or ''no'' || 1,277 times (0.127%)...
|-
| 100 || بابا <''baaBAA''> || a noun that means ''[[papa]]'', ''daddy'', ''dad'', or ''father'' || 1,005 times (0.1%)...
|-
| 125 || شب <''SHAB''> || a noun or an adverb that means ''night'' || 856 times (0.085%)...
|-
| 137 || ايران <''iiRAAN''> || the [[proper noun]] [[Iran]] || 774 times (0.077%)...
|-
| 142 || كتاب <''keTAAB''> || a noun that means ''book'' || 759 times (0.076%)...
|-
| 150 || آن‌جا / آنجا <''aan-JAA''> || an adverb or a pronoun that means ''there'' || 726 times (0.072%)...
|-
| 196 || شهر <''SHAHR''> || a noun that means ''city'' or ''town'' || 594 times (0.059%)...
|-
| 210 || چشم <''CHESHM''> || a noun that means ''eye'' || 552 times (0.055%)...
|-
| 376 || امروز <''emROOZ''> || a noun or an adverb that means ''today'' || 319 times (0.032%)...
|-
| 396 || كشور <''keshVAR''> || a noun that means ''country'' || 297 times (0.03%)...
|-
| 476 || آمريكا /امريكا <''aamriiKAA/emriiKAA''> || the proper noun [[America (disambiguation)|America]] || 258 times (0.026%)...
|-
| 545 || ده <''DAH''> || a [[numeral]] (adjective/noun) that means ''ten'' || 233 times (0.023%)...
|-
| 838 || امام <''eMAAM''> || a noun that means ''[[Imam]]'' || 157 times (0.016%)...
|-
| 879 || انگليسي <''engeliiSII''> || the proper nouns [[English language|English]] or [[United Kingdom|British]] || 149 times (0.015%)...
|-
| 1000 || حسابي <''hesaaBII''> || an adjective that means ''good'' or ''regular'' || 133 times (0.013%)...
|-
| 1150 || عسل <''aSAL''> || a noun that means ''[[honey]]'' || 116 times (0.011%)...
|-
| 1500 || دروني <''darooNII''> || an adjective that means ''[[internal]]'' || 87 times (0.009%)...
|-
| 1857 || ده <''DEH''> || a noun that means ''[[village]]'' || 70 times (0.007%)...
|-
| 2000 || ميرساند <''MI-resaanad''> || a verb that means ''he/she/it reaches/extends/delivers/supplies/carries'' || 65 times (0.006%)...
|-
| 2792 || جمعه <''jom’E''> || a noun or an adverb that means ''Friday'' || 43 times (0.004%)...
|-
| 3000 || كلاسها <''kelaas-HAA''> || a plural noun (a noun + [[suffix]]) that means ''classes'' || 40 times (0.004%)...
|-
| 3445 || شاهزاده <''shaah-zaaDE''> || a noun that means ''prince'' or ''princess'' || 34 times (0.003%)...
|-
| 4418 || جوراب <''jooRAAB''> || a noun that means ''socks'' or ''stockings'' || 24 times (0.002%)...
|-
| 5000 || بخت <''BAKHT''> || a noun that means ''[[luck]]'' or ''fortune'' || 20 times (0.002%)...
|-
| 5552 || ميليمتر <''miiliiMETR''> || a noun that means ''[[millimeter]]'' || 18 times (0.002%)...
|-
| 8000 || سووشون <''soovaSHOON''> || the proper noun ''Suvashun'', the name of a Persian [[novel]] written by [[Simin Daneshvar]] || 10 times (0.001%)...
|}
[[Category:Corpus linguistics]]
[[Category:Persian language]]