User:Robert Ullmann/Pronunciation statistics


pronunciation section statistics


  • from XML dump as of 13 June 2008
  • total of 81775 pronunciation sections in 873169 entries
  • total of 119785 pronunciation lines, average 1.465 per section
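
The average in the last bullet is the line total divided by the section total. A quick check of that arithmetic, using only the two figures quoted above (the variable names are illustrative):

```python
# Figures quoted in the summary bullets
sections = 81775   # pronunciation sections
lines = 119785     # pronunciation lines

average = lines / sections
print(f"{average:.3f}")  # 1.465
```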


Pronunciation lines by type of line:

  • "accent" means the line has an {{a}} template, followed by (enPR), IPA and (SAMPA). These lines are not also counted under enPR, IPA, etc.
  • IPA, SAMPA, etc. are the remaining lines that have those templates or that text on the line.
  • "qualifier" means a line, usually starting with *, that is like "* RP:" or uses {{qualifier}} for the same purpose.
  • Classification of lines is not exact.
  • SAMPA includes X-SAMPA; IPA includes lines with {{IPAchar}}.
  • Blanks, comments, sister links, etc. are not included in the totals.

line type occurrences
accent 3490
AHD 234
IPA 46977
IPA/SAMPA 2463
SAMPA 4247
enPR 650
enPR/IPA/SAMPA 1357
ad hoc 7811
audio 24624
homophones 532
hyphenation 9165
rhymes 14223
comment 167
image 37
qualifier 1953
rfc/rfp/rfap 399
sisterlink 134
table syntax 900
other 760
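
The classification rules listed above the table could be sketched roughly as follows. This is not the script that produced these counts; it is a minimal illustration under stated assumptions: it looks only at template names (not bare text such as "IPA:"), it treats any of enPR, IPA or SAMPA after {{a}} as an "accent" line, and the function name `classify` and the order of the checks are hypothetical.

```python
import re

# Captures the template name in "{{name|...}}" or "{{name}}"
TEMPLATE = re.compile(r"\{\{\s*([^|}]+)")

def classify(line):
    """Return a rough line type for one line of a Pronunciation section."""
    text = line.strip()
    if not text:
        return "blank"
    if text.startswith("<!--"):
        return "comment"
    names = {name.strip() for name in TEMPLATE.findall(text)}
    has_ipa = bool(names & {"IPA", "IPAchar"})      # IPA includes {{IPAchar}}
    has_sampa = bool(names & {"SAMPA", "X-SAMPA"})  # SAMPA includes X-SAMPA
    has_enpr = "enPR" in names
    # Assumption: {{a}} plus any transcription template counts as "accent"
    if "a" in names and (has_ipa or has_sampa or has_enpr):
        return "accent"
    for simple in ("audio", "rhymes", "homophones", "hyphenation"):
        if simple in names:
            return simple
    if has_enpr and has_ipa and has_sampa:
        return "enPR/IPA/SAMPA"
    if has_ipa and has_sampa:
        return "IPA/SAMPA"
    if has_ipa:
        return "IPA"
    if has_sampa:
        return "SAMPA"
    if has_enpr:
        return "enPR"
    # "* RP:" style lines, or lines using {{qualifier}}
    if "qualifier" in names or re.match(r"\*\s*[^{}:]+:\s*$", text):
        return "qualifier"
    return "other"

print(classify("* {{a|US}} {{enPR|hĕ-lō}}, {{IPA|/hɛˈloʊ/}}"))  # accent
print(classify("* {{IPA|/x/}}, {{SAMPA|/x/}}"))                 # IPA/SAMPA
print(classify("* RP:"))                                        # qualifier
```

Because the checks run in a fixed order, a line that fits two categories is counted only once, which matches the note that accent lines are not listed again under enPR, IPA, etc.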


number of lines in sections, counting all types except blank lines, comments, sister links, and images

lines occurrences pages (listed if 10+ lines in a section)
0 146
1 56248
2 17054
3 6126
4 1313
5 399
6 134
7 107
8 69
9 32
10 22 been father you're entrance advocate poor novem manganese marathon 0 transport nasty whore marriage data foray plaque articulate transpose animate Mars Robin
11 25 alphabetical acerbity maroon read though drawer marten abeille manzana magnesium polytonic use abo hooter record contract associate hendecasyllabic duplicate monosyllabic polysyllabic pentasyllabic octosyllabic trisyllabic pitää varansa
12 10 pneumonoultramicroscopicsilicovolcanoconiosis project copper chalk complex março excuse clerk gnocchi Martin
13 1 thorn
14 2 our ت
15 1 قابلة
16 1 solder
28 2 hello Celtic
29 1 atomic
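
A table like the one above could be built by counting the non-excluded lines in each section and tallying the results. The sketch below assumes each section has already been reduced to a page title plus a list of line types; the names `SKIPPED` and `section_histogram` and the sample data are hypothetical, not taken from the original script.

```python
from collections import Counter

# Line types left out of the per-section count, as stated above
SKIPPED = {"blank", "comment", "sisterlink", "image"}

def section_histogram(sections):
    """sections: iterable of (page_title, [line_type, ...]) pairs.

    Returns (counts, long_pages): counts maps a line count to the number
    of sections with that many lines; long_pages lists the page titles
    for sections with 10 or more lines.
    """
    counts = Counter()
    long_pages = {}
    for title, line_types in sections:
        n = sum(1 for t in line_types if t not in SKIPPED)
        counts[n] += 1
        if n >= 10:
            long_pages.setdefault(n, []).append(title)
    return counts, long_pages

# Made-up sample input, for illustration only
sample = [
    ("example-a", ["IPA"] * 12),
    ("example-b", ["IPA", "audio", "comment"]),
    ("example-c", ["blank"]),
]
counts, long_pages = section_histogram(sample)
for n in sorted(counts):
    print(n, counts[n], " ".join(long_pages.get(n, [])))
```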


number of lines in sections by language, for languages with 10 or more pronunciation sections

language sections lines average
Albanian 11 11 1.000
Ancient Greek 3433 3609 1.051
Arabic 34 75 2.206
Aramaic 517 517 1.000
Armenian 11 18 1.636
Aromanian 19 19 1.000
Asturian 20 20 1.000
Basque 13 14 1.077
Bengali 27 27 1.000
Breton 32 35 1.094
Bulgarian 717 729 1.017
Catalan 45 54 1.200
Classical Nahuatl 441 463 1.050
Croatian 22 22 1.000
Czech 1025 1064 1.038
Danish 63 83 1.317
Dutch 2773 3091 1.115
Egyptian 50 56 1.120
English 27869 47354 1.699
Esperanto 22 31 1.409
Estonian 17 18 1.059
Ewe 120 210 1.750
Faroese 1620 1639 1.012
Fijian Hindi 87 87 1.000
Filipino 40 41 1.025
Finnish 6411 12674 1.977
French 6348 11117 1.751
Ga 18 35 1.944
Gamilaraay 70 70 1.000
German 1471 1748 1.188
Greek 344 354 1.029
Guugu Yimidhirr 25 25 1.000
Hebrew 400 420 1.050
Hungarian 2634 5151 1.956
Icelandic 269 388 1.442
Indonesian 15 28 1.867
Interlingua 10 13 1.300
Irish 777 891 1.147
Isthmus Zapotec 10 10 1.000
Istro-Romanian 21 21 1.000
Italian 911 980 1.076
Japanese 128 143 1.117
Jingpho 34 34 1.000
Kabyle 11 2 0.182
Kashubian 25 25 1.000
Korean 2450 2589 1.057
Lao 244 244 1.000
Latin 290 470 1.621
Lithuanian 99 99 1.000
Lojban 96 99 1.031
Macedonian 28 28 1.000
Mandarin 4911 5995 1.221
Martuthunira 28 28 1.000
Megleno-Romanian 11 11 1.000
Min Nan 1598 2098 1.313
Moldavian 15 15 1.000
Navajo 25 25 1.000
Norwegian 74 76 1.027
Occitan 11 12 1.091
Old English 1662 1689 1.016
Old Prussian 105 204 1.943
Persian 146 155 1.062
Polish 2009 2473 1.231
Portuguese 202 284 1.406
Romanian 3970 4027 1.014
Russian 1391 1615 1.161
Scots 234 247 1.056
Scottish Gaelic 173 178 1.029
Serbian 63 64 1.016
Seri 14 16 1.143
Slovene 18 18 1.000
Spanish 911 1052 1.155
Swedish 1336 1422 1.064
Tagalog 26 26 1.000
Tok Pisin 10 10 1.000
Translingual 79 97 1.228
Turkish 42 327 7.786
Vietnamese 56 88 1.571
Volapük 15 40 2.667
Warlpiri 11 11 1.000
Welsh 138 146 1.058
Western Apache 14 15 1.071
Xhosa 13 13 1.000
Yiddish 78 79 1.013
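
The average column is lines divided by sections, to three decimal places. The sketch below recomputes it for a few rows copied from the table, splitting from the right so that multi-word language names stay intact; the helper name `parse_row` is hypothetical. An average below 1.000 (as for Kabyle) means fewer counted lines than sections, which is possible because some sections have zero counted lines.

```python
# A few rows copied from the table above
rows = """\
Ancient Greek 3433 3609 1.051
English 27869 47354 1.699
Kabyle 11 2 0.182
Turkish 42 327 7.786
""".splitlines()

def parse_row(row):
    """Split from the right: the last three fields are always numeric."""
    language, sections, lines, average = row.rsplit(None, 3)
    return language, int(sections), int(lines), float(average)

for row in rows:
    language, sections, lines, listed = parse_row(row)
    computed = lines / sections
    note = "" if lines >= sections else "  (fewer lines than sections)"
    print(f"{language}: {computed:.3f}{note}")
```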