User talk:Wyang


  • Archive 1 — 2013/01/18 21:12 (UTC) to 2014/05/24 00:43 (UTC)
  • Archive 2 — 2014/05/25 15:03 (UTC) to 2015/01/25 11:17 (UTC)
  • Archive 3 — 2015/01/23 00:31 (UTC) to 2015/07/10 05:42 (UTC)
  • Archive 4 — 2015/08/15 18:18 (UTC) to 2016/07/18 01:13 (UTC)
  • Archive 5 — 2016/07/18 18:16 (UTC) to 2017/01/13 10:16 (UTC)
  • Archive 6 — 2017/01/16 04:17 (UTC) to 2017/06/27 06:38 (UTC)
  • Archive 7 — 2017/06/25 09:08 (UTC) to

Korean hanja compounds

Do you know of the best way to format compounds in Korean hanja entries? The plain links and lack of consistent formatting bother me. —suzukaze (tc) 08:14, 20 August 2017 (UTC)

No, and I don't think there is one either. The Hanja entries as a whole need a format overhaul. 樂#Korean looks horrible; the eumhun template should be redesigned to match the intelligence of the Japanese readings template. As a headword template it is way too long. Wyang (talk) 08:23, 20 August 2017 (UTC)
I agree completely. —suzukaze (tc) 08:34, 20 August 2017 (UTC)


Hi Wyang, The second reading for (lèi) is . I have no problem with the second character's usage but am somewhat unsure about the use of in this word or phrase. Maybe something about "family" or "house"? Unsure due to having a large number of meanings. Thanks! Bumm13 (talk) 23:46, 26 August 2017 (UTC)

This sense seems to be one of those hapax legomena; the explanation of 門祭 was recorded in Jiyun, and subsequent dictionaries have inherited this explanation verbatim, afraid of misinterpreting. My guess would be that it referred to "a rite of door god worship". Wyang (talk) 00:27, 27 August 2017 (UTC)

automatic display of definitions in Derived terms

Hi Wyang. Would there be any way to automatically display definitions for terms listed in Derived terms, similar to how we display definitions in the hanzi boxes? This would mean we wouldn't have to manually add them like we do sometimes. Just an idea. ---> Tooironic (talk) 01:12, 27 August 2017 (UTC)

Hi Tooironic. I haven't thought about this thoroughly before - it sounds like a good idea, if the definitions can be cleanly fetched from derived term pages and aesthetically displayed, without placing too high a memory load on the entry. @Justinrleung, Suzukaze-c, Atitarev. I have several ideas for a more interactive, dynamic and visually appealing layout in Chinese entries, one of which is a hover-over display of pronunciations and definitions for Chinese terms in an entry, similar to what these websites are doing. Wyang (talk) 01:23, 27 August 2017 (UTC)
I think it's a good idea, but I don't know how necessary it would be. If we do do this, would we be listing the first definition or all the definitions? We should also consider setting a maximum number of derived terms for there to be automatic definitions. — justin(r)leung (t...) | c=› } 01:49, 27 August 2017 (UTC)
Yes, I think displaying the first definition would be more than adequate. Derived terms rarely have more than one translation anyway. ---> Tooironic (talk) 10:46, 28 August 2017 (UTC)


Hi Wyang. Do you know if 白菜 should be translated as bok choy or napa cabbage? Or both? When I eat 白菜 in China it's usually the latter, but our entry suggests the former. Thanks. ---> Tooironic (talk) 00:45, 30 August 2017 (UTC)

It could be both, and it can also be 小白菜 (see the images from search engines). In northern China it is usually napa cabbage, very rarely bok choy. @Justinrleung It would be a good candidate for zh-dial, although it is quite complex... Wyang (talk) 00:58, 30 August 2017 (UTC)
@Tooironic 白菜 is bok choy (小白菜) in Cantonese, which is probably why the English name for 小白菜 is bok choy. It's gonna be challenging since it involves several overlapping names. I'm slowly compiling a list of names for 白菜 and related terms. — justin(r)leung (t...) | c=› } 05:25, 31 August 2017 (UTC)

Here is a list of definitions for different kinds of 白菜 based on different regions. I'm not sure how that could turn into zh-dial tables. — justin(r)leung (t...) | c=› } 20:52, 4 September 2017 (UTC)

Just to be difficult, I might as well point out that the most common type of the European cabbage (Brassica oleracea) is known as white cabbage... Chuck Entz (talk) 21:54, 4 September 2017 (UTC)
@Justinrleung Wow!!! I think it would be good to have dial tables for the two things below for now. 油菜 is a bit vague. The others, such as 圓白菜, can come later. Wyang (talk) 13:45, 5 September 2017 (UTC)


As a second reading and definition of , gives "". Does this mean annoying, bothersome? I just wanted to check because none of the Chinese-English dictionaries seem to give the specific meaning of the word/phrase. Cheers! Bumm13 (talk) 11:46, 4 September 2017 (UTC)

Hi. Its definition is (~) (simplified: 烦郁), which means (1) thick; dense; (2) glum; downcast; depressed. Here, 漨浡 refers to the first definition ("thick; dense; profuse; abundant (e.g. of clouds)"). It is essentially a variant of 蓬勃. Wyang (talk) 12:41, 4 September 2017 (UTC)


Hi Wyang, sorry to be a pest but I'm trying to figure out the first definition of . It's given as "水虚;水的中心有空处". Does it mean water deficiency? The second part of that definition in Chinese seems like it's quite literal ("spot or place in the center of water") but I'm not quite parsing it. Thanks again for your help! Bumm13 (talk) 18:03, 4 September 2017 (UTC)

Hi, 水虚;水的中心有空处 seems to say "the middle spot of a pool of water, where there is no water (due to the ground being elevated?)". Sorry! I'm not understanding this completely either. Wyang (talk) 13:53, 5 September 2017 (UTC)


I came across this online video, and I think it must be some instructional video on doing cloud hands. If so, I think it would be an excellent external link for the subject title entry. --Lo Ximiendo (talk) 05:51, 7 September 2017 (UTC)

As we're not an encyclopedia, I think external links to YouTube are almost never appropriate for non-signed languages. However, I have added a video I found on Wikimedia Commons to the entry. —Μετάknowledgediscuss/deeds 05:55, 7 September 2017 (UTC)
But the video from Wikimedia Commons concerns tai chi, while the one from YouTube concerns Classical Chinese Dance. --Lo Ximiendo (talk) 06:05, 7 September 2017 (UTC)

Arabic on Wiktionary

I'm curious if you have any thoughts on this. I don't know how much you've studied Arabic, but I've noted that you have some interest, and the technical challenges in handling it seem to me quite similar to those for Chinese. I feel like if we only had a robust infrastructure for Arabic as a macrolanguage, with support for dialectal synonyms and pronunciations, we might be able to keep more serious Arabic contributors. @Wikitiki89, Atitarev —Μετάknowledgediscuss/deeds 16:29, 8 September 2017 (UTC)

I've said in other places that the main problem is the lack of comprehensive, easy-to-use, reliable resources for Arabic dialects. And furthermore the fact that the dialects don't have a standard orthography makes it that much more difficult. I don't know which of these problems are shared with Chinese. --WikiTiki89 17:18, 8 September 2017 (UTC)
There may not be comprehensive resources for the Arabic macrolanguage as a whole, but there are pretty good dictionaries for a vast array of topolects, and well described sound correspondences. Surely that would be sufficient? —Μετάknowledgediscuss/deeds 20:34, 8 September 2017 (UTC)
@Metaknowledge, Wikitiki89, for Chinese, while there are certain works that have tables comparing words in different topolects, the vast majority are dictionaries or vocabulary lists that have Standard Chinese glosses. If there are dictionaries/vocabulary lists for particular topolects, that should be sufficient. That being said, most of them follow a similar grouping by category, which makes it easier to find words. I'm not sure if the Arabic resources would be sorted by category or have an index sorted by category. The issue with orthography might be a bigger problem. In Chinese, since the words are written in Chinese characters, cognate words are generally written with the same orthography. There are variations in orthography, but I generally find the more "etymologically-sound" orthography for the purpose of comparison. I'm not sure if this approach to orthography is possible for Arabic. — justin(r)leung (t...) | c=› } 20:46, 8 September 2017 (UTC)
What about using primary sources that can be found online such as newspapers? There would be no information on pronunciation, but they would provide the orthography. DTLHS (talk) 21:18, 8 September 2017 (UTC)
Newspapers are written in Standard Arabic, not in dialects. Only informal communication might be written in dialects, otherwise the dialects are mainly spoken languages. Also music and movies are usually in dialect. --WikiTiki89 21:36, 8 September 2017 (UTC)
In my opinion, the focus should be on MSA. Sorry to be blunt, but if Arabs don't care about written dialects and don't think they are worthy enough to write in dialects, we have no choice. There are simply not enough dictionaries and references to make the infrastructure work for all dialects.
Some other points: the majority of the vocabulary will share the spellings, just like in Chinese, the differences will be only in pronunciations. Providing inflections is going to be a challenge, especially for verbs - this is where there is a substantial difference from MSA.
The vocabulary that differs from MSA is not very large, but it consists of the most common words. Dialects are also more accepting of loanwords. IMO, common loanwords or dialectal words used in multiple dialects should simply be marked "colloquial, regional" and can be used in MSA, even if Arabs don't consider them part of fuṣḥā.
As for resources for dialects, they are limited but they do exist. As I said, the differences are mostly in the most common words, which can be collected even from Egyptian, Moroccan, Iraqi, etc. phrasebooks. Examples include an Egyptian and Gulf Arabic textbook, a Moroccan textbook and an English-Arabic (Egyptian/Syrian) dictionary. The last two are romanised. I've seen Iraqi, Lebanese, Syrian, Tunisian and Algerian resources.
The infrastructure can be experimented with. A good start would be words that differ across all the dialects, like the word for "what". --Anatoli T. (обсудить/вклад) 00:01, 9 September 2017 (UTC)
@Atitarev: To respond to your points individually: Yes, the focus should be on MSA, but right now it's almost exclusively MSA, and that's a problem. There are a lot more resources out there than you are aware of, even for dialects that lack an ISO code and have zero representation on Wiktionary currently (e.g. Nigerian Arabic). Anyway, "what" would be a good place to try out an {{ar-dial}} modelled on {{zh-dial}}, and قابلة is an example of a low-tech attempt of what should eventually be an {{ar-pron}} modelled on {{zh-pron}}. —Μετάknowledgediscuss/deeds 00:49, 9 September 2017 (UTC)
@Metaknowledge I only gave some examples of books you can get from the shelves of some bookstores. The research of course exists. One major difference between the Chinese and the Arabic modules would be that dialectal Arabic transliterations would need to be entered, never generated. MSA transliterations and phonemic transcriptions can be generated from vocalised Arabic (for most and regular words). Dialectal pronunciations are only partially predictable from the standard Arabic readings. Transliterations are not standardised, so if a dialectal dictionary were to be used as a resource, a lot of work would be required to normalise it. --Anatoli T. (обсудить/вклад) 01:15, 9 September 2017 (UTC)

The infrastructure we currently have only allows automatic phonemic IPA transcriptions of Arabic. We need phonetic, even for MSA, which requires more work. To add some positives to the discussions - a quick win would be to add phonemic transcriptions for Arabic dialects based on transliterations only. The Arabic module already allows that for MSA - for irregularly pronounced words or when it's difficult to determine the reading. --Anatoli T. (обсудить/вклад) 01:38, 9 September 2017 (UTC)

I agree with a lot of the points by Anatoli and Wikitiki above, and I also agree that the lack of reasonable infrastructure is often a big disincentive for native editors of these languages, although paradoxically those are exactly the editors who would best spearhead such infrastructure projects. The Arabs do not seem interested in writing in dialects, perhaps even less so than in Chinese. I think an entry format similar to Chinese would be very beneficial for Arabic. There are a number of things that need to be considered:
(1) how dialectal pronunciations (e.g. in كِتَاب (kitāb)) should be managed: via a central template {{ar-pronunciations}}? At present there is {{arabic-dialect-pronunciation}} which IMO looks like the embryonic form of a pronunciation template that we will need. What (systemic) resources for dialectal pronunciations do we have and what dialects do they cover? For these dialects, what input should we use for the dialects in this template - phonetic Arabic script (for that particular dialect), Latin-script transcription or IPA? The answer is probably both dialect- and resource-dependent, and it may have to be designed by Wiktionary, like what we did for many Chinese dialects. Arabic varieties are incredibly complex, but we can start with the major ones, such as maṣri and šāmi.
(2) lexicon: I suspect nouns are going to be the most conserved and easiest to handle. Many textbooks (and general dictionaries even) may have a limited number of dialectal correspondences―an example is here (a screenshot of my textbook)―but I suspect the majority of our information will come from specific dialectal dictionaries. I remember User:GeekEmad has shared his resources on Maghrebi Arabic here; it would be good if relevant dialectal lexical resources can be listed, by variety, on a central reference page (WT:About Arabic/references). This will overlap with #1, as many dictionaries will have pronunciations or phonetic Arabic-script forms, followed or preceded by the glosses. In some (or many?) cases the dialectal forms are likely to be unattestable, so {{ar-dial}} will need to suppress the links, though the information of the lexical differences in the {{ar-dial}} itself is already invaluable, unlikely to be found elsewhere.
(3) grammar: This is probably the most variable and the most difficult to deal with, although there is at present no information on dialectal conjugations on Wiktionary anyway (that I'm aware of). It could well be the case that we will only be able to generate MSA conjugation tables on Wiktionary, but that is less of a concern for now. Our focus should be on: (1) the presentation of Arabic entries; (2) how dialectal forms of Arabic should be written, and (3) how the dialectal phonetic information should be handled (hopefully systematically, akin to {{zh-pron}}). Wyang (talk) 04:12, 9 September 2017 (UTC)
  1. I am not concerned how this is implemented but MSA will always stay the default and the only one that can more or less rely on the conversion from the Arabic script into transliteration first, then transcription. Phonologically, dialects have fewer consonants (some are merged) but they may use some foreign sounds, which have also penetrated fuṣḥā. Dialects are also more likely to use the vowels e/ē, o/ō, which are missing in Classical Arabic and can't be rendered with the Arabic script.
  2. I will start listing what I can find or what I possess. I can use my small Arabic-English/English-Arabic Concise Romanized Dictionary for Syrian (or the whole Levantine) and Egyptian (ISBN 10: 0781806860).
  3. You're right, dialectal grammar is not handled in Wiktionary and inflection tables only cater for MSA. Dialectal inflections are much simpler, with fewer forms or none at all - only a fraction of ʾiʿrāb survives, but duality, plurality, genders and verb conjugations are all used. To some extent, dialectal conjugations can be taken from MSA, but there are differences, and verbs that are only used in dialects need some handling as well.
  4. I have just created كَذَّاب (kaḏḏāb), which has all required info for MSA and I have added the Syrian (Levantine?) and Egyptian pronunciations from my dictionary above. Although the declension table only refers to MSA, the rows that say "informal" are used in most dialects as well. Notably the nominative indefinite plural form كَذَّابُونَ‏ (kaḏḏābūna‏) is not used in dialects. The transliteration is apparently the standard one.
* {{ar-IPA|كَذَّاب}}
*: {{i|Egyptian}} {{ar-IPA|tr=keddāb}}
*: {{i|Levantine}} {{ar-IPA|tr=kizzāb}}
--Anatoli T. (обсудить/вклад) 06:32, 9 September 2017 (UTC)
I think we'll want to use Latin script input to produce IPA, for the reasons Anatoli gave. Dialectal spellings are very unstandardised, but tend to cleave to MSA orthography where possible, so that shouldn't be too hard to generate given the pagetitle and dialectal romanisation. What has to be done first in order to create a pronunciation template that can handle a few major dialects correctly and appropriately? —Μετάknowledgediscuss/deeds 23:08, 9 September 2017 (UTC)
I think creating some help pages for each of the dialects we would like to include would be a good idea. This was the approach used by the Chinese editors before writing modules to generate dialect pronunciations, e.g. Wiktionary:About Chinese/Xiang, Wiktionary:About Chinese/Gan, so that we had an idea of the phonologies. Also we should think about whether we would like to produce phonetic IPA, phonemic IPA or both; I suspect the current output of {{arabic-dialect-pronunciation}} on قَابِلَة (qābila) is actually phonetic. BTW, Wikipedia's Varieties of Arabic is quite well-written; I found the table in the section #Phonetics very informative. Wyang (talk) 00:26, 10 September 2017 (UTC)
Dialectal phonology is very close to the standard. Some consonants have merged, so dialects lack some consonants but they use some consonants more often than MSA - g, č, ž. Consonants v and p are only used in loanwords. If anything is missing, they could be added later, when the need arises. Yes, transcriptions in قَابِلَة (qābila) are phonetic. For phonemic, we could just use notations like "ʾāble", "gābla", etc. --Anatoli T. (обсудить/вклад) 03:19, 10 September 2017 (UTC)
  • @Wikitiki89, Atitarev, Mahmudmasri: I have written WT:About Arabic/Egyptian. Please beware, as there may still be errors. Does this conform to the sort of page that will be needed? (Note that our current Egyptian Arabic content is romanised in a variety of ways, and often lacks romanisation altogether by using IPA instead; this system is an extension of how we normally romanise Arabic here.) —Μετάknowledgediscuss/deeds 03:22, 10 September 2017 (UTC)
    Great start! I really like the page. Can't really comment on the phonology, but the format is what I had in mind. Additional columns can be added to show how the transcriptions in different dictionaries differ. I wonder if the phonetic pronunciation is largely predictable from the romanisation; if so, some module (Module:arz-pron) can be created to handle the romanisation-to-IPA conversions. Wyang (talk) 03:59, 10 September 2017 (UTC)
    I've intended that the pronunciation should be predictable from the romanisation, yes, although there will be exceptions and potential edge cases I haven't considered. I'd really like to try out a module, but I don't even actually speak Arabic (yet), so I'd need a fluent (preferably native) speaker to help with testcases. —Μετάknowledgediscuss/deeds 04:31, 10 September 2017 (UTC)
    It can but the transliteration can only give the phonemic pronunciation, e.g. جَمِيل (jamīl): Egyptian romanisation "gamīl" -> /ɡa.miːl/ (phonemic), [ɡæˈmiːl] (phonetic). Like many others, the choice between vowels [ɑ(ː)] and [æ(ː)] is not so straightforward. --Anatoli T. (обсудить/вклад) 06:11, 10 September 2017 (UTC)
    Actually, that shouldn't be a problem — if no other vowel is between a/ā and an emphatic consonant, it goes to [ɑ(ː)]. Please tell me if you have any examples that can't be predicted based on that alone (with emphatic consonants as defined on the page). —Μετάknowledgediscuss/deeds 07:43, 10 September 2017 (UTC)
    Erutuon described it better than I could have. For me, adding dialectal phonemic pronunciation wouldn't be a problem, if there was a decision to merge Arabic dialects but there isn't. Arabic speakers don't realise the gains dialects may get, if dialects are treated as "Arabic". Phonetic transcription is not so easily available but a phonemic one can be done based on transliterations. They should be normalised to match our standards, though. As I said before, we might need to add one or two odd regional sounds but the modules are about ready to work with dialects. --Anatoli T. (обсудить/вклад) 09:47, 10 September 2017 (UTC)
    (edit conflict) Regarding back [ɑ(ː)] and front [æ(ː)], I see two usable options described in the Wikipedia article: considering them separate phonemes, or transcribing certain cases of r, b, m, l as emphatic.
    In the emphatic consonant option, we would have, for instance, تجارة tigāṛa "commerce" [teˈɡɑːra], مية ṃayya "water" [ˈmɑjja], with a dot indicating emphasis, as with the conventional emphatic consonants (aside from q). I guess in this analysis, if a word contains a back a ([ɑ(ː)]) but there isn't a conventional emphatic consonant (, , , , q), then one of the consonants r, b, m, l in the word is considered to be emphatic, so that there is an emphatic consonant to trigger emphasis spreading (backing). The emphasis-triggering behavior of r seems, however, to be predictable, unlike that of the other consonants. Not sure how to decide which consonant is emphatic if there's more than one of r, b, m, l in the word. Maybe then all of them are emphatic? (The Wikipedia article gives the example ḅāḅa [ˈbɑːba] "patriarch" contrasted with bāba [ˈbæːbæ] "Paopi".) Against the emphatic consonant option is the fact that it's somewhat misleading: at least, I don't see it mentioned that r, b, m, l are actually emphatic phonetically (that is, velarized, uvularized, or pharyngealized).
    As for the option in which front and back a are phonemes, I'm not sure how they would be transcribed: æ and a, æ and å? (tigāra or tigå̄rå contrasting with تجاري tigǣri "commercial" [teˈɡæːri].) Considering front and back a to be phonemes has the benefit of not requiring the postulation of emphatic consonants that are not phonetically emphatic. But it seems to ignore the fact that the emphasis-triggering behavior of r is predictable. And I wonder if it would require that the other emphatic consonants not be considered phonemes, as the two are connected. That is a problem, if emphatic consonants affect the pronunciation of vowels besides a as the Wikipedia article states. (There's currently no source given for that claim. It is true in other varieties of colloquial Arabic at least.) — Eru·tuon 08:31, 10 September 2017 (UTC)
    If you read WT:About Arabic/Egyptian, you will see that I have already addressed that issue by choosing to denote emphatic ṛ, ḅ, etc. This seems to be the most common analysis in the references I've read, probably because Arabic is traditionally very consonant-heavy and vowel-light, so until it evolves more, it is still most convenient for morphophonological analysis to consider them as extra consonants. —Μετάknowledgediscuss/deeds 18:11, 10 September 2017 (UTC)
    I think it's the easiest choice to implement. Probably the pronunciation module will be able to display both analyses by transforming one version to the other (/bˁaːbˁa/ → /bɑːbɑ/). — Eru·tuon 20:49, 10 September 2017 (UTC)
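
A minimal Python sketch of the backing rule proposed above: a/ā is back [ɑ(ː)] when no other vowel stands between it and an emphatic consonant, and front [æ(ː)] otherwise. The emphatic set and the vowel inventory used here are assumptions made for illustration (the authoritative list is the one on WT:About Arabic/Egyptian), and `a_quality` is a hypothetical helper name, not a function of any existing module.

```python
import unicodedata


def _nfc(text):
    """Normalise to NFC so dotted letters compare as single characters."""
    return unicodedata.normalize("NFC", text)


# Assumption: conventional emphatics plus the extra dotted consonants
# mentioned in the thread; the real inventory may differ.
EMPHATIC = set(_nfc("ṣḍṭẓqṛḅṃḷ"))
# Assumption: vowel letters of the romanisation.
VOWELS = set(_nfc("aāiīuūeēoō"))
LOW_VOWELS = set(_nfc("aā"))


def _emphatic_before_next_vowel(chars):
    """Scan outward from an a/ā; stop at the first vowel or emphatic."""
    for ch in chars:
        if ch in VOWELS:
            return False
        if ch in EMPHATIC:
            return True
    return False


def a_quality(word):
    """Return 'back' or 'front' for each a/ā in the word, in order."""
    word = _nfc(word)
    result = []
    for i, ch in enumerate(word):
        if ch not in LOW_VOWELS:
            continue
        backed = (
            _emphatic_before_next_vowel(reversed(word[:i]))
            or _emphatic_before_next_vowel(word[i + 1:])
        )
        result.append("back" if backed else "front")
    return result


if __name__ == "__main__":
    for example in ["tigāṛa", "gamīl", "ḅāḅa", "bāba"]:
        print(example, a_quality(example))
```

With the examples quoted in this thread, the sketch marks both vowels of tigāṛa and ḅāḅa as back, and the vowels of gamīl and bāba as front. Words that need more than this one rule would have to be handled by manual overrides.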
  • I have now created WT:About Arabic/Moroccan as well; the same caveats apply. This might be enough to test out a module, although they need to be checked, and more of these ought to be written. @Wikitiki89, Atitarev, maybe you could write up a similar page for Urban Levantine and any other topolects you are familiar with? —Μετάknowledgediscuss/deeds 20:22, 10 September 2017 (UTC)
    • @Metaknowledge Thanks for your efforts in creating the pages. They are very good and can definitely be used, however, they require quite an effort, if this info is not readily available. For working on transliteration and transcription modules, strictly speaking, they are not required, as long as the transliteration is correct. I mean, they won't help to provide 100% accurate mappings from MSA to ʿāmmiyya. Other points - vocalisations are mainly designed for MSA, so when they are used in dialects, then they cause confusion, e.g. should يَوْم (yawm), which is pronounced "yōm" in most eastern dialects be rendered with a fatḥa and a sukūn or just a ḍamma? Some rules about merged consonants are helpful but you still need to know the original MSA spelling and the transliteration (pronunciation). The words can sometimes be respelled to show the actual, not the original pronunciation. I will update further as I have to go now--Anatoli T. (обсудить/вклад) 05:50, 11 September 2017 (UTC)
      • Continued: OK, if I write in Levantine Arabic "أنا مش هون" (I am not here) - all I need is the spelling and the romanisation: "ʾana muš hōn". I can't rely on automatic transliteration at all and I won't necessarily need Arabic vocalisation, which are seldom used in dialects, anyway - just for disambiguation. Even if a table shows how dialectal pronunciations differ from MSA, it's still unpredictable because it's not prescribed, so ظ () may not always be "ḍ" but "ẓ", as in standard Arabic and as you also wrote yourself on the Egyptian page, ق (q) is not necessarily "ʾ" (as قلب "ʾalb" in Egyptian) but can be "q" as well, as in القاهرة "el-qahera". --Anatoli T. (обсудить/вклад) 06:21, 11 September 2017 (UTC)
        The plan is to input the transliteration manually, so none of that will be an issue. --WikiTiki89 17:45, 11 September 2017 (UTC)
  • This is sort of like taking Spanish input and trying to produce French or Italian pronunciation. It's worth a try, but I'm not sure how well it will work on some of the historical oddities that happen in language diversification. I hope we can avoid giving automated "guesses" when we have no data- that was an issue we ran into with Ancient Greek dialects some time ago, and it raises questions about how to apply CFI to such things. Also, we have some very expert and talented people working on this, but I notice a conspicuous lack of input from native speakers- in past discussions they've provided some very helpful insights. Chuck Entz (talk) 13:29, 11 September 2017 (UTC)
    @Atitarev, Chuck Entz: I think you have misunderstood what's going on. As far as I'm concerned, there's no plan to generate pronunciation in one lect from pronunciation in another lect, and no intent to use Arabic script to generate IPA. These pages provide a standard romanisation that can be used to generate IPA for that topolect unambiguously, with a secondary goal of generating orthography (although that will need the MSA spelling as well). —Μετάknowledgediscuss/deeds 16:22, 11 September 2017 (UTC)
    @Metaknowledge, Chuck Entz No, I didn't misunderstand, just wanted to make sure what the plan is. Without the official merger of topolects, which may require a vote, the only thing we can do now is provide additional pronunciations for dialects on terms, which share the same spellings. I am not too keen to make entries for dialectal spellings when it's not clear if Arabic dialects are eventually going to be merged and what is going to happen with inflection tables, etc. @Wyang, do you feel that we hijacked your talk page? It may not be fair to expect major work from you on Arabic and non-standard Arabic when you're just a beginner in it. --Anatoli T. (обсудить/вклад) 22:27, 11 September 2017 (UTC)
    @Atitarev Haha, no I don't mind at all. The discussion started here, plus reading the knowledgeable discussion is enjoyable. :) I feel like it would be helpful to illustrate how the proposed pronunciation template may look with certain examples, e.g. for entry ..., the format of the code is: ..., and the effect would be ... , with each dialect requiring a transcription input. Wyang (talk) 00:29, 12 September 2017 (UTC)
    Well, I started the discussion hoping to piggyback on your coding skills and success with Chinese, but I'm not really all that knowledgeable. I imagined that the template would be basically like {{arabic-dialect-pronunciation}} but intelligent: able to handle standardised romanisation and convert to IPA (and hopefully orthography as well, although that's not as pressing, especially as some dialects are entirely unwritten), able to add notes about gender or usage for each, able to represent dialects flexibly by city rather than country (and even sociolect, so we can input Jewish Baghdad, etc). —Μετάknowledgediscuss/deeds 04:42, 12 September 2017 (UTC)
    I agree with Chuck, though, in that the input of native speakers is invaluable on this, but it seems the Arabs themselves are not as interested (as we would like) in linguistically and computationally analysing their colloquial speech on Wiktionary. :) Which may mean we will need to do some serious literature and reference crunching ourselves. There seems to have been a misunderstanding with Mahmud regarding the format of the template to be designed and the role of transcription, which has now been resolved. Overall I find your transcription scheme (using ad hoc letters such as to account for vowel variation, making use of ž) quite reasonable, Metaknowledge. I hope the project can go ahead, although I also think that we should systematically research the phonology of Egyptian Arabic before we can implement it. Finding a dictionary which transcribes the Egyptian speech reasonably faithfully would be desirable. I'm a lot freer after next week and can help out then, to the best of my ability. Wyang (talk) 11:31, 14 September 2017 (UTC)
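
To make the template idea discussed above more concrete, here is a rough Python sketch of the kind of input it would take: one manually entered romanisation per dialect, converted to a phonemic IPA string by table lookup. Everything in it is assumed for illustration. The `DialectReading` class, the `to_phonemic` helper and the symbol table are hypothetical, the table covers only a handful of letters, and syllable breaks, stress and phonetic detail are not handled.

```python
import unicodedata
from dataclasses import dataclass

LONG = "\u02d0"  # IPA length mark

# Assumption: a partial romanisation-to-IPA table; letters not listed
# are passed through unchanged.
_RAW_TABLE = {
    "ā": "a" + LONG, "ī": "i" + LONG, "ū": "u" + LONG,
    "ē": "e" + LONG, "ō": "o" + LONG,
    "g": "\u0261",  # IPA script g
    "š": "ʃ", "ž": "ʒ", "ḥ": "ħ", "ʾ": "ʔ", "ʿ": "ʕ",
}
ROMAN_TO_IPA = {
    unicodedata.normalize("NFC", key): value
    for key, value in _RAW_TABLE.items()
}


def to_phonemic(romanisation):
    """Convert a romanised form to a bare phonemic string (no slashes)."""
    text = unicodedata.normalize("NFC", romanisation)
    return "".join(ROMAN_TO_IPA.get(ch, ch) for ch in text)


@dataclass
class DialectReading:
    dialect: str        # label such as "Egyptian" or a city name
    romanisation: str   # always entered by hand, never generated
    note: str = ""      # optional gender or usage note

    @property
    def ipa(self):
        return "/" + to_phonemic(self.romanisation) + "/"


if __name__ == "__main__":
    # Romanisations taken from the كَذَّاب example earlier in this thread.
    readings = [
        DialectReading("Egyptian", "keddāb"),
        DialectReading("Levantine", "kizzāb"),
    ]
    for reading in readings:
        print(reading.dialect, reading.ipa)
```

Keeping the romanisation as the only required input matches the plan stated above that dialectal transliterations are entered manually; orthography and phonetic output could be layered on later as separate steps.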


The has three separate readings with their own definitions! I'm asking specifically about the huò reading: "水勢相激貌" It seems to say something about "power/potential" and "fierce/violent" + "appearance" (of water). I think I'm close but I'll have you clarify the meaning. Cheers! Bumm13 (talk) 08:48, 9 September 2017 (UTC)

Quite close! 水勢相激貌 can be decomposed as: 水勢 (lit. the state of water; i.e. the flow of water; the current) + 相 (each other) + 激 (to surge) + 貌 (lit. "appearance". When it is used in Chinese lexicography like this, it denotes an ideophone―a word which tries to give a vivid impression of a scene, a sound or an idea). This sense of 漷 (huo4) is only used in 泧漷 and means "tempestuous, turbulent (of a current)". Wyang (talk) 11:00, 9 September 2017 (UTC)

日本漢字音 and Middle Chinese

Hello. Would it be possible to deduce the Japanese 呉音・漢音 algorithmically from Middle Chinese? I've found 漢和辞典s to disagree with each other sometimes; for example, the go'on of is just "こう" in 大漢和辞典 and 広漢和辞典 but "こう<かう" in the 1914 漢和大辞書. Another application might be this. --Dine2016 (talk) 14:55, 12 September 2017 (UTC)

It should definitely be possible. That was my original vision for {{zh-pron}} as well, i.e. that it will incorporate the Sinoxenic equivalents in JKV for each MC pronunciation; an example of the old version of the template as used on can be seen at Talk:斗 (as manual jkv input into the template). What is needed though is good sources on the correspondences between 呉音・漢音 and MC. I've got a couple on Korean and Vietnamese, but I'm sure others have published their analyses of the initial and rime correspondences for Japanese and MC. The algorithm for MC > Mandarin is at Module:ltc-pron/predict, and the raw data from which it was generated is at Module talk:ltc-pron/predict/raw, with the resulting reflex tables being Outcomes of Middle Chinese finals and Outcomes of Middle Chinese initials. That turned out to be a much more intricate project than I had planned... as each rime affected the outcome of initials in sometimes very subtle ways, and vice versa. Conceivably, something similar is needed to write the deduction algorithm for JKV. Each initial and rime will have different reflexes in the modern readings, conditioned, respectively, by the rime and initial (and potentially tone). Wyang (talk) 22:12, 12 September 2017 (UTC)
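
The shape of such a deduction algorithm can be sketched in Python as two reflex tables with conditioned overrides: each initial has a default reflex plus exceptions keyed by rime, and each rime has a default plus exceptions keyed by initial. The labels and reflexes below (`I1`, `R1`, `x` and so on) are placeholders, not real Middle Chinese categories or attested readings, and `predict` is a hypothetical name. Real tables would have to be compiled from published correspondences, and tone is left out.

```python
# Placeholder data only: these keys and values are NOT real categories.
INITIAL_REFLEX = {
    "I1": {"default": "x", "by_rime": {"R2": "y"}},
    "I2": {"default": "z", "by_rime": {}},
}
RIME_REFLEX = {
    "R1": {"default": "a", "by_initial": {}},
    "R2": {"default": "o", "by_initial": {"I2": "u"}},
}


def predict(initial, rime):
    """Combine the reflex of the initial (conditioned by the rime)
    with the reflex of the rime (conditioned by the initial)."""
    initial_entry = INITIAL_REFLEX[initial]
    rime_entry = RIME_REFLEX[rime]
    onset = initial_entry["by_rime"].get(rime, initial_entry["default"])
    final = rime_entry["by_initial"].get(initial, rime_entry["default"])
    return onset + final


if __name__ == "__main__":
    for pair in [("I1", "R1"), ("I1", "R2"), ("I2", "R2")]:
        print(pair, predict(*pair))
```

The point of the two-way conditioning is that neither table can be filled in independently, which is why compiling them tends to take longer than the lookup code itself.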

Etymology for 羋

Hi Frank, just a couple of concerns about the etymology of the surname 羋:

  • Could you find out what the title of the article by Yan Xuequn (CAAAL 21, 1983) is?
  • Do the archaeological findings support such an etymology? The bronze inscriptions use (OC *rneːlʔ, *niːlʔ) for this surname. Also note that Guangyun says that is the Chu word for "mother". — justin(r)leung (t...) | c=› } 05:02, 14 September 2017 (UTC)
Hi Justin. With 1) I believe it is this article:
Yan, Xuequn (严学宭) (1983), “On the Chu Nationality, Chu Dialect and Chu Sound”, Computational Analyses of Asian & African Languages, 21: 131–137.
The journal also has a Japanese name アジア・アフリカ語の計数研究, and the article's Japanese title is 楚人と楚方言と楚音, archived online here, but I don't have the authorisation to view it. It may be better to list it as apud in the etymology if it can't be accessed.
Various sources say the archaeological records universally show (OC *rneːlʔ, *niːlʔ), and that (OC *meʔ) was a tongjia character used by subsequent-era scholars. I haven't been able to locate good sources on the topic; I only found these from the Chinese literature that's archived online, which aren't super-useful. Understanding this in more detail may require consulting specialised references on the Chu manuscripts or inscriptions. Wyang (talk) 05:58, 14 September 2017 (UTC)
Thanks for your thorough response! How should apud be used? Should it be "Schuessler, 2007, apud Yan, 1983"?
Also, for some reason the download link to the zip file doesn't work. I've also found a few articles related to this (not sure if they coincide with yours), but they all don't really address (OC *rneːlʔ, *niːlʔ). — justin(r)leung (t...) | c=› } 16:00, 14 September 2017 (UTC)
I would recommend (Schuessler (2007) apud Yan (1983)). It seems the 芈 in the title 芈.zip cannot be handled by the site, so the file just becomes 'zip', lol. Here it is with the proper name. Two of the articles were also found in your search; the articles aren't really that helpful to be honest, since most of them are outdated or not relying on the latest research on Old Chinese. Wyang (talk) 22:00, 14 September 2017 (UTC)

Japanese pronunciation oddity

Discussion moved to Template talk:ja-pron#日向.


Hi. Would it be possible to get 亙 to use the Middle Chinese pronunciation located at Module:zh/data/ltc-pron/亘? --Dine2016 (talk) 13:56, 16 September 2017 (UTC)

Yep! You can just move it across to Module:zh/data/ltc-pron/亙 if you think 亘 shouldn't have the Middle Chinese, otherwise you can copy the content to the nonexistent page. Wyang (talk) 13:59, 16 September 2017 (UTC)
Thanks. also has its own MC pronunciations "荀緣切 先平" (for xuān) and "胡官切 寒平" (for huán?), though they are not in the database as they're not in 廣韻. --Dine2016 (talk) 14:13, 16 September 2017 (UTC)
This was discussed before: Module talk:zh/data/ltc-pron/蝦. At the moment only Guangyun fanqie readings are included, since there is no phonetic reconstruction of the fanqie system from other rime books, so to an English speaker the fanqie reading itself will not be very useful. Wyang (talk) 14:19, 16 September 2017 (UTC)

I'm completely 外行 at this, but it seems that of the three readings of 亨 in Guangyun, "曉庚二開 平許庚", "滂庚三開 平撫庚", "曉陽三開 上許兩", the latter two are written with 烹 and 享 nowadays. Would it be possible/desirable to get the pronunciation sections of 烹 and 享 to duplicate and cite 亨 #2 and #3 for their MC pronunciations? (享 could also cite the sole reading of 亯, which is identical with 亨 #3.) --Dine2016 (talk) 15:12, 14 October 2017 (UTC)

@Dine2016 I think that would be reasonable. 亨 was just a written variant of 烹 and 享 in ancient times, and Guangyun recorded the pronunciations of those words on 亨. Wyang (talk) 22:29, 14 October 2017 (UTC)

POJ terms

This is a follow-up to User_talk:Wyang/Archive6#Category:Min_Nan_terms_with_IPA_pronunciation. I have modified Module:zh-see to support POJ forms. See Tiong-kok as an example. However, I don't know whether this is a good idea. (The heading should also be discussed.)--2001:DA8:201:3512:3D32:5FBD:8099:19C7 12:33, 17 September 2017 (UTC)

I support reducing the amount of content on POJ pages if the character page exists. @Tooironic, Atitarev, Justinrleung, Suzukaze-c, Mar vin kaiser, Hongthay, A-cai, Dokurrat, Dine2016 and others I've missed. Wyang (talk) 12:40, 17 September 2017 (UTC)
I support too, as per my message on that talk page. It would be more palatable for the community to have an L3 header and the headword, like pinyin entries. Note that non-registered users still have issues creating simplified Chinese entries. --Anatoli T. (обсудить/вклад) 12:51, 17 September 2017 (UTC)
Also the cat name Category:Min Nan Pe̍h-ōe-jī form should be in plural - "forms". --Anatoli T. (обсудить/вклад) 12:54, 17 September 2017 (UTC)
My stance remains the same as before. —suzukaze (tc) 20:36, 17 September 2017 (UTC)
I still don't feel too comfortable with using {{zh-see}} for this. Zhuang uses {{za-sawndip form of}} for Sawndip entries, Vietnamese uses {{vi-Nom form of}} for Nom entries, Korean uses {{ko-hanja form of}} for hanja entries. In all of these cases, the template is used in the less common form of the written language, so I think it's fine to have a similar template for POJ. — justin(r)leung (t...) | c=› } 20:55, 17 September 2017 (UTC)
From the maintenance and consistency point of view, these can also use soft redirects. The count of lemmas would also show the true picture.--Anatoli T. (обсудить/вклад) 21:23, 17 September 2017 (UTC)


In this term 乾符 is simplified to 干符, I don't know how to fix it.--2001:DA8:201:3512:1D2:5209:1889:E3C 09:31, 21 September 2017 (UTC)

Fixed now. P.S. Thanks for your edits in Chinese! Lots of really interesting literary words and loanwords, and I don't even know how you found them. Please seriously consider creating an account. Wyang (talk) 09:40, 21 September 2017 (UTC)


Would 物自體, 物自体 (wùzìtǐ) be a good translation for thing-in-itself? [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 21:43, 21 September 2017 (UTC)

Yes. Wyang (talk) 21:46, 21 September 2017 (UTC)

Sorry. May you check out Mandarin translations at it never rains but it pours? [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 17:19, 24 September 2017 (UTC)

I've removed some of the translations that don't really correspond to the English proverb. Wyang (talk) 02:54, 25 September 2017 (UTC)

miscellaneous place names

What categories should we add for miscellaneous place names like 江南, 江淮, 關中, etc.? zh:geography? I note that is the case for 嶺南, but I thought zh:geography might be more terms relating to geography, not toponyms. ---> Tooironic (talk) 07:08, 23 September 2017 (UTC)

Definitely not Geography, it is an abuse of the category perpetrated by one single user. I used to think we should use Place names (which seems logical?), but recently I read the category description and such use seems to be wrong, and now I think it should be Places or its finer subcategories. Or China. —suzukaze (tc) 07:13, 23 September 2017 (UTC)
I agree. I think these should go to Category:zh:Regions of China, a subcategory of Category:zh:China (cf. the similar category of Category:en:Regions of the United States of America). Wyang (talk) 12:08, 23 September 2017 (UTC)


Was my edit wrong straightforward? Or was I just misusing {{syn}}? Please let me know. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 09:39, 25 September 2017 (UTC)

Just stay off languages you don't know, as you have been repeatedly warned about please. You are creating more trouble than benefit for editors that work on these languages. Wyang (talk) 09:41, 25 September 2017 (UTC)
Just asking, you did not tell me what exactly was mistaken. I really want to help. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 09:43, 25 September 2017 (UTC)
Please don't. Wyang (talk) 09:44, 25 September 2017 (UTC)
But why did you undo all of my edit? That was the question you did not reply to. If I do something which is not fine, at least I want to know what it is that makes it so. Sorry if I’m being so insistent. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 09:48, 25 September 2017 (UTC)
Have you observed how Chinese entries are usually formatted at all on Wiktionary? They don't use markups such as {{l|en}} in definitions, and they don't use {{syn}} to list synonyms at all- it seriously screws the formatting up. What is the root of the problem is your unfamiliarity with these languages you are editing, which means you are unable to observe how things are usually done by editors working on them. What is exacerbating the problem is the attitude that your edits are infallible, which has led to your repeated ignorance of warnings about your errors as well as the previous block. Wyang (talk) 10:02, 25 September 2017 (UTC)
Now that I can reply, why did you remove {{place}} too then? I don’t see the point of undoing an edit altogether when part of it is helpful. Then of course, now that I know, I will not add {{l}} or {{syn}} to Chinese entries again, which I was not actually warned about, as you wrote here. And I never said my edits are “infallible”. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 21:05, 3 October 2017 (UTC)
You have been warned before about your incompetence in Chinese, but you did not take it seriously, and proceeded to edit Chinese entries, assuming your edit was unproblematic and your knowledge of Chinese was sufficient. That was the reason for your block. Confucius said: "知之為知之,不知為不知,是知也。" (When you know a thing, to hold that you know it; and when you do not know a thing, to allow that you do not know it - this is knowledge.) As a consequence, the Chinese have very minimal acceptance for people who pretend and act as if they know, when they actually do not. Wyang (talk) 08:24, 4 October 2017 (UTC)

Brown hair in Chinese varieties

How could the term "brown hair" be translated into any variety of Chinese? We have 黑髮黑发 (hēifà), 白髮白发 (báifà) and 金髮金发 (jīnfà), for instance.

(Off topic, but I feel you should have a look at what I requested at the Chinese version of Wiktionary concerning categories for Northern Sami.) --Lo Ximiendo (talk) 12:04, 25 September 2017 (UTC)

It would be 棕髮棕发. With the category name, 北萨摩斯语 seems wrong. I think it should be 北萨米语 or 北方萨米语. Wyang (talk) 12:07, 25 September 2017 (UTC)
Thanks, co-editor. The latter (北方薩米語/北方萨米语) is wheretoward I'm going to move the Northern Sami categories. --Lo Ximiendo (talk) 12:13, 25 September 2017 (UTC)
You can just call me Frank... Wyang (talk) 12:17, 25 September 2017 (UTC)


I have a question about this. According to dictionaries it refers to eczema or fungus infection of the hands or feet, so does that mean when we do 拔罐 what we are actually getting rid of is a skin or fungal thing? I always thought it referred to literally damp qi or something like that. ---> Tooironic (talk) 05:48, 27 September 2017 (UTC)

I can see how it might be both: fungal skin infections can be associated with dampness, so that might have given rise to the skin-infection sense based on the component words independently of the qi sense, which is presumably based on the principles of traditional Chinese medicine. Not that I know a lot about either... Chuck Entz (talk) 21:14, 27 September 2017 (UTC)
@Tooironic Yes, this reflects the "environmental factors → internal factors → disease" pathogenetic theory in traditional Chinese medicine. 濕 (dampness) is one of the six qi (六淫), which are environmental factors capable of causing human diseases, the other five being wind, cold, heat, summer heat and dryness. 濕氣 (damp qi) can mean several things: (1) dampness in the environment: i.e. oversaturation of air with water; moisture; humidity; (2) dampness in the body (濕邪): i.e. the body in a state of excessive dampness, resulting from dampness in the environment and internal imbalance. It is not necessarily pathological at this stage, but it predisposes the individual to developing dampness-related illnesses. Signs and symptoms of having dampness in the body include: feelings of heaviness and lethargy, joint and muscle soreness and pain, chest tightness, reduced appetite, thick tongue coating, and bowel and bladder dysfunction. It is managed by behavioural changes (to avoid contact with humidity), dietary changes (which explains why spicy food in China is usually concentrated in humid regions e.g. Sichuan and Hunan ― this helps eliminate the bodily dampness), and therapies such as cupping; and finally (3) dampness as an evil in the body, and the various dampness-related diseases, such as eczema, hand and foot fungal infections, etc. Hope this helps! P.S. please check out Justinrleung's admin vote. Wyang (talk) 22:55, 27 September 2017 (UTC)

More JA pronunciation questions

Discussion moved to Template talk:ja-pron#ざ・ず・ぜ・ぞ.

Wyang, Eirikr, would you mind if I move these two discussions related to {{ja-pron}} to Template talk:ja-pron? I think they would find a better home there since they may prove useful to those interested in the function of the template. Nardog (talk) 06:16, 29 September 2017 (UTC)

I don't mind at all. :) Wyang (talk) 06:39, 29 September 2017 (UTC)
Done. :) Nardog (talk) 17:04, 1 October 2017 (UTC)

Wiktionary:Beer parlour/2017/August

Hey, thanks for your explanations there. I'm not sure how to respond to Thecurran's latest message (it's really weirding me out), could you look at it? Thanks. —Aryaman (मुझसे बात करो) 17:40, 30 September 2017 (UTC)

@Aryamanarora I've replied there. Now, talking about death by verbosity... Wyang (talk) 05:29, 1 October 2017 (UTC)
lol, thanks. I think we should just remove all of the syllable entries at this point; I don't think he is one to be convinced. —Aryaman (मुझसे बात करो) 13:46, 1 October 2017 (UTC)
For that, you need to post in rfd. You can either move the whole discussion to rfd using the {{movedfrom}} and {{movedto}} templates to mark what you've done, or you can start a new discussion in rfd and just link to the Beer parlour discussion. Chuck Entz (talk) 14:59, 1 October 2017 (UTC)

Weird talk pages

Out of curiosity, how are you finding these? —suzukaze (tc) 05:08, 5 October 2017 (UTC)

Just by using the search function to get all the talk pages without signatures by registered or IP users, and without texts such as RFV, RFD, RFC, transwiki, etc. Never knew there is so much fun material hidden on Wiktionary. Wyang (talk) 05:11, 5 October 2017 (UTC)
It's fun but you're deleting it :( —suzukaze (tc) 05:37, 5 October 2017 (UTC)
I was wondering the same thing. Keep up the good work. SemperBlotto (talk) 05:40, 5 October 2017 (UTC)
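For anyone wanting to reproduce the search described above offline (e.g. against a dump), the filter could be approximated along these lines. This is a rough Python sketch under stated assumptions, not the actual search query used: it assumes a signed comment always carries a timestamp of the form "05:11, 5 October 2017 (UTC)", and the names `is_unsigned_oddity` and `find_candidates` as well as the input mapping are hypothetical.

```python
import re

# Assumption: every signed comment contains a timestamp like
# "05:11, 5 October 2017 (UTC)".
SIGNATURE = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")

# Process keywords mentioned in the thread; pages containing them are skipped.
PROCESS_WORDS = re.compile(r"\b(RFV|RFD|RFC|transwiki)\b", re.IGNORECASE)


def is_unsigned_oddity(text):
    """True if the page has no signature timestamp and no process keyword."""
    return not SIGNATURE.search(text) and not PROCESS_WORDS.search(text)


def find_candidates(pages):
    """pages: hypothetical mapping of talk-page title -> page text."""
    return sorted(title for title, text in pages.items()
                  if is_unsigned_oddity(text))
```

The result is only a list of candidates for manual review; a page without a timestamp is not necessarily junk.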


Hi. Your block of this user seems a bit harsh to me: I don't think he's a troll himself, and all his French contributions are legit. I suppose he was just disagreeing with your removal of the bogus requests. --Barytonesis (talk) 10:04, 11 October 2017 (UTC)

Ah, all right. They did add a bogus request themselves though: diff, which translates to "Crabs pass through the Bible through the energy of people". Eight hours later, it was followed by a Turkish-to-English request by a brand-new user (who posted 1 min after registering), which translated to "This apple boxer is carefully passing through an electronic swimming cemetery". Wyang (talk) 10:13, 11 October 2017 (UTC)
Mmh, yes, it might be that they're doing legit work in the main space, but having fun on this specific page. So they might deserve a warning after all. --Barytonesis (talk) 10:16, 11 October 2017 (UTC)
Well, he's definitely very suspicious... There's this edit by a new user restoring the bogus requests (it's his single edit), and a minute later Bu193 is back after a day's absence. --Barytonesis (talk) 15:26, 15 October 2017 (UTC)
I have the feeling that the recent spate of nonsensical requests by Charleston/Kyiv IPs and newly created European-name accounts are the work of one or two people, and this user is one of their Doppelgänger. I've blocked them for two weeks. Wyang (talk) 07:17, 16 October 2017 (UTC)

Xiongnu and Xianbei languages

Although they are not directly attested, should Wiktionary have exceptional codes for Xiongnu and Xianbei languages? This will make display and categorization easier.--2001:DA8:201:3512:B0E8:A155:42B6:5D5D 07:26, 15 October 2017 (UTC)

I think that would be good. I added a discussion at Wiktionary:Grease pit/2017/October#Adding Xiongnu and Xianbei as etymology-only languages. Any other languages in mind? Wyang (talk) 08:08, 15 October 2017 (UTC)

The etymology of Korean subject particle 가 (ga)

I'm curious if you have any insight into this edit. If the particle is indeed attestable prior to the Japanese invasion of the late 1500s, perhaps this is cognate rather than borrowed? ‑‑ Eiríkr Útlendi │Tala við mig 17:44, 16 October 2017 (UTC)

@Eirikr Writing it as "unknown" would be more accurate IMO.
"The first use of this particle in 1572" was probably in reference to 洪允杓's 主格語尾 「-가」에 대하여 (1975), which says this particle "was not used in texts from the 15th century" and "its first attestation was in 1572 in a Hangul text".
It's earlier than the Japanese invasion, but not much earlier, and the earliest usages seemed to be limited and uncommon. There are at least four different theories in the literature regarding the etymology of this, one of them being the Japanese loan theory, e.g. in 鄭光's "主格 「가」의 發達에 對하여" (1968).
In pp. 412-413 of 홍윤표's 近代國語硏究 (1994), secondarily cited by 고광모's 주격조사 ‘-가’의 발달 (2013), the first use of the particle 가 was placed at mid-17th century, and the historical development for the usage is as follows:
  1. Since the mid-17th century: used after nouns ending in -i or -y,
    e.g. pwuli-ka ("mouth"), nay-ka ("scent"), poy-ka ("boat");
  2. Since the mid-18th century: used after nouns ending in vowels/semivowels other than -i,
    e.g. ca-ka ("one who"), soyngswo-ka, nwongso-ka ("farm work");
  3. During the end of the 18th century: used briefly in the form of double particle -i/yka after nouns ending in vowels/semivowels other than -i,
    e.g. to-yka ("road"), inkwu-yka ("population"), nwongso-yka ("farm work").
So the origin of this particle is still unsettled, but quite interesting. Wyang (talk) 11:35, 17 October 2017 (UTC)
Good stuff, thank you. I'm curious, if it's not too much trouble, if you could add the four theories you know of to the entry? And I'm also curious, what are your own thoughts? ‑‑ Eiríkr Útlendi │Tala við mig 22:20, 17 October 2017 (UTC)
@Eirikr Sure, the etymology has been expanded to explain the various theories. I've had to read some of the publications... my impression is that this is probably multifactorial. -ga was probably there all along, well before the Japanese invasion, as some kind of emphatic, or interrogative particle. The awkwardness of the -i nominative after vowel-ending stems, especially -i/y nouns, brought on the use of the -ga in late 16th century. In Middle Korean, when nominative -i was attached to nouns ending in -i/y, there was no change to the noun ending, with the nominative case merely manifesting itself as a tonal difference on the preceding vowel. Thus, as the Middle Korean tones were gradually lost in the transition to Modern Korean, the -ga emphatic bloomed as a way to compensate for the loss of case marking in these nouns. The ‘vowel + vowel’ sequences are generally not stable in Korean, prone to coalescence, and in the case of ‘non -i/y ending noun + -i nominative’ sequences, such fusion often led to monophthongisation and the disappearance of a tangible nominative particle. The spread of the -ga nominative to non -i/y ending nouns was facilitated by this vowel sequence instability. Whether Japanese -ga had a role in the development of the phonologically conditioned allomorph, I don't know. It may have accelerated the popularisation of -ga, but I think it is unlikely to have been the major factor since case particles are much harder to borrow. The biggest pulling force is from Korean itself, likely the instability and loss of the -i nominative marking after -i/y ending nouns, and vowel-ending nouns as a whole. I'm not exactly sure what the pushing force is, but it looks to be much less influential in the process. Wyang (talk) 10:50, 18 October 2017 (UTC)
That's very interesting and most helpful, thank you. From what you describe, it does indeed look like Japanese has very little to do with things here.
I noticed in your recent edit to 가 that one of the theories states "Developed from the interrogative particle (-ga)", but there is nothing on that page about the interrogative particle. Looking at interrogative verb forms in the past, I had naively assumed that the interrogative was (kka), but there's no such entry for that either.
At the risk of further grasping at straws :), I don't suppose there's any chance of a cognate relationship between the Korean interrogative and Japanese ? ‑‑ Eiríkr Útlendi │Tala við mig 18:44, 18 October 2017 (UTC)
@Eirikr Middle Korean (-ga) was an interrogative suffix occurring in general questions, the same as the ga in the modern interrogative suffix ᆫ가 (-n-ga). My brief search did not yield much that is useful on the etymology of (-kka), but I would imagine that this is from Middle Korean (-ga), as the tense consonants were typically of secondary cluster origin when traced back. Korean interrogative -ka and Japanese interrogative and genitive > nominative are probably related as a proto-interrogative particle, so this isn't grasping at straws. :) Some references are Vovin (2010), Robbeets (2005), Francis-Ratte (2016, p. 295). Wyang (talk) 08:10, 19 October 2017 (UTC)


I wanted to use "tl2=y" for 上, but I didn't get the output that I wanted. Could you please help me with this entry if you have time? Dokurrat (talk) 23:29, 17 October 2017 (UTC)

We should probably make the tl parameter more flexible, just like er, but MOD:cmn-pron is just a beast to work with. — justin(r)leung (t...) | c=› } 00:36, 18 October 2017 (UTC)
Exactly. I would rewrite the function (or even the whole module) completely; it's doing my head in. But judging by the amount of recent changes to Chinese entries yet to be checked, this is probably a task for next year... Wyang (talk) 07:39, 18 October 2017 (UTC)

Some Arabic tracking categories

Hi Frank,

Are you available to do some work with Arabic modules? I wonder how hard would it be to add some tracking categories?

  1. First of all - wrong letters. Often, even native speakers use Persian letters, which are never used in Arabic (Persian would also use the reverse category). That way many copypasta problems will be discovered.
  2. Rare letters - special letters, which are acceptable but seldom used or only used in specific dialects, especially Maghrebi or some letters borrowed from Persian and other languages.
  3. I would also like to add some Arabic terms, which use sounds (IPA or transliterations), which are not part of Classical Arabic.

I'll give you the details later - I've gotta go now, just let me know if you're interested and it's not too hard and you may consider it in the next days or weeks. I'll try WT:GP otherwise. --Anatoli T. (обсудить/вклад) 07:03, 18 October 2017 (UTC)

Hi Anatoli. I'm happy to help out. Some of these tasks can be done using data from the dump (such as retrieving a list of Arabic lemmas using non-Arabic letters), and others can be achieved by regular or tracking categories. Wyang (talk) 07:35, 18 October 2017 (UTC)
You just need a string of all the valid characters for Arabic, then put it under standardChars in Module:languages/data2 (see Greek / English for examples). DTLHS (talk) 07:41, 18 October 2017 (UTC)
Done; this should include most characters. To @Atitarev and others, see here for a full list of the characters. — Eru·tuon 08:46, 18 October 2017 (UTC)
Thank you all! Sorry, I am getting very busy. @Erutuon: Thanks for this, but what is the impact? Does it allow tracking categories to be created?
Diacritics (listing most common) ـَ‎, ـُ‎, ـِ‎, ـّ‎, ـً‎, ـٌ‎, ـٍ‎, ـٰ‎ shouldn't be part of the title but can only be used in the headword, just like accents. Not sure if your range above covers ـٰ‎. اللّٰه (allāh)‎ is, of course, an exception, it contains a šadda and an ʾalif ḵanjariyya.
Common errors in the title include Persian letters ی‎ and ک‎, which look identical to Arabic letters ي‎ (initial or middle position) and ى‎ (final position) and ك‎ (initial or middle position). I'd like to track those as errors, so that patrollers could pick them up.
Rare and regional letters are those that appear in Wiktionary:About_Arabic#Rare_letters and Wiktionary:About_Arabic#Regional_letters. Ideally, we should track their usage for all Arabic entries, including MSA (ar) or dialects. E.g. مݣانة (magana). These should be tracked as "Arabic terms spelled with ..." categories. Some may need checking and verifications.
For phonetic tracking, beneficial would be to have categories for all MSA (ar) terms having symbols o, ō, e, ē, č, ž, g and v in the transliteration, which are not part of Classical Arabic. E.g. نِيتْرُوجِين (nitrōžēn or nitrojīn). Perhaps, "irregular pronunciation categories"? --Anatoli T. (обсудить/вклад) 07:02, 19 October 2017 (UTC)
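The phonetic check proposed above could be sketched roughly as below. This is a minimal Python illustration, not existing module code: the function name `non_classical_symbols` is hypothetical, and the symbol list is taken directly from the comment above. Normalising to NFC is an assumption made here so that a precomposed letter such as ḡ is not mistaken for plain g.

```python
import unicodedata

# Symbols named above as not being part of Classical Arabic.
NON_CLASSICAL = set(unicodedata.normalize("NFC", "oōeēčžgv"))


def non_classical_symbols(translit):
    """Return the sorted non-Classical symbols found in a transliteration."""
    # NFC keeps letters with diacritics (e.g. ḡ) as single code points,
    # so they are not confused with their base letters.
    normalized = unicodedata.normalize("NFC", translit.lower())
    return sorted(set(normalized) & NON_CLASSICAL)
```

An entry would then be added to whatever tracking category is agreed on whenever the returned list is non-empty.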
@Atitarev: Adding the standard characters pattern makes Module:headword automatically add categories for any characters not matched by the pattern. (Search "spelled with" in the module code to find the place where the category is added.) I guess some or many of the categories in English terms by their individual characters are populated this way. (Categories for numbers and basic punctuation must be populated another way. They are included in the standard characters pattern for English.)
It's true that diacritics shouldn't appear in the title, but it may still be best to include them in the standard characters field, to avoid putting entries on diacritics, like ـَ (-a), in "spelled with" categories. I would suggest Module:ar-headword as the place to add an error message for titles containing diacritics, and the other characters that you mention.
The standard characters for Arabic currently are ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي◌ً◌ٌ◌ٍ◌َ◌ُ◌ِ◌ّ◌ْٰٱ۰۱۲۳۴۵۶۷۸۹،؛؟‏, plus the regular punctuation characters. So titles with the regional letters and Persian letters should be being categorized now. — Eru·tuon 08:32, 19 October 2017 (UTC)
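The behaviour described above (any title character outside the standard set produces a "spelled with" category) can be illustrated as follows. This is a rough Python sketch only; the real logic lives in the Lua Module:headword and is not reproduced here. The function name `spelled_with_categories` is hypothetical, and the standard set is restricted to the letters of the list above, built from the code point ranges U+0621 to U+063A and U+0641 to U+064A, which is an assumption about how that list maps onto Unicode.

```python
# Letter portion of the standard character list, built from code point
# ranges (assumption: these ranges correspond to the letters listed above).
STANDARD_LETTERS = (
    {chr(cp) for cp in range(0x0621, 0x063B)}
    | {chr(cp) for cp in range(0x0641, 0x064B)}
)


def spelled_with_categories(title, lang_name="Arabic",
                            standard=STANDARD_LETTERS):
    """Return one category name per distinct non-standard character."""
    unusual = []
    for ch in title:
        if ch.isspace() or ch in standard or ch in unusual:
            continue
        unusual.append(ch)
    return [f"{lang_name} terms spelled with {ch}" for ch in unusual]
```

A title typed with the Persian letter ک (U+06A9) instead of ك (U+0643) would be flagged this way, which is the kind of copy-paste error patrollers want to catch.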
@Erutuon: Thanks, I'll try to make sense of it, but you seem to have included Persian numbers. Eastern Arabic numerals are ٠١٢٣٤٥٦٧٨٩ (0123456789), not ۰۱۲۳۴۵۶۷۸۹. --Anatoli T. (обсудить/вклад) 08:43, 19 October 2017 (UTC)
@Atitarev: Oops, my mistake. Fixed. I saw they looked wrong, but somehow didn't figure out that they were the wrong codepoints. — Eru·tuon 09:08, 19 October 2017 (UTC)
Great, thanks! --Anatoli T. (обсудить/вклад) 10:43, 19 October 2017 (UTC)
For now, here is a list of entries in Category:Arabic lemmas (or its subcategories) containing characters other than the ones above: اخیرا, بَلٰی, تصویر, تقدیر, توجیه, تکرار, حسین, حمید, رویا, سیب, شیطانه, صوفیا, طبیب, طیف, عقیده, عیار, غلیظ, فرامین, محاصره‌, مرضیه, مهدی, هـ, وضعیت, پاول, کاظم, کتاب, کمیل, یارا. Some of them involve a miscategorisation by the {{ar-root}} template, and hence need |nocat=1.
Entries containing diacritics in their title: أبداً, جمهوريّة جزر فيجي, حتماً, كَلْب,‏. Wyang (talk) 09:11, 19 October 2017 (UTC)
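A check for diacritics in titles, like the one that produced the list above, might look like this. It is a minimal Python sketch and not the query actually used: the names `ALLOWED_TITLES` and `has_title_diacritics` are hypothetical, and treating every nonspacing mark (Unicode category Mn) in the Arabic block as a diacritic is an assumption. The single exception mirrors the اللّٰه case mentioned earlier in this thread.

```python
import unicodedata

# Assumption: titles listed here are deliberate exceptions.
# The one entry is the word for "Allah", written with shadda and dagger alif.
ALLOWED_TITLES = {"\u0627\u0644\u0644\u0651\u0670\u0647"}


def has_title_diacritics(title):
    """True if the title contains an Arabic-block nonspacing mark."""
    if title in ALLOWED_TITLES:
        return False
    return any(
        unicodedata.category(ch) == "Mn" and "\u0600" <= ch <= "\u06ff"
        for ch in title
    )
```

Titles for which the function returns True would be candidates for moving to the unvocalised spelling.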
Thanks, Frank but I see mostly if not all Persian entries there. Persian character set is slightly different. Thanks for the diacritic entries. --Anatoli T. (обсудить/вклад) 10:43, 19 October 2017 (UTC)
I have cleaned all the entries in the above list (some I had to mark for imminent deletion). But note that هـ is a correct entry. It is written this way in Arabic (with taṭwīl and without any dot, not the isolated form) in running text, and such may appear in other cases too, shewing that the taṭwīl should be one of those standard characters too. Palaestrator verborum (loquier) 11:46, 19 October 2017 (UTC)
Note also ٫ U+066B ARABIC DECIMAL SEPARATOR and ٬ U+066C ARABIC THOUSANDS SEPARATOR as standard punctuation signs, if we already include the Arabic digits. Palaestrator verborum (loquier) 11:51, 19 October 2017 (UTC)