Wiktionary:Beer parlour/2018/September

News from French Wiktionary

Hello!

It's a pleasure to invite you to read the August issue of Wiktionary Actualités, translated into English!

August Actualités arrived with brand new stats! Not only about French Wiktionary but about all Wiktionaries, thanks to a great new tool! There is also an article about dictionaries of law in French, some short news items and nice pictures from a campaign about remote but beautiful lands.

This issue was written by eight people and was translated for you by Pamputt and me. This translation can still be improved by readers (wiki-spirit). We hope you'll enjoy reading it, and we'll be happy to discuss it if you have any questions. Noé 12:32, 1 September 2018 (UTC)

Cleaning up citations

I've been noticing a plethora of grossly incomplete, misformatted citations such as the following at [[lot]]:

C-3PO

"We seem to be made to suffer. It's our lot in life." in Star Wars Episode IV: A New Hope.

What's missing are the name of the writer (not the fictional speaker), the date, a link to the script or some other way of confirming that these words were part of the movie and showing the context, etc.

We don't have a good template for requesting cleanup of such things. A page like [[lot]] might have a dozen or more problems with citations. I've been using {{rfdate}} to attract attention to such citations, but "date" has been interpreted hundreds of times as the birth and death dates of the author. I've been adding a comment so that when someone wants to add the date what appears in the edit window is "rfdate|and other bibliographic particulars". I've started adding a longer comment: "rfdate|and other bibliographic particulars, eg, title of work, page, url, full name of author"

{{quote-book}} can also be useful, as inserting "| title= |" generates a visible request to supply the missing title.

But these tools don't address the general case of missing citation information. And it is not reasonable to expect all of the citation problems on a page to be corrected by the contributor who happens to notice them. Often it is efficient to correct instances of bad citations of a single author across multiple entries rather than clean up an entire entry.

The problem is not tiny. Already more than 1,400 pages use {{rfdate}} and an unknown additional number of entries should have it. And there are many entries that have badly formatted citations.

How should we organize cleanup of citations on such entries? A single category called something like Category:English pages with citations problems ("pages" to recognize inclusion of pages in Citations namespace), into which multiple templates categorize, might be sufficient. Perhaps a single additional template, to be inserted into a citation that is somehow wrong and placing the entry into such a category, would be useful. Its first unnamed parameter could be a comment on what seemed wrong. DCDuring (talk) 15:46, 2 September 2018 (UTC)

We can probably run a dump analysis to make a list of all dodgy citations and take it from there - one thing we are good at is making cleanup pages. --XY3999 (talk) 09:01, 9 September 2018 (UTC)
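As a rough illustration of what such a dump analysis might look like, here is a minimal Python sketch. It assumes quotation header lines in entry wikitext start with "#*" (with "#*:" holding the quoted passage), and it flags a header line when it contains {{rfdate}} or has no four-digit year. The function name find_dodgy_citations and the heuristics are hypothetical, not an existing tool, and a real cleanup run would need more careful rules.

```python
import re

# Heuristic: a plausible publication year between 1000 and 2099.
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def find_dodgy_citations(title, wikitext):
    """Return (title, line_number, line) for quotation headers that look incomplete.

    Assumption: quotation header lines start with "#*", and lines starting
    with "#*:" hold the quoted passage itself, so they are skipped.
    """
    hits = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        if not line.startswith("#*") or line.startswith("#*:"):
            continue
        # Flag explicit date requests and headers with no recognisable year.
        if "{{rfdate" in line or not YEAR_RE.search(line):
            hits.append((title, number, line))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "==English==",
        "# fate",
        "#* C-3PO",
        "#*: We seem to be made to suffer. It's our lot in life.",
        "#* '''1851''', Herman Melville, ''Moby-Dick''",
        "#*: (quoted passage)",
    ])
    for hit in find_dodgy_citations("lot", sample):
        print(hit)
```

The output of such a scan could be written to a cleanup page, grouped by entry or by cited author.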

Malay as an ISO 639 macrolanguage

According to the IANA language subtag registry [1], various subtags such as the following have been listed as belonging to the macrolanguage "ms" (Malay language):

"bjn" (Banjar)
"btj" (Bacanese Malay)
"bve" (Berau Malay)
"bvu" (Bukit Malay)
"coa" (Cocos Island Malay)
"dup" (Duano)
"hji" (Haji)
"id" (Indonesian)
"jak" (Jakun)
"jax" (Jambi Malay)
"kvb" (Kubu)
"kvr" (Kerinci)
"kxd" (Brunei)
"lce" (Loncong/Sekak)
"lcf" (Lubu)
"max" (North Moluccan Malay)
"meo" (Kedah Malay)
"mfa" (Pattani Malay)
"mfb" (Bangka)
"min" (Minangkabau)
"mqg" (Kota Bangun Kutai Malay)
"msi" (Sabah Malay)
"mui" (Musi)
"orn" (Orang Kanaq)
"ors" (Orang Seletar)
"pel" (Pekal)
"pse" (Central Malay)
"urk" (Urak Lawoi')
"tmw" (Temuan)
"vkk" (Kaur)
"vkt" (Tenggarong Kutai Malay)
"xmm" (Manado Malay)
"zlm" (Malay (individual language))
"zmi" (Negeri Sembilan Malay)
"zsm" (Standard Malay)

In addition, language subtags for several Malay trade and creole languages are available but not listed under the macrolanguage "ms" (Malay language):

"abs" (Ambonese Malay)
"bpq" (Banda Malay)
"ccm" (Malaccan Creole Malay)
"lrt" (Larantuka Malay)
"mbf" (Baba Malay)
"mfp" (Makassar Malay)
"mhp" (Balinese Malay)
"mkn" (Kupang Malay)
"pmy" (Papuan Malay)
"sci" (Sri Lankan Creole Malay)

In view of the fact that the "ms" language tag refers to the Malay language, which is a dialect continuum used in the Malay archipelago (including parts of Thailand, Singapore, Brunei and Indonesia) rather than the Malaysian language ("zsm" language tag) or the Indonesian language ("id" language tag), are there any plans to create a new section for the Malaysian language using the zsm language tag, as well as new sections for the other languages? See ISO 639 macrolanguage for further reading. KevinUp (talk) 10:59, 3 September 2018 (UTC)

Whenever the Ethnologue/SIL/ISO has assigned separate codes for "[Language]" and "Standard [Language]", we (always?) use only the former code and treat the latter as redundant; this is also the case with Malay. Based on prior discussions recorded at Wiktionary:Language treatment, the redundant codes zsm and zlm are subsumed under ms, as are jak, orn, ors and tmw. The status of the remaining ones is unclear. There has been discussion in the past of merging some or all of the dialects, although a vote that proposed merging even the standardized register of Indonesian with Malay/ms failed (although it had relatively limited participation and was years ago, so new discussion is fine). It seems likely that several more of the dialects listed above should be merged (one would need to evaluate their level of mutual intelligibility, of course). - -sche (discuss) 17:50, 3 September 2018 (UTC)
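The treatment described in the comment above can be pictured as a small lookup that folds the subsumed codes into "ms" and leaves every other code alone. This is only a sketch in Python: the set contents come from that comment, while the names MERGED_INTO_MS and canonical_code are hypothetical and do not correspond to any actual Wiktionary module.

```python
# Codes reported above as subsumed under "ms" (per Wiktionary:Language treatment).
MERGED_INTO_MS = {"zsm", "zlm", "jak", "orn", "ors", "tmw"}


def canonical_code(code):
    """Map an ISO 639-3 code to the code used for its section.

    Codes whose status is described above as unclear are returned unchanged.
    """
    return "ms" if code in MERGED_INTO_MS else code


if __name__ == "__main__":
    for code in ("zsm", "jak", "id", "mfa"):
        print(code, "->", canonical_code(code))
```

Under this sketch "id" and "mfa" keep their own codes, which matches the current separate treatment of Indonesian and Pattani Malay mentioned in this thread.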

That's a bit unusual. jak, orn, ors, tmw should not be merged with ms. These are Proto-Malay languages spoken by the indigenous people (Orang Asli) in peninsular Malaysia. The four languages are mutually intelligible among the Orang Asli but differ significantly from standard Malay in terms of vocabulary. In my opinion, the various subtag languages identified by the Ethnologue/SIL/ISO for the Malay language are well defined. Each language has certain vocabulary words that are not found in other regional variants of Malay. Only zsm and zlm are redundant language tags. Where mutual intelligibility is concerned, the differences between these languages are similar to those between Danish and Norwegian, or Pennsylvania German and standard German, and are larger than the differences between British, American and Australian English. KevinUp (talk) 21:25, 3 September 2018 (UTC)

On the other hand, I don't think Indonesian and standard Malay (used in Malaysia/Singapore/Brunei) are unifiable. Both languages have considerable differences in terms of spelling, grammar, pronunciation and vocabulary. I would suggest having a unified Malay section with a unified etymology section followed by Indonesian, standard Malay, and other regional variants below it, in line with the ISO 639 macrolanguage definition. Are there any other languages using this format? KevinUp (talk) 21:25, 3 September 2018 (UTC)

Malayan languages vs. Malay language

AFAIK Pattani Malay (of Thailand) at least is not written in Latin script; it regularly uses Thai and localised Arabic scripts, so the two cannot share the same entry. Pattani Malay and Malaysian Malay are also very different in reading and grammar, so their speakers could not communicate with each other. --Octahedron80 (talk) 02:20, 6 September 2018 (UTC)

Yes, well, even though Pattani Malay uses the Thai script, the IPA pronunciation of its vocabulary is the same as that of Kelantanese or Kelantan Malay (in northern Malaysia) written using the Latin script. Unfortunately, we can't merge Pattani Malay and Kelantan Malay due to script differences. Also, we do not yet have Kelantan Malay entries (but they can be created). Note that both varieties have the same article (Kelantan-Pattani Malay) on Wikipedia. Are there any other languages on Wiktionary that have this kind of situation (same IPA pronunciation, different script)? Also, we may need to discuss whether Malayan languages can be considered the same as the Malay language on Wiktionary. Both terms may have been confused because native speakers of Malayan languages (ie. non-standard varieties of Malay) are also taught the standardized version of the language (bahasa Melayu) in schools. KevinUp (talk) 07:01, 6 September 2018 (UTC)

FYI, Pattani, Kelantan and that region were in the same country until Britain cut them in half; that's why. And Malaysia has used the Latin script since then. --Octahedron80 (talk) 16:34, 6 September 2018 (UTC)

Malay native speaker here. I think Malay and Indonesian should be separated. We could separate it by having Malay entries in Jawi (localised Arabic script) and Indonesian entries written in Roman script. Like Hindi and Urdu. --Tofeiku (talk) 11:17, 6 September 2018 (UTC)

Seems like there is also the issue of Malay entries with the same content but using different scripts. This also needs to be sorted out. Compare -nya and -ڽ, bait and بيت for example. KevinUp (talk) 18:43, 6 September 2018 (UTC)

I already have a solution for (standard) Malay: making Latin primary and Jawi secondary, by using the ms-jawi template instead of copying content around, because it is easier to search by A-Z. I have been doing this for some time. For Pattani Malay, I make Jawi primary because there is no Latin spelling. By the way, Pattani's Jawi recently adopted a new orthography that differs from (standard) Malay's. --Octahedron80 (talk) 02:48, 7 September 2018 (UTC)

Any source about the new orthography that I could read online? ^^ And also, IIRC, Pattani Malay is kind of a dialect like any of the state dialects in Malaysia. The Malay community in Thailand uses Jawi script and Standard Malay since they also have their own Dewan Bahasa dan Pustaka. It's like Brunei I guess, they have a Bruneian dialect but for formality they use Standard Malay. --Tofeiku (talk) 14:54, 7 September 2018 (UTC)

@Tofeiku User Talk:Octahedron80#Etymologies of Pattani Malay Terms --Octahedron80 (talk) 02:25, 8 September 2018 (UTC)

@Octahedron80: Good solution. There are now 162 entries under Category:Malay terms with Jawi spelling. Entries using Jawi spelling can be linked back to equivalent entries using the Latin script via the {{ms-jawi}} template. On the other hand, although Pattani Malay and standard Malay are significantly different from one another, Pattani Malay (southern Thailand) and Kelantan Malay (northeastern state of Malaysia) are equivalent forms. The two regions were cut off by the British as mentioned before, resulting in Pattani Malay using the Thai script and Kelantan Malay using the Latin script (in informal writings, eg. diaries and letters). Currently, Pattani Malay entries using the Jawi script ([2]) seem to be equivalent to that of Kelantan Malay. Unfortunately, we do not have editors that are proficient in Kelantan Malay to create entries for Kelantan Malay using the Latin script. In future, Pattani Malay entries using Jawi script instead of Thai script may need to be renamed as Kelantan-Pattani Malay to avoid ambiguity. KevinUp (talk) 15:56, 7 September 2018 (UTC)

@Sgconlaw I think you might be interested in mbf (Baba Malay), also known as Peranakan Malay. The language is almost extinct among the younger generation, but plenty of resources can be found at the National Library of Singapore.

Are there examples of languages that concurrently use two different types of scripts due to separation by geographical borders? How shall we deal with this type of situation? KevinUp (talk) 18:43, 6 September 2018 (UTC)

@KevinUp: Of course, there are a few. Malay itself, just have a look at Malay lemmas. E.g. اءور and aur are two forms of the same word. Serbo-Croatian is written in Cyrillic and Roman. --Anatoli T. (обсудить/вклад) 02:58, 7 September 2018 (UTC)

Romanised Malay itself has a lot of homographs, unlike Jawi Malay. For example, bait: بيت (“house”) and باءيت (“byte”), bala: بلاء (“disaster”) and بالا (“troop”). I see the Chinese Traditional character is used over Simplified although most people use Simplified characters. Why not use Jawi Malay as well here? In Brunei and Pattani, Thailand, Jawi can be seen everywhere. Malaysia also still uses Jawi, but not as much as those two places. --Tofeiku (talk) 03:15, 7 September 2018 (UTC)

@Tofeiku We can just separate etymology sections like other languages do; no problem. --Octahedron80 (talk) 03:19, 7 September 2018 (UTC)

The terms بيت and باءيت (romanized as bait) as well as بلاء and بالا (romanized as bala) do not share the same etymology. When such situations are encountered, it is best to split the romanized Malay entry into separate etymology sections. The |gloss= parameter can then be added to {{ms-jawi}} to link the Jawi entries to their related meanings in romanized Malay. As for Chinese characters, one of the reasons why simplified Chinese characters are linked back to traditional Chinese characters (despite simplified Chinese being used more often) is the approach taken by various Chinese dictionaries published in mainland China, such as Hanyu Da Zidian, Hanyu Da Cidian and Zhonghua Zihai, which refer back to traditional characters for headwords involving simplified characters. This is done because some simplified Chinese characters such as 发 are derived from two different traditional characters: 發／发 (fā, “to issue; to develop”) and 髮／发 (fà, “hair”). KevinUp (talk) 15:56, 7 September 2018 (UTC)

On an unrelated note, I noticed that Serbo-Croatian more and море have almost identical layout and content. I'm not sure whether that was intended or not. KevinUp (talk) 15:56, 7 September 2018 (UTC)

Statistics for Malay/Indonesian entries

Hi. Would it be possible for someone to figure out the number of Malay and Indonesian entries that are currently available? In addition, I would like to know the number of entries which have both Malay and Indonesian sections in them. Thank you very much. KevinUp (talk) 14:36, 5 September 2018 (UTC)

@KevinUp: You can find entries with both a Malay and an Indonesian header by searching: insource:/==Malay==/ insource:/==Indonesian==/. Currently there are 1,033 of them. — Eru·tuon 02:11, 6 September 2018 (UTC)

@KevinUp: As for the other question, the numbers of entries (lemmas and non-lemmas) can be found in Category:Malay lemmas, Category:Indonesian lemmas, Category:Malay non-lemma forms and Category:Indonesian non-lemma forms. --Anatoli T. (обсудить/вклад) 02:16, 6 September 2018 (UTC)
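For anyone who wants to repeat that count programmatically, the same search string could be sent to the MediaWiki search API (action=query, list=search). The Python sketch below only builds the query and the request URL, without performing any network call; the helper names build_search_query and build_search_url are hypothetical, and the endpoint is assumed to be the standard en.wiktionary.org API path.

```python
from urllib.parse import urlencode

# Assumed standard API endpoint for English Wiktionary.
API = "https://en.wiktionary.org/w/api.php"


def build_search_query(headers):
    """Combine one insource: regex per language header, as in the search above."""
    return " ".join(f"insource:/=={header}==/" for header in headers)


def build_search_url(headers):
    """Build a list=search request URL that asks only for the total hit count."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": build_search_query(headers),
        "srinfo": "totalhits",
        "srlimit": 1,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_query(["Malay", "Indonesian"]))
    print(build_search_url(["Malay", "Indonesian"]))
```

The number of matching pages would then be read from the totalhits field of the JSON response. Regex insource: searches are expensive, so such a request may time out on large wikis.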

Thank you. The statistics for current entries of Malayan languages (as of Sept 2018) are as follows: KevinUp (talk) 07:01, 6 September 2018 (UTC)

standard Malay: 3978
Indonesian: 2984
standard Malay and Indonesian: 1033
standard Malay using Jawi script: 162
Minangkabau: 68
Kedah Malay: 22
Pattani Malay: 18
Brunei Malay: 2
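To put the overlap in perspective, the three headline figures can be combined by inclusion-exclusion. The short Python sketch below assumes that each figure counts pages and that the 1,033 pages with both sections are included in both the standard Malay and the Indonesian totals; neither assumption is stated in the thread, so the derived numbers are indicative only.

```python
# Figures quoted in the list above (September 2018).
malay = 3978        # pages with a (standard) Malay section
indonesian = 2984   # pages with an Indonesian section
both = 1033         # pages with both sections

# Inclusion-exclusion: pages with at least one of the two sections.
either = malay + indonesian - both

# Share of each language's pages that also carry the other language's section.
share_of_malay = both / malay
share_of_indonesian = both / indonesian

if __name__ == "__main__":
    print(f"pages with Malay or Indonesian: {either}")
    print(f"Malay pages that also have Indonesian: {share_of_malay:.1%}")
    print(f"Indonesian pages that also have Malay: {share_of_indonesian:.1%}")
```

On these assumptions, roughly a quarter of the Malay pages and about a third of the Indonesian pages are currently shared.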

Possible outcomes

A possible outcome of this entire discussion is to rename the current Malay section as "standard Malay" so as not to confuse the dialect continuum with that of the standard variety used in Brunei, Singapore and Malaysia. KevinUp (talk) 07:01, 6 September 2018 (UTC)

@KevinUp: It is an option but it's still a significant change and needs to go through a vote. --Anatoli T. (обсудить/вклад) 07:14, 6 September 2018 (UTC)

@Atitarev: Indeed. That is up to the community to decide. KevinUp (talk)

FWIW I doubt that will happen; we tend not to use "Standard X" names; "Malaysian" might work, though. A Chinese-style merger could work if there were enough editors knowledgeable of Malay interested in implementing and maintaining it, but if there is opposition to merging Malaysian and Indonesian, that'd seem to be a roadblock. FWIW script differences are not inherently an impediment to merging things under one L2; Serbo-Croatian is written in multiple scripts, even e.g. Afrikaans has some entries in Arabic script; for that matter, some varieties of Chinese are written in Arabic or Cyrillic. - -sche (discuss) 18:45, 6 September 2018 (UTC)

For naming “zsm” (Bahasa Malaysia) I think “Malaysian” works better than “standard Malay”. The family “ms” of Malayan languages can be named “Malayan” instead of “Malay”. The term “Malay”, without further qualifications, could then be reserved for the (many) cases in which Bahasa Malaysia and Bahasa Indonesia agree. --Lambiam 19:17, 6 September 2018 (UTC)

I'm sticking to calling it (Standard) Malay. This language is not exclusive to Malaysia. Brunei Darussalam and Singapore call their national language bahasa Melayu (Malay language). --Tofeiku (talk) 03:00, 7 September 2018 (UTC)

Standard Malay is the description found in the IANA language subtag registry [3] for the "zsm" language code. The "zsm" language code has remained largely unused because "ms" was the preferred code before it was redefined as a macrolanguage in ISO 639-3 to refer to the dialect continuum used in the Malay archipelago. Although "Malaysian" might work, native speakers in Brunei and Singapore as well as Malay speakers living overseas might oppose it. KevinUp (talk) 19:32, 7 September 2018 (UTC)

As for naming the "ms" family of languages as Malayan rather than Malay, I think some users may confuse Malayan with that pertaining to Malaya, the region formerly ruled by the British that includes peninsular Malaysia (also known as west Malaysia) and Singapore. I suggest we treat Malay to be the same as "bahasa Melayu". Malay can be used as an umbrella term to refer to any of the regional variants used in the Malay archipelago. Note that regional forms such as Brunei Malay and Pattani Malay are translated as "bahasa Melayu Brunei" and "bahasa Melayu Pattani" in the Malay language. We can still use Malay to refer to the many cases where both standard Malay and Indonesian are similar, but I don't think we need to reserve the term for that purpose, because Malay or bahasa Melayu is just a general term that can refer to any of its varieties. For more examples, see Talk:bahasa Melayu. KevinUp (talk) 19:32, 7 September 2018 (UTC)

As for the possibility of unifying Indonesian and standard Malay, I would like to reiterate that the differences between the two languages are akin to that of Danish and Norwegian. Due to Dutch influence and the influence of the Javanese language, the Indonesian language has a wider source of loanwords and has evolved considerably compared to standard Malay ("bahasa Melayu piawai" or "bahasa Melayu baku") which is based on the language used during the time of the Riau-Lingga Sultanate that had a flourishing literary culture. KevinUp (talk) 19:32, 7 September 2018 (UTC)

Kindly peruse the following pages for further reading:

History of the Malay language#Modern Malay (20th century)
Comparison of Standard Malay and Indonesian
Perbedaan antara bahasa Melayu Baku dan bahasa Indonesia (Indonesian Wikipedia)
Perbezaan antara Bahasa Melayu Piawai dan Bahasa Indonesia (Malay Wikipedia)
Abstand and ausbau languages

Support merger under a "Malay" heading, as per the comments in previous discussions. The differences can be handled by labels, lists of synonyms, etc. Wyang (talk) 08:15, 12 September 2018 (UTC)

Regarding Malay vs Indonesian

Discussion moved from User talk:Lingo Bingo Dingo.

I don't think the merge is going to happen soon. The vote failed twice, in 2012 and 2016. The most recent discussion can be found here: Wiktionary:Beer parlour/2018/September#Possible outcomes. Although both languages have a largely identical vocabulary, there are many intricate differences between the two languages, particularly in usage and context. On the other hand, some terms such as akuarium have the same spelling but slightly different etymologies. Note that Malay (as a dialect continuum that includes Indonesian) is to be distinguished from standard Malay (used in Brunei, Malaysia and Singapore) as the latter form is based on the standardized form used during the time of the Riau-Lingga Sultanate that does not include loanwords from Dutch or Javanese. In the case of tempé, it is unlikely for the word to have been borrowed from standard Malay (see the etymology section of English tempeh). The language code ms (Malay as a dialect continuum that includes Indonesian) should be distinguished from zsm (standard Malay), but discussion of it remains inconclusive. For now, "ms" is assumed to be the same as "zsm" and should not be confused with "id" (which has some interaction with Dutch but not so much with English). See the etymology section of akuarium for example. KevinUp (talk) 15:47, 25 September 2018 (UTC)

@KevinUp Okay, since you obviously have a better sense of the discussion, feel free to revert my edit. The last time it came up there seemed to be a lot more support for a merge, but I think that far fewer native/proficient speakers were involved. If the word comes from Javanese an Indonesian etymology is indeed vastly more likely. ←₰-→ Lingo Bingo Dingo (talk) 06:57, 26 September 2018 (UTC)

Thanks for the understanding. If possible, it would be better to mark the etymology of Dutch words that were borrowed from Indonesian Malay during colonial times such as loempia using the language code "id" rather than "ms". KevinUp (talk) 07:18, 26 September 2018 (UTC)

@KevinUp Okay, though couldn't some early borrowings be from other varieties of Malaysian? There have also been recent cases of Indonesian etymologies being changed to Malay, such as toko (where the Indonesian entry was converted to Malay). You might perhaps want to do a check of Dutch terms derived from Malay some time. ←₰-→ Lingo Bingo Dingo (talk) 07:32, 26 September 2018 (UTC)

Since the Dutch did colonize Malacca (modern day state in Malaysia) from 1641 to 1825, it is possible for some Dutch words such as pisang to be derived from the Malay language that was used in Malacca during that time rather than Indonesian Malay. I hope Dutch editors can look up words such as these in historical Dutch dictionaries to determine their etymologies. As for Dutch toko, further discussion will have to be made at the Beer Parlour regarding what constitutes Malay and what constitutes Indonesian. Technically, toko is not of Malay origin but borrowed from Min Nan spoken by Chinese people of Hoklo descent who migrated to the Dutch East Indies (modern day Indonesia), so I would prefer for the Dutch etymology to be Indonesian (Indonesian variety of Malay) rather than Malay (assumed to be the standard register spoken in Brunei, Singapore and Malaysia). @Mar vin kaiser, any thoughts regarding this? KevinUp (talk) 08:48, 26 September 2018 (UTC)

@KevinUp: In the Taiwanese aboriginal language entries I create, since a lot of them are loanwords from Taiwanese (a dialect of the Hokkien language), I still write these entries as borrowed from Hokkien, not Taiwanese (one of its dialects). The specific dialect from which it was borrowed can be written in the etymology, but as for the listed origin language, I think the actual recognized language should be listed. In words like toko#Dutch and pisang#Dutch, these Dutch words were borrowed most likely before the formation of the Indonesian state, and therefore before the creation of the Indonesian language (basically Indonesian Malay) and its official status in the new country. Therefore, technically, saying that it was borrowed from Indonesian (as a language) is sort of anachronistic as it did not exist yet, or if it did, only as a dialect of the Malay language. Technically, modern Indonesian (Bahasa Indonesia) is a formal variety of the Malay language, just like modern Malaysian (Bahasa Malaysia). I think how it is written right now in toko#Dutch is OK, showing that it was borrowed from the Indonesian Malay, while marking it as Malay. --Mar vin kaiser (talk) 08:09, 7 October 2018 (UTC)

Regarding the merger, I think what we concluded from earlier discussions is to maintain the Indonesian entries for now, but to create entries preferably under the Malay heading (for those that Indonesian and Malaysian have in common), and to label specifically Malaysian and specifically Indonesian senses within the entry, still under the Malay heading, just like what I did in toko. --Mar vin kaiser (talk) 08:12, 7 October 2018 (UTC)

Just to add, I think the failure of the past mergers is due to the lack of Wiktionary editors for Malay (or Indonesian). My personal opinion is that it's a waste of space if you make separate entries for Malay and Indonesian, when a large majority of their vocabulary is identical. And merging the entries can result in explanations of language usage differences in one entry, which is insightful for language learners, similar to the benefits of the Chinese languages merger (and in that case, a merger of different languages, while in this case, a possible merger of technically the same language). --Mar vin kaiser (talk) 08:17, 7 October 2018 (UTC)

I think that a distinction should be made between the language codes "ms", "zsm" and "id" (see the first paragraph of this discussion). Vocabulary items in standard Malay and Indonesian that have the same meaning and spelling (eg. pisang (“banana”)) were mostly inherited from Classical Malay (14th to 18th century). If possible, I think it would be better to create a new language section for Classical Malay (similar to what we have for Middle English and Middle French) to deal with lemmas that exist in both standard Malay and Indonesian (for the benefit of language learners). However, certain terms such as toko (“shop”), which can be found in both standard Malay and Indonesian, are not inherited from Classical Malay, unlike the term pisang (“banana”). I think it would be better to split toko to Indonesian and standard Malay as this term is common in Indonesian but is relatively uncommon in standard Malay due to kedai (which also exists in both Indonesian and standard Malay) being the preferred term in standard Malay. Note that both toko and kedai are borrowed terms that are not found in Classical Malay. The equivalent term in Classical Malay is actually warung. The etymology of Dutch toko is not incorrect, but the usage of the "ms" language code means that the term is derived from standard Malay, which is incorrect. One possible solution would be to use the undefined "zlm" language code (Malay (individual language)) for entries in Indonesian Malay that first came into existence during the time of Dutch colonization up until the formation of the modern Indonesian language. Another similar term is Dutch Japan, derived from the Indonesian Malay term Jepang, which is borrowed from Sinitic 日本 (Ji̍t-pńg) spoken by the Min Nan settlers in the Dutch East Indies. Note that Jepang does not exist in standard Malay. Jepun is used instead, itself borrowed from Sinitic 日本 (Ji̍t-pún). Current Wiktionary entries for both terms (Jepun and Jepang) need to be cleaned up. KevinUp (talk) 15:20, 7 October 2018 (UTC)

Currently, "ms" (Malay as dialect continuum) and "zsm" (standard Malay) are assumed to be the same. However, I don't think it is a good idea to further merge "ms"/"zsm" (assumed to be the standard register spoken in Brunei/Malaysia/Singapore) with "id" (Indonesian), because many words that exist in both languages have different usage and context in each. I would like to reiterate that words that exist in both languages with the same meaning are derived from Classical Malay, and I think it would be better to create a new language section for Classical Malay, rather than working on unifying standard Malay and Indonesian, which is not practical in the long run, as the two languages are gradually evolving apart. In the Chinese language, it is common to find multilingual speakers, eg. a Taiwanese speaker fluent in Mandarin, Min Nan and Hakka. It is even possible to read a Mandarin textbook using Cantonese. Most Chinese speakers are able to converse in an additional Chinese language besides Mandarin (usually one that is related to their hometown/ancestry), so the unification of Chinese language is doable. Besides, all Chinese languages are able to share the same written script (either traditional or simplified), with additional characters being created for terms that don't exist in Mandarin. However, with regard to Malay vs. Indonesian, it is uncommon to find someone that is proficient in both standard Malay and Indonesian. If we were to unify both Indonesian and standard Malay, certain lemmas that exist in both languages but with slightly different meanings are likely to get mixed up, eg. butterfly in standard Malay (kupu-kupu); moth in standard Malay (rama-rama); butterfly in Indonesian (kupu-kupu and rama-rama); moth in Indonesian (ngengat). Note that the term ngengat does not exist in standard Malay.
Yes, we could use labels and qualifiers to separate the different meanings in a unified Malay section, but users proficient in either Indonesian or standard Malay are usually not proficient in the other language, and mistakes are prone to occur. KevinUp (talk) 15:20, 7 October 2018 (UTC)
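The butterfly/moth example shows what label-based separation would have to keep track of in a unified section. As a small Python sketch, the senses can be stored per variety and queried by gloss; the data come from the comment above, while the structure SENSES and the helper terms_for are hypothetical illustrations and not how entries are actually stored.

```python
# Senses per variety, taken from the example in the comment above.
SENSES = {
    "kupu-kupu": {"standard Malay": "butterfly", "Indonesian": "butterfly"},
    "rama-rama": {"standard Malay": "moth", "Indonesian": "butterfly"},
    "ngengat": {"Indonesian": "moth"},  # reported as absent from standard Malay
}


def terms_for(gloss, variety):
    """Return the terms that carry the given gloss in the given variety."""
    return sorted(
        term for term, senses in SENSES.items() if senses.get(variety) == gloss
    )


if __name__ == "__main__":
    for variety in ("standard Malay", "Indonesian"):
        for gloss in ("butterfly", "moth"):
            print(variety, gloss, terms_for(gloss, variety))
```

Every sense needs its own variety label here, which is the maintenance burden the comment warns about: one missing or wrong label turns rama-rama into the wrong insect for half of the readers.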

Newer discussions not found in the original talk page are found below:

Another issue that was not discussed previously regarding the unification of Malay and Indonesian is that we would need to redirect translation entries using {{t|id|term}} to term#Malay instead of term#Indonesian and I think this will cause more confusion among language learners. KevinUp (talk) 12:24, 9 October 2018 (UTC)

The following discussion is copied from User talk:Nama.Asal

BTW, would it be feasible to create an autonomous template for Indonesian derivations, similar to {{ms-der}}? I availed myself of the Malaysian variant in this entry, which was perhaps not the wisest thing to do, seeing that the generated forms are hard-linked to Malaysian lemmas. Nama.Asal (talk) 14:59, 22 July 2018 (UTC)

Hello, maybe if you think it's worth it, unification of Malay and Indonesian on wiktionary might be worthwhile. How different do you think they are? DerekWinters (talk)

I have no objection to that. Malaysian and Indonesian are merely varieties of the same language, which fall under the blanket appellation of "Malay". Subsuming them under a unified "Malay" header would definitely help and alleviate content duplication.

However, there are numerous lexical differences between both varieties (definitely more substantial than those between Serbian and Croatian), so many entries would need to be embellished with regional labels. Both varieties also prescribe slightly different rules of affixation (Indonesian "menerjemahkan" vs Malaysian "menterjemahkan"), which would have to be identified accordingly. Who would take on the formidable task of overhauling every Malay/Indonesian entry out there? Nama.Asal (talk) 08:02, 23 July 2018 (UTC)

End of copied discussion from User talk:Nama.Asal

Once again, I need to reiterate that it is rare to encounter someone proficient in both standard Malay and Indonesian. If we were to unify both languages, labels and qualifiers would have to be used for every section, from definitions up to derived terms, to prevent confusion among language learners, eg. for synonyms of kupu-kupu (“butterfly”): rama-rama (only applies to Indonesian, no synonym in standard Malay); for derived terms of terjemah (“to translate”): menterjemahkan (standard Malay), menerjemahkan (Indonesian). In addition, some derived terms need to be labelled as being used only in Indonesian or standard Malay. Recently I had to convert kacamata (wrongly labelled as Malay) to Indonesian and kaca mata (wrongly labelled as Indonesian) to Malay. As mentioned above, users proficient in either Indonesian or standard Malay are usually not proficient in the other language, and mistakes are prone to occur. I'm not sure why not many native or proficient speakers are interested in editing in both languages despite Malay (including both Indonesian and Malaysian) being ranked as No. 7 on the list of languages by total number of speakers, but one of the reasons is likely to be instances such as this: Special:Diff/49178908/49178914. KevinUp (talk) 12:24, 9 October 2018 (UTC)

Additional comment from User talk:Lingo Bingo Dingo:

@KevinUp: Perhaps we can move the discussion over to the Beer parlour. However, just some comments on what you said. When you proposed a "Classical Malay" language section, I immediately looked up the language code, and found out that there is none, which means that it's really not feasible to do it. Also, I don't see the point, when we already have an "ms" language code, which refers to the Malay language as a whole, together with Malay's two standard forms, Malaysian (zsm) and Indonesian (id). So just to be clear, on the linguistic aspect, there is only one language, Malay, with two standard forms, Malaysian and Indonesian. So from the outset, they are one language. As for your comment that they are "gradually evolving apart", I don't think so. There is media exchange between Indonesia and Malaysia. For example, Indonesian TV stations also broadcast Malaysian TV shows and vice versa. Especially with the ongoing ASEAN integration, more of the infrastructure of ASEAN countries will be integrated, so Malaysia and Indonesia will experience more exchanges of people and media than we currently have, and obviously that will also bring more language convergence. The problem you cite, about certain lemmas existing in both languages but with slightly different meanings, that always exists in languages with two standards lol. For example, the word 土豆 means potato in Mainland China, but means peanut in Taiwan. And there are words existing in standard Taiwanese Mandarin that don't exist in standard Mainland Mandarin. Labels have worked well to differentiate. By the way, don't say Standard Malay if you mean Malaysian. Because technically both Malaysian and Indonesian are Standard Malay (two standards of Malay). Also, of course most speakers are not proficient in both standards (Malaysian and Indonesian), same thing with Mandarin, most speakers aren't proficient in both standards (both Mainland and Taiwan standards). 
However, with a unified Malay, we can better inform Malaysians and Indonesians alike, as well as foreign learners, how their standard differs from the other standard, which would facilitate better understanding in the long run. --Mar vin kaiser (talk) 13:14, 9 October 2018 (UTC)[reply]

End of additional comment from User talk:Lingo Bingo Dingo:

@Mar vin kaiser:. Even though the language code for Classical Malay does not exist, that does not mean that the language does not exist. Classical Malay, along with its predecessor Old Malay (7th to 14th century) is the missing link between the Proto-Malayic language and the Malay language. On the other hand, the broadcast of Malaysian TV shows in Indonesia does not mean that Indonesian viewers would eventually pick up standard Malay, just like the broadcast of American shows in the United Kingdom does not mean that American slang would gradually be incorporated into British English. Yes, the problem with certain lemmas that exist in two languages with slightly different meanings is not uncommon. Fortunately, we have dictionaries like Taiwan MOE 《兩岸萌典》 to point out differences between Taiwan Mandarin and Beijing Mandarin. However, in the case of standard Malay vs. Indonesian, there is a lack of modern resources to point out the differences between the two languages. I think that usage notes can be used to point out the differences between the two languages, rather than merging the two languages without realizing the consequences and cleanup needed afterwards. Standard Malay doesn't solely refer to the Malaysian language. To be more precise, it is the standard language spoken by the Malay people of Singapore, Brunei and Malaysia that is based on the standard register spoken during the time of the Riau-Lingga Sultanate that has relatively few Dutch and Javanese loanwords, unlike Indonesian that has a wider source of loanwords, particularly from Javanese. See also comment above regarding redirection of {{t|id|term}} (translations of English terms to Indonesian) to {{l|ms|term}} (Malay section) rather than {{l|id|term}} (Indonesian section), likely to cause confusion among language learners. KevinUp (talk) 14:06, 9 October 2018 (UTC)[reply]

Suggested outcome

I would like to suggest the following:

Malay (using the "ms" language code) be treated as the same as "standard Malay" (which uses the "zsm" language code), which is the current practice on Malay and Indonesian Wikipedia/Wiktionary.
In addition, standard Malay is to be defined as the standard register currently spoken in Brunei, Singapore and Malaysia which is based on the language spoken during the time of the Riau-Lingga Sultanate.
The generic language code "msa", or some other language code is to be used for Classical Malay, which is the contemporary dialect spoken in the Malay Archipelago from the 14th to 18th century, and from which modern vocabulary that exists in both standard Malay and Indonesian (with the same spelling and meaning) was inherited. Classical Malay, along with its predecessor Old Malay (7th to 14th century) is the missing link between the Proto-Malayic language and the Malay language. It is analogous to European languages such as Middle French, Middle English and Middle Dutch. The primary script used is the Jawi alphabet, but many classical texts such as the Malay Annals and Hikayat Hang Tuah have been transcribed into the Latin script during the late 19th and early 20th century using the Za'aba spelling system of 1927.
Terms borrowed from the Malay language up until the 18th century originating from Bazaar Malay and Classical Malay shall use the "msa" language code (or some other language code) and not the "ms" language code (reserved for standard Malay). Terms borrowed from 19th century onwards (mostly English words) shall use the "ms" language code which refers to words borrowed from the language spoken during the time of the Riau-Lingga Sultanate (1824-1911), and this is to be distinguished from an additional language code being proposed below.
The redundant "zlm" language code (defined by the IANA language subtag registry [4] as "Malay (individual language)"), or some other language code is to be used for Indonesian Malay lemmas (Malay language spoken in the Indonesian archipelago) that first came into existence from the time of Dutch colonization up until the formation of the modern Indonesian language. This includes terms such as toko and Jepang that were borrowed from Min Nan settlers in the Dutch East Indies.
This is in response to the query here: Wiktionary:Beer parlour/2018/May#Indonesian vs. Indonesian Malay regarding the difference between the Indonesian language and Indonesian Malay. KevinUp (talk) 12:24, 9 October 2018 (UTC)[reply]

Summary of what is being proposed and what needs to be done (for technical users):

Everything stays the same. Two new language codes will need to be introduced, one for Classical Malay, and another for Indonesian Malay (from Dutch colonization until independence of Indonesia). KevinUp (talk) 12:24, 9 October 2018 (UTC)[reply]
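For technical readers, the code assignments in the proposal above can be restated as a small lookup. This is only a sketch of the proposal as worded: "msa" and "zlm" are the placeholder codes the proposal itself qualifies with "or some other language code", the year-to-century cutoff is read off the "up until the 18th century" wording, and the names `PROPOSED_CODES` and `source_code_for_malay_borrowing` are hypothetical.

```python
# Hypothetical restatement of the proposal; not how Wiktionary's modules
# are actually organised. "msa" and "zlm" are placeholder codes.
PROPOSED_CODES = {
    "ms": "standard Malay (treated the same as zsm)",
    "id": "Indonesian",
    "msa": "Classical Malay (14th to 18th century)",
    "zlm": "Indonesian Malay (Dutch colonization until modern Indonesian)",
}


def source_code_for_malay_borrowing(century: int) -> str:
    """Return the proposed source code for a term borrowed from Malay.

    Borrowings up to and including the 18th century are assigned to the
    Classical Malay placeholder; later borrowings use "ms".
    """
    if century <= 18:
        return "msa"
    return "ms"
```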

End of proposal. Newer comments are found below:

In retrospect, Indonesian Malay (from Dutch colonization until independence of Indonesia) is likely to be Betawi Malay (code "bew"), a Malay-based creole of Jakarta, which is still used in informal spoken Indonesian. KevinUp (talk) 21:49, 18 February 2019 (UTC)[reply]

Use cases

Would adding information about use cases of a word not be useful? As a kind of separate section with references and excerpt quotations. I'm thinking about old words which have interesting histories in early writing, or new words that just aren't as old as people say they are. -Inowen (talk) 07:42, 5 September 2018 (UTC)[reply]

Isn’t that the purpose of the example sentences (aka usexes) following a definition, to show the uses of a word by applying it in appropriately illustrative contexts? (Unfortunately, many current usexes are not particularly helpful, but that is another discussion.) If you have something else in mind, could you give an example of what you’d like to see? --Lambiam 10:30, 5 September 2018 (UTC)[reply]

New language request: Milpa Alta Nahuatl

This language currently does not have an ISO code, and is not recognized as a dialect of any other variety of Nahuatl (at least according to Ethnologue). --Lvovmauro (talk) 03:40, 6 September 2018 (UTC)[reply]

Read-only mode for up to an hour on 12 September and 10 October


The Wikimedia Foundation will be testing its secondary data centre. This will make sure that Wikipedia and the other Wikimedia wikis can stay online even after a disaster. To make sure everything is working, the Wikimedia Technology department needs to do a planned test. This test will show if they can reliably switch from one data centre to the other. It requires many teams to prepare for the test and to be available to fix any unexpected problems.

They will switch all traffic to the secondary data center on Wednesday, 12 September 2018. On Wednesday, 10 October 2018, they will switch back to the primary data center.

Unfortunately, because of some limitations in MediaWiki, all editing must stop when we switch. We apologize for this disruption, and we are working to minimize it in the future.

You will be able to read, but not edit, all wikis for a short period of time.

You will not be able to edit for up to an hour on Wednesday, 12 September and Wednesday, 10 October. The test will start at 14:00 UTC (15:00 BST, 16:00 CEST, 10:00 EDT, 07:00 PDT, 23:00 JST, and in New Zealand at 02:00 NZST on Thursday 13 September and Thursday 11 October).
If you try to edit or save during these times, you will see an error message. We hope that no edits will be lost during these minutes, but we can't guarantee it. If you see the error message, then please wait until everything is back to normal. Then you should be able to save your edit. But, we recommend that you make a copy of your changes first, just in case.
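The local start times listed above can be checked with a short calculation. This is a minimal sketch that uses fixed UTC offsets for the abbreviations in the announcement (an assumption: it does no daylight-saving lookup, so it simply mirrors the abbreviations as written); the name `local_start_times` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets in hours for the abbreviations used in the announcement.
# Assumption: no daylight-saving rules are consulted.
OFFSETS_HOURS = {
    "UTC": 0, "BST": 1, "CEST": 2, "EDT": -4, "PDT": -7, "JST": 9, "NZST": 12,
}


def local_start_times(start_utc: datetime) -> dict:
    """Map each abbreviation to the local date and time of the UTC start."""
    result = {}
    for name, hours in OFFSETS_HOURS.items():
        zone = timezone(timedelta(hours=hours), name)
        result[name] = start_utc.astimezone(zone).strftime("%Y-%m-%d %H:%M")
    return result


start = datetime(2018, 9, 12, 14, 0, tzinfo=timezone.utc)
for zone_name, local_text in local_start_times(start).items():
    print(f"{zone_name}: {local_text}")
```

With these offsets, the only zone that rolls over to the next calendar day is NZST, which matches the Thursday dates given for New Zealand.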

Other effects:

Background jobs will be slower and some may be dropped. Red links might not be updated as quickly as normal. If you create an article that is already linked somewhere else, the link will stay red longer than usual. Some long-running scripts will have to be stopped.
There will be code freezes for the weeks of 10 September 2018 and 8 October 2018. Non-essential code deployments will not happen.

This project may be postponed if necessary. You can read the schedule at wikitech.wikimedia.org. Any changes will be announced in the schedule. There will be more notifications about this. Please share this information with your community. /User:Johan(WMF) (talk)

13:33, 6 September 2018 (UTC)

Arawakan tree

So, the tree for Arawakan is very underdeveloped (see {{#invoke:family tree|show|awd-pro}}), many languages have less common spellings and forms (ex. Pareci vs. Paresi), several dialects are marked as languages (ex. Ashéninka Pajonal), a few languages are still lacking codes (ex. Wiriná), etc. Since I appear to be the only one that has any interest in this family and the fact that most languages don't even have entries to them, instead of painstakingly indexing all the needed changes, I'd like to just have a whack at cleaning it up. Does anyone have any objections? @-sche, Metaknowledge? --Victar (talk) 15:42, 6 September 2018 (UTC)[reply]

I have no knowledge of those languages or the literature on them, although -sche might (so I'd wait for them to respond). I'm fine with letting you go at it, though. —Μετάknowledge^{discuss/deeds} 17:39, 6 September 2018 (UTC)[reply]

Yeah, a lot of our language families are pretty underdeveloped. (Meta has been working for some time to fix names of Bantoid languages...) Go for it; that (i.e. not bothering with discussions) is what I've done in the past for languages with no entries where I expected the changes wouldn't be controversial. If you merge any codes/languages, make a note in WT:LT (I guess link to this thread as the "discussion"), and try and keep the merged varieties' names as "otherNames" so it's clear how they're being included. If any of the changes seem questionable, I'll bring it up here, though I doubt that'll happen since you tend to know what you're talking about! :) We do have a tiny handful of users with specialist knowledge of Arawakan, like Emi-Ireland, though they haven't been active recently. - -sche (discuss) 19:06, 6 September 2018 (UTC)[reply]

@-sche, Metaknowledge, thanks, and I'll be sure to put in a merge request when I switch the lang codes to etym-only codes for the Asheninka dialects. --Victar (talk) 20:25, 6 September 2018 (UTC)[reply]

Categorize Babel categories by language family

The thread above this gave me an idea: wouldn't it be useful to categorize "user language" categories, e.g. Category:User wau which contains users who speak Wauja, into language-family categories, i.e. Category:User awd (or whatever naming format, like "User language family awd")? That way, if you need someone who speaks Wauja, you go to Category:User wau like always, but if you're in search of people with knowledge of e.g. any Arawakan languages, you don't have to try out "Category:User xxx" for every different language to see which ones exist and have users, you just go to Category:User awd and use the "▷" buttons to find such users. (As for how to bring about the categorization, it seems like someone could write a bot to add the families based on the families that are set in Module:languages, where a respectable 6518 of 8054 languages have family info already; it's not an urgent or high priority, so it's fine if no-one has time at the moment. Special cases like "User sr" can be done by hand.) - -sche (discuss) 19:20, 6 September 2018 (UTC)[reply]

Which level of language families? "Indo-European" or "Germanic", or both? DTLHS (talk) 19:31, 6 September 2018 (UTC)[reply]

I suppose it should follow the same categories as the languages do in Module:languages, for simplicity / ease of implementation and maintenance. (So, "CAT:User de" in "CAT:User gmw" in "CAT:User gem" in "CAT:User ine".) - -sche (discuss) 20:45, 6 September 2018 (UTC)[reply]

Doesn’t work anyway with Metawiki. Fay Freak (talk) 19:38, 6 September 2018 (UTC)[reply]

Huh? Just add the categories... to the category pages... - -sche (discuss) 20:45, 6 September 2018 (UTC)[reply]

I mean any plan here does not catch users who keep their user pages on Metawiki. Fay Freak (talk) 23:15, 6 September 2018 (UTC)[reply]

As long as the user's page on this wiki is categorized into CAT: User wau by any means, we can put [[Category:User awd]] at the bottom of our page [[Category:User wau]], and effect this categorization on our wiki. Right? Whether metawiki wants to copy our system or not seems like a separate matter. - -sche (discuss) 23:22, 6 September 2018 (UTC)[reply]
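A rough sketch of the bot logic discussed in this thread, for anyone who wants to pick it up. The two tables are a tiny hand-written excerpt taken from the examples in the thread (de in gmw in gem in ine, wau in awd); a real bot would read the data from Module:languages, and the function names here are hypothetical.

```python
# Hypothetical excerpt; the real data lives in Module:languages and the
# family data it refers to.
LANGUAGE_FAMILY = {"de": "gmw", "wau": "awd"}
FAMILY_PARENT = {"gmw": "gem", "gem": "ine"}


def family_chain(lang_code):
    """Return the family codes above a language, nearest family first."""
    chain = []
    family = LANGUAGE_FAMILY.get(lang_code)
    # The membership check guards against accidental cycles in the data.
    while family is not None and family not in chain:
        chain.append(family)
        family = FAMILY_PARENT.get(family)
    return chain


def category_line(lang_code):
    """Wikitext to add to Category:User <lang_code>, or '' if no family is set."""
    chain = family_chain(lang_code)
    return f"[[Category:User {chain[0]}]]" if chain else ""


print(family_chain("de"))
print(category_line("wau"))
```

Only the nearest family is written onto the language's category page; the higher levels follow from categorizing each family category into its parent in the same way.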

Pro-

Suggest adding Proto as a language itself for word and word particle definitions, particularly where lexemes are primary. -Inowen (talk) 17:55, 7 September 2018 (UTC)[reply]

Proto-Indo-European is a reconstructed language- it is nowhere attested in actual use. It doesn't belong in the dictionary itself, but we do have entries in the Reconstruction namespace. See WT:AINE for details. Chuck Entz (talk) 19:36, 7 September 2018 (UTC)[reply]

Reconstruction is sometimes the way we know about languages which in real life were spoken natural languages. It's what we've got, the best we can do, and since research into Proto is constantly being improved, it's the place where a lot of whole-istic knowledge about European-Eurasian language roots is found. Roots are the main idea - the simple words which are at the base of the tree of a lot of words. Proto is just the name for all of that research; it's not a "reconstructed language" as much as a dictionary of roots (nobody speaks Proto anymore, do they?). -Inowen (talk) 18:56, 8 September 2018 (UTC)[reply]

Vowel Lengths in Ancient Greek

For the life of me I cannot seem to uncover the sources of some of the vowel lengths so confidently produced on Wiktionary's Ancient Greek pages. On what authority do we hear that Κικέρων had a short ι (Κῐκέρων)? As far as I am aware Cicero's name does not show up in any surviving Hellenistic poetry, only prose (Plutarch, Appian; Strabo?) - and I am not convinced simple transcription from Latin to Greek conserves vowel lengths robustly enough for us to claim this quantity as 'known'. (Interestingly, the Latin page for Cicero - https://en.wiktionary.org/wiki/Cicero#Latin - seems to lack the vowel lengths; where, however, they would be very much justified (Cĭcĕro, ōnis is well-attested). Why are there so few macrons on Latin words?)

Do not get me wrong. The presence here of macra and breves in our Greek pages is in my opinion desirable, even indispensable. The fact is that Wiktionary is one of few places where a student or scholar can find good information on vowel quantities, whether for prosodic purposes or phonetic (pronunciation); and we must retain this. But the approach taken has perhaps been over-zealous, if the editors feel no vowel can ever be left unmarked. Thus my question is really a) an open one - can someone inform me what the procedure is for deciding these vowel lengths, or from what source this information is reproduced? - maybe someone has found a lexicon that can corroborate these vowel lengths? - or, failing that, b) a suggestion, or plea, that unknown quantities be removed, and macra and breves used only where we actually do have them on good testimony. — This unsigned comment was added by 2A00:23C4:3988:C101:25C0:90F9:EB63:6D8A (talk) at 18:29, 7 September 2018 (UTC).[reply]

In general transcription from Greek to Latin preserves vowel length (for example, ̓́Ῑκᾰρος → Īcărus), so I expect transcription in the opposite direction to behave the same way. --Lambiam 00:12, 8 September 2018 (UTC)[reply]

I don't know as much about policy for Latin, but for Ancient Greek, editors agreed on adding macrons and breves zealously (see Wiktionary:About Ancient Greek § Diacritics and accentuation). It's fairly common for a lexicon not to mark short vowels with breves, but we include breves because without them it's unclear whether an unmarked vowel is short or just hasn't had a macron added yet.

Occasionally breves in Wiktionary are not specifically supported by a lexicon or other evidence (perhaps most commonly with proper nouns), but are added because if LSJ or someone else doesn't indicate a vowel is long, it is probably short because lexicons are fairly diligent about marking long vowels, and because short vowels are more common. But I second what Lambiam says about Κικέρων (Kikérōn). — Eru·tuon 00:29, 8 September 2018 (UTC)[reply]

What categories labels generate

I have discovered that {{lb|en|uncommon}} generates Category:English rare forms, whereas {{lb|en|rare}} generates Category:English terms with rare senses. So you can use "uncommon" if a term has only one sense, but it is defined in the category as "English terms that serve as rarely used forms of other terms". I'm scratching my head. DonnanZ (talk) 23:28, 7 September 2018 (UTC)[reply]

We do have both Category:English terms with rare senses (>9100 members) and Category:English terms with uncommon senses (1 member, namely Category:English uncommon forms (20 members)). I don't see the point of categorizing "English uncommon forms" as "English terms with uncommon senses". "See also" links between the two would be convenient and less confusing.

It would also help if we had some criteria for applying "uncommon" and "rare" to definitions. "Rare" would seem to me to mean less common than "uncommon". It is not clear to me whether the frequency of the sense is uncommon or rare relative to the whole of the English language (but still meeting RfV) or to the total use of the word (lemma). I think one good use of these labels for English definitions is in part to discourage the use of both rare and uncommon terms in defining FL terms, so perhaps precision isn't required. DCDuring (talk) 20:13, 17 September 2018 (UTC)[reply]

Creation of Cajun French, Missouri French, and Louisiana Spanish tags

1. I have created many a definition for Louisiana and Cajun French entries, but we have nary a dedicated Cajun French tag linking to a category. There exists the tag "Cajun" that shows the word "Louisiana" and uses a blue link to direct to Cajun Peoples on Wikipedia, but it links only to the Louisiana French category.

2. I would like to begin adding Missouri French (Paw-paw French) definitions, but we have neither a category nor a tag for it.

3. I also ask that we make a Louisiana Spanish tag linked to the Louisiana Spanish category, as there are region-specific words available. Aearthrise (𓂀) 15:18, 9 September 2018 (UTC)[reply]

Also, can someone please create such tags for Travancore English, South America English, and Nicaragua Spanish? I've seen them used in entries but they don't result in categories. 32.210.179.170 22:56, 9 September 2018 (UTC)[reply]

French Dialects of America

@Per utramque cavernam We should make a CFI exception for more poorly attested French dialects, like Missouri (Paw-Paw), Louisiana (Louisiana Colonial), and Cajun (Acadian) French. Aearthrise (𓂀) 18:53, 9 September 2018 (UTC)[reply]

I second this. Minority forms of a language can't be treated with the same rigour as the main dialects, and these forms of French are pretty distinctive (and fairly localized). Andrew Sheedy (talk) 00:28, 10 September 2018 (UTC)[reply]

I agree (see the convo that prompted this suggestion). I suppose it's not limited to French, by the way. Per utramque cavernam 08:34, 12 September 2018 (UTC)[reply]

Replace `{{t-needed|xx}}` with `{{t|xx}}` (with no term)

All of our other linking templates already display a request when the term hasn't been provided, so why not do the same with translations? Module:translations would need to be modified so that it displays the way {{t-needed}} currently does, if there is no term, and of course add the category. The translation added script would also need to be modified. As an added bonus, you can differentiate between {{t}} and {{t+}} even if en.Wiktionary itself has no translation yet. —Rua (mew) 12:38, 10 September 2018 (UTC)[reply]

Support —Suzukaze-c ◇◇ 21:54, 11 September 2018 (UTC)[reply]
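If the proposal goes ahead, existing requests would need converting. A minimal sketch of that text replacement follows, assuming the simple one-parameter form `{{t-needed|xx}}` shown in the heading; any other parameters the template may accept are deliberately left untouched, and `replace_t_needed` is a hypothetical helper name, not part of any existing bot.

```python
import re

# Matches only the one-parameter form, e.g. {{t-needed|fr}}.
T_NEEDED = re.compile(r"\{\{t-needed\|([^|{}]+)\}\}")


def replace_t_needed(wikitext: str) -> str:
    """Rewrite {{t-needed|xx}} as {{t|xx}}, leaving everything else alone."""
    return T_NEEDED.sub(r"{{t|\1}}", wikitext)


print(replace_t_needed("* French: {{t-needed|fr}}"))
```

Calls with extra parameters do not match the pattern and so would have to be reviewed by hand.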

Meetings

Hi,

Next month, the German Wikicon, the French Wikiconvention, the annual meeting of Wikimedians from Central and Eastern Europe (CEE Meeting) and WikiConference North America will be four opportunities for contributors to meet. For the first, third and last, I haven't seen anything about Wiktionary, but I'll be at the French one in France, where I'll be part of a team running a workshop, "How to enjoy Wiktionary", along with some other meetings and talks about cool stuff in our project.

Well, do any English Wiktionary editors plan to go to any of those events, or have any of you taken part in an event of this kind in the past? Have you ever met another Wiktionarian IRL in the past 15 years? Noé 15:32, 11 September 2018 (UTC)[reply]

I met one Wiktionarian IRL. It was fun. --XY3999 (talk) 06:58, 17 September 2018 (UTC)[reply]
<thinks @Equinox should attend> I have met quite a number of Wiktionarians at Wikimanias, and a few other events. Mostly the meetings felt awkward. Perhaps if the events were less stressed… - Amgine/^t·e 18:27, 17 September 2018 (UTC)[reply]

I have no idea what goes on at these "Wikimanias". Why are they stressful? DTLHS (talk) 18:42, 17 September 2018 (UTC)[reply]

I find them stressful because of the quantity of different personalities attempting to meet together in order to coordinate sometimes disparate goals and efforts. But also because it is really only about Wikipedia. - Amgine/^t·e 18:50, 17 September 2018 (UTC)[reply]

Not really interested in the "official"/corporate-ish Wikimedia events. Equinox ◑ 18:53, 17 September 2018 (UTC)[reply]

Can we make a bylaw that official Wiktionary meetings/meet-ups can only take place in musty basement rooms in public libraries? - TheDaveRoss 19:32, 17 September 2018 (UTC)[reply]

The Wiktionarian meeting I was part of in a café benefited greatly from a lack of must. —Μετάknowledge^{discuss/deeds} 19:32, 25 September 2018 (UTC)[reply]

Wikiconvention francophone

Thanks for the comments about meetings. It seems you have a different perspective than French wiktionarians on this matter. Our strategy is to occupy the ground to make Wiktionary more popular to wikimedians and to the public. So, at Wikiconvention francophone, in France at the beginning of October, in ten days from now, we will do:

a conference: Wiktionnaire: qu’est-ce que c’est?
a pushy talk: Why Wikipedia should be less a dictionary
a discussion: Build a MOOC for Wiktionary
a workshop: Enjoy Wiktionary
a meeting: Wiktionary meet-up

So, if you have some possibilities to travel and can understand some French, you are very welcome (ping Jberkel)!

And if you want to do something similar for English Wiktionary, I'll be very happy to help you to draw some slides, in my free time. Yeah, to clarify, we are not supported by Wikimedia Foundation or any chapter, we are just ~~unemployed~~ very motivated to talk in public about Wiktionary Noé 17:45, 25 September 2018 (UTC)[reply]

Feedback in short: it was great. Some people discovered Wiktionary and some Wikipedians got a second chance to explore Wiktionary, a project that evolved a lot in the last couple of years. During our workshop, a dozen people learned how to add pictures, quotations, translations, synonyms and definitions. We had nice conversations with plenty of people and the food was great. A good experience. We are thinking about organizing a Wiktionary Weekend in the future, to gather colleagues, maybe with some English Wiktionarians if some people are interested. Let me know.

Noé 10:22, 10 October 2018 (UTC)[reply]

Thank you very much for informing us about these workshops. I think the French people are very cool. Thanks for organizing such events to let the public know more about Wiktionaries. If the talks and seminars are in English I think more people might be interested to attend. KevinUp (talk) 13:01, 10 October 2018 (UTC)[reply]

Module errors in categories with invalid canonical names

Could an admin delete all the categories in CAT:E with "Kamviri" and "Taino" in their names? Those canonical names are no longer valid and the pages have had module errors for several days. — Eru·tuon 20:47, 11 September 2018 (UTC)[reply]

V111P and Latin headings

Just noticed V111P's contributions moving Latin heading to L5 from L3. This is not reflected in WT:ELE or WT:ALA#Part_of_speech_headers. Is this a boldness which should be reverted or left alone? - Amgine/^t·e 15:03, 12 September 2018 (UTC)[reply]

Nope, looks like vandalism. (and dang I should know better about BP) - Amgine/^t·e 15:08, 12 September 2018 (UTC)[reply]

@Amgine: I don't see what's wrong with their edits. See WT:ALA#More complex cases. — justin(r)leung _{{ (t...) | c=› }} 15:13, 12 September 2018 (UTC)[reply]

Mmm, perhaps I should revert myself. The contribution history seems to me to be inconsistent; about half the time enforcing ELE, at others doing the opposite, whimsically in one edit correcting to L3 and also demoting another heading to L4... - Amgine/^t·e 15:28, 12 September 2018 (UTC)[reply]

In principle, part-of-speech headers should never be L5. If you find that occurring, there's probably something else that needs changing. —Rua (mew) 16:07, 12 September 2018 (UTC)[reply]

There's no point in worrying about header levels at all since every editor and language has their own quirks that aren't written down in WT:ELE or anywhere other than someone's head. DTLHS (talk) 16:53, 12 September 2018 (UTC)[reply]

FWIW, I just went through a batch of their edits to JA entries, and it looked to me like they were trying to match ELE in good faith, getting confused in a couple places by complicated entry structures (such as accidentally moving deriveds up from under the relevant POS, minor and easily understandable goofs). ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:58, 12 September 2018 (UTC)[reply]

At the moment I'm mostly fixing POS sections that are not subsections of a language, etymology or pronunciation section. Maybe I don't understand the rules for the pages with the Chinese and Japanese characters, so I may have made some mistakes there. --V111P (talk) 22:51, 12 September 2018 (UTC)[reply]
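For anyone wanting to look for the header-level problems discussed in this thread, here is a small sketch of a checker. It flags part-of-speech headers at level 5 or deeper (following the remark above that POS headers should never be L5) and at level 2 (a POS section that is not under any language section). The POS list is an illustrative subset, not the list from WT:ELE, and the function name is hypothetical.

```python
import re

# A wikitext heading: the same run of 2 to 6 "=" on both sides of the title.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")

# Illustrative subset only; the full list of POS headers is longer.
POS_HEADERS = {"Noun", "Verb", "Adjective", "Adverb", "Proper noun"}


def suspicious_pos_headers(wikitext):
    """Return (line_number, level, title) for POS headers at L2 or at L5+."""
    problems = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        match = HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        if title in POS_HEADERS and (level == 2 or level >= 5):
            problems.append((number, level, title))
    return problems


sample = "==Latin==\n===Etymology 1===\n====Noun====\n=====Verb=====\n"
print(suspicious_pos_headers(sample))
```

A hit is only a hint that something nearby may need restructuring; as noted above, complicated entries can have legitimate reasons for unusual nesting.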

Finnish pronunciation template

Today I wanted to add a pronunciation to a word, specifically huuhkaja.

I often use wiktionary as a translation dictionary and a way to compare cognates etc in different languages, but often find when doing this entries don't have pronunciations - which to me are very important for understanding new words in languages I'm not familiar with. So I'm thinking about whenever that happens just trying to find audio for those words and (using the wikipedia IPA guide for the language) add the IPA pronunciations myself.

I have a good idea what the pitfalls of doing that could be and when I should be careful, and know sometimes I'd probably want to do a "request for pronunciation" instead. (I don't know how those work. Is it just like I add the template and they're listed in the category, and that's all there is to it? If I were to take on the maybe-more-sensible task of adding pronunciations to words in the English rfp category, would I just let those words exit the category once I've added a pronunciation, or do some of them remain in there if they don't have an audio file?)

As I was looking at other entries for reference though, I discovered something I really wasn't expecting to find - that Finnish words have some kind of special templates that can automatically insert hyphenations ( {{fi-hyphenation}} ) and pronunciations ( {{fi-IPA|*}} ). (for example, on "huuhkaja" when I was going to go insert /ˈhuːkɑjɑ/ as what I thought the word sounded like and tried the template instead it suggested: /ˈhuːhkɑjɑˣ/, [ˈxʷuːxkɑ̝jɑ̝(ʔ)] )

How does this work? Does the template simply use an algorithm to convert the letters in the word to a pronunciation guess?? Is Finnish a case of a very predictably-pronounced language where it's almost always best practice to use this template because even somebody who knew it really well would rarely have to fix the template's work? Are there other languages where this is the case?

Valenoern (talk) 05:24, 15 September 2018 (UTC)[reply]

RFP: If you want IPA, add {{rfp}}, and if you want audio, add {{rfap}}. They are removed as the request is fulfilled using {{IPA}} (or whatever) or {{audio}}.

Finnish: See Template:fi-IPA and Module:fi-IPA. Similar templates can be found here. Someone familiar with Finnish needs to say how predictable the pronunciation is.

—Suzukaze-c ◇◇ 05:32, 15 September 2018 (UTC)[reply]

Please don't add pronunciations in languages you don't know. The fi-IPA template gives the correct pronunciation in this case: /ˈhuːhkɑjɑ/, [ˈxʷuːxkɑ̝jɑ̝] - no ˣ or (ʔ) in the end, where did you get that? - but Finnish spelling is not completely phonetic, for example the glottal stop is never written down. The fi-hyphenation template may produce nonsense for combined words. The templates are an aid for editors who know what the result should be. Nobody wants to go through all the IPAs and hyphenations to find mistakes added by a non-speaker. --Makaokalani (talk) 10:43, 15 September 2018 (UTC)[reply]

I suspect the ˣ came from the editor trying to use {{fi-IPA|*}} (note the * parameter), which adds the glottal stop at the end. It's intended for words that actually have them; most words would get by with just {{fi-IPA}}. But as said, it's not automatically correct, and editors not familiar with Finnish should probably not try to add pronunciation templates. For compound words, one needs to supply a parameter with hyphens at word boundaries, such as {{fi-IPA|valta-tie}} for valtatie. For fi-hyphenation, it also needs hyphens at word boundaries, but only under certain conditions. SURJECTION _{·talk·contr·log·} 18:06, 19 September 2018 (UTC)[reply]

No, there is definitely something wrong here: no initial [xʷ-] should ever appear in Finnish. Looks like someone has been overapplying the (itself only optional) rule about coda /h/ being fricated. --Tropylium (talk) 18:30, 3 October 2018 (UTC)[reply]

Genitive case in the Scandinavian languages

According to w:Danish grammar#Grammatical case, the "genitive" of Danish is really a clitic like in English. w:Swedish grammar#Genitive says the "genitive" in that language behaves the same, and attaches to the last word in the phrase and not the head noun as would be expected of a case. w:Norwegian language#Genitive of nouns again says pretty much the same thing. Given all that, and the fact that we decided to exclude possessives of English nouns on the same grounds, I think we should do the same for these three languages. —Rua (mew) 19:00, 17 September 2018 (UTC)[reply]

It's come up before; e.g. here. In this message to me, PseudoSkull seemed to want to expand our coverage of inflected forms even further, although I think that fizzled out. At the time, I did not see any compelling reason to prevent someone willing to do the work from making the entries, although I do feel strongly that linking to all those forms on the lemma page would be cluttering.__Gamren (talk) 10:14, 13 October 2018 (UTC)[reply]

On'yomi and katakana

According to w:Katakana#Usage, on'yomi readings of kanji are typically written in katakana, not hiragana, in kanji dictionaries, because they were historically borrowed from Chinese and katakana is used for foreign borrowings. I think we should be doing the same thing. For example, the kanji 音 currently lists おん (on) as a goon'yomi reading, いん (in) for kan'on'yomi, and おと (oto) and ね (ne) for kun'yomi. Under this change, the first two readings would become オン (on) and イン (in).

RoseOfVarda (talk) 18:02, 19 September 2018 (UTC)[reply]

It does seem to be a common practice in reference works. All of our entries currently use hiragana on the wikitext level. Perhaps they could be converted to katakana within Module:ja-kanji-readings. It would be quick, and easily reversible if we don't like it.

(Notifying Eirikr, TAKASUGI Shinji, Dine2016, Poketalker, Fumiko Take, Dingo1234555, 飯江誰出茂, 신묘마루쨩, Kamilz, Krun, Asdfsfs, Lo Ximiendo): —Suzukaze-c ◇◇ 04:31, 21 September 2018 (UTC)[reply]

I'm OK with the katakana approach, but it's best if they still link to the hiragana entries (e.g. おん) or a kanji index (e.g. Index:Japanese kanji by reading/ア#オ). --Dine2016 (talk) 07:02, 21 September 2018 (UTC)[reply]

The nice thing about hiragana is that that is what is used to spell the words in actual (modern) texts when kanji are not used, and also as furigana to aid reading. Such agreement with actual usage seems to me to be very much in line with the general norms of Wiktionary, i.e. attestation-based. Of course, especially historically, but also today, there are examples of on’yomi words being spelled in katakana, whether it be e.g. for names of species, to indicate roughness or manliness, for archaic flair, or for some other reason, but none of these represents the typical way of spelling out on-yomi. I think the primary reason some dictionaries use katakana for on and hiragana for kun is brevity, i.e. whether it is katakana or hiragana will immediately signal the type of reading without requiring a label that takes up valuable space. We don’t really have much trouble with space and do in fact indicate explicitly what kind of reading it is, even subtypes of on-readings, etc. Some kanji also have special readings that are usually spelled in katakana e.g. ページ for 頁 (this isn’t even an on-reading). I think the current system we have is pretty good. – Krun (talk) 15:06, 21 September 2018 (UTC)[reply]

I agree with Krun on this. I also note that Wikipedia hews more to descriptions in cited publications rather than actual observable use; the stated focus at Wiktionary is on the latter. Another point is that katakana is used in scientific texts for things like species names and related nomenclature; discussions of Homo sapiens will generally use the term hito (“person; human”) spelled in katakana as ヒト, even though that is kun'yomi (the native Japanese reading) and not on'yomi (the Chinese-derived reading). And as a point of reference, while the JA Wiktionary does list katakana for on'yomi on single-kanji entries, as at ja:複, JA Wikt lists only the hiragana for the on'yomi of multi-kanji terms, as at ja:複雑. This inconsistency has always bothered me, and I am happy that we haven't copied that here.

Moreover, implementing this change would require much more than just tweaking Module:ja-kanji-readings: we already have tons of hiragana soft-redirect entries for on'yomi terms, but extremely few entries using the katakana spellings. For instance, きょうわ vs. キョウワ (kyōwa), ふくざつ vs. フクザツ (fukuzatsu), おうえん vs. オウエン (ōen), etc. etc. etc. I've confirmed that using the Search tool in the upper right for these katakana spellings will present the hiragana entries towards the top of the list, as well as any existing kanji entries that include the kana spellings. However, simply entering the katakana spelling in the URL bar just presents the user with the “Wiktionary does not yet have an entry for [FOO]” message. Granted, creating these katakana entries would be bot-able, but I don't have the required bot skills, nor the bandwidth to acquire them.

I am thus opposed to this proposed change, as it is at best a minor convenience for a specific subset of inexperienced users (those accustomed to seeing katakana for on'yomi in kanji reading lists who would then not have to get used to seeing hiragana instead in the {{ja-readings}} output), while it entails a huge amount of work that doesn't really add any appreciable usability. ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:54, 21 September 2018 (UTC)[reply]

@Eirikr: I think you have misunderstood the proposal. I think RoseOfVarda is proposing that on’yomi readings in {{ja-readings}} be displayed in katakana, while Sino-Japanese terms – affixes like 複(ふく) (fuku) and nouns like 複雑(ふくざつ) (fukuzatsu) – still use hiragana in the headword templates and as soft-redirects. It is important to distinguish between kanji readings (especially kan’on and goon), which are prescriptive pronunciations of kanji deduced from the Chinese rhyme tables, and Sino-Japanese affixes and words, which are attested shapes functioning as affixes or words. For example, 面 has メン (men) and ベン (ben) as kanji readings, but in the Sino-Japanese vocabulary, only(?) メン (men) has made it into an affix and a noun, as めん (men), making it clear that on’yomi readings and actual Sino-Japanese word shapes are two different layers of the pronunciation of kanji. I think only the latter needs to use hiragana. --Dine2016 (talk) 00:47, 22 September 2018 (UTC)[reply]

Using katakana is a universally common practice among Japanese dictionaries, though it isn’t necessarily easier to understand for learners. We should link to hiragana pages even when we use katakana for on-readings. I tried to modify Module:ja-kanji-readings but I couldn’t find the code that displays on-readings. — TAKASUGI Shinji (talk) 01:58, 22 September 2018 (UTC)[reply]

@TAKASUGI Shinji: I made the proposed change in Module:ja-kanji-readings/sandbox. It displays the on readings in katakana but still links to hiragana. It can be tested with {{ja-readings/sandbox}}. — Eru·tuon 03:34, 22 September 2018 (UTC)[reply]

Just want to mention that kanji dictionaries published during the Meiji era such as 《日本大玉編》 and 《袖珍康煕字典》 would list character readings in katakana whereas iroha dictionaries of that time (such as [5], [6] and [7]) would list lemmas in hiragana.

Since it is modern practice to list on'yomi using katakana and kun'yomi using hiragana (as found in the Jōyō Kanji table), it would be good to display them as such for conformity with modern dictionaries. However, I agree that the on'yomi readings should link to the corresponding hiragana entries despite being written in katakana, as the non-kanji forms of these readings are usually written in hiragana.

I hope the proposed change at Module:ja-kanji-readings/sandbox will automatically convert existing hiragana readings to katakana (appearance-wise) while maintaining their links to the hiragana entries. I hope the same can also be done for on'yomi readings using {{ja-kanjitab}} after this has been successfully implemented. KevinUp (talk) 17:09, 23 September 2018 (UTC)[reply]

See Module:ja-kanji-readings/sandbox/documentation for examples.

I think editors may end up inputting on readings in katakana, when they see them displayed that way. One response: accept on readings in katakana as well as hiragana, while in both cases displaying katakana and linking to hiragana. (This is what Module:ja-kanji-readings/sandbox does as of this edit.) Another possibility: throwing an error for katakana input. — Eru·tuon 03:36, 24 September 2018 (UTC)[reply]
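
The behaviour described here (accept either script, display katakana, link to hiragana) can be sketched roughly as follows. This is only an illustration in Python, not the code of Module:ja-kanji-readings/sandbox, which is a Lua module; the function names and the simplified `[[target|display]]` link format are assumptions made for the example. It relies on the fact that the Unicode katakana block sits 0x60 code points above the hiragana block.

```python
# Illustrative sketch only; not the actual module code.
# Hiragana ぁ..ゖ is U+3041..U+3096; the matching katakana are 0x60 higher.
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KANA_OFFSET = 0x60


def to_katakana(text: str) -> str:
    """Convert hiragana characters to katakana; leave everything else alone."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET)
        if HIRAGANA_START <= ord(ch) <= HIRAGANA_END
        else ch
        for ch in text
    )


def to_hiragana(text: str) -> str:
    """Convert katakana characters to hiragana; leave everything else alone."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if HIRAGANA_START + KANA_OFFSET <= ord(ch) <= HIRAGANA_END + KANA_OFFSET
        else ch
        for ch in text
    )


def format_on_reading(reading: str) -> str:
    """Hypothetical helper: link to the hiragana entry, display katakana."""
    return f"[[{to_hiragana(reading)}|{to_katakana(reading)}]]"


print(format_on_reading("おん"))  # hiragana input
print(format_on_reading("イン"))  # katakana input is accepted too
```

Both inputs yield a katakana display with a hiragana link target, so editors who type katakana after seeing it displayed would not break anything. The alternative mentioned above, throwing an error for katakana input, would replace the normalisation step with a check.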

Thank you very much. With your recent edit, hiragana or katakana input for on'yomi readings would all be displayed as katakana while being linked to hiragana entries. I think an error can also be added in case katakana is accidentally entered for kun'yomi readings. KevinUp (talk) 05:18, 24 September 2018 (UTC)[reply]

It is normal in some cases for a kun'yomi reading to be in katakana, such as loanwords like ページ for 頁. --Dine2016 (talk) 06:10, 24 September 2018 (UTC)[reply]

If possible, can we annotate these kun'yomi readings as ateji? Besides 頁(ページ) (pēji), are there any more examples of single character kanji (not compound words) with kun'yomi readings that are written in katakana instead of hiragana? KevinUp (talk) 06:44, 24 September 2018 (UTC)[reply]

@Eirikr, @Erutuon, @Dine2016 I think Module:ja-kanji-readings/sandbox is good enough to replace the current module. Any thoughts on making the changes go live? KevinUp (talk)

I'll go on record to say that I still disagree with this direction: I believe this change entails a number of concerning usability issues. Displaying katakana and linking through to hiragana is non-obvious and odd behavior; this display of katakana for on'yomi may falsely suggest to users that they should try searching for on'yomi in katakana, or just enter katakana on'yomi into the URL bar, which will cause confusion when they then arrive at the wrong entry, or no entry at all; we already clearly distinguish on'yomi from kun'yomi in readings tables by explicitly labeling each line, so this change is not necessary for clarity.

A related thread recently came to my attention over at the Japanese Stack Exchange, wherein a beginner asked about this convention, clearly confused about why kanji-focused dictionaries use katakana for on'yomi, when no one uses katakana for these spellings "in real life". I dimly recall experiencing a similar confusion when I started my own studies.

Considering the potential for confusion, the lack of any need to implement this change, and the fact that katakana are not actually used for on'yomi "in real life", I remain opposed to this change. ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:33, 8 October 2018 (UTC)[reply]

As a compromise, I've made a script (User:Erutuon/scripts/katakanaOnyomi.js) to change onyomi from hiragana to katakana. It just required a simple change to Module:ja-kanji-readings to allow JavaScript to find the readings. — Eru·tuon 00:48, 16 October 2018 (UTC)[reply]

The GFDL license on Commons

This has been posted here because your wiki allows local file uploads. Please help translate to your language.

Commons will no longer allow uploads of photos, paintings, drawings, audio and video that use the GFDL license and no other license. This starts after 14 October. Textbooks, manuals and logos, diagrams and screenshots from GFDL software manuals that only use the GFDL license are still allowed. Files licensed with both GFDL and an accepted license like Creative Commons BY-SA are still allowed.

There is no time limit to move files from other projects to Commons. The licensing date is all that counts. It doesn't matter when the file was uploaded or created. Every wiki that allows local uploads should check if bots, scripts and templates that are used to move files to Commons need to be updated. Also update your local policy documentation if needed.

The decision to allow files that only have a GFDL license, or not allow them, is a decision all wikis can make for themselves. Your wiki can decide to continue allowing the files that Commons will no longer allow after 14 October. If your wiki decides to continue to allow files after 14 October that Commons will no longer allow, those files should not be moved to Commons. — Alexis Jazz, distributed by Johan using MassMessage

18:11, 20 September 2018 (UTC)

Wiktionary:About Classical Nahuatl

I went ahead and created Wiktionary:About Classical Nahuatl as a draft, since I don't see any rules about who can make these things.

Issues that remain to be sorted out are:

Should nouns be marked for animacy (as they currently are in Template:nci-noun)? Animacy in Classical Nahuatl is not a property of words like grammatical gender, it's more like English's natural gender. The same word can be animate or inanimate depending on what it's referring to. That said, certain things that you would think are inanimate are conventionally treated as animate (somewhat analogous to the way that, although English gender is "natural", people still refer to boats and countries as she), so that would need to be indicated somewhere. Perhaps it's best as a usage note?
- I've also noticed that some entries for inanimate nouns currently have plurals listed which are the same as the singular. Thing is, inanimate nouns don't have plurals at all. They take singular verb agreement even when referring to multiple objects. Saying "the plural of cuahuitl is cuahuitl" implies that you could say things like **niquimitta in cuahuitl "I see the trees", but that's ungrammatical because the verb has a plural object but cuahuitl is grammatically singular. Classical Nahuatl "cuahuitl" (which has no plural) is not like English "sheep" or "fish" (whose plurals are the same as the singular).
The possessed form of nouns needs to be indicated, since it isn't predictable. But which form should be used in headword-line? "my", "his/her/its", "someone's", "ours"? "Ours" is actually what Molina normally uses in his entries for possessed nouns, though dictionaries of modern varieties of Nahuatl use "his/her/its".
Some nouns distinguish between "organic" possession and regular possession, so that should be included too. But what to label it? There isn't really a standard name. It's sometimes called "inalienable possession", but that term can also refer to obligatory possession (which is the next point).
A related issue is obligatorily possessed nouns. Some words are only ever attested in the possessed form. Others are attested in the absolutive in dictionaries and grammars but are always or almost always possessed in actual texts. If the absolutive is attested but rare, should it still be used as the lemma? And if we go with a possessed form as the lemma for some words, which possessed form? (Same issue as above.)
And again, the same issue: adpositions take the same prefixes as possessed nouns, so we'd need to decide on a lemma form for them. Some adpositions already have entries that treat them as suffixes and ignore their adpositional usage, e.g. -pan.
Template:nci-noun currently includes an option to mark a noun as "locative". Problem is, locative nouns are grammatically indistinguishable from adpositions and are often derived from them. E.g. nopan (“on me”) (adposition), teopan (“temple”) (locative noun, derived from an adposition), nīxpan (“before my eyes”) (adposition? possessed locative noun?). I'm not sure if adpositions and locative nouns should be distinguished, and if not, what they should be labelled; they're sometimes called "relational nouns".
What forms should be included in the headword-line for verbs? Preterite, causative, applicative, passive, maybe even patient nouns? These are unpredictable but in many cases they also have more than one form.
Should Classical Nahuatl have tables of inflections? If so, what should they include? They can't include everything since Nahuatl is polysynthetic.

I'd appreciate any input from any other Nahuatlists we have. --Lvovmauro (talk) 04:22, 21 September 2018 (UTC)[reply]

Hi. Maybe Marrovi may have a look on it. Pamputt (talk) 12:41, 22 September 2018 (UTC)[reply]

@Lvovmauro: @Marrovi, 1. Wiktionary can benefit from adding the animacy tag to words, and 2. we should only add plurals to animate nouns; we could also add a link for the tag explaining that animate words have plurals, as opposed to inanimate ones. 3. I like the his/her forms as in Modern Aztec dialects. I don't see any harm in using the same style for Classical Aztec. 5. We should add absolutive states even if they are rare for normally possessed forms. 8. The preterite should definitely be included in the headword. 9. Classical Aztec should have inflection tables; we Aztequists should collaborate on choosing what to include. Aearthrise (𓂀) 17:36, 16 October 2018 (UTC)[reply]

Japanese anchors

Kanji spellings of this term
帰る〔歸る〕還る返る反る

I propose that we create a template for listing the kanji spellings of a Japanese word. Similar to {{zh-forms}}, it should be placed immediately under the ===Etymology x=== headers and appear on the right side of the displayed page, unless the section is a soft-redirect with {{ja-see}} in which case it is not needed. This has two advantages:

Kanji spellings of a word can be displayed more prominently and with more details (such as kyūjitai, irregular okurigana, and labels like “uncommon” and “obsolete”) than the inline {{ja-def}}.
The template can generate anchor links like み#ja-身 and 上下#ja-じょうげ so that templates like {{ja-l}} and {{ja-see}} can link right to the intended word in a list of homographs. (Entries like み and 上下 can include more than ten etymology sections.)
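
As a rough illustration of the second point, the anchor scheme in the examples above (み#ja-身, 上下#ja-じょうげ) amounts to prefixing the other spelling or reading with "ja-". The Python sketch below only mirrors that naming pattern; the function names are made up for the example and do not correspond to any existing template or module code.

```python
# Illustrative sketch only; function names are hypothetical.

def ja_anchor(spelling: str) -> str:
    """Anchor id for one word within a page of homographs, e.g. 'ja-身'."""
    return f"ja-{spelling}"


def ja_link_target(page: str, spelling: str) -> str:
    """Link target pointing at the intended etymology section on `page`."""
    return f"{page}#{ja_anchor(spelling)}"


# A kana page links by kanji spelling; a kanji page links by reading.
print(ja_link_target("み", "身"))
print(ja_link_target("上下", "じょうげ"))
```

A template such as {{ja-l}} or {{ja-see}} could then build its link from the page title plus the spelling it already receives as a parameter, without the editor having to know which numbered etymology section is meant.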

Please also see User:Dine2016#Random stuff for the problem of function overlap with {{ja-kanjitab}}.

(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Fumiko Take, Nardog, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4): --Dine2016 (talk) 05:53, 21 September 2018 (UTC)[reply]

Support. Wyang (talk) 22:32, 21 September 2018 (UTC)[reply]

Support, with some questions.

Would this be intended for inclusion on any entry, or only certain types of entry (such as kana spellings)?
Would editors be able to indicate common/uncommon/rare/ancient/etc. spellings?
What barriers are there to adding details inline? Can we not do things like the following? {{ja-def|[KANJI]}} {{lb|ja|sort=[SORT]|rare|possibly|_|obsolete}} [SENSE]

‑‑ Eiríkr Útlendi │^{Tala við mig} 23:46, 21 September 2018 (UTC)[reply]

Support. User:Eirikr: yes to one of your questions, notes are added in {{zh-forms}}, which can be further tweaked for Japanese. --Anatoli T. ^{(обсудить}/^вклад) 00:43, 22 September 2018 (UTC)[reply]

@Eirikr: Thanks. (1) It is intended to be placed on the lemma entries. I'll work out something about this later. (3) First {{ja-def|無い|亡い}} {{lb|ja|rare|possibly|_|obsolete}} [[not]]; there is [[no]] looks like the labels are describing the sense, not the kanji spellings. So a better format should be {{ja-def|無い|亡い|lb1=uncommon|lb2=obsolete}} [[not]]; there is [[no]]. Second, inline information must be laid one-dimensionally, which makes definition lines extremely crammed in cases like

おいはらう:
- 追い払う
  - 追払う, 追い払らう, 追払らう (irregular okurigana usage)
  - 追ひ払ふ, 追払ふ, 追払らふ (historical kana)
  - 追い拂う, 追拂う (kyūjitai)
  - 追ひ拂ふ, 追拂ふ, 追拂らふ (kyūjitai with historical kana)
- 追いはらう
  - 追ひはらふ (historical kana)

In addition, if there are multiple definitions, the inline information must usually be repeated on every definition line, sometimes with variations, which makes maintenance harder. Third, it's not ideal for anchoring. You would jump directly to the definition lines if you go from 無い to ない. It's better if you could jump under the etymology header, in which case etymology and pronunciation information will be on your screen. --Dine2016 (talk) 02:03, 22 September 2018 (UTC)[reply]

Huh, I found myself not good at web design, so I probably have nothing to come up with. In short, if the term is lemmatized at the kana, the kanji spelling list should appear on the kana entry. If the term is lemmatized at a kanji spelling, likewise, in which case the kanji spelling in question will be displayed in a bold font rather than as a link.

I'm thinking about “absorbing” the kanjitabs into the template, like this. --Dine2016 (talk) 14:55, 22 September 2018 (UTC)[reply]

Support. —Suzukaze-c ◇◇ 03:15, 23 September 2018 (UTC)[reply]

Support. ― AstroVulpes (talk) 07:36, 26 September 2018 (UTC)[reply]

centralizing kanjitab information

@Wyang, Eirikr, Atitarev, Suzukaze-c and others: I have created {{ja-ks}} to also allow centralizing kanjitab information on the lemma entry. For example, we could have {{ja-ks|勾玉|曲玉|y=まが,たま,k}} (short for {{ja-ks|勾玉|曲玉|y1=まが,たま,k|y2=まが,たま,k}}) on the lemma entry (まがたま), then {{ja-see}} could copy the kanjitab information to the non-lemma entries (勾玉, 曲玉) and generate the kanjitabs there. What do you think about such an approach? --Dine2016 (talk) 04:45, 24 September 2018 (UTC)[reply]

Looks good. I might prefer to have Kanji and their readings in a single parameter, maybe something like 勾玉:まが,たま, 勾玉:{まが,たま}. This would make {{ja-see}} easier to code. Also it would be good to remove the inner "Kanji in this term", which duplicates the outer table heading. Wyang (talk) 09:09, 24 September 2018 (UTC)[reply]

@Wyang: Thanks. I agree it's better to group the spelling and reading information in the same parameter (e.g. 勾玉:まが,たま), though it was disputed some time ago.
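
To make the single-parameter idea concrete, here is a small Python sketch of how a value like 勾玉:まが,たま might be split into a spelling and its per-kanji readings. It is an assumption-laden illustration, not the implementation of {{ja-ks}} or {{ja-see}}: the function name is hypothetical, and details such as the reading-type flag (the trailing k in y=まが,たま,k) are deliberately left out.

```python
# Illustrative sketch only; the real templates are implemented in Lua
# and may use a different syntax.

def parse_spelling_param(param: str) -> tuple[str, list[str]]:
    """Split 'SPELLING:READING1,READING2' into (spelling, [readings]).

    A parameter without ':' is treated as a spelling with no readings given.
    """
    spelling, sep, readings = param.partition(":")
    return spelling, (readings.split(",") if sep else [])


print(parse_spelling_param("勾玉:まが,たま"))
print(parse_spelling_param("曲玉"))
```

Keeping the spelling and its readings in one parameter means a consumer such as {{ja-see}} only has to look up one value per spelling, rather than matching numbered y1=, y2= parameters to positional ones.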

I'm not sure about the best design for kanjitab. For example, should it be centralized on the lemma spelling? Should it generate the kyūjitai form if possible (to avoid editor inconsistencies like 告/吿)? How should okurigana be handled (historical dictionaries like the Nihon Kokugo Daijiten omit all okurigana and map kanji to full morphemes, but there are inconsistencies like あじ-わう【味―】 and あじわい-しる【味知】)? --Dine2016 (talk) 10:25, 24 September 2018 (UTC)[reply]

If possible, can we add kyūjitai, historical and nonstandard lemma forms only if these are attestable? I hope we can create a page for Japanese references similar to what we have for Chinese, Korean and Vietnamese to look these up. KevinUp (talk) 06:29, 25 September 2018 (UTC)[reply]

Concern: I am worried about the name {{ja-ks}} -- the template name is too short, and it's non-obvious unless you already know what it is. The ja- prefix is fine. But how are users supposed to know what ks stands for?

Proposal: Rename or redirect to something like {{ja-kanji spellings}}. ‑‑ Eiríkr Útlendi │^{Tala við mig} 17:42, 24 September 2018 (UTC)[reply]

Done. {{ja-ks}} will now redirect to {{ja-kanji spellings}}. On an unrelated note, I found a blank template: Template:ja-kanjireading (See discussion here). KevinUp (talk) 06:29, 25 September 2018 (UTC)[reply]

Thank you for the redirect, that is much appreciated.

Re: {{ja-kanji reading}} (target of the redirect at {{ja-kanjireading}}), that was deliberately orphaned and deleted following a change in approach to how we handled readings. Did you have a new use in mind for that template name? ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:15, 25 September 2018 (UTC)[reply]

Just wondering why there is a blank template there, not really planning on using the template name. I think {{ja-kanjireading}} can be removed as it leads to nowhere. KevinUp (talk) 02:57, 26 September 2018 (UTC)[reply]

@KevinUp: Ah, thank you, I misunderstood you earlier. I've deleted the leftover redirect. Thank you for pointing it out. ‑‑ Eiríkr Útlendi │^{Tala við mig} 08:16, 30 September 2018 (UTC)[reply]

Should it include only forms with different kanji? (掛ける/懸ける.) Or should it also include spellings with a different combination of kanji with hiragana and katakana? (折り紙/折紙 and 鮴押し屋/ゴリ押し屋.) —britannic124 (talk) 20:29, 27 September 2018 (UTC)[reply]

	おる > おり kun’yomi	かみ > がみ kun’yomi
shin. and kyū. (折り紙/折紙)	折 Grade: 4	紙 Grade: 2

@Britannic124: Hi. My original plan is to have {{ja-kanjitab}} handle variations of the same kanji spelling like this. 鮴押し屋/ゴリ押し屋 should belong to different kanji strings (e.g. handled by {{ja-kanji spellings}}) but can also be handled by {{ja-kanjitab}} if it's more convenient. --Dine2016 (talk) 04:01, 29 September 2018 (UTC)[reply]

Terms consisting partly of spelled-out letter names

Following up on Wiktionary:Beer parlour/2017/April#Words_formed_by_respelling_letters,_e.g._deejay, I created Category:English spelled-out initialisms, but now I want to know what we want to do with half/edge cases, like eff off, tee shirt/tee-shirt/teeshirt, tee ball/tee-ball, union tee, and (in other languages) esesman and defeño. Put them in the category? Put them into a separate category? Leave them uncategorized? - -sche (discuss) 21:54, 21 September 2018 (UTC)[reply]

Latin pronunciation for non-Classical terms

Is there any automated means for representing the IPA pronunciation of Latin words that did not exist in the Classical period? For example, the word cranium first appears in Medieval Latin, but our Latin entry currently has the Classical pronunciation, which is a gross anachronism.

I can find nothing in the documentation for using {{la-IPA}} to accomplish this task. It will allow for the addition of Ecclesiastic Latin pronunciation, but not the exclusion of the Classical. --EncycloPetey (talk) 04:02, 22 September 2018 (UTC)[reply]

I can't find the thread, but someone recently proposed changing the label in such cases, or even in all cases, from "Classical" to "Classicist", rather than removing the pronunciation altogether; I think that's a better solution, since pronunciation is apparently regular enough that we can know what it would have been, and people reading Latin texts today would find it useful. Anyway, this is a recurring issue and I agree something should be done. (Incidentally, it seems a bit odd that we give an "Ecclesiastical Latin" pronunciation, when AFAIK there are regional differences.) - -sche (discuss) 05:49, 22 September 2018 (UTC)[reply]

But the pronunciation would be thoroughly anachronistic. It would be saying: "Here's what the pronunciation would have been if the word had been invented a millennium earlier." Why give a pronunciation for a word from a period 700+ years before the word even existed? Putting a Classical pronunciation on a Medieval Latin word is like adding Middle English pronunciations to words that only exist in Modern English. Wiktionary's underlying principle has always been to be descriptive, and inventing anachronistic pronunciations for words runs counter to that principle. --EncycloPetey (talk) 19:01, 22 September 2018 (UTC)[reply]

Is it not also saying how it is pronounced currently by people who try to speak Latin with a Classical pronunciation?--Prosfilaes (talk) 09:17, 23 September 2018 (UTC)[reply]

Yea, that surely happens. Naturally if I find a Medieval word I will speak it with the now known Classical pronunciation. What do you think about the people on the Latin Wikipedia? The same. Besides the humanists already classicized the Medieval words. Fay Freak (talk) 12:30, 23 September 2018 (UTC)[reply]

Then we should use a label such as "Modern" to indicate that it is a modern pronunciation. But the problem with doing so is that Modern "Classical" pronunciations vary by country, often quite strongly. So, yes, modern students of Latin use a "classicized" pronunciation, but there is no consistency in the way it is applied in different countries, or even by specialist subgroups within a single country.

So we return to my original objection of using a "Classical" label on a word that did not exist at the time. --EncycloPetey (talk) 15:58, 23 September 2018 (UTC)[reply]

Adding 'sarcastic' to Appendix:Glossary for lb template.

Harking back to the topic of idiomatic sarcasm, I'd like to add to Appendix:Glossary an entry for 'sarcastic', whose definition reads something along the lines of: 'This word is commonly used sarcastically to express something contrary to its meaning, while using other words with the same basic meaning in this manner may not sound natural'.
An example I'm thinking of is the German adverb 'schön' (nicely), which is used idiomatically to express undesired results such as 'schön die Milch verkippt' ("nicely spilled the milk"), while similar words such as 'gut' (well), 'wunderbar' (wonderfully), 'großartig' (greatly) etc. will not be readily understood in this construction or at the very least will not sound natural. Korn [kʰũːɘ̃n] (talk) 10:40, 22 September 2018 (UTC)[reply]

Don’t you mean irony – specifically verbal irony? Sarcasm is a much wider notion; see e.g. Sarkasmus & Ironie: Der gar nicht mal so feine Unterschied.
Whether a nominally positive word “works” when irony is intended depends on the context and tone of voice. Instead of “schön gemacht!” you can say so many things, like “prima gemacht!” or “toll gemacht!” If your boss says, “Ausgezeichnet! Perfekt! Genau was wir brauchten!”, then – unless you are expecting a compliment for a job well done – you just know she is mocking you in exasperation. I think she might as well have said “Wundervoll! Genau was wir brauchten!”; I question whether this would be any less natural or understood than “schön”, “super”, “genial”, or “brilliant, wirklich brilliant”. I don’t think we should label all these praiseful adjectives as being suitable for ironic use. --Lambiam 18:45, 22 September 2018 (UTC)[reply]

I've made a specific point about using the adverb 'schön' to express undesired results, which is never done with toll, prima and so forth. Please stay on topic, your paragraph does not relate to what I'm talking about. Korn [kʰũːɘ̃n] (talk) 10:54, 23 September 2018 (UTC)[reply]

You can of course add another sense # (ironic) poorly, blunderingly, incompetently to schön#Adverb, but I see little need for a glossary addition. --Lambiam 09:37, 24 September 2018 (UTC)[reply]

The need is in informing the user that this word, but not others of the same meaning, can be used in this sense, which applies to multiple words and would, without an automatic link in the glossary, result in the same usage note being present on several pages, which is unœconomic. Korn [kʰũːɘ̃n] (talk) 21:12, 24 September 2018 (UTC)[reply]

The suggestion above involves a label ({{lb}}), not a usage note. --Lambiam 16:44, 27 September 2018 (UTC)[reply]

breed-specific legislation

Hello, I created the definition for breed-specific legislation and added three quotes from notable webpages, and an editor keeps removing them because the editor believes webpage quotations are not allowed. I disagree with this person's opinion. I think both book and webpage quotes should be allowed at Wiktionary. I did find this reference: Template:quote-web. To avoid an edit war, would an administrator please participate? Thank you IQ125 (talk) 09:02, 24 September 2018 (UTC)[reply]

Notability is not relevant on Wiktionary, and "notable webpages" have no standing in Wiktionary's Criteria for inclusion: for English, usage conveying meaning in three independent durably-archived sources spanning more than a year is required. The administrator you are arguing with is correct that quotes from web pages, and quotes that define but don't use the term, are useless in the verification process; we delete entries with "three quotes from notable webpages" all the time.

Such quotes are certainly allowed if they contribute something to the entry other than verifying the existence of the term, but the way you're talking about the quotes suggests that's not your purpose. Chuck Entz (talk) 12:53, 24 September 2018 (UTC)[reply]

What do you consider to be my purpose? In addition, why does Wiktionary have the Template:quote-web? IQ125 (talk) 15:47, 24 September 2018 (UTC)[reply]

Quotes are provided to show usage examples. See also [[WT:QUOTE]].

And, as noted in [[Template:quote-web/documentation]]:

Usage

This template can be used in a dictionary entry to provide a quotation from a webpage. Do not use the template for online versions of books or journal articles (including magazines and newspapers) – use {{quote-book}} or {{quote-journal}} instead.
For citations in "References" sections and on talk pages, use {{cite-web}}.

Given your apparent background in editing at Wikipedia, you may also find [[WT:WINW]] a useful reference, as that page discusses some of the notable differences in approach and policy.

HTH, ‑‑ Eiríkr Útlendi │^{Tala við mig} 17:50, 24 September 2018 (UTC)[reply]

I believe you are supporting my position as opposed to Entz's position. IQ125 (talk) 12:35, 25 September 2018 (UTC)[reply]
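The attestation rule cited in this thread (usage in three independent, durably archived sources spanning more than a year) can be sketched as a simple check. This is only an illustration under stated assumptions: the `Citation` record, its field names, and the 365-day threshold are hypothetical, and the real Criteria for inclusion involve editorial judgment that no such check captures.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    # All fields are hypothetical names chosen for this sketch.
    source: str             # identifier of the author or publication
    published: date         # publication date of the quotation
    durably_archived: bool  # e.g. a printed book: True; an ordinary web page: False
    uses_term: bool         # True for a use, False for a mere mention or definition


def meets_attestation_rule(citations, min_sources=3, min_span_days=365):
    """Return True if the citations satisfy the simplified rule."""
    # Only durably archived uses count toward verification.
    usable = [c for c in citations if c.durably_archived and c.uses_term]
    # Independence is approximated here by distinct source identifiers.
    if len({c.source for c in usable}) < min_sources:
        return False
    dates = [c.published for c in usable]
    # "Spanning more than a year" is approximated as more than 365 days.
    return (max(dates) - min(dates)).days > min_span_days


web_quotes = [
    Citation("site-a", date(2018, 1, 1), False, True),
    Citation("site-b", date(2018, 5, 1), False, True),
    Citation("site-c", date(2018, 9, 1), False, False),
]
book_quotes = [
    Citation("book-a", date(2015, 3, 1), True, True),
    Citation("book-b", date(2016, 7, 1), True, True),
    Citation("book-c", date(2017, 9, 1), True, True),
]
print(meets_attestation_rule(web_quotes))
print(meets_attestation_rule(book_quotes))
```

Under these assumptions, three quotations from ordinary web pages do not pass, while three book quotations spread over more than a year do.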

Recurring theme of ridiculous rubbish requests at Wiktionary:Translation requests

Does anyone know why we're getting a pretty constant stream of rubbish?

Things like this, posted by an anon:

ไก่ได้เห็นคางคกแมงมุมฟุตบอลสถานที่บดและพวกเขาสนุกกับเวลาของพวกเขาเผาสิ่งสบายๆเช่นเกราะที่ซ่อนอยู่ในตู้

Google Translate says:

Chickens have seen toads, spiders, football, grinding places, and they enjoy their time burning something casual like the hidden armor in the cupboard.

I know MT in general can be pretty awful, but I really don't think the idiocy evident here is from the algorithm.

This kind of bullshit request is not uncommon. It's a waste of time and resources. Should we be reverting such posts? ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:01, 24 September 2018 (UTC)[reply]

I have noticed it as well; it seems to concern a variety of languages (translating from a language to English) and comes from a wide range of IPs. It is a fairly strange phenomenon, but if the text is obviously rubbish, I don't personally see much wrong in simply reverting them. SURJECTION _{·talk·contr·log·} 18:05, 24 September 2018 (UTC)[reply]

Seeing what search terms get people to that page would be interesting (referral data). DTLHS (talk) 18:11, 24 September 2018 (UTC)[reply]

I've asked about that at w:Wikipedia:Village pump (technical)#Referral_data_/_what_search_terms_get_people_to_wiki_pages. - -sche (discuss) 22:21, 24 September 2018 (UTC)[reply]

I believe they are all Bu193 (talk • contribs). Wyang (talk) 22:25, 24 September 2018 (UTC)[reply]

I don't like them and consider them to be abuse of a free resource, but I like how Stephen is kindhearted enough to consider the possibility that they are legitimate. —Suzukaze-c ◇◇ 22:39, 24 September 2018 (UTC)[reply]

My first thought when I see the phrase is WTF. Whoever output this is probably not a person. I suspect some AI (or word-tangling bot) is trying to synthesize sentences but is not good enough, so it produces only random meaning. Such bots are usually used by spammers. I suggest rejecting all of these in case they are spam. --Octahedron80 (talk) 01:10, 15 October 2018 (UTC)[reply]

Using a piṭakamālā to verify Pali spellings

May I use a Tai Tham script contents list of the Tipitaka to verify Pali spellings in the Tai Tham script? I'm not sure whether the ones I have count as being in Pali or Northern Thai, as although the titles of the suttas are in Pali, the surrounding notes are in Northern Thai. Would the titles count as uses or as mentions of the underlying ordinary words? The contents lists I have are published in a book.

The primary reason for verifying the spelling is that Unicode offers several ways of writing several Pali 'letters', and the rules for choosing the form vary from school to school, and are not fully documented. - RichardW57m (talk) 13:10, 26 September 2018 (UTC)[reply]

Tremendous Wiktionary User Group 2nd annual report

Hi,

The Tremendous Wiktionary User Group is celebrating its second anniversary today and the present is the annual report.

This report is not very positive, apart from some good points. I spent a lot of energy at the beginning of this group, with the hope of unity beyond language diversity, reinforcing communication and sharing good ideas between communities. I mainly spent time translating things into English. In the end, being in charge of LexiSession and French Wiktionary Actualités is already quite a lot, and I wasn't supported much by the people from the other communities. I recently asked you about meetings and you seem to be reluctant to meet up. So, my dreams of a WiktiCON or a global meeting of Wiktionarians seem to be postponed for a while.

Well, I still don't really know where this group is going, or whether it is of any use to bring our lexi-communities together and to be more visible to the Wikimedia Foundation. If you have any ideas about this group and its future, I will keep an eye on this page, and will be happy to discuss proposals or criticism. Noé 13:07, 26 September 2018 (UTC)[reply]

Phrasebook entries are lemmas?

Do phrasebook entries like "I can't find my " belong in Category:English lemmas? I hope not. 83.216.80.232 21:15, 26 September 2018 (UTC)[reply]

Where else do you want to put it? Not on a single page, surely; it would take too long to load. An appendix translation table would not look good and would not be as easily found either. The phrases already get few clicks. Fay Freak (talk) 22:17, 26 September 2018 (UTC)[reply]

What's wrong with just Category:English phrasebook? In fact that puts it indirectly in the lemma category anyway. 83.216.80.232 22:47, 26 September 2018 (UTC)[reply]

If you want more clicks you could try linking to Category:Phrasebooks_by_language from the body of the main page. Just a thought... 83.216.80.232 23:08, 26 September 2018 (UTC)[reply]

Latin not on Wiktionary:Criteria for inclusion/Well documented languages?

Why not? It seems very well documented online to me. DTLHS (talk) 16:51, 27 September 2018 (UTC)[reply]

Latin is extinct, and therefore falls under a different set of criteria. However, we have applied this standard to Latin written after the language died, and that has been subject to a great deal of discussion and disagreement. —Μετάknowledge^{discuss/deeds} 17:27, 27 September 2018 (UTC)[reply]

What is Wiktionary's stance on reconstructions missing from sources?

Wiktionary has a lot of reconstructions that are not present in any other linguistic source. Sources never contain all possible reconstructable forms, and Wiktionary fills these gaps, providing information that is not available anywhere else. I have always seen this as a strength. Given that Wiktionary is an etymology dictionary itself, it makes sense not to rely on what is already present in existing dictionaries. It has never been our mission to parrot other sources. Just as dictionaries of attested languages can include terms that don't exist and have never been used, there are plenty of cases where sources include reconstructions that are demonstrably incorrect, or state that a certain later term descended from a specific reconstruction when sound laws clearly give a different outcome. Wiktionary's practices of independent verification allow errors in sources to be caught and corrected, and omissions to be filled in.

I've been interested in importing Wiktionary entries to Wikidata, but the people on Wikidata are demanding that all of Wiktionary's reconstructions come from an external source in order to be includable. This seems to clash with Wiktionary's current practice, and means that Wiktionary may not be able to rely on Wikidata for reconstructions in the future. So I'm asking for a clarification of Wiktionary's stance on reconstructions and sourcing. —Rua (mew) 17:29, 27 September 2018 (UTC)[reply]

"Wiktionary may not be able to rely on Wikidata for reconstructions in the future" Huh? Why would we ever rely on Wikidata? Frankly, their approach seems entirely orthogonal to ours and they are better off starting from scratch rather than "importing" anything. I don't think we should change any of our policies to fit whatever Wikidata is doing. If that means they can't include something that we have, too bad. DTLHS (talk) 17:33, 27 September 2018 (UTC)[reply]

I agree in part with both parties here. Rua points out that Wiktionary's information is, in places, both independent of and superior to formally dead-tree-published writings. DTLHS points out that Wikidata's approach does not mesh at all well with Wiktionary's, and that catering to their needs may be to our detriment.

Regarding Rua's core question, restated: what is Wiktionary's stance on reconstructions and sourcing? I have nothing directly addressing that. WT:WINW does mention:

Wiktionary is a secondary source for its subject matter (words and phrases) whereas Wikipedia is a tertiary source for its subject matter (topics). This means that while Wikipedia documents what other people say about topics, Wiktionary documents words and phrases itself without relying on the statements of other people. As a consequence, the requirements of verifiability are very different. Verification on Wikipedia asks "can we find a credible source that says it is the case?" while on Wiktionary we ask "can we find real-life examples to show it is the case?". This also means that whereas Wikipedia discourages original research and relies on the research of others, Wiktionary users themselves actively research terms and their meanings.

Meanwhile, WT:WFW states:

Wiktionary is a secondary source. Rather than trying to document the words that others have documented, we do all the documenting first hand. This means that to prove a word exists and is in use, we need to cite actual usage, not documentation of that usage. We research and cite usage that happens "in the wild" so to say, rather than relying on what other sources say about something. And as a consequence, we don't have any policy on original research; sometimes original research is inevitable and necessary to help us document a rare or brand-new use that no one else has documented before. Importantly, on Wiktionary allows attestation from any durably archived source, which includes not just "credible" or "reliable" sources. As our aim is to be descriptive and neutral, we don't discriminate between different speakers of a language, as noted in the section above. This means that we allow citations from colloquial sources such as Usenet, if they are considered durably archived for Wiktionary's purposes.
[…]
Wiktionary does not include just terms and their definitions, but also other information about terms such as pronunciation and etymology. Verification of this information (using the {{rfv-pronunciation}} and {{rfv-etymology}} templates) follows a similar process, but in this case secondary sources are required as references like on Wikipedia. However, this kind of information is not the mainstay of Wiktionary's content, so it's not submitted for verification as often as terms and definitions themselves are.

The lower paragraph suggests that we do need to provide sources. However, there are cases where secondary sources are clearly incorrect, as when deriving a term in contravention of known sound laws and providing no explanation for such deviation from an otherwise well-established linguistic principle. There are also cases where secondary sources don't say anything at all, despite the presence of circumstantial evidence or other reconstructable processes that would fill in gaps. Simply parroting known-incorrect data, or simply saying nothing, both strike me as sub-optimal. ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:19, 27 September 2018 (UTC)[reply]

Sorry to intervene here (because I do not really contribute here), but from what you cite, Eirikr, I understand it applies to "real" words. From WT:WINW, when I read "can we find real-life examples to show it is the case?", I understand this applies to words that you can hear or find in a text. Here we are talking about words that do not exist. Reconstructed forms are words that may have existed in the past but of which there are no written traces. This is also what I understand from this sentence: "And as a consequence, we don't have any policy on original research; sometimes original research is inevitable and necessary to help us document a rare or brand-new use that no one else has documented before. Importantly, on Wiktionary allows attestation from any durably archived source, which includes not just "credible" or "reliable" sources." Pamputt (talk) 19:02, 27 September 2018 (UTC)[reply]

That sentence in WT:WFW “secondary sources are required as references like on Wikipedia” clearly comes from a European-languages viewpoint and has been derogated by best practice already. Only for Indo-European languages there can be the principal assumption that there is a reference for every etymology section, and even in widespread ones of those there are cases where we have ridiculed this rule: I remind everyone of Vahagn Petrosyan (talk • contribs)’s grandiose resolution of the origin of the Armenian term for the white poplar կաղամախի (kałamaxi), User talk:Vahagn Petrosyan § կաղամախի Etymology. Wiktionary:About Arabic describes the state of documentation on a language the speaker count of which supposedly amounts to a number that is half the population count of Europe. There isn’t an etymological dictionary of Arabic, that’s something one needs to hold critics in the face. And for Sami … its speaker count is ¹⁄₁₀₀₀₀ of the one of Arabic. If one desires references for the etymologies one lives in the technocratic ivory tower of Anglo privilege! Regarding pronunciations, having references for them is even difficult for English, and the Chinese contributors fare well with getting them from the streets or YouTube; also pronunciations can be gathered from audio recordings, so this is also a reason why I am going to think now how to reword that locus in WT:WFW. The personnel we have is the key to quality, like a book published by some scientist attains credibility by author and publisher name and is for this reason adduced. Thus has spoken someone who is absolutely not a Wikipedian.

Also, can anyone tell me where I could start to browse Wikidata for information on words? I have hitherto been untainted by requirements of other Wikimedia projects. Fay Freak (talk) 22:25, 27 September 2018 (UTC)[reply]

It clearly comes from a viewpoint that desires references for claims. If we have no references, that doesn't mean that we should start making up our own answers.--Prosfilaes (talk) 00:02, 2 October 2018 (UTC)[reply]

If A + B = C, but no references state that, are we thus proscribed from stating the obvious conclusion ourselves? How closely do we hew to Wikipedia's strictures? ‑‑ Eiríkr Útlendi │^{Tala við mig} 07:43, 2 October 2018 (UTC)[reply]

I don't see anything under discussion that's comparable to algorithmically derivable arithmetic questions.--Prosfilaes (talk) 02:53, 5 October 2018 (UTC)[reply]

My query comes from Wikipedia:No_original_research#Synthesis_of_published_material. Are we binding ourselves by this same restriction? If a conclusion is not explicitly stated by a reference, are we forbidden from stating that conclusion? ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:51, 8 October 2018 (UTC)[reply]

I too agree with both parties. Careful original research that follows from scholarly sources is good, and we shouldn't care what Wikidata thinks. This is yet another example of how Wikidata wants to do lexicography without respecting or listening to the Wikimedians who are already doing it. —Μετάknowledge^{discuss/deeds} 19:58, 27 September 2018 (UTC)[reply]
@Metaknowledge I am not sure I understand your point. Does it mean that you accept reconstructed forms without any reference here? Pamputt (talk) 20:25, 27 September 2018 (UTC)[reply]
@Pamputt: No. Please see Wiktionary:Votes/2013-10/Reconstructions need references. —Μετάknowledge^{discuss/deeds} 00:53, 28 September 2018 (UTC)[reply]
See that according to any principles of law, acceptance by extensions, limitations or other alterations is deemed a refusal. Option 1 failed 1-4-1, Option 2 failed 4-9-1. Also if it hadn’t failed, the wording of the rules would be contentious. 1. At first, I could read the part “sources (scholarly work)” in various ways. If it really meant scholarly work, why not just write “scholarly work”? 2. ”or provide evidence” does not say that the descendants aren’t by themselves evidence; the “or” can be read “or if” (as there has been an “if” in the sentence), i. e. two unrelated conditions that make evidence. Though in this case it would be more likely that the words “only if they have references to sources (scholarly work) that” would be under a., it does not really seem that anyone believed that the sound changes for every descendant language need to be explained. Well according to good practice we delete reconstruction pages if they have no descendants because we don't want a blue link for the sake of having a blue link as Per utramque cavernam (talk • contribs) has User_talk:Greenismean2016#Sources recently put it to Greenismean2016 (talk • contribs). The sentence “Appendix pages on reconstructed protoforms that remain unreferenced after a certain amount of time was given for references to be added should be deleted.” does not contradict it because links to words are also references, and also the part “certain amount of time” is contentious. Good that it hasn’t passed. Part of the reason of not-passing is surely the perplexing wording. Also if we really interpret that vote’s rules the way that descendants have no say it would contradict the lex superior that Wiktionary is a secondary source and would therefore need to be annulled. Fay Freak (talk) 12:14, 28 September 2018 (UTC)[reply]

This reminds me of the problems we've had trying to import other (out of copyright) English dictionaries, that have different inclusion criteria and different definitions of what "English" is. That's not a weakness, that's a strength. Having more dictionaries with different styles of content is a good thing. But we shouldn't be mass importing data between one dictionary and another, since they are fundamentally different. DTLHS (talk) 20:16, 27 September 2018 (UTC)[reply]

Hmmm, actually I think there is a misunderstanding about Wikidata. For now, Wikidata does not have any position on reconstructed forms. There are only preliminary discussions about that. The point is that Rua wants to import a lot of Proto-Samic "words" (coming from here) to Wikidata and, as a contributor to both the French Wiktionary and Wikidata, I am a bit surprised that she wants to import reconstructed forms without any reference. For example, original work on etymology and reconstructed forms is not allowed on the French Wiktionary, or is at least strongly discouraged. So there are two questions: the Wikidata question and the reconstructed forms question. Pamputt (talk) 20:32, 27 September 2018 (UTC)[reply]

I note that the key discussion that prompted this discussion in the first place seems to be at d:Wikidata:Requests_for_permissions/Bot/MewBot. This also makes it clear that the issue is not reconstructions completely absent from sources, but rather reconstructions that have been adjusted to some extent.

In any case, I don't really see the point of all this: reconstructions are prone to changing every so often with reinterpretations of historical phonology. (E.g. in modern sources Proto-Semitic has *s *ts *ɬ, versus *s₁ *s₂ *s₃ in slightly older ones and *š *s *ś in even older ones.) They don't seem to be … factual enough to be worth adding on Wikidata. Unless, perhaps, to document specific sources, but that kind of rules out OP's question entirely then. --Tropylium (talk) 18:48, 3 October 2018 (UTC)[reply]

The goal is not so much to give specific reconstructions. Rather, the main goal is to give ancestral lexemes that tie the descendants together, thus establishing that they are cognates descended from a common ancestral form. What that form is, is less important than the fact that those terms derive from it. —Rua (mew) 19:17, 3 October 2018 (UTC)[reply]

Ido suffixes

People tend to write Ido entries for suffixes four different ways:

Writing it like a false-interfix: -et-.
Writing it with the desinence: -eyo.
Adding entries for both: -es- and -esar.
Adding different pages for different desinences: -ifar and -ifo.

I think this is messy, and so, my proposal is that we change it to the style of several Ido books, writing it just like: -et, -ey, -es and -if. That way, we can write etymologies like {{suffix|io|aquo|iz|ar}} or possibly {{suffix|io|aquo|iz|alt2=izar}}. — Algentem (talk) 03:20, 29 September 2018 (UTC)[reply]

I support the proposed standard; the final hyphen in suffixes always struck me as odd, and this is in my view an elegant solution. ~~←₰-→~~ Lingo ^Bingo _Dingo (talk) 06:45, 4 October 2018 (UTC)[reply]
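As a rough illustration of the proposal above, the four existing entry styles could be mapped onto the proposed bare form with a small helper. This is a minimal sketch, not an existing tool: the function names and the short list of desinences (`ar`, `o`, `a`, `e`) are assumptions made for the example, and a real cleanup would need a reviewed list of endings.

```python
# Hypothetical list of desinences to strip, longest first (an assumption).
DESINENCES = ("ar", "o", "a", "e")


def normalize_suffix(form: str) -> str:
    """Map an entry title such as '-et-' or '-eyo' to the proposed '-et' / '-ey' style."""
    if form.endswith("-"):
        # Interfix-style title: no desinence is attached, so only drop the hyphens.
        return "-" + form.strip("-")
    core = form.lstrip("-")
    for ending in DESINENCES:
        # Strip one desinence, but never reduce the suffix to nothing.
        if core.endswith(ending) and len(core) > len(ending):
            core = core[: -len(ending)]
            break
    return "-" + core


def suffix_etymology(base: str, suffix: str, desinence: str) -> str:
    """Build the etymology wikitext in the first form given in the proposal."""
    return "{{suffix|io|" + base + "|" + suffix.strip("-") + "|" + desinence + "}}"


for title in ("-et-", "-eyo", "-es-", "-esar", "-ifar", "-ifo"):
    print(title, "->", normalize_suffix(title))
print(suffix_etymology("aquo", "-iz", "ar"))
```

Note that the helper is not safe to run twice on a suffix whose bare form happens to end in one of the listed endings, so it is meant for one-off conversion of the old titles only.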