
Language log on Vietnamese

Evidently, it is more-or-less common to have diacrtictless Vietnamese. Also, there are some shorthands like "L/L" interpolated from Chinese varieties that are common. I'm not sure how editors here can incorporate that knowledge into the dictionary but I figured I would surface it for anyone who works on Vietnamese. —Justin (koavf)TCM 04:32, 1 October 2018 (UTC)

I created L/L. That sign was probably created on a computer lacking a Vietnamese input method. Diacritic-less Vietnamese is common in chats, but it can be highly ambiguous: a commonly cited example is Em dang o truong in a text message sent by a woman, which can mean either I'm at school or I'm completely naked. Wyang (talk) 04:40, 1 October 2018 (UTC)
I've made diacriticless = "accentless" (not "diacrtictless"). Diacriticless Vietnamese must be much worse than toneless Chinese pinyin. We don't need those as alt spellings or even redirects. --Anatoli T. (обсудить/вклад) 04:56, 1 October 2018 (UTC)
It's like in accentless Spanish, where "Toque el cono" can mean "touch the cunt" or "I touched the cone", both of which, of course, are great titles for songs on my next thrash metal album. --WF110 (talk) 19:21, 1 October 2018 (UTC)

New language family and protolanguage request

There are currently no codes for the Mixtec family or for Proto-Mixtec. Mixtec is a branch of Mixtecan alongside Cuicatec (which also lacks a code) and Trique (which does have a code, for some reason). --Lvovmauro (talk) 05:11, 1 October 2018 (UTC)

Oh, and likewise for Otomi and Proto-Otomi. There's probably a lot of these missing. --Lvovmauro (talk) 05:20, 1 October 2018 (UTC)

Featured entries

Hi,

Is there a list of featured entries in the English Wiktionary, similar to Wikipedia's featured articles? There is a ticket in Phabricator to add to Cognate a way to indicate, alongside the links to other versions of Wiktionary, whether the page there is featured, but it may be too early to develop this feature if the French Wiktionary is the only Wiktionary project with a list of good entries. Noé 08:58, 1 October 2018 (UTC)

The only featured entries are the words of the day on the main page. DTLHS (talk) 16:22, 1 October 2018 (UTC)

Category:en:Vietnamese

Can someone help me? I want to create a category for the Vietnamese language to house Chu Nom, like we have a Japanese category for hiragana, katakana, etc. ---> Tooironic (talk) 11:21, 1 October 2018 (UTC)

@Tooironic: It looks like these are appropriately categorized now. —Justin (koavf)TCM 16:59, 1 October 2018 (UTC)
Category:Vietnamese Nom sounds like what you're proposing. — Eru·tuon 20:06, 1 October 2018 (UTC)
No, this is for English terms that are contextually related to the Vietnamese language. DTLHS (talk) 20:11, 1 October 2018 (UTC)
Oh, I see. Duh. The place to add Category:en:Vietnamese is apparently Module:category tree/topic cat/data/Communication. Done. — Eru·tuon 20:25, 1 October 2018 (UTC)

What about uniformise a little bit all Wiktionaries?

Hello, could you tell me what you think about T150841, and especially Addshore's comment? In brief, the idea is to have a uniform structure across Wiktionaries so that a language section can be identified automatically. This is needed, for example, to apply different colours to interwiki links depending on:

  1. There is a section in English on enwikt and a section in English on frwikt => the interwiki link to the French Wiktionary will be blue
  2. There is a section in English on enwikt but no section in English on frwikt => the interwiki link to the French Wiktionary will be green (or another colour)
  3. There is no section in English on enwikt and no section in English on frwikt => the interwiki link to frwikt may remain blue or have another colour.

This is the only use case discussed in the ticket, but we can imagine other use cases in the future.

On the French Wiktionary, we already have a template for language headings, so I think it will be easier for us than in your case (I guess using a template might be a good idea here). Pamputt (talk) 20:11, 1 October 2018 (UTC)
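As a rough illustration of the three cases listed above, the colour decision can be sketched as a small function. This is a minimal Python sketch, not code from T150841 or from any MediaWiki extension; the names `LinkColour` and `interwiki_colour` are hypothetical, and it assumes each wiki can already report whether a page has a section for the language in question, which is exactly the part that needs uniform headings.

```python
from enum import Enum


class LinkColour(Enum):
    BLUE = "blue"            # both pages have a section for the language
    GREEN = "green"          # local page has the section, remote page lacks it
    UNDECIDED = "undecided"  # the proposal leaves the colour open


def interwiki_colour(local_has_section: bool, remote_has_section: bool) -> LinkColour:
    """Pick a link colour following the three cases listed in the post.

    local_has_section:  e.g. the en.wikt page has an English section.
    remote_has_section: e.g. the fr.wikt page has an English section.
    """
    if local_has_section and remote_has_section:
        return LinkColour.BLUE   # case 1
    if local_has_section:
        return LinkColour.GREEN  # case 2: the remote section is missing
    # Case 3 (neither page has the section) "may remain blue or have another
    # colour"; the remote-only combination is not described, so it stays open too.
    return LinkColour.UNDECIDED


# Example: en.wikt has an English section, fr.wikt does not.
print(interwiki_colour(True, False).value)  # green
```

The combination where only the remote page has the section is not among the listed cases, so the sketch leaves it undecided rather than guessing.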

Wouldn't this depend on having uniform language codes? We have many customized, merged or deleted languages that will not be uniform across projects. DTLHS (talk) 20:21, 1 October 2018 (UTC)
I don't see how us having a template in the header would make a difference. We have a bijective list of language names and language codes. The real problem is that we don't have bijectivity of languages between Wiktionaries, as DTLHS points out. —Μετάknowledgediscuss/deeds 20:30, 1 October 2018 (UTC)
You are right, there are two points. One is the language code (the code has to be the same across Wiktionaries) and the other is the magic word that Addshore proposes adding to do what I described above.
About the language codes, I do not think they are too different among Wiktionaries, at least for major languages (for minority languages, this is another story, I guess). Most Wiktionaries use "en" for the English language (liwikt, rowikt, viwikt and zhwikt use the "eng" code). You can see how different the language headers are between Wiktionaries here. Pamputt (talk) 06:02, 2 October 2018 (UTC)
First of all, if a solution doesn't work for minority languages, then it doesn't work at all. Most of our coded languages are "minority languages" of one type or another. Secondly, even major languages like Serbo-Croatian (sh or hr+sr or hr+sr+bs) are not bijective. You can't sweep this under the rug and expect your proposal to work. —Μετάknowledgediscuss/deeds 16:51, 3 October 2018 (UTC)
Actually, the proposal applies mainly to major languages, i.e. languages that have their own Wiktionary project (about 170 languages in total). Pamputt (talk) 05:30, 4 October 2018 (UTC)
Why would you only focus on languages that have Wiktionaries? If I want to know if the French Wiktionary has a Vandalic section at fr:drincan (it currently doesn't), would the link at en:drincan be blue or green as a default? If it's blue, it would imply that fr.wikt had a Vandalic section, which would be false. If it's green, then the Old English interwiki would also be green, which would imply that fr:drincan lacks an Old English section, even though it actually has one. Both Vandalic and Old English lack Wiktionaries, so by ignoring them, you're essentially setting up a situation where the links will be misleading. (Also, you didn't address the problem of Serbo-Croatian.) —Μετάknowledgediscuss/deeds 05:57, 4 October 2018 (UTC)
The proposal is the reverse: in the English Wiktionary, highlight the interwiki links whose target page has an English section. In the French Wiktionary, it would highlight the English interwiki link if it somehow detects that en.wikt has a French section. This requires some magic to make the language headings uniform across Wiktionaries. --Vriullop (talk) 09:05, 4 October 2018 (UTC)
Vriullop is right, the proposal is the reverse. About Serbo-Croatian, there are Serbo-Croatian, Croatian, Serbian and Bosnian Wiktionaries, so the "problem" will be addressed the same way as for French or English. Pamputt (talk) 09:16, 4 October 2018 (UTC)
I think there's still a question about what should happen in the interwiki links from the Croatian, Serbian, and Bosnian Wiktionaries that point to the English Wiktionary. On English Wiktionary we only use Serbo-Croatian in headers, not Croatian, Serbian, or Bosnian. So in the proposed extension perhaps links from the Croatian, Serbian, and Bosnian Wiktionaries to the English Wiktionary should be colored to indicate whether there is a Serbo-Croatian entry in the English Wiktionary. Even though the language code is different, the word that the entry is about is probably the same. If the extension does not do this, then no links from the Croatian, Serbian, and Bosnian Wiktionaries to the English Wiktionary will ever be highlighted. — Eru·tuon 19:27, 4 October 2018 (UTC)
Given how butthurt some of the people in those speaker communities are, I suspect that they would choose to not have the link over linking to an "evil" language. —Rua (mew) 20:56, 6 October 2018 (UTC)
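Eru·tuon's suggestion amounts to mapping the home Wiktionary's language code onto the code en.wikt uses in its headers before checking for a section. A minimal Python sketch, assuming a hand-maintained mapping; `EN_WIKT_HEADER_CODE`, `en_wikt_code` and `should_highlight` are hypothetical names, and only the Serbo-Croatian case from this thread is filled in.

```python
from typing import Set

# Hypothetical mapping from a Wiktionary's own language code to the code
# en.wikt uses in its language headers. Only the Serbo-Croatian case is shown.
EN_WIKT_HEADER_CODE = {
    "hr": "sh",  # Croatian Wiktionary -> Serbo-Croatian section on en.wikt
    "sr": "sh",  # Serbian Wiktionary -> Serbo-Croatian section on en.wikt
    "bs": "sh",  # Bosnian Wiktionary -> Serbo-Croatian section on en.wikt
}


def en_wikt_code(home_code: str) -> str:
    """Code of the en.wikt section to check for links from the home_code Wiktionary."""
    # Languages without a special mapping are assumed to use the same code.
    return EN_WIKT_HEADER_CODE.get(home_code, home_code)


def should_highlight(home_code: str, en_wikt_sections: Set[str]) -> bool:
    """True if the en.wikt page has a section for the home Wiktionary's language."""
    return en_wikt_code(home_code) in en_wikt_sections
```

For example, `should_highlight("hr", {"en", "sh"})` returns `True`, whereas without the mapping a link from the Croatian Wiktionary would never be highlighted, which is the situation Eru·tuon describes.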

Inktober

Hi,

A proposal: this month, a lot of people all over the world are doing the Inktober challenge, making one ink drawing per day for a month. There is a list of 31 themes. We could take part in this challenge in our own way, by adding more content to those pages. Plus, those pages may be consulted more during this month, so it's better if the content is good. Most of them are already fine pages of course, but there is still some improvement to do. Noé 06:44, 2 October 2018 (UTC)

Hey Noe! To me it's quite an odd proposal - I doubt there'll be a significant rise in traffic to those Wiktionary pages, really - to be fair, Inktober is pretty obscure. Anyway, it's worth a try. You know what: while I'm here complaining, I'll be productive and add some more ====Derived terms==== to a few of these, as solidarity or whatever. --WF110 (talk) 08:51, 2 Inktober 2018 (UTC)
So, I added loads of Derived terms to gift, double, jolt, slice...chicken. Maybe do more another day --WF110 (talk) 09:37, 2 Inktober 2018 (UTC)
Thanks for your help! I am not sure about the traffic rise, but three of my friends are doing Inktober, here in France. My hypothesis is: when several people want to draw something more original than their neighbours on a set topic, they look at the definition and the different senses of the word. Well, I may be wrong about that and it could be a random list after all, but we are still improving the content of our projects and that's what matters most. Noé 16:10, 3 October 2018 (UTC)
It can also be a good opportunity to record audio pronunciations with Lingua Libre for words that don't have one yet. Pamputt (talk) 05:26, 4 October 2018 (UTC)
I have created a list on Lingua Libre. You only need to take your microphone and read the list on Lingua Libre now :) Pamputt (talk) 06:06, 12 October 2018 (UTC)

Reminder: No editing for up to an hour on 10 October

12:03, 4 October 2018 (UTC)

Lemma of Southern Sami adjectives

I've started working on improving our Southern Sami coverage, after having worked on a bunch of other Sami languages already. And I've hit a bit of a snag. All Sami languages have both "predicative" and "attributive" forms of adjectives. Almost all Sami languages traditionally lemmatise adjectives under the predicative nominative singular form. In existing Southern Sami dictionaries, though, the attributive form is chosen instead. Should Wiktionary lemmatise Southern Sami like other Sami languages for consistency, or should it follow the practice of other dictionaries and use the attributive as the lemma? —Rua (mew) 20:53, 6 October 2018 (UTC)

I'd go with the attributive; respecting existing native lexicographical practice is a good thing insofar as it exists. Crom daba (talk) 14:37, 7 October 2018 (UTC)
I found another reason to use the predicative instead. raeffie is both a noun meaning "peace" and an adjective meaning "peaceful", but the attributive of the adjective is raeffies. Both have exactly the same case forms, the noun just lacks the attributive form. It would make more sense to me to place these on the same page, because they have the same stem and one can't be said to derive from the other through suffixation. This connection is lost if the lemma for the adjective is raeffies; how would you write its etymology section then? —Rua (mew) 15:05, 7 October 2018 (UTC)
This also illustrates an important point: the attributive is usually a derived form, while the predicative is the more basic form. The former is derived from the latter with an -s suffix. There are some adjectives where it's the other way around, but those are rarer. —Rua (mew) 15:08, 7 October 2018 (UTC)

Removal of comments

Is there a policy here on Wiktionary regarding the removal of someone's comments when those comments aren't vandalism? There is a long-standing idea in the wiki world that talk page comments are protected and not to be messed with. Comments welcome. -Inowen (talk) 02:40, 7 October 2018 (UTC)

Unless comments are disruptive or abusive etc., they should not be deleted, though comments on user talk pages can be deleted, more or less at the pleasure of the user. Comments that are merely wrong, deemed a waste of time, etc. should not be deleted IMO. I think that is our practice. DCDuring (talk) 03:52, 7 October 2018 (UTC)
Is there a policy page, like on Wikipedia? -Inowen (talk) 04:13, 7 October 2018 (UTC)
The Wikipedia:Talk page guidelines are, as the title says, guidelines, not policy. Quoting from that page: “Cautiously editing or removing another editor's comments is sometimes allowed, but normally you should stop if there is any objection.” Editing someone’s comment to change its meaning, or removal of comments for no good reason against other editors’ wishes, are considered unacceptable on Wikipedia and may get you blocked for disruptive behaviour. Same here, even if we do not spell it out, so wikilawyering won’t help here.  --Lambiam 08:58, 7 October 2018 (UTC)

A related issue: re-addition of comments

Users are permitted to delete comments from their own talk page unless it's a block notice, correct? I ask this because @Metaknowledge undid @IQ125's removal of comments from their talk page. When I undid Meta, citing that Meta had no right to re-add comments a user wants deleted from their page, Meta undid me. Purplebackpack89 22:36, 26 October 2018 (UTC)

If the owner of the talk page is trying to deceive people by pretending they haven't been warned about something of importance to the dictionary, I don't think that should be allowed. Chuck Entz (talk) 23:06, 26 October 2018 (UTC)
I'm in favor of a similar policy to what Wikipedia has: users are allowed to remove any comments except blocks. I think admins being allowed to add and remove comments like that gives them too much power. I'm not sure "deceive" is the word I'd use either. Purplebackpack89 01:11, 27 October 2018 (UTC)
Then perhaps we should move IQ's issues to WT:VIP. Equinox 10:24, 27 October 2018 (UTC)

Sanskrit categories for non-lemma

I am thinking of starting a new project for the Sanskrit language: adding categories to all Sanskrit non-lemma forms based on their grammatical form. I've already started it for some words; however, following a dispute on the talk page of the word भवामि, I've been told to propose this idea at the Beer parlour. The crux of this categorization is as follows:

  • For nouns: Creating categories based on case, number
  • For verbs: Creating categories based on tense/mood, voice, person, number
  • For pronouns: Creating categories based on case, number

Why is this needed? Non-lemmas in Sanskrit are currently divided into categories only by part of speech. However, it is worth noting that in Sanskrit, because of inflection and conjugation, there can be millions of non-lemma forms. Thus, the current categorization is not very useful.

For example, for a single Sanskrit verb there can be dozens of future-tense forms, because there are 3 persons, 3 voices and 3 numbers (27 combinations). On top of that, there are two future tenses in Sanskrit! So having over 50 future-tense verb forms in Category:Sanskrit verb forms is utterly useless unless they can be further categorized by the type of future tense.
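The arithmetic behind "over 50" works out as 3 persons × 3 voices × 3 numbers = 27 forms per future tense, or 54 across both future tenses. A minimal Python sketch of how such categories could be generated automatically rather than added by hand, assuming the feature values from the post; the English labels and the category naming scheme are hypothetical, and on Wiktionary itself this would be done in a Lua module, not in Python.

```python
from itertools import product
from typing import List, Tuple

# Assumed feature values from the post (3 persons, 3 voices, 3 numbers,
# 2 future tenses). The English labels are illustrative only.
PERSONS = ("first-person", "second-person", "third-person")
VOICES = ("active", "middle", "passive")
NUMBERS = ("singular", "dual", "plural")
FUTURE_TENSES = ("simple future", "periphrastic future")


def categories_for(tense: str, voice: str, person: str, number: str) -> List[str]:
    """Hypothetical category names for one Sanskrit verb form, coarse to fine."""
    return [
        "Sanskrit verb forms",  # the existing catch-all category
        f"Sanskrit {tense} verb forms",
        f"Sanskrit {tense} {voice} {person} {number} verb forms",
    ]


def future_slots() -> List[Tuple[str, str, str, str]]:
    """Every (tense, voice, person, number) combination in the two future tenses."""
    return list(product(FUTURE_TENSES, VOICES, PERSONS, NUMBERS))


slots = future_slots()
print(len(slots))  # 54
print(categories_for(*slots[0])[2])  # Sanskrit simple future active first-person singular verb forms
```

Because every form gets both a per-tense category and a fully specified one, the 54 future forms of a verb would no longer sit undifferentiated in the catch-all category.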

Thus, to avoid a dump of millions of verb forms in the aforementioned category, I am suggesting adding some categories to each term, in addition to the already existing ones, so that it is presented in an organized form under that category. I'm suggesting doing this for all Sanskrit non-lemmas.

Initially, it will be a monumental task, but since Sanskrit doesn't even have a complete dictionary on Wiktionary yet, this idea too will be a work in progress.

Hoping for positive feedback. JainismWikipedian (talk) 00:13, 11 October 2018 (UTC)

@JainismWikipedian: Manually adding 100+ categories designed specifically for verb forms to each Sanskrit non-lemma verb is insane! What kind of lunatic would complete a project like that? If categorization were necessary, I'd say that the only way to maintain the mind's well-being would be through automation: through robots. Aearthrise (talk) 08:43, 12 October 2018 (UTC)
I just want categories to be added, manually or automatically. I am not well versed in coding, so if someone could create a module for it, I would be extremely grateful. Till then, I can add them for major Sanskrit words. My idea was to have categories, not to add them manually forever, of course. That's why I mentioned it on this forum, so that someone can help automate it. :) JainismWikipedian (talk) 11:05, 12 October 2018 (UTC)
We have already dumped millions of verb forms, to twist your words, in other categories and nobody seemed to care about it. Check out Category:Latin verb forms for a start - nigh on half a million beautiful words all lying together in harmony. BTW, I don't care about Sanskrit at all, I'm just giving information. --WF110 (talk) 11:06, 12 October 2018 (UTC)
Well, that's the point: then it just becomes a useless list of millions of words. Categories will help to organize Sanskrit non-lemmas properly. And someone not caring about a language on Wiktionary is not a good argument for keeping things as they are. And, as I mentioned just above, until some good soul creates an automated module or something, I can add categories manually to certain major Sanskrit non-lemmas. It will enrich the Wiki experience, not diminish it. That's all I am saying. JainismWikipedian (talk) 11:15, 12 October 2018 (UTC)
@JainismWikipedian: If you can find a way to automate the categorization process, I think that adding categories is a good idea. What do the categories do for Sanskrit lemmas? Aearthrise (talk) 23:49, 12 October 2018 (UTC)

Proposed revision to WT:FICTION: names of universes

I have a proposal to tweak the criteria for inclusion given at WT:FICTION.

As it currently stands, terms originating in fictional universes must have at least three durably archived uses that are independent of the universe in which they originated. This means, for instance, that Klingon gets in because there are many durably archived references to the Klingons or Marc Okrand's Klingon language that do not mention Star Trek nor Paramount. In addition, names like Homer Simpson and Pippi Longstocking are permissible because they have acquired secondary meanings (doofish dolts, in the former case, or freckled redheaded girls or spunky girls, in the latter) beyond their literal uses as the name of the characters. Bat'leth gets in because there are print references to real-life bat'leths, proving that the word (and object) have acquired a life beyond the Star Trek universe. And Eevolution and pedosaur get in because the former is not actually officially used in canon Pokémon and the latter certainly would never appear on Barney & Friends; they're kosher because they didn't originate in fictional universes.

But then you have terms like Pokémon. The word Pokémon is a nonexistent entry in Wiktionary. Why? Because it's impossible to make a reference to Pokémon, using the word "Pokémon", that doesn't mention the Pokémon universe, because Pokémon is Pokémon. The requirement that all terms originating in fictional universes have uses independent of the universe (i.e. not mentioning the universe) creates a bizarre legal fiction wherein the name of a franchise can't get into Wiktionary.

It seems bizarre that we have the non-canon spelling Pokemon and the outright misspelling Pokéman, but not Pokémon. That we have Pokédollar listing Pokémon in its etymology, but the link at that etymology is to a nonexistent page. That Pikachu, Squirtle and Jigglypuff are defined as species of Pokémon, but the link to Pokémon in each of their definitions won't tell a reader what a Pokémon is.

So my proposal is: The name of an entire franchise can get an entry in Wiktionary as long as (a) it would meet WT:FICTION were it not for the self-referential aspect as per the independence requirement (i.e. there are at least three uses, the uses are durably archived, they are at least a year apart from earliest to latest, they are uses and not mentions, the uses are by at least three different authors, etc. -- for the name of the franchise this would also require that the cited texts not mention the name of the franchise's creator nor its corporation), and (b) it is a one-word (solid) name, hyphenated word, or multi-word phrase of which the single words are not standalone meaningful words in English nor the native language of the franchise (Italian, French, German, Japanese, Spanish, Korean, Mandarin, Arabic, Hindi, Hebrew, Portuguese, or whatnot). So Pokémon, Digimon, Fraggle, She-ra, Spider-man, and Winnie-the-Pooh would all be permissible Wiktionary entries, whereas Bridge to Terabithia, My Little Pony, Paw Patrol, Masters of the Universe, and Dora the Explorer would not get in because they fail criterion (b), and that comic featuring ninja sharks that you made up last week would not get in because it doesn't have the durably archived uses by three different authors.

What do all of you think? Khemehekis (talk) 03:02, 12 October 2018 (UTC)

It seems to me that Pokémon as a term for a creature could be cited by the same kind of citations as would be sufficient for Klingon, like if someone spoke of "Game of Thrones' resident Pokémon Hodor" in reference to his only speech being his name, or something. Are you wanting a (second) sense line that defines it specifically as a franchise, and if so, why? (And, sincere non-rhetorical question: is it really unincludable in that sense? I see we have e.g. Star Trek and Star Wars, and the former was RFDed and kept. Should they be RFVed?) - -sche (discuss) 03:33, 12 October 2018 (UTC)
-sche, your comment provides many good points to think about. Excellent point about the "resident Pokémon Hodor" uses. TV Tropes even has a trope called Pokémon Speak. The word kryptonite gets in because of its allusory meaning of something someone is helpless against, and muggle gets in as a word for someone without a superpower or special distinction (for example, I once read an article on synaesthesia that referred to non-synaesthetes as "we muggles"). I personally think Pokémon as a word for a creature would be fine, although Pokémon has stayed a nonexistent entry for all these years, so some force must be driving the inertia at creating it (or re-creating the old entry). Look at Pokémon -- it has alternative spellings, a link to the (lower-case) Spanish translation, even a sound file, so it's just silly that it's a nonexistent entry -- following the letter of CFI at the expense of the spirit. To answer your first question: we don't really need a separate definition line for the franchise, vis-à-vis the creature sense. I just want the creature sense to get in, for the reasons stated in this edit and the OP. Come to think of it, in fact, Pokémon, Digimon, Fraggle, and Muppet are all names of creatures/species, while She-ra, Spider-man, and Winnie-the-Pooh are names of individual characters as well as franchises. My favorite guideline at Wikipedia is Use common sense, and that should be observed in drafting WT:FICTION at Wiktionary as well. And considering that we have Star Trek and Star Wars: maybe multi-word entries are permissible for really big franchises, or at least those that have gained extended metaphorical use, but not for every TV show, every toy, and every book under the sun? I'd be open to that as a tweak to my original tweak. Would you be OK with an entry for, say, Game of Thrones or Harry Potter? Khemehekis (talk) 04:58, 12 October 2018 (UTC)
How about an additional requirement that the franchise must have, say, three associated lemmas that must have already made it into Wiktionary by the old rules? Basically to stop the flood gates being opened too wide. e.g. Pokémon would pass this test thanks to the above mentioned Pikachu, Pokédollar, and Eevolution. But not every random show would make it in just by narrowly fitting the other criteria. Pengo (talk) 11:17, 16 October 2018 (UTC)
Support this idea. That's a good objective criterion to have, I think. That would justify the inclusion of major franchises like Star Wars (lightsaber, Wookiee, Jedi) without having to make exceptions to the rules. Andrew Sheedy (talk) 14:31, 16 October 2018 (UTC)
Pengo: That's brilliant! I guess the fictional universes at Category:en:American fiction, for instance, would be able to get in, without having to admit Shirt Tales or The Get-along Gang as entries. Khemehekis (talk) 22:59, 16 October 2018 (UTC)
It's been a month since Pengo came up with this idea. Any other supporters? Khemehekis (talk) 01:12, 19 November 2018 (UTC)
Oppose. I would rather not, instead linking terms which are inherently encyclopedic (like Game of Thrones) to Wikipedia. - TheDaveRoss 14:36, 19 November 2018 (UTC)

Adding translation boxes in the entries for Chinese character components

The entries for Chinese character components lack translations, which I think would be useful --Backinstadiums (talk) 11:37, 14 October 2018 (UTC)

As long as we cite the source, especially well-known resources such as The Chinese Language: Fact and Fantasy, it only adds to the entries. --Backinstadiums (talk) 15:21, 17 October 2018 (UTC)

Emoji

Everyone's favourite troll, Wonderfool, had been putting emoji on a few pages before (s)he was blocked. Asking on WF's behalf, can anyone think of a valid reason why not to include the emoji on the corresponding pages? --XY3999 (talk) 12:22, 15 October 2018 (UTC)

They aren't words in a language. Equinox 18:26, 15 October 2018 (UTC)
Nonetheless, they are included in Wiktionary, so it doesn't seem out of place for them to be in a "See also" section. Andrew Sheedy (talk) 19:36, 15 October 2018 (UTC)
Many emojis represent feelings and emotions that can be translated objectively into other languages, such as "I feel happy". —Stephen (Talk) 04:10, 16 October 2018 (UTC)
Neither is r or ;. We should definitely include all Unicode characters. —Justin (koavf)TCM 04:36, 16 October 2018 (UTC)
This isn't about having entries for emoji, this is about linking to those entries from non-emoji pages. The current practice is to not link to other languages except in etymologies, descendants sections and translation tables. I think the argument is that linking to something that isn't English from an English section would go against that practice. Chuck Entz (talk) 06:39, 16 October 2018 (UTC)
But translingual terms get links in definitions (taxonomic names of organisms …). So maybe we can use emojis in definition lines: “A domestic fowl, Gallus gallus, 🐔, especially when young”. – “The meat from this bird eaten as food, 🍗 of 🐔.” Fay Freak (talk) 07:09, 16 October 2018 (UTC)
Ugh, I hope you're joking! I think a "see also" section is the only appropriate place for them. Andrew Sheedy (talk) 17:59, 16 October 2018 (UTC)
Emojis aren't a language. comma links to ,. —Justin (koavf)TCM 07:53, 16 October 2018 (UTC)
@Chuck Entz But emojis are used in English as if they were English, which is the key difference. Andrew Sheedy (talk) 17:59, 16 October 2018 (UTC)

What was the argument supporting the addition of pictures to entries, for example in house? --Backinstadiums (talk) 20:49, 16 October 2018 (UTC)

Maybe that those pictures illustrate and are pretty and situated almost only on the edges of pages whereas emojis are ugly, often hard to read and intrusive wherever you put em. Fay Freak (talk) 21:09, 16 October 2018 (UTC)
Pictures in entries serve a clear purpose: I can see the definition of a word and now I want to know what the referent looks like. "A picture tells a thousand words." The emoji entry question is quite different: having non-word entries for pictures merely because those pictures are sanctioned by Unicode (which has lost its way as a project recently, with skin colour modifiers etc.). Equinox 22:25, 16 October 2018 (UTC)
@Equinox: The issue at hand is adding emojis to entries as referents, just as pictures are, not giving them entries of their own. The only reason to ban adding them, thereby sanctioning a lexicographic principle, would be that actual pictures are a "better" linguistic referent. --Backinstadiums (talk) 23:03, 16 October 2018 (UTC)
I don't think we should include emoji or other non-words, and we should definitely not include all of Unicode. We're a dictionary, not a Unicode database. —Rua (mew) 11:09, 17 October 2018 (UTC)
Should we create a vote to ban all Unicode characters from Wiktionary (mainspace) that have the “Emoji” property? Serious question. It would be great to have such a clear stance against emoji. After all we are the guardians of the words, in natural opposition to idiocracy, and so not only would it save work from the encroachment from ever swelling emoji data but we would get media attention as stalwart conservatives and thus an influx of much needed new editors (to which it does not harm that I have mentioned this calculation already). No statement is so far made from my side about kaomojis like (´・ω・`), but those are a dangerous can too. Note that other websites document emojis, kaomojis and so on better anyway, hence we should not be sensitive about a motion for a ban. Fay Freak (talk) 11:43, 17 October 2018 (UTC)
I wouldn't want to get rid of them altogether, but I would support moving them to an Appendix. Andrew Sheedy (talk) 17:51, 17 October 2018 (UTC)
Of course. I explicitly talked about the mainspace. Having an appendix listing them is different. But I claim it is not worth having pages for each as these are of utmost shortness. Fay Freak (talk) 19:13, 17 October 2018 (UTC)
What about emoji which have a broad usage other than their nominal meaning. I am thinking of eggplant (🍆). - TheDaveRoss 18:27, 17 October 2018 (UTC)
We can’t add vulgar hand gestures either, such is appendix matter – emojis are somewhat of a 2D equivalent of hand gestures. Is ”paralinguistic” the correct word? Also we are not Know Your Meme. I have just found Category:Gestures, interesting. Totally underdeveloped also – I think we could link to hand gesture entries from the mainspace only in “see also” at the most, which is deniable too, whereas we would not like even that much to see emojis in “See also” sections, for emojis are trifling and excessive in variation at the same time. Fay Freak (talk) 19:13, 17 October 2018 (UTC)
We do have flip the bird, flip off, flick off, air quote and scare quotes, to name a few gestures. Emoji are ideograms, and there is plenty of precedent for including those (see e.g. hieroglyphics and Chinese); however, the degree to which any particular emoji has entered any particular language is open to interpretation. Some have become very common and have a widely agreed-upon meaning (e.g. smiley face) whereas others have little to no usage. - TheDaveRoss 14:17, 18 October 2018 (UTC)
We don't have the action "flip the bird" (or any of the others). We have the words used to describe the actions (just as we don't have an image of someone running, but we have the verb "run"). We do, however, include gestures themselves in appendices, and I don't think there's an issue with doing the same for emoji. Andrew Sheedy (talk) 17:54, 18 October 2018 (UTC)
But even if they are put in an appendix, what are they doing in a dictionary? Are we going to include things like road signs, TV/audio player button symbols, "emergency exit" signs, etc etc? All of these carry meaning, and I don't see any difference between them and emoji. What distinguishes an emoji of an eggplant from a JPEG photo of an eggplant? —Rua (mew) 17:59, 18 October 2018 (UTC)
Because if I'm texting or writing an email to a friend, I'm probably going to use some sort of emoji. Signs or symbols on buttons are not used in running text. I use emojis to convey a particular meaning. They're a sort of punctuation that indicates the tone of my text/email. Andrew Sheedy (talk) 18:09, 18 October 2018 (UTC)
But what would distinguish an emoji of an eggplant from a JPEG photo of an eggplant? —Rua (mew) 18:11, 18 October 2018 (UTC)
It's tough to use a JPEG in running text. For the record, I would support excluding emojis that are not attestable (although if we're putting them in appendices, I'd have no strong feelings either way). Andrew Sheedy (talk) 00:42, 19 October 2018 (UTC)
  • Like it or not, emojis convey meaning and thus should be included. Purplebackpack89 18:10, 18 October 2018 (UTC)
    • Can we exclude you, then? —Rua (mew) 18:12, 18 October 2018 (UTC)

@TheDaveRoss What you have mentioned, as Andrew Sheedy has meanwhile noted, are the names of gestures, and as such they are in the dictionary. I mean gestures as such, independent of any name. Posting an image file of a meme, an emoticon, a sticker or an emoji is similar to making a gesture (which of these is used depends on the platform: on Windows Live Messenger it was mostly emoticons, no emojis yet; on Telegram Messenger there are stickers, image files and emojis; imageboards revolve around images …). We don’t add rage comics, not only because a page title cannot be an image and not only because of copyright, but because what they are is out of scope. Currently emojis are included only because they are handled technically like words, while by their conversational nature they are something apart, so we shouldn’t have emojis either. Emojis are just memes in Unicode; the deciding criterion for including something in Unicode is that it is used in text, except for emoji submissions, which lack this criterion. If not for copyright, the Unicode Consortium would include variations of Pepe the Frog, NPC Wojak, Cereal Guy and whatnot, and I am not even sure that I exaggerate. (Literal gestures are by nature in a grey area as to whether they are language, but they are technically more difficult to add here.) Fay Freak (talk) 18:15, 18 October 2018 (UTC)

  • I think we can get a small nonthreatening emoji box that can link, like a normal picture. --XY3999 (talk) 13:29, 22 October 2018 (UTC)
    • I tried to do something clever with [[Image:Emoji_u1f414.svg|50px|right|thumb|text=This is the chicken emoji, find an entry at [[🐔]]|]] but I failed. Also, I added an emoji to [[chicken]] for fun. --XY3999 (talk) 13:41, 22 October 2018 (UTC)
      • Yeah, I tried another one at aubergine. I wanted to have it so when you click on the picture you go to the aubergine emoji, but that apparently simple wikiformatting is too advanced for me. --XY3999 (talk) 21:09, 22 October 2018 (UTC)

Amazing Feature that Exists in other Chinese-English Online Dictionaries that We Lack: 田+女 ---> 㚻

Scene: I sit in my room scrolling through WeChat on my phone and see the character 㚻 in an online joke, and I want to search for it on en.wiktionary. I can't type it in via my pinyin software, so I put 田 and 女 in the search bar like this: 田女. It was the sixth result in a list of hundreds. I don't even scroll down to the sixth result at first, because I expect that the results are all going to be totally unrelated. But I go to ctext.org: [1] I just type 田+女 into the search bar and BOOM there it is, the two relevant characters: no fuss, no muss. I think to myself, this is an amazing feature that exists in another Chinese-English online dictionary (and therefore seems technically feasible to implement here) that we are lacking. I imagine: this dictionary will one day be better than the dictionary at ctext.org, and this is one of the things that we might want to find a way to implement within the existing framework here. I open the beer parlor page and type my screed. Perhaps this 功能 (feature) already exists in some fashion but I don't know of it? --Geographyinitiative (talk) 14:56, 16 October 2018 (UTC)

@Geographyinitiative: check the section "consists of" in this link. --Backinstadiums (talk) 20:55, 16 October 2018 (UTC)

A bit crude, but you can try:

Wyang (talk) 21:18, 16 October 2018 (UTC)

@Geographyinitiative: CHISE IDS Find might be better and more comprehensive. —Suzukaze-c 02:59, 18 October 2018 (UTC)
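A component search like the one on ctext.org, or CHISE IDS Find, comes down to looking up characters whose decomposition contains every queried component. The sketch below shows only that lookup step. The three-entry `COMPONENTS` table and the helper names are hypothetical. A real tool would load full Ideographic Description Sequence data (for example from CHISE) instead of relying on a hand-written dict.

```python
# Hypothetical component table: character -> set of its components.
# Only three entries, for illustration; real data would come from an IDS database.
COMPONENTS: dict[str, set[str]] = {
    "㚻": {"田", "女"},
    "男": {"田", "力"},
    "好": {"女", "子"},
}


def parse_query(query: str) -> set[str]:
    """Split a query such as '田+女' into a set of components."""
    return {part.strip() for part in query.split("+") if part.strip()}


def find_by_components(query: str) -> list[str]:
    """Return characters whose components include every component in the query."""
    wanted = parse_query(query)
    return sorted(ch for ch, parts in COMPONENTS.items() if wanted <= parts)


if __name__ == "__main__":
    print(find_by_components("田+女"))  # ['㚻']
```

Because the query is treated as a set that must be contained in each character's components, adding more components only narrows the results. This matches the "no fuss, no muss" behaviour described above, in contrast to a full-text search for the string 田女.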

Optional embolding parameter in Template:suffixusex

The user Rua decided to delete an optional embolding parameter I've added to the Template:suffixusex, claiming it was "inconsistent" (with what?).

May I remind everyone of the fact that the "Wiktionary:Entry layout#Example sentences" states that "Example sentences should" "be italicized, with the defined term boldfaced". --ARBN19 (talk) 10:58, 17 October 2018 (UTC)

Here was the desired (optional) result:

gir- (to enter) + -iş → giriş (entrance)

--ARBN19 (talk) 11:17, 17 October 2018 (UTC)

Firstly, it is inconsistent with usage elsewhere on Wiktionary. If the suffix should be bold, then it should be bold in every language, so why make it a parameter? Secondly, suffixes are not example sentences, so they have their own conventions. It may not always be possible to bold the suffix, because it may be obscured by morphological processes that occur as part of the suffixation, which makes it unclear which part of the word is the suffix and which part isn't. {{suffixusex}} was specifically made to not bold the suffix because of this. —Rua (mew) 11:01, 17 October 2018 (UTC)
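Here is a minimal sketch of the problem with automatic bolding. The hypothetical `bold_suffix` helper is not part of {{suffixusex}}. It bolds the suffix only when the suffix appears verbatim at the end of the derived word, and otherwise leaves the word unbolded. Turkish vowel harmony is one process that breaks the naive match: the suffix -lik surfaces as -lük in gözlük.

```python
def bold_suffix(result: str, suffix: str) -> str:
    """Hypothetical helper: wrap the suffix in wikitext bold (''' ''') only
    when it appears verbatim at the end of the derived word."""
    bare = suffix.lstrip("-")
    if bare and len(result) > len(bare) and result.endswith(bare):
        return f"{result[:-len(bare)]}'''{bare}'''"
    # The suffix is not visible verbatim (e.g. altered by vowel harmony),
    # so it is safer not to bold anything.
    return result


if __name__ == "__main__":
    print(bold_suffix("giriş", "-iş"))    # gir'''iş'''
    print(bold_suffix("gözlük", "-lik"))  # gözlük
```

Even this fallback only avoids wrong bolding. It cannot recover which part of gözlük corresponds to -lik, which is the difficulty described above.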

Classical Nahuatl possessive forms

@Marrovi,@Lvovmauro I propose that we use the he/she/it form to demonstrate possessive Classical Nahuatl noun forms. Aearthrise (𓂀) 12:35, 21 October 2018 (UTC)

I'm fine with that. --Lvovmauro (talk) 06:23, 22 October 2018 (UTC)

Proposed new bot-generated list: Template:desc or Template:desctree with an invalid ancestor

@DTLHS Something we could perhaps create a list for: any uses of {{desc}} or {{desctree}} that have a parent within the list of descendants (or, if there is no parent, the language of the entry itself) that is not a valid ancestor of the descendant language (determined the way {{inh}} does it). Of course, if there is a parameter that places an arrow before the term, it's OK, so this should only catch uses of the templates without any of those parameters. I expect to find especially Latin terms in this list, but we'll see. The bot could perhaps also be adapted to handle raw descendants that don't use {{desc}}, but that may be harder and involve more parsing. —Rua (mew) 20:25, 21 October 2018 (UTC)

I barely found any, but maybe I misunderstood something. User:DTLHS/cleanup/descendant ancestors. DTLHS (talk) 21:56, 21 October 2018 (UTC)
Yeah, you shouldn't include descendants that have bor=1, because there is no inheritance then. I don't know why there are so few listed though. —Rua (mew) 09:55, 22 October 2018 (UTC)
At the very least, Danish should appear in the list for capio, because Latin is not an ancestor of Danish and Danish is not marked as being borrowed. On fluo, Esperanto and Ido should appear in the list, for the same reason. —Rua (mew) 09:58, 22 October 2018 (UTC)
I see, I didn't understand your second condition. I'll update it later. DTLHS (talk) 16:27, 22 October 2018 (UTC)
I think I don't understand how to construct the language ancestor tree. French for example, has ancestor frm which has ancestor fro, but fro doesn't have ancestors so how do I get to Latin? DTLHS (talk) 01:16, 23 October 2018 (UTC)
See the getAncestors method in Module:languages. When there aren't ancestors in the language's data table, you have to step through the families to which the language belongs and get their proto-languages. — Eru·tuon 01:30, 23 October 2018 (UTC)
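The lookup described here can be sketched roughly as follows. This is a simplified Python illustration, not a port of Module:languages. The `LANGUAGES` and `FAMILIES` tables are hypothetical, much smaller than the real data, and not identical to it. The function names are also illustrative. The key step is the fallback: when a language has no explicit ancestors, walk up its family chain until a family's proto-language is found.

```python
from collections import deque

# Hypothetical, simplified data for illustration only (not Module:languages).
LANGUAGES = {
    "fr": {"ancestors": ["frm"], "family": "roa"},
    "frm": {"ancestors": ["fro"], "family": "roa"},
    "fro": {"ancestors": [], "family": "roa"},  # no explicit ancestors
    "la": {"ancestors": [], "family": "itc"},
}
# family code -> (proto-language code or None, parent family code or None)
FAMILIES = {
    "roa": ("la", "itc"),
    "itc": (None, None),
}


def direct_ancestors(code: str) -> list[str]:
    """Explicit ancestors if present, else the nearest family proto-language."""
    data = LANGUAGES[code]
    if data["ancestors"]:
        return list(data["ancestors"])
    family = data["family"]
    while family is not None:
        proto, parent = FAMILIES[family]
        # Skip a family whose proto-language is the language itself.
        if proto is not None and proto != code:
            return [proto]
        family = parent
    return []


def all_ancestors(code: str) -> list[str]:
    """Breadth-first walk over direct_ancestors, nearest ancestor first."""
    result: list[str] = []
    queue = deque(direct_ancestors(code))
    while queue:
        anc = queue.popleft()
        if anc not in result:
            result.append(anc)
            queue.extend(direct_ancestors(anc))
    return result


def is_ancestor(candidate: str, code: str) -> bool:
    return candidate in all_ancestors(code)


if __name__ == "__main__":
    print(all_ancestors("fr"))  # ['frm', 'fro', 'la']
```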
OK, I think I got it now- updated with the first 50,000 entries. DTLHS (talk) 01:53, 23 October 2018 (UTC)
Wow, that's a lot! I think it'll take a while to fix all of these. Any help from anyone reading this would be appreciated! —Rua (mew) 10:22, 23 October 2018 (UTC)
Also, @DTLHS I think your bot may have made some errors at language in the Old French section. The bot has included English in the list, but English is given here as a descendant of Middle English, which should be totally fine. —Rua (mew) 10:24, 23 October 2018 (UTC)
OK, I filtered out any nested descendants. DTLHS (talk) 04:35, 25 October 2018 (UTC)
I don't know if they should be filtered out altogether. Rather, they should be filtered using the list item that is given as the parent. In this case, English was given as a descendant of Middle English, which is valid, but if French were listed as a descendant of English in the list for example, then that would be an error, unless it was indicated as borrowed. —Rua (mew) 12:24, 25 October 2018 (UTC)
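Rua's refinement could look roughly like the sketch below. Each descendant is checked against the list item it is nested under, and against the entry's own language only at the top level. Anything marked as borrowed is skipped. The precomputed `ANCESTORS` sets and the `Desc` structure are hypothetical stand-ins for what a bot would derive from Module:languages and from parsing {{desc}}/{{desctree}} calls. `bor` stands in for bor=1 or any other parameter that puts an arrow before the term.

```python
from dataclasses import dataclass, field

# Hypothetical precomputed ancestor sets (hand-written, abbreviated).
ANCESTORS: dict[str, set[str]] = {
    "en": {"enm", "ang"},
    "enm": {"ang"},
    "fr": {"frm", "fro", "la"},
    "da": {"non"},
}


@dataclass
class Desc:
    """One descendant line; `bor` stands in for bor=1 or any arrow parameter."""
    lang: str
    bor: bool = False
    children: list["Desc"] = field(default_factory=list)


def find_invalid(entry_lang: str, descendants: list[Desc]) -> list[tuple[str, str]]:
    """Return (parent, child) pairs where the parent is not an ancestor of the
    child and the child is not marked as borrowed. Top-level items are checked
    against the entry's language, nested items against their list parent."""
    problems: list[tuple[str, str]] = []

    def walk(parent: str, items: list[Desc]) -> None:
        for d in items:
            if not d.bor and parent not in ANCESTORS.get(d.lang, set()):
                problems.append((parent, d.lang))
            walk(d.lang, d.children)

    walk(entry_lang, descendants)
    return problems


if __name__ == "__main__":
    # Latin entry listing Danish without bor=1: flagged.
    print(find_invalid("la", [Desc("da")]))  # [('la', 'da')]
    # Old French entry: Middle English (assumed marked as borrowed) with
    # English nested under it: not flagged.
    print(find_invalid("fro", [Desc("enm", bor=True, children=[Desc("en")])]))  # []
```

Nested items are still visited under a borrowed parent. That way an invalid line such as French listed under English is caught wherever it appears.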

Copyright status of Category:Esperanto 9OA

The category Category:Esperanto 9OA contains the words from the work "Naŭa Oficiala Aldono al la Universala Vortaro" ("Ninth Official Addition to the Universal Dictionary"), a work released by the Academy of Esperanto in 2007. There is no indication on this page containing the work that it is freely usable under our license, so this category should probably be deleted. This list of words, which is directly copied from the aforementioned page, prevents the copyright holder from fully benefiting from their work. Robin van der Vliet (talk) (contribs) 19:36, 22 October 2018 (UTC)

The theory is that lists are not copyrightable, perhaps @BD2412 might want to weigh in. It may be appropriate to reference the source of the list, however. - TheDaveRoss 19:45, 22 October 2018 (UTC)
Lists of facts existing in the real world are not subject to copyright protection because anyone can assemble such a list by researching the field from which the facts are derived. However, since Esperanto is a manufactured language, if this is nothing more than a list of words newly coined by the manufacturers of that language, it is probably subject to copyright protection. There are probably other ways to approximately convey such information, such as categories for Esperanto words coined by decade, without reference to a particular work or release. bd2412 T 20:33, 22 October 2018 (UTC)
The words on that list are not copyrighted, they have been in use for a long time. For such a long time, that AdE has decided to add them to their official dictionary. That dictionary, called the "Universala Vortaro", has such a size that it has copyright protection. Listing them all together systematically in a category is a copyright infringement. Robin van der Vliet (talk) (contribs) 20:59, 22 October 2018 (UTC)
Lmaltier (talk • contribs) presented a very similar problem on this page:
"The list of words (nomenclature) of a dictionary should normally be considered as copyrighted, because it's the result of a huge selection work, and because customers may buy a dictionary only to check the presence or not of words (arbitration of a discussion, or dictionary used as a referee for word games). If such a list is copied, it's unfair competition, because there will be fewer customers for the copied dictionary."
Robin van der Vliet (talk) (contribs) 06:29, 23 October 2018 (UTC)
The full word list, perhaps. A very small subset of such words would obviously not constitute competition on those grounds since absence from such a list would not suffice to settle those conversations. It has become common for major dictionaries to release the lists of words they include in addenda to the press for publication. While I am not concerned about this being fodder for lawsuit or immoral, I have no problem removing this category since I don't find it to be particularly of use. - TheDaveRoss 12:36, 23 October 2018 (UTC)
This is a clear violation of copyright law; words were copied from a dictionary that invented words in a fake language; there is no way around this, we must delete the category. Aearthrise (𓂀) 15:01, 24 October 2018 (UTC)
First of all, @Aearthrise Esperanto is a real language, I speak it fluently and I know a lot of people who speak it. I care about Esperanto, but I also care about free projects like Wiktionary.
@TheDaveRoss A small subset of such a word categorization would be completely useless. The category Category:Esperanto 9OA invites people to complete it, and it also invites the creation of Category:Esperanto 8OA, Category:Esperanto 7OA, etc, until Category:Esperanto 1OA. This category would only be useful, if we have categories for all Official Additions to the Universal Dictionary.
In the future I would like to send a request to the Academy of Esperanto, asking them to publish all their publications under the public domain, so that it can be included here and in the future also on Wikidata. Robin van der Vliet (talk) (contribs) 22:56, 24 October 2018 (UTC)
Even if the category were complete and all of the categories you mention were complete, I still do not think it would be a copyright violation. Regarding your efforts to encourage the governing body to release their documents more permissively, that seems like a great idea. - TheDaveRoss 13:07, 25 October 2018 (UTC)
As far as I can see, the category is not a list of all words of the Ninth Official Addition, but only lists the ones for which we have an entry, half the number, extracted automatically from the fact that we have marked these entries as not only approved by the AdE, but also specifically stemming from that official edition. I don’t think anyone in their right mind would interpret that as a copyright violation.  --Lambiam 11:43, 25 October 2018 (UTC)
Today somebody showed me this page, at which the Academy of Esperanto declared in 2011 that all official publications are dual-licensed under CC BY-SA 3.0 and GPLv2. This is not declared at their main page, only on one subpage that I didn't see before. I asked a member of the academy months ago about the license status, and nothing came from it. I showed that specific academician that page, and they also didn't see it before and were also surprised... It appears that not all academicians even know about that page. Well anyway, I guess that this discussion can be closed now. The lists are free. Robin van der Vliet (talk) (contribs) 15:49, 25 October 2018 (UTC)

Request for bot permission

I would like to create pages with my bot for alternative orthographies of Esperanto. I detailed my proposal here. The 2 alternative orthographies I described are used and recognized by the language community. Robin van der Vliet (talk) (contribs) 23:08, 24 October 2018 (UTC)

-- “陶行行知”

There is a problem with zh-x that is causing a duplication of 行 here when I add the dash between 行 and 知. I don't think I'm making any mistakes. --Geographyinitiative (talk) 02:06, 26 October 2018 (UTC)

You can see a similar, probably related problem in the second quotation for , sense #4. Maybe more an issue for the Grease pit.  --Lambiam 09:43, 26 October 2018 (UTC)
moved conversation there --> [2]--Geographyinitiative (talk) 23:46, 26 October 2018 (UTC)

Pinyin for phonetic annotation

Victor Mair's last post on Language Log asks for help using Wikipedia's Pinyin for phonetic annotation (Template:Ruby-zh-p). Can it be used in Wiktionary entries? --Backinstadiums (talk) 02:23, 28 October 2018 (UTC)

The template Template:Ruby-zh-p in Wikipedia is not smart enough: you have to provide each character with its pinyin, and it's a lot of work. It's not automated. The result looks good, but it would take some effort to do it for longer texts. Not really a "wonderful tool", as the blogger says. Ruby has been implemented well for Japanese at Wiktionary, and it's also considered a common practice for Japanese; see Template:ja-usex/documentation or Template:ja-r/documentation. One needs to provide the kana spellings and spacing, and use a few tricks to give accurate transliterations.
For Chinese, ruby has not been that popular, and besides, you can see only one version, simplified or traditional, not both. See Template:zh-x/documentation for the best way to show Chinese usage examples, IMO. The blogger can try annotation tools such as Chinese Annotation Tool, but you should know that neither our templates nor the online annotation tools can give 100% accuracy. You have to provide the accurate spacing, work with irregular or less common readings and at times provide the correct simplified variant where there are variants. Wiktionary's modules and templates can be exported to Wikipedia, but I don't think they can be used elsewhere. --Anatoli T. (обсудить/вклад) 08:52, 28 October 2018 (UTC)
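The point above is that the ruby markup itself is trivial. The hard part is supplying the correct reading for every character. A minimal sketch makes this concrete: assuming the readings are already known and aligned one per character, HTML `<ruby>` markup can be generated mechanically. The `ruby` helper is hypothetical and is not how Template:Ruby-zh-p is implemented.

```python
from html import escape


def ruby(text: str, readings: list[str]) -> str:
    """Hypothetical helper: pair each character with one reading and emit
    HTML <ruby> markup. Raises ValueError if the counts do not match."""
    if len(text) != len(readings):
        raise ValueError(f"{len(text)} characters but {len(readings)} readings")
    return "".join(
        f"<ruby>{escape(ch)}<rt>{escape(py)}</rt></ruby>"
        for ch, py in zip(text, readings)
    )


if __name__ == "__main__":
    print(ruby("汉字", ["hàn", "zì"]))
    # <ruby>汉<rt>hàn</rt></ruby><ruby>字<rt>zì</rt></ruby>
```

Nothing in this sketch chooses readings. Polyphonic characters, less common readings and simplified/traditional variants still have to be resolved by hand or by a separate annotation step, which is where the accuracy problems mentioned above come from.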

attributive -> relational for describing relational adjectives in Russian

(Notifying Atitarev, Cinemantique, KoreanQuoter, Useigor, Wanjuscha, Wikitiki89, Stephen G. Brown, Per utramque cavernam, Guldrelokk, Fay Freak): Wiktionary Russian entries use the term "attributive" for adjectives that stand in the place of nouns when modifying other nouns. For example, English can simply say "water wheel" or "processor architecture" or "Night Wolves", with one noun in apposition to another, but Russian needs to use a special adjectival form of the modifying noun, e.g. водяно́е колесо́ (vodjanóje kolesó) (cf. вода́ (vodá, water)), проце́ссорная архитекту́ра (procéssornaja arxitektúra) (cf. проце́ссор (procéssor, processor)), or ночны́е во́лки (nočnýje vólki) (cf. ночь (nočʹ, night)). (Note that English has such adjectives too, e.g. cellular vs. cell and senatorial vs. senator, but their use isn't mandatory, and many such terms have a decidedly literary or obscure flavor -- e.g. bovine vs. cow, apian vs. bee.) These adjectival forms are separate lexical entries, which are normally defined e.g.

  1. (attributive) processor

However, this terminology doesn't appear to be standard; instead, they are normally called "relational adjectives" in the literature, and "attributive" has a different meaning (in a directly modifying position, as opposed to predicative). I'm thinking of defining a new label "relational" to mark such adjectives, which classifies the term into e.g. Category:Russian relational adjectives and links to a glossary entry clarifying how such terms work, and using a bot to replace all "attributive" labels on Russian adjectives with "relational". Thoughts? Benwing2 (talk) 22:41, 28 October 2018 (UTC)

@Benwing2: by "they are normally called "relational adjectives" in the literature", do you mean the English-written literature about Russian? Per utramque cavernam 22:50, 28 October 2018 (UTC)
It’s indeed confusing to call those adjectives attributive if what you want to say is “relational”. But the editors who use that label do so because the English glosses belong to other parts of speech (an example I picked from your edit history, since you haven't named any: древесноуксуснокислый (drevesnouksusnokislyj)). I would not place any labels there; glosses do not generally need to be of the same part of speech, particularly in this situation, where an adjective would be part of a compound noun in English. The POS header and the form of the word (the ending -ый) are enough; nobody could reasonably mistake the adjective for a noun. Often in Arabic entries I just mechanically add -related to the gloss so that the gloss is an adjective, or write “related to X” as is done at عَقَارِيّ (ʿaqāriyy). Others write “pertaining to X”, as at جِهَادِيّ (jihādiyy). Compare حَدِيدِيّ (ḥadīdiyy), created by Benwing, where the gloss is just “iron”.
tl;dr: You do not need any labels where you think them. Fay Freak (talk) 22:59, 28 October 2018 (UTC)
@Per utramque cavernam Yes, in the English literature. The Russian equivalent appears to be относи́тельное прилага́тельное, which is a pretty direct equivalent. Benwing2 (talk) 23:39, 28 October 2018 (UTC)
@Benwing2: I have no strong opinion on the English terminology or labels to be used. --Anatoli T. (обсудить/вклад) 01:57, 29 October 2018 (UTC)
@Fay Freak I think it's important to put some sort of explanatory label or text if the definition is a part of speech other than the headword. I find it very confusing to have e.g. процессорный defined as just "processor" with no label. It wouldn't be obvious to me, encountering such a thing, that it's a relational adjective and is used in the attributive position (hence the Wiktionary label), modifying another noun. I would just conclude it's a mistake (not unusual in Wiktionary :-( ...). BTW your example of "iron" isn't probative because "iron" is defined as an adjective in English as well (although I would in fact put a "relational" label here, just to make it clearer). Benwing2 (talk) 02:12, 29 October 2018 (UTC)
BTW I don't like 'pertaining to X' or 'related to X' or similar sorts of definitions, because that's not the usual translation in English. It also makes the term sound quite formal, since the wording is so formal, which isn't the case. Benwing2 (talk) 02:14, 29 October 2018 (UTC)

Changing Wastek language name to Huastec

I would like to change the name of the label of Wastek Mayan to Huastec Mayan: 1. Huasteca is the region it comes from, 2. orthography mentioning the language spells it as Huastec, and 3. the Wiktionary article names it "Huastec language". Aearthrise (talk) 16:25, 29 October 2018 (UTC)

I second this. Huastec is an English word, derived from a Spanish word, derived from a Nahuatl word. There's no reason to spell it as if it's a Mayan word. --Lvovmauro (talk) 00:01, 30 October 2018 (UTC)
I third this. Frankly, I've never seen the spelling "Wastek" before. Khemehekis (talk) 04:15, 3 November 2018 (UTC)
It's the name of a couple of refuse-related businesses... This would seem to indicate that Huastec is still in use by scholars. It does seem odd that anyone would bother to make this non-Mayan name look superficially Mayan when its speakers call it Téenek, instead. Nonetheless, we should go by whatever is the predominant one in actual usage. However this turns out, it would be a good idea to add Teenek to the list of names (and Huastec, for that matter), and to create English entries for all the names that are in use (note the redlinks). Chuck Entz (talk) 06:13, 3 November 2018 (UTC)

The Community Wishlist Survey

11:05, 30 October 2018 (UTC)

Anyone in favor of a lexicographer's workbench? An example of a tool for such a workbench would be a way of extracting snippets of occurrences of words and sorting by collocated words within 1,2,3,4,5, etc. words/tokens, then grouping by synonym groups, then by role groups. This would enable us to do primary lexicographic work on polysemic words. It would also make us more attractive for students of linguistics, which might enable us to recruit more talent. It would greatly enhance our ability to mine whatever corpora we could get the required access to. DCDuring (talk) 21:10, 30 October 2018 (UTC)
Support. Andrew Sheedy (talk) 18:26, 31 October 2018 (UTC)
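As a rough illustration of the first step of the proposed workbench, the sketch below extracts snippets around a word and collects its collocates within a window of N tokens, ranked by frequency. The tokenizer is a simple regular expression, and the function names are hypothetical. Grouping the results by synonym groups or role groups would need additional resources (such as a thesaurus or tagged corpus) and is not shown.

```python
import re
from collections import Counter


def collocates(text: str, target: str, window: int = 3) -> list[tuple[str, int]]:
    """Count tokens within `window` positions of each occurrence of `target`,
    most frequent first (ties keep first-seen order)."""
    tokens = re.findall(r"\w+", text.lower())
    counts: Counter[str] = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts.most_common()


def snippets(text: str, target: str, window: int = 3) -> list[str]:
    """Keyword-in-context snippets: `window` tokens on each side of `target`."""
    tokens = re.findall(r"\w+", text.lower())
    return [
        " ".join(tokens[max(0, i - window): i + window + 1])
        for i, tok in enumerate(tokens)
        if tok == target
    ]


if __name__ == "__main__":
    sample = "The bank of the river. The bank lent money. A river bank."
    print(collocates(sample, "bank", window=1))
    # [('the', 2), ('of', 1), ('lent', 1), ('river', 1)]
    print(snippets(sample, "bank", window=1))
    # ['the bank of', 'the bank lent', 'river bank']
```

Running the same counts with window sizes 1, 2, 3 and so on gives the per-distance views mentioned in the proposal. On a real corpus, one would also filter out function words such as "the".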