Wiktionary:Beer parlour/2020/July

Etymological morphology

The etymology of uncomeatable reads un- + come at + -able. Is there any policy about how to better express the morphological order [un- + [come at + -able]]? --Backinstadiums (talk) 09:36, 1 July 2020 (UTC)

Well, we could write {{af|en|un-|comeatable}}. —Mahāgaja · talk 09:40, 1 July 2020 (UTC)[reply]
(e/c) I agree with Mahagaja here, although in at least some cases English words of the form un-X-able are formed directly from X, not from X-able, as shown by X-able being rare or nonexistent (can't think of examples off the top of my head). Benwing2 (talk) 06:16, 2 July 2020 (UTC)[reply]
How did you manage to get an edit conflict with me 20 hours later?? Anyway, I don't see the problem with using {{af|en|un-|X|-able}} even in cases where X-able does exist. —Mahāgaja · talk 07:37, 2 July 2020 (UTC)[reply]

Templates for intensity of adjectives and nouns

I've been thinking about this for a little bit and I wonder if anyone else thinks it's a good idea. It seems like it could be handy to have a template that shows increasing levels of intensity of a word. E.g. something like this:

dry → damp/moist → drenched/soaked → saturated

Or

irritated → annoyed → angry/mad → enraged/furious

The latter could also have a noun equivalent like:

irritation → annoyance → anger → fury/rage

Does this seem useful? —Justin (koavf)TCM 06:15, 2 July 2020 (UTC)[reply]

Where would this information go in an entry? It also seems extremely subjective. Who says "saturated" is "more wet" than "soaked"? Is there a mathematical formula? DTLHS (talk) 16:44, 2 July 2020 (UTC)[reply]
I imagine it at the end, immediately before headers for other languages or near thesaurus entries for words that have them. —Justin (koavf)TCM 00:55, 3 July 2020 (UTC)[reply]
Isn't this just coordinate terms with an ordering? That heading seems like a good location for this to me. DTLHS (talk) 01:03, 3 July 2020 (UTC)[reply]
@DTLHS: Yes, these are coordinate terms, for sure. That's a great way to frame it. —Justin (koavf)TCM 04:44, 3 July 2020 (UTC)[reply]
Since these are subjective I don't think a template is appropriate: they do not fit into e.g. a formal grammatical model like -er and -est. Perhaps we could do something on the Thesaurus pages. Equinox 16:50, 2 July 2020 (UTC)[reply]
Sure but lots of language is subjective. Agreed that this is similar to thesaurus information but aren't those pages subjective, too? The goal would be to help navigate without going to a different page/namespace. —Justin (koavf)TCM 00:55, 3 July 2020 (UTC)[reply]
Thesaurus pages are sort of woolly by design: here are words that mean something like the other word, but with a different tone (disapproving?) or different register (slang? scientific?) etc. Templates (to me) work best for things that are not woolly but specific rules, such as verb inflections (go, went, gone) or -er/-est as noted. If the template is just a list of vaguely/subjectively related words why do we need a template at all? We already have a co-ord terms section. OK we could add a sort of hot-to-cold "range" template for these but is it really so simple? There isn't just one axis. We could go chilled-calm-annoyed-angry or we could go angry-jealous-envious or angry-furious-enraged etc. etc. Equinox 19:22, 3 July 2020 (UTC)[reply]
I think this is a good idea. As DTLHS suggests, the coordinate terms section seems the best place for this, accompanied by a brief explanatory note so that people know what the purpose of it is. Andrew Sheedy (talk) 04:09, 3 July 2020 (UTC)[reply]

Feedback on movement names

Hello. Apologies if you are not reading this message in your native language. Please help translate to your language if necessary. Thank you!

There are a lot of conversations happening about the future of our movement names. We hope that you are part of these discussions and that your community is represented.

Since 16 June, the Foundation Brand Team has been running a survey in 7 languages about 3 naming options. There are also community members sharing concerns about renaming in a Community Open Letter.

Our goal in this call for feedback is to hear from across the community, so we encourage you to participate in the survey, the open letter, or both. The survey will go through 7 July in all timezones. Input from the survey and discussions will be analyzed and published on Meta-Wiki.

Thanks for thinking about the future of the movement, --The Brand Project team, 19:39, 2 July 2020 (UTC)

Note: The survey is conducted via a third-party service, which may subject it to additional terms. For more information on privacy and data-handling, see the survey privacy statement.

IMO, calling it a movement invites even more politicization. DCDuring (talk) 22:07, 2 July 2020 (UTC)[reply]

Rename qwm to Middle Kipchak

This umbrella variety, under which we put entries from Codex Cumanicus and a number of other medieval dictionaries, should be renamed to Middle Kipchak rather than, as it does now, going under the somewhat confusing name Kipchak. @Crom daba, Borovi4ok, Anylai, 몽골어 물리, victar, Fay Freak etc. Allahverdi Verdizade (talk) 18:16, 3 July 2020 (UTC)

Kipchak is divided into three: Armeno-Kipchak, Mamluk-Kipchak, and Cuman. @ 몽골어 물리. — This unsigned comment was added by 몽골어 물리 (talk • contribs) at 01:24, 4 July 2020 (UTC).
I propose renaming Kipchak to Middle Kipchak, whereas its subvarieties would still, in my proposal, go under the same names. If we add an entry from an Egyptian Mamluk dictionary, and the same word is attested in Codex Cumanicus, then we add the Latin spelling under ===Alternative forms=== * {{alter|qwm|the word||Codex Cumanicus}}, and if there is an attested Armenian-orthography variant, it goes there too. If a word is only attested in Egyptian sources, it gets {{tlb|qwm|Mamluk-Kipchak}}, which means that this label must be removed from the entry سو (su), because it's obviously a word found in other sub-varieties as well.
Codex entries should be lemmatized under their original spellings, but transliterated to the common spelling system that we use for historical Turkic languages, i.e. ï represents ı, č represents ç, ŋ stands for ñ etc. The same goes for the other orthographies as well.
OPTIONALLY, we could also discuss setting one of the orthographies as standard, in which case the spelling variants in other orthographies would have # {{alternative spelling of|qwm|Arabic|the word||the gloss}} in the definition line. The orthographies which we set as alternative ones would only have "full" definitions if the term is not found in the "standard" script. — This unsigned comment was added by Allahverdi Verdizade (talk • contribs) at 14:55, 4 July 2020 (UTC).
That’s how I imagined it. I am not against it being called “Middle Kipchak” (not being a Turkologist anyway in the foreseeable future). Some would confront however that a Middle implies an Old, and what is the Old? Of course the question what the ancestor of it is remains unsolved (it did not die out but the attested forms are not the ancestor of all Kipchak languages, but perhaps they are attested forms of a broader language which is the ancestor of all of them.) Fay Freak (talk) 20:58, 4 July 2020 (UTC)[reply]
Agree. Also, ideally, a description of this umbrella entity needs to be created. Borovi4ok (talk) 12:02, 4 July 2020 (UTC)[reply]
Description? You mean Wiktionary:About Kipchak I guess. Fay Freak (talk) 20:58, 4 July 2020 (UTC)[reply]
  • The label "Middle" not only implies an Old, it also implies a Modern by the same name. If there's no modern Kipchak language to distinguish qwm from, and if the most common name for the language in the linguistic literature is simply "Kipchak" without the Middle label, then we shouldn't rename it. I notice that Wikipedia has separate articles on Kipchak language and Cuman language, and associates the code qwm with Cuman, not Kipchak. The articles aren't terribly explicit about the difference, though. Are we treating them as the same language? —Mahāgaja · talk 21:54, 4 July 2020 (UTC)[reply]
@Mahagaja: Yes, as implied by Allahverdi’s comment above. “Cuman” is basically Kipchak as known from a certain source which writes the language in Latin script, the Codex Cumanicus (the sources are listed). The primary and secondary sources on Kipchak are very sparse and scattered, and therefore so are the conceptions of us netizens. Deverbale Wortbildung im Mittelkiptschakisch-Türkischen of 1996 lists the “Middle Kipchak sources” with descriptions on four pages. We should already be past the times when special corpora are misunderstood as “languages”. When I mentioned certain Aramaic “languages” being more corpora than languages this was found convincing: Wiktionary:Beer parlour/2019/December § Aramaic subfamilies, Wiktionary:Beer parlour/2019/April § Splitting Aramaic. @몽골어 물리 has fallen victim to such an easy misconception.
Mahagaja, the Modern Kipchak languages or some of them, as explained, are the Modern. If Standard German had usually another name than German then Middle High German would not be a problematic name if it is understood that all the languages descending from Middle High German are the Modern German.
I suspect, by the way, that the historical word “Tatar” referred to the same Kipchak, being the Russian word for Kipchak, just as Alman, from the name of a German tribe, is the word for German in some languages, for example. Now it is not hard to see that it means the same thing as deutsch, but because the sources are so gappy we are not even certain of the Kipchak language identities. Fay Freak (talk) 23:24, 4 July 2020 (UTC)
@Fay Freak: I understand that there are modern Kipchak languages, but if there isn't one that's usually called "Kipchak", then we don't need to add the label "Middle", especially if English-language sources don't usually use that term (your link is for German mittelkiptschakisch, not English Middle Kipchak). If the modern CAT:Brythonic languages were descended from a single language spoken in the middle ages and that language were called Brythonic, we wouldn't need to call it "Middle Brythonic" because none of the modern languages is called Brythonic. I'm not asserting that "Middle Kipchak" is an unnecessary designation, I'm just asking questions. —Mahāgaja · talk 23:38, 4 July 2020 (UTC)[reply]
@Mahagaja: Middle Kipchak is not the only "Middle" language that is not preceded by an earlier attested form. Middle Mongol is another example: there is no such thing as "Old Mongol", only reconstructed Proto-Mongolic.
Anyways, to give you a recent example of usage of this label:
"The western Kipchak subbranch today includes only small and partly endangered languages in Eastern Europe: Kumyk, Karachay-Balkar in the Caucasian area, Crimean Tatar in Crimea and Dobruja and Lithuanian Karaim in the Baltic area. [...] Earlier varieties were spoken in old nomadic confederations such as those of the Pechenegs and the Kumans. The western subbranch is therefore sometimes referred to as the Kipchak-Kuman group. An interesting old representative of this subbranch is Armeno-Kipchak, which was written in Armenian script in Ukraine and Poland. Codex Cumanicus, an early document of Middle Kipchak, was compiled by Italian traders and German missionaries in the late 13th and early 14th centuries. "[1] Allahverdi Verdizade (talk) 00:07, 5 July 2020 (UTC)[reply]
@Mahagaja: Are you still maintaining opposition, or did we manage to convince the big elephant? :) Allahverdi Verdizade (talk) 23:18, 12 July 2020 (UTC)[reply]
@Allahverdi Verdizade: I was never exactly opposed, I just wanted to discuss the necessity of the name change. If this were a vote, you could (still) count me as an "abstain". —Mahāgaja · talk 07:03, 13 July 2020 (UTC)[reply]

I support this. A Google Books search for "Middle Kipchak" reveals widespread use of the term. An additional reason to change the name is disambiguating the Kipchak language (code qwm) from the Kipchak language family (code trk-kip). Compare the etymology section of ելակ (elak). --Vahag (talk) 07:50, 7 July 2020 (UTC)[reply]

plural affix in the OED

I cannot find an entry for the plural affix in the OED -es, -s , 's, s. Any help? Thnx --Backinstadiums (talk) 10:53, 4 July 2020 (UTC)[reply]

Does the OED list any other inflectional suffixes such as -ed, -ing, -en? Many dictionaries don't; indeed many dictionaries don't list bound morphemes at all. —Mahāgaja · talk 11:05, 4 July 2020 (UTC)[reply]
@Mahagaja: all of them; for example for -ed: https://www.oed.com/oed2/00072077, https://www.oed.com/oed2/00072078, https://www.oed.com/oed2/00056947, https://www.oed.com/oed2/00245745, https://www.oed.com/oed2/00245746 etc. Yet, I am clueless about the plural (-en is added: https://www.oed.com/oed2/00074421). --Backinstadiums (talk) 11:42, 4 July 2020 (UTC)
It's not there right now, but this range has not been edited since 1909. It's quite possible that when the 3rd edition gets to S, it will be added. The Shorter OED (revised in 1993) does have it, and goes into some detail about the phonetic rules governing its use. Ƿidsiþ 11:59, 4 July 2020 (UTC)[reply]
@Ƿidsiþ: What are the lexicographical reasons for it? If somebody could check the paywalled current version, it would surely clarify this issue --Backinstadiums (talk) 12:11, 4 July 2020 (UTC)[reply]
The lexicographical reasons for what? I already have access to the OED (if that's what you mean by paywalled version), but as noted it isn't in there. Ƿidsiþ 14:42, 4 July 2020 (UTC)[reply]
@Ƿidsiþ: the technical reasons to leave out -s, but not -en. Secondly, the 3rd edition should have gotten to S by now, because:
Beginning with the launch of the first OED Online site in 2000, the editors of the dictionary began a major revision project to create a completely revised third edition of the dictionary (OED3), expected to be completed in 2037. Revisions were started at the letter M. --Backinstadiums (talk) 14:54, 4 July 2020 (UTC)[reply]
Why should it have gotten to S? If it started with M and it's only about halfway to 2037, there's no reason it should have gotten to S given how many words start with M, N, R, etc. and how few start with V, W, X, Y, and Z. Andrew Sheedy (talk) 23:35, 4 July 2020 (UTC)[reply]
They're not doing it in alphabetical order anyway. They update every quarter around key batches of thematically-linked words. Ƿidsiþ 05:28, 5 July 2020 (UTC)[reply]
@Ƿidsiþ are they then regularly updating the online paywalled version? --Backinstadiums (talk) 09:15, 5 July 2020 (UTC)[reply]
Yes, every quarter. Ƿidsiþ 12:48, 5 July 2020 (UTC)[reply]

Label "(irregular)"

For example, the entry of taxies reads plural of taxi (irregular), and taxying reads present participle of taxi (irregular); yet, neither label is shown in taxi. Can't that label somehow be automatically added to the headword too? --Backinstadiums (talk) 16:20, 4 July 2020 (UTC)[reply]

Furthermore, just as taxying is labeled irregular, taxies must be too --Backinstadiums (talk) 10:04, 5 July 2020 (UTC)[reply]

Requests for cleanup in "Wiktionary:Requests for cleanup"

Many requests for cleanup (Template:rfc) are not mentioned in any section in "Wiktionary:Requests for cleanup". Should sections be made for them or does the rule that there should be sections for requests only apply to RFDs and RFVs? J3133 (talk) 19:11, 4 July 2020 (UTC)[reply]

Labeling non-native audio

I've complained before about English language audios uploaded by non-native speakers and hastily nominated one for deletion at Commons. I guess I can accept these audios existing, but not importing them into Wiktionary with no disclaimers attached. At the very least I propose labeling these files e.g. "Audio (France)" at family, one of 500+ entries with audio by a single French user. It's standard to include the speaker's country anyway. I might go as far as "Audio (France; English learner)". If the basic label is confusing ("they speak English in France?"), so is including them at all. I'd be happy to get rid of them, but that's not a popular option. What do you think of starting with the country label? Ultimateria (talk) 19:24, 6 July 2020 (UTC)[reply]

Just purge them all. Every tree that bringeth not forth good fruit is hewn down, and cast into the fire. Allahverdi Verdizade (talk) 21:32, 6 July 2020 (UTC)[reply]
Who is importing them here? As far as I can tell Commons never deletes anything, but we should absolutely remove them from entries on this site- "Audio (France; English learner)" is not a useful thing to have. We should ban all automated audio import IMO. DTLHS (talk) 21:37, 6 July 2020 (UTC)[reply]
That’s throwing the baby out with the bathwater. Yes, we should not have such “Audio (France; English learner)”, but this can be filtered by bots (and commented out to not be added again). Often enough I have created German and Russian entries where audios have been added which were already there on the other Wiktionaries or came later, as my precious attention may be used up by other things than adding audios, of the many things one can pay attention to; therefore we have bots. And luckily, for this issue, there are few learners of any language other than English who would have such a recording idea, tainting a target language.
Non-native audio is of course permissible for dead languages. Fay Freak (talk) 21:59, 6 July 2020 (UTC)[reply]
@DTLHS: Do you propose we only import audio manually?
The audio at family was imported by User:DerbethBot (diff), who is responsible for importing most of our audio files. I'd like @Derbeth to weigh in on this discussion. Ultimateria (talk) 18:32, 7 July 2020 (UTC)[reply]
Yes, I think this has been discussed before, the importing bot should take the (self-reported) speaker information into account, and only pick native pronunciations when importing from commons (and not from another Wiktionary). – Jberkel 06:49, 7 July 2020 (UTC)[reply]
In this case, the speaker self-reports as a native English speaker, so that wouldn't help. The issue here is that the bot doesn't listen to the sound files, so it can't spot an atypical accent or an out-and-out hoax. Another problem with the family entry is the way the audio was added directly under IPA labeled "New Zealand". If I hadn't looked at the edit history, I would never have known that the audio was completely unrelated to the IPA above it: not New Zealand, but Geneva, Switzerland.
We have blocked a number of people for adding huge blocks of translations in languages they didn't know that they got from non-English Wikipedias. Here we have a bot mass-importing wiki-sourced audio based on nothing but file names. What's more, these edits aren't being checked by anyone. If someone from the public notices and removes bad audio, it will just be added right back unless someone goes through the process of getting it deleted from Commons. I know from past experience that a small, but non-negligible percentage of this audio is utter garbage- but there's no way to know which files in which entries.
I say we need to a) come up with a way to do systematic quality control, b) label bot-imported audio with a warning template, c) revoke DerbethBot's bot status, or d) some combination of all of the above. Chuck Entz (talk) 09:33, 7 July 2020 (UTC)[reply]

I would like to avoid reading descriptions of Commons audio files for my bot. This would make scanning for audio files impractical: how to check descriptions of tens of thousands of files in a reasonable time? Currently my bot scans categories and simply accepts all properly named files (correct: En-uk-cat.ogg; incorrect: cat-English-pronunciation-by-Joe.ogg). Such scanning is really fast. Getting each file description is too slow.
To be more precise, the bot checks the file description for some languages with non-Latin script, where the description states which word is actually pronounced. But these languages don't have many files: Chinese is the biggest one with 5,000 files. English has 30,000. German has 300,000 and Dutch 500,000! Maybe I could check the file description once and not check again when I run the bot again, but this way we would risk the bot missing a file description being updated as 'wrong'.
Plus relying on description is not safe. There is no structure for providing such data. I'm pretty sure every user will mark files as non-native in a slightly different way.
The easiest option would be to categorise files on Commons. No deleting (it takes too long to pass a RfD), no need for me to manually maintain a bot blacklist. This could be one 'global' category ('Audio files by non-native speakers') or many per-language categories ('English pronunciation by non-native speakers'). I can easily ignore files from a category.
You can of course ignore me and just block the bot. Good luck then manually linking those 300,000 German files. German Wikimedia paid for recording German pronunciation and every month new 20,000 files appear, waiting to be used.
As for the example with 'family', commons:File:LL-Q1860 (eng)-Nattes à chat-family.wav comes from Lingua Libre website. I blacklisted all files except for Chinese from this source in November for the same reason you discuss above. Currently this edit won't happen. --Derbeth talk 19:00, 7 July 2020 (UTC)[reply]
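For readers following the technical side, the filtering Derbeth describes (accept only properly named files, and skip anything in an agreed marker category) can be sketched roughly as below. This is a Python illustration only, not DerbethBot's code (which is written in Perl); the regular expression and the category name are assumptions made up for the example, since the thread does not fix either.

```python
import re

# Hypothetical pattern for "properly named" files such as En-uk-cat.ogg:
# a language code, an optional lowercase region code, then the word itself.
PROPER_NAME = re.compile(
    r"^(?P<lang>[A-Z][a-z]{1,2})-(?:(?P<region>[a-z]{2})-)?(?P<word>[^-]+)\.ogg$"
)

# Hypothetical marker category; the discussion proposes the idea but no final name.
EXCLUDED_CATEGORIES = {"English pronunciation by non-native speakers"}


def parse_audio_name(filename):
    """Return (lang, region, word) for a properly named file, else None."""
    match = PROPER_NAME.match(filename)
    if match is None:
        return None
    return match.group("lang").lower(), match.group("region"), match.group("word")


def is_importable(filename, categories):
    """Accept a file only if it is properly named and in no excluded category."""
    if parse_audio_name(filename) is None:
        return False
    return not (set(categories) & EXCLUDED_CATEGORIES)
```

With this sketch, En-uk-cat.ogg is accepted and cat-English-pronunciation-by-Joe.ogg is rejected, matching the two examples given above. A category check of this kind only needs the category listing, not each file's description.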

@Derbeth How long does it take to download and check a description? It shouldn't take so long. Even if it takes 1 second per description, there are 86,400 seconds in a day, so you could do 300,000 German files in one day using 4 threads. I once did a bot run to move |lang= to |1= for {{inflection of}}, which is used on over 1,000,000 pages and it takes 1 second to save a page. Threads are your friend :) ... Benwing2 (talk) 00:45, 8 July 2020 (UTC)[reply]
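The estimate above can be checked with a few lines of Python. The one-second-per-description figure is Benwing2's working assumption, not a measurement, and the function name is invented for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400, as quoted in the comment above


def days_needed(file_count, seconds_per_file=1.0, threads=1):
    """Days required to fetch file_count descriptions at the assumed rate."""
    return file_count * seconds_per_file / (threads * SECONDS_PER_DAY)


if __name__ == "__main__":
    # 300,000 German files: under a day with 4 threads, several days with 1.
    print(round(days_needed(300_000, threads=4), 2))
    print(round(days_needed(300_000, threads=1), 2))
```

Under these assumptions a single-threaded run takes between three and four days for the German files alone, while four threads bring it under one day.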
I don't want to put too much load on WMF servers. My bot is not that important. I'm not only not using multiple threads, my bot stops between requests. My bot currently processes 71 languages. Currently refreshing audio files 'database' takes less than an hour. Checking description of each file would mean moving this to several days. --Derbeth talk 15:11, 8 July 2020 (UTC)[reply]
@Derbeth: "German Wikimedia paid for recording German pronunciation". Interesting, do you have a link for the project? Wasn't aware of this initiative. – Jberkel 06:49, 8 July 2020 (UTC)[reply]
I'm not able to find the link. An example is commons:File:De-Putten.ogg. The device is mentioned in w:de:Wikipedia:Archiv/Technikpool/Geräteliste#Audiotechnik. --Derbeth talk 15:11, 8 July 2020 (UTC)[reply]
Ah, so they just provide the recording hardware, it sounded like they were paying someone to do the recording. – Jberkel 18:04, 9 July 2020 (UTC)[reply]

@Derbeth Can you use your bot to remove English audio by User:Nattes à chat? Based on this discussion I don't think there will be any objections. Ultimateria (talk) 03:17, 11 July 2020 (UTC)[reply]

I don't have any code ready to remove audio files from pages. I would need to write such code and carefully test it – and I don't have much spare time to do this now. --Derbeth talk 13:16, 11 July 2020 (UTC)[reply]
No worries. @Benwing2? Ultimateria (talk) 16:03, 11 July 2020 (UTC)[reply]
@Ultimateria This is easy enough to do if all of the 'Nattes à chat' audio files say 'Nattes à chat' in their title, as the one at family does. Benwing2 (talk) 17:29, 11 July 2020 (UTC)[reply]
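A rough sketch of what such a title-based removal could look like follows. It is not the script Benwing2 actually ran; it assumes audio appears on its own bulleted line in the form {{audio|en|FILE|label}}, and the function name is hypothetical.

```python
import re

# Assumed layout: a bulleted line whose {{audio}} call has the language code
# as the first parameter and the file name as the second.
AUDIO_LINE = re.compile(r"^\*+\s*\{\{audio\|[^|}]*\|(?P<file>[^|}]+)")


def remove_audio_by_uploader(wikitext, uploader):
    """Drop audio lines whose file name contains the uploader's name."""
    kept = []
    for line in wikitext.split("\n"):
        match = AUDIO_LINE.match(line)
        if match and uploader in match.group("file"):
            continue  # skip this audio line entirely
        kept.append(line)
    return "\n".join(kept)
```

Applied to a Pronunciation section, this removes the line for LL-Q1860 (eng)-Nattes à chat-family.wav and leaves other audio lines alone. It relies entirely on the uploader's name being part of the file title, which is the condition stated above.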
@Derbeth Can you download and process the Commons dump file? I do this with Wiktionary's dump; it's not hard to do in Python using their SAX library because the dump files are so nicely structured. Benwing2 (talk) 17:32, 11 July 2020 (UTC)[reply]
I can send you code for this if you want. Benwing2 (talk) 17:33, 11 July 2020 (UTC)[reply]
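For illustration, a minimal sketch of the dump-scanning approach using Python's standard xml.sax module is shown here. It is not Benwing2's code; it assumes the usual page/title/revision/text layout of MediaWiki XML dumps, and the marker string is a placeholder for whatever convention is eventually agreed on.

```python
import io
import xml.sax


class PageHandler(xml.sax.ContentHandler):
    """Calls on_page(title, text) once for every page's text in the dump."""

    def __init__(self, on_page):
        super().__init__()
        self.on_page = on_page
        self.current = None  # element currently being buffered, if any
        self.buffer = []
        self.title = None

    def startElement(self, name, attrs):
        if name in ("title", "text"):
            self.current = name
            self.buffer = []

    def characters(self, content):
        if self.current is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.title = "".join(self.buffer)
            self.current = None
        elif name == "text":
            self.on_page(self.title, "".join(self.buffer))
            self.current = None


def find_marked_files(dump_stream, marker):
    """Return titles of File: pages whose wikitext contains marker."""
    found = []

    def on_page(title, text):
        if title.startswith("File:") and marker in text:
            found.append(title)

    xml.sax.parse(dump_stream, PageHandler(on_page))
    return found


# Tiny in-memory stand-in for a dump, for demonstration only.
SAMPLE = (
    b"<mediawiki><page><title>File:En-uk-cat.ogg</title><revision>"
    b"<text>[[Category:Invalid pronunciation]]</text></revision></page>"
    b"<page><title>File:De-Putten.ogg</title><revision>"
    b"<text>fine</text></revision></page></mediawiki>"
)
MARKED = find_marked_files(io.BytesIO(SAMPLE), "[[Category:Invalid pronunciation]]")
```

Because SAX streams the file, memory use stays flat however large the dump is. A real dump would be passed as an open (decompressed) file object in place of the in-memory sample.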
My code is written in Perl. As far as I can see there is a library for reading dumps in Perl, so I could use it. Still, I'm reluctant to do any complex modifications of my bot. I have code that is well-tested and running it does not require any effort from me. Modifying the bot is a completely different thing. Spending my free time on playing with the code isn't something I dream of at the moment. --Derbeth talk 06:44, 12 July 2020 (UTC)[reply]
Generally, excluding Commons files based on description is not a hard thing and using database dumps should probably speed it up. If I don't have time to adapt my bot to the final decision, I will just turn it off and return to the code once I have time. My biggest concern is that I don't see a suggestion here how to reliably mark files as 'non-native'. People on Commons will just ignore the rule, just as they ignore file naming rules for audio files. Using a category as a marker may be more reliable. Plus 'non-native' is not the only reason a file is not suitable for Wiktionary. You can be a native speaker but pronounce your language just plain wrong. Maybe the category or other marker should not be just about 'non-native', but rather about 'invalid pronunciation'. --Derbeth talk 07:02, 12 July 2020 (UTC)
That kind of checking doesn't address the real problem, which is importing audio files that no one has listened to. I think the best compromise would be a template like the one User:Tbot included in entries it created. The audio is a much smaller part of the entry, so it should be a lot more concise and unobtrusive- something along the lines of "this audio was added by a bot and hasn't been checked. Feel free to remove this notice if it sounds ok". It would also add a maintenance category so people can find them if they have time to check a few. There would be no parameters, so it should be very simple to add to your code. Chuck Entz (talk) 07:35, 12 July 2020 (UTC)[reply]
Maybe even just a message "unchecked audio file" with a tooltip and link that contains more detail? Benwing2 (talk) 20:38, 12 July 2020 (UTC)[reply]
@Ultimateria Do you want me to remove all of the non-Chinese Lingua Libre audio files? As mentioned above, they were blacklisted by DerbethBot. Benwing2 (talk) 17:42, 11 July 2020 (UTC)[reply]
'Nattes à chat' audio files removed. Benwing2 (talk) 19:23, 11 July 2020 (UTC)[reply]
@Benwing2: Thank you. I just meant that user, and yes I think she only uploaded through Lingua Libre so that should be all her files. Ultimateria (talk) 15:48, 12 July 2020 (UTC)[reply]
"Getting each file description is too slow." @Derbeth Use Special:Export or action=raw, that puts next to no load on the WMF servers. Alexis Jazz (talk) 05:42, 12 July 2020 (UTC)[reply]
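As a small illustration of the action=raw route, the URL for a file's description page can be built as below. The helper name is made up for this sketch, and it only constructs the URL; whether fetching it is light enough on the servers is the commenter's claim, not something the code demonstrates.

```python
from urllib.parse import urlencode

COMMONS_INDEX = "https://commons.wikimedia.org/w/index.php"


def raw_description_url(file_title):
    """Build the action=raw URL for a Commons file description page."""
    return COMMONS_INDEX + "?" + urlencode({"title": file_title, "action": "raw"})
```

For File:De-Putten.ogg this yields a URL that returns the plain wikitext of the description page rather than rendered HTML.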

Idioms used only in some meanings of other idioms

For example, only the second meaning of lay on the line uses on the line. What is the best way to indicate so? --Backinstadiums (talk) 17:37, 8 July 2020 (UTC)[reply]

I think you are saying that the noun template links to on the line (via en-noun|head) but that three-word phrase only applies to one of the two senses. I don't think we have any particular solution for this. In this case I would probably remove the special en-noun head, so that it just links to the four individual words, and put on the line in a Related terms section. Equinox 04:33, 9 July 2020 (UTC)[reply]
In the case at hand, I disagree with the premise that the first definition of lay on the line does not use an idiomatic sense of on the line. It does, but not a sense clearly accommodated by the wording of our definitions at on the line. But I also don't see a suitable definition at line.
I have nothing to contribute to the question of how to handle the hypothetical problem of linking to individual words rather than idioms, except that such differences may (sometimes? always) be considered different etymologies, which then gives different inflection lines with different linking potential. DCDuring (talk) 11:15, 9 July 2020 (UTC)[reply]
My own experience and some other dictionaries have the second idiom as lay it on the line, because one cannot lay anything other than IT on the line. DCDuring (talk) 11:41, 9 July 2020 (UTC)[reply]

{{desc}} usage

Back in October, the parameter |unc= was added to {{desc}}, doing away with the need for appending {{q|possibly}} whenever a term was uncertain. Recently, however, I was reverted for using {{desc|nolb=1|unc=1}} under the header ====Derived terms==== with the explanation "this is a derived term, not a descendant", but {{desc}} is also used for both derivatives and borrowings, which is why we have the parameters |der= and |bor=, respectively. Any thoughts on this? @Benwing2, Metaknowledge, Vorziblix, Erutuon, Chuck Entz, Rua --{{victar|talk}} 03:42, 9 July 2020 (UTC)

I agree with Rua. It's a PIE term in a PIE entry, and therefore not a descendant. —Μετάknowledgediscuss/deeds 04:27, 9 July 2020 (UTC)[reply]
@Metaknowledge: Right... but neither is any term that employs {{desc|der=1}} or {{desc|bor=1}}, clearly demonstrating that {{desc}} isn't only for direct descendants. --{{victar|talk}} 05:53, 9 July 2020 (UTC)[reply]
That is false, and suggests to me that you have a fundamental misunderstanding of how the term "borrowing" is used on Wiktionary. —Μετάknowledgediscuss/deeds 06:08, 9 July 2020 (UTC)[reply]
Umkay, and derived? --{{victar|talk}} 06:32, 9 July 2020 (UTC)[reply]
I'd use {{desc|der=1}} for a derived term in a different language, not the same language. For example, say we have a Proto-Germanic verb and a Gothic noun that's clearly derived from that verb, but the verb itself isn't attested in Gothic. We could create a Reconstruction page for the Gothic verb and list the noun as a Derived term there, but I would rather simply list the Gothic noun under the Descendants of the PGmc verb, and label it with |der=1 since it's an indirect descendant, not a direct one. —Mahāgaja · talk 07:18, 9 July 2020 (UTC)[reply]
Yep, that's a pretty standard usage, and one I employ all the time as well. My point to Meta is that any term with {{desc|der=1}} is a derivative, so to say no derived terms should use {{desc}} is at odds with the example above. --{{victar|talk}} 07:29, 9 July 2020 (UTC)[reply]
Derived terms of the current term, listed in the Derived terms section, use {{l}}, or one of the column generating templates. They never use {{desc}}, which is used exclusively in the Descendants section. Moreover, {{desc}} is always used for languages other than the current, which is why it prepends the language name by default. {{desc|foo|der=1}} would never be used in an entry of language foo, not even in Descendants. The only exception I can think of is a twice-borrowed term, where foo is borrowed into another language and then back into foo in another form. —Rua (mew) 08:37, 9 July 2020 (UTC)[reply]
This is why |nolb= was added to {{desc}}, so that it can act as {{l}}, whilst retaining all the benefits, like |unc=. As you well know, {{l}} was also used in all descendants sections before {{desc}} was created. All that you're describing is the status-quo before the template was created, and not the reason for not using it. --{{victar|talk}} 17:24, 9 July 2020 (UTC)[reply]
You understand that templates have syntactic meaning and aren't just used because you like the features they have? You could also use {{t}} as a linking template everywhere but that would be bad because it's supposed to be used for translations. "Desc" means descendant. If you want a generic linking template to have the same features give it a different name. DTLHS (talk) 17:32, 9 July 2020 (UTC)[reply]
That semantical argument doesn't really hold water in light of the template being used in other valid usages for derived terms. You'll first have to argue for the abolition of |der=. That said, if people want to create an alias for {{desc}} called {{deri}}, or whatever, sobeit. (FYI: I named {{desc}}) --{{victar|talk}} 18:15, 9 July 2020 (UTC)[reply]
I think it does hold water, though. Why would a term being a morphological derivative (as opposed to an inherited reflex or a borrowing or a calque) disqualify it from being a descendant? PUC 18:44, 9 July 2020 (UTC)
@PUC: Who are you directing that to? It doesn't sound like we're on different sides here. --{{victar|talk}} 18:50, 9 July 2020 (UTC)

Inner structure of a Chinese verb

It may be a good idea for Wiktionary to clarify the "inner structure" of Chinese verbs, because this structure affects how the verb interacts with adverbs or particles. Consider 打烊 (dǎyàng) and 離開/离开 (líkāi). When modifying them with 了 (le), besides the normal 打烊了 and 離開了, 打烊 can have 了 inserted in the middle, giving 打了烊, which 離開 cannot (ungrammatical *離了開). The underlying reason is that 打烊 has a verb-object inner structure. Normally, the particle 了 can be placed between the verb and the object in a sentence. When it comes to 打烊, we can say that 打 and 烊 are not a true pair of verb and object, as neither of them has a meaning close to 打烊 when used separately, but they still preserve the grammatical feature of such a pair that enables them to have the particle placed in the middle. 離開, on the other hand, cannot do this because it has a different, verb-complement inner structure. This different structure, however, also produces compositions that a verb-object structure doesn't, e.g. 離不開 (ungrammatical *打不烊). I think it would give readers a better understanding of how these Chinese verbs behave in a sentence if Wiktionary could present some information about their inner structure, maybe in the header or somewhere else.

As far as I know, Chinese grammar books usually categorize these inner structures as follows:

  1. 主謂, "subject-predicate", e.g. 頭痛/头痛 (tóutòng).
  2. 偏正, "modifier-head" (namely "adverb-verb" for verbs), e.g. 蠶食/蚕食 (cánshí).
  3. 動賓, "verb-object", e.g. 洗澡 (xǐzǎo).
  4. 動補, "verb-complement", e.g. 坐實/坐实 (zuòshí).
  5. 並列, "conjunctive", e.g. 吃喝 (chīhē).

恨国党非蠢即坏 (talk) 09:54, 9 July 2020 (UTC)[reply]
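
The contrast described above can be sketched in a few lines of Python. This is only a toy model of the two examples in the post, not a description of Chinese grammar in general: the structure tags "vo" and "vc", the table INFIX_BY_STRUCTURE, and the function insert_infix are all hypothetical names introduced for illustration, and the model assumes each structure admits exactly the one infix mentioned in the post.

```python
# Hypothetical structure tags; these are assumptions for illustration,
# not an existing Wiktionary scheme.
VERB_OBJECT = "vo"
VERB_COMPLEMENT = "vc"

# Which infix each structure admits, following only the examples in the post.
# Real usage is richer; this table is a deliberate simplification.
INFIX_BY_STRUCTURE = {
    VERB_OBJECT: "了",      # 打烊 -> 打了烊
    VERB_COMPLEMENT: "不",  # 離開 -> 離不開
}


def insert_infix(word, structure, infix, split=1):
    """Insert `infix` after the first `split` characters of `word`.

    Returns None when this toy model does not allow the infix for the
    given structure (e.g. 了 inside a verb-complement compound).
    """
    if INFIX_BY_STRUCTURE.get(structure) != infix:
        return None
    return word[:split] + infix + word[split:]
```

Under these assumptions, insert_infix("打烊", "vo", "了") gives 打了烊, while insert_infix("離開", "vc", "了") returns None, mirroring the ungrammatical *離了開.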

I agree this information should be included. MuDavid 栘𩿠 (talk) 00:58, 10 July 2020 (UTC)[reply]
  Support so much. I've thought about this too. —Suzukaze-c (talk) 06:05, 10 July 2020 (UTC)[reply]
  Support as well! I think we could maybe put this kind of info in {{lb}}, just like transitive and intransitive. — justin(r)leung (t...) | c=› } 03:03, 11 July 2020 (UTC)[reply]
@Justinrleung: I believe putting them in headers is better. Because in most cases these structures do not change for different senses. 恨国党非蠢即坏 (talk) 17:10, 15 July 2020 (UTC)[reply]
I'm curious, do these structures change for different lects? ‑‑ Eiríkr Útlendi │Tala við mig 17:20, 15 July 2020 (UTC)[reply]
@Eirikr: No, they don't. A Chinese word will always have the same structure in any dialect, except perhaps when the words in different dialects are false cognates, which is very rare. 恨国党非蠢即坏 (talk) 17:30, 15 July 2020 (UTC)[reply]
@Eirikr, 恨国党非蠢即坏: I can't say 100%, but in most cases, it's the same. — justin(r)leung (t...) | c=› } 19:42, 15 July 2020 (UTC)[reply]
  Support as a learner — I can usually figure it out on my own, but not always... —Μετάknowledgediscuss/deeds 20:27, 11 July 2020 (UTC)[reply]
  Support Pleco added a // to pinyin in the last year or so to indicate this: e.g., 洗澡 xǐ//zǎo. —Enervation (talk) 21:50, 22 July 2020 (UTC)[reply]
  Support Good idea. This is a grammatical feature. Perhaps, adjectives can be addressed as well, e.g. adjectives, which can only be used attributively or predicatively, always used with 的. --Anatoli T. (обсудить/вклад) 11:15, 11 September 2020 (UTC)[reply]

I suggest it be displayed like this:

  • {{zh-verb|type=vo}}
  • 打烊 (verb-object)

A dot for verbs with 3 or more characters, to clarify where the border of the 2 parts is:

  • {{zh-verb|type=vo1}}
  • 吃·豆腐 (verb-object)

恨国党非蠢即坏 (talk) 17:26, 15 July 2020 (UTC)[reply]
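
As a rough illustration of the display proposed above, here is a minimal Python sketch. It is not how {{zh-verb}} is implemented; the function render_headword and the table STRUCTURE_LABELS are hypothetical, and the reading of the digit in type=vo1 as "number of characters before the boundary" is an assumption inferred from the 吃·豆腐 example.

```python
import re

# Only the structure named in the proposal; other codes are not assumed here.
STRUCTURE_LABELS = {"vo": "verb-object"}


def render_headword(word, type_code):
    """Render a headword line such as '吃·豆腐 (verb-object)'.

    `type_code` is a structure code optionally followed by a digit string.
    The digits are assumed (not confirmed by the proposal) to give the
    number of characters before the boundary dot.
    """
    match = re.fullmatch(r"([a-z]+)(\d*)", type_code)
    if match is None or match.group(1) not in STRUCTURE_LABELS:
        raise ValueError(f"unknown type code: {type_code!r}")
    label = STRUCTURE_LABELS[match.group(1)]
    if match.group(2):
        split = int(match.group(2))
        if not 0 < split < len(word):
            raise ValueError("boundary must fall inside the word")
        # Mark the border between the two parts with a middle dot.
        word = word[:split] + "·" + word[split:]
    return f"{word} ({label})"
```

With these assumptions, render_headword("打烊", "vo") yields the first display line and render_headword("吃豆腐", "vo1") yields the second.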

@恨国党非蠢即坏: That looks good. The standard notation in dictionaries seems to be two slashes for separating a verb from an object. I think we could show the verb-object boundary for all cases (even for 2 character compounds). — justin(r)leung (t...) | c=› } 19:42, 15 July 2020 (UTC)[reply]
What's the status of this now? Enervation (talk) 07:23, 11 September 2020 (UTC)[reply]
@Enervation: It seems editors generally support that. I am trying my best to implement it. It may be very slow because I am not very familiar with these templates. 恨国党非蠢即坏 (talk) 09:36, 8 November 2020 (UTC)[reply]

Announcing a new wiki project! Welcome, Abstract Wikipedia

Sent by m:User:Elitre (WMF) 20:13, 9 July 2020 (UTC) - m:Special:MyLanguage/Abstract Wikipedia/July 2020 announcement [reply]

Votes timeline

Is there any reason why Wiktionary:Votes/2019-10/Application_of_idiomaticity_rules_to_hyphenated_compounds doesn't appear at Wiktionary:Votes/Timeline? If it's only an oversight then that's fine, I can add it myself if necessary, but I'm not very familiar with these procedures so I might be missing some factor. Mihia (talk) 01:47, 10 July 2020 (UTC)[reply]

It's just an oversight; feel free to help out. —Μετάknowledgediscuss/deeds 02:46, 10 July 2020 (UTC)[reply]
OK, thanks, done. Mihia (talk) 08:10, 10 July 2020 (UTC)[reply]

Treatment of hyphenated prefixes and suffixes

As a result of Wiktionary:Votes/2019-10/Application_of_idiomaticity_rules_to_hyphenated_compounds, the following text was added to the CFI:

Idiomaticity rules apply to hyphenated compounds in the same way as to spaced phrases. For example, wine-lover, green-haired, harsh-sounding and ex-teacher are all excluded as they mean no more than the sum of their parts, while green-fingered and good-looking are included as idiomatic.

A question has arisen at Wiktionary:Tea_room/2020/July#semi-charmed as to whether this policy applies equally to hyphenated prefixes, and, I suppose, by the same token, suffixes. For example, should a word "semi-X" that means nothing more than "semi-" + X, or a word "X-like" that means nothing more than X + "-like", be deemed sum-of-parts or not? This may need to be clarified in the CFI wording, especially if an exclusion to allow such words is understood. Any views on this issue anyone? Mihia (talk) 08:31, 10 July 2020 (UTC)[reply]

I omitted to spot, of course, that "ex-teacher" is actually given as an example of what is excluded (which, looking back now, I believe I added deliberately to cater for exactly this question) – and "ex-" is surely as much of a prefix as "semi-". Mihia (talk) 20:20, 10 July 2020 (UTC)[reply]

Update CFI to reflect decision on treatment of attributive forms

One more thing ... it just occurred to me that I think the decision to exclude definitions of the form "x-y = attributive form of x y" was never codified into the CFI, which I guess it should be. May I suggest the following wording for comment here before insertion into the CFI.

Hyphenated attributive forms
The definition of a hyphenated compound merely as an attributive form of the corresponding spaced phrase is not eligible for inclusion. For example a definition of periodic-table as "attributive form of periodic table" is ineligible.

Mihia (talk) 21:54, 10 July 2020 (UTC)[reply]

Keeping up with the competition

https://public.oed.com/blog/the-oed-june-2020-update/

hordes
to bring our A-game
at full tilt
Cheddar Man
ambystomatid
Dobson unit
ordu (missing English)
alkannin
garba (missing English)
raas garba
circle dance
hiraeth
topophilia
silver fox (SOP?)
LOL (laughing out loud; they also have little old lady, but not lots of love)
Farmer Giles (personification/stereotype, and rhyming slang for something piles (medical))
Atlanticism
Atlanticists
Anglosphere
banana bread
plant-based
oat milk
farm to table
farm fresh
farm shops
farm markets
vote buying
quilling
vote fraud
voter fraud
vote early, vote often (as vote early, and (vote) often)
vote-a-rama
slobberhannes
slobberknocker
buzzer-beater
ar
arr
more than enough
unfathom

It seems most of the ones we lack entries for are at least arguably SOP.--Prosfilaes (talk) 05:54, 11 July 2020 (UTC)[reply]

We have A-game pointing to A game which has bringing it as the ux. Vox Sciurorum (talk) 23:29, 12 July 2020 (UTC)[reply]
vote fraud can be translated to Dutch as stemfraude (vote fraud), verkiezingsfraude (election fraud or poll fraud) or stembusfraude (ballot box fraud). However, "stemmersfraude" (voter fraud) is probably not attestable. I'm not sure if votefraud is attestable, I have one cite for it. Two more needed before WT:COALMINE could be invoked. Alexis Jazz (talk) 05:35, 13 July 2020 (UTC)[reply]
If John McWhorter is to be believed we need another sense of LOL that is even weaker than the "diluted" sense, as a marker of empathy without humor. I still get sad when somebody with a PhD writes "U" for "you" so I may not be the best editor to add text-speak. Vox Sciurorum (talk) 16:23, 13 July 2020 (UTC)[reply]
I follow OED's quarterly updates and tend to add stuff we are missing (at least the obvious ones; they used to just give you a list of words, but now they also give definition fragments, which is interesting, though they don't include the important glosses like "obsolete" or "Jamaican"!). I can by no means cover it all though. — and I should clarify that I don't steal their definition wording and usually dig out some citations from Google Books. Equinox knows about copyright (and is pretty sure that working from a word list isn't a "derivative work"). Equinox 09:30, 30 July 2020 (UTC)[reply]
@Equinox: w:Database rights are a thing in some countries (but not in the US, so the WMF doesn't care), though that's not strictly copyright. It's also rather questionable if that would even apply to a couple dozen words. So, keep it up. Alexis Jazz (talk) 16:43, 30 July 2020 (UTC)[reply]
Slobberhannes, capitalized, meets CFI. I didn't spot the lowercase form. - -sche (discuss) 21:25, 30 July 2020 (UTC)[reply]
@-sche That sounds really Dutch. (slobberen and hannes) Wikipedia says German though.. Alexis Jazz (talk) 22:28, 31 July 2020 (UTC)[reply]

User J3133

User:J3133 engages in what to me seems to be questionable editing for which I asked them a question that they repeatedly reverted. My communication is probably far from perfect but the question seems perfectly reasonable. Since, diff looks very suspect to me: We don't move Beer parlour discussions around to various pages. I propose someone reinstates my post on that editor's talk page.

Furthermore, the same user started revert warring on Wiktionary:Style guide. I ask that the page is restored to status quo ante in view of Wiktionary:Votes/pl-2008-12/curly quotes in WT:ELE and Wiktionary:Votes/2013-02/Typographic vs ASCII punctuation in policies.

Thank you. --Dan Polansky (talk) 19:51, 11 July 2020 (UTC)[reply]

Two minutes before you posted this I'd already locked down WT:STYLE in its inconsistent (both in that it used both straight and curly quotes, and that it stated that curly quotes are preferred while using some straight quotes) status quo, to prevent further exasperating edit warring. I guess we need a poll of some kind since you're dead-set against the sort-of consensus that we use curly quotes. [Edit: I'm not really interested in arguing much about whether there's a consensus – I'm referring to the fact that I don't recall any public objections to the statement preferring curly quotes in WT:STYLE, and that many templates, such as those that use Module:links, as well as {{rfc}}, use curly quotes. But everyone has their own idea of what consensus is and only a well-formulated proposition put to a vote will really satisfy everyone's definition.] — Eru·tuon 20:00, 11 July 2020 (UTC)[reply]
I am happy to yield to a 60% majority or perhaps even a 55% majority since the issue seems to be largely a matter of taste; the above votes do not suggest these have been achieved. I do not see anything approaching "consensus" for any definition of it. As for what "consensus" is, we do not have a definition but we have a voted-on definition of when a vote passes. As for {{rfc}}, curly quotes have been placed there quite recently in diff on 25 March 2020, and the edit did not suggest any trace to a discussion or the like; the template would be evidence of long-term practice without curly quotes. --Dan Polansky (talk) 20:09, 11 July 2020 (UTC)[reply]
Sorry, I had forgotten that I added curly quotes to {{rfc}} only this year. I thought there was another cleanup template where the reason has been in curly quotes for a long time, but maybe not. {{rfd}} and {{rfv}} use straight quotes. — Eru·tuon 20:22, 11 July 2020 (UTC)[reply]
Here's my take on all of this: User:J3133 posted a topic asking if their edits would be ok. No one raised any objections. After spending days working on this, they discovered that someone they had never heard of was undoing all their work for reasons they didn't understand. They interpreted that the same way you interpreted their edits, and so the two of you wasted a lot of time reverting to what each thought was the correct, consensus-derived status quo.
You compounded it by posting cryptic references to "Miracle jester" on J3133's talk page. I have been an admin here for 8 years, have read all of the old discussion forums, and, as a checkuser, I have access to a lot of non-public discussions between veteran vandal-fighters from all over Wikimedia-land, and I have no clue who you're talking about.
If you're talking about Wonderfool, the main difference is this is someone who cares about doing things right. WF can be cynical and certainly enjoys stirring things up, as you say. J3133, on the other hand, is trying to improve things, to dot the i's and cross the t's. The last thing they want to do is cause trouble. Besides which, WF hasn't been engaging in any serious impersonations in recent years. Their recent MO has been to stick with one account at a time and to make self-references in the third person in a sly way that includes everyone else in the joke. Comparing the two betrays poor understanding of both.
Basically, you've committed as egregious a violation of "assume good faith" and "don't bite the newbies" as I can remember seeing (in recent years at least). Regardless of the merits, you didn't do enough to understand who they were, and why they were doing it. When you appoint yourself as an arbiter of adherence to rules and proper conduct, you also assume the responsibility of making sure you know what's going on. Chuck Entz (talk) 20:50, 11 July 2020 (UTC)[reply]
The Miracle jester post of mine was unfortunate as I have to admit, but the above raises some doubt. To start, "undoing all their work for reasons they didn't understand": what do you mean? I have reverted a single-digit number of pages concerning curly quotes (Wiktionary:Style guide, User:AryamanA/Wonderfool, and asked for a reversion in WT:CFI), not undone all their work; and I posted links to two relevant votes to J3133's talk page; and it was after that that J3133 revert warred at Wiktionary:Style guide, and their edits include "Reverted edits made in bad faith after previous discussions" in edit summary (italics mine, diff) and a most curious understanding of what status quo ante is. And "posted a topic asking if their edits would be ok": which post do you mean? As for "you didn't do enough to understand who they were, and why they were doing it", diff shows my post that inquires what they were doing so that I can understand it, but they reverted it. It is also hard to see what that person was doing when they leave no edit summaries; diff is untransparent and curious enough to deserve an explanatory edit summary, but there is none. Since you have the appropriate communication skills, it might be a good idea if you ask the person to use edit summaries appropriately when they are doing something so very unobvious. --Dan Polansky (talk) 08:19, 12 July 2020 (UTC)[reply]
But for the record, I now undid more curlies in policy pages, always providing proper traceability to votes in the edit summary so that anyone, a veteran, a newbie, a mind reader, can know what is going on and where to find out more. --Dan Polansky (talk) 08:46, 12 July 2020 (UTC)[reply]

Vote: Converting policy and guide pages as for quotes and apostrophes

FYI, I created Wiktionary:Votes/2020-07/Converting policy and guide pages as for quotes and apostrophes since there is a renewed interest in making that kind of conversion, and previous vote/poll outcomes do not support any conversion. On vote design I posted on the vote talk page. Let us postpone the start of the vote as much as discussion requires, if at all. --Dan Polansky (talk) 09:04, 12 July 2020 (UTC)[reply]

A Revert in the Tea Room

@DTLHS recently reverted a response that was at least partly an unhinged, conspiracy-theorist rant. What complicates this is that it was by the person whose inquiry was the start of the topic, and it makes clear why they asked it in the first place.

I personally think it should be restored, in all its ugliness, so we can respond to it. Wiktionary isn't censored, and we have plenty of entries for terms that come from the kind of ideology this person espouses. We should adhere to a neutral point of view in our entries, but surely we can tolerate different viewpoints in our discussion forums.

As long as we can keep this from becoming a debate about such viewpoints rather than about lexicography, I think we should leave such content alone and let the community deal with it.

That said, I can see how this could get out of hand, and I don't own this place, so I'm asking here for everyone's opinions rather than just restoring the edit. We could certainly use a clear consensus on how to deal with this and similar events in the future. Chuck Entz (talk) 17:28, 12 July 2020 (UTC)[reply]

I think that sort of drama just distracts us from writing the dictionary, and shutting it down like that is probably the best solution. —Mahāgaja · talk 18:05, 12 July 2020 (UTC)[reply]
I agree with Mahagaja. We're not here to have discussions about people's viewpoints, and there's no reason why the lack of censorship in our entries should extend to our discussion forums. DTLHS made the right call. —Μετάknowledgediscuss/deeds 18:42, 12 July 2020 (UTC)[reply]
Could we have a link to the edit in question, in order to judge it? Mihia (talk) 22:10, 12 July 2020 (UTC)[reply]
I saw that. It was cheesy conspiracy stuff about "Zionists" and "Globalists" and "Agenda 21", you know, these conspiracy theories that a Jewish elite is controlling the world. Don't think we'd lose much by deleting the whole convo since it had little bearing on the dictionary project. Nobody has actually hidden the revision so you can see it here: [2]. Equinox 23:24, 12 July 2020 (UTC)[reply]
The conversation didn't go off the rails until the last comment by the originator of the topic. Even at that point it could have been drawn back to the original point at issue, which did seem potentially relevant to building a dictionary. Several contributors were pointing to words that seemed to fill the bill. I don't see that the last comment necessarily invalidates the original point, though that may put me completely at odds with cancel culture. I'd favor restoring the whole thing and strenuously getting the conversation on track or summarized and closed. DCDuring (talk) 02:54, 13 July 2020 (UTC)[reply]
I'd say restore per Chuck Entz and per DCDuring; the discussed censorship is diff. Note that the conversation is reasonably short; deleting a multi-page rant would be another matter. The conversation does contain lexicographical information, viz that racialist is a term some people use as alternative to racist to achieve some effect. One benefit of restoring it is that it reinforces the open discussion spirit of the place, free from censorship: posts that are bad should usually collapse on their own weight. The poster did not make any personal attack on other editors; he just criticized mainstream media, albeit in a way that is not very convincing and that does not contribute to building the dictionary. --Dan Polansky (talk) 08:48, 13 July 2020 (UTC)[reply]

Does anyone have any objections to a {{vrddhi}} template?

The template {{gerund of}} incidentally creates Category:LANG gerunds, so could go with Category:LANG vṛddhi gerunds instead.

--{{victar|talk}} 03:38, 14 July 2020 (UTC)[reply]

@Victar: I like it! Could be used for NIA languages as well :) Relatedly, do we need templates for the other ablaut gradations (guna, zero-grade) too? —AryamanA (मुझसे बात करेंयोगदान) 20:54, 14 July 2020 (UTC)[reply]
@AryamanA: I really don't know anything about guṇa formations (yes?), but my thought behind it, at least, is that vṛddhi derivatives are more about the type of formation, possessive adjectives, than the actual grade of the vowels, like how *gʷʰoréyeti is a causative verb, not just an o-grade. --{{victar|talk}} 21:23, 15 July 2020 (UTC)[reply]
@Victar: Yeah I'm not sure if guna formations are comparable... either way, yes, this template is a Good Idea. —AryamanA (मुझसे बात करेंयोगदान) 21:26, 15 July 2020 (UTC)[reply]
@Victar: It’s an excellent idea. Hölderlin2019 (talk) 05:48, 17 July 2020 (UTC)[reply]

Is "speech act" a reasonable label?

I noticed this edit by Gamren in which they removed the label "speech act" from a few definitions at anyway with the edit summary '"speech act" is a nonsensical label'. I can't say I have an informed opinion on this, but I did find that it's currently being used as a label in 16 other entries (mostly sentence adverbs). Should these stand? Is there a different label that captures the pragmatic function of these senses? "sentence adverb" is also in use as a label in three entries, FWIW. Colin M (talk) 18:33, 14 July 2020 (UTC)[reply]

Speech act seems like a good category, but not a good label for normal users. Using it as a label is lazy, helping convey the appearance or reality of either contempt for or ignorance of the capability of normal users. Most dictionaries have non-gloss definitions and/or usage notes to convey the idea. Hard categorization isn't that many more keystrokes.
BTW, I wouldn't be surprised to find that I had been lazy enough to have added such labels, especially sentence adverb.
We might be able to convince ourselves that it is acceptable to use these terms if we provide a link to intelligible definitions of how we use the term in labels. DCDuring (talk) 20:31, 14 July 2020 (UTC)[reply]
I genuinely don't know what the label was supposed to convey. We define "speech act" as an "act carried out by speech", which is ridiculously broad, since any kind of speech is an action. OED has "​something that somebody says, considered as an action", which would mean that speech acts are distinguished from other speech purely by perspective. Wikipedia defines it as "something expressed by an individual that not only presents information", which is closer to my own understanding -- although I think questions would also not be considered speech acts.
In any case, I understand a "speech act" to be a complete communicational object -- i.e., usually a sentence. Since adverbs, by definition, cannot stand alone, I don't see how the label applies to them in a meaningful way.__Gamren (talk) 22:29, 14 July 2020 (UTC)[reply]
My understanding is that it's an utterance that does something rather than conveying information: "good morning" doesn't say anything about the quality of the morning, it just performs the social act of greeting. If you say, "I'd like to order the combo", you're not really describing your preferences, you're in fact ordering the combo. On the far end of the scale, "hello" and "thank you" have no literal meaning at all. That's not to say that a speech act can't also convey information, but its primary purpose is to do something. Chuck Entz (talk) 04:35, 15 July 2020 (UTC)[reply]
Many students of linguistics and the philosophy of language know what it means, but I don't think most of the rest of us do, this discussion supporting that belief. Hence, my comments above. I think Chuck's explanation is good. The term was invented(?) by the Ordinary language philosopher J. L. Austin (See How to Do Things with Words). DCDuring (talk) 15:27, 15 July 2020 (UTC)[reply]
Okay, well, I've removed those instances where it seems to be used as a synonym of sentence adverb. Really, I can't see how anyone could conflate concepts so different.
P.S. Now that I reread, @DCDuring, it seems like you're suggesting that "sentence adverb" is a bad label? I strongly disagree, it provides essential grammatical information. I copied the entry def to the glossary and made the label link to it.__Gamren (talk) 21:57, 18 July 2020 (UTC)[reply]
It all depends on who you think is the audience for Wiktionary. If it is limited to those formally educated in fairly advanced English grammar, the term is fine. If you are willing to address the needs of those outside such elite populations, you should translate such a label into ordinary English. DCDuring (talk) 22:14, 18 July 2020 (UTC)[reply]
Okay, do you consider the definition at Appendix:Glossary#sentence_adverb to be in "ordinary English"?__Gamren (talk) 12:06, 19 July 2020 (UTC)[reply]
Sure. We just have to make sure that we have a link to that definition whenever we use sentence adverb in {{lb}}. I don't know how to do that, but it probably can be done. DCDuring (talk) 20:38, 19 July 2020 (UTC)[reply]
I just said I've done that. All you have to do is add an entry to Module:labels/data.__Gamren (talk) 20:50, 19 July 2020 (UTC)[reply]
So sue me. DCDuring (talk) 21:05, 19 July 2020 (UTC)[reply]

Merging Eastern Chatino lects

I'd like to merge the following codes into one language: cly (Eastern Highland Chatino), ctp (Western Highland Chatino), ctz (Zacatepec Chatino), cya (Nopala Chatino). The resulting code will be omq-che, Eastern Chatino. Of course, the merged entries would have dialect labels to make sure no information is lost.

The basis for this is in Campbell (2013):

"It is [...] shown that within Coastal Chatino, Eastern Chatino is in fact a valid genetic unit coordinate with Tataltepec."
"Ethnologue (Lewis 2009) lists four distinct languages or subgroups that would fall in the Eastern Chatino area (Zacatepec, Nopala, Eastern Highland, and Western Highland) but does not refer to any data that would support those groupings, and they are not supported here either."

I am working with a linguist who is a Chatino native speaker who would like to upload their Eastern Chatino data to Wiktionary, and they corroborated this. I see @Lvovmauro, Thadh have worked on Eastern Chatino lects; do you have any thoughts? I think this is a fairly reasonable change, and if no one has any objections I'll do it by the end of the week so that we can start uploading the Eastern Chatino data. —AryamanA (मुझसे बात करेंयोगदान) 20:48, 14 July 2020 (UTC)[reply]

Also, @Chuck Entz I think you're knowledgeable about Amerindian languages, any thoughts? —AryamanA (मुझसे बात करेंयोगदान) 20:51, 14 July 2020 (UTC)[reply]
My background is more in North American languages, especially the northern Uto-Aztecan ones. I have nothing meaningful to say about Zapotecan languages. Chuck Entz (talk) 04:41, 15 July 2020 (UTC)[reply]
My understanding is that linguists studying Chatino generally split them down further than Ethnologue does, rather than lumping them all together.
And I think Campbell may be misinterpreting Ethnologue. The divisions that Ethnologue uses aren't claimed to be genetic groupings; they're based on intelligibility studies. --Lvovmauro (talk) 04:11, 15 July 2020 (UTC)[reply]
There are two problems with the merging. First of all, if the lects were to be merged together and the singular lects were to be deleted as languages, then the result would not be useful for the majority of lemmas, as the lects differ enough to create standardisation problems: you would get the case with Serbo-Croatian but instead of two spellings, it would be four; if you propose keeping the current codes and adding an extra one above them, that would result in the chaos that Cree is in, where the language does not exist on its own, it has a lot of papers written about it, while the problem with the scripts doesn't go away. My thoughts are that whenever there is a possibility to split non-standardised languages, we shouldn't do that. Thadh (talk) 10:13, 15 July 2020 (UTC)[reply]

Russian vowel allophone

I thought /e/ in the position CVCj was [e] not [ɛ] Dngweh2s (talk) 18:54, 15 July 2020 (UTC)[reply]

See for example цель. As a native speaker I can confirm, the /e/ in this word is pronounced the same as it would if it were written as цэль. I don't know which lemma you have in mind though. Thadh (talk) 19:26, 15 July 2020 (UTC)[reply]
The Wikipedia Russian phonology page says that /e/ in CVC is [ɛ] and in CVCj it is [e] Dngweh2s (talk) 19:45, 15 July 2020 (UTC)[reply]
Like with any <е>, it differs depending on the consonant that precedes it. For example <ц> or <ш> always governs with [ɛ], while most consonants (like <б> or <в>) generally convert into [bʲ] and [vʲ] respectively, and thus govern with [e]. Thadh (talk) 20:49, 15 July 2020 (UTC)[reply]
@Thadh: What User:Dngweh2s means, I am sure, is that цель (celʹ) should be [t͡selʲ], rather than [t͡sɛlʲ], which is CVCj, as opposed to цех (cex), which is CVC and [t͡sɛx] is justified.
In this case, we're more phonemic and if there is any real difference in the pronunciations of the vowel between цель (celʹ) and цех (cex), then I challenge you to provide some evidence :) I personally don't know any difference between цель (celʹ) and цех (cex) (the first part). @Benwing2: FYI.
BTW, this topic is misplaced (it doesn't affect any policies) and should probably be placed on talk pages of Russian-specific modules, such as Module talk:ru-pron. --Anatoli T. (обсудить/вклад) 01:54, 3 August 2020 (UTC)[reply]

Wonderfool user pages

I would propose the following about User:Wonderfool user pages:

  • 1) Undelete them. They usually do not contain anything objectionable.
  • 2) Place a template on them to the effect of "This is a user account of blocked user Wonderfool", and the template would place the page into a category User accounts of Wonderfool, Suspected sockpuppets of Wonderfool or the like. A similar thing is done on Wikipedia and it seems to work well.

--Dan Polansky (talk) 10:30, 18 July 2020 (UTC)[reply]

Why? We already have a pretty complete list. —Μετάknowledgediscuss/deeds 19:55, 19 July 2020 (UTC)[reply]
Because it is a more straightforward and intuitive way, one that scales much better for more people using multiple accounts. That would be the likely explanation for why Wikipedia does it that way. Like, if I want to learn about user Wonderfool, I would naturally go to User:Wonderfool, but I find an empty page. That is why some new editors talk of Wonderfool mystery and such. User:AryamanA/Wonderfool really looks like a non-standard workaround, not too bad, but still; the page does not have obvious findability, unlike User:Wonderfool. --Dan Polansky (talk) 09:05, 20 July 2020 (UTC)[reply]
An easy solution would be to move User:AryamanA/Wonderfool to User:Wonderfool. PUC 09:10, 20 July 2020 (UTC)[reply]
That would be one option, better than nothing. However, I think a category for sockpuppet accounts is the perfect, tried and tested solution. Moreover, User:AryamanA/Wonderfool rewards Wonderfool by doing edit counting for them, whereas I would say: nope, if you want to have your edits counted, stick to one account. Incidentally, the page says "Count would put WF in a highly respectable fourth place in the human edit count", plainly false since some of the accounts are bot accounts. --Dan Polansky (talk) 09:39, 20 July 2020 (UTC)[reply]
Hm, I would actually support that move since everyone still calls him WF. Equinox 17:45, 20 July 2020 (UTC)[reply]
Well, a non-standard user who gets non-standard treatment quite fairly can get a non-standard workaround in terms of tracking them. But yeah, a sockpuppet category would make sense generally. —AryamanA (मुझसे बात करेंयोगदान) 00:00, 21 July 2020 (UTC)[reply]
I am not really criticizing User:AryamanA/Wonderfool; that page was a great improvement on the state before, done in the user space in the spirit of individual enterprise and initiative, a praiseworthy thing. Nonetheless, I am proposing what I see to be a systematic improvement over this, one that has improved findability and is more intuitive for those many people who have ever seen Wikipedia's category for sockpuppets of a single user. --Dan Polansky (talk) 10:59, 23 July 2020 (UTC)[reply]
I like the idea of putting Wonderfool stuff under User:Wonderfool, but perhaps User talk:AryamanA/Wonderfool, Wonderfool's userpage where he keeps track of his work, which he formerly put on userpages of his various accounts, should be on User:Wonderfool itself, and the list of his accounts (User:AryamanA/Wonderfool) should be on a subpage such as User:Wonderfool/accounts. — Eru·tuon 21:39, 20 July 2020 (UTC)[reply]
Why not use a category, like Wikipedia does? --Dan Polansky (talk) 11:01, 23 July 2020 (UTC)[reply]
I wouldn't object to a category, but I still like the page. — Eru·tuon 18:50, 23 July 2020 (UTC)[reply]

OED Antedatings on Twitter edit

In May I posted about the #oedantedatings hashtag on Twitter, where people were giving evidence of word-usages predating the first-cited in OED; and of course the examples can be used here also.

I just wanted to note that this is ongoing. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:04, 18 July 2020 (UTC)[reply]

Romanized Hindustani edit

I am curious as to why there are no entries in Romanized Hindustani, which is Hindustani written in Latin script. For example, Hindi "मैं ठीक हूँ" would be romanized as "Main theek hoon." I would think that a redirect to the respective Hindi and Urdu entries could be appropriate. This form of text is in widespread use on the internet, arguably even more so than Devanagari among informal circles. I realize it may present some challenges (mainly, the ambiguity that entails between words/letters that would be transcribed similarly) but is this the only reason? Wiktionary does have entries such as ichigo, the romanization of a Japanese word, and ROTFLOL, Internet slang, so it surely wouldn't be unprecedented across languages. Is there anything in Wiktionary's policy that would bar entries for Romanized Hindustani words? I went through the "Criteria for inclusion" but do not recall reading about any such policy, though perhaps I am just interpreting it incorrectly. I am not making an argument for its inclusion, even though it may seem that way. I respect Wiktionary's choice to add or not add anything, I would just like to know why not in the case of Romanized Hindustani. --Baadal9 (talk) 17:32, 19 July 2020 (UTC)[reply]

Japanese rōmaji can be found in books and learner's materials. Hindustani popular romanisation is not standardised and is almost exclusively found in informal use online when one doesn't have the proper keyboard, similar to the Arabic chat alphabet. —Μετάknowledgediscuss/deeds 19:50, 19 July 2020 (UTC)[reply]

Are zu-infinitives lemmas? edit

I created aufzustellen by cut-and-pasting from existing abzugeben and noticed it was categorized as a lemma. Is that correct? The head line uses {{de-verb}} instead of {{head|de|verb form}} which is what I would have written if I had not copied from an existing page. Abzugeben is used in the definition of zu-infinitive and that's how I found the code to copy from. Vox Sciurorum (talk) 16:34, 20 July 2020 (UTC)[reply]

I think it's just an error at abzugeben. The lemma is abgeben; anything else is a nonlemma form. —Mahāgaja · talk 17:23, 20 July 2020 (UTC)[reply]
OK, I guess that makes sense on Wiktionary where every inflected form can be searched for directly. I remember trying to look up capisaldi in a paper dictionary not realizing it was an inflected form with a change in the middle of the word (the two components are pluralized independently). That was a case where a second entry would have helped. Internal zu struck me as inviting similar confusion. Vox Sciurorum (talk) 13:57, 21 July 2020 (UTC)[reply]
Listing every inflected form separately is my favorite thing about Wiktionary, especially for a language like Old Irish where inflected forms can deviate wildly from their corresponding lemma forms. For example, the student first learning Old Irish will probably not easily remember that érbart is a form of asbeir or that teilci is a form of doléici (and those aren't even suppletive!). —Mahāgaja · talk 15:36, 21 July 2020 (UTC)[reply]

Incorrigible Quotes edit

I've been cleaning up after the deletion of โควิด-๑๙ and I'm finding that template {{th-x}} insists on having a space before the repetition sign mai yamok ๆ and after the abbreviation sign paiyan noi ฯ. Although normative Thai grammar calls for these spaces, they are frequently missing in the originals. How should I tag these inaccurate quotes until someone gets round to fixing that template to allow Thai text to be quoted verbatim? --RichardW57 (talk) 17:00, 20 July 2020 (UTC)[reply]

Incorrigible quotations found so far are for สวด (sùuat) and ไทย (tai). Like the latter, a quotation for บัญชา (ban-chaa) has another problem - a word was transliterated to the Thai script so as to work with that intolerant template! --RichardW57 (talk) 17:00, 20 July 2020 (UTC)[reply]

Can you use a narrow non-breaking space (thin space) or similar character to approximate the correct layout? Vox Sciurorum (talk) 17:41, 20 July 2020 (UTC)[reply]
Unfortunately, a lot of these spaces are often rendered with the width of normal spaces. The numeric character entities get mangled by {{th-x}}. --RichardW57 (talk) 18:31, 20 July 2020 (UTC)[reply]

It seems like we need etymological templates for these kinds of verb derivations. For example, in Hindi, plenty of intransitive verbs are internally created through vowel gradation from inherited transitive verbs. In this case saying {{af|verb|-suffix}} doesn't cut it, as this derivational process is more opaque. Also, these kinds of verbs should probably be in some centralized category. —AryamanA (मुझसे बात करेंयोगदान) 03:37, 21 July 2020 (UTC)[reply]

I wonder if a language-specific approach might not be more useful. Compare {{sw-derform}}. —Μετάknowledgediscuss/deeds 03:42, 21 July 2020 (UTC)[reply]
Yeah, every language has a different approach. The Hebrew binyanim are the obvious parallel (compare יָצָא (come/go out/forth) and הוֹצִיא (take or bring out or forth)), but even English has a few fossilized remnants of a causative suffix showing up as umlaut, and also ablaut grade derivations (compare fall/fell, drink/drench, lie/lay, sit/set, etc.). Chuck Entz (talk) 07:37, 21 July 2020 (UTC)[reply]

English taxonomic names edit

Two entries (Homo superior and Homo superioris; Bovine coronavirus was originally listed as a third) are English taxonomic names. They are in Category:Translingual proper nouns and Category:Translingual lemmas instead of the English categories because they use Template:taxoninfl. How should this be fixed? J3133 (talk) 05:00, 22 July 2020 (UTC)[reply]

Bovine coronavirus was changed to translingual by DCDuring. J3133 (talk) 14:14, 22 July 2020 (UTC)[reply]

At the BP-level, I would say that the answer is to introduce an optional first parameter to specify the language for consistency with other templates. There's then the Grease Pit question of how to get the side effects of {{head|{{{1}}}|proper noun}} when a value other than |1=mul is specified. (My first thought is to wrap it in {{#if}} as an unused condition.) --RichardW57 (talk) 11:19, 22 July 2020 (UTC)[reply]
I can also see an argument for a separate template for language-specific pseudo-taxonomic names. --RichardW57 (talk) 11:19, 22 July 2020 (UTC)[reply]
I don't think that a separate template is worth the trouble for the few science-fiction and mythical cryptospecies names that we have, together with Homo Xus-type names. They don't really form a semantically homogeneous category, IMO.
Virus species names are formed from English words and become the ICTV-approved names of the taxa in a way parallel to the binomial names established under the zoological, botanical, and prokaryote codes. Bovine coronavirus is a no-longer-current species name. (I haven't tracked down the nature of its successor.) I have changed the language header to Translingual and added the label archaic.
The other two seem like English, in some cites from fictional universes. A similar example would be Homo economicus. The inflection-line template should be {{en-proper noun}}, which I've done. I don't think the Hyponyms section is appropriate either, but it remains. DCDuring (talk) 14:16, 22 July 2020 (UTC)[reply]
@DCDuring Is "archaic" (defined in the glossary as "No longer in general use, but still found in some contemporary texts that aim for an antique style") the best label for terms like Bovine coronavirus? I would suggest that "superseded" would be better. Andrew Sheedy (talk) 02:46, 24 July 2020 (UTC)[reply]
I was trying to fit the situation onto the Procrustean bed of our existing terminology, the choices being obsolete, archaic, or dated. I think that the time frame in a field like virology is quite compressed compared to normal language and even to much slang. DCDuring (talk) 02:53, 24 July 2020 (UTC)[reply]
There is no such thing as an “English pseudo-taxonomic name”. If it is by intention sounding like / devised as translingual, then it is translingual, even if only ever appearing in one language; sometimes the scientific community is not that large or connected. There have been many disease names in the 19th century only used by some authors in one country, such as those under translingual in bejel (I haven’t even added all, but some may be only used by German physicians, I do not recall, however nobody considered them German). The same way Homo erectus is translingual, so is Homo oeconomicus and should be moved to translingual. Homo economicus is according to Medieval practice of replacing ⟨ae⟩ and ⟨oe⟩ with ⟨e⟩ (as well as American practice, which is basically the same as Medieval practice, see their disregard for the metrical system, pure medievalism). Just that because it is consciously pseudo it does not need to be italicized and can also begin with a minuscule. Likewise there is no reason why homo sovieticus would be English before any other language. Having an English pronunciation does not make a term English, all organism taxons and anatomical taxons are spoken in the accent of the language in which they are embedded. Fay Freak (talk) 20:18, 24 July 2020 (UTC)[reply]
Those examples all work as multilingual, and were composed as multilingual. However, consider the nonce taxon illegitimus estúpido in Legacy of Heorot:
    • He looked Carlos over carefully...."We'll call it illegitimus estúpido for the time being."
It is not so multilingual. Everyone is intended to spot the Spanishness of the specific name - Carlos is the only character in the book who uses any Spanish. This character and his occasional use of Spanish could be a headache for a translator. --RichardW57 (talk) 22:39, 24 July 2020 (UTC)[reply]

More specific categories of English irregular plurals edit

For example, Category:English irregular plurals ending in "-i" could be separated into categories for “-o” to “-i” and “-us” to “-i”; Category:English plurals ending in "-a" into categories for “-a” and “-ata”. J3133 (talk) 15:08, 22 July 2020 (UTC)[reply]

Restore Frankish entries edit

Last month there was some Reddit discussion at r/badlinguistics and r/linguistics arguing against how all the Frankish entries have been deleted:

Premise: Because "Frankish is a dialect of Proto-West Germanic", every single Frankish entry was deleted by one user without consulting anyone. This was met with overwhelming support and any etymological entries of Frankish have been expunged.
Point A: this is not necessarily true, operating off of a theory they've deleted years of work.
Point B: you know what, I can agree Frankish was a collection of dialects that melted into OHG and Old Dutch - but that doesn't justify deleting every entry, which despite what one user (the main deleter said), were not replaced: Frankish is absent from any "Proto-West Germanic" entry [which is a language that didn't properly exist unlike Frankish, which makes this even sillier; it's only a theoretical reconstruction from which to apply sound changes, unlike PIE and PU which at least in some form existed as a united dialect continuum or "language", then languages].
Point C: There is attested Frankish, justifying a namespace and separate entries

Can we restore the deleted Frankish pages, or at least add the information to their corresponding Proto-West-Germanic entries? For an example of a word that has been totally deleted, you can see that User:Rua deleted Appendix:Frankish/baukan, and this is nowhere to be found if you search on Wiktionary now (excluding User and Talk pages). —Enervation (talk) 21:39, 22 July 2020 (UTC)[reply]

Actually, Appendix:Frankish/baukan has not been totally deleted. If you visit the page and click the links in the move log, you will see that it has been moved multiple times (Appendix:Frankish/baukan → Reconstruction:Frankish/baukan → Reconstruction:Frankish/baukn → Reconstruction:Proto-West Germanic/baukn), and as an admin I can confirm that all the revisions with content are still publicly visible at its newest title. All the deleted revisions are the ones automatically created by the page move, containing #REDIRECT [[<new title>]]. When a page is moved, the history moves along with it.
But it is true that some of the history of the Frankish entries was lost. User:Victar pointed out several of the cases where a Proto-West Germanic entry was created anew when there was already a corresponding Frankish entry. I tried to preserve the edit histories with Special:MergeHistory, but sometimes there were edits that could not be moved into the history of the new entry because the newest edit of the old entry was later than the oldest edit of the new entry. Renaming should have been conducted with more forethought: move, don't recreate the entries from scratch, to preserve history. Eru·tuon 22:35, 22 July 2020 (UTC)[reply]
Edit: Correction: the history merging that I did was merging edits of Proto-Germanic entries with only West Germanic descendants into the corresponding Proto-West Germanic entry, by request of User:Victar. The edits that were lost, because they could not be merged, were in some of the Proto-Germanic entries. So I don't know if any edits in Frankish entries were actually lost. — Eru·tuon 07:38, 26 August 2020 (UTC)[reply]
I also did a lot of page history merging for Frankish and West Germanic entries. Even in case where individual edits can no longer be traced, I doubt there's much if any actual information that's no longer available. —Mahāgaja · talk 08:38, 23 July 2020 (UTC)[reply]
My point was that the new Reconstruction:Proto-West Germanic page does not have any term "baukan", nor does it appear anywhere on Wiktionary, nor does the Proto-West Germanic page even mention Frankish, even though Wikipedia still has some references to Frankish *baukan. The same is the case for many of the other deleted Frankish terms. —Enervation (talk) 07:19, 11 September 2020 (UTC)[reply]
For reference: Wiktionary:Votes/2020-01/Make Frankish an etymology-only variant of Proto-West Germanic, Wiktionary:Beer parlour/2019/December#Proposal: make Frankish an etymology-only variant of Proto-West Germanic. I am happy to have CodeCat/Rua desysopped, now as before, but their decision has been reinforced by the vote. If Frankish is attested in use, it is untrue that "Frankish is a dialect of Proto-West Germanic". --Dan Polansky (talk) 17:05, 23 July 2020 (UTC)[reply]
While we still recognized Frankish as a separate language, we listed only one attested lemma, ᚨᚾᚾ, which we now call Old Dutch. Wikipedia says that the source of that term, the Bergakker inscription, can be considered the earliest attestation of Old Dutch. —Mahāgaja · talk 17:59, 23 July 2020 (UTC)[reply]
Though Frankish properly existed and Proto-West Germanic be a theoretical reconstruction, what is devised as Frankish ends up having followed the same rules which one would use to reconstruct Proto-West Germanic, hence it is actually the same, but fans of Frankish reconstructions did not realize it. Better follow the practitioners who actually had to deal with creating a lexicon of all kinds of Germanic languages. Unlike the armchair linguists represented by Reddit the lexicographers here realized there is no difference. Point C is a non-sequitur. The attestation may as well mean Proto-West Germanic or Old Dutch or Old High German before any Second Germanic sound shift appeared. “Frankish” we could treat hasn’t a higher grade of existence than Proto-West-Germanic, maybe even less because it is arbitrarily made up as an ancestor of Old Dutch while it could as well be more like Old High German, so as an ambiguous name it had to be removed. I see Wikipedia distinguishes a Frankish language from Franconian languages but on Wiktionary Franconian and Frankish are put as synonyms, so you see it is all wishy-washy imagination. Interesting how Redditors are “infuriated” about things they have merely fancied. This confirms all memes. Fay Freak (talk) 00:40, 26 July 2020 (UTC)[reply]
Re "If Frankish is attested in use, it is untrue that 'Frankish is a dialect of Proto-West Germanic'", and the Redditors' point C: I don't see that the conclusion "it can't be [a dialect of] PWG" follows from the premise "it is attested". Although most languages with "Proto-" in their name are entirely reconstructed, this is not actually some kind of universal law—Proto-Norse is directly attested, for example, and accordingly has mainspace entries. - -sche (discuss) 05:56, 26 July 2020 (UTC)[reply]

Are variant spellings lemmas? edit

Are variant spellings lemmas? For example, I used the {{en-adj}} in creating the obsolete spelling doubtlesse. It is classified as a lemma. But it feels more like a variant form of doubtless. Should I use {{head|en|adjective form}}? Vox Sciurorum (talk) 14:24, 23 July 2020 (UTC)[reply]

@Vox Sciurorum: The consensus is to classify them as lemmas; almost all variant spellings are in lemma categories. J3133 (talk) 14:17, 24 July 2020 (UTC)[reply]
I suppose I was really asking, should they be lemmas? A bot can fix them if the consensus is wrong. Does being a lemma matter for anything other than which giant category the word is in? Vox Sciurorum (talk) 14:28, 24 July 2020 (UTC)[reply]
@Vox Sciurorum: My reply was about the variant spellings that you did not classify as lemmas, which is against the consensus, rather than whether they should be classified as lemmas. J3133 (talk) 14:36, 24 July 2020 (UTC)[reply]
@Vox Sciurorum: The alternative forms may have alternative inflections. For example the plural of pesciu is pesce, while the plural of pisciu will be pisce. Why shouldn't pisciu be a lemma? It possesses all the characteristics of one. Thadh (talk) 17:54, 24 July 2020 (UTC)[reply]
@Thadh: Don't you mean pisciu? Thanks Thadh (talk) 20:19, 24 July 2020 (UTC)
@Vox Sciurorum: These giant lists do have some merit - sometimes one discovers a systematic error in entries. The general rule I've seen is that inflection tables are always on the same page as the corresponding lemma. Simple inflection is stored in lemma head lines. In that respect, the distinction between lemmas and non-lemmas does serve man, or at least, Wiktionary author. Now, there are problems with controlling the provision of definitions. Last time I looked, the British distinction between 'draft' and 'draught' was rather missing. --RichardW57 (talk) 20:08, 24 July 2020 (UTC)[reply]
It can have its complexities - in one or two cases draft is used in British English, e.g. a draft of a document, so not all senses of draft are American spellings only. DonnanZ (talk) 20:47, 24 July 2020 (UTC)[reply]
I think the concept of lemma can be split into two parts: whether the entry has a full definition, and whether it has inflections. The current practice seems to be that any entry that either has inflections, or has a full definition, is a lemma. This vote explicitly classed comparative and superlative forms as lemmas, too, renaming their category from "adjective comparative forms" (a non-lemma naming convention) to "comparative adjectives" (a lemma naming convention), with each of them receiving their own non-lemma form category "comparative adjective forms". —Rua (mew) 11:50, 25 July 2020 (UTC)[reply]
That's not quite correct. The vote established a lemma-like naming convention for the categories of comparative adjectives and superlative adjectives, but they are still classified as non-lemma forms. For instance, νεώτερος (neṓteros) is in Ancient Greek comparative adjectives and Ancient Greek non-lemma forms. — Eru·tuon 22:54, 25 July 2020 (UTC)[reply]
How is the consensus that these should be lemmas reconciled with the grease pit discussion about adding categories to variant forms? Three comments said that alternative forms should not be categorized (i.e. should not be put in categories like en:Diseases, en:Grasses, English words with offensive senses). I didn't pile on but I have generally avoided categorizing variant forms. Is there a principled reason to classify them as lemmas but not classify them with meanings? Vox Sciurorum (talk) 13:08, 25 July 2020 (UTC)[reply]
British -ise spellings, which are already treated like second-rate citizens, should have lemmas, being the primary form in British English. I don't care what you do with American spellings. DonnanZ (talk) 00:03, 26 July 2020 (UTC)[reply]

Back to Letter entries edit

So I was wondering about whether the impossible amount of letter/noun entries could be reduced (and thus also deal with the Lua Memory Error problem), and I have stumbled upon this and this unfinished discussions as well as an unused vote regarding the issue. A lot has been proposed, and most contributors seemed to favour the idea of ending language-specific entries (either by moving them to another page or deleting them fully).

I propose we move all the entries to an appendix (like e.g. Appendix:Pronunciation of A and Appendix:Spelling pronunciation of A), links to which would be found in the Translingual entry. This way we eliminate the useless links within the page and at the same time do not discourage people from learning the spelling rules of the specific languages.

But most importantly, I think it's about time to finally solve this problem. In my understanding the main contributors were @Rua, @Daniel Carrero and @Metaknowledge Thadh (talk) 21:26, 23 July 2020 (UTC)[reply]

I pulled the vote because Rua's comments led me to question whether my approach was the correct one. I think her idea of treating writing systems as languages has a great deal of merit, but I'm not exactly sure how to implement it. —Μετάknowledgediscuss/deeds 21:31, 23 July 2020 (UTC)[reply]
I would personally prefer user guides for each language that would have the list of letters and the basic orthographic rules for their use, among other things. That way we could still have letter entries, but instead of a template that has Lua-formatted links to literally every letter in the alphabet, you would just have a single link to the user guide page.
The idea of having the same content transcluded to dozens of entries for each of hundreds of languages is totally unrealistic and doomed to fail from its own weight as we include more language sections.
It would be much better to have one page per language, and to also use it to lay out the currently unwritten reasons why there are no entries for "-'s" possessives in English or "-que" forms in Latin and why a lot of English "adjectives" are really nouns used attributively, etc. It's not enough having an "About" page telling editors the rules for making an entry- we need to have a page for each language that explains our practices to our readers. Paper dictionaries have introductory sections. So should we. </sermon> Chuck Entz (talk)
I agree about the introduction part; we ought to tell the users how to use this dictionary (and I would also include why we use the first-person verb in Latin and Greek);
However, I'm not sure it is an appropriate page to add the letter definitions to. Thadh (talk) 06:06, 24 July 2020 (UTC)[reply]
Each multilingual letter entry could link to a common page linking to the language-specific information for the letters of each language. The location of that information does not need to be standardised, though it may make sense to have families of templates {{xx-letter-info|letter}} for more direct access where appropriate. So the method of access would be: (1) click off page for language menu; (2) click off page for language; (3) click for letter. An alternative would be to have language-specific stubs for each language, so the access path would be: (1) click within page for language; (2) click for language-specific letter-data. The trouble with the second scheme is that it would increase the page size even for vowels; for consonants it would be a massive increase in size, as the vast majority of Roman script languages have the letter 'n'. A compromise is to have a template-generated per language look-up in each multilingual letter entry, but I fear that might be a memory hog; it would have to invoke hundreds of templates or modules, or load a very large data module. --RichardW57 (talk) 10:21, 24 July 2020 (UTC)[reply]
I'll give a slightly out of the way example to make sure I'm understanding the mooted solutions. I have just discovered that as well as the longer letter names in Pali such as kakāra, there are shorter names such as ka, and they are also inflected - I've quotes for four forms - bare stem, nominative singular, accusative singular and genitive singular. For nine Pali abugidas supported, the bare stem, which is our citation form, is the single letter. So, to take the Thai script instance, is it being suggested that ก (ka) would have a normal entry reduced to script-specific details for the interrogative pronoun (as at present) and a stub entry referencing a language-specific page for the one-character letter name and its inflections? Would the existence of a stub entry for the letter only be justified because in this case there is the interrogative pronoun? I suppose the boilerplate for a redirect could be written in a Lua-free template. --RichardW57 (talk) 08:28, 24 July 2020 (UTC)[reply]
The Pali Roman script entries for the consonant letter names - ka, kha, ga,... are part of a larger problem. Some of these pages have already reached breaking point. The Pali vowel letters will be joining the specific Roman script letter problem; so far I have only a little that is Pali-specific to record about them. I haven't found case forms for them yet, though there are some notable combining forms. --RichardW57 (talk) 08:28, 24 July 2020 (UTC)[reply]
So, to summarize the proposed solutions:
  1. We delete the letters (or nouns that stand for letters) altogether;
    Pros:
    * A whole lot fewer entries and a whole lot less work
    Cons:
    * The dictionary becomes incomplete in the long run, as the nouns are actual nouns.
    * The users have no way to determine how spelling in a specific language works
  2. We create a category for every writing system, that gives all the needed information about pronunciation per language. Thadh (talk) 18:29, 24 July 2020 (UTC)[reply]
    Revised to: We create a 'language header' for every writing system, under which we give all the needed information about pronunciation per language. --RichardW57 (talk) 11:23, 1 August 2020 (UTC)[reply]
    Pros:
    * We keep the entries, but the amount of memory needed is reduced Thadh (talk) 18:29, 24 July 2020 (UTC)[reply]
    That assertion needs explanation. It looks false to me. --RichardW57 (talk) 11:23, 1 August 2020 (UTC)[reply]
    * We keep the language-specific entries tidy and clean
    Cons:
    * Unsure how to arrange it (e.g. where to put the entry in the page)
    * There are still some links in the page, so not perfect
  3. We create a user guide per language, containing both the spelling pronunciation rules and why some entries are preferred/excluded (much like we do with About: pages)
    Pros:
    * Fixes both this problem and the problem with 'unspoken rules' of Wiktionary.
    * Is a centralised solution.
    Cons:
    * These explanatory pages may become too long, and thus difficult for Users to use
    * The user has to understand that the word he is looking for may be a letter.
  4. We create a pronunciation page per letter (possibly an appendix), that lists the pronunciation and inflection per language; linked to the multilingual entry
    Pros:
    * Is a centralised solution.
    * Is fairly easy to use.
    Cons:
    * Requires creating another page per letter-symbol, which doesn't exist yet.
    * The user has to understand that the word he is looking for may be a letter.
Please rephrase and add to the list above, as I am not very good with words. Don't hesitate to correct me if I misunderstood some of the solutions proposed or the implications that they carry. Thadh (talk) 18:29, 24 July 2020 (UTC)[reply]
For the sake of not forgetting this discussion with the start of August the day after tomorrow, and if no-one has anything else to add, shall we put this to a vote? Thadh (talk) 09:22, 30 July 2020 (UTC)[reply]
I have created the following vote: Wiktionary:Votes/2020-07/Removing letter entries except Translingual. I am not sure if I have done everything correctly though... Thadh (talk) 08:03, 31 July 2020 (UTC)[reply]
So, under option 1a (as on the vote page), if we delete 'the nouns that stand for letters', what happens to the English zeta? It currently has two glosses, the Greek letter and the Riemann zeta function. Would the first gloss be deleted? As things currently stand, there are 9 Indic scripts for which most letters will ultimately receive a Pali language entry saying that it is the short name of the letter and giving its inflection (or what is known of it). Under option 1b, would this entry be deleted (despite its being a noun), leaving its attested multicharacter inflected forms as orphans? --RichardW57 (talk) 10:10, 31 July 2020 (UTC)[reply]
The letter zeta is a noun that refers to a letter in another writing system. I'll be honest, I haven't thought of that yet, but this isn't the subject of the vote; likewise, I wouldn't yet delete entries of early Cyrillic and Glagolitic names, such as buky or az. They do not (yet) contribute to the problem and are very seldom duplicated, far less than words like auto.
And yes, under 1b all these Pali entries will be deleted, including the multicharacter inflected forms, as they are probably regularly inflected. Thadh (talk) 13:32, 31 July 2020 (UTC)[reply]
Do you mean 1a? You said we would keep aes under Option 1b. As to regularity, Pali o and u exhibit an irregularity in compounding, so I would actually expect an irregularity between 'crude word' form and inflectional stem! A word form like rassa could be a detached form of the word for 'short' or the genitive singular of the letter name ra (which is one letter in an abugida). (Pali letters are 'in work' as far as I am concerned, unrelatedly to the current issue.) A usage note in rakāra could serve as an orphanage. --RichardW57 (talk) 12:14, 1 August 2020 (UTC)[reply]
No, I specifically meant 1b, as "aes" is an irregular plural ('as' would be regular in my understanding). As for the irregular Pali inflections, the whole point of not deleting everything is ensuring that if one found an inflected form in a text, he would be able to determine the word's meaning. I don't have enough knowledge of Pali to judge the inflection's regularity, that would be up to people that work with Pali (I assume that includes you). Thadh (talk) 12:15, 2 August 2020 (UTC)[reply]
Under Options 2 to 4, do the noun forms (e.g. aes) referring to the letter but different from it still have main page entries? --RichardW57 (talk) 10:10, 31 July 2020 (UTC)[reply]
I would say that this depends on the case. For example, in Esperanto, the noun a is inflected regularly (a-oj being the plural); I would keep the specific word aes though, if only for the possibility that a user comes across this form and cannot determine that it is a plural of a letter. However, even so, we could just redirect them to the script entry/manual/pronunciation page the same way that Chinese entries redirect users to the Hanzi-formatted page. Thadh (talk) 13:32, 31 July 2020 (UTC)[reply]
So does Esperanto co (cee) get a main space entry? One can work out that one of the meanings of the two letter word is the name of a letter, and it is formed regularly from the letter. Switching to Thai, if the letter name ส เสือ (sɔ̌ɔ sʉ̌ʉa) were currently qualified for a main-space entry (I don't know why or if it doesn't), would it still qualify? One can, I believe, work out that it is the name of a letter, and therefore of which one. What about ณ เณร (nɔɔ neen)? It so happens that ณ is a preposition in Thai, so one can't work out that ณ เณร (nɔɔ neen) is a letter. This leads up to the issue of the obvious cognates of the predictable letter name kakāra (letter k), whose cognates in several languages have an etymology. I don't like the idea of these etymologies being relegated to appendices. --RichardW57 (talk) 11:57, 3 August 2020 (UTC)[reply]
I suspect I don't understand Option 2. As I read it, a possible implementation would be:
  1. Add each letter used (whatever that means) in language X to the category letters_used_by_X. Put the source text creating this categorisation in the multilingual entry.
  2. The text of the category page will then provide information on the letters.
  3. The multilingual entry for the letter will instruct the user on how to access the information for the letter in the context of a language.
As I understand it, Options 2 and 3 are compatible; the category pages from Option 2 could link to the user guide. --RichardW57 (talk) 10:10, 31 July 2020 (UTC)[reply]
I think you misunderstood. Option 2 gives the following example:
  • The letter A will be moved from the header "English" or "Spanish" to the header "Latin Script", where it will in turn give the further usages and notes.
At least, that is how I understood Rua's point. The upside of this would be that instead of a link for every language, just one link to the respective category (Latin Script letters) will be made. Thadh (talk) 13:32, 31 July 2020 (UTC)[reply]
I take it you are referring to, "My proposal was actually to treat scripts as languages, so that letters are listed and categorised under their script rather than as translingual" back at one of the older discussions. So there is no use of Wiktionary categories here, rather well over 100 new pseudo-languages, one per lettered Script. This was dealing with the objection to 'xh' being multilingual for the stated reason that it was only used in Albanian, which objection fails because:
  • It's also used in Xhosa and other languages.
  • It's the name in English of the digraph in other languages.
By link, I will assume you are referring to the referring part ('reference') of the link, rather than the anchor. So far as I am aware, we have no problem with having lots of anchors. So, what links would we be eliminating on the page k, for example? For most languages using the letter, we apparently eliminate 4 links from the ToC: language, 'pronunciation', 'letter', and 'see also'. However, we then end up with either an unnavigable mass of per-language information, or one that needs indexing in a ToC, in which case the gain vanishes. In fact, we've now added another link to the ToC, namely 'Latin script letters'. Or can we create subsidiary automatic tables of contents? I don't think we can. --RichardW57 (talk) 11:23, 1 August 2020 (UTC)[reply]
I think we may also see cross-references appearing from top-level per-language section to the letter section! That may add to the links from the ToC. --RichardW57 (talk) 11:23, 1 August 2020 (UTC)[reply]
I've clarified the terminology of Option 2; I believe I haven't changed its meaning from what was intended. --RichardW57 (talk) 11:23, 1 August 2020 (UTC)[reply]
I think we should split Option 2 into creating a language-level header and moving the language-specific information into the Multilingual section. There is a general issue with multilingual lemmas. A lot of those lemmas are speakable, and pronunciations obviously differ from language to language. --RichardW57 (talk) 11:23, 1 August 2020 (UTC)[reply]
Are kana letters? --RichardW57 (talk) 10:10, 31 July 2020 (UTC)[reply]
They would be treated the same as "proper" letters: the part of a kana entry that merely gives information on the fact that the symbol signifies the kana syllable (like the syllable "の") will be moved/deleted (depending on the option), while the entries that have actual content (like the particle "の") will be kept. Same goes for Canadian Syllabics, abugidas and other non-hieroglyphic writing systems. Thadh (talk) 13:32, 31 July 2020 (UTC)[reply]
'Hieroglyphic' presumably includes Chinese characters. Does it also include Babylonian cuneiform? (A sample of one suggests that the language-specific phonetics are being kept to the multilingual entry.) --RichardW57 (talk) 11:57, 3 August 2020 (UTC)[reply]

Vote structure

@Thadh Can we have two sets of options on one vote, one on which entries are affected and then one on what to do with them? I'm really not happy with expecting people to recognise the inflected form of a letter when they meet it. --RichardW57 (talk) 14:30, 10 August 2020 (UTC)[reply]

What do you propose? I am all for it, but at this point the number of possibilities seems far too large to be decided by one vote at all. I'm afraid it all comes down to the fact that most people just don't care enough to come forward with ideas. Thadh (talk) 14:37, 10 August 2020 (UTC)[reply]
@Thadh: We need to clarify which entries (possibly even senses) are to be removed or banned from main space. For example, for the letter α, the letter names ἄλφα and άλφα appear to be candidates for removal, but you seemed to think it obvious that alpha should remain. If we remove the letter name sense of modern Greek άλφα, we orphan its two recorded derivative senses. Letters in English have the same issues of orphaning - 'triple A candidate' and the like. I think we should only ban entries that are homographs of letters. That helps the alpha-related cases, but doesn't help with English 'triple A candidate'. --RichardW57 (talk) 11:19, 11 August 2020 (UTC)[reply]
I don't see why ἄλφα and άλφα should be deleted either. The letter names are all full-out names, rather than depictions of the letter. Now, what I proposed was deleting the Greek entry α (a) (as well as Albanian and Tsakonian). That was the point all along. Thadh (talk) 12:35, 11 August 2020 (UTC)[reply]
There's also the issue of inflected forms. English "b's" is easy for the English user to deal with, but some inflected forms are less obvious. For example, the 'r' letter of Pali has name ra and genitive singular rassa, a homophone of rassa (short). Both forms might survive in the Roman alphabet, but in, for example, the Devanagari script, the name र (ra) will probably be removed but I strongly believe the page for रस्स (rassa) should alert the user to the letter sense. If that retention is to be prohibited, I will vote against all the options for change. Or can we use some sort of hat note? --RichardW57 (talk) 11:19, 11 August 2020 (UTC)[reply]
I see the problem, and to be honest, I don't personally object to the inflected forms (when reasonable, as opposed to b's or c's) being kept; however, I also understand if other users don't agree with this opinion. Thadh (talk) 12:35, 11 August 2020 (UTC)[reply]
There's also the issue of implementation. Perhaps we need to make it easy to undo the changes if we find the resulting structure has too many problems. Removing letters assigned to languages without losing content also looks fiddly. We probably need warning templates between data being copied and its being deleted from the original location. --RichardW57 (talk) 11:19, 11 August 2020 (UTC)[reply]

Obsolete possessive forms in English

According to Wiktionary:English entry guidelines § Modern English possessives, entries for “possessive forms which are formed by adding the enclitics ’s or ’, and which are otherwise not idiomatic (with the single exception of the pronoun one’s)” are not allowed. What about obsolete possessive forms (which are not ’s or ’)? The page does not mention them. J3133 (talk) 03:15, 24 July 2020 (UTC)[reply]

You mean like e.g. kings for king's? De facto we don't include kings-type possessives any more than king's. (And king his would be treated as SOP.) Certainly, the page should be updated to cover this... - -sche (discuss) 06:03, 26 July 2020 (UTC)[reply]
@-sche: What about kinges (as a possessive form of king)? J3133 (talk) 06:06, 26 July 2020 (UTC)[reply]
De facto, we don't include such possessives, nor e.g. kingis (google books:"the Kingis Majeste") or any other obsolete or modern defective spellings of the [regular] "'s" (or "'") possessive (in this case, king's). - -sche (discuss) 06:20, 26 July 2020 (UTC)[reply]
However, if a genitive were formed in an irregular (to English) way, not by "'s" or "'" or a form thereof, I think that would merit discussion. For example, I find some limited evidence of Jesu as the genitive of Jesus, following Latin (see Citations:Jesu). If that is attested, I'd think there'd be a good case for keeping it... - -sche (discuss) 06:46, 26 July 2020 (UTC)[reply]
We include Swedish genitives which are regularly formed by adding s. We should include similar archaic English possessives. Vox Sciurorum (talk) 11:22, 1 August 2020 (UTC)[reply]

Terms exclusively found in expressions?

I’m sure this has been asked before, but I couldn’t find a guideline about this, so feel free to direct me there.

What is the proper way of showing that a term only exists in a certain phrase? For some entries I have written a usage note about it, e.g. fårakläder or morse and that works pretty well if the word has a meaning on its own. But I just created i lönndom, and in this case you cannot really say that lönndom has any meaning on its own, so I would be unsure what to write in a separate entry. Can you do a redirect, or is there a template to use in the definition with a soft redirect of some sort? Thanks! --Lundgren8 (t · c) 13:07, 25 July 2020 (UTC)[reply]

I normally give the term an etymology and set the header to the term. (for example pesciu ruvettu). Thadh (talk) 13:19, 25 July 2020 (UTC)[reply]
Lönndom appears outside the phrase i lönndom in song lyrics by the Swedish folk metal band Falconer. The album is a retelling of Scandinavian folk tales so possibly they are trying for an archaic style. Or it could be a back formation, not considered correct use. But it has been used alone. For the question you asked, in camera may show the right way to do it. There is no link to camera because the phrase is not derived from English camera. The head is coded {{en-adv|-|head=in camera}}. Vox Sciurorum (talk) 13:28, 25 July 2020 (UTC)[reply]
Yes, it appears to be a reanalysis in their cases. They use it as a normal noun on -dom, whereas older usages exclusively use it in prepositional phrases, as shown in SAOB. --Lundgren8 (t · c) 18:39, 25 July 2020 (UTC)[reply]
Thanks for mentioning SAOB. I found the definition there. I assume uttr. is short for uttryck. Is that abbreviation standard enough to have a definition here? Vox Sciurorum (talk) 22:53, 25 July 2020 (UTC)[reply]
Spontaneously I’d say no. SAOB uses a lot of abbreviations not found elsewhere which makes it hard to read in the beginning. They also use ‘l.’ for ‘eller’ and ‘ss.’ for ‘såsom’ which I have never seen elsewhere. On a side note I noticed that they list the Falconer usage as definition c) but it’s marked as both ‘rare’ and extinct with one citation from the mid 19th c. --Lundgren8 (t · c) 06:20, 26 July 2020 (UTC)[reply]
The template is {{only used in}}: Fay Freak (talk) 13:38, 25 July 2020 (UTC)[reply]
Thank you very much. This is along the lines of what I was looking for! --Lundgren8 (t · c) 18:39, 25 July 2020 (UTC)[reply]
How would commentators here feel about the use of {{l|la|camera}} in {{en-adverb|head=[[in]] {{l|la|camera}}}}? DCDuring (talk) 22:26, 25 July 2020 (UTC)[reply]
It would have to be |head={{l|la|in}} {{l|la|camera}}, since it also has the Latin in rather than the English one, but I don't like the idea. The Etymology section is the place for that. —Mahāgaja · talk 22:30, 25 July 2020 (UTC)[reply]
I agree in this case the etymology section does the job. Vox Sciurorum (talk) 22:31, 25 July 2020 (UTC)[reply]
Okay, I have used it in Translingual because otherwise Translingual is linked; but as Mahagaja says one can also have no links, overwriting the default linking behaviour by specifying the title in |head= without links, and use the etymology section for links. This seems redundant however, especially as one should expect to land at a Latin page if one is at Translingual so one can make it less wordy. Fay Freak (talk) 22:34, 25 July 2020 (UTC)[reply]
Incidentally, regarding {{only used in}} and Translingual: I recall discussing (with DCDuring IIRC) at some point in the last year whether to create {{only used in}} entries for e.g. the parts of a species name (and whether it makes a difference if a part is used in >3 vs <3 species' names)... - -sche (discuss) 06:49, 26 July 2020 (UTC)[reply]
@-sche: Well, if it is used in three species names one cannot use {{only used in}}, because this template accepts only one longer phrase. It looks bad if used on multiple lines (as one line says only used in and then another line says the same, which is a contradiction), and a template that accepts more phrases could not possibly be used on one line when, say, there are thirty derived terms: obviously you’re not gonna put them all in a definition line but put them into derived terms. At the same time, it is disagreeable to have an entry because of derived terms in a derived terms section if the part is not used independently. So I don’t know what to do if a term is “only used in > 1 ∈ ℕ” expressions. Fay Freak (talk) 12:42, 26 July 2020 (UTC)[reply]

Are languages proper nouns or common nouns?

Some previous discussions: Beer parlour 2015, Beer parlour 2016, Tea room 2018

Currently, Wiktionary is inconsistent regarding this. Some examples:

Common nouns: French, German, Spanish.

Proper nouns: English, Dutch, Esperanto.

J3133 (talk) 07:02, 26 July 2020 (UTC)[reply]

I would like to treat languages as common nouns. For instance, English is also an adjective. But there is a hard core of resistance against that unfortunately. They are not proper nouns in the same way as people's names, place names, names of businesses and organisations etc. are. DonnanZ (talk) 08:49, 26 July 2020 (UTC)[reply]
This has been discussed before; perhaps someone will find the links. Most languages are (or at least, were for a decade — I haven't checked to see if anyone has changed things in the last few months) categorized as proper nouns, e.g. Abaza, Abkhaz. A few users relatively recently changed a few entries to common nouns, which you've noticed. I was undoing those changes when one user asked me to stop and have the most recent prior discussion of this, which unfortunately petered out without much of a resolution. Perhaps we should have a vote. (Another difficult question that has been discussed: are religions proper or common nouns?) - -sche (discuss) 09:57, 26 July 2020 (UTC)[reply]
In Nordic languages, they are considered common nouns as they are not written with a capital letter. Same thing with e.g. holidays. --Lundgren8 (t · c) 12:16, 26 July 2020 (UTC)[reply]
Correct. I'm sure that I have pointed that out before. DonnanZ (talk) 15:07, 26 July 2020 (UTC)[reply]
Common nouns. My French is better than your French, and his French is an even shoddier French. This is a usage that does not usually go well with proper nouns. Fay Freak (talk) 13:07, 26 July 2020 (UTC)[reply]
"Trump's America is more dangerous than Biden's America" (or vice versa, according to the former's ads). (On the question of what part of speech religions are, I note they can be used the same way, e.g. "His Christianity is more militant than my Christianity, and her Christianity is an even more extreme Christianity.") - -sche (discuss) 19:29, 26 July 2020 (UTC)[reply]
And for that matter "There are three Marys in my class" and "He's a whole new Jonathan". Proper nouns can always behave as common nouns under the right circumstances, and there really is no clear-cut distinction between the two types. Capitalization is a red herring, since it differs from language to language (and of course a whole lot of languages are written in scripts that don't have letter case), and isn't always a reliable indicator even within a single language (in English, Frenchman is a common noun, not a proper noun, but it's always capitalized, while k.d. lang is a proper noun, not a common noun, but it's written lower case). —Mahāgaja · talk 21:22, 26 July 2020 (UTC)[reply]
If you see these language names as proper nouns, then the entry language itself needs to have a proper noun entry. But it isn’t a proper noun because it is a diverse phenomenon, and this is true for every individual language as in the examples. When I say “English” then I do not mean an idealized standard but an amassment of regiolects, sociolects, and idiolects. Therefore it is no proper noun, because you cannot pinpoint it. If it were otherwise you could make hydrogen, calcium or nickel proper nouns. Fay Freak (talk) 23:41, 26 July 2020 (UTC)[reply]
The names of some languages are also surnames, and thus proper nouns in that sense - English, French, German, Welsh - all of these are also adjectives. A proper noun is also called a proper name, and looking at the Oxford definition I see only names (spelt with an initial capital letter) mentioned as being proper nouns, there is no mention of languages, nor of Mahagaja's example, k.d. lang. The use of lower case for email addresses may be a contributory factor there. DonnanZ (talk) 08:39, 27 July 2020 (UTC)[reply]
k.d. lang is undoubtedly a proper noun, as it is a name; it shouldn't be in a dictionary, and nor should Ice-T or X Æ A-12. Those are names that are specific to one person. Thadh (talk) 10:55, 27 July 2020 (UTC)[reply]
Individuals are, of course, "Wikipedia stuff", but I have no objection to including the likes of Winston Churchill at Churchill. That can be useful. DonnanZ (talk) 11:32, 27 July 2020 (UTC)[reply]
I didn't mean to imply we should have an entry for k.d. lang; my point is merely that even within English, not all capitalized nouns are proper nouns, nor are all proper nouns capitalized, so we can't use capitalization as a test to determine whether a given noun is proper or common. —Mahāgaja · talk 11:54, 27 July 2020 (UTC)[reply]
No, but with English spelling conventions proper nouns should be capitalised, and normally are; again by convention so are nationalities and ethnic and religious groups, and adjectives relating to them. Harking back to languages, Old English and Old Norse are classified as mass nouns by Oxford; these two, of course, have the adjective old capitalised by convention, which doesn't normally happen with words like anti-American. DonnanZ (talk) 13:12, 27 July 2020 (UTC)[reply]
The arguments advanced for languages being common nouns in English have each been addressed. Are there any other arguments?
I don't see that English, referring to the language, is an adjective either.
As for Lexico, I note that the leading usage example in their entry for English uses Englishes, a plural (as Donnanz did above). I don't think the lexicographer who worked on that entry bothered to look at their own definition of mass noun.
I don't care what lexicographers in other languages do about word classes in their languages and don't think it need trouble us with regard to English L2 sections. DCDuring (talk) 02:21, 29 July 2020 (UTC)[reply]
@DCDuring: I don’t see how what I have said is refuted or wholly addressed. What now about that you cannot pinpoint a language? Does it ever occur that proper nouns are mass nouns? Because masses is what languages are. Fay Freak (talk) 14:36, 29 July 2020 (UTC)[reply]
"My Porsche is better/faster/prettier/sexier than your Porsche." It may be that some usage of brand names and some usage of language names are "common", but when used of the brand or language as a whole, we have unique entities. But that is a feature of many proper nouns. "Mary Pickering is one of the Duckworths." "The Charleston Duckworths?" "No the Charlotte Duckworths." NB, not every Duckworth is named Duckworth. Duckworth is like the names of a Roman gens, the geographic modifier is like a cognomen.
You're dwelling on the semantics, when a grammatical class is what's under discussion. Almost any English noun can be used both countably and uncountably. A relevant test for countability is pluralization and modification by the discrete quantifiers (a, one, two etc). A relevant test for uncountability ("massiness") is modifiability by determiners like much. When English is modified by much it may refer to English speech or English writing or speech or writing by some population of speakers or writers of English. I suppose we could make definitions or subdefinitions for each of those, but I think these usages are ellipses, omitting the uncountable nouns like speech, writing, and, possibly, population. DCDuring (talk) 15:50, 29 July 2020 (UTC)[reply]
Re "pinpointing": I would say it's broadly as clear what entity a language name refers to as it is what entity a region name refers to, or even as it has historically been what entity a 'country' name refers to. For some languages like Dothraki the corpus is finite and its borders (though changeable) are thoroughly delimited the way the boundaries of the Vatican City are. For other languages and places the precise borders may be harder to delineate, like whether (or which part of) Polari is English or where exactly the precise limits of Media or Cathay were, as to whether some particular word or acre would've been included — but the referent is still a specific identifiable entity.
I lament that there are few high-quality resources on what is vs isn't a proper noun that could be consulted for their opinions here; as I commented in a previous discussion, even many college-level grammar/language textbooks equate the distinction with 'whether the noun is capitalized', which is facile, spurious. - -sche (discuss) 19:41, 31 July 2020 (UTC)[reply]
Can you pinpoint Las Vegas, in common usage (not de jure)? The de jure limits mean that many tourists to Las Vegas never enter the city, as neither the airport nor the Strip are in the de jure limits. It is an extreme case, but I think most cities, in common usage, have similar vagueness, where which urban and suburban areas are part of the city may vary wildly in various opinions.--Prosfilaes (talk) 15:44, 2 August 2020 (UTC)[reply]
I tracked down some of the prior discussions (there was at least one other significant one that I didn't find yet) and added links to them at the top of this section (as is, IME, a usual way to present such links). The related discussion of whether religions are proper nouns was Wiktionary:Beer parlour/2015/July#are_religions_nouns_or_proper_nouns?. - -sche (discuss) 20:05, 30 July 2020 (UTC)[reply]
I undid the changes which made the three languages mentioned at the top of this section into common nouns, since they departed from the overwhelmingly most common, longstanding, standard practice of treating languages as proper nouns, and there is no consensus above for changing languages to common nouns. If you spot other entries which were changed to common nouns, point them out. - -sche (discuss) 20:35, 2 August 2020 (UTC)[reply]

Status of [h] as an allophone of /x/ in Old English

I've noticed in most (if not all) Old English entries for words containing <h> that it's loosely transcribed into IPA as /x/ with [h] in narrow transcription. Take hūs for example. Why is /x/ analyzed as the underlying phone when the Old English Wikipedia entry clearly states that it along with [ç] are allophones of /h/ when in coda position? GabenInABox (talk) 22:23, 27 July 2020 (UTC)[reply]

@GabenInABox See Module talk:ang-pron, where there's a discussion that led to this. Benwing2 (talk) 03:15, 28 July 2020 (UTC)[reply]