Wiktionary:Beer parlour/2012/January

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

first noun of a noun-noun compound is not (necessarily) an adjective

Forgive me if this is the wrong forum for this; I'm only a casual Wiktionary user. I've noticed entries for "Adjective (not comparable)" for many words which are not adjectives, but which could easily be construed as adjectives, since they commonly occur as the first noun in a noun-noun compound. Some examples:

I know this distinction can be a bit subjective sometimes (especially for materials like acid/bamboo/etc), so before I go editing like mad, I wanted to know if there is a policy on this. —This comment was unsigned.

Can someone provide an example of a word which has been correctly marked as such? -- 17:38, 5 January 2012 (UTC)
What, a word marked as a noun, you mean? Mglovesfun (talk) 17:40, 5 January 2012 (UTC)
I have for some time advocated grammatical tests as the principal means to determine whether a word fell into a given PoS category. For adjectives, see WT:English adjectives. I think there is some agreement about this in the sense that a noun whose sole adjectival trait is that it is used attributively but whose entry has an Adjective PoS section usually gets that section removed when challenged at WT:RFD, which is our standard forum for handling such matters. One very significant proviso is that, if the term is used attributively with a meaning that does not clearly and directly correspond to a legitimate noun sense, then an Adjective section containing that sense should remain. An example, I think, of this proviso in operation would be acid#Adjective, for which at least the acid rock sense seems to me to be distinct from any noun sense that comes to mind. With the possible exception of a cappella, each of the others seems, at first blush, worth an RfD challenge to the Adjective section IMHO. DCDuring TALK 19:06, 5 January 2012 (UTC)

Middle Spanish

¶ Hullo. I wou’d like to creäte entries for Middle Spanish, but we do not possess the necessary categories or an index for Middle Spanish; there does not seem to be an ISO code for it according to its Wikipedia article, so an appendix may be necessary, unless it is possible to make our own code, as for Simple English. In essence: I desire to start entries for a particular language but we do not have the necessary resources right now, so I wou’d like to ask if somebody cou’d please provide them for us. I thank you. --Pilcrow 02:22, 3 January 2012 (UTC)

Considering it wasn't spoken so long ago, does it really merit separate treatment? —CodeCat 02:48, 3 January 2012 (UTC)
I'd create Middle Spanish entries under the Spanish header and tag them "obsolete" if they're distinct from Modern Spanish. The differences aren't as great as between Middle English and Modern English, especially not in the written language. —Angr 18:48, 4 January 2012 (UTC)

Norwegian Bokmål/Nynorsk

Why are there separate headers for Norwegian, Norwegian Bokmål, and Norwegian Nynorsk? I propose that these be merged under the common header 'Norwegian' and indicated as either Bokmål or Nynorsk when necessary. --JorisvS 16:15, 3 January 2012 (UTC)

They inflect differently though, so it isn't as easy as having two context tags Bokmal and Nynorsk. -- Liliana 06:40, 4 January 2012 (UTC)
Okay, but since when are we in the habit of having multiple headers for one language, even if context tags aren't sufficient to handle the differences? Note also that there exist not two, but three different headers for Norwegian. --JorisvS 11:22, 4 January 2012 (UTC)
I know almost nothing about the issue, but it is unresolved here; the templates {{nb}} and {{nn}} are often used under the Norwegian header (code {{no}}). Most of the Norwegian entries in User:Yair rand/uncategorized language sections/Not English aren't uncategorized, they just contain the 'wrong' language code. So some sort of real resolution would be nice. Mglovesfun (talk) 11:52, 4 January 2012 (UTC)
Yes, that's why I started this discussion. I think it's obvious that these should be under a common header and that this header should be ==Norwegian==. I'm not knowledgeable enough about Norwegian to have an opinion about the remaining issue(s) (inflection, as I understand it).--JorisvS 13:57, 4 January 2012 (UTC)
Arguably it should be the opposite way around, have separate headers for Bokmal and Nynorsk (thus eliminating the common Norwegian header). -- Liliana 16:28, 4 January 2012 (UTC)
Why? Why separate headers for what is essentially the same language? --JorisvS 20:07, 4 January 2012 (UTC)
The question of whether Bokmål and Nynorsk are different languages is certainly non-trivial. Given that they have separate Wikipedias and separate ISO-639-1 codes, I'd keep them separate unless native speakers argued otherwise.--Prosfilaes 09:36, 5 January 2012 (UTC)
They are not just different spelling forms but they also have different words in some cases, such as Bokmål dere and Nynorsk dykk. From what I understand, the two standards are based on different dialects, with Bokmål being based mostly on the urban dialects of Oslo and Nynorsk centered more around the west coastal area. Maybe we can look at how other Wiktionaries solve this problem. I know that Dutch Wiktionary treats Bokmål as 'Norwegian' and has Nynorsk as a separate language. —CodeCat 12:38, 5 January 2012 (UTC)
That's quite biased, though. -- Liliana 13:36, 5 January 2012 (UTC)
That's true, but in everyday practice most people who learn 'Norwegian' as a foreign language learn Bokmål, and never encounter Nynorsk at all. So we could either perpetuate this existing bias, or be correct at the cost of possibly confusing our users. —CodeCat 13:39, 5 January 2012 (UTC)
I support this bias as well. Unless it's Nynorsk, we are talking about Norwegian. Google Translate works with Bokmål but calls it Norwegian. If the words are spelled identically, mark them as Norwegian, otherwise add Nynorsk:
Translation of autumn into Norwegian:
* Norwegian: {{t+|no|høst|m}}
*: Nynorsk: {{t|nn|haust|m}}
In short, I support having two headers (Norwegian and Nynorsk), or a single merged Norwegian header where practical. Bokmål should be merged into Norwegian and {{nb}} should not be used, only {{no}} and {{nn}} in some cases. --Anatoli (обсудить) 00:37, 6 January 2012 (UTC)
Are they really sufficiently different to hamper intelligibility? As I understand it these are different standard languages with separate language codes. We have another situation where there are different standard languages with separate language codes: Serbo-Croatian, whose standards were merged some time ago. --JorisvS 18:35, 5 January 2012 (UTC)
We just need to have n-nn and n-bo and, in the case that it is not known, work on updating them into one or the other, and not allowing any new entries that are n-unspecified. Norwegian is a special case but how to treat it is not: it is universally treated as two languages on every operating system, translator, website, or Wikipedia I have ever seen. They just happen to be spoken very similarly to the point of mutual intelligibility, even more pronounced than Chinese dialects. Lucifer 18:52, 5 January 2012 (UTC)
Chinese dialects are actual dialects, though. Bokmål and Nynorsk are just different spelling systems, you can't really 'speak Nynorsk', even though some people still try. The spoken language and the written language aren't necessarily related. Many people speak Norwegian dialects, and might write in Bokmål even though Nynorsk more closely matches their dialect. And in the same way, urban people who speak a dialect that more resembles written Bokmål might still prefer to write in Nynorsk (although that's rare). —CodeCat 19:02, 5 January 2012 (UTC)

Please note that we're primarily a written dictionary, not a spoken one. Thus spelling differences are of much greater importance to us than pronunciations across dialects. -- Liliana 00:22, 6 January 2012 (UTC)

But why would we treat different spellings as separate languages? Would you support treating Pinyin as a separate language from Mandarin? Or Cyrillic Serbo-Croatian as separate from Latin Serbo-Croatian? The fact that both are standardised shouldn't matter either; the Valencian standard is distinct from standard Catalan, but we call both Catalan (although that's a debate in itself). Or what about Simplified Chinese and Traditional Chinese, which is actually very similar to the Bokmål-Nynorsk issue? In the end, what Wiktionary represents is a language. Bokmål and Nynorsk are not languages, they are different representations of one group of languages called Norwegian. —CodeCat 00:58, 6 January 2012 (UTC)
We merged Romanian and Moldavian, Serbo-Croatian varieties, so the same could be done with Norwegian and Albanian forms. --Anatoli (обсудить) 01:17, 6 January 2012 (UTC)
Maybe it also helps to look at how Norwegian Wiktionary itself treats Norwegian. They treat it as one language, but add qualifiers after words when necessary to specify whether the form is Bokmål, Nynorsk or both. This implies that Norwegian speakers themselves treat it as one language, not two. —CodeCat 01:22, 6 January 2012 (UTC)
True. The Chinese also treat Mandarin and Chinese as one language ({{zh}} links to Wiktionary, which is entirely in Mandarin {{cmn}}) but that's a different story. --Anatoli (обсудить) 01:41, 6 January 2012 (UTC)
When this topic was last discussed, almost a year ago, the consensus was to treat them as two languages: Norwegian Bokmål and Norwegian Nynorsk (Wiktionary:Beer parlour archive/2011/February#Norwegian headings). However, nobody was volunteering to sort out the existing entries. (Here is one example.) The user who most actively supported two headings at the time was Njardarlogar, who has made some 700 edits in the last year, primarily creating new entries for Norwegian Nynorsk. --LA2 23:38, 8 January 2012 (UTC)

Using one header is just confusing; it will lead to tags here, tags there, tags all over. Some words are more relevant in one language form than the other, and many words have no equivalents at all in the other language form. I cannot think of any Norwegian [language] dictionary that ever contained both Nynorsk and Bokmål; that would be pointlessly messy. I support the previous consensus, which landed on splitting the two language forms completely. This would leave the header Norwegian for dialectal words only. Regarding similarity, the same argument can be used for all the Scandinavian languages; they are very similar. Njardarlogar 10:09, 9 January 2012 (UTC)

AWB access

I would like to use AutoWikiBrowser to extract audio file names from the articles in this category. Can an admin please add me to the check page? I am an admin on English Wikipedia. I don't intend to make any changes, just browse the category. Thanks. Ganeshk 02:14, 5 January 2012 (UTC)

Why do you want to do that? Mglovesfun (talk) 13:46, 5 January 2012 (UTC)
For use with translation to the Tamil Wiktionary. Please see the request here. I would use the AWB access to extract the content into a CSV file and allow the Tamil Wiktionary folks to upload them. I plan to use custom modules as shown in examples here. Ganeshk 12:02, 6 January 2012 (UTC)
When you say 'extract', do you mean 'remove' or something else? Mglovesfun (talk) 15:25, 6 January 2012 (UTC)
I would parse each page in the category and regex scrape the audio file name and append it to a csv file on my computer. The page will then be skipped with no changes. Nothing will get removed from the page. Ganeshk 17:21, 6 January 2012 (UTC)
I see, OK; consider it done. Mglovesfun (talk) 17:24, 6 January 2012 (UTC)
Thanks! Ganeshk 00:42, 7 January 2012 (UTC)
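
The read-only extraction described in this thread (scrape the audio file name from each page with a regular expression, append it to a CSV file, change nothing on the page) can be sketched roughly as follows. This is a minimal illustration in Python rather than an AWB custom module; the template shape {{audio|file name|label}}, the sample wikitext, and the function names are assumptions made for the example, not details taken from the request.

```python
import csv
import io
import re

# Assumed template shape: {{audio|<file name>|<label>}}. Real pages may differ.
AUDIO_RE = re.compile(r"\{\{\s*audio\s*\|\s*([^|}]+?)\s*[|}]", re.IGNORECASE)


def extract_audio_files(wikitext):
    """Return every audio file name found in one page's wikitext."""
    return AUDIO_RE.findall(wikitext)


def append_rows(out, title, wikitext):
    """Write one (page title, file name) row per match; the page text is only read."""
    writer = csv.writer(out)
    names = extract_audio_files(wikitext)
    for name in names:
        writer.writerow([title, name])
    return len(names)


# Hypothetical sample page; an in-memory buffer stands in for the CSV file.
sample = "===Pronunciation===\n* {{audio|en-us-anachronism.ogg|Audio (US)}}\n"
buffer = io.StringIO()
count = append_rows(buffer, "anachronism", sample)
```

Writing to an in-memory buffer keeps the sketch self-contained; a real run would open a CSV file in append mode and loop over the pages in the category.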


Was the pronunciation that I added right? I am not sure. Lucifer 18:26, 5 January 2012 (UTC)

No idea, but I'd recommend Talk:anachronism for this sort of question, you can also use {{rfv-pronunciation}} which links to the talk page. Mglovesfun (talk) 19:45, 5 January 2012 (UTC)
It's not possible to say whether your pronunciatory transcription was correct without the intended accent being denoted (by {{a}}); however, if you intended to give an RP transcription, you were correct except for the secondary stress. I've tweaked the transcription per the OED [2ⁿᵈ ed., 1989]. BTW, as Martin notes, this isn't really the forum for this; at the very least, this is more appropriate to the Tea Room. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 21:05, 5 January 2012 (UTC)
The audio is accurate, yes. —Internoob 04:37, 7 January 2012 (UTC)

More languages to add?

I've been recently working to improve the coverage of the languages of Oceania on Wiktionary (which is generally pretty bad), and I realized that we have no words at all for a bunch of languages. These are living languages that should have around a few thousand native speakers each.

  • Wallisian {{wls}}
  • Anuta {{aud}}
  • Tikopia {{tkp}}
  • Rennellese {{mnv}}
  • Pukapukan {{pkp}}
  • West Futuna (often called Futuna-Aniwan, but Wiktionary already calls Futunan 'East Futunan') {{fut}}
  • Niuafo'ou {{num}}

Should I add them, and if so, how? Metaknowledge 20:59, 8 January 2012 (UTC)

Add them as you have been doing. What in more detail are you asking? How to look up ISO 639 codes? How to add languages that don't have ISO 639 codes? Mglovesfun (talk) 21:28, 8 January 2012 (UTC)
Sorry for any confusion. I want to know the format for making the Category:Language name page and for any necessary templates. Also, some languages have different codes in ISO 639-1, ISO 639-2, and ISO 639-3, and I want to know which to use. Metaknowledge 21:55, 8 January 2012 (UTC)
Added those codes I could figure out. -- Liliana 21:58, 8 January 2012 (UTC)
The ISO 639-1 code is used if there is one, otherwise ISO 639-3 is used, I think. (A quick way to determine a language's code is to type the name into the language field of the "Add translation box".) The format for Category:Language name is {{langcatboiler|language code}}. The countries in which the language is spoken can optionally be added as parameters two and on. --Yair rand 22:15, 8 January 2012 (UTC)
I've also added Pukapukan's code to the list. --JorisvS 16:25, 10 January 2012 (UTC)

Categories in need of cleanup

Many pages are in the request category, but need no longer be there.

http://en.wiktionary.org/wiki/Category:Requests_(English) —This unsigned comment was added by Dragonh4t (talk • contribs) at 01:54, 10 January 2012.

As in how? Explain. -- Liliana 06:13, 10 January 2012 (UTC)
Many of the articles in the categories have definitions or etymologies
People sometimes put them in those categories because they think that the definitions and etymologies are incomplete even if they're not missing entirely. —Internoob 23:51, 10 January 2012 (UTC)
Yeah, I know that, but what about words like deep or ad? The definitions seem fitting. Also, where can I go to learn more about editing? I know there are pages on it, but this is my first time using any web design type thing.

Category:Old South Arabian place names

It seems these are duplicates: Category:Old South Arabian Place names and Category:Old South Arabian place names (the P is different). And shouldn't it be Category:sem-srb:Place names? Would someone like to tidy up? --MaEr 18:57, 10 January 2012 (UTC)

Tabbed Languages trial is over

The admin-only Tabbed Languages trial has come to an end. For those who still want to use it, it is available opt-in in the Gadgets section of Special:Preferences.

So what's next? A vote on whether to enable it by default for all users? More testing? --Yair rand 22:21, 10 January 2012 (UTC)

I like it but I would like if it didn't switch to English automatically. Often when I'm working on a language, I would prefer to see that language each time and not have English pop up every time... —CodeCat 01:03, 11 January 2012 (UTC)
Okay, I've lowered the priority of English and Translingual, so that the "remembered" language takes priority over them, but English and Translingual are still higher up than targeted translations languages. Does anyone object? --Yair rand 23:29, 19 January 2012 (UTC)
I'm not sure what you mean with targeted translation? —CodeCat 23:45, 19 January 2012 (UTC)
I listed the old hierarchy at #Default tabbed non-English language. The new hierarchy places the "remembered" language two places higher. Targeted translations languages refers to the languages selected using the little "Select targeted languages" button at the top of translation tables. --Yair rand 23:32, 22 January 2012 (UTC)
Oh I see, thank you. —CodeCat 23:35, 22 January 2012 (UTC)
There was a bunch of feedback (including specific suggestions) in various sections of this page and elsewhere. I wish I had time to collate it: perhaps someone can? —msh210 (talk) 02:27, 11 January 2012 (UTC)
List of suggestions (I probably missed some):
  • msh210 suggested that each language's content should start vertically positioned near the language name.
    I meant that specifically for when the language header is clicked to load the language's content, not when linked to from elsewhere. That way, the content is near what was clicked to get to it, and no scrolling is necessary. If linked to from elsewhere, then the content should be on top, as it is. —msh210 (talk) 17:35, 11 January 2012 (UTC)
  • DCDuring suggested that we should be able to select whether the Translingual or the English section merits priority placement, perhaps by placement of a template.
    • My own opinion on this is that English should just always be given higher priority, but a vote on that failed, so...
    • (Another point: The way the script is currently written, if there's both an English and a Translingual section on the page, and the English section is above the Translingual section, the English section is displayed at the start.)
  • Mzajac suggested making the standard page index (I assume this means the TOC?) float at the top-right, only showing sub-section links for the currently-selected language.
    • My personal opinion on this is that this would cause problems for our existing right-floated content, and probably wouldn't be worth it.
      • I've had the TOC floated top-right for years, and it doesn't cause any problems (it does reveal when some right-floated content is out of order, which I routinely fix). Floating the TOC/tabs on the right resolves the wasted space and misalignment issues with the tabs, and collapsing the TOC's non-displayed language sections would simplify the TOC for the reader, while retaining the section links for a very long entry. This would be a good combination of existing and new features, a less jarring design change, and make switching between the two schemes much smoother for new and experienced readers and editors. Michael Z. 2012-01-18 16:10 z
  • Doremítzwr suggested including a show all / hide all toggle atop the column of language tabs.
  • Saltmarsh suggested shrinking the language names on the tabs, especially those not at the focus, to allow more horizontal space for the substance of an entry.
  • And a suggestion from Codecat: "I think it would be a good idea to place the tabs horizontally in the place where the page name displays now. Since we use headword lines, we don't actually need the page name to be there anyway..."
I am hesitant to make substantial design changes to Tabbed Languages at this point, since the current design was made by an actual professional designer (WMF Senior Designer Brandon Harris, AKA User:Jorm) and I'm rather afraid that if we start fiddling with lots of things without a designer helping, the result will be very messy. I've asked on Jorm's talk page if he'd be able to participate in the discussion here about changes to the design. --Yair rand 04:05, 11 January 2012 (UTC)

Stale requests for cleanup

diegesis and other pages have request for cleanup links that when clicked on reveal there is no entry for that word in the RFCU page. How does this happen? Did the person who put the RFCU link in the word page forget to add an entry on the RFCU page? Or do the entries on the RFCU page age and disappear? Can I delete the RFCU tag on the word's page when this happens as there's no longer any way to tell what the requester originally wanted? -- dougher 04:07, 11 January 2012 (UTC)

Very many people add the tags without ever opening a discussion header on WT:RFC. In many cases it's obvious what needs to be fixed, in this one it wasn't (I fixed it anyway). Be sure to look at the history, the tag may just have been added years ago with nobody coming along to remove it once the page is fixed. -- Liliana 06:39, 11 January 2012 (UTC)
...and sometimes there was discussion at RFC and the discussion, never resolved, was archived/deleted anyway. Sometimes whatlinkshere (e.g.) will help find such conversations. Incidentally, it's RFC: RFCU is something else entirely.​—msh210 (talk) 17:40, 11 January 2012 (UTC)

Splitting the Beer Parlour

A while ago I suggested using subpages for different BP discussions, so that they could be more easily followed. That never went anywhere so I'd like to suggest something else instead. It's obvious that the BP is very busy and it's hard to follow discussions because many older discussions are missed out on when new ones are added. Splitting it into two or more distinct discussion pages would slow down the rate of posting somewhat and would make it easier to keep track of discussions, which would in turn allow for better participation. I don't know how it should be split, but since policy discussions are often relatively long, splitting them off into a separate page might be a good start. —CodeCat 19:53, 11 January 2012 (UTC)

Much better idea: Make the BP use Liquidthreads. -- Liliana 23:19, 11 January 2012 (UTC)
Um yeah, please don't! Equinox 22:00, 12 January 2012 (UTC)

"color/colour" etc.

e.g. at shade: "A postage stamp showing an obvious difference in colour/color to the original printing and needing a separate catalogue/catalog entry." I really hate this pandering to spelling pedants, which makes the definition look stupid and unprofessional. It doesn't fix the problem because they could still argue about which form comes first (before the slash). Isn't there any better way? Equinox 22:00, 12 January 2012 (UTC)

Well, we could just say that whatever was there first sticks and no one is allowed to change it (isn't this what we do now?). Or we could just pick one spelling and use it consistently. Or we could use something like {{#ifexpr:{{NUMBEROFARTICLES:R}} mod 2 = 1|color|colour}} to have it randomly alternate... --Yair rand 22:22, 12 January 2012 (UTC)
Would CURRENTTIMESTAMP be cheaper?​—msh210 (talk) 09:12, 13 January 2012 (UTC)
I have no idea. --Yair rand 00:11, 16 January 2012 (UTC)
We could also (which I think was proposed before) have some kind of user-level setting that specifies which set of spellings to prefer, but I doubt it's worth doing for so small a group of pedants, and it would also get complicated, as there are many more "Englishes" than just UK and US. Equinox 00:14, 16 January 2012 (UTC)
Yes, it should just be one or the other. As "color" is used elsewhere in that same entry, it should be just "color" in those definitions too. I'm sure I've seen a guideline for it somewhere. Pengo 01:46, 13 January 2012 (UTC)
Something like a combination of what Yair and Pengo said sounds reasonable to me. Specifically: Keep whatever the first edition uses unless there's good reason to switch. Good reason to switch includes if you're adding more to the entry than it has already, and doing so in the opposite dialect. For example, we Hebrew editors can't decide on ch or kh as transliteration for a certain letter, so each of us does what he wants. But we leave an entry with its current transliteration scheme. But if an entry has one POS section and I add two more as big as it, I will without hesitation make the existing one use my transliteration scheme to make the whole entry consistent.​—msh210 (talk) 09:12, 13 January 2012 (UTC)
My opinion: 1. whichever spelling comes first should stay, 2. spellings should be consistent within an entry, and 3. a definition tagged as {{US}} or {{British}} should use the respective spellings in the definition. -- Liliana 09:47, 13 January 2012 (UTC)
2 and 3 can conflict.​—msh210 (talk) 19:12, 13 January 2012 (UTC)
I just always use US spellings. I think they're more internationally recognized, and plus nobody can call me US-biased, as I'm British, not American. Mglovesfun (talk) 10:25, 13 January 2012 (UTC)
I am just sitting on the fence. I have added this discussion to Wiktionary:American_or_British_Spelling, as I have found no other page where discussions of American and British spellings could suitably be listed. --Dan Polansky 12:35, 13 January 2012 (UTC)

If you think about it, it will become obvious that everyone should just use Canadian English, everywhere. Case closed. Michael Z. 2012-01-18 15:55 z
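
The parity trick suggested earlier in this thread ({{#ifexpr:{{NUMBEROFARTICLES:R}} mod 2 = 1|color|colour}}) can be illustrated outside wikitext with a short sketch. The Python below mirrors only the "odd counter picks one spelling, even counter picks the other" logic; the function name, the sample counter value, and the use of the system clock as a stand-in for CURRENTTIMESTAMP are assumptions made for illustration.

```python
import time


def pick_spelling(counter, odd="color", even="colour"):
    """Return `odd` when the counter is odd and `even` otherwise."""
    return odd if counter % 2 == 1 else even


# A site-wide article count (hypothetical value) changes only when entries are added.
by_article_count = pick_spelling(3000001)

# A timestamp-like counter changes every second instead.
by_clock = pick_spelling(int(time.time()))
```

Note that the result is deterministic for a given counter value, so the spelling alternates as the counter changes rather than varying randomly per reader.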

Narrower IPA, thinking about a vote

I think there should be a stricter policy on IPA. (Assuming there isn't one that I'm simply missing.) There was a vote on "using /ɹ/ at three words" and it passed. All its arguments can be applied to any other sound in any other word in any other language. Thinking from the viewpoint of a Wiktionary-user, not -member: "I want to know how to pronounce X. There's 'IPA: /X/'. What is IPA? [Opens Wikipedia, looks at the symbols. Comes to the conclusion that <X> is pronounced [X].]" Now, we know there are narrow and broad transcriptions and we know that when we click on the IPA in a Wiktionary entry, a key opens. But the usual user doesn't. I don't want to remove all broad transcriptions, but what I want to propose is this:
The Broad IPA transcriptions should use the IPA-sign closest to the sound without any combining marks. Take English. There are dialects which speak /r/, there are dialects which speak something like /ɻʷ/. But both RP and GenAm have /ɹ/. And some 95% (random number, not a statistic) of English accents speak a sound which is far closer to /ɹ/ than to /r/. So why write /r/ anywhere but in a narrow transcription for Northumbria? We mustn't make the IPA too broad, because in some languages we will end up merely copying the orthography, leaving the reader none the wiser. So no /r/ or /R/ for German, but /ʁ/. If you write /R/, add a|Austrian. No /sprɔːg/ for Danish but /sbʁɔːw/. And if it happens to be close to the narrow transcription, that is not a problem but merely a lucky coincidence. I always thought the purpose of IPA was not having to learn the whole phonology of a language. And with broad transcriptions such as /sprog/ I would simply end up pronouncing it utterly wrong. Dakhart 13:20, 13 January 2012 (UTC)

Umm sorry, but hasn't this been the consensus all along? I don't think you'll see /r/ used in any English or German entry here. -- Liliana 13:22, 13 January 2012 (UTC)
farm#Pronunciation, both the pronunciation and the template. horse#Pronunciation, same. I already gave sprog#Danish as an example. So I strongly assume there's much more. Further, nearly every German entry I saw was Bavarian (e.g. Austrian). I just think it wouldn't harm to make it official and maybe add botting for it. Dakhart 13:29, 13 January 2012 (UTC)
I thought the policy here (at least de facto) was to use a broad transcription for English and a narrow one for other languages. That's why we put English transcriptions in slashes and other languages' transcriptions in square brackets. We can assume that users of the English Wiktionary have some knowledge of English and therefore know how the English r is pronounced. Using /ɹ/ would imply that precisely [ɹ] is the only possible realization of the English r phoneme, which it isn't; but using /r/ covers all existing realizations. It's long been the practice and policy of phoneticians, lexicographers and others using the IPA to use the typographically simplest symbols in broad transcriptions; that's why we use /iː/ rather than /ɪ̝j/ or something for the vowel of see. That's why every single English dictionary that uses IPA uses /r/ (Collins, COED, Longman Pronouncing Dictionary, Jones/Gimson, Kenyon & Knott, etc.) to render the English r sound, because they know that their readers are equipped with enough common sense to realize that /r/ stands for "the English r sound (however it may happen to be realized in the accent you're most familiar with)" in the context of an English-language dictionary and not necessarily for "voiced alveolar trill". —Angr 14:16, 13 January 2012 (UTC)
My opinion is that /ɹ/ should be encouraged, but not forced; using /ɹ/ instead of /r/ will give us no disadvantage, but since most sources use /r/ we can't consider it wrong. It would also be nice using /ɫ/ instead of /l/ in words like peel, and placing /ʰ/s where they exist. Ungoliant MMDCCLXIV 14:27, 13 January 2012 (UTC)
Only if we switch to using square brackets instead of slashes, and only if people then add an extra line to accommodate dialects where peel doesn't have [ɫ] (like Irish English); and then to be fair an extra line would also have to be added to words like leaf to show the dialects where it does have [ɫ] (like Scottish English and Australian English). Making our English transcriptions narrow seems to be an awful lot of work for zero benefit. —Angr 14:41, 13 January 2012 (UTC)
My opinion is that our policy should be compatible with verifiability. If the normal practice among linguists and lexicographers and so on is to write /a e i o u/ in discussing a certain language, then we should write /a e i o u/ even if phonetic realizations vary greatly depending on environment, because otherwise we're basically requiring original research: we won't even be able to take pronunciations from reliable sources. —RuakhTALK 14:43, 13 January 2012 (UTC)
1. I didn't say narrow, I said narrower. 2. What I was going to post (edit conflict): Well, according to this we don't use /r/, since the original intention of this vote clearly was the same as the one voiced by me now. Why he changed it, using example words instead, I do not know. I have seen some broad transcriptions in brackets on Wiktionary. There are proper narrow transcriptions for other languages, but there are also things like [d] for [d̪]. Further: Dictionaries give an explanation of their script used. Wiktionary does too, but as said: Only when a user happens to find it. On the other hand: "Using /ɹ/ would imply that precisely [ɹ] is the only possible realization of the English r phoneme" seems to be a very strange sentence to me, since I always thought that using [ɹ] would imply that precisely [ɹ] is the only possible realisation of English <r>.
Most transcriptions for other languages (that I saw, naturally) are in slashes and I think that vowels are not a problem in them. /i:/ does depict the standard pronunciations of "see" well enough, because "ee" is, at least for some part, a rather unrounded, rather close front vowel. But /zi:/ wouldn't. Because "S" is not a rather voiced consonant. And in the same vein <r> is not rather trilled. And the Danish <G> in "sprog" is neither velar nor a plosive in any way. The only advantage of such very broad transcriptions I can see is that they are more convenient for the author, but they bear more risk of misleading. And last but not least: I'm talking general policies, not English alone. To rephrase my proposal: "Let's use broad transcriptions for all IPA-entries, no matter what language, but use the IPA-sign that is closest to the nature of the actual sound used in the Standard given." That is: A velar sign for a velar sound, a trill sign for a trill sound, a /d/ for any sort of voiced-tongue-based stop etc. But not a trill for an approximant, not a velar stop for a labial approximant. And I think we won't have a problem finding a source that says that neither GenAm nor RP use a trilled R or a source saying that Dutch G is a /ɣ/ rather than a /g/. Which it isn't. I think no dialect has it but all dialects have /xχʝç/. Yet, /ɣ/ is broad enough but certainly narrower than /g/. Dakhart 14:52, 13 January 2012 (UTC)
I use /r/ because it's the most commonly used here and also the easiest to type. I will continue to do so until there is a consensus or a succeeded vote to do otherwise. Mglovesfun (talk) 17:23, 13 January 2012 (UTC)
There has been (for English).​—msh210 (talk) 19:21, 13 January 2012 (UTC)
To paraphrase my comment on the Tea Room, the vote doesn't really say what the voters are voting on. It only affects "words like red, green and orange". I have genuinely no idea what that's supposed to mean. Mglovesfun (talk) 19:36, 13 January 2012 (UTC)
No, it affects "the r phoneme in words like red, green and orange" (emphasis mine). Those three words exemplify English /r/. (I imagine they were chosen so as to give a diversity of phones; in GenAm, at least, red is typically pronounced with a retroflex /r/, green with a bunched /r/, and orange with a rhoticized vowel, though there is variation in all three. The point being that all three words supposedly have the same phoneme, just realized differently.) —RuakhTALK 21:22, 13 January 2012 (UTC)

And the underlying phoneme is /ɹ/, not /r/. To repeat/rephrase myself: IPA should give a broad transcription (slashes) unless somebody is really sure about a standard pronunciation, which then is given in narrow transcription (brackets). This should be because of how easy it is to make a wrong/nonstandard narrow transcription. And the broad transcription should give the IPA-sign for the phoneme occurring in most positions without combining signs. The underlying phonemes could easily be gathered from Wikipedia, which has sufficient sources for most languages. Such phones would be /ɹ/ for all <r>s, e.g. [ɹʷ], /l/ for [lˠ] (English), /ʁ/ for [ɐ̯] (Danish, German), /ɣ/ for /xcj.../ (Dutch), /g/ for [j] (Swedish), /d/ for [d̪ ð] (Spanish) and so forth. I gather that, while some would vote nay, nobody sees a reason not to vote on it. So I will wait two days for further input and then find out how to get the vote rolling. Dakhart 21:47, 13 January 2012 (UTC)

/g/ for [j] in Swedish would be a bad idea because there is a phonemic merger with /j/, it's more than just allophony. —CodeCat 22:01, 13 January 2012 (UTC)
You just sound like you don't even know what "allophony" means and just picked up a fancy word. Allophones have to be considered. Pronunciations are given to show the English reader how it's exactly pronounced in this case. It's not an allophone with free variation, but the standard pronunciation. It's not helpful at all to require the reader to know that the letter G can be pronounced differently. -- 19:51, 19 April 2017 (UTC)
I tripped upon that one too. But details are for later. The important thing is that no sign is used which represents a phone not existing within the language. Dakhart 22:31, 13 January 2012 (UTC)
Sorry, but the claim "the underlying phoneme is /ɹ/, not /r/" makes no sense. The underlying phoneme is simply a rhotic consonant; English only has one, so nothing else about it needs to be specified underlyingly. We write it /r/ instead of [+rhotic] (or [+sonorant, −nasal, −lateral] or whatever) because it's easier for humans to read. We write it /r/ instead of /ɹ/ for the same reason. Wiktionary already uses narrow transcriptions for languages other than English, so if you find a misleading broad transcription for Danish, just change it. It's a wiki. You don't have to discuss it or bring anything up for a vote to do that. If there's a vote on anything, it can only be about English, because English is the only language that would be changed by such a vote. —Angr 23:12, 13 January 2012 (UTC)
You make no sense. Can you please tell me what a rhotic consonant is in terms of acoustic properties? -- 19:51, 19 April 2017 (UTC)

Non-lemma forms on rhymes page

Taking Rhymes:English:-ɪŋɪŋ as a typical example, there is a line that says <!--Do not add present participles or gerunds to this page unless they have other meanings-->. Um, why ever not? Neither WT:Rhymes nor Wiktionary:ELE#Rhymes mentions it. Am I right in thinking this isn't a consensus, but just one or more editors who wrote the invisible comments many years ago, and that the comments are therefore no longer relevant unless there is some evidence that this is still a consensus? Mglovesfun (talk) 17:02, 13 January 2012 (UTC)

  • Well, I don't think we could stop poets from using participles, gerunds &c as rhymes, so we should be able to include them in these pages if we want. Some of the pages could become ginormous mind you! SemperBlotto 17:07, 13 January 2012 (UTC)
    Rhymes:French:-e for one! Mglovesfun (talk) 17:21, 13 January 2012 (UTC)
    If we consider traditional rules for rhymes, this page should include only et, ohé, Noé, Pasiphaé, Aglaé, béer..., gréer..., agréer..., and a few others, but not words where there is a consonant sound before /e/. blé and thé are not considered as rhymes in French. Lmaltier 08:12, 14 January 2012 (UTC)
  • It might be hard to pull out lemma forms from the list if others are there, too; OTOH, I can't think of a good reason one might want to do so, so I'm with you unless someone comes up with one.​—msh210 (talk) 19:20, 13 January 2012 (UTC)
  • Do we waste more valuable resources (eg, contributor time, download time) in trying to enforce such limits or in having long lists of trivial Rhymes? DCDuring TALK 19:44, 13 January 2012 (UTC)
    It would be far better to auto-generate the rhyme lists based on pronunciations. A word would then be "added" to the rhymes page by simply giving it the correct pronunciation in IPA (or whatever other notation). Equinox 22:56, 15 January 2012 (UTC)
    That'd be difficult without the StringFunctions extension (which, seemingly, we're not getting) unless we change our IPA template to do something like {{IPA|lang=foo|nɑnˌɹɑjmɪŋg̚p|ɑɹt}}.​—msh210 (talk) 16:02, 17 January 2012 (UTC)
    Actually, we will be able to have templates manipulate strings as soon as we get Lua scripting available, but I'm not sure it would be a good idea to merge rhyme content and pronunciation content, since many users might not actually know how to use IPA, but do know that one word rhymes with another word and can thus be added using the "Add new rhyme" forms. --Yair rand 20:48, 15 February 2012 (UTC)
  • On a related note, forbidding non-lemma forms on Czech rhymes pages makes no sense to me, as, in Czech, it is the particular inflected form that has to rhyme. --Dan Polansky 20:42, 13 January 2012 (UTC)
    Yup, also for Icelandic. What with vowel changes and all manner of irregular forms (which occur to a lesser extent in English as well), non-lemma forms need to be listed as well. This is what I've always done for the Icelandic rhymes. – Krun 22:10, 13 January 2012 (UTC)
As far as I know, the restriction against non-lemmata was instituted at the start of the Rhymes project, and its value has never been discussed. I think that, given the current state of things on Wiktionary, inclusion of non-lemmata should be allowed and comments forbidding their inclusion be removed. --EncycloPetey 02:21, 17 January 2012 (UTC)
I agree. --Yair rand 20:48, 15 February 2012 (UTC)
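
Editorial illustration of Equinox's suggestion above (auto-generating rhyme lists from pronunciations): a minimal Python sketch, not anything Wiktionary actually runs. It assumes that a rhyme is everything from the vowel of the primary-stressed syllable to the end of the word, that words without a stress mark are stressed on their first vowel, and that the small vowel set below is adequate. The names `VOWELS`, `rhyme_part` and `build_rhyme_index` are hypothetical.

```python
from collections import defaultdict

# Assumed, deliberately incomplete vowel inventory for this sketch.
VOWELS = set("aeiouɑæɐəɛɜɪɔʊʌ")


def rhyme_part(ipa):
    """Return the transcription from the primary-stressed vowel to the end.

    If there is no primary stress mark, the whole word is searched,
    so the first vowel is treated as the stressed one.
    """
    start = ipa.rfind("ˈ") + 1  # rfind gives -1 when absent, so start is 0
    tail = ipa[start:]
    for i, ch in enumerate(tail):
        if ch in VOWELS:
            # Drop secondary stress marks and syllable dots from the rhyme.
            return tail[i:].replace("ˌ", "").replace(".", "")
    return ""  # no vowel found


def build_rhyme_index(pronunciations):
    """Group words by rhyme; `pronunciations` maps word -> IPA string."""
    index = defaultdict(list)
    for word, ipa in pronunciations.items():
        index[rhyme_part(ipa)].append(word)
    return {rhyme: sorted(words) for rhyme, words in index.items()}


sample = {
    "ringing": "ˈɹɪŋɪŋ",
    "stringing": "ˈstɹɪŋɪŋ",
    "pinging": "ˈpɪŋɪŋ",
    "red": "ɹɛd",
    "bed": "bɛd",
}
rhymes = build_rhyme_index(sample)
```

With this sample data, the three -ɪŋɪŋ words end up in one list and red and bed in another. As Yair rand notes, such an index would only cover entries that already have an IPA transcription, so it complements rather than replaces manual additions.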

Wiktionary:Votes/2012-01/Modify WT:ELE rhymes section

Like the vote says. Someone may want to put something in to reflect what Yair rand says about different spacing when different dialects are involved. There are still 7 days to edit the vote. Mglovesfun (talk) 17:20, 13 January 2012 (UTC)

Why consider the number of syllables for rhymes?

This seems to be quite irrelevant. What would be most helpful is an ordering of rhymes according to their richness, i.e. in a kind of reverse phonetic order (giving priority to vowels): e.g. ringing should be near stringing because of the common ringing, making them closer to each other than to pinging. Lmaltier 08:25, 14 January 2012 (UTC)

If you're writing a poem, the number of syllables could be rather important — unless I'm missing something. Equinox 22:55, 15 January 2012 (UTC)
Of course, but the number of syllables of the verse, not of the last word of the verse. I now understand that this can be useful when the verse is almost complete and you try to find the last word. But in most cases, you look for a rhyme much before that, and the richness of the rhyme is something important. Lmaltier 17:34, 17 January 2012 (UTC)
This may depend on language.​—msh210 (talk) 17:45, 17 January 2012 (UTC)
"Richness" wouldn't be useful for English rhymes; in English, "ringing", "pinging", and "stringing" all rhyme to the same extent. —RuakhTALK 17:44, 17 January 2012 (UTC)
You are right: this depends on languages (see w:Rhyme). My suggestion does not apply to English, but it applies to French (and probably to some other languages). Lmaltier 18:30, 17 January 2012 (UTC)
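
Editorial illustration of the "reverse phonetic order" Lmaltier describes: a minimal Python sketch that sorts words by their reversed transcription, so that words sharing a longer ending sit next to each other. It does not implement the "priority to vowels" refinement, the transcriptions are sample data without stress marks, and the names `shared_ending` and `reverse_phonetic_order` are hypothetical.

```python
def shared_ending(a, b):
    """Count how many final symbols two transcriptions have in common."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


def reverse_phonetic_order(pronunciations):
    """Sort words by their transcription read from the end backwards."""
    return sorted(pronunciations, key=lambda word: pronunciations[word][::-1])


sample = {
    "stringing": "stɹɪŋɪŋ",
    "pinging": "pɪŋɪŋ",
    "ringing": "ɹɪŋɪŋ",
}
ordered = reverse_phonetic_order(sample)
rich = shared_ending(sample["ringing"], sample["stringing"])
plain = shared_ending(sample["ringing"], sample["pinging"])
```

Here ringing and stringing share five final symbols while ringing and pinging share four, which is the sense in which the first pair is "richer"; the sort places ringing directly before stringing.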

Announcing MediaWiki 1.19 beta

Wikimedia Foundation is getting ready to push out 1.19 to all the WMF-hosted wikis. As we finish wrapping up our code review, you can test the new version right now on beta.wmflabs.org. For more information, please read the release notes or the start of the final announcement.

The following are the areas that you will probably be most interested in:

  • Faster loading of javascript files makes dependency tracking more important.
  • New common*.css files usable by skins instead of having to copy piles of generic styles from MonoBook or Vector's css.
  • The default user signature now contains a talk link in addition to the user link.
  • Searching blocked usernames in block log is now clearer.
  • Better timezone recognition in user preferences.
  • Improved diff readability for colorblind people.
  • The interwiki links table can now be accessed also when the interwiki cache is used (used in the API and the Interwiki extension).
  • More gender support (for instance in logs and user lists).
  • Language converter improved, e.g. it now works depending on the page content language.
  • Time and number-formatting magic words also now depend on the page content language.
  • Bidirectional support further improved after 1.18.

Report any problems on the labs beta wiki and we'll work to address them before the software is released to the production wikis.

Note that this cluster does have SUL but it is not integrated with SUL in production, so you'll need to create another account. You should avoid using the same password as you use here. — Global message delivery 00:06, 15 January 2012 (UTC)

Wikipedia blackout

For those who weren't already aware, the English and German Wikipedias will be "blacked out" tomorrow (the 18th) in protest of impending US legislation. The Main page will be replaced with a blackout banner, and editing will be locked for the duration of the protest. See WP:SOPA for more information. Commons may be displaying a banner, but does not appear to be planning to lock down. --EncycloPetey 02:17, 17 January 2012 (UTC)

At the Dutch Wiktionary a banner is flying in solidarity and there are discussions elsewhere. Will the English Wiktionary consider the same? Jcwf 04:10, 17 January 2012 (UTC)
It's a bit late now to gain any meaningful consensus for it.​—msh210 (talk) 15:57, 17 January 2012 (UTC)
I wouldn't have thought so. SemperBlotto 08:41, 17 January 2012 (UTC)
I predict that we will get 999 angry comments saying "I HATE U WIKIPEDIA, U STOPPED ME DOING MY HOMEWORK". Equinox 23:54, 17 January 2012 (UTC)
Yeah, I predict that despite the blackout being Wikipedia-only, we (Wiktionary) will get at least a few such angry comments. Because, you know, people won't be able to leave them on Wikipedia during the blackout. Phol 01:27, 18 January 2012 (UTC)
Or more probably because people genuinely can't tell Wikipedia and Wiktionary apart. Look at the pathetic specimens we get on the feedback page. Equinox 01:34, 18 January 2012 (UTC)
And for those who haven't realized, a WP blocks out a (very) short time after loading, so stopping the page's loading will allow it to be displayed.​—msh210 (talk) 16:14, 18 January 2012 (UTC)
Yes, my internet connection is so slow that it took me a while to realise that the Java script was supposed to be blocking pages. If you really want to read Wikipedia, just disable Java in your browser. Dbfirs 17:10, 18 January 2012 (UTC)
No, just disable Javascript. Javascript and Java are completely different things. --Yair rand 21:53, 18 January 2012 (UTC)
Sorry, yes, my mistake! Dbfirs 23:00, 18 January 2012 (UTC)
Or simply right click->View page source :). JamesjiaoTC 22:19, 18 January 2012 (UTC)
m:English Wikipedia SOPA blackout/Technical FAQ#Are there ways to circumvent the read blackout? The page lists several.​—msh210 (talk) 22:23, 18 January 2012 (UTC)
Adding ?banner=none or &banner=none to the end of the address works too. —CodeCat 22:24, 18 January 2012 (UTC)
Or just pressing the browser's "stop" button before the page finished loading... --Yair rand 22:25, 18 January 2012 (UTC)

The 'definition' of non-English place names

Our current practice for non-english place names is to give them a definition in English, and to create a link to the English entry in the non-English entry, with the proper translation into English. This is our practice for regular words as well so it's not really that strange. But with place names it often seems backwards. In many cases, the English 'translation' is the same word, as it was simply loaned from the place of origin into English. For example, Catalan Girona is simply defined as 'Girona', with a link to the English section, even though the city is in Catalonia. And the same way for Dutch Eindhoven, Indonesian Jakarta and so on. I'm not quite sure what would be a better way to display this, but it seems strange to me that the main definition is in the English section when the name is clearly native to another language. —CodeCat 18:56, 17 January 2012 (UTC)

So you think the English definition should be "English name of Jakarta" or "English name of ירושלים"? That sounds reasonable, but IMO the following four reasons for doing it the way we've been doing it win out: (1) Consistency with non-proper-noun entries. (2) The lack of desire to get into a fight over which name should be chosen as the primary one, linked to in all the definitions, when more than one language-speaking group lays claim to a place. (3) The primacy of English-language entries: they shouldn't rely on other-language entries for their definitions. (4) Readability: an English-language definition should not include foreign-language words.​—msh210 (talk) 20:03, 17 January 2012 (UTC)
I agree with Msh210. Let's keep to simple principles. But the discussion was not about English entries, and I understand CodeCat's concern. I think that, in such cases, the definition in the non-English sections could be written as: [[Jakarta#English|Jakarta]] (the capital city of Indonesia). Lmaltier 20:48, 17 January 2012 (UTC)
Sure, {{gloss}} is always good to use.​—msh210 (talk) 22:38, 17 January 2012 (UTC)
I think the status quo is the best practice. In addition to Msh210's arguments above, doing it this way also allows consistency with entries for place names where the native name is spelled differently from the English name, so München#German is defined as Munich, and Praha#Czech is defined as Prague, while the meaningful definitions are at the English names. Using {{gloss}} is only necessary if the English entry has more than one meaning, and the native entry corresponds to only one of those meanings. Thus, if at Prague we have "1. The capital city of the Czech Republic" and "2. A town in Lincoln County, Oklahoma", then Praha#Czech should say "(the capital city of the Czech Republic)" so readers know that the town in Oklahoma is not also called Praha in Czech. —Angr 23:17, 17 January 2012 (UTC)
New senses can be added at any moment. It's not always necessary to add a gloss in the non-English word definition, but if you want to add it (just in case), it's never bad, as it might become necessary some day. And, even when unnecessary, it might help some readers. This is true for all words, of course, not only placenames. Lmaltier 18:20, 18 January 2012 (UTC)
FWIW, I agree with both of you, Angr and Lmaltier: {{gloss}} is necessary only when there's more than one definition but sometimes helps (and never hurts) even otherwise.​—msh210 (talk) 22:28, 18 January 2012 (UTC)

Radio shorthand and other codes, is it translingual?

In radio communication, there are many shorthands such as SOS (emergency), CQ (calling all stations), 73 (best regards), as well as the Q codes such as QSL (reception report). These are used internationally, and as far as I've been able to tell they're used in other languages as well as English. But as English has had a leading role in international radio communications, I'm not quite sure whether these terms are translingual or not. What category would be best for such terms, given that they are a kind of 'translingual radio slang'? —CodeCat 18:05, 18 January 2012 (UTC)

Well I see them used a lot in German running text, so it's safe to assume they're translingual. -- Liliana 19:48, 18 January 2012 (UTC)
I think they are translingual, but this fact does not exclude additional sections for several languages (with pronunciation, examples showing how it is used in the language, etc.), even if these sections seem much less useful for these codes than for other translingual terms (scientific names in biology, etc.) Lmaltier 20:12, 18 January 2012 (UTC)

Irony and sarcasm

Currently, {{ironic}} redirects to {{sarcastic}}. I submit that this should be the other way around. ‘Sarcastic’ is far too restrictive a word for how virtually all the terms in Category:English sarcastic terms are used. Ƿidsiþ 17:37, 21 January 2012 (UTC)

In school I was told that sarcasm is a type of irony, so I agree. Ungoliant MMDCCLXIV 18:20, 21 January 2012 (UTC)
Sarcasm is often used to mean "verbal irony", but that's often considered a misuse. The OED defines sarcasm as "A sharp, bitter, or cutting expression or remark; a bitter gibe or taunt. Now usually in generalized sense: Sarcastic language; sarcastic meaning or purpose" and irony (in the relevant sense) as "A figure of speech in which the intended meaning is the opposite of that expressed by the words used; usually taking the form of sarcasm or ridicule in which laudatory expressions are used to imply condemnation or contempt." Properly speaking, neither is a subset of the other; something like "Good going; wanna break anything else while you're at it?" is both, but something like "You suck at this" is only sarcasm (not irony), and "Nice weather, huh? I love trudging through knee-deep snowdrifts" is only irony (not sarcasm). Some of the terms in Category:English sarcastic terms do not seem ironic to me, only sarcastic; what's ironic about no duh? —RuakhTALK 22:50, 23 January 2012 (UTC)

Renaming requests for verification

I am in the process of creating Wiktionary:Votes/2012-01/Renaming requests for verification, which proposes to rename WT:Requests for verification to WT:Requests for attestation. Feel free to discuss the proposal here or on the vote's talk page, as you see fit. Feel free to postpone the vote should the discussion last longer than until the start of the vote.

Most recent relating discussion: Wiktionary:Requests_for_moves,_mergers_and_splits#Wiktionary:Requests for verification to Wiktionary:Requests for attestation, March 2011. --Dan Polansky 13:48, 22 January 2012 (UTC)

Responding to one of the arguments made in the previous discussion: 'Whatever we call the page, we will need to explain it to new users/contributors. "Verification" is 20 times more common in English than "attestation". [...] Consequently, Oppose. DCDuring TALK 00:30, 28 March 2011 (UTC)': "verification" is misleading, so its being common does not save it. The term "attestation" is used by CFI, and it is "attestation" as defined by CFI that is being sought at the page currently called "WT:Requests for verification". --Dan Polansky 13:54, 22 January 2012 (UTC)
I strongly prefer Wiktionary:Please read the prologue of this page to see what it's all about. It's so far the only proposed name that makes it clear what is going on in there. -- Liliana 05:31, 23 January 2012 (UTC)
This seems to be made in joke, or as a sarcastic argument. For the latter case: the jocularly proposed page name does not tell the user at all what the page is about. Actually, all pages in Wiktionary namespace could have this name. The name with "attestation" is not significantly longer than "verification", so the implication in that jocular argument that the renaming is going to make page names needlessly long is wrong. Another way of reading this sarcastic remark is as saying this: page names in Wiktionary namespace don't matter, as everyone can read the top of the page anyway. By contrast, I find clear and fitting page names a good thing, regardless of the option to read the top of the page. Curiously, the top of the page has to say that 'Requests for verification is a page for requests for attestation of a term or a sense, [...]'. When a newbie sees this sentence, the natural response would often be like "if this page is for requests for attestation, why the heck is it called requests for verification"? --Dan Polansky 07:56, 23 January 2012 (UTC)
Or “I don't know what attestation is, but from the page title, I guess it just means ‘verification’.” This easier-to-understand name is a poor choice, because it's actually just easier to misunderstand. Michael Z. 2012-01-30 22:01 z

Please help with sorting out unknown language names

Sometimes people request translations and such for languages that we don't have a code for on Wiktionary. I've modified {{ttbc}} and {{trreq}} temporarily to add any language names it doesn't recognise to Category:CodeCat's test category. Could everyone please help empty that category again, by replacing the parameter of those templates with the proper code? Thank you! —CodeCat 21:20, 22 January 2012 (UTC)

I've fixed one of them, and its problem was that it used {{ttbc|[[languagename]]}}. Whoever fixes others, can you state whether that was the problem also? If so, perhaps we should adjust {{ttbc}} to allow for such use.​—msh210 (talk) 03:59, 23 January 2012 (UTC)
That one was [[pander]]. Same thing at [[illness]].​—msh210 (talk) 18:36, 23 January 2012 (UTC)
At [[safety]], the problem seems to be that the entry contains {{ttbc|Visaya}}, and we don't have Visaya as a language (in fact, it seems not to be one). But it is a language family, and that seems like an appropriate use of {{ttbc}}. Perhaps the template should allow for such use (by language-family code if not by name)?​—msh210 (talk) 18:27, 23 January 2012 (UTC)
Similar issue at [[bone]]: it uses {{ttbc|Old Mongolian}} and {{ttbc|Middle Turkish}}, and we have neither language. Again, I didn't remove these, as I don't know them to be nonexistent: maybe we just need to add the languages. (See also w:Middle Turkic languages and w:Middle Mongolian language.)​—msh210 (talk) 18:36, 23 January 2012 (UTC)
As Wikipedia says, there's no language "Old Mongolian", as the first written sources appeared only in 12th century. We have {{xng}} and {{cmg}} though. Not sure what to do about Middle Turkish. -- Liliana 17:11, 24 January 2012 (UTC)
I've brought the number down to three. I'm not sure what to do with the remainder though. The problems with bone have already been mentioned, and sinew also mentions 'Middle Turkish'. octillion uses 'Chinese numeral' as a language, I'm not sure what that's supposed to be. —CodeCat 21:11, 23 January 2012 (UTC)
Mglovesfun has fixed octillion. I've removed the Old Mongolian from bone because it didn't seem to be correct or in a correct script; as long as I was at it, I removed the Middle Turkish (which it was oddly subordinated to), too. That leaves sinew. - -sche (discuss) 03:42, 30 January 2012 (UTC)

Internet =/= Internet slang

Last time I checked, these contexts worked like this:

However, a lot of words in Category:en:Internet are Internet slang instead. (epic fail, a/s/l, BTW...) I can recategorize them, but I'd like to make the distinction clear first.

(Standard disclaimer: But feel free to propose different things.)

Hi, Wiktionary.

--Daniel 09:27, 24 January 2012 (UTC)

Things like IP and hyperlink aren't necessarily Internet related; they occur in a network as well. Those should be {{networking}}. -- Liliana 16:56, 24 January 2012 (UTC)
Are you saying the 'Internet Protocol' is not just for the Internet? —CodeCat 17:00, 24 January 2012 (UTC)
How do you expect a modern network to function without IPs? NetBEUI and IPX/SPX are obsolete nowadays. -- Liliana 17:59, 24 January 2012 (UTC)
That's right: one can set up an IP network which is not connected to the Internet. —AugPi 18:21, 24 January 2012 (UTC)
  • This would seem to be a problem in the way context information is used to populate topical categories. Topical categories and usage contexts overlap, but neither is a subset of the other. Perhaps the remedy is either to not use contexts to populate categories or to allow individual contexts to be marked in such a way as to override the default categorization. The general answer would seem to be that topical categorization should be distinct from usage contexts, a point MZajac made years ago. DCDuring TALK 19:33, 24 January 2012 (UTC)
    • I agree. It would be nice if there were a separate {{topic}} template. But it would also mean that we would have to make a distinction between {{topic|Internet}} and {{context|Internet}}, because they can't both use {{Internet}} as the underlying template... —CodeCat 19:39, 24 January 2012 (UTC)
      We could allow the context to have priority, especially as there is much less subjectivity and arbitrariness and more linguistic content to usage contexts. Topical categories have always seemed much more arbitrary to me. And, as we would not in general have sense marking for topical categories if we make the context-topic distinction, it would not be clear which sense accounted for the headword being in the category. DCDuring TALK 19:58, 24 January 2012 (UTC)
    • Re "Topical categories and usage contexts overlap, but neither is a subset of the other. Perhaps the remedy is either to not use contexts to populate categories...": We've already decided on that remedy. Alas, it's not yet implemented as widely as it should be.​—msh210 (talk) 06:04, 25 January 2012 (UTC)
  • If we're going to use {{networking}} instead of {{Internet}} as the context of IP because technically there are instances of IPs existing without Internet... We may as well use (hypertext) instead of {{Internet}} as the context of web page, hyperlink, splash page, pop-under, frameset, because technically we can view these things in offline hypertext pages. --Daniel 08:35, 25 January 2012 (UTC)
    • Internet Protocol (IP) not only can be used outside of the Internet, it frequently is, for example it's used even for communicating between processes on a single machine, for local area networks, and increasingly with peripheral devices. I'm not sure if you're just trying to make a point about pedantry, but regardless it's a fair point that "offline" hypertext pages exist, so I'll address it in good faith. Some of those words could make sense offline or within a broader context, for example "hyperlink" and "frameset" could be considered "networking" or "computing" terms, rather than Internet-specific. But "web page", "splash page", and "pop-under" all imply Internet. A pop-under, for example, makes little sense offline, even if it's technically possible, and "web" in "web page" is for "world wide web", part of the Internet. TL;DR: I strongly suggest IP be considered "networking" rather than "Internet" (it's not just being pedantic, it's how it's commonly used), and if you want to broaden the scope of some "Internet" terms to be "networking" or "computing" that seems fair enough to me but they should be considered on a case-by-case basis. Pengo 12:19, 27 January 2012 (UTC)
    • I think Daniel makes a good point, and I mostly disagree with you, Pengo. A frameset has nothing to do with networking (the connection of multiple computers); it is only part of a hypertext document; it just so happens that we see most of our hypertext on Web pages that come over a network, but they don't have to, and sometimes don't — so "Internet" (relevant context) is a more reasonable tag for frameset than "networking" (irrelevant context). Likewise, a pop-under can certainly exist offline and make sense, e.g. when developers are testing their sites. Equinox 23:53, 27 January 2012 (UTC)
  • You seem to "mostly disagree" with only two examples (and one of them due to a misunderstanding). My overall point was that the context labels should be considered on a case-by-case basis and that IP is definitely networking and not Internet, and I don't seem to be disagreeing with that. Sorry, I stated the frameset example ambiguously. I meant it could be considered "computing" (and that "hyperlink" could be considered "networking" or "computing"). As for pop-under, testing a pop-under offline is still testing it for the Internet. Like I said, a pop-under makes little sense outside of the context of the Internet, even if one could technically exist offline, so I'd consider it extremely pedantic to broaden its context. You can disagree if you like, I'm not really so worried about how it ends up or if it has the context/topic removed. Pengo 00:38, 28 January 2012 (UTC)
Thanks for the permission! Equinox 00:48, 28 January 2012 (UTC)
No, no, no! Don't apply labels based on facts about the referent! If they contribute to the definition, then they belong in the definition. Don't label something internet or computing based on whether the thing works online or offline. You don't label the definition of bear with (woods). Nor should you label each sense just to help the reader discriminate each item in a long entry. This confusion is why “context” is such a poor name for these labels.
A usage label is applied only based on by whom and where the term is used.
Everybody knows what a web address is – don't label it. Internet Protocol is a technical term in computing and networking, but anyone who operates a web browser or other networked software might benefit from knowing what an IP address is: I'd be tempted to label it with the more general computing. Hyperlink predates the WWW, and is a concept in various media, including writing, multimedia CD-ROMS and computer software interfaces; we now find hyperlinks in all of our apps and ebooks. I don't think it is technical or restricted enough to warrant a label, or at most computing. Image map seems to occur in books on web design and graphics, but not in web users' how-to books: label it web design or web authoring. I see that splash page appears in books about web authoring and marketing, so perhaps label it with both. —This unsigned comment was added by Mzajac (talkcontribs).

Let's not overuse these lexicographical restricted-usage labels. Web page, for example, is not jargon or restricted to specialized lexical contexts, and shouldn't be labelled as such.

For “topical” categorization (although I can't understand why we would try to duplicate Wikipedia in categorizing the referents of terms), what is wrong with typing [[category:Internet]] at the end of a definition line? Michael Z. 2012-01-27 15:29 z

There is a lot of overuse of the context labels to clean up -- and a need for the advocates of topical categories to actually hard-code topical categories. And the default use of the contexts to include entries in topical categories should end, as plenty of time has passed to allow for the hard-coding to categories for those entries with misused context labels. Appropriate context labels are a useful guide for the insertion of hard categories using AWB or some fully automated approach. DCDuring TALK 18:49, 27 January 2012 (UTC)
We should refurbish the nomenclature, which is vague and encourages misuse. Our “context” has no useful meaning, and should be replaced with restricted-usage labels, or usage labels for short. “Topical context labels” are not for identifying the topical context of a sense – they're restricted-usage labels for technical or specialized terms – perhaps these should be called technical or subject usage labels. {{context}} can be renamed {{usage}}, which is practically unused, or {{label}}. “Grammatical context” labels have nothing to do with context, and should be regarded separately as grammatical labels.
See category:Context labels. Michael Z. 2012-01-27 23:44 z


If anyone is interested, I have copied over a list of dinosaur names from Wikipedia, containing over 1,300 names - all blue links at 'pedia, but mostly red links here. The list is at User:BD2412/walk the dinosaurs, though I won't object if others want to move it to project space or otherwise rename it. I don't see myself getting back to this for a while, but please have at it. Cheers! bd2412 T 19:31, 26 January 2012 (UTC)

Wow this is incredibly useful, thanks for creating this valuable page, I'll try to find some time to look it over. -- Cirt (talk) 23:59, 13 February 2012 (UTC)

When to use the gerund tag

I just discovered Appendix:Glossary#gerund. The languages I work on (gml, de, nds) genuinely treat gerunds as nouns (confer Leben, lęvend). Would the right thing to do be to add the gerund tag in front of those nouns? Dakhart 14:32, 28 January 2012 (UTC)

That depends on the language. English treats gerunds as nouns, but we only list them as nouns when the term has taken on strongly noun-like characteristics that warrant a separate definition. Otherwise, we simply label English gerunds as "Verb" since they are also a present participle form. However, for Latin gerunds we have a separate "Gerund" part of speech, since Latin gerunds do not behave fully like nouns. Among other differences, they have no nominative and no plural, for example, and have a modified conjugation table as a result. As a result, Latin gerunds are not treated in the same way as English gerunds. What you do depends on the languages you're looking at. I don't know enough about gerunds in German to offer any more specific advice. --EncycloPetey 16:12, 28 January 2012 (UTC)
In Italian, we use "Verb" as the section name, and use {{gerund of}} (with "lang=it") in the definition line. SemperBlotto 16:20, 28 January 2012 (UTC)

7 Wonders

Much in the way that we have kept from listing specific people by first and last name, I would propose that we not include the place name for specific entities that otherwise warrant inclusion unless the place name is integral to the name of the entity. The Seven Wonders of the Ancient World will be used to illustrate this idea, assuming that we might all consider these to be permissible dictionary entries under some title. I would permit:

In some cases the full name is required:

Can we agree to allow these entries under the suggested titles? DAVilla 03:11, 30 January 2012 (UTC)

  1.   Oppose Liliana 03:17, 30 January 2012 (UTC)
  2. Oppose also, don't include them. WT:NOT#Wiktionary is not Wikipedia. Mglovesfun (talk) 11:40, 30 January 2012 (UTC)
    This is my feeling too. Equinox 17:52, 30 January 2012 (UTC)
    To clarify, if a term has no linguistic merit, don't include it because it is well known, or whatever. Mglovesfun (talk) 16:33, 31 January 2012 (UTC)
    I never said they should be included because they're well known. Rather, I had assumed that they all have linguistic merit. Worse, I assumed you all realized this, but having been challenged, there's no reason to think this would not still have to be proven. Yet your reflexive denial of their linguistic merit is a pathetic stubbornness that seeks to separate encyclopedic terms from language constructs despite the myriad of such names that have been individually scrutinized and passed and the myriad of encyclopedic titles that are nonetheless English words. In a more hypothetical construction than the concrete case I've laid out, your denial of the antecedent would not stand. But far be it from me to argue with an exclusionist about the addition of language that would aid your cause rather than include any terms beyond these seven, which I promise you cannot remain red indefinitely for the force of evidence in their favor. DAVilla 17:19, 5 February 2012 (UTC)
  3. Oppose, but not for Mg's reason. We're not an encyclopedia, so we shouldn't be discussing which referents we should include words for but, rather, which words we should include. That is, Statue of Zeus and Statue of Zeus at Olympia are two different words (if you will) and each gets included, or not, on its own merits. There's no cause at all to say "we should include one of them, so let's decide which title is better": that's the purview of an encyclopedia. (Plus, I suspect none of these should be included at all, as Mg alludes to, but that's another issue and not my point here.)​—msh210 (talk) 17:14, 30 January 2012 (UTC)
  4. Oppose per Mglovesfun and msh210 (and maybe Liliana as well). —RuakhTALK 17:50, 30 January 2012 (UTC)
  5. Please, no. DCDuring TALK 18:15, 30 January 2012 (UTC)
  6. Oppose  hanging (lemma: hang) and gardens (lemma: garden) are dictionary terms: lexical units with inherent meaning. hanging gardens is merely a sum-of-parts phrase, deriving meaning from its component terms, and I hope we can all agree it doesn't belong in the dictionary. Capitalizing it Hanging Gardens signals it as a name or title (denoting a Toronto restaurant, among many other things) but again, this is not a lexical unit with unique meaning, and doesn't belong in the dictionary. Ditto for Hanging Gardens of Babylon, but because it is widely used to refer to one particularly famous thing, many editors will argue to keep it. Encyclopedic entries like this just duplicate Wikipedia, very poorly. I say delete them all, or redirect them to Wikipedia, and concentrate on being the best possible dictionary. Michael Z. 2012-01-30 21:55 z

Category:Requests for date

I have started working on this category. About a quarter or more of the requests are for non-English citations. (See non-Roman character entries, eg here, but also various Esperanto entries.) Do we not need to have subcategories for this by language, at least for languages other than English?

Category membership comes almost entirely from {{rfdate}} and templates like {{quote-book}} with the "year" parameter omitted. It is a simple matter to add lang= to rfdate, though it does not now categorize by language. Should we not do this and also add a lang= categorization capability for templates like {{quote-book}}? DCDuring TALK 14:32, 31 January 2012 (UTC)

Category:Old Javanese language

A user has been adding entries for Old Javanese. I've been removing them purely because there's no code for it, I know just about nothing about Javanese, but we do for example have Category:Old Swedish language with the ad hoc code {{gmq-osw}}. Should Old Javanese be permitted a code? NB I would interpret a lack of objections as 'go ahead'. Mglovesfun (talk) 16:32, 31 January 2012 (UTC)

Old Javanese does have a code: {{kaw}}. -- Liliana 16:38, 31 January 2012 (UTC)
It displays Kawi, do we want to change it to Old Javanese? Mglovesfun (talk) 17:08, 31 January 2012 (UTC)
I think Old Javanese would be better for consistency with other languages. -- Liliana 17:09, 31 January 2012 (UTC)
I've edited {{kaw}} and restored the two Old Javanese entries I removed. Mglovesfun (talk) 11:40, 1 February 2012 (UTC)
In Java, we didn't call them "Old Javanese" (Indonesian: bahasa Jawa Kuno), because that would imply something different; instead the name "Kawi" (Indonesian: bahasa Kawi) is more appropriate. Bennylin 11:09, 6 February 2012 (UTC)

Names of languages in their own language (several questions)

We have a French entry for , an Italian entry for etc, and I have just added a Javanese entry for . I think that we really ought to have an entry for every language in its own language.

Is there an easy way of finding out which ones are missing?

Shouldn't they all be simple nouns (uncountable), not proper nouns?

Should they all be uncapitalised (if written in an alphabet)? SemperBlotto 17:06, 31 January 2012 (UTC)

Probably no to all of the last three; English would be an exception to both #3 and #4. Mglovesfun (talk) 17:29, 31 January 2012 (UTC)
Appendix:ISO 639-1? The terms are unlinked, but it should not be hard to link them all. -- Liliana 17:30, 31 January 2012 (UTC)
Done, though some of them might be SOP. —RuakhTALK 17:53, 31 January 2012 (UTC)
And capitalization seems to be incorrect in some cases: the page lists Italiano.​—msh210 (talk) 23:48, 31 January 2012 (UTC)
Capitalization is a function of the rules of whatever language the word is occurring in. The German word for the German language - Deutsch - is properly capitalized, as are all language names in German. bd2412 T 17:52, 31 January 2012 (UTC)
[1] and [2] have local names for many languages, but capitalisation is an issue. Ungoliant MMDCCLXIV 21:14, 31 January 2012 (UTC)
No, these will not all be nouns. Language names in some languages, like Latin and Slovene, are usually adverbs or adjectives. --EncycloPetey 02:21, 2 February 2012 (UTC)

Which form of a letter is lemmatised: the majuscule or the minuscule?

By which I mean, if I define , do I put the information that concerns both forms of the letter at or at ? And does it matter in which language is the letter that I'm treating? I ask because, for the members of Category:la:Letter names of the Roman alphabet, for example , should it be defined as "The name of the letter k." (as it is currently) or as "The name of the letter K."? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 23:11, 31 January 2012 (UTC)

For Latin itself it should probably be the capitals, because that's all the Romans used. And I think for the sake of convenience, as well as common practice, it should be the same for other languages too. —CodeCat 23:35, 31 January 2012 (UTC)
Agreed. I've modified the entries for the fourteen members of Category:la:Letter names of the Roman alphabet accordingly. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 23:47, 31 January 2012 (UTC)
It contradicts our policy of using lowercase, though. As well, there are many more languages which use lowercase only than ones who use uppercase only. -- Liliana 11:28, 1 February 2012 (UTC)
Could you provide a link to that policy, please? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:55, 1 February 2012 (UTC)
If you say that there is no such policy, feel free to move free to Free and dictionary to Dictionary, in that case! -- Liliana 15:55, 1 February 2012 (UTC)
I think the policy doesn't concern single letters, though, any more than it concerns acronyms... —CodeCat 16:02, 1 February 2012 (UTC)
Where's the difference between words and individual letters? (It applies to acronyms too, but those are *usually* written in all uppercase, so they're okay) -- Liliana 16:10, 1 February 2012 (UTC)
The difference is in English usage. We use caps to give letters their own identity, whether standing alone or strung together, while in lowercase they are subsumed into words. E.g., Nasa is a word (nah-saw), but in ISO the letters remain letters (aye ess oh). Michael Z. 2012-02-01 18:11 z
That principle clearly doesn't apply to individual letters — and are red-linked as standard, the majuscule forms of letters are never red-linked as standard. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:04, 1 February 2012 (UTC)
As for Latin. The Romans used capital letters when hammering them into stone, but used a lowercase script when writing with a stylus. And anyway, the Latin language outlived the ancient Romans. SemperBlotto 16:12, 1 February 2012 (UTC)
Not to mention Latin is still official in Vatican City. -- Liliana 16:17, 1 February 2012 (UTC)

In English, minuscule is the default case used in running text, while capitalization is used for letter emphasis. However, the majuscule is the basic historical and stereotypical form of each letter, the first form of learners, the one used in indexes, and the most common one used for letters in isolation, and in abbreviations where the letters stand for themselves. It seems sensible to lemmatize the majuscule. Michael Z. 2012-02-01 16:16 z

Like Ruakh, I'd prefer to lemmatise both. For example:
  1. A: "majuscule form of a, the first letter of the basic modern Roman alphabet" or "the first letter of the basic modern Roman alphabet (minuscule form: a)"
  2. a: "minuscule form of A, the first letter of the basic modern Roman alphabet" or "the first letter of the basic modern Roman alphabet (majuscule form: A)"
I expect that at a minimum, if we lemmatise only one, e.g. A, we must include a definition line in a "minuscule form of A".
I (would/do) similarly oppose having some sense lines at e.g. a British spelling like colour but not at color because Americans don't use the word in those ways: it may be true that only one spelling has the sense, but it's confusing. Let usage notes and context and qualifier tags clarify that certain senses are generally used in one place or another, and thus in one spelling or another. Both A and a are the first letter of the alphabet, in addition to A being an ampere and a being a year, so I'd like the letter-ness mentioned in both places, A and a. - -sche (discuss) 20:15, 1 February 2012 (UTC)
I agree with your suggestions, but I think it's inaccurate to say that “only one spelling has the sense.” The term has senses, and spellings, and some of them are used mainly in certain places, times, situations, or media. For example, in Canadian English (a branch of “American English,” historically), the term is mainly spelled colour, but also color, and it may share senses with either or both British and US usage. This is why we should lemmatize the term, and not any spellings or capitalizations.
It's incorrect and misleading to treat colo(u)r as two different words. We lemmatize spellings and capitalizations just because MediaWiki software lets us. We need a better guideline to help us define and lemmatize terms as lexical units. Michael Z. 2012-02-03 16:36 z

Which form of a letter is lemmatised: the majuscule or the minuscule? — Straw poll!

Scope: The Roman, Greek, and Cyrillic alphabets.

I support lemmatising the majuscule forms of letters
  1.   Support — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:33, 1 February 2012 (UTC)
    There's a problem to lemmatising minuscules in some cases — in the case of the Greek sigma, there is only one majuscule form, viz. Σ, whilst there are two minuscule forms, viz. σ and ς; which of those forms should be lemmatised, if we decide to lemmatise letters' minuscule forms? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 18:21, 1 February 2012 (UTC)
    In that case, clearly σ (s). As a general rule — well, majuscules might have the same problem. —RuakhTALK 18:43, 1 February 2012 (UTC)
    Why "clearly"? For me, lemmatising ς seems the intuitive choice, by analogy with choosing over . Also, which majuscules, if any, have the same problem? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 19:27, 1 February 2012 (UTC)
    I think I disbelieve your claim that you'd rather lemmatize [[s]] than [[ſ]]. ;-)   —RuakhTALK 23:41, 1 February 2012 (UTC)
    All kidding aside, I think that (if we lemmatise minuscules) it would make more sense to lemmatise the terminal forms, rather than the medial forms. The terminal form is the form the letter would take in isolation, because the medial form is only used when it is followed by other letters in the same word. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 23:49, 1 February 2012 (UTC)
    No, as I understand it, σ is the form used in isolation, with ς only being used at the end of a word. (And one reason that Unicode gives them separate code-points is that they can't be distinguished algorithmically, because there are abbreviations that end with σ, but I don't know if that's the exception or the rule.) By the way, in English, even when ſ was in use, I think that s was the default form, though now that the question is raised I suppose I'm not sure of that. —RuakhTALK 00:24, 2 February 2012 (UTC)
    OK. Well, if you're right that "σ is the form used in isolation", then that is the form that we ought to lemmatise, if we were to decide to lemmatise letters' minuscule forms. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:26, 2 February 2012 (UTC)
    There's one problem. What do you do with digraphs, that also have a titlecase form? Would you use the uppercase (DZ), or the titlecase (Dz)? -- Liliana 15:17, 3 February 2012 (UTC)
    That depends; what's the form that's used in isolation, the uppercase or the titlecase form? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 19:12, 3 February 2012 (UTC)
    In most languages and translingual entries, I would also stick to the basic caps forms, as in DZ. Of course, follow a language's rules of orthography, or precedent of other dictionaries where the digraph is used (Dutch IJ?). In this case, Dz has zero definitions, and a meaningless description in the translingual entry, so I can't say that there's a reason for this dictionary entry at all. Michael Z. 2012-02-03 22:13 z
  2.   Support CodeCat 17:07, 1 February 2012 (UTC)
  3.   Support Michael Z. 2012-02-01 17:58 z
  4.   Support Ungoliant MMDCCLXIV 19:59, 1 February 2012 (UTC)
  5.   Weak support for English. Not convinced this is a good idea in general; it's just that I don't like needless duplication of information across entries. Equinox 22:45, 2 February 2012 (UTC)
  6.   Support. Majuscule letters are the "presentation form" meant to, say, be inscribed in stone. For example, titles of books are often entirely in capitals (I see numerous examples in my own library). Capital letters are geometrically simpler, consisting entirely of compositions of straight lines and circular (or elliptic) arches, and perhaps for that reason capital letters are the letters one first learns (as a child). A capital city represents a country (at least politically) even though small villages in it might be much more numerous; and, by analogy, the capital form of the letter should be the lemma for the lexeme. —AugPi 19:39, 3 February 2012 (UTC)
    Some exceptional cases, like the German Eszett (ß), might not, perhaps, make so much sense to have the capitalized form as lemma, but, for German, the majuscule form of the Eszett already seems to be being used as the lemma form (with a See also section; the minuscule form doesn't have one; and that See also section links to majuscule forms). As for the Greek sigma with its two minuscule variants, the fact that it has only one majuscule form (and that the same is true for phi), makes majuscules likelier candidates for the lemma forms of Greek letters. —AugPi 20:06, 3 February 2012 (UTC)
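The sigma, Eszett, and digraph points raised in this list can be checked against the Unicode case mappings. What follows is only a minimal Python sketch, assuming CPython 3.3 or later, whose `str` methods apply the full Unicode case mappings. The helper name `case_forms` is hypothetical and is not part of any Wiktionary tooling.

```python
def case_forms(text):
    """Return the (lower, upper, title) forms of text under Unicode case mapping."""
    return text.lower(), text.upper(), text.title()

# Both minuscule sigmas share one majuscule.
medial_sigma = "\u03c3"    # σ
final_sigma = "\u03c2"     # ς
capital_sigma = "\u03a3"   # Σ
sigma_uppers = (medial_sigma.upper(), final_sigma.upper())

# In isolation the capital lowercases to the medial form σ ...
isolated_lower = capital_sigma.lower()
# ... while at the end of a word str.lower() chooses the final form ς.
word_lower = "\u039f\u0394\u039f\u03a3".lower()   # ΟΔΟΣ

# The Eszett has no single-character uppercase in the default mapping,
# although the capital ẞ (U+1E9E) lowercases back to ß.
eszett_upper = "\u00df".upper()
capital_eszett_lower = "\u1e9e".lower()

# The dz digraph has three case forms: dz (U+01F3), DZ (U+01F1), Dz (U+01F2).
dz_lower, dz_upper, dz_title = case_forms("\u01f3")
```

Nothing here decides which form ought to be the lemma. It only shows that the mapping from majuscule to minuscule is not always one-to-one, which is the difficulty the comments above describe.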
I support lemmatising the minuscule forms of letters
  1.   Support one argument I forgot to mention is that some majuscules are really badly supported (hello , and hello to you too, , as opposed to the minuscules ɥ and ɦ). -- Liliana 16:45, 1 February 2012 (UTC)
    Aren't those IPA signs? Why would they have different cases at all? —CodeCat 17:09, 1 February 2012 (UTC)
    They are orthographic letters in certain minority languages. -- Liliana 17:13, 1 February 2012 (UTC)
    Not a good reason. Lack of font support for new characters will always be a transitory problem, and it is purely speculation that in the long run it would affect majuscules more than minuscules. Michael Z. 2012-02-01 17:57 z
    By the way, the first is in Unicode 6.0 and displays correctly on my Mac, the second is from Unicode 6.1, released yesterday, and displays as a box. Michael Z. 2012-02-01 23:36 z
    I can see both, but that's to be expected I guess. Most people won't see either. -- Liliana 15:34, 2 February 2012 (UTC)
    Aren't those out of the scope (Roman, Cyrillic and Greek)? Ungoliant MMDCCLXIV 19:59, 1 February 2012 (UTC)
    These two are used in the Latin alphabets for some African languages, part of Unicode Latin Extended-D. Michael Z. 2012-02-02 00:48 z
    I had incorrectly assumed they were in an IPA block. But even if we lemmatise the majuscule we will need exceptions. The letters above were created because of the need of having uppercase in IPA-based alphabets; ß should also be lemma, not . Ungoliant MMDCCLXIV 02:08, 2 February 2012 (UTC)
    Certainly, it is , and not , that ought to be lemmatised, but I think that ought to be a reasonable exception to the general "lemmatise majuscules" rule. After all, is one of a very few minuscules that (traditionally) have no majuscule forms; in fact, are there any besides the Eszett and kra (ĸ)? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:26, 2 February 2012 (UTC)
    ƛ has no majuscule I know of. Other than that, nothing immediately comes to my mind. -- Liliana 15:31, 2 February 2012 (UTC)
    Well, there's this majuscule form: (codepoint: U+A798), but its addition is only proposed hitherto. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:15, 2 February 2012 (UTC)
    The Cyrillic modifier letters soft sign ь and hard sign ъ only have uppercase forms for stylistic reasons. Of course other exceptions will come up, but this vote is to determine the default choice, all else being equal. Michael Z. 2012-02-02 15:35 z
    Actually no, Bulgarian has words that start in an ъ, and if those occur at the beginning of a sentence, capital Ъ is used (e. g. ъгъл (ǎgǎl, angle)), and it isn't just theoretical exercise either since Slavic languages don't use grammatical articles. -- Liliana 15:40, 2 February 2012 (UTC)
    Oops. I don't know if it's necessary here, but this brings up the question of lemmatizing different forms for different languages. Would it be acceptable to have the main entry in ъ for Russian and in Ъ for Bulgarian? Michael Z. 2012-02-02 18:24 z
    That would be very user unfriendly in my opinion. How would a reader know where to look? -- Liliana 18:47, 2 February 2012 (UTC)
    Each respective entry would say “Lowercase” or “Uppercase form of...” and link to its lemma entry. I'm not saying it's necessarily the best solution here, but I think it could be an acceptable option, especially when these represent a somewhat different letter in each language. Michael Z. 2012-02-02 19:36 z
    Except that they be glossed “Minuscule…” and “Majuscule form of [letter]”, I agree with you. Even if we can't achieve consistency across languages, we should at least be able to achieve consistency within languages. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 20:49, 2 February 2012 (UTC)
  2.   Support, though honestly I think we should treat both forms as lemmata. They generally have different meanings (e.g., the Σ of summation vs. the σ of standard deviation), and they're separate Unicode characters, and they're such a closed class. —RuakhTALK 18:35, 1 February 2012 (UTC)
    But this poll is about what to do with, for example Σ and σ, as letters. We ought certainly to have separate entries for different usages of such characters as symbols. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 19:27, 1 February 2012 (UTC)
    Well, but they're always letters. It's not that the character Σ is sometimes used as a letter and sometimes used as a symbol, but that the letter Σ is sometimes used as a symbol. —RuakhTALK 20:19, 1 February 2012 (UTC)
    But why would you want to duplicate pronunciatory, etymological, and usage information in both the majuscule and minuscule entries? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 23:43, 1 February 2012 (UTC)
    I wouldn't — but that's not the question. Workmanlike and kindness and patronizing are all lemmata, but that doesn't mean that all information has to be duplicated from [[workman]], [[kind]], and [[patronize]]. Conversely, [[bid]] has several lemmata that share a pronunciation — so that pronunciation is given only once. —RuakhTALK 02:37, 2 February 2012 (UTC)
    At present, we have the stupid situation where there's a lot of duplicated information at and because neither is lemmatised. In the case of letters, we can give a lot of information — especially, in the case of English ones, pronunciatory information — and for the same reasons that lemmatisation is A Good Idea™ generally, it's a good idea to lemmatise letters. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:26, 2 February 2012 (UTC)
    Yes, uncoordinated entries are a real problem affecting our quality as a dictionary. I can't even find our guideline governing lemmatizing, but I seem to remember something that actually required redundant lemma entries for American and British spellings of a term. Bad. Michael Z. 2012-02-02 19:43 z
    As far as I know, the only possibly workable guideline is to lemmatise whatever spelling's entry that was created first. In the case of vs. , that would lemmatise (since it was created in December 2004, whereas didn't exist until March 2007), which shouldn't be controversial. But then there are entry pairs like vs. , where it seems impossible to reach consensus as to which ought to be lemmatised (by the same principle that lemmatises , (created in December 2002) would be lemmatised, with (created in May 2003) becoming a "soft redirect"; didn't get a proper entry until the 4ᵗʰ of May in 2003, and didn't until the 15ᵗʰ of May in 2003, but regardless, the result is the same). — Raifʻhār Doremítzwr ~ (U · T · C) ~ 20:49, 2 February 2012 (UTC)
    How about lemmatizing the earliest attested form, or the most etymologically correct one? I believe this would favour some British and some American spellings. Yes, I'm sure there would be a lot of debate over the specifics. The duplication in English entries also concern capitalizations, including aboriginal/Aboriginal and labor/labour/Labour. (Sorry I'm getting off topic.) Michael Z. 2012-02-02 22:34 z
    Lemmatising the earliest attested form wouldn't work, because then we'd get a lot of obsolete late–fifteenth-century spellings being lemmatised. I'd support lemmatisation by etymological correctitude, but there has been a fair amount of opposition to such proposals in the past. How would you suggest that we resolve the duplication issuing from capitalisation? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 22:53, 2 February 2012 (UTC)
    Well, earliest-attested of the current forms. Capitalization is probably a case-by-case question. I once lemmatized Aboriginal because some style manuals recommend capitalizing it as an ethnonym, but its older twin grew back. I would now be happy to put it at the traditional basic form aboriginal to reduce duplication. In the end, the URL and page title are just convenience labels, and the full story is in the full text of a single entry (and lacks integrity as long as it remains scattered about several). Michael Z. 2012-02-02 23:19 z
    I agree; consolidation somewhere suboptimal is better than no consolidation at all. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:51, 3 February 2012 (UTC)
    So let's propose some good lemmatization guidelines. I can't even find the basic common-sense rules we all agree on in WT:English_definitions#Lemma_forms, WT:Lemmas, and WT:About_English#Regional_differences. Am I missing anything? Michael Z. 2012-02-03 14:53 z
    Let's work out lemmatisation rules specifically for letters before we work on ones for terms generally. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 19:14, 3 February 2012 (UTC)
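The questions in this list about which minuscules have a majuscule at all (ɥ and ɦ, kra, the Bulgarian hard sign) can be inspected with the Unicode data bundled with Python. This is a rough sketch only: the answers depend on the Unicode version the interpreter ships with, and the helper name `has_distinct_upper` is hypothetical.

```python
import unicodedata

def has_distinct_upper(ch):
    """True if this interpreter's Unicode data maps ch to a different uppercase form."""
    return ch.upper() != ch

letters = {
    "turned h": "\u0265",      # ɥ
    "h with hook": "\u0266",   # ɦ
    "kra": "\u0138",           # ĸ
    "hard sign": "\u044a",     # ъ
}

# Map each letter name to whether a separate majuscule is known.
report = {name: has_distinct_upper(ch) for name, ch in letters.items()}

# The Unicode version matters: capitals added in later versions
# are invisible to older interpreters.
print(unicodedata.unidata_version, report)
```

A check like this reports only what is encoded, not what a given language's orthography actually uses, so it could at most flag candidates for exceptions to a default rule.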
I don’t care which form we lemmatise, as long as we lemmatise consistently
I couldn’t care less (i.e., I abstain)
  1.   Abstain Mglovesfun (talk) 18:00, 1 February 2012 (UTC)
  2.   Abstain --EncycloPetey 02:19, 2 February 2012 (UTC) There are problems with selecting only one or the other as lemma form, and so I don't think we can make a choice for one over the other. Some letters in some languages, such as German and Slovak, have only a minuscule form (the majuscule is theoretical but is never used in the language), and in some languages the majuscule has more than one associated minuscule form. I don't think either form should be lemmatized over the other. --EncycloPetey 02:19, 2 February 2012 (UTC)
    Are there some principles you can recommend whereby we might reach ad hoc solutions? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:26, 2 February 2012 (UTC)
    Umm... solutions to what? I don't see a problem as everyone else seems to. This went to a poll before the "problem" was clarified. There are quite a few issues being discussed here. --EncycloPetey 03:07, 3 February 2012 (UTC)
    What I meant was, are there some principles you can recommend whereby we might decide which form (be it the minuscule or the majuscule) to lemmatise in any particular case? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:51, 3 February 2012 (UTC)
    I think you've misunderstood what EncycloPetey wrote. He didn't write, "I think that one form should be lemmatized in some cases, and the other form in other cases." He wrote, "I don't think either form should be lemmatized over the other." That is, that neither form should be treated as a mere "form-of" of the other form. (Unless, of course, it's I who misunderstood.) —RuakhTALK 22:34, 3 February 2012 (UTC)
    Perhaps you're right in your interpretation, but that just means that EP advocates an unworkable "solution". — Raifʻhār Doremítzwr ~ (U · T · C) ~ 22:47, 3 February 2012 (UTC)
    I interpret any result of this vote as a default choice, a recommendation for consistency when there aren't any specific circumstances that dictate the choice. Obviously, we would lemmatize lowercase and not uppercase ß. But shouldn't the English letter ess have one definition and not three, at S, s, and ſ? We're a dictionary, not a catalogue of Unicode code points. If we can neatly define the diverse verb wrought as a form of both work and wreak, why on earth should we have redundant entries defining the letter J? Michael Z. 2012-02-03 23:56 z
    I agree with your way of interpreting whatever is the result of this straw poll. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 14:30, 4 February 2012 (UTC)
    ſ would deserve special treatment I guess, or how would you describe its use at capital S? -- Liliana 00:12, 4 February 2012 (UTC)
    Most of the information on that letter will be at , but the information specific to in which circumstances ought to have been used instead of should be at . — Raifʻhār Doremítzwr ~ (U · T · C) ~ 14:30, 4 February 2012 (UTC)


This is totally retarded! We're voting on something even though none of us even know what the result would look like!

Am I right that if this passed, it would look like this?

Primary form

#the first letter of the Latin alphabet, yadda yadda
#additional case-sensitive meanings
#the first letter of the English alphabet
#the first letter of the Spanish alphabet

Secondary form

#{{secondary form of|[[link to primary form here]]}}
#additional case-sensitive meanings

Or do you want sections for every single language in the secondary form, whichever it will be? -- Liliana 15:17, 3 February 2012 (UTC)

English a is a form of English A, so there would be a language section. Michael Z. 2012-02-03 15:56 z
If we do decide to have language sections for every language at the secondary form, then there's nothing gained from choosing one form, as you need to synchronize the two entries anyway (add/remove form-of entries etc), in which case this very discussion is pointless. -- Liliana 16:16, 3 February 2012 (UTC)
This is a separate and larger question. We never use “translingual” as a substitute for individual language entries.
Besides that, English a is likely to remain the minuscule form of A during our lifetimes, so I don't understand what synchronizing problems exist. On the other hand if we don't lemmatize letters, then a letter entry would become out-of-sync or even contradictory with every single edit. This gives the advantages of the w:DRY principle. Michael Z. 2012-02-03 16:51 z
Yes, but a form-of entry of a letter can still contain a pronunciation, audio file, possibly homophones, external links, and Daniel-style lists. Those still have to be synchronized if we were to keep the language entries, so there would be almost nothing saved in maintenance required. -- Liliana 16:56, 3 February 2012 (UTC)
Anything that needs to be synchronized should be moved to the lemma entry. Anything else doesn't need to be synchronized. That's the whole point. It serves the task of the editors, the integrity of our information, and the goals of our readers. Michael Z. 2012-02-03 17:09 z

If anyone doubts that a letter entry can contain extensive information, I invite you to read the NED’s and OED’s entries, links to which I have provided here; hopefully, they will show the need to lemmatise letters. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 19:10, 3 February 2012 (UTC)

To clarify, what I'm saying is that all that information should be in the entry for one letter form only, and not duplicated over both (or, in some cases, all) the letter forms. I hope that that is not controversial. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:37, 4 February 2012 (UTC)
All what information? Exactly which information do you and others think can be consolidated and which cannot? We can't consolidate the pronunciation information for A and a at the majuscule entry, because the minuscule is both a letter and a word. Likewise, we can't consolidate the etymology at the majuscule entry, because the word has a separate etymology requiring multiple etymology sections. We can't consolidate quotations (if we have them) because we want quotations to support each form of a term/item. So what information do you think can be consolidated? --EncycloPetey 04:42, 8 February 2012 (UTC)
The article and the abbreviations and are entirely different from the letter , ; you're just confusing them because of homography. The entry for “A, n.” in the OED [3ʳᵈ ed., June 2011] has this in its “Etymology” section (I've just copied and pasted it, so its formatting, links, &c. have not been reproduced, but it'll give you an idea):
If you don't think all that's worth consolidating in one place, then there's clearly no argument I can make to persuade you. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 04:28, 16 February 2012 (UTC)