Last modified on 30 July 2014, at 10:48

Wiktionary:Beer parlour/2013/November


Worder

Hi, can an admin please remove or suppress information from Worder's user page and talk page? It contains an email address. --~ curtaintoad ~~ talk ~ 20:53, 1 November 2013 (UTC)

I took care of it. For future reference, I don't think the Beer Parlour is the best place to bring this up. Try posting on an active admin's user page, or, if it's really bad, on Wiktionary:Vandalism in progress. --WikiTiki89 21:11, 1 November 2013 (UTC)
Okay, thanks. I was also unsure whether this violates the policies (regarding email addresses), but thanks again for taking good care of it. --~ curtaintoad ~~ talk ~ 21:10, 2 November 2013 (UTC)

Rollback

Hey. Please add me to the rollback group as it will allow me to revert vandalism much better and easier. I believe I have made enough reverts to be granted this permission and to be trusted with it. Thanks, --~ curtaintoad ~~ talk ~ 22:51, 2 November 2013 (UTC)

I think it's a little too early for that, considering you've been here since August and only have 65 edits in the main namespace. You have not even been made an autopatroller yet. Keep it up though, and you'll get there soon. --WikiTiki89 23:05, 2 November 2013 (UTC)

Transliteration of Georgian consonants

What is the rationale behind transliterating the aspirated consonants with an extra «’»? I note that in Georgian: A Reading Grammar by Howard I. Aronson the ejective consonants are marked with a dot below the transliterated letter, while in Beginner's Georgian by Dodona Kiziria, there's a «’» after a transliterated ejective consonant. In IPA representation, ejective consonants are also marked with «’». From the perspective of English, it also seems weird to transliterate consonants that are essentially equivalent to the English ones with an extra symbol, while the consonants that are fairly different are transliterated as if they were the closest equivalents. --Njardarlogar (talk) 10:23, 3 November 2013 (UTC)

Aspirated Georgian consonants are marked with an extra «’» in many transliteration systems, namely ISO 9984 (the one we use), ALA-LC, BGN/PCGN. The tradition probably goes back to the Hübschmann-Meillet transliteration system for Armenian, where aspirates are marked by «ʿ», but I have no proof. Your first source must be using the Caucasiological transliteration scheme employed in serious literature, such as Klimov and Fähnrich. I do not mind switching to it. Your second source must be using the National system; it is unscientific and should not be adopted by us. --Vahag (talk) 11:27, 3 November 2013 (UTC)

Separate articles for inflected forms

Discussion moved from Wiktionary:Grease pit/2013/November#Separate articles for inflected forms.

(Moved from Grease Pit) itsacatfish 14:54, 3 November 2013 (UTC)

I have mentioned this before, but I wasn't convinced. In my mind, articles such as incalzai and salmodiaba don't have any function that is not achieved by a redirect and the presence of a conjugation table on the main article: all the information is provided elsewhere, ergo the article is not needed. I feel these articles (which, judging by the "hit-rate" of this type of article when using the Random entry feature, make up at least 50% of all articles on the entire wiki) are just unneeded, and should be replaced with redirects in all cases, except for when there is a justification not to (such as the presence of multiple etymologies of the word or specific idiomatic usage of a particular word form).

itsacatfish

Perhaps there is a way to prevent soft redirects from showing up at Special:Random? --WikiTiki89 21:14, 1 November 2013 (UTC)
Different words can share one or more inflected forms, and the lemma form of one word can be identical to the inflected form of another word. In the Latin script at least, we cannot safely predict which inflected forms will not conflict with other words, and it would give the reader the wrong idea if a word redirected when it shouldn't. In short, redirects are a bad idea.
Ideally, the software would have been designed fundamentally differently, so that inflected forms were added automatically from the inflection tables rather than being created manually or by bot. That way, we could also treat pages containing only inflected forms as non-content pages. --Njardarlogar (talk) 21:51, 1 November 2013 (UTC)
Sure, there are some spellings that are known to be unique among all the languages of the world, and there are many that are shared between at least two well-known languages, so it's obvious that they can't be redirects. What about all the other cases? No one speaks all of the languages of the world, or even has references for them, so no one person knows whether a particular spelling is unique to one language or not. Even if we're able to guess right more often than not, what happens when we're wrong?
The problem with redirects is that they're not redlinked, so someone adding a translation to an English entry may be fooled into thinking there's already an entry for it in their language. If they do realize there isn't an entry, I suspect many casual contributors won't know how to get to the redirect page and convert it to an entry, or may think they're not allowed to. We then end up with the redirect ensuring that there will never be any content on that page, regardless of the merit of the content that might be added.
By the way, this isn't a technical question, so this would have been better addressed at the Beer Parlour. Chuck Entz (talk) 03:55, 2 November 2013 (UTC)
Indeed, and Special:Random isn't really relevant because it has nothing to do with what entries people look for. Also having something like joue redirecting to jouer means you have to visually scan the whole table to find all the instances of joue. And joue has a Dutch section and a noun on it, so you can't redirect it to jouer and lose both the noun and the Dutch section. In fact the Dutch is a form of jouen so you need to simultaneously redirect to two entries, and keep the noun section. Totally impossible and in my opinion not desirable. Mglovesfun (talk) 12:12, 2 November 2013 (UTC)
OK, I take the point of the "people may be scared to convert a redirect page into an actual article" argument. That might be true, but it's a fairly weak argument, since it's still quite possible to do, and the kind of people who wouldn't do it are probably the kind of people who wouldn't make a new article for the word anyway.
All of the other comments completely fail to take into account my final statement that redirects should be made "except for when there is a justification not to", i.e. except when there is an entry in another language or that particular word form has other idiomatic or unrelated meanings.
As a learner of a highly inflected language (Russian), I know that 99% of the time I just want to find the definition of a word. In the 1% of cases where I want to find a particular inflected form, I am quite happy to just look for it in a declension table; this is not a difficult task and does not warrant the creation of literally millions of soft redirects. itsacatfish
One of the lesser functions of Wiktionary is serving as a lemmatizer/stemmer. When it becomes technically possible to relegate that to the search box, Wikidata or something else, or perhaps when searching template-generated inflected forms becomes feasible, we could get rid of those "dumb" entries, which also consume vast amounts of the database dump. Until then, you will have to make that extra mouse click. --Ivan Štambuk (talk) 20:39, 3 November 2013 (UTC)
@itsacatfish the whole point was that we cannot know which pages can be safely redirected without knowing, at any given time, every inflected form of every language written in the relevant script. As for your example, many languages are written in the Cyrillic script. --Njardarlogar (talk) 09:47, 4 November 2013 (UTC)
Another thought: redirects are not informative. The reader may not understand why the page changed: was it because of a spelling or typographic variation? A common error? An inflected form? And if so, which one? An inflected form entry has other information than just "inflected form of" that can't be found easily in a table (if you even know that the information you are looking for is in a table): pronunciation, other spellings, similar-sounding words, similarly written words (in the same language) or typographically similar words ({{also}}). Dakdada (talk) 16:16, 4 November 2013 (UTC)

Context templates

Previous discussion: Wiktionary:Beer parlour/2013/June#Lua-cising Template:context

A few months back, CodeCat migrated the context templates to Lua, and deleted them from the template namespace. The first half of this was indisputably a good thing, but I'm less clear on the second: one of the benefits of Lua is that it would actually make it quite straightforward to have e.g. {{transitive|...}} be exactly equivalent to {{context|transitive|...}}. (At the very least, even if we intended to eventually delete {{transitive}}, it would have made sense to set up that equivalence temporarily, so we could have a longer transition period, rather than deleting each template as soon as it had been bot-orphaned. But that's water under the bridge now.) Personally I'm actually pretty O.K. with always requiring context| — especially since we have so many contexts, and it was always impossible to keep track of which ones had their own templates and which ones didn't — but the discussion from the time does not show much support for the change, so I thought I would check and see how people feel about this now. Are we O.K. with the current behavior? Are there any die-hard fans of being able to write e.g. {{transitive|...}}? —RuakhTALK 17:33, 3 November 2013 (UTC)

I don't mind it, as long as {{cx}} (or another shortcut) remains an option. —Μετάknowledgediscuss/deeds 17:38, 3 November 2013 (UTC)
What he^ said. --WikiTiki89 17:42, 3 November 2013 (UTC)
What they^^ said. --Vahag (talk) 18:38, 3 November 2013 (UTC)
I'm ok with it as it is. Mglovesfun (talk) 18:54, 3 November 2013 (UTC)
I was upset seeing them gone, but eventually grew accustomed to {{cx}}. This is one of the cases where even a single-letter template name (e.g. {{x}}) would be justified IMHO. --Ivan Štambuk (talk) 20:31, 3 November 2013 (UTC)
I personally could live with a temporary restoration limited to whatever context templates have not been properly replaced with something Luaic.
I still wonder how casual and new users are supposed to learn that {{context}} or {{cx}} is what should be used. Some insert hard formatting, but some use the old templates, probably because it seems plausible or because it used to work. Are such contributors actually or potentially important enough for us to worry about? I think they are, because the project is both incomplete and in need of significant quality improvement, and because such contributors add diverse viewpoints and idiolects to the tasks of adding and improving entries. If we had some forms-based input that also taught users how to format using templates, then this approach would not be necessary at all. DCDuring TALK 21:32, 3 November 2013 (UTC)
Well how did they know to use the old templates? Because they saw them being used before. Likewise, they will see our new template and use it instead. So basically the only people using the old templates are the ones that have already been using them and don't know about the change, but they'll catch on soon. --WikiTiki89 21:39, 3 November 2013 (UTC)
@ Wikitiki:The old template system was as if "{{" was "(" and "|" was "," etc. It was a simple leap. It was slightly less typing than hard formatting. And nobody felt the need to require "lang=en" for the default language. It could be a little more complicated if one was trying to get something categorized as well, but that is probably rarely a high priority for end users or casual contributors. DCDuring TALK 02:10, 4 November 2013 (UTC) IFYPFY.​—msh210 (talk) 20:01, 26 November 2013 (UTC)
Yes, but they wouldn't know they could do that unless they saw it. Since they are no longer seeing it, it will become less of a problem. --WikiTiki89 02:35, 4 November 2013 (UTC)
I suppose. The damage is done. DCDuring TALK 03:08, 4 November 2013 (UTC)
I shall just point out that we also have {{label}}. Recently I have switched to it, because the markup is shorter (language codes being, if I recall correctly, mandatory anyway). Keφr 23:53, 3 November 2013 (UTC)
To make things even easier, we should find a shortcut name for {{label}}. --WikiTiki89 01:15, 4 November 2013 (UTC)
And no, I will not miss standalone context templates. The fewer templates, the better. Keφr 23:56, 3 November 2013 (UTC)

Some proposals at Wiktionary talk:About Arabic

Since I know that it is unlikely that people actually check this page frequently I am letting you guys know that I made a few proposals at Wiktionary talk:About Arabic. Those of you interested in our policy on Arabic, please check it out. --WikiTiki89 01:51, 4 November 2013 (UTC)

I've replied to your three questions there. --Anatoli (обсудить/вклад) 02:04, 4 November 2013 (UTC)

Introducing Beta Features

(Apologies for writing in English. Please translate if necessary)

We would like to let you know about Beta Features, a new program from the Wikimedia Foundation that lets you try out new features before they are released for everyone.

Think of it as a digital laboratory where community members can preview upcoming software and give feedback to help improve them. This special preference page lets designers and engineers experiment with new features on a broad scale, but in a way that's not disruptive.

Beta Features is now ready for testing on MediaWiki.org. It will also be released on Wikimedia Commons and MetaWiki this Thursday, 7 November. Based on test results, the plan is to release it on all wikis worldwide on 21 November, 2013.

Here are the first features you can test this week:

Would you like to try out Beta Features now? After you log in on MediaWiki.org, a small 'Beta' link will appear next to your 'Preferences'. Click on it to see features you can test, check the ones you want, then click 'Save'. Learn more on the Beta Features page.

After you've tested Beta Features, please let the developers know what you think on this discussion page -- or report any bugs here on Bugzilla. You're also welcome to join this IRC office hours chat on Friday, 8 November at 18:30 UTC.

Beta Features was developed by the Wikimedia Foundation's Design, Multimedia and VisualEditor teams. Along with other developers, they will be adding new features to this experimental program every few weeks. They are very grateful to all the community members who helped create this project — and look forward to many more productive collaborations in the future.

Enjoy, and don't forget to let developers know what you think! Keegan (WMF) (talk) 19:48, 5 November 2013 (UTC)

Distributed via Global message delivery (wrong page? Correct it here), 19:48, 5 November 2013 (UTC)
I see that the Typography Update beta on MediaWiki.org has been rescheduled for Nov 14. Michael Z. 2013-11-08 23:20 z
No sign of it yet. Michael Z. 2013-11-14 16:26 z
Typography Update/Typography Refresh is now active on MediaWiki.org (in the Beta preferences link at the top), if anyone would like to have a look. Michael Z. 2013-11-21 17:44 z
I have enabled Typography refresh. What is it useful for, apart from displaying headers in Georgia font? --Vahag (talk) 11:51, 22 November 2013 (UTC)
Supposedly, Georgia font uses mind control to make the headers easier to read in case you couldn't see them before. --WikiTiki89 13:36, 22 November 2013 (UTC)
It's working! Headers are easier to read. And the added spacing between them is beneficial. --Vahag (talk) 14:26, 22 November 2013 (UTC)
Headers weren't hard to read before, but they would be even easier now with Georgia, if only I liked Vector. Isn't normal-sized type more likely to benefit than already large type? DCDuring TALK 14:37, 22 November 2013 (UTC)
Woah, this is now live on Wiktionary! So much for testing it out in advance.
The body font now has a stack of common OS fonts specified, and the sidebar links are smaller and grey. The CSS cascade is slightly different, so you may have to update your vector.css for everything to look just right.
There is some yadda-yadda about text metrics for multiple scripts, but I can’t tell what went into it or what has changed. Michael Z. 2013-11-22 16:03 z
So where can we vote against these features? --WikiTiki89 15:17, 25 November 2013 (UTC)
There’s a lot of complaining on the MW pages linked above, but I think little of it is based on any factual evidence or real problems. Since WMP has invested a lot into this, I doubt they will just cancel it. Maybe you are better off suggesting how it can be improved. Michael Z. 2013-11-26 17:06 z

Bots anyone?

I don't know if this is the right place to ask about that, but... I've been doing a lot of form-of page entering lately, and I'm wondering if one of the bot owners here would like to help me. User:George Animal used to help me upload form-of pages with his User:GanimalBot, but he hasn't been here much lately... is anyone with a bot interested in adding lots of pre-formatted pages on Latvian adjective and verb forms? --Pereru (talk) 20:15, 8 November 2013 (UTC)

For the future, bot requests are normally done at the Grease pit. --WikiTiki89 20:23, 8 November 2013 (UTC)
Yes. Could you describe the workflow? How are the pages to be generated? DTLHS (talk) 20:18, 8 November 2013 (UTC)
Basically, I create the pages using subst: with a template (like User:Pereru/Adjective forms/source code) and then I place it at User:Pereru/Adjective forms. The result is a single file in which the individual form-of pages have the format:
xxxx
PAGENAME
==Latvian==
......
yyyy

xxxx
PAGENAME
==Latvian==
......
yyyy

George Animal would then use that page as input for a script that splits the content at the xxxx-yyyy boundaries, uses PAGENAME as the name of the page to create, and uses the rest (from ==Latvian== down to yyyy) as the contents of that page. Is that helpful? (Also, if a page of that name already exists, the bot, at least George Animal's, warned me about it so that I could deal with it manually, though there may be better solutions to that.) --Pereru (talk) 00:08, 9 November 2013 (UTC)
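The splitting step described above can be sketched in a few lines of Python. This is not George Animal's actual script (which isn't shown here); it is a minimal sketch assuming that the delimiter lines are exactly `xxxx` and `yyyy`, that the first non-empty line after `xxxx` is the page name, and that the set of already-existing titles is fetched separately. Both function names are hypothetical, and no wiki API is called.

```python
def split_form_pages(source: str, start: str = "xxxx", end: str = "yyyy") -> list[tuple[str, str]]:
    """Split a batch page into (title, wikitext) pairs.

    Assumes each record looks like:
        xxxx
        PAGENAME
        ==Latvian==
        ...
        yyyy
    """
    pages: list[tuple[str, str]] = []
    title = None
    body: list[str] = []
    inside = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == start:
            # A new record begins; reset the collected title and body.
            inside, title, body = True, None, []
        elif stripped == end and inside:
            # Record closed; keep it only if a page name was found.
            if title:
                pages.append((title, "\n".join(body).strip() + "\n"))
            inside = False
        elif inside and title is None:
            if stripped:  # first non-empty line after the start marker is the page name
                title = stripped
        elif inside:
            body.append(line)  # everything else up to the end marker is page content
    return pages


def partition_by_existence(pages: list[tuple[str, str]], existing_titles: set[str]):
    """Separate pages to create from pages that already exist (left for manual handling)."""
    to_create = [(t, b) for t, b in pages if t not in existing_titles]
    to_review = [t for t, _ in pages if t in existing_titles]
    return to_create, to_review
```

Existing pages are set aside rather than overwritten, mirroring the warning George Animal's bot gave; merging a new language section into an existing page would need separate logic.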
A bot can certainly insert a new language to a page that already exists, but if the page already has that language, it's much more difficult and should probably be done manually. --WikiTiki89 00:17, 9 November 2013 (UTC)
  • Before running a bot to create new inflected forms, the same bot should check for errors in existing inflected-form entries, as well as for incompletely created paradigms. --Ivan Štambuk (talk) 08:40, 9 November 2013 (UTC)
Now I'm considering building my own bot, regardless of the language edition. Could any bot-oriented veteran send me some tutorials on the matter, at least on my talk page? --Lo Ximiendo (talk) 10:22, 9 November 2013 (UTC)
OK. Sorry for starting the discussion here--I didn't think that asking for bot help would be a technical matter, so I didn't place it in the Grease Pit. So, DTLHS, do you think you can help me? (Ivan, I see how the bot could look for incompletely created paradigms, but how could it know a form is wrong? Don't you need human input for that last part?) --Pereru (talk) 10:46, 9 November 2013 (UTC)
1) Extract a linked lemma from the form-of entry 2) generate inflected forms for the lemma 3) isolate those forms equal to the inspected form-of entry 4) compare the existing entry to the generated entry 5) if different, add attention tag. --Ivan Štambuk (talk) 12:01, 9 November 2013 (UTC)
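Ivan's five steps map onto a small checking function. The sketch below is a hypothetical Python illustration, not an existing bot: it assumes form-of entries link their lemma through an `{{inflection of|LEMMA|...}}` template, that a language-specific `generate_entries(lemma)` callable (hypothetical, not shown) returns the expected wikitext for every form of a lemma, and that `{{attention|lv}}` is an acceptable attention tag.

```python
import re
from typing import Callable, Optional

# Assumption: the lemma is the first positional parameter of {{inflection of}},
# optionally preceded by a lang= parameter.
LEMMA_RE = re.compile(r"\{\{inflection of\|(?:lang=[^|}]*\|)?([^|}]+)")


def extract_lemma(wikitext: str) -> Optional[str]:
    """Step 1: pull the linked lemma out of the form-of template, if any."""
    match = LEMMA_RE.search(wikitext)
    return match.group(1).strip() if match else None


def check_form_entry(
    title: str,
    wikitext: str,
    generate_entries: Callable[[str], dict[str, str]],
    tag: str = "{{attention|lv}}",
) -> str:
    """Return the entry unchanged if it matches the generated one, otherwise tagged."""
    lemma = extract_lemma(wikitext)
    if lemma is None:
        return wikitext  # not a recognizable form-of entry; leave it alone
    generated = generate_entries(lemma)  # step 2: every form of the lemma
    expected = generated.get(title)      # step 3: the generated entry for this page
    # Step 4: compare; a title the lemma never produces also counts as a mismatch.
    if expected is not None and expected.strip() == wikitext.strip():
        return wikitext
    if tag in wikitext:
        return wikitext  # already flagged on a previous run
    return tag + "\n" + wikitext  # step 5: flag for human review
```

Step 5 only adds a tag, so the final judgment stays with a human, which answers Pereru's question about how a bot could "know" a form is wrong: it only knows the entry disagrees with the generator.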
I see. Sounds feasible, though beyond my ken. If someone here with a bot can do that to my form-of pages, I would certainly be glad. So -- is anyone available? --Pereru (talk) 22:01, 9 November 2013 (UTC)
Yes, I can do it easily. I have watched User:Pereru/Adjective forms and will do a test run when you update it. DTLHS (talk) 01:25, 10 November 2013 (UTC)
Good. I'm placing the inflected forms of two new adjectives there right now (spējīgs and izdarīgs) -- that's about 120 forms, which should be enough for your test run. I'm looking forward to seeing this work! (By the way, since you haven't created your user page yet, how do I get in touch with you when there are more adjective/participle forms to upload? Or should I just leave forms at User:Pereru/Adjective forms for you without any message or warning?) --Pereru (talk) 12:15, 10 November 2013 (UTC)
Sorry to come to this discussion late. Myiasis doesn't appeal to me, so I think I'll pass (see this for details)... ;) Chuck Entz (talk) 01:57, 10 November 2013 (UTC)

The same lame joke I always make...

I will be taking a trip tomorrow, and will be unable to edit for a week. Please try to finish the dictionary by the time I get back. Cheers! bd2412 T 02:10, 9 November 2013 (UTC)

No problem. It will be done. --WikiTiki89 03:03, 9 November 2013 (UTC)
It's all looking pretty much done, except for a certain motley section devoted to the Judeo-Arabic language *cough cough* ;)Μετάknowledgediscuss/deeds 06:24, 9 November 2013 (UTC)
It will probably take about 2 weeks. Haplogy () 07:16, 9 November 2013 (UTC)
We've been trying since about 2003 now. Mglovesfun (talk) 18:34, 9 November 2013 (UTC)

Absolutely no way to add an external link when not logged in?

Well, I must say your policy has become WAY stricter than Wikipedia's in the meantime!! My external link was "automatically deemed harmful" and rejected. I can't believe that! In 8 years on Wikipedia, not one of the references I added for clarification has ever been rejected in any way! That's no "policy" anymore; that's bordering on paranoia! Not amused. This is the entry in question: חן. So is there any place where I can read about this policy? I still think this policy of total rejection of external links for new and/or unregistered users is unacceptable. Remember, Wikipedia existed when Wiktionary was still in the making, so don't try to be stricter than them! -andy 77.191.199.87 17:41, 9 November 2013 (UTC)

It's not a policy, it's a spam filter. Wikipedia has better spam-reverting bots than we do. Anyway, I have fixed the formatting in your edit, please try to use the correct templates next time. --WikiTiki89 18:03, 9 November 2013 (UTC)
You're joking, man? What do you mean by "correct templates"? These poor excuses for templates, by chance? They don't even allow a transliteration! With my "wrong" ones, as you call them, I at least had a transliteration (which you can clearly see in the entry's history). So even though you call them "wrong", they're way better! Now all my transliteration efforts have gone down the drain. Very nice, really. :-/ -andy 77.191.199.87 18:08, 9 November 2013 (UTC)
I don't know what you're talking about, transliterations are supported by virtually all our templates. My advice to you is to stop complaining so much. If you don't understand how things work around here, then ask questions politely and learn. --WikiTiki89 20:07, 9 November 2013 (UTC)
Ironically, Wikipedia would probably frown upon this kind of reference even more than us. See w:WP:SELFPUB. Keφr 18:19, 9 November 2013 (UTC)
Why not create an account? Mglovesfun (talk) 18:33, 9 November 2013 (UTC)
The spam filter blocks any attempt by a user with <2 edits to add an external link. So far today, it has blocked nine spammers from making 14+ edits, one vandalistic but non-spam edit, this questionable edit (which I instated for demonstration purposes and then undid), two possibly-legitimate or possibly SELFPUB / self-published-blog-pushing edits (including yours), and no unambiguously helpful edits. (In days past, it has also blocked quite a few editors who try to link to Wikipedia by pasting URLs rather than using w: notation.) - -sche (discuss) 19:23, 9 November 2013 (UTC)
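The rule -sche describes (a user with fewer than two edits may not add an external link) boils down to a simple predicate. The Python sketch below is only an illustration under stated assumptions: the real filter is configured on the wiki in its own rule language, and the URL pattern and function names here are hypothetical.

```python
import re

# Assumption: a rough pattern for bare or bracketed external URLs.
URL_RE = re.compile(r"https?://[^\s\]\[<>|]+", re.IGNORECASE)


def added_links(old_text: str, new_text: str) -> set[str]:
    """URLs present after the edit that were not present before it."""
    return set(URL_RE.findall(new_text)) - set(URL_RE.findall(old_text))


def should_block(user_edit_count: int, old_text: str, new_text: str, min_edits: int = 2) -> bool:
    """Block when a user below the edit threshold adds at least one new external link."""
    return user_edit_count < min_edits and bool(added_links(old_text, new_text))
```

Because it compares links before and after the edit, a new user can still edit a page that already contains URLs, and internal links in `[[w:...]]` notation never trigger it.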
At least our sandbox is what Lua is for, thankfully. --Lo Ximiendo (talk) 19:33, 9 November 2013 (UTC)

Ivan's endorsement of Altaic

Ivan recently created Category:Altaic languages, added languages to the family in Module:languages and Module:families, and also created a code for Proto-Altaic. I think this endorsement of a theory that is not widely accepted is very worrying, especially when the category failed RFD for the same reason 4 years ago. —CodeCat 16:38, 10 November 2013 (UTC)

So what if it is controversial? "Failing" RfDO in which no one participated years ago does not mean it cannot be recreated. --Ivan Štambuk (talk) 16:46, 10 November 2013 (UTC)
I see no problem with creating the category as long as we mention on its page that its existence is controversial. --WikiTiki89 17:08, 10 November 2013 (UTC)
This kind of stuff needs to be discussed beforehand. I got rid of everything for now. There are millions of theories. If we have this, what stops one from adding Dene-Caucasian next? -- Liliana 17:25, 10 November 2013 (UTC)
No there are not millions of theories. There are in fact very few theories, and specifically Altaic Studies is an established field of scholarship. Here you have inexplicably removed perfectly valid referenced etymology. But let's return to the matter - why do you think we shouldn't include Proto-Altaic reconstructions? --Ivan Štambuk (talk) 17:35, 10 November 2013 (UTC)
As of yet Altaic has not been accepted by the majority of linguists, and we shouldn't lead readers the wrong way by promoting this theory. Anything that is accepted in the linguistic community is fine, but Altaic isn't so far. -- Liliana 17:41, 10 November 2013 (UTC)
Wikipedia is driven by notability, not acceptance. Controversial topics merit their own articles if they are proven to be notable enough. A similar position could be taken for groupings of languages: theories such as Altaic, Nostratic and so on have decades of scholarship behind them, an active community of researchers, and published works (by non-fringe publishers such as Brill), so it's safe to assume that somebody is reading them, and that Wiktionary readers could be interested in them as well. No harm is done if such theories are unambiguously marked as not widely endorsed, similar to what we already do for "unsafe" reconstructions and speculations about prehistoric borrowings. If we could have "all words in all languages", why not also "all reconstructions in all protolanguages" :)
I also think that we should host obsolete etymologies and reconstructions, because they are important for historical reasons. It would be interesting to read how explanations of the origin of a word X evolved through the ages. --Ivan Štambuk (talk) 18:00, 10 November 2013 (UTC)
I think this is exactly why we can't just blindly rely on references. If references assume Altaic is valid, that certainly doesn't mean we should just copy their point of view. Some OR is necessary to separate it. —CodeCat 17:43, 10 November 2013 (UTC)
I have references that prove that the earth is a flat surface. Can I add it to our definition on earth since it's sourced? -- Liliana 17:47, 10 November 2013 (UTC)
Since Wiktionary only cares about meanings of attested words, you could do that only if you have attestations of Earth being used in the meaning "flat surface". Which is, when you think about it, not that improbable. --Ivan Štambuk (talk) 18:07, 10 November 2013 (UTC)
But why not copy their POV, if it is clearly marked? Every reconstruction and etymology in general is a POV by the person signing it. --Ivan Štambuk (talk) 18:00, 10 November 2013 (UTC)
We shouldn't sort e.g. Category:Turkic languages into Category:Altaic languages, but I suppose the latter category still has a reason to exist, namely to contain Category:Proto-Altaic language (and Category:Terms derived from Proto-Altaic). I think it's acceptable to include Altaic theories in etymologies as long as they're sourced and qualified by a mention that the existence of Altaic is controversial, e.g. "As part of the controversial Altaic theory, Smith connects this word to Japanese foo." A stronger qualifier than was suggested here, certainly. - -sche (discuss) 17:36, 10 November 2013 (UTC)
Maybe it can be mentioned, but this shouldn't warrant a whole category system, or else we will soon have Category:Dene-Caucasian languages. -- Liliana 17:41, 10 November 2013 (UTC)
I'm not convinced that accepting one thing forces us to accept a second, more controversial thing. (If WT:RFD has taught us anything, it's that Wiktionary is capable of being inconsistent.) And if we do mention any Dene-Caucasian theories, I think it is a good idea to gather them in a category in case we later decide to delete them. That said, I'm not wedded to the categories; I could live with allowing qualified mentions of Altaic theories while denying them a code and a category. - -sche (discuss) 18:01, 10 November 2013 (UTC)
It makes sense for there to be a category. Categories are meant for categorization and it is certainly useful to have a category of terms that have proposed Altaic etymologies or of languages that are proposed to have descended from Altaic. The existence of the categories does not imply that we endorse these theories. --WikiTiki89 19:03, 10 November 2013 (UTC)
I certainly agree with the view aiming to create a category for Altaic languages as far as reliable sources are evident, and I think reliable sources such as ToB are worth using. --Hirabutor (talk) 20:28, 10 November 2013 (UTC)
So you want Dene-Caucasian too? -- Liliana 22:28, 10 November 2013 (UTC)
If anyone wants to add it, I wouldn't mind. --WikiTiki89 22:52, 10 November 2013 (UTC)
This has the potential of going on forever. Here's my $.02:
(a) There are (more or less) consensus theories, and there are (more or less) non-consensus theories;
(b) The traditional role of a dictionary would be to go with the best (= closest to consensus) theories whenever possible; this would make Altaic unacceptable (as it would Amerind, another proposed superfamily that has next to nothing of substance in favor of it -- yet there are references, "cognate" sets, "etymologies", etc.).
(c) We might decide this is not the case for Wiktionary, since wiki-is-not-paper, we-have-plenty-of-space etc. Ivan suggests even older reconstructions should have a place here; in this case, we might even actually accept any proposed reconstructed form, any proposed hypothesis (including Altaic and Amerind), simply adding comments to it ("this hypothesis is rejected by most scholars", etc.; perhaps some handy templates could be created).
(d) However, this would complicate matters enormously for the casual reader, since in principle Etymology sections could make reference to all these reconstructed forms, obsolete and dernier cri, consensus or far from it. Now, maybe an Etymological Dictionary should be a project in itself, independent from Wiktionary, in which all details of all published theories could be taken into account. But as long as we remain within Wiktionary, it seems we should try to limit etymological information to some extent.
(e) Which is why in the end I prefer to stick to the traditional practice: only things that are as close to consensus as possible. Therefore, no Altaic, no Amerind, no Nostratic, etc.; at least not in the Etymology sections. (One could, of course, create independent Appendix pages for all proposed reconstructions at all levels, perhaps with indexes in the Appendix to facilitate navigation; but only the (near-)consensus forms should appear in Etymology sections.) --Pereru (talk) 22:58, 10 November 2013 (UTC)
Another facet of this is the reference in some Japanese and Korean etymologies to comparison with Turkic and other Altaic languages (for instance, the one added in diff). I get the impression that the Altaic theories are more accepted in references for those languages, probably due to the lack of anything better for inherited terms in language isolates such as these. I'm not sure how easy it would be to even find all of the entries with these, let alone to convert them. Personally, I wouldn't mind reference in etymologies (with proper cautions/qualifiers) to a few of the more linguistically rigorous minority theories, but we have to be selective. The sheer volume of mutually contradictory speculative theories for isolates such as Basque and Sumerian (and even for some Indo-European and Afro-Asiatic languages) would make a horrible mess out of quite a few entries. Chuck Entz (talk) 23:59, 10 November 2013 (UTC)
I don't see a problem in individual theories and etymologies being mutually contradictory. There is functionally no difference between 1) the unknown origin of a word in a language belonging to a widely accepted language family, or a reconstruction established within a generally accepted protolanguage, for which there are often half a dozen proposed explanations ranging from probable to speculative, and 2) the unknown etymology of a word in a language isolate, or a reconstruction established within a protolanguage accepted only by a minority, such as Altaic, which is inherently moderately speculative. Really, why shouldn't someone interested in Basque or Sumerian see a list of all of the proposed far-range etymologies, provided they are all clearly marked as speculative by their respective authors, and not generally endorsed? Perhaps published work should be the necessary notability filter? --Ivan Štambuk (talk) 05:39, 11 November 2013 (UTC)

How about this proposal:

  1. Altaic, Nostratic and other minority-held theories should only be created in the appendix namespace, as reconstructions in their respective protolanguage frameworks.
  2. Instead of the usual {{reconstructed}} template they would have {{reconstructed-minority}} which would clearly indicate that we're not dealing with a widely accepted theory.
  3. Language isolates (i.e. entries in the main namespace) and protolanguages (i.e. entries in the appendix namespace) covered by such theories should link to such appendices in their respective etymology sections only by means of a special template. There would be no lists of (potential) cognates; users would have to click on the link to the appendix page. That template would have wording reflecting a degree of uncertainty, such as Within the controversial Proto-Altaic theory, derived from *X, where *X would link to the appendix page. Using such a template would ensure that editors adding such etymologies don't overzealously emphasize the genetic relationship.
  4. For such theories, only reconstructions occurring in published sources are allowed and references are mandatory.

Later if the community decided that e.g. listing of cognates in the main namespace would be appropriate, adding them would be a matter of copy/paste. --Ivan Štambuk (talk) 05:39, 11 November 2013 (UTC)
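Point 3 of the proposal above (a special template with hedged wording, no cognate list, linking only to the appendix) plus the mandatory-reference rule of point 4 can be sketched as a tiny renderer. This is Python rather than the Lua a real template module would use, and the function name, the `Appendix:<protolanguage>/<form>` page layout and the `<ref>` output are assumptions for illustration, not an existing template:

```python
from __future__ import annotations


def minority_etymology(proto_lang: str, form: str, refs: list[str]) -> str:
    """Render hedged etymology wording for a minority-held theory.

    Hypothetical helper modelled on the proposal: reconstructions live only
    in the Appendix namespace, references are mandatory, and no cognate
    list is produced (readers follow the link instead).
    """
    if not refs:
        # Point 4: minority-theory reconstructions need published sources.
        raise ValueError("a minority-theory etymology needs at least one reference")
    # Assumed page layout: Appendix:<protolanguage>/<form without the asterisk>.
    link = f"[[Appendix:{proto_lang}/{form}|*{form}]]"
    citations = "".join(f"<ref>{ref}</ref>" for ref in refs)
    return (
        f"Within the controversial {proto_lang} theory, "
        f"derived from {link}.{citations}"
    )


print(minority_etymology("Proto-Altaic", "tùru", ["Starostin, Dybo and Mudrak"]))
```

Raising an error instead of emitting an unsourced link is one way to enforce point 4 mechanically; a real template might instead add a cleanup category.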

Altaicist theories should be allowed, for example in 두루미, although the actual reconstructions probably shouldn't, due to crudeness. Wyang (talk) 05:58, 11 November 2013 (UTC)

Well for the "crane" word Starostin-Dybo-Mudrak's dictionary reconstructs PA *tùru ( ~ *ti̯ùro). That doesn't seem too crude to me, as opposed to e.g. Nostratic etymologies which are full of cover symbols (V for vowel and similar). If there are multiple incompatible reconstructions, there is no problem in listing them all in the page name. --Ivan Štambuk (talk) 07:04, 11 November 2013 (UTC)
So what was the revert about? Any good reason for that? -- Liliana 07:27, 12 November 2013 (UTC)
It seems that most of the interested people support having Altaic reconstructions in some form. Your removal of tut and tut-pro codes from Module:languages and Module:families has also caused script errors in some instances where they were used through {{term}} and {{etym}}. --Ivan Štambuk (talk) 07:43, 12 November 2013 (UTC)
But what does the existence of Proto-Altaic have to do with making Turkic, Mongolic etc. subcategories of the Category:Altaic languages? If we have this, what stops people from making Category:Indo-European languages a subcategory of Category:Nostratic languages? -- Liliana 08:17, 12 November 2013 (UTC)
Isn't it required for automatic categorization to work properly? I think it's best that the treatment of proposed but not generally accepted families be handled on a case-by-case basis. We can ignore issues connected with Dene-Caucasian, Nostratic etc. until someone sufficiently knowledgeable starts adding their etymologies/protolanguage appendices. Each of those proposed families has a different set of issues. --Ivan Štambuk (talk) 09:33, 12 November 2013 (UTC)
I think it's only needed so that categories like Category:Terms derived from Altaic languages get proper subcategories depending on the members of the family. Arguably, if we only let {{etyl|tut}} categorize directly in there, we can leave everything as is and have Category:Altaic languages with no language-family subcategories. -- Liliana 13:02, 12 November 2013 (UTC)
I agree that Japonic, etc, shouldn't be in Category:Altaic languages. We shouldn't categorize language families into hypothetical superfamilies the existence of which is disputed / rejected by many. Our categorization system is slipping far enough out of sync with modern scholarship as it is (people have already raised issues with our categorization of things as Finno-Ugric vs Uralic). - -sche (discuss) 17:39, 13 November 2013 (UTC)
If it doesn't break anything, feel free to remove it. Also, what is really the problem with having a language categorized into controversial, or multiple and incompatible Stammbaums? It's not like putting a language into a category means "the official position of Wiktionary is that Japanese is an Altaic language". It's just meant to look things up. Slapping a banner that says "This category represents a language family that is not generally endorsed" should suffice IMHO. --Ivan Štambuk (talk) 06:15, 14 November 2013 (UTC)
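The categorization question in this thread (whether Turkic, Mongolic, Japonic etc. should appear under Category:Altaic languages) comes down to which parent each family record declares. A minimal Python sketch with hypothetical data; the codes are ISO 639-5 family codes, but the table layout is an assumption and not the real Module:families:

```python
from __future__ import annotations

# Hypothetical family records: code -> (name, parent family code or None).
WITH_ALTAIC = {
    "tut": ("Altaic", None),
    "trk": ("Turkic", "tut"),
    "xgn": ("Mongolic", "tut"),
    "jpx": ("Japonic", "tut"),
}

# Same records with the disputed superfamily links removed. The "tut" code
# itself stays, so templates such as {{etyl|tut}} can still look it up.
WITHOUT_ALTAIC = {code: (name, None) for code, (name, _parent) in WITH_ALTAIC.items()}


def family_subcategories(parent: str, families: dict[str, tuple[str, str | None]]) -> list[str]:
    """List the family categories that would be shown under `parent`."""
    return sorted(
        f"Category:{name} languages"
        for name, declared_parent in families.values()
        if declared_parent == parent
    )


print(family_subcategories("tut", WITH_ALTAIC))
print(family_subcategories("tut", WITHOUT_ALTAIC))
```

Keeping the "tut" record while dropping the parent links mirrors Liliana's suggestion: the Altaic category still exists for direct etymology categorization, but it no longer collects the member families as subcategories.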

You can lead a bot to the river, but you can't feed it

Trying to feed the bot recently, I got an error message saying I wasn't allowed to edit another user's pages. Any way to make an exception? --ElisaVan (talk) 11:29, 11 November 2013 (UTC)

The filter in question came up recently in the Grease Pit (quod vide). Perhaps User talk:BuchmeierBot/FeedMe could serve as a staging area, with the understanding that the bot owner or others must review the accuracy of anything posted there before letting the bot create the entries. I see you've already thought of that. :) - -sche (discuss) 06:51, 12 November 2013 (UTC)

Toggle (HTML/CSS/js) question

I wanted to inquire whether there's a way to modify the toggle link text (and its styling, e.g., make it superscript) specifically on en.wikt. Here's what my creation looks like right now: {{lv-pron}}. Instead of "expand" (or "collapse"), I'd like the link/button (or w/e it should be called) to say "references." I'm using the built-in toggle functionality that ships with all MediaWiki installations (Wikipedia's toggle templates, for example, didn't work here). Is there any way to modify the text/style of the toggle link/button? In my tests it seemed to be immune to a parent element (another div wrapped around it) having, e.g., font-size:small. Neitrāls vārds (talk) 21:51, 12 November 2013 (UTC)

See mw:ResourceLoader/Default_modules#jquery.makeCollapsible. You can use the data-collapsetext and data-expandtext attributes to change the labels, or use customtoggles to make the toggle look however you'd like. --Yair rand (talk) 22:35, 12 November 2013 (UTC)
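Following the jquery.makeCollapsible documentation linked above, a collapsible element carries the `mw-collapsible` class (plus `mw-collapsed` to start closed), and the `data-expandtext` / `data-collapsetext` attributes set the toggle labels. A small Python sketch that builds such markup; the helper name and the default labels are illustrative assumptions:

```python
from html import escape


def collapsible(content_html: str, expand_text: str = "audio", collapse_text: str = "hide") -> str:
    """Wrap already-rendered HTML in a makeCollapsible container.

    The labels are attribute values, so they are HTML-escaped; the content
    is assumed to be trusted, already-escaped HTML.
    """
    return (
        f'<div class="mw-collapsible mw-collapsed" '
        f'data-expandtext="{escape(expand_text)}" '
        f'data-collapsetext="{escape(collapse_text)}">'
        f'<div class="mw-collapsible-content">{content_html}</div>'
        "</div>"
    )


print(collapsible('<a href="https://example.org/sample.ogg">listen</a>'))
```

For the styling half of the question, the toggle that makeCollapsible generates is, as far as the documentation describes it, its own element with the `mw-collapsible-toggle` class, so size or superscript rules would need to target that class rather than a wrapping div; treat the class name as something to verify against the running MediaWiki version.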
Thanks! Neitrāls vārds (talk) 16:23, 13 November 2013 (UTC)
Don’t we already have about five different ways of including footnotes? Heck, this is putting three different interfaces into a single short bullet point: link for “IPA”, superscripted link in parentheses for “key”, and now add a superscripted expand control with a dingbat character in square brackets for “references” (which is apparently meant to reveal a link decorated with an icon leading directly to a raw audio file?).
The type design is also too busy. Professional designers avoid superscript text containing whole words, punctuation, or extra dingbat symbols for good reason.
This looks like far too much visual clutter and cognitive overhead for a reader. Why can’t we just link something with a link any more? Michael Z. 2013-11-13 16:42 z
Personally, I can't stand sliding and would prefer if it expanded instantly. --WikiTiki89 16:59, 13 November 2013 (UTC)
Do we need to expand a two-word link at all?
More importantly, how does automatically-generated machine audio constitute a reference for pronunciation of human language? Michael Z. 2013-11-13 17:23 z
By the way, I don't believe there was much consensus for including synthesized speech. --WikiTiki89 17:40, 13 November 2013 (UTC)
Bingo. Just include "(speech synthesis)" in parentheses after the IPA, unhidden, if it is to be included at all. But like Michael, I'm not sure auto-generated speech can function as a reference... - -sche (discuss) 17:29, 13 November 2013 (UTC)
A reference is an authority. If we refer to it, we should formally cite it, not just throw an un-annotated link to it in the body of an entry.
But these Google pronunciations are not suitable references. They are synthesized based on pronunciations spidered from the web. I bet the main sources are Wikipedia and Wiktionary, as well as other dictionaries, but the synthesis is of unknown reliability, and is not an authority. For goodness’ sake, please don’t enter IPA transcriptions based on Google’s synthesized pronunciations!
If we link to these as additional resources, we should do so the same way as we do to other external links. Michael Z. 2013-11-13 23:51 z

Yes, the "references" label did look kind of awkward, so I changed it to "audio" yesterday. On the subject of adding it as any other external reference: the way I want to make entries (well "researched"), I'm already kind of pushing it with how many references I have. For example, in bēbis I went up to 3: the 1st ref'ing that it is indeed a borrowing from English, the 2nd the first attested use ever, in the non-standard form "bebijs," and the 3rd the first attested use of the modern form "bēbis" in the rather authoritative multi-volume konversācijas vārdnīca. That's kind of a lot for a dictionary, yet all of them add value to the entry. Labeling the synth. speech "references" might have been misleading; I kind of intend it to be just an alternative, because after all there are as many IPA styles as there are users. For example, I think that AIDS#Latvian should be ['aits]; another person could go for ['aids], yet another ['ajds]... So, just an alternative. The synthesizer is just using a map of characters to sounds; it (unfortunately) is not pulling IPA from Wikt. (I hope that in future it could.) With "trickier" words it can be off (e.g., it's not usable on čoms) and should probably be used only by native speakers "with discretion." Not saying everyone should use it. Some people choose to use {{usex}}, others choose not to; I don't think that slight variations in layout like this are detrimental to the dictionary.

Personally I am more of a user than an editor on Wikt. (those detailed Latvian etyms just keep me coming back, also, English idioms...) And I think it's convenient from a user's perspective, for example, Hungarian has a rather similar sound inventory to Latvian (there are differences ofc.) But even with IPA I feel overwhelmed by words like agyhártyagyulladás, I mean, I know it would be aģhārķaģulladāš (respelled with Latvian letters) but it's actually quite an amount of mental work to read the IPA and put it together in your head. Stuff like this agyhártyagyulladás would be a boon to lazy people like me. Obv. I'd know it's a robot and that it might not be 100% correct but at that point I couldn't care less, I want to hear it and I want to hear it now, lol. I'm not saying HU editors should be adding that type of link to their entries although if they were I def wouldn't mind either. Neitrāls vārds (talk) 11:54, 14 November 2013 (UTC)

P.S. I agree that the sliding is slightly annoying but I didn't see how to disable it in the toggle documentation, let me know if you know how. Also, the IPA template shouldn't have a separate (key) link at all, imo. IPA should be linking to what (key) is linking to right now. Right now IPA links to a page with the academic names of all of the sounds which might not be the most relevant page to link to with every mention of IPA. Neitrāls vārds (talk) 11:54, 14 November 2013 (UTC)

If you yourself find it convenient then go ahead and enter the word in Google translate and click play. But as far as Wiktionary is concerned, bad pronunciations are worse than no pronunciations. --WikiTiki89 15:22, 14 November 2013 (UTC)

Displaying adjectives in the adverb headers?

Is it a good idea to display adjectives in the adverb headwords? E.g. French heureusement (adjective heureux), Russian сча́стливо or счастли́во (adjective счастли́вый). I'm thinking of changing the Russian Module:ru-headword a bit to allow an additional optional parameter; I'm just using a French example to make it clearer what I need. If an adverb already has a comparative form in the header, what should the order be?

Is a display like this acceptable? For example for глубоко́:

глубоко́ (glubokó) (comparative глу́бже, adjective глубо́кий)

OR should it be

глубоко́ (glubokó) (adjective глубо́кий, comparative глу́бже)

Perhaps the 2nd example is more confusing, it may not be clear if comparative form refers to the adverb, to the adjective or both.

--Anatoli (обсудить/вклад) 23:36, 13 November 2013 (UTC)
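The first ordering above (comparative first, then the related adjective) can be produced by a small formatter, which also keeps the comparative next to the adverb it belongs to. A Python sketch only; the parameter names `comparative` and `adjective` are hypothetical and not taken from Module:ru-headword:

```python
from __future__ import annotations


def adverb_headword(head: str, translit: str,
                    comparative: str | None = None,
                    adjective: str | None = None) -> str:
    """Build an adverb headword line in the order of the first proposal."""
    extras = []
    # The comparative comes first, so it reads as a form of the adverb itself.
    if comparative:
        extras.append(f"comparative {comparative}")
    if adjective:
        extras.append(f"adjective {adjective}")
    line = f"{head} ({translit})"
    if extras:
        line += f" ({', '.join(extras)})"
    return line


print(adverb_headword("глубоко́", "glubokó", comparative="глу́бже", adjective="глубо́кий"))
```

Both extras are optional, so adverbs without a corresponding adjective keep their current headword line unchanged.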

I think that is a good idea and the first version is better. --WikiTiki89 00:07, 14 November 2013 (UTC)
OK, thanks. I'll wait for more feedback and will change it later. BTW, I think Russian comparatives should be unified under adverbs, except for a few adjectival comparatives like худший, лучший, больший, меньший, старший, младший (perhaps that's all). The rest of the comparatives are all adverbs and are used differently grammatically, so there is no need to duplicate headers as at лучше, хуже, etc., which have both Adjective and Adverb headers. In other words, the comparative forms of most adjectives are adverbials (if they don't use the additional words более or менее). --Anatoli (обсудить/вклад) 00:20, 14 November 2013 (UTC)
Yeah, I agree that лучше and хуже are not adjectives. --WikiTiki89 00:45, 14 November 2013 (UTC)
The derived terms header isn't good enough? DCDuring TALK 01:14, 14 November 2013 (UTC)
It's good enough, but this is an alternative and a shortcut for adverbs derived from adjectives (usually not the other way around). Cf. French [[heureusement]], which is derived from [[heureux]] (not the other way around). It's similar in Russian for many adverbs. (I'm not planning to change French headers as well, but it would make sense, IMHO.) I highlighted "alternative": one can always do it the usual, longer way ("====Derived terms===="), and not all adverbs are derived from adjectives or have a corresponding adjective. --Anatoli (обсудить/вклад) 01:32, 14 November 2013 (UTC)
So, then the adjective will appear in the "Etymology", right? And it will appear under "Related terms" as well, yes? So why confuse readers with other parts of speech in the headword line, which is meant to be a shortened form of the inflection section, with transcription information? I don't think that mixing other parts of speech in the headword line is a good idea. --EncycloPetey (talk) 01:44, 14 November 2013 (UTC)
Are you sure this is a good idea? If it's good for de-adjectival adverbs, then why not for diminutives (cf. Dutch nouns), negatives, nicknames, and especially more-frequently-used synonyms, as well as other possible targets for shortcuts that someone more creative than I could dream up?
I know that some view English de-adjectival adverbs ending in -ly as inflections, but this seems a minority view, not widely accepted even among linguists, let alone lexicographers. Is the situation different among Russian lexicographers and linguists? DCDuring TALK 01:49, 14 November 2013 (UTC)
Many Russian adverbs ending in -о/-е are short neuter adjective forms: честный (short form/neuter) -> честно. There is no problem with this view, AFAIK, but I haven't done any research yet. Many Polish noun headwords include diminutives as well, e.g. ryba#Polish. I don't see a problem with that. --Anatoli (обсудить/вклад) 02:12, 14 November 2013 (UTC)
They are not short neuter adjective forms, but independent words formed from an adjectival stem with the suffix -o/e. They should in fact be formatted under different etymologies, because these same-spelled -o/e are two different pairs of suffixes. --Ivan Štambuk (talk) 13:17, 15 November 2013 (UTC)

@EncycloPetey: With this approach the term won't appear in the "Etymology" or "Related terms". As adjectives and adverbs have the same root, the etymology will not be duplicated and may appear only in the adjective entry. "Mixing other parts of speech" is a stronger argument against this approach (as is DCDuring's point). What about changing the header to read, e.g., "from adjective глубо́кий"? --Anatoli (обсудить/вклад) 02:27, 14 November 2013 (UTC)

Numerals redux

Is there any one place where our policy on when to describe something as a "number"/"cardinal number"/"cardinal numeral"/etc is comprehensively documented? A user recently changed quite a few entries from those things to "numeral", but I recall that there was actually a logic behind our use of the different terms. - -sche (discuss) 02:21, 14 November 2013 (UTC)

I stopped following the arguments when I took my extended wikibreak near the end of 2010, but at that time, no community consensus had been reached. However, three years have passed and some decision may now exist that I do not know about. --EncycloPetey (talk) 04:49, 14 November 2013 (UTC)
No policy, there was a vote but it failed. De facto policy seems to be to prefer numeral though. Maybe it's time to list pros and cons of both terms and restart the vote? --Ivan Štambuk (talk) 06:06, 14 November 2013 (UTC)

Vowel length needs to be marked in Greek

Hello. I'm fairly new to Wiktionary but not to English Wikipedia, where I've done a great deal of work editing historical linguistics articles, esp. on Indo-European (IE) languages (Ancient Greek, Proto-Greek, Old English, Gothic, Proto-Germanic, Latin, various Romance languages, Old Irish, Proto-Celtic, various Slavic languages, Proto-Slavic, Proto-Balto-Slavic, Tocharian, Sanskrit, numerous Proto-Indo-European articles, etc.).

In this case, a number of Wiktionary articles on words for "drink" in various IE languages included references to the Greek word pī́nō "I drink", but written with no length mark on the i, either in the original Greek or the English transcription.

Length marks are (correctly) noted in all other languages on Wiktionary AFAIK, including Latin, Old English, Old High German, etc., and need to be there in Greek as well. In this case the length of the i is extremely important in understanding the close cognacy of the Greek word with e.g. the pi- of Slavic piti (stemming from long pī- in Balto-Slavic) and also Albanian and probably modern Indic languages (with pīnā- and the like) but less so with words like Sanskrit pibati, Latin bibō, Old Irish ibid, where the short i in all of these is a reduplication vowel and is unrelated to the long ī of the other forms.

In this case, I corrected the problem in the transcription of this word in the various pages, and they all ended up reverted, with a link to Wiktionary:Ancient Greek romanization and pronunciation, which asserts that e.g.

In Classical polytonic, the length distinction of ᾰ ([a]) and ᾱ ([aː]) is not indicated usually in writing nor in transcription. However, if ᾱ needs to be transcribed, ā suffices.

This appears to represent a Classicist viewpoint, where length is often omitted because the original texts omitted such length marks and the exact form of words is secondary to their meanings and the broader significance of literary texts. This is a fine practice in a Classicist context. However, Wiktionary is not a Classicist work but fundamentally a linguistic work, particularly when discussing etymologies, and from a linguistic standpoint this suggestion not to include length marks is completely, 100% wrong. All historical linguistic works that discuss Ancient Greek, whether by itself or in the context of other Indo-European languages, include length marks consistently on all Greek words cited (likewise on all other words cited in all other languages where phonemic length exists). Note that in Greek this applies only to α ι υ because the other vowels have inherently distinct ways of notating short vs. long vowels (ε vs. η and ει, ο vs. ω and ου).

We need to follow this practice, also. This should not be very controversial; I am at least 99% positive that all linguists will agree with me, because all follow these conventions and understand their importance.

If for some reason or other people object on aesthetic grounds to including length marks, they still need to be included in the transcription.

Please also note that in a linguistic context, transcription is critical and often exceeds in importance the inclusion of the original text. This is contrary to the Classicist viewpoint, as expressed e.g. by Atelaes, who said:

Transliterations are never used here as a substitute for the original script, as they are in many other contexts. They are a pedagogic tool, used to help those who don't understand the original script, which they accompany. So, they are an approximation for the uninformed. A highly precise technical transliteration is unnecessary, and serves only to confuse those whom it is meant to help.

This viewpoint however is wrong from a linguistic standpoint. As a simple demonstration of this, consider the discussion of the etymology of the Old Irish word ibid "he drinks", which either does or should make references to Latin bibō and pōtō, Greek pī́nō, Armenian ǝmpǝm, Sanskrit pibati, Old Church Slavonic piti. (Various of the articles on these words, all meaning "drink", reference various of the other words, but not all articles reference all words.) There are at least four non-Latin scripts here (Greek, Armenian, Devanagari, Cyrillic) if we insist on representing the words in their original scripts. Requiring that all our readers understand all of these scripts and claiming that transcription is of secondary importance and only for "uninformed" readers will make everyone go utterly crazy. It's for this reason that Indo-European historical linguistics books often don't bother to include the original script at all, but only the transcription. An exception is often made for Greek in highly technical works because it's assumed that the highly technical readers of them will know Greek script, but layman introductions (e.g. Benjamin Fortson's "Indo-European Language and Culture: An Introduction", James Clackson's "Indo-European Linguistics: An Introduction", Philip Baldi "An Introduction to the Indo-European Languages" etc.) invariably transcribe Greek and often leave out the original script, as with the others. The intelligent layman reader of these books is the same type of reader paying attention to the etymology entries, and we should follow the same conventions used in these books. I'm not suggesting throwing away the original script (which is also extremely useful, for a slightly different but still important set of readers), but (a) the transcription is absolutely key and must be included whether or not the original-script text is present, and (b) vowel length must always be notated, both in the original and in transcription.

I suggest that the text on Wiktionary:Ancient Greek romanization and pronunciation should instead read

Although in Classical polytonic, the length distinction of ᾰ ([a]) and ᾱ ([aː]) is not normally indicated in writing, Greek words in Wiktionary should indicate vowel length both in writing and transcription, with the long vowel indicated as ᾱ, transcribed as ā.

Similarly for ι and υ.

Benwing (talk) 10:02, 15 November 2013 (UTC)
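To show what the proposed convention means in practice, here is a toy transliterator that carries a macron on α, ι, υ through to ā, ī, ū, while η and ω, which are always long, become ē and ō. It is not Module:grc-translit (which is Lua), and it simplifies heavily by assumption: one lowercase word at a time, pitch accents, smooth breathing, breves and iota subscript dropped, upsilon rendered as u, and rough breathing handled only on the word-initial vowel or ρ:

```python
import unicodedata

# Simplified letter table; the choices for υ (u) and χ (kh) are assumptions.
LETTERS = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "ē",
    "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "x",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "u",
    "φ": "ph", "χ": "kh", "ψ": "ps", "ω": "ō",
}
MACRON = "\u0304"  # combining macron: marks a long α, ι or υ
ROUGH = "\u0314"   # combining rough breathing (dasia)


def translit(word: str) -> str:
    """Transliterate one Ancient Greek word, keeping vowel length."""
    chars = unicodedata.normalize("NFD", word.lower())
    out = []
    for i, ch in enumerate(chars):
        if ch == "γ" and i + 1 < len(chars) and chars[i + 1] in "γκξχ":
            out.append("n")          # γ before γ, κ, ξ, χ is nasal
        elif ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch == MACRON:
            out.append(MACRON)       # recombines with the preceding Latin vowel under NFC
        elif ch == ROUGH:
            if out and out[-1] == "r":
                out.append("h")      # ῥ -> rh
            else:
                out.insert(0, "h")   # breathing on the initial vowel or diphthong
        elif unicodedata.combining(ch):
            continue                 # accents, smooth breathing, breve, iota subscript
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


print(translit("πῑ́νω"))
print(translit("ἄγκῡρα"))
```

Because the input is decomposed first, a precomposed ῑ and a plain ι followed by a combining macron give the same result.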

  • I agree with default scholarly transliteration for Greek, Arabic, Persian (macrons instead of circumflexes), Russian and so on. For Ancient Greek this could easily be remedied by fixing Module:grc-translit. Greek lengths, however, should only be displayed in the headword line, and not in a page name, as is the practice for Latin (and stripped when wikilinking with {{term}} and {{l}}). --Ivan Štambuk (talk) 12:45, 15 November 2013 (UTC)
  • Would these accent marks interfere with other Polytonic accent marks? --WikiTiki89 14:40, 15 November 2013 (UTC)
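Ivan's suggestion of keeping length marks in the displayed form but stripping them from link targets (as is done for Latin) amounts to removing the combining macron and breve after Unicode decomposition. A minimal Python sketch; `link_target` and `term_link` are hypothetical names, not the real module code behind {{term}} and {{l}}:

```python
import unicodedata

# Combining macron (long vowel) and combining breve (short vowel).
LENGTH_MARKS = {"\u0304", "\u0306"}


def link_target(form: str) -> str:
    """Page title to link to: the displayed form minus vowel-length marks."""
    decomposed = unicodedata.normalize("NFD", form)  # ῑ́ -> ι + macron + acute
    kept = "".join(ch for ch in decomposed if ch not in LENGTH_MARKS)
    return unicodedata.normalize("NFC", kept)        # ι + acute -> ί


def term_link(form: str) -> str:
    """Wikilink that shows the marked-up form but points at the plain page."""
    target = link_target(form)
    return f"[[{target}|{form}]]" if target != form else f"[[{form}]]"


print(term_link("πῑ́νω"))
```

Since only U+0304 and U+0306 are removed, the other polytonic marks (accents, breathings, iota subscript) survive in the link target, which bears on WikiTiki89's question about interference.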

I have addressed this issue and some closely related others a number of times, and so I imagine many will tire of reading this. I think that Benwing does well to raise the issue of context, of exactly what type of work we are and/or what we are trying to be. However, my mind produces a different answer than theirs, which might well explain our disagreement. I think that Wiktionary is supposed to be a general reference work. We are trying to give every possible reader every bit of information on a given word or phrase that they might want to know. This is, of course, impossible (to those who still cling to that lofty ideal, I apologize for shattering your hopes and dreams). Different readers have different needs, and including any one bit of information that one reader would like runs the risk of confusing or distracting another. That being said, impossibly lofty goals are often worth striving for nonetheless. When Benwing says that we are or should be a linguistic work (I can only assume they mean historical linguistics, based on their other comments), I must disagree. To be clear, I am glad that we have the capacity for more involved etymologies than most comparable reference works. I feel quite proud that "my" dictionary has full-blown entries for hypothesized terms in hypothesized languages. I think all of this is useful and interesting and I absolutely support its inclusion. However, I simply can't believe that this is our primary thrust. If I were forced to come up with a most common use scenario, I would think it would be more along the lines of: someone encounters a word or phrase while reading or speaking, and wants to know what it means. Knowing the history of a word can definitely help flesh out the answer to that question, but I feel it must be secondary to the definitions.
And so I will say, as I have said before many times, that I think our transliterations serve to bridge the gap for someone who does not know the script, and that highly nuanced and technical transliterations do a disservice to the majority of their users. Information of such a nature should be (and often is) covered in the pronunciation section of an entry, where we can document specific dialectal and temporal nuances. In spite of its admitted shortcomings (I think it would be well served by being rewritten in Lua, which I have long-term plans to do), I would hold out {{grc-cite}} as evidence that we can provide accurate and precise phonological information without burdening our transliterations with it. This raises another problem with highly technical transliterations, namely that "Ancient Greek" covers over two millennia. Greek is about as conservative a language as they come, but there were nonetheless a number of important sound changes over that period. The difference between long and short alphas, iotas, and upsilons only exists for the briefest of moments. For the majority of the time there is no such difference. Mind you, even the rough transliterations that we currently have run into that problem, as many of the vowels converge on /i/. But in my opinion, this simply serves to reinforce the need for as basic a transliteration as possible. One possibility which might serve as a compromise would be to have a different transliteration format for etymological contexts vs. others. -Atelaes λάλει ἐμοί 03:10, 16 November 2013 (UTC)

  • Is there a way to generate different transliterations in Lua based on user's preferences? --Ivan Štambuk (talk) 06:57, 16 November 2013 (UTC)
    No: all page contents are the same for everyone. You need JavaScript/Gadgets to customize content. Dakdada (talk) 12:47, 16 November 2013 (UTC)
    Could we have the transliteration modules/templates output two transliterations, one Classicist and one Scientific, and then use javascript to hide one or the other (per each user's preference)? That would be awfully complicated even if we could do it.
    Personally, my inclination is to indicate length, and to favour scientific transliterations generally; the only thing that gives me pause is Atelaes' point that "the difference between long and short alphas, iotas, and upsilons only exists for the briefest of moments". I don't think users who can't read Greek script are going to be confused by a long vowel mark any more than by any of the other accent marks we use... especially given that we do indicate vowel length in Latin (and Old English, etc.). - -sche (discuss) 15:07, 16 November 2013 (UTC)
That wouldn’t be too hard. Such a framework would also allow the reader to choose IPA/SAMPA/respelling for pronunciations, and their choice of standards for romanization in other languages.
A template can output two or more romanizations, perhaps in an HTML unordered list. Our default CSS can hide all but the first one. A simple JavaScript widget can introduce a control that toggles CSS visibility for the different list items.
Issues: What would an unobtrusive control look like? This should only be used with automated romanizations – too complicated to deal with missing items, keeping multiple romanizations updated in every etymology where a term appears. We should stick to romanization according to standards, offering readers reference information, not our own unpublishable wikibation. Michael Z. 2013-11-16 17:51 z
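Michael Z.'s outline above can be sketched as: the template emits every romanization as a list item, the site stylesheet shows only the first, and a reader-facing control appends a rule that shows a different scheme instead. Python stands in for the template side here; the `romanizations` class, the `data-scheme` attribute and the scheme names are made-up placeholders, and the JavaScript control itself is not shown:

```python
from __future__ import annotations

from html import escape

# Site-wide default: the list renders inline and only its first item is visible.
DEFAULT_CSS = (
    "ul.romanizations { display: inline; margin: 0; padding: 0; } "
    "ul.romanizations > li { display: inline; } "
    "ul.romanizations > li + li { display: none; }"
)


def romanization_list(romanizations: list[tuple[str, str]]) -> str:
    """Emit (scheme, text) pairs as one list; the first pair is the default."""
    items = "".join(
        f'<li data-scheme="{escape(scheme)}">{escape(text)}</li>'
        for scheme, text in romanizations
    )
    return f'<ul class="romanizations">{items}</ul>'


def preference_css(scheme: str) -> str:
    """Rule a toggle could append after DEFAULT_CSS to show `scheme` instead.

    The attribute selector (one class + one attribute) outranks the default
    `li + li` rule, so the chosen item becomes visible and the others hide.
    Assumes scheme names are plain identifiers that need no CSS escaping.
    """
    return (
        "ul.romanizations > li { display: none; } "
        f'ul.romanizations > li[data-scheme="{scheme}"] {{ display: inline; }}'
    )


print(romanization_list([("classicist", "pinō"), ("scientific", "pīnō")]))
```

Readers without the script or the preference still see the first (default) romanization, which matches Atelaes' point that the default is the decision that matters most.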
  • I think the claim that "the difference between long and short alphas, iotas, and upsilons only exists for the briefest of moments" is a red herring. The difference between long and short vowels did eventually disappear in Greek, but it was present during the Golden Age of Classical Greek literature (the period most people who read Ancient Greek are interested in) and it was present in all older stages of Greek, making it of crucial importance in etymologies. Macrons should be used to mark vowel length in Ancient Greek in all circumstances where they're used for Latin, Old English, etc.—and not just in transliterations, but also in the Greek script directly. Thus for example the headword line of ἄγκυρα should read ἄγκῡρα (but isn't it actually ἄγκῡρᾱ, despite what the pronunciation section says?), and in the etymology section of ibid#Old Irish, the Ancient Greek cognate should be listed as {{term|πῑ́νω|lang=grc}}, and Lua should know to link "πῑ́νω" to πίνω and to transliterate it pīnō. —Aɴɢʀ (talk) 17:03, 16 November 2013 (UTC)
    One option is to install a MW extension that would enable Lua/templates to fetch user's name, and then we could have per-user settings in e.g. Module:User:Xxx/conf for transliterations and other things of dispute. --Ivan Štambuk (talk) 18:09, 16 November 2013 (UTC)
    I think that is a very bad idea. We should keep as many preferences in Special:Preferences as possible. If there were an extension that could pull information from preferences, that would be a different story. --WikiTiki89 18:16, 16 November 2013 (UTC)
    Why should we? Generating all of the possible outputs in Lua and hiding unwanted ones in JavaScript sounds like... bad engineering. --Ivan Štambuk (talk) 18:25, 16 November 2013 (UTC)
    I think we're looking at this the wrong way. The crucial decision is what the vast majority of our users are going to see, what the default is. Quite frankly, making a user preference for different Ancient Greek transliterations is probably a waste of resources. Let's focus on a singular decision. -Atelaes λάλει ἐμοί 18:35, 16 November 2013 (UTC)
    I think such an extension, even if possible to implement, would defeat the extensive caching system built around here (template expansion results could no longer be cached, because they depend on the user who is viewing the page). Also, it would be possible to write code like if USERNAME == "Ivan Štambuk" then return "" else return "[[User:Ivan Štambuk]] smells" end. Now tell me, should Special:Whatlinkshere/User:Ivan Štambuk list a page which invokes such a module? So I would not expect anything like that installed here. Keφr 18:37, 16 November 2013 (UTC)
    It would've been a plain table loaded using mw.loadData, so no. But the caching concern is legitimate, if the caching framework doesn't take username into account. --Ivan Štambuk (talk) 19:00, 16 November 2013 (UTC)
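Aɴɢʀ's proposal amounts to deriving the link target from the display form by dropping only the vowel-length marks. Below is a minimal sketch in Python rather than Lua, assuming that length is marked solely with the combining macron (U+0304) and breve (U+0306) and that page titles are stored in NFC; `entry_name` is a hypothetical name, not an existing module function.

```python
import unicodedata

# Combining marks that only indicate vowel length (assumption: these are the
# only marks to drop when deriving an Ancient Greek entry name).
LENGTH_MARKS = {"\u0304", "\u0306"}  # combining macron, combining breve


def entry_name(display_form: str) -> str:
    """Strip vowel-length marks but keep accents, breathings and iota subscripts."""
    # Decompose so that precomposed letters such as ῑ become iota + macron.
    decomposed = unicodedata.normalize("NFD", display_form)
    kept = "".join(ch for ch in decomposed if ch not in LENGTH_MARKS)
    # Recompose so the result matches how page titles are normally stored.
    return unicodedata.normalize("NFC", kept)


print(entry_name("πῑ́νω"))   # πίνω
print(entry_name("ἄγκῡρᾱ"))  # ἄγκυρα
```

Because decomposition separates the macron from the other diacritics, ἄ keeps its breathing and accent while ῡ and ᾱ lose only their macrons; the transliteration (pīnō) would still be generated from the unstripped form.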

Block of User:Kephir

A few minutes before he replied to a post by Kephir (talkcontribs) in the #Vowel length needs to be marked in Greek discussion on this page, Ivan Štambuk (talkcontribs) blocked him for a day with the stated reason of "stupidity". I think this was totally inappropriate. If Kephir were an admin, it would be merely rude, but he's not, so he would have been unable to edit until tomorrow, for no reason I could see in his edits. I notice he tagged a vandalistic page for deletion yesterday, so maybe Ivan mistook the deleted content for something he had created; otherwise, it looks like an overreaction to a harmless joke that wasn't even meant as an insult. I unblocked him, since he's been doing a lot of good work that we really need right now, and sidelining him for a day hurts us more than it hurts him, not to mention being totally wrong. Chuck Entz (talk) 21:18, 16 November 2013 (UTC)

Ivan Štambuk smells is obviously nothing but an insult. Your excuse for unblocking is a load of BS. --Ivan Štambuk (talk) 21:48, 16 November 2013 (UTC)
So is your mom. -- Liliana 21:56, 16 November 2013 (UTC)
This community is starting to disgust me. --Ivan Štambuk (talk) 22:00, 16 November 2013 (UTC)
I think you're just overreacting. -- Liliana 22:06, 16 November 2013 (UTC)
The general rule in most wikis is that involved admins don't use their admin tools. And calling it "stupidity" is an insult that confuses the real reason for unblocking, and calling Chuck Entz's statements an "excuse" that is "a load of BS" is pretty insulting and unproductive.--Prosfilaes (talk) 22:09, 16 November 2013 (UTC)
Yes, Ivan Štambuk smells would indeed be an insult, if Kephir actually meant it. What he actually did was use it as a hypothetical example of how an insult could be hidden from its target. He may have thought he was being mischievous in choosing his example, but I don't think he meant to actually insult you. That aside, a single insult in a discussion shouldn't be grounds for blocking. If it were, we wouldn't have many people left around here.
Look, I'm not trying to demonize you here: you've been in heated arguments, emotions have been high, and you've no doubt felt at times like you're surrounded by an angry mob with torches and pitchforks. I can understand why you would feel the need to show you're standing up for yourself, but this kind of thing will only make it far worse. Please step back, take a deep breath, and try to see how it looks to everyone else. Chuck Entz (talk) 22:54, 16 November 2013 (UTC)
Are you everyone else? --Ivan Štambuk (talk) 09:01, 17 November 2013 (UTC)
I'm pretty sure that everyone is everyone else. (With the possible exception of Liliana and CodeCat. They say you're "overreacting", which implies that your reaction, while disproportionate, is to an actual thing; whereas as far as I can tell from your comments, you're reacting to a thing that did not occur. Kephir did not say or imply that you smell, he simply mentioned the utterance "[[User:Ivan Štambuk]] smells".) —RuakhTALK 06:52, 19 November 2013 (UTC)
How can you know what everyone else thinks? That "utterance" is a thinly-veiled ad hominem attack, esp. when you consider the edit summary behind it (I couldn't think of any other example..). You're justifying personal attacks and later lament how the quality of discourse is being degraded. --Ivan Štambuk (talk) 07:40, 19 November 2013 (UTC)
Ah, I hadn't seen the edit summary. O.K., that changes things. (To be clear: I'm really not sure what the edit-summary meant. The "example" it refers to could just as well have been the general concept of a hidden insult, as the specific insult chosen to demonstrate the concept. But I at least see how the edit-summary could lead you to view the edit as offensive.) So, I now agree with Liliana and CodeCat. —RuakhTALK 07:54, 19 November 2013 (UTC)
I think Ivan is overreacting but Liliana's reaction is adding fuel to the fire as well. —CodeCat 22:15, 16 November 2013 (UTC)
The block was clearly inappropriate. - -sche (discuss) 22:23, 16 November 2013 (UTC)
Yeah, next time editors' body smell and moms are mentioned I'll remember to take it as a harmless joke and not incivility. Must be a cultural thing. Might even use it myself. --Ivan Štambuk (talk) 22:48, 16 November 2013 (UTC)
Take some lessons from Vahag. -- Liliana 08:23, 17 November 2013 (UTC)
Sorry, I'm not gay. --Ivan Štambuk (talk) 08:59, 17 November 2013 (UTC)
Hey, I'm not gay! I'm tolerant. --Vahag (talk) 09:27, 17 November 2013 (UTC)
Ivan, would you mind explaining what being gay or not gay has to do with editing Wiktionary? That was definitely off the topic. --Hekaheka (talk) 15:13, 19 November 2013 (UTC)
@Hekaheka: Yes, it is off-topic (as is this entire subthread, which started off-topic; replying off-topic to an off-topic comment is therefore not really off-topic). --Ivan Štambuk (talk) 19:10, 19 November 2013 (UTC)
Oh, good. Since Vahag's tolerant, I guess we “east-coast liberal pinkos” can get off her case now. ~ Röbin Liönheart (talk) 21:46, 19 November 2013 (UTC)
  • The diff made by User:Kephir was an insult, even if a thinly veiled one, but I think we shouldn't block people for such behavior unless it becomes a pattern. If a block were to be considered at all, it should have been a mere slap on the wrist lasting 15 minutes, not one day; but again, better to avoid blocking. As far as policy goes, the block seems to violate WT:BLOCK, whose complete policy text is "The block tool should only be used to prevent edits that will, directly or indirectly, hinder or harm the progress of the English Wiktionary. It should not be used unless less drastic means of stopping these edits are, by the assessment of the blocking administrator, highly unlikely to succeed". (Should we remove all the non-policy stuff from the policy page to ensure that the policy page really only states the policy? Then we would not need the preamble on the policy page either.) --Dan Polansky (talk) 18:47, 19 November 2013 (UTC)
    I think Kephir was just trying to make the example more humorous. I don't think it was meant as an insult. --WikiTiki89 18:58, 19 November 2013 (UTC)
    An innuendo insult works and is set up in such a way that the reader or decoder cannot be completely certain that it was an insult, but there are some clues that it was. If you cannot be certain that it was not meant as an insult, and at the same time are not certain that it was one, the innuendo has succeeded. --Dan Polansky (talk) 19:03, 19 November 2013 (UTC)
    This wasn't innuendo though, it was literal but meant as a joke. --WikiTiki89 19:22, 19 November 2013 (UTC)
    What I find innuendo-like is that the insult was embedded in quotation marks as part of example code. The implied "that was just a joke" tone, if so perceived by several people, seems innuendo-like to me as well, along the lines of "I'll make an attack that can be explained away as a joke; let's see if you can retaliate". --Dan Polansky (talk) 19:35, 19 November 2013 (UTC)
    The reason it could be explained away as a joke is because it could be a joke. I didn't see any reason in the context of the conversation for Kephir to genuinely want to insult Ivan, and that is an even stronger reason why I think it was not intended as an insult. Also, keep in mind that we are supposed to "assume good faith". --WikiTiki89 19:41, 19 November 2013 (UTC)
    @Dan Polansky: But there is also the explanation section which states: Causing our editors distress by directly insulting them or by being continually impolite towards them. So block was per policy AFAICS. Perhaps a special exemption clause should be added for "valuable" editors... --Ivan Štambuk (talk) 19:05, 19 November 2013 (UTC)
    The only part of WT:BLOCK that is policy is the part in the "Policy" section there; the part in the "Explanation" section is not policy. If you read the page carefully from the beginning, you should be able to realize that. That said, I think the way the WT:BLOCK page is set up is unfortunate. --Dan Polansky (talk) 19:11, 19 November 2013 (UTC)
    @Dan Polansky: Well in that case it's left up to the reader to interpret what constitutes harming the progress of Wiktionary. This "deliberate vagueness" reminds me of how certain justice systems are set up so that cases are not decided on the merits of evidence but on the political will and inclination of the judges (in our case: bias of admins raising/lifting the blocks). I must agree with your suggestion to remove/relocate non-policy sections of the page which are set up in a very misleading manner. --Ivan Štambuk (talk) 19:41, 19 November 2013 (UTC)

Including multiple transliterations, from multiple systems, in entries

Past discussions: Wiktionary:Beer parlour/2008/August#Romanization_example, Talk:канадієць, Talk:горілка.

Following the 2008 BP discussion I linked to above, Michael created ====Transliterations==== sections in a few entries, detailing how their headwords were transliterated in 16 different systems, namely Linguistic, ALA-LC, ALA-LC simplified, BGN/PCGN, BGN/PCGN simplified, ISO 9, French phonetic, German phonetic, GOST 16876-71, GOST 1983, GOST 1986, Derzhstandart 1995, Ukrainian national, Ukrainian national simplified, Ukrainian passport 2004 and Ukrainian passport 2007. You can see examples here and here.

Last year, unaware of the BP discussion, I questioned on Talk:канадієць why so many transliterations were needed, and listed the entry on RFC; Anatoli replied that the standard translit seemed quite sufficient, and I considered the RFC resolved when User:CodeCat removed the transliterations section. Earlier today, I also removed the section from горілка. My reasoning is that it takes a lot of work to manually include 16 different transliteration systems on each page, it produces a lot of clutter, yet it is not, IMO, a lot of help. Furthermore, as long as the transliterations are entered manually, it's going to be a long time before they're present in all entries, and there's the potential for their contents to be wrong some of the time that they are present. I think it would make more sense to have a single Appendix:Cyrillic transliteration or Appendix:Ukrainian transliteration that explained all of the systems. Failing that, I expect that it would be preferable to have a template that worked with a module to supply all the transliterations automatically. (Either of those approaches would also allow new systems to be added.)

In response, Michael has pointed out that lots of things are manually entered, and missing or incorrect in some subset of entries — and he's pointed to me the 2008 BP discussion, which did show some support for the inclusion of all the various transliteration systems. So I thought I'd raise the issue and see how people feel, five years on... should we have these sections (and if so, should they be templatised/Luacised?)? Or would a central appendix be better? - -sche (discuss) 01:49, 17 November 2013 (UTC)
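The "template that worked with a module" option could be table-driven: one letter table per system, applied by a single function, so adding a system means adding a table. Below is a minimal sketch in Python (the real thing would be a Lua module). The tables are deliberately partial, covering only the letters of дуля, and their values reproduce the romanizations of that word quoted later in this discussion. `SYSTEMS`, `romanize` and `romanize_all` are illustrative names, and real systems also have positional rules (for example, word-initial forms) that this sketch ignores.

```python
import unicodedata

STRESS = "\u0301"  # combining acute, used as a stress mark

# Deliberately partial tables: only the letters of "дуля" are covered.
SYSTEMS = {
    "BGN/PCGN": {"д": "d", "у": "u", "л": "l", "я": "ya"},
    "ISO 9": {"д": "d", "у": "u", "л": "l", "я": "\u00e2"},
    "ALA-LC": {"д": "d", "у": "u", "л": "l", "я": "i\u0361a"},  # tie bar over "ia"
    "Scholarly": {"д": "d", "у": "u", "л": "l", "я": "ja"},
}


def romanize(word: str, system: str) -> str:
    """Romanize letter by letter, carrying any stress mark over to the Latin vowel."""
    table = SYSTEMS[system]
    out = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        # A letter missing from the partial table raises KeyError.
        out.append(ch if ch == STRESS else table[ch])
    # Recompose so that "u" + acute becomes a single "ú".
    return unicodedata.normalize("NFC", "".join(out))


def romanize_all(word: str) -> dict[str, str]:
    """One romanization per system, e.g. for a Transliterations section."""
    return {name: romanize(word, name) for name in SYSTEMS}


for name, form in romanize_all("ду\u0301ля").items():
    print(f"{form} ({name})")
```

This prints dúlya (BGN/PCGN), dúlâ (ISO 9), dúli͡a (ALA-LC) and dúlja (Scholarly). Because every form is generated from the Cyrillic spelling, none of them can drift out of sync with the headword the way manually entered lists can.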

If adopted, this should certainly be automated with Lua, where romanizations are deterministic. I think this would generally include sources in alphabetic scripts, and not in logographic ones.
But even manually, it would be no harder to maintain multiple standard romanizations appearing only once, in a table in an entry, than it has been to consistently maintain some of our ever-changing romanizations in an open set of mentions in etymologies, translations, usage notes and other internal links. Michael Z. 2013-11-17 02:47 z
Transliteration schemes are language-specific, so Appendix:Cyrillic transliteration doesn't make much sense. Per-language transliteration appendices should describe other transliteration systems as well, besides the one used by Wiktionary by default (some already do). A transliteration section seems like a good idea, particularly for languages where there are multiple standards in widespread use. --Ivan Štambuk (talk) 09:14, 17 November 2013 (UTC)
Having many transliteration variants is useful in that a user may find words by searching their transliterations, upon which he may stumble in different scholarly works, regardless of what scheme that work is using. Wiktionary's native search cannot find Lua-generated transliterations, but Google can. That being said, I think only one user-preferred transliteration should be shown. --Vahag (talk) 09:26, 17 November 2013 (UTC)
Do you mean having the other transliterations hidden on the page? As far as I know, Google purposely ignores hidden text to discourage keyword-stuffing spam. Michael Z. 2013-11-20 18:37 z
Yes, I mean they should be hidden on the page. Google does not ignore the forms generated by Template:hy-decl-noun-pos, which is hidden. --Vahag (talk) 19:01, 20 November 2013 (UTC)
So perhaps the form can be improved, but would you be in favour of a “Transliteration” heading, as in this revision?
Do we prefer a “Romanization” heading title, which implies transliterations and transcriptions into the Latin alphabet only? “Transliteration,” which technically might exclude non-alphabetic scripts and phonetic transcriptions, but could include Cyrillization, etc? “Script conversion” or something, which would be a vague catch-all? Michael Z. 2013-11-20 19:13 z
I do not think that non-alphabetic scripts are excluded by "transliterations". And phonetic transcriptions are supposed to be in the "Pronunciation" section. --WikiTiki89 19:35, 20 November 2013 (UTC)


Well, for logographic scripts or abjads romanization is often essentially transcription, since not all phonemes are represented in the source text. But yes, this kind of romanization is still distinct from pronunciation.
(Unfortunately, our community hasn’t agreed to such terms, so, e.g., we see Russian pronunciation where we expect transliteration. Pet peeve.) Michael Z. 2013-11-20 20:04 z
As I wrote here, I do not like the idea of introducing new headers. In my ideal Wiktionary, on the left sidebar under Tools you would have something like "switch to transliteration scheme X". You could change the transliteration scheme used on that page immediately. I don't know where the transliterated variants would be stored technically. --Vahag (talk) 19:58, 20 November 2013 (UTC)
It's hard to imagine a sidebar that has transliteration options for every language on the page. --WikiTiki89 20:01, 20 November 2013 (UTC)
I thought of such a scheme too. Disadvantages include the inability to see all transliterations of a word together. And what would the tools menu look like when you have a page with several languages, each supporting several transliteration schemes? Michael Z. 2013-11-20 20:04 z
Relevant questions:
  1. It would be easier to start with a simple list of transliterations on the page, and develop it into a more sophisticated interactive version.
  2. If there is any value to comparing different transliterations of a word, then they ought to be shown somewhere.
  3. If they are not shown on the term’s page, then where? An appendix? That would lead to extra work for editors, and for interested readers.
Vahagn, I understand the value of avoiding the clutter of unnecessary headers, but why shouldn’t we introduce headers for completely new categories of information? Does a list of transliterations belong somewhere else? Michael Z. 2013-11-23 17:23 z
Here is what I think:
  • There is absolutely no advantage to be gained from listing variant transliterations that differ only in the set of characters they employ.
  • The only useful variant transliterations are the ones that fundamentally differ in how they transliterate. For example for Hebrew, I can see three different kinds of transliterations being useful (I use בְּרֵאשִׁית as an example):
    1. A purely graphemic one: brʾšyt
    2. A graphemic + diacritic one: bərēšīṯ
    3. A phonetic one suitable for use in a non-linguistic English text: b'reishít
    (For some reason we have chosen the third option as our main transliteration system for Hebrew, which I heavily disagree with.)
    For Russian, the same breakdown would result in three nearly identical transliterations. For чёрного: černogo, čórnogo, chórnovo, of which I think the second is the most generally useful and so I don't think any others are needed.
  • Beyond that, I see no reason for people to be picky about which characters we chose to represent what. The only two concerns in choice of characters are how widely used and recognizable they are (in the context of the language in question) and how legible they are (for example, some have complained about the legibility of ʾ and ʿ).
--WikiTiki89 18:36, 23 November 2013 (UTC)
I think a list of transliterations in one place does have some value, albeit a small one. Does that merit a new header? I don't know, I'm on the fence. --Vahag (talk) 18:55, 23 November 2013 (UTC)
Two issues: If you need ISO 9 or ALA-LC, then you need ISO 9 or ALA-LC. Secondly, if I'm searching for chornovo, then the system is likely to find chórnovo but not čórnogo. If someone is looking at pre-transliterated text, or just wants to search without figuring out how to enter Russian, that could be very important.--Prosfilaes (talk) 20:10, 23 November 2013 (UTC)
That problem is very difficult to solve. We can't possibly account for infinitely many different transliterations such as chornogo, chyornovo, chernogo, tchiornovo, tchiornogo, chornavo, chornago, etc. --WikiTiki89 20:21, 23 November 2013 (UTC)
Well, we can’t account for an open list of spellings of a word throughout the history of English, either, but we document attested spellings. Similarly, we can list all of the standard romanizations, thereby showing 90% of romanizations, and 99% of the ones likely to be seen in recent and future publications, maps, the news, etc. Neither list of spellings or romanizations is actually infinite. Perhaps we should manually add attested non-standard romanizations too, if someone is interested in that kind of comprehensiveness. Michael Z. 2013-11-24 19:55 z
If our main concern is people searching for words by transliteration, then perhaps we should come up with a transliteration search feature, which would be much easier to implement and much more useful than a list of a bajillion transliteration variants. --WikiTiki89 20:41, 24 November 2013 (UTC)
I think that it increases accessibility, for example for people looking up foreign terms they come across. But entering transliterations is just a matter of dropping in a Luacized template: дуля#Ukrainian: dúlya (BGN/PCGN), dúlâ (ISO), dúli͡a (ALA–LC), dúlja (Scholarly), dúlia (UNGEGN), dúlja (ISO 1968). How would you implement transliteration search at all? Michael Z. 2013-11-26 03:43 z
The same thing but the opposite, and use the output as a search query. It's easier to convert many variants into one (the one being the actual Cyrillic spelling) than it is to convert one to many variants. --WikiTiki89 03:52, 26 November 2013 (UTC)
Ah, smart solution. How do we make a search query or link from a template or Lua module? I didn’t know we could output arbitrary links.
However, some transliterations are reversible, and others are not. This is a very useful application supporting the choice of reversible systems. Michael Z. 2013-11-26 17:01 z
You can reverse to all possible variants, which will probably not be too many. --WikiTiki89 17:07, 26 November 2013 (UTC)
Okay. Might be workable if the searcher knows the romanization system, or at least the language. However, who is going to see some foreign word in an English text and decide to go to Wiktionary’s “Search for romanized Ukrainian” page? Sounds like an obscure, complicated solution that has already been solved.
We already have two transliteration searches that work for all languages. They’re called “Search” and “Google.” But there are few transliterations to be found (although the search field doesn’t currently find Lua-generated text, right?). Michael Z. 2013-11-27 16:59 z
I oppose multiplication of transliterations. Didn't we vote to delete one recently? No transliteration is liked by everybody. There will always be people who hate, don't understand, refuse to use it, etc, no matter how good, standard, useful or common it is.
On the matter of searching, "koráblʹ" doesn't find [[корабль]] and it seems terms with stress marks are not searchable by terms without them - "кора́бль" -> "корабль" in the search window. I hope this can be fixed. --Anatoli (обсудить/вклад) 03:46, 25 November 2013 (UTC)
I don’t understand your argument about hating transliterations. We don’t omit attested terms from the dictionary because someone hates them. But if you make up a word that’s unattested and defies logic, we don’t include it. Standardized and made-up transliterations would be treated similarly. We could consider including attested transliterations too.
By “search window” do you mean MediaWiki’s built-in search? Searching for koráblʹ fails there, but everything else works. Searching for кора́бль returns the term as the first result, and also adds a “did you mean” link to it. My browser’s search on the page always finds the Cyrillics or transliteration, ignoring the diacritic on either search term or result. Googling either koráblʹ or кора́бль returns the Wiktionary entry as the first result. Michael Z. 2013-11-25 15:45 z
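WikiTiki89's suggestion, running a transliteration backwards, expanding ambiguous spellings into every possible Cyrillic variant and searching for the results, can be sketched in Python as follows. The query is stripped of stress accents first, which would cover Anatoli's koráblʹ example. The reverse table is a tiny hypothetical sample that merges several romanization habits, `PAGE_TITLES` stands in for the real list of page titles, and the function names are illustrative; this is not an existing MediaWiki feature.

```python
import unicodedata
from functools import lru_cache

# Hypothetical, partial reverse table (Latin -> possible Cyrillic letters),
# merging several romanization habits so one query can match many spellings.
REVERSE = {
    "a": ["а"], "b": ["б"], "k": ["к"], "l": ["л"], "o": ["о"], "r": ["р"],
    "i": ["и", "й"], "y": ["ы", "й"], "ya": ["я"], "ja": ["я"],
    "\u02b9": ["ь"], "'": ["ь"],  # modifier prime or apostrophe for the soft sign
}
MAX_KEY = max(len(key) for key in REVERSE)

# Stand-in for the set of existing page titles.
PAGE_TITLES = {"корабль", "край"}


def strip_stress(text: str) -> str:
    """Drop acute/grave stress marks, e.g. koráblʹ -> korablʹ."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if c not in {"\u0301", "\u0300"})
    return unicodedata.normalize("NFC", kept)


def cyrillic_candidates(query: str) -> set[str]:
    """Every Cyrillic spelling the reverse table allows for the query."""
    q = strip_stress(query.lower())

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> frozenset[str]:
        # All Cyrillic renderings of q[i:], built from every valid segmentation.
        if i == len(q):
            return frozenset({""})
        results = set()
        for length in range(1, MAX_KEY + 1):
            if i + length > len(q):
                break
            for letter in REVERSE.get(q[i:i + length], []):
                results.update(letter + rest for rest in from_pos(i + length))
        return frozenset(results)

    return set(from_pos(0))


def transliteration_search(query: str) -> list[str]:
    """Keep only the candidates that are actual page titles."""
    return sorted(cyrillic_candidates(query) & PAGE_TITLES)


print(transliteration_search("kor\u00e1bl\u02b9"))  # ['корабль']
print(transliteration_search("kray"))  # ['край']
```

Filtering against existing titles is what keeps this manageable: "kray" yields both край and the non-word краы, but only the former survives, which is the many-to-one direction WikiTiki89 describes.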

Language Visual Aid

It would be nice to have a visual aid like this, except with the languages in a smaller font and each translation in the same box as its language. However, I'm not sure how hard this will be to implement, as I know little of programming, nor do I know how much support there will be for this, so feedback will be appreciated. Sorry if this is a stupid idea. Воображение (talk) 02:48, 17 November 2013 (UTC)

We already have one. It's right here. --WikiTiki89 02:52, 17 November 2013 (UTC)
You misunderstood; I mean putting the translations in with the languages (Languages perhaps being italicised or in smaller font) and the words having hyperlinks. Воображение (talk) 03:04, 17 November 2013 (UTC)
I was just joking. But I think that would be too complicated and not very useful. --WikiTiki89 04:49, 17 November 2013 (UTC)
I think that it would be manageable, if we format it in a way similar to the picture dictionaries. And I don't see why it wouldn't be useful; comparison to other languages would be made much easier. Воображение (talk) 12:57, 17 November 2013 (UTC)
The technical aspect of this is easy (a simple template that contains the layout and takes parameters for each language). A few things to consider:
  • Where would such diagrams be located?
  • How will we add a translation for each language for each diagram?
  • The diagram you linked to is only Indo-European, how maintainable would it be if we added all of our language families?
As far as the usefulness, I may not be a great judge of it since I am a good visualizer and don't need visual aids. Maybe for others this would be useful. --WikiTiki89 16:18, 17 November 2013 (UTC)

Three bullets to counter your bullets.

  • These diagrams could be located under translation tabs like we have currently, or they could be images only to supplement the already-existing translations.
  • We don't have all the translations for each language in each translation list. We would have to omit the languages we don't have translations for (that's the part I was afraid would be hard to program).
  • If we take the tabs idea of my first bullet, I would propose to add subtabs to each translation tab. Perhaps all the language isolates would be grouped into one.

It seems feasible, and it would be useful to at least some, I think. Воображение (talk) 21:14, 17 November 2013 (UTC)
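Omitting the languages that have no translation is essentially tree pruning: keep a language only if it has a translation, and keep a family node only if something below it survives. Below is a minimal Python sketch, assuming the diagram's data is a family tree plus a mapping from language name to translation. The class and function names and the sample data (a three-branch Indo-European tree and three translations of "water") are illustrative, and drawing the actual diagram is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def prune(node: Node, translations: dict[str, str]) -> Optional[Node]:
    """Keep languages that have a translation, plus the branches leading to them."""
    kept = [p for p in (prune(c, translations) for c in node.children) if p is not None]
    if node.name in translations or kept:
        return Node(node.name, kept)
    return None  # nothing under this node has a translation


def render(node: Node, translations: dict[str, str], depth: int = 0) -> list[str]:
    """Indented outline with each language's translation next to its name."""
    label = node.name
    if node.name in translations:
        label += f": {translations[node.name]}"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, translations, depth + 1))
    return lines


family = Node("Indo-European", [
    Node("Germanic", [Node("English"), Node("German"), Node("Dutch")]),
    Node("Celtic", [Node("Irish"), Node("Welsh")]),
    Node("Hellenic", [Node("Greek")]),
])
water = {"English": "water", "German": "Wasser", "Irish": "uisce"}
print("\n".join(render(prune(family, water), water)))
```

Dutch and Welsh drop out, and the whole Hellenic branch disappears because none of its languages has a translation, so adding more families only grows the tree without filling the diagram with empty boxes.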

Are there any other opinions? Воображение (talk) 04:02, 20 November 2013 (UTC)
I personally like your idea, but I'm not convinced that most visitors would be interested in such an illustration. I guess (without relevant statistics at hand) that visitors come here to translate a specific English word into a specific foreign language or to understand a specific foreign word's meaning. Perhaps fewer people are familiar with (or interested in) multiple languages and such tree visualizations. It may appeal to a limited community of language enthusiasts (like myself), but this site's goal should be more 'global'. The other thing is that Wiktionary has already settled on its ways of presenting dictionary information, and it would be extremely hard for the average user to bring about such a major change to the core functionality. By the way, in my opinion, this whole project should be seriously reconsidered and redesigned, but most contributors have grown too used to it and have a kind of tunnel vision. It is no coincidence that Wiktionary's place in the dictionary business is not even comparable to Wikipedia's place in the encyclopedia business. Qorilla (talk) 21:11, 1 December 2013 (UTC)

Christmas Competition 2013

This is an advance announcement of this year's Christmas competition.

In each round I shall specify a topic or category. You have to supply up to 26 different words/terms, each one beginning with a different letter of the alphabet.

Entries must be in the form of a string of words/terms separated by commas, missing entries being specified by a space or a null. Thus if the topic were animals your entry might look like "aardvark,bison,cat,dog,,,,,,,,,,,,,,,,,,,,,,zebra". By default all entries must be in English unless I specify differently for some rounds (which I shall). An appropriate Wiktionary entry must exist for each word/term. Points will be awarded as follows. If only one person submitted "aardvark" that person would be awarded one point. If two people submitted "bison" they would be awarded half a point each. If three people submitted "cat" they would be awarded one third of a point each. And so on. At the end of each round I shall total up each person's score, probably as a decimal.
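The scoring rule above (a word earns 1/n points when n people submitted it) can be sketched in Python as follows. This is an illustration, not SemperBlotto's own procedure (he describes using a CSV file and Excel further down); the function names are hypothetical, words are compared case-insensitively, and whether a Wiktionary entry exists for each word is not checked.

```python
import string
from collections import Counter


def parse_entry(raw: str) -> dict[str, str]:
    """Split "aardvark,bison,...,zebra" (26 slots, blanks allowed) into letter -> word."""
    slots = raw.split(",")
    if len(slots) != 26:
        raise ValueError(f"expected 26 comma-separated slots, got {len(slots)}")
    entry = {}
    for letter, word in zip(string.ascii_lowercase, slots):
        word = word.strip()
        if not word:
            continue  # missing entry: scores nothing
        if word[0].lower() != letter:
            raise ValueError(f"{word!r} is in the {letter.upper()} slot")
        entry[letter] = word.lower()
    return entry


def score_round(submissions: dict[str, str]) -> dict[str, float]:
    """Each word scores 1/n, where n is how many players submitted that word."""
    parsed = {player: parse_entry(raw) for player, raw in submissions.items()}
    counts = Counter(word for entry in parsed.values() for word in entry.values())
    return {
        player: sum(1 / counts[word] for word in entry.values())
        for player, entry in parsed.items()
    }


def make_entry(*words: str) -> str:
    """Build an entry string from words, leaving unused letters blank."""
    by_letter = {w[0].lower(): w for w in words}
    return ",".join(by_letter.get(letter, "") for letter in string.ascii_lowercase)


scores = score_round({
    "A": make_entry("aardvark", "bison", "cat", "dog", "zebra"),
    "B": make_entry("bison", "cat", "zebra"),
    "C": make_entry("cat", "emu"),
})
print(scores)
```

Here "cat" was submitted by all three players, so each gets a third of a point for it, while "bison" and "zebra" are worth half a point each to A and B.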

Because more points are awarded if your entries are different from those of other people, you must not be able to see other people's entries until the end of each round. So, as in previous competitions, entry will be via email. Send your entries to Special:EmailUser/SemperBlottoComp.


I shall be having a trial run soon. This is to see how much work it is to do the calculations for each round, and therefore to see if a daily competition would be too much effort.

I shall create Wiktionary:Christmas Competition 2013 (with more detailed rules) after the trial run.

  • OK. Let's have a trial run using "animals" as the topic. Each entry should be a single, uncapitalised English word for a vertebrate that isn't a fish or a bird. Please send your list of animals before the cutoff time of 12:00 GMT on Thursday 21st November. The scores will not be used in the Christmas competition itself. Cheers. SemperBlotto (talk) 08:04, 19 November 2013 (UTC)
  • OK. Thanks to the few people who took the trouble to enter the trial run. It showed me that I must be very specific in my wording. The manipulation of the entries and calculation of scores was easier than I imagined. The results, if you are interested, were as follows.
Results (username, total, words submitted, and how each word scored; letters with no entry scored 0):
SemperBlotto (total 19): anteater, baboon, cat, dog, elephant, fox, giraffe, hyena, iguana, jackrabbit, koala, lemur, mouse, opossum, platypus, rabbit, squirrel, tortoise, wolf, zebra. Each scored 1 except elephant and koala (0.5 each); no entry for N, Q, U, V, X, Y.
MetaKnowledge (total 19.5): aardvark, bat, coati, deer, elephant, frog, goat, hyrax, ibex, koala, llama, mudkip, numbat, okapi, psittacosaur, qantasaur, riosaur, sauropod, tyrannosaur, wavy-gravy, zebu. Each scored 1 except elephant, ibex and koala (0.5 each); no entry for J, U, V, X, Y.
WikiTiki89 (total 24.5): apatosaurus, Burmese Python, chinchilla, duck, emu, French Bulldog, gopher tortoise, horned lizard, Indian elephant, jewelled gecko, killer whale, lynx, Mongol horse, nimble-footed mouse, oncologist, porker, quacking frog, Rhinolophus, Siberian Husky, Tibetan antelope, ularburong, viper, white whale, yak, zorro. Each scored 1 except French Bulldog (0.5); no entry for X.
Ungoliant MMDCCLXIV (total 26): ascarid, bitterling, chaetognath, dealfish, escolar, false killer whale, giant armadillo, hake, ichthyosaur, jewfish, kelpfish, lobopod, mrigal, nase, oilfish, placozoan, queen bee, rotifer, sockeye, toadfish, unicornfish, viperfish, walleye, X-ray tetra, yellowhammer, zander. Each scored 1.
Mzajac (total 25): Akita, beagle, corgi, Dalmatian, English Shepherd, French Bulldog, greyhound, Havanese, Irish Setter, jack russell, Karelian Bear Dog, Labrador, mastiff, Newfoundland, Otterhound, poodle, Queensland Heeler, Rottweiler, Samoyed, Tyrolean Hound, Utonagan, Vizsla, whippet, Xoloitzcuintle, Yorkipoo, Zuchon. Each scored 1 except French Bulldog and Labrador (0.5 each).
This, that and the other (total 18): argali, barasingha, cuscus, dugong, eland, farrow, grysbok, ibex, jaguar, korrigum, labrador, manatee, ornithorhynchid, pipistrelle, rakali, sambar, taruca, urial, whale. As tabulated, each scored 1 except ibex and labrador (0.5 each) and urial (0), with 1 point in the H column, which has no word; no entry for H, N, Q, V, X, Y, Z.

Note: some of the above entries are not actually valid (capitalised, two words etc) but I let them stand as it was only a trial.

Just in case you are wondering - I slotted each entry into a .csv file, adding a blank line after each person's entry. Then converted it into an Excel spreadsheet, added each word's fractional score into the cells of the blank line and got Excel to do the sums. I then converted the file into a simple wikitable as above.

I shall now create Wiktionary:Christmas Competition 2013 and add it here nearer the time. If any of you have ideas for topics/categories, feel free to add them here, or let me know elsewhere. SemperBlotto (talk) 15:00, 21 November 2013 (UTC)

Cool! But you miscalculated the scores. My final tally according to the rules is 6.0, robbing WikiTiki and MetaKnowledge of their rightful trial-run placement. Michael Z. 2013-11-21 16:55 z
I took the liberty of wikilinking the entries. Michael Z. 2013-11-21 17:00 z
Nice work, Semper. It was interesting that few people actually took heed of the need for single, uncapitalized words for each letter!... I hope that in the actual competition, the rules will be made more prominent and clear to avoid confusion. I'm looking forward to it. This, that and the other (talk) 00:56, 23 November 2013 (UTC)
  • Among other things, however, this proves that you (SB) need to look through the submissions carefully. *cough cough* oncologist, mudkip. —Μετάknowledgediscuss/deeds 01:40, 23 November 2013 (UTC)
    Birds and fish were the only types of animals excluded. An oncologist is neither a bird nor a fish. --WikiTiki89 01:51, 23 November 2013 (UTC)
  • Cool, I'll be playing again this year. --ElisaVan (talk) 12:33, 24 November 2013 (UTC)
    • I can't guarantee that Asturian swearwords will be a category though. SemperBlotto (talk) 12:36, 24 November 2013 (UTC)

Call for comments on draft trademark policy

If anything still calls on templates rather than Module:languages

FYI: over the past several months, many languages have been renamed, many lects have been deleted or merged into other lects, and many new languages have been encoded in Module:languages. Maintaining a single database of all the languages in the world is inherently difficult, and trying to maintain and sync two such databases is downright foolhardy. Hence it is unsurprising that few of the aforementioned changes have also been made to the language template side of things: Template:aqd and Template:nmj, for example, are redlinks, though Ampari Dogon and Ngombe have been added to Module:languages. Anything that calls on language code templates rather than on Module:languages is thus getting information that is several months out of date. This is not to even speak of Template:langrev.
If there is anything that still calls on the templates, I suggest it be updated to call on the module instead. If there is nothing that still calls on the templates, I suggest they be deleted.
- -sche (discuss) 06:01, 19 November 2013 (UTC)

I believe that the only things left using them are JavaScript entry-creation aids that add things like =={{subst:en}}== as the language header. --WikiTiki89 13:59, 19 November 2013 (UTC)
If you search for +Desc within [[user:msh210/format.js]], you'll find code that adds a language name; it uses {{subst:xx}} (where xx is the language code, grabbed from its use in a template). What's the new code? Is it {{subst:#invoke:language utilities|lookup_language|xx|names}}?​—msh210 (talk) 19:54, 26 November 2013 (UTC)
Oh, and I just noticed that you refer to {{langrev}} also. Is that, too, no longer to be used? I use it quite a few times in that JavaScript. What replaces it?​—msh210 (talk) 20:05, 26 November 2013 (UTC)
I don't know if we have a function for that currently, but we are working on simplifying Module:languages and Module:language utilities (which will probably become one module). And we can certainly make such a function if it is needed. --WikiTiki89 20:13, 26 November 2013 (UTC)
The two uses of langrev, as described in its documentation, were "to reverse-map language names to their codes" and "to determine a language's code" (from its name). Like Wikitiki, I don't know offhand whether or not there is already a function for calling Module:languages to do those things, but the data is already there. The module contains a list of all language codes, and sublists of all names each code/language goes by, so anyone wanting to know the canonical name of a code can look up the code and see what the first entry in the 'names=' field is. In reverse, anyone wanting to know the code of a particular named language can search for the name and see which code(s) it is associated with. It may be associated with several; for example, the module records that (although we call them all by different names so as to distinguish them) several languages have "Mari" as one of their names. If desired, one could check if the searched-for name was the canonical (first listed) name of any code. - -sche (discuss) 00:17, 27 November 2013 (UTC)
Looking up canonical names should be fairly easy, but it's not clear what should be done when the name is an alternative name that conflicts with the name of another language. A module could return a list of possibilities, but templates don't support lists (their biggest shortcoming I think, and an important reason to use modules) so it doesn't translate easily. —CodeCat 00:22, 27 November 2013 (UTC)
(In reply to Wikitiki's, -sche's, and CodeCat's respective replies.) Thanks for the info. I guess {{langrev}} stays for now, then, since it has no replacement yet. If it's easy to Luaify, great, but I haven't yet had the time to learn Lua (or module:languages).​—msh210 (talk) 01:16, 27 November 2013 (UTC)
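The lookups -sche describes can be sketched in a few lines. This is a hypothetical Python illustration of the data shape, not Lua code from Module:languages: each code maps to a list of names whose first entry is the canonical one. The sample entries are illustrative only and were not copied from the module; the function names `canonical_name` and `codes_for_name` are made up for the example.

```python
from typing import Dict, List, Optional

# Illustrative data only, shaped like the description above: each code maps to
# a list of names, and the first name is the canonical one. Not copied from
# Module:languages.
LANGUAGES: Dict[str, List[str]] = {
    "aqd": ["Ampari Dogon"],
    "nmj": ["Ngombe"],
    "mhr": ["Eastern Mari", "Mari"],
    "mrj": ["Western Mari", "Mari"],
}


def canonical_name(code: str) -> Optional[str]:
    """Forward lookup: the first listed name for a code, or None if unknown."""
    names = LANGUAGES.get(code)
    return names[0] if names else None


def codes_for_name(name: str, canonical_only: bool = False) -> List[str]:
    """Reverse lookup (the job {{langrev}} did): every code that uses `name`.

    A name such as "Mari" can belong to several codes, so this returns a list.
    With canonical_only=True, only codes whose first name matches are kept.
    """
    matches = []
    for code, names in LANGUAGES.items():
        if (names[0] == name) if canonical_only else (name in names):
            matches.append(code)
    return matches


print(canonical_name("aqd"))                        # Ampari Dogon
print(codes_for_name("Mari"))                       # ['mhr', 'mrj']
print(codes_for_name("Mari", canonical_only=True))  # []
```

Because templates cannot receive a list, a template-facing wrapper would still have to pick one result or signal the ambiguity. That is the open question CodeCat raises.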

Proto-Altaic vote

The vote has been started, so if it interests you, please vote so that this contentious matter can be decided. --Ivan Štambuk (talk) 07:44, 19 November 2013 (UTC)

Places where WT:LANGTREAT and Module:languages disagree on the treatment of lects

Module:languages and its new subpages include(s) codes from the individual members of several language/dialect groups which WT:LANGTREAT says, without citing any discussions, should be merged. That means either WT:LANGTREAT needs to be updated to note that the individual varieties are allowed, or their codes need to be merged in the module. To this end, I have begun discussions [[here]] of each group of lects, namely the Azeri dialects, Aymara dialects, Baluchi lects, Bikol lects, Dari and friends, Gondi lects, Kanuri dialects, Kongo varieties, Oromo dialects and Uzbek dialects. (RFM, you will recall, has a history of being used for discussions of merging lects and their codes.)
I have also started [[a discussion]] of whether or not to merge khk (Khalkha, aka Mongolia Mongolian) and mvf (Peripheral Mongolian) into mn (Mongolian).
Lastly, the merging of the Mari and Buryat languages (not into each other! lol) is underway [[here]] following this discussion. - -sche (discuss) 15:07, 20 November 2013 (UTC)

OUP words of the year and us

Here they are:

  • selfie / selfy
  • bedroom tax: (in the UK) a reduction in the amount of housing benefit paid to a claimant if the property they are renting is judged to have more bedrooms than is necessary for the number of the people in the household.
  • binge-watch: to watch multiple episodes of a television programme in rapid succession, typically by means of DVDs or digital streaming.
  • bitcoin: a digital currency in which transactions can be performed without the need for a central bank.
  • olinguito: a small furry mammal found in mountain forests in Colombia and Ecuador, the smallest member of the raccoon family.
  • schmeat: a form of meat produced synthetically from biological tissue.
  • showrooming: the practice of visiting a shop or shops in order to examine a product before buying it online at a lower price.
  • twerk: dance to popular music in a sexually provocative manner involving thrusting hip movements and a low, squatting stance.

We did not beat them to the punch on three of the eight. Shouldn't we have? DCDuring TALK 16:18, 20 November 2013 (UTC)

  • Well, some of those are less than three years old, so they might not get past our CfI. I have often wondered which is greater: the number of English words that we have and they don't, or the converse. SemperBlotto (talk) 16:23, 20 November 2013 (UTC)
    That might be a problem rather than something we want.
    When I started to do such a comparison with Webster's 3rd International print edition, we lacked about as much as they did, which also implies approximately equal size.
    I am distressed that we don't seem to have the terms redlinked, as from requested entries, nor did someone try to enter them, apparently. We seem to have lost the attention of potential contributors of trendy words. DCDuring TALK 16:29, 20 November 2013 (UTC)
Binge-watch looks like two words linked with a hyphen, and not idiomatic. Mglovesfun (talk) 16:32, 20 November 2013 (UTC)
Bedroom tax is the popular name for the spare room subsidy. The latter probably isn't idiomatic but needs to be in the definition somewhere. The Wikipedia article is w:Welfare Reform Act 2012. Mglovesfun (talk) 16:38, 20 November 2013 (UTC)
To be honest, I haven't heard of any of those red-linked words and I've never seen selfie spelled selfy. It could be that they just aren't "trendy" enough for people to know them and add them. --WikiTiki89 16:43, 20 November 2013 (UTC)
@MG: But the scope of what is watched is limited to serial entertainment, possibly even excluding sports. Can one binge-watch youtubes? Can one binge-watch live entertainment or photos or drying paint?
@SB: Its use goes back at least to 2006.
@SB: Schmeat does not seem to be citable over more than a year.
@SB: Bedroom tax is citable and seems quite as idiomatic as poll tax and more than income tax or sales tax. DCDuring TALK 16:50, 20 November 2013 (UTC)
@Wikitiki: We aren't trying to merely document our idiolects, we're working on the language as a whole, I thought. As such we need newbies to contribute such terms unless we are willing to let ourselves be smoked on coverage of current terms by the likes of OUP. DCDuring TALK 16:50, 20 November 2013 (UTC)
My point was that you shouldn't be distressed if a term turns out to be less common than you thought and so no newbie has added it. --WikiTiki89 16:52, 20 November 2013 (UTC)
I'd not heard or read any of these terms in real life. My personal distress is that I can't rely on the Wiktionary community being inclusive enough to keep me current on my language. DCDuring TALK 17:17, 20 November 2013 (UTC)
Re "binge-watch", you can most certainly binge watch youtube. As far as drying paint, you can certainly say it but I'm not sure if people actually do binge watch drying paint (and I assume this would require watching multiple spots of drying paint in a row). --WikiTiki89 16:54, 20 November 2013 (UTC)
The point is that the term is limited in use to only certain things. I would expect it to be used intransitively with the understanding that one was watching a large number of episodes of some kind of serial entertainment, such as a TV show, not any of the other things that could conceivably be the object of the word watch.
Does what you say about multiple spots of paint mean that seriality is built into the term? Can one binge-watch a single 30-hour-long movie? DCDuring TALK 17:17, 20 November 2013 (UTC)
I would assume seriality is built into the term. A single 30-hour movie, I think would just be called "watching". But you got me on the intransitive point, if it is indeed used intransitively then it would be idiomatic. --WikiTiki89 17:21, 20 November 2013 (UTC)
That's the normal use of the term, but I don't think it's part of the definition; it's just that normally it doesn't make any sense to binge watch anything else. Watching a single movie wouldn't fit, because binge means excessive consumption, and one normally consumes a movie in one chunk. I think a 30-hour movie is not really a normal use of the word "movie", and whether or not it's binge watching I think would center on whether it's normal or reasonable to watch such a thing in one sitting. If it was at a theater for one showing, I wouldn't say that watching a whole movie would be binge watching, because that would be the reasonable way to watch it.
I don't know about intransitively; you could say "I watch intensely", thus by extension "I watch bingefully"*, and "binge-watching" is just another way of forming the same concept as the last sentence. Transitivity seems to be a weak reason to split English words. On the other hand, binge drinking and binge eating have been around for years.
* It's got two hits on Google Books and one on Google Groups, which makes it technically legal for adding to Wiktionary, if you're worried about being too small. As it is a regular formation that English speakers will understand yet say "that's not a word", I'm not rushing to do so.--Prosfilaes (talk) 18:56, 20 November 2013 (UTC)
What I meant was, "I binge watch" is idiomatic because it implies TV shows or something similar, while "I binge watch XYZ" does not imply anything beyond binge and watch. --WikiTiki89 19:39, 20 November 2013 (UTC)
What defines the phenomenon of binge-watching is the unaccustomed mode of watching a series all at once on our time, instead of the traditional weekly instalments on a TV broadcaster’s time, and that modern technology and telefilm distribution (BitTorrent and Netflix) allow us to do this. Yes, it can also SoP refer to a “binge of watching,” but that is not this thing. If I binge-watch six horror DVDs on Halloween, that is not this. Whether the term “binge-watching” actually means this, for Wiktionary, depends on the usage that can be shown in three citations. Michael Z. 2013-11-20 18:59 z
Usual use does not make something idiomatic. Only exclusive use can do that. Therefore if citations are found proving that anything can be binge-watched, then the transitive term is not idiomatic. --WikiTiki89 19:39, 20 November 2013 (UTC)
Well, this Wired article uses the term intransitively, and seems to rely on this sense for meaning. Would that be a good example?
Also seen as nouns: binge-watching, binge-watcher. Michael Z. 2013-11-20 20:31 z
First off, the only two times that article uses "binge-watch" as a verb, it is either transitive or an adjectival participle where the direct object is implied to be Breaking Bad. It does, however, support a noun sense of binge-watching. --WikiTiki89 20:39, 20 November 2013 (UTC)

On the binge-watch debate, can I point out A) there's a bit of SoP aspect, as I can easily find references to "binge exercising" or "binge shopping" and B) we DO have the damn word: it's at binge watching, which we've had since last summer. Circeus (talk) 17:28, 25 November 2013 (UTC)

Words in the News

I propose that Wiktionary have an "In the News" section, like how Wikipedia has, which covers words that feature prominently in news or current events.

For example,

Having top current events on the main page would let people learn about words they think they know by clicking on the bluelinks to see what they're not sure of.

-- 70.24.244.51 14:43, 23 November 2013 (UTC)

  • We did indeed use to have such a feature. But I was the only person who contributed. I discontinued it when the Word of the Day feature was added. SemperBlotto (talk) 14:45, 23 November 2013 (UTC)
  • I would support this in theory, but it seems like it would be a lot of work. And unlike word of the day, words can't be stockpiled in advance. --WikiTiki89 14:58, 23 November 2013 (UTC)
    It would be nice if we could figure out a way to get normal users to give us a clue as to what was hot. Two general possibilities are:
    1. yesterday's or this week's most frequently searched terms (or, less desirable, visited pages)
    2. A special "hot requests" page, possibly augmented by thumbs up/thumbs down voting.
    Even if there were work involved the topicality would be good and it could rejuvenate participation by normal, albeit young users. DCDuring TALK 15:51, 23 November 2013 (UTC)
I would think that Wikipedia's own In the News section could suggest words that could be considered intriguing "in the now", as they'd be the obscure, technical, or complicated words related to the current event. -- 70.24.244.51 06:17, 24 November 2013 (UTC)
I'd say this is off-topic and not needed. Mglovesfun (talk) 11:51, 24 November 2013 (UTC)
@MG: What does "this" refer to? DCDuring TALK 14:38, 24 November 2013 (UTC)
Fair question. I think that this proposed 'Words in the News' feature is off-topic and not needed. I think it's un-dictionary-like. Mglovesfun (talk) 16:07, 24 November 2013 (UTC)
This is the only forum we have to discuss extensions of what Wiktionary is.
I would argue that Wiktionary is running the serious risk of irrelevance to English speakers and even learners and therefore needs to consider means of attracting and retaining users. This seems worth considering. As for being un-dictionary-like, that need not deter us if it helps users see the advantages of using Wiktionary for serious treatment of topical words, even protologisms that would not meet the current version of CFI. DCDuring TALK 17:25, 24 November 2013 (UTC)
I think it is "dictionary-like". The OED does it, and I imagine all major dictionaries track new words in the news even if (not having our mission of openness) they don't publish them. Equinox 17:27, 24 November 2013 (UTC)
In theory, I imagine a bot could generate this from some free online news sources. Equinox 14:49, 24 November 2013 (UTC)
Does anyone do this already? Wouldn't we need some kind of word-frequency list based on use in, say, blogs over, say, 2008 to mid-2013 to compare the relative frequency of uncapitalized words and all-capitalized words (abbreviations) in current use with comparable, but older, use? We'd still miss proper names. Is there any such modern corpus that we'd have access to? [There are Google Ngrams and BYU's Global Web English.] Has anyone compiled a usable frequency list already? Are there APIs that we could tap into? DCDuring TALK 15:36, 24 November 2013 (UTC)
Even if we got a bot to choose the words for us, how would we ensure that none of the articles we link to are poorly written? --WikiTiki89 17:45, 24 November 2013 (UTC)
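A rough Python sketch of the comparison DCDuring describes follows. It computes each word's relative frequency in a recent sample against an older one and ranks words by the ratio. The toy counts, the `min_count` cut-off and the add-k smoothing are assumptions for illustration. No particular corpus, frequency list or API is implied, and proper names would still need separate handling.

```python
from collections import Counter
from typing import List


def trending(recent: Counter, older: Counter,
             min_count: int = 3, k: float = 0.5) -> List[str]:
    """Rank words by how much more often they occur (per token) in `recent`.

    Uses add-k smoothing on the older corpus so that a word never seen there
    gets a small non-zero rate instead of causing a division by zero.
    """
    recent_total = sum(recent.values())
    older_total = sum(older.values())
    vocabulary = len(set(recent) | set(older))
    ratios = {}
    for word, count in recent.items():
        if count < min_count:  # too rare to judge
            continue
        recent_rate = count / recent_total
        older_rate = (older[word] + k) / (older_total + k * vocabulary)
        ratios[word] = recent_rate / older_rate
    return sorted(ratios, key=lambda w: ratios[w], reverse=True)


# Toy counts, invented for illustration; real input would come from a corpus.
recent = Counter({"the": 1000, "tax": 20, "selfie": 40})
older = Counter({"the": 1000, "tax": 20})
print(trending(recent, older))  # ['selfie', 'the', 'tax']
```

A bot built on something like this would still only propose candidates. Someone would have to check that the linked entries are in good shape, which is WikiTiki89's point.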
  • I think that Wikipedia's Did you know...-like section would be much more interesting to the readership. It could deal with interesting etymologies, semantic shifts, synonyms, origins of phrases, usages and so on. I bet that some simple JavaScript games (like matching up synonyms, cognates or definitions) that could be played on the main page with a mouse click would prove to be very successful. --Ivan Štambuk (talk) 20:40, 24 November 2013 (UTC)
That also sounds like a good idea. How would you choose DYK? On Wikipedia they choose DYKs from recently created entries, recently greatly revised entries, recently significantly expanded entries. Otherwise, it'd be the same as WOTD or this proposed WITN, if you choose arbitrarily (WOTD) or by currently popular/events (WITN). -- 65.94.78.70 05:34, 27 November 2013 (UTC)
  • I know I am a newer contributor to this site and don't know standard community practice/formatting very well yet, but I would be willing to do the technical work of maintaining/curating a "words in the news" feature. Perhaps something updated every week, or so, with words from the previous week's events? — E | talk 21:05, 25 November 2013 (UTC)
Maybe you should create a demonstration page as a subpage of your user page? -- 65.94.78.9 05:22, 7 December 2013 (UTC)

Links in word headings

(This is a suggestion from a naïve editor who doesn't know how to find out if this fairly obvious feature has already been put forward.)
For people merely browsing in Wikipedia, it might be useful to put links in the heading of each word page that link to nearby preceding and following words. There could be pairs of links of graded strength, for example and where relevant, ones that treat accented letters as distinct and ones that don't (though this might be better as an option). Another possibility, as well as distinctive whole word linking, is to provide links to the nearest words up or down alphabetically that differ in 1st (i.e., go to adjacent initial letter), 2nd, 3rd, and 4th letters. ReidAA (talk) 00:37, 24 November 2013 (UTC)

I already see these in the left-hand sidebar, split by language. There is also an optional extension that shows a few preceding and following words above the entry; I can't remember the name, but I have it enabled. Both of these things take a few seconds of processing after the initial page load. Equinox 00:40, 24 November 2013 (UTC)
I have had that selected in Preferences for many months and have never seen it. Which skin do you use? DCDuring TALK 01:33, 24 November 2013 (UTC)
I don't think I use any skin, just the default. The feature does seem to rely on Hippietrail's Toolserver account (if I remember correctly), which goes up and down, so it doesn't always work. Equinox 01:41, 24 November 2013 (UTC)
By "Wikipedia" do you mean Wiktionary? Because if you do mean Wikipedia, then I'm really not sure I understand. —RuakhTALK 08:14, 24 November 2013 (UTC)
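The "nearby words" idea can be sketched as a sorted title list plus a binary search. This is a hypothetical Python illustration, not how the existing sidebar or the optional extension Equinox mentions actually work. The `fold_accents` flag models the proposed option of treating accented letters as distinct or not, and plain code-point order stands in for a proper collation.

```python
import bisect
import unicodedata
from typing import Callable, List, Tuple


def fold(word: str) -> str:
    """Remove combining marks, so that 'á' compares like 'a'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def neighbours(titles: List[str], title: str, n: int = 1,
               fold_accents: bool = False) -> Tuple[List[str], List[str]]:
    """Return up to n titles before and after `title` in alphabetical order."""
    key: Callable[[str], str] = fold if fold_accents else (lambda w: w)
    # Sort on (key, title) so that titles with equal keys keep a fixed order.
    entries = sorted((key(t), t) for t in titles)
    target = (key(title), title)
    i = bisect.bisect_left(entries, target)
    before = [t for _, t in entries[max(0, i - n):i]]
    # Skip the title itself if it is in the list.
    start = i + 1 if i < len(entries) and entries[i] == target else i
    after = [t for _, t in entries[start:start + n]]
    return before, after


titles = ["ab", "\u00e1c", "ad", "b"]  # "\u00e1c" is "ác"
print(neighbours(titles, "ad"))                     # (['ab'], ['b'])
print(neighbours(titles, "ad", fold_accents=True))  # (['ác'], ['b'])
```

Without folding, "ác" sorts after "b" because "á" has a higher code point than any ASCII letter. With folding, it files between "ab" and "ad". The "differs in the 1st to 4th letter" links could be built on the same sorted list by searching for the next prefix.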

cmn-pinyin or cmn-alt-pinyin

The template cmn-pinyin is the inflection template for pinyin syllables, and cmn-alt-pinyin is for pinyin. Take a look at hǎo. There are two sections named Romanization, and the information in the first could also be in the second, so there is really no need for two sections. Furthermore, they don't have a sortkey, so á comes after z in the category "Mandarin pinyin" but not in the category "Mandarin pinyin with diacritics". And what about an inflection template for non-standard forms like hao? Kinamand (talk) 13:51, 24 November 2013 (UTC)
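One way to get the ordering Kinamand asks for is a sort key that strips the tone marks but remembers the tone, so that á files next to a rather than after z. The Python sketch below is hypothetical. The function name, the choice to sort unmarked syllables before tones 1 to 4, and spelling ü as v (a common convention in pinyin input methods) are all assumptions. It does not show how cmn-pinyin or cmn-alt-pinyin would pass such a key to the categories.

```python
import unicodedata
from typing import Tuple

# Combining diacritics used as pinyin tone marks, mapped to tone numbers.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
DIAERESIS = "\u0308"  # the two dots of ü


def pinyin_sortkey(syllable: str) -> Tuple[str, int]:
    """Return (toneless spelling, tone) for sorting; 0 means no tone mark."""
    tone = 0
    letters = []
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        elif ch == DIAERESIS and letters:
            letters[-1] = "v"  # ü -> v, so lü stays distinct from lu
        else:
            letters.append(ch)
    return "".join(letters), tone


words = ["zuo", "\u00e1", "a", "h\u01ceo", "h\u00e0o", "hao", "l\u00fc", "lu"]
print(sorted(words, key=pinyin_sortkey))
# ['a', 'á', 'hao', 'hǎo', 'hào', 'lu', 'lü', 'zuo']
```

Keeping the tone as a second key means toneless forms like hao and the four toned forms sort together under the same base spelling.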

Pronunciation of ζ in Ancient Greek

I see that words like Ζεύς (Zeús) or ζυγόν (zugón) are represented in IPA as: /zde͜ʊ́s/ and /zdyɡón/. This is based on the false assumption that the letter ζ had a single value, that is /zd/ (or /dz/, which would be just as wrong for words like ὄζος (ózos)). Given their etymology, /dz/ makes much more sense for these two words. --Fsojic (talk) 14:53, 24 November 2013 (UTC)

According to Sihler §201, the phonetic value is indeed [zd], and [dz] is an earlier form that fell out of use by the time of earliest attestations and can only be reconstructed etymologically. --Ivan Štambuk (talk) 15:25, 24 November 2013 (UTC)
I'm still in favor of marking all reconstructed pronunciations. I created a sample template for this a while ago: {{:User:Wikitiki89/template:IPAr}} that would look something like:
--WikiTiki89 18:03, 24 November 2013 (UTC)
Pronunciations cannot be reconstructed using comparative method, they're just guesswork. [dz] is just one of the likely intermediate forms between ζ = [zd] and PIE *dy, *gy, *ǵy, *gʷy *#(H)y-. A single reconstruction can abstract away many different and equally plausible pronunciations. --Ivan Štambuk (talk) 20:10, 24 November 2013 (UTC)
Actually they are reconstructed using the comparative method, as there is no other way of reconstructing them (and yes, it does involve guesswork). But anyway, I'm not concerned with pronunciations that can be abstracted away with one reconstruction; I'm concerned with the fact that we have no way of knowing how closely spelling corresponded to pronunciation, especially with regard to exceptions. For example, someone reconstructing Russian pronunciation a thousand years from now might mistakenly believe that бог (bog) was pronounced /bok/ (or /bog/), when in fact it is pronounced /box/. There is no reason to suspect that Ancient Greek pronunciations were any more regular than those of modern languages. Anyway, all I want is for it to be indicated that the pronunciation is a guess, otherwise people might assume we know this stuff. --WikiTiki89 20:38, 24 November 2013 (UTC)
Reconstructions achieved by the comparative method are phonological, not phonetic: they can only give you contrasts, i.e. how the segments split and merged, not what they sounded like. At best you can get a set of distinctive features that unambiguously point to a specific sound, but in practice IPA notation is so detailed that there is a range of different values that each such segment can take. And when you add suprasegmental features (such as length or tone, both present in Greek), uncertainties multiply. Furthermore, when you inspect the internal history of a protolanguage (i.e. the forms between ζ and its ancestral PIE forms), it's pretty much all guesswork, and the result will vary on the basis of 1) what the author thinks is "expected" or "normal" development, 2) the chronological arrangement of sound changes, and 3) what stage you are reconstructing. Interestingly, while we can be pretty sure that ζ=[zd] in the earliest attestations, there is much more uncertainty regarding later stages of Greek, where scribal tradition preserved older spellings but the actual vernacular changed a lot. (The same goes for pre-reform Russian and others.) One place where such a template would be fitting IMHO would be languages written in defective scripts (e.g. all of the ones written in cuneiform), or the ones recorded by illiterate or non-native speakers (e.g. Dalmatian, Old Prussian), where pronunciations of attested words are nothing more than educated guesses. --Ivan Štambuk (talk) 21:04, 24 November 2013 (UTC)
Who is to say that [zd] and [dz] could not have co-existed as separate phonemes, both written ζ, before eventually merging? As long as it can be disputed, we should mark it in a way that indicates to our readers that we don't know for sure. --WikiTiki89 21:28, 24 November 2013 (UTC)
All that we have evidence for is ζ = [zd]. The rest is useless speculation that can be neither proven nor refuted. At any case, you should contact User:Gilgamesh who is the resident expert on Ancient Greek historical phonetics/phonology, and the author of the pronunciation template. --Ivan Štambuk (talk) 21:42, 24 November 2013 (UTC)
That's not the point, I'm not arguing this specific case, but that we should indicate the fact that it is not certain. --WikiTiki89 21:46, 24 November 2013 (UTC)
There is not much difference between "100% certain" and "widely agreed by almost all the scholars in the field". According to your suggestion any word attested before the invention of sound recording devices would have its pronunciations marked as reconstructed, because we can never be sure what it sounded like. --Ivan Štambuk (talk) 21:56, 24 November 2013 (UTC)
That's exactly what I'm saying. Unless the pronunciation of a language is well documented (not necessarily audio recording) as I assume may have occurred with some languages that died out relatively recently. --WikiTiki89 22:19, 24 November 2013 (UTC)
Well good luck with that. --Ivan Štambuk (talk) 10:33, 25 November 2013 (UTC)
I know, it's ridiculous. Next he'll be suggesting marking all unattested forms with asterisks! —RuakhTALK 07:56, 26 November 2013 (UTC)
Sorry, I forgot that sarcasm doesn't carry on the Internet. To be clear: we do mark all unattested forms with asterisks. As do all linguists. Even in a book that's entirely about unattested proto-languages, every single form will be marked with an asterisk. —RuakhTALK 18:55, 26 November 2013 (UTC)
The problem is that an asterisk doesn't mean "unattested"; more specifically, it means "reconstructed by the comparative method" in the context of protolanguages (it can also mean "ungrammatical" or "pre-form"). With respect to pronunciations, the only attestations are audio recordings, and every sequence of IPA symbols enclosed with [] or // is merely a simplification of a range of actual pronunciations with respect to certain phonological features that linguists deem relevant. So all IPA transcriptions are "unattested" if you will, and what constitutes conclusive evidence for "sufficiently precisely described language" in IPA transcription so as not to be asterisked in case of ancient or poorly attested languages is an entirely subjective matter. Furthermore, usage of asterisks and the term reconstructed (which means two different things in the context of protolanguage reconstructions and phonetic reconstructions) to convey a level of uncertainty regarding phonetic values of attested words would simply introduce unnecessary confusion. This should best be handled in per-language pronunciation appendices and qualifiers (region, time period, scholar/work) when sources contradict. --Ivan Štambuk (talk) 19:34, 26 November 2013 (UTC)
An asterisk doesn't actually mean anything other than "there is a note somewhere that explains this asterisk". In reconstructed languages, this note is already understood and need not be written. In this case, there is a little tooltip note. We could even have the asterisk link to a page that explains its meaning. As for handling such things in the language's Appendix page, that would not work for handling languages which have both attested and unattested pronunciations (such as Hebrew, which has attested modern pronunciations and unattested ancient pronunciations). Another important note is that there are two separate issues here:
  1. Should we mark unattested pronunciations?
  2. How do we decide if a pronunciation is unattested?
You keep merging the two issues and using the second to argue against the first, which is illogical. --WikiTiki89 20:07, 26 November 2013 (UTC)
It's important to follow established terminology and notation and not to be innovative when there is really no need to. You're making up new meanings of precisely defined symbols and words which only introduces confusion. We could have different appendices for modern and ancient Hebrew linked by different pronunciations qualifiers. There is no such thing as "unattested pronunciation" as I've explained, each IPA transcription is inherently "unattested". It's merely a system to guide humans to pronounce words. Whether that system itself is described or reconstructable beyond some arbitrary threshold of "sufficient preciseness" in terms of IPA symbols is not up to us to decide. If 99% of scholars agree that ζ=[zd] then we write zd. If they don't, we don't. We follow NPOV and NOR principles which renders questions such as "how do we know..." immaterial. --Ivan Štambuk (talk) 20:40, 26 November 2013 (UTC)
The established meaning of an asterisk is "look for a footnote about what this means". So we are not making up a new meaning for it. --WikiTiki89 01:55, 27 November 2013 (UTC)
I'm sure you realize that by established I did not mean "ad-hoc defined" but "used by convention by the general linguistic community". --Ivan Štambuk (talk) 13:51, 27 November 2013 (UTC)
Look at our first definition at *#Punctuation mark. Anyway, if you don't like the asterisk, we can indicate it some other way. You are conveniently using details like that to avoid the questions above. --WikiTiki89 21:49, 27 November 2013 (UTC)
The first definition of *#Punctuation mark has to do with footnotes. The usage of asterisk in your template {{:User:Wikitiki89/template:IPAr}} has obviously nothing to do with footnotes, and is supposed to mimic the usage of asterisked reconstructions. I suggest that you reread my answers above - the whole notion of "unattested pronunciations" and "reconstructed pronunciations" is useless, the former because all IPA transcriptions are inherently "unattested", the latter because uncertain pronunciations are not reconstructed but guessed. --Ivan Štambuk (talk) 23:07, 27 November 2013 (UTC)
Now you're just debating terminology. If you prefer the term "guessed" over "reconstructed" then maybe we'll use it. I just think that we should differentiate transcriptions of attested pronunciations and transcriptions of unattested pronunciations (and that doesn't necessarily mean that the transcription itself is attested, but that the pronunciation it is transcribing is attested). --WikiTiki89 23:19, 27 November 2013 (UTC)
But we do have audio recordings for Latin and some other ancient languages. Are you suggesting that their audio file also be marked as guessed, since they are obviously not made by native speakers but, just like transcription, made on assumption of what contemporary Latin sounded like? --Ivan Štambuk (talk) 12:27, 28 November 2013 (UTC)
Latin should have attested ecclesiastical pronunciations, but classical pronunciations cannot be attested. And yes, I think the audio file should be marked as "Modern Latin" or "Ecclesiastical Latin" or whatever it is, since it cannot be Classical Latin. --WikiTiki89 16:33, 28 November 2013 (UTC)
Ecclesiastical Latin isn't actually an attested form of spoken Latin, unless you have it recorded from priests speaking it. The few audio samples that we have are pronounced by editors based on how the pronunciation of ecclesiastical Latin is described in the books. This then brings us to the issue of the pronunciation of ecclesiastical Latin itself being based on the reconstructed pronunciation of Classical Latin sprinkled with local influences. So it's just as "speculative" as the supposedly more uncertain classical Latin pronunciations. We already use qualifiers for region and period in pronunciations, but what you're suggesting is using some additional markers for uncertainty, which should best be spelled out and not weirdly asterisked. But I'm not sure that e.g.
(5th BC Attic, scholarly guess): IPA: /zde͜ʊ́s/
would not look ridiculous. --Ivan Štambuk (talk) 19:55, 29 November 2013 (UTC)
Ecclesiastical pronunciations can be considered speculative with regard to Classical Latin, but not with regard to Ecclesiastical Latin. Ecclesiastical Latin exists today and therefore its pronunciation is (or can be) attested. For living languages, we generally do not accept audio samples from non-native speakers. Dead languages should not be treated differently unless the audio sample is also tagged with "reconstructed" or similar (this would apply to Classical Latin, but not Ecclesiastical Latin). I still don't understand your opposition to the term "reconstructed", as it certainly does apply to pronunciations just as well as it does to any other historical thing or event (and I don't see how it can be confused with the linguist definition that applies to unattested terms or languages). I don't think that something like the following would look ridiculous:
(5th BC Attic, reconstructed): IPA: /zde͜ʊ́s/
(this would presumably link to an appendix or Wikipedia article explaining the reconstruction). --WikiTiki89 20:27, 29 November 2013 (UTC)
The difference between an attested pronunciation of a living language and a constructed pronunciation of a dead language is non-trivial, in my opinion. While an intelligent reader can probably infer that any pronunciation for a dead language is a reconstructed one, when we have all sorts of languages and pronunciations lumped together on one project, it seems a reasonable distinction to be made. An asterisk would be an unobtrusive and consistent way to do it. -Atelaes λάλει ἐμοί 08:15, 26 November 2013 (UTC)
I doubt that an unintelligent reader (there seems to be an implicit assumption that the majority are such) would be enlightened by our made-up scheme of marking uncertain pronunciations with an asterisk. --Ivan Štambuk (talk) 10:21, 26 November 2013 (UTC)
With a little work, I imagine {{IPA}} could be made to add an asterisk automatically whenever the language was set to grc or any other completely dead language. That would save us the trouble of updating all the actual entries. Undead languages like Latin would have to be handled separately. - -sche (discuss) 15:04, 26 November 2013 (UTC)
In case you guys didn't notice, the asterisk is not the only feature. It also adds a tooltip that says "reconstructed pronunciation". --WikiTiki89 16:25, 26 November 2013 (UTC)
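To make -sche's suggestion concrete, here is a minimal sketch, written in Python for illustration rather than as an actual change to {{IPA}}. It assumes a hypothetical set of "completely dead" language codes (only grc is named above) and prefixes an asterisk carrying the "reconstructed pronunciation" tooltip, so individual entries would not need editing. Undead languages like Latin are deliberately left out of the set.

```python
import html

# Hypothetical set of language codes treated as "completely dead"; only grc is
# named in the discussion, and Latin would need separate handling.
DEAD_LANGUAGES = {"grc"}


def render_ipa(lang_code: str, transcription: str) -> str:
    """Render one IPA transcription, prefixing an asterisk with a
    "reconstructed pronunciation" tooltip when the language is dead."""
    body = html.escape(transcription)
    if lang_code in DEAD_LANGUAGES:
        return '<span title="reconstructed pronunciation">*</span>' + body
    return body
```

Because the marker is keyed on the language code alone, the same entry text renders differently depending only on the language it belongs to.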
What exactly does "reconstructed" mean for pronunciations? With written attestation it's easy; either it's attested or it's not. But with pronunciation, some details might be revealed by native speakers while others are not. For example, I recall a case where authors of Shakespeare's time actually describe /r/ as being realised as [ɹ] in their form of English. So that's not a reconstruction, that's an actual description of the language by native speakers or by those in contact with them, and we can cite those contemporary sources as such. The same can be applied to the Greek here too; ancient sources like the one Ivan showed actually attest to the pronunciation by describing it directly. So it's not a reconstruction then. —CodeCat 19:01, 26 November 2013 (UTC)
Even if we know the pronunciation of each letter, that doesn't mean we know the pronunciation of the word. I find it hard to believe that the Ancient Greeks pronounced everything exactly the way it was spelled. --WikiTiki89 19:08, 26 November 2013 (UTC)
They did spell phonemically though, so there's no problem with a phonemic IPA. —CodeCat 19:09, 26 November 2013 (UTC)
And how do we know that? Even if they did in general, how do we know there weren't exceptions? --WikiTiki89 19:11, 26 November 2013 (UTC)
Because native grammarians described it as such, and a bunch of other alphabets modeled after Greek are also phonemic, and modern scholarship generally agrees that that seems to be the case. --Ivan Štambuk (talk) 20:40, 26 November 2013 (UTC)
In Latin, there are texts which describe a subphonemic difference that was not reflected consistently in spelling: sonus medius. If the Roman grammarians went into such detail about their language, then most likely the Greeks did too, and if not the Greeks then surely the Romans would have had their share to say about how to pronounce Greek too, because they saw mastery of Greek as a sign of the elite. —CodeCat 20:55, 26 November 2013 (UTC)
That's exactly what I mean when I say that we can't tell from the spelling how it was pronounced. If we have such a description of a particular word, then we can consider that pronunciation attested. But if we don't and are merely extending things we know from pronunciations of other words to a word that is not attested, then that is a reconstruction. --WikiTiki89 01:55, 27 November 2013 (UTC)

At the end of the day, unless we start demanding audio citations for current languages, I think making the distinction here between scholarly analysis of ancient languages, especially ones so well documented in these terms as Ancient Greek, and stuff where we could have (in theory) audio citations for, is cutting thin hairs.--Prosfilaes (talk) 04:40, 27 November 2013 (UTC)

Currently, pronunciations are disputable. If you think a pronunciation does not exist, you can demand audio or a link to a transcription in another dictionary. We just don't have as formal of a process for it as we do for definitions (it's usually done in the WT:Tea room). --WikiTiki89 04:48, 27 November 2013 (UTC)

English verb forms identical to past participleEdit

I've had MglovesfunBot tagging 'redundant' wikisyntax like this in entries to try and get rid of all the instances where there is a categorizing template and an overt category doing the same thing. The main stumbling block is entries like beat up where there does need to be something. There are three clear options (using beat up as an example):

  1. Add {{en-simple past of|beat up}} to the bottom of the verb section.
  2. Get {{en-verb}} to categorize these automatically by checking whether its parameters {{{3}}} and {{{4}}} are identical to the page name.
  3. Write the category at the bottom (as currently done), hence reverting the bot edit.

Input please. Mglovesfun (talk) 16:25, 24 November 2013 (UTC)
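Option 2 boils down to a comparison against the page name. The sketch below is written in Python, not as the template's wikitext or Lua, and the category names in it are hypothetical placeholders. It takes each parameter as a list of alternatives, which is one way to cover the multiple-plurals concern raised below.

```python
from typing import List, Sequence

# Hypothetical category names, for illustration only.
PAST_CATEGORY = "English simple past forms"
PARTICIPLE_CATEGORY = "English past participles"


def self_form_categories(page_name: str,
                         simple_pasts: Sequence[str],
                         past_participles: Sequence[str]) -> List[str]:
    """Return the extra form categories a lemma page needs because one of
    its own inflected forms ({{{3}}} or {{{4}}}) is spelled like the lemma."""
    categories = []
    # Membership in a list of alternatives, so "beaten up / beat up" works.
    if page_name in simple_pasts:
        categories.append(PAST_CATEGORY)
    if page_name in past_participles:
        categories.append(PARTICIPLE_CATEGORY)
    return categories
```

For beat up, passing ["beat up"] as the simple past and ["beaten up"] as the past participle yields only the simple-past category. Note that the arguments must be lists: a bare string would turn `in` into a substring test.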

There's a similar issue with nouns that have plurals identical to their singulars. Some of these have a separate sense line, e.g. Japanese, others don't. I think it would be good to implement your proposal 2, and I would prefer that to proposal 3. I am inclined to also include "past of", "plural of", etc sense lines (as at Japanese), but I'm willing to be persuaded that that isn't worthwhile. - -sche (discuss) 18:13, 24 November 2013 (UTC)
I like proposal #2, but it might be a little difficult for it to work when multiple plurals are listed. But then again, with Lua it might not be a problem. --WikiTiki89 18:21, 24 November 2013 (UTC)
True. Explicit (i.e. manual) categorization could always be a fallback if there are multiple plurals and the template can't make sense of them. Proposal 1 would also handle such cases. - -sche (discuss) 19:40, 24 November 2013 (UTC)
I prefer option 1. I dread the baroque way that I expect option 2 would be implemented, with an attempt to generalize across all languages. Option 3 is a fallback that can't be excluded, so running the bot multiple times would require a list of exceptions, I think. DCDuring TALK 19:10, 26 November 2013 (UTC)

Logos on the Wiktionary portalEdit

Wiktionary-logo-portal.svg
Wiktionary-logo-en.svg
Wiktionary book logo.png

As many of you know, Wiktionary has a painful history when it comes to logos. Attempts to unify the projects have resulted in three main logo designs (plus a fourth, for Galician). The Foundation treats the "tiles" logo as Wiktionary's main logo, because a majority of wikis use it. But the English Wiktionary and 35 other wikis, representing a majority of entries, continue to use a "plain text" logo. Moreover, translating the plain text logo means changing the entire logo, so this wiki's logo has little in common with the Russian Wiktionary's.

This fragmentation poses a problem for the portal at www.wiktionary.org, which currently uses the English Wiktionary's logo. Because this logo represents only part of Wiktionary, I propose a solution based on JavaScript and CSS that celebrates, ahem, our logo diversity.

  1. By default, the portal displays a language-neutral version of the tiles logo that was specifically intended for the portal.
  2. Whenever you mouse over any of the wikis in the "top 10" ring, a more appropriate logo for the wiki fades in. So the English Wiktionary logo fades in if you mouse over the English, Malagasy, or Russian links, and the book logo fades in for the Lithuanian link.
  3. As soon as the page loads, the JavaScript at m:MediaWiki:Gadget-wm-portal.js currently selects a search language based on your browser's language preference. Under this proposal, the logo also changes according to that language preference, as long as the language is one with 10,000 entries or more.
  4. Manually selecting a language from the menu also causes an appropriate logo to fade in and sets a cookie, so that you continue to see the same logo on subsequent visits.
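The selection order in points 1, 3 and 4 can be sketched as a single function. This is Python rather than the portal's actual JavaScript, and several details are assumptions. The logo identifiers and the entry-count mapping are hypothetical. Browser preferences are checked most-preferred first, with the first wiki above the threshold winning. Tags like "en-US" are reduced to their primary subtag.

```python
from typing import Dict, Optional, Sequence

ENTRY_THRESHOLD = 10_000  # minimum wiki size named in point 3 of the proposal
DEFAULT_LOGO = "neutral-tiles"  # hypothetical identifier for the portal logo


def choose_logo(cookie_lang: Optional[str],
                browser_langs: Sequence[str],
                entry_counts: Dict[str, int],
                logo_for: Dict[str, str]) -> str:
    """Pick the logo to fade in when the portal loads."""
    # Point 4: a language picked manually on an earlier visit (cookie) wins.
    if cookie_lang:
        return logo_for.get(cookie_lang, DEFAULT_LOGO)
    # Point 3: otherwise follow the browser's language preferences, most
    # preferred first, skipping wikis below the size threshold.
    for tag in browser_langs:
        code = tag.split("-")[0].lower()  # "en-US" -> "en"
        if entry_counts.get(code, 0) >= ENTRY_THRESHOLD:
            return logo_for.get(code, DEFAULT_LOGO)
    # Point 1: fall back to the language-neutral logo.
    return DEFAULT_LOGO
```

Splitting on "-" is a simplification: it would mangle wiki codes that themselves contain hyphens, so a real implementation would need a lookup table instead.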

In practice, #3 means that nothing will change for most visitors (except when mousing over links), because most people use the English version of their browser and relatively few set their language preferences. However, only #1 works in IE 6, which currently receives only 0.30% of hits to Wikimedia servers. [1]

If you have JavaScript enabled, you can see it in action: go to m:www.wiktionary.org template/temp and click on "Preview HTML" in the dropdown menu between the Watch button and the search bar at the top of the page (or in MonoBook, the tab furthest to the right). So far, I've tested it in Firefox 27, Chrome 32, IE 11, Safari 7, Opera 12.16 (Presto rendering engine), Opera 18 (Chromium), and IE 6, but more eyes are always welcome.

Please weigh in at m:Talk:www.wiktionary.org template#Logo. I plan to deploy the changes in a couple days unless there are significant issues. Thanks for your attention!

 – Minh Nguyễn (talk, contribs) 00:31, 27 November 2013 (UTC)

In my opinion, we should only have one logo. If several chapters use different ones, the solution is to properly decide to use a single logo everywhere, not to change the logo depending on context. A logo should represent the project as a whole, not each individual language. As such, no logo using text or custom characters should be used: it means that neither the "text" logo nor the "tiles" logo are appropriate. As for the book, I've never heard of any talks about it. Where does it come from?
As for the proposition, it would not celebrate the diversity of logos; it would only show our inability to come up with a common, proper logo. As a result, readers will wonder why there are so many different logos, why some have tiles, others text, or even a book. And since the text and tiles are customizable, they do not reflect the current logos in use in each Wiktionary (e.g. "Wiktionary" for Malagasy, where it should be "Wikibolana"). So, in the end, it will only be confusing for everyone. Dakdada (talk) 13:20, 27 November 2013 (UTC)
The Preview HTML link does nothing in Safari 6.1/Mac.
Yeah, this proposal is contrary to the very idea of what a logo is. So is the so-called logo at the top of this page. Just use the one that represents most Wiktionaries and ignore the 35 silly outliers. And oh, Galicia, Galicia... Michael Z. 2013-11-28 00:51 z
In Safari, you need to turn off popup blocking first. Go to Safari | Preferences, Security tab, and uncheck Block Popups.
I was being slightly sarcastic about "celebrating diversity". As I mentioned above, the Wiktionary community has already tried to create a new logo to unify the language editions – twice. The tile logo won a 2006 vote, while the book logo won a 2010 vote after Wiktionarians complained that the 2006 vote didn't involve enough of the Wiktionary community. These votes only resulted in more logos and fragmentation. While the tiles logo is in use at more wikis, the wikis using text logos have more entries total, so the current situation is a stalemate between the proponents of both logos. If you disagree with the multi-logo approach I described above, please voice your opinion here before it goes live. Thanks!
 – Minh Nguyễn (talk, contribs) 06:31, 30 November 2013 (UTC)

If I had any kind of Photoshop skills I would immediately create a new logo. Maybe something incorporating the ['w] that we currently have in the favicon, and then in the background IPA transcriptions of various translations of dictionary (German, French, Japanese). I think that would make a pretty cool logo if pulled off correctly. -- Liliana 20:35, 29 November 2013 (UTC)

You're welcome to design a logo or just propose a logo concept, but please read up on the previous votes before starting anything. It would be unfortunate for anyone to spend time creating yet another logo for Wiktionary that meets the same fate. Also, note that the ['w] favicon is only in use at a few Wiktionaries, including this one; the rest use a single W tile as a favicon. – Minh Nguyễn (talk, contribs) 06:31, 30 November 2013 (UTC)
So far there simply weren't any good suggestions. I bet that in that last vote people only voted for the book logo so the tile one doesn't win. Note how our vote on adopting the logo (Wiktionary:Votes/2010-02/Accepting the results of the Wiktionary logo vote) went a whole different way. -- Liliana 08:11, 30 November 2013 (UTC)
The portal has waited years for the overall Wiktionary project to unite behind a single logo. Few have confidence it'll happen anytime soon. If it's unfortunate that Wiktionary is unable to make a coherent first impression on visitors, it would be even more unfortunate for the portal to continue to be a victim of this impasse. With my proposal, most users won't see any difference, but for those that will, the portal will finally show them the same logo as the wiki they're about to click through to. That's the fairest, most sensible stop-gap solution I can think of. If a universally accepted logo ever emerges, why, I'll spare no time in switching the portal to it! – Minh Nguyễn (talk, contribs) 09:17, 30 November 2013 (UTC)
Re: "['w] favicon is only in use at ... this Wiktionary": so what? It is supported by consensus as per Wiktionary:Votes/2012-12/New favicon. --Dan Polansky (talk) 11:02, 30 November 2013 (UTC)
I think Liliana is talking about a logo to represent Wiktionary overall, not just the English Wiktionary. I don't have a problem with ['w], but I just wanted to head off the notion that it's already ingrained in the larger Wiktionary community. Outside the English Wiktionary, most Wiktionarians haven't seen it. – Minh Nguyễn (talk, contribs) 11:16, 30 November 2013 (UTC)
I still like the book... —CodeCat 14:11, 30 November 2013 (UTC)

After over a week of discussion both here and at Meta, no technical issues arose, so I've pushed the changes live. The portal is now temporarily working around the seven-year-old logo stalemate, and the fix comes with various other improvements (like HiDPI support and a hefty size reduction). Those interested in resolving the stalemate are welcome, as always, to organize a discussion or poll over at Meta. Going forward, if you see any technical issues, please let the Meta admins know at m:Talk:Www.wiktionary.org template. Thanks! – Minh Nguyễn (talk, contribs) 12:12, 4 December 2013 (UTC)

Jyutping syllableEdit

"Jyutping syllable" is used as a header in approximately 700 entries, and has been since 2009 or earlier. However, Autoformat has considered it a nonstandard header since 2009 or earlier. The result is that any entries in Category:Entries with non-standard headers with headers that are actually nonstandard are drowned in a sea of Jyutping syllables. Let's either legitimize "Jyutping syllable" or bot-replace it. (Note that whatever bot changes the headers should also remove {{rfc-header}}.) - -sche (discuss) 04:42, 27 November 2013 (UTC)

Why not call it "Syllable", and use that across various languages (standardized with symbol, etc); a {{qualifier}} can be used to indicate it is Jyutping. -- 65.94.78.70 05:26, 27 November 2013 (UTC)
I would support a "Syllable" header for these. Do we already have "Syllable" headers? --WikiTiki89 05:33, 27 November 2013 (UTC)
===Syllable=== is fine by me. We do already have a lot of Vai syllables like , numerous Korean syllables like , and miscellaneous other things. - -sche (discuss) 06:19, 27 November 2013 (UTC)
Didn't we vote to approve a 'Romanization' header, or something similar? —RuakhTALK 08:47, 27 November 2013 (UTC)
The votes were for Mandarin and Japanese, any romanisation, mono- and multisyllabic. Not about the headers but they made "Romanization" headers standard. We should use "Romanization" to match Mandarin and Japanese. --Anatoli (обсудить/вклад) 09:07, 27 November 2013 (UTC)
Jyutping is not yet approved for inclusion in Wiktionary. We would need a vote on whether to allow Jyutping entries. -- Liliana 09:12, 27 November 2013 (UTC)
Yes, but only if there is opposition to their existence. At the moment we have a long-standing precedent. If we were to decide WHICH romanisation to use for Cantonese, I would use Jyutping, which is preferred by the Hong Kong government and easy to find on the web. I use my Sheik dictionary. --Anatoli (обсудить/вклад) 09:33, 27 November 2013 (UTC)
No one ever wanted them. User:Opiaterein added them on his own without asking for consensus. Korean Revised Romanization was previously rejected, so there are precedent cases for disallowing East Asian romanization. -- Liliana 09:35, 27 November 2013 (UTC)
OK, you can set up a vote then. --Anatoli (обсудить/вклад) 09:39, 27 November 2013 (UTC)
Why me? Am I asking for anything? -- Liliana 09:40, 27 November 2013 (UTC)
Are you not asking to delete all Jyutping entries? Or is there a policy to disallow them? When I RFD'ed one of the entries a while ago, the entry passed. --Anatoli (обсудить/вклад) 09:47, 27 November 2013 (UTC)
There's no policy to allow them yet. They could be sent to WT:RFV and would fail, since there are no citations for the romanized forms. -- Liliana 10:51, 27 November 2013 (UTC)
Why don't you do it, then? --Anatoli (обсудить/вклад) 00:16, 28 November 2013 (UTC)
People are going to whine... -- Liliana 05:11, 28 November 2013 (UTC)
People will either approve their existence (or leave them if no consensus) or vote to delete. In any case, it seems simpler than setting up a vote. People don't seem to care too much about Cantonese romanisation. If entries are kept, then the heading names could be decided next. --Anatoli (обсудить/вклад) 05:52, 28 November 2013 (UTC)
I've drafted Wiktionary:Votes/2013-11/Jyutping. Feedback is appreciated. - -sche (discuss) 06:50, 28 November 2013 (UTC)

Thank you, -sche. --Anatoli (обсудить/вклад) 22:08, 28 November 2013 (UTC)

The vote having passed, Jyutping syllables are allowed. I have updated all the Jyutping entries I could find to use the agreed-upon format (which includes "Romanization" rather than "Jyutping syllable" as the POS header). Did I miss any? - -sche (discuss) 09:53, 28 January 2014 (UTC)
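The header conversion described above (swap the header, drop {{rfc-header}}) is the kind of edit a bot would make. Here is a minimal Python sketch for a single page's wikitext. It assumes {{rfc-header}} may carry a pipe-separated argument and sits on its own line, and it keeps the original header level (the number of equals signs). It is not the script that was actually run.

```python
import re

# Matches a header line such as ===Jyutping syllable=== at any level; the
# backreference \1 makes the closing equals signs match the opening ones.
HEADER_RE = re.compile(r"^(=+)[ \t]*Jyutping syllable[ \t]*\1[ \t]*$", re.MULTILINE)
# Assumed shape of the cleanup tag: {{rfc-header}} or {{rfc-header|...}}.
RFC_HEADER_RE = re.compile(r"\{\{rfc-header(?:\|[^{}]*)?\}\}\n?")


def fix_jyutping_headers(wikitext: str) -> str:
    """Replace the non-standard header with Romanization and drop the tag."""
    text = HEADER_RE.sub(r"\1Romanization\1", wikitext)
    return RFC_HEADER_RE.sub("", text)
```

Running the header substitution first and the tag removal second means a tag whose argument repeats the old header name is removed whole.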

Restricting editing of heavily-transcluded templatesEdit

Heavily-transcluded templates and modules have much in common with bots: they have the same capacity for widespread propagation of the consequences of errors, and the same potential for unforeseen side effects. Currently, though, we have no policy for limiting who can edit them. In practice, such templates and modules are all protected from edits by non-admins, but there's no requirement that admins know anything about coding, or that they can be trusted to use their knowledge in the best interests of the project. I'm not sure how to translate it into a vote, but I would like to propose that:

  1. We should designate any template or module that exceeds some agreed-upon threshold of transclusion-count and/or importance to be off-limits to edits by anyone not authorized to make such edits.
  2. We should create a list of those whom we authorize to edit high-importance templates and modules, based on both their demonstrated technical ability and their demonstration of the appropriate temperament/commitment to use that ability appropriately. Names should be added in a procedure analogous to that for approving a bot.
  3. As a side issue, perhaps we need to improve our protocols for approving non-emergency changes that might have significant and/or far-reaching effects, so we can be sure that they've been properly considered and have the proper consensus before being implemented.

Chuck Entz (talk) 23:34, 27 November 2013 (UTC)

Technically we already have a list of such people: WT:Administrators. Perhaps we need to revisit the process for promoting and demoting admins? --WikiTiki89 23:40, 27 November 2013 (UTC)
This would be a major step. There is a problem of constituting a panel that would be capable of intelligent oversight of proposed changes. Do we have enough technically competent, up to date (Lua/Scribunto; our language, category, conjugation, and context architecture) - and wise - active participants to make up a panel that would be available to make decisions in a reasonably timely manner? Are we willing to empower them and submit to the decisions they make?
I know that the project and I would have benefited from a go-slow on a recent change I made to a template transcluded on only about 7,000 pages (at least 14 queue jobs, as I understand it, though perhaps many more, with the total queue at 32,000 jobs). DCDuring TALK 00:13, 28 November 2013 (UTC)
Sorry, but I think this is totally the wrong approach. I'm not aware of any admins who are categorically incapable of making good edits, and I'm not aware of any admins who willfully make bad edits. Instead, what I see is a lack of consensus as to what constitutes a good edit, with the result that people sometimes make edits that others would object to. If we can formulate some sort of useful guideline either as to the edits or as to the process — e.g., "changes to templates/modules with thousands (or millions) of transclusions should be proposed beforehand" — then I hope that admins will willingly follow it.
For your bot analogy, note that we do allow all admins to grant themselves the flood flag for one-off bot tasks.
RuakhTALK 07:06, 28 November 2013 (UTC)
  • We obviously need a Modulespace Edit Assessment Panel which will authorize diffs, a Heavily Transcluded Template Editor Authorization Committee, and a Subcommittee for Editor Temperament and Commitment to file regular reports on Wiktionarians' behavior. Seriously, more red tape will just hinder productivity and goes against the wiki spirit of openness and assuming good faith. Problems such as bugs and script errors propagating to many pages can be addressed by mandatory unit tests and by Ruakh's proposal to forbid modules that generate visible script errors. Everyone makes errors, but as long as they clean up after themselves and learn from them, there is no problem. A much bigger issue, IMHO, is the lack of unified documentation for modules, which are slowly turning into a big framework whose "best practices" change all the time, so the only way to keep up to date is to read other people's code. --Ivan Štambuk (talk) 12:21, 28 November 2013 (UTC)
    Indeed. I see you have given the matter some thought. Someone like me would actually benefit from the impulse- and stupidity-control aspects of such committees, but perhaps no one else ever has such problems.
The acquisition and diffusion of accurate information about relatively costly processes would go a long way toward making it easier to reach consensus about desirable edits. By "costly", I mean consuming any scarce resource in amounts that significantly affect any aspect of the user experience. "Scarce resource" might include download bandwidth, user RAM, server resources generally, the "queue", software maintainer time, attention from MW technical personnel, etc. Preventing 'race conditions' also seems to be a concern.
Some evidence of consensus on technical direction and on what needs offline testing would be nice. DCDuring TALK 14:08, 28 November 2013 (UTC)
Is it not possible to add a review process for heavy modules and templates? I think there is already an extension for that, although its aim is to prevent vandalism (I think). Dakdada (talk) 14:32, 28 November 2013 (UTC)
What's worse: poorly architected, undocumented, or unmaintainable templates, modules, CSS, or JS (made with good intentions and a longish-term commitment to the project), or vandalistic edits to the same (made relatively quickly, but with bad intentions)? I think the first is harder to recover from. DCDuring TALK 14:45, 28 November 2013 (UTC)
The most important requirement is that they are good enough, i.e. that they work. Polishing (documentation, unit tests, review) can come later. I'd rather have 10x more poorly documented and occasionally buggy templates, modules and JS than a tenth as many that only a select few bother to edit according to some wasteful and complicated process. I understand the need to regulate edits to heavily used code, but barring some significant pattern of accidents, the proposals seem more like an impediment to a more natural evolution in incremental steps, as editors see fit. --Ivan Štambuk (talk) 19:31, 29 November 2013 (UTC)
@Darkdadaah: What is meant by "heavy" in this context? What is the module called? How does it work? DCDuring TALK 21:12, 29 November 2013 (UTC)
@Ivan: Interesting straw men.
The cases at hand are far removed from incremental evolution, which I imagine to be the result of solving problems on a language-by-language, script-by-script, heading-by-heading, or namespace-by-namespace basis. We are mostly talking about the forced conversion of such small-scale evolving templates, template systems, and modules into all-encompassing ones. Sometimes the conversion provides no current benefit whatsoever, except for uniformity, which is analogous to a reduction of genetic diversity, considered a major problem for biological populations. DCDuring TALK 21:12, 29 November 2013 (UTC)

Prepositional pronounEdit

While we're cleaning out Category:Entries with non-standard headers, we might as well deal with this one. This header is only used in Irish and Scottish Gaelic, where pronouns and prepositions fuse into a single word when they occur together. Possible options that come to mind are recognizing "Prepositional pronoun" as a standard header, using the "Contraction" header, or not giving the forms any header and instead including them as part of the inflection for the prepositions. Hebrew, with its prepositional prefixes and pronominal suffixes, has similar constructions, but I believe they're ignored as SOP. There's a category that should include all of them in its subcategories: Category:Prepositional pronouns by language. Chuck Entz (talk) 07:08, 28 November 2013 (UTC)

In Hebrew they're considered declined (inflected) forms of the prepositions, so we give them the header ===Preposition===, put them in the category Category:Hebrew preposition forms, and define them as form-ofs (specifically as "form of [word] including [gender/person/number] personal pronoun as object", which is unwieldy but is the best we could think of). I don't know any Celtic languages, but my secondhand understanding of their inflected prepositions was such that the same approach would be appropriate for them.
When I hear "prepositional pronoun" I think of our sense #2 (cf. prepositional case; like how an "accusative pronoun" is the accusative-case form of a pronoun), but if that's the standard term for Celtic preposition forms, then O.K.
RuakhTALK 07:25, 28 November 2013 (UTC)
Functionally, the Celtic words are like adverbs. So maybe they should be considered as such? Compare also Category:Dutch pronominal adverbs which are very similar. —CodeCat 14:36, 28 November 2013 (UTC)
I'm in favour of recognising them as a standard header. Prepositional pronouns, or what Wikipedia calls inflected prepositions, exist in all six Insular Celtic languages (Irish, Scottish Gaelic, Manx, Welsh, Breton, and Cornish). If speakers of Semitic languages wanted to use this as well, that'd be fine as far as I'm concerned, but at very least they should exist for the Insular Celtic languages. Maybe as a subcategory of both prepositions and pronouns? embryomystic (talk) 20:26, 2 December 2013 (UTC)
Sorry, but your comment makes no sense to me; it seems to be conflating words, entries, headers, and categories. What exactly are you proposing that we do? —RuakhTALK 21:38, 2 December 2013 (UTC)
I'm proposing that we (continue to) use "Prepositional pronoun" as a header (and, in fact, extend its use to the other Insular Celtic languages; I could have sworn I'd used it in Manx myself), and keep the entries in question in categories for prepositional pronouns in their respective languages. I have no real opinion on Semitic prepositional pronouns (I don't speak any Semitic languages), but if editors who work with languages possessing such things want to treat them similarly, I'd be in favour of that. Is that slightly clearer? embryomystic (talk) 20:57, 3 December 2013 (UTC)
Yes, much clearer, thanks.   Re: Semitic languages: for Hebrew at least, the header ===Prepositional pronoun=== would not be appropriate, since no one uses that term in reference to them. (As I noted above, for Hebrew we're just using ===Preposition===.) I'm not sure about the other Semitic languages, but I'm betting they're similar, since there's much more cross-pollination among linguists of various Semitic languages, and among linguists of various Celtic languages, than between the two groups. —RuakhTALK 22:45, 3 December 2013 (UTC)
I have noticed that we use ===Preposition=== for these in Hebrew and Arabic, but it has always seemed weird to me. I have also noticed ===Prepositional phrase=== used a few times, which also seems weird since it's one word. I like the sound of ===Prepositional pronoun===, but since no one calls them that, it "would not be appropriate" as Ruakh says. --WikiTiki89 23:07, 3 December 2013 (UTC)
Yes, it seems weird. We do that because with other parts of speech we list inflected forms under the main POS: English -ing forms, for example, we list as ===Verb===. —RuakhTALK 00:48, 4 December 2013 (UTC)
I'm also interested. How should, e.g. Arabic forms like معك (with you), له (to him), etc. be classified. I see no difference from the languages listed at prepositional pronoun but Arabic uses notion "enclitic pronouns", so these pronouns are attached to verbs as well, not just prepositions. --Anatoli (обсудить/вклад) 01:51, 4 December 2013 (UTC)
As stated above, in Hebrew we treat them as inflected forms of prepositions and thus use the header ===Preposition===, even though it seems weird. --WikiTiki89 02:07, 4 December 2013 (UTC)
I'd support changing it to ===Prepositional pronoun=== for languages that have them, and legalising that header. Using the ===Preposition=== header seems necessary only because the ===Prepositional pronoun=== header is not allowed (yet). I can't comment on Hebrew, since I don't know any Hebrew, but for Arabic ===Prepositional pronoun=== seems very appropriate. --Anatoli (обсудить/вклад) 02:27, 4 December 2013 (UTC)
Re: "Using ===Preposition=== header seems only necessary because ===Prepositional pronoun=== header is not allowed (yet)": Nonsense. You might as well say that the only reason we need ===Verb=== for English is that ===Gismu=== isn't allowed. —RuakhTALK 19:34, 4 December 2013 (UTC)
Perhaps German (and perhaps Dutch) pronominal adverbs could use another header or category? E.g. damit means "so that, in order that" but also "with it, with that, therewith" (da + mit), e.g. "ich schreibe damit" - "I write with it" (e.g. with this pen). Is it grammatically correct to mark them simply as adverbs? Hmm, just checked - Duden says "adverb". --Anatoli (обсудить/вклад) 02:37, 4 December 2013 (UTC)
They're widely called pronominal adverbs. Prepositional phrases aren't clearly distinguishable from adverbs anyway. They fulfill the same syntactic role, so the morphology is the only distinguishing feature. Just think about it: every prepositional phrase that uses a preposition of location can be replaced with and referred to using "there". The same applies to the inflected prepositions in Celtic, so I'm inclined to see them as adverbs too. —CodeCat 02:49, 4 December 2013 (UTC)
I see, thanks. I like the Dutch way of handling pronominal adverbs (e.g. ermee) more than the German one (e.g. damit); the German ones should perhaps be done like hiermit. --Anatoli (обсудить/вклад) 03:02, 4 December 2013 (UTC)
Prepositions function exactly the same way in Hebrew and Arabic (and Aramaic). --WikiTiki89 02:30, 4 December 2013 (UTC)
Thanks, I got it now. --Anatoli (обсудить/вклад) 02:37, 4 December 2013 (UTC)

Logogram

Another non-standard header, used only in English Braille (see Category:English braille logograms), as far as I can tell. Some writing systems such as Han characters are full of logograms, but the distinction isn't all that useful for them, while English Braille is an alphabet with logograms as an addition. Should we recognize this as a standard header, or is there another header that could be used? Chuck Entz (talk)

Wouldn't it be "symbol"? -- 65.94.78.70 05:40, 29 November 2013 (UTC)

Measure word

Yet another non-standard header. I'm quite familiar with this part of speech in Mandarin, but I believe it's common to all the Sinitic lects, and in neighboring languages as well. In English, we use units when we're applying quantity to mass nouns: a glass of water, a cup of flour, 100 grams of butter, etc. These units are measure words. In Mandarin, every noun has to use such a word, and the choice of which such word to use has more to do with semantic classification of the noun than with units of measure, so most of them aren't really true measure words. Head of cattle is about the only thing I can think of like it in English. Something long and thin will use one word, something flat another, birds another, people another, trees yet another, etc. If we want to be more linguistically accurate, then we should probably go with the Classifier header, which we're using for some southeast Asian languages. On the other hand, "Measure word" seems to be the choice of the editors we have working on Mandarin entries here. "Counter" seems to be the header of choice for Japanese and Korean words on Wiktionary, so that might be another choice if we want to be consistent among northeast Asian languages. Chuck Entz (talk) 08:39, 28 November 2013 (UTC)

I suggest ===Particle===. - -sche (discuss) 18:22, 28 November 2013 (UTC)
They are not particles at all. These languages also have particles. Note that with Mandarin, the term measure word is also incorporated as an additional parameter into {{cmn-noun}}, e.g. 拖拉机 displays measure word . --Anatoli (обсудить/вклад) 20:15, 28 November 2013 (UTC)
A "particle" is, according to our entry "a word that has a particular grammatical function but does not obviously belong to any particular part of speech", so to the extent that Autoformat didn't recognise "measure word" as a part of speech prior to this discussion, they certainly could have been classified as particles. The fact that Mandarin has other, dissimilar things which are also particles would not impede that; the "to" in English infinitives and the vocative "O" are dissimilar, yet both are particles, because "particle" is a catch-all. However, I have no objection to simply legitimating "measure word" as a part of speech. If that's what happens, someone may want to update these entries as well: Category:Bengali measure words. - -sche (discuss) 03:09, 30 November 2013 (UTC)
Terms "measure word", "counter" and "classifier" are all synonyms, none of them better than the other. To me, "measure word" is unambiguous. Obviously Mandarin, Japanese and Korean editors worked independently from each other. Just need to assess the scope and check with appropriate editors. It's better to have consistent names but I don't remember seeing "measure word" as a header for Mandarin, only in the noun templates. --Anatoli (обсудить/вклад) 01:40, 29 November 2013 (UTC)
See w:Measure word. Using measure word in cases where it's referring to numbers of countable things rather than amount of something uncountable is technically wrong, but it's the term of choice in many authoritative works on Chinese. Chuck Entz (talk) 19:58, 29 November 2013 (UTC)
There are definitely inconsistencies across the East Asian languages on WT. Japanese measure words are usually called counters (助数詞 josūshi) and an example of this can be found here: . As you can see, a non-standard heading 'counter' is used here. It is a unique PoS in these languages, serving the same purpose, and there should be a unified heading to denote it. I would support Anatoli's suggestion (measure word). JamesjiaoTC
Thanks, but I actually haven't said that I prefer the header "measure word" :) . I just think it's less ambiguous. There is no category for Mandarin "measure words", "counters" or "classifiers", though, but the parameter is in use, as seen in 拖拉机. --Anatoli (обсудить/вклад) 03:30, 29 November 2013 (UTC)
See Category:Measure words by language Chuck Entz (talk) 19:58, 29 November 2013 (UTC)
  • What's so "non-standard" about this header? I don't quite understand. These words are measure words, plain and simple (or classifiers, if you so desire, but they both mean 量词). Surely you are not suggesting we just classify them as nouns instead? How would that help the project? There is no Chinese dictionary on this earth that classifies them as such (i.e. as 名词). ---> Tooironic (talk) 04:59, 29 November 2013 (UTC)
There just needs to be some policy set in stone. I don't see Japanese counters marked as badly formatted or something. Indeed, why are they non-standard? Japanese counters are better known than Lojban gismu but gismu heading is not considered "non-standard". --Anatoli (обсудить/вклад) 05:52, 29 November 2013 (UTC)
It's non-standard only in the Wiktionary-specific sense: we just haven't made a decision to accept it as a valid header. I brought this up so we could either do so, or decide on an alternative. According to w:Measure word, it's technically a misnomer, but we can ignore that based on the widespread usage in works discussing Chinese grammar. Chuck Entz (talk) 19:58, 29 November 2013 (UTC)
  • As mentioned above, a counter word in Japanese is a unique pos called 助数詞 in Japanese. I suppose that it would not be inaccurate to change the header to "measure word" but it would be unconventional. So far I've only heard the word "counter" used to name them. Probably everyone is aware of this but just to avoid any confusion Japanese is not a Sinitic lect. Haplogy () 12:37, 29 November 2013 (UTC)
    On the one hand, it makes sense to use the terminology that's in widespread use in other references, so people can make the connection between our content and the content they find elsewhere. On the other hand, it would be nice to reflect current linguistic understanding and to be consistent across languages: as it stands now, we can't use our normal category structure to include Chinese, Japanese and Khmer examples of what's essentially the same POS, and the term "measure word" as it applies to English is more restricted in scope than as it applies to Chinese. The considerations are mutually incompatible, so we need to come up with a trade-off that everyone can live with; the simplest is to just accept Measure word as a valid header. We don't always go in that direction, as evidenced by our use of "Determiner" instead of "Article", and "Noun" instead of "Substantive" for languages where the main references disagree with us. Chuck Entz (talk) 19:58, 29 November 2013 (UTC)

Under current rules, they should be listed under a noun heading with the context label "classifier/counter word", e.g.:

===Noun===
# root
# origin, source
# basis
# book, notebook
# {{classifier|for books, brochures, pamphlets}}

The ideal format is, however,

===Definitions===
# (n.) Root of plants.
# (n.) Origin, source.
# (adj./adv.) Original; originally, initially.
# (n./pron.) Self; this.
# (adj.) Current, at present.
# (adj.) Main, central.
# (prep.) According to, in accordance with.
# (n.) Capital money.
# (n.) Book; booklet, brochure, notebook, pamphlet.
# (c.) ''Classifier for books, brochures, pamphlets.''

Wyang (talk) 02:04, 30 November 2013 (UTC)

It's a good point that measure words/counters can also be nouns, or used to be nouns. In many cases, though, these words no longer have these meanings, e.g. knowing that Japanese or [Chinese / were nouns is only interesting from the etymological point of view.
It would be great if we reached a consensus on the naming. This part of speech is common in many unrelated Asian languages. --Anatoli (обсудить/вклад) 02:08, 1 December 2013 (UTC)
For most East Asian languages, classifier is the best term. Counters (助数詞) are classifiers used only after a numeral. In Mandarin, you say 一 and 那 while in Japanese you say 一 but you say just あの. — TAKASUGI Shinji (talk) 05:29, 4 December 2013 (UTC)
  • Wyang, why is this arrangement "ideal"? According to whom? (For the record: I'm not against it, I just didn't realise this had already been "deemed" ideal by consensus.) ---> Tooironic (talk) 21:42, 12 December 2013 (UTC)
The measure word vs. classifier vs. counter header and PoS question has been brought up again, after I moved the non-standard header ===Measure word=== to ===Counter===. See Talk:笔. --Anatoli (обсудить/вклад) 05:33, 1 April 2014 (UTC)

abbreviations with periods

Why are abbreviations with periods being deleted from Wiktionary? They are valid forms of the abbreviations. Not everyone lives in the UK, where full stops have been eliminated from abbreviations. Look up an old book from Britain from before the War (pre-WWII), and you'll see full stops and spaces between the letters of abbreviations. Why would these forms be invalid for inclusion in Wiktionary? If you look in pre-Vietnam War-era books, abbreviations generally included periods (".") a.k.a. full stops. -- 65.94.78.70 05:23, 29 November 2013 (UTC)

Deleting abbreviations with periods is wrong. But I have not seen this happening myself, if you could give some examples, that would help. --WikiTiki89 16:29, 29 November 2013 (UTC)
This user has been creating lots and lots of alternative-form pages for somewhat-marginal terms. If you look at diff, you'll see that I deleted the alt-forms with periods, which was probably an error. I just didn't see the need for that many alt-forms for a term that few will ever be looking up, and the variants with periods seemed like the least necessary of the batch. Chuck Entz (talk) 16:54, 29 November 2013 (UTC)
Clearly this is some form of sexism, since only female abbreviations have periods. bd2412 T 18:04, 29 November 2013 (UTC)
Another factor: After looking through lots of search results in Google Books and Google Groups, almost all the usage is in the format of a three-letter abbreviation for "Markarian" followed by a space, then by a number. I believe there were a few with a hyphen instead of a space, but not a single one with the abbreviation followed by a period. If we do restore the variants with periods, I intend to immediately submit them to rfv, and I fully expect them all to fail. We don't do alternative-form entries unless the alternative forms are in actual use, per WT:CFI. Chuck Entz (talk) 19:13, 29 November 2013 (UTC)
  • I think that abbreviations should be listed as alternative forms of the main entry, not as synonyms. Mutually linking individual abbreviations of the same term as each others' synonyms seems redundant as well. --Ivan Štambuk (talk) 19:15, 29 November 2013 (UTC)
    I second that. Abbreviations and alternative spellings are not synonyms. --WikiTiki89 19:17, 29 November 2013 (UTC)

template: "also"

Shouldn't all pages for variants in capitalization, punctuation, etc. always carry the {{also}} template? If not, why do we bother with template:also? Right now, many of the {{also}} transclusions I encounter are not complementary: they each link to a different set of pages. I suspect people would not think about what needs to be updated when new entries are added unless all the pages were linked together through the "also". (For example, a spelling with a diacritic may have an "also" referring only to the diacritic-less lowercase version, while the "also" on the all-caps version has two or three entries, and the "also" on the lowercase diacritic-less page has five entries, which do not link to the all-caps or the other diacritic version, but to five other pages.) Searching for variant versions to add to an "also" is practically impossible without entering every single type of variation manually, so if someone adds another variant, only the base diacritic-less lowercase version gets updated. If deleting the "also" is standard practice because it is not always used, why bother maintaining it at all, since the various variants would be quite difficult, if not impossible, to find unless it is maintained all the time instead of only part of the time?

-- 65.94.78.70 05:39, 29 November 2013 (UTC)

Not every page. The template is for helping people find entries if they type in spelling variants. Having the template on the main entry with links to nothing but spelling variants of that entry isn't helpful at all: no matter what you click on, the page you go to will refer you back to the page you just came from. You also don't need to include every possible punctuation and capitalization variant: the idea isn't to be systematically thorough, but to make sure that whatever variation users are likely to type into the search box will allow them to find the page. Chuck Entz (talk) 06:08, 29 November 2013 (UTC)
{{also}} is intended to address differences such as among Á, À, Â, Ä, Ã, Å, Ą, Ả, Ă, Ā, Ạ, and Æ, and/or between A and a. Words that differ in these regards are often in different languages or otherwise have no semantic relationship. Consequently {{also}} appears above any L2 (language) heading.
In some cases all of the variants are in the same language and should be captured by a listing under alternative forms in a specific L2 section. This would often be the case with differences in periods, hyphens, and sometimes case, especially the all-caps variants.
HTH. DCDuring TALK 18:26, 29 November 2013 (UTC)

Hypothetical inflected forms

Does any language mark these in inflection tables? I'm referring to specific inflected forms in an otherwise valid paradigm. For example, the vocative case of nouns denoting inanimate objects or abstract nouns (unless personified) is generally never attested. Should these be marked with an asterisk in the table, or should it be left to the reader to decide what is a plausible inflected form and what is merely hypothetical? --Ivan Štambuk (talk) 15:19, 30 November 2013 (UTC)

I could see that information about what authors in a language considered semantically impossible could be somewhat useful, so an asterisk might provide a hint. But, depending on genres and on style practices in the corpus, there can be underrepresentation of, say, second-person forms of verbs, that are spurious with respect to the language as a whole. Is there a way to do this that is worth the effort? DCDuring TALK 15:36, 30 November 2013 (UTC)
I haven't seen it for languages other than Gothic, mainly because the effort of searching through the corpus would be prohibitively high where that corpus is not a single book. In Latin, vocatives for inanimate nouns do come up, as well as other odd forms like plurals of uncountable nouns, and checking for them every time would be ridiculous. —Μετάknowledgediscuss/deeds 16:45, 30 November 2013 (UTC)
If we do it at all, it would only make sense to do it for languages that aren't used anymore. Things like case forms and verb inflections are so productive that even if they are not attested, there is no doubt that they exist in the minds of the speakers themselves. Any speaker can call upon it to use whenever they feel like it, and nobody would think twice about that. For example, if someone creates a new verb in Dutch and we can only find attestations for the present tense, we can still show the past tense and trust that it's correct, because what else could it be? Would Dutch people somehow need to ponder hard before using the past tense when they haven't heard of or used it before? No, they'd just use the same regular past tense formation that they use for all other verbs. To say "this verb has no past tense because there are no attestations for it" would be silly and not helpful at all. Productive inflections, as long as the inflectional pattern of the lemma is known, should be exempt from attestation rules. —CodeCat 17:11, 30 November 2013 (UTC)
Re: "Productive inflections, as long as the inflectional pattern of the lemma is known, should be exempt from attestation rules." I disagree with that, now as before. --Dan Polansky (talk) 17:18, 30 November 2013 (UTC)
Why? Can you give an example of when that would lead to something undesirable? —CodeCat 17:25, 30 November 2013 (UTC)
We've had this discussion before. I find misleading the reader about the existence of forms undesirable. I do not want non-existing forms represented as actually existing. That pertains to derivations as well: -ness forms ("blueness", "bigness") should be included only if they are attested. With -ness forms or -er forms (agent nouns), I don't recall people suggesting that we create unattested but plausibly formed entries in the mainspace, but for inflected forms, you apparently feel otherwise. In your arguments, you fail to make a distinction between a predictably obtained inflected form and a predictably obtained derived lemma, as in this sentence of yours: "Would Dutch people somehow need to ponder hard before using the past tense when they haven't heard of or used it before?". If that rhetorical question is meant in earnest as an argument, it would support our having unattested -ness forms as well. --Dan Polansky (talk) 19:13, 30 November 2013 (UTC)
I feel the same, though in my own editing I have bowed to consensus and added entries for the occasional unlikely form. I don't think we should add words that don't exist (in use, or in corpora), even if they might conceivably exist. Same as for other parts of speech like, say, the noun unwhippableness. Equinox 17:27, 30 November 2013 (UTC)
I think you're conflating different issues. No-one is suggesting that we create entries for unattested lemmata; only unattested inflected forms that have no reason not to become attested, once their lemma form is attested. —Μετάknowledgediscuss/deeds 18:07, 30 November 2013 (UTC)
In grammar, the term "defective" is used when parts of an inflectional paradigm are missing, when they would otherwise be expected to exist (semantic reasons excepted). I think it's quite a reasonable assumption that, unless proven otherwise, a word is not defective. What's open to discussion is what is part of an inflectional paradigm and what is a separate lemma. That is, which words can be expected to exist (and their absence would make the paradigm defective/irregular) and which would not necessarily be expected to exist? This differs for each language. In Dutch, diminutives could be considered part of a noun's inflections, but certainly not in many other languages. —CodeCat 18:23, 30 November 2013 (UTC)
Meta: but unattested inflections of verbs are what I said: "words that don't exist (in use, or in corpora), even if they might conceivably exist". Dutch diminutives of nouns would be yet another example. Equinox 19:07, 30 November 2013 (UTC)
FWIW, latrate has marked its third-person singular and its past tense form with asterisks ever since Doremítzwr added it. I'm not aware of other entries that do that, and I imagine someone will eventually update latrate to not do that.
Regarding "productive inflections", I think it appropriate to distinguish between individual slots in inflection tables that don't happen to be attested, and entire sections. If I could only find two citations of mitternachtsblauen as the neuter mixed genitive form of mitternachtsblau, I'd still list it in mitternachtsblau’s inflection table if enough other forms were attested (e.g. feminine weak dative mitternachtsblauen, masculine strong nominative mitternachtsblauer, plural mixed accusative mitternachtsblauen, etc) that it was clear mitternachtsblau inflected. As Metaknowledge notes, it'd be prohibitively hard to do otherwise; users would need 156 citations to attest a single table, a good understanding of German grammar to know which of the 26 homographic grammatical slots each citation of mitternachtsblauen supported, and different tables for every German adjective that was missing a slot, based on which slot was missing. On the other hand, if a noun is never attested in the plural, we should describe it as uncountable; if an adjective is never attested in the comparative or superlative, we should describe it as incomparable; etc. - -sche (discuss) 18:38, 30 November 2013 (UTC)
Talk:horreo may also be of interest. - -sche (discuss) 18:40, 30 November 2013 (UTC)
As for practicality of requiring attestation of forms, it should proceed on the same principle as the attestation of lemmas, namely that a request for attestation should be made only if attestation is actually in doubt. I do not see any difficulties here. --Dan Polansky (talk) 19:10, 30 November 2013 (UTC)
That would require us to loosen our attestation requirements for some languages. Spanish, for example, would have to be a "limited documentation language", because there are many parts of the conjugation that are uncommon. Are you O.K. with that? —RuakhTALK 19:32, 30 November 2013 (UTC)
I do not think we would need to loosen attestation requirements. If a Spanish inflected form is not 3-attested, then it isn't; that is not a big deal. But as a matter of practical consequences, I admit that marking hypothetical forms in inflection tables may be impractical; what is not impractical, though, is keeping hypothetical inflected forms in the mainspace but marking them as hypothetical on their own pages once they are challenged and fail RFV. For some forms, their being unattested will be obvious even without a RFV. --Dan Polansky (talk) 20:48, 30 November 2013 (UTC)
If you're marking some forms as unattested, you're implying that unmarked forms are attested. Some of that kind of confusion is unavoidable in an unfinished work-in-progress such as ours, but in this case there's not much chance that more than a vanishingly-small percentage of the entries will have correct information anytime soon. Chuck Entz (talk) 02:30, 1 December 2013 (UTC)
I think marking certain inflected forms of valid paradigms as insufficiently attested would mislead more readers, and mislead them in a more harmful way, than not marking such forms would. For example: someone who was learning German and who was about to use an adjective (foobar) in a sentence could turn to Wiktionary to double-check that the ending applied to a neuter adjective in the nominative after ein is indeed -es and not -e (as it would be after das). If this dictionary told them foobares was not attested in that context, I think the odds are slim that they would grasp that such a statement signified only that at the time some Wiktionary users checked, insufficiently many books using the word in that case had been digitised by Google. I think the odds are better that they would conclude that they had to use some other form, and thus they would end up writing something ungrammatical.
I can conceive of linguists who might want to study whether certain forms of valid paradigms actually occurred — they might want to see, for example, whether or not speakers actually used certain semantically complex colour adjectives in the dative — but such people should, and almost certainly would, do their own research using well-defined corpora, rather than copy the results of Wiktionary RFVs. - -sche (discuss) 17:58, 4 December 2013 (UTC)
I agree with -sche (specifically, his comment of 18:38, 30 November 2013 (UTC)). —RuakhTALK 19:32, 30 November 2013 (UTC)
There aren't that many terms in ancient languages that are attested for the entire paradigm, especially for rarer parts of the paradigm such as the dual in Greek. If we're going to start insisting that unattested, but predictable forms be marked, that's going to conflict with our use of templates to create inflection tables, and our use of bots to create form-of entries based on those tables. Besides, not every ancient language has the kind of easily-accessible resources necessary to verify that a given form really is unattested.
If we deal with it on a case-by-case basis, marking will be more an indicator of which terms people are interested in than of the distribution of attested vs. unattested forms. If we insist on including the information when things are added, we'll just end up with fewer people who are willing to work with those languages. Chuck Entz (talk) 19:41, 30 November 2013 (UTC)

es-verb-form redundancy discussion

Please see Wiktionary talk:About Spanish#es-verb-form redundancy. I thought that would be the best place to discuss a language specific issue. Mglovesfun (talk) 19:19, 30 November 2013 (UTC)

Distinguishing languages which have identical names: first by country? by region? or by family?

From time to time, it happens that English calls multiple languages by the same name. Because Wiktionary's system of categories, headers, etc requires languages to have distinct names, our first recourse (as documented in WT:LANG) is to see if one or both can be called by another name.

If neither language has another name, Wiktionary's next recourse has been to distinguish them by their home countries: hence "Bodo (India)" and "Bodo (Central African Republic)", and "Mono (United States of America)", "Mono (Democratic Republic of the Congo)" and "Mono (Cameroon)". If two languages share both a name and a nation, then we use family membership to tell them apart: "Austronesian Mor" vs "Papuan Mor". (This is also Ethnologue's practice. It's not spelled out in WT:LANG, but it's demonstrated by WT:LANGLIST.)

On my talk page, WikiTiki suggested using regions rather than countries to distinguish languages, because countries change. For the Monos, that would mean "Mono (America)" and "Mono (Congo)" or "Mono (Africa)". Personally, I prefer country names, and I'll lay out my reasons in a while.

However, WikiTiki's post did prompt me to wonder if we should make family info our second recourse rather than our third. If we did, only a few lects would be left that had to be distinguished by place (mostly because the family of one or both was unclear, rather than because they shared a family). What do you think? Would you prefer to distinguish languages by place first and family second, or vice versa? And when it's time to distinguish languages by place, do you think country names would be better, or region names?
- -sche (discuss) 23:19, 30 November 2013 (UTC)

Or just use the zero-width joiner so they all look identical but are different from a Unicode standpoint. -- Liliana 23:24, 30 November 2013 (UTC)
Clearly we should just give each language a unique number and stop worrying about name collisions. DTLHS (talk) 00:06, 1 December 2013 (UTC)
The word polity could be used in place of the word "country". --Lo Ximiendo (talk) 23:26, 30 November 2013 (UTC)
What exactly are we using the language names for? We have language codes to provide a unique key for our data structure to keep languages separate, so if we're using the language names as unique keys, too, we're just being redundant. My impression is that English-language language names are for the purpose of letting humans distinguish between languages, not computers. Humans are the ones who use the same names for different languages, and they won't add invisible codes unless you go to a great deal of trouble to educate and police them. By all means, let's avoid ambiguity in language names- but for the sake of humans, not computers. Chuck Entz (talk) 01:47, 1 December 2013 (UTC)
Right now we're using the language names in approximately every single entry. It seems like getting rid of explicit language names absolutely everywhere should be a separate discussion (good luck with that). DTLHS (talk) 01:56, 1 December 2013 (UTC)
My point wasn't that we shouldn't use language names or that we shouldn't keep them unique, just that you can't have categories that are indistinguishable to humans. If someone manually adds a category name, how are we going to keep them from typing in the name without the zero-width joiner for the language that has one in its name? It seems like a very efficient way of making our contributors feel dumb, but not a very effective way of making sure that things get put in the right category. Chuck Entz (talk) 02:14, 1 December 2013 (UTC)
I assumed Liliana was joking, because the idea would be unworkable for the reasons you outline. When Wiktionary one day contains all the world's languages' words and no-one needs to type language names any more, then it might be a decent idea. - -sche (discuss) 02:22, 1 December 2013 (UTC)
Actually, I implemented that solution once. It worked fine until I was called out by RU. -- Liliana 21:11, 1 December 2013 (UTC)
(after edit conflict) The reason I prefer region names to country names is that country names (and borders) change all the time, while region names generally don't. Also, countries refer to specific land between specific borders, while regions are more general and usually don't have fixed borders, which allows their names to be used for languages that spill across the borders of countries.
@DTLHS, That's what the language code does, but we still want to have unique names as well.
@Lo Ximiendo, that is not relevant, we do not use the word "country" in language names anyway, the question here is whether we should use names of countries (or polities, if you prefer), rather than names of regions.
--WikiTiki89 01:54, 1 December 2013 (UTC)
When place-names are used, I'd prefer they continue to be country names rather than region names, because I think country names are better disambiguators. I recall only three cases where lects had the same name and home country, viz. the Mors, two of the Maris, and two of the Karas, of which the first two were solved by using family info and the last by alternate names. In contrast, it's probably more common for same-name languages to come from the same region. For example, two of the Monos are from Africa, and both the DRC and its neighbour the CAR have Ngombes. The Ngombes in particular can't be distinguished any other way, AFAICT. They can't be distinguished by family info, because it's not clear which family one of them belongs to, even though it is clear it's not the Bantoid family the other one belongs to. And what regions could distinguish them, "Central Africa" and "Congo"? But the Congo is in Central Africa.
And that's just the languages we already encode: as more languages are encoded, I expect that a lect called "Foobar (South America)" is more likely to have to be renamed following the recognition of another South American Foobar than a lect called "Foobar (Chile)" is to have to make way for a second Chilean Foobar.
Lastly, "countries" is a well-defined, almost-closed set whose members have agreed-upon names. Regions are murkier; there's room for disagreement (is a language spoken in Iraq from "Mesopotamia"? the "Middle East"? "Western Asia"? "Asia"?) and for simple inconsistency: I foresee that the Cameroonian Mono might end up being "Mono (West Central Africa)" to distinguish it from the DRC's Mono, yet another Cameroonian lect that only had to compete with an Indonesian homograph might end up being just "Foobar (Africa)".
Countries don't change often enough to cause us problems, IMO. - -sche (discuss) 03:10, 1 December 2013 (UTC)

(after edit conflict) There are problems with each of the criteria:

As I'm sure you're aware, languages in less-accessible parts of the world are often surprisingly poorly studied, and language classification can change as linguists debate how much weight to give to vocabulary, which might be borrowed, or typology, which might be the result of substratum or areal influence (not to mention philosophical and methodological changes as the lumpers/splitters and mass-comparisonists/comparative reconstructionists battle it out over the years).
Regions can be problematic because of areas like Southeast Asia and Sub-Saharan Africa with huge concentrations of individual languages and little awareness of languages outside of their immediate area, so duplication within a region is much more likely.
Countries do change names and borders, but beyond that, language geography only loosely correlates with political boundaries in places without strong national institutions/culture (and mass population movements don't help), so country names may not mean much.
I think it boils down to this: what's going to make the most sense to the people using the names? Are the contributors who edit in such languages more likely to recognize their languages by the family, or by the geography? If they're (amateur or professional) linguists, they'll know the language family, but may have to look up the geography. Native speakers and people who learned from them know the geography, but may have no clue about classification. Or do we cater to the general users: what's important to them? Chuck Entz (talk) 04:20, 1 December 2013 (UTC)
Countries don't change often enough to cause us problems – we have major problems in knowing what we mean when we talk about languages, dialects, and regionalisms, partly because we naturally conflate regions with states. American English came about before a United States of America was ever conceived, and British English is not defined by nor constricted to within the borders of the United Kingdom of Great Britain and Northern Ireland.
Every case is different, but we should use geographic or ethnic names where possible, instead of names of states.
Since we are an etymological and historical dictionary, we should also try to use names that make a bit of sense when applied over the period of languages’ existence, which may encompass their native written and spoken record, third-party documentation of them, and reconstructions of historical forms. Need our readers and editors guess whether Mono (Democratic Republic of Congo) is the same language as Mono (Congo Free State), Mono (Belgian Congo), Mono (Republic of the Congo), or Mono (Zaïre)? Michael Z. 2013-12-04 21:42 z
I'd be very wary of conflating language with ethnicity.
What do you think of the suggestion of using linguistic family names more often? E.g. "Chadic Foo" vs "Senegambian Foo"? We already use them when placename-based disambiguation fails.
If we were to use region names, they would end up being country names in at least some cases, I think — unless you can think of recognisable regions to put, say, Cameroon and Nigeria in, to distinguish Cameroon's Mada from Nigeria's Mada. - -sche (discuss) 01:02, 5 December 2013 (UTC)
I only mentioned ethnicity as a fallback, because a few languages are not strongly associated with countries, like maybe Yiddish, Roma, or Mennonite Low German (Plautdietsch). I have no idea if this is relevant to any languages that need disambiguation of their names. Michael Z. 2013-12-07 04:22 z
I don't think there is anything wrong with using a country as a region. The thing that I really think should be avoided is using the official political name of a country. For example, in "Mono (Democratic Republic of the Congo)" the words "democratic" and "republic" have absolutely no role in distinguishing the language. I don't see why we can't just call it "Mono (Congo)", which is more concise and readable and still accurate even if the Congo decides to change from a democratic republic to a republican democracy. --WikiTiki89 01:15, 5 December 2013 (UTC)
Yup, that’s exactly what I meant. Michael Z. 2013-12-07 04:16 z
I just want to note that, as part of my ongoing effort to ensure that all ISO-recognised codes are either included or documented as excluded, I have recently added a number of hitherto unuploaded codes. Many unuploaded codes belong to languages that have the same name as other languages — it seems the bots that imported most of the codes to Wiktionary in its early days skipped cases of naming conflict. In some cases, it has been possible to find alternative names to resolve the naming conflicts prior to uploading the codes, but in other cases, it has not, and I added the languages using country name disambiguators, as that has traditionally been our and Ethnologue's first go-to method of disambiguation. I am not trying to make more work for anyone if we decide to switch to family or region names; I'll rename the entries myself. I simply recognise that (as someone commented recently in a thread I can no longer find) if work (in this case, the verification and importation of codes) is put off, around here, it often ends up being put off forever. - -sche (discuss) 01:02, 5 December 2013 (UTC)

Identical names for Tatars:

  1. Modern Tatars from Tatarstan, Russia
  2. Mongolian tribes that attacked Russia together with the Mongols in the 13th century.

The original name was later transferred to Turkic peoples generally and finally stuck to a single Turkic people, one of the peoples living on the Volga river. --Anatoli (обсудить/вклад) 01:12, 5 December 2013 (UTC)

  • Following up on this discussion, here are all of the languages which are already or should soon be in Module:languages which are currently disambiguated by country. After each entry are my notes, including my suggested place-based renames. Please let me know if you agree, or disagree, or have other suggestions. - -sche (discuss) 06:58, 17 December 2013 (UTC)
I'd rather we didn't use (America) as a parenthetical. If "United States of America" is to be changed, how about "United States", "US", or "North America" instead? --Prosfilaes (talk) 07:21, 17 December 2013 (UTC)
The weak consensus/plurality above seems to be to use region names rather than country names, which would seem to rule out "United States". But it's a good point that we'd be better off specifying "North America" (because there are also a lot of languages in South America). - -sche (discuss) 09:25, 17 December 2013 (UTC)
Perhaps we could call it "Mono (California)", which is more specific and avoids the wordiness of "North America". California was still California before it was a state, and will still be California if it somehow ceases to be a state. --WikiTiki89 13:30, 17 December 2013 (UTC)
Either that, or Mono Paiute. Chuck Entz (talk) 13:52, 17 December 2013 (UTC)
I have effected all of the renames I suggested above. I have also renamed all of the languages which used "Central African Republic" as a disambiguator, either to use completely different names that don't require disambiguation, or to use "Central Africa" (as contrasted with West Africa). - -sche (discuss) 05:17, 23 December 2013 (UTC)