This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Beer parlour archives


June 2012

Cognates of parts of a compound in etymologies

I just made an edit to overmorrow, removing a long list of cognates diff that seemed to overly complicate and obscure the etymology. I don't think there really should be any cognates listed for the parts of a compound word (cognates of the compound are fine, which I added). But I can't find anything about this on WT:ETY so I wonder is there a rule or guideline about this? Should there be one? —CodeCat 00:32, 2 June 2012 (UTC)

I would just use my discretion. I was going to cite the cranberry example, but I'm sure we have some coverage at cran-, or soon will. DCDuring TALK 15:45, 2 June 2012 (UTC)
I tend to remove them and add to the lemma form. This is really just one facet of the debate on how much to include in etymologies, I'd say removing cognates in this sort of situation is as uncontroversial as it gets on this topic. Mglovesfun (talk) 16:42, 2 June 2012 (UTC)
FWIW, I agree. Ƿidsiþ 05:43, 17 June 2012 (UTC)

Update on IPv6

(Apologies if this message isn't in your language. Please consider translating it, as well as the full version of this announcement on Meta)

The Wikimedia Foundation is planning to do limited testing of IPv6 on June 2-3. If there are not too many problems, we may fully enable IPv6 on World IPv6 day (June 6), and keep it enabled.

What this means for your project:

  • At least on June 2-3, 2012, you may see a small number of edits from IPv6 addresses, which are in the form "2001:0db8:85a3:0000:0000:8a2e:0370:7334". See e.g. w:en:IPv6 address. These addresses should behave like any other IP address: You can leave messages on their talk pages; you can track their contributions; you can block them. (See the full version of this announcement for notes on range blocks.)
  • In the mid term, some user scripts and tools will need to be adapted for IPv6.
  • We suspect that IPv6 usage is going to be very low initially, meaning that abuse should be manageable, and we will assist in the monitoring of the situation.
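For authors of user scripts and tools, one practical wrinkle is that the same IPv6 address can be written in several textual forms (leading zeros dropped, a run of zero groups collapsed to "::"), so comparing raw strings is unreliable. A minimal Java sketch of comparing parsed addresses instead, assuming only the standard java.net.InetAddress class; the class and method names are hypothetical and not part of any Wikimedia tool:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, for illustration only.
final class Ipv6Compare {
    // Parses an IP literal. When given a literal address, getByName only checks
    // the validity of the format; a non-literal string would trigger a lookup.
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + literal, e);
        }
    }

    // Two spellings denote the same address when their parsed forms are equal.
    static boolean sameAddress(String a, String b) {
        return parse(a).equals(parse(b));
    }

    public static void main(String[] args) {
        String full = "2001:0db8:85a3:0000:0000:8a2e:0370:7334";
        String shortened = "2001:db8:85a3::8a2e:370:7334";
        System.out.println(full.equals(shortened));       // false: the strings differ
        System.out.println(sameAddress(full, shortened)); // true: same 128-bit address
    }
}
```

The point of the sketch is the design choice rather than the helper itself: normalise or parse an address before comparing, logging, or matching it against a block list.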

Read the full version of this announcement on how to test the behavior of IPv6 with various tools and how to leave bug reports, and to find a fuller analysis of the implications of the IPv6 migration.

--Erik Möller, VP of Engineering and Product Development, Wikimedia Foundation 00:53, 2 June 2012 (UTC)

Distributed via Global message delivery. (Wrong page? Fix here.)

The pages on this might be a little inaccessible for the administrator who just wants to know "How do I block these new-fangled long IP address things?". I've therefore started an alternative — designed as a user guide rather than project notes — page at m:User:Jonathan de Boyne Pollard/Guide to blocking IP version 6 addresses. It's on Meta because, of course, it's not only the administrators here on this particular project that are going to be affected by this. Jonathan de Boyne Pollard (talk) 03:37, 5 June 2012 (UTC)

Here's a v6 IP: Special:Contributions/2A01:E35:8AAF:B70:3DEC:3BFC:B001:26C5. Seems to be able to edit, and to be editing normally and helpfully. That's good. The WHOIS/Geolocate pages don't work, though. - -sche (discuss) 04:08, 10 June 2012 (UTC)
And we've also already had a fake-IPV6 user name pop up: .fedc:6482:cafe:ba05:a200:e8ff:fe65:df9a (talk • contribs) (now blocked). Chuck Entz (talk) 04:21, 10 June 2012 (UTC)
The WHOIS link doesn't work at all for any IP, but Geolocate definitely is having problems caused by the new IPs. The other links seem to work, though: RIPE narrows it down to France, if I'm reading it right. Chuck Entz (talk) 04:38, 10 June 2012 (UTC)
Could any administrator replace the link in Template:anontools with [{{{1}}} WHOIS]? This should work with both IPv4 and IPv6. --Whym (talk) 06:39, 10 June 2012 (UTC)
How does it work? I just put in an IP address that recently edited, and the page seemed to just be a mirror of the ARIN page (which {{anontools}} already links to, as "America"). If I had put in a non-American IP address, would it have recognized that and done something different instead? —RuakhTALK 13:13, 10 June 2012 (UTC)
I don't think they are much different. The 'Regional' ones may add more detailed information, as Chuck Entz mentioned above. If I recall correctly, the first link on the template had been to simply provide what a whois command gives, and the link to the Toolserver tool would restore that function in the most visible place in the template. Having said that, I agree that the list might need to be revamped to reduce redundancy. --Whym (talk) 16:43, 10 June 2012 (UTC)

2011 Picture of the Year competition


Dear Wikimedians,

Wikimedia Commons is happy to announce that the 2011 Picture of the Year competition is now open. We are interested in your opinion as to which images qualify to be the Picture of the Year 2011. Any user registered at Commons or a Wikimedia wiki SUL-related to Commons with more than 75 edits before 1 April 2012 (UTC) is welcome to vote and, of course, everyone is welcome to view!

Detailed information about the contest can be found at the introductory page.

About 600 of the best of Wikimedia Commons' photos, animations, movies and graphics were chosen by the international Wikimedia Commons community out of 12 million files during 2011 and are now called Featured Pictures.

From professional animal and plant shots to breathtaking panoramas and skylines, restorations of historically relevant images, images portraying the world's best architecture, maps, emblems, diagrams created with the most modern technology, and impressive human portraits, Commons Features Pictures of all flavors.

For your convenience, we have sorted the images into topic categories.

We regret that you receive this message in English; we intended to use banners to notify you in your native language but there was both human and technical resistance.

See you on Commons! --Picture of the Year 2011 Committee 18:13, 5 June 2012 (UTC)

Distributed via Global message delivery. (Wrong page? Fix here.)

—This unsigned comment was added by EdwardsBot (talkcontribs).

LDL template

See also: Wiktionary:Votes/2012-04/Languages with limited documentation

With the LDL (languages with limited documentation) vote set to pass later today, I thought I'd point out a line in the vote that requires an LDL template:
"a box explaining that a low number of citations were used should be included on the entry page (such as by using the {{ldl}} template). "
To that end, BenjaminBarrett12 and I have created such a template, at User:Metaknowledge/ldl. Your comments, revisions, and other feedback are requested. Feel free to edit it, but discuss any major changes here. Thanks --Μετάknowledgediscuss/deeds 19:12, 5 June 2012 (UTC)

Probably that's not the best name, since it looks like a language code. —RuakhTALK 19:35, 5 June 2012 (UTC)
I already noticed that, and my plan is to move it to {{LDL}}. Any feedback about the substance? --Μετάknowledgediscuss/deeds 19:42, 5 June 2012 (UTC)
The text seems right. The color could use some improvement. —RuakhTALK 19:49, 5 June 2012 (UTC)
BB said the previous color (hi-liter yellow) caused interference with link color due to color blindness. If you feel like trying something new, here's the palette. --Μετάknowledgediscuss/deeds 19:57, 5 June 2012 (UTC)
I have shifted the existing green up from 99FF99 to CCFFCC. Looks better to me, as the previous green was very lurid and felt more like foreground than background. Please tweak/revert as needed. (I agree with the text, BTW!) Equinox 20:03, 5 June 2012 (UTC)
Looks much more official! Thanks, guys. What do you say to bolding the key words about "less than three"? --Μετάknowledgediscuss/deeds 20:34, 5 June 2012 (UTC)
I don’t think it’s necessary. The whole box already stands out more than enough. Ungoliant MMDCCLXIV 23:01, 5 June 2012 (UTC)
I think the template should mention somewhere that it’s allowed to have less than 3 citations only because the language has limited documentation. Otherwise someone might think that any unattested term is acceptable (in any language) as long as the template is there. Ungoliant MMDCCLXIV 23:01, 5 June 2012 (UTC)
Ach, I was hoping to keep it to two lines... --Μετάknowledgediscuss/deeds 02:35, 6 June 2012 (UTC)
I was hoping for slightly shorter, too, but I really like the second sentence! --BB12 (talk) 05:34, 6 June 2012 (UTC)
Thank you. I just cut it down a bit more in length (in tabbed langs it was running 4 lines). --Μετάknowledgediscuss/deeds 05:44, 6 June 2012 (UTC)
Msh210 and others: please see Template talk:LDL. --Μετάknowledgediscuss/deeds 05:55, 6 June 2012 (UTC)

OK, I think it is basically finalized. For an example of the template in action, see ovaspen. Note that I added a new sentence to address Ungoliant's concerns. --Μετάknowledgediscuss/deeds 05:34, 6 June 2012 (UTC)

'cognate with' or 'cognate to'?

What is actually the right way to say this? Most of our etymologies use 'cognate with' so I've just copied that... —CodeCat 00:02, 6 June 2012 (UTC)

To me, only "cognate with" is correct. I’ve always considered "cognate to" to be a barbarism based on analogy with "related to". It reminds me of this new idiom "bored of". I have wondered whether "cognate to" and "bored of" might be Britishisms. —Stephen (Talk) 00:28, 6 June 2012 (UTC)
I think the latter came from (have) tired of, confused with (am) bored with. It is hardly new; heard it in the '80s. Equinox 00:32, 6 June 2012 (UTC)
According to the Google Ngram Viewer, "cognate with" is more than three times as common: —RuakhTALK 00:31, 6 June 2012 (UTC)

Collins uses with: “something that is cognate with something else.”[1] Oxford gives an example with with: “the term is obviously cognate with the Malay sedan.”[2]

COCA has five examples of cognate to and 17 of cognate with. Michael Z. 2012-06-13 01:49 z

Minor CFI change

As was noted in the thread two sections above this one, the CFI has been changed to reference {{ldl}}, which unfortunately exists (although it is displayed as a redlink in the CFI by using an illegal namespace) and has as its output the string Kaan. I want to make sure there is consensus to instead have it mention {{LDL}}, which was formerly known as User:Metaknowledge/ldl. --Μετάknowledgediscuss/deeds 02:46, 6 June 2012 (UTC)

+1, as they say on Reddit and Stack Exchange (and probably elsewhere).​—msh210 (talk) 05:21, 6 June 2012 (UTC)
I support this as well. I assume right now I'm in the template dog house for getting that wrong.... --BB12 (talk) 05:51, 6 June 2012 (UTC)
+1. (By the way, I've created [[+1]]. I think it actually originated on Slashdot, and here's a Slashdot cite from before either Reddit or Stack Exchange existed — but I've left the etymology a bit open.) —RuakhTALK 12:49, 6 June 2012 (UTC)
Of course, stupid of me, how could I forget Slashdot.​—msh210 (talk) 22:41, 6 June 2012 (UTC)

  Done --Μετάknowledgediscuss/deeds 18:03, 7 June 2012 (UTC)

Bot flag for JAnDbot

I'm asking for a bot flag and unblocking for JAnDbot. It is a global bot based on pywikipedia. It is active on all Wikipedias and many Wiktionaries, but it was blocked on en more than three years ago. At that time Interwicket was active, but it no longer is. When I am doing interwiki, I usually see that several links are missing on en. I am working on the category namespace too.

The problem was that the bot removed "incorrect" links, i.e. those non-matching in wiktionary mode (e.g. Mars/mars). At that time there was no other way to delete dead links and leave these non-matching ones, which are allowed on some Wiktionaries but not welcome on others (like cs:). Now I usually run the bot with the -cleanup parameter, which does not remove links to redirects or non-matching links, but removes dead links only, so the main problem is solved.

Thanks, JAn Dudík (talk) 16:50, 6 June 2012 (UTC)

To get a bot flag, you need to make a bot vote at WT:V. --Μετάknowledgediscuss/deeds 18:47, 6 June 2012 (UTC)
You say it does not remove links to redirects now. But that’s what it did here: ... why did that happen? Can it be fixed? —Stephen (Talk) 01:25, 7 June 2012 (UTC)
Yes, this was my fault. I use the bot both at home and at work, and on one of these two computers I had a bad template in the command line. Now it should be fixed.
But if the flag is not granted, please unblock my bot; I want to make interwiki links in the category namespace too. JAn Dudík (talk) 21:40, 9 June 2012 (UTC)

Update to languages with limited documentation

Following the pass of the vote for languages with limited documentation, Metaknowledge presented me with some additional languages to be added. As I looked at those languages and others, I wound up reconstructing the LDL language list into an inclusion and exclusion list as I had long suspected would be necessary.

The only language not excluded among those in Google's advanced search settings is Esperanto.

In my review, I tried to be reasonably thorough with the languages of the world. Among the resources I used were w:Language isolate, w:List of language families and w:Mixed language.

I would like to ask for consensus to change the list from this:

Other languages with limited online documentation

The following are considered to be other languages with limited online documentation as provided on the Criteria for inclusion page (living languages unless specified otherwise):

  1. languages of the Americas (including the Caribbean), Australia and Oceania (excluding European languages having an official national status in Europe,* and Tagalog);
  2. languages of Europe not having an official national status, including the extinct language Dacian (excluding Basque and Scots);
  3. the North Caucasian languages; the Kartvelian languages (excluding Georgian); Kven Finnish; and Meänkieli;
  4. the following languages of Africa: Amharic, Khoisan languages, Wide Grassfields languages, and Zarma;
  5. the Andamanese languages; the Dravidian languages (excluding Kannada, Malayalam, Tamil and Telugu); Assamese; Kokborok; Lepcha; Maldivian; Meitei; Mizo; and Sinhalese;
  6. Äynu, Shaozhou Tuhua and the Tibetan languages; and
  7. the Formosan languages and languages of Southeast Asia (excluding Cantonese, Indonesian, Malay, Standard Mandarin, Thai (tha) and Vietnamese).
* Thus, while Dutch is the official language of Suriname, because Dutch is a European language with official national status in the Netherlands, it does not qualify. This applies similarly to languages such as English, French, German, Norwegian, Portuguese and Spanish.

This page may be modified through general consensus. To make a request to add or exclude a language, go to the Beer Parlour and click the "+" tab at top to input your request.

to this, where boldface text (except for the title) indicates a change in content:

Other languages with limited online documentation

This page lists languages considered to be other languages with limited online documentation as provided on the Criteria for inclusion page. Languages in the Inclusion List qualify unless they are on the Exclusion List.

Inclusion List (living languages unless specified otherwise)

  1. languages of the Americas (including the Caribbean), Australia and Oceania;
  2. Maltese, languages of Europe not having an official national status and the extinct language Dacian;
  3. the Uralic, North Caucasian and Kartvelian languages;
  4. the Afroasiatic, Khoisan, Niger-Congo and Nilo-Saharan languages, Bangime, Dompo, Jalaa, Mbugu and Sandawe;
  5. the Andamanese, Dravidian, Indo-Iranian and Siangic languages, Jarawa and Kusunda;
  6. the Mongolic and Turkic languages;
  7. the Austro-Asiatic, Hmong–Mien and Sino-Tibetan languages, Bonin Mixed Language and Shaozhou Tuhua;
  8. the Austronesian and Tai-Kadai languages.

Exclusion List

  1. ,
  2. Basque, Catalan and Scots;
  3. Estonian, Finnish, Georgian and Hungarian;
  4. Afrikaans, Arabic, Hebrew, Swahili, Xhosa and Zulu;
  5. Bengali, Farsi, Kannada, Malayalam, Tamil and Telugu, and the languages of India provided in the Eighth Schedule to the Constitution but not Assamese or Meitei;
  6. Azerbaijani and Turkish;
  7. Cantonese, Standard Mandarin, and Vietnamese; and
  8. Indonesian, Malay, Tagalog and Thai (tha).

This page may be modified through general consensus. To make a request to add or exclude a language, go to the Beer Parlour and click the "+" tab at top to input your request.

We also discussed whether Yiddish should be excluded. I would like to bring that up as a separate topic after this is concluded. --BB12 (talk) 16:56, 7 June 2012 (UTC)

Actually, Catalan has official status (in Andorra), which is why I didn't suggest it for the exclusion list; it is already excluded. Otherwise, I support this important advance. --Μετάknowledgediscuss/deeds 17:44, 7 June 2012 (UTC)
Catalan is also official in Catalonia itself. Does that count? What about West Frisian? That is fairly well documented too, and West Frisian Wikipedia has almost 25000 articles. —CodeCat 17:49, 7 June 2012 (UTC)
Official, in this case, denotes national recognition, which Catalan lacks in Spain. West Frisian WP might be doing well, but if you search a term in West Frisian on bgc, do you get enough non-mention hits to fully cite every term we have in it? --Μετάknowledgediscuss/deeds 17:53, 7 June 2012 (UTC)
This change would count French and Spanish as LDLs (under item 1).​—msh210 (talk) 18:50, 7 June 2012 (UTC)
I assume it is supposed to mean “native languages of the Americas, etc.”. This should be indicated. Ungoliant MMDCCLXIV 18:57, 7 June 2012 (UTC)
That's an oversight - I think the original footnote should be carried over. --Μετάknowledgediscuss/deeds 21:31, 7 June 2012 (UTC)
I think that exclusions should be kept adjacent to the associated inclusions. BTW, Afrikaans can be removed from the #4 exclusions, since none of the #4 inclusions covers it. (Right?) —RuakhTALK 20:08, 7 June 2012 (UTC)
It was getting pretty messy that way. (But you're right about removing Afrikaans.) --Μετάknowledgediscuss/deeds 21:31, 7 June 2012 (UTC)
There are other ways to address messiness. For example, the exclusions could be given as bullet-points right under the corresponding inclusions:
  1. The non-extinct Afroasiatic, Khoisan, Niger-Congo, and Nilo-Saharan languages are considered languages with limited documentation, as are Bangime, Dompo, Jalaa, Mbugu, and Sandawe.
RuakhTALK 22:24, 7 June 2012 (UTC)

I wasn't aware of Catalan's status (or of the existence of Andorra, my apologies, Andorrans!). That can be removed. More importantly, though, this feedback made me realize the inclusions are no longer needed. Languages with official status in Europe can be added to the exclusions and Afrikaans kept, and then the Exclusion List should be adequate. The Wiktionary:CFI#Languages_with_limited_online_documentation section will need to be rewritten as well. I'll draft all that up in the next day or so if there are no objections. --BB12 (talk) 23:27, 7 June 2012 (UTC)

  • Linking to language family categories might be more helpful than linking to ethnologue's pages or Wikipedia articles, by the way. --Yair rand (talk) 23:48, 7 June 2012 (UTC)
If we move to an exclusion list, linking to Wiktionary family categories won't be necessary, but even so, Wiktionary does not provide the necessary language cover. Category:Andamanese_languages does not cover the w:Ongan languages, for example, and there is no Category:Siangic. BTW, I used Wikipedia links as a general rule, but the Ethnologue when there was any doubt about affiliation. w:Niger-Congo_languages has a question mark for affiliation with the Mande languages, but the Ethnologue says they are related. (I'm not concerned here with the truth of affiliation, just trying to make a list for our purposes.) --BB12 (talk) 00:21, 8 June 2012 (UTC)
For Inclusion List #1, I would change "languages" to "indigenous languages". For #2, Maltese is an Afroasiatic language, so might not need to be listed, unless you have something somewhere explicitly excluding languages with national official status in Europe. Speaking of official status, I think "national official status" is clearer than "official national status". Also, why is Dacian listed, alone among all the extinct languages of Europe? Chuck Entz (talk) 05:17, 8 June 2012 (UTC)
I don't want to change it to "indigenous languages," because that rules out languages like Hunsrik. Thank you for Maltese; I'm planning to just use exclusions, so I will have to note it as not being excluded. I agree that "national official status" sounds better and clearer. I will look at that a little more. As per an earlier discussion (Wiktionary_talk:Votes/pl-2010-12/Attestation_of_extinct_languages), Dacian is known only through mentions, not through usage, so not including it here necessarily excludes it completely. A vote is planned to expand the extinct languages. I hope you will join in the discussion when it goes up :)
Indigenous can be ambiguous too, because any language becomes indigenous with time. I think 'pre-colonial' or 'non-indo-european' might be a better term. —CodeCat 11:17, 8 June 2012 (UTC)
But 'pre-colonial' would exclude (most of?) the signed languages.​—msh210 (talk) 15:08, 8 June 2012 (UTC)
FWIW, Hunsrik is Indo-European and sign languages have a separate policy. --BB12 (talk) 15:24, 8 June 2012 (UTC)
Oh, right. I forgot that the SL policy page also includes special attestation rules.​—msh210 (talk) 17:19, 8 June 2012 (UTC)

Help with wiki bot (Java)

Okay, so I decided to try using a bot using Java stuff from here. I've only done a very few edits with it, through my own account. My intention is to mainly, if not exclusively, use it to create/fix/standardise some form-of entries, starting with Icelandic nouns (seeing as I made {{is-inflection of}} to handle them), but I do plan to work with other things with it as well. I don't plan on requesting a bot flag for a bot account (when I make it) just yet, seeing as I've got next to nothing working aside from having all the necessary extra Java classes set up so that they can be imported into classes I make.

Instead, I'm posting here hoping that someone can help me with a problem that has arisen: when I try to save an entry with the bot, the text gets corrupted if it contains "special characters" (so far, seeing as it's Icelandic, I've only tried with ö and ý). Here is a bot test edit that should have written the text öxl, but as you can see there is a question mark instead of the ö. Does anyone here know what could be going wrong? 50 Xylophone Players talk 01:24, 10 June 2012 (UTC)

I'm no expert, but usually the problem is that you haven't enabled UTF-8 encoding. (Right, technically minded folks?) --Μετάknowledgediscuss/deeds 01:32, 10 June 2012 (UTC)
Well... I don't know if I need to do something like that somewhere else, but in Eclipse (the Java IDE I'm using; that's the right term, right?) I right-clicked the Java file that I created to run the bot code in, went to Properties and changed the encoding to UTF-8 earlier, and that was before at least one of the edits that had this problem... :/ 50 Xylophone Players talk 01:37, 10 June 2012 (UTC)
If you're sure it's enabled, I can't explain this phenomenon. --Μετάknowledgediscuss/deeds 01:47, 10 June 2012 (UTC)
It sounds like what you did is, you encoded your Java source file — your *.java file — in UTF-8? That isn't a problem, but it also doesn't help. Μετάknowledge's point was that, when you're converting back and forth between characters and bytes (conversions between String and byte[], or between InputStream and Reader, or between OutputStream and Writer), you need to specify a character-encoding of UTF-8. Probably the easiest way to do that, in your case, is just to set the default character set to UTF-8, by adding -Dfile.encoding=UTF8 to the VM arguments. (If you're running the program in Eclipse, you can specify VM args via "Open Run Dialog..." or "Run Configurations...", depending on your version of Eclipse. If you're running it at the command line, just insert -Dfile.encoding=UTF8 after java and before the name of the class.) —RuakhTALK 05:46, 10 June 2012 (UTC)
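The String/byte[] conversions described above can be sketched as follows. This is a minimal, standalone illustration using only java.nio.charset.StandardCharsets (Java 7 and later); it does not touch the bot library, and the class and method names are made up for the example. The ö is written as the escape \u00f6 so that the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8RoundTrip {
    // Encode with an explicit charset instead of relying on the platform default.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Simulates a lossy path: US-ASCII cannot represent U+00F6, so getBytes
    // substitutes the charset's replacement byte, which is '?'.
    static String throughAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String word = "\u00f6xl"; // o-umlaut followed by "xl"
        System.out.println(decodeUtf8(encodeUtf8(word)).equals(word)); // true
        System.out.println(throughAscii(word));                        // prints ?xl
    }
}
```

A question mark in place of an accented letter is the typical symptom of the second path: somewhere between the program and the wiki, text was turned into bytes using a charset that lacks the character.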
OK WTF...Am I doing something wrong??? I went to run configurations for the project as a whole and I put exactly -Dfile.encoding=UTF8 in VM arguments but the problem hasn't been resolved....Help??? :/ 50 Xylophone Players talk 15:07, 10 June 2012 (UTC)
Can you show the lines of code that initialise or generate the string that gets corrupted (e.g. are you reading from a file, or extracting from existing entries, or...)? I would imagine the problem is with the way the Java code is handling strings and not with the configuration of the VM itself. Equinox 15:11, 10 June 2012 (UTC)
Another thing... what you're setting right now is the encoding of the source code, so that would only affect if you write special characters directly in your code. It shouldn't affect how the actual program handles UTF-8 text, at all. Maybe that is your problem? —CodeCat 15:14, 10 June 2012 (UTC)
The entirety of my main method....
The comment lines are just some stuff I was using to fix entries that I didn't use in my latest tests, not that it should matter much I think. Also, some notes in reply to what you said, Equinox: while I have some stuff in the source code, as you see, with special chars that aren't saving to wikt. right, the odd thing I found is that after I perform entry = b.readContent("User:PalkiaX50/Sandbox"), if I print out entry.getText() to the console and there were accented letters and such in it, they print out fine... However, as I've said, the problem occurs nonetheless... 50 Xylophone Players talk 15:23, 10 June 2012 (UTC)
Hm. Instead of doing setText with new text, try just appending to the string that prints correctly: add a dot or something (text + "."). Does that save back to the wiki properly or not? Equinox 15:27, 10 June 2012 (UTC)
Well see, AFAIK, setText is needed because that modifies the text variable/field in the Article object "entry". I guess my variable naming is a little confusing: all the assignments text = foo are to the String variable text in my class, which is, as you can see, eventually used to set the text in the Article object. 50 Xylophone Players talk 15:34, 10 June 2012 (UTC)
So see if this corrupts the accents or not:
Godammit it did it again...see? =/ 50 Xylophone Players talk 15:54, 10 June 2012 (UTC)
Well, sounds as though the library you are using wasn't designed to support UTF-8 (which is bad, since all Wikimedia wikis use that encoding, according to Meta). Equinox 15:56, 10 June 2012 (UTC)
I find that hard to believe, but at any rate I think I've found something in the libraries that might help...I just need to figure out how to use it. 50 Xylophone Players talk 16:08, 10 June 2012 (UTC)
Ugh....this is pissing me off...¬_¬ Does anyone know if there's some shit in Java that maaaaybe I could try to use to get around this by using alternate representations of characters?? Like you know, how for example in unicode chars are represented in a way like "U+XXXX"...Or better yet is there a template or something on wiktionary here (or could one easily be made to do stuff like that) that could do something like this or maybe a template that would be for substing only to display special chars? 50 Xylophone Players talk 17:13, 10 June 2012 (UTC)
It is ages since I wrote Java. But there is a String(byte[], String) constructor that will create a string from the source data bytes and the name of the encoding, so you can safely create a UTF-8 string that way if you can get the original entry contents as bytes rather than a pre-screwed-up string. Equinox 17:15, 10 June 2012 (UTC)
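A small sketch of the String(byte[], String) constructor mentioned above, assuming the raw bytes of the entry are available. The byte values are the UTF-8 encoding of the test word from this thread; the wrapper class and method are hypothetical and exist only to keep the example self-contained.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper, for illustration only.
final class DecodeBytes {
    // Wraps the String(byte[], String) constructor, which takes the encoding by name
    // and throws a checked exception if the name is not recognised.
    static String decode(byte[] raw, String charsetName) {
        try {
            return new String(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of U+00F6 (o-umlaut) are 0xC3 0xB6, followed by ASCII "xl".
        byte[] raw = { (byte) 0xC3, (byte) 0xB6, 'x', 'l' };
        System.out.println(decode(raw, "UTF-8").equals("\u00f6xl"));            // true
        // Decoding the same bytes as ISO-8859-1 yields one character per byte.
        System.out.println(decode(raw, "ISO-8859-1").equals("\u00c3\u00b6xl")); // true
    }
}
```

The second call shows the opposite failure from a question mark: the bytes are intact, but reading them with the wrong charset turns one accented letter into two unrelated characters.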
@ Palkia: a template is unnecessary, since in text body we can just use & yacute; without the space, which produces ý. However, that won't help (AFAIK) for page titles. It's hard to say which will take less time, fixing pages or designing a Java workaround. --Μετάknowledgediscuss/deeds 17:23, 10 June 2012 (UTC)
I see, but that looks really ugly in entries though, but it worked!!! See here. Just to clear things up, I think you misinterpreted me a little, Equinox: when I use the code to get content from an entry here and print out the retrieved content on Eclipse's console, for example, the special chars are not screwed up. The screwing up seems to occur when saving to the wiki. :/ Hence, as you see in the revision I linked, there were no screw-ups.
But yeah as I said, that looks kinda ugly in the entries, no? That kinda thing is why I was suggesting a template to subst, or a similar strategy. Oh, and is there anywhere someone could link me to for a nice simple list of these "&XXXX;" codes for chars? Never mind, I actually did find something on Google. 50 Xylophone Players talk 17:57, 10 June 2012 (UTC)
If you really want to subst it (I'll admit, it would be a lot neater), just create subpages, like User:PalkiaX50/yacute, and create each one with the Unicode (U+XXXX, not &XXXX) version of the accented letter as its content. You can just call it by putting {{subst:User:PalkiaX50/yacute}} in the entries. --Μετάknowledgediscuss/deeds 01:00, 11 June 2012 (UTC)
Can you hold off for a week or so? I'll try to spend some time this week playing around with this library and seeing if there's a better way . . . because there really must be. —RuakhTALK 01:33, 11 June 2012 (UTC)
Well, for now anyway, I have made a class of my own to handle the special chars: before saving an entry to wikt, my program now runs through the text to be saved, replacing special characters with their HTML codes. It mostly only covers Icelandic atm (and, well, I guess anything that has some of those chars, e.g. Irish due to acutes), but it's only a matter of adding more chars and their codes as needed. 50 Xylophone Players talk 15:01, 12 June 2012 (UTC)
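The workaround described above (escaping characters before saving) might look roughly like the sketch below. The poster's actual class is not shown in the thread, so this is only a guess at the general shape. It uses decimal numeric character references (&#NNN;) rather than a hand-maintained table of named codes, which avoids having to add characters one by one; all names are hypothetical.

```java
// Hypothetical helper, for illustration only; not the poster's actual class.
final class HtmlEscaper {
    // Replaces every non-ASCII code point with a decimal numeric character reference.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // reads a full code point, including surrogate pairs
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("\u00f6xl")); // prints &#246;xl
        System.out.println(escapeNonAscii("\u00fd"));   // prints &#253;
    }
}
```

As noted in the thread, this hides the encoding problem rather than fixing it, and it leaves the wikitext harder to read and edit, so it is best treated as a stopgap.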
Yes, I understand that you've done that. I'm asking if you're willing to hold off for a week or so, since that is not the best approach. —RuakhTALK 16:56, 12 June 2012 (UTC)
This JWBF library does support foreign languages. I have done some tests with the public static void main example from above: "öxl" (whatever this may be): diff, and Gothic: diff. --MaEr (talk) 19:14, 13 June 2012 (UTC)
On a slightly related note, could someone here that's good with templates modify {{is-inflection of}} for me, so that it can take an optional parameter indicating that it's being used for a proper noun form? So that it'd make the Category be "Category:Icelandic proper noun forms - xxxx"? 50 Xylophone Players talk 15:24, 12 June 2012 (UTC)
  Done but I'd like somebody to check it, as I'm no template-writing wizard. See Template talk:is-inflection of. --Μετάknowledgediscuss/deeds 16:43, 12 June 2012 (UTC)

New update to languages with limited documentation

Following the pass of the vote for languages with limited documentation, Metaknowledge presented me with some additional languages to be added. This resulted in an inclusion list and an exclusion list as described above. As a result of that discussion, it became clear that rather than trying to define which languages can have only one citation, it would be simpler to state which languages have to have three citations (e.g., English, French, etc.). Metaknowledge and I, along with some help from Liliana, have worked up that list of languages.

Because we are going from a list of limited documentation languages to a list of well documented languages, additional wording changes need to be made. But I believe the only substantial changes made as compared to the above discussion are:

  • the elimination of Dacian (to be addressed in a later extinct language vote),
  • the addition of constructed languages, something that had been omitted before, and
  • the change of "appropriate" to "inappropriate" to make it easier for one-citation languages to maintain a list of sources.

There are three parts to this proposal. Hopefully consensus can be reached here.

1. Deletion of Wiktionary:Criteria_for_inclusion#Languages_with_limited_online_documentation. It is no longer necessary. (Also the "sign languages" section should have three equal signs instead of four in the Wiki code.)

2. Change from

Number of citations

In general, three citations in which a term is used are the minimum number for inclusion in Wiktionary. For terms in extinct languages, one use in a contemporaneous source is the minimum. For languages with limited online documentation, only one use or mention is adequate, subject to the following requirements:

  • the community of editors for that language should maintain a list of materials deemed appropriate as the sole source for entries based on a single mention,
  • each entry should have its source(s) listed on the entry or citation page, and
  • a box explaining that a low number of citations were used should be included on the entry page (such as by using the {{LDL}} template).[1]


Number of citations

For languages well documented on the Internet, three citations in which a term is used is the minimum number for inclusion in Wiktionary. For terms in extinct languages, one use in a contemporaneous source is the minimum. For all other spoken languages, only one use or mention is adequate, subject to the following requirements:

  • the language community should maintain a list of materials deemed inappropriate as the sole source for entries based on a single mention,
  • each entry should have its source(s) listed on the entry or citation page, and
  • a box explaining that a low number of citations were used should be included on the entry page (such as by using the {{LDL}} template).[2]

3. Deletion of Wiktionary:Criteria_for_inclusion/Languages_with_limited_online_documentation, replacing that limited documentation list with a page at Wiktionary:Criteria_for_inclusion/Well_Documented_Languages whose content is as follows (boldfacing showing languages not specifically named before):

Well Documented Languages

The languages well documented on the Internet as provided on the Criteria for inclusion page are:

  1. Albanian, Armenian, Basque, Belarusian, Bulgarian, Catalan, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Georgian, Greek, Hungarian, Icelandic, Italian, Latvian, Lithuanian, Macedonian, Norwegian (Nynorsk and Bokmål), Polish, Portuguese, Romanian, Russian, Scots, Serbo-Croatian, Slovak, Slovene, Spanish, Swedish, Ukrainian and Welsh;
  2. Afrikaans, Hebrew, Swahili, Xhosa and Zulu;
  3. Algerian Arabic, Egyptian Arabic, Libyan Arabic, Modern Standard Arabic, Moroccan Arabic, North Levantine Arabic, South Levantine Arabic and Tunisian Arabic;
  4. Bengali, Hindi, Kannada, Malayalam, Persian, Punjabi, Sindhi, Tamil, Telugu and Urdu;
  5. Azeri and Turkish;
  6. Cantonese, Hokkien, Japanese, Korean, Standard Mandarin and Vietnamese;
  7. Standard Indonesian, Malay, Tagalog and Thai; and
  8. Approved constructed languages.

This page may be modified through general consensus. To make a request to add or exclude a language, go to the Beer Parlour and click the "+" tab at top to input your request. When making a change to this list, consideration of how to handle existing entries should be taken into account.

I notice that Irish is missing from the list, even though it's an official national language. Also, I think it may be better to change 'language community' (which has a different meaning in everyday discourse) to 'community of editors for that language'. —CodeCat 10:47, 11 June 2012 (UTC)
Irish may be official, but I don't think it is really well-documented on the Internet. -- Liliana 12:01, 11 June 2012 (UTC)
Less so than Welsh? I would expect it to be the other way around... —CodeCat 12:03, 11 June 2012 (UTC)
I don't feel all that strongly about Welsh either. -- Liliana 12:05, 11 June 2012 (UTC)
In the British Isles, between 3 and 4 times as many Welsh speakers use it in the home as Irish speakers, and that's in absolute terms. In terms of users per acre, Welsh bounds ahead even more. This has a major impact on internet penetration, and I've found tons of Welsh blogs and the like, but also a lot of Welsh material in bgc. --Μετάknowledgediscuss/deeds 16:22, 11 June 2012 (UTC)
I have changed "language community" to "community of editors for that language." Thank you for that suggestion, Code Cat! --BB12 (talk) 19:01, 11 June 2012 (UTC)
To test that theory (anecdotally), I plucked three words from Category:Irish nouns and tried to cite them. I cited [[aicme]] (not every sense, but every sense is probably citeable); I couldn't find any citations at all of [[xéirifít]]. I added one noun and one verbal noun citation to lasc, and I left on the talk page one more citation the POS of which wasn't clear to me. Unlike [[aicme]], I'm not sure every sense of [[lasc]] is citeable; "lasc"+agus (to filter out non-Irish works) and "lascadh" get only ~180 hits each. I'll try Welsh words now. - -sche (discuss) 18:05, 11 June 2012 (UTC)
I tried to cite Welsh words with the same meaning as the Irish words I cited. I created [[chwip]] (whip) with two citations in the entry and more available: it gets ~360 citations when I search for it together with "dyn" to weed out non-Welsh books (and "dyn" is not as common a word in Welsh as "agus" is in Irish). I couldn't find any citations of and therefore did not create [[seroffyt]] (xerophyte). "Adran"+dyn gets three thousand hits, to two thousand five hundred for "aicme"+agus. Hm... that anecdotal evidence suggests that Welsh is only marginally more attested than Irish. - -sche (discuss) 18:37, 11 June 2012 (UTC)
Thank you for checking these, -sche! I have deleted Welsh from the list. Given the increasing presence of Welsh on the Internet (Indigenous Tweets cites 6146 users of Cymraeg, for example), it may be possible to put it back on the list soon, but I'm glad to get this right now. --BB12 (talk) 19:05, 11 June 2012 (UTC)
I think it would be more standard to write it as xeroffyt in Welsh (that gets one hit, but I can't view it). I don't know much about Irish, but look at this search in Welsh. If you really think that Irish has a comparable amount of material, I suppose we can add it. --Μετάknowledgediscuss/deeds 19:00, 11 June 2012 (UTC)
Whoops, I may have spoken too soon. Perhaps Irish and Welsh both need to be on the list.... --BB12 (talk) 19:05, 11 June 2012 (UTC)
Er, I think we've defended Welsh and possibly given reason to add Irish, actually. --Μετάknowledgediscuss/deeds 19:15, 11 June 2012 (UTC)
After I did that, I realized I was probably wrong. I've put Welsh back in. Is it clear that Irish should be added? There is at least one radio station in Irish (RTE). --BB12 (talk) 20:38, 11 June 2012 (UTC)
Using "agus" will not filter out Scottish Gaelic. For my filter word I usually use "bhfuil", which is quite a common word in Irish but whose spelling is so outlandish as to filter out even Irish's closest relatives. For Welsh you could try "wrth", a fairly common preposition that is unlikely to be a word in any other language. —Angr 21:42, 11 June 2012 (UTC)
How does sandhi factor into this? I never spent that much time on Irish, but I seem to remember quite a few spelling variations caused by neighboring words- I would think that could really cut down on the number of hits. Chuck Entz (talk) 05:10, 12 June 2012 (UTC)
I guess you mean the initial mutations. Yes, you should look for all possible mutations of a word when looking for attestations. This doesn't apply to [[lasc]] and [[xéirifít]] since "l" and "x" aren't affected by the mutations, but "n-aicme" and "haicme" should be searched for alongside "aicme". -sche seems to have done this, since one of the quotations for [[aicme]] is actually of "haicme". —Angr 11:05, 12 June 2012 (UTC)
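The point about searching for mutated forms can be sketched roughly as follows. This is a deliberately simplified illustration, not a complete model of Irish initial mutations: the function name `mutation_variants` and the tables are assumptions made for this example, words are assumed to be lower-case, and t-prothesis before s (e.g. "ts-") is left out.

```python
# Simplified sketch: list the spellings worth searching for when attesting
# an Irish word, given that initial mutations change how the word begins.
VOWELS = "aáeéiíoóuú"
LENITABLE = set("bcdfgmpst")          # lenition inserts "h" after these
ECLIPSIS = {"b": "mb", "c": "gc", "d": "nd", "f": "bhf",
            "g": "ng", "p": "bp", "t": "dt"}


def mutation_variants(word):
    """Return the base form followed by its likely mutated spellings."""
    first = word[0]
    forms = [word]
    if first in VOWELS:
        # h-prothesis, eclipsis (n-) and t-prothesis before vowels
        forms += ["h" + word, "n-" + word, "t-" + word]
        return forms
    # "s" does not lenite before c, m, p or t (sc-, sm-, sp-, st-)
    blocked_s = first == "s" and word[1:2] in ("c", "m", "p", "t")
    if first in LENITABLE and not blocked_s:
        forms.append(first + "h" + word[1:])
    if first in ECLIPSIS:
        forms.append(ECLIPSIS[first] + word[1:])
    return forms


for w in ("aicme", "lasc", "xéirifít"):
    print(w, "->", mutation_variants(w))
```

Words beginning with "l" or "x" come back unchanged, which matches the observation that only "aicme" needs extra search forms here.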
At Wiktionary:Requests for verification#tacuinum, there were users saying that Latin (at least in its medieval/modern form) needs 3 cites - annoyingly, neither side actually linked the relevant Beer Parlour discussions, so I'm not sure what the actual position on it is. Does modern Latin have a sufficient internet presence for this list (or, indeed, is it counted as a constructed language, since it has no native speakers)? Smurrayinchester (talk) 17:49, 11 June 2012 (UTC)
I would suggest that we make a distinction between ancient and modern Latin (without treating them as different languages) and that we make one ancient cite count as three modern cites. So if an old cite is found, one is enough, while if we only have modern ones, we need three. Would that work? —CodeCat 18:23, 11 June 2012 (UTC)
The discussion is at #Modern Latin. @SMurray: it has a good internet presence, but it would be a stretch to call it a constructed language, considering that a smaller portion of the vocabulary seems to be composed of neologisms than in modern English.
@CodeCat: That is exactly the policy I advocate for, and, as I noted in the RFV discussion, that's how I interpret past votes. Putting that interpretation explicitly to a vote is fine, but that's out of the scope of this proposal, which is talking about living languages. --Μετάknowledgediscuss/deeds 18:32, 11 June 2012 (UTC)

Language names lower/upper case in the target languages

Whatever happened with language names quite recently on interwiki, it's not complete. A few language names still need to be converted to lower case: Tiếng Việt -> tiếng Việt, Ελληνικά -> ελληνικά, Bahasa Melayu -> bahasa Melayu, Bahasa Indonesia -> bahasa Indonesia (just looking at recent changes, noticed that most language names appear in the same case as in the target language). Don't know who made the change but if you know, please pass along the required change. --Anatoli (обсудить) 00:09, 14 June 2012 (UTC)

What an unfortunate decision. It looks really bad to have some names on the list capitalized and others lower-case. The first word in a list item can always be upper-case, even if it would normally be lower-case inside a sentence. —Angr 01:10, 14 June 2012 (UTC)
I have no strong opinion about this, just stating a problem I noticed. A benefit though, is to educate users about how to spell those language names inside a sentence (if they are written consistently, that is), there are still many editors who enter words in upper case as in Wikipedia articles. --Anatoli (обсудить) 01:14, 14 June 2012 (UTC)
This was a MediaWiki software change; see mw:MediaWiki 1.20/wmf5, where the list of core changes includes this one:
RuakhTALK 01:33, 14 June 2012 (UTC)
Thanks, Ruakh. "Where usual" didn't quite work obviously. --Anatoli (обсудить) 02:04, 14 June 2012 (UTC)

Request for consensus on new limited documentation language list

It seems that people are fine with the revisions above to greatly expand the languages considered to have limited online documentation, but nobody actually said they agree with it. If we can get consensus here, then the issue will not have to go to a vote. How do people feel? --BB12 (talk) 22:19, 15 June 2012 (UTC)

Mobile view as default view coming soon

(Apologies if this message isn't in your language. Please consider translating it, as well as the instructions on Meta)

The mobile view of this project and others will soon become the default view on mobile devices (except tablets). Some language versions of these projects currently show no content on the mobile home page, and it is a good time to do a little formatting so users get a mobile-friendly view, or to add to existing mobile content if some already exists.

If you are an administrator, please consider helping with this change. There are instructions which are being translated. The proposed date of switching the default view is June 21.

To contact the mobile team, email mobile-feedback-l

--Phil Inje Chang, Product Manager, Mobile, Wikimedia Foundation 08:27, 16 June 2012 (UTC)

Distributed via Global message delivery. (Wrong page? Fix here.)


RF is back. That might be good news. However, given his previous history of adding (often wrong) inflections of foreign terms, can I ask someone to look at [3] and check that he isn't going mad again? Equinox 02:27, 17 June 2012 (UTC)

His Esperanto and Spanish look good; not so sure about Ido. Does anyone know if you're allowed to have terminations in -ii in Ido? --Μετάknowledgediscuss/deeds 03:55, 17 June 2012 (UTC)
Yes, that is the standard termination for Ido nouns ending in -io. Razorflame 04:26, 17 June 2012 (UTC)
Sorry; I've never met you. I only started here on Jan 1 2012. Now that I've seen that you're io-2, I realize I should've trusted you. --Μετάknowledgediscuss/deeds 04:32, 17 June 2012 (UTC)
No problems :) Nice to meet you too :) Razorflame 04:33, 17 June 2012 (UTC)
Oh yeah, and if you want it, it would be Raduloflamma in Latin. --Μετάknowledgediscuss/deeds 05:40, 17 June 2012 (UTC)
Thanks, but I'm not adding any further names in different languages to my userpage :) I'm quite content with the ones I got :)

People and humans

Can anyone give me some criteria which I can use to decide whether an entry belongs in (a given language's instantiation of) Category:People or Category:Human? Cat:People is a subcategory of Cat:Human, so when does something belong in Cat:Human but not in Cat:People? —Angr 19:47, 18 June 2012 (UTC)

AFAICT, we've never had any objective criteria whatsoever for any topical categories, either for membership in or for the creation of a category. Should the criteria for Category:en:People be the same as for, say, Category:egy:People? As this is the English Wiktionary, I would suppose yes, but who knows? These might be voting matters. DCDuring TALK 19:55, 18 June 2012 (UTC)
I'm with DCDuring on this; anything is valid unless it fails a deletion debate. I feel there has been or was a bit of a creep to including more and more varied topical categories, mainly due to Daniel Carrero (talkcontribs) though I suppose he isn't/wasn't the only one, just the one I remember best. Mglovesfun (talk) 19:58, 18 June 2012 (UTC)
I do think that the topic categories should have the same criteria regardless of what language they're used for, since they're semantically based (I don't see what this being the English Wiktionary has to do with anything). But these two in particular are confusing to me. At Category:en:Human the only entries are [[man]], [[woman]], [[child]], [[Homo sapiens]], and [[uncanny valley]] (of all things). Not even [[human]] is in that category, though it is in Category:en:Hominids, of which Category:en:Human is not a subcategory; and [[human being]] isn't in any topic category at all. I like the topic categories, but they don't seem terribly well maintained even for English, let alone other languages. —Angr 20:05, 18 June 2012 (UTC)
I agree that we should use the same criteria for each language, but I don't think we should dismiss the question so lightly. You say that these categories are "semantically based", and that's true, but semantic classification depends on the language. For example, Category:Blue might make sense for English (where words like "cerulean", "aqua", "royal blue", and so on all denote shades of blue), but wouldn't make intuitive sense for languages like Vietnamese that treat green and blue as a single basic color (for them, Category:Grue would make more sense), nor for languages like Russian that treat light-blue and dark-blue as separate basic colors (for them, Category:Light blue and Category:Dark blue would make more sense). For another example, a potentially more contentious one: English has many more Christianity-related words than Taoism-related words, so we might subsplit Category:Christianity into many more categories than (say) Category:Taoism, even though Mandarin would probably benefit from more subcategories of the latter. So a decision to use English-y categorization, while IMHO correct overall, nonetheless has implications that warrant consideration, and may require exceptions in some cases. —RuakhTALK 20:44, 18 June 2012 (UTC)
What about backing down from excessive semantic categorisation? Why should Rapa Nui vai (water) be in Category:rap:Water anyway? So that a water-obsessed Polynesiaphile can find rano (lake)? I oppose any Category:en:Blue being created, and if anyone wants to make a list, it belongs in Wikisaurus or perhaps an appendix. --Μετάknowledgediscuss/deeds 05:15, 19 June 2012 (UTC)
Do you think we should make all topical categories hidden, to avoid misleading users or encouraging the creation and use of them? DCDuring TALK 10:42, 19 June 2012 (UTC)
If you did, some users would have nothing to do [4] Chuck Entz (talk) 13:56, 19 June 2012 (UTC)
Absolutely not; topical categories only exist for our readers, unlike some categories which are for editing purposes (Category:Request for autoformat, for example). Having some sort of rules would be really good, though in all honesty I have nothing to propose on this subject. Mglovesfun (talk) 10:48, 19 June 2012 (UTC)

Votes starting soon

Announcing Wiktionary:Votes/2012-06/Well Documented Languages (which I think was announced in the middle of discussion up the page, but I thought should be mentioned more noticeably) and Wiktionary:Votes/2012-06/Enabling WebFonts Extension (the result of a GP discussion).​—msh210 (talk) 22:50, 18 June 2012 (UTC)

Thank you. I forgot to announce the vote here. --BB12 (talk) 23:29, 18 June 2012 (UTC)

Vote first draft: ʻokina, apostrophe, and similar characters in entry names in Latin alphabets

I would like to propose a formal policy to reduce the confusion between these visually-similar characters sitewide:

For all terms in Latin alphabets containing apostrophe-like characters such as ʻokina, left- and right- single quotes, etc., the simple apostrophe (') shall be Wiktionary's default standard, to be used in entry names to the exclusion of all other similar characters, with the following exceptions:

  1. When another character is specified by the language's orthography
  2. When there is more than one of this type of character in the language in question, and the distinction is necessary to distinguish meanings, pronunciations, etc.
  3. When predominant usage for the language (especially online), or occurrence in important texts available online, etc. would lead the majority of those searching for such terms on Wiktionary to use another character.
  4. When the consensus among editors working with a given language is for using another character.

Recommended practices:

  1. All About pages for languages with a Latin alphabet using such characters should specify which character(s) to use.
  2. Alternate spelling entries, redirects (where appropriate), and other methods should be used to ensure that searches for terms containing such characters will eventually lead to the correct entry whether the correct character is used or not.
  3. If a different character is used for a given language, that character should be used consistently for entry names within that language.

Corrections, additions and/or comments would be appreciated so I can produce the best possible version before submitting it as a formal vote. Chuck Entz (talk) 14:51, 19 June 2012 (UTC)

Can you give us some examples of languages for which the 1st, 2nd and 3rd exception apply? Ungoliant MMDCCLXIV 15:11, 19 June 2012 (UTC)
For the 1st, Tongan. For the 2nd, Ancient Greek. For the 3rd, it definitely comes up with other letters, but I can't think of an appropriate example. --Μετάknowledgediscuss/deeds 17:34, 19 June 2012 (UTC)
  • And does anyone know if the Search feature will find 'okina etc. entries when a user spells a search term using the regular old ASCII apostrophe? If the Search feature does not find entries using the 'okina or other notation, then that presents a significant usability problem, and we should probably find a way to either add redirects from the ASCII spelling, or add the ASCII spelling to the text (code?) of the page in some way so that the Search feature will find the term. -- Eiríkr Útlendi │ Tala við mig 16:57, 19 June 2012 (UTC)
    Another search problem. I think a null parameter would be the easiest. --Μετάknowledgediscuss/deeds 17:34, 19 June 2012 (UTC)
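The redirect idea raised above could be sketched like this. It only illustrates how an ASCII-apostrophe redirect title might be derived from an entry name; the names `APOSTROPHE_LIKE`, `ascii_apostrophe_title` and `redirect_needed` are hypothetical, the character list is an assumption rather than an agreed set, and nothing here reflects how the site's Search feature actually behaves.

```python
# Sketch: derive the ASCII-apostrophe spelling of an entry title so that a
# redirect (or hidden search text) can be created from it.
APOSTROPHE_LIKE = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u02bb": "'",  # modifier letter turned comma (the ʻokina)
    "\u02bc": "'",  # modifier letter apostrophe
    "\u2032": "'",  # prime
}
_TABLE = str.maketrans(APOSTROPHE_LIKE)


def ascii_apostrophe_title(title):
    """Replace apostrophe-like characters with the typewriter apostrophe."""
    return title.translate(_TABLE)


def redirect_needed(title):
    """True when the ASCII spelling differs from the entry title."""
    return ascii_apostrophe_title(title) != title


print(ascii_apostrophe_title("\u02bbokina"))  # the ʻokina spelling, ASCII-fied
print(redirect_needed("don't"))
```

Keeping the mapping in one table means each language's About page could, in principle, document which rows apply to it.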

Apostrophe-like marks represent a wide variety of things: orthographic elision, phonemic stops, quotation marks, character-modifying diacritics, breathing marks, pronounced iotation and transliterated iotation marks (Cyrillic ь), unit symbols (foot/inch, (degree)/minute/second, or (hour)/minute/second), math/logic symbols, etc. It's an oddly very specific rule for a very wide scope of representation.

Why have a detailed guideline for glyph variants of single apostrophes, but ignore double quotation marks, and other kinds of orthographic marks? What about Greek, Cyrillic, and other alphabets that use apostrophe characters in similar and different ways? (If this is only about Latin alphabets, then Ancient Greek wouldn't be a suitable example.)

I'd also point out that following this guideline would mandate the use of ‘ ’ ′ ʻ marks in English, as well as their non-spacing Unicode variants, because there is “more than one of this type of character in the language in question.” Would this also apply to “ ” ″ ‴ « » marks, etc.? Michael Z. 2012-06-19 20:12 z

What we need is Wiktionary entries on all the apostrophes. I already started a bit, but never quite finished. -- Liliana 20:19, 19 June 2012 (UTC)

I think that this proposal to some extent misses the point. Look at the first two criteria; they don't address the right questions IMO:

  • "When another character is specified by the language's orthography"

Fine, but what's considered "specified"? Consider English, which has no governing body. Does the presence of directional (as opposed to straight) apostrophes in virtually every book ever published mean that they're specified? (We are descriptive, after all.) Does the presence of straight apostrophes in every Internet RFC mean they're specified? And then what's "another"? A large part of the issue of which to use in a page title is the question of whether the two apostrophes are actually distinct in the language. If they're not distinct from one another, then which one is specified is moot.

  • "When there is more than one of this type of character in the language in question, and the distinction is necessary to distinguish meanings, pronunciations, etc."

Again, whether two things are distinct is part of the issue. Are the apostrophe in "I was born in '76" and the first one in "He sang '76 Trombones Led the Big Parade' today" the same or different? Are there "more than one of this type of character in [English], and the distinction is necessary to distinguish meanings"?​—msh210 (talk) 21:35, 19 June 2012 (UTC)

  • I am opposed to any attempt to standardize the typewriter apostrophe in places where it functions as a letter (e.g. the ʻokina). There is a reason that Unicode has provided separate code points for modifier letters (i.e. things that look like punctuation marks but function as letters), namely that they collate differently and can be searched for differently. It isn't like the difference between curly quotes/apostrophes and typewriter quotes/apostrophes, which is purely aesthetic. Mixing up modifier letters and punctuation marks is as wrong as mixing up Latin and Cyrillic letters that happen to look identical--Latin "A a" and Cyrillic "А а" look completely identical in all fonts, roman and italic, but they are different and the software knows that, and Wiktionary completely respects that difference. It should be no different with letters like ʻ (the ʻokina) and punctuation marks like ‘ (the open curly apostrophe): just because they look the same doesn't mean they function the same way. We should reserve apostrophes for correct apostrophe usage (e.g. contractions in English and French) and modifier letters for correct modifier letter usage. By all means we can include redirects from apostrophes (both typewriter and curly forms), as well as {{also}} messages where appropriate, so that people can find what they're looking for, but we should acknowledge that using those apostrophes is a kludge that we do not sanction ourselves. I also want to point out that exception 1, "When another character is specified by the language's orthography", seems biased in favor of languages that have regulatory bodies to issue official guidelines on such things, and against languages (certainly the majority) that have no such regulatory bodies, or whose regulatory bodies have not yet caught up with technology to the point that they specify particular Unicode characters. 
For most languages, all we have to go by is the look of the printed page, which is no more use in distinguishing the letter ʻ from the punctuation mark ‘ than it is in distinguishing Latin a from Cyrillic а. But in both cases, just because the printed page doesn't distinguish them doesn't mean we don't. —Angr 21:46, 19 June 2012 (UTC)
That's not quite right. None of these differences are “purely aesthetic,” but neither is ignoring some of them always “wrong” or necessarily a “kludge.” According to Unicode, some code points can represent several kinds of “abstract characters.” In fact, Unicode says that the neutral apostrophe U+0027 can represent many different things, including some modifier letters.[5] Unicode also says that the typographic apostrophe and single quotation marks are preferable. It's a matter of how specific we want to be.
As a dictionary, I think we should be as specific as possible in our lemmas, while being as true to the source as possible in quotations. Michael Z. 2012-06-27 03:27 z
I don't see that Unicode says that U+0027 can represent modifier letters. It merely says "→ 02BC ʼ modifier letter apostrophe" under the entry for 0027, which I take to mean "see also". I'm pretty sure the → mark doesn't mean "can also represent" since the entry for 003D (=) is followed by "→ 2260 ≠ not equal to". —Angr 18:38, 27 June 2012 (UTC)
Quite right. There's no easy-to-find key, but p 5, c 17[6] explains that “→” is just a cross reference, with no specified relationship. We still know that various characters are used variously, but now I need to read more about what is acceptable to the Unicode standard. Michael Z. 2012-07-05 00:05 z
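For readers who want to check the distinctions discussed in this thread, the following minimal sketch uses Python's standard `unicodedata` module to print the code point, official name and general category of the characters mentioned. The helper name `describe` is just a label chosen for this example.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# Look-alike characters from the discussion: punctuation vs. modifier letters,
# and Latin vs. Cyrillic "A".
for ch in ("'", "\u2018", "\u2019", "\u02bb", "\u02bc", "A", "\u0410"):
    print(describe(ch))
```

The general category is what separates the groups: the typewriter apostrophe and the curly quotes fall under punctuation categories (Po, Pi, Pf), while the ʻokina (U+02BB) and the modifier letter apostrophe (U+02BC) are category Lm, a letter category. Likewise Latin "A" and Cyrillic "А" compare as different strings even though they look alike.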

Czech users?

Who on here speaks Czech or knows enough about Czech to help out with an entry that I'm not sure about? [b]I did not create the entry[/b], but I'm not sure if the entry is the best it can be. It is pulec. I was wondering why there is no plural marked for this term. Anyways, thanks, Razorflame 18:39, 20 June 2012 (UTC)

Plural is pulci. Coding such as [b] and [i] don’t work here. Bold is indicated by ''' and italic by ''. —Stephen (Talk) 19:16, 20 June 2012 (UTC)
Actually <b></b> and <i></i>. Mglovesfun (talk) 15:53, 22 June 2012 (UTC)
Thanks for the help guys. Is there any way you can add the plural entry Stephen? --Razorflame
Added. Maro 18:52, 24 June 2012 (UTC)

Durability and online archives

This question has been discussed before, but as far as I know no discussion has specifically encompassed WebCite, so I would like to reopen the issue explicitly regarding this one site.

The WebCite Consortium has been specifically set up to guard against linkrot in scholarly citations by maintaining a digital archive of such material. It is now well established and widely used. An argument used against the Wayback Machine was that it is not durably archived because any Internet organisation can go "off-the-air" overnight. In contrast, Usenet, it was argued, is distributively archived and is much less likely to suffer from this problem. WebCite directly addresses this question in their FAQ: "WebCite® feeds its content to digital preservation partners such as libraries and the Internet Archive. WebCite® is operated and supported by publishers, who are already using it for their journals and citations and therefore have a vital interest in keeping the service alive." WebCite is therefore, at least according to them, distributively archived also.

Another point raised about the Wayback Machine is that they will remove material that is alleged to be in breach of copyright. WebCite might possibly also do this. The circumstances in which this is most likely to come up involve copies of works of art, books, newspapers, magazines and scholarly papers. Works of art are of no relevance to us and the rest are all print sources that are durably archived elsewhere anyway. On more general web sites, where the site operators do not restrict access it is highly unlikely that the question would ever arise. WebCite respects robots.txt, no-archive tags, no-cache tags, and cannot archive subscription based content. I cannot deny that in theory something could be taken down, but then again, I cannot guarantee that an important library won't burn down either. Likewise, I am quite sure that a determined legal department with unrestricted resources could succeed in taking down something on Usenet if they really wanted.

I can't help feeling that the durability issue is not really at the heart of the objection and what really is at issue here is that we want to exclude random, poor quality webpages from the CFI requirements. If that really is the issue, then we should say so in the policy instead of using the convenient scapegoat of durability. For instance, we could exclude from valid CFI citations: pages clearly written by the speaker of another language, pages littered with spelling and grammar errors, pages using copious street gang patois, etc. (I am sure there are many more that could be added to that list). SpinningSpark 17:09, 21 June 2012 (UTC)

They are quite clear at License: "Copyright and license for all content archived by the WebCite® system are retained by the original authors of the archived pages. If you are the author of such a page, and would like its content removed, please contact us." --BB12 (talk) 18:02, 21 June 2012 (UTC)
I’d say durability IS the heart of the objection; after all poor quality Usenet comments are accepted. WebCite is indubitably much more durable than the Internet Archive; but as BenjaminBarrett noted the content’s author can easily get it removed. Let’s see where this discussion goes; we could come up with a solution. P.S. I don’t think it’s a good idea to ban quotations by foreigners :-( . Ungoliant MMDCCLXIV 18:26, 21 June 2012 (UTC)
I have not denied that WebCite can take down material. What I am challenging is the likelihood that that will happen. All likely cases are either of no concern (because they are print) or are cases WebCite will not archive in the first place. SpinningSpark 00:39, 22 June 2012 (UTC)
Now I'm confused about what you are proposing. If you want to have approval for WebCite for printed material, that will probably work (I would want to have some specific examples where WebCite provides information that can't be accessed easily in some other way). But you also say, "On more general web sites, where the site operators do not restrict access it is highly unlikely that the question would ever arise," but those general web sites (which include blogs) can have their material taken down at any time and I don't understand how your "unlikely" factor is determined. --BB12 (talk) 00:47, 22 June 2012 (UTC)
A solution is to require more than 3 citations from WebCite. For example, we could require a minimum of 5 WebCite citations, and if 2 of them are eventually removed (unlikely, as you claim) the term can still be verified. Ungoliant MMDCCLXIV 00:56, 22 June 2012 (UTC)
I like that! --BB12 (talk) 01:00, 22 June 2012 (UTC)
I was saying unlikely because I doubt bloggers etc commonly demand their material be removed from an archive. I admit I am only guessing, so I have e-mailed WebCite to see if they will provide any stats on number of take-downs. I too think an increased number of cites is a sensible precaution. SpinningSpark 01:15, 22 June 2012 (UTC)
One more thought, if the requirement for WebCite was made exactly twice the normal CFI rule, that would make it easier to calculate mixed citations. So something that entirely depended on WebCite would require six citations, but, for instance, if one citation was provided from gbooks and one from usenet, then a further two would be needed from WebCite. SpinningSpark 09:24, 22 June 2012 (UTC)
If you haven't already, you might want to check out Wiktionary:Votes/2012-06/Well_Documented_Languages. The voting is set to start in a couple of days, and that language is what would need to be changed to enact this proposal. --BB12 (talk) 17:14, 22 June 2012 (UTC)
It might be the same policy, but it is not the same issue. Is it sensible to piggy-back this proposal on it? And if so, how do we go about it? SpinningSpark 18:39, 22 June 2012 (UTC)
A separate vote is definitely needed. What I've found to be good is to wait until about two days after the BP conversation has stopped and then go to Wiktionary:Votes. The instructions are under "Starting a new vote on this page." It's confusing, so I go through the instructions multiple times to make sure I've done everything. Unfortunately the language in the other vote affects this proposal, so this proposal has to either be voted on after the other vote, or else the proposal has to be worded to cover both. --BB12 (talk) 23:46, 22 June 2012 (UTC)

I have a reply from Dr. Eysenbach, founder of WebCite:

We removed around 20 out of several million records. Note that they are only removed from public view, we still keep them in an internal archive and make them accessible on an individual basis.

That seems to pretty much settle the question of durability. SpinningSpark 23:14, 22 June 2012 (UTC)

Great! I think this is enough information to get a vote underway. Ungoliant MMDCCLXIV 23:20, 22 June 2012 (UTC)
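
For anyone drafting the vote text, the "exactly twice the normal CFI rule" idea above can be sketched in a few lines of Python. This is only an illustration: it assumes the normal requirement is three citations (as the figures in this thread imply), and the function names are made up for the example.

```python
REQUIRED = 3  # assumed normal number of citations, as implied by the thread


def is_attested(durable, webcite, required=REQUIRED):
    """True if a mix of citations meets the doubled-WebCite rule.

    A durable citation (print, Usenet) counts in full; a WebCite
    citation counts for half, so twice as many of them are needed.
    """
    # Compare in half-citation units to stay in whole numbers.
    return 2 * durable + webcite >= 2 * required


def webcite_still_needed(durable, required=REQUIRED):
    """Number of WebCite citations needed on top of `durable` ones."""
    return max(0, 2 * (required - durable))
```

With these assumptions, a term resting entirely on WebCite needs six citations, and a term with one Google Books and one Usenet citation needs two more from WebCite, which matches the examples given above.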

Per-category collation orders

This question has come up a few times before I think, most recently at WT:ID#Alphabetic order. For a project like Wiktionary which handles so many different languages, it seems very strange that there is no way to handle collation orders better than the way we do it now. To clarify this: a collation order determines the ordering of words in a sorted list, such as (especially) categories. The reason this is important is that every language has its own alphabet or script and may have different rules for ordering the characters. Swedish, for example, considers the characters Å, Ä, Ö as separate letters and orders them at the end of the alphabet after Z, while in German the characters Ä, Ö, Ü are considered equivalent to A, O, U. In Hungarian, Cs is considered a separate letter and is ordered between C and D, as is É which is placed between E and F, while in French É is equivalent to E. Lithuanian, meanwhile, orders Y to be placed between I, Į and J, K. Currently the wiki software always orders entries by Unicode encoding order, which is of course not right for many languages. Our solution so far has been to add sort= to many templates, but this approach has a few serious flaws:

  • It has to be added to every template that adds entries to a category, every time it's used in an entry. It's likely to be missed or forgotten.
  • It does not change the collation order of characters, it can only make an entry behave as if it had another name for collation purposes, so it only works for languages that have the same collation order as Unicode.
  • Hence, it does not work for languages like Lithuanian, German or Danish that sort their letters differently from the Unicode ordering.
  • Nor does it work for languages like Hungarian that consider a sequence of characters to be a separate letter.

For this reason I'd like to petition for a feature that allows Wiktionary to specify, for each category individually, which collation order should be used. Presumably such an order would be selected depending on the language code used, so the ordering would be set by the category boilerplate template and be completely automatic from an editor's point of view. —CodeCat 15:30, 22 June 2012 (UTC)

mw:Bugzilla/Notes/164 is quite similar in purpose. --Bequw τ 17:43, 22 June 2012 (UTC)
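
To make the request above concrete, here is a rough Python sketch of what a per-language collation order involves. It is not how MediaWiki works or would work; the function name and both alphabets are assumptions for illustration, and the Hungarian list is deliberately simplified to the letters mentioned above (Cs between C and D, É between E and F).

```python
def make_sort_key(alphabet):
    """Build a sort key function from a list of letters in collation order.

    A "letter" may be more than one character long (e.g. Hungarian "cs").
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)

    def key(word):
        word = word.lower()
        ranks = []
        i = 0
        while i < len(word):
            # Greedy longest match, so "cs" is read as one letter.
            for size in range(longest, 0, -1):
                chunk = word[i:i + size]
                if chunk in rank:
                    ranks.append(rank[chunk])
                    i += len(chunk)
                    break
            else:
                # Unknown character: sort after all known letters.
                ranks.append(len(alphabet) + ord(word[i]))
                i += 1
        return ranks

    return key


# Swedish: Å, Ä, Ö are separate letters placed after Z.
SWEDISH = list("abcdefghijklmnopqrstuvwxyzåäö")

# Simplified, hypothetical Hungarian order covering only the cases above.
HUNGARIAN_SKETCH = (
    list("abc") + ["cs"] + list("de") + ["é"] + list("fghijklmnopqrstuvwxyz")
)

swedish_words = ["öl", "zebra", "apa", "ärta", "år"]
print(sorted(swedish_words))                              # code point order
print(sorted(swedish_words, key=make_sort_key(SWEDISH)))  # Swedish order
```

Plain `sorted()` puts ärta before år, because it compares code points; the Swedish key gives apa, zebra, år, ärta, öl. A sort= parameter cannot express either this reordering or the Hungarian digraph, which is why the order would have to be chosen per category. A real implementation would also have to handle cases this sketch ignores, such as a "cs" that spans two parts of a compound.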

Official Wiktionary Mobile app for Android released

I'm happy to announce the Mobile Wiktionary app for Android is now available via Google's app store. We do not have an iOS app available yet, but we'd be happy for other volunteers to join this project to build it!

This app allows people to browse wiktionary in their own language anywhere they have digital services for their Android device. But there are always things which can be improved, and we need to work together to make this the best dictionary app. Nearly all of the appearance and content is drawn from the individual wiktionary, via Mobile Frontend, so this app is a real partnership between the developers and the community.

  • The wiktionary app has feedback pages: meta and mediawiki. Please leave comments on either of them. We also track the app store feedback, wiktionary-l, the mobile-l, and we are (usually) available on IRC at #wikimedia-mobile.
  • There is a bugzilla product: Wiktionary App (links to open app bugs search). If you see something you think might be a bug, or a cool feature, or just want something made a bit more obvious, report a bug for this product and we will try to respond quickly.
  • Each wiktionary has control of how their main page appears as the 'start-up' page for users in their linguistic localization. MediaWiki.Org has guidance on improving your home page.
  • Certain features of Mediawiki's standard javascript and css are not forwarded to mobile devices. This includes common.js. Displays which rely on common.js will appear broken on mobile devices, potentially making your community's work completely illegible. Please keep in mind that appearance is of lower priority than content, and develop layouts which do not rely completely on js.

- on behalf of the UCOSP Spring 2012 students, Amgine/ t·e 15:43, 22 June 2012 (UTC)

Quick look: Congratulations. Not bad at all. What about making it possible to make type larger for folks like me? Some web pages allow scaling on an as needed basis. DCDuring TALK 16:47, 22 June 2012 (UTC)

User:Darkicebot for manual adding for Ido verb forms

Lately, you guys might have seen me adding Ido verb forms using AWB on this account. What I would like to ask the community is if I could continue doing this, but on an account that has a bot flag, like my bot Darkicebot, to keep these creations off of the main recent changes page. I realize I haven't been the most trustworthy of people in the past, but I've turned over a new leaf, and I definitely know how to handle running a bot. My bot, Darkicebot, has made over 2 million edits globally, and has been run for over two years in the past. While I won't be using Pywikipedia for this, I will be "manually" typing in what it needs to add and "manually" running this bot's task. It won't be automated except for the addition of the verb forms. Everything else, including the preparation, will be manual.

Thanks, Razorflame 18:42, 22 June 2012 (UTC)

Probably needs a vote. Equinox 00:25, 25 June 2012 (UTC)
I know it will need a vote, but before a vote can begin, a discussion has to be had, otherwise, the vote cannot even start. Further complicating the matter is the fact that my bot has been blocked since Encyclopetey blocked him over two years ago, when I made a few test edits with him, and he refuses to unblock him for any reason. Razorflame 01:29, 26 June 2012 (UTC)
20385 User contributions. "...a few test edits". - Amgine/ t·e 18:48, 27 June 2012 (UTC)
There was a period in time when he was not blocked that I used him for test edits. I did not remember the exact number of edits because it was like four years ago. I thought he had only made a few edits, guess not XD Razorflame 18:56, 27 June 2012 (UTC)

Unless anyone has anything else they want to say about this, I will go ahead and make the vote tomorrow. That should give people enough time to write any other concerns or questions they might have for me. Razorflame 22:39, 28 June 2012 (UTC)

There are some things I don't understand. For example, what is AWB? If Darkicebot has made over two million edits globally, why do you need this permission? What is a bot flag? How exactly have you turned a new leaf and is there evidence of that? What is the difference between running a bot "manually" and running one "automatically" (and what do the quotation marks mean)? I understand that most of these questions are due to ignorance on my part, but given your past, my intention is to vote against this unless I thoroughly understand all the issues. --BB12 (talk) 23:02, 28 June 2012 (UTC)
Given that I don't really know you very well, I find this a little disturbing, but will answer your questions. AWB is AutoWikiBrowser. It is a program that people use to help them do multiple manual tasks that are the same over a large number of pages quicker than normal. While Darkicebot has made over 2 million edits globally, most of those are on the Wikipedia side. I need this permission in order to run my bot because WT:BOT says that I have to have the bot flag in order to run my bot. You can find evidence that I have turned over a new leaf by looking back through my contributions since I've come back. The difference between a bot that is running manually and one that runs automatically is that the bot operator is running the bot and overseeing it when it is manually run. The bot operator does not monitor an automatically run bot. Quotation marks were meant to be bolding, not quotation marks. Razorflame 23:10, 28 June 2012 (UTC)
Well, I don't know you at all, but it seems this is a very important issue, so thank you for the response. On June 16, 2012, you said, "Maybe you should wait until I actually come back before making these kinds of remarks." That was in response to a comment almost a year earlier. According to [user log], that was the day you came back and you have made thousands of changes since, which as far as I know are good contributions. Given the combative tone in that response, however, it seems that you still have an attitude of aggression rather than remorse, and so I wonder how sincere your turning over a new leaf is. --BB12 (talk) 23:25, 28 June 2012 (UTC)
My only attitude of aggression that I have on this site is towards one particular user. Hence, my aggressive tone in that response. Razorflame 23:22, 29 June 2012 (UTC)
Thank you. --BB12 (talk) 20:59, 4 July 2012 (UTC)

Numbers and numerals, again (again)

Please don't hit me! :( This topic has been discussed so many times... I'm hoping this one will be more conclusive. The question is about whether the category containing words like one, seventy and so on should be called Category:English cardinal numerals or Category:English cardinal numbers, and similar for ordinals and the category containing both. Up till now, every language has gone its own way, some using numbers and some using numerals. One point that was overlooked in previous discussions is that we also have another category called 'numbers', which is used by the {{symbcatboiler}} template and is used for numeric symbols like 1, 2, and so on (compare Category:English letters). So, in effect, the category named Category:English numbers could use either {{poscatboiler}} or {{symbcatboiler}}, depending on what entries you put in it. This hasn't been a major problem so far, but there has been some discussion about merging several of the category boilerplate templates. And to do that, this question needs to be solved because the two meanings of 'numbers' conflict and the templates can't be merged without renaming one of them. So what should be done? —CodeCat 12:11, 23 June 2012 (UTC)

I've always been in favor of 'number' purely because it's the most used, and I don't believe 'numeral' is in any way more correct than 'number'. We've done Google Books searches and we know that 'cardinal number' and 'ordinal number' are more common (on Google Books) than 'cardinal numeral' and 'ordinal numeral'. In my opinion, the biggest roadblock to consensus has been a couple of influential administrators trying to overrule the majority and impose numeral instead of number. The majority of users are happy with the statistically more common 'number', but were subjected to reversion, blocking and threats of lifetime blocks, so we ended up in this no consensus situation. Mglovesfun (talk) 14:15, 23 June 2012 (UTC)
But now that there is this conflict of names, which should be called what? Is 1 a numeral and 'one' a number, or is 1 a number and 'one' a numeral? Or something else? —CodeCat 14:18, 23 June 2012 (UTC)
Isn't 'one' a number as well as '1' is, while '1' is also a numeral (in its first sense)? We can go by this logic if there's necessity for further distinction and accompanying categorization of 'one's and '1's. --BiblbroX дискашн 22:38, 23 June 2012 (UTC)
On the side note (if worth noting): at the time of writing ordinal numeral is a red link. --BiblbroX дискашн 22:47, 23 June 2012 (UTC)
I don't think that the word one is an "English number"; rather, it's the English word for a number. That said, I don't know if "numeral" is much improvement, since although in theory it refers to any sort of representation of a number (and linguists do use it to refer to number words), in practice I think that most normal people understand it to refer to things like 1 (Hindu-Arabic numerals), I (Roman numerals), and so on. —RuakhTALK 14:25, 24 June 2012 (UTC)
Then I think in this case our usage should reflect common understanding, rather than linguistic definitions most people won't be familiar with. —CodeCat 11:57, 27 June 2012 (UTC)


If Ruakh is right that "numerals" are any representations of numbers, my slight preference is to use the terms in the technical way, 1=number and one=numeral. However, I don't mind doing it the other way around, as long as we standardise on something. - -sche (discuss)
What -sche, times a million! Mglovesfun (talk) 22:24, 18 July 2012 (UTC)
I think we mostly already agree we need to standardise, the question that still needs answering is on what. —CodeCat 23:04, 18 July 2012 (UTC)
@-sche: To clarify my comment: in technical usage, the symbol 1 and the word one are both "numerals", because they're both representations of a certain number. (When I wrote that linguists use the term numeral to refer to number words, I did not mean to imply that they consider that term to refer exclusively to number words, only that they, unlike most people IME, consider that term to be applicable to number words.) —RuakhTALK 02:42, 19 July 2012 (UTC)
Wiktionary considers only terms, and linguistically all terms represent certain concepts, so 'one' is no different from 'house' in this way. The big difference though is that 'one' represents the English word for the first positive integer, while '1' represents just the integer, although it may also stand in for the word in any language in certain contexts. So you can see it this way: <one> represents the phonemic sequence /wʌn/, which in turn represents the first positive integer. Whereas <1> represents either the first positive integer directly or, in English, most of the things that <one> may represent. —CodeCat 22:20, 19 July 2012 (UTC)

Foreign Word of the Day

Several years ago there was a project for creating the foreign-language equivalent of Word of the Day (Wiktionary:Word du jour/Nominations).

I think a foreign word of the day would encourage foreign-language contributors to write high-quality entries, and help spread the fact that our intention is to include all words in all languages. Would anyone be interested in reviving this (with new name, rules, etc.)? Would you accept adding a new box in the Main Page for this feature? Ungoliant (Falai) 03:36, 24 June 2012 (UTC)

Any problem with just reactivating? --BB12 (talk) 03:41, 24 June 2012 (UTC)
I believe this was discussed before, and it was suggested by someone (possibly me) that we should start it as a "word of the week", to cut down on the effort and words required, and switch to a daily format later if needed. - -sche (discuss) 03:48, 24 June 2012 (UTC)
I can see this opening the door to politics: we have a certain portion of the contributors who are real partisans where their mother tongue is concerned (exhibit A: How the Tamils invented writing, exhibit B: Etymology as an insult against a superior people). Every time we choose a word, someone will interpret it as an endorsement of the language in question, and as a slight to another language whose word wasn't chosen. Given that our admins' expertise is concentrated mostly in Europe and certain parts of Asia, I can see how we end up either a) neglecting whole regions or b) blundering into political landmines because we've ventured into areas we don't know. Chuck Entz (talk) 05:07, 24 June 2012 (UTC)
I can imagine setting the last day of each month as a deadline for two months after that month (the end of April is the deadline for June words). Priority is given in order listed, with a precedence for languages not represented in the past three months. --BB12 (talk) 05:31, 24 June 2012 (UTC)
I hadn’t thought about that. It’s certainly a valid point, but I don’t think the problem is as serious as that. For one, those two exhibits are complete nonsense, and we shouldn’t base our consensus on such tosh. It’s also worth noting that in both cases the commenters are angry at the etymology section of an English entry! As an advantage, this could encourage such contributors to create high-quality entries in their mother tongue. As for neglecting whole regions, aren’t we already neglecting the entire non-English speaking world?
I wasn't saying it would cause problems, just that it could- especially if we don't think about such considerations. I don't apologize for giving a couple of extreme examples: we never have problems with the vast majority of contributors of any nationality- it's the few gung-ho idiots that we have to watch out for. I just don't want to unnecessarily give them an excuse to act up. Chuck Entz (talk) 21:14, 24 June 2012 (UTC)
If this is to be revived, rules should be carefully thought out in order to avoid political landmines, such as avoiding repeating the same language, languages from the same region, or languages associated with a certain culture too frequently. If we are careful, a Foreign Word of the Day won’t open any door to politics which isn’t already open: if we define a Serbo-Croatian WOTD, Croatians who hate Serbians and vice-versa will be angry, but since we include Serbo-Croatian words in the main namespace they would be angry anyway. — Ungoliant (Falai) 06:10, 24 June 2012 (UTC)
How about rotating the continents or regions of Earth, so that one week is Europe week, another is Africa week, and so on? That would allow us to spread things some. —CodeCat 11:10, 24 June 2012 (UTC)
Excellent. In Africa week, we'll use French (Algeria); in South America week, Dutch (Suriname); and in Oceania week, English (Midway). Oh, drat, we can't use English.​—msh210 (talk) 18:06, 24 June 2012 (UTC)
I was wondering whether a Europe week would actually increase the focus on individual European languages since there are so few compared to other regions. But in addition to regions, other themes could be used: IE languages, endangered languages, lesser-spoken languages of Europe, little-known languages with populations over a million, European languages in the Americas (Pennsylvania Dutch, Hunsrik), lingue franche past and present, agglutinative languages, tonal languages, etc. --BB12 (talk) 18:39, 24 June 2012 (UTC)
I started a list at Wiktionary_talk:Word_du_jour/Nominations#Focus_weeks. --BB12 (talk) 20:20, 24 June 2012 (UTC)
Regardless of any measure we adopt to prevent bias, the focus should be on high-quality entries, not on the language itself. — Ungoliant (Falai) 00:23, 25 June 2012 (UTC)

Which IPA symbol to represent ch and j, and other IPA questions

I have noticed two differing viewpoints on these two pages, here and here. Which one is correct? More precisely, for the specific sounds j and ch, which is more acceptable and/or correct: d͡ʒ or dʒ, and t͡ʃ or tʃ? I also notice that the first link, the IPA Pronunciation key, has more IPA symbols for us to use, with super subtle differences in sounds. What is your take on that? Thanks in advance. Speednat (talk) 14:47, 25 June 2012 (UTC)

For English, the use of the tie bar is optional; including it is neither more nor less correct than omitting it. In other languages, like Polish, including the tie bar is necessary, because Polish actually distinguishes between [t͡ʃ] (as in czy) and [tʃ] (as in trzy). For English and other languages that don't distinguish them, it doesn't matter whether you include the tie bar or not. WT:IPA pronunciation key has more symbols than Appendix:English pronunciation because the former is for all languages, while the latter is only for English. —Angr 14:51, 25 June 2012 (UTC)
I wonder if Polish does distinguish them, what is the difference in actual pronunciation between them? —CodeCat 15:08, 25 June 2012 (UTC)
They both have sound files; listen for yourself! :) —Angr 15:26, 25 June 2012 (UTC)
Although many English dictionaries use /tʃ/ and /dʒ/, the correct symbols for these affricates are /t͡ʃ/ and /d͡ʒ/. An affricate is one sound, one phoneme, so it should be written with a tie bar above. An English speaker who looks at the pronunciation knows that /tʃ/ is one sound and that it's pronounced as a voiceless palato-alveolar affricate, not a voiceless alveolar stop + voiceless palato-alveolar fricative, because there is no such consonant cluster in English. It's not a big problem in English pronunciations, but in Polish, for example, affricates can contrast phonemically with stop-fricative sequences: "czysta" /ˈt͡ʂɨsta/ and "trzysta" /ˈtʂɨsta/ are different words. Maro 16:02, 26 June 2012 (UTC)

I believe the tie bar turns cat shit into catch it. Michael Z. 2012-06-27 02:48 z

The only difference I hear between those phrases is in stress and maybe length of the first vowel (ˈkæˌtʃɪʔ vs. ˈkæˑ.tʃɪʔ (or, in either case, with at the end)).​—msh210 (talk) 04:22, 27 June 2012 (UTC)
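
For tools that compare or search transcriptions, it may help to know that the tie bar is a separate combining character, U+0361 (COMBINING DOUBLE INVERTED BREVE), placed between the two symbols. The following Python sketch, with made-up function names, shows how transcriptions could be compared while ignoring it. Per the comments above, that is only appropriate for languages such as English that do not contrast the affricate with the stop-fricative sequence; for Polish it would erase a real distinction.

```python
TIE_BAR = "\u0361"  # COMBINING DOUBLE INVERTED BREVE, the tie bar above


def strip_tie_bars(ipa):
    """Return the transcription with every tie bar (above) removed."""
    return ipa.replace(TIE_BAR, "")


def same_ignoring_tie_bars(a, b):
    """True if two transcriptions differ at most in their tie bars."""
    return strip_tie_bars(a) == strip_tie_bars(b)
```

Under this comparison /t͡ʃ/ and /tʃ/ count as the same, and so, wrongly for Polish, do /ˈt͡ʂɨsta/ and /ˈtʂɨsta/.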

Robot to create inflected forms

I've noticed for some languages with lots of inflected forms (e.g. Latin), we often have inflection tables in the main entry, but the inflections are either redlinked, or blue-linked but to another language. I was thinking, why not create a bot that parses the inflection tables and then creates the redlinked pages, or adds e.g. a Latin section to the page blue-linked for another language. The inflected forms entries would be rather pro forma, so a bot should be able to do that. I'm not volunteering (I might give it a go one day, but don't have the time right now.) Was more just raising the suggestion to see what others think. Has this idea been discussed before? ZackMartin (talk) 09:45, 26 June 2012 (UTC)

I've been here since 2009 and to my knowledge... no it hasn't. Interesting, I will let someone with more knowledge on the matter reply. Mglovesfun (talk) 09:52, 26 June 2012 (UTC)
We have User:FitBot, though it looks as though it hasn't been run in a while. -Atelaes λάλει ἐμοί 10:19, 26 June 2012 (UTC)
Would be great if there was a good bot to do this. --Anatoli (обсудить) 23:04, 26 June 2012 (UTC)
I'm currently working on an Ancient Greek inflection bot. Aside from orthography, Ancient Greek and Latin are very similar in regards to their inflection. I might be able to do both. Note the "might" in that sentence. If anyone has their heart set on getting these forms, they might want to pursue other avenues in the meantime. -Atelaes λάλει ἐμοί 01:33, 27 June 2012 (UTC)
Um, User:SemperBlottoBot, right, or am I missing something? Latin inflected forms should all be bluelinks. Anyway, thanks, Jesse, and I'm sorry that I haven't really been able to help beyond suggesting it. --Μετάknowledgediscuss/deeds 14:23, 27 June 2012 (UTC)
SemperBlottoBot does indeed create inflected forms of Latin, Italian and French nouns, adjectives and verbs. It tends to run regularly only against the language that I happen to be working on at the moment. Latin was getting a bit behind but should now be up to date. It doesn't cover unusual inflected forms, so some Latin verbs, especially, will remain with red links. SemperBlotto (talk) 14:41, 27 June 2012 (UTC)
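
As a rough illustration of the decision described at the top of this section (red link, blue link for another language only, or entry already present), here is a small Python sketch. It does not reflect how SemperBlottoBot or FitBot actually work; the function name and data shapes are assumptions, and a real bot would have to read the forms from the inflection templates and query the wiki for existing pages.

```python
def plan_edits(lang, forms, existing_pages):
    """Decide what a bot would need to do for each inflected form.

    forms: inflected forms read from an inflection table.
    existing_pages: dict mapping page title -> set of language
        sections already present on that page.
    Returns a dict mapping each form to "create", "add section" or "skip".
    """
    plan = {}
    for form in forms:
        sections = existing_pages.get(form)
        if sections is None:
            plan[form] = "create"       # red link: no page yet
        elif lang not in sections:
            plan[form] = "add section"  # blue link, but other languages only
        else:
            plan[form] = "skip"         # entry already exists
    return plan


# Hypothetical data: some forms of Latin "rosa".
pages = {"rosa": {"Latin", "Italian"}, "rosas": {"Spanish"}}
print(plan_edits("Latin", ["rosa", "rosae", "rosas"], pages))
```

Irregular or unusual forms would still need human review, as noted above for some Latin verbs.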

Indefinite or long-term blocking of registered well-meaning editors

Past discussion: User talk:Msh210/Archive/Razorflame.​—msh210 (talk) 04:24, 27 June 2012 (UTC)

I think we need a policy statement on this matter. Active registered users who are trying to be useful should not be treated the same way that we treat anonymous IPs who vandalize or write love messages. Specifically I am referring to the case of User:Luciferwildcat. Equinox and Mglovesfun want to block him indefinitely, but without any sort of due process. Equinox gave the reason as "Abusing multiple accounts: User:Gtroy, User:Acdcrocks. persistent bad editing and refusal to improve", but the accounts User:Gtroy and User:Acdcrocks have not been used in over eight months (and were only ever used because of a lopsided personal dispute with Dick Laurent when Gtroy first started editing back in 2011). No diffs or any other evidence supporting the charges were ever provided, and I could not find any warnings or anything that looked like a reasonable and fair treatment. Then Mglovesfun gave the reason as "intimidating behavior/harassment, disruptive edits" (but still no diffs or evidence of any sort). I think the "intimidating behavior/harassment" that Mglovesfun mentioned refers to his saying that Equinox was mean (but I could not find the specific exchange). For the details, see User talk:Luciferwildcat#Unblock (23 June 2012).
This does not come up very often. It happened last year with User:Razorflame, and there were a couple of efforts to ban Daniel, and that editor who had a strange joint disease that forced him to type with his knuckles. In Daniel’s case and the knuckle-typer’s case, the matters were brought and discussed here, as they should have been. For User:Razorflame and User:Luciferwildcat, they were blocked, one for a year, the other permanently, without any discussion, specific charges, evidence, or any other process that would seem reasonable for a registered editor who is clearly trying to do a good job. I think we need a policy statement, beginning with this case against User:Luciferwildcat. I think such a block of this kind of registered user should be put to a community vote. —Stephen (Talk) 01:08, 27 June 2012 (UTC)

I would prefer not to have to go through a formal vote to block a problematic user. I'm sorry, but that's simply too much overhead. I tend to think that our current system, whereby an administrator can block users, but other administrators can contest such blocks, while imperfect, is probably the best. Speaking of Sven70 (talk • contribs) (the knuckle-typer), I think there are some strong parallels between that case and the current one with Lucifer. I remember it well, as I was the one who placed the long-term block. Sven70 had the bad habit of making mistakes, refusing to correct them or improve, and then playing the victim card when administrators chastised or blocked them. Lucifer is following a similar tactic. He does make a lot of good edits, but he also makes a lot of bad ones. His prodigious speed makes for a lot of mistakes that need to be cleaned up, and no one wants to be the one saddled with the tedious work of all that cleaning. Whenever someone asks him to improve, he either ignores it, or acts as though he is being needlessly persecuted. I think Mglovesfun and Equinox are doing a poor job of demonstrating it, but I think the block is quite justified. -Atelaes λάλει ἐμοί 01:54, 27 June 2012 (UTC)
Maybe a formal vote to set the initial block is too much, but we could instead require that a vote is started after it is placed, in order to make it last longer than, say, a week. —CodeCat 02:05, 27 June 2012 (UTC)
Or perhaps if someone contests a block they can start a vote, with the burden of consensus on retaining the block? -Atelaes λάλει ἐμοί 02:25, 27 June 2012 (UTC)
Then we'd have to allow blocked users to start and modify votes, which opens up all kinds of issues (like sockpuppets). Besides, a lot of blocks seem to go unnoticed by most users, so it's likely that many users who would contest it don't because they didn't know it happened. And the blocking administrator of course has no personal incentive to contest the block. So, requiring by policy that the blocking administrator starts the vote prevents that. —CodeCat 11:51, 27 June 2012 (UTC)
I think Atelaes meant if an admin contests a block, as in the present case. And we are only talking about the blocking of registered, active, well-meaning users, not anons or vandals like WF. The registered user who is being blocked would not necessarily have to vote, although it would not hurt to allow him to vote and postpone the block until the vote is in. After all, it isn’t about vandals. As I pointed out, this rarely ever comes up. —Stephen (Talk) 23:58, 28 June 2012 (UTC)
In the specific case of Lucifer, I agree with the block, and had been tempted to block him myself several times. I agree that most of his edits were good, but he could never have been whitelisted and it was just too much trouble to review them all. I could never find an individual edit that would have justified a block, so never did it; but cumulatively, he was just very annoying. SemperBlotto (talk) 07:11, 27 June 2012 (UTC)
Yeah it's a really tricky one. As a rule, we should put Wiktionary first and editors second. That said, I said I'd indef block Luciferwildcat if he verbally abused anyone else, and I don't think he has yet, so I'm okay with the unblock. Mglovesfun (talk) 09:51, 27 June 2012 (UTC)
I strongly agree with the comment above that "His prodigious speed makes for a lot of mistakes that need to be cleaned up, and no one wants to be the one saddled with the tedious work of all that cleaning." I tried to go through all of his contribs checking formatting and attestability, at a time when he'd made ~5000; I got through ~1000 before stopping. Whether to block him or not is a tricky question, because the majority (>50%) of his work is acceptable. Also, FWIW, his contributions to Wikipedia are also split, sometimes acceptable, sometimes undesirable. - -sche (discuss) 20:22, 27 June 2012 (UTC)

@Stephen G. Brown: I really respect the effort you've put into this matter, and I generally agree that blocking oversight would be good. However, I expect if it were to be put to a vote, Lucifer would be blocked anyway. All the same, I will support endeavors to build a policy and I even think creation of a vote, even in this case, can only benefit us by making the community opinion indisputably clear. --Μετάknowledgediscuss/deeds 14:32, 27 June 2012 (UTC)

In Lucifer’s case, there have been several mentions here of only a few minor problems among his prodigious contributions. It sounds like his main problem is that he works too hard and does too much. Looking at his user page, apparently no specific complaints have ever been brought to his attention, only vague accusations of being slightly annoying, and inappropriate accusations of lying, exaggerated assertions of abuse and harassment, and a silly trumped-up charge of a year-old minor misdeed, almost all of which is simply not supported by the evidence. He didn’t abuse multiple accounts, hasn’t lied, has not harassed or threatened anyone. Nobody has pointed out a single actual error, just something about him being cumulatively annoying. He just does too much, works too hard, so we’re gonna chase him out with the shit-house broom.
Has anyone who has seen some of his occasional errors ever thought to make him aware of these errors and ask him to fix them himself? Apparently not. I think if you’re really going to block someone like him for whatever trouble he is causing, you should actually document some of it and explain it clearly and calmly on his talk page, instead of those contrived remarks about him lying, etc. I would do it myself, but I still have not seen any evidence of anything he has done wrong. —Stephen (Talk) 23:58, 28 June 2012 (UTC)
I just noticed this discussion, and while I am indirectly a part of it, I think that you should only block users if they are directly violating the rules and policies of the project, never for anything else. That being said, I still contest the one year block placed on myself, as I do not agree that it should have happened, and I agree that something needs to change to prevent such a thing from happening in the future. I agree that if more than a few administrators contest a block, it should not happen. Razorflame 23:30, 29 June 2012 (UTC)
@Stephen G. Brown no I think it's that he produces so many invalid entries, or entries with formatting problems, and seems to be unable to get better. Also personal attacks that have nothing to do with Wiktionary, and everything to do with anger/attempting to hurt people's feelings. As far as I can tell, you haven't actually bothered to do any research into the matter, which is why you're so badly informed. Mglovesfun (talk) 23:34, 29 June 2012 (UTC)
I think Stephen's points are valid, that there haven't been concrete, blockable offenses, and Lucifer's just been blocked because he does create some dodgy entries, and misspellings like [[chemical inbalance]], and no-one wants to monitor his many contributions for such things: but if we assume good faith, most of us have accidentally misspelt or misplaced entries before. (I've mistakenly created {{citations}} pages in the main namespace.) I'd support unblocking him. - -sche (discuss) 02:58, 30 June 2012 (UTC)
  • Nobody should be blocked for making bad edits in good faith, especially prolific contributors, who are sorely lacking on this project, and particularly people with a disability. If you don't like the x% junk that Luciferwildcat was producing - just tag it and let someone else clean it up. Content is more important than the length of the backlog or the average quality of editors. In any case, I think they should be unblocked. --Ivan Štambuk (talk) 16:40, 1 July 2012 (UTC)
    I have some reason to question this contributor's good faith. In what might be called excessive enthusiasm for entries relating to hair, such as for arm hair, this contributor has created entries for armhair and other similar terms with purported attestation. On closer examination the attestation is ambiguous or simply wrong. As you know, this kind of attestation provides the justification for the corresponding open compound, which would almost certainly fail SoP. This looks suspiciously like deception to me. DCDuring TALK 01:02, 4 July 2012 (UTC)
  • I have created this vote: Wiktionary:Votes/2012-07/Blocking of Luciferwildcat. - -sche (discuss) 23:39, 3 July 2012 (UTC)

Vote open for well documented languages

The vote has now begun on well documented languages. This will increase the number of languages covered by the recent limited documentation (endangered languages) vote from perhaps 3000 to around 6850. See WT:Votes/2012-06/Well Documented Languages to vote. --BB12 (talk) 06:40, 27 June 2012 (UTC)

Courtesy when reverting people's edits

I have been an occasional contributor at Wiktionary over quite a few years. I have made perhaps 100 edits I suppose, many of them eminently worthwhile, if I may say so myself. A while ago I began to get very annoyed that some of my edits, which I put some effort into, were being reverted without any comment or explanation. I find this so rude and annoying that I pretty much resolved to stop bothering to try to contribute. However, yesterday I relented and did make a small addition, which I was appalled to see was AGAIN reverted without any explanation [7]. I am astonished at this repeated discourtesy towards people who are trying to contribute. 11:20, 27 June 2012 (UTC)

Record... stuck... *cough*. Mglovesfun (talk) 11:34, 27 June 2012 (UTC)
Wow you're easily appalled, may I suggest you don't ever leave the house... you'd be amazed what happens in the real world. On an equally serious note, this does seem like a good spot to use undo rather than revert, as undo allows one to make a custom edit summary, and revert does not. This is what SemperBlotto did the second time right. Mglovesfun (talk) 11:37, 27 June 2012 (UTC)
That can be frustrating. My understanding is that some administrators delete a lot of garbage and given the small number of people available, there isn't time to give a careful review to a lot of issues. The solution is to bring the topic up at the Tea Room and find out how people feel about it. Also, if you create a user name and at least put a Babel box (see "user pages" on WT:USER) on your page, you will get more respect. That's because an IP number is an indicator of someone who is just having fun. --BB12 (talk) 19:08, 27 June 2012 (UTC)
Has anyone here ever wondered why there is only a "small number of people available"? Perhaps if biting the newbie wasn't a celebrated pastime here, Wiktionary would actually have more editors willing to help maintain the project. Kaldari (talk) 07:59, 29 June 2012 (UTC)
Hear, hear! The boilerplate welcome message aside, the nasty welcome that many people get seems counterproductive to me. --BB12 (talk) 09:44, 29 June 2012 (UTC)
As I told you last time — SemperBlotto's rollbacks were not rude. It's your response that was rude. Rudeness is decided by cultural norms, and Wiktionary does not have the cultural norm that you think it does. (You must have us confused with Wikipedia.) If you are unable or unwilling to adapt to Wiktionary practices, then you should leave. This sort of comment is rude and unacceptable behavior here. (The culturally appropriate behavior would have been to ask what was wrong with your edit.) —RuakhTALK 20:43, 27 June 2012 (UTC)
BB12 is right. If you create a user account, give yourself a name, even introduce yourself a little on your user page, it suggests that you are serious, as you say you are. --Haplology (talk) 04:21, 4 July 2012 (UTC)

Wiktionary validator

I'm writing a validator for en.wiktionary (it runs on the dump file) and I'm looking for some clarification about formatting translations.

  • qualifiers - multiline:
    Sometimes it is useful, or even necessary, to explain how to use a translation, but I cannot find a convention on how to format these explanations. Looking at actual translations, the situation is, oh well... anarchic. cousin#Translations is a good example.
    I'd like to see a single way to format translations. I like the "* language, *: sublanguage, *:: newline" format, as in cousin#Translations -> "nephew or niece of a parent" -> Chinese. Only the language line uses a bullet list, and definitions are one per line for complex cases (as in Mandarin) and on a single line for simple cases (as in Min Nan).
    If multiline is not necessary (or not to be used, since it is possibly messy if {{trans-mid}} ends up in the wrong place), a semicolon could be used instead of a newline (as in yours#Translations -> possessive pronoun -> Italian).
  • italic
    Are (''text''), {{italbrac|text}} and {{qualifier|text}} equivalent? Wouldn't it be better to use only one? (I vote for {{qualifier}} since it is meant to qualify a translation.)
  • translations should be short and "template only" (IMHO)
    It would be great (from a validator's point of view) to have only combinations of {{t}} and {{qualifier}} templates, with commas, semicolons and newlines (*::) to separate/group them. However, some cases are excluded (such as the example below), so the idea needs some work:
    * Arabic: {{t|ar|فراشة|sc=Arab|f|tr=fará:sha}}; (fertito) ''(Morocco)''; (fartattu) {{italbrac|Tunisia}}

--(Fedso (talk) 20:49, 27 June 2012 (UTC))
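The "template only" idea above could be checked mechanically. The following is only a minimal sketch in Python, not the validator being discussed: the function name `check_translation_line`, the regular expressions and the allowed-template set are assumptions made for illustration. It handles only "language: body" lines (not bare "*::" continuation lines) and flags any template other than {{t}} or {{qualifier}}, as well as free text left over once templates and separators are removed.

```python
import re

# Assumption: only these two templates are allowed, as proposed in the post above.
ALLOWED_TEMPLATES = {"t", "qualifier"}

# Matches a non-nested template and captures its name.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)(?:\|[^{}]*)?\}\}")
# Matches "* Language: body" and "*: Sublanguage: body" lines.
LINE_RE = re.compile(r"^\*:*\s*([^:]+):\s*(.*)$")


def check_translation_line(line):
    """Return a list of problems found in one translation line."""
    match = LINE_RE.match(line)
    if not match:
        return ["not a translation line"]
    body = match.group(2)
    problems = []
    for name in TEMPLATE_RE.findall(body):
        if name.strip() not in ALLOWED_TEMPLATES:
            problems.append("unexpected template: " + name.strip())
    # Whatever remains after removing templates should be separators only.
    leftover = TEMPLATE_RE.sub("", body)
    leftover = re.sub(r"[,;\s]", "", leftover)
    if leftover:
        problems.append("free text outside templates: " + leftover)
    return problems
```

Run against the Arabic line quoted above, a check of this kind would report both the {{italbrac}} template and the parenthesised transliterations as problems, which is exactly the kind of case the proposal leaves open.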

None of these questions have agreed upon answers and standardized formatting for anything beyond "definitions start with #" is likely to be contested. If you try to impose a standard for anything you will run into editors who have been formatting it in their personalized way for years and will resent your attempts at consistency. Sorry to be so negative but you're working against years of path dependence. 23:47, 27 June 2012 (UTC)
...I love challenges :) however maybe you are too pessimistic, there are already rules: Wiktionary:Translations, Wiktionary:ELE#Translations, Template:t; in the vast majority of cases editors comply with them. I noticed problems arise when, for complex cases, there is no standard and each editor uses the best format he can think of. Since "best" is subjective it is natural to end up with different formats.
That said, it is my intention to improve the current standard, not to impose a new one. I think a "good" standard should be naturally adopted with time and shouldn't be set in stone either (software dev. mindset: there is no "perfect", only "good enough" with a moving "enough"). Eventually, I think, a uniform format will also improve readability, giving more credit to the great hard work editors are putting into this project.—Fedso (talk) 10:02, 28 June 2012 (UTC)
Re italics: {{qualifier}} should be used, yes. {{italbrac}} has been deleted. - -sche (discuss) 00:44, 28 June 2012 (UTC)
I am doing something similar, i.e. extracting translations from the dump. You might have a look at User:Matthias Buchmeier/trans-en-es.awk.
  • multi-line: As far as I understand, multi-line translations are discouraged.
  • examples: Examples are supposed to be put on the respective non-English language section.
  • t-templates, template only translations: I agree that generally templates should be used for translations. However, there is a problem with Sum-Of-Parts (SOP) translations, for which no standard formatting exists. In particular, it's not clear how to add transliterations to SOP-translations and whether to use a template for them. Some SOP-translations are formatted with {{el-p}} (only Greek), some with {{onym}}, and others have individually wiki-linked words. Anyhow, IMHO the practice of putting transliterations in round brackets is not a good idea, as in that case there is no easy way to (automatically) identify them as a transliteration. Matthias Buchmeier (talk) 11:14, 28 June 2012 (UTC)
There is another thing you could validate for translations: genders. It may be a bit tricky because genders should only be present in noun translations, and generally not in adjective translations. And of course you need to identify which languages have genders and which don't. —CodeCat 11:26, 28 June 2012 (UTC)
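The gender check suggested above might look roughly like the sketch below. Everything in it is an assumption for illustration: the helper names, the idea that genders are the positional parameters of {{t}} after the language code and the word (as in the Arabic example earlier in this thread), the set of gender codes, and the short list of languages treated as genderless. A real validator would need a complete, agreed-upon list.

```python
# Assumption: gender codes that may appear as positional {{t}} parameters.
GENDER_CODES = {"m", "f", "n", "c"}
# Assumption: a tiny illustrative sample of languages without grammatical gender.
GENDERLESS_LANGS = {"fi", "hu", "tr"}


def parse_t(template):
    """Split a non-nested {{t|...}} call into (lang, word, genders)."""
    inner = template.strip()[2:-2]
    parts = inner.split("|")[1:]  # drop the template name
    positional = [p for p in parts if "=" not in p]
    if len(positional) < 2:
        return None
    genders = [p for p in positional[2:] if p in GENDER_CODES]
    return positional[0], positional[1], genders


def check_gender(template, part_of_speech):
    """Return a warning string, or None if nothing looks wrong."""
    parsed = parse_t(template)
    if parsed is None:
        return "malformed {{t}} call"
    lang, _word, genders = parsed
    if genders and lang in GENDERLESS_LANGS:
        return "gender given for a language assumed to have none"
    if genders and part_of_speech == "adjective":
        return "gender given on an adjective translation"
    return None
```

The part of speech has to come from the enclosing section heading, so a check like this only works once the validator tracks which section each translation table belongs to.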

Update: Thanks for the scripts Matthias, I gave them a good read :) Validator is slowly shaping up and I have a new question about headings:

  • In dog (English section), Noun is a level 3 heading and the next one, Synonyms, is level 5. In Wiktionary:Entry_layout levels are never skipped, but it doesn't actually say that it cannot be done. Your choice: should I make the validator strict or flexible when checking heading levels? —Fedso (talk) 10:03, 8 July 2012 (UTC)
I've now corrected the header levels at dog. At one point we had two separate etymology sections, so Noun was a level-4 heading (under ===Etymology 1===). When the second etymology section failed RFV, I guess the subsections within the Noun section never got promoted from level-5 to level-4. —RuakhTALK 16:13, 8 July 2012 (UTC)
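For the heading question, a skipped level like the one found at dog can be detected with a few lines. This is a hedged sketch only: the function name `find_skipped_levels` and the regular expression are assumptions, and it simply reports each place where a heading is more than one level deeper than the one before it. Whether the caller treats that as an error (strict) or a warning (flexible) is left open.

```python
import re

# Matches wikitext headings such as "==English==" or "=====Synonyms=====".
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def find_skipped_levels(wikitext):
    """Return (line number, title, previous level, level) for each skip."""
    problems = []
    previous_level = None
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        match = HEADING_RE.match(line)
        if not match:
            continue
        level = len(match.group(1))
        # Going deeper by more than one level means a level was skipped.
        if previous_level is not None and level > previous_level + 1:
            problems.append((lineno, match.group(2), previous_level, level))
        previous_level = level
    return problems
```

Moving back up by several levels at once (for example from a level-5 heading to the next level-2 language section) is normal and is not reported.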

Archaic en-verb.

¶ On the Spanish Wiktionary, we possess an alternative verb template that includes the common -est & -eth forms, so I propose that we permit one here as well. It need not be applied to every English verb, just the ones where the archaic inflexions are attestable. Are there any against this idea ? --Æ&Œ (talk) 23:12, 28 June 2012 (UTC)

This user objecteth not. —Angr 23:20, 28 June 2012 (UTC)

We can entitle the new Template as {{en-verb/2}}. I envision the Display in the Entries as :

carry (second‐person singular simple present carriest, third‐person singular simple present carries or carrieth, present participle carrying, simple past and past participle carried)

This may require a lot of rearranging though. I can attempt to create this Template myself, but even if I could modify the current Templates, I wou’d likely bungle the Set‐ups due to my lack of Experience with the Templates. --Æ&Œ (talk) 13:08, 29 June 2012 (UTC)

I think you should put (archaic) in front of carriest and carrieth. Siuenti (talk) 17:56, 29 June 2012 (UTC)
I think a better name would be eng-verb-arch, which is more mnemonic. Also, I suspect the "/" might cause it to be interpreted as referring to a subtemplate of en-verb called "2". As for format, I like the idea of having pairs of ·modern form·, (archaic) ·old form·, thus: "second‐person singular simple present carry, (archaic) carriest". That would give a better feel of the relation of the old to the new. I don't know what others might think about having so much verbiage in that place, but I could see the archaic template replacing the normal template when used, and having the usual stuff with the old forms integrated into it. The alternative would be using the arch template to provide just the old forms, on a second line. Chuck Entz (talk) 04:29, 30 June 2012 (UTC)
I seem to think this idea has already been discussed and rejected. Not that it shouldn't be discussed again. Mglovesfun (talk) 10:44, 1 July 2012 (UTC)

Wikimedia Foundation Request for Comment

You may be aware of the English Wikipedia's blackout to protest the proposed U.S. legislation Stop Online Piracy Act and PROTECT IP Act and the Italian Wikipedia's protest of the proposed Italian legislation DDL intercettazioni. The Wikimedia Foundation wants to know whether the Wikimedia community is willing for it to join an organization called the Internet Defense League, which has the professed aim of coördinating more such protests. Unfortunately, the Foundation representatives only directly notified that part of the community that is on the English Wikipedia. ☺ The RFC, on Meta, is hyperlinked above.

Uncle G (talk) 11:51, 29 June 2012 (UTC)

  • Thanks for informing people about the RfC. Just a quick remark that it is not true that "the Foundation representatives only directly notified that part of the community that is on the English Wikipedia" - this was posted on Wikimedia-l (formerly Foundation-l), quite the usual venue for such issues, and on the "Wikimedia Forum" on Meta. Regards, Tbayer (WMF) (talk) 15:23, 29 June 2012 (UTC)

Well documented languages - revision

After the vote for well documented languages began, controversy over whether a list of inappropriate or appropriate sources should be kept (for one-source words) led me to modify and reset the vote using "appropriate" (no change from the current rule with respect to that aspect). The vote can still be found at Wiktionary:Votes/2012-06/Well Documented Languages. Please vote! --BB12 (talk) 01:30, 30 June 2012 (UTC)

Conjugation tables

Related, or even as an alternative, to #Archaic_en-verb: the German Wiktionary has entire conjugation tables like de:melt (Konjugation) and de:be (Konjugation). These tables are incorrect in places ("have been been"), but if we could design correct tables, would it be desirable to include them in entries (in the way we include French verbs' conjugations), or on subpages (as de.Wikt does) or in appendices (as fr.Wikt does)? The tables could also contain archaic forms, like "beest"/"meltest" and "beeth"/"melteth". If nothing else, we should consider having three such tables in appendices: one of be, one as a model of transitive verb conjugation (with a "have been [past participle]" form), and one as a model of intransitive verb conjugation (without such a form). - -sche (discuss) 06:03, 30 June 2012 (UTC)

Subpages are nice but they would have to have language sections themselves for when the inflection of more than one language is shown. —CodeCat 10:40, 30 June 2012 (UTC)