Wiktionary:Beer parlour/2013/August

Add script to HTML language tags?

Now that we have a nice framework for languages and scripts set up in Lua, I was wondering if we could do this. The HTML standard allows a script subtag to be added to the language tag when disambiguation is needed. For example, it's valid to write lang="sh-Cyrl" in the HTML. This tells the browser specifically that it's dealing with Serbo-Croatian written in Cyrillic. Such information is obviously useful, so I think we could try to automate this through our modules. It would work like this:

If the language has only one script (in Module:languages), show only the language tag.
If the language has more than one script, then see if the script code is a valid script tag as well. Things like "unicode", "polytonic" and "fa-Arab" are not valid tags; they have to be four characters long and begin with a capital letter (the standard is case insensitive, but all the ISO codes are named that way).
If the script code is a valid tag, add it onto the language code. Otherwise, see if it contains a valid tag within it. This lets us extract "Arab" from "fa-Arab" nicely.
If there's still no script tag, then just show the language tag alone like normal.
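The four steps above can be sketched in Python for illustration (the real version would live in a Lua module). This is a minimal sketch under stated assumptions: the language data below is hypothetical and is not the actual content of Module:languages, and the function names are made up.

```python
import re

# Hypothetical, illustrative data: language code -> Wiktionary script codes.
# This is NOT the real content of Module:languages.
LANGUAGE_SCRIPTS = {
    "en": ["Latn"],
    "sh": ["Latn", "Cyrl"],
    "fa": ["fa-Arab", "Hebr"],
    "grc": ["polytonic", "Grek"],
}

# A script subtag is four letters with an initial capital, e.g. "Cyrl".
SCRIPT_SUBTAG = re.compile(r"[A-Z][a-z]{3}")


def find_script_subtag(script_code):
    """Return a valid script subtag in script_code, or None if there is none."""
    if SCRIPT_SUBTAG.fullmatch(script_code):
        return script_code
    # Look inside compound codes such as "fa-Arab".
    for part in script_code.split("-"):
        if SCRIPT_SUBTAG.fullmatch(part):
            return part
    return None


def html_lang(lang_code, script_code):
    """Value for the HTML lang attribute of text in lang_code/script_code."""
    # Step 1: single-script languages get the bare language tag.
    if len(LANGUAGE_SCRIPTS.get(lang_code, [])) <= 1:
        return lang_code
    # Steps 2-3: append a valid or embedded script subtag when there is one.
    subtag = find_script_subtag(script_code)
    # Step 4: otherwise fall back to the bare language tag.
    return f"{lang_code}-{subtag}" if subtag else lang_code


print(html_lang("sh", "Cyrl"))        # sh-Cyrl
print(html_lang("fa", "fa-Arab"))     # fa-Arab
print(html_lang("grc", "polytonic"))  # grc
print(html_lang("en", "Latn"))        # en
```

Codes like "unicode" and "polytonic" simply yield no subtag, so they fall through to step 4.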

This would be used in addition to our own script codes, which are placed in the "class" attribute rather than the "lang" attribute. So a fully-formed Cyrillic Serbian headword would look like this in HTML: мо̑ре. —CodeCat 11:54, 1 August 2013 (UTC)

Yes.

See the IANA Language Subtag Registry. We should try to respect the “suppress-script” attribute. For example, fa = Persian has suppress-script = Arab, meaning that Arabic is the language’s default script, and should not normally be indicated. —Michael Z. 2013-08-25 14:45 z

That would be more difficult to do, though. Is it an error to put the script in anyway? —CodeCat 14:55, 25 August 2013 (UTC)

It is not an error. BCP 47 says “the script subtag SHOULD be omitted when it adds no distinguishing value to the tag or when the primary or extended language subtag's record in the subtag registry includes a 'Suppress-Script' field listing the applicable script subtag.”[1]
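Honouring Suppress-Script would mean reading the registry's records, which are separated by "%%" lines with one "Name: value" field per line. The sketch below assumes that format and uses a tiny, abbreviated excerpt (real records carry more fields); the function names are hypothetical. Since adding the script is not an error, this is only the SHOULD-level refinement quoted above.

```python
# Abbreviated excerpt in the registry's record format: records are
# separated by "%%" lines and each field is written as "Name: value".
SAMPLE_REGISTRY = """\
%%
Type: language
Subtag: en
Description: English
Suppress-Script: Latn
%%
Type: language
Subtag: fa
Description: Persian
Suppress-Script: Arab
%%
Type: language
Subtag: sh
Description: Serbo-Croatian
%%
Type: script
Subtag: Cyrl
Description: Cyrillic
"""


def parse_suppress_scripts(registry_text):
    """Map each language subtag to its Suppress-Script value, if it has one."""
    table = {}
    for record in registry_text.split("%%"):
        fields = {}
        for line in record.splitlines():
            # Skip blank lines and indented continuation lines.
            if not line or line[0].isspace() or ":" not in line:
                continue
            name, _, value = line.partition(":")
            fields[name.strip()] = value.strip()
        if fields.get("Type") == "language" and "Suppress-Script" in fields:
            table[fields["Subtag"]] = fields["Suppress-Script"]
    return table


def tag_with_script(lang_code, script_subtag, suppress):
    """Append script_subtag unless it is the language's suppressed default."""
    if script_subtag is None or suppress.get(lang_code) == script_subtag:
        return lang_code
    return f"{lang_code}-{script_subtag}"


SUPPRESS = parse_suppress_scripts(SAMPLE_REGISTRY)
print(tag_with_script("fa", "Arab", SUPPRESS))  # fa
print(tag_with_script("sh", "Cyrl", SUPPRESS))  # sh-Cyrl
```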

However, I betcha if you add the script subtag, someone’s browser will use the wrong font, leading to constructive feedback. —Michael Z. 2013-09-09 17:22 z

Centralising related terms

Here’s an idea to reduce the amount of duplication in WT: moving sets of related terms to individual pages and transcluding them.

For example, the page possible has the following content in its Related terms section:

* [[possibility]]
* [[potence]]
* [[potency]]
* [[potent]]
* [[potentate]]
* [[potential]]
* [[potentiality]]

These words ultimately derive from the Latin verb possum. Each of them is also related to the others (as well as to possible), so each entry would have to repeat nearly identical content. Instead, we could move this content to a page like Etymon:en:Latin possum or Template:rel-terms/la-possum/en as:

* {{l-self|en|possibility}}
* {{l-self|en|possible}}
* {{l-self|en|potence}}
* {{l-self|en|potency}}
* {{l-self|en|potent}}
* {{l-self|en|potentate}}
* {{l-self|en|potential}}
* {{l-self|en|potentiality}}

We could even embolden the main words:

* {{l-self|en|possibility}}
* '''{{l-self|en|possible}}'''
* {{l-self|en|potence}}
* {{l-self|en|potency}}
* {{l-self|en|potent}}
* {{l-self|en|potentate}}
* '''{{l-self|en|potential}}'''
* {{l-self|en|potentiality}}

A template ({{rel-terms|en|Latin possum}}) could then transclude the content into each of the entries and display a link for editing the page. — Ungoliant ^(Falai) 12:19, 1 August 2013 (UTC)

I like the idea, rather like what we do with synonyms. I wonder whether this could somehow cover the Derived Terms as well (since they are a subset of Related Terms). Equinox ◑ 12:23, 1 August 2013 (UTC)

It can be done, but it won’t be as useful, since the set of terms derived from X will only be used in the page X. — Ungoliant ^(Falai) 12:25, 1 August 2013 (UTC)

This would certainly force us to be more rigorous in what is included in related terms. Are terms related if they are cognates through reconstructed languages, or just attested languages, or just the main ancestor language (i.e., Old English for English)? Even including both poss and potent stem words in English in the same equivalence class is not obvious.

Also, I think multi-word terms should be excluded in the first round of this. DCDuring TALK 13:20, 1 August 2013 (UTC)

Why stop at the Latin origin? If we are going to do this, we might as well go back to Proto-Indo-European and list all cognates that are present in English, even the less obvious ones. That's the only way I see that we can be completely objective about this. However, I don't think that ====Related terms==== is the best place to put these lists. Most people take that header to mean semantically related, and I myself have included terms there only if they were both semantically relevant and etymologically related. There is no real practical use to listing headphones as a related term to headword purely because they both contain "head", as those two words don't have anything to do with each other semantically. So I definitely agree with DCDuring that we should not include compound terms in these lists, only terms derived through affixation. But I also think that we should make a distinction between semantic relationships (people who found this term might want to find terms related to the same general subject) and etymological relationships, and I think that "related terms" should refer to the former, not the latter. We should use a different header name for etymological relationships. —CodeCat 13:31, 1 August 2013 (UTC)

We already distinguish them: Related terms is for etymological relation, and See also for semantic relation. — Ungoliant ^(Falai) 13:37, 1 August 2013 (UTC)

For terms more distantly related, only the main term should be included, IMO. For example, I’d include sedentary but not sedentariness as a related term of sit. — Ungoliant ^(Falai) 13:37, 1 August 2013 (UTC)

What would you show at [[sedentariness]]? Nothing? "See sedentary."?

More fundamentally, how does this help users? What kind of users would it help? DCDuring TALK 13:58, 1 August 2013 (UTC)

Avoiding duplication lets us offer Related Terms for many, many more words without the slow and error-prone process of us having to type them at each entry. That is a user benefit. Equinox ◑ 14:02, 1 August 2013 (UTC)

sedentary, sedentism, sedentarise, sedentarisation. — Ungoliant ^(Falai) 14:09, 1 August 2013 (UTC)

What I do, if I find that I am adding a group of related words, is to construct a ====Related terms==== section in Notepad (not saved anywhere), then copy/paste it into each word, remembering to remove the word itself from its own list. Simples. SemperBlotto (talk) 14:09, 1 August 2013 (UTC)

I also do that (though I have a script that automatically removes the entry, wikifies and adds box templates based on size). But it often happens that I later find another related term, and then have to manually add it to a bunch of entries.
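The script mentioned above isn't shown, so here is a hypothetical Python sketch of the same workflow: one shared list, the entry dropped from its own copy, the rest wikified, and long lists boxed. The size threshold and the {{top2}}/{{mid2}}/{{bottom}} column templates are assumptions, not necessarily what that script uses.

```python
# Hypothetical size above which the list is split into two columns.
BOX_THRESHOLD = 6


def related_terms_section(terms, entry):
    """Build a Related terms section for `entry` from a shared list of terms."""
    # Remove the entry itself from its own list, then wikify the rest.
    items = [f"* [[{term}]]" for term in sorted(terms) if term != entry]
    lines = ["====Related terms===="]
    if len(items) > BOX_THRESHOLD:
        # Long lists go into a two-column box (assumed template names).
        half = (len(items) + 1) // 2
        lines += ["{{top2}}", *items[:half], "{{mid2}}", *items[half:], "{{bottom}}"]
    else:
        lines += items
    return "\n".join(lines)


POSSUM_FAMILY = ["possibility", "possible", "potence", "potency",
                 "potent", "potentate", "potential", "potentiality"]

# Each generated section would then be pasted into (or saved to) its entry.
print(related_terms_section(POSSUM_FAMILY, "possible"))
```

Adding a newly found term means editing POSSUM_FAMILY once and regenerating every section, which is the same saving the transclusion proposal aims for.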

An advantage that you might be able to more closely relate to: fewer edits to patrol. — Ungoliant ^(Falai) 14:16, 1 August 2013 (UTC)

How would the novice/casual contributor know how/where to add or edit the list? We already have to revert bad interwikis and addition of content to categories (as opposed to addition of categories to entries). This could end up as another opportunity for newbies to accidentally screw things up and feel dumb. Chuck Entz (talk) 14:18, 1 August 2013 (UTC)

The template transcluding the page would display a link for editing the page with related terms. {{Portuguese personal pronouns}} has one. — Ungoliant ^(Falai) 14:21, 1 August 2013 (UTC)

Aren't derived terms and related terms just an extension of the etymology? We should move these sections to the etymology section (in collapsible boxes of course) and extend the etymtree module to show them. DTLHS (talk) 16:07, 1 August 2013 (UTC)

They are supposed to be etymological relations, though many contributors can't read our minds and insist on adding terms that are merely semantically related.

Eliminating the heading would reduce the number of merely semantic additions. I think it would further diminish the potential for contributions by normal users, which is not something that concerns many here, but which does seem to worry some at WMF. DCDuring TALK 16:20, 1 August 2013 (UTC)

I oppose moving RT and DT to etymology sections. --Dan Polansky (talk) 11:37, 3 August 2013 (UTC)

I like this idea; it fits nicely with the idea mentioned in a recent discussion of moving all semantic relations into Wikisaurus. We already templatize certain usage notes that are included in multiple entries, such as the one about less/fewer, the one about ize/ise, and the one about Latin vowels/letter names. Re Equinox "I wonder whether this could somehow cover the Derived Terms as well (since they are a subset of Related Terms)": I think it could, using a mechanism similar to Template:etymtree's. And yes, if we could distinguish semantically related terms from etymologically related terms, that'd be great. I think we've all seen that no-one except veteran users perceives "related terms" as implying etymological and not semantic relation. - -sche (discuss) 17:05, 1 August 2013 (UTC)

I oppose moving all synonyms away from the mainspace to Wikisaurus. --Dan Polansky (talk) 11:37, 3 August 2013 (UTC)

This would introduce significant dispersion of content across various pages, which would impede further reuse and processing by third parties. Entries should be as static as possible, with no content transcluded or generated from other pages. If duplication is the problem, something similar to {{trans-see}} could be created, but only for related terms (e.g. "For more related terms, see the possible."). --Ivan Štambuk (talk) 17:41, 1 August 2013 (UTC)

I agree: keep RTs where they are, enhanced with "For more related terms, see the possible." or the like. --Dan Polansky (talk) 11:31, 3 August 2013 (UTC)

I have made a new etymtree module at Module:User:DTLHS/etymtree. Some examples:

Albanian ujë: {{:User:DTLHS/Template:test|ine-pro/*wódr̥|sq|ujë}}

Czech voda: {{:User:DTLHS/Template:test|ine-pro/*wódr̥|cs|voda}}

English whiskey: {{:User:DTLHS/Template:test|ine-pro/*wódr̥|en|whiskey}}

DTLHS (talk) 03:15, 2 August 2013 (UTC)
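DTLHS's module isn't reproduced here. As a rough, hypothetical illustration of what an etymtree-style template could do with its arguments (root, language, term), this Python sketch renders a small descendant tree as nested wikitext bullets and bolds the entry being viewed. The tree is a tiny hand-made excerpt, and neither the data layout nor the function reflects the actual module.

```python
# A tiny hand-made excerpt of descendants of PIE *wódr̥. Illustrative only:
# not DTLHS's module, nor its data format.
TREE = {
    "lang": "ine-pro", "term": "*wódr̥", "children": [
        {"lang": "gem-pro", "term": "*watōr", "children": [
            {"lang": "ang", "term": "wæter", "children": [
                {"lang": "en", "term": "water"},
            ]},
        ]},
        {"lang": "sla-pro", "term": "*voda", "children": [
            {"lang": "cs", "term": "voda"},
            {"lang": "ru", "term": "вода"},
        ]},
    ],
}


def render(node, lang, term, depth=1):
    """Render the tree as nested wikitext bullets, bolding the current entry."""
    label = f"{node['lang']}: {node['term']}"
    if (node["lang"], node["term"]) == (lang, term):
        label = f"'''{label}'''"
    lines = ["*" * depth + " " + label]
    for child in node.get("children", []):
        lines.extend(render(child, lang, term, depth + 1))
    return lines


print("\n".join(render(TREE, "cs", "voda")))
```

Because every entry renders the same shared tree, each page differs only in which node is bolded.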

As a proof of concept I have created Template:etymtree/ine-pro/*dʰeh₁-, to show how big such pages could eventually become (I wasn't completely thorough, you could probably double the size of it without too much trouble). DTLHS (talk) 04:56, 2 August 2013 (UTC)

As an aside: to solve the redundancy problem of RT, I expanded Appendix:English words by Latin antecedents years ago. It is not linked from the mainspace, and has no anchors, but it makes it possible to collect lists of related terms in one place, for Latin-derived English words. I am not making any proposal, merely posting for interest. --Dan Polansky (talk) 11:31, 3 August 2013 (UTC)

The Template:etymtree/ine-pro/*dʰeh₁- example seems like a proof of lack of utility. I doubt that users want us to be all-inclusive as much as they want us to be appropriately selective.

Do all MWEs, all compounds, or indeed all multi-stem or multi-morpheme terms have to show all of the related terms for each component? Consider a term like triggerfish. Should its RTs include all terms related to trigger and all those related to fish? 60 or so DTs appear at [[fish]], a list that does not even include triggerfish. I could imagine some tree-size-reducing principles that would make Related terms more useful for users, though it would require some thought about how to be appropriately selective. Implementing something appropriately selective might require an algorithm and updating system that would dramatically reduce the number of people who could usefully contribute to the total implementation, or it might increase the complexity of the structure of the repository of RTs.

Some more thought about how to keep the RT list useful rather than merely comprehensive, while retaining the simplification and elimination of redundancy, seems essential. — This unsigned comment was added by DCDuring (talk • contribs).

One idea I had about making the large potential lists more useful would be to label each derivation (grammatical, compounding, borrowing, inherited, reduplication, etc.). Related terms could then be taken from anything borrowed or inherited but no other categories. DTLHS (talk) 00:45, 26 August 2013 (UTC)

It is the total size of the list that can overwhelm, not particular classes. Is there a way to include some additional derivation categories until the total number of RTs reaches some number, say, 12 or 20, never including grammatical derivation, but always including all of the borrowed and inherited categories? DCDuring TALK 12:18, 26 August 2013 (UTC)
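Combining the two suggestions above gives a simple capped selection rule. The Python sketch below is one possible reading, not an agreed design. It assumes each candidate carries one of DTLHS's labels, and the priority order for the optional categories is hypothetical. Labels outside both groups (such as "grammatical") are never selected.

```python
# Always kept, regardless of the cap.
ALWAYS = {"inherited", "borrowing"}
# Optional categories, added whole in this (hypothetical) priority order.
# Labels in neither group, e.g. "grammatical", are never selected.
OPTIONAL_ORDER = ["compounding", "reduplication"]


def select_related(terms, cap=12):
    """Pick related terms from (term, label) pairs, keeping the list short.

    Inherited and borrowed terms are always kept (even if they exceed cap);
    each optional category is then added in full, in OPTIONAL_ORDER, for as
    long as the running total stays within cap.
    """
    chosen = [term for term, label in terms if label in ALWAYS]
    for category in OPTIONAL_ORDER:
        batch = [term for term, label in terms if label == category]
        if len(chosen) + len(batch) > cap:
            break  # stop rather than let a lower-priority category jump ahead
        chosen.extend(batch)
    return chosen


candidates = [("a", "inherited"), ("b", "borrowing"), ("c", "grammatical"),
              ("d", "compounding"), ("e", "compounding"), ("f", "reduplication")]
print(select_related(candidates, cap=4))
```

Using `break` rather than `continue` means a small, lower-priority category is never included while a higher-priority one was left out.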

Why not use Wikidata for all those relations? Dakdada (talk) 15:13, 26 August 2013 (UTC)

A good idea. It seems the Wikidata people are more interested in Wikidataising definitions though. — Ungoliant ^(Falai) 15:15, 26 August 2013 (UTC)

Yes, that would be nice but I don't know when or if Wikidata will ever be added to Wiktionary, what work needs to be done to make that happen, how far away it might be, how to use Wikidata if it is installed, or basically anything about it. In the absence of any clear information all we have is Lua. DTLHS (talk) 21:49, 26 August 2013 (UTC)

Old Chinese etymologies

See also: Beer parlour/2013/June

From June until now there has been a rather long discussion (BP: Phonosemantic interpretation) related to Old Chinese etymologies, meaning the origin of words (as sounds, not characters) in Old Chinese.

To narrow the discussion, I’m breaking off a separate topic.

How should we treat Old Chinese etymologies?

The standard way to treat these would be to put them in the Old Chinese (L2) Etymology (L3) subsection. Unless there’s a good reason to do otherwise, shall we do this?

Content-wise, etymologies from Schuessler (2007) and Baxter & Sagart (2011) (citing these specifically), which appear to be the main authorities, sound acceptable.

In more detail…

We currently have virtually no entries in Category:Old Chinese language, there is no Wiktionary:About Old Chinese, and there is no mention of Old Chinese (or Middle Chinese) in Wiktionary:About Sinitic languages, so specific details may need some hashing out, but using the general format for now should be fine.

The main issue with Old Chinese etymology is that it’s under active research and relatively speculative, at least compared with, say, Proto-Indo-European. However, it seems sufficiently established (and certainly of interest!) that it’s acceptable to include some etymology information, with two caveats. First, it should be phrased as tentative, e.g., “Baxter-Sagart reconstruct as */t-lˤewk/”. Second, it should only be in the Old Chinese entry, not in entries for modern Chinese languages. We trace English terms back to PIE, but that’s quite established. If we get to a state where most modern Chinese words are traced back to Middle Chinese and Old Chinese, we can think about including pre-Old Chinese roots in the modern Etymology section, but that’s realistically a long, long way off.

AFAICT, the two main references are Schuessler’s ABC Etymological Dictionary of Old Chinese (2007) and Baxter & Sagart’s Old Chinese reconstruction (2011); we have some data based on the latter in Appendix:Baxter-Sagart Old Chinese reconstruction (thanks Gilgamesh!). It seems fine (indeed, better) to include these sources inline, as illustrated above, following the convention used in Latin (earlier etymologies are uncertain, and standard etymology dictionaries are thus referenced inline).

The proximate cause of this discussion was a number of edits by Lawrence J. Howell listing various sound symbolism origins of Old Chinese words in a new “Phonosemantic interpretation” subsection under the L2 Translingual section. We’ve generally instead been listing graphical origins of the character itself in this section, not of the word (the sound). The sound symbolism etymologies themselves are unsourced (the cited reference is a modern Japanese kanji dictionary, and has no contents on Old Chinese pronunciation AFAICT), and some editors (notably Wyang) argue that these go far beyond the sources. Further, sound symbolism in general is very easily speculative, so it raises alarms, even if it is ultimately well-founded.

Personally, I know nothing about Old Chinese etymology. Thus I’m happy to include content referenced from standard texts, but not anything that goes beyond these – this area is speculative enough as it is.

So to summarize: Old Chinese etymologies in the Old Chinese Etymology subsection, containing etymologies from the standard references, with citations – Schuessler (2007) and Baxter & Sagart (2011) (any others?)

How does this sound?

—Nils von Barth (nbarth) (talk) 15:35, 1 August 2013 (UTC)

I'd go as far as forbidding the addition of any unsourced etymology involving an unattested language (i.e. everything preceded by an asterisk; etymologies in which terms derive from an attested language such as Latin are more or less uncontroversial). It seems that all of the problems with etymologies that we have involve editors speculating, copy/pasting from outdated sources, or simply imaginatively synthesizing various sources (if I understood correctly, that is the main objection to the Old Chinese phonosemantic interpretations).

Regarding Old Chinese etymologies: Wyang's Proto-Sino-Tibetan appendices all seem to be sourced (we should probably create some reference templates so that we can categorize by them, specify page numbers, etc.). The only problem with Old Chinese entries that I can see is their names: if the authorities disagree too much on the reconstructed form (which seems to be the case), it could be problematic to create an NPOV-abiding naming scheme. --Ivan Štambuk (talk) 18:04, 1 August 2013 (UTC)

Thanks Ivan!

The one tricky thing about Old Chinese is that it’s attested in Chinese characters, but the reconstructed pronunciations are much less direct than in alphabetic languages. So Old Chinese goes in the main namespace (since it is attested), under the entry for the character. Since reconstructed pronunciations for Old Chinese are more tenuous than those for alphabetic languages, I agree we should be strict in requiring sources.

The etymology can then link to the roots in the Proto-Sino-Tibetan appendices, where the form of the word is itself at best tentative. So long as sources agree on which root, if not necessarily its exact form, it is probably clearest to link to the appendix entry and have the details there. This seems to be what Wyang has done in the naming, like *(t)sam, where “(t)” is uncertain.

As a concrete example of what we could do, *(t)sam gives reconstructed pronunciations for 彡 in Old Chinese and Middle Chinese, but the 彡 page doesn’t include a Middle Chinese or Old Chinese entry, and the Mandarin section doesn’t include any etymology section. The shortest option would simply be “Ultimately from Proto-Sino-Tibetan *(t)sam” in the Mandarin section, but adding Middle Chinese and Old Chinese entries would be helpful. These Modern/Middle/Old Chinese entries often look a bit redundant (often the same meaning and character) compared with languages where an alphabetic word form changes over time. However, meaning and character form do change over time, and separate entries are how we reflect it: secondary meanings developed at some point, and we obviously shouldn’t have Simplified Chinese entries for Middle Chinese or Old Chinese!

—Nils von Barth (nbarth) (talk) 23:32, 1 August 2013 (UTC)

Looks good to me! Separate entries for Old and Middle Chinese could be useful for providing separate citations, discussions on semantic shifts, newly derived terms (compounds) and similar. I also suspect that there would be many identical etymologies for all of the Sinitic languages - perhaps one day a more "integrated" approach would be feasible, should the community decide so. --Ivan Štambuk (talk) 00:21, 2 August 2013 (UTC)

I wouldn't decompose it into Old Chinese, Late Middle Chinese, Mandarin and 15 or so other ISO modern varieties because there would be so much overlap between them (each having tens of identical definitions), but I guess this goes against established community consensus. Some time ago I operated a bot on zh.wt to add all B-S (2011) Old Chinese etymologies to individual character entries. There are also templates there for other Middle Chinese and Old Chinese reconstructions (only placement in rhyme books is needed for Middle Chinese, e.g. zh:耳). An example is the previously mentioned zh:二 ("two"), where the glyph origin and PST etymology/cognates are incorporated into one page. That formatting (header language, order of languages, etc.) is very far from what is currently allowed here, and even partial implementation would be very discussion-consuming. Wyang (talk) 00:58, 2 August 2013 (UTC)

Just to make sure everyone is on board: no Howell & Morimoto anywhere. I don't think their theories (phonesthemic, or glyphic, or the messy confounding of the two) have even been published in a journal, leaving the work well outside academic credibility, let alone accepted lexicographical authority.

Note that the current Middle Chinese sections also need a reworking. They don't indicate "reconstructions" in any way, and don't indicate source, and don't indicate transcription type.

I would do:

Old Chinese (reconstructed)

Baxter & Sagart reconstruction (IPA): xyz

Schuessler reconstruction (IPA): xyz

Links to Wikipedia articles. And same for Middle Chinese. Asterisks-of-reconstruction if you think required.

HanEditor (talk) 06:24, 2 August 2013 (UTC) hanEditor

Community consensus is not set in stone and can change. If what you say is true, that typical entries such as 耳 (unfortunately mostly stubs on Wiktionary) would have many identical meanings, etymologies etc., then perhaps, on the day that becomes a hindrance for editors, a different formatting scheme could be adopted that would optimize presentation by reducing such redundancy within a page. What is important is to have constructive proposals and to engage the community effectively. Look what has happened to the phonesthemic interpretations by H&M: even though they are based on somewhat speculative and fringe interpretations of OC, by initially taking them to the BP, H gained silent approval (mostly due to the absence of editors knowledgeable in OC). Now it appears that sanctioning them was a mistake after all. Not creating OC/MC entries simply because you disagree with Wiktionary's treatment of Chinese varieties is IMHO more detrimental to your cause over the long run than vice versa. --Ivan Štambuk (talk) 04:17, 3 August 2013 (UTC)

Note in all of this that some Han glyph etymologies, in the Translingual section, give Old Chinese reconstructions as part of the (true) Phono-Semantic analysis of the character. (Hopefully you know that a "phono-semantic/phonosemantic" character in Sinology normally refers to characters created from a simpler "radical" character that describes a category, plus a "phonetic" character that has a rhyming or otherwise similar sound to the word that the new composited character is intended to represent. Howell very confusingly repurposes the term "phonosemantic" for his non-standard phonesthemic interpretations.)

So a necessary part of the graphical etymological analysis of a phono-semantic character, is to ask what pronunciation the "phonetic" element had in Old Chinese, such that it would have been chosen for this character. About 80% of Chinese characters are phono-semantic, and there should ideally, eventually, be glyph etymologies for all of them, with Old Chinese reconstructions for the "phonetic" and resulting composite character, in the Translingual sections. Here's an example, which cites a Baxter-Sagart reconstruction: http://en.wiktionary.org/wiki/的#Etymology

My point? Must I have one? Yes. So it's correct to have spoken Old Chinese in these glyph etymologies. And it's appropriate to separate glyph etymologies, in the Translingual section, from spoken-Old-Chinese verbal etymologies (which go further back to Proto-Sino-Tibetan, and are non-graphical), in a language-specific section. So I'm just pointing out: don't notice the Old Chinese spoken words in the glyph etymologies and conclude that those should be removed, merged, or not duplicated elsewhere. Separate verbal etymologies are needed at least because 20% of characters are not phono-semantic, and thus shouldn't have any spoken Old Chinese in the glyph section.

Note that there may also be characters created in the Middle Chinese era whose phonetic element reflects a Middle Chinese rather than an Old Chinese pronunciation; that Middle Chinese pronunciation might appear in the glyph etymology instead.

HanEditor (talk) 08:05, 2 August 2013 (UTC) hanEditor

User:Liliana-60 has removed all of the Howell & Morimoto etymologies. I hope that this discussion doesn't die out now that that is settled. --Ivan Štambuk (talk) 19:57, 4 August 2013 (UTC)

Hi Ivan, thanks for noticing! For the record, I did most of the reverting (about 140 pages), but Liliana’s work (about 20 pages) gave a good head start (thanks Liliana!).

Sorry for the delay in responding, everyone. Thanks for the great input (esp. HanEditor!) – I’ll reply in more detail this weekend. I’ve taken a stab at adding Old Chinese and Middle Chinese (etc.) to the discussion of Chinese languages and characters at Wiktionary:About Sinitic languages (latest edit) and Wiktionary:About Chinese characters#Etymology (latest edit). These are “AFAICT”, so please feel free to revise or change utterly! – just wanted to make sure this discussion leads to changes in or at least regularization of practice and guidelines.

(More later.)

—Nils von Barth (nbarth) (talk) 15:47, 8 August 2013 (UTC)

To summarize what I’m gathering from above:

For Chinese characters, I think HanEditor and I are in full agreement: keep characters and words separate, and no discussion of sound at pictograms and ideograms, but sound is relevant for phonosemantic characters, and (referenced!) reconstructions are appropriate.

See Wiktionary:About Chinese characters#Etymology for my shot at writing this up.

For historical Sinitic languages, there’s general agreement, but some concern (represented by Wyang) about separate Sinitic entries being redundant, full stop. Setting this issue aside for a minute, we seem pretty agreed on:

Standard Wiktionary procedure is to have entries for all modern and historical languages.
In historical Sinitic languages, reconstructed pronunciations are particularly speculative, because the written form does not change along with the pronunciation, and thus references are particularly important.
Due to extensive cognates with identical written forms in Sinitic languages, full entries for every language can be pretty redundant. (…so minimizing this if possible is desirable.)
Other than these issues (and script and romanization variations), Sinitic languages are otherwise similar to other languages and can be treated similarly.

I’ve given some brief guidelines on historical languages at Wiktionary:About Sinitic languages#Historical languages; please feel free to flesh out or revise!

I’ve attempted to balance this “separate entries, but avoid redundancy” at Wiktionary:About Sinitic languages#Cognates and stubs, stating that it’s ok to have entries for each language (as per Ivan), but that these should really have something specific about the language, otherwise it’s not helpful and just redundant.

How do these look?

—Nils von Barth (nbarth) (talk) 16:17, 10 August 2013 (UTC)

To address the issue of separate/merged Chinese entries separately: it’s definitely a debatable point for modern languages, but separating off historical entries at least helps keep modern entries uncluttered by historical detail.

Wyang, your concern about Chinese entries being bulky and not very usable is well-taken, but I think it is inevitable given how Wiktionary is structured, and the situation is by no means restricted to Chinese. I’m sympathetic, but I don’t think there’s a really great solution. Because the fundamental element we use is a word in a language, the question of what is a language and what is a dialect significantly affects the dictionary structure. Consider pages such as un, or “language” groups such as the Yugoslav languages (Wiktionary:About Serbo-Croatian; currently one language again, but split for a time into Serbian and Croatian (and Bosnian? Montenegrin?)), or the Scandinavian languages (Danish/Norwegian/Swedish; see Wiktionary:About Norwegian). More radically, wider groups such as Romance or Germanic could have merged entries.

We don’t have a good technical way to let users adjust the resolution, e.g., “Give me a dictionary of Provençal French.” vs. “Give me a standard French dictionary” vs. “Give me a Romance languages dictionary.” This is a fundamentally messy problem (do we want to line up definitions between Spanish and Italian entries??), so while in future we might have better technical tools, I think it really reduces to where we draw the line between language group/language/dialect.

Probably the best way, given our structure, is to do more overall contrasting within a language group at the ancestor language level, as you’ve already done at Proto-Sino-Tibetan and could usefully be done especially in Middle Chinese. How does this sound?

It might also be the case that different Wiktionary communities want to make different decisions – perhaps Norwegian Wiktionary wants the two varieties of Norwegian to be in one entry, but English or Swedish Wiktionary wants to treat these as separate languages. So perhaps merged Chinese entries make more sense for native speakers of (some form of) Chinese, while for speakers of English, separate entries are more useful.

Certainly most English users are interested in Standard Chinese, and thus it’s clearer to have simpler, unmerged entries for “Mandarin” (which I personally think should be called “Chinese”, but that’s another discussion). If one is interested in Min Nan, then for an English user a Min Nan-specific entry is useful, while for a user with some Chinese proficiency, using Chinese Wiktionary and looking for 閩 is probably more efficient.

So I think there are definite advantages to either splitting or merging entries; I don’t think either is generally superior, and different language communities can reasonably make different choices.

To return to historical Chinese languages, regardless of how we treat modern Sinitic languages, I don’t think we want to clutter the Mandarin entry with discussion of pronunciation or meanings (or citations) from the 6th century! At the very least, having separate entries for these past forms keeps the modern entry brief and usable.

—Nils von Barth (nbarth) (talk) 16:17, 10 August 2013 (UTC)

Proposed tit-for-tat to reduce various backlogs

WT:RFD and WT:RFV are perpetually backlogged with discussions that should have been wrapped up and closed ages ago. WT:RFDO is beyond ridiculous. I propose (not as a "rule" but as a courtesy) that from now on, whenever editors add a new section to one of these pages, they find one of the many old sections ready for closure and close it, or a closed section ready for archiving and archive it. Also, there are many very old discussions which lack a clear resolution altogether, and could use some learned input. Cheers! bd2412 T 21:16, 1 August 2013 (UTC)

I like this idea. I'm actually intending to clean out RFDO myself, so hopefully I can get it down to merely "ridiculous". :) —Μετάknowledge (discuss/deeds) 21:27, 1 August 2013 (UTC)

It’s a good idea. The only drawback is that it might discourage people from nominating stuff, but I’ll try to follow it. — Ungoliant (Falai) 22:49, 1 August 2013 (UTC)

How do you archive discussions? Is there a guide anywhere? I always shied away from it because I was never sure what to do. —CodeCat 22:52, 1 August 2013 (UTC)

Just make a new section on the talk page of the nominated entry or page (creating the talk page if it does not exist), and copy the discussion into a {{rfd-failed|text= }} or {{rfd-passed|text= }} template, with everything (including the initial header) after the text=. bd2412 T 23:27, 1 August 2013 (UTC)

Ok, I can do that. But what about archiving two for each one you create? That way more goes out than gets put back in. —CodeCat 00:28, 2 August 2013 (UTC)

FWIW, it's actually common to drop the header (by which I mean, the header that appeared over the discussion when it was a section on WT:RFV or WT:RFD) when archiving a discussion. If the header is kept as part of the archive, it should(?) be wrapped in {{fake==}} or {{fake===}} as appropriate, so it doesn't show up in the talk page's TOC. You can see examples on [[Talk:Wi-Fi]] and [[Template talk:dat.]]. - -sche (discuss) 01:08, 2 August 2013 (UTC)

@CodeCat - even better. @-sche - my impression that the header should be included was based on the examples I've seen. Actually, I'm really not sure what to do about some mass nominations, or whether certain RFD and RFDO discussions need to be archived at all. bd2412 T 01:32, 2 August 2013 (UTC)

Not sure this is a good idea. People (esp those who are not regular editors) may want to nominate without doing this cleanup, which could lead them (again, esp non-regulars) to close something wrongly or prematurely merely to allow themselves to post. I'd prefer a bot like Conrad's old one that automatically archives items tagged as closed. Equinox ◑ 10:42, 2 August 2013 (UTC)

I'm not proposing this as a rule, merely as a courtesy - I hope that it becomes a practice on the part of regular editors (and particularly admins, who tend to be the most active editors here). Since it will not be a rule, I do not expect that transient editors will follow this practice at all, or even be aware of it. bd2412 T 12:22, 2 August 2013 (UTC)

trreq's

How long do you think trreq's (translation requests) should be kept? Do we need any rules for them? I know some people are totally against these requests and {{trreq}}; I'm not. I use it too, but I also remove some old ones and those that are very unlikely to be filled, like a very specific name into an exotic language, e.g. requests to translate northern tree shrew into Burmese and Lao. DCDuring seems to have taken this very personally. He shouldn't take this personally. I fill many requests when I can (Hippietrail and Lo Ximiendo know this well). He has very valid points too, though.

My reasons are:

Too many requests (in categories) are discouraging. I got this feedback from at least 2 contributors, and it's also my personal feeling and observation. Basic, everyday words are not translated either.
A request causes a conflict when a translation into a neighboring language is added using the accelerated method (a bug).
Many translations requests in a table also look annoying to some (not me).

I hope it won't generate another template RFD. I think the template is useful and is used. --Anatoli (обсудить/вклад) 03:51, 2 August 2013 (UTC)

I agree with DCDuring on this; if we have a feature for requesting translations in certain languages, and someone is making targeted use of it, it seems inappropriate to remove the requests. (I confess I've removed a few requests myself, though, because of the bug you mention that blocks the addition of new translations...) - -sche (discuss) 04:27, 2 August 2013 (UTC)

How long do you think the requests should stay? Indefinitely, one year, six months? Do we really need the requests to be there if they are unlikely to be filled? If we DO get volunteers, they can add translations even if there are no requests. In other words, a trreq is by no means a guarantee that the sought translation will be provided; the requests just sit there making more people dislike the feature, judging by previous discussions. --Anatoli (обсудить/вклад) 05:14, 2 August 2013 (UTC)

I think it is impolite to request translations for useless entries such as Ghost of Christmas Present or how do I get to the train station. For normal entries, a trreq can stay indefinitely, until fulfilled. As for northern tree shrew, DCDuring requested translations only into the languages indigenous to that animal's habitat; that's a completely justifiable use of {{trreq}}. --Vahag (talk) 08:49, 2 August 2013 (UTC)

They should only be removed if the requested language is unlikely to have a translation for the term. Otherwise, they should remain indefinitely. — Ungoliant (Falai) 09:25, 2 August 2013 (UTC)

{{trreq}} is good. I admit that having a table with too many trreqs makes the actual translations harder to spot. That's the only problem I have with it. Mglovesfun (talk) 08:59, 3 August 2013 (UTC)

I'm with Mglovesfun here. I usually only request translations for the most common languages: Russian, Spanish, Italian, French, Portuguese, German, Czech, Hungarian to name a few. I also try to only add them to words that would benefit from having more translations. Examples include swiftness, quickness, and other words like that. Razor flame 12:49, 3 August 2013 (UTC)

A problem with Template:suffix

Now that {{term}} has been converted to Lua as well, this template poses a bit of a problem for reconstructed terms. Our new way of specifying that a term is reconstructed is by putting * in front of it. That works well for links, which don't try to modify the term. But this template does: it puts a - in front of the term, so the term now begins with -*. That throws off Module:links, which expects reconstructed terms to begin with *, so it shows a script error. I'm not sure there is really a good way to fix this other than to convert {{suffix}} to Lua as well.

But there may also be a better option. {{compound}}, {{prefix}} and {{suffix}} all do pretty much the same thing. The only real difference is that the latter two add hyphens and they have different categories. But with Lua we don't actually need this; we can achieve the same with {{compound}} alone. We can convert, say, {{suffix|green|ness|lang=en}} into {{compound|green|-ness|lang=en}}, and let the template detect the presence of - and respond accordingly. This will not work right if there are any entries that use {{compound}} with prefixes or suffixes but explicitly do not want to be placed in the prefix/suffix categories. But it's worth considering at least. —CodeCat 12:41, 3 August 2013 (UTC)
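As a rough illustration of letting {{compound}} "detect the presence of - and respond accordingly", here is a minimal Python sketch (the real template would call a Lua module). The function name `affix_categories`, the `not_affix` opt-out for parts that carry a hyphen but are not affixes, and the category wording are all assumptions made for this example, not existing Wiktionary names.

```python
def affix_categories(lang_name, parts, not_affix=frozenset()):
    """Pick categories for a {{compound}}-style etymology from its hyphens.

    A part ending in "-" counts as a prefix and one starting with "-" as a
    suffix; any other part counts as a stem. A leading "*" (reconstructed
    term) is ignored for the check. Indices in not_affix (a hypothetical
    opt-out) are always treated as stems, e.g. roots that end in "-".
    """
    categories = []
    stems = 0
    for i, part in enumerate(parts):
        bare = part[1:] if part.startswith("*") else part
        if i in not_affix:
            stems += 1
        elif bare.startswith("-") and bare.endswith("-"):
            continue  # infixes/interfixes are not handled in this sketch
        elif bare.endswith("-"):
            categories.append(f"{lang_name} words prefixed with {bare}")
        elif bare.startswith("-"):
            categories.append(f"{lang_name} words suffixed with {bare}")
        else:
            stems += 1
    # Two or more stems means the word is also a compound.
    if stems >= 2:
        categories.append(f"{lang_name} compound words")
    return categories
```

Under this rule, {{compound|green|-ness|lang=en}} would categorize like {{suffix|green|ness|lang=en}}. A mixed case such as ver- + donkere + maan would get both a prefix category and a compound category.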

Don't forget {{confix}}.

From a contributor perspective, it would seem you need not change how the user interface works at all, however much you change the guts. Presumably each of the four templates would pass some parameters to the module in different ways, or with the name of the source template, so that different results would emerge in terms of categorization and formatting. But if the present problem is only with {{suffix}}, why bother with more right now? That would seem to be asking for unnecessary trouble. Once {{suffix}} has been Lua-cized to everyone's satisfaction and proven itself bug-free, then it could be applied to the other cases. That would seem to just be a question of designing the module with the ultimate result in mind. The situation has presented an apparently adequate motivation for Luacization and a nearly ideal rollout scenario.

But what other problems with these four templates does Luacization actually fix? Does it improve performance? DCDuring TALK 13:06, 3 August 2013 (UTC)

We could certainly make a Lua version of {{suffix}} work to support the original behaviour. The module could detect the presence of * in the word and insert the - after it rather than before it (which templates can't do, because of the lack of string functions). But I figure if we're going to convert it all to Lua anyway, we might as well look at ways to simplify it and reduce redundancy. I'm sure it's easier to learn how to use only one template ({{compound}}) rather than four. I just added some code to {{compound}} that puts pages into Category:Compound with hyphen if any of the terms begins or ends with a hyphen. That way we can get an idea of how many entries would be affected by this proposal, and whether it's acceptable. The category needs some time to be filled, though. —CodeCat 13:36, 3 August 2013 (UTC)
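Templates cannot put the hyphen after the asterisk rather than before it, but this is a short operation in any language with string functions. Here is a minimal sketch in Python. The real module would be Lua, and `format_suffix` is a hypothetical name.

```python
def format_suffix(term):
    """Prepend the hyphen a suffix needs, keeping a leading "*" in front.

    "ness" -> "-ness" and "*iz" -> "*-iz"; a term that already has the
    hyphen is returned unchanged.
    """
    reconstructed = term.startswith("*")
    bare = term[1:] if reconstructed else term
    if not bare.startswith("-"):
        bare = "-" + bare
    # Put the reconstruction marker back in front, where Module:links expects it.
    return ("*" if reconstructed else "") + bare
```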

What problem does any of this generalization solve? DCDuring TALK 14:58, 3 August 2013 (UTC)
- "Template-itis" as you call it. I figured you'd like that idea. —CodeCat 17:01, 3 August 2013 (UTC)[reply]
  - You have completely misunderstood me. This "cure" is worse than that disease. Replacing easy-to-use, low-overhead templates (not those that call one another, do hundreds of existence tests, and require a full-time maintenance staff) with templates + Lua modules that require more human keystrokes to solve uncommon problems does not seem like progress. DCDuring TALK 13:29, 4 August 2013 (UTC)

If I've been working with {{compound}} and then need to mention a suffix or prefix, I do sometimes type {{suffix|foo|-bar|lang=en}} and only notice and fix the error after I've saved the page. Typing word|-hood (typing the exact names of the pages you want to refer to) would be more intuitive than typing word|hood (dropping part of one of the pagenames), for users starting from a tabula rasa. Of course, for users used to not typing hyphens, it could be unintuitive. Hm... - -sche (discuss) 17:25, 3 August 2013 (UTC)

The same happens to me as well. But there are also other advantages to making {{compound}} work this way. What if there are several suffixes? Or something more complicated still? In the past we had to hack this together using several instances of {{suffix}}, but that's not ideal and rather confusing too, and there were still cases it couldn't handle easily. —CodeCat 17:34, 3 August 2013 (UTC)

All of which can be handled by iterations of existing templates. I have no faith, based on the record of {{context}} for one, that there will not be regression of capability in some area of the user interface or elsewhere if this effort goes ahead. DCDuring TALK 02:00, 4 August 2013 (UTC)

Would it help if we changed the way we call reconstructed terms in templates like these? Rather than putting an asterisk at the beginning of the term, we could have a parameter like recons=yes or the like. —Angr 09:34, 4 August 2013 (UTC)
- That would make things less consistent and more complicated to use. In any case, the problem of the asterisks in suffixes was solved by Lua, so the only question that remains is whether we want to merge them all into {{compound}}. ZxxZxxZ pointed out that not all words ending in - are prefixes; some are roots instead. So we would need some way, a parameter perhaps, to tell the template that a term shouldn't be considered an affix despite beginning or ending in -. —CodeCat 11:15, 4 August 2013 (UTC)
  - @Angr: Yes it would help.
  - @CodeCat: For whom would it be more complicated? The vast majority of contributors never contribute anything involving reconstructed terms. Should all of those existing users relearn habits for the sake of some foolish "consistency"? To have contributors in every normal language contend with ANY inconvenience whatsoever to accommodate reconstructed terms makes no sense whatsoever. Something specialized to make it easier for the few who contribute reconstructed terms would better fit with the nature of the problem, it seems to me. I don't understand why such a basic design principle needs to be fought over. DCDuring TALK 13:29, 4 August 2013 (UTC)

The fact that we'd need to add a parameter to tell a revamped {{compound}} not to consider something a suffix makes a compelling case, IMO, for continuing to use {{compound}} for non-suffixes and {{suffix}} for suffixes. - -sche (discuss) 14:20, 4 August 2013 (UTC)

But what about more complicated cases like verdonkeremanen, which are both prefixed and compounded? And to DCDuring: I know of only one case where a term ending in - is not a prefix, and that's when it's a root. And roots are... (surprise) reconstructed! —CodeCat 15:17, 4 August 2013 (UTC)

Our current practice of {{prefix|ver|lang=nl}} {{compound|donkere|maan|lang=nl}} works fine there. - -sche (discuss) 15:22, 4 August 2013 (UTC)

There are pronouns (Ꮵ-, ᎠᎩ-, ch-) and particles (gu h-, tév-) too. — Ungoliant (Falai) 15:47, 4 August 2013 (UTC)

I oppose converting {{suffix|green|ness|lang=en}} into {{compound|green|-ness|lang=en}}. --Dan Polansky (talk) 18:11, 4 August 2013 (UTC)
- Any reason why, or just felt like it? —CodeCat 18:28, 4 August 2013 (UTC)[reply]
  - Marking things that are not compounds as compounds is a poor idea, to begin with. Furthermore, if it ain't broke, don't fix it: I do not see any problem with having several templates (prefix, suffix, compound, confix) instead of one. The one problem that you have presented seems to have to do with a wish not to Luacize a template, despite the frantic effort to Luacize everything in sight, so I do not really understand why that's an issue. --Dan Polansky (talk) 19:33, 4 August 2013 (UTC)
    - This discussion is primarily about Lua-izing and merging these templates, though. The template's title can always be changed. --Z 20:50, 4 August 2013 (UTC)[reply]
I strongly support merging everything into {{compound}}. Will make editing easier for me. --Vahag (talk) 19:21, 4 August 2013 (UTC)
I oppose any change until a workable alternative is actually made and shown (via test cases) to work properly. All of the recent ports of working old templates to Lua introduced many bugs. --Ivan Štambuk (talk) 19:48, 4 August 2013 (UTC)
- That's actually because we were at the beginning of things and Lua-izing basic stuff like linking, language codes and script templates, formatting, etc. New modules like this one will be less complicated (and therefore less buggy) since they mostly just use those tools, which have been proven to be bug-free. --Z 20:50, 4 August 2013 (UTC)
  - I wouldn't say proven, but we're getting there as we find more and more use cases for them and work out the rough edges. Module:links is probably the most important module on Wiktionary right now, next to Module:languages. Think of it as laying the foundation for other modules to build on, and it seems to do quite well in that role so far. —CodeCat 21:26, 4 August 2013 (UTC)

Since this is turning into a !vote: I think the drawbacks discussed above outweigh the benefits, so I oppose deprecating {{suffix}}/{{prefix}}. - -sche (discuss) 23:00, 4 August 2013 (UTC)

Goodness me. I'm back from a two-week hiatus and trying to figure out what's happened in the meantime.

I was editing 手刀打ち to update the wikicode for the etymology and found that {{compound}} has changed from under me -- the pos argument values are now italicized, which is neither expected nor wanted.

So I wander over to {{compound}} to check the history, and discover that a ~2 KB template has been replaced with a ~9.5 KB module. I'm not up on Lua, but I tried reading through the code as best I can, and I am left with no idea how to go about un-italicizing the pos values (i.e. restoring the previous, and expected, behavior).

Was there a vote that I've missed? I don't see one linked in this thread.

If there was no vote, could we please revert this? At the bare minimum, could we have a fuller explanation of 1) the advantages of Luafying something that didn't seem to need it, and 2) how to make (what should be trivial) changes in formatting when templates are converted to Lua?

For the time being, I am opposed to Luafying {{compound}} in specific, and opposed in general to Luafying templates that don't do much beyond formatting and categorizing.

Confused, ‑‑ Eiríkr Útlendi │ Tala við mig 23:30, 14 August 2013 (UTC)

As I count this, we have 5 opposed to the change, and three in favor. Who died and left the techies in charge? DCDuring TALK 00:06, 15 August 2013 (UTC)

I find this comment extremely offensive, especially after you've said something like this. Have fun with your toy dictionary consisting of plain text files. DTLHS (talk) 01:15, 15 August 2013 (UTC)

We have had many techies whom I have found to be extremely helpful. Perhaps I should have been more specifically offensive:

Who died and left CodeCat in charge?

I thought things were bad with Daniel. This seems worse because the effort is more central to the functioning of Wiktionary. Despite a clear lack of consensus, s/he plows ahead, probably certain of the rightness of the cause, ignoring express opposition. DCDuring TALK 01:37, 15 August 2013 (UTC)

I'm more concerned with broken Lua code being tested on Wiktionary instead of on a local computer, and with the many "script error"s and miscellaneous bugs continually popping up in entries. But that's just how open source software works: nobody gets paid, and nobody bears responsibility. --Ivan Štambuk (talk) 22:36, 15 August 2013 (UTC)

Regressions in Template:context

Are the regressions of the capability of {{context}} all fixed? Specifically, if a naive new user inserts {{obsolete}}, does a good result emerge? Does it appear right or does it tempt the user to remove the template and wikicode the appearance or cause the user to abandon the effort and possibly contributing at all? DCDuring TALK 13:06, 3 August 2013 (UTC)

(I split off this comment as it's off topic) What do you mean, regressions? Why would a naive user insert that at all when it clearly gives a red link? —CodeCat 13:36, 3 August 2013 (UTC)

Nice put-down. It is not off-topic because embarking on new projects that are not emergencies when the old ones are not complete is poor practice. I'm inclined to view all new projects as essentially off-topic until the old regressions are remedied. DCDuring TALK 14:54, 3 August 2013 (UTC)

So basically, until things are the way you want them to be, you're going to obstruct everything else? In case it hadn't occurred to you, the context templates are being deleted, they're not coming back, and that's the consensus that has formed over the last few months. So you can tap your feet all you like thinking that we're going to "fix the regressions" for you, but it won't happen. And taking your disagreement out by filibustering any further changes is just immature. —CodeCat 17:05, 3 August 2013 (UTC)

If you want to run on your track record of overcomplicating user input and incomplete, defective, or inferior replacement of existing capabilities and folks here buy it, I can't stop it. DCDuring TALK 13:32, 4 August 2013 (UTC)

Inactive administrators

While I'm not sure what the inactivity policy is for administrators here, there are quite a few administrators who have not edited here in years. I'll list them here in case anyone wants to start a desysop vote for them, since they've not been active:

Jun-Dai (talk • contribs) 2/14/2010
Celestianpower (talk • contribs) 10/21/2010
Alhen (talk • contribs) 2/2/2011

There's also Rsvk (talk • contribs), who hasn't used his sysop tools at all, even though he's edited more recently. Does he really need the flag if he hasn't used it?
Another user, Timwi (talk • contribs), hasn't done anything with the sysop tools since 2005, and again, does he really need the flag?

Anyway, you can do whatever you'd like with this information; I just wanted to display it for those who wish to use it. I'd suggest desysopping these five admins. Razor flame 00:21, 4 August 2013 (UTC)

I don't think there is any harm as such in keeping them as they are. But keeping the list up to date is useful for people who are looking for administrator support, and need someone who is active. It also gives us a better idea of how many there really are. —CodeCat 00:25, 4 August 2013 (UTC)

There isn't much harm in keeping their flags, but I think we should remove these five, and if they'd like them back, they can always request them directly from a bureaucrat. Razor flame 00:28, 4 August 2013 (UTC)

Even more remarkably, one checkuser hasn't been active in a year, while another checkuser has made only a handful of posts in the past four years, several of which were to espouse conspiracy theories about why the community was trying to desysop him for inactivity. (The move to desysop him failed.) At least one bureaucrat has likewise not been active in over a year. I thought the whole point of the recent vote on Meta about sysop inactivity was that Meta would step in and desysop any admins who'd been inactive for 2+ years, but that doesn't seem to be happening. - -sche (discuss) 03:08, 4 August 2013 (UTC)

I would propose removing sysop, checkuser, and bureaucrat rights from any user who has made fewer than 10 mainspace edits in the past 12 months, but allowing them to retain their rights if they specifically ask for it each year. —Μετάknowledge (discuss/deeds) 04:39, 4 August 2013 (UTC)

I have felt uneasy at the number of inactive sysops for some years (see User:SemperBlotto/Sysop Activity). If we were to have any sort of vote, I would probably vote yes for any reduction in their numbers. SemperBlotto (talk) 06:55, 4 August 2013 (UTC)

I could get behind desysopping the first three on the list above, but not the last two. I don't think we should desysop active editors just because they don't use their sysop tools. As for the first three, are they active on any Wikimedia projects? If not, desysop them. If so, give them the chance to say they would like to remain Wiktionary admins before desysopping them. —Angr 09:26, 4 August 2013 (UTC)

Jun-Dai (talk • contribs) has made six minor edits on En.Wikipedia this year (2013).
Celestianpower (talk • contribs) has made four edits on En.Wikipedia this year (2013).
Alhen (talk • contribs) has made 28 edits this year (2013), mostly on Commons and En.Wikipedia. —Stephen (Talk) 12:56, 8 August 2013 (UTC)

Category:Hawaiian Pidgin language

This language failed RFD some time ago, and its language code has been deleted. But it turns out that it still has quite a few entries. What should be done with those? —CodeCat 16:41, 5 August 2013 (UTC)

I already said what to do with them in the RFD. Change the L2 to English, clean up as necessary, and tag with the context labels (Hawaii, slang). —Μετάknowledge (discuss/deeds) 16:46, 5 August 2013 (UTC)

Done —Μετάknowledge (discuss/deeds) 18:53, 5 August 2013 (UTC)

Kannada as an LDL

I think perhaps we should make Kannada an LDL, given how difficult it seems to find cites, at least for me just poking around a bit. Granted, I know almost nothing about the language, but nobody here really does (well, Stephen knows the most, I reckon). What do you think? —Μετάknowledge (discuss/deeds) 03:18, 6 August 2013 (UTC)

What is an LDL? --Ivan Štambuk (talk) 03:23, 6 August 2013 (UTC)

Limited Documentation Language. See {{LDL}} and WT:LDL. --Yair rand (talk) 03:41, 6 August 2013 (UTC)

At the moment I'm neutral on this. If we judge by online resources, then it seems poorly documented. There is Shabdkosh (Kannada). Google Translate supports it too, although translations are often in the wrong form. I've seen textbooks and paper dictionaries. Kannada Wiktionary is quite large but can't be used by English speakers. Most resources have little in terms of grammar: no gender and no inflections are provided. Script input has been supported for quite some time. Kannada seems to have a lot of synonyms and words with many meanings. --Anatoli (обсудить/вклад) 04:00, 6 August 2013 (UTC)

Kannada is most definitely a language that has a ton of synonyms. Take a look at one of the entries I made and you can see the number of synonyms that each word can have. Every word in the Kannada language can easily have upwards of five to ten synonyms. As for making this an LDL, I'm neutral on this as well. Razor flame 22:14, 6 August 2013 (UTC)

It’s surprising that an official language with that many speakers has limited documentation. A Google Books search for ಕರ್ನಾಟಕ (the name of the region where it is spoken) gets a mere 5 hits. — Ungoliant (Falai) 09:41, 6 August 2013 (UTC)

Southern, Dravidian-speaking states are the main opponents of making Hindi a truly national language of India. They prefer English, and some states use English as the primary language in education and politics at the expense of local languages and Hindi. --Anatoli (обсудить/вклад) 20:37, 6 August 2013 (UTC)

{{cmn-hanzi}} and pinyin entries

Currently pages like lù and lu4 are just massive disambiguation pages. This seems inefficient. What I propose is to modify cmn-hanzi to generate categories for each pinyin reading (with Lua to maintain correct sorting). So 䴪 would get a category of something like "Mandarin entries by reading (lu4)" or "Mandarin entries by reading (lù)" (or even "Mandarin entries by reading (lu)" if you want to get crazy). Then each pinyin page can just point to the relevant category and everything is automatically updated. Thoughts? DTLHS (talk) 23:59, 6 August 2013 (UTC)
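The proposed per-reading categories amount to grouping characters by each of their readings. Here is a minimal Python sketch. It assumes the readings are already available as tone-marked pinyin for each character. The function name is hypothetical, and sorting by code point is only a placeholder for whatever sort key the real categories would use.

```python
from collections import defaultdict


def reading_categories(readings):
    """Map each pinyin reading to the sorted list of hanzi that have it.

    readings: dict of hanzi -> list of tone-marked pinyin readings.
    Returns a dict of category name -> hanzi sorted by code point
    (a placeholder sort order, not a real collation).
    """
    groups = defaultdict(list)
    for hanzi, prons in readings.items():
        for pron in prons:
            groups[f"Mandarin entries by reading ({pron})"].append(hanzi)
    return {cat: sorted(chars) for cat, chars in groups.items()}
```

A character with several readings, such as 行 (xíng, háng), simply lands in several categories.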

First of all, numbered pinyin [[lu4]] should be made a soft redirect to [[lù]]. As for reading categories, I'm for it (if this is (semi-)automated). I've been suggesting categories like "Mandarin terms spelled with ...". Other Mandarin editors were not enthusiastic about this. Note that Mandarin hanzi readings are much more consistent than Japanese ones (it's actually much easier to autotransliterate Mandarin than Japanese, despite Japanese being partially phonetic!), so categories by reading have much less value than "spelled with (+ hanzi)" categories. This can be done by modifying the hanzi template. Over 95% of Mandarin hanzi have just one reading, and the rules for characters with more than one reading are either simple or apply only to rarely used senses. There are actually two templates: {{Hani-forms}} (trad. and simp. exist) and {{cmn-hanzi}} (trad.=simpl.), which could be merged into one. --Anatoli (обсудить/вклад) 00:15, 7 August 2013 (UTC)
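Making [[lu4]] a soft redirect to [[lù]] requires converting numbered pinyin to tone marks. Here is a minimal Python sketch for a single syllable. It uses the usual placement rule: mark a or e if present, otherwise the o of ou, otherwise the last vowel. The function name is hypothetical. Treating v and u: as ü, and leaving syllables without a vowel unmarked, are assumptions of the sketch.

```python
import unicodedata

# Combining tone marks for tones 1-4; tone 5 (or 0) is the neutral tone.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"


def numbered_to_marked(syllable: str) -> str:
    """Convert one numbered-pinyin syllable (e.g. "lu4") to tone marks ("lù")."""
    s = syllable.lower().replace("u:", "ü").replace("v", "ü")
    if not s or not s[-1].isdigit():
        return s  # no tone number: nothing to do
    tone, s = int(s[-1]), s[:-1]
    if tone not in TONE_MARKS:
        return s  # neutral tone carries no mark
    if "a" in s:
        idx = s.index("a")
    elif "e" in s:
        idx = s.index("e")
    elif "ou" in s:
        idx = s.index("o")
    else:
        positions = [i for i, ch in enumerate(s) if ch in VOWELS]
        if not positions:
            return s  # e.g. syllabic "ng": left unmarked in this sketch
        idx = positions[-1]
    marked = s[: idx + 1] + TONE_MARKS[tone] + s[idx + 1 :]
    # Compose base letter + combining mark into one precomposed character.
    return unicodedata.normalize("NFC", marked)
```

Multi-syllable strings such as ni3hao3 would first need to be split into syllables.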

My Editing

Listen, I understand all of you when you say that I am still showing signs of not editing properly on this Wiktionary. I am here to tell you that I am trying, but it is incredibly difficult to do so when I continue to get heckled every single day by people such as Dan Polansky, who continues to harass me by posting pure criticism, and when others do not point out any of the positive things that I bring to this project.

While there are plenty of negatives that I'm continuing to work on, people nitpicking every single little thing I do on here is not helping the situation. I'm still trying to make this project the best dictionary that there is, and my goal since I've come back from my extended hiatus from editing is to be more careful and less mistake-prone than I have been in the past. While that might not be evident right now, I'm definitely trying, and that is something that editors like Dan Polansky are completely ignoring.

I'd like to see these editors either stop posting nothing but criticism of me or post only constructive criticism. I'm getting sick and tired of people on this project acting like complete asses and not even doing anything to help the project.

For example, Dan Polansky has made next to no edits in the past month, while I've made well over 3,000 edits to improve this project. While I know that there are things that need more work, like WT:CFI issues, I just need you people to understand that it isn't very easy to be on the receiving end of nothing but a long stream of criticism. I'm on the verge of just leaving this project again because of the negativity I've been receiving here. It was bad enough when it was coming from Opiaterein in the past, but I don't need everyone else jumping on the bandwagon.

In short, I just need you to understand that not everybody is perfect, and that those of us with flaws are continuing to try to work them out.

Just a note, but I will not stand for any comments to this post that are mean-spirited or anything of that nature. Razor flame 04:41, 7 August 2013 (UTC)

As a start, I recommend that you refrain from editing Kannada entries or entries in similar languages. Nobody, of course, would stop you from learning it, working on words and example sentences on your private pages, or asking questions (for your own benefit, not for adding content). When you're ready, it will show. Is that constructive enough?

I also sometimes edit in languages I don't speak, but I take more precautions, check more thoroughly, use more reliable resources, and work with simpler, less ambiguous words, so I make far fewer mistakes. There's no strict rule here against editing in a language you don't know, but please don't push people into making it a rule.

BTW, Dan is right in many respects in his criticism of you. You won't gain anything by becoming defensive. --Anatoli (обсудить/вклад) 05:15, 7 August 2013 (UTC)

First point: I think it's quite self-aware and appropriate that you created this discussion and own up to both your positive and negative qualities here.

Unfortunately, that leads me to the second point. It's very lacking in self-awareness and inappropriate that you decided to use this post to rail against Dan Polansky and other users that have pointed out your mistakes. Blaming us for telling you that you made a mistake is a really bad attitude. Some users (like Dan Polansky) have done so in a manner I do not condone, and have been warned for doing so and told explicitly to cease. However, if you keep editing with so many problematic edits, there will necessarily be a stream of people telling you so.

We know we're not all perfect. But those of us who care put the time in to be more careful, as Anatoli said. I, for one, as I've been entering Swahili words recently, never enter a word I have not seen in a physical Swahili reference book, and if I have doubts about its meaning, I check Google Books and a couple of online Swahili dictionaries. If you really care about improving the project, you'll learn to measure your contribution not by the quantity of your edits, which is high, but by their quality, which has been lacking. —Μετάknowledge (discuss/deeds) 05:43, 7 August 2013 (UTC)

This isn't a language club, it's a dictionary. People use us as a reference. That means any mistakes you make aren't just your problem, they're a problem for everyone who uses those entries. You seem to think you should be getting credit for good intentions. Unfortunately, information is either correct or incorrect; there's no "A for effort". The kind of risks you've been taking are kind of like driving without watching where you're going: that tree in your path doesn't care whether you've been getting better about looking more often. Chuck Entz (talk) 06:33, 7 August 2013 (UTC)

Criticism of errors, and the absence of positive feedback is normal practice here. Just learn to live with it (like I do), or go elsewhere. SemperBlotto (talk) 07:19, 7 August 2013 (UTC)

I would accept criticism of errors, if it were constructive criticism. It is very off-putting to receive just criticism. Razor flame 04:29, 8 August 2013 (UTC)

There really isn't a way to be constructive about such a long-repeated pattern of making the same error, except to say "please stop making that kind of error". Equinox ◑ 00:35, 9 August 2013 (UTC)

You don’t need to stop editing Kannada. Just find a good, trustworthy published dictionary and don’t add any content that is not supported by it. — Ungoliant ^(Falai) 12:56, 7 August 2013 (UTC)

This sounds reasonable to me, provided there are no unverified additions. --Anatoli (обсудить/вклад) 23:29, 7 August 2013 (UTC)

I will look for a better published dictionary to continue to add more entries, though I probably won't add any more Kannada entries for a while. Razor flame 04:29, 8 August 2013 (UTC)

Thank you for all the constructive criticism that I received on this thread. I will do these things :) Razor flame 04:29, 8 August 2013 (UTC)

The new Template:term, what to call it?

{{term}} has now been "upgraded" to the same level as {{l}}, so it supports all the extras as well, like handling of embedded wikilinks, script detection, automatic transliteration and so on. It also supports reconstructed terms, so {{recons}} is obsolete and not needed anymore. However, Z and I also tried to make {{term}} more consistent with {{l}} parameter-wise, which means specifying the language code with the first parameter. That can't be done directly, of course, because it would break compatibility with all the entries that still use {{term}}. So Z made a second template called {{term/t}} (t for temporary) which is the same as {{term}} except that it takes the parameters in the same order as {{l}}. But that name is temporary, so we really need to decide on a more permanent name that is short and easy to use, and that is not {{term}} (because that name is already in use). Z suggested using {{m}}, with m standing for "mention". This fits with using short names for frequently-used templates like {{l}}. Of course, {{m}} is still in use, but we don't need it anymore (it's equivalent to {{g|m}}) so we could orphan it and usurp the name for this new purpose. —CodeCat 00:13, 8 August 2013 (UTC)
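The parameter reordering described above could be applied to existing entries by a bot. What follows is a minimal Python sketch, assuming old calls have the shape {{term|word|alt|gloss|lang=xx}} and should become {{term/t|xx|word|alt|gloss}}; the function names are hypothetical. Calls containing nested templates or embedded wikilinks are deliberately skipped (left untouched), because naively splitting them on "|" would corrupt them.

```python
import re

# A {{term|...}} call containing no nested templates ({ }) and no embedded
# wikilinks ([ ]); anything more complex is left for manual review.
TERM_CALL = re.compile(r"\{\{term\|([^{}\[\]]*)\}\}")


def to_new_order(match: re.Match) -> str:
    """Rewrite {{term|word|alt|gloss|lang=xx}} as {{term/t|xx|word|alt|gloss}}."""
    positional, named = [], []
    lang = None
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep and key.strip() == "lang":
            lang = value.strip()
        elif sep:
            named.append(part)  # keep tr= and other named parameters as they are
        else:
            positional.append(part)
    if lang is None:
        return match.group(0)  # no language code: leave the call unchanged
    return "{{" + "|".join(["term/t", lang] + positional + named) + "}}"


def convert(wikitext: str) -> str:
    """Convert every simple {{term}} call in a page's wikitext."""
    return TERM_CALL.sub(to_new_order, wikitext)


print(convert("from {{term|amo|amō|I love|lang=la}}"))
# from {{term/t|la|amo|amō|I love}}
```

Empty positional slots (e.g. a missing alt form) are preserved, so the gloss stays in the same relative position.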

Personally, I think we should use capital letters instead, since WT is case-sensitive. That way we can have a short, one-letter name for all our common templates. {{M}} and {{T}} are both already open and unused for other purposes, so I'd suggest one of those, probably the former per Z's mention. —Μετάknowledge^{discuss/deeds} 08:18, 8 August 2013 (UTC)

I don't think we should be introducing case sensitivity into our template names. Not only is it confusing to those who don't expect it (like experienced Wikipedians), but it also makes things much harder to remember, especially when both casings are used. —CodeCat 11:14, 8 August 2013 (UTC)

I prefer {{m}}. --Vahag (talk) 12:18, 8 August 2013 (UTC)

That's a gender template though, so as much as we might want, we can't use that. -- Liliana • 16:56, 8 August 2013 (UTC)

As I pointed out above, it's not needed because it's just a shorthand for {{g|m}}. A bot could replace all instances of {{m}} fairly easily, and then the name would be available. —CodeCat 20:05, 8 August 2013 (UTC)
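A minimal Python sketch of that bot replacement, assuming only the bare, parameterless {{m}} needs rewriting. The function name is hypothetical; calls that already carry parameters (such as a later {{m|la|...}}) do not match the pattern and are left alone.

```python
import re

# Only the bare gender shorthand {{m}} (optionally with spaces inside the braces).
BARE_M = re.compile(r"\{\{\s*m\s*\}\}")


def orphan_m(wikitext: str) -> str:
    """Replace the old {{m}} gender shorthand with its equivalent {{g|m}}."""
    return BARE_M.sub("{{g|m}}", wikitext)
```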

It will be ugly if people who are not used to Wiktionary (or who simply haven't checked in for a long time) try to use it as if it were a gender template. I remember we had numerous occasions where people used {{sg}} and {{pl}} as singular/plural, so we had lots of words marked as Sango and Polish. -- Liliana • 20:49, 8 August 2013 (UTC)

Using the template without passing any values to it (like {{term}}) will return a script error and list the page in Category:Pages with script errors, so that's not much of a problem. --Z 21:05, 8 August 2013 (UTC)

That is a sign of the attitude that scares me. You are failing to take seriously enough the contributor side.

Maybe it's not a problem for you and for Wiktionary insiders, at least those who are current, but what about newbies or someone returning after a hiatus? Our error message would scare the hell out of a new contributor, leaving no recourse except going to some forum (if they can figure out the appropriate one, given that some of our names differ from the MW standard) or going away and not coming back. For someone coming back after a hiatus, it will create the impression that you can't go home again, that it's not the same old place. To what population of contributors does this offer a positive inducement to contribute? DCDuring TALK 00:52, 9 August 2013 (UTC)

Your arguments are defending a nearly non-existent demographic. Newbies won't mistakenly use {{m}} for its old purpose because they'll never see it used that way, and newbies only use templates the way they've seen other entries use them. Oldtimers returning after a hiatus are a handful of users who ought to skim WT:N4E, and if we assume a few do so, there are only a few such users, and they know how to ask about something in the GP or to look at documentation. Basically, you're claiming to defend the common user's interests, but in reality all I see here is dislike of change and newness. You should instead be glad that a highly used template with a four-letter name will get a one-letter name, which makes it easier for real contributors. —Μετάknowledge^{discuss/deeds} 05:00, 9 August 2013 (UTC)

Huh, I never knew that WT:N4E even existed until reading your post here. ‑‑ Eiríkr Útlendi │ Tala við mig 20:49, 15 August 2013 (UTC)

I added it to {{welcome}} some time ago, but then someone else removed it because it wasn't important or was too confusing, or something like that. Kind of circular reasoning...? —CodeCat 21:25, 15 August 2013 (UTC)

WT:ELE#Example sentences

* be italicized, with the defined term boldfaced.

This is inaccurate. We only use italics for the Latin script. I would start a vote to modify this but... it's too hard and too time-consuming to get such a vote to pass, so if someone else wants to do it, great, otherwise let's do what we normally do, leave it wrong. Mglovesfun (talk) 20:41, 8 August 2013 (UTC)

If example sentences not written in Latin script were enclosed in a template, then the whole matter could be a question of a vote by those contributing in a given language or script or language-script combination or by a scriptmaster, probably self-appointed. My understanding is that the problem is limited to some non-Latin scripts. Couldn't identifying the example sentences in question be done by bot? Have the months of non-operation of AF or equivalents left the formatting unusable as a guide to locating the example sentences reliably, so we need AI-like pattern recognition to find them?

If this is a matter that can be resolved on a script-by-script or similar basis by consensus, why do we need a vote? Especially for a document that the legalist nihilists among us claim isn't a policy document anyway? DCDuring TALK 20:59, 8 August 2013 (UTC)

I couldn't understand most of DCDuring's dense diatribe (what the hell is a scriptmaster???) or what he's getting at, so I'll ignore it for now.

No, we don't need a vote, it's perfectly legitimate to make a change like this by community consensus. The real question is: do we want to remove this line or just modify it? If so, what would the rewrite look like? —Μετάknowledge^{discuss/deeds} 21:03, 8 August 2013 (UTC)

Let me break it down for you:

The scope of the problem is limited to some non-Latin scripts. Implication: it doesn't need to involve everyone.
A person could be in charge of managing whatever technical aspects (templates, Lua) were required for implementation, as some of our usage examples are in {{usex}}.
A bot could identify the other usexes, unless neglect of basic format maintenance has made that difficult.
We have been told this isn't a policy document by someone inclined to destroy whatever structure protects what he dislikes, though IMO we'd be better off treating it as part of our Constitution.

Proposed language:

"be italicized, if in a Latin script and preferably for other scripts, with the defined term boldfaced."

Plenty vague for non-Latin scripts to allow sufficient freedom for contributors in those scripts to make the required decisions. Other scripts can be added to "Latin" if there is consensus about them. "Preferably" expresses a presumption favoring italics, for consistency with past practice and the standard for Latin script languages. DCDuring TALK 22:01, 8 August 2013 (UTC)

As far as language-specific formatting of text goes, it's done through CSS, so the italicness of the wikitext shouldn't even matter. —CodeCat 22:25, 8 August 2013 (UTC)

Are you saying that the CSS doesn't faithfully implement WT:ELE or something about implementation? Can CSS identify usage examples within an L2 reliably if the usage examples are properly formatted? Do we have a bot that is patrolling to make sure that usage examples are properly formatted? DCDuring TALK 22:37, 8 August 2013 (UTC)

No, it can't do that... well it might be able to, with enough effort. But that's not what I meant. I meant that even if text is placed in italics in the wikitext, the CSS should decide whether the text is actually shown in italics. ELE should only concern itself with the wikitext, and not with the actual visual interpretation of what we write, which is the responsibility of CSS. Therefore, it doesn't do harm to write "italics" in ELE, as long as it's made clear that we mean italic wikitext markup (''..'') and not necessarily actual italic text. —CodeCat 22:46, 8 August 2013 (UTC)

What you are saying sounds like it reduces WT:ELE to invisibility and presentational irrelevance. I thought WT:ELE was about what actually appears. If direct wikitext doesn't make it appear, and no template makes it so appear, then CSS should make appear what is specified in ELE or by some consensus of relevant contributors. DCDuring TALK 23:29, 8 August 2013 (UTC)

That's the purpose of CSS, though. It separates presentation from content. —CodeCat 00:10, 9 August 2013 (UTC)

Well, then, is our CSS WT:ELE-compliant? If not, why not? Isn't this the third time I have asked this question? DCDuring TALK

I think it's a bad idea to have a voted-on policy that mandates that e.g. Mandarin example sentences be italicised, and to then strip that italicisation by default through css. If we decide we don't want Mandarin example sentences to be italicised, we should just not italicise them in the first place. We did vote that non-controversial changes to policy pages no longer required lengthy formal votes, so does anyone object to modifying WT:ELE to clarify which, and that most, scripts aren't italicised? - -sche (discuss) 21:22, 9 August 2013 (UTC)

A BP mention of what scripts are not to be italicized, preferably an illustration or a link to one, would fully address any lingering concerns about that specific matter. Such mention might be useful for any departure from WT:ELE implemented in Lua or CSS, for those of us without Lua-foo or CSS-foo. The topical category template experience makes me very leery of the works of wizards. DCDuring TALK 22:13, 9 August 2013 (UTC)

Maybe we should have a footnote that says something to the effect of: "references to font styles such as bold or italics only apply to scripts compatible with them, or to their script-specific equivalents, if there are any." Chuck Entz (talk) 01:33, 10 August 2013 (UTC)

Present Participles in Afrikaans verbs

I'm not a frequent editor of Afrikaans entries, but I've noticed that the past participle is the only inflected form that is given for Afrikaans verbs, and that the current template {{af-verb}} allows only this form to be given. But Afrikaans verbs have a second inflected form, the present participle, which is less frequent in use, but just as important to indicate, because it is sometimes irregular (e.g. wag -> wagtend, bly -> blywend). It would be wonderful if someone could create the technical requirements so that it at least becomes possible to denote present participles. Thank you!

I'm basically the only person that has cared much about Afrikaans templates here, and I'd like to get back to work on it, but I've been distracted by other languages (currently Swahili). I really couldn't find much on the present participle, so I left it out, but this and some other considerations still need work. I'll add something in for the present participle, but if you're a fluent speaker of Afrikaans, I'd appreciate if we could continue this conversation on my talkpage so we can improve the infrastructure for Afrikaans entries. Thanks! —Μετάknowledge^{discuss/deeds} 17:36, 11 August 2013 (UTC)

Currently, the first parameter specifies the past participle, but it would be better to specify the present participle with the first and the past participle with the second, as there are far more unpredictable present participles than past participles. I will try to make a module to handle the different cases. —CodeCat 17:50, 11 August 2013 (UTC)

Do you know that for a fact? In any case, if you're making a module, you should make it automatically handle separable verbs like doodgaan. —Μετάknowledge^{discuss/deeds} 18:19, 11 August 2013 (UTC)

Dutch entries have dedicated inflection tables, and there is a sep= parameter like the Afrikaans template. But that parameter doesn't just say that the verb is separable; it also gives the prefix. If we do the same for Afrikaans, then the template can create the participle as sep+ge+pagename. —CodeCat 18:23, 11 August 2013 (UTC)
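The sep+ge+pagename idea, combined with the proposed parameter order (present participle first, past participle second), can be sketched in Python. This is a hypothetical illustration, not the actual module: the function names are invented, "sep" is assumed to be the prefix given in sep=, ge- is inserted between that prefix and the rest of the page name, and explicitly given forms always override the computed default. Verbs with unstressed prefixes such as be- or ver-, which take no ge-, are not handled here and would need an explicit past participle.

```python
from typing import Optional


def af_past_participle(pagename: str, sep: Optional[str] = None) -> str:
    """Default past participle: ge- + stem, inserted after a separable prefix."""
    if sep:
        if not pagename.startswith(sep):
            raise ValueError(f"{pagename!r} does not start with prefix {sep!r}")
        return sep + "ge" + pagename[len(sep):]
    return "ge" + pagename


def af_verb_forms(pagename: str, pres_part: Optional[str] = None,
                  past_part: Optional[str] = None,
                  sep: Optional[str] = None) -> dict:
    """Explicit parameters win; only the past participle has a computed default."""
    return {
        "present participle": pres_part,  # no default: often unpredictable
        "past participle": past_part or af_past_participle(pagename, sep),
    }


print(af_verb_forms("doodgaan", sep="dood"))
# {'present participle': None, 'past participle': 'doodgegaan'}
```

Leaving the present participle without a default reflects the point above that it is the less predictable of the two forms.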

I think that would be best. Of course Afrikaans can put the whole inflection on the headword-line, but otherwise copying Dutch seems wise. —Μετάknowledge^{discuss/deeds} 18:28, 11 August 2013 (UTC)

I've made some modifications to {{af-verb}} and corrected the entries as necessary. I'll rewrite the documentation as well. I was able to predict most of the forms based on knowledge of Dutch but some are a bit unclear so they need to be checked: gee, dag, sê/seg, leef/lewe, aftree, graaf/grawe, aanbestee. About separable verbs, I do wonder how the present tense works. In Dutch, the separable part and the main verb switch places (like in oplossen) and the object and any adverbs are inserted in between the two. Does this happen in Afrikaans as well, like "ek los (dit) op"? Or does it remain as "ek oplos (dit)"? —CodeCat 19:52, 11 August 2013 (UTC)

Looks good. I will look for the present participles of the ones you mentioned, but if you're not sure you should use {{attention}} in the meantime. As for separable verbs, yes, it works like Dutch. —Μετάknowledge^{discuss/deeds} 19:59, 11 August 2013 (UTC)

Should we also show this form in the headword line then, as it's not the same as the infinitive? I updated the documentation now. —CodeCat 20:34, 11 August 2013 (UTC)

I don't think so... or at least I wouldn't think it deserves its own entry. PS: The documentation is a little confusing; you may want to read it over again. And do the greenlinks work now? —Μετάknowledge^{discuss/deeds} 20:39, 11 August 2013 (UTC)

We don't need to link the entry, but we can link to both of the individual words. I've made the change now; tell me if it's ok: gaan and doodgaan. And yes, the green links work again. Which part of the documentation is confusing? —CodeCat 20:50, 11 August 2013 (UTC)

Yeah, I like the presentation. You edited the doc and perhaps unknowingly fixed it; now I get it all. I'm not really sure why we're putting the irregulars directly in the module when {{head}} can do the job just as well, but in the end I really don't care about that. —Μετάknowledge^{discuss/deeds} 02:53, 12 August 2013 (UTC)

Multi-line quotations

See User talk:-sche#Multi-line_quotations

Is there a preferred style for multi-line quotations, e.g. quotations of poems? There are several styles in use, e.g.:

First line of poem, Second line of poem.

First line of poem,

Second line of poem.

First line of poem / Second line of poem.

Are all of the styles OK? I have no strong preference; I've used both of the last two styles before. I haven't used the first style because I thought wiki-code was preferable to HTML-code. Hyarmendacil prefers the first style. WT:Quotations#Line_breaks mentions only the third style, but that whole page is out-of-date. - -sche (discuss) 01:11, 12 August 2013 (UTC)

I usually use the first style as I think it makes the source text easiest to read. —Angr 15:04, 12 August 2013 (UTC)

There's actually a <poem> tag in the wiki software for situations like this. —CodeCat 15:31, 12 August 2013 (UTC)

Yeah but does it work in lines beginning with *? Let's see:

I think that I shall never see
A poem lovely as a tree.

The bullet is left all by itself and the poem isn't indented. —Angr 16:13, 12 August 2013 (UTC)

<poem> doesn't seem to be intended for (or at least, doesn't work for) multi-line quotations that aren't already set off by a template like {{examples-right}}. Only these entries use it: Ladin, all singing, all dancing, ducks and drakes, condicional; Talk:天若有情天亦老; Template talk:fo-decl-adj, Template talk:cite-book; Template:quote-hansard/doc, Template:Thesaurus. - -sche (discuss) 20:57, 12 August 2013 (UTC)

Quotations, etc

I created this a while back but forgot about it: Module:User:DTLHS is a rough citation / quotation module where all the bibliographic data is consolidated on one page and as a Lua table (with the goal as always of separating data from presentation). Obviously this could be split up by language or by some other criteria in the future (in a /data subpage of a module). This would let us move individual quotation templates (Category:Latin quotation templates) into one place. I would envision this being used for heavily quoted works (the Bible, Shakespeare, Chaucer...) with raw formatting still allowed as a fallback. Some examples:

{{:User:DTLHS/Template:test|Augustinus Confessiones|text=text text text}} {{:User:DTLHS/Template:test|Authorized Version|book=Genesis|chapter=1|verse_start=5|verse_end=10}}

The custom parameters seem kind of hackish to me; any ideas for improving that aspect would be welcome. Also let me know if any of my code is unclear. DTLHS (talk) 00:07, 13 August 2013 (UTC)
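The idea of keeping bibliographic data in one table and formatting it separately can be illustrated outside Lua. This Python sketch is hypothetical throughout: the WORKS table, its fields, the title string and the output format are invented; only the per-use parameter names book, chapter, verse_start and verse_end are taken from the example call above.

```python
from typing import Optional

# Bibliographic data, kept apart from the code that formats it.
WORKS = {
    "Authorized Version": {
        "author": None,
        "title": "The Holy Bible (Authorized Version)",
        "year": 1611,
        "kind": "bible",
    },
}


def cite(work_id: str, book: Optional[str] = None, chapter: Optional[int] = None,
         verse_start: Optional[int] = None, verse_end: Optional[int] = None) -> str:
    """Format a citation line from the stored data plus per-use parameters."""
    work = WORKS[work_id]
    parts = [str(work["year"])]
    if work["author"]:
        parts.append(work["author"])
    parts.append(work["title"])
    if work["kind"] == "bible" and book is not None:
        ref = f"{book} {chapter}:{verse_start}"
        if verse_end is not None and verse_end != verse_start:
            ref += f"-{verse_end}"
        parts.append(ref)
    return ", ".join(parts)


print(cite("Authorized Version", book="Genesis", chapter=1, verse_start=5, verse_end=10))
# 1611, The Holy Bible (Authorized Version), Genesis 1:5-10
```

Because the work-specific fields live only in the table, a new heavily quoted work would need a data entry, and possibly a new "kind" branch, rather than a new template.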

Sounds like a good idea in principle. We also cite references with templates like {{R:Webster 1913}}, which require a different format, but could be stored in the same or a similar database. I think you will need to support article title and author in addition to book/journal title and author/editor.

Ultimately, it would be nice to store such structured data in a more wiki-ish plaintext format, or in Wikidata, instead of as an array assignment procedure. —Michael Z. 2013-09-09 17:42 z

bn-0 language userbox

The bn-0 language userbox renders as "These users do not understand Bengali (or understand it with considerable difficulty)." It should match the others by saying "This user does not...". Same probably goes for bn-1, bn-2, etc. Equinox ◑ 13:27, 14 August 2013 (UTC)

Then change it. --Ivan Štambuk (talk) 22:17, 15 August 2013 (UTC)

I don't know how. Equinox ◑ 17:52, 16 August 2013 (UTC)

Fixed. —Μετάknowledge^{discuss/deeds} 18:24, 16 August 2013 (UTC)

Section Headers: "Alternative Forms" vs. "Variant Forms"

Wiktionary currently has an established section header of "Alternative Forms".

But in American English especially, "alternative" has a strong sense of something which you can optionally choose, and which may even be better. The term "alternate" is used instead for more neutral situations of something which only might be exchanged (an "alternate juror", not an "alternative juror").

By contrast, many of the differing forms of something in Wiktionary (especially the Chinese characters I am working with) are not free choices for the writer, but instead are mandatory in some contexts and forbidden in others, or are optional only in some contexts.

So I propose the term "Variant Forms" in place of "Alternative Forms".

This is more neutral and can comprise both volitional and required situations.

Additionally, in this entry I ended up creating quite a lengthy "Alternative Forms" section, and it is somewhat overwhelming the following main definition of the Chinese character:

http://en.wiktionary.org/wiki/艹

I tried putting the "Alternative Forms" section after the "Han Character" section, but this got reversed. This got me thinking in general whether it is logical to define variations of a subject, before defining the subject itself.

So I would propose having "Variant Forms" (formerly "Alternative Forms") after "Han Characters" in the official order for Chinese characters. I'm not sure how this works with other languages.

Thanks for your consideration of this issue. HanEditor (talk) 08:50, 15 August 2013 (UTC)

I think I like "Variant forms" better too. I'd also like it if it weren't the very first section, before even etymology and pronunciation. —Angr 14:07, 15 August 2013 (UTC)
Yes, I could go with that. 'Variant forms' (or should it be 'Variants'?) placed as the first thing after the definition. BTW, HanEditor, you can use an autocollapsible table so the alt-forms don't clog everything up. —Μετάknowledge^{discuss/deeds} 16:34, 15 August 2013 (UTC)
Have you ever seen a dictionary that places alternative forms after definition lines, and not as the very first thing? --Ivan Štambuk (talk) 22:20, 15 August 2013 (UTC)
Yes. The major Irish-English dictionaries list the alternative spellings of Irish words at the end of the entry, not the beginning. —Angr 09:00, 20 August 2013 (UTC)

I dislike "variant forms", because people already often misunderstand "alternative forms" as implying inferiority (and some editors actively promote that misunderstanding), and I think "variant forms" is even more liable to this problem. Maybe "other forms", or even just "forms"? —Ruakh_TALK 20:04, 15 August 2013 (UTC)
Plain "Forms" is bad because it will get mixed up with inflection. After all, people say things like venīrēmus is a form of veniō. —Angr 21:43, 15 August 2013 (UTC)

Do you have a reference for this interpretation of alternative? --Ivan Štambuk (talk) 22:16, 15 August 2013 (UTC)

To me, "variant forms" seems either no better or worse than "alternative forms", or possibly slightly 'worse', meaning 'more likely to be misunderstood'. "Forms" seems too vague and is liable to be filled up with inflected forms, as Angr says. "Other forms" might work. In any case, if an entry has a lot of forms, they can be collapsed, like in ambergris. (Another header people often misunderstand is "Related terms", which is for etymologically, not semantically, related terms.) - -sche (discuss) 03:16, 16 August 2013 (UTC)

Unicode officially uses the term "Glyph Variants":

* http://www.unicode.org/search/ — search for "glyph variants"

Though note that for Wiktionary purposes, our different forms will not be limited to glyphs. "Variant Forms" can mean variant glyphs, variant spellings, etc.

Ivan Štambuk:

* Alternative Energy — generally not coal, even if that is sometimes used in place of crude oil

* Alternate Juror — a juror who may replace another due to illness, etc., but who is not an alternative that a party to the case can choose if they wish.

But "Other Forms" would also work for me.

HanEditor (talk) 04:03, 16 August 2013 (UTC)

You're just mixing up meaning #2 of alternative (as in "alternative lifestyle") with a perfectly valid and original meaning #1 "relating to a choice between two or more possibilities". Terms such as alternative energy originally really meant only #1, but once they've become a part of the Green/anti-globalist/tree-hugging movements they were also perceived as something "non-mainstream".

OTOH, variant is usually defined as "differing from a standard or type" or "Deviating from a standard, usually by only a slight difference." (definitions taken from some random dictionaries). That would imply that the main entry, which contains all of the content and to which the variant forms soft-redirect (by means of the {{variant form of}} template), is the "standard" or "proper" form. That would be quite problematic, because many of these alternative forms are really equal in status (think of US/UK spelling differences). Alternative seems much more neutral in meaning. --Ivan Štambuk (talk) 06:19, 17 August 2013 (UTC)

I oppose renaming "Alternative Forms" to "Variant Forms". --Dan Polansky (talk) 17:20, 16 August 2013 (UTC)

OK, many are co-equal forms. Many are specific forms that can only be used in strict contexts. Many are actually rare, non-co-equal, lesser forms.

So the utterly neutral solution: "Other Forms"

HanEditor (talk) 10:05, 20 August 2013 (UTC)

I don’t agree that alternative and alternate are necessarily interpreted in these ways. Alternative can also mean non-mainstream or counterculture, and so alternate is often a better word choice when you mean “other.”

But other is the simplest synonym that means exactly what it is, and can't imply those other things. Other forms would be an improvement. —Michael Z. 2013-09-09 17:51 z

{{fr-noun}}

When specifying - as the second unnamed parameter, which indicates to the audience that there is no attested plural for this entry, the template somehow puts the entry into Category:French plurals. The code is a little beyond me. Can someone have a look? Examples: abrine, arthrite. Jamesjiao → ^{T ◊ C} 23:24, 15 August 2013 (UTC)

It's the code in Module:fr-headword that adds the categories, but that code was converted directly from what the template used to do. I reverted the template back to its original state, but it looks like it still puts entries in the plurals category, so at least I know there weren't any mistakes in converting it. What I'm not sure of, though, is what it actually should do and why the template worked this way in the first place. Could someone who is more familiar with French help out? Mglovesfun? —CodeCat 23:50, 15 August 2013 (UTC)

It had never done this before the Scribunto module was introduced. I don't think this is intended at all. To intentionally include the singular form in the plurals category would just be a nonsensical decision. Jamesjiao → ^{T ◊ C} 02:42, 16 August 2013 (UTC)

I think - used to mean 'plural the same as the singular', which I updated to mean 'uncountable' in line with other templates like {{en-noun}}. Perhaps I forgot to change the categorization. Perhaps it wasn't me at all. It doesn't really matter; it needs changing, that's all there is to say, really. Mglovesfun (talk) 02:38, 19 August 2013 (UTC)
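A minimal sketch of the intended behaviour, assuming - now means "uncountable" as with {{en-noun}}. The function and the category names are illustrative assumptions, not the code in Module:fr-headword; the point is only that the uncountable branch must not add the plurals category, since the entry itself is a singular.

```python
from typing import Optional


def fr_noun_categories(plural: Optional[str]) -> list:
    """Categories for a French noun headword, given its second unnamed parameter."""
    categories = ["French nouns"]
    if plural == "-":
        # "-" means no attested plural (uncountable), as with {{en-noun}};
        # the page is a singular, so "French plurals" must NOT be added here.
        categories.append("French uncountable nouns")
    return categories
```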

So-called Proto-Yiddish

I'm not sure what L2 header to use for words in Proto-Yiddish, the language of works like the Dukus Horant. If they were written in the Latin script, we'd probably call it Middle High German; if they used a more modern orthography or more Hebrew-derived vocabulary we'd probably call it Yiddish. Having an L2 header 'Proto-Yiddish' is certainly possible, but seems like a problematic choice, merely because it's easier to insist in etymologies that Yiddish is a direct descendant of Middle High German, without accounting for a protolanguage. I'd especially appreciate it if our Germanicists could weigh in, but I'm curious to hear what everybody thinks in terms of dealing with these words. —Μετάknowledge^{discuss/deeds} 00:56, 16 August 2013 (UTC)

I'd be more inclined to treat it as very archaic Yiddish than as Middle High German, because of the written form. I do think it's interesting that they used the letter š to represent the combination sch in the poem. That tells us that this combination was pronounced [ʃ] at the time already. —CodeCat 01:06, 16 August 2013 (UTC)

But we can't really transliterate it like modern Yiddish, right? We'd have to transliterate it as Middle High German or use an idiosyncratic middle path. —Μετάknowledge^{discuss/deeds} 05:15, 16 August 2013 (UTC)

Judeo-Middle-High-German? JulieKahan (talk) 12:12, 16 August 2013 (UTC)

A widespread mistake I have been making

While expanding the Webster 1913 abbreviations, I have incorrectly been putting Beaumont and Flanders (or Beaumont & Flanders, with an ampersand); in fact, the writers referred to are Beaumont and Fletcher. Anyone got a bot that could sort these out? There are dozens, possibly over a hundred. Equinox ◑ 05:22, 17 August 2013 (UTC)

I'll give it a shot. —Ruakh_TALK 05:56, 17 August 2013 (UTC)

At the last database dump (which ran early Thursday morning), there were 177 affected entries. (I found them by examining all pages that contained both Beaumont and Flanders, and manually eliminating the one false positive, namely Wiktionary:Frequency lists/PG/2005/08/10001-20000.) Rukhabot (talk • contribs) is currently going through them at a rate of ten per minute, so should be done within the next ten minutes or so. [example edit]
Do you think you did any within the past few days, after the database dump began?
—Ruakh_TALK 06:18, 17 August 2013 (UTC)
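The approach described above (a loose filter over the dump, manual review of the hits, then a precise replacement) can be sketched in Python. The pattern and function names are illustrative assumptions, not Rukhabot's actual code. The filter is intentionally looser than the replacement pattern, which is why a false positive such as a frequency list can be caught by review and, even if missed, is left unchanged by fix().

```python
import re

# Precise pattern: "Beaumont and Flanders" or "Beaumont & Flanders".
WRONG = re.compile(r"Beaumont(\s+and\s+|\s*&\s*)Flanders")


def is_candidate(text: str) -> bool:
    """Loose dump filter: both names appear somewhere on the page."""
    return "Beaumont" in text and "Flanders" in text


def fix(text: str) -> str:
    """Replace only the erroneous pairing, keeping 'and' or '&' as written."""
    return WRONG.sub(r"Beaumont\g<1>Fletcher", text)
```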

Thanks, and no, I probably didn't. Equinox ◑ 06:21, 17 August 2013 (UTC)

No problem. If I remember (or if someone reminds me), I'll check the next database dump when it comes out, just to be sure. (Even if you didn't add any new ones, there's a chance that due to temporary vandalism or whatnot, the database-dump happened to pick up a page-version that lacked this problem even if the page now has it. But that seems unlikely, so I'm not too worried.) —Ruakh_TALK 06:37, 17 August 2013 (UTC)

Done. The complete list of edits is at http://en.wiktionary.org/wiki/Special:Contributions/Rukhabot?offset=20130817063000&limit=179 (which also includes two unrelated edits from another bot that ran concurrently with this one for a few seconds, but I don't think there should be any confusion about which two those are). —Ruakh_TALK 06:37, 17 August 2013 (UTC)

Singular forms categorized as English pluralia tantum

Somehow Category:English pluralia tantum has become populated with both the singular and plural forms of nouns that have p.t. senses, eg, academic and academics. We can't want the singular to appear in the category. I suggest that "plural only" in context should not categorize, and humbly request the powers that be to fix this and review contexts in general to make sure that similar outrages (eg, Chinese English) are not perpetuated.

Is there a control table for labels that determines which ones should categorize and which ones not?
What about labels that should categorize in some languages but not in others?
What workarounds do we need? Can they be applied by bot or AWF?

This is possibly separate from the confusion of usage context and other types of labels or possibly a consequence of that confusion. DCDuring TALK 03:58, 18 August 2013 (UTC)

I'm not sure I agree with your premise. Insofar as pluralia tantum are occasionally (or more-than-occasionally) defined at entries for imputed singular forms, I think it makes sense for the category to link directly to those entries. —Ruakh_TALK 04:06, 18 August 2013 (UTC)

(In other words, I think the problem is that categories aren't as intelligent as we might like. Ideally, a category would be a list of terms or senses, with links to the appropriate entries; but as it is, a category is just a list of linkified pagenames, which is a very coarse approximation. Given that, I don't think it's always self-evident how to most closely approximate the ideal. It's tradeoffs all the way down.) —Ruakh_TALK 04:21, 18 August 2013 (UTC)

Why would we want both when it is within our power to have a clean list of the actual forms that are p.t.s? We can even use our new sense-ids to link from the plural form to the specific definition if it only exists at the singular form. But the p.t. definition should always appear at the plural form. If the context/label-based categorization system leads to this result, it is causing us to get less from this category that we could.

The remedy doesn't seem terribly hard, just tedious to apply. In the singular definition, the label should not be one that categorizes, ie instead of "plural only", it could be something like "in plural only". DCDuring TALK 05:06, 18 August 2013 (UTC)[reply]

Re: "the p.t. definition should always appear at the plural form": If that's true, then the "insofar" clause in my comment is not satisfied, and I agree with you that the plural form is what the category should link to. The problem is that it's tricky; "scissors" and "pants" and "gummies" are mostly pluralia tantum, except in attributive position, where people typically say e.g. "scissor kick", "pant leg", and "gummy bear" (though the details depend on dialect and idiolect). I don't put a high value on having categories be "clean" lists, since I always expect that people browsing a category can ignore the ones they're not interested in, so I'd rather err through inclusion than through exclusion. —Ruakh_TALK 06:25, 18 August 2013 (UTC)[reply]

I disagree with DCDuring's suggestion: If the same noun has both singular and plural-only senses then the plural-only senses should be listed in the singular lemma. Content should be centralized and not dispersed. It's especially ugly to have a normal definition line next to the soft-redirect template like academics currently has. --Ivan Štambuk (talk) 09:26, 18 August 2013 (UTC)[reply]

I really don't care where the full definition is located and didn't say that I did. I care which form is categorized in Category:English pluralia tantum. I simply don't want to reduce all of our categories to mush beyond what we are compelled to by what is inherent in the very idea of such categorization, the limits of MW software, and our page structure.

If our label autocategorization system creates confusions between topic and usage context, and between the lemma form of a plural-only term and the actual plural form, that is not something forced on us, but something some seem not to care about. I care about it and find it unacceptable. DCDuring TALK 11:20, 18 August 2013 (UTC)

If all of the definitions are listed in a single place, then the problem of having both singular and plural forms categorized inside the Category:English pluralia tantum would be solved.

A separate issue is that you want that category to contain only the plural forms. There is no point in doing that, since such plural forms (apart from "true" pluralia tantum) would have no definitions and would be treated as inflected forms. --Ivan Štambuk (talk) 17:38, 18 August 2013 (UTC)

Block of a user

User:Tedius Zanarukando's translations of English terms into languages he doesn't speak have mainly been copied from Wikipedia. Many of them are in the incorrect form or case. I'm currently very busy and will have to address those translations later. --Anatoli (обсудить/вклад) 10:43, 18 August 2013 (UTC)

No need to block him just yet. Give him a chance to show whether he heeded the warning. — Ungoliant (Falai) 14:27, 18 August 2013 (UTC)

There once was a Tedius Zanarukando
Who looked not at all like Marlon Brando
His dubious translations
Caused great consternation
And earned him a quick reprimando.

—Μετάknowledge (discuss/deeds) 04:21, 19 August 2013 (UTC)

So you're saying that if he ever changes his username to 'Tedius Zanarundako', then he'll have earned a block? —Ruakh_TALK 05:28, 19 August 2013 (UTC)

Many of those wrong translations get picked up by other (usually smaller) Wiktionaries where they are bot-generated into articles in an effort to quickly boost the number of entries. Most of these will never ever get fixed. This is not the problem of English Wiktionary, but it's annoying nevertheless. --Ivan Štambuk (talk) 18:30, 23 August 2013 (UTC)

Sanskrit in Latin script?

Sanskrit is written in a lot of different scripts. One of the scripts listed in Module:languages is Latn, which I temporarily removed so that we can track down any links that need fixing. But this makes me wonder, why was it there in the first place? Do we actually want to support Sanskrit in Latin script? So, should it stay removed or should it be restored after the work is done? —CodeCat 13:05, 19 August 2013 (UTC)

I don't see why. Devanagari works just fine and is the de facto standard for Sanskrit. -- Liliana • 14:00, 19 August 2013 (UTC)

I agree that we should use Devanagari, but one could argue that transliteration systems such as IAST are the de facto standard in the types of etymologies available to most English-speaking contributors (and in a great deal of the literature). I wonder if it would be possible/worth it to automatically tag transliterated Sanskrit and other uses of the wrong script with something like an rfscript (or am I re-inventing the wheel again, and we already have it?) Chuck Entz (talk) 14:36, 19 August 2013 (UTC)

That's the "tracking down" I referred to. Any links that are not in any of the language's recognised scripts, or for which the script can't be recognised at all, are placed in Category:Terms using script detection fallback. —CodeCat 14:46, 19 August 2013 (UTC)
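A minimal Python sketch of the kind of check described above, assuming script detection by Unicode code-point ranges. The range table, the function names and the return convention are hypothetical and deliberately simplified; they are not the actual contents of Module:languages or of the linking modules.

```python
import unicodedata

# Hypothetical, simplified code-point ranges per script code.
SCRIPT_RANGES = {
    "Deva": [(0x0900, 0x097F), (0xA8E0, 0xA8FF)],  # Devanagari (+ Extended)
    "Latn": [(0x0041, 0x005A), (0x0061, 0x007A),   # basic Latin letters
             (0x00C0, 0x024F),                      # Latin-1 letters, Ext-A/B
             (0x1E00, 0x1EFF)],                     # Latin Extended Additional
}

FALLBACK_CATEGORY = "Category:Terms using script detection fallback"


def in_script(ch: str, script: str) -> bool:
    """True if the character's code point falls in one of the script's ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SCRIPT_RANGES.get(script, ()))


def detect_script(text: str, allowed_scripts: list) -> tuple:
    """Return (script_code, None), or (None, FALLBACK_CATEGORY) when no
    script listed for the language matches."""
    # Compose first, so that e.g. m + combining dot below becomes one letter.
    text = unicodedata.normalize("NFC", text)
    # Only letters (L*) and combining marks (M*) count; spaces, hyphens
    # and punctuation are ignored.
    chars = [ch for ch in text if unicodedata.category(ch)[0] in "LM"]
    if chars:
        for script in allowed_scripts:
            if all(in_script(ch, script) for ch in chars):
                return script, None
    # Not in any of the language's scripts, or nothing to detect at all.
    return None, FALLBACK_CATEGORY
```

With Latn removed from Sanskrit's script list, `detect_script("buddha", ["Deva"])` falls back to the tracking category, while `detect_script("बुद्ध", ["Deva"])` returns `("Deva", None)`.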

Sanskrit is allowed in any Indic (Brahmic) script that it is written in, but not in Latin. See w:Sanskrit#Writing_system for more. So far only Devanagari has been used because almost no one here knows anything else. However, many modern editions (e.g. Clay Sanskrit Library) use only IAST, and years ago User:Dbachmann argued that in fact Latin script should be the default one, with the main arguments being:

Sanskrit has no "native" script so we're free to choose any as the default one
defaulting to Devanagari would give too much prominence to Hindus/Hinduism, and Sanskrit is also the heritage of other Indian religions/peoples, and IAST is neutral with regard to that issue
in Latin script one can also mark accents and use hyphens for compounds
there are many published Sanskrit dictionaries, grammars, and works using IAST in English-speaking countries, so it's the most "natural" to choose it. Hindi Wiktionary would default to Devanagari for obvious reasons, Tamil Wiktionary to Tamil script and so on.

Using IAST for Sanskrit was shot down by RU&CM way back with all of the IAST entries moved to Deva (e.g.), but now that that era is over perhaps we can reopen this topic. --Ivan Štambuk (talk) 16:36, 19 August 2013 (UTC)

I'm in favor of allowing Latin script entries for Sanskrit in addition to Devanagari, but I'm less enthusiastic about making Latin the default script. Latin-script Sanskrit entries should have the same status as Latin-script Chinese, Japanese, and Gothic entries. —Angr 09:18, 20 August 2013 (UTC)

In practice there is no difference between "default" and "non-default" scripts because full entries in either are allowed. Having Devanagari as the main (content) script and others soft-redirecting is not an option for NPOV reasons (see point 2 above). In real-world usage, IAST is at least equally used for writing Sanskrit in English-speaking countries, particularly in recent times (as opposed to the 19th century and early 20th century). There are Sanskrit grammars, dictionaries and entire series (CSL) written using only IAST.

One additional advantage of having IAST as the main script is that it would be very easy to generate Devanagari spellings (or any other) from it; in fact, it would be a 100% automatable task. We could just create main entries in IAST, and let bots do the dirty syncing work. It could also be handled at Lua level (it's a simple character substitution), i.e. we could have a single set of inflection, headword etc. templates supporting IAST, and supporting other scripts would amount to adding a respective transliteration module. --Ivan Štambuk (talk) 10:17, 21 August 2013 (UTC)
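As a rough illustration of the automated step, here is a Python sketch (Wiktionary itself would do this in a Lua module). It covers only a subset of IAST, and the table and function names are hypothetical. Beyond the character table it needs one contextual rule: Devanagari consonants carry an inherent a, so a vowel after a consonant is written as a dependent sign, and a consonant followed by another consonant or the end of the word gets a virama.

```python
import unicodedata

VIRAMA = "\u094d"  # cancels a consonant's inherent vowel

# IAST vowel -> (independent letter, dependent sign); "a" needs no sign.
VOWELS = {
    "a": ("अ", ""), "ā": ("आ", "ा"), "i": ("इ", "ि"), "ī": ("ई", "ी"),
    "u": ("उ", "ु"), "ū": ("ऊ", "ू"), "ṛ": ("ऋ", "ृ"),
    "e": ("ए", "े"), "ai": ("ऐ", "ै"), "o": ("ओ", "ो"), "au": ("औ", "ौ"),
}
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ", "ṅ": "ङ",
    "c": "च", "ch": "छ", "j": "ज", "jh": "झ", "ñ": "ञ",
    "ṭ": "ट", "ṭh": "ठ", "ḍ": "ड", "ḍh": "ढ", "ṇ": "ण",
    "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न",
    "p": "प", "ph": "फ", "b": "ब", "bh": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "ś": "श", "ṣ": "ष", "s": "स", "h": "ह",
}
OTHER_MARKS = {"ṃ": "ं", "ḥ": "ः"}  # anusvara, visarga

# One lookup table; keys are NFC-normalised so that composed and
# decomposed input both match.
TABLE = {}
for kind, mapping in (("V", VOWELS), ("C", CONSONANTS), ("M", OTHER_MARKS)):
    for iast, deva in mapping.items():
        TABLE[unicodedata.normalize("NFC", iast)] = (kind, deva)
MAX_KEY = max(len(key) for key in TABLE)


def tokenize(text):
    """Greedy longest match, so "kh" wins over "k" and "ai" over "a"."""
    i = 0
    while i < len(text):
        for size in range(MAX_KEY, 0, -1):
            piece = text[i:i + size]
            if piece in TABLE:
                yield TABLE[piece]
                i += len(piece)
                break
        else:
            yield ("X", text[i])  # unknown character: pass it through
            i += 1


def iast_to_deva(text):
    # Compound hyphens have no Devanagari equivalent, so drop them.
    text = unicodedata.normalize("NFC", text.lower()).replace("-", "")
    out = []
    after_consonant = False
    for kind, value in tokenize(text):
        if kind == "V":
            independent, sign = value
            out.append(sign if after_consonant else independent)
            after_consonant = False
            continue
        if after_consonant:
            out.append(VIRAMA)  # consonant cluster, or consonant before a space
        out.append(value)
        after_consonant = kind == "C"
    if after_consonant:
        out.append(VIRAMA)  # word-final consonant
    return "".join(out)
```

For example, `iast_to_deva("buddha")` gives बुद्ध and `iast_to_deva("saṃskṛta")` gives संस्कृत. Characters outside the table are passed through unchanged, so a fuller table would be needed before using anything like this on real entries.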

I support placing Latin IAST for Sanskrit at least on the level of romanization of Chinese, Japanese, and Gothic entries. --Dan Polansky (talk) 15:51, 20 August 2013 (UTC)
Support, as Dan says.

@Angr, why would you be opposed to Latin-script Sanskrit entries? As noted above, Sanskrit has no default script, which would seem to make it much like Pali -- for which we have entries in the Latin, Devanagari, and Burmese scripts (cf. buddha). I'm not looking to argue, I just want to know your reasons. ‑‑ Eiríkr Útlendi │ Tala við mig 19:34, 20 August 2013 (UTC)

I'm not opposed to Latin-script Sanskrit entries. I said the same thing Dan did: that IAST for Sanskrit should be at the same level as the romanization of Chinese, Japanese, and Gothic. (I think we should also have romanized entries for Hittite since absolutely nobody uses cuneiform for it.) —Angr 10:25, 21 August 2013 (UTC)

No one except for Hittites themselves? I'm afraid we're conflating several different issues here:

Ease of looking up entries. Gothic has very limited attestation (a hundred pages or so?), and transcription to Latin is fairly straightforward. Gothic Romanizations as entries serve simply as soft redirects to compensate for the primitive look-up facilities of MediaWiki. (This is a practical issue that could be solved by WikiData, together with interwikis, inflected forms etc., if they were interested in solving practical issues.) Hittite in particular has several problematic issues and Romanization is not straightforward (e.g. hyphens, Sumerograms, determinatives, combining plene writing as macrons), and each Romanized Hittite word is likely to have several different forms.
Reduced maintenance for alternative scripts of sub-equal status. Latin script for Japanese and Chinese varieties is not "native"; however, its usage for soft redirects is understandable with respect to reduced content duplication and maintenance. Everyone familiar with one form of writing is likely to know/learn/encounter the other, so it makes sense to degrade some of them in status while concentrating efforts on others.
Equally valid scripts which reflect different cultural backgrounds. It is doubtful that there are people familiar with all of the Indic scripts used for writing Sanskrit. With NPOV issues in mind, Latin script as the most neutral and familiar one (with respect to the perspective of someone using English Wiktionary) could easily serve the purpose of being the main (suggested) script to create entries, whence entries in other scripts could be created in an automated fashion (either as full-blown entries, or as soft redirects).

It should also be noted that there are several large out-of-copyright Sanskrit dictionaries available in XML and IAST, as well as databases of inflected forms (to build lemmatizers/stemmers; one of them that I inspected years ago has > 1M forms), and the current, more or less haphazard, treatment of Sanskrit is something that has the potential to be massively scaled with the help of bots. --Ivan Štambuk (talk) 12:24, 24 August 2013 (UTC)

I, as always, stand for Latin Gothic, since, as far as I know, no significant volume of Gothic has ever been published in any other script. Pick any book containing Gothic off the shelf, and it will be in Latin script.--Prosfilaes (talk) 03:06, 27 August 2013 (UTC)

Quick Google search shows that the 1665 editio princeps of Codex Argenteus was written in Gothic script [2], this 1750 edition as well, as is the 1927 facsimile edition Codex argenteus Upsalensis. A tradition that can hardly be ignored by relatively recent trends of Romanizing everything for the sake of convenience. The next thing you know we'll be writing German as dâs ist mayne froyndin because off-the-shelf "for dummies" books do it so.. --Ivan Štambuk (talk) 06:34, 27 August 2013 (UTC)

Next thing you know we'll be writing Turkish in the Latin script just because of relatively recent trends of everyone who uses the language writing in the Latin script. Or maybe just dropping our long-ses and not using Fraktur for German.

Seriously, look at that 1665 book. Wiktionary does not render its Saxon that way, though I think recent versions of Unicode have the characters; we do not render our Greek in that horribly ligatured way, and I'm not sure the digital fonts to do so exist. And we don't use the long-s. In all ways do we follow the recent trends, not the typography of that book. If the goal is to be useful, then we use the orthography used in the books that people are reading and might want to look up words from.--Prosfilaes (talk) 07:44, 27 August 2013 (UTC)

Turkish is a living language, and the decision to switch script was made deliberately (note also that we do add Ottoman Turkish words in Arabic script, and that Ottoman Turkish is in some respects a lot different from modern standard Turkish). Gothic is a dead language, scarcely attested, and if the technical means are available to reproduce it faithfully as it is actually attested, such as the availability in Unicode, it is imperative that we do so. Wiktionary is primarily an archeological project where words are collected, explained, categorized and analyzed, and only secondarily a learning aid for students of Gothic (some [most?] editors would disagree with this and think that what a typical user wants is the most important thing, but I don't). The goal is not to be useful but thorough, precise and credible (cf. OED). Ideally, we should have all of the Gothic words in headwords and citations photographically rendered from original documents. Same goes for any other language with such issues. Romanizations as means to look up properly spelled entries? That's OK. Perhaps one day transliterations will be stored at WikiData and integrated into language-specific search, and we could get rid of them as entries as well. --Ivan Štambuk (talk) 08:15, 27 August 2013 (UTC)

Automated linking of IPA characters to their corresponding WP entry

So {{IPA|/ˈdɪkʃənɛɹi/}} would give /ˈdɪkʃənɛɹi/. Is it worth it? It's easy to do with Lua. It may be a bit resource-eating in some entries (particularly on the user's side, as too many links would result in big HTML code). --Z 15:07, 19 August 2013 (UTC)

The link that appears before any pronunciation should be entirely sufficient, I'd think. It's much easier than having to click on ten links (maybe more depending on the complexity of the term), in my opinion. -- Liliana • 15:15, 19 August 2013 (UTC)

My first impression is that it is really smart. Quick access to both audio and prevalence. --Njardarlogar (talk) 15:46, 19 August 2013 (UTC)

Actually I forgot to add "IPA: " there (we should always mention the alphabet that is being used; linked or unlinked). Thus one does not have to click on the characters. --Z 16:08, 19 August 2013 (UTC)

I think it's a fantastic feature. IPA is tough, and it's so esoteric that it has its own characters. Short of direct links like this, the barrier of time and energy would be too high for most users. For the regular person, if I have to go to w:Japanese phonology for 被爆者 and match each character up to its explanation, my interest disappears in 5, 4, 3, ... --Haplology (talk) 16:29, 19 August 2013 (UTC)

No need: {{IPA|/ha/|lang=ja}} already links to this page. But it could be a good idea to link every character, not only "IPA:", to make it more obvious to the reader. Dakdada (talk) 16:38, 19 August 2013 (UTC)

Adding a title text with a short explanation would also be useful. But the wiki software already adds its own title text to links, so I don't know if they would work nicely together. —CodeCat 16:39, 19 August 2013 (UTC)

It is possible: [[page|link]] = link. Dakdada (talk) 16:41, 19 August 2013 (UTC)

I created a module for it. It was harder than I first thought: there are symbols with multiple characters, and Lua's support for regular expressions is poor. Here are some tests: [3]; the data is here, which contains fields for the title of each symbol and the title of the corresponding entry, so we can add a description for each symbol as a tooltip. --Z 16:46, 20 August 2013 (UTC)
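The multi-character problem can be handled without pattern alternation (which Lua patterns lack) by greedy longest-match tokenization. Below is a Python sketch of the idea. The symbol table is a small illustrative sample, the article titles are assumptions rather than the contents of the actual data module, and the output is plain `[[w:Article|symbol]]` wikitext. In a Scribunto module the same loop would slice the string with `mw.ustring.sub` instead of Python slicing.

```python
# Illustrative sample: IPA symbol -> Wikipedia article title (assumed titles).
IPA_ARTICLES = {
    "tʃ": "Voiceless postalveolar affricate",
    "dʒ": "Voiced postalveolar affricate",
    "ˈ": "Stress (linguistics)",
    "t": "Voiceless alveolar plosive",
    "d": "Voiced alveolar plosive",
    "k": "Voiceless velar plosive",
    "ʃ": "Voiceless postalveolar fricative",
    "n": "Alveolar nasal",
    "ɹ": "Alveolar approximant",
    "i": "Close front unrounded vowel",
    "ɪ": "Near-close near-front unrounded vowel",
    "ɛ": "Open-mid front unrounded vowel",
    "æ": "Near-open front unrounded vowel",
    "ə": "Mid central vowel",
}
MAX_LEN = max(len(sym) for sym in IPA_ARTICLES)


def split_ipa(transcription):
    """Split into symbols, trying the longest table entry first at each position."""
    symbols, i = [], 0
    while i < len(transcription):
        for size in range(MAX_LEN, 0, -1):
            piece = transcription[i:i + size]
            if piece in IPA_ARTICLES:
                break
        else:
            piece = transcription[i]  # not in the table: take one character
        symbols.append(piece)
        i += len(piece)
    return symbols


def link_ipa(transcription):
    """Wrap each known symbol in a [[w:...]] link; leave the rest as text."""
    out = []
    for sym in split_ipa(transcription):
        article = IPA_ARTICLES.get(sym)
        out.append(f"[[w:{article}|{sym}]]" if article else sym)
    return "".join(out)
```

For example, `split_ipa("dʒæm")` yields `["dʒ", "æ", "m"]`: the affricate is kept as one symbol instead of being split into d and ʒ. Characters not in the table, such as the slashes, are left unlinked.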

So the problem is that it is too hard for the reader to follow one link and look up the ten characters of ˈdɪkʃənɛɹi in the pronunciation table? Providing a single word with eleven links to encyclopedia articles is not a solution. Increasing the help links and text by 1,000% increases the cognitive load on the reader, adds unnecessary decisions, and immerses them in a sea of data. It makes it harder for the reader to make sense of the single word in IPA.

This solution also presents the reader with some other new problems:

For one pronunciation, there are now two apparent links: IPA and ˈdɪkʃənɛɹi: which one to click?
The second blue underlined link is actually ten links: ˈ, d, ɪ, k, ʃ, ə, n, ɛ, ɹ, and i. After clicking or tapping the link text “ˈdɪkʃənɛɹi,” will the reader realize they have actually followed a link for one of its characters?
1-character links with no separating text, punctuation, or even whitespace, even if the reader is aware of them, are hard to hit with a mouse pointer, and harder still on a touchscreen.
Title attributes can be helpful, but they cannot be relied upon, because they are not accessible on touchscreens and screen readers.
Title attribute text like “voiced alveolar plosive” will not be helpful to a reader who doesn’t know IPA.

For English pronunciations, we link to Appendix:English pronunciation, which contains three tables comprising about 60 data rows. If this is too much, then provide the reader with a single custom table with 10 data rows or columns, explaining the relevant IPA characters, and offering links to more help. This could be presented in a separate page or a popup, either with inline code, or maybe in a Pronunciation: namespace. It can be automated with Lua.

Simplifying IPA is helpful; complicating it is not. —Michael Z. 2013-08-21 16:04 z
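Here is one way to automate the per-entry table suggested above, sketched in Python with a hypothetical key. It picks out the symbols that actually occur in the transcription, keeps each one once in order of first appearance, and emits a small wikitable. The sound names and example words are illustrative and are not taken from Appendix:English pronunciation.

```python
import re

# Hypothetical key: symbol -> (sound name, English example word).
KEY = {
    "ˈ": ("primary stress", "placed before the stressed syllable"),
    "tʃ": ("voiceless postalveolar affricate", "church"),
    "d": ("voiced alveolar plosive", "dye"),
    "ɪ": ("near-close near-front unrounded vowel", "kit"),
    "k": ("voiceless velar plosive", "cat"),
    "ʃ": ("voiceless postalveolar fricative", "she"),
    "ə": ("mid central vowel (schwa)", "about"),
    "n": ("alveolar nasal", "nigh"),
    "ɛ": ("open-mid front unrounded vowel", "dress"),
    "ɹ": ("alveolar approximant", "rye"),
    "i": ("close front unrounded vowel", "happy"),
}

# Longest keys first so "tʃ" is matched before "t"; "." catches the rest.
TOKEN_RE = re.compile(
    "|".join(re.escape(s) for s in sorted(KEY, key=len, reverse=True)) + "|."
)


def pronunciation_table(transcription):
    """Build a wikitable explaining only the symbols used in the transcription."""
    used = [tok for tok in TOKEN_RE.findall(transcription) if tok in KEY]
    lines = ['{| class="wikitable"', "! Symbol !! Sound !! Example"]
    for sym in dict.fromkeys(used):  # each symbol once, in order of appearance
        name, example = KEY[sym]
        lines += ["|-", f"| {sym} || {name} || {example}"]
    lines.append("|}")
    return "\n".join(lines)
```

For /ˈdɪkʃənɛɹi/ this yields exactly ten rows, one per distinct symbol, which matches the ten-row table described above. Trying the longest keys first in the regular expression keeps affricates such as tʃ together.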

Another thing that may help readers is if we put together corresponding IPA, enPR, and audio, so readers can learn the symbols from actual examples. Because we commonly put them on separate lines, a pronunciation section becomes a daunting soup of qualifiers, brackets, abbreviations, special characters in four alphabets, and multimedia widgets.

And hide the X-SAMPA by default, because it is just a technical-use version of the IPA. Let readers who want it reveal it with a Toolbox widget or preference. (Can anyone cite a single instance of a Wiktionary reader making use of one of our X-SAMPA transcriptions, ever?)

And maybe remove the label text “Audio” from the self-evident audio widget. —Michael Z. 2013-08-21 16:21 z

I agree with Michael Z. I also think that there is room for improvement: hiding/removing X-SAMPA is a good idea (and it can be transliterated from IPA with a script if needed). Also, why not link the entire pronunciation to the help page, instead of just the three (small?) letters of "IPA"? That would be easier to click/touch/access. Dakdada (talk) 16:31, 21 August 2013 (UTC)
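The following Python sketch shows how X-SAMPA could be derived from IPA instead of being typed by hand. The table holds standard X-SAMPA equivalents for only a handful of English phonemes. A real table would need to be complete, because anything missing from it is passed through unchanged.

```python
# Standard X-SAMPA equivalents for a few English IPA symbols (partial table).
IPA_TO_XSAMPA = {
    "ˈ": '"', "ˌ": "%", "ː": ":",
    "ɪ": "I", "ʊ": "U", "ɛ": "E", "æ": "{", "ʌ": "V",
    "ɑ": "A", "ɔ": "O", "ə": "@", "ɜ": "3",
    "ʃ": "S", "ʒ": "Z", "θ": "T", "ð": "D", "ŋ": "N", "ɹ": "r\\",
}


def ipa_to_xsampa(ipa):
    """Map the transcription symbol by symbol. ASCII letters such as d, k,
    n and i are identical in both notations and pass through unchanged,
    as does anything missing from the table."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)
```

For example, `ipa_to_xsampa("/ˈdɪkʃənɛɹi/")` returns `/"dIkS@nEr\i/`. Affricates written without a tie bar, such as tʃ, come out correctly as tS through the character-by-character mapping.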

Regarding the new problems:

It's OK because when the reader hovers the mouse over the second apparent link, each character would be underlined separately, so it would be obvious that they are about symbols.
Same as above, each character is underlined separately, so it's not much of a problem.
This can be fixed by using the letter-spacing CSS property.
True

We can improve the title text by adding examples from English words, e.g. the title "Near-open front unrounded; e.g. bAd, cAt, rAn" (or "-- a in bad, cat, ran") for "æ".

I think adding title is definitely a good idea, we can put all of the information of Appendix:English pronunciation there, so there would be no need to open and explore another page.

I support putting IPA, enPR, and audio (without "Audio:") in one line and removing X-SAMPA. --Z 17:35, 21 August 2013 (UTC)

Re. no. 1 and 2: readers may not stop to hover, and as mentioned, many of them cannot do so. When our entry is reused in other contexts, the browser’s default CSS would put an underline under everything. The underline may help some readers sometimes, but it doesn’t make it OK.

Re. 3: this is not what letterspacing is for. This would just add bad typography to the list of problems.

By the way, I notice that your example is in a <code> element, and so is set in a monospace font, giving the links more space than they would actually have.

/ˈdɪkʃənɛɹi/

A more accurate representation of IPA in an entry. In my browser, the individual link underlines are so tiny that they are mostly hidden under my mouse pointer (even though my stylesheet ups the font-size to 123% of Wiktionary’s default!). Even best case, these individual links will not be evident to most readers.

IPA: /ˈ d ɪ k ʃ ə n ɛ ɹ i/

An example with letterspacing added. Even though it has been exploded to a degree that it no longer reads like a word and looks distracting on the page, the individual links are still obscure and difficult targets for mouse or touch.

IPA: /ˈ d ɪ k ʃ ə n ɛ ɹ i/

I took the liberty of numbering the points for easier reference.

—Michael Z. 2013-08-23 18:36 z

MZ's proposals seem like a good way to go to make IPA more useful, more worth the effort to begin to learn it. They do not address the problem of someone who just wants some quick help with pronunciation, which MZ's sound-like system suggestion addresses, whatever its imperfections and possible lack of universal coverage. I'll bet that more users get value from our sparse coverage of homophones and less sparse coverage of rhymes (once they click through the first time) than from the other three systems. Of course audio is great, unless previous bad experience with trying to make it work discourages them from trying it again. DCDuring TALK 17:40, 21 August 2013 (UTC)

Are there any free IPA-to-speech systems that we could use to automatically generate machine pronunciations from IPA? It wouldn't be perfect but it would be a good complement for the IPA text itself. —CodeCat 17:44, 21 August 2013 (UTC)

@Z: you are assuming that the reader is using a mouse, and that he can accurately hover each and every character, and that he can actually see the tiny underline under each letter. This is impossible to use with a touchscreen, and really difficult with small screen/small letters/big pointer or just moderately bad eyesight. Dakdada (talk) 10:25, 23 August 2013 (UTC)

So do you mean if some users can't use a feature, we should disable it?? --Z 10:40, 23 August 2013 (UTC)

No, but if the feature can't be used by a significant proportion of users, and if that feature actually makes their navigation worse, then it is not a good idea. Accessibility is important. Dakdada (talk) 12:20, 23 August 2013 (UTC)

I love the idea of linking them, but why not just copy the Wikipedia entries here? They are certainly relevant to a dictionary. bd2412 T 17:46, 21 August 2013 (UTC)

As said by MZ, it's a bad idea. There are already links to each letter in Appendix:English pronunciation (and in each language), and in this case all characters needed are in one page without a clutter of unneeded information exploded in several independent pages. Dakdada (talk) 10:25, 23 August 2013 (UTC)

We have pronunciation appendices like that only for a few languages, and ALL of their information can be presented through "title" text (no need to open and browse another page; much faster to access), besides giving the reader easy access to even more information via the link to the WP entry (people can still use Appendix:English pronunciation etc., though, as the link would be kept) --Z 11:03, 23 August 2013 (UTC)

The absence of appendix for a lot of languages is a fair point.

However, "title" text should only give information about the link; it is not designed to do much more (it's very limited: no format, no link, can't be used with touchscreens...). If you want to actually give more information (especially information as complex as pronunciation), an actual "popup" would be better, because we can format the text, use links etc. E.g. one link on the whole pronunciation would invoke a small box which would decompose the pronunciation, explain each character used, with links to Wikipedia or the Appendices (or even provide a computer generated audio). That would be more useful, readable and accessible than simple hover titles and separate links. Admittedly, that would require more work. Dakdada (talk) 12:20, 23 August 2013 (UTC)

That looks better. --Z 18:07, 23 August 2013 (UTC)

This has the aroma of a bright shiny object. Such projects are always more fun, but vastly less useful, than either fixing what's broken, delivering asked-for features, or finding out what actual users want. DCDuring TALK 17:00, 23 August 2013 (UTC)
No, at least for me, it's really useful (I always searched for the WP entry whenever I encountered a new IPA symbol; they provide audio and examples from languages that I'm more familiar with and not just English, and this led me to propose this, not just because the linked form is sexier), but it seems most people don't want this, so I'm disappointed (although, DCDuring, I can't consider your 'vote' in such discussions a valid one; nothing personal, but you always tend to oppose every single proposed change by default). --Z 18:07, 23 August 2013 (UTC)
I enthusiastically support MZ's proposals, which strike me as much more likely to respond to the needs of users other than ones just like you. When a given technical contributor repeatedly proposes changes that are unresponsive to the needs of anyone other than those much like himself, it becomes hard to take proposals from that contributor seriously. DCDuring TALK 19:36, 23 August 2013 (UTC)
Wouldn't it be trivial to put a class around these links so that they can be enabled with WT:PREFS? That way ZxxZxxZ or anyone else that wants them can use the feature but it would be off by default. DTLHS (talk) 20:31, 23 August 2013 (UTC)
CSS can’t be used to change links that way. But a Javascript widget could be written to add such links to all characters within each span.IPA. If it had a table of IPA characters and the sound names, I suppose the script could even add the full tooltips. —Michael Z. 2013-08-23 23:55 z

Eek. I'm not sure if this is an improvement or not. Instinctively, it will make the page harder to read, not easier. Renard Migrant (talk) 10:35, 9 June 2014 (UTC)

Relevant proposal: Using Module:IPAc in Template:IPAchar would add tooltips like "/dʒ/ 'j' in 'jam' " to IPA symbols. Only works for sounds that occur in English, though. --Yair rand (talk) 18:56, 9 June 2014 (UTC)

Huh, I just noticed that this is a discussion from almost a year ago. That's what happens when we have this month-based system instead of archiving things, I guess. --Yair rand (talk) 19:00, 9 June 2014 (UTC)

The idea is that these discussions are all still active, so there is nothing wrong with contributing to them. --Wiki Tiki 89 20:32, 17 July 2014 (UTC)

Wikibet: A simplified version of the International Phonetic Alphabet for Wiktionary and Wikipedia?

It is very useful that Wiki entries show the pronunciation of the foreign words, names, locations...etc using the International Phonetic Alphabet (IPA) and a listening button.

However, most people are not familiar with the IPA. Judging from previous discussions on Wikipedia, fewer than 1% of native English speakers are estimated to be familiar with the IPA. My experience corroborates this statement.

In addition, it is very difficult to learn or repeat the pronunciation of a foreign word using the listening feature if you are not already familiar with the phonemes of the foreign language. This is because many phonemes in foreign languages have no equivalent in English.

As a result and inspired by Wikipedia, we launched a collaborative project called the Spell As You Pronounce Universal alphabet project (SaypYu). The aim is to spell words from all languages using a simplified version of the IPA which could make it easier for everyone to pronounce and learn foreign words. This could be used by readers who are not familiar with the IPA because SaypYu is more informative and phonetic than arbitrary language-specific romanisation systems and pronunciation respelling. SaypYu could also be used to pronounce words from English and other Latin-based languages given that different European languages pronounce the same letters differently (e.g. X in Portuguese is equivalent to SH in English and to CH in French).

SaypYu has 24 letters only and its purpose is comprehension. Obviously, it is not as accurate as the IPA, but it is much easier to learn and it is more consistent across languages than alternative systems that are currently used. SaypYu is not an alphabet as much as an internationally-consistent and phonetic orthography.

We view SaypYu as a standardised approximation of pronunciation for foreign languages. Because it is standardised, if everyone applies it consistently over a long period of time, one day, native speakers would get used to more easily understanding this 'imprecise' but standardised international accent.

Here are some examples:

English: COUGH PLOUGH THOUGH THROUGH would be spelled in SaypYu as follows: KOF PLAW DHOW THRU
French: the word for CAT in French is CHAT. Using SaypYu, it would be SHA
Italian: the word for BYE in Italian is CIAO. Using SaypYu, it would be TSHAW
Greek: the word for GOOD EVENING in Greek is καλησπέρα. Using SaypYu, it would be KALISPERA
Arabic: the word for WORD in Arabic is كلمة. Using SaypYu, it would be KALIMA. Also, كلام would be KALAAM
Chinese: the word for HELLO in Chinese is 你好. Using SaypYu, it would be NIHAW

Given Wikipedia's mission in making knowledge available freely to everyone, we are wondering if it is a good idea to use SaypYu’s standardised and simplified orthography when showing how foreign words are pronounced in Wiktionary and Wikipedia?

We could show this spelling in addition to the IPA transcription under a separate heading. If the Wiki community wishes, we are OK with such a heading referring to Wikipedia (e.g. Wikibet, WikiAlphabet, WPA, Wiki Phonetic Alphabet, etc.), without any reference to SaypYu or to us, as we are not doing this project for profit.

If such an initiative is approved, it could be easily implemented using a bot.

I am new to Wikipedia so I hope that the format and outline of this suggestion is appropriate.

Additional Background on SaypYu

The Spell As You Pronounce Universal alphabet project (SaypYu – pronounced Sipe-You) is a collaborative project that aims to build a list of words from all languages spelled phonetically using a simplified version of the International Phonetic Alphabet. All of the 24 letters of the SaypYu alphabet are taken from the standard Roman alphabet, with the exception of the letter schwa ‘ɘ’, which is essential for English and many other languages and which could be represented using the asterisk sign * for the ease of typing on a standard keyboard. The letters C, Q and X have been removed because these could be replaced by their phonetic equivalents: K and/or S.

The simplified phonetic spelling of words would enable everyone to more easily pronounce foreign names, brands, food, words, etc.

Unlike Esperanto, SaypYu involves phonetics only and does not require the learning of new vocabulary.

Unlike George Bernard Shaw's attempt to reform English spelling, this project does not aim at reforming spelling. Instead, users can switch back and forth from one spelling system to another at the press of a button, depending on what better suits them at a particular moment.

Unlike the International Phonetic Alphabet (IPA), whose purpose is to be very accurate and which as a result has over 100 letters and 50 diacritics, the SaypYu alphabet aims at comprehension. That is why it can afford fewer letters: it is easier to pass the test of comprehension than the test of accuracy. In addition, because the IPA has such a large number of characters, it is not practical for everyday use. Naturally, the level of accuracy and comprehension achieved with SaypYu would vary from one language to another: it is highest for English and other Latin-based languages and lowest for certain Asian and tonal languages. — This unsigned comment was added by Cosmopolitanism (talk • contribs) at 18:22, 19 August 2013.

I'm not sure about using a relatively unknown scheme like this. 1% of people might be familiar with IPA, but 0% of people know this system. If we do it, it should be done as an automatic mapping of IPA much like we (are going to) do with SAMPA, and implemented in Lua. —CodeCat 17:34, 19 August 2013 (UTC)

Thank you, CodeCat, for the feedback. You are right: far fewer people are familiar with this orthography than with IPA. However, because it has only 24 letters that are based on the Roman alphabet, anyone who knows English or a Latin-based language can grasp it almost instantaneously. — This unsigned comment was added by Cosmopolitanism (talk • contribs) at 18:57, 19 August 2013.

I don't think we want a system which is 'imprecise'. Also looking at the examples above, it looks pretty difficult. Is there a chart to explain it somewhere? That would be a good start. Mglovesfun (talk) 18:06, 19 August 2013 (UTC)

Yes, there is a chart on the website at SaypYu dot com. — This unsigned comment was added by Cosmopolitanism (talk • contribs) at 19:09, 19 August 2013.

This system seems to be too simple to accurately describe pronunciation. For example, in your Greek example it’s impossible to know which syllable is stressed; in your English through example it’s impossible to know whether the word has a fricative or a cluster.

You seem to complain the IPA’s purpose “is to be very accurate”. How is that a bad thing? Even if it is, IPA has the //-[] distinction, a clever feature which permits both detailed and undetailed phonetic transcription.

See [4]. — Ungoliant (Falai) 18:06, 19 August 2013 (UTC)

Yes, this is not a competitor system to the IPA by any stretch of the imagination. It is a competitor to current inconsistent and sometimes unphonetic transliteration systems and pronunciation respelling. The stress could be indicated by doubling the vowels.

Being accurate could be a bad thing if it leads to a complicated alphabet that most people cannot learn, so that they give up on learning a new word or its correct pronunciation. — This unsigned comment was added by Cosmopolitanism (talk • contribs).

Doesn't even cover English (which is pretty complex to start with) as I covers both /i/ and /ɪ/. It's also complex because it has ambiguities, like SH for /ʃ/ but S and H also have their own sounds. To know how to interpret these pronunciations, you really need to know how the word is pronounced first, which defeats the object. Mglovesfun (talk) 18:21, 19 August 2013 (UTC)

Also surely Wikibet would be a wiki-based betting website. Mglovesfun (talk) 18:23, 19 August 2013 (UTC)

lol... Mglovesfun, actually, it does approximate a large number of sounds, even though it uses only 24 letters instead of the IPA's 100-plus. I believe that it would perform better than current pronunciation respelling and romanization systems; would you agree? — This unsigned comment was added by Cosmopolitanism (talk • contribs).

I definitely don't agree. If, in order to simplify the system, we sacrifice so much information that it can't even cover English without inherent errors, what's the point? The word city defeats this system, for crying out loud, because /ɪ/ doesn't have a character. If you want to simplify pronunciations at the cost of accuracy, why not have even fewer sounds? Why not just have one, and transcribe everything as 'o'? Mglovesfun (talk) 19:34, 19 August 2013 (UTC)

Mglovesfun, apologies; I might not have explained clearly that the point of this system is comprehension, so that someone who doesn't know English, for example, can use it to try to pronounce English words. Taking your example city: if someone pronounces it as /ˈsɪti/, /ˈsɪtɪ/, /ˈsitɪ/ or /ˈsiti/, you would still understand them. In fact, the link you posted shows two pronunciations, one of them for the North of England, which is completely understood by any English speaker.

The question is what is the optimal number of letters that balances accuracy, comprehension and time people are willing to put to learn the system. An approximation that works is better than a perfection that no one uses.

Also, my last question was not about comparing IPA to SaypYu, which was discussed above, but about comparing SaypYu to pronunciation respelling and romanisation systems. For instance, would it not be better to write siti instead of sih-tih? Also, would it not be more accurate/easier to write Mubaarak instead of Mubarak or Mubārak for the name of the previous president of Egypt? Cosmopolitanism (talk) 07:00, 20 August 2013 (UTC)

Opposed, rather strongly. I fail to see the point of implementing Yet Another Transcription System™. We already have phonemic and phonetic IPA transcriptions for pronunciation sections, and various other official and semi-official transcription systems for each individual language. SaypYu is a solution in search of a problem, which also creates additional problems in the process. No thank you. ‑‑ Eiríkr Útlendi │ Tala við mig 19:41, 20 August 2013 (UTC)
Explore further: I am predisposed by nature in favour of phonetic spelling as a way of making things more accessible for Wikipedia readers. But we cannot judge a system without at least giving it some preliminary testing; otherwise we are just talking theory and generalisations. What I suggest is that Cosmopolitanism provide a list of, let's say, 100 words (a variety of simple and complicated words: English words, vocabulary of Latin and of non-Latin origin, and some personal names of world personalities) in the suggested SaypYu format. I have spent some time checking at least 40-50 words on the SaypYu website, and I found the suggestions pragmatic, quite easy and straightforward, and reasonably accurate. Werldwayd (talk) 21:29, 20 August 2013 (UTC)

Please don't test it here though. Mglovesfun (talk) 22:08, 20 August 2013 (UTC)

- after e/c... Yet, the proposal here is to use SaypYu for much more than just English and Latin -- the proposal is (presumably) to use it in all cases where the IPA would be used.
As such, any serious investigation of the practical use of this additional transcription scheme must look at a broad array of terms in languages with a broad array of phonetic features.

It appears that SaypYu fails. As Mglovesfun mentions, city doesn't have a good transcription in SaypYu. Moreover, the system is ambiguous -- even assuming (reasonably, I think) that the EN Wiktionary is for English readers, one must acknowledge that different readers of English have different understandings of what each individual glyph means, phonetically speaking. A native French speaker's ⟨i⟩ at the phonetic level is not a native Japanese speaker's ⟨i⟩. Beyond that, what of digraphs? Is SaypYu ⟨th⟩ equivalent to IPA [th], or to [θ]? Is SaypYu ⟨sh⟩ equivalent to IPA [sh], or to [ʃ], or to [ɕ]? What of sounds not found in English, like tones in Chinese (cf. 你好 (nǐhǎo, more closely transcribed as “níhǎo”), notably distinct from 你號／你号 (nǐ hào, “your horn”) or 妮嗥 (nī háo, “the maid roars”)), or the nasalised vowels or ejectives in Navajo (cf. the long low-tone nasal a in tábąąh, or the contrastively ejective k in bikʼé (“in exchange”) vs. bikééʼ (“after, behind”), or the phonemically important distinction between ałʼąą (“separate”) and áłah (“together”))? SaypYu has no facility for tones, no facility for nasals (they suggest just adding an n afterwards, but this is problematic to say the least), no facility for ejectives...
If SaypYu is not intended for phonetic transcription, what is it for?
- If it's for phonemic transcription, we already use a subset of the IPA for that. Why require users to learn both systems?
Because SaypYu could be grasped almost instantaneously by anyone who knows English or a Latin-based language. Cosmopolitanism (talk) 11:08, 21 August 2013 (UTC)
Yet that's also the inherent bias, isn't it -- English readers? What reason would a reader have for guessing that SaypYu ⟨th⟩ == IPA [θ], or that SaypYu ⟨sh⟩ == IPA [ʃ]? These digraphs suggest that this transcription system was invented by native-English readers. As demonstrated below, these digraphs, and indeed the system as a whole, also entail unavoidable ambiguities.

That English bias is also apparent in the lacunae of this transcription system. How would you transcribe the voiceless alveolar lateral fricative [ɬ]? This sound doesn't exist in English; in fact, the only European languages that contain this sound (that I'm aware of) are Welsh and Czech. It's also found in Zulu, among other African languages, and Navajo and multiple other North American languages. I've seen this phoneme latinized as ⟨hl⟩ for Czech, but doing so for all languages causes other problems -- Navajo often has consonant clusters like ⟨tł⟩ ([tɬ]), but transcribing this as ⟨thl⟩ simply doesn't work, due to the immediate ambiguities: is this [tɬ] or [θl], or again [thl]?

My complaint about SaypYu isn't that it's "loose" or "inaccurate" -- my complaint is that it is 1) incomplete to the point of unusability, as it cannot transcribe terms in multiple languages; 2) ambiguous to the point of unusability, as described by multiple people in this thread; 3) inherently biased to native-English readers. I'm sure I could come up with more serious concerns, but those are my big three at the moment.

SaypYu just isn't usable. ‑‑ Eiríkr Útlendi │ Tala við mig 17:58, 21 August 2013 (UTC)

Yes, SaypYu takes a few things from English and the Roman alphabet because of the importance of English and because the Roman alphabet is the most widely known one. However, it takes things from IPA as well: for example, the ee in Street is replaced by ii => Striit.

Compared to the IPA, I agree that SaypYu misses a lot of information - that is by design :) Compared to mainstream transcription systems, it does miracles even with fewer letters than 26 (I have just added an example below re former president Mubarak). Cosmopolitanism (talk) 20:10, 21 August 2013 (UTC)
- If it's for general transcription of non-Latin-alphabet languages, we already have various transcription systems inherited from scholars using the Latin alphabet to describe these languages. Why require users to learn another system, especially when that system isn't widely used?
Because the various transcription systems to which you are referring are not consistent with each other. Similar phonemes in different languages are transcribed very differently, which requires users to learn each system separately, as they cannot apply their knowledge of one system to another. Cosmopolitanism (talk) 11:13, 21 August 2013 (UTC)
.... If it's consistency you want, that's exactly what the IPA is for. And, again, as demonstrated at various points in this thread, SaypYu itself is so inherently ambiguous that users would still have to learn the caveats and workarounds for each individual language. I.e., it's no better than the existing transcription systems used historically for each language. So what, again, would the advantage be? This looks like a lot of work, for no appreciable benefit. ‑‑ Eiríkr Útlendi │ Tala við mig 17:58, 21 August 2013 (UTC)

The aim is to have an international standard to be used by everyone. It is kind of a cross between IPA and the different transcription systems. Just clarified what I mean by adding an example below re the name of Mubarak. Cosmopolitanism (talk) 20:13, 21 August 2013 (UTC)
I haven't seen any compelling reason even to explore further. @Werldwayd, can you give any additional reasons, factoring in that the EN WT is intended to contain terms in all languages? ‑‑ Eiríkr Útlendi │ Tala við mig 22:38, 20 August 2013 (UTC)

Can this scheme be automatically generated from IPA, similar to what we do with SAMPA? DTLHS (talk) 22:10, 20 August 2013 (UTC)

Yes, SaypYu can be automatically generated from IPA or from SAMPA. Cosmopolitanism (talk) 11:15, 21 August 2013 (UTC)
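To make DTLHS's question concrete, here is a minimal sketch of such an automatic conversion. Wiktionary modules are written in Lua, as mentioned above; Python is used here only for illustration. It performs greedy longest-match substitution from an IPA string into a SaypYu-style respelling. The table `IPA_TO_SAYPYU` and the function `ipa_to_saypyu` are hypothetical. The table is a small, partial set of correspondences taken from examples in this thread (ʃ → sh, θ → th, ð → dh, tʃ → tsh, doubled letters for long vowels, * for schwa), not an official SaypYu chart.

```python
# Hypothetical, partial IPA -> SaypYu-style table (an assumption assembled
# from examples in this discussion, not an official SaypYu chart).
IPA_TO_SAYPYU = {
    "tʃ": "tsh",
    "ʃ": "sh",
    "θ": "th",
    "ð": "dh",
    "ɑː": "aa",  # long vowels are written by doubling the letter
    "iː": "ii",
    "uː": "uu",
    "oː": "oo",
    "ɑ": "a",
    "ɪ": "i",    # no separate letter for /ɪ/, so it merges with /i/
    "ə": "*",    # schwa, typed as an asterisk
    "ɾ": "r",
}

# Marks this sketch simply discards (stress, syllable breaks, delimiters).
DROPPED = {"ˈ", "ˌ", ".", "/", "[", "]"}

MAX_KEY_LEN = max(len(key) for key in IPA_TO_SAYPYU)


def ipa_to_saypyu(ipa: str) -> str:
    """Convert an IPA string by greedy longest-match substitution."""
    out = []
    i = 0
    while i < len(ipa):
        # Try the longest chunk first, so "ɑː" (-> aa) wins over "ɑ" (-> a).
        for size in range(MAX_KEY_LEN, 0, -1):
            chunk = ipa[i:i + size]
            if chunk in IPA_TO_SAYPYU:
                out.append(IPA_TO_SAYPYU[chunk])
                i += len(chunk)
                break
        else:
            # No table entry: drop marks and pass everything else through
            # unchanged (plain letters such as m, b, s, t, and any IPA
            # symbol the table lacks).
            if ipa[i] not in DROPPED:
                out.append(ipa[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ["/muˈbɑːɾɑk/", "ˈsɪti", "ʃ", "sh", "iː", "i.ɪ"]:
        print(word, "->", ipa_to_saypyu(word))
```

On the thread's own examples, `ipa_to_saypyu("/muˈbɑːɾɑk/")` gives `mubaarak` and `ipa_to_saypyu("ˈsɪti")` gives `siti`. The same sketch also shows why such a mapping can only run in one direction. Distinct inputs such as `ʃ` and `sh`, or `iː` and `i.ɪ`, produce identical output, so the original IPA cannot be recovered from the respelling.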

Oppose, rather strongly as well. This system would fail the simplest tests. Even Lonely Planet (who use the silliest imaginable transliteration system) can't do without additional symbols or numbers; 26 letters, all in upper case, is not enough. No need even to try Arabic (emphatic and guttural consonants), Hindi (retroflex consonants) or Mandarin (qi/chi, ji/zhi, xi/shi and tones), etc. It will fail on European languages like French or Polish. E.g. describing the Polish pairs ś/sz, ź/ż, ć/cz as "SH", "ZH" and "CH" is a big NO. --Anatoli (обсудить/вклад) 00:05, 21 August 2013 (UTC)

Oppose. I wouldn't dismiss IPA so quickly; IPA may be difficult for the average user now, but it will gain currency as this site becomes the go-to dictionary for the world. The accessibility that wiki projects provide (the discussion directly above is particularly apropos) will make it much easier for the average person to learn to read IPA. Learning 100 characters is not that hard in itself. Kids in some countries do it every month. The problem was that the barrier has been too high. It's going down fast, so problem solved. --Haplology (talk) 05:13, 21 August 2013 (UTC)

There is some misunderstanding. Thank you all for the feedback. Most of the opposing comments above focus on the argument that SaypYu is not as accurate as the IPA. With all due respect, this argument is not relevant to this proposal. Obviously SaypYu is not as accurate as the IPA - it has 24 letters only! SaypYu is designed for those who are familiar with the Roman alphabet and are not familiar with the IPA or various transcription systems (probably more than 99% of the speakers of English and other European languages). @Anatoli: you are referring to emphatic and guttural consonants, retroflex consonants, tones, etc. How many people know what these terms mean, let alone know the IPA representation of these phonemes or how to replicate them?

I would love it if everyone in the world learned the IPA. Even with initiatives such as the Wiki projects, of which I am a massive fan, I believe that mastering the IPA requires proper university-level training in linguistics. It is not realistic to expect people to do that just to be able to pronounce foreign words more accurately. Cosmopolitanism (talk) 11:28, 21 August 2013 (UTC)

How about it's a really bad system therefore we want no part of it, does that help? Mglovesfun (talk) 11:32, 21 August 2013 (UTC)

@Mglovesfun The purpose of SaypYu is to approximate the phonemes of foreign languages using a simple system. Many Wikipedia pages already do just that in an ad hoc fashion. For example, the Pinyin or the Polish_alphabet pages on Wikipedia show the approximate English pronunciation of various phonemes. The purpose of SaypYu is to simplify this process by spelling the words directly and consistently using a simplified alphabet, so readers don’t have to review each letter in the table individually. Also, there is the ad hoc pronunciation respelling which is widely used on Wikipedia. Cosmopolitanism (talk) 11:43, 21 August 2013 (UTC)

I'm not denying any of that. Mglovesfun (talk) 11:44, 21 August 2013 (UTC)

@Mglovesfun: Thanks for this. Werldwayd suggested providing a list of words to explore further. You suggested not doing it on this Beer parlour page. Would you and Wiki be ok with us setting up a separate page on Wiki to provide examples of words from English and other languages? This could also be used for discussion purposes. SaypYu is not set in stone and, like the IPA, could evolve over time to accommodate more phonemes and better accuracy. We are always trying to balance the level of accuracy that could be achieved with the level of effort that people are willing to put into learning the pronunciation of foreign words. We could hugely benefit from your expertise and the expertise of other participants of Wiki. Cosmopolitanism (talk) 11:58, 21 August 2013 (UTC)

It's not up to me (not only to me, I mean), but I meant not on Wiktionary at all, rather than just not on the Beer Parlour. We don't want an irrelevant personal project that can only harm Wiktionary in the long run, by attempting to replace well-established systems that work with an unestablished system that doesn't work. Am I still being too subtle? Excessive subtlety is not usually something I'm associated with. Mglovesfun (talk) 12:04, 21 August 2013 (UTC)

I agree that the system is not established, but I think it is unjustified to say that it does not work. To properly test such a system you need to ask people who don't know the IPA (you already know it) or a foreign language to pronounce words using this system and then you can see if their pronunciation is more accurate. We have already done these tests.

Being a personal project doesn't mean it is bad. The question that should be asked is not whether it is a personal project, but whether it is useful to Wiki users. You have already conceded above that the approximation of phonemes of foreign languages is ad hoc and inconsistent in Wiki. Cosmopolitanism (talk) 12:34, 21 August 2013 (UTC)

See Wikipedia:Wikipedia is not for things made up one day. This is clearly not an established way of transcribing pronunciations and it can't be used here on Wiktionary. That's all. Dakdada (talk) 16:12, 21 August 2013 (UTC)

Thanks, Dakdada, for pointing me to this link. I hope this post was not inappropriate; info at wiktionary suggested bringing the proposal here after I had sent them an initial email. Cosmopolitanism (talk) 17:48, 21 August 2013 (UTC)

Here are examples of where it fails:

asshole (impossible to know the SH is /sh/ instead of /ʃ/)
cathouse (impossible to know the TH is /th/ instead of /θ/)
ad-hoc (impossible to know the DH is /dh/ instead of /ð/)
parrying (impossible to know the II is /i.ɪ/ instead of /iː/)
Portuguese álcool (if the vowel-doubling system for indicating stress that you suggested above is used, it’s impossible to know the OO is /o.o/ instead of /ˈo/)
Portuguese olho (noun) and olho (verb) (impossible to know which has /o/ and which has /ɔ/)
Portuguese pelo (noun) and pelo (verb) (impossible to know which has /e/ and which has /ɛ/)
Portuguese matamos and matámos (impossible to know which has /a/ and which has /ɐ/)
No symbol for /ɲ/, /ʎ/, /ŋ/, /ʂ/, /ɣ/, /ʁ/, /q/, /ħ/, /ʕ/, /ʔ/, rounded front vowels, tones, palatalisation, nasalisation, pharyngealisation, retroflex consonants, pitch accent. Fails for every major language. — Ungoliant (Falai) 16:27, 21 August 2013 (UTC)

Ungoliant: You are absolutely right; the purpose of SaypYu is not to address the above points - that is the purpose of the IPA. SaypYu is not a competitor to the IPA, but a way to make it easier for people to better pronounce foreign words with the least amount of learning. We removed C, Q and X because they have some ambiguity across languages, but we reluctantly had to add a letter for the schwa. Adding more letters/diacritics would make the system more accurate, but would make people less likely to learn it. Unfortunately, it is impossible to have it both ways: simple and accurate at the same time. Cosmopolitanism (talk) 18:05, 21 August 2013 (UTC)

Well, this is yet another respelling system (YARS). Why use this one? No one in the world would automatically anticipate SaypYu’s meaning of the multigraphs sh, th, dh, tsh, etc., and aa, oo, ee, ii, etc. It could be an alternative to “our” “enPR” (cough AHD) but I don’t see any advantages or disadvantages of the one over the other, nor any good reason to add a second respelling system.

If we choose to add another transcription, it should probably be some variation of the sound-alike English (dictionary = \DICK·shun·air·ee\) that so many native anglophones relate to.

Ultimately, a widget could enable a whole menu of transcription systems, according to a reader’s choice. But any of these could probably be automatically derived from IPA. —Michael Z. 2013-08-21 16:45 z

Re the multigraphs: the aa is a long version of a, oo is a long version of o, etc. Also, the pairs s and sh, t and th, d and dh are linked to each other phonetically. Tsh is the application of IPA to SaypYu. Cosmopolitanism (talk) 18:38, 21 August 2013 (UTC)

Also, if we’re avoiding transcription systems used by 0% of our readers, let’s just kill X-SAMPA. —Michael Z. 2013-08-21 16:48 z

This is an area where innovation seems unnecessary and probably confusing. The "sound-alike" system is different in kind from any of the three others we support in that it has almost no learning curve. That is, it is more likely to be useful than yet another system that has to be learned before it is useful. Both enPR/AHD and X-SAMPA seem to have relatively small user bases. IPA is more or less standard, though it is far from a universal one (in the sense of one symbol being the identical sound in all languages), as I understand it. DCDuring TALK 17:17, 21 August 2013 (UTC)

Michael Z. and DCDuring: Thanks for your comments. I completely agree that the “sound-alike” system is different in kind from the other systems. However, it is relevant for English speakers only and it is arbitrary. SaypYu was developed as an international and consistent version of the “sound-alike” system with the possibility of being used one day as a mainstream system for transcribing words across foreign languages. That cannot be said about other phonetic systems because they have a far larger number of letters than the SaypYu alphabet.

To clarify what I mean: in the news today there is reference to the former president of Egypt: Mubarak (as spelled in English media), مبارك (in Arabic), /moˈbɑːɾɑk/ (IPA based on Wiki page. Btw, I think the o should be u), Mubārak (in official transliteration), and something like \moo•bar•ack\ (in “sound-alike”). The IPA is necessary, but I am not sure about the rest. We are suggesting replacing the rest with Mubaarak because it uses letters with which everyone is familiar and can be easily typed on a standard keyboard, yet it is more accurate phonetically than the media's use of Mubarak.

The IPA, other phonetic systems, transliteration systems with diacritics that are not consistent across different languages, and “sound-alike” systems can never become mainstream international and consistent transcription systems. Who better than Wiki to support such an international initiative to help bridge the barriers between different languages? We are no longer proposing that Wiktionary adopt SaypYu. Instead, we are suggesting that Wiki, using its fantastic community, develop an international and consistent spelling standard that could one day be used as the only mainstream system for writing foreign words and names in the media across various languages. With initiatives such as Simple English Wikipedia, simplifying and democratizing foreign language pronunciation does not seem such a far-fetched objective for Wiki. Cosmopolitanism (talk) 19:56, 21 August 2013 (UTC)

I think it's time to stop talking about this. Mglovesfun (talk) 21:11, 21 August 2013 (UTC)

A Wiki projects non-IPA simplified phonemic representation system, for English words only, for Wikipedia and Wiktionary, commissioned by Wiki Headquarters from professional lexicologists and implemented as the official standard Wiki policy. Yes. Years overdue. Typical anarchy/dictatorship of Wikiness, that some random 3 editors one day decided dogmatically on IPA as the sole system, and it's now written in stone (along with a gruesome mishmash of UK/US allophonic representations like [oʊ] for /oː/).

But that is not the SaypYu system, which isn't any more comprehensible to Americans than IPA, and is also being vigorously "marketed" by its advocates as a spelling reform for English (and allegedly all the languages of the world... ?), and so has massive non-neutral social-political connotations.

HanEditor (talk) 06:57, 23 August 2013 (UTC)

We are already using a non-IPA simplified phonemic representation system, for English words only, from professional lexicologists. We call it wt:enPR. It’s about as good as another score of respelling systems. It happens to be compatible with the system in a very influential professional English dictionary, the AHD. What on earth would be gained by spending thousands of dollars on a novel, non-standard system?

We are also using an international standard that is compatible with hundreds of other English, foreign, and multilingual dictionaries, the IPA. One can quibble about our implementation, but there would be something dreadfully wrong if we were not using it.

I think our pronunciation practices are not terrible. —Michael Z. 2013-08-23 18:53 z

Vulgar Latin

When Vulgar Latin terms are used in etymologies, should they be red-linked ({{term/t|la|*battālia}}) or greyed-out ({{term/t|la||*battālia}})? Red-linking them is okay if somebody is at some point going to create an appendix of Vulgar Latin terms; however, if that's never going to happen (due to not enough reconstructive evidence or whatever), having them greyed will be better than having lots of permanent redlinks. Is there any consensus on this? WT:About Vulgar Latin is not helpful. Hyarmendacil (talk) 00:20, 20 August 2013 (UTC)

If there’s not enough reconstructive evidence, no reconstruction should be displayed at all. — Ungoliant (Falai) 00:28, 20 August 2013 (UTC)

Ok, but I'm talking about entries which already have Vulgar Latin terms in them. They aren't cited, of course, and I can't make the judgement myself. The reason I am asking is that a lot of the current Vulgar Latin {{term}}s are causing cleanup/Latin extended messages through not having lang=la, which I'm clearing up. In the process of doing so, though, I've noticed that there is no clear rule for redlinking or greying these terms - there are a lot of both. So I'm asking whether there is a consensus on this. Hyarmendacil (talk) 10:22, 20 August 2013 (UTC)

Citation Needed Template

Don't we need a "citation needed" (or "reference needed") template?

But someone deleted this years ago: http://en.wiktionary.org/w/index.php?title=Template:citation_needed&action=edit&redlink=1

HanEditor (talk) 09:53, 20 August 2013 (UTC)

This is a bit sticky; see WT:CITE. Since our Criteria for Inclusion (WT:CFI) are based on usage, not on authoritative references, a citation is different here than in Wikipedia. Etymologies are the only quasi-encyclopedic part of the entries that would need to be referenced like Wikipedia. If we had a citation needed template, people would be adding it to definitions where it doesn't belong. The closest we have is {{rfv-etymology}}, which tags an entry for discussion in the Etymology Scriptorium (WT:ES). Chuck Entz (talk) 14:41, 20 August 2013 (UTC)

We do have {{unreferenced}}. —Angr 15:03, 20 August 2013 (UTC)

Unreferenced is probably overused. How many entries using {{unreferenced}} actually need citations showing usage, rather than secondary sources? Such secondary sources are, however, useful in particular for etymology, but also for pronunciation and usage notes. Mglovesfun (talk) 15:36, 20 August 2013 (UTC)

I've added documentation for {{rfv-etymology}} as well as changed it to link to WT:REF#Etymologies. I wasn't aware that that template existed before this discussion. --Ivan Štambuk (talk) 09:45, 21 August 2013 (UTC)

My apologies for using the term "citation" — I mean references for facts, not illustrations of words in use. {{unreferenced}} will do nicely. I'm not sure {{rfv-etymology}} will convey to average readers that the material tagged is to be regarded with suspicion. Thanks HanEditor (talk) 06:32, 23 August 2013 (UTC)

If an etymology is suspicious and unreferenced, it should simply be removed or taken to the talk page. The average reader has no business verifying etymologies. These templates simply mean "to whoever added this: please provide a reference". The v in rfv is a bit misleading because it has nothing to do with the ordinary RFV process; perhaps the template should be renamed to e.g. {{ref-etymology}}? --Ivan Štambuk (talk) 09:42, 23 August 2013 (UTC)

HTTPS for users with an account

Greetings. Starting on August 21 (tomorrow), all users with an account will be using HTTPS to access Wikimedia sites. HTTPS brings better security and improves your privacy. More information is available at m:HTTPS.

If HTTPS causes problems for you, tell us on bugzilla, on IRC (in the #wikimedia-operations channel) or on meta. If you can't use the other methods, you can also send an e-mail to https@wikimedia.org.

Greg Grossmeier (via the Global message delivery system). 18:59, 20 August 2013 (UTC)

I thought I was going to get an apology for this message being in English... --Z 19:30, 20 August 2013 (UTC)

(With no apology for this being in English, hahah!)

As I just updated on the meta page, we've delayed this rollout by one week. The change will now take place on August 28 at 1pm Pacific Time. Please take a look at gadgets or bots you maintain to make sure they'll continue to work; more information at meta. Sharihareswara (WMF) (talk) 21:25, 21 August 2013 (UTC)

Gigadictionary?

"Our dream is to give you definitions for all words in all languages". Copycat, much? bd2412 T 21:31, 22 August 2013 (UTC)

Well, it has similar aims, but is much easier to actually use. I added the Italian adjective copiabile and the English one copiable without having to know any exotic formatting. SemperBlotto (talk) 07:18, 23 August 2013 (UTC)
- Hmmm. It has no place for etymologies or pronunciation that I can see. I added "barbarity" (copying our own brief and noncopyrightable definition over) and saw no licensing information. In fact, I can't find licensing information on the website at all. bd2412 T 02:35, 27 August 2013 (UTC)
  - The site's much better than ours. Time to accept our fate and all move over there. See you all soon! WF
    - I think it's shit. It looks terrible, has no place for etymologies or pronunciations and I haven't been able to find a single word after searching a few very common English terms that actually has a proper definition. BigDom (t • c) 09:02, 2 September 2013 (UTC)
I don’t think they will ever replace Wiktionary. At least not with their current scheme of basing content around images and translations. — Ungoliant (Falai) 11:12, 2 September 2013 (UTC)

Section Headers: replace "Alternative Forms" with "Other Forms"

To avoid confusion with my original proposal to change section headers from "Alternative Forms" to "Variant Forms", which I have abandoned:

http://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2013/August#Section_Headers:_.22Alternative_Forms.22_vs._.22Variant_Forms.22

I am here starting a topic about changing section headers from "Alternative Forms" to "Other Forms".

The problem with "Alternative" is that it means usable at will:

Oxford Dictionary of English (3rd Ed.): "(of one or more things) available as another possibility or choice"

But the content in the "Alternative" sections is often variant forms which must, or may not, be used in certain cases, or which are rare and not really a "good alternative" to choose. A Chinese side radical character like 氵 is absolutely not an "alternative" to the main character form 水. The word cannot is truly an alternative to can not, but you're risking people not getting jobs and university admission if you advise them that using ain't is an "alternative" to isn't.

The solution is the utterly neutral term "Other Forms".

HanEditor (talk) 07:22, 23 August 2013 (UTC)

Pretty much all of the alternative forms have qualifiers ({{qualifier}}), such as regional, obsolete, archaic, rare, colloquial etc., unless they're truly interchangeable forms or such information is somehow obvious from the spelling. --Ivan Štambuk (talk) 18:17, 23 August 2013 (UTC)

Indeed. Mglovesfun (talk) 11:36, 24 August 2013 (UTC)

Other forms has the advantages that it is a simple Anglo-Saxon word, and lacks any implications which don’t apply here. Alternative forms has more syllables and makes us sound oh so much more academicky. Changing the heading would be an improvement. —Michael Z. 2013-09-09 18:18 z

(Modern) Church Slavonic?

I noticed that Ivan has been adding some "Church Slavonic" descendants to Proto-Slavic entries, but he has been using "cu" (Old Church Slavonic) for the links because we don't recognise or use modern Church Slavonic. Should this change? Church Slavonic is somewhat like Ecclesiastical Latin, but we don't treat that like a distinct language either, and unlike Latin, Church Slavonic was never spoken by anyone and doesn't represent anything really. It's more or less just a vernacularised Old Church Slavonic. Imagine that someone were to write Italian but replace the words with their Latin spellings and old grammatical constructs that nobody understands; that's kind of what Church Slavonic is. It's not even really one language, it's used differently in different places. According to Wikipedia, modern Russian CS is just pronounced as Russian with a somewhat archaic pronunciation. So should we admit it as a separate language? —CodeCat 15:57, 23 August 2013 (UTC)

CS recensions are different languages and should be clearly separated. It is not "vernacularized" OCS: they have different scripts/orthographies, vocabularies, traditions, grammars and dictionaries, and are nothing like Ecclesiastical Latin. --Ivan Štambuk (talk) 16:03, 23 August 2013 (UTC)

m:Requests for comment/Global ban for Ottava Rima

Per the m:Global bans global policy, you are informed of the discussion above. Please comment there and feel free to appropriately distribute more widely in prominent community venues in order to «Inform the community on all wikis where the user has edited». Nemo 10:11, 24 August 2013 (UTC)

FYI: Ottava Rima (talk • contribs) has two edits in English Wiktionary. --Dan Polansky (talk) 10:41, 24 August 2013 (UTC)
Of those two edits, both were in 2010, and both were constructive (one reverting vandalism, the other adding a valid sense to a term). I see no reason for any ban on this user on this site. ‑‑ Eiríkr Útlendi │ Tala við mig 17:36, 24 August 2013 (UTC)

A small question regarding the format of links

The display of links has been a little inconsistent, but this has never really been too much of a problem. However, with Lua helping to homogenize things, it becomes a question. It specifically regards the order of the different annotations (gender, transliteration, gloss etc.) that follow a link. In translations and in the original version of {{l}}, the transliteration was displayed first in brackets, then the gender, and then the gloss in another set of brackets. Translations normally don't show glosses (the templates never had a parameter for it, and still don't) so it would look ok. But for {{l}}, when the gender was left out (which was most of the time), then there would just be two pairs of brackets, which looked a little odd. {{head}}, similarly, displays transliteration, then gender, then the inflections, so it too shows two pairs of brackets right after each other if there is no gender to show. {{term}}, meanwhile, never supported genders until recently, so it would merge the brackets into one. {{l}} and {{t}} now work this way as well, but as they also support genders, the end result is gender followed by a single pair of brackets which contains transliteration and glosses. I'm not sure if this is the best way, because I think transliteration should come before gender, but what should be done with the double bracket pairs? —CodeCat 11:54, 24 August 2013 (UTC)

I don't suppose you have any examples? I'm having trouble imagining what this looks like (though that might be the lack of coffee this morning :). ‑‑ Eiríkr Útlendi │ Tala við mig 17:37, 24 August 2013 (UTC)
- {{l|ru|гора||mountain|tr=gora|g=f}} > гора f (gora, “mountain”)
- {{head|ru|noun|head=гора|tr=gora|g=f|plural|горы}} > гора • (gora) f (plural горы)
- —CodeCat 18:11, 24 August 2013 (UTC)
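To make the merged layout that {{l}} and {{t}} now use concrete, here is a minimal Python sketch (not the actual Lua in Module:links) of the order described above: the term, then the gender, then one pair of brackets holding transliteration and gloss. The function name and parameters are hypothetical.

```python
from typing import Optional


def format_link(term: str, tr: Optional[str] = None,
                gender: Optional[str] = None, gloss: Optional[str] = None) -> str:
    """Hypothetical renderer: term, gender, then one merged bracket pair."""
    parts = [term]
    if gender:
        parts.append(gender)
    # Transliteration and gloss share a single pair of brackets,
    # so a missing gender never leaves two bracket pairs side by side.
    annotations = []
    if tr:
        annotations.append(tr)
    if gloss:
        annotations.append(f"“{gloss}”")
    if annotations:
        parts.append("(" + ", ".join(annotations) + ")")
    return " ".join(parts)


print(format_link("гора", tr="gora", gender="f", gloss="mountain"))
```

With the {{l}} example above this prints the same string, "гора f (gora, “mountain”)". Putting the transliteration before the gender, as suggested, would mean pulling it back out of the merged brackets.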

phrases, multiple word expressions, idioms as entries in Wiktionary

I wonder where you feel the reasonable limit is for including multiple-word expressions in Wiktionary. My question was triggered by the "sphere of influence" entry, which I consider more appropriate for Wikipedia than for a dictionary-type database. Thanks for your opinion and maybe a reference to an existing rule, Peter

The limiting factor isn't number of words but idiomaticity: does the expression have a meaning that can't be understood from the sum of its parts. If you look at the entries for sphere, of, and influence, you would be hard put to arrive at the actual meaning of the phrase (see WT:CFI). There's a bit of a subjective element to it, so we're constantly debating at WT:RFD whether various phrases are sum-of-parts (SOP in Wiktionary jargon) or not. The sum-of-parts criterion only applies to multi-word entries: if it's a single word, we generally leave it alone, even if the meaning is predictable from its component prefixes, suffixes, etc. Chuck Entz (talk) 15:43, 25 August 2013 (UTC)

I think the answer (i.e. "where") is W:CFI#Idiomaticity. Mglovesfun (talk) 10:17, 26 August 2013 (UTC)

Remove macronless forms from Latin links

With the advent of Lua, I believe macronless forms are no longer needed inside Latin links. To demonstrate, {{l|la|ago|agō}} becomes {{l|la|agō}} but links in exactly the same way. Is there any loss of functionality at all? Removing the macronless forms by bot would be, I think, pretty simple. Which is what I'm proposing. Mglovesfun (talk) 11:39, 26 August 2013 (UTC)
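The bot pass proposed above could look roughly like this Python sketch. It assumes the Latin entry name is simply the display form with combining macrons removed, and it only rewrites the plain two-argument form {{l|la|entry|display}}; links with extra parameters, #section anchors or nested markup are deliberately left alone. The regex and function names are hypothetical, not an existing bot.

```python
import re
import unicodedata

COMBINING_MACRON = "\u0304"

# Only {{l|la|<entry>|<display>}} with no named params or nested markup (assumption).
LATIN_LINK = re.compile(r"\{\{l\|la\|([^|{}\[\]=]+)\|([^|{}\[\]=]+)\}\}")


def strip_macrons(text: str) -> str:
    """Assumed entry-name rule: decompose, drop combining macrons, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(COMBINING_MACRON, ""))


def drop_redundant_entry(match: re.Match) -> str:
    entry, display = match.group(1), match.group(2)
    # The explicit entry adds nothing if it is just the display form minus macrons.
    if strip_macrons(display) == entry:
        return "{{l|la|" + display + "}}"
    return match.group(0)  # leave anything else for manual review


def clean_wikitext(text: str) -> str:
    return LATIN_LINK.sub(drop_redundant_entry, text)
```

For example, clean_wikitext("{{l|la|ago|agō}}") yields "{{l|la|agō}}", while "{{l|la|ago#Latin|agō}}" is returned unchanged because the anchor breaks the comparison.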

We could make a modification to Module:links so that, if both parameters are given, it compares the entry name it generated from the display form with the one that was given. If they are the same, it places the entry in Category:Link alt form tracking/redundant. So it would work the same as Category:Sort key tracking, although I'm not sure if we'd need a corresponding Category:Link alt form tracking/needed. I don't know if it will catch all cases, either. For example, it may not catch embedded wikilinks, which can take many forms; sometimes people have added {{l}} into the head= parameter which is not needed at all, others have added #section links to it. Of course, if someone did that, then the above no longer works because "ago#Latin" does not match "ago". But we could probably also make it add another category to track down such links, so they'll be found and fixed in any case. —CodeCat 11:50, 26 August 2013 (UTC)
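The tracking idea above can be sketched as follows, in Python rather than the module's actual Lua. The macron-only entry-name rule and the second category name, "Link alt form tracking/embedded", are assumptions for illustration; only the "redundant" category is named in the proposal.

```python
import unicodedata
from typing import List, Optional

COMBINING_MACRON = "\u0304"


def make_entry_name(display: str) -> str:
    """Assumed Latin rule: drop combining macrons, then recompose."""
    decomposed = unicodedata.normalize("NFD", display)
    return unicodedata.normalize("NFC", decomposed.replace(COMBINING_MACRON, ""))


def tracking_categories(entry: Optional[str], display: Optional[str]) -> List[str]:
    """Tracking categories for a link that was given both an entry and a display form."""
    if not entry or not display:
        return []  # only one form given: nothing to compare
    if "#" in entry or "[[" in entry or "{{" in entry:
        # A section anchor or embedded markup defeats the plain comparison,
        # so flag it separately (hypothetical category name).
        return ["Link alt form tracking/embedded"]
    if make_entry_name(display) == entry:
        return ["Link alt form tracking/redundant"]
    return []
```

Checking for anchors and markup first means "ago#Latin" lands in its own bucket instead of silently failing the comparison.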

I've been working on converting these now, but there are about half a million (!) of these for Latin so it will take some time. —CodeCat 12:09, 31 August 2013 (UTC)

-yse/-yze

I linked analyze to analyse and paralyze to paralyse, as we already do for verbs with the -ise/-ize suffixes. I chose that direction because the spelling -yse (don't confuse it with -ise/-ize; they are different things with different etymologies!) is the one used by most English-speaking countries (52 of 54) and by most international organisations, e.g. the United Nations. (w:Oxford English) Also, the translations at analyze already linked to the entry for analyse. Is everyone OK with that decision? --2.243.231.72 22:15, 26 August 2013 (UTC)

"the one used by most English-speaking countries (52 of 54)" is a numbers game and it’s not a very good argument. Why not choose the one used by the most native speakers of English? What you have done is nothing but rationalization. You can rationalize anything you like. I assume that, since you rationalized in the direction of British English, you are a speaker of British English. —Stephen ^(Talk) 23:03, 26 August 2013 (UTC)

I see no reason to break from the "earliest entry" rule: whichever entry in a given pair was created as an entry first is the lemma. In the case of analyze/analyse, that means analyze is the lemma, while paralyze is the lemma of the paralyze/paralyse pair. (Paralyse was created four minutes earlier, but by the same contributor as, and as a redirect to, paralyze.) There will be other pairs in which the -yse spellings will become the lemmata.

Only the fact that -ize is accepted by all major varieties of English led various users to suggest that -ize entries should always be lemmata, even when -ise entries predated them. - -sche (discuss) 00:04, 27 August 2013 (UTC)

American English is by far the most common variety of English (in terms of native speakers, and produced written/oral output as consumed by the rest of the world), and if anything, we should standardize on it. That British English spelling is standard in so many ex colonies, and yet fails to be the most common one, is a telling indication of past and future trends. --Ivan Štambuk (talk) 06:47, 27 August 2013 (UTC)

I'm the user who started this topic. Please give a source for your figures. Wiktionary is not the right place to judge what the world standard of English is, so we should go by what reality gives us. My figures are:

1. Google Search (an American company) results: analyse (187,000,000 results) vs. analyze (99,000,000 results)

2. The United Nations, a worldwide organisation, spells it analyse and paralyse

3. The USA and Canada are the only countries that have adopted the -yze spelling. All of the other 52 English-speaking countries spell it -yse.

Now you have to give us the sources for your figures. Spoken output does not matter here, but even many American-produced media use British voices, e.g. w:Diablo III or Harry Potter (by Warner Bros.) --Zinoural (talk) 09:50, 27 August 2013 (UTC)

The statistics are irrelevant because we already have the policy that the first created entry stays as the main entry. Equinox ◑ 09:51, 27 August 2013 (UTC)

Then we should follow this. I just disagreed with Ivan Štambuk's claim that U.S. English is the world standard. --Zinoural (talk) 10:11, 27 August 2013 (UTC)

Google gives different results from different locations. When I google analyze, I get 118,000,000. When I google analyse, the results are for English, Dutch, French, Latin, and Norwegian (as well as all its forms, including nouns and verbs, such as analysing, analyses, analysen, analyser, analysons, analyserait, and so on). The population of the UK is about the same as that of the two U.S. states of California and Texas. Ivan Štambuk’s statements have a strong tendency to be accurate. —Stephen ^(Talk) 12:19, 27 August 2013 (UTC)

...said by an American. The UK is not the only country using British English, nor is English the only language spoken in California and Texas. And on the U.S. version of Google you won't find all the results for other languages: I just tried it and found more results for analyse on Google FR than on Google COM, and likewise more for paralyse on German Google than on the U.S. version. Also, your assumption that analyse (1st and 3rd person singular of the French analyser) gets extra results from the other conjugated forms is wrong. I just typed "analysons" into Google and found only 512,000 results, so as you see, other conjugated forms aren't included when you search for a specific form. Finally, this discussion is not getting us anywhere, since there is a clear rule we have to follow. --Zinoural (talk) 12:53, 27 August 2013 (UTC)

If you discredit Stephen because he is American, then you'd have to discredit everyone because everyone uses some English spelling, right? Maybe we should ask people who don't write in English at all? —CodeCat 14:29, 27 August 2013 (UTC)

Plain Google search combines various languages as well as making various "assumptions" about your needs based on your search habits and similar. Much better reference is Google Ngram Viewer which specifically targets written English corpus as scanned by Google Books. Here are the results for analyse, analyze. --Ivan Štambuk (talk) 14:50, 27 August 2013 (UTC)

I don't really care about this argument one way or the other, but...

It's worth pointing out that 1) Google is a US company, and 2) Google has (AFAIK) been most active in its digitization efforts in the US, likely for very practical logistical reasons.

Given that we already do have a policy in place, I don't think statistics at this point are going to do anything useful. I already see several people in this thread spinning (i.e. not doing anything useful, and spending energy being upset that might be put to better purposes).

Shall we let this thread go? ‑‑ Eiríkr Útlendi │ Tala við mig 17:27, 27 August 2013 (UTC)

I oppose going by the number of countries. I furthermore oppose diff, which turned "analyze" into a mere alternative-spelling entry. I weakly support going by the number of language users. --Dan Polansky (talk) 17:56, 27 August 2013 (UTC)
For -ize vs. -ise in particular, I support that -ize entries be made the main entries rather than -ise entries. --Dan Polansky (talk) 18:09, 27 August 2013 (UTC)

Regarding "we already have the policy that the first created entry stays as the main entry": it's not a policy, is it? It's a norm that, if it was ever followed, isn't followed anymore and hasn't been followed since I started editing here about 4 years ago. Mglovesfun (talk) 20:52, 27 August 2013 (UTC)

Where is the guideline or policy? WT:NPOV only addresses the language variant used in our working copy, and not the choice of lemmas (and it is quite non-neutral in its use of “both US and UK standards,” which is broken in two or three ways). —Michael Z. 2013-08-30 18:11 z

Ugh, another page that hasn't been updated much since 2007. If you can improve the bit about "both US and UK standards", be my guest.

Following several other discussions, there was this discussion, following which I updated WT:AEN. It now says: "If a word is spelled differently in different standard varieties of English, the spelling (that is, the entry) which was created first is made the lemma; to avoid unmaintainable duplication of content, other spellings soft-redirect to it." - -sche (discuss) 19:59, 30 August 2013 (UTC)

I have reverted the change. I oppose the policy and do not see consensus. --Dan Polansky (talk) 20:09, 30 August 2013 (UTC)

...and I've undone your reversion. The page was updated to reflect current practice following several discussions, including the BP discussion I linked to. Your edit not only re-instated something that hasn't been current practice for a long time, but also rolled back several unrelated improvements to the grammar and wording on the page. - -sche (discuss) 01:19, 31 August 2013 (UTC)

I won't engage in an edit war at WT:AEN. As far as I am concerned, your edits at WT:AEN are not supported by consensus and are not policy. --Dan Polansky (talk) 07:06, 31 August 2013 (UTC)

I don't know of any such policy in Wiktionary. --Dan Polansky (talk) 18:35, 30 August 2013 (UTC)

It's not a policy, but I, for one, follow it and recommend it frequently, as established practice. Partly, it's part of common wiki courtesy: don't make major changes to someone else's work without a good reason. This is a debate for which both sides have good arguments, and neither can win decisively on the merits. There are problems when a term uses one form and a compound including the term uses the other, but on the whole, it's beneficial in preventing edit wars and endless debates. Since it's not an official policy, it has more flexibility in cases where a term is much more rare in areas with one of the spellings, or where there are other factors. Chuck Entz (talk) 20:12, 30 August 2013 (UTC)

I don't follow this would-be policy. If a spelling is much more common than its alternative, and if it is also fairly common in the U.K. per Google Ngram Viewer, I feel free to standardize on the most common spelling. --Dan Polansky (talk) 20:16, 30 August 2013 (UTC)

A case where we did not consider what entry was created first is at Talk:yogurt#RFM. --Dan Polansky (talk) 20:19, 30 August 2013 (UTC)

Since -sche (talk • contribs) has decided to have his way at WT:AEN, which I don't believe is supported by consensus, I have opposed his change at Wiktionary_talk:About_English#Entry_created_first_to_be_made_.22lemma.22. --Dan Polansky (talk) 07:13, 31 August 2013 (UTC)
Any principle or policy that is based on the direct personal preferences for or the relative commonness of any spelling among our current crop of participants or contributors is yet another step in making this project more inward focused and less focused on the population of users. I think our standard ought always to be based on facts of relative frequency or size of the populations that our host and its financial supporters believe we should be trying to reach. Further I believe that there may be good reason to differentiate by region between different spellings and cultural content in usage examples and in terms used in the definiens. It is no accident that many dictionaries have US editions that differ in content from the editions sold in the UK and that Macquarie is the standard in Australia. If we could support it, I would favor more differentiation by region. I would favor more effort to differentiate our content by region in hopes of getting more contributors to help us correct our defective content. DCDuring TALK 13:35, 31 August 2013 (UTC)
- Everyone agrees that a logical rule would be better than an arbitrary one. The problem is that there's no consensus on what that rule should be, nor is there ever likely to be one. Your concept is just another in a long line of reasonable suggestions that are met with other equally reasonable suggestions and go nowhere. All the compromises suggested so far have failed for practical reasons, so we're stuck with using an arbitrary, but pragmatic way to minimize time and goodwill wasted on edit wars and arguing. Chuck Entz (talk) 17:38, 31 August 2013 (UTC)
  - Do you think the problem is that we don't have enough JavaScript talent (i.e., not enough of the time of such talent as we have or not the skills) to improve the user interface or that our talent is unwilling? In any event, I see little benefit from this particular proposal, as opposed to a general effort to improve English entries, especially of fundamental words, so that we might match the quality of at least RHU and AHD if not MWOnline. We can and probably have exceeded them in quantity and, especially, currency. I used to find it fun to try to improve quality, but not since the "improvements" in our widely applied templates. DCDuring TALK 18:00, 31 August 2013 (UTC)

superoptihupilystivekkuloistokainen, eat the wind, albifrons

These are our three oldest RfV discussions. Can we wrap them up (and be done with 2012)? Cheers! bd2412 T 02:53, 27 August 2013 (UTC)

Keep superoptihupilystivekkuloistokainen, it’s useful; delete eat the wind, I don’t think it is used in English and that meaning does not fit the phrase in English thinking; and keep albifrons, I like it. —Stephen ^(Talk) 08:55, 27 August 2013 (UTC)

The debate on superoptihupilystivekkuloistokainen seems to be whether the Finnish version of Mary Poppins is a well-known work per W:CFI#Attestation. I have no opinion on the matter, so I haven't commented. Mglovesfun (talk) 20:48, 27 August 2013 (UTC)

Navigational popup gadget +default

Hello! I recently had the pleasure of discussing a link-creep problem, where people who come for a specific piece of information have to follow multiple links to get it. This is discouraging to readers and may cause them to give up on the site and go elsewhere for their information. I would like to propose that MediaWiki:Gadget-popups.js be enabled by default in MediaWiki:Gadgets-definition (Navigation_popups|popups.js ⇒ Navigation_popups[default]|popups.js). Doing so would greatly reduce the amount of clicking needed to get the wanted information and improve use of the site, in turn encouraging more new readers and potential good editors. Thank you for taking the time to entertain my proposal! Happy editing! Technical 13 (talk) 18:51, 27 August 2013 (UTC)

Wikiversity templates

Hi!

I have created a second Wikiversity template: Template:wikiversity lecture and placed it on the Wiktionary word planets as an example for consideration. The usual Wikiversity cross-wiki template is Template:wikiversity which states that "Wikiversity has information: word". Comments, criticism, yeas or nays, welcome. There are pros and cons to having only the general one and more specific ones such as for lectures, courses, resources, etc.--Marshallsumter (talk) 18:32, 30 August 2013 (UTC)

Search results page

I just searched for dílis and got the following announcement on the "404" page:

dílis is a Irish translation of the word dear ("precious to or greatly valued by someone").

It's great that our 404 page gives us information like this, but would it be possible for it to detect whether the language name starts with a vowel sound, and if so, to write "an" rather than "a"? —Angr 11:29, 31 August 2013 (UTC)

If not, maybe it could just say "(word) is the (language) translation of"? Or "In (language), (word) is a word for" / "In (language), (word) is a translation of"? - -sche (discuss) 20:24, 31 August 2013 (UTC)

You can also get fun things like:

ord för ord is a Swedish translation of the word word for word ("in exactly the same words").

I would suggest something along the lines of "[language name] [term] is given as a translation for [English term]". Chuck Entz (talk) 21:15, 31 August 2013 (UTC)

I added a check for whether the language name starts with a vowel. Hopefully there aren't a whole pile of exceptions... --Yair rand (talk) 19:34, 15 September 2013 (UTC)
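A first-letter vowel check like the one just described might look like this Python sketch (the actual search-page script is not shown in this thread). Testing only the first letter is an approximation; the exception set is a hypothetical placeholder for names like Ukrainian that begin with a vowel letter but a consonant sound.

```python
VOWEL_LETTERS = "AEIOU"
# Hypothetical exception list: vowel letter but consonant sound ("a Ukrainian ...").
CONSONANT_SOUND_EXCEPTIONS = {"Ukrainian"}


def indefinite_article(language_name: str) -> str:
    """Pick "a" or "an" from the first letter of the language name."""
    if language_name in CONSONANT_SOUND_EXCEPTIONS:
        return "a"
    if language_name and language_name[0].upper() in VOWEL_LETTERS:
        return "an"
    return "a"


def search_notice(word: str, language: str, english: str, gloss: str) -> str:
    """Rebuild the notice quoted above with the article chosen automatically."""
    article = indefinite_article(language)
    return f'{word} is {article} {language} translation of the word {english} ("{gloss}").'
```

With the dílis example this produces "dílis is an Irish translation of the word dear (...)", while Swedish still gets "a". The exception set is where the "pile of exceptions" would accumulate.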

British/American/etc spellings, redux

Remember when we had a discussion a few weeks ago about duplicating content across spelling variants? We're having another one. See Wiktionary talk:About English. - -sche (discuss) 20:22, 31 August 2013 (UTC)

Change to scripts in Category:Ainu language

Category:Ainu_language says that Ainu is written in katakana (actually "Katakana") but doesn't mention the fact that it's frequently written with the Latin alphabet and historically was written in Cyrillic (though perhaps not any more). I tried looking into the template, but quickly got confused. Can someone help me with adding these scripts? --BB12 (talk) 06:02, 1 September 2013 (UTC)

The scripts that a language is written in are listed at Module:languages. Does that help you? —CodeCat 11:47, 1 September 2013 (UTC)

Thank you! I've now figured out how to get to that page and have added my request to the talk page :) --BB12 (talk) 14:41, 1 September 2013 (UTC)