This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Beer parlour archives

January 2009

Questions concerning the use, naming and placement of inflection, declension and conjugation templates for FL languages.

A couple of years ago there was a flurry of criticism against inflection templates such as the Swedish ones, as they do not clearly enough separate "inflection line templates" from "declension templates"[1]. I really wouldn't have bothered about it had it not been that I too find the Swedish template names (and parameter use!) somewhat less intuitive than they need to be, to put it mildly. Hence, I have for quite a while been thinking about how to rectify the situation, but I am realizing that there are no clear directions anywhere on how FL inflection/declension templates are to be treated, or what kind of structure I should aim for. And as I don't want to do this more than once, I would like some indication of which direction I should aim for before I do any more work on the upgrades and reworkings of the Swedish templates. It is important to point out that Swedish uses a relatively low number of forms: up to 3 for adverbs, 8 for nouns, 13 (or 17, if one opts for better comprehensiveness) for verbs and up to 14 for adjectives. Still, that is mostly too many to fit in the inflection line (though one could make a selection of forms to present on such a line). Hence, a few questions (but first some observations):

  • At present, I see:
    1. an "infl-line"-solution, exemplified by the {{infl}}-template and the various English templates.
    2. an "infl-table"-solution which uses the whole page's width. This is also used by the same English templates, but is by default hidden from the user (I found out now; I had forgotten that I had edited my .css to see them.)
    3. a right-floating table, used by all Swedish templates (these are the ones which were criticized, back then) and German declension templates.
    4. a table under a heading of its own, mainly used by languages with a very large number of different forms.
  • WT:ELE only, as far as I can find, mentions case 1 and 4, though presumably 1 and 2 could be considered equivalent from that point of view.
  • My questions would then be:
    1. Should one always present a few forms on the inflection line?
    2. Should one retain the right-floating tables? (Or should one use an ====Inflection==== header no matter how little additional information, compared to the inflection line, would ever be present there?)
    3. If the answer is 'yes' to both questions, could one then use the same template to display them both? (So that one doesn't end up as in Apfel with two consecutive templates doing IMO very similar things.) However this would "blur the separation of the two template types". [1]
    4. When it comes to naming: *if* all should be used, the inflection line templates should merely be {{sv-<PoS>-<class>}} and the inflection/declension/conjugation *table* templates should be {{sv-<decl>/<conj>-<pos>-<class>}}, right? Conj for verbs and decl for nouns and adjectives?
    5. How many forms would be acceptable to squeeze into one inflection line?
      I think {{en-verb}} displays up to 5; I also think 8 would be too many to fit in, so perhaps one could restrict to 3 adverbs (that's all of 'em) and adjectives, up to 4 nouns (that would mean 1, 2 or 4 additional forms in the declension table wherever one would put that - right-floating or under a header) and finally 5 verb forms (which means that for a number of verbs the conjugation table would present up to 2 (or 4) new forms, which wouldn't be present on the inflection line already; the rest would present up to 8 (or 12) new forms.)
    6. Finally, what is really the intention of the css classes infl-inline and infl-table in the context of an FL? Are they intended to be useful at all for any language other than English? Could they be made useful by making one of them hide/display the more extensive declension/conjugation tables?
      Part of the reason I ask this is because, as I have tried to sketch here, I don't think it is very useful for anyone, really, (when it comes to Swedish) to see *both* the brief inflection line information and the complete inflection table - either you need all the information given in the table *or* it is sufficient for you to see the gist of the inflection pattern, as given in an inflection line. My first idea was thus to give both the inflection line and the inflection table, *but* that's no good as it is quite difficult for the average reader to make Wiktionary display the complete tables (.css editing seems to be a must?) Or should I leave those classes to the English templates and, say, create another solution for hiding the complete tables by collapsing them? Would that be an acceptable compromise even if they are still right-floating?
    • Now, if the answer to number 2 is 'no', then I guess one would have to return to the old situation where the Swedish entries "all" used an ====Inflections==== header. Personally, I'm not very fond of this, as it definitely would require the entering of two templates + one extra header almost every time - and then have very little information below it. Presumably one could avoid it for the adverbs (only 3 forms), but that would be a very small benefit. The second possibility - to skip that section for uncountable nouns, proper nouns and periphrastic or absolute adjectives - would add the extra drawback of making it inconsistent within the language+PoS-combination. Else the (hypothetical class name, I haven't got around to really decide on the updated format of the noun templates yet) {{sv-noun-unc-n}} and {{sv-noun-decl-unc-n}} would display the same 4 forms, but in different layouts and at different places.
    • Of course one could argue that one could use different solutions for different PoS's, but I am doubtful that it would benefit the reader to find a single language's inflection information at various places in the entry for different entries. If s/he is used to find it under an Inflection header and constantly finds it there for adjectives, verbs and nouns, why should s/he look elsewhere just because it's an adverb? No, I strongly think that any solution will have to be as consistent over the entries of the language as possible.

I hope this discussion will yield something tangible for a Wiktionary:Conjugation and declension templates page to go with the Wiktionary:Inflection templates page, linked from the ELE, so that users who want to add to the infrastructure of inflection/declension/conjugation templates for new languages can see what's expected of them and their templates.

Thanks for your patience with my (almost) never-ending writing... :P \Mike 23:12, 1 January 2009 (UTC)

For languages and POSes where there are too many forms to put them all on the inflection line, but not too many forms to fit in a right-floating table, I think the right-floating table idea is a good one. But, it shouldn't replace the inflection line, which all of our entries have. For example, I think yttrandefrihet looks good.
Personally, I don't care too much whether the inflection line is built into the template that generates the table, or created using {{infl}}. If the latter, then {{sv-noun}} should be renamed to {{sv-decl-noun}} or something. (The name "sv-noun" makes it sound like it creates the inflection line, like with {{en-noun}}, {{fr-noun}}, and so on.) —RuakhTALK 20:30, 2 January 2009 (UTC)
I dislike the right-floating boxes, and don't think they should ever be built into the inflection line. All too often, images need to be placed in a POS section, and these are typically placed ahead of the inflection line. This causes severe problems if there is a right-floating inflection table competing for that spot. I also constantly run across such pages where the right-hand table extends down into the following language section, which is visually confusing. I prefer to always see inflection included either on the inflection line when there is very little to display (more than 4 or 5 items becomes messy), or else in an Inflection / Declension / Conjugation section. Including the explicit section with header puts additional information into the page's TOC, which can be very helpful on long pages with multiple languages.
"Should one always present a few forms on the inflection line?" I think so in most languages and situations, but only if it either (1) helps to summarize the forms or (2) is a complete and concise listing. Some other users here disagree, and would rather that no inflected forms appear on the inflection line in certain languages, for reasons I understand but don't necessarily agree with for those languages. There are some languages (e.g. Japanese) where only forms in other characters are given on the inflection line, and some languages (e.g. Polish) where only the gender of nouns is given, and some African languages where the class of nouns is given, but not the inflected forms. I'm not sure that a general agreement can (or should) be reached that applies uniformly to all languages, except perhaps as a principle of "make the inflection line a summary". --EncycloPetey 23:48, 2 January 2009 (UTC)
I generally agree with EP on this. Right hand templates are completely unacceptable, and I very much want to see them all go away. They so often get in the way as EP notes. I think that if a word has four or fewer forms, putting them all in the inflection line is fine (so adverbs in this case probably don't need a dedicated inflection table). Anything more should be in an inflection table under its own header (there's debate between inflection and declension/conjugation. I prefer the former, as I think it is more workable and I think the latter makes an unnecessary and sometimes difficult distinction, but others disagree). I think that inflection templates should always be collapsable (again, others disagree with me on this, but I have yet to be swayed by their arguments). Then again, I feel like everything except language, part of speech, and definitions should be collapsable, but I may be drinking at my own party on that one. Anywho....the more I think about it, the more I am beginning to think that any entry with a dedicated inflection template should have zero inflection data inside the inflection line, however this is basically the opposite of current practice in most cases. I think a good rule of thumb for current practice in inflection line use would be, a quick and dirty version, often mimicking what traditional dictionaries put in their entries (e.g. Ancient Greek nouns have genitive singular next to nominative singular here, which is basically what all paper grc dictionaries do). So, to make sure I answer all your questions:
  1. Not necessarily, but most entries currently do.
  2. Right handed tables are a bad idea and must go.
  3. See above, but if right-handed tables must be kept, they need to be more flexible than this, so they need separate templates (although it might work to have the option of having the inflection line template be able to call the full inflection template).
  4. Yes.
  5. Four, at most five.
  6. I don't understand CSS enough to really answer this. However, I might note that in the latest incarnation of grc inflection templates, the basic info is presented in the collapsed version, basically mirroring the info presented in the inflection line.
Hope that helps. -Atelaes λάλει ἐμοί 07:47, 3 January 2009 (UTC)

Broken templates

Someone has recently changed the templates for tenses so they no longer include a link around the target word. Is this correct? Some time ago I was admonished for entering some without this link.

Also they now expand incorrectly, no longer having the '*' at the beginning. See insnares, insnaring, and insnared which I just entered using these (broken?) templates. - dougher 02:38, 2 January 2009 (UTC)

These templates seem to work fine - unless you are referring to a template that is not on the page? The format of these "Form-of" pages varies wildly depending on which language you edit, but they generally should look something like the following to allow for a reasonable amount of consistency:
# {{present participle of|[[verb]]}}

If you don't want the hassle of typing this every time, you can enable Accelerated creation by ticking the appropriate box on WT:PREFS. Conrad.Irwin 02:43, 2 January 2009 (UTC)

I was using the buttons on this page. They have worked fine forever. If the policy is now to use the Accelerated creation thing, shouldn't this page be fixed?

All those missing parts of the insnare pages (English, Verb) that you point out are missing are normally created automatically by clicking on the red links for the missing tense pages -- what happened to those red tense links, they're broken too? - dougher 02:52, 2 January 2009 (UTC)

Those buttons work fine for me (as in they create the parts that were missing from yours), so I don't know what the problem was for you - maybe you just hit them on a bad day for the software. Which red tense links are you talking about? Conrad.Irwin 15:53, 2 January 2009 (UTC)

"singular delative of" or "delative singular of"?

What is the recommended order of words in form of entries for nouns? First the singular/plural, second the case name? Or the opposite? --Panda10 23:27, 2 January 2009 (UTC)

For Latin, I have consistently been using the sequence: case, gender, number for adjectives, and: case, number for nouns. The case has more variability in most languages, and is more often important for translation, so listing it first gives it the extra attention to help our users. --EncycloPetey 23:36, 2 January 2009 (UTC)
Hmmm...I have been using (gender) case, we're agreed on nouns at least. :-P -Atelaes λάλει ἐμοί 19:13, 6 January 2009 (UTC)
I just tried google:"the nominative masculine singular" and the other five permutations, then likewise for google:"the accusative feminine plural", with the following results:
ordering    nom. masc. sing. hit counts    acc. fem. pl. hit counts
c. g. n.    1460/83                        9
c. n. g.    2000/109                       7
g. c. n.    1140/124                       109/21
g. n. c.    279/67                         58/15
n. c. g.    104/18                         2
n. g. c.    8                              4
(where foo/bar means that foo was Google's first-page estimate, but bar was its last-page estimate; I'm assuming bar is more reliable). All told, it looks like some orderings are more strongly preferred than others, but we have some flexibility. Going just by the numbers, I'd definitely use gender-case-number (as Atelaes suggested), but we don't have to go just by the numbers if we think EP's reason for case-gender-number is a good one.
RuakhTALK 21:35, 6 January 2009 (UTC)
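For anyone who wants to redo the tally, here is a rough sketch in Python. It only copies the hit counts from the table above, keeps the last-page estimate wherever two figures were given (on the assumption stated above that it is the more reliable one), and adds up the two phrases per ordering. The names HITS and rank_orderings are made up for this illustration.

```python
# Hit counts copied from the table above: (nom. masc. sing., acc. fem. pl.).
# Where Google gave first-page/last-page figures, only the last-page
# estimate is kept; single figures are used as they are.
HITS = {
    "case-gender-number": (83, 9),
    "case-number-gender": (109, 7),
    "gender-case-number": (124, 21),
    "gender-number-case": (67, 15),
    "number-case-gender": (18, 2),
    "number-gender-case": (8, 4),
}


def rank_orderings(hits):
    """Return (ordering, total hits) pairs, highest total first."""
    totals = {name: sum(counts) for name, counts in hits.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank_orderings(HITS):
        print(f"{name}: {total}")
```

On these figures gender-case-number comes out first (145) and case-number-gender second (116). The samples are small, so this is a rough guide at best.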

Category for possessive noun forms

Where should we put Hungarian possessive noun forms, e.g. házam (my house)? In Category:Hungarian noun forms or in a separate Category:Hungarian possessive noun forms? --Panda10 19:29, 4 January 2009 (UTC)

For Hungarian, the latter is probably appropriate, since there is a whole series of additional suffixes used to create the possessive forms. However, I can't imagine there would be much value in having that separate additional category. --EncycloPetey 19:38, 4 January 2009 (UTC)
Are you saying someone should or shouldn't create the separate category (IMO however, it should be a subcategory of Category:Hungarian noun forms)? 50 Xylophone Players talk 19:44, 4 January 2009 (UTC)
I'm leaving the question open of whether to do this. I wouldn't normally want to see such a category, since in most languages it wouldn't be worthwhile, but Hungarian is unusual in this regard, and may have reasons for such a category. --EncycloPetey 23:21, 4 January 2009 (UTC)
At this point we have not come up with a useful way to categorize inflected forms, and the only strategy which has been adopted with any regularity is to relegate them all to "Category:language noun forms", which can become utterly monstrous in highly inflected languages. So, I would advise being adventurous in this sort of categorization until we find a useful standard. However, I would advise using templates, so that these many forms can be easily categorized in the future, should it be deemed necessary. -Atelaes λάλει ἐμοί 23:27, 4 January 2009 (UTC)
I like the idea of a template in the inflection line which would put the possessive noun form into a category. What would be a good name for the template: hu-noun-posform? --Panda10 23:34, 4 January 2009 (UTC)
That seems reasonable to me. -Atelaes λάλει ἐμοί 23:38, 4 January 2009 (UTC)
I'd recommend hu-noun-form-poss(essive) instead, since it keeps the base template name. --EncycloPetey 23:45, 4 January 2009 (UTC)
By the way what about Category:Hungarian noun forms? Should it perhaps be split into separate categories —one for each case, maybe— or what? As it stands it seems to be 34 form ofs (going into this category) per lemma. Also (I've been wondering about this for a while) why is the essive-formal line always blank? 50 Xylophone Players talk 00:38, 5 January 2009 (UTC)
It is not always empty. See . If we go for a separate category for each case, we need another template to handle it. --Panda10 00:45, 5 January 2009 (UTC)
Ultimately, the most useful, long-term solution is this: Have two templates: One template for the inflection line, and one for the definition line. The inflection template does little more than bold the headword, and the definition template works similarly to {{inflection of}}. Have the definition line template take parameters such as n, m, s, e, etc. (make it work for Hungarian, however you think best), and spit out "nominative singular of x" or whatever. Use them both in all inflected form entries. Thus, if we decide we want all inflected forms in a single huge cat, we can turn categorization on in the inflection line template. If we decide we want more specific categorization, we can turn categorization off in the inflection line template, and on in the definition line template. Since the definition line template will have all the information it needs to do any sort of categorization, we can adjust it all at the template level. Maybe we want all nominative singulars in a specific cat, and genitive singulars in another; the template can do that. Maybe we want all inflected forms from a specific lemma in a "Category:inflected forms of lemma" category; the template can do that. Maybe we want both; it can still do that. -Atelaes λάλει ἐμοί 00:57, 5 January 2009 (UTC)
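To make the two-template idea concrete, here is a minimal sketch of the logic in Python rather than template code. Everything in it is an assumption for illustration: the function names, the abbreviations, the categorize_in switch and the category strings are hypothetical, not existing Wiktionary templates or categories.

```python
# Hypothetical abbreviation tables; a real template would cover all cases.
CASES = {"nom": "nominative", "acc": "accusative", "dat": "dative"}
NUMBERS = {"s": "singular", "p": "plural"}


def inflection_line(pagename, categorize_in="definition", lang="Hungarian"):
    """Inflection-line template: bold the headword.

    Adds the single catch-all category only when categorization is
    switched to the headword side.
    """
    text = f"'''{pagename}'''"
    if categorize_in == "headword":
        text += f"[[Category:{lang} noun forms]]"
    return text


def definition_line(lemma, case, number, categorize_in="definition",
                    lang="Hungarian"):
    """Definition-line template: spell out the form of the lemma.

    Adds a specific per-form category only when categorization is
    switched to the definition side.
    """
    form = f"{CASES[case]} {NUMBERS[number]}"
    text = f"# {form} of [[{lemma}]]"
    if categorize_in == "definition":
        text += f"[[Category:{lang} {form} noun forms]]"
    return text


if __name__ == "__main__":
    print(inflection_line("bankot"))
    print(definition_line("bank", "acc", "s"))
```

The point of the design is that entries never change: flipping categorize_in (in a real template, one edit to the template itself) moves every entry from the single large category to the specific ones, or back.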
Sorry, I meant essive-modal, but anyway why is it often blank and why does have no essive-modal plural? 50 Xylophone Players talk 03:10, 5 January 2009 (UTC)
PalkiaX50: Not every noun can use the -ul/-ül case ending, and even if it is used in singular, it may not make sense in plural.
Atelaes: Thanks for your suggestions, I will work on it, it may take a few days, though. --Panda10 12:51, 5 January 2009 (UTC)
I created two templates, please see bankot:
  • {{hu-noun form}} - displays the bolded page name; it should be used in the inflection line
  • {{hu-inflection of}} - displays the definition (e.g. accusative singular of <lemma>); it does not handle possessives, only the regular cases
Should I worry about the lack of wikilinks on the noun form page? I think AF is handling them lately, correct? --Panda10 23:52, 5 January 2009 (UTC)
Personally, I think it would work better if you made these changes:
  1. Don't make it automatically generate a link to "nom. sing.#Hungarian"; that way you could type {{hu-inflection of|[[bank#Hungarian|bank]]|acc|s}} without getting a truly ugly result, hence keeping the page looking perfect and counted in the statistics.
  2. It would be nice if there were more shorthands, e.g. for superessive, essive, translative, etc. 50 Xylophone Players talk 01:00, 6 January 2009 (UTC)
Disagree with PalkiaX50 on 1, per Wiktionary:Page count. Agree with 2. -Atelaes λάλει ἐμοί 04:15, 6 January 2009 (UTC)
With regard to 2; the template {{inflection of}} deliberately does not use additional shorthand because they are not as generally applicable across languages, but for a language-specific template like the new {{hu-inflection of}}, there should be real benefit in having more shorthand parameters. I strongly suggest, from experience, that you carefully plan out the shorthand coding. You want to be sure they're easy to remember, don't cause confusion by seeming to be short for something different, and (ideally) be a fixed number of characters (3 or 4) to make them easier to remember. It may not be possible or feasible to meet all of these criteria, but I mention them in case you hadn't yet considered them. --EncycloPetey 05:56, 6 January 2009 (UTC)
Thanks for the suggestions. I created a table with the proposed shorthand for each case. Please take a look at User:Panda10/Inflection. I am planning to create categories for each case. This will help identify errors (entries that were defined erroneously and have to be corrected). Example category name for accusative: Category:Hungarian noun forms - accusative - will this work? These categories would be under Category:Hungarian noun forms. --Panda10 23:26, 6 January 2009 (UTC)
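The shorthand criteria mentioned above (a fixed number of characters, no two codes for the same thing) can be checked mechanically before the template goes live. The sketch below is in Python and is only an illustration: the codes in SHORTHAND are hypothetical and may differ from the table at User:Panda10/Inflection, and the function names are made up. The category format follows the example given above.

```python
# Hypothetical shorthand table; the actual proposal may use other codes.
SHORTHAND = {
    "acc": "accusative",
    "del": "delative",
    "sup": "superessive",
    "tra": "translative",
    "esf": "essive-formal",
    "esm": "essive-modal",
}


def check_shorthand(table, length=3):
    """Return a list of problems found in a shorthand table.

    Flags codes that do not have the fixed length, and case names that
    are reachable through more than one code.
    """
    problems = []
    for code in table:
        if len(code) != length:
            problems.append(f"{code!r} is not {length} characters long")
    seen = {}
    for code, case in table.items():
        if case in seen:
            problems.append(f"{code!r} and {seen[case]!r} both mean {case}")
        seen[case] = code
    return problems


def category_for(code, table=SHORTHAND, lang="Hungarian"):
    """Build the per-case category name in the format proposed above."""
    return f"Category:{lang} noun forms - {table[code]}"


if __name__ == "__main__":
    print(check_shorthand(SHORTHAND))
    print(category_for("acc"))
```

An unknown code raises a KeyError in category_for, which is roughly the behaviour one would want from the template too: a visible error rather than a silently wrong category.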

Historical/ Legendary Fictitious Proper Nouns

Are proper nouns that appear in Historical literature or legends (i.e. The legendary sword Hrunting in the elegy of Beowulf) potential candidates for word entries? Just wondering. --Dictionman 22:40, 6 January 2009 (UTC)

Yes, for a couple reasons. Famous, notable works, such as Beowulf get special privileges as far as citations go, and so a single cite from Beowulf (and perhaps a notable English translation of it) would suffice to justify inclusion. Thus, the Old English word absolutely, the English equivalent probably. -Atelaes λάλει ἐμοί 22:57, 6 January 2009 (UTC)
I agree that we should have Hrunting, but I don't know if I agree that appearance in a well-known work, taken alone, is enough justification for that sort of proper noun that refers to a unique entity. Appearance in a well-known work counts for attestation, which means that actual words from Beowulf should be included, but I don't know that it just wipes away all the other CFI. :-/   —RuakhTALK 00:19, 7 January 2009 (UTC)
My mistake in the title, I meant my request to go only for legendary and renowned widespread proper nouns solely, not words in general. So, may I have permission to create an article for Hrunting and Naegling (Beowulf's other sword)?--Dictionman 00:32, 7 January 2009 (UTC)
I have serious reservations about this; IMO entries of this type should be restricted to either a) proper nouns that have entered literary lexicon as bywords for something or other, or b) proper nouns occurring in some closed set of works that we can all agree on (good luck...).
I note that Wikipedia has articles for w:Hrunting and w:Naegling, as it should. Any etymological/philological scholarship that has been done on these names can surely be summarized in the Wikipedia articles. Since not even "Hrunting" seems to have entered the modern lexicon in any significant way, I'm not sure what we could add to the information these entries provide. -- Visviva 02:09, 7 January 2009 (UTC)

Danish nouns

I've been thinking about changing {{da-noun}}, and made a proposal in User:Leolaursen/da-noun, with some examples of use in User:Leolaursen/sandbox. Is this a step in the right direction? And is it worth the temporary breakage of about 1000 pages?

Also I attempted to prepare for the use of Accelerated, so is this done the right way? – Leo Laursen – (talk · contribs) 18:47, 7 January 2009 (UTC)

The new template is missing the plural definite, is that deliberate? The Accelerated creation should now work for your new template (sorry about the wait), thank you for adding the class names yourself. Let me know if it's not perfect. Would it be possible to fix the 1000 pages with a robot? Otherwise it's a lot of work for a human. Conrad.Irwin 09:57, 9 January 2009 (UTC)
The plural definite is left out on purpose, because it is only an added "-ne" in the majority of cases, and the few exceptions will be shown in the inflection table (please see User:Leolaursen/da-noun-infl).
I don't see green links (yet?), but accelerated is enabled and works for eg. Dutch nouns.
Using a bot might work, taking arg. 3 and 4 unchanged, and either finding gender from arg 1 (en/et) or keep g. – Leo Laursen – (talk · contribs) 11:34, 9 January 2009 (UTC)
The bot is not needed, I made a wrapper, so the old version is called, if the fifth argument exists. Accelerated, unfortunately does not work. – Leo Laursen – (talk · contribs) 10:34, 10 January 2009 (UTC)


It's nice that this template standardizes format across the several SI measures where it's used. However, it gives all numerical quantities in scientific notation, which I can say from experience the majority of Americans don't understand. Using this template, a decisecond is defined as "An SI unit of time equal to 10⁻¹ seconds", when saying "one-tenth" or "1/10" would be understandable by a far greater percentage of our users. Is there a simple way to modify the template to provide a parenthetical English or fractional value? --EncycloPetey 00:17, 8 January 2009 (UTC)

User talk:Visviva/SI would be one way to do it; not sure what the best layout is. Maybe the scientific notation should go in parentheses? -- Visviva 09:31, 8 January 2009 (UTC)
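As a rough illustration of what such a parenthetical could look like, here is a small Python sketch of the conversion. It is not the template's code: the function name and the WORDS table are hypothetical, only a few negative powers have English words, and everything else falls back to an exact fraction or integer.

```python
from fractions import Fraction

# Hypothetical word list; only a few negative powers are spelled out.
WORDS = {-1: "one-tenth", -2: "one-hundredth", -3: "one-thousandth"}


def describe_power_of_ten(exponent):
    """Return the scientific notation followed by a plain-value gloss."""
    value = Fraction(10) ** exponent  # exact, no floating-point rounding
    plain = WORDS.get(exponent, str(value))
    return f"10^{exponent} ({plain})"


if __name__ == "__main__":
    for exp in (-1, -4, 3):
        print(describe_power_of_ten(exp))
```

Swapping the two parts of the returned string gives the other layout suggested above, with the scientific notation in parentheses.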

Proposals for Philippine languages

I think this is the best place to start asking for advice and permission. So, I shall now ask about:

Templates such as:

1. Verb conjugation Typical Phil. verbs, particularly in Tagalog, are made from nouns attached by a patterned set of prefixes and suffixes. I see this as similar to the Spanish conjugation system--as these verbs have a table for each word, could Tagalog words have them too? The templates would be helpful for automatically listing the "verbed nouns" in one place, and in itself would portray how verbs work. Finally, it would be convenient for linking inflections onto their root words. Like in the Spanish verbs, the tables would be mainly on the root words, then stemming out for their individual meanings. Further information about Phil. verbs could have its Appendix page also.

2. Descendants Perhaps a template would work better here. The template words would be listed into the existing "(Phil language here) words derived from Spanish" pages.


A talk page where plans for the Phil. languages could be, and where this may be best created in; I'm thinking it could be linked from Category:Languages_in_the_Philippines. It would be where someone willing to help would first look and see how his language is progressing. This would have a to-do list, discussion on other improvements, appendices that would be helpful for editing the Phil. languages, and anything that would help.

What do you think?

Absolutely. This is all good. Generally, every language is encouraged to have inflection templates set up, and if you don't feel up to the task of writing them, or need some tips on standard formatting for them, we're quite happy to help with that. I'm a little uncertain as to what you're talking about with number two. Certainly every Spanish word is allowed a "Descendants" header, where Filipino/Tagalog/whatever else can be placed. And certainly every word can have an etymology section, where its etymon/etyma can be listed. As for a central page, I suggest creating Wiktionary:About Filipino and/or Wiktionary:About Tagalog. What might be easier is to simply create an example page and we can work out formatting in a more concrete fashion. -Atelaes λάλει ἐμοί 07:09, 9 January 2009 (UTC)
Thanks for responding, Atelaes! For the descendants, I always type the language and word, (==Descendants== * Filipino: manibela, something like this), and I thought the diff. Phil. languages could have their own "{{}}" thing to list the Spanish words down somewhere on another page. For now the list of Phil. language words from Spanish are composed of existing terms being in a category. I want the descendants to be listed somewhere before they're made. I suppose this is unnecessary, but I thought it would help to keep them standardized. The Sandbox seems like the perfect place for formatting. Thanks again! --Icqgirl 08:09, 9 January 2009 (UTC)
Well, you're certainly welcome to write a bunch of Philippine words under the Descendants header of a Spanish word, even if they haven't yet been written. Also, if you like, you can write an appendix, such as Appendix:Greek words with English derivatives, but these are generally seen as a tool for getting data into entries, not as a final product. -Atelaes λάλει ἐμοί 08:17, 9 January 2009 (UTC)
For a guide to adding Descendants, take a look at the Latin page for vermis (worm), which lists many Descendants in various languages. Note that there is a red link for Occitan because that entry has yet to be written. This is perfectly acceptable, and you can do the same thing. You will also notice that the Italian and Portuguese descendant words are the same, but that the {{l}} template (that's a lower-case "L") allows each word to link directly to the corresponding language section of the target page. --EncycloPetey 18:36, 9 January 2009 (UTC)

User:Conrad.Bot and Indices

It struck me that what was originally a way for me to upload indices without flooding recent changes is now more like a bot for generating indices. Are people happy to allow Conrad.Bot to continue uploading the Indexes for Galician, Hungarian, Italian, Irish and Spanish, or should I call a VOTE? Conrad.Irwin 02:24, 10 January 2009 (UTC)

Go for it! —RuakhTALK 02:40, 10 January 2009 (UTC)
The index in this format is extremely helpful. Thank you for doing this. One question: the main page of the Hungarian index is somehow left out of the refresh process, this is the page that contains the total number of entries in the index. Would it be possible to include it in the refresh process? --Panda10 02:49, 10 January 2009 (UTC)
Yes, this now happens, but note the count has jumped up because it now indexes red-links from translation tables too. If you would actually like a count of the number of pages, I can probably generate this as well. Conrad.Irwin 02:06, 11 January 2009 (UTC)
It would be nice to have all the information. Would this format (or similar) make sense to you: "The 10773 terms (4000 red links, 6773 blue links) on this page..." --Panda10 13:12, 11 January 2009 (UTC)
Likewise, this is page maintenance, not the creation of new material or entries. You might go for a formal vote if you plan to expand to additional languages (which would be nice), just to put the Bot on the books, so to speak. And if expansion can be done easily, then it would be helpful to have a designated place for specific language requests, especially for helpful notes on odd characters and what alphabetical order means in that language. I'd be particularly interested in seeing additional Romance languages indexed, like Catalan, Occitan, Asturian. --EncycloPetey 03:00, 10 January 2009 (UTC)
I am happy to extend this, and, thanks to recent restructuring it is much easier to extend. If you direct requests to my talk page it is most likely to be noticed, but I will (at some point) set up User:Conrad.Bot/Indexing giving information about what goes on (though there is so much mucky code loitering around this process that I'd rather not upload all 15 scripts that get involved - that would probably be less use than a human-readable description anyway). Conrad.Irwin 02:06, 11 January 2009 (UTC)
What happened to grc? We had all the rules worked out well, and then you just dropped me, like a cheap hooker. I'm hurt. :-P -Atelaes λάλει ἐμοί 05:26, 10 January 2009 (UTC)
It's now there(ish), I'm afraid I've been irresponsible and not left myself enough time to fix the header, will do that in 10 hours or so. See you later. Conrad.Irwin 11:19, 10 January 2009 (UTC)
'Tis very much appreciated. My self worth is now a great deal more secure. Thanks. -Atelaes λάλει ἐμοί 05:04, 11 January 2009 (UTC)
Like Ruakh and (resp.) EP, I, too, would like to see this continued and (resp.) extended to other languages. All languages, in fact; why not?—msh210 21:31, 12 January 2009 (UTC)
It seems to me that there's fairly strong support for this from the community and no opposition. I suggest that Conrad should feel quite free to create indices for any languages he has the motivation for. I would like to eventually see all languages have indices which are all updated on a regular basis, but clearly that's a fairly massive undertaking, and will probably take time. Also, different languages will have different sorting criteria (as I'm sure Conrad is already aware). In my opinion, an imperfect index is better than no index (and is more likely to provoke discussions leading to a perfect index). Since this is the creation of a series of pages, and not a change in policy, I see no need for a vote, but if Conrad would feel more comfortable with one, I'd be happy to create it. -Atelaes λάλει ἐμοί 22:25, 12 January 2009 (UTC)
Ok, I'll continue to create indices. I'd prefer to be creating pages I know are being used, it means I can keep track of what's going on; so if you would like an index, please ask for it on my talk page and give me an indication of how sorting/splitting should work. I will get around to doing the English one when I've worked out how to split letters in half nicely (it's currently too big to fit on a single wiki page per letter for some letters as the software has excessive memory usage for rendering links). Conrad.Irwin 23:48, 13 January 2009 (UTC)
Suggestion: find a print copy dictionary, especially the Compact OED, and count pages used for a particular letter of the alphabet. Then see what divisions happen near the place where that section is divided in half, thirds, fourths, fifths... or whatever is necessary given the quantity of data for that letter. As a rough guide, you could look at the navigation template at the top of Category:English nouns, but I don't know if that would be enough for dividing up the whole of our English entries. --EncycloPetey 20:30, 16 January 2009 (UTC)
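The splitting problem described above (one letter being too big for a single wiki page) can be sketched in a few lines. This is only an illustration, not the code Conrad.Bot actually uses: the function names `split_index` and `page_labels`, the `max_per_page` limit, and the two-character label length are all assumptions made for the example. It cuts an already-sorted word list into the smallest number of consecutive pages that respects the limit, keeping the pages as even in size as possible.

```python
def split_index(words, max_per_page):
    """Split an already-sorted list of headwords into consecutive pages.

    Uses the fewest pages such that none exceeds max_per_page, and spreads
    the words as evenly as possible across them (hypothetical helper).
    """
    if max_per_page < 1:
        raise ValueError("max_per_page must be at least 1")
    pages_needed = -(-len(words) // max_per_page)  # ceiling division
    if pages_needed == 0:
        return []
    base, extra = divmod(len(words), pages_needed)
    pages, start = [], 0
    for i in range(pages_needed):
        # The first `extra` pages take one additional word each.
        size = base + (1 if i < extra else 0)
        pages.append(words[start:start + size])
        start += size
    return pages


def page_labels(pages, prefix_len=2):
    """Label each page by the leading characters of its first and last word."""
    return [f"{p[0][:prefix_len]}-{p[-1][:prefix_len]}" for p in pages]
```

Splitting purely by count can put a boundary in the middle of a run of words sharing a prefix, so adjacent labels may overlap; nudging each cut to the nearest prefix change, along the lines of the print-dictionary suggestion, would be a natural refinement.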

"Plural form of xx." vs. "nominative plural of xx"?

This was probably discussed before, but can we rethink the standardization of wording and punctuation of form-of lines? Is it all lower case, no period? Or should it start with a capital and end in a period? Should the plural contain "nominative", since normally a form-of entry would contain the case? Sometimes the different formats appear on the same page in multi-language entries and it doesn't look good. --Panda10 15:00, 10 January 2009 (UTC)

Concerning the nominative: does it make sense to add that word for languages which don't separate nominative from other cases? No, I don't think one should specify that one has to give exactly "case and number" for noun forms; some languages won't see the point to include the case (which simply would mean to add the word "nominative" to every single inflected form), others would want to include more information (such as Swedish and the definiteness with the nouns). On the other hand, if your question is "Given a language with several cases, should one always specify the case, and not be allowed to use 'nominative' as default", then I agree. But on the other hand, I don't think that would do much for the consistent appearance of multi-language entries, would it? \Mike 16:24, 10 January 2009 (UTC)
Actually, there are two questions: wording and punctuation. I agree with you that adding nominative to the plural form does not make sense in English. There is still the question of punctuation. --Panda10 16:33, 10 January 2009 (UTC)
Yes, I silently skipped the punctuation issue as I don't really have a preference for how it should be done (well, short of "standardized") \Mike 17:14, 10 January 2009 (UTC)
Standardizing the capitalization and punctuation may not be possible, because many editors do not use the inflection line alone. Some add a translation in front of the "form of", some add it after, and some just rely on the link to the lemma. I prefer uncapitalized, no period, but I'm sure there are others who prefer a period and/or a capital letter at the start. The {{inflection of}} template can be made to capitalize (if desired) by explicitly adding the first descriptive word in capitalized form, if this is necessary for a specific page. However, it can't be set to automatically capitalize the first word because there is no way to tell which of various possible choices may come first in sequence. That is, the first word could be gender, case, number, or something else. We can't specify that one of these items always come first, because not all languages or words require that particular item to describe the language's grammar. This is why I prefer no capitalization; it eliminates a messy and unnecessary difficulty. Because it's not capitalized, and because there may be following parenthetical text, I prefer no punctuation either. --EncycloPetey 18:59, 10 January 2009 (UTC)
Regarding "nominative": If it's specifically the nominative plural, then yes, it should say so. (The lemma in relevant languages is usually the nominative singular, so it's tempting to say that this is just the plural of the lemma, but that's misleading: it's actually the nominative plural of the word as a whole, and we just happen to use the nominative singular form to identify the word. Likewise, feminine singular adjectives should be identified as singular as well as feminine, and so on.)
Regarding punctuation: I prefer a capital letter and a period, but it doesn't bother me too much if other people do it differently. Consistency would be nice, but is probably too much to hope for.
RuakhTALK 20:59, 10 January 2009 (UTC)

Spanish combined forms

I want to propose some guidelines for adding these. I'd like this to be somewhat more strict than the thousands of Italian combined forms that we have. First, require a quotation for every form. This will cut down on just adding them en masse without attestation. Second, require a general English translation. Also, would it be worth it to make {{compound of}} for entries? For an example, see aceptarla. Nadando 21:32, 10 January 2009 (UTC)

We already require a quotation for every form, don't we? I agree that having an English translation is necessary, but if someone is only willing to add the entries without such, then I think that that's better than nothing.—msh210 21:27, 12 January 2009 (UTC)

Dutch language categorization

The category Dutch adjectives contains random Dutch adjectives, and Dutch adjective forms contains the same. I'm proposing that they be merged. I am not sure how Wiktionary works, but if it's similar to the off-site Wiki I contribute to, categorization in the 'parent' may cause issues displaying their subcats when full of entries themselves. I do not mind having the pages in the Dutch adjective forms category, but I think the two should at least be merged.
I'd also like to see Category:Dutch abbreviations, Category:Dutch initialisms and Category:Dutch abbreviations, acronyms and initialisms merged (since the latter suggests they contain the abbreviations...) unless, there is a reason why they're in the (near identical) different categories? -- 6Sixx 06:00, 13 January 2009 (UTC)

The Category:Dutch abbreviations, acronyms and initialisms is the master category into which separate categories should exist for abbreviations, acronyms, and initialisms. This is done across all languages. The master category will only contain entries that have yet to be properly categorized, but the master category exists for both these uncategorized words and to group the subcategories. The contents of the subcategories will each be very different. --EncycloPetey 06:34, 13 January 2009 (UTC)
That explains it; then shouldn't Category:Dutch abbreviations (etc) be inside Category:Dutch abbreviations, acronyms and initialisms? -- 6Sixx 12:41, 13 January 2009 (UTC)
Yes, it should. --EncycloPetey 02:01, 16 January 2009 (UTC)
From what I understand, Category:Dutch adjectives should contain lemma forms while Category:Dutch adjective forms should contain inflected forms. For instance, vriendelijk is a lemma form, while vriendelijke, vriendelijker, vriendelijkere, vriendelijkst, vriendelijkste are inflected forms; see also the table at the right of vriendelijk. --Dan Polansky 10:09, 13 January 2009 (UTC)
Hmm I did not think of that. I think the naming could be a lot better, but I assume it is like that for consistency between languages... -- 6Sixx 12:41, 13 January 2009 (UTC)

mots quotidienne

(Inspired by Visviva's analysing the print editions of the NYT, Guardian, and several journals, plus my need for some code for Wikamusi (sw.wikt), I wrote something to read the on-line articles from a number of FL print newspapers)

I have been creating lists of words in the on-line articles of print media in Italian, French, and Spanish. Several people have looked at the Italian so far. The idea is to collect the words that are appearing in the daily editions, but we don't have in the wikt. These are of increased interest either because of frequency or because of the current news, thus more likely to be looked up. It also finds words that appear to exist, but don't have needed language sections, e.g. horas has (as of this writing) no Spanish section.

These three languages work well because a large number of inflections exist; this wouldn't work as well for others. So the coverage is quite high (98+% for Italian, not enough data on French and Spanish yet).

See User:Robert Ullmann/Español, User:Robert Ullmann/Français, and User:Robert Ullmann/Italiano. Any feedback much appreciated. Robert Ullmann 17:57, 13 January 2009 (UTC)

In French, c', j', m', qu' are contractions of resp. ce, je, me, que. There are also l', n', r', t'; I don't know if you remove them or there just wasn't any occurrence on the 12th.
(BTW, if the title is in French, there are some mistakes in it ;o) )
Koxinga 20:47, 13 January 2009 (UTC)
I forgot to say that the end result is very good for French. Most of the words are real words, and useful ones at that ! Koxinga 20:49, 13 January 2009 (UTC)
Out of curiosity, what's r' ? I've never seen it in French (but I'm definitely not an expert). Was it a typo for s'? Equinox 22:38, 13 January 2009 (UTC)
Well, I forgot if I had any real word in mind at the time of typing, but considering s' isn't in my list, it may well be a typo. Let's forget about r and replace it by s' . Koxinga 23:30, 13 January 2009 (UTC)
I put in a few simple contractions (that is why you didn't see l' for example, and I already had s') will refine the list. The section title is just j'amuse. Robert Ullmann 23:53, 13 January 2009 (UTC)

Script template change may break old user styles

Please read this if you see a sudden change in font style for foreign scripts.

I've just removed some old class names from the style sheet MediaWiki:Common.css, e.g., .AR for Arabic, to be replaced by the ISO-15924-compliant .Arab. Everything should continue to work as before, unless you have old-style class names in your monobook.css. If so, you can restore the old behaviour by replacing them with new class names.

The affected classes are:

  • .AR → .Arab
  • .FA → .fa-Arab
  • .KS → .ks-Arab
  • .KU → .ku-Arab
  • .OTA → .ota-Arab
  • .PA → .pa-Arab
  • .SD → .sd-Arab
  • .UG → .ug-Arab
  • .UR → .ur-Arab
  • .HY → .Armn
  • .BN → .Beng
  • .RU → .Cyrl
  • .EL → .Grek
  • .scHebr → .Hebr
  • .KM → .Khmr
  • .LO → .Laoo
  • .TE → .Telu
  • .TH → .Thai

Let me know if there are any problems. Michael Z. 2009-01-13 22:07 z
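For anyone with many old-style selectors in a personal monobook.css, the renaming above could be applied mechanically. The following is a rough sketch, not an official tool: the names `CLASS_RENAMES` and `migrate_css` are made up for this example, and it does a plain textual substitution on selectors rather than parsing CSS properly, so the result should be checked by eye before saving.

```python
import re

# Old class name -> new ISO 15924-based class name, copied from the list above.
CLASS_RENAMES = {
    "AR": "Arab", "FA": "fa-Arab", "KS": "ks-Arab", "KU": "ku-Arab",
    "OTA": "ota-Arab", "PA": "pa-Arab", "SD": "sd-Arab", "UG": "ug-Arab",
    "UR": "ur-Arab", "HY": "Armn", "BN": "Beng", "RU": "Cyrl",
    "EL": "Grek", "scHebr": "Hebr", "KM": "Khmr", "LO": "Laoo",
    "TE": "Telu", "TH": "Thai",
}

# Match ".OLD" only as a whole class name: the lookahead rejects a following
# letter, digit, underscore, or hyphen, so ".ARx" and ".THAI" are left alone.
_OLD_CLASS = re.compile(
    r"\.(" + "|".join(re.escape(name) for name in CLASS_RENAMES) + r")(?![\w-])"
)


def migrate_css(css):
    """Return css with old-style script class selectors renamed (naive sketch)."""
    return _OLD_CLASS.sub(lambda m: "." + CLASS_RENAMES[m.group(1)], css)
```

Because the match is case-sensitive, selectors that already use the new names (for example .Arab) pass through unchanged, so running it twice on the same stylesheet should be harmless.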

Mystery editor

I recently received a message on my talk page from someone who claims that he created a number of Min Nan entries before being blocked. The account is User:Sven70. Can anyone help shed light on the issue? Specifically, which Min Nan entries should I be looking at? Thanks. -- A-cai 23:14, 15 January 2009 (UTC)

Take a look at Special:Contributions/Sven70. The edits were all reverted, and no L2 was ever introduced. -Atelaes λάλει ἐμοί 23:24, 15 January 2009 (UTC)
Looking at his deleted contributions is far more instructive. When blocked, he continued to make garbled edits and to post hate comments under a variety of IP addresses. He is also permanently blocked on Wikipedia (within three days of his welcome). --EncycloPetey 02:00, 16 January 2009 (UTC)
Looks as though he's acquired speech recognition software, and has made great progress towards using it. --EncycloPetey 17:18, 19 January 2009 (UTC)

enPR renamed unilaterally on English Wikipedia

I have posted to WT:ANI regarding admin abuse over the unilateral renaming of the enPR material on Wikipedia as "non-controversial". --EncycloPetey 09:42, 16 January 2009 (UTC)

The incident was closed without action. So, Wikipedia no longer has a page about enPR. --EncycloPetey 20:43, 17 January 2009 (UTC)

jocular or humorous?

I only recently noticed that {{jocular}} had been redirected to {{humorous}}, although this actually occurred several months ago. This bothers me somewhat. I understand that "humorous" is a more common word, but it differs in its primary meaning, being oriented to the reader/listener rather than the author/speaker. In this regard, I'm not sure we should be tagging usages as "humorous" at all, since few things are more variable from one person or context to another than whether something is perceived to be funny. ... If I use a jocular insult that results in my being stabbed to death, it is fair to say that my use of the term was not at all humorous, but it was jocular all the same. ... If "jocular" is too obscure, I wonder if we could perhaps use a closer synonym, maybe {{joking}}. -- Visviva 10:54, 16 January 2009 (UTC)

I beg to differ: given that _all_ tags are (theoretically) oriented toward the speaker rather than the listener, there is no ambiguity, and there is no case to be made at all that a word can have one, but not the other tag. Furthermore, both words explicitly use the other in their definitions, which one would tend to consider confirms the synonymy. Circeus 19:50, 16 January 2009 (UTC)
I disagree. Many of the context tags provide information about how a word is used or intended, while others provide information about how it is received. A tag that says (Australia) indicates that a person using the term is likely to be Australian, and says nothing about the person reading or hearing the word. A tag that says (proscribed) indicates something of how the word will be received, and advises a potential user.
In the case of jocular vs. humorous, I agree with Visviva that there is a difference in the shades of meaning between the two terms. The term jocular describes intent of the speaker; a person hearing the term/sense may not even be aware of the humor involved. The term humorous implies a reaction on the part of the person hearing/reading, and not necessarily in accordance with the intent of the writer/speaker. I find many things humorous that were not at all jocular. A student who asks for the answer to a question, immediately after that answer was given aloud to the class, may have his question considered humorous, but the question was not jocular. Or consider, I find the terms irregardless and nosegay humorous, but neither is necessarily jocular. --EncycloPetey 20:19, 16 January 2009 (UTC)
My point is that nobody would expect "humorous" to mean "will be funny regardless of what was meant" (which would almost certainly be the case of most "jocular" words anyway, cf. wuv). Circeus 05:40, 17 January 2009 (UTC)
But you do understand my point that "humorous" does not say anything about the intent of the speaker/writer. Things can be humorous solely on the part of the hearer/listener, and what we're trying to communicate with the context tag is something of the intent in using the word. --EncycloPetey 20:33, 17 January 2009 (UTC)
I agree strongly with this and think you stated it well! Imagine if we put sarcastic on every word that is liable to be used sarcastically. The whole point of humour and sarcasm is that they are deliberately "twisted" or unexpected usages requiring some thought by the listener/reader; they are creative rather than dictionary-prescribed. Equinox 20:37, 17 January 2009 (UTC)
I guess this varies from person to person. I normally think of "humorous" as having only the objective sense, so that overrides what I would otherwise expect in a context label. This gives me an idea, though -- could we just change the label on {{humorous}} to say used humorously? That might still be a little obscure for folks like me, but I think we could figure it out. -- Visviva 02:19, 17 January 2009 (UTC)
That might work. I'd like to hear opinions from a few more people though, in case there's an issue I'm not aware of. --EncycloPetey 03:29, 20 January 2009 (UTC)

as soon as possible

I would like to create the entry "as soon as possible", which is now a redirect, but I sense it is disputable as being sum of parts. Even if it is sum of parts, the entry seems valuable to me. It strikes me as a set phrase—a common expression whose wording is not subject to variation; the property of this being a set phrase is witnessed by the existence of the internet initialism ASAP. Unfortunately, WT:CFI does not have a provision allowing for set phrases. The phrase "as far as one knows" seems to fall into the same bucket of set phrases that are sum of parts.

When I, a non-native, read the set phrase in a given text, I can understand it. But when at earlier times I went in the other direction, from the meaning to the phrase, I knew how to say this idiomatically or in a standard way only because English textbooks explicitly documented the phrase. That is, without the documentation or before I have learned the phrase actively enough, I would have ended up saying things like "Please call me back at the earliest time you can" or "Please call me back as soon as you can".

What do you think of me creating the entry, then? Does its creation require a modification of WT:CFI? Is there a discussion of the topic of SoP set phrases that I have missed? --Dan Polansky 09:57, 17 January 2009 (UTC)

I agree that as soon as possible is a set phrase and warrants an entry, though I don't agree with all of your comments about it. (Acronyms don't always indicate a pre-existing set phrase; and "as soon as you can" is perfectly ordinary English, as is "at your earliest convenience".) —RuakhTALK 17:06, 17 January 2009 (UTC)
It seems like there's a good case for it as a "Phrasebook" entry; many other languages also have formulaic ways of expressing the same basic idea. -- Visviva 17:10, 17 January 2009 (UTC)
And of course there's "Quick as you like!" - meaning "Now!". Pingku 17:53, 17 January 2009 (UTC)
All right, created. I agree that the existence of an acronym only suggests but does not prove that a phrase is a set phrase. I have noticed there is the {{set phrase}} template, and used it in the entry. --Dan Polansky 06:51, 18 January 2009 (UTC)

Multiple Alternative Spellings

If a word has multiple alternative spellings, should the words be placed in one row (e.g. sadhe) or listed vertically (e.g. kris)? I believe there is a sentiment that space above the definition is considered "prime real estate", so one row is better? --AZard 16:28, 17 January 2009 (UTC)

I'm an advocate of above-the-fold screen-space conservation. There is less justification, IMHO, for taking up lots of space for Alternative spellings (and forms) than for Pronunciation and Etymology. I'd favor some space-reduction approach for both of them, too, but there is definitely a lack of consensus on how and possibly opposition to the very idea of it. For Alternative forms and spellings, there seems much less opposition. DCDuring TALK 16:43, 17 January 2009 (UTC)
Personally I list them vertically, unless some spellings are groupable for whatever reason (e.g., they differ only in capitalization or hyphenation or the like). Consistent horizontal listing seems weird to me, because the bullet-point suggests a vertical list. —RuakhTALK 17:01, 17 January 2009 (UTC)
In the absence of specific policy guidance, I have been listing them vertically. There are cases where a form is followed by a parenthetical qualifier such as (UK), and this seems more likely to be visible if the items are arranged vertically rather than horizontally. However, despite my being an opponent of over-compacting the Etymology and Pronunciation sections, I can see real merit in having a way to collapse the Alternative spellings/forms section when there are more than a couple of items listed. Earlier periods of English employed myriad different spellings of some words, and in many cases these other spellings are now obsolete. I somewhat favor the idea of a collapsible box for this section (as is done for Related terms), to be used when more than X terms are present, and think X=2 or 3 would be a sensible cutoff. --EncycloPetey 20:42, 17 January 2009 (UTC)
Bullets are the de facto standard, although as Ruakh says, if a group share a certain characteristic (e.g. are all archaic), they could probably be put on one line.
IMO we should reconsider the placement of ===Alternative spellings===; there are powerful structural reasons for the placement of ===Etymology=== and ===Pronunciation===, but those reasons don't seem to apply to alternative spellings. The default placement often results in ontologically incorrect arrangements, where a spelling that actually pertains only to one sense, POS, or ety is placed so as to appear to apply to the whole language section. -- Visviva 04:45, 18 January 2009 (UTC)
It is permitted, in situations where the spellings do not apply to all parts of speech, to place the section at L4 under only the relevant POS. It is permitted because we do it, and there is no guiding policy at all. The ELE makes no explicit recommendations about this section at all. Its preferred location must be inferred from the example, which is the only place in that document that it is mentioned. Even Wiktionary:Alternative spellings does not mention that this is a possible section for a page, but rather discusses only the items that may be regarded as alternative spellings. --EncycloPetey 05:07, 18 January 2009 (UTC)

I agree with EncycloPetey that once the number of alternative spellings exceeds two or three, we should house them in a rel-table. However, for both functional and æsthetic reasons, they ought to be listed vertically, not horizontally. See slave#Alternative forms and enmity#Alternative forms, in which cases no other præsentation is practical.  (u):Raifʻhār (t):Doremítzwr﴿ 19:02, 18 January 2009 (UTC)

The use of {{rel-top}} conflicts with the floating right table of contents under my circumstances. The difficulty is that, if the window is narrow enough, all content is pushed below the right-hand-side table of contents. That would seem to argue against any features that did not have nice word wrap and any fixed-width tables or similar features. To see the problem (if you use a right-hand TOC), go to slave and narrow your window. Also expand the show/hide, even if you do not get the white space. DCDuring TALK 16:32, 19 January 2009 (UTC)
I don’t use a right-hand ToC, and don’t know how to use one, so I don’t understand the problem.  (u):Raifʻhār (t):Doremítzwr﴿ 16:54, 19 January 2009 (UTC)
What is "floating right table of contents"? --AZard 16:50, 19 January 2009 (UTC)
Most users will see a table of contents in the upper left of a page that is long enough to generate one. The TOC pushes all text on the page to begin after the TOC. Some users dislike this default setup, and have arranged for a customization that puts the TOC in the upper right corner of the page and has the text begin in the upper left, alongside the TOC instead of below it. This is what we mean by a "floating right table of contents". DCDuring is pointing out that having collapsible tables near the outset of the page interferes with this customization, which is why some (like myself) have never opted to use the right-floating TOC. It can interact badly with images, collapsible tables and other items. Nevertheless, some people prefer this customization and have grown attached to it. --EncycloPetey 17:17, 19 January 2009 (UTC)
By default, the table of contents floats to the left, but it can be changed to float to the right, freeing up a great deal of space. See WT:PREFS. Hopefully this or a no-TOC view will become standard in the future. -- Visviva 17:11, 19 January 2009 (UTC)
I agree with EP that a vertical listing is preferable, as this allows for notes on them (in my experience, alt spellings are very rarely arbitrary). I also agree that a collapsible box is a good idea. I sort of agree with Visviva on reconsidering the placement, but with some caveats. Under our current nesting scheme, if an alt spelling applies to multiple POS's, an L4 alt spelling header is a poor choice, as it would force the duplication of information. A below the fold L3 might be acceptable, but I almost feel as though it would be a bit nonstandard (although I will admit that when an inflection applies to multiple POS's or etymologies, I have been placing trailing L3 inflection lines, in lieu of repeating the information). However, when an alt spelling applies to only one POS (including when there is only one POS), I would be ok with an L4 header being the standard. -Atelaes λάλει ἐμοί 02:45, 19 January 2009 (UTC)
Summary: There seems to be consensus that a vertical list of more than 2 or 3 alternative spellings is an inefficient use of above-the-fold space. There seem to be two potential solutions: 1) collapsible box - vertical list (at the sacrifice of right-handed TOC) or 2) below-the-fold (Level 3 header for most entries. I found an existing example: tire-pressure. Level 4 for specific POS or etymology.) Solution #2 affects all entries with alternative spellings, not just those entries with multiple alternative spellings (and would require a vote to modify the ELE). Did I miss anything important? --AZard 21:38, 19 January 2009 (UTC)
If a user is already looking at the entry for the primary spelling of a word, why would a user care about alternative spellings (or variant spellings)? If we feel users place a high value on alternative spellings, then the collapsible box makes more sense. If it's more like trivia, then below-the-fold makes sense (L3, maybe right above Anagrams? L4, maybe right above Synonyms?) I've looked at a few dozen definitions with alternative spellings; I'm leaning towards trivia. What is your opinion? Valuable or trivia? Any other ideas on helping us pick one over the other? --AZard 21:38, 19 January 2009 (UTC)
According to the Nielsen Norman Group ("NNG") there is an argument that, from the perspective of a user who has come to a main entry from an alternative spelling, it is useful to have something on the page that corresponds to what the user had possibly entered, like the small-font line for redirects. This argues against concealing alternative spellings under show/hide bars or having them below the fold. OTOH, most users know about the "Back" button, as NNG also points out. Our users are probably mostly browser-smart, so I wouldn't mind seeing the alternative spellings just above Anagrams. DCDuring TALK 22:52, 19 January 2009 (UTC)
In English, the alternatives are usually (1) archaic spellings, (2) slight variations, or (3) regional differences. I don't think anyone is terribly concerned about how placement would affect the first two situations, so let's ask ourselves what it would mean if color/colour had the alternative forms explanation moved to the bottom of the page. I don't have an answer to that; I'm suggesting a narrower focus on a situation where the potential for real impact exists.
DCD, your suggestion of "just above anagrams" assumes that the alternative spellings apply equally to all parts of speech. Whatever we decide must provide for situations where the alternatives apply to just one part of speech or to many parts of speech. If we lump them all in a single end section, that presents problems in cases where the alternatives apply only to senses under a single etymology or single part of speech.
We also need to consider what this would do to CJKV languages, where (as I understand it) the alternative forms can be very, very important to the user. I'd like to hear from people who work in those languages before we carry a particular line of thinking too far. --EncycloPetey 03:26, 20 January 2009 (UTC)
FWIW, I care about the first situation as well as, probably, the second situation.  (u):Raifʻhār (t):Doremítzwr﴿ 18:27, 22 January 2009 (UTC)
The suggestion of Azard which you attribute to me makes no assumptions whatsoever. OTOH, I made some. I would like to reanalyse. The Wiktionary-usage scenarios that I believe ought to be foremost in our mind are of occasional mostly non-contributing users (mostly unregistered), either English-language-mostly or English-language-learners. I assume that a user comes to a given entry seeking the meaning of something read or heard.
If coming from something read, then the user has a particular spelling in mind. When the user enters the spelling and comes to an alternative spelling entry, the user's search might be over upon seeing the main word for which the spelling was the alternative. (It would be useful that the alternative spelling entry itself contained any context information about its use, notwithstanding the duplication-of-information problem. Bot updates or transclusion would be nice for this purpose.) Otherwise, the user has the main entry itself as a resource, including the "also" line at the very top. If the user knows the word under one of its alternative spellings we save the user time by having the alternative spellings on top. That would suggest that all and only spellings currently used by a significant(?) fraction of ordinary users need be listed at the top of the page. It would also suggest that they not be concealed under a show/hide bar. Space conservation would argue for horizontal arrangement. In this scenario, some users would probably be saved a modest amount of time from having context information adjoining the spellings. Because most context information is brief or even terse (eg, US, UK, Australia), I would argue that even three alternatives would often fit on one line.
If coming from something heard, then there is an intermediate step possible if a user's naive phonetic or semi-phonetic spelling does not correspond to a lemma or one of the alternative spellings. There is also potentially the confirmatory value of the Pronunciation section material the user finds accessible.
What this leads me to is the desirability of keeping all the useful confirmatory material (the top "see" line, current alternative spellings, pronunciation, and toc (the toc possibly abbreviated)) visible at the top by default. -- IOW, our current position! -- (Whether the assumption of rapid and broad accessibility of the Pronunciation information is warranted seems beyond the reach of facts at our disposal.)
Historical information, such as older alternative spellings, does not seem to be justified in taking large amounts of space by this argument. Information of principally scholarly interest should certainly be included, but not take up above-the-fold space by default. Whether a space-conserving show/hide bar for older alternative spellings should appear beneath the alternative spellings header, under Etymology, or at the bottom (because of incompatibility with right-hand table of contents) I don't know. DCDuring TALK 18:22, 22 January 2009 (UTC)
IMO, the best solution would be to have any list of alternative spellings exceeding two in number tucked away in a rel-table and the ToC collapsed by default for unregistered users.  (u):Raifʻhār (t):Doremítzwr﴿ 19:40, 22 January 2009 (UTC)
Why? DCDuring TALK 23:17, 22 January 2009 (UTC)
Certainly don't hide them if they can be fit on one or two lines. Seems a shame to grab a heading above the fold, and then give the reader nothing, when a dozen or more terms separated by commas could be offered in the same space. Collapsible boxes on the page should be a last resort, especially above the article core. Michael Z. 2009-01-23 00:17 z
I don't understand how alternative spellings are confirmatory. Because we don't use redirects, someone searching for an alternative spelling will get their confirmation on the alternative-spelling page. If they click on the link for the main entry, they will presumably be expecting information on that spelling, not the one they entered originally. -- Visviva 00:13, 23 January 2009 (UTC)

Certainly exhaustive lists of historical forms don't need to be visible by default. These seem to me to belong in the “Etymology” section, rather than “Alternate forms” which comes first for practical reasons. Even if some historical forms aren't ancestors of the modern form, they belong in the etymology for comparison.

We have missed the simplest of all alternatives: plain lists without bullets. This may not always be the most appropriate form, but even complex lists with groups or comments can be listed in a sentence or paragraph. Some of the above-linked examples can be remade as below. (What are α-form and β-form?) Michael Z. 2009-01-22 22:21 z

Alternate forms

sade, sadi, tsade, tsadi

Alternate forms

crease, creese, keris

Alternate forms

[skl]-initial α-forms of the 14th century include sclaue; 15th c. sclaue, sclave; 16th c. sclaue, sklaw, sklaue, sklave. [sl]-initial β-forms of the 16th c. include slaif, slaue, and the modern form slave, 17th c. slaue and slave, whenceforth the modern spelling predominated.[1]

  1. ^ “slave, n.1 (and a.)” listed in the Oxford English Dictionary [2nd Ed.; 1989]


This is a rather subjective thing, but I hate the word ‘filmology’. Apart from anything else, it's not in any dictionary I own. At the least, it is a specifically American term and I wonder if there isn't anything more inclusive. What about just ‘Film’? The other problem I have (because I work in television) is that a lot of these terms pertain to TV as well, but I can't think of a Category name that incorporates all that. Any thoughts? Ƿidsiþ 20:22, 17 January 2009 (UTC)

Though perhaps too broad, there's always "Visual media". That incorporates both film and TV without adding in too much other stuff. —Leftmostcat 20:27, 17 January 2009 (UTC)
I had the same initial reaction when the category first appeared, but have come to accept it as a necessary evil. If there is another appropriate term, I don't know what it is. "Visual media" is too broad, since it includes several additional artforms. "Film" is more restrictive and is ambiguous since that word has a physical properties sense of a "membrane". "Cinematography" is restricted to only certain aspects of filmmaking. So, the best I could suggest is "Filmmaking", but I'm not sure that term is quite right either. --EncycloPetey 20:30, 17 January 2009 (UTC)
I would go with Category:Film myself, or Category:Cinema if that is too ambiguous. I don't think it is ambiguous, though; IMO any science/tech category would have to be at "Film processing" or similar, not just "Film". "Filmology," which I understand as the academic study of film, might be appropriate for words like auteuristic, but not for words like gaffer. (I think our current definition of filmology is substantially incorrect, see e.g. [1].) If neither "film" nor "cinema" are acceptable, as a last resort we could perhaps use "motion picture industry"...-- Visviva 04:36, 18 January 2009 (UTC)
I could go with Cinema. There's some ambiguity, but the translations in most major languages that are cognates to cinema refer to the industry of making movies. I'd prefer it over Film, whose cognates in most languages would refer only to a particular print of a film or the physical medium on which it was produced. --EncycloPetey 04:43, 18 January 2009 (UTC)
Thanks to the inscrutable magic of {{context}}, {{film|and|TV}} renders correctly: (film and television). May not be the most elegant solution, but it gets the message across, and looks a little better than plain {{film|TV}}. -- Visviva 04:40, 18 January 2009 (UTC)

I'm exploring Wiktionary. I found a significant error in the Wiktionary logo: "a multilingual free encyclopedia". Wiktionary is not an encyclopedia but a dictionary. Who can change that? Best regards.--Kwj2772 07:17, 19 January 2009 (UTC)

That is not an error. Notice the text is in light gray because it is the definition for the preceding term in the list (not visible). The definition of Wiktionary is in black. --EncycloPetey 07:21, 19 January 2009 (UTC)
This is the first time that I realize that that logo is supposed to depict a boldened list entry... __meco 13:57, 19 January 2009 (UTC)

I still think we should find a logo that matches the logos of the other projects by being shades of blue and roundish. There were a bunch of good ones proposed using speech bubbles, yet we continue to use the boring text-only version because of some protest of how the vote was organized. --Arctic.gnome 18:43, 23 January 2009 (UTC)

Linking of language name in translation section

Why should some language names be linked in the translations section[2]? __meco 13:28, 19 January 2009 (UTC)

See WT:TOP40; the most common and familiar language names, including the ones clearly associated with a country, are not linked, and the others are. (There used to be exactly 40 in the first table.) But not something anyone needs to think about too much, as AF will make the adjustments. Robert Ullmann 14:12, 19 January 2009 (UTC)

Category for IPA entries

Could we have a hidden category displaying all word entries that have an IPA entry, for each language? That would assist me immensely in getting the hang of the IPA thing and surely lead to a lot more word entries getting an IPA entry. __meco 14:02, 19 January 2009 (UTC)

I don't understand. How would that help? Each language uses a different collection of symbols, and a language like English does not follow predictable patterns in pronunciation as related to its spelling. What language(s) are you seeking to be able to code IPA for? --EncycloPetey 17:10, 19 January 2009 (UTC)
You can use CatScan to search for entries in your desired language's category (with some depth to allow for different parts of speech) by use of the template IPA.—msh210 17:13, 19 January 2009 (UTC)

Links to Indices

User:Panda10 raised a good point on my talk page. It would be nice to link entries back to the language indices, where such exist. It would also be nice to support "previous alphabetical" and "next alphabetical" links on language page entries - though we should treat these ideas as separate for now. Are other people of this opinion? Conrad.Irwin 23:38, 19 January 2009 (UTC)

Yes and no. I agree that a link to the index is a superb idea, and I agree that a link to previous or next is a superb idea iff there's no link to the whole index; and otherwise it's a good idea anyway. I do not, however, agree that the ideas should be kept separate for now.—msh210 21:11, 20 January 2009 (UTC)

In terms of implementation, it would probably be best to include a small link under the language name (much as the prominent interwiki link preference does). This could be added by javascript, or by inserting a template onto each page under the correct language heading. Other possibilities include adding it in a ===See also=== section, and, though I hate them, a right floating box. Conrad.Irwin 23:38, 19 January 2009 (UTC)

A tiny link right under the language heading sounds good to me.—msh210 21:11, 20 January 2009 (UTC)
If we choose to implement this, then I agree with msh210. --EncycloPetey 22:12, 20 January 2009 (UTC)
Does {{seeindex}} meet with general approval? It would take two edits per new lemma form creation to maintain all these links, which is not a huge number, but also not an insignificant number. If we were to link only to the Index, it would only need adding to each page once, so only reduces the workload by a third. It could also be done in javascript, but that is a less "nice" solution. Conrad.Irwin 01:08, 21 January 2009 (UTC)
I think {{seeindex}}, with the links to previous and next entries, would be fantastic in terms of helping us and our users see the lay of the land. The mutual invisibility of entries has been a problem for a while. On the other hand I am troubled by the thought of basically tripling* the number of revisions entering the database, and the number of write operations. I don't have the skills or knowledge to evaluate how many resources this would actually consume, but it seems like the cost over time (in server or other resources) could be non-trivial, particularly as the rate of new-entry creation continues -- it is hoped -- to increase. -- Visviva 02:32, 21 January 2009 (UTC) *OK, given the form-ofs and whatnot, not actually tripling, but multiplying by some non-trivial factor.
I think you are overestimating; we only need to edit on "creation of lemmas". That is much less than an interwiki bot, which must edit on the creation of any page on any wiki. Conrad.Irwin 14:09, 21 January 2009 (UTC)
I really like the way the links generated by {{seeindex}} look. How will this template handle the red links that have just been introduced in the index? Will it always point to the next blue link? Can you explain the maintenance? What are the two edits for each new lemma? --Panda10 03:14, 21 January 2009 (UTC)
This would be another discussion entirely, but I don't think it's a great idea to put redlinks in Index: pages. For one thing it violates the principle of least astonishment: indices normally list the content actually in a book, not the content that should be in the book. :-) IMO it would be better to use Wiktionary:Requested entries or -- for particularly important groups of words -- a specialized hotlist. -- Visviva 04:42, 21 January 2009 (UTC)
The index was relatively invisible to the general user community and was mainly used for maintenance. So in this respect, the red links were very helpful to me. If we decide to bring the index forward and make it visible, then yes, it would look better with blue links only. As you said, Visviva, a separate category would work fine listing the red links with a pointer back to the page where they are mentioned. --Panda10 13:03, 21 January 2009 (UTC)
The words in the index are still all in Wiktionary (with the exception of the italic words in Index:Ancient Greek which all come from a list that was originally stored in that index at the request of the user who asked for that index) it's just that they don't all have their own entry pages. It doesn't bother me particularly whether people want these words listed in the indices, or whether they'd prefer I generated some Requests pages from them. The forward and backward links would have to link to the Translations section of the entry that includes the word; if they were going to include the red-links. This might be a bit confusing - so it's probably better if they ignore the red-links. Conrad.Irwin 14:09, 21 January 2009 (UTC)
Are the indices being automatically updated now? Or would that be part of this change? -- Visviva 02:32, 21 January 2009 (UTC)
I run scripts to update some indices (those that people have asked me for) as and when, I currently think about every fortnight or so is often enough. See User:Conrad.Bot for more gory details. The process evolved more than I programmed it, and that is really beginning to show; but until I find something that I can't just "hack on" to it, I'll leave it as it is. Conrad.Irwin 14:09, 21 January 2009 (UTC)

Surely there's an easier way to link to indices without individually editing each page? Maybe the links could be added to other common templates that are already on the page. Nadando 02:38, 21 January 2009 (UTC)

There is a possible template-based non-JS solution... If the index (or perhaps a list generated from the latest dump) were chunked into sizes small enough that the parser wouldn't choke on them -- maybe 50-100 words per chunk -- a metatemplate could be generated for each chunk which would contain the code for each word. So the code for {{metaindex-hungarian-absz}} or whatever would be something like:
These metatemplates could then be updated by bot on a daily/weekly/whateverly basis, without needing to touch the individual entries. (Of course, periodically the list would need to be rechunked, but if we're clever enough about the original setup that could probably be done with minimal pain.)
There may be reasons why this isn't a good idea, but I thought I'd throw it out there. -- Visviva 04:36, 21 January 2009 (UTC)
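For readers trying to picture the chunking idea above, here is a minimal sketch in Python. It is not the wikitext Visviva had in mind, which is not reproduced on this page; the function name `neighbour_table`, the tiny chunk size, and the sample Hungarian words are all assumptions made for illustration. It only shows how an alphabetised list could be cut into chunks, with each chunk carrying the previous/next entry for every word it contains.

```python
from typing import Dict, List, Optional, Tuple

Neighbours = Tuple[Optional[str], Optional[str]]


def neighbour_table(words: List[str], chunk_size: int = 50) -> List[Dict[str, Neighbours]]:
    """Split an already-sorted word list into chunks of `chunk_size`.

    Each chunk is a dict mapping a word to its (previous, next) entry.
    Neighbours are read from the full list, so the links carry on
    across chunk boundaries. (Hypothetical helper, for illustration only.)
    """
    tables: List[Dict[str, Neighbours]] = []
    for start in range(0, len(words), chunk_size):
        table: Dict[str, Neighbours] = {}
        for i in range(start, min(start + chunk_size, len(words))):
            prev_word = words[i - 1] if i > 0 else None
            next_word = words[i + 1] if i + 1 < len(words) else None
            table[words[i]] = (prev_word, next_word)
        tables.append(table)
    return tables


# Sample data; plain code-point sorting is used here, not Hungarian collation.
index = sorted(["alma", "ablak", "asztal", "abszolút", "ajtó"])
tables = neighbour_table(index, chunk_size=2)
```

With fixed-size chunks like these, inserting one new word shifts every later chunk, which is the rechunking cost mentioned above; a real setup would presumably need chunk boundaries that stay put between updates.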
It's an interesting idea certainly, but I'm not sure it's actually any nicer than editing each page. Every edit to the "hundred-word" page would cause all hundred pages that include it to be re-parsed; and as you say, if re-chunking is necessary, then we still have the same problem. I suppose the "nicest" solution would be to have actual MediaWiki support for these indices and the links thereto; but I can't see that happening for several months even if we had a specification and the time to code it up. Conrad.Irwin 14:09, 21 January 2009 (UTC)
Is there a way to create a template that doesn't contain the prev/next word, but dynamically would figure out what the prev/next is when the entry page is displayed? --Panda10 13:03, 21 January 2009 (UTC)
My other thought was to put the list onto the toolserver, and use Javascript to do a lookup; this has the advantage that the page doesn't ever need to be edited, but the disadvantage that it won't make "real" links; maybe a half and half solution is better; whereby the pages contain a link to the index, and then javascript can be used to generate forward and back links? Though, as I said before, editing two pages on the creation of one lemma-form entry is still very doable, particularly when you compare it to the much larger task done by the interwiki bots editing all pages on all Wiktionaries whenever any page is created. Conrad.Irwin 14:09, 21 January 2009 (UTC)
I like JavaScript best, on consideration (and I can see that my idea above would not really have improved things). The lack of edits is an important factor -- after all, just setting up a template-based system would require hundreds of thousands of edits, to say nothing of ongoing updates. But in addition, JS would allow all sorts of arbitrary customizations. If there are situations where it would be nice to, say, allow entry-to-entry browsing through all of the verbs in a language, or if there are people who would like to browse through a reverse index, etc., it would just be a matter of posting the requisite files and flipping a setting (either in the default skin or in PREFs/user JS). Could we host the JS-index files locally, or is toolserver better for this sort of thing? -- Visviva 03:45, 24 January 2009 (UTC)
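The lookup approach discussed here comes down to a search in a sorted list. Below is a rough sketch, written in Python rather than JavaScript purely for illustration; the names `neighbours` and `reverse_index` and the sample words are hypothetical, and nothing here reflects how the toolserver or any actual script worked. It assumes the list holds only words that have entry pages, so a word without a page still gets sensible previous/next links.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def neighbours(sorted_words: List[str], word: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the (previous, next) entries around `word` in a sorted list.

    `word` need not be in the list; in that case the result is the pair
    of existing entries it would sit between.
    """
    i = bisect_left(sorted_words, word)
    found = i < len(sorted_words) and sorted_words[i] == word
    prev_word = sorted_words[i - 1] if i > 0 else None
    j = i + 1 if found else i  # skip over the word itself when present
    next_word = sorted_words[j] if j < len(sorted_words) else None
    return prev_word, next_word


def reverse_index(words: List[str]) -> List[str]:
    """Order words by their reversed spelling (a simple reverse index)."""
    return sorted(words, key=lambda w: w[::-1])


existing = ["cat", "cow", "dog"]  # sample list of words that have pages
```

Browsing only the verbs of a language, as suggested above, would just mean passing a different pre-filtered sorted list to the same lookup.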

The following is copied from user talk:msh210:

This is a relational operator, and is more like a verb than a preposition. We usually mark these as "Symbol" rather than trying to fit them to a part of speech. --EncycloPetey 23:20, 19 January 2009 (UTC)

It is both a verb and a preposition, and there are loads of examples for each. Why would one mark it "symbol" if it fits a POS perfectly?—msh210 23:21, 19 January 2009 (UTC)
Please provide an example of prepositional usage. I can't think of one, which means it doesn't fit a POS perfectly. --EncycloPetey 23:23, 19 January 2009 (UTC)
Added.—msh210 23:25, 19 January 2009 (UTC)
That's not a preposition, that's [negating adverb] + [comparative adjective] + [preposition]. --EncycloPetey 23:26, 19 January 2009 (UTC)
Huh? In English we'd read it as a phrase, yes, but if there were a word for it in English (as there may be in some languages) it would be a preposition. I mean, suppose "notlessthan" were a word. What POS would it have in "for all 'x' notlessthan 3"? Preposition, of course, like "over" in "for all 'x' over 3". So that's what ≮ is then.—msh210 23:31, 19 January 2009 (UTC)
As part of a phrase, yes, not as a part of speech. Since this is a Translingual entry, it would have to translate as a preposition into every language where it is used, but it doesn't even translate that way in English. Would you call "not older than", "not wider than", "not later than", etc. prepositions? No. They're all examples of a particular phrasal construction in English, and not one of those would merit an entry. They're all sum of parts, combining several parts of speech. We don't get to invent hypothetical words to justify a part of speech label. --EncycloPetey 23:36, 19 January 2009 (UTC)
I think the tradition here is to mark things as adverbs if they act as adverbs, as nouns if they act as nouns, as prepositions if they act as prepositions. Thus, for example, carry the message to Garcia is called a verb even though it is a long thing with a noun, a proper noun, and a verb (inter alia) in it; over the top is called an adjective even though it is actually a prepositional phrase (preposition plus the complement (or whatever it's called) of that preposition); and next to is called a preposition even though it's an adverb+preposition. Same here: ≮ is used as a preposition, so it's a preposition, no matter how it splits up. (And note that it doesn't even split up!! Only its English gloss does. I'm saying that even if it would, it'd still be considered a preposition, and a fortiori in the case at hand.)—msh210 23:41, 19 January 2009 (UTC)
That is indeed the tradition for English phrases, but not for any symbolic Translingual entries. Also consider: Abbreviations, Contractions, Initialisms, etc. are categorized as such, and not as any particular POS. And you haven't demonstrated a function as a preposition; I still disagree on that point. You certainly haven't demonstrated it for German, Dutch, French, Italian, Japanese, and all other languages that make use of this symbol. --EncycloPetey 23:44, 19 January 2009 (UTC)
Would you like to have some beer with me and with a few of our mutual friends?—msh210 23:49, 19 January 2009 (UTC)
Additional opinions and suds could clarify the matter. --EncycloPetey 23:52, 19 January 2009 (UTC)

The preceding is copied from user talk:msh210. Please continue discussion here.

On whether to classify the ≮ symbol as a symbol or to assign it a part of speech: As a symbol. Reasoning: The symbol "+" is now classified as a symbol, although it could be classified as a preposition, following the example of plus. A similar consideration holds for other symbols in Category:Translingual symbols, including ⇐, ⇒, ⇔, ∀, and ½ — a meaningful part of speech can be determined for their rendering in words, but they are still classified as symbols. --Dan Polansky 08:46, 20 January 2009 (UTC)
I view such entries as stubs waiting to be fixed: the symbol header is fine if nothing else fits, but should be replaced if something does.—msh210 17:15, 20 January 2009 (UTC)
The current common practice at Wiktionary is to use the symbol header, as witnessed by the entries at Category:Translingual symbols. What you are proposing is a change in the common practice.
On the topic of whether the current practice should be changed, I do not see how changing the heading from one "Symbol" heading to several PoS headings is going to help anything. It is unclear to me how to determine the SoP (AKA lexical category) of such phrases as "greater than" or "sooner than"; neither strikes me as a prepositional phrase. It seems to me that the SoP of such phrases might as well be undefined. I am no export on the determination of SoP, but it seems not every subsequence of words in a sentence can be assigned a SoP; the string "Socrates is" comes to mind. --Dan Polansky 19:04, 20 January 2009 (UTC)
You mean POS ("part of speech", i.e. lexical category), not SoP ("sum of parts", i.e. not idiomatic). —RuakhTALK 19:28, 20 January 2009 (UTC)
Oops, there I go; right. And, quoting myself, "I am not export". Sigh. --Dan Polansky 21:02, 20 January 2009 (UTC)
I agree that "symbol" is better, since it seems to be a unit of meaning rather than of syntax. In English it can be a verb or a preposition; it can also be a noun (as in "≮ and < are complementary relations", which resembles mention, but does seem to be use when you really think about it: it's talking about the relation, not the symbol denoting it), and possibly other things as well. (And I'm not sure the preposition-like use is really a preposition; it seems more like an adjective that takes a directly construed complement. This is rare in English — most of our complement-taking adjectives use a preposition, like "full of ___", "concerned with ___", etc. — but there are examples, such as worth, that seem apposite. And since ≮ is also a verb, we could actually view its preposition-like uses as a participle, "not being less than ____".) And of course, other languages may give it completely different POSes. (BTW, this is neither here nor there, but I think next to actually is a preposition, at least in some cases; when we P-strand, we say "the statue next to which I was sitting", not *"the statue to which I was standing next".) —RuakhTALK 13:36, 20 January 2009 (UTC)
Not sure I follow. How is this different from set, which also is a noun, verb, etc., and which also has the characteristic that other languages give a different POS to some senses than we do (e.g., one adjective sense has ingesteld listed as a translation, but we list that as a verb)?—msh210 17:15, 20 January 2009 (UTC)
One way it is different from set is that set is an English word used in English in those ways. To treat a Translingual symbol entry similarly would be a nightmare, as we would have to justify any part of speech for each language in which the symbol is used. Sometimes (as you note) the translation of a word into another language results in a change in the part of speech. For example, although English considers language names to be proper nouns, Slovene regards them strictly as adjectives. In English, we use a noun for the concept of a year, but in Navajo they use a verb. An idea does not always have a universal part of speech assigned to it; the POS depends upon the norms of the particular language in which the idea is expressed. So, ≮ cannot be assigned a part of speech solely based on an analysis of its translation into English. --EncycloPetey 18:52, 20 January 2009 (UTC)
What EP said. —RuakhTALK 19:28, 20 January 2009 (UTC)
By the same logic (i.e., that it might have a different POS in some language), every ==Translingual== word should have no POS listed. (===Cardinal number=== might be an exception, since it's not really a POS.) What do we do, then, with all the taxonomic names we currently list as translingual proper nouns (or, sometimes, nouns), e.g., Lemmus lemmus?—msh210 20:21, 20 January 2009 (UTC)
Taxonomic names have internationally accepted guiding documents that stipulate things like: "The name of a genus is a noun in the nominative singular, or a word treated as such, and is written with an initial capital letter." (ICBN 20.1, 2006 ed.) The orthography and part of speech have been set by international agreement. --EncycloPetey 20:51, 20 January 2009 (UTC)
So lemme get this straight (and this is a question for y'all, of course, not just EP). Any ==Translingual== entry except taxonomic names and ===Cardinal number===s is automatically a ===Symbol===?—msh210 21:04, 20 January 2009 (UTC)
I don't see that anyone has argued for that. We have Letters and Abbreviations, for example, and a very few Translingual phrases and adverbs (all from Latin, and I think the POS-neutral "Phrase" is superior), but not much else that isn't a Symbol, numeric symbol, or taxonomic name. There are many items that are indeed given as "Symbol", such as chemical formulae (H2SO3), mathematical symbols, and paragraph and section markings, among others. I classify as "Symbol" those items that are not expressed with letters (or their equivalent) and are not an attempt to write a word in any traditional sense (although it might be possible to express the same idea with one or more words). Items that are Symbols typically have a written form that is abstract, geometrical, or... symbolic. It's a very fuzzy concept, the symbol, but in Indo-European languages it's a bit easier to distinguish a symbol from other categories of items, because our writing uses either letters or an abugida alphabet. Our definition of symbol seems to reflect that viewpoint. --EncycloPetey 21:21, 20 January 2009 (UTC)
Re: "By the same logic [] , every ==Translingual== word should have no POS listed.": Well, one small difference is that even in English, it's not obvious what POSes something like ≮ has — it can be used in a variety of POSes, and will be read differently to fit its use in a sentence. (This occurs even within a POS: it may be "is not less than" or "are not less than", for example.) But yes, that's a good point. It might be worth considering what kind of words and such are genuinely translingual, and coming up with appropriate parts of speech, such as "Taxonomic name" or just "Name". —RuakhTALK 21:22, 20 January 2009 (UTC)
Note: name is synonymous with proper noun when discussing a particular thing or entity. --EncycloPetey 21:28, 20 January 2009 (UTC)
Not necessarily. In English the names for particular things or entities are often proper nouns, and often non-count nouns. A proper noun in one language is not necessarily a proper noun in another (as in your Slovenian example above; those are still the names of languages, but grammatically they function as adjectives, right?). —RuakhTALK 21:49, 20 January 2009 (UTC)
Granted, but then they aren't names in Slovene, are they? They're descriptions. Slovene refers to languages by describing them, rather than naming them. If a word is truly the name of a particular thing, then it is a proper noun. So introducing a new POS header of "Name" is superfluous. And in the case of taxonomic names, the part of speech is specified (as noun) in the international documents that govern their formation, acceptance, and use. --EncycloPetey 22:09, 20 January 2009 (UTC)


Two years ago, we had this discussion about renaming the "AHD" pronunciation scheme, which led to a series of votes, culminating in this vote where we renamed it to "enPR". The main impeti for this, as far as I can discern, were that:

  • This is our own system, not the same as the one used by the AHD.
  • "AHD" is the name of another dictionary, so it seems weird to name our pronunciation system after them.

Now a Wikipedian, who shall remain nameless, has commented at Wiktionary talk:English Phonemic Representation and raised the possibility that this is in fact the same system as used by AHD. (He's also accused us of plagiarism, and purports to have notified the publishers of the AHD.) As far as I can tell, he's mistaken — I do see some differences, such as the second vowel of city (which we give with "i" and the AHD gives with "ē") — but the differences seem to be minor. By and large, our system seems to be very close to the AHD's. I think a lot of the similarities are standard — for example, there's a long tradition of ē for /iː/ — but I'm not sure if all are.

Is this something we should look into? If the system really is the same as the AHD's, should we consider either changing it, or returning to the name "AHD"?

RuakhTALK 03:03, 20 January 2009 (UTC)

We should update the table, as Hippietrail has begun doing. Most of the enPR symbols in the table were added last summer by two users, neither of whom had been with Wiktionary for very long at the time (one appears to have edited the chart among his first edits on Wiktionary, and the other had been here six months at the time). The update will require that people who use enPR notation insert the symbols they've been using, or that someone use a bot to extract a symbol list from all the calls to {{enPR}}. Additionally, there is some discussion on the table's talk page about problems with symbols listed in the table. Two of these issues have sat unresolved for some time, and a third one was started recently. There is a chart on Wikipedia comparing several phonemic systems that can be used to judge what other dictionaries have used, and how much difference exists between such publications. --EncycloPetey 03:13, 20 January 2009 (UTC)
It begs the question, why do we use it at all when we have IPA? (I can read neither, so it doesn't really bother me). If AHD assert their right to the alphabet they use, then we can't use it under any name - as forcing us to always display their name with it is not compatible with the GFDL. I strongly suspect that they won't, in which case it matters little what we call it, though maybe we should change the wording from "designed to be similar to" to "strongly based on" to reflect the only slight deviation. If the purpose of enPR was to be understandable to those who already know the system used by the AHD, should we not remove the differences that do exist, lest we cause more confusion? Conrad.Irwin 19:43, 20 January 2009 (UTC)
Our system is not identical to AHD's. Our system differs from that of the AHD at least as much as their system differs from those of other major dictionaries. There are a number of significant differences, and a few more potential differences have been under consideration for some time. We allow IPA, enPR, and SAMPA for Wiktionary entries, each for their own reasons. The purpose of enPR was to make pronunciations more accessible to Americans, where IPA is still a rarity and where dictionaries such as Webster's, AHD, and Random House have used an alternative system for decades. These systems all differ from each other, so making our system identical to any one of them does not actually improve anything. We already discussed the name and differences in enPR and had a vote on the name. Please see those old discussions for more. --EncycloPetey 20:01, 20 January 2009 (UTC)

I promised Ruakh that I would desist from the previous discussion he mentioned, as I agree it had become exceedingly childish, but I hope he won't mind me summarizing my opinion here. I'll try to leave childish argumentation behind.

EP has made repeated claims that EnPR is not the same as AHD. However, he has refused to back up that claim, despite a week of discussion as to what these alleged differences might be. (That's the last mention I'll make of EP.) As far as the symbols themselves are concerned, they are identical but for two minor differences:

  1. EnPR uses aʹ and a' for 1ary and 2ary stress, whereas AHD uses aʹ and aʹ. Note however that when EnPR was copied over to Wikipedia, it was with standard AHD stress marks, not the Wiktionary ones, and the difference was so minor it does not appear to have even been noticed.
  2. EnPR uses a period rather than a hyphen for syllable breaks. However, this appears to be a misrepresentation by the key, as the Wiktionary entries I've seen all use EnPR with the AHD hyphen.

EnPR has been represented as a compromise between AHD, Random House, and MW. However, if you look at where those systems differ, in every instance EnPR follows AHD. (See the comparative table at Wikipedia:Pronunciation respelling for English.)

Ruakh suggested above that "city" has a final <i> in EnPR rather than final <ē>. However, EnPR is not used for city. It is used for January, and there the final vowel is <ē>, as it is in the AHD. There may be slight differences elsewhere, such as whether the vowel of sing should be that of sin or of seen, but such differences (assuming they exist) would be in the application of the system, not in the transcription system itself, which would remain that of AHD. Besides, since these minor rules are not spelled out in the key, there is nothing to maintain their stability, and when in doubt people would be likely to follow the example of the AHD itself. And any such differences would have to be transparent despite not being covered by the key, and so effectively trivial. Trivial differences do not make a convincing case for the transcriptions being different. Furthermore, the AHD often gives more than one transcription to cover dialectal variation, which would subsume the kinds of details we're grasping for here.

So it would appear that EnPR and AHD are effectively identical, so close IMO they're equivalent to misspelling a word or two in the lyrics of a song and claiming that therefore they aren't a copyright violation. So yes, I do think this is blatant plagiarism.

Now, AFAIK you cannot copyright a transcription system, so we have every legal right to use the AHD transcription. And plagiarism isn't a crime, just highly unprofessional. The concern expressed earlier was that if we call it "AHD", that might be trademark infringement. This is the current situation at English Wikipedia, and I've directed the attention of the permissions dept. of Houghton-Mifflin to both Wikipedia and Wiktionary and asked them to comment on what we're doing. Of course, it's very possible they couldn't care less.

Regardless of legality and the opinion of AHD, I believe the only honest thing to do is to either give the AHD full credit for their transcription system, or to develop one that is truly our own. I personally don't care which, but do have a few suggestions if we decide to go the latter route. We wouldn't have to change much. The macrons and breves on the vowels are learned by every schoolchild, so are universal among such systems, and they make up the vast bulk of the special symbols. All we would need to do is fiddle with the 'other' symbols:

  • Among the consonants, these are basically <KH> and italic <th>, as all the other digraphs are close to universal. The latter is ripe for change, as its formatting is lost when cut and pasted, confounding it with plain <th> (a complaint I've seen elsewhere). Most Americanist dictionaries distinguish the two TH's with formatting as the AHD does, or use a symbol like <th̸> that will get messed up by font rendering on many readers' browsers, and so are not ideal for computer transcriptions. The exceptions are COD, Cham, and AB, which all use <dh>.
  • Among the vowels, IMO the most intuitive change would be to make the rhotic vowels phonemic. That is, to use the same vowel symbol regardless of whether it has a following R. AFAIK, no Americanist dictionary does this consistently. The changes from AHD would be <ār> for <âr> and <ēr> for <îr>, as they're transcribed in the COD. (The other EnPR rhotic vowels are all phonemic.)

  • It might be worthwhile making secondary stress more distinctive, since very often it marks phonemic lack of stress (so-called "tertiary stress").
  • Finally, though this isn't currently listed in the key, it might be worth considering northern European <ö> to go along with <ü> for the rounded front vowels.

Changes like these would make EnPR as distinct from the published dictionaries as they are from each other, but yet IMO would still be readily accessible to someone like me who's been educated in US schools. Kwamikagami 08:53, 21 January 2009 (UTC)

a comment on the dissimilarity: it really is practically the same as AHD (pace "Our system differs from that of the AHD at least as much as their system differs from those of other major dictionaries"). If you look at the other systems like Random House & Webster, this will be obvious (and I have as I made the comparison chart in wikipedia).
(and a by-the-way question: I'm not clear on what the significance of me being one of the new wiktionary summer editors is.) Ishwar 15:04, 21 January 2009 (UTC)
it's easy to make the enPR chart different from AHD. Just do it. & then you can claim that it's significantly different but based on it and other dictionaries. Ishwar 15:07, 21 January 2009 (UTC)
The AHD's system has some trivial variations, which are seen in the paper version, on the nine-year-old website, which uses inline images to represent some characters, and in the more recent version, which uses Unicode characters. Our “EnPR” system is practically identical to AHD's, even duplicating the graphical nuance of Bartleby's obsolete image technique (specifically, we all use various methods to represent the paper dictionary's bold and roman stress marks, sĭ-lābĭ-fī′, and the small caps in loch, KH, and bon, N, sometimes appear raised). Substituting middle dots for hyphens for syllabification doesn't make this a novel system: the two are commonly used as equivalents in typography.
Making incremental changes to this for the sake of originality doesn't seem productive. Who's going to change the 3,300 pronunciations already out there in our dictionary?
If we were to abandon it and use another, then I suggest we pick an existing one. An independent standard has the advantage that it is already finished, and there is no reason for us to mess with it. As amateur volunteers, we are justified in compiling information from published sources, but not so much in noodling around with our own original transcription systems, so let's use one developed by lexicographers. If we can find one which is in the public domain, then this type of controversy won't arise.
Candidates include a chart from the 1913 Webster's Dictionary, a pre-1923 Oxford English Dictionary fascicle, and the 1911 Concise Oxford Dictionary. I suppose Webster's would be ideal, since it is the Americans who demand this. See w:Pronunciation respelling for English for a summary, but it would be best to find an original PD copy to start from. Michael Z. 2009-01-21 18:05 z
Michael, "Oxford English Dictionary" and "Concise Oxford Dictionary" are still trademarked, so they'd present the same problem as "American Heritage Dictionary". The transcription systems are not protected, AFAIK, just the names.
There's something to be said for choosing a system that retains both the macrons and the breves. That way all symbols are unambiguous even to someone accustomed to a different dictionary. Among the dictionaries in the comparison chart, only the AHD and old COD editions do this, though I don't know if the COD system dates back to 1911. (The old OED system would be unfamiliar to almost everyone.) Actually, the old COD is really nice apart from using italicized vowels instead of a schwa. That could be a typographic concession on our part. But it wouldn't resolve our worries about trademark infringement. Kwamikagami 19:17, 21 January 2009 (UTC)

Please compare the AHD table of values with the ones we list to see the differences. There are several AHD symbols that we do not have at all. The Wikipedia article on AHD has them, but we do not: e.g. œ and ü. I am surprised that no one has been able to spot these differences. We also include symbols they do not, e.g. i, although Hippietrail has only recently added this into the table. The need for a symbol was discussed some months ago (I haven't been able to locate the discussion yet), in which British and American speakers came to the realization that there was a consistent difference between UK and US pronunciation of "i/y" in certain situations. At that time, we agreed about how to represent this difference in IPA, but I need to find the conversation in order to determine whether enPR was part of that discussion.

If you look at the discussion associated with our system, there have been suggestions made about changing other symbols as well, as the symbols in the table for these sounds were added without discussion last summer, and possibly without comparing our current usage at the time. Some of the changes proposed by Kwamikagami are in line with the proposals already made. If we can agree on an option for these cases, then we can proceed. The discussions that noted the problems seem to have stalled months ago, so it is good to see them active again. --EncycloPetey 19:38, 21 January 2009 (UTC)

Thank you for answering this point! As for œ and ü (yes, I noticed them, and that you included them in the Wikipedia version of EnPR), IMO the omission of symbols for non-English sounds is a trivial difference, and we're still dealing with the AHD. And the i may or may not be EnPR ... So the essential difference between EnPR and the AHD would seem to be that the EnPR is not stable.
Michael raised the serious concern that modifying a system while it is in use makes it impractical to maintain the coherence of the dictionary. IMO we should either stick to the AHD system we have (I mean, who cares about GA / RP differences like i? Americans are the only ones who are going to use this transcription anyway, and we have the IPA for the Brits) and give it proper credit, or we should settle on a distinctive Wiktionary variant and stick with it. If the latter, as Michael asked, who's going to revise all the existing AHD transcriptions, which are already somewhat incoherent from several minor changes which have never been followed through on? Kwamikagami 20:06, 21 January 2009 (UTC)
I agree that omitting the 4 foreign sounds from AHD doesn't make our system novel. It hasn't differed significantly from the AHD's transcription since August 2004.[3] Although it appears to have changed in the details rather regularly, which doesn't do anything for its usefulness as a “standard”. If any of the symbols have changed at all, then our transcriptions in entries are unreliable, serving only as a placebo to mollify the IPA-haters. Michael Z. 2009-01-21 20:18 z
Re: " [] our transcriptions in entries are unreliable, serving incredibly useful, if only as a placebo to mollify the IPA-haters": FTFY. :-)   —RuakhTALK 23:33, 21 January 2009 (UTC)
If that makes me smile is it an indication that I am lacking integrity? Cheers. Michael Z. 2009-01-22 15:54 z
  • I was the one who introduced both the IPA and AHD pronunciation systems to Wiktionary. I seeded them and helped them grow in the early days years ago but no longer have much to do with them since the community is larger now and there are many things for me to do.
  • At the time I had the mistaken belief that there was an "American dictionary pronunciation system" that all American dictionaries used. I suppose I expected there was a system used by linguists before IPA and that the same system was also used in dictionaries.
  • When I made the first version of the American system I did in fact have only the AHD system at hand and never made any secret about that fact. Only later did I come to realize that other American dictionaries had different systems with mainly the macrons in common.
  • In the early days I did not label either the IPA or American style pronunciations. Soon came font wrappers for each so they were displayed correctly and likewise when SAMPA was added. At some point a template for IPA began to be used which had a label. Soon people also wanted to label the American system so a name for it was needed. It may have been at this time that I searched for the name for the system only to find that it was AHD's proprietary system and not a general system used by American linguists and dictionaries.
  • The American style pronunciations then were labelled AHD and this continued for some time. In several discussions I reiterated the story of how I came up with them, that I wanted an American style system because I knew that many people were not comfortable with IPA. It was never my intention that our American system mimic one proprietary system. I disliked the italics and couldn't find definitive Unicode characters for primary and secondary stress.
  • At some point after this I stopped caring much about the pronunciations on Wiktionary at all. There were lots of arguments about how we should use IPA as well for a while and I ended up losing interest and left it up to the community. It was some time after this that the vote appeared on changing the name to enPR.
  • Initially there were more differences between our IPA and non-IPA systems as I was using it. But as others started to do pronunciations they didn't keep those differences. One thing I was trying to do was unify British and American pronunciations wherever possible as the IPA is a phonetic system and I had hopes that our other system could be more phonemic. The lost differences were various optional sounds in parentheses: I used (r) for postvocalic "r" pronounced only by rhotic dialects and also for "connecting r" as used word-finally in non-rhotic accents. I also used (ə) after any "l", "m", "n" which could be considered syllabic to allow both possibilities in a single transcription, and (j) or (y) to allow both the British and American pronunciations of "news", "emu", etc.
  • The pronunciation chart was not made or maintained by me. I seem to recall we had several different ones on several different pages at various points. I had a table of my own where I was comparing how IPA was used in different ways in various dictionaries.
  • The /i/ segment used for final "-y" and possibly other unstressed "i" sounds has been used here by me for years rather than months in both IPA and American systems. It has been discussed more than once. Oxford dictionaries began using this symbol and I have seen it in at least some other dictionaries.
  • I recommend coming up with a new system of our own which keeps the schwa, the macrons and breves including the two-letter versions, and discusses possibilities for all other phonemes. I hate the italics in "th" and would like to suggest the Icelandic / Old English letters ð and þ instead.
  • If there are any questions I've doubtless left out some things here. — hippietrail 01:25, 23 January 2009 (UTC)
Your IPA notes are very valuable info. In addition to the English chart for readers, we need English transcription notes for editors with this kind of information. I guess this could be added to Wiktionary:Pronunciation#Phonetics and phonology. Michael Z. 2009-01-23 18:45 z
It's been a week, and I haven't even had acknowledgement from AHD. I guess they don't care one way or another. I still think it's best for us to either be upfront and call our system AHD, or to modify it significantly (and of course update all the articles which use it!).
It may, of course, not be a good idea to make our own system. But here's the kind of thing that would strike me as fair for a non-credited transcription:
This has the additional advantages of displaying properly with very limited font support, and of being copy&paste-friendly. —Kwamikagami 11:22, 25 January 2009 (UTC)
I'm not sure about a couple of the vowels, but this looks like an overall improvement to me.
I think the only way to reliably update existing pronunciations would be to give this a new name, and enter it with a new template, and let it happen on its own time. Michael Z. 2009-01-26 05:59 z
You're right, of course. That's the way to go. (Which vowels?)
But back to my original point: should we rename the EnPR system, and template, AHD, and stick to it in detail? Kwamikagami 22:58, 26 January 2009 (UTC)

Dictionary perpetuation of nonce word

I looked about for a policy element covering this and I can't figure it out. What exactly is the policy for a word that is in virtually all dictionaries, but only because, well, it's in all dictionaries? The word in question is the French abdicataire; a cursory search tends to confirm the note helpfully given in the fr:wikt entry: "A nonce word of w:Chateaubriand". The word is otherwise perfectly usable: it sounds perfectly legit (e.g. not particularly contrived) and follows a fairly productive pattern. It's just frozen in dictionaries (and I have to wonder why it hasn't been cut before!). Should we go with our CFI or bow to dictionary "consensus" with a note like at fr:? Circeus 03:28, 22 January 2009 (UTC)

*sigh* A better directed search revealed a few uses (in Belgian legal language, also about Bhutan and Sardinia). The point still stands regarding other similar words. Circeus 03:31, 22 January 2009 (UTC)
Here are some that we have decided to keep in the past- dord, zzxjoanw. Nadando 14:47, 22 January 2009 (UTC)
In principle, these brazenly fail to meet CFI. In practice, they are hard to get rid of once created; the community is reluctant to delete them for various ill-considered reasons. Some sort of consistent placeholder template (similar to {{only in}} but allowing for some additional data) would perhaps be ideal. -- Visviva 00:36, 23 January 2009 (UTC)
I corrected your link so that people can understand what you are talking about. I have never seen this word, but it is true that most educated French speakers would be able to infer the meaning from the construction. If you have some examples of use by other authors, please add them to the French Wiktionary. Koxinga 14:56, 22 January 2009 (UTC)
It depends on the nonce. In this case, it might meet CFI even without the Belgian legal uses and so on: I think Mémoires d'Outre-Tombe could be considered a "well-known work". —RuakhTALK 15:18, 22 January 2009 (UTC)

Ottoman Turkish

There was a discussion in the ibrik article about whether we should use the Turkish template along with Ottoman Turkish, since Ottoman Turkish is actually within the Turkish language. The result seems appropriate, but since the topic is general and not specific to the word ibrik, we decided to bring it up here for discussion as well. Following is a copy of the recent discussion on the related article's talk page:

I noticed that you changed the language from Turkish to Ottoman Turkish in the etymological sections. Of course, most of the Turkish loans to Western languages took place during the Ottoman period. But I should state that Ottoman Turkish is a period of the Turkish language, not a separate one. That is to say, Turkish is not a language that started in 1923; what started in 1923 was only the modern Turkish period. And as you may have noticed, major online English dictionaries prefer the term Turkish rather than Ottoman Turkish, even if the word was loaned into English during the Ottoman period.
Here is a Britannica article mentioning the 4 periods of the Turkish language, namely the Old Anatolian and Ottoman Turkish, Middle Ottoman Turkish, Newer Ottoman Turkish and Modern Turkish periods: the article
Therefore, I propose that we write Turkish first, and Ottoman Turkish beside it if the word was loaned in the Ottoman period, in the etymological sections of the words. Thus, a reader may see a complete list of the words of Turkish origin in the related category; and if s/he wants, s/he may also see the list of the words that were loaned in the Ottoman period in the Ottoman Turkish category. --Chapultepec 15:11, 24 January 2009 (UTC)
I agree, Ottoman Turkish is a period of the Turkish language. However, Old French is a period of the French language, Middle English is a period of the English language, and yet they are all listed here in the etymologies. Yes, major sources prefer Turkish, because that is how it has been done historically. The language was not known as "Ottoman Turkish" to foreigners until the development of Modern Turkish. Most sources also write words in the Roman script (and sometimes in their reformed, modern pronunciation); however, Ottoman was written in Arabic script. What you can do is list Modern Turkish words that have descended from Ottoman Turkish under the Ottoman Turkish headwords as Descendants. --Dijan 18:21, 24 January 2009 (UTC)
Thank you for the comments. But, like French and English, Turkish language has earlier periods as well. Such as Seljuk era Turkish (or Turkic), Middle Turkic, Old Turkic etc. I do not even count them. What I try to explain is that the Ottoman era is a very recent era, and when a word is mentioned that it is of Turkish origin, this does not comprise only the modern Turkish period. Let me give an example with a link to ibrik article in Turkish Language Association's online dictionary. As we can see, its etymology is given as Arabic, and Ottoman Turkish is not mentioned as predecessor at all: TDK - ibrik
As for the major online dictionaries, as we all know they are modern dictionaries and are all up-to-date. But they still use the term Turkish in their etymologies, even though modern Turkish has existed for almost 90 years.
And as for the script change, alphabet changes do not necessarily imply a change in the languages. Let's take some east European and ex-USSR languages for example, several of them went through alphabet changes in 1990s. But this did not make them different languages.
Although it does not correctly reflect what I try to explain here, another solution is also possible; we can enter the Ottoman Turkish template and term firstly, and the Turkish template and term beside it, in parentheses. (of course if the word belongs to the Ottoman era) --Chapultepec 19:14, 24 January 2009 (UTC)
I understand your point. I am not against Modern Turkish listings at all, however it would be greatly appreciated if you bring it up with the rest of the community in the Beer parlour (Community portal) for discussion. --Dijan 07:39, 25 January 2009 (UTC)
I am pleased to see that you are not against it, thank you. But, I feel the necessity of stating it once again :), the term Turkish language comprises not only modern Turkish period, but also the Ottoman Turkish era. Until 1928 it used the Arabic alphabet, and thenceforward the Latin alphabet is being used. --Chapultepec 18:58, 25 January 2009 (UTC)

--Chapultepec 19:25, 25 January 2009 (UTC)

I'm not sure I understand why we would need to say both "Turkish" and "Ottoman Turkish". How is this different from what we do for Latin, which is just one language, but for which we distinguish several periods in etymologies? --EncycloPetey 14:41, 26 January 2009 (UTC)
As I stated above, like Latin, Turkish language has earlier periods as well, such as Anatolian Seljuk Turkish, Seljuk era Turkic, Middle Turkic, Old Turkic etc. I do not even count them at all. But the Ottoman era is a very recent era, and the major difference is the script change, nearly 81 years ago. As we can all remember, several countries in east Europe and ex-USSR went through alphabet changes in 1990s as well. So, I can say that it is a display in both scripts. --Chapultepec 15:53, 26 January 2009 (UTC)
The issue ultimately boils down to how we divide languages here on Wiktionary. On Wiktionary, Latin is a single language (all of it), where English, Middle English, and Old English are separate languages, like Greek and Ancient Greek are separate languages. The logic behind this is an adherence to ISO 639 codes, which have different codes for all three Englishes ({{en}}, {{enm}}, {{ang}}), but only one for Latin ({{la}}). Turkish and Ottoman Turkish have different codes ({{tr}}, {{ota}}). Thus, we absolutely cannot simply say "from Turkish" when something is "from Ottoman Turkish," as, on Wiktionary, Turkish specifically means modern Turkish. Now, it is true that our etymologies are often not consistent in this area, as words from Old French are often cited as "from French" and words from Ancient Greek are often cited as from "Greek." However, this is something which is slowly being fixed, not promoted. We do occasionally break from 639 standards. For example, we treat all of Hebrew as "Hebrew", even though there is a separate code for Ancient Hebrew, and we lump all the Nahuatls together. However, these are both the result of formal proposals. If you would like to treat Ottoman Turkish and modern Turkish all as "Turkish," then you would need to put forward a comprehensive proposal, which includes not only etymology format, but also entry and translation format, among other things. -Atelaes λάλει ἐμοί 19:51, 26 January 2009 (UTC)
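To make the code-based division described above concrete, here is a rough sketch in Python (not anything Wiktionary actually runs). The codes and names are the ones mentioned in this thread; the function `etymology_label` and its `merge_ottoman` switch are hypothetical, added only to show what "from X" would read under the current split and under a merger.

```python
# Codes and names as mentioned in the discussion; nothing here is real Wiktionary code.
LANGUAGE_NAMES = {
    "en": "English",
    "enm": "Middle English",
    "ang": "Old English",
    "la": "Latin",
    "tr": "Turkish",
    "ota": "Ottoman Turkish",
}


def etymology_label(code: str, merge_ottoman: bool = False) -> str:
    """Return the 'from X' wording for a language code.

    merge_ottoman is a hypothetical switch: when True, Ottoman Turkish
    is folded into Turkish, as a merger proposal would require everywhere.
    """
    if code not in LANGUAGE_NAMES:
        raise ValueError(f"unknown language code: {code!r}")
    if merge_ottoman and code == "ota":
        code = "tr"
    return "from " + LANGUAGE_NAMES[code]


print(etymology_label("ota"))                      # current practice
print(etymology_label("ota", merge_ottoman=True))  # under a merger
```

The single switch is the point of the sketch: a merger would have to apply in every place a code is looked up, not only in etymologies.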
Yes, I have now checked the ISO 639 codes; there is a different code for Ottoman Turkish, and I have no objection to that. But this is simply because of the alphabet change, not due to language change. Turkish language also has earlier periods like Middle Turkic, Old Turkic etc corresponding to those of Greek or English, namely Middle English and Old English. And, for instance, we know that major online English dictionaries generally prefer to use the term Turkish in etymological sections even if the word was loaned during the Ottoman era. Following is a citation from the Ethnologue website:
"ISO distinguishes this code from [ota] on the basis of time, [tur] applying to the Turkish language since 1928, and [ota] applying to the Turkish language prior to 1928. The year 1928 corresponds to the year in which writing reform occurred, changing from Arabic to Latin script. Thus, these two codes are distinguishing between the Arabic- and Latin-based writing systems rather than between languages. This goes against the normal practice for ISO 639-x, as described in clause 4.1.3. Thus, we deem that this language is also covered by the ISO code [ota]."
Here is the link for the above citation. What I try to do is find a simple solution rather than merge the two templates. Therefore I suggested to append the Turkish template where applicable. --Chapultepec 22:27, 26 January 2009 (UTC)
Hmm.....that was an interesting cite. Thank you. However, the fact remains that we sort of need to do this all or nothing. If it is better that we treat everything as a single Turkish language, then we also need to deprecate {{ota}}, reformat everything in Category:Ottoman Turkish language, and set up a policy about how this all works. If we do it half-assed (i.e. just do it in etymologies, but nowhere else) we just end up with a mess, a very confusing mess for our readers. —This unsigned comment was added by Atelaes (talkcontribs) at 22:37, 26 January 2009 (UTC).
If you say we can end up with a mess, then we should merge the templates. But this generalization should be under the name "Turkish", since there are newer Turkish loanwords that do not belong to the Ottoman era. For instance, when the user reads that the word sultan comes from Turkish instead of Ottoman Turkish, it's not a problem. But if he reads that the word doner kebab comes from Ottoman Turkish, he will be confused, since that word is from the modern period. Maybe we can embed the term in Arabic script beside the one in Latin alphabet for etymologies, such as from Turkish sultan سلطان < from Arabic ..., but this can be technically problematic since we will be expected to do the same for declensions etc. Deprecation of the [ota] template is also possible; maybe other users can share their ideas as well. --Chapultepec 23:10, 26 January 2009 (UTC)
From what you say, and what Ethnologue says, I agree that it makes sense to merge Ottoman Turkish and {{ota}} into Turkish and {{tr}}. I somehow had the impression that Atatürk also pushed through linguistic reforms, but I guess you can't change a language overnight in the way that you can a writing system. —RuakhTALK 02:35, 27 January 2009 (UTC)
Yes, he pushed through some linguistic reforms to rid the language of the yoke of Arabic and Persian. He replaced some words with Turkish equivalents: existing Turkish words where available, newly derived ones where not. But most of the old words remained in the vocabulary. Of course it was not possible to change it overnight. So we gained some new words, but that is quite normal, considering for example the tens of thousands of new words English has acquired within the last century. As for merging {{ota}} into {{tr}}, yes, that is what I think too if it is generally agreed. But the question here should be how it is going to be achieved. --Chapultepec 03:49, 27 January 2009 (UTC)
If there are different ISO codes, then the policy here postulates that we have different languages. Similarity in the vocabulary is obviously no argument, otherwise we would have merged Bosnian, Croatian and Serbian into SH a long time ago. There is no language on this Earth which has remained unchanged for 710 years (Osman in 1299 coming to power) - even Icelandic has some changes in comparison to Old Norse and there are several new words for recently emerged novelties (albeit not borrowed from English). Just as non and is are different languages, from what I read here (many new words, different ISO-codes, Seljuk Turkish, ergo the following is evidently Ottoman Turkish) I came to the conclusion that these two (ota and tr) are quite dissimilar as well. To put it mildly, I oppose the merger. Bogorm 07:39, 27 January 2009 (UTC)
(refutation of the merger appeal) Here is how БСЭ (BSE) judges this matter: (translated roughly) The literary [Turkish] language began to emerge in the mid-19th century, abolishing the Ottoman literary language, which was fraught with Arabic and Persian loanwords - whence two conclusions: 1) the vocabulary is far more different than claimed above, and 2) there was no (Modern) Turkish language before 1850, just Ottoman Turkish (cf. Middle French, MHG). Bogorm 07:53, 27 January 2009 (UTC)
But ISO 639 starts Ottoman Turkish from 1500, not from 1299. Considering that modern English also started roughly in 1550-1600, that starting date seems normal. As for the literary language, this is a different thing. Ottoman literary language was a variety of the Turkish language used by the higher class for literary and administrative purposes only (and full of Arabic and Persian words), but it was not the mainstream Turkish language. The related article of Larousse Encyclopedia in Turkish also supports this.[1] Therefore Ethnologue deems there to be no difference between the two languages, and sees only a script change in the year 1928.[2] And let us not forget, similarly Katharevousa was used for formal and official purposes in Greece until 1976, even though modern Greek has existed since at least the 15th century. --Chapultepec 14:39, 27 January 2009 (UTC)
As I stated above a couple of times, Turkish language has earlier periods as well, such as Old Anatolian Turkish, Seljuk Turkish, Middle Turkic, Old Turkic etc which correspond to those of the other languages mentioned above. I have a Turkish English dictionary dating from 1856 in PDF. I have looked through the dictionary, and found almost no difference except for the script change.[3] For the ones who may be interested, here is the link. And here is the link for the online contemporary Turkish dictionary by the Turkish Language Association; anyone interested can compare them. As for the merger, I considered it only because it was written above that we could end up with a mess; if there will not be any problems, I am fine with the easy solution too, simply appending the template. And as for the claim that there was no Turkish language before 1850, this also contradicts what we are discussing here. We are discussing the [ota] and [tr] codes, and their split date is 1928. --Chapultepec 15:39, 27 January 2009 (UTC)
Here are two citations from "An Analysis of ISO 639" by SIL International:
"[tur] “Turkish” and [ota] “Turkish, Ottoman (1500–1928)” both map to the language Turkish [TRK]".[4]
"The name “Turkish, Ottoman (1500–1928)” suggests to us that the code element [ota] was created to distinguish Turkish literature written in Arabic script from more recent Turkish literature written in Latin script. In other words, the primary distinction is based on script rather than linguistic differences. The alternative is that this represents a linguistic distinction based on time, but there is no other precedent for such distinction in the period since 1500, and a claim of a purely linguistic distinction with such a recent boundary is suspect. Also, the year 1928 corresponds with the year in which orthographic reform for Turkish took place".[4] --Chapultepec 12:22, 28 January 2009 (UTC)
1. "Osmanlıca." Büyük Larousse Ansiklopedisi (Turkish edition). Vol. 15. Gelişim Yayınları, 1986.
2. ISO 639 Code: tr in Ethnologue.
3. And except for the new words naturally, but the same goes for the English part too, i.e. you cannot find words like "computer", "database", "laser" etc.
4. SIL International. "An Analysis of ISO 639". pages 17, 18 --Chapultepec 00:06, 28 January 2009 (UTC)
In the light of information and sources given above, and if there is no objection, I would like to apply the initial suggestion, namely appending the Turkish template where applicable in the etymological sections. --Chapultepec 23:27, 31 January 2009 (UTC)

template:typesetting and other context labels

If I enter {{typesetting}}, the template changes that to a different concept: “metal typesetting”. Entering {{context|typesetting}} does the same. (If you happen to view the template's page, it helpfully adds “For more general terms, use {{typography}}”.)

It's not only annoying to type what you're thinking and have a template second-guess you: you might not even notice that your text is changed (as I didn't notice here for about six months).

Shouldn't context templates have identical names and text, to avoid such mistakes? Michael Z. 2009-01-26 18:15 z

Yes and no. Template:legal can't be called law any longer (as that's a language code), but still needs to display law in entries for backward compatibility. I'm not saying that's the usual situation, but it does come up. As far as typesetting, I, for one, would have no problem changing its text if someone first goes through each of its calls and verifies that typesetting is a good text to use there (or adds metal|_|). (And I'm not volunteering.) But although ideally each context template should display its name, I think each template would need to be discussed separately before changing it, rather than just saying "every context template should display its name".—msh210 19:01, 26 January 2009 (UTC)
Well, there's only 5 entries using {{typesetting}}, so even I'd volunteer to do that:) --Bequw¢τ 05:10, 27 January 2009 (UTC)
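
The mismatch discussed above (a template whose name differs from the label it displays) could be listed mechanically before anyone reviews the calls. The following is only a rough sketch: the `CONTEXT_LABELS` table and the `mismatched_labels` helper are hypothetical, the first two pairs are the ones mentioned in this thread, and the `typography` row is an assumption added for contrast.

```python
# Hypothetical table: template name -> label displayed in entries.
# "typesetting" and "legal" are the cases mentioned in the discussion;
# the "typography" row is an assumption, included as a matching example.
CONTEXT_LABELS = {
    "typesetting": "metal typesetting",
    "legal": "law",
    "typography": "typography",
}


def mismatched_labels(labels):
    """Return (name, label) pairs where the displayed label differs from the template name."""
    return sorted((name, label) for name, label in labels.items() if name != label)


for name, label in mismatched_labels(CONTEXT_LABELS):
    print(f"{{{{{name}}}}} displays '{label}'")
```

Each template reported this way would still need the case-by-case discussion msh210 describes; the list only shows where to look.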

I'm glad to sort this out, but the categories need some reworking too. Metal typesetting is a (mostly historical) subset of typography, but the two are diversely categorized. The former adds category:Printing, which is in category:Technology and category:Publishing, the latter adds category:Typography, in category:Language and category:Communication.

I think they should both just add Typography, which should also be made a child of Printing. Any objections?

I think we also need a new category—there is category:Web design, but nothing in category:Media which encompasses publishing on the Web, and related electronic or digital media. Or is category:Publishing meant to include this too? Michael Z. 2009-01-28 16:57 z

Okay, I've gone through and changed a couple of senses from metal typesetting to typography, because they are terms used in digital typesetting too. I don't know whether the Italian carattere is still in use or belongs only in metal type.
To solve the template naming problem, I'd like to move {{typesetting}} to {{metal typesetting}}, and add a convenience redirect from {{metal type}}. Another possibility is to make it more specific as {{movable type}}, the main technology used for five centuries, until the invention of hot metal type in the 1880s. {{typesetting}} should either be deleted, or redirect to the slightly more general {{typography}}. Michael Z. 2009-01-28 17:21 z
Done. Michael Z. 2009-03-03 18:19 z


I've come across an online open source Catalan-English-Catalan dictionary called DACCO (Diccionari Anglès-Català de Codi Obert). It uses the Creative Commons Attribution-Share Alike 2.5 License. Before I start importing it to the Wiktionary (and creating a reference template {{R:DACCO}} for the entries), I thought I'd make certain that doing so would be acceptable. I think so, but I'll be the first to admit that trying to figure out how various open source licenses interact is beyond my ken. Carolina wren 23:02, 28 January 2009 (UTC)

I can't seem to find the discussion at present, but my understanding is that the GFDL and CC-BY-SA are not compatible, i.e. content released under one cannot be distributed under the other outside of fair use, because the GFDL is more restrictive than CC-BY-SA, while CC-BY-SA in turn prohibits redistribution under terms more restrictive than its own. Or something like that; you may want to check the exact wording of the respective licenses. This situation is noxious and absurd, since for almost all human purposes the two licenses are identical, but there it is.
Should you want to pursue this question further, it might be worth posting at commons:Village pump, as that's where you will find the largest concentration of copyright-savvy Wikimedians. -- Visviva 06:08, 8 February 2009 (UTC)
If we ignore technicalities, we could just ask DACCO if they would let us import their data. It should be a fairly easy task to run with a bot. Would people here be interested in this happening? (If we add DACCO as a reference in the entry, and link to them in the edit summary, I can't see that the people there would mind - oh wouldn't it be nice if licenses were better written :). Conrad.Irwin 01:20, 9 February 2009 (UTC)
Right now, since Wiktionary apparently has R: templates to dictionaries still under copyright without any sort of copyleft scheme attached to them, I'm of the opinion that so long as I don't slavishly copy their defs, using them as a credited reference with an appropriate R:DACCO template should suffice. If it doesn't then there are quite a few other entries with problems. DACCO has enough quirks and differences in formats that I wouldn't want to bot import them anyway. Carolina wren 02:26, 9 February 2009 (UTC)

Foreign language sections in dictionary entries

I feel it inappropriate that full foreign language definitions are added to the bottom of Wiktionary entries. There are separate dictionaries for separate languages and, at most, there should be a link to the appropriate page, any more is wasteful repetition. Consider the page for "program": it has entries for Czech, Slovak, Norwegian, Hungarian... what is the point of having these here, complete with meanings copied from the English section above, when they already exist in their respective dictionaries? The Norwegian section even has its own verb conjugation table! M0thr4 09:44, 29 January 2009 (UTC)

Every Wiktionary has its own entry for every word in every language. Compare our Swedish section on program and its Swedish counterpart. The grammatical tags on the Swedish Wiktionary are in Swedish, whereas ours are in English. As someone who does not know Swedish (and certainly not Swedish grammatical terminology), I find our entry far more useful. This applies to a great many things which our entries often do not yet have, but will eventually, such as usage notes, regional/dialect information, etc. The foreign language sections you're seeing on program are short and simple, and will be expanded in the future with lots of information about the words written in English. -Atelaes λάλει ἐμοί 10:13, 29 January 2009 (UTC)
As Atelaes says, each Wiktionary has explanations in a different language. Compare the English Wiktionary entry for flōs with its counterpart on the Latin Wiktionary at la:flos. Which version of the entry do you find easier to read? --EncycloPetey 23:13, 31 January 2009 (UTC)

ISO 639-5

ISO 639-5 was released in May 2008 and it assigns 3-letter codes to language families (e.g. "Turkic languages" is trk). There are currently around 114 codes and they use the same "pool" as ISO 639-2 & 3 codes. 639-5 is disjoint from 639-3 (the standard for individual languages) but is a superset of the "collective" codes from ISO 639-2 (which codes both "collective" and individual languages). I think Wiktionary should employ ISO 639-5 codes for specific purposes here, such as in standardizing etymologies. These codes would allow us to standardize many of the entries in Wiktionary:Languages without ISO codes (see how). As these codes aren't valid for L2 entries we should prefix them. That way they could be restricted from being subst'd or used with {{infl}}, {{term}}. Certain templates, such as {{etyl}} and possibly {{proto}}, could be coded to look for language family codes at the specific prefix. Atelaes suggested macro: as a prefix, though that could be confusing as certain codes in ISO 639-3 are termed 'macrolanguages'. The title of 639-5 is Alpha-3 code for language families and groups, so maybe group: or family:, but maybe someone has a better idea. As there are some existing ISO 639-2 "collective" codes (and therefore 639-5 codes) currently in use, they would be prefixed as well. Thoughts on this plan? --Bequw¢τ 06:44, 30 January 2009 (UTC)

So far some of the -3 family codes (e.g. {{sla}}, {{bat}}, {{dra}} etc.) are used with {etyl}, so they should first be relocated to fam:xxx (or whatever the prefix turns out to be) before this gains official blessing. I like this idea of using a secondary namespace for families, as it carries more direct metadata separating individual languages from groupings of languages, and would prob. simplify maintenance. I'm not sure how this is supposed to work with {proto}, as that template explicitly takes the name of the family as the first positional parameter (unless it gets rewritten to support both {{proto|Indo-European|...}} and {{proto|ine|...}} types of invocation, much like some of the templates now accept both an ISO code and a full language name). --Ivan Štambuk 23:32, 31 January 2009 (UTC)
{{etyl}} can now be passed language codes that exist in the etyl: prefix ([fam] is an ISO code so also wouldn't have made a good prefix). Right now we just have ISO 639-5 codes there (see cat). Note for those wanting to create these language code templates: as these templates have limited use (they aren't {{subst:}}-able) they have a different, more useful, format than the normal language codes. --Bequw¢τ 19:50, 14 February 2009 (UTC)
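
For readers unfamiliar with the prefix idea, here is a minimal Python sketch of how a prefixed namespace keeps family codes out of language-section lookups while leaving them available for etymologies. It is not how {{etyl}} is actually implemented: the `TEMPLATE_PAGES` table and both function names are hypothetical, the lookup order (plain code first, then prefixed) is an assumption, and only the codes and names shown come from this discussion.

```python
FAMILY_PREFIX = "etyl:"

# Hypothetical stand-in for template pages: page name -> displayed name.
# tr, ota, trk and ine are the codes mentioned in the discussion.
TEMPLATE_PAGES = {
    "tr": "Turkish",
    "ota": "Ottoman Turkish",
    "etyl:trk": "Turkic languages",
    "etyl:ine": "Indo-European",
}


def lookup_etyl(code):
    """Resolve a code for etymology use: plain language codes first, then family codes."""
    for page in (code, FAMILY_PREFIX + code):
        if page in TEMPLATE_PAGES:
            return TEMPLATE_PAGES[page]
    raise KeyError(f"unknown code: {code}")


def lookup_language_section(code):
    """Resolve a code for an L2 language section: the family prefix is never consulted."""
    if code.startswith(FAMILY_PREFIX) or code not in TEMPLATE_PAGES:
        raise KeyError(f"not an individual language code: {code}")
    return TEMPLATE_PAGES[code]


print(lookup_etyl("trk"))
print(lookup_language_section("tr"))
```

Because family codes live only under the prefix, a template that does not look there (such as one used for L2 headers) simply cannot resolve them, which is the restriction proposed above.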

Hungarian form of template - new approach

I am trying to simplify the way noun forms are entered. The current approach requires the editor to know which ending belongs to which case, the abbreviated case name, and the order of parameters. It's easy to make mistakes. In the new approach, the template would figure out the case name; the editor would have to enter only the ending. I am also thinking about leaving out singular and plural. Examples for kert (garden):

  • kertben (in the garden) - inessive singular
    Current method: {{hu-inflection of|kert|ine|s}}, output: inessive singular of kert
    Proposed method: {{hu-infl|kert|ben}}, output: inessive of kert
    A supplemental grammar tag template would contain the case information: ben = inessive.
  • kertekben (in the gardens) - inessive plural
    Current method: {{hu-inflection of|kert|ine|p}}, output: inessive plural of kert
    Proposed method: {{hu-infl|kertek|ben}}, output: inessive of kertek
  • kertemben (in my garden) - possessive inessive singular
    Current method: {{hu-inflection of|kertem|ine|s}}, output: inessive singular of kertem
    Proposed method: {{hu-infl|kertem|ben}}, output: inessive of kertem

Do you think this is simpler? Are there any risks to this approach? Any feedback is appreciated. --Panda10 22:57, 31 January 2009 (UTC)
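
The proposed lookup can be sketched in Python to show the logic the template would need. This is only an illustration, not template code: the `CASE_BY_ENDING` table and the `hu_infl` function are hypothetical, and only the pair ben = inessive is given in the proposal; the other rows are illustrative additions.

```python
# Hypothetical ending-to-case table. Only "ben" = inessive comes from the
# proposal; the remaining rows are illustrative assumptions.
CASE_BY_ENDING = {
    "ben": "inessive",
    "ban": "inessive",
    "nek": "dative",
    "nak": "dative",
}


def hu_infl(stem, ending):
    """Mimic the proposed {{hu-infl|stem|ending}} output: '<case> of <stem>'."""
    try:
        case = CASE_BY_ENDING[ending]
    except KeyError:
        # An ending missing from the table must fail loudly rather than guess.
        raise ValueError(f"no case registered for ending {ending!r}") from None
    return f"{case} of {stem}"


print(hu_infl("kert", "ben"))    # inessive of kert
print(hu_infl("kertek", "ben"))  # inessive of kertek
```

The editor supplies only the stem and the ending; the case name comes from the table, so an ending that is not in the table is reported as an error instead of producing a wrong label.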

This would work as long as each possible ending is unique to just one inflectional form across all parts of speech that will use the template, and provided that every possible ending is included in the template switch. I don't know whether this approach would increase server demand or not. --EncycloPetey 23:07, 31 January 2009 (UTC)
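
The uniqueness condition above can be checked mechanically before an ending table is put to use. Below is a rough Python sketch; the `ENTRIES` list and the `ambiguous_endings` helper are hypothetical and not taken from any existing template, and the verb row is an illustrative assumption showing how one ending may serve two parts of speech.

```python
from collections import defaultdict

# Illustrative (ending, part of speech, form) rows; hypothetical data.
ENTRIES = [
    ("ben", "noun", "inessive"),
    ("nek", "noun", "dative"),
    ("nek", "verb", "third-person plural present indefinite"),
]


def ambiguous_endings(entries):
    """Map each ending that has more than one (part of speech, form) reading to its readings."""
    readings = defaultdict(set)
    for ending, pos, form in entries:
        readings[ending].add((pos, form))
    return {ending: sorted(forms) for ending, forms in readings.items() if len(forms) > 1}


for ending, forms in ambiguous_endings(ENTRIES).items():
    print(ending, "->", forms)
```

Any ending reported here could not be resolved from the ending alone, so the template would need an extra parameter (or a separate template per part of speech) for those cases.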