This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

October 2010

Categories for derivation in All topics

The Category:All topics contains a host of categories for derivations, such as "Ancient Hebrew derivations" and "Byzantine Greek derivations". This seems wrong: classification by derivation is not topical but rather lexical. And the derivation categories flood Category:All topics. Many of the derivation categories use {{topic cat}}, which is probably how they land in Category:All topics.

It seems that at some point the derivations categories were located at Category:Etymology, and indeed, this category still has many derivations categories. An example derivation category located in there is Category:Old Frisian derivations, which does not use {{topic cat}}. Category:Etymology is a subcategory of Category:Fundamental, as it should be, given that it is not a topical category.

Was there some discussion that has led to using {{topic cat}} for derivations categories?

Can we do something in order to remove all the derivation categories from Category:All topics, and place them only in Category:Etymology?

Like, I would start removing {{topic cat}} from the categories for derivations, as they are not topical categories.

Thoughts? --Dan Polansky 10:25, 1 October 2010 (UTC)[reply]

It is only because apparently, someone created these without also creating the topic cat parents to make sure they end up in Category:Etymology, and not in the main category. There should be no reason to have them somewhere else than Category:Etymology. -- Prince Kassad 12:54, 1 October 2010 (UTC)[reply]
I hadn't noticed, but it makes sense to me and seems an important distinction. I don't recall any discussion that covered this.
All categories need to have a structure that prevents well-populated classes from overwhelming less populated ones. Usually it is the more-populated homogeneous ones that need their own structure or, at least, collecting category. The language-derivation class seems to be like this.
Many categories of a linguistic nature seem to serve a special role because, even though they can be considered as topics, they refer to language (ie, our stock in trade), our entries (or parts) themselves and not their referents. That Etymology is a section in WT:ELE makes it a very special "topic" indeed.
Further, it might make sense to somehow allow for distinguishing "synchronic"/morphological etymology from "diachronic" for same-language derivations within the structure. It requires much less research to do acceptable "synchronic" derivations, which are useful. For many languages, it seems there is virtually no way to do very reliable "diachronic" etymologies, establishing dates of attestation that are not decades and centuries after likely first use in speech.
I'd favor your proposal as a first step as it would make the treatment of this type of category more uniform. Why would you not come up with a replacement template to facilitate uniform treatment of these hereafter, whatever the ultimate placement in the hierarchy might be?
  • after edit conflict:
    That last query seems in accord with the Prince's view, AFAICT. DCDuring TALK 13:10, 1 October 2010 (UTC)[reply]
    Re "Why would you not come up with a replacement template to facilitate uniform treatment of these hereafter, whatever the ultimate placement in the hierarchy might be?": I am okay with a replacement template for derivation categories, if someone proposes and designs one. And if someone creates the topic-cat-parents mentioned by Prince Kassad, that would be another first stopgap measure.--Dan Polansky 14:41, 1 October 2010 (UTC)[reply]
    I think I know what you mean. I suppose one must just put in some kind of request at WT:GP or on the appropriate user wizard's talk page and hope for the best. One can no longer hope to be able to help oneself, especially given the lack of clear documentation or access to whatever flexibility is (or had better be) built in to the category templates. DCDuring TALK 19:13, 1 October 2010 (UTC)[reply]
After some trial and error, I have entered "{{topic cat parents/helper derivations|lang={{{lang}}}|source=gkm}}" into Template:topic cat parents/Byzantine Greek derivations on the model of Template:topic cat parents/Sanskrit derivations, and this works as it should, after I have refreshed Category:Byzantine Greek derivations: it removes Category:Byzantine Greek derivations from All topics. Notice the use of helper derivations in the wiki markup.
I have removed {{topic cat}} from Category:Ancient Hebrew derivations[1][2], and it naturally works as expected. I have undone this removal.
I have found there was {{dercatnav}} and {{dercatboiler}} at some point. Dercatboiler was deleted on 4 February 2010, allegedly because it was redundant to {{topic cat}}; see Template_talk:dercatboiler.
People who have been involved in deprecating these templates include Williamsayers79[3], Prince Kassad[4], and Mglovesfun[5].
People have been moving derivations categories to {{topic cat}} since 2008, as follows for example from Template:topic_cat_parents/Sanskrit_derivations, created on 6 May 2008. --Dan Polansky 07:45, 2 October 2010 (UTC)[reply]
I have made some edits to my post. --Dan Polansky 09:32, 2 October 2010 (UTC)[reply]
I am planning to create more topic-cat-parents for derivation categories, in order to place derivation categories into the category for etymology. Please raise any objections before I proceed. --Dan Polansky 08:07, 2 October 2010 (UTC)[reply]
I've added {{documentation}} to topic cat. Essentially it's a pretty convoluted, but overall very flexible system. I have plenty of sympathy for anyone who doesn't understand the system, it's taken me months to get it, and I still find the odd surprise. Mglovesfun (talk) 09:38, 2 October 2010 (UTC)[reply]
{{topic cat}} is just its name, it can format any category correctly that uses fr and not French. Let's not get hung up on its name. Mglovesfun (talk) 10:13, 2 October 2010 (UTC)[reply]
Using {{topic cat}} for derivation categories, and using {{topic cat parents}} and {{topic cat parents/helper derivations}} to ensure their proper placement in the etymology category is now a common practice, whatever its origin. It follows from the following list of topic cat parents created for derivation categories. The list states the creating user and the date of creation.
The list: Afrikaans derivations, Williamsayers79, 21 March 2008; Akkadian derivations, Williamsayers79, 21 March 2008; Ancient Greek derivations, Williamsayers79, 22 March 2008; Breton derivations, Williamsayers79, 13 April 2008; Catalan derivations, Williamsayers79, 13 April 2008; Cornish derivations, Williamsayers79, 16 April 2008; Anglo-Norman derivations, Williamsayers79, 17 April 2008; French derivations, Williamsayers79, 19 April 2008; Middle French derivations, Williamsayers79, 19 April 2008; Abenaki derivations, Williamsayers79, 19 April 2008; Gaulish derivations, Williamsayers79, 21 April 2008; German derivations, Williamsayers79, 21 April 2008; Greek derivations, Williamsayers79, 23 April 2008; Celtiberian derivations, Nadando, 24 April 2008; Irish derivations, Williamsayers79, 25 April 2008; Italian derivations, Williamsayers79, 25 April 2008; Latin derivations, Williamsayers79, 27 April 2008; Lithuanian derivations, Williamsayers79, 27 April 2008; Manx derivations, Williamsayers79, 28 April 2008; Middle English derivations, Williamsayers79, 29 April 2008; Sanskrit derivations, Meco, 6 May 2008; Havasupai-Walapai-Yavapai derivations, Nadando, 17 June 2008; Halkomelem derivations, Williamsayers79, 10 July 2008; Miskito derivations, Nadando, 21 July 2008; Punic derivations, Atelaes, 30 August 2008; Fore derivations, Nadando, 14 September 2008; Evenki derivations, Nadando, 19 September 2008; Coptic derivations, CyberSkull, 25 September 2008; Oscan derivations, Williamsayers79, 7 November 2008; Southern Tiwa derivations, Nadando, 13 November 2008; Spanish derivations, Nadando, 21 December 2008; Neapolitan derivations, Nadando, 26 December 2008;
Khowar derivations, Nadando, 22 January 2009; Old Irish derivations, Leftmostcat, 30 January 2009; Dhivehi derivations, Nadando, 31 January 2009; Twi derivations, Nadando, 20 February 2009; Old French derivations, Jackofclubs, 1 March 2009; Portuguese derivations, Jackofclubs, 10 March 2009; Tshiluba derivations, Nadando, 18 March 2009; Moksha derivations, Nadando, 25 March 2009; Alemannic German derivations, Nadando, 30 March 2009; Kabardian derivations, Nadando, 10 April 2009; Western Highland Purepecha derivations, Nadando, 5 May 2009; Albanian derivations, Opiaterein, 5 May 2009; Cora derivations, Nadando, 8 May 2009; Miami derivations, Daniel., 27 May 2009; Ligurian derivations, Nadando, 11 August 2009; Mobilian derivations, Nadando, 18 August 2009; Welsh derivations, Prince Kassad, 18 August 2009; Old Church Slavonic derivations, Prince Kassad, 18 August 2009; Tamil derivations, Prince Kassad, 18 August 2009; Latvian derivations, Prince Kassad, 18 August 2009; Serbian derivations, Prince Kassad, 18 August 2009; Nauruan derivations, Prince Kassad, 19 August 2009; Ukrainian derivations, Prince Kassad, 19 August 2009; Urdu derivations, Prince Kassad, 19 August 2009; Dzongkha derivations, Prince Kassad, 19 August 2009; Tibetan derivations, Prince Kassad, 19 August 2009; Nahuatl derivations, Nadando, 28 August 2009; Romani derivations, Nadando, 28 August 2009; Purepecha derivations, Nadando, 28 August 2009; Old Spanish derivations, Nadando, 28 August 2009; Munsee derivations, Nadando, 1 September 2009; Nyunga derivations, Nadando, 1 September 2009; Galician derivations, Nadando, 8 September 2009; Phoenician derivations, Nadando, 8 September 2009; Old Polish derivations, Nadando, 10 September 2009; Caló derivations, Carolina wren, 26 October 2009; Low Saxon derivations, Msh210, 4 November 2009; Serbo-Croatian derivations, Opiaterein, 18 November 2009;
Polish derivations, Nadando, 7 February 2010; Ottoman Turkish derivations, Nadando, 14 February 2010; Lombardic derivations, PalkiaX50, 15 February 2010; Tahitian derivations, PalkiaX50, 15 February 2010; Min Nan derivations, PalkiaX50, 15 February 2010; Kannada derivations, PalkiaX50, 16 February 2010; Phrygian derivations, PalkiaX50, 18 February 2010; Nootka derivations, PalkiaX50, 24 March 2010; Tocharian B derivations, PalkiaX50, 21 April 2010; Sumerian derivations, PalkiaX50, 21 April 2010; Jurchen derivations, Nadando, 26 April 2010; West Frisian derivations, PalkiaX50, 9 May 2010; Friulian derivations, Yair rand, 25 May 2010; Ido derivations, Yair rand, 25 May 2010; Mapudungun derivations, Msh210, 2 June 2010; Jèrriais derivations, Mglovesfun, 6 June 2010; Tswana derivations, Yair rand, 25 June 2010; Algerian Arabic derivations, Yair rand, 27 June 2010; Bislama derivations, Yair rand, 27 June 2010; Burmese derivations, Yair rand, 27 June 2010; Caddo derivations, Yair rand, 27 June 2010; Cree derivations, Yair rand, 27 June 2010; Egyptian Arabic derivations, Yair rand, 27 June 2010; Sicilian derivations, Yair rand, 28 June 2010; Scottish Gaelic derivations, Yair rand, 28 June 2010; Lojban derivations, Yair rand, 28 June 2010; Slovene derivations, Yair rand, 30 June 2010; Romanian derivations, Yair rand, 30 June 2010; Nepali derivations, Yair rand, 4 July 2010; Translingual derivations, Yair rand, 4 July 2010; Tatar derivations, Yair rand, 4 July 2010; Tagalog derivations, Yair rand, 4 July 2010; Tok Pisin derivations, Yair rand, 4 July 2010; Telugu derivations, Yair rand, 4 July 2010; Vietnamese derivations, Yair rand, 4 July 2010; Volapük derivations, Yair rand, 4 July 2010; Swahili derivations, Yair rand, 4 July 2010; Cherokee derivations, Yair rand, 4 July 2010; Kurdish derivations, Yair rand, 4 July 2010; Luo derivations, Yair rand, 4 July 2010; Dalmatian derivations, Yair rand, 9 July 2010; Woiwurrung derivations, Yair rand, 9 July 2010; Ojibwe derivations, Yair rand, 12 July 2010; Middle Irish derivations, Yair rand, 12 July 2010; Old Breton derivations, Mglovesfun, 22 July 2010; Romansch derivations, Mglovesfun, 28 August 2010; Old Saxon derivations, Mglovesfun, 21 September 2010; Middle Welsh derivations, Mglovesfun, 26 September 2010; Old Welsh derivations, Mglovesfun, 26 September 2010; Aragonese derivations, Mglovesfun, 27 September 2010; Franco-Provençal derivations, Mglovesfun, 27 September 2010; Guernésiais derivations, Mglovesfun, 27 September 2010; Norman derivations, Mglovesfun, 27 September 2010; Picard derivations, Mglovesfun, 27 September 2010; Venetian derivations, Mglovesfun, 27 September 2010; Walloon derivations, Mglovesfun, 27 September 2010; Byzantine Greek derivations, Dan Polansky, 2 October 2010; Old Frisian derivations, Mglovesfun, 2 October 2010; Yoruba derivations, Mglovesfun, 2 October 2010; Mycenaean Greek derivations, Mglovesfun, 2 October 2010. --Dan Polansky 08:12, 4 October 2010 (UTC)[reply]
I have created topic-cat-parents for a host of derivation categories. For some derivation categories, I was not able to determine the language code, so I have left these without topic-cat-parents, not knowing how to create them without a language code. For a source of language codes I have used the page Wiktionary:Index_to_templates/languages.
The derivation categories without topic-cat-parents: Classical Arabic derivations, Dravidian derivations, East Germanic derivations, Hellenic derivations, High German derivations, Jewish Babylonian Aramaic derivations, Proto-Altaic derivations, Proto-Bantu derivations, Proto-Eskimo derivations, Proto-Finno-Permic derivations, Proto-Finno-Ugric derivations, Proto-Iranian derivations, Proto-Semitic derivations, Proto-West Germanic derivations, Proto-Yeniseic derivations, Provençal derivations, Sami derivations, Sinitic derivations, Tocharian derivations, Tupi derivations, Wathawurung derivations, Zapotec derivations. --Dan Polansky 09:38, 4 October 2010 (UTC)[reply]
I have created topic-cat-parents also for the categories that correspond not to single languages but to language families, or to languages for which I could not find a code. In these parents, I have used the template "topic cat parents/helper" as a workaround; I would have used "topic cat parents/helper derivations" otherwise. As a result, no derivation category is a subcategory of All topics right now. --Dan Polansky 19:46, 4 October 2010 (UTC)[reply]
A list of both topical categories and derivation categories that still lack a topic-cat-parent is here: Category:Topical categories without topic cat parent. Ideally, each category that uses {{topic cat}} should have a topic-cat-parent, or else it lands in "All topics", where it does not belong. --Dan Polansky 14:35, 5 October 2010 (UTC)[reply]
I have ensured that a topical category that has no topic-cat-parent ends up in a category "Miscellaneous" instead of "All topics". Thus the "All topics" category remains clean rather than becoming a generic container for whatever category that has been incompletely set up. I have achieved this by editing {{topic cat parents/default}}. --Dan Polansky 15:30, 5 October 2010 (UTC)[reply]
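The placement rule described in this thread can be summarised in a short Python sketch. This is not the template code itself: the real logic lives in wiki templates such as {{topic cat parents}} and {{topic cat parents/default}}, and the names `TOPIC_CAT_PARENTS` and `parent_category`, as well as the simplified parent name "Etymology", are hypothetical. It only illustrates the fallback: a category with a defined topic-cat-parent goes under that parent, and one without ends up in "Miscellaneous" rather than "All topics".

```python
# Hypothetical model of the placement rule discussed above; the real
# behaviour is implemented in wiki templates, not in Python.

# Assumed stand-in for the "Template:topic cat parents/..." subpages.
# The parent name is simplified for illustration.
TOPIC_CAT_PARENTS = {
    "Byzantine Greek derivations": "Etymology",
    "Sanskrit derivations": "Etymology",
}


def parent_category(name, parents=TOPIC_CAT_PARENTS):
    """Return the parent category for a category that uses {{topic cat}}."""
    # A category with a defined topic-cat-parent is placed under that parent.
    if name in parents:
        return parents[name]
    # Otherwise fall back to "Miscellaneous", which keeps "All topics" clean.
    return "Miscellaneous"


if __name__ == "__main__":
    for cat in ("Byzantine Greek derivations", "Some unconfigured category"):
        print(cat, "->", parent_category(cat))
```

The point of the fallback is that an incompletely set-up category is still easy to find (in one "Miscellaneous" bucket) without polluting the top-level topical listing.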

Depth of etymological categories

To pick a specific example, Category:Old High German derivations feeds into Category:Germanic derivations, and in turn into Category:Indo-European derivations. We could of course insert Category:West Germanic derivations in there. More generally, how do we decide how much to split up these etymological categories? Category:Balto-Slavic languages is a pretty good example. Mglovesfun (talk) 10:13, 2 October 2010 (UTC)[reply]


Isn't "Sign gloss" redundant?

I am still not used to the namespace "Sign Gloss", so I'm going to ask a question that I find relevant.

When a person wants to know how box is translated to American Sign Language, that person is naturally supposed to look for it in the translation table of the entry box. However, there is also Sign gloss:BOX for the same function. Apparently, both the entry and Sign gloss:BOX are functioning simultaneously.

Why are we keeping a "Sign gloss" page redundant to the entry? --Daniel. 05:45, 18 October 2010 (UTC)[reply]

For many signs, there will be such redundancy, but not always. Sign glosses are commonly used in literature that talks about sign language, and certain signs have come to be known by certain sign glosses that wouldn't merit entry in the main namespace because they're not common English phrases. For example, NOT-YET and CLOSE-DOOR are sign glosses known to pretty much everyone who has intermediate or greater knowledge of ASL, but we would not necessarily allow entries for "not yet" or "close door" in the main namespace. (Looking now, I see we do have an entry for "not yet", but hypothetically, we may not have such an entry.)
Does that answer your question? —Rod (A. Smith) 18:36, 18 October 2010 (UTC)[reply]
Yes, from your explanation I can now understand and support the existence of a namespace to list glosses for sign languages. Thank you. --Daniel. 18:28, 27 October 2010 (UTC)[reply]

Terminology of possessives

Possessives are assigned to various parts of speech: pronouns, adjectives or determiners. On Wiktionary we seem to use 'pronoun' to refer to them, but that's not always correct. Pronouns stand in for (i.e. replace) nouns or noun phrases, whereas adjectives and determiners modify (add to) such phrases. In the sentence my house is mine, 'my' is an adjective/determiner since it modifies 'house', and 'mine' is a pronoun that takes the place of the complement of the verb 'is'. The problem is that modifying possessives are often miscategorised as pronouns. I think it is important to make a clear distinction, to reflect distinctions in usage, so I think we should implement clearer standards to distinguish the two.

The difference between 'adjective' and 'determiner' is less clear. As far as I know, determiners specify the reference within the context (which? how many?) rather than an attribute (what kind?). So in that sense, possessive modifiers are clearly determiners, and not adjectives. Going with my proposal, we would then have Category:Possessive pronouns by language and Category:Possessive determiners by language. {{poscatboiler}} should at least support all three, even if no standardisation occurs. —CodeCat 17:08, 2 October 2010 (UTC)[reply]

According to The Cambridge Grammar of the English Language, my is a genitive pronoun that merely functions as a determiner. (CGEL uses determiner to refer to a syntactic function — like how "red house" is a "nominal" — with determinative being a lexical category — like how "house" is a "noun".) Obviously we can debate whether that's the best approach for our purposes, but it's not wrong. (At least, for English. CGEL makes no pretense of describing other languages.) —RuakhTALK 14:39, 3 October 2010 (UTC)[reply]
If my is a pronoun then it can stand on its own without something to modify. However, a sentence like 'that car is my' is obviously ungrammatical. So it should be labelled as a modifier. —CodeCat 17:57, 3 October 2010 (UTC)[reply]
*shrug* You're entitled to your opinion — and honestly, I tend to share it — but it's idle to describe the contrary view, espoused by what is probably the most authoritative grammar of contemporary English, as wrong/incorrect. Obviously it's not wrong to call "my" a pronoun in the genitive case, any more than it is to call "asleep" an adjective (even though *"the asleep boy" is ungrammatical); the question is whether a different label would be preferable. —RuakhTALK 22:43, 3 October 2010 (UTC)[reply]
I think traditional grammar isn't very useful in this case, since scientific views (and therefore definitions) often change over time. If we have to choose between following tradition and being technically accurate according to modern interpretations, I'd prefer the latter any day. And in any case, my point was not so much related to English specifically, but to languages in general. It's one thing to follow an obsolete definition for English, but to do that for other languages is going to be just plain confusing, since established grammars/native speakers might view such words quite differently in that language! So I think {{poscatboiler}} should, at the very least, allow categories named xx possessive adjectives and xx possessive determiners. After all, we allow ===Determiner=== as a POS header, so then {{poscatboiler}} should also support categories for them. —CodeCat 00:07, 4 October 2010 (UTC)[reply]
Who said anything about "traditional grammar" or "an obsolete definition"? Following CGEL is the very definition of "being technically accurate according to modern interpretations". —RuakhTALK 14:10, 4 October 2010 (UTC)[reply]
Then by what interpretation is a word that can only modify a noun phrase a pronoun? —CodeCat 15:53, 4 October 2010 (UTC)[reply]
CGEL, in contrast to some other modern grammars, takes the view that ordinary nouns have two cases: "plain", as in "doctor" ("The doctor spoke", "I saw the doctor", "I spoke to the doctor", etc.), and "genitive", as in "doctor's" ("I met the doctor's husband", etc.). In cases such as "someone else's", where the -'s doesn't attach to a noun, they use the phrase "phrasal genitive". Honestly, I'm not sure what the justification for this is — I think it has to do with the fact that -'s interacts with (other) inflectional endings (always "the doctors' spouses", never *"the doctors's spouses", even though both "Mr. Doctors' wife" and "Mr. Doctors's wife" are fine), but I'm not sure. I mean, obviously there are many languages where a genitive case is indisputable (Latin, German, Classical Arabic, and so on), which already puts paid to your question, but I'm not sure exactly what their basis is for asserting that English has one as well. Regardless, given that breakdown, it naturally follows that roughly the same system works for pronouns. For pronouns they replace the "plain" case with the traditional "nominative" and "accusative" cases (rejecting "subjective" and "objective" as misleading), and they retain the "genitive" case, with a minor sub-distinction between the forms that stand alone (mine, ours, etc.) and those that do not (my, our, etc.). —RuakhTALK 19:12, 4 October 2010 (UTC)[reply]
While functionally possessives are indeed genitives of personal pronouns, the main issue with that reasoning is that in many languages they themselves inflect according to the case and gender of the noun (phrase) they modify. This is the case in Dutch, German, Latin, Spanish, Polish and even in Indo-European itself. In all these languages, they are without a doubt modifiers, and most unlike regular genitives which are uninflected. However, they can often stand alone without a noun to modify. Such constructions are structured the same way as they would with any other adjective used without a noun: the adjective simply takes on the inflection of the implied (but absent) noun. Look for example at Dutch, which also does this:
Die boom is de grote. - That tree is the big (one).
Die boom is de mijne. - That tree is the my (one). That tree is mine.
The inflection of both 'big' and 'my' makes this explicitly clear, since the uninflected forms used in predicate position lack the -e.
Die boom is groot. - That tree is big.
*Die boom is mijn. - *That tree is my. (ungrammatical in both languages)
So it would seem that from a cross-linguistic perspective, the standard view of English possessives is actually rather exceptional. It makes little sense to apply it to other languages. —CodeCat 22:38, 4 October 2010 (UTC)[reply]
Oh, definitely, definitely. I completely agree. A word or construction in one language can have the same meanings as those in another while syntactically being completely different, and it's a mistake to take for granted that analyses that work for English work for any other language. Indeed, many of the problems with traditional grammar result from its misguided attempt to fit English into the mold of such languages as Latin. (The same holds, incidentally, even within a language: in Hebrew, it's obvious that we can't treat the inflected preposition שֶׁלִּי (shelí, of-me, my/mine) the same way as we treat the pronominal suffix ־י (, -me, -my/-mine), and in Spanish, it's debatable whether we should treat the "long" possessive pronoun mí@(s) (my) the same way as the "short" possessive pronoun mi(s) (my).) —RuakhTALK 23:21, 4 October 2010 (UTC)[reply]
Re: Codecat: If my is a pronoun then it can stand on its own without something to modify. That's not an accurate statement. Pronouns are the one group of words in English that strongly reflect the old declension system. The word my is a genitive form descended from a form that could only be used in a descriptive way. Its alternative counterpart mine is used to stand alone, but this distinction did not appear until the Middle English period. So, in Old English (and Middle English sometimes) my and mine were not distinct words. The history of a word can be as, or even more, important in understanding its grammar than analysis of current grammatical limitations.
That said, the part of speech for possessive words is not universally the same. See for example Appendix:Spanish pronouns#Possessive pronouns, where a situation is described in which a possessive can function either as a pronoun or an adjective depending on the syntax. --EncycloPetey 06:12, 7 October 2010 (UTC)[reply]

List of families at WT:LANGCODE

Currently, WT:LANGCODE includes three main lists of codes: languages, families and dialects.

Only the exceptional codes are displayed in these lists. That is, basically, codes like mo, sh, nds-nl, Late Latin and Sha., that are not found in basic and up-to-date lists of ISO 639-1, 639-3 and 639-5 codes.

There are thousands of languages with ISO codes, so a Wiktionarian page listing them all (or listing both ISO and exceptional codes, for that matter) would be enormous. So, the current list of codes is based on a very good approach, because it highlights which codes are unique to Wiktionary, leaving ISO codes to be found elsewhere.

However, there are very few family codes from ISO, so I propose listing them all together. Since WT:LANGCODE displays Central Semitic, Northwest Semitic and East Semitic, shouldn't it include Semitic and Afro-Asiatic as well? --Daniel. 20:10, 2 October 2010 (UTC)[reply]

I think they should be listed on a separate page. -- Prince Kassad 20:15, 2 October 2010 (UTC)[reply]
Agreed, a separate page should be good. Also, even if there are lots of language codes, maybe they can be split up into coherent groups? Grouping all Indo-European languages together for example, or all languages in Europe (or a smaller region)? —CodeCat 20:19, 2 October 2010 (UTC)[reply]
Thirded. Perhaps we should have Wiktionary:Languages, Wiktionary:Families and Wiktionary:Dialects, then. --Daniel. 23:18, 2 October 2010 (UTC)[reply]
My suggestion has been effected. Languages and families are now listed separately. However, the list of languages is still not completely comprehensive, due to the concern of enormity as expressed above. I support CodeCat's suggestion of splitting a huge list of languages by their families. Comments on Wiktionary:Languages, Wiktionary:Families and Wiktionary:Dialects are welcome. --Daniel. 01:38, 4 October 2010 (UTC)[reply]
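The idea of listing only "exceptional" codes amounts to a set difference between the codes in use on Wiktionary and the codes in the ISO lists. A minimal Python sketch follows; the two sets are tiny illustrative samples assembled for this example, not the real code lists, and the function name `exceptional_codes` is hypothetical.

```python
# Hypothetical sketch: both sets are small illustrative samples, not the
# real ISO 639 lists or the real set of codes used on Wiktionary.

ISO_CODES = {"en", "fr", "de", "nl", "sem", "afa"}            # sample only
WIKTIONARY_CODES = {"en", "fr", "mo", "sh", "nds-nl", "sem"}  # sample only


def exceptional_codes(wiktionary_codes, iso_codes):
    """Return codes used on Wiktionary that are absent from the ISO lists."""
    return sorted(set(wiktionary_codes) - set(iso_codes))


if __name__ == "__main__":
    print(exceptional_codes(WIKTIONARY_CODES, ISO_CODES))
```

For families, where the ISO list is short, the proposal in this thread is simply to skip the subtraction and list everything together.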

Simpler pronunciation representation

I've seen a lot of complaints/requests/etc while reading here for a simpler pronunciation scheme, easier for the non-IPA/SAMPA-fluent general public to understand (such as the one used at Dictionary.com). This would probably be Wiktionary-specific (i.e. created and used solely by editors here) and possibly limited to use in English. Thoughts? — lexicógrafo | háblame 22:31, 2 October 2010 (UTC)[reply]

I don't think it can really get any simpler than IPA. After all, each phoneme needs to have its own symbol, so you can't simplify without being inaccurate. —CodeCat 22:49, 2 October 2010 (UTC)[reply]
Indeed. Perhaps a better word would be "intuitive"? IPA is great, but not everybody knows the symbols. — lexicógrafo | háblame 22:53, 2 October 2010 (UTC)[reply]
No phonetic notation can get away without using extra symbols, not even enPR (which I have a serious dislike for). And let's not forget we have Wiktionary:English pronunciation key, which explains the symbols. I don't really think it can be any clearer than that. In any case, we should prefer international standards so others can understand it, and since IPA is already the most widely known, I don't think we need yet another system. —CodeCat 22:59, 2 October 2010 (UTC)[reply]
It's only in the US that IPA has not yet become standard for dictionaries. How could we make an "intuitive" system that is universal to all cultures, including those that do not speak English and that lack some sounds present in English? IPA exists precisely to address that problem. If additional help is needed, audio files are an excellent solution. --EncycloPetey 06:03, 7 October 2010 (UTC)[reply]


Wiktionary traffic?

This amusing page shows how many times per day the word "Dictionary" has been viewed on Wikipedia, in a given month: http://stats.grok.se/en/200907/Dictionary

Is there any similar tool for Wiktionary, displaying traffic per entries? Can I know how many times, and when, people saw example? --Daniel. 02:54, 21 October 2010 (UTC)[reply]

http://wikistics.falsikon.de/latest/wiktionary/en/. Apparently we're just here to tell people about MILF and sex. --Bequw τ 22:25, 23 October 2010 (UTC)[reply]
Yes, apparently there are many people interested in the abbreviation of Moro Islamic Liberation Front and the sixth Icelandic cardinal number.
Thanks for the link, that's what I wanted! --Daniel. 01:35, 24 October 2010 (UTC)[reply]
But that data is for August 2009 and is tagged as latest. Apparently, that page is no longer up to date. Is there a substitute? The uſer hight Bogorm converſation 18:28, 24 October 2010 (UTC)[reply]
It seems this link will give statistics for the page Wiktionary:Beer Parlour (for example) on Wiktionary. Note the «d» in the URL. Kåre-Olav 18:55, 24 October 2010 (UTC)[reply]
Cool, thanks.
Would it be too demoralizing for us to get some information on our top 100 or 1000 most visited pages every month? Should we include the link in every entry page, possibly in the new Statistics section? Should we have annual totals for every page or every page with an English L2 section? DCDuring TALK 20:04, 24 October 2010 (UTC)[reply]
Kåre-Olav, thank you very much for that link. Unfortunately, it displays a box named "Enter another wikipedia article title:", which can be used to search for Wikipedia pages but not Wiktionary pages, so I have to type Wiktionary page names into the address bar. I have asked the developer to fix this.
I would appreciate a list of top 100 pages per month, or other top [number] per [period] at a place like possibly Wiktionary:Pageviews, which would foreseeably be populated by bot. On the other hand, I strongly oppose placing pageviews below a Statistics section at entries; there, they would be irrelevant, meaningless in the long term and distracting. --Daniel. 02:55, 25 October 2010 (UTC)[reply]
The top page visits are somewhat interesting but don't help us understand overall utilization. We really need facts about overall usage. I have a strong suspicion that the overwhelming majority of our English entries have had no visitors other than contributors. I would love to know what kinds of entries have no page views over a year. It would be interesting to see what kind of entries interest users (besides sex, internet/texting, computer-related, word-play, and topical/news-related and seasonal entries). Making it easier for contributors to find out whether a given entry has ever been viewed might give them some assistance in determining whether some tedious kinds of work have any payoff for users. Perhaps some contributors really don't care to know, the project apparently being more a pastime than anything else, with our ostensible noble purposes being so much eyewash.
One point of a Statistics header would be to have a home for such statistics. For the overwhelming majority of entries (the ones with no usage) there is, of course, no one to distract. It is difficult to see how such information could be very distracting even on entries that have visitors.
But if we wish to protect our poor, distractable unregistered users we could:
  1. make the information not visible on the page by default.
  2. make the page-view count tool appear only on the left-hand-side frame.
  3. allow editors to opt in to having it there, lest the distraction of usage facts distract them from their efforts. DCDuring TALK 03:40, 25 October 2010 (UTC)[reply]

Duplication of citations

On WT:Citations (the proposal of policy to manage the "Citations:" namespace), there is the following piece of text:

  • If the citations page exists, it should hold all quotations and references for the term, including any inflected forms. Any quotations used within the entries would be a duplication of these.

I, personally, agree with this concept; it makes sense to keep together a possibly long list of citations while copying 0, 1, 2 or more of them into the entry. I prefer to have the chance to look at all citations of a word together when I want to, instead of having to visually scan both pages completely to achieve the same goal.

However, at my talk page and in the history of one page, Citations:be, there is the suggestion of not following that guideline; specifically, the idea is apparently to place each citation either at the entry or at the "Citations:" page, not both. Thus, effectively, one can move a citation from the "Citations:" page to the entry, causing the former to be less complete. If I am missing something, I would appreciate an explanation of the benefits of this idea. Otherwise, I would prefer to keep the comprehensive "Citations:" pages. --Daniel. 13:07, 21 October 2010 (UTC)[reply]

Duplication is inevitable: if you use the Citations page then you have to duplicate the definitions. Personally I think they are more useful and convenient on the main page under each definition line, but this is obviously unwieldy if there are very many. I don't mind what's kept on the Citations page or how many quotes there are, because I personally never use it except to store citations of a new word that hasn't been defined yet. (Hopefully eventually we will implement that nifty thing Ruakh invented and solve everyone's problems.) Ƿidsiþ 13:18, 21 October 2010 (UTC)[reply]
Which thing that Ruakh invented? I think the current Javascript show/hide for quotations does a lot to allow us to keep citations in the entry without making the entry unscannable. DCDuring TALK 14:39, 21 October 2010 (UTC)[reply]
Aargh sorry, that's what I meant, and I'm well aware we're already using it. I have no idea what I was thinking of when I wrote that... (extenuating circumstances, I've been up now for nearly 48 hours and I'm literally starting to lose it a bit) Ƿidsiþ 16:07, 21 October 2010 (UTC)[reply]
To clarify, I believe that that show/hide thing is a combination of Atelaes (talkcontribs)'s and Conrad.Irwin (talkcontribs)'s work. Specifically, I believe that Conrad.Irwin added the general support for having show/hide sections that belong to different categories with cookie-based sidebar support for user preferences to show or hide different categories by default, and that Atelaes added the specific support for a "quotations" show/hide category that works by finding unordered lists [of quotations] nested within ordered lists [of definitions]. What you were thinking of when you invoked my name was probably the thing that eventually became {{quotations-top}}; that, I had created by making relatively minor changes to the {{trans-top}}/{{rel-top}}/etc. family of templates so they could "play nice" with nested lists. The current show/hide thing is a much more dramatic change. {{quotations-top}} never gained consensus, and even if it had, it would now obviously be obsolete. I consequently deleted it a few months ago. —RuakhTALK 17:19, 21 October 2010 (UTC)[reply]
Duplication is annoying because of the risk of the two copies becoming unsynchronised. In theory perhaps we could just have template-style references to individual citations within the entry, which would be populated from the unique copy on the citations page when the entry was displayed, but I have no idea about the practicality of that. Equinox 13:21, 21 October 2010 (UTC)[reply]
Luckily, quotations are rarely edited, so synchronization isn't much of an issue. DAVilla 10:38, 27 October 2010 (UTC)[reply]
How about this: For each set of citations for a particular entry, create a subpage of the citation page. Then transclude those subpages in both the entry and the main citation page. That way there is no duplication while they still appear in both locations. —CodeCat 14:29, 21 October 2010 (UTC)[reply]
I would rather avoid any approach that makes it even more difficult to add additional citations. —RuakhTALK 14:55, 21 October 2010 (UTC)[reply]
I agree. Also, including all the citations under each definition on the page (even in a collapsible form) makes it impossible to select and promote those few really good citations that easily demonstrate use of the word. Citations are organized chronologically. If we have all the citations collapsed under the definition, then opening them hits the reader first with the oldest and hardest to use quotations. And if we reverse the order (an idea I dislike), then that puts the most recent, and possibly the most "innovative" uses of the word first, or uses that are not clear-cut. I much prefer that an entry have a small and select subset of the Citations page quotations in the entry, a subset selected critically by a person because they most clearly show the sense defined. --EncycloPetey 02:09, 23 October 2010 (UTC)[reply]
I would like to evolve CodeCat's idea: what if we introduce several subpages of the citations page corresponding to the sets of citations under each sense in the main entry? Thus, if we transclude Citations:entry x/sense 1 (and so forth up to Citations:entry x/sense n, where n is the number of senses) in the entry and main citations page, both pages will remain complete and up to date. The uſer hight Bogorm converſation 19:03, 23 October 2010 (UTC)[reply]
Using numbers to reference senses is a bad idea because the order might change, or we might insert another sense in between. —CodeCat 19:45, 23 October 2010 (UTC)[reply]
No, in lieu of resorting to numbers in the title, I would rather propose using an appropriate and descriptive word from each definition, e.g. Citations:chafe/heat, Citations:chafe/injury and Citations:chafe/vexation for each of the nominal senses. The uſer hight Bogorm converſation 07:18, 24 October 2010 (UTC)[reply]
As EncycloPetey said and I agree "including all the citations under each definition on the page (even in a collapsible form) makes it impossible to select and promote those few really good citations that easily demonstrate use of the word." Thus, I oppose the creation of these subpages. --Daniel. 03:22, 25 October 2010 (UTC)[reply]
Please don't create any more subpages and templates for citations. You are making editing impossible for non-technical editors like me. I see no harm in duplicating citations in a "Citations" page, though I wouldn't bother creating such pages. The system of hiding quotations under each sense is fine. Should we keep the "Quotations" header if it only directs to a Citations page where the same old quotations are repeated?--Makaokalani 16:38, 25 October 2010 (UTC)[reply]
Absolutely. It is a visual indication that the page structure includes a link to such information. Our data structure needs this signal. --EncycloPetey 04:54, 26 October 2010 (UTC)[reply]
There may be some cases where we have quotations in citation space that don't support anything on the entry page (either temporarily or more lastingly), but are somehow worth retaining, perhaps related to a RfV- or RfD-failed sense. In those cases we might not want to distract readers with a Quotations header. Otherwise, the phenomenon of frame-area blindness is likely to lead normal, newer users to miss the citations tab, so the visual indication is valuable, as EP says. DCDuring TALK 11:24, 26 October 2010 (UTC)[reply]
Absolutely not. Actually, why even have a Citations page where the same old quotations are repeated? The Citations page is only useful if it has one or more quotations that the entry does not have. —RuakhTALK 11:55, 27 October 2010 (UTC)[reply]
Your answer presumes that the entry and citations pages will remain static. A citations page should continue to grow well beyond the capacity to contain the same information in the entry. Better to add the section and link now than forget to add it later when the citations have expanded. --EncycloPetey 06:18, 31 October 2010 (UTC)[reply]

Compare the quotations at amnicolist#Noun with those at Citations:amnicolist for an example of the utility of "duplicating" citations in the Citations: namespace. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:25, 27 October 2010 (UTC)[reply]

"A Programming Language" or "APL"?

From at least two discussions (1 and 2), I see that more than one person seems to want Appendix:A Programming Language renamed to Appendix:APL.

The latter is currently a redirect to the former. There is a whole set of pages with "A Programming Language" in their names, so renaming one appendix alone would bring inconsistency. On the other hand, if we decide to keep a set of APL pages without "A Programming Language" in their names, then, as a result, the following pages would probably be moved:

From the arguments I have seen so far (e.g., people most commonly know that language as APL, not as A Programming Language), I conclude that this is a reasonable proposal. I, personally, support moving the pages as described above. --Daniel. 01:59, 23 October 2010 (UTC)[reply]

w:APL (programming language) never once uses "A Programming Language" to refer to APL, nor did the same page a year ago, so that's an argument that "A Programming Language" is simply wrong.--Prosfilaes 11:34, 23 October 2010 (UTC)[reply]
The WP article starts: "APL (named after the book A Programming Language)". Further it states (in poor English): "Published in 1962, the notation described in A Programming Language was recognizable yet distinct from later APL." Later: "APL was first available in 1967".
All of the commercial product implementations mentioned in the article use APL as part of the name. This is uncontroversial.
What is more controversial is the existence of an appendix for each individual character of the APL character set. DCDuring TALK 12:13, 23 October 2010 (UTC)[reply]
I have moved Appendix:A Programming Language and other pages, per consensus. I have chosen the particular name Category:APL language for the primary category, mainly by comparison with Category:C language.
The list with glosses and links to individual pages for each spelling gives proper places for etymologies, categories, "The Universal Character Set" box, scannable connections between words and so on. I have not seen any alternative proposal that simultaneously results in avoiding individual pages, keeping relevant lexical information and providing technical consistency. --Daniel. 02:59, 24 October 2010 (UTC)[reply]
I'm not sure if the name 'Category:APL language' is a good idea. It's bound to bring up discussion about RAS syndrome eventually. —CodeCat 19:28, 24 October 2010 (UTC)[reply]
Would the title Category:APL look better, on your PC computer? --Daniel. 02:27, 25 October 2010 (UTC)[reply]
It is no accident that the WP article is w:APL (programming language). "APL" doesn't help someone not familiar with the existence of the language, leading to wasted time waiting for page downloads. "APL language" could be a computer language or a natural language or something else. DCDuring TALK 03:47, 25 October 2010 (UTC)[reply]
If a certain distinction between types of languages is desirable to be shown at the category title, perhaps the current Category:APL language should be renamed to Category:APL computer language. --Daniel. 07:06, 25 October 2010 (UTC)[reply]

Announcing its existence.​—msh210 (talk) 04:42, 26 October 2010 (UTC)[reply]

Thanks. --Daniel. 04:45, 26 October 2010 (UTC)[reply]
That ... sounds like it's not going to be very productive. We haven't really had much discussion on how a list/table format could look, or the benefits and downsides of each method. Nor has the idea of using a namespace really been discussed. Also, it seems strange that this vote is specifically limited to fictional universe appendices, when we have the exact same issue with appendices for certain constructed languages, proto-language appendices, and other similar appendices like the snowclone appendix. --Yair rand (talk) 04:51, 26 October 2010 (UTC)[reply]
Most of these multiple issues were pointed out by me and elaborated at Wiktionary talk:Votes/pl-2010-10/Disallowing certain appendices#My possible vote. Feel free to read it and discuss them. --Daniel. 05:44, 26 October 2010 (UTC)[reply]
We often don't have very much meaningful discussion of the consequences of our decisions. In this case most folks have some experience with what tables and lists look like and with what entries look like. The vote seems focused on an area that is perceived to be a problem as opposed to proto languages, which is not. Moreover, the current state of affairs for proto languages is the result of extensive discussions and gives our normal entries significant benefits, without compromising our standards for inclusion in principal namespace. Snowclone appendices are explicitly a hoped-for experiment to allow us to cover a set of real-world natural-language phenomena. That might be worth a separate vote, when, as and if enough folks perceive it as a problem. DCDuring TALK 11:08, 26 October 2010 (UTC)[reply]

Splitting Láadan and other by groups

The list of works and languages at the See also section of Appendix:Láadan includes an increasingly large list of pages that are effectively related by their treatment from CFI, but not always directly related to each other by their uses and contexts. This list can be improved by grouping similar pages together. If no one objects, I am going to remove it and replace it by the new {{minor auxiliary languages}} and other templates for different groups of pages as applicable. Some examples of subdivided lists would possibly be computer languages (linking between COBOL, HTML and APL but not Harry Potter, Klingon or Unattested terms), Japanese works and unattested terms. --Daniel. 19:21, 27 October 2010 (UTC)[reply]

Duplication of categories

Is the duplication of categories such as Category:id:Cardinal numbers and Category:Indonesian cardinal numbers normal and common? Malafaya 14:49, 3 October 2010 (UTC)[reply]

Major point of contention. See Wiktionary:Votes/pl-2010-06/Number vs. numeral. — [ R·I·C ] opiaterein 17:01, 3 October 2010 (UTC)[reply]
There was an earlier vote where it was decided that we use the language names now. Therefore, the categories containing the language codes need to be orphaned and then deleted. -- Prince Kassad 17:23, 3 October 2010 (UTC)[reply]
That vote concerned thematic categories too (such as "Indonesian colours" instead of "id:Colours")? Malafaya 17:29, 3 October 2010 (UTC)[reply]
No, just the number categories (Category:Cardinal numbers by language and Category:Ordinal numbers by language). -- Prince Kassad 17:34, 3 October 2010 (UTC)[reply]
I wonder how much support there is for undoing the vote Wiktionary:Votes/pl-2010-01/Number_categories. The vote had a subject that was out of sequence, before Wiktionary editors decided which part-of-speech headings to use for number-related words. The argumentation that I have made in that vote made an assumption about the part-of-speech headings for number-related words that may turn out to be wrong or unsupported. The vote asked a secondary question of how the categories should be named instead of asking the primary question of whether they are part-of-speech categories or topical categories. I would support undoing the vote, returning to the indeterminacy that reigned before the vote. --Dan Polansky 07:26, 4 October 2010 (UTC)[reply]
Certainly not from me. I never understood how a word can be in no lexical categories at all, which was the case with all cardinals and ordinals before that vote. -- Prince Kassad 07:59, 4 October 2010 (UTC)[reply]
Re "... word can be in no lexical categories at all, which was the case with all cardinals and ordinals before that vote": That is demonstrably incorrect. Many ordinals had already part-of-speech categories assigned before the vote. They still have: čtvrtý and fourth have the part-of-speech of adjective. --Dan Polansky 08:16, 4 October 2010 (UTC)[reply]
Basically, the "numeral" cats have been replaced by "numbers" counterparts. Is this it? Malafaya 22:23, 4 October 2010 (UTC)[reply]
No, that is not it. The vote on "numeral" vs "number" is currently still running: Wiktionary:Votes/pl-2010-06/Number vs. numeral. The other vote, already completed, was on "Spanish ..." vs "es: ...". --Dan Polansky 08:45, 6 October 2010 (UTC)[reply]
Thanks, Dan. Meanwhile, I had realized that. Malafaya 15:01, 6 October 2010 (UTC)[reply]


Malagasy Wiktionary

I've noticed that someone on the Malagasy Wiktionary (mg.wiktionary.org) is using a bot to mass-import entries from our site. This isn't so much of a problem except that they are leaving the entries written primarily in English, and the template calls aren't being removed or supported. That Wiktionary may be headed down the same dark road as the Russian and Vietnamese Wiktionaries. [sigh] --EncycloPetey 03:05, 30 October 2010 (UTC)[reply]

The Burmese is the absolute worst, check it out. Mglovesfun (talk) 11:27, 30 October 2010 (UTC)[reply]

They do the same from the French wiktionary (without any change to the pages). Some templates have an mg version, making entries more or less readable, but some templates are not supported, and definitions, notes, etc. are in French. Lmaltier 19:54, 4 November 2010 (UTC)[reply]

Why is this in the Wiktionary: namespace, and can it be moved to an Appendix? Also, is there some sort of criteria for words to be listed on that page? — lexicógrafa | háblame 19:22, 30 October 2010 (UTC)[reply]

No, no and no. -- Prince Kassad 19:40, 30 October 2010 (UTC)[reply]
Why is this in Wiktionary, and can it be moved to Geocities? Equinox 23:04, 1 November 2010 (UTC)[reply]
I proposed it for deletion, and it was kept, in part because it would have meant some official policies becoming wrong, such as Wiktionary:CFI#Protologisms. Seems a bit backwards to me, but hey. I don't make the rules. Not on my own, anyway. Mglovesfun (talk) 23:58, 1 November 2010 (UTC)[reply]
We should really come up with a decent criteria for inclusion of protologisms, clear out most of the existing list, move it to the appendix namespace, and figure out a standard format for it. --Yair rand (talk) 00:04, 2 November 2010 (UTC)[reply]
I suppose if it is moved to the Appendix: namespace then whatever the format is will be influenced by the vote Wiktionary:Votes/pl-2010-10/Disallowing certain appendices. — lexicógrafa | háblame 01:31, 2 November 2010 (UTC)[reply]

New index for disambiguation pages

Currently, Appendix:Variations of "a", Appendix:Variations of "b" and twenty-five other disambiguation pages whose titles consist of a single character display this index:

Variation Appendices: a b c d e f g h i j k l m n o p q r s t u v w x y z (space) | - . ? ^ = #


I would like to replace it by another index, that lists all pages together, regardless of the quantity of characters. My initial proposal would be this:

Template:vars

Thoughts? --Daniel. 01:27, 5 October 2010 (UTC)[reply]

Does that have all these disambiguation pages right now? Even if it does, I'm worried that it will eventually get out of hand.--Prosfilaes 02:07, 5 October 2010 (UTC)[reply]
Why link at all? People don't go to the disambig pages directly, they go via real entries (eg a). Those entries already link to logically related pages (eg "a" links to the other single Latin letters). There's just going to be more disambig pages, so organizing them by category (Category:Variation appendices) seems easiest. --Bequw τ 03:11, 5 October 2010 (UTC)[reply]
Allow me to be more clear: By "I would like to replace it by another index", I am mainly interested in indexing properly the disambiguational pages. That is, they may be shown anywhere as long as it is findable: they may be listed in all variation appendices by means of a template; or, alternatively, they may be displayed at a dedicated page, such as possibly Wiktionary:Disambiguation. The organization of disambiguation pages would help people to browse and edit them. --Daniel. 01:35, 7 October 2010 (UTC)[reply]

Only main categories in All topics

I have disabled automatic placement of all non-English topical categories into the non-English All topics category, by editing {{topic cat parents}}.

The English category Category:All topics now contains the following major categories: Category:Business, Category:Communication, Category:Containers, Category:Culture, Category:Food and drink, Category:Geography, Category:History, Category:Human, Category:Information, Category:Language, Category:Matter, Category:Military, Category:Movement, Category:Nature, Category:People, Category:Philosophy, Category:Recreation, Category:Sciences, Category:Senses, Category:Sex, Category:Social sciences, Category:Society, Category:Space, Category:Sports, Category:Technology, Category:Time, Category:Transport.

The Greek category Category:el:All topics should also contain only the major categories: Category:el:Business, Category:el:Communication, Category:el:Containers, Category:el:Culture, Category:el:Food and drink, Category:el:Geography, Category:el:History, Category:el:Human, Category:el:Information, Category:el:Language, Category:el:Matter, Category:el:Military, Category:el:Movement, Category:el:Nature, Category:el:People, Category:el:Philosophy, Category:el:Recreation, Category:el:Sciences, Category:el:Senses, Category:el:Sex, Category:el:Social sciences, Category:el:Society, Category:el:Space, Category:el:Sports, Category:el:Technology, Category:el:Time, Category:el:Transport.

The update of Category:el:All topics to contain only the listed categories is not immediate; it will take some time for the servers to catch up. Until the servers have completed the update, Category:el:All topics still contains 265 subcategories, including derivations categories.

Related subjects: {{topic cat}}, Wiktionary:Topics. --Dan Polansky 10:29, 5 October 2010 (UTC)[reply]

As I understand it, you've reduced 'All topics' to only those categories that have no parent? (Also, I think you meant Greek, not Spanish). —CodeCat 12:19, 5 October 2010 (UTC)[reply]
I have reduced the English "All topics" to mostly those major categories that were originally there back in 2008, plus possibly some others. For instance, "Sports" is a subcategory of "All topics" while "Archery" is a subcategory of "Sports" but not of "All topics". This was the common practice back in 2008, and even now in 2010 the practice was still largely kept for English topical categories.
I have reduced non-English "All topics" to match English "All topics". Before the reduction, non-English "All topics" contained every single non-English topical category (and derivation categories), at variance with what was the case for English "All topics".
Yeah, I meant Greek and not Spanish; I actually wanted to pick Spanish and erroneously picked "el" instead of "es". --Dan Polansky 12:48, 5 October 2010 (UTC)[reply]
To answer more closely your questions: the categories that are placed in "All topics" have mostly explicitly the category "All topics" in their topic-cat-parent. Of these, some have no other parent than "All topics", but some also have another parent: Category:Social sciences has parents Sciences, Society, and All topics. --Dan Polansky 12:51, 5 October 2010 (UTC)[reply]
I've never ever browsed Category:All topics in my life, because it's too big. So reducing it to exclude categories that have parent categories which are also in Category:All topics seems good to me. I favor it, but not all that strongly as I doubt I'll ever use that category. Mglovesfun (talk) 12:57, 5 October 2010 (UTC)[reply]
I came across a question and an answer related to this subject from 2008 in Wiktionary:Beer_parlour_archive/2008/January#Category tree for topic categories. I quote:
"...: do we want to have all the topic categories listed in *Topics, or merely the top level categories (eg Category:xx:Sciences)? Physchim62 12:28, 8 January 2008 (UTC)
"Merely the top level ones. *Topics is intended to be the root of a topical category tree. The structure of that tree should be parallel across all languages for which the English Wiktionary has categories. --EncycloPetey 02:23, 11 January 2008 (UTC)".
--Dan Polansky 15:02, 6 October 2010 (UTC)[reply]

Since many people have felt we should first have a vote on headers in the number vs. numeral war, I created this vote. Advice and suggestions are welcome. -- Prince Kassad 12:07, 6 October 2010 (UTC)[reply]

The title "Number vs. numeral 2" is misleading. The choice of "number" vs "numeral" is secondary to the main distinctions drawn by the vote. A better title would be "Part of speech of number-related words".
There needs to be a thorough discussion of the subject "Part of speech of number-related words", and it is quite likely to be complex. --Dan Polansky 14:00, 6 October 2010 (UTC)[reply]
Re: "The categorization is not affected by this vote" standing in the vote: How come? The determination of part-of-speech headings drives part-of-speech categories.
I have added an option to the vote. One thing is unclear: is the vote only about cardinal and ordinal numbers, or is it also about such terms as "double", "triple", "quadruple", "twice", "thrice", "duplicate", "triplicate" and "doubly"? How does the vote affect Latin numbers, including not only cardinal and ordinal but also adverbial, distributive, multiplicative? --Dan Polansky 14:18, 6 October 2010 (UTC)[reply]
Re: "The categorization is not affected by this vote" standing in the vote: How come? - it is because some people want to use ===Numeral=== as a heading but categorize into Category:cs:Cardinal numbers (yes, with codes), and I wanted to accommodate this group. The other questions should preferably be answered by someone who has a better grasp of numbers than me, but I don't regard double or twice as numbers/numerals at all. -- Prince Kassad 14:57, 6 October 2010 (UTC)[reply]
He who wants the PoS "Numeral" and category Category:cs:Cardinal numbers (a topical category) should still be okay with having the entry also in Category:Czech numerals, a part-of-speech category.
You should think about how you arrive at your views about numerals, other than through exposure to a school tradition particular to a country. The vote that you propose is so formulated that it would regulate all languages rather than being constrained to German. The Czech words for "double" and "twice" are classed as numerals by the Czech grammar that I have been taught and that is presented in the Czech Wikipedia, in W:cs:České číslovky. That is why it is not necessarily clear what the vote is about. The vote appears only to treat cardinals and ordinals. In Latin, there are also adverbial, distributive, multiplicative numerals, it seems. --Dan Polansky 15:24, 6 October 2010 (UTC)[reply]
I have renamed the vote; the original title was really about something else. --Dan Polansky 15:27, 6 October 2010 (UTC)[reply]
If we don't even know what numbers and numerals even are, how are we ever supposed to solve this problem? CFI should be changed to forbid numerals completely until we understand what they are in various languages, until then it is impossible to edit in peace -- Prince Kassad 15:33, 6 October 2010 (UTC)[reply]
But if we don't know what numerals are, then how can we find, recognize, and delete entries for them? —RuakhTALK 16:52, 6 October 2010 (UTC)[reply]
(unindent) There really is no urgent and painful problem to be solved. There is a disunity in the treatment of part-of-speech headings, that's all. This disunity could best be first consolidated on the level of a specific language. This disunity did not prevent a cooperative creation of entries for number-related words. We don't need to forbid things about which there is no consensus that they should be forbidden. There is no emergency. There is no need for edit warring.
We do have an idea of which words can come within the shooting distance of number or numeral; we only first need to calmly discuss the weird cases, point them out, and then state in the vote how we deal with them, or state that they lie out of scope of the vote. --Dan Polansky 15:45, 6 October 2010 (UTC)[reply]
I strongly agree with DanP's PoV. As an outsider to the discussion, it seems more useful for us to be inclusive of number words, whatever syntactic category they fall under for particular languages, than to have some universal PoS. Does anyone have the expertise to say they comprehend the universal grammar of number underlying all languages' treatment on number? I'd be surprised. DCDuring TALK 19:11, 6 October 2010 (UTC)[reply]
I think we all have that expertise: the concept of "universal grammar" is B.S., ergo {the universal grammar of number underlying all languages' treatment on number} = Ø ⊆ {what Foo comprehends}, for any user Foo. (But, I also agree with Dan: this is no emergency. It's just frustrating how incapable we are of reaching any level of agreement on anything. :-P   ) —RuakhTALK 20:29, 6 October 2010 (UTC)[reply]
I think we need a thorough presentation of data about how these words are used in the major languages we treat before we create a list of alternatives in how we handle these. Without a look at data concerning actual usage, we'd be making a decision blindly. We've had some comments made in recent discussions, and ideally we'd have someone reasonably fluent in a language give a short summary stating why (or why not) recognize "Numeral/Number/etc." as a part of speech in that language, and which classes of number words actually merit the recognition. Rather than a discussion page, we could start a Wiktionary:Number words treating the general subject with some details for those major languages. Even if this just results in an incomplete and sketchy draft covering just a few languages, we would have a more solid idea of the relative merit of various approaches. If this idea catches on, I could contribute something this next weekend. (no visiting relatives, no illness, and no upcoming obligations) --EncycloPetey 05:44, 7 October 2010 (UTC)[reply]
I have written a very sketchy draft for Wiktionary:Number words. I am looking forward to seeing this draft completely revamped or extended by you, as you see fit. You are possibly the most knowledgeable editor as regards number words in several languages. --Dan Polansky 12:43, 7 October 2010 (UTC)[reply]

Disambiguation by appearance

For some time, I have been quietly developing lists of disambiguational pages by appearance. As a result, I have recently created these four individual pages, based on the system of Appendix:Variations of "album" and other related appendices:

I believe that they can improve findability of related entries. Thoughts? --Daniel. 01:47, 7 October 2010 (UTC)[reply]

I'm not so sure on the use of the word curvy. Would curved be usable? Curvy doesn't necessarily parse correctly until I'm thinking entirely and one hundred percent in English. I'm assuming that other nonnative speakers might also have the same thought. Obviously it's usable as it is and the meaning becomes apparent, but curvy doesn't parse directly well for me. --Neskayagawonisgv? 20:51, 23 October 2010 (UTC)[reply]

What about users of such entries other than participants in WT:BP?

  1. What do we know about users coming to wikt for typographical information?
  2. Do we get anon contributions to such entries, queries from anons about the entries at Feedback or WT:ID, or even anon vandalism?
  3. Why would anyone think to come to wiktionary for such information?
  4. What are the other available sources? (WP comes to mind, but there must be other online sources)?

For words, even proper nouns, we have models, standards of comparison, competitors, user expectations that we have some modest understanding of and intuition about. I personally have no such understanding or intuition about users and usage of this kind of information. DCDuring TALK 03:34, 7 October 2010 (UTC)[reply]

At Daniel Dot: Do you plan to let Category:Variations of words contain as many entries as there are diacritics-free spellings? That is what one would think from the existence of Appendix:Variations_of_"album". I am not sure this is a good idea: this is going to overflood the appendix namespace with a huge number of these sorts of lists.
This revision of "album" looked about right; albúm, and albüm would need to be added, but אלבום, альбом, アルバム have not traditionally been placed to the See-also top sections in the mainspace.
This change in practice seems big enough to require a thorough discussion and even a vote. --Dan Polansky 11:32, 7 October 2010 (UTC)[reply]
DCDuring, please see -, =, and . They display the common "See also" at the top. Notably, Appendix:Variations of "i", Appendix:Variations of "l" and various other similar pages have a section for "similar symbols". Therefore, other people have been making major edits that contribute to our current practices on how to disambiguate between entries. I do not see the creation of Appendix:Variations of blank characters as a novelty; instead, I see it as part of the natural course of developing a reliable and helpful disambiguational system. That system includes, for instance, the very useful (but poorly named) Appendix:Easily confused Chinese characters.
Keeping readers from seeing lists of characters (or words, for that matter) whose appearances are similar would result in poor findability of certain entries. For instance, one would have to have certain knowledge of both IPA and unicode to reach ˜ directly. Otherwise, he or she might conceivably look for the IPA character at the erroneous entry ~, which is much more easily typeable from common keyboards.
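To make the ˜ versus ~ example concrete: the two are separate Unicode code points with different official names, which is why a reader typing one does not land on the entry for the other. The following is only a minimal sketch using Python's standard unicodedata module; the selection of look-alike characters and the helper name describe are assumptions made for illustration, not a list taken from any appendix.

```python
import unicodedata

# Characters that look alike but are distinct code points.
# This selection is illustrative (an assumption), not taken from an appendix.
LOOKALIKES = ["~", "\u02dc", "\u223c", "\uff5e"]

def describe(ch):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in LOOKALIKES:
    # Print the character next to its code point and official Unicode name.
    print(ch, describe(ch))
```

A "See also" line or a variations appendix serves the same purpose for readers that this lookup serves for a programmer: it exposes the neighbouring code points that a keyboard does not make easy to type.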
Our current practice is apparently designed mostly to help users who have difficulty with character sets other than the default on their keyboard, AFAICT. It is limited to words, AFAICT. Both abstraction/generalization and harmonization are interesting heuristics for suggesting new directions, but utility is still an issue.
What may be useful in Chinese is not guaranteed to be of use for other scripts.
I have some difficulty in imagining the use cases for the four appendices you offer as examples. DCDuring TALK 20:17, 8 October 2010 (UTC)[reply]
Dan Polansky, I don't see a practice stating exactly which romanization system is to be used for each script, to prohibit (or allow, for that matter) アルバム as a variation of "album" to be listed at Appendix:Variations of "album". --Daniel. 13:30, 7 October 2010 (UTC)[reply]
If you don't see the practice, then look around and show me the entries that contain various scripts in "See also" at the top. From what I remember, I have not seen any such entries. So from what I can tell, you are introducing a new practice, one that has been rejected from what I remember. I do not feel like looking through the archives, though; that is your job. Maybe other editors can provide some input.
A common practice does not forbid things: it simply does not contain them. It is written policies that forbid things. There is certain precedent for what the "See also" sections typically contain. Have you bothered to find out what the precedent is? Have you bothered to look for previous discussions of the subject? --Dan Polansky 13:36, 7 October 2010 (UTC)[reply]
My question again, and some more: Do you plan to let "Category:Variations of words" contain as many entries as there are diacritics-free spellings? How many appendixes do you estimate this is going to generate? --Dan Polansky 14:52, 7 October 2010 (UTC)[reply]
No, I do not plan to let "Category:Variations of words" contain as many entries as there are diacritic-free spellings. I have never suggested this, so I estimate the total of new appendices based on it as being 0.
One page that contains various scripts together is Appendix:Variations of "a".
The new practice that I am introducing here is the naming of pages, such as "Appendix:Variations of arrows". If these names are not wanted, we may simply use the common names, which would be Appendix:Variations of " " (spaces and blank characters), Appendix:Variations of "→" (arrows), Appendix:Variations of "l" (L, l, vertical lines and crooked lines), Appendix:Variations of "Δ" (delta and triangles), Appendix:Variations of "口" (squares) and Appendix:Variations of "日" (the han "day" character), while leaving various other characters without close links to each other. --Daniel. 23:59, 20 October 2010 (UTC)[reply]
  • Are you suggesting that:
    1. users will come to Wiktionary seeking to find the meaning of a symbol, that is not a letter in any script?
    2. users will come without any knowledge of the context of use of the symbol (which context might lead them to an entry (eg, IPA) which could have links to appendices) ?
    3. users will find the appendices you have created based on the title you have given the appendix, even though appendices do not appear in default searches?
If so, it seems implausible to me. User expectations of a dictionary are not generally to find things other than words and the symbols used to make text. Users are likely to have some idea of where they found something. Until we have done some work to:
  1. find categories of symbols that people really want,
  2. provide means of helping them realize that wiktionary has them, and
  3. have means for them to actually find them efficiently and surely once at Wiktionary,
this kind of effort seems likely not to yield useful results now. I could see that might turn out to be more useful in the future than it appears to me now.
This is in contrast to the Appendices for HTML, IPA, and APL, which clearly have value and for which we can construct paths that are likely to work for users. Signal-flag conventions, sign languages, Morse code, road signs, proofreading marks, shorthand, hazard signs etc, all might be of value as appendices. We still may have some trouble getting users to realize what we have, but other parts of the battle seem easy. DCDuring TALK 01:44, 21 October 2010 (UTC)[reply]
After analyzing various statistics from this link, I conclude that users usually do not come seeking the meaning of symbols that are not letters in any language. (they come mostly to see the entry "sex" and related ones)
However, based on other resources, the answer could be different. Firstly, histories like this show that multiple people create and revise symbols. Relatedly, entries like ",", with its currently twelve definitions, are considerably useful for displaying grammatical information, comparison with the fullwidth and other versions, and links to more punctuation. Many punctuation marks (and numbers, and others) carry meanings that I expect to be searched for, just as I expect words to be searched for. I, particularly, remember using Wiktionary to search for Japanese punctuation when I was learning it.
More obscure symbols found on unicode such as a skull, an umbrella or pieces of domino may require further discussion; in my opinion, one reasonable decision would be to move them to appendices akin to our list of APL symbols, leaving them out of the main namespace.
I absolutely support the existence of a page to list all the blank characters, and one to list all the arrows; however, I am not particularly interested in advocating any title: I find both Appendix:Variations of arrows and Appendix:Variations of "→" equally imperfect, because they are too long to comfortably type from memory, but they are very findable because they would be at the top of the relevant entries.
Yes, the hypothesis of one wanting to see the definition for the IPA ˜ but arriving at ~ instead seems very reasonable to me. If I remember correctly, AutoFormat has the function of transforming the common slash "/" into the IPA bar, because editors occasionally type the former, so a perfect mental awareness of different but similar unicode characters is not to be always expected. If, otherwise, every reader were supposed to know exactly what entry contains relevant information, then we would not need the headers with "See also" at all.
Finally, I support the statement "the Appendices for HTML, IPA, and APL, [...] clearly have value and for which we can construct paths that are likely to work for users. Signal-flag conventions, sign languages, Morse code, road signs, proofreading marks, shorthand, hazard signs etc, all might be of value as appendices." --Daniel. 01:22, 24 October 2010 (UTC)[reply]

OUP looking for Language Engineer, Dictionaries

I saw this announcement and thought there might be folks here who would be interested. I have no connection with OUP.

Language Engineer, Dictionaries, Academic Division Oxford University Press Oxford

Exciting opportunity to help shape the future of dictionaries with the world’s leading dictionary publisher.

The successful candidate will join our language engineering team, which is responsible for development and exploitation of the corpora and lexical data used in OUP’s print and online dictionaries and licensed for uses such as search engine technology, machine translation, e-readers and mobile applications.

You will have experience of writing programs that process natural language data, an enthusiasm for developing and applying innovative techniques in linguistic analysis for monolingual and bilingual lexical data development, and an awareness of the issues involved in processing diverse languages. A practical, problem-solving attitude is essential. Expertise in any of the following would be an advantage:
 
*Perl, Python, Java, or similar
*XML, XSLT and related technologies
*Statistical Language Processing
*Interactive web application development
*Familiarity with a non-European language such as Arabic, Chinese or Japanese.

OUP offers excellent benefits including:
*Competitive salary dependent on skills and experience.
*final salary pension scheme
*25 days' holiday
*subsidized staff restaurant
*50% discount off OUP books
*flexible start/finish times

Apply for this position at: http://ukjobs.oup.com/Exp/Vacancy.aspx?VacancyId=39440

Closing Date: 31 October 2010

Informal enquiries: pete.whitelock@oup.com

Pete Whitelock
Head of Language Engineering, Dictionaries
Reference Department 
Academic Division
Oxford University Press

--Brett 11:22, 7 October 2010 (UTC)[reply]

colspan, etc.

Hmm. I see colspan, bgcolor, nbsp, cellpadding and cellspacing with a "(HTML)" context, but at the entry mainspace. I'm going to move these definitions to subpages of Appendix:Hyper Text Markup Language and format them accordingly. --Daniel. 12:38, 7 October 2010 (UTC)[reply]

I have serious doubts that keywords of HTML should be extensively documented in Wiktionary appendixes, one appendix per keyword. The existence of the following pages raises doubt:
This does not seem to be a dictionary material. One appendix for all the HTML keywords would be more than enough.
A usage note of the form "Template:art-html-tag-attribs" seems to fit into an HTML reference rather than a dictionary. --Dan Polansky 14:14, 7 October 2010 (UTC)[reply]
Furthermore, what is next? Appendix:Java/if, Appendix:Java/while, Appendix:C++/if, Appendix:Bash/switch, Appendix:Mediawiki/#switch, Appendix:LaTeX/\usepackage? This is all material for Wikibooks. --Dan Polansky 14:22, 7 October 2010 (UTC)[reply]
Just to be sure things are clear: I would like you to stop creating these appendixes until some other editors support what you are doing. --Dan Polansky 14:26, 7 October 2010 (UTC)[reply]
Wikibooks is for books that teach a subject, including languages. Wiktionary is a reference for terms in a subject, including especially languages, but perhaps also for other things? There is currently no Wikimedia project for HTML/programming references and such, so until there is, why not take on that task? We've already made a start, after all. —CodeCat 14:32, 7 October 2010 (UTC)[reply]
OK, Dan. I am hereby formally ceasing my efforts on organizing or otherwise editing entries (or appendices, for that matter) of programming languages, out of respect for the apparent controversy of the situation. I'll happily reconsider this decision once the position of other editors is clarified. --Daniel. 14:37, 7 October 2010 (UTC)[reply]
Okay. To avoid confusion and as a summary of past activity: you have not only organized but also started every single appendix page for HTML listed above, and you are almost the sole editor of these pages, with few exceptions. You appear to have started in February 2010, with a further surge of activity in September 2010. --Dan Polansky 15:22, 7 October 2010 (UTC)[reply]
CodeCat, not exactly right. There is B:HTML and B:HyperText Markup Language/Tag List, and in particular the likes of B:HyperText_Markup_Language/Tag_List/img. The latter seems to be set up much like the HTML appendix in Wiktionary. If you don't like these Wikibooks pages and you want to set up a book in Wikibooks for HTML reference, I don't see anything that should stop you. The B:HyperText Markup Language/Tag List would actually be better as a separate book dedicated to HTML reference.
There is B:Cascading Style Sheets, which has B:Cascading Style Sheets/Index. This model shows that Wikibooks is well suited for these sorts of reference books, and that an index can make a lookup by keyword rather convenient. --Dan Polansky 15:01, 7 October 2010 (UTC)[reply]
Very well, I wasn't aware of all that material on Wikibooks. However, there may be some confusion as to what Wiktionary itself really is. After all, a dictionary is a kind of book too. So do we specifically limit ourselves to defining terms in languages intended for interpretation by humans? Or do we define terms of languages in general, even if those languages are intended to be interpreted by a computer? Either way, I think a formal policy should be established if there is not already one. —CodeCat 15:49, 7 October 2010 (UTC)[reply]
I think we should specifically exclude languages not intended for human communication, such as programming languages and a few others. -- Prince Kassad 16:15, 7 October 2010 (UTC)[reply]
Naturally, when a glossary or a page otherwise deemed appropriate for a dictionary appears at Wikibooks, the Wikibookians apparently feel strongly inclined to move it to Wiktionary. Related discussions include: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
These ten examples are not specifically related to any or all programming languages. However, there is a very specific and relevant warning at the very top of HyperText Markup Language/Tag List: "Further extending this page will create problems. This page has many subpages that are not included in the printed version and there is no clear method how to include them. Given this book is a user guide, it is organized around topics from the user's perspective, not around the names of the tags."
Since this issue concerns both projects, I have specifically asked for opinions of Wikibookians on where to keep HTML tags. --Daniel. 17:20, 7 October 2010 (UTC)[reply]
That note was placed by your very own Dan Polansky above and should only be seen as his opinion alone. The ten instances above are not all accurately characterized as Wikibookians deciding to push pages with definitions here. Also, sometimes people at Wiktionary ask for things at Wikibooks to be copied to Wiktionary as well. The problem comes when a book is just a dictionary, rather than having a glossary. Many textbooks have glossaries, but they aren't dictionaries. b:Glossary of Astronomical Terms was supposed to be transwikied to Appendix:List of astronomical terms for that reason. I also left comments at Wikibooks, specific to the question there. Adrignola 18:15, 7 October 2010 (UTC)[reply]
Re Adrignola, that is correct: the note in "HyperText Markup Language/Tag List" was written by me, and was marked as an editor note. I think that "HyperText Markup Language/Tag List" would be better as a separate Wikibook. It is organized around technical keywords such as names of elements and attributes, while "B:HyperText Markup Language" is organized around tasks a user wants to achieve, and other topics. The page ".../Tag List" was once a separate page, moved on 4 June 2006 by Hagindaz from "HTML Tag List" to "HTML Programming/Tag List". If someone wants to write an HTML reference, moving "HyperText Markup Language/Tag List" to "HTML Reference" to get started would be the right step, I think. --Dan Polansky 18:57, 7 October 2010 (UTC)[reply]
I don't think we want this sort of thing, but if we do add it, then we have to be descriptivist about it; for example, it should include <blink> and <marquee> and <nobr> and <xml> and possibly <plaintext> and <image> (if anyone uses those last two), and should mention that <blockquote> is used to indent text and alt="" to generate hover-text for an image, and so on. An appendix about HTML 4.01 as deemed valid by the W3C belongs at w3.org. An appendix about HTML 4.01 as actually used … I don't think it's in scope for a dictionary, but it's not quite as bad. —RuakhTALK 18:34, 7 October 2010 (UTC)[reply]
Ruakh, I wonder if an "appendix about HTML 4.01 as actually used" would contain citations from sources of actual web pages... --Daniel. 21:52, 7 October 2010 (UTC)[reply]
Some discussions:
Re "Or do we define terms of languages in general, even if those languages are intended to be interpreted by a computer?" From looking into mainspace (if, switch, then, else, call, return, break), it is our practice right now not to define terms of languages that are intended to be interpreted by a computer. --Dan Polansky 19:11, 7 October 2010 (UTC)[reply]
In reply to CodeCat's "So do we specifically limit ourselves to defining terms in languages intended for interpretation by humans? Or do we define terms of languages in general, even if those languages are intended to be interpreted by a computer? Either way, I think a formal policy should be established if there is not already one.", let me quote a sentence from WT:CFI#Constructed languages:
"Constructed languages have not developed naturally, but are the product of conscious effort in the fulfillment of some purpose. [...]"
The relevant section where the quote appears is subdivided into types of constructed languages, mainly providing information on which ones merit to be defined in entries and which others merit to be defined in appendices (apparently, they are discriminated by: sheer consensus [?], fictional or nonfictional, well-known or obscure, recognized or not by ISO, etc.). Since CFI, according to the quote, is characterizing C, Java and HTML as constructed languages, I suggest eventually updating the section with another paragraph with an ideally solid consensus about whether and where their terms should be defined, when possible. --Daniel. 22:50, 7 October 2010 (UTC)[reply]
Re: "CFI, according to the quote, is characterizing C, Java and HTML as constructed languages": I completely disagree. That quotation is describing constructed languages, not defining them. Or at least, it's not a complete definition. Unless you wish to argue that WT:CFI is characterizing tables and houses and militaries and universities as "constructed languages"? —RuakhTALK 00:06, 8 October 2010 (UTC)[reply]
No, Ruakh, sorry for the apparent confusion. My proposal from that message was simply of future clarification of the policy. My point is that C, Java and HTML are constructed languages anyway, the concept being a simple sum-of-parts meaning "language that is constructed", and I believe that CFI fundamentally agrees with me, by describing (or defining, or otherwise stating) constructed languages in a manner that does not exclude programming languages. The final conclusion is, I suggest that we eventually ban or allow or do something else regarding programming languages, then expose the decision at WT:CFI#Constructed languages, where it fits neatly. --Daniel. 00:33, 8 October 2010 (UTC)[reply]
Understood. I more or less agree with your final conclusion, though I think a WT:CFI#Computer languages right after that section would do just as well. I don't think computer languages are quite "constructed languages", in that they aren't "languages" in quite the same sense. They're not intended for human communication. For example, BCP 47 currently (rfc:5646 always) says:

Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication. This includes constructed and artificial languages, but excludes languages not intended primarily for human communication, such as programming languages.

Note that it does use "languages" in reference to programming languages, but presents them as a mutually exclusive set from "constructed and artificial languages". I think one can take programming languages to be a kind of constructed language; that one can take human languages and computer languages to form a single class called "language"; and that such a view is compatible with the text of the CFI. I just don't think that's the most natural reading of the current CFI, and if/when we make the CFI more explicit on this point, I don't see a particular need to use that breakdown to organize it.
RuakhTALK 03:13, 8 October 2010 (UTC)[reply]
Re "My point is that C, Java and HTML are constructed languages anyway": They are not. The phrase "constructed language" (google books:"constructed language") is implied to mean "constructed human language". I doubt that the term "constructed language" as commonly used outside Wiktionary has programming languages within its scope. It has Esperanto and Ido in its scope. The differentiating modifier "human" as contrasted to "programming" or "markup" is usually dropped unless there is a risk of confusion in the particular context. The context of "constructed language" is human language and linguistics. I thought it obvious that the context of a dictionary is human language. CFI could be amended to be explicit on this point, but, even without the amendment, the term "constructed language" is already used outside Wiktionary, and gets its meaning from there. (Deriving meaning of terms from their use outside of Wiktionary is what makes it possible to phrase descriptive definitions in Wiktionary.) If you want to show that Java comes within the scope of "constructed language", the process of RFV suggests how to do it: provide citations that show the use of the term "constructed language" in this way. --Dan Polansky 07:06, 8 October 2010 (UTC)[reply]
What CFI means by "constructed language" is also clear from the fact that, in the section "Constructed languages", CFI lists many examples of human constructed languages, and no examples of programming languages. --Dan Polansky 07:32, 8 October 2010 (UTC)[reply]

(unindenting) Verifying the use of "constructed language" as a hypernym of "computer language" is by no means necessary for this discussion, because both terms are fundamentally comprised of sums of parts. Yet, I can attest constructed language as you suggested. And it's done.

As a related note, Appendix:Programming language terms, with an arbitrary and uninformative list of languages and commands, seems rather useless and hideous to me. However, I believe that it will either be deleted or be improved along the computer language appendices as a whole, so I am eagerly waiting for an overall decision on this issue.

Now back to CFI. I have already pointed out that it lists "many examples of human constructed languages, and no examples of programming languages", then concluded that the policy then might be improved by a statement regarding whether or not we define terms for computer languages; and, if we do define them, where they are kept: in the main namespace, appendices or whatnot. It is a simple request that I expect to be fulfilled in the future.

Another piece of policy that can definitely be interpreted as covering programming languages (and does not employ the word "constructed") is our motto of "all words in all languages". Since C and Java are languages, their words merit inclusion here. One particular controversy that I foresee would be whether or not to consider "switch" and "float" as words. It may spawn an interesting discussion.

Notably, in Wikipedia, w:Category:Computer languages contains w:Category:Programming languages, and is a subcategory of w:Category:Constructed languages. The computer languages are as well formal languages. Now, let me possibly blur the supposed distinction between languages for humans and for computers, for the purposes of defining them here:

  • For an example of constructed language, not for computers, and that displays strictly logical characteristics akin to programming languages, see w:Lincos (artificial language).
  • For an example of notable use of "grammar" (or, more precisely, "syntax") of programming languages to make poetry, see w:Black Perl.
  • For five examples of entries of computer languages in the main namespace (that are labelled as English nouns or abbreviations whose context is of HTML), see the beginning of this discussion.

Finally, a personal note: as a student of various computer languages, I would appreciate very much if the project of defining each relevant command/tag/value/symbol/etc. were completed, improved and expanded, either here on Wiktionary or anywhere else. Because then I hopefully could easily check the syntax of, for instance, #include <iostream> int main() { std::cout << "Hello, world!\n"; } part by part, like I already do constantly for English, Japanese, Spanish and Portuguese, thus improving my own technical knowledge. In addition, for that matter, I see the potential for additional means of overall organization, including possible indexes like Appendix:Control flow statements, to explain "go to", "for" and "while" of various languages together. --Daniel. 20:07, 9 October 2010 (UTC)[reply]
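As an illustration of what checking that line "part by part" could involve, the sketch below splits the quoted C++ snippet into the individual pieces a reader might want to look up. It is written in Python and is only a rough approximation: the token pattern and the function name tokens are hypothetical, cover just enough syntax for this one line, and are in no way a real C++ lexer.

```python
import re

# The C++ line quoted in the comment above, kept as source text.
SNIPPET = '#include <iostream> int main() { std::cout << "Hello, world!\\n"; }'

# Hypothetical token pattern: just enough for this one line, not a real C++ lexer.
TOKEN = re.compile(r'''
    \#\w+                 # preprocessor directive, e.g. the include directive
  | <\w+>                 # header name in angle brackets
  | "(?:\\.|[^"\\])*"     # string literal, allowing backslash escapes
  | \w+                   # identifiers and keywords
  | ::|<<                 # two-character operators
  | [(){};]               # single-character punctuation
''', re.VERBOSE)

def tokens(source):
    """Return the lookup-sized pieces of the source line, in order."""
    return TOKEN.findall(source)

for tok in tokens(SNIPPET):
    print(tok)
```

Each printed piece (#include, <iostream>, int, main, std, ::, cout, << and so on) corresponds to one candidate headword of the kind under discussion, which also gives a sense of how many entries even a one-line program would touch.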

Personally, I think that any use of the word language in Wiktionary policy should be interpreted as excluding computer languages. It's not inconsistent with the use or standard meaning of the word. As a student of various computer languages, I think having the dictionary here would be a bit bloating; even if I want comparative meanings, mixing computer and human meanings of if would help with looking up neither normal nor computer languages. There are no pronunciations for computer pieces of syntax, citations to standards are more appropriate than quotations, and Backus–Naur Form should be standard. It's an entirely different entity.--Prosfilaes 20:42, 9 October 2010 (UTC)[reply]
Despite your preference for having no definitions for computer languages here at all, I like your suggestion of using Backus–Naur Form as the standard format, and will take it into consideration when I work with these languages in the future. Thanks.
I also agree with you on the fact that mixing computer and human meanings of if would not be a helpful approach.
As for pronunciations, of course computer languages have them, as their codes are spoken out loud all the time.
The search for the individual meanings of each piece of computer syntax is commonplace by programmers. To mention external web pages, there are this and this to explain the HTML tags img and td respectively. There are also this one and this other one to explain the statements SELECT and HANDLE of MySQL. --Daniel. 20:18, 11 October 2010 (UTC)[reply]
There are at least a thousand ways to pronounce the simple line "GOTO 100", since surely users pronounce numbers in their native tongue. And in my experience, the symbols do not have standard names; even in pure English environments, "{strA[13] = b ^ !c;}" probably has a dozen different verbalizations.--Prosfilaes 08:02, 12 October 2010 (UTC)[reply]
I believe the pronunciation of only GOTO, as opposed to GOTO 100, is enough.
As for the various native tongues, Wiktionary is able to keep multiple pronunciations at once. The potential for many pronunciations is not a novelty; but it has not been a serious issue to date. Since there are at least forty-nine varieties of English, a "complete" English entry would conceivably display many pronunciations.
With that in mind, since apparently most English entries contain only zero, one or two pronunciations, these entries are astoundingly incomplete. Kudos to our recorders and IPA writers, who have been doing an excellent job. However, since the goal of writing and speaking every word, in every dialect of every language, is humongous in itself, it is not surprising that we still do not have a huge amount of huge pronunciation sections to play with.
It is not the job of a dictionary to store multiple verbalizations, like deconstructing the work of "{strA[13] = b ^ !c;}". I would, nonetheless, at least expect entries defining the symbols from your example: {, }, ^, etc. --Daniel. 10:17, 12 October 2010 (UTC)[reply]
I still maintain that "constructed language" is not a perfect sum-of-parts. Surprisingly enough for me, you have indeed found citations of "constructed language" that manage to include computer languages. I surmise that the quotations that you have found show fringe usage by some creative scholars, and that the overwhelming number of texts that use "constructed language" mean "constructed human language". I maintain that the list of examples found in CFI is a sound means of disambiguation of "CFI:constructed language", and that the author who added "constructed languages" to CFI did not mean to include programming languages under "constructed language". Constructed human languages are a group worthy of dedicated attention because of attestability requirements: they do not have readily available bodies of texts from which their use could be attested in the same way in which non-constructed human languages have. That is why they have a dedicated treatment in CFI. But I admit that it can be made explicit in CFI that a constructed language is not a programming language and not a markup language such as XHTML.
Wikipedia is confused on what it means by "constructed language", a consequence of anyone being able to edit. Thus, pointing to Wikipedia's having some category structure is a rather weak argument for anything, especially for the common use of the term "constructed language". When you look at Wikipedia's "List of constructed languages", you find no programming language, but you find some knowledge representation languages, which are markup languages, some of them flavors of XML, others looking rather like LISP. The Wikipedia article "Constructed language" does not in any way indicate that programming languages are included.
"All words in all languages" is a short slogan-like highly ambiguous phrase. I read "language" in it as "human language" without hesitation, but other people may differ.
The existence of one broad concept of "language" that covers both human languages and computer languages is questionable. Historically, "programming language" is a term coined on the basis of some similarity, a metaphor. Programming languages do share some features with human languages, but they also show important dissimilarities to them. The word "language" has been used metaphorically for all sorts of purposes, straying away from the meaning of "human language", but that does not mean that all things that are part of some metaphorical meaning of "language" should be included in Wiktionary. "Virtual reality" does share some features with reality, but it is not so that there is a broad concept of "reality" that has hyponymic subconcepts of "real reality" and "virtual reality". And "black hole" is not a hole at all.
The five entries that you have found in the mainspace (colspan, bgcolor, nbsp, cellpadding and cellspacing) show a set of exceptions to a practice, not a common practice. Even you probably realize that they show an exception, or else you would not be proposing to move them to an appendix.
Documenting programming and markup languages in Wiktionary as if they were natural languages is an innovative project that requires a thorough consideration and investigation of likely implications or consequences. The project would basically propose that each keyword of a programming language or a markup language, be it a core keyword or part of a commonly used API, is documented as if it were a term of natural language. New attestation criteria would have to be developed. Snippets of real programming and markup code would probably count toward attestation: three quotes from CVS in SourceForge spanning some years or something would pronounce a term attested. The result would be a rather original all-in-one documentation of programming languages and their APIs (and of markup languages as well), including such terms as "switch", "JOptionPane", "println", "printf", "paint", "Exception", and "Error"; all names of public classes and methods often used by programs would be covered; all names of elements of XML vocabularies that are actually in use would be covered. Basically, all Java online API documentation would be replicated in Wiktionary, and the same would be done for other languages, and for those Java libraries that are not part of the standard JRE. As intriguing as this may sound, this IMHO lies outside the remit of a language dictionary such as Wiktionary. And it also lies outside the remit of a Wiktionary appendix. To create a page named like "Appendix:Computing language/body" for the HTML element "body" ("Appendix:Computing language/" + keyword) is to imply a quasi-namespace "Appendix:Computing language/" whose number of entries and senses would be huge, although probably not as huge as the number of entries in the mainspace. --Dan Polansky 08:14, 11 October 2010 (UTC)[reply]
I have not advocated the existence of a broad concept of "language" that covers both human languages and computer languages. This broad concept does exist, but it makes no difference, because the word "languages" in all words in all languages is pluralized, opening the way for wider interpretations: "all holes" may refer collectively to a black hole and a hole in a wall, among other holes; "all realities" may refer collectively to consensus reality and virtual reality, among other realities.
No, I do not consider the existence of the five HTML tags as an exception to common practice. My decision to move them to appendices is meant to achieve overall consistence, logic and readability.
Thank you for linking to the old BP discussions Computing languages and Parts of speech of reserved words in computing languages. The first discussion mentions the apparent phenomenon of the creation of xpage and others, and clearly ended prematurely, though I see an inclination to not deny the creation and maintenance of the discussed terms.
Let me quote two relevant messages from it:
 

In the meantime, it might be worthwhile to begin amassing a list of computer syntax terms as an appendix page (without links). That way, there would be physical evidence of (1) the extensive list of terms, (2) cross-language use, and (3) demonstration of someone's willingness to work on the project. --EncycloPetey 05:51, 28 April 2006 (UTC)

 
 

Perhaps we could start something like that, initially containing only the Wiki-relevant protocols, languages and jargon? Off the top of my head: .css, PHP, python, Perl, WikiSyntax, JavaScript, Solaris (toolserver), RedHat/Fedora Core 3 (cluster), HTML, XML, XHTML, bash, DOS .bat files, M. Perhaps only the top 200 most relevant keywords for each, for a start? --Connel MacKenzie T C 15:26, 12 May 2006 (UTC)

 
Both these discussions also deal with the issue of what part of speech should label entries for programming languages. One conspicuous suggestion was of shoehorning the grammatical classes of natural languages into the programming syntax, for example calling GOTO a verb. Today, this question seems to be solved by the current HTML appendices: head is a tag, href is an attribute and / is a symbol. These discussions also specifically prove that I am not the first editor to interpret "all words in all languages" as including computer languages.
I have found another interesting opinion on this issue, from User talk:Connel MacKenzie/archive-2007-1:
 

My pet segment, by the way, is programming languages...I would like to see a "dictionary" style definition for every possible computer instruction, in any/every programming language. I'd like such a thing to identify syntax, by language vendor/version. [...]

 
Your statement "I maintain that [...] the author who added 'constructed languages' to CFI did not mean to include programming languages under 'constructed language'." does not seem relevant for this discussion, because the point is whether CFI is precise or ambiguous, instead of whether or not Eclecticology could express well his or her thoughts on what a constructed language is.
If, on the other hand, we assume that definitely "constructed language" is not a sum-of-parts because it stands only for human languages and never for computer languages, then we come into the field of formation of terms. If one can mix constructed and language to form constructed language with such a specific definition, then terms are not static; as a result, the "second" definition of constructed language that also covers programming languages, also exists. As you have pointed out, it's the job of Wiktionary to inform about both definitions. Since constructed human languages were developed before electronic computers, it is not surprising that constructed language at one time unambiguously did not cover programming languages; for what it's worth, according to the current Citations:constructed language, the first quote is from 1844, but the first one encompassing computers is from 1972.
I have been playing it safe by defining HTML tags, C basic functions, and other pieces of code (either directly at the dictionary or at preemptive personal notes) that seem uncontroversial enough to fit any criteria based on letting computer languages be defined. However, I like your suggestions of developing more mature criteria for inclusion of computer languages, which are likely to result in improvement of the current practices. --Daniel. 20:18, 11 October 2010 (UTC)[reply]
I disagree with a lot of what you write above; in the rest of this particular response I respond only to a selection of points of disagreement, in part by repeating myself in one way or another. "All words in all languages" is a phrase that uses the word "language" in one sense, and the plural seen in "languages" does not change that: the plural pluralizes individuals (individual things, here human languages), not senses. "All holes" in a particular sentence does not refer to black holes. Quantification typically ranges over a single sense; actually, single senses of a term can be recognized as those things over which quantifications range. If you want to quantify over several senses of a single term, you have to be very explicit, because you are doing something odd; by plainly writing "All holes" you have not achieved quantification over all senses of "hole" but rather the reader has to find the sense of "hole" over which "All holes" quantifies, mostly by looking at the surrounding words in the sentence and other immediate context. There are exceptions to this, often in jokes based on equivocation, but a piece of regulation cannot be interpreted as such a joke.
Intentions of authors of sentences do matter, to the extent the intentions can be discerned and used to disambiguate sentences. We are disambiguating sentences all the time; the reader is not free to place any odd meaning to every term that is not syntactically perfectly disambiguated; this is a notable feature of natural languages. Your post has enlightened me on the extent to which some people are willing to misread sentences in a creative way, discarding any collateral evidence of the intention of the author of the sentence, such as a list of examples that come under the term. --Dan Polansky 07:55, 12 October 2010 (UTC)[reply]
I have already proved that constructed languages do include by definition programming languages. I have openly acknowledged that it is also possible to consider constructed languages without including computer languages. I used the RFV process, as you suggested and I accepted. I have even formally added another definition for constructed language and attested it, that would label English, too, as a constructed language, contributing to my view of the term not being definitely precise. Now let's go back to how to improve CFI, instead of how to discourage suggestions based on your knowledge of linguistics over mine.
To mention an unreal but similar hypothesis, if User:Example proposes "Every editor, from now on, must only write in blue font.", builds a vote and people accept it, but Mr. Example writes only in cerulean, it shouldn't stop people from writing in azure or baby blue. If there's a good reason to write only in cerulean, that should be made clear. If there's doubt on whether azure and baby blue are usable fonts, a discussion would clarify its meaning or develop other possibilities of colors.
My analogy shows a hypothetical use for the idea of considering the unspoken intentions of the writer for the purposes of this discussion, a concept that you support and I oppose.
However, this idea does not fit the current issue, hence my opposition. How can you conclude that Eclecticology has any particular intentions on dealing with constructed languages, in addition to what is stated on CFI, if apparently he or she has never edited any entry in any constructed language? When he or she added that section, he or she included a subsection "Languages for which there is no apparent consensus" with only nine examples; what are the other constructed languages? If I have ideas on the treatment of other languages, why should I limit myself to choose languages that are very similar to these nine (i.e., human, non-auxiliary, or whatever), instead of also including engineered languages, controlled languages and markup languages? Why should users not be able to search for APL symbols? Why should readers go elsewhere to find out the meaning of "{strA[13] = b ^ !c;}", instead of having the chance to look for these symbols individually as Wiktionarian entries? Do you oppose defining musical notations as well? Why is the justification of "Not dictionary material." enough to terminate new entries for computer languages? CFI has been edited by many people, either directly as shown in its history, or indirectly through votes and discussions; if unspoken intentions of people are to be counted, and you want to count them in favor of your beliefs, what about the editors who have openly expressed the inclination to not deny definitions for computer languages at Wiktionary? Shouldn't these intentions be counted as well? If, differently from current practices, nobody ever mentioned computer languages, shouldn't I be able to introduce this brand new idea and defend it? Why should I have to search for old discussions, instead of simply talking to current people, asking for their opinions and explaining why the definition of computer languages here is a good idea? --Daniel. 03:15, 13 October 2010 (UTC)[reply]
The only thing that you have proved is that some few authors use "constructed language" in this broad way, not that most authors use the term "constructed language" in your way that includes computing languages. The definition that you have added "A natural language that has been formally regularized" is wrong: natural language is an antonym to constructed language; a constructed language is a human language but not natural language; by your new definition, Czech would be a constructed language because it has been formally regularized by regulatory bodies, but neither Czech nor other languages that have been made more regular by some governing bodies is a constructed language.
In your usage note at constructed language sourced from the page 3 of Humphrey Tonkin, 2003, you have confused "constructed language" and "artificial language", so I have removed the usage note. The page 3 could not source a single claim about "constructed language", as it speaks of "artificial language". Even if you find one work that explicitly defines "constructed language" as including Java, this still will not show that this is a majority usage, or that there is no sense that constrains "constructed language" to human languages.
Furthermore, the source states on the page 3 that "artificial language" has several different meanings, not that it has one meaning that encompasses programming languages. Again, usually, most terms have several meanings, and most sentences invoke only one meaning or sense of a term rather than invoking all of them at once.
The intended meaning of the CFI's section on constructed languages is unspoken but discernible from the list of examples of constructed languages, to repeat myself. You may reject the discussed section of CFI as unvoted on. But you cannot correctly claim that the section is very ambiguous: the section contains no explicit definition of constructed language, but it provides a set of examples that serve to disambiguate the term "constructed language".
If you want computing languages included, either in mainspace or in a quasi-namespace of an appendix, create a vote, and the discussion is over. In that vote, my normative opinion on what a language dictionary should include will be reflected in my vote. --Dan Polansky 07:10, 13 October 2010 (UTC)[reply]
No, the definition of constructed language that includes English and Czech is correct according to the citations and Humphrey Tonkin, who formally considers artificial language preferred over constructed language, because the latter is "non-technical", but uses both terms in multiple instances throughout his book. Then, I have rightfully readded that kind of definition to that entry. Dan, you appear to not have read the rest of my last message (i.e., my message above your message above). I suggest you read it before replying, so we can continue the discussion; thanks in advance. --Daniel. 15:31, 13 October 2010 (UTC)[reply]
Please quote the passage of Humphrey Tonkin, 2003, from which it follows that "the definition of constructed language that includes English and Czech is correct". --Dan Polansky 15:57, 13 October 2010 (UTC)[reply]
Or not; you don't need to quote anything. The third sense that you have added to constructed language is attested by quotations that I do not fully understand. But it does not really matter. Even if there really is such a third sense according to which "constructed language" is synonymous to "human language", it does not change the fact that CFI is rather unambiguous on what it means by "constructed language" for its inclusion of a set of examples; and even if you and a significant number of other editors would insist that CFI is too ambiguous, a simple edit to CFI could remove the ambiguity. --Dan Polansky 16:08, 13 October 2010 (UTC)[reply]
Thank you! --Daniel. 17:27, 13 October 2010 (UTC)[reply]
Re "No, I do not consider the existence of the five HTML tags as an exception to common practice. My decision to move them to appendices is meant to achieve overall consistence, logic and readability": What on Earth? So the five tags establish a common practice? Furthermore, even if it really were common practice to document computing languages in the mainspace, it would be none of your business to be moving valid things to appendices, no matter whether this would supposedly achieve "consistence, logic and readability" (not that I understand how a change of namespace achieves consistence and readability, or that a change of namespace is the single way of achieving this). --Dan Polansky 08:12, 12 October 2010 (UTC)[reply]
The common practice is people occasionally creating definitions for programming languages in the main namespace and editors deleting them, usually as "Not dictionary material." Another common practice is keeping certain constructed languages in appendices. Another common practice, as apparently advocated by SemperBlotto in a message below at this same discussion, is listing the commands/statements/tags/attributes/etc. all in only one appendix per language. I may be wrong, but I guess SB's suggestion is already fulfilled for HTML by Appendix:Hyper Text Markup Language. I do not fathom the lack of additional appendices per term as something desirable.
As Prosfilaes has mentioned, it would not be helpful to mix human and computer languages of "if". In addition, the change of namespace is encouraged as documented in CFI, regardless of how undocumented it is for computer languages specifically. Subdividing the entries into two namespaces causes the three results that I described. You may as well suggest other ways of achieving "consistence, logic and readability" if you wish. --Daniel. 03:15, 13 October 2010 (UTC)[reply]
Re "I do not fathom the lack of additional appendices per term as something desirable": Let me decipher the double negation in that sentence: The sentence says that you think it desirable that more appendixes are created with one subpage per term, right? I say that it is undesirable, and so say some other people. Understood? Create a vote, if you want to proceed in this way for computing languages or even only for HTML, okay? First formally show that what you are doing is based on consensus, okay? Because I vehemently oppose what you are doing. --Dan Polansky 08:06, 13 October 2010 (UTC)[reply]
Ugh, please don't. Votes accomplish nothing, and we'll get into the whole issue of whether we need consensus to keep/create something or to delete it, and the fact that we have no real CFI for appendices, and a whole lot of unproductive conflict without discussion, and a huge mess. --Yair rand (talk) 08:13, 13 October 2010 (UTC)[reply]
Votes have accomplished some useful things. One vote has rejected the proposal pushed by Daniel Dot that some appendix pages should be called "...Femine given names..." rather than "...Female given names...". Another vote has proposed a rather imperfect regulation of geographic names, and has been rejected; an improved version of that vote has been accepted, and became a hard-to-dispute part of CFI. There are more votes that have achieved useful things. The currently running vote on "number" vs "numeral" has disclosed a considerable opposition to first voting on category names before part-of-speech is clarified; such a strong opposition was nowhere obvious from mere discussions, to me anyway. --Dan Polansky 10:20, 13 October 2010 (UTC)[reply]
This is just far too messy. We need a single appendix for HTML (and every other programming or scripting language) (call it what you want) that has sections for each tag. Otherwise, they might as well be in the main namespace. SemperBlotto 10:14, 12 October 2010 (UTC)[reply]
You haven't seen messy until you've seen one of those types of appendices. :) The fictional universe appendices looked like that for a while, and it was getting ridiculous. Having appendices set up with subpages for each term seems to work well. --Yair rand (talk) 23:07, 12 October 2010 (UTC)[reply]
Would I be correct in thinking that the entire dispute over whether there should be Wiktionary content explaining elements of programming languages comes down to a disagreement over whether a "word" (in Wiktionary's own slightly distorted understanding of "word", being whatever we feel like including) refers to "a series of symbols (or a single symbol) that represents something" or "a series of symbols (or a single symbol) that is used in human communication"? --Yair rand (talk) 08:46, 13 October 2010 (UTC)[reply]
The dispute seems rather academical, on how to read a piece of regulation such as CFI, and how to assign meanings to some undefined terms in CFI. It is not so much a dispute about "word" as about "language" and "constructed language" as used by CFI; whether a word can contain spaces is out of scope of the discussion. I think arguing with "all words in all languages" is a bit pointless anyway, given its being more of a slogan than anything else.
Given we can change CFI, we can discuss outside of CFI about what we would like CFI to become, and, in particular, whether it is a good idea to create one wiki database that combines lexicographical documentation of words of all human languages with documentation of programming languages and their APIs. That extra-CFI discussion is not about what it says in CFI, exactly because CFI can be changed as a result of that discussion. For that extra-CFI discussion, I think that the idea to combine the two in the mainspace is a poor idea, and that the idea to create a quasi-namespace in Wiktionary for computing languages is also a poor one. --Dan Polansky 10:20, 13 October 2010 (UTC)[reply]

Religions as common or proper nouns

Why is Christianity a common noun, and Buddhism a proper noun? Catholicism and Orthodoxy are divided between parts of speech as well. --Daniel. 01:26, 11 October 2010 (UTC)[reply]

They are all principally proper nouns, with the possible exception of Orthodoxy, but (like most proper nouns) are all attestable as common nouns as well: google books:"Christianities", "much Buddhism", and so on. (Note: most of the "much Buddhism" cites are proper-noun uses. About one-fourth seem to be common-noun uses.) —RuakhTALK 15:37, 11 October 2010 (UTC)[reply]

User setting to hide/show specific languages

I recently realised that a lot of the time, I'm trying to navigate through all the various languages we have on Wiktionary, both in translation sections and on the pages themselves. And then it occurred to me that we already have scripts to hide certain things such as tables and quotations, and there is also a script to show specific translations in the translation header. So my suggestion is this: To add an option in the user preferences that allow a user to show only those languages that they want to see. This will completely remove all languages not chosen from the translations AND from the pages themselves. A button should be present on each page where information has been hidden, to show it. I think this will yield a huge usability gain for people because quite often people are only interested in one or two languages, and don't particularly feel like scrolling past the others to get what they want. This way, people will be able to see what they need to see a lot faster, making using Wiktionary a much more pleasant experience. —CodeCat 14:40, 11 October 2010 (UTC)[reply]

Something like User:Atelaes/TabbedLanguages.js (which I would love to enhance) is probably more functional. You could extend it by allowing a default L2 to jump to (currently it just does the top one I think) assuming the user hadn't already specified a hash. --Bequw τ 18:50, 13 October 2010 (UTC)[reply]
I think that's too limited, it only works if you use one language. But I edit several languages, so it's lost on me. And I am not very comfortable with editing JS on here, especially not something that might eventually be deployed site-wide... —CodeCat 20:14, 13 October 2010 (UTC)[reply]

Computing languages

I am posting this for the sake of findability in Beer parlour. There is a long discussion "#colspan, etc." on the subject of inclusion of keywords of computing languages (including programming languages such as Java and markup languages such as XHTML), whether in the main namespace or in a quasi-namespace created in the appendix namespace. The discussion started on 7 October 2010. --Dan Polansky 08:33, 12 October 2010 (UTC)[reply]

I thought this might be an interesting addition to our Proto- appendices. I'm just unsure how to format it. It needs some sort of unattested banner, and some way to link to it. Does anyone think it's not a good idea? Mglovesfun (talk) 10:07, 12 October 2010 (UTC)[reply]

Looks fine to me. I've added the banner and a category. As for linking, that is still a problem because most templates like {{l}} and {{term}} won't work, and neither will {{proto}} in this case. —CodeCat 15:33, 12 October 2010 (UTC)[reply]
Yep, apart from just doing it by hand, such as *{{term|Appendix:Vulgar Latin *montanea|montānea}}. Mglovesfun (talk) 15:47, 12 October 2010 (UTC)[reply]
The pagename, the ==Language== header, and the category ("Vulgar Latin nouns") say this is a word in VL, but AFAIK we treat that as merely a form of Latin; why not do so here, too?​—msh210 (talk) 16:04, 12 October 2010 (UTC)[reply]
I don't think we want unattested terms in Category:Latin nouns. We don't link directly to Vulgar Latin in the mainspace because by its nature, it's unattested. Mglovesfun (talk) 03:22, 13 October 2010 (UTC)[reply]
Actually, some Vulgar Latin is attested. There are countless inscriptions and instances of graffiti that attest to the existence of many words that lie outside standard Latin, and the plays of Plautus include instances of common speech. The problem with attestation lies primarily in those reconstructed and speculative source words for Romance language words. --EncycloPetey 03:46, 15 October 2010 (UTC)[reply]
For linking, we can use {{lx}} and {{termx}}, but those require that there is a language template prefixed with proto: for VL, and it must be added as such to {{langprefix}}. —CodeCat 11:29, 13 October 2010 (UTC)[reply]

Chinese dialects

As has been pointed out to me by user:Mglovesfun, Chinese dialects are not languages. So, how do I find the particular pronunciation of words in a particular dialect, if there is no categorization scheme for such a thing? I tried to add some pronunciations for Hoisanese/Toisanese/Taishanese, but there seems to be no way to actually categorize such information, unless it is indicated as a language. 76.66.200.95 09:17, 13 October 2010 (UTC)[reply]

Sadly, it's much more complicated than that. We've 'banned' Chinese as a language, using only the sub-languages. Other than that, we follow ISO 639 pretty rigidly, but we do allow exceptions, such as {{roa-jer}}. We could 'award' Hoisanese full language status, but the evidence isn't really there to do so. Or is it? I'm not an expert on Chinese languages, far from it, I just fix broken things as I find them. Mglovesfun (talk) 09:50, 13 October 2010 (UTC)[reply]
What determines which dialects get full sections? 76.66.200.95 10:10, 14 October 2010 (UTC)[reply]
There is no way. While we have {{a}} to add pronunciation information in a particular dialect, it does not do any categorization. -- Prince Kassad 09:53, 13 October 2010 (UTC)[reply]
Should there be a categorization, based on entries that have pronunciation for a dialect available? 76.66.200.95 10:09, 14 October 2010 (UTC)[reply]

Appendix:Marvel Comics? Concordance:Marvel Comics?

Here is the small, current page Wiktionary:Concordances, fully quoted (except a technical note about categorization):

 

This page organizes vocabulary lists based on works of literature. Concordances have their own namespace (Concordance:). Someone reading the stories of Sherlock Holmes, for example, may need to look up rare and obsolete words even if he is a native speaker; someone reading Shakespeare certainly would! Someone reading the works of Doctor Seuss as an aid to learning English (or just pronunciation of English) may find it handy to have a vocabulary list.

 

This explanation, particularly the part about organizing "vocabulary lists based on works of literature" seems ambiguous to me.

It is phrased in a way that implies that Appendix:Marvel Comics (which defines mutant, spider-sense, image inducer, etc.) should be moved to Concordance:Marvel Comics. Should it? Perhaps Appendix:Marvel Comics is not a good enough name?

I, differently, believe that the Concordance: namespace should only list all words from each work, and their quantities, and links to their definitions, like how it's done at Concordance:Bible and Concordance:Holmes A. --Daniel. 17:51, 13 October 2010 (UTC)[reply]

What action do you propose that you or other people take? --Dan Polansky 20:29, 13 October 2010 (UTC)[reply]
I propose that people say what they expect from the "Concordance" namespace.
  1. Should it contain lists of all words from each work, like how Concordance:Holmes A informs that "abandon" appears 13 times on Sherlock Holmes and "accept" appears 18 times?
  2. Should it contain lists of words whose meaning is expected to be understood solely from the context of the work, like how Appendix:Elfen Lied/Diclonius defines "Diclonius" as a fictional race from the Elfen Lied series?
  3. Or, alternatively, they don't care about the Concordance namespace, since it is essentially not edited by anyone?
These three options are apparently the only uses for that namespace that have been considered in Wiktionary.
I, personally, prefer that the "Concordance" namespace be used only for lists of all words from each work. For instance, if my preference comes into effect, the Australian TV series Farscape may be focused into two pages:
  • Concordance:Farscape to count the words, possibly saying that "the" appears 15,000 times.
  • Appendix:Farscape to define words from the context of the series, possibly informing that "crindar" is a form of currency.
However, currently, Concordance:Farscape is not counting terms but defining them, so I believe it should be moved into Appendix:Farscape. --Daniel. 21:09, 13 October 2010 (UTC)[reply]
I see. I don't really know the intended scope of the Concordance namespace. I don't feel like thoroughly researching the subject, so here are some quick results. Searching the discussions in Beer parlour for "concordance" finds some discussions, including Wiktionary:Beer_parlour_archive/2008/January#Proposed_vote_on_fiction_concordances.. In this discussion, EncycloPetey's take is this: "However, those are complete word lists, with all words that appear in those works. To propose a Concordance, I would want to see a word list that is likewise comprehensive." In there, bd2412 thinks that "Concordance:A Clockwork Orange" should be moved to an appendix, because it is not a list of all words from A Clockwork Orange.
Given how few pages there are in the Concordance namespace, the namespace could as well be dropped altogether in favor of Appendix namespace.
Another relevant point of research is to find what "concordance" means outside of Wiktionary, by finding real printed concordances. One such search: google books:"Concordance". Here is a page from one printed concordance. The meaning of "concordance" outside of Wiktionary can help steer the choice of Wiktionary's idiosyncratic use of the term "concordance". --Dan Polansky 06:37, 14 October 2010 (UTC)[reply]
Here is a page from quite a different sort of concordance, a Shakespeare one. The concordance shows, for each word, a list of quotations from an original Shakespeare work. This sort of concordance apparently does not have each word that occurs in some work as a heading but rather only some selected words of interest, such as "grief", "peace" and "youth". --Dan Polansky 06:44, 14 October 2010 (UTC)[reply]
In my view, a concordance here should be initially generated by bot as a rule, being a statistical analysis of all the words used in a given work or corpus. It should then be augmented with quotations for all the words, illustrating as well as possible the context they are used in within the work and the scope of meaning they may have there. An appendix is a freer form that can include everything from word lists to grammar, any sort of language statistics, context-based multi-language translation tables (e.g. chemical elements, Harry Potter terms), and whatever else doesn't fit into other namespaces. – Krun 09:38, 14 October 2010 (UTC)[reply]
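As a rough illustration of the bot-generated concordance described above (a count of every word in a work, plus a sample quotation per word), here is a minimal Python sketch. The function name build_concordance, the sample text and the naive tokenizing and sentence-splitting rules are all assumptions made for illustration; this is not an existing Wiktionary bot or script.

```python
import re
from collections import Counter, defaultdict


def build_concordance(text, max_quotes=1):
    """Count every word in `text` and keep up to `max_quotes` sentences per word.

    Hypothetical helper: tokenization is a naive lowercase letters-and-apostrophes
    match, and sentences are split on '.', '!' or '?' followed by whitespace.
    """
    counts = Counter()
    quotes = defaultdict(list)
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z']+", sentence.lower())
        counts.update(words)
        # Record the sentence once per distinct word it contains.
        for word in set(words):
            if len(quotes[word]) < max_quotes:
                quotes[word].append(sentence)
    return counts, dict(quotes)


# Made-up sample text, standing in for a real work.
sample = "I abandon the case. You accept the case! Never abandon hope."
counts, quotes = build_concordance(sample)

# Print an alphabetical list: word, number of occurrences, first quotation.
for word in sorted(counts):
    print(word, counts[word], quotes[word][0])
```

A real work would need better tokenization (hyphens, capitalized proper nouns, inflected forms), and the cap on quotations per word keeps the output from replicating the whole text under every word.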
Thank you for linking to other discussions and concordances, Dan. I conclude that the meaning of "Concordance" is, by consensus (i.e., consensus between Wiktionarians and between other writers), a list of all words from a work.
I appreciate the idea of improving concordances by adding quotes from the original works. That is, unlike the non-comprehensive Shakespearean concordance that you described, a Concordance:Shakespeare would contain all the quantifiable words, which in turn could be backed up by sentences from the Shakespearean works.
As for defining words at concordances, this may as well be done eventually, despite being redundant to specific entries like Appendix:Pokémon/Ditto.
Other than the suggestion of dropping the "Concordance" namespace in favor of appendices (which may be further discussed), apparently there is no editor opposing the distinction between "Appendix"/"Concordance", so, in the future, I'm going to move Concordance:A Clockwork Orange, Concordance:Farscape and other vocabularies to the appendix namespace. --Daniel. 13:11, 14 October 2010 (UTC)[reply]
What you say holds true not of the meaning of "concordance" (there is no such thing, notice the definite article), but of one of the several meanings of "concordance". Wiktionary's "concordance" seems to tend to lists of (a) all words from a work, (b) without definitions and (c) without quotations. To list all words with quotations is to replicate the complete work in the concordance, and to do so several times, as each quotation would be listed under each word that occurs in that quotation; that makes not much sense to me. Moving Concordance:A Clockwork Orange and Concordance:Farscape to the appendix namespace makes sense. I have moved Concordance:Farscape; I do not know what to name Concordance:A Clockwork Orange, as it is not really a glossary; OTOH it defines each newly coined term in English. --Dan Polansky 13:32, 14 October 2010 (UTC)[reply]
Then, I agree with your wording on "meaning". I'm now going to create the individual pages, including Appendix:Farscape/dren and Appendix:Farscape/fahrbot. --Daniel. 13:41, 14 October 2010 (UTC)[reply]
Stop! Leave Appendix:Glossary of Farscape terms alone. It is formatted exactly the way it should be. --Dan Polansky 13:51, 14 October 2010 (UTC)[reply]
I have never edited Appendix:Glossary of Farscape terms. --Daniel. 13:56, 14 October 2010 (UTC)[reply]
Okay, avoid copying material from Appendix:Glossary of Farscape terms to Appendix:Farscape/dren etc. I guess I have to formulate everything precisely, or you are going to interpret things as you see fit, right? --Dan Polansky 14:06, 14 October 2010 (UTC)[reply]
Let us close more loopholes: Avoid creating Appendix:Farscape/dren etc., regardless of whether it is based on the appendix or on something else. --Dan Polansky 14:08, 14 October 2010 (UTC)[reply]
I, differently, suggest that you copy information from subpages of Appendix:Farscape into Appendix:Glossary of Farscape terms, if you want to improve the quality of the latter, since it includes broken links to Wikipedia and some definitions that are more encyclopedical than lexical. --Daniel. 14:15, 14 October 2010 (UTC)[reply]
(<-) Either demonstrate consensus for one-term-per-page-in-appendix, or stop. As simple as that. I am having this discussion with you on your talk page. --Dan Polansky 14:18, 14 October 2010 (UTC)[reply]
I, for one, oppose the idea of one-page-per-headword appendices. This is not the first time I have said so. I have not seen any groundswell of support for the creation of a parallel structure of entries and categories in Appendix space. DCDuring TALK 18:29, 14 October 2010 (UTC)[reply]
Do you (or anyone here) have a proposed alternative format for appendices for documenting words that have three citations in separate works but do not have three citations that are independent of reference to a specific fictional universe and words in constructed languages other than Esperanto, Ido, Interlingua, Interlingue, Lojban, Novial, and Volapük (as specifically allowed by CFI)? If so, please explain it and we can discuss the advantages and disadvantages of both formats, and try to come to a consensus. It makes it difficult to discuss something where the alternative is not mentioned or understood. Once someone has a feasible format that takes into account usability, ease of editing, how it will look when extensive content is added, long-term managing, etc., that is what we should use. --Yair rand (talk) 22:59, 14 October 2010 (UTC)[reply]
I think Appendix:English dictionary-only terms is a good approach. That said, unlike DCDuring and Dan Polansky, I don't object to Daniel. creating single-word appendices, as long as he (1) recognizes that he might simply be wasting his time, if we end up deleting all of the appendices he's created, and (2) doesn't modify mainspace templates and categories to support those appendices. —RuakhTALK 00:06, 15 October 2010 (UTC)[reply]
I imagine that the Appendix:English dictionary-only terms format would be rather stretched once the words have more content than just part of speech, etymology, definition, and quotations. --Yair rand (talk) 00:28, 15 October 2010 (UTC)[reply]
Ruakh, as the disclaimer says "If you do not want your writing to be edited and redistributed at will, then do not submit it here." By editing Wiktionary, I am already formally giving permission for people to improve my appendices (and entries, and whatnot); if deleting them all is better than all other approaches, so be it. As for modifying templates, do you have a different proposal? One old suggestion was of using {{apdx-en-noun}}, a different template strictly for appendices formatted as entries; would you perhaps prefer that way? As for your suggestion, in my opinion, for example, Appendix:Harry Potter/Muggle would not easily fit into the table of Appendix:English dictionary-only terms, so I share Yair's opinion about that format. --Daniel. 00:40, 15 October 2010 (UTC)[reply]
Seems a bit ... cramped, but it is a possibility, I guess. It can't do categories, but that's not that important, and linking to specific words could be accomplished by anchors inside templates. Any other proposals? --Yair rand (talk) 03:13, 15 October 2010 (UTC)[reply]
Thanks for this sample, Yair. I, personally, prefer the current version of Appendix:Marvel Comics as a simple list of words and glosses where it is not necessary to display additional information like etymologies and pronunciations, resulting in a cleaner overall effect. --Daniel. 03:39, 15 October 2010 (UTC)[reply]
I don't see the need for the "quotations" and "translations" columns: quotations can go in the Citations: space, and translations are basically arbitrary. —RuakhTALK 16:42, 15 October 2010 (UTC)[reply]
Can you please provide examples of arbitrary translations, to better express your point of view? --Daniel. 17:03, 15 October 2010 (UTC)[reply]
The ones in Yair rand's linked sample. —RuakhTALK 17:12, 15 October 2010 (UTC)[reply]
"trouxa" is the legitimate way of saying "Muggle" in (Brazilian) Portuguese, so I believe it is not arbitrary at all. --Daniel. 17:24, 15 October 2010 (UTC)[reply]
A legitimate way, perhaps, but google:"um Muggle" suggests that it's not the legitimate way. If you want to create an appendix for words used in a particular Portuguese version of the series, that would be fine, but I don't see why the English words need version-specific "translations". And given that all Portuguese counterparts will be listed on a single page, there's no real need for individual English table entries to link to individual Portuguese table entries. —RuakhTALK 17:31, 15 October 2010 (UTC)[reply]
If "legitimate" means "official", then both "trouxa" and "Muggle" have that benefit, because one appears in the Brazilian Portuguese books and other in the European Portuguese books. Anyway, WT:FICTION suggests a wider approach; basically, if people use any term, be it from official books or anywhere else, it is defined on Wiktionary. Since Harry Potter was translated into at least 64 languages, we can expect dozens of versions of "Muggle"; it would be very convenient if they could be found at a single translation table. You seem to imply that the English, original version "Muggle" is likely to be known by all languages, so it would be defined in all of them. If we assume this fact as true (regardless of it being your conclusion or mine), then it simply would not be very different from pizza, with various scripts, genders and hopefully citations for each language. In addition, it would display Portuguese trouxa, Italian babbano, Polish mugol, German Muggel, Français moldus, and so on, if I could list the translations correctly. --Daniel. 17:52, 15 October 2010 (UTC)[reply]
Re: "You seem to imply that the English, original version 'Muggle' is likely to be known by all languages": No, sorry, I didn't mean to imply that. Can you indicate which part of my comment gave that implication, so I can clarify what I actually did mean? :-P   Thanks in advance. —RuakhTALK 19:13, 15 October 2010 (UTC)[reply]
No problem; sorry if I misunderstood anything. Your paragraph from "17:31, 15 October 2010 (UTC)" as a whole gave me that impression, so I will backtrace my thoughts from when I read it.
After I called trouxa the "legitimate translation" of Muggle, you dismissed it as the only legitimate, because there is also the (European Portuguese) Muggle, and apparently introduced the possibility of having an appendix to list only the Brazilian version of Harry Potter.
However, you pointed out that all the Portuguese versions are supposed to be together, making the version-specific appendix sound as an additional, repetitive page; and apparently used this fact as an argument to avoid translations linked from the English words, then I concluded that you don't want translation tables to fulfill the purpose of linking one Portuguese word to a synonymous Portuguese word.
You searched for google:"um Muggle", the English word with a Portuguese article. This peculiar choice of words implies that you wanted to find the original "Muggle" used in Portuguese sentences, thus contributing with your idea that "trouxa" is not the only Portuguese version available.
On the first page of the Google results, there is not any mention of the official difference between Muggle and trouxa in Portuguese. I assumed that our results are equal and you didn't go through more pages, as you mentioned a Google search instead of more solid results like official sources; or, more importantly, that you simply wanted to know how the word is used from the original source into Portuguese, which is a reasoning that can work for other languages as well, as maybe "Muggle" is used in French, German, Polish and Italian, despite the official translated books. So I felt inclined to point out the advantages of "Muggle" being translated from English into various languages at once, regardless of versions, dialects and official books. --Daniel. 20:27, 15 October 2010 (UTC)[reply]
O.K., I think I see. To clarify: I didn't have a specific expectation that "Muggle" would exist in Portuguese, and although you used the phrase "(Brazilian) Portuguese" above, I didn't realize that the European Portuguese version would be so different! But I figured that not everyone who discusses Harry Potter in Portuguese will necessarily use the terms chosen by the official translation, so I tried "um Muggle" just as a first guess. (If it didn't work, I was thinking I could spend a few minutes figuring out how Portuguese-speakers tend to respell English loanwords, then apply those rules in making my next guess.) My goal was simply to demonstrate that "official" terms are not the only ones. I think we should treat the Portuguese translations as their own works, separate from the English versions, and though they should link to each other, those links should be outside the table, one set of links for the whole page, rather than one set of links for each individual word. Does that make any sense? —RuakhTALK 21:05, 15 October 2010 (UTC)[reply]
You successfully demonstrated that the "official" terms are not the only ones. Another notable example is HM slave, that despite being widely used, was coined by fans of Pokémon, not by official sources; this information should be clarified at the etymology, whenever possible. As for your proposal of having one appendix per language, I think I can understand it. Would this page reflect your idea? I particularly do not like how the code of the table is very confusing to edit, but it may hopefully be improved. And, of course, I prefer the benefit of higher usability, intuitiveness and cross-linking of having a translation table like on other entries, but I am open to discuss this alternative, so feel free to defend it and/or suggest improvements to it. (: --Daniel. 21:42, 15 October 2010 (UTC)[reply]

Additional namespace

  • reverting to left margin.

I am beginning to wonder whether we might want to have an additional namespace complete with its own distinct (if that is possible) category structure for various sorts of terms that do not fit within the scope of a dictionary as narrowly construed, but fit in the broadest plausible scope. Such a realm might allow more experimentation in presentation than we should have in our main entries, where we need to be more serious to serve whatever base of normal users we might have won over the years.

Our current Appendix space has many glossaries that might be sources for fuller entries. I think that once a given term has "enough" actual content beyond mere glosses it might merit something that resembled our regular entries. The same Citation space that supports our regular entries could support this space as well. Such a namespace could serve as home for a wide range of debatable classes of entries. One good use of categories in this space would be to classify the items by the rationale for their exclusion from namespace 0.

The rationales for not having the various likely classes of items in their own separate wikis include:

  1. They share many characteristics with principal namespace entries (eg, need for translations, general format and grammar, etymology).
  2. They may become principal namespace entries when as and if they "enter the lexicon" of general speech and writing.
  3. We may be able to recruit contributors with special interests who may contribute to mainspace entries.
  4. Such a namespace may provide opportunity for experimentation not desirable in principal namespace.
  5. No single one of the classes of terms that might be in such a namespace would survive independently with its own wiki. DCDuring TALK 15:20, 15 October 2010 (UTC)[reply]
First of all, I created a subsection "Additional namespace" for this discussion, to make editing it easier in the future.
So, you are basically proposing that Appendix:Marvel Comics/image inducer be moved to Xyz:Marvel Comics/image inducer? That is, another namespace, whose name is yet undecided (but represented by Xyz: in my example), while keeping the format of one page per spelling?
It seems natural that, for example, both pages Appendix:Marvel Comics/mutant (with definitions restricted to Marvel Comics) and mutant (with definitions allowed by CFI to be at the main namespace) be cited by Citations:mutant, so I appreciate the particular proposal of sharing the "Citations:" namespace between related pages. --Daniel. 23:55, 15 October 2010 (UTC)[reply]
A month or two ago I made a similar suggestion regarding unattested terms such as those in proto-languages and appendix-only conlangs. Perhaps that possibility could be taken into consideration, whatever the result of this discussion is. —CodeCat 16:26, 18 October 2010 (UTC)[reply]
As DCDuring pointed out, "One good use of categories in this space would be to classify the items by the rationale for their exclusion from namespace 0."
To date, as I perceive them, the excluded words would be those from:
  1. Minor auxiliary languages, such as Afrihili and Unilingua.
  2. Computer languages, such as HTML and APL.
  3. Artistic languages, such as Klingon and Na'vi.
  4. Individual opuses (or, less comprehensively, "fictional universes"), such as Star Wars and Pokémon.
  5. Reconstructed languages, such as Proto-Germanic and Proto-Algonquian.
I can imagine the creation of multiple namespaces for this job, for instance Proto:Algonquian, Computer:HTML, and Fiction:The Simpsons. However, they share certain characteristics that, in my opinion, would be better kept in a single namespace. The most comprehensive, long-term, intuitive and elegant namespace proposal that I can think of is "Context:".
Certain words only exist in a certain context. For instance, last year, particularly, there were multiple entries formatted like "(Star Trek) A fictional race from the planet Vulcan.", with a context label of Star Trek between parentheses. This acknowledges the fact that the fictional race only exists in context of Star Trek. Two years before, there were also words from Klingon at the main namespace, naturally differentiated by the language header.
Since then, I have been moving these pages to the right "Star Trek" (or other) appendices, in an attempt to clean up the main namespace from unwanted words. These appendices serve the purpose of displaying the right context, without the use of a context label. Equally, "Expecto Patronum" exists in the context of Harry Potter, therefore we don't want it in the main namespace. "*dagaz" also fits in a context: of being a word from a reconstructed language (Proto-Germanic) so we equally don't want to make it a main namespace entry.
This proposal would also differ from the other namespaces ("Talk", "Appendix", "Wikisaurus", "Citations"), which refer to different practices and formats: one is a bunch of citations, another a list of relations between words, another a place for discussions. "Context" would be simply a place that serves as a context label, or a language header.
That said, I propose the creation of Context:Harry Potter, Context:HTML, Context:Proto-Algonquian and so on. --Daniel. 17:13, 18 October 2010 (UTC)[reply]

What to do with these 9 constructed languages?

Let me quote one piece of text from CFI:

At present another 12 of the 7000 languages in the ISO 639-3 list are constructed languages. Words in 9 of those languages have not yet been approved for inclusion in the English Wiktionary. These are Afrihili, Blissymbols, Brithenig, Dutton World Speedwords, Glosa (Interglossa), Kotava, Láadan, Lingua Franca Nova, and Romanova.

How should I read the part "have not yet been approved for inclusion"? Should they never be defined here, neither in the main namespace, nor in appendices? Or, perhaps, can I create Appendix:Afrihili and Appendix:Láadan?

Please note that we have Category:Lingua Franca Nova language, with subcategories and entries (e.g., simbolojia) like most languages. So, I propose removing it from the CFI list of "not yet been approved", unless there is some reason for it to be not approved as of yet. --Daniel. 13:43, 15 October 2010 (UTC)[reply]

I think approval of a constructed language for the main namespace requires a vote. -- Prince Kassad 14:10, 15 October 2010 (UTC)[reply]
It certainly would seem to require a vote to keep them. Under the current status they could certainly all be deleted, possibly speedily. OTOH, I know of no reason to exclude them from glossary-style Appendices. DCDuring TALK 16:24, 15 October 2010 (UTC)[reply]
As a minor constructed language present in only 27 entries, Lingua Franca Nova may as well be moved to the appendix namespace. --Daniel. 18:01, 15 October 2010 (UTC)[reply]
"have not yet been approved for inclusion" doesn't seem to justify a speedy deletion. It makes it sound like all other languages have been approved. I created Category:Picard language without a vote. I suspect this is just another example of poor wording in CFI where it means to say "these are not allowed" but it doesn't. Mglovesfun (talk) 16:20, 16 October 2010 (UTC)[reply]
Since Picard is a natural language and distinct from other languages, I believe it is uncontroversial. Lingua Franca Nova, differently, by nature is prone to being discussed and possibly moved away from the main namespace. --Daniel. 17:37, 16 October 2010 (UTC)[reply]
Daniel. made my point. I think there is a presumption favoring any terrestrial (Earthist bias here.) natural language, even one without its own ISO 639 code. DCDuring TALK 18:45, 16 October 2010 (UTC)[reply]
Yes, let me quote another piece of text from CFI: Esperanto, in particular, is a living language with a sizeable community of fluent speakers, and even some native speakers! From this alone, I can figure out that major constructed languages have the benefit of being present in the main namespace because they are... major. Let me try to rationalize it. Esperanto possibly has more than 2 million speakers and Lingua Franca Nova has fewer than 100 speakers, therefore the former is prone to draw the attention of more people, help more people, thus is present in the main namespace, and the latter is (or not?) exiled to appendices where hopefully no one will notice its existence. Does that make sense?
Lingua Franca Nova may be a minor constructed language covered in only 27 entries here. However, Pochutec is a minor extinct natural language covered in only 1 entry, thus, by consensus, it merits to be in the main namespace. --Daniel. 23:53, 16 October 2010 (UTC)[reply]
No, that does not make sense. Did you bother to read the sentence you quoted? Esperanto and Interlingua at least have literature written in the language and have been published in peer-reviewed journals not about constructed languages. Lingua Franca Nova is still a little game that no one knows if anyone will ever care about anything written in the language in 20 years. (The information on Pochutec, on the other hand, is still valued a century later to give us insight into the Aztecan languages.) However, LFN advocates are probably more than happy to fill 20,000 entries with words that are probably just as well kept in their own database and probably without good citation to boot.--Prosfilaes 21:07, 17 October 2010 (UTC)[reply]
@Prosfilaes: Replies merely valuing plenty of sources of citations for [constructed language] or insight into [family] languages are predictable, despite being counterarguable: there are Shakespeare books, Tao Te Ching and Gilgamesh all translated to Klingon, thus they are possible sources of citations; on the other hand, Pochutec is virtually only citable directly or indirectly from Tim Knab and Franz Boas, according to a quick search. I wonder if a Pochutec version of Appendix:English dictionary-only terms would be created, such as possibly Appendix:Pochutec Boas-only terms, to keep noncitable Pochutec words outside the main namespace.
As for the second argument as I listed, Lingua Franca Nova may serve as an insight into how to merge and simplify French, Italian, Portuguese, Spanish, and Catalan together.
Anyway, I am not particularly interested in advocating the placement of Lingua Franca Nova in the main namespace, nor its removal to appendices. I have started Appendix:Láadan to cover another of the 9 languages listed at the first message. Feel free to further discuss their existence, if necessary. --Daniel. 05:22, 18 October 2010 (UTC)[reply]

APL with glosses

  1. This is one revision of Appendix:A Programming Language, mainly a mere list of symbols:
  2. This is my proposal, another revision, differentiated by glosses next to each symbol:

The 1st link is the current revision; that is what appears when someone clicks Appendix:A Programming Language today. I propose replacing that result with the 2nd link, that includes glosses to better identify each symbol by how to use them. --Daniel. 15:30, 15 October 2010 (UTC)[reply]

Since no one opposed my proposal, I added the glosses to that page. If no one objects, I am going to do the same thing to Appendix:HTML in the near future. --Daniel. 01:04, 21 October 2010 (UTC)[reply]
That's exactly the kind of thing I had in mind. The glosses are essential. I expect such appendices to be more useful both to our users and to other-language wiktionaries than the individual entries would be. DCDuring TALK 01:12, 21 October 2010 (UTC)[reply]

Since we've deprecated the Alternative spellings header, I think this at least merits a discussion, no? I'd actually favor keeping them all, but the category summary says "This category contains [foo] alternative spellings: Alternative spellings of [foo] terms, with identical pronunciation and context." I think this is correct. If it's not a homophone, it's an alternative form. So I feel the same way about this as I did about the header; as long as they're used correctly, there is no reason not to have both this and Category:Alternative forms by language. Mglovesfun (talk) 16:15, 16 October 2010 (UTC)[reply]

Actually the "alternative spellings" or "alternative forms" categories don't really make sense: a given spelling isn't an alternative form, per se, it's just that if two spellings are alternative forms of each other, we choose to label one an "alternative form" pointing to the other. Which isn't to say that we shouldn't have a category for spellings that we choose to label that way, but it's weird to put it in the same supercategory as "archaic spellings" and "informal spellings" and so on, which are real things. —RuakhTALK 16:20, 16 October 2010 (UTC)[reply]
Good point. I suppose in general, the most common spelling gets a full entry and alternative spellings means less common spelling, but that's not a hard and fast rule; consider favor/favour where both are very much common. Mglovesfun (talk) 16:22, 16 October 2010 (UTC)[reply]
If favor and favour "are alternative forms of each other", then both probably should be at Category:English alternative spellings. --Daniel. 01:07, 21 October 2010 (UTC)[reply]

What should Category:Abenaki language contain?

I found out two facts about languages and their codes:

  1. Abenaki (no individual ISO code) apparently means Eastern Abenaki (ISO code aaq) and Western Abenaki (ISO code abe) together.
  2. Wiktionary, differently, has a category Category:Abenaki language whose code is abe (ISO for Western Abenaki) and has neither Category:Eastern Abenaki language nor Category:Western Abenaki language.

Should Category:Abenaki language contain only Western Abenaki words, or should it contain both the Western and the Eastern versions together? --Daniel. 23:41, 16 October 2010 (UTC)[reply]

Note that Eastern Abenaki is classified as extinct by ISO. This means it is an unlikely candidate for words to be added in. -- Prince Kassad 10:12, 17 October 2010 (UTC)[reply]
No, actually the unlikeliness of words to be added in Category:Eastern Abenaki language is 0%. That is, we already have words to be categorized in there: Both moose and wigwam are described as English words derived from this language, therefore they probably should be at Category:Eastern Abenaki derivations. They are, however, in Category:Algonquian derivations, which stands for derivations from a whole family. --Daniel. 05:59, 18 October 2010 (UTC)[reply]

Glosses for {{form of}}s

I assume we are not actually discouraging definitions in form-of entries? While this is often done by simply adding the definition after the template in the definition line, a gloss for the lemma in the template itself would be a quick half-measure, and simple for the editor to include. This would probably be most useful in foreign-language entries, but I could imagine it being used elsewhere. Perhaps this has been proposed before, but I can't recall it. Is there any reason not to? Dominic·t 04:25, 17 October 2010 (UTC)[reply]

Yep, I tend to use glosses when there is potential ambiguity, like {{alternative form of|foo|nodot=1}} {{gloss|fish}}, when foo has two or more etymologies. So I'd support adding gloss=. Mglovesfun (talk) 12:47, 17 October 2010 (UTC)[reply]
However useful the gloss is in some cases, it is safe to predict that it will be misused, as in giving one gloss for a highly polysemic word with but a single etymology. One useful thing is to make sure there is a link to the appropriate etymology section (usually works only for first language on page) within {{form of}}. DCDuring TALK 15:50, 17 October 2010 (UTC)[reply]
Well, this is bound to happen occasionally, since form-of entries are not automatically synced to lemma entries when a new sense is added. It doesn't seem like a major problem, though. One gloss could hardly be much worse than none, though, can it? And most words will not face this issue. Dominic·t 17:36, 17 October 2010 (UTC)[reply]
It could be worse than none if the gloss leads a user to fail to understand that there are other senses. I would favor forbidding such glosses for polysemic words and mandating links to appropriate Etymology or PoS sections if there are multiple possibilities at the lemma (at least for English). I wish it were possible to extend my desired prohibitions and mandates to other languages. DCDuring TALK 18:25, 17 October 2010 (UTC)[reply]
Okay, I do understand that concern and it is a valid one. You are envisioning the worst case scenario, whereas I had in mind the likeliest situation (where the most common sense(s) are used as a gloss, and so most readers would find them appropriate; that would be misleading to some, but so is giving them nothing to go on at all). Clearly, we need to tread carefully, but I don't think we need to eliminate the possibility of all glosses in entries for forms. Dominic·t 19:30, 17 October 2010 (UTC)[reply]
I, for one, would happily discourage definitions in form-of entries except in certain rare cases; but I'm pretty sure there's no consensus for that. (Don't worry, I'm still working on getting the "dictator" priv. I'll get there someday.) But contrariwise, I'm not sure there's consensus for encouraging or promoting such definitions, either. We modified the CFI a while back to say that regular inflected forms do merit entries (it previously advocated them only for irregular ones), but it still says only that such entries "should indicate what form they are, and link to the main entry for the word": that is, it pointedly doesn't indicate what other sort of information they should or should not have. (It does preclude the most extreme views on each side — the one extreme being that a word like "carrots" needs no entry, the other being that it needs an entry just as complete as "carrot", with no distinction between lemmata and non-lemmata — but I haven't seen too many people advocating either of those extremes, anyway.) —RuakhTALK 17:56, 17 October 2010 (UTC)[reply]
Perhaps I shouldn't have mentioned definitions at all. A gloss of the lemma is not the same thing. It really only functions to prevent the reader from having to click over to the lemma entry themselves. As I said, this would be of even more use in FL entries, especially because some forms-of entries descend into absolute grammatical gobbledygook when you start having definition lines that consist of "Informal second-person singular imperfect subjunctive form of..." I admit I don't share the same resistance to non-lemma definitions, but even if you do, a simple translation of the lemma in the form-of entry to aid reader comprehension seems in order. Dominic·t 19:30, 17 October 2010 (UTC)[reply]
I see what you're saying, but I actually feel almost the reverse: I would rather that we keep the information in form-of entries to an absolute bare minimum, so that it's obvious to readers that they're supposed to click the link to get full information. Otherwise, our form-of entries are just horribly incomplete full entries. —RuakhTALK 22:29, 17 October 2010 (UTC)[reply]
I agree, though AugPi with his Latin entries seems to feel precisely the opposite. Mglovesfun (talk) 08:54, 20 October 2010 (UTC)[reply]

Planned vote: Deleting Wikisaurus slash-more pages

Planned vote: Wiktionary:Votes/2010-10/Deleting Wikisaurus slash-more pages.

Discussion for the vote, from September 2010: Beer parlour, Poll: Deleting "/more" pages from Wikisaurus.

Planned start date of the vote: 20 October 2010.

--Dan Polansky 07:48, 18 October 2010 (UTC)[reply]

An administrator shouldn't use its rules to ban Pinyin

Wiki has no rules to ban Pinyin entries. An administrator shouldn't use its rules to ban Pinyin. 91.104.17.51

You're talking bollocks, Tooironic isn't against pinyin, he's against toneless pinyin. Do you know that if you search for the toneless version it will find the toned version automatically (if it exists), because the search mechanism knows to substitute diacriticked letters for diacriticless ones? I have almost no knowledge of Mandarin, but I think it's analogous to the "the The THE" problem - that The and THE are attestable as alternative forms of the - at the start of a sentence or in a book title, for example. Mglovesfun (talk) 08:47, 20 October 2010 (UTC)[reply]
The point is he opposes Pinyin entries actually. If an administrator can ban toneless Pinyin entries based on his own rules today, he can also ban toned Pinyin entries tomorrow. 91.104.17.51
No, you are mistaken. There is consensus to allow pinyin with tones, and individual toneless pinyin syllables (see Wiktionary:Votes/pl-2009-12/Treatment of toneless pinyin syllables), but we do not include combinations of toneless pinyin syllables. Tooironic cannot (and does not) "ban toneless Pinyin entries based on his own rules"; (s)he's merely enforcing community consensus. —RuakhTALK 19:41, 21 October 2010 (UTC)[reply]
It is wrong, please see here 91.104.17.51 16:31, 22 October 2010 (UTC)[reply]
Can someone range block him again? This is becoming extremely tiresome. ---> Tooironic 22:14, 20 October 2010 (UTC)[reply]
123abc, if you're going to ignore my comments don't leave comments on my talk page. Since this is a written conversation, can you read what I've written above, and please reply to it? Tooironic, I don't know. Seems he's found a way of accessing a lot of IP addresses. I'm not the person to ask about that. Mglovesfun (talk) 22:28, 20 October 2010 (UTC)[reply]
123abc, if you were a little more honest you might get friendlier reactions here. From WT:RFD#Shengdanjie he Xinnian kuaile
"This is not a common phrase in Chinese, nobody says this, so it doesn't meet Wiktionary:Phrasebook. Furthermore, writing it in pinyin without tones isn't appropriate; for phrases that actually are common, it should either be written in pinyin with tones, or in characters. But anyway, as it's not a particularly common phrase, there's no use for this entry. Rjanag 04:25, 31 August 2010 (UTC)"[reply]
Clearly it's more than just Tooironic, Jamesjiao and Atitarev who think that toneless entries should not be allowed.
The point I'm making above, perhaps badly, is that if we move all the toneless entries to toned, the search system will still be able to find them. Use isn't irrelevant of course, but we don't need to document every mistake either. Mglovesfun (talk) 10:48, 21 October 2010 (UTC)[reply]
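
The search behaviour described above, where a toneless query still finds a toned title, can be illustrated by diacritic folding. What follows is only a minimal sketch in Python, assuming Unicode decomposition is enough to separate tone marks from base letters; the function name fold_diacritics is hypothetical, and this is not a description of how the MediaWiki search is actually implemented.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return text with combining marks (e.g. pinyin tone marks) removed."""
    # NFD splits a letter such as "è" into "e" plus a combining grave accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual composed form.
    return unicodedata.normalize("NFC", stripped)


def matches_toneless_query(query: str, title: str) -> bool:
    """True if a toneless query would find a toned page title."""
    return fold_diacritics(query).lower() == fold_diacritics(title).lower()
```

Under this sketch, a search for "kuaile" would match a page titled "kuàilè". Note that the folding also removes the diaeresis of "ü", so "lü" and "lu" would be treated alike, which a real search might not want.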

Not only is pinyin overemphasised here, there is also this selbstgefällig and amateurish sentiment from contributors to classify varieties of Chinese which are lexically highly convergent as separate languages and endow them with symbolic headings in Chinese entries, so as to demonstrate the fact that Wiktionary acknowledges and recognises their individual language status no matter what implications this may have in terms of semantic duplication and the hollowness and paucity of significance of the actual content. Pinyin itself is not and was never even a standard orthography of Chinese in any form, and the vast inclusion of pinyin entries here in the English Wiktionary is simply astounding. Transliterations do not warrant their own entries especially if the contents are outside the scope of the basic wordlist; a correspondence list between pinyin and the actual characters is acceptable if it assists readers who are unfamiliar with what written Chinese is actually like or are at the beginning stage of learning the language, but pinyin is not Chinese and adding explanation to the pinyin entry is just overly unnecessary. What is more preposterous is this whole pinyin-based Mandarin "suffix" category and its omnifarious subcategories which basically aim to include everything that is used in compound word formation as a suffix. Wow. Never knew that "-disease" in "heart disease" is also a suffix. 贻笑大方啊。Weijicidian 05:16, 24 October 2010 (UTC)[reply]

From w:Pinyin: "The romanization system was developed by a government committee in the People's Republic of China (PRC) and published by the Chinese government in 1958. The International Organization for Standardization adopted pinyin as the international standard in 1982." Not an official orthography? It even shows up on road signs. We also classify Scandinavian as four different languages, South Slavic as eight, the Iberian Romance languages as at least five. I can't imagine why we'd follow expert advice and give our users options like that.--Prosfilaes 08:50, 24 October 2010 (UTC)[reply]
Yeah, people tend to argue the opposite - that unifying Chinese would be misleading. Mglovesfun (talk) 09:37, 24 October 2010 (UTC)[reply]
Yes, traditional characters, simplified characters, toned pinyin and toneless pinyin all exist in the world. Entries in the different forms are a convenience for users. Wiktionary should be user-friendly, not user-unfriendly. So Wiki shouldn't ban pinyin entries, and shouldn't block users with a range block (91.106.0.0) either. 91.104.7.148 11:46, 24 October 2010 (UTC)[reply]

Categorizing "English" into "West Germanic languages"?

I believe this is a simple question, with possibly multiple replies: Do we want the entry English to be a member of a topical category restricted to West Germanic languages?

This question has been brought up multiple times: Category talk:ja:Sign languages, WT:RFDO#Category:fr:Constructed languages, WT:RFM#Category:All sign languages and similar categories and probably others. However, these discussions did not reach a conclusion. As I read them, they focus on particular individual categories and use the whole system of other categories as arguments, such as calling Category:Sign languages "overcategorization" because we can use only Category:Languages, or stating that Category:Turkic languages "seems topical".

Since my question is related to a whole categorization system, I may as well reword it, while keeping it unchanged in essence: Do we want the entries English, Portuguese, Chippewa and all other languages to be members of categories restricted to West Germanic, Romance, Algonquian or other families when applicable? --Daniel. 03:34, 21 October 2010 (UTC)[reply]

As almost all normal English-speakers don't think of English as being a West Germanic language, to restrict English to be in that category seems laughably perverse, especially on English Wiktionary. That languages should also be categorized into one or more families, OTOH, seems appropriate and useful. I have viewed our user-visible category structure as a user aid, to be used to create useful lists and groups of lists for users. The very notion of restricting such categorization on global principles seems utterly contrary to the idea of a wiki.
Also, do all languages really fit into a strict one-parent hierarchy? Finally, is the state of knowledge about languages so complete and beyond dispute as to make a definitive strict hierarchy for all languages possible?
If there is a need for categories to be restricted to a strict hierarchy for technical purposes, perhaps that should be done with hidden categories or even a completely distinct system. DCDuring TALK 09:50, 21 October 2010 (UTC)[reply]
If I understand you correctly, you are proposing to create new topical subcategories within the topical category Category:Languages. The question behind your stated question is not so much about the entry "English" in particular as about the granularity of subcategories of the topical category Category:Languages. If this estimate is correct, you should detail your proposal by stating what new subcategories you would like to see created, and under what names.
Category:West Germanic languages is not a topical category, and it is not a subcategory of Category:Languages; its parent is Category:Germanic languages, whose parent is Category:Indo-European languages, whose parent is Category:Language families, whose parent is Category:All languages, a root of some non-topical categories. Category:Turkic languages is also a non-topical category.
Until a proposal for the structure of new subcategories of Category:Languages comes, I think that the status quo of Category:Languages containing 1,161 languages is okay. --Dan Polansky 10:07, 21 October 2010 (UTC)[reply]

To reply to DCDuring's questions, there would probably be a category "Undetermined languages", or something similarly named, for languages whose family is not clear. Another such category, as I imagine it, would be "Language isolates".

Dan Polansky and DCDuring, above, gave good reasons to avoid subcategories of Category:Languages. Yair rand and Prince Kassad apparently agree with this lack of subcategories, as expressed in this other discussion. That makes four people. Let me make it five people, by joining you in your position. From

Dan, somewhere in your reply above you must have misunderstood me by thinking that I had one single proposal. What I had was a question, which, logically, may be read as two contrary proposals: yes or no? topical subcategories or no topical subcategories?

My interest from the start was cleaning up that mess. I was asked to make a boilerplate for families, I have ideas for how to improve the organization of language categories from now on, but that huge inconsistency between "Category:Sign languages" and "Category:West Germanic languages" obliges me to start my work by applying some logic to their titles.

Since apparently everyone (or most people?) is happy with a Category:Languages that contains all languages without being further subdivided into detailed categories, I am going to remove entries from Category:Sign languages, Category:Extinct languages, Category:Constructed languages, Category:Turkic languages and all other related categories that have entries, effectively removing their status of "topical categories". If anyone opposes this decision, please say so. I would be happy to further discuss it, if necessary. --Daniel. 23:20, 21 October 2010 (UTC)[reply]

  • You have jumped to a conclusion that does not follow from any statements above except yours. You not only have misrepresented what I have stated but are willing to act in haste based on your either willful or careless misreading. I explicitly rejected the notion that we need a simple hierarchy and advocated multiple category memberships if they might do users some good. To be even more explicit, I see no particularly good reason that English should not be in Category:Languages and Category:West Germanic languages and Category:Germanic languages and Category:Indo-European languages.
Could you please tell me how I could communicate more clearly with you to avoid such gross misunderstanding in the future? DCDuring TALK 00:14, 22 October 2010 (UTC)[reply]
In this case, it is me who should be less hasty and more attentive to others' opinions. I am sorry for suggesting the conclusion of this issue so early and unwisely. Your recent words are more than enough as reply to my "If anyone opposes this decision, please say so." I am reconsidering my thoughts and my approach; I am going to be quietly waiting for a decision to be achieved. As I said above, I will be happy to continue discussing it. As for your proposal of placing English into multiple categories, it may require a simple boilerplate such as maybe {{categorize language|gmw}} that preemptively knows what the superfamilies of West Germanic are. --Daniel. 00:55, 22 October 2010 (UTC)[reply]
OK. I don't know whether we have to go all-out with categorization. I really don't know whether it is important that English appear directly in each of the four categories I mentioned above, rather than, say two or three of them, or perhaps three or seven others as well or instead.
Our standard one-level-at-a-time presentation of members of a category tree is a silly way to display a category tree. Any decent print work would display multiple layers of the tree with attested and imputed ancestral languages. Displaying the tree structure with multiple levels at once, with node labels that users can understand and explicit display of what is important to them (ie, their own language or Wiktionary's language), seems to be the standard.
In contrast, our default is only a display of category membership. If we had some more graphical means of display that fit a lot of information on a screen and worked directly from our categories, it might be worthwhile to restrict categories to support such displays. DCDuring TALK 01:20, 22 October 2010 (UTC)[reply]
(unindent) Re "Dan Polansky and DCDuring, above, gave good reasons to avoid subcategories": I have merely said that the status quo of Category:Languages being a container for all languages is okay, good enough, in no need of change. If someone comes with a proposal for subcategories of Category:Languages, I am going to evaluate that proposal. --Dan Polansky 11:03, 22 October 2010 (UTC)[reply]
I can't be bothered reading all this, but the opening statement is all wrong. West Germanic languages isn't topical, and it isn't restrictive - you can add other categories as well! It's a bit like saying an entry can't be in English nouns and English proper nouns at the same time. Mglovesfun (talk) 11:17, 22 October 2010 (UTC)[reply]
"Category:West Germanic languages" could be a topical category given its name, but it is not; and as it is not a topical category, the entry "English" does not belong there. The root of non-topical language categories starts with "All " ("All languages"), which makes it clear that it is non-topical, but the subcategories of this root are not so clearly marked as non-topical in their name. Hence the confusion. --Dan Polansky 11:23, 22 October 2010 (UTC)[reply]
It seems now to me that the category structure would benefit from deleting "Category:West Germanic languages" and its cousins, as their names are confusing. Thus, "English" and "German" would be members of "Category:Languages", while "Category:English language" and "Category:German language" would be subcategories of "Category:All languages" but not of "Category:West Germanic languages", as that would be deleted with its parent "Category:Germanic languages" and its parent "Category:Indo-European languages". --Dan Polansky 11:31, 22 October 2010 (UTC)[reply]
Don't forget that these are also tied in with the etymology system we use. Cf. Category:West Germanic derivations, for example. -- Prince Kassad 13:16, 22 October 2010 (UTC)[reply]
Well I for one would not feel particularly sorrowful if Category:West Germanic derivations and its cousins were gone, but it is good that you point out that there is currently a dependence between Category:West Germanic derivations and Category:West Germanic languages. Nonetheless, it is possible to keep Category:West Germanic derivations while deleting Category:West Germanic languages: it suffices to remove the former from the latter. --Dan Polansky 14:09, 22 October 2010 (UTC)[reply]
Deletion of categories that have informational content because they create problems for the distinction some seem to be trying to maintain between topical and other categories seems to be a case of the tail wagging the dog. Perhaps the distinction is artificial or, at least, not to be taken too seriously. The obviously information-bearing categories for languages seem useful. Do categories have some enormous resource cost? Do they mislead or confuse users? Do they squander important screen space? DCDuring TALK 15:09, 22 October 2010 (UTC)[reply]
The current categories are prone to mislead and confuse users, given the inconsistency of names and objectives between Category:All languages, Category:Languages, Category:All sign languages, Category:Sign languages and Category:West Germanic languages.
One particular proposal that can help their distinction is creating Category:Sign language categories and Category:West Germanic language categories to contain the language categories; and, simultaneously, having Category:Sign languages and Category:West Germanic languages to contain the languages (that is, the entries defined as languages). --Daniel. 02:29, 23 October 2010 (UTC)[reply]
The sole obvious source of confusion is between all the pairs like Category:Languages and Category:All languages. This looks like a meaningless distinction to a normal human, I think, and probably to most contributors. If there are other confusions possible, I am among the confused. DCDuring TALK 20:12, 24 October 2010 (UTC)[reply]
I agree. As a contributor, I have already learned the distinction between Category:All languages and Category:Languages, but not from their names. --Daniel. 03:12, 25 October 2010 (UTC)[reply]
Suggestion: rename Category:All languages to Category:Languages by language. Compare similar pairs like Category:Adjectives and Category:Adjectives by language. -- Prince Kassad 21:45, 29 October 2010 (UTC)[reply]
That proposal does not follow the described pattern. I would expect "Languages by language" to link to categories naming languages in each language, or something similar. There may be a better alternative to our current name, but I don't believe this particular name proposal is suitable for the function. --EncycloPetey 06:22, 31 October 2010 (UTC)[reply]
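
The parent chain listed earlier in this discussion (West Germanic, then Germanic, then Indo-European, then Language families, then All languages), and the suggested boilerplate that "knows" the superfamilies of a family, can be sketched as a simple lookup. This is only an illustration in Python, assuming a strict one-parent hierarchy, which is itself questioned above; the PARENT table and the function name ancestors are hypothetical and do not describe any existing template.

```python
# Parent links copied from the chain described in this discussion.
# Assumption: every category has at most one parent.
PARENT = {
    "West Germanic languages": "Germanic languages",
    "Germanic languages": "Indo-European languages",
    "Indo-European languages": "Language families",
    "Language families": "All languages",
}


def ancestors(category: str) -> list[str]:
    """Return the chain of parent categories, nearest parent first."""
    chain = []
    while category in PARENT:
        category = PARENT[category]
        chain.append(category)
    return chain
```

With this table, ancestors("West Germanic languages") yields the four parents in order, so a boilerplate built on such a lookup could place an entry in several family categories at once, or only in the nearest one, depending on what the community decides.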

Why is the Template:hbo (Ancient Hebrew) listed at WT:LANGTREAT as a red link? I would prefer that page to contain only blue links to templates, if possible. --Daniel. 04:08, 21 October 2010 (UTC)[reply]

If you click on it, you'll see that we don't differentiate between Ancient Hebrew and Hebrew - we have {{etyl:hbo}} for Ancient Hebrew derivations. Ruakh deleted it and since msh210 hasn't objected (or anyone else, for that matter) I'm assuming it's a good decision. Mglovesfun (talk) 10:41, 21 October 2010 (UTC)[reply]
Ruakh has removed Hebrew from the list. It seems good enough for me. --Daniel. 01:36, 22 October 2010 (UTC)[reply]

Alternative spelling vs. wrong orthography

I have found two Czech words (and I suspect there are several others) which are marked as "alternative spelling" although they are not: they are orthographically wrong forms of the word (though common mistakes) and have never been alternative spellings in the past.

An alternative spelling is one that is allowed to be written as a correct form (and Czech allows several words to be written in two forms), but these words are definitely not that.

How do you mark wrong orthography here?

Thanks.

Danny B. 10:37, 21 October 2010 (UTC)[reply]

See {{misspelling of}} and {{nonstandard spelling of}}. The whole issue is a bit cloudy. Mglovesfun (talk) 10:39, 21 October 2010 (UTC)[reply]
One word involved in this discussion is "tchýně", a variant of "tchyně". The form "tchýně" is plentifully attestable and very common; it is not a misspelling resulting from a typo or something. It actually sounds more natural than "tchyně", to me anyway, as an analogue of "tchán". As a descriptivist lexicographer, I have no easy recourse to such things as "wrong orthography". Nonetheless, if a regulatory authority can be found that forbids the spelling, this can be mentioned in a usage note, right? I have placed such a usage note at dceřinná společnost.
The template {{nonstandard spelling of}} lacks documentation, and I don't know what it means. CFI in Wiktionary:CFI#Misspellings.2C_common_misspellings_and_variant_spellings does not help much in that regard. --Dan Polansky 11:33, 21 October 2010 (UTC)[reply]

Wiktionary is not a place to say "this is tolerable or not". That would be original research and/or POV. Even if there were hundreds of thousands of occurrences of such a form of the word, that doesn't make it correct orthography or an alternative spelling, unless the authorities say so.

Thanks for {{misspelling of}} - according to the documentation, that does what I needed. I'll be correcting the words I find with that one.

Danny B. 14:19, 21 October 2010 (UTC)[reply]

Quite to the contrary, the statement "A is not tolerable", one that you are making, is a non-descriptive one, and a point of view, in the sense of evaluation based on the evaluator's specific value attitudes and experience background. The statement that a form is plentifully attested and commonly used by the members of a language community is a descriptive one, easily verified by any Wiktionary lexicographer. A spelling that is intended by the speaker who used it is not a misspelling.
You should get acquanted with Anglo-American descriptivist lexicography. Your authoritarian continental attitude is at odds with English Wiktionary. --Dan Polansky 15:46, 21 October 2010 (UTC)[reply]
Here I go: "acquanted" in the previous paragraph is a misspelling, a spelling not intended by the author of the sentence. --Dan Polansky 15:47, 21 October 2010 (UTC)[reply]
Also note that Wiktionary has no established original research policies, and in fact OR is necessary in many cases. —Internoob (DiscCont) 19:34, 21 October 2010 (UTC)[reply]
Now that would be good to have in writing! —CodeCat 20:24, 21 October 2010 (UTC)[reply]
If we had to follow Wikipedia's OR policy, assuming that our quotations are primary sources of how words are used, we'd have trouble with "Any interpretation of primary source material requires a reliable secondary source for that interpretation." See also this discussion. —Internoob (DiscCont) 18:03, 22 October 2010 (UTC)[reply]
The idea of "no original research" is that a reader should be able to check everything from another source. For etymologies... Wikipedia rules should apply (except for obvious information). For word inclusion, attestation rules allow readers to check that the word is used. The main problem is with definitions: we must be able to include definitions for words not defined anywhere else, and this is interpretation not confirmed by a secondary source. I think that this should be allowed nonetheless, when the meaning is clear enough (but polemical definitions not confirmed by another source should be removed). Lmaltier 20:13, 4 November 2010 (UTC)[reply]

A synonym of itself in Wikisaurus

Here are two facts directly related to each other:

  1. The page Wikisaurus:man lists "man" as a synonym.
  2. The page Wikisaurus:woman does not list "woman" as a synonym.

I propose the negation of the second fact; that is, I would like to add "woman" to the list of synonyms of "woman". By extension, my opinion is that, whenever possible, the lists of synonyms of Wikisaurus should be able to duplicate the title of the page like in my first example. --Daniel. 01:16, 22 October 2010 (UTC)[reply]

I absolutely agree: "woman" should be listed as a synonym at Wikisaurus:woman. This has been my practice in most Wikisaurus pages. --Dan Polansky 05:49, 22 October 2010 (UTC)[reply]
Why? How is this a benefit? Equinox 15:37, 23 October 2010 (UTC)[reply]
Some pages are not titled as an entry word would be, such as Wikisaurus:beautiful woman, while others do have an associated entry by the same title. So, for that reason alone, it would be good to include an entry when it exists. Secondly, a user may be sent to a Wikisaurus page from an entry other than the titular one, and would miss basic information if we did not include the most obvious synonym with the collection of included words. --EncycloPetey 15:43, 23 October 2010 (UTC)[reply]
A set of synonyms in Wikisaurus does not stand for a particular headword; each set stands for a particular sense. WS:stingy stands equally well for "stingy", "miserly" and "niggardly", so it makes sense that all are listed in the list of synonyms for the given sense. A Wikisaurus entry should be complete even if someone decides to move the entry from WS:stingy to "WS:sense612" or to "WS:miserly": it should be headword-independent. --Dan Polansky 05:31, 26 October 2010 (UTC)[reply]

Linking APL "⍺" with others

The character is based on the Greek script "α" (alpha); I added this information to its etymology now.

That APL version is restricted to a computer language, therefore it is not in the entry namespace. Nonetheless, I propose adding it to the list of Appendix:Variations of "a". --Daniel. 10:31, 22 October 2010 (UTC)[reply]

Sounds good to me.​—msh210 (talk) 07:30, 24 October 2010 (UTC)[reply]
Then, done. --Daniel. 18:25, 5 November 2010 (UTC)[reply]

Double-check the Wiktionary editions, please

I have recently reorganized all the links from language categories to Wiktionary editions.

I have developed a pattern for how to deal with each language, but I do not speak most of them, so I am not confident over all my decisions. Please double check them, and point out incorrect links if necessary.

Thanks in advance. --Daniel. 15:53, 22 October 2010 (UTC)[reply]

Moldavian Wiktionary should not be linked to, because it's a closed wiki and cannot be edited. -- Prince Kassad 16:10, 22 October 2010 (UTC)[reply]
OK. I edited {{wiktionary edition}} to remove the link to the Moldavian Wiktionary. --Daniel. 02:14, 23 October 2010 (UTC)[reply]

Difference between phrase, phrasal POS, idiom and POS with idiomatic context

We have five (maybe more) different ways of categorising and displaying terms made up of multiple words. Either we can:

  1. Use a regular POS and treat it like a single-word lemma
  2. Use the POS header 'Phrase' and categorise it in 'Language phrases'
  3. Give it a proper POS and categorise it in 'Language phrasal POSs'
  4. Use the POS header 'Idiom' and categorise it in 'Language idioms'
  5. Give it a proper POS and use {{idiomatic}} to categorise it in 'Language idioms'

The difference between these possibilities is not at all clear to me. When is something idiomatic? Is a verb like give up an idiom, or is it a phrasal verb? Or maybe it's just a verb? When to use an 'Idiom' header and when to use a POS header with an idiomatic context? The overlap between these categories is often so high that I struggle to pick the right one for many entries. Usually I go for the easy option and pick option 1. So, are there any good and clear guidelines to be followed here? —CodeCat 20:58, 22 October 2010 (UTC)[reply]

For English my idiosyncratic approach is to prefer a regular PoS (traditional + Determiner) (or Proverb or Prepositional Phrase) first. For a term that crosses those categories, I choose Phrase. For more specific grammatical categorization, there are Category:English sentences, Category:English predicates, Category:English non-constituents, Category:English coordinates, and others. English has few Idiom headers left. Complications include contractions, which I often leave to others.
I treat Idiom as a sense-level phenomenon, using the {{idiom}} to autocategorize.
I don't pretend to know what should happen in other languages. DCDuring TALK 00:21, 23 October 2010 (UTC)[reply]
BTW, the leading English grammar, CGEL, is hostile to the idea of phrasal PoSes, noting that such items do not really behave as a corresponding one-word PoS, admitting modifiers and coordination, for example, in many cases. DCDuring TALK 00:27, 23 October 2010 (UTC)[reply]
Re: CodeCat - I also prefer option 1, and only use 5 if the meaning is idiomatic. For many (most?) multi-word terms, the combination does not merit an "idiomatic" tag, in my opinion. I would call bring owls to Athens idiomatic, since it's not actually about owls or Athens. But I wouldn't call give up idiomatic, since it does pertain to senses of give. Past discussions have disfavored "Phrasal x" or "X phrase" as a header, although recent discussions have favored adding "Prepositional phrase" as an option. --EncycloPetey 02:02, 23 October 2010 (UTC)[reply]

Reflexive verbs

As of right now, many languages have entries for a base verb and then a second entry for the reflexive verb derived from it. However, I have also seen many entries with just a sense beginning with {{reflexive}} and no separate entry. I am wondering which of these two approaches is preferred.

A related issue is that, given that a separate reflexive entry exists, how to handle form-of pages. For example, take the Catalan verb passar-se. Its conjugation table contains links to the forms of the base verb. This is fine, because they are exactly identical. However, in many languages, there are a few verbs that have no non-reflexive equivalent, such as revenjar-se or zich vergissen. In that case, this approach fails, because you'd get form-of entries that say 'xxx form of revenjar' which is useless because that verb does not actually exist. The alternative, which is to have separate form-of entries for all combined verb+reflexive pronoun forms, seems like a bit of a waste of effort. So, what way can we handle this in the affected languages? —CodeCat 12:50, 23 October 2010 (UTC)[reply]

The tool for "assisted" adding of translations encourages the adding of reflexive phrases like "zich vergissen", which later leads to their creation as separate entries. I'm not fully comfortable with this, since I prefer to use {{reflexive}} inside the main article for the verb. --LA2 04:37, 22 November 2010 (UTC)[reply]

Category for cellular automata?

Somebody created Category:Demoscene for me a while ago. I now wonder whether cellular automata deserve a category. (On the off-chance that there are any non-nerds here: these are mathematical curiosities consisting of patterns that spread around a grid according to sets of rules.) We have breeder, glider, gun, rake, puffer, oscillator, spaceship, spacefiller, blinker, and still life: there are certainly more terms, but many would not meet our attestation criteria. Equinox 02:11, 24 October 2010 (UTC)[reply]

Alternatively, you could just list all the attestable names for cellular automata in Coördinate terms sections. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:08, 24 October 2010 (UTC)[reply]
speed of light... yes, there are certainly more attestable terms. DAVilla 10:03, 27 October 2010 (UTC)[reply]

In reaction to previous debates, I've created this. The idea is that British English (English used in the UK) should not be in the same category as British spellings (a spelling norm system with the word British in its title). Paper dictionaries tend to use UK for both, but of course, paper dictionaries don't categorize, either. Thoughts? Mglovesfun (talk) 12:12, 24 October 2010 (UTC)[reply]

I think I know what you're talking about, but can you elaborate and give some examples and explanations of each? Equinox 21:38, 25 October 2010 (UTC)[reply]
Well, realise isn't restricted to the UK, it's used in English all over the world. But it is a British spelling. Snicket certainly isn't a British spelling, but it is used in the UK. It's misleading to have two different concepts in the same category. This solution would work, just keeping the two categories consistently separate would be a nightmare. But, how would that be different from many of our categories? Mglovesfun (talk) 09:55, 26 October 2010 (UTC)[reply]

There is no consensus that British English = “English restricted to the United Kingdom of Great Britain and Northern Ireland,” so there is no clear need for a template that defines “British spelling” as something else. Furthermore, using British to mean two different things in two nearly-identical labels will just cause confusion and more arguments.

The vast majority of British English (the set of vocabulary, spellings, pronunciation, and usage) is not restricted to the United Kingdom, nor even to Britain. It has a centuries-old world-wide history, and predates the United Kingdom itself. Elements of it are propagated in Irish, Indian, Australian, South African and other world Englishes. Elements originating in or characteristic of British English must be labelled British if they are to be distinguished from (North) Americanisms.

Yes, there are many British elements with much broader or narrower occurrence, but virtually none of these corresponds exactly to the physical, political, or chronological bounds of the multinational political entity, the United Kingdom. There is no “UK English.” (And using the label UK in the dictionary is factually incorrect, and misleads readers. A few professional dictionaries do so for brevity or political correctness, but most use British.)

And if we're to categorize spellings separately from and in parallel to vocabulary, we really need a clearer and more detailed proposal. This would be an innovation above and beyond how professional dictionaries (not just “paper”) label regionalisms. Are you proposing creating {{Canadian spelling}}, {{Scottish spelling}}, etc, and how would these be applied and categorized? Michael Z. 2010-11-17 19:59 z

As far as I know, there is no Canadian or Scottish spelling system for English. Of course there is UK English, I live in the UK and I speak UK English. Your argument is that 'other dictionaries only use British, so so should we'. We're not limited by what other dictionaries do, also paper dictionaries don't categorize so this debate would be wholly irrelevant. Plus, I don't see how British English has influenced South African English, Indian English (etc.) but not American English. Your 'goal' seems to be to cause as much confusion as possible by having British spelling of internationally accepted words like realise in with regionalisms like snicket and ginnel. Mglovesfun (talk) 19:23, 25 November 2010 (UTC)[reply]
BTW I found Wiktionary:Beer parlour archive/2010/March#Why British spellings are not British words. Mglovesfun (talk) 19:25, 25 November 2010 (UTC)[reply]
Huh? Of course there's Canadian spelling! We don't spell according to either the British or US systems. There are several good dictionaries documenting it (like the Canadian Oxford Dictionary), and a rather good little book specifically about Canadian spelling (as opposed to, say, Canadian English in general), Organizing Our Marvellous Neighbours: How to Feel Good About Canadian English.
So you think using different labels than other dictionaries and inventing arbitrary, undocumented conventions wouldn't fall under the general umbrella of causing as much confusion as possible? I honestly still do not understand exactly what your labelling convention is supposed to mean, or how readers are supposed to know that the label British means “according to the British spelling convention, but not necessarily a British regionalism.” Michael Z. 2010-12-02 23:16 z
Since you're the only person who doesn't, I don't really care. Mglovesfun (talk) 23:29, 2 December 2010 (UTC)[reply]
You're clearly a very smart guy, so it's hard for me to give you credit for not understanding. The main thing this debate has brought up is that people don't care a great deal. As I see it:
  1. No-one's saying that linguists don't use the term 'British English'
  2. I'd like to see proof however, that linguists use it to mean both English used in the UK and British spellings, with no discrimination between the two
  3. Per Ruakh in a previous discussion, you can write in American English using British spellings, e.g. "while walking down the sidewalk I realised my favourite pants had been torn"
  4. Since paper dictionaries don't categorize, the 'other dictionaries' argument you're presenting does not cover categorization, only what the templates display

Mglovesfun (talk) 08:09, 3 December 2010 (UTC)[reply]

Huh? Can we just make appendix pages for any fictional character now? Equinox 12:46, 25 October 2010 (UTC)[reply]

The title of this discussion is linked to "Rurutie", which is a group of fictional characters. Yes, I support the creation and maintenance of pages to define groups of characters, such as Order of the Phoenix, Team Rocket, Green Lantern Corps, Klingon, Redpill and persocom. --Daniel. 17:49, 25 October 2010 (UTC)[reply]
I am very strongly against this use of a general-purpose dictionary for fanwank. Does anyone else care? Equinox 21:26, 25 October 2010 (UTC)[reply]
Yes, it's crap and should be removed (or made into a single-page appendix). SemperBlotto 21:34, 25 October 2010 (UTC)[reply]
There are two distinct concepts: [1] the criteria for inclusion of fictional words (covered at WT:FICTION), and [2] how to format them (not formally covered as of yet). The possible proposal "let's remove it or reformat it" concerns both concepts but fundamentally does not provide an objective solution. One possible implementation of your (Jeff's) suggestion would be moving Rurutie to a single-page appendix (regardless of how that differs from the current method and what the benefits are), since removing the attestable definition would be contrary to the relevant policy. --Daniel. 23:45, 25 October 2010 (UTC)[reply]
I don't think "Rurutie" is attestable, actually. --Yair rand (talk) 22:27, 25 October 2010 (UTC)[reply]
None of this stuff Daniel has been adding is attestable: they are all terms from specific (and sometimes rather niche) pop-culture franchises. I don't feel that the Wikt appendixes should be free for anyone to add their favourite TV series or whatever. Let's try to be a moderately professional project. Equinox 23:05, 25 October 2010 (UTC)[reply]
If any definition is not attestable, then it should be deleted from Wiktionary. That is conceivably not the case for most or all of the current entries that are in appendices. --Daniel. 23:45, 25 October 2010 (UTC)[reply]
I don't see the value in these appendices. They're gonna be almost impossible to find as they shouldn't be linked to from the mainspace, with a few exceptions. Mglovesfun (talk) 09:53, 26 October 2010 (UTC)[reply]
I thought the consequence of our early decisions in this area was to put such items in one Appendix per universe. There is precedent for such treatment in the appendices for the undoubted real-world phenomenon of unattested (and unattestable) military slang. Apparently the decision about fictional-universe items was not worded explicitly enough to prevent the creation of single-item appendices. As a result the matter is now to be brought to a vote. I, for one, don't really think a vote should be necessary.
To contribute to a wise decision, the discussion should now be at Wiktionary talk:Votes/pl-2010-10/Disallowing certain appendices. DCDuring TALK 11:40, 26 October 2010 (UTC)[reply]

What is the difference between "quotation" and "citation"?

Both words citation and quotation are used in different places, including the Quotations header at entries and the Citations: namespace. In addition, there are two policies: Wiktionary:Citations (how to use the namespace) and Wiktionary:Quotations (how to format the citations).

I believe I don't have enough English knowledge to properly differentiate between the two words, and Wiktionary doesn't help, by defining them as synonyms at quotation and citation.

If possible, can we simply call them "citations" as the standard name at every relevant place (i.e., at headers, the namespace and policies)? --Daniel. 17:01, 25 October 2010 (UTC)[reply]

I would say no, because the distinction is useful. On Wiktionary, a quotation appears within an entry. A citation appears on the accompanying citation page. Otherwise, there is no difference on Wiktionary. Both terms have additional meanings in the standard language. A quotation can refer to a repeated or printed version of something said (without source identification), and a citation can refer to the source identification (without the actual text being quoted). --EncycloPetey 04:51, 26 October 2010 (UTC)[reply]
"Quotations" is the better word for the principal namespace header because it is a word whose most common meaning is the best fit with ordinary user understanding IMO. Once the user is on a "Citations" page, "citations" emphasizes that we are interested only in cited quotations, as opposed to "quotations", which could be interpreted by normal users as what we call "usage examples". In reviewing entries I see evidence of user misunderstanding about what to call what we want as evidence for attestation: users often insert cited and uncited quotations under headers like "Examples" and "References".
In the user-visible content of principal namespace we do not have the luxury of being able to ignore actual usage. In such areas as template naming we sometimes can adopt a more prescriptive stance. DCDuring TALK 11:59, 26 October 2010 (UTC)[reply]

Shouldn't this be removed? It contradicts the paragraph above on inflected forms. It's outdated; nowadays we use inflecto-bots like SemperBlottoBot (talk · contribs), KassadBot (etc.). Furthermore redirects aren't better than red links, as redirects give a false impression that the entry exists. Furthermore redirecting from amabant to amo (example), for most people, is gonna cause mass confusion: very few people will understand why they have been redirected there. Mglovesfun (talk) 09:46, 26 October 2010 (UTC)[reply]

To end all the useless chatter, I actually started something. Feel free to look through it and comment. -- Prince Kassad 15:00, 28 October 2010 (UTC)[reply]

Seems like an excellent initial criterion. Do the different phrasebooks have to use absolutely identical wording? DCDuring TALK 17:01, 28 October 2010 (UTC)[reply]
Discussion moved to the vote's talkpage.​—msh210 (talk) 17:09, 28 October 2010 (UTC)[reply]

Treatment of toneless pinyin other than syllables

Mglovesfun has created Wiktionary:Votes/pl-2010-10/Treatment of toneless pinyin other than syllables. I have posted some comments on the talk page of the vote. The vote is scheduled for tomorrow; it would be better to postpone it. --Dan Polansky 12:46, 30 October 2010 (UTC)[reply]

Traditional characters, simplified characters, toned Pinyin and toneless Pinyin all exist in the world. Entries for the different forms are a convenience for users. Wiktionary should be user-friendly, not user-unfriendly. So Wiki shouldn't ban toneless Pinyin entries. 91.104.62.166 13:05, 30 October 2010 (UTC)[reply]
Yes, it should be postponed. 91.104.62.166 15:52, 30 October 2010 (UTC)[reply]

Jyutping syllable

Seems that a lot of our entries with non-standard headers are these. I don't know any Cantonese. I'd guess it's like Pinyin syllable, so it should be valid, right? Mglovesfun (talk) 13:01, 31 October 2010 (UTC)[reply]

But do native Cantonese speakers use Jyutping? This is the reason we accept Pinyin and Romaji (but, for example, no Revised Romanization). -- Prince Kassad 20:17, 31 October 2010 (UTC)[reply]
Actually, these syllables are not really "words", but rather phonetic units that may be any of several words or word parts (represented by Chinese characters). I believe they are included merely so that one can look up a character knowing only its pronunciation, and to list homophones. – Krun 12:12, 7 November 2010 (UTC)[reply]
What about pinyin syllables, same thing? Mglovesfun (talk) 12:16, 7 November 2010 (UTC)[reply]

Colloquial, formal, informal etc.

Modern Greek has two words for wine (κρασί & οίνος). οίνος is described in dictionaries as "learned", and this seems to be just right: it is a word which is used on wine labels in order (I feel) to add an element of class or sophistication. κρασί, on the other hand, is the "everyday" word, but I feel there should be a better term; "colloquial" implies some element of informality and would be wrong. Has anyone got a suitable label which I could use? —Saltmarshαπάντηση 19:46, 31 October 2010 (UTC)[reply]

If it's the normal word, then it probably doesn't require any label at all. —RuakhTALK 20:23, 31 October 2010 (UTC)[reply]
If we regard {{formal}} and {{informal}} as exhaustive of possible contexts, then if κρασί is unsuitable in formal contexts, it is necessarily informal, and as such should get the {{informal}} tag. Only if κρασί were limited to spoken use (including written dialogue) would a {{colloquial}} tag be appropriate. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 18:27, 1 November 2010 (UTC)[reply]