Wiktionary:Beer parlour

Wiktionary > Discussion rooms > Beer parlour

Lautrec a corner in a dance hall 1892.jpg

Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

Beer parlour archives edit
2002
December
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017


December 2016

Fifth LexiSession: giftEdit

Monthly trend topic is gift. This time, it was suggested by Cbyd on LexiSession Meta page. So, if you want to participate in this common task, let's try to discover what can be gathered around the word gift. Maybe not a thesaurus this time but verbs and related terms. I have no idea, it's a collaborative experiment with no guide no direction. You're free to participate as you like and to suggest next months topic. Hope there will be some people interested by this   Noé (talk) 14:45, 1 December 2016 (UTC)

I actually added gift-giving as an idea for a focus week over at FWOTD before even seeing this; fits in nicely :) — Kleio (t · c) 11:05, 2 December 2016 (UTC)
Hmm, in English to gift includes genetic or social inheritance, e.g. "Father's bequests were immaterial, gifting us with red hair, fiery temperaments, and poverty." - Amgine/ t·e 16:38, 2 December 2016 (UTC)
See also German Gift. But I think the English examples are just irony. An example with a noun would be "The gift I got for Christmas was a black eye." --WikiTiki89 16:43, 2 December 2016 (UTC)

Translations of placesEdit

I am curious to hear the current opionions/policies regarding translations of proper noun place names, with the example being Navarre and these translations added by a single anon. - Amgine/ t·e 03:26, 4 December 2016 (UTC)

The same as any other translations (or entries)- usage or an external reference if it's a limited documentation language. DTLHS (talk) 03:29, 4 December 2016 (UTC)
I may have been too abstruse: the user added translations in 69 languages. It seemes likely the user is copying from another reference work. How would one attempt to determine if the copying is a copyvio if we do not know the origin? Is there a policy directly addressing translation copyvios? WT:COPY would seem to guide, and it suggests raising the question in fora, like here. - Amgine/ t·e 03:41, 4 December 2016 (UTC)
It seems more likely they were copied from the various Wikipedias. DTLHS (talk) 03:42, 4 December 2016 (UTC)
There is an Egyptian Arabic wikipedia? <weirds> - Amgine/ t·e
@Amgine: Arabic is a pretty broad language family--moreso than German, less so than Chinese. The Egyptian Arabic (Masri) Wikipedia was announced at a Wikimania in Egypt. And a list of facts can't be copyrighted, so it's not a problem for anyone to copy them. No one can "own" a list of U. S. state capitals. —Justin (koavf)TCM 05:14, 4 December 2016 (UTC)
<nods> I am aware, barely, of the semitic groups. Being more knowledgeable of online communities than linguistics, I was surprised by there being a large enough active community to maintain a wikipedia, and that they had created an article on arz:Navarre. - Amgine/ t·e 17:06, 4 December 2016 (UTC)
Not under that name, obviously. It's at w:arz:نافارا. —Aɴɢʀ (talk) 18:38, 4 December 2016 (UTC)

Splitting MonguorEdit

Monguor, spoken by a Mongolic minority in Qinghai and Gansu provinces of China, is traditionally considered a single language with two macro-dialects: Huzhu and Minhe Monguor, henceforth Mongghul and Mangghuer. However they have been treated as two languages in most recent literature:

"Mongghul, or Huzhu Mongghul, is, together with (Minhe) Mangghuer, generally
referred to as ‘Monguor’ in the specialist literature. The Chinese nomenclature subsumes
the two populations and their languages under the designation Tu or Turen ‘Local
People’, and assigns only dialect status to the two varieties. Linguistically it is, however,
clearly a question of two separate languages. The traditional name Monguor, which is
nothing but a transformed shape of *monggol, is, strictly speaking, not justified for
Mongghul, since the syllable-final sound change *l > r characterizes, apart from
Mangghuer, only part of the dialects of the Mongghul language, notably the Naringhol
(more exactly, Narin ghuor) dialect. The shape Mongghul, on the other hand, is based on
the Halchighol (Halqighul) variant, which is territorially more widespread, has more
speakers, and is the basis of a newly created literary language."

~Stefan Georg in The Mongolic languages (2003)


"Mangghuer, or Minhe Mangghuer, is spoken in Minhe Hui and Tu Autonomous County,
at the extreme eastern edge of China’s Qinghai Province, just north of the Yellow River.
"Mangghuer has not usually been described as a language in its own right. Rather, it has
been treated as one of the two main dialects of the ethnic ‘language’ spoken by the
official ‘Monguor’ (Tu) nationality, the other dialect being (Huzhu) Mongghul, spoken
mainly in Huzhu Tu Autonomous County, also in Qinghai. However, Mangghuer speakers
and Mongghul speakers alike report that they are unable to understand each other. While
no comprehensive study of the differences between these two linguistic systems has been
undertaken, it is fairly clear that they are different enough to warrant independent treat-
ment. Since the two speech communities are not geographically contiguous, this ought
not to be surprising."

~Keith Slater in The Mongolic languages (2003)

If you want to see the difference yourself, check this paper out, it is pretty drastic, especially phonotactics. Neither of them is featured in print much, but Mongghul is a legit literary language endorsed by the Chinese authorities, with an orthography based on pinyin; while for Mangghuer "there exists an unofficial orthographical norm", which developed naturally due to Mangghuer phonology being quite similar to Mandarin phonology.

I don't know if we need to have a vote on this, but here's my case. Crom daba (talk) 09:17, 4 December 2016 (UTC)

How would we implement this split practically? The code for Monguor is mjg. Would we keep that code for one variety (presumably the one with the larger published corpus, if that make sense in this case) and devise a new Wiktionary-internal code (presumably xgn-something) for the other? Or should we deprecate mjg and devise two new codes, e.g. xgn-hmo and xgn-mma? Could we use mgj as the basis for the new codes, and call them something like mgj-huz and mgj-min? These questions aren't necessarily for Crom daba, but more for users with experience in creating new codes, like CodeCat and -sche. —Aɴɢʀ (talk) 09:38, 4 December 2016 (UTC)
We don't have much Monguor material here so splitting it either way won't be much of a headache. I'd advise for keeping mjg for Mongghul and perhaps using mjg-min for Mangghuer. Crom daba (talk) 16:47, 4 December 2016 (UTC)
The codes should start with xgn. Pinging @-sche because they haven't commented yet. Personally, I'm on board with the split. —Μετάknowledgediscuss/deeds 04:48, 7 December 2016 (UTC)
On a related note, I see that we already have the most obscure of the Mongolic languages, Kangjia. May someone with access to Module:languages/data3/k mark it as Mongolic (xgn). Crom daba (talk) 03:07, 5 December 2016 (UTC)
  Done. By the way, Requests for moves, mergers and splits is where we usually discuss language treatment, though it sometimes gets discussed here, too. Chuck Entz (talk) 04:25, 5 December 2016 (UTC)

Romance reflexive verbsEdit

How are we supposed to treat them? I've seen usage all over the place. See recoucher, an apparently reflexive-only verb with a non-reflexive conjugation table that isn't linked to from se recoucher. Then we have the mention in the non-reflexive form and also the page itself, as in atrofiar and atrofiarse. I've been avoiding adding reflexive verbs because they're such a mess. Not to mention the translation tables that reflect these inconsistencies. Ultimateria (talk) 16:45, 5 December 2016 (UTC)

It depends on the language. In Spanish, reflexive verbs are one word so it makes sense to treat them as such. But in French or Catalan, they're two words, so it makes more sense to treat them as reflexive senses on the base verbs. The latter is how we treat them for Dutch as well, see vergissen. —CodeCat 16:49, 5 December 2016 (UTC)
Italian reflexive verbs are, of course, handled perfectly - even having proper conjugation tables. See lavarsi as an example. SemperBlotto (talk) 16:50, 5 December 2016 (UTC)
Part of the problem is that "reflexive" is used to mean many different things. In Spanish there are "pronomial verbs", some of which are purely reflexive (must always use the reflexive pronoun), some which can be reflexive or not depending on context, some verbs which are "quasireflexive" and are used in passive constructions. DTLHS (talk) 18:16, 5 December 2016 (UTC)
Recently, generally, with my Spanish reflexive verbs, I generally add a reflexive tag to a defn line. And link, for example, documentarse to documentar. But if something is only used reflexively, I give it its own lemma, like autolesionarse. --Derrib9 (talk) 18:21, 5 December 2016 (UTC)
I like the idea of keeping the reflexive tag in the lemma unless it's reflexive only. So in that case you just create the page in the [infinitive]se form with something like "reflexive form of [inf]" and a conj table? Then link to it in the lemma as a...related term? Alternative form? Ultimateria (talk) 14:11, 7 December 2016 (UTC)
Will the reflexive infinitive still receive its own inflection table? —CodeCat 15:04, 7 December 2016 (UTC)
I think we discussed this a long time ago; I'm in favor because although it's a logical combination of pronouns and verb forms, it's still helpful for language learners (plus things get tricky in commands) (and look at the cool pronunciation feature of the French ones!). The question becomes where does the conjugation belong. Look at cansar. It has a reflexive sense, a reflexive conj table, and a link to cansarse. I also like that cansarse is mentioned as "reflexive of cansar" followed by a description. However I also think it makes sense for cansarse to have the same table. The other thing is that unless the label in cansar says |reflexive|cansarse}}, the conj table labeled "conjugation of cansarse" doesn't mean much. Once again I'm thinking of the average American high schooler forced to take Spanish who doesn't have a clue. The struggle was real. Ultimateria (talk) 16:10, 7 December 2016 (UTC)

Article creation wizard discussionEdit

Hi all,

This proposal suggests WMF to fund the writing of an article creation wizard at Wikipedia, but with enough interest it may -- in my personal view -- be expanded to write an article creation wizards framework or library for use at non-Wikipedia wikis, such as here. If desired, please join the discussion before December 12.

  • What tools do we use here, now, to make article creation easier for newbies?
  • What requirements do we have for a potential implementation?
  • How would you like to inform the people of the article creation perks and difficulties on this wiki?
  • What else needs to be considered?

Thanks. --Gryllida 03:47, 6 December 2016 (UTC)

Judging by the average quality of new English entries and requested entries, our bigger need is for quality-improvement processes and tools to make English articles better. But non-English entries may be helped. DCDuring TALK 12:50, 6 December 2016 (UTC)

Unavailability.Edit

I'm going to be unavailable until December 14. Normally I'd joke that I'd like to see the dictionary finished before I get back, but this time I'd like to make an earnest plea for us to spend some time focusing on the projects of our departed friend User:Robert Ullmann, particularly User:Robert Ullmann/Missing, User:Robert Ullmann/Oldest redlinks. Cheers! bd2412 T 03:54, 7 December 2016 (UTC)

Not really related, but I have spent a lot of effort cleaning up after Robert Ullmann's Swahili contributions. The largest remaining technical hurdle is redoing his conjugation templates from scratch; if anyone Lua-competent wants to help me do that, I would appreciate it. (This may have to wait a couple weeks because of my obligations IRL, however.) —Μετάknowledgediscuss/deeds 04:45, 7 December 2016 (UTC)
Also much appreciated. Robert Ullmann left a great legacy here, and I would like to see all of his projects completed. bd2412 T 20:28, 14 December 2016 (UTC)

English present participlesEdit

I have noticed that even though they are verb forms, this is deliberately not recorded, for example having a wank. Is there any particular reason for this? I can't think of one, and prefer to record them as verb forms (in line with other verb forms) as well as present participles. DonnanZ (talk) 12:28, 7 December 2016 (UTC)

And the silly thing is, because the template used is "head|en|present participle", it is futile adding "nocat=1", it is still recorded as a present participle. DonnanZ (talk) 12:37, 7 December 2016 (UTC)

Participles are verb forms, so they are categorised as verb forms, just indirectly. —CodeCat 15:05, 7 December 2016 (UTC)
Since the main point of using a headword template is categorization, |nocat=1 would be silly. {{head}} allows for extra categories, so you could simply add "verb form" as one, I suppose. Chuck Entz (talk) 15:12, 7 December 2016 (UTC)
I don't think there is a point to this. I'm not aware of any language where the normal practice for participles is to categorise them as verb forms and participles. —CodeCat 15:13, 7 December 2016 (UTC)
The problem seems to be inconsistency, as "head|en|verb form" is used with {{en-third-person singular of}}, {{en-past of}} {which also records past participles), and{{en-simple past of}}; but not {{past participle of}} or {{present participle of}}. Forgive me if I'm wrong but this seems to be a CodeCat decision. All that needs to be done is alter "head|en|present participle" to "head|en|verb form" and remove |nocat=1 in every case. DonnanZ (talk) 15:51, 7 December 2016 (UTC)
But it's not an inconsistency, because those aren't participles. As long as CAT:English participles is a subcategory of CAT:English verb forms (and it is), the individual participles don't need to be in the "verb forms" category directly. —Aɴɢʀ (talk) 15:54, 7 December 2016 (UTC)
Use of {{en-past of}} makes a mockery of that idea, as it records entries in both Category:English past participles and Category:English verb simple past forms. DonnanZ (talk) 16:12, 7 December 2016 (UTC)
{{en-past of}} encompasses both past simple and past participle, so categorizing in both Category:English past participles and Category:English verb simple past forms makes sense. Compare grown, which is only categorized as a participle. Crom daba (talk) 16:40, 7 December 2016 (UTC)
Of course it makes sense, they are also recorded in Category:English verb forms, which doesn't happen with other participles with the present set-up. DonnanZ (talk) 16:51, 7 December 2016 (UTC)
I see what you mean. Perhaps participle categorization should only take place in definition templates rather than headword templates.
On the other hand I'm guessing the only reason you care is because our category browsing display can't show a union of subcategorized entries, and I'd like it if our communal anguish grew great enough for someone to implement this rather than constantly making workarounds (our recent derived/borrowed categorization change is another example of this.) Crom daba (talk) 17:56, 7 December 2016 (UTC)
I agree about the latter point, that's a real mess too, with countless terms listed as requiring etyl cleanup. Not a lot of joined-up thinking. DonnanZ (talk) 21:00, 7 December 2016 (UTC)
Petscan may interest you. It's a WikiLabs project, it can do most set operations, works for Wiktionary and has a nice interface. —Enosh (talk) 11:32, 9 December 2016 (UTC)
It looks as though the only way of dealing with participle entries is to do it yourself. There's only (!) 25,000-odd present participles. As for etyl cleanup, that's being left to the dogsbodies and could take years. DonnanZ (talk) 14:59, 8 December 2016 (UTC)

Deprecated labelsEdit

We have a number of deprecated labels (see: Category:Entries_with_deprecated_labels). I don't know that we have a process for deprecating labels, so I am not sure how they came to be deprecated. Some of the deprecated labels have suggested replacements (and I have replaced most or all of those). The ones which do not have a suggested replacement (singular, plural, ordinal [and really cardinal, but there is a typo]) probably should have. The questions are: 1. how should labels be deprecated? 2. should the currently deprecated labels be deprecated? 3. what should be the replacements for the four remaining deprecated labels? - TheDaveRoss 21:49, 9 December 2016 (UTC)

Much confusion about {{given name}} in parameter |from=Edit

As seen in Category:English male given names, the parameter |from= can be:

  1. Name of a language (Category:English male given names from Ancient Greek)
  2. How the name is generated (Category:English male given names from coinages‎)
  3. The place of origin instead of the language name (Category:English male given names from India‎)

The editor who included that functionality is already inactive, so the decision can be made by you.

As for me, I would probably keep the first two, and replace the third type with the language name.

--kc_kennylau (talk) 11:16, 11 December 2016 (UTC)

I'm done with fixing these editsEdit

diff. I've tried to make a solution, it was reverted. Tried to discuss a solution, nothing changed. If nobody wants to do anything about it, neither will I. —CodeCat 18:38, 11 December 2016 (UTC)

Did you link the wrong diff? That edit had nothing to do with you AFAICT. —Μετάknowledgediscuss/deeds 19:01, 11 December 2016 (UTC)
Template_talk:sense#CodeCat.27s_and_DCDuring.27s_edits_to_this_template_on_May_21.2C_2014--Dixtosa (talk) 19:10, 11 December 2016 (UTC)

Template:table:seasonsEdit

I only just discovered these cool table things. Where can I find a list of all the different types of tables? And is there a table for the planets (Earth, Mercury, Venus, etc.)? ---> Tooironic (talk) 05:57, 12 December 2016 (UTC)

  • Special pages => All pages with prefix => Display pages with prefix (table:) with namespace (Template). SemperBlotto (talk) 06:56, 12 December 2016 (UTC)
  • I created most of those. If you like them, then you're welcome. 😁 --Daniel Carrero (talk) 14:49, 12 December 2016 (UTC)
  • Hold on, I'll try to create a table for the planets now. --Daniel Carrero (talk) 14:51, 12 December 2016 (UTC)

@Tooironic: Ok, I created Template:table:Solar System/en. I kinda wanted to include the moons, so I had to choose among including zero, some or all moons. I included some. Also I linked to the Sun. The table could probably have less or more objects, so feel free to discuss or suggest changing the table in some way. --Daniel Carrero (talk) 16:15, 12 December 2016 (UTC)

Making wiktionary more visible to search enginesEdit

If you were trying to find a dictionary of a particular language via search engine, you might type in in something like "Hungarian dictionary" into a search query. However as it happens, at least for Google the search results pages don't have anything on Wiktionary at all for at least the first seven pages. This is in-spite of the fact that this site ranks among the top 600 websites globally. Perhaps what I'm suggesting is that there should be pages titled "(language name) dictionary", that search engines can more easily find and associate with search queries looking for exactly that, and these pages. Maybe pre-existing articles like Wiktionary:About Ancient Greek, Index:Swahili, or Category:Georgian language can be retitled along those lines?--Prisencolin (talk) 10:02, 12 December 2016 (UTC)

However, if you search for a Hungarian word (such as csirke), Wiktionary is one of the first results. --WikiTiki89 14:42, 12 December 2016 (UTC)
I would oppose any SEO scum tricks that try to make our pages sound more relevant than they are. We could quite possibly improve our rankings a little without going that far, though. Equinox 14:44, 12 December 2016 (UTC)
For example, I could imagine putting "taxonomic name" and "scientific name" in the metadata (Is that where it goes?) of every page that had {{taxoninfl}}. Would that help or do Google and others already do something similar? DCDuring TALK 13:36, 13 December 2016 (UTC)
That's essentially SEO... but it's also not a terrible idea, to be honest. —Μετάknowledgediscuss/deeds 18:00, 13 December 2016 (UTC)
There are some very basic things which we don't do. For instance, we don't set meta keywords. I am not sure if we even can. It would probably make sense to have all of our definition pages contain keywords like "dictionary, definition, language, word, translations, etymology" etc. - TheDaveRoss 18:39, 13 December 2016 (UTC)
Google's Matt Cutts has said that "Google does not use the keywords meta tag in web ranking" (no doubt because SEO scum ruined them by keyword stuffing). Equinox 18:42, 13 December 2016 (UTC)
Shows what I know about SEO. I do note that other's are using them still, but it is nice to know that we don't have to worry about ranking lower because we can't modify them. - TheDaveRoss 18:49, 13 December 2016 (UTC)
My suggestions are:
  • Find a way to tag definition lines as the main content of a page, especially the definitions of a word’s primary senses.
  • Link to Wiktionary from other sites more often (not in an advertisement-y way, but whenever a dictionary would be a useful resource).
Ungoliant (falai) 19:21, 13 December 2016 (UTC)
  • The obvious thing is to link from WP, Wikispecies, and any other WMF projects where it might make sense. Perhaps we should do a run against a file from the XML dump of English WP that identifies all the lexical tokens therein, lemmatize the tokens, and determine what subsets of those terms might benefit from a word-specific link of the form ''[[wikt:Rosa|Rosa]]''. It might be possible to do something more targeted at WikiSpecies, relating to either etymology or vernacular names.
In the case of taxonomic terms, unlike the case of real languages, we face the problem that the L2 header fails to convey the nature of the content provided, even in the broadest terms. At least a Hungarian L2 section conveys the language. DCDuring TALK 01:20, 14 December 2016 (UTC)
Linking from the other projects is only helpful for search ranking if they do not use rel="nofollow" (another tactic forced upon us by the prevalence of spammers). I think Mediawiki does use that. Equinox 01:37, 14 December 2016 (UTC)
I think when people search for "Hungarian dictionary" they are expecting a single Hungarian / English or English / Hungarian dictionary. We have no unified interface for such a thing. Everything else will just be unwanted noise. DTLHS (talk) 01:39, 14 December 2016 (UTC)
And we can take that in two directions: one, that we should focus on creating better interfaces for our content, something most people here probably don't have a lot of experience with or much interest in. Or, we can focus on providing data and not worry too much about how that data is presented. Then if the data is machine readable, other people will be able to use that data to produce more traditional dictionaries. DTLHS (talk) 01:51, 14 December 2016 (UTC)

Voting limitsEdit

I edited and restored the previously-withdrawn vote Wiktionary:Votes/pl-2016-11/Voting limits per its talk page. Feel free to discuss/edit the vote. --Daniel Carrero (talk) 13:13, 13 December 2016 (UTC)

New way to edit wikitextEdit

James Forrester (Product Manager, Editing department, Wikimedia Foundation) --19:31, 14 December 2016 (UTC)

Oh, I didn't realise the visual editor was already enabled here. Regarding the new text editor (esp. in the context of Wiktionary), I left some feedback over at MediaWiki. I've also begun to add TemplateData for some frequently used templates (Category:TemplateData_documentation). This will help new users in the visual editor as well as more experienced editors once the new editor can handle syntax highlighting and autocompletion. – Jberkel (talk) 11:24, 15 December 2016 (UTC)

SOP translationsEdit

It occured to me that in many cases translations of English words into foreign languages may be phrases whose meaning is derived from the sum of their parts and thus do not deserve an entry by themselves. Do we have a way to easily enter such phrases into translation tables and have parts linked individually, and if not, how would you prefer we implement this? Crom daba (talk) 23:38, 17 December 2016 (UTC)

Yes, just link them individually inside {{t}} (won't work with the automatic translation adder). DTLHS (talk) 23:41, 17 December 2016 (UTC)
I'd say it's pretty urgent that someone makes this work with the translation adder! —CodeCat 00:55, 18 December 2016 (UTC)
It's line 727 in User:Conrad.Irwin/editor.js- it would be easy to remove the check but I don't know if it would break other things. Done? Revert if necessary. DTLHS (talk) 01:02, 18 December 2016 (UTC)
Thanks. I hope it works. DCDuring TALK 02:02, 18 December 2016 (UTC)
  • I was just thinking about this today. Can SoP translations be added for Chinese translations using the automatic translation adder? If so, how? ---> Tooironic (talk) 14:44, 5 January 2017 (UTC)

Calques categoryEdit

Could we make a separate category for "X terms calqued from Y terms" instead of lumping it into borrowings? Crom daba (talk) 18:04, 20 December 2016 (UTC)

  •   Support --Daniel Carrero (talk) 18:22, 20 December 2016 (UTC)
    • I support this, and I support placing all the entries from "X terms calqued from Y" in "X terms derived from Y" too. --Daniel Carrero (talk) 01:23, 29 December 2016 (UTC)
  •   Support as long as the entries are not also placed in "X terms derived from Y". —CodeCat 18:47, 20 December 2016 (UTC)
  •   Support. I'm undecided about CodeCat's extra condition. --WikiTiki89 18:54, 20 December 2016 (UTC)
  •   Support. Andrew Sheedy (talk) 04:53, 21 December 2016 (UTC)
  •   Support. I also support CodeCat's extra condition. It is weird and misleading to see native words in "X terms derived from Y". --Vahag (talk) 05:06, 29 December 2016 (UTC)

Is this to everyone's satisfaction (double categorization aside)? Crom daba (talk) 23:43, 28 December 2016 (UTC)

  • Also if someone likes tedious but helpful work, DTLHS made a handy list of words manually marked as calques that need to be templatized here. Crom daba (talk) 04:55, 1 January 2017 (UTC)

Category:Forms of entries that do not existEdit

Just added this category to Module:form of- I can split more individual languages if anyone is interested. DTLHS (talk) 03:28, 23 December 2016 (UTC)

Maybe split off Japanese since Template:ja-romanization of's use of Module:form of and Category:Japanese_romaji_without_a_main_entry is currently creating a redundancy. —suzukaze (tc) 10:03, 23 December 2016 (UTC)
There seem to be something going wrong here — the category is also being filled with forms of entries that are not in mainspace (i.e. alternate forms of reconstructions). --Tropylium (talk) 16:41, 12 January 2017 (UTC)
It wasn't working properly in other cases anyway, and didn't catch most of the things I hoped it would. Removed from the module. DTLHS (talk) 16:46, 12 January 2017 (UTC)

BooksEdit

Shouldn't we have the ones of Category:Wiktionary books (user books) out of Category:Books? Sobreira ►〓 (parlez) 17:25, 23 December 2016 (UTC)

@Sobreira: Definitely. Thanks. —Justin (koavf)TCM 21:08, 23 December 2016 (UTC)

Populate "Japanese terms spelled with (...)" with all members from "Japanese terms spelled with (...) read as (...)"Edit

For example, I'd like to populate Category:Japanese terms spelled with 上 with all the words found in its subcategories.

Say, 坂上 would be a member of Category:Japanese terms spelled with 上, as well as Category:Japanese terms spelled with 上 read as かみ and Category:Japanese terms spelled with 上 read as うえ.

Rationale: When I'm looking for a new Japanese word spelled with a certain kanji that I don't know how to pronounce, I'd rather just see one category about the kanji than click on all the 11 subcategories to find the right word. --Daniel Carrero (talk) 10:04, 24 December 2016 (UTC)

I agree that it would be useful, for learners especially, but could words be added to categories on such a massive scale and avoid clutter? 天人了 (talk) 08:51, 25 December 2016 (UTC)
Currently Category:Japanese terms spelled with 上 itself (not subcategories) is populated with words with irregular readings and words for which {{ja-kanjitab}} is not filled out. What new category would these words fall into? —suzukaze (tc) 09:00, 25 December 2016 (UTC)
If we want to categorize entries for which {{ja-kanjitab}} is not filled out, then I suggest using a single category for all these terms. It should make it easier to clean up by adding the readings. I suggest creating either of these:
  1. Category:ja-kanjitab not filled out or
  2. Category:Japanese terms lacking pronunciation in the character boxes (I prefer this one)
If we want to categorize entries with irregular readings, we may want to create categories like this:
Feel free to suggest other ideas, including different names for these categories. --Daniel Carrero (talk) 09:18, 25 December 2016 (UTC)
  • FWIW, I've always been baffled that parent categories are not proper supersets of child categories. As Daniel notes above, this is an avoidable usability barrier. ‑‑ Eiríkr Útlendi │Tala við mig 18:22, 28 December 2016 (UTC)
  • We could do that easily if we committed to categorizing with templates only (not likely to happen). DTLHS (talk) 18:23, 28 December 2016 (UTC)
    • Most of our categories in the format "Category:English ..." are already filled by templates, as opposed to many categories in the format "Category:en:..." that are not. Apparently all or most of the "Japanese terms spelled with (...)" are already filled by templates, although I didn't do a complete check. --Daniel Carrero (talk) 01:22, 29 December 2016 (UTC)

Vote: Boldface in image captionsEdit

FYI, I created Wiktionary:Votes/2016-12/Boldface in image captions.

Let us postpone the vote as much as discussion requires, if at all. --Dan Polansky (talk) 13:18, 24 December 2016 (UTC)

Has there been any discussion of this issue at all? Is it something that needs to be voted on? —Aɴɢʀ (talk) 08:04, 25 December 2016 (UTC)
I found this: Wiktionary:Beer parlour/2011/March#Boldface in image captions --Daniel Carrero (talk) 20:22, 30 December 2016 (UTC)
There's been no recent discussion, AFAIK. I expect the discussion to develop before the vote or during the vote, but I also expect this to be largely a matter of taste. This could have been a poll instead of a vote, and I have tried to set up the vote in a largely symmetric matter, but I think we can give it a try with a vote. --Dan Polansky (talk) 15:13, 31 December 2016 (UTC)
The voting form is incomplete.   Support has been left out. DonnanZ (talk) 00:17, 29 December 2016 (UTC)
This appears to be by design. There are two options that can be supported. --Daniel Carrero (talk) 20:28, 30 December 2016 (UTC)

V-spellings in katakana and other alternative formsEdit

I added two alternative forms of loanwords spelled with ヸ, ヸタミン and テレヸジョン, however should all such terms be added? Considering the number as new loanwords from English are still increasing, this may not be possible. The two that I now added are definitely in use enough to encounter in the wild, some others are also, but theoretically any loanword can be formed with such hypercorrect respelling... I try to include as many alternative forms as I can think of at the moment or add later, which is why in the case of Southern Altai words I only list the alternative spellings or synonyms that exist in different dialects without the standardized variant, rather than separate entry for each variant to avoid clutter.

But Japanese is a major language of the world, unlike the Altai dialects or other Turkic languages etc. Others have added alternative spellings as both new entries and in the main entry, with no standardized practice I can see. The reality is with Altai that not many contribute to Wiktionary actively, and there is a political dimension, with Japanese this is different as Wiktionary is full of Japanese speakers. Should a standardized practice policy exist for alternative forms?

More importantly the concern is, could the addition of links to alternative forms be automated? To manually add all if such are included, not only is timeconsuming but impossible to avoid clutter. My proposal if possible would be an automation for links to alternative forms and synonyms to be added as expandable boxes, as with Chinese terms, so that alternative forms and synonyms can be added without cluttering the entry. But, this is only if alternative forms and synonyms are to be massively included as new entries and/or listed with possible redlinks, naturally. 天人了 (talk) 09:34, 25 December 2016 (UTC)

Japanese is a well documented language, which means any entry has to be citeable from at least three uses (not mere mentions) from independent, permanently archived sources over more than a year. If that's true of ヸタミン and テレヸジョン and any other forms with ヸ, they can be added. If not, they can't. I would therefore avoid using automation to add them, because it takes a human brain to determine which forms meet WT:CFI and which don't. —Aɴɢʀ (talk) 09:45, 25 December 2016 (UTC)
BTW, isn't it more common to use ヴィ to represent /vi/ rather than ヸ? Are ヴィタミン and テレヴィジョン more widely used than ヸタミン and テレヸジョン? —Aɴɢʀ (talk) 09:49, 25 December 2016 (UTC)
In that case, it is sensible to delete them as neither of the words is found in permanently archived sites. I'm sorry for adding them without first asking. 天人了 (talk) 16:08, 25 December 2016 (UTC)
The entries should be speedily deleted and the user warned for making things up. --Anatoli T. (обсудить/вклад) 10:58, 25 December 2016 (UTC)
You are right that they are not used enough in permanent ways online, then they should be deleted, but to call the entries "BS" and accuse me of "making things up" is not civil. You say this of Japanese but not Altai so I don't want to unfairly accuse you of politics. But, my suspicion is raised because as a Russian you say I should be "warned", sounds like an emotional reaction to suggest I am a vandal. Perhaps I am paranoid to suspect this an attempt to invalidize the Altai entries, but such is so very common in the real world by government policies, that it is questionable if politics has a part in your response... 天人了 (talk) 16:08, 25 December 2016 (UTC)
They needn't be found in online permanently archived sources. Print sources like books, magazines, and newspapers are acceptable too. —Aɴɢʀ (talk) 19:00, 25 December 2016 (UTC)
@天人了 My ethnicity or government policies have nothing to do with this. I care about the quality of the dictionary. I'm not suggesting you're vandal. Vandals are blocked, not warned. I don't know what Altai has to do it either and what I said about it. In any case, you have to prove that these spelling are used by using online or offline permanently archived sources. --Anatoli T. (обсудить/вклад) 23:03, 25 December 2016 (UTC)

Adding Proto-TupianEdit

Hello! As you may see, I'm new to Wiktionary, in fact, English is not even my native language. Anyways, I've been wanting to add Proto-Tupian to Wiktionary but I need some help, as I don't know much about this. I have a lot of reconstructions to add and I would really appreciate if Proto-Tupian could be added to Wiktionary. Regards! --~𝔪𝔢𝔪𝔬 (talk) 23:19, 25 December 2016 (UTC)

We'll need to add a code for the language to Module:languages/datax. Also are you planning on adding Tupian family entries in conjunction with their proto language ancestors? DTLHS (talk) 23:25, 25 December 2016 (UTC)
Also what books and letters will you be using? We'll need to document this at Wiktionary:About Proto-Tupian so that others know how to contribute in the same form later. Crom daba (talk) 03:14, 26 December 2016 (UTC)
The code would have to be 'tup-pro', and the letters to be used are ɨ, ã, ẽ, ĩ, õ, ũ, ʔ, ˀ, ʷ and ʸ. --~𝔪𝔢𝔪𝔬 (talk) 17:03, 2 January 2017 (UTC)
OK, I've added the code. I recommend creating a few entries and people will let you know if you're doing something wrong. DTLHS (talk) 17:15, 2 January 2017 (UTC)
It also needs to be added to the Tupian family and made an ancestor of Tupian languages. Crom daba (talk) 17:15, 3 January 2017 (UTC)
Also we have a few Guaraní varieties not subsumed under Tupian. Crom daba (talk) 17:18, 3 January 2017 (UTC)
I've made you a template editor so I think you should be able to edit the data modules now. DTLHS (talk) 17:19, 3 January 2017 (UTC)
Some Tupian languages apparently use the Mongol script... Anyone know what was meant here? Crom daba (talk) 17:50, 3 January 2017 (UTC)
Just a copy and paste error I assume. DTLHS (talk) 18:05, 3 January 2017 (UTC)
Hum. Is it Proto-Tupian or Proto-Tupi-Guarani (PTG)? Actually, I am a linguist working on a Tupi-Guarani language (siriono) and I've been part of a research team on PTG. We haven't described the protolanguage but established cognates to compare 33 languages (all TG but Mawé and Aweti). Our database is not published yet so I imagine you do not rely on this one. But, what are your sources? On how many Tupi non-TG languages are they based? Is it only a lexicostatistic study? Please add the references on every pages and not only on one main page not connected to the others. Noé (talk) 16:19, 4 January 2017 (UTC)
Here's our first Proto-Tupian lemma. Crom daba (talk) 16:49, 4 January 2017 (UTC)
Do you need help with making reference templates @Guillermo2149? Crom daba (talk) 16:49, 4 January 2017 (UTC)
I found this work (in Portuguese) that has PT and PTG reconstructions if anyone's interested, but it uses a different spelling convention than Guillermo's lemma. Crom daba (talk) 16:49, 4 January 2017 (UTC)

bucketsfulEdit

Is it the case that the plural of nouns of the form "xful" may always be either xfuls or xsful? SemperBlotto (talk) 08:19, 26 December 2016 (UTC)

  • And is "xsfuls" always wrong? SemperBlotto (talk) 08:20, 26 December 2016 (UTC)
I expect dialectic variance. cupfuls and cupsful would suggest the latter answer is no, in upper-midwestern, but not apparently in spelling locale of my computer. - Amgine/ t·e 17:40, 28 December 2016 (UTC)
What about "buckets full"? DonnanZ (talk) 00:20, 29 December 2016 (UTC)
AIUI, "three buckets full" would refer to three actual buckets that are full, while "three bucketsful" or "three bucketfuls" is just a particular volume that would fill three buckets, though it might be in some other container right now. To answer Blotto's question: I've added a lot of these and I check both plurals in Google Books. Some have a -sful plural and some do not, in practice. Equinox 02:28, 29 December 2016 (UTC)

"References" / "External sources" voteEdit

Based on Wiktionary:Beer parlour/2016/November#Suggestion: Mention on WT:EL the fact that external links ≠ references, I created this vote:

Feel free to discuss/edit/whatever. --Daniel Carrero (talk) 19:11, 31 December 2016 (UTC)

The vote is scheduled to start in a couple of weeks, just in case some people are not available right now because of the holidays.
In case people want to check the vote or maybe change anything, I'll ping everyone who participated in the previous discussion: @CodeCat, DCDuring, Equinox, Jberkel, Isomorphyc, Angr, Erutuon, Wikitiki89. --Daniel Carrero (talk) 12:10, 1 January 2017 (UTC)

January 2017

Reconstructed Latin WordsEdit

So what I'm proposing is to rename this section, which deals with reconstructed Vulgar Latin verbs, nouns, etc. to Proto-Romance and apply the proper phonology for that, which has also been reconstructed in the same way, i.e. via applying the comparative method to modern Romance Languages. The result reflects the last common ancestor of all of them, which was Proto-Romance by definition. Vulgar Latin is a much broader term which can cover centuries before the period in question; the about page for Vulgar Latin even feels the need to specify that 'the form of Vulgar Latin that is considered in Wiktionary's entries is primarily the latest common ancestor of the Romance languages.' So I think we're essentially in agreement on this point.

One benefit of the Proto-Romance label is that we can have a better idea of the phonology to use.

Let's take a look at some of the issues concerning the current phonology:

1) Mulgo = [molgo]

The first [o] reflects only the Italo-Western Romance pronunciation, while Sardinian and Eastern Romance are both given with [u].

That is because they all came from Proto-Romance [ʊ].

2) Festizo = [ˈfes.tʲe.zo]

The source cited for this page indicates that the < z > represents [ʤ]. That section then refers to a previous one that mentions that this evolved along the lines [dj] > [j] and sometimes to [ʤ] depending on context. For Proto-Rom. I'm fairly certain that we're still at the [dj] stage. In either case, [z] is wrong. Also the bot has automatically palatalized the t to tʲ, which can't be right as all the derivatives show that it was a simple [t].

3) Desidium = [deˈse.ðe.õ]

The consonant ð did not exist in the period in question (2-4th. centuries AD) according to any source I could find. Looking at this phonological timeline for the French language, the intervocalic lenitions (which is what we're seeing here with desi[d]ium -> desi[ð]ium ) happened during and after the 'Proto-Gallo-Ibero-Romance' period, so essentially it affected the areas of France and Spain. This would also explain why the consonant doesn't appear in Italian, which keeps e.g. the [t] in vi[t]a, which became [ð] in Spanish/Portuguese and early Old French (and disappeared in later French).

I can't find a source saying that nasal vowels existed at all in Proto-Rom. so to me that final õ is suspect. For whatever reason, the bot has failed to make the second (prevocalic) < i > into [j], as it usually does. Also, since the mergers of [ɪ] into [e] and [ʊ] into [o] had not occurred yet in PR, the overall pronunciation here would be along the lines of [deˈsɪdjʊ].

In sum, the section would more accurately be called Proto-Romance and its current phonology is sketchy and could stand to be replaced with the one linguists have come up with for that period. —This unsigned comment was added by Excelsius (talkcontribs).

Category:Pages with ISBN errorsEdit

This page is being populated by the new {{ISBN}} template. Entries in this category have ISBNs that do not pass the checksum test or are possibly incorrectly formatted and cannot be recognized as valid. They will need to be researched carefully and replaced with the correct numbers. DTLHS (talk) 7:50 pm, Today (UTC−8)

Module:check isxn This module is crosslinked at d:Q21856723 (but, of course, en.wikt doesn't have interwiki links through d: yet). It provides functionality for other standardized numbers such as ISSNs but I didn't copy over any templates for that. —Justin (koavf)TCM 05:00, 4 January 2017 (UTC)

Stable 20 % growth in active editors?Edit

From stats:wiktionary/EN/PlotEditorsZZ.png it looks like around the end of 2015 there was a 20 % jump in active editors which has been sustained until now. Most of the growth seems to come from the English Wiktionary: stats:wiktionary/EN/PlotEditorsEN.png. If we look at the detailed statistics (stats:wiktionary/EN/TablesWikipediansEditsGt5.htm), we see that en.wikt has had around 300 monthly active editors for years, but since April 2016 is around 390. October 2016 has been the first month ever with over 400 active editors (429); while deletions can reduce this number upon recalculation from updated dumps, I think it's likely to stay above 400.

Do you agree this is real growth and not just a statistical glitch? Do you have any idea what's going on? --Nemo 08:13, 4 January 2017 (UTC)

Because we are cool. Maybe also because Wiktionary is starting to become useful in reading and not only fun to edit. People find answers here and can just add small hints and not create a whole new page on there first edit. It is easier then to become a regular editor. Another plausible explanation is people coming from Wikipedia because of fights there or because of Wikidata improvements. Finally, it may come from the multiple efforts people are doing to talk about Wiktionary all over the world, like by doing a talk at Wikimania or creating content on Youtube about the project. In short: Wiktionary is becoming trendy   Noé (talk) 16:25, 4 January 2017 (UTC)
Maybe we could find some of these supposed new editors and ask them which of those reasons, if any, they have for being here. DTLHS (talk) 16:29, 4 January 2017 (UTC)
I started editing frequently around the end of 2015, so I suppose I'm part of this growth, but my case was mainly due to a revival of my personal interest in languages and etymology. Turning to Wiktionary came naturally due to its etymologies and the breadth of its coverage of old languages (especially Latin). — Kleio (t · c) 15:49, 7 January 2017 (UTC)
When did the translation editor java script gadget get added? That seems to be something that lots and lots of people use between 5 and 100 times. - TheDaveRoss 20:59, 5 January 2017 (UTC)
I think User:Conrad.Irwin started working on that (User:Conrad.Irwin/editor.js) on 9 April 2009‎. —Stephen (Talk) 11:29, 6 January 2017 (UTC)
Wiktionary, particularly en.WT, has a developing reputation amongst academicians. Language students and academics, and hobbyists of same, are likely our largest contributing pool as our content is most related to their field(s) of interest. Chicken, egg. (This is, imo, a failing of the WT project.) - Amgine/ t·e 17:09, 8 January 2017 (UTC)
@Amgine: I'm confused--how would attracting specialists be a bad thing? Do you think this project is too difficult for newcomers or lay editors? —Justin (koavf)TCM 17:57, 8 January 2017 (UTC)
Wikimedians work on what interests them. A group of specialists and academics are interested in, and build, rather different things than a lay person who is looking for a dictionary. - Amgine/ t·e 21:17, 8 January 2017 (UTC)
@Amgine: I agree that this project is the most intimidating. I encourage totally new users to check out q:, since it is so easy to edit and has so few rules and templates. This would be the last project because of its complexity. —Justin (koavf)TCM 21:51, 8 January 2017 (UTC)
So all the growth is thanks to my Wiktionary special on the Wikimedia research newsletter? ;-) I would like to believe so, but I'm not sure there are so many academics contributing. If there are, they should be pointed to the conclusion of the GLAWI paper, which I quote again: «Wiktionary serves its purpose well by having little constraints and maximising participation, while standardization can be performed downstream».
Nowadays, I suspect that any academic who finds Wiktionary too simple will spend time on d:Wikidata:Wiktionary or custom SemanticMediaWiki wikis, rather than try making Wiktionary gradually more similar to what they want. --Nemo 16:09, 11 January 2017 (UTC)
I agree with DTLHS, let's list the new active users and write to a sample of them. :) If someone wants to work on this, I can help with database queries. Nemo 16:09, 11 January 2017 (UTC)
@Nemo Bbis: are you thinking of OmegaWiki in particular or do you know of other examples? —Justin (koavf)TCM 16:24, 11 January 2017 (UTC)

Sixth LexiSession: carEdit

Monthly trend topic is car. You are invited to participate in the common goal to discover what can be gathered around the word car! There is already a Wikisaurus on vehicle and a Wikisaurus on automobile but there is still a lot to describe on pimping cars like spoiler, hubcap and vinyl roof. Plus, illustrations are welcome for every parts of cars and we may also imagine figures to illustrate the internal structure of vehicules!

This collaborative experiment is still running without any guide nor direction. You're free to participate as you like and to suggest next months topic. If you do something, please report it here, to let people know you are involve in a way or another. Hope there will be some people interested by this one   Noé (talk) 16:40, 4 January 2017 (UTC)

Redirect CJK & Kangxi radicals to common CJK charactersEdit

As we discussed around October 2016 about to redirect halfwidth & fullwidth characters to common ones, CJK & Kangxi radicals are in the same situation. The CJK & Kangxi radicals appear same as its common CJK characters. I suggest to redirect them either. For example, U+2F00 KANGXI RADICAL ONE should be included into U+4E00 CJK UNIFIED IDEOGRAPH-4E00 (and add character info too), and so on. Except only if a radical does not have same common character (I think I see some), so it can have its individual page. I am starting to do this at Thai Wiktionary. [1] How do you think? --Octahedron80 (talk) 02:50, 7 January 2017 (UTC)

PS. I also think about CJK compatible characters to be redirected. But I am not sure whether wiki system will allow to do that. --Octahedron80 (talk) 03:19, 7 January 2017 (UTC)

I'm pinging @I'm so meta even this acronym in case he's interested, because this is about character boxes and redirects.
Support both types of redirect wherever applicable. The wiki software apparently already redirects automatically all the codepoints in "CJK Compatibility Ideographs" and "CJK Compatibility Ideographs Supplement", but I'd support adding the second character box in all these entries.
Apparently, things like the / are just basically two codepoints for the same character. Unrelatedly, I naturally support keeping separate entries for simplified/traditional Chinese, which is a different thing.
I generally support redirecting any separate codepoints that are basically variations of the same single character. For example, I redirected ("HEAVY HEART EXCLAMATION MARK ORNAMENT") to ! because when you write an exclamation mark with a large heart style, it's still an exclamation mark. As I said in the last vote, if we define "D" as "Fullwidth form of D", this would not make a lot of sense if Wiktionary were printed or if the reader does not care about separate codepoints, because it's just a typographical variant. We might as well define "D" as "Comic Sans MS form of D".
See Category:Character variation redirects. I believe that currently, all the targets of these redirects have multiple boxes to account for the character variations. If the boxes are not enough (they are geeky and may require a bit of knowledge of Unicode to read properly), we may want to eventually add usage notes en masse to all these entries, explaining the differences of codepoints. --Daniel Carrero (talk) 07:41, 7 January 2017 (UTC)
I   Oppose this. The "Kangxi Radicals" and "CJK Radicals Supplement" Unicode blocks specifically represent ⾞ as a symbol used in classification of Chinese characters rather than 車 as a morpheme meaning "cart". —suzukaze (tc) 14:48, 7 January 2017 (UTC)
What is the difference? The radicals are only used as dictionary indices collecting for convenience. And we are the dictionary makers here. ;-) --Octahedron80 (talk) 14:53, 7 January 2017 (UTC)
"Kangxi Radical Cart" is a "translingual symbol" in the purest sense. Usage of "Kangxi Radical Cart" in text semantically differs from usage of "CJK Unified Ideograph 8ECA", unlike usage of halfwidth vs. fullwidth characters. —suzukaze (tc) 14:59, 7 January 2017 (UTC)
In addition, not all of them have "CJK Unified Ideograph" counterparts, like "CJK Radical repeat". —suzukaze (tc) 15:04, 7 January 2017 (UTC)
[edit conflict] See also chapter 18 of the Unicode Standard, page 20.suzukaze (tc) 15:09, 7 January 2017 (UTC)
It's just the Unicode name to call it something individually; it is not very special. If you dig a bit, you will find that they are already mapped with common characters on compatible mode. So it is not so wrong to use common 'car' as Kangxi 'car', and vice versa. Yes, some of them have no common character as I told above; we still keep them. --Octahedron80 (talk) 15:07, 7 January 2017 (UTC)
@Octahedron80: Regardless of the merits, if any, of redirecting these entries (which I support doing, as I said above), please wait some time (maybe a couple of weeks) before creating more redirects. I see that you redirected a few pages already, but we might want to discuss this first. Before redirecting the full/halfwidth characters, I waited 3 months, by my count: I created a one-month vote that started two months after I had created a discussion with the initial proposal. --Daniel Carrero (talk) 15:19, 7 January 2017 (UTC)
^ I also have a lot of work at thwikt with the same topic. So I must recess from enwikt for now. --Octahedron80 (talk) 15:27, 7 January 2017 (UTC)
Are there any paper dictionaries that define 車 as "cart" but which keep ⾞ separately defined as something like "This a symbol used in classification of Chinese characters! Don't confuse it with 車, which is absolutely different!"? On the contrary, are there any sources (even written in Chinese) that assign both uses to the same character, by saying something like: "you can use the cart symbol (車) as a way to classify Chinese characters"? --Daniel Carrero (talk) 15:24, 7 January 2017 (UTC)
  1. Anyone who would explicitly distinguish them in a traditional dictionary must be mad. But we're not a traditional dictionary.
  2. In common use, the "CJK Unified Ideograph" character is regularly used to represent the concept of the Kangxi radical (such as at 1 or 2) and most people probably don't even know of the existence of the "Kangxi Radical" Unicode block.
  3. Many dictionaries include a "Kangxi radical" definition for the character. For example, [2] (a dictionary by the Taiwanese Ministry of Education) defines the character as
    1. an alternative form of (rén)
    2. one of the 214 radicals.
  4. Accordingly, in September I added similar definitions to the "Translingual" section of "CJK Unified Ideographs" entries here while creating entries for "Kangxi Radicals" characters.
  5. See the Unicode Standard chapter I linked to above.
suzukaze (tc) 15:35, 7 January 2017 (UTC)
The halfwidth & fullwidth characters (and other outdated symbols) exist in Unicode with the same reason, backwards compat. Why not these radicals can't do the same way :-) . --Octahedron80 (talk) 15:41, 7 January 2017 (UTC)
These Unicode symbols have a difference explicitly defined by Unicode itself. —suzukaze (tc) 15:49, 7 January 2017 (UTC)
(Don't blame me.) I read your message many times. It sounds like you support in 1-3 as: After redirecting, we could include all available definition of a same appearent character in one page, even radical's and compat's. Because this is not traditional dict, no one would go mad. After including, we would have one more character info that still contains the radical's information, as the character itself and its categories, like the old ones. So nothing would be missed. --Octahedron80 (talk) 16:31, 7 January 2017 (UTC)
@Suzukaze-c: Naturally, feel free to disagree with me, but personally I still support redirecting the entries as Octahedron80 suggested. You did say that it's madness for traditional dictionaries to separate between 車/車; apart from the "traditional dictionary" label, you did not state any reason to justify the madness, (I'm not sure if my edit to $ which you linked implies something) but for one I believe that it really is a bad idea for print dictionaries to do so, because just by looking at the shape of 車/車 it's not possible to tell the difference between them. Still, if print dictionaries can't have these as separate entries for that reason, then it means that if Wiktionary were to be printed in the future (which it might be, either as an official Wikimedia project or as a legally CC-licensed derivative by any person), then it's going to be a "traditional" dictionary too in some sense, and it would be a bad idea for Wiktionary to keep 車/車 separated, too.
You linked to your edits on the entries and , the "main" one having a sense linking to the "radical" one. On principle, I applaud the idea of building a system to organize the characters, including your effort to create Template:mul-kangxi radical-def to link the entries. But I think that merging them is better: if we were going to have a whole sense in the "main" entry for the radical, then clicking on the separate "radical" entry did not seem able to provide very much else for the reader besides having an additional link to Index:Chinese radical/玄. We can just add the index link in the "radical" sense of the "main" entry and be done with it; this should be able to save one click from the reader's time.
I have the impression that, no matter whether the reader tries to access or at first (as in, they might have copied any of those from a separate website), the information on the whole "main" entry is probably going to be of interest; the "radical" page basically just had a sense linking back to the "main" page. The "main" entry contains the etymology, images and links to external databases. The "radical" entry did not have these things and was therefore incomplete; if it had these things, they would be a repetition of what can be found in the main "entry", and therefore reading it might be considered a waste of time.
As you said, in the dictionary you linked to, (the "main" entry) is defined as "one of the 214 radicals"; that dictionary does not use (the "radical" entry) for that definition. That does not invalidate the existence of the "radical" codepoint, but it does serve as additional evidence for the hypothesis that 儿 and ⼉ are the same character; possibly, there are applications that sort the "radical" in a different way (I didn't check) and/or have other hidden properties of these characters; the "radical" sense in the "main" entry should probably state "use the codepoint for the radical when a semantic distinction is needed". This usage instruction came from the page 21 of the Unicode policy you linked. It says: "The characters in the CJK and KangXi Radicals blocks are compatibility characters. Except in cases where it is necessary to make a semantic distinction between a Chinese character in its role as a radical and the same Chinese character in its role as an ideograph, the characters from the Unified Ideographs blocks should be used instead of the compatibility radicals." --Daniel Carrero (talk) 18:40, 7 January 2017 (UTC)
Thanks for the ping, Daniel Carrero. Primâ facie, I would support this, but suzukaze-c's opposition and this paragraph from page 688 of the Unicode Standard: “Characters in the CJK and KangXi Radicals blocks should never be used as ideographs. They have different properties and meanings. U+2F00 kangxi radical one is not equivalent to U+4E00 cjk unified ideograph-4e00, for example. The former is to be treated as a symbol, the latter as a word or part of a word.” make me hesitant. How would this proposal affect KangXi radicals that are visually distinguishable from their equivalent CJK unified ideographs? — I.S.M.E.T.A. 23:03, 7 January 2017 (UTC)
re. ISMETA: There are no "Kangxi Radicals" characters that are visually distinct from the "Unified Ideograph" equivalent (unless the font designer is weird or you get picky), but there are multiple characters in "CJK Radicals Supplement" that may map to a single "Unified Ideograph" character () and some that may map to multiple or no Unified Ideographs.
re. Daniel: Think of it this way: the character 儿 (Unified Ideograph) encompasses usage as both a radical and a morpheme used in language and running text while the character ⼉ (Kangxi Radical) is solely for usage as a radical. —suzukaze (tc) 02:49, 8 January 2017 (UTC)

So I made an Evenki transliteration moduleEdit

It's based on the pre-1938 Latin alphabet with the following changes:

  1. Cyrillic orthography doesn't differentiate between ə and e (э and е) after n, ņ t, d, ʒ and j as the characters are used to indicate palatality.
    • I've left ə for эand e for e in all such cases, since even if we found a way to automatize this as it's been done for Russian, there are too few resources to practically recover these distinctions.
  2. I've replaced the original transliteration of ш (s which is what с maps to as well) to ş (which щ maps to as well). In the literary dialect this sound occurs only in Russian loanwords, however some dialects have it in native words.


Does the community support this transliteration? Can I put it in operation? Crom daba (talk) 03:13, 7 January 2017 (UTC)

I have no comments on Evenki specifically, but I'd like to request that whatever transliteration conventions we create, they be 1) documented, and 2) linked from the appropriate places, such as Wiktionary:About Evenki (which doesn't even seem to exist yet). A decent further addition might be 3) motivation for the particulars of the transliteration, if that's not too much of a bother. --Tropylium (talk) 16:11, 12 January 2017 (UTC)
Done, thanks for not letting me off the hook without the documentation. Crom daba (talk) 04:01, 15 January 2017 (UTC)

Steps towards a policy on ... place namesEdit

It would help to have an official policy on place names. The current CFI talks around the issue but does not provide clear guidance: "Wiktionary articles are about words, not about people or places. Many places, and some people, are known by single word names that qualify for inclusion as given names or family names. The Wiktionary articles are about the words. Articles about the specific places and people belong in Wikipedia."

I think this is unhelpful to editors particularly less experienced members of the community. Editors risk investing a lot of time in articles that later get deleted - even without a formal policy change. That is off putting. I can see that in the past there has been some difficulty in achieving consensus but I would like to try again.

I would suggest that we start with a very simple policy to increase the chances of reaching agreement. My proposed policy statement is:

"The following place names meet the criteria for inclusion:

  • The names of continents.
  • The names of seas and oceans.
  • The names of nation states.
  • The names of primary administrative divisions (states, provinces, counties etc).
  • The capital cities of nation states.
  • The capitals of primary administrative divisions.

The community has not reached a consensus on place names/geographic features other than those listed above."

John Cross (talk) 19:10, 7 January 2017 (UTC)

These rules would exclude major Dutch cities like Leiden, Eindhoven and Breda, Belgian cities like Leuven and Charleroi, and German cities like Cologne and Nürnberg. I think that would be absurd. —CodeCat 19:55, 7 January 2017 (UTC)
My intention was that the policy was neutral on Cologne for example. John Cross (talk) 21:56, 7 January 2017 (UTC)
IMHO it'd be good to just have a blanket acceptance of all placenames. Toponymy and other forms of onomastics should fall within Wiktionary's scope, I think. — Kleio (t · c) 22:40, 7 January 2017 (UTC)
I'm inclined to agree with Kleio on this. I also dislike the use of the term nation state in this proposed policy statement; the nation state is a historically rather recent phenomenon (the 1648 Peace of Westphalia instituted it, according to the traditional account), so how does this policy affect Latin toponyms attested in Classical to Renaissance sources? — I.S.M.E.T.A. 23:11, 7 January 2017 (UTC)
Agreed We could have Springfield 1.) a city in Illinois, 2.) a city in Oregon, etc. for common place names and The Hague would have info on the one municipality that has that name. Place names can have etymologies and will certainly be attested. —Justin (koavf)TCM 23:12, 7 January 2017 (UTC)
Are you saying we should have a separate definition (and translations) for each city in the US named Springfield? DTLHS (talk) 23:14, 7 January 2017 (UTC)
@DTLHS: If they all have the same etymology (which these would), then I think it's fair to say "A common city name found in Illinois, Ohio, ..." —Justin (koavf)TCM 23:24, 7 January 2017 (UTC)
I would agree that for common city names, there should generally be a single entry, except where one particular usage is more important. Our current entry for Springfield is a good example. Capital cities should be noted individually because the capital is often used synonymously with the government of the entire jurisdiction. bd2412 T 00:05, 8 January 2017 (UTC)
We have Wiktionary:Place names as a list (under construction) of different types of places of each country for further analysis. Also, I agree with Kleio on this: "IMHO it'd be good to just have a blanket acceptance of all placenames." --Daniel Carrero (talk) 00:14, 8 January 2017 (UTC)
I too would agree to blanket acceptance of all place names. DonnanZ (talk) 00:29, 8 January 2017 (UTC)
What is a place name? Would we accept names of streets and buildings? DTLHS (talk) 00:42, 8 January 2017 (UTC)
This is a rule of thumb that can be discussed/changed by other people: I think I'd be fine with having all place names from continents and countries to cities, towns, neighborhoods, villages, etc.
I guess we probably don't want names of buildings and streets here; although we do have Empire State Building and Harley Street. I wonder if we are going to keep only a few streets and buildings that are notable. --Daniel Carrero (talk) 01:43, 8 January 2017 (UTC)

Place names - 2nd attemptEdit

I was really pleased with all the feedback I got on my 1st attempt here is attempt 2.

"The following place names meet the criteria for inclusion:

  • The names of continents.
  • The names of seas and oceans.
  • The names of countries.
  • The names of areas or regions containing multiple countries (e.g. Middle East, Eurozone).
  • The names of primary administrative divisions (states, provinces, counties etc).
  • The names of conurbations, cities, towns, villages and hamlets.
  • The names of natural geographic features (e.g. deserts, mountains, rivers) which are notable (per Wikipedia:Notability).

The Community has not yet reached a consensus as to whether or not place names/geographic features other than those listed above should be included in Wiktionary."

I have avoided mentioning streets, tunnels, buildings etc.

John Cross (talk) 08:24, 8 January 2017 (UTC) (amended John Cross (talk) 08:36, 8 January 2017 (UTC))

I think that if the name is a single word (e.g. Haymarket) then we should include it. If it is two or more words (e.g. Downing Street) then we should include it only if we can provide three or more usages of it in the usual sort of sources. SemperBlotto (talk) 08:57, 8 January 2017 (UTC)
Probably not an issue, but it feels a bit odd for us to rely on a WP policy (notability) that is somewhat outside our control. Equinox 10:21, 8 January 2017 (UTC)
What about place names that are linguistically interesting, but not otherwise "notable"? DCDuring TALK 17:00, 8 January 2017 (UTC)
<nods> I think the notability question should be relevant to Wiktionary's sphere of interest. However, it does become a local problem to determine what is notable. E.g. I have several books on feature names and their origins of the west coast of North America from the late 19th and early 20th century, and most every named rock has a story of a shipwreck, confrontation, or other historic event in any of several dozen languages. - Amgine/ t·e 17:19, 8 January 2017 (UTC)
I think place names should be allowed, provided they meet CFI. I imagine that that would exclude almost all but the most notable street names, which would not be mentioned in any published works that are citeable (i.e. excluding reference books, maps, etc.). The number of such places would doubtlessly be enormous, but I see no harm in such entries, if people are willing to add them. Etymologies of even the most insignificant places can be interesting to those who live near it, and it would be kind of neat to eventually be able to search for any town in the world and find out how it got its name, etc. I have no strong feelings on including street names or non-notable towns and villages, but I support adopting John Cross's criteria. Andrew Sheedy (talk) 21:51, 8 January 2017 (UTC)
Pretty much every city and town in the United States has a "Main Street", and I would hazard a guess that pretty much all of them that have local newspapers could meet CFI (news stories always name the street where something happened). Are we ready to have an English entry with more proper noun senses than water has translations? And definitions that mostly say "a street in <NAME OF A CITY OR TOWN>", at that. As for the laziness of users limiting quantity: we seem to attract more than our share of obsessive types who will literally add every permutation theoretically possible of everything unless someone stops them. Without an extremely clear and robust consensus about what is not permissible, we could find ourselves either with dozens of entries that look like the index of a large map book, or with boatloads of rfds- or both. Chuck Entz (talk) 04:27, 10 January 2017 (UTC)
I support this as well, and feel similar to Andrew Sheedy on this. Etymology is lexically interesting, whether it's a common noun or a place name. So is pronunciation, which we should certainly also include. I would like John Cross to clarify whether subdivisions of cities are also included. This is important because the status of settlements can change: what was once a separate village can later become a suburb of a larger city. Since official status can change at any time, this might imply that includability of a term is temporary, and what was once includable might lose that status once its real-life status changes too. I think that's undesirable, so we should make our rules independent of the official status of any particular place.
As for rivers in particular, we could adopt a different definition from notability, one based on how many places it flows through, or perhaps its length or water volume. These are more objective criteria. —CodeCat 22:57, 8 January 2017 (UTC)
Well, I am far more interested in the water-related place names than the land-based ones. As a nautical buff I would suggest there are many reasons for a waterway to be notable for itself, even if it no longer exists. E.g. portions of the Zuiderzee such as this bight northwest of Amsterdam which no longer exists but certainly once had a name, the North Pacific Gyre which is a solely current-defined feature, and the Nahwitti Bar (which forms the western end of Goletas Channel between Vancouver and Hope Islands on the north-eastern coast of the former, named after the 'Nakwaxda'xw tribe whose native language is 'Nak̕wala Kwak'wala, part of the Northern Wakashan group.) - Amgine/ t·e 00:45, 9 January 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I could probably support this, as long as the notability requirement were removed from point seven; we could have something like “The names of significant natural geographic features (e.g. deserts, mountains, rivers) which are notable (per Wikipedia:Notability).” — with “significant” allowing for fine-tuning by casuistry — instead. — I.S.M.E.T.A. 02:03, 10 January 2017 (UTC)


Place names - 3rd attemptEdit

I was really pleased with all the feedback I got from previous attempts - here is attempt 3.

"The following place names meet the criteria for inclusion:

  • The names of continents.
  • The names of seas and oceans.
  • The names of countries.
  • The names of areas or regions containing multiple countries (e.g. Middle East, Eurozone).
  • The names of primary administrative divisions (states, provinces, counties etc).
  • The names of conurbations, cities, towns, villages and hamlets.
  • Districts of towns and cities (e.g. Fulham).
  • The names of inhabited islands and archipelagos.
  • The names of other significant natural geographic features (such as large deserts and major rivers).

The Community has not yet reached a consensus as to whether or not the names of places and geographic features other than those listed above should be included in Wiktionary. There is currently no definition of "significant natural geographic features", but by way of an example, the twenty largest lakes in the world by surface area would each qualify. It is hoped that the Community will develop criteria over time to provide greater clarity and address matters not currently covered (for example the names of streets, buildings, tunnels). This policy is not intended to remove or reduce the requirement to find citations to support entries."

Please let me know what you think and please also help me to get this passed as a policy. Thank you.

John Cross (talk) 05:16, 12 January 2017 (UTC)

Support. Andrew Sheedy (talk) 05:36, 12 January 2017 (UTC)
I support everything except the requirement that islands be inhabited. — I.S.M.E.T.A. 08:21, 12 January 2017 (UTC)
Support As well. I suppose that strictly speaking you can also add some copy about how other locations are assumed to not fit the criteria unless otherwise notable or somesuch. Maybe provide a higher threshold for attestation for (e.g.) streets than continents. —Justin (koavf)TCM 08:26, 12 January 2017 (UTC)
  • Support. I would imagine that an uninhabited island could still qualify as a significant natural geographic feature. I'm not very worried about missing out on insignificant uninhabited islands. bd2412 T 23:58, 12 January 2017 (UTC)

Moved the draft text to an actual policy vote: https://en.wiktionary.org/wiki/Wiktionary:Votes/pl-2017-01/Policy_on_place_names

John Cross (talk) 05:12, 13 January 2017 (UTC)

The vote is now open: Wiktionary:Votes/pl-2017-01/Policy on place names John Cross (talk) 09:00, 21 January 2017 (UTC)

Vote: Trimming CFI for Wiktionary is not an encyclopedia 2Edit

FYI, I created Wiktionary:Votes/pl-2017-01/Trimming CFI for Wiktionary is not an encyclopedia 2 to try again to remove misleading sentences from CFI. The first attempt was at Wiktionary:Votes/pl-2015-02/Trimming CFI for Wiktionary is not an encyclopedia.

@John Cross: The vote may help address some of the concerns you have raised recently. Note that the English Wiktionary inclusion policy about places names is actually at WT:NSE, and in practice leads to a fairly indiscriminate inclusion of a broad variety of place names. A vote that well indicates the English Wiktionary stance on place names is this: Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2. --Dan Polansky (talk) 08:46, 8 January 2017 (UTC)

@Dan Polansky thank you. I have voted. John Cross (talk) 20:27, 17 January 2017 (UTC)

Wiktionary:Entry layoutEdit

POS orders I don't see anything specifying the order in which parts of speech should be placed in an entry. It seems like it goes something like Nouns→Verbs→Adjectives→...others? But I don't see a particular guideline. For instance, book has this order but the meanings of a bound collection of pages and codices is a lot more common than when the police enter a perp into the system (the verb form). At perfect, the adjective is before the adverb but this is also true of alphabetical order. Is there a preferred order for these entries? If so, it seems wise to state it explicitly. —Justin (koavf)TCM 22:06, 8 January 2017 (UTC)

They're ordered similar to senses: from common to rare. If they are ordered differently in an entry, it should be fixed. —CodeCat 22:43, 8 January 2017 (UTC)
I always thought they were alphabetized (adjective, noun, verb). DTLHS (talk) 22:49, 8 January 2017 (UTC)
Not usually. But I'd tend to put (say) the noun above the verb if the noun clearly came first — even though I prefer senses to be ordered by commonest current usage first. Equinox 22:50, 8 January 2017 (UTC)
Like Equinox, I list POS sections in order of historical development (ditto for etymologies, in the case of homonyms), but I also do that for senses. AFAIK, there does not exist a consensus here vis-à-vis listing by historical development vs. frequency in common usage vs. alphabetical order vs. whatever other scheme; accordingly, "fixing" the order of POS sections and/or senses, as CodeCat suggests, is likely to prove controversial. — I.S.M.E.T.A. 01:58, 10 January 2017 (UTC)

Actualités: Monthly news of French Wiktionary in EnglishEdit

Hi all,

We continue to translate our monthly publication in English, for people interested in what is going on in French Wiktionary. This month, we relate a big project we have on Occitan, a language spoken in the Southern France, and give some wordplay in French. We're eager to receive your feedback and joke suggestions!

Every months, we provide a bunch of metrics and try to write a couple of paragraphs about our beloved project. This edition is the 21th issue, so it is allowed to drink alcool in US, something quite unusual for a periodical newspaper! Noé (talk) 00:01, 9 January 2017 (UTC)

You forgot the link! [3] Equinox 00:09, 9 January 2017 (UTC)
Do you have any insights about running large projects involving multiple editors? On this site it seems like people prefer to work alone and we don't really do anything like focus on a single language for a month. DTLHS (talk) 00:39, 9 January 2017 (UTC)
The Occitan effort was part of a larger trans-project effort in continuing to develop content in one of the several primary regional languages of France. A parallel does not seem to exist in English, although possibly it might be if there were an effort to build content in Scots within en.WP and across the English wikimedia projects. - Amgine/ t·e 03:44, 9 January 2017 (UTC)
Cornish, anyone? Equinox 15:54, 10 January 2017 (UTC)

Special:Contributions/Cherusk - Russian pronunciation editsEdit

The edits of this user are currently under my scrutiny. Apparently they have unsuccessfully tried their luck on the Russian Wiktionary first. A regular knowledgeable user think it's a troll. Cherusk uses various non-standard or dialectal pronunciations, records own audio files. In some cases they are definitely wrong pronunciations. E.g. diff. Not sure if they are good faith edits. Considering a block if Cherusk doesn't stop. I don't have time to check each audio file but I can judge by the IPA edits. --Anatoli T. (обсудить/вклад) 03:21, 9 January 2017 (UTC)

Move character box images to Lua modulesEdit

See this diff.

I'd like to move all the images from the character boxes in each entry to Lua modules like Module:Unicode data/images/000. The charbox template ({{character info/new}}) is already prepared to find images in the modules.

Images to be moved: Category:Character boxes with images.

Rationale:

I think this would be a good idea, in case we want to use a database of images for each character somewhere else other than charboxes. In the future, I'm thinking of showing all the character images automatically in the Unicode tables like Appendix:Unicode/Latin Extended-A, if it's OK with everyone.

(ping: @I'm so meta even this acronym)

--Daniel Carrero (talk) 07:07, 10 January 2017 (UTC)

@Daniel Carrero: This change seems entirely positive to me. Just to make sure I understand you properly, do the numbers in the PAGENAME for Module:Unicode data/images/000 (taking your example) denote the first three numbers common to the Unicode codepoints in the range for which that page lists data? That is, does that “000” denote “U+000##” (where the first zero denotes the Basic Multilingual Plane), meaning that Module:Unicode data/images/000 covers everything from U+00000 NULL to U+000FF LATIN SMALL LETTER Y WITH DIAERESIS (ÿ) — in this case the entirety of the C0 Controls and Basic Latin and C1 Controls and Latin-1 Supplement Unicode blocks? — I.S.M.E.T.A. 10:42, 10 January 2017 (UTC)
@I'm so meta even this acronym. Good question. The answer is: I'm planning to imitate what is already done for character names. (unless there is any suggestion of a different naming system) We have modules like Module:Unicode data/names/000, Module:Unicode data/names/001, etc. The full list of modules is here: Module:Unicode data. That is, I'd like the first image module (the one you asked about: Module:Unicode data/images/000) to contain all codepoints from U+00000 NULL to 0x00FDA TIBETAN MARK TRAILING MCHAN RTAGS, which is the last assigned codepoint before the true upper limit: U+00FFF (unassigned). --Daniel Carrero (talk) 11:33, 10 January 2017 (UTC)
@Daniel Carrero: Oh, my mistake: Plane 16 (consisting of the Supplemental Private Use Area-B block and two non-characters) comprises codepoints U+100000–U+10FFFF, so it makes sense for “000” to denote “U+000###”. I agree with that naming system. — I.S.M.E.T.A. 14:36, 10 January 2017 (UTC)

Now, I made the images appear in the Unicode appendices. See Appendix:Unicode/Basic Latin, Appendix:Unicode/Greek and Coptic, Appendix:Unicode/Yijing Hexagram Symbols, Appendix:Unicode/Arrows... --Daniel Carrero (talk) 18:41, 12 January 2017 (UTC)

I noticed that @Octahedron80 copied the charbox template to the Thai Wiktionary (template link), which is great. Other Wiktionaries may want to copy the image modules, too. I created Category:character info/new with image as a temporary category with all the entries that are still using an "image=" parameter. I plan to delete the category once it's empty. There are still about 800 entries to go. --Daniel Carrero (talk) 10:34, 15 January 2017 (UTC)

Chakavian, Kajkavian, Torlakian?Edit

I would like to add entries for Chakavian words to Proto-Slavic entries but there is no language code that I can see for this language. Realistically speaking I would say this is its own Slavic language rather than a dialect of Serbo-Croatian, although it could be argued the other way. If we add Chakavian, we should also add Kajkavian and maybe Torlakian as well. Comments? Benwing2 (talk) 01:53, 12 January 2017 (UTC)

@Ivan Štambuk and others have been content to add it as Serbo-Croatian with a dialect label. See Category:Chakavian Serbo-Croatian. Dialectal words can always be added as descendants where appropriate. —Μετάknowledgediscuss/deeds 01:02, 17 January 2017 (UTC)

What accentual notation should be used for Proto-Slavic?Edit

@CodeCat I've been working on adding Proto-Slavic etymologies to Russian words and editing some of the Proto-Slavic reconstructed terms. Derksen 2008 uses the following notation:

  • à = short rising (old acute on an originally long vowel, neoacute on an originally short vowel)
  • á = long rising (neoacute)
  • ȁ = short falling (circumflex in some syllables, or original short accent? circumflex if the vowel was originally long, short accent otherwise), e.g. *dȅsętь "ten", *dȅsnъ "right", *dȍma "at home"
  • ȃ = long falling (circumflex in other syllables, or original short accent? circumflex if the vowel was originally long, short accent otherwise), e.g. *dȃrъ "gift", *dȏmъ "house"

The alternative notation is:

  • ȁ (maybe?) = original short accent
  • á = old acute
  • ã = neoacute
  • ȃ = circumflex

It could be argued that the latter notation is simpler, on the other hand it may not be accurate -- if you believe Derksen's view (maybe the Leiden view?), there was a never a time with three contrasting long-vowel accents, and the old acute was shortened prior to the shift that produced neoacutes.

Which should we use? Benwing2 (talk) 16:05, 12 January 2017 (UTC)

I'd recommend using a notation that does not explicitly encode a particular tone contour or length, as these aspects often differed in various Slavic dialects. Instead, the tones should be indicated by their identity: the acute is the acute no matter if it's long or short. —CodeCat 16:10, 12 January 2017 (UTC)
OK. I looked at WT:About Proto-Slavic and it does have a section on prosodic notation. However:
  1. It's pretty complicated, with six different accent types (seven if we count the macron for unstressed length). Furthermore, some of them (e.g. a̍, the word-final accent) don't display very well (e.g. in edit windows, at least on my Mac, where I just see boxes).
  2. This system isn't actually being used all that much. I took a look at various nouns in Category:Proto-Slavic nominals with accent paradigm b and many of them are using word-final à and ò, as in Derksen's and Wikipedia's system (see below).
Do we want to simplify this system? Possibly we can use the system used in the noun declensions given in Wikipedia's Proto-Slavic, which seems pretty logical to me.
Also, I am thinking of modifying the module that generates Proto-Slavic declensions to include accent marks. Sound OK to you? If we do this, it's another argument for using the Wikipedia system, since it would match up the declensions directly with that system. Benwing2 (talk) 19:27, 12 January 2017 (UTC)
The stuff on WT:ASLA was mostly put there by Ivan Štambuk without there being any agreement on it. Is the Wikipedia system the same as the alternative notation you gave above? —CodeCat 19:55, 12 January 2017 (UTC)
The stuff on Wikipedia is very similar to the Derksen system above but it uses ã for the neoacute instead of á or à (on both short and long vowels). The only place where the acute accent (á) is used is on final syllables, where both acute and grave occur. This is evidently indicating a length distinction (e.g. nominative singular nogà vs genitive singular nogý) but I don't know if this is standard or not. Note in particular that the genitive singular of class B žena is notated ženỳ whereas the genitive singular of class C noga is notated nogý. Benwing2 (talk) 00:27, 13 January 2017 (UTC)
What is the reason for this difference, I wonder. —CodeCat 00:30, 13 January 2017 (UTC)
I thought somewhat about this, and this is what I concluded:
  1. The length of Late Common Slavic final accented syllables is apparently reflected in the neo-circumflex in Slovenian (but nowhere else).
  2. The length of nogý but shortness of ženỳ happens because there was at one point a shortening of unstressed final syllables (this appears to be mentioned at the end of section 7 in [4]), and at the time, nogý was stressed but ženỳ was not. The final stress in class B occurred subsequent to this as a result of Dybo's law.
  3. The reason both nogà and ženà are short is a bit subtle, but apparently it's because the vowel was acute. At the time before Dybo's law, the forms were something like žȅna and nogá, with the final vowel acute. First, post-tonic acutes were lost (hence in žȅna) leaving a short vowel (per 7.13 in [5]), which remained short when the final syllable gained stress through Dybo's law. Later on, the acute was lost generally, producing a short rising vowel (9.2 in [6]).
I'm sure not everyone subscribes to this theory, though. Benwing2 (talk) 01:26, 13 January 2017 (UTC)
Why was the final vowel of *nogý not shortened the same way the vowel of *nogá was, when both had an acute? —CodeCat 02:10, 13 January 2017 (UTC)

Out of curiosity, once you decide on a system, will someone implement the paradigms in the inflection templates (if that is possible)? —JohnC5 03:57, 13 January 2017 (UTC)

Yes, I was thinking of doing exactly that. Benwing2 (talk) 05:54, 13 January 2017 (UTC)
Great! I wish I could contribute more to expedite this discussion, but I am woefully uninformed in this matter. —JohnC5 06:06, 13 January 2017 (UTC)
So I'm tentatively going with the Wikipedia notation. I corrected the comments above about it, and I'll repeat it:

Non-word-final vowels:

  • à = short rising = old acute (only on an originally long vowel)
  • ȁ = short falling (circumflex in some syllables, or original short accent; circumflex if the vowel was originally long, short accent otherwise), e.g. *dȅsętь "ten", *dȅsnъ "right", *dȍma "at home"
  • ȃ = long falling (circumflex in other syllables, or original short accent; circumflex if the vowel was originally long, short accent otherwise), e.g. *dȃrъ "gift", *dȏmъ "house"
  • ã = neoacute

Word-final-vowels:

  • à = short rising
  • á = long rising

Benwing2 (talk) 17:10, 13 January 2017 (UTC)

See Module talk:sla-noun. I implemented a first pass at Proto-Slavic accents (only for hard masculine o-stems). Benwing2 (talk) 06:31, 15 January 2017 (UTC)

Old FrenchEdit

How are we meant to distinguish between Early Old French, Old French and Late Old French? Take these:

Latin: dirēctus
Vulgar Latin: drēctus
Early Old French: dreit /dreit/
Old French: droit /droit/
Late Old French: droit /drwe/
Middle French: droit /drwe/
French: droit /dʁwa/
There's only one Wiktionary page and that's for the Old French form, not the early or late ones. ÞunoresWrǣþþe (talk) 10:20, 14 January 2017 (UTC)

As far as I know, Early Old French and Late Old French are both subsumed under just Old French. KarikaSlayer (talk) 15:44, 14 January 2017 (UTC)
Perhaps labels could be used to distinguish between them, just like Ancient Greek has labels for Attic, Koine and Byzantine, for example. — Kleio (t · c) 18:30, 14 January 2017 (UTC)
That'd be great, they really are rather different, mainly since they cover almost a millenium (like Greek)!ÞunoresWrǣþþe (talk) 18:38, 14 January 2017 (UTC)
You could do pronunciation data like it was done for Ancient Greek (here's an example) and for spelling variants you might do something like:
# {{form of|early spelling of|droit|gloss=[[right]]|lang=fro}}
While keeping all the pronunciation data at the main entry. Crom daba (talk) 02:12, 15 January 2017 (UTC)
Is there no alternative to {{form of}} here? That template should really only be used for one-off things, any kind of form-of message that could appear in many entries should have its own template. —CodeCat 02:14, 15 January 2017 (UTC)
Creating the equivalent template is left as an exercise for the reader. Crom daba (talk) 03:04, 15 January 2017 (UTC)
FWIW, I'd consider the pronunciation /drwe/ as Middle French, not Late Old French. Also, Old French droit doesn't derive from Latin dirēctus, it derives from dirēctum. The reflex of dirēctus is OF droiz.
For dreit vs. droit, you might consider a label {{lb|early}} (or maybe {{lb|Early Old French}}), since it's more a dialectal than a spelling difference. Benwing2 (talk) 04:03, 15 January 2017 (UTC)
It doesn't matter what you consider it, Late Old French droit was pronounced /drwe/. Middle French had the same pronunciation, but all the relevant sound changes had already happened by 1350. And it doesn't really matter what it derives from, as droiz ended up as a homophone anyway. Not to mention both are just inflections, not separate words. ÞunoresWrǣþþe (talk) 18:56, 17 January 2017 (UTC)
It's not a dialectal difference, one is a descendant of the other. Hwaet isn't a dialectal version of what (the same timespan separates both hwaet and what and dreit and droit). ÞunoresWrǣþþe (talk) 19:00, 17 January 2017 (UTC)

About "Glyph origin"Edit

Some Chinese and Translingual entries have a "Glyph origin" section. Random example: . A search for "glyph origin" (link) currently returns 6,071 results.

If people want to use that section, at least I'd like it to be mentioned in WT:EL. --Daniel Carrero (talk) 05:36, 15 January 2017 (UTC)

I support adding it. —Μετάknowledgediscuss/deeds 00:59, 17 January 2017 (UTC)
Does this not conceptually overlap with the Description header that e.g. explains why the biohazard symbol looks like it does? Equinox 14:32, 17 January 2017 (UTC)
Actually, I think that "Glyph origin" overlaps with "Etymology", doesn't it? See the entry 水#Chinese (only the Chinese section has a "Glyph origin" subsection at the moment). It contains a table "Historical forms of the character 水". Should it be moved to "Etymology" in all entries? --Daniel Carrero (talk) 14:40, 17 January 2017 (UTC)
Glyph Origin and Etymology are different. See ⿱成龍 for an example. —suzukaze (tc) 22:26, 17 January 2017 (UTC)

Morphological dictionary for Lithuanian or Serbo-Croatian?Edit

Does anyone know of a good morphological dictionary for Lithuanian or Serbo-Croatian, comparable to Zaliznyak's book on Russian? (Grammaticheskii Slovar’ Russkogo Iazyka, aka Russian Grammar Dictionary) It should include in particular the accent patterns of words. I imagine such a thing might be written in Lithuanian or Serbo-Croatian, which isn't ideal as I don't speak either language, but I can potentially puzzle it out esp. with the help of a speaker. (I puzzled out Zaliznyak's book not really knowing Russian either, with the help of Google Translate.) Benwing2 (talk) 00:05, 17 January 2017 (UTC)

The best monolingual dictionary for Lithuanian is, by far, this one. My favourite grammar of Standard Lithuanian is Mathiassen, but there are others that are good as well. Send me an email if you need further resources. —Μετάknowledgediscuss/deeds 00:59, 17 January 2017 (UTC)
For Lithuanian you can also try these six dictionaries. --Vahag (talk) 06:19, 17 January 2017 (UTC)

Appendix for Russian patronymicsEdit

I think we need a special page for Russian patronymics. I didn't get around fixing notes in -ович, -евич, -ыч, etc. The rules and exceptions might need to go into a separate file, rather than describing them on each patronymic suffix entry. @Erutuon, thanks for your attempt, though.

Calling on whoever might be interested in collating the rules @Cinemantique, Benwing2, Wikitiki89, Erutuon, KoreanQuoter, Wanjuscha, Stephen G. Brown, CodeCat, Vahagn Petrosyan. Things to consider are formal patronymics (used in documents), colloquial forms and abbreviations, variants, irregular pronunciations (feminine -чна is consistently pronounced irregularly as -шна, as in Ильинична), stress patterns, declensions and categories, use of foreign names in the former USSR space and overseas, occasionally for expats living in Russia, naturalised migrants, usage. @Benwing2, does Zaliznyak cover this topic. @Cinemantique, it would be beneficial if the Russian Wiktionary cover this as well. --Anatoli T. (обсудить/вклад) 06:52, 17 January 2017 (UTC)

I agree, but I don't have time to work on it right now. --WikiTiki89 15:39, 17 January 2017 (UTC)
I don't think Zaliznyak covers this topic. I agree it would be great to have such an appendix. Benwing2 (talk) 08:47, 19 January 2017 (UTC)

Old NorseEdit

How should we handle the evolution of "Old Norse"? For example, following the word Vreka from start to finish...
Proto-Germanic: IPA(key): /wrekɑnɑ̃/
Common West Scandinavian: IPA(key): /wrekɑ/
First Grammarian’s Icelandic: IPA(key): /wrεkɑ/
Classical Old Icelandic: IPA(key): /vrεkɑ/
Icelandic: IPA(key): /rɛka/

Common West Scandinavian, First Grammarian’s Icelandic and Classical Old Icelandic all count as Old Norse yet have very different pronunciations. How should this be represented in the entry for vreka? ÞunoresWrǣþþe (talk) 21:08, 17 January 2017 (UTC)

@ÞunoresWrǣþþe: Are you making and Old Norse pronunciation module? Also, what are your sources for these pronunciations? —JohnC5 21:25, 17 January 2017 (UTC)
No, I'm asking how we should differentiate the different pronunciations the word had in Old Norse. My sources are Old Icelandic: Its Structures and Development by Hreinn Benediktsson, The First Grammatical Treatise: The earliest Germanic Phonology by Einar Haugen, The Nordic Languages, An International Handbook on the History of the North Germanic Languages by Oskar Bandle and primary sources. ÞunoresWrǣþþe (talk) 22:51, 17 January 2017 (UTC)
You can use {{a}} to specify a specific accent. DTLHS (talk) 22:52, 17 January 2017 (UTC)
It's not an accent, they are separate, descendant languages. "Old Norse" is a blanket term that frankly shouldn't really be used given how many sound changes occurred in it. ÞunoresWrǣþþe (talk) 23:06, 17 January 2017 (UTC)
This problem exists for all ancient languages. Ancient Greek, for example, covers 2000+ years. They've handled the changing pronunciations quite well, I think, for that language. Benwing2 (talk) 23:13, 17 January 2017 (UTC)
Some can understandably be exempted, like Old English. The only sound changes in the seven century span that I can think of are syncope towards the end, and unstressed e's becoming schwas and eventually being dropped. For something like wrekɑ > vrεkɑ, some sort of note of the different pronunciations seems like it would be useful. ÞunoresWrǣþþe (talk) 23:21, 17 January 2017 (UTC)
Several points: phonological change is not the only thing that determines whether two lects get considered separate languages. Indeed, many dialects of modern English are as far apart in pronunciation as these historical periods of Old Norse, but they are still English. There is also disagreement about how sequences such as ⟨hv⟩, ⟨hr⟩, ⟨hl⟩, ⟨hn⟩, and ⟨-r⟩ would have been pronounced at different times. —JohnC5 23:29, 17 January 2017 (UTC)
Are you actually implying that PGM /e/ was [e] and lowered to [ɛ]? Korn [kʰũːɘ̃n] (talk) 23:58, 17 January 2017 (UTC)
Oh lol, I didn't even notice that! —JohnC5 00:25, 18 January 2017 (UTC)

Policy proposal for LDLs.Edit

In the light of some recent events, and since everyone expects me to do it for some reason, I'd like to toss something into the ring concerning uncodified languages which do not have an extensive written coverage and a great variety of written forms and thus put us before an issue when it comes to inclusion. We could make at least two dozen entries for some words, we know some words exist in some languages but they are never used in a CFI way, only recorded in professional works' IPA. The idea I want to put forward is to organise entries of such languages according to their last standardised form, even if it is only standardised by the scientific community. For example the regional nordic varieties would become standardised around Old Norse normalisation. To give an example from a field I actually understand: Normalised Middle Low German has fîf (5), modern forms are fif, fief, fiv, fiev, fiw, fiew, fiwe, five etc. pp. As long as we can reasonably decypher the origin of a word, we would make a hub entry based on the normalised form (fîf after Middle LG) and everything else links to it as an alternative form. Do you think this is feasible? Do you see issues? Should we have it in main entries or appendix? What are your thoughts? Korn [kʰũːɘ̃n] (talk) 11:03, 18 January 2017 (UTC)

@Korn: This is an excellent question and one that I don't think is really answered outright in Wiktionary:Criteria for inclusion. We are to include terms that one would "run across" but for dead languages, no one would "run across" anything except in print. Similarly, for a living language which is generally not documented (or at least, generally not documented with print, a la American Sign Language), one could definitely "run across" all manner of words which simply aren't encoded in written language. So there are verifiable variants of words which someone would never encounter unless he were reading 13th-century Scottish land transactions and there are words which are used on a daily basis by communities which have used a language for thousands of years. It's not really clear to me what exactly Wiktionary is attempting to include. If it's all terms one would run across then documenting oral/sign-only languages is virtually impossible. If it's all written encodings of terms that one can verify in print or digital media, then we will inevitably have huge numbers of variant spellings of the same term which will be very difficult to harmonize. If we want to do both, then we have both problems. As someone who edits here regularly but doesn't really get into the nuts and bolts of the site, this has confused me for years. —Justin (koavf)TCM 15:34, 18 January 2017 (UTC)
Yes, we have many alternative spellings, and for extinct or minority languages, it is common practice to choose a single orthographic standard to lemmatise. Also, we do cover sign languages. —Μετάknowledgediscuss/deeds 15:48, 18 January 2017 (UTC)
@Metaknowledge: Either I was unclear or your response was a little hasty: I realize that we cover sign languages. The point that I was making is that there are very common terms in many languages (most of them) which are difficult if not impossible to include here on a practical and technical level. There are speakers of languages which have no script at all, so that language certainly does exist and if you were in that community, you would run across myriad terms which we cannot include here since there is no way to write that language. (Or to put it alternately, there are many ways one could write it but none of those has been adopted by that community.) The same is true of sign languages which are only documented in audio/visual media and those communities do not use the methods for written notation that a few specialists have devised. They could—they simply choose to not. Prioritizing one form is fine but it does leave the possibility that the content at "color" and "colour" will drift and change over time in ways that we wouldn't want—these two spellings refer to the same word and concept. (We definitely would want different quotations/attestations but we would not want different definitions or translations. The etymologies would be slightly different, of course.) And of course, prioritizing one form over another is compounded by languages which use or have used different scripts: the Serbian version of Bosnian/Croatian/Montenegrin/Serbian uses Cyrillic and Latin alphabets. Kurdish variations use Arabic-based and Latin-based scripts, plus they formerly used Cyrillic-based scripts. Lithuanian was briefly written with Cyrillic characters, so there are Cyrillic versions of "name" or "house" but not for "computer". So the two difficulties I was trying to explain above are that 1.) there are many terms which we cannot include in principle as this is a written dictionary but those terms presumably could or should be included and 2.) there are many variant forms (spellings or transliterations) which we have to maintain to keep from forking into separate definitions. These two problems are pretty fundamental to the project but it's not clear to me what the solutions are. (I recall discussion on the latter at one of the discussion forums from last year). There may not be any one silver bullet solution—I suspect there's not—but as someone who uses this site on a daily basis for years, I don't know what the community has decided about these two problems. —Justin (koavf)TCM 16:05, 18 January 2017 (UTC)
I agree that forking should be minimised; we make exceptions for issues like American vs Commonwealth or Serbian vs Croatian, but generally we choose one standard or script to lemmatise at. I doubt they've been created, but any Cyrillic Lithuanian that meets CFI should be added as an alternative spelling with an appropriate label, just like we have Afrikaans entries in Arabic script, etc. —Μετάknowledgediscuss/deeds 16:10, 18 January 2017 (UTC)
Why are there exceptions? Korn [kʰũːɘ̃n] (talk) 16:48, 18 January 2017 (UTC)
For political reasons. Crom daba (talk) 17:35, 18 January 2017 (UTC)
────────────────────────────────────────────────────────────────────────────────────────────────────
organise entries of such languages according to their last standardised form (…) For example the regional nordic varieties would become standardised around Old Norse normalisation
Is this particular example somewhat off-center, in that modern Nordic regional varieties generally aren't very close to Old Norse? Do I read you right that you'd want to eliminate separate Elfdalian or Jamtish entries, and replace them with pronunciation footnotes in Old Norse entries? This sounds like a poor idea, and also one that does not particularly generalize: most of the time unwritten language varieties simply have no earlier written standard to fall back on, and in many cases where there it, it would take us impracticably far back (e.g. with modern Aramaic, modern West Iranian, some modern Indo-Aryan, or even some modern Romance varieties, the closest written direct ancestors are going to be all the way back in antiquity). But maybe I'm misunderstanding.
In any case, I agree with the main observation: many LDLs are often going to be essentially un-CFI, if not unverifiable from uses altogether. I wonder if a first step should be to divide LDLs into a couple subcategories. The following seem reasonably distinguishable:
  1. living languages with an established literary tradition but poor online representation (e.g. Mon, Javanese)
  2. living languages with at most a nascent, unstandardized literary tradition (e.g. Elfdalian, Karelian, Tuvan, Tulu, Navajo)
  3. extinct/liturgical languages with a reasonably-sized corpus (e.g. Ancient Greek, Akkadian, Coptic, Old Tupi)
  4. extinct languages with only fragmentary records known (e.g. Oscan, Phrygian, Crimean Gothic, Old East Japanese)
  5. extinct unwritten languages documented only in linguistic works (e.g. Mator, Sireniki, Yurok, Mbabaram)
I expect #2 will be the largest category by number of languages (and, therefore, potential entries), and the default LDL policies should probably be fine-tuned for them. Non-standard dialects could probably also follow the same guidelines. The majority of current LDL entries are however probably from #1 and #3, which may require their own separate CFI standards. Lastly, I am not convinced if #4 and especially #5 should be covered in mainspace at all. --Tropylium (talk) 16:25, 20 January 2017 (UTC)
Sorry, to clarify: The proposal specifically was that out default approach be that we devise our own normalisations based on the normalisation/standard of the closest ancestor, even if a divergent but non-exhaustive (doesn't cover enough spoken language) and/or chaotic (differs somewhat strongly from author to author) written form exists. Korn [kʰũːɘ̃n] (talk) 22:00, 20 January 2017 (UTC)

Getting rid of the idiomatic tagEdit

In my opinion, it doesn't carry any useful information whatsoever, and in many -if not all- instances could be best replaced by the figuratively tag; in fact, that's what I've been doing, and I'm sure nobody would break a sweat if I kept doing that silently. What do you think? --Barytonesis (talk) 19:57, 18 January 2017 (UTC)

Support, though it can't always be replaced with "figuratively". Figuratively requires a contrasting literal meaning, but we don't include literal meanings because they are SoP. Therefore, every term consisting of multiple words that is includable is implicitly not literal. I am not convinced that the "figuratively" label is useful for single-word terms either. For me, there's no literal and figurative senses of a word, there's just senses. —CodeCat 20:12, 18 January 2017 (UTC)
Sometimes it seems to have been used where colloquial would be more appropriate, e.g. bucket down. Equinox 20:14, 18 January 2017 (UTC)
Ofttimes, colloquial is used where informal would seem better. I've thought that colloquial indicated a set of situations, not a register. DCDuring TALK 20:43, 18 January 2017 (UTC)
Might be nice to create a style guide that connects each of the many labels to how we, collectively, use them and expect them to be used. - TheDaveRoss 21:38, 18 January 2017 (UTC)
I support removing it in general, but I think it should be done carefully in a way that ensures that if some tag is needed, then a good replacement is found. --WikiTiki89 21:56, 18 January 2017 (UTC)
What about cases in which a word can have a certain meaning in a (limited) number of set phrases but only in them? I'd use idiomatically in such cases, would this also be removed? Crom daba (talk) 00:46, 19 January 2017 (UTC)
I would label that "in idioms". --WikiTiki89 15:39, 19 January 2017 (UTC)
I think there's at least some usefulness to the idiomatic tag. For one thing, we have the {{&lit}} template that opposes it. For another, for phrasal verbs like get down or throw up, some of the definitions given follow logically from the sum of their parts (even if it's not completely obvious a priori that the phrases can be used in that fashion) and others certainly don't (e.g. get down meanings #1, 3, 4, 6, 8, 9 are at least partially non-idiomatic, whereas #5 and 7 are certainly idiomatic. Similarly, throw up meanings #1, 4, 5 are more or less non-idiomatic, where #2 (to "vomit") is almost protypically idiomatic (and I would dispute that it's colloquial, I think it's actually quite neutral in register). Benwing2 (talk) 03:35, 19 January 2017 (UTC)
Usually it can and should just be removed. - -sche (discuss) 11:02, 20 January 2017 (UTC)

Possibly useful site for Early Modern English attestationEdit

See here This site is a part of Zooniverse which has some people-powered research which is a little tangential to what we do here (somewhat similar to v:) and this project attempts to transcribe all manner of works from the time of Shakespeare to provide context to his plays. It could also be useful for some of us looking for spellings or older instances of words. —Justin (koavf)TCM 21:19, 18 January 2017 (UTC)

Arabic consonant patternsEdit

Most languages have suffixes or prefixes, which are easy to write entries on, but Arabic has consonant templates. Perhaps we could add a new POS type under the category of morpheme (alongside prefix, suffix, circumfix, root, and so on): perhaps Pattern. Then, using the root traditionally used to describe morphological forms, ف ع ل(f-ʿ-l) (and its variations), entries could be created for patterns such as فَعَلَ(faʿala), فَعِلَ(faʿila), فَعِيل(faʿīl), فَاعِل(fāʿil), فَعَائِل(faʿāʾil), فَعَائِيل(faʿāʾīl), فَعَى(faʿā), فَعَا(faʿā), فَاعَ(fāʿa), فَعْلَقَ(faʿlaqa), and their various meanings and alternative forms could be described, in much the same way that the various meanings and variations of a suffix like -y are described. As with prefixes and suffixes, there would be a main category, Category:Arabic patterns, and subcategories like Category:Arabic noun-forming patterns, Category:Arabic verb-forming patterns, Category:Arabic diminutive patterns, and so on.

The alternative, I suppose, would be an appendix that lists these patterns and defines them. But I think it would be unnecessarily biased against nonconcatenative morphology to not have the patterns described in the main namespace in the same way as concatenative morphemes such as prefix and suffix are.

I wonder, has this idea been proposed before? @CodeCat, Atitarev, Wikitiki89, Mahmudmasri, Benwing, ZxxZxxZ, any thoughts? — Eru·tuon 23:17, 18 January 2017 (UTC)

These are in Hebrew too (binyan), and maybe other Semitic languages. Equinox 23:20, 18 January 2017 (UTC)
See w:Transfix. —CodeCat 23:47, 18 January 2017 (UTC)
So I guess the question is, shall we use transfix or pattern for this morphological concept? Pattern is a translation of وَزْن(wazn), I think, so it would be more traditional. — Eru·tuon 03:43, 19 January 2017 (UTC)
I'd rather use transfix, as it's the linguistic term, and also more distinctive than "pattern". It also fits better with other kinds of -fixes. —CodeCat 14:11, 19 January 2017 (UTC)
@CodeCat: In all the linguistic literature I've read about Semitic languages, I have never encountered the word "transfix". They usually either use the word "pattern" or a borrowing from one of the languages, such as "wazn" or "mishqal". Sometimes other words like "mold" are used. @Erutuon: "Pattern" is not a translation of وَزْن(wazn), but simply an English description of the phenomenon. A translation of وَزْن(wazn) would have been "weight" (the Hebrew מִשְׁקָל(mishkál) is probably a translation of وَزْن(wazn)). --WikiTiki89 15:53, 19 January 2017 (UTC)
Thanks for tagging me. I would rather go for pattern, but it appears that transfix had already been chosen. --Mahmudmasri (talk) 19:06, 19 January 2017 (UTC)
Category:Arabic patterns is a really bad name. It's far too ambiguous a term, "pattern" can mean many things. We should choose a name that is clear. —CodeCat 19:09, 19 January 2017 (UTC)
It's not ambiguous if it's clearly explained on the page. You're the only one who supports "transfix" and you don't even work with these languages. We already have Category:Hebrew terms by pattern, if you want to see how it could look. --WikiTiki89 19:15, 19 January 2017 (UTC)
We are already (to some extent) doing this for Hebrew. See the root and pattern links (and categories) at the entry מלכה. I've been considering doing this for Arabic as well. --WikiTiki89 23:49, 18 January 2017 (UTC)
Lexical content shouldn't be in appendixes, I think. Appendixes can be used to clarify a topic or give an overview of forms, but the actual forms should still have real entries. —CodeCat 23:52, 18 January 2017 (UTC)
But the patterns are often only quasi-lexical. Sometimes they have well-defined meanings, other times they are just happenstance. --WikiTiki89 00:07, 19 January 2017 (UTC)
Perhaps we could establish official criteria to distinguish happenstance from lexical patterns. Perhaps we could classify them as lexical if an identifiable meaning for the pattern can be discerned in at least three terms using the pattern. This criterion would easily apply to the patterns I mentioned above. But the pattern فِعَال(fiʿāl) of كِتَاب(kitāb) might not, so it might have to go in an appendix. (On the other hand, another use of the pattern, for an adjectival broken plural, as in جِمَال(jimāl), might qualify as lexical.) — Eru·tuon 03:43, 19 January 2017 (UTC)
@Erutuon: I think it would be confusing to have some patterns in one place and others in another place. It would be better to have them either all in the main namespace or all in the appendix. Another reason I don't like the idea of having them in the main namespace is because they contain placeholder consonants that are not really part of the morpheme. That gets especially confusing if it coincides with an actual word. --WikiTiki89 15:53, 19 January 2017 (UTC)
I support the idea but this is already happening (Erutuon must be aware of this, judging by the edits): Category:Arabic_roots. Ideally, all Arabic terms, which have {{ar-root}} in the Etymology sections, should be categorised by the root letters for the ease of locating them. E.g تَكْفِير(takfīr) is formed from the root letters {{ar-root|ك|ف|ر}} (k-f-r) and should be searchable by k-f-r, even if it starts with "t" (ت) if searched alphabetically. --Anatoli T. (обсудить/вклад) 00:03, 19 January 2017 (UTC)
In Arabic, we are doing it for roots, but not for patterns. --WikiTiki89 00:07, 19 January 2017 (UTC)
Ah, I see, patterns. Yes, I support this as well. Loanwords, if they are not formed by using a pattern, should somehow be excluded. I think it's better to use romanised names for categories, e.g. Category:Arabic fāʿil pattern, which may belong to other pattern groups. --Anatoli T. (обсудить/вклад) 00:11, 19 January 2017 (UTC)
It should be mentioned that for verbs it's possible to categorize any verb with a conjugation template (which is essentially all of them) by root already -- {{ar-conj}} automatically computes the root of a verb as part of its operation, and AFAIK there are no loanword verbs in Arabic. Benwing2 (talk) 03:22, 19 January 2017 (UTC)
برج says it was borrowed, or should the noun and verb go in different etymology sections? (also has a descendant from the same term from which it was supposedly borrowed). DTLHS (talk) 03:33, 19 January 2017 (UTC)
The verb appears unrelated to the noun and IMO should be in a different etym seciton. OTOH now that I think of it there are clearly verbs that are indirectly borrowed, e.g. تَفَلْسَفَ(tafalsafa, to philosophize), although in this case it's formed from a borrowed noun and it could still be argued that it was formed by extracting a root f-l-s-f and applying a verbal pattern to it, and the root thereby acquired a meaning "philosophy". Hebrew similarly has לסבסד(lesabsed, to subsidize) where the same thing could be said, I think. Benwing2 (talk) 03:40, 19 January 2017 (UTC)
@Benwing2: Yes, I think the best way of looking at it is that the root was extracted from the noun and now exists on its own. The decision for Hebrew was that categorization by root is synchronic, so it doesn't matter if the word came from the root or the root came from the word or whatever else. --WikiTiki89 15:53, 19 January 2017 (UTC)

I've now created {{transfix}}, which works like the other morphology templates. It has a corresponding category template {{transfixcat}}. —CodeCat 15:00, 19 January 2017 (UTC)

Per-sense gendersEdit

We currently have mechanisms to indicate the genders of nouns in the headword, and we have the ability to indicate multiple genders. However, there are occasionally multi-gender nouns that have the same etymology and thus are one word (i.e. one "Noun" header and headword, one inflection table) but for which the genders are different for different senses. We have no method of showing this, currently. Context labels don't seem appropriate, because they set a context in which a certain sense applies, while this is the opposite: rather than "when feminine, the noun means X", this is "when the noun means X, it is feminine", cause and effect are the other way around. So how should we deal with it? —CodeCat 15:18, 19 January 2017 (UTC)

We had this conversation last year; see this discussion. The upshot is that multiple noun headers works fine in most cases. —Μετάknowledgediscuss/deeds 15:42, 19 January 2017 (UTC)
I don't find that solution satisfactory at all, so I'd like to ask for a better one. —CodeCat 16:24, 19 January 2017 (UTC)
Another one we've used is to mark both genders in the headword line and then add {{lb|fr|masculine}} and {{lb|fr|feminine}} to the appropriate senses. --WikiTiki89 16:28, 19 January 2017 (UTC)