Wiktionary:Beer parlour/2015/October

Templates for place names

I created a few templates for place names; they should be able to generate standardized definitions for them in all languages. I've been using these templates for entries of places in Brazil only, but this system should be usable for other countries by copying and adapting the existing templates.

I chose the format "municipalities of São Paulo, Brazil" to match Category:en:Municipalities of China and others. (See the Place names and Earth modules for a complete list of place-name categories. I should mention that I've found some inconsistent naming formats, which could hopefully be fixed eventually.)

The templates I created:

Main template:

{{meta-place}}

Known issue:

These templates generate simple standardized definitions like "A municipality in São Paulo, Brazil." They still lack the functionality of linking from states to state capitals, and vice-versa. I plan on implementing that feature soon.

Thoughts? --Daniel Carrero (talk) 13:14, 1 October 2015 (UTC)

I was expecting something more like {{surname}} and {{given name}}. —CodeCat 13:22, 1 October 2015 (UTC)

How so? --Daniel Carrero (talk) 13:26, 1 October 2015 (UTC)

Like {{municipality|São Paulo|Brazil|lang=pt}}. —CodeCat 20:05, 1 October 2015 (UTC)

I did start to develop something like this at User:Daniel Carrero/place (while it is just a stub, it worked perfectly in this revision; see the code). But in my opinion, I'd rather use hard-wired full definitions (whether they are stored in MediaWiki code or Lua) like "municipality of São Paulo, Brazil", for this reason: if we use parts of definitions and allow people to combine these parts in any way they see fit, it could become impossible to make the whole system consistent. Just look at all the current untemplatized entries for place names, which are inconsistent on various levels, from whether some entries are categorized at all to the internal logic of the category naming system itself:

There would be definitions like {{city|Florianópolis|Brazil}} (basically all second-level subdivisions in Brazil are "municipalities", which is presumably why multiple Wikipedias and Wiktionaries use "municipality" categories; yet I found many of those to be randomly defined as "cities" or "towns", which could mess up the categorization and definitions).
There would be too much freedom to change levels, like {{municipality|São Paulo|Southeastern Region|Brazil}} with a "Southeastern Region" in the middle.
And {{municipality|Florianópolis|Brazil}} alone does not account for the fact that Florianópolis is a state capital unless you add another parameter.

I don't mind changing the system, but since the current system fixes each full definition and associates it with its categories individually, it has none of the aforementioned limitations, so I would like any other proposed system to be safe from all these problems as well. --Daniel Carrero (talk) 20:32, 1 October 2015 (UTC)

Update: Rather than adding new parameters to the previous templates, I created different templates for capitals, because I needed more specific definitions and categories (they use 2 categories each: state capitals of Brazil, and municipalities of each state). I've thought of something along the lines of {{place:capital of São Paulo, Brazil}} or {{place:São Paulo (capital)}}, but that could change. They work well. The only problem I fear is having too many different templates, though that seems manageable. Naturally, I am open to different suggestions, such as using Lua or fewer templates with more parameters, but first I have one thing to say: the current system is simple and intuitive enough (at least, that's my opinion) and very customizable in case other countries have different needs (like provinces instead of states, or more or fewer comma-separated levels). For this reason, I'd suggest waiting some time before attempting to merge the current templates into any more condensed model, because that might not work for all countries. In the meantime, I plan to continue using these templates for new entries. As usual, I welcome feedback from other people. --Daniel Carrero (talk) 19:14, 1 October 2015 (UTC)

Update: Using fewer templates for Brazil: I'm deleting state-specific and (oh God.) city-specific templates in favor of {{place:Brazil/municipality}} and {{place:Brazil/state capital}} (the latter I'm going to create in a moment). --Daniel Carrero (talk) 08:19, 2 October 2015 (UTC)
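As a rough illustration of the "hard-wired full definitions" approach, here is a Python model (not the actual template or Lua code) in which each place type maps to one fixed definition and a fixed set of categories, so editors pick a type rather than composing a definition from parts. The names PLACE_TYPES and place_entry, and the wording of the capital's definition, are hypothetical; the category names follow the formats mentioned above.

```python
# Hypothetical model of fixed per-type definitions; not the real template code.
PLACE_TYPES = {
    "municipality": {
        "definition": "A municipality in {state}, Brazil.",
        "categories": ["Municipalities of {state}, Brazil"],
    },
    "state capital": {
        # Assumed wording; capitals get two categories each.
        "definition": "A municipality, the capital of the state of {state}, Brazil.",
        "categories": ["State capitals of Brazil", "Municipalities of {state}, Brazil"],
    },
}


def place_entry(place_type: str, lang: str, state: str):
    """Return the fixed definition and language-prefixed categories for a place."""
    if place_type not in PLACE_TYPES:
        # Unknown types are rejected rather than improvised, which is the
        # point of hard-wiring the definitions.
        raise ValueError(f"unknown place type: {place_type!r}")
    spec = PLACE_TYPES[place_type]
    definition = spec["definition"].format(state=state)
    categories = [f"{lang}:{c.format(state=state)}" for c in spec["categories"]]
    return definition, categories


definition, categories = place_entry("municipality", "pt", "Minas Gerais")
print(definition)   # A municipality in Minas Gerais, Brazil.
print(categories)   # ['pt:Municipalities of Minas Gerais, Brazil']
```

Adding a country or a new type means adding a row to the table, not a new free-form parameter, which keeps "city" vs "municipality" drift out of the entries.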

Should all foreign-language place names have counterparts in English?

I've created most of the 853 entries for municipalities (a.k.a., cities/towns) of Minas Gerais (a state of Brazil) in Portuguese. See Category:pt:Municipalities of Minas Gerais, Brazil. I am thinking of doing the same to fill the English category completely as well. See Category:en:Municipalities of Minas Gerais, Brazil.

Should all place names in foreign languages have counterparts in English? Surely many of those (or all of those?) are citable anyway. I've done some cursory searches for small Brazilian towns on Google Books and so far have found all of them to be citable in English. Random example: Comercinho is citable from this book. What do you think? --Daniel Carrero (talk) 08:33, 2 October 2015 (UTC)

Might as well. I doubt anyone would stop you. --Zo3rWer (talk) 12:58, 2 October 2015 (UTC)

How can you be so sure that every single placename in every language has an English translation? Making all FL placenames link to English by default was not a good idea. — Ungoliant ^(falai) 15:42, 3 October 2015 (UTC)

Of course they have an English name. How else would English speakers refer to it? —CodeCat 15:47, 3 October 2015 (UTC)

I doubt that some little village up in a Chinese mountain has an English name. It will certainly have an English transliteration, but we don't accept those. SemperBlotto (talk) 16:04, 3 October 2015 (UTC)

Attestation in running English text is a prerequisite for an English entry, as per CFI. --Dan Polansky (talk) 16:13, 3 October 2015 (UTC)

Place names are not words, they're designations. If a person comes along and introduces themselves as Katherine, then the other party must use that name to refer to her. And that's true cross-linguistically: speakers of all languages must use that name, because that's what she said her name is. The same would apply to a random village in China. It can be assumed that foreign speakers will adopt the name that locals use, because that is the name of the village. Of course, there are exonyms, but that's a different story: with exonyms, there is a name, but certain speakers decide to use another. When there is no known name, it must be assumed that there is still at least one name. —CodeCat 17:33, 3 October 2015 (UTC)

That sounds almost like an argument to make such entries Translingual rather than pin them down to a specific language. —Aɴɢʀ (talk) 17:54, 3 October 2015 (UTC)

Maybe that's a good idea. —CodeCat 17:59, 3 October 2015 (UTC)

Names as "translingual" would seem to make it pretty difficult to deal with even things like Москва (Moskva)/Moscow, let alone more divergent cases like Köln/Cologne. And in case you're only suggesting this approach for placenames that only exist in one language: suppose that we discover that a tiny Chinese village does have a distinct name in a minority language spoken nearby; would this turn it from translingual back into (Mandarin) Chinese? I'd say treat placenames as particular to the local language, and attestations in other languages as citation loans, unless there's some kind of evidence to the contrary (e.g. for a pronunciation or spelling particular to English). --Tropylium (talk) 18:17, 3 October 2015 (UTC)

Single-word place names (London, not New York) either are words per me and John Stuart Mill or in any case behave like words: they get written down using an alphabet, get pronounced, get inflected and have an etymology. Having them as translingual very often does not work, since they get language-specific inflection. But even if place names somehow were not "words", they still get included and regulated by CFI, and the criterion of attestation applies to them. We even had this vote, Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2, so I do not think I am in a minority in thinking they need to be attested. --Dan Polansky (talk) 19:08, 3 October 2015 (UTC)

Then does this mean that many of the articles for places on the English Wikipedia actually have titles and describe places in running text, when their names are not words usable in English according to our own standards? I find that a bit bizarre. Let's take w:Tegal Buleud as a random obscure place. The article uses the name of the place twice in running text. Is "Tegal Buleud" not English if it's used in this way? If not, then what would make it English? —CodeCat 22:55, 3 October 2015 (UTC)

It's not that "their names are not words usable in English", it's that their names are not attestable in English by our standards. --Wiki Tiki 89 15:54, 5 October 2015 (UTC)

I understand that, but then this does mean that our mission statement is false. We don't include all words, just the attestable ones. —CodeCat 16:18, 5 October 2015 (UTC)

And that's always been the case. --Wiki Tiki 89 16:23, 5 October 2015 (UTC)

What I do (with Italian placenames) is generate an Italian entry; then, if I can find an English translation, I add an English entry for the translation. If I can't find a translation, I have been known to add an English section if it is a well-known place. SemperBlotto (talk) 18:23, 3 October 2015 (UTC)

For Spanish places, I usually add an English and a Spanish section, but sometimes just a Spanish or just an English one. This is pure laziness on my part, but I suppose in theory we could have the same entry in loads of languages. I know that the French Wiktionnaire often does this; see [1] as an example of (excessive?) repetition. --Zo3rWer (talk) 10:31, 4 October 2015 (UTC)

Place names are designations, but these designations are words. There should be a section for a language only if the word is used in that language: attestations are required. These sections may be very useful, especially for pronunciation (have you ever heard of another dictionary giving pronunciations for place names from all over the world?), homophones, examples/citations, usage notes, derived words, gentilics, anagrams, etc. In the example given above, most sections are repetition, sure, but these sections will be completed with time. Lmaltier (talk) 20:20, 5 October 2015 (UTC)

I don't see any reason to forbid creating English entries for foreign-language place names. I also don't see a need to rush out and create them all immediately. People can create them from time to time, though. Purple backpack89 22:12, 5 October 2015 (UTC)
Yes, especially when they see them used in the language. It's the same for Italian place names used in Spanish, etc. Lmaltier (talk) 17:58, 6 October 2015 (UTC)

Place name format: English (non-gloss) vs. foreign language (translation + gloss)

As I said in the discussions above, I created a few placename templates. I'd like to discuss the results as they appear in the entries. See the entry Ouro Preto (a municipality in Brazil). It is currently defined in 3 languages using the same template ({{place:Brazil/municipality}}). I checked Google Books: it's attestable in all three.

If the language is English, the definition is formatted as a main (non-gloss) definition; if it's a foreign language, it's formatted as an English translation (linking back to the English entry or section) + {{gloss}}. It looks good IMO in the entry I linked, and it's a consistent system overall, but it also causes a problem: the translation still points back to the English section even if there's no English section to begin with, as in the entry Comercinho. In my opinion this is a bad thing that should be fixed one way or another, but it's not extremely harmful; ultimately, it's just a pointless link back to the same page. I was kind of expecting an English section to be present most of the time, as happens randomly in entries like Laredo and Colorado (each of these entries has definitions for places in multiple countries in the English section).

Note that the template uses the same syntax for all languages (only the language code changes), so it's supposed to make copypasting between languages easy. If you wanted to add a French section, you would just use the same code with "fr".

# {{place:Brazil/municipality|en|state=Minas Gerais}}
# {{place:Brazil/municipality|pt|state=Minas Gerais}}
# {{place:Brazil/municipality|es|state=Minas Gerais}}

This is one reason why I asked directly above "Should all foreign-language place names have counterparts in English?". If we could simply add English entries/sections for all placenames, the problem would be solved. But for cases where there's no language section in English, what should the template do? Should the template allow for non-gloss formatting (first letter capitalized and the period at the end) in foreign language entries? Can we use the Translingual section some way for place names? Most likely, any new functionality would be controlled by new parameters to the templates, so the system would become a bit more complex than it is now.

My favorite proposal is this:

For foreign language entries without an English translation, make the template keep the gloss format but without linking the main word [like this: Ouro Preto (municipality in the state of Minas Gerais, Brazil)]. Rationale: Consistent formatting in all entries, and when you translate Ouro Preto into English, you would use the original language name ("Ouro Preto"), even in cases where it's not attestable in permanently recorded media in English.

I plan to continue using the current system for the new entries I am creating. Edit {{meta-place}} if needed, to change how it works. --Daniel Carrero (talk) 08:21, 5 October 2015 (UTC)
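The proposed behaviour can be sketched as a small Python function. This is a hypothetical model of the formatting logic only, not the real {{meta-place}} code; the parameter has_english_section, the "A …" prefix for English and the [[...#English|...]] link form are assumptions made for illustration.

```python
def format_definition(lang: str, name: str, gloss: str, has_english_section: bool) -> str:
    """Render a place-name definition line (hypothetical model of the proposal).

    English: a main, non-gloss definition (article added, capitalised, final
    period; assumes a consonant-initial gloss such as "municipality ...").
    Other languages: the English name as a translation, then the gloss in
    parentheses; the name is linked only if an English section exists.
    """
    if lang == "en":
        return f"A {gloss}."
    translation = f"[[{name}#English|{name}]]" if has_english_section else name
    return f"{translation} ({gloss})"


gloss = "municipality in the state of Minas Gerais, Brazil"
print(format_definition("en", "Ouro Preto", gloss, True))
# A municipality in the state of Minas Gerais, Brazil.
print(format_definition("pt", "Ouro Preto", gloss, True))
# [[Ouro Preto#English|Ouro Preto]] (municipality in the state of Minas Gerais, Brazil)
print(format_definition("pt", "Comercinho", gloss, False))
# Comercinho (municipality in the state of Minas Gerais, Brazil)
```

The gloss format stays identical in every foreign-language section; only the link is dropped when there is nothing to link to, so no extra parameter changes how the line looks.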

Proposal: Extinct unwritten languages should not qualify for inclusion

WT:CFI includes some interesting clauses for the inclusion of terms from languages that are not "well-documented on the internet". Perhaps the most interesting is that entries may be created even on the basis of a single mention, without any attestation or evidence of attestability required.

Let's think a little bit about what we are even doing here. I would not say this criterion is always unreasonable; but obviously it does not exist just as a backdoor to document words even when they are not attestable per the usual standards, since we do not allow this method for creating entries for rare words in well-documented languages.

The impression I get (though this does not appear to be written out anywhere!) is that there's an underlying idea that some languages mainly exist elsewhere than on the Internet: e.g. as old literary languages that are no longer spoken, or mainly as spoken languages which so far do not have much written material available. And so we assume that if a word has been documented in a scholarly source or the like, it could in principle also be easily attested once speakers of Pohnpeian get around to hanging out online in greater numbers, or once people have uploaded enough Karakhanid Turkic materials to Wikisource, and so forth. But as long as this is not the case, asking editors to provide attestations would simply add to a backlog.

"Mention implies potential attestation" however fails to hold for some languages. I propose that to qualify for inclusion in the main namespace, a language must have at least one of the following:

A surviving written tradition.
Continuing existence (≈ the potential for a written tradition to be established in the future).

This is still relatively lenient. If we consider epigraphic attestation a written tradition (but see below), languages like Oscan would continue to qualify under criterion #1, alongside any more abundantly attested extinct languages, say Hittite, Old Tupi or Ubykh. In the absence of other updates to CFI, any individual words in these languages would also continue to qualify on the basis of mentions alone.

What this serves to exclude are languages like Crimean Gothic, Pumpokol or Tasmanian. Any material that is known of languages of this sort generally only exists in linguistic sources, in all but exceptional cases comes with glosses attached, and thus seems to fall clearly short of Wiktionary's general rule of inclusion, as stated at WT:CFI:

A term should be included if it's likely that someone would run across it and want to know what it means.

That is: it seems to me like entries in dead-and-buried languages do not exist to fulfill this need. They exist solely for the sake of linguistic curiosity. No one randomly runs across text in Crimean Gothic, and is left wondering what it might mean.

Linguistic curiosity is of course still a need, and Wiktionary is doing a good job at answering it, I believe. Some people might indeed wonder "so how does one count to ten in Crimean Gothic", or "has this Ket word of alleged Yeniseian ancestry been even recorded from the related languages" and want to look it up. Hence I am not proposing flat-out deleting what we currently have in languages that fail the current-or-potential-written-tradition test. A better solution probably would be the inclusion of recorded data from any natural language variety in the Appendix namespace. (Or, perhaps, the creation of a new Extinct namespace?)

You might ask what difference it makes to switch from regular entries to an appendix, other than making the terms not come up by default in search. One thing is that this would also seem to be grounds to diverge from the usual layout requirements. For example:

If a language's known corpus is something like twenty or fifty words, we can put them in a single appendix rather than sprinkle them across several stub entries.
- Individual quotations could in such cases be replaced with a single references section.
If a language has only been recorded in phonetic transcription, we could accordingly provide only the pronunciation/transcription, and avoid implying that an orthography exists.
- If competing transcription schemes exist, we could standardize one of them and cover the others by means of an equivalence table, rather than creating duplicate entries.
Missing information such as parts of speech could be left unknown.
If glosses are only available in a language other than English, and the precise meaning cannot be verified, we might leave the glosses in the original language and not risk mistranslating things.

--Tropylium (talk) 01:08, 6 October 2015 (UTC)

I support this proposal, if I understand that it essentially means "Mentions may only be used for attestation if the language's corpus as a whole has actual attestations of other words. Languages whose entire corpus is mentions do not qualify for inclusion." —CodeCat 09:27, 6 October 2015 (UTC)

It's still more lenient than that: "Entries may be based on mentions only if the language's corpus includes actual attestations, or if the language is not extinct." --Tropylium (talk) 16:35, 6 October 2015 (UTC)
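The difference between the two readings can be made concrete with a tiny Python model. The Language fields and the example values are illustrative assumptions based on how these languages are characterized in this thread (Oscan as epigraphically attested, Crimean Gothic as extinct and known only from mentions, Ter Sami as moribund with no attestations); they are not data from CFI.

```python
from dataclasses import dataclass


@dataclass
class Language:
    name: str
    extinct: bool
    has_real_attestations: bool  # corpus contains actual uses, not only mentions


def mentions_suffice(lang: Language) -> bool:
    """Tropylium's wording: mentions are enough if the corpus has actual
    attestations, or if the language is not extinct."""
    return lang.has_real_attestations or not lang.extinct


def mentions_suffice_strict(lang: Language) -> bool:
    """The narrower reading: mentions are enough only alongside attestations."""
    return lang.has_real_attestations


for lang in [
    Language("Oscan", extinct=True, has_real_attestations=True),
    Language("Crimean Gothic", extinct=True, has_real_attestations=False),
    Language("Ter Sami", extinct=False, has_real_attestations=False),
]:
    print(lang.name, mentions_suffice(lang), mentions_suffice_strict(lang))
```

The two readings only diverge for languages that are still alive but documented solely through mentions, which is exactly the Ter Sami case.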

Subproposal

The above argumentation could additionally be extended to exclude from mainspace also two further types of languages:

Extinct languages whose known corpus is highly limited and which does not allow directly establishing the language's grammar, pronunciation, the meaning of its words, etc. Often information of this type can be determined via the comparative method (a particularly good example might be Proto-Norse) — but it would seem to me that this is not too much different from entirely unattested reconstructed languages.
Moribund languages, for which all available material is linguistic documentation and no revitalization efforts exist. For such languages, we have effectively no foreseeable hope of ever gaining anything resembling a written standard that could be documented according to regular attestation criteria. An example might be Ter Sami.

I would however like to go on record as strongly in favor of continuing to include endangered languages for which even marginal natural transmission can be suspected to remain (e.g. Ishkashimi), or for which even elementary attempts at formulating a written standard are underway (e.g. Votic). Hence any exclusion of languages by these criteria should probably be "opt-in", i.e. with the burden of proof on the side claiming that a language is indeed poorly documented enough not to merit inclusion. --Tropylium (talk) 01:08, 6 October 2015 (UTC)

I strongly oppose this. Your "impression" (i.e. assumption) recorded above is inaccurate; we have more lenient criteria for such languages because we would otherwise have no way of documenting them (the use-mention distinction is merely meant as a tool for us to determine whether a word is really used). There is no point in separating off certain natural languages thus, and it would be counterintuitive to users. —Μετάknowledge^{discuss/deeds} 01:25, 6 October 2015 (UTC)

An interesting interpretation. (And in the absence of explicit policy support, I could ask whether it is also merely an assumption.) It seems to carry fairly strong implications, though.

If Wiktionary's purpose is to document any use of language whatsoever, on an equal level;
if this includes extinct languages just as well as living ones;
and if we're not tied to direct attestation, but are to also allow documentation thru indirect inference such as mentions

then this would actually seem to require that we also document proto-languages in mainspace, to an extent! The most reliable "entirely reconstructed" words are on at least as probable a footing as words in otherwise attested languages that are presumed to have existed based on hapax attestations by non-native speakers, reconstructed semantics, etc.

You might also want to note that the proposals above are not in opposition to the documentation of anything at all, only on what should be presented as regular dictionary entries. Contrary to its name, WT:CFI is not actually the criteria for inclusion on the Wiktionary servers period, but merely the criteria for inclusion in mainspace.

(Also, should I assume that this reply is meant also in opposition to the main proposal, not just to the two subproposals?)

--Tropylium (talk) 16:31, 6 October 2015 (UTC)

I oppose this as well. Attested terms should be included, period. —CodeCat 09:27, 6 October 2015 (UTC)

Fair enough, though this counterargument appears to cover only languages of the Proto-Norse type. With languages of the Ter Sami type, no attestations exist whatsoever. --Tropylium (talk) 16:31, 6 October 2015 (UTC)

What exists for Ter Sami then? —CodeCat 16:58, 6 October 2015 (UTC)

The only major source is a comparative dialect dictionary of Kola Sami (the majority of whose materials are Kildin Sami) rendered in phonetic transcription, i.e. a huge bunch of mentions. Accordingly, what entries we have essentially have been created only as a pronunciation. Consider e.g. лa̭i̭ja (la̭i̭ja) ≈ IPA [ɫʌi̝jɑ]. (I doubt if anyone with native knowledge of Ter Sami would recognize either of these written forms, if presented with it.)

It's not only moribund languages that suffer from this problem, though. Tons of endangered languages only have field research materials available so far. You may recall a BP discussion in February about a new "Languages without a Written Tradition". Hence the line I draw here is not between written and unwritten languages, but between languages where Wiktionary inclusion could be expected to actually benefit the speaker community, and where we could expect native speakers to contribute at some point, and those where it could not. --Tropylium (talk) 19:39, 6 October 2015 (UTC)

Query: How would this proposal affect proto-languages? Do you suggest that we remove all of our proto-language appendix entries? ‑‑ Eiríkr Útlendi │^{Tala við mig} 08:00, 6 October 2015 (UTC)
None of this has any effect on reconstructed proto-languages, which are already excluded from mainspace inclusion. --Tropylium (talk) 16:31, 6 October 2015 (UTC)

Away until mid-October.

I will be away until mid-October. Please try to have this project completed by the time I return. Cheers! bd2412 T 15:28, 7 October 2015 (UTC)

You're not my supervisor! WurdSnatcher (talk)

The "get it completed" is a traditional Wiktionary joke that gets funnier every time it is reused. HTH. Equinox ◑ 01:29, 8 October 2015 (UTC)

Surely there's not much to do, right? How complete is Wiktionary right now? 90%? 95%? --Daniel Carrero (talk) 01:39, 8 October 2015 (UTC)

Not even 1%. — Ungoliant ^(falai) 01:42, 8 October 2015 (UTC)

Blasphemy! Everyone knows that Wiktionary is pretty much complete already. Any word we don't have is probably not worth the trouble anyway. --Daniel Carrero (talk) 01:51, 8 October 2015 (UTC)

You're not my supervisor! HTH --Catsidhe ^{(verba, facta)} 01:41, 8 October 2015 (UTC)

What's the etymology of the joke? --Dixtosa (talk) 14:34, 11 October 2015 (UTC)

I am stunned to find that in the entire week that I was absent, the collection of all words in all languages was not completed. bd2412 T 01:06, 14 October 2015 (UTC)

I ate them all. --Romanophile ♞ (contributions) 01:08, 14 October 2015 (UTC)

That must have given you a sour stomach. bd2412 T 13:21, 14 October 2015 (UTC)

The vote on adding a collocations or phrases namespace or section

Wiktionary:Votes/2015-09/Adding a collocations or phrases namespace or section has opened. The vote was prompted by Wiktionary:Beer parlour/2015/August#Adding_a_collocations_tab_or_section. - -sche (discuss) 17:10, 8 October 2015 (UTC)

WT:NORM and multiple spaces, tabs and indentation

Currently, the first rule says no leading or trailing spaces, while rule 4 says no leading or trailing space in templates. I came across many entries with leading spaces, such as apple. Many of them occurred in uses of the {{quote-book}} template, though presumably it may occur in any template call. apple breaks both rule 1 and rule 4, since line breaks are also leading/trailing space in a template, and furthermore the parameter names have whitespace around them. This whitespace could simply be removed, but is this desirable?

I have also come across many pages that include tabs; some at the start or end of a line, others in the middle of a line. The question is the same: what should be done to get rid of them? They could be replaced with a fixed number of spaces, but multiple spaces are equivalent to a single space in wikitext, so this seems a bit pointless. This makes me wonder whether there needs to be a rule that says no multiple spaces in a row. Then tabs could just be replaced by a single space. What do you think? —CodeCat 19:48, 8 October 2015 (UTC)
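A minimal Python sketch of the whitespace rules being proposed here: tabs become spaces, runs of spaces collapse to one, and leading and trailing whitespace is removed. The function name is hypothetical, and the sketch ignores one real complication: inside <pre> or <nowiki> blocks spacing is significant, so a bot applying this would need to skip those.

```python
import re


def normalize_whitespace(line: str) -> str:
    """Apply the proposed whitespace rules to one line of wikitext."""
    line = line.replace("\t", " ")        # tabs -> single spaces
    line = re.sub(r" {2,}", " ", line)    # no multiple spaces in a row
    return line.strip()                   # rule 1: no leading/trailing space


print(repr(normalize_whitespace("\t# A  fruit\tof the apple tree. ")))
# '# A fruit of the apple tree.'
```

Replacing each tab with one space before collapsing means a tab and a space-tab-space run both end up as a single space, so the result does not depend on how the tab was originally used.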

I think rule 4 should be amended to allow what apple is doing. I think the intention of rule 4 is to disallow things like {{ en-noun | - }} and [[ apple ]], but for {{quote-book}} and other templates that use a large number of parameters with long values, it is more readable to put in line breaks. I don't particularly like the extra spaces on each line of the {{quote-book}} template, but I can't see a reason to ban them. --Wiki Tiki 89 19:57, 8 October 2015 (UTC)

But we already do ban them. | page = 537 goes counter to rule 4 whether you put it on the same line as the template name, or on a line of its own. The rule specifies that it should be compressed to |page=537. I don't think it would be any good to make an exception to this whitespace rule for parameters on their own line.

I have fewer objections to the practice of breaking long template calls up into multiple lines in general, as long as a set format is decided on for those as well. Right now, there are differences in the amount of leading whitespace (apple vs accost). Should the leading whitespace be removed, so that the line starts with |? Then rule 1 would be satisfied. An exception could be added to rule 4 that the | preceding a template parameter can optionally be preceded by a line break.

Also, the practice should be nuanced for positional parameters, because leading and trailing whitespace does not get stripped from them unless the template explicitly does so itself. Stripping whitespace in a template requires a module, as template code can't modify strings. At the same time, templates that take an indefinite number of positional parameters, like {{compound}}, {{head}} or {{der3}}, should be written as modules anyway, for other reasons. —CodeCat 20:32, 8 October 2015 (UTC)
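For named parameters, the compression rule discussed above (turning | page = 537 into |page=537 while still allowing a line break before the |) can be sketched in Python. The regex and function name are hypothetical; the sketch works one line at a time and deliberately leaves positional parameters alone, since MediaWiki does not trim whitespace from them and changing it could change the output.

```python
import re

# Hypothetical normaliser for one line of a multi-line template call.
# "| page = 537" becomes "|page=537"; the line break before "|" is kept.
NAMED_PARAM = re.compile(r"^\s*\|\s*([^=|{}]+?)\s*=\s*(.*?)\s*$")


def compress_param_line(line: str) -> str:
    match = NAMED_PARAM.match(line)
    if match is None:
        # No "=": a positional parameter (or not a parameter line at all).
        # Its whitespace may be significant, so it is returned unchanged.
        return line
    name, value = match.groups()
    return f"|{name}={value}"


print(compress_param_line("  | page = 537 "))  # |page=537
```

Only the first "=" splits name from value, so values such as URLs with query strings pass through intact.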

But there is a workaround for stripping whitespace in templates: pass the text as a named parameter to another template that returns the text back (like {{strip|={{{1|}}}}}). Anyway, I guess I would agree that the leading space in | page = 537 is bad practice, but I think the spaces remaining in |page = 537 are harmless as long as they are consistent within the template. --Wiki Tiki 89 21:09, 8 October 2015 (UTC)
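Both comments rely on the same behaviour: MediaWiki trims whitespace around named arguments but leaves positional ones untouched. That behaviour can be modelled in a few lines of Python. This is a simplified sketch (it treats every bare | as a separator and ignores nested templates and links), not the parser's actual algorithm. It also shows why the {{strip|=...}} workaround works: the text becomes the value of a named parameter whose name is empty, so it comes back trimmed.

```python
def split_template_args(call_body: str):
    """Split the inside of {{...}} into positional and named arguments.

    Simplified model: a bare "|" always separates arguments (no nesting).
    Named arguments have whitespace trimmed from name and value;
    positional arguments are kept exactly as written.
    """
    positional, named = [], {}
    for part in call_body.split("|")[1:]:   # element 0 is the template name
        if "=" in part:
            name, value = part.split("=", 1)
            named[name.strip()] = value.strip()
        else:
            positional.append(part)
    return positional, named


print(split_template_args("strip| padded text "))   # ([' padded text '], {})
print(split_template_args("strip|= padded text "))  # ([], {'': 'padded text'})
```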

I've now edited apple to conform with WT:NORM, at least as far as the whitespace rules are concerned. —CodeCat 21:34, 8 October 2015 (UTC)

Maybe you should have waited for more opinions before doing that. --Wiki Tiki 89 19:25, 9 October 2015 (UTC)

I didn't think following existing rules would be controversial. In fact, I did it to show the outcome of those rules. —CodeCat 20:15, 9 October 2015 (UTC)

Re: "it is more readable to put in line breaks": I strongly disagree, and I regularly remove them in favor of spaces. We have many instances of the entire edit frame being taken up with a single citation template, and of many page-downs being required to get from one definition to the next. The result is that substantive editing of definitions of highly polysemous, cited (i.e. English) terms needs to be done offline, and even offline can only be done with difficulty. Perhaps it is time to use transclusion of citations from citation space to make such entries intelligible in the edit frame. DCDuring TALK 23:56, 8 October 2015 (UTC)

A few extra page-downs are not so bad compared to trying to read such a long template crammed into one line. I would agree that this should not be done when the parameters to the template are short enough. --Wiki Tiki 89 19:25, 9 October 2015 (UTC)

FWIW, the Russian manual declension template {{ru-decl-noun}} is specifically intended to be used with linebreaks, e.g.:

{{ru-decl-noun
|пау́к-во́лк|пауки́-во́лки
|паука́-во́лка|пауко́в-волко́в
|пауку́-во́лку|паука́м-волка́м
|паука́-во́лка|пауко́в-волко́в
|пауко́м-во́лком|паука́ми-волка́ми
|о пауке́-во́лке|о паука́х-волка́х}}

This way of formatting puts the singulars in the first column and the plurals in the second column, and reading down the rows you get nom, gen, dat, acc, ins, prep. Putting it all on one line sometimes happens but makes it much less readable. So we might want to amend things to allow linebreaks with the preceding vertical bar on the same line, while still excluding leading/trailing/embedded spaces. Benwing2 (talk) 09:29, 9 October 2015 (UTC)
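To show why this one-case-per-row layout is convenient, here is a Python sketch that reads such a call row by row into a case → (singular, plural) table. The function name is hypothetical and it assumes exactly the six-row, two-column layout shown above with no embedded spaces; it is not how {{ru-decl-noun}} itself processes its arguments.

```python
CASES = ["nom", "gen", "dat", "acc", "ins", "prep"]


def parse_decl_rows(call_text: str) -> dict:
    """Map each case to its (singular, plural) pair, one row per case."""
    rows = [line for line in call_text.splitlines() if line.startswith("|")]
    if len(rows) != len(CASES):
        raise ValueError(f"expected {len(CASES)} rows, got {len(rows)}")
    table = {}
    for case, row in zip(CASES, rows):
        # "|sg|pl}}" -> ["sg", "pl"]; closing braces on the last row are dropped.
        singular, plural = row.rstrip("}").split("|")[1:]
        table[case] = (singular, plural)
    return table
```

Applied to the call above, parse_decl_rows(...)["gen"] would give the pair of genitive forms. Because each row is self-contained, a misplaced form shows up as a row with the wrong number of cells instead of silently shifting every later case.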

That's exactly the problem I pointed out in the vote (Wiktionary:Votes/pl-2015-07/Normalization of entries 2). In the vote, I was told that "CodeCat is the one who added the rule in the policy". What can I say? --Dan Polansky (talk) 06:48, 11 October 2015 (UTC)
- What's your point? —CodeCat 21:34, 11 October 2015 (UTC)

Inflections identical to lemma form

I've noticed that in general, when an inflectional form of a word is identical to the lemma, there is no separate definition for it. For instance, pecūnia and pecūniā, the vocative and ablative singular forms of pecūnia, have no definitions, yet each inflected form would have its own definition if they weren't on the lemma page.

I've come across a fair number of entries (mostly for Latin) that have a definition for each inflection in the lemma entry. I can't find any at the moment, but they're like this:

sheep

1. A woolly ruminant of the genus Ovis.
2. plural of sheep

Is there some sort of policy on this? If not, I think there should be. I would vote for the second option, since it is most consistent with us having separate definitions for each inflected form under non-lemma entries, but I'd like to hear other people's thoughts on the matter. Andrew Sheedy (talk) 03:51, 9 October 2015 (UTC)

I'm opposed to listing inflected forms that are identical to the lemma form, unless (as with pecūniā) the diacriticized headword form is different from the lemma's own diacriticized headword form. Since [[sheep]] already says "(plural sheep)", there's no need to list it separately. —Aɴɢʀ (talk) 09:29, 9 October 2015 (UTC)

When I created Arabic non-lemma inflections I specifically added code to exclude adding non-lemma inflections to the same page as the lemma they're derived from. Because I was only creating verbal inflections, this principally applies to the 3sg masc past active (which is the same as the dictionary form), but also to 3sg masc past passives, which almost always have the same written form as the active, although the vowels are different. I did this because I felt it would create a lot of noise to include all those passives on the same page; among other things, almost every verb page would be formatted using "==Etymology 1==" and "==Etymology 2==" (I did it this way because the pronunciations are different, although maybe there's a better way). With nouns it would be a lot worse, since for every noun lemma there would be four or five non-lemma entries on the same page, each with different pronunciation but the same nonvocalized spelling. The assumption here is that if the user looks up an Arabic word by spelling, it's enough to get them to the right page for the lemma, and they can hopefully figure out by looking at the conjugation table for the verb that it has 3sg masc past active and passive that are spelled the same as the lemma form. Benwing2 (talk) 09:42, 9 October 2015 (UTC)

I do think it would be good to make this information more explicit, but I think it would be confusing like that. Maybe a usage note? (This is the base form of this term, and it is also identical to the genitive plural and the ablative singular) WurdSnatcher (talk) 16:04, 10 October 2015 (UTC)

I think it's made explicit enough in the inflection table itself. —CodeCat 17:24, 10 October 2015 (UTC)

"Genitive plural" and "ablative singular" are probably a lot more confusing to a user than what we have now. Equinox ◑ 21:38, 11 October 2015 (UTC)

So should definitions listing the other forms that are identical to the lemma be deleted, or left as is for now? The general consensus seems to be that they shouldn't be included. Does anyone else think that this should be made "official"? Andrew Sheedy (talk) 21:45, 11 October 2015 (UTC)

I think it would need at least a poll on this page showing near-consensus (80%+ support?) and possibly a vote if there were not near-consensus. DCDuring TALK 22:26, 11 October 2015 (UTC)

For Hungarian entries, I list identical lemma and non-lemma entries separated by their own Etymology header. See terem, a lemma noun, verb and non-lemma noun form. Because the non-lemma form has its own declension, pronunciation and hyphenation, I would like to continue to include them. --Panda10 (talk) 12:40, 12 October 2015 (UTC)

The possessive form is what's called a "sublemma": a form of another term that has some lemma-like properties, like having its own inflection. Participles also fit into this category. For fully non-lemma forms that are not sublemmas either, there shouldn't be a separate entry if it's identical to the lemma form in all respects. If there is a difference (one that's not apparent from spelling), then it should have its own entry, like for Latin ablative singular forms. Non-lemmas, whether sublemmas or not, should not have etymologies unless the formation is irregular, and even then it's probably better to put the etymology on the lemma page. —CodeCat 13:21, 12 October 2015 (UTC)

Re "Non-lemmas, whether sublemmas or not, should not have etymologies": Etymologies may not be useful for English non-lemma entries, but they are for agglutinative languages. For Hungarian, the non-lemma etymology allows the user to click on the suffix for more information as in for example ésszel. --Panda10 (talk) 13:39, 12 October 2015 (UTC)

Please vote - Allowing matched-pair entries

Please vote on Wiktionary:Votes/2015-08/Allowing matched-pair entries — it ends in 3 days: 23:59, 12 October 2015 (UTC).

Current results:

Support: 6 (66.7%)
Oppose: 3 (33.3%)
Abstain: 0

--Daniel Carrero (talk) 20:05, 9 October 2015 (UTC)

Poll and discussion: table for seasons of the year

I created Template:table:seasons about 16 hours ago. It is a table for the 4 seasons of the year that can be used in any language. Disclaimer: This template does not have to be used for any languages which have seasons other than the 4-season system — see this article for some 6-season calendars. (Only if the template can be changed somehow to agree with the needs of such languages.)

I chose some icons to illustrate the table and started adding it in on entries of multiple languages. In diff, @Catsidhe reverted one of the entries with the edit summary: "Undo revision 34541571 by Daniel Carrero (talk) That is twee, insulting, and i would like you to stop that now. It's starting to feel like vandalism." I started the discussion Thread:User talk:Catsidhe/Table for seasons, where Catsidhe says more about what they don't like about the template. (After that discussion, I shrunk the images and changed the style to the current 1-row template, turning larger "cartoon"-style images into less conspicuous icons.)

First of all, I realized I did not discuss that season template beforehand, so I apologize if creating and using that table was not a good idea and I'm ready to undo all the changes if that's what people want. I opened this discussion to make sure what format the community prefers for this. I acknowledge Catsidhe's opposition, but I'd feel stupid if I re-edited the entries right now to return to the list format without any further discussions and other people wanted to discuss the issue or wanted the tables back.

(Other tables I created recently: Template:table:playing cards, Template:table:suits and also Template:table:poker hands. The first of all was Template:table:chess pieces, which I discussed in the BP in September. Please let me know if there's any problem with them. Template:table:colors was created by User:DTLHS and has been discussed a lot in the talk page and also revised multiple times.)

The previous state of the entries for seasons was:

Many season entries have been using the "list" system, as in Category:English list templates - "Template:list:blahblahblah" with text as opposed to tables and cross-linking many templates between languages. The list system was a previous project of mine, I created the initial design of lists a few years ago, though it has changed a lot since then. I started converting a few season lists into tables after creating the initial Template:table:seasons template. I also created a few new season tables that didn't exist before as lists, such as Template:table:seasons/ast.

My rationale for using tables in some cases, as opposed to lists:

Consistency in word order and illustrations if they are needed; in short, tables are supposed to help you know instantly the meaning of all the words even if you don't speak the language in question, without the need to check each entry.

Also I chose the icons because IMHO they represent well the ideas of each season in the table even in low resolutions. As alternative proposals, I suggest: 1) keeping the table with different images or 2) keeping the table with English -> Foreign-Language text translations without images. But, as I said, if people want the list format back, (or agree on some other idea) I'll undo my changes (I'd use AWB for that). So please let me know what the community wants. --Daniel Carrero (talk) 01:58, 11 October 2015 (UTC)

Proposal: Using the table for seasons

Proposal: Having a table for the 4 seasons of the year that can be used in all languages (Template:table:seasons; check the template to see a list of which languages have the season table already, which includes Template:table:seasons/en for English, Template:table:seasons/fr for French, etc.), except those languages that don't agree with the 4-season system. This proposal is just about having the table, not about the exact images that are used or the exact format of the table.

See 3 examples of implementation:

This diff, introducing the table where there were no links to the season entries, in a Portuguese entry.
This diff, changing a manual list of seasons into the table, in a French entry.
This diff, changing a "list" template (as in Category:English list templates) into a "table" template (as in Category:English auto-table templates), in an Irish entry.

Support

Support --Daniel Carrero (talk) 01:58, 11 October 2015 (UTC)
Also I like the pictures, but it's okay if I'm in the minority. --Daniel Carrero (talk) 21:18, 11 October 2015 (UTC)

Note: primavera is an interesting entry to look at. It currently has the seasons table in the Asturian, Catalan, Galician, Interlingua, Italian, Portuguese and Spanish sections. --Daniel Carrero (talk) 21:43, 11 October 2015 (UTC)
Support -- I like the pictures, though I wouldn't be too upset to lose them. I do generally like the idea though. WurdSnatcher (talk) 17:02, 11 October 2015 (UTC)
Support -- The pictures are unnecessary, but they can be removed if the majority agrees. I think it's a good idea. Aryamanarora (talk) 21:21, 11 October 2015 (UTC)
Support The tables look nicer than plain lists of words, and can have clarifying translations and/or pictures too. Much more effective than a list of foreign words alone, where you have to click to know what they mean. —CodeCat 21:36, 11 October 2015 (UTC)

Oppose

Oppose Simply hideous. Wrong for English wiktionary. Maybe OK for Simple Wiktionary? DCDuring TALK 02:15, 11 October 2015 (UTC)
Oppose addition of Template:table:seasons (which has pictures intended to represent seasons) to entries for now. I'll wait to see what supporters are going to say to see whether I should change my mind. --Dan Polansky (talk) 11:07, 11 October 2015 (UTC)
Oppose the pictures. Table or list, I don't mind, just not the daft cartoons. Catsidhe ^{(verba, facta)} 11:21, 11 October 2015 (UTC)
Oppose these, although I like the ones that can be symbolised unambiguously. —Μετάknowledge^{discuss/deeds} 17:00, 11 October 2015 (UTC)
Oppose per Equinox (below) - this wastes a lot of space to provide four words. - -sche (discuss) 19:38, 12 October 2015 (UTC)

Abstain

Comments/Discussion

I'd be fine with it if we got rid of the little pictures. —Aɴɢʀ (talk) 10:34, 11 October 2015 (UTC)
I don't really like the images and am not sure they are necessary. Equinox ◑ 18:02, 11 October 2015 (UTC)

When I created this poll, I said "This proposal is just about having the table, not about the exact images that are used or the exact format of the table.", but really the images seem to be the most controversial aspect of the table, so I've tried a different design: Template:table:seasons (without images).

I used it for:

What do you think? --Daniel Carrero (talk) 21:15, 11 October 2015 (UTC)

Honestly all the table/border stuff seems like a waste of space for only four words. Equinox ◑ 21:18, 11 October 2015 (UTC)

I oppose use of the table, per Equinox. (And Equinox's view should carry more weight than others' in a discussion about seasons. :-) )—msh210℠ (talk) 17:49, 12 October 2015 (UTC)

A simple list of the three seasons (with gloss for FLs) that are coordinate terms of the headword is what seems appropriate to me. The Coordinate terms header is intended for just this kind of thing. I would think there would be some technical challenge in efficiently suppressing the season that was the headword. DCDuring TALK 23:56, 12 October 2015 (UTC)

Update: I removed the pictures from the table. Past revision with pictures: this revision.

The official support/oppose count is 4-5, but it does not do justice to the table/picture relation. Maybe I designed this poll poorly.

Some people opposed the table in its entirety, but other people explicitly opposed the pictures while either saying they would be fine with the table or they don't care if it's a table. This is regardless of whether those people voted "Support/Oppose/Abstain/Comments". Even among people who supported the table, a number of people said they don't mind the pictures at best. That confused the hell out of me, but it's clear that at the very least the pictures are unwanted by the current majority, ~~probably the whole table too~~ actually, it looks more like a "no consensus" right now. --Daniel Carrero (talk) 02:41, 13 October 2015 (UTC)

No consensus, I guess. (as of now) --Daniel Carrero (talk) 20:03, 19 October 2015 (UTC)

No consensus needed to implement under existing WT:ELE using coordinate terms header, omitting from the display the headword. Can't that be done technically? DCDuring TALK 22:39, 19 October 2015 (UTC)

IMHO, I'm against using direct links in each wiki page because that would involve undoing the work using list or table templates. ~~We can~~ (edit: I mean "I can", since I promised that, but I was expecting some consensus and this discussion has become too confusing for the reasons that I said. Probably I'll let the status quo linger for a while, then I'll create some new discussion later.), however, convert the current tables back to the list format. Interestingly, Template:list:seasons/lv seems to be the only "list:"-prefixed template that uses 2 lines for some reason.

The distinction between "Coordinate terms" and "See also" seems to be a moot point. It's true that "Coordinate terms" exists in ELE, but so does "See also". --Daniel Carrero (talk) 23:00, 19 October 2015 (UTC)

Sub-national countries in Wiktionary:Wiktionarians

Wiktionary:Wiktionarians currently has sections for the Basque Country, Catalonia, and Spain; the former two are divisions of the latter. This is inconsistent with the principle of according sections only to countries qua sovereign states. Nevertheless, this is a dictionary, and as such political considerations are not as important as linguistic ones. The Basque Country and Catalonia have their own languages (Basque [eu] and Catalan [ca], respectively), so it is relevant to us whether a person is from one of those sub-national countries. On this principle, I see validity in including sections for other sub-national countries, for example Flanders and Wallonia (Belgium), Quebec (Canada), Guangzhou, Inner Mongolia, Manchuria, Tibet, and Xinjiang (China), Lapland (Finland / Norway / Russia / Sweden), Brittany (France), Gaeltacht (Ireland), Hokkaido and Ryukyu Islands (Japan), Eastern Cape, Free State, Gauteng, Kwazulu Natal, Limpopo, Mpumalanga, Northern Cape, North West, and Western Cape (South Africa), and Cornwall and Wales (United Kingdom). What is the opinion of the community? — I.S.M.E.T.A. 10:25, 11 October 2015 (UTC)

Speaking only to the case I'm somewhat familiar with, I would not be in favor of a section for the Gaeltacht, as being from the Gaeltacht is neither a necessary nor a sufficient condition for being a native speaker of Irish (though there is a greater than chance correlation). Also, unlike the Basque Country and Catalonia, the Gaeltacht does not correspond to any political entity and could not be considered a "sub-national country" by any stretch of the imagination. —Aɴɢʀ (talk) 10:33, 11 October 2015 (UTC)

@Aɴɢʀ: I can understand where you're coming from, and I therefore agree that the Gaeltacht was a bad example. However, surely being from anywhere is neither a necessary nor a sufficient condition for being or for not being a speaker (native or otherwise) of any language; in every case, the country listings are only suggestive of language ability. — I.S.M.E.T.A. 20:54, 11 October 2015 (UTC)

I agree with you - also, India and Pakistan could definitely use that category system:

Kashmir (Kashmiri)
Bengal (Bengali)
Sindh (Sindhi)
South India (Tamil, Telugu, etc.)

Aryamanarora (talk) 21:24, 11 October 2015 (UTC)

@Aryamanarora: Yes, that was the sort of thing I was thinking. Should the sub-national states be listed separately, as the Basque Country and Catalonia currently are, or should they be listed as subsections of the sovereign states' sections? — I.S.M.E.T.A. 23:10, 11 October 2015 (UTC)

@I'm so meta even this acronym: Subsections will be best for organization. Aryamanarora (talk) 01:44, 12 October 2015 (UTC)

@Aryamanarora: Subsections it is! — I.S.M.E.T.A. 12:12, 12 October 2015 (UTC)

Eh, let people categorize themselves however they want. If they identify as Basque and not Spanish, or as Californian and not American, let 'em do it. There's no need for policy to be that restrictive. Purple backpack89 05:25, 12 October 2015 (UTC)

@Purplebackpack89: I'm in favour of listing sub-national countries in Wiktionary:Wiktionarians. I just wanted to make sure that doing so has community support or, at least, lacks community opposition. — I.S.M.E.T.A. 12:12, 12 October 2015 (UTC)

Any objections to having subsections for sub-national countries in Wiktionary:Wiktionarians? — I.S.M.E.T.A. 12:12, 12 October 2015 (UTC)

I just listed myself as being from Scotland. --Zo3rWer (talk) 09:32, 14 October 2015 (UTC)

I've just reorganised the page and posted notice of this on Wiktionary:News for editors. — I.S.M.E.T.A. 21:54, 16 October 2015 (UTC)

Appendix:Capital letter

I created Appendix:Capital letter as an (incomplete) list of uses of a capital letter. Wikipedia already has Capitalization, but I created this page using the entry layout, which I find easier to navigate. A number of those are basically senses that the entries A, B, C, etc. could have if we wanted (like "found in the beginning of proper nouns" and "found in the beginning of sentences"), but I think they look better as a single page. Feel free to expand the list with more languages or senses. --Daniel Carrero (talk) 09:31, 12 October 2015 (UTC)

Nice, it is certainly thorough. Aryamanarora (talk) 22:29, 12 October 2015 (UTC)

Thank you. --Daniel Carrero (talk) 08:55, 13 October 2015 (UTC)

I have a proposal:

Do you think Appendix:Capital letter could be moved to the main namespace under the title Unsupported titles/Capital letter, with the displayed title as [capital letter]? This name is consistent with the entry ] [, whose displayed title is [space].

Rationale:

Not only is the whole page formatted like an entry, but if we assume that entries like A, B, C, etc. should have senses like "found in the beginning of proper nouns", "found in the beginning of sentences", "found in the beginning of taxonomic names", etc., then the page Appendix:Capital letter removes the need for creating those definitions in every single letter. Think of it as a merger of all the entries for capital letters, because they would have repeated information otherwise. The idea of "capital letter" is something of lexical significance, and can be checked for attestations just like a normal entry. Also IMHO it is more important than the entry ] [.

Then again, I know it's an unprecedented idea, so I don't mind if someone disagrees. (not that I would usually mind otherwise) I used the appendix namespace because it would seem uncontroversial, but I was really aiming for the main namespace. Thoughts? --Daniel Carrero (talk) 08:55, 13 October 2015 (UTC)

I forgot to mention: moving Appendix:Capital letter into the main namespace would also serve the purpose of making it searchable. --Daniel Carrero (talk) 10:53, 24 October 2015 (UTC)

Nobody seems to have weighed in, but this has my support, whatever that's worth. Andrew Sheedy (talk) 01:08, 18 October 2015 (UTC)

True, thanks for your opinion. I suppose that makes us 2 Support; 0 Oppose; 0 Abstain!! :) In any event, I'll ask more people to join in the conversation. --Daniel Carrero (talk) 10:57, 24 October 2015 (UTC)

Whereas we define the meaning of the symbol (space), we're not defining the meaning of the symbol (capital letter) but the meaning of capitalization. I don't think that an appendix that looks like that (called "capital letter" and treating (capital letter) as the term to be defined) can possibly work in the mainspace. That argument doesn't apply if the appendix is called "capitalization" and treats (capitalization) as the term to be defined, and indeed I think it's a workable idea, possibly a good one. I'm late to the party, here, so lemme ping Daniel Carrero.—msh210℠ (talk) 16:42, 22 December 2015 (UTC)

@Msh210: Thanks for your thoughts. So, would you support moving Appendix:Capital letter into the main namespace under some title with "capitalization" in it? What about Unsupported titles/Capitalization? (maybe it's a bit too long, Appendix:Capital letter is actually easier to type and remember)

@DCDuring gave some arguments at #Please see - capital letter discussion against moving the page to "Unsupported titles/Capital letter". I agreed with him and the discussion had ended there, that's why the page remains as an appendix as of now. --Daniel Carrero (talk) 17:07, 22 December 2015 (UTC)

Oh, I was unaware of that discussion; thanks for the link. What DCD said makes a lot of sense (as usual), and I support it.—msh210℠ (talk) 17:12, 22 December 2015 (UTC)

Don't move to mainspace since it isn't a term - per Msh210. Equinox ◑ 17:08, 22 December 2015 (UTC)

Tolkien's languages' copyright

I've recently started expanding Quenya entries and User:Chuck Entz suggested coming over here and making sure that the Wikimedia Foundation won't be sued by the Tolkien Estate for any copyright infringement. The main reference I'm using is Eldamo.org, which licenses all of Tolkien's languages' definitions and meanings under Creative Commons 4.0. What does the community think? Aryamanarora (talk) 22:49, 12 October 2015 (UTC)

[IFYPY] IANAL, but @BD2412 is. —Μετάknowledge^{discuss/deeds} 03:01, 13 October 2015 (UTC)

This is an area where I would recommend treading very carefully. The fact that another website maintains a compendium of words from a fictional language under a lenient license has no bearing on the copyright status of the creative work with respect to the estate of the original author. bd2412 T 01:15, 14 October 2015 (UTC)

To clarify: the other site can't license something it has no right to in the first place.

In the US, looking at 17 U.S. Code § 102, it does not seem that languages are included under the list of things that are copyrightable; they are closer to a system (of communicating), which is explicitly excluded, than a literary work, which seems to be the closest thing that is included.--Prosfilaes (talk) 01:14, 16 October 2015 (UTC)

What makes this rather murky is that these languages are an integral part of a series of literary works, and play an important part in the effect of several passages on the reader. For instance, the words of the Dark Lord on the Ring might not sound that unusual to someone who speaks any of a number of languages (somewhere in Central Asia, perhaps?) with a similar phoneme inventory, but to English speakers their sound symbolism really gives an impression of something alien, barbaric and harsh.

Even though some of the Elvish languages were the source of the literary works, rather than the other way around, I doubt they would have come out the same in the end if the literary works had never been created. Besides, Tolkien was such a master of linguistic details that it's hard to escape the impression that the languages are as much original creations as a poem, a sculpture, or a painting.

As for the issues involved, it's not merely a matter of whether the WMF is at risk of being sued: as an enterprise that depends on the willing contributions of a great many people, we need to be very respectful of intellectual property rights. We should avoid violating anyone's legal rights, whether there's a likelihood of litigation or not. Chuck Entz (talk) 23:11, 16 October 2015 (UTC)

I would suggest some limiting principle, such as including only words that are at least discussed in some other work. bd2412 T 00:24, 17 October 2015 (UTC)

Matched-pair entries - follow-up proposal

Now that Wiktionary:Votes/2015-08/Allowing matched-pair entries passed, (thanks for the votes!) I have a follow-up proposal:

All unpaired entries can exist as separate entries -- ), (, [, {, etc. -- but they should not have any definitions along the lines of "begins X" and "ends X" and they should not repeat the same information as the matched-pair entries. For example, if ( ) is defined as "encloses supplemental information", then ) should not be defined as "ends supplemental information". The entry ) should only have the modicum of information needed to point the reader to ( ), and should be devoid of examples, multiple senses (math sense, chemistry [?] sense, typography senses, etc.), regional variations, synonyms, etc. In other words, ( ) should be lemmatized, with (/) pointing to it. (But the individual character entry could have some information specific to it, such as the Unicode box, the name "this is called 'left parenthesis' in English", perhaps even a picture.)

More considerations:

Sometimes, a component of a matched pair also has standalone definitions. The entry ) could have both:

Used in ( ).
Separates a number or letter from an item in a list.
1) New York, 2) London, 3) Paris.

Sometimes, a single character is the component of different matched-pairs, so it should point to all of them. The entry » could have, maybe:

Used in « », » » and » «.

Thoughts?

Also, feel free to change/adapt the proposal or propose something else if you'd like. --Daniel Carrero (talk) 01:17, 13 October 2015 (UTC)

Proposal: Only use lemma forms in etymologies

For Latin, some etymologies show multiple forms of a word. For verbs, both the infinitive and first-person singular present active indicative are sometimes shown. For nouns, some people seem to include the accusative singular or the genitive singular form.

This practice is pretty much unique to Latin etymologies; I haven't seen it for any other languages. It can be argued that mentioning this form makes the etymology more correct since Romance lemma forms derive from the infinitive and accusative singular. However, we don't seem to do this for any other languages where this might apply:

Bulgarian and Macedonian etymologies show verbs being derived from the Proto-Slavic infinitive, even though the modern languages have no infinitive and another form is used as lemma.
Irish and Scottish Gaelic verbs use the Old Irish third-person singular form, even though the lemma is another form in the modern languages.

Therefore, I want to propose that we use only the lemma forms in etymologies, regardless of whether the modern lemma form descends from the ancient lemma form. What is inherited is the entire paradigm, not the lemma form alone. The lemma form is merely a representative of that paradigm. So Latin cantō merely stands in for the entire paradigm, which includes cantāre. The choice of lemma form in any given language is completely irrelevant for the actual etymology. —CodeCat 12:38, 13 October 2015 (UTC)

Definitely support. This has long been a desideratum of mine. —Aɴɢʀ (talk) 12:52, 13 October 2015 (UTC)

It’s not always the entire paradigm that is inherited. Most Romance words are loaned or inherited from the accusative, but some are from the nominative, and some are from the accusative plural.

For most Portuguese verbs, it’s specifically the infinitive that was loaned or inherited, and other forms were formed from its stem. Look at ajo (not from agō), jogue (not from iacit).

I don’t support this as a general rule, as we risk losing important etymological information if it were followed to the letter. I think it would be better if each language had its own policy about how to link to etymons. — Ungoliant ^(falai) 14:40, 13 October 2015 (UTC)

That kind of analogical restructuring and reforming is normal in any language and doesn't need mentioning unless something unusual is going on. Paradigms may be inherited but that doesn't mean all forms must be inherited individually. As for borrowing, for example, English placate does come from plācō, even though the specific form was taken from the past participle plācātus. Many Germanic languages, meanwhile, borrow Latin and Romance verbs by using the original infinitive as the stem. —CodeCat 15:33, 13 October 2015 (UTC)

It is the entire paradigm that is inherited. It's just that all of the case forms other than the accusative are dropped, but the paradigm also contains all the definitions and connotations (even if these are also changed). --Wiki Tiki 89 15:43, 13 October 2015 (UTC)

Why is this being discussed without any reference to ordinary users?

In English etymologies the practice of including, for example, the stem of or a form other than nominative singular of a Latin or Greek noun can help users see and accept the etymology we offer. Similarly for words derived from present or past participles.

Is this intended to make it simpler to impose some pan-lingual uniformity on entries? It seems to me to have little prima facie justification, so one naturally looks for other, unstated motives. DCDuring TALK 23:31, 13 October 2015 (UTC)

I oppose this. I would, however, support linking to the lemma, as is done in some cases already (what seems most common, though, is including both in the etymology, as in "from linking, gerund of link"). Andrew Sheedy (talk) 00:08, 14 October 2015 (UTC)

Does that mean you think the same should be done for Old Irish, Proto-Slavic and any other language where this applies? What to do when the lemma form didn't even exist in the ancestor language, like for PIE vs its descendants? —CodeCat 00:20, 14 October 2015 (UTC)

Assuming I am understanding correctly, then I would answer "yes" for the first question. One of the two practices that exist for Latin words in etymologies would be ideal, in my opinion. Part of it is, as DCDuring mentioned above, that an average user may not realize that they are not being given the actual form of the word from which the word they are looking at was derived. I know that was the case for me before I realized what was going on, and as a result, I thought that some of the etymologies were pretty far-fetched. I wouldn't throw a fit, however, if that standard wasn't adopted for languages like Old Irish, Proto-Slavic, etc.

I'm afraid I'm having difficulty parsing your second question in my sleepy state. Rather than me answering a question you didn't ask, could you please clarify what you mean first? Andrew Sheedy (talk) 01:02, 14 October 2015 (UTC)

Proto-Indo-European didn't have an infinitive, so for any language that uses the infinitive as lemma, there's a problem. There's no form to show it to be derived from. Latin ferō is inherited straight from PIE *bʰéroh₂, but Proto-Germanic *beraną didn't inherit from anything in PIE, it was formed after Germanic split off. The exact Germanic descendant of the PIE form is *berō, but that's not the lemma form. —CodeCa t 01:09, 14 October 2015 (UTC)[reply]

I see. In that case, I would note the extra step, i.e. the formation of *beraną from whatever form, which in turn came from PIE *bʰéroh₂ (or whatever the exact inflection is). I realize that that information may not always be available, but the bottom line is that the intermediary step should be noted, or it should be made clear that the derivation was indirect (e.g. saying "ultimately derived from" rather than simply "from"). (Also, if PIE had no infinitive, why is *bʰer- defined as "to bear, carry"?) Andrew Sheedy (talk) 02:00, 14 October 2015 (UTC)[reply]

It's actually the root with all the inflectional/derivational bits removed, so it's impossible to translate directly into English. Using the infinitive is better than a long explanation to the effect that it depends on what inflectional/derivational state it's in as to how to translate it. Chuck Entz (talk) 02:25, 14 October 2015 (UTC)[reply]

Ah, OK. Proto-Indo-European is unfamiliar territory for me. Andrew Sheedy (talk) 02:33, 14 October 2015 (UTC)[reply]

Like Andrew Sheedy, I oppose this, but I support linking to the lemma forms. I deal primarily with Japanese, and sometimes a term derives not from a lemma form, but from some inflection thereof. Listing only the lemma form in the etymology would be incomplete, and it invites confusion. ‑‑ Eiríkr Útlendi │^{Tala við mig} 02:47, 14 October 2015 (UTC)[reply]
Oppose. Trying to standardize all cited forms as lemma forms would be overblown. Lemmatization is decided on other grounds entirely than etymological transparency. In particular, when lemmas consistently contain a particular suffix (e.g. an infinitive or nominative marker), I am in favor of quoting only the word stem, if this can be well-defined. (But I am, per Andrew S, in support of including at minimum a link to the lemma.) --Tropylium (talk) 09:19, 14 October 2015 (UTC)[reply]

Standardizing suffix entries

Looking at the suffix entries of different languages, I notice a great variety in how the definitions and examples are formed/formatted, and what terminology is used. This is especially visible on long multi-language pages such as -a, -t, -k. I think it would increase the quality of our dictionary if we could standardize certain aspects to make them appear more unified. The standards would be mainly recommendations and guidelines for formatting and terminology. Here are some simple examples:

Is there a preferred terminology in the definitions? E.g.:
"verb suffix", "verbal suffix", "verb-forming suffix", or "verb-building suffix"?

"plural suffix", "plural ending", or "plural marker"?
"Forms the..." or "Used to form the..."? Or {{non-gloss definition|Used to form the…}}
How to format the FL definition when the FL entry has an English equivalent?
How to format it when there is no English equivalent?
The order of terms within a definition, e.g. "third-person singular indicative past indefinite" or "third-person singular past indicative indefinite"?
For the examples listed below the definition, should we use the format recommended by {{suffixusex}}?

These are just initial thoughts and observations, I'm sure there will be others, I just wanted to find out whether other editors see a need for such recommendations. --Panda10 (talk) 14:24, 13 October 2015 (UTC)[reply]

@Panda10: I also struggle to find good ways to define suffixes. I’ll try to explain what I usually do:

If the suffix has an exact English equivalent, I use the typical FL format of translation (gloss). Scientific suffixes such as -metro and -algia are in this category.
Otherwise, I use the format {{n-g|explanation}}; possible_translations.
The explanation is “forms parts of speech, from parts of speech [qualifier], indicating/denoting/meaning [...]” (see -eiro for an example)
Sometimes the wording makes the part of speech clear, e.g. “forms the names of lakes” doesn’t need to say it forms proper nouns.
I think I’m the only one who has ever used {{suffixusex}}.

— Ungoliant ^(falai) 13:54, 16 October 2015 (UTC)[reply]

@Ungoliant MMDCCLXIV: Thanks. I often read FL suffix entries for ideas, to see how others organized them. It is especially educational when I know nothing about that particular language. If I understand it easily and I like the layout, I will use the same or similar in Hungarian suffix entries.

The -eiro entry looks very thorough and well organized. I'm not sure about using {{n-g}}. Personally, it's a little hard for me to read italics. Then there is the problem with the similarity of "form" and "from", not to mention that "forms" is also a plural noun. I checked other online and paper dictionaries, some use "forming nouns from verbs" to make it clearer. Others use a label such as {in nouns} before the definition. I also use a label (saw it first in Finnish entries). See -ás.
I considered using {{suffixusex}} before because its output is very close to what I've been using, but in the end I didn't. Mainly because I didn't want the suffix to be repeated in the example, instead I bold the suffix in the derived word. This way the example appears more compact to me and the suffix is still clear. See -ás.

I understand that there will always be differences due to the nature of a specific FL. For example, Portuguese will not require noun suffix inflection tables, only the headline forms for feminine and plural. But I'm convinced that certain things could be standardized. --Panda10 (talk) 19:59, 16 October 2015 (UTC)[reply]

There's also {{usex-suffix}}, for some reason (one of them should probably be deleted) DTLHS (talk) 20:05, 16 October 2015 (UTC)[reply]

I've converted existing uses to {{suffixusex}}, and turned it into a redirect. —CodeCa t 20:43, 16 October 2015 (UTC)[reply]

Last week for comments about IEG project

Hi beer chatters,

The call for Individual Engagement Grants is closed and there is only one week left for comments from the communities. I haven't seen any notice about this, and it's quite sad given the nice list of projects and the number of people involved. So, I invite you to spend some time looking at the projects and to leave endorsement notes to encourage the participants. I particularly invite you to look at Wikiproject Siriono, a project I built for Wiktionaries. I'll be glad to receive advice and comments about it. Of course, please read the other projects as well; there is some really exciting stuff! Eölen/Noé (talk) 21:24, 13 October 2015 (UTC)[reply]

Cities of Norway vs Cities in Norway

I noticed not too long ago that there are two very similar categories: Category:en:Cities in Norway and Category:en:Cities of Norway.

So which one is better? "Cities of Norway" or "Cities in Norway"?

--KoreanQuoter (talk) 02:27, 14 October 2015 (UTC)[reply]

I'd choose "IN", that's what Wikipedia does for cities.

Wikipedia has: w:Category:Cities and towns in Norway. --Daniel Carrero (talk) 02:34, 14 October 2015 (UTC)[reply]

I like "of" better, but our existing entries (and a bunch I just added to the module) are for "in". Of course, we're not talking about a lot of categories, or even of entries, but it's better not to rearrange everything if we don't have to. Chuck Entz (talk) 03:24, 14 October 2015 (UTC)[reply]

Civilocity

Civilocity is a neologism which describes a form of government where the people can watch and listen to the leader of their country for the entire time that person is leading their country. In 2007 Nathaniel Wenger took it upon himself to coin, classify, and copyright this pragmatic philosophy. Nathaniel began talking about civilocity, which he often calls wengerocracy as it remains in its neologism phase, to emphasize the importance for countries to watch the leader of their country no matter where they live. Civilocity can be defined as a form of government where the people can watch the leader of their country 24/7, 365 days a year, including the extra day every leap year, broadcast live on public television to the entire world. Civilocity allows you to know every single thing the leader of your country did and to have it all online.

The exact definition of civilocity is literally "behaving in the dwelling". Civilocity was coined from the Latin term civilis and the Medieval Latin term civitat in the early 21st century AD to improve the political systems existing in some American city-states, notably Washington, DC.

Add to WT:LOP if you like. Equinox ◑ 15:28, 15 October 2015 (UTC)[reply]

Lingwa de Planeta

Lingwa de Planeta (Lidepla) is a constructed language made in 2010. They have a sizable lexicon and I think we should include the language in Wiktionary. My question is - should it go in the appendix or main namespace? It only has 15 fluent speakers, but so does Volapük. Edit: 25 fluent speakers as per Wikipedia. — This unsigned comment was added by Aryamanarora (talk • contribs) at 19:58, 15 October 2015 (UTC).[reply]

Does this language have a sufficient amount of published textual material from which we can cite words to meet our attestation criteria given at WT:CFI? --Wiki Tiki 89 20:03, 15 October 2015 (UTC)[reply]

Since Wiktionary:Criteria for inclusion#Constructed languages states that there is no consensus on languages without ISO codes, I'd think this would need to be decided by someone familiar with this. w:Lingwa de Planeta shows many literary works translated into Lidepla ([2] has a list); notably, Alice in Wonderland has been translated. There are around 3,000 words in the lexicon. There is a Swadesh list in the aforementioned Wikipedia article. There's also this [3], a translator from Esperanto to Lidepla. Aryamanarora (talk) 20:36, 15 October 2015 (UTC)[reply]

Is there any "durably archived" (i.e. used in "permanently recorded media"; see WT:CFI for clarification) written material produced originally in this language (i.e. not translations)? Note that even though Volapük currently has about 20 speakers (according to Wikipedia), it had more in the past and has existed for much longer. I don't know where we find Volapük citations, but I'm sure someone here knows. --Wiki Tiki 89 20:54, 15 October 2015 (UTC)[reply]

As far as I know, it does not. I just learned about it a few days ago, however, so my knowledge may be limited. I think that, should we decide to add it, it should remain in the appendix namespace. Aryamanarora (talk) 21:06, 15 October 2015 (UTC)[reply]

Is there anything at WT:CFI saying that durably archived cites have to be produced originally in the language and not translated? I can't find anything to that effect, but maybe I missed it. The Alice translation certainly exists in a dead-tree edition, which is available from Amazon. —Aɴɢʀ (talk) 21:11, 15 October 2015 (UTC)[reply]

I'm just trying to find out more information about the language. Since there would need to be consensus to include this language (a requirement that CFI does mention), editors need information to base their votes on. --Wiki Tiki 89 21:17, 15 October 2015 (UTC)[reply]

I do not believe languages should be added without an ISO code.

More directly, Volapük had close to a million speakers and some non-trivial publications. This language has 25 speakers, and a translation of Alice. (The list of translations seems to either consist of short works or translations of a few chapters.) Evertype has published lots of translations and versions of Alice, including several into tiny conlangs and rare scripts, including one in a script with one user. He currently offers three books in Volapük. His books, at least his Alice's, are print on demand.--Prosfilaes (talk) 01:37, 16 October 2015 (UTC)[reply]

It's a fairly obscure IAL, even within the obscure world of those who know and create conlangs. I certainly see no reason why it should be in mainspace, but I don't think we have any limitations on conlangs in appendix-space (although perhaps we should). —Μετάknowledge^{discuss/deeds} 02:44, 16 October 2015 (UTC)[reply]
My opinion is that Lingwa de Planeta has not matured enough to be included in mainspace. The translation of Alice really exists as an example of the language rather than a usage of the language, since I doubt that there has ever existed even one human being who would have felt more comfortable reading Alice in Lingwa de Planeta than in English. As time goes on, this language might take hold and its community may start producing legitimate uses of the language, and then the language would be on track for inclusion in mainspace here. If this language takes off, it will take decades for it to be ready to be included here. --Wiki Tiki 89 15:24, 16 October 2015 (UTC)[reply]

I agree with Wikitiki that this is not suitable for inclusion in the main namespace. As WT:CFI says, most constructed languages "do not meet the basic requirement that one might run across them and want to know the meaning of their words, since they are only used in a narrow context in which further material on the language is readily available." However, we do seem to let most conlangs have a minimal, not copyright-infringing appendix. (Is the language copyrighted? The website has a copyright notice.) Inclusion of an appendix doesn't seem to require that the language be given a code, e.g. the Sindarin appendix doesn't seem to use its code (its code seems to be included in the module only so that Category:Terms derived from Sindarin works). - -sche (discuss) 18:39, 25 October 2015 (UTC)[reply]

Template:place

Just a note: Template:place was created, together with Module:place, for use in placenames in all languages. Thanks to Ungoliant MMDCCLXIV (talk • contribs), who developed the module completely. (Also I credit myself as a beta-tester.) See Module talk:User:Ungoliant MMDCCLXIV/archive1 for conversations during the development of the module. --Daniel Carrero (talk) 03:13, 16 October 2015 (UTC)[reply]

Appendix:okay sign

Okay, so, I’m hardly proficient at any sign language, so I simply put this in the appendix. I don’t think that we have any other entries that are extremely similar to this one, so I more‐or‐less made up my own format. If you people have any suggestions or comments, I’d like to read them. I think that this sign, if nothing else, merits inclusion somewhere. --Romanophile ♞ (contributions) 07:29, 17 October 2015 (UTC)[reply]

I made another one: appendix:V-sign. --Romanophile ♞ (contributions) 15:38, 17 October 2015 (UTC)[reply]

I created Appendix:finger gun, Appendix:thumbs up, Appendix:thumbs down, Appendix:shushing and Appendix:air quotes. --Daniel Carrero (talk) 16:10, 17 October 2015 (UTC)[reply]

No comments? Well, qui tacet consentire videtur, as the Romans would say. --Romanophile ♞ (contributions) 23:59, 17 October 2015 (UTC)[reply]

To avoid having random bits and pieces all over the place, I think they should be made subpages of a common parent, like Appendix:Gestures/thumbs up etc. Equinox ◑ 00:02, 18 October 2015 (UTC)[reply]

(edit conflict) I think all of those should be in the mainspace (to be searchable as normal entries), but I have yet to learn that long notation that we use for them.

Note: Appendix:okay sign = Sign gloss:OK and O@Side-PalmForward K@Side-PalmForward.

But, as long as they are in the appendix namespace, I agree with Equinox's idea of Appendix:Gestures/thumbs up. --Daniel Carrero (talk)

Yeah, there might be some sort of technical code that describes these, but I’m not sure how to write it. Had I known it, I would have just put them in the main space. --Romanophile ♞ (contributions) 00:14, 18 October 2015 (UTC)[reply]

FWIW, I think descriptive names such as "V-sign" are better than the long technical names. --Daniel Carrero (talk) 09:54, 18 October 2015 (UTC)[reply]

This is a great idea, love it! As others have already pointed out, finding gestures might be tricky (visual index?) Jberkel (talk) 00:28, 20 October 2015 (UTC)[reply]

Wiktionary:Votes/2015-10/Matched-pair naming format: left, space, right

A note: I created Wiktionary:Votes/2015-10/Matched-pair naming format: left, space, right as a follow-up to Wiktionary:Votes/2015-08/Allowing matched-pair entries. --Daniel Carrero (talk) 08:38, 17 October 2015 (UTC)[reply]

Categories for places that are not cities?

People have been creating a variety of "cities in..." categories, which is nice. But the category is a bit misnamed, because there are also attestable place names that aren't cities. Furthermore, in many countries/languages, "city" is merely an unofficial name for any large place, and is not strictly defined, whereas in others, even small places can be cities if they have city rights. So these should really be renamed to something more neutral and less subject to uncertainty. —CodeCa t 20:26, 17 October 2015 (UTC)[reply]

I'm not sure I understand why this is a problem. Every country has its own hierarchy of polities, so we shouldn't be trying to make everything conform to some one-size-fits-all scheme. Trying to coordinate placename types between countries can only lead to madness. Besides, what are you going to replace it with? A municipality can range from a part of a city to a regional jurisdiction containing multiple cities. A metropolitan area can stretch across large areas and include numerous cities. Even if you have very specifically-defined entities, deciding which to use entails a rather research-intensive, subjective process, since cities can vary so much. The only halfway-reliable fact is what something is called, so we should categorize using that and suppress the impulse to make sense of it all. Chuck Entz (talk) 22:15, 17 October 2015 (UTC)[reply]

I think we should use a single term that can apply to all of them regardless of size. We don't need to subdivide it further into whatever definitions apply. My gripe is that "city" doesn't cover all we might want to put in the category, so we will need Category:nl:Villages in the Netherlands, Category:nl:Villages in Belgium and so on. I'm trying to avoid that situation by suggesting we use a neutral term. —CodeCa t 22:25, 17 October 2015 (UTC)[reply]

In my opinion, we really should not use "city" for everything if that's inaccurate. Using inaccurate qualifiers just makes us an inaccurate dictionary, thus a less trustworthy one. One idea that Wikipedia uses is "First-level subdivision" (state, province, county), "Second-level subdivision", etc. In fact, Wikipedia takes this one step further, since each of those contains specific categories by country such as "Provinces of Algeria", "Districts of Azerbaijan", etc.

I created WT:Place names, which I propose to be a list of types of place names that all countries use, to help our current categorization system, though most countries are to be filled with information yet. That page also has links to the "x-level" Wikipedia categories I mentioned. --Daniel Carrero (talk) 22:48, 17 October 2015 (UTC)[reply]

Countries may not treat cities, towns, villages, hamlets etc. as legal entities. In the Netherlands for example there are only municipalities, but they can have many different villages in them. What Dutch people usually go by is whether the place has its own road sign that gives the name of the place when you enter it. The sign has legal significance, but only for road users (it means a 50 km/h speed limit). Addresses also use places rather than municipalities. The Dutch term for a generic group of houses in one place, regardless of size, is woonplaats (literally “living-place”). An English equivalent of that would be ideal for these category names. —CodeCa t 22:52, 17 October 2015 (UTC)[reply]

Wikipedia calls it a "settlement": w:Human settlement. —CodeCa t 23:05, 17 October 2015 (UTC)[reply]

That will run into the problem of different countries having different levels of complexity, organized in different hierarchies. To connect a term in use with one of your abstractions requires knowledge of how the country is organized. Your description of Brazil took up most of a screen; multiply that by dozens. In the US, you have states, except for the District of Columbia and various territories. States are divided into counties, except for the ones that have independent cities that are their own counties, or the ones that are instead divided into parishes, or into boroughs. In Alaska, the borough is the equivalent of a county, and can contain multiple cities. In New York, the city of New York is divided into boroughs. As you subdivide further, there's virtually no correlation between size/importance of a given polity in a metropolitan area and anything at the same hierarchical level in a rural area: w:Los Angeles County, where I live, is larger than some states (not to mention countries), while some rural counties are smaller than the smallest subdivision of the city of w:Los Angeles, where I live (which isn't in any of your "x-level" Wikipedia categories). No matter what criteria you use, consistency is a pipe dream without making things too complicated and too impractical for mere mortals. Chuck Entz (talk) 00:20, 18 October 2015 (UTC)[reply]

Hey, I didn't propose any "consistent" system or any specific change to the categories, I don't know what exactly would be the categories for most countries. But I'd like to know more, that's why I created that WT page. You don't have to help if you don't want to. But your comments about US are something that would fit well there. Heck, I'm not even saying that we are going to have a perfect, ideal system eventually, but I bet that information could help even the current system some way nonetheless. --Daniel Carrero (talk) 00:33, 18 October 2015 (UTC)[reply]

Wikipedia has another name for certain categories that I don't suggest using here, but I'm going to mention anyway: "Category:Populated places in (place)". --Daniel Carrero (talk) 08:49, 19 October 2015 (UTC)[reply]

Nominalized Adjectives

Regarding nominalized adjectives (adjectives which are used as nouns, such as rich), are we to add a section "Noun"? For example, rich has only an "Adjective" section. SoSivr (talk) 09:07, 18 October 2015 (UTC)[reply]

This is a productive process—I'm having trouble thinking of any English adjectives referring to a class of people that cannot be used this way, aside from words like "Catholic" that are already used as countable nouns. "The deaf", "the living", "the hidden", "the tall", "the ill", "the healthy", "the infirm", etc. can all be used in the appropriate context. Given that seemingly all adjectives can be used this way as long as the meaning makes sense, I don't think we should add noun senses to their entries.

On the other hand, deaf, poor, wealthy, and ill do have a noun sense to cover this usage, though I think they probably shouldn't. —Mr. Granger (talk • contribs) 17:03, 18 October 2015 (UTC)[reply]

Some dictionaries have some "nominalized adjectives" as nouns, but most do not. Some of those that include it as a noun assert the rich to be an idiom or use the entry to say that rich takes a plural verb. To add and verify the corresponding information for every adjective for which such information would apply seems like a long run for a short slide. DCDuring TALK 17:59, 18 October 2015 (UTC)[reply]

Yes, this is just a feature of English grammar. We should only have noun entries if there are translations that are different from those of the adjective. SemperBlotto (talk) 05:28, 19 October 2015 (UTC)[reply]

Some previous discussions: Talk:Irish, Talk:deaf, Talk:wicked. We do have "Used before an adjective, indicating all things (especially persons) described by that adjective." as a sense of the. - -sche (discuss) 16:41, 21 October 2015 (UTC)[reply]

So example sentences of these adjectives where they are used as nouns will probably be put inside the relevant Adjective section, together with a usage note perhaps. SoSivr (talk) 09:06, 23 October 2015 (UTC)[reply]

Wiktionary:Votes/2015-10/Internet ≠ Internet slang

Note: Created Wiktionary:Votes/2015-10/Internet ≠ Internet slang, based on the 2012 discussion Wiktionary:Beer parlour/2012/January#Internet =/= Internet slang. --Daniel Carrero (talk) 11:04, 18 October 2015 (UTC)[reply]

Does this need a vote? I feel it’s already the modern consensus and common practice (among people who know what they are doing). — Ungoliant ^(falai) 13:20, 18 October 2015 (UTC)[reply]

I doubt this needs a vote. It needs definition-by-definition review to correct existing entries. Possibly it could use a bit of discussion to clarify the distinction, especially in borderline cases and in cases where both might seem applicable. I'd assume that internet referred to the mostly technical jargon concerning the internet and internet slang referred to slang used on the internet. Slang used on the internet about the internet probably belongs in internet. DCDuring TALK 18:05, 18 October 2015 (UTC)[reply]

@Daniel Carrero: I agree with DCD, and I think this kind of response shows that a vote is unnecessary. —Μετάknowledge^{discuss/deeds} 18:44, 18 October 2015 (UTC)[reply]

OK, I retract the vote. --Daniel Carrero (talk) 18:52, 18 October 2015 (UTC)[reply]

Context label in the form "often medicine"

@CodeCat and I had a discussion about whether a context label at cacoethic in the following form was correct: "(obsolete, often medicine)" ({{context|obsolete|often|_|medicine|lang=en}}). I had used this label to indicate that cacoethic was often, but not always, used in a medical context. CodeCat said "often medicine" made no sense. I pointed out that this sort of label seemed fine for dictionary entries: see, for example, [4]. The matter was resolved by splitting up the medical and non-medical senses, but for future reference I'd appreciate some guidance on whether labels like "often medicine" are appropriate. Smuconlaw (talk) 14:40, 19 October 2015 (UTC)[reply]

I'd say in cases like this it would make more sense to say {{lb|en|chiefly|medicine}} to indicate a term is used chiefly but not exclusively in medicine. —Aɴɢʀ (talk) 15:12, 19 October 2015 (UTC)[reply]

So constructions like "chiefly medicine" and "often medicine" are acceptable as context labels, even though they are not, of course (as CodeCat pointed out), strictly grammatical? Smuconlaw (talk) 15:52, 19 October 2015 (UTC)[reply]

They are perfectly grammatical in the subgrammar of labels. --Wiki Tiki 89 15:55, 19 October 2015 (UTC)[reply]

OK, great. Thanks. Smuconlaw (talk) 13:55, 21 October 2015 (UTC)[reply]

Boring cleanup work for money

I lost my job in July, that's how I've been able to be more active on Wiktionary in the last few months. Though I'm still looking for other jobs.

My money is almost gone. Can I do boring cleanup work on Wiktionary for money?

I got the idea from Wiktionary:Beer parlour/2012/July#Reward or bounty board, in which someone said "I see no reason why a person doing boring cleanup work should not be paid with money if someone offers that money." and "appropriately clean up Category:Translation table header lacks gloss" is mentioned as one possibility.

My plan, if no one objects:

I set up a Patreon account. (I never did that before, I did my research but I apologize if I understood wrong how it works)
Let's say the goal is specifically: appropriately clean up Category:Translation table header lacks gloss.
I set up a goal of "receiving tips every 100 entries", with some minimum amount ($1?). If other people are willing to help, they use my Patreon page and every 100 entries I receive that amount of money, up to the maximum amount that people choose.

There are probably other types of boring cleanup work to do, I'm open to suggestions. --Daniel Carrero (talk) 12:09, 20 October 2015 (UTC)[reply]

I'll take you out for a meal and let you crash on my couch if you run a Spanish verb bot and empty all these categories. --Zo3rWer (talk) 13:49, 20 October 2015 (UTC)[reply]

Thanks! :p Running a bot? I could do it, but that seems a bit different from my original idea, I was thinking of working for money by doing some manual labor that no one wants to do, and this is one of those inherently repetitive tasks that a bot could do better, as you mentioned. (other tasks require individual consideration of each entry and thus would be better done by humans) I wonder who was the last bot used to create forms for Spanish verbs. User:TheDaveBot (2006-2013)? --Daniel Carrero (talk) 11:54, 21 October 2015 (UTC)[reply]

I could be a gazillionaire if I'd got paid for every edit I ever made. Even if you included the fines I'd have gotten for being blocked. --Zo3rWer (talk) 13:54, 20 October 2015 (UTC)[reply]

I'd pay you $1 for every 100 entries cleared out of Category:term cleanup and all its subcategories. —Aɴɢʀ (talk) 10:52, 21 October 2015 (UTC)[reply]

Thanks for the idea, sounds good! If no one minds, I think I'll create a Patreon account saying more broadly "$1 every 100 entries for boring cleanup work, to be decided by consensus." so that I could start working on Category:term cleanup now per your idea and the job could be changed if people voted/decided/discussed on something else (since there's Category:Translation table header lacks gloss mentioned above and probably other jobs). --Daniel Carrero (talk) 11:54, 21 October 2015 (UTC)[reply]

Does it violate any of Patreon's terms and conditions that what you're doing isn't really art? —Aɴɢʀ (talk) 12:38, 21 October 2015 (UTC)[reply]

No, I believe. I've read the Community Guidelines, the legalese Terms of Use and the Help Center. They talk repeatedly about "artists and creators."

Just to be sure, I've sent them an e-mail today.

My name is Daniel Carrero, I'm an editor/administrator at Wiktionary, a dictionary wiki which is a sister project to Wikipedia.
We are all content creators, but often there is content on the wikis that needs maintenance and cleanup for quality and standards.
Can I use Patreon to crowdsource specific cleanup and quality work, like "pledge $1 for every 100 pages cleaned up according to criteria X and Y"? I predict that only other Wiktionary members would be interested in paying for that project.
Thank you,
Daniel Carrero

Also, I've found some specific Patreon projects of creating a wiki:

https://www.patreon.com/DPWiki?ty=h
https://www.patreon.com/user?u=284426&ty=h&u=284426
https://www.patreon.com/Kittychanley?ty=h (this one is "wiki pages and Youtube videos")

--Daniel Carrero (talk) 15:36, 21 October 2015 (UTC)[reply]

They replied to my e-mail. They've suggested doing a monthly campaign (as opposed to "per creation"). What do other people think? I think "per creation" is still better as measurable progress, I don't mind listing all the entries when I finish the work.

Also: +1 point for Patreon because they did their homework, apparently. They said "Wiktionary entries" and I never said "entries" in my message, so they must know at least a bit about our work and correct terminology.

Hey Daniel,
Thanks for writing in and stoked to hear you're thinking about starting a Patreon campaign for Wiktionary!
I think this is a great idea and totally something that would work well on Patreon. To make it easier on you, I might even recommend doing a monthly campaign (as opposed to "per creation,") so that you're not having to keep a constant count of the pages cleaned up. I think that a lot of people would love to support such a great cause and it would be really cool to share fun new Wiktionary entries with your supporters on your Patreon page.
Happy to answer any other questions that come up as you familiarize yourself more with Patreon, so feel free to shoot me a note as they come up!
All the best,
Ellie

--Daniel Carrero (talk) 09:30, 22 October 2015 (UTC)[reply]

Wikipedia says in Patreon: "In October 2015, the site was the target of a massive hacking attack with almost 15 gigabytes' worth of password data, donation records, and source code taken and published. The breach exposed more than 2.3M unique e-mail addresses and millions of private messages." Is there a safer method? --Panda10 (talk) 13:06, 21 October 2015 (UTC)[reply]

I don't know. Should I use another platform listed at Category:Crowdfunding platforms? I believe Kickstarter would work only for huge, expensive projects that haven't started yet, while Patreon should be usable for small tips according to measurable progress (or per month if the worker chooses that option), as I proposed above.

Here are the links to the two websites that Wikipedia uses as sources for that information: [5] and [6].

This [7] is the official statement from Patreon, which I also found quoted in a number of other sites. It says: "There was unauthorized access to registered names, email addresses, posts, and some shipping addresses. Additionally, some billing addresses that were added prior to 2014 were also accessed. We do not store full credit card numbers on our servers and no credit card numbers were compromised. Although accessed, all passwords, social security numbers and tax form information remain safely encrypted with a 2048-bit RSA key. No specific action is required of our users, but as a precaution I recommend that all users update their passwords on Patreon." --Daniel Carrero (talk) 15:36, 21 October 2015 (UTC)[reply]

Also: "The unauthorized access was confirmed to have taken place on September 28th via a debug version of our website that was visible to the public. Once we identified this, we shut down the server and moved all of our non-production servers behind our firewall." --Daniel Carrero (talk) 15:41, 21 October 2015 (UTC)[reply]

I think this is a fine idea, and I might just contribute for term cleanup. —Μετάknowledge^{discuss/deeds} 16:22, 21 October 2015 (UTC)[reply]
Thank you. I've finished my Patreon page:

https://www.patreon.com/danielcarrero

What do other people think? Is that good or should I change something?

I am already accepting contributions, if you please! :) Current project: $1 every 100 entries cleaned up from Category:term cleanup. --Daniel Carrero (talk) 17:13, 21 October 2015 (UTC)[reply]
I see someone already pledged $1 for this work, thanks! :) Anyone else? I'm going to start soon. --Daniel Carrero (talk) 18:22, 21 October 2015 (UTC)[reply]

Just a note: I've found that a number of entries are in Category:term cleanup because people omit the 1st parameter to generate unlinked terms and then don't bother adding lang=. I posted about this at Wiktionary:Grease pit/2015/October#Template:term - omitting the 1st parameter and the lang= parameter too?. --Daniel Carrero (talk) 19:34, 21 October 2015 (UTC)[reply]

I've finished 200 entries. Please see User:Daniel Carrero/term cleanup. --Daniel Carrero (talk) 03:09, 22 October 2015 (UTC)[reply]

Plain links to non-English entries

Another big problem is plain wiki links to non-English entries, e.g. [[non-english lemma]] (or is this included in your proposal?), but I think this type of cleanup could be semi-automated. Jberkel (talk) 12:56, 3 November 2015 (UTC)[reply]

@Jberkel: That sounds like a good idea! It's not included in my current work (User:Daniel Carrero/term cleanup), but it's something that I could do as a separate project if people want, after I finish the current one.

Just to be clear: surely we want every plain link to be converted to either {{m}} (mentions of words), {{l}} (in synonyms lists, etc.) or other templates, right, no matter whether the link is to an English or non-English section? I'd guess that most of the 2.1 million "gloss definition" entries (according to WT:STATS) have plain links in one way or another.

If one or more people are willing to pay for that as a separate cleanup project later, I don't have any problem with editing as many entries as possible for that purpose. But bear in mind that it would basically mean revising all the existing entries (a process that I would try to speed up using CSS to spot plain links quickly or something), so of all the possible options for a future cleanup project, this might be one of the longest ones. --Daniel Carrero (talk) 13:31, 3 November 2015 (UTC)[reply]

WingerBot has gone through all Russian entries and wrapped plain links within {{l}} under all sections except parts of speech and etymology.--DixtosaBOT (talk) 13:46, 3 November 2015 (UTC)[reply]

A possible plan could be like that:

Let bots automatically wrap all links in all entries where it can be done faithfully (synonyms, coordinate terms, derived terms, etc.). Presumably, bots would be unable to fix etymology, POS sections, usage notes and other sections.
Create some sort of dump listing all the pages that have plain links that bots are unable to fix, so that they could be done manually if people want.

--Daniel Carrero (talk) 13:52, 3 November 2015 (UTC)[reply]

Yes, I think every plain link should be converted to a template with an explicit link target, if we want the Wiktionary data to be useful and unambiguous. Whenever I edit entries I always try to get rid of any [[links]] I come across. Another problem I discovered is relative links ([[#English|foo]] if already on page foo). These are not much of a usability problem right now, but they become important if we look at Wiktionary outside of a website / wiki context.

If it worked well for Russian is there a reason not to use the same bot for other languages? A combination of bot + manual work sounds like a good plan of attack. Jberkel (talk) 16:06, 3 November 2015 (UTC)[reply]

I tend to do the opposite (convert to simple [[links]]) so we may be working against each other here. Equinox ◑ 16:44, 4 November 2015 (UTC)[reply]

Why would you do that? Please stop. —CodeCa t 16:58, 4 November 2015 (UTC)[reply]

Because it makes it much harder to read and edit the source code. Equinox ◑ 15:43, 5 November 2015 (UTC)[reply]

I oppose templated links in definitions. --Wiki Tiki 89 18:57, 4 November 2015 (UTC)[reply]

I oppose templated links in translations from Chinese to English. It's causing problems with {{zh-forms}}, which displays definitions of individual parts of Chinese (compound) words. --Anatoli T. ^{(обсудить}/^вклад) 23:54, 31 December 2015 (UTC)[reply]

It's good to know that we can still tell people who use screen readers to fuck off. DTLHS (talk) 19:14, 4 November 2015 (UTC)[reply]

I'm sure the random English word in the middle of an English sentence would be very confusing to a screen reader if it is not marked as English. --Wiki Tiki 89 21:14, 4 November 2015 (UTC)[reply]

Would that mean English links can remain in the normal wikilink style ([[links]]) while all non-English links should be templated? Maybe we should create a poll for these questions? --Daniel Carrero (talk) 21:27, 4 November 2015 (UTC)[reply]

In definitions, etymology sections, usage notes, and various other places, we have running English text. If we want to link a word that we happen to use in running English text, then I think plain links are the best choice in order for the wikitext to remain easy to read. But if we were to talk about a word or present an example of text, then we should use a template even if it is in English. The former situation only occurs in English (since this is the English Wiktionary), but the latter situation can occur with any language; therefore, non-English links should always be templatized, since they are always either mentions or examples. --Wiki Tiki 89 23:00, 4 November 2015 (UTC)[reply]

I don't agree – why should English get treated differently (and only sometimes)? As I've already said, it leaves room for ambiguity, especially when there are non-English entries using the same headword. Having two different ways of linking (plain/template) is also confusing editors, which means we have a lot of entries which link to non-English words using plain links (which is definitely the bigger problem and should get fixed first).

However, by using templates for all links, regardless of language, we minimize the possibility of mistakes and add valuable semantic information. It can be very useful for tools (like the screen readers mentioned) to know a) that you're linking to a headword (and not a user page, etc.) and b) what language the word you're linking to is in. If this information is not given, these tools need to "guess" it (maybe based on the context or some implied knowledge) and apply arbitrary defaults, which could be wrong; these assumptions could become invalid at any time. Sure, it could be an English word, but it could very well be a wrongly tagged Spanish word. If the link has already been marked as English (or any other language), no extra guesswork is required. And as a nice side effect, all links will be generated by the same code/template, which means formatting is automatically consistent and can be tweaked "after the fact". A plain link is just a 'dumb' plain link; nothing can be changed about it.

About readability: no doubt, wikitext is a mess and will never win a prize for readability, but I think we should worry more about user readability, not editor readability. The advantages outweigh the few extra characters to type.

Jberkel (talk) 04:32, 5 November 2015 (UTC)[reply]

It's not English that should be treated differently, it's running text that should be treated differently, and all of our running text happens to be English because this is the English Wiktionary. --Wiki Tiki 89 15:42, 5 November 2015 (UTC)[reply]

What exactly is the advantage of using templated links for people using screen readers? I've never used a screen reader; would it change the voice/accent according to each language when encountering sentences like "The word bread in Japanese is パン, from Portuguese pão."?

Apart from that, some advantages I know of using templated links are: proper formatting/scripting, both the standard formatting that we all see (as in MediaWiki:Common.css) and the user-side formatting (as in Special:MyPage/common.css).

Also, when you use plain links without language ([[example]]), it doesn't point to the correct section, plus the "orange links" gadget can't work for that reason. I remember sometimes seeing horrible plain links with languages back in the day ([[example#English|example]]), but I guess nobody does that anymore, it would be too much work to type. --Daniel Carrero (talk) 19:36, 4 November 2015 (UTC)[reply]

Most Anagrams sections, which are bot-generated, still use the [[example#English|example]] format. I always convert those to {{l|en|example}} when I see them, but normal links like [[example]] I leave alone for English words as I don't see what's wrong with them. (I do change them for other languages, though.) —Aɴɢʀ (talk) 19:57, 4 November 2015 (UTC)[reply]

Apart from Anagrams sections, I was thinking of older examples like this: here's a 2010 version of the entry pizza with some links in the format of # [[#English|pizza]]. --Daniel Carrero (talk) 05:20, 5 November 2015 (UTC)[reply]

What I did with WingerBot for Russian is:

Only convert links in Russian sections.
Only convert links in lines beginning with *.
Only convert links containing no Latin text (I should probably fix this further to check specifically for Cyrillic text or non-alphabetic text).
I skip Etymology and Pronunciation sections. In Usage Notes sections, the links become {{m|...}}, otherwise {{l|...}}. This is a sort of compromise; probably I should either skip Usage Notes or go ahead and process Etymology and Pronunciation.
I skip links nested inside of templates and tables.
When converting links, if the link is of the form [[A|B]], then it becomes {{l|ru|B}} if A is the non-accented equivalent of B, otherwise it becomes {{l|ru|A|B}}. Links of the form [[A#Russian|B]] have the #Russian part removed in the process.
Currently the set of pages processed is those in the categories Category:Russian lemmas and Category:Russian non-lemma forms.

This could probably be automated for other languages. It's not clear to me that we need to convert plain English-language links, but definitely it should be done for non-English links esp. those in foreign scripts. Benwing2 (talk) 00:33, 5 November 2015 (UTC)[reply]

I believe a bot would definitely work for lists of links in synonyms/related terms, etc!

But I'd like to talk about: "probably I should either skip Usage Notes or go ahead and process Etymology and Pronunciation"

Correct me if I'm wrong, but I believe the bot would be unable to accurately edit all the links in etymology sections. What about sentences like this?

[[A]] is the [[gerund]] of [[B]]

If A and B are terms in, say, Old Spanish, then the final result would ideally be one of the 2 options below, with the correct language codes; we'd use Template:l for simple links and Template:m for mentions:

{{m|osp|A}} is the [[gerund]] of {{m|osp|B}} (plain link English word)
{{m|osp|A}} is the {{l|en|gerund}} of {{m|osp|B}} (templated English word) -- personally, I prefer this one!

If a bot can or could do the whole work, then it would defeat the point of me manually converting all the uses of {{term|ABC}} into {{m|xx|ABC}}, to be honest! :-) At Wiktionary:Beer parlour/2013/April#Template term and lang parameter the possibility of adding language codes to {{term}} through a bot run has been discussed. According to that discussion, User:CodeCat already did part of the job, which could be done reliably by bot: she used the bot to replace {{etyl|xx|yy}} {{term|word}} with {{etyl|xx|yy}} {{term|word|lang=xx}}.

Another idea:

After User:Daniel Carrero/term cleanup finishes, I can start editing Category:Entries with non-standard headers as a separate paid project -- i.e., manually converting "Initialism", "Abbreviation" sections into "Noun", "Proper noun", etc. in all entries. Would people want that?

I'd just ask to keep the rate I suggested initially of $1 every 100 entries. 8,457 entries = US$84.57. I don't mind if it's, say, 1 person paying the whole amount or 2 people paying $42.28 each. This helps me pay my bills. Thank you. :) --Daniel Carrero (talk) 06:16, 5 November 2015 (UTC)[reply]
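The earlier bot run mentioned above (turning {{etyl|xx|yy}} {{term|word}} into {{etyl|xx|yy}} {{term|word|lang=xx}}) is reliable because the language code can be copied from the adjacent {{etyl}}. A hypothetical Python sketch of that substitution, not the real bot's code, assuming {{etyl}} has exactly two positional parameters and the {{term}} call contains no nested templates:

```python
import re

# {{etyl|SOURCE|TARGET}} followed (optionally after whitespace) by {{term|...}}.
ETYL_TERM_RE = re.compile(
    r"(\{\{etyl\|([^|{}]+)\|[^{}]*\}\}\s*)\{\{term\|([^{}]*)\}\}"
)


def add_term_lang(text):
    def repl(m):
        prefix, source_code, term_args = m.group(1), m.group(2), m.group(3)
        if "lang=" in term_args:
            return m.group(0)  # already has a language code
        return "%s{{term|%s|lang=%s}}" % (prefix, term_args, source_code)

    return ETYL_TERM_RE.sub(repl, text)
```

A {{term}} that is not directly preceded by {{etyl}} is left untouched, which mirrors why the remaining cases need manual work.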

Well, it only works for Russian because Russian and English have different character sets. It's much harder for languages like Old Spanish, as you point out; we'd have to skip Etymology and Usage notes and Pronunciation, no way around it. For such languages I might also want to restrict further the links that get templated to be only those in a line beginning with * that look like they're part of a list (using appropriate regexes and such to determine this). Benwing2 (talk) 09:50, 5 November 2015 (UTC)[reply]

@Benwing2: I understand. Do you think that what you have done for Russian can be done for many languages with different character sets? Maybe Hebrew, Arabic, Greek, Chinese, Gothic, Armenian, Korean, Georgian, etc.? That would sound like a good plan, if that's possible. --Daniel Carrero (talk) 02:04, 6 November 2015 (UTC)[reply]

@Daniel Carrero: I can run it on other languages. The main thing I'd need is a regexp that specifies characters within the given character sets. It would look something like %AW-XY-Z where %A gets all non-letter characters and W, X, Y and Z represent the endpoints of the Unicode ranges that contain the appropriate character sets for each language. If you could help construct these ranges it would make it a lot easier to run the bot. You might be able to just snarf the character set ranges from Module:scripts/data, with a bit of checking to make sure they're reasonable. Benwing2 (talk) 00:05, 7 November 2015 (UTC)[reply]
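A pattern that accepts letters from one script plus any non-letter can be approximated in Python's `re`, which has no equivalent of Lua's `%A`. A minimal sketch, assuming that `\W`, `\d` and `_` together stand in for "non-letters" (Python treats letters and digits as word characters, while punctuation, spaces and combining accents are not), and using the Armenian range U+0531 to U+058F as an example:

```python
import re


def script_pattern(ranges):
    # Letters in `ranges` are accepted explicitly; \W, \d and _ cover the
    # non-letters (punctuation, spaces, digits, combining accents).
    # Letters from any other script are rejected.
    return re.compile("[" + ranges + r"\W\d_]+")


def in_script(term, pattern):
    # True only if every character of `term` is allowed by `pattern`.
    return pattern.fullmatch(term) is not None


ARMENIAN = script_pattern("\u0531-\u058F")
```

Terms that mix in Latin letters fail the check, so they would be skipped (or reported as warnings) rather than templatized with the wrong language code.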

@Benwing2: Why do you need to get all non-letter characters separately? If you find a foreign equivalent for a hyphen, a comma or an apostrophe or something else, does it change how the bot must work?

I'm going to try getting codepoints for the first few scripts now. Is this reasonable? I added the "X script languages" categories because I figured they would help identify which languages using each script your bot could edit.

Armenian:

Category:Armenian script languages
Ա-֏
from U+0531 to U+058F -- Appendix:Unicode/Armenian

Coptic:

Category:Coptic script languages
Ϣ-ϯ
from U+03E2 to U+03EF -- Appendix:Unicode/Greek and Coptic

Cyrillic:

Category:Cyrillic script languages
Ѐ-ӿ
from U+0400 to U+04FF -- Appendix:Unicode/Cyrillic

Greek:

Category:Greek script languages
Ͱ-ϡϰ-Ͽ
from U+0370 to U+03E1; from U+03F0 to U+03FF -- Appendix:Unicode/Greek and Coptic

--Daniel Carrero (talk) 02:38, 9 November 2015 (UTC)[reply]
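Converting a range written with literal characters into U+ notation can be checked mechanically. A small hypothetical Python helper, assuming each range is written as exactly three characters (start, hyphen, end) with nothing between consecutive ranges:

```python
def describe_ranges(ranges):
    # "Ͱ-ϡϰ-Ͽ" -> "from U+0370 to U+03E1; from U+03F0 to U+03FF"
    if len(ranges) % 3 != 0:
        raise ValueError("expected start-end triples")
    parts = []
    for i in range(0, len(ranges), 3):
        start, dash, end = ranges[i], ranges[i + 1], ranges[i + 2]
        if dash != "-" or ord(start) > ord(end):
            raise ValueError("bad range %r" % ranges[i:i + 3])
        parts.append("from U+%04X to U+%04X" % (ord(start), ord(end)))
    return "; ".join(parts)
```

Because it works on single code points, it also handles ranges that start with a combining character.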

Thanks! I use a regexp that gets the correct script and also includes non-letter characters so it will still catch terms that have accents, macrons, hyphens, etc., but excludes letter characters from other scripts. I'll also have it print out warnings if it finds terms that it excluded but which have non-Latin characters in them, to make sure it's not excluding too much. I'm going to start on Armenian, we'll see how it works. Benwing2 (talk) 03:17, 9 November 2015 (UTC)[reply]

I have already done that for Georgian long time ago. --DixtosaBOT (talk) 06:13, 9 November 2015 (UTC)[reply]

@Daniel Carrero I ran this for Armenian, Greek and Ancient Greek. It required both the list of characters in each script and the entry-conversion regexps in Module:languages/data2 and such. I also had it look for transliteration in parentheses after the link and try to eliminate it (or incorporate into the link, if the language isn't a translit-overriding language). To determine whether something in parens is a transliteration, it transliterates the link and then computes the edit distance (Levenshtein distance) between the auto-generated translit and the explicit translit, and if it's small enough (depending on the length of the words in question), it's accepted, although there are additional checks. When those various checks fail, there's a warning issued. If you want to do a good deed, check the warnings listed in User:Benwing2/fix-links-grc-warnings and fix up the entries needing fixing. There are about 200 of them, and many of them can be ignored. You especially want to check the warnings that mention "Levenshtein distance ... not treating X as transliteration of Y" or "Upper/lower mismatch between explicit X and auto Y", where X is what's found in parens and Y is the automatic transliteration of the link (the Levenshtein distance warning is slightly misworded). For example, the warning "WARNING: Levenshtein distance 15 too big for length 6, not treating Arktos, “Ursa Major” as transliteration of Árktos" means that it found something like [[Ἄρκτος]] (Arktos, “Ursa Major”) and determined that the stuff in parens couldn't be a translit of the link; in this case, the translit should be removed and the gloss incorporated into the link. There are also warnings of the sort "Link contains non-Latin characters not in proper charset", which are links in various non-Greek charsets that could be converted to templated links in the proper language, although a few appear to be Greek script and must contain some non-Greek character in them, which could be fixed. 
Benwing2 (talk) 10:18, 11 November 2015 (UTC)[reply]
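The transliteration check described above compares an explicit transliteration with the automatic one by edit distance. A minimal Python sketch of Levenshtein distance plus an acceptance test; the threshold (about one edit per three characters, at least one) is a hypothetical choice, since the bot's actual length-dependent threshold and its additional checks aren't given here:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def looks_like_translit(explicit, auto):
    # Hypothetical threshold: allow about one edit per three characters.
    limit = max(1, len(auto) // 3)
    return levenshtein(explicit, auto) <= limit
```

This reproduces the distance in the quoted warning: "Arktos, “Ursa Major”" versus the automatic "Árktos" is 15 edits (one substitution plus 14 insertions), far above the limit for a six-character word, while plain "Arktos" is a single edit away and is accepted.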

BTW for Ancient Greek it was a bit tricky, or at least I had to use modern Greek (code el) in the Descendants section of Ancient Greek entries, since they share the same charset. Benwing2 (talk) 10:46, 11 November 2015 (UTC)[reply]

I've seen a few of the recent contributions of User:WingerBot, they look good!

OK, you've been using information from Module:languages/data2, but I assume you still need the codepoints I'm looking for you, right?

I'll look at User:Benwing2/fix-links-grc-warnings with more attention later. For the moment, I'll leave a few more codepoints here for you. Some of the starting or ending codepoints are combining forms, is that a problem? I can get codepoints ignoring combining forms if you want.

Gujarati:

Category:Gujarati script languages
ઁૹ
from U+0A81 to U+0AF9 -- Appendix:Unicode/Gujarati

Gurmukhi:

Category:Gurmukhi script languages
ਁੵ
from U+0A01 to U+0A75 -- Appendix:Unicode/Gurmukhi

Oriya:

Category:Odia script languages
ଁୱ
from U+0B01 to U+0B71 -- Appendix:Unicode/Oriya

Tamil:

Category:Tamil script languages
ஂ௺
from U+0B82 to U+0BFA -- Appendix:Unicode/Tamil

Telugu:

Category:Telugu script languages
ఀ౿
from U+0C00 to U+0C7F -- Appendix:Unicode/Telugu

--Daniel Carrero (talk) 10:55, 11 November 2015 (UTC)[reply]

Thanks. I did Tamil, Telugu, Oriya, Punjabi, Gujarati, and I'm currently doing Hindi and Hebrew. The combining codepoints are OK for Python but cause problems in vim syntax highlighting, so I rewrote them using \u escapes. The language-specific info comes from a combination of Module:scripts/data, Module:languages/data2 (or data3/...), and Module:links (override_translit). The actual language-specific info looks like this:

...

def ar_remove_accents(text):
    text = re.sub(u"\u0671", u"\u0627", text)
    text = re.sub(u"[\u064B-\u0652\u0670\u0640]", "", text)
    return text

# Each element is full language name, function to remove accents to normalize
# an entry, character set range(s), and whether to ignore translit (info
# from [[Module:links]], or "notranslit" if the language doesn't do
# auto-translit)
languages = {
    'ru':["Russian", ru.remove_accents, u"Ѐ-џҊ-ԧꚀ-ꚗ", False],
    'hy':["Armenian", hy_remove_accents, u"Ա-֏ﬓ-ﬗ", True],
    'el':["Greek", lambda x:x, u"Ͱ-Ͽ", True],
    'grc':["Ancient Greek", grc_remove_accents, u"ἀ-῾Ͱ-Ͽ", True],
    'hi':["Hindi", lambda x:x, u"\u0900-\u097F\uA8E0-\uA8FD", False],
    'ta':["Tamil", lambda x:x, u"\u0B82-\u0BFA", True],
    'te':["Telugu", lambda x:x, u"\u0C00-\u0C7F", True],
    'gu':["Gujarati", lambda x:x, u"\u0A81-\u0AF9", "notranslit"],
    'or':["Oriya", lambda x:x, u"\u0B01-\u0B77", "notranslit"],
    'pa':["Punjabi", lambda x:x, u"\u0A01-\u0A75", "notranslit"],
    'he':["Hebrew", he_remove_accents, u"\u0590-\u05FF\uFB1D-\uFB4F", "notranslit"],
    'ar':["Arabic", ar_remove_accents, u"؀-ۿݐ-ݿࢠ-ࣿﭐ-﷽ﹰ-ﻼ", False],
}

It doesn't take too much work to find this info. However, it takes more effort to go through the warnings, so if you have time, that's definitely something that would help. Here are warnings for modern Greek (98 of them) and for five Indian languages, not including Hindi (20 of them in total):

Again, not all of these warnings need to be fixed but they could use a once-over. Benwing2 (talk) 04:00, 12 November 2015 (UTC)[reply]

Here are the Hindi warnings (20 of them):

User:Benwing2/fix-links-hi-warnings

Benwing2 (talk) 04:07, 12 November 2015 (UTC)[reply]

Here are the Hebrew warnings (49 of them):

User:Benwing2/fix-links-he-warnings

Benwing2 (talk) 04:19, 12 November 2015 (UTC)[reply]

@Benwing2 Since you sent that message, I've been offline for a few days and also busy with other things, but I promise I'll look into these lists.

Also, I've inserted a subsection of the conversation since the subject changed kind of abruptly with the idea of converting plain links into templated links, which I consider unrelated to the discussions above. --Daniel Carrero (talk) 23:12, 1 December 2015 (UTC)[reply]

@Daniel Carrero Cool, thanks. Benwing2 (talk) 00:19, 2 December 2015 (UTC)[reply]

@Benwing2

Done all the 5 pages:

But I don't speak Hebrew, so I couldn't check any items like "doesn't match accented" and "putative explicit translit" in the he page. --Daniel Carrero (talk) 10:22, 31 December 2015 (UTC)[reply]

I went over it a few weeks ago. Enosh (talk) 12:35, 31 December 2015 (UTC)[reply]

@Daniel Carrero Awesome, thanks! Benwing2 (talk) 17:18, 31 December 2015 (UTC)[reply]

Code for Westrobothnian

Discussion moved from Wiktionary talk:Beer parlour#Code for Westrobothnian.

Apparently Westrobothnian is still considered a dialect of Swedish here, even though it's linguistically impossible. As I understand, the reason is that there is no language code for Westrobothnian here yet. On Swedish Wiktionary, we use gmq-bot. Languages like Jamtish and Scanian obviously also need their respective language codes, but at this time I'm only asking for Westrobothnian in order to stop people from adding it under Swedish (I kind of hurfes when I think of people doing this; it's like watching someone paint a norseman with a horned helmet, or hearing someone argue that we only use 10% of our brain). — Knyȝt (talk) 13:01, 20 October 2015 (UTC)[reply]

I think the word you're looking for is shudder.—msh210℠ (talk) 17:13, 22 October 2015 (UTC)[reply]

I have no objection to a new code for Westrobothnian, I only object to the removal of Westrobothnian forms from ==Swedish== entries as long as there's no place to move them to. We may as well discuss the other "Swedish dialects" that could be considered separate languages. In addition to Westrobothnian, they seem to be:

Dalecarlian (including Elfdalian), which had the ISO-639 code dlc until 2009 or so
Jamtlandic, which had the ISO-639 code jmk until 2009 or so
Scanian, which had the ISO-639 code scy until 2009 or so
Gutnish, which (like Westrobothnian) has never had an ISO-639 code.

The ISO-639 codes were apparently removed at the request of the government of Sweden, which didn't like the idea of their not being dialects of Swedish; the decision was thus more political than linguistic. At the moment, Scanian seems to be the only one we accommodate at all: Category:Regional Swedish includes subcategories only for Finland Swedish, Scanian Swedish, and Swedish Swedish. —Aɴɢʀ (talk) 14:10, 20 October 2015 (UTC)[reply]

I think we should use the old codes. What the government of Sweden thinks isn't relevant to linguistic interests. —CodeCa t 14:27, 20 October 2015 (UTC)[reply]

Of the five under discussion, only three have old codes. We can use those, but we should probably prefix them with gmq-. The other two will need codes of their own. Westrobothnian may as well use gmq-bot, as it does at sv-wikt. If Gutnish doesn't already have a code at sv-wikt, gmq-gut will work. —Aɴɢʀ (talk) 14:50, 20 October 2015 (UTC)[reply]

Well, what do you know, we already have dlc and gmq-gut, so it's just a matter of the other three. —Aɴɢʀ (talk) 18:11, 20 October 2015 (UTC)[reply]

Would an administrator please implement the codes gmq-jmk Jamtish, gmq-scy Scanian and gmq-bot Westrobothnian in Module:languages/datax? Thank you in advance --87.63.114.210 15:44, 4 November 2015 (UTC)[reply]

I oppose these codes, they should be jmk and scy, since those are the ISO codes for these languages. —CodeCa t 16:03, 4 November 2015 (UTC)[reply]

That is fine; there is just a need for some codes so as to avoid inserting simple links [[term]] in the Descendants sections of the Proto-Germanic entries. --87.63.114.210 16:08, 4 November 2015 (UTC)[reply]

I oppose those codes; those aren't the ISO codes for these languages, those were the codes. They are no longer valid to use for new works. gmq-jmk, gmq-scy and gmq-bot are more consistent with the ISO standards.--Prosfilaes (talk) 23:29, 10 November 2015 (UTC)[reply]

We use sh, don't we? —CodeCa t 00:13, 11 November 2015 (UTC)[reply]

And we should probably use hbs instead. I'm more interested in us doing the right thing going forward than in fighting that battle, though. "scy" and friends are not found on the official list of ISO 639-3 codes.--Prosfilaes (talk) 01:28, 11 November 2015 (UTC)[reply]

Unified multilingual Wiktionary

Wiktionary:Project – Unified Wiktionary outreach and Wiktionary:Project – OmegaWiki are no longer active. OmegaWiki makes a multilingual dictionary to describe all words of all languages with definitions in all languages; however, without further progress at m:OmegaWiki, I have thought of a possibly simpler way to make a unified dictionary right here, possibly moving to wiktionary.org in the future:

Automatically translating many headings from Wiktionary:Entry_layout#Additional headings, like English, pronunciation, nouns, etc., into other languages per users' preferences, as on Wikimedia Commons, would be valuable to encourage merging smaller Wiktionaries in other languages into this, the largest site.
Repeating material like pronunciation, synonyms, antonyms, derived terms, related terms, etc., is the drawback of keeping so many Wiktionaries in many languages. A unified Wiktionary would eliminate the duplicated material.
Smaller Wiktionaries lack a sizable community to justify bureaucrats and possibly administrators. Merging them here would make administering all content much more efficient.
Considering Wiktionary:Entry_layout#Variations for languages other than English, we should translate entries into English and then into third languages only when Wiktionaries in those other languages want to merge here. For example, Chinese entries here should not yet be translated into third languages until the Chinese Wiktionary is going to merge here, which is what I dream of because it has too few active users and too many poor-quality entries.
As OmegaWiki already has definitions of words in many languages, a unified Wiktionary would also give etymology, usage notes, references, etc. in many languages.

I propose inviting other Wiktionaries to an optional merger once we are ready here, based on my proposals above. Any useful comments are welcome.--Jusjih (talk) 00:29, 22 October 2015 (UTC)[reply]

Data unification can be very beneficial, but we must be careful when deciding what can and what can’t be merged. Definition- or translation-based mergers (the sort that the Wikidata folk wants to force upon us) are an outright horrible idea because they ignore the existence of anisomorphism, and that language is not mathematics.

I think the least controversial content mergers would be pronunciations and inflection tables. — Ungoliant ^(falai) 01:06, 22 October 2015 (UTC)[reply]

Wikimedia Incubator has Translingual Wiktionary as a stub project with a few pages and unclear format/purpose, so if there's any effort directed to making a multilingual Wiktionary, I suggest using that. I added some Portuguese verbs there myself back in 2011.

Ungoliant is right about the problems he mentioned, though. --Daniel Carrero (talk) 09:50, 22 October 2015 (UTC)[reply]

A very obvious problem is the decision of what to treat as a language. Other Wiktionaries may not accommodate reconstructed languages for example. Lithuanian Wiktionarians may object to having Proto-Balto-Slavic. And I doubt the Croatian Wiktionary will want to merge "their" language into Serbo-Croatian for our sake. Wiktionaries may also differ in the classification of languages. Even pronunciation details may differ; consider how divided our own Wiktionary is on /a/ vs /æ/ for British English. —CodeCa t 14:40, 22 October 2015 (UTC)[reply]

Thanks so much for all of your valuable comments. Wikimedia Commons already uses {{int:summary}}, {{int:support}}, {{int:oppose}}, etc. to automatically translate common phrases to many languages, and this is needed to go multilingual here. We should try a pilot program to phase in auto-translation as on Commons. For example, ks:ठूल is really Kashmiri-English, so once we auto-translate important layouts, we may be able to bring the contents of many smaller Wiktionaries here. As many minor languages lack their own Wiktionaries and some have been closed out, their users may want to come here to translate their words, compounds, and phrases to English.--Jusjih (talk) 16:54, 22 October 2015 (UTC)[reply]

Anyway, it's impossible: in a common project, common decisions must be taken, and there must be a common language for discussions. Discussions are in a different language for each wiktionary, and this principle cannot be changed: I'm not sure that you would be willing to close en.wikt in order to merge data with fr.wikt, and to change your discussion language to French. Wiktionaries are a success, Omegawiki is a failure (7 contributions a day: this is the average for the last 7 days), and there are good reasons for this situation. Trying to adopt Omegawiki principles would not be beneficial at all; it would kill projects adopting them. I am convinced that the best way to share data is through bots (bots importing data when possible, bots checking data consistency when possible, bots providing lists of words or lists of translations, etc.) Lmaltier (talk) 20:49, 22 October 2015 (UTC)

Of course, anybody can contribute to any wiktionary, not only in one's native language. This is already the case. And it would be beneficial to provide translation tables in all entries (except inflected forms), not only in English word entries. But this does not change the principle: one project for each discussion language. Nobody would propose a single unified Wikipedia. It's the same for wiktionaries. Lmaltier (talk) 21:00, 22 October 2015 (UTC)[reply]

I do not mean closing out English Wiktionary to go multilingual. I just suggest trying limited auto-translation here, as on Wikimedia Commons, to test merging minor languages in, while keeping major languages separate, as many Wiktionaries in minor languages have been closed out. If no consensus to go very multilingual, we need more global bots to coordinate Wiktionaries in different languages, maybe in connection with Wikidata.--Jusjih (talk) 00:59, 23 October 2015 (UTC)[reply]

Well, your title above is Unified multilingual Wiktionary... If the main idea is automatic translation of Wiktionary pages, tools already exist for major languages. For other languages, anyway, human translation of definitions, etc. would be needed, and there is no reason why there would be more contributors on a site in a foreign language than on a site in one's own language... Lmaltier (talk)

As files from Wikimedia Commons may be transcluded on other wikis with local descriptions possible, I am thinking of keeping language subdomains for language-dependent things, like categories, so even should the unified multilingual Wiktionary be approved through Meta request, only main articles will likely be imported there for internationalization of entry layouts, etc. for future transclusion on language subdomains. I will open further discussion on Meta later. Thanks.--Jusjih (talk) 00:44, 24 October 2015 (UTC)[reply]
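On Commons, {{int:summary}}, {{int:support}} and similar calls display a MediaWiki interface message in the reader's interface language, which is the mechanism proposed above for a pilot here. A toy Python model of that idea, a per-language message table with a fallback language, is sketched below. The table contents and the function name interface_message are hypothetical; this is not MediaWiki's implementation, and the strings are illustrative placeholders rather than MediaWiki's actual messages.

```python
# Hypothetical message table: keys mimic message names such as "summary";
# the translations are illustrative placeholders only.
MESSAGES = {
    "summary": {"en": "Summary", "fr": "Résumé", "de": "Zusammenfassung"},
    "support": {"en": "Support", "fr": "Pour", "de": "Pro"},
}


def interface_message(key: str, user_lang: str, fallback: str = "en") -> str:
    """Return the message in the reader's language, else in the fallback language."""
    translations = MESSAGES[key]
    return translations.get(user_lang, translations[fallback])


# A French reader sees the French string; a language with no entry in the
# table (here Kashmiri, "ks") falls back to English.
print(interface_message("summary", "fr"))
print(interface_message("support", "ks"))
```

The point of the design is that page text only names the message key, so adding a language means adding translations to the table rather than editing every page.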

Verbs that introduce a subordinate clause

Verb senses like “I think [that] she is here”, “I saw everyone go away”. Is there a grammatical term for them? — Ungoliant ^(falai) 19:09, 22 October 2015 (UTC)[reply]

Your two examples are completely different. "Think" is simply a transitive verb and the subordinate clause functions as a noun phrase: What do you think? I think thoughts. What are your thoughts? My thoughts are that she is here. "Saw" is a sensory verb, and it seems that sensory verbs have a special case where their direct objects can be modified by a bare infinitive in addition to a participle: I saw everyone. Everyone was going away. I saw everyone going away. I saw everyone go away. ~~Everyone was go away~~. It does seem odd. I would like to know the reason behind this. --Wiki Tiki 89 19:42, 22 October 2015 (UTC)[reply]

I'm unaware of a special term for verbs that can take clauses as their complement, but the difference between your two examples is the kind of clause they take. "Think" is followed by a noun clause, "(that) she is here", which (when that is removed) is by itself a complete sentence. "Saw", however, is followed by a small clause, "everyone go away", which is not a complete sentence and cannot be introduced by that. (You can tell it's not a complete sentence because everyone takes a singular verb, "everyone goes away", but in this case go is in the bare stem form.) When think means "hold an opinion" rather than "believe something to be the case", it can take either a noun clause or a small clause: I think that she is pretty or I think her pretty. —Aɴɢʀ (talk) 20:21, 22 October 2015 (UTC)[reply]

Thank you both. I guess I’ll continue to label them transitive. — Ungoliant ^(falai) 20:29, 22 October 2015 (UTC)[reply]

There is a category of verbs called "reporting verbs": Category:English reporting verbs, [8]. - -sche (discuss) 03:32, 23 October 2015 (UTC)[reply]

Yes, though neither "think" nor "see" is a reporting verb. —Aɴɢʀ (talk) 12:20, 23 October 2015 (UTC)[reply]

But it doesn't seem that reporting verbs are all that special grammatically. It seems more of a category that writers use to avoid saying "he said". --Wiki Tiki 89 15:31, 23 October 2015 (UTC)[reply]

"See" and "think" are labelled as reporting verbs by the site I linked to. "Think", at least, seems to be able to report things about as well as "order", which our own category gives as a reporting verb:

John shouted "leave!" — [report:] (I left because) he ordered me to leave.

John told me I should leave. — [report:] (I left because) he thought I should leave.

The category does seem to be rather amorphous, as Wikitiki notes. - -sche (discuss) 20:11, 24 October 2015 (UTC)[reply]

Vote on disallowing extending of votes - 7 days remaining

FYI, you can still vote at Wiktionary:Votes/pl-2015-07/Disallowing extending of votes.

Current results:

Support: 8 - 66%
Oppose: 4 - 33%
Abstain: 2 - N/A

--Dan Polansky (talk) 09:24, 24 October 2015 (UTC)[reply]

Is this vote going to be extended? :) —CodeCa t 13:42, 24 October 2015 (UTC)[reply]

<sarcasm>With 66% support, this vote could use more time to build consensus. Definitely extend.</sarcasm> --Daniel Carrero (talk) 13:46, 24 October 2015 (UTC)[reply]

From the vote: "Duration note: The vote is set for three months, and is not expected to be extended, to prevent discussions about circularity or recursiveness of the vote." --Dan Polansky (talk) 14:21, 24 October 2015 (UTC)[reply]
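The percentages in the current results above are shares of the support and oppose votes only, with abstentions shown as N/A. A quick check of that arithmetic, assuming this is how they were computed: 8 / (8 + 4) ≈ 66.67% and 4 / 12 ≈ 33.33%, so the listed 66% and 33% appear to be truncated rather than rounded. The function name vote_shares is hypothetical.

```python
def vote_shares(support: int, oppose: int) -> tuple[float, float]:
    """Support and oppose as percentages of the non-abstaining votes."""
    decided = support + oppose  # abstentions are excluded from the base
    return 100 * support / decided, 100 * oppose / decided


support_pct, oppose_pct = vote_shares(8, 4)
print(f"{support_pct:.2f}% / {oppose_pct:.2f}%")  # 66.67% / 33.33%
```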

Please see - capital letter discussion

Please give your opinion on this discussion above, in which I proposed moving an appendix (already formatted as an entry) to the main namespace, to make it searchable. Current "results":

2 support (me and Andrew Sheedy)
0 oppose
0 abstain

--Daniel Carrero (talk) 11:00, 24 October 2015 (UTC)[reply]

The principle for which this case would be a precedent is the placement in mainspace of material that can be presented in the form of an entry even though it does not share the common characteristics of dictionary entries, appearing typically in a style manual, usage guide, or grammar.

One conceptual difficulty is that the headword (the page would be titled "Unsupported titles/Capital letter", with the displayed title [capital letter]) does not have the same relationship to the content as a normal headword does to a normal entry.

One suggested advantage is that the page would be included in searches. I can't quite imagine how a normal user, i.e., one not an active participant in discussions such as this, would ever enter terms that would find the article and put it on the first search page, let alone near the top.

I propose the following alternative. As we already have an entry for capital letter (to which I have added a "See also" link to the Appendix), every L2 section of every one of our entries for capital letters should have such a link, preferably to the L2-like section of Appendix:Capital letter. DCDuring TALK 12:11, 24 October 2015 (UTC)[reply]

Just a note: I'm going to paste here my original rationale about moving the appendix to the entry namespace, assuming that the discussion should continue here:

Not only is the whole page formatted like an entry, but if we assume that entries like A, B, C, etc. should have senses like "found in the beginning of proper nouns", "found in the beginning of sentences", "found in the beginning of taxonomic names", etc., then the page Appendix:Capital letter removes the need to create those definitions for every single letter. Think of it as a merger of all the entries for capital letters, which would otherwise repeat the same information. The idea of "capital letter" is something of lexical significance, and completely able to be checked for attestations just like a normal entry. Also IMHO it is more important than the entry ] [.

Re DCDuring: I take your points about the headword and also about the difficulty of this page appearing at the top of search results. I like your idea of linking from every entry of every capital letter to the appendix. --Daniel Carrero (talk) 12:23, 24 October 2015 (UTC)[reply]

To save space, I think we should link the letter entries from the headword line, as opposed to linking them from the see also section. See all the letter sections in the entry B. I edited a few templates to make the headword lines of capital letters link to the appendix, at least when they use separate templates like {{en-letter}} and not {{head|en|letter}} (changing that would require me to edit {{head}}). Feel free to discuss. --Daniel Carrero (talk) 12:37, 24 October 2015 (UTC)[reply]

"Saving space" doesn't convince me, but I think the conclusion is right. IMO the more compelling argument for not relying exclusively on a link at "See also" is that users may not get that far down an entry. Either a link in the headword line or from a prominently placed definition is more likely to get a user's attention, I think. DCDuring TALK 18:17, 22 December 2015 (UTC)[reply]

To clarify: by "saving space" I meant that adding a new link in all "See also" sections to, for example, all the 30 languages at A would make the page longer and thus we'd have to scroll more (not counting TabbedLanguages and section links). Plus, just editing the templates as I did above was a lazy and quick solution, easier than adding the link to the "See also" in all sections of all pages.

I agree with your argument about: "users may not get that far down an entry". --Daniel Carrero (talk) 18:23, 22 December 2015 (UTC)[reply]

Incomplete etymologies

Quite often, there are words where I can easily tell where the word eventually came from, but it's much harder to determine how it made its way into the language. For example, Northern Sami anánas originates from the same source as similar words in most other European languages, but it could have been borrowed through Norwegian, Swedish or Finnish (the three languages that most Northern Sami loanwords come from) and I can't tell which one. návli is from Germanic, but did it come from Norwegian or Swedish, from Old Norse, or straight from Proto-Germanic?

Sometimes, etymologies are doomed to be incomplete because there just isn't enough information. I generally just put in what I can figure out myself. But I think it would be useful if there was a way to tag an entry with an "incomplete etymology" tag of some sort. Currently I've used {{rfe}} for this, but I don't think that's really correct when there is some etymology, just not enough to really explain the origin of the word in the necessary detail. Any thoughts on this? —CodeCa t 19:54, 24 October 2015 (UTC)[reply]

{{etystub}} is supposed to be for exactly that purpose. The case of anánas is different, in my opinion; we may never know the answer, and it would be best simply to list all three likely possibilities and give the ultimate etymon. —Μετάknowledge^{discuss/deeds} 20:12, 24 October 2015 (UTC)[reply]

{{etystub}} is a little bulky for the job. Something like {{rfelite}} would be better, but an expression of incompleteness like ultimately from Proto-Germanic or a more recent North Germanic language and placement in a category such as Category:Incomplete Northern Sami etymologies would seem to do the job. DCDuring TALK 00:37, 25 October 2015 (UTC)[reply]

I rephrased {{etystub}} a bit. —CodeCa t 00:48, 25 October 2015 (UTC)[reply]

You sell yourself short. You've completely changed the visual appearance of the template. DCDuring TALK 00:54, 25 October 2015 (UTC)[reply]

I was expecting a change from a table to bare text or vice versa when I read that, but it's just plain text (which I think is good). I agree with CodeCat's edit summary, the wording is now more in line with what I'd expect from a "stub" template. - -sche (discuss) 05:37, 25 October 2015 (UTC)[reply]

Zipser German

I have words to add from Zipser German, a Central German lect which developed as such in the 1300s in Slovakia (where it is still spoken in Hopgarten as Outzäpsersch i.e. Altzipserisch) before being carried to Franzenthal, Wassert(h)al, elsewhere in northern Romania, and Bukovina, where it was over time increasingly influenced by Upper Austrian. You can see a sample at User:-sche/Zipser. I would like to give it the code gmw-zps and treat all of the dialects under the one code in accordance with the literature on the subject, which speaks of it as one language. - -sche (discuss) 06:03, 25 October 2015 (UTC)[reply]

Done --Lo Ximiendo (talk) 18:46, 25 October 2015 (UTC)[reply]

Restoring WT:ELE

I request that WT:ELE (Wiktionary:Entry layout) is restored to the state of 13 October 2015. The subsequent changes seem rather substantial to me, and require a vote, IMHO. For editing with abandon, there is Wiktionary:Entry layout/Editable. Thank you. --Dan Polansky (talk) 13:59, 25 October 2015 (UTC)[reply]

Which parts of the new version do you disagree with? —CodeCa t 14:01, 25 October 2015 (UTC)[reply]

I edited the WT:EL. See the history and the specific diff from the date that Dan Polansky mentioned.

I tried editing with the current consensus in mind, i.e., I believe the new version best reflects our current practices.

That said, I am aware that the policy box says "It should not be modified without discussion and consensus. Any substantial or contested changes require a VOTE." That was substantial indeed, and now contested by Dan Polansky. --Daniel Carrero (talk) 14:16, 25 October 2015 (UTC)[reply]

The point is that the change is substantial. Per "substantial or contested" in "Any substantial or contested changes require a VOTE" (on the top of the page), it suffices that it is substantial; it does not even need to be contested. I really don't see what there is to discuss, unless I have woken up in some Orwellian world, again. --Dan Polansky (talk) 14:22, 25 October 2015 (UTC)[reply]

That's OK. I am going to restore the 2 policies to the point you requested and create a vote for them. --Daniel Carrero (talk) 14:24, 25 October 2015 (UTC)[reply]

Updates/notes:

Created Wiktionary:Votes/pl-2015-10/CFI and EL revision for the substantial revision discussed here.
Created, too, Wiktionary:Votes/pl-2015-10/Always use templates for headword lines for another policy change discussed here.
Also, Wiktionary:Votes/2015-10/Matched-pair naming format: left, space, right started yesterday and is ready for votes.

I'd like to know if other people disagree with any of the changes/proposals. Thank you. --Daniel Carrero (talk) 15:24, 25 October 2015 (UTC)[reply]

Restoring WT:CFI

I request that WT:CFI is restored to the state from 5 September 2015. For free editing, there is Wiktionary:Criteria for inclusion/Editable. Thank you. --Dan Polansky (talk) 14:04, 25 October 2015 (UTC)[reply]

I restored CFI too per your request and I am going to create a vote for it, too. Though, FWIW, I'd like to say that while the changes to EL were going to be very substantial, I don't consider the changes to CFI to be substantial. Diffs:

Wiktionary:Criteria for inclusion: 35064944 (diff)
Wiktionary:Entry layout: 35064941 (diff)

--Daniel Carrero (talk) 14:40, 25 October 2015 (UTC)[reply]

The CFI change may be less substantial but is bad, and I am going to oppose it. It introduces the phrasing "It has been voted" and it introduces rationales in "One reason for having separate pages ...". That is bad for a policy page, IMHO, and AFAIK some people agree with me in this regard. A policy page should state its shoulds AKA regulations and that's it. It should not state "It has been voted on"; we have references to votes for that. And rationales should be in the votes that lead to the policy, not in the policy itself, IMHO. --Dan Polansky (talk) 15:29, 25 October 2015 (UTC)[reply]

@Dan Polansky: Point taken. In my proposed revisions, there is one explicit mention of a vote in the CFI, and one in the ELE. I intend to remove those from the proposal, per your criticism. Are there many other points you would disagree with? Or: Would you support the proposal? If not, what could change in the proposal before you could consider supporting it?

Sometimes, I see you posting long, detailed arguments about a given issue. If you have time/interest to review the proposal to be voted, we could discuss any changes to be made before the vote starts. --Daniel Carrero (talk) 22:10, 26 October 2015 (UTC)[reply]

Suggestion: Edit to Template:policy

I propose editing Template:policy like this, to organize the different types of policies:

	This is a Wiktionary policy, guideline or common practices page. It must not be modified without a VOTE.
	Entries: CFI - EL - NORM - NPOV - QUOTE - DELETE. Languages: LT - AXX. Others: BLOCK - BOTS.

(I removed WT:REDIR recently as outdated and added WT:LT because I believe it's important, like WT:AXX.)

--Daniel Carrero (talk) 02:12, 27 October 2015 (UTC)[reply]

There are pages like WT:About English where parts of the pages are policy and parts aren't. I think this could be made clearer with an optional parameter in the template. Renard Migrant (talk) 14:27, 27 October 2015 (UTC)[reply]

Category:Regional Hebrew for diachronic varieties

Currently, the categories Category:Classical Hebrew, Category:Biblical Hebrew, Category:Mishnaic Hebrew, and Category:Israeli Hebrew are categorized under Category:Regional Hebrew. However, these are all diachronic varieties of Hebrew and all existed in essentially the same region. Is there a better way to categorize diachronic varieties of a language? What other languages have this problem? --Wiki Tiki 89 22:24, 27 October 2015 (UTC)[reply]

Pretty much every old language that has varieties. Latin in particular. —CodeCa t 22:46, 27 October 2015 (UTC)[reply]

Yes but we don't seem to have a Category:Regional Latin. --Wiki Tiki 89 01:11, 28 October 2015 (UTC)[reply]

Documenting how to handle long s and ligatures

We exclude a number of graphical variants, such as long s (Talk:diſtinguiſh) and ligatures like f-i, s-t, f-f-l and so forth (Talk:ﬁsherwoman, Talk:phileraﬆ), but these two practices are not explicitly documented on a Wiktionary-namespace page as far as I can tell. I'd like to know if these practices are still supported; if they are, I'll document them somewhere (perhaps in WT:CFI#Spellings in the vicinity of the line about combining characters?). By the way, can our javascript be made to redirect ﬁ etc to fi etc, like it redirects ſ to s? (Go to [[ſiſter]] and you're sent to [[sister]] after a second, but go to [[ﬁsh]] and your browser sits on the blank page.) - -sche (discuss) 02:15, 28 October 2015 (UTC)[reply]

Wouldn't this be very language dependent? So maybe the specific language policy pages would be a better place. DTLHS (talk) 02:32, 28 October 2015 (UTC)[reply]

Not really. Is there any language where ﬁ is a different letter from fi? —CodeCa t 03:11, 28 October 2015 (UTC)[reply]

Make it policy (vote anyone? as much as I hate the v-word) and then add exceptions as we find them, if we find them; I agree with CodeCat that there probably aren't any. A lot of our policies aren't documented because of the difficulty of getting stuff through a vote (votes failing with 65% support and whatnot) but it's rarely a problem because there are few enough of us that we can just discuss it. Renard Migrant (talk) 13:20, 28 October 2015 (UTC)[reply]

The rules actually were very language-dependent, especially with regard to sequences of two or more S's. --Wiki Tiki 89 14:49, 28 October 2015 (UTC)[reply]

Go on. Renard Migrant (talk) 15:03, 28 October 2015 (UTC)[reply]

I swear I thought there were differences, but I can no longer find any evidence of them. Perhaps I was thinking of v vs. u, where some languages used u even at the beginnings of words. --Wiki Tiki 89 15:09, 29 October 2015 (UTC)[reply]

@Wikitiki89: here you go. --Romanophile ♞ (contributions) 15:30, 29 October 2015 (UTC)[reply]

That really is a good source supporting the idea that the long s is a typographical variant of s as opposed to another letter. Renard Migrant (talk) 16:50, 29 October 2015 (UTC)[reply]

I think that we should make an appendix for ſ, similar to how we have an appendix for capital letters. For the stylistic ligatures, though, there’s not much to say about them. Even as potential redirects, they wouldn’t be very utile considering that very few people know how to type them. Plus, Unicode discourages them anyway. Automatic redirections would be acceptable with me, though.
A policy prohibiting stylistic ligatures is okay with me. It would have been nicer if somebody made that years ago, though. --Romanophile ♞ (contributions) 13:21, 29 October 2015 (UTC)[reply]
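On the question above about redirecting ﬁ the way ſ is redirected: the site's redirect is JavaScript, but the mapping it would need can be sketched in Python. The table below covers long s and the Latin ligatures in Unicode's Alphabetic Presentation Forms block (U+FB00 to U+FB06). VARIANTS and redirect_target are hypothetical names, and this is not the code Wiktionary actually runs.

```python
from typing import Optional

# Hypothetical mapping: typographic variants excluded as entry titles,
# each paired with the ordinary spelling a redirect could point to.
VARIANTS = {
    "ſ": "s",    # U+017F LATIN SMALL LETTER LONG S
    "ﬀ": "ff",   # U+FB00 LATIN SMALL LIGATURE FF
    "ﬁ": "fi",   # U+FB01 LATIN SMALL LIGATURE FI
    "ﬂ": "fl",   # U+FB02 LATIN SMALL LIGATURE FL
    "ﬃ": "ffi",  # U+FB03 LATIN SMALL LIGATURE FFI
    "ﬄ": "ffl",  # U+FB04 LATIN SMALL LIGATURE FFL
    "ﬅ": "st",   # U+FB05 LATIN SMALL LIGATURE LONG S T
    "ﬆ": "st",   # U+FB06 LATIN SMALL LIGATURE ST
}
# str.maketrans accepts single-character keys mapped to strings of any length.
_TABLE = str.maketrans(VARIANTS)


def redirect_target(title: str) -> Optional[str]:
    """Return the plain-spelling title to redirect to, or None if nothing changes."""
    target = title.translate(_TABLE)
    return target if target != title else None


print(redirect_target("ſiſter"))  # sister
print(redirect_target("ﬁsh"))     # fish
print(redirect_target("fish"))    # None
```

Unicode NFKC normalization (unicodedata.normalize("NFKC", title)) would also fold all of these characters, but it folds many other compatibility characters too (superscripts, full-width forms and so on), which is likely too aggressive for entry titles. An explicit table keeps the behaviour limited to the variants discussed here.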

A few proposed changes to WT:NORM

There are a few changes that are being proposed at Wiktionary talk:Normalization of entries. Since not many people have responded, I'm letting you know here. —CodeCa t 20:13, 28 October 2015 (UTC)[reply]

Restoring WT:NORM

I ask that WT:NORM is restored to the state of 5 September 2015. Since then, substantive (meaning-changing) changes took place without a vote, which is not allowed per what it says at the top of WT:NORM: "Any substantial or contested changes require a VOTE." The addition of the template containing this text to the page is a consequence of Wiktionary:Votes/pl-2015-07/Normalization of entries 2.

Let me reiterate that my contesting the changes now is not necessary; the condition contains an or: "Any substantial or contested changes require a VOTE." --Dan Polansky (talk) 20:18, 28 October 2015 (UTC)[reply]

You haven't actually contested any changes. You just voiced a blanket "I don't like it"-style disagreement with no rationale. —CodeCa t 20:20, 28 October 2015 (UTC)[reply]

I have made the restoration. I point the above editor and anyone else to the word "or" in the condition. I have my hopes. --Dan Polansky (talk) 20:20, 28 October 2015 (UTC)[reply]

I've restored the current version. The changes that were made didn't change the meaning, so this proposal is unconstructive and in bad faith. On that ground I reserve the right to reverse it. —CodeCa t 20:23, 28 October 2015 (UTC)[reply]

I ask for restoration. I won't revert war on that page; someone else has to restore the page to a proper state. If the page will not get restored, it will ipso facto cease to be a policy. --Dan Polansky (talk) 20:27, 28 October 2015 (UTC)[reply]

What Dan Polansky is saying is that a voted-on policy shouldn't be altered without a vote. It is a bit of a pain when someone missed a comma out and nobody wanted to vote against a proposal merely on the basis of a missing comma, then you need another vote just to insert a bleeding comma where one is needed. Renard Migrant (talk) 16:43, 29 October 2015 (UTC)[reply]

Eats, shoots, and leaves. DCDuring TALK 17:52, 29 October 2015 (UTC)[reply]

That is not entirely accurate. You can correct a missing or misplaced comma without a vote as long as it does not change the meaning of the sentence. It follows from "Any substantial or contested changes require a VOTE", which was voted by the community to apply in Wiktionary:Votes/pl-2012-03/Vote requirements for policy changes. --Dan Polansky (talk) 09:32, 7 November 2015 (UTC)[reply]

As such pages are not assuredly on anyone's watchlist and certainly not on everyone's, shouldn't a contributor making such changes draw attention to them by giving notice here to expose them to being deemed "substantial or contested"? To me that seems like an efficient way of reducing acrimony. This search illustrates that commas have been involved in controversies of interpretation. DCDuring TALK 12:49, 7 November 2015 (UTC)[reply]

I think it is enough for such pages to be on the watchlist of the admins who care enough to enforce such policies. --Wiki Tiki 89 14:51, 9 November 2015 (UTC)[reply]

What about those who would prefer that such policies were not altered and then imposed on them? What about new and occasional contributors, potential recruits to more substantial contribution?

We seem to have a more than sufficient number of folks whose principal contribution is finding rules to impose on content contributors. The rules can hardly be said to make it easier for new contributors to get involved in our efforts. DCDuring TALK 21:47, 9 November 2015 (UTC)[reply]

Which is why no one can make any changes to the rules (i.e. changes to the substance of a policy page) without a vote; and admins who care enough will watch the policy pages to make sure such changes are not made. --Wiki Tiki 89 22:02, 9 November 2015 (UTC)[reply]

"Proper" codes for etymology-only languages part 2

In Wiktionary:Beer_parlour/2013/December#"Proper" codes for etymology-only languages, it was proposed to rename the etymology-only language codes into the "proper" format of aaa-aaa, for example Late Latin/LL. > "la-lat". Can we do that now? I take it that would require a vote?

Rationale: Standardization. It is weird that the different codes work in different ways:

Currently:

Late Latin: both {{etyl|LL.|en}} and {{etyl|Late Latin|en}} work (it does not matter if we use the code or name)
Latin: only {{etyl|la|en}} works, {{etyl|Latin|en}} does not work (we have to use the code, not the name)

Proposal:

Late Latin: only {{etyl|la-lat|en}} should work, {{etyl|Late Latin|en}} would not work anymore

It would require a bot changing the codes in all entries before full implementation.
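A minimal sketch of the per-page text change such a bot might make, assuming {{etyl}} takes the source-language code as its first parameter (as in the examples above) and that the code is written without surrounding spaces. The pattern and the function name migrate_late_latin are hypothetical; a real bot would also need to cover other templates that take language codes, and spacing variants.

```python
import re

# Hypothetical pattern: the first parameter of {{etyl}} when it is the old
# Late Latin code "LL." or the name "Late Latin".
_OLD_LATE_LATIN = re.compile(r"\{\{etyl\|(?:LL\.|Late Latin)\|")


def migrate_late_latin(wikitext: str) -> str:
    """Rewrite {{etyl|LL.|...}} and {{etyl|Late Latin|...}} to use la-lat."""
    return _OLD_LATE_LATIN.sub("{{etyl|la-lat|", wikitext)


print(migrate_late_latin("From {{etyl|LL.|en}} and {{etyl|Late Latin|fr}}."))
# From {{etyl|la-lat|en}} and {{etyl|la-lat|fr}}.
```

Plain Latin ({{etyl|la|en}}) is left untouched because the pattern only matches the two Late Latin spellings.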

That discussion also introduced the idea of leaving both old and new codes working together for a while (LL. and la-lat) as a transition period while people get used to them. What do you think? --Daniel Carrero (talk) 03:38, 29 October 2015 (UTC)[reply]

We've already been in the transitional period since then. The new codes already work. See Module:etymology languages/data. —CodeCa t 13:35, 29 October 2015 (UTC)[reply]
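The transition period described in this thread, in which both the old and the new code work while full names stop being accepted, can be pictured as a lookup with a switch for legacy aliases. This is a Python illustration only, not the Lua in Module:etymology languages/data; the tables and the function name resolve_etyl_code are hypothetical.

```python
from typing import Optional

# Hypothetical data loosely following the proposal: canonical code -> name,
# plus legacy aliases accepted only while the transition period lasts.
ETYMOLOGY_LANGUAGES = {"la-lat": "Late Latin"}
LEGACY_ALIASES = {"LL.": "la-lat"}


def resolve_etyl_code(code: str, allow_legacy: bool = True) -> Optional[str]:
    """Return the canonical code, or None if the input is not an accepted code."""
    if code in ETYMOLOGY_LANGUAGES:
        return code
    if allow_legacy and code in LEGACY_ALIASES:
        return LEGACY_ALIASES[code]
    return None  # names such as "Late Latin" are deliberately not accepted


print(resolve_etyl_code("LL."))                      # la-lat
print(resolve_etyl_code("LL.", allow_legacy=False))  # None
```

Ending the transition then amounts to setting allow_legacy to False (or emptying the alias table) once a bot has replaced the old codes in entries.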

Category:Singlish

This category claims that Singlish is an English-based creole. If that is true, it needs to be given its own language code and this category needs to stop being used in English entries. — Ungoliant ^(falai) 13:41, 29 October 2015 (UTC)[reply]

Wouldn't that force us to have separate Singlish sections for the thousands of nouns, adjectives, etc. that are used unaltered from English in Singlish? (Actually, is the same true of e.g. Scots?) Equinox ◑ 11:11, 30 October 2015 (UTC)[reply]

Yes, but if it’s a different language it’s a different language. — Ungoliant ^(falai) 13:47, 30 October 2015 (UTC)[reply]

Singlish lists three sources asserting that it's a creole, and looking at the example sentences, especially those labeled basilectal, I'm inclined to agree. (I certainly wouldn't consider "Dis guy Singrish si beh zai sia" to be an utterance of a dialect of English.) According to Category:Creole or pidgin languages, the code we use for creoles and pidgins is crp, so maybe crp-sng? Incidentally, why do we have both Category:Creole or pidgin languages and Category:Pidgins and creole languages? —Aɴɢʀ (talk) 14:54, 30 October 2015 (UTC)[reply]

Line 172 of Module:category tree/langcatboiler is the culprit. — Ungoliant ^(falai) 15:07, 30 October 2015 (UTC)[reply]

More specifically, that module is not in agreement with Module:families/data, which specifies "creole or pidgin" as the name of the language family. Can someone more versed in editing modules than I am please fix it? —Aɴɢʀ (talk) 19:59, 30 October 2015 (UTC)[reply]

I have merged the categories at Category:Creole or pidgin languages. - -sche (discuss) 23:15, 11 November 2015 (UTC)[reply]
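The duplicate categories arose because the category tree spelled the family name differently from the family data. A Python sketch of one remedy, building the category title from the family's canonical name so there is only one spelling to maintain. FAMILIES and family_category are hypothetical, and the real modules are written in Lua.

```python
# Hypothetical stand-in for one record of family data; per the discussion,
# the family's canonical name is "creole or pidgin" and its code is crp.
FAMILIES = {"crp": "creole or pidgin"}


def family_category(family_code: str) -> str:
    """Build the language-family category title from the family's own name."""
    name = FAMILIES[family_code]
    return f"Category:{name[0].upper()}{name[1:]} languages"


print(family_category("crp"))  # Category:Creole or pidgin languages
```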

Rollback in LenovoTest01

Add flag rollback LenovoTest01. admin group. LenovoTest01 (talk) 09:11, 30 October 2015 (UTC)[reply]

Why? Who are you? Equinox ◑ 11:07, 30 October 2015 (UTC)[reply]

According to WP, a blocked sockpuppet of w:User:Никита-Родин-2002, who progressed from vandalism to good-faith-but-disruptive cluelessness and incompetence, all using a host of IPs and sockpuppets. Chuck Entz (talk) 21:03, 11 November 2015 (UTC)[reply]

Request denied. Unknown user with zero edits. SemperBlotto (talk) 11:17, 30 October 2015 (UTC)[reply]

Wiktionary:Votes/pl-2015-10/Entry name section

FYI: I created Wiktionary:Votes/pl-2015-10/Entry name section. It is a vote about the "entry name" section of WT:EL. --Daniel Carrero (talk) 18:20, 30 October 2015 (UTC)[reply]