Wiktionary talk:Votes/pl-2014-06/Allowing attested romanizations


Notes

The proposal may of course be amended as much as necessary, but I'll explain the logic behind the wording I've used:

  1. I chose to say a romanization should be allowed whenever it "is attested in as many works as native-script words in that language are required to be attested in" rather than e.g. "is attested three times", because words in some languages are subject to lower thresholds (viz. 1 attestation, or in some cases 1 mention). And I decided against saying a romanization should be allowed whenever it "meets the CFI to which native-script words in that language are subject", because some users would argue that certain/many/all romanizations fail the parts of WT:CFI which require words to be "used" to "convey meaning", etc.
  2. I didn't add any wording to the effect of "...and that the romanization entry consist of the modicum of information needed to allow readers to get to the native-script entry." Such wording could be added if users prefer.
  3. As this vote is currently written, it will allow romanizations of words even if those words (in native script) do not themselves meet CFI. That could be changed if users prefer.

- -sche (discuss) 01:53, 10 June 2014 (UTC)

Oh, and I opted for the word "romanization" instead of "transliteration". That could be changed, if users really want to vote on allowing even e.g. Cyrillic-script forms (transliterations) of ==Persian== and ==English== words. - -sche (discuss) 02:04, 10 June 2014 (UTC)Reply
Some comments on this:
  • No proponent of having entries on transliterations has proposed allowing one-off transliterations. I would require at least three sources spanning a year, and possibly more sources spanning more years, to ensure that the transliteration was a common one that readers might need defined. For this reason, I would request that this limitation be specified in the proposal, and that the name of this vote be changed to Allowing attested romanizations. I do not feel that the current title accurately represents the position advocated by proponents.
  • Citations must still pass the use/mention distinction. Thus, "our great leader, our rodnoi (our own dear) Father" would not suffice because it only provides the individual word in order to immediately define it.
  • I am at least dubious about using citations where the transliterated word is merely used as part of the proper name of an entity (i.e., "director of the Rodnoi Korpus school in Sofia"), since proper names often contain arbitrary constructions (for example, Boca Raton is the English name of a Florida city, but neither boca nor raton has any meaning in English). I would exclude these and stick to uses of the word as a common part of speech, unless the word is one that is a common element of multiple proper nouns (e.g. Rican).
  • No proponent of having entries on transliterations has proposed allowing words that do not meet the CFI in their native script. I am having a very hard time picturing a scenario where a word is not used enough in its native script to merit an entry, and yet is the subject of several instances of transliteration. In fact, I would suspect that any such word is merely an English-language false cognate, for which this discussion does not apply. I would require that an attested, CFI-worthy entry exist for the native-script term before a romanization could exist for it.
  • Although I would prefer to allow all transliterations rather than merely romanizations, that may be a matter for a different discussion. Certainly, citations exist along the lines of Африкан Николаевич Криштофович, Геологический Обзор Стран Дальнего Востока (1932), p. 87: "Трансгрессия, вероятно, доходила на восток до Нин- нина в южном Гуаньси, гле эти слои залегают прямо на известняке Мяо- као (силур)", where "гуаньси" is merely the Cyrillic transliteration of 关系 (guānxi). Of course, we have an entry on guanxi in English as a loanword, so гуаньси may well be a loanword also.
  • Isn't Zhongguo (romanization of 中國) already permitted? It's straight-up Mandarin Chinese.
Cheers! bd2412 T 14:38, 10 June 2014 (UTC)
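The threshold suggested in the comments above (at least three sources spanning at least a year) can be sketched as a simple check. This is only an illustration in Python: the `Citation` record, the `is_attested` function and its parameters are hypothetical names, and the independence and use/mention judgments that WT:CFI also calls for are reduced here to an author name and a flag that an editor would have to set by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One quotation found for a romanized form (hypothetical record)."""
    year: int
    author: str
    is_use: bool = True  # False for a bare mention; decided by an editor


def is_attested(citations, min_sources=3, min_span_years=1):
    """Return True if the citations meet the sketched threshold.

    Assumptions: only uses count, distinct authors stand in crudely for
    independent sources, and the span is the difference between the
    earliest and latest year among the counted uses.
    """
    uses = [c for c in citations if c.is_use]
    authors = {c.author for c in uses}
    if len(authors) < min_sources:
        return False
    years = [c.year for c in uses]
    return max(years) - min(years) >= min_span_years
```

Languages subject to a lower threshold could be handled by passing a smaller `min_sources`; nothing in the sketch decides which threshold applies to which language.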
Support renaming "Allowing all romanizations" to "Allowing attested romanizations", provided the vote stays in the misleading setup by which we need a vote to allow something that was not forbidden in the first place. --Dan Polansky (talk) 17:51, 10 June 2014 (UTC)Reply
Re "use/mention distinction": WT:CFI explicitly accepts citations which use words and then immediately define them; it gives "They raised the jib (a small sail forward of the mainsail)" as an example. What distinguishes the "rodnoi" citation from the "jib" one — anything other than italics?
Re other points: OK, I will remove the "Rodnoi Korpus" example, add the requirement that romanizations be attested three times, add the requirement that native-script entries meet CFI, and rename the vote.
Re "Isn't Zhongguo (romanization of 中國) already permitted": no, per Wiktionary:Votes/pl-2010-10/Treatment of toneless pinyin other than syllables, toneless pinyin romanizations of words which are longer than one syllable are not allowed in the main namespace (hence Zhongguo has been deleted three times, with the most recent deletion summary linking to the allowed toned form). Should that ban be left in place?
PS in the 2005 "Shoujo Manga" citation, is "shoujo" English or romanized Japanese, in your view? (It's derived from 少女 one way or another, but the Wiktionary-approved romaji form of that word is shōjo.) The Natsumi Konjoh and Koge Donbo citations I parse as using strings of 4–6 romanizations of Japanese words; the 2005 "Shoujo Manga" citation I would parse as entirely English, though I can see how it could alternatively be parsed as "shoujo (Japanese) manga (Japanese) techniques (English)".
- -sche (discuss) 18:36, 10 June 2014 (UTC)

Status quo

AFAIK, there is no policy forbidding attested romanizations from being included. The vote you should have created is "Forbidding attested romanizations unless they have been granted an exception". --Dan Polansky (talk) 17:29, 10 June 2014 (UTC)

For the benefit of anyone coming here without having read the BP thread, I will note that this subject is discussed in somewhat more detail there.
I and some others feel that a vote is necessary to allow romanizations, because until now Wiktionary has tagged romanizations with Template:wrongscript, and/or simply moved them to the correct (native) script and/or deleted them, unless they were specifically allowed by a vote. I have noted that votes have been accepted as necessary even for languages such as Gothic which are more often found in romanized form than in native script, and I have noted that after one vote to allow romanizations of Punic and some other languages failed, a second vote was held (and passed) before romanizations of Punic et al. were allowed. (See also Template talk:romanization of#Deletion_debate.)
On the other hand, Dan interprets WT:CFI as not containing an explicit ban on romanizations.
The disagreement is somewhat reminiscent to me of those that occur when users argue that citing news websites / blogs / etc is acceptable because, in their interpretation, WT:CFI does not contain an explicit ban on news websites / blogs / etc.
- -sche (discuss) 23:34, 10 June 2014 (UTC)
It's not really reminiscent. News websites are considered not to be permanently recorded media, a term used in WT:CFI#Attestation. There is nothing in WT:CFI that corresponds to a ban on romanizations; in particular, there is no exclusion regulation in WT:CFI that uses a broader phrase of which romanizations would be a special case.
On another note, the fact that votes were created to explicitly allow romanizations is insufficient evidence for romanizations being forbidden before these votes; it is merely evidence that at least one editor deemed the inclusion of romanizations controversial, or that he intended to have the inclusion explicitly codified. Similarly, the existence of Wiktionary:Votes/pl-2014-04/Keeping common misspellings is no evidence for the claim that, before the vote, common misspellings were excluded from Wiktionary per policy or common practice; the contrary is true. I created the vote since some editors started to vote in RFD for exclusion of common misspellings, without linking to relevant policy and contrary to previous common practice. --Dan Polansky (talk) 08:57, 14 June 2014 (UTC)

Drastic simplification

The vote creates an impression of complexity where there is rather little, IMHO. The proposal seems to say not much else but this:

"Romanizations shall be subject to WT:CFI, including WT:CFI#Attestation and WT:CFI#Idiomaticity, rather than being excluded by default. For some languages, romanizations can be included even if unattested as long as the native-script form being romanized is attested, as per votes establishing that on a per-language basis."

--Dan Polansky (talk) 17:37, 10 June 2014 (UTC)

I think drastic simplification of this vote is unlikely to happen. But since multiple editors maintain that romanizations are excluded by default, the following vote should pass with their support: Wiktionary:Votes/pl-2014-06/Excluding romanizations by default. The wording of the vote is simple by design. By contrast, Wiktionary:Votes/pl-2014-06/Allowing attested romanizations may fail over a disagreement over wording and its implications. --Dan Polansky (talk) 07:04, 14 June 2014 (UTC)Reply

Wording: is attested three times

The wording "is attested three times as per WT:CFI#Attestation" should ideally be improved, IMHO. It incorporates part of WT:ATTEST without incorporating other parts: it incorporates three times, without incorporating e.g. conveying meaning. Furthermore, it misleads, since being attested involves having the requisite number of independent quotations; on a strict reading, "being attested three times" does not really mean anything. IMHO, "is attested as per WT:CFI#Attestation" is the best one can do, since WT:CFI#Attestation already specifies what "attested" means. --Dan Polansky (talk) 18:22, 10 June 2014 (UTC)

The issues with saying "is attested as per WT:CFI#Attestation" are that (a) WT:CFI#Attestation considers words attested if they're used in well-known works, and moreover (b) words in most of the living languages Wiktionary covers are considered attested if they have "only one use or mention". If one wishes, as BD does, to require that romanizations of words in these languages be attested by three (or more) uses, then it is necessary to spell that requirement out, since it is higher than the requirement that would be imposed by saying "attested per WT:CFI#Attestation". - -sche (discuss) 18:56, 10 June 2014 (UTC)
Re "conveying meaning": I'm not sure it would be wise to include wording about "use" or "meaning" in this vote. If such wording is included, and the vote passes, the users who have already expressed the opinion that romanizations are not "uses" and/or don't convey meaning will presumably continue to express that opinion and thus argue that the vote has not actually allowed some or all of the romanizations which proponents probably read it as allowing. In short, I'd like this vote to clarify the inclusion or exclusion of romanizations, and I think such wording would actually invite continued unclarity and disagreement. - -sche (discuss) 19:08, 10 June 2014 (UTC)Reply
I disagree. Compare the following:
But, just as always when faced by crisis, the peasant sought solace in the rodnoi village and in the all-protective commune.
But, just as always when faced by crisis, the peasant sought solace in the gkkkkg village and in the all-protective commune.
I believe that the average reader will assume that the "rodnoi" in the first sentence means something (which they may need to look up in a dictionary), while the "gkkkkg" in the second sentence is meaningless gibberish. Obviously, every word in "ih'dinā l-ṣirāṭa l-mus'taqīma ṣirāṭa alladhīna anʿamta ʿalayhim ghayri l-maghḍūbi ʿalayhim walā l-ḍālīna" or "Henansheng 1937 xianyi shiling zhuangding tongji biao. Zhongguo dierlishi danganguan" conveys meaning, even if assistance is needed to understand that meaning. bd2412 T 21:19, 10 June 2014 (UTC)Reply
I disagree with this criterion and I won't be supporting it, except perhaps as a stepping stone. Attestation of the romanization itself should not be required in the case that the transliteration follows a regular and established scheme, such as IAST for Sanskrit. I don't see the practical value in requiring attestation for transliterations separately; it would increase maintenance for us, and excluding them would not serve any purpose for our users. After all, does Wiktionary's quality really improve in any way by excluding them? That should really be our primary concern. The criteria we use for Gothic or Japanese are much more workable. —CodeCat 21:21, 10 June 2014 (UTC)
There's a separate vote proposed for Sanskrit that doesn't include this provision. However, to the extent that common lay romanizations exist (and do not necessarily follow "a regular and established scheme"), their inclusion should be CFI attestation-based. Including romanizations of certain languages, Cyrillic languages for example, seems to engender more opposition. bd2412 T 21:41, 10 June 2014 (UTC)Reply
For Cyrillic I imagine the main problem is that although transliteration standards exist, there are many of them and even then people widely use nonstandard schemes too. Even Wiktionary uses a nonstandard transliteration for some Russian words... —CodeCat 22:56, 10 June 2014 (UTC)Reply
Here we go again. None of the "standard transliterations" is used in Russian-foreign language dictionaries. Languages like Russian and Greek are considered easy in terms of the script. If a transliteration is used, as in textbooks or phrasebooks, it is phonetic, not literal, and is always customised for the specific book; so is the Wiktionary transliteration for Russian. There is no need for romanised Russian or Greek entries, anyway. Specific language policies, common practice and common sense dictate that e.g. Russian is written in Cyrillic, Greek uses the Greek alphabet, Hindi is written in Devanagari and Arabic in Arabic script, not in Roman letters. --Anatoli (обсудить/вклад) 23:37, 10 June 2014 (UTC)
We aren't looking to add these because these words need some system of transliteration (at least, that's not my thinking). The whole purpose of maintaining an attestation requirement is so that our entries reflect words as they are used in the real world. In other words, we have entries because readers will see things that look like words that they would reasonably expect that a dictionary would helpfully define for them. I say, let's be helpful. bd2412 T 00:49, 11 June 2014 (UTC)Reply
For a simple Russian word like хорошо́ (xorošó), you will find xorošó/xorošo, horošó/horošo, khorošó/khorošo, khoroshó/khorosho, or more phonetically also appearing in books, phrasebooks - kharasho or harasho. They are all unnecessary, they are not in the native script and we don't cater for all possible transliterations for each language in proper, native script entries, so these are not even searchable.
I'm worried about the future state of Wiktionary: if we allow various transliterations as entries, it looks like it's going to be at the expense of native scripts. Users and editors will simply believe that it's OK to write in Roman letters in any language, as was the case with Pinyin and Romaji entries. The active proponents of these introductions are not even actively working in foreign languages, such as Sanskrit, and not planning to, only worried about their romanisations. It's a worrying trend that arguments are being made that Sanskrit should be written in Roman letters, not Devanagari. Search functionality can be improved; for that we don't need mass-introduction of romanised entries in languages not normally written in Roman. --Anatoli (обсудить/вклад) 01:06, 11 June 2014 (UTC)
Correct me if I'm wrong, but uses in dictionaries and phrasebooks would not count as uses for the CFI, would they? If there is a "worrying trend" to address, it seems to me that it is the practice of authors in the world generally romanizing these scripts. We are merely here to provide definitions for words used by authors. It might be just as worrying that we include common misspellings, which might encourage people to think that these spellings are legitimate, or that we include grammatically disfavored constructions like ain't and eye dialect spellings like cuméquié. I would also note that requiring strict attestation (which the vast majority of our entries do not have) means that we will not be autogenerating these entries, but that editors who wish to make them will have to make them based on finding the attestations first. I do not anticipate a flood of new entries based on these criteria; rather, I expect that very common transliterations and transliterations that are confusingly similar to existing words will be the ones to be made. With respect to these romanizations, would it be helpful if the entries contained a note stating that these words are not usually written in their romanized form? bd2412 T 01:56, 11 June 2014 (UTC)
A published phrasebook or textbook may meet CFI by the current proposal (they would also be uses, not mentions). I think hard redirects, using {{also}} would be sufficient, if the existing search functionality is not helping. If a user successfully arrives at the proper script entry, I don't think they need to be further explained that the Roman spelling they used is the transliteration (usually appears in brackets or in special boxes). Common misspellings are not the same thing, that's a native feature of languages and they are common by definition. I don't quite follow your real intention, motive or interest, looking at your various posts. Are you interested in Sanskrit? Are you learning it? Do you have trouble finding entries? Is it a problem only for words that use diacritics? Any chosen Wiktionary transliteration may not cover all possible ways a transliteration of a word may appear in print, a Sanskrit term ददाति may appear as "dádāti" or as "dadāti". If a word is a proper noun, it may be even considered an English word, it may qualify for inclusion, even if it uses Sanskrit-specific diacritics. --Anatoli (обсудить/вклад) 02:18, 11 June 2014 (UTC)Reply
The problem is, I'm not interested in Sanskrit at all. I am interested in being able to define words that I come across. As I have explained elsewhere, I specifically came across mahā while fixing disambiguation links on Wikipedia, and needed a Wiktionary entry to point some of those links to. I searched here, but putting "mahā" in the search engine unhelpfully took me to maha; I was only able to find mahā by using my admin bit (which most readers will not have) to look at the previously deleted article at this title. If mahā happened to be a word in any language written with Latin script, I wouldn't have found it that way either. As it happens, this is not the first transliteration for which I've seen the issue of inclusion arise. I created the entry ayubowan all the way back in 2006. I'm sure I just came across it in a book or a newspaper somewhere. It was there as a Sinhalese entry for over seven years before anyone objected, and after an RfD it was ultimately kept as an "English" word. It may well be a "regional" loanword, but I continue to find it strange that it sits here as part of the English corpus, not the Sinhalese. bd2412 T 02:48, 11 June 2014 (UTC)Reply
ayubowan was found to be an English word by Wiktionarians; whether that's true or not is beside the point of this discussion. The proper Sinhalese entry ආයුබෝවන් can be found either by using "ayubowan" or "āyubōvan" in the search, regardless of whether ayubowan exists or not. The same story is with "mahā", which can link you to महत्, మహా or maha (which now has {{also|mahā}}). Whether it's easy or hard to find words by their transliterations is a technical feature of Wiktionary. It's always harder to find shorter words, even if they're spelled in Roman letters. --Anatoli (обсудить/вклад) 03:00, 11 June 2014 (UTC)
Flip the ayubowan situation around, though. If that is "English", what is to prevent the inclusion of every transliteration from any language found in English running text under the theory that it is an "English" loanword? Isn't that just as bad (and apparently permissible, as of now)? bd2412 T 03:38, 11 June 2014 (UTC)Reply
I don't think it's the same. English, more than other languages, absorbs words from all over the world. English speakers in Sri Lanka use "ayubowan" even if they speak English; it gives their speech a special flavour. In short, if a word penetrated English (or another language) in some form, then it's English. As I said, it's not an RFV discussion, and I'm not trying to prove that "ayubowan" is an English word; if you think that it was not permissible, you can reopen the RFD/RFV. Every transliteration may only be applicable to specific non-Roman languages. It needs to be proven that that transliteration IS indeed an English loanword (without double quotes). Yes, it would be very bad, not just as bad, to report transliterations as English words. You have to be clear what you're really trying to achieve: either have L2 English entries as transliterations or another language's entries (soft/hard redirects/full-blown entries). --Anatoli (обсудить/вклад) 03:51, 11 June 2014 (UTC)

Romanizations allowed by other votes

Should a technical note be added to this vote to make explicit that romanizations which are allowed subject to lower requirements (e.g. Punic romanizations and toned Pinyin, which per other votes are not required to be attested at all) are not affected by this vote and continue to be allowed subject to those lower requirements? I don't think it's necessary, but it might prevent people who don't read through the reams of RFD, BP and talk page discussion from thinking that this vote would apply attestation requirements to Punic, Pinyin, etc. PS, "lower requirements" could be changed to "other requirements" if people want to keep the above-mentioned ban on multi-syllabic toneless pinyin intact. - -sche (discuss) 22:31, 10 June 2014 (UTC)Reply

There should probably be a link to the agreements that were made in the past concerning transliterations, and explicitly state that the vote only applies to the languages that are not covered by those past votes. —CodeCat 22:57, 10 June 2014 (UTC)Reply
I don't think that much specificity is needed. (For one thing, it would require tracking down all the votes Wiktionary has had on transliterations.) And, as I stated, some people may want to overturn the previous arrangements which instituted higher requirements than this vote. - -sche (discuss) 23:38, 10 June 2014 (UTC)Reply
I certainly agree that this is not a vote intended to raise the requirements over those struck in any previous discussion. bd2412 T 00:50, 11 June 2014 (UTC)

One possible format of romanization entries

I don't think votes should decide details of format, so I emphasize that this is not part of the proposal, but I couldn't think of a better place to float this thought about one possible format of romanization entries:
Have a template similar to {{alternative form of}}, which by default would just display "Romanization of", but which would have a parameter that could be set to specify particular romanization schemes:

  • {{romanization of|foobar|from=IAST|lang=sa}} → IAST romanization of foobar.
  • {{romanization of|foobar|from=nonstandard|lang=sa}} → Nonstandard romanization of foobar.
  • {{romanization of|foobar|from=nonstandard-ru|lang=ru}} → Nonstandard romanization of foobar. (Notice where the first link goes.) (Alternatively, if we were very clever, we could make the template recognize that the combination of from=nonstandard and lang=ru should result in the link going to w:Informal romanizations of Russian.)

We'd have to be careful, though: many romanization systems have considerable overlap, so entries like [[i#Russian]] could end up looking like:

...or we could just say something like "Common romanization of" in such cases. In fact, if we were clever and wanted to, we could even make it so that it was possible to specify all those parameters, and the template just knew that if more than some number (say, 3) of schemes was specified, it should reduce the displayed text to "Common romanization of". - -sche (discuss) 03:17, 11 June 2014 (UTC)Reply
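The label logic floated above can be sketched roughly as follows. This is a Python illustration rather than actual template or module code: the function name `romanization_label`, the default threshold of 3 and the way multiple scheme names are joined are all assumptions made for the example.

```python
def romanization_label(term, schemes=(), max_schemes=3):
    """Build the display text for a romanization entry (hypothetical helper).

    - no scheme given            -> "Romanization of <term>."
    - up to max_schemes schemes  -> "<schemes> romanization of <term>."
    - more than max_schemes      -> "Common romanization of <term>."
    """
    # Ignore empty parameter values, as an unset template parameter would be.
    schemes = [s for s in schemes if s]
    if not schemes:
        prefix = "Romanization"
    elif len(schemes) > max_schemes:
        prefix = "Common romanization"
    else:
        names = ", ".join(schemes)
        # Capitalize only the first letter so acronyms such as IAST survive.
        prefix = names[0].upper() + names[1:] + " romanization"
    return f"{prefix} of {term}."
```

For instance, `romanization_label("foobar", ["IAST"])` gives "IAST romanization of foobar.", while passing four or more scheme names collapses the text to "Common romanization of foobar."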

What's the point of envisaging all this chaos? Wouldn't it be more worthwhile to develop gadgets which allow language-specific, transliteration/transcription field-specific searches, or write reverse transliteration modules and use them in an advanced ambiguous transliteration search function? Wyang (talk) 03:54, 11 June 2014 (UTC)Reply
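As a rough illustration of the kind of ambiguous-transliteration search suggested here, the sketch below folds several romanizations of one word onto a shared key and looks that key up in a small index. It is written in Python, not as a MediaWiki gadget or module; the folding rules, the `fold` and `search` names and the one-entry index are assumptions made for the example, and a real tool would need per-language rules.

```python
import unicodedata

# Hypothetical, deliberately small folding rules for Russian romanizations,
# applied in order after diacritics have been stripped.
_FOLDS = [("kh", "h"), ("sh", "s"), ("x", "h")]


def fold(romanization):
    """Reduce a romanization to a crude search key."""
    text = unicodedata.normalize("NFD", romanization.lower())
    # Drop combining marks, e.g. the caron and the stress accent in "xorošó".
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    for old, new in _FOLDS:
        text = text.replace(old, new)
    return text


# Hypothetical index from folded key to native-script entries.
INDEX = {fold("xorošo"): ["хорошо"]}


def search(query):
    """Return native-script entries whose folded key matches the query."""
    return INDEX.get(fold(query), [])
```

With these rules, "xorošó", "horošo" and "khorosho" all reach the same entry, whereas a phonetic spelling such as "kharasho" does not; catching that would need further rules, which this sketch does not attempt.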

merge the votes

I propose that, instead of having two separate up-or-down votes, the proposal in this vote and the proposal at [[Wiktionary:Votes/pl-2014-06/Excluding romanizations by default]] be merged into one vote that contrasts them and any other options people think of before the vote starts. That would be simpler (one vote) and cleaner (allow people to express their true opinions instead of merely voting yea or nay on the proposers' wording).​—msh210 (talk) 17:19, 16 June 2014 (UTC)Reply

Discussion has continued at [[Wiktionary talk:Votes/pl-2014-06/Excluding romanizations by default#merge the votes]].​—msh210 (talk) 23:52, 16 June 2014 (UTC)Reply

User:-sche/svobodnyx

User:-sche/svobodnyx is more than a "modicum of information...". I strongly oppose citations and anything other than headers and links:

This below is the modicum, if we follow other (allowed) romanisation entries

Russian
Romanization

svobodnyx

  1. Romanization of свободных.

Citations (if they are required) should be moved to the citations page. --Anatoli (обсудить/вклад) 03:11, 17 June 2014 (UTC)

The only difference between your format and mine, AFAICT, is that mine includes citations. Because this is a vote on including attested romanizations if and only if they are attested, citations have to be present. They could be included on the citations page, with {{seeCites}} used in the entry, but that seems entirely unnecessary. - -sche (discuss) 04:41, 17 June 2014 (UTC)Reply
That's what I meant - on the citations page, not the entry page, otherwise it looks like a full language entry, which it isn't. --Anatoli (обсудить/вклад) 05:04, 17 June 2014 (UTC)
Having the citations only on the Citations: pages as you propose is okay with me. --Dan Polansky (talk) 19:09, 18 June 2014 (UTC)
This is exactly what I had in mind. Here's a cite, by the way:
  • 1968, Kaufman, V. Sh. O raspoznavanii nekotoryx svojstv kontekstno-svobodnyx grammatik, I-ya Vsesojuznaja konferencija po programmirovaniju, Kiev, 1968. [Title of article reported in Computer and Automation Institute, Computational linguistics and computer languages (1969), p. 92].
Cheers! bd2412 T 20:38, 18 June 2014 (UTC)
@BD2412 What is your preference regarding the placement of the citations? Is requiring them to be placed on a citations page rather than in the entry OK? - -sche (discuss) 20:59, 18 June 2014 (UTC)
I have no preference, but I think it's a non-issue, since citations on the entry page are hidden anyway. If they are on the citations page, then the entry requires a header and link to the citations page, which actually takes up a bigger footprint. See, e.g., noctivagant, which has both. I would add that in my opinion, they serve different functions. Quotes in the entry are definitional, showing the reader how the word is typically used to elucidate its meaning. Citations on the citations page demonstrate the existence of the term for the etymological record, and (ideally) give a sense of how long it has been in use, how current it is, and how it has evolved in usage over time. We've never really hammered these things out, but that seems like common sense to me. bd2412 T 22:44, 18 June 2014 (UTC)Reply
OK, I have changed the format of the example entry. - -sche (discuss) 01:25, 19 June 2014 (UTC)