Wiktionary:Votes/pl-2014-06/Allowing attested romanizations

Allowing attested romanizations

Voting on: Whenever a romanization of a word in a particular language (whether it is an ad-hoc romanization or one derived from a formal romanization scheme) is attested three times as per WT:CFI#Attestation, and the word in its native script meets WT:CFI, the romanization will have an entry. That entry will contain only the modicum of information needed to allow readers to get to the native-script entry. For example: svobodnyx is attested in these citations, and the word it is a romanization of (свободных) meets CFI, so svobodnyx will have an entry with the L2 header ==Russian== which will look about like this (link).

Rationale: See the BP discussion, particularly the section #What are romanizations for?. In short: readers may encounter and want to look up romanizations (e.g. when bibliographies cite, in transliterated form, works written in Chinese or Russian, users may want to look up the words that make up the title), and making entries for those romanizations is the best way of getting those readers to the native-script forms of the words.

Technical note: romanizations which are subject to lower requirements due to other votes (e.g. romanizations of Gothic, which per other votes are not required to be attested) continue to be subject to those lower requirements.

Vote starts: 00:01, 21 June 2014 (UTC)
Vote ends: 23:59, 20 July 2014 (UTC)

Vote created: - -sche (discuss) 01:44, 10 June 2014 (UTC)
Discussion:
Wiktionary:Beer parlour/2014/June#Prescriptivism as to common lay transliterations.

Wiktionary:Requests_for_deletion#mahā

Wiktionary talk:Votes/pl-2014-06/Romanization of Sanskrit

Wiktionary talk:Votes/pl-2014-06/Allowing attested romanizations

Support

Support, but I would prefer that this vote be withdrawn, because the outcome of the vote at Wiktionary:Votes/pl-2014-06/Excluding romanizations by default will likely render it moot. Absent agreement to exclude, there is no need for a vote to include. It was proposed that these votes be combined, and that should have been done to provide the proper range of options. A conflicting outcome, of course, would default to the existing CFI, absent a rule stating otherwise. bd2412 T 00:01, 24 June 2014 (UTC)
The fact that people can't even agree on what the status quo is certainly adds a level of ...interestingness... to this series of discussions. You and Dan think romanizations are already allowed and only the vote to exclude them is needed; I and Anatoli and Ungoliant and seemingly also Renard think they've always been excluded and so [only] the vote to allow them is needed... I do hope one or the other vote passes (I think the same person should close both, when it's time); otherwise, we'll have run quite a caucus race to get back to still disagreeing on both what the status quo is and what should be done going forward. lol - -sche (discuss) 01:20, 24 June 2014 (UTC)
As an attorney, I prefer to adhere to the rule of law, and not a rule of anarchy pulled out of someone's backside. bd2412 T 02:37, 24 June 2014 (UTC)
I prefer the rule of law, too. :) If only we agreed on what the law was. You and Dan prefer to interpret "romanizations" as "words" and interpret CFI as including them, yes? But surely you've noticed that many other users have disagreed with your interpretations, hence my comment. (Case law is demonstrably that romanizations have been deleted or moved unless a vote has allowed them. If anything were to be characterized as anarchic, attempting to overturn that precedent through simple re-interpretation might be it.) - -sche (discuss) 03:14, 24 June 2014 (UTC)
Actually, I doubt that "precedent" supports a blanket exclusion. We have tens of thousands of romanizations, including a number of romanizations that are not from Asian scripts, but are included in this corpus as loanwords or the like. The only cases that I'm aware of are ayubowan, which was kept as an "English" word, and mahā, which is still an open discussion itself. No one has batted an eye at the proposition that tovarich is an "English" word when it is obviously a transliteration. If that is the standard we are using, there are citations enough to enter tens of thousands more transliterated words from many scripts and describe them as being as "English" as apple pie. bd2412 T 12:32, 24 June 2014 (UTC)
To be clear, when I speak of romanizations, I'm speaking of entries that have the L3 ===Romanization===, and have definitions like "romanization of [foo]", and have the same L2 as the original (native-script) words do. I'm not speaking of loanwords, which in most cases don't contain the word "romanization" anywhere, even in the etymology section. The line between the two is not 100% unblurry, but citations of "ayubowan" in the middle of [otherwise] English sentences suggest it's a loanword, whereas "O raspoznavanii nekotoryx svojstv kontekstno-svobodnyx grammatik, I-ya Vsesojuznaja konferencija po programmirovaniju" is (IMO) clearly a quotation in romanized form of a long string from another language — in this case, a proper noun, name of a specific work.
Importantly, if citations show something to be a loanword, it should have an entry even if the same string is also a romanization — hence e.g. both shojo and shojo (shojo) exist.
- -sche (discuss) 17:25, 24 June 2014 (UTC)
@-sche, I would certainly agree that there is room for a great deal of flexibility on how entries are presented, what sorts of headers or limitations are used for the definitions themselves. Surely there is some middle ground that we can reach here. bd2412 T 18:49, 27 June 2014 (UTC)
Support per nom. If it can be attested and doesn't violate NOT, it should be allowed an entry. Purplebackpack89 (Notes Taken) (Locker) 14:10, 24 June 2014 (UTC)
Support as a straightforward reading of CFI. We should make CFI explicitly support uses like Latin macron dropping and Esperanto diacritic normalization, as well as not including every single transliteration form ever, but that seems to be the role of a different proposal.--Prosfilaes (talk) 20:15, 24 June 2014 (UTC)

Oppose

Oppose.
Languages can borrow terms from each other even if they use different scripts. For example, Флорида (Florida) is a Russian word and Tokyo is an English word.

Languages have native scripts. For example, Latin is the native script of modern English but is not the native script of Arabic, even though non-Latin scripts may be used to write English and Latin may be used to write Arabic.

Just because an unbroken sequence of letters is attested, it doesn’t mean we should include it. For example, we shouldn’t include rare misspellings and typos, typographical variants such as words in upper case due to being in the beginning of a sentence, or in all-caps, or using obsolete type variants (perſon), words with auxiliary diacritics (Latin macrons, Serbo-Croatian tone diacritics), words only used in reference to a fictional universe, or random keyboard-smashings that happen to be citable. I hope that transliterations will remain in this list.

In some cases it is useful to have romanisation entries, like Gothic which uses a script that few computers support by default. In my opinion, our current system of allowing individual languages to have them after case-by-case analysis is sufficient. Using {{also}} to include words whose romanisation matches the pagename also helps.

(note: I am talking about romanisations as a level 3 heading, not about words whose etymology involves romanisation, like Tokyo). — Ungoliant (falai) 16:15, 23 June 2014 (UTC)
Languages can borrow terms from each other, but there will still be some terms that fall in the cracks. If people were more liberal about permitting italicized terms used in running English text, there would be less push for romanization entries.

Languages do not have native scripts. Languages have various scripts that are used to write them, frequently several scripts reaching a dominant position at some point and place, and most languages being written in multiple scripts. We may have a hard time attesting Arabic in Latin script, but dismissing a whole writing tradition on phones because of some prescriptivist attitude about which script Arabic is written in is absurd.

Latin macrons and Serbo-Croatian tone diacritics are about reducing complexity without losing information. I'm part of the informal consensus that Esperanto entries should have standard diacritic use, instead of a dozen different ASCIIfications, so I do understand that. But it's important that we don't throw out important data, and this started with someone who could not find a transliterated word in Wiktionary. Essentialist platonic arguments don't do anything to convince me that we should ignore reality.

To claim we write Gothic in the Latin script because of computers is to deny history and the current reality of the language. Go find someone who speaks Gothic and ask them to translate something for you; they will write down the answer in the Latin script, just like everyone who knows the language has done for over a hundred years, in the same script as every book of Gothic text ever printed. We cannot and should not blind our eyes to how people use language in reality.--Prosfilaes (talk) 00:26, 24 June 2014 (UTC)
Written languages have scripts, and Wiktionary is a text-based resource, so we must organise our content based on writing. If they don’t, you won’t mind if I reply to your comments in whatever script I prefer? ᛁᚾ ᚱᚢᚾᛁᚳ ᛋᚳᚱᛁᛈᛏ?

Most Gothic is written in Latin script because philologists writing about it never cared to use the same script as the surviving texts. That few computers support Gothic by default is just one more reason why including Gothic romanisations is useful. — Ungoliant (falai) 01:30, 24 June 2014 (UTC)
Can you please convey your point without the sarcasm? This is the English language Wiktionary. You should not need to be told that we have discussions in English, using Latin script, not what comes across in my browser (and probably in those of others) as a series of boxes. Our definitions and headers are written in the English language, and in the Latin script, and it should also come as no surprise to you that our readers are likely to be looking for things in Latin text. If it amuses you to thwart and confuse readers, and prevent them from finding what they are looking for, I suspect that there are other websites better suited to providing that kind of entertainment. bd2412 T 02:34, 24 June 2014 (UTC)

No matter how you cut it, Deseret is a script of English; it was created by English speakers for English and used only for English. Does that mean it's okay to post in Deseret? Written texts have scripts, but users of languages can be remarkably flexible in what scripts they use, in a way they aren't, e.g., phonologically.

Which makes Latin the most common script for Gothic, and what we have for Gothic not romanizations. That's the script people use for Gothic, like it or not.--Prosfilaes (talk) 19:57, 24 June 2014 (UTC)
Oppose. There are other ways to make users searching for transliterations happy. The proposed action is not the wisest and most efficient way of achieving that. Wyang (talk) 23:54, 23 June 2014 (UTC)
Oppose --Anatoli (обсудить/вклад) 06:14, 24 June 2014 (UTC)
Oppose --Dijan (talk) 06:42, 24 June 2014 (UTC)
Oppose -- Liliana • 19:21, 24 June 2014 (UTC)
Oppose - -sche (discuss) 16:37, 13 July 2014 (UTC)
Oppose, I think it would be bad for us to include romanizations (which aren't words any more than a picture is a word). Renard Migrant (talk) 16:52, 13 July 2014 (UTC)

Abstain

Decision

Fails 3–7 (30%). Thus, no change in policy is effected.—msh210℠ (talk) 18:30, 23 July 2014 (UTC)