Wiktionary:Votes/2019-05/Lemmatizing Akkadian words in their transliteration

Lemmatizing Akkadian terms in their transliteration

Voting on: Moving the current content of Akkadian entries to their transliterations, leaving a hard redirect where possible (except for single-sign words) and a soft redirect otherwise. The cuneiform script would be included in the head of the word for all entries.

Rationale: The voters only vote on the proposed action, not on the rationale.

Firstly, cuneiform would still be displayed. It would appear in the head of every entry, as hieroglyphs do in Egyptian entries, and in inflection tables. Therefore, no content would be lost by this vote.

Books and dictionaries lemmatize words according to their transliterations or transcriptions. These are the very sources we use for our Akkadian entries. They rarely display cuneiform, and when they do, it is mostly to list signs, never to list words.

This is because very few people actually know the cuneiform script, and those who know it well enough do not use Wiktionary as a resource. Most readers do not even pay attention to the signs unless they are trying to learn the script, and in that case the script would still be displayed. So they gain nothing from words being lemmatized in cuneiform.

Moreover, no one would ever go to the trouble of searching for an Akkadian word in cuneiform, especially given that no one ever publishes anything in it, and whenever they do, they never use Unicode, but images instead. This makes most of the content inaccessible to readers. As a result, most pages have fewer than 5 views per month. It'd be ridiculous for us to expect our readers to know the script. And if we acknowledge they don't, then why list words for them in a script we know they can't read?

Cuneiform signs changed depending on the period, area or scribe, while Unicode only renders one version for each sign. Furthermore, there are lapidary and cursive signs, where the first are inscribed in stone and the second are pressed into clay, and they can differ greatly. So lemmatizing entries in them does not guarantee an accurate representation of any given word if context isn't taken into account. Therefore, arguing for accuracy would be pointless. This is why books rely mostly on images instead of Unicode. And maybe we should consider in the future doing the same if we want to include different dialects.

It also makes it unnecessarily difficult to look for words in categories. For example, even though I am familiar with the script, being for the most part the main editor of Hittite, it is still too cumbersome for me to quickly browse for a word in its categories. It also makes editing extremely tedious, since signs cannot be typed quickly, and I am forced to copy and paste cuneiform words countless times. Besides, not all users may be able to display the signs properly, which would make it pointless for them to have words lemmatized in cuneiform if they can’t even see them.

Egyptian words are lemmatized in their transliteration. Even though Unicode has characters for hieroglyphs, they aren’t used for Egyptian entries. So this wouldn’t mean that a huge exception is being made for Akkadian, since it has been done before, for very similar reasons. Currently we have over 2 thousand Egyptian lemmas, while our Akkadian entries do not even amount to a tenth of that number. On average, the last 10 Akkadian entries were created roughly 20 days apart. If Akkadian were lemmatized in Roman script, it is quite likely that it would be able to grow and could acquire regular editors, which it currently lacks.

Furthermore, lemmatizing in cuneiform restricts the lemmas admissible for entries. Roots and sometimes other morphemes cannot be written in cuneiform, since it is a syllabary, and it's impossible to write a sequence of three consecutive consonants in it. This means that we should either write them in Latin script or not write them at all. The latter wouldn't be a favorable option, since, as in other Semitic languages, Akkadian words are often listed by root rather than alphabetically, so it'd be important to have a list of Akkadian roots and their derivatives as well as entries.

There is no real benefit to lemmatizing Akkadian in cuneiform; it is done at the expense of the legibility of our entries, as well as the accessibility of our content. So why sacrifice so much for so little?

Schedule:

Vote starts: 00:00, 22 May 2019 (UTC)
Vote ends: 23:59, 22 July 2019 (UTC)
Vote created: Tom 144 (𒄩𒇻𒅗𒀸) 18:29, 11 May 2019 (UTC)

Discussion:

Support

Support – Tom 144 (𒄩𒇻𒅗𒀸) 00:02, 22 May 2019 (UTC)
Support, and I'd support doing this for Hittite as well. Not even scholars use the cuneiform. —Mahāgaja · talk 07:01, 10 June 2019 (UTC)
Support. I am strongly against any proposal that suggests we should not write any language in its official standard script because of obscurity (this includes Egyptian). However, the technical difficulties with Unicode as of this moment are a serious issue. I do not know if other signs are as problematic or even worse, but for example the w:DINGIR sign is seriously different between Sumerian and Akkadian, yet they both share the same codepoint. Readers should get the right signs or no signs at all if their devices do not support the character block, but to give them signs that are that different, knowing that there is such a problem with Unicode, is not professional at all. So until the Unicode issue gets resolved, for the sake of accuracy I support this proposal. --Emascandam (talk) 02:11, 14 June 2019 (UTC)
That seems a weird reason to support. I actually don't see this as an issue. A simple solution is to have an Akkadian-specific font. If a specific Unicode character is indeed the better solution, we can always just use a bot to convert entries when that's released. No reason to throw the baby out with the bath water. --{{victar|talk}} 04:18, 14 June 2019 (UTC)
It is indeed true that Unicode inaccurately represents most Akkadian cuneiform signs. And as I said in the rationale, signs changed according to period and dialect, so more than a single set of signs would be required if we want to include different dialects at different stages (which I intend to do). Using fonts, however, we would only be able to display one set of signs at a time, so it would use the same fonts for Babylonian, Assyrian, Sumerian, and Hittite, all of which require a different set. Furthermore, they would have to be installed by the reader, which I highly doubt will happen. These are the reasons not a single Assyriological book uses Unicode; they rely on images instead. And as I've already said, that's the proper way to deal with this issue, but that's a matter for a different vote. –– Tom 144 (𒄩𒇻𒅗𒀸) 15:40, 14 June 2019 (UTC)
No, fonts on wikt can be both language and script dependent, so you can have one font for Akkadian, and another for Sumerian, even if they are the same script. --{{victar|talk}} 16:27, 14 June 2019 (UTC)
If your system has the proper fonts, 述 should appear differently in multiple places on the page. —Suzukaze-c ◇◇ 17:58, 14 June 2019 (UTC)

I wasn't aware of that. However, I do not see how you'd attempt to do this, because if the signs in the content of the entry are dependent on the language but the fonts of the title remain the same (which is what we're really discussing), the problem hasn't been fixed. The only thing this would achieve is to confuse the viewer, who would not know whether the correct form is the one in the title or the one in the content; see for example 𒀭 (DINGIR), where there are different contradictory images representing different stages of the sign. – Tom 144 (𒄩𒇻𒅗𒀸) 18:22, 14 June 2019 (UTC)
The head and links would be correct though. It might be too much to ask for the page header as well. I could make the same complaint for Middle Persian written in Avestan, which doesn't use Avestan ligatures, but it's really not that big of a deal if it's just the page name. --{{victar|talk}} 20:30, 14 June 2019 (UTC)

Yeah, it seems to be a matter of markup. Wouldn’t it be even stranger if “the same sign” is multiple times encoded and thus what is written DINGIR is spread across multiple pages? Also historical representations of Chinese letters don’t get encoded separately either. Fay Freak (talk) 20:14, 22 June 2019 (UTC)
Support trusting in the judgement of Tom 144 as being the primary editor of Akkadian content. —Suzukaze-c ◇◇ 04:28, 14 June 2019 (UTC)
(The rationale is sensible to me. —Suzukaze-c ◇◇ 17:58, 14 June 2019 (UTC))
Support--ჯეო/მიქაელ (talk) 15:43, 14 June 2019 (UTC)
I oppose this. Words should always be listed under their native script whenever possible, and obscurity of the script is absolutely no excuse. However, I agree that there are technical problems with cuneiform Unicode, so only out of necessity, I suppose that I have to Support instead. But if the technical challenges can be overcome in the future, then this policy should be revoted upon to be changed. Nicole Sharp (talk) 15:18, 24 June 2019 (UTC)
@Nicole Sharp, see above reply to Emascandam's vote. I really wouldn't call it much of a technical problem at all. --{{victar|talk}} 17:35, 24 June 2019 (UTC)
Unfortunately, an Akkadian font loaded with CSS will not always work outside of Wikimedia. Wiktionary is more than just a website since it is designed to be copied and redistributed in a wide variety of different media and formats. Simplest case would be if someone wants to copy and paste the text, they will lose the font formatting. Unicode should always be used for word entries, and if the script is not fully supported by Unicode, then I agree that Roman transliteration is the best fallback. But the display of cuneiform on Wiktionary should always use both Unicode and graphics (preferably SVG) in my opinion (for cuneiform that can be displayed by Unicode). Using graphics-only is a real pain, since then one has to look up the Unicode equivalent (assuming it exists). For displaying cuneiform, Unicode should ideally be used first, and then graphics as a fallback. Nicole Sharp (talk) 02:03, 25 June 2019 (UTC)
@Nicole Sharp: So your argument is that because not everyone has fonts installed for certain Unicode characters, we should all just use Latin characters everywhere? You also do realize that every entry has transliterations and transcriptions on it, right? --{{victar|talk}} 08:21, 28 June 2019 (UTC)
Support For all Sumerian cuneiform languages. —*i̯óh₁n̥C ^[5] 08:39, 6 July 2019 (UTC)
~~Support~~ I completely agree with Tom 144 both in the "Rationale" and in every comment written until now. --Nebhos2019 (talk) 18:01, 19 July 2019 (UTC)
Ineligible to vote per WT:VP: "2. Their account must have at least 50 edits in total...". — sur jec tion ⟨?⟩ 20:05, 19 July 2019 (UTC)

Oppose

Oppose Absolutely not. --{{victar|talk}} 07:13, 24 May 2019 (UTC)
Any reasons why? – Tom 144 (𒄩𒇻𒅗𒀸) 11:07, 24 May 2019 (UTC)
For many of the same reasons Fay Freak lists on the talk page, and to echo a point from the Sanskrit vote, if you can't handle working in Akkadian cuneiform, you shouldn't be working in it at all. --{{victar|talk}} 16:07, 24 May 2019 (UTC)
Editors would still be working on cuneiform, since the script would still be mandatory. The only difference is that the entry would be located under its transliteration. The content does not change at all, and the knowledge requirements would be the same. --– Tom 144 (𒄩𒇻𒅗𒀸) 19:02, 24 May 2019 (UTC)
Disagree. People will always be lazy, and the lazy route is just adding the transliteration. You can say it's "mandatory" until you're blue in the face. --{{victar|talk}} 20:15, 24 May 2019 (UTC)
This. Can’t force the people to work a certain amount unless by the rod. The bad characters rule the world, and optimism is what makes it even more mediocre. Fay Freak (talk) 23:32, 24 May 2019 (UTC)
This kind of pessimistic thinking leads nowhere. – Tom 144 (𒄩𒇻𒅗𒀸) 20:19, 26 May 2019 (UTC)

See my proposal. --{{victar|talk}} 17:44, 6 July 2019 (UTC)
Oppose Reasons verso. My prognosis is disfavourable. Fay Freak (talk) 23:32, 24 May 2019 (UTC)
The phrase "Reasons verso" is probably intended to mean "reasons are on the talk page of the vote". That people who intentionally make themselves hard to understand (as others have noted) still have a discussion right and voting right is the downside of a wiki where anyone can edit. --Dan Polansky (talk) 07:27, 1 June 2019 (UTC)
Oppose Transliteration pages should point to a cuneiform lemma. /mof.va.nes/ (talk) 23:43, 24 May 2019 (UTC)
Why? There’s absolutely no benefit in doing that. We lose legibility, accessibility, accuracy, and entries for roots. – Tom 144 (𒄩𒇻𒅗𒀸) 01:32, 7 June 2019 (UTC)

Abstain

Abstain The proposal is interesting. I like the following part of the rationale: "Books and dictionaries lemmatize words according to their transliteration or transcriptions. These are the very sources we use for our Akkadian entries." The opposition so far has produced only weak arguments. However, I will wait to see if more worthwhile discussion develops than has so far been the case. I would be happy to see some sources, like links to online dictionaries or to Google Books or other places where I can see the same thing the author of the proposal has seen. --Dan Polansky (talk) 07:22, 1 June 2019 (UTC)
@Dan Polansky: The Chicago Assyrian Dictionary is the most prestigious and complete dictionary of the field. Other sources include a glossary of Old Babylonian words in John Huehnergard's grammar book, and another glossary on Old Akkadian in Ignace Gelb's Old Akkadian grammar. Plus this fairly accurate online resource that I found. All of these lemmatize at romanizations. I'd challenge the opposition to provide a single source that lemmatizes at cuneiform. I feel the main unstated argument of the opposition is that this violates Wiktionary lemmatization tradition. – Tom 144 (𒄩𒇻𒅗𒀸) 16:52, 1 June 2019 (UTC)
@Tom 144: This argument has been played out so many times, it's practically a meme. What's best online isn't always what's best in print, and vice versa. We're not restricted by things like space and legibility, which I can see as legitimate reasons to not include cuneiform in an Akkadian dictionary. We have the ability to make a better, more complete dictionary, without such restrictions. --{{victar|talk}} 16:45, 20 June 2019 (UTC)
@Victar: As I've said numerous times now, cuneiform is still going to be included; the location of the entry is the only thing that changes. No content will be lost. None of the arguments I presented in the rationale are restricted to printed dictionaries. Why would the fact that Wiktionary is an online resource have anything to do with legibility? – Tom 144 (𒄩𒇻𒅗𒀸) 19:54, 20 June 2019 (UTC)
@Tom 144: And as I replied, that's an unrealistic expectation and the only way to have it truly be enforced is to mandate that the entries be in cuneiform. Come again? Print is both restricted in physical page size and print costs. For cuneiform to be legible, it needs to be larger than normal Latin text, especially in print where you have ink bleeding. Trying to say that online and print are no different is just plain silly. --{{victar|talk}} 20:07, 20 June 2019 (UTC)
@Victar: As I understand your main objection, you start from the assumption that people will be as lazy as they can be, and that unless they are forced to do some laborious task, they will opt not to do it. So lemmatizing at cuneiform serves as a measure against those who would try to ignore the policy and not include the cuneiform script. However, under this initial assumption, this whole project should not work, because no one is getting paid. Why would people work unnecessarily unless they were forced to do it?

Concerning the print and online discussion, I didn't say that print and online were no different; I said they are irrelevant to the arguments I raised. – Tom 144 (𒄩𒇻𒅗𒀸) 21:42, 20 June 2019 (UTC)
I don't see how getting paid has anything to do with it. I spend much of my time getting on people's case for being too lazy on en.Wikt, by not including sources, sloppy formatting, etc. If you think laziness of editors isn't a factor, then you've been working in isolation for too long, my friend.

It's absolutely relevant because you're making an apples-to-oranges comparison of print dictionaries to the online one on en.Wikt, which is flawed. --{{victar|talk}} 00:02, 21 June 2019 (UTC)

@Victar: Can you provide a link to a dictionary that places Akkadian lemmas on cuneiform, whether in print or online? And can you provide a link to a corpus-like resource that uses cuneiform where use of Akkadian terms in Akkadian sentences can be verified? --Dan Polansky (talk) 07:27, 22 June 2019 (UTC)
Just like I do with Old Persian, when I want to verify the cuneiform, I go straight to the texts. {{R:akk:Lenzi:2011}} has great facsimiles of texts, for one.

Secondly, I want to point out that there is a huge difference between dictionaries in transliterations ({{R:akk:GOA}}), and those in transcriptions ({{R:akk:CDA}}). I assume Tom is actually suggesting Akkadian entries in transcriptions, like ḫurāṣum, and not transliterations, like KU₃-SIG₁₇. @Tom 144, can you please clarify? --{{victar|talk}} 01:23, 23 June 2019 (UTC)

And thirdly, I'm also against Akkadian being lemmatized at all because many of these words only have a single attestation and a lemmatisation makes assumptions that aren't always correct. --{{victar|talk}} 01:23, 23 June 2019 (UTC)
@Victar: I'm sorry for not answering sooner, I have been very busy with my exams lately. I mean transliteration, because unlike transcription, it preserves information about the original orthography, since its purpose is in fact to describe the native script unambiguously. – Tom 144 (𒄩𒇻𒅗𒀸) 15:57, 27 June 2019 (UTC)
@Tom 144: Could you please indicate where 𒊓𒄠𒋢𒌝, 𒌋𒊏𒀸𒌴, and 𒌨𒈨 would be lemmatized, so we see some examples? --Dan Polansky (talk) 15:56, 28 June 2019 (UTC)
@Dan Polansky: 𒊓𒄠𒋢𒌝 would be at ša₁₀-am-šu₁₁-um as an alternative of UTU. 𒌋𒊏𒀸𒌴 would go at u-ra-aš-ṭu, and 𒌨𒈨 would have to be placed at UR.ME. I would try to lemmatize at syllabifications whenever possible, unless of course the logographic orthography is vastly more common than the syllabic one. Again, sorry for the delay. – Tom 144 (𒄩𒇻𒅗𒀸) 02:15, 30 June 2019 (UTC)
@Tom 144, Dan Polansky See my proposal. --{{victar|talk}} 17:43, 6 July 2019 (UTC)
Abstain Canonicalization (talk) 09:50, 5 June 2019 (UTC)

Decision

7-3-2 passed – Tom 144 (𒄩𒇻𒅗𒀸) 04:41, 29 July 2019 (UTC)

@Tom 144 This vote was ill-conceived and needs much discussion before you start moving entries. There needs to be better consensus on what format the entries should have first. --{{victar|talk}} 22:06, 29 July 2019 (UTC)

@Victar What do you think is worth re-discussing? Is it the format we should choose for entries or for transliteration? – Tom 144 (𒄩𒇻𒅗𒀸) 03:12, 8 August 2019 (UTC)

Speaking about format, how do you think roots should be listed? Should they be hyphenated, or in capitals? --– Tom 144 (𒄩𒇻𒅗𒀸) 03:22, 8 August 2019 (UTC)

@Tom 144: Well, as I wrote on the talk page, I think they should be transcriptions if they're lemmatized (@JohnC5 agrees), because lemmatizing transliteration poses the same problems as cuneiform, making it rather pointless. I also think transcriptions should be reconstructions, as that's what they really are. I recommend you go over all the points raised here and on the talk page and start a new discussion in the Beer Parlour, something that really should have happened before this vote. --{{victar|talk}} 16:28, 9 August 2019 (UTC)

Which is why I consider this vote void. It didn’t vote for something certain, and one can’t vote against cuneiform either if the alternative is an unknown – although hm, according to the result of this vote one can now put Akkadian in any form one likes. Great job, Tom. Fay Freak (talk) 00:12, 10 August 2019 (UTC)

I don't think transcriptions should be reconstructions unless the stem isn't attested, or is attested poorly. I'd support having the content at transcriptions instead of transliterations, and I agree with you that it is more convenient to have it lemmatized there, but I do not support moving all Akkadian lemmas to the reconstruction namespace. – Tom 144 (𒄩𒇻𒅗𒀸) 18:33, 19 August 2019 (UTC)

@Tom 144: And I'll repeat, you should start a discussion with these points in a new Beer Parlour thread. It's ridiculous that this is only being discussed after-the-fact, but here we are. --{{victar|talk}} 23:32, 19 August 2019 (UTC)