Wiktionary:Votes/pl-2019-05/Lemmatize Japanese wago words at kana spellings

Lemmatize Japanese wago words at kana spellings

Voting on: Change Wiktionary:About Japanese#Lemma entries to either:

(1) Lemmatize all wago words at kana spellings.

Lemma entries

Following are the guidelines for entries for the lemma form of Japanese terms: (vote)

  • For wago (和語) words aka Yamato-kotoba (大和言葉), i.e. words of native Japanese origin, the kana form is considered the lemma.
  • For everything else, the most common or most regular spelling is considered the lemma.
  • When the situation is unclear, editors are advised to use their best judgment on a case-by-case basis.

For non-lemma entries (e.g. for alternative spellings, rōmaji, and conjugated forms), see #Non-lemma forms for the more abbreviated form to use instead.

Or (2) only lemmatize rare or archaic wago at kana spellings.

Lemma entries

Following are the guidelines for entries for the lemma form of Japanese terms: (vote)

  • As a general rule, the most common spelling is considered the lemma.
  • Rare or archaic 和語 (wago) terms may use the kana spelling as the lemma, to avoid cramming kanji entries with too many readings. (This does not apply to proper nouns, specialized terms, etc.)
  • When the situation is unclear, editors are advised to use their best judgment on a case-by-case basis.

For non-lemma entries (e.g. for alternative spellings, rōmaji, and conjugated forms), see #Non-lemma forms for the more abbreviated form to use instead.

Schedule:

Discussion:

Lemmatize all wago at kana

  1.   Support Suzukaze-c 01:08, 13 May 2019 (UTC)
    I also support the second option if there would be "no consensus" between options 1 and 2 otherwise. —Suzukaze-c 23:12, 27 May 2019 (UTC)
  2.   Support. By the way, can we create a category for wago terms such as Category:Native Japanese nouns? Here's a shortcut to look for native Japanese verbs, which are easier to identify. KevinUp (talk) 05:38, 24 May 2019 (UTC)
    My main reason for supporting this option is that native Japanese words tend to have many different kanji spellings: つく (tsuku) has about fourteen. Since Wiktionary is an etymological dictionary, I would prefer to see words lemmatized in a way that better explains their etymology. KevinUp (talk) 06:04, 24 May 2019 (UTC)
  3.   Support I see that I argued against an early version of this proposal (March last year), but I think the case for wago entries is very well thought out. (I could write forever, but most of it has already been said.) Imaginatorium (talk) 07:02, 24 May 2019 (UTC) Follow-up comment: I would like to vote against the idea of restricting this to "rare" (who decides what's rare?) or "archaic" (marginal case anyway) entries. Saying "like any Western language" doesn't work, because Japanese isn't. Imaginatorium (talk) 08:59, 24 May 2019 (UTC)
    "Rare or archaic": Perhaps using the official Jōyō kanji list as a standard? As a counterargument, is a 表外字, but it's very common. --Dine2016 (talk) 09:25, 24 May 2019 (UTC)
  4.   Oppose "Lemmatize all wago at kana" per TAKASUGI Shinji: "we should use the most common spellings for words currently in use, just like any Western languages." I cannot place this into the oppose section, which would apply to both proposals at once; each proposal needs its own oppose section. I see no problem with "Lemmatize only rare/archaic wago at kana", and I abstain on it. --Dan Polansky (talk) 09:33, 7 June 2019 (UTC)
    Comment: I think there is a huge problem with using the term "spelling" to refer to completely different ways of writing something. It sounds supremely convincing to say "Use most common spelling", but this is not what is being talked about. Consider in English: is the cardinal 1 more commonly written as "1" or "one"? Would you call these "alternative spellings"? I think not. Dictionaries of the English language normally include this under "One", because that is its spelling. If you have to make an analogy with Japanese it would be that 'やま' is the "spelled-out" form of '山'. Imaginatorium (talk) 05:06, 10 June 2019 (UTC)
  5.   Support, albeit with the understanding that this is largely a technical consideration due to the inherent limitations of the MediaWiki software.
    One argument that leads me to this support is that various terms, even (especially?) very common terms, are spelled differently only for specific senses, while the core meanings and etymologies are identical. Cf. the many spellings of つく (tsuku). Breaking all of these out to separate entries requires a lot of data duplication. Native-language electronic dictionaries get around this by containing all the core information in one place, and essentially having all relevant spellings included in a single headword line. Example headword line from my electronic copy of Shogakukan's KDJ: つ・く【突く・衝く・撞く・搗く・舂く・築く・吐く】. A search for any of the kanji spellings will deliver the user to the entry, and a search for the kana spelling つく (tsuku) will give the user a list of entries to choose from that match that reading. Since we can't seem to come up with a way to induce the MediaWiki framework to allow us to use a similar organizational scheme, lemmatizing at the kana spelling strikes me as the best way forward. ‑‑ Eiríkr Útlendi │Tala við mig 04:21, 10 June 2019 (UTC)
  6.   Support (quite strongly) User Eiríkr Útlendi sums it up well. Many wago don't have a specific kanji, and often when kanji differ depending on reading, sources are inconsistent. There's also the problem of inconsistency in which kana are written after the kanji for the inflectional endings -- some of that variation is easy to decide on per common usage, but much is not, not without us imposing our POV as to how Japanese should be written. Our trying to force order into this chaos is not likely to be practical, and not what we're here for anyway. kwami (talk) 02:07, 29 July 2019 (UTC)
  7. I'm between the   Abstain and   Oppose sides... per TAKASUGI Shinji and Dan Polansky. ~ POKéTalker 23:18, 8 August 2019 (UTC)

Lemmatize only rare/archaic wago at kana

  1.   Support: we should use the most common spellings for words currently in use, just like any Western languages. That is what users look for in a dictionary. Moving out rare readings is for readability. — TAKASUGI Shinji (talk) 05:43, 24 May 2019 (UTC)
    I think it's mistaken to draw false analogies between Japanese and Western languages, because Japanese is fundamentally different from Western languages: a word in a Western language usually has one spelling, but a word in Japanese usually has two. For example, English 'book' has only one spelling, but Japanese 本 has two spellings, 本 and ほん. The English verb 'read' has only one spelling, but Japanese 読む has two spellings, 読む and よむ (and a few others). As a result, a printed dictionary of a Western language uses the most common spelling of each word as its headword, but a printed dictionary of Japanese uses the kana spelling of each word as its headword, even though kanji spellings are more common for most words. If printed dictionaries of Japanese used the most common spelling of each word as the headword, looking words up would be more difficult, which would hinder rather than help users. (Whether Wiktionary should use the most common spelling as the lemma is a separate issue.) --Dine2016 (talk) 08:59, 15 August 2019 (UTC)
  2.   Support --Anatoli T. (обсудить/вклад) 08:36, 24 May 2019 (UTC)
  3.   Support I agree that the most common spelling should be the lemma; that will not always be a kana spelling. (By the way, I appreciate the "use their best judgment" caveat in both suggestions.) Cnilep (talk) 09:46, 25 May 2019 (UTC)
  4.   Support. The "most common written form as lemma" rule is sensible. Deryck Chan (talk) 13:29, 14 June 2019 (UTC)
    (Indented, as a vote added after the close. ‑‑ Eiríkr Útlendi │Tala við mig 23:27, 27 June 2019 (UTC))
    Unindented due to the extension. —Suzukaze-c 04:38, 30 June 2019 (UTC)
  5.   Support (explicit support as suggested by User:Dan Polansky on the talk page) —Suzukaze-c 01:17, 1 July 2019 (UTC)
  6.   Support Xbypass (talk) 15:59, 1 July 2019 (UTC)
  7.   Oppose: I've noticed that in single-character kanji entries such as (hi), wago derived terms of (hi) may have spellings that do not contain :
    1. (higashi)
    2. (hiko)
    3. (hime)
    4. (hiru)
    5. 光る (hikaru)
I would like to propose that wago of single-character kanji such as () and (やま) be lemmatized at kana and redirected using {{ja-see}}, following the format currently used in (waga, a, are). The main reason is that entries such as (mizu) and (hito) are currently experiencing memory errors. For example, I recently broke the entry for in this edit. KevinUp (talk) 22:34, 26 July 2019 (UTC)
Han character entries that have Lua memory overflows can be found here: Special:WhatLinksHere/Template:character_info/subpage (this template reduces Lua memory)
As I continue to work on Han character entries, adding sections such as Japanese, Korean and Vietnamese as well as Classical Chinese quotations, Lua memory overflows are bound to occur in the future. KevinUp (talk) 15:49, 28 July 2019 (UTC)
@Dan Polansky, Chuck Entz Would it be possible to ask those who have already voted on "Option 2: Lemmatize only rare/archaic wago at kana" to make a second vote to see whether they would support or oppose making an exception for single-character kanji entries (lemmatizing wago of single-character kanji at kana form) due to technical limitations? KevinUp (talk) 15:49, 28 July 2019 (UTC)
Could you explain how your rationale is related to your vote? hi, higasi, hiko, hime etc. would all be separate entries regardless, so I don't understand what they have to do with this. kwami (talk) 02:03, 29 July 2019 (UTC)
@Kwamikagami: What I mean is that (higashi), (hiko), (hime), (hiru), 光る (hikaru) are currently listed as ====Derived terms==== of (hi). This may lead some people to think that kanji such as , , , are derivatives of the kanji which is not true. KevinUp (talk) 22:50, 29 July 2019 (UTC)
If higashi, hiko, hime, hiru, hikaru, etc. which are etymologically related were listed as derived terms of (hi, day; sun), I think that would be less confusing for our readers. KevinUp (talk) 22:50, 29 July 2019 (UTC)
I agree. I guess I'm confused as to how that counts as 'opposed', when it would be solved by making the change? kwami (talk) 23:35, 29 July 2019 (UTC)
I was also confused by the "oppose" header in this vote, so I have moved my {{oppose}} vote to option 2 to reduce ambiguity. Anyway, I had already voted to support option 1 (lemmatize all wago at kana) earlier. KevinUp (talk) 00:46, 30 July 2019 (UTC)
Anyway, my main concern is that single-character kanji are taking up too much Lua memory. I hope that those who have voted for option 2 will be willing to make an exception for single-character kanji and to lemmatize wago spellings of kanji such as (ひと) and (みず) at kana form to solve the issue of "Lua error: not enough memory" at entries such as and . KevinUp (talk) 22:50, 29 July 2019 (UTC)
Lua memory used (limit: 50 MB):
50 MB (error)
50 MB (error)
50 MB (error)
50 MB (error)
48.24 MB
47.72 MB
46.48 MB
46.42 MB
46.41 MB
46.17 MB
45.46 MB
45.44 MB
45.28 MB
44.62 MB
44.52 MB
44.50 MB
44.43 MB
43.29 MB
43.24 MB
42.98 MB
The table above is a random selection of entries that are approaching the 50 MB memory limit. KevinUp (talk) 22:50, 29 July 2019 (UTC)
Again, I seem to be missing something. I read this as an argument in support. kwami (talk) 23:35, 29 July 2019 (UTC)
Okay, to make things clearer, I am opposing option 2 (I had previously put {{oppose}} under the "Oppose" header below). KevinUp (talk) 00:46, 30 July 2019 (UTC)
  8.   Support “Lemmatize *some* only rare/archaic wago at kana” per KevinUp, anything to reduce the overflowing memory. Why not create something like 水/Chinese, 水/Japanese, etc.? Oh I get it, the {{head|langcode|head=}} problem. ~ POKéTalker 23:18, 8 August 2019 (UTC)
    Regarding the issue of overflowing memory, I will start a separate Beer Parlour discussion. A tentative proposal is to redirect wago forms of kanji taught at primary school level (教育漢字 (kyōiku kanji)) to kana form. This will affect only kanji readings written without okurigana (送り仮名). KevinUp (talk) 12:11, 11 August 2019 (UTC)

Oppose

  1.   Oppose I have a slightly different opinion. The most-common-spelling policy should be followed, except when one wago word has different kanji spellings, like 着く/付く/就く or 掛かる/架かる/懸かる. Only in that case is lemmatizing at kana beneficial enough to resort to, as it reduces redundancy. Also, I strongly oppose treating wago proper nouns, like 富士山 or 埼玉, in this way, since the spellings of proper nouns are well established and should not be changed without really good reasons. -- Huhu9001 (talk) 03:56, 1 June 2019 (UTC)
    Comment: Regarding wago proper nouns, placenames and family names have fixed spellings. Indeed, these are well established, so it would make sense to use the most common spelling. However, the kanji spellings of given names can come in many different combinations, which is why I would prefer to see Japanese given names lemmatized at their hiragana spelling.
    I think this is also a good suggestion, i.e. to stick to the most common spelling for all terms, except when one wago word has different kanji spellings. If this were available as a third option, I would vote for it. KevinUp (talk) 10:22, 15 June 2019 (UTC)
  2. Generally speaking,   Oppose per Huhu. Generally speaking, list words at their most common "spelling" (obviously with a link from the kana spelling if the most common is kanji). Treat cases with multiple common kanji "spellings" individually on their merits, including the option of listing at the hiragana spelling. Mihia (talk) 00:27, 27 June 2019 (UTC)
    Indented, as being a vote made after the end of the vote. —Suzukaze-c 06:51, 27 June 2019 (UTC)
    Unindented due to the extension. —Suzukaze-c 04:38, 30 June 2019 (UTC)

Abstain

  1.   Abstain I'm actually OK with any of the three outcomes. I created the vote to finalise the soft-redirection templates based on the outcome. For example, {{ja-spellings}} was originally designed for wago-at-kana entries, so I put it on the right side of the page, expecting it would be in complementary distribution with {{ja-kanjitab}}. If it's decided that we lemmatize a large number of wago at kanji entries, then we definitely need to change its appearance so it does not clash with {{ja-kanjitab}}. Dine2016 (talk) 02:39, 12 May 2019 (UTC)

Decision

If we consider two votes separately with general "oppose" counting for both proposals:

  • proposal 1: 5 support, 3½ oppose, 1½ abstain: support/oppose ratio ~59%, no consensus
  • proposal 2: 7 support, 3 oppose, 1 abstain: support/oppose ratio 70%, passed

Thus proposal 2 ("Lemmatize only rare/archaic wago at kana") passes. — surjection? 11:27, 20 August 2019 (UTC)

I cannot agree -- I voted in favor of Proposal 1 (lemmatizing all Japanese wago terms at kana spellings), and as these two proposals appear to be mutually exclusive, I also intended my vote to be in opposition to Proposal 2 (lemmatize only rare/archaic wago at kana spellings). I may not be alone in my view, raising the distinct possibility that the support/oppose ratio for Proposal 2 should be seen as lower than stated above. The same applies if votes for Proposal 2 were also intended to be in opposition to Proposal 1. ‑‑ Eiríkr Útlendi │Tala við mig 21:51, 21 August 2019 (UTC)
They are mutually exclusive, but in that situation there should have been an oppose vote for proposal 2. If we consider all proposal 1 support votes to also support proposal 2 and vice versa, that seems unfair for both, since the support votes are not necessarily mutually exclusive. Arguably the vote itself wasn't organized correctly. — surjection? 19:50, 23 August 2019 (UTC)
Yeah, I agree that this proposal was badly designed, and I'm sorry for that. Passing proposal 2 isn't a bad idea because (1) it allows us to move some kanji entries to kana (such as (mono) and 居る), (2) it does not prevent proposal 1 from being proposed in a future vote, and (3) there are many problems with proposal 1, such as randomness of compounds. --Dine2016 (talk) 04:18, 24 August 2019 (UTC)