User talk:Suzukaze-c

  • [I am not good at communication.]

Mandarin readings

I was just curious about supposedly having the Mandarin readings "āo, āo, niū, " when other sources say they are "ǎo, ào, niù (not 100% sure about )". Bumm13 (talk) 08:29, 31 May 2017 (UTC)

Shoot, it must have been one of those cases where the code I used to generate the entry accidentally ignored tone marks... I must figure out a way to track them down. —suzukaze (tc) 19:57, 31 May 2017 (UTC)
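One possible way to track such cases down, as a rough sketch only: compare each generated reading against a reference reading and flag the pairs that differ solely in their tone marks. The function names are made up for illustration and the sample pairs are the readings quoted above; this is not the code that generated the entries.

```python
import unicodedata

def strip_tones(pinyin: str) -> str:
    """Remove tone marks from pinyin while keeping the diaeresis of ü."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    # U+0308 (diaeresis) is part of the vowel ü, not a tone mark, so keep it.
    kept = [c for c in decomposed
            if not unicodedata.combining(c) or c == "\u0308"]
    return unicodedata.normalize("NFC", "".join(kept))

def tone_only_mismatch(generated: str, reference: str) -> bool:
    """True when the two readings differ, but only in their tone marks."""
    return (generated != reference
            and strip_tones(generated) == strip_tones(reference))

# (reading on the page, reading from other sources), as quoted above
pairs = [("āo", "ǎo"), ("āo", "ào"), ("niū", "niù")]
suspects = [pair for pair in pairs if tone_only_mismatch(*pair)]
```

Here every pair ends up in `suspects`, which is the pattern one would expect if tone marks were being flattened somewhere in the generation step.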

ja-readings template

I just wanted to say thanks for what you did to the {{ja-readings}} template. Awesome job! thanks!! 馬太阿房 (talk) 07:42, 9 June 2017 (UTC)

You're welcome! User:Krun was the impetus though. —suzukaze (tc) 21:18, 9 June 2017 (UTC)
I'm curious to know where you got the list of jōyō kanji which designates certain jōyō readings as uncommon (the one the {{ja-readings}} template draws data from). Knowing that a reading is uncommon is useful information, so I'm glad that information is included, but I was wondering how trustworthy that information is. For example, I could be wrong, but I thought 久遠 read as くおん is more common than 久遠 read as きゅうえん but the く reading has the "Jōyō, uncommon" label. 馬太阿房 (talk) 17:12, 13 June 2017 (UTC)
@馬太阿房 It was extracted from the Japanese Wikipedia page. If you know of any better sources, I could look at them. —suzukaze (tc) 17:25, 13 June 2017 (UTC)
Okay, I think I figured out what it means, and it is useful information. I don't think it means that the reading is less common, but rather that it is used in just a few words, or maybe just one or two (words which may themselves be considered common, and which are more commonly read with that reading). 馬太阿房 (talk) 17:32, 13 June 2017 (UTC)

Edit

Please check what I did to the Japanese section under . Feel free to revert or change my editing as you see fit, but I thought it looked very confusing the way it was and decided to do something about it. I'm thinking maybe User:Krun had already asked you to do something about this too because I noticed that he was the one who had made the changes utilizing the {{ja-kanji}} template which doesn't work properly in cases like this. Can you maybe do something with this template and template documentation to make it more usable for cases like this? 馬太阿房 (talk) 22:28, 16 June 2017 (UTC)

Hmm, I don't think anyone has discussed {{ja-kanji}} nor this particular problem (shinjitai co-opting of older characters) before. I'll think about it. —suzukaze (tc) 00:55, 17 June 2017 (UTC)

Favour

Hiya. You can see what I'm doing; converting {{ja-accent-common}} to {{ja-pron}}. But can you do a big favour for me? Please list all of the pages that use the former template. Thank you! 220.244.238.202 07:15, 18 June 2017 (UTC)

Special:WhatLinksHere/Template:ja-pron, Special:WhatLinksHere/Template:ja-accent-common —suzukaze (tc) 07:40, 18 June 2017 (UTC)
ありがとうございます。220.244.238.202 08:14, 18 June 2017 (UTC)

/ 𪡏

The character is not being simplified correctly in the compounds section on its page. I presume that its simplified form, having been fairly recently added to Unicode, is missing from a simplification table in some module we have. Do you know how to fix this? There will also be several more new simplified characters missing from there, so it would probably be best to scour the most recent version of the Unihan database to get all the connections we’re missing. Edit: I noticed e.g. that the simplified form was added to the page late (2013), manually, by User:Bumm13, so I guess we’ve never had a good coverage of these later-encoded simplified forms. – Krun (talk) 00:57, 24 June 2017 (UTC)

@Krun: The data is at Module:zh/data/st (simp→trad) and Module:zh/data/ts (trad→simp). I've considered regenerating the data from other sources online (that are better than the Unihan database), but haven't gotten around to it... —suzukaze (tc) 01:04, 25 June 2017 (UTC)
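For the Unihan route suggested above, a minimal sketch of pulling trad→simp pairs out of lines in the `Unihan_Variants.txt` layout (tab-separated: codepoint, field, value) might look like the following. The `kSimplifiedVariant` field name comes from the Unihan database; the function names and the inline sample are illustrative, real input would be read from the file inside Unihan.zip, and other sources may give better coverage, as noted above.

```python
SAMPLE = """\
# Lines in the Unihan_Variants.txt layout (tab-separated)
U+4E1F\tkSimplifiedVariant\tU+4E22
U+4E22\tkTraditionalVariant\tU+4E1F
"""

def parse_codepoint(token: str) -> str:
    """Turn a token such as 'U+4E1F' into the character it names."""
    # Drop any '<source' suffix that some variant fields attach to a value.
    return chr(int(token.split("<")[0][2:], 16))

def trad_to_simp(lines):
    """Build {traditional: [simplified, ...]} from kSimplifiedVariant lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, field, values = line.split("\t", 2)
        if field != "kSimplifiedVariant":
            continue
        trad = parse_codepoint(source)
        # A field may list several values; ignore self-mappings.
        simp = [parse_codepoint(v) for v in values.split()]
        simp = [s for s in simp if s != trad]
        if simp:
            table[trad] = simp
    return table

table = trad_to_simp(SAMPLE.splitlines())
```

Diffing a table built this way against the existing data module would list the characters whose simplified forms are missing.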

Edit

Why isn't the reading なま showing the Jōyō kanji label in the readings section? I can't figure out what the problem is. Can you fix this? 馬太阿房 (talk) 20:12, 27 July 2017 (UTC)

Well, I see that the issue is resolved now. I don't know if you did anything, I did anything, or if it resolved itself. 馬太阿房 (talk) 20:37, 27 July 2017 (UTC)

Transclusion limits

In this diff you used a rather ingenious kludge to get the desired display. Unfortunately, that caused the entry to hit the template include size, and a number of templates toward the bottom of the page displayed as links to the templates themselves without any of the parameters. An IP (sort of) fixed this by selectively undoing the edit on those templates that weren't displaying, but the display is now inconsistent and less-than-optimum. I'm not sure the best way to fix this, but you may think of something I haven't (the easiest way would be to split the page, I guess). Thanks! Chuck Entz (talk) 01:40, 29 July 2017 (UTC)

@Chuck Entz: Damn, I hadn't noticed. I can only think of splitting the page. —suzukaze (tc) 05:50, 29 July 2017 (UTC)

Hanzi sortkeys

I was looking at the sortkey data modules when you were experimenting with a sortkey function in Module:sandbox. Your data module is pretty huge. I wonder, had you considered splitting it by Unicode codepoint or something? Then a function could get the codepoint for a character and look up its sortkey in the correct module. And then, at least on most pages (i.e., those that have characters from fewer submodules), the function would use less memory.

Isomorphyc divided up modules used for reference templates in a similar way, though I really don't understand how it works (see Category:Reference module sharded data tables). — Eru·tuon 06:23, 29 July 2017 (UTC)
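The lookup idea described above (get the codepoint, pick the matching submodule, read the sortkey from it) can be sketched roughly as follows. This is in Python rather than Lua, the shard width of 5000 and the class name are assumptions, and the sortkey strings are invented placeholders rather than real data.

```python
SHARD_SIZE = 5000  # assumed number of codepoints covered by each shard

class ShardedSortkeys:
    """Look up sortkeys from per-range tables, loading each table only once."""

    def __init__(self, shards):
        self._shards = shards  # shard index -> {character: sortkey}
        self._loaded = {}      # shards that have actually been touched

    def _load(self, index):
        # Stand-in for loading one data module on first use.
        if index not in self._loaded:
            self._loaded[index] = self._shards.get(index, {})
        return self._loaded[index]

    def sortkey(self, char):
        index = ord(char) // SHARD_SIZE
        return self._load(index).get(char)

    @property
    def loaded_count(self):
        return len(self._loaded)

# Placeholder data: U+3400 falls in shard 2, U+4E00 and U+4E01 in shard 3.
shards = {
    2: {"\u3400": "key-A"},
    3: {"\u4e00": "key-B", "\u4e01": "key-C"},
}
table = ShardedSortkeys(shards)
```

A page that only needs U+4E00 and U+4E01 touches a single shard, which is where the memory saving would come from.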

I have considered it. I know Module:zh does it at .check_pron() but I don't understand the code. —suzukaze (tc) 06:32, 29 July 2017 (UTC)
Hi @Erutuon, Suzukaze-c: Sorry I'm still away much longer than I thought. I am trying to make regular time for Wiktionary again. If you would like to pursue the sharding idea for data modules I can put my sharding code in GitHub or share it with some notes on my user page. It is in Python and it is simple conceptually; and I can help with explanations or cleaning up my Python style if necessary. Isomorphyc (talk) 18:30, 29 July 2017 (UTC)
I would add that I do not really recommend sharding if you can avoid it. It is not very friendly to other editors, and is too low-tech to maintain itself cleanly, while the robot client nexus with Lua is overpowered for data storage, brittle, and not very open. I wasn't able to find your data-intensive sort key module, but my binary hash index key is probably not appropriate; if you must shard, a user transparent key, such as stroke count, radical or initial Pinyin letter would probably be better. Isomorphyc (talk) 19:12, 29 July 2017 (UTC)
The module of sortkeys is at Module:User:Suzukaze-c/zh/data/skeys. It has single characters indexed to a radical (?) and a number. I was thinking of just putting it in a bunch of separate modules organized by Unicode codepoint. I'm new to programming and I'm not totally sure how it would work, but I might be able to figure it out. — Eru·tuon 19:21, 29 July 2017 (UTC)
I created a function to print out the content for the separate modules: Special:Permalink/47139321. — Eru·tuon 19:23, 29 July 2017 (UTC)
If you like I could share the code I used to process http://unicode.org/Public/10.0.0/ucd/Unihan.zip into Module:User:Suzukaze-c/zh/data/skeys. —suzukaze (tc) 23:24, 29 July 2017 (UTC)
Naw (if you're talking to me). I can just compile the submodules from your master module. See the current version of Module:sandbox/documentation. (I've exceeded the template include size, so it won't display in Module:sandbox. XD)
Only question is how big the submodules should be.
Right now, the module puts each range of 5000 codepoints (for instance, codepoints 13312 to 18311) in a single module, and it yields 36 submodules. (That indicates there are gaps, because there are 87870 codepoints total, and 87870/36 ≈ 2440. So the number of characters in each module probably varies.)
This system might be maintainable: we could keep the submodule-compiling function and your master sortkey module, and recompile the submodules if there are any changes. And because it's based on codepoints, the organization should be stable. However, I wonder if the submodules should be smaller. More work, but less likely to cause memory problems. I don't know how to figure out the actual amount of memory used by the module, though. — Eru·tuon 00:44, 30 July 2017 (UTC)
Bleh. There aren't actually that many modules. There's a big skip in the actual module numbers, corresponding to a skip in codepoints. There are still only 18 modules, approximately 87870 / 5000. — Eru·tuon 01:03, 30 July 2017 (UTC)
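A small sketch of the splitting step shows why the module numbers skip: shards are numbered by codepoint range, so ranges with no characters simply never get a module. The base of 13312 and the width of 5000 come from the range quoted above (13312 to 18311); the function names and the four-entry master table are invented for illustration.

```python
import math
from collections import defaultdict

BASE = 13312        # first codepoint of the first range quoted above
RANGE_SIZE = 5000   # codepoints per submodule

def shard_start(codepoint: int) -> int:
    """First codepoint of the range that contains this codepoint."""
    return BASE + (codepoint - BASE) // RANGE_SIZE * RANGE_SIZE

def split_master(master: dict) -> dict:
    """Group a {character: sortkey} table into per-range sub-tables."""
    shards = defaultdict(dict)
    for char, key in master.items():
        shards[shard_start(ord(char))][char] = key
    return dict(shards)

# Invented master table with a large gap before U+20000.
master = {
    chr(13312): "a", chr(18311): "b",  # both fall in the first range
    chr(18312): "c",                   # first codepoint of the second range
    chr(0x20000): "d",                 # far beyond the gap
}
shards = split_master(master)
starts = sorted(shards)                 # only non-empty ranges appear

# Rough module count if the 87870 codepoints were contiguous.
expected_modules = math.ceil(87870 / RANGE_SIZE)
```

With this toy input only three ranges are non-empty even though the highest one is the 24th range counted from the base, and `expected_modules` works out to 18.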
(I imagine that theoretically we could exclude a good number of the characters as being unused, but the modules might be used for unanticipated purposes in the future... —suzukaze (tc) 02:10, 30 July 2017 (UTC))
My experience was that 5k-50k is a good module size. One can be on the small side of the range if a single page might have a large number of different module transclusions; otherwise on the larger side. It is necessary to optimise for the worst case scenarios, usually involving short, high traffic, multi-language pages, depending on use case. I regularly made the mistake of not checking Category:Pages_with_module_errors often enough after changes for memory errors. Good luck. I did look over your data and I agree this is a very good candidate for sharding, and code point value is a good key. Isomorphyc (talk) 05:23, 30 July 2017 (UTC)
I'm starting to create the modules. I think they're on the lower end of the range that you give. That should be safer. — Eru·tuon 06:19, 30 July 2017 (UTC)

Okay, I've got less than half of Module:User:Suzukaze-c/zh/data/skeys added to subpages of Module:zh-sortkey/data. You can see the results on Module:zh-sortkey. (I should probably zero-pad the pagenames....) — Eru·tuon 07:26, 30 July 2017 (UTC)

I moved the pages to zero-padded versions. So there are like 80 modules left to go. If you'd be willing to help, I'd greatly appreciate it. Just go to Module:User:Erutuon/zh/documentation and copy the module code into the appropriately numbered subpages of Module:zh-sortkey/data/. — Eru·tuon 18:07, 30 July 2017 (UTC)
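As a sketch of why zero-padding helps: page names sort as text, so unpadded numbers interleave (1, 10, 2) while padded ones stay in numeric order. The `Module:zh-sortkey/data/` prefix is from the discussion above; the three-digit width and the helper name are assumptions, not the actual naming scheme.

```python
PREFIX = "Module:zh-sortkey/data/"

def shard_page(number: int, width: int = 3) -> str:
    """Page name for a numbered shard, zero-padded to a fixed width."""
    return f"{PREFIX}{number:0{width}d}"

numbers = [1, 2, 10]
unpadded = sorted(f"{PREFIX}{n}" for n in numbers)
padded = sorted(shard_page(n) for n in numbers)
```

`unpadded` comes out with shard 10 before shard 2, whereas `padded` keeps 001, 002, 010 in order.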

@DTLHS: Thank you for doing this; you beat me to it. Isomorphyc (talk) 21:48, 30 July 2017 (UTC)
Heh, I was planning to do it by hand and ask others to help, but then I thought, This is stupid, a bot should do it, so I asked DTLHS. But I hadn't considered that you might do it with OrphicBot. — Eru·tuon 22:00, 30 July 2017 (UTC)

{{ja-readings}}

Hi. I saw your work on the new format of {{ja-readings}} and it looked much nicer. Just asking if it was possible to add the correspondence between the reading section and the kanji definitions for 多音字 like ? --Dine2016 (talk) 01:21, 19 August 2017 (UTC)

Like 樂#Korean? (which is an experiment) —suzukaze (tc) 04:12, 19 August 2017 (UTC)
The Korean hanja section looks great. I'd like to wait until we have enough Japanese 漢和 coverage, though. --Dine2016 (talk) 06:03, 19 August 2017 (UTC)

By the way, what would be the criterion for inclusion of kun-readings? I've just got a copy of the three-volume 広漢和辞典, but it seemed to list too many kun-readings, some of which are only used for reading kanbun. --Dine2016 (talk) 03:01, 8 September 2017 (UTC)

I'm not sure... —suzukaze (tc) 03:12, 8 September 2017 (UTC)

Edit

Please check your last edit there. It looks like there are at least a couple of bogus/empty items in the list (at least those seem to be hyphens, not kana- but what do I know). The reason I noticed is that it led to the automatic creation of Category:Japanese kanji with kun reading -, which {{auto cat}} doesn't know what to do with. Chuck Entz (talk) 02:58, 23 August 2017 (UTC)

Darn. Thanks for noticing.—suzukaze (tc) 03:01, 23 August 2017 (UTC)
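A check along these lines could catch such items before they turn into categories. It is a sketch only: it assumes readings arrive as plain strings and treats any item without at least one kana letter as bogus. The function names are made up, and apart from なま the sample readings are illustrative.

```python
def has_kana(reading: str) -> bool:
    """True if the string contains at least one hiragana or katakana letter."""
    return any(
        "\u3041" <= c <= "\u3096"      # hiragana letters
        or "\u30a1" <= c <= "\u30fa"   # katakana letters
        for c in reading
    )

def drop_bogus(readings):
    """Keep only the items that contain some kana."""
    return [r for r in readings if has_kana(r)]

# "-" and "" stand in for the empty items described above.
cleaned = drop_bogus(["なま", "-", "", "い-きる"])
```

A reading that merely contains a hyphen alongside kana is kept; only items with no kana at all are dropped.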

Cáo Cāo

Objecting to your proposed quick deletion, I changed it to a discussion. Personal names should generally be excluded, but limited exceptions should be considered for specific reasons. I was just trying to delink 曹操. Jusjih (talk) 00:05, 2 September 2017 (UTC)

So far, the pinyin forms of deleted entries have also been deleted. Why should we keep a non-lemma form of a deleted entry? —suzukaze (tc) 00:06, 2 September 2017 (UTC)
I have a new thought at Wiktionary:Requests for deletion#Cáo Cāo.--Jusjih (talk) 03:03, 3 September 2017 (UTC)

zoosexuality

Probably, but luckily Metaknowledge just protected it. - Amgine/ t·e 23:31, 18 September 2017 (UTC)

, and on'yomi of the pattern CVchi

I was curious about this change.

Generally speaking, I remember reading somewhere that at least some of the on'yomi ending in ち are essentially reconstructed and only found in kanji dictionaries, based on Middle Chinese readings that ended in /t/. Meanwhile, the つ ending for such readings would basically be an allophone, as the alternative Japanese nativization of a final /t/.

More specific to the entry, this would point to the がつ reading as the expected goon pair for がち. What is the basis for removing がつ from goon?

Also, is there any attestable evidence for the ごち reading in the historic record? I've had a poke in my resources and can't find anything.

Curious, ‑‑ Eiríkr Útlendi │Tala við mig 17:49, 20 September 2017 (UTC)

I don't know about the other readings but gatsu is already listed as kan'youon. —suzukaze (tc) 22:36, 20 September 2017 (UTC)