
2 things

Hi :)

Could you help with this?

  1. review and commit my change to TranslationAdder.js removing the balancer buttons and reliance on trans-mid. I have used it daily since the change and have not seen any problems after removing it.
  2. Add me to this list so that I am able to use JWB.--So9q (talk) 10:24, 9 September 2019 (UTC)
Your change to the gadget looks okay, so I'll copy it to the gadget page.
I'm just an interface admin, so I can't edit Wiktionary:AutoWikiBrowser/CheckPage. You'll have to get the attention of a real admin (sysop). — Eru·tuon 18:07, 9 September 2019 (UTC)

Lua memory usage

Hi, I found this via your common.js: User:Erutuon/scripts/simpleTranslations.js. It contains this: {{[[T:t-simple|t-simple]]}} for Latin-script terms with just lang, term, and gender, to reduce Lua memory usage, using [[User:Erutuon/simpleTranslations.js|JavaScript]]

Is this still relevant? If yes, would it not be a good idea to improve the TranslationAdder.js to insert these for da, no, nb, etc.? WDYT?

I saw that some pages have sub-pages /translations to work around the Lua memory issue. Can massive use of t-simple avoid that?--So9q (talk) 10:40, 9 September 2019 (UTC)

No, the translation adder shouldn't use {{t-simple}}. It's just a workaround on pages that are in CAT:E because they are using too much Lua memory. And {{t-simple}} doesn't always reduce memory enough to remove the error messages; that's why there are translation subpages. — Eru·tuon 17:05, 9 September 2019 (UTC)

Community Insights Survey

RMaung (WMF) 14:34, 9 September 2019 (UTC)

Context deprecation and red message

In {{context}}, I restored the version that does not show the long red message. The point of deprecation as opposed to deletion is to keep page histories legible. I did that after I noticed unexpected illegibility in page histories and then found its source.

I understand this was an attempt to prevent people from using the template. There is a better way, preserving history legibility: create an edit filter that prevents people from saving an entry that contains a deprecated template. No one has created such a filter yet and I don't know why; I fear I do not have enough user rights to edit these filters.

In any case, we have deprecation under control via Category:Pages using deprecated templates, which now contains 4 pages. I am cleaning up the category once in a while, and I remember similar counts. It is very manageable. With the edit filter, it would be even easier. --Dan Polansky (talk)

Not a bad approach to the problem. So much edit history is virtually unusable because of deprecation. DCDuring (talk) 14:47, 12 September 2019 (UTC)
@Dan Polansky: I'm also generally in favor of keeping histories legible, but got a bit carried away so I added the error message. Since you are keeping an eye on the category, it makes sense to remove it. I do like the idea of an edit filter for frequently used deprecated templates, but I'm not an admin either. — Eru·tuon 00:15, 13 September 2019 (UTC)

Administrator?

You do a lot of valuable work with templates and modules. Would you consider becoming an administrator? — SGconlaw (talk) 11:51, 13 September 2019 (UTC)

Good idea. You would have access to more things. We wouldn't make you do more patrolling. DCDuring (talk) 13:10, 13 September 2019 (UTC)
Not that I would mind if we had more people patrolling... —Μετάknowledge discuss/deeds 16:51, 13 September 2019 (UTC)
I'm grateful for what he does. I run into vandalism that he's undone all the time. Chuck Entz (talk) 22:23, 13 September 2019 (UTC)
I'm surprised Erutuon is not an admin already! —AryamanA (मुझसे बात करेंयोगदान) 18:50, 14 September 2019 (UTC)
He's been offered the position before: see here. 31.173.87.215 18:54, 14 September 2019 (UTC)
I refused before, but I guess I'd be willing now if there's something I could do with the admin tools. Perhaps protecting vandalized modules and templates and moving pages. — Eru·tuon 19:31, 14 September 2019 (UTC)
Great! Let me see if I can figure out how to nominate you. (Unless someone else wants to jump in and do it first ...) — SGconlaw (talk) 19:51, 14 September 2019 (UTC)
@Sgconlaw: Done. Please endorse the nomination. 31.173.83.164 12:15, 15 September 2019 (UTC)
Oh, thanks, 31.173.83.164! Erutuon, you need to indicate your acceptance on the voting page. — SGconlaw (talk) 14:46, 15 September 2019 (UTC)

Erroneous conversion to t-simple

Hi, I just discovered that these entries have been converted by you to t-simple because of the Lua memory bug but in a way that does not show the information about gender.

* Danish: {{t-simple|da|næse|c|langname=Danish|interwiki=1}}

This is correct:

* Danish: {{t-simple|da|næse|g=c|langname=Danish|interwiki=1}}

--So9q (talk) 11:33, 16 September 2019 (UTC)

Ouch. Good catch. I'm going to have to figure out if it's better to make parameter 3 be gender, or convert these to use |g= and change my script. — Eru·tuon 18:24, 16 September 2019 (UTC)
Census of parameters in {{t-simple}} from the latest dump:
  • |1=: 16129
  • |2=: 16129
  • |3=: 3716
  • |4=: 1
  • |alt=: 141
  • |g=: 323
  • |interwiki=: 6342
  • |lang=: 1
  • |langname=: 15341
  • |lit=: 1
  • |sc=: 66
  • |tr=: 317
Since |3= is so common (because of me no doubt), {{t-simple}} now accepts the gender in either |3= or |g=. I also checked and there was only one instance with both |3= and |g=, which I corrected. — Eru·tuon 20:28, 16 September 2019 (UTC)
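The "either |3= or |g=" behavior described above could be sketched in Lua roughly as follows. This is only an illustration under stated assumptions: `get_gender` is a hypothetical helper, not the actual {{t-simple}} implementation, and raising an error when the two parameters disagree is an assumed policy rather than something the template is known to do.

```lua
-- Hypothetical helper (an assumption, not the real {{t-simple}} code):
-- accept the gender from either the third positional parameter or |g=.
local function get_gender(args)
	local positional, named = args[3], args.g
	-- Assumed policy: refuse to guess when both are given and disagree.
	if positional and named and positional ~= named then
		error("conflicting genders in |3= and |g=")
	end
	return named or positional
end

print(get_gender({ "da", "næse", "c" }))      -- c
print(get_gender({ "da", "næse", g = "c" }))  -- c
```

With a helper like this, both forms of the Danish example shown earlier in the thread would yield the gender "c".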
Nice! Thank you, again, again :)--So9q (talk) 20:56, 16 September 2019 (UTC)

English at top

Concerning this, do you have a link to a policy or vote stating this norm? I found nothing in wt:EL and other style pages I looked at.--So9q (talk) 08:05, 18 September 2019 (UTC)

@So9q: From ELE:
Priority is given to Translingual: this heading includes terms that remain the same in all languages. This includes taxonomic names, symbols for the chemical elements, and abbreviations for international units of measurement; for example Homo sapiens, He (“helium”), and km (“kilometre”). English comes next, because this is the English Wiktionary. After that come other languages in alphabetical order.
Giorgi Eufshi (talk) 10:43, 18 September 2019 (UTC)
OK, that makes sense. --So9q (talk) 11:18, 18 September 2019 (UTC)

Admin

Congratulations! Chuck Entz (talk) 13:01, 30 September 2019 (UTC)

Yeah, you are awesome and admin --Vealhurl (talk) 17:52, 10 October 2019 (UTC)
Indeed, congrats! — SGconlaw (talk) 20:16, 10 October 2019 (UTC)

wikt:majolica n.

Re your reversion, removal of images: The word majolica has been dogged with confusion since it is used for two distinctly different products in different countries in different periods of time. All dictionaries other than Wiktionary define it inaccurately or omit one sense of the word. Hard to believe but true. The two products, the two meanings of majolica, the two majolicas are visibly different. I feel the deleted images assist understanding and warrant an exception to the 'minimal images' rule.
Davidmadelena (talk) 23:10, 15 October 2019 (UTC)

@Davidmadelena: I have no objection to illustrating the two definitions – it's just not clear to me why so many images are needed. Why wouldn't two images, one for each definition, be enough? (This is an honest question – I hadn't heard of majolica before the entry showed up in my possibly incorrect headers cleanup page.) If you could find two images that clearly illustrate the differences in the two techniques, that would be ideal. To allow people to see more images, you can create a page on Wikimedia Commons (see c:Category:Majolica) and link it from the entry using {{commons}}. — Eru·tuon 23:33, 15 October 2019 (UTC)
@Eru: Overnight I had reached the same conclusion: two images to clearly illustrate the difference. Done, and thanks, much better now.Davidmadelena (talk) 10:15, 16 October 2019 (UTC)
I think you mean @Erutuon: :) Eru (talk) 16:36, 16 October 2019 (UTC)
@Davidmadelena, Eru: My confusing signature is to blame... — Eru·tuon 16:39, 16 October 2019 (UTC)

Removing control chars

Some of these should not be removed, but rather replaced with an em dash, e.g. [1]. Equinox 21:25, 18 October 2019 (UTC)

@Equinox: Oh, yeah, that makes sense. I'll go and clean up after myself. — Eru·tuon 21:36, 18 October 2019 (UTC)

Template:t-simple

Regarding diff, I thought the whole point of translation subpages was that they would avoid Lua memory problems without the need for the clumsy {{t-simple}} template. That's why I've been going through them and removing it from them. If you're readding it though, then we're working at cross purposes. —Mahāgaja · talk 09:40, 24 October 2019 (UTC)

@Mahagaja: I switched translations in fire/translations to {{t-simple}} because it was running out of memory. In general I'm in favor of having translation subpages use {{t}} and {{t+}} if they can without running out of memory; if fire/translations can be switched back (maybe I should make a script for this), it should be. — Eru·tuon 16:07, 24 October 2019 (UTC)
Good heavens, you're right: it was running out of memory. That's kind of appalling. But I agree that using {{t-simple}} in that case is unavoidable. —Mahāgaja · talk 19:50, 24 October 2019 (UTC)

Thank you

I'm extremely new to Lua. Having a solid background in JavaScript has helped me transition, but I appreciate the improvements you've offered. I just wanted to tell you that I've been working on a major update to the module script, which I've been editing offline because...Wiktionary's editor isn't as convenient as EditPad for indentation, regular expressions text search and replacement, etc.

Some background information: I know that ideally, if I can get more people to help me out with Marshallese maintenance on Wiktionary (and on Wikipedia, where I'm mostly responsible for it, too), I can't just treat scripts like something I can write and maintain unilaterally. But for now, the script is still very much in flux, not just in the state of code but in the wisdom of coding decisions, etc. For instance, I think I made a huge mistake embedding separate MED vs. Choi vs. Willson IPA symbols, because they don't actually represent different dialects, but merely different published researchers' occasionally conflicting phonological analyses of the language. Honestly, the state of Marshallese linguistics publications can be a bit of a mish-mash of different researchers doing their own things and not always agreeing on conventions, which has led to my occasionally having to get a tad...creative. Lately I've been asking for more peer review on w:Talk:Marshallese language to help improve the occasionally confused and OR-prone state of the article and pronunciation templates, and the scripting I write here is something I hope can eventually be used there as well where appropriate. That effort on Wikipedia, like this script and the Wiktionary:About Marshallese proposal, is still very much a work in progress, and for the most part I've had to maintain it all myself, and inadequate peer review means the mistakes I make tend to become the decisive word in how the wikis describe the language, sometimes for years on end until someone (or myself) notices the problem.

So thank you for your help with scripting and setting up some simple test cases, etc. While I'm still improving the script offline, I've made note of your improvements and am trying to add them in the offline editing before I submit and test features of a new update, all while trying not to break currently deployed invocations in the process. - Gilgamesh~enwiki (talk) 08:03, 31 October 2019 (UTC)

Glad that my tinkering was appreciated. I just encountered some module errors due to outdated input in {{mh-ipa-rows}} and provided more informative module errors, and then possibly made the errors useless by removing u from the supported characters. (All the erroring instances had u.) Wiktionary:About Marshallese still needs updating though. — Eru·tuon 17:18, 2 November 2019 (UTC)
Thanks again. And yeah, my bad.
Again, some background, and what motivated me to make such drastic changes today: When writing Marshallese templates on Wikipedia and Wiktionary years ago, I devised an ASCII-based symbol system loosely based on the MED phoneme transcription developed by Byron W. Bender and used in the Marshallese-English Dictionary. But in my effort to simplify it into an ASCII-inputtable system, I changed Bender's a e ẹ i notation to a e o u, since at the time we were treating the vertical vowel system phonemes as underspecified for backness or roundedness—which is true, they are underspecified for that, but at the time we were representing the phonemes using central vowel symbols /a ɜ ɘ ɨ/. But in the most recent discussion at w:Talk:Marshallese language where I asked for review by other editors to improve the quality of the language's representation and to reduce original research, it was agreed that only one of the published linguists had represented the phonemes with central vowel symbols at all, and that was Choi (1992). No one else used his ad hoc system, and it excluded one of the vowels altogether, representing only three. Other published researchers had either phonetically represented the vowels only as allophones, or echoed Bender's half century of Marshallese research using front vowel symbols (instead of central vowel symbols) to represent the underlying phonemes, which meant that the a e o u notation used before had come to make even less logical sense now. We agreed to change the way the article on Wikipedia represents the phonology. Many of those edits are still pending—I've been focusing most of my edits so far on Wiktionary because it will be most affected by these changes. 
Anyway, it was observed that before Bender started using a e ẹ i, he represented them in his earlier works from 1968 and 1969 as a e & i using an ampersand instead of ẹ, and I realized that since those four characters are still ASCII, they're as good as any symbols to represent those phonemes in modules and templates. I changed every instance I could find in the word entries, and I checked Category:E to check for stragglers, but at the time there weren't any, so I thought I'd gotten them all. Obviously, it seems I missed two of them.
But yes, the use of "o" and "u" as symbols are in the process of being retired, and I edited the parse function to no longer recognize them when I thought I'd at least already updated all the examples in the word entries. (I still need to edit Wiktionary:About Marshallese, and the examples in talk pages have the lowest priority at the moment.)
This time, I was quick to incorporate your most recent changes to the module code in my offline editing copy. But I admit...I don't understand the syntax text:gsub or what that snippet of code does. I didn't know Lua at all before a couple of weeks ago, and I've adapted to writing it much more quickly than I imagined possible, but that's thanks to where I've been able to convert my equivalent JavaScript knowledge. Though I understand your error-checking edits were to diagnose straggling "u" as the culprit, I don't fully understand what your added code actually does in regards to error message reporting. Could you please explain it, if possible? When I've gotten errors from the module, I've mainly just been browsing the stack trace and the line numbers of where the error was generated. - Gilgamesh~enwiki (talk) 18:52, 2 November 2019 (UTC)
Thanks for the further explanation.
I learned JavaScript (and C) after learning Lua, so I can try to explain the colon syntax by comparison with JavaScript. In JavaScript, a.method() is a method call and passes an implicit this, equal to a, to the method. In Lua, a:method() is the closest equivalent; it passes a as the first argument to the method. a.method() would call the method with no arguments. The functions in the string library are available when a string value is indexed (via the __index field in the metatable for strings), so if text is a string, text.gsub gives a function equal to string.gsub, and text:gsub(pattern, replacement) is equivalent to string.gsub(text, pattern, replacement), and is analogous to text.replace(regex, replacement) in JavaScript. text.gsub(a, b) would fail to pass text as the first argument to the function, so is equivalent to string.gsub(a, b): a is the string, and b is the Lua pattern. (Lua will throw a runtime error because the replacement value is required: "string/function/table expected".) In JavaScript, it would be sort of similar to do { const replace = text.replace; replace.call(a, b); }.
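The colon syntax described above can be tried in a plain Lua interpreter. The following is a minimal sketch using only the standard string library; the sample string and pattern are made up for illustration.

```lua
local text = "banana"

-- Method-call syntax: `text` is passed as the first argument.
local with_colon = text:gsub("an", "AN")
-- The equivalent explicit call through the string library.
local explicit = string.gsub(text, "an", "AN")
print(with_colon, explicit)          -- bANANa	bANANa

-- Indexing a string yields the functions of the string library.
print(text.gsub == string.gsub)      -- true

-- Dot syntax does not pass `text`: "an" becomes the subject string,
-- "AN" the pattern, and the required replacement is missing,
-- so the call raises an error (caught here with pcall).
local ok, err = pcall(text.gsub, "an", "AN")
print(ok)                            -- false
```

Both `gsub` calls also return the number of substitutions as a second value, which the single-variable assignments above discard.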
The error messages I added were to avoid the incomprehensible error for indexing of a nil value for map[a][d] and such indexings. If local map = {} and local a = "u", then accessing map[a][anything] will cause the error "attempt to index field '?' (a nil value)" because map[a] is nil (there is no value indexed by a) and nil values can't be indexed in vanilla Lua. So I added a check that will prevent the "indexing of nil" error message, since I like error messages to be somewhat understandable (even though average users can't fix them). The error message might be wrong, since I was writing it quickly, and it's possible the check is no longer needed, if the module ensures that the transcription has correct phonotactics or syntax before that point. — Eru·tuon 20:29, 2 November 2019 (UTC)
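As a rough sketch of the kind of check described above (not the code actually added to the module), a lookup can test the intermediate value before indexing it. The table contents, the `lookup` name, and the wording of the error message are all assumptions made for the example.

```lua
-- Made-up two-level lookup table; "u" is deliberately absent.
local map = { a = { d = "x" } }

-- Unchecked access: map["u"] is nil, so indexing it raises the
-- hard-to-read "indexing of nil" error.
local raw_ok = pcall(function() return map["u"]["d"] end)
print(raw_ok)  -- false

-- Hypothetical checked lookup with a more descriptive message.
local function lookup(first, second)
	local row = map[first]
	if row == nil then
		error(("unrecognized symbol %q"):format(first))
	end
	return row[second]
end

print(lookup("a", "d"))  -- x
local checked_ok, message = pcall(lookup, "u", "d")
print(checked_ok)        -- false
```

The checked version still fails, but the message names the offending input instead of reporting an anonymous nil field.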
Thank you. I didn't even know that calling syntax was possible in Lua, but it looks elegant. I'm tempted to use it more. - Gilgamesh~enwiki (talk) 06:52, 3 November 2019 (UTC)
So, to be clear, arg:func() is syntactic sugar for func(arg), right? And arg:func(a, b, c) is equivalent to func(arg, a, b, c)? - Gilgamesh~enwiki (talk) 07:12, 3 November 2019 (UTC)
Apparently it's not quite that simple... But I'd love to understand it. - Gilgamesh~enwiki (talk) 08:13, 3 November 2019 (UTC)
No, any old local or global variable can't be accessed with method syntax. For arg:func() to work, indexing arg.func (or arg["func"]) has to yield a function. So, setting func as a field in a table with local arg = { func = table.insert } enables it to be used as a method: arg:func("elem"). (The same can be done by setting the metatable for the table: local arg = setmetatable({}, { __index = { func = table.insert } }).)
In the Scribunto variety of Lua, we can only modify the fields or metatables of tables. As mentioned, strings have a metatable that allows using the functions in the string library as methods, but it can't be modified. — Eru·tuon 15:35, 3 November 2019 (UTC)
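The two ways of making func callable as a method can be checked in standard Lua with a short sketch like the one below; the variable names are illustrative only.

```lua
-- 1. func stored directly as a field of the table.
local list = { func = table.insert }
list:func("elem")                  -- same as table.insert(list, "elem")
print(list[1])                     -- elem

-- 2. func reachable through the metatable's __index table.
local other = setmetatable({}, { __index = { func = table.insert } })
other:func("elem")
print(other[1])                    -- elem
print(rawget(other, "func"))       -- nil (the field lives in the metatable)

-- A plain local function is not found by method syntax, because
-- indexing the table with "func" yields nil.
local function func(value) return value end
local plain = {}
local ok = pcall(function() return plain:func() end)
print(ok)                          -- false
```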
I see... - Gilgamesh~enwiki (talk) 23:01, 3 November 2019 (UTC)
Well, I hope I'm making sense. Methods in JavaScript and Lua are pretty similar apart from the this thing and the difference between prototypes and metatables. — Eru·tuon 17:24, 4 November 2019 (UTC)

Also, if you don't mind my asking, are there any thoughts or critiques you could offer on how I structure the module code, the things I'm doing in the functions, etc.? I'm trying not to make my code too convoluted, but I'm also consciously aware I'm exercising some degree of feature creep. And when I realized you were also exporting the internal conversion functions, I changed the export naming convention so that all such functions are prefixed with an underscore to indicate they are internal functions not intended for normal exported use rather than the actual exports functions. - Gilgamesh~enwiki (talk) 19:00, 2 November 2019 (UTC)

In regard to design, it would be simpler (at least conceptually, and for the testcases module) if the transcription-generating functions took a string and yielded a string, rather than an array of strings. Then multiple transcriptions can be handled by applying the functions multiple times. And it would be consistent with {{IPA}} to have the separate inputs in numbered parameters, rather than separate them with commas in a single parameter, and to bracket them separately: for instance, {{mh-ipa-rows|j&ngw&wil|jengwewil}} instead of {{mh-ipa-rows|j&ngw&wil, jengwewil}}, yielding /tʲeŋʷewilʲ/, /tʲɛŋʷɛwilʲ/ as the phonemic transcription instead of /tʲeŋʷewilʲ, tʲɛŋʷɛwilʲ/. But this might complicate {{mh-ipa-rows}} or the module, so you should be the one to decide. — Eru·tuon 18:51, 5 November 2019 (UTC)
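The string-in, string-out design suggested above might look something like this sketch. Everything here is hypothetical: `to_phonemic` is a stand-in that only adds slashes and does no real Marshallese conversion, and `show` is an assumed entry point receiving the numbered template parameters as an array.

```lua
-- Placeholder converter (an assumption): one string in, one string out.
local function to_phonemic(input)
	return "/" .. input .. "/"
end

-- Hypothetical entry point: apply the converter once per numbered
-- parameter, so each transcription is bracketed separately.
local function show(args)
	local results = {}
	for index, input in ipairs(args) do
		results[index] = to_phonemic(input)
	end
	return table.concat(results, ", ")
end

print(show({ "j&ngw&wil", "jengwewil" }))  -- /j&ngw&wil/, /jengwewil/
```

Because the converter never sees more than one input, test cases can compare a single input string with a single expected output string.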

Okay, so to be clear...calling gsub with tbl is equivalent to function(match) return tbl[match] or match end? I thought if the item wasn't in the table, it might return nil or something, which is why I wrote it as a function that returns the item or match. Also, I noticed you replaced all those substitutions with "("..V..")(ː*)%1". I was honestly not aware it was possible to reference a capture within the same pattern. - Gilgamesh~enwiki (talk) 20:40, 4 November 2019 (UTC)

Yes, that's correct. Similarly, if a function supplied to gsub returns nil for a particular match, no change will be made to that match. For instance, both ("bat"):gsub(".", { ["b"] = "c" }) and ("bat"):gsub(".", function(char) if char == "b" then return "c" end end) return "cat". (Whereas in JavaScript if you do "bat".replace(/./g, function(char) { if (char === "b") { return "c"; } }) you get "cundefinedundefined". Heh.) — Eru·tuon 20:55, 4 November 2019 (UTC)
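The gsub behaviors mentioned in this exchange (table replacement, function replacement, and a back-reference inside the pattern) can be compared side by side. This sketch uses the standard string library with ASCII sample strings rather than mw.ustring and the module's real patterns.

```lua
local tbl = { b = "c" }

-- Table replacement: matches with no entry in the table are kept.
local from_table = ("bat"):gsub(".", tbl)
-- Explicit function version; the "or char" fallback is not required.
local from_fallback = ("bat"):gsub(".", function(char) return tbl[char] or char end)
-- Function returning nil for most matches: those matches are kept too.
local from_nil = ("bat"):gsub(".", function(char)
	if char == "b" then return "c" end
end)
print(from_table, from_fallback, from_nil)  -- cat	cat	cat

-- %1 inside the pattern refers back to the first capture, so this
-- matches a doubled letter and collapses it to a single one.
local collapsed = ("aab"):gsub("(%a)%1", "%1")
print(collapsed)                            -- ab
```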
I appreciate what you've further done with the testcases, in making tests appear on the main module's page itself. And since I really didn't write any of the testcases script and am not sure what to change without breaking it, I should probably let you know that the MED/Choi/Willson stuff is not coming back. I don't know what I was thinking, putting linguists' conflicting vowel symbols in pronunciation sections as if they were different dialects—that was really unwise of me to begin with. - Gilgamesh~enwiki (talk) 12:10, 5 November 2019 (UTC)
@Gilgamesh~enwiki: In case you haven't noticed, I've made the testcases on Module:mh-pronunc/documentation to compare the outputs of Module:mh-pronunc and Module:mh-pronunc/sandbox. In each of the table cells for which the sandbox module differs, its output is shown below the output of the main module. — Eru·tuon 16:13, 6 November 2019 (UTC)
Yes, I noticed. It helps. Though I still don't quite understand how you're getting that word list programmatically, as it hasn't seemed to have updated since I added new word entries on the wiki. - Gilgamesh~enwiki (talk) 16:36, 6 November 2019 (UTC)
The list of pages and template inputs isn't automatically updated; I generated it from this list of all {{mh-ipa-rows}} templates, which I made two days ago with Pywikibot. I can regenerate it soon if you like. — Eru·tuon 16:41, 6 November 2019 (UTC)
Oh. Okay, that makes sense. - Gilgamesh~enwiki (talk) 17:24, 6 November 2019 (UTC)

Since you've been helping me maintain the module code, I thought I should let you know that I made some major changes to the code structure. I wrote a new local function, gsubBatch, to help reduce boilerplate in the source, since gsub is called a lot and I wanted to streamline it. - Gilgamesh~enwiki (talk) 23:55, 13 November 2019 (UTC)
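For readers who have not seen the module, a batch helper of this general kind could be sketched as below. The name `gsubBatchSketch`, its signature, and the list-of-pairs format are assumptions for illustration; the module's actual gsubBatch may be written quite differently, and it works with mw.ustring rather than the plain string library used here.

```lua
-- Hypothetical batch helper: apply a list of {pattern, replacement}
-- pairs in order, feeding each result into the next substitution.
local function gsubBatchSketch(text, substitutions)
	for _, pair in ipairs(substitutions) do
		-- Assigning to a single variable keeps only gsub's first result.
		text = text:gsub(pair[1], pair[2])
	end
	return text
end

print(gsubBatchSketch("hello world", {
	{ "o", "0" },
	{ "l+", "L" },
}))  -- heL0 w0rLd
```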

@Gilgamesh~enwiki: I like it. You might want to take a look at this edit applying the useful behavior of the function replacement value. I think it makes the code more readable. — Eru·tuon 21:01, 14 November 2019 (UTC)

My gsubBatch function may not have been as wise as I once thought. Though it makes code more elegant to read, it can actually make it harder to debug, because errors that occur inside anonymous functions don't seem to report their line numbers if they generate an error, which in a long batch makes it harder to determine where the error came from. I may find myself restructuring code again, but if a lot of sequential gsub calls are necessary, I think I'd rather reduce the length of some variable names, because the sheer amount of boilerplate can be awful. - Gilgamesh~enwiki (talk) 00:55, 18 November 2019 (UTC)

@Gilgamesh~enwiki: Hmm, this should be an improvement. However, if you aren't aware, you can click the Lua error to get a backtrace (assuming JavaScript is working). — Eru·tuon 05:20, 18 November 2019 (UTC)
If I adopt a gsubBatch mechanism again, I'll look into it. - Gilgamesh~enwiki (talk) 17:05, 19 November 2019 (UTC)

I just noticed a strange abundance of words in the table spelt "Wiktionary:About Marshallese", with six different phonological forms. :) Also, been adding more words up to moments ago. - Gilgamesh~enwiki (talk) 17:05, 19 November 2019 (UTC)

@Gilgamesh~enwiki: Yeah, I wasn't sure if you had gotten all the new transcriptions, so I ran the Pywikibot script. It prints the contents of the transclusions of {{mh-ipa-rows}} in Wiktionary:About Marshallese as well as in entries; then I have to remove the unwanted titles. I added a list of titles to exclude so that in the future the unwanted titles can be automatically removed. Perhaps alternative spelling entries could just be soft redirects using {{alternative spelling of}}, without any definition or pronunciation (because both of those are the same for all spellings). I changed M̧ajōļ to an alternative spelling entry for M̧ajeļ based on something you said in the Wikipedia discussion, but am not sure about the others. — Eru·tuon 17:18, 19 November 2019 (UTC)
Yeah, the orthography takes a while to get a feel for. I'm still learning new mini-rules about it, especially recently since I started writing that script. Of the examples at the top of my head, where Bender phonemes are otherwise identical...
  • io̧kwe over iakwe or yokwe. io̧kio̧kwe isn't difficult from there.
  • eok over yuk, etc. The Marshallese new orthography, strictly speaking, has no Y.
  • jukwa over juga. The new orthography has no G, either. Just AĀBDEIJKLĻMM̧NŅN̄OO̧ŌPRTUŪW.
  • wōja over oja, and similar examples.
  • Wūjae over Ujae, and similar examples.
  • I'm not 100% sure whether Bok-ak or Bokaak should be considered primary. I'm guessing Bok-ak, because Bokaak unusually spells out an epenthetic vowel that the new orthography largely avoids.
  • Between spaces, hyphens and unspaced unhyphenated compound words, there's really no difference in pronunciation, so just one can be picked from multiple. Multiple words undergo assimilations in uninterrupted speech, and individual morphemes of words can be enunciated as needed. The logic of that is...a work in progress; I'm still trying to reconcile the differences between normal vowels and epenthetic vowels when they neighbor glide consonants {y h w}. Anyway, I'd probably go with unhyphenated words or hyphenated ones, and hyphenated words over spaced words.
  • Note overall that as I've written vowel simplifications into the module, I've largely been following orthographic norms in deciding which surface vowel to express. And I've been trying to leave notes as to "{this} is [that], not [that]", etc.
And thank you again. :) - Gilgamesh~enwiki (talk) 19:49, 19 November 2019 (UTC)
And Jāmo̧ over Jemo̧. - Gilgamesh~enwiki (talk) 22:35, 19 November 2019 (UTC)

Efficiency

I may have significantly increased the module's execution time, which may be extending table load times. I changed it so that forRemainder is actually (pretty much unconditionally) called twice and the duplicate result discarded. This is for careful mode (variable name subject to change), to satisfy inconsistencies between the way Bender (1968) and Willson (2003) described the language, and the more careful pronunciations prescribed by Naan (2014). Basically, in careful mode, the nasal consonant cluster assimilations are avoided, there's a handful more cases where clusters have epenthesis instead of assimilation, and the behavior of epenthetic vowels neighboring glides has changed. I don't necessarily see an inconsistency in including both, since most languages (including English) have words or phrases that differ notably in pronunciation when spoken more rapidly or more slowly, and can change how people perceive the word in their own speech. Compare "ornge" vs. orange, where some people primarily speak it as two syllables, and some (like me) say it as one syllable. - Gilgamesh~enwiki (talk) 20:02, 20 November 2019 (UTC)

Yes, execution time is definitely way up according to the "Lua time usage" measurement (at the bottom of the edit page). According to the profile as I am writing this, 4160 ms (85.2%) of that is mw.ustring.gsub. It's not a very efficient function because it's implemented using PHP regex and calls go over the Lua–PHP boundary. Sometimes the number of calls can be reduced by generalizing the patterns (regexes) and using a function replacement. — Eru·tuon 20:17, 20 November 2019 (UTC)
By the way, I like how the "careful" mode avoids assimilations. Assuming Arņo is a native word, it seems strange for the r to be assimilated into a ņ, when the only reason for the r to be in the spelling is if it is sometimes pronounced. Otherwise, it should be Aņņo. Similarly with Aujtōrōlia, which could be Auttōrōlia, though since it's a loanword and the j might be needed to represent the original s, it's not very strong evidence against assimilation. — Eru·tuon 20:44, 20 November 2019 (UTC)
Youch... So would it actually be more efficient to pass a function substitutor argument than a string substitutor argument? I'm all for increasing the efficiency of the script by whatever practical means available. It is also my very first Lua script.
And yes...Marshallese orthography has always been a strange creature. The new orthography since the 1970s is not purely phonemic, obviously, if you compare it with Bender's phonemes, but is designed so that syllables in isolation are reasonably easy to learn how to pronounce once you learn which sound each letter stands for, and is something foreigners (most of whose languages do not have vertical vowel systems) can more easily learn to pronounce. Native speakers of the language already know words in isolation, and know how to string them together into compound words and sentences, so their orthography can simply string together morphemes and allow epenthesis, sandhi, assimilations, etc. to take their natural course. In this way, it also preserves the morphemic structure and thus more of the etymology of words, in an orthographic approach also preferred in languages like French and Icelandic. Arņo is a compound name of two morphemes: ar "lagoon beach" and ņo "wave". If you simply write the assimilations and write it Aņņo, the etymology is relatively more obscured. What seems to be relatively new to the equation is learning how to pronounce words as they are written in a stable orthography already provided. This means that some consonant clusters that were previously routinely assimilated, may now be enunciated more carefully by people who have learnt to read and write at school. Spellings like kw increasingly are no longer taken as single consonant phonemes, but as sequences of k and w. Two-syllable words like io̧kwe may instead come to be analyzed as three-syllable words because of how they are written. rn is pronounced as two different consonants because it is written that way. I've seen evidence of these trends in the pronunciation guides prescribed by Naan (2014), my discovery of which led me to rethink how to write the Lua module. 
I honestly can't say I know how realistic these "careful" pronunciations are among native Marshallese speakers (some of it may well be more artificial than not), but it certainly seems to be increasingly how Marshallese is taught, at least in a college environment. If only we had more access to more native Marshallese speakers, but internet access is too expensive and unreliable for most of the population. (I'm impressed that the undersea fiberoptic cable connecting Majuro to Guam manages to span the Marianas Trench.) - Gilgamesh~enwiki (talk) 22:09, 20 November 2019 (UTC)
I just noticed you made changes to the script. I haven't fully assessed the changes yet, but I've seen just enough to pique my interest. - Gilgamesh~enwiki (talk) 22:31, 20 November 2019 (UTC)
Yeah, I think a function substitution can be more efficient. The function replacement handling assimilation is slightly faster, if the "Lua time usage" figures for the "before" and "after" versions of the module are accurate. (But sometimes the figures vary unpredictably. Greater differences are less likely to be the result of chance.) It means only one mw.ustring.gsub call to handle all assimilations, and perhaps the overhead of calling a function for every series of two consonants is less than the overhead of multiple calls to mw.ustring.gsub. I think that's plausible because of all that PHP has to do for each mw.ustring.gsub call.
I didn't realize Arņo was a compound (naturally, since I'm pretty ignorant). That does provide an explanation for the spelling, even if there's assimilation. — Eru·tuon 22:35, 20 November 2019 (UTC)
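
The idea of replacing several gsub calls with a single call that takes a function can be sketched in plain Lua. This is only an illustration under stated assumptions: it uses string.gsub instead of mw.ustring.gsub so that it runs outside Scribunto, the rule list is hypothetical ASCII stand-in data rather than the module's real assimilation table, and `[^aeiou]` is a crude stand-in for "consonant".

```lua
-- Hypothetical assimilation rules over ASCII stand-ins (not the module's data).
local RULES = { { "rn", "nn" }, { "nl", "ll" }, { "tn", "nn" } }

-- Build a lookup table once, keyed by the two-consonant sequence.
local LOOKUP = {}
for _, rule in ipairs(RULES) do
	LOOKUP[rule[1]] = rule[2]
end

-- Version 1: one string.gsub call per rule.
local function assimilate_many_calls(text)
	for _, rule in ipairs(RULES) do
		text = string.gsub(text, rule[1], rule[2])
	end
	return text
end

-- Version 2: a single string.gsub call. The function runs for every pair of
-- consecutive non-vowels; returning nil leaves that match unchanged.
local function assimilate_one_call(text)
	return (string.gsub(text, "([^aeiou])([^aeiou])", function(c1, c2)
		return LOOKUP[c1 .. c2]
	end))
end
```

Both versions turn "arno" into "anno". The second trades several pattern scans for one scan plus a table lookup per consonant pair, which is the trade-off being weighed above.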
Is it all right if I rename the substitutor function's variable names? Not just because I generally start non-constant variable names with a lowercase letter, but C2 already exists as a separate higher-scope variable, and using a different variable name may reduce the risk of variable name confusion and make the code more readable.
And s'fine. A lot of common Marshallese morphemes are only two letters long, and there was no Wiktionary Marshallese entry for ar yet anyway. - Gilgamesh~enwiki (talk) 22:42, 20 November 2019 (UTC)
Yeah, the variable name duplication is not a good idea. I noticed it and was displeased. I do prefer somewhat descriptive variable names over "a, b, c, d" though. — Eru·tuon 22:47, 20 November 2019 (UTC)
I tend to think of captures as a, b, c, d as a sequence of captures, and easier on the eyes than letter-numbering them like c1, c2, c3, c4, etc. Anyway, I think I know what you're trying to accomplish. Your code broke some of the (as of yet unused) nʷtˠ logic, but what you're doing here looks very, very clever and I think I know how to take it and run with it with other parts of the code. - Gilgamesh~enwiki (talk) 22:58, 20 November 2019 (UTC)
Well, the variable names C1, A1, C2, A2 were abbreviations of "consonant 1", "articulation 1", "consonant 2", "articulation 2" (though that's not completely accurate terminology, since it's more like primary and secondary articulation), so more descriptive than either a, b, c, d or c1, c2, c3, c4. — Eru·tuon 23:03, 20 November 2019 (UTC)
I've thought of it: x, xx, y, yy. It helps that neither X nor Y are in the standard new orthography. And when I realized what you were doing, I rewrote your function. May I demonstrate...? - Gilgamesh~enwiki (talk) 23:36, 20 November 2019 (UTC)
Ahh, that's much more readable! — Eru·tuon 01:20, 21 November 2019 (UTC)
Thanks. :D And I'm not even done yet. You gave me the idea, and I'm running with it. About to try another edit. - Gilgamesh~enwiki (talk) 02:14, 21 November 2019 (UTC)

In response to your question, "Why did the epenthetic vowel disappear between the p and the k in Āneeļļapkaņ?", the pattern is not matching the /pʲkˠ/ when mw.ustring.gsub is called the second time, because /lˠlˠ/ is not changed when mw.ustring.gsub is called the first time, and is matched both times. Here is a technique for cases like this that also allows mw.ustring.gsub to be called only once. (Gah, in the edit summary I meant to say "getting the surrounding consonants with mw.ustring.sub", not "mw.ustring.gsub".) — Eru·tuon 02:54, 21 November 2019 (UTC)
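
A general way to avoid this kind of overlapping-match problem is sketched below. It is not the code from the linked edit: it assumes plain string functions and ASCII stand-ins, and the function names are hypothetical. The first function consumes both consonants in each match, so the middle consonant of a three-consonant run cannot begin a second match. The second uses a position capture `()` and string.sub to inspect the following character without consuming it.

```lua
local function is_consonant(ch)
	return ch ~= "" and not string.find(ch, "[aeiou]")
end

-- Each match consumes two consonants, so "lpk" yields only one insertion.
local function epenthesize_consuming(text)
	return (string.gsub(text, "([^aeiou])([^aeiou])", "%1e%2"))
end

-- Only one consonant is consumed per match; the next character is looked up
-- by index in the original string, so neighbouring matches never overlap.
local function epenthesize_lookahead(text)
	return (string.gsub(text, "()([^aeiou])", function(i, c)
		if is_consonant(string.sub(text, i + 1, i + 1)) then
			return c .. "e"
		end
	end))
end
```

With the input "alpka", the consuming version gives "alepka" (nothing is inserted between p and k), while the lookahead version gives "alepeka" in a single call.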

Your solution with the i and j indices was clever. (I renamed them xvi and yvi.) It all...seems to work now. Now let's see if I can rewrite the logic of another expensive regex batch without breaking it too badly.
Oh, and...the table's Rālik vs. Ratak logic seems reversed. When both forms are the same, it shows two table cells. But when the forms differ, it only shows the Ratak form. - Gilgamesh~enwiki (talk) 03:13, 21 November 2019 (UTC)
How much time do you think was shaved off the module's execution, comparing right after I added "careful" mode to when we rewrote this regex batch? - Gilgamesh~enwiki (talk) 03:15, 21 November 2019 (UTC)
Whoops, fixed the logic. Glad you spotted it.
It is apparently somewhat faster; I previewed Module:mh-pronunc/documentation three times with the old version and the new version, and got 5.3 or 5.4 or 7.1 seconds and 4.5 or 4.6 or 3.0 seconds respectively. Significant variation, so it's hard to say just how much faster, but there wasn't overlap. The number of calls to mw.ustring.gsub in Module:mh-pronunc in the generation of the testcases table (counted thus) has been reduced from 228,294 to 156,516.
We should probably be editing Module:mh-pronunc/sandbox to avoid changing transcriptions in entries (and avoid asking the server to update pages).... — Eru·tuon 07:18, 21 November 2019 (UTC)
So, edit sandbox for experimental code, and the main module for stable milestones? Yeah, I can see how that's a good idea. - Gilgamesh~enwiki (talk) 13:36, 21 November 2019 (UTC)

I've been considering an alternative approach to programming the phonetic algorithm. As it currently stands, the regex approach is effective in thoroughly processing the input text, but it has also proven much less efficient than I predicted. Putting more logic into substitutor functions improves the performance somewhat, but in a process where regex replaces matches one by one, it's not as practical for making necessary adjustments to vowels that were already replaced. For example, this existing code:

				-- {yekʷey, yewan} are [ɛɡʷɛ, ɛwɑnʲ], not [ɛ̯ɔɡʷɛ, ɛ̯ɔwɑnʲ]
				text = gsub(text,
					"(ɦʲ@*)([ɔou])(@*.ʷ.?ʷ?@*[æɛeiɑʌɤɯ])", function(a, b, c)
						return a..VOWELS_Y[b]..c
					end)

Unlike other logic that replaces text based on what already exists to the match's left-hand side, this replacement can only be made if the stable value of the vowel on the right is already known. This is how I earlier solved the Ānewātak problem so that its phonetics were properly displayed as [ænʲeːwæːtˠɑk] instead of [ænʲeowæːtˠɑk]. In a more optimized approach, that could be fixed in a second regex pass, but I think I have a better idea—I just don't know beforehand how practical it will be.

Basically, my idea is, instead of relying so much on regex, just parse the input text and represent its data as a doubly linked list of table objects, where each node represents either a consonant or a vowel. Code could loop through the link nodes, make changes in them informed by nodes that come before or after, and can make secondary changes to previous node data as needed. Then, when the linked list is done being manipulated, convert it back to text.

But can this all be done in Lua using only linked lists and logic, more efficiently than batches of regex replacements can do it? - Gilgamesh~enwiki (talk) 18:46, 22 November 2019 (UTC)
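
For reference, the linked-list idea might look roughly like the following in plain Lua. This is a minimal sketch under assumptions, not a proposal for the module: the node fields (`value`, `is_vowel`, `prev`, `next`), the ASCII-only input, and the single rule (an "o" becomes "e" when the next vowel is "e") are all hypothetical and chosen only to show a later node revising an earlier one.

```lua
-- Parse ASCII text into a doubly linked list, one node per character.
local function parse(text)
	local head, tail
	for ch in string.gmatch(text, ".") do
		local node = {
			value = ch,
			is_vowel = string.find(ch, "[aeiou]") ~= nil,
			prev = tail,
		}
		if tail then
			tail.next = node
		else
			head = node
		end
		tail = node
	end
	return head
end

-- Walk backwards from a node to the nearest preceding vowel node, if any.
local function previous_vowel(node)
	local other = node.prev
	while other and not other.is_vowel do
		other = other.prev
	end
	return other
end

-- Hypothetical rule: once an "e" is reached, revise an earlier "o" to "e".
local function harmonize(head)
	local node = head
	while node do
		if node.value == "e" then
			local earlier = previous_vowel(node)
			if earlier and earlier.value == "o" then
				earlier.value = "e"
			end
		end
		node = node.next
	end
	return head
end

-- Convert the list back to text with a single table.concat.
local function to_text(head)
	local parts, n = {}, 0
	local node = head
	while node do
		n = n + 1
		parts[n] = node.value
		node = node.next
	end
	return table.concat(parts)
end
```

Here `to_text(harmonize(parse("yokwey")))` yields "yekwey". Whether this is faster than a batch of pattern replacements would have to be measured; the sketch only shows that backward revisions are straightforward in this representation.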

I'm not sure, but I think it could end up being faster because the overhead of many mw.ustring.gsub calls is considerable. It could also reduce memory because fewer intermediate strings would be created. But I'm speculating.
I haven't done anything quite like this; the closest thing is the pair of functions make_tokens in Module:grc-utilities and tr in Module:grc-utilities. The former processes Greek characters into "tokens" (sub-sequences, mainly to handle diphthongs and single vowels correctly), and uses objects to represent the characteristics of the Greek characters, and the latter processes the tokens to create a transliteration. Not super elegant, but my version of the tokenization function was much faster than the previous one, probably because it got rid of most of the calls to mw.ustring functions.
Using a doubly linked list is an interesting idea. It could be more elegant, though I can't imagine all the details of how it could work. — Eru·tuon 03:24, 24 November 2019 (UTC)
Well, practically any grc script has to be easier to maintain than the pre-Scribunto version, which I wrote back in the day. That was such a beast... - Gilgamesh~enwiki (talk) 14:37, 24 November 2019 (UTC)
Wait...you said mw.ustring functions were inefficient. Does that include mw.ustring.sub? - Gilgamesh~enwiki (talk) 14:40, 24 November 2019 (UTC)
mw.ustring.sub is noticeably inefficient when there are many calls, for instance when you iterate through strings using for i = 1, mw.ustring.len(str) do local character = mw.ustring.sub(str, i, i) end. In the previous version of the tokenization function, mw.ustring.sub was called up to about three times for every code point in the string. My impression is that that explained most of the inefficiency in the old version of the function, though it's not a great testcase because the old and new versions are so different. The overhead is probably not as noticeable in the function replacement in Module:mh-pronunc though, where it currently has only 2,028 calls, as opposed to 115,872 for mw.ustring.gsub to create the testcases table. (And I guess mw.ustring.gsub probably has greater overhead.) It's not so inefficient that the function should be avoided altogether.
I should say, the module is already efficient enough in entries (it looks like {{mh-ipa-rows}} takes about a twentieth of a second in entries), so don't feel obligated to remodel it for that reason at least. (Not to discourage you from rewriting it if you want to – I do quite a bit of random rewriting of modules for various reasons.) — Eru·tuon 23:08, 24 November 2019 (UTC)
It's not just Wiktionary I have to think about. I want to also be able to migrate the code to Wikipedia. Most WP articles where it would be relevant might need the entry only once, but not articles like Kwajalein Atoll, where Marshallese names are provided for all the notable islets; many of them are notable, but most are not notable enough to get separate articles of their own. And some of these islands have two or three separate Marshallese names depending on context. Obviously, being WP, pronunciations aren't embedded in the same format as Template:mh-ipa-rows, and perhaps that means fewer functions called, but toPhonetic would certainly be called multiple times in an article like that. I'd rather not add that much extra load time there. - Gilgamesh~enwiki (talk) 00:12, 25 November 2019 (UTC)
Also, as I've tried to write linked list code, I'm realizing that I'm still creating a beast of a different kind: far fewer mw.ustring calls, but immensely more bloated code. I get the impression that functions like mw.ustring.sub are so expensive because the strings are probably encoded in UTF-8, and the logic required to seek code point indices (or worse, conceivably to convert between UTF-8 and UTF-16 and back) may involve a lot of overhead if called often enough (I'm not sure which, if any, of these things is actually being done). Obviously we're working with a lot of Unicode text and the data needs to be preserved in that format.
I wonder...what if I completely redesign the internal code format (returned by parse and passed to the other internal functions) to use only ASCII surrogates and byte-based string functions for the text-crunching, and then convert them to Unicode forms to represent their final forms? Are there also byte-based functions available for regex that are more efficient? - Gilgamesh~enwiki (talk) 00:12, 25 November 2019 (UTC)
I just had a thought. Many calls to mw.ustring.sub can be expensive, right? But most of the time I only need a single Unicode character. What if I...split a string into an array of characters first, and just reference the array's indices? No dynamic linear behavior involved in retrieving an indexed Unicode code point from a byte string. - Gilgamesh~enwiki (talk) 02:00, 25 November 2019 (UTC)
Hm, yeah, maybe some Wikipedia articles could invoke the module enough to noticeably increase Lua time usage. There are quite a few words in Kwajalein Atoll that could have IPA transcriptions.
I certainly hope mw.ustring.sub doesn't do any conversion between UTF-8 and UTF-16. That would be madness. I found that the implementation of mw.ustring.sub calls mb_substr in PHP, which calls mbfl_substr, but I didn't figure out what it does to UTF-8.
The byte-based functions are the string library functions (the ones that can be called as methods on strings). They are much more efficient because they call directly into C and don't have to deal with UTF-8 or Unicode categories. But using ASCII replacements for the Unicode characters sounds like a bit of a pain; it could make the intermediate forms a bit harder to understand.
Yeah, using an array of characters should be cheaper if you're calling mw.ustring.sub to get multiple characters from the same string. To be super cheap, I would use string.gmatch: function get_character_array(str) local arr, i = {}, 1 for char in string.gmatch(str, "[%z\1-\127\194-\244][\128-\191]*") do arr[i] = char i = i + 1 end return arr end. — Eru·tuon 05:38, 25 November 2019 (UTC)
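
Laid out as a block, with a usage example, the character-array approach might look like this. It is a sketch under assumptions: the `%z` (NUL) class from the pattern above is left out so that the code also runs on newer Lua versions where that class is deprecated, so input containing NUL bytes is not handled, and the input is assumed to be valid UTF-8.

```lua
-- One array element per UTF-8 code point: a lead byte (ASCII or 194-244)
-- followed by any continuation bytes (128-191).
local function get_character_array(str)
	local arr, i = {}, 0
	for char in string.gmatch(str, "[\1-\127\194-\244][\128-\191]*") do
		i = i + 1
		arr[i] = char
	end
	return arr
end

-- Split once, then index as often as needed without any mw.ustring.sub calls.
local chars = get_character_array("ɦʲænʲ")
local third = chars[3]   -- the third code point, "æ"
```

Because a lead byte may be followed by any number of continuation bytes, the same pattern covers two-, three- and four-byte sequences.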
I'm increasingly wondering if UTF-16 isn't involved under the hood at all. But then, Unicode code point operations on UTF-8 data still means that the functions cannot know in advance which byte index contains which code point index, which means that it has to measure from the start of the string. That means linear behavior, and that isn't much better than converting the whole string to UTF-16.
Anyway, the string-to-character-array code I had in mind was mw.text.split(text, ""), called only once before a major mw.ustring.gsub operation whose substitutor function would have otherwise needed mw.ustring.sub multiple times per match. I hadn't considered your string.gmatch approach before, but it looks interesting. Might there be a way to expand it to work with three- and four-byte UTF-8 code points?
And yeah, trying to find an ASCII-based surrogate code has proven...challenging, to the point I think maybe I won't do it. I tried to design a Unicode-to-ASCII-to-Unicode cipher mostly based on X-SAMPA, but it had its constraints, and a lot of X-SAMPA sequences use two or more ASCII characters where Unicode IPA would only use one code point. It's fortunate I'm pretty knowledgeable in X-SAMPA, which greatly improved since I wrote an offline JS utility (downloadable here) that automatically converts X-SAMPA input to IPA as you type. (I wrote it several years ago, and my coding conventions have certainly improved since then, so don't be too horrified if you view source. If I could write the identical utility today, there would be so many things I'd change. But I digress.) So, to try to come up with a one-code-point-to-one-character cipher, I had to think of ways to simplify some sequences. [æɛeiɑʌɤɯɒɔou] already has a one-to-one conversion with {EeiAV7MQOou, but when writing regex sequences, { would have to become %{, so I could just replace it with a instead. The secondary articulations is where it gets trickier, as the equivalents of [ʲ ˠ ʷ] are ' _G _w. Since I only use [w] as a final phonetic presentation form, I could conceivably just use j G w, but it's again complicated where the X-SAMPA equivalent of [ɦ] is h\. Lots of these little things call for lots of little simplifications, until you get to the point where the internal string /ɦʲænʲeɦʲelˠlˠæpʲkˠænˠ/ (Āneeļļapkaņ) has a pseudo-X-SAMPA appearance of hjanjehjelGlGapjkGanG, and...I end up kinda not wanting to go that route anymore. Regex and the algorithm can already get complex enough without making the internal IPA so much harder to read. - Gilgamesh~enwiki (talk) 16:27, 25 November 2019 (UTC)
Oh, just now realized that your "[%z\1-\127\194-\244][\128-\191]*" does support three- and four-byte code points. - Gilgamesh~enwiki (talk) 16:36, 25 November 2019 (UTC)
Wait, your example code just grows an array by assigning new indices to the end of it? That seems bad to me from a JS background, where an array becomes much more inefficient unless you grow it with array.push(element). You sure that doesn't hurt array storage efficiency on the JIT side? (Or does Scribunto/Lua not use a JIT anyway?) I'd probably find myself writing it with push's Lua equivalent, table.insert. - Gilgamesh~enwiki (talk) 16:41, 25 November 2019 (UTC)
Huh... Okay, then, your approach is better. :) - Gilgamesh~enwiki (talk) 16:44, 25 November 2019 (UTC)
Hm, is it generally safe (and hopefully performs better) to use byte-string-based regex functions on UTF-8 strings in situations where it doesn't have to care how the Unicode code points are encoded? UTF-8 searches, UTF-8 replacements, etc. It seems to me like it would only really get unsafe if you tried to mix non-ASCII characters into single-character regex logic ([xyz] x? x* x+ etc.), as it would test for the byte rather than the codepoint. But stuff like simple substring replacements and multi-character captures (xyz) could be fine even with UTF-8 code points included. - Gilgamesh~enwiki (talk) 17:02, 25 November 2019 (UTC)
table.insert isn't any more efficient than t[i]. As mentioned in the link, it's actually slower because of the two meanings that table.insert has (table.insert(t, val) vs. table.insert(t, i, val)). Scribunto doesn't use LuaJIT. It would probably improve performance to allocate the entire array at once with { nil, nil, nil, ... }, but that requires knowing the number of code points and having a function that can return that many nils.
Yep, those are two cases in which the string library doesn't work with multi-byte characters; also several of the character classes like %s are Unicode-dependent in the mw.ustring library. I wrote a little about this at WT:LUA § Ustring patterns and created Module:User:Erutuon/patterns, which contains a function that tests whether a pattern will match correctly (according to UTF-8 and Unicode semantics) in the string library functions.
I imagine that converting UTF-8 to UTF-16 and back requires memory allocation, so there should be a significant performance penalty if mw.ustring.sub is implemented that way. Certainly indexing UTF-8 by code point is slower than byte indexing, but I imagine with this decoding technique it could be fairly fast. — Eru·tuon
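
The safe and unsafe cases for byte-based patterns on UTF-8 text can be illustrated as follows. This is a sketch, not code from any module: the names `SECONDARY` and `strip_secondary` are hypothetical, and the input is assumed to be valid UTF-8.

```lua
-- Safe: a multi-byte character used as a literal substring. UTF-8 is
-- self-synchronizing, so the byte sequence cannot match mid-character.
local safe = (string.gsub("ɦʲænʲ", "nʲ", "N"))

-- Unsafe: a multi-byte character inside a set. The set "[ʲˠ]" really
-- contains four separate bytes, so on "nʷ" it strips the lead byte that ʷ
-- shares with ʲ and leaves a stray continuation byte (invalid UTF-8).
local broken = (string.gsub("nʷ", "[ʲˠ]", ""))

-- One workaround: match whole UTF-8 characters and decide with a table.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"
local SECONDARY = { ["ʲ"] = true, ["ˠ"] = true }

local function strip_secondary(text)
	return (string.gsub(text, UTF8_CHAR, function(ch)
		if SECONDARY[ch] then
			return ""
		end
	end))
end
```

The same byte-versus-code-point caveat applies to quantifiers such as `?`, `*` and `+` placed after a non-ASCII character, since they bind only to its last byte.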

I've given the theoretical Unicode-to-ASCII-pseudo-X-SAMPA cipher more thought, and I believe if I were to use it, it would look something like this:

p b t d z k ɡ m n ŋ r l ĭ ī ɣ ɦ ɧ _ ʲ ˠ ʷ æ ɛ e i ï ɑ ʌ ɤ ɯ ɒ ɔ o u ◌̯ ː ◌͡◌
p b t d d k g m n N r l y Y H h H _ j G w a E e i I A V 7 M Q O o u ^ : =

Because, on second thought, hjanjehjelGlGapjkGanG is rather hard to read, but then, so is /ɦʲænʲeɦʲelˠlˠæpʲkˠænˠ/. These are internal formats, not display formats (even the internal IPA is pseudo-IPA), and at least X-SAMPA is well documented enough for a pseudo-X-SAMPA approach to be viable. I'm still working with code ideas offline. - Gilgamesh~enwiki (talk) 21:23, 26 November 2019 (UTC)
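
A minimal sketch of how such a cipher could be applied in both directions with byte-based functions follows. It assumes only a subset of the table above (the pairs that map one-to-one); the names `TO_ASCII`, `TO_IPA`, `to_ascii` and `to_ipa` are hypothetical. Characters without an entry pass through unchanged.

```lua
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Subset of the cipher; every ASCII value is used only once, so the table
-- can be inverted safely.
local TO_ASCII = {
	["ɡ"] = "g", ["ŋ"] = "N", ["ɦ"] = "h",
	["ʲ"] = "j", ["ˠ"] = "G", ["ʷ"] = "w",
	["æ"] = "a", ["ɛ"] = "E", ["ɑ"] = "A",
}

-- Build the reverse table once instead of on every call.
local TO_IPA = {}
for ipa, ascii in pairs(TO_ASCII) do
	TO_IPA[ascii] = ipa
end

-- string.gsub accepts a table as the replacement: each match is looked up,
-- and a missing key leaves the match as it is.
local function to_ascii(text)
	return (string.gsub(text, UTF8_CHAR, TO_ASCII))
end

local function to_ipa(text)
	return (string.gsub(text, ".", TO_IPA))
end
```

With this subset, `to_ascii("ɦʲænʲeɦʲelˠlˠæpʲkˠænˠ")` gives `hjanjehjelGlGapjkGanG`, and `to_ipa` restores the original. All pattern work in between can then use single-byte sets and quantifiers.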

I've tried a variety of coding approaches, and I'm realizing there may be no real substitute for batches of regex. Regex can be written fairly concisely, and the more bloated the code becomes, the harder it is to read. And after multiple attempted rewrites, I've found that I've stopped writing comments to reduce mental gear-shifting. Well-written code doesn't need many comments anyway. I just want to write something that balances readability with efficiency. Fortunately, I've had decent success with the pseudo-X-SAMPA approach in concept, and I can minimize the use of UTF-8 regex functions and rely more on faster functions like string.gsub. (At least I hope it's faster...) - Gilgamesh~enwiki (talk) 08:16, 2 December 2019 (UTC)

This revision does seem to be noticeably more efficient than this: about 1.7 seconds versus 2.7 or so. Since some of that is the less efficient Module:mh-pronunc, I guess the sandbox module takes 1.7 − (2.7 / 2) = 0.35, or roughly 0.4 seconds. But there is a tradeoff between efficiency and readability. 20:34, 2 December 2019 (UTC)
I wonder...how are Lua's regular expressions functions implemented? string.gsub, string.find, etc. I cringe to think that the engine has to compile a new regex edifice every time the regex code is passed to one of these functions. I hope they are at least being cached between calls, either in an internal hashtable or attached to the internalized pattern strings themselves. - Gilgamesh~enwiki (talk) 02:08, 3 December 2019 (UTC)
Since Lua patterns are so much simpler than proper regular expressions, they're just interpreted. You can see the pattern-interpreting function used by all of the string-library pattern-matching functions, except string.find when the plain flag is set, here. — Eru·tuon 04:15, 3 December 2019 (UTC)
I see... I hadn't considered that. Keeping it simple means implementing it simple. - Gilgamesh~enwiki (talk) 04:27, 3 December 2019 (UTC)

I finished writing the new draft and ironing out the bugs, and replaced the non-sandbox version with it. How does the performance compare now with the previous version? - Gilgamesh~enwiki (talk) 21:32, 5 December 2019 (UTC)

Wow! Considerably faster for the whole testcases table: less than half a second. — Eru·tuon 22:52, 5 December 2019 (UTC)
Seems like a winner, then. And the code is readable? The pseudo-X-SAMPA isn't too much trouble? I had to deviate significantly for some symbols, like c J h H y Y a I @ which do not represent their conventional X-SAMPA counterparts, for the sake of being more regex-pattern-friendly and single-character-friendly. The way I use them, c is actually [t͡s], J is [d͡z], h and H are transitional representations of unsurfaced and surfaced glides, y is {yi'y} ([i̯]), Y is {'yiy} ([iː]), a is [æ] ({ isn't as readably regex-friendly), I is a dotless [ı] that is friendlier to IPA tie bars, and @ is the diacritic [◌̆]. Otherwise (unless I've forgotten any), the symbols are the same as their X-SAMPA counterparts (or _-notated forms thereof), which are mostly the same as their IPA counterparts when they are plain Latin lowercase letters. The system works well. (Right now, in edit preview, it complains that [ı] is invalid IPA, but the choice is really just to keep the tie bar from hovering so much higher than over other pairs of vowels when [i] is present: [u͡i] vs. [u͡ı]. If it proves problematic, it can be reverted to [i]; I just wanted to polish the presentation a bit, which makes a difference with certain IPA typefaces like Gentium and certain browsers like Firefox.) - Gilgamesh~enwiki (talk) 01:35, 6 December 2019 (UTC)
It looks pretty readable to me, since I'm familiar with a fair amount of X-SAMPA.
An alternative to using the dotless i would be to use  ͜ (U+035C COMBINING DOUBLE BREVE BELOW) if either of the two vowels is i: [u͜i]. I prefer that because the dotless i confuses me: it looks somewhat like ɪ, and I think I'm used to seeing the dot when there's a tie bar. The equals sign could be converted to the tie character above or below before the rest of the ASCII characters at the end. — Eru·tuon 04:40, 6 December 2019 (UTC)
That is a very good point. I think I'll do what you suggest. - Gilgamesh~enwiki (talk) 04:49, 6 December 2019 (UTC)
You know, it has been my conventional wisdom for decades that regular expressions are one of the slowest devices in scripting, and that practically any other conventional means of parsing text is preferable for speed. But that isn't always true, is it? At least, not in Lua. In some cases, string.gsub actually seems faster than trying to do the same thing procedurally, even if you try to do it all with arrays of one-character strings. These calls are actually a lot faster than I gave them credit for—I knew they would be faster than mw.ustring.gsub, but not that they might actually be faster than my attempts to do the same thing procedurally. I suppose it also helps that, this time, I eliminated most throwaway lookup tables, and instead generate them only once and cache them.
All that said...I still kinda hate Lua. Too many thens and nots and not enough curly braces, and arrays starting at 1 instead of 0 is consistently maddening. I miss JavaScript. Would love to write modules in modern JS. - Gilgamesh~enwiki (talk) 05:09, 6 December 2019 (UTC)

I made a small change that could significantly improve performance, at least for some regex replacements, but I don't know how well. The change is:

local function string_gsub2(text, pattern, subst)
	local result = text
	result = string.gsub(result, pattern, subst)
	-- If it didn't change the first time, it won't change the second time.
	if result ~= text then
		result = string.gsub(result, pattern, subst)
	end
	return result
end

Still looking for small ways I can improve efficiency. - Gilgamesh~enwiki (talk) 19:44, 21 January 2020 (UTC)
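
For context, here is a small demonstration of why a second pass can be needed at all, together with a variant of the helper. The variant is an assumption, not the module's code: it uses the match count that string.gsub returns as its second value instead of comparing strings, and the name `string_gsub2_counted` is hypothetical.

```lua
-- Variant: skip the second pass when the first pass matched nothing.
local function string_gsub2_counted(text, pattern, subst)
	local result, count = string.gsub(text, pattern, subst)
	if count > 0 then
		result = string.gsub(result, pattern, subst)
	end
	return result
end

-- Matches cannot overlap within one pass: in "aXaXa" the middle "a" is
-- consumed by the first match, so the second "X" is only reached on pass two.
local one_pass = (string.gsub("aXaXa", "aXa", "a-a"))
local two_pass = string_gsub2_counted("aXaXa", "aXa", "a-a")
```

One pass leaves "a-aXa", while two passes give "a-a-a". The two checks differ slightly: the count-based test still runs a second pass when the first pass matched but produced identical text, whereas the string comparison skips it.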

toMOD

I wrote a simple new function, toMOD, that I need tested, perhaps with a new column in the table. It converts standard orthographic spelling to the format used by the Marshallese-English Online Dictionary, converting ĻļM̧m̧ŅņN̄n̄O̧o̧ to ḶḷṂṃṆṇÑñỌọ. This has potential applications in Marshallese reference templating, where a word in standard orthographic spelling can be automatically converted to MOD's spelling so that references can link directly to dictionary entry anchors on that site without us needing to directly embed a differently-spelt word in the external link. No such template has been written yet. It may be a good idea for each row of the "term" column and a potential MOD column to share a table cell where the forms have identical spelling. And, in any event, the separate MOD spelling should probably not link to a Wiktionary entry with that spelling, as it is and always was a non-standard alteration to Marshallese orthography which is largely limited to the MOD, Naan and associated media intended for offline distribution to available computers in the Marshall Islands. I imagine that, if the standard orthography were considered friendlier to older Windows and Mac computers and their available font rendering, MOD and Naan would be using the standard orthography out of the box, but for the time being they are what they are. - Gilgamesh~enwiki (talk) 07:44, 10 December 2019 (UTC)
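
A conversion of this kind can be sketched with byte-based replacements alone. The following is a hypothetical stand-in, not the module's toMOD: the name `to_mod` and the pair list are assumptions based on the letter correspondences listed above, and the input is assumed to use the same Unicode normalization form as the keys in the list.

```lua
-- Standard orthography letter -> MOD letter, as listed above.
local MOD_PAIRS = {
	{ "Ļ", "Ḷ" }, { "ļ", "ḷ" },
	{ "M̧", "Ṃ" }, { "m̧", "ṃ" },
	{ "Ņ", "Ṇ" }, { "ņ", "ṇ" },
	{ "N̄", "Ñ" }, { "n̄", "ñ" },
	{ "O̧", "Ọ" }, { "o̧", "ọ" },
}

-- The keys contain no pattern magic characters, so each one can be passed
-- to string.gsub as a literal substring.
local function to_mod(text)
	for _, pair in ipairs(MOD_PAIRS) do
		text = string.gsub(text, pair[1], pair[2])
	end
	return text
end
```

For example, `to_mod("ļo̧n̄")` gives "ḷọñ", and text without any of the listed letters is returned unchanged.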

That is a useful function to have. I think it would be useful to display the MOD spelling in the entry, unlinked – that would allow people to search for the MOD spelling (ḷọñ) and find the entry (ļo̧n̄), provided there's no entry for a homograph of the MOD spelling. — Eru·tuon 22:09, 10 December 2019 (UTC)
I thought most modern browsers allow Ctrl-F text searches that recognize letters and ignore diacritics. Right now I press Ctrl-F and type unmarked "lon" and it finds both of those words you just mentioned. However, just displaying the MOD spelling in the entry might be doable...might need some new templates. But I think I've been hesitant to dive into new Marshallese entry templating design too soon when there are still so many aspects of the language's grammar I don't fully understand. For instance, all Marshallese adjectives are verbs, and beyond suspecting that adjectives are stative verbs (equivalent to English "to be <adjective>"), I don't know what else that actually means. Yet for now, a Marshallese entry template doesn't have to be complicated—it can just redirect to the standard entry template, but display the MOD spelling as an alternate where they differ.
By the way, I've not yet figured out how to display actual wiki markup using Scribunto/Lua; everything I print out seems to be the same as the contents of <nowiki></nowiki>. If I knew how to write scripts that generate more complex wiki markup output, I might be able to migrate more of the functionality of {{mh-ipa-rows}} to a template.
It also occurs to me that Module:mh-pronunc is getting big, at over 30K now. Conventional wisdom suggests splitting it up into multiple scripts that can be imported into each other as needed, but then a multi-file project isn't as simple to mirror at Wikipedia. (A copy exists at wikipedia:Module:mh-pronunc, and its comment at the top links back here.) So maybe, the most portable, reusable portions could be maintained as one script, and more site-specific applications can be separate scripts that can stay on this wiki. For instance, mh-ipa-rows is useful at Wiktionary but not so much at Wikipedia. - Gilgamesh~enwiki (talk) 03:04, 11 December 2019 (UTC)
Oh, by search I mean the search engine for Wiktionary. Right now ļo̧n̄ is the 17th result in the search for ḷọñ, but if it is displayed in one of the templates, it should be higher in the results. I was thinking the MOD spelling could be displayed in the pronunciation template, but that isn't quite appropriate, and anyway alternative spelling entries probably need a MOD spelling, but might not have a pronunciation template. Probably the template that displays the MOD spelling should be placed in the Alternative forms section.
I've maintained a sort-of mirrored version of a set of Wiktionary modules on Wikipedia (Module:Unicode data), but the Wikipedia and Wiktionary versions have drifted apart in some ways; it's tedious copying the source code. It might be easier with a Pywikibot script, but I can't edit the Wikipedia module anymore because it's been template-protected. — Eru·tuon 04:05, 11 December 2019 (UTC)
I didn't realize that's what you meant—I put it in (newly-created and under-featured) {{mh-head}} for now. At least the MOD spelling is being displayed, though. And I don't think it may be the best idea to put the MOD spelling in an alternative forms section, because it may prompt a naive third-party editor to turn the unlinked term into a linked term and create a word entry. My concern is that it may motivate an unnecessary duplication of many entries with the non-standard orthographic variants. It also doesn't help that some sources for the language write Marshallese words without any diacritics, and it seems dan was created from one of these sources as an unknowing duplicate of dān. - Gilgamesh~enwiki (talk) 08:05, 11 December 2019 (UTC)

If I may ask, could you please update the table? I was updating it manually, but then I added so many new entries that I got behind. Most of the new entries are words that start with ri-—demonyms, mainly. - Gilgamesh~enwiki (talk) 05:08, 15 December 2019 (UTC)

Done. And finally the script is fully automatic: it reads the "excluded titles" list and updates the list of template input without me copy-pasting anything. — Eru·tuon 09:59, 15 December 2019 (UTC)
Thank you. What do you think of the state of the script and entries now? It's still only a tiny selection of the language, but I've been trying to steadily add more words. I'll also try to add words of phonological interest that help continue to refine the script. - Gilgamesh~enwiki (talk) 11:01, 15 December 2019 (UTC)

Overhauling Template:mh-head

Marshallese doesn't have all the complex noun cases of an agglutinative language, but it does have some inflected forms, and {{mh-head}} would seem to be the appropriate place to list these. I have an idea of what I want to accomplish, but it may require some additional Scribunto/Lua API I'm not that familiar with, since I think template-only logic would become unnecessarily bloated. I was wondering if you could help me write such a template and backing script. I need to figure out how vanilla {{head}} creates its inflection list and handles the appropriate automatic categories with language-sensitive sorting keys, and how I can extend or replicate that in a script, with possibilities like default inflected forms, more than one of the same kind of inflected form, etc. I can conceptualize what I want to achieve, but API-wise I'm in over my head. - Gilgamesh~enwiki (talk) 02:14, 24 December 2019 (UTC)

I think I found some resources to start with, chiefly Module:headword. - Gilgamesh~enwiki (talk) 18:02, 24 December 2019 (UTC)

Yeah, the language-specific headword-line modules call full_headword in Module:headword and if necessary format_categories in Module:utilities to format extra categories that don't begin with the language name. In the Marshallese module there could be a main function that generates the MOD spelling and it can call one of the pos_functions to handle part-of-speech-specific stuff. I'm not sure what is a good module to base the Marshallese one on though. Much of Module:eo-headword is probably understandable because the morphology is simple at least. — Eru·tuon 19:52, 24 December 2019 (UTC)
Now that I understand the technical aspects better of implementing the template, I realize I still need a better understanding of the grammar, so I'll put it off for the time being. After all, I'm sure there may be all sorts of unforeseen errors in the Wiktionary entries that could be remedied with a better understanding of both Marshallese grammar and the MOD entry structure. - Gilgamesh~enwiki (talk) 05:04, 25 December 2019 (UTC)

Distributive verbs

I think sometimes I forget just how much technical work you do here at Wiktionary, beyond just helping me with a Marshallese module. I created a new category, Category:Marshallese distributive verbs, but {{auto cat}} shows this category is not supported. What would be involved in creating new grammar categories? - Gilgamesh~enwiki (talk) 13:45, 14 January 2020 (UTC)

Some brief background: Marshallese distributive verbs basically modify a noun or verb with the rough inflected meaning of "there are a lot of [something]s." This particular grammatical form is demonstrated extensively in example sentences throughout the Marshallese-English Online Dictionary. - Gilgamesh~enwiki (talk) 13:53, 14 January 2020 (UTC)

The "distributive verbs" category should only be added to the category system (Module:category tree/poscatboiler/data/lemmas probably) if it's going to be used in other languages and the meaning is roughly the same for all of them – meaning if there are distributive verbs in another language with a different meaning, that doesn't allow us to have a single description for every language's distributive verbs category. At least to start with, it can have manual content. — Eru·tuon 23:38, 15 January 2020 (UTC)
That seems logical. Since I'm not specifically aware of distributive verbs being in any other language, I couldn't guarantee they would mean the same thing in those languages. As it is, Marshallese already uses at least a few relatively exotic grammatical forms that only one or a few other languages use—for instance, besides Category:Marshallese noun construct forms, there's only Category:Hebrew noun construct forms as subcategories of Category:Noun construct forms by language. Then there's also adjective verbs, which I initially categorized as Category:Marshallese adjectives, but then wondered if they shouldn't be better in Category:Marshallese stative verbs (there are no adjectives that are not verbs), when in reality these grammatical categories don't always easily fit in the existing conventional hierarchy, and I'm not proficient enough in the language myself to make confident decisions about their placement, and I fear I may be introducing errors that might have to be fixed in bulk at a later date. - Gilgamesh~enwiki (talk) 06:28, 16 January 2020 (UTC)

@Erutuon Wow, you are a busy bee. I think I have even greater respect for what you do here than I did even just 24 hours ago. As much as I would appreciate your continued feedback in my ongoing endeavors, I can still wait. - Gilgamesh~enwiki (talk) 23:28, 15 January 2020 (UTC)

Bug

@Erutuon There's a bug in the module's debug table, most noticeable with words whose Bender spellings start with "yiy" and a vowel. In line with references explaining how Marshallese words can be enunciated phoneme by phoneme, I'm testing an experimental enunciate-mode, where short prosodic breaks [|] are inserted in the middle of consonant clusters. The problem is...the International Phonetic Alphabet specifies these as pipe characters |. I already tried hard-coding {{!}} in the module output, but it only looks like {{!}}. So now I'm using a normal pipe character, but there's a bug in the way the module's debug table displays it. What's only displaying æ.e.kʷwɤtʲ] should actually be displaying [i | æ.e.kʷwɤtʲ] - Gilgamesh~enwiki (talk) 19:03, 16 January 2020 (UTC)

@Gilgamesh~enwiki: Fixed, in the testcases module, by escaping the pipes. They are part of template syntax, and in this case the stuff before the pipe was being treated as attributes for the table cell. — Eru·tuon 19:14, 16 January 2020 (UTC)
Thank you. :) - Gilgamesh~enwiki (talk) 19:25, 16 January 2020 (UTC)
Just FYI: it's unnecessary to ping someone on their talk page, because they already get a notification just from someone else editing their talk page. Chuck Entz (talk) 04:11, 17 January 2020 (UTC)
Ahh, good to know. - Gilgamesh~enwiki (talk) 06:37, 21 January 2020 (UTC)
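
The pipe-escaping fix described earlier in this section can be illustrated with a minimal Python sketch. It is not the testcases module's actual code: the helper names are hypothetical, and the use of the HTML entity `&#124;` is an assumption, since the thread does not say which escape was applied. The idea is only that a literal pipe inside cell text must not reach the table parser as a delimiter.

```python
def escape_cell(text: str) -> str:
    """Replace literal pipes so table syntax does not read them as delimiters.

    Assumption: the HTML entity &#124; is an acceptable stand-in for "|".
    """
    return text.replace("|", "&#124;")


def table_row(cells):
    """Build one wikitext table row from already-computed cell strings."""
    return "| " + " || ".join(escape_cell(cell) for cell in cells)


# Without escaping, everything before the pipe in the IPA string would be
# parsed as cell attributes and dropped from the display.
row = table_row(["example", "[i | æ.e.kʷwɤtʲ]"])
print(row)
```

Escaping at the point where the row is assembled keeps the pronunciation code itself free to emit the plain IPA pipe.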

Ratak and Rālik specific word categories

How do I set this up? So things work in {{lb}}, and so forth. I know similar categories exist for Category:Indian English, Category:New Zealand English, etc. The Ratak Chain and Rālik Chain dialects of Marshallese are mutually intelligible, and differ mainly by some regular variations in pronunciation reflex, and some vocabulary differences. But many of the different forms are often still written differently depending on dialect. For instance, m̧m̧an "good" is the common stem, em̧m̧an is the Rālik reflex, and m̧ōm̧an is the Ratak reflex, but in both dialects the prothetic vowel vanishes if the stem takes a bare vowel prefix: rūm̧m̧an (ri- + m̧m̧an) means "good person." I want to start making articles for the stem forms, and have their dialect reflex entries (by spelling) automatically categorized through {{lb|mh|Ratak}}, {{lb|mh|Ralik}}/{{lb|mh|Rālik}}, etc. I should add that I don't know if the dialects themselves have supplemental language codes, the same way Tosk Albanian is "als" (Albanian, South) and Gheg Albanian is "aln" (Albanian, North).

I'm not sure what to name the categories, though—"Rālik Marshallese"? "Rālik dialect Marshallese"? "Rālik Chain Marshallese"? I'm not sure what the most stable nomenclature would be. In the Marshallese-English Online Dictionary, they're also frequently just called "Dial. W" and "Dial E.", since Rālik ("sunset") is the western chain and Ratak ("sunrise") is the eastern chain, but the two dialects' native isogloss line still runs between the two chains themselves.

I should probably add...I'm not 100% sure that I know what I'm doing. It's one thing to know how templating and scripting languages work (which I increasingly know), and another thing entirely to know how existing templates and scripts are set up so I can extend them for specific editing needs. - Gilgamesh~enwiki (talk) 01:14, 20 January 2020 (UTC)

@Gilgamesh~enwiki: Categories for most language varieties are added to entries via Module:labels/data/subvarieties. You can add definitions for the labels {{lb|mh|Ratak}} and {{lb|mh|Ralik}} there, with categories and linked display text if desired. Personally, I like the shorter category name: "Rālik Marshallese". The category page can explain what it means. It looks like there aren't ISO codes for Rālik and Ratak, but if they might be referred to in etymologies (for instance, {{der|en|<code for ralik>|word}}), then they could be given Wiktionary codes in Module:etymology languages/data too. — Eru·tuon 19:34, 20 January 2020 (UTC)
Thank you, I'll check out the subvarieties. And if nothing else, "mh-ralik" and "mh-ratak" may suffice as ad hoc language codes if ever needed. - Gilgamesh~enwiki (talk) 19:41, 20 January 2020 (UTC)

Enunciated columns in the debug table

In addition to the previous section I just wrote, I was wondering...do we risk the module timing out if we add additional enunciated columns to it? Seeing that enunciated mode has since been fully deployed to articles wherever a consonant cluster exists in the phonemic form, acting on previously unread documents that Austronesier and I discussed at wikipedia:Talk:Marshallese language—see kajin M̧ajeļ for a good example of how normal phonetic and enunciated IPA can differ. And it's not just the absence of consonant assimilations or epenthetic vowels, but also some different vowel reflexes simply as a consequence of the last vowel before a consonant cluster being the last vowel of its prosodic fragment and the first vowel after a consonant cluster being the first vowel of its prosodic fragment—see eakeak, tuen̄ and utut to see what I mean. (Incidentally, you may be pleased to see that Arņo now shows two different consonants when enunciated.)

As for how the added columns would work, enunciated forms would only differ between dialects if their normal phonetic forms already differ (because of the limits in the differences between dialect reflexes), so I'm thinking something like: phonetic (Rālik), enunciated (Rālik), phonetic (Ratak), enunciated (Ratak), with each dialect's phonetic and enunciated columns merging if they're the same, and all four columns merging if all four forms are the same.

If we'd be taxing our Scribunto/Lua allowances too much for the one table, I could instead set it to show enunciated mode in the sandboxed version as a temporary visual aid during relevant discussions, but still there are now effectively four different phonetic modes to debug. - Gilgamesh~enwiki (talk) 16:11, 20 January 2020 (UTC)

@Gilgamesh~enwiki: At the moment there's no risk of the testcases timing out, even if they take twice as much Lua processing time as they do now, because it's still under a second, and they've got a limit of ten seconds. The page does take a bit long to parse now though: the "real time" can be as much as 2 seconds (not quite as long as for Wiktionary:List of languages: ~6 seconds).
I'll take a look at how to handle the enunciated mode. I do like it using spaces; it looks quite intuitive to me. — Eru·tuon 09:26, 21 January 2020 (UTC)
Well, it looks like if the table starts to balloon that big, we may have to start excluding other words that simply won't get displayed. Perhaps some of the least bug-prone words with the least complicated logic involved, like for instance those with invariable /ʲVʲ, ˠVˠ, ʷVʷ/ vowels and no clusters, like jeen and ļan̄. But for now, nothing needs to be removed and it may never get to that point. And I suppose there's still a chance I could improve the module's efficiency in other areas. - Gilgamesh~enwiki (talk) 13:14, 21 January 2020 (UTC)
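
The column-merging rule proposed above (phonetic and enunciated columns per dialect, merged when equal) can be sketched in Python as follows. This is only an illustration of the rule as worded in the discussion, not code from the Marshallese module; the function name and the (text, colspan) representation are assumptions.

```python
from typing import List, Tuple


def merge_columns(ralik_phon: str, ralik_enun: str,
                  ratak_phon: str, ratak_enun: str) -> List[Tuple[str, int]]:
    """Return table cells as (text, colspan) pairs.

    - All four forms equal: one cell spanning four columns.
    - Otherwise, per dialect: one two-column cell if the phonetic and
      enunciated forms match, else two separate cells.
    """
    forms = [ralik_phon, ralik_enun, ratak_phon, ratak_enun]
    if len(set(forms)) == 1:
        return [(ralik_phon, 4)]

    cells: List[Tuple[str, int]] = []
    for phon, enun in ((ralik_phon, ralik_enun), (ratak_phon, ratak_enun)):
        if phon == enun:
            cells.append((phon, 2))
        else:
            cells.extend([(phon, 1), (enun, 1)])
    return cells


# Placeholder strings stand in for IPA transcriptions.
print(merge_columns("a", "a", "a", "a"))
print(merge_columns("a", "a", "c", "d"))
```

Whatever the inputs, the colspans in each returned row always add up to four, so the table stays rectangular.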

Voicing of fricatives in Old English pronunciation transcriptions

I saw that you've recently edited a bunch of Old English entries to replace /z/ with /s/, leaving the comment that [z] is an allophone of /s/ in Old English. That is arguably true, but I think the removal of /z/ from Old English transcriptions brings up a few more issues that ought to be addressed. First, the reason I say the allophonic status of [z] is "arguable" is because there are in fact some contexts where the use of a voiced vs. a voiceless fricative may not be completely predictable from the phonological context. See "Phonemically Contrastive Fricatives in Old English?", by Donka Minkova, for a description of some of the relevant evidence and references to prior literature that discusses the topic (Minkova does support the interpretation that the voiced and voiceless fricatives were allophones in Old English). The other issue, more important in my opinion, is a matter of consistency: two other voiced fricatives, [v] and [ð], are commonly analyzed as allophones of /f/ and /θ/. So a transcription like "/ˈt͡ʃiyvese/" for ciefese seems fairly problematic: if we decide to use /s/ here, I think it would be better to also use /f/, giving /ˈt͡ʃiyfese/. And in fact, considering that the allophonic realization of voiceless fricative phonemes as voiced fricatives doesn't come naturally to modern English speakers, and that (as mentioned above), the distribution of the voiced and voiceless allophones in Old English is somewhat complicated, I think it would be worthwhile to include a phonetic transcription using [v] and [z] in addition to a phonemic transcription with /f/ and /s/ for words like this.--Urszag (talk) 21:36, 31 October 2019 (UTC)

@Urszag: Sorry, I really did make a mess with my edits. I will search for phonemic transcriptions with /ð/ and /v/ and correct them as well.
It would be easier to just generate Old English transcriptions with Module:ang-pronunciation, which I started but never completed. I agree that there should be phonetic transcriptions for words in which /f s θ/ are voiced. Words with hard allophones of /j/, like eċġ, whose phonemic transcription is /ejj/, would also benefit from phonetic transcriptions (assuming that the "hard" and "soft" pronunciations of ġ are indeed allophones) because the change from /j/ to [d͡ʒ] is a bit surprising. — Eru·tuon 14:46, 1 November 2019 (UTC)

Review of NEC rewrite

WDYT about the result? Should I move the function processor() and function setup_click_keyup() out of the setup_infl()?--So9q (talk) 19:17, 4 November 2019 (UTC)

I'm still very confused by the script, but it looks much improved. I have some cleanup ideas. It's probably a good idea to add a nec- prefix to the NEC parameters in the URL, to avoid collisions, and it's traditional to use hyphens in class names rather than underscores. I've made the script use mw.util.getParamValue instead of a custom function.
I loaded the scripts, and some of the translation links are colored; but clicking the links doesn't show the NEC. Maybe I broke User:So9q/new-entry-creator.js when I edited it? — Eru·tuon 20:12, 4 November 2019 (UTC)
I just tested and it still works for me clicking translation links. Although for now CreateTranslation.js only supports fetching the first PoS. There is also a bug with lang=code not being set.--So9q (talk) 16:30, 6 November 2019 (UTC)
Oh, it's working now for me too. That's odd. — Eru·tuon 17:07, 6 November 2019 (UTC)
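
The suggestion above to give the NEC URL parameters a nec- prefix can be illustrated with a small Python sketch. The gadget itself is JavaScript and, per the thread, reads parameters with mw.util.getParamValue; the code below is only a stand-in showing why a prefix avoids collisions. The parameter names (nec-lang, nec-pos) and the example URL are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def prefixed_params(url: str, prefix: str = "nec-") -> dict:
    """Collect only the query parameters that carry the given prefix.

    Unprefixed parameters such as title or action are ignored, so the
    script's own parameters cannot be confused with them.
    """
    query = parse_qs(urlparse(url).query)
    return {
        key[len(prefix):]: values[0]
        for key, values in query.items()
        if key.startswith(prefix)
    }


url = "https://example.org/w/index.php?title=x&action=edit&nec-lang=da&nec-pos=noun"
print(prefixed_params(url))
```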

Adding aliases to Module:family tree

You've done a lot of work on this. Now that we have aliases for etymology languages, I'd like to display them, either in the family tree or in an info box, similar to what we have with {{langcatboiler}}. Maybe we should have {{etym lang cat}} for etymology language categories; currently these categories, when they exist, aren't standardized in name or contents. Benwing2 (talk) 05:40, 15 November 2019 (UTC)

@Benwing2: I've thought of creating a template for etymology language categories, but I got hung up over an unresolved issue. At the moment, many etymology language categories just have a category for the canonical name (Category:Attic Greek), though there is also Category:Kölsch Central Franconian corresponding to Kölsch (ksh). Entries are added to the categories using {{lb}} and {{tlb}}. Ideally lemmas and non-lemma forms would be in different categories, but I didn't know how to do that. It would be weird to have to specify lemmas or non-lemma forms in {{tlb}}, like having {{tlb|grc|Epic Greek lemmas}} or {{tlb|grc|Epic Greek non-lemma forms}} display as "(Epic)" but add different categories, and I didn't know how to accommodate that in Module:labels and couldn't think of another good way to add the categories. So I never came up with any kind of action plan. Maybe this issue doesn't have to be solved right away though. — Eru·tuon 19:52, 15 November 2019 (UTC)
One possibility is to allow etymology languages in {{head}}, which knows about the POS and hence whether it's a lemma or not. The only other way I can think of without having the POS or lemma status marked explicitly in {{tlb}} is for {{tlb}} to look through the page text, which is expensive and likely error-prone. Benwing2 (talk) 18:11, 16 November 2019 (UTC)

Χαῖρε! On 21st century Wiktionary we shouldn't perpetuate the biases of 19th century Englishmen; Doric is real Ancient Greek! Not a subdialect of Attic...

Χαῖρε, hello, nice to (virtually) meet you...
With regard to recent edits on ἅρπα I wasn't sure where to post this, I was just responding specifically vis-à-vis the Doric Greek morphology of ἅρπα but ran long touching on the broader subject of Greek dialects and their inclusion on Wiktionary, so I'll post this full comment on your talk page too...
Personally I am bewildered that a simple 1st declension noun like Doric ἅρπα for Attic ἅρπη would be controversial...? This is pretty basic Ancient Greek dialectal morphology variance. Doric (and Aeolic) retain original ᾱ which Attic changed to η in many cases (there are exceptions after certain letters ε, ι, ρ; whereas Ionic nearly always changes old ᾱ to η). 1st declension singular -ᾱ, -ᾱς, -ᾳ, -ᾱν. In the plural the forms are the same as Attic except in the genitive plural Doric -ᾱων typically contracts to -ᾶν. Unlike some other dialectal variances, on an academic level Doric 1st declension in -ᾱ, -ᾱς for Attic -η, -ης is a fairly well-established consistent paradigm, a minor lengthening of one vowel...
....and Western/Central Greek dialects (Doric-Aeolic) preserved ᾱ which was the original Ancient Greek form; Attic-Ionic lengthening ᾱ to η was a later dialectal novelty unique to the Eastern Greek dialects (Attic-Ionic). Attic is in fact the variant form here from the original authentic archaic Greek form which Aeolic and Doric much more faithfully preserved...to this day Tsakonian, descended from Doric, spoken in the Peloponnese (albeit sadly endangered) preserves ancient α where later Attic-derived Greek substituted η.
And in the ancient world, Doric and Aeolic Greek is what they spoke in Sparta and all of Laconia, in Thebes and all of Boeotia, in Epirus, in Achaea and Thessaly, Corinth and Olympia, on the islands of Lesbos and of Crete (also a bastion of preservation for the most authentic original Ancient Greek, being the birthplace of Greek civilization going back to the Mycenaean Greeks and Minoan Greeks), and also in much of Magna Græcia (Italy and Sicily), including Syracusæ in Sicily, the home of Archimedes, and by the Classical period the greatest and most significant rival city of Athens in the Hellenic world, by some sources Syracusæ was even larger and more significant than Athens. (And of course if you know your history, Athens deciding to launch an infamous "Sicilian Expedition" to attack Doric Syracusæ during the Peloponnesian War would prove a catastrophic ruinous mistake for the Athenians).
This seems to touch on the other general problem raised by recent edit reverts, which is bias in Wiktionary's coverage of Ancient Greek hitherto, bias that should be removed. A 21st century electronic 'Wiktionary' should not perpetuate biases of 19th century-20th century elite French and Englishmen who based on historical judgments idolized all things Athens, put up on an Ionic pedestal (the other 2 Greek column orders being Doric and Corinthian, both Dorian speakers!) while demonizing and denigrating Sparta and all of the Doric and Aeolic Greek worlds, in fact all of Ancient Greek linguistic history except for c. 5th century BC Athens. Biased scholars many centuries later decided that Attic was superior and real Greek while other dialects mere imitators, Archimedes in Syracusæ did not speak Ancient Greek of the Doric dialect, rather he spoke an inferior "Doric forms" of REAL Greek which is only Attic.
Other than such historical bias, there is no reason why distinct words and forms of Ancient Greek in Doric or Aeolic should just link to the Attic form as REAL Ancient Greek. Attic has more unique local novelties diverging from standard Ancient Greek than Doric/Aeolic. In their time Doric and Aeolic Greek were of equal if not greater significance, and spoken by far more people than the novel local dialect of Athens, which again only became looked at as the "model"
Doric Greek is different from Attic Greek, different enough that Doric/Aeolic forms deserve their own entry (at least a West Doric/Aeolic separate from Attic/Ionic). Different but an equally valid form of Ancient Greek in its own right and merits inclusion of Doric/Aeolic forms that stand on their own, not just (mis)represented as inferior variant forms of Attic. The language is called "Ancient Greek", NOT "Attic Greek". Doric/Aeolic Greek words and forms should be added/provided whenever possible, and as their own entries, not links to Attic; 'tis biased historical revisionism to imply Doric and Aeolic Greek are just variant forms of REAL (Attic) Greek, when in fact the dialects developed independently and were of equal standing and significance in the time when they were actually spoken and used as living languages (and Doric was actually closer to the original, Attic was the odd local provincial dialect that diverged most from Proto-Hellenic). As a reference source for all languages including ancient languages no longer spoken (some of which far more speculative like e.g. Phoenician/Punic), Wiktionary (and Wiktionarians) should seek to provide Doric Greek entries no less so than Attic entries. The biases of the recent past against any form of Greek except 5th century BC Athens dialect should be left on the ash heap of history. Rather, for a fair, unbiased and thorough modern reference source on Ancient Greek, the dialects should be treated equally as their own forms of Ancient Greek language with their own unique morphology.
Reducing Doric/Aeolic Greek words to mere dialectal variants of Athens just linking to the Attic variant is akin to having Aragonese, Asturian, Catalan, Galician, Leonese, Occitan, even Portuguese, all just have links to the (Castilian) Spanish entry e.g. Catalan joventut entry should say just "Catalan form of juventud" with a link to the Castilian Spanish juventud entry. After all, like Attic among Greek dialects, Castilian Spanish is the clear historical winner of the Ibero-Romance languages, the other Ibero-Romance languages are historical losers, just inferior imitation dialect forms of Spanish language not worth recording and preserving in their own right, like Doric and Aeolic are just inferior imitation dialects of Attic REAL Greek...
Respectfully, I would suggest perhaps re-examining your potential ingrained Athenocentric biases that have plagued Greek classrooms and textbooks and lexicons for the past few centuries which conflate Attic Greek with Ancient Greek, and which ignore or disparage other dialects as irrelevant inferior imitations of Attic at best, missing the forest through the trees; try to zoom out and get a new bigger picture perspective conscious of these insidious deeply ingrained...some of us have actually studied and are actually interested in researching and preserving Doric and Aeolic Greek for their own sake as equally valid and historically and linguistically significant forms of Ancient Greek, not as mere trivial inferior variant subdialects of Attic. Someone who wants to research Doric Greek forms should not have to click through every entry to go see the Attic variant as the "real" form. Attic is the spin-off from the original, not Doric! And at the very least Doric and Aeolic Greek entries deserve to exist! Especially such simple forms conforming to basic paradigms of what we know about the standard morphology and usage of Doric and Aeolic Greek dialects. Wiktionary cannot claim to have comprehensive coverage of Ancient Greek as a reference source if it neglects the other equally significant, equally legitimate, equally valid, equally deserving divergent dialects. Wiktionarians should seek to add Doric Greek entries just like they add Catalan and Galician or Asturian despite being variants of far more well-known and widely used Castilian Spanish which like Attic Greek just happened to win the historical winners-and-losers lottery...
And this is the case with Doric-Aeolic ἅρπα, ἅρπᾱς, an equally valid independent Western Greek form deserving of its own entry distinct from the Eastern Greek Attic-Ionic variant ἅρπη, ἅρπης...across many other languages there are many far more redundant forms of words in closely related languages (often forms identical or nearly identical, more closely related than the rainbow of diverse Western Ancient Greek and Eastern Ancient Greek dialects) that may not be so commonly used but are considered worthwhile to preserve as a comprehensive linguistic reference source database.

Herbert Weir Smyth, A Greek Grammar for Colleges http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.04.0007%3Apart%3D2%3Achapter%3D13%3Asection%3D13 Smyth grammar 2.13.13 FIRST DECLENSION (STEMS IN α_)

[*] 214. The dialects show various forms.

[*] 214 D. 1. For η, Doric and Aeolic have original α_; thus, νί_κα_, νί_κα_ς, νί_κᾳ, νί_κα_ν; πολί_τα_ς, κριτά_ς, Ἀτρείδα_ς.

2. Ionic has η for the α_ of Attic even after ε, ι, and ρ; thus, γενεή, οἰκίη, ἀγορή, μοίρης, μοίρῃ (nom. μοῖρα^), νεηνίης. Thus, ἀγορή, -ῆς, -ῇ, -ήν; νεηνίης, -ου, -ῃ, -ην. But Hom. has θεά_ goddess, Ἑρμεία_ς Hermes.

3. The dialects admit -α^ in the nom. sing. less often than does Attic. Thus, Ionic πρύμνη stern, κνί_ση savour (Att. πρύμνα, κνῖσα), Dor. τόλμα_ daring. Ionic has η for α^ in the abstracts in -είη, -οίη (ἀληθείη truth, εὐνοίη good-will). Hom. has νύμφα^ oh maiden from νύμφη.

8. Gen. plur.—(a) -ά_ων, the original form, occurs in Hom. (μουσά_ων, ἀγορά_ων). In Aeolic and Doric -ά_ων contracts to (b) -ᾶν (ἀγορᾶν). The Doric -ᾶν is found also in the choral songs of the drama (πετρᾶν rocks). (c) -έων, the Ionic form, appears in Homer, who usually makes it a single syllable by synizesis (60) as in βουλέων, from βουλή plan. -έων is from -ήων, Ionic for -ά_ων. (d) -ῶν in Hom. generally after vowels (κλισιῶν, from κλισίη hut).

Perseus Greek Word Study Tool:

http://www.perseus.tufts.edu/hopper/morph?l=arpa&la=greek#lexicon ἅρπα noun sg fem nom doric aeolic ἅρπα noun sg fem nom doric aeolic

http://www.perseus.tufts.edu/hopper/morph?l=arpas&la=greek#lexicon ἅρπας noun sg fem gen doric aeolic

Greek morphological index (Ελληνική μορφολογικούς δείκτες):

Nominative: https://morphological_el.academic.ru/687234/%E1%BC%85%CF%81%CF%80%CE%B1%CF%82#sel=10:3,10:3 ἅρπας

   ἅρπᾱς , ἅρπη
   bird of prey
   fem acc pl
   ἅρπᾱς , ἅρπη
   bird of prey
   fem gen sg (doric aeolic)

Accusative: https://morphological_el.enacademic.com/687226/%E1%BC%85%CF%81%CF%80%CE%B1%CE%BD ἅρπαν

   ἅρπᾱν , ἅρπη
   bird of prey
   fem acc sg (doric aeolic)

Inqvisitor (talk) 08:24, 16 November 2019 (UTC)

Hi, it looks like your post in WT:RFVN is substantially the same. In future, please post in just one place. You can bring my attention to the post by including a link to my user page (Erutuon). That will send me a notification. — Eru·tuon 09:04, 16 November 2019 (UTC)

On the reversal of my edit on the article on ışık

You reverted my edit on the page ışık. Why is that? The declension adds nothing to the article (the nominative declension is the word itself and the accusative declension is already given in the {{tr-noun}} template: "ışık (definite accusative ışığı, plural ışıklar)"). In my opinion, the templates {{tr-infl-noun-c}} and {{tr-infl-noun-v}} shouldn't be used anywhere on Wiktionary as they provide no information that {{tr-noun}} doesn't already provide but only bloat the site. --Fytcha (talk) 18:16, 6 December 2019 (UTC)

@Fytcha: There are a lot more forms in the table than just the definite accusative and the plural (ışık, ışığı, ışıklar, ışıkları, ışığa, ışıklara, ışıkta, ışıklarda, ışıktan, ışıklardan, ışığın, ışıkların, ışığım, ışıklarım, ışığımız, ışıklarımız, ışığınız, ışıklarınız), but they are hidden by default. You've got to click two "more" buttons on the right side of the table to see them. — Eru·tuon 18:22, 6 December 2019 (UTC)

Another Rustacean :)

I noticed that you are working in Rust. It has become my favourite language recently, although for Wiktionary bot work I still use Python. —Rua (mew) 11:01, 9 December 2019 (UTC)

I've become quite fond of it as well, and now often miss features like return values from blocks and match blocks when programming in Lua. — Eru·tuon 19:36, 9 December 2019 (UTC)
@Rua, Erutuon: I'm interested in things you dislike about Rust. I looked at it a while ago, and there was a lack of libs for doing standard stuff (talking to a database etc.), but that's probably changed in the meantime. - Jberkel 00:26, 10 December 2019 (UTC)
Yeah, the development is going pretty fast. Not just the language itself, but library infrastructure as well. —Rua (mew) 10:14, 10 December 2019 (UTC)

If you ever have time

I hate to bother you all the time. If you ever have time, could you check el:Module:sarritest? The only person in el.wikt who knew Lua is now a 'vanished' user. sarri.greek (talk) 00:00, 11 December 2019 (UTC)
Thank you so much! sarri.greek (talk) 18:48, 11 December 2019 (UTC)

@sarri.greek: Let me know if you need any more help or further explanation. — Eru·tuon 18:51, 11 December 2019 (UTC)
The basic ideas of Lua, I cannot grasp. I have tried all kinds of combinations of the words 'local', 'frame', but I cannot make the collective function.main work. It is just an exercise, it is not important.
One general question, if I may: When we have a module which produces declensions automatically like el:Module:κλίση/el/ουσιαστικό, is it better/preferable to do all the paradigms IN the Module? Or create wikitext Templates with the parameters for the endings? They are so many! and the Module page becomes so long! sarri.greek (talk) 16:08, 13 December 2019 (UTC)
It turns out I had reversed the logic for getting args. That's not uncommon with me.
Do you mean separate templates for each declension? I suppose either way works, but I like to be able to edit all the paradigms at once and compare them, so having them in a single module helps. For Ancient Greek, the module is Module:grc-decl/decl/staticdata/paradigms. If each is in a separate template, then there are more pages to edit. — Eru·tuon 19:04, 13 December 2019 (UTC)
Thank you SO much. For the many pages of paradigmata: I was worried about what is best for ...errr... you call some actions 'expensive' or bad, or not good. I will study the examples you have shown me. sarri.greek (talk) 19:09, 13 December 2019 (UTC)
Ahh, I see. I'm not sure which is least expensive in memory and Lua processing time. — Eru·tuon 19:20, 13 December 2019 (UTC)
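
The single-module approach preferred above (all paradigms in one place, so they can be edited and compared together) can be sketched in Python as one data table plus one function. This is not the code of Module:grc-decl or of the el.wikt module; the names are hypothetical, and the sample data are the first-declension singular endings quoted elsewhere on this page, used purely for illustration. Plural forms and accent shifts are deliberately left out.

```python
# One table holds every paradigm, so endings can be compared side by side.
# Sample data only: singular endings, with no accent handling.
PARADIGMS = {
    "attic-eta": {"nom": "η", "gen": "ης", "dat": "ῃ", "acc": "ην"},
    "doric-alpha": {"nom": "ᾱ", "gen": "ᾱς", "dat": "ᾳ", "acc": "ᾱν"},
}


def decline(stem: str, paradigm: str) -> dict:
    """Attach each ending of the chosen paradigm to the stem."""
    return {case: stem + ending for case, ending in PARADIGMS[paradigm].items()}


forms = decline("ἅρπ", "doric-alpha")
for case, form in forms.items():
    print(case, form)
```

Adding a paradigm then means adding one entry to the table instead of creating another template page, which is the editing convenience mentioned in the reply. Which option costs less memory or Lua time is left open in the thread, and this sketch says nothing about that.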

Req

Hi Erutuon. Can you run a bot to do this:

moving translations with ku code and Latin script to kmr code and Northern Kurdish dialect
moving translations with ku code and Arabic script to ckb code and Central Kurdish dialect

also this:

changing translations with ku code and Latin script to kmr code
changing translations with ku code and Arabic script to ckb code

also we shouldn't allow ppl to add translations with ku code; they should use Kurdish dialects codes (kmr, ckb, ...) instead of using ku code directly. Thanks.--Calak (talk) 16:50, 13 December 2019 (UTC)

Hmm, I know how to identify scripts, but don't have a method to modify translations yet. I can at least make a list to start with. — Eru·tuon 08:13, 14 December 2019 (UTC)
Oh, no! You don't need to modify translations, you should change "ku" code to "ckb" or "kmr" per its script.--Calak (talk) 11:15, 14 December 2019 (UTC)
Right, by modifying translations I mean moving translations from "Kurdish" to "Northern Kurdish" etc. while using the correct format (the first diff). For that, it would be nice to have a method that would move translation x from language a to language b and format everything correctly. It seems complicated though. Perhaps someone else has worked this out already. But I might be able to change language codes easily (the second diff). — Eru·tuon 22:25, 14 December 2019 (UTC)
OK. How about preventing people from using the ku code in translations? Can you add code (in the TranslationAdder gadget) to do this?--Calak (talk) 16:19, 15 December 2019 (UTC)
@Calak: Hmm, perhaps the TranslationAdder could suggest inserting the translation under ckb, kmr, or sdh instead of ku? I might be able to figure out how to do that but I've mostly stayed away from that gadget because its code confuses me. — Eru·tuon 09:14, 17 December 2019 (UTC)

It is OK Erutuon. I will be thankful if you can apply any one of them.--Calak (talk) 07:12, 21 December 2019 (UTC)
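
For illustration, the simpler half of the request (changing the language code according to script, the second diff) could be sketched roughly as below. This is only a sketch under assumptions: the function names `guess_code` and `recode_ku` are made up here, the set of translation templates matched (`t`, `t+`, `tt`, `tt+`) is assumed, and script detection relies on Unicode character names rather than on any Wiktionary module. It does not move translations under a different language heading (the first diff).

```python
import re
import unicodedata

# Matches the start of {{t|ku|term, {{t+|ku|term, {{tt|ku|term, {{tt+|ku|term
# and captures the template name and the term.
KU_TRANSLATION = re.compile(r"\{\{(t\+?|tt\+?)\|ku\|([^|}]+)")


def guess_code(term):
    """Return 'kmr' for Latin-script terms, 'ckb' for Arabic-script terms,
    and None when the term is in another script, mixed, or has no letters."""
    codes = set()
    for ch in term:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("LATIN"):
                codes.add("kmr")
            elif name.startswith("ARABIC"):
                codes.add("ckb")
            else:
                codes.add(None)
    return codes.pop() if len(codes) == 1 else None


def recode_ku(wikitext):
    """Replace the ku code in translation templates when the script is unambiguous."""
    def replace(match):
        code = guess_code(match.group(2))
        if code is None:
            return match.group(0)  # leave ambiguous cases for a human
        return "{{%s|%s|%s" % (match.group(1), code, match.group(2))
    return KU_TRANSLATION.sub(replace, wikitext)
```

Anything the sketch cannot classify is left untouched, so it could also be used to produce the list of remaining cases mentioned above.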

Reverted Edit

Hello, it is not an "odd alternative pronunciation". Several million people pronounce it that way, whereas the mispronunciation of "decade" has about five variants on the site for about 10 speakers. ABAlphaBeta (talk) 08:39, 17 December 2019 (UTC)

@ABAlphaBeta: I'm sorry for my hasty reversion. I've restored the alternative pronunciation that you probably meant (as User:Mellohi! pointed out to me), but moved it into {{fr-IPA}}: {{fr-IPA|écuidistant|équidistant}}. I know very little about the fine details of French pronunciation and you may be right. Words with équi- (or ultimately derived from aequus) are transcribed with either /e.kɥi/ or /e.ki/ on Wiktionary, and while the sound files of équidistant on the French Wiktionary and on Forvo have /e.kɥi/, perhaps some people pronounce it with /e.ki/ like équilibre and other words because it may be as confusing for French speakers as it is for foreigners like me. — Eru·tuon 09:10, 17 December 2019 (UTC)

Deletion reasons

Hi. In October, you added "Incorrect title: a mixture of Latin- and Cyrillic-script characters". Do you think this could be merged into the existing "Bad entry title"? How do they differ? Equinox 08:05, 20 December 2019 (UTC)

@Equinox: Well, it's certainly a subtype, but I prefer to be clear since it's not always easy to see what's wrong with the title. I was thinking maybe something like "mixed script" or "incorrect lookalike characters" would work as well. At the time there was a backlog of these titles, and I was getting tired of re-entering the deletion reason since the "content: ..." bit prevented the input box history from working. But perhaps it won't be needed now that there's this abuse filter. It displays a message showing which characters are in which script, which seems to enable editors to create the entry at the right title, so there aren't any new badly titled entries to delete. — Eru·tuon 08:33, 20 December 2019 (UTC)
Yeah, went and removed it. — Eru·tuon 08:56, 20 December 2019 (UTC)
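
As a rough illustration of the kind of check discussed here (showing which characters of a title are in which script), the following sketch groups letters by the first word of their Unicode character name. It is not the abuse filter's actual logic; the function names are hypothetical and the "script" label is only an approximation taken from character names.

```python
import unicodedata
from collections import defaultdict


def scripts_in_title(title):
    """Group the letters of a title by a rough script label,
    taken from the first word of each Unicode character name."""
    groups = defaultdict(list)
    for ch in title:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            groups[script].append(ch)
    return dict(groups)


def is_mixed_latin_cyrillic(title):
    """True when a title contains both Latin and Cyrillic letters."""
    found = scripts_in_title(title)
    return "LATIN" in found and "CYRILLIC" in found
```

The grouped output is what makes the problem visible to an editor, since lookalike letters cannot be told apart by eye.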

Help needed at simple.wikt

Hi Erutuon, can you help me with the Lua Module:number list on simple.wikt? Minorax (talk) 05:10, 29 December 2019 (UTC)

@Minorax: Sure... I did fix one problem that caused a module error. — Eru·tuon 05:30, 29 December 2019 (UTC)
So that was the problem, forgot about that. Thank you :) Minorax (talk) 05:37, 29 December 2019 (UTC)
And since simple.wikt only contains English words, Module:number list/data/en isn't really needed as a subset of the module, is it possible to merge it into the main module? Minorax (talk) 05:41, 29 December 2019 (UTC)
It's possible, but I wouldn't recommend it. Putting data in the main module adds many lines, making it harder to edit, and if you want to keep the Simple Wiktionary module in sync with the English Wiktionary module, it will be harder to copy code. — Eru·tuon 05:51, 29 December 2019 (UTC)
Alright :) Minorax (talk) 05:52, 29 December 2019 (UTC)

Wiktionary:Todo/multiword Spanish lemmas with a hyphen

Hey. After your excellent page Wiktionary:Todo/multiword Spanish lemmas not idiom or proverb, I'd like a request of all Spanish entries with a hyphen. There shouldn't be many entries on the list, as Spanish doesn't use them so much. Thanks in advance, anyhow. --ReloadtheMatrix (talk) 19:35, 1 January 2020 (UTC)

@ReloadtheMatrix: Done because I have files of all entry names for all languages. Whoops, that was wrong, it's supposed to be lemmas. Fixed. — Eru·tuon 19:43, 1 January 2020 (UTC)
Awesome. You rule. Any chance of having the prefixes and suffixes removed? --ReloadtheMatrix (talk) 19:49, 1 January 2020 (UTC)
Done. — Eru·tuon 19:56, 1 January 2020 (UTC)

Gah, the search engine includes results for redirects, which is why dimensional was in the list (-dimensional redirects to it). [Edit: Anyway, fixed.] — Eru·tuon 20:34, 1 January 2020 (UTC)
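
The filtering step requested in this thread (hyphenated lemmas, with prefixes and suffixes removed) amounts to something like the sketch below. The function name is hypothetical and the input is assumed to be a plain list of lemma titles; excluding redirects, the other problem mentioned, would need data from the wiki itself and is not covered.

```python
def hyphenated_non_affixes(titles):
    """Keep titles that contain a hyphen but are not prefixes (ending in '-')
    or suffixes (starting with '-')."""
    return [
        title
        for title in titles
        if "-" in title and not title.startswith("-") and not title.endswith("-")
    ]
```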

Adding a pronunciation table for Albanian

Hello,

I'd like to ask you whether you could add a pronunciation table for Albanian with the same structure as the Ancient Greek pronunciation table. I could also provide you with the content for doing so. Apart from that, I'd like to know how links may be added to a template without having to place linking brackets around every term encompassed by it. HeliosX (talk) 17:14, 2 January 2020 (UTC)

@HeliosX: What Ancient Greek pronunciation table are you referring to? And what sort of template are you talking about? — Eru·tuon 19:36, 3 January 2020 (UTC)
I meant the current Ancient Greek pronunciation table that requires the letters to be entered in the page and, for example, this template. There should be, for instance, a link to "ali" and the noun ending in "-ã" or "-i" even though the terms are separated through "ale" in the declension table. It is not sure whether "ali" and the noun ending in "-e" should be linked because the usage of [e] or [i] in positions that allow both is usually somewhat similar and phonologically coherent. HeliosX (talk) 20:00, 3 January 2020 (UTC)
@HeliosX: Do you mean {{grc-IPA}}? — Eru·tuon 20:07, 3 January 2020 (UTC)
Yes, I meant this one. HeliosX (talk) 20:08, 3 January 2020 (UTC)
Ahh, I see. I was confused because "table" made me think of Appendix:Greek pronunciation. I could probably make a pronunciation template for Albanian. I'm not very familiar with Albanian, so I would have to use any information that you can provide, and w:Help:IPA/Albanian and w:Albanian phonology.
I still don't understand the problem with {{rup-noun-f-ã}}. I also don't understand why there are so many forms in each cell in the table. Does every noun of this type have two indefinite plurals, one in -i and one in -e? — Eru·tuon 20:18, 3 January 2020 (UTC)
Thank you for any possible aid with this. I'd have to divide the information about Albanian phonology as far as I'm concerned and as I've gotten to know into three IPA tables.
Firstly, the terms of Standard Albanian, which is mostly the same as Tosk, should have three major IPA rows. The first row would be Tosk and its phonemes are all given in the second phonology overview that you were referring to. However, the vowel [ə] is only pronounced when being stressed, in the first syllable of a word or if the word ends with a consonant after the vowel [ə]. The pronunciation due to position in the first syllable applies as well to any terms that are derived so that it is realized always in asnjë and asgjë. Contrastingly, only in the accusative forms atë, këtë, dikë, çdokë, askë and kurrkë and some terms beginning with "atë-" it may be pronounced even in the Tosk rendition of Standard Albanian. Also, the letter "r" is realized either as [ɽ] or often [ɹ] whereas [ɾ] probably does not occur. Hence, there could be a first pronunciation only with [ɽ] and, in the same row, a second pronunciation solely with [ɹ] in addition to denoting that, furthermore, [ɽ] and [ɹ] can be intermixed in a single word. Another matter concerns itself with "ë" that might also be pronounced as [ʉ] but that should only be noted next to the IPA row. The establishment and attribution of this phoneme is also a bit insecure but I've taken note of it.
The orthography of Albanian is based on the Korçan dialect of Tosk and, despite not having very many speakers at all, it should be included in the second row because it provides an explanation of the orthography. The vowel [ə] is pronounced everywhere but its speakers may not do so frequently in consideration of having learnt the general phonology of Standard Albanian, omitting these vowels in quite many positions. Nowadays, it seems that "r" is only realized as [ɹ].
Even though Gheg frequently may have its own variants for Standard Albanian vocabulary and grammar, its speakers also employ Standard Albanian and would pronounce it differently. Making only the distinction to the pronunciation of the latter in Tosk, the letter "r" has got the phoneme [ɾ], the affricates [t͡ʃ] and [d͡ʒ] can be extended to "gj" and "q", allowing two variants to be placed into the same IPA row.
In words that do not belong to Standard Albanian but only to Tosk, a second IPA table with the realization in its own dialect includes [c] and [ɟ] for the letters "q" and "gj" apart from the affricates [c͡ç] and [ɟ͡ʝ] in a single row. Those words don't have any pronunciation in Gheg but as well in the Korçan dialect.
In words of Gheg Albanian according to its own pronunciation, not including the other dialects, the information about vowels from this article can be continued for the third IPA table. Nevertheless, "ë" is still realized as [ə] unless the orthography shows that it has been altered. It can be denoted that it may also be pronounced as [ʌ] like in another dialect of Tosk but it does not have to be written into the pronunciation itself. Additionally to the Gheg pronunciation characteristics already entailed by the first IPA table, the consonantal clusters "nd" and "mb" can be pronounced as [nd] or [ⁿd] and [mb] or [ᵐb] and, written differently but derived from those, "n" and "m" as [n] or [nˠ] and [m] or [mˠ]. In order to differentiate the instances of "n" and "m" only as variants for "nd" and "mb" it should perhaps be recognized whether the term is a variant, referring to the template usage, of a term that has "nd" or "mb" in the position of "n" and "m". The characters "q" and "gj" are not realized as [c͡ç] and [ɟ͡ʝ], but, in extension of their other possible pronunciation, also as [t͡ɕ] and [d͡ʑ]. The consonant [h] is sometimes weakened in particular when not being word-initial and, apparently, [l] can be palatalized into [lʲ] at least before [ə], [ɔ] and possibly [o].
In Aromanian, both indefinite plurals may be formed. I would need links for each form that is phonetically close so that those would be, giving just an example, ali featã, ali feati and ale featã, ale feate even though there is written only "ali, ale featã, feati, feate" in the declension table. HeliosX (talk) 18:29, 6 January 2020 (UTC)
I don't know how to display "ali, ale featã, feati, feate" but link to ali featã, ali feati, ale featã, ale feate. Which words would link to which entries? "ali" could link to either ali featã or ali feati, and "featã" could link to either ali featã or ale featã.
In general if these are just phrases like "to the girl", we would not give them their own entries, and each word would be linked separately – ali, ale featã, feati, feate – as in the declension table in θεός (theós) where the forms of the definite article (ho) are linked separately from the forms of θεός (theós). That removes the problem of how to show individual words but link to phrases. But I am just guessing that ali and ale mean "to the", because they don't have entries yet. (I also don't know what the different final vowels mean.) — Eru·tuon 07:28, 6 January 2020 (UTC)
They almost can't be used without a, ali or ale, I found it without any separate particles of the genitive and dative cases for example in "Soarili, cã s-avea disprãs di surorili-a lui, dzenili di munti, iu chindurea cathi tahina, di li-adutsea ghiumi-mplini di lunjin, ta si-sh speal fatsa di liatsa noptsljei" in "Lunjina dit sinduchi" by Aromanian writer Dini Trandu but the author evidently employs those and a might simply have been blended together with the definite form of liatsã as this would have resulted in "liatsa-a noptsljei" according to the author's orthography. The vowels [e] and [i] can be used both and I think that the latter has been influenced maybe by Greek phonology and grammar with "-i" as frequent ending of feminine declensions. However, they could be regarded in the same way as the Albanian article used along with the genitive exoclitics, which are not included in the entries that are linked to. HeliosX (talk) 23:42, 6 January 2020 (UTC)
Well, even if the genitive or dative case form doesn't occur without these other words, we don't give entries to phrases unless their meaning is not sum-of-parts as explained in WT:SOP – for example if the meaning of ali featã is not a combination of the meaning of ali and the meaning of featã. — Eru·tuon 22:06, 6 January 2020 (UTC)
Having reconsidered this in comparison to the linkings in the Albanian declension tables, I'd agree that these particles don't have to be linked. HeliosX (talk) 16:07, 8 January 2020 (UTC)
Well, I should clarify – I suggested that a, ale, ali should be linked separately from the noun forms, like the Ancient Greek definite articles in the declension table of θεός (theós): ali, ale featã, feati, feate. Regarding the Albanian pronunciation template, I will try to get to it eventually. I have some other projects that I'm working on at the moment. — Eru·tuon 08:31, 9 January 2020 (UTC)

Toilbot unusual edit

[2] DTLHS (talk) 00:32, 5 January 2020 (UTC)

Thanks. My regex to match a PoS header followed by a headword-line template wasn't good enough. — Eru·tuon 00:45, 5 January 2020 (UTC)
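
To make the problem concrete, a regex of the general kind mentioned here might look like the sketch below. This is not ToilBot's actual pattern: the list of part-of-speech headers, the shape assumed for headword-line templates (`{{head|...}}` or names ending in `-noun`, `-verb`, `-adj`, `-adv`), and the function name are all assumptions for illustration.

```python
import re

# A small, assumed sample of part-of-speech header names.
POS_NAMES = "Noun|Verb|Adjective|Adverb"

# Assumed shape of a headword-line template: {{head|...}} or {{xx-noun}} etc.
HEADWORD_TEMPLATE = r"\{\{(?:head\||[a-z-]+-(?:noun|verb|adj|adv)[|}])"

POS_THEN_HEADWORD = re.compile(
    r"^(={3,5})[ \t]*(" + POS_NAMES + r")[ \t]*\1[ \t]*\n+" + HEADWORD_TEMPLATE,
    re.MULTILINE,
)


def pos_with_headword(wikitext):
    """Return the PoS headers that are directly followed by a headword-line template."""
    return [match.group(2) for match in POS_THEN_HEADWORD.finditer(wikitext)]
```

A pattern like this is easy to get subtly wrong, for example when stray text or another template sits between the header and the headword line, which is the sort of case that produces unusual edits.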

Changing all derivations from Proto-Albanian

Hello,

maybe you could use a tool for multiple edits, if such a tool has been devised, or a programmed account to change all these instances of derivation from Proto-Albanian to inheritance. HeliosX (talk) 16:35, 6 January 2020 (UTC)

Barnstar!

  The da Vinci Barnstar
For helping us create a smart template in the Further reading of Hungarian-language entries. Thank you so much. Adam78 (talk) 22:39, 13 January 2020 (UTC)

update

Hey. Can you update User:Erutuon/abbreviation headers at the next dump please? I estimate it will be around 28% the size of the current page. --Yesyesandmaybe (talk) 10:45, 18 January 2020 (UTC)

@Yesyesandmaybe: Yep! It's in the script that updates the other header pages. — Eru·tuon 19:23, 18 January 2020 (UTC)

Module errors from edits to documentation submodules

Please check CAT:E. Chuck Entz (talk) 17:39, 20 January 2020 (UTC)

@Chuck Entz: Fixed. Thanks. I wish I'd caught it earlier. — Eru·tuon 19:15, 20 January 2020 (UTC)
Well, at least the pages with the errors aren't where a lot of people would see them. It's not a big deal, but the sooner something like this is fixed, the better. Glad I could help. Chuck Entz (talk) 19:22, 20 January 2020 (UTC)

Etymology at epigone.

Hello, Erutuon. I wonder if you will take a moment to visit the English language epigone page when you are able, and check on what I suspect might be an error in the etymology given there. I believe the statement within the Etymology there, that ἐπίγονος comes "from ἐπιγίγνομαι" to be incorrect, as it suggests that γόνος is derived from γίγνομαι. Rather, I think that γόνος, as did γένος, entered Ancient Greek more directly as a lemma from earlier IE sources, instead of being derived from γίγνομαι (please note the Etymology at γόνος, wherein that is indicated, and wherein γόνος is indicated to be merely the equivalent of γίγνομαι + -ος). This is much the same in Latin, wherein the noun genus cannot be said to be a derivative of the verb gigno, but rather, that it is a related word with both deriving from separate IE lexemes. It seems to make more sense to me that the noun ἐπίγονος should be derived as is shown on its page, rather than from ἐπιγίγνομαι. As for myself, I am loath to change any existing etymologies, as I am really not that learned in linguistic history, and so would like to have your more experienced eyes on this (I believe it was Victar who rightly "slapped me down" on an earlier foray of mine into the IE realm). I thought that, instead of just including an etymology template on the page, I might rather just bring it to the attention of someone who probably can assess the etymology properly. Thanks. —⁠This unsigned comment was added by 68.112.86.146 (talk) at 19:33, 20 January 2020 (UTC).

Redirect problem

[3] DTLHS (talk) 16:19, 22 January 2020 (UTC)

@DTLHS: Thanks. I'll exclude redirects and look for the other redirects that my bot messed up. — Eru·tuon 19:38, 22 January 2020 (UTC)

Requested edits

You reverted my edit on Wikitionary:Requested entries because there is no page for Yogotti, but I was told that Wikitionary:Requested entries was the place you request new words. WikitionaryGuy (talk) 23:26, 22 January 2020 (UTC)

@WikitionaryGuy: You must have been misinformed. Wiktionary:Requested entries links to the pages where you post requests. In this case, if Yogotti is an English word, you would post it in Wiktionary:Requested entries (English). — Eru·tuon 00:35, 23 January 2020 (UTC)

ToilBot "Normalizing" Vandalism

Is there any way you could have your bot avoid normalizing entries that have been edited too recently? I keep finding cases where someone vandalizes an entry and ToilBot tidies it before any patrollers can get to it- thus blocking it from the rollback tool. The only way around that is undoing via the edit history, which is slower and much less convenient. Chuck Entz (talk) 04:19, 23 January 2020 (UTC)

@Chuck Entz: Sure. That's pretty annoying. I'll work on a way to skip pages that have been edited within a certain number of hours before I run the script on a large number of pages again. — Eru·tuon 04:41, 23 January 2020 (UTC)
@Chuck Entz: Update: now the script finds pages whose most recent edit is in Recent Changes, and it starts from the oldest edits in Recent Changes and stops at edits from 12 hours ago, if it gets that far. I might change the start date because the oldest edits in Recent Changes are from 1 month ago, and some pages are probably edited more often than that. But do you think 12 hours is enough time? — Eru·tuon 19:18, 20 March 2020 (UTC)
I would be more comfortable with 24 hours, but there are others who do more rollbacks than I do- @SemperBlotto, @Surjection and @Robbie SWE, to start with. Chuck Entz (talk) 20:33, 20 March 2020 (UTC)
24 hours would probably be enough for me. — surjection?〉 23:05, 20 March 2020 (UTC)
Okay, I've changed it to 24 hours margin for vandal-fighting. — Eru·tuon 23:48, 20 March 2020 (UTC)
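
The rule settled on in this thread (start from the oldest edits and skip anything edited within the last 24 hours) could be sketched as below. This is not the bot's script: the function name and the input format (a mapping from page title to the timestamp of its most recent edit) are assumptions, and fetching those timestamps from Recent Changes is left out.

```python
from datetime import datetime, timedelta, timezone

# Margin for vandal-fighting agreed on above.
MARGIN = timedelta(hours=24)


def pages_safe_to_edit(last_edited, now=None, margin=MARGIN):
    """Return page titles whose most recent edit is at least `margin` old,
    ordered from the oldest edit to the newest.

    `last_edited` maps page title -> timezone-aware datetime of the latest edit.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - margin
    old_enough = [(ts, title) for title, ts in last_edited.items() if ts <= cutoff]
    return [title for ts, title in sorted(old_enough)]
```

Making the margin a parameter keeps it easy to adjust if patrollers later ask for a longer window.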

Esperanto ordinal numbers

I see you worked on Module:eo-headword and also applied protection to the page. Could you help me at Wiktionary:Grease_pit/2020/January#Esperanto_ordinal_numbers? I can't edit the page myself. Robin van der Vliet (talk) (contribs) 15:41, 24 January 2020 (UTC)

ἵημι problem

Hi! In ἵημι the "Aorist: εἵμην" misses the first three persons of the indicative, although in the wikitext they are present; could you please check why they don't appear? Thank you very much! --Epìdosis (talk) 12:06, 31 January 2020 (UTC)

@Epìdosis: I see the forms missing in both the header and inside the table. That's because the singular uses first-aorist forms, ἧκᾰ, ἧκᾰς, ἧκε(ν), which are shown in a different table because of the limitations of {{grc-conj}}. And so {{grc-conj}} shows the first-person singular indicative middle εἵμην (heímēn) in the header. — Eru·tuon 21:57, 31 January 2020 (UTC)

Ops, my error! Thank you very much, --Epìdosis (talk) 21:59, 31 January 2020 (UTC)

Update 2

Hey. Can you gimme another update of User:Erutuon/abbreviation headers at the next dump? I reckon about 60% of the terms have since then been corrected (at least in the Abbreviations subpage anyway), and I find myself visiting pages I've already corrected. TIA --AcpoKrane (talk) 11:58, 18 February 2020 (UTC)

@AcpoKrane: Yep, I'll update it when the right dump files come out, as usual. — Eru·tuon 23:50, 22 February 2020 (UTC)
Done. Just realized I forgot to do it after the last dump (2020-02-01). — Eru·tuon 23:48, 23 February 2020 (UTC)

Nesting in translations

Hi,

Do you know, which module contains the nesting? So that if you add, e.g. a Kurdish translation, you can add "Kurdish/Kurmanji" in the "Nesting"? --Anatoli T. (обсудить/вклад) 02:33, 26 February 2020 (UTC)

@Atitarev: Yes, that's in MediaWiki:Gadget-TranslationAdder-Data.js, under var nesting = {. — Eru·tuon 23:06, 26 February 2020 (UTC)
Thanks but it's not obvious to me how language code "ku" allows nesting "Kurdish/Kurmanji". I'd like to fix Eastern Mari ("chm") as "Mari/Eastern Mari", add a Mongolian nesting "Mongolian/Uyghurjin". --Anatoli T. (обсудить/вклад) 00:05, 27 February 2020 (UTC)
@Atitarev: Right, MediaWiki:Gadget-TranslationAdder-Data.js only controls nesting that is automatically generated by the TranslationAdder gadget; by editing source code manually, anyone can nest any language any way they want, and that's where the Kurdish/Kurmanji nesting for ku comes from. I think "Mongolian/Ughurjin" requires a different mechanism, which may not exist, because the nesting table in MediaWiki:Gadget-TranslationAdder-Data.js is by language code; it doesn't describe any sub-nestings for writing systems. I'm guessing that the "Serbo-Croatian: Cyrillic: ... Roman: ..." that is in quite a few translations sections was added manually, not by the gadget. — Eru·tuon 00:36, 27 February 2020 (UTC)
Thanks. Any language will allow "language name"/Cyrillic or "language name"/Roman. I have fixed the "Eastern Mari" nesting and it seems I can just use Mongolian/Ughurjin or Mongolian/Cyrillic if there is no Mongolian translation present. --Anatoli T. (обсудить/вклад) 04:49, 27 February 2020 (UTC)
Okay. I don't see any way to do "Mongolian/Ughurjin" in the translation adder (and wouldn't be able to add that capability), but if that's not necessary, great. — Eru·tuon 19:06, 27 February 2020 (UTC)

Wiktionary:Todo/multiword Spanish lemmas not idiom or proverb update

Hey E. Can you rerun Wiktionary:Todo/multiword Spanish lemmas not idiom or proverb after the next dump? I linked, over the space of 4 and a bit months, all of the decent entries in there. What I'm looking for exactly is all NEW multiword entries made since the original list, so after making it, would you be able to remove all entries which appear in the original list? Only then will I be able to say that my quest has been completed. Thanks in advance --AcpoKrane (talk) 09:00, 27 February 2020 (UTC)

@AcpoKrane: I just used a bot script, so no need to wait. This should be it. — Eru·tuon 23:10, 27 February 2020 (UTC)
That's just beautiful. --AcpoKrane (talk) 11:41, 28 February 2020 (UTC)
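
The request in this thread (all new multiword entries, minus everything that was in the original list) boils down to a filter plus a set difference. A minimal sketch, assuming both lists are plain lists of entry titles and treating "multiword" as "contains a space"; the function name is hypothetical:

```python
def new_multiword_entries(current_titles, original_titles):
    """Return multiword titles from the current list that were not in the
    original list, preserving the current list's order."""
    already_listed = set(original_titles)
    return [
        title
        for title in current_titles
        if " " in title and title not in already_listed
    ]
```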

Day to Days

How do I change the descendant trees?

https://en.wiktionary.org/wiki/Reconstruction:Proto-Germanic/dagōs Personisgaming (talk) 15:48, 18 March 2020 (UTC)

Sure, you got it!

Nobody else edits as fast around here (except Equinox, of course). Anyway, if I get blocked before I'm done, would you mind adding {{audio|en|LL-Q1860 (eng)-Vealhurl-{{subst:PAGENAME}}.wav ‎|Audio (UK)}} in the Pronunciation section to all of these words that I recorded today? That would allow me to do other stuff, like, Spanish idioms or nominating people for adminship. --Gorgehater (talk) 22:30, 27 March 2020 (UTC)

Templatehoard

Do you still update it? I'd like to generate some new wanted entry lists. – Jberkel 09:36, 5 April 2020 (UTC)

@Jberkel: Updated. (I need to figure out how to streamline the process; it's kind of tedious running all the commands.) I tried running the wanted entry script after the 2020-03-01 dump came out, but the first command failed. — Eru·tuon 23:41, 5 April 2020 (UTC)
Thanks! Maybe use a simple Makefile to automate the commands? I'll take a look, sometimes there are resource-related problems; unlike Rust, Java needs a lot of memory :) – Jberkel
Ok, all regenerated. It was a silly bug in the CBOR deserialization. – Jberkel 22:29, 7 April 2020 (UTC)
@Jberkel: I made a Makefile and it's now much easier to generate the template dump and entry index: just a single command for each. — Eru·tuon 21:48, 23 April 2020 (UTC)
Cool, I'll regenerate the pages. – Jberkel 14:15, 25 April 2020 (UTC)
@Jberkel: I noticed that the scripts never got to the stage of saving the lists, and looked at the error log but didn't know how to fix it. Something about the Java version number if I recall right. (I wish the error log weren't spammed with progress bars or whatever; it makes it hard to read with less.) Do you have time to debug? — Eru·tuon 18:57, 12 May 2020 (UTC)
@Erutuon: yes, I foolishly updated some dependent libraries to a more modern version of Java, but Spark still needs an ancient version of the JDK. I could rollback to an older version but I'm waiting for the new version of Spark to be released, which should be soon. If it doesn't get released for the next dump I'll revert the changes. – Jberkel 21:03, 12 May 2020 (UTC)

User:ToilBot worsened paadje

Why did User:ToilBot worsen my contribution [4]? If you don't mind, I would like to revert it. There may be many cases where "usage case" is incorrectly used, but this wasn't one of them. It was just one sentence, that should be a hint for your bot to not touch it. --85.148.244.121 06:04, 11 April 2020 (UTC)

Do not revert it. We have standardised headers, which allows us to keep track of the millions of pages on the wiki. Think of it this way: in an idealised, complete entry, there may be many relevant usage notes, or there may only be one, but all usage notes will be under the header 'Usage notes'. —Μετάknowledgediscuss/deeds 06:08, 11 April 2020 (UTC)
That's OK and why I asked it, but are we still allowed to call us "the English-language Wiktionary" [5] if we refuse to speak English and even have bots to remove English from content which uses it? In an idealised English-language wiktionary, we would be writing English (and that still has plural and singular, if that changes the undeclined word would probably win). On the other hand, I don't even speak standard English very well (Sassenach for Alba); for me, it's OK, I just asked. --85.148.244.121 07:45, 11 April 2020 (UTC)
Well, "Usage notes" looks like English to me – certainly not Klingon at least. It does strictly speaking violate the rules of grammatical agreement in paadje, but Wiktionary can do what it wants because there's no Académie Anglaise to punish it for crimes against English grammar. More seriously, it would be a headache to try to make the headers agree in number with the contents of the sections, and it would make entries a bit less machine-readable, so Wiktionary has chosen one grammatical number for each header ("Usage notes" in plural, "Pronunciation" in singular) and I enforce it. This is the current convention, and changing it now might cause various bots and tools to break. — Eru·tuon 08:35, 11 April 2020 (UTC)
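
The convention described above (one fixed grammatical number per header, enforced by bot) can be illustrated with a small normalization sketch. The mapping below contains only the two headers named in the discussion plus their assumed variants, and the function name is hypothetical; it is not the bot's real table of headers.

```python
import re

# Assumed variants mapped to the canonical header names mentioned above.
CANONICAL_HEADERS = {
    "usage note": "Usage notes",
    "usage notes": "Usage notes",
    "pronunciation": "Pronunciation",
    "pronunciations": "Pronunciation",
}

# A header line: the same run of '=' on both sides of the header text.
HEADER_LINE = re.compile(r"^(=+)[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def normalize_headers(wikitext):
    """Rewrite known header variants to their canonical form; leave others alone."""
    def fix(match):
        canonical = CANONICAL_HEADERS.get(match.group(2).lower())
        if canonical is None:
            return match.group(0)
        return match.group(1) + canonical + match.group(1)
    return HEADER_LINE.sub(fix, wikitext)
```

Because the header text is fixed regardless of how many notes a section holds, other tools can find the section by an exact string match.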

Update to {{en-conj-simple}}

If you have time, I was wondering if you would see if {{en-conj-simple}} could be tweaked so that the archaic second person singular present tense (for example, walkest) and archaic third person singular present tense (walketh) forms could be made into links that, if clicked on, would create the inflections in an accelerated manner, in the way that it works with {{en-verb}}. There might have to be a warning somewhere that editors should check whether these verb forms are attestable. This isn't urgent. — SGconlaw (talk) 16:52, 12 April 2020 (UTC)

@Sgconlaw: I've added the second-person singular past-tense form (-edst) and made the table unconditionally link the forms, because up till now they were only linked if the target page existed; linking to nonexistent pages is a requirement for adding acceleration. I think I'll add acceleration to all the forms, not just -eth and -est, as none of them have it yet. — Eru·tuon 20:07, 20 April 2020 (UTC)
Thanks. I had no idea the -edst form existed. The format looks odd, though (what’s the significance of the two columns in the “past tense” section?) – perhaps it should match the present tense column? — SGconlaw (talk) 20:10, 20 April 2020 (UTC)
The past-tense columns were basically "modern" and "Elizabethan", but I've changed it to the format of the present-tense column. — Eru·tuon 20:45, 20 April 2020 (UTC)
@Sgconlaw: Okay, finished the process. Added new acceleration protocols to Module:accel/en for the archaic forms. Let me know if you notice any problems. — Eru·tuon 21:07, 20 April 2020 (UTC)
Is the acceleration working? I clicked on the links cherishest and cherishedst (note: not saying these words exist) in the sample on the documentation page, and they just led to blank pages. — SGconlaw (talk) 04:22, 21 April 2020 (UTC)
@Sgconlaw: Those links work for me. Do you have the acceleration gadget enabled in your preferences (search for "accelerated creation links" on the page)? — Eru·tuon 05:13, 21 April 2020 (UTC)
Accelerated links in {{en-verb}} work for me. Didn’t know I had to do something extra for these – will check. — SGconlaw (talk) 05:18, 21 April 2020 (UTC)
Okay, {{en-conj-simple}} should work if {{en-verb}} does. Oh, the problem is that acceleration doesn't work in the template namespace. Try clicking the links in the conjugation table in cherish instead. — Eru·tuon 05:30, 21 April 2020 (UTC)
Ah, that was the issue. Yes, it's working fine! Thanks again. — SGconlaw (talk) 07:57, 21 April 2020 (UTC)

Should this extra pipe be removed? — SGconlaw (talk) 19:11, 21 April 2020 (UTC)

@Sgconlaw: Huh. That definition should be {{en-archaic second-person singular past of|translate}}. Aha, here was the problem. — Eru·tuon 19:19, 21 April 2020 (UTC)
 SGconlaw (talk) 04:33, 22 April 2020 (UTC)

Chaucer quotes in English section

Hey. Could I get a list of all Chaucer quotations in the English (but not Middle English) section of an entry? It's because they shouldn't be there, they should be in Middle English. You could put it at Wiktionary:Todo/English Chaucer. Thanks in advance --Vitoscots (talk) 17:25, 20 April 2020 (UTC)

@Vitoscots: You already saw, but I made the page. It includes things besides quotes, but it has excerpts so you don't have to waste your time visiting the entry. — Eru·tuon 23:45, 20 April 2020 (UTC)
Love ya, Eru! --Vitoscots (talk) 00:17, 21 April 2020 (UTC)

Chaucer list for Shakey and Milly

Hey. Can we get a list of undated Milton and Shakespeare quotes? I guess looking for

We already have those wheels, in two forms:
  1. Search for, eg, 'hastemplate:"rfdate" insource:/rfdatek\|en\|Chaucer/'
  2. Use categories like Category:Requests for date/Chaucer.
HTH. DCDuring (talk) 15:00, 21 April 2020 (UTC)
@DCDuring: Yeah, that works for {{rfdatek}}, which has the author in the template, but {{rfdate}} doesn't (though you can find examples of {{rfdate}} applied to Shakespeare, for instance, among the search results for hastemplate:"rfdate" Shakespeare); for instance in drug:
#* {{rfdate|en}} {{w|William Shakespeare}}, ''{{w|Timon of Athens}}''
#*: Hadst thou, like us from our first swath, proceeded / The sweet degrees that this brief world affords / To such as may the passive '''drugs''' of it / Freely command, thou wouldst have plunged thyself / In general riot {{...}}
Also I think Wonderfool has an enthusiasm for lists. They help keep him motivated because he can check things off and write down how much work is left.
@Vitoscots: I'll see what I can do. It's more complex than the previous Chaucer request. Gotta figure out what the format typically is. — Eru·tuon 17:49, 21 April 2020 (UTC)
Yeah, you gotta keep your volunteers motivated, boss. --Vitoscots (talk) 17:51, 21 April 2020 (UTC)
Either alternative technique yields lists from which the completed items disappear, which provides even more motivation. And what about my motivation, having added nearly ten thousand instances of {{rfdatek}} and {{rfdate}} only now to have WF reject my handiwork? DCDuring (talk) 18:33, 21 April 2020 (UTC)
I didn't reject your handiwork, DCD. I was attacking your rfdefs with my steely knife. --Vitoscots (talk) 19:40, 21 April 2020 (UTC)
I don't get it. Seems like my making a list is a good way to make your work come to fruition (with dates finally being added)! — Eru·tuon 21:19, 21 April 2020 (UTC)
Your Chaucer list was less selective than "my" lists, so you must have essentially ignored the presence or absence of the rfdate and rfdatek templates and the resulting categories. DCDuring (talk) 23:41, 21 April 2020 (UTC)
@DCDuring: Ahh, I see what you mean now. I thought you were talking about the Milton and Shakespeare lists. The purpose of the Chaucer list is to catch Chaucer quotes that need to be moved from the (Modern) English to the Middle English entry, so yeah, {{rfdate}} and {{rfdatek}} aren't involved. (There were false positives because I just searched for "Chaucer" in English sections without trying to figure out if it was the author of a quote, or if it was the Chaucer as opposed to another Chaucer.) But the Milton and Shakespeare lists are only occurrences in conjunction with {{rfdate}} so they are making use of your work inserting {{rfdate}}. — Eru·tuon 00:39, 22 April 2020 (UTC)
I see. All of the Chaucer quotes now in English should be in Middle English. We had long been accommodating an excellent contributor who thought Middle English quotes, even of alternative forms, should appear in the entry of the English descendant. Having the dates should make it especially obvious. BTW, it would be nice to locate each quote in the manuscript fragment it was found in. I think there are four of them, but I haven't seen dates for the fragments. BTW, you have seen how many authors there are with rfdatek and rfquotek categories, right? DCDuring (talk) 00:49, 22 April 2020 (UTC)
Who was the excellent contributor, out of interest? --Vitoscots (talk) 00:03, 23 April 2020 (UTC)
@DCDuring: Yes, seems like a tremendous number. Maybe it would be useful to print the templates in some kind of list format, showing the definition under which {{rfquotek}} was placed, and the quote that {{rfdatek}} was placed on (somewhat like WT:Todo/Undated Milton and WT:Todo/Undated Shakespeare). Then people could more quickly look over the requests to find ones they can fill, without having to visit hundreds of pages and look over the text of them. The list could be put on a Toolforge site, though then it would be harder to give editors the satisfaction of crossing out the requests they had filled. — Eru·tuon 04:25, 24 April 2020 (UTC)
To make it easier yet, you could include a link to a search for the quote on Google Books (and Wikisource and Gutenberg?). The searcher might still have to shorten the search string to find the original wording of the quote, but the job would often be very easy indeed. I had thought about that while adding all the templates, but I just wanted to get the ball rolling. DCDuring (talk) 04:58, 24 April 2020 (UTC)
Okay, did the Shakespeare and Milton {{rfdate}} bit. I included the quotes in the list to make your job easier. — Eru·tuon 21:19, 21 April 2020 (UTC)
Nice. It didn't take long to clean all those up. --Vitoscots (talk) 22:09, 24 April 2020 (UTC)
@Elvinrust: See Wiktionary:Todo/Undated English quote-templates. It's not a Milton-and-Shakespeare-only list, but it's easy to find them on it. — Eru·tuon 21:21, 5 May 2020 (UTC)

Re: Category timestamp

Re your question on #wikimedia-tech, just checking: you are aware that the timestamp is not supposed to reflect when a category was added to a page, aren't you? See mw:Manual:Categorylinks_table#cl_timestamp. Timestamps are often updated en masse after some template change, for instance. That said, some SQL queries can often shed light on what's going on. Nemo 10:12, 22 April 2020 (UTC)

@Nemo bis: Thanks, I wasn't aware that dynamicpagelist used cl_timestamp for category additions. That explains why the list is sometimes random. — Eru·tuon 17:34, 22 April 2020 (UTC)

Please

keep things under control from now on. I'm taking a long break --Vitoscots (talk) 19:33, 26 April 2020 (UTC)

Most deleted pages

Was just reading this Special:Diff/47099083/59225779 and wondering if this can be queried – which pages have been deleted many times but do currently exist? Do we have that data? – Jberkel 19:03, 27 April 2020 (UTC)

@Jberkel: Well, here is the "times deleted leaderboard", with a column indicating which titles actually exist. User talk:Equinox is at the top of the currently existing titles because Equinox doesn't believe in archiving his talk page... sigh... — Eru·tuon 19:38, 27 April 2020 (UTC)
Great, thanks! lots of NSFW type words as expected but some interesting ones as well. – Jberkel 20:30, 27 April 2020 (UTC)
Sigh sigh sigh. Well, I don't archive my talk page because I see "talk" as an ephemeral thing, like (to some extent) e-mail, IRC, or instant-messenger chat; distinct from the actual meat or content of the project, the entries and appendices. I can see why some people would disagree with this, especially when it's a "page" on the project (and we do archive "unowned" talk like Beer Parlour). But don't panic: my deleted stuff is still available to admins and the future historians who will read us, like Pepys, to find out what really drove amateur lexicographers in the early 21st century. (HI HISTORIAN! I SEE YOU!) I've got a vague memory that somebody (was it Purplebackpack?) tried to pass a vote preventing people deleting their talk pages, but I think it failed... can't remember... don't care really. I do take the point that if people should be able to delete their userspace then this ability shouldn't necessarily be limited to those who happen to have admin rights (required for deletion); there's always the "speedy" tag though, which tends to be respected unless it's being abused to hide ongoing debates etc. Equinox 23:20, 4 May 2020 (UTC)
@Equinox: Wrote the response below but never submitted it. Got angsty or whatever about it. So it sat in my browser (which kindly saves it for me) for weeks.
Well, most talk is un-ephemeral here. Admins can get at your talk page stuff at the moment, if they can find it in the long list of deleted revisions. But no one can use the search engine to find out when someone talked to you about something on your talk page. (Unless there's a "search deleted or past revisions" tool somewhere.) So it's best if nobody starts an important discussion on your talk page that someone might want to be able to refer back to: they'd have trouble finding it because it wouldn't come up in search results. It's not quite like a user page because other people are involved in it and it's not quite like personal IM since it's public to begin with.
Unfortunately I apparently think too much about the sort-of-lost treasures of discussion on your talk page. I didn't really mean to lecture you about it but I let my little complaint slip out and I pinged you because I didn't want to complain behind your back so to speak...
PS: I think I wouldn't support a measure to require people to not delete their talk pages though. Seems too restrictive of personal liberty. — Eru·tuon 22:15, 23 May 2020 (UTC)
We agree to license all our "contributions" under whatever freebie licence so I suppose that covers talk pages too. Equinox 05:04, 31 May 2020 (UTC)

WT:Todo/Undated Bible

Hi. Good work with WT:Todo/Undated Milton, by the way. Could we get something similar for WT:Todo/Undated Bible? It has been noticed that undated Bible quotations are all over this fricking website. For some reason, DCDuring (talk • contribs) never tagged them with {{rfdatek}} so they don't show up in the categories. --Elvinrust (talk) 22:43, 4 May 2020 (UTC)

I skipped what was the most common undated cluster. It is necessary to handle not just the KJV, but also Douay and other versions. Also some quotations don't have Bible, but rather, say, Matthew. You could find most of them by searching for 'incategory:"English lemmas" insource:/\#\*[ ]+[A-Z][a-z]+/'. To speed things up you could add 'incategory:"English nouns"' and then proceed to verbs, etc. I would think we would want to link to Wikisource's edition of KJV if at all possible. DCDuring (talk) 22:57, 4 May 2020 (UTC)
@DCDuring, Elvinrust: Made the list using that regex, looking at all English sections. I'm surprised that it really does catch mostly Bible quotations! It seems like it should be too general. I thought of filtering it by "Bible" and names or abbreviations of books of the Bible, but it doesn't seem worth it. — Eru·tuon 02:06, 5 May 2020 (UTC)
That regex WAS the one I used to try to catch ALL of the quotations that didn't start with a date. I just didn't insert {{rfdatek}} or {{rfdate}} in the biblical quotes. DCDuring (talk) 02:09, 5 May 2020 (UTC)
The advantage of using the search with a regex is that it yields a dynamic list, removing those that have been corrected and adding any that may have been added while the operation is in process. DCDuring (talk) 02:12, 5 May 2020 (UTC)
@DCDuring: Aha, it was your work. Thanks. Yeah, the search engine has its advantages. I could try to update the list more frequently using the MediaWiki API (generating a list of pages, getting their content, searching it), but it would be more complicated and doesn't seem worth the trouble since Wonderfool is the only person who seems to be using the lists at the moment and he likes crossing stuff off. — Eru·tuon 01:21, 7 May 2020 (UTC)
I was happy when I discovered how the short regex caught so many (also, such a large proportion) of the quotations without dates. Talk about low-hanging fruit. Going after easy targets does mean that the remaining targets are less likely to be found. The ultimate residual mechanism of manual individual contributor error-detection and -correction is very slow.
Also, the regex I used actually allowed for multiple occurrences of "#": [#]+. DCDuring (talk) 13:35, 7 May 2020 (UTC)
I used a pattern with the equivalent because it checked that the list marker was at the beginning of a line and "<line start>#*" would skip quotes under sub-definitions. I ultimately generated a JSONL file of probably-quote wikitext from English sections to make it faster to find Bible quotes. (Searching the dump for quotes took ~2 minutes at best but searching the quote file takes a few seconds.) — Eru·tuon 19:19, 7 May 2020 (UTC)
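The line-anchored pattern and the JSONL step described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the exact pattern, the function names, and the JSONL field names ("title", "quote") are assumptions.

```python
import json
import re

# Assumed pattern: one or more "#" at the start of the line, then "*",
# then spaces, then a capitalized word (so lines starting with a year or
# a template such as {{rfdate}} are skipped).
UNDATED_QUOTE = re.compile(r"#+\*[ ]+[A-Z][a-z]+")

def find_probable_undated_quotes(wikitext):
    """Return the lines that look like quotation lines not starting with a date."""
    # re.match anchors at the start of each line, so "#*" must be the list marker.
    return [line for line in wikitext.splitlines() if UNDATED_QUOTE.match(line)]

def to_jsonl(title, wikitext):
    """Serialize the probable quotes of one entry as JSONL (one JSON object per line)."""
    return "\n".join(
        json.dumps({"title": title, "quote": quote}, ensure_ascii=False)
        for quote in find_probable_undated_quotes(wikitext)
    )

# Made-up sample section for illustration.
sample = "\n".join([
    "# A definition.",
    "#* Bible, Matthew v. 3",
    "#* {{rfdate|en}} {{w|William Shakespeare}}",
    "##* Milton, Paradise Lost",
    "#* 1611, King James Version",
])
matches = find_probable_undated_quotes(sample)
jsonl = to_jsonl("example", sample)
```

Matching per line with "#+" is what lets the sketch pick up quotes under sub-definitions ("##*") as well as top-level ones.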

Quarrying

Also, it would be fun to see the league table for the most one-sided thank relationship (an unrequited-love list, if you will), like this where we can all see that I'm so meta even this acronym (talk • contribs) is totally stalking JohnC5 (talk • contribs) (951 thanks), but with a part where JohnC5's number of thanks to ISMETA (341) is deducted from that total. --Elvinrust (talk) 22:49, 4 May 2020 (UTC)

Which makes me think of another fun list - most affectionate thank-couples (combined thank-totals...John and ISMETA will win that, hands down) --Elvinrust (talk) 22:51, 4 May 2020 (UTC)
Ooh, and a list of the most aggressive relationships, coz I'm gonna make a documentary about it called Users who Revert Other Users. --Elvinrust (talk) 22:53, 4 May 2020 (UTC)
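For what it's worth, the two thank tables suggested above (net one-sided thanks and combined totals per pair) could be computed along these lines once the per-pair counts are in hand. This is a minimal sketch: the input format (a dict keyed by sender and recipient) and the function names are assumptions, and the only figures used are the two quoted above.

```python
def one_sided_thanks(thanks):
    """Rank (sender, recipient) pairs by thanks given minus thanks received back."""
    rows = []
    for (sender, recipient), given in thanks.items():
        returned = thanks.get((recipient, sender), 0)
        net = given - returned
        if net > 0:  # keep only the more generous side of each pair
            rows.append((sender, recipient, net))
    return sorted(rows, key=lambda row: row[2], reverse=True)

def combined_thanks(thanks):
    """Rank unordered pairs of users by the total thanks exchanged in both directions."""
    totals = {}
    for (sender, recipient), count in thanks.items():
        pair = tuple(sorted((sender, recipient)))
        totals[pair] = totals.get(pair, 0) + count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Counts taken from the comment above; the dict layout is hypothetical.
thanks = {("ISMETA", "JohnC5"): 951, ("JohnC5", "ISMETA"): 341}
unrequited = one_sided_thanks(thanks)
couples = combined_thanks(thanks)
```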

Header

For some hiragana entries I can’t remove the header, therefore you see some weird empty header which is my attempt of removing it. Thanks for the correction! Shen233 (talk) 03:52, 7 May 2020 (UTC)

You may not be very familiar with Japanese wiktionary, but for most non-lemma hiragana entries, no header is needed for the current "{{ja-see}}" redirect mechanism. There may be a noun and a verb which share the same hiragana, then we separate them by etymologies, such as 五日 (いつか) a noun and いつか, an adverb. In older practices they put PoS and then "{{ja-def }}" though. Shen233 (talk) 04:03, 7 May 2020 (UTC)

Thanks for the explanation. I did a little editing of Japanese, mostly updating {{ja-readings}} and fixing ruby, in the past, before {{ja-see}} was renovated. Here I was just removing the empty header === ===: it isn't officially allowed in WT:EL and shows up in my cleanup list, User:Erutuon/mainspace headers/possibly incorrect. — Eru·tuon 17:48, 7 May 2020 (UTC)

Module:es-pronunc

Hi, Erutuon. Could you change a part of Module:es-pronunc? In Template:es-IPA ll is shown as a consonant with different pronunciation in Castile and Latin America but that is not accurate. yeísmo and lleísmo exist in both regions, yeísmo is preferred in both too. Words like llamar should say "(yeísmo) IPA(key): /ʎaˈmaɾ/ (lleísmo) IPA(key): /ɟ͡ʝaˈmaɾ/". Thanks in advance. 181.226.219.122 20:04, 16 May 2020 (UTC)

I agree that this should be changed, but it's more complicated because seseo is also involved. So with this change, there might have to be four pronunciations, distinción and lleísmo, distinción and yeísmo, seseo and lleísmo, seseo and yeísmo. I don't know if all of these exist. It would be better to discuss this at Module talk:es-pronunc. — Eru·tuon 16:26, 21 May 2020 (UTC)
Thank you. Yes, all of them exist, distinción + lleísmo being the least common. Someone posted a message in 2018 and it remains unanswered. I'll repeat my request there. Regards. Lin linao (talk) 18:57, 7 June 2020 (UTC)

Missing Spanish idioms

Hey. Can you make me a list of all the entries in this Spanish cat that are not in en.wikt? Let's put it at Wiktionary:Todo/Missing Spanish verb idioms --Spanishlearner574 (talk) 21:50, 23 May 2020 (UTC)

@Spanishlearner574: Okay, made a Quarry query and pasted the results there. — Eru·tuon 22:57, 23 May 2020 (UTC)
Sweet. There's more entries there than I was expecting. --Spanishlearner574 (talk) 23:16, 23 May 2020 (UTC)
Could you make that list even better by including links to es.wikt, like below - I started doing it manually offline but found no quick way to make the changes. --Undurbjáni (talk) 10:35, 28 May 2020 (UTC)
@Undurbjáni: Ah, yeah, makes sense. Added it. You can change the look of it by editing the same part that I edited. — Eru·tuon 15:54, 28 May 2020 (UTC)
  1. abrir cancha (es:abrir cancha)
  2. abrir el tarro (es:abrir el tarro)

Category:Han script

This has had a mostly unnoticed module error for quite some time. As far as I can tell, it's a disagreement between Module:category tree/script cat/blocks and Module:Unicode data/blocks about where the end of a block is, and it seems to have been triggered by this edit. Could you take a look at it? It's definitely not high priority, but it's mildly annoying... Chuck Entz (talk) 01:31, 24 May 2020 (UTC)

@Chuck Entz: Thanks! Fixed. That was because Module:scripts/data was assigning a range of unassigned code points ending in U+2FFFF to Hani, and that one doesn't have a block assigned to it. It would be handy to include the whole Supplementary Ideographic Plane (U+20000-U+2FFFF) because I guess it will only ever include Han characters, but Module:category tree/script cat/blocks requires the first and last code points of the ranges to be assigned. — Eru·tuon 01:58, 24 May 2020 (UTC)

fixing excessive width of Hungarian-language number boxes

(Antecedents.) Would you please change the display of "Adverbial ordinal:" to A.o. (preferably with this tool tip) in Module:number list? See e.g. tizenkilenc. Currently this seems to be the only way to avoid its double entries excessively widening the table, without any side-effects. Thank you in advance.

Another way I could imagine is inserting a string length check possibly before table.concat(form, ", ")), so that a line break should be inserted instead of a space after the comma if a given value is longer than e.g. 15 characters. However, it would affect lots of other tables as well, so I understand if you'd rather avoid it, although it might be some improvement nevertheless.

I've deleted one out of the three values given at Adverbial ordinals from Module:number list/data/hu, because the formatting of its values simply didn't allow the number box to be inserted into másodszor; it produced an error message. A similar solution could be considered for the distributive, possibly abbreviating it to "Dist.:", because we're bound to have the same problem, see e.g. száz. (@Panda10, do you have any suggestion?) Adam78 (talk) 22:55, 2 June 2020 (UTC)
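The string length check proposed above (a line break instead of a space after the comma when a value is long) might look roughly like this. It is sketched in Python rather than in the module's Lua, and the function name, the "<br>" separator, and the placeholder values are all assumptions rather than anything taken from Module:number list.

```python
def join_forms(forms, max_len=15, line_break="<br>"):
    """Join forms with ", ", but break the line after any form longer than max_len."""
    parts = []
    for index, form in enumerate(forms):
        parts.append(form)
        if index < len(forms) - 1:
            # A long value is followed by a line break instead of a space.
            parts.append("," + (line_break if len(form) > max_len else " "))
    return "".join(parts)

# Placeholder values, not real number forms.
joined = join_forms(["short", "a-rather-long-form-value", "end"])
```

Here len() counts code points, so the threshold is only approximate for text with combining characters.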

Sorry, I don't have any new suggestions. Panda10 (talk) 16:24, 3 June 2020 (UTC)
Unfortunately string length is complicated. There is a function to count the number of code points, but it doesn't correspond to the number of visible characters when you have, for instance, combining accents (a + ́ = á). This doesn't come up with many European languages, but would with various Indian and Southeast Asian scripts. The number of Unicode graphemes is better (á is a single grapheme) but we don't have a grapheme-counting function here on Wiktionary; I'd have to find or write one. Graphemes aren't exactly proportional to font length, but they are closer.
I think these tweaks are not the final answer to the problem of the number box layout. Some sort of redesigning would be better, but I have no good ideas at the moment and I'm just discouraged about the whole thing. I might implement your suggestions as a temporary measure. — Eru·tuon 00:02, 4 June 2020 (UTC)
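The gap between code points and visible characters mentioned above can be illustrated with a small sketch, again in Python rather than Lua. Skipping characters with a nonzero combining class is only a rough approximation and not real grapheme segmentation (it would miscount many Indic and Southeast Asian sequences); the function names are made up for the example.

```python
import unicodedata

def code_point_length(text):
    """Number of Unicode code points in the string."""
    return len(text)

def approximate_visible_length(text):
    """Rough estimate of visible characters: skip combining marks with a nonzero class."""
    return sum(1 for char in text if unicodedata.combining(char) == 0)

decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT
precomposed = "\u00e1"   # the single code point for a-acute

lengths = (
    code_point_length(decomposed),
    approximate_visible_length(decomposed),
    code_point_length(precomposed),
    approximate_visible_length(precomposed),
)
```

A proper count would need a grapheme cluster segmenter following the Unicode text segmentation rules.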
All right, no problem. In this case, forget about string length. All I'd like to ask you is to modify these two names in the list:
I was also thinking about the current term "number of people", which is named "adverb of number" in a grammar book, but it sounds too unspecific to me (many of these terms are some kinds of adverb of number anyway), so it's better kept as it is, unless you suggest otherwise. Thanks a lot in advance. Adam78 (talk) 18:35, 4 June 2020 (UTC)
@Adam78: I changed the display of "adverbial ordinal" but am not sure what to do about distributive because it's also used by other languages. Maybe there needs to be a way to customize the label for each number type for each language. — Eru·tuon 18:32, 5 June 2020 (UTC)
OK, thank you. We're still one step ahead. Now the width looks considerably better, if not the best. Adam78 (talk) 23:18, 5 June 2020 (UTC)

Arabic new entry templates

Hello, I created these templates: Template:ar-nogomatch

and then I realized that it should be added to MediaWiki:Searchmenu-new. Can you please add it there to the language picker dropdown? LinguisticMystic (talk) 09:45, 22 June 2020 (UTC)

@LinguisticMystic: I've added the buttons from Template:ar-nogomatch, but I modified the style to match the rest of the languages in MediaWiki:Searchmenu-new. — Eru·tuon 18:18, 22 June 2020 (UTC)
Thanks! Actually it's still not working. When I try to create a new entry, only English, American Sign Language, Spanish, and Swedish pop up in the dropdown menu, and they don't seem to be working either. When I select Swedish, for example, nothing happens. LinguisticMystic (talk) 18:28, 22 June 2020 (UTC)
I'm wondering what could be the problem. LinguisticMystic (talk) 18:40, 22 June 2020 (UTC)
@LinguisticMystic: MediaWiki:Searchmenu-new doesn't have a dropdown menu, at least with my settings. You must be seeing a different MediaWiki message. — Eru·tuon 18:51, 22 June 2020 (UTC)
Below the table, below VERB, it says English, Select a different language. If you click English, the others appear, except for Arabic. LinguisticMystic (talk) 18:55, 22 June 2020 (UTC)
@LinguisticMystic: Ah, it looks like I'd disabled MediaWiki:Gadget-SpecialSearch.js, which generates the dropdown menu in MediaWiki:Searchmenu-new. I might be able to figure out why the gadget isn't picking up Arabic. — Eru·tuon 19:07, 22 June 2020 (UTC)
Unfortunately, the other options don't work for me either, only the default English option, so please check if the code is okay. LinguisticMystic (talk) 19:11, 22 June 2020 (UTC)
Great. It is working indeed. Thanks a lot. You made my work much easier and faster. LinguisticMystic (talk) 20:10, 22 June 2020 (UTC)

Module errors due to removing items from Module:unsupported titles/data

{{unsupported|://}} at Wiktionary:Beer_parlour/2016/October#Possible future vote about deleting all programming language_symbols, {{unsupported|ideographic space}} at ideographic space, and a whole Finnish-declension-table full of unsupported inflected forms at Unsupported titles/n:s. Chuck Entz (talk) 04:41, 25 June 2020 (UTC)

@Chuck Entz: Ouch. Reverted. I noticed the one (þ), but didn't go looking for more. Later I might try to figure out which titles were added in the edit and restore them, or maybe User:J3133 will be kind enough to do it. — Eru·tuon 04:47, 25 June 2020 (UTC)

lots of rfdateks

Hi. It's been a while since I've bugged you for a random list. Would you be able to cook up a list of the entries which contain the most occurrences of {{rfdatek}} and {{rfdate}}? My bet is the "winner" will have around 25. --Nueva normalidad (talk) 07:42, 1 July 2020 (UTC)