User talk:CodeCat

Archives: 2009-2010 · 2011 · 2012


Thread title · Replies · Last modified
Language codes in templates · 12 · 22:50, 24 November 2015
Appendix:Proto-Indo-European/ātr- · 17 · 08:09, 24 November 2015
Module:compound/templates · 0 · 08:03, 24 November 2015
A little regex help · 4 · 17:33, 23 November 2015
Germanic loans in Proto-Samic · 2 · 21:48, 21 November 2015
Declension class fi:solakka · 6 · 11:58, 19 November 2015
Templating forms between protolanguages · 4 · 18:18, 16 November 2015
Requests for Finnish etymologies · 9 · 17:49, 16 November 2015
template error · 0 · 12:53, 12 November 2015
Snee · 0 · 03:15, 12 November 2015
Module:ca-verb · 0 · 11:51, 8 November 2015
Topic Cat · 0 · 04:15, 4 November 2015
Help implementing WT:ACCEL · 4 · 19:07, 2 November 2015
Wiktionary:Votes/pl-2015-11/NORM: 10 proposals · 0 · 03:00, 2 November 2015
Mewbot and redirects · 2 · 15:33, 31 October 2015
MewBot and spaces · 10 · 12:54, 28 October 2015
MewBot removing blank line after {{also}} · 5 · 12:56, 27 October 2015
Zipser German - asking for permission · 2 · 05:45, 27 October 2015
Adding orange links · 0 · 21:34, 25 October 2015
Parsing nl-verbs templates using a bot · 2 · 15:07, 24 October 2015

Language codes in templates

Hello you. I haven't concerned myself with templates at all but I do sometimes find that they get longer and unwieldier over time; therefore I was pleased to see the recent news that we can use "en" etc. as first parameter. However: why do some templates support this while others don't? (I think perhaps "cx" and "sense" don't, off the top of my head.) How can one tell without checking source code? Should they all, one day, support the language-first format?

Equinox 21:21, 24 November 2015

A requirement for this to work is that the language code is mandatory. So part of the work is in making sure that all existing entries have a language.

CodeCat 21:25, 24 November 2015

Something else I've occasionally thought is that it would be great if the default language in any language-section were the language of that section (i.e. a language-less templated element in an English section would automatically be English). It seems to make sense and would save our poor fingers... I am probably missing some counterarguments, but I can't immediately think of any, except the lame one that it would inhibit manual copy-pasting where a second related language needs a similar entry.

Equinox 21:32, 24 November 2015

It would be great, but I have no idea how it might be done.

CodeCat 21:59, 24 November 2015
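As a rough illustration of the groundwork mentioned above (making sure every existing use has a language), here is a minimal Python sketch of a scan that lists template calls lacking a language parameter, together with the code of the language section they sit in. It is only a sketch under assumptions: the `LANG_CODES` table, the function name `find_missing_lang`, and the idea that the language is passed as a named `lang=` parameter are all illustrative, and this is not how any existing bot or module is known to work.

```python
import re

# Hypothetical mapping from level-2 heading names to language codes.
# A real tool would take this from the site's language data instead.
LANG_CODES = {"English": "en", "Dutch": "nl", "Finnish": "fi"}

# Matches level-2 headings such as "==English==" but not "===Noun===".
HEADING = re.compile(r"^==([^=].*?)==\s*$")


def find_missing_lang(wikitext, template="context"):
    """Return (line number, template call, section language code) for every
    call to `template` that has no lang= parameter (assumed convention)."""
    call = re.compile(r"\{\{" + re.escape(template) + r"(\|[^{}]*)?\}\}")
    current = None  # code of the language section being scanned
    results = []
    for lineno, line in enumerate(wikitext.splitlines(), 1):
        heading = HEADING.match(line)
        if heading:
            current = LANG_CODES.get(heading.group(1).strip())
            continue
        for m in call.finditer(line):
            params = (m.group(1) or "").split("|")
            if not any(p.strip().startswith("lang=") for p in params):
                results.append((lineno, m.group(0), current))
    return results
```

The section code in each result is what a language-less call would default to under the proposal; whether a template itself could discover that at render time is a separate question that this sketch does not answer.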

I reckon that existing uses of term and context should be bot-replaced where possible; hopefully no new users will add the templates to pages, thus making the switchover more complete.

Μετάknowledge discuss/deeds 21:33, 24 November 2015

I'm hesitant to do anything like that because of you-know-who.

CodeCat 21:59, 24 November 2015

This is why voting before doing stuff is good!

Equinox 22:02, 24 November 2015

Before doing anything Dan doesn't like? Just look at this: Wiktionary:Beer_parlour/2015/November#Rhymes_navigation. Dan has been reverting a user just for using a template he doesn't like. He's made himself the consensus police, editor choice be damned. I think it's time that we just tell him to stuff it and no longer let him do this.

CodeCat 22:04, 24 November 2015

Appendix:Proto-Indo-European/ātr-

OK, but why did you delete Appendix:Proto-Indo-European/ātr-? Why didn't you just move it? I worked really hard on it.

1Albin2 (talk) 23:28, 20 November 2015

We often get people adding incorrect or outdated etymological information to Wiktionary, so we tend to be rather cautious about anything that seems less than legit. I can restore and move the page if you tell me what it should be called according to modern Indo-European linguistics.

CodeCat 23:31, 20 November 2015

Modern Indo-European linguistics has at least 5 different designations of which *atr- or *ater- (both with a long -a-) are the most cited ones. You can choose one of them. In any case, it would be nice if you could restore it at least to my sandbox.

1Albin2 (talk) 23:45, 20 November 2015

I've placed it at User:1Albin2/ātr-. My main objection to this reconstruction is the long ā, since linguists are pretty confident nowadays that there was no such thing in Indo-European. Another point is that words almost always started with a consonant.

CodeCat 00:12, 21 November 2015

Thank you. I'll keep an eye on this subject, though I think that, as long as Indo-Europeanists and other linguists use the long ā in their books, journals and papers, we won't be able to create any entry for PIE in the future unless we use *Hʷet-, *Hʷet˖r-, or *hₓehₓtr- with no defined/fixed meaning for "fire". What do you think about *hₓehₓtr-, actually? Does it fit?

1Albin2 (talk) 00:24, 21 November 2015

I don't really know what the x's mean though. Do they indicate that the laryngeal is unknown? And Hʷ is another thing I've never seen before.

CodeCat 02:01, 21 November 2015

Module:compound/templates

Both of the pages currently in Category:Pages with module errors are there due to the module getting confused by an empty lang parameter. You may want to look at the changes you made the other day to see why this is happening.

Chuck Entz (talk) 08:03, 24 November 2015

A little regex help

Hey Code, I'm working on a new Sanskrit declension module that can work across different scripts, and I was having some issues formulating a regex given Lua's weak pattern syntax. I want to check whether a lemma is monosyllabic ending in ī, but to check for an onset cluster in Devanagari encoding, you need the ् character between each consonant in the cluster, and I can't figure out how to check for:

  • zero or more (consonants each followed by ्)
  • followed by a consonant with ी

or approximately '^(?:[Deva consonants]्)*[Deva consonants]ी$'

This would be easier if I could use a Kleene star after a non-capturing group, but Lua doesn't allow that. Do you have any advice for this? Thanks.

JohnC5 02:49, 23 November 2015

You could just capture it and then ignore the capture?

CodeCat 15:47, 23 November 2015

Annoyingly, according to the MW docs, Lua patterns do not allow greedy quantification over capture groups.

JohnC5 16:28, 23 November 2015

Then I guess you'll have to make do with several matches, with increasing numbers of preceding consonants.

CodeCat 16:55, 23 November 2015

I think so too. I believe the maximum onset size in Sanskrit is 3. It is also an interesting question how many words possess this edge case. I'll look into it. Thanks for the advice, though it's odd there isn't an easy way to do this.

JohnC5 17:33, 23 November 2015
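The "several matches" workaround discussed in this thread can be sketched as follows. The sketch is in Python purely to show the logic: one anchored pattern per onset size, tried in turn, instead of a star over a group. The consonant range U+0915 to U+0939, the cap of 3 onset consonants, and the function names are assumptions made for illustration; in a Lua module the same unrolled patterns would have to be written in whatever pattern functions the module environment provides.

```python
import re

# Assumption: the Devanagari consonants of interest lie in U+0915..U+0939.
CONSONANT = "[\u0915-\u0939]"
VIRAMA = "\u094D"  # the ् sign joining consonants in a cluster
LONG_I = "\u0940"  # the ी vowel sign

MAX_ONSET = 3  # assumed maximum number of consonants in an onset


def unrolled_patterns(max_onset=MAX_ONSET):
    """One anchored pattern per onset size: C + ī, C्C + ī, C्C्C + ī."""
    return [
        "^" + (CONSONANT + VIRAMA) * n + CONSONANT + LONG_I + "$"
        for n in range(max_onset)
    ]


def is_monosyllabic_long_i(lemma):
    """True if the lemma is an onset cluster followed directly by ī."""
    return any(re.search(p, lemma) for p in unrolled_patterns())


# For comparison: the single pattern that a star over a group would allow.
STARRED = re.compile(
    "^(?:" + CONSONANT + VIRAMA + ")*" + CONSONANT + LONG_I + "$"
)
```

For onsets of up to three consonants the unrolled list and the starred pattern accept the same strings; they would differ only for a longer cluster, which the unrolled list rejects by design.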

Germanic loans in Proto-Samic

For future reference: *a > *ō in Proto-Samic was later than *ē > *ā in North Germanic, so any PS entry that has *ā in correspondence to a Germanic *a (or *ā), as in e.g. *mānō, is better marked as a loan from Proto-Norse than Proto-Germanic.

(I guess it would be also possible to add Northwest Germanic as a "PG dialect", similar to what we do with Proto-Uralic vs. Proto-Finno-Ugric, but this doesn't seem like a move that would have terribly numerous benefits.)

Tropylium (talk) 20:41, 21 November 2015

Could the word not have been borrowed with Germanic ē being substituted by Pre-Samic ǟ?

CodeCat 20:45, 21 November 2015

That much is possible in principle, yes, at least in the case of roots of the shape *ā-ē < *ǟ-ā, though anything that early can be usually told apart by other signs (e.g. by Finnic equivalents that have *ä).

Tropylium (talk) 21:48, 21 November 2015

Declension class fi:solakka

I noticed that we have always had the illative plural wrong in declension class "solakka", in the sense that the form with one "k" is the more common one and should thus come first. See e.g. puolukka: "puolukkoihin" is above "puolukoihin" and it should be the other way round. It seems that KOTUS's net dictionary has it wrong. If you google any word in this declension class, you'll mostly get at least a hundred times as many hits for the double-k form as for the single-k form. For once, I want to disagree with KOTUS!

Hekaheka (talk) 08:32, 14 November 2015

I wrote it the wrong way round. It should read: If you google any word in this declension class, you'll mostly get at least a hundred times as many hits for the single-k form as for the double-k form.

Hekaheka (talk) 08:35, 14 November 2015

It's strange, because normally the illative plural has the strong grade, doesn't it? I wonder why there's this exception. Anyway, I've swapped the forms around. Should the same be done for the laatikko class?

CodeCat 15:00, 14 November 2015

Indeed, it seems that the single-k form is also more popular in the "laatikko" class. Here are numbers from Google searches: laatikoiden 114 000, laatikkojen 9 800, laatikoitten 3 600; laatikoita 242 000, laatikkoja 10 000; laatikoihin 78 000, laatikkoihin 6 600; laatikoina 10 000, laatikkoina 3 400; laatikoineen, laatikkoineen. If one chooses another word, one gets a different result. With "valikko" the frequencies are more even (the ratio is between 1 and 2), and in the comitative the double-k form is more frequent.

Hekaheka (talk) 07:07, 15 November 2015

Forgot laatikoineen 8 000, laatikkoineen 4 000. Also in "puolukka", puolukoineen is much more popular than puolukkoineen.

Hekaheka (talk) 07:12, 15 November 2015

I wrote KOTUS about this. Their comment: good observation, but we cannot change the system right now. I take this as support.

Hekaheka (talk) 12:38, 18 November 2015

For the record, the historical background is that coda *j used to trigger gradation of geminate stops in some Finnish dialects (IIRC there's no evidence for gradation of singletons), but it's been mostly analogized away from existence; in particular since plural & past tense *-j has often contracted with the preceding vowel to -i-. (This has included even *oi-stem nouns such as *kukkoi > dial. kukoi for kukko.) However, -kka always takes the plural stem -kkoi-, so gradation to -koi- had pretty good odds for remaining.

Tropylium (talk) 11:58, 19 November 2015

Templating forms between protolanguages

I've been meaning to ask: is there any particular reason you've been wrapping intermediate phonological forms in etymologies in {{m}}, as e.g. here? Things like these are after all not quite from whatever language is under discussion, but from an older stage. (The "early Proto-Germanic" and "early Proto-Finnic" forms also could be called "late PIE" and "late western PU" just as well.)

As long as we're not linking them to anything, what's exactly the benefit to this encoding? Machine-readability, arguably sometimes, but conflating different stages of development might also be detrimental there, depending on what one tries to machine-read.

Tropylium (talk) 01:52, 16 November 2015

It's still better than having it tagged as English, isn't it? Also, {{m}} doesn't produce the same output as ''. It's not just tagging the language that matters; there's also tagging mentions.

CodeCat 01:55, 16 November 2015

Wait, are you saying that running text on Wiktionary is tagged as English by default? That seems like a bad idea.

Tropylium (talk) 02:33, 16 November 2015

Yes, it is.

CodeCat 02:39, 16 November 2015

I'm curious, what's the difference in effect between using {{m|[LANG]||[TERM]}} versus using {{lang|[LANG]|[TERM]}}? I've been using {{lang}} when I wish to specify the language without creating a link, but I'm happy to change if {{m}} is better somehow.

‑‑ Eiríkr Útlendi │ Tala við mig 18:18, 16 November 2015

Requests for Finnish etymologies

Nobody is actively working on Finnish etymologies. Thus, adding requests for them seems a bit of a pointless activity.

Hekaheka (talk) 04:46, 12 November 2015

The entries still need etymologies though, even if there's nobody to provide them. I see it more as a "to do" list.

CodeCat 13:45, 12 November 2015

No harm done, but there must be like 40,000 missing etymologies in Finnish.

Hekaheka (talk) 18:54, 12 November 2015

Maybe one day I'll get around to adding them, like I'm doing now. Suffixes are much easier to get than other origins.

CodeCat 18:55, 12 November 2015

All right, I didn't know that you are working on them. Keep up the good work. I was just worried about the possibility of accumulating a superlong page with a list of missing etymologies. I'm currently working on the Finnish index, which also has tens of thousands of red links to follow. It'll take years before I finish it.

Hekaheka (talk) 14:28, 13 November 2015

I still take passes at cleaning up Finnish etymologies every now and then; having them tagged helps to keep track of what's done and what's not.

Tropylium (talk) 19:31, 15 November 2015

I usually create Proto-Finnic pages if I feel confident enough that the term comes from Proto-Finnic. That means that it has to have attestations in at least the North and South Finnic groups, preferably the less standardised varieties. Finnish + Võro + Veps is an excellent distribution, Votic is also very good. Finnish + Karelian + Veps is probably too dubious, as is North Finnic + Estonian, because Estonian is known to have borrowed a fair number of terms from Finnish.

If I'm not confident in reconstructing a term, I put a request there instead and list the cognates I did find.

CodeCat 20:04, 15 November 2015

A good use of caution. Words found in Finnish + Karelian + Veps alone would often include recent Russian loans, similarly Finnish + Karelian + Estonian would rake in lots of Swedish and even some Low German loans.

Tropylium (talk) 01:56, 16 November 2015

"I usually create Proto-Finnic pages if I feel confident enough that the term comes from Proto-Finnic." Isn't that called "guessing" by another name?

Hekaheka (talk) 17:40, 16 November 2015

The caution I described here is supposed to avoid guessing, and make sure that there is enough evidence for the reconstruction. If all Finnic languages agree on a form, there's no room left for guesswork.

CodeCat 17:49, 16 November 2015

template error

This diff created an error in the template with some case forms appearing twice. The changes are yours.

Please look at it and determine how to fix it. I'm sure you meant to do something but I'm not sure what it was. Please don't engage in a dialog because I'm not logged in and won't see it. Thanks kindly.

12 November 2015

Snee

omg! I think it's time for bed. Thank you for catching that!!

Leasnam (talk) 03:15, 12 November 2015

Module:ca-verb

Hi. Thanks again for your extremely difficult modules. Please can you modify Module:ca-verb in order to show the alternative past for néixer - see this link - PASSAT (alternatiu)

SimonP45 (talk) 11:51, 8 November 2015

Topic Cat

The label autodetect feature is now broken, and Category:Categories with invalid label currently contains 3,895 entries. Please fix it. Thanks.

Chuck Entz (talk) 04:15, 4 November 2015

Help implementing WT:ACCEL

Wikitiki suggested you might be able to help with this at {{Module:he-headword}}, for nouns. If you can add what we have now (at User:Conrad.Irwin/creationrules.js) I'd like to expand it later.

Enosh (talk) 12:54, 2 November 2015

I'm not sure what you're asking me to do. Do you want me to add rules to creationrules.js? If so, which rules?

CodeCat 15:36, 2 November 2015

No, the module. The existing {{he-noun}} is a mix of template stuff and HTML, and I think having a module is simpler and better. It already supports WT:ACCEL, so the module should match that functionality.

Enosh (talk) 18:03, 2 November 2015

Ok, it should work now. Please test it though.

CodeCat 18:16, 2 November 2015

See at User:Wikitiki89/בית, it brings up a module error. And thank you.

Enosh (talk) 19:06, 2 November 2015

Fixed now.

CodeCat 19:07, 2 November 2015

Wiktionary:Votes/pl-2015-11/NORM: 10 proposals

Please see Wiktionary:Votes/pl-2015-11/NORM: 10 proposals. Did I forget to mention any of the changes that people want to do that could improve WT:NORM? I got stuff from your reverted edits, from the Beer parlour, the NORM talk page and the NORM votes.

--Daniel Carrero (talk) 03:00, 2 November 2015

Mewbot and redirects

Your bot is breaking redirects: The Bahamas

DTLHS (talk) 04:31, 31 October 2015

I'm aware of this, and will fix it.

CodeCat 14:29, 31 October 2015

It's running now to fix the problems.

CodeCat 15:33, 31 October 2015

MewBot and spaces

MewBot is generating a *lot* of noise by collapsing two spaces after a period into just one, such as in this edit. This convention doesn't seem to be defined one way or the other in WT:NORM. Could you remove that particular change from MewBot's operation?

‑‑ Eiríkr Útlendi │ Tala við mig 16:31, 26 October 2015

Now that you mentioned it to CodeCat, it's true that WT:NORM does not seem to have any rule for converting multiple spaces into one. FWIW, I'd support adding that rule to the policy, but for now I abstain as to whether MewBot should stop doing that.

--Daniel Carrero (talk) 19:56, 26 October 2015

I don't see any reason why it should stop now, it's close to finished on all Wiktionary entries already.

CodeCat 19:57, 26 October 2015

Fair enough, if it's mostly done by this point.

‑‑ Eiríkr Útlendi │ Tala við mig 20:23, 26 October 2015

From a usability perspective, cases of two spaces after a period, where that sentence is followed by other text within the same paragraph, make the wikitext easier to visually scan in the monospace text presented by the editor, and thus should be kept. I would agree that cases of multiple spaces in other instances should be either collapsed, if followed by other text, or removed, if at the end of a paragraph.

‑‑ Eiríkr Útlendi │ Tala við mig 20:30, 26 October 2015
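The rule proposed in the post above could be sketched roughly like this in Python. It is an illustration only, not MewBot's actual code, and the function name is made up. One assumption goes beyond what the post says: a run of three or more spaces after a period is collapsed to a single space rather than trimmed to two.

```python
import re

# An optional period followed by a run of two or more spaces.
_RUN = re.compile(r"(\.?)( {2,})")


def normalize_spaces(paragraph):
    """Keep exactly two spaces after a period mid-paragraph, collapse other
    runs of spaces to one, and drop spaces at the end of the paragraph."""
    text = paragraph.rstrip(" ")  # trailing spaces are removed outright

    def fix(match):
        period, spaces = match.group(1), match.group(2)
        if period and len(spaces) == 2:
            return period + "  "  # sentence spacing is preserved
        return period + " "  # every other run becomes a single space

    return _RUN.sub(fix, text)
```

Note that the sketch treats every period alike, so an abbreviation followed by two spaces would be kept as well.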

Putting two spaces after a period is far from established practice though. I've never done it, and I wasn't even aware it was a thing.

CodeCat 20:31, 26 October 2015

Heh. That may speak to your youth. It's definitely a thing. There are whole tirades online about spacing after periods, with a general trend that older folks prefer two spaces, and younger prefer just one. Cf. the Carolingian reforms as they applied to writing: a single space between each letter, two spaces between each word, and three spaces between each sentence. Given that the "single space" in Carolingian terms just meant that the letters weren't all running together, we have modern block print as I learned it: letters distinct and separate, one clear whitespace between words, two clear whitespaces between sentences.

The practice of one space everywhere, even between sentences, appears to be a relatively modern contrivance. Cf. these two pages from Ben Franklin's Poor Richard's Almanack, or this page from the 1914 publication Library Jokes and Jottings by Henry T. Coutts, or this page from the 1894 book Benefits Forgot by Wolcott Balestier, or this page from the 1809 book The present state of Turkey by Thomas Thornton, or this page from Charles Dickens' 1853 book Bleak House, all clearly showing more spacing between sentences than there is between words. HTML, XML, and other markup enforces single-spacing by programmatically collapsing adjacent spaces into one upon rendering. Modern fonts and layout engines theoretically handle kerning and such to adjust spacing between words and sentences, but many don't make any distinction, resulting in effectively the same spacing between words and between sentences. Two spaces after sentence-final punctuation, especially in monospace situations, can make it easier for a reader to visually parse the text, by clearly differentiating gaps between sentences versus gaps between inline words, including abbreviations that might be followed by a non-final period.

‑‑ Eiríkr Útlendi │ Tala við mig 21:53, 26 October 2015

Eiríkr, maybe you could consider copying/editing some of your explanation into the "Usage notes" section of our entry about the space, which has none of that information.

--Daniel Carrero (talk) 23:16, 26 October 2015

MewBot removing blank line after {{also}}

I've seen a few of these now - I think there probably should be a blank line between the {{also}} template and the first language heading.

Keith the Koala (talk) 14:24, 26 October 2015

Not per WT:NORM.

CodeCat 14:24, 26 October 2015

I'm not seeing that. It says "One blank line before all headings, including between two headings, except for before the first language heading", but that seems to presume there is nothing before the first language heading.

Keith the Koala (talk) 19:48, 26 October 2015

Keith, CodeCat is right. It is standard practice that pages with {{also}} have no blank line between it and the first language section. The sentence "One blank line before all headings, including between two headings, except for before the first language heading." is good enough as it is, it does not assume -- or exclude -- the existence of anything before it, it just states a place where the standard formatting is not having blank lines.

That said, maybe WT:NORM really should be edited to mention {{also}} for clarification. I could swear that in one of the previous proposed versions of the policy, the template was explicitly mentioned.

--Daniel Carrero (talk) 20:04, 26 October 2015

Somebody removed it in this edit :) I would argue that putting a blank line between {{also}} and the first heading would make both more readable.

Keith the Koala (talk) 09:41, 27 October 2015

I actually agree that putting in a blank line would be good, but as of right now the rules say otherwise.

CodeCat 12:56, 27 October 2015

Zipser German - asking for permission

-sche wants a category created for dialects of Zipser German, as you can see in the Beer Parlor discussion for October 2015. May I fulfill -sche's request? Or will you?

Lo Ximiendo (talk) 18:20, 25 October 2015

I'm not sure why -sche is not able to do that himself? He's experienced enough, I thought.

CodeCat 19:22, 25 October 2015

I would have done it myself after a while (as I did for Sathmar), assuming there were no objections. I just posted to the BP to see whether or not there were objections / comments (as there were about Mennonite Low German), and to store my rationale somewhere for others' future reference.

- -sche (discuss) 05:45, 27 October 2015

Adding orange links

At Wiktionary talk:Wanted entries#Adding orange links, I proposed reverting one edit you made. Feel free to give your opinion on that, if you'd like.

--Daniel Carrero (talk) 21:33, 25 October 2015

Parsing nl-verbs templates using a bot

Hello CodeCat,

I'm trying to parse Dutch verbs which use the module nl-verbs. However, the module generates quite a lot more information than is given to it, and it would take more than a couple of lines of code to mimic it. It outputs HTML by default, which I could parse, but this is a bit tedious. When searching through the source I saw the bot parameter (args["bot"]) being used to switch the output to a computer-readable format. After lots of trying and searching I wasn't able to pass the parameter without an error message being thrown at me (

  • pres_indc_3sg_1=is
  • past_indc_gij_1=waart
  • pres_indc_u_1=bent
  • pres_indc_u_2=is
  • past_ptc_1=geweest
  • past_subj_sg_1=ware
  • pres_indc_jij_1=bent
  • pres_indc_pl_1=zijn
  • pres_indc_1sg_1=ben
  • pres_subj_sg_1=zij
  • past_indc_sg_1=was
  • pres_ptc_1=zijnd
  • impr_pl_1=weest
  • impr_pl_2=zijt
  • impr_sg_1=wees
  • impr_sg_2=ben
  • infinitive_1=zijn
  • pres_indc_gij_1=zijt
  • past_indc_pl_1=waren

). After asking for help on the pywikibot IRC someone said that this could be due to some edit which restricts the parameters being passed: . As you can see, the bot key is missing in the newly defined params table (is that what you call an array in Lua?).

MewBot is also using this parameter to parse the templates: (look for: params["bot"] = "1") I would guess that your bot isn't fully functional or that I have found deprecated code or something.

Anyhow, could you help me get this parameter to work again? I'm not very experienced in using Wiktionary or other wikis in this way, and I would not like to mess up something that is used across so many pages.

Thanks in advance.

Markkremer (talk) 14:55, 24 October 2015

I added the parameter but I don't know if it will work now.

CodeCat 14:58, 24 October 2015

Because the wrongly formatted URL in my message now shows a nice list, I think it did.

Markkremer (talk) 15:07, 24 October 2015
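For anyone wanting to consume the key=value list shown earlier in this thread, a minimal Python sketch of a parser follows. It assumes that the bot output is one `key=value` pair per line and that the trailing `_1`, `_2` on each key numbers alternative forms for the same slot; both are guesses from the list above rather than documented behaviour, and `parse_forms` is a made-up name.

```python
import re
from collections import defaultdict


def parse_forms(text):
    """Group lines such as 'pres_indc_u_2=is' into {slot: [forms in order]}."""
    forms = defaultdict(dict)
    for line in text.splitlines():
        # Tolerate list bullets and surrounding whitespace.
        line = line.strip().lstrip("•").strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumed key shape: <slot>_<index of the alternative form>.
        match = re.fullmatch(r"(.+)_(\d+)", key)
        if not match:
            continue
        forms[match.group(1)][int(match.group(2))] = value
    # Order the alternatives of each slot by their index.
    return {
        slot: [form for _, form in sorted(alternatives.items())]
        for slot, alternatives in forms.items()
    }
```

With the list above as input, `pres_indc_u` would map to both "bent" and "is", while most other slots would carry a single form.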