Python problem (non-Wiki)
Hi. I have been away for some days recovering from a laptop disaster. I have reinstalled Python 2.7.14. Now, when I right-click a Python program and select Edit with IDLE, nothing happens (no edit window, no message, nothing). Any ideas? SemperBlotto (talk) 19:32, 2 February 2018 (UTC)
- What operating system? DTLHS (talk) 19:51, 2 February 2018 (UTC)
- Windows 10. SemperBlotto (talk) 19:55, 2 February 2018 (UTC)
- "Try deleting the contents of the .idlerc folder in your profile. To open the folder just type and enter %USERPROFILE%.idlerc." DTLHS (talk) 19:59, 2 February 2018 (UTC)
- I can't find any file or folder named idlerc (with or without a dot) anywhere on my machine. SemperBlotto (talk) 20:57, 2 February 2018 (UTC)
- I uninstalled / reinstalled Python - now works OK. Now I have to get the bot working again. SemperBlotto (talk) 16:08, 3 February 2018 (UTC)
- "Try deleting the contents of the .idlerc folder in your profile. To open the folder just type and enter %USERPROFILE%.idlerc." DTLHS (talk) 19:59, 2 February 2018 (UTC)
- I reinstalled pywikibot. Now, when I try running the bot, I get:-
Traceback (most recent call last):
File "C:\Python-it\itnouns.py", line 7, in <module> import pywikibot, config File "C:\Python-it\pywikibot\__init__.py", line 15, in <module> from textlib import * File "C:\Python-it\pywikibot\textlib.py", line 17, in <module> import wikipedia as pywikibot File "C:\Python-it\wikipedia.py", line 7559, in <module> get_throttle = Throttle()
NameError: name 'Throttle' is not defined
Any ideas? SemperBlotto (talk) 06:10, 4 February 2018 (UTC)
Uniform usage labels template
Hi! We (@Isbms27 and myself) would like to propose a single unified system of usage labels for Wiktionaries in all languages. Such labels exist in some Wiktionaries (English, Russian, etc.), but their use is not systematic, and in some other languages they are not used at all.
We suggest the following categories; the tags are taken as an example from the English Wiktionary:
1. Usage/Register/Stylistic: neutral / colloquial / informal / formal / slang / jargon / nonstandard / familiar / periphrastic / official / vulgar / taboo / obscene
2. Speakers: to specify special social groups (by age, gender, social status, occupation (for slang / jargon), etc.)
3. Academic subject area: chemistry / biology / zoology, etc. (for terminology only)
4. Regional/Geography: American English / Australian / etc.
5. Temporal: dated (outdated) / archaic / obsolete / neologism / historical / hot word / nonce word
6. Expressiveness: approving / disapproving / humorous / ironic / offensive / euphemism
7. Word type: abbreviations / acronyms / initialism
Sorry, if wrong thread!
- I don't see how 7 relates to usage context. —Rua (mew) 18:06, 3 February 2018 (UTC)
- I wanted to reformat, to more easily understand the proposal:
- Usage/Register/Stylistic
- colloquial
- familiar
- formal
- informal
- jargon
- neutral
- nonstandard
- obscene
- official
- periphrastic
- slang
- taboo
- vulgar
- Speakers: to specify special social groups (by age, gender, social status, occupation (for slang / jargon), etc.)
- Academic subject area:
- chemistry / biology / zoology, etc. (for terminology only)
- Regional/Geography:
- American English / Australian / etc.
- Temporal:
- Expressiveness:
- approving
- disapproving
- euphemism
- humorous
- ironic
- offensive
- Word type:
- A couple of points:
- Although many of these are defined in the Appendix:Glossary for English, not all of them are because they are quite subjective. The proposal is pan-project; how would these concepts be normalized across the languages involved?
- What qualifies as academic subject area? e.g. Pharmaceuticals, particularly trade marks like Viagra (or, more controversially, Aspirin, which is a trade mark in most of the world other than the USA) which are not a chemical name or pharmaceutical compound but simply a brand. - Amgine/ t·e 18:30, 3 February 2018 (UTC)
- I completely agree that particular concepts of some categories (esp. Register, Temporal and Expressiveness) should be normalized across languages; we used existing tags from the English Wiktionary as an example. Russian tags are also quite messy, and I could find no system for Spanish or German. However, fixing language-specific tags is the second step.
- The first and main step is to introduce a uniform set of categories as a fixed template for all languages and to encourage users 1) to specify as many categories as possible for a particular word usage, and 2) to use the same pattern when describing words in different languages.
- We already have this in the grammar description: for every word a part of speech is specified, etc. It would be useful to have this for 'semantics'/'usage' information as well.
- For example, at the Wikimedia pre-hackathon in Olot they discussed an idea about integrating lexical Wikidata into some machine translation systems. These usage labels, if uniformly presented in all languages and integrated, can help to choose translation equivalents. —Isbms27 10:30, 4 February 2018 (UTC)
- <nod> There certainly would be work to do. But I was asking how you would address normalizing the categories. It is the part which seems most difficult for Wiktionary.
- Also, I believe Rua would like for you to address the concern raised about #7. - Amgine/ t·e 16:08, 4 February 2018 (UTC)
- Also, we don't have a part of speech for every word -- POS concepts don't apply very well to Chinese terms, for instance, and Lojban is just plain wacky, and there are aspects of Japanese that haven't been tackled yet, and the whole issue of idioms has gone back and forth a few times, among many other issues. Looking at the numbered list above, some aspects are rather confusing -- #3 says "for terminology only", which is very ambiguous given the context.
- While I support the underlying idea (semantic tagging for better correspondences across languages), I caution planners that this is a vastly complicated issue. Please do not expect rapid progress. :) ‑‑ Eiríkr Útlendi │Tala við mig 18:41, 6 February 2018 (UTC)
- In my opinion there are certain elements here which are broadly accepted - for example the temporal tags. I believe most languages of Wiktionary will have the concept, e.g. Catégorie:Langage désuet, Kategorie:Zastaralé_výrazy, etc., of dated/archaic terms. Those should be immediately implemented as they serve as exemplars of how this project can work inter-language. - Amgine/ t·e 17:21, 7 February 2018 (UTC)
How does {{head}} add words to this category?
Asking since I'm importing some modules from here to Hindi Wiktionary, and hi:માટે is being put in this category. —AryamanA (मुझसे बात करें • योगदान) 23:26, 3 February 2018 (UTC)
- Any part of speech that isn't in the lemmas or non-lemmas table in Module:headword/data will be put into that category. DTLHS (talk) 00:01, 4 February 2018 (UTC)
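(A rough sketch of that check, assuming Module:headword/data keeps its usual lemmas and nonlemmas tables keyed by part-of-speech name; the fallback category name below is only a placeholder, not the actual module code:)
local data = mw.loadData("Module:headword/data")

local function categoryFor(pos)
    if data.lemmas[pos] then
        return "lemmas"
    elseif data.nonlemmas[pos] then
        return "non-lemma forms"
    else
        -- anything not listed in either table ends up in the catch-all category
        return "terms with unrecognized part of speech"  -- placeholder name
    end
end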
Old English declension infrastructure
Old English noun declension templates are woefully incapable of presenting what should, by all rights, not be too complex. For example, see feond for an entry where each form has to be specified because the templates can't handle it. If someone could Luacise it so that it worked like our Latin declension templates, that would help a great deal. @Rua, JohnC5 as people with a likely interest. —Μετάknowledgediscuss/deeds 21:38, 4 February 2018 (UTC)
- For feond, the problem came from trying to squish two separate declensions into the same table, so I have separated them. mellohi! (僕の乖離) 04:06, 8 February 2018 (UTC)
Module:wikipedia (I think)
When viewing {{pedia}}
in the mobile version of Safari, it comes out pretty much like this:
February on Wikipedia.Wikipedia Wikipedia
Note that there's no space between the period and the "Wikipedia" link when the problem actually occurs. Esszet (talk) 17:56, 5 February 2018 (UTC)
- @Esszet: There is a link after the period even in the desktop version, though it's invisible: for instance,
<span class="interProject"><a href="https://en.wikipedia.org/wiki/elephant" class="extiw" title="w:elephant">Wikipedia</a></span>
in elephant. Not sure what its purpose is. The CSS file for the desktop version, MediaWiki:Common.css, hides it, while the CSS for the mobile version, MediaWiki:Mobile.css, doesn't. — Eru·tuon 05:24, 6 February 2018 (UTC)
- Can it be deleted? It seems totally redundant. Esszet (talk) 13:23, 6 February 2018 (UTC)
- @Esszet: Apparently it isn't. It seems to populate the "in other projects" section of the sidebar. So an admin needs to add a rule to MediaWiki:Mobile.css like the one in MediaWiki:Common.css. Perhaps @-sche, who was working on that file? — Eru·tuon 17:54, 6 February 2018 (UTC)
- Done. Let me know if there are any unwanted side effects; I looked for anything else that used that class, and didn't see anything that seemed likely to cause problems. - -sche (discuss) 18:29, 6 February 2018 (UTC)
Odd behavior from template
On jocose, I added a citation, originally with {{cite-book}}
but then changing it to {{quote-book}}
. In both, the only date I used was the year (1886) but when I use the latter, correct template, it displays "1886. February 5." Where is it getting the date??? —Justin (koavf)❤T☮C☺M☯ 18:04, 5 February 2018 (UTC)
- @Sgconlaw? —Μετάknowledgediscuss/deeds 18:11, 5 February 2018 (UTC)
- The quotation templates use PHP functions to parse dates. This is unfortunate because PHP will never throw an exception, ever, no matter what garbage input you give it. So if you give it a date of "1886" it will attempt to fill in the other parameters of the date with the current date. Anyway, if you just want a year, use the year parameter. Only use date if you have a specific date. DTLHS (talk) 18:13, 5 February 2018 (UTC)
- Yup. This is a feature of {{#time:}}, not of the templates themselves. — SGconlaw (talk) 18:26, 5 February 2018 (UTC)
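(A sketch of that advice in Lua terms: prefer a plain year over full date formatting. The parameter handling here is illustrative only, not the actual {{quote-book}} code:)
local function formatDateParam(date, year)
    if year and year ~= "" then
        return year                            -- a bare year is printed as-is
    elseif date and date:match("^%d%d%d%d$") then
        return date                            -- a year passed via |date=; don't let the date parser pad it
    elseif date and date ~= "" then
        return mw.getContentLanguage():formatDate("j F Y", date)
    end
end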
Japanese example quoting using ja-usex
I want to write some examples for Japanese Jukujikun (熟字訓) kanji readings.
- 2000, "Tsunami", in Ballad 3: The Album Of Love, performed by Southern All Stars, track 24:
{{ja-usex|人は誰も愛求めて 闇にさまよう運命|ひとはだれもあいもとめてやみにさまようさだめ|The fate of a man wandering in the dark, looking for love}}
- 2000, "Tsunami", in Ballad 3: The Album Of Love, performed by Southern All Stars, track 24:
Is there a way to bypass this system (exceptions for cases found in famous literature) or add Kanji readings that I believe are legitimate?
Thanks. Jayshinkw (talk) 03:49, 6 February 2018 (UTC)
- Did you mean this? (After adding spaces)
- 人は誰も愛求めて闇にさまよう運命
- hito wa dare mo ai motomete yami ni samayō sadame
- The fate of a man wandering in the dark, looking for love
- Wyang (talk) 04:12, 6 February 2018 (UTC)
Diacritic stripping request
Kind of like how Latin links display macrons but they get stripped from links. For Livonian (lang code liv):
- remove any apostrophes
- also remove this weird apostrophe: ’
- remove ogonek from long ō: ǭ --> ō
Neitrāls vārds (talk) 01:16, 8 February 2018 (UTC)
- Done. DTLHS (talk) 01:34, 8 February 2018 (UTC)
- Thank you! Neitrāls vārds (talk) 01:55, 8 February 2018 (UTC)
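(For reference, such replacements are usually declared in the language's data table roughly like this; a sketch of the idea, not necessarily the exact edit that was made:)
-- inside the language data module, where m is the table of languages
m["liv"] = {
    canonicalName = "Livonian",
    scripts = {"Latn"},
    entry_name = {
        -- strip plain and curly apostrophes, and reduce ogonek-marked ǭ to ō
        from = {"'", "’", "ǭ"},
        to   = {"", "", "ō"},
    },
}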
Another diacritic stripping question
Latvian (lv) has most of the scholarly tone diacritic removal/replacement rules covered (thanks to whoever did this), but it would be awesome to get rid of the spelled-out <uo> diphthong (just o in standard orthography); one of the two letters (it seems to vary by author which one) gets a tone diacritic.
Examples:
To avoid having to make any complicated "logic statement" I could spell out all of them (not that many because it's not possible for 2 different tone marks to be within the same diphthong):
(uo) (ũo ûo ùo ūo) (uõ uô uò uō) (ũõ ûô ùò ūō) --> o
But the part with macrons should only apply to <uo> sequence because macron is a legitimate diacritic (technically it's not even a tone mark but, I guess, they use it as a replacement for tilde, since <uo> is not part of orthography to begin with, that prevents any confusion, I suppose.) Neitrāls vārds (talk) 03:01, 8 February 2018 (UTC)
- If it was even possible, I disagree with this one. These are orthographic variation and should be treated as such. --Victar (talk) 01:21, 9 February 2018 (UTC)
- I guess it makes sense that replacing a sequence would be more tricky than a single character...
- However, uo is not a variant spelling, only a "dictionary notation convention" (for lack of a better term); to my knowledge there isn't any spelling tradition (that has seen any use) that would spell out uo's, only a convention that is used in headword lines of more specialized dictionaries. Neitrāls vārds (talk) 18:14, 9 February 2018 (UTC)
This page http://valoda.ailab.lv/latval/vidusskolai/orto/ilvo.htm gives a quick rundown of the stages of lv orthography
1st (chaotic) attempts (ca. 16~18th century), just plain o
- Ahbola /a:buola/ (modern ābola)
or occasionally a doubled oo
- goodtc /guots/ (gods)
So-called "German orthography" (~19th century), just o
- eerozis /ieruotsis/ (ierocis)
Modern orthography, conceptualized at the turn of 19th/20th centuries but introduced after WW1 (1918-ish?) Initially the plan was for it to spell out uo's but it didn't materialize (rightly so if you ask me, when every (native) o is implicitly uo, what's the point of explicitly spelling them out, but I digress.) So, outside of dictionaries it has never really been used. Neitrāls vārds (talk) 18:14, 9 February 2018 (UTC)
- The issue is that we strip diacritics to make linking easier, but not as a tool to normalise an orthography. The problem is not a technical one, but that this is an inappropriate application. —Μετάknowledgediscuss/deeds 18:38, 9 February 2018 (UTC)
- So there has never been a precedent? I.e., a language actually adding extra letters for their dictionary notation as opposed to just adding diacritics (every example I can think of falls in the latter category actually.)
- not as a tool to normalise an orthography – as I outlined above, uo has never been part of any orthography tradition, only "dictionary notation" / faux transcription.
- Latvian is not that relevant in etymologies (Lithuanian can usually do the same job while being more archaic) but looking forward how can this problem be tackled? Suppose I magically fix all the links right now, then the year 2020 rolls over and there are another 200 red links when there are perfectly fine entries that they should land on, say, for example, link for uozuols when there's ozols (and the former is not a valid form attestable in prose, only in dict headword lines.) It's not that I care that much but it sounds like something one would constantly need to look after. Neitrāls vārds (talk) 20:57, 9 February 2018 (UTC)
Double boldface
Is it possible to avoid the double boldface that occurs when, say, {{past participle of|acquit|lang=en}}
is used on the acquit entry page itself? — SGconlaw (talk) 10:54, 8 February 2018 (UTC)
- I agree this is a problem that should be resolved. I recall this kind of thing coming up before (there was a discussion of it involving msh210 and Rua—CodeCat at the time). - -sche (discuss) 05:27, 9 February 2018 (UTC)
- That is what would normally appear if an inflection has the same spelling as the main entry. I don't think it should be done away with. DonnanZ (talk) 16:45, 9 February 2018 (UTC)
- Hmm, it seems to be more accentuated than at run#Verb. DonnanZ (talk) 16:56, 9 February 2018 (UTC)
- Wouldn't the normal level of boldface be sufficient? The extra boldface seems excessive. — SGconlaw (talk) 17:43, 9 February 2018 (UTC)
- Yes, it should only be as bold as e.g. a (sloppy, #Noun-less) link from an adjective section to a noun section on the same page, not twice as bold. I found the prior discussion, which should help resolve the current case: Wiktionary:Beer parlour/2014/June § boldfaced forms of invariant lemmata. - -sche (discuss) 18:56, 9 February 2018 (UTC)
- We don't normally create inflection entries if they are the same spelling as the main entry (I don't anyway) so this case is a little odd. DonnanZ (talk) 19:12, 9 February 2018 (UTC)
- @Sgconlaw: I have fixed it using "old-fashioned" technology, but had to re-add the past participle category. See if it meets your approval now. DonnanZ (talk) 01:04, 10 February 2018 (UTC)
- Thanks! — SGconlaw (talk) 15:19, 11 February 2018 (UTC)
- @Erutuon, Rua, DTLHS or anyone else who might know: can we find an actual, general solution to this? Simply removing templates as was done on [[acquit]] seems like an undesirable and unmaintainable approach that only "fixes" individual entries as they crop up. - -sche (discuss) 19:24, 11 February 2018 (UTC)
- @-sche: I had taken a look at the HTML, and thought it was because two CSS selectors were both emboldening the same word: the
strong
tag and.form-of-definition-link .mention
class selector. But that when I look at it with browser-internal styles displayed (I'm in Firefox), it's clear that the reason is slightly different:<strong>, <b>
tags have the rulefont-weight: bolder;
, which means that the "form of definition mention" text, which is already bold, is made even bolder by the<strong>
tag. One solution would be to override this browser-internal rule withstrong, b { font-weight: bold; }
in MediaWiki:Common.css, though I'm not sure if that's the best solution. — Eru·tuon 20:54, 11 February 2018 (UTC)
- Is there any way to standardize the css classes to only use either strong or bold in order to match what the wikitext produces? Having both floating around seems destined to create oddly-distributed coincidental combinations. Chuck Entz (talk) 21:44, 11 February 2018 (UTC)
- I don't quite understand your question, because strong and bold are HTML tags and have nothing to do with CSS rules applying to classes. I might not have explained things well. In the entry acquit, a selflink
<strong class="mw-selflink selflink">acquit</strong>
(acquit) was generated from the wikitext[[acquit]]
. The strong tag, as well as the b tag generated by wikitext bolding syntax, has the CSS propertyfont-weight: bolder;
applied to it by my browser, and apparently by other people's browsers. The selflink was wrapped by<span class="form-of-definition-link"><i class="Latn mention" lang="en">...</i></span>
, and MediaWiki:Common.css applies the property font-weight: bold;
to this configuration of classes. So acquit starts out bold because of the Wiktionary CSS property and becomes even bolder because of the browser-internal CSS property. (The resulting bold value is 900 according to my browser:font-weight: bold;
+font-weight: bolder;
=font-weight: 900;
.) — Eru·tuon 22:30, 11 February 2018 (UTC)
- How would such a line in the css interact with e.g. the lines that specify that "bolded" Hebrew has a normal (non-bolded) font weight and is big instead? Would it override them and cause Hebrew to be "bolded"? If not, that sounds like a good fix. What did we do to fix the "fishbone" problem linked above, of self-links on the headword line being double-bolded? - -sche (discuss) 01:18, 12 February 2018 (UTC)
- @-sche: Based on this article, CSS selectors that include class names (
.Hebr
) will have precedence over those that only include tag names (b, strong
). So the Hebrew-related styles will behave in the same way. - When I test it in my browser,
b, strong { font-weight: bold; }
fixes the double-bolded headword selflink problem too, so it could replace the rule that currently fixes the problem,b .selflink, strong .selflink { font-weight: inherit; }
. I wonder if there are any cases in which Wiktionary needs any levels of bolding besides normal and bold. — Eru·tuon 04:56, 12 February 2018 (UTC)- I wouldn't imagine so. — SGconlaw (talk) 02:34, 14 February 2018 (UTC)
- I have added that code to MediaWiki:Common.css. It seems to solve the issue. If there re no adverse side-effects, the old code (currently commented out) can be removed. - -sche (discuss) 03:14, 14 February 2018 (UTC)
- I wouldn't imagine so. — SGconlaw (talk) 02:34, 14 February 2018 (UTC)
- @-sche: Based on this article, CSS selectors that include class names (
- How would such a line in the css interact with e.g. the lines that specify that "bolded" Hebrew has a normal (non-bolded) font weight and is big instead? Would it override them and cause Hebrew to be "bolded"? If not, that sounds like a good fix. What did we do to fix the "fishbone" problem linked above, of self-links on the headword line being double-bolded? - -sche (discuss) 01:18, 12 February 2018 (UTC)
- Oh, I thought the template had been modified. Didn’t realize it had simply been removed. — SGconlaw (talk) 19:39, 11 February 2018 (UTC)
- I tried to explain that, it can be regarded as a temporary solution if the "boffins" can work out what to do. DonnanZ (talk) 12:06, 12 February 2018 (UTC)
FastRevert no longer working correctly?
I noticed that FastRevert is no longer leaving clean edit summaries. See this diff, for example. Anyone know what happened?
For the record, I was using Safari 11.0.2 when I made that edit, although I'm not sure it makes a difference. --Ixfd64 (talk) 19:48, 8 February 2018 (UTC)
Protected page edit request
Hello. For Wiktionary:Per-browser_preferences, please change:
<div id="isPreferencePage" name="isPreferencePage" />
- to:
<div id="isPreferencePage" name="isPreferencePage"></div>
- To resolve Special:LintErrors/self-closed-tag. Thank you, Xaosflux (talk) 16:14, 9 February 2018 (UTC)
- Done. Thanks for pointing out the lint error. - -sche (discuss) 16:20, 9 February 2018 (UTC)
Renaming 'Azeri' to 'Azerbaijani' with a bot
Per WT:RFM#Renaming_az I have renamed the language code az to 'Azerbaijani'. Can someone with a bot update the language headers, translations tables and descendants lists? --Vahag (talk) 17:51, 9 February 2018 (UTC)
- Many categories also need to be moved. DTLHS (talk) 17:51, 9 February 2018 (UTC)
- @Vahagn Petrosyan: I see that you have gone ahead and renamed it, which is probably not a good idea until we're ready to switch everything over. As DTLHS notes, a big part of it will be the categories, so if you want to start moving those over, now would be a good time. —Μετάknowledgediscuss/deeds 21:26, 9 February 2018 (UTC)
- It's fine, I'll move them tonight. DTLHS (talk) 21:32, 9 February 2018 (UTC)
- Looks like the categories are done now. The mainspace changes await. —Μετάknowledgediscuss/deeds 06:24, 10 February 2018 (UTC)
- A few newly created entries have kept "==Azeri==", such as artıq. But I guess the process is still ongoing, so there's no need for concern. Allahverdi Verdizade (talk) 12:15, 10 February 2018 (UTC)
- @DTLHS, Metaknowledge, thank you for moving and sorry for hurrying. --Vahag (talk) 13:18, 10 February 2018 (UTC)
Need Help
Can anyone please help me in removing transliteration of Urdu, Persian & Arabic languages from Urdu Wiktionary? The imported modules there cause Urdu entries to say "transliteration needed" but that shouldn't be necessary because it's the Urdu Wiktionary. — Bukhari (Talk!) 11:55, 12 February 2018 (UTC)
- Well, even on the Urdu Wiktionary, don't you think Urdu entries should have some way of showing how they're pronounced, like IPA or using vowel marking on the headword line? In any case, @Aryamanarora can probably help. —Μετάknowledgediscuss/deeds 17:38, 16 February 2018 (UTC)
Template "de-nom" at Occitan Wiktionary
editThe template "de-nom" at the Occitan Wiktionary looks like it could benefit from an expansion to include the other three grammatical cases. --Lo Ximiendo (talk) 13:17, 13 February 2018 (UTC)
- UPDATE: I added three rows for the noun forms, but not the columns for the names of the grammatical cases, which are "nominatiu, genitiu, datiu, acusatiu" in Occitan. --Lo Ximiendo (talk) 13:38, 13 February 2018 (UTC)
Bug with /api/rest_v1...
https://en.wiktionary.org/api/rest_v1/page/definition/kuulua
definition 5 has this:
"Hän kysyi, mistä puhuin Samin kanssa. Sanoin, että se ei kuulu hänelle.\n
- He asked what I was talking to Sam about. I told him it was none of his business.
"
3 "He asked what I was talking to Sam about. I told him it was none of his business."
As you can see the sentence is duped.
https://en.wiktionary.org/wiki/kuulua on the other hand, has no such duplicate sentence.
- What is this? I've never heard of this API. Who is developing it? DTLHS (talk) 21:40, 13 February 2018 (UTC)
https://en.wiktionary.org/api/rest_v1/#!/Page_content/get_page_definition_term I guess I should've gone to https://phabricator.wikimedia.org/ to report this? I don't feel like making an account. If anyone would do this, then good.
- @Jberkel has worked on that project. - TheDaveRoss 21:52, 13 February 2018 (UTC)
- That API endpoint was added to permit dictionary lookups from within the Android Wikipedia app (documentation). I'm not involved in the development of the API, I only suggested that our templates generate some extra markup to facilitate the parsing. I'm not sure the endpoint is still maintained/developed at the moment. The discussion on phabricator has stalled in any case. – Jberkel 23:16, 13 February 2018 (UTC)
- @Jberkel do you think it is worth submitting a bug? If so would you mind doing so? A little familiarity with the project goes a long way in making bugs meaningful. If you would rather not I can take a stab at it. - TheDaveRoss 13:23, 14 February 2018 (UTC)
- @TheDaveRoss: – Sure, I'll do it. Jberkel 15:03, 14 February 2018 (UTC)
- ticket T187430 on phab. – Jberkel 10:55, 15 February 2018 (UTC)
Listing all daughter languages
Is there any way to see a complete list of all languages that have a given language X as their ancestor? For example, both German and Yiddish have Middle High German as their immediate ancestor, while Cimbrian has Bavarian as its immediate ancestor and MHG as a more distant ancestor; is there any convenient way to see a complete list of languages that have MHG anywhere in their ancestor tree? —Mahāgaja (formerly Angr) · talk 15:36, 14 February 2018 (UTC)
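(One way to compute such a list from the language data modules is to walk each language's ancestors chain; a rough sketch, assuming a combined allData table of every language's data; this is not the tool linked in the reply below:)
local function hasAncestor(code, target, allData, seen)
    seen = seen or {}
    if seen[code] then return false end
    seen[code] = true
    local data = allData[code]
    for _, anc in ipairs(data and data.ancestors or {}) do
        if anc == target or hasAncestor(anc, target, allData, seen) then
            return true
        end
    end
    return false
end

local function descendantsOf(target, allData)
    local result = {}
    for code in pairs(allData) do
        if hasAncestor(code, target, allData) then
            table.insert(result, code)
        end
    end
    table.sort(result)
    return result  -- e.g. descendantsOf("gmh", allData) should include "de", "yi" and "cim"
end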
- Here (this is not my work). --Per utramque cavernam (talk) 15:43, 14 February 2018 (UTC)
- Awesome, thanks! (And thanks, JohnC5, too!) —Mahāgaja (formerly Angr) · talk 15:50, 14 February 2018 (UTC)
- @Mahagaja: Thank @Erutuon. He's the one who has been cleaning it up recently. We're hoping to integrate it into {{langcatboiler}} at some point so that it is easily available. —*i̯óh₁nC[5] 20:42, 14 February 2018 (UTC)
Two competing categories
Hi all,
I have encountered these two entities:
Category:Bashkir terms borrowed from Arabic
Category:Bashkir terms derived from Arabic
I am surprised to see these two are separate entities rather than one. Also, note that the two lists are different.
Ideally, these two should be merged, and only one should be kept. I need somebody technical to help me with this. Borovi4ok (talk) 09:02, 15 February 2018 (UTC)
- No, "derived terms" categories (for all languages) include all forms of derivation, including inheritance and borrowing, whereas "borrowed terms" are only form terms borrowed directly. Consider e.g. French terms derived from Latin, which includes borrowed terms and a large inherited vocabulary. But the distinct also holds for Bashkir; a word borrowed into Bashkir from, say, English, which borrowed it from French, which inherited it from Latin, which borrowed it from Arabic, is thus a Bashkir word which is ultimately derived from Arabic, but it's not a Bashkir word borrowed from Arabic. (The category boilerplate text could be expended to explain this better, IMO.) - -sche (discuss) 09:46, 15 February 2018 (UTC)
- OK thanx,
- I will sort it out manually then. Borovi4ok (talk) 12:41, 15 February 2018 (UTC)
I'm struggling to think of any English verb which couldn't be conjugated in the present participle as fooing but which also isn't a gerund-style noun at the same time. The Accel gadget is too intricate for me to tinker with it, so can someone please add the gerund forms to it? —Justin (koavf)❤T☮C☺M☯ 17:56, 15 February 2018 (UTC)
Adding wikidata ids to Module:data/languages
I'm proposing the addition of another field to {{Module:languages}}
data which is the Wikidata item for that language. This would supersede the wikipedia_article
property, since these links could easily be generated from the Wikidata item id. – Jberkel 09:54, 16 February 2018 (UTC)
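(A minimal sketch of how a stored item id could replace wikipedia_article, assuming the Language object keeps its raw data table and the Wikibase client functions are available on this wiki; not the actual Module:languages code:)
function Language:getWikidataItem()
    return self._rawData.wikidata_item            -- the item id stored in the data module
end

function Language:getWikipediaArticle()
    local item = self:getWikidataItem()
    if item and mw.wikibase then
        -- use the item's English Wikipedia sitelink, falling back to the canonical name
        return mw.wikibase.getSitelink(item, "enwiki") or self:getCanonicalName()
    end
    return self._rawData.wikipedia_article or self:getCanonicalName()
end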
- I support that idea. It will have to be done with great care, though, as some of our languages may not map as intuitively as you'd expect to Wikidata items, and many won't have items at all. —Μετάknowledgediscuss/deeds 17:35, 16 February 2018 (UTC)
- Can you generate a preliminary list of mappings so that it can be reviewed? DTLHS (talk) 17:46, 16 February 2018 (UTC)
- @DTLHS: raw data: {{Module:User:Jberkel/languages}}, matched table: {{User:Jberkel/languages}}, unmatched table: {{User:Jberkel/languages/unmatched}}, ambiguous: {{User:Jberkel/languages/ambiguous}}. The matched table is just a sample, since lua runtime constraints prevent the full render. Matching is done via ISO 639-X codes. – Jberkel 22:22, 16 February 2018 (UTC)
- I've done a test edit in {{Module:languages/data3/a}}, Special:Diff/49017034. Please take a look, the entries I checked were all correct. If there are no objections I'll run the script on the other data modules. – Jberkel 21:28, 18 February 2018 (UTC)
- Looks good. @DTLHS, in case he wants to take a look. —Μετάknowledgediscuss/deeds 22:19, 18 February 2018 (UTC)
- @Jberkel, Metaknowledge, Erutuon: Looks like you boys broke some stuff. I'm getting "Lua error in Module:languages/data3/p at line 1252: attempt to call global 'wikidata_item' (a nil value)" from {{desc|psu|𑀡𑀻𑀟}}. --Victar (talk) 16:17, 19 February 2018 (UTC)
- @Victar: Sorry, should be fixed now. I've done a couple of manual edits and a typo snuck in. – Jberkel 16:23, 19 February 2018 (UTC)
- Thanks. --Victar (talk) 16:27, 19 February 2018 (UTC)
- @Jberkel Hi, pmh is still broken: can't use the {{der}} or {{inh}} tags. -- माधवपंडित (talk) 17:00, 19 February 2018 (UTC)
- @माधवपंडित: maybe caching? can you post a link to a broken entry? – Jberkel 17:09, 19 February 2018 (UTC)
- @Jberkel: रित्तें and अस्वल show this error but others don't. I'll clear cache and re-check. -- माधवपंडित (talk) 17:14, 19 February 2018 (UTC)
- Odd, the error persists on these entries but entries like हांव are intact. -- माधवपंडित (talk) 17:19, 19 February 2018 (UTC)
- @माधवपंडित: I don't see errors in the entries you linked to. – Jberkel 17:53, 19 February 2018 (UTC)
- @Jberkel: Yeah, it's fine now. -- माधवपंडित (talk) 01:03, 20 February 2018 (UTC)
RTL reconstructions
Is there some way to fix the spacing problem in RTL reconstructions, like Avestan: *𐬛𐬁𐬥𐬀 (*dāna)? I'm wondering if the solution is to have a |recon= param. @Erutuon --Victar (talk) 20:14, 19 February 2018 (UTC)
- @Victar: What is the problem? It looks fine to me. I see an asterisk and an Avestan word, right-to-left, followed by a space and transliteration in parentheses, left-to-right. — Eru·tuon 20:46, 19 February 2018 (UTC)
- @Erutuon: Compare Avestan: *𐬛𐬁𐬥𐬀 (*dāna) to Avestan: 𐬛𐬁𐬥𐬀 (dāna). Note the space. --Victar (talk) 20:59, 19 February 2018 (UTC)
- @Victar: Still not seeing it. Perhaps you could post a screenshot of the issue? — Eru·tuon 21:03, 19 February 2018 (UTC)
- I sometimes have this issue as well, though not here. But at अज़दहा (azadhā), I'm seeing this: . --Per utramque cavernam (talk) 21:18, 19 February 2018 (UTC)
- @Erutuon: https://image.ibb.co/coJfO7/ae_spacing.png --Victar (talk) 21:26, 19 February 2018 (UTC)
- @Victar: Wow. So you are seeing several spaces between the reconstructed Avestan and the opening bracket, while I am seeing just one. I am using Firefox Quantum 59, but I just viewed this page in Chrome 64 and saw this spacing problem. It seems to be related to the unicode-bidi: embed; CSS property that is assigned to Avestan in MediaWiki:Common.css. If I remove that property in the developer tools (right-click on the text and click "Inspect"), the text displays with only one space, but the asterisk is then on the left side. In fact, when I switch between the different unicode-bidi property values, the property values that put the asterisk on the right (where it should be) also have the spacing problem. That's got to be a bug. Something about including an asterisk in right-to-left text is screwing things up. — Eru·tuon 21:54, 19 February 2018 (UTC)
- @Erutuon: Good to know it's not broken cross-platform. What if we filtered out the asterisk from the Avestan text and then add it back with CSS, something like
.Avst::after { content: "*" }
? --Victar (talk) 22:17, 19 February 2018 (UTC)
- @Victar: Interesting idea. I tried it in the developer tools (through JavaScript) and it does work, though I had to modify the selector to
.Avst a::before
to get the asterisk to display inside the link and in the correct position (on the right side). So it amounts to removing the asterisk and then adding it back. Heh. — Eru·tuon 03:06, 20 February 2018 (UTC)- @Erutuon: Hah, well, if it works! I wonder if other RTL reconstructions as afflicted by the same bug. --Victar (talk) 03:10, 20 February 2018 (UTC)
- @Victar: Interesting idea. I tried it in the developer tools (through JavaScript) and it does work, though I had to modify the selector to
- Previous discussion of our bumping into the Lua memory limit, including a rejected phabricator request to raise that memory limit: WT:GP/2017/April § water is broken.
Starting from English etymology 2, light is full of "Lua error: not enough memory" error messages. Can anyone diagnose/fix? Tetromino (talk) 22:21, 19 February 2018 (UTC)
- @Tetromino: Maybe it's because of the wikidata ids that were recently added to the language and language family data modules by @Jberkel. — Eru·tuon 22:55, 19 February 2018 (UTC)
- Weird. I wouldn't expect this to consume that much more memory. Or there's something seriously wrong inside the wikidata extension. – Jberkel 23:11, 19 February 2018 (UTC)
- It's probably just that there is a delicate balance with the memory that the modules use; many pages are right on the edge and any addition can put them over. DTLHS (talk) 23:18, 19 February 2018 (UTC)
- I removed the sitelink lookup and it still fails. If there's something seriously wrong I'd expect a lot more pages to fail. – Jberkel 23:43, 19 February 2018 (UTC)
- The problem is the size of the data module, not whether anything is being done with that data. DTLHS (talk) 23:44, 19 February 2018 (UTC)
- Hm, it's just a few extra bytes per language, but given the number of languages it could add up to something like 200kb, assuming that all data modules get loaded. – Jberkel 23:50, 19 February 2018 (UTC)
- One solution could be to mirror the language data into another module with only the wikidata IDs mapped to our language codes. DTLHS (talk) 23:52, 19 February 2018 (UTC)
- @Jberkel: The total memory is probably more than 200 KB. I don't entirely understand how Scribunto memory works, but this Lua function is an attempt to get a handle on how much memory the new Wikidata items might take up. World of Warcraft wiki says that each table index not in the array part of the table takes up 40 bytes, plus the bytes taken up by the value. And apparently each string uses 24 bytes along with its byte length. So 7447 Wikidata items times 40 bytes = 297,880 bytes; the total of the bytes in each of the strings is 56,306 bytes; then 24 bytes times 7447 strings = 178,728 bytes. Total of all of those, 532,914 bytes. And if any tables had to be expanded to the next larger size (a power of two), that added memory too. So assuming this all is correct, more than 500 KB has been added by the recent edits on any page where all the language data modules are transcluded, even when not considering the memory used by mw.loadData when it wraps the data modules, and by the new getWikidataItem function, and so on. — Eru·tuon 00:05, 20 February 2018 (UTC)
- FYI, there are actually errors on about 30 high-volume pages (CAT:E). —*i̯óh₁n̥C[5] 03:42, 20 February 2018 (UTC)
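(The same estimate as plain arithmetic, with the per-item byte costs taken as assumptions rather than measurements:)
local items       = 7447                 -- languages that gained a wikidata_item
local keyBytes    = items * 40           -- ~40 bytes per new hash-part table key
local stringBytes = 56306                -- combined length of all the item-id strings
local headerBytes = items * 24           -- ~24 bytes of string header per value
print(keyBytes + stringBytes + headerBytes)   -- 532914 bytes, i.e. over 0.5 MB per page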
- Yesterday I added a bunch of words from CAT:E into the pagename blacklist in the source of Template:redlink category, and it did help. But today it looks like a bunch of those pages are back in CAT:E and more. FWIW. —Internoob 04:52, 20 February 2018 (UTC)
- @Jberkel: Given the scope of the memory errors that are being produced and the very limited usefulness of the Wikidata IDs, I would like you to undo your changes to the modules for now. (We can keep the field for the IDs, just not use them.) —Μετάknowledgediscuss/deeds 05:27, 20 February 2018 (UTC)
- So, comment out the Wikidata IDs rather than undoing/entirely removing them? (Seems sensible, whether it's what you're suggesting or not.) We still need to address the pre-existing problems of our entries using Lua for so much, e.g. auto-transliteration and the redlink finder, of course. If more memory is used the more codes there are, memory usage will continue to go up for that reason, too, because we are always adding more codes... - -sche (discuss) 06:26, 20 February 2018 (UTC)
- Unfortunately, our project is well suited to automation, and I feel that this issue will continuously be coming up. As Meta has mentioned, it seems like we should ask for a software solution to this problem from the devs, whether that be increasing the memory limit or having them streamline some of our base processes (though I don't know how that might work). —*i̯óh₁n̥C[5] 06:32, 20 February 2018 (UTC)
- Commenting them out would be perfect, yes. We should get the ball rolling with a Phabricator ticket, perhaps, but I'm not the right person to write one. —Μετάknowledgediscuss/deeds 06:44, 20 February 2018 (UTC)
- Le sigh. Ok, I'll undo my changes. I was initially thinking about splitting the data modules into smaller pieces (data/a/a1, /a/a2 etc.) but there will always be some outlier pages which transclude everything, and more pieces also means more overhead (and inconvenience for editors). Another solution could be to increase the memory limit exceptionally for a few high traffic pages (but how would that be set up?). In any case we need to find a "proper" solution soon. We also need better tools for profiling memory usage. I'll start a ticket on phabricator to get some ideas. – Jberkel 07:55, 20 February 2018 (UTC)
- @Jberkel: To be fair, if every language is going to have canonical name and wikidata code, couldn't you put those in indices [1] and [2] to save a lot of memory in the language modules? —*i̯óh₁n̥C[5] 08:23, 20 February 2018 (UTC)
- Indeed, we're approaching full coverage of the family parameter, so it might make sense to put that in [3] and just assign an "uncategorized" family to those yet to be added. I believe I'm right in thinking that the array memory is much more efficient if used at the declaration time of the table than the hash table, right? —*i̯óh₁n̥C[5] 08:33, 20 February 2018 (UTC)
- @JohnC5: Sorry, I don't follow. I don't see how an extra index would save memory here. As Erutuon has indicated, the storage requirements for strings are around 24+length * number of instances. It's difficult to get below this baseline. The table keys should be handled by Lua's string interning and only count once. I'll verify this though to be sure. – Jberkel 09:28, 20 February 2018 (UTC)
- @Jberkel: I'm saying that instead of putting the values of canonicalName, wikidata_item, and family under those name entries (i.e. in the table's underlying hashtable), put them as entries [1], [2], [3] of the table's underlying array. For instance, convert:
of the table's underlying array. For instance, convert:- ["zaa"] = {
- canonicalName = "Sierra de Juárez Zapotec",
- otherNames = {"Ixtlán Zapotec", "Atepec"},
- scripts = {"Latn"},
- family = "omq-zap",
- wikidata_item = "Q12953989",
- }
- ["zaa"] = {
- to:
- ["zaa"] = {
- "Sierra de Juárez Zapotec",
- "Q12953989",
- "omq-zap",
- otherNames = {"Ixtlán Zapotec", "Atepec"},
- scripts = {"Latn"},
- }
- ["zaa"] = {
- This will mean that the table creation is much more efficient for these mandatory entries as well as the lookups and will save memory in that way. —*i̯óh₁n̥C[5] 09:43, 20 February 2018 (UTC)
- Ah, ok I misread [1] as missing wiki references, not indexes :). Yes, this should save 3 * 40 bytes (string keys) - 32 bytes (3 int keys) = 88 bytes per entry? I can't believe it's 2018 and we're discussing byte-level optimisations :) – Jberkel 10:08, 20 February 2018 (UTC)
{"Latn"}
, which is used more than 3000 times (see the "script combinations" table in User:Erutuon/language stuff). That is, definelocal Latn = {"Latn"}
at the top and use that in each applicable data table on the page.mw.loadData
is clever enough to cache only one copy of the table then. That would in theory save 40 + 16 bytes for every "Latn" script table after the first, plus about 24 + 4 bytes for the string (84 bytes?). I tried it in one module, but didn't notice any effect. I suppose it would save even more, at least in the data module, to use a string,{ --[[...]] scripts = "Latn", --[[...]] }
, instead of an array, but the functions relying on the scripts item would have to be modified. — Eru·tuon 10:47, 20 February 2018 (UTC)- @Erutuon: Even better, assume script Latn as default avoiding to define it to each language. --Vriullop (talk) 18:49, 20 February 2018 (UTC)
- That seems mostly sensible. Latin is by far the most used script, especially for the obscurer lects where no script is yet specified. However, I'd like if we could make a list of languages which don't currently have a script set, before we make Latn the default, so we know which languages we need to check the script of. (Or, add an "undetermined" script code to those languages, which can be converted to specific script codes at leisure.) - -sche (discuss) 19:31, 20 February 2018 (UTC)
- @-sche: If you look at the "script combinations" table in User:Erutuon/language stuff, languages with no script are in the
None
row; there are 3718 of them at the moment. If you sort by the "languages" column, you will see they are the largest group, larger thanLatn
. — Eru·tuon 20:23, 20 February 2018 (UTC)- Good point; I meant that Latin is the most used script in the world [by number of languages using it], but in our modules, there are still a lot of gaps. But I've filled in a bunch of those gaps; Latin is used by more than half of all the lects we have codes for. But [t occurs to me to back up and ask] would it actually save us any memory to treat Latn as the default, or would the same amount of memory still be used just by the check that would be performed to see whether or not a script was set for a particular language? - -sche (discuss) 06:20, 21 February 2018 (UTC)
- @-sche: Right, I was mainly just pointing to the page. It would save some memory to leave out
{"Latn"}
in the tables. I guess at least 96 bytes is used per instance (based on the World of Warcraft wiki explanation, ignoring Scribunto-specific stuff), if a localLatn
variable is not being shared between the tables, which would come to a few hundred kilobytes if all the data modules are being transcluded. By contrast, it's cheap to check for the presence of the "scripts" item in a language's data table: you just check whetherdata_table.scripts
isnil
. I wonder if there are languages that need to have their script specified asNone
? I guess I can't see why. — Eru·tuon 07:40, 21 February 2018 (UTC) - @-sche: I created a list of languages without scripts at User:Erutuon/languages with no scripts, as I realized that's what you were actually asking for. 21:12, 21 February 2018 (UTC)
- Thanks! Right now, while I'm just adding scripts to the modules, the list doesn't offer much advantage over just noticing which languages have no script set (unless it's of help to someone fulfilling the idea I suggested a few threads down for adding missing scriptS), my point is that it would be necessary (or at least, helpful) to save or subst: a copy prior to any switch to not declaring Latn at all and assuming that languages with no script specified can be assumed to be written in Latn (a fine assumption, but one we'll want to fix the edge cases of). (I've saved a copy now.) - -sche (discuss) 15:04, 22 February 2018 (UTC)
- From the perspective of the module, I guess there's probably no advantage to specifying "None" over assuming "Latn". But from the perspective of people trying to go through and ensure that languages with identifiable scripts have those scripts specified (most are Latin, but in a few cases the script has been Deva, or Ethi, or Thai), if we switch to specifying no script when the script is Latn, it would be good to know which languages have no script specified because the script is known to be Latn, vs which have no script specified because the script is not known. Perhaps this could be accomplished by first adding a commented-out
scripts = {"None"}
orscript unknown
to languages with no script specified, so the module doesn't have to spend any time processing that "script", but humans can still see while editing the module which languages we still need to track down script info for. - -sche (discuss) 16:06, 21 February 2018 (UTC)- That's a good general principle, especially for a wiki that requires elapsed-time-consuming research. We need more allowance for work in process at a highly granular level. I don't really need to get a red Lua message for typing "g=f?, m". I need to have an acceptable entry to which I can come back when I have more information or an working on that class of problem. DCDuring (talk) 17:02, 21 February 2018 (UTC)
- @-sche: Right, I was mainly just pointing to the page. It would save some memory to leave out
- Good point; I meant that Latin is the most used script in the world [by number of languages using it], but in our modules, there are still a lot of gaps. But I've filled in a bunch of those gaps; Latin is used by more than half of all the lects we have codes for. But [t occurs to me to back up and ask] would it actually save us any memory to treat Latn as the default, or would the same amount of memory still be used just by the check that would be performed to see whether or not a script was set for a particular language? - -sche (discuss) 06:20, 21 February 2018 (UTC)
- @-sche: If you look at the "script combinations" table in User:Erutuon/language stuff, languages with no script are in the
- That seems mostly sensible. Latin is by far the most used script, especially for the obscurer lects where no script is yet specified. However, I'd like if we could make a list of languages which don't currently have a script set, before we make Latn the default, so we know which languages we need to check the script of. (Or, add an "undetermined" script code to those languages, which can be converted to specific script codes at leisure.) - -sche (discuss) 19:31, 20 February 2018 (UTC)
- @Erutuon: Even better, assume script Latn as default avoiding to define it to each language. --Vriullop (talk) 18:49, 20 February 2018 (UTC)
- Ah, ok I misread
- @Jberkel: I'm saying that instead, of putting the values of
- @JohnC5: Sorry, I don't follow. I don't see how an extra index would save memory here. As Erutuon has indicated, the storage requirements for strings are around 24+length * number of instances. It's difficult to get below this baseline. The table keys should be handled by Lua's string interning and only count once. I'll verify this though to be sure. – Jberkel 09:28, 20 February 2018 (UTC)
- Le sigh. Ok, I'll undo my changes. I was initially thinking about splitting the data modules into smaller pieces (data/a/a1, /a/a2 etc.) but there will always be some outlier pages which transclude everything, and more pieces also means more overhead (and inconvenience for editors). Another solution could be to increase the memory limit exceptionally for a few high traffic pages (but how would that be set up?). In any case we need to find a "proper" solution soon. We also need better tools for profiling memory usage. I'll start a ticket on phabricator to get some ideas. – Jberkel 07:55, 20 February 2018 (UTC)
- Commenting them out would be perfect, yes. We should get the ball rolling with a Phabricator ticket, perhaps, but I'm not the right person to write one. —Μετάknowledgediscuss/deeds 06:44, 20 February 2018 (UTC)
- Unfortunately, our project is well suited to automation, and I feel that this issue will continuously be coming up. As Meta has mentioned, it seems like we should ask for the a software solution to this problem from the devs, whether that be increasing the memory limit or having them streamline some of our base processes (though I don't know how that might work). —*i̯óh₁n̥C[5] 06:32, 20 February 2018 (UTC)
- So, comment out the Wikidata IDs rather than undoing/entirely removing them? (Seems sensible, whether it's what you're suggesting or not.) We still need to address the pre-existing problems of our entries using Lua for so much, e.g. auto-transliteration and the redlink finder, of course. If more memory is used the more codes there are, memory usage will continue to go up for that reason, too, because we are always adding more codes... - -sche (discuss) 06:26, 20 February 2018 (UTC)
- @Erutuon: Well, mine will require a script change as well. The transition for mine would also be fairly easy: change the accessors to check the positional params as well as the hashtable during the transition period, then remove the check in the hashtable after the transition is over. Could you possibly get some statistics on your page concerning how many languages don't have family params? Thanks! —*i̯óh₁n̥C[5] 11:01, 20 February 2018 (UTC)
- @JohnC5: you can see it at
{{Module:sandbox}}
: 8031 - 5778 = 2253. I'll change the data modules to use indexes plus inline the Latn scripts. – Jberkel 12:30, 20 February 2018 (UTC) - @JohnC5: I've added a table of the total number of languages and the number that has each data item (with notes on what the numerical indices represent). — Eru·tuon 20:41, 20 February 2018 (UTC)
- @JohnC5: you can see it at
- @Erutuon: Well, mine will require a script change as well. The transition for mine would also be fairly easy: change the accessors to check the positional params as well as the hashtable during the transition period, then remove the check in the hashtable after the transition is over. Could you possibly get some statistics on your page concerning how many languages don't have family params? Thanks! —*i̯óh₁n̥C[5] 11:01, 20 February 2018 (UTC)
- I am wondering if, at some point in the near future, we can all agree that the concept and execution of the languages module is just not going to work and try and come up with some novel solutions. The current process of making a change, breaking a bunch of things, then trying to scale back changes until nothing quite breaks is not what I would call an optimal design paradigm. If we want to persist in using the current solution, I then propose that we mandate any changes made will demonstrably *not* break content which is currently unbroken. - TheDaveRoss 18:26, 20 February 2018 (UTC)
- You're looking at novel solutions right above your comment. There's no optimal design paradigm possible when we don't have control over how it all works, and we don't even fully understand how memory is allocated. (And that's also why your mandate would not be feasible, because it's hard to demonstrate without trying it first.) —Μετάknowledgediscuss/deeds 18:36, 20 February 2018 (UTC)
- Tweaks to the existing design do not qualify in my book, moving from a large flat-file format which needs to be read in its entirety during every invocation to almost anything else would be a marked improvement. Perhaps restructuring the module so that it can read a small page specific to the language code rather than reading a large module with all language data. Perhaps figuring out how to migrate to Wikidata and leveraging an actual structured database. Perhaps something else entirely. - TheDaveRoss 21:20, 20 February 2018 (UTC)
- @TheDaveRoss: Yes, we could (and should) make better use of Wikidata. That's why I wanted to incorporate ids in our database. Things like language script data already exist in Wikidata. So in the long term our (reusable) data should be stored there, not in big Lua chunks. – Jberkel 02:07, 21 February 2018 (UTC)
- Note that arbitrary access to Wikidata is an expensive function. See mw:Extension:Wikibase Client/Lua function mw.wikibase.getEntity. I'm afraid it will not be an alternative for intensive uses. --Vriullop (talk) 10:47, 21 February 2018 (UTC)
- @Vriullop: that's the case for this particular function call, since it loads all the data. However, it's also possible to query only the fields needed, which is much cheaper. – Jberkel 10:51, 21 February 2018 (UTC)
- Actually, perhaps that would be a great first step. If every language had its own data module it would reduce the amount read tremendously. Why does the data need to be in such large chunks? It would be easier to maintain if it were in discrete pages as well. A bot could probably generate all of the submodules in minutes, without a disruption in the existing structure. Then we would only have to update the data module lookup function and the rest should remain functional as is. - TheDaveRoss 21:46, 20 February 2018 (UTC)
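A rough sketch of what that per-code lookup could look like (the submodule layout is hypothetical; nothing like this exists yet):
local function get_language_data(code)
    -- hypothetical layout: Module:languages/data/en, Module:languages/data/de, ...
    local ok, data = pcall(mw.loadData, "Module:languages/data/" .. code)
    if ok then
        return data
    end
    return nil   -- no such submodule
end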
- I suspect that having to load thousands of individual modules would not be a performance improvement over having to load a single module (or 26 modules as we do now). DTLHS (talk) 22:04, 20 February 2018 (UTC)
- @TheDaveRoss: I'm not sure what you mean by "read in its entirety"; the first time mw.loadData is called on a data module, it creates a cached copy that is then used by later calls to mw.loadData. So a given data module is read only once on a page, provided it is always loaded with mw.loadData and not with require. I am curious what the memory difference would be if the data modules were split up.
- There is a certain amount of overhead for each data module loaded with mw.loadData. If I'm reading the source code right, the data-wrapping function creates one table (seen) every time mw.loadData is called to map between the actual (cached) tables and the empty tables that are returned, and for each table in the data module it creates 2 tables (an empty table and the empty table's metatable) and 6 functions. Four of these functions, __index, __newindex, __pairs, __ipairs, are placed in the metatable of the virtual table and two (pairsfunc, ipairsfunc) are returned when pairs and ipairs are called on the empty table returned by mw.loadData. (Whew, it actually re-wraps the data every time the function is called, so these tables and functions are duplicated for every invocation! That's got to be a major contributor to our memory problems, because we load data modules so many times.)
- Okay, so I guess the only item that would be duplicated if the data modules are split is the seen table. [Edit:] Actually, only the top level of a data module is wrapped. Subtables are wrapped only if they are visited by indexing. (For instance, mw.loadData("Module:languages/data2")["en"]["scripts"] wraps the top-level table, the English data table, and the English scripts table.) So if you iterate through a loaded data module that contains subtables, each of the subtables will be wrapped, and memory usage will be greater than if you load it without doing anything else. — Eru·tuon 22:29, 20 February 2018 (UTC)
- @Erutuon: Re performance, the reality is that we are up against an artificial performance problem: Wikimedia decided that 50 MB of Lua memory usage would be the limit, whether or not some other amount would be usable without compromising actual performance (e.g. page load time, server cost). The solution, until we start hitting other performance issues, can be as simple as minimizing the use of Lua memory in favor of resources which are less restricted (processor time). Splitting the data module into a per-code format would, I completely agree, increase the overhead in terms of function calls, but since most pages contain very few languages, I suspect that on average it would reduce overall server resource consumption. Since it is very hard for us to profile the things we do on wiki, we will be mostly stuck guessing about these types of things. (edit) Also, since not every invocation returns the same table in the current format, I am curious how MW decides to optimize. - TheDaveRoss 13:13, 22 February 2018 (UTC)
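A greatly simplified sketch of the read-only wrapping Erutuon describes above, for readers trying to follow along; the real Scribunto implementation differs in detail, and the names below are illustrative only:
local function wrap(data, seen)
    seen = seen or {}                         -- one bookkeeping table per load
    if seen[data] then return seen[data] end
    local proxy = {}                          -- empty table handed to the caller
    seen[data] = proxy
    setmetatable(proxy, {
        __index = function(_, key)
            local value = data[key]
            if type(value) == "table" then
                return wrap(value, seen)      -- subtables are wrapped lazily, on access
            end
            return value
        end,
        __newindex = function()
            error("tables from mw.loadData are read-only")
        end,
    })
    return proxy
end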
- Re "most pages contain very few languages": English lemmas with translations tables contain lots of languages, and the number of those is only going to increase as we become more and more complete. They are already the entries we're having trouble with. - -sche (discuss) 20:25, 22 February 2018 (UTC)
- @-sche: True. However currently every page with any invocations needs to read a large data file into memory, even if it only needs one language. There will be a tipping point somewhere when the average page needs to read a sufficiently large portion of the current module, but we are VERY far from that. - TheDaveRoss 21:03, 22 February 2018 (UTC)
- @TheDaveRoss: Actually, I've changed my mind; splitting up the language data modules is worth a try. It makes sense, because a given module typically uses only one or two language data tables. However, as there are 8031 language codes and there would be that many modules, it would probably be best to keep the current large modules for human editing and create a bot that would maintain the small modules. They would need to be protected and Module:documentation could display a message like "This module is generated from module x by a bot. Please edit module x instead of this one." (Heh, this would make the list of transclusions incredibly long. I wonder how many language codes are used on the pages with the most translations.) — Eru·tuon 21:34, 22 February 2018 (UTC)
- I assume you're volunteering to write and maintain said bot. DTLHS (talk) 21:37, 22 February 2018 (UTC)
- But would this actually help (m)any entries? We aren't having problems on entries that use only one language code, e.g. Evenki entries that never need to invoke any other language code besides Evenki, so we don't need to "fix" all those pages. We might see improvements on the few very large pages that are breaking now, but we'd be letting that tail wag the dog, in a way that would require much more upkeep (8000+ separate modules, possibly maintaining a bot to handle them,...). Our most complete pages, that transclude thousands of language codes, might still break. - -sche (discuss) 22:30, 22 February 2018 (UTC)
- @DTLHS: The first step is determining if it's worth it. If so, I might consider learning bot-writing just for this purpose.
- @-sche: I don't know. Maybe loading one of several large modules many times is more costly than loading many small modules with the same data, or maybe not. There is probably a way to test this without creating 8000-plus modules. — Eru·tuon 23:13, 22 February 2018 (UTC)
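One low-effort way to get a rough comparison without creating 8000-plus modules would be to test offline, in a plain Lua 5.1 interpreter, against local copies of the data modules. A sketch under that assumption (the numbers will not match Scribunto's accounting exactly):
collectgarbage()
local before = collectgarbage("count")        -- Lua memory in KB
local big = dofile("data2.lua")               -- local copy of one large data module
collectgarbage()
local after = collectgarbage("count")
print(string.format("large module: ~%.0f KB", after - before))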
- I was thinking of replacing the languages/dataX modules with something like languages/data/en and keeping the languages module exactly as it is. Once the module has been split into languages (perhaps by bot) it seems like it would be easier for humans to maintain the smaller, specific data files. They are easy to find (since they are just at their ISO code subpage) and they will be very small and simple. - TheDaveRoss 13:47, 23 February 2018 (UTC)
- The current system has the advantage that it's easier to quickly add data to a lot of languages, e.g. by paging between Wikipedia, Ethnologue and one large lettered data module at a time, I've added script data to almost a thousand languages. It's also easier to watchlist and monitor changes to a few data modules. If we split it up, it'd seem like a step backwards, to when we had templates. We would seem to need to protect not only all existing subpages (/en, /fvr, /aav-ban-pro, etc), but all nonexistent subpages of valid form (/xx, /xxx, /xxx-xxx, /xxx-xxx-xxx) against being created by vandals, since such pages would then AFAICT be accepted by the modules without complaint. And it doesn't seem like it would help that many pages. I'm not totally opposed to it, it just seems like it has a lot of drawbacks and not such great benefits. - -sche (discuss) 15:07, 23 February 2018 (UTC)
- If we didn't care that the language data modules were human readable, how much could we reduce the size? I'm thinking of something like a minifier that periodically "compiles" the human readable modules (what we have now) into something smaller. DTLHS (talk) 18:41, 20 February 2018 (UTC)
- @DTLHS: One idea: concatenate all data into a string and provide another string with numerical data (printed in some non-decimal system) to indicate how to read the data. But I don't know exactly how to implement that or if it would really use less memory. — Eru·tuon 21:12, 20 February 2018 (UTC)
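A toy sketch of that idea, just to make it concrete; the record format and separators here are invented for illustration:
-- one long string, records separated by \2, fields by \1
local packed = "\2en\1English\1Latn\2de\1German\1Latn\2be\1Belarusian\1Cyrl\2"

local function lookup(code)
    local s = packed:find("\2" .. code .. "\1", 1, true)   -- plain find, no patterns
    if not s then return nil end
    local e = packed:find("\2", s + 1, true)
    local record = packed:sub(s + #code + 2, e - 1)
    local name, script = record:match("([^\1]+)\1([^\1]+)")
    return { canonicalName = name, scripts = { script } }
end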
- Language modules seem to be used more intensively in the translation tables, but translation templates only need to know the script (and transliteration?), and probably other templates only need the script as well. Smaller modules with script data could be a good approach. --Vriullop (talk) 10:38, 21 February 2018 (UTC)
- @Vriullop: Ideally we would still store all the data in one place and have a mechanism to selectively load only the fields needed, sort of like a specialized view of the data. – Jberkel 10:44, 21 February 2018 (UTC)
What is the intended format for cases where a lect does not, at the time it is added to the data module, have a Wikidata ID? (This could easily be the case for some of the more obscure lects we add exceptional codes for.) A blank "",? Use the old format where the canonical name and family are named parameters/fields? - -sche (discuss) 19:28, 20 February 2018 (UTC)
- @-sche: Plain nil. I've just bulk-changed the data modules, the memory errors are gone now. – Jberkel 19:42, 20 February 2018 (UTC)
- @Jberkel Could you publish the script that converts to the new format? On my wiki, many language names are already translated and it is too tiring to convert manually. --Octahedron80 (talk) 20:08, 20 February 2018 (UTC)
- @Octahedron80: sure, it's for Python3 + pywikibot: gitlab.com/snippets/1699967 – Jberkel 01:52, 21 February 2018 (UTC)
- Category:Rebracketings by language and many other categories display "Lua error in Module:languages/by_name at line 5: table index is nil" now. - -sche (discuss) 21:55, 20 February 2018 (UTC)
- Now gone, although it had persisted even after null edits earlier. - -sche (discuss) 22:53, 20 February 2018 (UTC)
- Would it help to deploy the "local Latn" 'hack' that Module:languages/data3/a uses to the other submodules? - -sche (discuss) 22:53, 20 February 2018 (UTC)
- It probably wouldn't hurt. — Eru·tuon 00:02, 21 February 2018 (UTC)
- Ok people, after cleaning up some unrelated errors and debugging, the only page remaining with an error is do. Anyone got any ideas? —*i̯óh₁n̥C[5] 08:14, 21 February 2018 (UTC)
- @JohnC5: No, but I'll check if the mw.wikibase.sitelink calls can benefit from caching, not sure how much memory is allocated there. – Jberkel 11:41, 21 February 2018 (UTC)
- Just a random idea: I noticed that Lua has weak tables which could be used to hold the language data. If more memory is needed some of it can be garbage collected (and later reloaded if necessary). The problem at the moment is that all language modules are loaded and never reclaimed. – Jberkel 16:27, 21 February 2018 (UTC)
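In plain Lua, that would look roughly like this (load_language_data is a placeholder loader; see also the caveat about mw.loadData in the reply below):
local cache = setmetatable({}, { __mode = "v" })   -- weak values: entries may be collected

local function get(code)
    local data = cache[code]
    if data == nil then
        data = load_language_data(code)            -- hypothetical loader
        cache[code] = data
    end
    return data
end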
- @Jberkel: Unfortunately, data modules that will be loaded with mw.loadData can't be weak, because you can't add metatables to them, and I don't know if the weakness of tables actually even affects Scribunto memory usage. — Eru·tuon 20:20, 22 February 2018 (UTC)
- @JohnC5: It might reduce memory to put scripts and other_names in indices 4 and 5. Those are the next most frequent items, in that order. However, going from 4 to 5 array items may enlarge the size of the array part of the table from 4 to 8; if so, leaving other_names in the hash part would be best. — Eru·tuon 22:01, 22 February 2018 (UTC)
- @Erutuon, Jberkel: So last night, while doing some other work, I found what I think is a more efficient and user-friendly way of doing this. I've created Module:languages/global which contains the names of all the fields in the language data ordered by frequency, all the standard diacritics, and the common scripts. We load this into all the language modules and use it as the one source of truth. So what is now:
m["be"] = {
- "Belarusian",
- "Q9091",
- "zle",
- otherNames = {"Belorussian", "Belarusan", "Bielorussian", "Byelorussian", "Belarussian", "White Russian"},
- scripts = Cyrl,
- ancestors = {"orv"},
- translit_module = "be-translit",
- sort_key = {
- from = {"Ё", "ё"},
- to = {"Е" , "е"}},
- entry_name = {
- from = {"Ѐ", "ѐ", GRAVE, ACUTE},
- to = {"Е", "е"}},
}
- Becomes:
local g = mw.loadData("Module:languages/global")
…
m["be"] = {
    [g.canonical_name] = "Belarusian",
    [g.wikidata_item] = "Q9091",
    [g.family] = "zle",
    [g.other_names] = {"Belorussian", "Belarusan", "Bielorussian", "Byelorussian", "Belarussian", "White Russian"},
    [g.scripts] = Cyrl,
    [g.ancestors] = {"orv"},
    [g.translit_module] = "be-translit",
    [g.sort_key] = {
        from = {"Ё", "ё"},
        to = {"Е" , "е"}},
    [g.entry_name] = {
        from = {"Ѐ", "ѐ", g.CHARS.GRAVE, g.CHARS.ACUTE},
        to = {"Е", "е"}},
}
- Under the hood this would come to be stored as (with the current project-wide frequencies):
m["be"] = {
- [1] = "Belarusian",
- [2] = "Q9091",
- [3] = "zle",
- [5] = {"Belorussian", "Belarusan", "Bielorussian", "Byelorussian", "Belarussian", "White Russian"},
- [4] = Cyrl,
- [6] = {"orv"},
- [8] = "be-translit",
- [10] = {
- from = {"Ё", "ё"},
- to = {"Е" , "е"}},
- [9] = {
- from = {"Ѐ", "ѐ", g.CHARS.GRAVE, g.CHARS.ACUTE},
- to = {"Е", "е"}},
}
- It would turn out that for this case, fields 1–6 will go into the array whereas 8–10 will go into the hashtable because [7] is omitted. However, we never iterate over these tables, and so the simplest tables will only have a few bytes' worth of storage overhead. Then when you want to get something out, you do something like:
local g = mw.loadData("Module:languages/global")
…
local language_name = self.__data[g.canonical_name]
- There will be a bit more lookup overhead, but it will always be O(1). This system also means that if one field becomes more common than another, all we need to do is change the order in Module:languages/global to rebalance the entire project. What do you think? —*i̯óh₁n̥C[5] 22:52, 22 February 2018 (UTC)
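For concreteness, Module:languages/global under this proposal might contain something like the following; the field list, ordering and names are purely illustrative:
return {
    canonical_name  = 1,
    wikidata_item   = 2,
    family          = 3,
    scripts         = 4,
    other_names     = 5,
    ancestors       = 6,
    CHARS = {
        GRAVE = mw.ustring.char(0x300),   -- combining grave accent
        ACUTE = mw.ustring.char(0x301),   -- combining acute accent
    },
}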
- Also, @-sche, DTLHS. —*i̯óh₁n̥C[5] 23:45, 22 February 2018 (UTC)
- @JohnC5: Lua Performance Tips mentions "If you write something like {[1] = true, [2] = true, [3] = true}, however, Lua is not smart enough to detect that the given expressions (literal numbers, in this case) describe array indices, so it creates a table with four slots in its hash part, wasting memory and CPU time." I'll have a look at the implementation, it's still not clear to me how it decides between array/hash parts. – Jberkel 08:21, 23 February 2018 (UTC)
- @Jberkel: I'm not sure why it says that since it's definitely not true. If you look at Module:User:JohnC5/Sandbox3, you can see that the first 3 elements which are inserted in the table under indices 1, 2, and 3 get printed out by ipairs, which only prints from the array. The object at index 5 gets put in the hashtable because it is non-consecutive. Note also that the order in which the indices are entered is not relevant, as the compiler will still recognize that 2, 1, 3 is actually 1 to 3 consecutively. Perhaps those tips come from before Lua 5.1, when they souped up the constructor for the tables? Does this make sense? —*i̯óh₁n̥C[5] 08:40, 23 February 2018 (UTC)
- @Jberkel: Looking more carefully now that I've made some changes to my test module, the behavior is weirdly more robust than I expected. All the tests I know of for checking the size of the array (#a, ipairs(a), and table.getn(a)) point to my being correct, but I'm startled by these results. —*i̯óh₁n̥C[5] 08:58, 23 February 2018 (UTC)
- @Jberkel: I take it back. After some fiddling around with memory stuff, these functions are just clever, but they are not being put in the array. Lemme think on this for a bit. —*i̯óh₁n̥C[5] 09:18, 23 February 2018 (UTC)
- @Jberkel: Damn, it won't work. I tried a bunch of things, but we'd just have to hard code them in order. Damn Lua for being the worst. —*i̯óh₁n̥C[5] 10:50, 23 February 2018 (UTC)
- @JohnC5: It seems that the length operator first looks at the array part, then looks in the hash part. In the latter case, it finds the largest power of 2 i such that t[i] isn't nil, then does the search for an i less than that where t[i + 1] is nil and t[i] isn't. (So it returns the wrong result if a power of two is empty: x = { [1] = true, [3] = true, [4] = true, [5] = true } assert(#x == 1).) table.getn does some other stuff that I don't understand, but if that fails, it calls the # operator. 21:48, 23 February 2018 (UTC)
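A small self-contained illustration of the part of this that is well defined: ipairs stops at the first gap and pairs still reaches every entry, while (as discussed above) the value of # on a table with gaps depends on internals.
local t = { [1] = "a", [2] = "b", [3] = "c", [5] = "e" }

for i, v in ipairs(t) do
    print(i, v)          -- prints 1 a, 2 b, 3 c, then stops at the gap
end

local count = 0
for _ in pairs(t) do count = count + 1 end
print(count)             -- 4: every entry is still reachable via pairs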
@JohnC5, Jberkel: A way to use numerical indices would be to preprocess the data before outputting it: replacing string keys with numbers. "scripts" could be replaced with 4, "otherNames" with 5, and so on. Because the modules are loaded into memory once on a page, this processing would also be done only once. Unfortunately, it would confuse people that the exported table didn't match the table in the module (as would the previous idea). — Eru·tuon 21:55, 14 March 2018 (UTC)
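A sketch of that preprocessing step, assuming an illustrative name-to-index mapping rather than any agreed-upon scheme:
local key_index = { canonicalName = 1, wikidata_item = 2, family = 3, scripts = 4, otherNames = 5 }

local function compress(languages)
    local out = {}
    for code, entry in pairs(languages) do
        local packed = {}
        for key, value in pairs(entry) do
            packed[key_index[key] or key] = value   -- keys without an index keep their names
        end
        out[code] = packed
    end
    return out
end
-- a maintenance job (or the exporting module itself) could then run compress() over the existing data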
- @Erutuon: Yes, I think this will be a maintenance nightmare. My call for profiling help on phabricator didn't go anywhere unfortunately. And setting up an instance to do profiling locally seems to be a lot of work. Ideally there would be a sandbox instance with extra debugging and profiling enabled. – Jberkel 23:45, 14 March 2018 (UTC)
Translation template error
editAfter using "Edit source" in translation section (trans-top template) and returning back from the editor by "Publish changes", all translation sections miss the [show ⏷] button on the right side to unfold them (and also the ± sign to edit the header). It is necessary to refresh the page afterwards to get back to normal operation of the template. With thanks and regards, Peter 10:17, 20 February 2018 (UTC)
- @Peter K. Livingston: I've experienced that as well, when using the AjaxEdit script. The "show" buttons and "±" sign are powered by JavaScript scripts, and I guess the scripts don't reload when "Publish changes" is pressed. Unfortunately I don't know how to fix this. — Eru·tuon 21:13, 20 February 2018 (UTC)
- @Peter K. Livingston: What should we do to find somebody who is qualified to solve this issue? Peter 22:17, 20 February 2018 (UTC)
- @Peter K. Livingston: @Dixtosa might be able to help. He knows JavaScript better than I do. — Eru·tuon 21:53, 23 February 2018 (UTC)
- I am afraid we can't do anything about it that is not a hack. Loading Common.js manually solves show/hide problems. --Dixtosa (talk) 08:21, 24 February 2018 (UTC)
Red links not turning blue
editI have had this problem for a couple of days where red links don't turn blue straight away when an entry is done, say for an inflection. I'm not sure whether it's just happening to me, or whether anyone else has noticed it. It can be rectified by doing a null edit, but this shouldn't be necessary. DonnanZ (talk) 19:25, 20 February 2018 (UTC)
- I've noticed it as well; when I created buck-hoist as an alt form of buck hoist before I created buck hoist itself, the link from buck-hoist to buck hoist stayed red until I did a null edit. It might have something to do with all the changes to the language modules filling up the job queue(?). - -sche (discuss) 19:33, 20 February 2018 (UTC)
- Is 10K a big number for jobs? DCDuring (talk) 21:04, 20 February 2018 (UTC)
- The situation has improved somewhat, but still a few seconds slow on occasion. DonnanZ (talk) 21:14, 20 February 2018 (UTC)
"Lemma" categories and non-morphemic sinograms
edit@Wyang, Justinrleung, Suzukaze-c I just noticed page 鳺/𱉎. This Chinese character is obviously not a lemma (at least in Chinese). But this page is currently categorized into Translingual lemmas, Middle Chinese lemmas, Old Chinese lemmas, Chinese lemmas, and Mandarin lemmas. What should we do? Dokurrat (talk) 20:15, 20 February 2018 (UTC)
- The lemma – non-lemma distinction is useless for Chinese, since there is no non-lemma form in Chinese by default. I think we should leave it as it is, since the "lemma" categories effectively function as a catch-all place for the words that one would find in a traditional dictionary, which is what 鳺 would belong to. Wyang (talk) 23:06, 20 February 2018 (UTC)
Arabic etymology
editIs there any way I can pull out all the Arabic word entries in Wiktionary that contain etymological info, please? — This unsigned comment was added by Rdurkan (talk • contribs).
- The best I can suggest is ploughing your way through Category:Arabic lemmas, which is not terribly helpful. DonnanZ (talk) 10:34, 21 February 2018 (UTC)
- @Rdurkan You can start with this list and extract the relevant sections from their contents using some kind of script, such as the get.py from Pywikibot, or you can write your own script to extract from links which use &action=raw as a parameter to index.php (e.g. this example). Wyang (talk) 13:21, 21 February 2018 (UTC)
On MediaWiki_talk:Recentchangestext, there is a request to add a link to the Urdu version, but (a) the link is not of the same format as all the rest of the links (which use "foo:Special:Recentchanges" and rely on the site software to redirect to the local name of the page), and (b) I would imagine every wiki has a Recentchanges page, right? so I wonder if there are some criteria for deciding which languages to add interwiki links to, and whether Urdu meets those criteria. - -sche (discuss) 05:37, 21 February 2018 (UTC)
- Huh? I see no difference in the format of the link. As for your (b), I don't think we have any criteria, but it would be sensible to choose a cutoff of article count, and limit it to those wikis. —Μετάknowledgediscuss/deeds 05:42, 21 February 2018 (UTC)
- The request is to add [[ur:خاص:حالیہ تبدیلیاں]] (the Urdu-language name of the page), whereas the link to e.g. Arabic is not to [[خاص:أحدث_التغييرات]] but rather to [[ar:special:recentchanges|ar]] which then resolves to [[خاص:أحدث_التغييرات]]. - -sche (discuss) 05:51, 21 February 2018 (UTC)
- Oh, I see. They both end up at the same place, but I guess you're right that we should standardise with the easier one. —Μετάknowledgediscuss/deeds 06:06, 21 February 2018 (UTC)
- If we make the cutoff 10,000+ articles (since we already link to Arabic and Simple, and since that is the cutoff for the Main Page's sidebar links), we need to add quite a few more. I'll do that now, I suppose. I wonder if this is the kind of thing Wikidata wants to handle, the way they handle interwikis between different wikis' editions of Category:English nouns etc. - -sche (discuss) 15:11, 22 February 2018 (UTC)
- Done: I've updated the list to provide the same languages, using the same cutoff, as the Main Page. - -sche (discuss) 18:39, 22 February 2018 (UTC)
Moving ordinal numerals from category:Azerbaijani adjectives
editCould someone with a bot please help me move all Azerbaijani ordinal numerals from the adjective category over to numerals? That is, renaming ===Adjective=== and changing {{head|az|adjective}}. Thank you. Allahverdi Verdizade (talk) 12:58, 21 February 2018 (UTC)
- @Allahverdi Verdizade Done on 100 pages in that category which contain {{ordinalbox}} in their content; see the changes. Wyang (talk) 13:41, 21 February 2018 (UTC)
- @Wyang Thanks a million! Allahverdi Verdizade (talk) 15:19, 21 February 2018 (UTC)
Languages with entries but no script specified
editIf anyone feels up to the task, it would be helpful if someone found every language which has no script specified in Module:languages, but which has entries (or even: which has translations in water), identify which scripts those entries/translations are in, and mass-add the scripts to Module:languages. - -sche (discuss) 23:34, 21 February 2018 (UTC)
- (For the current list of these languages, see User:Erutuon/languages with no scripts.) — Eru·tuon 06:10, 22 February 2018 (UTC)
- It would probably even be useful to simply add to the modules the scripts that all the languages we have entries for are de facto written in (meaning, the scripts our entries are in), not just ones that don't already have scripts specified. - -sche (discuss) 15:07, 22 February 2018 (UTC)
Parameter request for {{homophones}}
edit
Can someone add a "qN=" functionality to this template? Homophones are so often rooted in regional pronunciations, and I've seen some pretty bad workarounds and incomplete accent tagging due to the absence of this function. Or it could be "aN=" in keeping with {{a}}
, or it could be like {{alter}}
, but personally I find that template confusing. Ultimateria (talk) 12:04, 22 February 2018 (UTC)
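For what it's worth, picking up such parameters in the module behind the template would be straightforward; a sketch with the proposed (not implemented) parameter names:
local function format_homophones(args)
    -- args[1], args[2], ... are the homophones; q1=, q2=, ... would be their qualifiers
    local parts = {}
    for i, term in ipairs(args) do
        local item = term
        local qualifier = args["q" .. i]
        if qualifier then
            item = item .. " (" .. qualifier .. ")"
        end
        table.insert(parts, item)
    end
    return "Homophones: " .. table.concat(parts, ", ")
end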
@Rua, DTLHS, Erutuon, any chance we could update the usage of Module:form of from {{comparative of|good|lang=en}} to {{comparative of|en|good}}? --Victar (talk) 04:09, 23 February 2018 (UTC)
- Just that one template, or all of them? Right now it would be weird to use {{comparative of|en|good}} when all the other form-of templates use |lang=. —Mahāgaja (formerly Angr) · talk 12:29, 24 February 2018 (UTC)
- @Mahagaja: Oh yes, I mean all templates under that module. --Victar (talk) 14:12, 24 February 2018 (UTC)
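A transition could keep both calling conventions working for a while, along the lines of what was suggested above for the language data; a minimal sketch:
local function get_lang_and_term(args)
    if args.lang then
        return args.lang, args[1]    -- old style: {{comparative of|good|lang=en}}
    end
    return args[1], args[2]          -- new style: {{comparative of|en|good}}
end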
As titled. There are some entries having usage examples with audios, for example Korean 헤아리다 (hearida). The audio can be displayed after the example in inline examples, and on a line under the example in multiline ones. Thanks!
(By the way, the current audios on that page are displayed incorrectly for me, covering the line above.) Wyang (talk) 09:32, 25 February 2018 (UTC)
- How is it displaying incorrectly? It looks OK to me. — SGconlaw (talk) 17:05, 25 February 2018 (UTC)
- @Sgconlaw The current display for me: [1], where the audios are shifted upward, almost completely covering the lines above. Wyang (talk) 22:32, 25 February 2018 (UTC)
- I see. This isn't something I can help with, unfortunately. In any case, what browser are you using, and what version? I have no problem with Mozilla Firefox Quantum 58.0.2. — SGconlaw (talk) 22:46, 25 February 2018 (UTC)
- @Wyang: It's displaying more or less the same way for me. Changing the inline CSS properties in the table tag that surrounds the audio player fixes it: vertical-align: bottom; display: inline;. The player is then centered on the bullet. (I used the developer tools to tinker with it. I'm in Firefox Quantum 59.) — Eru·tuon 22:53, 25 February 2018 (UTC)
- @Erutuon Thank you, that also makes it display better on mine. I'm using Chrome 64.0.3282.167. Although not completely level, the line above is visible at least: [2]. Wyang (talk) 23:04, 25 February 2018 (UTC)
- @Wyang: Interesting. In your browser it looks a little different. I don't know what is going on. — Eru·tuon 02:33, 26 February 2018 (UTC)
- This probably needs someone who can work with the javascript to solve the positioning issues and maybe make a slimmer player, before it can be added to {{ux}}. DTLHS (talk) 18:07, 25 February 2018 (UTC)
Can someone please add a new exception to Module:ru-translit? Please look for the line starting with -- handle Того, То́го (but not того or Того́, which have /v/)
The translit should produce the regular "g" and the pronunciation should use [ɡ]. --Anatoli T. (обсудить/вклад) 22:43, 25 February 2018 (UTC)
- I have attempted in diff, modelled on handling of до́рого (dórogo) above but it didn't work for some reason. --Anatoli T. (обсудить/вклад) 23:01, 25 February 2018 (UTC)
- Fixed in diff by User:Per utramque cavernam. I see that I wasn't attentive. Thanks! --Anatoli T. (обсудить/вклад) 23:59, 25 February 2018 (UTC)
Where is the code that disables the "transliteration needed" prompt for Latin-script languages? Asking for Hindi Wiktionary, where I've imported some modules. I believe @BukhariSaeed also has this issue on Urdu Wiktionary. —AryamanA (मुझसे बात करें • योगदान) 22:52, 25 February 2018 (UTC)
- I think it's the language-specific modules. E.g. if Persian noun headwords (Module:fa-noun) can't find "tr=", entries use "transliteration needed". You can search the modules (tick "Module" only and search for the word "needed") and scan for "transliteration" in the browser. --Anatoli T. (обсудить/вклад) 00:59, 26 February 2018 (UTC)
- Perhaps lines 175–178 in Module:headword? Wyang (talk) 01:45, 26 February 2018 (UTC)
- @AryamanA: any solution? — Bukhari (Talk!) 04:34, 7 March 2018 (UTC)
- @Wyang: Thanks! Solved finally. —AryamanA (मुझसे बात करें • योगदान) 22:00, 8 March 2018 (UTC)
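For reference, the kind of check involved is roughly the following (a sketch; the actual Module:headword code may differ):
local function show_translit_request(script_code, manual_translit, auto_translit)
    if script_code == "Latn" then
        return false                 -- Latin-script terms never prompt for a transliteration
    end
    -- otherwise prompt only when nothing was supplied and nothing could be generated
    return manual_translit == nil and auto_translit == nil
end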
Consensus required for move protection code
editDear Community,
In this phabricator task an admin requested deletion protection that was backed up by community consensus. The patch is currently on hold since, if it were merged, move protection would also be enabled for the main page. This deletion and move protection, if implemented, would block all users (including sysops) from moving or deleting the page. What are the community's thoughts on it? If the community is not ready to fully commit to this protection yet, maybe enable it for a reasonable trial period (6 months or so) to see its effects?
Thanks
Sau226 (talk) 12:24, 26 February 2018 (UTC)
- I can't think of any case where we would need to move the main page! Equinox ◑ 12:57, 26 February 2018 (UTC)
- We have done so before, but I think there are other ways to accomplish the same results without moving if the need arises in the future. - TheDaveRoss 13:59, 26 February 2018 (UTC)
- If there is appropriate consensus I can make a link to the discussion (or an admin/locally trusted user can post it) on the phab task. As soon as this is given the aim is to merge the change ASAP --Sau226 (talk) 17:11, 1 March 2018 (UTC)
- This is the best you're going to get — we have no need to move the main page, and I created the task with a link to a discussion about deleting the main page. —Μετάknowledgediscuss/deeds 17:17, 1 March 2018 (UTC)
Language categories and indexes
editThe language categories (CAT:Belarusian language, CAT:Assamese language, CAT:Hindi language, etc.) automatically link to Index:xxx, whether it exists (Index:Hindi) or not (Index:Belarusian, Index:Assamese).
AFAIK we don't use or create these anymore, so could we remove the link from {{langcatboiler}} when the Index doesn't exist? --Per utramque cavernam (talk) 12:36, 26 February 2018 (UTC)
Another Lua Memory Shortage
editThere's another Lua memory shortage at the entry wind. --Lo Ximiendo (talk) 15:56, 26 February 2018 (UTC)
- I mean, shouldn't the solution be similar to the one for the entry water? --Lo Ximiendo (talk) 07:29, 28 February 2018 (UTC)
- That's a workaround, not a solution :) We really need to fix this properly. I've opened a ticket (T188492) to get some suggestions for better memory profiling. – Jberkel 10:59, 28 February 2018 (UTC)
Tech News
editJust a reminder to anyone who wants to keep up with Wikimedia tech news (which sometimes includes explanations for our mysterious bugs), be sure to watchlist Wiktionary:Wikimedia Tech News/2018. —Μετάknowledgediscuss/deeds 22:03, 26 February 2018 (UTC)
Global preferences available for testing
editPlease help translate to your language.
Greetings,
Global preferences, a highly requested feature in the 2016 Community Wishlist, is available for testing.
- Read over the help page, it is brief and has screenshots
- Login or register an account on Beta English Wikipedia
- Visit Global Preferences and try enabling and disabling some settings
- Visit some other language and project test wikis such as English Wikivoyage, the Hebrew Wikipedia and test the settings
- Report your findings, experience, bugs, and other observations
Once the team has feedback on design issues, bugs, and other things that might need to be worked out, the problems will be addressed and global preferences will be sent to the wikis.
Please let me know if you have any questions. Thanks! --Keegan (WMF) (talk) 00:24, 27 February 2018 (UTC)
Buryat Mongol script
editCould someone please make Buryat written in Mongol script display vertically? Crom daba (talk) 14:38, 27 February 2018 (UTC)
- Done; diff —suzukaze (t・c) 00:52, 28 February 2018 (UTC)
- @Suzukaze-c That makes Cyrillic script also appear vertically: see ᠬᠥᠳᠡᠭᠡ_ᠠᠵᠤ_ᠠᠬᠤᠢ. DTLHS (talk) 00:57, 28 February 2018 (UTC)
- @DTLHS: OK, I reverted it for now, but I think that has to do with the way {{head}} assigns scripts... See [3]. @Erutuon? —suzukaze (t・c) 01:01, 28 February 2018 (UTC)
- @DTLHS, Suzukaze-c: Yes, Module:headword assumes that all forms share the same script as the headword. So in this case, the Cyrillic was being tagged as Mongolian. This probably saves some Lua resources because findBestScript doesn't have to be called on each form, but I don't know how much. Headword modules for languages that regularly use multiple scripts (Module:mn-headword, Module:sh-headword) supply the script for the alternative form. So in this case the solution would be {{head|bua|noun|tr=xüdöö ažaxy|Cyrillic|хүдөө ажахы|f1sc=Cyrl}}: automatic script detection for the headword, manually supplied script for the alternative spelling. — Eru·tuon 01:14, 28 February 2018 (UTC)
{{bua-noun}}
already accomplishes this- so if the existing entries that just use{{head}}
are switched over Mongolian can be added to the script list. DTLHS (talk) 01:24, 28 February 2018 (UTC)
- I guess
So now I’m a spammer?
editCreation of a simple user page was blocked citing "various specific spammer habits." Suspect an over-zealous reaction to a single link to a page about my late wife. Also said "if I believe it constructive," I could resubmit. That message is wrong, because resubmitting only made the complaint stronger and removed the resubmit offer.
Having read the entire user page guidelines, I am persuaded my three paragraphs contain nothing prohibited and everything asked for. — This unsigned comment was added by 伟思礼 (talk • contribs).
- It's an automated preventive measure against those weird people who believe Wiktionary user pages are an appropriate place to post ads. :/
- I believe it should deactivate if you make more edits. —suzukaze (t・c) 06:44, 28 February 2018 (UTC)
- If all links are prohibited, (1) the guidelines should say so, instead of "may describe your real-life activities and/or link to your own website"; and (2) the rejection should not invite a resubmission which will only be rejected again. I removed the link and the rest of it was allowed. 伟思礼 (talk) 06:51, 28 February 2018 (UTC)
- @伟思礼: It is not the case that all links are prohibited, as you will see many pages contain links to a wide variety of places. The restriction on placing links is tied to the status of the account, with brand new accounts being restricted completely. It is certainly inconvenient for new editors, but it is a necessary evil to prevent spam bots from adding links all over the place.
- The text of the message is, I agree, unhelpful; that is something we can do something about.
- Thanks for your interest in Wiktionary, and I hope you stick around and contribute some of your knowledge to the project. - TheDaveRoss 13:23, 28 February 2018 (UTC)
Another bug
editWhen I make an entry here, go to the bottom and click "publish," it adds a captcha to the bottom of the page, then scrolls to the top and adds "incomplete or missing captcha." Trying to publish an edit in other places adds the captcha to the top of the page. If a captcha is going to always be required, why not make it part of the page right away, instead of making us scroll to the bottom twice and click the same publish button twice? — This comment was unsigned.
- I think captchas are only required before submitting edits if the edit contains an external link (either to an unexpected site, and/or prior to the editor making a certain number of edits? I'm not sure). I am also fairly certain that captchas are not something we as an individual wiki control (unlike the so-called "abuse filters" which stopped you from adding a link to your userpage). - -sche (discuss) 20:53, 1 March 2018 (UTC)
Pinging problems are really annoying
editI've been using wikis for years and years now, and I'm embarrassed to say that I'm unsure about how to ping people properly. I've gotten messages time and time again like, your ping didn't work, your ping didn't work. I'm not trying to sound like I'm ranting or something, but it's really annoying to have to keep hearing that (and I'm not annoyed at people themselves for telling me, I'm just annoyed that it keeps not working).
I'm not asking anyone to tell me how pings work (though it'd be nice). I'm just asking if there's some way that pinging can become easier; for instance, if the symbol @ is put before [[User:, then it should automatically ping in every situation. Something like that. Because the current way to do it, I THINK, is to put the ping right before your signature or on the same line as your signature, or something like that.
I'm not into the technical stuff, so I don't know how much work implementing something like that would require, but I'm just asking if there's something we can implement in regards to this. PseudoSkull (talk) 00:05, 1 March 2018 (UTC)
- 1. The guide is at mw:Manual:Echo :)
- 2. I believe "automatically ping in every situation" could be quite disastrous, like when making manual archives of talk pages.
- —suzukaze (t・c) 00:13, 1 March 2018 (UTC)