Wiktionary:Votes/bt-2016-06/User:OrphicBot for bot status

User:OrphicBot for bot status

Nomination: I hereby request the Bot flag for User:OrphicBot for the following purposes: Most Latin and Greek entries do not have relevant external links, which in the past often had to be located by hand. I have rewritten a few modules (R:L&S, R:LSJ, R:Woodhouse, R:M&A, and R:Strong's) to locate important classical resources reasonably accurately, and would like to add templates consistently, while, as brought up in the discussion, removing deprecated links to dictionaries' Wikipedia pages. I would also be willing to try my hand at updating Greek pronunciation and declension templates in stages, as others suggested when I brought up the topic of a Greek robot, if there continues to be interest after this small project is finished.

Sample edits: Δαναΐς, οἶνοψ, ἀλεκτρυών, ἀρσενοκοίτης, καταφρονέω, χρηστότης. The first shows how this version of the robot addresses an error created in the samples produced by the previous version, as was pointed out to me. The last five were chosen by a random number generator.

New sample edits:

diff:Δαναΐς diff:οἶνοψ diff:ἀλεκτρυών diff:ἀρσενοκοίτης diff:καταφρονέω diff:χρηστότης diff:arma

Notes: References are partially sorted by preference and DGE is only linked in the a-ek range. Otherwise, the references are not changed; for example, a bullet point is not added to the existing R:LSJ template in χρηστότης, and </references> is not moved, in the interest of conserving as much of the original file as possible. καταφρονέω now includes a prototype pronunciation template update from grc-ipa-rows to grc-IPA, as discussed. Only 1/8th of the set has this template changed; the rest are skipped because of ambiguous vowels. The pronunciations are a work in progress. Arma is obviously a Latin sample, mutatis mutandis. The edit summaries are unfortunately all wrong. Isomorphyc (talk) 02:20, 21 June 2016 (UTC)

Isomorphyc (talk) 06:30, 16 June 2016 (UTC)
Vote starts: 06:30, 16 June 2016 (UTC)
Vote ends: ~~23:59, 23 June 2016 (UTC)~~
- Vote ends: ~~23:59, 30 June 2016 (UTC)~~ - extended - at least 14 days for bot votes. --Dan Polansky (talk) 07:29, 19 June 2016 (UTC)
- Vote ends: 23:59, 10 July 2016 (UTC) - Extended. Discuss here: Wiktionary:Beer parlour/2016/July#Closing OrphicBot vote. --Daniel Carrero (talk) 11:03, 3 July 2016 (UTC)

Discussion:
WT:BP#Potential Bot for Adding LSJ and L&S Links to Ancient Greek and Latin Entries

User talk:JohnC5#Greek LSJ Links

Support

Support. The edits look good. The bot owner is responsive to feedback and willing to clean up his mistakes. --Wiki Tiki 89 14:13, 16 June 2016 (UTC)
Support —Μετάknowledge^{discuss/deeds} 03:22, 17 June 2016 (UTC)
Support —John C5 04:06, 17 June 2016 (UTC)
Support -Xbony2 (talk) 11:14, 17 June 2016 (UTC)
Support --Vahag (talk) 08:54, 6 July 2016 (UTC)

Oppose

Oppose I oppose placing LSJ last when it should be the first dictionary listed, certainly not after Diccionario Griego–Español, and probably not after Strong and Woodhouse. I see no better suited dictionary than LSJ so LSJ should lead the list. I saw this problem at diff. Maybe oppose is too strong for this objection? I welcome the initiative of adding these links by a bot. --Dan Polansky (talk) 20:34, 18 June 2016 (UTC)
That's close to my preference too. Does anyone object to sorting the list? I added the new links to the top to keep the pre-robot entries together. I personally tend to use LSJ and Middle Liddell a lot, and everything else not so much. Isomorphyc (talk) 00:03, 19 June 2016 (UTC)
Yes, I object to sorting the list alphabetically; or did you mean something else? LSJ should be the first item. And I wonder whether we want Diccionario Griego–Español on every page at all; it would make sense on pages where we can't get LSJ, but where we have LSJ, adding a Spanish dictionary is unobvious since we could also add a German one, etc. Is Diccionario Griego–Español exceptionally good? --Dan Polansky (talk) 06:52, 19 June 2016 (UTC)
I meant sorting by likely utility. I would probably suggest: LSJ, DGE, Strong's, Woodhouse. DGE is worth having because the fascicles up to εξ- are essentially modern revisions of LSJ. It was an error on my part (involving alphabetical order and unicode) to include DGE for entries past εξ-. I will post some revisions of the links soon. Is there anything else you would like to see linked? Isomorphyc (talk) 14:07, 19 June 2016 (UTC)
Thank you for the explanation about DGE. I take back my opposition as long as LSJ comes out the first. I know of no other dictionaries to link for Ancient Greek; we used to link LSJ a bit and it seems real good. --Dan Polansky (talk) 15:05, 19 June 2016 (UTC)

Let me emphasize that you need to check the existence of the target pages of the added links before you add them. This can probably be done via a script. --Dan Polansky (talk) 08:34, 20 June 2016 (UTC)
All of the modules do this already; I have these lists because I wrote the modules. Isomorphyc (talk) 12:26, 20 June 2016 (UTC)
I don't understand. How does the module make sure you do not add a link if the target does not exist? Can you point me to one of the modules that does this, and to the specific part of the code that does this? Do you mean Lua modules or do you mean something else? --Dan Polansky (talk) 12:30, 20 June 2016 (UTC)
For LSJ, Perseus-style alpha-coded links will be valid unless there are two words with the same name. In this case, the link is redirected to a collision resolution page. The list of collisions is here: Module:R:LSJ/collision-data. The list of headwords in Perseus/LSJ is available in the Perseus XML file, so it is easy to link only words with valid headwords. This is not an entirely complete list, because the Perseus data are not formatted entirely consistently; but it is close at least to 99% complete. Similarly for R:Woodhouse, there is a full reverse index in the five sub-modules, mainly Module:R:Woodhouse/reverse_index, which are loaded with mw.loadData() in the main body of the module. You will find the same in Strong's and M&A if you review the modules. Also, if you look at the Python source code in updateReferencesGreek.py at User:OrphicBot you will see these lines, which load the index data against which each word is checked for list inclusion:

strongs = set([unicodedata.normalize('NFC',x) for x in re.split('[\[\]a-zA-Z0-9{}=\",\s]', readFile(basepath + "strongsNumbers.txt")) if len(x) > 0])
woodhouse = set([unicodedata.normalize('NFC',x) for x in re.split('[\[\]a-zA-Z0-9{}=\",\s]', readFile(basepath+"woodhouseReverseIndex1.txt")) if len(x) > 0])
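For readers without the data files, the same idea can be sketched in a self-contained form. This is only an illustration under stated assumptions: the sample index text, its Lua-table-like layout, and the helper names `load_headwords` and `is_listed` are hypothetical and are not taken from updateReferencesGreek.py. Only the separator pattern and the NFC normalization follow the two lines quoted above.

```python
import re
import unicodedata

# Same separator class as in the quoted lines: brackets, ASCII letters and
# digits, braces, '=', double quotes, commas and whitespace. Whatever is left
# between separators is treated as a headword.
SEPARATORS = re.compile(r'[\[\]a-zA-Z0-9{}=",\s]')


def load_headwords(raw_text):
    """Return the set of NFC-normalized headwords found in raw_text."""
    return {
        unicodedata.normalize('NFC', token)
        for token in SEPARATORS.split(raw_text)
        if token
    }


def is_listed(word, headwords):
    """Check list inclusion after normalizing the candidate the same way."""
    return unicodedata.normalize('NFC', word) in headwords


# Hypothetical stand-in for the contents of an index file.
SAMPLE_INDEX = 'return { ["οἶνοψ"] = 1, ["ἀλεκτρυών"] = 2 }'
headwords = load_headwords(SAMPLE_INDEX)

# A decomposed (NFD) spelling still matches, because both sides are
# normalized to NFC before comparison.
decomposed = unicodedata.normalize('NFD', 'ἀλεκτρυών')
print(is_listed(decomposed, headwords))
print(is_listed('χρηστότης', headwords))
```

Normalizing both the index and the candidate word is what makes the set lookup safe when sources disagree on how accents and breathings are encoded.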
Thank you for the explanation. I am not sure I understand all of this. I will assume then that each link that you will add will point to a target page with data relevant to the entry from which the link goes rather than "not found", and that if in a small number of cases "not found" will be there, humans will manually remove the links from Wiktionary. --Dan Polansky (talk) 13:06, 20 June 2016 (UTC)
Well, normally I try to check a hundred random cases for obvious errors, though I don't always know what constitutes an error until something gets posted. It's a slightly iterative process, but ideally no more than a couple of dozen pages will have to be revisited later for errors-- hopefully not by a human. In any case, list inclusion is something which gets tested for on two levels. If a word is known not to be in a list, the robot won't post the reference. If the reference is used anyway without arguments, the module will normally check against its internal list and display a bibliographic reference with no links. If a human supplies a link anyway, then this supersedes the module's checking, and the link is posted. But the robot will only post argument-free references, which leaves link validity checking to the modules. Thanks for taking an interest in these slightly tedious details. Isomorphyc (talk) 13:49, 20 June 2016 (UTC)
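The two levels of checking described above can be modelled roughly as follows. This is a Python sketch of the described behaviour only; the real modules are Lua, and the function names, the return shapes, and the tiny sample index are all hypothetical.

```python
def bot_should_post(word, index):
    """Level 1 (robot): post an argument-free reference only for listed words."""
    return word in index


def render_reference(word, index, explicit_target=None):
    """Level 2 (module): decide what the reference template displays.

    Returns a (kind, target) pair, where kind is 'link' or 'plain'.
    """
    if explicit_target is not None:
        # A human-supplied argument supersedes the module's own check.
        return ('link', explicit_target)
    if word in index:
        return ('link', word)
    # Unlisted word, no argument: bibliographic reference with no link.
    return ('plain', None)


# Hypothetical index with a single headword.
index = {'οἶνοψ'}

print(bot_should_post('οἶνοψ', index))
print(bot_should_post('χρηστότης', index))
print(render_reference('χρηστότης', index))
print(render_reference('χρηστότης', index, explicit_target='χρηστός'))
```

Because the robot only ever posts the argument-free form, a stale or wrong robot decision degrades to an unlinked citation rather than a dead link.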

In χρηστότης, the DGE link leads to http://dge.cchs.csic.es/xdge/χρηστοτης which shows no data. --Dan Polansky (talk) 20:08, 20 June 2016 (UTC)
As mentioned above, anything past εξ- in DGE is an error which will be fixed with the revisions also showing the revised preference-ordering. Isomorphyc (talk) 20:34, 20 June 2016 (UTC)
@Dan Polansky Please see above for updated changes. Isomorphyc (talk) 02:20, 21 June 2016 (UTC)
I checked χρηστότης again and the DGE problem remains. I don't know where I should look to see updated changes.

I checked your recent edits and they use wrong edit summaries: like in diff, the edit summary is "grc-ipa-rows -> grc-IPA test (unambiguous vowels)" but the only change the edit makes is a change in template order. --Dan Polansky (talk) 08:38, 21 June 2016 (UTC)

I am restoring my opposition. This vote will probably pass anyway, but I think a bot operation needs to be much more careful than what I see here. I am sorry for that; I welcome the bot initiative in principle. --Dan Polansky (talk) 08:41, 21 June 2016 (UTC)
I don't think I explained this very well. The current revisions demonstrate three things: 1) the robot does not add R:DGE past ek-; this is visible in οἶνοψ. As you can see in the history, I had to remove R:DGE by hand to show this. 2) The robot does not remove R:DGE past ek- if it is already present. This is visible in χρηστότης which you noticed, and also καταφρονέω. As there are only three such examples, it is obvious these should be revised by hand, not by a robot. If a human added R:DGE past ek-, I am not desirous to remove it with a robot. The two examples in which it is not removed demonstrate this behaviour. 3) The pronunciation template is now changed from grc-ipa-rows to grc-IPA where only unambiguous vowels are present. This is indeed mentioned in the edit summaries. The pronunciation template is the only change of any complexity being tested. It is also the only change which requires posting to Wiktionary to see the results, due to server-side module execution. It indeed runs on every example, including Latin ones, and a null-change is evidence for correct behaviour in the presence of ambiguous vowels or in a Latin article. The edit summaries are badly worded, and I mentioned this myself in my commentary. Still, they are relevant for Greek editors interested in the accuracy of the pronunciation code. Unlike the references shuffling, where small differences from peoples' preferences are easy to notice and fix, the pronunciations are not linguistically so trivial and are important to get right. — This unsigned comment was added by Isomorphyc (talk • contribs).
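Points 1) and 2), together with the preference order suggested earlier in this thread (LSJ, DGE, Strong's, Woodhouse), can be sketched as below. This is a simplified illustration, not the robot's code: the function names are hypothetical, the cutoff is assumed to be a comparison of the first two letters with "εξ" after stripping diacritics, and the list is fully sorted here although the notes above say the robot only partially sorts.

```python
import unicodedata

# Preference order proposed in the discussion.
PREFERENCE = ['R:LSJ', 'R:DGE', "R:Strong's", 'R:Woodhouse']


def strip_diacritics(word):
    """Lowercase the word and drop combining marks (accents, breathings)."""
    decomposed = unicodedata.normalize('NFD', word)
    return ''.join(
        ch for ch in decomposed if unicodedata.category(ch) != 'Mn'
    ).lower()


def in_dge_range(word):
    """Assumed cutoff: the first two bare letters sort at or before 'εξ'."""
    return strip_diacritics(word)[:2] <= 'εξ'


def plan_references(word, existing, available):
    """Keep everything already present; add new templates; order by preference."""
    wanted = set(existing)  # never remove what a human already added
    for name in available:
        if name == 'R:DGE' and not in_dge_range(word):
            continue  # do not add R:DGE outside the covered range
        wanted.add(name)
    rank = {name: i for i, name in enumerate(PREFERENCE)}
    return sorted(wanted, key=lambda n: (rank.get(n, len(PREFERENCE)), n))


print(plan_references('οἶνοψ', [], ['R:LSJ', 'R:DGE', 'R:Woodhouse']))
print(plan_references('χρηστότης', ['R:DGE', 'R:LSJ'], ['R:LSJ', "R:Strong's"]))
```

Comparing on the diacritic-free form is the point of the sketch: a word such as ἀλεκτρυών begins with a precomposed character whose code point is far from plain α, so a raw string comparison would misplace it.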

~~Oppose~~ I object to the idea of having a bot for the sole purpose of adding external links. External links should be discouraged, not encouraged. They are a dissuasion from further improvement of articles and should be used very discreetly. (would support if focus changed to adding actual dictionary content or polishing existing code) Wyang (talk) 09:22, 5 July 2016 (UTC)

Hi @Wyang: I started with this task because I believe that linking external dictionaries will make it much easier for editors to add content. Latin and Greek differ from the modern languages in several ways which make this salient: people expect citations to dictionaries which are in some cases over a hundred years old, and in many cases are themselves derived from dictionaries which have been evolving for many hundreds of years. Most editors would not really be willing to add content without consulting some or all of these dictionaries, because the languages are no longer spoken natively. At the same time, Perseus is the closest thing the Classics community has to a search engine, and it is not always easy to use. For example, while I can type pinyin without tone marks into WeChat and expect obscure characters to come out with long enough strings of text from context guessing, there is no way to type polytonic Greek which is not at least five or ten times slower. I personally have to do a lot of copy/paste, and I regularly have to fiddle with the Unicode encoding in C to translate from one website's convention for combining tone marks with vowel length marks to Wiktionary's mandatory convention. Other editors have expressed to me that they have the same problems. Linking will save a lot of people time because it lets the robot handle some of these minutiae.

I would add that the Latin Wiktionary has evolved into a glossary and etymological dictionary which complements other dictionaries, but does not replace them. For example, our etymologies are far more explicit than can be found anywhere, especially with regard to affixation, and are usually more up-to-date than the traditional dictionaries (but not the leading etymological dictionaries). The Latin section covers 97% of the lemmata found in the most widely-read (so-called Golden Age) authors, and 99% of the words. For years it also had the most convenient stemmer on the internet. But where it is weak is in shades of meaning and in quotations, at which the historical dictionaries excel. If it were easier to find and verify these meanings quickly, it would be easier to fit this information into Wiktionary's far more terse style. The Greek section tends to be more scholarly about definitions than Latin, but has far fewer, about a fifth the number, of lemmata. I think the Greek editors will find a different path of evolution for this section than is in Latin, but it is still only reasonable to expect that with much work it will become complementary to other resources.

As a last point, I would add that this robot proposal was made nearly a month ago. I did mention I would look into changing over some old pronunciation templates to new ones, and since then other, more lexicographical tasks have been discussed. Here are three lists in my sandbox of words whose pronunciation templates I will be able to change, with increasing levels of difficulty: User:Isomorphyc/Sandbox/grc-rows-ipa/0_unambiguous, User:Isomorphyc/Sandbox/grc-rows-ipa/2_with_head_annotations, and User:Isomorphyc/Sandbox/grc-rows-ipa/3_unambiguous_with_lsj. I have also discussed using the robot to add missing diacritical marks and import new words with other editors; but these are much more nuanced problems which require more premeditation to do correctly, and also more gradual tasks which will need a different model of collaboration than a monolithic proposal. I am not sure what you meant by using a robot for polishing existing code, but certainly editing actual dictionary content has been discussed in the BP and on various user pages. Thanks for your involvement in this and your willingness to support with your concerns being addressed; I regret the long note but I hope I have been able to address these issues in a way that represents the views of other Classical languages editors also. Isomorphyc (talk) 17:21, 5 July 2016 (UTC)

@Wyang: Your vote is the only thing blocking this from passing, and what Isomorphyc says is quite abundantly true. Different languages have different requirements, and for these languages, I can say with experience that external links are a service Wiktionary should be providing. —Μετάknowledge^{discuss/deeds} 17:33, 5 July 2016 (UTC)

The vote currently has 4 supporters and 2 opposers, so it reached 2/3 majority. If the vote ended now, it would pass. --Daniel Carrero (talk) 17:56, 5 July 2016 (UTC)

I retracted my opposition for now. I believe you are a sensible editor who knows what is appropriate for these classical languages. I still regard indiscriminately adding external links to non-classical languages as a very unwise idea - imagine the English entry story being filled with links to other dictionaries. Wyang (talk) 21:56, 5 July 2016 (UTC)

@Wyang: Thank you, it is appreciated. I have not had any intention of adding dictionary links to any modern language, least of all English to English, which I do not edit except for etymologies. I would add I would probably end up maintaining our data-shard sets (which I created after the vote had initially passed) with OrphicBot if this turns out to be necessary. I noticed you do a lot of this with your robot; but hopefully our data modules will need less upkeep than yours, since they are smaller and not user-facing. I realised this is possibly what you meant by polishing existing code. Isomorphyc (talk) 22:19, 5 July 2016 (UTC)

I absolutely disagree. External links are great added value for our readers. I hope the above position will not gain ground around here in English Wiktionary. --Dan Polansky (talk) 21:25, 5 July 2016 (UTC)

Abstain

Abstain --Daniel Carrero (talk) 09:50, 3 July 2016 (UTC)
@Daniel Carrero: Thank you for moderating this. Isomorphyc (talk) 17:05, 11 July 2016 (UTC)

Decision

5-1-1 (83.3%-16.7%) — Passes. @Stephen G. Brown, SemperBlotto, Chuck Entz --Daniel Carrero (talk) 14:52, 11 July 2016 (UTC)

Done. —Stephen ^(Talk) 12:29, 13 July 2016 (UTC)