Wiktionary:Votes/bt-2007-12/User:Tbot creating FL entries

User:Tbot creating FL entries

  • Vote ends: 31 December 2007 23:59 UTC
  • Vote started: 08:33, 17 December 2007 (UTC)

This version of the 'bot has learned from a lot of examples and tries to work out what is reliable, and when translations are no good for various reasons. It now knows how to check (fairly well) against the FL.wikt entry, as well as stealing various things from it (e.g. see śliwka, and compare pl:śliwka). Any remaining entries in Category:Tbot entries November 2007 will be deleted after a while (several weeks to several months), as they are not as reliable as they should be; any that have been checked and had the tag removed are of course fine. I am continuing to improve things; it has been taught the template names for Wikipedia and IPA pronunciation on various FL wikts, and some other details.

At present, the 'bot doesn't add language sections to existing entries (including those it has created itself); this makes it easier to shoot bad entries. It will start doing this after some more experience is gained.

Note that this vote runs 14 days rather than the default 7 for a 'bot vote, in recognition that people may have more or less time over the festive season. (And my best wishes for the season.) Robert

Support

  1.   Support Robert Ullmann 08:48, 17 December 2007 (UTC)
  2.   Support Dmcdevit·t 09:18, 17 December 2007 (UTC)
  3.   Support Conrad.Irwin 11:39, 17 December 2007 (UTC) They might not be perfect but they are a huge step forward.
  4.   Support SemperBlotto 11:44, 17 December 2007 (UTC)
  5.   Support Connel MacKenzie 18:10, 17 December 2007 (UTC) Generating lists hasn't encouraged translation entries - even when imperfect, this method allows for more straightforward cleanup (even if it does eventually turn into a nine-headed hydra.)
  6.   Support msh210 20:48, 17 December 2007 (UTC)
  7.   Support Cynewulf 15:01, 19 December 2007 (UTC) Don't shoot the messenger, folks. We'd expect a human to trust the translation tables (otherwise why are they still here?), so why should a bot be different? The double-checking and such just makes it better. (And anyway, I assume the entries will still be flagged with {{tbot entry}}, making a convenient triple-check list.)
  8.   Support \Mike 23:43, 19 December 2007 (UTC) I haven't looked much into the new batch, but the November batch was already useful for warning about dubious translations entered into the trans-tables, whenever some weird FL entry turned up. And that's a good thing, IMO.
  9.   Support Visviva 07:20, 25 December 2007 (UTC) Looks good from here.
  10.   Support, this is a useful feature.--Jyril 17:10, 29 December 2007 (UTC)
  11.   Support. It's been annoying to discover how inconsistent translators have been in choosing Hebrew verb lemmata, but I guess I shouldn't Oppose the messenger. :-P   Also, given how many (misguided) editors think we should include full entries for non-lemmata, it's apparent that we need to start developing the bot infrastructure to create entries from scratch given information in other entries. Kudos to Robert for all his work in that direction. —RuakhTALK 23:50, 29 December 2007 (UTC)
    You haven't been following discussions in the Beer Parlour lately, have you :) --EncycloPetey 00:01, 30 December 2007 (UTC)
    No, I haven't … I take it I missed something relevant to my comment? —RuakhTALK 06:42, 30 December 2007 (UTC)
  12.   Support. It is very useful, time-saving and efficient. --Keene 03:10, 31 December 2007 (UTC)

Oppose

  1.   Oppose — [ ric ] opiaterein 20:09, 17 December 2007 (UTC) You saw this coming.
  2.   Oppose EncycloPetey 01:22, 18 December 2007 (UTC) - The cited Polish example is a good reason (in and of itself) to oppose this. The word is incorrectly translated. Take a look at where this item takes you on the Polish Wikipedia [1]. It points to the page about the genus Prunus, which is śliwa in Polish. The word śliwka in Polish does not quite mean "plum". It is actually a genitive plural form (read: non-lemma) that means "from the plum" or "from the prune". It is not the nominative singular, and so should not be a lemma entry. The fact that this example was held up as a good one shows just how bad some of the generated articles really are. I would not trust a bot to do this work, especially given the history of where some of our translations have come from. The Polish translation was added by Widsith [2], who is not an expert in Polish. Just look at the edit history to see how many translations were added by Drago (aka You-Know-Who), who was notorious for bad Central European translations. Now, do we want a bot to propagate these errors to new pages? No, thank you. --EncycloPetey 01:22, 18 December 2007 (UTC)
    It's true I'm not an expert by any means in Polish. But I have to object - śliwka is a noun and is a nominative form. Possibly it comes from a genitive plural form, although to me it just looks like a diminutive of śliwa. Widsith 08:26, 18 December 2007 (UTC)
    Both śliwka and śliwa are nominative singulars. śliwka is plum, meaning the fruit, and śliwa is a plum tree. The genitive plural of śliwka is śliwek ("of plums"). The genitive plural of śliwa is śliw ("of plum trees"). —Stephen 00:27, 19 December 2007 (UTC)
    In that case, the entry in Wielki Słownik Polsko-Angielski is either wrong or very badly formatted. --EncycloPetey 02:56, 19 December 2007 (UTC)
    I find this unconvincing because the translation was not the bot's idea; it was the idea of the human who entered the translation. The bot's purpose is mostly to cut down on the duplication of work between entering non-English articles and translations at English articles. If humans entered bad translations, we need to deal with that, not leave them sitting wrong in the articles. As long as the articles have a disclaimer and are being maintained, it's a net plus. In the case of incorrect entries, it will help bring attention to poor translations that had gone unnoticed in the English article. Dmcdevit·t 01:34, 18 December 2007 (UTC)
    • It's not a net plus if the articles used have a high ratio of bad translations to good. We have no data on this. Keep in mind also that there are wiktionaries that duplicate our entries, so we wouldn't just be setting up new entries here with a disclaimer; we would potentially be propagating bad entries to other wiktionaries that import from us. --EncycloPetey 01:39, 18 December 2007 (UTC)
    We have a lot of data. The November entries had an error rate in the 2-10% range depending on language. Not good enough. The December entries, of which I have checked 100+ carefully, have an observed error rate of zero. (If anyone can find one, I'll be very interested in figuring out how to fix it ;-) Robert Ullmann 10:56, 19 December 2007 (UTC)
      • Well, then perhaps my sympathy is in part based on the fact that in my experience most of the Spanish articles created by Tbot have been correct, or at least helpful. Dmcdevit·t 02:03, 18 December 2007 (UTC)
        • I expect that results from (1) having good Spanish language editors active here, (b) Drago didn't enter Spanish translations, and (iii) Spanish (in many ways) is much closer to English grammatically than Polish is. --EncycloPetey 02:10, 18 December 2007 (UTC)
    The entry is, in fact, correct. It is a Noun, and it does mean plum. Of course, it should also have the inflection added, as well as possibly giving more of the form in the definition line. See this, which I edited and then reverted, so the current version is still the original. Strike that, Tbot's cross-check on the lemma form was correct; the "genitive plural" stuff above is bilge. I do trust the pl.wikt entry; it was written by Derbeth, a native speaker of Polish. The translation table at plum could be improved as well. So? When did we not have things that can be improved? We could also use an inflection/declension table for Polish nouns, so a section could be added to this and the other forms. Robert Ullmann 07:41, 18 December 2007 (UTC)
    wait-a-sec, I'm pretty sure I shouldn't be listening to you; the entry itself says it is the singular, and the pl.wikt itself defines pl:δαμάσκηνο as śliwka (owoc) "plum (fruit)". The Polish Wiktionary itself uses this form as the target of translations for "plum". Robert Ullmann 08:14, 18 December 2007 (UTC)
    ditto Swahili pl:zambarau, all of the words for plum (nominative singular) are defined there as śliwka. Where did you get this "genitive plural" stuff? (it might also be, I dunno, but it ain't shown as such, it would be lm D in the list of inflections. ;-) Robert Ullmann 08:29, 18 December 2007 (UTC)
    (one other note: if the argument is that as a "non-lemma" entry, it should just be some unhelpful dumbed-down stub ... forget it. We can and should have complete information on every entry. <snark> If you think that can't be done, you should stay out of the way of those who are doing it </snark> ;-) Robert Ullmann 07:48, 18 December 2007 (UTC)
    Of course, as the Polish Wiktionary itself makes it very clear that śliwka is the lemma form, all the rest of this objection is nonsense. Robert Ullmann 08:29, 18 December 2007 (UTC)
    Not according to Jan Stanisławski's Wielki Słownik Polsko-Angielski ('Great Polish-English Dictionary', considered one of the two best such translating dictionaries). --EncycloPetey 02:50, 19 December 2007 (UTC)
    Wait, you're saying that because we can't convince pl.wikt: to adhere to the reference you cite, we shouldn't accept translations? Did I misread this whole thread? --Connel MacKenzie 07:03, 19 December 2007 (UTC)
    Don't worry, Connel, I'm sure the print dictionary (an OED pub) says exactly the same thing (that śliwka is the nominative singular) as the pl.wikt and our new entry. What happened here was that instead of simply entering an "oppose" vote, EP decided to try to discredit Tbot by contriving a reason why the example given was "wrong". This attempt fell completely flat (of course, that sort of malice always does ;-), showing rather that Derbeth+Widsith+Tbot created an entirely correct entry. (Then trying to blame the dictionary is embarrassing, or ought to be.) The way Tbot works now, it takes several exactly coincident errors for it to make a mistake. Not impossible of course, but I haven't seen any yet. Robert Ullmann 10:56, 19 December 2007 (UTC)
    I wish you would not argue against people like this. It seems to me that EncycloPetey has made several perfectly valid points, and it should be enough for you to deal with those rather than get sarcastic about his "malice" or "embarrassment". Widsith 11:11, 19 December 2007 (UTC)
    You are quite right, I apologize. It is sufficient to note that he set out to show—with whatever motivation—that Tbot had gotten it wrong (that śliwka was allegedly not the nominative singular), and failed. Tbot does check these things; it could see from the pl.wikt that the new entry would be correct (it is primarily reading the FL.wikt entry for a cross-check; sporking the image/audio etc is just a bonus ;-). Likewise, the Drago/WF entries (when bad) will not pass Tbot's checks. Robert Ullmann 11:45, 19 December 2007 (UTC)
    You might be a lot happier if you view the entries as created from the FL.wikt entries when correctly referenced by a translation table in an entry here. Junk translations entered by Drago or whoever won't create bogus entries here. Robert Ullmann 12:01, 19 December 2007 (UTC)
    Here is WF adding a batch of translations to open in March 2005. Note the Bulgarian. Tbot has updated open, looking for bg:отворен, which exists but has insufficient information to pass the check, so it did not create отворен (although in this case it is correct, and it has since been checked by someone else for the correct senses). Robert Ullmann 12:47, 19 December 2007 (UTC)
    If you are thinking "yeah sure, the bot can't check the FL entry. I can't even read Polish, how can the bot?" ... I'll give you an example of a check: suppose you are a bot, you don't know Hungarian or Greek, but you do know how to do bot-like things (like sifting and rearranging and comparing data very quickly ;-) Now look at hu:vallás and el:θρησκεία, (the bot reads the wikitext) and ask your bot-brain: "Given that these are allegedly translations of the same English word, are they really? And what word is it?" Remember, no using human info, just data manipulation. See? Robert Ullmann 13:09, 19 December 2007 (UTC) (Oh, if you want to fry your brain, you can look up how it's done. 'though Tbot is simpler, as it has a priori knowledge.) Robert Ullmann 13:18, 19 December 2007 (UTC)
    Another example of a failed check: see blood, and note the Kurdish خوێن, for which Tbot added the template as {t+} because the entry exists, but didn't create the entry because the ku.wikt doesn't consider it a lemma (in this particular case a Sorani form). Robert Ullmann 14:04, 19 December 2007 (UTC)
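Robert's hu:vallás / el:θρησκεία puzzle, together with the failed checks in the open and blood examples, can be illustrated with a small Python sketch. It is not Tbot's code. It assumes each FL.wikt entry has already been parsed from its wikitext into a lemma flag and a table of translations keyed by language code; the names FLEntry, passes_cross_check and shared_english_words, and all of the sample entry contents, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FLEntry:
    """Hypothetical, already-parsed view of one FL.wikt entry."""
    word: str
    is_lemma: bool
    # Translations listed in the entry, keyed by language code,
    # e.g. {"en": {"religion"}, "el": {"θρησκεία"}}.
    translations: dict[str, set[str]] = field(default_factory=dict)


def passes_cross_check(english_word: str, entry: FLEntry | None) -> bool:
    """Simple case: the English word is already known from our own
    translation table, so the FL entry must exist, be a lemma, and
    itself list that English word."""
    if entry is None:       # no such page on the FL.wikt
        return False
    if not entry.is_lemma:  # a form the FL.wikt does not treat as a lemma
        return False
    return english_word in entry.translations.get("en", set())


def shared_english_words(a: FLEntry, a_lang: str,
                         b: FLEntry, b_lang: str) -> set[str]:
    """Harder case, no a priori knowledge: the two entries must point
    at each other, and the answer is the English words both list."""
    a_points_to_b = b.word in a.translations.get(b_lang, set())
    b_points_to_a = a.word in b.translations.get(a_lang, set())
    if not (a_points_to_b and b_points_to_a):
        return set()
    return a.translations.get("en", set()) & b.translations.get("en", set())


# Hypothetical sample data, for illustration only.
vallas = FLEntry("vallás", True, {"en": {"religion"}, "el": {"θρησκεία"}})
thriskeia = FLEntry("θρησκεία", True, {"en": {"religion"}, "hu": {"vallás"}})
sorani_form = FLEntry("خوێن", False, {"en": {"blood"}})

print(shared_english_words(vallas, "hu", thriskeia, "el"))  # {'religion'}
print(passes_cross_check("religion", vallas))               # True
print(passes_cross_check("blood", sorani_form))             # False
```

The first function corresponds to the simpler case Robert mentions, where the English word is known a priori from our translation table. The second is the harder puzzle of recovering the shared English word from two FL entries alone.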

Abstain

  1.   Abstain Circeus 20:00, 17 December 2007 (UTC) I believe there is much more potential in using the same style of bot to generate form-of entries.
    User:TheCheatBot uses the XML-dump approach to that problem already (allowing for a longer review period.) While I have no strong objection to alternate methods, I'm not sure I understand your complaint. --Connel MacKenzie 20:03, 17 December 2007 (UTC)
    These aren't mutually exclusive concepts; several people have run form-of bots, both in English and other languages. The potential for that doesn't affect this? Robert Ullmann 22:51, 17 December 2007 (UTC)
  2.   Abstain Widsith 08:28, 18 December 2007 (UTC)

Comment

  • I don't see any indication that the bot can handle entries with macrons or other marks that are included in inflection lines, but not used in page names. --EncycloPetey 01:36, 18 December 2007 (UTC)
    It does use the form from the link alternation, or alt= if the source already has {t}, as the head= parameter to {infl}. It also uses the gender/number, script, and transliteration. I improved the documentation at User:Tbot a little bit. Robert Ullmann 07:41, 18 December 2007 (UTC)
    Hmm... it still doesn't answer my question; perhaps I wasn't as clear as I should have been. Suppose a Latin translation in a table uses the macron-containing form (e.g. * Latin: [[alternīs]]). Would the bot create the entry under that name, or is it smart enough to realize that it should not create a Latin entry with a macron in the page name? --EncycloPetey 03:00, 19 December 2007 (UTC)
    Of course it won't create it with the macron, as the macrons are entirely an affectation of dictionaries (like the middle dots in some English dic·tion·a·ries) and are not part of the language. Robert Ullmann 05:47, 19 December 2007 (UTC)
    OK, now that answers my question. It wasn't obvious from the doco, but is a concern since a Latin translation is sometimes added with only the macron-bearing form. --EncycloPetey 01:37, 20 December 2007 (UTC)
    But macrons are part of the standard orthographies of some languages such as Latvian. —Stephen 20:00, 20 December 2007 (UTC)
    Yes, that is true, so the entries ought to be created for Latvian, just not for Latin. --EncycloPetey 23:42, 20 December 2007 (UTC)
    E.g. lietvārds. It doesn't have a specific rule for script for Latin; it was going to, but with the checks it is doing it isn't needed. Now if the la.wikt started creating entries with macrons in the page names, we would have an issue, since Tbot might find a matching entry, and think it was correct. Likewise, it won't add various transliterations shown in the tables without the correct script unless the FL.wikt has them, which they don't. (AFAIK: any known exceptions?) If some/any explicit check(s) on scripts are found to be needed, I will add it/them. Robert Ullmann 13:46, 21 December 2007 (UTC)
  • Is there any integration (and/or checking) done against OmegaWiki, at this point, or planned? With the GFDL→CC-by activities of WMF & GNU, it might be worth considering or expanding. --Connel MacKenzie 07:08, 24 December 2007 (UTC) AFAIK, OW can't copy content from the language Wiktionaries currently, but because it is dual licensed, we can copy content (with history attribution) from there. Even if only used as an additional verification step, it might be worth considering. --Connel MacKenzie 07:17, 24 December 2007 (UTC)
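Robert notes that Tbot has no explicit script rule for Latin and relies on its FL.wikt checks instead, but would add explicit checks if they turned out to be needed. If such a rule were ever wanted, the Latin-versus-Latvian distinction from the macron thread could be expressed with Unicode normalization, as in this Python sketch. The MACRONS_ARE_DICTIONARY_ONLY set and the page_name function are hypothetical, not part of Tbot.

```python
import unicodedata

# Hypothetical set of language codes whose macrons are only dictionary
# aids and not part of the standard orthography (Latin, but not Latvian).
MACRONS_ARE_DICTIONARY_ONLY = {"la"}

COMBINING_MACRON = "\u0304"


def page_name(form: str, lang: str) -> str:
    """Return the page title for a translation form in the given language.

    Macrons are removed only for languages in MACRONS_ARE_DICTIONARY_ONLY;
    for all others (e.g. Latvian, "lv") the form is returned unchanged.
    """
    if lang not in MACRONS_ARE_DICTIONARY_ONLY:
        return form
    # NFD splits e.g. "ī" into "i" + U+0304 COMBINING MACRON.
    decomposed = unicodedata.normalize("NFD", form)
    stripped = "".join(ch for ch in decomposed if ch != COMBINING_MACRON)
    # Recompose any other diacritics that belong to the word.
    return unicodedata.normalize("NFC", stripped)


print(page_name("alternīs", "la"))   # alternis
print(page_name("lietvārds", "lv"))  # lietvārds
```

Decomposing with NFD and recomposing with NFC drops only the combining macron, so any other diacritics in a form survive intact.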

Decision