Wiktionary:Votes/bt-2006-03/Request for bot status: TheCheatBot

Discussion moved from Wiktionary:Beer parlour/2006/March#Request for 'bot status User:TheCheatBot.
  • Name: User:TheCheatBot
  • Owner: User:Connel MacKenzie
  • Purpose: Upload inflected forms (only if missing), translation entries (only if missing), redirects
  • Method:
    1. Analyze the "latest" XML dump.
    2. Find all words that are linked (in text or in any template).
    3. For undefined entries, use a preload template to generate text.
    4. Magic: Upload using the pywikipedia tool pagesfromfile.py, in the following order:
      1. Simple noun plurals that are not also third person verbs.
      2. Comparatives.
      3. Superlatives.
      4. Third person verb forms that are not also noun plurals.
      5. Present participles that are not also adjectives (verified manually against a list).
      6. Simple past and past participle forms that are regular, i.e. identical to each other.
      7. Foreign-language entries, from the list of de-wikified languages in WT:ELE.
      8. Noun plurals that are also third person verb forms.
      9. Fill in uppercase/lowercase missing redirects.

Note that for all nine tasks, each task will finish before moving on to the next.
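The selection and upload steps for task 1 (simple noun plurals) can be sketched in Python. This is a minimal illustration, not the bot's actual code: it assumes the XML dump has already been parsed into a lemma-to-plural mapping, a set of third-person verb forms and a set of existing titles. The function names and the entry wording are hypothetical, and the `{{-start-}}`/`{{-stop-}}` page delimiters, with the first bold text naming each page, are our understanding of pagesfromfile.py's defaults rather than something stated here.

```python
"""Hypothetical sketch of TheCheatBot task 1: simple noun plurals.

Assumes the dump is already parsed into plain Python collections.
Names and entry wording are illustrative only.
"""

# Assumed default page delimiters of pywikipedia's pagesfromfile.py.
START, STOP = "{{-start-}}", "{{-stop-}}"


def select_missing_plurals(noun_plurals, third_person_verbs, existing_titles):
    """Return {plural: lemma} for plurals with no entry that are not verb forms."""
    selected = {}
    for lemma, plural in sorted(noun_plurals.items()):
        if plural in existing_titles:
            continue  # upload only if missing
        if plural in third_person_verbs:
            continue  # e.g. "runs": deferred to task 8, which needs both senses
        selected[plural] = lemma
    return selected


def entry_text(plural, lemma):
    """Plain wikitext for a plural entry; no templates are used."""
    return (
        "==English==\n\n"
        "===Noun===\n"
        f"'''{plural}'''\n\n"
        f"# Plural of [[{lemma}]].\n"
    )


def to_pagesfromfile(selected):
    """Join all entries into one input file, one delimited block per page."""
    blocks = [f"{START}\n{entry_text(p, l)}{STOP}" for p, l in selected.items()]
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    plurals = {"cat": "cats", "run": "runs", "dog": "dogs"}
    verbs = {"runs", "walks"}
    existing = {"dogs"}
    chosen = select_missing_plurals(plurals, verbs, existing)
    print(chosen)  # {'cats': 'cat'}
    print(to_pagesfromfile(chosen))
```

Keeping selection separate from formatting matches the plan of finishing each task before the next: the same filter, with the verb-form check inverted, would yield the candidate list for task 8.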

VOTE:

  • Approve 'bot flag:

  • Deny 'bot flag:
    1. This bot does too many things, including imposing formats and templates. That does not give alternative formats a fair and equal chance. Eclecticology 18:08, 13 March 2006 (UTC)
      I find your complaints very irrational. You've packed so many misconceptions into two little sentences that I'm baffled and unsure where to begin. --Connel MacKenzie T C 00:30, 14 March 2006 (UTC)
      • OK, the main thing I do not understand about your complaint is the "imposing formats" (no, I'm not!) "and templates" part. Before you made your comment, I had finished testing the first hundred or so entries. During that test, I refined the logic to exclude noun plurals that are also third-person verb forms; in the initial test, five slipped through. But I invite you to point out so much as one single entry where the 'bot code employed a template! When parsing current root forms, I honor ALL templates, including the ones created by Ncik that I loathe. --Connel MacKenzie T C 06:07, 14 March 2006 (UTC)
      • The second major misconception is that these are not inherently integral tasks. The parsing done on an XML dump naturally lends itself to combining them. For practical implementation, I hoped to solicit comments on each phase after completing the prior one. That way, as new XML dumps become available, the earlier iterations are folded in, and it is easier to see where to add comments. But since you insist they be broken apart, TheCheatBot is now resubmitted (in a separate Beer parlour section!) to perform only the noun plurals portion. --Connel MacKenzie T C 06:07, 14 March 2006 (UTC)
      • The implication that this somehow "imposes" anything is quite far-fetched! I currently enter these manually. I thought I was being nice by trying to do them outside my user account, to avoid accusations of artificially boosting my edit count or other such nonsense. Furthermore, I prefer to use my semi-automated edits only when human review is required. For all these entries, no such review is needed (beyond pre-scanning the generated lists before each run). --Connel MacKenzie T C 06:07, 14 March 2006 (UTC)
    2. I support this effort, but there are a few concerns with the automation (see below). There are other additions that could be made to fill out the pages more completely, including an etymology and a note not to leave translations for inflected senses. Also, I'm unsatisfied with the format (see below), but for me that's not a barrier to activation, which is fortunate since it seems no one wants to make concessions on style here anyway. Davilla 19:51, 13 March 2006 (UTC)
      Thanks for the support. --Connel MacKenzie T C 00:30, 14 March 2006 (UTC)
    3. Having these redirects is wrong. Proposing a bot that will add them is asking me to vote against it. GerardM 20:18, 13 March 2006 (UTC)
      They are here by English Wiktionary consensus. We have headwords and redirects; the redirects are not "spelling errors" as you continually assert, and having them 'bot-added is simply because no human wants to sit around entering them. But if they are in place, we frustrate fewer visitors. --Connel MacKenzie T C 00:30, 14 March 2006 (UTC)
      The massive number of redirects that we have from capitalized to de-capitalized titles is a by-product of our changeover to first-letter case sensitivity. They are kept because of a desire by some to protect links that already existed before the changeover. This should not be viewed as a green light to enter more such redirects. Eclecticology 20:02, 14 March 2006 (UTC)
  • Comments:
  • Note that denying the 'bot flag will only flood Special:Recentchanges. --Connel MacKenzie T C 02:23, 13 March 2006 (UTC)
re: Fill in uppercase/lowercase missing redirects. - I would vote against doing that. We have a policy (of sorts!): a word is only capitalized in special circumstances. So why put in tens of thousands of simple redirects that go against that rule? If a link is mistakenly capitalized (i.e. the link would succeed if it were not capitalized), the preferred action is to de-capitalize the mistakenly capitalized link, not to add a redirect. --Richardb 08:26, 13 March 2006 (UTC)
The idea for the redirects is to assist Wikipedia (and Wikipedia-like) external links that have auto-capitalization correctly turned on. With a 'bot able to do it for us (note: without affecting the entry count), why shouldn't it be considered as an option? --Connel MacKenzie T C 00:30, 14 March 2006 (UTC)
I support redirects from lower-case to upper-case entries, as the typical user will expect searches to be case-insensitive. I don't see any point at all in doing them in the opposite direction. — Paul G 10:04, 13 March 2006 (UTC)
Sounds like good policy to me. Davilla 19:51, 13 March 2006 (UTC)
I can abide by this, but I'd still like the previous possibility discussed. --Connel MacKenzie T C 00:30, 14 March 2006 (UTC)
Actually, shouldn't upper-case be redirected to lower-case, since "Something" at the beginning of a sentence means "something"? But "villareal" is a misspelling of "Villareal", since the name should always be capitalized. Anyway, the search mechanism already handles this. I would support deleting all capitalization redirects, at minimum for single words (no spaces). Davilla 03:19, 15 March 2006 (UTC)
That is demonstrably false. Wikified links do not search. External links do not search. Sister project wikifications do not search. Sister project links do not search. Mirror sites that politely/correctly refer back here do not search. (Note: most mirrors are using inherently out-dated XML dumps.) The search logic was modified as a stopgap, but the fundamental problem still exists: Wiktionary no longer (for the last 9 months) has headwords, so redirects function only as navigation aids. Redirects do not count against the entry count. Deleting redirects is just stupid. The only thing deleting a redirect accomplishes (vandalism aside) is making the site en.wiktionary.org less useful. --Connel MacKenzie T C 05:20, 15 March 2006 (UTC)
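Paul G's narrower variant of task 9, creating a redirect only from a missing lower-case title to an existing capitalized entry, reduces to a small set operation. The sketch below is hypothetical: it assumes a plain list of existing titles taken from the dump, the function name is illustrative, and it changes only the first letter because first-letter case sensitivity is the changeover that created the problem.

```python
# Hypothetical sketch of task 9 restricted to Paul G's direction:
# a missing lower-case title redirects to an existing capitalized entry.
# Titles are assumed to come from the parsed XML dump.


def lowercase_redirects(titles):
    """Return {missing lower-case title: redirect wikitext}."""
    existing = set(titles)
    redirects = {}
    for title in sorted(existing):
        first = title[:1]
        if not first.isupper():
            continue  # only capitalized entries receive an incoming redirect
        lower_title = first.lower() + title[1:]
        # Skip when a real entry already exists, e.g. "march" for "March".
        if lower_title not in existing:
            redirects[lower_title] = f"#REDIRECT [[{title}]]"
    return redirects


if __name__ == "__main__":
    print(lowercase_redirects(["Villareal", "March", "march", "cat"]))
    # {'villareal': '#REDIRECT [[Villareal]]'}
```

Checking against the full title set before adding a redirect is what keeps this from clobbering genuine lower-case entries.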
  • More thought goes into this when new entries are created manually for the inflections. Here are a few things that I can think of. For comparatives, sometimes both "adj+er" and "more adj" are acceptable, as in stupider = more stupid, and this should be reflected as a synonym on the inflected page. Similarly for superlatives. (Cases where these have other definitions are rare.)
As with present participles, past participles that might have additional adjective senses, e.g. tired, would be left as stubs by a bot. It would almost be better to have nothing there, and know as much, than to have to comb through looking for obvious definitions that the bot left out. The best solution, of course, is to also verify past participles against a list.
As to present participles themselves, there's also the debate about gerunds that came up between us before, where you sided with leaving them out. I still think the majority of "v+ing" forms can be nouns, as a simple transformation of the sentence "I like to v..." into "v+ing... is fun" attests. Davilla 19:51, 13 March 2006 (UTC)
  • I'm not crazy about the language "third person singular simple present". Is there an alternative? I prefer "Simple past of verb" and "Past participle of verb", using the base form of the verb, to "Simple past and past participle of to verb". The infinitive form is more commonly recognized in Romance languages, and different lines are potentially needed to differentiate regional use. In general, a distinction should be made between a definition that equates to the title (preferred) and, as here, a definition that describes it, e.g. "A word that means...". Davilla 19:58, 13 March 2006 (UTC)
Davilla, I'll respond to these concerns more clearly elsewhere. --Connel MacKenzie T C 00:30, 14 March 2006 (UTC)
  • Great idea, but not quite yet. My entirely frivolous reason: our 60,000+ real-ish English articles are within spitting distance of the 70,000 entries claimed by a bunch of mid-sized dictionaries, like the Concise Oxford. I'd find it more emotionally satisfying to cross that threshold honestly, dance a jig, throw a party, then start cheating like hell. Keffy 22:35, 13 March 2006 (UTC)
  • More seriously: Comparative adjectives should be skipped and done manually if the stem is also a verb, since those will likely also have an agentive noun that will need a human-written definition. Keffy 22:35, 13 March 2006 (UTC)
Keffy, we already have thousands of such entries. Are you able to exclude them in a consistent manner now, when generating your statistics?
"TheCheatBot" is so named in honor of homestarrunner.com's StrongBad e-mail making me laugh for minutes - it is not about cheating the count.
For noun plurals, comparatives, and superlatives, yes, I skip them outright if they share a verb inflection. --Connel MacKenzie T C 00:30, 14 March 2006 (UTC)
What was someone saying earlier about Beer Parlouring being addictive? I actually found inputting inflected forms a useful way of getting used to Wiktionary, but there are way too many to contemplate doing them all manually. Let's get on with it; how can we let FR be ahead of EN? :-) MGSpiller 02:10, 14 March 2006 (UTC)
I really don't care which project has the most articles. IMHO quality is more important than quantity. In theory, if completed projects were possible, and they all included all words in all languages, they should all have the same number of articles. :-) Eclecticology 20:02, 14 March 2006 (UTC)

I read the strike-throughs above as indications that this bot in this form has been withdrawn and replaced by the series of narrower-scoped bots described below. Further comments should preferably be put in the relevant section below. Eclecticology 20:02, 14 March 2006 (UTC)