Wiktionary talk:Todo

Common misspellings

A great many misspellings occur in our entries, even in headers. See Wiktionary:Todo/Misspellings, and add to it. Then, we can periodically search for and eliminate instances of the listed misspellings. - -sche (discuss) 05:40, 2 March 2014 (UTC)

Stray spaces

Stray spaces appear in a number of predictable/easily-findable circumstances, such as this. - -sche (discuss) 06:55, 13 July 2014 (UTC)

Instances of people starting with pl2= or other numbered parameters >1

According to TemplateTiger, this was the only example of someone using {{en-proper noun}} and starting with pl2= rather than with the unnamed first parameter. It might be fruitful to check whether other templates have been used in the same way, i.e. with a second plural form ("pl2=") declared before any first plural was declared, particularly if (as here) no first plural is automatically displayed, and/or pl2= is set to whatever the automatically displayed plural would be. - -sche (discuss) 19:23, 1 August 2014 (UTC)
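
A rough sketch of how such a check might be run over raw wikitext, for anyone working from a dump rather than TemplateTiger. The function name `find_orphan_numbered_plurals` is made up for illustration, and the regex only sees simple, non-nested template calls (a link containing a pipe inside a parameter would confuse it), so treat the output as candidates for review.

```python
import re

# Matches a simple (non-nested) template call such as {{en-proper noun|pl2=Foos}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}]*)?)\}\}")
# Parameter names pl2, pl3, ... (any number greater than 1).
LATER_PLURAL_RE = re.compile(r"pl([2-9]|[1-9]\d+)")


def find_orphan_numbered_plurals(wikitext, template_name):
    """Return calls to template_name that set plN= (N > 1) without giving
    a first plural as an unnamed first parameter (or as an explicit 1=)."""
    hits = []
    for match in TEMPLATE_RE.finditer(wikitext):
        if match.group(1).strip() != template_name:
            continue
        params = match.group(2).split("|")[1:]
        names = [p.split("=", 1)[0].strip() for p in params if "=" in p]
        has_first = "1" in names or any(
            "=" not in p and p.strip() for p in params
        )
        has_later = any(LATER_PLURAL_RE.fullmatch(n) for n in names)
        if has_later and not has_first:
            hits.append(match.group(0))
    return hits
```

Passing the template name as an argument makes it easy to repeat the check for other headword templates.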

Tool for manually finding misspelt or unsupported parameters

TemplateTiger is a good tool for finding which entries use certain parameters of a given template, regardless of whether or not the template supports those parameters. Among other things, this can allow one to find misspelt or mistaken parameters, like the "compound=" or "current=" parameters formerly used in save-all and barrel roll, or the following misspellings of "head=": "head]", "haed", "hwad", "heead". - -sche (discuss) 20:35, 1 August 2014 (UTC)
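
If TemplateTiger is unavailable, the same kind of check could be approximated offline. The sketch below is only an illustration: `unsupported_parameters` is a hypothetical helper, the set of supported names has to be supplied by hand for each template, and the regex handles only simple, non-nested template calls.

```python
import re

# Matches a simple (non-nested) template call and captures name and parameters.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}]*)?)\}\}")


def unsupported_parameters(wikitext, template_name, supported):
    """Return the named parameters passed to template_name that are not
    in the `supported` set (which the caller must provide)."""
    unknown = []
    for match in TEMPLATE_RE.finditer(wikitext):
        if match.group(1).strip() != template_name:
            continue
        for param in match.group(2).split("|")[1:]:
            if "=" not in param:
                continue  # unnamed (positional) parameter
            name = param.split("=", 1)[0].strip()
            if name not in supported:
                unknown.append(name)
    return unknown
```

For example, `unsupported_parameters("{{en-noun|haed=save all}}", "en-noun", {"head"})` reports `haed`; the supported set `{"head"}` here is a stand-in, not the template's real parameter list.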

Middle dots as decimal points

Doremítzwr seems to have used middle dots as decimal points (??) in some entries, e.g. the depth measurement here. These should be located and cleaned up. - -sche (discuss) 03:59, 20 August 2014 (UTC)
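
One way to locate these might be a search for a middle dot (U+00B7) sitting directly between two digits. The helper names below are illustrative. A middle dot between digits can also be a multiplication sign, so the hits would need a human look before anything is replaced.

```python
import re

# A middle dot (U+00B7) directly between two digits, e.g. "3·5".
MIDDLE_DOT_BETWEEN_DIGITS_RE = re.compile(r"(?<=\d)\u00b7(?=\d)")


def find_middle_dot_decimals(text):
    """Return the digit groups in `text` that are joined by a middle dot."""
    return re.findall(r"\d+\u00b7\d+", text)


def replace_middle_dot_decimals(text):
    """Replace a middle dot between digits with an ordinary full stop."""
    return MIDDLE_DOT_BETWEEN_DIGITS_RE.sub(".", text)
```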

Look for brackets in the displayed text of pages

If someone could examine the displayed text of pages (as opposed to the wikitext) and look for instances of {{, [{, {[, [[, }}, ]}, }], or ]], that would probably be informative. I imagine most occurrences of such strings are the result of mismatched brackets or bot-errors breaking templates across lines. - -sche (discuss) 20:11, 27 August 2014 (UTC)

I suppose one would have to have some sort of local wiki markup parser to do this. DTLHS (talk) 20:36, 27 August 2014 (UTC)

Is mwparserfromhell of use? - -sche (discuss) 20:54, 27 August 2014 (UTC)

I believe mwparserfromhell will only give you valid templates / links; it's not going to tell you if something is malformed. DTLHS (talk) 01:04, 28 August 2014 (UTC)

Wiktionary:Todo/bad links. I just looked for lines where the number of occurrences of "[[" doesn't match that of "]]". Technically links can extend over multiple lines (but they probably shouldn't). Looking for malformed templates is much harder. DTLHS (talk) 03:15, 29 August 2014 (UTC)

That looks like a very useful list; thank you! I've cleaned up a few entries already. One idea I may suggest in the GP is that we try to make an abuse filter that tags edits that leave a page with more [[s than ]]s or {{s than }}s, to alert us to new instances. I think abuse filters can do that; there's one that warns people against <ref> without <references/>. - -sche (discuss) 06:52, 29 August 2014 (UTC)
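
For reference, the per-line count described above, and the whole-page comparison suggested for a filter, could look roughly like this in Python. It is not AbuseFilter syntax and not necessarily how the list was generated; the function names are placeholders.

```python
def unbalanced_lines(wikitext, opener="[[", closer="]]"):
    """Return (line_number, line) pairs where the number of openers on a
    line differs from the number of closers on that line."""
    return [
        (number, line)
        for number, line in enumerate(wikitext.splitlines(), start=1)
        if line.count(opener) != line.count(closer)
    ]


def has_more_openers(wikitext, opener, closer):
    """True if the page as a whole has more openers than closers."""
    return wikitext.count(opener) > wikitext.count(closer)
```

The per-line version flags legitimate multi-line links as well, which matches the caveat already given; the whole-page version does not, but it cannot say where the problem is.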

Commas after `{{circa}}`

A bot could check for and remove commas after {{circa}} (which itself adds a comma, making an additional comma superfluous), like so. - -sche (discuss) 19:43, 10 June 2015 (UTC)
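
A minimal sketch of the substitution such a bot might make, assuming the comma directly follows the closing braces and the template call contains no nested templates. The function name is illustrative.

```python
import re

# {{circa}} or {{circa|...}} (no nested templates) immediately followed by a comma.
CIRCA_COMMA_RE = re.compile(r"(\{\{circa(?:\|[^{}]*)?\}\}),")


def strip_comma_after_circa(wikitext):
    """Remove a comma that directly follows a {{circa}} call."""
    return CIRCA_COMMA_RE.sub(r"\1", wikitext)
```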

Random excessive whitespace

Like this. - -sche (discuss) 21:47, 20 June 2015 (UTC)

Latin infinitives glossed as first-person forms

I've noticed several entries like this one, where the infinitive (not the first-person form) of a Latin word is given, but it is glossed as a first-person form. This is obviously incorrect regardless of whether one prefers to lemmatize infinitives or first-person forms. - -sche (discuss) 02:50, 29 June 2015 (UTC)

Untemplatized links to dictionaries

Should be found and templatized like [1]. I will try to do this myself. - -sche (discuss) 02:19, 7 July 2015 (UTC)

English terms spelled with Æ/Œ not marked as archaic/obsolete

For example, [2]. Some are valid (Æsir) but most are not. - -sche (discuss) 05:37, 30 July 2015 (UTC)

@-sche User:DTLHS/cleanup/english ae oe DTLHS (talk) 20:15, 20 August 2015 (UTC)

Thank you! If it's not too difficult, would it be possible to remove inflected forms of lemmas which also have Æ/Œ (e.g. œcologies, plural of œcology)? In such cases, it's sufficient that the lemma be marked; the plurals are generally not any more obsolete than the lemmas. Plurals of lemmas that don't contain Æ/Œ (e.g. cassiæ, plural of cassia) should stay on the list, since in those cases the plurals usually are more obsolete than other possible plurals. If that's too much bother, don't worry about it; I'll go through the entries on the list with AWB and can easily ignore œcologies-type entries. - -sche (discuss) 22:24, 20 August 2015 (UTC)

I don't really have an easy way to distinguish them, sorry. DTLHS (talk) 22:35, 20 August 2015 (UTC)

Since I’m responsible for a sizeable chunk of these, I feel obligated to express my regret that I’m making you clean these up. I was pretty ignorant and immature back then, but I realise now that I was acting inappropriately. --Romanophile (talk) 08:47, 23 August 2015 (UTC)

Miscellaneous additional periodic tasks

For reasons discussed in this old thread, some uses of the label "proscribed" on entries already labelled "colloquial", "informal", etc are not sensible. Searches like insource:"lb en informal proscribed", insource:"lb en proscribed informal" etc catch them. - -sche (discuss) 20:42, 4 February 2018 (UTC)
Check for miscapitalized labels; see Wiktionary:Grease pit/2015/August#Miscapitalized_labels. - -sche (discuss) 21:07, 4 February 2018 (UTC)
Watch out for quotations which are on the same line as their bibliographic particulars (an old list was at Wiktionary:Todo/Single-line quotes; entries of the sort seem to persist). - -sche (discuss) 14:30, 19 February 2018 (UTC)
Check that no more WT:Todo/Linked language names in trans tables remain. - -sche (discuss) 14:30, 19 February 2018 (UTC)
Watch out for instances of "from from", and (a bigger problem on WP) "an" + consonant sounds that aren't used with "an" in any standard dialect (e.g. I just fixed one each of "an special" and "an school"). - -sche (discuss) 14:30, 19 February 2018 (UTC)

The label "idiomatic" is arguably useless (Wiktionary:Beer parlour/2014/January#(idiomatic)). The labels "dialect(al)" and "regional" should generally be replaced with more specific information (I brought this up in the BP or maybe GP but I forget precisely where). - -sche (discuss) 23:13, 3 March 2018 (UTC)
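
The "from from" and "an" + consonant checks in the list above could be approximated with two regular expressions, sketched here under loose assumptions. The second one works on spelling rather than sound, so it will flag valid cases such as "an hour" or "an MP", and the first will flag legitimate repeats such as "had had"; both only produce candidates for a human to review.

```python
import re

# The same word twice in a row, e.g. "from from".
DOUBLED_WORD_RE = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)
# "an" followed by a word that starts with a consonant letter (spelling only).
AN_CONSONANT_RE = re.compile(r"\ban\s+([b-df-hj-np-tv-z]\w*)", re.IGNORECASE)


def doubled_words(text):
    """Return each word that is immediately repeated in `text`."""
    return [m.group(1) for m in DOUBLED_WORD_RE.finditer(text)]


def an_before_consonant(text):
    """Return each consonant-initial word that directly follows "an"."""
    return [m.group(1) for m in AN_CONSONANT_RE.finditer(text)]
```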

Misformatted/indented quotes

Maybe this is already covered here, e.g. Special:Diff/48477724. – Jberkel 21:39, 4 February 2018 (UTC)

It's not easy to determine automatically that a line contains a quote and not some other type of content. DTLHS (talk) 21:45, 4 February 2018 (UTC)

No, it isn't, but we could at least catch some specific types of malformation like the one in that diff, probably even with an edit filter (what do you think, @Chuck Entz?). #: '''[0-9][0-9][0-9][0-9]''' is another red flag (though it may have some false positives?). - -sche (discuss) 21:59, 4 February 2018 (UTC)

@Jberkel User:DTLHS/cleanup/quote template line starts. Disclaimers: not all of these are errors, this is only for English entries, and this won't catch anything that doesn't use a quote-X template. DTLHS (talk) 22:42, 4 February 2018 (UTC)

@DTLHS: that's a good start, thanks, I'll work my way through the list. Jberkel 22:52, 4 February 2018 (UTC)

@DTLHS Done. Will this list get regenerated from the next dump? – Jberkel 10:24, 6 February 2018 (UTC)

Sure, if you want me to. DTLHS (talk) 16:18, 6 February 2018 (UTC)

@DTLHS could you run this again with the new dump? ta. – Jberkel 23:40, 23 February 2018 (UTC)

@Jberkel Done. DTLHS (talk) 01:58, 24 February 2018 (UTC)

@DTLHS again, please. – Jberkel 10:17, 24 March 2018 (UTC)

@Jberkel Updated. DTLHS (talk) 23:01, 25 March 2018 (UTC)

@DTLHS: thanks, I'm slowly getting there. What is the point of the quotations header, isn't it redundant with Citations: and the sense-related quotes? – Jberkel 23:40, 25 March 2018 (UTC)

Probably, it's just a lot of work to get rid of it. DTLHS (talk) 23:41, 25 March 2018 (UTC)

@DTLHS: As a first step, couldn't we (semi-automatically?) shove them under the carpet (aka Citations:), then deprecate the L4 header? – Jberkel 10:52, 26 March 2018 (UTC)

I don't like using citation pages to hold quotations that could be underneath sense lines. DTLHS (talk) 18:22, 26 March 2018 (UTC)

Right, but that's 100% manual work. Best to leave it then. – Jberkel 19:32, 26 March 2018 (UTC)
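
For the record, the red-flag pattern mentioned earlier in this thread (a "#:" line that opens with a bolded four-digit year) could be checked against a dump along these lines. Allowing more than one "#" and optional whitespace is an assumption that goes slightly beyond the pattern as originally written, and the function name is illustrative.

```python
import re

# A "#:" (or "##:" etc.) line whose content starts with a bolded four-digit year.
YEAR_ON_EXAMPLE_LINE_RE = re.compile(r"#+:\s*'''[0-9]{4}'''")


def suspicious_quote_lines(wikitext):
    """Return lines that look like quotations placed on a "#:" line."""
    return [
        line
        for line in wikitext.splitlines()
        if YEAR_ON_EXAMPLE_LINE_RE.match(line)
    ]
```

As noted above, false positives are possible, so the result is a review list rather than a fix list.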

English multiple etymologies, categorized

Possible task: find pages in Category:English terms with multiple etymologies that don't have multiple etymology sections (checking for pages that don't contain "=Etymology 2=" seems like one obvious way of doing that). Such pages should probably be removed from the category. In the other direction, look for pages that do have "=Etymology 2=" within an English section and aren't in this category yet. - -sche (discuss) 06:42, 30 May 2018 (UTC)
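
A sketch of both directions of this check, assuming the page texts and the category membership have already been pulled from a dump. The inputs are a dict of title to wikitext and a set of titles; all function names are placeholders.

```python
import re

# An "Etymology 2" header at any level, on its own line.
ETYMOLOGY_2_RE = re.compile(r"^=+[ \t]*Etymology 2[ \t]*=+[ \t]*$", re.MULTILINE)
# A level-2 (language) header such as ==English==.
L2_HEADER_RE = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def language_section(wikitext, language):
    """Return the text of the level-2 section for `language`, or ""."""
    headers = list(L2_HEADER_RE.finditer(wikitext))
    for i, header in enumerate(headers):
        if header.group(1).strip() == language:
            end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
            return wikitext[header.end():end]
    return ""


def has_multiple_etymologies(wikitext, language="English"):
    """True if the language section contains an "Etymology 2" header."""
    return bool(ETYMOLOGY_2_RE.search(language_section(wikitext, language)))


def category_mismatches(pages, category_members, language="English"):
    """Return (in category but no Etymology 2, Etymology 2 but not in category)."""
    flagged = {
        title
        for title, text in pages.items()
        if has_multiple_etymologies(text, language)
    }
    return sorted(category_members - flagged), sorted(flagged - category_members)
```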

Male and female given names separately on same line

...should be combined like this. I will try to search for instances of this myself later. - -sche (discuss) 21:56, 15 June 2018 (UTC)

Form-of templates used as etymologies

...like this, should be cleaned up. - -sche (discuss) 00:14, 27 January 2020 (UTC)

Labels with wrong language code

As here. (Should try to catch these systematically.) - -sche (discuss) 18:33, 29 September 2020 (UTC)

Words with the "religion" label specific to one religion

Many entries with the {{lb|en|religion}} label are specific to Christianity (or rarely to another religion such as Buddhism) and should use the more specific label instead, for example "use". (Several other entries should not use the label at all, like Jew or Calvinist.) - -sche (discuss) 10:55, 26 November 2020 (UTC)

RFC discussion: June 2018

The following discussion has been moved from Wiktionary:Requests for cleanup (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Modern words "borrowed" from proto-languages

I just listed this as a WT:TODO task because I expect it'll keep being an issue even after we fix the existing cases, but: numerous entries in "Terms borrowed from Proto-Foo" categories (like Category:Terms borrowed from Proto-Slavic) were not actually "borrowed" by the L2 language in the way we use the word; see e.g. here. (Surprisingly, one English word apparently was borrowed from Proto-Indo-European, ghrelin.) - -sche (discuss) 08:37, 11 June 2018 (UTC)

Words that are obviously gendered but not defined as such

https://en.wiktionary.org/w/index.php?title=hatchet_man&diff=66526741&oldid=66526489 - Fish bowl (talk) 09:58, 28 April 2022 (UTC)

T:en-plural nouns that (maybe) aren't

There is a longstanding issue of how to handle group names like "the Abenaki", "the Venda", etc. Many are listed as plural-only using the template above, but there are in fact cites of the singulars ("a Venda") and of the plurals ("Vendas"), so my impression is that these are supposed to be recast as singulars which can have either regular plurals (Vendas) or invariant plurals (Venda). I spy quite a few of these at Special:WhatLinksHere/Template:en-plural_noun. - -sche (discuss) 21:37, 13 August 2022 (UTC)

""double quotes"" in T:qfliteral

I came across one entry, and did a database dump search and found three more entries, which 'manually' wrote quotation marks inside T:qfliteral, which itself adds quotation marks, resulting in ""double""; this might be worth checking for once a year or something. είναι κινέζικα για μένα, αυτά μου φαίνονται κινέζικα, εντελώς αβέβαιο, durante beneplacito. - -sche (discuss) 06:48, 3 January 2024 (UTC)
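
A yearly check of this sort could be as simple as the sketch below. It takes the template name as an argument (defaulting to the shortcut name used in this section, which may differ from the name that appears in wikitext) and flags calls where any argument starts or ends with a quotation mark. The function name and the set of quote characters are assumptions.

```python
import re

# Straight and curly double quotation marks.
QUOTES = ('"', "\u201c", "\u201d")


def quoted_template_args(wikitext, template_name="qfliteral"):
    """Return calls to template_name in which some argument begins or ends
    with a quotation mark. Only non-nested calls are considered."""
    pattern = re.compile(r"\{\{" + re.escape(template_name) + r"\|[^{}]*\}\}")
    hits = []
    for match in pattern.finditer(wikitext):
        args = [a.strip() for a in match.group(0)[2:-2].split("|")[1:]]
        if any(a.startswith(QUOTES) or a.endswith(QUOTES) for a in args):
            hits.append(match.group(0))
    return hits
```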

find duplicate definitions

Spitballing: a recurring issue is that when an entry has multiple etymology sections, people only look at the first one and, if not seeing the sense they seek, add it there or add it as a new etymology section (without noticing it is already present in another etymology section). Examples that I can find offhand are the cases de-duplicated in diff of e and diff of linn. I wonder if we could make a list of entries (in a given language: say we start with English) that have multiple etymology sections, and then winnow it to only cases where the definitions in ety 1 and 2 have "important" words in common, for example by 1. retaining cases where definitions had any words in common other than words on a list of "unimportant" words like "the", "of", "and", "for", "from", etc (and also excluding where one of the definitions was a non-gloss like "past tense of foo"), and 2. looking at the results and expanding the list of "unimportant" words, thus progressively winnowing the list of entries that have "important" words in common until it's a manageable size to put the "sets of definitions with words in common" on a page and let a human look them over and spot duplicates. Not saying this is a priority, and not sure if it's feasible, but I'm mentioning the idea. - -sche (discuss) 17:53, 2 May 2024 (UTC)
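
To make the idea concrete, here is a rough sketch of step 1, assuming the definitions of each etymology section have already been extracted as plain-text strings. The stop-word list, the non-gloss pattern and the function names are all placeholders to be tuned; step 2 would amount to growing `UNIMPORTANT` after looking at the output.

```python
import re

# Starting list of "unimportant" words; meant to be expanded iteratively.
UNIMPORTANT = {"the", "of", "and", "for", "from", "a", "an", "to", "in", "or"}
# Crude test for non-gloss definitions such as "past tense of foo".
NON_GLOSS_RE = re.compile(
    r"\b(?:past tense|past participle|plural) of\b", re.IGNORECASE
)


def important_words(definition, unimportant=UNIMPORTANT):
    """Return the set of lower-cased words in a definition, minus stop words."""
    words = re.findall(r"[a-z]+", definition.lower())
    return {w for w in words if w not in unimportant}


def shared_definitions(ety1_defs, ety2_defs, unimportant=UNIMPORTANT):
    """Return (def1, def2, shared_words) for each pair of gloss definitions,
    one from each etymology section, that share an important word."""
    glosses1 = [d for d in ety1_defs if not NON_GLOSS_RE.search(d)]
    glosses2 = [d for d in ety2_defs if not NON_GLOSS_RE.search(d)]
    pairs = []
    for d1 in glosses1:
        for d2 in glosses2:
            shared = important_words(d1, unimportant) & important_words(d2, unimportant)
            if shared:
                pairs.append((d1, d2, sorted(shared)))
    return pairs
```

Comparing every pair is quadratic in the number of definitions, which should be fine per entry since each entry has only a handful of senses.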
