Wiktionary talk:Todo


Archive

Old discussions have been archived to Wiktionary talk:Todo/archive.

Common misspellings

A great many misspellings occur in our entries, even in headers. See Wiktionary:Todo/Misspellings, and add to it. Then, we can periodically search for and eliminate instances of the listed misspellings. - -sche (discuss) 05:40, 2 March 2014 (UTC)

Stray spaces

Stray spaces appear in a number of predictable/easily-findable circumstances, such as this. - -sche (discuss) 06:55, 13 July 2014 (UTC)

Instances of people starting with pl2= or other numbered parameters >1

According to TemplateTiger, this was the only example of someone using {{en-proper noun}} and starting with pl2= rather than with the unnamed first parameter. It might be fruitful to check if other templates have been used in the same way, i.e. with a second plural form ("pl2=") declared before any first plural was declared, particularly if (as here) no first plural is automatically displayed, and/or pl2= is set to whatever the automatically displayed plural would be. - -sche (discuss) 19:23, 1 August 2014 (UTC)

Tool for manually finding misspelt or unsupported parameters

TemplateTiger is a good tool for finding which entries use certain parameters of a given template, whether or not the template supports those parameters. Among other things, this can allow one to find misspelt or mistaken parameters, like the "compound=" or "current=" parameters formerly used in save-all and barrel roll, or the following misspellings of "head=": "head]", "haed", "hwad", "heead". - -sche (discuss) 20:35, 1 August 2014 (UTC)
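
A rough sketch of how misspelt parameter names like those could be flagged automatically, assuming one already has a list of the parameter names used with a template (for instance, copied out of TemplateTiger by hand) and a list of the names the template supports. It uses fuzzy matching from Python's difflib, which is a swapped-in technique and not something TemplateTiger itself does; the function name and the supported-name set below are hypothetical.

```python
import difflib

def suspect_parameters(used_names, supported_names, cutoff=0.6):
    """Map each unsupported parameter name to the supported name it most resembles."""
    suspects = {}
    for name in used_names:
        if name in supported_names:
            continue  # the template knows this parameter; nothing to report
        close = difflib.get_close_matches(
            name, sorted(supported_names), n=1, cutoff=cutoff
        )
        if close:
            suspects[name] = close[0]
    return suspects

# Hypothetical supported set; the misspellings are the ones quoted above.
supported = {"head", "sort"}
used = ["head", "haed", "hwad", "heead", "head]", "sort"]
print(suspect_parameters(used, supported))
```

Names that resemble no supported parameter are skipped by this function, so mistaken parameters of that kind would still need a plain "unsupported name" report.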

Middle dots as decimal points

Doremítzwr seems to have used middle dots as decimal points (??) in some entries, e.g. the depth measurement here. These should be located and cleaned up. - -sche (discuss) 03:59, 20 August 2014 (UTC)

Look for brackets in the displayed text of pages

If someone could examine the displayed text of pages (as opposed to the wikitext) and look for instances of {{, [{, {[, [[, }}, ]}, }], or ]], that would probably be informative. I imagine most occurrences of such strings are the result of mismatched brackets or bot-errors breaking templates across lines. - -sche (discuss) 20:11, 27 August 2014 (UTC)

I suppose one would have to have some sort of local wiki markup parser to do this. DTLHS (talk) 20:36, 27 August 2014 (UTC)
Is mwparserfromhell of use? - -sche (discuss) 20:54, 27 August 2014 (UTC)
I believe mwparserfromhell will only give you valid templates / links; it's not going to tell you if something is malformed. DTLHS (talk) 01:04, 28 August 2014 (UTC)
Wiktionary:Todo/bad links. I just looked for lines where the number of occurrences of "[[" doesn't match that of "]]". Technically links can extend over multiple lines (but they probably shouldn't). Looking for malformed templates is much harder. DTLHS (talk) 03:15, 29 August 2014 (UTC)
That looks like a very useful list; thank you! I've cleaned up a few entries already. One idea I may suggest in the GP is that we try to make an abuse filter that tags edits that leave a page with more [[s than ]]s or {{s than }}s, to alert us to new instances. I think abuse filters can do that; there's one that warns people against <ref> without <references/>. - -sche (discuss) 06:52, 29 August 2014 (UTC)
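
For illustration, the per-line count comparison DTLHS describes and the page-level comparison proposed for a filter could look roughly like the following. This is only a sketch over raw wikitext (not the displayed text), the function names are made up here, and it is not the script actually used to build Wiktionary:Todo/bad links.

```python
def unbalanced_link_lines(wikitext):
    """Return (line_number, line) pairs where "[[" and "]]" counts differ."""
    flagged = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        if line.count("[[") != line.count("]]"):
            flagged.append((number, line))
    return flagged

def bracket_surplus(wikitext):
    """Page-level opener-minus-closer counts, as a filter might compare them."""
    return {
        "links": wikitext.count("[[") - wikitext.count("]]"),
        "templates": wikitext.count("{{") - wikitext.count("}}"),
    }

# Hypothetical wikitext: the first line has an unclosed link.
sample = "# A [[cat]] is an [[animal.\n# See {{l|en|dog}}."
print(unbalanced_link_lines(sample))
print(bracket_surplus(sample))
```

As noted above, links that legitimately span several lines would be flagged by the per-line check, so its output is a list of candidates for a human to look over rather than a list of errors.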

Commas after {{circa}}

A bot could check for and remove commas after {{circa}} (which itself adds a comma, making an additional comma superfluous), like so. - -sche (discuss) 19:43, 10 June 2015 (UTC)
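
A minimal sketch of the substitution such a bot might make, assuming the template is written as {{circa|...}} or bare {{circa}} with no nested templates inside it. The regular expression and function name are illustrative, and the sample line is invented.

```python
import re

# Assumption: calls look like {{circa|1599}} or {{circa}}, with no nested templates.
CIRCA_COMMA = re.compile(r"(\{\{circa(?:\|[^{}]*)?\}\})[ \t]*,")

def strip_comma_after_circa(wikitext):
    """Drop a comma that directly follows a {{circa}} call."""
    return CIRCA_COMMA.sub(r"\1", wikitext)

# Invented example line; only the comma right after the template is removed.
line = "#* {{circa|1599}}, William Shakespeare, ''Hamlet''"
print(strip_comma_after_circa(line))
```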

Random excessive whitespace

Like this. - -sche (discuss) 21:47, 20 June 2015 (UTC)

Latin infinitives glossed as first-person forms

I've noticed several entries like this one, where the infinitive (not the first-person form) of a Latin word is given, but it is glossed as a first-person form. This is obviously incorrect regardless of whether one prefers to lemmatize infinitives or first-person forms. - -sche (discuss) 02:50, 29 June 2015 (UTC)

Untemplatized links to dictionaries

Should be found and templatized like [1]. I will try to do this myself. - -sche (discuss) 02:19, 7 July 2015 (UTC)

English terms spelled with Æ/Œ not marked as archaic/obsolete

For example, [2]. Some are valid (Æsir) but most are not. - -sche (discuss) 05:37, 30 July 2015 (UTC)

@-sche: User:DTLHS/cleanup/english ae oe. DTLHS (talk) 20:15, 20 August 2015 (UTC)
Thank you! If it's not too difficult, would it be possible to remove inflected forms of lemmas which also have Æ/Œ (e.g. œcologies, plural of œcology)? In such cases, it's sufficient that the lemma be marked; the plurals are generally not any more obsolete than the lemmas. Plurals of lemmas that don't contain Æ/Œ (e.g. cassiæ, plural of cassia) should stay on the list, since in those cases the plurals usually are more obsolete than other possible plurals. If that's too much bother, don't worry about it; I'll go through the entries on the list with AWB and can easily ignore œcologies-type entries. - -sche (discuss) 22:24, 20 August 2015 (UTC)
I don't really have an easy way to distinguish them, sorry. DTLHS (talk) 22:35, 20 August 2015 (UTC)
Since I’m responsible for a sizeable chunk of these, I feel obligated to express my regret that I’m making you clean these up. I was pretty ignorant and immature back then, but I realise now that I was acting inappropriately. --Romanophile (talk) 08:47, 23 August 2015 (UTC)

Miscellaneous additional periodic tasks

Misformatted/indented quotes

Maybe this is already covered here, e.g. Special:Diff/48477724. – Jberkel 21:39, 4 February 2018 (UTC)

It's not easy to determine automatically that a line contains a quote and not some other type of content. DTLHS (talk) 21:45, 4 February 2018 (UTC)
No, it isn't, but we could at least catch some specific types of malformation like the one in that diff, probably even with an edit filter (what do you think, @Chuck Entz?). #: '''[0-9][0-9][0-9][0-9]''' is another red flag (though it may have some false positives?). - -sche (discuss) 21:59, 4 February 2018 (UTC)
@Jberkel: User:DTLHS/cleanup/quote template line starts. Disclaimers: not all of these are errors, this is only for English entries, and this won't catch anything that doesn't use a quote-X template. DTLHS (talk) 22:42, 4 February 2018 (UTC)
@DTLHS: that's a good start, thanks, I'll work my way through the list. Jberkel 22:52, 4 February 2018 (UTC)
@DTLHS: done. Will this list get regenerated from the next dump? – Jberkel 10:24, 6 February 2018 (UTC)
Sure, if you want me to. DTLHS (talk) 16:18, 6 February 2018 (UTC)
@DTLHS: could you run this again with the new dump? ta. – Jberkel 23:40, 23 February 2018 (UTC)
@Jberkel: Done. DTLHS (talk) 01:58, 24 February 2018 (UTC)
@DTLHS: again, please. – Jberkel 10:17, 24 March 2018 (UTC)
@Jberkel: Updated. DTLHS (talk) 23:01, 25 March 2018 (UTC)
@DTLHS: thanks, I'm slowly getting there. What is the point of the quotations header? Isn't it redundant with Citations: and the sense-related quotes? – Jberkel 23:40, 25 March 2018 (UTC)
Probably, it's just a lot of work to get rid of it. DTLHS (talk) 23:41, 25 March 2018 (UTC)
@DTLHS: As a first step, couldn't we (semi-automatically?) shove them under the carpet (aka Citations:), then deprecate the L4 header? – Jberkel 10:52, 26 March 2018 (UTC)
I don't like using citation pages to hold quotations that could be underneath sense lines. DTLHS (talk) 18:22, 26 March 2018 (UTC)
Right, but that's 100% manual work. Best to leave it then. – Jberkel 19:32, 26 March 2018 (UTC)
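
For reference, the "red flag" pattern -sche mentions earlier in this thread (a line starting with #: followed by a bolded four-digit number) can be scanned for with something like the sketch below. The function name and the sample wikitext are made up, and, as said above, matches may include false positives.

```python
import re

# A line beginning "#: " immediately followed by a bolded four-digit number.
YEAR_AFTER_COLON = re.compile(r"#: '''[0-9]{4}'''")

def red_flag_lines(wikitext):
    """Return the lines that start with the pattern above."""
    return [line for line in wikitext.splitlines() if YEAR_AFTER_COLON.match(line)]

# Invented sample: only the second line starts with the flagged pattern.
text = (
    "# A definition.\n"
    "#: '''1999''', Some Author, ''Some Book''\n"
    "#* '''2001''', Another Author\n"
    "#: An ordinary usage example."
)
print(red_flag_lines(text))
```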

English multiple etymologies, categorized

Possible task: find pages in Category:English terms with multiple etymologies that don't have multiple etymology sections (checking for pages that don't contain "=Etymology 2=" seems like one obvious way of doing that); these should probably be removed from the category. In the other direction, look for pages that do have "=Etymology 2=" within an English section and aren't in this category yet. - -sche (discuss) 06:42, 30 May 2018 (UTC)
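
A sketch of both checks, assuming the page texts and the category membership have already been pulled from a dump into a dict and a set. The names english_section and audit are hypothetical, and the sample pages are stubs rather than real entries.

```python
import re

# Level-2 language headers such as ==English== (but not ===Etymology 1===).
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)

def english_section(wikitext):
    """Return the text of the ==English== section, or "" if there is none."""
    headers = list(L2_HEADER.finditer(wikitext))
    for i, match in enumerate(headers):
        if match.group(1).strip() == "English":
            end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
            return wikitext[match.end():end]
    return ""

def audit(pages, category_members):
    """Return (titles to drop from the category, titles missing from it)."""
    def has_ety2(title):
        return "=Etymology 2=" in english_section(pages.get(title, ""))
    to_remove = sorted(t for t in category_members if not has_ety2(t))
    to_add = sorted(t for t in pages if t not in category_members and has_ety2(t))
    return to_remove, to_add

# Stub pages: "cat" has Etymology 2 only under French, "lead" is uncategorized.
pages = {
    "bank": "==English==\n===Etymology 1===\n===Etymology 2===\n",
    "cat": "==English==\n===Etymology===\n==French==\n===Etymology 1===\n===Etymology 2===\n",
    "lead": "==English==\n===Etymology 1===\n===Etymology 2===\n",
}
print(audit(pages, {"bank", "cat"}))
```

Restricting the search to the English section matters for the second check, since a page can have "=Etymology 2=" only under another language.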

Male and female given names separately on same line

...should be combined like this. I will try to search for instances of this myself later. - -sche (discuss) 21:56, 15 June 2018 (UTC)

Form-of templates used as etymologies

...like this, should be cleaned up. - -sche (discuss) 00:14, 27 January 2020 (UTC)

Labels with wrong language code

As here. (Should try to catch these systematically.) - -sche (discuss) 18:33, 29 September 2020 (UTC)

Words with the "religion" label specific to one religion

Many entries with the {{lb|en|religion}} label are specific to Christianity (or rarely to another religion such as Buddhism) and should use the more specific label instead, for example "use". (Several other entries should not use the label at all, like Jew or Calvinist.) - -sche (discuss) 10:55, 26 November 2020 (UTC)

RFC discussion: June 2018

The following discussion has been moved from Wiktionary:Requests for cleanup (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Modern words "borrowed" from proto-languages

I just listed this as a WT:TODO task because I expect it'll keep being an issue even after we fix the existing cases, but: numerous entries in "Terms borrowed from Proto-Foo" categories (like Category:Terms borrowed from Proto-Slavic) were not actually "borrowed" by the L2 language in the way we use the word; see e.g. here. (Surprisingly, one English word apparently was borrowed from Proto-Indo-European, ghrelin.) - -sche (discuss) 08:37, 11 June 2018 (UTC)

Words that are obviously gendered but not defined as such

https://en.wiktionary.org/w/index.php?title=hatchet_man&diff=66526741&oldid=66526489 Fish bowl (talk) 09:58, 28 April 2022 (UTC)

T:en-plural nouns that (maybe) aren't

There is a longstanding issue of how to handle group names like "the Abenaki", "the Venda", etc. Many are listed as plural-only using the template above, but there are in fact cites of the singulars ("a Venda") and of the plurals ("Vendas"), so my impression is that these are supposed to be recast as singulars which can have either regular plurals (Vendas) or invariant plurals (Venda). I spy quite a few of these at Special:WhatLinksHere/Template:en-plural_noun. - -sche (discuss) 21:37, 13 August 2022 (UTC)

""double quotes"" in T:qfliteral

I came across one entry, and did a database dump search and found three more entries, which 'manually' wrote quotation marks inside T:qfliteral, which itself adds quotation marks, resulting in ""double""; this might be worth checking for once a year or something. The entries: είναι κινέζικα για μένα, αυτά μου φαίνονται κινέζικα, εντελώς αβέβαιο, durante beneplacito. - -sche (discuss) 06:48, 3 January 2024 (UTC)
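
A possible shape for that yearly check, assuming page texts from a dump are available as a dict and that the template is called as {{qfliteral|...}} with no nested templates inside the call. Both the call syntax and the function name are assumptions made for this sketch, and the sample pages are invented.

```python
import re

# Assumption: calls look like {{qfliteral|...}} with no nested templates inside.
QUOTED_QFLITERAL = re.compile(r"\{\{qfliteral\|[^{}]*[\"“”„][^{}]*\}\}")

def pages_with_manual_quotes(pages):
    """Return titles whose {{qfliteral}} call already contains a double quote mark."""
    return sorted(
        title for title, text in pages.items() if QUOTED_QFLITERAL.search(text)
    )

# Invented sample pages, not taken from the dump search described above.
pages = {
    "page A": 'some text {{qfliteral|"it is Chinese to me"}}',
    "page B": "some text {{qfliteral|it is Chinese to me}}",
}
print(pages_with_manual_quotes(pages))
```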

find duplicate definitions

Spitballing: a recurring issue is that when an entry has multiple etymology sections, people only look at the first one and, if they don't see the sense they seek, add it there or add it as a new etymology section (without noticing it is already present in another etymology section). Examples that I can find offhand are the cases de-duplicated in diff of e and diff of linn. I wonder if we could make a list of entries (in a given language: say we start with English) that have multiple etymology sections, and then winnow it to only cases where the definitions in ety 1 and 2 have "important" words in common. For example: 1. retain cases where definitions have any words in common other than words on a list of "unimportant" words like "the", "of", "and", "for", "from", etc (also excluding cases where one of the definitions is a non-gloss like "past tense of foo"), and 2. look at the results and expand the list of "unimportant" words, thus progressively winnowing the list of entries that have "important" words in common until it's a manageable size to put the "sets of definitions with words in common" on a page and let a human look them over and spot duplicates. Not saying this is a priority, and not sure if it's feasible, but I'm mentioning the idea. - -sche (discuss) 17:53, 2 May 2024 (UTC)
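
To make the idea a bit more concrete, here is a minimal sketch of step 1, assuming the definitions have already been extracted per etymology section. The "unimportant" list starts from the words named above plus a few guesses, the non-gloss test is a crude stand-in, and all function names and sample definitions are invented.

```python
import re
from itertools import combinations

# Starting list: the words named above plus a few assumed additions.
UNIMPORTANT = {"the", "of", "and", "for", "from", "a", "an", "to", "in", "or"}
# Crude, assumed test for non-gloss definitions such as "past tense of foo".
NON_GLOSS = re.compile(r"\b(?:past tense|plural|participle) of\b", re.IGNORECASE)

def important_words(definition):
    """Lower-cased words of a definition, minus the unimportant ones."""
    return {w for w in re.findall(r"[a-z]+", definition.lower()) if w not in UNIMPORTANT}

def shared_word_pairs(sections):
    """sections: one list of definition strings per etymology section, in order."""
    hits = []
    for (i, defs_a), (j, defs_b) in combinations(enumerate(sections, start=1), 2):
        for a in defs_a:
            for b in defs_b:
                if NON_GLOSS.search(a) or NON_GLOSS.search(b):
                    continue  # skip form-of style definitions
                common = important_words(a) & important_words(b)
                if common:
                    hits.append((i, a, j, b, sorted(common)))
    return hits

# Invented definitions for a two-etymology entry.
sections = [
    ["A pool of water.", "past tense of lin"],
    ["A waterfall or pool.", "The linden tree."],
]
print(shared_word_pairs(sections))
```

Step 2 would then amount to adding words to UNIMPORTANT after looking at the hits and rerunning until the list is small enough for a human to review.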
