This page is for cleanup jobs. Request jobs are at Wiktionary:Things to do.

This page lists cleanup requests affecting multiple entries. These may include updating templates, categories or generic entry structure, but not specific terms, which should be tagged with {{rfc}} and listed at WT:RFC. This way, tasks that were previously scattered across discussion and user pages are grouped together in one place where they are easier to find.

Regular tasks

Semi-regular tasks

Usually dump-analyzed:

  • Unhelpful abbreviations — These should use the full term.
  • Category:Limit of template reached.
  • Occasionally, people write {{w:Foobar}}; this should be {{w|Foobar}} (and can be found by searching the database for instances of {{w:).
  • Occasionally, soft hyphens sneak into the content of entries or even the pagenames; these should be removed.
  • People sometimes type {[, }] etc when they mean {{ / }}. It is useful to periodically scan dumps for instances of this. Here is some regex: ([^\[\{]\[\{[^\[\{]|[^\[\{]\{\[[^\[\{]|[^\]\}]\]\}[^\]\}]|[^\]\}]\}\][^\]\}]). Simply searching for ]} will not work, because there are many valid instances of it, e.g. {{term|foo|fooing|lang=und|[[gloss]]}}.
  • Every few months, check for instances of the common but nonstandard headers "Alternative form", "Alternative spelling" and "Alternative spellings" (which should be "Alternative forms") and "Usage note" (which should be "Usage notes"). Many other nonstandard headers exist, but none are as common as those. Also, no L1 headers should exist in the main namespace (language headers should always be L2, and all other headers should always be L3 or more).
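
Several of the dump checks above can be combined into a single pass over each page. The following is only a minimal sketch, not an existing tool: the function name scan_page and the problem labels are made up for illustration, and it assumes the page title and wikitext are already available as strings. The bracket pattern is the regex quoted above, copied unchanged.

```python
import re

# The mixed-bracket regex quoted in the list above, copied unchanged.
MIXED_BRACKETS = re.compile(
    r"([^\[\{]\[\{[^\[\{]|[^\[\{]\{\[[^\[\{]|[^\]\}]\]\}[^\]\}]|[^\]\}]\}\][^\]\}])"
)
SOFT_HYPHEN = "\u00ad"
# Nonstandard header texts named above, mapped to the standard form.
HEADER_FIXES = {
    "Alternative form": "Alternative forms",
    "Alternative spelling": "Alternative forms",
    "Alternative spellings": "Alternative forms",
    "Usage note": "Usage notes",
}
# A header line: the same run of "=" on both sides of the header text.
HEADER_LINE = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def scan_page(title, wikitext):
    """Return a list of problem labels (hypothetical names) for one page."""
    problems = []
    if "{{w:" in wikitext:
        problems.append("w-colon")
    if SOFT_HYPHEN in title or SOFT_HYPHEN in wikitext:
        problems.append("soft-hyphen")
    if MIXED_BRACKETS.search(wikitext):
        problems.append("mixed-brackets")
    for match in HEADER_LINE.finditer(wikitext):
        level, text = len(match.group(1)), match.group(2)
        if level == 1:
            problems.append("L1-header")
        if text in HEADER_FIXES:
            problems.append("nonstandard-header:" + text)
    return problems
```

Note that the bracket regex needs one character on each side of the mistyped pair, so a {[ at the very start of a page would be missed by this sketch.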

To be monitored manually:

  • Uses of the language code aaa in translations tables are often vandalism.
  • pdc (Pennsylvania High German) and pdt (Plautdietsch Low German) need to be kept separate.
  • aja (Aja/Adja of Sudan) and ajg (Adja/Aja of Benin) need to be kept separate.
  • Check periodically for misspellings.
  • Check periodically that things in Category:English countable proper nouns aren't mislabelled common nouns.
  • Check for misuse of ي vs ی and ك vs ک in Arabic, Persian, Urdu, Azeri, and other languages: they look identical in certain positions and are often mixed up.
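
The letter check in the last item could be partly automated. This is a rough sketch under stated assumptions: the four code points are standard Unicode assignments, but the per-language table (which letters count as unexpected for the codes ar, fa and ur) is an illustrative simplification, and the function name is hypothetical. Hits would still need manual review.

```python
# The four look-alike letters (standard Unicode code points).
ARABIC_YEH = "\u064a"  # ي
FARSI_YEH = "\u06cc"   # ی
ARABIC_KAF = "\u0643"  # ك
KEHEH = "\u06a9"       # ک

# Assumption for illustration: which letters are unexpected per language code.
UNEXPECTED = {
    "ar": {FARSI_YEH, KEHEH},
    "fa": {ARABIC_YEH, ARABIC_KAF},
    "ur": {ARABIC_YEH, ARABIC_KAF},
}


def unexpected_letters(lang, text):
    """Return (index, character) pairs for letters unexpected in `lang`."""
    bad = UNEXPECTED.get(lang, set())
    return [(i, ch) for i, ch in enumerate(text) if ch in bad]
```

Languages missing from the table simply return no hits, so Azeri and others would have to be added by someone who knows their orthography.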

Useful search queries

All subpages

Subpages of Wiktionary:Todo:

2009

English adverbs using "manner" in their definition

[This search] produces a list of such adverbs. What they often need:

  1. "===Etymology==="
    {{suffix|X|ly}} where X is the adjective from which derived.
  2. {{en-adverb}} (usually comparable)
  3. A definition worded without the word "manner" and preferably without the adjective from which it is derived.
  4. A Synonyms section using {{sense}} with an appropriate gloss.
  5. A Translations section using {{trans-top}} with the same gloss as the Synonyms section.

If you don't do all of these, please leave the word "manner" in a definition so it remains on the list until completed.

Other possible needs could be: usage examples, additional senses, membership in Category:English degree adverbs or Category:English sentence adverbs, removing redundant categories. Other thoughts may occur to you. DCDuring TALK 18:31, 24 November 2009 (UTC)

User:Robert Ullmann/Oldest redlinks

In some cases creating the entry, but in a lot of cases these are typos (I fixed Vigin Islands for example) or stuff that should never have an article (abate into a freehold). Mglovesfun (talk) 12:44, 25 November 2009 (UTC)

Action of the verb

I don't think "action of the verb to foo" is a good definition. google:"action of the verb" site:en.wiktionary.org gets 1470 hits (or the search thinks it does; in reality a lot fewer). Any chance of eliminating these? If so, would it have to be by hand, or is there a formulaic wording which would do the job? Mglovesfun (talk) 22:56, 20 July 2012 (UTC)

Updating Websters definitions

User:Visviva/Cobwebs contains 4 lists of entries that have markers of insufficient editorial attention since import from Websters 1913. It would be nice if, over the next 3 years, we could modernize these entries so that by the hundredth anniversary of the publication we could honor it as an inspiration for a modern dictionary instead of just relying on its often obsolete set of senses.

The markers that put the entries on the list should not be eliminated until the entry is thoroughly updated. The problems run the gamut: excessive reliance on "literary" usage examples, missing citations information, obsolete language, inclusion of related terms definitions, missing senses, superseded etymologies.

There are many basic words on the General Service List - core words - that suffer these problems. DCDuring TALK 00:28, 2 December 2009 (UTC)

Someone could use Wiktionary:Abbreviated Authorities in Webster and AWB to fix ambiguous author citations. It would be even better if pedia links were included. --Bequw¢τ 05:18, 2 December 2009 (UTC)
That's treating a symptom. Should a literary-style quote from 100-200 years ago (or more) appear in a long entry for a basic word (vs on the Citations page)? The main disease is that many of our entries from Webster's, especially the long ones, have not had enough constructive contributor attention. They are more likely to get some additional hyperspecific sense added (poker, cricket, mycology, etc) than to get updated definitions. If cleanup only treats symptoms we will succeed in masking the serious disease. Or perhaps users don't need entries for such basic words or at least don't benefit from unabridged-dictionary-style treatment of them. DCDuring TALK 22:42, 2 December 2009 (UTC)
True. BTW have all the original pages been imported? What are all the Appendix:webster 1913:* pages doing? --Bequw¢τ 03:39, 4 December 2009 (UTC)
I have assumed that the pages have been imported. The remaining terms mostly don't seem a high priority. I wouldn't add that list here until we do some of the more important or urgent work. Perhaps we need a warning to discourage uncritical importing without updating the wording and checking for obsolescence. DCDuring TALK 11:34, 4 December 2009 (UTC)

Acronyms classed abbreviations

Category:English abbreviations includes many erroneous entries (mostly the full-cap ones). Sometimes the mistake is that someone used "abbreviation" in a broad sense of the word and the more correct acronym or initialism should be used here. If you are not sure if it's an acronym or initialism, put as acronym but leave a {{rfap}}. Another mistake is that geographic codes for countries and provinces/states should be made Translingual symbols. --Bequw¢τ 00:27, 27 December 2009 (UTC)

2011

Misemboldened material in quotes and usage examples

How hard would it be to create a clean-up list of instances of emboldened text in quotes and usage examples that are not identical to the headword, especially in the entry, but also on any citations page? Is this a task for Autoformat? DCDuring TALK 17:28, 18 January 2011 (UTC)

Not all bold text different from the headword is incorrectly bold. E.g., [[say]] has boldface said and says in quotes/usexes, and foreign words have boldface transliterations and translations.—msh210 (talk) 17:34, 18 January 2011 (UTC)
It would be lovely if the matching was against any inflected form as well.
How about bold text with a space when the headword is a single word? That would capture cases where cites were copied from an MWE to one or more component words without adjustment. DCDuring TALK 18:28, 18 January 2011 (UTC)
What's an MWE? (Not seeing it at [[MWE]] (redlink), [[Appendix:Glossary]], or [[Wiktionary:Glossary]].) In any event, I'm sure there are languages where a cited term can have a space even if citing a word without a space. (Perhaps "auf-" verbs in German, cited as ".. auf"?) I seem to recall seeing even some English entries that do this: cite a space-including form like dump truck for dumptruck. (We don't like those at RFV, but they're useful to show early use, especially where the version in the cite is a form-of entry.)​—msh210 (talk) 19:26, 18 January 2011 (UTC)
Multi-word entry. I picked it up from BP or RfD. It doesn't prejudice the case as to idiomaticity.
As a manual cleanup item we should be able to catch those. It is English entries that have the most usage examples and citations that might need the clean up. I don't really understand why we have separate entries (vs, say, redirects) for alternative forms, but conflate them all in entry-page citations. If we want to conflate them somewhere, why not in citation space with transclusion or redirection? We have a problem with the multiple purposes of citations: usage example for a sense, attestation of form, history of lemma. Citations that, say, show a headword to be an adjective are not the best for sense usage or history of lemma. DCDuring TALK 20:36, 18 January 2011 (UTC)
Yeah, if it's to be done manually, okay. The false negatives (positives? I mean badly bold entries) shouldn't overwhelm the true ones.​—msh210 (talk) 20:48, 18 January 2011 (UTC)
When I mentioned AF I was thinking of the problem-detection-and-marking function it or its successors have. DCDuring TALK 23:05, 18 January 2011 (UTC)
Not too many in Latin script: /bolded spaces in single-word entries. --Bequw τ 02:58, 19 January 2011 (UTC) (Slightly edited by msh210 18:59, 19 January 2011 (UTC).)
I don't see any instance of blanks in emboldened text in the first few I looked at. Of course, I am a little suspicious that they are all one-character entries. Is there a problem in the selection logic? DCDuring TALK 12:45, 19 January 2011 (UTC)
Should be fixed. It also now shows the matches so that you don't have to guess as much. --Bequw τ 16:44, 19 January 2011 (UTC)
Thanks. The 5000+ entries fits my expectations better. Showing the matches is a help. DCDuring TALK 17:25, 19 January 2011 (UTC)
I just went through five dozen or so Hebrew entries and discovered that virtually all of them have the "error" in a translation and were actually fine. Any way you can re-run the script, skipping boldfacing in #*:: and #::, please? (Or maybe there's a better way to omit translations.)​—msh210 (talk) 18:06, 19 January 2011 (UTC)
Much less serious, because visible from the page, are the instances of "more" and "most" in the inflection line, and dates in quotes. I am only going for what are probably English entries because the list doesn't seem to have a very high yield of real problems for non-English because of what msh210 describes. DCDuring TALK 18:50, 19 January 2011 (UTC)
"More"/"most" is good to list: the inflection^Wheadword line needs a template then. No?​—msh210 (talk) 18:56, 19 January 2011 (UTC)
The instances I saw were within templates. DCDuring TALK 01:56, 20 January 2011 (UTC)
Reran not matching those line prefixes (or ##:: or ##*::). Ones already removed were kept out. --Bequw τ 01:37, 20 January 2011 (UTC)
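
For anyone wanting to re-run the check discussed in this thread, here is a minimal sketch of the heuristic as described: bold text containing a space in an entry whose title is a single word, skipping the line prefixes #*::, #::, ##:: and ##*::. It is not the script that was actually used; the function name is hypothetical, and the simple pattern does not handle bold-italic (five apostrophes) reliably.

```python
import re

# Line prefixes the thread asks to skip (translations of quotes and usexes).
SKIPPED_PREFIXES = ("#*::", "#::", "##::", "##*::")
# Text between two runs of three apostrophes on the same line.
BOLD = re.compile(r"'''(.+?)'''")


def bold_spans_with_space(title, wikitext):
    """Return bold spans containing a space, for single-word titles only."""
    if " " in title:
        return []
    found = []
    for line in wikitext.splitlines():
        if line.startswith(SKIPPED_PREFIXES):
            continue
        found.extend(span for span in BOLD.findall(line) if " " in span)
    return found
```

As noted in the thread, hits are only candidates: a space-including form can be a legitimate citation of a single-word headword.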

Category:Pages with incorrect ref formatting

Found this by chance. Some of the entries tagged are in the main namespace. Mglovesfun (talk) 13:54, 31 January 2011 (UTC)

/phrases not linked to from components

​—msh210 (talk) 07:04, 17 February 2011 (UTC)

Category:Pronunciation templates without a pronunciation

I created this to find all uses of {{IPA}}, {{X-SAMPA}} and {{enPR}} where no first parameter is given. Many of these are simple typos and can be fixed in a few seconds, which is nice. Mglovesfun (talk) 15:34, 20 April 2012 (UTC)
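
The same condition could also be checked offline against a dump. The sketch below is an assumption-laden illustration, not how the category is populated: it treats any parameter containing "=" as a named parameter, ignores nested templates, and uses a hypothetical function name.

```python
import re

# {{IPA...}}, {{X-SAMPA...}} or {{enPR...}} with any number of flat parameters.
PRON_TEMPLATE = re.compile(r"\{\{(IPA|X-SAMPA|enPR)((?:\|[^{}|]*)*)\}\}")


def templates_missing_pronunciation(wikitext):
    """Return template calls whose first unnamed parameter is absent or empty."""
    missing = []
    for match in PRON_TEMPLATE.finditer(wikitext):
        params = match.group(2).split("|")[1:]
        # Assumption: a parameter containing "=" is a named one (e.g. lang=en).
        positional = [p for p in params if "=" not in p]
        if not positional or not positional[0].strip():
            missing.append(match.group(0))
    return missing
```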

2012

Category:en:Zoology

This shouldn't contain names of animals, as animal names aren't limited to zoology. Names of animals should be in more specific categories, such as Category:en:Animals. Mglovesfun (talk) 09:38, 3 August 2012 (UTC)

The problem with this as a cleanup list is that too many of the items that are in the category are supposed to be in the category. I sorted out most of the capitalized terms, all those I was sure belonged in Translingual. I may try with those with Latinate endings, in search of species epithets that should be Latin. DCDuring TALK 17:59, 14 August 2012 (UTC)
Or Translingual, there's no consensus on this, apart from in your head, but I for one don't care enough to take any action on the matter, and based on observation, nor does anyone else. Mglovesfun (talk) 21:35, 14 August 2012 (UTC)
No one else active seems to care about any of these, except in principle, and not so much that either. Someone can mass-change the Latin stuff if ever a consensus emerges in favor of making New Latin a dialect of Translingual. I mark the senses of the New-Latin-coined Latin terms with {{New Latin}} and try to define the sense using {{n-g}} and the word epithet, so it should be easy enough to identify them. At the moment they might be in English, Translingual (not so many now), or Latin, but most likely they are redlinks, yet to be defined, at least in the form used in species names. Maybe the forms should be defined as inflected forms of unattested New Latin lemmas. DCDuring TALK 22:17, 14 August 2012 (UTC)

/Entries containing modifier letters

- -sche (discuss) 20:13, 7 September 2012 (UTC)

/External links to Wikipedia

RuakhTALK 15:04, 30 September 2012 (UTC)

/Entries containing non-template contexts

- -sche (discuss) 07:56, 6 October 2012 (UTC)

/Citations without citations

(Not really. It's actually Citations:-pages without {{citation}} or {{citations}}. But this way is catchier!) —RuakhTALK 14:59, 6 October 2012 (UTC)

User:DTLHS/translation indent analysis

Odd indentation in translations tables.

Category:en:Mineralogy

Most of the entries in this category are simply minerals and belong in Category:en:Minerals. Ultimateria (talk) 04:39, 30 December 2012 (UTC)

This doesn't seem worth doing without resolving whether visible tags should be used for topics, rather than for usage context. At least some resolutions of this would involve replacing the "context" label with a hard category in some cases where the term has entered broader use than "mineralogy" indicates. DCDuring TALK 16:38, 20 June 2013 (UTC)

/Former name of

Most entries which contain the phrase "former name of foo" should be changed to use {{obsolete}}, {{historical}} and {{qualifier}} and/or {{defdate}} ([1], [2]). "Stalingrad" isn't the word for "a former name of Volgograd", it is a former name of Volgograd. The word for "a former name of Volgograd" might be an "exvolgogradonym" or something. - -sche (discuss) 08:53, 31 December 2012 (UTC)

Wiktionary:Todo/former name of DTLHS (talk) 06:54, 2 January 2013 (UTC)

2013

/Representative entries

Quoting from the sub-page:

In April 2011, we voted to "[d]eprecat[e] the practice, rule or guideline that allows representative entries to be placed at the start of the list of members of categories" by using sort-keys that start with an asterisk or a space (see Wiktionary:Votes/2011-04/Representative entries). However, there does not seem to have been any drive to fix existing such entries, and there remain a few thousand of them.

The below 2,671 entries were found by examining the last database dump (31 December 2012), scanning page-text for the Perl regex \[\[ *Category *:[^\]\[\|]+\|[ *] (meaning an explicit in-entry category-link where the first character after the pipe is an asterisk or a space).

Does anyone object to my fixing (many/most of) these with a bot?

RuakhTALK 19:27, 1 January 2013 (UTC)

No, far from it; dunno why I haven't done this myself, in fact. Mglovesfun (talk) 19:29, 1 January 2013 (UTC)
Has this been done? Mglovesfun (talk) 10:36, 27 May 2013 (UTC)
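
To refresh the list from a newer dump, the regex quoted from the sub-page can be reused as is; Python's re module accepts the same syntax for this particular pattern. This is just a sketch for trying it out, with a hypothetical function name, and says nothing about how the original list or any bot run was produced.

```python
import re

# The regex quoted above, copied unchanged.
REPRESENTATIVE_SORT_KEY = re.compile(r"\[\[ *Category *:[^\]\[\|]+\|[ *]")


def has_representative_sort_key(wikitext):
    """True if a category link's sort key starts with an asterisk or a space."""
    return REPRESENTATIVE_SORT_KEY.search(wikitext) is not None
```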

/All senses

These entries contain the gloss "all senses", which should usually be replaced by an actual list of which senses. - -sche (discuss) 05:55, 24 May 2013 (UTC)

Thanks! I wish I'd've thought of this. Mglovesfun (talk) 10:33, 27 May 2013 (UTC)
As -sche indicates, there are times when it isn't worth the effort of splitting up synonyms etc. Doing so might make it easier for machines to read the information, but not for humans to either enter it or read it. DCDuring TALK 16:41, 20 June 2013 (UTC)
The thing is, when you add a sense to the English word, you are implicitly adding that sense to the foreign language word as well. Same applies to deleting a sense. Mglovesfun (talk) 16:45, 20 June 2013 (UTC)

/Latvian adjectives

See Wiktionary:Grease pit/2013/May#Bot_request:_Latvian_adjective_homographs_missing_second_headers. - -sche (discuss) 18:32, 25 May 2013 (UTC)

Pronunciation problems

At User:-sche/pronunciation problems, I have attempted to make a comprehensive list (additions welcome) of all possible problems which may exist in pronunciations sections, to aid those who would create lists of and fix entries which suffer from those problems.

This incorporates User:Robert Ullmann/Pronunciation exceptions (from June 2010), Wiktionary:Todo/non-standard pronunciation transcriptions (from July 2012) (which is finished, but for a few Egyptian entries, but could be re-run periodically), and /Entries containing obsolete IPA characters (from September 2012) (which is also finished, but should be re-run periodically).

- -sche (discuss) 06:27, 25 June 2013 (UTC)

/Entries containing tt

These entries contain <tt> not as part of the formatting of a link to a Usenet/Google group. See page for more details. - -sche (discuss) 07:31, 27 June 2013 (UTC)

/Non-templatised genders

These 3027 pages use ''f'', ''m'', ''n'', ''c'', ''p'' or ''pl''. In many cases, the pages could be updated to call {{g|f}}, etc. - -sche (discuss) 21:22, 3 July 2013 (UTC)
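
A possible way to locate and convert these is sketched below. It assumes the italic gender is written with exactly two apostrophes on each side, uses hypothetical function names, and produces {{g|...}} calls as suggested above. Since only "many" cases qualify, the output of the replacement should be reviewed rather than saved blindly.

```python
import re

# ''x'' with exactly two apostrophes on each side (so ''' bold is not matched).
ITALIC_GENDER = re.compile(r"(?<!')''(f|m|n|c|p|pl)''(?!')")


def find_plain_genders(wikitext):
    """Return the gender codes written as plain italics."""
    return ITALIC_GENDER.findall(wikitext)


def templatise_genders(wikitext):
    """Replace each plain italic gender with a {{g|...}} call."""
    return ITALIC_GENDER.sub(r"{{g|\1}}", wikitext)
```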

Wiktionary:Todo/Single-line quotes

Most of these pages include a single-line quote where the bibliographic information and the quoted material are on the same line (when they should be separate). Let me know if there's an easy way to get rid of more false positives. The multi-line regexp that I used was: ^#[^\n\r]+'''[0-9]+[^\n\r]{15,}—. --Bequw τ 14:07, 13 August 2013 (UTC)
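
The regexp above can be tried in Python roughly as follows. This is a sketch for experimentation, not the original script: the function name is hypothetical, multi-line mode is assumed to mean that ^ anchors at each line start, and the dash that ends the pattern is written as the escape \u2014.

```python
import re

# The multi-line regexp quoted above; \u2014 is the dash that ends the pattern.
SINGLE_LINE_QUOTE = re.compile(
    r"^#[^\n\r]+'''[0-9]+[^\n\r]{15,}\u2014", re.MULTILINE
)


def single_line_quotes(wikitext):
    """Return the matched beginnings of lines that look like single-line quotes."""
    return [m.group(0) for m in SINGLE_LINE_QUOTE.finditer(wikitext)]
```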

Wiktionary:Todo/Slovene masculine translations

This is the list of entries, as of the last database dump, that contain Slovene translations with the gender m ("masculine"). They should most likely be changed to use either m-an (+ "animate") or m-in (+ "inanimate"), since that distinction has grammatical consequences in Slovene. (?)

RuakhTALK 14:34, 11 September 2013 (UTC)

2014

/Linked language names in trans tables

AutoFormat and its successor, KassadBot, unlink these like so, but it seems they only edit entries which show up in recent changes. Here is a list of 2435 entries with linked language names in trans tables which a bot could make a pass through. - -sche (discuss) 09:23, 31 January 2014 (UTC)

Was gonna say, just bot 'em. No need for a wikified list. Renard Migrant (talk) 17:27, 17 January 2015 (UTC)

/Mandarin translation not nested under Chinese

These are Mandarin translations not nested under the Chinese section. They probably have to be cleaned up manually. Matthias Buchmeier (talk) 18:17, 31 January 2014 (UTC)

I have updated the list from the recent dump. Matthias Buchmeier (talk) 10:58, 11 March 2017 (UTC)

2015

Category:Entries with non-standard headers

I've tagged all the ===Abbreviation===, ===Acronym=== and ===Initialism=== headers (by bot). There are now 9,000 entries. Can we chip away at it a bit? 5 a day per person, you'd be surprised what you can do. Especially if people work in languages with which they are familiar. Renard Migrant (talk) 17:25, 17 January 2015 (UTC)

What's wrong with those headers? --SuperWonderbot (talk) 17:39, 17 January 2015 (UTC)
They're shit. Renard Migrant (talk) 20:29, 17 January 2015 (UTC)

/Pages containing LTR marks

In many cases, these are unnecessary and cause problems. - -sche (discuss) 18:16, 21 January 2015 (UTC)

What are LTR marks and how should one improve the entry? --A230rjfowe (talk) 21:00, 15 July 2015 (UTC)

/Pages containing RTL marks

In many cases, these are unnecessary and cause problems. - -sche (discuss) 18:16, 21 January 2015 (UTC)

What are RTL marks and how should one improve the entry? --A230rjfowe (talk) 21:00, 15 July 2015 (UTC)
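
On the question of what these marks are: the left-to-right mark (U+200E) and right-to-left mark (U+200F) are invisible Unicode characters that affect text direction. Because they cannot be seen in the edit box, a small script helps to locate them. The sketch below only reports positions and uses a hypothetical function name; whether a given mark is unnecessary still has to be judged per entry.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def find_direction_marks(wikitext):
    """Return (line_number, column, name) for each mark; numbering starts at 1."""
    hits = []
    for line_no, line in enumerate(wikitext.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == LRM:
                hits.append((line_no, col, "LRM"))
            elif ch == RLM:
                hits.append((line_no, col, "RLM"))
    return hits
```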

/Page with untemplatized etymologies

A partial list of pages where at least one language section simply states, in plain text, without using {{etyl}}, that it derives from German, French, Latin, Greek, Ancient Greek, Chinese or Spanish. - -sche (discuss) 17:43, 25 January 2015 (UTC)

/Last line in trans table could benefit from xte

The last line of one of the translations tables or checktrans tables in each of these entries ends in ]] and could benefit from being adapted to use {{t}}, e.g. via the xte gadget. (More adept searching could catch instances where other lines would benefit from xte, but this is a decent start.) - -sche (discuss) 22:47, 9 February 2015 (UTC)

/North American

A list of entries which are labelled as being Canadian, or American, but not both. It is likely that many should in fact have both labels. See Wiktionary:Beer_parlour/2015/March#North_American_English_vs_Canadian_and_American_English for a bit of background. - -sche (discuss) 05:00, 7 March 2015 (UTC)

Erroneous Greek characters

Any place that the character ϕ is used in place of φ or ϑ in place of θ in a string that is marked as being grc or el should be listed so that an editor can look them over and fix mistakes. I just found one lying around in a {{term}}, which made me think that these shouldn't be overly hard to find. —Μετάknowledgediscuss/deeds 21:01, 12 May 2015 (UTC)

@Metaknowledge: Never knew this page existed. Ironically I came across this while searching for incorrect uses of ϕ. For future reference, here is the search for ϕ and here is the search for ϑ (other incorrect characters are ϖ ϛ ϰ ϱ ϐ ϵ ϲ ϗ ȣ; there may be more). --WikiTiki89 13:20, 21 April 2017 (UTC)
If nothing has been done about this, I can make Module:script utilities search for these characters when it tags text, and add a tracking template or a category. — Eru·tuon 23:50, 20 May 2017 (UTC)
@Metaknowledge, Wikitiki89: Done. Eru·tuon 00:02, 21 May 2017 (UTC)
@Erutuon: It's never done, people will keep adding them. --WikiTiki89 15:03, 22 May 2017 (UTC)
Oh sorry, you were referring to having Module:script utilities search for them. It's not that nothing has been done, I went through and removed over a hundred of these. But again, people will keep adding them. --WikiTiki89 15:05, 22 May 2017 (UTC)
Right. I just found one in polypharmacy... 🙄 — Eru·tuon 18:14, 22 May 2017 (UTC)
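
Since these characters keep being re-added, a periodic offline check may be useful alongside the module tracking. The following is a sketch with hypothetical function names, not the Module:script utilities code: it applies the two replacements named in the request (ϕ to φ, ϑ to θ) and merely flags the other characters listed above, because the thread does not say what each should become. It assumes the input is text already known to be grc or el.

```python
# Replacements stated in the request: symbol variants -> ordinary letters.
REPLACEMENTS = {"\u03d5": "\u03c6", "\u03d1": "\u03b8"}  # ϕ -> φ, ϑ -> θ
# Further characters listed in the thread; flagged only.
OTHER_SUSPECTS = set("ϖϛϰϱϐϵϲϗȣ")


def suspicious_greek(text):
    """Return (index, character) pairs for characters worth a manual look."""
    suspects = set(REPLACEMENTS) | OTHER_SUSPECTS
    return [(i, ch) for i, ch in enumerate(text) if ch in suspects]


def fix_phi_theta(text):
    """Apply only the two replacements named in the request."""
    return text.translate(str.maketrans(REPLACEMENTS))
```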

User:DTLHS/cleanup/bad etymology

Etymology problems - usually easily fixed when the wrong language code has been used --A230rjfowe (talk) 23:17, 30 July 2015 (UTC)

Not click characters

All over the dictionary, e.g. in the name and content of !nawas and in this translation, ! turns up for ǃ, and I wouldn't be surprised to find other substitutions for click consonants. The best way I can think of to find such uses is: create a list of all languages that use clicks, or as a presumably easier-to-make approximation of that a list of all Khoisan languages, then search a database dump for all translations, language sections, and {{m}}/{{l}}s of those languages that contain !. I've just cleaned up the few pages which misused ! in their pagenames (only 31 pages on Wiktionary used ! in their pagenames at all). - -sche (discuss) 18:42, 25 August 2015 (UTC)
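
Part of the search described above could look like the sketch below. It covers only {{m}} and {{l}} links, not translations or language sections, and rests on assumptions: the set of language codes is a placeholder to be replaced with a real list of click languages, the function name is hypothetical, and nested templates are ignored.

```python
import re

EXCLAMATION = "!"  # U+0021, the character that turns up by mistake
CLICK = "\u01c3"   # U+01C3, the click letter that should be used instead
# Placeholder set of language codes to check; replace with a real list.
CLICK_LANGS = {"naq"}
# {{m|lang|term...}} or {{l|lang|term...}}; captures the code and the term.
LINK = re.compile(r"\{\{(?:m|l)\|([^|{}]+)\|([^|{}]*)")


def links_with_exclamation(wikitext, langs=CLICK_LANGS):
    """Return (lang, term) pairs where a listed language's term contains '!'."""
    return [
        (lang, term)
        for lang, term in LINK.findall(wikitext)
        if lang in langs and EXCLAMATION in term
    ]
```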

2017

"from from"Edit

Something to do: search for instances of "from from", and fix ones that are errors, like at fine. - -sche (discuss) 09:09, 15 May 2017 (UTC)
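
A dump scan for this is straightforward; the sketch below (hypothetical function name) returns the lines containing a doubled "from" so that a person can decide which ones are errors.

```python
import re

# "from" twice in a row, separated by whitespace, in any letter case.
DOUBLED_FROM = re.compile(r"\bfrom\s+from\b", re.IGNORECASE)


def doubled_from_lines(wikitext):
    """Return the lines that contain 'from from'."""
    return [line for line in wikitext.splitlines() if DOUBLED_FROM.search(line)]
```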

Not really unisex names

At User:-sche/names are lists of entries that are in both 'male given names' and 'female given names' but not yet 'unisex given names' (they should also be in the third cat if they are in the first two), and also entries in 'unisex given names' that are not in both 'male given names' and 'female given names' (so they are missing a 'female'/'male given name' definition line, or are not really unisex). - -sche (discuss) 06:52, 19 May 2017 (UTC)
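
The two lists amount to simple set differences. Assuming the members of the three categories have already been fetched into Python sets (how they are fetched is left open, and the function and key names are made up), the comparison could be sketched as:

```python
def unisex_mismatches(male, female, unisex):
    """Compare category memberships given as sets of page titles."""
    both = male & female
    return {
        # In both gendered categories but not yet in the unisex one.
        "missing_unisex": sorted(both - unisex),
        # In the unisex category but missing from one gendered category.
        "unisex_but_not_both": sorted(unisex - both),
    }
```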

Check IDs

As discussed at Wiktionary:Grease pit/2017/May § Adding ids to enable linking to headwords, we need to check for sense ids in {{senseid}} and the |id= parameter of headword templates that are on the same page and have the same language and have the same id string: that is, those that would create the exact link when input into an entry linking template. Each sense id for a given language on a given page should be unique. — Eru·tuon 16:57, 19 May 2017 (UTC)
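
One way to look for such clashes on a page is sketched below. It is a simplification under stated assumptions: {{senseid}} is taken to have the form {{senseid|lang|id}}, headword templates are represented only by {{head|lang|...|id=...}}, nested templates are ignored, and the function name is hypothetical.

```python
import re
from collections import Counter

# {{senseid|<lang>|<id>}}
SENSEID = re.compile(r"\{\{senseid\|([^|{}]+)\|([^|{}]+)\}\}")
# Assumption: headword templates look like {{head|<lang>|...|id=<id>}}.
HEAD_ID = re.compile(r"\{\{head\|([^|{}]+)(?:\|[^{}|]*)*?\|id=([^|{}]+)")


def duplicate_ids(wikitext):
    """Return (lang, id) pairs that occur more than once on the page."""
    pairs = SENSEID.findall(wikitext) + HEAD_ID.findall(wikitext)
    counts = Counter(pairs)
    return sorted(pair for pair, n in counts.items() if n > 1)
```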

Derived from cognates

It might be fruitful to check for instances of "cognate to"/"cognate with" followed by {{etyl}} where the second unnamed parameter is set to something other than -, as in [3], which results in the entry being categorized as if it derived from the cognate. - -sche (discuss) 23:41, 20 May 2017 (UTC)
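
A first pass at this check might look like the sketch below. It only handles the case where {{etyl}} directly follows "cognate to"/"cognate with" and has an explicit second unnamed parameter; calls with no second parameter are not flagged here, which is an assumption of this sketch, and the function name is hypothetical.

```python
import re

# "cognate to/with" directly followed by {{etyl|<source>|<target>}}.
COGNATE_ETYL = re.compile(
    r"cognate (?:to|with) \{\{etyl\|([^|{}]+)\|([^|{}]+)\}\}", re.IGNORECASE
)


def miscategorising_cognates(wikitext):
    """Return (source, target) pairs where the second parameter is not '-'."""
    return [
        (source, target)
        for source, target in COGNATE_ETYL.findall(wikitext)
        if target.strip() != "-"
    ]
```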

Usage note template naming

User:-sche/Usage note templates lists some usage-note templates which could be moved to fit our usual naming scheme, as described on the page and [4]. - -sche (discuss) 22:01, 26 May 2017 (UTC)

Wiktionary:Webster 1913

Wiktionary:Webster 1913 has a small bunch of pages which haven't been integrated into WT yet. With a small push, this 10+ year project can finally be put to bed. -WF

Possibly mislabeled affixes

Wiktionary:Todo/interfixes: These look like interfixes, but are labelled "prefixes" or "suffixes". - -sche (discuss) 19:57, 8 June 2017 (UTC)

Pronunciation audio files

User:DerbethBot/Add manually: DerbethBot adds pronunciation files to entries, but some audio files need to be added manually. (See also User:DerbethBot for more info.) -- Curious (talk) 12:00, 11 June 2017 (UTC)