Wiktionary:Per-language pages proposal

This page collects proposals, requirements, known problems and other information regarding a possible future change in Wiktionary to have each page contain an entry in one language only.

Discussions should be held on the talk page, to keep this page clean and clear.

Overview

Currently, Wiktionary divides each main-namespace page into sections by language. Every language that has that word gets its own section on the page.

This situation is historical. When Wiktionary was first created, there were no other large wiki-based dictionaries, and Wikipedia (an encyclopedia) was the primary model for content creation and management for most new wikis on the internet. In its earliest days, Wiktionary did not explicitly support all languages, and it was primarily an English-only dictionary. Other languages were gradually added, and when entries for English already existed, sections for those languages were added to the existing pages. The format evolved naturally, but it has some major drawbacks.

Doesn't match typical use cases

Only a small minority of users will ever be interested in getting an overview of a word in different languages. Most users will be using Wiktionary for three tasks:

  • to look up the meaning of an English word
  • to translate back and forth between English and another language (English-to-foreign is done through translation tables, foreign-to-English through definitions in foreign-language entries)
  • to find auxiliary information about a word in any language, such as etymology, grammar etc.

In each of these cases, the language that is being looked up is already known, and most other online multilingual dictionaries treat the target language as the primary choice. This means that to look up a word, you first specify what language you're looking up, and then say what word to look up in that language.

The current situation on Wiktionary is the reverse of that: first you look up a word, then you select the language among the sections that are available on the page. This works fairly well when you look up a single word, provided you are patient enough to find the right language section on the page. But the more words you look up, the more time you spend finding the correct section on each page, and this can be rather frustrating, especially when better alternatives are available elsewhere on the internet.

The problem here is that we require users to make the choice of language each time they view a page. This doesn't match what people typically do with a dictionary. They normally know in advance what language they want, and once they have selected it, they will want all words they look up to be in that language. Scripts like tabbed languages mitigate this to some degree, but because they are client-side, they have their own problems and are not as robust as proper server-side support for user needs.

Leads to technical barriers

Over time, we have developed some solutions to mitigate problems, including:

  • Tabbed languages.
  • Gadget-run orange links for existing pages that do not have a section anchor for a specific language.
  • Links to language sections in {{l}}, {{m}}, {{m+}}, {{head}} and others.
  • Templates like {{l-self}}.

But these often have their own problems: tabbed languages and orange links are both client-side scripts, and are relatively fragile. They rely on backwards-parsing our content to figure out what we meant in the first place but couldn't express because of limitations in our content model. And these solutions don't really fix the core problem, which is that we just aren't storing information in a useful way. It's counterproductive to keep devising new ways to work around limitations in our content model that could be removed by changing the model itself.

This proposal is also the only long-term solution to memory errors on long pages, described at Wiktionary:Lua memory errors.

Conclusion

At a fundamental level, Wiktionary entries are still about words in all languages and not about words in one language. This doesn't match how users expect to use a dictionary, and has led to technical measures which try to work around a broken content model. However, there is really only one proper workable long-term solution, and that is to change the content model itself. This page is, hence, a proposal that:

  • Wiktionary entries should be organised around words in a given language. Entries for the same word in different languages should be placed on different wiki pages.
  • Changes should be made to how Wiktionary users look up content, so that they can find the information they want more easily. Preferably, we should look to other online dictionaries for inspiration.

Consequences

Consequences of implementing this change:

Advantages

  1. Watchlist becomes more useful. Edits to (for example) en/foo won't appear in the watchlist of someone who only cares about fr/foo.
  2. "What links here" becomes more useful. Terms linking to language X won't appear in the WLH page of terms linking to language Y. Consequently, it might become a useful tool for searching for related terms, descendants, derived terms and whatnot.
  3. Page access will be much faster. If a person only wants to read information about the Portuguese word a, they can load the Portuguese content alone, instead of the Portuguese content along with content in 77 other languages they don't care about.
  4. More flexible use of redirects. The existence of a word in language X won't prevent us from redirecting the same word in language Y.
  5. No need to provide language codes to templates, if templates and modules can retrieve the language from the current page title. This means:
    1. Less typing, and less room for errors. This leads to less incorrect categorisation as well.
    2. Lowers the barrier for editors, who no longer need to remember what all the codes mean.
    3. Creating identical entries will be easier. If multiple entries are identical, as is common for terms in closely related languages, you only need to write the wikicode once and paste it into the other language pages without the need to manually change the lang codes.
  6. If a link is blue, you know that the entry exists for that language. No more need for orange links, which is a feature most users do not get to use.
  7. More flexible use of section linking. Currently, links to part-of-speech sections like "Noun" are avoided because they are ambiguous as to the language. With per-language pages, those will be more reliable, although not perfectly so because an entry might still have multiple noun sections.
  8. Makes patrolling easier, if the language can be identified from the page name. You can focus on pages in a given language, while skipping any you're not familiar with.
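
To illustrate advantage 5 above, here is a minimal Python sketch of how a template or module might infer the language from the page title instead of taking a code as a parameter. It assumes titles of the form code/word (as in the en/foo and fr/foo examples above). The LANGUAGE_NAMES table and both function names are hypothetical, and on Wiktionary itself this logic would live in templates or modules rather than in Python.

```python
# Hypothetical table of language codes; a real one would be far larger.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "pt": "Portuguese"}


def language_of_page(title: str) -> str:
    """Return the language code of a page titled '<code>/<word>' (assumed layout)."""
    code, sep, word = title.partition("/")
    if not sep or not word or code not in LANGUAGE_NAMES:
        raise ValueError(f"cannot infer a language from title {title!r}")
    return code


def pos_category(title: str, plural_pos: str) -> str:
    """Build a category name without an explicit language code parameter."""
    return f"{LANGUAGE_NAMES[language_of_page(title)]} {plural_pos}"


# The same call, pasted unchanged onto two pages, categorises each correctly.
print(pos_category("fr/table", "nouns"))  # French nouns
print(pos_category("en/table", "nouns"))  # English nouns
```

Because the language comes from the title, identical wikicode could be pasted onto several language pages without editing any codes, which is the point made in item 5.3.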

Disadvantages

  1. Conversion will be a nightmare. Millions of pages to create and move, thousands of templates to rewrite, new software to create, loads of new practices to decide, twenty years of tradition thrown away. How do we undertake the transition without making Wiktionary unavailable during what will certainly be a long process?
    1. How do we deal with almost two decades of page history in a sane way? How will history be maintained if we split pages?
  2. Adding multiple entries at once for a single word will be more difficult. It will require creating a separate page for each language.
  3. Not clear how to handle Translingual sections. An entry like cm is relevant to someone interested in French as much as it is to someone who is interested in English. However, if this change is undertaken and a person searches for something like a with French as the language, they might never see the Translingual entry which may contain the sense they are looking for.
    • Ideally, Translingual sections would be transcluded onto every page regardless of language, but this would need extra software support. Alternatively, each language page could simply link to the Translingual entry in a prominent way.
  4. If the new structure is done by convention, that convention will need enforcing and checking, to make sure nobody creates entries with names that don't fit the pattern. If we can get the software to check this for us (perhaps via abuse filters), this is less of an issue.
  5. Links in definition lines, which are currently raw links and meant to point to English (and sometimes even Translingual) sections, will have to be replaced, either with a link template like {{l}} or possibly with a template that encloses an entire definition.
  6. Disambiguation pages would become necessary, which would require regular bot-updating.
  7. A new search interface would be needed in which the language of interest could be entered in a straightforward way (similar to the translation-adding gadget).
  8. Some languages have slashes as part of their standard orthography, e.g. Iraqw /ameeni, or have slashes in special situations, e.g. English /b/tard. These would need some special handling.
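
As a rough illustration of items 4 and 8, the following Python sketch checks whether a title fits the naming pattern while still allowing slashes inside the term. It is only a sketch under assumptions: the two function names are hypothetical, the set of codes is illustrative, and it models nothing but the string splitting. Whether MediaWiki itself would accept or reinterpret such titles is a separate question.

```python
def split_language_first(title: str, codes: set[str]) -> tuple[str, str]:
    """Split '<code>/<term>' at the FIRST slash, so the term may contain slashes."""
    code, sep, term = title.partition("/")
    if not sep or not term or code not in codes:
        raise ValueError(f"title {title!r} does not fit '<code>/<term>'")
    return code, term


def split_word_first(title: str, codes: set[str]) -> tuple[str, str]:
    """Split '<term>/<code>' at the LAST slash, so the term may contain slashes."""
    term, sep, code = title.rpartition("/")
    if not sep or not term or code not in codes:
        raise ValueError(f"title {title!r} does not fit '<term>/<code>'")
    return code, term


CODES = {"en", "irk"}  # illustrative codes only

print(split_language_first("irk//ameeni", CODES))  # ('irk', '/ameeni')
print(split_language_first("en//b/tard", CODES))   # ('en', '/b/tard')
print(split_word_first("/b/tard/en", CODES))       # ('en', '/b/tard')
```

The split is unambiguous in both layouts as long as language codes themselves never contain a slash. A check of this kind could reject titles that lack a known language part, which is the sort of enforcement item 4 asks for.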

Other tasks that will be necessary

  1. Consider removing all bot flags, as the structural alterations may cause bots to malfunction. The flags can be re-added, preferably without a vote, once the operator updates the bot or confirms it is unaffected by the changes.
  2. Update policy pages, especially Wiktionary:Entry layout explained, to reflect the changes.
  3. If subpages are used, the wiki configuration file needs to be updated so namespace 0 allows subpages.
  4. All interwiki links (stored at Wikidata) will have to be fixed, as well as incoming links from Wikipedia.
  5. All raw links must be converted to language-specific links, including in definitions, which would probably mean that all definitions would have to be wrapped in a template.

An attempt to make a full list of tasks, in roughly chronological order, is at Wiktionary:Per-language pages proposal/Tasks.

Implementation ideas and proposals

Page structure

Subpages, language/word

Advantages:

  1. Supported by the current MediaWiki software.
  2. Structures pages and URLs the way people look them up. Language first, then word.
  3. Linking to another word in the same language is relatively easy, using ../new word. This doesn't require knowing the current language.
  4. Allows the main page of each language (the one with no subpage) to be used as a portal or index.

Disadvantages:

  1. Doesn't mesh easily with our existing structure. If a user looks up "word" without a language, then it's not clear where to send the user, because the mainspace only contains pages beginning with a language (name or code). Thus, all links will require a language to be specified unless a workaround is found.
  2. Because it uses subpages, certain page titles will be reinterpreted by the software.
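
The relative linking described in advantage 3 can be sketched in Python as follows. This is a simplified model of how a ../new word link resolves against the current page, not MediaWiki's actual implementation; the function name resolve_link is hypothetical.

```python
def resolve_link(current_title: str, link: str) -> str:
    """Resolve a '../word' link against the current page title.

    Links that do not start with '../' are returned unchanged.
    """
    if not link.startswith("../"):
        return link
    parent, sep, _ = current_title.rpartition("/")
    if not sep:
        raise ValueError(f"{current_title!r} has no parent page")
    return f"{parent}/{link[3:]}"


# On the page fr/foo, a link to ../bar stays within French.
print(resolve_link("fr/foo", "../bar"))  # fr/bar
```

In this model the link never mentions the language, so the same wikitext works on any language's pages. It also hints at disadvantage 2: a term that itself contains a slash would be treated as a nested subpage, so resolve_link("en//b/tard", "../foo") yields en//b/foo rather than en/foo.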

Subpages, word/language

Advantages:

  1. Supported by the current MediaWiki software.
  2. Resembles our current structure more closely, and therefore easier to convert over, and easier for existing editors and users to get used to.
  3. Allows the main page of each word (the one with no subpage) to be used as a disambiguation page.
  4. Because of the above, existing links will not break; they just lead to the disambiguation page. This is not as convenient, but at least it's not broken, and we could use a tabbed-languages-like solution to bridge the gap until all links are fixed to point to the correct language subpage.

Disadvantages:

  1. Represents a compromise, because it is still oriented somewhat towards the word-first structure that is currently used.
  2. Less ideal for linking; linking to another word in the same language requires new word/{{SUBPAGENAME}}. This is mitigated by our extensive use of linking templates with language codes, which can be converted to this new format easily.
  3. Because it uses subpages, certain page titles will be reinterpreted by the software.
  4. Every time a new entry is created, the hub page must be created as well. This could be solved by bots, or by making modifications to the software so that it auto-generates these pages if they don't exist.
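
For disadvantage 4, a bot would need to find every word that has language subpages but no hub page yet. A minimal Python sketch of that check, assuming titles of the form word/code and a hypothetical function name and set of codes, might look like this:

```python
def missing_hub_pages(existing_titles: set[str], codes: set[str]) -> set[str]:
    """Return hub (word-only) pages implied by '<word>/<code>' subpages but not yet created."""
    hubs = set()
    for title in existing_titles:
        # Split at the last slash so that words containing slashes still work.
        word, sep, code = title.rpartition("/")
        if sep and word and code in codes:
            hubs.add(word)
    return hubs - existing_titles


pages = {"foo/en", "foo/fr", "bar/pt", "bar"}
print(missing_hub_pages(pages, {"en", "fr", "pt"}))  # {'foo'}
```

Here bar already has its hub page, so only foo is reported. A real bot would also have to create and maintain the page contents, which this sketch leaves out.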

What should go on the "main" page of each word?

Disambiguation page

This would be a list of all subpages, but no content.

Advantages:

  1. Clean and simple, easy for users to find the information they need.

Disadvantages:

  1. Users who don't have JavaScript enabled, or who follow an old link created before the change, have to click through to a specific language before seeing anything informative.
  2. There would be no way of easily finding a word if the user does not first know which language it belongs to. This is probably not a common occurrence, however, and other dictionaries assume the language is known too.

Content access

Instead of a single input for searching, there needs to be one for the term and one for the language. The language should be remembered, as the user's "current language".

However, not all searches are done through the search box. People may have search box plugins for Wiktionary in their browser, or they might just find it more convenient to type the word into the URL. In addition, not all links on Wiktionary use linking templates like {{l}} yet, so we must assume for the foreseeable future that some of the incoming lookup requests that Wiktionary receives will not have a language specified. There needs to be a convenient way to handle these cases.

Default to current language

This would be like tabbed languages, but server-side. Once the current language is known, if the user types a word by itself, bypass any disambiguation pages and always send the user to the word in that language.

If there is only one language for a word a user looks up, what should be done if that's not the user's current language? Sending them there regardless would be surprising and confusing for a user (look up Ukrainian, get Russian?). Sending them to the nonexistent page for that language is probably better.

A disadvantage is that this breaks "wikified" text, such as definitions. These would end up linking to the current language, not to English.

This would not affect links that have a language specified, so cross-language links (translations, descendants, etymologies) should work normally. Should following such a link change the current language (the way tabbed languages does), or should the current language be kept? That is, if you visit an English entry, should your current language then be set to English, so that all further lookups are assumed to be for English terms?

What should the current language be when none has been set yet, because the user started a new session? Should it be English, or should it default to disambiguating all languageless links?
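
The lookup rule discussed in this section, together with its open questions, can be sketched in Python. This is only an illustration under assumptions: the Session class, its field names and the tuple results are all hypothetical, and the two open questions are modelled as settings rather than answered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    current_lang: Optional[str] = None   # None: the user has not chosen a language yet
    default_lang: Optional[str] = None   # e.g. "en", or None to disambiguate instead
    links_set_language: bool = True      # should language-specific links change the current language?

    def lookup(self, term: str, lang: Optional[str] = None) -> tuple[str, ...]:
        """Decide where a lookup goes; lang is set only for language-specific links."""
        if lang is not None:
            if self.links_set_language:
                self.current_lang = lang
            return ("entry", lang, term)
        chosen = self.current_lang or self.default_lang
        if chosen is None:
            return ("disambiguation", term)
        # The entry is returned even if it does not exist in that language,
        # rather than silently switching to another language.
        return ("entry", chosen, term)


session = Session()
print(session.lookup("foo"))         # ('disambiguation', 'foo')
print(session.lookup("foo", "uk"))   # ('entry', 'uk', 'foo')
print(session.lookup("bar"))         # ('entry', 'uk', 'bar')
```

Setting default_lang to "en" models a new session that starts in English, while leaving it as None models disambiguating all languageless links until a language is chosen.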

Assume languageless links are English

Another possibility is to assume links with no language are English. This makes sense from a wikification point of view: most of our entries still use raw links rather than language-specific links in their definitions, so those links would not be broken. But it's less convenient for lookup.

Past discussions