User:Juho/Thoughts on improving syntax and software for wikipedia

Thoughts on improving syntax and software for wikipedia to advance the project

I've glanced at the way Wiktionary works at the moment, and I think that this technological platform and free-form syntax work OK for Wikipedia, but pose a few problems in Wiktionary.

A clearly defined syntax combined with some additional software to build automatic dependencies could:

Reduce the amount of unnecessary, redundant manual editing required to reach the final goal of creating a dictionary and thesaurus in every language.

Provide protection from vandalism.

Question:

If a word is known to be ONLY a noun in languages a, b and c
and it is also known that in languages a, b and c the word is UNAMBIGUOUS...

then why the %&€#&% does it have to be manually translated

from a to b
from a to c
from b to a
from b to c
from c to a
from c to b

in order to achieve a dictionary in all three languages of the planet?????

If you have n languages and one word which corresponds to exactly one word in each of the other n-1 languages, then in the current system you'll have to make n(n-1) = n^2-n entries. If you use the symmetry between 2 languages (e.g. you know free<-->frei) you'll still have to write (n^2-n)/2 entries. If you use transitivity (free<->frei<->libre) you will only have to make n entries. But things get more complicated once one word has more than one meaning. As someone noted somewhere else, you should translate meanings and not words. But sometimes there is no single word for a translation. Then you need to translate to a phrase.

But to add a lot of those phrases to a language is also uncomfortable....

- Henryk911 12:20 Feb 22, 2003 (UTC)
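To make the arithmetic above concrete, here is a small sketch in Python. The function name and the labels are made up for illustration; the three formulas are the ones given in the comment above, for one unambiguous word across n languages.

```python
def entries_needed(n: int) -> dict[str, int]:
    """Count the translation entries needed for one unambiguous word
    that exists in n languages, under the three schemes discussed above.
    (Hypothetical helper, for illustration only.)"""
    return {
        # Current system: every language lists every other language.
        "every ordered pair": n * (n - 1),
        # Using symmetry: each pair of languages is entered once.
        "symmetric pairs": n * (n - 1) // 2,
        # Using transitivity: one entry per language, as stated above.
        "transitive": n,
    }


for n in (3, 10, 100):
    print(n, entries_needed(n))
```

For the three-language case (a, b, c) this gives the six manual translations listed in the question, against three with transitivity; the gap grows quadratically as languages are added.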

Example:

If bottle (noun) (English) is known to be only a noun in English, and it is also known that it translates to Finnish only as pullo (noun), which is only a noun in the Finnish language, then why isn't this correlation automatically propagated to other languages?

For example: If someone enters flaska (noun), known to be only a noun in Swedish, in the Swedish Wiktionary and translates it to English as bottle, why can't there be a software component that automatically adds the Finnish translation to the Swedish Wiktionary? Additions made to Wiktionaries by software could have a different colour until some human has checked them to be sane.
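A rough sketch of what such a software component might do, in Python. This is not how the wiki software works; the function `propagate` and the "human"/"machine" labels are assumptions made up for this example. It assumes every word given to it is already known to be unambiguous and of a single class, and it simply follows human-entered translation links transitively, marking anything it derives itself as "machine" so that it could be shown in a different colour until checked.

```python
from collections import defaultdict


def propagate(human_links):
    """human_links: pairs of (language, word) entries that a human has
    marked as translations of each other. All words are assumed to be
    unambiguous. Returns, for each entry, its translations with a
    status of "human" (entered directly) or "machine" (derived)."""
    neighbours = defaultdict(set)
    for a, b in human_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    result = {}
    for start in list(neighbours):
        # Collect every entry reachable from `start` through human links.
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result[start] = {
            other: ("human" if other in neighbours[start] else "machine")
            for other in seen
            if other != start
        }
    return result


links = [
    (("English", "bottle"), ("Finnish", "pullo")),
    (("Swedish", "flaska"), ("English", "bottle")),
]
table = propagate(links)
print(table[("Swedish", "flaska")])
```

Here the Swedish entry for flaska gets bottle as a human-entered translation and pullo as a machine-derived one awaiting review.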

Answer:

Because there is no shared namespace where all language namespaces would reside, and in the namespace of a language there is no namespace for classes of words, and there is no agreement on how to mark up whether a word in language X, class Y can have many distinct meanings, in which case it would be excluded from automated propagation.

and

Because there is no definitive syntax guide, which is just silly. Without one, the syntax cannot be enforced or utilised by software to do such automatic propagation of information.

Problems in implementing such a scheme include:

- Words can belong to multiple classes: English "can" is a verb and a noun, but this could be solved with consistent syntax/markup.

- Words can mean multiple things in the same class: The words for peace (noun) and world (noun) are the same in Russian (Mir). This could be solved by asking people to mark whether a word means many things (true/false) in the same class, in which case it would not be propagated by software.

- Not all languages follow the western style of classifying words as noun, verb, adjective and so forth.

- Other problems?

Proposition

I propose that we use namespaces for languages and classes of words. This would of course require some changes in the software.

for example:

en:noun:true:bottle
fi:noun:true:pullo
sv:noun:true:flaska

language:class:this_is_the_only_meaning_in_this_class:word

...or something like that. I'll elaborate on this.
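As a sketch of how software could read keys in this format, here is a minimal parser in Python. The names `EntryKey`, `parse_key` and `may_propagate` are invented for this example, and the `en:noun:false:can` key is a hypothetical entry, not one proposed above. Only the four-part layout language:class:this_is_the_only_meaning_in_this_class:word comes from the proposal.

```python
from typing import NamedTuple


class EntryKey(NamedTuple):
    language: str
    word_class: str
    only_meaning: bool  # True if this is the only meaning in this class
    word: str


def parse_key(key: str) -> EntryKey:
    """Parse 'language:class:flag:word' into its four parts."""
    # Split at most three times, so the word itself may contain a colon.
    language, word_class, flag, word = key.split(":", 3)
    if flag not in ("true", "false"):
        raise ValueError(f"flag must be 'true' or 'false', got {flag!r}")
    return EntryKey(language, word_class, flag == "true", word)


def may_propagate(key: str) -> bool:
    """Only entries marked as the single meaning in their class are
    candidates for automated propagation."""
    return parse_key(key).only_meaning


print(parse_key("en:noun:true:bottle"))
print(may_propagate("fi:noun:true:pullo"))
print(may_propagate("en:noun:false:can"))  # hypothetical ambiguous entry
```

A fixed, machine-checkable layout like this is what would let software reject malformed keys and skip ambiguous words, which the free-form syntax does not allow.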

I agree on the namespaces for languages. But the concepts of nouns and so on are not common to every language, so you'll lose flexibility. Maybe you should move this discussion to Beer parlour! Henryk911 02:35 Feb 22, 2003 (UTC)

This discussion sounds like it's old, but I thought I'd chime in anyway. I'm fairly new to the Wiktionary, and I'm already beginning to worry a little bit about whether or not a good XML definition would save us thousands of hours of work down the road. I disagree that an XML DTD would eliminate any redundancy in the translation area. Languages are just too messy, and there's no such thing as a noun with a single meaning in my opinion. Having said that, I think some more structure in the way we record translations would make a huge difference. I'm also concerned about my favorite issue (quotations), which could overwhelm the other data as well. Wouldn't it be nice if we had enough structure to display them conditionally? -- CoryCohen 03:09, 9 Sep 2004 (UTC)