Wiktionary:About Japanese-English bilingual
Japanese-English bilingual wiktionary?
As suggested in the Beer parlour, I have started this page to explore options for having a Wiktionary-based Japanese-English bilingual dictionary based on an upload of the EDICT (or more likely JMdict.xml) files.
Original Beer Parlour Discussion
The following is the original discussion in the Beer parlour, where it is under: Bilingual Dictionaries, esp. Japanese-English
I want to raise the question of bilingual dictionaries within the Wiktionary environment. I have read all the relevant pages I can find, but cannot find any specific mention of it. (I note that Wiktionary describes itself as a "multilingual dictionary", but what I see is a set of monolingual dictionaries.)
I have a motive for raising this. For the past 14 years I have been running the largeish EDICT Japanese-English dictionary project (see: http://www.csse.monash.edu.au/~jwb/edict.html). The component files of that project are used in a lot of products and servers, and there is a WWW system for putting in corrections. Until now I have been the sole editor/compiler, but I am seriously considering opening it up to a more public access/contribute/edit arrangement, and hence the Wiki community may well be a good environment.
While my interests are primarily Japanese-English, the XML version of my main file has Japanese-German, Japanese-French, etc. as well. These parts are folded in (see http://www.csse.monash.edu.au/~jwb/j_jmdict.html).
On the matter of size, I look after the following:
- EDICT/JMdict - general (103,000 entries)
- COMPDIC - computing & telecomms (15,000 entries)
- ENAMDICT/JMnedict - Japanese proper names (511,000 entries)
- KANJIDIC - kanji database (14,000 entries)
So, is there scope to fold one or more of those files into the Wiktionary environment? Is there interest in doing it? How does one go about it?
Looking forward to seeing what people think about it. JimBreen 03:24, 14 July 2005 (UTC)
- Hey, I've found your dictionaries to be most useful (heck, with them I haven't had to worry about buying dead-tree resources for Japanese yet).
- Anyway, what wiktionary is trying to be is both a monolingual dictionary and a bilingual dictionary at once: the English wiktionary is both an English dictionary and an English-everything everything-English dictionary; the German wiktionary is a German dictionary and a German-X, X-German dictionary, etc.
- The addition of your information would be most valuable... what you would basically do is set up or find someone to set up a bot to add the material, formatted according to our standard article format. (For the Japanese-German, Japanese-French, etc. you will want to ask on the relevant other language wiktionaries—general consensus is that kind of information doesn't belong on the English wiktionary. Certainly the Japanese wiktionary would benefit from all this information as well.) —Muke Tever 06:30, 14 July 2005 (UTC)
- Thanks, Muke, but if the only way to go involves going over to the ja.wiktionary community, knocking on the door, and saying "can I roll this heap of Japanese-English entries into your pitch", well I'd probably just back off and go and do my own thing.
- I don't want to be snobbish, but (a) the ja.wiktionary people have spent several years going nowhere (only 2,000 entries, and just look at the deep and meaningful one for 花), but FAR more importantly (b) bilingual lexicography is a very different animal to monolingual, and cannot ever be seen as the sole province of the native speakers of either language. Moreover the principles of compilation, layout, etc. are rather different.
- What I am really trying to sound out is a bilingual structure within wiktionary as a whole; not the addition of multilingual material to the English dictionary, or the Japanese one. I'm raising it here because clearly most of the drive within Wiki is coming from English speakers. The standard article format, for example, is not really suitable for bilingual material, and particularly for Japanese where there are often multiple valid variants of a word. JimBreen 07:17, 14 July 2005 (UTC)
- Well, I can't say much for ja.wiktionary, not being part of its community, but it (like all of the non-en wiktionaries) was only started a year ago and actually had its database frozen for most of its existence due to internal disputes, so it can't be faulted for not having too much progress yet.
- True, but that dispute is a concern in itself. JimBreen 01:08, 15 July 2005 (UTC)
- At any rate, the bilingual structure is not, as far as I'm aware, intended to be any different from the monolingual structure; Wiki is not paper, and the aim is for our article on 花, for example, to be as comprehensive as, or more comprehensive than, an article on 花 in any massive monolingual Japanese dictionary. (However, a big obstacle is that most individual Han characters were inserted (as characters, not words) in a weird format by a bot years ago, and most people haven't really been working on converting them into ordinary article format, AFAIK. This is something I was working on when I was still a regular contributor here.)
- Also that 花 entry is a case of an entry relevant to a "character" dictionary; not a "word" dictionary. To illustrate what I mean, compare the 花 entry in KANJIDIC and EDICT. In a language like Japanese you need both forms, and they should be interlinked, but for a number of reasons the format and treatment are different. JimBreen 01:08, 15 July 2005 (UTC)
- As for the multiple valid variants... that's something the English wiktionary has not really worked out yet. Current practice, as far as I know is either "every variant gets an article" or "every variant gets a #redirect to an article," depending on the editor and whether a #redirect is feasible, neither of which I like. —Muke Tever 16:23, 14 July 2005 (UTC)
- Hi Jim,
- What we try here is to describe all words of all languages in English. If you want help in creating a bot that can interpret your xml files and add the information contained within them, I can add functionality to the pywikipediabot framework to accomplish that. It won't be a bilingual dictionary though. It will simply be a collection of entries with links to their English counterparts. Don't ask us to change the way we work. It has been multilingual from the start. I'm sure the content you propose is very worthwhile and we would love to incorporate it into Wiktionary. The form would change in the process though, adapted to the standards we developed. In a way it would be bilingual after all. All entries describing Japanese only have links to translations in English. From there a link will go back to the same and other Japanese words, but also to many other languages.
- How do you propose to take care of mentioning the data comes from you? Would you like it to be added in the summary field (that's easy)? Is that enough? It's also possible to add it to the talk pages, but I don't think that is realistic or practical for the amount of pages. The last option is to add it as content on the pages themselves. But I think that should be avoided. The entries should remain clean and tidy describing the words.
- Do you feel like running the bot? That is entirely possible. It also makes it easier to find all the entries that were added by the bot. It will take some time though. Since the bot changes actual content, I prefer it to be interactive. The bot does all the hard and boring work. The operator checks it and corrects where necessary. Especially difficult are the entries that already exist, and there will be quite a few. At first the merging will need to be performed manually.
- Let me know on my talk page, if you want me to have a look at it. The bot framework is written in Python. This language is relatively easy to learn and work with. Polyglot 17:33, 14 July 2005 (UTC)
- Jim, your suggestions have exciting possibilities. Using a bot to massively apply material is certainly a possibility, but there are a few steps that need to be gone through before we get to that. We need to know just what task that bot will be asked to perform. My suggestion would be to put together a handful of representative articles in a way that at least attempts to conform to our usual page layouts, but reflects what you consider to be important material on these pages. Once these pages are there we will have a basis for determining whether an accommodation can be worked out. Don't be too worried by what's on Wiktionary:Entry layout explained; there is still room for flexibility.
- I agree with Muke that our handling of multiple variants has been less than stellar, but even without that I feel that this project has gone a long way in a short time since its inception on Dec. 12, 2002. I can't speak for the ja:wiktionary, but it is a younger project, and each project is free to set most of its own policies. Until now the emphasis has been on building a basic volume of material, but I would hope that as time goes on we will be able to layer in new levels of sophistication. I look forward to results that will be to our mutual satisfaction. Eclecticology 21:17, July 14, 2005 (UTC)
Thanks for the responses so far. For me this is still very exploratory. While there are advantages in migrating to an established environment such as Wiktionary, there are also downsides which would certainly involve compromises. In some areas I can change; in others I will not, as there are some major points of principle involved (e.g. having spent years tracking down orthographical variants and bringing them into unified entries, I won't see them blown apart. Automatically-generated non-editable sideways-references would be OK, multiple entries would not.)
Anyway, where is the best place to progress discussion on this? Here? The mailing list? Should I be proposing a new project on bilingual dictionaries? JimBreen 01:08, 15 July 2005 (UTC)
- Asking where the discussion might be continued is at least a sign that the idea is worth further exploration. I think that a new page like Wiktionary:About Japanese bilingual and its associated talk page might be a good place to carry forward. ("Bilingual" in that link was just a quick suggestion; feel free to change it to something more appropriate if you want.) This page is great for early reactions about an idea, but becomes unworkable when we need to go into more details. The participants in the mailing list tend to be a much smaller subset of what you find here, so that there is a lower likelihood of reaching those who could be seriously interested. The page Wiktionary:About Japanese is one active person's attempt to adapt standard formats to Japanese; I'm sure that he will be prepared to listen to your needs.
- I agree in principle with maintaining unified pages of variants. Non-editable references could be more of a problem, since one of the key concepts underlying wikis is open editability. This may not be as big a problem as it at first seems. The problem people that we encounter mostly avoid these more esoteric topics. Records of changes are maintained so that the unacceptable ones can be reversed. I look forward to example pages in our project so that we can begin looking for common ground. Hopefully there will be enough to begin generating interest at Wikimania. Eclecticology 06:43, July 15, 2005 (UTC)
- Hi Jim,
- I cleaned up the Beer parlour. Please let me know on my talk page when you have created a Wiktionary page that reflects the contents of one or a few of your XML structures. Then I can have a look at the feasibility of creating a bot to speed up the transfer. Polyglot 05:51, 16 July 2005 (UTC)
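To make the bot idea discussed above more concrete, here is a minimal sketch in Python of how one JMdict-style entry might be turned into a single unified page that keeps its orthographical variants together. It is only an illustration under stated assumptions: the element names (`entry`, `k_ele`/`keb`, `r_ele`/`reb`, `sense`/`gloss`) follow the JMdict XML structure, but the sample entry, its sequence number, the function name `entry_to_wikitext` and the output page layout are all made up for this example and are not an agreed Wiktionary format or part of the pywikipediabot framework.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a JMdict entry; the sequence number
# and the choice of variants are invented for illustration only.
SAMPLE = """<entry>
  <ent_seq>0000001</ent_seq>
  <k_ele><keb>花</keb></k_ele>
  <k_ele><keb>華</keb></k_ele>
  <r_ele><reb>はな</reb></r_ele>
  <sense><gloss>flower</gloss><gloss>blossom</gloss></sense>
</entry>"""


def entry_to_wikitext(entry_xml):
    """Return (page_title, page_text) for one JMdict-style entry.

    All spellings stay on one page: the first kanji form becomes the
    title and the others are listed as alternative forms, instead of
    each variant getting its own article.
    """
    entry = ET.fromstring(entry_xml)
    spellings = [e.text for e in entry.findall("k_ele/keb")]
    readings = [e.text for e in entry.findall("r_ele/reb")]
    senses = [[g.text for g in s.findall("gloss")]
              for s in entry.findall("sense")]

    # Kana-only words have no kanji element, so fall back to the reading.
    title = spellings[0] if spellings else readings[0]

    # Assumed page layout, not the official entry layout.
    lines = ["==Japanese==", "'''%s''' (%s)" % (title, ", ".join(readings)), ""]
    if len(spellings) > 1:
        lines.append("Alternative forms: " + ", ".join(spellings[1:]))
        lines.append("")
    for glosses in senses:
        lines.append("# " + "; ".join(glosses))
    return title, "\n".join(lines)


if __name__ == "__main__":
    page_title, page_text = entry_to_wikitext(SAMPLE)
    print(page_title)
    print(page_text)
```

A real bot would still need an operator to review each generated page and to merge it by hand with any entry that already exists, as noted in the discussion.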