Wiktionary talk:Categories


Various 2004

This was copied from the Beer parlour:

What about Category:English nouns? 213.228.42.109 11:24, 2 Jul 2004 (UTC)

"Categories" is a powerful tool which I would be loath to abuse. By categorizing articles according to whim we can only end up with an unworkable mess that is not of much use to anybody. The English nouns category is potentially huge. How would a non-contributing reader use it?

We need some sharing of ideas about how categories can best serve the purposes of Wiktionary before people go off into worlds of their own. Eclecticology 17:36, 2 Jul 2004 (UTC)

Yes, Categories is a bit powerful, especially for something like "English nouns". The Esperanto Wiktionary seems to be using them to replace indices (ex: eo:Kategorio:EN) with subcategories for each initial letter (the list of English words starting with "A" is probably smaller than the list of English nouns, but that's still a lot...). On the Latin Wiktionary I've been using categories for etymology, (ex: la:Category:Radices Graecas) with subcategories for individual roots (somewhat like the AHD does). This may not amount to smaller categories at the by-language level but I think it is more useful than plain indexing. —Muke Tever 21:37, 2 Jul 2004 (UTC)
One thing we could be doing with categories that we are currently doing by hand is the Rhymes indices. —Muke Tever 21:37, 2 Jul 2004 (UTC)

Again there are new contributors running around categorising English nouns and pronouns by person but not by number or formality or reflexiveness or personal vs other. I really think we need to start talking about this and decide what type of categories we do and don't want, and how wide or narrow the categories should be if we are going to categorise every single entry. — Hippietrail 03:07, 14 Jul 2004 (UTC)

I'm glad to see another voice in this discussion. I do find it difficult to suggest what might be the best approach to categorizing. I still think that "English nouns" is wasteful, as it would also be with verbs, adjectives and adverbs, but I would still be more open to conjunctions and prepositions. Even then the reversed order "Prepositions in English" could be more useful. Subcategorizing the pronouns spreads things too thin, and you can end up with one-item categories. The items in a category should have something in common; all we are saying about the item in a single-item category is that it has nothing in common with anything. The piping symbol in categories is used to force a particular alphabetization. Thus [Category:Pronoun|P] and [Category:Pronoun|R] could be used to force separate lists for personal and reflexive pronouns on the category page. In other circumstances using language code pipes could have a similar effect.
Etymology as a cataloguing entry may be more useful for Latin which tends to have a more scholarly approach, but I don't think that most of the English contributors have reached the level of sophistication needed for this.
Some time ago I advocated the separation of the pseudonamespaces "Index:" and "Appendix:". The distinction would have had "Index:" used for languages, and "Appendix:" used for other kinds of lists. I still think that the distinction is a useful one, and would have relieved an already overburdened "Wiktionary:" namespace. I've hesitated to proceed because of the tremendous amount of work needed to ensure that we would not end up with a lot of broken links. "Concordance:" also came up as a possibility, and our Sherlock Holmes fan made some very strong points in defence of what he was doing; unfortunately he hasn't been actively developing his own idea, so we can ignore that issue in the short term.
There has been some discussion of the role of "List of ..." articles in Wikipedia, and whether these are effectively replaced by categories. I don't think so. A category is a bottom-up representation, and nothing can appear in a category unless it already exists. This is not the case with a top-down "List", whose elements can be items that we merely wish to have.
I think that the most important question to be asked is, "How can a category be most useful to the user?" If a user wants to find certain information, how does he go about doing this? If he knows the word and just wants to find out what it means, the ordinary search function will do just fine. I know that extremely long and extremely short categories are mostly useless, but beyond that I'm not sure of what I'm looking for. Eclecticology 16:55, 14 Jul 2004 (UTC)

If you look in Special:Categories, you will see a number of different types of categories emerging:

1.) Those attempting to include all entries in an open class (like English_nouns or English_slang).
2.) Those attempting to include all entries in a closed class (like English_pronouns).
3.) Those attempting to construct a theme (like Harry Potter).
4.) Ontological categories (like English_parts_of_speech, which includes the articles on noun, verb, etc.)
5.) Thesaurus-like categories (well, there aren't any yet)
6.) Meta categories (like French_index)

For tracking open classes (1), it seems silly to rely on people both putting the right header in the article and assigning the word to the appropriate category. That is to say, every time someone created a definition for an English noun, they'd have to remember to add it to Category:English_nouns. Also consider what happens when that category is better populated. When a class has hundreds of thousands or millions of members, the category display mechanism: a.) might not be the most computationally efficient way to store and retrieve membership data, and b.) won't produce a space-efficient or user-friendly layout.

I propose there be a Special page or pages that track open classes (which is useful in certain ways). A script could be periodically run that classifies as an English noun all the articles that have ===Noun=== inside an ==English== section, and so on for all interesting permutations. The appropriate Special page could be a member of Category:English_nouns.
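The periodic script proposed above could look roughly like the following. This is only a minimal sketch, assuming entries use `==Language==` and `===Part of speech===` headers exactly as described; the function names `classify` and `is_english_noun` are hypothetical and not part of any wiki software.

```python
import re

# Level-2 headers name the language section, level-3 headers the part of speech.
# The [^=] guard stops "===Noun===" from being read as a level-2 header.
LEVEL2 = re.compile(r"^==\s*([^=].*?)\s*==\s*$")
LEVEL3 = re.compile(r"^===\s*([^=].*?)\s*===\s*$")


def classify(wikitext):
    """Return {language: {part_of_speech, ...}} for one entry's wikitext."""
    classes = {}
    language = None
    for line in wikitext.splitlines():
        m2 = LEVEL2.match(line)
        if m2:
            # A new language section starts here.
            language = m2.group(1)
            classes.setdefault(language, set())
            continue
        m3 = LEVEL3.match(line)
        if m3 and language is not None:
            classes[language].add(m3.group(1))
    return classes


def is_english_noun(wikitext):
    """True when a ===Noun=== header sits inside the ==English== section."""
    return "Noun" in classify(wikitext).get("English", set())
```

Because the classification is derived from the headers, contributors would not need to remember a separate category tag; the same function covers the other language and part-of-speech permutations.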

The use of Wiktionary for ontological purposes (4) - essentially making it a wiki-based variant of WordNet or OpenCyc - seems interesting. The category system is a logical choice for making the maintenance of is-a-type-of relationships easy (hyponyms and hypernyms). Arguably, ontological categories should not be connected to anything outside their strict hierarchy, except by article pages. Depending on licensing terms, the easiest thing to do might be to wikify the open portions of OpenCyc, preferably in an automated fashion.

Categories would also seem to be nice for thesaurus-making purposes; they might even replace "Synonym" and "Antonym" headers. Except that there's not really a way to assign only one sense of a word to a category, as opposed to the whole article.

They also seem handy for defining closed classes, constructing themes, and meta-organization.

Things could get rather ugly in terms of navigation, namespace conflicts, automated extraction, and hierarchy co-mingling. Perhaps we should designate distinct namespaces for each of these "projects". Some of them need some relatively strict guidelines to make them useful; others can be more freeform.

-- Beland 05:59, 18 Jul 2004 (UTC)

I am pro a thematic + language category:

I am in favour of a thematic + language category:

This allows one to see all the words used in a theme in a given language. Moreover, each such category will be linked to the same category explained in another language. See: Category:English units and fr:Catégorie:Lexique en anglais des unités de mesures

62.147.114.112 22:46, 21 Jul 2004 (UTC)

In response, would something like [[category:architecture|en]] and [[category:architecture|fr]] be workable? Remember that the pipe works differently in the category namespace. It designates the way things will be sorted on the category page. Eclecticology 04:55, 2 Aug 2004 (UTC)
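To illustrate the effect of the pipe, here is a simplified model of how a category page groups its members under the first character of each sort key. It is a sketch under stated assumptions, not the actual wiki implementation: the function name `category_listing` and the member data are hypothetical.

```python
from collections import defaultdict


def category_listing(members):
    """Group category members by the first character of their sort key.

    members: iterable of (page_title, sort_key), where sort_key is the text
    after the pipe, or None when the link has no pipe.
    """
    groups = defaultdict(list)
    for title, sort_key in members:
        key = sort_key if sort_key else title  # no pipe: sort by the title
        groups[key[:1].upper()].append((key, title))
    # Sort the headings, then the entries under each heading.
    return {
        heading: [title for _, title in sorted(items)]
        for heading, items in sorted(groups.items())
    }


# Hypothetical members of Category:architecture, piped with language codes.
listing = category_listing(
    [("house", "en"), ("maison", "fr"), ("arch", "en"), ("arc", "fr")]
)
```

With language codes as sort keys, the English words collect under "E" and the French ones under "F", which is the separation suggested above.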

A suggestion

How about using categories to split entries into different categories? Like, for example, the following (I know it has loads of problems, but bear with me here)


  • root level
    • language
      • chinese
        • some form #1
        • some form #2
      • english
        • olde
        • British
        • simplified
        • american
    • type
      • verb
        • tense
          • past
          • present
          • future
          • present continuous
          • you get the idea
      • adjective
      • adverb
      • noun
    • something else


Imagine how easy that would make it to find a verb in the present continuous that starts with the letter g and is popular mostly amongst British speakers.

Or let's say we are looking for a list of adjectives that start with the letter B, or something similar.

it seems like a GREAT use for categories - Anonymous

Rather than having any definition directly under a category of the type of speech ("noun"), they should go under categories like "Units of measurement", "Chemicals", "Body parts". Having definitions under the type of speech at all might not be a good idea, though, because more than one type of speech is properly on the same page, most often noun and adjective. More complicatedly, because of the broken practice of having different words of entirely different meaning and origin on the same page, you run into the problem of having definitions under categories to which they do not belong at all, or to which they are not even remotely related. That is incidentally one reason why no other dictionary in the world puts words under the same definition simply because of their spelling... - Centrx 23:36, 4 Aug 2004 (UTC)

The more I think about it, the more I agree that the parts of speech are not a fruitful way of setting up categories, especially for the major ones of noun, verb, adjective and adverb. There would be just too many in those categories. Eclecticology 04:35, 5 Aug 2004 (UTC)
I agree that the categories would be too big, but if one is to implement the idea above with a script I think it would be more efficient if it searched for a {{ennoun}}-tag instead of the suggested nested headers. The tag wouldn't necessarily be visible in the article. That would be more flexible if one wants to add other kinds of classifications, and has the advantage that it doesn't create five lines with category links, as could happen if someone thinks that not only en nouns should be classified. (Think of modem to get an idea about what I'm thinking of...) \Mike 10:09, 10 Aug 2004 (UTC)
If people find this tagging system too confusing they simply won't bother with it. Making it not visible would add even more complication to the writing. From a passive user's perspective, will the information that he gets from a long list of English nouns be useful? Eclecticology 08:26, 12 Aug 2004 (UTC)
That's why I used the big if (maybe not big enough?) :) \Mike 09:48, 27 Aug 2004 (UTC)
I think that etymological and linguistic categorization is a great idea, but should be reined in. Where I think that part of speech categories will go wrong is when words are not subcategorized at all: category:English:Verb could include all verbs, but it could be easily split into category:English:Verb:Transitive|Intransitive:Action|Existence|Occurrence etc., making it far more meaningful and navigable.

A missed? opportunity

If you wish to break new ground in dictionaries, consider the outrageous. Use an xml structure to create a universal dictionary:

<Gestalt id="wikiID">
  <Description lang="">Extended definition of a single idea</Description>
  <Entry type="word">
   <Lang="English" [otherAttributes]="">Word</Lang>
  </Entry>
</Gestalt> 

So a simple example:

<Gestalt id="wiki-red">
  <Description lang="">A color descriptive of visible light in the frequency range of....</Description>
  <Entry>
    <Lang="English">red</Lang>
    <Lang="Spanish">rojo</Lang>
  </Entry>
</Gestalt> 

<Gestalt id="wiki-red:politic">
  <Description lang="">A term used for communists by American conservatives in the 20th century</Description>
  <Entry>
    <Lang="English">red</Lang>
    <Lang="Spanish">comunista</Lang>
  </Entry>
</Gestalt>

<Gestalt id="wiki-1234567">
  <Description lang="English">To harm oneself by overreacting</Description>
  <Entry type="metaphor">
    <Lang="English">
      <Origin>Uncle Bob had a horrible itch on his nose.....</Origin>
         To cut off your nose to spite your face
    </Lang>
  </Entry>
  <Entry>
    <Lang="English">
      <Origin>during the Reformation the Hussites cast off the Catholic church...</Origin>
         To throw the baby out with the bath water
    </Lang>
  </Entry>
</Gestalt>


The Gestalt id becomes the universal index of ideas. Translators could use it to translate ideas rather than words. The xml can be extended any way you like to include everything that has been discussed by everyone.

And you don't have to even have a taxonomy for the gestalt id. It just has to be unique. If it is human readable, it is slightly better.

   wandadubbayou
<Jun-Dai 23:00, 17 Jun 2005 (UTC)> The biggest problem (of many) is that it is sort of reverse to how language works. Most precise ideas don't map well to particular words in a language. Consider things such as 青, which means blue in Japanese. It can also mean green, but it usually means blue. Consequently, it doesn't map so effectively to English blue, which we would never use to describe grass.
      <Gestalt id="3456788">
        <Description lang="Japanese">青</Description>
        <Entry>
          <Lang="English">Usually blue but can mean green. Never used to describe grass.</Lang>
        </Entry>
      </Gestalt> 
      

      -wandadubbayou

Going further, if you take a single concept, such as a particular color (frequency of light), there are often many words in English that could describe it (red, crimson, ruby, etc.), and what's more, some of the words would only be appropriate in a specific context (e.g., clothing), which this format doesn't seem to encourage.

      <Gestalt id="wiki-crimson" memberOf="wiki-red">
        <Description lang="">A color descriptive of visible light in the frequency range of.... only appropriate for clothing</Description>
        <Entry>
          <Lang="English">clothing specific ruby</Lang>
        </Entry>
      </Gestalt> 
      

      -wandadubbayou

  The biggest problem with most translating dictionaries is the limitation with regard to how much space is given to usage, examples, and the specific contexts in which a word in one language can be used and a similar word in the other language cannot (as with 青, which will always have blue as the first item in the translation). I don't know where some people around here are getting the idea that different languages simply have different sets of words that map in some (relatively) easy fashion from one language to another. Nothing could be further from the truth.
      You obviously have missed the whole point. I am proposing a dictionary of ideas... not creating a dictionary of words. -wandadubbayou

I would go so far as to say that, outside of cognates and borrowed words (and maybe even including them), there are so few words as to be negligible that are an almost exact equivalent for another word in another language, and most of them are only sloppy equivalents. Furthermore, some concepts cannot be described well in a particular language, and some simply cannot be described at all.


There is, for example, no word in German that has the meaning that is the main sense of the English ambiguity. It simply doesn't exist in German, and you would have to use an awkward phrase to produce a similar meaning (most dictionaries offer die Zweideutigkeit, which roughly means "double meaning").

      <Gestalt id="34545558">
        <Description lang="English">ambiguity</Description>
        <Entry>
          <Lang="German">[an awkward phrase to produce a similar meaning ].
             <UsualTranslation>die Zweideutigkeit</UsualTranslation>
             <AlternateTranslation when="at birthday parties">dur garbletygookishnieder</AlternateTranslation> 
          </Lang>
        </Entry>
      </Gestalt> 
      

      -wandadubbayou


All of this idea of translation mapping is, IMO, pure folly. </Jun-Dai>

      Nonsense. XML can describe anything. Take three pages if you need to. All ideas can be communicated across languages, or you presume that people cannot learn new ideas, only use existing ones.
<Jun-Dai 01:52, 18 Jun 2005 (UTC)> The technology is not the limitation. The problem is with the idea of "translation". To begin with, there is no such thing as an "accurate translation", despite the fact that some translations can be more accurate than others, or more accurate in particular ways. That all ideas can be communicated across languages is not at all a given, and nor does the notion that not all ideas can be communicated across languages rely upon the notion that people cannot learn new ideas. To put it another way, a better way to put that statement would be that not all ideas can be communicated in any given (or even all) existing languages.
People clearly can learn new ideas, and yet we often have ideas that cannot be communicated. Sometimes this is because we don't know how to articulate it, but other times it is simply because the language is not up to the task. Similarly, some languages are better for communicating certain ideas than others. German, for example, is not a good language for talking about ambiguity.
I think there are two points here that you are making that I can agree with, provided that they are kept separate: (1) It would be useful to have a means of mapping various means of expressing particular ideas in any given language (a reverse dictionary), though this would not replace the Wiktionary, or even really overlap with the Wiktionary's aims much. (2) XML would be a superior format for containing this and other (including Wiktionary) information.
What I don't agree with is the idea that such an idea could replace or be superior to what the Wiktionary offers. I also don't agree that XML is the appropriate format for something like the Wiktionary, the reason being that while XML is a more useful and extensible language for processing, it is a pain to write in and to maintain manually, which is how data is generated and improved in the Wiktionary. If computers could write dictionaries, then it would definitely be better.
Moreover, the translation mapping that you describe is of limited use for the reasons that I gave. The latest sample that you provided comes closer to something like the Wiktionary in that it is providing a definition for a word, rather than a word for a definition/concept. You are no longer mapping translations, and the content in that sample looks just like the Wiktionary but in a format that is much more difficult to write, extend, and modify, and would thus never get many contributors.
        I am suggesting mapping ideas. In the case above, 青 is not just a word. It is an idea that is different from the ideas of blue or green. You would take as much real estate as necessary to convey the idea in whatever other languages you were using to convey it in. You could even add a color swatch or Pantone number. From your description, and the suggestiveness of the picture, I would guess that the idea of 青 is something more like "peaceful color", which should be explained rather than just the scientific description of light frequencies. We might use "cool color" in English, which also means blues and greens. -wandadubbayou

Currently we don't have enough contributors to make the Wiktionary especially useful, but it's growing, and I believe that it will become a useful resource at some point in the future. What you've proposed would never get enough contributors to be useful, and would probably not even get as many as we have here now. It might be nice to have an XML output format, but having an XML input format would be hell. </Jun-Dai>

Category by XML

As mentioned above, each Wiktionary item would categorize an idea. A convenient taxonomy for relationships between ideas could be adopted as a starting point and expanded if necessary. Check out the link in the example below; Trigg did pioneering work in hyper-linking.

<t:taxonomy for linking>
  <o n="TriggLinkType">
    <r t="TriggLinkType:C-Source">http://www.workpractice.com/trigg/thesis-chap4.html</r>
    <r t="contains">C-source</r>
    <r t="contains">C-pioneer</r>
  </o>

  <o n="TriggLinkType:C-source">
    <r t="describedBy">Gives the source of concepts and ideas in order to enable checking and authenticating of data and claims of facts, physical constants, etc.</r>
  </o>
  <o n="TriggLinkType:C-pioneer">
    <r t="describedBy">Pays homage to pioneers. This is similar to C-source though broader in scope, i.e. one cites the work of a pioneer in a field though the cited work may not be directly relevant.</r>
  </o>
</t:taxonomy>

The edit page would be auto-generated by a schema to keep editors within the structure of the wiki. The xml would be invisible and you wouldn't even need wiki ML. The display could be customized by each user by applying custom xslt on the data. If someone wanted standard wiki displays, they would be available through an xsl.

If you build it, they will come.

--wandadubbayou wanda at dubbayou.com


Overlap?

Once in xml the data could be displayed as a forward or reverse Dictionary in any and all languages it contained with a simple xsl.

Once in an xml format, a simple engine can be made that actually discovers new relationships and can ask questions to clarify relationships.
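As a rough illustration of the forward and reverse views, the sketch below builds both indexes from a small document with Python's standard `xml.etree.ElementTree`. It rests on assumptions that are not in the proposal: the `<Lang="...">` notation is not well-formed XML, so the sample uses a hypothetical `name` attribute, wraps the nodes in a hypothetical `<Gestalts>` root, and fills in `lang="English"` and a shortened first description. `build_indexes` is likewise a made-up helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Well-formed variant of the proposal's examples (attribute "name" is assumed).
SAMPLE = """
<Gestalts>
  <Gestalt id="wiki-red">
    <Description lang="English">A color descriptive of visible light</Description>
    <Entry>
      <Lang name="English">red</Lang>
      <Lang name="Spanish">rojo</Lang>
    </Entry>
  </Gestalt>
  <Gestalt id="wiki-red:politic">
    <Description lang="English">A term used for communists by American conservatives in the 20th century</Description>
    <Entry>
      <Lang name="English">red</Lang>
      <Lang name="Spanish">comunista</Lang>
    </Entry>
  </Gestalt>
</Gestalts>
"""


def build_indexes(xml_text):
    """Return (forward, reverse) lookup tables for a Gestalt document."""
    root = ET.fromstring(xml_text)
    forward = defaultdict(list)  # (language, word) -> [gestalt ids]
    reverse = defaultdict(dict)  # gestalt id -> {language: word}
    for gestalt in root.iter("Gestalt"):
        gestalt_id = gestalt.get("id")
        for lang in gestalt.iter("Lang"):
            word = (lang.text or "").strip()
            forward[(lang.get("name"), word)].append(gestalt_id)
            reverse[gestalt_id][lang.get("name")] = word
    return dict(forward), dict(reverse)


forward, reverse = build_indexes(SAMPLE)
```

Here `forward[("English", "red")]` lists both gestalts that the word "red" symbolizes, while `reverse["wiki-red:politic"]` gives the word for that idea in each language.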

Implementation

Wiki already stores text nicely... xml is text. A thin layer applies an xsl to the stream based on user preferences. In edit mode, a thin layer uses a schema for a template to create a form. Data from the form is stored on wiki pages in xml. Linking would be done by browsing in an embedded frame in the edit GUI. All contributions done through the GUI. No ML for anyone. xml can be autogenerated from other DB's for automatic entries. Use a public dictionary for starters and auto generate entries, name the gestalt by the word and definition number.

Publish the schema so that other dictionary groups can standardize and integrate easily. Let the computers do what computers do well, and ask contributors to clarify relationships. Assign a reliability index to each contributor that indicates a trust level based on history. The wiki then could answer the question, "Who said so, and should I trust it?" Policing becomes easier.

A data driven search engine runs in the background looking for relationships based on rules made by contributors. As it discovers possible relationships, they are published to a question page. Users answer the questions which then validate the links. The engine knows who to trust. If the responders are unknown, the question is asked of several people. A reliability threshold must be reached before the change is made.
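One way to read the reliability threshold is as a trust-weighted vote. The following is a minimal sketch under assumed numbers: the function name `validate_link`, the threshold of 1.0, and the default weight of 0.25 for unknown responders are all hypothetical choices, not part of the proposal.

```python
def validate_link(answers, trust, threshold=1.0, default_trust=0.25):
    """Decide whether a proposed link is accepted, rejected, or still open.

    answers: list of (user, agrees) pairs collected from the question page.
    trust:   mapping of known users to a reliability index.
    Returns True (accept), False (reject), or None (ask more people).
    """
    support = 0.0
    oppose = 0.0
    for user, agrees in answers:
        # Unknown responders count for less, so several of them are needed.
        weight = trust.get(user, default_trust)
        if agrees:
            support += weight
        else:
            oppose += weight
    if support - oppose >= threshold:
        return True
    if oppose - support >= threshold:
        return False
    return None
```

With these assumed weights, a single trusted contributor can validate a link alone, whereas it takes four agreeing unknown responders to reach the same threshold.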

Eventually a search engine is added that can browse the internet looking for possible links. They are validated by asking users questions.

Wiki is the perfect venue for such a project. And something like Wiktionary is exactly where it would have to happen.

page Wiktionary:Categories was virtually empty, so I created a new entry

The page was virtually empty, so I created a new page describing how categories are used in Wiktionary. I did delete the reference to the Latin word for categories, since this page is in the Wiktionary: namespace, not the main content space.
Then, to my surprise, I found a great length of discussion associated with the page. But the history shows it never had any content? Strange.--Richardb 10:15, 21 Nov 2004 (UTC)

There was no reference to the Latin word for categories. That was an interlanguage link to the article on the same topic in the Latin Wiktionary. They appear on other wiktionary:, category:, and article pages as well. Please do not delete such links. —Muke Tever 18:35, 5 Mar 2005 (UTC)


After Real Thought

A proposal for Wiktionary structure

The proposed structure is designed to capitalize on the strengths of Wiki and handle every type of categorization. I propose one Gestalt node per wiki page.

<Gestalt id="" type=" gestaltIdRef">
	<Relationships>
		<Relation gestaltRef="">gestaltRef</Relation>
		<Relation gestaltRef="">gestaltRef</Relation>
	</Relationships>
	<Symbolics>
		<Symbol lang="" type="gestaltRef">Best definition of the Gestalt in the lang</Symbol>
		<Symbol lang="" type="gestaltRef"> Best definition of the Gestalt in the lang</Symbol>	
	</Symbolics>
</Gestalt>

1. Gestalt – an atomic universal idea. Its id attribute is universally unique. The idea has only one meaning. It is language independent, even though a language may be used to assign the id for readability purposes. Its type is a reference to another Gestalt that is a member of a type taxonomy. Much can be discussed about the taxonomy, but it is envisioned that it has elements like pureIdea, instanceOfAnIdea, IdeaOfaRealObject, InstanceOfaGestaltOfARealObject, etc. These also might be able to be handled by Relationships.

2. Relation – A relation describes a relationship to another Gestalt. The attribute is a reference to a relationship Gestalt and the value is the other Gestalt. Much of the text in definitions just tells about relationships to other things. I suspect that more than half of the definition will be contained in the relations, which are language independent.

<Gestalt id="gestalt:red">
	<Relationships>
		<Relation gestaltRef="containedBy">gestalt:colors</Relation>
	</Relationships>
</Gestalt>

3. Symbol – A Symbol is an expression of the idea. Words and sentences are not ideas, just symbols of the ideas. Symbols vary in each language, but the ideas can be shared. Translation attempts to change the symbols from one language into the symbols of another in an efficient accurate manner. Communication attempts to move an idea from one mind to another in a precise manner.

The Symbol type attribute is a reference to a structure contained in another Gestalt. The structure will change from language to language to accommodate structural difference in languages.

For English, the structure would contain elements for nouns, verbs, etc. In Greek it would include declensions. In Tok Pisin, it would include inclusive and exclusive pronoun types.

Each might have translation hints, etymology, etc. Did we forget something? No problem just add it to the type taxonomy with wiki.

Wiki should not limit the kinds of information, but provide a structure to include and make useful all information.

Use xml behind the scenes. The GUI should provide only those parts of the Gestalt that each user wants to see. They can expand and contract the view at will. The view is populated with the symbols from the other gestaltRefs so that the relational parts of the definition are actually written by the wiki at the time the information is requested.

The GUI editor creates an html form from the viewer schema based on the user’s preference. The user can edit those parts. The GUI also allows browsing in another frame so that Gestalt linking makes relationship definitions easy.

I believe this is an infinitely flexible yet wholly manageable idea. I also believe you will get more people to participate since the wiktionary will have value to AI engines in this form.

<Jun-Dai 01:48, 20 Jun 2005 (UTC)> An additional problem, which I didn't mention earlier, is that this relies on two assumptions: (1) ideas can be wholly language-independent (by no means obvious or proven) and (2) language-independent ideas can truly be articulated in language without absorbing additional meaning. The first one is mostly a matter of belief--it's not really provable. The second one is, IMO, clearly wrong, though there may be some question as to whether we can come close enough for practical purposes. Ultimately, the distinction between two closely-related ideas is in many cases going to depend on the language that the ideas are being articulated in, and the culture connected to that language. The question then becomes how to define a particular gestalt. If we are defining it in English, then the basic meaning for the term blue would be a gestalt, and the Japanese word 青 would be on the list for translations. If we are defining it in Japanese, then 青 would be a gestalt, and the English words blue and green would both be mapped to it, with blue being the main term. In Japanese, there simply isn't a single concept that maps to the English idea of blue. In English, there simply isn't a single concept that maps to the idea of 青. To say that the English term peaceful was involved would give too much emphasis to the affect of the color, when in Japanese it really just refers to the color (much like how blue could be defined to some extent in terms of mood or affect, but for most purposes it is simply a color with little emotional connection being made on the part of the speaker). Never mind the fact that in Japanese culture, the very color itself is going to have subtly different cultural associations. </Jun-Dai>

I think you are hung up on the symbols... All you have done is make an excellent case that "blue" and 青 are symbols for different gestalts. Why would you attempt to insist that the gestalt was one or the other? The two gestalts will share some relationships and have others that differ.

And I will leave all the philosophising to you. The topic was how to categorize. Whether you think an ideal idea dictionary can be attained is not at all at stake. The proposal overcomes your objections to an xml structure based on editing in xml. So now you wish to raise philosophical objections.

So if you want to talk philosophy, there currently is no reason for anyone to participate in wiktionary unless they are a member of a group that speaks an obscure language. All the major languages already have online dictionaries offering what wiktionary purports to be its highest goal, namely "a collaborative project to produce a free multilingual dictionary in every language, with definitions, etymologies, pronunciations and quotations."

Online dictionaries already offer all this and more when combined with search engines. At least wikipedia is a great social experiment. Wiktionary is just redundant grunt work in its present form.

Do something bigger.


<Jun-Dai 15:34, 20 Jun 2005 (UTC)> You've missed my point again. It's not that 青 and blue correspond to different gestalts, it's that blue refers to a number of gestalts that simply cannot be expressed in Japanese, and 青 similarly cannot be expressed in English. They overlap in a number of ways. Articulating the areas of overlap and the areas that each distinctly covers is going to be culturally-specific, and following that route logically, you'll no longer be talking about gestalts, per se, but simply <gestalt>blue</gestalt><gestalt_ref>some long-ass explanation in Japanese</gestalt_ref>--i.e., a translational mapping dictionary, which is essentially what we have, but without all the bulky XML.</Jun-Dai>
<Jun-Dai 16:54, 20 Jun 2005 (UTC)> Also, the Wiktionary, as we have it, has the potential to be useful in a number of ways. To begin with, it is capable of including and defining a number of words that have not made it into dictionaries yet (e.g., google). It is also capable of a much more extensive explanation with regard to the ways in which words are used, and the connotations that they bring with them into a particular context (this is especially true for the foreign terms). We have a long way to go before we are really very useful, but I'm fairly convinced that we'll get there. We will certainly see some sort of technology shift before we get there. The backend of that may be XML, it may not be (XML is very slow). But the user should never see the XML, and the XML should never be as intrusive as the schema that you've laid out--it should only contain large sections, which are edited and wiki-marked-up by users in some sort of text field. GeraldM's notion of a Universal Wiktionary is not a bad one. What I'd prefer to see would be some sort of master Wiktionary containing core elements of a term, such as its pronunciation (in international symbols), its conjugation, and its translation into various languages, with links to those terms. Think of it as a universal templating system for all the Wiktionaries out there. Would XML play a role? Possibly. But I don't see much benefit to what you've described. If you really believe in it, however, then build it, and if you are right, then people will come to work on it. As there is very little overlap with the English Wiktionary, what you describe really needs its own MediaWiki instance, or possibly some other software, built from scratch, in which case it is a discussion for the meta-wiki. </Jun-Dai>

Dag, I am really missing something here. How is all this significantly different from Special:Export/blue vs. Special:Export/青? The Special pages link on the bottom left of your screen should get you to where you can find the "Export pages" link... Or are you saying you want the wiki software to parse the "=" headings down further into xml for you? --Connel MacKenzie 17:12, 20 Jun 2005 (UTC)

Or is perhaps the suggestion then that the Special:Export feature should be more prominently touted? --Connel MacKenzie 22:55, 21 Jun 2005 (UTC)
<Jun-Dai 23:04, 21 Jun 2005 (UTC)> I think the problem is that the export is just an export of the article with the entire text in one XML tag. This person wants a number of tags covering all the functions, as well as an integration between the various language wiktionaries, and article entries that correspond to some kind of language-independent concept, rather than a word in a language. Basically something that is much, er, bigger than what we have here. </Jun-Dai>
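
To make the difference concrete, here is a minimal sketch of what parsing the "=" headings down further into XML could look like, as opposed to an export that keeps the whole article text in one tag. It is only an illustration under stated assumptions, not how Special:Export or the wiki software actually works: the function name wikitext_to_xml and the element names entry, section and text are hypothetical, and the sample entry is made up.

```python
import re
import xml.etree.ElementTree as ET

# Matches wiki headings such as "==English==" or "===Noun===".
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")


def wikitext_to_xml(title, wikitext):
    """Nest article text under <section> elements, one per "=" heading.

    Hypothetical helper: element and attribute names are illustrative only.
    """
    root = ET.Element("entry", {"title": title})
    # Stack of (heading level, element); the root acts as level 1.
    stack = [(1, root)]
    buffer = []

    def flush():
        # Attach any pending body text to the innermost open section.
        text = "\n".join(buffer).strip()
        if text:
            ET.SubElement(stack[-1][1], "text").text = text
        buffer.clear()

    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match is None:
            buffer.append(line)
            continue
        flush()
        level = len(match.group(1))
        # Close sections at the same or a deeper level than the new heading.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(
            stack[-1][1],
            "section",
            {"level": str(level), "heading": match.group(2)},
        )
        stack.append((level, section))
    flush()
    return root


if __name__ == "__main__":
    sample = "==English==\n===Noun===\n# A colour.\n===Adjective===\n# Of that colour."
    print(ET.tostring(wikitext_to_xml("blue", sample), encoding="unicode"))
```

Note that this only structures the headings of a single entry. The cross-language integration and language-independent concept entries described above are a separate and much larger problem that this sketch does not touch.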


I am simply offering a structure that can accommodate cataloguing what you have, and have plenty of flexibility for bigger things. I apologize for having vision. Since you seem to own this thing, do what you want, and I'll butt out.

<Jun-Dai 01:00, 22 Jun 2005 (UTC)> Neither Connel nor I could really be said to own this thing. I think that what you describe is an interesting proposal, and it certainly would fill a niche that isn't currently filled. Does that make it "bigger" than this project? No. More importantly, what you describe would not be well suited to the particular software that we have in place, though it would be good with a similar sort of collaborative environment. What's more, my understanding of the licensing for the Wiktionary content would lead me to believe that you should be able to "suck" the content from the Wiktionary as a means of initially populating your project. If you have vision enough, you'll find a way to implement this idea on a more appropriate platform, and I'll happily sign up as one of your first contributors. </Jun-Dai>

In this short "discussion", you first objected to the unsuitability of xml to wiki. When that objection was overcome, you then objected based on linguistic philosophies. Then you first used the term "bigger", and when I repeated it, you argued with me about its usage. You have lots of argument. Then another admin makes derogatory remarks about being anonymous, when anyone who has actually read anything I posted can get my e-mail address, and I have personally invited you for offline dialog in e-mail or irc. I expect cheap shots on here, but not from admins. Good luck and good bye.

<Jun-Dai 02:09, 22 Jun 2005 (UTC)>

I objected to (and still object to) the unsuitability of xml as a user-inputted format for wiki, which seems/seemed to be what you were suggesting. I also feel that XML is slow, and not necessarily the best storage/backend solution for the Wiktionary data, though it is a great format for manipulating, mining, and drawing connections through that data. This objection has in no way been overcome. I also feel that the basic idea behind the proposal is problematic at best insofar as separating ideas from language is concerned. Though the proposal has in itself a kind of linguistic ideology, it doesn't seem to be based in solid linguistic principles of any sort.
As for bigger, I was referring to your statement "do something bigger," which, in context, was a snide remark about the Wiktionary itself. You expressed the opinion that the Wiktionary offers nothing new (which I refuted, with no response from you) and is simply a bunch of grunt work. Your proposal, on the other hand, you seem to feel offers the opportunity to create something that hasn't been done before. This last part is, IMO, a valid point, but I see no need for it to replace the Wiktionary as it stands. Better yet, let it complement the Wiktionary.
I fail to see where I've made any cheap shots here. You, on the other hand, have dodged around most of my points, and in the end have taken an ad hominem position with a somewhat melodramatic finale ("I expect cheap shots on here, but not from admins. Good luck and good bye"). Why?

</Jun-Dai>

ad hominem

[[1]] Regular Ad Hominem

  1. A makes claim B;
  2. there is something objectionable about A;
  3. therefore claim B is false.

It does not appear that WD has used any ad hominem attacks. Had she made an ad hominem attack, she would have said

"blank is a bleep and therefore nothing blank says is valid."

Instead WD made a claim (which may or may not be precise) that JD has shifted the topic repeatedly rather than address the technical issues being proposed.

JD replies by accusing WD of dodging other issues and being melodramatic.

<Jun-Dai 01:01, 23 Jun 2005 (UTC)>

You're talking about ad hominem argument, my friend. I was referring to your position as being ad hominem. You were making comments on the personal character of your interlocutors in what seemed to me an attempt to invalidate or make dubious your interlocutors' assertions. You did this by referring to me and Connel as having made "cheap shots" and thereby implying that we were not of the (moral?) character that you would expect admins to be. I admit readily that this is reading into your statement, and that you didn't explicitly make any such point. It seems pretty clear to me, however. What's more, I don't see any cheap shots that we have made.
Moreover, I don't think I've avoided addressing any of the technical issues you've proposed. On the contrary, you have shifted the argument repeatedly, and I have responded accordingly. I never argued that XML wouldn't work as a backend format, though I did point out that it wouldn't be ideal for performance reasons. My only argument with regard to XML was that it would be terrible as a user-inputted format, which wasn't overcome so much as made irrelevant by the fact that you qualified the proposal to make it clear that you were thinking of XML as running behind the scenes rather than something the user would be inputting.
My philosophical points were present from the very first statement I made, and I have merely expanded on them. You avoided those comments with the statement "I will leave all the philosophising to you." Well, the problem with that is that the whole idea being proposed is based on certain assumptions about the nature of language and ideas.
What's more, I will readily admit that I took the discussion to my interlocutor (ad hominem) by calling her/his final statement melodramatic. I didn't intend for this to be a judgement on my interlocutor's previous statements or proposal in general, though I accept that it can be interpreted that way. Regardless, I stand by my judgement. It was a melodramatic statement to make.
In any case, would you kindly point out what it is that I have dodged? I certainly can't see anything along those lines in the discussion above.
Also, I've seen the admins here make comments about anonymous posting, but I'm really not seeing it in this discussion. Where did it happen, and who said it?

</Jun-Dai>


Flamewar and hurt feelings aside, I still agree that the Special:Export feature should be more prominently touted. If/when Wanda devises a dynamic re-parser, perhaps she could demonstrate it here? Or even better, help it make its way into the wiki source code tree? --Connel MacKenzie 18:59, 25 Jun 2005 (UTC)

Cognates and derivations

There are several examples of cognates listed as derivations, cf. for instance brown and mild. Does someone know an easy way to fix this? Wakuran 12:48, 17 August 2006 (UTC)

Entry layout explained suggests the heading ===Descendants===. Is that what you are asking? --Connel MacKenzie 16:37, 4 September 2006 (UTC)
Ah, no, on the example mild, the English word is listed as a Danish (etc.) derivative, when the truth is that the Danish word is just a cognate. It seems the only way to fix that is to manually edit all incorrect pages. Wakuran 19:09, 6 September 2006 (UTC)