Wiktionary talk:Thesaurus considerations


I am going to restructure the page Wiktionary:Thesaurus considerations to try to present the comments in some logical form. I will try to preserve all comments and their intentions. My apologies if I don't get it exactly right.--Richardb 11:49, 10 Dec 2004 (UTC)

From Beer Parlour 2006


State of the WikiSaurus


I am of the opinion (and I am certain that I am not alone here) that the [[:Category:Wikisaurus|Wikisaurus]] namespace/project is at present not doing well. The VERY few entries which are thriving are the anatomical ones, and those are dubious at best in nature. I suggest that some radical decision should be made about the direction of the project, something to kick it back into gear and make it into a useful and accurate resource.

Possible directions:

  1. WikiSaurus project. - This is certainly the most radical, but arguably the most sound direction. A WikiSaurus project would make it a much more useful resource (easier to search for one, better non-wikt formatting as appropriate to a thesaurus for another). The obvious downside is that we would need to develop a new community (somewhat), reestablish leadership and basically start from scratch. This is certainly not insurmountable. It would also require some higher-up support.
  2. A MAJOR cleanup. - I am envisioning the removal of much content (taboo, I know, but the place is getting ugly) and a new stance of relative prescriptivism on the *saurus namespace. The current free-for-all stance has allowed many of the entries to degenerate into a "who can think of the most creative word/phrase for penis/breasts" competition and *saurus validity is losing...badly.
  3. Ignore the anatomy. - We can attempt to be passive aggressive with it, try and add so much useful content that the junk pages become a minority. It will take a while, and I think we are early enough in the project that an overhaul is better now than later.
  4. Your direction. - The ones I haven't thought about.

If I am in a minority in thinking that *saurus is hurting, I'll go ahead with option 3 and live and let live, but if you take the time to peruse the content of *saurus I think you will find that something ought to be done. - TheDaveRoss 04:21, 25 February 2006 (UTC)

I'm glad to see you assuming leadership of this project. ;-) "Ignoring the anatomy" is a safe approach. If you add enough useful pages it could be seen as putting clothing on that "anatomy". I would be inclined to draw more links between the WikiSaurus project and Wiktionary:Roget Thesaurus Classification. Perhaps every item in that list should also have a link to its corresponding WikiSaurus page. Other items could also be added to the list to reflect the fact that the number of headings has varied over the years. The latest (6th-2003) has 1075. It is relatively easy to categorize nouns that represent concrete objects, but abstract ideas are far more problematic. Eclecticology 06:02, 25 February 2006 (UTC)
He may need some delete buttons to do this massive cleanup task.
(cur) (last) 21:45, December 2, 2005 Eclecticology (→TheDaveRoss - Removed from list without prejudice; inactivity)
http://en.wiktionary.org/w/index.php?title=Wiktionary%3AAdministrators&diff=686848&oldid=686641
--Connel MacKenzie T C 02:08, 26 February 2006 (UTC)
No problem with this; all I need is an indication from him that he wants to be an admin.
I agree on #3. I think the Saurus belongs in this project and should not become independent at the moment. I'll help you out whenever I have some material for it. --Vildricianus 09:34, 25 February 2006 (UTC)
The way we use the term "project" here is often ambiguous. I read it as meaning a new project in Wiktionary rather than a whole new wiki. Eclecticology 09:28, 26 February 2006 (UTC)

heading 1- just for breaking up the section


I think it is time to do something radical as suggested in #2, but making it a separate project as suggested in #1 doesn't really make much sense. The reason dictionaries and thesauruses are traditionally considered separate is because of limitations in the technology used. Here on the Internet this is less of a problem. Sure, MediaWiki is not ideally suited for this, but having two separate MediaWiki installations doesn't really make it less limiting.

WikiSaurus is acceptable as the name of a subproject. However, as a name for a namespace, or as it currently is, a pseudo-namespace, it makes little sense. Actually the word thesaurus itself is not really a good word at all. It originally meant "store-house" or "treasury". There are better, more precise words, like synonyms and antonyms, which incidentally are what thesauruses normally contain.

It would be better to have namespaces like Synonyms: and Antonyms:, used like Synonyms:happy and Antonyms:happy. This incidentally allows us to support other semantic relations like hyponyms, meronyms and troponyms. For example:

  • Hyponyms:animal could contain a list of animals like cats, dogs etc. OK, this is mostly what categories should be used for, so it is meant more as a complement.
  • Meronyms:tree could contain a list of all possible parts of a tree like bark, leaf etc. Perhaps even with some sort of clickable drawing.
  • Troponyms:cut could contain particular ways to cut, like to trim and to slice etc., with a short explanation of how they are more particular.

Of course in most cases there is really no reason to have it separately at all. But that goes for synonyms and especially antonyms as well. Only words where listing everything would take too much space, or would create too much duplication between different words, would be separate.

So we would have for example on page happy:

====Synonyms====
:''Main article'': [[Synonyms:happy]]
* [[content]]
* [[satisfied]]

listing only the most common synonyms of each word on its page. --Patrik Stridvall 11:39, 25 February 2006 (UTC)

I have no problem with the WikiSaurus [pseudo]namespace. I will have no fundamental objection to that becoming a full namespace shortly after the namespace manager is made a part of the software. What I most like about retaining the name is that since it is our own invented name there are fewer preconceived notions about what that should be. That gives us more flexibility to deal with the various kinds of word relationships that have already been mentioned in this thread, particularly those that have not traditionally been part of a thesaurus. It is conceivable that at some future time "Synonyms:" and "Meronyms:", etc. could become separate namespaces, but at this time there is a strong risk that it could lack focus as editors developed each of these sub-namespaces in individual ways. Beginning with one namespace, and building scalability into it from the beginning can have long term positive effects.
The headings that we now have as "synonyms", "antonyms", etc. could be reduced to "Word relations", which would have a link to the relevant WikiSaurus page. For happy this would be to WikiSaurus:pleasure, with that name being chosen because it is the Roget class that contains it. Doing WikiSaurus right will be a complex undertaking. Eclecticology 09:28, 26 February 2006 (UTC)

heading 2

  • I don't see how the WikiSaurus can get the attention it needs as a child-project here. If it were a separate realm, it would have a better chance at getting the momentum it needs. (Note: I am not volunteering!) --Connel MacKenzie T C 02:11, 26 February 2006 (UTC)
    How much attention do we need at this stage? I would much rather see a few people providing leadership and initial structure to the project than having a lot of newbies bickering about structure. Remember too that while there may be some who would support having WikiSaurus as a separate subject, we also have the WiktionaryZ people wanting to put everything together in one big mish-mash. Eclecticology 09:28, 26 February 2006 (UTC)
The problem is that we are already a thesaurus even without the WikiSaurus namespace since we have sections for Synonyms and Antonyms. --Patrik Stridvall 08:46, 26 February 2006 (UTC)
  • I started WikiSaurus just as an experiment, to try and take some heat out of a very cyclic debate that was happening. And, being evil, I populated it with all the kinds of words people look up most often in a dictionary, just to get some interest. Seems to have worked some ways, but not others.

By all means, if you can think of a better way to do it, and give it a decent population of words to get started, then do it.

But the last thing I want to see is someone wasting their time and bowdlerising the WikiSaurus. Leave all the inventive stuff there. What harm is there in one line entries anyway? To some extent it avoids them creating new entries to put their little imaginative words into.

And, can we please have discussion of Thesaurus considerations in the place designated, Wiktionary:Thesaurus considerations.

heading 4

Leave all the inventive stuff there. What harm is there in one line entries anyway? To some extent it avoids them creating new entries to put their little imaginative words into.
To the above anonymous poster (Richardb?). Even though WikiSaurus has thus far been dominated by those with prurient obsessions I have no intention to engage in bowdlerization. With the addition of more ordinary entries, those pages will diminish in importance. As for your last comment I don't see why you would want this on the project page instead of this talk page. Eclecticology 09:28, 26 February 2006 (UTC)
I didn't want this stuff on the project page. I just moved this discussion from Beer Parlour to here.--Richardb 09:50, 26 February 2006 (UTC)
You may have created WikiSaurus as an experiment, but if we are going to be indifferent to it, let's get rid of it altogether. Advocating worthless content to distract vandals is just silly. "What is the harm?" The harm is that Wiktionary purports to be somewhat authoritative, yet we keep content that is utterly wrong, and not only that, but apparently we advocate it. If you don't care about WikiSaurus, ignore it. I think it could be useful, and is a very good potential resource, regardless of your intent.
I am willing to work on the namespace, to clean it up and sort it to make it somewhat valid, but if you are going to be reverting all my changes that will be very hard. I am not censoring anything, I am attempting to make WS follow the same CFI as the rest of wikt*, instead of having no CFI beyond the limits of imagination. - TheDaveRoss 05:29, 26 February 2006 (UTC)
Dave, what you are doing is just being destructive of other people's work. If you don't like what is in WikiSaurus, then just leave it alone and use your enthusiasm elsewhere. Or clean up sensibly. Your clean ups were just destructive deletes. You deleted a lot of well-used words.

Anyway, even the CFI is not really an agreed policy; it is still under development. And it applies to the main word space, not the Wiktionary namespace. Otherwise you'll be cleaning out a huge amount of unverified lists etc. It's just not up to you to decide what is "worthless content". As I say, entries like chillaxin with 229,000 hits in Google, or thousands of uses of Jacksie, and you want to just chuck those entries away.--Richardb 09:49, 26 February 2006 (UTC)

If it's a matter of too little content, is there any way to populate the WikiSaurus quickly? Are there any old reference books that can be digitized? Could people establish page names and then have bots populate them semi-automatically by looking at the synonyms already in Wiktionary, from selected senses? Davilla 06:22, 26 February 2006 (UTC)

Sounds like someone is being constructive about it anyway.--Richardb 09:52, 26 February 2006 (UTC)
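
Editorial illustration, not part of the original thread: the semi-automatic population suggested above (collecting synonyms already listed in Wiktionary entries and drafting a thesaurus page from them) might be sketched roughly as follows. This is a minimal sketch under stated assumptions. The function names `extract_synonyms` and `draft_synonym_list` are hypothetical, the entry wikitext is assumed to use a `====Synonyms====` heading like the sample earlier on this page, and the output is only a draft for a human editor to review.

```python
import re


def extract_synonyms(wikitext):
    """Return link targets listed under a Synonyms heading (hypothetical helper)."""
    # Capture everything between the Synonyms heading and the next heading (or end of text).
    match = re.search(
        r"^=+\s*Synonyms\s*=+\s*$(.*?)(?=^=+[^=\n]+=+\s*$|\Z)",
        wikitext,
        flags=re.MULTILINE | re.DOTALL,
    )
    if not match:
        return []
    terms = []
    # Matches [[target]] and [[target|label]]; only the target is kept.
    for target in re.findall(r"\[\[([^\]|#]+)", match.group(1)):
        target = target.strip()
        # Skip links into other namespaces, e.g. [[Synonyms:happy]].
        if ":" not in target and target not in terms:
            terms.append(target)
    return terms


def draft_synonym_list(sources):
    """Merge the synonyms of several entries into one draft wikitext list.

    `sources` maps an entry title to that entry's raw wikitext (assumed input).
    """
    seen = []
    for title, wikitext in sources.items():
        for term in [title] + extract_synonyms(wikitext):
            if term not in seen:
                seen.append(term)
    lines = ["====Synonyms===="] + ["* [[" + term + "]]" for term in seen]
    return "\n".join(lines)
```

A draft built this way still needs the human judgement asked for elsewhere in this discussion: the sketch copies links verbatim and cannot tell which sense of a word a synonym belongs to.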

heading 5

Why the rush? Simply digitizing old reference works without taking time to proofread what OCR gives would be a hell of a mess. I very much prefer quality to quantity. If we as humans can't keep up in applying judgement to what the bots are producing we aren't doing a very good job. Eclecticology 09:44, 26 February 2006 (UTC)

In addition to my suggestion above, it could be argued that ALL relevant information should be on the page for the word concerned. Sure, there would be some duplication.


It would be MASSIVE duplication, and duplication would mean errors. Just use the technology. One click and you get to the word in all its glory anyway.--Richardb 09:53, 26 February 2006 (UTC)

However it could also be argued that synonyms are really recommendations of other words to use if you mean something that is almost but not quite what you currently looked up. Alternatively, what words you are recommended to use depends on context. If you are in England, Scotland, or the US, or are writing a scientific article, other words might be preferred even though they mean exactly the same thing. It could also be argued that slang words are never recommended, so we should never list them as synonyms. Furthermore, unless two words mean exactly the same thing, they shouldn't really have synonym sections that are exactly the same. So:

  • Is there really any point at all in a WikiSaurus namespace, or whatever we choose to call it? Are there really that many words that both mean exactly the same thing as another word, are not slang words and have many synonyms?
  • Couldn't we just use categories for slang words, especially the anatomical ones?

--Patrik Stridvall 08:46, 26 February 2006 (UTC)

Have you ever looked at a decent size Thesaurus to see how many entries there are? And that is when they are limited by paper and contributors. If you used categories you would have literally thousands, hundreds of thousands of categories. And they would not do as good a job as even the present form of WikiSaurus, where subtle differences in usage can be included. Look at some of the decent WikiSaurus entries, such as WikiSaurus:insane, instead of the popular swear words. Then you might get a bit more idea of how WikiSaurus can be usefully developed, while accommodating people's need for listing lots of occasionally used words.--Richardb 09:54, 26 February 2006 (UTC)
Of course I have. I mostly played the devil's advocate to provoke a reaction. Still, I was partly serious. See below. --Patrik Stridvall 11:51, 26 February 2006 (UTC)
I agree that it's rare that two words mean exactly the same thing. True synonyms can probably stay on a word's page. A thesaurus often gives words with similar meanings that fall under the umbrella of a headword. The writer's judgement is still required to choose the right one. This is the area in which WikiSaurus would function. Slang can be included as long as it is suitably identified as such. Eclecticology 10:07, 26 February 2006 (UTC)

heading 6

The point is that I don't really find thesauruses as they are organized very useful. Many words are listed in the same category that are only remotely related to each other and sometimes only usable in certain contexts. I find myself often more confused than helped. The goal is to try to find a better word to use than the current word you are looking at, right?
There are MANY goals to a thesaurus. But one goal is to list a lot of words that are close in meaning to one another. The user then has to look up each word and decide if it is appropriate to their use. Even words that perhaps have strictly the same meaning (eg: [[corpse]] and cadaver) are used in different contexts. The truth is that there are probably few true synonyms. There are often very subtle differences of meaning, appropriateness of use etc. between words that are listed as synonyms.--Richardb 15:43, 28 February 2006 (UTC)
Another use that a searchable WikiSaurus provides is a searchable, very compact storehouse for all those words and phrases that are in use, but don't (yet) meet the Criteria for Inclusion. eg: WikiSaurus:corpse has the phrase mortal remains. Surely an acceptable phrase for Wiktionary. Yet, so far it doesn't even have an entry, let alone meet the CFI. Yet if someone searches on "mortal remains" they will find the WikiSaurus:corpse entry, and get a pretty clear idea of the meaning of the phrase. If you looked up lung you'd get no idea of what a "healthy pair of lungs" is, unless the search also gave you WikiSaurus:breasts. The only place you'll find that great old word bazoomas is on WikiSaurus:breasts.--Richardb 15:43, 28 February 2006 (UTC)


So, if we forget about true synonyms and consider the normal case, where any alternative carries slightly different denotations and connotations, every word is unique, and thus so is its relation to other words. So it would be more useful to specify, on the page for each and every word, in what way other words are synonyms. To be more concrete, on WikiSaurus:insane the only two words I would consider using, unless I wanted to be offensive, are mentally disturbed and weird. Of course, depending on what I mean, I might use retarded, demented or eccentric. So for example on the page insane:

====Synonyms====

  • [[eccentric]], [[weird]] (''by choice'')
  • [[demented]] (''because of old age'')
  • [[retarded]] (''because of low intelligence'')
  • [[mentally disturbed]] (''more clinically'')
--Patrik Stridvall 11:51, 26 February 2006 (UTC)

heading 7


What now pops up in my mind is the following question: Do we...

  1. ...work along the lines of a Roget-like structure, i.e. a thematical classification of fewer but longer entries, providing closely as well as remotely related terms (all the "-nyms", all verbs, adjectives, phrases, proverbs etc)? This may indeed create large entries with lots and lots of information.
  2. ...work along the lines of an Oxford Thesaurus-like structure, i.e. an alphabetical classification of more but shorter entries, providing only closely related synonyms and a few antonyms with cross-references to more remotely-related headwords?

Like Ec stated, "WikiSaurus" is completely up to our own interpretation. The thing is how do we interpret it then? --Vildricianus 11:35, 26 February 2006 (UTC)

I would clearly support the first option. It would start with the big themes but organized to be scalable. Much of what is suggested in the second option is already being done in our regular Wiktionary entries. Patrik's comments are valid too, but does one use a thesaurus just to find exact synonyms? Generally a person's passive vocabulary is bigger than his active vocabulary; in other words, there are many words that one recognizes while reading that don't come to mind while writing. A thesaurus helps to bridge that gap. A thesaurus is really an advanced writing tool, and one advantage that we have is the ability to directly link to the words themselves so that a user can differentiate suggestions more easily than with paper thesauri.
I think too, that our initial WikiSaurus headings should conform as closely as possible to the most recent Roget classifications (i.e. the sixth edition). Roget has had an enormous number of editions in the last 150 years, and each tweaks the classes just a little bit. Using this as an anchor will help us to take things further in a semi-organized manner, but it should not be a hindrance to a further development of these classifications as the need develops. We currently have WikiSaurus:insane and WikiSaurus:mad person, while Roget has a primary entry under "insanity" and formerly had a primary entry under "madman" that has been merged into the other in recent editions. I would suggest renaming both to conform, and leaving a note on the latter page about what has happened to that entry. One of the problems with what we have at Wiktionary:Roget Thesaurus Classification is that the edition is not identified. Apparently it was copied from what was already in Wikipedia at that time? Can anybody identify it?
That might turn out to be a bit restrictive, as Roget Thesaurus was designed for a paper system, not a hyperlink system. But we could make a start by at least trying to link each WikiSaurus entry to a Roget classification (as done in WikiSaurus:insane). That might help inform us on whether the classification is too restrictive. Plus, I think having to stick to a prescriptive classification might make it very difficult for the general user to add WikiSaurus entries, whereas an entry like WikiSaurus:chav can be built without this knowledge, and then linked to Roget. In looking for where to link it we might then find a better head word than chav, and move/rename the WikiSaurus entry to there, leaving a redirection in place until we can change all the referring links (maybe a job for a bot?).--Richardb 16:03, 28 February 2006 (UTC)
We probably should not be developing WikiSaurus entries for foreign words, at least not in the near future. There have been different suggestions relating to the translations, including hiding them. My recommendation would be to continue as we have been with translation, and make no drastic changes to that until the WiktionaryZ people have their act together enough to let us know what kind of relationship there can be between the two projects. Eclecticology 19:36, 26 February 2006 (UTC)
I had missed this discussion before, but I think we ought to combine the two systems somewhat, trying to have as much information as possible while also making the best use of the hyperlink system and the wiki software. Roget's was geared toward a certain clientele I think, as was Oxford, as are all Thesauruses, including ours. I think our goal is to be usable by everyone, and we ought to clearly define what uses we want to provide. Synonyms and antonyms are obvious to me, as well as idiomatic synonyms and antonyms. Where we branch off from there is basically only limited by how much we CAN add; the more information about a word as it is semantically related to other words the better, but I want to be sure that the most basic information is there and readily accessible by ANYONE, whether or not they are familiar with how Thesauruses work. About foreign entries, I think they have merit, and should be included as soon as we figure out what exactly we are going to make this be. Especially if we can add senses to the synonyms and antonyms, this can provide some vocabulary expansion and precision in choice to the people using WikiSaurus. - TheDaveRoss 17:13, 5 April 2006 (UTC)
We really don't have that much disagreement. What I mostly support is a go-slow approach because this is such a complex task. It would be too easy to get bogged down in a poorly thought out and premature design for this. Eclecticology 09:07, 6 April 2006 (UTC)
We needed a bit of a go at Thesaurus to start sorting our ideas out. My view is we are making progress along that route. I'd only like to encourage Dave to do what he can to take some more steps. I'm not sure that going slower is going to make it any better. But I agree we all need to keep an eye on it and comment on the way it is developing. I 'think' Dave has taken on board my concern about not chucking out the crap words as worthless. There is value in the dross. And he also seems quite aware of the idea of making it very accessible, keeping it simple.--Richardb 12:31, 7 April 2006 (UTC)

heading 8

Oh, I admit, I have been helped by a thesaurus from time to time. Still, English is not my native language so I have learned it over time and by now I know most important words at least passively... Many years ago I didn't and I remember being very confused when I looked at a thesaurus. I imagine I will get the same feeling if I look at a German or a French thesaurus today. The point is that we should really try to do something better...
Listing a lot of colloquial and slang words will only make it less useful for people who do not speak good English. We have categories; let's use those for such things instead. Important synonyms with a short explanation can go on the pages of each word. You can always continue clicking on the synonyms for the words you thought were the best alternatives. --Patrik Stridvall 21:41, 26 February 2006 (UTC)
Patrik, I don't think you are going to convince us to scrap WikiSaurus at this stage. Those who find it useful and interesting will build it. Those who don't want to use it don't have to. You can stick a huge list of synonyms at the end of many word entries if you want to, though they would tend to clutter up the entries a bit.
Perhaps you can try something out first. Why not start by going to every word listed in WikiSaurus:breasts and in each, list all the synonyms from WikiSaurus:breasts. Along the way create entries and add verifications for all the words listed in WikiSaurus:breasts that don't yet have entries. Then come back and tell us if it's easier to have a WikiSaurus or not. If you still think you can get along without WikiSaurus, then carry on with WikiSaurus:sexual intercourse etc etc. Should keep you busy for a while.--Richardb 15:53, 28 February 2006 (UTC)
Few words have very many useful synonyms.
Patrik. That is very, very subjective. And who said we should limit the Saurus idea to useful synonyms? Should we limit Wiktionary only to useful words perhaps? What might not be useful to you, an obviously serious author, might be extremely useful to someone writing scatological poetry, or comedy scripts.--Richardb 13:56, 3 March 2006 (UTC)
Subjective, perhaps. I see it more as prioritizing people that write normal texts. There is no point presenting a lot of, for most people, "useless" alternatives. I have no problem with "useless" words having pages and categories of their own. I think it is OK to have a link to such categories on the pages of the "serious" words but no more than that. --Patrik Stridvall 17:04, 3 March 2006 (UTC)
By useful I mean words that you should seriously consider using when you are writing. If you are reading you can just look up the word you have found, you don't need a thesaurus for that.
May be partially true, if and when Wiktionary has good entries for all those slang words. But since we have tough Criteria for Inclusion, then a lot of those slang words won't get in, so we need them in WikiSaurus.
Words that don't meet our Criteria for Inclusion shouldn't be here at all anywhere unless they are protologisms, neologisms or similar, in which case they belong in the appendix. --Patrik Stridvall 17:04, 3 March 2006 (UTC)
The only AFAICS useful synonyms listed in WikiSaurus:breasts are mammary glands, udder and teats. The others are not recommended for use if you are writing, so they can belong to a suitable category like Category:English slang:breasts or whatever. So that won't be very hard. Have any more difficult challenges? --Patrik Stridvall 21:07, 28 February 2006 (UTC)
I haven't noticed you actually try the challenge yet, which would mean actually creating entries, verified, for all those words that are listed in WikiSaurus:breasts but don't yet have entries.
As long as a debate is going on it would be impolite to do things others might disagree with. We obviously, whatever we decide to do, can't delete WikiSaurus:breasts until all words are, if they meet the Criteria for inclusion, added and categorized. Don't expect me to do anything concerning words that are "useless" for writing though. --Patrik Stridvall 17:04, 3 March 2006 (UTC)
"Not Recommended" - not recommended by whom?? Patrik, I think you are just having a lend of us, winding us up. You can't really be serious. Can you??? --Richardb 13:56, 3 March 2006 (UTC)
I'm serious. But yes, I admit that I presented an extreme view, in part to provoke a reaction, in part to provide a contrast to how it is currently done. We obviously have to choose some sort of reasonable compromise. By "not recommended" I mean words that you shouldn't use if you expect your readers to take you seriously. How seriously would you take this post if I had used a lot of slang words? How seriously do you think people take WikiSaurus since it mostly contains slang words? Appearance is very important if you are to be taken seriously. --Patrik Stridvall 17:04, 3 March 2006 (UTC)