Wiktionary:Beer parlour/2015/February

Wiktionary:Criteria for inclusion

I ask that Kephir's edit to Wiktionary:Criteria for inclusion (diff) be undone. I would do it myself but I cannot. The vote Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep did not pass, and therefore cannot lead to any edit to WT:CFI; no edit to that page was proposed. Furthermore, the opposers of the vote did not express the wish to ignore CFI, merely to override it in a relatively small number of cases. I am fairly certain that the edit is not based on consensus. --Dan Polansky (talk) 12:42, 1 February 2015 (UTC)

Yeah, it probably should be undone by a sysop. At present, we have no mechanism for deletion of articles. See my comment below. Purple backpack89 16:21, 1 February 2015 (UTC)

Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep: What does it mean?

The clock ran out on Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep, and it was closed as not enacted. So what does this mean? Recently, Wiktionary:CFI was changed to be an obsolete policy. Much as I do not agree with CFI as currently written, it's clear that that was not the correct approach, and creates a host of problems stemming from the lack of a mechanism to delete any article. No, what I interpret the discussion to be is that many participants are unhappy with CFI as written and they believe things other than CFI should be considered in RfD discussions. The upshot of this is not demoting CFI (or at least the part of it that is RfD's purview) to obsolescence, but demoting it to a guideline. Some of you have asked "what's the difference between a policy and a guideline?" A policy is overarching, supported by a wide supermajority of participants, and should be followed all or almost all the time. A guideline can be ignored if there's a consensus to, need not cover everything, and can be enacted with less of a supermajority. The practical implications of this are that articles can still be nominated for deletion under the auspices of CFI, but they won't necessarily be deleted solely on CFI. I see this as being in line with the vote. Note that much of this doesn't apply to the parts of CFI that have to do with RfV. I have seen no evidence of people being upset with the verifiability sections of CFI, which should probably be spun off into a different page that remains policy. Purple backpack89 16:16, 1 February 2015 (UTC)[reply]

Vote Trimming CFI for Wiktionary is not an encyclopedia

FYI, Wiktionary:Votes/pl-2015-02/Trimming CFI for Wiktionary is not an encyclopedia. Let us postpone the vote as long as discussion requires. --Dan Polansky (talk) 16:24, 1 February 2015 (UTC)

Just some food for thought about bots

In the words of C.G.P.Grey, "They don’t need to be perfect, they just need to be better than us [humans]."

How should we deal with bots who make mistakes? --kc_kennylau (talk) 16:42, 1 February 2015 (UTC)

The traditional thing is to tell their human owner. That's what people have done with mine (far too often). SemperBlotto (talk) 16:44, 1 February 2015 (UTC)
- @SemperBlotto Well, telling the human owner does not directly solve the problem. Sometimes, fixing the error is much harder and more time-consuming than creating it, especially if the error is hidden amongst a group of, let's say, 5000 pages. Personally, I spend 1% of the time writing the programme and 99% of the time debugging. --kc_kennylau (talk) 17:29, 1 February 2015 (UTC)

Request to Add New Subcategory "LWT" Within LDL

This is a request for the Wiktionary Community to consider adding a new subcategory, Languages without a Written Tradition (LWT), under LDL (Less Documented Languages).

What is an "LWT"?

An LWT is a language that has an oral tradition, but has no tradition of writing and no written publications authored by native speakers. LWTs are a subset of LDLs. (Note that documents authored in other languages by outsiders and merely translated by native speakers, such as the Bible and government documents, are not suitable as sources for documenting a language.)

Why not Simply Call LWTs "Unwritten Languages"?

The term "unwritten" can be misleading, because the boundary between languages that are "unwritten" and languages that are "written" is actually quite fuzzy. Presumably we can all agree that a language community that has no writing system, no notion of literacy, and has never had its speech transcribed by outsiders can be considered an "unwritten language."

But when that community is visited by linguists who develop an orthography, and (perhaps imperfectly) transcribe some words and phrases from the spoken language into written form, perhaps publishing the results, what then? Is this language "written," even if no one in the language community is literate, and the published "results" contain the errors of a non-native speaker? Some of you might call such a language "written," and others just as reasonably might say it is "unwritten."

Let us now consider a third example. What about a small indigenous language community in Brazil that is completely unfamiliar with writing, and yet, through a process of increasing contact with the national society, develops an orthography and village schools, where children are taught to read and write in both their indigenous language and Portuguese? Obviously, when nearly every child can write words in their own language, the language cannot be considered an "unwritten language." Yet is it a "written" language? Does it have great literature? Yes, in oral form. Poetry? Absolutely, in oral form. Historical narratives, sacred texts, genealogies, song lyrics, compendiums of botanical and zoological knowledge? Yes, all in oral form. What, then, is written in this language? Aside from basic word lists and literacy primers modeled on Portuguese examples, virtually nothing — yet. Today's young adults are the first literate generation.

This is the case with Wauja, an Arawak language spoken by 400 indigenous people in lowland Amazonia. Although Wauja was "unwritten" a generation ago, today it is "written," in the sense that children are taught basic literacy in their village schools. However, as yet (and this doubtless will change), there is no written tradition in this language, no body of publications authored by native speakers. All their literature is still in oral form.

For the purposes of Wiktionary, the key issue is not whether a missionary or professional linguist has phonetically transcribed snippets from the language, but whether there exists a body of work authored by native speakers that is large enough to provide references for every word in the language. For languages like English, Chinese, and all "major" languages, the answer is yes. These languages have extensive written traditions. For thousands of small and endangered languages, the answer is no. These are languages with rich intellectual and literary traditions — in oral form. Such languages may have some (recently-acquired) knowledge of writing, but they have no tradition of writing. This presence or lack of native-speaker-authored published references is the distinction that matters for the Wiktionary community, at least in reference to inclusion criteria.

Why is the Subcategory LWT Needed?

LWTs, by definition, lack a body of published sources authored by native speakers. As a result, it is not possible to use published sources to attest to Wiktionary entries for LWTs. Nevertheless, LWTs are important members of the family of human languages, with rich literary and intellectual traditions, and they deserve to be included in Wiktionary. In fact, these LWTs are typically endangered languages spoken by language communities that are most in need of the permanent, globally accessible, open source, cultural commons platform that only Wiktionary can provide. Therefore, it is proposed that the Wiktionary community define this limited category of languages (LWTs) and agree upon attestation criteria that are sensible and appropriate for such languages.

Can LWTs Meet Current Attestation Standards?

Current Wiktionary attestation standards (see Criteria for inclusion) call for verification either through widespread use (hard to verify for a language without publications) or "use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year (different requirements apply for certain languages)." For spoken languages that are living [but not well documented on the Internet], only one use or mention is adequate, subject to the following requirements:

- the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention,
- each entry should have its source(s) listed on the entry or citation page, and
- a box explaining that a low number of citations were used should be included on the entry page (such as by using the LDL template).
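
As a rough illustration of the two attestation thresholds quoted above, the following sketch checks a list of citation dates against either rule. It assumes the citations passed in have already been judged independent and durably recorded, which code cannot decide; the function name `meets_attestation` and the `less_documented` flag are hypothetical and do not correspond to any actual Wiktionary tool.

```python
from datetime import date
from typing import List


def meets_attestation(citation_dates: List[date], less_documented: bool) -> bool:
    """Check citation dates against the thresholds quoted from the CFI.

    Assumption: every date belongs to a citation already accepted as
    independent, meaningful, and permanently recorded.
    """
    if less_documented:
        # LDL rule as quoted: a single use or mention is adequate.
        return len(citation_dates) >= 1
    if len(citation_dates) < 3:
        return False
    # "Spanning at least a year" is approximated here as 365 days or more
    # between the earliest and the latest citation.
    span = max(citation_dates) - min(citation_dates)
    return span.days >= 365
```

The other LDL requirements (a maintained source list, sources listed on the entry, and the explanatory box) are editorial steps and are deliberately left out of this check.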

Assuming that the first bulleted requirement above refers to a list of materials that are permanently available online, probably most LWTs cannot meet this requirement. For example, in the case of the Wauja language, spoken as a first language by 400 people in the Amazonian rainforest, there are hundreds of audio recordings, and several dozen carefully transcribed traditional stories, but none of them currently are available online. (Though they could be made available to Wiktionary admins upon request.)

Before these stories are posted online, the community must agree that they are correctly transcribed. That's because they were first recorded and transcribed several decades ago by an anthropologist (myself, in this case), at a time before any Wauja were able to read and write. Today, there is a cadre of young university-educated Wauja bilingual schoolteachers who are deeply committed to standardizing their orthography and documenting their language. However, this process takes time, because it is not decided by fiat. Instead, the Wauja, like many communities that speak LWTs, take time to reach decisions through building consensus. It's a chicken-and-egg situation. Without a standard orthography, it's hard to build a dictionary, but without a dictionary, it's hard to standardize the orthography.

Proposed Attestation Standards for LWTs

To allow responsible documentation to proceed within Wiktionary while members of LWT communities increasingly move toward standard orthography, publications by native speakers, and full compliance with Wiktionary LDL attestation standards, the following interim attestation standards for LWTs are proposed:

- The community of editors for that language should maintain a list of materials deemed appropriate as the only currently existing sources for entries.
- These sources may include audio or video recordings of native speakers, and transcripts of such recordings.
- Sources also may include direct quotes from letters and written messages produced by literate native speakers, provided that the quoted material is archived online and annotated as described below.
- All sources must include mention of the date of the recording or transcription, names of the native speakers recorded, the location of the recording, the name of the person making the recording, and location where the source is archived, if not online.
- Once the transcript has been authorized by the language community as a faithful transcription, the names of community members involved in verifying the transcription also must be noted, and a copy must be posted to a permanent online location, such as Wikisource.
- If Wiktionary admins find any reason to doubt the authenticity of the sources cited, they shall be allowed to examine the source material.
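
The metadata requirements listed above could be captured in a small record type. What follows is only a sketch under stated assumptions: the `SourceRecord` class, its field names, and the `missing_requirements` helper are hypothetical names invented for illustration, not part of any Wiktionary tool or policy, and the sample values are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceRecord:
    """Hypothetical record for one LWT attestation source (illustrative only)."""
    date: str                               # date of the recording or transcription
    speakers: List[str]                     # names of the native speakers recorded
    location: str                           # where the recording was made
    recorded_by: str                        # person who made the recording
    online_url: Optional[str] = None        # permanent online location, if any
    archive_location: Optional[str] = None  # where it is archived, if not online
    community_verified: bool = False        # transcript approved by the community
    verifiers: List[str] = field(default_factory=list)


def missing_requirements(src: SourceRecord) -> List[str]:
    """Return the names of proposed requirements the record does not yet meet."""
    problems = []
    if not src.date:
        problems.append("date")
    if not src.speakers:
        problems.append("speakers")
    if not src.location:
        problems.append("location")
    if not src.recorded_by:
        problems.append("recorded_by")
    # A source that is not online must say where it is archived.
    if not src.online_url and not src.archive_location:
        problems.append("archive_location")
    # A community-verified transcript must name its verifiers and be posted online.
    if src.community_verified:
        if not src.verifiers:
            problems.append("verifiers")
        if not src.online_url:
            problems.append("online_url")
    return problems


# Placeholder values only; none of these describe a real recording.
example = SourceRecord(
    date="1990-01-01",
    speakers=["Speaker A"],
    location="Village X",
    recorded_by="Recorder B",
    archive_location="Archive C",
)
print(missing_requirements(example))  # prints []
```

A check like this only confirms that the fields are filled in; whether the source is authentic would still be a matter for the editors and admins, as the last item in the list says.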

The overall goal of attestation standards for LWTs is to ensure responsible and reliable attestation for LWT entries, while making Wiktionary the best platform for documenting the world's many LWTs.

Clarification re: Sources for Attestation (Text vs. Audio and Video)

Based on comments below, it appears that the "Proposed Attestation Standards" listed above need clarification. My intention was to propose uploading all TEXT (written) sources to a permanent online location. This could be Wikisource or another location, such as an endangered language digital archive. However, I cannot propose uploading the actual AUDIO and VIDEO recordings to a Creative Commons site, because some language communities might not want the actual voice and video recordings of their elders in the public domain. For instance, in the case of the Wauja community (an indigenous people of Central Brazil), it would be offensive to publicly play recordings of elders after they have died, particularly since the community would have no say in how often or under what circumstances the recordings would be played. No such restriction is attached to mere text transcriptions, however, which Wauja elders consider to lack the spiritual power of the human voice. Most types of written texts could be posted in a freely accessible permanent online location.

Fortunately, however, the recordings could still be used for attestation. The Criteria for inclusion states:

"Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google. Print media such as books and magazines will also do, particularly if their contents are indexed online. Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived." (emphasis added)

If audio and video recordings used for attestation are deposited at a digital archive for endangered languages (for example: ELAR, the Smithsonian Institution, the Library of Congress), then "someone referring to Wiktionary years from now is likely to be able to find the original source" and, at the same time, the wishes of the language community will be honored regarding the respectful and appropriate use of their recorded material.

In summary, I propose that TEXT source materials (such as PDFs of transcriptions and translations) be posted on Wikisource or another location, such as an endangered language digital archive, but that AUDIO and VIDEO recordings of actual human beings be archived in a publicly-accessible digital archive that is equipped to honor specific intellectual property rights and privacy concerns of the endangered-language community in question. (Examples of suitable archives: ELAR, the Smithsonian Institution, the Library of Congress, and so on.) Emi-Ireland (talk) 22:10, 2 February 2015 (UTC)[reply]

Honoring the "No Original Research" Principle

For a language with a written tradition, it is appropriate to refer to published sources written in that language. However, for a language that consists of an exclusively oral tradition, it is appropriate to refer to authoritative oral sources that have been recorded and transcribed. To ensure that the "no original research" principle is honored, transcriptions of traditional stories, historical narratives, public oratory, and sacred incantations performed by elders before an audience can be given priority as sources, since these linguistic sources are particularly authoritative and reliable for LWTs.

Clarification: Faithful transcriptions from audio or video sources are NOT considered original research on Wikipedia

I searched for a Wiktionary policy statement on No Original Research, but have not found it yet. In the meantime, here is the Wikipedia policy statement on transcriptions from audio and video sources:

"Translations and transcriptions: Faithfully translating sourced material into English, or transcribing spoken words from audio or video sources, is not considered original research. For information on how to handle sources that require translation, see Wikipedia:Verifiability#Non-English sources."

https://en.wikipedia.org/wiki/Wikipedia:No_original_research#Translations_and_transcriptions Emi-Ireland (talk) 19:05, 3 February 2015 (UTC)

Proposed Standard for Transitioning from LWTs to LDLs

When a language has a sufficient body of publications (authored by native speakers) so that every word in the language can be referenced to a published work authored by native speakers, that language is no longer an LWT.

In practical terms, there is no hard and fast cut-off point, but perhaps we can say that once an LWT community has achieved a minimum threshold of 3,000 entries in Wiktionary, the community will have become aware of the importance of lexicography and its methods, and it will have benefited greatly from using Wiktionary to document, analyze, and teach literacy in their language. The language community will have had an opportunity to standardize their orthography, properly review transcriptions of older recordings of traditional oral literature, have native speakers produce new publications based on new recordings, and permanently archive online all such transcripts and publications. As a result, this language community will be considered capable of meeting LDL attestation standards going forward.

Emi-Ireland (talk) 19:35, 1 February 2015 (UTC)

Broadly support. This whole idea needs much more detail yet, but it seems clear that attestation standards for languages that have an existing literature, just one that is not well documented on the Internet, will have to be different from languages that have never had a written tradition.

It is also unclear to me whether any considerable community of editors with an LWT as their heritage language (whether as a mother tongue, passive knowledge, or something in-between) even exists on the English Wiktionary yet. Even several larger minority languages out there, with relatively long-running historical traditions, have hardly any editors with more than elementary skills (e.g. Nahuatl, Navajo, Northern Sami, Xhosa). The situation might be different on other Wiktionaries, though, and e.g. I would not be surprised if a hypothetical Wauja wiktionarian community ended up preferring the Portuguese Wiktionary.

Also, as far as I know, "materials deemed appropriate as the only sources for entries based on a single mention" is not a priori limited to material permanently available online. This could well include sources such as linguistic publications, depending on the language in question.

Some other issues to consider:

- If no consensus orthography exists yet, how are we to title any word entries? In terms of a pronunciation?
- Would the entries qualify for the main namespace at all, or should a new dedicated namespace such as Unwritten: be established?
- What about unwritten extinct languages? (I have been preparing a proposal with respect to some extinct languages, but for now I think I will instead watch this discussion unfold.)

--Tropylium (talk) 00:16, 2 February 2015 (UTC)

Re Tropylium's question: "If no consensus orthography exists yet, how are we to title any word entries?"

Tropylium, this is an excellent question. Perhaps we can consider the case of the Wauja, as an example. Currently, the Wauja themselves agree on the spellings for many words, but there is a vowel that missionary linguists spell one way, using a character not found on standard keyboards, and some young Wauja schoolteachers want to spell it another way, using the standard Latin alphabet. The community will have to sort that out, and it may not be decided overnight. (Certainly English spelling was not standardized overnight). In the meantime, the spelling of Wauja words in Wiktionary may occasionally need to be corrected.

A thornier issue is where to place the breaks between words. This is where various Wauja authors most often differ from one another. Wauja is an agglutinating language. For example, verbs can use multiple suffixes simultaneously. Some authors write them all as one word, and others might break off the last suffix or two and write them as separate words. It is possible that decisions to break up long Wauja words may result from notions that a word looks "too long" when compared to Portuguese words. My view is that both approaches are entirely valid, and that the community will have to decide which it chooses to use as the standard.

In the meantime, this wonderful language is endangered, and so it is essential to continue with the process of documenting it. Documentation is valuable not only for its own sake, but because it sends a strong message to young Wauja that the outside world values their language. In fact, the community is very excited that this summer they will be trained in how to participate in building a digital lexicon on the Wauja-Portuguese site. It is entirely possible, Tropylium, that you are correct, and that the Wauja will see the Wauja-Portuguese site as "their" dictionary. However, the Wauja also see themselves as global citizens, and are delighted and proud that a dictionary is being created that translates their language into English. Currently, some young Wauja learn snatches of English from popular song lyrics they encounter online. A Wauja-English Wiktionary will be welcomed not only by scholars and the general public, but by the Wauja themselves. Emi-Ireland (talk) 01:19, 2 February 2015 (UTC)[reply]

Oppose. No mechanism of independent verification proposed. The attesting recordings are proposed to be uploaded directly onto Wikimedia servers as per "Proposed Attestation Standards for LWTs" above. The section 'Honoring the "No Original Research" Principle' above seems to be contradictory; this does look like original research, especially in that the attesting material itself is original research (we do original research in that we are figuring out definitions from attesting quotations, but that's a different game, I think). Thus, this seems like something for Wikiversity. --Dan Polansky (talk) 18:48, 2 February 2015 (UTC)[reply]

Re: No Original Research rule

Please note that, per Wikipedia Policy, transcribing spoken words from audio or video sources is not considered original research. If Wiktionary has a policy on transcriptions and the No Original Research rule that contradicts this, I have not been able to find it. See: https://en.wikipedia.org/wiki/Wikipedia:No_original_research#Translations_and_transcriptions Emi-Ireland (talk) 19:22, 3 February 2015 (UTC)

Re: Importance of including all human languages

Thank you for your thoughtful comments. Given that you do not support the proposed attestation standards, I earnestly invite you to contribute your own suggestions for attestation standards that you could support. If we put our heads together, surely we can devise attestation standards that do not automatically exclude a large number of human languages, simply because they have an oral tradition, and not a written one.

The thing we must not lose sight of is that languages without a written tradition should not automatically be excluded from Wiktionary. That would be grossly unfair to the speakers of those languages, and it would be a sad day for Wiktionary, as well. We must find a way to include all human languages in Wiktionary, while taking every reasonable measure to ensure that the work is done as it should be.

I am a newcomer to Wiktionary, and so I assume your knowledge of suitable attestation standards is greater than mine. Can we work together to come up with a standard that does not exclude LWTs (languages that do not happen to have a written tradition)? Surely there are many ways we can address this problem. The important thing is to refrain from treating LWTs as if their languages don't belong here.

This is what inspired me to contribute to Wiktionary:

"Wiktionary ... aims to describe all words of all languages using definitions and descriptions in English."

We should live up to that, as well as to our attestation standards. Let's find a way to do both.

Emi-Ireland (talk) 20:10, 2 February 2015 (UTC)

Add category for terms with IPA pronunciation

I propose to add a category for the terms with IPA pronunciation by language. The edit is here; I reverted it one second after making it, in order to demonstrate the method before asking for consensus. --kc_kennylau (talk) 12:34, 3 February 2015 (UTC)

To what end? I think it would be more helpful to have a category for terms without IPA pronunciation, so we know what needs to be added. —Aɴɢʀ (talk) 20:37, 3 February 2015 (UTC)[reply]

We already have Category:English terms with audio links though. —CodeCat 20:42, 3 February 2015 (UTC)

@Angr Well, it is virtually impossible to have a category for terms without IPA pronunciation, and using category scanning tools can actually identify those terms without IPA pronunciation if there is a category for terms with IPA pronunciation. --kc_kennylau (talk) 11:10, 4 February 2015 (UTC)
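
The category-scanning approach described above amounts to a set difference: take every term in a language's lemma category and subtract those in the "with IPA pronunciation" category. A minimal sketch, assuming the two membership lists have already been fetched somehow (the function name and the sample lists are hypothetical, not output from any real tool):

```python
from typing import Iterable, List


def terms_without_ipa(all_terms: Iterable[str], with_ipa: Iterable[str]) -> List[str]:
    """Return, in sorted order, the terms that are absent from the IPA category."""
    return sorted(set(all_terms) - set(with_ipa))


# Hypothetical category contents, for illustration only.
english_lemmas = ["cat", "dog", "bird"]
english_with_ipa = ["cat"]
print(terms_without_ipa(english_lemmas, english_with_ipa))  # prints ['bird', 'dog']
```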

Sure, why not? We are blessed to be equipped with categories, and they should be exploited. --Type56op9 (talk) 11:15, 4 February 2015 (UTC)

Done. --kc_kennylau (talk) 12:42, 6 February 2015 (UTC)

Languages - are they proper nouns or not?

I have had a long discussion with User:-sche (User talk:-sche#Maori) (can't get it right) about whether languages are proper nouns. In my opinion they are mass nouns instead. It was suggested by -sche that a discussion be started here on the subject. Donnanz (talk) 22:52, 3 February 2015 (UTC)

IFYPFY - -sche (discuss)

Thanks for that! Donnanz (talk) 23:38, 3 February 2015 (UTC)

I see someone already made the point about pluralisation ("various Englishes"); however, our proper-noun template doesn't preclude the possibility of a plural. Some people seem to like to add plurals for given names and surnames. I will admit that I find the common/proper distinction very confusing. Equinox ◑ 23:44, 3 February 2015 (UTC)

We should make languages common nouns, also demonyms (nationalities, ethnicities), e.g. German (person), even if they are capitalised in English and some other languages. Nominalised adjectives, like English, Chinese, etc. shouldn't have plural forms in standard English, it's easy to address. --Anatoli T. (обсудить/вклад) 23:47, 3 February 2015 (UTC)

@Atitarev: WTF? Why? This is English Wiktionary. Why in the world should we subjugate our syntactic tradition to those of other languages? That there are lots of secondary uses of proper nouns as common nouns or mass nouns is immaterial.

"Nominalised adjectives, like English, Chinese, etc. shouldn't have plural forms in standard English, it's easy to address." Are you saying that you don't like Englishes and that you disapprove of those who use the word? You should take it up with the authors in these Google Books hits for Englishes. DCDuring TALK 00:27, 4 February 2015 (UTC)[reply]

You misunderstand me but I haven't expressed myself well. My point is "English" (noun) and (proper noun) sections should be merged into common noun and a note about "Englishes" should be added, as it is normally uncountable and is pluralised only for some senses. --Anatoli T. (обсудить/вклад) 00:40, 4 February 2015 (UTC)

Funnily enough, the top match (World Englishes: A Resource Book for Students) is a course text at my university. Equinox ◑ 00:41, 4 February 2015 (UTC)

Perhaps English is not the best example but "Chineses" or "Vietnameses" sounds pejorative. Anyway, no need to be picky about what I said, let's focus on PoS discussion - common nouns vs proper nouns. --Anatoli T. (обсудить/вклад) 00:48, 4 February 2015 (UTC)

OK. Please provide some reason why you think the merger would be a good idea for Wiktionary users. DCDuring TALK 00:59, 4 February 2015 (UTC)

There was a similar discussion~~, I think started by User:CodeCat about eliminating proper nouns,~~ there was some reasoning there. Not sure where that discussion is now. While I think it's a good idea, the first step is perhaps deciding what candidates are first to be reduced to common nouns. This will reduce the entries, remove duplication in translations, less maintenance. Days of the week (e.g. Saturday), month names (e.g. November) are also capitalised but they are common nouns. I think language names and demonyms are also common nouns but there are various opinions on this. Let's see what other people think. --Anatoli T. (обсудить/вклад) 01:18, 4 February 2015 (UTC)

The discussion I meant - Wiktionary:Beer_parlour/2014/October#On_proper_nouns --Anatoli T. (обсудить/вклад) 01:24, 4 February 2015 (UTC)

There seems to be a general consensus in that discussion that languages are not proper nouns. However I wouldn't go as far as recommending implementing CodeCat's suggestion that the categories for proper nouns and (common) nouns be merged. Names like English and French are also surnames, so even if languages are treated as common nouns there would still be a need for a proper noun in those cases. Donnanz (talk) 10:36, 4 February 2015 (UTC)

What consensus? Donnanz and Atitarev, non-native speakers of English? It would be like Equinox and I agreeing that Russian entries should not be in Cyrillic. DCDuring TALK 11:16, 4 February 2015 (UTC)

Perhaps you'd like to clarify that, I am a native speaker of English by the way. No decision was reached in that discussion, but reading that thread (even between the lines) I got the impression that there was a general consensus for treatment of languages as common nouns. Donnanz (talk) 11:32, 4 February 2015 (UTC)

In Spanish and French they are treated as common nouns (and are uncountable). --Type56op9 (talk) 11:13, 4 February 2015 (UTC)

In French, the plural may be used in some cases (e.g. Français parlés et français enseignés, a book by Juliette Delahaie, 2010). It's the same as English, except that they are not capitalized. Lmaltier (talk) 21:19, 4 February 2015 (UTC)

In English we dispense with diacritical marks. So let's clean them out of Spanish and French entries. DCDuring TALK 11:20, 4 February 2015 (UTC)

The Scandinavian languages also treat languages as common nouns, and no capital letter is used. Donnanz (talk) 11:32, 4 February 2015 (UTC)

My inclination is to continue treating language names as proper nouns. I'm having a hard time finding authoritative advice on the matter, however. Most of the reference works I've found (via Google Books), both those from a hundred years ago and those from last year, conflate properness-vs-commonness with capitalization. One outright says "Capitalize proper nouns and words derived from them; do not capitalize common nouns", which is obviously inaccurate: tell it to the Marines, the Americans and the Englishmen.
Alfred Marshall Hitchcock's 1910 Junior English Book says North, South, East and West are proper nouns; spring, summer, autumn, fall and winter are common nouns; arithmetic, science, geography and other branches of study are common nouns; and English, French, German, Latin and other names of languages are proper nouns. However, it then goes on to discuss how people don't capitalize the names of familiar animals, but are sometimes tempted to capitalize the names of unfamiliar animals, which makes me question whether it, too, is equating properness-vs-commonness with capitalization.
Perhaps most promisingly, International English Usage (2005, →ISBN) discusses not only "proper nouns (and the names of languages are proper nouns)" and common nouns (its examples are fruit and spider) but also concrete (coin) vs abstract (jealousy) nouns. Maybe someone can find better references, but I'm told CGEL is silent on the matter. - -sche (discuss) 09:04, 5 February 2015 (UTC)

I finally found a source that is explicit on the subject: The Oxford Guide to Practical Lexicography, Atkins and Rundell (2008). In the course of discussing the groups of proper names that a dictionary might include depending on how important the class is to the target market, they have some lists: 'place name', 'personal names', and 'other names'. Under other names are the following subclasses: 'festivals, ceremonies', 'organizations', 'languages', 'trademarks', 'beliefs and religions', and 'miscellaneous'.

To this explicit characterization should be added that all language names are capitalized in English and that they refer to unique things, though those things may be subdivided, especially in technical discussion. The non-existence of a plural of a capitalized noun might be a sufficient condition to indicate that the noun is a proper noun, but the existence of a plural is not sufficient to indicate that it is a common noun. DCDuring TALK 04:59, 8 February 2015 (UTC)[reply]

Note that what you explain applies to language names in English. I think no reader will object when finding language names in French described as common nouns, which reflects better how they are considered by French-speaking people. Lmaltier (talk) 18:47, 8 February 2015 (UTC)[reply]

They are common nouns in English too, except here of course. Donnanz (talk) 19:15, 8 February 2015 (UTC)[reply]

In the absence of a meaningful way to define the alleged distinction between common nouns and proper nouns (and I mean anywhere in the world, not just on Wiktionary), the question is moot. —Aɴɢʀ (talk) 20:38, 8 February 2015 (UTC)[reply]

And will remain moot, because there is a general idea of the definition (and this is more or less the same meaning in all languages), but details about how it's applied are ruled by tradition only, and this tradition depends on languages (and is not always clear). Lmaltier (talk) 20:50, 8 February 2015 (UTC)[reply]

@Donnanz: Could you produce some evidence from references or something that asserts that language names are common nouns in English? DCDuring TALK 21:47, 8 February 2015 (UTC)[reply]

Look for "mass noun" in orange (not the best colour), otherwise you may miss it.

http://www.oxforddictionaries.com/definition/english/Bokm%C3%A5l and http://www.oxforddictionaries.com/definition/english/English. Donnanz (talk) 22:10, 8 February 2015 (UTC)[reply]

You assume that a mass noun must necessarily be a common noun. But you have acknowledged that trademarks are proper nouns. Providing specific counterexamples to the assumption is left as an exercise to the reader. DCDuring TALK 23:44, 8 February 2015 (UTC)[reply]

Trademarks are a different kettle of fish from languages. They start off as proper nouns, but can gravitate into common nouns and can even become verbs (e.g. google and hoover). See also Marmite, Mercedes and Bentley; I think the Bentley car should be a proper noun, not a common noun. Editors can quite easily get their knickers in a twist over trademarks; Oxford lists Marmite as both a mass noun and a trademark. But there shouldn't be any confusion with languages. Donnanz (talk) 09:57, 9 February 2015 (UTC)[reply]

There is no confusion. Languages are proper nouns because they are names of singular entities. That they can be used as plurals, used as mass nouns, and used attributively is barely interesting, as other proper nouns can too, though the relative frequency might differ by type of noun. Perhaps it would be easier to swallow if you viewed it as metonymy: "There were two IBMs in a refrigerated room."; "I own too much IBM."; "It's an IBM computer." DCDuring TALK 15:28, 9 February 2015 (UTC)[reply]

Anagrams - do they serve a purpose?

While I'm at it, I may as well ask whether the inclusion of anagrams in Wiktionary actually serves a purpose - are they useful, or just a fun thing? I'm not sure whether this has been discussed before. Donnanz (talk) 22:59, 3 February 2015 (UTC)[reply]

Yeah, someone will come along and object to the anagrams fairly regularly. Points brought up in the past include (i) they are useful for word games such as Scrabble; (ii) they are a genuine provable "function" of a word, whereas e.g. spelling-bee trivia is not. There also tends to be general interest in words with unusual properties, such as palindromes, very long words, and words with unusual combinations of letters (such as our Q-without-U category); anagrams are that sort of thing. Equinox ◑ 23:08, 3 February 2015 (UTC)[reply]

I see. I have actually changed one or two where the anagram happened to be a synonym or variant spelling, but this doesn't happen very often. Donnanz (talk) 23:21, 3 February 2015 (UTC)[reply]

What did you change? From the point of view of the Scrabble player, SPECTER and SPECTRE might as well be totally different words: the point is which one of them is better strategy (e.g. perhaps you don't want the E on a square that gives more possibilities to your opponent). Anagrams are anagrams; I don't think we should edit them based on semantics. Equinox ◑ 23:46, 3 February 2015 (UTC)[reply]

I'm afraid I can't remember now, but it was only two at the most. I'll bear that in mind in future. Donnanz (talk) 00:03, 4 February 2015 (UTC)[reply]

I give zero phucks about games. They do not add any lexicographical content, so we should not include them. Even paronyms and folk etymologies (I have read somewhere that we do not include them) are more interesting than anagrams. --Dixtosa (talk) 15:07, 5 February 2015 (UTC)[reply]

I agree, anagrams take space and are useless. Let's remove them. --Vahag (talk) 15:15, 5 February 2015 (UTC)[reply]

Out of interest, do Dixtosa and Vahag also favour removal of the palindrome and Q-without-U categories? Equinox ◑ 16:51, 5 February 2015 (UTC)[reply]

No, because categories do not take up space on the page. I am concerned about the layout of our entries. The fewer sections we have, the better. --Vahag (talk) 16:54, 5 February 2015 (UTC)[reply]

Agreed. BTW, I can argue that QwoU category can have lexicographical value (as I see it, it is gonna contain exceptional words, because every occasion when Q is not followed by U is an exception).--Dixtosa (talk) 19:10, 6 February 2015 (UTC)[reply]

I personally find anagrams (and also rhymes) pretty useful for solving and compiling cryptic crosswords. The sections are automatically maintained by bots, so I don't see any real reason to object to them. Smurrayinchester (talk) 15:28, 5 February 2015 (UTC)[reply]

I don't use the anagrams sections for anything myself, but they do no harm and don't duplicate information available (or even potentially available) in any other Wikimedia project, so I'd be opposed to trashing them. —Aɴɢʀ (talk) 15:41, 5 February 2015 (UTC)[reply]
Ah, we're getting some mixed reactions. Keep 'em coming. Donnanz (talk) 16:02, 5 February 2015 (UTC)[reply]
At least anagrams can be automatically generated (e.g. this tool for fr). Rhymes can also be generated automatically, although both depend on the exhaustiveness of the information. Dakdada (talk) 16:08, 5 February 2015 (UTC)[reply]
A simple way to reduce the space taken by anagrams is to list them horizontally instead of vertically. DCDuring TALK 18:12, 5 February 2015 (UTC)[reply]

Not a bad idea, if there's quite a few of them. Donnanz (talk) 18:56, 6 February 2015 (UTC)[reply]

It is not exactly about how many lines they take, but rather the fact that nonlexicographical content does not deserve place in articles' pages. Besides, I am sure no1 is able to prove that listing rearrangements rather than, for example, subsets is more plausible. --Dixtosa (talk) 19:09, 6 February 2015 (UTC)[reply]

If we can accommodate such content at low cost and low intrusiveness, it might serve to get a few more contributors. I would expect that word-puzzle and word-game fans constitute a significant share of users and contributors. Anything that makes contributing fun is worth consideration. DCDuring TALK 20:23, 6 February 2015 (UTC)[reply]

Looking at earnt, putting anagrams on one line has already been happening. Donnanz (talk) 23:52, 6 February 2015 (UTC)[reply]

I've been doing it for a while, but not systematically. DCDuring TALK 02:25, 7 February 2015 (UTC)[reply]

Apparently, Conrad.bot, which also inserted anagrams, used to do it. DCDuring TALK 02:29, 7 February 2015 (UTC)[reply]

Why state that they do not add any lexicographical content? Usually, they are not included in dictionary entries, true, but it would be possible to include them, and we do it, which makes them lexicographical content. Some dictionaries are dedicated to anagrams (see w:Anagram dictionary). The important question is their usefulness, and I think they are useful. Lmaltier (talk) 18:55, 8 February 2015 (UTC)[reply]

Anagrams are mathematical. They reduce a word to mathematical properties ignoring meaning, pronunciation, etymology. Literally everything apart from what letters the word uses and if any other words use the same letters. 95.144.169.113 11:56, 7 March 2015 (UTC)[reply]
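
The point made above, that anagrams depend only on which letters a word uses and can therefore be generated automatically, can be shown with a minimal sketch. It assumes nothing more than a plain list of entry titles; the names `letter_signature` and `anagram_groups` and the sample words are hypothetical, and this is not the code used by any Wiktionary bot.

```python
from collections import defaultdict


def letter_signature(word):
    """Return the word's letters in sorted order, ignoring case."""
    return "".join(sorted(word.lower()))


def anagram_groups(words):
    """Group words by letter signature; keep only groups with two or more words."""
    groups = defaultdict(list)
    for word in words:
        groups[letter_signature(word)].append(word)
    return {sig: ws for sig, ws in groups.items() if len(ws) > 1}


# Hypothetical sample of entry titles.
sample = ["specter", "spectre", "respect", "earnt", "anagram"]
groups = anagram_groups(sample)
```

Two words are anagrams of each other exactly when their signatures match, so meaning, pronunciation and etymology play no part in the grouping. As noted in the thread, the result is only as complete as the word list fed in.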

Admin vote

This is to inform you that I've decided to nominate myself for adminship again. The reason is that I want to work with lots of templates, in order to make lots of cleanup pages. This way I won't have to keep bugging other admins to make changes. BTW, the page is at Wiktionary:Votes/sy-2015-02/User:Type56op9 for admin. It would be fun to hear your opinions. --Type56op9 (talk) 11:33, 4 February 2015 (UTC)[reply]

I oughta nominate myself for the mop as well. If this guy's gonna get it, and Kephir still has it, why can't I? Purple backpack89 15:08, 4 February 2015 (UTC)[reply]

Hey, I gotta great idea. Instead of blocking vandals, why don't we make them sysops instead? SemperBlotto (talk) 21:07, 4 February 2015 (UTC)[reply]

Yeah! And instead of applying CFI and policy, why not ignore them instead? Oh wait we've done that one. Equinox ◑ 13:27, 5 February 2015 (UTC)[reply]

We've already made vandals sysops too, and even allowed them to remain sysops after their vandalism has come to light. —Aɴɢʀ (talk) 15:43, 5 February 2015 (UTC)[reply]

If I knew what a sysop is, maybe I could have a laugh (?). Donnanz (talk) 15:50, 5 February 2015 (UTC)[reply]

Did you consider consulting an online dictionary? Hint: sysop. SemperBlotto (talk) 15:58, 5 February 2015 (UTC)[reply]

Er, no, I thought it was Wiktionary jargon. Thanks. Donnanz (talk) 16:04, 5 February 2015 (UTC)[reply]

Category for double modals

Would it be useful to have a category for double modals (might can), like we have one for double contractions? We don't even have a category for single modals at the moment. - -sche (discuss) 20:58, 6 February 2015 (UTC)[reply]

As an English-specific category, maybe. —CodeCat 21:26, 6 February 2015 (UTC)[reply]

Certainly. (For those reading this thread who don't speak other languages: German and many other languages are also able to stack modals, but they're not defective so they're not remarkable.) Do you think it would be useful to also have categories for modal verbs in general? I notice we do already have Category:German modal verbs, but it's empty. - -sche (discuss) 22:15, 6 February 2015 (UTC)[reply]

I'm surprised that we don't have something as fundamental as modals categorized. If we had that, would we need one for double modals? That is, wouldn't it be clear by inspection of the base category which were double modals? DCDuring TALK 15:21, 8 February 2015 (UTC)[reply]

We do have Category:English auxiliary verbs, which does not seem to be complete. Even English modals may be too sparse to be a good category. Perhaps an Appendix? DCDuring TALK 17:37, 8 February 2015 (UTC)[reply]

Is Nostratic allowed in etymologies?

When I removed Nostratic material from a PIE etymology section, Ivan Štambuk reverted me; evidently he has some belief in it, but scholarly opinion is strongly opposed to it. I think we ought to avoid having something so far from the linguistic mainstream treated as credible in PIE entries, and if necessary I would create a vote about it. Does the community support its inclusion? —Μετάknowledge^{discuss/deeds} 07:53, 7 February 2015 (UTC)[reply]

No. —Aɴɢʀ (talk) 08:21, 7 February 2015 (UTC)[reply]

What's "Nostratic material" stand for, exactly? A sourced claim to the effect "*wódr̥ may be akin to *wete" would be sensible enough — there are several "Nostratic" comparisons of this sort that are both credible and well-established, and the main dispute is if they involve inheritance or some kind of loaning. But anything along the lines of "from Proto-Nostratic *wede" (privileging a disputed explanation; not to mention that no two Nostraticists agree on a reconstruction) or "also compare XX in Southern Oromo, YY in Old Kannada, ZZ in Evenki" (or other utterly dubious comparisons) should obviously be nuked on sight.

Looking up the appendix in question (*h₁er-), it seems we have some from column A, some from column B here. At least the Semitic root is well-reconstructed and should be OK to mention. I don't know if there's much point in discussing the alleged cognates in other Afrasian branches and Dravidian, if we don't even have the relevant Proto-Chadic or Proto-Dravidian etymology pages up yet. (The former could well be mentioned in the Proto-Semitic entry, of course.) --Tropylium (talk) 09:22, 7 February 2015 (UTC)[reply]

(Also, mutatis mutandis, I would suggest the same for claims of Altaic cognates, in case there's any work going on with that.)

A separate appendix of "Nostratic roots" for documenting the various proposals out there would be OK for me, but it should not be cross-linked from the main etymological appendices. --Tropylium (talk) 09:01, 7 February 2015 (UTC)[reply]

Why shouldn't it be linked? How else are people going to find out the connections? --Ivan Štambuk (talk) 20:55, 8 February 2015 (UTC)[reply]

What I'm against is creating any Nostratic appendices in the same mold as established protolangs, i.e. each page is an entry for a single proto-root, which lists its descendants, and each descendant is linked back specifically as "from Proto-Nostratic *ʔer-". The research just isn't far enough for that to be a sensible approach. There is no coherent consensus reconstruction of "Proto-Nostratic" that could be treated as a language according to Wiktionary's standards.

It seems doable to instead have pages that catalogue the different overlapping proposals in a particular semanto-phonetic area. Let's say we've a PIE root that has been compared to three different Semitic roots by Bomhard, Dolgopolsky and Illich-Svitych respectively; we can neither pool all of those into a single root, nor should we try to enforce an executive decree on which proposal is the closest to being correct. Instead a new kind of an article layout entirely seems to be required.

Moreover, note that this same problem also comes up within several established language families. There is no standard reconstruction of Proto-Afro-Asiatic, Proto-Niger-Congo, Proto-Sino-Tibetan, etc. So if we'd need some less mechanical way of formatting etymology appendices dealing with these families anyway, it stands to reason that the same approach, and not the proto-language approach, should be applied to Nostratic as well.--Tropylium (talk) 15:29, 13 February 2015 (UTC)[reply]

What do you mean by "research is not far enough"? What standards does Wiktionary have when we allow original research in etymologies? I'm all for creating and establishing standards, but they should be applied consistently.

That type of layout should also be used for all protolangs, since they vary widely depending on the author/school, even established ones. The problem is the software which requires one spelling to be the main entry, and others redirecting to it. It's often a political question as well. But it's not that much of a priority IMHO - the priority is to collect information, and the formatting/presentation issue can always be solved later.

I think that there should definitely be a way in a PIE or PS reconstruction to indicate "there is a Nostratic root that has been connected with this reconstruction, and you can find more information about it here". It's absurd to have Nostratic roots listed somewhere without linking back to them. Perhaps some kind of a floating box would suffice? --Ivan Štambuk (talk) 01:12, 26 February 2015 (UTC)[reply]

The research is not far enough in that there is no such thing as accepted Nostratic soundlaws, or an accepted perimeter of Nostratic, that could possibly guide our work. Within any relatively young and well-studied group (on the order of Germanic, Slavic, Finnic, etc.) it is usually simple enough to check whether a particular proto-form, even if not explicitly sourced, is what the alleged descendants suggest. Admittedly I have not paid attention to what kind of OR we might have around exactly though; if editors are establishing etymological connections or devising new soundlaws all on their own, and they have some kind of a policy support for this, I'd argue that that's rather worrisome, yes. But my understanding has been that Wiktionary doesn't "allow OR" as much as "has been lenient in tolerating OR".

Your comment that some type of less formulaic entry layout should be used for better-established protolanguages as well is intriguing. It would indeed probably work for many roots in bottom-level languages like PIE and PU on which there remain many open questions. On the other hand, aside from notational fine-tuning, there is also widespread agreement on the reconstruction of words/roots like *gʷṓws or *kala, and chucking the regular entry layout entirely doesn't seem necessary (even if individual sub-headings may require different treatment). And again, closer to historically recorded languages, proto-words like this are probably the majority. --Tropylium (talk) 20:49, 27 February 2015 (UTC)[reply]

My opinion on Nostratic is the same as my opinion on Altaic. Someone who digs all the way down to the Appendix page of a PIE root is probably into etymologies, and might be interested in theories that go even deeper, like Nostratic; the key is that they need to be clearly labelled so no-one is misled either to think that the theories are reliable, or that Wiktionary is believing them. "As part of the controversial Nostratic hypothesis, Smith connects this word to foo." And like Altaic, Nostratic should be limited to appendices (linked to from other places using {{etyl}}, the same way we link to any proto-language appendices).
If the wording were strengthened just a bit ("Within the controversial Nostratic framework"; note the added word and added wikilink to more info), the text at Appendix:Proto-Indo-European/h₁er- would be fine, IMO, though like Tropylium I would find it preferable if someone created a page for the Proto-Dravidian root and moved the individual proposed-cognates there. I wouldn't require someone to create pages for roots before mentioning cognates, for pretty much the same reasons as I outline in my comment here that begins "people may feel comfortable noting..."
- -sche (discuss) 23:05, 7 February 2015 (UTC)[reply]

You're presuming that there is a Proto-Dravidian root, but no such thing has been established here. Nostraticists are not a reliable source on whether a given word in a language is inherited. It's entirely possible that Dravidian specialists instead etymologize those Telugu and Kannada words by some kind of derivation, loaning, semantic shift, etc. E.g. if we can trust the StarLing people to have correctly encoded Burrow & Emeneau's Dravidian Etymological Dictionary, Kannada ere 'black soil' is not connected to the Telugu words, and indeed completely isolated within Dravidian. I for one would first ask if an etymology from the homonymous ere 'dark color' is possible, before reaching all the way to Nostratic. --Tropylium (talk) 15:57, 13 February 2015 (UTC)[reply]

Nostraticists don't make up the reconstructions for protolanguages that they compare. If you look at, e.g., the last edition of Bomhard's dictionary, it has thousands of citations throughout, and the list of references alone is 300 pages. Burrow's dictionary has been available at the DSAL website for a decade now so it's easy to double check that: [1] - it's indeed connected. --Ivan Štambuk (talk) 00:57, 26 February 2015 (UTC)[reply]

OK, fair enough with the Dravidian words then. Although we may note that the original DED provides no reconstruction, and explicitly states it is an arrangement of data for later etymological study, not a Proto-Dravidian rootlist. I am also somewhat skeptical on using these kinds of positivist resources, but in the absence of any clear arguments against the comparison, e.g. if there are no well-vetted reconstructions of PD out there yet, I'll accept it.

And no, I am not saying that Nostraticists mostly pull their reconstructions from up their sleeve! But sometimes they do, typically when attempting to project isolated words backwards (a la taking a word previously reconstructed only for Proto-Indo-Iranian and asserting a PIE root behind it), and we should source our lower-level proto-term reconstructions from "local" specialists in the first place. Whether preferentially or exclusively is a different debate though. --Tropylium (talk) 20:49, 27 February 2015 (UTC)[reply]

Your opinion and acceptance don't really matter in the grand scheme of things. The fact of the matter is that we have a credible authority in the field making the connection, and that's enough. Opinions of editors are irrelevant, other than assessing the credibility of the sources themselves. We are just minions collecting knowledge and our personal prejudices or affinities shouldn't get in the way of that. The relevant questions are: 1) is the information worth adding in terms of relevance, and 2) is the source credible?

It's funny that you mention that projecting backwards thing - it's very common for "established" protolangs. Since all of the interesting stuff was done a century ago, researchers today are stuck with positing fanciful theories of protolang prehistory and making reconstructions on the flimsiest evidence. If you look at LIV and EIEC every other reconstruction has a question mark. Methodologically it's of course wrong, but there is always a possibility that a genuine PIE root was preserved only in one branch.

It would be nice to have a PII form instead, but unfortunately PII has not yet been adequately reconstructed in two centuries of scholarship. IEists seem to like to take shortcuts instead, skipping the middle step. Why blame Nostraticists for doing the same thing?

Note that I don't necessarily disagree with criticism, but if you apply the same scrutiny the entry *h₁er- itself should be deleted. It's at least as far-fetched as the Nostratic etymology thereof. --Ivan Štambuk (talk) 00:24, 28 February 2015 (UTC)[reply]

If we're going for the "just follow the credible sources" angle, I should hope for you to remember distinguishing between "a number of Dravidian words have been considered probably related to each other" and "the mentioned words come from a unique Proto-Dravidian root".

I would agree, yes, that *h₁er- is not exactly the most convincing proto-root out there; but it does have one major selling point over any given Nostratic root, namely being reconstructed for a proto-language that we at least know to have existed. --Tropylium (talk) 01:59, 28 February 2015 (UTC)[reply]

I agree with including Nostratic on -sche's terms. --Vahag (talk) 09:41, 8 February 2015 (UTC)[reply]

Nostratic, Altaic and other long-range etymologies have vibrant scholarly communities who publish books and peer-reviewed papers on them. There are even journals exclusively dedicated to them. The opposition usually comes from linguists who oppose it in principle, and are not against any long-range theory per se. In any case, it's not up to us to decide whether it's worthy of inclusion or not on the basis of whether the majority of linguists believe those theories to be true or not; the only thing that matters is the notability of the theories themselves. It seems to me that you're rather worried that poor readers would be mistakenly guided into believing that Nostratic is on the same level of credibility as PIE. Which is on one hand kind of ironic because the PIE reconstruction *h₁er- is itself dubious, just like two thirds of the entire PIE lexicon. In any case, should Nostratic be marked with some kind of extra-safe version of {{reconstructed}}, it would be fair that the same kind of scrutiny be applied to original research reconstructions done by CodeCat & co. --Ivan Štambuk (talk) 20:52, 8 February 2015 (UTC)[reply]

"It is important prononounce it with a long á, otherwise it will sound like…" edit

This kind of a disclaimer seems to have been added to several Hungarian words that are vowel-length minimal pairs, mostly by User:Panda10. E.g. kén, kérés, kint, mély, méz, vágy, vét. Some cases also warn readers about consonant length, e.g. arra, száll.

Is there a point to this? On one hand, I guess this is semi-useful for English speakers prone to ignoring diacritics; on the other, it seems arbitrary to mention just these kinds of minimal pairs, and not pairs involving e.g. s/sz. We are not a language-teaching resource, and so this does not seem to generalize into any kind of a useful policy. --Tropylium (talk) 09:38, 7 February 2015 (UTC)[reply]

There were other editors who complained about this before, so apparently it is not useful. Feel free to delete them when you see them. I will do the same. --Panda10 (talk) 12:31, 7 February 2015 (UTC)[reply]

Trivia sections in entries

WT:ELE says "Other sections with other trivia and observations may be added, either under the heading “Trivia” or some other suitably explanatory heading. Because of the unlimited range of possibilities, no formatting details can be provided." However, in practice, we haven't accepted ad-hoc section headings or random encyclopaedic factoids in years, and we've done away with ==Trivia== sections, too. (There were 22, out of our 3 950 000 entries, in the last dump, containing stuff like this.) I suggest removing the clause. - -sche (discuss) 06:07, 8 February 2015 (UTC)[reply]

On the one hand, spelling bee trivia is pretty clearly outside what belongs in a dictionary, as is stuff like this and this (which was falsely marked as a "Usage note"). On the other hand, all the other trivia seems to be along the lines of assessees#Trivia, 鬱#Trivia and scrootched#Trivia. If not exactly dictionary material, it is still at least information that pertains to the word itself. 死ぬ#Trivia in particular contains useful and interesting information, and I think we should certainly note somewhere that 死ぬ is the only ぬ verb in modern Japanese. What I think we need is just something a bit more structured than "You can add anything you like under any title you like in any format that you like". Smurrayinchester (talk) 12:07, 8 February 2015 (UTC)[reply]

I think they'd be more palatable if they weren't called "Trivia". We already have a "Usage notes" section; what about making an "Orthographic notes" section for information such as is provided in the sections linked to above? —Aɴɢʀ (talk) 13:06, 8 February 2015 (UTC)[reply]

That sounds a bit too formal and academic for what is essentially word games. Equinox ◑ 14:18, 8 February 2015 (UTC)[reply]

In my experience, things like the note at 死ぬ generally get shoehorned into the Usage notes section. And that's fine, in my opinion. I'd be hesitant to create an "Orthographic notes" section for just a tiny handful of entries. OTOH, perhaps we could move anagrams under that header, enclosed in a template similar to {{homophones}} (which would reduce how much space they take up), and then the section wouldn't be so useless/little-used? Alternatively, perhaps we could just have a ====Notes==== section, perhaps even replacing ====Usage notes====? But my preferred solution is to shoehorn the dozen-or-so useful Trivia sections under Usage notes. I mean, it's not wrong to call a note that 死ぬ is the only ぬ verb a "usage" note... - -sche (discuss) 18:42, 8 February 2015 (UTC)[reply]

I support a ===Notes=== header in ELE to replace both this trivia business and ===Usage notes===. —Μετάknowledge^{discuss/deeds} 20:50, 8 February 2015 (UTC)[reply]
This is the fr.wikt current practice (a Notes header). Lmaltier (talk) 20:53, 8 February 2015 (UTC)[reply]

I don't like "Notes" — too vague. It's like having an "Information" section. The whole entry is notes, or information, of various kinds. Equinox ◑ 21:01, 8 February 2015 (UTC)[reply]

I don't mind "Trivia", but it shows condescension. MW Online and some other dictionaries accommodate word games in their entries. Even in taxonomic names folks play word games (e.g. Iouea, Aa, Zyzzyzus).

How about "Miscellany", or a right-floating box placed so that it does not rise far above the ruled lines at the bottom of entries? DCDuring TALK 23:06, 8 February 2015 (UTC)[reply]

On inflections of extinct languages

Wiktionary has an interesting policy of only including Old Irish verb forms that are actually attested. Why is this? Sometimes I've wondered if a similar policy is appropriate for Ancient Greek as well, which has never seemed to have well-defined conjugations. Thoughts? ObsequiousNewt (ἔβαζα|ἐτλέλεσα)

I wouldn't call it a policy so much as my personal decision which no one has objected to. I made that choice for Old Irish because Old Irish verb forms are notoriously unpredictable. It is very hard, often impossible, to say what a given form of a given Old Irish verb will be unless it's attested. (Students of Old Irish are often left with the impression that all verbs in that language are irregular; that's an exaggeration, but only a small one.) For this reason, I thought it best if we don't even try to predict them, but merely list the attested forms. Ancient Greek, on the other hand, has comparatively well-behaved verbs: if you know the stem and the ending, you can glue them together to make the verb form. Even if that form is unattested, you can be quite certain that the predicted form is correct. Also, the Ancient Greek corpus is orders of magnitude larger than the Old Irish corpus, making it much more difficult to find out what is and isn't attested. —Aɴɢʀ (talk) 16:41, 10 February 2015 (UTC)[reply]

Wiktionary Culture

Some time ago I was told by email that my farewell message in the Beer Parlour had been greatly annotated and that there was a lot of support for what I was doing as a Wikipedia editor. At first I was inclined to ignore it but the lure of the Wiktionary's potential was such that I did look the Parlour item over. The experience was not encouraging as my position seems not to have been understood.
My resolve to discontinue editing Wikipedia was very definitely not because of what was done to my contributions. Rather it was because of how it was done, that is, because of the despicable rudeness of not even telling me what was being done but leaving me to discover it.
As a child of the Great Depression I was brought up to abhor rudeness: to avoid doing it and to shun people who do it. This was taught in the home and at both Sunday school and public school, and rudeness was severely punished in the latter, by strapping in the case of boys. Unfortunately the modern educational dogma of building self-esteem seems rather to build selfishness and thus encourage rudeness in many (see here).
So it's not surprising to me that I should be subjected to rudeness in the Wikipedia quasicommunity, but the lack of surprise doesn't make it less abhorrent to me. In a way it makes it more abhorrent because it is accompanied by a great sadness.
I know that modern culture is not my culture, but culture makes the person and is not easy to change, especially when you don't want to change it. When as a Wikipedia editor I would often come across modern sexual senses, with informal or transient attestation if any, I realised that this meant that the dominant culture of the Wikipedia editors was modern. Though I disagreed with the inclusion of such senses I left them alone. To undo them would have been rude, and with the predominant editorial culture I felt that raising such an issue in, say, the Beer Parlour would be a waste of time.
Incidentally, I'm not WF, as was suggested in the Beer Parlour. But I must confess that I got my early Wiktionary coding skills from being an editor under another name. However I left the Wiktionary alone for quite some time after being very rudely lambasted and only returned when a growing enthusiasm for its potential overcame my distaste for the editorial culture.
This message does not signify a return beyond a couple of items I will put below to convey my personal hopes for the Wiktionary. I will not log back in again and I will not read any emails coming from the Wiktionary editorial community.—ReidAA (talk) 03:45, 10 February 2015 (UTC)[reply]

Right, we'll stop posting "sexual" senses that you dislike, and await our strappings. Oh wait, my mistake, I meant to say "good riddance". There's nothing viler than somebody polluting a space with a big bitching rant topped off with the rotten cherry of "I won't be coming back though". If you're gone, don't post. Equinox ◑ 00:23, 12 February 2015 (UTC)[reply]

I can think of a great many viler things, even if I limit myself to behavior on online forums, and it's unclear to me if the above exaggeration is supposed to accomplish anything other than embellish how your disdain for this ex-editor runs deep indeed. --Tropylium (talk) 14:37, 13 February 2015 (UTC)

I found ReidAA's editing style pretty rude: he generally did not respond to user talk interactions in action, and went ahead as he saw fit regardless of disagreement. For an instance not involving me as the main actor, when Widsith asked him to stop switching or get consensus in October 2014, he continued regardless of the conversation. From this and my interactions with this user and from seeing his long-term pattern of behavior, I learned that he will need to be dealt with directly in the mainspace. And there is no riddance: User:Smuconlaw; last contribution: 15 February 2015. I find it pretty insolent to whine about how one is leaving the project and then go on editing under another user. I think I saw one more user used by the same person, but I cannot find it now. --Dan Polansky (talk) 20:31, 15 February 2015 (UTC) Let me strike out what is an inappropriate speculation, based on insufficient evidence; there is even some evidence to the contrary. --Dan Polansky (talk) 20:42, 15 February 2015 (UTC)

Er ... not sure what this is all about but I am not ReidAA. Smuconlaw (talk) 22:56, 15 February 2015 (UTC)[reply]

Suggestions for Examples

The following suggestions describe an approach to adding examples to the Wiktionary. The motivation for this approach is to exploit the possibilities for an online dictionary to use practically unlimited storage.
The potential for examples lies in their benefit for learners of English, either children or non-native speakers of English.
For such users of the Wiktionary, most of whom one would expect to not be logged on as an editor, any quotations would only be accessible by clicking on the individual quotation tags provided with each sense when quotes are available. Probably the option of seeing all quotes should only be offered to users who are logged on, though the ability to get at quotes for individual senses must be available to the learner whose curiosity about the word/sense background must be catered for, which is why quotations should be linked to sources and to online text where the context of the quote can be found (more on this).
Ideally (I would hope eventually) every sense would have a few examples. Being primarily for learners, the examples should be short phrases or sentences, each with a distinct context within a sense.
The user should be able to click on any example to hear it spoken. To be able to do this would be a tremendous help for learners. Maybe there should be options for learners to choose between male and female speech when a choice is available, and even for regional accents.
If the Wiktionary becomes popular for learning to speak English, the learner could maybe choose to have their pronunciation checked and corrected by it.
Another good option would be to have sign language used to back up the example. This might not be of all that much use to deaf people, but there is a school of thought that sign language should be taught to all students in their early education both for the mental benefits of being bilingual in this fashion but also because it is a distinct channel of communication to be used where speech is ineffectual, for example in noisy rooms or across long distances.
In traditional dictionaries, like 1913 Webster's, very brief quotes have been used as examples. This is because of the limitations of the printed page and bound volumes. There is some benefit and interest in using such quotes as examples, particularly for obsolete/archaic senses where an archaic pronunciation would be appropriate, but such examples should be stripped of all their context, except for the date, and an improved version of the example provided as a quotation.—ReidAA (talk) 04:05, 10 February 2015 (UTC)[reply]

Suggestions for Quotations

The following suggestions describe an approach to adding quotations to the Wiktionary. The motivation for this approach is to exploit the possibilities for an online dictionary to use practically unlimited storage and to link to a rich and rapidly increasing source of related online data.
Quotations (hereinafter "quotes") are in general of particular interest to two kinds of users.
Primarily they are of interest to experienced readers and writers wishing to discover more about a word or a phrase, or one of its senses, in particular about its early use and its more recent use.
Secondarily they are of interest to avid readers for whom the quote may spark an interest in a quote's author or source or context.
For neither of these kinds of Wiktionary users would an abundance of quotes be appropriate for any sense upon initial presentation. Rather a maximum of three or four should be presented directly and these should be spread over a variety of dates and contexts. Should there be more available then all should be stored separately and linked to as a store for that sense's quotes.
For neither of these kinds of Wiktionary users would very brief quotes be available for any sense; that's what examples are for.
For both of these kinds of Wiktionary users at least three links should be provided as well as a date: to the author(s), to the work, and to an online source of the work. The Wikipedia will often provide the first two links and Wikisource or Gutenberg or Google books the third.
It's very important that the works quoted from should be formally published and that the links used should be reliably very persistent.
The following suggestions describe an approach to adding quotes, at least while they remain relatively scarce. Although the quote will most often be for a word, it should be remembered that the Wiktionary also contains phrases, though not very thoroughly, and quotes for these should be added where there is a quote gap.
1. Choose a book to read in hard copy whose text is available online and preferably whose author is not yet quoted in the Wiktionary.
2. In reading the book make a note of any interesting word and its location, and check (then or later) that it is consistent with the online version. The online version will sometimes need editing, or might be of a different edition to the one you are reading.
3. Prepare an RQ template (with documentation) for the book and enter it into the (incomplete) Wikipedia table of quoted works.
4. Prepare a skeleton of the code to be used for adding the quote for the first word you have noted for use in the Wiktionary, using a text editor so that you can easily copy and paste the entry into the Wiktionary. The skeleton will be in two parts, the first using the RQ template filled out for the word of your choice, the second holding the chosen word and the surrounding text, say forty or fifty words, which can be copied easily from the online text. Do not highlight the occurrence(s) of your chosen word.
5. Paste a copy of your code quoting the chosen word into the appropriate sense in the Wiktionary and highlight the word wherever it occurs.
6. Now go through the quote word by word and check whether there is a quote gap for each word's sense. There are very very many such gaps, even for very common word senses. If you find a gap, fill it in the same manner that your chosen word was used.
7. The skeleton code can be modified for each of your chosen words so that the procedure above can be repeated for them.
Note that the benefit of using an RQ template for a book is not just to simplify the adding of multiple quotes from a single source, but also to allow all quotes from a single source to be upgraded, for instance when a better online source becomes available, simply by upgrading the template.
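The skeleton-and-highlight steps above can be sketched in a few lines of Python. This is only an illustration of the workflow, not an existing tool: the function name `build_quote` and the template name `RQ:Hypothetical Book` are made up, and real RQ templates usually take extra parameters (chapter, page and so on) that are omitted here.

```python
import re

def build_quote(template_name: str, passage: str, word: str) -> str:
    """Return a two-line quotation skeleton with `word` highlighted.

    Line 1 holds the (hypothetical) RQ template call; line 2 holds the
    passage copied from the online text.
    """
    # Bold every whole-word occurrence, ignoring case but keeping the
    # original capitalisation of the matched text.
    pattern = re.compile(r"\b(" + re.escape(word) + r")\b", re.IGNORECASE)
    highlighted = pattern.sub(r"'''\1'''", passage)
    return "#* {{" + template_name + "}}\n#*: " + highlighted


if __name__ == "__main__":
    passage = "The wind rose and the Wind fell."
    # The same unhighlighted passage is reused for each word (step 7).
    for chosen in ("wind", "rose"):
        print(build_quote("RQ:Hypothetical Book", passage, chosen))
```

Keeping the stored passage unhighlighted, as step 4 asks, is what lets the same skeleton be reused for every word in the quote.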
Another avenue for quote improvement in the Wiktionary is to focus on one of the sources used as a combination of example and quote in the original 1913 Webster's Dictionary. Often these are simply given with only a short author or source name which is explained here.
One way to improve one of these is on the one hand to simplify it as an example, with no source and only giving a date if the sense is archaic or obsolete; and on the other hand to expand and link it. Many of the sources for such quotes are already supplied with an RQ template (see [2]). — ReidAA (talk) 04:06, 10 February 2015 (UTC)[reply]

A great tool to have for quotations is some kind of app, where at the click of a button, one can add quotations to a corresponding WT entry, coming in from Wikisource, Google Books, or another compatible medium. I'd pay good money for that! --Type56op9 (talk) 15:40, 10 February 2015 (UTC)

Ask, and ye shall receive. Smurrayinchester (talk) 15:53, 10 February 2015 (UTC)[reply]

Wow, that is an awesome gadget. It should be linked from Wiktionary:Quotations! --Type56op9 (talk) 11:46, 12 February 2015 (UTC)[reply]

Cool gadgets

After hearing recently about WT:QQ, a cool quotations gadget, I found myself wondering if we had some other cool gadgets that maybe some users don't know about. So, in hope of some civility, I think it would be appreciated if some other users mentioned here some cool Wiktionary gadgets that may not be known to all the communities, as a way of helping each other out. --Type56op9 (talk) 11:08, 15 February 2015 (UTC)[reply]

I'll start: WT:ACCEL is a nice gadget to quickly and semi-automatically create forms of words (plurals, conjugations, feminine forms etc.) in various languages. --Type56op9 (talk) 11:10, 15 February 2015 (UTC)[reply]

I am already working on that. here--Dixtosa (talk) 11:18, 15 February 2015 (UTC)[reply]

Let's categorize semantic loans

somehow.

I think it is interesting.--Dixtosa (talk) 15:31, 15 February 2015 (UTC)[reply]

I have seen "semantic calque" being used synonymously with "semantic loan". We can assume semantic loans are a subtype of calques and include them in Category:Calques by language. --Vahag (talk) 17:22, 15 February 2015 (UTC)[reply]

There are also "phono-semantic" loanwords (sometimes adding new, funny senses), such as 馬殺雞／马杀鸡 (mǎshājī). --Anatoli T. (обсудить/вклад) 00:45, 5 March 2015 (UTC)

WT:WE length

It was my impression based on this discussion that we were supposed to be slowly shortening the list on WT:WE. But given this diff, that is becoming very difficult. I like helping out with it, but it is slowly filling up with words I'm not able to add or that are so obscure that I can't find anything about them. May I request some aid shortening the list or removing unattestable entries? —John C5 22:44, 18 February 2015 (UTC)

I've moved some apparent Translinguals out of there and -sche removed some blue links. You could move some of the non-English items that you don't know to the various WT:RE:lang pages, eg WT:RE:he. DCDuring TALK 03:33, 19 February 2015 (UTC)[reply]

Pitjantjatjara case marking

In Pitjantjatjara, the ergative case is indicated by the ending -ngku: watingku yuu palyaṉu / man-ERG windbreak make-PAST / The man made a windbreak. Straightforward enough; indeed, I created an experimental entry at watingku. However, case marking in this language strikes me as rather odd: the case ending only attaches to the last word of a noun phrase: wati ninti tjuṯangku yuu palyaṉu / man wise many-ERG windbreak make-PAST / The wise men made a windbreak.
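As a rough illustration of the phrase-final marking described above, here is a small Python sketch. It assumes, for simplicity, that the ending is always -ngku, which is the only form cited in this post; other stem shapes may well take a different form, and the function name `mark_ergative` is made up for the example.

```python
ERGATIVE = "ngku"  # the only ending cited in the post (assumed invariant here)

def mark_ergative(noun_phrase: list[str]) -> str:
    """Attach the ergative ending to the last word of the noun phrase only."""
    if not noun_phrase:
        raise ValueError("noun phrase must contain at least one word")
    *head, last = noun_phrase
    return " ".join(head + [last + ERGATIVE])


if __name__ == "__main__":
    print(mark_ergative(["wati"]))                    # single-word phrase
    print(mark_ergative(["wati", "ninti", "tjuṯa"]))  # ending lands on the last word
```

The point of the sketch is that the ending is a property of the whole phrase, not of the noun wati itself, which is what raises the clitic question.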

Could -ngku be considered a clitic? Is it appropriate to create entries of the type watingku, given this situation? Given that a lot of nouns (wati being a prime example) are frequently found in their inflected form about as often as not, I worry that we would be doing our readers a disservice by not creating entries for these forms. This, that and the other (talk) 12:32, 20 February 2015 (UTC)

New constructed languages

Why can't we include all constructed languages in Wiktionary, including Idiom Neutral?

Or my own constructed language, Sintelsk, at least somewhere in Wiktionary's appendix? My constructed language, which I'm writing about at my own Wiktionary that I created and run for fun: http://wikitoslav.monathevampirewiki.org/wiki/Wikitoslav:Frumpsida . It is a constructed language, mainly based on Danish, that has a precise pronunciation system, and looks more straightforward than most Germanic languages.

If you're interested in considering my advice about my constructed language at least, you may find these categories interesting: English lexicon, Danish lexicon, Spanish lexicon, French lexicon, Lexicon for Sintelsk itself, which includes definitions in Sintelsk and translations of its words into other languages, just like all other Wiktionaries. I built up my Wiktionary to make categories by using lots of templates. Also, my longest page on the wiki is "on", which is a Sintelsk word meaning one, a, or an.

Also, let's consider definitely including Idiom Neutral into Wiktionary. NativeCat (drop by and say Hi!) 06:04, 21 February 2015 (UTC)

Minor constructed languages can be included in the appendix namespace (for example Appendix:Sindarin). — Ungoliant (falai) 16:27, 21 February 2015 (UTC)

...within certain parameters. Most importantly, if the language is copyrighted (which many constructed languages are), we can't include too much of it or we're violating copyright; see Wiktionary:Beer parlour/2014/July#Inclusion_of_Dothraki. Secondly, if the language has no community of users, it's doubtful whether or not it should be included; many people have opposed including minor constructed languages that have no users. Lastly, if you made the language up yourself, it's a bunch of protologisms not suitable for inclusion except in your userspace. (As long as you're also making useful edits to Wiktionary, and as long as no copyright issues arise, people shouldn't complain about it if you put it in your userspace. Note that if you made the language up yourself, and copyrighted it, but you then publish it on Wiktionary, then per the disclaimer at the bottom of every edit window, "you irrevocably agree to release your contribution under the CC-BY-SA 3.0 License and the GFDL", which you might or might not want to do.) Disclaimer: I am not a lawyer, but we have lawyers here, and they weighed in on the discussion I linked to above. - -sche (discuss) 17:55, 21 February 2015 (UTC)[reply]

Manual adding of audio files

Hello. User:DerbethBot/February 2015 contains a list of audio files (and matching Wiktionary entries) that my bot was unable to add automatically - in most cases due to multiple etymologies (human needs to decide where an audio file belongs). Currently there are 675 audio files that can be immediately used to enrich entries in 30 languages. If you want to help, please check the page and remove entries that are done. --Derbeth ^talk 18:24, 21 February 2015 (UTC)[reply]
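To make the "multiple etymologies" case concrete, here is a hypothetical check of the kind a bot might run on one language section before placing an audio file. It is not DerbethBot's actual code; the function name and the reliance on numbered "Etymology N" headers are assumptions made for illustration.

```python
import re

# Matches numbered etymology headers such as "===Etymology 2===".
ETYMOLOGY_HEADER = re.compile(r"^=+\s*Etymology \d+\s*=+\s*$", re.MULTILINE)

def needs_human_review(language_section: str) -> bool:
    """Return True when the section has more than one numbered etymology,
    so a person has to decide which one the audio file belongs to."""
    return len(ETYMOLOGY_HEADER.findall(language_section)) > 1


if __name__ == "__main__":
    section = "===Etymology 1===\n====Noun====\n\n===Etymology 2===\n====Verb====\n"
    print(needs_human_review(section))
```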

Subcategory for 'nyms?

Would it be acceptable to set up a class of subcategory of "Category:<language> names" for ethnonyms (endonyms, etc.)? The names category has the boilerplate description "<language> terms that are used to refer to specific individuals or groups," but it seems this category generally just has 2 subcats, for given names and surnames - no problem there but wondering if the intent of the names categories was to accommodate wider purpose. Or whether an ethnonyms category would go elsewhere. Or if this topic has previously been considered and ruled out. Aside from thinking this would be of general cross-language interest, I'm personally interested in possibly adding various Chinese ethnonyms, using a category to facilitate access to them as a sub-lexicon. I'm no expert on Chinese, but have noted with interest variant forms in hanzi for African ethnic groups. TIA for any feedback.--A12n (talk) 16:39, 23 February 2015 (UTC)[reply]

Simplification of topic categories adding

As the creator of {{zh-cat}}, I propose to generalize this template to {{cat}} and use it to add the topic categories. The syntax would be {{cat|en|CATEGORY_ONE|CATEGORY_TWO}}. I do not believe that there would be technical problems in creating the template, so I am only putting this here for consensus and discussion. --kc_kennylau (talk) 09:12, 24 February 2015 (UTC)[reply]
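For readers unfamiliar with the idea, the proposed call could expand roughly as in the Python sketch below. This only models the intended output, assuming the usual `[[Category:<lang>:<topic>]]` naming; the real template would be written in wikitext or Lua, and the function name `expand_cat` is made up.

```python
def expand_cat(call: str) -> list[str]:
    """Expand a call like {{cat|en|A|B}} into one category link per topic."""
    inner = call.strip()
    if not (inner.startswith("{{cat|") and inner.endswith("}}")):
        raise ValueError("expected a call of the form {{cat|lang|TOPIC|...}}")
    # Drop the braces, then split on pipes: ["cat", lang, topic1, topic2, ...]
    parts = inner[2:-2].split("|")
    if len(parts) < 3:
        raise ValueError("need a language code and at least one category")
    lang, topics = parts[1], parts[2:]
    return [f"[[Category:{lang}:{topic}]]" for topic in topics]


if __name__ == "__main__":
    for link in expand_cat("{{cat|en|Mammals|Felids}}"):
        print(link)
```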

See {{catlangcode}}. Chuck Entz (talk) 13:25, 24 February 2015 (UTC)[reply]

@Chuck Entz Oh, then I propose the automation of it and the name changing :) --kc_kennylau (talk) 16:28, 24 February 2015 (UTC)[reply]

Re automation (if I understand correctly that you mean a bot to go through and change existing cats to the new template): What would happen then if one wanted to keep specific categories with associated etymologies or word senses within a section for a particular language?--A12n (talk) 18:02, 26 February 2015 (UTC)[reply]

Support the simplification. I can also see benefits for languages requiring a non-default sorting order. Module:zh-cat does it already for Chinese - sorting by radicals. Ideally, Japanese would use a similar approach to sort by hiragana. That way, entries won't require code like this: [[Category:ja:Mammals|しし]] or [[Category:ja:Mammals|ひいばあ']] for kanji or katakana entries, e.g. in 獅子 or ビーバー. --Anatoli T. (обсудить/вклад) 00:41, 5 March 2015 (UTC)
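A minimal sketch of automatic hiragana sort keys, written in Python for illustration only. It covers just the katakana-to-hiragana step using the fixed offset between the two Unicode blocks; the hand-written key ひいばあ' above suggests further normalisation (for example of the long-vowel mark), which this sketch does not attempt. Kanji entries such as 獅子 would still need a reading supplied from somewhere. Both function names are made up.

```python
def katakana_to_hiragana(text: str) -> str:
    """Map katakana letters to hiragana; leave everything else unchanged."""
    out = []
    for ch in text:
        code = ord(ch)
        # Katakana ァ..ヶ (U+30A1..U+30F6) sit 0x60 above hiragana ぁ..ゖ.
        if 0x30A1 <= code <= 0x30F6:
            out.append(chr(code - 0x60))
        else:
            out.append(ch)
    return "".join(out)

def category_link(lang: str, topic: str, sort_key: str) -> str:
    """Build a category link with an explicit sort key."""
    return f"[[Category:{lang}:{topic}|{sort_key}]]"


if __name__ == "__main__":
    key = katakana_to_hiragana("ビーバー")
    print(category_link("ja", "Mammals", key))
```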

Category:Historical terms by language vs. Category:Terms with historical senses by language

There seems to be a category scheme renaming in progress here that has not been quite completed. Apparently the latter is where most things go these days. Is there any particular reason the former is still kept around as well, just for three Chinese, three English, two Spanish, one French and one Latvian term? Or is it just waiting for deletion once the articles have been edited to be in the latter category branch instead?

The only previous discussion I can find on this is a brief exchange from June 2011: English terms with obsolete senses, etc. --Tropylium (talk) 19:39, 27 February 2015 (UTC)[reply]