Wiktionary talk:Votes/2012-03/CFI for Endangered Languages

Material uploaded to Usenet

Anyone can post on Usenet. I am not convinced this is a good idea. On the other hand, I do see a need for allowing non-print sources, but I'm not sure how it should be laid out. -- Liliana 18:17, 23 March 2012 (UTC)

Then again, anyone can publish a book (even though it's more work). Equinox 18:24, 23 March 2012 (UTC)
Frankly, I envision people typing up their notes in Word or Google Docs, outputting to PDF and uploading them to Usenet. That's actually more formal than many of the notebooks of linguists and anthropologists that are studied today to recover words.
I agree that this loose requirement seems to open a Pandora's box of abuses. My hope is that the restriction to endangered languages will be adequate to avoid widespread abuse. BenjaminBarrett12 (talk) 19:23, 23 March 2012 (UTC)
However, we already allow Usenet. I'd rather remove that line; we already know that we can use Usenet, because CFI already says so. Removing it prevents future questions; if we later ban Usenet or permit the Web Archive, we won't have to struggle with why this part explicitly calls out Usenet. --Prosfilaes (talk) 00:00, 24 March 2012 (UTC)
That makes sense. It brings up the question of whether endangered and extinct languages should be conflated. I'll start a new section. BenjaminBarrett12 (talk) 04:47, 24 March 2012 (UTC)

Contemporaneous sources for extinct/endangered languages

One of the criteria for attestation in the WT:CFI is: "For terms in extinct languages: usage in at least one contemporaneous source." I was hoping to leave that alone, but it is closely related to the endangered language issue, so perhaps it is best to consider expanding it to include both.

Many "endangered languages" can also be classified as extinct. Makah, a language closely related to Ditidaht, is listed on the UNESCO Interactive Atlas of the World’s Languages in Danger as endangered, but a few years back, the last two truly native speakers of Makah passed away. Even though people continue to use and learn Makah, it can therefore be referred to as extinct (see w:Extinct_language; although the Wikipedia article does not specifically mention "native speakers," some use that as a criterion).

The issue with the extinct language criterion is the word "contemporaneous." I'm not sure why that is used, but my guess is that the intention is to address languages like Latin, so that only Latin words in use at the time Latin was alive would be used (i.e., not Neo-Latin). (Some people draw a distinction between dead languages like Latin that are still in use for science, etc., and extinct languages like Yana that have completely disappeared even if some written records remain.)

If "contemporaneous" can be replaced with something like "contextually accurate," then "extinct" can be widened to include "endangered languages." I'm not sure that "contextually accurate" is the right term, though. BenjaminBarrett12 (talk) 05:05, 24 March 2012 (UTC)

Non-endangered languages needing similar treatment

I fully support this proposal, and I think there are some languages that are not in danger of going extinct, but are regarded as sociolects or slangs and thus are almost exclusively spoken. For example, I don't think that citing terms in Tok Pisin would be very easy, even though a few million people speak it, because it is a pidgin. Except in dictionaries, a lot of words would be a struggle to find online. Would anyone have ideas for criteria to determine what languages would be covered by this, or be interested in supporting this? --Μετάknowledgediscuss/deeds 21:35, 24 March 2012 (UTC)

Tok Pisin has newspapers and a (written) grammar according to the Ethnologue, but your point is still valid. (It may also be that the written records available for Tok Pisin are far too sparse to provide adequate attestation for the language.) This point also applies to sign languages, for which there appears to be a double standard.
The CFI for sign languages says: '(CFI) is considered to be met by any sign that is used by multiple independent deaf communities, and the "usage in permanently recorded media" condition includes any video media that has been widely distributed, including DVDs, broadcast television, and sign language dictionaries.'
In the discussion in the beer parlor preceding this vote proposal, Prosfilaes said, 'I'd almost say that clearly widespread use is not for any word where we can't wave at Google Books or Usenet and go look, "a metric assload of cites", which includes all the words from most languages.' I believe I have seen comments from others to the same effect.
If the sign language CFI is correct, then a native speaker of an endangered language can proclaim word X to be widespread and, as long as there is no challenge, it should stand. This seems reasonable for endangered languages. If Prosfilaes is correct, however, then I think the sign language CFI needs revisiting. BenjaminBarrett12 (talk) 22:38, 24 March 2012 (UTC)
The sign-language CFI makes a specific exception to the general CFI... that's not really a double standard. Other endangered languages must meet the general CFI, unless and until we create exceptions for them. - -sche (discuss) 22:43, 24 March 2012 (UTC)
You're right. Thank you. The word "considered" is what's critical in the sentence I quoted. BenjaminBarrett12 (talk) 23:08, 24 March 2012 (UTC)
The way I see it, we have already solved the sign language problem at Wiktionary (although orthography is still a mess, in my opinion). That still leaves languages like Fiji Hindi in a strange place: if we sent all our Fiji Hindi terms to RFV, many of them would end up deleted, even though they are legitimate, because only a couple of newspapers, one or two non-dictionary books, and a handful of Bible fragments exist written in that language. --Μετάknowledgediscuss/deeds 15:08, 25 March 2012 (UTC)
Thank you for the concrete example. My inclination is to lump endangered and extinct languages together (defining them later on the CFI page or creating a new page), and then add a provision that says exceptions can be made for languages like Fiji Hindi by vote. BenjaminBarrett12 (talk) 16:35, 25 March 2012 (UTC)

Wording of the clause

I'm a fan of standardisation, so I've modified the text of the vote to use "terms" rather than "a term" (etc), to match the plurality of the extinct languages clause. If we wanted to match the wording of the latter clause even more closely, we could use wording like this. - -sche (discuss) 22:14, 24 March 2012 (UTC)

Proposal for new wording

I appreciate all the guidance and conversation so far. Taking into consideration the conversation and the goals of trying to be welcoming to all languages while maintaining a high level of quality for Wiktionary, I think a rewrite of this proposal is needed. Perhaps a complete new vote page is needed as well; I'm not sure.

The requirement of one contemporaneous usage for extinct languages was discussed at Wiktionary_talk:Votes/pl-2011-05/Attestation_of_extinct_languages_2, but there was not a lot of discussion. It seems to me that there was not a lot of concern about Latin terms being contemporaneous because anything not contemporaneous would not be Latin (it would be Neo-Latin, medieval Latin, etc.).

Below is the new proposal. I've included the part about sign languages from the current CFI just because it seems to fit, but perhaps keeping it separate is best.

Proposal

Replace

"For terms in extinct languages: usage in at least one contemporaneous source."

with

"For terms in languages with sparse documentation, usage in at least one source or as agreed by the community according to the language."

Under "Languages to include," add the following section:

Languages with sparse documentation

The criterion for inclusion for languages that are extinct, endangered or without a strong written tradition is generally a contextually appropriate citation in at least one source. An endangered language is one listed by an institution such as the UNESCO Interactive Atlas of the World’s Languages in Danger or the Living Tongues Institute for Endangered Languages, or a dialect of those languages.

In cases where the requirement of at least one source is difficult to meet for a language, a vote may be held to grant an exemption from the criteria for inclusion. For languages granted exemption, the community of that language's speakers will determine criteria for inclusion and other policies for that language. Exemptions may also be rescinded through a vote.

Terms in signed languages are acceptable as entries, and should be entered as described in the policy document Wiktionary:About sign languages.

BenjaminBarrett12 (talk) 08:58, 27 March 2012 (UTC)

Much better, and maybe this should be a separate sister vote that supersedes this vote if both pass. One problem: there may be no way to get the "community of that language's speakers" to figure out how to handle a language (there may be no such community on English Wiktionary, and no Wiktionary in that language). I think we should change that line from being about giving a nebulous "exemption" to being about granting those languages the right to have a single citation and still meet CFI. --Μετάknowledgediscuss/deeds 16:57, 27 March 2012 (UTC)
Glad it looks better. My intention is to use this, not the one on the main page. Either I can replace the wording or delete this vote page and create a new one. The community would start out on their "About" page and expand if necessary; the community would be the only people qualified to develop the guidelines necessary for that language. The exemption is intended for languages that cannot meet the one usage requirement. The idea is that the vote for the exemption would pass only if it's demonstrated that the usage requirement cannot be met. BenjaminBarrett12 (talk) 17:52, 27 March 2012 (UTC)
I don't like the part about voting to exempt languages. We can vote on anything whether or not this vote passes, but I strongly oppose anything that would turn Wiktionary into a primary source instead of a secondary source.--Prosfilaes (talk) 02:17, 28 March 2012 (UTC)
Wiktionary is already a primary source for sign languages, which have an extremely limited amount of documentation that can be cited, so this proposal does nothing new. It builds on the sign language precedent but only in cases where a language has a demonstrable lack of documentation in a similar manner to sign languages. The rescission (cancellation) part is to provide a mechanism in case an exempted language (including sign languages) veers from a reasonable path. I struggled a great deal to come up with a middle path that would address the concerns on both sides of this issue, and welcome any alternative ideas to address the problem of sparsely documented languages. I see this proposal not only as a way to welcome other languages but to fulfill the stated aim on the Wiktionary:Main_Page "to describe all words of all languages using definitions and descriptions in English." BenjaminBarrett12 (talk) 03:09, 28 March 2012 (UTC)
Re "Wiktionary is already a primary source for sign languages": not quite. Wiktionary is still a secondary source for sign languages; the exception we make for sign languages is that we allow other secondary sources (dictionaries, instructional videos which only mention rather than use terms, etc) to count as proof of the word, rather than requiring people to cite primary documents (books, newspapers, Usenet, etc). The proposed wording above would seem to allow a community of people to claim to have knowledge of the language and verify a term by say-so. I think the proposal at the top of this page, to have speakers publish things on Usenet (durably archived), and accept 1 rather than 3 Usenet post(s) (or book(s), etc), is likely to garner more support. - -sche (discuss) 04:52, 28 March 2012 (UTC)
So, re changing the wording: perhaps drop the second half of the above rule, leaving "For terms in languages with sparse documentation, usage in at least one source." Keep the paragraph "The criterion [] or a dialect of those languages." And omit "In cases [] a vote." - -sche (discuss) 05:04, 28 March 2012 (UTC)
Maybe I'm misunderstanding something, but my reading of Wiktionary:About_sign_languages is that Wiktionary is indeed a primary source for sign languages. It says: 'Unlike spoken languages, sign languages are rarely written outside of reference materials and academic publications. Thus, the "clearly widespread use" condition of Wiktionary:Criteria for inclusion (CFI) is considered to be met by any sign that is used by multiple independent deaf communities, and the "usage in permanently recorded media" condition includes any video media that has been widely distributed, including DVDs, broadcast television, and sign language dictionaries.' BenjaminBarrett12 (talk) 05:54, 28 March 2012 (UTC)
Ah, I think you're interpreting the first part (the "clearly widespread use" condition [] is [] met by any sign that is used by multiple independent deaf communities) as allowing Wiktionary's deaf community to simply assert that a sign is used, without proof. I and others in other discussion fora, however, interpret "clearly widespread use" as a way of passing something by noting the existence of proof (on Usenet, in a Google Book, etc) without having to format the proof and put it into the entry. If someone claims "this term is clearly in widespread use" or "this is in use in my deaf community and in other deaf communities", but other editors can't find or cite any (proof of) such use (in primary sources, in secondary sign-language dictionaries, even ones not available online, etc), then the term isn't clearly in use, and we've RFV-failed supposedly "widespread" but unciteable entries before. (I failed UGG靴子 because I could find much non-durable but no durable proof of it.) - -sche (discuss) 19:38, 29 March 2012 (UTC)
I understand that the "clearly widespread use" clause is interpreted that way in general. The wording is vague, but the more specific interpretation given to it on Wiktionary is very confusing. To try to understand it, I have referred to it many, many times over the last couple of weeks since I re-joined. I still struggle to understand its purpose.
But with sign languages, Wiktionary:Votes/pl-2008-08/Wiktionary:About_sign_languages says "relaxes WT:CFI for these rarely written languages." Like the general CFI, the sign language CFI is written vaguely (Wiktionary:About_sign_languages#Criteria_for_inclusion), and it appears to allow inclusion by consensus as well as by verification in recorded media. BenjaminBarrett12 (talk) 20:38, 29 March 2012 (UTC)
I suspect that for any sign language spoken in the western world, there are tens of thousands of hours of footage that can be used as documentation. And the issues that make recording sign language hard don't apply to spoken languages. In any case, slippery slope is not a valid argument; that we've done it once (which -sche contests) doesn't mean we have to do it again.
I stand firmly and implacably against any proposal that lets people add material without citations. I believe it will produce a few dissonant entries in arbitrary orthographies that no one will ever use; and at the same time, it will produce even more entries that people make up.--Prosfilaes (talk) 05:00, 28 March 2012 (UTC)
I believe whether a language is granted an exemption is something that should be determined based on the evidence at hand for that specific language. I can certainly imagine someone who speaks a sparsely documented language having a smartphone but not a scanner and wanting to preserve the language they speak in that community.
As I've noted in my comment above, the sign language community seems very much to have such a policy, and this merely builds on that precedent. I'm open to an alternative for people who do not have written documentation or scanners. Does anybody have one? BenjaminBarrett12 (talk) 05:54, 28 March 2012 (UTC)
I think it's a bad precedent, and will vote against anything building on it. I see absolutely no point in us admitting languages that have no permanently archived sources; the only people who would care are linguists, and they probably have their own sources and would never trust Wiktionary.--Prosfilaes (talk) 09:41, 28 March 2012 (UTC)
I'm a linguist, so I disagree about not trusting Wiktionary. My hope is, indeed, that linguists specializing in endangered languages will welcome Wiktionary with open arms once a good framework is in place.
Nobody has commented about changing the wording on the main page, so I'm about ready to modify the language there to reflect the discussion here. I believe that without the voting part of this proposal, any language community will be able to do the same thing the sign language community did and proclaim "clearly widespread use" as a main CFI. If nobody has a concern about that, I will go ahead. BenjaminBarrett12 (talk) 18:00, 28 March 2012 (UTC)
Why, as a linguist, would you use an anonymously edited site on matters where you have no independent verification if you had a choice?
Part of the issue is perhaps that I'm not sure what you mean by "linguist." To me, it primarily means a person who studies or researches linguistics. Therefore, Wiktionary is itself a topic for linguistics study. However, this is veering off-topic. If you are interested in my views as a linguist, you are welcome to post on my page. BenjaminBarrett12 (talk) 20:38, 29 March 2012 (UTC)
No language has the right to override CFI by fiat. There was a vote about sign languages. I'll note that I'd be hard pressed to accept any claim of "clearly widespread use" from a language with 8 speakers. With the major sign languages, at least we could check YouTube or the Internet Archive, permanently archived or not, and look for and at usage. (Stuff like http://archive.org/details/our_solar_system_sign_language even purports to come from NASA, and could be uploaded to Commons as verification.)--Prosfilaes (talk) 10:49, 29 March 2012 (UTC)
Thank you for finding that :) So it seems that they did what I was proposing to allow in the CFI: Hold a vote. As nobody besides Prosfilaes has commented one way or the other, I will modify the language, stripping out the voting provision :) BenjaminBarrett12 (talk) 19:01, 29 March 2012 (UTC)

Language changed

I've updated the language to reflect -sche's suggestions and the general discussion. The voting is scheduled to start in about two hours, I think. I hope this language works for everyone, and my thanks to everyone who has provided input and feedback.

Do I need to advertise this in the beer parlour again? BenjaminBarrett12 (talk) 21:24, 29 March 2012 (UTC)

Don't you think the start of the vote should have been postponed? To me, the timing was almost too close to review everything, and obviously the proposal is still under discussion. -- Liliana 03:11, 30 March 2012 (UTC)
That's fine. I thought the issues had been ironed out. Can you advise me how to postpone or go ahead and do it? It seems that the content of the proposal is being misunderstood, too. BenjaminBarrett12 (talk) 03:14, 30 March 2012 (UTC)
Well now that the vote has started, it's a bit late to do that. I think. -- Liliana 03:20, 30 March 2012 (UTC)
Sorry, I couldn't find instructions on how to postpone, so I went ahead and started it. Can you tell me what to do for future reference? BenjaminBarrett12 (talk) 03:23, 30 March 2012 (UTC)
Change the date that it is set to start (to something a few days in the future), like this. - -sche (discuss) 03:30, 30 March 2012 (UTC)
Okay, thank you! BenjaminBarrett12 (talk) 03:33, 30 March 2012 (UTC)