Wiktionary:Beer parlour/2018/June

Labelling of bound morphemes

After a little talk with WF at the talk page of Spanish -dumbre, we've drawn the conclusion that our current "dated/archaic/obsolete" labelling scheme doesn't work too well for suffixes (and probably other affixes too). Why?

-dumbre isn't used nowadays to coin new words. In fact, it seems it's only been used twice in Spanish proper: in pesadumbre and podredumbre; all other words are inherited from Latin words ending in -tūdo, -tūdinis.^[1] In other words, it's unproductive/non-productive, as opposed to a suffix such as -ción;
however, as it's found in a few words that are still in current use (certidumbre, muchedumbre and others), saying it's "obsolete"/"extinct" is an unfortunate choice of words.

For that reason, I think it would be better to use the labels (productive/non-productive) when speaking of affixes. In fact I think we already do that in a few places, but it would be good to codify the practice.

That's not going to solve all our problems, however: there's even some disagreement on the productivity of -tion at Wiktionary:Tea room/2017/November § -tion.

Another thing: maybe the (obsolete) label doesn't have to be relinquished completely: if there are no words in current use using a certain suffix, then that suffix can be said to be both "non-productive" and "obsolete"/"extinct"? --Per utramque cavernam 15:58, 31 May 2018 (UTC)

I always label (no longer productive) in such cases. Ƿidsiþ 13:38, 1 June 2018 (UTC)

Yes, for an affix that is no longer productive but which is found on words which are still used, I think "no longer productive" is better than "obsolete". An affix could also be obsolete, like you say, or archaic (perhaps if it's an alternative spelling of another affix, and sounds archaic: -iren isn't the best example because it's apparently outright obsolete, but it's the vein of thing I'm thinking of ... a- -ing says it's an example). I don't know if it's necessary to specifically label productive affixes or if it's implied by the absence of a "not productive" label. - -sche (discuss) 16:22, 1 June 2018 (UTC)

Footnotes

^ By the way, would it be correct to speak of a Spanish -dumbre suffix if there had been no word formed with it in Spanish proper, i.e. if all words "using" it had been inherited from Latin?

Lemmatisation of valent adjectives - preposition in entry title?

Equinox and I have had a little talk at Talk:sweet on, and we both agree that sweet on and sweet upon shouldn't have separate entries, but only get a sense at sweet.
Similarly, I've created big of, but after talking about it with Kiwima I decided to turn it into a simple redirect to big (“mature; generous”) (with senseid).
dependent on is a red link; prone to and keen on (hard on too, but whoops, that's about something else), however, get their own entries.

I think it would be better to turn prepositionful titles into redirects, and work only with prepositionless titles. We can then use a label/template there; in fact, we already do that for verbs: the first sense of French succéder is tagged with {{indtr|fr|à}} → (transitive with à), a template similar to {{label}}.

At Template_talk:indtr#Adjectives, Ungoliant suggests we do that with a usex-level template instead. Though I don't know what it would look like, I think I'd be fine with that too.

But in the end, my main concern is consistency; let's either move them all to prepositionless titles or to prepositionful titles. And even more importantly, let's have a way of keeping track of those (Category:English transitive adjectives using on, etc.?).

Thoughts? @-sche, DCDuring? --Per utramque cavernam 21:03, 31 May 2018 (UTC)

@Equinox, Kiwima, Ungoliant MMDCCLXIV, -sche, DCDuring. --Per utramque cavernam 21:04, 31 May 2018 (UTC)

This issue also comes up with verbs: will someone who sees I foobared at/on/to/for/of the widgets really think to look for our entry "foobar at" if they find that we have an entry "foobar" (which simply fails to link to "foobar at" except possibly as a derived term)? So (momentarily speaking orthogonally to your point,) we should probably consistently use an {{only in}}-like template to link from every verb or adjective foobar to every foobar on/at/ etc that we keep a separate entry for. But many foobar at titles could just be turned into redirects, as you say, and I think I'd prefer that (as the probably more-intuitive-to-use and also easier-to-maintain system) to moving things to prepositionful entry titles. We do sometimes use labels to indicate that, in a particular sense, a word is used "with the" or "with on" or "with for", etc. - -sche (discuss) 21:22, 31 May 2018 (UTC)

Phrasal-verb definitions are more difficult than adjective definitions that have as complements phrases headed by certain prepositions.

I still bear bruises from having been beaten about the head and shoulders for saying that not all our entries for "phrasal verbs" were for authentic phrasal verbs and that not all of our definitions in our entries for authentic phrasal verbs were non-SoP.

It is not easy to decide whether a given verb + particle pair constitutes a phrasal verb: purveyors of dictionaries of English phrasal verbs and those who've built a career on them find them everywhere; others not so much. It seems to me that there are 'true' phrasal verbs, with definitions that are related to the bare verb principally etymologically. For example:

have at, have (someone) on, and have up

These have(!) almost no connection with any intuitively obvious definition of have. Moreover, phrasal verb definitions at [[have]] are very likely to be lost among all the other definitions we have there.

As we often have bastard English entries to serve as translation targets, part of a rationale for having phrasal verb entries could be that the phrasal verbs are often more common in speech than their Latin- or French-derived single-word synonyms. OTOH, as -sche writes, many language learners might not know enough about English to look up 'verb + particle' rather than 'verb'. We don't often make our decisions about inclusion etc on normal-user-behavior grounds rather than, say, syntactic grounds, but perhaps we should do so more often.

I doubt that we can formulate decision rules that would work in all cases. I don't doubt that we can come up with templates that will often be misapplied.

As a rule, it seems to me that we don't use hard redirects from common SoP phrases to the appropriate definition of the key noun, verb, or adjective in the SoP phrase. But, also as a rule, I am inclined to follow the 'lemming' heuristic: if other dictionaries and glossaries have a real entry (not a redirect) for a term, we should too. DCDuring (talk) 22:23, 31 May 2018 (UTC)

Thoughts on the adjectives: prone to, sweet on, keen on, etc. aren't adjectives, IMO, and can't be broken down into one POS; better entries (if these were to stay and not be moved to the bare adj.) would be be prone to, be sweet on, be keen on as transitive verbs. That being said, I think only be sweet on should be an entry, and prone and keen should just be senses with lil labels b/c they're not really idiomatic. Redirect prone to to prone (because AFAIK to is the only prep. used with prone), but in keen's case, b/c it can be used just as easily with to and (according to a few Google searches, maybe with), no redirect. Sidenote: is have on (“to be wearing”) really idiomatic? – Julia (talk) • formerly Gormflaith • 15:52, 1 June 2018 (UTC)

Tibetan observations and questions

I recently discovered the area I now live in in Sydney has the largest Tibetan community in Australia, so I'm taking the opportunity to teach myself some Tibetan.

I've made hundreds (I think) of Tibetan entries and translation entries in the past few weeks, mostly from a scanned PDF of English-Tibetan Dictionary of Modern Tibetan compiled by Melvyn C. Goldstein + Ngawangthondup Narkyid. I've borrowed a copy of Colloquial Tibetan but I'm not very far into it yet.

So far I've only really learned the alphabet and Windows keyboard layout. Very little pronunciation, grammar, or vocabulary.

Most of my Tibetan neighbours I've spoken with have very little English, one has OK English and one has excellent English. Most of them have pre-teenage kids who are all bilingual. I've met two who told me they're from Lhasa or nearby and two or three told me they're from Amdo. At least one of the ones from Amdo also speaks Chinese but some others don't even seem to recognize my attempts to speak Mandarin. They all recognize my attempts to speak Tibetan.

My understanding from a friend from Amdo I knew while staying in Xiamen a year ago is that educated Tibetans in Amdo know Lhasa and Amdo dialects but might not know Mandarin. My friend was probably about 30 years old and told me he'd only recently taught himself Mandarin and was now teaching himself English. He did also seem to speak a peculiar variety of Chinese that sounded very Tibetan to me. He used it once when we ate at an eatery run by Han people from Qinghai.

It seems to me that our Tibetan pronunciation template doesn't cover Amdo Tibetan. Or maybe there are many dialects and I just don't know which dialect names I should be looking at?

Colloquial Tibetan uses a variant of Tibetan with two tones or three tones. High, low, and neutral or no tone / unstressed. How does this relate to the variants covered in our pronunciation tables/template?

If Stephen, Wyang, or any other contributors who have some skills in Tibetan would like to glance at a bunch of my entries and offer constructive feedback if I'm making any consistent mistakes etc, I would appreciate it. — hippietrail (talk) 02:28, 1 June 2018 (UTC)

Update: Tonight I went to the local Tibetan restaurant and the two staff members I spoke to are both from Kham. On my way home I met another Tibetan, also from Kham. So it seems at least those three major varieties are represented in my neighbourhood. — hippietrail (talk) 13:05, 1 June 2018 (UTC)

Thanks for creating those Tibetan entries. Only a formatting suggestion from me for now - the Tibetan links in the Etymology section should be in {{m}} rather than {{l}}: diff (it's a ridiculous formatting rule).

For {{bo-pron}}, the |zeku= parameter is used for the Zêkog dialect, and |labrang= is for the Xiahe dialect; both of these are Amdo dialects.

For Lhasa, there are different ways of analysing its tones. I'm assuming the high/low in Colloquial Tibetan to be following a two-tone analysis. In the four-tone model (which is what {{bo-pron}} uses) each of the high/low categories can be further split into two subcategories: high becomes high flat (f) and high falling (h), low becomes low flat (w) and low rising (v). There are some minimal pairs of words in the two kinds of high tones.

I think I know which area of Sydney you meant (DY?). The Tibetan diaspora in Sydney is quite diverse AFAIK; I know some from Amdo living in that area. The Lhasa dialect is the prestige dialect of Tibetan, and many educated Tibetans know Lhasa and can use it when communicating with Tibetans from other dialect regions. Wyang (talk) 03:45, 2 June 2018 (UTC)
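The relationship Wyang describes between the two-tone and four-tone analyses of Lhasa Tibetan is a many-to-one collapse, which a small Python sketch can make concrete. It assumes only the letter codes named in this thread (f, h, w, v); the names FOUR_TONE_SPLIT, FOUR_TONE_NAMES and two_tone_of are hypothetical and are not taken from the {{bo-pron}} module.

```python
# Illustrative sketch only: the codes f, h, w, v come from the discussion,
# not from the actual {{bo-pron}} implementation.
FOUR_TONE_SPLIT: dict[str, tuple[str, str]] = {
    "high": ("f", "h"),  # high flat, high falling
    "low": ("w", "v"),   # low flat, low rising
}

FOUR_TONE_NAMES: dict[str, str] = {
    "f": "high flat",
    "h": "high falling",
    "w": "low flat",
    "v": "low rising",
}


def two_tone_of(code: str) -> str:
    """Collapse a four-tone code to its two-tone register ("high" or "low")."""
    for register, codes in FOUR_TONE_SPLIT.items():
        if code in codes:
            return register
    raise ValueError(f"unknown tone code: {code!r}")


print(two_tone_of("h"))  # high
```

Going from four tones to two always works; going from two to four does not, since "high" alone cannot distinguish f from h, which is why the minimal pairs between the two high tones matter.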

Yep I now live in Dee Why (etymologically comes from the letters "DY" on a map that nobody knows the reason for). Colloquial Tibetan actually describes their MO right at the beginning of the book. They made certain choices for practicality so the learner can communicate with the maximum number of Tibetan speakers, rather than adhere exactly to Standard Lhasa Tibetan. So I believe they used the two-tone model as part of that.

I'll try to remember to stick with "m" in etym sections. I'm so used to seeing a random mix of "m" and "l" that I just stick with the one I knew best. I met another new Tibetan this afternoon, this time a guy with good English who was from Lhasa and taught me several ways to say "goodbye" or "see you" which I've unfortunately already forgotten.

Thanks for your feedback! — hippietrail (talk) 10:26, 2 June 2018 (UTC)

Citing Twitter

Tweets are an absolute goldmine for vernacular, and this is especially crucial for oral-only languages (including a couple I'm particularly interested in like Scots and Swiss German). I assume since they can be deleted that they aren't considered durably archived? Can we find any solution to this? Has this been discussed before anywhere? Ƿidsiþ 05:29, 2 June 2018 (UTC)

As you acknowledged, Twitter is very, very much not durably archived. We currently have no solution for this, unless you collect a bunch of tweets and self-publish them, I suppose. Ultimately, we could develop new criteria (say, three tweets count as one regular cite, and a photo of the tweets must be uploaded to Commons and the original tweets checked by an admin to ensure that it hasn't been doctored), but that would require a lot of work and need to be subject to a vote. —Μετάknowledge (discuss/deeds) 14:08, 2 June 2018 (UTC)

The Library of Congress is (ostensibly) archiving all tweets, save the ones which get deleted. Some are archived by law (POTUS, etc.) and should be citable. - TheDaveRoss 13:37, 3 June 2018 (UTC)

Trump's tweets are rather beside the point, as I presume they will all find their way into published material. —Μετάknowledge (discuss/deeds) 14:56, 3 June 2018 (UTC)

True, but it is an interesting point that because his (and other presidents') are known to be archived (presumably durably), by law, arguably they could be cited without waiting for them to get into print. Hmm... I don't know. Also, @ The Dave Ross, the Library of Congress stopped archiving all tweets at the end of 2017. How accessible is their pre-2017 archive? - -sche (discuss) 15:04, 3 June 2018 (UTC)

Ah, I missed the update about switching to "selective" archiving, whatever that means. And the last I heard (a few years ago) the archive was proving technically challenging for them, and public access was limited. They also say they are going to keep the archive of tweets they acquired up until they stopped, so those are citable. Internet Archive is saving and sharing the "Spritzer" Twitter stream (1% of all public Tweets) but since they are essentially random that isn't useful for our purposes. - TheDaveRoss 17:05, 3 June 2018 (UTC)

If this were to come to a vote, I'd definitely support it. IMO tweets are one of the best sources because they're often very close to spoken language. Even if it requires a lot of work, as MK acknowledged, I think it'd be worth it. Also, is "publishing to the internet" via Google Drive durably archived? Like this thing? – Julia (talk) • formerly Gormflaith • 17:58, 3 June 2018 (UTC)

Google doesn't have a good track record of keeping its services alive. (It's a pity we now have to rely on Google Groups for a Usenet archive!) Also if we archive others' tweets then there might possibly be legal issues around privacy/copyright. Equinox ◑ 18:01, 3 June 2018 (UTC)

I agree with User:Julia and it's very long (years) overdue. Kaixinguo~enwiktionary (talk) 18:01, 3 June 2018 (UTC)

We're not a twitter archiving service. We're not going to become a twitter archiving service. There are currently no Twitter archiving services that we can use. That might change in the future, but right now we would just be doing something half-assed. What do you mean by "a lot of work"? That's meaningless. DTLHS (talk) 18:08, 3 June 2018 (UTC)

Why not just take a screenshot and upload it to Commons? They could be verified by one or possibly two admins at the time, and permission could be asked just as it is for some images. I think the expression 'a lot of work' is fairly common and self-explanatory. Kaixinguo~enwiktionary (talk) 19:35, 3 June 2018 (UTC)

In that case what do we do if the original author comes forward with a copyright claim, or claims distress that we are persisting something they chose to delete? GDPR etc... -- Actually I suppose the same questions apply to Usenet! Hmm. Equinox ◑ 19:42, 3 June 2018 (UTC)

"a lot of work" as in more steps than just getting a cite from a book or something. Regarding copyright issues (which I don't know much about), what rights do the creators of the tweet have? And what rights would we be possibly violating? – Julia (talk) • formerly Gormflaith • 20:06, 3 June 2018 (UTC)

That's a question for the people at Wikimedia Commons. DTLHS (talk) 04:54, 4 June 2018 (UTC)

Display text of Template:der3 and others

I have reverted Dan Polansky's latest attempts to change the table title, on the grounds that his version looks stupid and repeats "Derived terms" right after the heading that says the same thing. Furthermore his references to the "status quo" seem specious. Finally, the pattern "Terms derived from X" is easy to extend to "Terms derived from X (noun)" if disambiguation is necessary. Others' thoughts? DTLHS (talk) 06:36, 2 June 2018 (UTC)

I admit that repeating "Derived terms" in the table heading after the same section heading looks a little odd, but "Terms derived from X" is a needless repetition of X, and looks really bad to me. Of course they are derived from X; X is the entry. I admit that I did not raise this when the practice was first introduced, only recently. There are so many practice changes being introduced without discussion. I think the "status quo ante" is fundamentally correct, and was used by me in {{rel3}}; I dispute that the diff from 2016 was based on consensus, and I have no evidence of that consensus other than silence. --Dan Polansky (talk) 14:31, 2 June 2018 (UTC)

I like DTLHS's approach for a default. If something else emerges from user-added (non-default) content, that might be considered for a replacement of the default. We (by which I mean DTLHS) could do a dump run to find any potentially desirable innovations. DCDuring (talk) 16:00, 2 June 2018 (UTC)

Derived terms used to have no collapsible tables; for short lists of derived terms, that was much more user-friendly and avoided cruft. Now, when I visit parta#Czech, I find the section "Derived terms", underneath it a collapsible "Terms derived from parta", and within that mordparta and parťák. It looks ugly and stupid, pardon my French. --Dan Polansky (talk) 09:39, 3 June 2018 (UTC)

One solution would be to provide no text in the collapsible table: that would remove all cruft and all repetition, even the odd-looking "Related terms" (section heading), "Related terms" (collapsible heading) repetition. --Dan Polansky (talk) 09:48, 3 June 2018 (UTC)

I consulted the English entry party for an unrelated purpose. This is what I see, as a sequence:

"Hyponyms"
"Hyponyms of party"
"Derived terms"
"Derived terms of party (noun)"
"Related terms"
"Terms related to party"

"party" is in italics there. This is what we call in Czech "jak u blbejch na dvorku", like at a morons' yard.

Making all the collapsible headings empty would be a huge improvement.

--Dan Polansky (talk) 10:05, 3 June 2018 (UTC)

In your example of parta#Czech, I think it would be better for the 'derived terms' not to be collapsed. There are only two of them and it is important information. Kaixinguo~enwiktionary (talk) 10:14, 3 June 2018 (UTC)

Our entries don't make good use of space, it's true. There's a lot of whitespace (other online dictionaries usually put all their pronunciation information compactly on a single line as 'pronunciation:', for example, not on three or more short lines with lots of empty space to the right of them). When there is text, a fair amount is redundant. But removing all text from the collapsible headings would be bad because it'd make it too easy to miss that there was information there. The "show" text is in small font all the way at the other side of the screen from where most text starts; the collapsible box itself is a light grey which, when my screen is tilted at some angles, isn't even distinct from the background, so it's possible to miss the entire existence of it, and even when it is seen, it's possible to miss the "show" text (as mentioned) and, if no other text is present, to think it's an empty box, i.e. that there are no derived terms. I know, because I stumbled onto an entry where someone had manually suppressed the text. Perhaps the redundant "derived terms" text should be replaced with "list" (which would also discourage use of the template when there's only a single derived term), or replaced by floating the "show" link to the left. - -sche (discuss) 15:05, 3 June 2018 (UTC)

Pakistani surnames

There are Pakistani cricketers named Misbah-ul-Haq, Inzamam-ul-Haq, Imam-ul-Haq and probably others. All spelled as a single term with two hyphens. Is "ul-Haq" a surname? If not, can anyone explain the format of the names please? SemperBlotto (talk) 13:38, 2 June 2018 (UTC)

It is a family name. I think it's "the truth" (حق); see Al-Haqq. Not certain. Equinox ◑ 13:46, 2 June 2018 (UTC)

@SemperBlotto, Equinox: الْحَقّ (al-ḥaqq) is the definite form of حَقّ (ḥaqq, “truth”). In formal Arabic the definite article الْ (al-, “equivalent of "the"”) is pronounced with an "a-" at the beginning of an utterance; in other positions it follows the desinential inflection (iʿrāb) ending of the previous word, and the initial vowel of "al-" is dropped (elided). E.g. "Misbah-ul-Haq" would be مِصْبَاحُ الْحَقِّ (miṣbāḥu l-ḥaqqi), "luminary of the truth" in the nominative case. So "u" belongs to the previous word but this vowel is usually not written, and the initial "ا" is silent. Languages borrowing from Arabic often follow these conventions. "al-" is more common, though. "el-" or "il-" is from dialectal/informal Arabic. --Anatoli T. (обсудить/вклад) 15:02, 2 June 2018 (UTC)

Thanks. So, would his father, brothers and sons also be xxxx-ul-haq? (and females??) SemperBlotto (talk) 05:56, 3 June 2018 (UTC)

Idiomatic names.

I'd like to add a type of category for names with idiomatic/sarcastic usage by language. An example in English: You can say 'You don't say, Sherlock' and even people who have not read Sherlock Holmes will understand that the word 'Sherlock' encodes the information that they're being obtuse and have just stated something obvious. This does not work with other fictional detectives such as 'you don't say, Hercule' or 'you don't say, Continental Op', whose names are therefore not idiomatic. I propose a label like 'langname names with idiomatic usage' or 'with sarcastic usage' since I can't recall names like Sherlock/Gandhi/Einstein etc. being used as actual praise. Korn [kʰũːɘ̃n] (talk) 11:59, 3 June 2018 (UTC)

@Korn: Are there many such names? I think the "sarcastic" part would make the category too narrow; besides, isn't sarcasm a contextual/pragmatic phenomenon more than a lexical one?

However, I definitely agree that we should gather all those genericised names (which run parallel to "genericised trademarks", imo) in a category; I even suggested as much last year. By the way, I think the genericisation process is called antonomasia (sense 2). --Per utramque cavernam 15:26, 3 June 2018 (UTC)

"Idiomatic" is broader than "sarcastic", of course, since Einstein#Noun "intelligent person", Joe#Noun "a guy", and arguably "mein Name ist Hase" "I know nothing" all seem like idiomatic but not sarcastic uses of names. I do think it'd be useful to have a category for idiomatically-used names (others: John Doe, Bubba, and arguably Johnny Reb and Johnny Foreigner, etc). I'm not sure a category for sarcasm would be as maintainable, since most names and other words are used sarcastically sometimes and so it might largely duplicate the "idiomatic" category (though a vote excludes separate senses for sarcasm except when terms are "seldom or never used literally"). Perhaps we should avoid the very opaque antonomasia, though (I suspect DCDuring will agree with me on this part?). - -sche (discuss) 15:33, 3 June 2018 (UTC)

The reason I'm thinking about using the 'sarcastic' tag is that non-sarcastic usages might be plain references to the actual person rather than idiomatic ones, but the idiomatic label seems preferable to me too. Korn [kʰũːɘ̃n] (talk) 16:50, 3 June 2018 (UTC)

Adapting Wikipedia template to warn about NSFW/sexual images for Persian

For some reason the Persian Wikipedia has more extreme images than others; for example, there is a gif at 'ejaculate'. When I was checking the translations for 'pearl necklace' and 'footjob' I saw images of those as well. What if the {{wikipedia}} template were adapted to show a short warning? Would anyone mind? Kaixinguo~enwiktionary (talk) 17:55, 3 June 2018 (UTC)

w:Pearl necklace (sexuality) has the same image as w:fa:گردنبند مروارید. Wikipedia links are basically off-site links, and have the implied warning that we don't control the content there. Trying to track the current NSFW status of every Wikipedia page we link to, even if there were a clear agreement about what NSFW means, is hopeless.--Prosfilaes (talk) 19:06, 3 June 2018 (UTC)

No, there is no implied warning at all. Kaixinguo~enwiktionary (talk) 19:30, 3 June 2018 (UTC)

It says it's going to Wikipedia. Wikipedia is not censored, and a link to Wikipedia could potentially lead to anything, and a stable Wikipedia page for a sexual term may have various types of illustrations. That should give users plenty of warning.--Prosfilaes (talk) 04:54, 4 June 2018 (UTC)

No, Wikipedia isn't censored (much) and sometimes has sexual pictures. That's how it is. You should install censorware on your own computer if you need to stop this. Equinox ◑ 19:33, 3 June 2018 (UTC)

I didn't request to censor Wikipedia. It's not unreasonable to warn of, not censor, a gif of ejaculation.

By the way, I'm not thinking of myself, although I had never seen those images before. Kaixinguo~enwiktionary (talk) 19:39, 3 June 2018 (UTC)[reply]

I understand the notion and agree that there's too great a laxity with preventable exposure in the Wiki community (We have an entry with a picture of a corpse, which I abhor.), but I too think this is an issue to be fixed at Wikipedia and that the very fact that you're moving to another site, to read an article about sexual practices, implies that you might get exposed to the act in question. — This comment was unsigned. My bad. Korn (talk • contribs)

I don't think it is at all reasonable to expect that there will be a video demonstrating a sex act on every page which describes a sex act. While I don't think we should censor content, I don't see any reason why labeling content that is likely to be offensive or otherwise problematic to a large segment of the population would be a bad thing. The fact that we can't be comprehensive is not an argument against such labeling when we are aware. Personally, I often use Wiktionary and Wikipedia while at work, and would be annoyed if a video of someone ejaculating showed up on my screen if I wanted to find out what some term in a random song or comedy act I was listening to meant. I don't think this is a prudish or censorial viewpoint. - TheDaveRoss 00:00, 4 June 2018 (UTC)

Then don't click on Wikipedia links. The discussion of what to display on Wiktionary is entirely separate from whether we should try to mark up certain Wikipedia links as potentially NSFW.--Prosfilaes (talk) 04:54, 4 June 2018 (UTC)

The idea that one should never click on Wikipedia links because some of them will show graphic videos is patently ridiculous. - TheDaveRoss 11:22, 4 June 2018 (UTC)

The idea that Wiktionary should keep track of and put warnings on links to pages on Wikipedia that might contain "NSFW"/"explicit" images, when Wikipedia itself doesn't put warnings on, is also ridiculous, especially because the pages for which the presence of explicit images is least likely to change and invalidate any such warning-labelling, namely pages about sex or body parts, are ones readers can expect an uncensored encyclopedia to illustrate.
"Explicit"/"NSFW" are nebulous, anyway: is a link to a page with an image of a nipple going to be tagged initially? (I'm sure it'll be tagged in the end; censorship creeps.) Is a link to a page that might or might not contain an illustration of breastfeeding to be tagged, initially? Is a Wikipedia "List of Foo slang" that documents swear words or words for sex acts "explicit", given that we do have at least one user who has a filter that deletes swear words? There are entire communities of bigots who would prefer not to see images of gay people, or of any women. Maybe some people only want to tag the most explicit animations, but the censorship will inevitably creep "to err on the side of caution".
- -sche (discuss) 14:12, 4 June 2018 (UTC)

There are user-scripts users can individually enable to block images / videos from displaying, if they wish to merely read about ejaculation at work. - -sche (discuss) 14:18, 4 June 2018 (UTC)

Labeling is not censorship, that is a red herring. As is the fact that we cannot be comprehensive, that is an argument against the project as a whole. The subject of this discussion is not to limit what can or cannot be shown on Wikipedia (or Wiktionary); it is merely suggesting that, in cases where an editor knows that the target of a link contains graphic imagery, the editor can let readers know. This does not impede readers from seeing or following links, it just lets them self-select out of certain content if they wish to, instead of forcing them to play roulette. Graphic content lowers the utility of Wikipedia for many, labeling such content so that users can actively avoid it as they choose mitigates this problem. - TheDaveRoss 15:11, 4 June 2018 (UTC)

They are labeled; every word that starts this discussion has a definition that clearly warns anyone of what might be in Wikipedia. I'm not even sure where Kaixinguo~enwiktionary expects us to put the warning; he talks about the Persian Wikipedia and says "When I was checking the translations"; are we supposed to warn on every translation? If you can't handle what's on Wikipedia, don't go there, or at least install a filter that should try and protect you.

We are pretty comprehensive in English at least. Our usefulness drops drastically in other languages where we don't have decent coverage. What good does tagging a handful of Wikipedia links as NSFW do if there are ten times as many NSFW links that aren't so tagged? It gives you no reason to think you can ever safely click on a Wikipedia link, which is exactly where you started.--Prosfilaes (talk) 21:29, 4 June 2018 (UTC)

Eliminating undocumented withtext= param in `{{borrowed}}`

I plan to use a bot to eliminate the remaining places where withtext= is used in {{borrowed}}. The plan is to use a bot to replace "{{bor|...|withtext=1}}" with "Borrowed from {{bor|...}}" whenever the template occurs at the beginning of a line or sentence, and to handle the remaining cases by hand. I've spot-checked a dozen or so cases so far and all of them have {{bor}} at the beginning of a line or sentence, and all of them read fine when using the "Borrowed from" text instead of the auto-generated "Borrowing from" text (and in many cases, "Borrowed" reads better than "Borrowing"). Benwing2 (talk) 18:19, 3 June 2018 (UTC)
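A minimal Python sketch of the kind of replacement described here. This is a hypothetical illustration, not Benwing2's actual script: the regular expression assumes flat template calls with no nested {{...}} inside the parameters, the "start of a line or sentence" test (only whitespace, or text ending in ".", "!" or "?", before the call on its line) is an assumed heuristic, and the function names are invented for the example. Calls that fail the test are returned for manual handling, mirroring the plan to fix the remaining cases by hand.

```python
import re

# Hypothetical sketch, not the real bot: matches flat {{bor|...}} or
# {{borrowed|...}} calls (no nested templates inside the parameters).
TEMPLATE = re.compile(r"\{\{(bor|borrowed)((?:\|[^{}|]*)*)\}\}")


def _starts_line_or_sentence(text: str, pos: int) -> bool:
    """True if only whitespace, or text ending in '.', '!' or '?', precedes pos on its line."""
    line_start = text.rfind("\n", 0, pos) + 1
    before = text[line_start:pos].rstrip()
    return before == "" or before.endswith((".", "!", "?"))


def rewrite(text: str) -> tuple[str, list[str]]:
    """Turn line- or sentence-initial {{bor|...|withtext=1}} into "Borrowed from {{bor|...}}".

    Returns the rewritten text and the calls left unchanged for manual review.
    """
    manual: list[str] = []

    def repl(m: re.Match) -> str:
        params = m.group(2).split("|")[1:]  # drop the empty string before the first "|"
        if "withtext=1" not in params:
            return m.group(0)  # no withtext=1: leave the call alone
        if not _starts_line_or_sentence(text, m.start()):
            manual.append(m.group(0))  # mid-sentence use: flag for a human
            return m.group(0)
        kept = "".join("|" + p for p in params if p != "withtext=1")
        return "Borrowed from {{" + m.group(1) + kept + "}}"

    return TEMPLATE.sub(repl, text), manual


new, manual = rewrite("{{bor|en|fr|croissant|withtext=1}}.")
print(new)     # Borrowed from {{bor|en|fr|croissant}}.
print(manual)  # []
```

A regular expression is adequate for flat calls like these; wikitext with nested templates would be safer to handle with a real wikitext parser, such as the third-party mwparserfromhell library.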

I think you're good to go, this was part of the plan anyway. See Wiktionary:Beer_parlour/2017/November#Template:bor:_Replace_notext=1_with_withtext=1 --Per utramque cavernam 18:27, 3 June 2018 (UTC)

OK, I wrote the script and it's ready to go. With some special-case hacking, there are only around 135 lemmas (out of 10,200+) that need to be handled manually; most of these are erroneous uses of withtext=1 of various sorts. I'll wait a bit longer to make sure no one objects. Benwing2 (talk) 20:19, 3 June 2018 (UTC)

I fixed all the manual cases and am running the script to fix the automatic cases. Benwing2 (talk) 03:04, 4 June 2018 (UTC)[reply]

Finished. Benwing2 (talk) 07:21, 4 June 2018 (UTC)[reply]

Parents of foo-mid languages, foo-old languages

The parents of bn-mid (Middle Bengali) and bn-old (Old Bengali) are given as bn (Bengali), which seems totally wrong. The same goes for or-mid, or-old, kok-mid, kok-old, etc. Is this correct? Maybe it is, because these are etymology-only languages, but it seems weird. Benwing2 (talk) 03:03, 4 June 2018 (UTC)[reply]

If the Old/Middle forms of the language aren't considered sufficiently distinct to treat as different languages from the modern language, then I guess it makes sense; after all, Biblical Hebrew is an etymology-only language with (modern) Hebrew as its "parent". - -sche (discuss) 14:06, 4 June 2018 (UTC)[reply]

No, that isn't right. I don't think Old and Middle Konkani have enough attestation to deserve full codes, but Old/Middle Bengali and Odia (or Oriya, whatever we call it here) should be upgraded to real codes. —Aryaman^A ^{(मुझसे बात करें • योगदान)} 01:47, 5 June 2018 (UTC)[reply]

Soft redirection template for Japanese

Hi everyone. What do you think about the soft redirection format on 貴方?

For pronunciation and definitions of 貴方 *– see あなた.* (This term, 貴方, is a kanji spelling of あなた.)

The soft-redirection template is meant to serve the same function as {{zh-see}} does for Chinese. Although this is not yet implemented, it should eventually be able to display glosses and copy categories from the lemma entry. If the idea of a Japanese soft-redirection template is accepted, we can create alternative forms (mainly of pairs like まっとう / 全う) faster, since there will be no need to copy POS headers or manually provide glosses, which can become out of date if the lemma entry changes.

(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Dine2016, Poketalker, Cnilep, Britannic124, Fumiko Take): --Dine2016 (talk) 11:31, 4 June 2018 (UTC)[reply]

I support centralisation of Japanese entries. The ongoing difficulty is deciding what the Japanese lemma IS, and this can't be decided easily. There are good arguments both for kana entries and for kanji entries (where a term has both), and it very much depends on:

What is the most frequent spelling?
Are there multiple etymologies and what is their distribution? Unrelated homophones with different etymologies are better off having kanji entries as lemmas, if the kanji spelling is more common than kana.
Verbs or adjectives with the same reading/pronunciation might be better off lemmatised at kana entries with only one inflection table. They are mostly native Japanese words.
Sino-Japanese entries (or more broadly, words with 100% on'yomi readings) are better off lemmatised at their most frequent spelling, provided that spelling is a kanji one.

It's roughly my position before we jump into making redirects for Japanese entries. --Anatoli T. ^{(обсудить}/^вклад) 12:02, 4 June 2018 (UTC)[reply]

I was under the impression that the discussion two months ago reached a preliminary consensus for native words. Korn [kʰũːɘ̃n] (talk) 12:28, 4 June 2018 (UTC)[reply]

Thanks for your replies. I think the choice of lemma forms is a separate issue, and existing entries can be left as they are until we settle on an approach to centralization. There are lots of words for which most editors would agree on the kanji spelling as the lemma form. For now, the soft-redirection template could facilitate the creation of their hiragana forms. --Dine2016 (talk) 12:47, 4 June 2018 (UTC)[reply]

Another example would be ふるさと (furusato, “hometown”, amongst other meanings). It has at least three attested kanji spellings: 古里, 故里, and 故郷. Would a kana lemma be better than calling these alternative spellings of one kanji compound (such as the third)? ～ POKéTalker（═◉═） 13:26, 4 June 2018 (UTC)[reply]

ふるさと (furusato) is a good candidate to be a lemma. It is a native Japanese word, and the Chinese characters for it are only a visual aid. Possibly none of them is more common than the kana form, but if one of them is, that spelling should be the lemma. If it's decided that native words are lemmatised at kana, then so be it. --Anatoli T. ^{(обсудить}/^вклад) 13:37, 4 June 2018 (UTC)[reply]

I support this proposal for {{ja-see}}.

Re: which spelling to choose as lemma, I'll reiterate my preference, as discussed and developed in the earlier thread: native-Japanese terms (i.e. 和語 (wago)) and non-Chinese foreign-derived terms (外来語 (gairaigo)) would go under the kana spellings, while Chinese-derived terms (漢語 (kango)) would go under the kanji spellings. The rationale for this is that wago and gairaigo may have multiple possible kanji spellings (where such exist), whereas kango generally only have one (rarely two) spellings. Kana entries for kango would be soft redirects to the kanji entries (the current status quo), while kanji entries for wago and gairaigo would be soft redirects to the kana entries (this would require changes from our current state). The wago and gairaigo entries could specify spelling frequencies with usage notes or labels.

Example: とる (toru) has a basic meaning of to take. However, which kanji spelling is used most frequently depends on which sense is intended: 撮る for photography and video, 採る for samples to be used for something, 捕る for capturing or trapping a pest or catching a pop-fly, 獲る for capturing or trapping prey, 盗る for taking something illicitly, etc. etc. Personally, I think it makes the most sense to consolidate all of these under the とる (toru) kana spelling, and in fact, this is what monolingual JA-JA dictionaries essentially do. See this entry in Daijirin for the 採る spelling, grouped with the other spellings of the same etymology. ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:59, 4 June 2018 (UTC)[reply]

@Eirikr: Hi. OK, 繰(く)り返(かえ)し (kurikaeshi) is a wago. Could you explain why 繰り返し should be a redirect to くりかえし if 繰り返し is the most common spelling? --Anatoli T. ^{(обсудить}/^вклад) 23:01, 4 June 2018 (UTC)[reply]

Two reasons: 1) technical constraints inherent in the MediaWiki platform, and 2) consistency.

The technical constraint has to do with how we have redirects. Electronic JA-JA dictionary apps that I've been able to try out generally have pretty slick redirection, where the user can input kanji or kana and still get to the desired entry either directly or with one additional input. If the user clicked the wrong entry, the list is still there on the screen, so no need to go back: just click another entry in the list. We could do that with hard redirects, but since a single spelling might overlap with terms in other languages, we cannot do so across all JA entries.
The consistency consideration is in part to match other JA-JA dictionaries (lemming-wise, including our cohorts at the JA WT), in part so all wago are lemmatized similarly, and in part for usability.

Wago readings tend to be unambiguous, with one reading matching one term. This is consistent with the history, where wago derive from the verbal language. Spellings were an afterthought as literacy was imported into Japan from a completely unrelated donor language. Even today, usage conventions for okurigana (the kana after the kanji) can be somewhat loose -- the KDJ lists kurikaeshi under the spelling 繰返, for instance. But if a user knows the pronunciation, they can always spell out the kana.

Kango, meanwhile, were borrowed from written Chinese, with a focus on the meaning inherent in the characters and without much regard for what they sound like. In extreme instances, a single kango reading might have tens of spellings. せいしん (seishin), for instance, generates 21 distinct hits in my local electronic Daijirin, each with distinct derivations and senses. せいせい (seisei) generates 28 distinct spellings, belonging to 25 different terms. とうし (tōshi) generates 24 distinct spellings for 20 terms.

This difference in history and in verbal / visual distinctiveness carries over into how terms are used: wago are used more in spoken and informal language, where auditory disambiguation is key, while kango are used more in written and formal texts, where the written form allows authors to visually specify meaning that might be lost in a spoken medium.

→ Broadly speaking, wago are phonemically distinct (kana spellings), while kango are graphemically distinct (kanji spellings).

Conversely, one could turn your question around: for any given wago, not just 繰り返し (kurikaeshi), why would we use the kanji for the lemma? There's more overhead for editors in having to identify which spellings are more common (clear for kurikaeshi, but not always so simple for other terms, and not always provided even in modern dictionaries), duplication of data at multiple spellings and/or frustrating arbitrariness where we have to just choose one among multiple current variants of roughly equal frequency, and more potential for confusion among users (which spelling? which okurigana? why is Wago A under a kanji spelling, but Wago B under a kana spelling?). Using kanji spellings for wago can also obscure otherwise-clear relationships, as observable at あばく (abaku). For instance, if we split each sense of とる (toru) out to its kanji spelling, we would fracture the entry and make it harder for users to see that all the spellings of toru are just shades of meaning of the same verb. Imagine if English get were similarly split up, where each sense had a distinct spelling and separate entry, despite all senses having the same reading, same derivation, same underlying meaning.

‑‑ Eiríkr Útlendi │^{Tala við mig} 00:09, 5 June 2018 (UTC)[reply]

@Eirikr: OK, thanks for the detailed answer, almost convinced. Well, the Chinese handling is not perfect either - simplified characters act as redirects, even if their usage is much higher than that of the traditional. --Anatoli T. ^{(обсудить}/^вклад) 07:22, 5 June 2018 (UTC)[reply]

Transclusion

@Eirikr: I know it's a bold idea, but what if we make the template transclude the appropriate sections from the lemma entry, like this?

==Japanese==
{{ja-see|繰り返し}}

For pronunciation and definitions of くりかえし *– see 繰り返し.* (This term, くりかえし, is a kana spelling of 繰り返し.)
Pronunciation
Kun’yomi IPA(key): [kɯ̟ᵝɾʲika̠e̞ɕi]
Noun
繰り返し (hiragana くりかえし, rōmaji kurikaeshi)
repetition

--Dine2016 (talk) 01:06, 5 June 2018 (UTC)[reply]

Very nice. Wyang (talk) 03:33, 5 June 2018 (UTC)[reply]

I'm more than okay with that. I've long thought about this kind of transclusion as a means of providing users the relevant info while avoiding flat-out manual duplication. ‑‑ Eiríkr Útlendi │^{Tala við mig} 03:37, 5 June 2018 (UTC)[reply]

@Dine2016, I just expanded 繰り返し (kurikaeshi). Which sections (etymology, kanjitab, derived terms, etc.) can be omitted in the corresponding entry that uses ja-see? ～ POKéTalker（═◉═） 04:18, 5 June 2018 (UTC)[reply]

@Poketalker: I think every relevant section should be transcluded, so that whether the reader searched for くりかえし or 繰り返し, they will be able to get the same information on the word 繰(く)り返(かえ)し (kurikaeshi, “repetition”). Different spellings, same word, same information. I believe this is how the electronic dictionaries Eirikr mentioned above work, except that we don't require an extra click if we take this approach. An exception may be made for {{ja-kanjitab}}, which is usually spelling-specific. --Dine2016 (talk) 06:59, 5 June 2018 (UTC)[reply]
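The section-copying idea can be illustrated with a short Python sketch. The real template would be a Lua (Scribunto) module reading the lemma page's wikitext (for example through a title object's getContent(), as mentioned in this thread); Python is used here only to show the logic. It assumes the lemma page uses a standard ==Japanese== level-2 heading, and the function name `japanese_section_for_see` is hypothetical. Following the suggestion above, {{ja-kanjitab}} lines are dropped because they are spelling-specific.

```python
import re

# A level-2 heading such as "==Japanese==" (but not "===Noun===").
L2_HEADING = re.compile(r"^==([^=].*?)==\s*$")


def japanese_section_for_see(lemma_wikitext: str) -> str:
    """Return the ==Japanese== section of a lemma page, without its heading
    and without {{ja-kanjitab}} lines, which are spelling-specific."""
    out = []
    in_japanese = False
    for line in lemma_wikitext.splitlines():
        heading = L2_HEADING.match(line)
        if heading:
            # Entering a new language section; keep lines only inside Japanese.
            in_japanese = heading.group(1).strip() == "Japanese"
            continue
        if in_japanese and not line.startswith("{{ja-kanjitab"):
            out.append(line)
    return "\n".join(out).strip()
```

In a real module the returned wikitext would still have to be expanded (preprocessed) before display, which is where the transclusion limits discussed below come into play.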

@Poketalker: There are technical restrictions on the amount of transclusion. In addition, homograph entries like 上下 (agarisagari, ageoroshi, agekudashi, agesage, ueshita, kamishimo, shōka, jōka, jōge, noboriori) can be long, making it hard for readers to find the entry they're looking for and canceling out the advantage of not requiring an extra click for the full entry. Reconsidering the issue now, I think it's OK to display just the POS headers and definitions (and perhaps pronunciation and usage notes) and direct the reader to the lemma entry for full information. Alternatively, we could include all the information but collapse it by default (which is probably technically inferior because of page load time). @Wyang, Eirikr, any thoughts? --Dine2016 (talk) 11:24, 6 June 2018 (UTC)[reply]

@Dine2016: You have actually made the kana entry a redirect to kanji in your example but Eirikr wanted the other way around, no? Good job, anyway. --Anatoli T. ^{(обсудить}/^вклад) 07:22, 5 June 2018 (UTC)[reply]

@Atitarev: Yes, but {{ja-see}} is expected to support both kanji-to-kana and kana-to-kanji, so converting it to the other way should be easy. --Dine2016 (talk) 07:48, 5 June 2018 (UTC)[reply]

The main potential problem I can see would be when the template is on a page with lots of other content, such as huge lists of Chinese compounds on kanji pages. There are limits to the amount of memory and transcluded content allowed on a single page. Going over the memory limit causes highly visible module errors. Going over the transclusion limit, on the other hand, means that every template that hasn't already been transcluded becomes a hyperlink to the template, so {{l|en|example}} is replaced by Template:l (sometimes it just shows the invoke statement, but either way, it's useless). Worse, these unexpanded duds are almost always at the bottom of the page, where editors are less likely to spot them: unless you happen to check Category:Pages where template include size is exceeded or notice the category at the bottom of the page, you may not realize anything is wrong. See User:Hermitd/Greek wordlist for an example. That said, the limit is 2 MB of transclusion, so it may not happen much. Chuck Entz (talk) 04:56, 6 June 2018 (UTC)[reply]

@Chuck Entz: Thanks for the heads-up. What contributes to reaching the transclusion limit: fetching the wikicode of the lemma entry with getContent(), expanding the relevant sections with preprocess(), or only returning the expanded wikicode from module to page? --Dine2016 (talk) 07:12, 6 June 2018 (UTC)[reply]

Wiktionary:Summer Competition 2018

Hey. I'm probably gonna start the new Wiktionary:Summer Competition 2018 soon. It's Wiktionary:Christmas Competition 2013 repeated, but with a less able Gamesmaster. I should probably fix a few things first. I'll keep y'all posted about publication dates etc. when I can be bothered to. --Genecioso (talk) 21:09, 4 June 2018 (UTC)[reply]

How come...

Wiktionary:Translation requests is such a popular page? --Genecioso (talk) 22:04, 4 June 2018 (UTC)[reply]

WMF doesn't keep track of (or at least doesn't publish) referrer statistics for individual pages. If someone here were so inclined, they could make a tool to store referrers in a database and add some JavaScript on this end to populate the table; then we could find out where all of the traffic is coming from. We don't seem to rank highly on Google for many seemingly obvious queries. - TheDaveRoss 14:11, 5 June 2018 (UTC)[reply]

Important: No editing between 06:00 and 06:30 UTC on 13 June

This is just to tell you that your wiki will be read-only between 06:00 UTC and 06:30 UTC on 13 June. This means that everyone will be able to read it, but you can’t edit. This is because of a server problem that needs to be fixed. You can see the list of affected wikis on Phabricator.

If you have any questions, feel free to write on my talk page on Meta. /Johan (WMF) (talk) 12:37, 5 June 2018 (UTC)[reply]

For those who don't know what to do during that time, take up the harmonica - I have one to donate - send me an email with your address, and I'll send it off to you. --Genecioso (talk) 13:43, 5 June 2018 (UTC)[reply]

Sounds great! User:Equinox, c/o Wikimedia Foundation, 1 NOTPAPER Way, Banville, CA 95966. Equinox ◑ 16:41, 5 June 2018 (UTC)[reply]

In the post. Let me know when it arrives. --Genecioso (talk) 20:13, 6 June 2018 (UTC)[reply]

Oh no! We can't edit Wiktionary for 30 whole ~~years~~ minutes! What are we gonna do?! It's not like we can read a book or play a video game or watch a movie or anything in that time! I mean, it's 30 minutes! We are truly doomed as a species. PseudoSkull (talk) 03:21, 10 June 2018 (UTC)[reply]

Buryat IPA transcription

I don't know if this is appropriate, but can anyone check if the IPA transcriptions for the Buryat lyrics of the Buryatia regional anthem here are accurate? There might be some errors. If there are any errors, would anyone (who is familiar with Buryat phonology and/or phonologies of Mongolic languages) provide a better transcription than the current one? Thanks. 213.183.63.189 04:18, 6 June 2018 (UTC)[reply]

I didn't go through all of it, only the first three stanzas. I think the IPA transcriptions are very good. I only noticed one thing that I take issue with: the letter ө should, in my opinion, be transcribed as IPA œ. However, the transcription was added by w:User talk:Lucarubis, and he or she may have had a good reason for transcribing it as a simple o. We should ask him or her about it. —Stephen ^(Talk) 11:49, 10 June 2018 (UTC)[reply]

On the placement of constructed languages, and on the attestation of appendix-only languages

See also: Beer_parlour/2018/April

Wiktionary:Votes/pl-2018-04/Disallowing appendix-only languages has failed. I think Gamren has taken things a bit backwards, which made it look as if he wanted to delete the mainspace-like content (i.e. the entries) currently hosted in appendices altogether. If I've understood the issue correctly, that wasn't his intent at all. I think his view is that our current separation between main space and appendix-only languages

1) is artificial;
2) leads to our hosting unchecked content.

I'll address the second issue first. In my view, he's made a valid point: it would seem that, at present, Appendix-only languages are not subject to any attestation criteria. Is that really what we want, and what we wanted when we relegated Lojban to the Appendix namespace (Wiktionary:Votes/2018-02/Moving Lojban entries to the Appendix)?

I don't think so, or at least I hope not; in my opinion, all words, wherever they are, should be subjected to some kind of attestation criteria. Does everyone agree on that, or does it need to be put to the vote?

If that's agreed upon, the question would then be: what kind of attestation criteria do we want for Appendix-only languages?

the stringent ones of WDL?
the more lenient ones of LDL?
a middle ground (two quotes?)?
a mix of both (i.e. submitting some languages to the WDL criteria, others to the LDL criteria)?
something else, but something?

Whatever the answer, I see one problem: given that the distinction between main space languages and appendix-only languages would be:

neither one of attestation (since we'd have agreed that all of them need some kind of attestation);
nor one of strength of attestation (since we already make a distinction in the main space between WDL and LDL – it thus seems difficult to find a third one which would be completely specific to appendix-only);
nor one of "naturalness" (since there are both natural languages and constructed languages in the main space);

what exactly would be the criterion? This brings us back to problem number one: is there a meaningful distinction to be made between main space languages and appendix-only languages?

If we want to make strength of attestation the criterion, we move all LDL-subjected languages – natural or constructed – to the appendix namespace, and all WDL-subjected languages – natural or constructed – to the main space (that's a shit idea, if you ask me);
If we want to make "naturalness" the criterion, we move all constructed languages to the Appendix: all natural languages belong in the main space, and all constructed languages (that we've agreed to keep on Wiktionary) belong in the Appendix space, regardless of the attestation criteria we'll choose for each. That would be my preference, but I think many people will be opposed to that.

Given that neither of those solutions is particularly appealing, we have to look further. Fact one: there are only constructed languages in the appendix. Fact two: constructed languages kept in the main space are subject to stringent (WDL) attestation criteria. Fact three: ...

But if, in the end, there really isn't any meaningful distinction to be made (which, again, I'm not convinced of), we go back to Gamren's solution: disallowing appendix-only languages, which means:

1) moving everything to the main space;
2) working from there: what do we want to keep, under which criteria, and what do we want to see deleted for good?

I might have taken a shortcut or two, but I think that's the gist of it. I hope I haven't misrepresented the facts. --Per utramque cavernam 17:52, 6 June 2018 (UTC)[reply]

With LDL languages the presumption is that we're referencing a dictionary that recorded use, even if we don't have primary access to that use. With constructed languages that may not be the case. I could imagine doing something where words in a particular constructed language would be disallowed in mainspace unless they had three actual examples of use (*not* based on RFV, but required examples before they are in mainspace at all), moved to the appendix if all they had was a dictionary reference, and deleted otherwise. DTLHS (talk) 19:03, 6 June 2018 (UTC)[reply]
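DTLHS's three-way rule can be written down as a small decision function. This is only a sketch of the proposal as stated in the comment, not adopted policy; the threshold of three uses comes from the comment, while the names `Placement` and `place_conlang_entry` are hypothetical.

```python
from enum import Enum


class Placement(Enum):
    MAINSPACE = "mainspace"
    APPENDIX = "appendix"
    DELETE = "delete"


def place_conlang_entry(uses: int, has_dictionary_reference: bool) -> Placement:
    """Sketch of the proposal: at least three actual uses -> mainspace;
    otherwise a dictionary reference alone -> appendix; otherwise delete."""
    if uses >= 3:
        return Placement.MAINSPACE
    if has_dictionary_reference:
        return Placement.APPENDIX
    return Placement.DELETE
```

Note that the check on uses comes first, so an entry with both three uses and a dictionary reference ends up in mainspace.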

Indeed, that would be another solution. Per utramque cavernam 19:08, 6 June 2018 (UTC)[reply]

@DTLHS: By the way, I've edited my message a bit (and a little bit more) since you replied. Per utramque cavernam 19:19, 6 June 2018 (UTC)[reply]

I would like to have a bit of discussion regarding the bigger topic of votes and the resolution of Lojbannic issues. I think it reasonable to discuss whether better criteria for the inclusion of Lojban entries can be made. Since Lojban, like other languages, builds words from pieces or affixes, how do we decide which words to include on Wiktionary for natural languages like German that do the same? Jawitkien (talk) 02:14, 7 June 2018 (UTC)[reply]

Per's presentation of my position is accurate. @DTLHS As I pointed out to Meta, LDL doesn't mean that we have to allow dictionaries, if no descriptive and reliable ones exist, as would be the case for most conlangs. I would argue that an unreliable mention should be worth nothing, just like for natural languages. Generally, I don't understand treating conlangs and natural languages differently; if people have used a word in their writings, what does it matter whether we knew who invented it, or it developed through centuries?__Gamren (talk) 07:45, 7 June 2018 (UTC)[reply]

Except if very few people have written in it.__Gamren (talk) 12:10, 7 June 2018 (UTC)[reply]

The communities of editors for each language decide on what counts for LDL attestation. Unfortunately, the editing community for Lojban has been very antagonistic toward CFI, and was outright ignoring it for some time, so I don't think they are likely to exclude dictionaries that include newly coined words (which is all Lojban dictionaries that I know of). —Μετάknowledge^{discuss/deeds} 13:08, 7 June 2018 (UTC)[reply]

Having entries in languages that can't be attested by any reasonable means but are correct is inherently valuable. That's why we have reconstructed protolanguages (note that reconstructed languages are constructed languages — it's easy to forget that since they now have their own namespace). I want fish to be able to link to a Proto-Germanic etymon just as I want bat'leth to be able to link to a Klingon one. But that doesn't mean that Klingon belongs in mainspace, or that much of its lexicon would pass CFI (although potentially more than Lojban!). I think there is a meaningful distinction to be made there, and that we don't want Proto-Germanic in mainspace either. But we can definitely close the loophole by establishing explicit attestation standards for the appendix, perhaps one durably archived use. —Μετάknowledge^{discuss/deeds} 13:08, 7 June 2018 (UTC)[reply]

I don't think constructed language is commonly used to include reconstructed languages, and we don't have such a sense at constructed language.__Gamren (talk) 17:10, 7 June 2018 (UTC)[reply]

I'm talking about the concepts, not the words. —Μετάknowledge^{discuss/deeds} 12:00, 8 June 2018 (UTC)[reply]

I don't understand what you mean by that.__Gamren (talk) 14:27, 8 June 2018 (UTC)[reply]

@Per utramque cavernam, Dan Polansky, Metaknowledge, Jawitkien, DTLHS I've made a quick vote draft for inclusion criteria. Edit/discuss/{add more options} if you want. After that, I think we should have a vote to move the language fully or partially back into mainspace (because the decision to move it in the first place was influenced by inclusion concerns, which doesn't have to be related). Then we can decide what to do about the other languages. Sound good?__Gamren (talk) 10:51, 23 June 2018 (UTC)[reply]

I don't like any of those options. Option 1 is the same as before we moved everything to an appendix, so we would just be reverting that vote. Options 2 and 3 are too loose. DTLHS (talk) 16:00, 23 June 2018 (UTC)[reply]

Several people voted for moving to appendix because they just didn't want it in mainspace, not because they wanted looser criteria, so it's not just a reversal, and it might fall differently than before. Above, it seems like you want to include everything that has either three citations or a dictionary reference, is that correct? I've added that, but we need to specify what dictionaries are usable. For now, I've just put jbovlaste.__Gamren (talk) 16:46, 23 June 2018 (UTC)[reply]

The vote is very problematic. By bringing up a non-durable dictionary (jbovlaste), it muddies the waters considerably. What we should actually do is have a vote about attestation of constructed languages in the Appendix in general, rather than making a vote with such poor options that I can easily imagine all of them failing. —Μετάknowledge^{discuss/deeds} 20:20, 23 June 2018 (UTC)[reply]

Sigh. So, what do you want? You contributed to making this mess, why not give us your idea for cleaning it up, instead of complaining each time I show some initiative?__Gamren (talk) 07:46, 24 June 2018 (UTC)[reply]

I just told you what I want: a general vote concerning CFI for appendix-only languages. We could create new criteria for them (which I've suggested elsewhere), or just borrow the LDL criteria, but it should be consistent with how we normally approach attestation (e.g. everything must be durably archived, no matter how lax the attestational requirements). I'd be happy to help, but you'll have to take a step back from blaming me and recognise that your last vote failed and unless you craft it better, the next will as well. —Μετάknowledge^{discuss/deeds} 08:25, 24 June 2018 (UTC)[reply]

Sorry for the tone. I made a vote specifically for this language because I think people feel differently about it than the others. If a vote to introduce criteria for all of them could pass, a vote to introduce it to one language should also pass, since the result of the latter is a subset of the result of the former. When I asked you what you wanted, I meant exactly what options did you want to have a vote about?__Gamren (talk) 09:47, 24 June 2018 (UTC)[reply]

@Gamren Sorry for weighing in late, work is not kind to my wiktionary/Lojban efforts. Gamren, thank you for creating a sample vote form. I looked over your options and feel that constructed languages such as Lojban will tend to be more dynamic than many other legacy languages. Generally, the community of usage will determine the words that are "official" with perhaps a starter set created by the initial language creator. In legacy languages, we have a language corpus which shows usage as durably published. For a language still under construction, you are more likely to find online usages. They aren't as stable as durably published records, and they will tend to be less grammatically correct. I know for Lojban there are websites where new words may be proposed and then voted upon. Perhaps that can be part of the criteria for inclusion in Wiktionary. The more folks who like a word + meaning pair, the more likely they will use it, and thus the more likely the need to include it. I wonder if there is any tool that examines wiktionary access logs and tells over a time period how often the page for that word has been delivered to an end user? It might be very instructive. Of course, when the link is red, we will never know how many folks would have clicked on it if the page had existed. Jawitkien (talk) 12:32, 29 June 2018 (UTC)[reply]

If you go to an entry and click "Page information" in the sidebar to the left, you can see how many views it has had in the last 30 days. Not sure what you can do with that information, but, I mean, we could just make empty pages if the data really was useful (not that I think we should). Also: if Lojban has such a "starter pack", we could add some option(s) permitting those words. I looked at The Complete Lojban Language, and while it contains a glossary, it calls it "brief and unofficial".__Gamren (talk) 12:48, 29 June 2018 (UTC)[reply]

As I'm relatively new to the language, I would read the phrase "brief and unofficial" as a warning that the language has well-defined construction rules, and that the listing there is incomplete because the writer did not apply those rules to each entry in the glossary, which would have produced a listing that could be considered more than "brief". Jawitkien (talk) 23:36, 12 July 2018 (UTC)[reply]

Finnish category redirects

The Finnish categories in Category:Category redirects which are not empty are populated by the {{suffix}} template. Looks like Rua moved the categories but did not update the template to point entries to the new categories. Can someone who knows Finnish confirm that the new categories are the correct ones and, if so, update the template so that it categorizes correctly? There is a German one too. - TheDaveRoss 13:16, 7 June 2018 (UTC)[reply]

All of the categories are for the front-vowel variants of the suffixes. It seems the ones we use for lemmas are the back-vowel variants. For example, -yys is the front-vowel variant of -uus. The practice I have mostly seen, and have been following myself, is to use the back-vowel variant with an |alt2= to create the appearance of the front-vowel suffix (so that the only category would be the one for the back-vowel variant that is being used as a lemma). The best approach would probably be to change them via a bot, AWB or something similar. (The conversion itself is easy: ä -> a, ö -> o, y -> u turns a front-vowel variant into a back-vowel variant.) SURJECTION _{·talk·contr·log·} 22:41, 7 June 2018 (UTC)[reply]
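The conversion described above is a plain character mapping. A minimal Python sketch, assuming the suffix is given in lowercase with its leading hyphen (the names `FRONT_TO_BACK` and `to_back_vowel_variant` are hypothetical), might look like this:

```python
# Finnish vowel-harmony pairs as described: ä -> a, ö -> o, y -> u.
# The neutral vowels e and i are not in the table, so they stay unchanged.
FRONT_TO_BACK = str.maketrans({"ä": "a", "ö": "o", "y": "u"})


def to_back_vowel_variant(suffix: str) -> str:
    """Convert a front-vowel suffix (e.g. "-yys") to its back-vowel lemma form."""
    return suffix.translate(FRONT_TO_BACK)
```

A bot would apply this to the suffix argument of each {{suffix}} call and keep the original front-vowel form in |alt2=, following the practice described above.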

I could probably take care of the smaller categories by hand, but -jä, -tön, -ys and -yys definitely need automation of some kind. SURJECTION _{·talk·contr·log·} 22:46, 7 June 2018 (UTC)[reply]

I am also going to leave -tä and -y to someone who has a bot or AWB, but I have now emptied some of the categories. SURJECTION _{·talk·contr·log·} 23:11, 7 June 2018 (UTC)[reply]

Thanks. If nobody else gets around to it before I do I can make a script to update the remaining entries. I might ping you again with some examples to make sure I am not messing everything up. - TheDaveRoss 23:19, 7 June 2018 (UTC)[reply]

@TheDaveRoss Still interested in doing this? SURJECTION _{·talk·contr·log·} 15:56, 12 August 2018 (UTC)[reply]

Sure, it will be another week or so before I would be able to get to it though. - TheDaveRoss 14:49, 13 August 2018 (UTC)[reply]

"Linguistic phenomenon of the week/month"?

I think it could be interesting to create a new section on the main page, similar to WOTD and FWOTD, to feature funny linguistic phenomena (such as rebracketings: a napron → an apron, etc.), or groups of funny lexical items (such as exocentric compounds: cutthroat, rotgut, spitfire, etc.), or words that maybe aren't that interesting in themselves, but denote an interesting lexical concept (libfix would be an example), etc.

I sometimes try to do that in my WOTD and FWOTD nominations, and hope that readers notice the categories at the bottom of the page, or pay attention to the etymology section, etc., but it's not really suited to that purpose. Plus I'm not too keen on featuring libfix or exocentric because 1) the WOTD waiting list is already several months long and 2) as I said, those words aren't that interesting in themselves.

What do you think? Would that be feasible?

@-sche, DCDuring, Metaknowledge, Sgconlaw and whoever is interested. Per utramque cavernam 18:31, 7 June 2018 (UTC)

I wouldn't have thought that there were very many instances of such things. SemperBlotto (talk) 18:52, 7 June 2018 (UTC)

Such things could be of interest to some. I doubt that it could be featured daily rather than, say, weekly or monthly. How many categories do we now have that might contain examples of "interesting" linguistic phenomena? I assume that we are talking about English, but I could imagine something similar for cross-language phenomena, though the audience might be small. DCDuring (talk) 22:47, 7 June 2018 (UTC)

Sure, if you’re happy to maintain it. I’d say make it less frequent (at least in the beginning) so you don’t kill yourself thinking of themes or looking for words. Also, it needs a catchier name ... — SGconlaw (talk) 01:24, 8 June 2018 (UTC)

We should call it "Word Focus", "Weekly Fun", "Wondrous Formant" or "Wordishly featurized". Funnily enough, these would all be acronymed as WF. --Genecioso (talk) 11:44, 8 June 2018 (UTC)

@DCDuring: I'd try to feature things relevant to English, but the ideas I've had tend to be cross-linguistic anyway.

@Sgconlaw: Yes, you're right, I'll have to come up with a better name. What about "linguistic thingamajig of the week"? It almost rhymes :p

@Sgconlaw, DCDuring: Yes, daily or even weekly seems unreasonable. I think monthly would be manageable (I've already gathered a dozen ideas on my userpage, so we would be set for one year), but maybe a bit slow? Per utramque cavernam 16:58, 8 June 2018 (UTC)

I don't know how we'd handle it, exactly, but obligatorification is another interesting phenomenon. (On which note, I notice we have an entry for wreak havoc but also an entry for wreak which would seem to make clear that it is SOP even if it is the most common collocation these days.) - -sche (discuss) 19:13, 8 June 2018 (UTC)

I abstain on this. It's interesting, but we're not an encyclopaedia of linguistics (Wikipedia covers that); we're a dictionary. It's appropriate to feature words and categorise them according to the phenomena that have affected them, but featuring those phenomena is going out on a limb a bit. —Μετάknowledge^{discuss/deeds} 12:01, 8 June 2018 (UTC)

I would be concerned about maintainability, in terms of the maintainer(s) burning out (several times over the years WOTD has gone unupdated because maintaining it was too much work) and things to feature running out. Setting it monthly would make it manageable. I don't know how much interest there would be in it, and it might be boring that the featured thing was the same for a month at a time, but perhaps we could make it into an invitation (announce it monthly in the BP), à la fr.Wikt's LexiSessions, for new and old editors to edit in the featured area, e.g. identifying/adding libfixes, cases of phonetic erosion, etc. - -sche (discuss) 17:37, 8 June 2018 (UTC)

This might be off-topic/redundant, but we could do joint focus weeks for WOTD and FWOTD that feature the same phenomenon, which I personally would find interesting. And have a box above the two WOTD boxes explaining the phenomenon a bit, probably with a Wikipedia link too. This way cool phenomena could be featured more visibly but it doesn't have to be every week. – Julia ☺ ^{• formerly Gormflaith •} 17:46, 8 June 2018 (UTC)
I like this, just because more dedicated focus weeks reduce my workload in maintaining FWOTD. —Μετάknowledge^{discuss/deeds} 01:04, 9 June 2018 (UTC)
I second that. I think it would be more manageable and might make (F)WOTD more interesting. Andrew Sheedy (talk) 17:04, 15 June 2018 (UTC)

Persian as ancestor of Tajik

For some reason, Persian has not been set as an ancestor language of Tajik. I'm sure this happened only by accident, but let me nevertheless explain the case. "Persian" is Modern Persian, which begins in the 8th century AD. So in order to claim that Persian is not an ancestor of Tajik, one must claim that for the past 1200 years there has existed a Tajik language independently derived from Middle Persian. In fact, the area was probably only Persianized by the Samanids in the 9th and 10th centuries. Moreover, Tajik is still often considered a mere dialect of Persian, and otherwise its independent history begins only with Russian rule in the 19th century. So may I ask you to please make the appropriate changes? Thank you! (PS: I'm neither Tajik nor Iranian nor Afghan, so no nationalism involved.) — This unsigned comment was added by 84.188.181.78 (talk) at 22:28, 7 June 2018 (UTC).

The current arrangement is in fact intentional and is the result of Wiktionary talk:About Persian#Tajik. I have no special knowledge of the matter, so I have only edited the modules in the ways the editors there have requested. The situation is complicated; if it needs to be changed again, we'll need to be sure our Persian editors / the people who participated in that discussion are on board. - -sche (discuss) 00:03, 8 June 2018 (UTC)

Tajik is descended from Classical Persian, but since we treat Classical Persian and Modern Persian as the same thing on en.wikt, it is a bit confusing. Tajik is not descended from Middle Persian, though. --Victar (talk) 19:25, 28 June 2018 (UTC)

fake news!

It's been bugging me for months how people keep saying "fake news" about things which are not any kind of news at all. Usually just as a way of disagreeing with something another person said.

I was surprised to see we don't currently have an English entry for fake news (just a Danish one), and though this misuse bugs the hell out of me, as a hobbyist lexicographer I also realize this is a non-sum-of-parts and hence unexpected phenomenon that we should be documenting, being descriptivist and all.

Sometimes it seems to be used as an interjection: Person A says something; person B yells "FAKE NEWS!"

But other times it's used as a noun in a sentence such as "Well that's fake news".

I'm not sure I've seen it in print but I can definitely provide some YouTube links to people saying it in videos, and I bet it won't be difficult to find used in the same way in forums, comment sections, etc. online.

Comments anyone? — hippietrail (talk) 07:18, 9 June 2018 (UTC)

Aren't those people just either deluded or deliberately misapplying the term, like when people accuse any dissenter of being a "shill" for the other side? Equinox ◑ 14:06, 9 June 2018 (UTC)

FWIW I use it humorously as

a) a way of disagreeing with a popular opinion — "Salted caramel ice cream is fake news. It's not even that good." (= It's overhyped, therefore it's "fake". idk)

b) synonymous with "No way"; doesn't imply you don't believe it, but that you don't want to — "The Cavs lost again last night." / "That's fake news. How'd they lose again?"

It's just a funny way to make fun of Trump when it's not about actual news. Like pronouncing China as /ˈdʒaɪnə/ or bigly or, recently, "Thank you X, very cool!". — Julia ☺ ^{• formerly Gormflaith •} 15:59, 9 June 2018 (UTC)

That could be when it's used by non-Trump supporters. But I've also seen it used by Trump supporters when not about news. It wouldn't surprise me to see people totally ambivalent about Trump using it this way too. — hippietrail (talk) 05:07, 11 June 2018 (UTC)

Could be that "fake news" is a synonym of "liberal lies/propaganda"? Anyway, I've been trolling around Twitter (not a source! I know) and found a few interesting instances of it being used either as an adjective or not in reference to news: [1][2][3][4][5][6][7][8][9][10][11] – Julia ☺ ☆ ^{• formerly Gormflaith •} 14:52, 11 June 2018 (UTC)

This is a really interesting phenomenon that I have been personally eyeing for a while now. "Fake news" was a term used in normal parlance (the mainstream media, casual conversation, etc.) to refer specifically to nonsense semi-news outlets that peddle in wild conspiracy theories and which serve as fronts for selling herbal supplements (e.g. InfoWars). Politically, they are all very right-wing and sometimes "libertarian" as well. Then, the word got perverted by the alt-right to mean "actual news and facts" and the way it is used is as a shorthand for saying, "[x reasonable opinion and set of facts] is fake news/obviously biased and meant just to impede our movement, since normal human beings believe it." It's a fascinating and scary phenomenon. —Justin (koavf)❤T☮C☺M☯ 00:21, 12 June 2018 (UTC)

I think that underlying the peculiarities of the use of fake news is a meaning shift in news. If the meaning is learned inductively, then the content of media that label themselves as offering news becomes the meaning of news. To the extent that news is aimed at increasing viewership/readership, as it usually is when the media are advertiser supported, then 'news' comes to include 'entertainment news', 'sports news', 'weather news', 'human interest news', 'lifestyle news', online video content, etc., principally delivered in 30-90-second bits, much of it managed by publicists. If that constitutes real news, what defines fake news? DCDuring (talk) 11:50, 12 June 2018 (UTC)

Maybe we should just wait a decade or so for the meaning to settle down before we attempt to define this (ha ha that's not going to happen). DTLHS (talk) 18:21, 15 June 2018 (UTC)

module:ine-nominals

The module ine-nominals generates Proto-Indo-European declension tables for nouns, pronouns, and adjectives. I'm changing the accusative plural desinence from "*-ns" to "*-ms" for the reasons outlined on its discussion page. I've already pinged some IE editors and no one has opposed; if anyone does oppose this change, please let me know. Greetings -Tom 144 (𒄩𒇻𒅗𒀸) 00:57, 10 June 2018 (UTC)

A category for latinx, lxs, etc

Would it be useful to gather words like these, where x is used to gender-neutralize a word, into a category? As lxs shows, the phenomenon seems to go a bit beyond / be a bit different from cases where -x is an ending, which could otherwise arguably be handled by the existing suffix categories. What should the category be called? "Langname terms spelled with gender-neutral X"? (See also Xicana, an apparently distinct phenomenon.) - -sche (discuss) 03:48, 14 June 2018 (UTC)

Is this a thing in languages other than Spanish? I've only encountered -x and -e and -@ as replacements in Spanish but not (e.g.) Portuguese. Also, I'm not sure of the best way to document this, but these -x constructions actually come from English-speaking Anglo and Hispanic sources in the United States far more commonly than from non-U.S. Hispanics. They have been adopted increasingly by Hispanics whose mother tongue is Spanish (i.e. not bilingual American Hispanics), but their origin is as a kind of Anglo hypercorrection. —Justin (koavf)❤T☮C☺M☯ 04:27, 14 June 2018 (UTC)

I've seen it in Spanish, English (including with some words not of Spanish origin, like alumnx, womxn and hxstory, the latter of which are good examples of non-suffixal use of x) and German (although I don't know how many of the examples meet CFI...), and apparently it also happens in Portuguese. (Someone should check Italian, Catalan and other such languages for examples, too.) - -sche (discuss) 04:54, 14 June 2018 (UTC)

@Koavf All three suffixes you mentioned are also used in Portuguese. — Ungoliant ^(falai) 15:18, 14 June 2018 (UTC)

@-sche, do you think it is necessary to separate words with x from other gender-neutral replacements/coinages? If not, something like Category:English gender-neutral terms is sufficient. — Ungoliant ^(falai) 15:18, 14 June 2018 (UTC)

I do see some benefit to separately categorizing xs vs @s (including e.g. Pin@y where again it isn't a suffix), etc, but I suppose lumping them all together would also work. My concern with calling the category "gender-neutral terms" is that it could attract terms that are, well, (merely) gender-neutral, like person or scientist. I note that for example sex worker is in that category even though it was coined to replace also-technically-gender-neutral terms like prostitute, not to gender-neutralize them but to recognize the work aspect of sex work. And also, that none of the x or @ terms are currently in that category. Still, that's a fallback if we don't want to make a more specific (sub)category for these. - -sche (discuss) 17:12, 14 June 2018 (UTC)

I remember some cool dude making entries for amigx too. --Harmonicaplayer (talk) 17:53, 14 June 2018 (UTC)

Templatize "a native or resident of"

The definitions of our words for "person who is from or in [place]" are a haphazard mix of "resident of X", "native of X", etc. The differences suggest the scope of the words differs: for example, "Asturian" is defined as "a native of Asturias", while "Minorcan" is "an inhabitant of Minorca", and "Madagascan" as "a native or inhabitant of Madagascar". But in fact you can still call a native who no longer lives in Minorca a "Minorcan" and, in the same situations as you can call a non-native resident of Madagascar a Madagascan, you can call a non-native resident of Asturias an Asturian. I suggest we create a simple template for these definitions so the wording can be harmonized (to something to the effect of "A native or resident of {{{1}}}") whenever applicable. (Compare {{place}}, but this template could be much simpler.) Is this a good idea? Obviously, in the rare case where a word does refer exclusively to a native or to a resident, that could be spelled out manually like all the definitions are at present. - -sche (discuss) 04:15, 14 June 2018 (UTC)

Agree. Boilerplate definitions (or glosses) should be templatized and ultimately stored at d:. —Justin (koavf)❤T☮C☺M☯ 04:26, 14 June 2018 (UTC)

That seems reasonable.__Gamren (talk) 18:09, 15 June 2018 (UTC)

It would be good to have a text that is equally appropriate for places (towns, cities, ...), regions (islands, provinces, ...), countries, and even continents. It is not unusual to call a person of descent from Zimbabwe (for example) a Zimbabwean, even if they have never been to Zimbabwe. So what about this?

{{demonym|Asturias}} → A person from, or of descent from, Asturias
{{demonym|Minorca}} → A person from, or of descent from, Minorca
{{demonym|Zimbabwe}} → A person from, or of descent from, Zimbabwe

Assuming that the pagename is also the related-to adjective, the output could also be made to read, "A person from Zimbabwe or of Zimbabwean descent". In case that is not appropriate, a second parameter could be used for an alternative descent adjective, for example, for Brit:

{{demonym|Britain|British}} → A person from Britain or of British descent

The value "-" for that parameter could then signify that the descent clause is to be omitted in its entirety.

An unresolved issue is that English grammar may require the definite article while it should not be part of the link: "... from the Dominican Republic, ...". Perhaps an initial part of the first parameter up to the first word with a capital letter could be copied to the output but kept outside the link. And if a parameter already contains a link, it should just be copied verbatim, so the Dominican Republic and the [[Dominican Republic]] will have the same effect. --Lambiam 23:51, 24 June 2018 (UTC)
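The proposed {{demonym}} behaviour (default wording, an optional descent adjective, "-" to drop the descent clause, and leading lowercase words such as "the" kept outside the link) can be sketched as follows. This is a hypothetical Python emulation of the proposal, not an existing template or module; the variant that derives the adjective from the pagename is not modelled, since the pagename is not available here:

```python
from typing import Optional


def link_place(place: str) -> str:
    """Link the place name, keeping leading lowercase words (e.g. "the") outside the link."""
    if "[[" in place:
        # The parameter already contains a link: copy it verbatim.
        return place
    words = place.split()
    for i, word in enumerate(words):
        if word[:1].isupper():
            prefix = " ".join(words[:i])
            name = " ".join(words[i:])
            return f"{prefix} [[{name}]]" if prefix else f"[[{name}]]"
    # No capitalized word found: link the whole string.
    return f"[[{place}]]"


def demonym(place: str, adjective: Optional[str] = None) -> str:
    """Hypothetical emulation of the proposed {{demonym}} output."""
    linked = link_place(place)
    if adjective is None:
        return f"A person from, or of descent from, {linked}"
    if adjective == "-":
        # "-" omits the descent clause entirely.
        return f"A person from {linked}"
    return f"A person from {linked} or of {adjective} descent"


print(demonym("Asturias"))
print(demonym("Britain", "British"))
print(demonym("the Dominican Republic", "-"))
```

With this design, "the Dominican Republic" and "the [[Dominican Republic]]" produce the same output, as suggested above.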

'Male-and-female' vs 'unisex' given names

Currently, some unisex given names use {{given name|female|or=male}} (or male|or=female), which displays as "female or male given name" and categorizes them exclusively as "male given names" and "female given names" but not as "unisex given names". Others use {{given name|unisex}}, which displays "unisex given name" and categorizes them exclusively as unisex but not as male or female.
IMO, all the preceding inputs should categorize the names as male, female, and unisex, but even if the categorization is fixed, the inconsistent definitions seem undesirable.
I'd like to switch them all to {{given name|unisex}}. Thoughts? Note that I'm only talking about editing entries like Dakota, River and probably Indiana, where the name has the same origin when applied to men as when applied to women; I'm not talking about entries where the male and female names have separate origins or contexts, like Dana, Jocelyn, and Jess. - -sche (discuss) 04:46, 14 June 2018 (UTC)

English names rarely stay unisex. It's often necessary to specify that one sex is more common, or that US and British usage is different, or that the gender has changed at some point. Defining George or Shirley as "unisex given names" would be confusing. In some languages unisex names may be the norm (Chichewa, Hawaiian) while in others they are forbidden (Finnish). It would be fine if someone could make a bot adding the unisex category to any name with male and female definitions in the same language.--Makaokalani (talk) 15:43, 14 June 2018 (UTC) To avoid confusion, I would rather remove the new unisex parameter from Template:given name and have a bot replace it with "male|or=female". Though I've always thought "male|and=female" would sound more definite. --Makaokalani (talk) 12:11, 15 June 2018 (UTC)

In fact, I've noticed that the template/module accepts (and adds a category for!) anything that is added in the first parameter, so e.g. {{given name|dumb|lang=en}} gets put in "English dumb given names". Probably the module should be updated so that setting the parameter to anything other than "male", "female" or "unisex" puts the entry into a cleanup category we can monitor. We shouldn't do away with "unisex", precisely because for many languages it's the best term, and even in English there are names it's applicable to. Whereas, names like George shouldn't have their male and female uses combined onto one line, anyway, because they need context/commonness labels. I agree that the wording should be changed from "or" to "and" (I think I see how to make that change to the module) for any single-definition-line names where "male [or/and] female" is kept rather than switched to "unisex". - -sche (discuss) 18:37, 15 June 2018 (UTC)

@-sche: The module now adds a tracking template for unrecognized genders. — Eru·tuon 19:55, 15 June 2018 (UTC)

Thanks! One thing that turned up was this, with separate "female given name" and "male given name" on the same line, which only happened to be caught because of a stray space, but should still be combined even if not typoed. I'll mention that on WT:T:TODO. - -sche (discuss) 22:03, 15 June 2018 (UTC)
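The two categorization rules proposed in this thread (every male/female/unisex combination categorizes as male, female and unisex; any other first parameter goes to a cleanup category) can be sketched in Python. This is a hypothetical emulation for discussion, not the actual Lua code of the given-name module, and the cleanup category name is invented:

```python
from typing import List, Optional

RECOGNIZED_GENDERS = {"male", "female", "unisex"}
# Hypothetical name for the cleanup/tracking category.
CLEANUP_CATEGORY = "Given names with unrecognized gender"


def given_name_categories(lang: str, gender: str, or_gender: Optional[str] = None) -> List[str]:
    """Return the categories a given-name definition would receive under the proposal."""
    genders = {gender} if or_gender is None else {gender, or_gender}
    categories = []
    if genders - RECOGNIZED_GENDERS:
        # Anything other than male/female/unisex goes to cleanup instead of
        # creating a category such as "English dumb given names".
        categories.append(CLEANUP_CATEGORY)
    known = genders & RECOGNIZED_GENDERS
    # Proposed rule: "unisex", "female|or=male" and "male|or=female" all
    # categorize as male, female and unisex.
    if "unisex" in known or {"male", "female"} <= known:
        known = {"male", "female", "unisex"}
    categories.extend(f"{lang} {g} given names" for g in sorted(known))
    return categories


print(given_name_categories("English", "female", "male"))
print(given_name_categories("English", "dumb"))
```

Under these rules {{given name|unisex}} and {{given name|female|or=male}} end up in the same three categories, so the choice between them becomes purely a matter of display wording.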

Clarification to Wiktionary:Entry layout

I felt that some of the text in the section List of headings was particularly unclear, in the sense that a reader would not get the intended meaning unless they already knew it. So I devised a replacement text. I cannot apply it myself; the page is locked to prevent editing. "View source" calls up a text that suggests recommending any additions or changes to the page on its talkpage. Which I duly did, here: Wiktionary talk:Entry layout#Indentation?. That was three-and-a-half months ago, but nothing happened. --Lambiam 21:03, 16 June 2018 (UTC)

...and that one editor is back

See Wiktionary:Beer_parlour/2018/May#Possible_IP_range_blocks_required. The ranges are the same and the edits not much different at all. SURJECTION _{·talk·contr·log·} 14:13, 18 June 2018 (UTC)

I'm still not sure how best to handle this; there are lots of seemingly unrelated editors in those ranges, some with accounts and some without. Anyone have any thoughts about creative methods for reducing this person's edits without blocking a bunch of good contributors? - TheDaveRoss 14:48, 18 June 2018 (UTC)

I would probably set an IP range block but allow account creation and only block anon edits. SURJECTION _{·talk·contr·log·} 15:11, 18 June 2018 (UTC)

Still an issue. Found a new broadband IP: Special:Contributions/82.203.184.19. Actively used alongside the mobile IP ranges reported earlier. SURJECTION _{·talk·contr·log·} 12:25, 24 June 2018 (UTC)
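The trade-off discussed above (stopping one editor versus blocking the unrelated editors who share the range) depends on how many addresses a range block covers. Range blocks are written in CIDR notation, so Python's standard ipaddress module can show the size and membership of a candidate range. The /22 range below is a made-up example for illustration only, not the range actually reported; blocking only anonymous edits while allowing account creation is a block setting on the wiki and is not modelled here:

```python
import ipaddress

# Hypothetical range for illustration only; not the range discussed above.
HYPOTHETICAL_RANGE = "82.203.184.0/22"


def in_range(ip: str, cidr: str) -> bool:
    """Return True if the address falls inside the CIDR range."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


network = ipaddress.ip_network(HYPOTHETICAL_RANGE)
print(network.num_addresses)                          # 1024
print(in_range("82.203.184.19", HYPOTHETICAL_RANGE))  # True
print(in_range("82.203.200.1", HYPOTHETICAL_RANGE))   # False
```

Each bit removed from the prefix length doubles the number of addresses covered, and with it the potential collateral damage.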

Onomatopoeic PIE *a

Do some *a words in PIE exist simply because they're onomatopoeic? Case in point, *kan- ("to sing") and *al-al- (“to shout”) (cf. աղաղակ (ałałak, “shouting”) and ἀλαλαγή (alalagḗ, “shouting”)). --Victar (talk) 16:49, 19 June 2018 (UTC)

Old Frisian /j/

Hey, user @Leornendeealdenglisc and I were discussing the usage of j in Old Frisian orthography. As j didn't really come onto the scene until much later, /j/ typically appears to have been rendered in contemporary texts as i; however, some scholars have chosen to transcribe it as j. Case in point, ieva ~ jeva (see references).

Should we normalize to j, and if so, just word-initially, or everywhere before vowels, e.g. tohakia > tohakja? I have no problem with the former, but the latter seems a bit hyper-corrective to me. @Metaknowledge, -sche, Leasnam, Korn --Victar (talk) 20:11, 22 June 2018 (UTC)

I'm not familiar with Old Frisian to an extent where I could contribute to this discussion. Would that remove any ambiguity? Thinking about it made me notice btw. that we're not consistent with j in Old Saxon, where we write it ⟨i⟩ in words like hebbian. Korn [kʰũːɘ̃n] (talk) 21:28, 22 June 2018 (UTC)

@Korn, I can't think of any situations where it would disambiguate. I believe we transcribe OS and OHG primarily with j only word-initially. --Victar (talk) 22:14, 22 June 2018 (UTC)

@Mnemosientje, Isomorphyc --Victar (talk) 06:48, 23 June 2018 (UTC)

Reminds me of how we sorta treat i and j in Latin entries (cf. iaceō vs. jaceō). We show that as i unless actually attested as j, if I'm not mistaken. Would this same reasoning also work for Old Frisian? Leasnam (talk) 15:30, 24 June 2018 (UTC)

As for "unless actually attested": For some languages (including Old and Middle Germanic languages) a so-called normalised spelling is used in Wiktionary. For example, ⟨u⟩ and ⟨v⟩ are often normalised by their (assumed) sound. E.g. it's silvir albeit only attested as "Siluir" (capital S as beginning of a sentence). Compared with WT:About Old Saxon#Normalisation ("Any other attested spellings may be listed under an ===Alternative forms==="), there should be both an actually attested "siluir" and a fictional "silvir". It would also be nice to mark fictional (normalised but not properly attested) entries with a note like "This spelling is not attested, but normalised", maybe with an addition giving some sources and actual spellings. -84.161.6.98 01:10, 3 July 2018 (UTC)

Only information specific to an entry belongs in the entry, for ease of work and readability. Normalisation of e.g. Old Saxon is something that applies to every single entry and hence is noted in About: Old Saxon, so that our entries don't become spammed with annotations. Korn [kʰũːɘ̃n] (talk) 11:09, 3 July 2018 (UTC)

Not all normalised spellings are unattested - sometimes the normalised spelling does indeed occur somewhere (well, maybe without diacritics similar to Latin and macra, and with alterations of letters like turning ſ into s if there is only ſ, ı into i if there is only ı, similar to Latin and old ALL CAPS style). On the other hand, some normalised spellings could be unattested, and some are unattested if later editions (19th to 21st century) are not accepted. Thus the above does not apply to all entries. And it might be useful information to know that silvir (< siluir, text differing between u/v in another way than u=vowel, v=consonant) is fictional, while mīn (< mın, text without i), des (< deſ, text without s), daȥ (< daz, ȥ treated like z+diacritic) do exist. -84.161.48.172 02:38, 5 July 2018 (UTC)
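The two normalization options Victar raised (i -> j word-initially only, or before any vowel) can be stated as string rules. A minimal Python sketch, assuming lowercase input and an illustrative set of following vowels; it is not an agreed normalization policy. A plain string rule cannot tell consonantal i from vocalic i, which is why the "everywhere" option risks being hypercorrective:

```python
import re

# Vowels that may follow consonantal i. This set is an illustrative
# assumption; "i" itself is left out so that "ii" is not rewritten.
FOLLOWING_VOWELS = "aeouāēōūâêôû"


def normalize_j(word: str, everywhere: bool = False) -> str:
    """Rewrite i as j before a vowel: word-initially only, or everywhere."""
    pattern = rf"i(?=[{FOLLOWING_VOWELS}])"
    if not everywhere:
        # Anchor the rule to the start of the word.
        pattern = "^" + pattern
    return re.sub(pattern, "j", word)


print(normalize_j("ieva"))                      # jeva
print(normalize_j("tohakia"))                   # tohakia
print(normalize_j("tohakia", everywhere=True))  # tohakja
```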

Can a thesaurus page have multiple senses?

Should a thesaurus page have a single sense or are multiple senses in a single page OK? I'm not sure how to proceed at the moment.

When I started creating thesaurus pages, I kept to a single sense for each page. Mostly, this was for no better reason than that the existing pages appeared to do this. That is why there are separate Thesaurus:workaround (noun) and Thesaurus:kludge (verb) pages and why there are separate pages for Thesaurus:smell, Thesaurus:olfact, Thesaurus:olfaction, etc. (much of Category:Thesaurus:Smell could come under the headword "smell"). However, there are some thesaurus pages with multiple senses and multiple parts of speech, such as Thesaurus:death, Thesaurus:surprise and Thesaurus:worsen.

The guidelines at Wiktionary:Thesaurus and Wiktionary:Thesaurus/Format are not explicit either way. I haven't found anything else yet to clear this up. Multiple senses in a single page would fit with the mainspace. Single senses are clearer and easier to interlink, both for semantic relationships and for other languages (because thesaurus pages are single semantic concepts rather than words). Either could seem more intuitive to different people. - AdamBMorgan (talk) 13:10, 25 June 2018 (UTC)

Requested move: Μεσόγειος θάλασσα → Μεσόγειος Θάλασσα

At some time in the past, the page Μεσόγειος Θάλασσα (Greek for Mediterranean Sea) was moved to Μεσόγειος θάλασσα. I think this was a bad move; the common practice is to capitalize both words. See, e.g., the Greek Wiktionary at Μεσόγειος Θάλασσα and the Greek Wikipedia at Μεσόγειος Θάλασσα. In English we also write Mediterranean Sea and not *Mediterranean sea. I'd move the page back if I could; unfortunately, the redirect page does not have a trivial edit history because interwiki links were added (and then removed when we got rid of interwiki links in general), so I cannot perform the move ("You do not have permission to move this page"). --Lambiam 04:40, 29 June 2018 (UTC)

Seems reasonable.

Done. - -sche (discuss) 21:15, 2 July 2018 (UTC)

naming audio pronunciation files

For English and non-English words. Could someone kindly help by expanding/clarifying the audio Help page? I left a message at Help audio Talk. Is it Xx(lang) - cc(country) - word.ogg, or could this also be allowed: Xx(lang) - word - cc/dialect - accent? Thank you. sarri.greek (talk) 09:52, 30 June 2018 (UTC)

Distinguishing between "Derived terms" and "Derived compound verbs"

This topic is primarily for Azerbaijani and possibly other Turkic languages. In görmək (“to see”), I attempted a way of distinguishing between derived terms, under which I put deverbal nouns and simple deverbal verbs (verbs that are derived from another verb by suffixation) on the one hand, and derived compound verbs, where I put light verb constructions and the like (complex predicates consisting of at least two separate words), on the other. Relatively few such terms are derived from görmək, but there are verbs from which many more compound verbs are derived, and in such cases I find it very useful to distinguish between the two categories instead of mixing derived nouns and simple verbs with compound verbs. What do you think? @Anylai, Crom daba etc. Allahverdi Verdizade (talk) 12:18, 30 June 2018 (UTC)

@Allahverdi Verdizade. For what it's worth, hold court is listed as a derived term under hold, and take advantage is listed as a derived term under take, even though you might call them derived compound verbs. So this refinement of the classification may not be needed. Some of these many cases could probably also be classified as hyponyms, as you can see for Turkish yapmak. --Lambiam 23:15, 4 July 2018 (UTC)

@Allahverdi Verdizade A similar issue comes up in Russian verbs, where there are many prefixed derivative verbs of simplex verbs as well as derived terms of other parts of speech; cf. вари́ть (varítʹ) for an example. I think you probably shouldn't create a new heading for these, but you could create a subheading, maybe like this: Benwing2 (talk) 01:17, 5 July 2018 (UTC)

Derived terms

görüş (“view; meeting”)
görünmək (“to look, to appear”)
- görünüş (“appearance”)
görkəm (“appearance; view”)

compound verbs:

məsləhət görmək (“to advise”)
yolunu görmək (“to bribe”)

Subheading is an excellent idea. Allahverdi Verdizade (talk) 11:27, 6 July 2018 (UTC)

WOTD: April Fools' Day 2019

Proposals for a theme for the April Fools' Day period next year (1–6 April 2019) for Word of the Day are welcome. Words should preferably be chosen from the list of existing nominations, which is already rather long. — SGconlaw (talk) 19:05, 30 June 2018 (UTC)

Use "laurel" with an audio clip of "yanny"? "Groom of the stool"? - -sche (discuss) 19:09, 30 June 2018 (UTC)

We've been featuring a series of interesting words which have a common theme, rather than just gags. The theme should preferably be more intriguing than something like "nouns". This year's was words about unusual concepts. We can have up to six words in the series. — SGconlaw (talk) 19:15, 30 June 2018 (UTC)

How about "animal paradoxes/contradictions"? Examples: black swan, buffalo wing, butterfly effect, chicken-or-egg question, Cockney, flying fish, hen's teeth, horsefeathers, I'll be a monkey's uncle, infinite monkey theorem, lipstick on a pig, w:Man bites dog (journalism), neither fish nor fowl, raining cats and dogs, Schrödinger's cat, the straw that broke the camel's back, turtles all the way down, walking catfish, when pigs fly. If that's too broad or vague, I think I see at least a couple of themes within the theme. Chuck Entz (talk) 22:33, 30 June 2018 (UTC)

@Chuck Entz: that sounds cute. In what way are they paradoxes or contradictions, though? At the moment they just look like animal-related terms to me. — SGconlaw (talk) 10:47, 2 July 2018 (UTC)

Not all the examples I gave are perfect reflections of the theme, but there are enough so you can select the best. The apparent paradoxes or contradictions, in order: Until they went to the Southern Hemisphere, Europeans thought swans could only be white. Buffaloes don't have wings. How can a butterfly change the weather? Which came first, the chicken or the egg? Cockney is from cock's egg; only hens lay eggs. Fish normally swim rather than fly. Hens don't have teeth. Horses don't have feathers. Humans don't have nephews that are animals. How can monkeys on keyboards produce anything meaningful? Pigs are ugly/lipstick is pretty. Dogs bite men. Is it a fish or a fowl? Dogs and cats don't fall from the sky. How can a cat be both alive and dead? Straws are light/camels carry extremely heavy things. What's under the bottom turtle? Fish normally swim rather than walk. Pigs don't fly. Chuck Entz (talk) 12:47, 2 July 2018 (UTC)

[1] By the way, would it be correct to speak of a Spanish -dumbre suffix if there had been no word formed with it in Spanish proper, i.e. if all words "using" it had been inherited from Latin?
