Pronunciation of 'w' in Czech

Hi Dan, how would you pronounce 'w' when written in Czech? (e.g. Wikipedie, Winchester) – AWESOME meeos * (「欺负」我) 12:51, 2 January 2017 (UTC)

@Awesomemeeos: It is pronounced like v. Thus, Wikipedie would be /vɪkɪpɛdɪjɛ/. --Dan Polansky (talk) 09:09, 7 January 2017 (UTC)
You can hear the Czech pronunciation of v at váza. --Dan Polansky (talk) 09:09, 7 January 2017 (UTC)

Czech lemmatizer

A Czech lemmatizer is at

My favorite setting for the lemmatizer is as follows:

  • Task: Lemmatize
  • Tag set: Raw lemmas
  • Output: Plain

Example input: Komu není shůry dáno, v apatyce nekoupí. Komu se nelení, tomu se zelení.

Example output: kdo být shůry dát, v apatyka koupit. kdo se lenit, ten se zelený.

One use of this lemmatizer is that you pick a piece of Czech text, run it through the lemmatizer, wikify words and fill redlinks by creating Wiktionary entries. This redlink-filling activity was suggested by SemperBlotto some time ago, without the lemmatization part. Since I am interested in creating lemmas rather than inflected forms, I need a lemmatizer.

--Dan Polansky (talk) 18:39, 13 January 2017 (UTC)

The following Python script grabs the clipboard content, wikifies the words and puts the result back on the clipboard:

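import re
from tkinter import Tk  # Python 3 name; the module is Tkinter in Python 2
root = Tk()
root.withdraw()  # hide the empty main window
newContent = re.sub(r"([^ ,\.:;]+)", r"[[\1]]", root.clipboard_get())
root.clipboard_clear(); root.clipboard_append(newContent); root.update()  # put the result back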

The regex may need finetuning. --Dan Polansky (talk) 18:54, 13 January 2017 (UTC)
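
One possible direction for that finetuning, as a sketch only: the character class [^ ,\.:;] excludes just the space, comma, period, colon and semicolon, so a question mark, an exclamation mark or a line break would end up inside the link brackets. The variant below instead matches runs of letters, optionally joined by hyphens. The function name wikify and the decision to treat hyphenated words as one link are assumptions made for illustration, not part of the original script.

```python
import re

# Assumption: a "word" is a run of Unicode letters, optionally joined by hyphens.
# [^\W\d_] means "a word character that is neither a digit nor an underscore".
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def wikify(text):
    """Wrap each word in [[...]] and leave punctuation and whitespace untouched."""
    return WORD.sub(lambda m: "[[" + m.group(0) + "]]", text)

print(wikify("kdo být shůry dát, v apatyka koupit."))
# prints: [[kdo]] [[být]] [[shůry]] [[dát]], [[v]] [[apatyka]] [[koupit]].
```

In Python 3, string patterns are Unicode-aware by default, so Czech letters with diacritics are matched as letters without any extra flag.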

CFI vote

I made the CFI vote start at 19:00 today (my local time is 18:49). I put back the end date as well. I removed the 'premature' tag. I hope that was the right thing to do.

John Cross (talk) 16:50, 17 January 2017 (UTC)

What you did in Wiktionary:Votes/pl-2017-01/Trimming CFI for Wiktionary is not an encyclopedia 2 was fine, thank you. --Dan Polansky (talk) 18:51, 17 January 2017 (UTC)

Czech words for females

I am unsure how to mark up definition lines for Czech words for females such as učitelka, lékařka, ředitelka and prezidentka. The problem obviously applies to other languages as well, e.g. German Professorin.

One option that I have often used and that is quite possibly prevalent in the Czech entries is like this:

  1. female teacher

A disadvantage of that is that the word "female" does not usually appear in translation; you do not say "she is a female teacher" but rather "she is a teacher".

Another option that I must have used at least once is this:

  1. teacher (female)

What I do not like about this is that the disambiguator "female" appears only in the gloss, but maybe it's okay. Furthermore, I like the gloss to be an abbreviated definition, which "female" isn't; it would be "female teacher", which would lead to a repetition of "teacher" in:

  1. teacher (female teacher)

Another option that I must have seen somewhere is the use of a context label:

  1. (female) teacher

That actually looks okay since, in an English sentence like "she is a teacher", the subject of the sentence (she) is in the context of the predicate (be a teacher).

Based on the above, I may stay with "female teacher", or I may switch to "(female) teacher".

--Dan Polansky (talk) 11:49, 22 January 2017 (UTC)

  • It would be good if {{cs-noun}} allowed you to put "m=učitel" or "f=učitelka" in the headword. But, anyway, I think "teacher (female)" or "teacher (male)" is the way to go. SemperBlotto (talk) 11:54, 22 January 2017 (UTC)
I don't see "female teacher" (or indeed "male teacher") being a problematic definition: they don't have to be an exact word-for-word phrase that you can insert into a translation without thinking. There are plenty of English entries of the same kind, like usherette. Equinox 20:54, 22 January 2017 (UTC)

(outdent) Someone likes to use {{feminine noun of}}, it seems, seen in Catalan psicòloga or French masseuse. Lehrerin uses that as well, but used to use {{feminine of}}:

{{feminine of|lang=de|Lehrer|nodot=y}}, a female [[teacher]] {{gloss|person who teaches}}.

The above was converted to {{feminine noun of}} in July 2015 by MewBot. In the more distant past, Lehrerin used to have the following plain markup:

(female) [[teacher]] (a person who teaches)

--Dan Polansky (talk) 17:57, 4 February 2017 (UTC)

No Babel/Language Categories

Hi Dan Polansky, just a general question, how do you feel when user pages do not have Babel and/or have categories derived from them? I find it irritating. I wanted to find native French/Dutch/Japanese speakers, but they don't put Babel on their page, so I had to discover them from contributions from other pages – AWESOME meeos * (chōmtī hao /t͡ɕoːm˩˧.tiː˩˧ haw˦˥/) 22:55, 12 February 2017 (UTC)

I like Babel, which is why I often ask people to add it to their user pages. Babel is relatively important in a dictionary project, especially since we have seen multiple editors contribute in a plethora of languages they do not speak; with Babel, we know whether the person relied in part on their knowledge of the language or whether they had to go by sources alone. --Dan Polansky (talk) 10:18, 18 February 2017 (UTC)

Slovensko etymology

Hi Dan, I wonder why this Slovak word did not have its etymology fixed up before? (I had to do it myself) — AWESOME meeos * (не нажима́йте сюда́ [nʲɪ‿nəʐɨˈmajtʲe sʲʊˈda]) 19:43, 10 March 2017 (UTC)

What makes you think it came from OCS rather than from an inherited form? --WikiTiki89 19:57, 10 March 2017 (UTC)
It was based off the original etymology. — AWESOME meeos * (не нажима́йте сюда́ [nʲɪ‿nəʐɨˈmajtʲe sʲʊˈda]) 19:59, 10 March 2017 (UTC)
I did not edit the etymology of the Slovak entry Slovensko, from what I remember and from what I can see by a quick glance at the revision history. I am not into Slavic etymology; what I did in etymology many years ago is source English etymologies from Century 1911, checking with modern sources. --Dan Polansky (talk) 07:52, 11 March 2017 (UTC)

External link templates and excessive detail

Keywords: reference templates, baroqueness, ornament.

I strongly prefer external link templates that are simple, simply formatted and contain minimum identification information.

In particular:

  • ISBN and OCLC are not necessary for unique identification and present visual noise. In general, I feel exposing numerical identifiers in the user interface is rude to the user unless the user requested them or has a special urgent need for them.
  • I see no need to render Dictionnaire Illustré Latin-Français into English as "[Illustrated Latin–French Dictionary]"; the translation should be pretty clear to anyone who understands English.
  • I see no need to spell author names in full. That is unnecessary for unique identification, and adds words for the eye to read.

There is a cost to the skimming reader in being exposed to unnecessary detail for their eye to parse. Let us recall the busy search entry page of the AltaVista search engine from the last century, which was replaced by the beautiful minimalistic Google search entry page; the removal of items that were of no interest to the searching user was a major user experience improvement.

For those who feel a strong need for a lengthy, baroque identification, such identification could be placed in an appendix linked from the external link template. In such an appendix, there is plenty of room for extraneous detail, and there, it does not disturb anyone's skimming.

For templates that are hyperlinking to online sources as opposed to merely referencing paper sources, the full identification is also present on the linked site. The key purpose of these templates is to take the reader to the target page to learn more about a particular word. Once the hyperlink is there, getting to the target information-bearing page is a matter of a single mouse click, unlike in the age of paper information resources when the resource identification was key for getting to the page where the referenced information was located.

For Wiktionary maintenance purposes, fuller identification can be placed on the template documentation page. Thus, even when a website goes down, the template documentation history shows details of the resource once linked.

--Dan Polansky (talk) 07:40, 18 March 2017 (UTC)

To prevent further reverts, I have created a vote: Wiktionary:Votes/2017-03/Reference templates and OCLC. I remember a BP discussion on the subject where I argued extensive identification could be in appendices, but I cannot find the discussion. --Dan Polansky (talk) 08:16, 18 March 2017 (UTC)
You're definitely right that such a discussion about appendices happened, but I can't seem to find it either. I'll try and look around a bit more. —JohnC5 08:26, 18 March 2017 (UTC)
A little bit of discussion is now unfolding at Wiktionary:Beer parlour/2017/March#Vote: Reference templates and OCLC. --Dan Polansky (talk) 08:42, 18 March 2017 (UTC)

One online work that uses reasonably short link identification is Encyclopedia Britannica. To wit, its article on tiger[1] has an Additional Reading section containing K. Ullas Karanth, The Way of the Tiger: Natural History and Conservation of the Endangered Big Cat (2001) as an identification, with no ISBN and no place of publication. Its article on Australia[2] uses a similar style, e.g. Tony MacDougall (ed.), The Australian Encyclopaedia, 6th ed., 8 vol. (1996) and Australian Bureau of Statistics, Year Book, Australia (1978– ).

--Dan Polansky (talk) 12:15, 19 March 2017 (UTC)

