Wiktionary talk:About Serbo-Croatian/Archive


justification

I'm going to have to say that while well intentioned, this proposal has a number of serious flaws. Firstly, the ISO 639-1 sh code is deprecated. This is because ISO 639 has come to the decision that Serbo-Croatian is a macrolanguage (ISO 639-3 code: hbs), with three member languages, not a single language with three primary dialects as this proposal treats it. Secondly, because of the way the Wiktionary templates use language codes, it's quite useful to have the Serbian, Croatian, and Bosnian as separate entries. From my own work on Catalan entries, I find myself wishing there was a code for the Valencian dialect thereof, as it would make things easier when dealing with that dialect. (Maybe once ISO 639-6 comes out with its alpha-4 codes for dialects, there will be.) Last but not least, where available there should be links to the relevant foreign language wiktionary. The Serbian has 15182 entries, the Croatian has 2930, the Bosnian has 315, and the Serbo-Croatian a mere 42. If this proposal were followed, there would be the odd situation of treating on the English language wiktionary as dialects what those writing entries in the corresponding foreign language wiktionaries have chosen to come down strongly on the side of being distinct languages.

That said, given the high degree of mutual intelligibility, coming to a consensus as to common formatting and including cross links by default between the two scripts for all three affected languages is a good idea. Carolina wren 23:22, 3 March 2009 (UTC)Reply

Valencian is not a dialect of Catalan, they are two names for the same linguistic area. There are various political issues (minor differences between Catalan/Valencian governed by the IEC and the AVL, but mostly the same), usually it is better to refer to a particular standard rather than referring to a different language. E.g. Catalan/Valencian IEC, Catalan/Valencian AVL, Valencian RACV etc. - Francis Tyers 23:45, 20 July 2009 (UTC)Reply
The reason why the individual-language sh code got deprecated is that there is no national institution supporting that name any more. There is absolutely no reason not to use the sh code itself for our purposes, esp. given the fact that ISO won't assign two-letter language codes anymore, so it cannot get "overwritten" (plus, there is always 'hbs' as a backup), and that sh is used as the MediaWiki language code for SC wikiprojects.
You seem to take the assignment of ISO codes by SIL as some kind of argument for what is a "real" language or not. You shouldn't. Ethnologue is full of bizarre decisions: e.g. they treat Crimean Gothic as a "Gothic dialect", but they assign an ISO code to an imaginary language called Knaanic which I assure you doesn't exist. We cannot treat the opinion of some Christian organisation such as SIL as the ultimate arbiter on issues that are far more complex than they seem to realise. Wiktionary as a dictionary has its own reasons and goals: in this particular case I cannot imagine what English-speaking (not necessarily a native speaker) Wiktionary user would want to learn "Croatian", but not "Bosnian" and "Serbian". They're all differently stylised varieties of one particular Neoštokavian idiom that share identical inflection and 95% of lexis. By treating their "common core" as SC, and marking differences when they occur, we'd be doing a tremendous favour to any user of Wiktionary, not to mention the editors themselves, who otherwise need to triple or even quadruple identical content.
"because of the way the Wiktionary templates use language codes, it's quite useful to have the Serbian, Croatian, and Bosnian as separate entries. From my own work on Catalan entries, I find myself wishing there was a code for the Valencian dialect thereof, as it would make things easier when dealing with that dialect." You don't understand - standard B/C/S are all based on the same dialect. Two neighbouring villages on the Adriatic have more differences in their speech than any of those 3 (that's the paradox of the whole thing..). However, sub-standard dialect speakers basically always write in the literary dialect, and speak it when they come to bigger urban centres (similar to the MSA vs. Arabic dialects issue..).
I've posted Robert a note on whether the mapping of multiple FL wikts to one en. wikt header could be handled gracefully by Tbot. It's not particularly relevant tho, since all those wikts are effectively dead (bs no activity since Dijan left it, I personally created/edited most Croatian-language entries at hr wikt a long time ago, and sr wikt's 95% entries are some bot-generated proper names). --Ivan Štambuk 12:26, 4 March 2009 (UTC)Reply
I have no insight into this particular issue of Serbian and Croatian but I wonder whether there is an objective or near-objective way of determining a distance between two languages or two sets of terms and meanings, and whether it is possible to source the result of determining such a distance. I can imagine having a practically determinable metric of distance when we disregard meanings of terms and only focus on differences in the sets of terms.
I wonder how the distance between Serbian and Croatian stands in contrast to the distance between Czech and Slovak, and the distance between Spanish and Portuguese.
To back up the various claims about the similarity and dissimilarity of languages, I think it would be useful to have links to external sources that compare Serbian with Croatian. --Dan Polansky 14:30, 4 March 2009 (UTC)
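The term-set idea raised above can be made concrete with Jaccard distance, which looks only at which terms two vocabularies share and ignores meanings. This is a minimal sketch and not a validated measure of linguistic distance: the function name and the five-word samples are hypothetical, chosen only for illustration, and a real comparison would need large sourced word lists.

```python
def jaccard_distance(terms_a, terms_b):
    """Return 1 - |A & B| / |A | B| for two collections of terms.

    0.0 means the two vocabularies are identical, 1.0 means they share nothing.
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        # Two empty vocabularies are treated as identical by convention.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical toy samples (not real frequency data): four shared words
# plus one well-known lexical difference, kruh vs. hleb ("bread").
sample_croatian = {"voda", "ruka", "noga", "kuća", "kruh"}
sample_serbian = {"voda", "ruka", "noga", "kuća", "hleb"}

distance = jaccard_distance(sample_croatian, sample_serbian)
print(round(distance, 2))
```

The result depends entirely on which words are sampled, so any figure obtained this way would need to be quoted together with the word list behind it.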
It's not that I think the ISO 639 folks are paragons of virtue, it's that they've actually taken the time to hash out thorny political issues like this and look into the merits of the case. Do they make mistakes? Of course they do, especially with languages for which they make decisions on the basis of very little evidence of how people use the purported language(s). I doubt that factor has any bearing on their decision here, as there is plenty of evidence of how the Yugoslavs speak and write. Deferring to the ISO's judgment as to what is a macrolanguage, a language, or a dialect will save us having to hash things out repeatedly as the population of active wiktioneers changes.
If a decision is made to use Serbo-Croatian as a language header and deprecate the usage of Croatian, Serbian, and Bosnian, it shouldn't be done with respect to just Croato-Serbian, but rather as a general policy rule that applies to all ISO 639 macrolanguages.
Finally in reply to Ivan's point about what English speakers might want to know about the language(s), as a native speaker of English, let me say that if I were interested in learning a Yugoslav language/dialect, I'd likely start with Croatian, simply so I could avoid having to deal with Cyrillic as I was learning the language. Carolina wren 21:25, 4 March 2009 (UTC)Reply
Carolina, please don't use the term Yugoslav in some ethno-national sense as many people find it very insulting. Yugoslavs as some "nation" were fabricated by the commies, and never really existed. In 2001 census 176 Croats declared their nationality as "Yugoslavs" so you can imagine how popular that term here really is. Not to mention the fact that Croats and Bosniaks were 15 years ago for 5 years at war with army that called itself "Yugoslav People's Army", and the country that called itself "Yugoslavia". Esp. insulting is calling Croats/Bosniaks/Serbs/Montenegrins "Yugoslavs" and ignoring Slovenes and Macedonians which were also equal members of SFRJ, which insinuates certain propaganda theories which I shall not name here.
"Deferring to the ISO's judgment as to what is a macrolanguage, a language" - we should defer to arguments, not to some fabricated classification endorsed by a Christian organisation such as SIL. There is no such thing as "macrolanguage" in linguistics. There are some dozen ISO "macrolanguage" codes that we already use here as level-2 language names (Albanian, Arabic, Azeri, Inuktitut etc., see this list and find more examples by yourself) and I don't see why SC should be in any way different, esp. given the fact that the 3 SC standards are several orders of magnitude closer than any of those other sub-macrolanguages (some of which are not even mutually intelligible, which is also the case with some SC subliterary dialects that will fortunately never gain ISO codes themselves). We should use common sense and arguments rather than plainly trusting an organisation which has a predilection for assigning ISO codes to non-existent languages such as Knaanic.
What applies to other "macrolanguages" (whatever that term means) is of no concern to this particular policy. This pertains to Croatian, Serbian and Bosnian language entries at wiktionary and their organisation. Neither of them is "going away", no content is going to be "deleted": it's just going to get optimized for conciseness to enhance learning experience of what was 20 years ago (and in most modern linguistic books still is) treated as simply "Serbo-Croatian". Why have 4-6 entries when you can have 2 or 3?
While learning literary Croatian, you'd probably be unaware that you had also learned most of literary Bosnian and Serbian too. A language is not determined by its script, but by its lexis and grammar, which these mostly share. Moreover, Serbian is today written mainly in Latin script (I'd say >90% of Web content), and most diaspora Serbs cannot even read Cyrillic. Cyrillic should always remain "there" tho, being one big "badge of Orthodoxy", but its usage is increasingly marginal. --Ivan Štambuk 19:20, 5 March 2009 (UTC)
Sorry Daniel, there is no objective metric to measure "distance" between two languages. As a native speaker of SC, I can tell you that inflection is 95-100% the same, words are >90% the same, depending on the standard variety: in case of basic terms the shared "core" is near ~98%, in terms of specific terminology (law, science) it is a bit lower. However, the cases where all 3 standard varieties have different words are extremely rare: usually they're shared S&C/S&B/B&S, which at any case could be eased with grouping as simply treating them all as ==S-C==, with context labels designating the term's spread.
Czech and Slovak are based on different dialects, as are Bulgarian and Macedonian (dunno about S&P), and are hence not really valid comparisons, whereas all SC standards are based on the same particular idiom of the same dialect (moreover, the only dialect that is shared among Bosniaks, Croats, Montenegrins and Serbs). Slovak was standardised in the 19th century on Middle Slovakian dialects deliberately to be as distinct as possible from the dialects used for contemporary literary Czech and Ukrainian. Had history unfolded a bit differently (not Štúr "winning", but Bernolák) you'd have pretty much the same situation with C&S as you have with B/C/S today.
For comparison purposes you have w:Differences between standard Croatian, Serbian and Bosnian, which is not that well written tho, and e.g. Appendix:Swadesh lists for Slavic languages, from which you can draw your own conclusions. Differences are trivial and lexical, and for an English speaker it would be most convenient to mark the differences when they occur, not to ridiculously duplicate the content, retarding the learning experience. --Ivan Štambuk 18:48, 5 March 2009 (UTC)
(to Dan Polansky) As for your quæstion about Spanish and Portuguese, this is a completely different case, because of the huge difference in spelling (letters ã and ç) and because Portuguese literature can be traced back as early as 15th century. The uſer hight Bogorm converſation 19:34, 5 March 2009 (UTC)Reply
(to Carolina wren) I'd likely start with Croatian, simply so I could avoid having to deal with Cyrillic - and if you are sincerely interested in it and its literary tradition, you could not do without mastering Glagolitic script for the ancient writings and I assure you that it is far more difficult to cope with than Cyrillic (My personal opinion is that it is more difficult than Devanagari as well). Replacing three languages with one would not be less useful, but the contrary - it would facilitate new entries without the need to triple them. The uſer hight Bogorm converſation 19:34, 5 March 2009 (UTC)Reply
Original Glagolitic isn't that difficult, but the Croatian angular/cursive versions of it, and an astonishing number of unique ligatures (>250 unique ligatures in e.g. Brozić's breviary), make it very hard for a non-specialist to decode the language underneath. Here you can find a free e-book containing copies of some kewl Glagolitic writings ^^ It should be noted, however, that transcribed in Roman script, some 95% of those Glagolitic MSS. would be intelligible to any SC speaker. --Ivan Štambuk 20:24, 5 March 2009 (UTC)
That is why I would rather opt for Devanagari, it also has numerous ligatures. I have mastered some of them, but three and more consonants are quite painful to learn. The uſer hight Bogorm converſation 20:36, 5 March 2009 (UTC)
So the King of Yugoslavia was a communist? :) I'm aware that Yugoslav is no longer a fashionable term, which is unfortunate as at least it was a concise and ethnically neutral way of referring to the interrelated peoples, languages, and cultures of the area. Bosno-Croato-Montenegri-Serbian just doesn't have the same panache. Casual users who come here looking for an entry in Bosnian, Croatian, or Serbian are likely to be confused by this proposal as it stands, since it does its best to avoid using those terms in favor of specialist terminology regarding the various dialects. I'm worried that this proposal is ignoring the needs and desires of non-specialist users. At the very least, there probably ought to be short referral entries under Bosnian, Croatian, and Serbian L2 headers in the case of the Latin entries, and Bosnian and Serbian L2 headers in the case of the Cyrillic entries, that point to the main Serbo-Croatian entry. Abolishing the use of Bosnian, Croatian, and Serbian as L2 headers will not be user friendly, tho I concede it would be editor friendly.
By the way, Ivan, why all the vitriol over Knaanic? Given the situation, certainly something analogous to Ladino or Yiddish developed in the region, though whether what existed would be better described as a dialect than as a language of its own, I'll leave to those who take the interest to study the matter further.
And laſtly, to Bogorm, given that I feel no urge to read Bēoƿulf in the Anglo-Saxon, I think I can ſafely ſay I'll never bother with Glagolitic under any circumſtance. Carolina wren 20:53, 5 March 2009 (UTC)
That is fine, but the Kingdom hight Kingdom of Serbs, Croats and Slovenes, only afterwards did it change its name. Just as the unity of Oſſetian nouns does not ignore no one's deſires and needs and L2 headers for Digor and Iron would only encumber the entries and gainſay the mainſtream linguiſtic poſition on the Oſſetian language, here too I would not ſupport them. Inſtead, I would ſuggest adopting Vahagn Petrosyan's layout for Ossetian as ſhewed in ефс/æфсæ - one ſuccinct clarification in brackets and redirect to the regional ſpelling would be completely ſufficient. The uſer hight Bogorm converſation 21:09, 5 March 2009 (UTC)Reply
Concerning the difficulty which editors may have who are expecting Serbian, Croatian, etc., that is a real concern. However, the simple fact is that any dictionary, and certainly one which intends to cover all languages, must have some conventions to be learned by its readers. Many people looking for Latin terms might be confused when they can't find IUDEX, iūdex, or judex, as the entry is at iudex, but once the conventions are learned, they should be fairly straightforward. Concerning the deviance from SIL's categorization, one should recognize the difference in purposes between our categorization and theirs. SIL is under a real pressure to recognize claimed autonomy and political issues which are present in abundance in this situation. We, on the other hand, exist to document and inform, and place less value on political considerations. As Ivan has stated, the small amount of difference between Serbian, Croatian, Bosnian, etc., is better communicated with a unified Serbo-Croatian header. Now, I know next to nothing about Slavic languages, so I am not in a place to judge the specific merits of this proposal, but as a rule I generally defer to the judgment of people working on a language, as I think they often know better than I. We have seen similar things done with Hebrew and Nahuatl, and I have so far been happy with the results. -Atelaes λάλει ἐμοί 22:10, 5 March 2009 (UTC)Reply
Carolina, there are 5 political entities that bore the name "Yugoslavia":
  1. Kingdom of Yugoslavia (1918 - 1941)
  2. Democratic Federal Yugoslavia (1943 - 1946)
  3. Federal People's Republic of Yugoslavia (1946 - 1963)
  4. Socialist Federal Republic of Yugoslavia (1963 - 1992)
  5. Federal Republic of Yugoslavia (1992 - 2003)
Today the terms Yugoslavia and Yugoslav almost always pertain to the 4th one (SFRJ), which was a communist state, and in whose censuses there was a "Yugoslav" nationality. The first one you refer to was not a communist state, but a Serb-dominated monarchy. Each of those "Yugoslavias" differed widely in ethnic and territorial coverage. If there is a "need" for a term to denote some arbitrarily defined notion of "interrelated peoples, languages, and cultures", it would certainly not be Yugoslav today.
I don't think that the issue of "confusion" you describe is a valid usage scenario. If someone is not willing to spend a few minutes learning the conventions of this dictionary, he shouldn't come here (or anywhere else) in the first place. If he knows enough about B/C/S to utilise Wiktionary as a learning tool, he has almost certainly heard of Serbo-Croatian, and the "neighbouring" standard languages. Perhaps the MediaWiki software will one day be much more user-friendly towards dictionary-type projects such as Wiktionary, in terms of facilitating entry searching (or more likely: this project will get relocated to some real piece of software), but that's just a bunch of speculative ifs. The Wiktionary entry format is strict in structure, and I'm afraid that "blank" redirect entries are not valid when at the same time the page already contains a full-blown entry in that very same language.
Note also that Bosnian is not written in Cyrillic script.
I'm mentioning Knaanic as an example of SIL gone mad, and ISO codes not reflecting the "real" language state of affairs. At best, "Knaanic" is Czech stuffed with some Jewish cultural borrowings. What constitutes a "language" or not, and how Wiktionary should treat it, is something that should be discussed and argumented, and not simply settled by appealing to some dubious authority such as SIL. A large amount of languages/ISO codes they list will never get Wiktionary L2 header (just look at that macrolanguages list..) --Ivan Štambuk 11:09, 6 March 2009 (UTC)Reply
Ivan, I'm not certain what you mean by "not valid" since if foo be a Serbo-Croatian word:
==Bosnian==
:See [[#Serbo-Croatian|foo]] (Serbo-Croatian).
would certainly work to redirect the person to appropriate section of the page, tho be a little silly if only Serbo-Croatian used that entry page. However sometimes a little silliness is necessary to achieve both consistency and ease of use.
I'm troubled by your apparent opinion that we should not make a reasonable attempt to accommodate casual users and instead force them to learn technical terms to be able to use a dictionary. Let's take for example a tourist off to spend a couple of weeks in Croatia. While it probably would be logical to assume that such a user would at the very least be able to quickly grasp the idea that Serbo-Croatian and Croatian are usually one and the same, I highly doubt that terms such as Ekavian, Ijekavian, etc. will be anything other than opaque to casual users, and if those are deemed to be a sufficient indicator of usage, then it will render the Wiktionary useless to such people.
Leaving aside the poor tourist, who admittedly is an extreme example, I'm not even certain that they would be useful to more knowledgeable users. It is obviously difficult for me to judge how specialized such terms as you propose to use as the primary means of differentiation are in actual use among those already possessing some ability in those languages. For an average Croatian student, (not one going intensively into language study, but one taking a typical college-preparatory curriculum) just when would they be likely to encounter and be expected to use such terminology as Ekavian, Ijekavian, etc. At a distance, they seem to be about as opaque as using the terms Rhotic and Non-Rhotic to describe English dialects would be to a typical native language user of an English dictionary. Using regional markers rather than technical markers would in general be better as those would be easily grasped by non-specialists. (Not that I am wholly opposed to such technical markers. Ideally Wiktionary should be useful to specialists as well as generalists. I just want to make certain that it isn't something only a university language major could love.) Carolina wren 20:54, 6 March 2009 (UTC)Reply
These redirects would be unnecessary for 95% of SC entries, as in these cases SC entries would be the only entries on their respective pages, or perhaps shared with other closely related South Slavic languages with identical/similar meanings, so I doubt that the poor user would get confused. As for the remaining 5%: I wouldn't have anything against such redirects if the community feels they're needed. Tho I highly doubt that anyone looking up Croatian and Bosnian words wouldn't scan the ToC for an SC entry as well.
As I said, differences on Ekavian/Ijekavian/Ikavian are taught usually as lesson 1 on most B/C/S language courses. E.g. here (click on "Culture"), here, here only in lesson 5 (but the entire book consistently introduces forms with jat reflex in ijekavian/ekavian pairs, even tho they're explained only in lesson 5), here in the preface, here in chapter one etc. It's as basic as -our/-or variant spellings of English. I'd be surprised to see some B/C/S handbook (in English or any other language) not mentioning this basic distinction.
Tourists usually read specialised tourist phrasebook and "for dummies"-type literature that writes the word in some English-friendly transliterations and Wiktionary entry is likely to be an overkill for them. But presenting more high-quality information cannot be a bad thing overall. Wiktionary should strive for completeness and quality, not for some low-quality content worrying that it might hurt minds of innocent users. Our primary audience are not some (presumably) ignorants that randomly parachute via search engines but people that would use Wiktionary to learn/look up words of a language they already have some basic proficiency in. --Ivan Štambuk 22:37, 6 March 2009 (UTC)Reply
As a learner of Serbo-Croatian, I'd like to thank the pioneering Serbo-Croatian-speaking editors of this site for keeping political considerations out of the discussion and thus making it easier for me to actually look up terms. I don't know how much content there would be if you all had to quadruplicate each entry. Saimdusan (talk) 16:14, 14 March 2012 (UTC)

Inflection template

I think we should have an inflection template for this, similar to {{sr-noun}}, that would allow for the script variants without having to write "Cyrillic spelling" or "Roman spelling" in full. It would also need to allow for the head= parameter and automatically embed {{l}}, including the correct script template (for the mention of the other spelling). The first numbered parameter should probably be for the script (c and r, respectively). What do you think, guys? – Krun 15:07, 25 June 2009 (UTC)Reply

Good idea. I have a program that converts those automagically, so it's no fuss for me, but I can imagine other folks growing frustrated with such redundant typing all the time. I noticed Opiaterein's bot replacing {infl} with language-specific templates for some languages, perhaps we can ask him to replace it en masse for SC too once the template gets written. --Ivan Štambuk 20:01, 25 June 2009 (UTC)

Like this:

  • {{sh-noun|g=m|head=bùnār|r|бунар|бу̀на̄р}} > bùnār m (Roman spelling бу̀на̄р)
  • {{sh-noun|g=m|head=бу̀на̄р|c|bunar|bùnār}} > бу̀на̄р m (Cyrillic spelling bùnār)

? --Ivan Štambuk 21:22, 25 June 2009 (UTC)Reply

Looks good, although I would personally find entering {{{1}}} before any of the other parameters more straightforward (that doesn't affect the display, of course). Shouldn't we also include a plural parameter? – Krun 21:44, 25 June 2009 (UTC)Reply
OK, it's up to the user how to sequence parameters. Plural is unnecessary because it's an ambiguous category in languages with cases (it would be nominative plural), plus there is always the declension table. --Ivan Štambuk 21:48, 25 June 2009 (UTC)Reply
Yeah, ok, I suppose it's not necessary. I've just been using it when I unified entries that had it marked, so as not to lose information, as I am not knowledgeable enough to put in the whole declension table. – Krun 21:51, 25 June 2009 (UTC)Reply
OK. Have you noticed the issue I pointed out at Template talk:sh-noun ? If you use sc= for {l} it adds some stray {'s and }'s --Ivan Štambuk 21:57, 25 June 2009 (UTC)Reply
Noted. I think I have it fixed. I've put the template into use at a new entry, потпис. Take a look. – Krun 22:07, 25 June 2009 (UTC)Reply
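
The automatic conversion between the two scripts mentioned earlier in this thread can be sketched roughly as follows. This is not the program referred to in the discussion: the function name, the tables, and the handling of accent marks are assumptions made for illustration. It maps the three Latin digraphs (lj, nj, dž) first, then single letters, and carries dictionary accent marks over as combining characters.

```python
import unicodedata

# Latin digraphs that correspond to a single Cyrillic letter.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

# Single-letter correspondences (lowercase only; case is restored below).
LETTERS = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}


def latin_to_cyrillic(text):
    """Transliterate Serbo-Croatian Latin text to Cyrillic (illustrative sketch)."""
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        if ch.lower() in LETTERS:
            base, marks = ch, ""
        else:
            # Accented vowel such as ù or ā: split off the combining marks
            # so they can be re-attached to the Cyrillic base letter.
            decomposed = unicodedata.normalize("NFD", ch)
            base, marks = decomposed[0], decomposed[1:]
        cyr = LETTERS.get(base.lower())
        if cyr is None:
            out.append(ch)  # punctuation, digits, spaces: keep as-is
        else:
            out.append((cyr.upper() if base.isupper() else cyr) + marks)
        i += 1
    return "".join(out)


print(latin_to_cyrillic("bunar"))
print(latin_to_cyrillic("potpis"))
```

A known limitation of this approach: it treats every lj, nj and dž as a digraph, so words where the two letters belong to different morphemes (for example across a prefix boundary) would need an exception list.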

Rationale for merging

Here I'll briefly outline the rationale for merging the L2 sections of Bosnian, Croatian and Serbian into one L2 section ==Serbo-Croatian==. Initially it was meant to be a pretty straightforward policy, when the active SC contributors (Dijan, Bogorm and me) agreed to pursue the unification path, as when you're a native speaker (or have a native or good knowledge of another closely related Slavic language) the benefits of unification are more than obvious. However, given the amount of attention from non-native speakers in the community, and from contributors not familiar with the linguistic and historical issues surrounding the "split of Serbo-Croatian", it seems necessary to explicitly justify the merger to interested parties.

I'll start by citing an introductory excerpt from the book Language and Identity in the Balkans by American Slavist Robert D. Greenberg (you can read his CV here), which is entirely dedicated to the ethnic and historical background of the split of SC. You can also find it scanned in PDF format in "well-known places" if you're too lazy to go to the library.

 

1. Introduction

To this very day ethnicity strikes many Westerners as being peculiarly related to "all those crazy little people and languages out there", to the unwashed (and unwanted) of the world, to phenomena that are really not fully civilized and that are more trouble than they are worth.
(Fishman 1989: 14-15)

1.0 Overview

It must have been only my third day in Yugoslavia, when my Croat friends took me to Zagreb's Mirogoj Cemetery. I had arrived in Yugoslavia to complete dissertation research. My topic was in theoretical Slavic linguistics on Serbo-Croatian appellative forms, which essentially included forms of address, commands, and prohibitions. I came armed with my charts of verb classes, imperative endings in dozens of dialects and the rough draft of a questionnaire. I planned to travel to each republic, and was going to seek out dusty hand-written records of dialect forms. However, on that day in September 1989, I was still the tourist taking in the sights. I was amazed when my friends asked me if I wanted to see the grave of Ljudevit Gaj. I felt the kind of excitement the wide eyed student might experience when going on a field trip to a place they had only read about. When we reached the grave, my friends knelt down, genuinely moved. With visible emotion, they explained that Gaj, who had sought the unity of all Southern Slavs in the nineteenth century, embodied for them a lost dream of ethnic harmony, and of pan-Slavic cooperation. In retrospect, their feeling of loss preceded the events that were to occur only a few years later: as if they knew that Yugoslavism no longer had a chance. In that conversation, they told me that Serb-Croat relations would never recover from the upsurge of nationalism in the late 1980s. I had studied about Gaj primarily for his role in bringing about the unity of the Serbo-Croatian language. Was I to understand my friends' mournful comments as an indication that Serbo-Croatian was also no longer possible?

Six months later I was back in Zagreb at the Institute for Language to disseminate my questionnaire on Croatian appellative forms. I had painstakingly produced two versions of the questionnaire—one in the Eastern (Belgrade) variant of Serbo-Croatian, and one in the Western (Zagreb) variant. I did my best to adjust my speech from Belgrade to Zagreb mode. However, in a slip of the tongue, I innocently mentioned something about my plans for July. Much to my embarrassment, my interlocutors chastised me for using the Serbian form jul 'July', rather than the Croatian form srpanj. To add insult to injury, one of the Institute's staff then took me aside and made me repeat after her all the proper Croatian forms for all twelve months. I knew that language was a sensitive issue, but did not realize the emotional and ideological baggage each word carried. Most Croats had simply praised my excellent "Croatian," even though I could have sworn that I had been speaking with a Belgrade accent. When I received the questionnaires from the various Croatian linguists, who graciously agreed to provide data from their native dialects, I was pleased at the level of cooperation. Only one or two questionnaires were returned blank, with a terse note to the effect that they could not answer my questions, since I was primarily interested in phenomena occurring only in Serbian.

Later that month, I attended a reception at the Belgian Embassy in Belgrade. One distinguished guest, having discovered that I am a budding linguist, came up to me, and asked if I would answer a question which had long troubled him. I braced myself for yet another potentially embarrassing moment, but was relieved to hear that he simply wanted to know if I thought that Serbo-Croatian was one language or two. It was 1990, and the answer seemed obvious to me— officially the language was still united, and mutual intelligibility among its speakers was still possible. It was true that two literary languages had the potential to emerge, but it was too early to determine if this split had really occurred. This answer could not have made my questioner happier; having listened intently to my explanations, he became animated, and thanked me profusely for bringing closure to an issue that had been tormenting him for years. My theory about the basic unity of the language had been confirmed some weeks earlier, when I joined dialectologists from all over Yugoslavia at a weekend working session in the Serbian town of Arandjelovac. Perhaps I was naive, but it seemed that the Croat dialectologists had cordial relations with their Serb counterparts, and that they were all cooperating on the joint project of producing the Common Slavic Linguistic Atlas.

When I returned to the region after the cataclysmic events of the wars in Croatia and Bosnia-Herzegovina, the language situation had changed radically. Having landed at Sarajevo Airport in June 1998, I struck up a conversation with one of the airport's land crew. Her first comment was that she was impressed with my skills in the Bosnian language. Frankly, I had had no idea that I was even capable of speaking Bosnian, since during my previous visit to Sarajevo in 1990, I had openly admitted to speaking Serbo-Croatian. Relaxing at a cafe the next day, I was told by a Bosnian Croat colleague from Sarajevo University that he felt that the officials at the university were forcing the Bosnian language on everyone. He felt uncomfortable speaking it. The friends I stayed with were a Serb and Bosniac couple. She was not afraid to tell me that even though she speaks the Bosnian language, she completely rejects the initiatives of the Bosniac language planners, who in her view are insisting that everyone unnaturally adopt the speech characteristics of her grandmother from a small village. The next morning I crossed the inter-entity boundary in order to catch the bus to Belgrade. In Bosnian Serb territory, I spoke the same language I had used the day before, only now I was treated as a Serb. When the Yugoslav border guards singled me out for extra questioning upon my entry to Serbia, the bus driver told them to let me through, because he considered me to be one of theirs. While it still seemed as though Bosnian and Serbian were variants of one language, it was not at all clear how many years were needed before a foreigner would truly encounter difficulties in switching from one language to the other.

When I visited Montenegro that same summer, I gingerly asked my linguist colleagues whether or not they took seriously the moves to split off a Montenegrin language from the Republic's prevailing Serbian language in its ijekavian pronunciation. They retorted that supporters of a separate Montenegrin language were extremist Montenegrin nationalists, and that nobody in the community of linguists took them seriously. One colleague, a dialectologist, went so far as to say that it is impossible to identify a single linguistic form that would identify all Montenegrins. "If there were such forms," he chuckled, "they could be counted on one or two fingers." Since then, however, the advocates for a Montenegrin language have remained vocal, and given the political strains with Serbia, an official status for a separate Montenegrin language cannot be ruled out.

(Note that since this was written, the Montenegrin language has been made official in the constitution of the Republic of Montenegro, with Montenegrin grammar, dictionary and orthography books being written and prepared for publication this fall, and an ISO/SIL code probably soon to follow.)

I don't want to go into any more details of the history of SC language relations, and how it came to be that you are considered to speak "Bosnian", "Croatian" or "Serbian" depending on which side of the border you're located. Greenberg's book provides an excellent de-nationalized perspective on the diachronic and synchronic problems surrounding SC. And that's something you're unfortunately not going to find with prominent B/C/S/M linguists when they make any kind of big-picture statements about the national language, even though they all very much know somewhere deep down inside that such statements are untrue, as admitting it would be regarded as "treason" and would likely end their professional career.

If we look strictly from the perspective of dialectology and genetic linguistics, we're dealing with the same dialect - Neoštokavian. Neoštokavian is part of a larger dialect cluster called Štokavian (named after the interrogative pronoun što "what"), and through a conjunction of historical circumstances (which are now irrelevant) that dialect has been used as the base of modern standard B/C/S. Accordingly, their grammars are basically identical, with pretty trivial differences. Phonology and accentuation are identical (exactly the same phonemic inventory with the same prescribed pronunciation, the same pitch-accent system, with negligible differences). Inflections of nouns and adjectives are the same (and are provided with common templates such as {{sh-decl-noun}}), as is the inflection of verbs, where the only difference is in the spelling of the Future I tense for verbs whose infinitive ends in -ti (and which is handled gracefully inside the {{sh-conj}} template).

There are some differences in derivational morphology which are intended to be treated with context labels (denoting which form is preferred where) and usage notes. Common examples are the suffixes -ica (Croatian) vs. -ka (others) to form female agentive nouns (e.g. doktorica vs. doktorka - but the male form doktor is shared), or the verbal suffixes -irati (Croatian and partly Bosnian, borrowed from German) vs. -isati (others, borrowed from Greek) & -ovati (borrowed from Russian). This latter pair reflects different historical and cultural associations - western Croatian, which was influenced by German-Hungarian Catholic provenience, and eastern Serbian, which was under the influence of Greek and Russian Orthodox provenience. These kinds of differences were much larger some 150 years ago, but foreign elements got cleansed due to purification efforts, and common SC grammars and orthographies in the last century eventually balanced the usage of what used to be exclusively Croatian or Serbian traits (so you get funny situations where what some Croatian purists today call "Serbianisms" was 100-150 years ago actually perceived as "Croatism" in Serbian circles).

A major point of difference is in the reflexes of words that have the Common Slavic jat phoneme. That sound was anomalous in the phonological system and was eventually lost or transformed in all Slavic dialects, but with different outputs. In the SC area these variant forms are called Ekavian, Ijekavian and Ikavian, with reflexes of /e/, /ije/ and /i/ of long jat, respectively. Standard Croatian and Bosnian are Ijekavian; Serbian is Ekavian (central Serbia) and Ijekavian (Serbs of Bosnia). These are treated as alternative forms of one word: e.g. mlijéko, mléko, and mlíko, all reflecting the pre-form *mlěko "milk", with jat originating by liquid metathesis from the Common Slavic form *melko. Such variant forms are treated in the ==Alternative forms== section because they are just considered different spelling forms of one and the same "underlying" word. I drew a comparison to the British and American English difference of rhotic and non-rhotic pronunciation, had it been reflected in a phonological orthography (like the one used for SC): would bɜːrd and bɜːd really be 2 different words? The important thing to note here is that switching between Ijekavian and Ekavian forms is (mostly, there are some quirky exceptions) completely trivial to all native speakers of either form, as the difference among jat reflexes is non-lexical, not inducing any kind of intelligibility barrier.

Another point of departure is the words which are exclusively associated with the literary style of one of the variants, not having trivial differences in spelling. These are the words that Dungodung mentioned, such as rajčica, vlak, ručnik... these would be clearly perceived by native speakers as belonging to the "Western variant" (i.e. Croatian). These are meant to be treated in usage notes, which state which form is used where. Such words are however very low in volume, and constitute at most 1-2% of lexis. It should be noted however that lots of speakers (of those older than mid-20s, i.e. folks who grew up in Yugoslavia - all) have no problems in understanding the corresponding words preferred by the other literary variant.

All these differences can be analyzed on the example text of the Universal Declaration of Human Rights. If you look at the tables the primary differences are:

  • Ijekavian/Ekavian pairs (čovjek/čovek, svijest/svest, vjera/vera, porijeklo/poreklo, smije/sme, nečovječnom/nečovečnom)
  • Trivially-differing spelling resulting from different cultural associations: C/B opći vs. S. opšti - opći is a native Slavic form, naturally reflecting Common Slavic *obьtjь, whilst Serbian opšti is taken from Serbian Church Slavonic, where it was borrowed from Old Church Slavonic обьщь (obĭštĭ), with /št/ as a reflex of Common Slavic /*t'/ (characteristic of the Bulgaro-Macedonian dialectal area where OCS originated), as opposed to the /ć/ reflex in the Štokavian dialect. A similar argument goes for članak/član, spol/pol, tko/ko (and its derivatives: netko/neko, svatko/svako - 99% of Croats pronounce tko as [ko] in vernacular speech though, it's just the orthography that is more conservative), which can all be said to be the same words in different forms.
  • svugdje/svagdje/svuda is purely literary distinction and all 3 forms are used in all 4 states
  • oblik/forma - Croatian has a mild preference towards the Slavic word over the Latinism, however both forms are valid in all 3 standards.
  • completely different words:
    starateljstvo/skrbništvo - abstract nouns derived from different verbal stems of the verbs starati and skrbiti, both of which are valid in literary idioms of all 3 standards, but in Croatian skrbiti is much more used.
    osoba/lice, osobni/lični - lice means "face" in all 3 standards, but in Serbian it also means "person", and this sense was borrowed from Church Slavonic. Older people in Croatia still call the personal ID card lična karta, while younger ones use osobna iskaznica (which is the official term). lice/lični is still used in Croatian in the sense of "person" in grammar terminology (e.g. first-person singular = prvo lice jednine).
    temeljem/na osnovu - the latter form is also valid in literary Croatian, however legislative terminology has a perverse preference towards the phrase temeljem (instrumental case of the noun temelj "base, foundation") which is however absolutely never spoken in vernacular speech.
    sigurnost/bezb(j)ednost - The former is an abstract noun derived from the adjective siguran "safe, secure" by means of the suffix -ost, the latter is derived from the adjective bezbjedan in the same way. siguran is markedly Croatian, bezbjedan is markedly Serbian.
    podrijetlo/por(ij)eklo - Croatian since the 1990s prefers podrijetlo in the literary idiom, chiefly through the puristic efforts of extremists who find the term "more Croatian", given that it was originally confined to Dubrovnikan local speech (where rijet = "word" is still used), and was not found anywhere else in Croatia. It's really annoying because the word sounds really ugly. However porijeklo is also literary Croatian and hence there is no real difference.
  • the difference in syntax (the only one worth mentioning): Croatian prefers the infinitive (in -ti/-ći) where Serbian prefers da + present tense. Hence the differences such as postupati/da postupaju, činiti/da čini, biti/da bude. This is a dictionary so we cannot cover this.

As you can see, basically all the real-world occurring differences can be easily handled with the proposed scheme. Most of them are really trivial and could be easily handled at the orthographic level, such as by the introduction of the jat sign <ě>, which would then abstract away the different pronunciations of words with its reflex. This was done in the 19th century but unfortunately hasn't caught on.
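To make the <ě> idea concrete, here is a minimal Python sketch. It assumes an abstract spelling with exactly one <ě> whose length (long or short) is supplied by the caller, and it deliberately ignores the quirky exceptions (e.g. short jat after a consonant plus r, or lj/nj arising from l/n + je). The function name and data layout are hypothetical, not part of any Wiktionary template.

```python
# Regular jat reflexes per variety, for long and short jat.
# This is a simplification: known exceptions are not modelled.
JAT_REFLEXES = {
    "ijekavian": {"long": "ije", "short": "je"},
    "ekavian": {"long": "e", "short": "e"},
    "ikavian": {"long": "i", "short": "i"},
}


def expand_jat(abstract_form: str, length: str) -> dict:
    """Expand an abstract spelling containing one <ě> into its three reflexes.

    `length` must be "long" or "short" and has to be supplied by the caller,
    because it cannot be read off the unaccented spelling.
    """
    if abstract_form.count("ě") != 1:
        raise ValueError("expected exactly one ě in the abstract form")
    if length not in ("long", "short"):
        raise ValueError("length must be 'long' or 'short'")
    return {
        variety: abstract_form.replace("ě", reflexes[length])
        for variety, reflexes in JAT_REFLEXES.items()
    }


milk = expand_jat("mlěko", "long")
man = expand_jat("čověk", "short")
```

With these inputs, `milk` holds mlijeko, mleko and mliko, and `man` holds čovjek, čovek and čovik, matching the pairs listed above.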

What exactly is gained by this approach? Well, for one, it massively reduces the effort of both contributors and users. The giant overlap in vocabulary would cause a great deal of redundancy if we were to treat the standards separately. In fact, most of the separate B/C/S entries so far were created by simply copy/pasting the one originally written form and changing the ISO code for separate categorization. Furthermore, for the end-users, separate treatment would cause a lot of confusion, as they would have to manually discover the differences between the standards by comparing the lists of meanings in 3 different language sections. It makes much more sense to treat the common core as the default, and make the differences exceptions, rather than vice versa.

The important thing to have in mind is that we're writing a dictionary to contain "all words in all languages" in English, which is quite a different goal than writing a dictionary of either of the literary idioms by its native linguists. It makes sense to write a standard reference dictionary of Serbian or Croatian in Serbian/Croatian, for common people to use, because most of them do not really have any kind of desire to know what the corresponding term in the other variant is. They want to achieve literacy in their own national idiom. For non-native speakers learning the language, on the other hand, it would seem really silly to learn a relatively complex language like SC in a specific variant, completely disregarding the other sides, not making that extra 5% effort needed to understand them too without any problems. But even if you are intent on such a perverse intellectual endeavor, the unified scheme wouldn't present you with much of a difficulty, as you'd simply have to ignore everything starting with (Serbian), (Bosnian) or (Montenegrin). For (I presume - most) of the others the benefits are quite clear.

It has been mentioned that the term Serbo-Croatian is potentially insulting. This is true, but only for some nationalists, which excludes most of the potential English-speaking userbase of Wiktionary (well, native contributors are an entirely different category, but the current SC contributors don't have any problems with the term). The term Serbo-Croatian is moreover very much used in English literature, as I illustrated in that BP discussion. It is used by e.g. Britannica, and by the most authoritative grammatical sketch of the language, in the big Slavonic languages book, by Wayles Browne. Current English-language research papers also use it without problems, though sometimes the authors feel a bit uncomfortable about it. E.g. in this paper by Dutch Slavist Willem Vermeer he writes in the footnote: I stick to the traditional label of 'Serbo-Croatian' because from the point of view of the diachronic linguist a technical term denoting the dialect continuum traditionally referred to by it is indispensable and would have to be invented if it did not already exist. This choice should not be construed as implying a political preference. Indeed, I am very unhappy with the traditional requirement (which has always been widespread in SCr. linguistics) that investigators of the history of the language should adapt their linguistic terminology to political priorities.

But we're using the term Serbo-Croatian primarily to refer to the literary dialect, which is the same, and subliterary dialects (Čakavian, Kajkavian, Torlakian and Ikavian Štokavian) are to be handled by means of context labels as is illustrated in the proposal. The term is thus used in both senses.

Unfortunately, there is no non-ethnic name for the language to be used as an alternative. I am personally also not very satisfied with the term Serbo-Croatian, as it leaves aside Bosniaks and Montenegrins. Some modern grammar books use the abbreviation BCS, or simply write Croatian and Serbian (with the conjunction instead of a hyphen). SIL/ISO also use it, but as a macrolanguage identifier, containing B/C/S as "individual languages". We must use something, and this term has no alternative. Seeing something like ==BCS== would be really strange. --Ivan Štambuk 13:19, 28 June 2009 (UTC)Reply

The above might be a rationale for allowing Serbo-Croatian. Here is a rationale for allowing Serbian, Croatian, Bosnian headers:
  • verifiable argument: they are viewed as individual languages by all states where these languages are spoken,
  • verifiable argument: they are viewed as individual languages by international organizations (e.g. ISO-639)
  • verifiable argument (not contested by the above rationale) : forbidding these headers would be considered insulting by a number of native speakers (we cannot know how many)
  • verifiable argument: Wiktionary already considers that it's acceptable to be redundant when it comes to neutrality (cf color/colour).
  • argument to be verified: not allowing them would violate the Wikimedia NPOV policy
  • argument to be verified: not accepting these headers despite the opinion of all international organizations, and basing this decision on linguistic arguments by contributors, would violate the Wikimedia "no original research" policy (all the more so as some linguists would disagree with the decision)
  • an additional argument: as explained by Ivan Štambuk, there is no satisfying name for covering all three languages (but I think this one is much weaker than other arguments: if (and only if) a name is the common name, then it's normal to use it, even when it's not perfect).
Lmaltier 20:04, 28 June 2009 (UTC)Reply
OK, Lmaltier, here's the thing. It is not "current practice" to have Serbo-Croatian alongside its constituent national standards in pages. The pages contain either split sections or a unified one, not both. The SC contributors here have started merging (mostly their own) SC entries under one header. It's a planned change of scheme. If the en.Wikt community will not accept this, then we would rather split them back and not have this new header in addition. In that case, we would simply be using the ISO scheme; in effect, the individual political schemes of the states involved. Although counter-productive and cumbersome in itself, it is at least better than using both systems, thereby also causing a great deal of confusion. Wouldn't you be confused as a curious layman with little linguistic knowledge to see Serbian, Croatian and then Serbo-Croatian as well? In a nutshell: If we are to keep B/C/S headers, we don't want a Serbo-Croatian one. If SC is accepted, we don't want the other headers. – Krun 23:14, 28 June 2009 (UTC)Reply
My arguments are not about this point, the rationale is only about allowing Serbian, Croatian and Bosnian headers. Lmaltier 05:27, 29 June 2009 (UTC)Reply
These are all external arguments which in no way invalidate any point made in the proposal. Wiktionary is a dictionary project to write "all words in all languages." As such, it has its own best practices for achieving that goal. SIL/ISO is an international institution that simply assigns some 2- or 3-letter codes which mean absolutely nothing to us, other than being a unique mapping to the language name used by context labels and categorization/linking templates. We even have our own list of languages without ISO codes, to which we plan to assign some Wiktionary-specific unique pseudo-codes, to achieve the Goal. Even Wikimedia invents its own codes to achieve the goal of "free knowledge in all languages". For our own purposes, we don't want SIL, speakers of the language, the constitution of some state or anybody else to stand in the way. They can provide useful tools for achieving the Goal, but once their convictions become more of a problem than they're worth, we can simply ignore them. We don't need to do anything just because it's written somewhere or said by someone, if it conflicts with the Goal.
Rabid nationalists would of course find the term insulting (in fact, they'd pretend that it's insulting, trying to fabricate history in order to show that "there never was SC", and that there were always "different languages"). But none of them is likely to be contributing to this project anyway. As it turns out, all the current WT SC contributors have absolutely no problem with the term. I've encountered only one case where a certain IP (a well-known Serbian nationalist troll from Australia, who has a record of adding fake etymologies for Turkish borrowings) added ==Serbian== when there was ==Serbo-Croatian== already present. Really, is the mental health of such people really worth so much?
color/colour is a special case, with a long and painful history of discussion (the very mention of it is prob. stressful to certain folks). Although it is not regulated by a policy AFAIK, so as not to give undue prominence to either of the equally "proper" Englishes, there is a strong preference in the community to treat variant English spellings with the same meanings by means of creating a redirect via {{alternative spelling of}} or a similar type of template. Actually, most of the variant -our/-or, -ise/-ize etc. pairs are handled that way. Redundancy is not the preferred option either here or anywhere else. Just look at the content of Category:English alternative spellings.
Using the term ==Serbo-Croatian== would not be OR as that term has more than a century of attested usage in the two senses it is used on Wiktionary (standardized Neoštokavian varieties, collection of 4 dialects). It is used in these senses by prominent English-language publications such as the ones mentioned above. I wouldn't know how Wikimedia would think about this all (whether they do care at all - I suspect not, or whether they have anyone competent to comment upon this), but given that we have sh Wikipedia and Wiktionary already, I somehow find it really hard to believe that they'd be shutting this project for the usage of ==Serbo-Croatian== L2 section, or some alleged OR or NPOV associated with that term. --Ivan Štambuk 10:12, 29 June 2009 (UTC)Reply
You should have read more slowly. I don't suggest that using Serbo-Croatian would be original research (of course not), only that deciding that a Croatian header would be unacceptable for linguistic reasons might be original research and/or a POV not acceptable by Wikimedia. Lmaltier 11:54, 29 June 2009 (UTC)Reply
I don't see how disallowing a Croatian header is an issue really. We wouldn't be disallowing Croatian entries, just regulating how they are put in. Vocabulary specific to Croatian is marked with a context template, etc., and any rare peculiarities or regional pronunciations in different areas (both within Croatia and elsewhere in these former Yugoslavian states) are also noted. In fact, we are doing this very same thing for the Southern Min (Min Nan) varieties Teochew and Amoy (these could easily be classified as separate languages, mutual intelligibility only being about 50%); they are both being entered under the unified header Min Nan, and this seems to work fine (the difference not being as huge in written form). The internal differences of Serbo-Croatian are of course negligible compared to this example. – Krun 21:27, 29 June 2009 (UTC)Reply
This is what I explain: you prefer your linguistic arguments to internationally recognized views on what individual languages are. Lmaltier 05:17, 30 June 2009 (UTC)Reply
This is one language in 3/4 differently codified literary varieties. These varieties used to be called "Eastern" and "Western" until the 1990s, after which they became "languages", with the rise of newly-discovered self-identity. You can hardly call them "internationally recognized", since most (if not all) of the western linguists devoid of nationalistic prejudices still consider them one language.
For the purposes of compiling a dictionary of all of them there is simply no alternative but to treat them collectively. --Ivan Štambuk 07:18, 30 June 2009 (UTC)Reply

Issue with declension

I disagree with the assertion that all forms should always be entered manually. If the following statement is true: Most contributors will, however, only enter normal inflection without accent marks. Then why not just let them enter their {{sh-noun-x|stem}}, and wait for someone who wants to add the accents to switch it to the full table later? — [ R·I·C ] opiaterein12:10, 9 July 2009 (UTC)Reply

That is no problem at all. The problem is that you'd need to create some 100+ of these {{sh-noun-x|stem}} templates to cover all the inflection cases for nouns. I started creating them a while ago (cf. Category:Croatian declension templates) but have been slowly eliminating them in favor of manual inflection. The benefits of manual inflection are several IMHO:
  1. It's much easier to copy such inflection tables to foreign Wiktionaries (as I noticed some FL wiktionary editors have been doing)
  2. It will be much easier to generate entries on inflected forms by a bot one day (you just need to parse one template)
  3. In most cases, for a native speaker it would be much easier to copy-paste the stem and add the desinences, rather than to look up the appropriate template based on the properties of the word (whether it is animate or inanimate, whether palatalization occurs or not, how many syllables the word has, whether there is a "fleeting a" or not, whether it ends in some special ending that requires some special treatment etc.). Plus you can use engines such as this one to generate inflection for lots of nouns at once, and process its output with a simple program that will convert it to {{sh-decl-noun}} format.
In principle, I agree that for the most common types of inflections there should be automatic inflection templates. But who decides which these are? For each type of inflection you can find thousands of words in that category. In any case, I think it would be OK to have them as a typing shorthand as long as the fully expanded versions are the preferred ones. --Ivan (ⰃⰎⰀⰃⰑⰎⰅⰞⰉ) 13:01, 9 July 2009 (UTC)Reply
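The "simple program" mentioned in point 3 could look roughly like the Python sketch below. It rests on two assumptions that should be checked before use: that {{sh-decl-noun}} takes 14 positional parameters ordered case by case (nominative, genitive, dative, accusative, vocative, locative, instrumental), singular before plural, and that the inflection engine's output has already been parsed into a dictionary keyed by (case, number). The function name and the input layout are hypothetical.

```python
# Assumed parameter order of {{sh-decl-noun}}: one row per case,
# singular then plural. Verify against the template's documentation.
CASES = [
    "nominative", "genitive", "dative", "accusative",
    "vocative", "locative", "instrumental",
]


def to_sh_decl_noun(forms: dict) -> str:
    """Render a fully expanded {{sh-decl-noun}} call from a forms mapping.

    `forms` maps (case, number) pairs to inflected forms, e.g.
    ("genitive", "singular") -> "kuće".
    """
    lines = ["{{sh-decl-noun"]
    for case in CASES:
        singular = forms[(case, "singular")]
        plural = forms[(case, "plural")]
        lines.append(f"|{singular}|{plural}")
    lines.append("}}")
    return "\n".join(lines)


# Unaccented forms of kuća "house", as most contributors would enter them.
kuca = {
    ("nominative", "singular"): "kuća", ("nominative", "plural"): "kuće",
    ("genitive", "singular"): "kuće", ("genitive", "plural"): "kuća",
    ("dative", "singular"): "kući", ("dative", "plural"): "kućama",
    ("accusative", "singular"): "kuću", ("accusative", "plural"): "kuće",
    ("vocative", "singular"): "kućo", ("vocative", "plural"): "kuće",
    ("locative", "singular"): "kući", ("locative", "plural"): "kućama",
    ("instrumental", "singular"): "kućom", ("instrumental", "plural"): "kućama",
}
wikitext = to_sh_decl_noun(kuca)
```

Because the output is the fully expanded table, accent marks can later be added to individual cells by hand without switching templates.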
Yes, evil Slavic stress makes automatic templating tough. I had to make some 200 templates for Russian, but in the end it paid off. --Vahagn Petrosyan 14:31, 9 July 2009 (UTC)Reply
Yes, but in SC you have 4 tones and not just one stress as in Russian, and the number of templates covering all the morphological and accentual paradigms would be some 300-400 (my estimate). It's similar to Lithuanian, for which Opiaterein already wrote some templates (in fact, their systems are deeply related), but in SC there are many more paradigms, and the limitations of the template language (no substring search) would require the addition of lots of parameters to such automatic inflection templates. But, in the end, accents in SC are not as important as they are in Russian (where they change the underlying vowel's phonetic quality), and most speakers don't even know them properly. So IMHO they are best added manually. --Ivan (ⰃⰎⰀⰃⰑⰎⰅⰞⰉ) 14:44, 9 July 2009 (UTC)Reply

Translation tables

The current format uses two rows, for Cyrillic and Roman respectively. However, it could all fit in one row, since there is 1:1 mapping between the Cyrillic and Roman script, and we could effectively reuse the tr= parameter of {{t}} to link to Roman script entry. For example, for English "thought":

I could make a special template {{t-sh}} that would take 4 parameters, the Cyrillic script entry with and without accent marks, and Latin script entry with and without accent marks. Parameters with accents would be optional, of course. Whaddaya think? --Ivan Štambuk 01:51, 22 July 2009 (UTC)Reply
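The 1:1 mapping can be sketched in a few lines of Python. This is only an illustration of the Cyrillic-to-Latin direction, which is deterministic; the reverse direction is harder for a program because the Latin digraphs lj, nj and dž are occasionally two separate letters. The function name is hypothetical and this is not how any existing template works.

```python
# Serbo-Croatian Cyrillic letters and their Latin counterparts.
# Three Cyrillic letters correspond to Latin digraphs (lj, nj, dž).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Convert Cyrillic-script text to Latin script, letter by letter."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            # An uppercase Cyrillic letter yields a title-cased digraph (Љ -> Lj).
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            # Anything else (spaces, punctuation, combining accents) passes through.
            out.append(ch)
    return "".join(out)


latin_word = cyrillic_to_latin("мисао")
```

Here `latin_word` is misao, the Latin-script spelling of the Cyrillic input, so a template given only the Cyrillic entry could in principle derive the Latin link automatically.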

IMO it would not be right to have the roman spelling in the place of a transliteration, as in SC entries we are treating them equally. I have seen this done before with Serbian translations and think it just looks silly. I should be much more happy with something like this:
This way both forms would get an interwiki link and look like equally valid modes of writing (which they are, of course). It might also be a good idea to add tooltips to the words, which would say "Cyrillic spelling" and "Roman spelling" (particularly useful where the only letters used are ones whose forms exist in the other alphabet as well). I am not sure how ekavian/ijekavian fits into all this, though. – Krun 08:42, 4 September 2009 (UTC)Reply

inflected forms

I've created templates {{sh-form-noun}} and {{sh-form-verb}} for the inflected forms of Serbo-Croatian verbs and nouns.

What bothers me is the question of whether it makes sense to link the Cyrillic and Roman spellings of the inflected forms, as we do for the lemma in the inflection lines.

I think that they shouldn't be linked, unless there is an actual definition line provided, and not just mere e.g. "genitive singular of XXX". Sometimes folks create example sentences, pronunciations and translations for inflected forms - in that case it would IMHO be justified to have Cyrillic/Roman spellings linked, but not otherwise. --Ivan Štambuk 20:07, 23 August 2009 (UTC)Reply

Montenegrin

I've added to Index:Montenegrin all the dictionary entries from Montenegrin orthography that was published a few months ago (you can find an online copy of it here, at the website of Montenegrin government). I've removed from the index all of the entries not passing our CFI (various proper names mostly). As you can see, it also contains various notes on the inflected forms, usage or spellings of words, in certain doubtful cases. The conclusions:

  1. New Montenegrin standard is a superset of old Ijekavian Serbo-Croatian standard. Exactly nothing of what was allowed before (of spellings, lexis etc.) is now "forbidden". The standardisation effort was primarily all-encompassing in character, also allowing for some Montenegrin-specific features.
  2. It has many "double forms" (dublete), just as Bosnian standard, which again makes it another transitional form of Serbo-Croatian, something between Western and Eastern tradition. For example:
    1. It allows nouns in both -ist (Croatian) and -ista (Serbian): basist / basista, kapitalist / kapitalista etc.
    2. It allows both relative adjectives in -ski (Croatian) and -ioni (Serbian): koalicijski / koalicioni, melioracijski / melioracioni etc.
    3. It allows both nouns in -kt (Croatian) and -kat (Serbian): objekt / objekat, dijalekt / dijalekat etc.
    4. It allows both nouns in -nt (Croatian) and -nat (Serbian): akcent / akcenat, ambijent / ambijenat etc.
    5. It allows both nouns in -ica (Croatian) and -ka (Serbian): tužiteljica / tužiteljka, but apparently only učiteljica and only atentatorka ?
    6. It allows both verbs in -irati (Croatian) and -ovati (Serbian): apsorbirati / apsorbovati, apstrahirati/apstrahovati. But apparently only koordinirati, opstruirati, and only dezinfikovati, ekstrahovati ?
    7. Various misc. double forms which do not exist in C/S standards: metod / metoda, kafa/kava, mladež / omladina, Talijan / Italijan, nesrećnik / nesretnik etc.
  3. The two new "letters" <ś> and <ź>, denoting "new ijekavian iotation" sounds resulting from former sequences <sj> and <zj>. Both the "older" spellings and the newer spellings are allowed. The major problem with these new spellings is the fact that almost all of them are hypothetical, and do not pass our CFI. AFAIK, there are no Montenegrin works published yet that utilize the new orthography. (The only place I've seen them used is the papers of Vojislav Nikčević.) Try googling some of them (but under quotes "", so that diacritics matter) - the only result is usually the abovelinked PDF, or various internet sites (usually Web fora) where these new spellings are listed (i.e. not actually used). My suggestion would be that we do not add them unless evidence of their actual usage is provided. My guess is that it ain't gonna be quite soon, because 99% of people don't know how to type them, and 90% of Montenegro doesn't even use those sounds at all. But there is no doubt that sooner or later their usage will surface (it's the Balkans, after all, and differences matter, however trivial they be :) When they become abundant enough that we can cite them successfully, I suggest we treat them at ==Alternative forms== with a (Montenegrin) label. (They can also easily be machine-generated from existing entries.)

All the other cases are treated with the usual "least common inclusion" criteria - only if a single specific form, when there are multiple variants, is confined to a specific standard do we indicate it. E.g. the Croatian standard only supports verbs in -irati, so we only add (Croatian) to them, but not also Montenegrin and/or Bosnian (the Bosnian standard in theory also supports both verbs in -irati and -isati/-ovati, but the former ones are rarely used in practice, I think). With all that in mind, the integration should be pretty much painless: the (Montenegrin) tag would occur only in "new iotation" spellings (<ś> and <ź>, plus the <đ> resulting from earlier /dj/, as in đevojka - these kinds of forms are sub-standard in modern B/C/S, although they have plenty of historically attested usage!), and in dual forms of which only one is "allowed". This latter category, however, still needs to be fully determined, because it seems that the criteria for them in the orthographic dictionary are quite random (at least to me, note the question marks in 2.5 and 2.6) --Ivan Štambuk 16:44, 3 September 2009 (UTC)Reply
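As a rough illustration of how the <ś>/<ź> spellings could be machine-generated from existing entries, here is a minimal Python sketch. It assumes Latin-script, lowercase input and blindly rewrites every <sj> and <zj>; not every such sequence necessarily qualifies, so the output is only a list of candidates that would still need attested usage before being added. The function name is hypothetical.

```python
# Digraph -> new Montenegrin letter, for the "new ijekavian iotation".
# Only sj/zj are handled; dj -> đ and tj -> ć are left out on purpose.
NEW_IOTATION = [("sj", "ś"), ("zj", "ź")]


def montenegrin_candidate(word: str):
    """Return the candidate iotated spelling, or None if nothing changes."""
    candidate = word
    for digraph, letter in NEW_IOTATION:
        candidate = candidate.replace(digraph, letter)
    return candidate if candidate != word else None


existing_entries = ["sjekira", "zjenica", "djevojka", "mlijeko"]
candidates = {
    entry: montenegrin_candidate(entry)
    for entry in existing_entries
    if montenegrin_candidate(entry) is not None
}
```

Entries without an <sj> or <zj> sequence are skipped, so `candidates` only contains words for which a new spelling would have to be verified.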

An example new Montenegrin standard word I've added: đevojka/ђевојка. Paradoxically, with citations mostly from Croatian and Serbian WikiSource :D (and also one from a modern Montenegrin periodical). This demonstrates the flexibility of the common Serbo-Croatian treatment, similar to what we already have with "Serbianisms" in Croatian (not necessarily modern standard idiom) and "Croatianisms" in Serbian (not necessarily modern standard idiom): the form đevojka is actually spoken by today's Croats, Bosniaks and Serbs, it can be attested in such usage by Croat/Bosniak/Serb authors, but it's sub-standard in the modern literary language. However, in Montenegrin it's also part of the modern standard idiom, together with the other form djevojka (without the iotation). This latter form is shared with the B/C/S standards, so we simply treat it as unmarked ==Serbo-Croatian==. The former form, đevojka, we simply treat as "Montenegrin", despite the fact that it has regional, sub-standard usage outside Montenegro (in the area where I grew up in southern Croatia, they also say đevojka, so I'm particularly fond of such iotated forms :D). If we treated them all separately, we'd need to dig out citations for every B/C/S section from Bosniak/Croat/Serb authors, and mark it there as {{regional}}, {{dialectal}} and {{nonstandard}} - which could be done but would be a major PITA. This way we simply ignore the issue altogether and treat it as a part of the common SC system (new ijekavian iotation is by no means an "ethnically Montenegrin" sound change!), using labels to differentiate among modern standards. --Ivan Štambuk 21:37, 3 September 2009 (UTC)Reply

Removing languages


I honestly cannot agree with the removal of Bosnian, Serbian, and Croatian entries that are already written in favor of Serbo-Croatian. While they might be nearly identical, the chances that three languages are close enough to overlap almost completely are very slim. Furthermore, consensus wasn't reached the last time this went up for vote, so I really don't see why the already established entries in the already established languages are being deleted in favor of Serbo-Croatian. I honestly don't think that this is the best thing for the English Wiktionary at this point in time, and even though we may have few to no native speakers in any of the three official languages, that doesn't mean that there won't be some in the near future. I honestly think that maybe if people stopped leaning so hard to merge all four of these languages, we might actually get some more editors from the Croatian, Bosnian, and Serbian Wikipedias and Wiktionaries, which would be a very good thing. Maybe we could try a test of not removing any sections that already exist and see if maybe the interest in them picks up after a while? If sufficient interest isn't piqued in, say, a month or two, then I would be forced to agree with the Serbo-Croatian point of view, because we have more native sh speakers than we do any of the Bosnian, Serbian, or Croatian speakers, but I still think that it would be worth a shot to try to recruit more of those kinds of users to help out in those areas. Serbo-Croatian might be growing now, but I don't think that it has enough popularity yet to replace the official languages. I hope you at least think over some of the suggestions that I gave in this post! Amikas, Razorflame 23:37, 15 January 2010 (UTC)Reply

The chances are that you have no idea what you're talking about, and that pseudorant of yours is a targeted trolling attack. --Ivan Štambuk 23:45, 15 January 2010 (UTC)Reply
This isn't a rant and I am not trying to "troll". I am merely trying to work out a solution that would benefit everyone, not just the people editing in Serbo-Croatian. If more Bosnian, Croatian, and Serbian editors come here, then they might start to like Serbo-Croatian as well and your numbers will grow, so that would be a positive thing. It would also be a positive thing if they just wanted to come here to edit in their own individual languages and not the conglomerate of the languages. I think that giving people the chance to edit in the individual languages would be very beneficial to the English Wiktionary.
As for the part about me not knowing what I am talking about, I know what I am talking about. I've read the June vote for merging all four languages together, so I know the arguments for and against it, so please don't say that I don't know what I am talking about. Thanks, Razorflame 00:00, 16 January 2010 (UTC)Reply
Well, he does have a point, Ivan. We all know you weren't keen on unified treatment when you came here. Others could be convinced as well if they are actually working on this stuff for a prolonged period. Coexistence for some time is OK (like you're doing with Pepsi Lite and Elephantus's entries), although the long-term goal would be unification. Still, I think merging of existing entries, where the Serbo-Croatian and/or one or more of the other sections is missing and something extra is added as well (and they are not from PL, Eleph., etc.) is OK; in other words, the current practice is probably already a good compromise, as new contributors get to do their thing uninterrupted. – Krun 00:11, 16 January 2010 (UTC)Reply
I would agree to that, as well as Bosnian, Serbian, and Croatian entries staying the way they are. I'm talking about the ones that have already been made. I don't see any reason why they should just be deleted if someone went through the trouble of making them in the first place. While your compromise is good, Krun, I thought that an even better compromise would be that, when a Serbo-Croatian section needs to be added to an entry, the other languages are left alone, because even if they look identical, that does not mean that they are identical. To tell you the truth, I'm not too surprised to not see many native Bosnian, Serbian, or Croatian editors editing here, because they would probably edit with the fear that all the hard work that they put into the English Wiktionary could just be wiped out in a second or two with the "merging" into Serbo-Croatian. Frankly speaking, I don't think that any unification of the languages should occur without the community first expressly showing its approval for it. Razorflame 00:28, 16 January 2010 (UTC)Reply
I'm not sure you fully understood me. I was talking about the current practice of not merging entries by non-unification-supportive hr/bs/sr-speakers, while continuing to merge entries by Dijan, Ivan, etc., if and when they are expanded. – Krun 00:44, 16 January 2010 (UTC)Reply
Ok. I think I get what you mean now. Basically, Ivan and Dijan go about making new entries, but not merging any of the old or preexisting ones, and then, when we get more hr, bs, and sr editors, they can add their own languages to the articles that have already been made. Is that what you were getting at? Razorflame 00:52, 16 January 2010 (UTC)Reply
Entries created by editors supporting the merger are merged. Those created by those not supporting the merger (the whole 0.1% of them) are not. Absolutely nothing forbids these hypothetical editors of yours eager to add new B/C/S entries to do so, with the presence of existing ==Serbo-Croatian== entries. But they choose not to. Why? Because these would be identical in some 98% of cases, and the whole effort of "proving" that these are "different languages" would end up being a ludicrous exercise in "How strongly does my nationalism defy common sense". --Ivan Štambuk 01:00, 16 January 2010 (UTC)Reply
Still, it might benefit us to make it more clear that entries for new words are very welcome, even if they are not under the common header. They could then possibly be merged if that does not cause any strife; otherwise, not yet. – Krun 01:14, 16 January 2010 (UTC)Reply
New content is of course always welcome. On the other hand, any newbie Wiktionary editor for SC will sooner or later find out that > 90% of the most common words already have a thorough ==Serbo-Croatian== entry, more thorough than he could've possibly created by himself initially, and is likely to get discouraged and demotivated to continue his activity other than in the merger direction. It's like fighting with Borg: nationalism is futile, you will be assimilated. He could find escape in some particular Croatian-only or Serbian-only words (take a look at contribs of Kubura, Roberta F.), but the absolute majority of the most relevant lemmata is ultimately common, out of their selfish reach. In any case, any editors that showed interest so far in creating (in practice more like: duplicating/triplicating/quadruplicating) separate B/C/S entries have already been aware of the direction of the merger and what they are allowed to do. Razorflame here is the only person I've seen getting confused about it, but one should bear in mind that his involvement with this topic is of random and sporadic interest, and that he has no intention of learning SC or contributing SC words here. --Ivan Štambuk 04:59, 16 January 2010 (UTC)Reply

(unindenting) What I don't get is why we can't have four headers for all four languages on an entry. Even though they are all very close, they have their differences, so I really think that they should just be added onto instead of just deleted. I don't really think that merging is the way to go here, because then the same problem arises when, say, a Serbian editor starts making entries in the Serbian language. He stops after a couple of hours, satisfied with a job well done; the next morning, when he gets on, he finds that his work was deleted and replaced with someone else's work. Can't you imagine how that might make you feel if you were that person? I honestly don't see what the big deal is with just keeping the four languages separate until a consensus with the community is met to merge them. Razorflame 01:17, 16 January 2010 (UTC)Reply

Well, Razorflame, this is exactly the sort of thing that is not happening right now. Perhaps after he has created a few entries, one (only one) of them is expanded and made into Serbo-Croatian and a message is posted on his talk page explaining what we're trying to achieve and asking him how he feels about it. If he's fine with it, all his entries will be converted and he will presumably create entries under the common header in future; if not, his ==Serbian== header will be reinstated immediately and he will be left alone (unless there is something simply wrong with his edits). – Krun 01:28, 16 January 2010 (UTC)Reply
Yeah. I would be fine with that. If you let the person who made the article know that you rewrote his article, I would have no more problems with this issue. Ivan, would you be willing to do that? If you are, I will have no more qualms about you doing what you are doing, and I probably will support you in your Serbo-Croatian quest :). Please forgive me if I spoke out of turn with my last message, Ivan. I truly am sorry for speaking a little out of turn with you. The reason why I said there were differences is because I saw an English Wikipedia article on just the differences between the four languages. I, myself, don't speak Bosnian, Croatian, Serbian, or Serbo-Croatian; however, I do have a decent interest in Serbo-Croatian. If you can forgive me for speaking out of turn, I would be very grateful. All that I ask is that you let someone know if you completely rewrite their article. You won't have to if they are not active any longer, but if they are active, I would really appreciate it if you could do that common courtesy for them.
The interest in Serbo-Croatian originated from watching you edit every day, Ivan. I've always been fascinated with your edits since I started watching you edit quite some time ago. You really do a great job here. Anyways, thanks for reading this message: Razorflame 05:01, 16 January 2010 (UTC)Reply
No entries are being merged that are not created by editors supporting the merger. Dijan and I in particular authored > 99.9% of separate B/C/S entries before the merger was proposed. Since the merger started, I haven't seen any new SC (B/C/S/M) editor on Wiktionary that wasn't already aware of the merger issue (lots of them were nationalists canvassed from Croatian and Serbian Wikipedia for the particular purpose of "proving" that we're dealing with different languages). I don't touch or merge entries created by them (I only check them sometimes to see if they contain any errors, as they often do, often to their astonishment when they find out that Serbs also use what they thought to be "Croatian words" or vice versa).
That article on WP that you mention is really bad and needs to be rewritten from scratch. Half of it is simply junk (political nonsense, differences not of standards which are 99% identical but of colloquial speech/dialects etc.) that needs to be relocated elsewhere. It's very misleading to draw any kind of conclusions from it, as the article was deliberately crafted by a PoV fundamentalist to make it appear as if there are some significant differences when in fact there are none. When you read e.g. the sections on accentuation and phonology you might imagine that these are some kind of relevant factor, when in fact in modern standard idioms (described in the grammar books) these are almost 100% identical. That some areas do not distinguish /č/ and /ć/ has no bearing on the fact that these are separate phonemes in the standard idiom (Neoštokavian dialect). I can almost sympathize with the frustration of the person who wrote it that he couldn't have introduced more such "differences". --Ivan Štambuk 05:28, 16 January 2010 (UTC)Reply
Well, then, we are all good. I have no more problems with you doing what you are doing because you have proven to me that you are doing it right and that you are making good changes to the English Wiktionary. I have no more qualms with you doing what you are doing. Have fun editing, Razorflame 05:32, 16 January 2010 (UTC)Reply
Razorflame, pay attention: what you describe is not what's going on. I'm quite annoyed that you all of a sudden speak of "four languages". Whence does this sudden surge of interest of yours for Serbo-Croatian originate? Not so long ago you were singing praise on my talkpage for the great efforts done in expanding Serbo-Croatian entries over the last few months. Never did you even insinuate that you have problems with that. And now it's suddenly a matter of "deleting languages", when in fact entries are being heavily expanded and rectified.
I would also be interested if you could explain to me what these great differences between Serbo-Croatian varieties that merit separate treatment are that you speak of. Can you mention 10 major differences among Ijekavian Serbian, Bosnian, Croatian and Montenegrin off the top of your head? Would you describe them in volume/significance as larger or smaller than those of American and British English, German and Austrian or Swiss German, Brazilian and Portugal Portuguese, South American and European Spanish? --Ivan Štambuk 04:37, 16 January 2010 (UTC)Reply

cleanup lists


If somebody is interested, I could generate cleanup lists for Serbo-Croatian entries of interest, satisfying any particular criteria you can imagine. I currently have several for my own edit purposes. I'm not sure if it makes sense publicizing them on some of the subpages, but if anyone is interested please let me know, either here or on my talk page. I plan to run them periodically on the most recent dump to keep all the SC entries nice and clean :)

I'm also making preparations for the generation of form-of entries for the already present Serbo-Croatian entries. Almost all of them have inflection tables, and there are some 200-300k entries that ought to be generated. --Ivan Štambuk 00:44, 24 February 2010 (UTC)Reply

I'd help you go through some cleanup lists, if it doesn't involve adding hyphenation. :S But, about the form-ofs: have you got a really good scheme planned, e.g. with identical inflected forms of several words, especially when there is also a separate lemma with the same spelling? Because this does tend to get a bit messy sometimes (I've been cleaning up some Icelandic ones and would hardly recommend the creation of any new ones for a few years at least). I know Serbo-Croatian has reached a stage where it is OK to start really adding them, but it just has to come out so that nobody has to go through those 300k entries (yikes). There would need to be a separate etymology section (containing just "See {{term|lemma||gloss if needed}}.") for each separate word being inflected; differing accents on forms that are otherwise identical need to be handled smoothly (there comes another problem: these might currently get bundled together and written without an accent, because the accents are largely missing from inflected forms in tables) and differing pronunciation marked with appropriate tags. Also, would you create form-ofs at all if they're identical to the lemma? (e.g. accusative forms of masculine inanimate nouns). (SIGH) We shouldn't really have to do this at all. If only we weren't stuck with the limitations of MediaWiki. I so want a database format that effectively has a separate entry (page) for each separate word (etymologically separate within a language, or in another language). Each would have separate fields for marking the part of speech, headword line, each definition, etymology, each inflected form (the system would allow a template instead that would put all the forms in automatically, the forms changing dynamically in case the template is updated/corrected); then there could be custom categories etc.
This kind of database program wouldn't be so difficult to make, and none of that all-language-editions-together and one-to-one nonsense they're trying to do on OmegaWiki (that gets much too complicated). Then there would be a possibility for script conversion add-ons and only one entry would be needed, e.g. for each Serbo-Croatian lemma. Latin, Cyrillic, Glagolitic or Arabic script could be displayed throughout the entry if the user so chooses. Etymologies could always link to the right lemma, etc. One would be able to search by inflected form … Oh, how simple life would be. – Krun 09:07, 24 February 2010 (UTC)Reply
OK, once I get ŠtambukBot operational it will commence uploading various analyses of the newest dump of SC entries on a regular basis on a subpage yet to be specified. I'm currently struggling to encapsulate those python wikipedia scripts into a .NET assembly via the IronPython framework, so that their functionality can be consumed from less primitive programming environments.
Regarding the separate lemma issue: yes that should be handled gracefully. All the cases where the inflected form collides with the already-present SC entry should be treated separately as a special case, by etymology-splitting or expanding (adding additional etymology if there is already more than one present). In those cases, bot should replace the existing SC entry with a new one, and not merely create a new page, or append the generated entry at the bottom of an already existing page.
Noun cases being identical to lemma form (accents notwithstanding) are to be ignored. Methinks it's very wrong to either mix them with actual definitions, like they do it for Latin, or to add them as a separate "etymology" (it's exactly the same lexeme with the same etymology...). Inflected form entries are not definitions! They're formatted like definitions but they're not. They should in fact all be wrapped inside {{non-gloss definition}}. If there is an inflection table already present at the lemma form, it makes little sense to extend the real definition lines with morphological tags pertaining to that very same lexeme...
Regarding the accents: we should really just ignore them IMHO. E.g., vremena, as an inflected form of vrijeme / vreme, being either vrȅmena, vremèna or vreménā, should appear one beneath another, under the same ===Noun=== section. I personally am not particularly fond of the form-of entries at all, and see them as mere soft redirects to lemma entries containing all the extensive information. Serbo-Croatian accentuation schemes are very complex, for nouns alone there are almost 300 morphological-accentological inflectional paradigms, and given that lots of words can have both multiple base accents and multiple paradigms, the feat of synching the manually updated lemma form and the accompanying declensional table with the previously-generated form-ofs doesn't seem to be worth the effort. Form-ofs should be kept for minimal maintenance. If someone feels particularly masochistic and would like to expand them with pronunciations and stuff - be my guest! I'm only interested in their basic generation for now.
There are many other issues to be considered, the most important one being collision for Ijekavian/Ekavian forms, because these two are intrinsically connected and often collide in paradigms (e.g. in the aforementioned vremena, and very often in verbs where infinitive stems display Ijekavian/Ekavian pairing, but present stems are the same). And I also plan to do validation of all the inflection tables against HJP/HML, as well as my own set of algorithms, maintain a database of the bot's work so that for the future runs it would do all the checking and generation of missing form-of's completely automatically... It needs some planning, but once it's up and running it's a no-brainer business. With God's help, I'll also put an XML web service interface so that you can log on to a special page with your Wiki credentials and make it generate entries for your newly-added lemmata before your eyes ^_^ Lots of plans I have...
I agree with your rant against MediaWiki. It's a very primitive piece of technology, written in a horrible language, badly designed with short-sighted goals in mind, almost completely non-extendable and generally reeking of ad-hoc type of "solutions" so prevalent in open source technologies. Its template language is among the ugliest abominations conceived by a human cognitive apparatus. It's Turing-complete, yet it lacks basic short-circuit evaluation. You can do just about any useless thing conceivable, other than elementary string processing, addition of which would reduce the number of our templates by at least an order of magnitude. People responsible for the design of this ill-thoughted monster should be publicly executed under the charges of incurring countless hours of deep mental suffering to everyone who has had his enthusiasm for this project crushed by utilizing its utter dysfunctionality.
You're right, it wouldn't be that hard to have MediaWiki completely replaced by a more productive editing environment. Implementing a decent lemmatizer alone would reduce the Wiktionary "dump" database size by some 10 times. Or at least setting up such a more productive editing environment elsewhere, initially only for some high-profile languages, being mirrored in real time by a dedicated bot here.. Someone needs to sit down and lose some two-digit number of hours of his short life to achieve this noble goal. --Ivan Štambuk 11:21, 24 February 2010 (UTC)Reply
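The accent handling discussed in this thread (accented variants such as vrȅmena, vremèna and vreménā all landing on the unaccented page vremena, and forms identical to the lemma being skipped) could be sketched roughly as follows. This is only an illustrative Python sketch, not ŠtambukBot's actual code: the function names, the chosen set of accent marks, and the set of accent-bearing letters are all assumptions made for the example.

```python
import unicodedata
from collections import defaultdict

# Assumed set of combining marks used as prosodic accents:
# grave, acute, macron, double grave, inverted breve.
ACCENT_MARKS = {"\u0300", "\u0301", "\u0304", "\u030f", "\u0311"}
# Assumed accent bearers: vowels and syllabic r, Latin and Cyrillic.
ACCENT_BEARERS = set("aeiourAEIOURаеиоурАЕИОУР")


def strip_accents(form):
    """Remove prosodic accents but keep letters such as č, ć, š, ž."""
    out = []
    base = ""
    for ch in unicodedata.normalize("NFD", form):
        if unicodedata.combining(ch):
            # Drop the mark only when it sits on a vowel or r, so the
            # acute of ć (on a consonant) and the caron of č survive.
            if ch in ACCENT_MARKS and base in ACCENT_BEARERS:
                continue
        else:
            base = ch
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


def group_form_ofs(lemma, accented_forms):
    """Map each unaccented page title to the accented forms listed on it."""
    lemma_page = strip_accents(lemma)
    pages = defaultdict(list)
    for form in accented_forms:
        page = strip_accents(form)
        if page == lemma_page:
            continue  # forms identical to the lemma get no form-of entry
        if form not in pages[page]:
            pages[page].append(form)
    return dict(pages)
```

Calling `group_form_ofs` with a lemma and the accented forms read from its inflection table would give one bucket per page title, with the accented variants kept in order so they can be listed one beneath another.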

There is a cleanup list continuously being generated at User:ŠtambukBot/Report, in case anybody is willing to help out.
Another thing, what about {{wikipedia}} ? So far there have been multiple usages of it in a single entry, one for every separate wikipedia with a different lang= code. We could either

  1. Enhance {{wikipedia}} to support multiple language codes
  2. Create a specialized template to handle bs/hr/sr/sh pedias

In the second case, it might be preferable to have language codes as unnamed consecutive parameters (in any order), in order to reduce typing. In the default case when no code is provided, we could assume "link to all pedias". Pagenames would default to {{PAGENAME}}, with explicit override parameters by name. Thoughts? --Ivan Štambuk 03:19, 9 March 2010 (UTC)Reply
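To make the second option concrete, here is a small Python sketch of the parameter logic only: codes given positionally in any order, no codes meaning "all pedias", and per-code title overrides by name. It is not template code and does not describe any existing template; the function name `resolve_pedia_links` and the tuple output are hypothetical choices for illustration.

```python
# The four pedias mentioned in the proposal.
ALL_PEDIAS = ("bs", "hr", "sr", "sh")


def resolve_pedia_links(pagename, *codes, **overrides):
    """Return (language code, article title) pairs for the requested pedias.

    pagename  -- the entry title, standing in for {{PAGENAME}}
    codes     -- unnamed language codes in any order; none means all pedias
    overrides -- named per-code titles, e.g. sr="Београд"
    """
    selected = codes or ALL_PEDIAS
    unknown = [code for code in selected if code not in ALL_PEDIAS]
    if unknown:
        raise ValueError("unsupported language codes: " + ", ".join(unknown))
    # Each selected pedia links to the override if given, else the pagename.
    return [(code, overrides.get(code, pagename)) for code in selected]
```

For instance, `resolve_pedia_links("Beograd", "sr", "hr", sr="Београд")` yields the Serbian link with the Cyrillic title and the Croatian link with the default page name.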

Outrageous


The sentence:
All the other L2 headers for Serbo-Croatian varieties (==Bosnian==, ==Croatian==, ==Serbian== and ==Montenegrin==) are obsoleted by L2 ==Serbo-Croatian==.
is outrageous! How is it possible to do such a thing with several different (albeit very similar) languages which have different standard forms, regulated by different authorities? It's not much worse than, for example, unifying other mutually intelligible languages such as Russian and Ukrainian or Swedish and Norwegian. By the way, Norway itself has two acknowledged languages, so why not lump both Bokmål and Nynorsk simply as "Norwegian" (although there are even two different Wikipedias for those)? Regards, Arny 20:27, 10 August 2010 (UTC)Reply

The sentence is accurate, as it acknowledges the ineptitude of the artificial, concocted and specious distinction between the varieties of this South Slavic language. For further information, you may want to consult this controversial vote which cast some light on the issue. Your juxtaposition of the Serbo-Croatian issue with Scandinavia and Russia/Ukraine is maladroit and misleading. The appropriate comparison is with the Austrian dialect of German, Flemish dialect of Dutch, Bornholm dialect of Danish and non-Europæan dialects of English (by the way, English is not regulated by a central body either). Please try to switch to a more linguistic approach, as shown by (the bulk of) discussions on yonder vote page (where it was demonstrated that the differences between US and Commonwealth English are by far more considerable than those between the Štokavian varieties of Serbo-Croatian), in lieu of that political cant. The uſer hight Bogorm converſation 21:08, 10 August 2010 (UTC)Reply
Are you suggesting that the speeches of Zagreb, Sarajevo, Belgrade and Podgorica are mutually unintelligible? If so, you're either completely ignorant of the topic, or blatantly lying. Either way, we're not interested in your personal opinion. Unless you have useful or otherwise productive objections to set out as regards the feasibility of the proposal, it is advisable to take your politicking and ideological pontificating elsewhere. --Ivan Štambuk 22:20, 10 August 2010 (UTC)Reply
You are right Arny, this is outrageous. Croats, leave Serbian language alone! --Pepsi Lite 01:03, 11 August 2010 (UTC)Reply
IMO, it's as ridiculous to have separate Serb and Croat entries of a word when they are identical as it is to have separate British and American entries when they are identical. We should say "Serbo-Croatian" for the one, and "English" for the other. Same for Hindi-Urdu/Hindustani, Malay/Malaysian-Indonesian, and Tagalog/Filipino. If, on the other hand, Croatian has a word which is not used in Serbian, then it is proper to tag it as "Croatian", just as we would "American English".
The real diversity, of course, is within Croatian, not between (Standard) Croatian and (Standard) Serbian. kwami 05:39, 9 October 2010 (UTC)Reply
You are right Kwami, it is ridiculous to have Croatian words identical to Serbian. I don't like it either. The solution is to remove the damage caused by Croatian Serbophiles like Ljudevit Gaj, Tomislav Maretić, Ivan Broz, etc.
Croatian language before the damage caused in 1835, 1850 differs from Serbian in the following ways:
  • Croatian alphabet did not have the letters: đ, ž, š, ć, č, lj, nj, dž.
  • Croatian language had different plural grammatical case endings in the 3rd (dative), 6th (instrumental) and 7th (locative) cases, which means every noun and adjective is different between the 2 languages. Below are some examples (there's many more):
    1. Masculine nouns zero ending in nominative singular
      • Serbian: šaranima, šaranima, šaranima
      • Croatian: šaranom, šarani, šaranijeh
    2. Neuter nouns ending with -o or -e in nominative singular
      • Serbian: selima, selima, selima
      • Croatian: selom, seli, selijeh
    3. Masculine or feminine nouns ending with -a in nominative singular
      • Serbian: ženama, ženama, ženama
      • Croatian: ženam, ženami, ženah
    4. Feminine nouns ending with in nominative singular
      • Serbian: stvarima, stvarima, stvarima
      • Croatian: stvarim, stvarmi, stvarih
    5. Pronouns (plural instrumental)
      • Serbian: nama, vama
      • Croatian: nami, vami
    6. Adjectives (plural dative, instrumental, locative) of all genders
      • Serbian: sam-im (-ima, -ijema); sam-im (-ima, -ijema); sam-im (-ima, -ijema)
      • Croatian: samim; samimi; samih
      • Serbian: naš-im (-ima, -ijema); naš-im (-ima, -ijema); naš-im (-ima, -ijema)
      • Croatian: naš-im; naš-imi; naš-ih
    7. Adjectives (plural dative, instrumental, locative) of all genders
      • Serbian: žut-im (-ima, -ijem, -ijema), vruć-im (-ima, -ijem, -ijema); žut-im (-ima, -ijem, -ijema), vruć-im (-ima, -ijem, -ijema); žut-im (-ima, -ijem, -ijema), vruć-im (-ima, -ijem, -ijema)
      • Croatian: žutim, vrućim; žutimi, vrućimi; žutih, vrućih
  • Croatian grammar has a third future tense that is equal to the future tense in the Russian language, but is non-existent in Serbian.
  • When two verbs have the same subject, Croatian language has to express the second verb as an infinitive, whereas Serbian can also use the sequence da + present tense.
  • Differences in spelling:
    • Croatian: bankir, bitost, bjelanjak, bolesničarka, cijelj, čišći, čižma, dalnji, šport, neovisnost,
    • Serbian: bankar, bitnost, bjelance, bolničarka, cilj, čistiji, čizma, daljni, sport, nezavisnost,
  • Differences in vocabulary:
    • Croatian: akoprem, bedast, djelokrug, dokinuti, domjenak, domovnica, dostatan, droptina, zrakomlat, rabiti, žurno
    • Serbian: premda, lud, područje, ukinuti, razgovor, zavičaj, dovoljan, mrva, helikopter, koristiti, hitno
  • Mr. Štambuk does not really believe that Croatian language is Serbian, he is only trolling for attention as described by Mr. Ullmann, and has many Croatian supporters on English Wikipedia for reasons described here.
  • Croats, leave Serbian language alone! --Pepsi Lite 02:15, 22 October 2010 (UTC)Reply

Who authorized this?


This is not agreed upon.

 

and the content of the respective sections should be subsumed under the Serbo-Croatian section.

 

This is not ok.

WT:ASH is in this category Category:Wiktionary think tank policies.

 

"Think tank" policies are unofficial policies that have not been approved by a vote. For official policies, see Category:Wiktionary policies.

 


Regards, -- Bugoslav 15:59, 8 June 2011 (UTC)Reply


And these deletions also?

  • 2011-06-06T16:23:58 Krun (Talk | contribs) deleted "Category:Serbian declension-table templates" ‎ (no longer used; see Category:Serbo-Croatian templates)
  • 2011-06-06T16:23:44 Krun (Talk | contribs) deleted "Template:sr-decl-noun" ‎ (no longer used; see Template:sh-decl-noun)
  • 2011-06-04T00:31:33 Mglovesfun (Talk | contribs) deleted "Template:sr-noun" ‎ (no longer used (but see Template:sh-noun))
  • 2011-05-26T18:50:00 Mglovesfun (Talk | contribs) deleted "Template:sr-adj" ‎ (nonstandard form of Template:sh-adjective)

These too:

  • 2011-06-01T14:07:38 Mglovesfun (Talk | contribs) deleted "Category:Croatian neologisms" ‎ (no longer used (but see Category:Serbo-Croatian neologisms))
  • 2011-06-01T14:07:39 Mglovesfun (Talk | contribs) deleted "Category:Serbian neologisms" ‎ (no longer used (but see Category:Serbo-Croatian neologisms))
  • 2011-06-01T15:01:15 Mglovesfun (Talk | contribs) deleted "Category:sr:Sports" ‎ (no longer used (but see Category:sh:Sports))
  • 2011-06-01T15:01:23 Mglovesfun (Talk | contribs) deleted "Category:bs:Sports" ‎ (no longer used (but see Category:sh:Sports))
  • 2011-06-02T15:36:28 Mglovesfun (Talk | contribs) deleted "Category:sr:Months" ‎ (no longer used (but see Category:sh:Months))
  • 2011-06-02T15:36:55 Mglovesfun (Talk | contribs) deleted "Category:bs:Months" ‎ (no longer used (but see Category:sh:Months))
  • 2011-06-02T15:38:55 Mglovesfun (Talk | contribs) deleted "Category:bs:Time" ‎ (no longer used (but see Category:sh:Time))
  • 2011-06-04T00:31:33 Mglovesfun (Talk | contribs) deleted "Template:sr-noun" ‎ (no longer used (but see Template:sh-noun))
  • 2011-06-04T00:41:27 Mglovesfun (Talk | contribs) deleted "Template:bs-noun" ‎ (no longer used (but see Template:sh-noun))
  • 2011-06-04T12:00:56 Mglovesfun (Talk | contribs) deleted "Category:Serbian male given names" ‎ (no longer used (but see Category:Serbo-Croatian male given names))
  • 2011-06-04T12:19:37 Mglovesfun (Talk | contribs) deleted "Category:bs:Countries" ‎ (no longer used (but see Category:sh:Countries))
  • 2011-06-04T12:19:49 Mglovesfun (Talk | contribs) deleted "Category:sr:Countries" ‎ (no longer used (but see Category:sh:Countries))
  • 2011-06-05T13:13:01 Mglovesfun (Talk | contribs) deleted "Category:Serbian adjectives" ‎ (Empty category, but see Category:Serbo-Croatian adjectives)
  • 2011-06-05T13:19:22 Mglovesfun (Talk | contribs) deleted "Category:Serbian verb forms" ‎ (Empty category, but see Category:Serbo-Croatian verb forms)
  • 2011-06-05T13:28:40 Mglovesfun (Talk | contribs) deleted "Category:Serbian conjunctions" ‎ (Empty category, but see Category:Serbo-Croatian conjunctions)
  • 2011-06-05T13:31:01 Mglovesfun (Talk | contribs) deleted "Category:Serbian interjections" ‎ (Empty category, but see Category:Serbo-Croatian interjections)
  • 2011-06-05T13:31:56 Mglovesfun (Talk | contribs) deleted "Category:Serbian palindromes" ‎ (Empty category, but see Category:Serbo-Croatian palindromes)
  • 2011-06-05T20:55:23 Mglovesfun (Talk | contribs) deleted "Category:Croatian pronouns" ‎ (Empty category, but see Category:Serbo-Croatian pronouns)
  • 2011-06-06T00:30:50 Mglovesfun (Talk | contribs) deleted "Category:hr:Continents" ‎ (no longer used, but see Category:sh:Continents)
  • 2011-06-06T00:34:38 Mglovesfun (Talk | contribs) deleted "Category:hr:Islands" ‎ (no longer used, but see Category:sh:Islands)
  • 2011-06-06T12:29:11 Mglovesfun (Talk | contribs) deleted "Category:Bosnian palindromes" ‎ (no longer used (but see Category:Serbo-Croatian palindromes))
  • 2011-06-06T12:31:44 Mglovesfun (Talk | contribs) deleted "Category:Serbian colloquialisms" ‎ (no longer used (but see Category:Serbo-Croatian colloquialisms))
  • 2011-06-06T12:31:55 Mglovesfun (Talk | contribs) deleted "Category:Serbian vulgarities" ‎ (no longer used (but see Category:Serbo-Croatian vulgarities))
  • 2011-06-06T12:38:37 Mglovesfun (Talk | contribs) deleted "Category:hr:Dialectal" ‎ (--explanation of deletion--)
  • 2011-06-06T14:59:26 Mglovesfun (Talk | contribs) deleted "Category:Translation requests (Bosnian)" ‎ (please use Category:Translation requests (Serbo-Croatian))
  • 2011-06-06T14:59:29 Mglovesfun (Talk | contribs) deleted "Category:Translation requests (Serbian)" ‎ (please use Category:Translation requests (Serbo-Croatian))
  • 2011-06-06T14:59:31 Mglovesfun (Talk | contribs) deleted "Category:Translation requests (Croatian)" ‎ (please use Category:Translation requests (Serbo-Croatian))
  • 2011-06-06T23:54:10 Mglovesfun (Talk | contribs) deleted "Category:Croatian phrasebook" ‎ (Empty category, but see Category:Serbo-Croatian phrasebook)
  • 2011-06-06T23:54:49 Mglovesfun (Talk | contribs) deleted "Category:Croatian idioms" ‎ (Empty category, but see Category:Serbo-Croatian idioms)
  • 2011-06-06T23:56:14 Mglovesfun (Talk | contribs) deleted "Category:Croatian phrases" ‎ (Empty category, but see Category:Serbo-Croatian phrases)
  • 2011-06-08T00:52:14 Mglovesfun (Talk | contribs) deleted "Category:Translations to be checked (Bosnian)" ‎ (please use Category:Translations to be checked (Serbo-Croatian))
  • 2011-06-08T00:52:17 Mglovesfun (Talk | contribs) deleted "Category:Translations to be checked (Croatian)" ‎ (please use Category:Translations to be checked (Serbo-Croatian))
  • 2011-06-08T00:52:38 Mglovesfun (Talk | contribs) deleted "Category:Translations to be checked (Serbian)" ‎ (please use Category:Translations to be checked (Serbo-Croatian))

Who? -- Bugoslav 16:55, 8 June 2011 (UTC)

It was sanctioned by de facto community consensus. --Ivan Štambuk 18:18, 8 June 2011 (UTC)
'Authorized' is totally the wrong word here; it makes us sound like the military! We're not, we're a group of volunteers. You could just as easily turn it round. Who authorized keeping these categories and templates? Nobody, that's who. Mglovesfun (talk) 21:42, 30 December 2011 (UTC)