Module talk:hi-noun

Latest comment: 2 years ago by Taimoorahmed11 in topic Width of template

Module creation discussion from Module talk:hi-decl/noun

@AryamanA, Atitarev Anatoli asked me to implement auto-declension of Hindi nouns but it looks like it's already partly implemented. What I've been working on is more general, supporting multiword expressions (e.g. adjective-noun expressions), singular-only and plural-only nouns, alternative declensions, overrides, footnotes, phonetic respelling and similar sorts of things; basically the same things that Module:uk-noun, Module:be-noun, Module:bg-nominal and Module:ru-noun already support. Some of these things may be less necessary for Hindi, which has much simpler declensions than the Slavic languages. The syntax would be essentially the same as is implemented by Module:uk-noun and Module:be-noun, but with the addition of phonetic respelling support. Is this work still needed? Once implemented, I was planning on running a bot to convert all the template-based declensions to the new {{hi-ndecl}}. Benwing2 (talk) 21:57, 9 August 2020 (UTC)Reply

@Benwing2: Oh that would be wonderful. This is a... skeleton of a module, and I don't have as much time to dedicate to working on anything more intensive than making new entries these days. If you're interested in working on it, I would be more than happy to let you take over! And from there, it would not be hard to extend it to Punjabi/Gujarati/Marathi/Bengali/other Indo-Aryan languages with similar systems. —AryamanA (मुझसे बात करेंयोगदान) 22:29, 9 August 2020 (UTC)Reply
@Benwing2, AryamanA: Thanks, guys. @AryamanA, is there a native way to force the schwa NOT to be dropped, as in वास्तुकला (vāstukalā)? I understand that the "a" is not silent here for etymological reasons: वास्तु (vāstu) + कला (kalā). If there is no native way to do that, would an additional (invisible) symbol after क be appropriate (to force "ka", not "k"), or should we use a manual transliteration in such cases? For @Benwing2, it's probably the same effort, but I think we should really use native ways as much as possible; it will also be helpful for the pronunciation module. BTW, Burmese modules (by Wyang) use a bunch of tricks (symbols) where native methods don't work. --Anatoli T. (обсудить/вклад) 23:01, 9 August 2020 (UTC)
@Benwing2, Atitarev: None that I am aware of (and thus none that would be familiar to any Hindi speaking/learning Wiktionary user). One idea: how about allowing an indicator for syllable breaks in the respelling or something like that? I think that would resolve most, if not all, of the schwa dropping problems in declensions. ॰ could be used for that, and it's native, so e.g. वास्तु॰कला. —AryamanA (मुझसे बात करेंयोगदान) 03:06, 10 August 2020 (UTC)
@Atitarev, AryamanA Sounds good to me. Can you maybe fix Module:hi-translit to respect this symbol and not drop schwa after it? If that isn't the right thing to do, I'll look into implementing separate handling of this symbol for respelling purposes. Benwing2 (talk) 03:10, 10 August 2020 (UTC)Reply
@Benwing2: It already works: वास्तु॰कलाएँ (vāstu.kalāẽ)AryamanA (मुझसे बात करेंयोगदान) 03:12, 10 August 2020 (UTC)Reply
@AryamanA Hmm, OK, it adds an extra dot but I can take that out when generating the respelling translit. Benwing2 (talk) 03:13, 10 August 2020 (UTC)Reply
@Benwing2: Yeah so it is used as an abbreviation marker generally, but I have seen it in dictionaries to indicate morpheme boundaries. So yes, in respelling maybe it can be removed especially. —AryamanA (मुझसे बात करेंयोगदान) 03:14, 10 August 2020 (UTC)Reply
(edit conflict) @Benwing2, AryamanA: Using ॰ seems interesting and it actually helps with the etymologies, but should it be suppressed (be invisible) for our purposes, so that the respelling वास्तु॰कला (vāstu.kalā) is displayed as वास्तुकला and transliterated as "vāstukalā", and ॰ is only used as a hint for the module? --Anatoli T. (обсудить/вклад) 03:18, 10 August 2020 (UTC)
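A minimal sketch of the suppression being proposed here, in Python rather than the module's own Lua. The function names are hypothetical, and it assumes that the only dots in the transliteration are the ones produced by ॰ (U+0970); the actual module may handle this differently.

```python
# U+0970 DEVANAGARI ABBREVIATION SIGN, used in this thread as a morpheme-boundary hint.
BOUNDARY = "\u0970"


def display_form(respelling: str) -> str:
    """Return the respelling with the boundary hint removed, for display."""
    return respelling.replace(BOUNDARY, "")


def clean_translit(translit: str) -> str:
    """Remove the dot that the boundary hint leaves in the transliteration.

    Assumption: no other dots occur in the transliteration.
    """
    return translit.replace(".", "")


if __name__ == "__main__":
    respelling = "वास्तु" + BOUNDARY + "कला"
    print(display_form(respelling))
    print(clean_translit("vāstu.kalā"))
```

The hint stays in the respelling parameter, so the transliteration module still sees the boundary; only the displayed form and the final transliteration are cleaned.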

Module creation

I created this module as an anchor for discussions, as it will in time become the full Hindi declension module. Benwing2 (talk) 00:50, 10 August 2020 (UTC)Reply

Independent variants of various declensions

@Atitarev, AryamanA डोई (ḍoī) is a unique declension in spelling but not pronunciation, due to the difference in Devanagari between independent and diacritic variants of vowels. It would be helpful for me if either of you could point out more examples of independent variants of the various declensions, and in particular create declensions for these terms using {{hi-decl-noun}}, as is done on डोई (ḍoī). An example with independent masculine -ā is कुँआ (kũā), which is currently a hard redirect to कुआँ (kuā̃) but should be fixed. Benwing2 (talk) 00:50, 10 August 2020 (UTC)

@Benwing2, AryamanA: Yes, these inflections are logical (expected), despite the spellings. I will look for more words ending in the independent long vowels आ (ā), ई (ī), ऊ (ū).
I also want to know if there are nouns ending in a visarga (ः), or more likely अः (aḥ), and how they are inflected. --Anatoli T. (обсудить/вклад) 02:47, 10 August 2020 (UTC)
@Atitarev, Benwing2: I added the one you requested, and yes these behave exactly as the vowel matra ones, just taking the full vowel sign instead. I believe the only common noun ending in visarga is प्रातः (prātaḥ), checking for search results right now. —AryamanA (मुझसे बात करेंयोगदान) 02:53, 10 August 2020 (UTC)Reply
@AryamanA, Benwing2: Thanks! It's always better and faster when a native speaker is helping. --Anatoli T. (обсудить/вклад) 03:00, 10 August 2020 (UTC)Reply
@Atitarev, Benwing2: Visarga is dropped when an ending is added. I can't even find any results keeping visarga. प्रातों (prātõ) (obl. pl.), प्रातो (prāto) (voc. pl.). —AryamanA (मुझसे बात करेंयोगदान) 03:03, 10 August 2020 (UTC)Reply
@AryamanA, Benwing2: Actually, प्रातः (prātaḥ) is an adverb. Is it declined? --Anatoli T. (обсудить/вклад) 03:38, 10 August 2020 (UTC)Reply
@Atitarev, Benwing2: My bad, added the noun sense. Adverbs are not declined. —AryamanA (मुझसे बात करेंयोगदान) 06:50, 10 August 2020 (UTC)Reply
@AryamanA, Benwing2: I've added manual declensions for अगाऊ (agāū) (final independent vowel ū case) and प्रातः (prātaḥ) (final visarga case). @AryamanA, please check both when you have a chance. --Anatoli T. (обсудить/вклад) 07:31, 10 August 2020 (UTC)Reply
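The visarga behaviour described above (the visarga is dropped before an ending is added) can be sketched as follows. This is only an illustration in Python with a hypothetical function name; it covers just the two forms quoted in the thread, प्रातों and प्रातो, and assumes a consonant-final stem once the visarga is removed.

```python
VISARGA = "\u0903"       # ः
OBL_PL = "\u094b\u0902"  # ों (vowel sign o + anusvara)
VOC_PL = "\u094b"        # ो (vowel sign o)


def visarga_plurals(lemma: str) -> tuple[str, str]:
    """Return (oblique plural, vocative plural) for a noun such as प्रातः.

    The final visarga, if present, is dropped before the ending is attached.
    """
    stem = lemma[:-1] if lemma.endswith(VISARGA) else lemma
    return stem + OBL_PL, stem + VOC_PL


if __name__ == "__main__":
    obl, voc = visarga_plurals("प्रात" + VISARGA)
    print(obl, voc)
```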

Words requiring manual translit

@Atitarev, AryamanA Some words requiring manual translit:

How would these be written using phonetic respelling? Benwing2 (talk) 02:06, 10 August 2020 (UTC)Reply

@Benwing2, AryamanA: Here you are (2nd param). I couldn't do ALL. I can force the dropping of a certain schwa by inserting a virama, which fixes the following syllable, but I can't tell the module to pronounce a schwa when needed; I'm not sure if there is a native way. Sanskrit, unlike Hindi, pronounces every inherent "a" unless it is killed with a virama (halant). --Anatoli T. (обсудить/вклад) 02:25, 10 August 2020 (UTC)
@Atitarev If there's no native way to force a pronounced schwa, we could do it using a special symbol ala User:Wyang, like * to force a schwa. I'm not sure what symbols Wyang uses for which purposes in Burmese. Benwing2 (talk) 02:34, 10 August 2020 (UTC)Reply
@Benwing2, AryamanA: Any of the symbols will do, IMO - *, ' (apostrophe), ^, /, as long as everyone's happy. --Anatoli T. (обсудить/вклад) 02:39, 10 August 2020 (UTC)Reply
@Benwing2, Atitarev: I like * since it probably won't be used anywhere else. I don't know of any Devanagari symbol for forcing the schwa in Hindi (although Marathi has repurposed the anusvara for that). —AryamanA (मुझसे बात करेंयोगदान) 02:44, 10 August 2020 (UTC)Reply
@AryamanA, Benwing2: I am OK with * as well, thanks. --Anatoli T. (обсудить/вклад) 02:49, 10 August 2020 (UTC)Reply

Adjectival nouns

@Atitarev, AryamanA Do there exist any nouns that are declined like adjectives? Benwing2 (talk) 04:23, 10 August 2020 (UTC)Reply

@Benwing2, AryamanA: Adjectival nouns behave normally, like other nouns, e.g. विदेशी (videśī) and most adjectives behave like nouns, AFAIK (to be checked). Pronouns and determiners have different declensions. --Anatoli T. (обсудить/вклад) 04:56, 10 August 2020 (UTC)Reply
@Atitarev See w:Hindustani grammar, where declinable adjectives appear to behave differently from nouns. विदेशी (videśī) is indeclinable as an adjective, but apparently declines as a noun when used as a noun. BTW the declension for विदेशी (videśī) is wrong; it's masculine but using a feminine declension. Benwing2 (talk) 05:15, 10 August 2020 (UTC)
@Benwing2: Thanks. Yes, you're right. --Anatoli T. (обсудить/вклад) 05:37, 10 August 2020 (UTC)Reply
@Benwing2: Yeah, adjectives have a different declensional system. Only certain ones ending in -ā (declinability for -ā has to be specified manually; that's why videśī is indeclinable) are declined, and in that case it's -ā for masculine singular nominative, -e for masculine everything else, -ī for feminine singular, and -ī̃ for feminine plural. It's not like the noun declensions. {{hi-adj-auto}} handles it already as I mentioned below. —AryamanA (मुझसे बात करेंयोगदान) 06:48, 10 August 2020 (UTC)Reply
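As an illustration of the adjectival pattern AryamanA describes just above, here is a small Python sketch working on transliterations. The function name and slot names are hypothetical, declinability is passed in by hand (as the comment says it must be), and the endings are taken directly from that comment; this is not the code of {{hi-adj-auto}}.

```python
import unicodedata

A_LONG = "\u0101"         # ā
I_LONG = "\u012b"         # ī
I_NASAL = "\u012b\u0303"  # ī̃ (ī + combining tilde)


def decline_adjective(translit: str, declinable: bool) -> dict[str, str]:
    """Return the four adjective forms described in the thread.

    Only adjectives in -ā that are flagged as declinable change form;
    everything else is returned unchanged in every slot.
    """
    form = unicodedata.normalize("NFC", translit)
    slots = ("m_dir_sg", "m_other", "f_sg", "f_pl")
    if not declinable or not form.endswith(A_LONG):
        return {slot: form for slot in slots}
    stem = form[:-1]
    return {
        "m_dir_sg": stem + A_LONG,
        "m_other": stem + "e",
        "f_sg": stem + I_LONG,
        "f_pl": stem + I_NASAL,
    }


if __name__ == "__main__":
    print(decline_adjective("baṛā", declinable=True))
    print(decline_adjective("videśī", declinable=False))
```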

Adding {{rfinfl}}

I am thinking of doing a run to add {{rfinfl}} to nouns that are missing their declension. Any objections? Also are there any indeclinable, plural-only or adjectivally-declined nouns in Hindi? Any nouns in Hindi with alternative declensions? (I guess तारीख़ (tārīx) is an example in the plural. Any others?) Benwing2 (talk) 04:25, 10 August 2020 (UTC)Reply

@Benwing2, AryamanA: If I'm not mistaken, there are no indeclinable nouns, unless they are singularia tantum. As for adjectives, there are some indeclinable adjectives, e.g. ख़राब (xarāb, bad). --Anatoli T. (обсудить/вклад) 05:03, 10 August 2020 (UTC)Reply
@Atitarev, Benwing2: I support doing that! There are definitely some plural only e.g. compounds like बाल-बच्चे (bāl-bacce). Thankfully for us, only the second element is declined in these. As for adjectival compounds (do you mean adjective + noun?), they can happen in theory but I can't say I know of any entries for any so far, nor can I think of one off the top of my head. It would have to be something idiomatic I suppose. BTW, {{hi-adj-auto}} exists. —AryamanA (मुझसे बात करेंयोगदान) 06:44, 10 August 2020 (UTC)Reply
@Atitarev, AryamanA I'm surprised there are no adjective-noun compounds made up of declined adjectives, since they are common. What I mean by adjectivally-declined nouns are nouns that are declined according to adjectival declensions, e.g. a noun declined like baṛā "big", with -ā in the direct singular and -e in all other forms. Benwing2 (talk) 02:20, 11 August 2020 (UTC)
@Atitarev, AryamanA I am ready to run my script to add {{rfinfl}}. A question, however: should we add {{rfinfl}} to nuqtaless forms? Benwing2 (talk) 02:21, 11 August 2020 (UTC)Reply
@Benwing2, Atitarev: I think that would be okay, seeing as they are common, regular orthographic variants. —AryamanA (मुझसे बात करेंयोगदान) 02:25, 11 August 2020 (UTC)Reply
@Benwing2: And no, there is nothing like the latter. बड़ा (baṛā, grownup, adult), for example, can be nominalized, but then it declines as a noun would. —AryamanA (मुझसे बात करेंयोगदान) 02:26, 11 August 2020 (UTC)
@Benwing2, AryamanA: I think nuqtaless forms would require respellings/transliterations if we add inflections to them. --Anatoli T. (обсудить/вклад) 04:10, 11 August 2020 (UTC)Reply
@Benwing2, AryamanA Thanks for adding {{rfinfl|hi|noun}}. Now the category Category:Requests for inflections in Hindi noun entries is very large. @Benwing2, is it perhaps worth waiting for the automated template? That way filling the requests will be easier and we may find some cases not yet covered by the inflection module. --Anatoli T. (обсудить/вклад) 00:01, 12 August 2020 (UTC)
@Atitarev Yes, I will maybe have a version of the module ready tonight, if not probably tomorrow night. A lot of the declensions can probably be filled in automatically using the gender in the headword. There are a few endings that will have to be handled manually, e.g. masculines in -ā and feminines in -iyā. Benwing2 (talk) 00:14, 12 August 2020 (UTC)

Module is almost ready

@Atitarev, AryamanA See User:Benwing2/test-hi-ndecl. Pluralia tantum don't work yet. Please check the declensions and let me know if anything is wrong, thanks! Benwing2 (talk) 04:21, 12 August 2020 (UTC)Reply

@Benwing2: All perfect! Thanks, this looks great! —AryamanA (मुझसे बात करेंयोगदान) 04:46, 12 August 2020 (UTC)Reply
@Atitarev, AryamanA OK, I pushed it live. You can see examples so far at दुनिया (duniyā) and अजनबी (ajanbī), the latter with respelling. Benwing2 (talk) 05:00, 12 August 2020 (UTC)Reply
@Benwing2, AryamanA: Exciting! Great work, @Benwing2! --Anatoli T. (обсудить/вклад) 05:10, 12 August 2020 (UTC)Reply
@Benwing2, AryamanA: How should terms requiring respellings be categorised, e.g. अजनबी (ajnabī), if they should? Category:Hindi terms with irregular pronunciations, Category:Hindi terms with respellings, something else? --Anatoli T. (обсудить/вклад) 05:17, 12 August 2020 (UTC)Reply
(edit conflict) @Benwing2, Atitarev: Nice!!! I added an alternative declension to दुनिया that gets plenty of google hits, and it works great. —AryamanA (मुझसे बात करेंयोगदान) 05:19, 12 August 2020 (UTC)Reply
@Atitarev, AryamanA Just added Category:Hindi nouns with phonetic respelling. If you prefer a different category name, let me know. Benwing2 (talk) 05:20, 12 August 2020 (UTC)Reply
BTW, next thing is to implement the catboiler for the various categories being added. Benwing2 (talk) 05:21, 12 August 2020 (UTC)Reply
@Benwing2, AryamanA: Thank you. And I think you're going to apply a similar logic to the headword later on, like you did so well with Slavic languages, right? @AryamanA, what do we want to see in the noun headword? E.g. plural forms, if different from lemma or in any case? Maybe all three forms, which may be different from lemma? --Anatoli T. (обсудить/вклад) 05:37, 12 August 2020 (UTC)Reply
@Atitarev, Benwing2: I'm not sure adding any inflection to the headword is necessary, given that we haven't standardized that on South Asian languages, and either way it appears to be a stylistic choice... don't want to have too much redundancy with the table. But yes, would love respellings in the headword. —AryamanA (मुझसे बात करेंयोगदान) 05:44, 12 August 2020 (UTC)Reply
@Atitarev, AryamanA: OK, the respellings can be added by bot once we get them working on the declension tables. The catboiler works now; {{hi-noun cat}} with no arguments, or (for one-off categories) with one argument consisting of arbitrary text that will be prepended with "This category contains Hindi nouns ". The table-of-contents template is automatically added. Benwing2 (talk) 05:49, 12 August 2020 (UTC)Reply

Nouns with manual translit

See Special:WhatLinksHere/Template:tracking/hi-headword/manual-translit/nouns. Benwing2 (talk) 07:51, 12 August 2020 (UTC)Reply

@Atitarev, AryamanA Benwing2 (talk) 07:51, 12 August 2020 (UTC)Reply
@Benwing2, AryamanA Thanks. I have fixed what's fixable and, for now, left the manual translit in the headword where it's required. Let's also review our method of forcing the schwa. "॰" is definitely not going to work on प्रजनन. I suggest using an invisible * (in Devanagari or translit), e.g. प्रज*नन to get "prajanan". This could fix the inflection table. --Anatoli T. (обсудить/вклад) 12:37, 12 August 2020 (UTC)
@Atitarev, Benwing2: It does work, you have to place it at the morpheme boundary: प्र॰जनन. —AryamanA (मुझसे बात करेंयोगदान) 14:27, 12 August 2020 (UTC)Reply

Vocative plural of nasal-stem nouns

(moved from Talk:रेस्तराँ)

@AryamanA, Benwing2 How is the word declined? --Anatoli T. (обсудить/вклад) 13:56, 16 August 2020 (UTC)Reply

@Benwing2, Atitarev: It should be like <M.unmarked> and with the nasal shifting to the end. So obl. pl. is रेस्तराओं (restarāõ), voc. pl. is रेस्तराओ (restarāo). Module doesn't support nasalized M.unmarked yet though. —AryamanA (मुझसे बात करेंयोगदान) 14:52, 16 August 2020 (UTC)Reply
@AryamanA, Benwing2: Thanks. I don't quite understand what "unmarked" is - no gender? --Anatoli T. (обсудить/вклад) 00:40, 17 August 2020 (UTC)Reply
@Atitarev, AryamanA "Unmarked" is Wikipedia's term for declensions where the endings are added onto the direct singular stem without a stem ending that is removed. A question though about vocative plural ... I have assumed in words with nasalized direct singular that the vocative plural is also nasalized. If this is not true, I need to fix the module. Benwing2 (talk) 01:26, 17 August 2020 (UTC)Reply
@Benwing2, Atitarev: You know, now I am less sure. The vocative and oblique have generally been merged in speech; I personally do not distinguish them. Standard written form does. The problem is there aren't (m)any words with nasal endings that can refer to people, so finding vocative plural attestations is difficult... —AryamanA (मुझसे बात करेंयोगदान) 02:28, 17 August 2020 (UTC)Reply
@Atitarev, AryamanA There is at least ख़ानसामाँ (xānsāmā̃, cook, butler). Benwing2 (talk) 02:35, 17 August 2020 (UTC)Reply
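The open question in this section (whether the vocative plural of a nasalized unmarked masculine keeps the nasalization) can be made concrete with a short Python sketch over transliterations. The function and the `nasal_vocative` switch are hypothetical; the default follows AryamanA's forms रेस्तराओं (restarāõ) and रेस्तराओ (restarāo), and the switch reproduces the assumption Benwing2 says the module currently makes.

```python
import unicodedata

TILDE = "\u0303"    # combining tilde: nasalization in the transliteration
O_NASAL = "\u00f5"  # õ


def nasal_unmarked_plurals(translit: str, nasal_vocative: bool = False) -> dict[str, str]:
    """Oblique and vocative plural of an unmarked masculine with a nasalized final vowel.

    The nasalization is removed from the stem vowel and, in effect, shifts
    onto the ending. Whether the vocative plural is nasalized is left open
    in the discussion, so it is a parameter here.
    """
    form = unicodedata.normalize("NFC", translit)
    stem = form[:-1] if form.endswith(TILDE) else form
    return {
        "obl_pl": stem + O_NASAL,
        "voc_pl": stem + (O_NASAL if nasal_vocative else "o"),
    }


if __name__ == "__main__":
    print(nasal_unmarked_plurals("restarā̃"))
    print(nasal_unmarked_plurals("restarā̃", nasal_vocative=True))
```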

Issues of various sorts

@Atitarev, AryamanA I went through all the cases of various sorts needing phonetic respelling. I came across a bunch of issues, which I've enumerated here. Aryaman, if you have a chance, could you take a look at them? It's enough just to fill in comments to the right of each issue. Thanks!

  • रिआया: plurale tantum? - AT: sg tantum, plural meaning
  • कारख़ाना, कारखाना: regular or unmarked? guessing regular A: regular
  • क़िला, किला: regular or unmarked? guessing regular A: regular
  • ज़हर: should the translit be zahar not zahr? I changed it this way. AT: zahar, that's the only way for CaCaCa when there's no virama A: agree
  • ज्यादा, ज़्यादा: can this adj be declined? A: nope
  • दवाख़ाना: regular or unmarked? guessing regular A: regular
  • टॅकनोलॉजी: should this really be transcribed ṭĕkanolŏjī? not ṭĕknolŏjī? AT: ṭĕknolŏjī A: agreed
  • नेत्र: is tr=netr correct? pronunciation disagrees with translit. AT: netr A: I think netră: AT response - it depends how we decide on it, please see Module_talk:hi-translit#a_after_non-native_consonant_clusters "pātr"/"mitr" examples. It's a matter of choice, both for translit.
  • पारंपरिक: should this be tr=pāramparik? AT: yes, per McGregor, probably unpredictably A: yes it should be
  • असमीकरण: should this be tr=asamīkaraṇ not tr=asamīkraṇ? AT: asamīkaraṇ , also समीकरण samīkaraṇ (McGregor), both unpredictably A: agreed
  • ऊष्मीकरण: should this be tr=ūṣmīkaraṇ? AT: ūṣmīkaraṇ, it seems a bunch of Sanskrit derivations are pronounced that way, possibly analysed as separate morphemes A: agreed, and yes, they're separate morphemes
  • भूमंडलीय ऊष्मीकरण: should this be tr=bhūmaṇḍalīya ūṣmīkaraṇ not tr=bhūmaṇḍalīya ūṣmīkraṇ? AT: the former per above A: agreed
  • तटस्थीकरण: should this be tr=taṭasthīkaraṇ? AT: ditto A: agreed
  • ध्रुवीकरण: should this be tr=dhruvīkaraṇ? AT: ditto A: agreed
  • निरस्त्रीकरण: should this be tr=nirastrīkaraṇ? AT: ditto A: agreed
  • प्रत्यक्षीकरण: should this be tr=pratyakṣīkaraṇ? AT: ditto A: agreed
  • मशीनीकरण: should this be tr=maśīnīkaraṇ? AT: ditto A: agreed
  • वर्गीकरण: should this be tr=vargīkaraṇ? AT: ditto A: agreed
  • संक्षिप्तीकरण: should this be tr=saṅkṣiptīkaraṇ? AT: ditto A: agreed
  • प्रांगार द्विजारेय: m or f? and is |tr=prāṅgār dvijārey correct? A: yes
  • प्रायापिज़्म: m or f? and is |tr=prāyāpizm correct? AT: m, correct, -ism often spelled without a virama and nuqta, resulting in -ijm/-izm, weakened shwa in pronunciation after "m" A: definitely no weakened schwa, but -izam is an acceptable pronunciation of the cluster: AT: response the virama between would contradict -izam/-ijam
  • प्रिय: is the rendered pronunciation /pɾiː.jᵊ/ correct? A: yes
  • फ़ख़्र: is tr=faxr correct? AT: yes A: agreed
  • फेफडे: should this be deleted since we have फेफड़ा? A: probably...
  • बीज गणित: m or f? and is |tr=bīja gaṇita correct? A: m, and no, will fix
  • बौद्ध धर्म: auto translit is bauddh dharma, presumably should actually be bauddh dharm, the correct translit AT: yes A: agreed
  • हिन्दू धर्म: same issue as with बौद्ध धर्म A: agreed
  • भिन्नता: is manual |tr=bhinntā correct? AT: should be bhinnatā (bhinnătā in McGregor) but we have no rule on weakened shwa A: I think bhinntā is good, nn clusters and other geminates don't cause weakened schwas: AT response: McGregor is wrong on "bhinnătā"?
  • भौर: m or f? and is |tr=bhaura correct? AT: should be bhaur, gender? A: not a word, भोर (bhor) is the right spelling
  • मंडलवक: m or f? and is |tr=maṇḍalavak correct? A: yes, m
  • मदहोश: is tr=madahośa correct? also the definitions are redundant. AT: madhoś per McGregor A: agreed
  • मायने: plurale tantum needing phonetic respelling (note to self)
  • युरेनस: I have deleted tr=yūrainas as wrong, please verify, thanks! Also what is the gender? AT: new translit is correct A: agreed
  • योनि सेक्स: I have deleted |tr=yonī seks as wrong, please verify, thanks! Also what is the gender? A: m I think
  • राजनीतिज्ञ: auto tr=rājnītigy with final -gy is correct? A: should have weakened schwa imo
  • वलयिन: m or f? and is |tr=valayin correct? A: no idea to both, never heard this word in my life
  • शैतानवादी: I have deleted |tr=śaitanavadi as wrong, please verify, thanks! Also what is the gender? AT: correct, m A: agreed
  • श्वेता: I have deleted |tr=śveta as wrong. Is the noun श्वेता with meaning "white" correct? Should it be "whiteness" or "the color white", or an adjective? A: श्वेत (śvet) is the colour (but सफ़ेद (safed) is way more common)
  • संयोजन: I have deleted |tr=saṅyojan as wrong, please verify. AT: yes A: yes
  • संवेदना: I have deleted |tr=samvednā, please verify. AT: yes A: yes
  • समाचारपत्र: Please verify tr=samācārpatr, thanks! A: fixed
  • सर्वनाम: Pronunciation sarvanām disagrees with translit sarvnām. AT: Snell gives sarvanām but I am not sure, see व्यवसाय (vyavasāy) - dictionary vs AryamanA's pronunciation at Module_talk:hi-translit#a_after_non-native_consonant_clusters
    • A: Appears both are reasonable (but sarvănām with weakened schwa), I prefer the latter.
  • साँई, सांई: Auto translit sānī is probably wrong. When fixed, remove manual translit from सांई. A: f
  • सिंह: Is auto translit siṅh correct? AT: correct per McGregor but he may transliterate chandrabindu and anusvara differently A: siṅh is correct yes, it's pronounced siṅgh universally
  • सैकता: m or f? and is |tr=saikatā correct? AT: should be "saiktā" IMO A: fixed
  • सोणिये: This is messed up; not nominative form.
  • सौंदर्यशास्त्र: Is auto translit tr=saũdaryaśāstr correct?
  • हीमप्रपात: m or f? and is |tr=hīmprapāt correct? AT: correct
  • हुस्न: Pronunciation husna disagrees with translit husn. AT: I think husn (husn in McGregor) is fine but we don't have weakened shwa rule
Benwing2 (talk) 08:05, 18 August 2020 (UTC)Reply
@Benwing2, AryamanA: I had a go on the questions. --Anatoli T. (обсудить/вклад) 10:21, 18 August 2020 (UTC)Reply
I tried to answer some too. —AryamanA (मुझसे बात करेंयोगदान) 18:06, 18 August 2020 (UTC)Reply
@Benwing2, AryamanA: I responded again to some by A. Let's decide how we do मित्र "mitr" or "mitra" and similar. Both ways are possible in different dictionaries. --Anatoli T. (обсудить/вклад)

alternatives in -īe/-ie

(moved from Talk:रसोईया)

@Benwing2 In Hindi, īye/iye also have the orthographical variant of īe/ie. Could the module support that automatically? So like, obl. sg. can also be रसोईए (rasoīe). —AryamanA (मुझसे बात करेंयोगदान) 15:32, 17 August 2020 (UTC)Reply

@AryamanA I should be able to implement this in a day or so. Benwing2 (talk) 08:09, 18 August 2020 (UTC)Reply
I mean to say, I will get to it in a day or so (not that it will take me a whole day to implement ...). Benwing2 (talk) 08:10, 18 August 2020 (UTC)Reply
@AryamanA This is done; apologies for the delay. Benwing2 (talk) 05:19, 27 August 2020 (UTC)Reply
@Benwing2: Awesome, and no problem! —AryamanA (मुझसे बात करेंयोगदान) 15:03, 27 August 2020 (UTC)Reply
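For reference, the orthographic alternation requested in this section can be sketched as below, again in Python over transliterations and with a hypothetical function name. It assumes the alternation applies only to forms ending in -īye or -iye, as in the रसोईए (rasoīe) example; how the module implements it is not shown in this thread.

```python
import unicodedata

I_LONG = "\u012b"  # ī


def with_ie_variant(form: str) -> list[str]:
    """Return the form plus its -īe/-ie spelling variant, if it ends in -īye/-iye."""
    form = unicodedata.normalize("NFC", form)
    for ending in (I_LONG + "ye", "iye"):
        if form.endswith(ending):
            # Drop the final "ye" and re-attach "e": rasoīye -> rasoīe.
            return [form, form[:-2] + "e"]
    return [form]


if __name__ == "__main__":
    print(with_ie_variant("rasoīye"))
    print(with_ie_variant("duniyā"))
```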

bot run to add declensions

@Atitarev, AryamanA I am thinking of doing a bot run at some point to add declensions to nouns. If there's no manual translit and the gender is in the headword, it should be possible to auto-add declensions to most types of nouns, I think. The only exceptions I know of are masculines in -ā (which can be regular or unmarked) and feminines in -iyā (which can use the -ā declension or the -iyā declension). Any others? Benwing2 (talk) 08:27, 18 August 2020 (UTC)

@AryamanA I have written the script to do this, just waiting for confirmation that I haven't missed any cases (other than the ones mentioned above) that need to be skipped. Benwing2 (talk) 03:59, 24 August 2020 (UTC)Reply
@Benwing2: I would also say skip the pluralia tantum for now. But besides that, I think it's good to go! You didn't miss any other cases. —AryamanA (मुझसे बात करेंयोगदान) 04:01, 24 August 2020 (UTC)Reply
@AryamanA, Atitarev The bot run is almost done. When it's done, it will have added declensions to 2,383 nouns and left 726 alone, for one of a number of reasons:
  • 279 WARNING: Won't add declension: Masculine head ends in -ā or -ā̃, needs manual evaluation:
  • 226 WARNING: Won't add declension: No gender:
  • 167 WARNING: Won't add declension: Space in headword:
  • 23 WARNING: Won't add declension: Gender m-p unrecognized or required manual evaluation:
  • 9 WARNING: Won't add declension: Explicit headword doesn't agree with pagetitle:
  • 8 WARNING: Won't add declension: Gender f-p unrecognized or required manual evaluation:
  • 6 WARNING: Won't add declension: Saw manual translit:
  • 5 WARNING: Won't add declension: Feminine head ends in -iyā or -iyā̃ needs manual evaluation:
  • 1 WARNING: Won't add declension: Saw extra gender g2=f-p:
  • 1 WARNING: Won't add declension: Gender fm unrecognized or required manual evaluation:
  • 1 WARNING: Won't add declension: Gender ? unrecognized or required manual evaluation:
Benwing2 (talk) 07:59, 24 August 2020 (UTC)Reply
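The skip conditions in the report above can be summarized as a small decision function. This Python sketch is not the bot's code: the function name, its parameters, the short reason labels and the order of the checks are all assumptions, and it tests endings on a transliteration rather than on the Devanagari headword. The tallies are copied from the report so that the total of 726 skipped nouns can be checked.

```python
import unicodedata

# Tallies copied from the bot report (short reason label -> skipped nouns).
SKIP_COUNTS = {
    "masculine in -a or nasalized -a": 279,
    "no gender": 226,
    "space in headword": 167,
    "gender m-p": 23,
    "headword differs from pagetitle": 9,
    "gender f-p": 8,
    "manual translit": 6,
    "feminine in -iya or nasalized -iya": 5,
    "extra gender": 1,
    "gender fm": 1,
    "gender ?": 1,
}

A_LONG = "\u0101"  # ā
TILDE = "\u0303"   # combining tilde (nasalization)


def skip_reason(pagetitle, head, translit, genders, manual_translit=False):
    """Return a reason label if the noun should be skipped, else None."""
    translit = unicodedata.normalize("NFC", translit)
    bare = translit[:-1] if translit.endswith(TILDE) else translit
    if manual_translit:
        return "manual translit"
    if not genders:
        return "no gender"
    if len(genders) > 1:
        return "extra gender"
    gender = genders[0]
    if gender not in ("m", "f"):
        return "gender " + gender
    if head != pagetitle:
        return "headword differs from pagetitle"
    if " " in head:
        return "space in headword"
    if gender == "m" and bare.endswith(A_LONG):
        return "masculine in -a or nasalized -a"
    if gender == "f" and bare.endswith("iy" + A_LONG):
        return "feminine in -iya or nasalized -iya"
    return None


if __name__ == "__main__":
    print(sum(SKIP_COUNTS.values()))
    print(skip_reason("दुनिया", "दुनिया", "duniyā", ["f"]))
```

Summing the tallies gives 726, matching the number of nouns the report says were left alone.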

irregular plural ending in a vowel

(moved from Talk:अमीर)

@Benwing2 Some problems here with vowel-ending plural stem, it should add an independent vowel to the end:

I've fixed it manually in the entry. —AryamanA (मुझसे बात करेंयोगदान) 20:37, 12 August 2020 (UTC)Reply

@AryamanA Apologies, I missed this. I'll see if I can't get to it along with the other issues you've pointed out. Benwing2 (talk) 08:39, 18 August 2020 (UTC)Reply
@AryamanA This is fixed. Benwing2 (talk) 03:37, 2 September 2020 (UTC)Reply

plurale tantum nouns

@AryamanA, Atitarev I am going to add support for plurale tantum nouns to this module, but I'm not sure how they are declined. Here are various examples:

  • अंतरिक्ष: "outer space", "the heavens"; masculine plurale tantum; is current declension correct?
  • अत्फ़ाल: "children"; masculine; should this be plurale tantum? How declined?
  • जटा: "dreadlocks"; feminine; should this be plurale tantum? how declined?
  • औलाद: "offspring", "children"; feminine plurale tantum; how declined?
  • अवाम: "the public"; masculine plurale tantum; how declined?
  • अहबाब: "friends", "companions"; masculine plurale tantum; how declined?
  • उमरा: "people of high rank"; masculine plurale tantum; how declined?
  • एशियाई खेल: "Asian Games"; masculine plurale tantum; how declined?
  • ऐनक: "glasses, spectacles"; feminine plurale tantum; how declined?
  • काग़ज़ात: "documents, papers"; masculine plurale tantum (is gender correct?); how declined?
  • काली-पीली: "taxicabs, taxis"; feminine plurale tantum; how declined?
  • किंतु-परंतु: "ifs and buts"; masculine plurale tantum; how declined?
  • किमी: "km" (??); masculine plurale tantum; how declined?
  • ख़ुशख़बरी: "good news"; feminine plurale tantum; how declined?
  • गाली-गलौज: "abuse and cursing"; feminine plurale tantum; how declined?
  • गोरू: "cattle"; masculine plurale tantum; how declined?
  • छोले भटूरे: "chhole bhature"; masculine plurale tantum; how declined?
  • जज़्बात: "feelings, emotions"; masculine plurale tantum (is gender correct?); how declined?
  • प्रजा: "subjects", "people"; feminine plurale tantum; how declined?
  • फ़ीस: "fees"; feminine plurale tantum; how declined?
  • भाई-बहन: "siblings"; masculine plurale tantum; how declined?
  • माँ-बाप: "parents"; masculine plurale tantum; how declined?
  • माता-पिता: "parents"; masculine plurale tantum; how declined?
  • मायने: "meanings, nuances"; masculine plurale tantum; how declined?
  • मुजाहिदीन: "mujahideen"; masculine plurale tantum; how declined?

This is not all the existing pluralia tantum but is a good sample of them and should cover the various cases. Some examples may be redundant to others. Thanks! Benwing2 (talk) 06:07, 19 August 2020 (UTC)Reply

@Benwing2, AryamanA: I can't answer many questions but it seems with words like माता-पिता (mātā-pitā), only the last part is inflected, so inflects like plural of पिता (pitā).
The inflections of the words are according to their gender, only in plural. E.g. in अहबाब (ahbāb), the plural column is valid. Let's wait for @AryamanA to confirm. --Anatoli T. (обсудить/вклад) 12:46, 19 August 2020 (UTC)Reply
@Benwing2, Atitarev: Anatoli is right. Some of these are actually not plurale tantum, so I will be fixing them. —AryamanA (मुझसे बात करेंयोगदान) 13:53, 20 August 2020 (UTC)Reply
Okay so @Benwing2 I suppose the only issue is feminine pluralia tantum that don't easily fit into a declension class, like औलाद (aulād). This has obl. pl. औलादों (aulādõ), voc. pl. औलादो (aulādo) (so it's behaving more like a masculine noun). —AryamanA (मुझसे बात करेंयोगदान) 13:56, 20 August 2020 (UTC)
Masculine pluralia tantum like अवाम (avām) are the same. But English derived plurals like फ़ीस (fīs) simply don't decline. —AryamanA (मुझसे बात करेंयोगदान) 13:59, 20 August 2020 (UTC)Reply
मायने (māyne) has obl. pl. मायनों (māynõ), voc. pl. मायनो (māyno). These are all quite irregular, maybe it should all just be specified manually. —AryamanA (मुझसे बात करेंयोगदान) 13:59, 20 August 2020 (UTC)Reply
@AryamanA, Atitarev How are dvandva compounds declined? Examples:
Benwing2 (talk) 06:46, 28 August 2020 (UTC)Reply
@Benwing2, Atitarev: For nouns, generally only the second element declines, but there exist variants where both decline. I'm not sure how common it is for both to decline, it may depend on the term. E.g. in my experience, माता-पिता (mātā-pitā) only the second element declines, but लेखा-जोखा (lekhā-jokhā) it seems both declining is more common (confirmed by Google hits).
For adjectives, both parts decline always.
For verbs, both parts conjugate always. —AryamanA (मुझसे बात करेंयोगदान) 18:22, 28 August 2020 (UTC)Reply
@AryamanA Thank you! You've answered many of my questions, but left some unanswered:
  1. How does माता-पिता (mātā-pitā) decline? Does पिता (pitā) decline as a singular or plural noun?
  2. How does लेखा-जोखा (lekhā-jokhā) decline? Do लेखा (lekhā) and जोखा (jokhā) decline as singulars or plurals, and as regular or unmarked?
  3. How exactly do काली-पीली (kālī-pīlī) and किंतु-परंतु (kintu-parantu) decline?
  4. For which other words in my lists above (besides लेखा-जोखा (lekhā-jokhā)) do both words decline?
  5. Is अस्त्र-शस्त्र (astra-śastra) a plurale tantum? If so, how does it decline? As a singular or plural noun?
  6. How do compound tenses of dvandva verbs like हिलना-डुलना (hilnā-ḍulnā) conjugate? Can you give some examples?
Thanks! Benwing2 (talk) 05:28, 29 August 2020 (UTC)Reply
@AryamanA, Benwing2: I am following all the discussions but I have exceeded my poor knowledge on many recent questions and I am not able to find more good references. I will join in on the conversion to the new templates, once most of the issues are resolved. --Anatoli T. (обсудить/вклад) 07:03, 29 August 2020 (UTC)Reply
@Benwing2, Atitarev: My apologies, I was not thorough enough in answering these.
  1. This I hadn't thought of before. माता-पिता (mātā-pitā), only पिता (pitā) declines, and as a plural only. Basically, the paradigm is the exact same as पिता (pitā) just with the undeclined माता (mātā) prepended. The compound is always treated as a plural, e.g. an adjective modifying it would always take on masculine plural forms.
  2. लेखा-जोखा (lekhā-jokhā), on the other hand, declines as any other countable noun, both singular and plural. Only the second element is declined, so we have obl./voc. sg./dir. pl. लेखा-जोखे (lekhā-jokhe), obl. pl. लेखा-जोखों (lekhā-jokhõ), voc. pl. लेखा-जोखो (lekhā-jokho).
  3. काली-पीली (kālī-pīlī) is not used in my dialect, but I think it would be indeclinable plural-only. I guess it's an adjective that has been nominalized. —AryamanA (मुझसे बात करेंयोगदान) 15:19, 29 August 2020 (UTC)Reply
  4. किंतु-परंतु (kintu-parantu) is again declined the same way माता-पिता (mātā-pitā) is, only second element declines and follows the regular forms of a u-stem masculine noun.
  5. अस्त्र-शस्त्र (astra-śastra) is same again, plural-only, only second element declines.
  6. हिलना-डुलना (hilnā-ḍulnā) both elements conjugate as normal, replacing the slot that a single verb would take. E.g. oblique infinitive is हिलने-डुलने (hilne-ḍulne), past progressive masculine singular is हिल-डुल रहा था (hil-ḍul rahā thā).
AryamanA (मुझसे बात करेंयोगदान) 15:19, 29 August 2020 (UTC)Reply
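As an illustration of the pattern described above for लेखा-जोखा (lekhā-jokhā), here is a minimal Python sketch that declines only the second element of a hyphenated compound as a regular masculine ā-stem. The function name `decline_dvandva` and the slot labels are hypothetical and are not taken from Module:hi-noun; the endings are the ones quoted in the answer above.

```python
# Hypothetical sketch: decline only the last element of a hyphenated
# compound as a regular (marked) masculine noun in -ā.
AA = "\u093e"  # vowel sign ā (ा)

# Endings that replace the final -ā, as quoted in the discussion.
ENDINGS = {
    "dir_sg": AA,
    "obl_sg": "\u0947",        # -e (े)
    "voc_sg": "\u0947",
    "dir_pl": "\u0947",
    "obl_pl": "\u094b\u0902",  # -õ (ों)
    "voc_pl": "\u094b",        # -o (ो)
}


def decline_dvandva(compound: str) -> dict[str, str]:
    """Return the six case/number forms, changing only the final element."""
    # rpartition keeps everything before the last hyphen untouched.
    head, sep, last = compound.rpartition("-")
    if not last.endswith(AA):
        raise ValueError("this sketch only handles a final element in -ā")
    stem = last[:-1]
    return {slot: head + sep + stem + ending for slot, ending in ENDINGS.items()}


forms = decline_dvandva("लेखा-जोखा")
print(forms["obl_sg"], forms["obl_pl"], forms["voc_pl"])
```

A compound such as माता-पिता (mātā-pitā) would not go through this function unchanged, since पिता (pitā) keeps its ā-stem and the compound is plural-only.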
@AryamanA, Atitarev I added declensions to almost all the pluralia tantum. I got stuck on नैना (dialectal) and ख़ुशख़बरी. Benwing2 (talk) 03:15, 2 September 2020 (UTC)Reply
Also भाई-बहन (masculine plural but the head is feminine). Benwing2 (talk) 03:56, 2 September 2020 (UTC)Reply

Issues of various sorts, #2

@AryamanA, Atitarev I did a big pass over the Hindi lemmas. This led to a ton of questions. Some of the more important ones are below:

  • उड़िया, उइग़ुर, ख्मेर, चेक, चेचेन, जर्मन, डच, तमिल, तातार, थाई, पोलिश, बड़ो, लोजबान, सिंहला, स्लोवेनियन, इंडोनेशियन, बम्बैया: guessing these are feminine as languages.
  • कोकबोरोक, हिंदको, बंगला, रेख़ता, स्कॉट्स: are these masculine or feminine as a language? Given as masculine.
  • स्पेनी, कोरियाई, हिन्दुस्तानी: are these masculine or feminine as a language? Given as masculine but I changed them to feminine.
  • अरोरा, अस्थाना, ओबामा, कच्छारा, कुशवाहा, एशिया, ऑस्ट्रिया, ओझा, बनिया, मराठा, ओशिआनिया, कनाडा, कन्हैया, कलकत्ता, क़ाहिरा, कातालोनिया, कान्हा, कुशवाहा, कृष्णा, कॅटलोनिया, कोटा, कोरिया, कोलकाता, गया, गुप्ता, गोवा/गोआ, चाइना, झा, तेलंगाना, दशहरा, धतूरा, पोखरा, फ़्लोरिडा, बोलिविया, मंगोलिया, मखीजा, मथुरा, मराठवाडा, मल्होत्रा, महात्मा, मादरेचा, मित्रा, म्यान्मा, यहूदा, रशिया, रेख़ता, वागरेचा, शर्मा, श्रीलंका, सक्सेना, सिसोदिया, सोमालिया, हड़प्पा, etc.: Can I assume that all masculine place names in -ā are regular rather than unmarked? I have made this assumption. What about masculine personal names, surnames and castes?
  • ओझा, कायस्थ, क्षत्रिय, खत्री, जाट, दे, ब्राह्मण, मराठा: are these caste names masculine or feminine? Given as masculine. Also it appears the names can be both caste names and names of members of the caste (except ब्राह्मण?). I assume the latter can be masculine or feminine based on sense, like names of ethnicities (except क्षत्रिय, which appears to have feminine क्षत्रियण).
  • अरोरा: is this caste name masculine or feminine? Given as feminine.
  • अग्रवाल, अरोरा, असलम, आडवाणी, ओबामा, कच्छारा, कुशवाहा, कपूर, केजरीवाल, कैफ, खंडेलवाल, ख़ान, गाँधी, गांगुली, गुप्ता, गॉर्डन, चंद, चौधरी, जलाल, झा, तेंदुलकर, त्रिपाठी, त्रिमूर्ति, त्रिवेदी, बंसल, बच्चन, मखीजा, मल्होत्रा, मादरेचा, मित्रा, वागरेचा, शर्मा, शिंदे, सक्सेना, सरनाईक, सरवटे, सिसोदिया, सेठ, हार्वर्ड, हिंगड, etc.: Surnames are generally given as masculine, should they actually be masculine/feminine by sense?

Thanks! Benwing2 (talk) 04:13, 24 August 2020 (UTC)Reply

@Benwing2, AryamanA: I had a look at the first on the list - उड़िया and उइग़ुर. As language names, they are definitely feminine but they also have senses for people, which is not in dictionaries. If the forms are correct, maybe @AryamanA can advise on the gender.
I will check if I can do anything with these entries but my knowledge is limited. --Anatoli T. (обсудить/вклад) 04:27, 24 August 2020 (UTC)Reply
@Benwing2, Atitarev: I can say this definitely: all languages are feminine. All ethnicity names, surnames, caste names can be either masculine or feminine. —AryamanA (मुझसे बात करेंयोगदान)
@Benwing2, AryamanA: Thanks. What's the declension for हिंदको (hindko) and is it correct? --Anatoli T. (обсудить/вклад) 04:39, 24 August 2020 (UTC)Reply
@Atitarev, Benwing2: Hmm, so languages would all be singularia tantum. I'm not sure how a feminine o-stem would be declined; I believe hypothetically it would be dir. pl. हिंदकोएँ (hindkoẽ), obl. pl. हिंदकोओं (hindkoõ), voc. pl. हिंदकोओ (hindkoo)? —AryamanA (मुझसे बात करेंयोगदान) 04:42, 24 August 2020 (UTC)Reply
@AryamanA, Atitarev Thanks. Are all the examples I give above of masculines in -ā declined as "regular" nouns (oblique/vocative in -e), not "unmarked" nouns? Benwing2 (talk) 04:48, 24 August 2020 (UTC)Reply
@Benwing2: I think the opposite actually, all of these seem to be unmarked. I would hold off on adding declensions to them automatically IMO. —AryamanA (मुझसे बात करेंयोगदान) 04:52, 24 August 2020 (UTC)Reply
@AryamanA OK. Are they unmarked or simply indeclinable? In particular, surnames and personal names can be pluralized; are they pluralized with endings in the oblique, or are they indeclinable? BTW I already added regular declensions to these words, I'll change them to be unmarked or indeclinable depending on your answer. Benwing2 (talk) 05:01, 24 August 2020 (UTC)Reply
@Benwing2: Unmarked as in they are declined the same way पिता (pitā), keeping the ā-stem, is. —AryamanA (मुझसे बात करेंयोगदान) 05:03, 24 August 2020 (UTC)Reply
@AryamanA Thanks. BTW, the words referring to castes can presumably either refer to people in those castes or to the castes themselves. When referring to people, they are understandably either masculine or feminine by sense, but what about when referring to the castes themselves? E.g. for अरोरा, it could be used as a caste name in the Hindi equivalent of "Arora is a caste originating from the Punjab region of India and Pakistan". In such a usage, is it masculine or feminine? Benwing2 (talk) 05:08, 24 August 2020 (UTC)Reply
@Benwing2: Ah, good question, didn't think about that. I think they are all masculine in that sense; at least, I don't know of any feminine-by-default castes names or anything like that. —AryamanA (मुझसे बात करेंयोगदान) 05:11, 24 August 2020 (UTC)Reply
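The "regular" versus "unmarked" distinction discussed above can be sketched as follows. This is only an illustration: the function name `decline_masc_aa` is hypothetical, कमरा (kamrā) is an example chosen here rather than one from the thread, and the plural oblique and vocative forms for the unmarked type (with the endings written as independent vowels after the retained -ā) are an assumption, since the thread only states that such nouns decline like पिता (pitā).

```python
AA = "\u093e"  # vowel sign ā (ा)


def decline_masc_aa(noun: str, unmarked: bool = False) -> dict[str, str]:
    """Decline a masculine noun in -ā as either 'regular' or 'unmarked'."""
    if not noun.endswith(AA):
        raise ValueError("this sketch only handles nouns ending in -ā")
    if unmarked:
        # Unmarked: the ā-stem is kept throughout (like पिता).
        # Assumption: plural endings attach as independent vowels ओं / ओ.
        return {
            "dir_sg": noun, "obl_sg": noun, "voc_sg": noun, "dir_pl": noun,
            "obl_pl": noun + "\u0913\u0902",
            "voc_pl": noun + "\u0913",
        }
    # Regular: final -ā is replaced by -e / -õ / -o.
    stem = noun[:-1]
    e_form = stem + "\u0947"
    return {
        "dir_sg": noun, "obl_sg": e_form, "voc_sg": e_form, "dir_pl": e_form,
        "obl_pl": stem + "\u094b\u0902",
        "voc_pl": stem + "\u094b",
    }


print(decline_masc_aa("कमरा")["obl_sg"])
print(decline_masc_aa("शर्मा", unmarked=True)["obl_sg"])
```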

──────────────────────────────────────────────────────────────────────────────────────────────────── @Atitarev, AryamanA I went through and converted all surnames to have g=mf and a masculine/feminine declension. I made all surnames ending in -ā be unmarked, per the discussion above. Here is the list of all such surnames, please verify that all are actually unmarked, i.e. indeclinable in the singular rather than ending in -e in the oblique/vocative singular: अरोरा, अस्थाना, ओबामा, कच्छारा, कुशवाहा, गुप्ता, झा, मखीजा, मल्होत्रा, मादरेचा, मित्रा, मिश्र, मेहता, वर्मा, वागरेचा, शर्मा, सक्सेना, सिसोदिया. Thanks! Benwing2 (talk) 07:50, 24 August 2020 (UTC)Reply

@Benwing2: Seems I somehow missed this, but they all appear intuitively correct to me. Thanks for fixing them! —AryamanA (मुझसे बात करेंयोगदान) 06:16, 26 August 2020 (UTC)Reply

Issues of various sorts, #3

@Atitarev, AryamanA Here is the list of remaining issues, except for some questions about the gender of individual words, which can wait. Apologies for the number of questions.

  • इस्तांबुल: gender? also is tr=istānbul (not auto tr=istāmbul) correct?
  • एमएलए: how declined? and is tr=emele correct? (auto tr=emaelae)
  • ग़ुस्लख़ाना: should this be tr=guslxānā?
  • बुलंदशहर, शहर: śahr or śahar? translit and pronun disagree
  • वहम: tr=vahm or tr=vahma? translit and pronun disagree
  • संलयन: tr=sanlayan or tr=sãlayan?
  • हुस्न: tr=husn or tr=husna? translit and pronun disagree
  • सहस्र: tr=sahasr or tr=sahasra? translit and pronun disagree
  • समुदाय: how is this pronounced? pronun and translit disagree
  • शाह: how is this pronounced? pronun and translit disagree
  • रहस्य: how is this pronounced? pronun disagrees with translit and Urdu
  • रस्म: how is this pronounced? pronun gives [ˈɾə.səm]|[ɾəsm] but it has a virama
  • रत्न: how is this pronounced? pronun gives [ɾəˈt̪ən̪] only but it has a virama
  • तत्त्व: pronunciation and translit disagree
  • दलदल: tr=daldal, tr=daladal, or both?
  • नागरिकता: tr=nāgriktā, tr=nāgrikatā or both?
  • सामी: does this actually mean "Semites, Semitic people" or "Semitic person"?
  • कन्नड़: This has two pronuns, one with final -a, I'm assuming it should have two translits.
  • केरल: Should this have translit/pronun kerala? Urdu equivalent کیرلا has final -a.
  • गुजरात: this has two etymologies, one a state "Gujarat" and one a city "Gujrat" but both given with the same pronun; should they be different?
  • पुराना नियम: is this declined differently in the oblique sg?
  • बंजारा: defined as "the Banjara or Lambadi people", should it actually be "member of the Banjara or Lambadi people"?
  • बक़रीद: should this be tr=baqarīd not default tr=baqrīd?
  • मित्र: should this be tr=mitr not default tr=mitra? Urdu is متر without any final vowel, which is normally indicated.
  • मोंटेनीगरो: gender? given as masculine but with attention note asking to check gender.
  • यहूदा इस्करियोती: does the first part decline differently in the oblique?
  • रोमानिया, सेंट लूसिया: given as feminine, is this correct? most countries in -yā are masculine.
  • ऑस्ट्रिया: given as both masc and fem, is this correct? most countries in -yā are masculine.
  • लैला और मजनू: I have marked this as masculine singular-only, is that correct?
  • विराटनगर: I have changed the tr to tr=virāṭnagar, correct?
  • शाहमुखी: I have assumed this to be feminine as other scripts appear to be feminine.
  • शिंदे, शिगात्से, सरवटे: How is this declined? I have assumed it to be indeclinable.
  • शिव: Given with two pronunciations, one with final long -ā. Is this correct? Maybe it's actually a final short -a? The translit needs to be updated.
  • शुक्र: How should it be pronounced and transliterated? Presumably as tr=śukr not default tr=śukra, per the Urdu شکر?
  • हवाई: Is feminine correct? It's marked as "check this".
  • अति (likewise अभिव्यक्ति): Given with pronun like this: {{hi-IPA||अती}} Is this intended to indicate two pronuns, one with short final -i and the other with long final -ī? If so it doesn't work.
  • अतिसंवेदनशील: This has translit atisãvedanśīl but the pronunciation uses a respelling to indicate /ə.t̪ɪ.səm.ʋeː.d̪ən.ʃiːl/, without a nasal vowel before the v. Is this correct and/or normal? If normal, should we fix the translit to not make it a nasal vowel?
  • अग्निशस्त्र: pronun has respelling अग्नीशस्त्र indicating a long -ī- in the middle. Is this correct and/or normal? The same respelling is not done to अग्निशमन, अग्निपरीक्षा or अग्निशामक
  • अज्ञेयवाद: at this point it has pronun respelling अज्ञेय॰वाद, which has no effect. Is this correct? Same for अज्ञेयवादी with pronun respelling अज्ञेय॰वादी.
  • अनुपस्थित: pronun has respelling अनूपस्थित indicating a long -ū- in the middle. Is this correct and/or normal? There are a large number of other words in anu- without similar respelling.
  • अम्लता (likewise अम्लपित्त): Given with two pronuns अम्ल-ता and अम्लता. This causes two pronuns /əm.lᵊ.t̪ɑː/, [ə̃m.l̪ᵊ.t̪äː] and /əm.lə.t̪ɑː/, [ə̃m.l̪ə.t̪äː]. This appears to indicate that reduced schwa is phonemic. Should we be transliterating it as ă?

Benwing2 (talk) 05:13, 24 August 2020 (UTC)Reply

@AryamanA, Benwing2: सहस्र is in McGregor but with the spelling सहस्त्र (with a silent "t" - "sahas(t)ră"), transliterated as "sahasră". Perhaps we should transliterate it "sahasr" per previous discussions. --Anatoli T. (обсудить/вклад) 07:58, 24 August 2020 (UTC)Reply
@AryamanA, Benwing2: रहस्य (rahasya) is "rahasyă" in McGregor, make it "rahasy"? --Anatoli T. (обсудить/вклад) 08:07, 24 August 2020 (UTC)Reply
@Atitarev, AryamanA Couple more questions: How are मातृ (mātŕ) and मेट्रो (meṭro), both feminine, declined? Benwing2 (talk) 08:10, 24 August 2020 (UTC)Reply
@Atitarev We should preserve the weak schwa. I will implement probably + to force a weak schwa. Benwing2 (talk) 08:11, 24 August 2020 (UTC)Reply
@Benwing2, AryamanA: I welcome the introduction of the weak shwa. Transliterated as "ă"? It may solve some discrepancies between translit and IPA. @AryamanA said it's not always predictable. However, we do have many cases when it's shown in IPA as "ᵊ" but we either ignore or use a normal "a" in the translit. --Anatoli T. (обсудить/вклад) 23:07, 24 August 2020 (UTC)Reply
@Benwing2, AryamanA: I have checked issues #2 and #3 where I could, but let me know if you disagree with any changes. --Anatoli T. (обсудить/вклад) 10:36, 25 August 2020 (UTC)Reply
@Atitarev, Benwing2: I believe I have gone through everything now, should be all okay. I'll be implementing the predictable weakened schwas (end of the word w/ certain clusters) into hi-translit. —AryamanA (मुझसे बात करेंयोगदान) 02:18, 26 August 2020 (UTC)Reply
@AryamanA, Atitarev Thanks! I have one more round of questions, all related to pronunciation. I'll post them in a little while. Benwing2 (talk) 06:11, 26 August 2020 (UTC)Reply
@Benwing2: Great, thanks for being so thorough! I may not be able to get to them until tomorrow (~12 hours from now), but I certainly will as soon as I can. —AryamanA (मुझसे बात करेंयोगदान) 06:17, 26 August 2020 (UTC)Reply

Issues of various sorts, #4

@Atitarev, AryamanA Here is the last (?) list of issues.

  • अर्जेन्टीना, अर्जेंटीना: Note to self: Put main at अर्जेण्टीना.
  • अल्पसंख्यक: Still syllabifies badly as /əl.p.səŋ.kʰjək/ with respelling अल्प-संख्यक. Same for ऑक्सफ़ोर्ड with respelling ऑक्स-फ़ोर्ड, syllabified as /ɔk.s.pʰoːɾɖ/, and many others.
  • असंवेदनशील: Pronunciation says /ə.səm.ʋeː.d̪ən.ʃiːl/ but translit asãvedanśīl. Is the pronunciation normal in cases of anusvara + v? If normal, should we fix the translit to not make it a nasal vowel? Same issue in अतिसंवेदनशील, संवेदनशील, असंवैधानिक, संवैधानिक.
  • असहयोग: Pronunciation says /ə.səʱ.joːɡ/, what does the raised ʱ after a vowel mean?
  • अहंकार: Manual pron respelling forces [ɛ.ɦɛ̃ŋ.käːɾ] and [eː.ɦẽː.käːɾ], is this a special exception or something that should be handled by default?
  • आठों पहर, पहर: Manual respellings force [äː.ʈʰõː‿pəʱɾ] but /peːʱɾ/; one is presumably wrong. Also, is this an exceptional pronun?
  • आत्महत्या: Pronun given as /ɑːt̪.mᵊ.ɦət̪.jɑː/, should this use translit ātmăhatyā?
  • आवाज़: Pronun given as /ə.ʋɑːz/, /ɑː.ʋɑːz/. Should the translit follow?
  • इकहत्तर: Pronun given as /ɪ.kʰət̪.t̪əɾ/ or /ɪ.kət̪.t̪əɾ/, should the first actually be /ɪk.ɦət̪.t̪əɾ/?
  • उदाहरण: Second pronun given as /ʊ.d̪ʱɑː.ɾəɳ/, should this actually be /ʊd̪.ɦɑː.ɾəɳ/? Also should there be a second translit to reflect this pronun?
  • उन्होंने: Pronun is rendered as /ʊn.ɦoːn.neː/, which is probably wrong.
  • ऊँचा: Should second pronun /uː.t͡ʃɑː/ be reflected in the translit?
  • एकड़: Should second pronun /eː.kəɾ/ be reflected in the translit?
  • एहसान: Is /ə.ɦə.sɑːn/, [ɛ.ɦɛ.s̪ä̃ːn̪] correct as the only pronun? If so, how should this be reflected in the translit (currently ehsān, which is wrong)?
  • ऐयाशी: Is second pronun /əj.jɑː.ʃiː/ correct? If so, should it be reflected in the translit?
  • कम्युनिज़म: Second pronun given as /kəm.jʊ.nɪz.mᵊ/, I changed it to /kəm.jʊ.nɪzm/. Correct?
  • कर्मभूमि: Note to self: Change translit handling of final -rm.
  • ख़त्म करना: First pronun is /kʰət̪.mə‿.kəɾ.nɑː/. Is that intended? ख़त्म by itself is /kʰə.t̪əm/ or /xət̪m/.
  • खींचना: First pronun is /kʰiːɲ.t͡ʃnɑː/, is that correct in light of recent discussions?
  • जन्मदिन: Can this be pronounced either janmadin or janamdin, despite the spelling?
  • ज़र्रा: Pronun given as zarraa, causes /zəɾ.ɾə.ᵊ/. What does this mean?
  • ज़ेहन: Pronun given as ज़हन, yielding [zɛ.ɦɛ̃n̪]. Correct?
  • जातिवाद: Pronun given as जातीवाद, with long -ī-. Correct? Likewise जातिवादी.
  • तनख़्वाह: All given pronuns are lacking the व. Correct?
  • धर्मशाला: Pronun gives /d̪ʱə.ɾəm.ʃɑː.lɑː/ despite the virama. Correct?
  • पेंशन: Auto translit pẽśan but pronun /peːn.ʃən/. Is pronun correct? If so, should the translit change? Do all words in anusvara + ś work this way?
  • फ्रेन्चाइज़: Note to self: This is a nuqtaless form serving as the main entry. Fix it.
  • चाँदी: Two pronuns given, चाँदी and चांदी, which produce the same results. I removed them. Likewise सूँघना and बाँधना.
  • भैंस: Has pronun respelling भैँस, which is the same as the title. I removed it. Likewise भैंसा, भौंह, सांप, सौंह; also मँगवाना, मँगेतर (in the opposite direction).
  • मधुमेह: Has respelling मधू-मेह. Is this predictable?
  • मानवीय: Pronun disagrees with translit.
  • मैंने: Pronun is /mɛːn.neː/, almost certainly wrong.
  • संकलक: Why is the pronun respelling given as संक-अलक? Seems wrong.
  • संवाद: Pronun has /səm.ʋɑːd̪/ but translit sãvād. Which is correct? Likewise संवाददाता, संवेदनशील, संवैधानिक
  • सर्वसाधारण: Translit was sarvasādhāraṇ, I changed it to sarvsādhāraṇ, and pronun accordingly. Likewise सर्वहारा.
  • सिंहला: Pronun given as /sɪ̃ʱ.lɑː/. Correct?
  • स्वायत्तता: Pronun given as /sʋɑː.jət̪t̪.t̪ɑː/. Does it really have three t's in a row?
  • आगरा, उड़ीसा, ओडिशा, ऊपरवाला, ओझा, कन्हैया, कान्हा, कृष्णा, कलकत्ता, कोटा, कोलकाता, गया, गोआ, गोवा, तेलंगाना, दशहरा, धतूरा, नोएडा, पटना, पोखरा, बँटवारा, बनिया, ब्रह्मा, मथुरा, मराठवाडा, मराठा, महात्मा, रेख़्ता, श्रीलंका, हड़प्पा, हरियाण: Proper nouns that I didn't change to 'unmarked' because they appear derived from common nouns or I have questions about them. Please let me know if they should be 'unmarked'.
  • निकारागुआ: Can this really be either masc or fem?

Benwing2 (talk) 02:42, 27 August 2020 (UTC)Reply

असंवेदनशील (asamvedanśīl): McGregor romanises संवेदन (samvedan) as "saṃ-vedan". Can we romanise them with "m" or "ṃ" to match the pronunciation?
असहयोग (asahyog): Raised ʱ must be a light syllable-final h, but in सहयोग (sahyog) at https://hi.forvo.com/word/सहयोग/ even a shwa is heard. --Anatoli T. (обсудить/вклад) 06:12, 27 August 2020 (UTC)Reply
@Atitarev, Benwing2: Going through this. BTW, regarding your note to self on the first one, please don't move the anusvara entries to the full-consonant forms, we standardize at the anusvara like most dictionaries (e.g. Oxford) do these days. —AryamanA (मुझसे बात करेंयोगदान) 13:48, 27 August 2020 (UTC)Reply

Standardizing alternatives

@Atitarev, AryamanA Should we standardize on chandrabindu as the main article when there are chandrabindu/anusvara alternatives and the character doesn't go above the line? E.g. महँगा is alt form of महंगा, मूँछ is alt form of मूंछ, साँस is alt form of सांस, etc. but conversely सांप is alt form of साँप, अंधेरा is alt form of अँधेरा, etc. What about explicit n vs. anusvara? E.g. शान्त is alt form of शांत, सम्पूर्ण is alt form of संपूर्ण, हिन्द is alt form of हिंद, अत्यन्त is alt form of अत्यंत, अन्तर्गत is alt form of अंतर्गत, etc. but conversely समंदर is alt form of समन्दर, हिंदुस्तानी is alt form of हिन्दुस्तानी, हिंदू is alt form of हिन्दू, etc.. हिंद vs. हिन्दू are related but have opposing tendencies in alt forms. Benwing2 (talk) 02:43, 27 August 2020 (UTC)Reply

@Benwing2, AryamanA: As far as I know, anusvara is currently more common, modern or standard than chandrabindu or explicit N (with an invisible virama, which creates a conjunct with the next consonant). --Anatoli T. (обсудить/вклад) 03:46, 27 August 2020 (UTC)Reply
@Atitarev, Benwing2: Yes, agree with Anatoli. —AryamanA (मुझसे बात करेंयोगदान) 13:49, 27 August 2020 (UTC)Reply
@Atitarev, AryamanA OK. I made a list of all cases where a form with anusvara is an alt form of a main lemma with chandrabindu or explicit n/m. Should we rename all of them? Note that I am guessing that anusvara is only universally correct when it comes before a (homorganic) stop sound; before a fricative or approximant, or a non-homorganic stop sound, use chandrabindu or explicit n, and word-finally, use chandrabindu on vowels that don't go above the line, and anusvara on vowels that do go above the line. Is that right?
The following list contains proposed renames based on alt forms. The form on the left is the current main lemma, and the form on the right is the current alt form that I'm thinking should be the main lemma. I marked ?? or ??? by cases that do not involve a stop sound, as described above.
  • आँकड़ा -> आंकड़ा
  • खण्ड -> खंड
  • घुँघरू -> घुंघरू
  • चाँदी -> चांदी
  • दाँत -> दांत
  • बाँस -> बांस ??
  • बाँह -> बांह ??
  • भाँग -> भांग
  • मँगवाना -> मंगवाना
  • माँ -> मां ???
  • माँग -> मांग
  • माँगना -> मांगना
  • मूँग -> मूंग
  • राँची -> रांची
  • लँगड़ा -> लंगड़ा
  • हँसना -> हंसना ??
  • हँसाना -> हंसाना ??
  • हाँ -> हां ???
  • हालाँकि -> हालांकि
  • अँधेरा -> अंधेरा
  • आँसू -> आंसू
  • आनन्द -> आनंद
  • आरम्भ -> आरंभ
  • इन्द्र -> इंद्र
  • उसांस -> उसाँस ???
  • चन्द्रबिन्दु -> चंद्रबिंदु
  • चाँटा -> चांटा
  • चुम्बन -> चुंबन
  • टाँग -> टांग
  • तन्दूर -> तंदूर
  • पाँच -> पांच
  • पाँचवाँ -> पांचवाँ ? (no alt form currently)
  • पाँव -> पांव ???
  • पूँछ -> पूंछ
  • बाँह -> बांह ??
  • महँगा vs. महंगा ?? which one is better?
  • महँगाई and महंगा are main forms, inconsistent
  • main forms मुँह, मुँह काला करना, मुँह मीठा करना, मुँहतोड़, मुँहतोड़ जवाब, मुँहमाँगा, मुँहासा
  • समन्दर -> समंदर
  • साँप -> सांप
  • main form सल्लू साँप
  • सौगन्द -> सौगंध
  • हिन्दुस्तानी -> हिंदुस्तानी
  • हिन्दू -> हिंदू
What do you think? Should we rename? Benwing2 (talk)
@AryamanA, Benwing2: I am confused now, since some chandrabindu spellings as above seem more common in dictionaries than the bindu (=anusvara) spellings, e.g. मँगवाना (maṅgvānā) or हालाँकि (hālā̃ki) are in McGregor but the bindu forms are not, despite the previous findings. --Anatoli T. (обсудить/вклад) 07:12, 29 August 2020 (UTC)Reply
@Atitarev, Benwing2: Having looked into this more, I don't think we should do any automated pagemoves because we risk losing information or even providing false information. Basically, chandrabindu instances can be replaced with anusvara, but the reverse is not always true. And nasal + consonant clusters can be replaced with anusvara but it's not equivalent to the chandrabindu-type anusvara. Some of them I will move manually when I am sure. —AryamanA (मुझसे बात करेंयोगदान) 15:40, 29 August 2020 (UTC)Reply
@Atitarev, AryamanA All of the cases above with an arrow in them represent situations where both the forms with chandrabindu (or explicit n) and anusvara exist; what I'm proposing is just changing which one is the main form vs. the alt form. Benwing2 (talk) 20:29, 29 August 2020 (UTC)Reply
@Benwing2: Okay so, I am good with all nasal consonant -> anusvara moves. We should go ahead with those.
As for chandrabindu, the historically "correct" or "standard" way for anusvara vs. chandrabindu is the chandrabindu form, but this isn't really adhered to in modern times (possibly because many keyboards e.g. Gboard on Android don't even have a key for the chandrabindu). So from a descriptivist standpoint anusvara is more common but I'm not sure we should move those. —AryamanA (मुझसे बात करेंयोगदान) 20:44, 29 August 2020 (UTC)Reply
@AryamanA, Atitarev I moved all the nasal consonant conjunct lemmas to anusvara variants. As for chandrabindu vs. anusvara, McGregor isn't consistent, e.g. he has अंकन (aṅkan, valuing, appraising) but अँकना (ãknā, to be valued, to be appraised). I wonder if we shouldn't adopt a single principle and stick with it. Benwing2 (talk) 00:18, 31 August 2020 (UTC)Reply
@Benwing2, AryamanA: As for me, I'd prefer to use a single principle. If references only use conjuncts or chandrabindu, we can use those as links and have them as alt forms. --Anatoli T. (обсудить/вклад) 00:24, 31 August 2020 (UTC)Reply
@AryamanA, Atitarev I think maybe the principle should be the following: (1) Word-finally and before a non-stop consonant, use chandrabindu if the letter doesn't go above the line, anusvara if it does; (2) Before stop consonants, always use anusvara. This seems to accord with actual usage. Benwing2 (talk) 02:04, 31 August 2020 (UTC)Reply
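The two-part principle proposed in the comment above could be sketched in Python roughly as follows. This illustrates the proposal only, not anything that was adopted: the function name `standardize_nasal` is hypothetical, the stop list is the plain stop series (plus precomposed फ़), and the set of vowel signs treated as going above the line is an assumption.

```python
CHANDRABINDU = "\u0901"  # ँ
ANUSVARA = "\u0902"      # ं

# Stop consonants; the precomposed फ़ (U+095E) is included as well.
STOPS = set("कखगघचछजझटठडढतथदधपफबभ") | {"\u095e"}

# Assumption: vowel signs that extend above the headline (ि ी े ै ो ौ).
ABOVE_LINE = set("\u093f\u0940\u0947\u0948\u094b\u094c")


def standardize_nasal(word: str) -> str:
    """Rewrite every chandrabindu/anusvara according to the proposed principle."""
    out = []
    for i, ch in enumerate(word):
        if ch not in (CHANDRABINDU, ANUSVARA):
            out.append(ch)
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt in STOPS:
            # (2) Before a stop consonant: always anusvara.
            out.append(ANUSVARA)
        elif prev in ABOVE_LINE:
            # (1) Elsewhere, after a sign that goes above the line: anusvara.
            out.append(ANUSVARA)
        else:
            # (1) Elsewhere, otherwise: chandrabindu.
            out.append(CHANDRABINDU)
    return "".join(out)


print(standardize_nasal("हां"), standardize_nasal("चाँदी"))
```

A purely mechanical rule like this cannot tell apart words that differ only in the nasal sign before a fricative, so it should be read as a statement of the proposal rather than something safe to run over entries.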
@Itsmeyash31, AryamanA I'd like to revisit this. @Itsmeyash31, do you have any opinions about standardizing on anusvara vs. chandrabindu? Currently in Wiktionary it's a mess with no standardization. Benwing2 (talk) 07:52, 13 September 2020 (UTC)Reply

──────────────────────────────────────────────────────────────────────────────────────────────────── @Benwing2, AryamanA According to me chandrabindu and that dot have two different functions. For example:

  • हंस /ɦəns/ is a Swan (The bird)
  • हँस /ɦə̃s/ is imperative of to laugh.

You can see the pronunciation difference between those two words. Now, the dot works the same as chandrabindu when it appears at the end of a word for example:

  • दें /dẽ/ (can be considered both 1st and the last syllable; more discussed below)
  • लें /lẽ/ (can be considered both 1st and the last syllable)
  • बोलें /bolẽ/.
  • खोलें /kʱolẽ/

When it appears in an in-between position its pronunciation changes depending on the consonant next to it. For example:

  • कंबल /kəmbəl/ (the dot here represents /m/)
  • चंचल /t͡ʃənt͡ʃəl/ (but here the dot represents /n/)

The logic followed is written below.

  1. before the consonants (क ख ग घ) (च छ ज झ) (ट ठ ड ढ) (त थ द ध) the dot is pronounced /n/.
  2. before the consonants (प फ फ़ ब भ) the dot is pronounced /m/.
  3. when it appears at the end of the word it is pronounced /m̥/
  4. For single-syllable words, always use chandrabindu, e.g. माँ and ख़ाँ, but obviously the dot for दें and लें, because the vowel marks े, ै etc. don't go well with chandrabindu.
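Rules 1 and 2 above, plus the word-final case, can be sketched as a small lookup. The function name `anusvara_values` and the labels it returns are hypothetical; anything not covered by the two consonant lists is simply reported as "other" rather than guessed.

```python
ANUSVARA = "\u0902"  # the "dot" (ं)

# Rule 1: consonants before which the dot is described as /n/.
N_CONSONANTS = set("कखगघचछजझटठडढतथदध")
# Rule 2: labials before which the dot is described as /m/
# (फ़ is covered both as फ + nuqta and as precomposed U+095E).
M_CONSONANTS = set("पफबभ") | {"\u095e"}


def anusvara_values(word: str) -> list[str]:
    """Classify each anusvara in the word by the character that follows it."""
    values = []
    for i, ch in enumerate(word):
        if ch != ANUSVARA:
            continue
        nxt = word[i + 1] if i + 1 < len(word) else None
        if nxt is None:
            values.append("final")   # end of word
        elif nxt in M_CONSONANTS:
            values.append("m")
        elif nxt in N_CONSONANTS:
            values.append("n")
        else:
            values.append("other")   # not covered by rules 1-2
    return values


for w in ("कंबल", "चंचल", "बोलें"):
    print(w, anusvara_values(w))
```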

Fix the orthography rules as:

  1. ँ = /m̥/
  2. ँ = ं if and only if the nasalised vowels are ें ैं because ेँ ैँ don't look good.
  3. ं = /m/ (for labial consonants including फ़) [(प फ फ़ ब भ)]
  4. ं = /n/ (for all non-labial consonants) [(क ख ग घ) (च छ ज झ) (ट ठ ड ढ) (त थ द ध)]

Now, the words you gave. The rule to follow is: Wherever you can replace ं with either a म or a न use a ं , and wherever you cannot, use ँ.

  • आँकड़ा
  • खंड
  • घुँघरू
  • चाँदी
  • दाँत
  • बाँस
  • बाँह
  • भाँग
  • मँगवाना
  • माँ
  • माँग
  • माँगना
  • मूँग
  • राँची
  • लँगड़ा
  • हँसना
  • हँसाना
  • हाँ
  • हालाँकि
  • अंधेरा
  • आँसू
  • आनंद
  • आरंभ
  • इंद्र
  • उसाँस
  • चंद्रबिंदु
  • चाँटा
  • चुंबन
  • टाँग
  • तंदूर
  • पाँच
  • पाँचवाँ
  • पाँव
  • पूँछ
  • बाँह
  • महँगा
  • महँगाई
  • मुँह
  • मुँहतोड़
  • मुँहमाँगा
  • समंदर
  • साँप
  • सौगंध
  • हिंदुस्तानी
  • हिंदू

There are other changes that also happen depending on whether the consonant after the nasalisation is voiced or unvoiced. But note that these do not say whether the chandrabindu or the dot should be used. The examples below show "allophonic" sound changes, so ideally they should not be used to change the orthography. So, I recommend not using the ण्ड consonant combination anywhere, unless it's used for a proper noun such as इण्डेन गैस (which is the company's official way of writing it).

  • करूँ /kəɾũ:/ & करूँगा /kəɾu:ŋga:/ (note the final vowel is nasalised as /ũ:/, now note the same vowel changed to /u:ŋ/) [it's pronounced /ŋ/ before voiced non-retroflex consonants]. Its exact phonetic Devanagari spelling would be करूङ्गा.
  • इंडियन = इण्डियन /ɪɳɖiyən/ (before retroflex consonants the /n/ changes to /ɳ/). So, if you wouldn't use करूङ्गा anywhere, you also shouldn't use ण्ड consonant combination anywhere.
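The allophonic changes in these two examples amount to the nasal taking the place of articulation of the following stop. A minimal sketch, assuming the traditional five stop series as the grouping (the name `homorganic_nasal` is hypothetical, and this is a phonetic illustration, not a spelling rule):

```python
# Phonetic nasal for each stop series (velar, palatal, retroflex, dental, labial).
NASAL_BY_SERIES = {
    "ŋ": "कखगघ",
    "ɲ": "चछजझ",
    "ɳ": "टठडढ",
    "n": "तथदध",
    "m": "पफबभ",
}

# Invert the table: consonant -> nasal.
NASAL_BY_CONSONANT = {
    consonant: nasal
    for nasal, series in NASAL_BY_SERIES.items()
    for consonant in series
}


def homorganic_nasal(consonant: str):
    """Return the IPA nasal assimilated to the given stop, or None if not a stop."""
    return NASAL_BY_CONSONANT.get(consonant)


print(homorganic_nasal("ग"), homorganic_nasal("ड"))
```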

Itsmeyash31 (talk) 04:32, 14 September 2020 (UTC)Reply

@Benwing2, Itsmeyash31: I find this very well-articulated and agree with everything said by @Itsmeyash31 (especially good point on the minimal pair हंस हँस). We should not engage in any merger of chandrabindu to anusvara. —AryamanA (मुझसे बात करेंयोगदान) 21:27, 14 September 2020 (UTC)Reply
I've added audio to हंस (hans) and हँसना (hãsnā) btw. —AryamanA (मुझसे बात करेंयोगदान) 21:35, 14 September 2020 (UTC)Reply
@AryamanA, Itsmeyash31, Atitarev Thank you very much for the long response. I'm a bit confused by what you wrote but I'll try to paraphrase it:
  1. Chandrabindu and anusvara are equivalent before stop consonants i.e. क ख ग घ च छ ज झ ट ठ ड ढ त थ द ध प फ फ़ ब भ (where they are pronounced as the homorganic nasal consonant, i.e. /n/ /m/ [ŋ] [ɳ] [ɲ]) and at the end of a word (where they are pronounced as nasalization of the preceding vowel), but are not equivalent before fricatives (at least /s/, probably also /z/ /f/ /v/ /ʃ/ /h/), where anusvara indicates /n/ and chandrabindu indicates nasalization of the preceding vowel.
  2. Therefore, we can standardize on one or the other before stop consonants and at the end of a word, but not elsewhere.
  3. The choice of which one to use, based on the examples given above, is as follows:
    1. When following a short vowel /a/ /i/ /u/, use anusvara.
    2. When following a long vowel that goes above the line, i.e. े ै ो ौ ी , use anusvara.
    3. When following a long vowel that does not go above the line, use chandrabindu.
    4. As an exception to the first rule, use chandrabindu following a short vowel /a/ /u/ (maybe just /a/) before ग. This rule seems strange to me but you have two examples of this (मँगवाना लँगड़ा) and no examples the other way. Did you mean मंगवाना लंगड़ा ?
  4. These four rules explain all your recommendations.
Do you mind if I standardize according to the above rules? (BTW we need to fix the module to handle chandrabindu before fricatives as vowel nasalization and not /n/, which is what it does currently.) Benwing2 (talk) 03:01, 15 September 2020 (UTC)Reply
@Benwing2: I agree that it's only unpredictable before fricatives. Regarding the rest, I don't think it's possible to standardize on any one, it's word specific. Words borrowed from Sanskrit use anusvara in front of fricatives: हंस (hans), संस्थान (sansthān), etc. while words inherited from Sanskrit use the chandrabindu regardless of whether the outcome is nasalization or a nasal stop: लँगड़ा (laṅgṛā) and मँगवाना (maṅgvānā) are correct. However, modern orthographic variants that replace chandrabindu with anusvara are extant (but not the standard). The reverse is not true, the Sanskrit borrowings can never be written with chandrabindu. —AryamanA (मुझसे बात करेंयोगदान) 03:08, 15 September 2020 (UTC)Reply
@AryamanA Are there words where the most standard form has anusvara following a long vowel? E.g. I found अनांत्र and अयनांत, which are Sanskrit borrowings and do not redirect to chandrabindu spellings, but I don't know if these are the standard. Meanwhile, is the rule about chandrabindu in inherited words reliable? E.g. I found कांपना, which does not redirect to a chandrabindu spelling and is inherited, but I don't know if this is standard. I suspect there are a great many wrong forms currently extant on Wiktionary. Benwing2 (talk) 03:35, 15 September 2020 (UTC)Reply
@AryamanA Also, is the rule about chandrabindu + /s/ as vowel nasalization reliable? E.g. we have साँस as an alternative spelling of सांस. Are these pronounced differently? Benwing2 (talk) 03:44, 15 September 2020 (UTC)Reply
@Benwing2: Those Sanskrit borrowings are indeed only spelled with anusvara, अनांत्र and अयनांत are correct as is. Chandrabindu forms are not always permissible variants of anusvara forms, but an anusvara is a normal variant of a chandrabindu form. Basically, anusvara is a superset of chandrabindu. काँपना is the appropriate spelling indeed. And yeah, we probably have some moving to do.
साँस and सांस are both supposed to be pronounced /sɑ̃ːs/ (nasalized, not a nasal stop). Again, this is because chandrabindus are replaced with anusvaras often. One can never write हंस, संस्थान, etc. with chandrabindu (they have nasal stops). So this will have to be fixed per-entry.
Also, McGregor doesn't bother to differentiate at all between chandrabindu/anusvara in translit, which is quite annoying. —AryamanA (मुझसे बात करेंयोगदान) 21:59, 15 September 2020 (UTC)Reply
@AryamanA Thank you for your response. Sorry to keep bugging you with questions; I don't have any reference book for Hindi spelling or grammar, and there don't seem to be very many good online resources, particularly for pronunciation that properly notes elision where it occurs. (Contrast e.g. with Ukrainian, where there are at least 4 separately-sourced online sites listing all the case forms of each word.) Benwing2 (talk) 03:51, 16 September 2020 (UTC)Reply
@Benwing2: No worries, sorry for my slow responses, it's a busy semester. If you go on LibGen or university library sites (if you have access to those), Yamuna Kachru's book titled Hindi is available. It's the most recently published comprehensive Hindi grammar. But I think we've probably exhausted its usefulness, since so much stuff has been implemented already. —AryamanA (मुझसे बात करेंयोगदान) 21:12, 16 September 2020 (UTC)Reply
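The one-way relationship described in this thread (an anusvara spelling is a normal variant of a chandrabindu form, but not the reverse) can be illustrated with a minimal sketch. The function names are hypothetical and are not part of any existing module; the sketch only rewrites spellings and makes no claim about pronunciation, which the thread says has to be fixed per entry.

```python
CHANDRABINDU = "\u0901"  # ँ
ANUSVARA = "\u0902"      # ं

def has_chandrabindu(word: str) -> bool:
    """True if the spelling contains a chandrabindu."""
    return CHANDRABINDU in word

def anusvara_variant(word: str) -> str:
    """Return the variant spelling with each chandrabindu replaced by anusvara.

    Words that already use anusvara (e.g. Sanskrit borrowings) come back
    unchanged; no inverse function is offered, because an anusvara cannot
    be turned into a chandrabindu without knowing the word's history.
    """
    return word.replace(CHANDRABINDU, ANUSVARA)

if __name__ == "__main__":
    for w in ("साँस", "काँपना", "हंस"):
        print(w, "->", anusvara_variant(w))
```

Deliberately, there is no function going from anusvara to chandrabindu: per the discussion above, that direction is word-specific.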

country names

@Atitarev, AryamanA Are all country names masculine? The following are given as feminine: गिनी, चिली, निकारागुआ (either masc or fem). Benwing2 (talk) 05:31, 27 August 2020 (UTC)Reply

@Benwing2, AryamanA Yes, I was wondering too. Are there any more or less reliable and easy tricks we can use to find out?
In Slavic languages, Arabic, etc. you can use adjectives, pronouns or forms to determine the gender, e.g.:

но́вая Чи́ли (nóvaja Číli, "the new Chile")

Confirming that Russian Чи́ли (Číli) can be feminine.

مِصْرُ هِيَ دَوْلَةٌ عَرَبِيَّةٌ (miṣru hiya dawlatun ʕarabiyyatun, "Egypt is an Arab country")

Confirming that Arabic مِصْر (miṣr) can be feminine.
I find these tests harder with Hindi: it's the same pronoun for m and f, and both adjective forms seem to be used.
I also searched for the grammatical terms स्त्रीलिंग (strīliṅg, feminine) / पुल्लिंग (pulliṅg, masculine), hoping to find some reference, but it didn't help much. --Anatoli T. (обсудить/вклад) 05:48, 27 August 2020 (UTC)Reply
@Atitarev, Benwing2: They all sound better in the masculine to me. Notably, I found hits for नया गिनी (nayā ginī, New Guinea) with the masculine. —AryamanA (मुझसे बात करेंयोगदान) 14:13, 27 August 2020 (UTC)Reply
@Benwing2, AryamanA: Thanks, what's the gender/plurality of फ़िलीपीन्स (filīpīns)? --Anatoli T. (обсудить/вклад) 02:09, 28 August 2020 (UTC)Reply
@Atitarev: Masculine sounds best to me, singular only (it was borrowed as a singular). —AryamanA (मुझसे बात करेंयोगदान) 03:31, 28 August 2020 (UTC)Reply

Dual gender

@Benwing2, AryamanA: According to this blog, प्रधान मंत्री (pradhān mantrī) can be both masculine and feminine, depending on who it is.

प्रधान मंत्री कल जा रहे हैं (male): pradhān mantrī kal jā rahe ha͠i, "The Prime Minister is going tomorrow."
प्रधान मंत्री कल जा रही हैं (female): pradhān mantrī kal jā rahī ha͠i, "The Prime Minister is going tomorrow." --Anatoli T. (обсудить/вклад) 02:22, 28 August 2020 (UTC)Reply

@Atitarev: That makes sense. Indira Gandhi was one of India's most notable prime ministers, and a woman, so surely the word took on the feminine in her time. I think all occupations (can't think of exceptions) can be both m/f. —AryamanA (मुझसे बात करेंयोगदान) 02:50, 28 August 2020 (UTC)Reply
@AryamanA, Benwing2: @AryamanA, thanks. In Russian, there are words that refer to both men and women but remain grammatically masculine (they may or may not have a feminine equivalent), which is why I wanted to make sure. I can see you have applied the dual declension as well, thanks.
Would अध्यापक (adhyāpak) be an exception (masculine only?), since there is a feminine equivalent अध्यापिका (adhyāpikā)?
I also posted a question earlier today above regarding फ़िलीपीन्स (filīpīns), in case you missed it. --Anatoli T. (обсудить/вклад) 03:25, 28 August 2020 (UTC)Reply
@Atitarev: Oops, I missed it earlier, sorry. Yes, so when there is a feminine equivalent that is common, then the masculine form is only masculine. प्रधान मंत्रिणी (pradhān mantriṇī) does exist as feminine of प्रधान मंत्री (pradhān mantrī), but it's very rare, and didn't catch on. —AryamanA (मुझसे बात करेंयोगदान)

@Benwing2, AryamanA: Should nouns with dual genders display additional info in declensions, e.g. the parts that refer to masculines and feminines, especially for humans? For example, in साक्षी (sākṣī), साक्षियाँ (sākṣiyā̃) is a direct plural only for female witnesses. --Anatoli T. (обсудить/вклад) 04:44, 2 September 2020 (UTC)Reply

Or, if it's difficult, confusing or crowded, the declension should be split into two tables, so that they get gender labels. --Anatoli T. (обсудить/вклад) 04:56, 2 September 2020 (UTC)Reply
@Atitarev, Benwing2: I like this idea! —AryamanA (मुझसे बात करेंयोगदान) 20:43, 2 September 2020 (UTC)Reply
@Benwing2, AryamanA: Thanks! Thank you also for restoring the dual gender to साक्षी (sākṣī), removed by User:शब्दशोधक. I didn't revert that change, since I didn't feel confident. --Anatoli T. (обсудить/вклад) 01:05, 3 September 2020 (UTC)Reply

gender of cities and continents

@Atitarev, AryamanA Is there a rule about the gender of cities and continents? E.g. अजमेर, अनंतपुर, अबटाबाद, अमृतसर, अलीगढ़, अहमदाबाद, अहिच्छत्र, आगरा, आलप्पुष़ा, इंदौर, इलाहाबाद, इस्तांबुल, इस्लामाबाद, उस्मानाबाद, ऋषिकेश, औरंगाबाद, कन्नौज, कपिलवस्तु, करनाल, कलकत्ता, क़सूर, क़ाहिरा, काठमांडू, कानपुर, काबुल, कार्थेज, कैथल, कोटा, कोरेगांव, कोलकाता, ख़ुशाब are given as masculine, but अमरावती, अयोध्या, उत्तरकाशी are given as fem. (Checking the first 3169 of about 13000 lemmas.) Benwing2 (talk) 20:37, 29 August 2020 (UTC)Reply

@Benwing2: No clear-cut rule, unfortunately. However, a lot of these are derived from regular nouns पुर m (pur, city), गढ़ m (gaṛh, fort), आबाद m (ābād, city), so we could perhaps automate the gender for some of those. —AryamanA (मुझसे बात करेंयोगदान)
@AryamanA Thanks. What about foreign names of cities? E.g. the following (up through lemma #4000) don't currently have gender given: अंकारा, अबू धाबी, इंचियोन (in South Korea), उरुमची (in China), ऊफ़ा (in Russia), एडिलेड (in Australia), एथेंस, ऐम्स्टर्डैम, ऑकलैंड, ऑक्सफ़ोर्ड, ओटावा, ओसाका, क़ुस्तुंतुनिया, कार्डिफ़, कुआला लमपुर, केपटाउन, कैंब्रिज, कैनबरा, कैलगरी, क्योटो, क्राइस्टचर्च, गोल्ड कोस्ट, ग्रोनिंगन, ग्लासगो. Benwing2 (talk) 22:16, 29 August 2020 (UTC)Reply
@AryamanA, Benwing2: Sorry, asking this again. Is there a trick to find out the gender of e.g. तेहरान (tehrān, Tehran)? I have tried to search for e.g. both "बड़ा तेहरान" and "बड़ी तेहरान" (large Tehran) or "तेहरान बड़ा है"/"तेहरान बड़ी है" (Tehran is large) (using various masculine and feminine adjectives). The results tell me nothing of use. Are both genders valid?! Could there be some other ways to check? I am surprised there's no online reference or even discussions on the topic. Maybe something in Hindi? It seems also that loanwords in general don't follow rules, so the gender could be arbitrary before it settles. --Anatoli T. (обсудить/вклад) 00:33, 30 August 2020 (UTC)Reply
@Atitarev: Yes, exactly, loanwords don't really follow fixed rules for gender. Sometimes terms with similar semantics settle on the same gender, e.g. भाषा (bhāṣā), बोली (bolī), and borrowings ज़बान (zabān), लिसान (lisān) are all feminine and all mean "language". Nothing like that for cities. Tehran appears masculine to me so I made the entry. That's the best we can do for now. —AryamanA (मुझसे बात करेंयोगदान) 16:08, 30 August 2020 (UTC)Reply
@AryamanA जोधपुर is given as fem. Is this a mistake? Benwing2 (talk) 17:58, 30 August 2020 (UTC)Reply
@Benwing2, AryamanA: Re: जोधपुर (jodhpur): I am pretty sure it was a mistake, so I went ahead and fixed it.
Re: तेहरान (tehrān): @AryamanA, thanks for responding and making the entry but what method did you use to determine the gender, if it's not just the gut feeling? Can you share? --Anatoli T. (обсудить/вклад) 23:59, 30 August 2020 (UTC)Reply
@Atitarev: Good fix on जोधपुर. I don't have anything better than native speaker intuition to go off of unfortunately. The thing is, I'm not perfect at it; for a good portion of my childhood I only spoke in English so my ability to do things like intuit genders for loans is a little dull. —AryamanA (मुझसे बात करेंयोगदान) 00:03, 31 August 2020 (UTC)Reply
@AryamanA: That's fine, thank you, but I still encourage you to come up with a few tricks (even if imperfect), as per Module_talk:hi-noun#country_names, to determine which gender a word actually takes in use. A sample sentence like the one in Module_talk:hi-noun#Dual_gender with रहे/रही, which reveal the gender, is a good method, I think, but there could be others. In Russian, for example, it would only be difficult to determine gender for rare loanwords that are never used with the adjectives, pronouns or verb forms that indicate gender. --Anatoli T. (обсудить/вклад) 00:14, 31 August 2020 (UTC)Reply
@Atitarev: Ohh, you mean like a diagnostic grammatical trick? That would be the habitual "to be": तेहरान होता है "It's Tehran/Tehran exists" sounds better to me than तेहरान होती है. Whenever I ask my parents a question about gender of a noun, they use this construction to figure it out, so maybe it's a common thing. —AryamanA (मुझसे बात करेंयोगदान) 03:02, 31 August 2020 (UTC)Reply
@AryamanA, Benwing2: Yes, that's what I meant. That test doesn't work in Google for तेहरान, though. Share with us more of those tricks :) --Anatoli T. (обсудить/вклад) 05:26, 31 August 2020 (UTC)Reply

@AryamanA, Benwing2: Update: All the city names from 22:16, 29 August 2020 by @Benwing2 above are masculine. Confirmed by multiple users on Word Reference forum. --Anatoli T. (обсудить/вклад) 22:52, 2 September 2020 (UTC)Reply
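AryamanA's suggestion above, automating the gender of city names built on पुर, गढ़ or आबाद, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the check is meant to be applied to place names only, and it returns no guess at all for names without one of these endings (including the foreign city names, whose masculine gender was confirmed separately).

```python
import unicodedata

# Final elements named in the thread; all three are masculine nouns.
# The last entry is आबाद as it appears after a consonant, where the
# initial आ is written with the vowel sign ा (e.g. in अहमदाबाद).
MASCULINE_ENDINGS = (
    "पुर",
    "गढ़",
    "आबाद",
    "\u093e\u092c\u093e\u0926",  # ाबाद
)

def _norm(text: str) -> str:
    # Normalise so that precomposed ढ़ and ढ + nukta compare equal.
    return unicodedata.normalize("NFC", text)

def guess_city_gender(name: str):
    """Return "m" if the place name ends in a known masculine element, else None."""
    name = _norm(name)
    for ending in MASCULINE_ENDINGS:
        if name.endswith(_norm(ending)):
            return "m"
    return None  # no rule applies; the gender must be checked by hand

if __name__ == "__main__":
    for city in ("कानपुर", "अलीगढ़", "अहमदाबाद", "अयोध्या"):
        print(city, guess_city_gender(city))
```

Returning `None` rather than a default keeps the sketch from asserting a gender where the thread found no rule.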

कच्चा लोहा

@AryamanA, Atitarev Do both parts decline, e.g. is the oblique singular कच्चे लोहे? There are also several terms using काला, e.g. काला कच्चू, काला कलूटा, काला कोयला, काला कीकर; I assume both parts of each also decline. Benwing2 (talk) 04:54, 31 August 2020 (UTC)Reply

Also, काला कोयला is confusing to me because the second part is a noun but the whole expression is an adjective. काला कलूटा might be the same; not sure what कलूटा means. Benwing2 (talk) 04:59, 31 August 2020 (UTC)Reply
There are several others: काला कौआ (kālā kauā, raven), काला चश्मा (kālā caśmā, sunglasses) (I corrected the gender from fem to masc), काला चोर (kālā cor, great thief, dark horse), काला जीरा (kālā jīrā, black cumin), काला तिल (kālā til, black sesame), काला धन (kālā dhan, black money) (also कालाधन (kālādhan), written as one word), काला नमक (kālā namak, kala namak, Indian black salt), काला पहाड़ (kālā pahāṛ, elephant), काला बाल (kālā bāl, pubic hair, groin) (also काले बाल (kāle bāl), plural), काला भुजंग (kālā bhujaṅg, black snake; jet-black (as an adjective)), काला लोन (kālā lon, black salt), काला हिरन (kālā hiran, blackbuck), कालापानी (kālāpānī, Cellular Jail; exile). This leads to some questions: (1) If both parts of the above terms are declined, what about काला धन (kālā dhan) vs. कालाधन (kālādhan)? Does the latter decline as a single word but the former decline as two parts? (2) What about काला भुजंग (kālā bhujaṅg) as an adjective? Does the first part decline and not the second part? Or does neither part decline? Benwing2 (talk) 06:08, 31 August 2020 (UTC)Reply
Finally, how does कौआ (kauā) decline? I am guessing as a regular, not unmarked, masculine. Is there any reference for these declensions? I would be very surprised if there isn't. Benwing2 (talk) 06:10, 31 August 2020 (UTC)Reply
Other examples: खारा पानी (khārā pānī, salt water, literally salty water), छोटा कुत्ता (choṭā kuttā, lapdog, literally small dog), टूटता तारा (ṭūṭtā tārā, shooting star), बड़ा दिन (baṛā din, Christmas, literally big day). Benwing2 (talk) 06:16, 31 August 2020 (UTC)Reply
A problematic case: छोटा हाज़िरी (choṭā hāzirī, chota hazri, light breakfast), which is given as feminine despite छोटा being masculine. Benwing2 (talk) 06:38, 31 August 2020 (UTC)Reply
@AryamanA, Benwing2 The more grammatical छोटी हाज़िरी (choṭī hāzirī) is more frequent in Google searches. Also, the English "choti hazri" may be even attestable. --Anatoli T. (обсудить/вклад) 02:56, 1 September 2020 (UTC)Reply
@Benwing2, Atitarev: Yes, the adjective declines as usual in all of these (so the obl कच्चे लोहे is correct). Even in the compounds like कालाधन (kālādhan) the adjective part declines. For काला भुजंग (kālā bhujaṅg) yes the first part declines and not the second one. कौआ (kauā) I've added the declension for, it's regular. —AryamanA (मुझसे बात करेंयोगदान) 03:05, 1 September 2020 (UTC)Reply
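The answer above (the adjective declines along with the noun, so the oblique singular is कच्चे लोहे) can be shown with a small sketch. It assumes, as a loud simplification, that every word in the phrase ending in ा is a declinable adjective or a marked masculine noun; feminine nouns in ा such as भाषा would need to be excluded by the caller. The function names are hypothetical.

```python
AA = "\u093e"  # ा, final vowel sign of marked masculines and declinable adjectives
E = "\u0947"   # े, the corresponding oblique singular ending

def oblique_word(word: str) -> str:
    """Turn a final ा into े; leave any other word unchanged."""
    if word.endswith(AA):
        return word[:-1] + E
    return word

def oblique_sg(phrase: str) -> str:
    """Oblique singular of an adjective + noun phrase, declining each part."""
    return " ".join(oblique_word(w) for w in phrase.split(" "))

if __name__ == "__main__":
    print(oblique_sg("कच्चा लोहा"))
```

A word that does not end in ा, such as the second part of काला भुजंग, passes through unchanged.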

वज़ीर-ए-ख़ारजा

@AryamanA, Atitarev How are terms like वज़ीर-ए-ख़ारजा (vazīr-e-xārjā), इस्म-ए-शरीफ़ (isma-e-śarīf) declined? Logically the head is the first word. Benwing2 (talk) 01:31, 2 September 2020 (UTC)Reply

@Benwing2, AryamanA: I've Googled and you must be right. --Anatoli T. (обсудить/вклад) 04:35, 2 September 2020 (UTC)Reply
@Benwing2, Atitarev: The head is the second word. Hindi has reanalyzed these izāfat constructions as compounds, and the word takes the gender of the second word and is declined only at the end. It might be that Urdu treats them more like Persian does (head is first word), I'm not sure about that. —AryamanA (मुझसे बात करेंयोगदान) 15:23, 2 September 2020 (UTC)Reply
@Benwing2, AryamanA: Thanks. My search results for inflected forms, such as "वज़ीरों-ए-ख़ारजा", must be for Urdu written in Devanagari, or are they alternative Hindi declensions? --Anatoli T. (обсудить/вклад) 00:47, 3 September 2020 (UTC)Reply
@Atitarev, Benwing2: I don't get any hits for that... I also tried वज़ीरों-ए-आज़म (vazīrõ-e-āzam) and no hits for that either. Ditto for using the borrowed plural वज़ीरन (vazīran). I'm quite certain they aren't used in Hindi. —AryamanA (मुझसे बात करेंयोगदान) 02:30, 3 September 2020 (UTC)Reply
@AryamanA: Thanks again. I made a mistake. --Anatoli T. (обсудить/вклад) 02:44, 3 September 2020 (UTC)Reply
@Atitarev: Well, couple thousand correct, one mistake, it happens :) BTW, I responded to your email. —AryamanA (मुझसे बात करेंयोगदान) 02:49, 3 September 2020 (UTC)Reply

करनेवाला

@AryamanA, Atitarev Presumably this word has feminine oblique/plural करनेवालीं (karnevālī̃) ending in an anusvara? If so I need to modify {{hi-adecl}}. Are there other adjectives with a separate feminine oblique/plural? Benwing2 (talk) 02:05, 2 September 2020 (UTC)Reply

@Benwing2: Yeah, I think that info belongs in the conjugation table. The feminine plural endings (that's never the oblique) with anusvara only occur in verb forms, so don't think we should replicate that on the verb form pages. —AryamanA (मुझसे बात करेंयोगदान) 02:51, 3 September 2020 (UTC)Reply
@AryamanA OK. Are you sure the feminine forms with anusvara are plural-only? They are labeled as both oblique and plural in e.g. घुस करना and various of the old verb templates. Benwing2 (talk) 06:15, 3 September 2020 (UTC)Reply
@Benwing2: Yes, I'm quite certain that's incorrect. The conjugation tables have a lot of issues right now... —AryamanA (मुझसे बात करेंयोगदान) 13:42, 3 September 2020 (UTC)Reply

two-part nouns

@AryamanA, Atitarev Above you mentioned that two-part nouns like लेखा-जोखा (lekhā-jokhā, accounting) decline only in their second part. However, "लेखे-जोखे" gets many more hits than "लेखा-जोखे" (10,400 vs. 267), and adverbial oblique आमने-सामने (āmne-sāmne) suggests that both parts of आमना-सामना (āmnā-sāmnā) also decline. Is this a general phenomenon? Is it restricted to nouns ending in ? Benwing2 (talk) 04:28, 2 September 2020 (UTC)Reply

@Benwing2, AryamanA I couldn't find the discussion quickly but if I remember correctly, Aryaman said there are different examples and both parts can be declinable (case by case), so your finding must be right, IMO and both parts are declinable. --Anatoli T. (обсудить/вклад) 04:50, 2 September 2020 (UTC)Reply
@Atitarev, Benwing2: Yes, I think it's best to include both variants in such cases. It does seem to be generalizable. —AryamanA (मुझसे बात करेंयोगदान) 20:44, 2 September 2020 (UTC)Reply
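Including both variants, as agreed above, might look like the following sketch. It assumes hyphen-separated parts and treats every part ending in ा as declinable, which is an illustrative simplification; the function names are hypothetical.

```python
AA = "\u093e"  # ा
E = "\u0947"   # े

def to_oblique(part: str) -> str:
    """Oblique singular of one part: final ा becomes े, anything else is kept."""
    return part[:-1] + E if part.endswith(AA) else part

def oblique_variants(compound: str) -> list:
    """Return the oblique singular variants of a hyphenated two-part noun.

    The first variant declines every part; the second declines only the
    last part. Duplicates are dropped when the two coincide.
    """
    parts = compound.split("-")
    both = "-".join(to_oblique(p) for p in parts)
    last_only = "-".join(parts[:-1] + [to_oblique(parts[-1])])
    variants = [both]
    if last_only != both:
        variants.append(last_only)
    return variants

if __name__ == "__main__":
    print(oblique_variants("लेखा-जोखा"))
```

Listing the fully declined form first reflects the hit counts quoted in the thread, where it is by far the more common of the two.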

Adjectives in -ānā and -iyā

@AryamanA, Atitarev Are all adjectives in these two suffixes indeclinable? Benwing2 (talk) 04:48, 8 September 2020 (UTC)Reply

What about -śudā? Same thing? Benwing2 (talk) 04:49, 8 September 2020 (UTC)Reply
@Benwing2, Atitarev: -ānā and -śudā are indeclinable always (Persian borrowings). -iyā I'm not so definitive about, but I think it's always indeclinable and I can't come up with a counterexample. —AryamanA (मुझसे बात करेंयोगदान) 17:15, 8 September 2020 (UTC)Reply
@AryamanA, Benwing2: I am using User:Dixtosa's tool to identify those: śudā-adjectives, ānā-adjectives, -iyā-adjectives (assuming these are spelled with a diacritic). I can't comment on the actual question. --Anatoli T. (обсудить/вклад) 23:38, 8 September 2020 (UTC)Reply
@Atitarev, AryamanA Thanks! Here's a list of the remaining adjectives in -iyā without declensions: अठपहिया, इकपहिया, उड़िया, ख़ुफ़िया, गिरमिटिया, चिनिया, चीनिया, छलिया, तिपहिया, दुपहिया, नौसिखिया, मज़ाक़िया, मुखिया, मुग़लिया, हालिया. Are all of these indeclinable? Benwing2 (talk) 02:24, 9 September 2020 (UTC)Reply
@AryamanA, Benwing2: I have made them all indeclinable. Many are in McGregor but not all. The "-wheeled" adjectives are not labelled. Interesting that मुखिया (mukhiyā) is labelled as "usu. inv.". So it can also be declined? --Anatoli T. (обсудить/вклад) 03:20, 9 September 2020 (UTC)Reply
@Atitarev, Benwing2: Sounds good to me. I would never decline मुखिया (mukhiyā), but other dialects may treat it another way. —AryamanA (मुझसे बात करेंयोगदान) 03:24, 9 September 2020 (UTC)Reply
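A check like the one used to shortlist these adjectives could be sketched as below, working on transliterations. It is only a candidate filter under stated assumptions: the function name is hypothetical, and a plain string match cannot tell a real suffix from a coincidental ending (purānā "old", for instance, ends in ānā but is an ordinary declinable adjective), so every hit still needs review.

```python
import unicodedata

# Endings discussed in the thread; -iyā is the least certain of the three.
INDECLINABLE_ENDINGS = ("ānā", "śudā", "iyā")

def maybe_indeclinable(translit: str) -> bool:
    """True if the transliterated adjective ends in one of the listed suffixes."""
    # Normalise so that precomposed and combining diacritics compare equal.
    translit = unicodedata.normalize("NFC", translit)
    endings = tuple(unicodedata.normalize("NFC", e) for e in INDECLINABLE_ENDINGS)
    return translit.endswith(endings)

if __name__ == "__main__":
    for t in ("mukhiyā", "kālā"):
        print(t, maybe_indeclinable(t))
```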

on-m declension

@AryamanA, Atitarev I noticed you added an on-m declension. Are you sure we need this? It looks to be indeclinable, and we can already mark indeclinable nouns using |ind=1 on the headword. (We also already have an indeclinable noun type indicated using <$>, although I am planning on removing it.) Benwing2 (talk) 05:21, 9 September 2020 (UTC)Reply

@Benwing2: I'm not very comfortable with calling nouns "indeclinable" because, unlike adjectives, in this case the usual obl. pl. and voc. pl. morphemes are being added but they just get assimilated and on the surface there is no change. There's nothing weird going on relative to other masculine nouns; the same morphemes are involved. Adjectives that are indeclinable just do not care about gender agreement; they have no other forms. They are an entirely distinct class from declinable adjectives. And unlike adjectives, there's no precedent of Hindi dictionaries calling nouns "invariable" or "indeclinable", so that may be confusing for the end user. —AryamanA (मुझसे बात करेंयोगदान) 05:32, 9 September 2020 (UTC)Reply
@AryamanA I see your point, but I'm concerned that if we follow this logic, we need to add separate declensions for masculine and feminine nouns in -e, and feminine nouns in -o, and masculine and feminine nouns in -ai, etc. etc., all or most of which exist and all of which are indeclinable. See CAT:Hindi indeclinable nouns and CAT:Hindi indeclinable proper nouns for examples of such nouns. Overall I think that a declension table all of whose forms are the same is more confusing than simply stating that the noun is indeclinable. Benwing2 (talk) 06:20, 9 September 2020 (UTC)Reply

Need of a new Pronoun declension Table

(moved from Talk:तुमने)

@Benwing2, AryamanA, Atitarev Honestly, the pronoun declension tables that are used for Hindi, the way they are now, are an abomination to say the least haha. I really think we need a new one. The current one ignores many declensions. Have a look at the declensions below in the pictures. Can this be implemented? There should be two tables. One for the pronouns and their grammatical cases and the second for all the pronouns formed using the 8 case-markers/primary postpositions of Hindi. Also, there are these genitive and semblative cases which unlike other case-markers and pronouns further decline depending on gender and number. If those declensions (Pic3) could be implemented in the table in Pic2 then that would be better.

 
Pic1: Pronouns and the case declensions
 
Pic2: Pronouns using case-markers
 
Pic3: declensions of the declinable pronouns (genitive and semblative) and case-markers

We can replace the columns that we use now with a column as shown in the pics. The current table is like this below. I don't think "indirect" is the common word that is used to refer to the "oblique" case. Itsmeyash31 (talk) 01:07, 22 September 2020 (UTC)Reply

@Itsmeyash, AryamanA, Atitarev Let me see about implementing this, it may not be so hard. Benwing2 (talk) 02:39, 22 September 2020 (UTC)Reply
@Itsmeyash31 Benwing2 (talk) 02:39, 22 September 2020 (UTC)Reply
@Benwing2, Atitarev, Itsmeyash31: Why not just have one big manually encoded table and use that at all the entries? {{hi-personal pronouns}}. Automation seems unnecessary seeing as these are very unique declensions. —AryamanA (मुझसे बात करेंयोगदान) 03:29, 22 September 2020 (UTC)Reply
@Benwing2, Atitarev, AryamanA: Yes, that would be much better. By the way, I just noticed there are some wrong entries in the pictures I uploaded, e.g. some are wrongly gendered and I used "proximal/distal" for the relative and interrogative pronouns. That should be removed.

Width of template

The template seems to be wider than the entire screen; its width is apparently set to 30-45em. Isn't that quite big? Shouldn't it just be width 100%, at least on mobile devices?
-Taimoor Ahmed(گل بات؟) 00:50, 21 May 2021 (UTC)Reply

@Taimoorahmed11: Yes, on mobile devices the template is too wide. If you want to discuss this further, you should try contacting the author User:Benwing2. Kutchkutch (talk) 09:00, 21 May 2021 (UTC)Reply
Pinging @Benwing2: Hi! What's your opinion on this?
-Taimoor Ahmed(گل بات؟) 22:26, 21 May 2021 (UTC)Reply