User:LA2

      LA2 is the username for Lars Aronsson, Sweden. See w:user:LA2.

      Wiktionary:Babel
      sv Den här användaren talar svenska som modersmål.
      en-3 This user is able to contribute with an advanced level of English.
      de-2 Dieser Benutzer hat fortgeschrittene Deutschkenntnisse.
      da-1 Denne bruger har et grundlæggende kendskab til dansk.
      no-1 Denne skribenten har litt kjennskap til norsk.
      For my cut-and-paste convenience:
      ==Swedish==
      ===Etymology===
      {{compound|a|b|lang=sv}}
      ====Conjugation====
      ====Declension====
      ====Related terms====
      ====Usage notes====
      ===References===
      * {{R:SAOL|åäö|%e5%e4%f6}}
      * {{R:SAOB online|åäö}}
      
      ====Translations====
      {{trans-top|}}
      {{trans-mid}}
      * Swedish: {{t|sv|}}
      {{trans-bottom}}
      
      ===Adjective===
      {{head|sv|adjective form}}
      
      # {{sv-adj-form-abs-indef-n|}}
      # {{sv-adj-form-abs-def-m|}}
      # {{sv-adj-form-abs-def+pl|}}
      # {{sv-adj-form-comp|}}
      # {{sv-adj-form-sup-pred|}}
      # {{sv-adj-form-sup-attr|}}
      
      ===Adverb===
      {{head|sv|adverb}}
      
      ===Noun===
      {{head|sv|noun form}}
      
      # {{sv-noun-form-indef-gen|}}
      # {{sv-noun-form-def|}}
      # {{sv-noun-form-def-gen|}}
      # {{sv-noun-form-indef-pl|}}
      # {{sv-noun-form-indef-gen-pl|}}
      # {{sv-noun-form-def-pl|}}
      # {{sv-noun-form-def-gen-pl|}}
      
      ===Verb===
      {{head|sv|verb form}}
      
      # {{sv-verb-form-pre|}}
      # {{sv-verb-form-past|}}
      # {{sv-verb-form-sup|}}
      # {{sv-verb-form-imp|}}
      # {{sv-verb-form-inf-pass|}}
      # {{sv-verb-form-pre-pass|}}
      # {{sv-verb-form-past-pass|}}
      # {{sv-verb-form-sup-pass|}}
      # {{sv-verb-form-prepart|}}
      # {{sv-verb-form-pastpart|}}
      

      Diary

      May 4, 2013: I should read these sometime:

      • Ladislav Zgusta, Manual of Lexicography (1971; foreword signed 1968) Google Books
      • C.C. Berg (professor at Leiden), Report on the Need for Publishing Dictionaries which do not to-date exist (booklet, between 1960 and 1962, published by CIPSH, Conseil International de la philosophie et des sciences humaines)

      January 24, 2013: I introduce {{sv-compound}} and category:Swedish compounds with maskin, as used for displaying Derived terms in maskin#Swedish. -- Bad idea.

      November 19, 2012: Fun photo gallery: 10 Swedish words you won’t find in English: orka, harkla, hinna#Verb, blunda, mysa, vabba, duktig, jobbig, gubbe/gumma, mormor/farmor/morfar/farfar (actually 14).

      August 27, 2012: I give up all hope about the Norwegian entries in en.wiktionary. Please remind me to stay away if any discussion should come up again.

      April 18, 2011: To do: handgemäng, hägn, ohägn, hugnad, misshällighet

      April 7, 2011: All the words from this article about common translation errors should be incorporated into Wiktionary.

      April 3, 2011: I think I'm done with Swedish form entries for now. When the new XML dump of 2011-04-02 arrived, Wiktionary contained 87,651 Swedish words. After parsing the XML dump I was able to generate 1521 new Swedish form entries. I have the machinery in place to fill in the missing form entries after each new dump. Now we need to expand the 20,000 Swedish gloss entries to a full Swedish vocabulary. But can that work be automated? How do we add the next 20,000 gloss entries without spending 3 minutes on each? (That would be 1000 hours, or 25 weeks of full-time work.)

      March 20, 2011: When spannen#Swedish is the definite singular of spann (bucket) and definite plural of spann (set of horses), I'd like to indicate in the form entry which sense belongs to which form. Perhaps "senseid" is the way to do this. Both the form templates and the declension/conjugation templates would have to take the sense ID as an extra parameter. This would be a major change to the 80,000 existing Swedish entries.

      March 18, 2011: I create Appendix:Swedish verbs.

      March 10, 2011: The new XML database dump shows 80,000 Swedish entries, yet another giant leap forward. My simple script for generating missing form entries has evolved into one that reads the declension and conjugation table template calls and concludes which form entry templates should be called from where. For example {{sv-noun-reg-ar|2=and}} in ande should generate {{sv-noun-form-def|ande}} in the page anden. If this form entry template call is found, fine. If not, the wanted form entry is saved as a file, that a modified version of pagefromfile.py can read. If the page doesn't exist, it is created. If it exists, a ==Swedish== entry is appended at the bottom. If a Swedish entry already exists, because "anden" is also the definite form of and, this is logged and I have to edit the existing Swedish entry manually. At least for now, this happens a lot. In some cases, a verb form entry is also an adjective form. In some cases, the form entry exists but uses another template (form of, plural of, ...) or no template at all. Right now I have a backlog of 8,000 entries to go through, or 10 percent of the existing stock. Maybe I should automate the addition of adjective form entries to Swedish entries that don't have an adjective subheading already ... done.
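
The decision described above (create, append, or log for manual editing) can be sketched roughly as follows. This is only an illustration under assumptions: the function name `plan_form_entry` and the action labels are hypothetical, and neither the original script nor the modified pagefromfile.py is shown here.

```python
import re

# Matches a level-2 ==Swedish== heading on a line of its own.
SWEDISH_L2 = re.compile(r"^==[ \t]*Swedish[ \t]*==[ \t]*$", re.MULTILINE)


def plan_form_entry(page_text, form_call):
    """Decide what to do with one wanted form entry.

    page_text is the current wikitext of the target page, or None if the
    page does not exist. form_call is the wanted template call, e.g.
    "{{sv-noun-form-def|ande}}". The returned labels are hypothetical.
    """
    if page_text is None:
        return "create"   # page is missing: create it
    if form_call in page_text:
        return "ok"       # the form entry is already there
    if SWEDISH_L2.search(page_text) is None:
        return "append"   # page exists without a Swedish section
    return "log"          # Swedish section exists: edit manually


# Example: "anden" already has a Swedish entry as the definite form of "and".
existing = "==Swedish==\n===Noun===\n# {{sv-noun-form-def|and}}\n"
action = plan_form_entry(existing, "{{sv-noun-form-def|ande}}")
```

Here `action` is the "log" case, matching the situation with anden described in the diary entry.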

      March 2, 2011: The most commonly used Norwegian templates are: {{no-noun-infl}} (733 calls), {{nn-noun-m1}} (351), {{nb-noun-m2}} (221), {{nn-noun-form}} (178), {{no-noun}} (125), {{nn-verb}} (101), {{no-noun-c}} (97), {{nb-noun-m1}} (87), {{nn-inf}} (85), {{no-verb}} (76), {{no-noun-m1}} (73), {{no-noun-n1}} (71), {{no-verb-1}} (68), {{no-verb-2}} (54), {{nn-noun-n1}} (51), {{no-noun-mu}} (48), {{no-adj-infl}} (47), {{no-noun-form}} (41), {{no-noun-irreg}} (40), {{no-adj-2}} (39), {{no-adj-1}} (33), {{nn-verb-form}} (32), {{nb-noun}} (32), {{nn-verb-1}} (30), {{no-adj}} (26), {{nn-noun-f2}} (24), {{nb-noun-n1}} (23), {{no-verb_form}} (22), {{nn-noun-irreg}} (21), {{nb-class1}} (18), {{nb-g}} (17), {{nb-noun-c}} (16), {{no-adj-3}} (15), {{no-noun-nu}} (13), {{nn-pers-pron}} (13), {{no-noun-n4}} (12), {{no-noun-n3}} (12), {{nn-noun-f1}} (12), {{nn-adj-2}} (11), {{nb-verb-1}} (11), {{no-noun-cu}} (10), {{nn-adj-table}} (10), {{nb-noun-n3}} (10), {{no-verb-4}} (9), {{nn-adj-1}} (9), {{nb-verb}} (9), {{no-noun-f}} (8), {{nn-verb-2}} (8), {{nb-adj-table}} (8), {{nn-verb-form-pre}} (7), {{nb-pers-pron}} (7), {{no-noun-f1}} (6), {{no-adv}} (6), {{no-adj-irreg}} (6), {{nn-noun-f3}} (6), {{nn-adj-3}} (6), {{nb-verb-2}} (6), {{nb-class2}} (6), {{nb-adj-2}} (6), {{no-verb-form}} (5), {{no-noun-reg-m}} (5), {{nn-g}} (5).

      February 27, 2011: I don't speak French or Italian, but when I saw all these form entries (mostly created by Keenebot2 and SemperBlottoBot) for verbs using the primitive {{form of}}, I started to replace them with the more structured {{conjugation of}}. See Template talk:conjugation of#Stats. I have made the following translations of parameters:

      • lang=French/Italian ⇒ lang=fr/it
      • First/second/third person ⇒ 1/2/3
      • singular/plural ⇒ s/p
      • present indicative ⇒ pres|ind
      • present tense ⇒ pres|ind
      • present subjunctive ⇒ pres|sub
      • imperfect indicative ⇒ imperf|ind
      • imperfect tense ⇒ imperf|ind
      • imperfect subjunctive ⇒ imperf|sub
      • past historic ⇒ [[past historic]]
      • conditional mood/tense ⇒ cond
      • future tense ⇒ fut
      • imperative ⇒ imp
      • infinitive ⇒ inf
      • gerund ⇒ gerund
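
The parameter translations listed above amount to a lookup table. A minimal sketch, assuming the old description has already been split into person, number, tense/mood and language; the function name `conjugation_params` and the order in which the pieces are joined are assumptions for illustration, not the documented syntax of {{conjugation of}}.

```python
# Lookup tables copied from the list of parameter translations.
LANG = {"French": "fr", "Italian": "it"}
PERSON = {"first": "1", "second": "2", "third": "3"}
NUMBER = {"singular": "s", "plural": "p"}
TENSE_MOOD = {
    "present indicative": "pres|ind",
    "present tense": "pres|ind",
    "present subjunctive": "pres|sub",
    "imperfect indicative": "imperf|ind",
    "imperfect tense": "imperf|ind",
    "imperfect subjunctive": "imperf|sub",
    "past historic": "[[past historic]]",
    "conditional mood": "cond",
    "conditional tense": "cond",
    "future tense": "fut",
    "imperative": "imp",
    "infinitive": "inf",
    "gerund": "gerund",
}


def conjugation_params(person, number, tense, language):
    """Join the translated pieces with '|' (hypothetical ordering)."""
    return "|".join([
        PERSON[person],
        NUMBER[number],
        TENSE_MOOD[tense],
        "lang=" + LANG[language],
    ])


params = conjugation_params("first", "singular", "present tense", "French")
```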

      February 25, 2011: In the XML database dump of 2011-02-05, the most common headings for Swedish entries (compare August 21, 2010) are:

       61850 Swedish
       43879 Noun
       11033 Verb
        5695 Adjective
        5524 Declension
        4499 Etymology
        4115 Related terms
        2774 Pronunciation
        1777 Conjugation
        1675 See also
        1478 Proper noun
        1006 Synonyms
         666 Adverb
      
         591 Derived terms
         565 References
         533 Usage notes
         333 Antonyms
         135 Pronoun
         129 Abbreviation
         125 Cardinal number
         104 Interjection
          90 Etymology 2
          90 Etymology 1
          86 Inflection
          79 Preposition
          76 Suffix
      
          71 Compounds
          53 Idiom
          51 Conjunction
          49 Phrase
          39 Prefix
          37 Ordinal number
          25 Proverb
          22 Descendants
          18 Etymology 3
          16 Hypernyms
          12 Hyponyms
          11 Phrases
          11 Initialism
      
          10 Homophones
          10 Determiner
           9 {{abbreviation|Swedish}}
           8 Acronym
           5 Article
           5 {{abbreviation|sv}}
           4 Troponyms
           4 Letter
           4 {{initialism|sv}}
           4 External links
           4 Antonym
           4 Anagrams
      

      As a comparison, the most common headings for all languages (not counting the L2 headings for the language names themselves) are:

      1235093 Verb
       811866 Noun
       272027 Etymology
       267882 Pronunciation
       254013 Adjective
       234614 Anagrams
       123356 Related terms
       119880 Declension
        91788 Synonyms
        76466 Derived terms
        66909 Translations
        66639 References
        62966 Proper noun
        58676 See also
        49495 Alternative forms
        48712 Conjugation
        36726 Adverb
        33230 Participle
        32812 Hanzi
        26224 Han character
      
       24396 Inflection
       17626 Usage notes
       17535 External links
       16834 Antonyms
       15105 Descendants
       13497 Readings
       13331 Kanji
       10042 Etymology 1
       10033 Etymology 2
        8953 Hanja
        7029 Pronoun
        4578 Interjection
        3809 Compounds
        3623 Phrase
        3610 Suffix
        3452 {{initialism}}
        3422 Numeral
        3341 Verb form
        3238 Symbol
        3142 Cardinal number
      
        2998 Preposition
        2663 Prefix
        2572 Quotations
        2491 Mutation
        2433 Letter
        2380 Idiom
        2169 Conjunction
        1901 {{abbreviation}}
        1875 Pinyin syllable
        1853 Pronunciation 2
        1852 Pronunciation 1
        1652 Abbreviation
        1569 Coordinate terms
        1507 Proverb
        1485 Etymology 3
        1447 Pinyin
        1429 Hyponyms
        1342 Gismu
        1152 Hypernyms
        1066 Syllable
      
         973 Statistics
         728 {{acronym}}
         709 Contraction
         677 {{abbreviation|mul}}
         669 Devanagari spelling
         660 Ordinal number
         645 Urdu spelling
         569 Particle
         542 Determiner
         505 Number
         482 Abbreviations
         446 Alternative spellings
         368 Article
         365 Derived characters
         355 Scientific names
         342 Etymology 4
         330 Postposition
         299 Initialism
         288 Homophones
         239 Roman spelling
      

      The most common combinations and sequences for Swedish sections are:

       36998 ((Swedish(Noun)))
        8123 ((Swedish(Verb)))
        3620 ((Swedish(Adjective)))
         981 ((Swedish(Proper noun)))
         918 ((Swedish(Etymology;Noun(Declension))))
         699 ((Swedish(Noun(Declension))))
         410 ((Swedish(Etymology;Noun(Declension;Related terms))))
         372 ((Swedish(Adjective;Verb)))
         359 ((Swedish(Verb(Conjugation;Related terms))))
         340 ((Swedish(Etymology;Verb(Conjugation;Related terms))))
         339 ((Swedish(Noun(Declension;Related terms))))
         330 ((Swedish(Noun;Verb)))
         249 ((Swedish(Pronunciation;Noun)))
         220 ((Swedish(Etymology;Adjective(Declension))))
         211 ((Swedish(Etymology;Noun(Declension)References)))
         182 ((Swedish(Adverb)))
         180 ((Swedish(Noun(Related terms))))
         156 ((Swedish(Pronunciation;Noun(Declension))))
         145 ((Swedish(Etymology;Proper noun)))
         139 ((Swedish(Etymology;Noun)))
         121 ((Swedish(Pronunciation;Noun(Declension;Related terms))))
         120 ((Swedish(Etymology;Adjective(Declension;Related terms))))
         114 ((Swedish(Noun(See also))))
         109 ((Swedish(Proper noun(Related terms))))
         106 ((Swedish(Etymology;Noun(Declension;See also))))
         105 ((Swedish(Noun(Synonyms))))
         104 ((Swedish(Etymology;Verb(Conjugation))))
      
          99 ((Swedish(Pronunciation;Verb(Conjugation;Related terms))))
          95 ((Swedish(Noun(Declension;See also))))
          91 ((Swedish(Adjective;Adverb)))
          88 ((Swedish(Verb(Conjugation))))
          79 ((Swedish(Pronunciation;Noun(Related terms))))
          79 ((Swedish(Noun(Declension;Related terms;See also))))
          78 ((Swedish(Etymology;Pronunciation;Noun(Declension))))
          72 ((Swedish(Abbreviation)))
          71 ((Swedish(Adjective(Related terms))))
          70 ((Swedish(Cardinal number)))
          69 ((Swedish(Pronunciation;Adjective)))
          62 ((Swedish(Noun(Declension;Synonyms))))
          62 ((Swedish(Etymology;Noun(Declension;Related terms;See also))))
          61 ((Swedish(Adjective(Declension;Related terms))))
          57 ((Swedish(Adjective(Declension))))
          52 ((Swedish(Etymology;Noun(Declension;Synonyms))))
          43 ((Swedish(Etymology;Pronunciation;Noun(Declension;Related terms))))
          42 ((Swedish(Verb(Conjugation;Related terms;See also))))
          42 ((Swedish(Pronunciation;Verb)))
          42 ((Swedish(Pronunciation;Etymology;Verb(Conjugation;Related terms))))
          42 ((Swedish(Noun(Derived terms))))
          42 ((Swedish(Etymology;Pronunciation;Noun)))
          40 ((Swedish(Alternative forms;Proper noun)))
          38 ((Swedish(Pronoun)))
          38 ((Swedish(Etymology;Verb(Conjugation;Related terms;See also))))
          38 ((Swedish(Etymology;Adjective)))
          37 ((Swedish(Etymology;Adverb)))
      

      February 8, 2011: English Wiktionary now contains more Swedish entries (78,985) than Swedish Wiktionary (76,119). The overlap is only 34,178 entries. Swedish Wiktionary has more gloss definitions and English Wiktionary has more form entries, many created by LA2-bot.

      February 6, 2011: I should try to incorporate as much as possible of Wikipedia:Swedish Wikipedians' notice board/Terminology into Wiktionary.

      February 4, 2011: I set up {{R:Rikstermbanken}} and create some entries that refer to it.

      January 30, 2011: I set up {{R:Utrikes namnbok}} and create some entries that refer to it, mostly in Category:sv:Government.

      January 20, 2011: How to extract a list of Swedish headwords from the Swedish Wiktionary:

      wget -O - "http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php?wikilang=sv&wikifam=.wiktionary.org&basecat=Svenska&basedeep=5&templates=&mode=al&go=Search&format=csv&userlang=en" |
         awk '-F\t' '$1==0 {print $2}' |
         tr _ ' ' | LC_COLLATE=sv_SE.utf8 sort
      

      January 10, 2011: How to extract a list of Swedish headwords:

      wget -O - "http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php?wikilang=en&wikifam=.wiktionary.org&basecat=Swedish+language&basedeep=5&templates=&mode=al&go=Search&format=csv&userlang=en" |
         awk '-F\t' '$1==0 && $3!="Translation_requests_(Swedish)" && $3!="Translations_to_be_checked_(Swedish)" && $3!~/derived_from_Swedish/ {print $2}' |
         tr _ ' ' | LC_COLLATE=sv_SE.utf8 sort
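
The awk, tr and sort steps of the two commands above can be sketched in Python as below. Assumptions: the downloaded text has tab-separated columns namespace, page title, category (inferred from the awk fields $1, $2, $3), and the sort key is only a rough stand-in for the sv_SE collation (å, ä, ö after z). The names `swedish_headwords` and `sv_sort_key` are hypothetical.

```python
SV_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
EXCLUDED = {
    "Translation_requests_(Swedish)",
    "Translations_to_be_checked_(Swedish)",
}


def sv_sort_key(word):
    """Approximate Swedish ordering: letters a-z, then å, ä, ö."""
    key = []
    for ch in word.lower():
        pos = SV_ALPHABET.find(ch)
        # Characters outside the alphabet (space, hyphen, ...) sort first.
        key.append((1, pos) if pos >= 0 else (0, ord(ch)))
    return key


def swedish_headwords(tsv_text):
    """Keep main-namespace titles, drop unwanted categories, sort."""
    words = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or fields[0] != "0":
            continue  # like awk's $1==0: main namespace only
        category = fields[2] if len(fields) > 2 else ""
        if category in EXCLUDED or "derived_from_Swedish" in category:
            continue
        words.append(fields[1].replace("_", " "))  # like tr _ ' '
    return sorted(words, key=sv_sort_key)
```

Like the shell version, this does not remove duplicates when a page is listed under several categories.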
      

      November 19, 2010: I import {{R:runeberg.org}} from sv.wikipedia.

      November 15, 2010: I think there are now 20,000 Swedish entries in en.wiktionary.org, which is twice as many as at the beginning of this year. This has been achieved mainly by adding form entries. Statistics here. I have added more word forms, based on word frequency lists (see corpus coverage in the August 31 entry below). I have focused less on including all definitions and all forms for every word. What I have tried to do is to create links between the entries, so compounds link to their component words. Hopefully, this will attract more users who then start to fill in the missing definitions (second usage of words) and forms. This philosophy, known as eventualism, is similar to creating stub articles in Wikipedia, hoping that later users will fill in more facts. I'm not a general subscriber to that idea, but it can be a useful approach in the early stages of a project. A useful Swedish dictionary probably needs 120,000 basic forms (and half a million form entries), which is ten times more than en.wiktionary has today and five times more than sv.wiktionary has.

      September 18, 2010: There are 51,318 pages that call {{t}}, {{t+}} or {{t-}}. The page with most translations is be (607 translations), followed by you (447), set (438), love (421). Halfway down the list we find words like toner and toadstool (4 translations each). The most translated words that don't yet have any Swedish translation (or where the translations didn't use these templates in the database dump of 2010-09-12) are: judge (161), (156), heat (154), jump (153), spread (141), stroke (140), proper (137), cry (131), behind (130), desire (126), nose (125), round (123), article (122), double (121), taste (117), end (117), situation (116), shut up (116), male (116), Albanian (116), draft (112), chest (112), e-mail (110), truth (108), storm (108), squeeze (105), same (105), job (105), exit (105), (104), cheap (103), steer (102), prayer (100), entry (100), cinema (100), split (99), Gypsy (99), care (99), waste (98), sole (97), hook (97), chat (97), welcome (96), believe (96), coach (95), short (94), bend (94), herd (91), finish (91), sit (90), return (90), pickle (90), drill (90), dragon (90), cum (90), cherry (90), butt (90), British (90), masculine (88), correct (88), icon (87), gun (87), gentleman (87), freedom (87), beginning (87), separate (86), Moon (86), account (86), justice (85), I'm Jewish (85), definition (85), puzzle (84), atmosphere (84), corner (83), Macedonian (81), lime (81), lady (80), decline (80), damn (80), cardinal (79), plague (78), interest (78), dash (78), auxiliary (78), study (77), newspaper (77), hi (77), criminal (77), cement (77), bundle (77), bug (77), appropriate (77), agree (77), vacuum (76), swarm (76), reach (76), poetry (76), late (76), harmony (76), custom (76), chip (76), certainly (76), authority (76), rear (75), pumpkin (75), discharge (75), silk (74), dinner (74), crash (74), Commonwealth of Independent States (74), cheat (74), accept (74), walnut (73), transfer (73), grain (73), ceremony (73), abate (73), victim (72), 
vagina (72), type (72), prophet (72), increase (72), contact (72), constitution (72), constellation (72), budget (72), application (72), soldier (71), plot (71), painting (71), crew (71), brass (71), thunder (70), roast (70), psychology (70), communism (70), brake (70), witch (69), saddle (69), neighbour (69), vault (68), shallow (68), perfume (68), particle (68), harvest (68), electronic (68), coral (68), camp (68), amount (68), odd (67), occupation (67), how much (67), device (67), chamber (67), bust (67), association (67), airplane (67), track (66), stab (66), spice (66), pomegranate (66), crust (66), comfort (66), aeroplane (66), random (65), plough (65), no way (65), married (65), foundation (65), execution (65), channel (65), breath (65), arrest (65), studio (64), Myanmar (64), fail (64), enter (64), dish (64), actual (64), abrupt (64), wizard (63), Vladimir (63), substantial (63), splinter (63), reply (63), purple (63), paddle (63), nucleus (63), notice (63), illusion (63), how are you (63), deliver (63), dairy (63), counterfeit (63), blackmail (63), arrive (63), wardrobe (62), stuff (62), seat (62), not at all (62), deliberate (62), cylinder (62), crop (62), advertisement (62), zone (61), tower (61), source (61), sexuality (61), litter (61), gravity (61), fill (61), composition (61), business (61), bully (61), asshole (61), trial (60), sponge (60), sigh (60), resolution (60), orthography (60), mount (60), Java (60), implement (60), hood (60), half (60), habit (60), forever (60), anyway (60). Of course there can also be many definitions of be or you that don't have Swedish translations.
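
Counting the translations on a page and checking for a missing Swedish one could look roughly like this. A sketch only: it assumes the language code is the first parameter of {{t}}, {{t+}} and {{t-}}, and the function name `translation_stats` is hypothetical.

```python
import re

# Captures the language code in {{t|xx|...}}, {{t+|xx|...}}, {{t-|xx|...}}.
T_CALL = re.compile(r"\{\{t[+-]?\|([^|}]+)\|")


def translation_stats(wikitext, lang="sv"):
    """Return (number of translation calls, whether lang is among them)."""
    codes = T_CALL.findall(wikitext)
    return len(codes), lang in codes


page = (
    "{{trans-top|animal}}\n"
    "* Danish: {{t+|da|hund|c}}\n"
    "* German: {{t|de|Hund|m}}\n"
    "* Finnish: {{t-|fi|koira}}\n"
    "{{trans-bottom}}\n"
)
count, has_swedish = translation_stats(page)
```

Pages with a high `count` and `has_swedish` equal to False are the candidates listed in the diary entry.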

      September 7, 2010: Some Unix/Linux shell commands:

      To extract just one language (here: Swedish) from the XML database dump and remove the interlanguage links:
      sed 's/<text.*>/\n/;s/<\/text>/\n==End==/' enwiktionary.xml | \
         sed '/^==\s*Swedish/,/^==[^=]/!d;/^==[^=]/d;/^\[\[[a-z][-a-z]*:/d'
      
      To extract just the native language example sentences from the above (beware of the " and ' trick):
      sed '/^#:[^:]/!d;s/^#:*\s*//;s/=.*//;s/'"'''"'//g;s/'"''"'//g;s/&[/a-z]*;//g'
      
      To cut plain text into a list of words (I kept hyphen in words, but not digits; you might want to add »):
      tr ' -&(-,.-?[]|' '\n'|sed '/^$/d'
      
      To find the most frequent words:
      sort | uniq -c | sort -nr
      

      When all of the above are combined, I get a list of all words occurring in the Swedish example sentences, sorted by frequency. And so I can check that Wiktionary provides explanations for all or most of them. The Swedish example sentences constitute an 84 kbyte e-text, having 13,255 words of which 4819 are unique. Wiktionary has Swedish entries for 71.1 percent of the occurrences. This is rather low. Part of the explanation is that some text is in English, because the example sentences are incorrectly formatted and contain templates and URLs.
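
The word-splitting and counting steps can also be sketched in Python. This is an approximation, not a drop-in replacement: the character class mirrors the set given to tr (hyphen and apostrophe stay inside words, digits and most punctuation split them), and the name `word_frequencies` is hypothetical.

```python
import re
from collections import Counter

# Same separators as the tr set: space to &, ( to comma, . to ?, [ ] |
# plus the newline that tr produces.
SEPARATORS = re.compile(r"[ -&(-,.-?\[\]|\n]+")


def word_frequencies(text):
    """Split text into words and count them (like sort | uniq -c)."""
    words = [w for w in SEPARATORS.split(text) if w]  # drop empty strings
    return Counter(words)


freq = word_frequencies("Jag har en hund. Hunden, en tax, heter Karo-Kalle!")
total_words = sum(freq.values())
unique_words = len(freq)
```

`freq.most_common()` then gives the list sorted by falling frequency, as `sort -nr` does.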

      September 4, 2010: Inserting the templates l and t:

      python replace.py -family:wiktionary -lang:en -xml:enwiktionary.xml -summary:"l:sv, t:sv" -regex -recursive \
       '\[\[#Swedish\|([^\]]+)\]\]' '{{l|sv|\1}}' \
       '\[\[([^#\|\]]+)#Swedish\|[^\]]*\]\]' '{{l|sv|\1}}' \
       '(\* *Swedish:.*?)\[\[([^\]]*)\]\]' '\1{{t|sv|\2}}' \
       '(\* *Swedish:.*?){{l\|sv\|' '\1{{t|sv|' \
       '(\* *Swedish:.*?{{t[^}]*)}} {{([cfmnp](\|[cfmnp])*}})' '\1|\2'
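
To see what these five replacements do, they can be tried on a few lines with Python's re module. A sketch under assumptions: the braces are escaped here for clarity, and the repeat-until-stable loop is my approximation of what -recursive does; `apply_rules` is a hypothetical name.

```python
import re

RULES = [
    (r"\[\[#Swedish\|([^\]]+)\]\]", r"{{l|sv|\1}}"),
    (r"\[\[([^#\|\]]+)#Swedish\|[^\]]*\]\]", r"{{l|sv|\1}}"),
    (r"(\* *Swedish:.*?)\[\[([^\]]*)\]\]", r"\1{{t|sv|\2}}"),
    (r"(\* *Swedish:.*?)\{\{l\|sv\|", r"\1{{t|sv|"),
    (r"(\* *Swedish:.*?\{\{t[^}]*)\}\} \{\{([cfmnp](\|[cfmnp])*\}\})", r"\1|\2"),
]


def apply_rules(text, max_passes=20):
    """Apply all rules in order, repeating until nothing changes."""
    for _ in range(max_passes):
        new_text = text
        for pattern, replacement in RULES:
            new_text = re.sub(pattern, replacement, new_text)
        if new_text == text:
            break
        text = new_text
    return text


links = apply_rules("See [[hund#Swedish|hund]] and [[#Swedish|katt]].")
gender = apply_rules("* Swedish: [[hund]] {{c}}")
several = apply_rules("* Swedish: [[hund]], [[jycke]]")
```

The third example needs a second pass, because each pass converts only the first plain link after "* Swedish:".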
      

      August 31, 2010: The Swedish Bible of 1917 contains 769,316 words of text, using a vocabulary of 26,990 words and word forms, including some capitalized words at the beginning of sentences. Of this vocabulary, 3802 words or 14 % have Swedish entries in en.wiktionary. However, since these 14 % contain many of the most common words, they make up 74 % of the text. This number (74 %) is the definition of the dictionary's coverage of this corpus of text. If you pick a random page, line and word in the Bible, there's a 74 % chance that the word has a Swedish entry here. 74 % is a very low coverage for a dictionary, and a sign that we have a very long way to go.

      Here's how it works on the first two verses: i begynnelsen skapade gud himmel och jord. och jorden var öde och tom, och mörker var över djupet, och guds ande svävade över vattnet. (Genesis 1:1-2) Of these 24 words, 5 are "och", 2 are "var", 2 are "över". These three words alone make up 9 of the 24 words or 37.5 % of the text.
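
The same calculation in a few lines of Python, as a sketch. The names `tokenize` and `coverage` are hypothetical, and the three-word "dictionary" is just the example from the two verses.

```python
import re

VERSES = (
    "i begynnelsen skapade gud himmel och jord. "
    "och jorden var öde och tom, och mörker var över djupet, "
    "och guds ande svävade över vattnet."
)


def tokenize(text):
    """Lower-case the text and keep runs of Swedish letters as words."""
    return re.findall(r"[a-zåäö]+", text.lower())


def coverage(tokens, dictionary):
    """Fraction of word occurrences that have a dictionary entry."""
    covered = sum(1 for token in tokens if token in dictionary)
    return covered / len(tokens)


tokens = tokenize(VERSES)
share = coverage(tokens, {"och", "var", "över"})
```

With a real word list in place of the three-word set, `coverage` gives the percentages reported for each corpus.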

      Corpora:
      (1) Bible (1917)
      (2) Herr Arnes penningar
      (3) Swedish Wikipedia as of 2010-06-08
      (4) Tankar i utvandringsfrågan
      (5) KB:s underlag till en nationell strategi... (2010)
      (6) Kulturutredningen (2009)
      (7) SvD Understreckare, Sept. 1–18, 2010
      (8) Framtidens Internet by Jan Kallberg

      Corpus               (1)     (2)          (3)     (4)     (5)      (6)     (7)     (8)
      Words in corpus  769,316  23,514  111,625,635  93,078  18,607  248,282  31,608  23,414
      Unique words      26,990   3,303    3,412,039  14,516   4,017   23,050   8,815   5,086

      Date of        Swedish  Percent coverage of corpus
      database dump  entries   (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)
        2010-08-12    10,987  72.6  75.4  55.6  66.1  49.2  58.3  63.9  70.1
        2010-08-24    11,531  74.2  76.1  55.8  66.7  49.4  58.5  64.0  70.4
        2010-09-01    14,678  84.8  84.7  59.7  73.1  55.0  65.0  69.2  76.0
        2010-09-12    16,926  87.3  87.5  61.6  77.0  65.7  73.2  71.2  78.5
        2010-09-23    17,836  87.5  88.1  62.9  78.4  70.3  76.4  73.9  80.2
        2010-10-05    17,851  87.5  88.1  63.0  78.4  70.4  76.4  74.0  80.2
        2010-10-15    17,885  87.5  88.1  63.2  78.4  70.5  76.4  74.1  80.3
        2010-10-30    19,449  87.7  88.2  64.0  80.4  71.5  77.8  75.5  81.4
       *2010-12-31    22,135  88.7  89.2  65.9  84.5  77.8  83.7  79.3  85.1
      **2011-01-10    40,621  89.5  89.6  68.3  85.6  78.4  84.9  81.1  89.7
      **2011-01-23    53,421  90.0  89.8  69.4  86.6  82.8  86.3  82.1  90.5
      **2011-01-31    59,889  90.2  90.0  69.9  87.5  83.1  86.8  82.6  91.2
      **2011-02-08    78,985  91.1  90.6  71.1  89.1  84.1  88.1  83.9  92.3
      **2011-03-23    87,267  91.4  90.7  71.8  89.7  84.9  88.5  84.6  92.7

      (The Wikipedia corpus used here contains some garbage that will never be covered by the dictionary, e.g. Wikipedia user names, occasional talk pages in English, and some remaining wiki markup, so the coverage percentage will inevitably be lower. It's still interesting to have a really large corpus to study.)

      (* No database dump exists for 2010-12-31, but a preliminary dictionary was extracted.)

      (** Dictionary generated by category wget. See diary entry for January 10, 2011.)

      August 28, 2010: I think it would be helpful to know how common a word is. This can be determined by computing its rank in some large body of text, putting the most frequent word ("the" for English, "och" for Swedish) at position 1. This is what template {{rank}} does, for example able has rank 391, but I think a logarithmic scale would be more informative than a linear one. Color graphics could indicate how "hot" a word is, but with the cool and neutral black, white and light-blue appearance of Wiktionary, the colors must be restricted to a very small area:

      [Image: Spectre des couleurs.svg (a colour spectrum bar), shown six times, for rank 8, rank 64, rank 512, rank 4096, rank 32,768 and rank 262,144]
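
A logarithmic scale of this kind could be computed as below. A sketch only: base 8 is an assumption read off the sample ranks (8, 64, 512, ...), and `rank_level` is a hypothetical name, not something {{rank}} provides.

```python
import math


def rank_level(rank, base=8):
    """Map a frequency rank to a position on a logarithmic scale.

    With base 8, rank 8 maps to 1, rank 64 to 2, rank 512 to 3, and so on.
    """
    return math.log2(rank) / math.log2(base)


# Positions for the six sample ranks and for "able" (rank 391).
levels = [rank_level(r) for r in (8, 64, 512, 4096, 32768, 262144)]
able_level = rank_level(391)
```

On this scale "able" falls between the marks for rank 64 and rank 512, which is easier to take in than the raw number 391.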

      August 21, 2010: Many open issues:

      • So far, only 10,000 entries in Swedish. Redefining templates is easier now than after many more entries have been created.
      • How should templates be named? Is the -reg-/-irreg- part of the name really necessary? Can we do with fewer templates and shorter names?
      • How do we create entries for all inflected forms? Can this be automated?
      • Can conjugation/declension tables handle passive verbs? Subjunctives? All adjectives?
      • Should template parameters be standardized? Now they are different everywhere: 2=, stem=, sg-def-gen=
      • Can templates support irregular verbs, so avgå, tillstå can be based on gå, stå?
      • Can templates support prefixed and suffixed words, e.g. "gå an/gick an" smarter than today?
      • Should templates for Swedish words be standardized across languages of Wiktionary?
      • Old spelling (elf/älf/älv) can be handled, but how should we handle giva/ge, hava/ha?

      The most common headings in Swedish sections are:

       10969 Swedish        533 Derived terms          72 Compounds          37 Ordinal number
        6402 Noun           319 Adverb                 72 Abbreviation       31 Conjunction
        2618 Pronunciation  251 Usage notes            63 Cardinal number    25 Proverb
        1705 Verb           251 Antonyms               58 Conjugation        22 Verb form
        1520 Related terms  214 Alternative spellings  54 Idiom              22 Descendants
        1300 Adjective      100 Etymology 2            52 References         17 Etymology 3
        1247 Proper noun    100 Etymology 1            51 Preposition        16 Hypernyms
        1013 Etymology       96 Inflection             48 Phrase             14 Homophones
         995 See also        88 Interjection           41 Alternative forms  12 Hyponyms
         789 Synonyms        83 Pronoun                39 Suffix             11 Phrases
      

      The most common heading structures are listed below. "((" means heading level 2.

        3158 ((Swedish(Noun)))                               57 ((Swedish(Pronunciation;Noun(See also))))
         831 ((Swedish(Proper noun)))                        56 ((Swedish(Pronunciation;Noun(Derived terms))))
         660 ((Swedish(Verb)))                               47 ((Swedish(Abbreviation)))
         565 ((Swedish(Pronunciation;Noun)))                 45 ((Swedish(Pronunciation;Adjective(Related terms))))
         505 ((Swedish(Adjective)))                          43 ((Swedish(Noun;Verb)))
         290 ((Swedish(Noun(Related terms))))                42 ((Swedish(Pronunciation;Noun;Verb)))
         206 ((Swedish(Etymology;Noun)))                     41 ((Swedish(Verb(See also))))
         168 ((Swedish(Noun(Synonyms))))                     37 ((Swedish(Alternative spellings;Proper noun)))
         168 ((Swedish(Noun(See also))))                     34 ((Swedish(Pronunciation;Noun(Synonyms))))
         156 ((Swedish(Pronunciation;Verb)))                 34 ((Swedish(Pronunciation;Adverb)))
         142 ((Swedish(Pronunciation;Noun(Related terms))))  34 ((Swedish(Alternative spellings;Noun(Related terms))))
         131 ((Swedish(Pronunciation;Adjective)))            33 ((Swedish(Phrase)))
         121 ((Swedish(Verb(Related terms))))                32 ((Swedish(Adjective(See also))))
         112 ((Swedish(Etymology;Proper noun)))              29 ((Swedish(Adjective;Noun)))
         101 ((Swedish(Proper noun(Related terms))))         28 ((Swedish(Pronunciation;Verb(See also))))
          81 ((Swedish(Adjective(Related terms))))           28 ((Swedish(Etymology;Noun(Related terms))))
          73 ((Swedish(Adverb)))                             27 ((Swedish(Etymology;Verb)))
          72 ((Swedish(Pronunciation;Verb(Related terms))))  27 ((Swedish(Etymology;Adjective)))
          72 ((Swedish(Etymology;Pronunciation;Noun)))       26 ((Swedish(Verb(Synonyms))))
          62 ((Swedish(Noun(Derived terms))))                26 ((Swedish(Interjection)))
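
The notation can be generated from the headings of a section as sketched below. This is one reading of the notation that agrees with the listed examples (nesting depth equals heading level, ";" separates siblings); the function name `heading_structure` is hypothetical and this is not the script that produced the tables.

```python
import re

# A wiki heading: the same number of "=" (2 to 6) on both sides of the name.
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def heading_structure(wikitext):
    """Render the heading outline of wikitext in the bracket notation."""
    parts = []
    depth = 0
    for match in HEADING.finditer(wikitext):
        level = len(match.group(1))
        if level > depth:
            parts.append("(" * (level - depth))   # going deeper
        elif level == depth:
            parts.append(";")                     # sibling heading
        else:
            parts.append(")" * (depth - level))   # back up
        parts.append(match.group(2))
        depth = level
    parts.append(")" * depth)                     # close what is still open
    return "".join(parts)


entry = (
    "==Swedish==\n"
    "===Etymology===\n"
    "{{compound|a|b|lang=sv}}\n"
    "===Noun===\n"
    "# definition\n"
    "====Declension====\n"
    "===References===\n"
)
structure = heading_structure(entry)
```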
      

      Starting to introduce ====Declension==== and ====Conjugation==== on a big scale will change this pattern.

      It seems I have a bot command that works:

      python replace.py -family:wiktionary -lang:en -cat:'Swedish verbs' -summary:'Conjugation heading' -regex -dotall \
         '(===Verb===\s*({{infl[^\n]*}})?\s*)({{sv-verb-(irreg|reg-)[^\n]*}}\s*)(([^-=\[][^\n]*\n\s*)*)'    '\1\5====Conjugation====\n\3'    \
         '(====Verb====\s*({{infl[^\n]*}})?\s*)({{sv-verb-(irreg|reg-)[^\n]*}}\s*)(([^-=\[][^\n]*\n\s*)*)'  '\1\5=====Conjugation=====\n\3'
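
The effect of the first of these two replacements can be checked on a small sample with Python's re module. A sketch under assumptions: the braces are escaped for clarity, the sample entry is made up, and re.DOTALL stands in for the -dotall option.

```python
import re

PATTERN = re.compile(
    r"(===Verb===\s*(\{\{infl[^\n]*\}\})?\s*)"      # \1: heading and infl line
    r"(\{\{sv-verb-(irreg|reg-)[^\n]*\}\}\s*)"      # \3: conjugation table call
    r"(([^-=\[][^\n]*\n\s*)*)",                     # \5: the definition lines
    re.DOTALL,
)
# Move the table call below the definitions, under a new heading.
REPLACEMENT = r"\1\5====Conjugation====\n\3"

SAMPLE = (
    "==Swedish==\n"
    "===Verb===\n"
    "{{infl|sv|verb}}\n"
    "{{sv-verb-reg-ar|tal}}\n"
    "# to [[speak]]\n"
    "\n"
    "===References===\n"
)

result = PATTERN.sub(REPLACEMENT, SAMPLE)
```

In `result` the definition line comes directly after the infl line, and the conjugation table call sits under ====Conjugation==== just before ===References===.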
      


      August 20, 2010: In the database dump of 2010-08-12, there were 6341 calls to templates whose names begin with sv-. The kinds are: conj = conjugation table for verbs, decl = declension table for adjectives and nouns, form = referring from an inflected form to the main entry, infl = one-liner inflection pattern.

      Calls Template Kind Comment
      813 {{sv-noun-reg-er}} decl Since painted blue
      707 {{sv-noun-reg-ar}} decl Since painted blue
      488 {{sv-verb-reg-ar}} conj Since painted blue
      433 {{sv-noun-reg-or}} decl Since painted blue
      418 {{sv-noun-n-zero}} decl Since painted blue
      343 {{sv-noun}} decl Since painted blue and renamed {{sv-decl-noun}}. Contains table layout and colours, serving as the base for other noun templates.
      257 {{sv-adj-reg}} decl
      254 {{sv-noun-unc-irreg-c}} decl Since painted blue
      250 {{sv-noun-irreg-c}} decl Since painted blue
      218 {{sv-verb-irreg}} conj Since painted blue. Contains table layout and colours, serving as the base for other verb templates.
      191 {{sv-adv}} infl
      153 {{sv-noun-reg-r-c}} decl Since painted blue
      137 {{sv-verb-reg}} infl
      132 {{sv-noun-unc-irreg-n}} decl Since painted blue
      108 {{sv-adj-abs}} decl
      105 {{sv-adj-peri}} decl
      102 {{sv-verb-reg-er}} conj Since painted blue
      101 {{sv-noun-c-zero}} decl Since painted blue
      96 {{sv-noun-unc-n}} decl A redirect to {{sv-noun-unc-irreg-n}}
      92 {{sv-noun-irreg-n}} decl Since painted blue
      86 {{sv-adj}} infl
      84 {{sv-noun-unc-c}} decl A redirect to {{sv-noun-unc-irreg-c}}
      78 {{sv-verb-form-pre}} form
      72 {{sv-noun-reg-n}} decl Since painted blue
      60 {{sv-noun-form-indef-pl}} form
      56 {{sv-verb-form-past}} form
      54 {{sv-noun-form-def}} form
      33 {{sv-verb-irr}} infl
      31 {{sv-verb-form-sup}} form
      31 {{sv-adj-form-abs-pl}} form
      30 {{sv-adj-form-abs-indef-n}} form
      27 {{sv-verb-form-imp}} form
      26 {{sv-adj-form-abs-def}} form
      19 {{sv-noun-form-indef-gen}} form
      18 {{sv-adj-pastpart}} decl
      17 {{sv-noun-reg-r-n}} decl Since painted blue
      16 {{sv-noun-form-def-pl}} form
      15 {{sv-verb-form-pastpart}} form
      14 {{sv-verb-form-prepart}} form
      14 {{sv-verb-ar}} infl A redirect to {{sv-verb-reg}}
      13 {{sv-noun-form-indef-gen-pl}} form
      13 {{sv-noun-ar}} decl A redirect to {{sv-noun-reg-ar}}
      13 {{sv-adj-form-abs-def-m}} form
      11 {{sv-adj-prepart}} decl
      11 {{sv-adj-form-comp}} form
      10 {{sv-noun-or}} decl A redirect to {{sv-noun-reg-or}}
      10 {{sv-noun-form-def-gen}} form
      9 {{sv-noun-form-def-gen-pl}} form
      9 {{sv-adj-form-sup-pred}} form
      8 {{sv-adv-form-sup}} form
      7 {{sv-noun-un}} decl A redirect to {{sv-noun-unc-irreg-c}}
      7 {{sv-adj-form-sup-attr-pl}} form
      6 {{sv-adj-form-sup-attr-m}} form
      5 {{sv-adv-form-comp}} form
      5 {{sv-adj-form-sup-attr}} form
      4 {{sv-noun-n}} decl A redirect to {{sv-noun-reg-n}}
      3 {{sv-verb-form-pre-pass}} form
      3 {{sv-verb}} Erroneous call, since replaced.
      2 {{sv-verb-form-pres-pass}} form
      2 {{sv-verb-form-inf-pass}} form
      2 {{sv-adj-irreg}} decl
      2 {{sv-adj-form-sup-pred-pl}} form
      1 {{sv-noun-reg-}} Mentioned in {{sv-new-noun}}
      1 {{sv-noun-proper-def-irreg}} Listed on Wiktionary:Swedish inflection templates
      1 {{sv-noun-pl-irreg}} Listed on Wiktionary:Swedish inflection templates
      1 {{sv-noun-form-adj}} form
      1 {{sv-adj-small}} decl Called from {{sv-adj-decl}}, which is never used.
      1 {{sv-adj-form-comp-pl}} form
      1 {{sv-adj-abs-irreg}} Listed on Wiktionary:Swedish inflection templates
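
      The note does not say how the calls were counted. One way to produce a tally like the one above is sketched below, assuming the page texts have already been extracted from the dump as plain strings. The function name count_sv_template_calls and the sample pages are hypothetical.

```python
import re
from collections import Counter

# Matches the template name in a call such as {{sv-noun-reg-er|...}}.
# The name ends at the first "|", brace, or newline.
SV_CALL = re.compile(r'\{\{(sv-[^|{}\n]*)')


def count_sv_template_calls(pages):
    """Return a Counter mapping each sv- template name to its number of calls."""
    counts = Counter()
    for text in pages:
        for name in SV_CALL.findall(text):
            counts[name.strip()] += 1
    return counts


# Hypothetical page texts standing in for the dump.
pages = [
    "===Noun===\n{{sv-noun-reg-er|bil}}\n",
    "===Noun===\n{{sv-noun-reg-er|park}}\n===Adverb===\n{{sv-adv}}\n",
]
counts = count_sv_template_calls(pages)
for name, calls in counts.most_common():
    print(calls, "{{" + name + "}}")
```

      Redirects are counted under the name that appears in the page text, which is consistent with the table listing redirect names such as {{sv-noun-ar}} on rows of their own.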

      August 19, 2010: There are currently 81 templates named sv-... (too many for my taste). Their names are built from the following components:

      Number of templates having this component in their name; Name component; Meaning
      6 abs Absolute form of an adjective
      27 adj Adjective
      4 adv Adverb
      4 ar -ar plural declension of noun
      3 attr Superlative attribute form of an adjective
      5 c Common gender of noun (= utrum, n-gender)
      3 comp Comparative form of an adjective
      1 custom sv-verb-custom is a base/meta template
      2 decl Declension of nouns and adjectives
      6 def Definite form of nouns/adjectives
      2 er -er plural declension of noun
      30 form Inflected forms referring to the main entry
      4 gen Genitive form
      1 imp Imperative form of a verb
      4 indef Indefinite form of nouns/adjectives
      1 inf Infinitive passive form of a verb
      1 irr Irregular inflection
      8 irreg Irregular inflection
      2 m Masculine form of adjectives
      1 mermest Redirect shorthand for "peri"
      8 n Neuter gender of noun (= neutrum, t-gender)
      5 new Called from "nogomatch"
      1 nogomatch "You can create an entry..."
      30 noun Noun
      2 or -or plural declension of noun
      3 pass Passive form of a verb
      1 past Past tense form of a verb
      2 pastpart Past participle form of a verb
      2 peri Adjective comparison with mer/mest
      8 pl Plural
      2 pre Present tense form of a verb
      2 pred Superlative predicative form of an adjective
      2 prepart Present participle form of an adjective
      1 pres Present passive form of a verb
      2 r -r plural declension of noun
      11 reg Regular inflection
      1 small Smaller table layout, not used
      7 sup Superlative form of an adjective or adverb; supine form of a verb
      81 sv Swedish
      2 un Redirect synonym for abs or unc
      4 unc Uncountable noun (no plural forms)
      18 verb Verb
      2 zero Declension of nouns where plural = singular
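
      A component tally like this can be reproduced from a list of template names. The sketch below assumes that names are split on both "-" and "+" (the latter because of names like sv-adj-form-abs-def+pl) and that each component is counted once per template. The function name count_name_components and the short name list are illustrative only.

```python
import re
from collections import Counter


def count_name_components(template_names):
    """Count, for each name component, how many templates contain it."""
    counts = Counter()
    for name in template_names:
        # A set counts each component once per template; empty parts
        # (from a trailing hyphen, as in "sv-noun-reg-") are dropped.
        components = {part for part in re.split(r'[-+]', name) if part}
        counts.update(components)
    return counts


# A few names from the lists above, used as sample input.
names = [
    "sv-noun-reg-ar",
    "sv-verb-reg-ar",
    "sv-adj-form-abs-def+pl",
    "sv-noun-reg-",
]
component_counts = count_name_components(names)
for component in sorted(component_counts):
    print(component_counts[component], component)
```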

      Last modified on 3 May 2013, at 23:21