User talk:Metaknowledge/Latin problems
Code to generate these lists
This Perl program:
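#!/usr/bin/perl
use warnings;
use strict;
use utf8;                            # check 1 below uses non-ASCII letters in this source
use open qw(:std :encoding(UTF-8));  # decode the dump and encode the output as UTF-8

# One output file per problem list: Latin-problems-1.txt .. Latin-problems-6.txt
my @out;
for my $n (1 .. 6)
{
  open $out[$n], '>', "Latin-problems-$n.txt"
    or die "Cannot open Latin-problems-$n.txt: $!";
}

$/ = "</page>\n";    # each read returns one <page> element of the dump
while(<>)
{
  next unless m/<ns>0<\/ns>/;                  # main namespace only
  next unless m/Latin/;                        # cheap pre-filter
  next if m/<text xml:space="preserve" \/>/;   # page with empty text
  die "Page without <title>:\n$_" unless m/<title>([^<]+)<\/title>/;
  my $title = $1;
  next unless $title =~ m/ibus$/ or $title =~ m/tote$/;
  die "No <text> for $title" unless m/<text xml:space="preserve">([^<]+)<\/text>/;
  $_ = "$1\n";
  next unless m/^== *Latin *== *$/m;
  # delete everything up to and including the ==Latin== header ...
  die "Cannot isolate the Latin section of $title"
    unless s/^(?:.*\n)?== *Latin *== *(?:\n|$)//s;
  # ... and everything from the next level-2 header onward
  s/\n==(?!=) *.*$/\n/s;
  # now have just the text of the Latin section
  my $line = "* [[$title#Latin|$title]]\n";
  # 1: title contains a macron or breve
  print {$out[1]} $line
    if $title =~ m/[āēīōūĀĒĪŌŪĭ]/;
  # 2: {{inflection of}} whose fourth positional parameter is "c"
  print {$out[2]} $line
    if m/\{\{inflection of[|][^|]*[|][^|]*[|][^|]*[|]c[|]/;
  # 3: m|p {{inflection of}} whose second parameter ends in -us, plus {{la-part-form}} for the same stem + -orum
  print {$out[3]} $line
    if m/\{\{inflection of[|][^|]*[|]([^|]*)us[|][^|]*[|]m[|]p[|]lang=la}}/
    and m/\{\{la-part-form[|]\Q${1}\Eorum}}/;
  # 4: -ibus title with an {{inflection of}} tagged f|s
  print {$out[4]} $line
    if $title =~ m/ibus$/
    and m/\{\{inflection of[|][^|]*[|][^|]*[|][^|]*[|]f[|]s[|]lang=la}}/;
  # 5: -ibus title whose Latin section never mentions "abl"
  print {$out[5]} $line
    if $title =~ m/ibus$/ and ! m/abl/;
  # 6: -tote title tagged as 2nd-person plural present active imperative
  print {$out[6]} $line
    if $title =~ m/tote$/
    and m/\{\{conjugation[ ]of[|][^|]*[|][^|]*[|]2[|]p[|]pres[|]act[|]imp
          [|]lang=la}}/x;
}
close $_ or die "Cannot close an output file: $!" for @out[1 .. 6];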
__END__
takes the uncompressed XML dump on standard input (or from files named on the command line) and writes the six Latin problem lists to Latin-problems-1.txt through Latin-problems-6.txt. In Bash, if bzip2 is installed, it can be run like this:
time bzip2 -d < enwiktionary-20120803-pages-articles.xml.bz2 | perl Latin-problems.pl
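The page-splitting and section-isolating steps can also be tried on a toy input. The following is a minimal sketch, not part of the program above: it assumes a made-up two-page dump fragment far smaller than real dump pages, and `latin_sections` and `is_present_imperative_tote` are hypothetical helper names. It reuses the record separator `</page>\n` and the two substitutions from the program, then applies the program's sixth check to the isolated section.

```perl
#!/usr/bin/perl
# Sketch: isolate the ==Latin== section of each main-namespace page in a
# tiny, made-up dump fragment, using the same $/ trick and substitutions
# as the program above, then apply its check 6 to the result.
use strict;
use warnings;

# Hypothetical sample input; a real dump page has many more elements.
my $dump = <<'XML';
<page>
  <title>amatote</title>
  <ns>0</ns>
  <text xml:space="preserve">==English==
Not Latin.

==Latin==
===Verb===
{{conjugation of|amo||2|p|pres|act|imp|lang=la}}

==Spanish==
Also not Latin.</text>
</page>
<page>
  <title>Talk:foo</title>
  <ns>1</ns>
</page>
XML

# Returns a hash ref mapping each main-namespace title to its Latin section.
sub latin_sections {
    my ($xml) = @_;
    my %section;
    open my $fh, '<', \$xml or die "Cannot open in-memory dump: $!";
    local $/ = "</page>\n";    # each read returns one <page> element
    while (my $page = <$fh>) {
        next unless $page =~ m/<ns>0<\/ns>/;
        next unless $page =~ m/<title>([^<]+)<\/title>/;
        my $title = $1;
        next unless $page =~ m/<text xml:space="preserve">([^<]+)<\/text>/;
        my $text = "$1\n";
        next unless $text =~ m/^== *Latin *== *$/m;
        $text =~ s/^(?:.*\n)?== *Latin *== *(?:\n|$)//s;  # drop all through ==Latin==
        $text =~ s/\n==(?!=) *.*$/\n/s;                   # drop the next L2 header onward
        $section{$title} = $text;
    }
    close $fh;
    return \%section;
}

# Check 6 of the program: a -tote title tagged 2|p|pres|act|imp. Returns 1 or 0.
sub is_present_imperative_tote {
    my ($title, $section) = @_;
    my $is_tote = $title =~ m/tote$/;
    my $tagged  = $section =~ m/\{\{conjugation of[|][^|]*[|][^|]*[|]2[|]p[|]pres[|]act[|]imp[|]lang=la}}/;
    return $is_tote && $tagged ? 1 : 0;
}

my $sections = latin_sections($dump);
for my $title (sort keys %$sections) {
    printf "%s: flagged=%d\n%s", $title,
        is_present_imperative_tote($title, $sections->{$title}), $sections->{$title};
}
```

Reading from an in-memory filehandle lets the same `while (<$fh>)` loop be exercised without a multi-gigabyte dump; the Talk page is skipped by the namespace test, and only the text between `==Latin==` and `==Spanish==` survives the two substitutions.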
--Ruakh TALK 00:49, 5 August 2012 (UTC)
- Wow, thank you! I've got my work cut out for me! --Μετάknowledge discuss/deeds 03:54, 5 August 2012 (UTC)