User talk:Metaknowledge/Latin problems

Code to generate these lists

This Perl program:

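#!perl -w

use warnings;
use strict;
use utf8;  # this script is UTF-8, so the macron class below holds characters, not bytes

# One output file per problem list, written as UTF-8.
open OUT_1, ">:encoding(UTF-8)", "Latin-problems-1.txt" or die "Latin-problems-1.txt: $!";
open OUT_2, ">:encoding(UTF-8)", "Latin-problems-2.txt" or die "Latin-problems-2.txt: $!";
open OUT_3, ">:encoding(UTF-8)", "Latin-problems-3.txt" or die "Latin-problems-3.txt: $!";
open OUT_4, ">:encoding(UTF-8)", "Latin-problems-4.txt" or die "Latin-problems-4.txt: $!";
open OUT_5, ">:encoding(UTF-8)", "Latin-problems-5.txt" or die "Latin-problems-5.txt: $!";
open OUT_6, ">:encoding(UTF-8)", "Latin-problems-6.txt" or die "Latin-problems-6.txt: $!";

# Decode the dump as UTF-8 and read it one <page> element at a time.
binmode STDIN, ":encoding(UTF-8)";
binmode STDERR, ":encoding(UTF-8)";
$/ = "</page>\n";

while(<STDIN>)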
{
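  # Only main-namespace pages that mention Latin and have non-empty text.
  next unless m/<ns>0<\/ns>/;
  next unless m/Latin/;
  next if m/<text xml:space="preserve" \/>/;
  die "no <title> found" unless m/<title>([^<]+)<\/title>/;
  my $title = $1;
  # Only titles ending in -ibus or -tote are checked.
  next unless $title =~ m/ibus$/ or $title =~ m/tote$/;
  die "no <text> found for $title"
    unless m/<text xml:space="preserve">([^<]+)<\/text>/;
  $_ = "$1\n";
  next unless m/^== *Latin *== *$/m;
  # Strip everything up to and including the ==Latin== heading ...
  die "cannot isolate the Latin section of $title"
    unless s/^(?:.*\n)?== *Latin *== *(?:\n|$)//s;
  # ... and everything from the next level-2 heading onwards.
  s/\n==(?!=) *.*$/\n/s;
  # now have just the text of the Latin section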
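  # List 1: title contains a macron vowel or i-breve.
  print OUT_1 "* [[$title#Latin|$title]]\n"
    if $title =~ m/[āēīōūĀĒĪŌŪĭ]/;
  # List 2: an {{inflection of}} whose fourth parameter is "c".
  print OUT_2 "* [[$title#Latin|$title]]\n"
    if m/\{\{inflection of[|][^|]*[|][^|]*[|][^|]*[|]c[|]/;
  # List 3: a masculine-plural {{inflection of}} a form in -us, alongside
  # a {{la-part-form}} in -orum built on the same stem.
  print OUT_3 "* [[$title#Latin|$title]]\n"
    if     m/\{\{inflection of[|][^|]*[|]([^|]*)us[|][^|]*[|]m[|]p[|]lang=la}}/
       and m/\{\{la-part-form[|]\Q${1}\Eorum}}/;
  # List 4: title in -ibus tagged as feminine singular.
  print OUT_4 "* [[$title#Latin|$title]]\n"
    if     $title =~ m/ibus$/
       and m/\{\{inflection of[|][^|]*[|][^|]*[|][^|]*[|]f[|]s[|]lang=la}}/;
  # List 5: title in -ibus whose Latin section never mentions "abl".
  print OUT_5 "* [[$title#Latin|$title]]\n"
    if $title =~ m/ibus$/ and ! m/abl/;
  # List 6: title in -tote tagged as second-person plural present active imperative.
  print OUT_6 "* [[$title#Latin|$title]]\n"
    if     $title =~ m/tote$/
       and m/\{\{conjugation[ ]of[|][^|]*[|][^|]*[|]2[|]p[|]pres[|]act[|]imp
                               [|]lang=la}}/x;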
}

__END__

takes the uncompressed XML dump on standard input and writes the Latin problem lists to six files named Latin-problems-1.txt through Latin-problems-6.txt. In Bash, if bzip2 is installed and the program is saved as Latin-problems.pl, it can be run like this:

time bzip2 -d < enwiktionary-20120803-pages-articles.xml.bz2 | perl Latin-problems.pl
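
The section-extraction step (the two substitutions that cut a page down to its ==Latin== text) can be tried on its own, without the full dump. The following is only a minimal sketch: the helper name latin_section and the sample wikitext are made up for illustration and are not part of the program above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the program above):
# returns the body of the ==Latin== section of a page's wikitext,
# or undef if the page has no such section.
sub latin_section {
    my ($text) = @_;
    $text .= "\n";
    return unless $text =~ m/^== *Latin *== *$/m;
    # Drop everything up to and including the ==Latin== heading line.
    $text =~ s/^(?:.*\n)?== *Latin *== *(?:\n|$)//s or return;
    # Drop everything from the next level-2 heading onwards;
    # level-3 headings such as ===Noun=== are kept.
    $text =~ s/\n==(?!=) *.*$/\n/s;
    return $text;
}

# Made-up sample wikitext, for illustration only.
my $sample = join "\n",
    '==English==',
    'An English entry.',
    '',
    '==Latin==',
    '===Noun===',
    '{{la-part-form|amatorum}}',
    '',
    '==Spanish==',
    'A Spanish entry.';

print latin_section($sample);
```

Only the ===Noun=== heading and the template line should survive; the English and Spanish sections are cut away, which is what lets the later checks look at Latin text alone.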

Ruakh (talk) 00:49, 5 August 2012 (UTC)

Wow, thank you! I've got my work cut out for me! --Μετάknowledge (discuss/deeds) 03:54, 5 August 2012 (UTC)