User:Flyax/New pages by a user

The following two scripts produce a list of all pages created by a given user. The first script fetches the user's contributions and produces a list of every entry that user has edited. The second script requests the first revision of each of those entries, reads its author, and adds the headword to the final list if that author is the user we are interested in.
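
The title extraction in the first step can be tried offline. This is a minimal sketch, not part of the scripts: the helper name extract_titles and the sample response are made up for illustration, and GNU sed is assumed (for the \n in the replacement).

```shell
#!/bin/bash
# Sketch: pull page titles out of a usercontribs-style XML response.
# extract_titles is a hypothetical helper; it reads the XML on stdin.
extract_titles() {
  sed -e 's/>/>\n/g' |                  # put each tag on its own line (GNU sed)
    grep 'title="' |                    # keep only tags that carry a title
    awk -F 'title="' '{ print $2 }' |   # text after title="
    awk -F '"' '{ print $1 }'           # text before the closing quote
}

# Made-up sample response, for illustration only
sample='<api><query><usercontribs><item user="Flyax" title="foo bar" /><item user="Flyax" title="baz" /><item user="Flyax" title="foo bar" /></usercontribs></query></api>'

# An entry edited twice appears twice, so duplicates are removed afterwards
printf '%s\n' "$sample" | extract_titles | sort | uniq
```

Each edit contributes one line, which is why the second script sorts and de-duplicates the list before looking up the creators.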

How to run

Be patient: the first script pauses five seconds between requests, and the second sends one request per title.
You need a computer running Linux.
The first script creates a temporary sub-directory, "temp_npbu"; the second writes the final list there as "pagesby$.txt", where $ is the username.
If the user has moved any pages, the redirects created by those moves will also appear in the final list.
For example:

./titlesbyuser.sh Flyax
./newbyuser.sh Flyax

The final list will be in "pagesbyFlyax.txt" in the "temp_npbu" sub-directory.
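
The redirect caveat above could be handled by one more check on each title in the final list. This is a minimal sketch, assuming that the API's prop=info query marks a redirect page with an empty redirect attribute on its page element. The helper name is_redirect and both sample responses are made up for illustration, and no request is sent.

```shell
#!/bin/bash
# Sketch: decide whether a page is a redirect from a prop=info style response.
# is_redirect is a hypothetical helper; it reads the XML on stdin.
is_redirect() {
  grep -q 'redirect=""'
}

# Made-up sample responses, for illustration only
moved='<api><query><pages><page pageid="1" ns="0" title="old name" redirect="" /></pages></query></api>'
entry='<api><query><pages><page pageid="2" ns="0" title="new name" /></pages></query></api>'

for response in "$moved" "$entry"; do
  if printf '%s\n' "$response" | is_redirect; then
    echo "redirect: skip"
  else
    echo "regular page: keep"
  fi
done
```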

titlesbyuser.sh

#!/bin/bash

usage() {
  echo "Usage: $0 username"
  echo "This script gets all titles that have been modified "
  echo "by a specific user on en.wiktionary.org."
  echo
  echo "For example:"
  echo "$0 Flubot"
  echo
  echo "The list of titles will be created in the temporary directory"
  echo "temp_npbu/titlesFlubot.txt"
  exit 1
}

if [ -z "$1"  ]; then
  usage
fi

user="$1"
r=0
tmp="./temp_npbu"

mkdir -p "$tmp"

while [ 1 ]; do
  changes="$tmp/temp$user.xml"
  changes1="$tmp/titles$user.xml"
  titles1="$tmp/titles$user.txt"
  # Fetch into a variable first: after a pipeline, $? would hold sed's exit status, not curl's
  if [ $r == 0 ]; then
      raw=`curl --retry 10 -f "http://en.wiktionary.org/w/api.php?action=query&list=usercontribs&format=xml&uclimit=500&ucuser=$user"`
  else
      raw=`curl --retry 10 -f "http://en.wiktionary.org/w/api.php?action=query&list=usercontribs&format=xml&uclimit=500&ucstart=$startdate&ucuser=$user"`
  fi
  rc=$?

  if [ $rc -ne 0 ]; then
      echo "Error $rc from curl, unable to get user's contributions, bailing"
      exit 1
  fi
  # put each tag on its own line
  printf '%s\n' "$raw" | sed -e 's/>/>\n/g' > "$changes"
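# getting titles
# (awk reads the file directly; an unquoted "echo $tt" would collapse repeated
# spaces and expand wildcard characters in titles)
# Note: XML entities such as &amp; in titles are not decoded here.
  grep 'title="' "$changes" > "$changes1"
  awk -F 'title="' '{ print $2 }' "$changes1" | awk -F '"' '{ print $1 }' >> "$titles1"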

# getting timestamp
# <usercontribs ucstart=
  startdate=`grep "<usercontribs ucstart=" $changes | awk -F 'ucstart="' '{ print $2 }' | awk -F '"' '{ print $1 }'`

  # if there is no timestamp then ... end
  if [[ -z "$startdate" ]]; then
    break
  fi

  let r=$r+1
  sleep 5
done
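
The stop condition of the loop above can be checked without any network access. This is a minimal sketch: the helper name next_start and the two sample responses are made up, modelled on the tag the script greps for. Newer API versions may report continuation in a different form, in which case the pattern would need adjusting.

```shell
#!/bin/bash
# Sketch: read the continuation timestamp that titlesbyuser.sh looks for.
# next_start is a hypothetical helper; it reads the XML on stdin and
# prints nothing when there is no further batch.
next_start() {
  sed -e 's/>/>\n/g' |
    grep '<usercontribs ucstart=' |
    awk -F 'ucstart="' '{ print $2 }' |
    awk -F '"' '{ print $1 }'
}

# Made-up sample responses, for illustration only
more='<api><query-continue><usercontribs ucstart="2010-05-01T12:00:00Z" /></query-continue><query><usercontribs><item title="foo" /></usercontribs></query></api>'
last='<api><query><usercontribs><item title="foo" /></usercontribs></query></api>'

for response in "$more" "$last"; do
  startdate=$(printf '%s\n' "$response" | next_start)
  if [ -z "$startdate" ]; then
    echo "no continuation: stop"
  else
    echo "continue from $startdate"
  fi
done
```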

newbyuser.sh

#!/bin/bash
usage() {
  echo "Usage: $0 username"
  echo "This script gets all titles that have been created "
  echo "by a specific user on en.wiktionary.org."
  echo
  echo "For example:"
  echo "$0 Flubot"
  echo
  echo "The list of titles will be created in the temporary directory"
  echo "temp_npbu/pagesbyFlubot.txt"
  exit 1
}

if [ -z "$1"  ]; then
  usage
fi


user="$1"
user1=`echo $user | awk '{print toupper($0)}'`
tmp="./temp_npbu"
titles1="$tmp/titles$user.txt"
titles2="$tmp/titles$user.list"

# sort the titles and drop duplicates
sort -u "$titles1" > "$titles2"

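# Read the list directly, so that "exit" below ends the script itself
# rather than a pipeline subshell
while IFS= read -r title; do
  # skip blank lines
  [ -z "$title" ] && continue
  # Titles are not URL-encoded; titles containing &, ? or + will not be looked up correctly
  title0=`echo "$title" | sed -e "s/ /_/g"`
  # ask for the oldest revision only (rvdir=newer, rvlimit=1)
  info=`curl --retry 10 -f "http://en.wiktionary.org/w/api.php?action=query&prop=revisions&format=xml&titles=$title0&rvlimit=1&rvprop=user&rvdir=newer"`
  rc=$?
  if [ $rc -ne 0 ]; then
      echo "Error $rc from curl, bailing"
      exit 1
  fi
  uname=`echo "$info" | awk -F 'user="' '{ print $2 }' | awk -F '"' '{ print $1 }'`
  uname1=`echo "$uname" | awk '{print toupper($0)}'`
  echo
  echo "$title by $uname"
  echo

  # case-insensitive comparison of the creator with the requested user
  if [ "$uname1" == "$user1" ]; then
    echo "$title" >> "$tmp/pagesby$user.txt"
  fi
done < "$titles2"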
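
The creator check in the loop above can also be tried on its own. This is a minimal sketch: the helper names first_author and same_user and the sample response are made up for illustration, and no request is sent.

```shell
#!/bin/bash
# Sketch: read the author of the first revision and compare user names
# without regard to case, as newbyuser.sh does.

# first_author (hypothetical helper): reads a single-line prop=revisions
# style response on stdin and prints the value of the first user="..." attribute.
first_author() {
  awk -F 'user="' '{ print $2 }' | awk -F '"' '{ print $1 }'
}

# same_user (hypothetical helper): succeeds if both names match, ignoring case.
same_user() {
  local a b
  a=$(printf '%s\n' "$1" | awk '{ print toupper($0) }')
  b=$(printf '%s\n' "$2" | awk '{ print toupper($0) }')
  [ "$a" = "$b" ]
}

# Made-up sample response, for illustration only
sample='<api><query><pages><page pageid="1" ns="0" title="foo"><revisions><rev user="Flyax" /></revisions></page></pages></query></api>'

creator=$(printf '%s\n' "$sample" | first_author)
if same_user "$creator" "flyax"; then
  echo "foo was created by $creator"
fi
```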