Migration Script

How to run

On the machine running fedora, set the FEDORA_HOME environment variable and extract the list of current articles:

ARTICLES=$("$FEDORA_HOME"/client/bin/fedora-find localhost 9090 pid '?' http | \
  grep pid | grep 'pone.[0-9]*$' | awk '{print $2}' | \
  sed 's/doi:10.1371%2F\(journal.pone.*\)/info:doi\/10.1371\/\1/')
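
To see what the sed step does, here is the same rewrite applied to a single pid. The sample pid is made up for illustration, and pid_to_doi is just a name we picked for this sketch; real pids come from fedora-find.

```shell
# Convert a fedora pid (URL-encoded DOI) into the info:doi form
# that the migration script takes with -a.
pid_to_doi() {
  echo "$1" | sed 's/doi:10.1371%2F\(journal.pone.*\)/info:doi\/10.1371\/\1/'
}

# Hypothetical sample pid, not taken from a real repository.
pid_to_doi 'doi:10.1371%2Fjournal.pone.0000001'
# prints info:doi/10.1371/journal.pone.0000001
```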

On a machine with a properly configured 0.8 /etc/topaz/plosone.xml, run the migration script as follows:

for doi in $ARTICLES; do echo "Migrating $doi"; /usr/local/topaz/bin/migration -a "$doi"; done

To configure /etc/topaz/plosone.xml properly, set network.hosts.fedora and network.hosts.mulgara. Be sure that ports 9090 (fedora) and 9091 (mulgara) are not blocked by the firewall.
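
A quick way to confirm the ports are reachable from the migration machine is sketched below. It assumes bash with /dev/tcp support and the coreutils timeout command are available. The host variables are placeholders of ours; substitute whatever you put in plosone.xml.

```shell
# Return success if a TCP connection to host ($1) and port ($2) opens
# within 3 seconds. Uses bash's /dev/tcp redirection (an assumption
# about the local bash build).
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Placeholder host names; use the values configured in plosone.xml.
FEDORA_HOST=${FEDORA_HOST:-localhost}
MULGARA_HOST=${MULGARA_HOST:-localhost}

port_open "$FEDORA_HOST" 9090 || echo "Cannot reach fedora on $FEDORA_HOST:9090"
port_open "$MULGARA_HOST" 9091 || echo "Cannot reach mulgara on $MULGARA_HOST:9091"
```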

NOTE: You can do a test migration by using -t (before -a). This is probably safe to do against a live 0.7 database. YMMV :)
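
The loop with and without -t might be wrapped up roughly like this. The -t and -a flags are the ones described above; the migrate_all function and the MIGRATION override are our own additions so the loop can be tried against a stand-in command. Nothing here assumes anything about the script's exit codes.

```shell
# Path to the migration script; overridable for dry experiments
# (the override is an addition for this sketch).
MIGRATION=${MIGRATION:-/usr/local/topaz/bin/migration}

# Run the migration for every DOI given as an argument.
# $1 is "test" (adds -t before -a) or "real"; the rest are DOIs.
migrate_all() {
  mode=$1
  shift
  for doi in "$@"; do
    echo "Migrating $doi ($mode)"
    if [ "$mode" = test ]; then
      "$MIGRATION" -t -a "$doi"
    else
      "$MIGRATION" -a "$doi"
    fi
  done
}
```

Usage would be "migrate_all test $ARTICLES" for the trial pass, then "migrate_all real $ARTICLES" once tools.log looks clean.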

It isn't clear how performance may vary between different environments, but it should be possible to migrate 100 articles/hour.
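
For planning, the arithmetic can be sketched as follows. The 100 articles/hour default is only the rough figure quoted above, and the helper names are ours.

```shell
# Estimate migration time in whole hours, rounding up.
# $1 is the article count, $2 the rate in articles/hour (default 100).
estimate_hours() {
  count=$1
  rate=${2:-100}
  echo $(( ($count + $rate - 1) / $rate ))
}

# Count the whitespace-separated DOIs passed as arguments.
count_words() {
  echo $#
}

n=$(count_words $ARTICLES)
echo "About $(estimate_hours "$n") hour(s) for $n articles"
```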

When the migration has finished, look for warnings in /var/log/topaz/tools.log. Because the migration script is restarted for every article, the log gets noisy. To reduce the noise, you may want to set the org.topazproject.xml.transform.EntityResolvingSource and org.plos.configuration.ConfigurationStore loggers to WARN in /usr/local/topaz/classes/log4j.xml:

  <logger name="org.topazproject.xml.transform.EntityResolvingSource">
    <level value="warn"/>
  </logger>
  <logger name="org.plos.configuration.ConfigurationStore">
    <level value="warn"/>
  </logger>
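
Once the run is finished, something like the following can pull the interesting lines out of tools.log. It assumes the log4j layout writes the level name (WARN, ERROR) into each line, which we have not checked against the actual layout. Including ERROR is our addition.

```shell
# Log written by the migration tools (path from the text above).
LOG=${LOG:-/var/log/topaz/tools.log}

# Print lines that mention WARN or ERROR, prefixed with line numbers.
show_problems() {
  grep -n -E 'WARN|ERROR' "$1"
}

if [ -r "$LOG" ]; then
  show_problems "$LOG" || echo "No WARN or ERROR lines in $LOG"
fi
```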

For the full list of migration script options, run:

/usr/local/topaz/bin/migration --help

Specific recommendations for production

(We're still verifying one or two things here, but this is our belief at the moment. Contact Eric to verify if unsure.)

Migration can be done on a 0.7 database while 0.7 is running, but it must be run from a machine with the 0.8 tools rpm installed.

  • Install the 0.8-SNAPSHOT tools rpm on some machine on the production network.
  • Modify its /etc/topaz/plosone.xml as described above. Clearly fedora and mulgara authentication must be configured properly too.
  • Open firewall ports 9090 and 9091 on fedora and mulgara machines respectively.
  • Get the list of articles from the fedora machine. See script above.
  • Run migration script from 0.8 machine in test mode using bash loop above. (Be sure to specify -t.)
  • Inspect tools.log on machine running migration script.
  • Run migration script from 0.8 machine in real mode.
    • It may be wise to snapshot mulgara before starting and a few times during this process. (We had some bad experiences this weekend because we started with a corrupt mulgara database. We have nothing to indicate that the migration script should or could cause a problem, but backups offer peace of mind.)

Issues

  • article.back.ref.citation.volume (#556) - is not always an integer, e.g. "86-A", "296B", "1587 -1595", "B57"
  • article.back.ref.citation.year - is not always an integer, e.g. "2001 Dec 13", "May 25, 2006", "1998a", "in press", "accepted"
  • Some article citations have more than one title - Seen in article 169 - there may be more
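
To illustrate the first two issues, here is a small check run over some of the values listed above, plus two plain integers for contrast. It only shows why an integer-typed field rejects these values; it is not how the migration script itself validates them.

```shell
# Succeed if the argument is a non-empty string of digits only.
is_integer() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Volume and year values reported above; "2006" and "12" are our
# own contrast cases.
for value in "86-A" "296B" "1587 -1595" "B57" "2001 Dec 13" "1998a" "in press" "2006" "12"; do
  if is_integer "$value"; then
    echo "integer:     $value"
  else
    echo "not integer: $value"
  fi
done
```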