Commit e88d3a25 authored by Gerhard Gossen

Update version references

parent f3c130ad
@@ -3,6 +3,7 @@
:icons: font
:toc: preamble
:source-highlighter: pygments
+:version: 0.1.0
Gerhard Gossen <>
@@ -23,10 +24,10 @@ Download and build the code by running the following shell commands:
git clone
cd archive-recrawling/code
mvn package
mvn package -Prun
-Copy the runnable version in `target/archive-crawler-$VERSION.jar` to a cluster machine.
+Copy the runnable version in `target/archive-crawler-{version}.jar` to a cluster machine.
== Specifying Events
@@ -82,17 +83,17 @@ CAUTION: By default the pages are retrieved from the[G
Now run the Collection Specification Creator tool using
-java -cp target/archive-crawler-$VERSION.jar topicsFile.tsv
+java -cp target/archive-crawler-{version}.jar topicsFile.tsv
to create the `.json` files containing the collection specifications.
You can optionally also specify a directory where the files should be stored:
-java -cp target/archive-crawler-$VERSION.jar topicsFile.tsv outputDir
+java -cp target/archive-crawler-{version}.jar topicsFile.tsv outputDir
The created JSON files can be used as described below.
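When these commands are copied from the unrendered source, `{version}` is an AsciiDoc attribute reference rather than shell syntax, so the value has to be supplied by hand. A minimal sketch of one way to do that, assuming the value `0.1.0` that this commit assigns to `:version:`; the variable names `VERSION` and `JAR` are illustrative and do not come from the README:

```shell
# Assumption: VERSION mirrors the :version: attribute in the document header.
VERSION="0.1.0"

# Build the JAR path once so every later command refers to the same file name.
JAR="target/archive-crawler-${VERSION}.jar"

# Print the resolved path for a quick visual check.
printf '%s\n' "$JAR"
```

The later `java -cp`, `hadoop fs -put` and `yarn jar` commands can then use `"$JAR"` (or its base name) in place of the literal file name.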
@@ -103,16 +104,16 @@ The extraction process needs to be started on a cluster server.
Upload the JAR you built during the setup as well as the JSON collection specifications to your server and log in using SSH.
On the server, upload the JAR to HDFS, e.g. as follows:
-hadoop fs -put -f archive-crawler-$VERSION.jar
+hadoop fs -put -f archive-crawler-{version}.jar
Now you can run the extraction as
-yarn jar archive-crawler-$VERSION.jar hdfs:///user/$USER/archive-crawler-$VERSION.jar topic.json /tmp/archive-crawler-out
+yarn jar archive-crawler-{version}.jar hdfs:///user/$USER/archive-crawler-{version}.jar topic.json /tmp/archive-crawler-out
This command takes the following parameters: