author    Silvan Jegen <s.jegen@gmail.com>  2016-09-16 21:21:34 +0200
committer Silvan Jegen <s.jegen@gmail.com>  2016-09-16 21:21:34 +0200
commit    4eb814f9253639a43d0eaa1897535ba1e1bf67df
tree      68d1cb990930e0abbbc52680f6f7d0936fa0fff8
parent    20849860c5572fa4fda86d26a5ad0a6fb760a3b8
Add link to README
-rw-r--r-- README.md 11
1 files changed, 7 insertions, 4 deletions
diff --git a/README.md b/README.md
index 57a3884..d475efc 100644
--- a/README.md
+++ b/README.md
@@ -36,11 +36,14 @@ Run the benchmark
 -----------------
 To run the benchmark you need the test input which is a subset of all
-the Open Access Pubmed Central full text XML files. The subset used can
-be found in the 'xmldata/subset.txt' file. The input consists of 10'000
-small XML files that have to be copied into their subdirectories in the
-'xmldata' directory.
+the Open Access Pubmed Central full text XML files[0]. The exact subset
+used can be found in the 'xmldata/subset.txt' file. The input consists of
+10'000 small XML files that have to be copied into their subdirectories
+in the 'xmldata' directory.
 If you have located and copied all the input files into 'xmldata/'
 you can execute the "runbenchmarks.sh" script to run the benchmark.
+
+[0] I used a subset of ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_bulk/non_comm_use.A-B.xml.tar.gz
+ (warning: the file is about 1.2GB in size)
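The README leaves the copying step to the reader. What follows is a minimal Python sketch of one way to do it. It assumes that each non-empty line of 'xmldata/subset.txt' is a member path as stored in the PMC tarball (for example 'Some_Journal/PMC0000000.nxml'). The README does not state that format, so check it before relying on this. The function name `extract_subset` is hypothetical and is not part of the repository. The sketch makes one pass over the archive, extracts only the listed members below 'xmldata/' (which recreates their subdirectories), and returns any listed paths it did not find. That way, mismatches between the list and the archive show up before the benchmark runs.

```python
import tarfile
from pathlib import Path


def _normalize(name):
    # Tar members are sometimes stored with a leading "./"; compare without it.
    return name[2:] if name.startswith("./") else name


def extract_subset(archive_path, subset_list, dest_dir="xmldata"):
    """Extract only the archive members listed in subset_list into dest_dir.

    ASSUMPTION (hypothetical helper): each non-empty line of subset_list is a
    member path as stored in the tarball, e.g. 'Some_Journal/PMC0000000.nxml'.
    Returns (extracted paths, listed paths not found in the archive).
    """
    wanted = {
        _normalize(line.strip())
        for line in Path(subset_list).read_text().splitlines()
        if line.strip()
    }
    dest = Path(dest_dir)
    # Use the safer "data" extraction filter where this Python version has it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = set()
    # "r:*" auto-detects gzip compression; members are visited in one pass.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            key = _normalize(member.name)
            if member.isfile() and key in wanted:
                # Writes the file under dest_dir, keeping its subdirectory.
                tar.extract(member, path=dest, **extra)
                extracted.add(key)
    return extracted, wanted - extracted
```

For example, you could call `extract_subset("non_comm_use.A-B.xml.tar.gz", "xmldata/subset.txt")` and confirm that the returned set of missing paths is empty before running "runbenchmarks.sh".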