-rw-r--r-- | README.md | 11 |
1 files changed, 7 insertions, 4 deletions
@@ -36,11 +36,14 @@ Run the benchmark
 -----------------
 
 To run the benchmark you need the test input which is a subset of all
-the Open Access Pubmed Central full text XML files. The subset used can
-be found in the 'xmldata/subset.txt' file. The input consists of 10'000
-small XML files that have to be copied into their subdirectories in the
-'xmldata' directory.
+the Open Access Pubmed Central full text XML files[0]. The exact subset
+used can be found in the 'xmldata/subset.txt' file. The input consists of
+10'000 small XML files that have to be copied into their subdirectories
+in the 'xmldata' directory.
 
 If you have located and copied all the input files into 'xmldata/' you
 can execute the "runbenchmarks.sh" script to run the benchmark.
+
+[0] I used a subet of ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_bulk/non_comm_use.A-B.xml.tar.gz
+    (warning: the file is about 1.2GB in size)
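The copy step described in the README text above could be scripted roughly as follows. This is only a sketch and not part of the commit: it assumes that 'xmldata/subset.txt' lists one archive-relative path per line (the actual file format is not shown here), and the helper names load_subset and extract_subset are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def load_subset(subset_file):
    """Return the set of paths listed in subset_file.

    Assumption: one archive-relative path per line; blank lines are ignored.
    """
    with open(subset_file, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def extract_subset(archive, wanted, dest):
    """Copy the wanted regular files from a tar archive into dest.

    The subdirectory part of each member name is recreated under dest.
    Returns the number of files copied.
    """
    dest = Path(dest)
    copied = 0
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects compression
        for member in tar:
            if not member.isfile() or member.name not in wanted:
                continue
            target = dest / member.name
            target.parent.mkdir(parents=True, exist_ok=True)
            src = tar.extractfile(member)
            with src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            copied += 1
    return copied
```

With the archive from [0] downloaded locally, a call such as extract_subset("non_comm_use.A-B.xml.tar.gz", load_subset("xmldata/subset.txt"), "xmldata") would populate 'xmldata/', provided the paths in subset.txt match the member names inside the archive. Streaming through the archive once avoids unpacking all of its roughly 1.2GB.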