Update README

author: Silvan Jegen <s.jegen@gmail.com> 2016-09-17 19:53:12 +0200
committer: Silvan Jegen <s.jegen@gmail.com> 2016-09-17 20:37:03 +0200
commit: b0e8849e1ee35e61e7ba51c5db285e01378959bb (patch)
tree: ea532ff4b065b6d51d2870911d98cc22b5c580b2
parent: c21bc7da0bc2f6aa5e6009fdc555b609f5fba839 (diff)
1 files changed, 17 insertions, 7 deletions
diff --git a/README.md b/README.md
index d475efc..7c51dc5 100644
--- a/README.md
+++ b/README.md
@@ -23,9 +23,9 @@ You will have to install the mini-xml (mxml) library somewhere and
 then make sure that the compiler can find it by editing the Makefile
 (provided the library is not installed in one of the usual places). All
 other libraries have been copied into the benchmark programs (in their
-own C file ending on 'lib').
+own C file ending on 'lib.c').
 
-If you have the mxml library installed you can just run the usual
+As soon as you have the mxml library installed, you can just run the usual
 
 make
 
@@ -35,15 +35,25 @@ to compile everything.
 Run the benchmark
 -----------------
 
-To run the benchmark you need the test input which is a subset of all
-the Open Access Pubmed Central full text XML files[0]. The exact subset
-used can be found in the 'xmldata/subset.txt' file. The input consists of
-10'000 small XML files that have to be copied into their subdirectories
-in the 'xmldata' directory.
+To run the benchmark you need the test input XML files, which are a subset
+of all the Open Access PubMed Central full text XML files[0]. The exact
+subset used can be found in the 'xmldata/subset.txt' file. The input
+consists of 10'000 small XML files that have to be copied into their
+subdirectories in the 'xmldata' directory (just untar the tar.gz file
+found at the link location given there).
 
 If you have located and copied all the input files into 'xmldata/'
 you can execute the "runbenchmarks.sh" script to run the benchmark.
 
+The benchmarks will be run 10 times each (taking around 45 minutes to
+complete) while being timed. The 10 time measurements will be appended
+to log files (so that running the benchmark several times will result in
+more data points). The "runbenchmarks.sh" script will convert the time
+measurements to seconds afterwards and then run an R one-liner I found
+on the internet[1] to write the mean and the standard deviation of the
+measurements to $programname.statistics files.
+
 
 [0] I used a subset of ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_bulk/non_comm_use.A-B.xml.tar.gz
     (warning: the file is about 1.2GB in size)
+[1] http://stackoverflow.com/questions/9789806/command-line-utility-to-print-statistics-of-numbers-in-linux