Code for the slcon3 "XML damage control" presentation
=====================================================

This repo contains a benchmark, written for the presentation, of several XML
libraries. Most of them are C libraries because the goal is to compare some of
the simplest but most efficient tools that ease the pain of having to work
with XML.

The libraries compared in this benchmark are:

* ezxml
* simple xml (sxmlc)
* mini xml (mxml)
* yxml
* Go's encoding/xml
* Python's elementtree

Compile
-------

You will have to install the mini-xml (mxml) library yourself. If it is not
installed in one of the usual places, edit the Makefile so that the compiler
can find it. All other libraries have been copied into the benchmark programs
(each in its own C file with a name ending in 'lib').

Once the mxml library is installed you can just run the usual make to compile
everything.

Run the benchmark
-----------------

To run the benchmark you need the test input, which is a subset of the Open
Access Pubmed Central full text XML files[0]. The exact subset used is listed
in the 'xmldata/subset.txt' file. The input consists of 10'000 small XML files
that have to be copied into their subdirectories in the 'xmldata' directory.

Once you have located and copied all the input files into 'xmldata/', execute
the "runbenchmarks.sh" script to run the benchmark.

[0] I used a subset of
    ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_bulk/non_comm_use.A-B.xml.tar.gz
    (warning: the file is about 1.2GB in size)
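
Before running the benchmark it can be useful to confirm that every file named
in 'xmldata/subset.txt' was actually copied into 'xmldata/'. The following is a
minimal sketch in Python, not part of the repo. It assumes that 'subset.txt'
holds one path per line, relative to the 'xmldata' directory; that format is a
guess and may need adjusting. The function name `missing_inputs` is
hypothetical.

```python
import os


def missing_inputs(list_file, data_dir):
    """Return the entries of list_file that do not exist under data_dir.

    Assumption: list_file contains one relative path per line.
    Blank lines are skipped.
    """
    missing = []
    with open(list_file, encoding="utf-8") as handle:
        for line in handle:
            rel_path = line.strip()
            if not rel_path:
                continue  # ignore empty lines
            if not os.path.isfile(os.path.join(data_dir, rel_path)):
                missing.append(rel_path)
    return missing
```

Calling `missing_inputs("xmldata/subset.txt", "xmldata")` from the repo root
would then return the list of entries still to be copied; an empty list
suggests the input set is complete.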