TREC Terabyte – Baseline Systems

This page explains how to conduct the performance comparison baseline runs for the TREC 2006 Terabyte track. All participating groups are encouraged to conduct at least one such run (more if time permits), using one of the publicly available retrieval systems listed below. Each run should be conducted on a single computer. We hope that the results will help us develop a satisfactory methodology for reliable performance comparisons between different information retrieval systems.

For each system, building the index and running the efficiency queries should each take between 5 and 30 hours, depending on your hardware. Please make sure that your operating system's disk cache is empty before you start a new run. Otherwise, the times reported by the time command may not reflect the true performance of the system. One way to empty the cache is to create a file that is larger than the machine's main memory and run md5sum on that file. Another way is to reboot the machine.


Clarification: When building an index with any of the three systems, you need to use uncompressed input files instead of the gzip-compressed format in which GOV2 is shipped.
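Decompressing the collection before indexing could be done with a short Python sketch like the one below. It is not part of any baseline system, and it assumes only that the collection consists of .gz files somewhere under one source directory; the function name gunzip_tree and the directory paths are hypothetical. The directory structure is mirrored under the destination, and each file loses its .gz suffix.

```python
import gzip
import os
import shutil


def gunzip_tree(src_root: str, dst_root: str) -> int:
    """Decompress every *.gz file under src_root into a mirrored tree under dst_root.

    Hypothetical helper; returns the number of files decompressed.
    Files without a .gz suffix are skipped.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        out_dir = os.path.join(dst_root, rel)
        for name in sorted(filenames):
            if not name.endswith(".gz"):
                continue
            os.makedirs(out_dir, exist_ok=True)
            src = os.path.join(dirpath, name)
            dst = os.path.join(out_dir, name[:-3])  # strip ".gz"
            # Stream in 1 MiB pieces so large files never sit fully in memory.
            with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout, length=1 << 20)
            count += 1
    return count
```

For example, gunzip_tree("/data/gov2", "/data/gov2-uncompressed") would decompress a copy of the collection stored under the hypothetical path /data/gov2. The uncompressed files need considerably more disk space than the compressed ones, so check free space first. Decompress before emptying the disk cache, so that the decompression step does not affect the timed run.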


Systems available for performance baseline runs:

Stefan Buettcher, 2006-09-01
To report bugs on this page, please send an email to sbuettch@plg.uwaterloo.ca.