The following resources are available:
Documentation
General-purpose releases
These releases should compile under any version of Linux with gcc 3.0
or later. However, some Linux distributions do not include zlib (or its
development headers), which Wumpus uses to compress its internal docid
map. If the compiler reports an error such as "zlib.h: No such file or
directory", edit indexcache/docidcache.cpp and change the line
"#define USE_ZLIB 1" to "#define USE_ZLIB 0".
For all other problems, please send me an email.
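The USE_ZLIB setting is a compile-time switch. The following is a hypothetical C++ sketch of how such a switch typically guards the zlib dependency. It is not the actual contents of indexcache/docidcache.cpp, and the function docidMapEncoding and the "uncompressed" fallback are assumptions made for illustration. With the flag set to 0, the preprocessor never reaches #include <zlib.h>, so a missing header no longer stops the build.

```cpp
// Hypothetical illustration of a compile-time zlib switch. This is NOT the
// real indexcache/docidcache.cpp, only the general pattern it is assumed to use.
#define USE_ZLIB 0  // set to 1 only if zlib.h is installed

#include <cassert>
#include <string>

#if USE_ZLIB
#include <zlib.h>  // only parsed when the switch is on
#endif

// Reports which code path was compiled in (function name is made up).
std::string docidMapEncoding() {
#if USE_ZLIB
  return std::string("zlib ") + zlibVersion();  // zlib's own version string
#else
  return "uncompressed";  // assumed fallback: no zlib symbols referenced
#endif
}
```

With a C++17 compiler, __has_include(<zlib.h>) could detect the header automatically instead of requiring a manual edit; the manual switch is simply what these releases expect.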
- Wumpus 2011-11-10 (.tar.gz, 1091 KB)
  Maintenance release that fixes build errors with modern versions of GCC
  and a bug that made bigram indexing incompatible with XPath support. It
  also contains a few hacks that make Wumpus build and run under MacOS.
- Wumpus 2009-02-02 (.tar.gz, 1083 KB)
  This release contains a few performance improvements as well as a
  basic implementation of BM25F. It also includes rudimentary support
  for character N-gram indexing.
- Wumpus 2007-11-23 (.tar.gz, 1089 KB)
  This release fixes some minor problems as well as an annoying off-by-one
  bug that sometimes caused the text fetched for a given index extent to be
  misaligned with the index data.
- Wumpus 2007-09-07 (.tar.gz, 1102 KB)
  This release introduces a new on-disk index format. The new version can
  read index files created by older versions, but older versions cannot
  process index files produced by the new version!
TREC Terabyte release
- Wumpus 2006-05-03-TREC (.tar.gz, 884 KB)
  Use this release if you want to do a performance baseline run for TREC
  Terabyte 2006.
After downloading the package, untar it, change to the wumpus/ directory,
and type make. Under Linux, the build should work smoothly; if it fails,
please send me a bug report so that I can try to fix the problem. Once the
build has finished, there are two run modes. From within the wumpus/
directory, execute
- bin/trec INDEX input_file output_file log_file
  to build an index for the document collection. The index is created in
  wumpus/database/. The command-line parameter input_file must refer to a
  file that lists all files to be indexed; these files may be plain text,
  gzip-compressed (.gz), or bzip2-compressed (.bz2), but must be in TREC
  format. Nothing is written to the file given by output_file. Log
  messages, including performance information, are written to log_file.
- bin/trec QUERY input_file output_file log_file
  to run search queries against the index built in INDEX mode. The
  command-line parameter input_file must refer to a file that contains
  all ad-hoc search queries to be executed. Each query in the input file
  must have the format "topic_id term_1 term_2 ... term_N". To see the
  exact format, you can download a set of example queries. TREC-formatted
  search results are written to output_file. Log messages, including
  performance information, are written to log_file.
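This page only says that the INDEX-mode input_file lists the files to be indexed. The sketch below assumes one path per line and infers the compression format from the .gz or .bz2 extension, as the INDEX description suggests. Compression, InputFile, and readFileList are hypothetical names, not part of Wumpus.

```cpp
// Hypothetical sketch: reading the INDEX-mode input_file, assuming one path
// per line. Names (Compression, InputFile, readFileList) are illustrative only.
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

enum class Compression { Plain, Gzip, Bzip2 };

struct InputFile {
  std::string path;
  Compression compression;
};

// True if s ends with the given suffix.
static bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Reads one path per line, skipping blank lines, and guesses the
// compression format from the extension (.gz, .bz2, otherwise plain text).
std::vector<InputFile> readFileList(std::istream& in) {
  std::vector<InputFile> files;
  std::string line;
  while (std::getline(in, line)) {
    // Strip trailing whitespace, including a '\r' from DOS line endings.
    while (!line.empty() &&
           (line.back() == ' ' || line.back() == '\t' || line.back() == '\r'))
      line.pop_back();
    if (line.empty()) continue;
    Compression c = Compression::Plain;
    if (endsWith(line, ".gz"))
      c = Compression::Gzip;
    else if (endsWith(line, ".bz2"))
      c = Compression::Bzip2;
    files.push_back({line, c});
  }
  return files;
}
```

A list in this shape could be generated with a command such as "find /path/to/GOV2 -type f > input_file", where the path is a placeholder for wherever the collection is stored.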
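For QUERY mode, the following sketch shows one way to read the query file, assuming one query per line and whitespace-separated fields in the "topic_id term_1 term_2 ... term_N" layout described for input_file. Query and readQueries are illustrative names; the downloadable example queries remain the authoritative reference for the exact format.

```cpp
// Hypothetical sketch: parsing the QUERY-mode input_file, assuming one query
// per line in the form "topic_id term_1 term_2 ... term_N" with
// whitespace-separated fields. Query and readQueries are illustrative names.
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Query {
  std::string topicId;             // first field on the line
  std::vector<std::string> terms;  // all remaining fields
};

std::vector<Query> readQueries(std::istream& in) {
  std::vector<Query> queries;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    Query q;
    if (!(fields >> q.topicId)) continue;  // skip blank or whitespace-only lines
    std::string term;
    while (fields >> term) q.terms.push_back(term);
    queries.push_back(std::move(q));
  }
  return queries;
}
```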
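The page does not spell out what "TREC-formatted" means for QUERY-mode output. Assuming it refers to the usual six-column run format read by trec_eval (topic_id, the literal Q0, docno, rank, score, run tag), each line of output_file could be parsed as below. TrecResult and parseTrecLine are hypothetical names, and the exact column contents Wumpus writes (for example, its run tag) are not confirmed here.

```cpp
// Hypothetical sketch: reading one line of output_file, assuming the common
// six-column TREC run format read by trec_eval:
//   topic_id Q0 docno rank score run_tag
// TrecResult and parseTrecLine are illustrative names, not Wumpus APIs.
#include <cassert>
#include <sstream>
#include <string>

struct TrecResult {
  std::string topicId;
  std::string docno;
  int rank = 0;
  double score = 0.0;
  std::string runTag;
};

// Returns false if the line does not contain all six columns.
bool parseTrecLine(const std::string& line, TrecResult& out) {
  std::istringstream fields(line);
  std::string iteration;  // second column, conventionally the literal "Q0"
  TrecResult r;
  if (!(fields >> r.topicId >> iteration >> r.docno >> r.rank >> r.score >>
        r.runTag))
    return false;
  out = r;  // only overwrite the caller's struct on success
  return true;
}
```

Parsing into a temporary and copying only on success keeps the caller's struct unchanged when a line is malformed.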
System requirements: In order to perform a Wumpus performance baseline run for the
GOV2 document collection, you need at least 512 MB of RAM and 40 GB of free hard disk space
on the partition where you install Wumpus.
Wumpus is free software and licensed under the GNU General Public License
(GPL).