Wumpus is an information retrieval system developed at the University of Waterloo.
Its main purpose is to study issues that arise in the context of indexing dynamic
text collections in multi-user environments. One particular scenario that we are
studying is file system search (aka "desktop search"), in which the underlying
text collection is very dynamic and the number of expected index update operations
is much greater than the number of search queries submitted by the users of the
system.
The intended use of Wumpus is two-fold. It can be used
- as an ordinary information retrieval system, with multi-user support enabled
  or disabled;
- as a file system indexing service that automatically keeps track of all
  changes in the file system and updates the index accordingly.
Wumpus scales to large collections: it has been used on text collections
consisting of many hundreds of gigabytes of text and containing tens of millions
of documents.
For more information, please have a look at the documentation, the tutorial,
or the list of publications.
If you are interested in the Linux file system change notification needed for
real-time file system indexing, you may find some useful information on the
fschange page.
fschange is a patch for the Linux kernel that can be used by a process with
superuser rights to keep track of all changes to the file system (recursive
watch). Wumpus uses fschange to keep its internal index structures up to date.
Wumpus is freely available under the terms of the GNU General Public License
(GPL).
If you have any questions regarding the Wumpus system, please send an e-mail to
Stefan Büttcher (see
stefan.buettcher.org).
Also, if you would like to use Wumpus as part of your own system but are, for
some reason, unhappy with the terms of the GPL, please drop me a line and we
can discuss things.