Below, you can find a succinct list of features offered by Terrier.
General
- Open Source (Mozilla Public Licence).
- Written in cross-platform Java - works on Windows, Mac OS X, Linux and Unix.
- Modular and open indexing and querying APIs, to allow easy extension.
- Full-text indexing of large-scale document collections, scaling to at least 25 million documents in a centralised architecture.
- Large user base, built up over 3 years of public releases.
- Active Information Retrieval research fed into the Open Source platform.
Indexing
- Out-of-the-box indexing of tagged document collections, such as the TREC test collections.
- Out-of-the-box indexing of documents in various formats, such as HTML, PDF, and Microsoft Word, Excel and PowerPoint files.
- Indexing of field information, such as the TITLE and H1 HTML tags.
- Indexing of position information at the word or block level.
- Support for various document encodings (such as UTF), to facilitate multi-lingual retrieval.
- Highly compressed index disk data structures.
- Highly compressed direct file for efficient query expansion.
- Various stemming techniques included, and easy to extend to others.
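
To make the word-level versus block-level position indexing mentioned above concrete, here is a minimal, language-neutral sketch in Python. It is not Terrier's Java API: the function name `build_positional_index` and the `block_size` parameter are hypothetical, and the assumption that a block is a fixed-size run of consecutive words is made only for illustration.

```python
from collections import defaultdict


def build_positional_index(docs, block_size=1):
    """Map each term to {doc_id: ascending list of block ids}.

    Hypothetical helper, not part of Terrier. With block_size=1 every
    word is its own block, so the block ids are plain word positions.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            block = position // block_size  # assumed: fixed-size blocks
            blocks = index[term].setdefault(doc_id, [])
            # Record each block once, even if the term repeats inside it.
            if not blocks or blocks[-1] != block:
                blocks.append(block)
    return dict(index)


docs = {"d1": "to be or not to be"}
word_level = build_positional_index(docs, block_size=1)
block_level = build_positional_index(docs, block_size=3)
print(word_level["to"]["d1"])   # [0, 4]
print(block_level["to"]["d1"])  # [0, 1]
```

Larger blocks store fewer, coarser positions, trading proximity precision for a smaller index.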
Retrieval
- Provides standard querying facilities, as well as Query Expansion (pseudo-relevance feedback).
- Can be used in interactive applications, such as the included Desktop Search, or in a batch setting for research and experimentation.
- Provides a number of Divergence From Randomness (DFR) document ranking models.
- Supports classic retrieval models, such as tf-idf and Okapi's BM25, as well as several language models and Rocchio's query expansion.
- Provides a number of parameter-free DFR term weighting models for automatic query expansion.
- Flexible processing of terms through a pipeline of components, such as stop-word removers and stemmers.
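
The idea of a term pipeline can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than Terrier's implementation: the stage functions, the tiny stop-word list, and the toy plural-stripping "stemmer" are all hypothetical stand-ins for real components such as a Porter stemmer.

```python
# Assumed, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "to", "in"}


def remove_stop_words(term):
    """Return None to drop the term, or the term unchanged."""
    return None if term in STOP_WORDS else term


def strip_plural(term):
    """Toy stemmer (not Porter): strip one trailing 's' from longer words."""
    if term.endswith("s") and len(term) > 3:
        return term[:-1]
    return term


def run_pipeline(terms, stages):
    """Pass each term through the stages in order; a None result drops it."""
    output = []
    for term in terms:
        for stage in stages:
            term = stage(term)
            if term is None:
                break  # term was removed; skip the remaining stages
        else:
            output.append(term)  # reached only if no stage dropped the term
    return output


print(run_pipeline("the models of retrieval".split(),
                   [remove_stop_words, strip_plural]))
# ['model', 'retrieval']
```

Because each stage is just a function from a term to a term (or to nothing), stages can be reordered or swapped without touching the rest of the pipeline.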
Experimentation
- Handles all currently available TREC test collections.
- Easily scriptable to evaluate many parameter settings or many weighting models in batch form.
- In-built evaluation tools for TREC ad-hoc and known-item search retrieval results, producing various Precision and Recall measures.
- Advanced query language that supports boolean operators, +/- operators, phrase and proximity search, and fields.
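
As a rough illustration of the kind of Precision and Recall measures that the evaluation tools listed above report, the following Python sketch computes precision and recall at a rank cutoff and average precision for a single query. The function names and the sample ranking are hypothetical; this is not Terrier's evaluation code, only the standard textbook definitions.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall over the top-k documents of a ranking."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant doc appears.

    Relevant documents that are never retrieved contribute zero.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Made-up ranking and relevance judgements for one query.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(precision_recall_at_k(ranking, relevant, 2))
print(average_precision(ranking, relevant))
```

Averaging `average_precision` over all queries in a test collection gives mean average precision, a common summary figure in TREC-style experiments.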