Terrier 1.1.1 - 24/10/2007
Minor update. Mostly bug fixes. Some minor code enhancements, plus the inclusion of a test harness. Snowball stemmers were added to boost support for languages other than English. This will likely be the last release in the 1.x.x series.
Indexing
- BUG: When merging block indices, ensure that the resulting inverted index has blocks.
- BUG: Field indexing not working properly.
- BUG: Block ids recorded incorrectly when fields are enabled.
- BUG Resilience: Don't throw NPE in SimpleFileCollection if no files are processed.
- BUG Resilience: Don't throw exceptions if the index has no terms/documents - fail more gracefully (LexiconBuilder, Indexer).
- When parsing a TREC-like document collection, use Streams at TRECCollection level, and Reader at Document level. This allows easier change of encoding, etc.
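The stream/reader split described in the last item can be illustrated with a minimal sketch. This is not Terrier's TRECCollection code: the class EncodingSketch and its methods are hypothetical names, and the sketch only assumes that the collection hands raw bytes to the document level, where they are decoded with a configurable encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical illustration, not part of Terrier.
final class EncodingSketch {
    /** Wraps the collection-level byte stream in a document-level Reader. */
    static Reader openDocument(InputStream collectionStream, String encoding) {
        // The encoding is chosen only here, so changing it touches one place.
        return new InputStreamReader(collectionStream, Charset.forName(encoding));
    }

    /** Reads every character from the reader into a String. */
    static String readAll(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

Because the bytes stay undecoded until the document level, the same collection stream can be read as UTF-8 or ISO-8859-1 by changing only the encoding argument.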
Retrieval
- BUG: When retrieving phrases, prevent Exception from debugging code in Manager.
- BUG: Regression when retrieving phrases, some documents not matched.
- BUG: DFRWeightingModel breaks when first normalisation or tf normalisation is not specified.
- BUG Resilience: Do not throw NPE in ExpansionTerms if original query terms are not set by client code.
- Create a .settings file for each TREC results file, so that it is easy to determine the setting for a run.
- Added an alternative batch query parser, known as SingleLineTRECQuery, mostly to support the test harness.
Desktop
- BUG: PDF parsing too noisy through log4j, indexing may never finish. Turned down default logging level to info.
- BUG: Logging may not appear for indexing Terrier's own documentation. Indexing is now run in a new Thread, not via SwingUtilities.invokeLater().
Other
- Tokenisation: Added Snowball stemmers. For more information, see documentation on Non English language support.
- Java: Various Java Generics changed.
- Testing: Added test harness, which checks that the correct documents are retrieved for various queries and index formats. Uses Shakespeare's Merchant of Venice play for the test document collection.
- Shell scripts: Take notice of TERRIER_ETC environment variable and pass to Terrier.
- Shell scripts: added anyclass.bat.
Terrier 1.1.0 - 15/06/2007
Major update. Many changes to the source code, including more robust indexing and index structure merging.
Indexing
The indexing architecture has been updated for Terrier 1.1.0; however, indices created with 1.1.0 are completely compatible with those created with 1.0.x, and vice versa.
- Separated string.byte.length property into two properties: max.term.length and docno.byte.length.
- Allow UTF characters in indexing, and use a compatible method for saving these in the Lexicon. This enables Terrier to be used for non-English languages. Set string.use_utf to true when indexing, and use TRECUTFCollection to parse the collection.
- Merge multiple temporary lexicons at once in LexiconBuilders. 16 seems to be a good default setting.
- Don't use tree structures for indexing, they are slower and larger (20% indexing speed improvement). New classes DocumentPostingList and LexiconMap.
- Writing structures (direct and inverted) flush normally, to reduce memory consumption.
- Add lexicon hashing to reduce size of binary search.
- LookAheadStream and LookAheadReader are now case-sensitive, as calling String.toUpperCase() slowed down indexing.
- When the current indexing hits a threshold, finish it, then start a new index. Merge indices at completion. See Indexer.
- Added code for merging indices - see StructureMerger and BlockStructureMerger.
- Added a CollectionFactory, to allow Collections to wrap other Collections.
- TRECCollection no longer throws exception when used for re-indexing and docPointers.col exists (Thanks to Dolf Trieschnigg, Univ of Twente).
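The idea of merging several temporary lexicons at once, mentioned in the list above, can be sketched as a k-way merge over sorted term lists. This is an assumption-laden illustration rather than the LexiconBuilder implementation: LexiconMergeSketch and mergeAll are hypothetical names, terms are plain strings ordered by String.compareTo, and a real merge would also combine the statistics stored with each term.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not part of Terrier.
final class LexiconMergeSketch {
    /** One pending term plus the iterator it was read from. */
    private static final class Head {
        final String term;
        final Iterator<String> source;

        Head(String term, Iterator<String> source) {
            this.term = term;
            this.source = source;
        }
    }

    /** Merges any number of individually sorted term lists in a single pass. */
    static List<String> mergeAll(List<List<String>> sortedLexicons) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> a.term.compareTo(b.term));
        for (List<String> lexicon : sortedLexicons) {
            Iterator<String> it = lexicon.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            // A term present in several temporary lexicons is emitted once.
            if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(head.term)) {
                merged.add(head.term);
            }
            if (head.source.hasNext()) {
                heap.add(new Head(head.source.next(), head.source));
            }
        }
        return merged;
    }
}
```

Merging all inputs through one heap means each term is read and written once, instead of being copied repeatedly through a chain of pairwise merges.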
Retrieval
- CollectionStatistics is now non-static.
- Added Hiemstra LM and Lemur TF_IDF weighting models.
- BUG: Lexicon would match prefixes of terms when the desired term does not exist in the Lexicon.
- Use a LexiconEntry, to support easier thread-safety with the Lexicon.
- Added generic DFRWeightingModel, which can generate many DFR document weighting models. More information in Extending Retrieval.
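The prefix-matching bug listed above is the kind of error that arises when a lookup returns the nearest entry instead of checking for an exact match. The following is a minimal sketch of an exact-match binary search over a sorted term array. It is not Terrier's Lexicon code; LexiconLookupSketch and find are hypothetical names, and the in-memory String array stands in for the on-disk lexicon.

```java
// Hypothetical illustration, not part of Terrier.
final class LexiconLookupSketch {
    /** Returns the index of term in the sorted array, or -1 if it is absent. */
    static int find(String[] sortedTerms, String term) {
        int low = 0;
        int high = sortedTerms.length - 1; // inclusive, so the last entry is reachable
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = sortedTerms[mid].compareTo(term);
            if (cmp == 0) {
                return mid; // only an exact match counts as found
            }
            if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        // The search ended next to a neighbouring term; report a miss rather than that neighbour.
        return -1;
    }
}
```

With the sorted terms "terrace", "terrier" and "terriers", looking up "terr" reports a miss instead of returning a longer term that merely starts with it.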
Other
- Improved documentation.
- Java: Move to Java 1.5 source, and upgrade GNU Trove jar.
- Logging: Use log4j throughout source. Log4j config can be read from etc/terrier-log.xml.
- Java: Various source code changes, to allow easier extension and re-use.
- Compiling: Included compile.bat, by Jurrie Overgoor (Univ of Twente).
Terrier 1.0.2 - 17/03/2005
- BUG: Language modelling didn't index properly when block indexing was enabled.
- BUG: Lexicon merging now compares strings the same way as the LexiconTree outputs them, to ensure sorting is correct.
- BUG: Block ids are correctly recorded in the inverted index for large collections.
- BUG: Block ids are correctly read from the direct index.
- BUG: The phrase score modifier has been rewritten to a more correct implementation.
- BUG: HTML Stack only lives for one document.
- BUG: Cropping the resultset did not function properly with metadata.
- BUG: If more than one control mapped to a post(process/filter) then only the last one would be noted. This is now fixed, and simpler datastructures are used for the controls and the post(process/filter).
- TREC: During indexing, start indexing from the beginning of a new file, not from the previous state.
- TREC: Added trec.collection.class property to allow TRECIndexing to determine the TREC class to be used during indexing.
- Added DLH Divergence From Randomness model - this hyper-geometric weighting model is completely parameter free and is very robust over many test collections.
- Query Parser: Allow characters in the extended character set to be in terms.
- LookAheadReader: Corrected implementation of Reader interface to give better support wrt EOF and subsequent method calls.
- Added more TermPipeline classes: CropTerm, DumpTerm.
- Updated and organised documentation and Javadoc.
Terrier 1.0.1 - 09/02/2005
- BUG 1: bin/interactive_terrier.bat doesn't run the correct class.
- BUG 2: bin/compile.sh compiles the ANTLR parser correctly.
- BUG: Lexicon binary search failed when searching for the last entry. Binary search has been updated.
- Document Index binary search made more robust for different types of documentIds.
- Desktop Terrier: starts new threads using correct Swing utility API.
- Desktop Terrier: close PDF documents correctly.
- Desktop Terrier: search text logging is slightly more robust.
- Desktop Terrier: always disable search tab while indexing.
- Desktop Terrier: temporary lexicon folders are deleted if they exist in the index folder before indexing.
- Desktop Terrier: process only 25,000 terms at a time during block inverted index building, as the 120MB heap space is restrictive.
- TREC: Model, QEModel & C value are displayed correctly in TREC querying and results file.
- Documentation: Removed Known Issue 1 from doc/todo.html.
- Documentation: Updated javadoc in ApplicationSetup.java.
- Documentation: Added more details about compiling in doc/terrier_develop.html.
Terrier 1.0.0 - 28/01/2005
- New Indexing APIs, that allow more diverse forms of collections to be easily indexed.
- New Querying API and languages (eg fields, phrases, proximity, requirements).
- More Statistical IR Models: tf-idf, BM25, Divergence From Randomness models, and Ponte-Croft language model.
- More example applications, including a Desktop Search application.
Terrier 1.0 Beta2 - 22/11/2004
- Minor bugfix release - documentation error.
Terrier 1.0 Beta - 18/11/2004
- First public release of Terrier.