java.lang.Object
- org.terrier.structures.indexing.Indexer
- - org.terrier.structures.indexing.classical.BlockIndexer

```
public class BlockIndexer
extends Indexer
```
An indexer that saves block information for the indexed terms. Block information is usually recorded in terms of relative term positions (position 1, position 2, etc.). However, since version 2.2, Terrier supports "marker terms" during indexing, which are used to increment the block counter.
Properties:
- blocks.size - How many terms should be in one block. For phrasal search, this needs to be 1 (the default).
- blocks.max - Maximum number of blocks in a document. After this number of blocks, all subsequent terms will be in the same block. Defaults to 100,000.
- block.indexing - This class should only be used if the block.indexing property is set.
- indexing.max.encoded.documentindex.docs - how many docs before the DocumentIndexEncoded is dropped in favour of the DocumentIndex (on disk implementation).
- See Also: Properties in org.terrier.structures.indexing.Indexer and org.terrier.structures.indexing.classical.BasicIndexer
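The interplay of blocks.size and blocks.max can be illustrated with a minimal, self-contained sketch. It does not use the Terrier API: the class `FixedBlockSketch` and its methods are hypothetical names, the property lookup uses plain `java.util.Properties`, and the exact boundary handling at blocks.max in Terrier itself may differ from what is assumed here (at most blocks.max distinct block ids, numbered from 0).

```java
import java.util.Arrays;
import java.util.Properties;

// Hypothetical illustration only; not part of Terrier.
final class FixedBlockSketch {

    /** Reads an int property, falling back to a default when it is absent. */
    static int intProperty(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key, Integer.toString(defaultValue));
        return Integer.parseInt(raw.trim());
    }

    /**
     * Returns the block id assigned to each of numTokens consecutive tokens.
     * A new block starts once blockSize tokens have been seen, until the
     * last permitted block (assumed here to be maxBlocks - 1) is reached;
     * every remaining token then stays in that final block.
     */
    static int[] assignBlocks(int numTokens, int blockSize, int maxBlocks) {
        int[] blockIds = new int[numTokens];
        int blockId = 0;
        int numOfTokensInBlock = 0;
        for (int i = 0; i < numTokens; i++) {
            blockIds[i] = blockId;
            numOfTokensInBlock++;
            if (numOfTokensInBlock >= blockSize && blockId < maxBlocks - 1) {
                blockId++;
                numOfTokensInBlock = 0;
            }
        }
        return blockIds;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("blocks.size", "2");
        int blockSize = intProperty(props, "blocks.size", 1);      // default 1
        int maxBlocks = intProperty(props, "blocks.max", 100000);  // default 100,000
        // prints [0, 0, 1, 1, 2]
        System.out.println(Arrays.toString(assignBlocks(5, blockSize, maxBlocks)));
    }
}
```

With blocks.size set to 1, every token receives its own block id, so adjacent terms have consecutive ids, which is what phrasal matching relies on.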
Markered Blocks
Markers are terms (artificially inserted or otherwise) in the term stream that denote when the block counter should be incremented. This functionality is enabled using the block.delimiters.enabled property, while the terms are specified as a comma-delimited list in the block.delimiters property. The properties are:
- block.delimiters.enabled - enables markered blocks. Defaults to false; set to true to enable.
- block.delimiters - comma-delimited list of terms that are markers. Defaults to empty. Terms are lowercased if lowercase is set to true (default).
- block.delimiters.index.terms - set to true if marker terms should actually be indexed. Defaults to false.
- block.delimiters.index.doclength - set to true if marker terms should contribute to document length. Defaults to false; only has an effect if block.delimiters.index.terms is set.
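The effect of the delimiter properties above can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not Terrier's implementation: `DelimiterBlockSketch`, `Result` and `process` are hypothetical names, lowercasing is assumed to be enabled, postings are shown as simple `term:blockId` strings, and the blocks.max cap is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of Terrier.
final class DelimiterBlockSketch {

    /** Postings as "term:blockId" strings, plus the resulting document length. */
    static final class Result {
        final List<String> postings = new ArrayList<>();
        int docLength = 0;
    }

    /**
     * @param delimiters     marker terms (block.delimiters), already lowercased
     * @param indexTerms     mirrors block.delimiters.index.terms
     * @param indexDocLength mirrors block.delimiters.index.doclength
     */
    static Result process(List<String> tokens, Set<String> delimiters,
                          boolean indexTerms, boolean indexDocLength) {
        Result result = new Result();
        int blockId = 0;
        for (String raw : tokens) {
            String term = raw.toLowerCase(Locale.ROOT); // assumes lowercase=true
            if (delimiters.contains(term)) {
                if (indexTerms) {
                    result.postings.add(term + ":" + blockId);
                    // Document length is only affected when markers are indexed.
                    if (indexDocLength) {
                        result.docLength++;
                    }
                }
                blockId++; // a marker closes the current block
            } else {
                result.postings.add(term + ":" + blockId);
                result.docLength++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> delimiters = new HashSet<>(Arrays.asList("sent"));
        Result r = process(Arrays.asList("The", "cat", "SENT", "sat"),
                           delimiters, false, false);
        // prints [the:0, cat:0, sat:1] length=3
        System.out.println(r.postings + " length=" + r.docLength);
    }
}
```

In this sketch a marker is the only thing that advances the block counter, so block boundaries follow the structure of the text (sentences, for instance) rather than a fixed token count.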
Author:

Craig Macdonald, Vassilis Plachouras, Rodrygo Santos

Nested Class Summary

Nested Classes
Modifier and Type	Class	Description
`protected class`	`BlockIndexer.BasicTermProcessor`	This class implements an end of a TermPipeline that adds the term to the DocumentTree.
`protected class`	`BlockIndexer.DelimFieldTermProcessor`	This class behaves in a similar fashion to FieldTermProcessor except that this one treats blocks bounded by delimiters instead of fixed-sized blocks.
`protected class`	`BlockIndexer.DelimTermProcessor`	This class behaves in a similar fashion to BasicTermProcessor except that this one treats blocks bounded by delimiters instead of fixed-sized blocks.
`protected class`	`BlockIndexer.FieldTermProcessor`	This class implements an end of a TermPipeline that adds the term to the DocumentTree.

Field Summary

Fields
Modifier and Type	Field	Description
`protected int`	`BLOCK_SIZE`	The maximum number of terms allowed in a block.
`protected int`	`blockId`	The block number of the current document.
`protected CompressionFactory.CompressionConfiguration`	`compressionDirectConfig`	The compression configuration for the direct index
`protected CompressionFactory.CompressionConfiguration`	`compressionInvertedConfig`	The compression configuration for the inverted index
`protected int`	`MAX_BLOCKS`	The maximum allowed number of blocks in a document.
`protected int`	`numOfTokensInBlock`	The number of tokens in the current block of the current document.
`protected int`	`numOfTokensInDocument`	The number of tokens in the current document so far.
`protected TermCodes`	`termCodes`	Mapping of terms to term ids.
`protected java.util.Set<java.lang.String>`	`termFields`	The fields that are set for the current term.
`protected DocumentPostingList`	`termsInDocument`	The list of terms in this document, and for each, the block occurrences.

Fields inherited from class org.terrier.structures.indexing.Indexer
blocks, BUILDER_BOUNDARY_DOCUMENTS, currentIndex, directIndexBuilder, docIndexBuilder, emptyDocCount, emptyDocIndexEntry, externalParalllism, fieldNames, fileNameNoExtension, IndexEmptyDocuments, invertedIndexBuilder, lexiconBuilder, logger, MAX_DOCS_PER_BUILDER, MAX_TOKENS_IN_DOCUMENT, metaBuilder, numFields, path, pipeline_first, prefix, useFieldInformation

Constructor Summary

Constructors
Constructor	Description
`BlockIndexer(java.lang.String pathname, java.lang.String prefix)`	Constructs an instance of this class, where the created data structures are stored in the given path, with the given prefix on the filenames.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`void`	`createDirectIndex(Collection[] collections)`	Iterates through the documents of the given collections and creates the direct index, document index and lexicon, using information about blocks and possibly fields.
`protected void`	`createDocumentPostings()`
`void`	`createInvertedIndex()`	Creates the inverted index from the already created direct index, document index and lexicon.
`protected void`	`finishedInvertedIndexBuild()`	Hook method, called when the inverted index is finished, i.e. when the lexicon is finished.
`protected TermPipeline`	`getEndOfPipeline()`	Returns the object that is to be the end of the TermPipeline.
`protected void`	`indexDocument(java.util.Map<java.lang.String,java.lang.String> docProperties, DocumentPostingList _termsInDocument)`	This adds a document to the direct and document indexes, as well as its terms to the lexicon.
`protected void`	`load_indexer_properties()`

Methods inherited from class org.terrier.structures.indexing.Indexer
createMetaIndexBuilder, finishedDirectIndexBuild, getExternalParalllism, index, indexEmpty, init, load_builder_boundary_documents, load_field_ids, load_pipeline, main, merge, merge, mergeTwoIndices, parseInts, setExternalParalllism, useFieldInformation

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - numOfTokensInDocument
```
protected int numOfTokensInDocument
```
    The number of tokens in the current document so far.
  - numOfTokensInBlock
```
protected int numOfTokensInBlock
```
    The number of tokens in the current block of the current document.
  - blockId
```
protected int blockId
```
    The block number of the current document.
  - termFields
```
protected java.util.Set<java.lang.String> termFields
```
    The fields that are set for the current term.
  - termsInDocument
```
protected DocumentPostingList termsInDocument
```
    The list of terms in this document, and for each, the block occurrences.
  - termCodes
```
protected TermCodes termCodes
```
    Mapping of terms to term ids.
  - BLOCK_SIZE
```
protected int BLOCK_SIZE
```
    The maximum number of terms allowed in a block. See Property blocks.size
  - MAX_BLOCKS
```
protected int MAX_BLOCKS
```
    The maximum allowed number of blocks in a document. After this value, all the remaining terms are in the final block. See property blocks.max.
  - compressionDirectConfig
```
protected CompressionFactory.CompressionConfiguration compressionDirectConfig
```
    The compression configuration for the direct index
  - compressionInvertedConfig
```
protected CompressionFactory.CompressionConfiguration compressionInvertedConfig
```
    The compression configuration for the inverted index
- Constructor Detail
  - BlockIndexer
```
public BlockIndexer(java.lang.String pathname,
                    java.lang.String prefix)
```
    Constructs an instance of this class, where the created data structures are stored in the given path, with the given prefix on the filenames.
    
    Parameters:
    
    pathname - String the path in which the created data structures will be saved. This is assumed to be absolute.
    
    prefix - String the prefix on the filenames of the created data structures, usually "data"
- Method Detail
  - getEndOfPipeline
```
protected TermPipeline getEndOfPipeline()
```
    Returns the object that is to be the end of the TermPipeline. This method is used at construction time of the parent object.
    
    Specified by:
    
    getEndOfPipeline in class Indexer
    
    Returns:
    
    TermPipeline the last component of the term pipeline.
  - createDirectIndex
```
public void createDirectIndex(Collection[] collections)
```
    Iterates through the documents of the given collections and creates the direct index, document index and lexicon, using information about blocks and possibly fields.
    
    Specified by:
    
    createDirectIndex in class Indexer
    
    Parameters:
    
    collections - Collection[] the collections to index.
    
    See Also:
    
    Indexer.createDirectIndex(org.terrier.indexing.Collection[])
  - indexDocument
```
protected void indexDocument(java.util.Map<java.lang.String,java.lang.String> docProperties,
                             DocumentPostingList _termsInDocument)
                      throws java.lang.Exception
```
    This adds a document to the direct and document indexes, as well as its terms to the lexicon. Handled internally by the methods indexFieldDocument and indexNoFieldDocument.
    
    Parameters:
    
    docProperties - Map<String,String> properties of the document
    
    _termsInDocument - DocumentPostingList the terms in the document.
    
    Throws:
    
    java.lang.Exception
  - createInvertedIndex
```
public void createInvertedIndex()
```
    Creates the inverted index from the already created direct index, document index and lexicon. It saves block information and possibly field information as well.
    
    Specified by:
    
    createInvertedIndex in class Indexer
    
    See Also:
    
    Indexer.createInvertedIndex()
  - finishedInvertedIndexBuild
```
protected void finishedInvertedIndexBuild()
```
    Hook method, called when the inverted index is finished, i.e. when the lexicon is finished.
    
    Overrides:
    
    finishedInvertedIndexBuild in class Indexer
  - createDocumentPostings
```
protected void createDocumentPostings()
```
  - load_indexer_properties
```
protected void load_indexer_properties()
```
    Overrides:
    
    load_indexer_properties in class Indexer
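
As a rough conceptual picture of what createInvertedIndex does with block information, the following self-contained sketch inverts a toy in-memory "direct index" (document to terms to block ids) into an "inverted index" (term to documents to block ids). It is an assumption-laden illustration only: `InvertBlockPostingsSketch` and `invert` are hypothetical names, plain Java collections stand in for Terrier's on-disk structures, and compression, the lexicon and the document index are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Terrier.
final class InvertBlockPostingsSketch {

    /**
     * @param direct one entry per document (list index = docid), mapping each
     *               term in that document to the block ids where it occurs
     * @return term -> (docid -> block ids), with terms and docids sorted
     */
    static SortedMap<String, SortedMap<Integer, List<Integer>>> invert(
            List<Map<String, List<Integer>>> direct) {
        SortedMap<String, SortedMap<Integer, List<Integer>>> inverted = new TreeMap<>();
        for (int docId = 0; docId < direct.size(); docId++) {
            for (Map.Entry<String, List<Integer>> e : direct.get(docId).entrySet()) {
                inverted.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                        .put(docId, new ArrayList<>(e.getValue()));
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> doc0 = new HashMap<>();
        doc0.put("a", Arrays.asList(0, 2));
        doc0.put("b", Arrays.asList(1));
        Map<String, List<Integer>> doc1 = new HashMap<>();
        doc1.put("a", Arrays.asList(0));
        // prints {a={0=[0, 2], 1=[0]}, b={0=[1]}}
        System.out.println(invert(Arrays.asList(doc0, doc1)));
    }
}
```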
