Package org.terrier.structures
Class BaseCompressingMetaIndex
- java.lang.Object
  - org.terrier.structures.BaseCompressingMetaIndex
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, MetaIndex
- Direct Known Subclasses:
CompressingMetaIndex, LZ4CompressedMetaIndex, ZstdCompressedMetaIndex
public abstract class BaseCompressingMetaIndex extends java.lang.Object implements MetaIndex
A MetaIndex implementation that compresses its contents. Values have maximum lengths, but the overall value blobs are compressed. Subclasses differ in the particular compression algorithm used. From version 3.0, zlib deflate was the default.
- Since:
3.0
- Author:
Craig Macdonald & Vassilis Plachouras
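To make the description above concrete, here is a minimal sketch of the write side of such a layout in plain Java, not against Terrier's API. It assumes each key has a maximum value length in bytes, that shorter values are padded with zero bytes, and that an over-long value is rejected; the class and method names (`CompressedRecordWriterSketch`, `encode`, `recordLength`) are hypothetical. The whole padded record is then compressed with `java.util.zip.Deflater`, i.e. zlib deflate.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Hypothetical sketch: pad each value to its key's maximum byte length, then deflate the record. */
final class CompressedRecordWriterSketch {

    /** Uncompressed record length: the sum of the per-key maximum lengths. */
    static int recordLength(int[] maxByteLengths) {
        int total = 0;
        for (int len : maxByteLengths) {
            total += len;
        }
        return total;
    }

    /** Builds the fixed-length record (zero-padded, an assumption) and returns its zlib-deflated bytes. */
    static byte[] encode(String[] values, int[] maxByteLengths) {
        if (values.length != maxByteLengths.length) {
            throw new IllegalArgumentException("one value per key expected");
        }
        byte[] record = new byte[recordLength(maxByteLengths)]; // Java zero-fills new arrays
        int offset = 0;
        for (int i = 0; i < values.length; i++) {
            byte[] bytes = values[i].getBytes(StandardCharsets.UTF_8);
            if (bytes.length > maxByteLengths[i]) {
                // This sketch rejects over-long values; a real writer might truncate instead.
                throw new IllegalArgumentException("value too long for key " + i);
            }
            System.arraycopy(bytes, 0, record, offset, bytes.length);
            offset += maxByteLengths[i]; // each field starts at a fixed offset
        }
        Deflater deflater = new Deflater();
        deflater.setInput(record);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

Because every field sits at a fixed offset inside the uncompressed record, a reader only has to decompress one record to reach any of its values, while the zero padding largely disappears under compression.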
Nested Class Summary
- static class BaseCompressingMetaIndex.InputStream: An iterator for reading a MetaIndex as a stream.
Field Summary
- protected org.terrier.structures.BaseCompressingMetaIndex.ByteAccessor dataSource
- protected gnu.trove.TObjectIntHashMap<java.lang.String> key2bytelength
- protected gnu.trove.TObjectIntHashMap<java.lang.String> key2byteoffset
- protected gnu.trove.TObjectIntHashMap<java.lang.String> key2reverseOffset
- protected int keyCount
- protected FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[] keyFactories
- protected java.lang.String[] keyNames
- protected int numDocs
- protected org.terrier.structures.BaseCompressingMetaIndex.Docid2OffsetLookup offsetLookup
- protected java.lang.String path
- protected static java.lang.ThreadLocal<org.terrier.structures.BaseCompressingMetaIndex.OffsetPointer> pointerCache
- protected java.lang.String prefix
- protected int recordLength
- protected java.util.Map<org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable>[] reverseMetaMaps
- protected int[] valueByteLengths
- protected int[] valueByteOffsets
- protected boolean[] valuesSorted
Constructor Summary
- BaseCompressingMetaIndex(IndexOnDisk index, java.lang.String structureName): Construct an instance of the class with
Method Summary
- protected int _binarySearch(java.lang.String key, java.lang.String value): Performs a binary search on the metaindex, if the keys happen to be in lexicographical order.
- void close(): Closes the underlying structures.
- protected abstract byte[] decode(byte[] input)
- java.lang.String[] getAllItems(int docid): Obtain all metadata for the specified document.
- int getDocument(java.lang.String key, java.lang.String value): Obtain the docid of the document that has the specified metadata value for the specified key.
- java.lang.String getItem(java.lang.String Key, int docid): Obtain metadata of the specified type for the specified document.
- java.lang.String[] getItems(java.lang.String[] Keys, int docid): Obtain metadata of the specified types for the specified document.
- java.lang.String[][] getItems(java.lang.String[] Keys, int[] _docids): Obtain metadata of the specified types for the specified documents.
- java.lang.String[] getItems(java.lang.String Key, int[] _docids): Obtain metadata of the specified type for the specified documents.
- java.lang.String[] getKeys(): Returns the keys of this meta index.
- java.lang.String[] getReverseKeys(): Returns the reverse keys of this meta index.
- protected void loadIndex(IndexOnDisk index, java.lang.String structureName)
- static void main(java.lang.String[] args): main
- int size(): How many documents in this metaindex.
Field Detail
-
pointerCache
protected static final java.lang.ThreadLocal<org.terrier.structures.BaseCompressingMetaIndex.OffsetPointer> pointerCache
-
offsetLookup
protected org.terrier.structures.BaseCompressingMetaIndex.Docid2OffsetLookup offsetLookup
-
recordLength
protected int recordLength
-
keyNames
protected java.lang.String[] keyNames
-
key2byteoffset
protected gnu.trove.TObjectIntHashMap<java.lang.String> key2byteoffset
-
key2bytelength
protected gnu.trove.TObjectIntHashMap<java.lang.String> key2bytelength
-
key2reverseOffset
protected gnu.trove.TObjectIntHashMap<java.lang.String> key2reverseOffset
-
keyCount
protected int keyCount
-
valueByteOffsets
protected int[] valueByteOffsets
-
valueByteLengths
protected int[] valueByteLengths
-
valuesSorted
protected boolean[] valuesSorted
-
numDocs
protected int numDocs
-
path
protected final java.lang.String path
-
prefix
protected final java.lang.String prefix
-
dataSource
protected final org.terrier.structures.BaseCompressingMetaIndex.ByteAccessor dataSource
-
reverseMetaMaps
protected java.util.Map<org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable>[] reverseMetaMaps
-
keyFactories
protected FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[] keyFactories
Constructor Detail
-
BaseCompressingMetaIndex
public BaseCompressingMetaIndex(IndexOnDisk index, java.lang.String structureName) throws java.io.IOException
Construct an instance of the class with
- Parameters:
index
structureName
- Throws:
java.io.IOException
Method Detail
-
size
public int size()
Description copied from interface: MetaIndex
How many documents in this metaindex.
-
getKeys
public java.lang.String[] getKeys()
Returns the keys of this meta index
-
close
public void close() throws java.io.IOException
Closes the underlying structures.
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
-
getReverseKeys
public java.lang.String[] getReverseKeys()
Returns the reverse keys of this meta index.
- Specified by:
getReverseKeys in interface MetaIndex
-
_binarySearch
protected int _binarySearch(java.lang.String key, java.lang.String value) throws java.io.IOException
Performs a binary search on the metaindex, if the keys happen to be in lexicographical order.
- Throws:
java.io.IOException
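The idea behind `_binarySearch` can be illustrated with a minimal sketch, assuming the values of one key, read in docid order, are already in ascending lexicographic order as compared by `String.compareTo`. The `IntFunction` parameter is a hypothetical stand-in for decoding a single document's value; the class name is illustrative and this is not Terrier's implementation.

```java
import java.util.function.IntFunction;

/** Hypothetical sketch of a lexicographic binary search over docid-ordered metadata values. */
final class MetaBinarySearchSketch {

    /**
     * Returns the docid whose value equals target, or -1 if no document has it.
     * Assumes valueForDocid.apply(0..numDocs-1) is sorted in ascending String order.
     */
    static int binarySearch(IntFunction<String> valueForDocid, int numDocs, String target) {
        int low = 0;
        int high = numDocs - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            int cmp = valueForDocid.apply(mid).compareTo(target); // one record decode per probe
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }
}
```

Each probe costs one record decode, so a lookup touches roughly log2(numDocs) records instead of scanning every document; the approach is only valid when the sortedness assumption holds.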
-
getDocument
public int getDocument(java.lang.String key, java.lang.String value) throws java.io.IOException
Obtain the docid of the document that has the specified metadata value for the specified key. Returns -1 if the value cannot be found for the specified key.
- Specified by:
getDocument in interface MetaIndex
- Throws:
java.io.IOException
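One way to honour the -1 contract described above, when the values of a key are not sorted, is a hash-based reverse index. The sketch below is a hypothetical in-memory illustration for a single reverse key; it assumes each value identifies at most one document and does not reproduce how Terrier builds or stores its reverse maps.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: an in-memory reverse index (value -> docid) for one reverse key. */
final class ReverseLookupSketch {

    private final Map<String, Integer> valueToDocid = new HashMap<>();

    /** valuesByDocid[d] is the value of the reverse key for docid d; values must be unique. */
    ReverseLookupSketch(String[] valuesByDocid) {
        for (int docid = 0; docid < valuesByDocid.length; docid++) {
            Integer previous = valueToDocid.putIfAbsent(valuesByDocid[docid], docid);
            if (previous != null) {
                throw new IllegalArgumentException("duplicate value: " + valuesByDocid[docid]);
            }
        }
    }

    /** Returns the docid holding value, or -1 if no document has it. */
    int getDocument(String value) {
        return valueToDocid.getOrDefault(value, -1);
    }
}
```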
-
getItems
public java.lang.String[] getItems(java.lang.String Key, int[] _docids) throws java.io.IOException
Obtain metadata of the specified type for the specified documents. In this implementation, the docids are looked up in sorted order to improve disk cache hits; the _docids array itself is left unchanged.
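The sorting behaviour described above can be sketched as follows: sort a permutation of positions by docid, perform the lookups in ascending docid order, and write each result back to its original position so the caller's array is never reordered. The `lookup` function and the class name are hypothetical placeholders for per-document metadata retrieval, not Terrier's code.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

/** Hypothetical sketch: batch lookups done in ascending docid order, results in caller order. */
final class SortedBatchLookupSketch {

    static String[] getItems(IntFunction<String> lookup, int[] docids) {
        // Positions 0..n-1, ordered by the docid stored at each position.
        Integer[] order = new Integer[docids.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Integer.compare(docids[a], docids[b]));

        // Visit documents in ascending docid order, but store each result at the
        // position the caller asked for; the docids array itself is never modified.
        String[] results = new String[docids.length];
        for (int pos : order) {
            results[pos] = lookup.apply(docids[pos]);
        }
        return results;
    }
}
```

Sorting an index permutation rather than the docids themselves is what keeps `_docids` unchanged while the reads still proceed in ascending order.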
-
getItems
public java.lang.String[][] getItems(java.lang.String[] Keys, int[] _docids) throws java.io.IOException
Obtain metadata of the specified types for the specified documents. The returned array is indexed first by document, then by meta key. In this implementation, the docids are looked up in sorted order to improve disk cache hits; the _docids array itself is left unchanged.
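As a small illustration of the result layout described above (first dimension documents, second dimension keys), this hypothetical sketch reads from an in-memory map instead of a compressed index and omits the sorted-lookup optimisation; `MultiKeyBatchSketch` is an illustrative name.

```java
import java.util.Map;

/** Hypothetical sketch of the [document][key] result layout for multi-key, multi-document lookups. */
final class MultiKeyBatchSketch {

    /** result[i][j] holds the value of keys[j] for docids[i]. */
    static String[][] getItems(Map<Integer, Map<String, String>> metaByDocid,
                               String[] keys, int[] docids) {
        String[][] result = new String[docids.length][keys.length];
        for (int i = 0; i < docids.length; i++) {
            Map<String, String> meta = metaByDocid.get(docids[i]);
            if (meta == null) {
                throw new IllegalArgumentException("unknown docid " + docids[i]);
            }
            for (int j = 0; j < keys.length; j++) {
                result[i][j] = meta.get(keys[j]); // null if this document lacks the key
            }
        }
        return result;
    }
}
```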
-
decode
protected abstract byte[] decode(byte[] input) throws java.io.IOException
- Throws:
java.io.IOException
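A subclass supplies `decode` for its own codec. As a hedged sketch, a zlib-based variant could inflate the record with `java.util.zip.Inflater` and report corrupt input as an `IOException`; `CodecBaseSketch` and `ZlibCodecSketch` are hypothetical names, and this is not the code of Terrier's `CompressingMetaIndex`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical stand-in for the abstract base class: subclasses choose the codec. */
abstract class CodecBaseSketch {
    protected abstract byte[] decode(byte[] input) throws IOException;
}

/** Hypothetical zlib (deflate) variant of decode. */
final class ZlibCodecSketch extends CodecBaseSketch {

    @Override
    protected byte[] decode(byte[] input) throws IOException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IOException("truncated or unsupported zlib stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IOException("corrupt zlib data", e);
        } finally {
            inflater.end(); // release native zlib resources
        }
    }

    /** Test helper only: zlib-compresses raw bytes. */
    static byte[] encode(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

Keeping `decode` abstract lets the base class own record layout and lookup while each subclass swaps only the codec.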
-
getItem
public java.lang.String getItem(java.lang.String Key, int docid) throws java.io.IOException
Obtain metadata of the specified type for the specified document.
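Once a record has been decoded, fetching one key's value amounts to slicing the record at that key's byte offset and length, much as the `key2byteoffset` and `key2bytelength` fields suggest. This sketch assumes zero-byte padding and UTF-8 values; `FieldSliceSketch` is a hypothetical name, not Terrier's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: slice one key's value out of a decoded fixed-length record. */
final class FieldSliceSketch {

    private final Map<String, Integer> keyToOffset = new HashMap<>();
    private final Map<String, Integer> keyToLength = new HashMap<>();

    /** Fields are laid out back to back, each occupying its key's maximum byte length. */
    FieldSliceSketch(String[] keyNames, int[] maxByteLengths) {
        int offset = 0;
        for (int i = 0; i < keyNames.length; i++) {
            keyToOffset.put(keyNames[i], offset);
            keyToLength.put(keyNames[i], maxByteLengths[i]);
            offset += maxByteLengths[i];
        }
    }

    /** Returns the value for key in an uncompressed record, without trailing zero padding. */
    String getItem(String key, byte[] decodedRecord) {
        Integer boxedOffset = keyToOffset.get(key);
        if (boxedOffset == null) {
            throw new IllegalArgumentException("unknown key: " + key);
        }
        int start = boxedOffset;
        int last = start + keyToLength.get(key);
        while (last > start && decodedRecord[last - 1] == 0) {
            last--; // strip the assumed zero padding
        }
        return new String(decodedRecord, start, last - start, StandardCharsets.UTF_8);
    }
}
```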
-
getItems
public java.lang.String[] getItems(java.lang.String[] Keys, int docid) throws java.io.IOException
Obtain metadata of the specified types for the specified document.
-
getAllItems
public java.lang.String[] getAllItems(int docid) throws java.io.IOException
Obtain all metadata for the specified document.
- Specified by:
getAllItems in interface MetaIndex
- Throws:
java.io.IOException
-
loadIndex
protected void loadIndex(IndexOnDisk index, java.lang.String structureName) throws java.io.IOException
- Throws:
java.io.IOException
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
main
- Parameters:
args
- Throws:
java.lang.Exception