Class LexiconOutputStream<KEY>

  • Type Parameters:
    KEY - the type of the key, usually the term
    All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable
    Direct Known Subclasses:
    FSOMapFileLexiconOutputStreamGeneric

    public abstract class LexiconOutputStream<KEY>
    extends java.lang.Object
    implements java.io.Closeable
    This class implements an output stream for the lexicon structure.
    Author:
    Vassilis Plachouras & Craig Macdonald
    • Field Detail

      • lexiconStream

        protected java.io.DataOutput lexiconStream
        The data output to which the lexicon entries are written.
      • numPointersWritten

        protected long numPointersWritten
        The number of pointers written: the sum of the Nts (document frequencies) of the entries written so far.
      • numTokensWritten

        protected long numTokensWritten
        The collection length: the sum of the TFs (term frequencies) of the entries written so far.
      • numTermsWritten

        protected int numTermsWritten
        The number of terms written so far.
    • Constructor Detail

      • LexiconOutputStream

        protected LexiconOutputStream()
    • Method Detail

      • close

        public void close()
        Closes the lexicon stream. An IOException may arise if an I/O error occurs while closing the stream, although the method signature does not declare it.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
      • writeNextEntry

        public abstract int writeNextEntry(KEY _key,
                                           LexiconEntry _value)
                                    throws java.io.IOException
        Writes a lexicon entry.
        Parameters:
        _key - the key - usually the term
        _value - the lexicon entry value
        Returns:
        the number of bytes written to the file.
        Throws:
        java.io.IOException - if an I/O error occurs
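The contract above (write one entry, return the number of bytes written) can be sketched as follows. This is a minimal illustration, not the library's implementation: EntrySketch, LexiconOutputSketch and StringLexiconSketch are hypothetical stand-ins for LexiconEntry, LexiconOutputStream and a concrete subclass, and the byte layout (UTF term followed by two ints) is an assumption chosen only to make the byte count visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for LexiconEntry: only the two statistics named on this page.
final class EntrySketch {
    final int nt; // Nt of the term
    final int tf; // TF of the term

    EntrySketch(int nt, int tf) {
        this.nt = nt;
        this.tf = tf;
    }
}

// Hypothetical stand-in mirroring the documented shape of LexiconOutputStream<KEY>.
abstract class LexiconOutputSketch<KEY> implements Closeable {
    protected DataOutput lexiconStream;

    public abstract int writeNextEntry(KEY key, EntrySketch value) throws IOException;
}

// Assumed layout, for illustration only: the term as modified UTF-8, then Nt and TF as ints.
final class StringLexiconSketch extends LexiconOutputSketch<String> {
    private final DataOutputStream out;

    StringLexiconSketch(OutputStream sink) {
        this.out = new DataOutputStream(sink);
        this.lexiconStream = out;
    }

    @Override
    public int writeNextEntry(String key, EntrySketch value) throws IOException {
        int before = out.size();      // bytes written before this entry
        out.writeUTF(key);
        out.writeInt(value.nt);
        out.writeInt(value.tf);
        return out.size() - before;   // bytes written for this entry alone
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}

final class WriteNextEntryDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (StringLexiconSketch lexicon = new StringLexiconSketch(sink)) {
            int written = lexicon.writeNextEntry("apple", new EntrySketch(2, 5));
            System.out.println(written + " bytes written");
        }
    }
}
```

Because the stream implements java.io.Closeable, it can be managed with try-with-resources, as the demo does.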
      • incrementCounters

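        protected void incrementCounters(EntryStatistics t)
        Updates the counters of terms, pointers and tokens written, using the statistics of the given entry.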
      • getNumberOfPointersWritten

        public long getNumberOfPointersWritten()
        Returns the number of pointers there would be in an inverted index built using this lexicon (thus far). This is equal to the sum of the Nts written to this lexicon output stream.
      • getNumberOfTokensWritten

        public long getNumberOfTokensWritten()
        Returns the number of tokens there are in the entire collection represented by this lexicon (thus far). This is equal to the sum of the TFs written to this lexicon output stream.
      • getNumberOfTermsWritten

        public int getNumberOfTermsWritten()
        Returns the number of terms written so far by this LexiconOutputStream.
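As a worked example of the three counters described above, the sketch below accumulates them for three entries. LexiconCountersSketch is a hypothetical stand-in, and the update rule (one term per entry, plus that entry's Nt and TF) is an assumption inferred from the field descriptions rather than taken from the library source.

```java
// Hypothetical stand-in for the three counters documented on this page.
final class LexiconCountersSketch {
    private long numPointersWritten; // sum of the Nts
    private long numTokensWritten;   // sum of the TFs
    private int numTermsWritten;     // one per entry written

    // Assumed update rule: each entry adds one term, its Nt pointers and its TF tokens.
    void increment(int nt, int tf) {
        numTermsWritten++;
        numPointersWritten += nt;
        numTokensWritten += tf;
    }

    long getNumberOfPointersWritten() { return numPointersWritten; }

    long getNumberOfTokensWritten() { return numTokensWritten; }

    int getNumberOfTermsWritten() { return numTermsWritten; }
}

final class LexiconCountersDemo {
    public static void main(String[] args) {
        LexiconCountersSketch counters = new LexiconCountersSketch();
        counters.increment(2, 5); // a term with Nt = 2, TF = 5
        counters.increment(1, 1); // a term with Nt = 1, TF = 1
        counters.increment(3, 7); // a term with Nt = 3, TF = 7

        System.out.println("terms    = " + counters.getNumberOfTermsWritten());    // 3
        System.out.println("pointers = " + counters.getNumberOfPointersWritten()); // 2 + 1 + 3 = 6
        System.out.println("tokens   = " + counters.getNumberOfTokensWritten());   // 5 + 1 + 7 = 13
    }
}
```

The values reflect only what has been written "thus far": reading the getters between two writes returns the partial sums.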