Class BasicLexiconEntry

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  BasicLexiconEntry.Factory
      Factory for creating LexiconEntry objects
    • Field Summary

      Fields 
      Modifier and Type Field Description
      int maxtf
      the maximum in-document term frequency of the term
      int n_t
      the number of documents that this entry occurs in
      byte startBitOffset
      the start bit offset of the entry in the inverted index
      long startOffset
      the start offset of the entry in the inverted index
      int termId
      the termid of this entry
      int TF
      the total number of occurrences of the term in the index
    • Constructor Summary

      Constructors 
      Constructor Description
      BasicLexiconEntry()
      Create an empty LexiconEntry
      BasicLexiconEntry​(int tid, int _n_t, int _TF)
      Create a lexicon entry with the following information.
      BasicLexiconEntry​(int tid, int _n_t, int _TF, byte fileId, long _startOffset, byte _startBitOffset)
      Create a lexicon entry with the following information.
      BasicLexiconEntry​(int tid, int _n_t, int _TF, int _maxtf, byte fileId, BitFilePosition offset)
      Create a lexicon entry with the following information.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void add​(EntryStatistics le)
      Increment the statistics of this lexicon entry by those of another
      int getDocumentFrequency()
      Return the number of documents that the term occurs in.
      byte getFileNumber()
      Returns the file number (byte value in the 0-31 range)
      int getFrequency()
      Return the frequency (total number of occurrences) of the term.
      int getMaxFrequencyInDocuments()
      Return the maximum in-document term frequency of the term among all documents the term appears in.
      int getNumberOfEntries()
      Pointer implementation: how many entries in the inverted index.
      long getOffset()
      Return the number of bytes offset.
      byte getOffsetBits()
      Return the number of bits offset.
      int getTermId()
      Return the id of the term.
      java.lang.String pointerToString()
      Returns a textual representation of the pointer alone
      void readFields​(java.io.DataInput in)
      void setBitIndexPointer​(BitIndexPointer pointer)
      Update this pointer to reflect the same values as the specified pointer
      void setDocumentFrequency​(int nt)
      Set the number of documents that the term occurs in.
      void setFileNumber​(byte fileId)
      Set the file number.
      void setFrequency​(int F)
      Set the frequency (total number of occurrences) of the term.
      void setMaxFrequencyInDocuments​(int max)
      Set the maximum in-document term frequency of the term among all documents the term appears in.
      void setNumberOfEntries​(int n)
      Update the number of entries in the pointer
      void setOffset​(long bytes, byte bits)
      Set the offset in number of bytes and number of bits.
      void setOffset​(BitFilePosition pos)
      Sets the bit file position within this object to that represented by the specified bit file position.
      void setPointer​(Pointer p)
      Update the pointer
      void setStatistics​(int _n_t, int _TF)
      Set the term statistics, in particular, the number of documents that this term appears in and the total number of occurrences of the term.
      void setTermId​(int newTermId)
      Sets the ID for this term
      void subtract​(EntryStatistics le)
      Decrement the statistics of this lexicon entry by those of another
      void write​(java.io.DataOutput out)
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, wait, wait, wait
    • Field Detail

      • maxtf

        public int maxtf
        the maximum in-document term frequency of the term
      • termId

        public int termId
        the termid of this entry
      • n_t

        public int n_t
        the number of documents that this entry occurs in
      • TF

        public int TF
        the total number of occurrences of the term in the index
      • startOffset

        public long startOffset
        the start offset of the entry in the inverted index
      • startBitOffset

        public byte startBitOffset
        the start bit offset of the entry in the inverted index
    • Constructor Detail

      • BasicLexiconEntry

        public BasicLexiconEntry()
        Create an empty LexiconEntry
      • BasicLexiconEntry

        public BasicLexiconEntry​(int tid,
                                 int _n_t,
                                 int _TF)
        Create a lexicon entry with the following information.
        Parameters:
        tid - the term id
        _n_t - the number of documents the term occurs in (document frequency)
        _TF - the total count of term t in the collection
      • BasicLexiconEntry

        public BasicLexiconEntry​(int tid,
                                 int _n_t,
                                 int _TF,
                                 byte fileId,
                                 long _startOffset,
                                 byte _startBitOffset)
        Create a lexicon entry with the following information.
        Parameters:
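        tid - the term id
        _n_t - the number of documents the term occurs in (document frequency)
        _TF - the total count of term t in the collection
        fileId - the file number
        _startOffset - the start byte offset of the entry in the inverted index
        _startBitOffset - the start bit offset of the entry in the inverted index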
      • BasicLexiconEntry

        public BasicLexiconEntry​(int tid,
                                 int _n_t,
                                 int _TF,
                                 int _maxtf,
                                 byte fileId,
                                 BitFilePosition offset)
        Create a lexicon entry with the following information.
        Parameters:
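        tid - the term id
        _n_t - the number of documents the term occurs in (document frequency)
        _TF - the total count of term t in the collection
        _maxtf - the maximum in-document term frequency of the term
        fileId - the file number
        offset - the bit file position of the entry in the inverted index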
    • Method Detail

      • setStatistics

        public void setStatistics​(int _n_t,
                                  int _TF)
        Set the term statistics, in particular, the number of documents that this term appears in and the total number of occurrences of the term.
        Specified by:
        setStatistics in class LexiconEntry
      • add

        public void add​(EntryStatistics le)
        Increment the statistics of this lexicon entry by those of another
        Specified by:
        add in interface EntryStatistics
        Parameters:
        le - the other object whose statistics are used to increment the statistics of this object.
      • subtract

        public void subtract​(EntryStatistics le)
        Decrement the statistics of this lexicon entry by those of another
        Specified by:
        subtract in interface EntryStatistics
        Parameters:
        le - the other object whose statistics are used to decrement the statistics of this object.
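The effect of add and subtract on the term statistics can be sketched as follows. This is a minimal illustration and not the library's implementation: EntryStatsSketch is a hypothetical stand-in class, and two behaviours are assumptions, namely that add keeps the larger maxtf and that subtract leaves maxtf unchanged.

```java
// Hypothetical stand-in for illustration only; this is NOT BasicLexiconEntry.
class EntryStatsSketch {
    int n_t;    // number of documents the term occurs in
    int TF;     // total number of occurrences of the term
    int maxtf;  // maximum in-document term frequency

    EntryStatsSketch(int n_t, int TF, int maxtf) {
        this.n_t = n_t;
        this.TF = TF;
        this.maxtf = maxtf;
    }

    // Assumed semantics: the counts are summed and maxtf keeps the larger value.
    void add(EntryStatsSketch other) {
        n_t += other.n_t;
        TF += other.TF;
        maxtf = Math.max(maxtf, other.maxtf);
    }

    // Assumed semantics: the counts are reduced. maxtf is left untouched because
    // the new maximum cannot be recovered from the two summaries alone.
    void subtract(EntryStatsSketch other) {
        n_t -= other.n_t;
        TF -= other.TF;
    }
}
```

Summing entries in this way is the kind of operation needed when the statistics for the same term are combined from several partial indices.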
      • getDocumentFrequency

        public int getDocumentFrequency()
        Return the number of documents that the term occurs in.
        Specified by:
        getDocumentFrequency in interface EntryStatistics
        Returns:
        the number of documents that the term occurs in.
      • getFrequency

        public int getFrequency()
        Return the frequency (total number of occurrences) of the term.
        Specified by:
        getFrequency in interface EntryStatistics
        Returns:
        the frequency (total number of occurrences) of the entry (term).
      • getTermId

        public int getTermId()
        Return the id of the term.
        Specified by:
        getTermId in interface EntryStatistics
        Returns:
        the id of the term.
      • getNumberOfEntries

        public int getNumberOfEntries()
        Pointer implementation: how many entries in the inverted index. Usually the same as getDocumentFrequency().
        Specified by:
        getNumberOfEntries in interface Pointer
        Overrides:
        getNumberOfEntries in class LexiconEntry
        Returns:
        the number of "things" that this pointer refers to.
      • getOffsetBits

        public byte getOffsetBits()
        Return the number of bits offset.
        Specified by:
        getOffsetBits in interface BitFilePosition
        Returns:
        the number of bits offset.
      • getOffset

        public long getOffset()
        Return the number of bytes offset.
        Specified by:
        getOffset in interface BitFilePosition
        Returns:
        the number of bytes offset.
      • getFileNumber

        public byte getFileNumber()
        Returns the file number (byte value in the 0-31 range)
        Specified by:
        getFileNumber in interface BitIndexPointer
        Returns:
        the file number (byte value in the 0-31 range)
      • setFileNumber

        public void setFileNumber​(byte fileId)
        Set the file number.
        Specified by:
        setFileNumber in interface BitIndexPointer
        Parameters:
        fileId - the file number.
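The documented 0-31 range fits in five bits, which would leave three bits of a byte free for a bit offset of 0-7. The sketch below shows how a file number and a bit offset could share one byte under that assumption. The class name, the constants, and the layout are hypothetical; the source only states the range.

```java
// Hypothetical packing scheme for illustration; the layout is an assumption.
class PackedPointerByteSketch {
    static final int BIT_OFFSET_BITS = 3;   // assumed: bit offsets 0-7 use the low 3 bits
    static final int MAX_FILE_NUMBER = 31;  // from the documented 0-31 range

    // Pack a file number (high 5 bits) and a bit offset (low 3 bits) into one byte.
    static byte pack(int fileNumber, int bitOffset) {
        if (fileNumber < 0 || fileNumber > MAX_FILE_NUMBER) {
            throw new IllegalArgumentException("file number out of range: " + fileNumber);
        }
        if (bitOffset < 0 || bitOffset > 7) {
            throw new IllegalArgumentException("bit offset out of range: " + bitOffset);
        }
        return (byte) ((fileNumber << BIT_OFFSET_BITS) | bitOffset);
    }

    // Mask to an unsigned value first, because Java bytes are signed.
    static byte fileNumber(byte packed) {
        return (byte) ((packed & 0xFF) >>> BIT_OFFSET_BITS);
    }

    static byte bitOffset(byte packed) {
        return (byte) (packed & 0x07);
    }
}
```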
      • setTermId

        public void setTermId​(int newTermId)
        Sets the ID for this term
        Specified by:
        setTermId in class LexiconEntry
      • getMaxFrequencyInDocuments

        public int getMaxFrequencyInDocuments()
        Description copied from interface: EntryStatistics
        Return the maximum in-document term frequency of the term among all documents the term appears in.
        Specified by:
        getMaxFrequencyInDocuments in interface EntryStatistics
        Returns:
        the maximum in-document term frequency of the term among all documents the term appears in.
      • setMaxFrequencyInDocuments

        public void setMaxFrequencyInDocuments​(int max)
        Description copied from interface: EntryStatistics
        Set the maximum in-document term frequency of the term among all documents the term appears in.
        Specified by:
        setMaxFrequencyInDocuments in interface EntryStatistics
        Parameters:
        max - the maximum in-document term frequency of the term among all documents the term appears in.
      • setOffset

        public void setOffset​(long bytes,
                              byte bits)
        Set the offset in number of bytes and number of bits.
        Specified by:
        setOffset in interface BitFilePosition
        Parameters:
        bytes - the number of bytes to set.
        bits - the number of bits to set.
      • setBitIndexPointer

        public void setBitIndexPointer​(BitIndexPointer pointer)
        Update this pointer to reflect the same values as the specified pointer
        Specified by:
        setBitIndexPointer in interface BitIndexPointer
        Parameters:
        pointer - the pointer to use to set the byte offset, bit offset and file number parameters.
      • setOffset

        public void setOffset​(BitFilePosition pos)
        Sets the bit file position within this object to that represented by the specified bit file position.
        Specified by:
        setOffset in interface BitFilePosition
        Parameters:
        pos - the bit file position whose values are copied into this object.
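The two setOffset overloads describe a position as a byte count plus a bit count. A minimal sketch of that idea follows, using a hypothetical BitPositionSketch class rather than the real BitFilePosition interface. The absoluteBit helper is an illustrative addition and assumes the bit count refers to a bit within the addressed byte.

```java
// Hypothetical stand-in for illustration only; not the BitFilePosition interface.
class BitPositionSketch {
    long bytes;  // offset in whole bytes
    byte bits;   // additional offset in bits (assumed to be 0-7)

    // Set the position from explicit byte and bit components.
    void setOffset(long bytes, byte bits) {
        this.bytes = bytes;
        this.bits = bits;
    }

    // Copy the position held by another object.
    void setOffset(BitPositionSketch pos) {
        setOffset(pos.bytes, pos.bits);
    }

    // Illustrative helper: the position expressed as a single bit index.
    long absoluteBit() {
        return bytes * 8L + bits;
    }
}
```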
      • readFields

        public void readFields​(java.io.DataInput in)
                        throws java.io.IOException
        Specified by:
        readFields in interface org.apache.hadoop.io.Writable
        Throws:
        java.io.IOException
      • write

        public void write​(java.io.DataOutput out)
                   throws java.io.IOException
        Specified by:
        write in interface org.apache.hadoop.io.Writable
        Throws:
        java.io.IOException
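readFields and write follow the usual Writable contract: write serialises the fields to a DataOutput, and readFields restores them in the same order from a DataInput. The round trip below is a sketch using only java.io. WritableEntrySketch is a hypothetical stand-in, and the field order and widths are assumptions made for the example, not the on-disk format of the real class.

```java
// Hypothetical stand-in for illustration only; the field order is an assumption.
class WritableEntrySketch {
    int termId, n_t, TF, maxtf;
    long startOffset;
    byte startBitOffset;

    // Serialise every field in a fixed order.
    void write(java.io.DataOutput out) throws java.io.IOException {
        out.writeInt(termId);
        out.writeInt(TF);
        out.writeInt(n_t);
        out.writeInt(maxtf);
        out.writeLong(startOffset);
        out.writeByte(startBitOffset);
    }

    // Restore the fields in exactly the order write emitted them.
    void readFields(java.io.DataInput in) throws java.io.IOException {
        termId = in.readInt();
        TF = in.readInt();
        n_t = in.readInt();
        maxtf = in.readInt();
        startOffset = in.readLong();
        startBitOffset = in.readByte();
    }

    // Serialise to an in-memory buffer.
    static byte[] toBytes(WritableEntrySketch e) {
        java.io.ByteArrayOutputStream buf = new java.io.ByteArrayOutputStream();
        try {
            e.write(new java.io.DataOutputStream(buf));
        } catch (java.io.IOException ex) {
            throw new java.io.UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    // Rebuild an entry from bytes produced by toBytes.
    static WritableEntrySketch fromBytes(byte[] data) {
        WritableEntrySketch copy = new WritableEntrySketch();
        try {
            copy.readFields(new java.io.DataInputStream(new java.io.ByteArrayInputStream(data)));
        } catch (java.io.IOException ex) {
            throw new java.io.UncheckedIOException(ex);
        }
        return copy;
    }
}
```

Reading the fields in a different order from the one used by write would silently corrupt the entry, so the two methods have to be kept in step.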
      • setNumberOfEntries

        public void setNumberOfEntries​(int n)
        Update the number of entries in the pointer
        Specified by:
        setNumberOfEntries in interface Pointer
        Overrides:
        setNumberOfEntries in class LexiconEntry
        Parameters:
        n - the number of "things" that the pointer refers to.
      • setPointer

        public void setPointer​(Pointer p)
        Update the pointer
        Specified by:
        setPointer in interface Pointer
        Overrides:
        setPointer in class LexiconEntry
        Parameters:
        p - the pointer whose values are copied into this object.
      • setFrequency

        public void setFrequency​(int F)
        Description copied from interface: EntryStatistics
        Set the frequency (total number of occurrences) of the term.
        Specified by:
        setFrequency in interface EntryStatistics
        Parameters:
        F - the frequency (total number of occurrences) of the entry (term).
      • setDocumentFrequency

        public void setDocumentFrequency​(int nt)
        Description copied from interface: EntryStatistics
        Set the number of documents that the term occurs in.
        Specified by:
        setDocumentFrequency in interface EntryStatistics
        Parameters:
        nt - the number of documents that the term occurs in.