Class PostingListManager

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class PostingListManager
    extends java.lang.Object
    implements java.io.Closeable
    The PostingListManager is responsible for opening the appropriate posting lists (IterablePosting) for a given MatchingQueryTerms object. It also knows how each posting should be scored.

    Plugins are also supported by PostingListManager. Each plugin class should implement the PostingListManagerPlugin interface, and be named explicitly in the matching.postinglist.manager.plugins property.

    Properties:

    • ignore.low.idf.terms - should terms with low IDF (i.e. very frequent) be ignored? Defaults to false, i.e. such terms are not ignored.
    • matching.postinglist.manager.plugins - Comma delimited list of PostingListManagerPlugin classes to load.
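
The two properties above can be illustrated with a minimal sketch that uses plain java.util.Properties rather than Terrier's own configuration classes. The class PluginPropertySketch, its method names, and the com.example plugin classes are hypothetical and exist only for illustration; how Terrier itself parses the plugin list is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative only: mirrors the two documented properties with java.util.Properties,
// not with Terrier's own configuration classes.
class PluginPropertySketch {

    /** Splits the comma-delimited property value into trimmed, non-empty class names. */
    static List<String> pluginClassNames(Properties props) {
        String raw = props.getProperty("matching.postinglist.manager.plugins", "");
        List<String> names = new ArrayList<>();
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Reads ignore.low.idf.terms, falling back to the documented default of false. */
    static boolean ignoreLowIdfTerms(Properties props) {
        return Boolean.parseBoolean(props.getProperty("ignore.low.idf.terms", "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // com.example.* are hypothetical plugin class names.
        props.setProperty("matching.postinglist.manager.plugins",
                "com.example.FirstPlugin, com.example.SecondPlugin");
        System.out.println(pluginClassNames(props));
        System.out.println(ignoreLowIdfTerms(props));
    }
}
```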

    Example Usage

    The following code shows how term-at-a-time matching may be performed using the PostingListManager:
     
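     MatchingQueryTerms mqt;  // query terms, obtained elsewhere
     Index index;             // an opened index, obtained elsewhere
     PostingListManager plm = new PostingListManager(index, index.getCollectionStatistics(), mqt);
     plm.prepare(false);
     // process one posting list (term) at a time
     for(int term = 0; term < plm.size(); term++)
     {
       IterablePosting ip = plm.getPosting(term);
       while(ip.next() != IterablePosting.EOL)
       {
         double score = plm.score(term);  // score of the current posting for this term
         int id = ip.getId();             // document id of the current posting
       }
     }
     plm.close();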
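
The loop above obtains a score and a document id for each posting but leaves open what is done with them. In term-at-a-time matching, the usual step is to add each score to a per-document accumulator. The following is a minimal sketch of that step under stated assumptions: the posting lists are stand-in arrays, the class TermAtATimeSketch is hypothetical, and no Terrier classes are involved.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in data only: each term's posting list is an int[] of document ids with a
// parallel double[] of per-posting scores. No Terrier classes are used.
class TermAtATimeSketch {

    /** Processes one posting list at a time, adding each posting's score to its document's accumulator. */
    static Map<Integer, Double> accumulate(int[][] docIds, double[][] scores) {
        Map<Integer, Double> accumulators = new HashMap<>();
        for (int term = 0; term < docIds.length; term++) {
            for (int p = 0; p < docIds[term].length; p++) {
                // create the accumulator on first sight of a document, otherwise add to it
                accumulators.merge(docIds[term][p], scores[term][p], Double::sum);
            }
        }
        return accumulators;
    }

    public static void main(String[] args) {
        int[][] docIds = { {1, 4}, {4, 7} };              // two terms, two postings each
        double[][] scores = { {0.5, 1.0}, {2.0, 0.25} };  // made-up scores
        System.out.println(accumulate(docIds, scores));
    }
}
```

Document 4 appears in both stand-in posting lists, so its accumulator receives a contribution from each term.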
     
    Since:
    3.5
    Author:
    Nicola Tonellotto and Craig Macdonald
    See Also:
    Matching
    • Field Detail

      • logger

        protected static final org.slf4j.Logger logger
      • IGNORE_LOW_IDF_TERMS

        protected static boolean IGNORE_LOW_IDF_TERMS
        A property that enables ignoring terms with a low IDF. Controlled by the ignore.low.idf.terms property; defaults to false.
      • termPostings

        protected final java.util.List<IterablePosting> termPostings
        posting lists for each term
      • termModels

        protected final java.util.List<WeightingModel> termModels
        weighting models for each term
      • termStatistics

        protected final java.util.List<EntryStatistics> termStatistics
        EntryStatistics for each term
      • termStrings

        protected final java.util.List<java.lang.String> termStrings
        String form for each term
      • termTags

        protected final java.util.List<java.util.Set<java.lang.String>> termTags
        tags for each term
      • matchOnTerms

        protected final gnu.trove.TIntArrayList matchOnTerms
      • nonMatchOnTerms

        protected final gnu.trove.TIntArrayList nonMatchOnTerms
      • termKeyFreqs

        protected final gnu.trove.TDoubleArrayList termKeyFreqs
        key (query) frequencies for each term
      • numTerms

        protected int numTerms
        number of terms
      • index

        protected Index index
        underlying index
      • lexicon

        protected Lexicon<java.lang.String> lexicon
        lexicon for the index
      • collectionStatistics

        protected CollectionStatistics collectionStatistics
        statistics of the collection
      • requiredBitMask

        protected long requiredBitMask
        which terms are positively required to match in retrieved documents
      • negRequiredBitMask

        protected long negRequiredBitMask
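
The two bit mask fields above are declared as long values. One plausible reading, sketched here, is that bit i corresponds to term i, that requiredBitMask marks terms a document must match, and that negRequiredBitMask marks terms a document must not match. The meaning of negRequiredBitMask is an assumption, since it is not described above, and the class RequiredBitMaskSketch and its methods are hypothetical.

```java
// Hypothetical illustration of checking matched terms against two long bit masks.
class RequiredBitMaskSketch {

    /** Builds a mask with the bit for each listed term index set. */
    static long maskOf(int... termIndices) {
        long mask = 0L;
        for (int t : termIndices) {
            mask |= 1L << t;
        }
        return mask;
    }

    /**
     * Accepts a document if every required term matched and no negatively
     * required term matched (assumed semantics).
     */
    static boolean accepts(long matchedTerms, long required, long negRequired) {
        return (matchedTerms & required) == required
            && (matchedTerms & negRequired) == 0L;
    }

    public static void main(String[] args) {
        long required = maskOf(0, 2);   // terms 0 and 2 must match
        long negRequired = maskOf(1);   // term 1 must not match
        System.out.println(accepts(maskOf(0, 2, 3), required, negRequired));
        System.out.println(accepts(maskOf(0, 1, 2), required, negRequired));
    }
}
```

Under this encoding a long can distinguish at most 64 term indices.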
    • Constructor Detail

      • PostingListManager

        protected PostingListManager​(Index _index,
                                     CollectionStatistics cs)
                              throws java.io.IOException
        Create a posting list manager for the given index and statistics
        Throws:
        java.io.IOException
      • PostingListManager

        public PostingListManager​(Index _index,
                                  CollectionStatistics _cs,
                                  MatchingQueryTerms mqt)
                           throws java.io.IOException
        Create a posting list manager for the given index and statistics, and populated using the specified MatchingQueryTerms.
        Parameters:
        _index - index to obtain postings from
        _cs - collection statistics to use
        mqt - MatchingQueryTerms object calculated for the query
        Throws:
        java.io.IOException
      • PostingListManager

        public PostingListManager​(Index _index,
                                  CollectionStatistics _cs,
                                  MatchingQueryTerms mqt,
                                  boolean splitSynonyms,
                                  java.lang.String scoringTag,
                                  java.lang.String additionalTag)
                           throws java.io.IOException
        Create a posting list manager for the given index and statistics, and populated using the specified MatchingQueryTerms.
        Parameters:
        _index - index to obtain postings from
        _cs - collection statistics to use
        mqt - MatchingQueryTerms object calculated for the query
        splitSynonyms - allows the splitting of synonym groups (i.e. singleTermAlternatives) to be disabled
        Throws:
        java.io.IOException
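
Because the class implements java.io.Closeable, an instance can be managed with a try-with-resources statement instead of an explicit close() call. The sketch below shows the pattern with a hypothetical stand-in class (StandInManager) so that it runs without an index. The stand-in only records the order of events and is not the real PostingListManager.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesSketch {

    /** Hypothetical stand-in for a manager that holds open posting lists. */
    static final class StandInManager implements Closeable {
        private final List<String> events;

        StandInManager(List<String> events) throws IOException {
            this.events = events;
            events.add("open");
        }

        int size() {
            return 2; // pretend two posting lists are open
        }

        @Override
        public void close() throws IOException {
            events.add("close");
        }
    }

    /** Opens, uses and automatically closes the stand-in, returning the recorded events. */
    static List<String> run() {
        List<String> events = new ArrayList<>();
        try (StandInManager plm = new StandInManager(events)) {
            events.add("size=" + plm.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With this pattern close() runs even if the body of the try block throws.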
    • Method Detail

      • mergeStatistics

        public static EntryStatistics mergeStatistics​(EntryStatistics[] entryStats)
        Knows how to merge several EntryStatistics for a single effective term
      • prepare

        public void prepare​(boolean firstMove)
                     throws java.io.IOException
        Counts the number of active terms. If firstMove is true, each posting list is moved to its first posting.
        Parameters:
        firstMove - whether to move all posting lists to their first posting
        Throws:
        java.io.IOException
      • getStatistics

        public EntryStatistics getStatistics​(int i)
        Returns the EntryStatistics corresponding to the specified term
        Parameters:
        i - term to obtain statistics for
        Returns:
        Statistics for the term at index i
      • getPosting

        public IterablePosting getPosting​(int i)
        Returns the IterablePosting corresponding to the specified term
        Parameters:
        i - term to obtain the posting list for
        Returns:
        Posting list for the term at index i
      • size

        public int size()
        Returns the number of posting lists for this query
      • getNumTerms

        public int getNumTerms()
        Returns the number of posting lists (that are terms) for this query
      • getMatchingTerms

        public int[] getMatchingTerms()
        Returns the indices of the terms that are considered (i.e. scored) during matching
      • getNonMatchingTerms

        public int[] getNonMatchingTerms()
        Returns the indices of the terms that must be passed through assignScore() but are not actually used to match documents.
      • score

        public double score​(int i)
        Returns the score using all weighting models for the current posting of the specified term
        Parameters:
        i - Which term to score
        Returns:
        score obtained from all weighting models for that term
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • getRequiredBitMask

        public long getRequiredBitMask()
      • getNegRequiredBitMask

        public long getNegRequiredBitMask()
      • getTerm

        public java.lang.String getTerm​(int i)
      • getTags

        public java.util.Set<java.lang.String> getTags​(int i)
      • getKeyFrequency

        public double getKeyFrequency​(int i)
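
The mergeStatistics method above is described as merging several EntryStatistics for a single effective term, for example a group of synonyms treated as one term. One simple way such a merge could work is to sum the per-term counts, as sketched below. This is an assumption for illustration and not necessarily what the library does: TermStats is a hypothetical stand-in for EntryStatistics, and summing document frequencies gives only an upper bound when the merged terms co-occur in the same documents.

```java
// Hypothetical sketch of merging per-term statistics by summation.
class MergeStatisticsSketch {

    /** Stand-in holding two per-term counts. */
    static final class TermStats {
        final int documentFrequency; // number of documents containing the term
        final long frequency;        // total occurrences of the term in the collection

        TermStats(int documentFrequency, long frequency) {
            this.documentFrequency = documentFrequency;
            this.frequency = frequency;
        }
    }

    /** Sums the counts of all entries; returns null for a null or empty input. */
    static TermStats merge(TermStats[] stats) {
        if (stats == null || stats.length == 0) {
            return null;
        }
        int df = 0;
        long tf = 0L;
        for (TermStats s : stats) {
            df += s.documentFrequency;
            tf += s.frequency;
        }
        return new TermStats(df, tf);
    }

    public static void main(String[] args) {
        TermStats merged = merge(new TermStats[] {
            new TermStats(10, 25L),
            new TermStats(3, 4L)
        });
        System.out.println(merged.documentFrequency + " " + merged.frequency);
    }
}
```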