Terrier Core

Pipeline Query/Doc Policy Lifecycle

Details

  • Type: Improvement
  • Status: Resolved
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 3.0
  • Fix Version/s: 3.5
  • Component/s: None
  • Attachment: patch.pipeline.stilo (55 kB), Giovanni Stilo, 12/Mar/10 3:23 PM

Activity

Giovanni Stilo added a comment - 12/Mar/10 3:22 PM

It would be useful to have some kind of reset policy for the pipeline, applied for every document or every query submitted to the system.

Example:
You want to put a stage in the pipeline that is re-initialized for every query.

Solution:
Here is my solution.

The solution refactors org.terrier.terms, introducing a reset() method in the
TermPipeline interface and in TermPipelineAccessor.
This interface change affected every TermPipeline implementation, so a new base class (BaseTermPipeline) was created and is inherited by all of them.
BaseTermPipeline provides a default implementation of reset() and also holds the "next" attribute, which was moved into it.

The patch also affected the Manager class and many Indexer classes:
Manager
Indexer
BasicIndexer
BasicSinglePassIndexer
BlockIndexer
Hadoop_BasicSinglePassIndexer

Thanks to all.
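
For illustration, the refactoring described in this comment might look roughly like the sketch below. It is not taken from the attached patch: the type names (SketchTermPipeline, SketchBaseTermPipeline, LowercaseStage, CollectingSink) are hypothetical stand-ins, and the assumption that the default reset() simply forwards the call to the next stage is mine, not something the comment states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a term pipeline interface with the proposed reset(). */
interface SketchTermPipeline {
    void processTerm(String term);
    void reset();
}

/** Hypothetical base class: owns the "next" reference and a default reset(). */
abstract class SketchBaseTermPipeline implements SketchTermPipeline {
    protected final SketchTermPipeline next;

    protected SketchBaseTermPipeline(SketchTermPipeline next) {
        this.next = next;
    }

    /** Assumed default: no local state to clear, so only propagate down the chain. */
    @Override
    public void reset() {
        if (next != null) {
            next.reset();
        }
    }
}

/** Example stateless stage: lowercases each term and inherits the default reset(). */
class LowercaseStage extends SketchBaseTermPipeline {
    LowercaseStage(SketchTermPipeline next) {
        super(next);
    }

    @Override
    public void processTerm(String term) {
        next.processTerm(term.toLowerCase(Locale.ROOT));
    }
}

/** Terminal stage that records the terms it receives and counts reset() calls. */
class CollectingSink implements SketchTermPipeline {
    final List<String> terms = new ArrayList<>();
    int resets = 0;

    @Override
    public void processTerm(String term) {
        terms.add(term);
    }

    @Override
    public void reset() {
        resets++;
    }
}
```

With this shape, a stage that keeps per-document or per-query state would override reset(), clear its state, and then call super.reset() so the rest of the chain is reset as well.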

Craig Macdonald added a comment - 30/Mar/11 7:39 PM

Tagging for 3.1

Craig Macdonald added a comment - 31/Mar/11 12:43 PM

Hi Stilo,

Just working on this now. Three things that I changed:

  • Reset is called AFTER a document/query
  • It's not optional in the Manager; it is called after every query (i.e. no additional property)
  • I didn't make the BaseTermPipeline class.

Can you think of a way of providing a Junit test for this?

Giovanni Stilo added a comment - 31/Mar/11 2:14 PM - edited

Hi Craig.
Unfortunately I can't provide a test class (I'm not focused on this problem at the moment).
But I think you can make a simple test by writing a TermPipeline that prints something like "Hello world" for every document.
Anyway, I don't agree with your approach; in my mind the reset option is needed for document/query-oriented
"filtering", so in this sense a BEFORE approach may fit better than an AFTER reset approach.
I don't understand why you removed BaseTermPipeline; the hierarchy is elegant to me, but perhaps you want less inheritance?

Bye
GS
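
A minimal sketch of the kind of test suggested here, using a counter instead of printing so the result can be checked. Everything in it is hypothetical: ProbePipeline, ResetCounter and MiniIndexer are illustrative names, and MiniIndexer is only a toy loop standing in for an indexer that calls reset() after each document, not Terrier's Indexer.

```java
import java.util.List;

/** Hypothetical minimal pipeline interface with a reset() hook. */
interface ProbePipeline {
    void processTerm(String term);
    void reset();
}

/** Probe stage: counts the terms it sees and how often reset() is called. */
class ResetCounter implements ProbePipeline {
    int termsSeen = 0;
    int resets = 0;

    @Override
    public void processTerm(String term) {
        termsSeen++;
    }

    @Override
    public void reset() {
        resets++;
    }
}

/** Toy driver standing in for an indexer; each inner list is one document. */
class MiniIndexer {
    static void index(List<List<String>> docs, ProbePipeline pipeline) {
        for (List<String> doc : docs) {
            for (String term : doc) {
                pipeline.processTerm(term);
            }
            pipeline.reset(); // called AFTER each document, as in the committed behaviour
        }
    }
}
```

A JUnit test could feed a handful of documents through such a driver and assert that the reset count equals the number of documents.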

Craig Macdonald added a comment - 31/Mar/11 2:19 PM

Hi Stilo,

Thanks for the quick response. My idea with a reset AFTER is that a TermPipeline instance could buffer some terms, and then only let them out once reset() is called. However, you still want them in the same document, so in this respect, reset() is like a flush().

About the base classes - I already made a Stemmer base class, which encapsulated most of the changes.
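
The reset-as-flush idea above can be sketched as follows. This is an illustration under assumptions, not code from the commit: ResettablePipeline, BufferingStage and RecordingSink are hypothetical names, and the choice to forward reset() downstream after flushing is mine.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal pipeline interface with a reset() hook. */
interface ResettablePipeline {
    void processTerm(String term);
    void reset();
}

/** Holds terms back and releases them downstream only when reset() is called. */
class BufferingStage implements ResettablePipeline {
    private final ResettablePipeline next;
    private final List<String> buffer = new ArrayList<>();

    BufferingStage(ResettablePipeline next) {
        this.next = next;
    }

    @Override
    public void processTerm(String term) {
        buffer.add(term); // nothing is passed on yet
    }

    @Override
    public void reset() {
        // Flush first, so the buffered terms still belong to the current document.
        for (String term : buffer) {
            next.processTerm(term);
        }
        buffer.clear();
        next.reset(); // assumed: later stages are reset only after the flush
    }
}

/** Terminal stage that records the order of the events it receives. */
class RecordingSink implements ResettablePipeline {
    final List<String> events = new ArrayList<>();

    @Override
    public void processTerm(String term) {
        events.add("term:" + term);
    }

    @Override
    public void reset() {
        events.add("reset");
    }
}
```

If reset() were called BEFORE each document instead, a stage like this would release its buffered terms into the following document, which is the situation the AFTER ordering avoids.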

Giovanni Stilo added a comment - 31/Mar/11 3:04 PM

Other side of the coin.

GS

Craig Macdonald added a comment - 31/Mar/11 4:12 PM

Committed to trunk for version 3.1 release.

Craig Macdonald added a comment - 31/Mar/11 4:17 PM - edited

Hi Giovanni, Your affiliation is still University of Rome Tor Vergata, right? Need to credit you in the changes documentation.

Giovanni Stilo added a comment - 31/Mar/11 5:03 PM - edited

Craig,
thanks, I'm at:
University degli Studi dell'Aquila
and
Nestor Laboratory - University of Rome "Tor Vergata"

many many thanks
GS.


Dates

  • Created: 12/Mar/10 3:13 PM
  • Updated: 01/Apr/11 3:18 PM
  • Resolved: 31/Mar/11 4:12 PM