Yes.
You will probably have to reuse objects, and you don't need to have everything in memory,
especially if you consider one context for each document.
I'm unclear here - are you suggesting that Context could swap events to disk for very large documents?
In the end, I think you should still use Terrier as is. Why do you need to change it?
I like the Terrier model at present, but it does need to evolve. I think that much is clear, from both Gianni's and my presentations in Rome, and from the motivations in the original post for this issue. Any use of the current model to address the existing problem results in non-standard code, whereas, with careful thought, we could have an improved model and easy code reuse between applications.
I'm trying to pursue one of two evolutions to the current model, rather than a revolution. However, it's good to discuss such changes to make sure we are evolving in the correct manner.
There are two designs by which this scheme could be carried out. In this comment, I enumerate the two design patterns:
[1] A DOM-style method for every event type (e.g. eventToken()).
Advantages:
- Events themselves are lightweight (no extra object creation for every token)
Disadvantages:
- Recall that most implementations will only use eventToken(). However, every implementation of EventPipeline would still have to forward all the other events onto the next object.
- Difficult to add more event types.
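To make the trade-off in design [1] concrete, here is a minimal sketch. Only EventPipeline and eventToken() come from the discussion above; eventDocumentStart(), eventDocumentEnd(), LowercaseStage and CollectingStage are hypothetical names invented for illustration, not existing Terrier API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Design [1] sketch: one callback method per event type.
// eventDocumentStart/eventDocumentEnd are assumed names, not Terrier API.
interface EventPipeline {
    void eventToken(String token);
    void eventDocumentStart(String docno);
    void eventDocumentEnd();
}

// A stage that only cares about tokens, yet must forward everything else by hand.
class LowercaseStage implements EventPipeline {
    private final EventPipeline next;

    LowercaseStage(EventPipeline next) { this.next = next; }

    @Override public void eventToken(String token) {
        // No event object is allocated: the token is passed straight through.
        next.eventToken(token.toLowerCase(Locale.ROOT));
    }

    // Pure forwarding boilerplate, repeated in every stage.
    @Override public void eventDocumentStart(String docno) { next.eventDocumentStart(docno); }
    @Override public void eventDocumentEnd() { next.eventDocumentEnd(); }
}

// Hypothetical terminal stage that records what reaches the end of the pipeline.
class CollectingStage implements EventPipeline {
    final List<String> seen = new ArrayList<>();

    @Override public void eventToken(String token) { seen.add(token); }
    @Override public void eventDocumentStart(String docno) { seen.add("start:" + docno); }
    @Override public void eventDocumentEnd() { seen.add("end"); }
}

class MethodPerEventDemo {
    public static void main(String[] args) {
        CollectingStage sink = new CollectingStage();
        EventPipeline pipeline = new LowercaseStage(sink);
        pipeline.eventDocumentStart("doc1");
        pipeline.eventToken("Hello");
        pipeline.eventDocumentEnd();
        System.out.println(sink.seen);
    }
}
```

Adding a new event type here means adding a method to the interface and a forwarding method to every existing stage, which is the second disadvantage listed above.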
[2] Use an abstract class to represent an event. Document implementations can choose the types of events they wish to produce, and each pipeline object can choose the events it wishes to process. Other events should be passed on to the next pipeline object unchanged.
Advantages:
- Event can be subclassed for more types of events
- Not every event causes a whole slew of method calls.
Disadvantages:
- Event objects have to be created for every event. This may mean a new Set<String> and a TokenEvent object for every token, as the state of an Event is mutable. We need to consider carefully whether these objects can be made (a) immutable, and (b) lightweight, in that they can be pooled and re-used. Re-use is complicated because an EventPipeline object may not release an event immediately after processEvent() returns: a pipeline object may buffer tokens (e.g. up to a sentence or document boundary).
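A rough sketch of design [2] might look like the following. Event, TokenEvent, EventPipeline and processEvent() are the names used above; making EventPipeline an abstract class with a default pass-through, making TokenEvent immutable, and the DocumentEndEvent, LowercaseStage and CollectingStage classes are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Design [2] sketch: events are objects; stages handle only the types they care about.
abstract class Event {}

// Assumption: TokenEvent is immutable here, so a buffering stage can hold on to it safely.
final class TokenEvent extends Event {
    final String token;
    TokenEvent(String token) { this.token = token; }
}

// Hypothetical extra event type; adding it needs no change to existing stages.
final class DocumentEndEvent extends Event {}

// Assumption: the base class supplies the default "pass on unchanged" behaviour.
abstract class EventPipeline {
    protected final EventPipeline next; // null for the last stage

    EventPipeline(EventPipeline next) { this.next = next; }

    void processEvent(Event e) {
        if (next != null) next.processEvent(e);
    }
}

class LowercaseStage extends EventPipeline {
    LowercaseStage(EventPipeline next) { super(next); }

    @Override void processEvent(Event e) {
        if (e instanceof TokenEvent) {
            // A new object per token: this is the allocation cost noted above.
            String lower = ((TokenEvent) e).token.toLowerCase(Locale.ROOT);
            super.processEvent(new TokenEvent(lower));
        } else {
            super.processEvent(e); // unknown events flow through untouched
        }
    }
}

// Hypothetical terminal stage that buffers every event it receives.
class CollectingStage extends EventPipeline {
    final List<Event> seen = new ArrayList<>();

    CollectingStage() { super(null); }

    @Override void processEvent(Event e) { seen.add(e); }
}

class EventObjectDemo {
    public static void main(String[] args) {
        CollectingStage sink = new CollectingStage();
        EventPipeline pipeline = new LowercaseStage(sink);
        pipeline.processEvent(new TokenEvent("Hello"));
        pipeline.processEvent(new DocumentEndEvent());
        System.out.println(sink.seen.size());
    }
}
```

Note that CollectingStage keeps references to events after processEvent() returns, which is exactly the buffering case that makes pooling and re-use of mutable event objects difficult.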
Can we have a discussion about which proposal is preferred, and about any merits or disadvantages of either that I have missed? Which do people prefer, and does it cover all of their use cases?