Terrier Core

Singlepass indexing efficiency hindered by getMemoryConsumption() calls

Details

  • Type: Bug
  • Status: Resolved
  • Priority: Blocker
  • Resolution: Fixed
  • Affects Version/s: 3.0
  • Fix Version/s: 3.0
  • Component/s: .indexing
  • Description:
    Terrier 2.2.1 was reported to index WT10G in the following times:
    two pass: 62.5 min
    singlepass: 34.7 min
    two pass + blocks: 2 hours 18 min
    singlepass + blocks: 53.1 min

    It would appear that these times are no longer being achieved for Terrier 3. Profiling is required.

Activity

Craig Macdonald added a comment - 08/Sep/09 1:30 PM

Most time was spent summing up the memory consumption of the MemoryPostings object:

rank self accum count trace method
1 41.13% 41.13% 32556 300488 uk.ac.gla.terrier.structures.indexing.singlepass.MemoryPostings.getMemoryConsumption

TRACE 300488:
uk.ac.gla.terrier.structures.indexing.singlepass.MemoryPostings.getMemoryConsumption(MemoryPostings.java:128)
uk.ac.gla.terrier.indexing.BasicSinglePassIndexer.checkFlush(BasicSinglePassIndexer.java:287)
uk.ac.gla.terrier.indexing.BasicSinglePassIndexer.indexDocument(BasicSinglePassIndexer.java:320)
uk.ac.gla.terrier.indexing.BasicSinglePassIndexer.createInvertedIndex(BasicSinglePassIndexer.java:224)

This was caused by TREC-43.

Craig Macdonald added a comment - 08/Sep/09 3:06 PM

Instead of calculating memory usage on every call of getMemoryConsumption(), I now keep track of consumption as postings objects are added or updated. For WT2G indexing, this takes indexing time down from 989.647 seconds to 388.993 seconds (both measured with profiling enabled).
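
As a rough illustration of the change described above, the following is a minimal sketch of tracking consumption incrementally rather than summing it on each call. It is not the actual MemoryPostings code: the class name TrackedPostings, the per-entry byte costs, and the recomputeMemoryConsumption() and clear() helpers are all hypothetical and exist only for comparison.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an in-memory postings structure; NOT Terrier's MemoryPostings.
class TrackedPostings {
    // Assumed per-entry costs in bytes, chosen only for illustration.
    static final long TERM_OVERHEAD = 64;
    static final long POSTING_SIZE = 8;

    private final Map<String, Integer> postingCounts = new HashMap<>();
    // Running total, adjusted on every write instead of being summed on every read.
    private long memoryConsumption = 0;

    // Record one posting for a term and update the running total.
    void add(String term) {
        Integer count = postingCounts.get(term);
        if (count == null) {
            postingCounts.put(term, 1);
            memoryConsumption += TERM_OVERHEAD + POSTING_SIZE;
        } else {
            postingCounts.put(term, count + 1);
            memoryConsumption += POSTING_SIZE;
        }
    }

    // O(1): returns the tracked value, so it is cheap to call once per document.
    long getMemoryConsumption() {
        return memoryConsumption;
    }

    // O(number of terms): the recompute-on-every-call approach, kept for comparison.
    long recomputeMemoryConsumption() {
        long total = 0;
        for (int count : postingCounts.values()) {
            total += TERM_OVERHEAD + (long) count * POSTING_SIZE;
        }
        return total;
    }

    // Reset after a flush to disk.
    void clear() {
        postingCounts.clear();
        memoryConsumption = 0;
    }
}
```

Because the flush check runs once per indexed document, replacing a full traversal with a field read removes a cost that grows with the number of terms held in memory.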

Craig Macdonald added a comment - 08/Sep/09 3:09 PM

Change committed to SVN.

Iadh Ounis added a comment - 08/Sep/09 3:51 PM - edited

Nicola also reported slow indexing with WT2G. He said he could not achieve the timings mentioned on the Terrier web page on his laptop with Terrier3. Is this due to the same issue?

Craig Macdonald added a comment - 08/Sep/09 3:55 PM

Indeed, it was he who alerted me that there might be a problem here. I won't tell him how to fix it, as he doesn't have to index too regularly anyway.

Iadh Ounis added a comment - 08/Sep/09 4:58 PM

I told him to speak to you about the issue. Good (that the bug was fixed). You might be right: we do need some sort of unit testing after all.


Dates

  • Created: 04/Sep/09 4:35 PM
  • Updated: 05/Mar/10 5:02 PM
  • Resolved: 08/Sep/09 3:09 PM