Terrier Core

Index WARC collections

Details

  • Type: New Feature
  • Status: Resolved
  • Priority: Minor
  • Resolution: Duplicate
  • Affects Version/s: 2.2.1
  • Fix Version/s: 3.0
  • Component/s: None
  • Description:
    The documents in the new TREC ClueWeb09 collection are formatted in WARC.
    It would be good if Terrier supported this format.
  1. TR-28.patch
    (23 kB)
    Craig Macdonald
    06/Jul/09 5:53 PM

Activity

Carlos Lorenzetti added a comment - 15/Oct/09 3:15 PM

Hi, I'm trying to index the UK2007 Spam collection that is in WARC format.
I've patched my current version of Terrier with the file attached here, but there is a problem with the line logger = Logger.getLogger(WARC018Collection.class): there isn't a WARC018Collection class. Why is this getLogger call different from the others?

Thank you.

Craig Macdonald added a comment - 08/Mar/10 1:50 PM

Duplicate of TR-36


Dates

  • Created: 01/May/09 2:31 PM
  • Updated: 08/Mar/10 1:50 PM
  • Resolved: 08/Mar/10 1:50 PM