added a comment - 23/Apr/10 7:50 PM
Sigh. 2.5 hours to index the Wikipedia portion, btw.
INFO - map 100% reduce 100%
INFO - Job complete: job_201004231118_0004
INFO - Counters: 23
INFO - Job Counters
INFO - Launched reduce tasks=26
INFO - Rack-local map tasks=20
INFO - Launched map tasks=49
INFO - Data-local map tasks=29
INFO - FileSystemCounters
INFO - FILE_BYTES_READ=15790406810
INFO - HDFS_BYTES_READ=50375670689
INFO - FILE_BYTES_WRITTEN=23558026906
INFO - HDFS_BYTES_WRITTEN=4826037969
INFO - org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer$Counters
INFO - INDEXED_POINTERS=2656803981
INFO - INDEXED_TOKENS=5896971230
INFO - INDEXED_DOCUMENTS=5957529
INFO - INDEXER_FLUSHES=116
INFO - Map-Reduce Framework
INFO - Reduce input groups=6200909
INFO - Combine output records=0
INFO - Map input records=5957529
INFO - Reduce shuffle bytes=7534296876
INFO - Reduce output records=0
INFO - Spilled Records=213486017
INFO - Map output bytes=7646203740
INFO - Map input bytes=-41692599706
INFO - Combine input records=0
INFO - Map output records=70620646
INFO - Reduce input records=70620646
WARN - No reduce 0 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-0]
WARN - No reduce 1 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-1]
WARN - No reduce 2 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-2]
WARN - No reduce 3 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-3]
WARN - No reduce 4 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-4]
WARN - No reduce 5 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-5]
WARN - No reduce 6 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-6]
WARN - No reduce 7 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-7]
WARN - No reduce 8 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-8]
WARN - No reduce 9 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-9]
WARN - No reduce 10 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-10]
WARN - No reduce 11 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-11]
WARN - No reduce 12 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-12]
WARN - No reduce 13 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-13]
WARN - No reduce 14 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-14]
WARN - No reduce 15 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-15]
WARN - No reduce 16 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-16]
WARN - No reduce 17 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-17]
WARN - No reduce 18 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-18]
WARN - No reduce 19 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-19]
WARN - No reduce 20 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-20]
WARN - No reduce 21 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-21]
WARN - No reduce 22 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-22]
WARN - No reduce 23 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-23]
WARN - No reduce 24 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-24]
WARN - No reduce 25 output : no output index
[/home/soboroff/terrier-3.0/var/index,data-25]
java.lang.NullPointerException
java.lang.NullPointerException
at org.terrier.applications.HadoopIndexing.mergeLexiconInvertedFiles(HadoopIndexing.java:276)
at org.terrier.applications.HadoopIndexing.main(HadoopIndexing.java:231)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:373)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)
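The warnings above say that none of the 26 reducers left an output index, and the merge step then dies with a bare NullPointerException, which hides the real problem. As a rough sketch only (this is not Terrier's code), a pre-merge check along these lines could name the missing partial indices explicitly. The class name `PartialIndexCheck`, the method `missingPartials`, and the idea that a partial index is marked by a `<prefix>-<n>.properties` file in the index directory are all assumptions made for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Terrier: lists reducers whose
// partial index is missing before any merge is attempted.
public class PartialIndexCheck {

    /** Returns every reducer number in [0, reducers) for which exists.test(n) is false. */
    static List<Integer> missingPartials(int reducers, IntPredicate exists) {
        List<Integer> missing = new ArrayList<>();
        for (int n = 0; n < reducers; n++) {
            if (!exists.test(n)) {
                missing.add(n);
            }
        }
        return missing;
    }

    /**
     * File-based variant.
     * ASSUMPTION: partial index n is marked by <dir>/<prefix>-<n>.properties.
     * Adjust the file name if your index layout differs.
     */
    static List<Integer> missingPartials(File dir, String prefix, int reducers) {
        return missingPartials(reducers,
                n -> new File(dir, prefix + "-" + n + ".properties").isFile());
    }

    public static void main(String[] args) {
        // Defaults mirror the log above: prefix "data", 26 reduce tasks.
        File dir = new File(args.length > 0 ? args[0] : "var/index");
        int reducers = args.length > 1 ? Integer.parseInt(args[1]) : 26;

        List<Integer> missing = missingPartials(dir, "data", reducers);
        if (!missing.isEmpty()) {
            System.err.println("Missing partial indices for reducers "
                    + missing + " in " + dir);
            System.exit(1);
        }
        System.out.println("All " + reducers + " partial indices present in " + dir);
    }
}
```

Run against the index directory from the log, a check like this would fail up front with the list of reducers that produced nothing, rather than with an NPE inside the merge.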
This patch was contributed by Ian Soboroff (NIST) and is for Hadoop 0.20.1. The particular Hadoop core jar in use is attached to the issue.