Terrier Core

Enable Hadoop-mode Map Output Compression

Details

  • Type: Improvement
  • Status: Resolved
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: 3.0
  • Fix Version/s: 3.0
  • Component/s: .indexing
  • Description:
    Hadoop supports the compression of map outputs. Initial examination found that, for Terrier MapReduce indexing, the sequence files of map output that Hadoop moves to the reducers can be halved in size by applying gzip. This suggests that using Hadoop map output compression may be beneficial. See http://hadoop.apache.org/core/docs/r0.18.3/mapred_tutorial.html#Data+Compression for more details.

    In this issue I will report the changes in space and efficiency from applying various compression settings.
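
    The size claim above can be sanity-checked outside Hadoop. The following is a minimal sketch, not the measurement used for this issue: it gzips a synthetic buffer with the JDK's java.util.zip classes and prints the raw and compressed sizes. The class name GzipSizeCheck and the "term, docid, frequency" record layout are hypothetical stand-ins for real Terrier map output, so the ratio it prints says nothing about real sequence files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compares raw and gzip-compressed sizes of a byte buffer.
class GzipSizeCheck {

    /** Returns the gzip-compressed form of the given bytes. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream flushes the gzip trailer into the buffer.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Builds made-up "term TAB docid TAB frequency" records (an assumption, not Terrier's format). */
    static byte[] syntheticMapOutput(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("term").append(i % 500).append('\t')
              .append(i).append('\t')
              .append(1 + i % 7).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = syntheticMapOutput(100_000);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.2f%n",
                raw.length, packed.length, (double) packed.length / raw.length);
    }
}
```

    To get a meaningful figure, replace syntheticMapOutput with bytes read from an actual map output file.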

Activity

Richard McCreadie added a comment - 12/May/09 3:03 PM

This patch adds map output compression using gzip.
It is enabled with the -c argument on the command line.

The patch also improves how command-line arguments are processed and adds a basic help command, displayed by placing the string "help" (case-insensitive) anywhere on the command line.
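
The argument handling described in this comment might look roughly like the sketch below. It is not the code from the patch: the class IndexingArgs and its fields are hypothetical, and silently ignoring unrecognised arguments is an assumption. Only the -c flag and the case-insensitive "help" token come from the comment.

```java
// Hypothetical holder for the two options mentioned in the comment.
class IndexingArgs {
    final boolean compressMapOutput; // true when -c was given
    final boolean showHelp;          // true when "help" appeared anywhere

    IndexingArgs(boolean compressMapOutput, boolean showHelp) {
        this.compressMapOutput = compressMapOutput;
        this.showHelp = showHelp;
    }

    /** Scans every argument, so "help" is recognised at any position. */
    static IndexingArgs parse(String[] args) {
        boolean compress = false;
        boolean help = false;
        for (String arg : args) {
            if (arg.equalsIgnoreCase("help")) {
                help = true;
            } else if (arg.equals("-c")) {
                compress = true;
            }
            // Assumption: other arguments are ignored in this sketch.
        }
        return new IndexingArgs(compress, help);
    }

    public static void main(String[] args) {
        IndexingArgs parsed = parse(args);
        if (parsed.showHelp) {
            System.out.println("Options: -c  compress map output with gzip");
            return;
        }
        System.out.println("Map output compression: " + parsed.compressMapOutput);
    }
}
```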

Craig Macdonald added a comment - 12/May/09 9:10 PM

If experimentation shows that map output compression is beneficial to efficiency, then I would be inclined to leave it on all the time, rather than adding a command-line option or a Terrier property.

Richard McCreadie added a comment - 26/May/09 2:11 PM

Bug found in the patch: the call conf.setMapOutputCompressorClass(GzipCodec.class); causes a null pointer exception during map output, even when compression mode is not selected.

Richard McCreadie added a comment - 26/May/09 2:47 PM

I have no idea what is causing this, as it worked in a previous version. It may be an issue with the new Hadoop version.

Craig Macdonald added a comment - 27/May/09 3:17 PM

Can you paste a stack trace?

Craig Macdonald added a comment - 12/Aug/09 7:52 PM

I'd really like to have this turned on by default. Can you provide a working version of this patch?

Craig Macdonald added a comment - 08/Sep/09 10:08 AM

The issue is that, for some reason, compression does not work when the "local" job tracker is used. I have enabled compression by default, with a special case that leaves it off for the local job tracker.
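
One way such a special case could be expressed is sketched below. This is an illustration, not the committed change: a plain Map stands in for Hadoop's JobConf so the example runs without Hadoop on the classpath, and the class MapCompressionPolicy is hypothetical. The property names are the ones believed to be used by Hadoop releases of that period (mapred.job.tracker, mapred.compress.map.output, mapred.map.output.compression.codec); verify them against your Hadoop version. With a real JobConf, the equivalent calls would be setCompressMapOutput(true) and setMapOutputCompressorClass(GzipCodec.class).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical policy: compress map output unless the job runs with the "local" job tracker.
class MapCompressionPolicy {
    // Property names assumed from Hadoop releases of that era; check your version.
    static final String JOB_TRACKER = "mapred.job.tracker";
    static final String COMPRESS_MAP_OUTPUT = "mapred.compress.map.output";
    static final String MAP_OUTPUT_CODEC = "mapred.map.output.compression.codec";
    static final String GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec";

    /** Applies the policy to a configuration; the Map is a stand-in for JobConf. */
    static void configure(Map<String, String> conf) {
        // An unset job tracker is treated as "local" (assumed default).
        String tracker = conf.getOrDefault(JOB_TRACKER, "local");
        if ("local".equals(tracker)) {
            // Special case: leave compression off and do not set a codec at all.
            conf.put(COMPRESS_MAP_OUTPUT, "false");
            return;
        }
        conf.put(COMPRESS_MAP_OUTPUT, "true");
        conf.put(MAP_OUTPUT_CODEC, GZIP_CODEC);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(JOB_TRACKER, "master:9001"); // hypothetical cluster address
        configure(conf);
        System.out.println(conf);
    }
}
```

Not setting the codec in the local case is a deliberate choice in this sketch, since the earlier comment reports that setting the compressor class alone triggered the exception.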

Craig Macdonald added a comment - 09/Sep/09 1:59 PM

I committed this.


Dates

  • Created: 27/Mar/09 5:23 PM
  • Updated: 05/Mar/10 4:47 PM
  • Resolved: 09/Sep/09 1:59 PM