
Compression Level is ignored. #142

Open
wilcoln opened this issue Mar 3, 2020 · 2 comments
wilcoln commented Mar 3, 2020

I want to compress a file that is already in HDFS, using different compression levels.
To do so, I wrote the following program:

Compress.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.hadoop.compression.lzo.LzoCodec;

public class Compress {

  // Pass-through reducer: writes each line as the output key with an
  // empty value, so the job's only real effect is recompressing the text.
  public static class VoidReducer extends Reducer<LongWritable, Text, Text, Text> {

    @Override
    public void reduce(LongWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      for (Text value : values)
        context.write(value, new Text(""));
    }
  }

  public static void main(String[] args) throws Exception {

    Configuration conf = new Configuration();
    int level = Integer.parseInt(args[2]);
    conf.setInt("io.compression.codec.lzo.compression.level", level);

    Job job = Job.getInstance(conf);
    job.setJobName("Compressor Job");
    job.setJarByClass(Compress.class);
    job.setMapperClass(Mapper.class); // identity mapper
    job.setReducerClass(VoidReducer.class);
    job.setNumReduceTasks(1);
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    TextInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, LzoCodec.class);

    // Submit and wait for completion.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Then I compile and run the following commands:

$ javac -classpath $(hadoop classpath) *.java
$ jar -cvf Compress.jar Compress*.class
$ hadoop jar Compress.jar Compress file.txt test1 1
$ hadoop jar Compress.jar Compress file.txt test7 7

The file file.txt is 1 GB in size. When I then check the sizes of test1 and test7 with
hdfs dfs -du -s -h, I get 594.6 M for each.
This shows that the compression level is ignored.
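
For reference, a standalone check that takes MapReduce out of the picture can pin this down faster. The sketch below (a hypothetical LevelCheck harness, not part of the job above) reuses the same configuration key and assumes hadoop-lzo and the native LZO library are available on the local classpath:

import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

import com.hadoop.compression.lzo.LzoCodec;

public class LevelCheck {

  // Compress the given bytes at the given level and return the output size.
  static long compressedSize(byte[] data, int level) throws Exception {
    Configuration conf = new Configuration();
    conf.setInt("io.compression.codec.lzo.compression.level", level);
    // ReflectionUtils hands the Configuration to the codec (LzoCodec is Configurable).
    LzoCodec codec = ReflectionUtils.newInstance(LzoCodec.class, conf);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    CompressionOutputStream cos = codec.createOutputStream(out);
    cos.write(data);
    cos.close(); // finishes the compressed stream
    return out.size();
  }

  public static void main(String[] args) throws Exception {
    // args[0]: a local copy of (part of) the input file
    byte[] data = Files.readAllBytes(Paths.get(args[0]));
    System.out.println("level 1: " + compressedSize(data, 1) + " bytes");
    System.out.println("level 7: " + compressedSize(data, 7) + " bytes");
  }
}

If the two sizes printed are identical here as well, the level is being dropped inside the codec itself rather than somewhere in the MapReduce job configuration.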

@wilcoln changed the title from "Comression Level is ignored." to "Compression Level is ignored." on Mar 3, 2020
toddlipcon (Contributor) commented:

Your code looks fine at first glance. I'm not actively maintaining this project anymore -- it's largely in maintenance mode as most people have moved on to using better file formats like Parquet along with LZ4 or Snappy. I'd suggest doing some debugging of your own -- rebuild hadoop-lzo with logging at the point where the compressor is created and see if it's getting passed through properly, and follow the breadcrumbs from there.
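
One concrete hypothesis worth testing along these lines: the LZO library itself only supports compression levels in the lzo1x_999 family, so if hadoop-lzo defaults to the lzo1x_1 strategy there may simply be no level for it to apply. The sketch below assumes the strategy is selected with the io.compression.codec.lzo.compressor key; both key names should be verified against the hadoop-lzo sources:

import org.apache.hadoop.conf.Configuration;

public class StrategyCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // In liblzo, lzo1x_1 (believed to be hadoop-lzo's default strategy)
    // takes no level parameter; only the lzo1x_999 family compresses at
    // levels 1-9. Key names are assumptions to verify against the sources.
    conf.set("io.compression.codec.lzo.compressor", "LZO1X_999");
    conf.setInt("io.compression.codec.lzo.compression.level", 7);
    // ...then pass conf to Job.getInstance(conf) exactly as in Compress.java.
  }
}

If test7 shrinks noticeably once LZO1X_999 is set, the level was never ignored for lzo1x_999; it just does not exist for lzo1x_1.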

wilcoln (Author) commented Mar 7, 2020

Ok, thanks.
