Potential thread safety issue with LzoDecompressor #106

Closed
EugenCepoi opened this issue May 20, 2015 · 8 comments

@EugenCepoi

The problem occurs when trying to read LZO-compressed files with Spark using sc.textFile(...).
But it works fine when using LzoTextInputFormat, with the same dataset and job config.

I encounter multiple errors like:

java.lang.InternalError: lzo1x_decompress_safe returned: -6
    at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
    at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:315)
    at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:122)
    at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:252)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)

And sometimes a few like:

Compressed length 892154724 exceeds max block size 67108864 (probably corrupt file)
  at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:291)

These happen only when running multiple threads per JVM (multiple executor cores).
We are using a snapshot version of 0.4.20 starting from this commit.

Thanks

@EugenCepoi EugenCepoi changed the title Potential thread safety issue Potential thread safety issue with LzoDecompressor May 20, 2015
@rangadi
Contributor

rangadi commented May 20, 2015

I think this was fixed in #103

@EugenCepoi
Author

Just tried with the latest commit, but the problem remains.

@rangadi
Contributor

rangadi commented May 22, 2015

Too bad. Does each thread read from a different file, or do multiple threads read from the same file? Anything you can add here to reproduce this easily would be very useful.

@EugenCepoi
Author

So it looks like it is due to something that changed in Hadoop 2: when using the basic textFile method from Spark, it expects the input to be splittable (in my case the files are not indexed).

Discussed on SO. Anyway, using the input format avoids this problem.
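
For reference, a minimal sketch of that workaround in Spark (Scala), assuming this library's com.hadoop.mapreduce.LzoTextInputFormat and a hypothetical HDFS path:

    import com.hadoop.mapreduce.LzoTextInputFormat
    import org.apache.hadoop.io.{LongWritable, Text}

    // sc.textFile treats the .lzo files as splittable plain text and may start
    // reading at an arbitrary mid-stream offset; LzoTextInputFormat knows the
    // LZO block structure and only splits at valid boundaries (or not at all
    // when no .index file exists).
    val lines = sc
      .newAPIHadoopFile[LongWritable, Text, LzoTextInputFormat]("hdfs:///data/*.lzo")
      .map { case (_, text) => text.toString }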

Should I close this issue?

@rangadi
Contributor

rangadi commented May 29, 2015

So this was in fact because the reader was trying to read from an arbitrary offset, right? Thanks for the update.

@LTzycLT

LTzycLT commented May 23, 2016

So will it be fixed in a future version? I hope sc.textFile can decompress and split any input file correctly and automatically.

@EugenCepoi
Author

EugenCepoi commented May 23, 2016

I don't know if it has been fixed, but you can use LzoTextInputFormat with the lower-level API methods that let you specify the input format, to avoid this problem.

@rangadi yeah, this is the problem. The reader thinks the input is splittable and tries to read at an arbitrary offset, which yields an invalid format. For small files that don't need to be split, in theory the problem should not happen.
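
If switching the input format is not an option, here is a sketch of a fallback based on the same reasoning (assuming sc.textFile honors the standard Hadoop 2 split-size key; the huge minimum split size should make each file a single split, i.e. the small-file case above):

    // Heavy-handed fallback: force one split per file so the reader never
    // seeks to an arbitrary offset inside the LZO stream. You lose
    // parallelism within a file; each file is read by a single task.
    sc.hadoopConfiguration.setLong(
      "mapreduce.input.fileinputformat.split.minsize", Long.MaxValue)
    val lines = sc.textFile("hdfs:///data/*.lzo")

The proper fix for splittability is to index the .lzo files (hadoop-lzo ships LzoIndexer and DistributedLzoIndexer for this), which is what LzoTextInputFormat reads to split at block boundaries.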

@leafjungle

I am hitting the same problem. Is it fixed in a later version? I am currently using hadoop-lzo-0.4.19.jar (it looks like it was published in 2011 or 2013, which is quite old).
