Processing .gz files #630
Hi @srinubabuin, No, compression is not supported, and neither are input streams (although I'm not 100% sure what you mean there). The best option is to unpack the file first.
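For illustration, a minimal sketch of the "unpack first" route, assuming the archive has already been decompressed (for example with gunzip) to a fixed-record-length file; the paths and copybook location below are placeholders, and the `cobol` format / `copybook` option follow the documented spark-cobol usage:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cobrix-unpacked").getOrCreate()

// Read the already-decompressed, fixed-record-length file with the
// spark-cobol data source (paths are placeholders).
val df = spark.read
  .format("cobol")
  .option("copybook", "path/to/copybook.cpy")
  .load("path/to/abc.dat")

df.show(false)
```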
Hi Yruslan,

```scala
private var bufferedStream = new BufferedFSDataInputStream(getHadoopPath(filePath), fileSystem, startOffset, Constants.defaultStreamBufferInMB, maximumBytes)
```
Sorry, I'm not sure I understand. Keep in mind that the file will be read on executors, not on the driver node, and you cannot pass the stream from the driver to an executor. You need to create the stream on the executor, but you can create that stream from the file path. Alternatively, you can use RDDs to read and uncompress input files, and then apply the record extractor to them. The example is called "Working example 3 - Using RDDs and record parsers directly" at https://github.com/AbsaOSS/cobrix
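For what it's worth, a minimal sketch of that RDD approach might look like the following. It assumes Spark's `binaryFiles` API plus `java.util.zip.GZIPInputStream` for decompression, placeholder paths, and the Cobrix `CopybookParser.parseTree` / `getRecordSize` calls as used in the README; the final record-extraction step is left as a comment pointing at "Working example 3".

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPInputStream

import org.apache.spark.sql.SparkSession
import za.co.absa.cobrix.cobol.parser.CopybookParser

val spark = SparkSession.builder().appName("gz-rdd-example").getOrCreate()
val sc = spark.sparkContext

// Parse the copybook on the driver and get the fixed record length.
val copybookContents = scala.io.Source.fromFile("path/to/copybook.cpy").mkString
val copybook = CopybookParser.parseTree(copybookContents)
val recordSize = copybook.getRecordSize

// Read each .gz file as a whole (gzip is not splittable), decompress it on
// the executor, and slice the payload into fixed-size records.
val records = sc.binaryFiles("path/to/*.gz")
  .flatMap { case (_, portableStream) =>
    val gz = new GZIPInputStream(portableStream.open())
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](64 * 1024)
    var n = gz.read(buf)
    while (n != -1) { out.write(buf, 0, n); n = gz.read(buf) }
    gz.close()
    out.toByteArray.grouped(recordSize).filter(_.length == recordSize)
  }

// From here, apply the record extractor to each byte array and build a
// DataFrame, as shown in "Working example 3 - Using RDDs and record
// parsers directly" in the Cobrix README.
```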
Hi Team,
While processing .gz files using Cobrix, we are getting the following error:
There are some files in abc.gz that are NOT DIVISIBLE by the RECORD SIZE calculated from the copybook (3018 bytes per record). Check the logs for the names of the files.
But my abc.gz contains only one file. Does Cobrix support .gz file processing? If not, can we pass an InputStream to Cobrix instead of a file?