
Processing .gz files #630

Open
srinubabuin opened this issue Jun 7, 2023 · 3 comments

Comments

@srinubabuin

Hi Team,
While processing .gz files with Cobrix, we get the following error:

There are some files in abc.gz that are NOT DIVISIBLE by the RECORD SIZE calculated from the copybook (3018 bytes per record). Check the logs for the names of the files.

But my abc.gz contains only one file. Does Cobrix support processing .gz files? If not, can we pass an InputStream to Cobrix instead of a file path?

@yruslan
Collaborator

yruslan commented Jun 7, 2023

Hi @srinubabuin ,

No, compression is not supported, and neither are InputStreams (although I'm not 100% sure what you mean there).

The best option is to unpack the file first.
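Unpacking first can be done in plain JVM code before handing the file to Cobrix. A minimal sketch using only java.util.zip (the object name is illustrative; the 3018-byte record size is taken from the error message above):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipUtil {
  // Compress a byte array with gzip (used here only to build test data).
  def gzip(data: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val gz  = new GZIPOutputStream(bos)
    gz.write(data)
    gz.close()
    bos.toByteArray
  }

  // Decompress a gzipped byte array fully into memory.
  // For large files, stream to disk instead of buffering everything.
  def gunzip(data: Array[Byte]): Array[Byte] = {
    val in  = new GZIPInputStream(new ByteArrayInputStream(data))
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    var n   = in.read(buf)
    while (n >= 0) {
      out.write(buf, 0, n)
      n = in.read(buf)
    }
    in.close()
    out.toByteArray
  }
}
```

Once unpacked, the payload length is again a multiple of the copybook record size, so the divisibility check passes.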

@srinubabuin
Author

Hi Yruslan,
https://github.com/AbsaOSS/cobrix/blob/master/spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala
In this code we are ultimately reading a BufferedFSDataInputStream constructed from filePath, so can I pass a BufferedFSDataInputStream directly instead of a filePath?

private var bufferedStream = new BufferedFSDataInputStream(getHadoopPath(filePath), fileSystem, startOffset, Constants.defaultStreamBufferInMB, maximumBytes)

@yruslan
Collaborator

yruslan commented Jun 7, 2023

Sorry, I'm not sure I understand. Keep in mind that the file will be read on the executors, not on the driver node, and you cannot pass a stream from the driver to an executor. You need to create the stream on the executor, and you can create it from the file path.

Alternatively, you can use RDDs to read and uncompress the input files, and then apply the record extractor to them. The example is called "Working example 3 - Using RDDs and record parsers directly" at https://github.com/AbsaOSS/cobrix
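A sketch of that RDD route: decompress each .gz payload on the executors and split it into fixed-length records (3018 bytes, per the error above). Only the splitting helper is runnable here; the Spark wiring is sketched in comments, and feeding the resulting records to a Cobrix parser follows the linked README example. All names are illustrative:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.GZIPInputStream

object GzRecordSplitter {
  // Decompress one gzipped file payload in memory.
  def gunzip(data: Array[Byte]): Array[Byte] = {
    val in  = new GZIPInputStream(new ByteArrayInputStream(data))
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    var n   = in.read(buf)
    while (n >= 0) { out.write(buf, 0, n); n = in.read(buf) }
    in.close()
    out.toByteArray
  }

  // Split an uncompressed payload into fixed-length records.
  def splitRecords(data: Array[Byte], recordSize: Int): Seq[Array[Byte]] =
    data.grouped(recordSize).toSeq

  // In Spark this runs on the executors, e.g. (sketch):
  //   val records = spark.sparkContext
  //     .binaryFiles("/data/*.gz")
  //     .flatMap { case (_, pds) => splitRecords(gunzip(pds.toArray()), 3018) }
  // `records` can then be parsed with a Cobrix record parser as shown in
  // "Working example 3 - Using RDDs and record parsers directly".
}
```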
