Is it possible to read a nested Binary Field? #658

Open
Il-Pela opened this issue Feb 16, 2024 · 1 comment
Labels
question Further information is requested

Comments

Il-Pela commented Feb 16, 2024

Background

Let's say that I'm reading a "normal" AVRO file using Spark. One of the fields in the schema of this Avro is a binary field encoded as EBCDIC that should be decoded using a COBOL copybook referenced by another field within the same schema.
Each record can potentially have its own copybook (so the binary might have a different schema for each record), and the goal is to produce a JSON version of the binary field to store somewhere else.

The DF looks something like this:

ID  SCHEMA_ID  BINARY_FIELD  FIELD1  FIELD2  .....
1   001        M1B1N4R11     valueX  valueZ  ..
2   010        M1B1N4R12     valueY  valueW  ..

And in the folder copycobol/ I have:

  • 001.cob
  • 010.cob

Question

Is it possible to leverage the library to decode a field instead of a file? Or do I have to save the binary field temporarily in a file and decode it from there?

Thank you for any suggestion! :)

Il-Pela added the question label Feb 16, 2024
yruslan (Collaborator) commented Feb 19, 2024

Hi, thanks for the interest in the library.
Yes, it is possible to use Cobrix in this case, but it can be quite involved. You can't use the spark-cobol Spark data source to decode the data; you have to do it manually, like this:

  1. You need to parse each copybook to get an AST:
       val copybookForField1 = CopybookParser.parseSimple(copyBookContents)
  2. Then, you can decode each value by applying the parsed copybook to the binary field:
       val row = RecordExtractors.extractRecord(copybookForField1.ast, field1Bytes, 0, handler = handler)
       val record = handler.create(row.toArray, copybookForField1.ast)
     The resulting record will be an Array[Any], and each subfield can be cast to the corresponding Java data type.
  3. If you want decoding to happen in parallel, handled by Spark SQL, you can write a UDF per field. Each UDF could hold a pre-parsed copybook and just apply extractRecord() and handler.create() to each value. The resulting output can be a JSON string; see the sketch below for how Jackson could be used to convert each record to JSON.
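
For illustration, here is a minimal sketch of how these steps could be wired together as a Spark UDF. It assumes the Cobrix package path shown below, a hypothetical decodeRecord helper that stands in for the extractRecord()/handler.create() calls above, and the copycobol/ file layout from the question; this is not the definitive API, so adjust it to your Cobrix version.

    import scala.io.Source

    import org.apache.spark.sql.functions.{col, udf}

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule

    // Assumed Cobrix import -- adjust the package path to the Cobrix version you use.
    import za.co.absa.cobrix.cobol.parser.CopybookParser

    // Step 1: pre-parse one copybook per SCHEMA_ID so parsing happens only once,
    // mirroring the copycobol/ folder from the question.
    val copybooks = Map(
      "001" -> CopybookParser.parseSimple(Source.fromFile("copycobol/001.cob").mkString),
      "010" -> CopybookParser.parseSimple(Source.fromFile("copycobol/010.cob").mkString)
    )

    // Step 2: a hypothetical helper standing in for the RecordExtractors.extractRecord()
    // and handler.create() calls shown above; assumed to return the decoded subfields
    // as a map of field name -> value.
    def decodeRecord(schemaId: String, bytes: Array[Byte]): Map[String, Any] = {
      val copybook = copybooks(schemaId)
      // val row = RecordExtractors.extractRecord(copybook.ast, bytes, 0, handler = handler)
      // val record = handler.create(row.toArray, copybook.ast)
      // ... build the field name -> value map from `record` ...
      ???
    }

    // Step 3: a UDF that decodes the binary field and renders it as a JSON string
    // with Jackson (jackson-module-scala).
    val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

    val decodeToJson = udf { (schemaId: String, bytes: Array[Byte]) =>
      mapper.writeValueAsString(decodeRecord(schemaId, bytes))
    }

    // Usage: add a JSON column next to the original binary column.
    val decoded = df.withColumn("BINARY_JSON",
      decodeToJson(col("SCHEMA_ID"), col("BINARY_FIELD")))

Note that the parsed copybooks and the mapper are captured by the UDF closure, so in a real job you may want to broadcast them (or re-create them per executor) rather than rely on closure serialization.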

Let me know if you decide to do it and have any issues.
