record_format VB file fails with length of BDW block is too big #642
Comments
Hi, here is an example of processing VB (BDW+RDW) files with a simple copybook:
Hi @yruslan, thanks for the reply. We have already gone through the code above, but it did not solve our problem. As mentioned, whenever we try to use the VB option with the adjustments, we get the "The length of BDW block is too big" error.
dataframe = spark.read.format("cobol").
WARN BlockManager: Putting block rdd_1_0 failed due to exception java.lang.IllegalStateException: The length of BDW block is too big. Got 1223880942. Header: 200,242,240,242, offset: 0.
Can you please provide a solution for the above issue?
The generic approach is to simulate the record header parser manually in a hex editor in order to understand the headers of your file. Things like:
The error message indicates that the record extractor encountered a wrong BDW block. This can happen when there is no BDW header at the specified offset. I've noticed that the error happens at offset 0. Are you sure your file has BDW+RDW headers? What are the first 8 bytes of your file?
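To make the "check the first 8 bytes" suggestion concrete, here is a minimal sketch (not part of Cobrix) that interprets the first 8 bytes of a file as a big-endian BDW followed by a big-endian RDW. It assumes the common layout where bytes 0-1 of each word hold the length and bytes 2-3 are zero, and it ignores extended BDWs:

```python
import struct

def describe_headers(first8: bytes) -> str:
    """Interpret the first 8 bytes as a big-endian BDW followed by an RDW.

    In both words, bytes 0-1 hold the length and bytes 2-3 are normally
    zero. The BDW length includes the 4-byte BDW itself, and the RDW
    length includes the 4-byte RDW itself.
    """
    bdw_len = struct.unpack(">H", first8[0:2])[0]
    rdw_len = struct.unpack(">H", first8[4:6])[0]
    return f"BDW={bdw_len}, RDW={rdw_len}"

# A well-formed VB file holding one 422-byte record would start like this:
sample = bytes([0x01, 0xAE, 0x00, 0x00,   # BDW: 430 = 4 (BDW) + 426
                0x01, 0xAA, 0x00, 0x00])  # RDW: 426 = 4 (RDW) + 422
print(describe_headers(sample))  # BDW=430, RDW=426
```

If the values printed for a real file are implausibly large, zero, or look like printable EBCDIC characters, the file most likely does not start with a BDW.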
I am having a similar error. My file is located at https://raw.githubusercontent.com/jaysara/spark-cobol-jay/main/data/ebcdic_bdwrdw.dat This is how this file is parsed outside of Spark. This file has 12 records in total (including the header and trailer). I am using the following
These are my copybook contents
The above file has a BDW for only one record. That may not be the typical case; more commonly we will have one BDW for multiple records, e.g. Record-2: What else should I specify? Here is the error that I get:
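The layout mentioned above, one BDW wrapping several RDW records, can be sketched as follows. This is a simplified illustration (not Cobrix code): it assumes 2-byte big-endian lengths that include the 4-byte header itself and does not handle extended BDWs:

```python
import io
import struct

def read_vb_records(stream):
    """Yield record payloads from a VB (BDW + RDW) byte stream.

    Each block starts with a 4-byte big-endian BDW whose length includes
    the BDW itself. Inside the block, each record starts with a 4-byte
    RDW whose length includes the RDW itself. One BDW may wrap several
    RDW records.
    """
    while True:
        bdw = stream.read(4)
        if len(bdw) < 4:
            return
        block_len = struct.unpack(">H", bdw[:2])[0] - 4
        block = stream.read(block_len)
        pos = 0
        while pos + 4 <= len(block):
            rec_len = struct.unpack(">H", block[pos:pos + 2])[0] - 4
            pos += 4
            yield block[pos:pos + rec_len]
            pos += rec_len

# One block (BDW = 24 = 4 + 8 + 12) holding two records (RDW=8, RDW=12):
data = (bytes([0x00, 0x18, 0x00, 0x00]) +
        bytes([0x00, 0x08, 0x00, 0x00]) + b"AAAA" +
        bytes([0x00, 0x0C, 0x00, 0x00]) + b"BBBBBBBB")
print(list(read_vb_records(io.BytesIO(data))))  # [b'AAAA', b'BBBBBBBB']
```

Walking a file with a loop like this in plain Python is a quick way to verify whether its headers actually match the VB layout before pointing Spark at it.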
Hi, the example file starts with 0xC3 0xB0 0xC3 0xB4 (which is the same as reported by the error message). Please clarify how you parsed the file to get BDW=430, RDW=426. Which bytes of the file?
I apologize. I made an error in uploading the file. The EBCDIC file with BDW and RDW is at https://raw.githubusercontent.com/jaysara/spark-cobol-jay/main/data/bdw-rdw-sample-ebcdic.dat Here are the read options that I use:
I get the following error:
If I change the record_format from VB to V
Hi, more on BDW headers: https://www.ibm.com/docs/en/zos/2.1.0?topic=records-block-descriptor-word-bdw If the file has variable-length records, these are the options available:
From my experience, quite often the team that handles copying data from the mainframe can adjust the conversion options to include RDW headers. This is the most reliable way of getting the data as accurately as possible.
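For reference, if the transfer team does deliver a file with RDW headers only (no BDW), a spark-cobol read along these lines would apply. This is a hypothetical config fragment: the paths are placeholders and `spark` is assumed to be an active SparkSession with the spark-cobol package on the classpath:

```python
# record_format "V" expects RDW headers only (no BDW); paths are placeholders.
df = (spark.read.format("cobol")
      .option("copybook", "/path/to/copybook.cpy")
      .option("encoding", "ebcdic")
      .option("record_format", "V")
      .option("is_rdw_big_endian", "true")
      .load("/path/to/data.dat"))
```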
When converting a variable-block (VB) EBCDIC file, I got the error "The length of BDW block is too big". I tried the following options but am still getting the same error.
dataframe = spark.read.format("cobol") \
    .option("copybook", util_params["copybook_path"]) \
    .option("encoding", "ebcdic") \
    .option("schema_retention_policy", "collapse_root") \
    .option("record_format", "VB") \
    .option("is_bdw_big_endian", "true") \
    .option("is_rdw_big_endian", "true") \
    .option("bdw_adjustment", -4) \
    .option("rdw_adjustment", -4) \
    .option("generate_record_id", True) \
    .load(file_path)
Error:
WARN BlockManager: Putting block rdd_1_0 failed due to exception java.lang.IllegalStateException: The length of BDW block is too big. Got 1223880942. Header: 200,242,240,242, offset: 0..
WARN BlockManager: Block rdd_1_0 could not be removed as it was not found on disk or in memory
ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalStateException: The length of BDW block is too big. Got 1223880942. Header: 200,242,240,242, offset: 0.
Please suggest a way to fix this issue. Could you also share an example where you have tested the VB scenario with an EBCDIC file and copybook, for reference?
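A quick sanity check on the error itself (this is an inference, not a confirmed diagnosis): the four header bytes reported by the message, 200,242,240,242, decode as printable EBCDIC text rather than a plausible binary BDW, which suggests the file starts with character data (for example a header record) and has no BDW at offset 0. The code page cp037 below is an assumption (common US EBCDIC):

```python
# The header bytes from the error message, decoded as EBCDIC (code page 037).
header = bytes([200, 242, 240, 242])  # 0xC8 0xF2 0xF0 0xF2
print(header.decode("cp037"))  # H202
```

This is also consistent with the reported length: clearing the high-order bit of 0xC8F2F0F2 (as an extended-BDW parser would) gives 0x48F2F0F2 = 1223880946, and applying the bdw_adjustment of -4 yields exactly the 1223880942 in the error.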