Packed-Decimal columns help required in outgoing EBCDIC file #628

Open
ManojKolisetty-git opened this issue May 30, 2023 · 3 comments
ManojKolisetty-git commented May 30, 2023

Background [Optional]

Hi,

I have a requirement to create an outgoing fixed-width EBCDIC file with header, detail and footer records, each 155 bytes long. I am aware that Cobrix doesn't support writing EBCDIC files from copybooks, so I am writing the DataFrame as text and encoding it to CP037.

Question

My copybook (file spec) has a few packed-decimal columns. I am trying to understand how numeric data can be converted into packed decimal before encoding to CP037.

Attribute Name = File Date
Format = S9(07) Packed Decimal (This field indicates the file creation date. The date format is Julian (YYYYDDD))
Length = 4

I tried converting the value to binary and populating it in the file, but it does not work when reading the file back with Cobrix (PySpark). Please help with some examples of converting to packed decimal.
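
As a rough illustration of the layout being asked about: a packed-decimal field stores one decimal digit per 4-bit nibble, with the final nibble holding the sign, so S9(07) needs 7 digit nibbles plus 1 sign nibble, which is 4 bytes. The sketch below assumes the common sign convention (0xC for positive, 0xD for negative); the variable names are illustrative only.

```python
# Packed decimal (COMP-3) keeps one digit per nibble and the sign in the last nibble.
# Assumption: 0xC marks a positive value and 0xD a negative one.
file_date = 2023115              # Julian date YYYYDDD, 7 digits

digits = f"{file_date:07d}"      # "2023115"
sign_nibble = "C"                # positive
packed = bytes.fromhex(digits + sign_nibble)

print(packed.hex())              # 2023115c
print(len(packed))               # 4
```

Because the digits map one-to-one onto hex digits, the hex dump of a packed field reads as the number followed by the sign nibble.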

For testing, I am trying to create the outgoing EBCDIC file and read it back using Cobrix in PySpark. Below is the copybook I created for reading the header:

   01 HEADER.
      10  HeaderRecordKey    PIC X(32).
      10  SourceStatus       PIC X(01).
      10  FileDate           PACKED-DECIMAL PIC S9(7).
      10  FileTime           PACKED-DECIMAL PIC S9(11).
      10  Filler             PIC X(112).
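
A quick way to sanity-check this layout against the 155-byte record length is to add up the field sizes. The sketch below assumes the usual packed-decimal sizing rule (one nibble per digit plus one sign nibble, rounded up to whole bytes); `packed_size` and `header_sizes` are hypothetical helper names, not part of Cobrix.

```python
def packed_size(num_digits: int) -> int:
    """Bytes needed for a packed-decimal field: one nibble per digit plus a sign nibble."""
    return num_digits // 2 + 1

# Field sizes for the HEADER layout above (field names taken from the copybook).
header_sizes = {
    "HeaderRecordKey": 32,            # PIC X(32)
    "SourceStatus": 1,                # PIC X(01)
    "FileDate": packed_size(7),       # S9(7)  packed -> 4 bytes
    "FileTime": packed_size(11),      # S9(11) packed -> 6 bytes
    "Filler": 112,                    # PIC X(112)
}

print(sum(header_sizes.values()))     # 155
```
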
@ManojKolisetty-git ManojKolisetty-git added the question Further information is requested label May 30, 2023
Collaborator

yruslan commented May 31, 2023

Packed decimals are supported by Cobrix when such fields have USAGE IS COMP-3.

As you said, Cobrix does not support writing to EBCDIC files, but maybe the source code of the decoder can help you encode them:

def decodeBCDIntegralNumber(bytes: Array[Byte], mandatorySignNibble: Boolean): java.lang.Long = {

Example:

10 SOME-FIELD  PIC S9(4)  USAGE IS COMP-3.

Further documentation:
https://www.ibm.com/docs/en/i/7.2?topic=type-packed-decimal-format
https://www.geeksforgeeks.org/comp-3-in-cobol/
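
To make the decoding direction concrete, here is a minimal Python sketch of how a packed-decimal integer can be read back. It is not the Cobrix implementation, only an illustration of the general idea; `unpack_number` is a hypothetical name, and the set of accepted sign nibbles (0xC and 0xF as positive, 0xD as negative) is an assumption of this sketch.

```python
from typing import Optional

def unpack_number(data: bytes) -> Optional[int]:
    """Decode a packed-decimal integer; return None if the bytes are malformed."""
    if not data:
        return None
    # Split every byte into its high and low nibble.
    nibbles = []
    for b in data:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)
    # The last nibble is the sign, everything before it is a decimal digit.
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        return None
    if sign in (0x0C, 0x0F):
        factor = 1
    elif sign == 0x0D:
        factor = -1
    else:
        return None
    value = 0
    for d in digits:
        value = value * 10 + d
    return factor * value

print(unpack_number(bytes.fromhex("2023115C")))  # 2023115
print(unpack_number(bytes.fromhex("2023115D")))  # -2023115
print(unpack_number(b"2023"))                    # None (ASCII text, not packed)
```

Encoding is the reverse: write the digits as nibbles and append the sign nibble.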

Author

ManojKolisetty-git commented May 31, 2023

Hi,

Thanks for your reply.

Below is the sequence of steps I am following to create the EBCDIC file for the above header:

  1. For all string (non-COMP-3) columns, I encode each column to 'cp037' using a withColumn transformation.
  2. I concatenate all columns and write them to a TEXT file (without encoding again, since the column-by-column encoding was already done in step 1).

The issue is with the COMP-3 columns. I am currently using the UDF below to pack COMP-3 values and storing the result in a BinaryType() column in a DataFrame. As the data is already converted, do I still need to encode it to 'CP037'? I tried, but no luck.

If this is not right, please help me with the sequence of steps to convert to COMP-3 and encode to EBCDIC (CP037).

For example:
value = 2023115
packed = pack_number(value)
Result is = b' #\x11_'

from struct import pack

def pack_number(n):
    """ Pack a COMP-3 number. Format: PIC 9(7). """
    # Cobol numbers are stored without decimal info. Remove the decimal before
    # calling pack_number()
    n = int(n)

    # Is the number negative?  Remember for later.
    negative = False
    if n < 0:
        negative = True
        n *= -1

    # Treat the number as a string.  Makes it easier to loop over.
    n_str = str(n)
    b = int(n_str[0])

    # For each digit, shift it onto the result.
    for c in n_str[1:]:
        b = (b << 4) | int(c)

    # Append the sign nibble: 0xd for negative, 0xf otherwise.
    if negative:
        b = (b << 4) | 0xd
    else:
        b = (b << 4) | 0xf

    # Pack the number as a long long and chop off the unused bits at the
    # beginning.  This will need to be changed for varying PICture clauses.
    b_packed = pack('>q', b)
    if len(b_packed) > 4:
        b_packed = b_packed[-4:]

    return b_packed

Regards,
Manoj
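
The function above always returns 4 bytes, which fits S9(7) but would truncate a 6-byte S9(11) field such as FileTime. A possible generalization, taking the number of digits from the PICture clause, is sketched below. `pack_comp3` is a hypothetical name, the 0xC/0xD sign nibbles are an assumption (a common convention for signed fields), and the FileTime value is made up.

```python
def pack_comp3(value: int, num_digits: int) -> bytes:
    """Pack an integer as COMP-3 for a PIC S9(num_digits) field."""
    value = int(value)
    digits = str(abs(value))
    if len(digits) > num_digits:
        raise ValueError(f"{value} does not fit in {num_digits} digits")
    sign = "D" if value < 0 else "C"
    # Digits plus the sign nibble must fill whole bytes, so an even
    # num_digits gets one extra leading zero nibble.
    total_nibbles = num_digits + 1
    if total_nibbles % 2:
        total_nibbles += 1
    hex_str = digits.zfill(total_nibbles - 1) + sign
    return bytes.fromhex(hex_str)

print(pack_comp3(2023115, 7).hex())   # 2023115c      (4 bytes)
print(pack_comp3(93015, 11).hex())    # 00000093015c  (6 bytes)
print(pack_comp3(-42, 4).hex())       # 00042d        (3 bytes)
```
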

Collaborator

yruslan commented Aug 4, 2023

Hi @ManojKolisetty-git , sorry for the late response.

COMP-3 fields should not be encoded in CP037 or any other encoding. Encodings apply only to strings and to numerics in DISPLAY format. COMP-3 defines an encoding-agnostic binary representation of numbers.
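
A minimal sketch of what this means when assembling the header record in plain Python: text fields go through the cp037 codec, while the packed fields are appended as raw bytes. The field values here are hypothetical, and the 0xC sign nibble is an assumption; only the field widths come from the copybook earlier in the thread.

```python
# Hypothetical header values; only the field layout comes from the copybook.
record_key = "HDR"
source_status = "A"

# Text (DISPLAY) fields: pad to the field width, then encode to EBCDIC.
key_bytes = record_key.ljust(32).encode("cp037")
status_bytes = source_status.ljust(1).encode("cp037")
filler_bytes = "".ljust(112).encode("cp037")

# COMP-3 fields: already binary, so they are appended as-is (no cp037 step).
file_date = bytes.fromhex("2023115C")        # S9(7),  4 bytes
file_time = bytes.fromhex("00000123456C")    # S9(11), 6 bytes, made-up value

header = key_bytes + status_bytes + file_date + file_time + filler_bytes
print(len(header))                           # 155

# Running the packed bytes through a text codec changes them:
print(file_date.decode("latin-1").encode("cp037") == file_date)   # False
```

When such a record is written to disk, the file would need to be opened in binary mode so that nothing re-encodes the bytes or appends line separators.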

Did the code above work in the end to create COMP-3 fields?
