[Bug]: Table Input + Text File Output - trailing spaces in output #4303

dave-csc · 2024-09-09T10:26:53Z

Apache Hop version?

2.9.0, 2.10.0 SNAPSHOT

Java version?

openjdk 21.0.2 2024-01-16

Operating system

Linux

What happened?

Steps to reproduce:

Create a template pipeline for Metadata Injection, with a Table Input and a Text File Output transforms
Create a second pipeline, inject in the previous one a query on an Oracle connection and the file in which writing the output

The VARCHAR2/String values are always written in the file with the trailing spaces needed to match their maximum length, regardless you choose or not the flag "Right pad fields?".

A workaround would probably be to use a String operations to trim data manually, but to use that you need to specify the table/view schema (so you can't use it with Metadata Injection)

Issue Priority

Priority: 2

Issue Component

Component: Metadata, Component: Transforms

dave-csc · 2024-09-09T13:35:47Z

Update: the same behaviour occurs even without the use of Metadata Injection (i.e. the simplest pipeline Table Input -> Text File Output).

Update # 2: I managed to have a simpler file by selecting the option "Fast data dump", in this case the output has no trailing spaces as expected.

I suggest to revise the functionalities "Right pad fields" and "Fast data dump": I would expect to have just the first one in the UI and select it if I like to have a padded output (or keep it unselected if I don't want to). In the documentation it's not so clear what this sentence means: "Improves the performance when dumping large amounts of data to a text file by not including any formatting information", since a text file has no format information...

hansva · 2024-09-09T17:17:19Z

I suspect that you have specified a length on the text fields in the text file output, putting the length to 0 will limit the size to the exact length.

The main reason for this is that some people need fixed width fields (usually older systems)

There is a button in the bottom for "minimal width" that will change the settings accordingly

For the trim types the documentation states:

The trimming method to apply on the string. Note: Trimming only works when there is no field length given.

dave-csc · 2024-09-10T07:47:44Z

I suspect that you have specified a length on the text fields in the text file output, putting the length to 0 will limit the size to the exact length.

Actually not: I just created a template pipeline with the two components and injected the SQL query and the file to write (I can't set a schema in Text File Output, since I'm using this template to dump several tables in text files).
When developing the same pipeline as a stand-alone one, you can use the Get Fields button and you get the schema from the database query, including the field lengths.
By the way, lengths are retrieved for Integer values too, but differently from Strings they're not applied in the files with the default options (they occupy the minimum length, even with the option "Right pad fields" enabled).

There is a button in the bottom for "minimal width" that will change the settings accordingly

It works if you set a stand-alone pipeline, but this function can't be set through metadata injection

The main reason for this is that some people need fixed width fields (usually older systems)
[...]
For the trim types the documentation states:

The trimming method to apply on the string. Note: Trimming only works when there is no field length given.

I was referring to the "Fast data dump" option, not the "Trim type" one. And indeed for that reason I think the "Fast data dump" and "Right pad fields" should be merged and corrected: you could only implement a single flag like "Write fixed length fields". When enabled it prints a file with fixed width fields (suitable for older systems), when disabled the file is written in CSV-style...

hansva · 2024-09-10T08:23:53Z

Fast data dump is not related to fixed width, it just means it will ignore the formatting options you specify on the fields. Eg we will dump the raw float when the field is a number and we don't format it to #.##. You are a bit in an edge case where the size of the strings is coming from the database but you want to ignore those.

dave-csc added awaiting triage bug labels Sep 9, 2024

github-actions bot added P2 Default Priority Metadata Transforms labels Sep 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Table Input + Text File Output - trailing spaces in output #4303

[Bug]: Table Input + Text File Output - trailing spaces in output #4303

dave-csc commented Sep 9, 2024

dave-csc commented Sep 9, 2024 •

edited

Loading

hansva commented Sep 9, 2024 •

edited

Loading

dave-csc commented Sep 10, 2024

hansva commented Sep 10, 2024

[Bug]: Table Input + Text File Output - trailing spaces in output #4303

[Bug]: Table Input + Text File Output - trailing spaces in output #4303

Comments

dave-csc commented Sep 9, 2024

Apache Hop version?

Java version?

Operating system

What happened?

Issue Priority

Issue Component

dave-csc commented Sep 9, 2024 • edited Loading

hansva commented Sep 9, 2024 • edited Loading

dave-csc commented Sep 10, 2024

hansva commented Sep 10, 2024

dave-csc commented Sep 9, 2024 •

edited

Loading

hansva commented Sep 9, 2024 •

edited

Loading