Reading table blocks fails when there is a comment to the right of the column name row #72

Closed
jfcorbett opened this issue Nov 10, 2020 · 1 comment · Fixed by #73
Labels
bug Something isn't working

jfcorbett commented Nov 10, 2020

The reader functions fail when a (perfectly legal) comment is located to the right of the column name row in a non-transposed table. For example, reading this CSV data fails:

        **places;
        all
        place;distance;ETA;is_hot;;;---> parser chokes on this perfectly legal comment <---; 
        text;km;datetime;onoff
        home;0.0;2020-08-04 08:00:00;1
        work;1.0;2020-08-04 09:00:00;0
        beach;2.0;2020-08-04 17:00:00;1

This is due to a misconceived "leniency" in pdtable.io.parsers.blocks.preprocess_column_names():

def preprocess_column_names(col_names_raw: Sequence[str], fixer: ParseFixer):
    """
       handle known issues in column_names
    """
    n_names_col = len(col_names_raw)
    for el in reversed(col_names_raw):
        if el is not None and len(el) > 0:
            break
        n_names_col -= 1

    ...

Thus everything on the column name line is counted as a column name up to the last non-blank cell, including any comments and all the empty cells between the actual column names and comments.
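To make the effect concrete, here is a minimal sketch that repeats the counting loop quoted above on the column name row from the example. The wrapper name `count_name_cells` is made up for illustration, and the row is assumed to be already split on `;` with whitespace stripped; how pdtable actually splits and strips cells is not shown in this issue.

```python
from typing import Optional, Sequence


def count_name_cells(col_names_raw: Sequence[Optional[str]]) -> int:
    """Hypothetical wrapper around the loop quoted above.

    Counts every cell up to and including the last non-blank one.
    """
    n_names_col = len(col_names_raw)
    for el in reversed(col_names_raw):
        if el is not None and len(el) > 0:
            break
        n_names_col -= 1
    return n_names_col


# Column name row from the example, assumed already split and stripped
cells = ["place", "distance", "ETA", "is_hot", "", "", "---> comment <---", ""]

n_counted = count_name_cells(cells)
print(n_counted)  # 7, although only the first 4 cells are column names
```

Only the trailing blank cell is dropped. The two blank cells and the comment are all counted as column names.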

The result is later passed to a ParseFixer via fixer.fix_missing_column_name(input_columns=column_names). The fixer then assumes that the empty cells are simply column names that the user forgot to write in, and replaces them with placeholder column names 'missing_fixed_000', 'missing_fixed_001', ....
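The placeholder step can be illustrated with a stand-in function. This is only a sketch of the behaviour described above: `fill_missing_names` is a hypothetical name and is not the actual ParseFixer implementation, whose code is not shown in this issue.

```python
from typing import List, Sequence


def fill_missing_names(input_columns: Sequence[str]) -> List[str]:
    """Hypothetical stand-in: give each blank cell a placeholder name."""
    fixed = []
    n_missing = 0
    for name in input_columns:
        if name:
            fixed.append(name)
        else:
            # Blank cell is treated as a column name the user forgot
            fixed.append(f"missing_fixed_{n_missing:03d}")
            n_missing += 1
    return fixed


# Cells counted as "column names" for the example row (comment included)
counted = ["place", "distance", "ETA", "is_hot", "", "", "---> comment <---"]
fixed_names = fill_missing_names(counted)
print(fixed_names)
```

For the example row this yields seven "column names" (four real ones, two placeholders, and the comment text), while the units row below it has only four cells.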

This breaks support for comments. All of this should be ripped out.
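One possible replacement, sketched under the assumption that the column name row ends at the first blank cell and that anything further right is comment: keep only the leading run of non-blank cells. The function name `leading_column_names` is made up for illustration, and this is not necessarily what #73 does.

```python
from itertools import takewhile
from typing import List, Optional, Sequence


def leading_column_names(col_names_raw: Sequence[Optional[str]]) -> List[str]:
    """Return cells up to, but not including, the first blank cell."""
    return list(
        takewhile(lambda el: el is not None and len(el) > 0, col_names_raw)
    )


# Column name row from the example, assumed already split and stripped
cells = ["place", "distance", "ETA", "is_hot", "", "", "---> comment <---", ""]

names = leading_column_names(cells)
print(names)  # ['place', 'distance', 'ETA', 'is_hot']
```

With this rule there are no empty cells left for a fixer to fill in, so the comment no longer affects the column count.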

@jfcorbett jfcorbett added the bug Something isn't working label Nov 10, 2020
@jfcorbett jfcorbett self-assigned this Nov 10, 2020
@jfcorbett jfcorbett linked a pull request Nov 11, 2020 that will close this issue
@jfcorbett

solved by #73
