Be explicit about the datatypes of each column in csv files #68

ablack3 · 2024-09-18T08:12:01Z

We have Eunomia CDM datasets stored in CSV files. Currently, the datatype of each column is not explicitly specified when the data is read from CSV, which is causing #65.

In this PR I'm using the specification in the CommonDataModel package to be explicit about the datatypes when we read the CSV files, which should fix the issue. However, this does mean that the column order matters.
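To illustrate why explicit types tie the reader to column order, here is a minimal sketch in Python (not the package's own code). The `SPEC` list is a hypothetical, abbreviated stand-in for the CommonDataModel specification, and `read_typed` is an illustrative helper that applies the i-th parser to the i-th field.

```python
import csv
import io
from datetime import date

# Hypothetical, abbreviated specification: (column name, parser) in spec order.
SPEC = [
    ("person_id", int),
    ("condition_start_date", date.fromisoformat),
    ("condition_source_value", str),  # kept as text so leading zeros survive
]

def read_typed(text, spec):
    """Parse CSV text, casting the i-th field with the i-th parser in spec."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header; types are applied purely by position
    rows = []
    for record in reader:
        rows.append(tuple(parse(value) for (_, parse), value in zip(spec, record)))
    return rows

sample = (
    "person_id,condition_start_date,condition_source_value\n"
    "1,1970-01-01,00123\n"
)
rows = read_typed(sample, SPEC)
```

Because the parsers are matched to fields by position, a file whose columns are in a different order would have the wrong parser applied to a field, which is the ordering concern raised above.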

I'm not sure whether we consider column order (first, second, etc.) part of the CDM specification, but I noticed that in the GiBleed dataset the column order does not match the order in the CommonDataModel specification CSV. We can work around it and/or fix the file. Allowing columns to be in any order is possible but a bit trickier.
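One way to allow columns in any order is to look up each column's type by name from the header rather than by position. The following Python sketch shows the idea under stated assumptions: `SPEC_TYPES` is a hypothetical, abbreviated type mapping and `read_by_name` is an illustrative helper, neither taken from this PR.

```python
import csv
import io

# Hypothetical, abbreviated mapping from column name to parser.
SPEC_TYPES = {
    "person_id": int,
    "condition_concept_id": int,
    "condition_source_value": str,
}

def read_by_name(text, types):
    """Parse CSV text, choosing each field's parser from the header name."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(types) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Build each row in specification order, regardless of file order.
    return [{name: parse(row[name]) for name, parse in types.items()} for row in reader]

in_spec_order = "person_id,condition_concept_id,condition_source_value\n1,192671,K92.2\n"
reordered = "condition_source_value,person_id,condition_concept_id\nK92.2,1,192671\n"

rows_a = read_by_name(in_spec_order, SPEC_TYPES)
rows_b = read_by_name(reordered, SPEC_TYPES)
```

Both inputs yield the same typed rows, at the cost of requiring a header row whose names match the specification.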


ablack3 · 2024-09-18T10:10:35Z

I need to investigate and fix the failing tests.

fdefalco · 2024-09-18T12:53:50Z

Thanks for looking into this. It's another reason the DuckDB-based data examples are a nice direction to go in.

fdefalco · 2024-09-18T12:56:54Z

For the column order, I would suggest that the data files should match the order of the columns defined by the CDM specification, so would we rather update the data files to follow that column order as a fix?

ablack3 · 2024-09-18T16:37:35Z

> For the column order, I would suggest that the data files should match the order of the columns defined by the CDM specification, so would we rather update the data files to follow that column order as a fix?

That would be my preference as well. So we would require CSV files to have their columns in the same order as the CommonDataModel specification.
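If column order becomes a requirement, a reader could check the header before parsing and fail early on a mismatch. This is only a sketch in Python: `check_column_order` and the column names are hypothetical and do not come from the package.

```python
import csv
import io

def check_column_order(text, expected):
    """Return the header if it matches the expected order, else raise ValueError."""
    header = next(csv.reader(io.StringIO(text)))
    if header != list(expected):
        raise ValueError(f"expected columns {list(expected)}, found {header}")
    return header

# Hypothetical, abbreviated column list in specification order.
expected_columns = ["condition_occurrence_id", "person_id", "condition_concept_id"]

ok_header = check_column_order(
    "condition_occurrence_id,person_id,condition_concept_id\n1,1,192671\n",
    expected_columns,
)
```

A check like this reports a misordered file directly instead of letting it surface later as a type error.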

When reading in csv files explicity specify the datatypes of each column.

be002d9

ablack3 changed the base branch from main to develop September 18, 2024 08:12

ablack3 marked this pull request as draft September 18, 2024 08:12

add workaround for gibleed condition_occurrence csv column ordering

29dc6dd

ablack3 marked this pull request as ready for review September 18, 2024 08:28

update version of upload-artifact in github actions workflow

8a7b6ae
