instructions to load dataset to Amazon Athena #1268

Open
wants to merge 3 commits into
base: main

@yuvallb commented Mar 3, 2022

No description provided.

@Nintorac left a comment

Hey, thanks for writing this! Saved me some time :D

I haven't verified the rest of the schemas yet, but these came up as issues for us. When I have some time I will verify the rest and update if necessary.

mimic-iv/buildmimic/athena/schema.sql (outdated, resolved)
mimic-iv/buildmimic/athena/schema.sql (outdated, resolved)
@yuvallb (Author) commented Oct 31, 2022

Thanks Nintorac!
Let me know if there are other issues, and also if all went well.

@Nintorac commented Nov 2, 2022

There ended up being issues with most of the tables. I pretty much just set every varchar to string, so there may be more improvements to make, but here is the current schema I've come up with:

schema.tar.gz (this is outdated now; will upload again later)

I'm also seeing some weird results for some tables, e.g.:

SELECT * FROM "mimic_iv_raw"."d_labitems" limit 10;

image

Row 3 has a column value that looks like "something, else", and Athena is splitting on the comma inside the quotes.

@alistairewj (Member) commented Jan 15, 2023

Would be happy to merge (a working version of) this - my initial thoughts are:

  1. I'm not familiar with loading data into Athena, but my hunch is that you need to specify that the fields are quoted with ".
  2. Needs an update for v2.2. The tables in core have been moved to hosp, and there are new columns.
  3. Not sure what the distinction is between the mimiciv_parquet and mimiciv_csv schemas. In general it would be great to keep the mimiciv_hosp, mimiciv_icu schema names as that makes concept code much easier to transfer.
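
For reference, a minimal sketch of what points 1 and 3 might look like for d_labitems. This is not taken from the PR's schema.sql. It assumes Athena's OpenCSVSerde is used to handle quoted fields, that the column list matches the CSV header (please check it against the actual file), and that the S3 location is a placeholder bucket path. All columns are declared as string, in line with the workaround described above.

```sql
-- Sketch only: run each statement as a separate Athena query.
-- Schema name follows the mimiciv_hosp convention suggested above.
CREATE DATABASE IF NOT EXISTS mimiciv_hosp;

-- Column list is an assumption; verify it against the CSV header row.
CREATE EXTERNAL TABLE IF NOT EXISTS mimiciv_hosp.d_labitems (
  itemid   string,
  label    string,
  fluid    string,
  category string
)
-- OpenCSVSerde respects quoteChar, so a value such as "something, else"
-- stays in a single column instead of being split at the comma.
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\'
)
STORED AS TEXTFILE
-- Hypothetical location; replace with the folder that holds the CSV.
LOCATION 's3://your-bucket/mimic-iv/hosp/d_labitems/'
-- Skip the header row so it is not returned as data.
TBLPROPERTIES ('skip.header.line.count' = '1');
```

With a table defined this way, the earlier SELECT ... LIMIT 10 query should show whether the quoted label values now land in one column. Numeric columns would need an explicit CAST at query time, since everything is read as string in this sketch.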

3 participants