Error on OMOP to MEDS conversion #37

Open
ealonso-vicomtech opened this issue Oct 4, 2024 · 2 comments

ealonso-vicomtech commented Oct 4, 2024

Hi!

I am trying to run the meds_etl_omop command to convert my OMOP dataset to MEDS, but I'm getting the following error:

Generating metadata from OMOP `concept` table
1it [00:06,  6.49s/it]
Decompressing OMOP tables, mapping to MEDS Flat format, writing to disk...
 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                              | 4/5 [00:34<00:08,  8.63s/it]
Traceback (most recent call last):
  File "/data/venvs/lucia/bin/meds_etl_omop", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 747, in main
    process_table_csv(task)
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 379, in process_table_csv
    write_event_data(
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 315, in write_event_data
    event_data.sink_parquet(fname, compression="zstd", compression_level=1, maintain_order=False)
  File "/data/venvs/lucia/lib/python3.11/site-packages/polars/utils/unstable.py", line 59, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lucia/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2193, in sink_parquet
    return lf.sink_parquet(
           ^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: failed to determine supertype of list[extension] and datetime[μs]

Error originated just after this operation:
FILTER String(SNOMED/184099003).strict_cast(String).is_not_null() FROM
RENAME
  DF ["person_id", "gender_concept_id", "year_of_birth", "month_of_birth"]; PROJECT */18 COLUMNS; SELECTION: "None"

How can I deal with it? It looks like the error is related to the person table, but I cannot see which columns have the unsupported type.

I'm using the following versions:

meds==0.1.3
meds_etl==0.1.3

When I upgrade meds to the latest version, it gives me this error:

Generating metadata from OMOP `concept` table
1it [00:07,  7.23s/it]
Decompressing OMOP tables, mapping to MEDS Unsorted format, writing to disk...
 40%|████████████████████████████████████████████████████████████████████████████████████████████                                                                                                                                          | 2/5 [00:12<00:19,  6.34s/it]condition
incomplete mapping specified for `replace_strict`

Hint: Pass a `default` value to set unmapped values.
STREAMING:
   SELECT [col("person_id").strict_cast(Int64).alias("subject_id"), col("condition_start_datetime").str.strptime([String(raise)]).coalesce([col("condition_start_datetime").str.strptime([String(raise)]).dt.offset_by([String(1d)]).dt.offset_by([String(-1s)])]).coalesce([col("condition_start_date").str.strptime([String(raise)]).coalesce([col("condition_start_date").str.strptime([String(raise)]).dt.offset_by([String(1d)]).dt.offset_by([String(-1s)])])]).alias("time"), when([(col("__POLARS_CSER_0xddde4da9d1b6e86d")) != (0)]).then(col("__POLARS_CSER_0xddde4da9d1b6e86d")).otherwise(when([(col("__POLARS_CSER_0x37fad8cbfee072e")) != (0)]).then(col("__POLARS_CSER_0x37fad8cbfee072e")).otherwise(null.strict_cast(Int64))).replace_strict([Series, Series]).alias("code"), null.strict_cast(String).str.strip_chars([null]).cast(Float32).alias("numeric_value"), when(null.strict_cast(String).str.strip_chars([null]).cast(Float32).is_null()).then(null.strict_cast(String).str.strip_chars([null])).otherwise(null.strict_cast(String)).alias("text_value"), String(condition).alias("table"), col("visit_occurrence_id").alias("visit_id"), col("condition_end_datetime").str.strptime([String(raise)]).coalesce([col("condition_end_datetime").str.strptime([String(raise)]).dt.offset_by([String(1d)]).dt.offset_by([String(-1s)])]).alias("end")] FROM
     WITH_COLUMNS:
     [col("condition_source_concept_id").strict_cast(Int64).alias("__POLARS_CSER_0xddde4da9d1b6e86d"), col("condition_concept_id").strict_cast(Int64).alias("__POLARS_CSER_0x37fad8cbfee072e")] 
      FILTER when([(col("condition_source_concept_id").strict_cast(Int64)) != (0)]).then(col("condition_source_concept_id").strict_cast(Int64)).otherwise(when([(col("condition_concept_id").strict_cast(Int64)) != (0)]).then(col("condition_concept_id").strict_cast(Int64)).otherwise(null.strict_cast(Int64))).replace_strict([Series, Series]).is_not_null() FROM
        RENAME
          DF ["condition_occurrence_id", "person_id", "condition_concept_id", "condition_start_date"]; PROJECT 7/16 COLUMNS; SELECTION: None
 40%|████████████████████████████████████████████████████████████████████████████████████████████                                                                                                                                          | 2/5 [00:17<00:26,  8.68s/it]
Traceback (most recent call last):
  File "/data/venvs/lucia/bin/meds_etl_omop", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 757, in main
    process_table_csv(task)
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 381, in process_table_csv
    write_event_data(
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 322, in write_event_data
    raise e
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 317, in write_event_data
    event_data.sink_parquet(fname, compression="zstd", compression_level=1, maintain_order=False)
  File "/data/venvs/lucia/lib/python3.11/site-packages/polars/_utils/unstable.py", line 58, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lucia/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2385, in sink_parquet
    return lf.sink_parquet(
           ^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: incomplete mapping specified for `replace_strict`

Hint: Pass a `default` value to set unmapped values.
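
One way to read the plan above: the `code` expression takes condition_source_concept_id when it is nonzero, falls back to condition_concept_id, and only then applies `replace_strict`. The plain-Python model below is a rough sketch of that selection and of a strict lookup. The function names and concept IDs are made up for illustration, the mapping contents are an assumption, and this is not the meds_etl or polars implementation. Under that reading, a nonzero source concept ID that is missing from the mapping would be enough to trigger the error, even when condition_concept_id itself is mapped.

```python
from typing import Dict, Optional


def select_concept_id(
    source_concept_id: Optional[int], concept_id: Optional[int]
) -> Optional[int]:
    """Model of the when/then/otherwise chain shown in the plan.

    Prefer the source concept ID when it is present and nonzero,
    otherwise fall back to the standard concept ID, otherwise None.
    """
    if source_concept_id is not None and source_concept_id != 0:
        return source_concept_id
    if concept_id is not None and concept_id != 0:
        return concept_id
    return None


def strict_lookup(
    concept_id: Optional[int], code_by_concept_id: Dict[int, str]
) -> Optional[str]:
    """Simplified model of a strict replacement.

    Assumption: missing (None) inputs pass through, and any other
    value without a mapping entry is an error.
    """
    if concept_id is None:
        return None
    if concept_id not in code_by_concept_id:
        raise KeyError(f"incomplete mapping: no entry for concept_id {concept_id}")
    return code_by_concept_id[concept_id]


# Hypothetical mapping and row values, for illustration only.
mapping = {1001: "SNOMED/0000001"}

# The standard concept (1001) is mapped, but the nonzero source concept
# (2002) is selected first and has no entry, so the strict lookup fails.
chosen = select_concept_id(source_concept_id=2002, concept_id=1001)
try:
    strict_lookup(chosen, mapping)
except KeyError as err:
    print(err)
```

If this reading is right, checking only condition_concept_id against the concept table would not be sufficient.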

EthanSteinberg (Collaborator) commented

@ealonso-vicomtech That error seems to indicate that your condition_occurrence table contains concept IDs that have no entries in your concept table.

In other words, you may have OMOP validity issues.

Can you verify that every concept id within your condition occurrence table has a corresponding concept entry?
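
A minimal sketch of that check, using only the Python standard library. It assumes the tables are CSV files with a header row, that concept IDs can be compared as strings, and that 0 or an empty value means "no concept". The file paths in the usage comment and the function names are hypothetical. It checks condition_source_concept_id as well as condition_concept_id, since the query plan in the error output references both.

```python
import csv
from collections import Counter
from typing import Dict, Iterator, Set

# Columns of condition_occurrence that appear in the failing query plan.
CONCEPT_COLUMNS = ("condition_source_concept_id", "condition_concept_id")


def iter_rows(path: str, delimiter: str = ",") -> Iterator[Dict[str, str]]:
    """Yield rows as dicts, with header names lowercased and stripped."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        reader.fieldnames = [n.strip().lower() for n in (reader.fieldnames or [])]
        yield from reader


def load_concept_ids(concept_csv: str, delimiter: str = ",") -> Set[str]:
    """Collect every concept_id present in the concept table."""
    return {row["concept_id"].strip() for row in iter_rows(concept_csv, delimiter)}


def find_unmapped(
    condition_csv: str, known_ids: Set[str], delimiter: str = ","
) -> Dict[str, Counter]:
    """Count, per column, the nonzero concept IDs missing from known_ids."""
    missing = {col: Counter() for col in CONCEPT_COLUMNS}
    for row in iter_rows(condition_csv, delimiter):
        for col in CONCEPT_COLUMNS:
            value = (row.get(col) or "").strip()
            # Skip empty values and 0, which are treated as "no concept".
            if value and value != "0" and value not in known_ids:
                missing[col][value] += 1
    return missing


# Usage (hypothetical paths; pass delimiter="\t" for tab-separated files):
#   known = load_concept_ids("omop/concept.csv")
#   for col, counts in find_unmapped("omop/condition_occurrence.csv", known).items():
#       print(col, counts.most_common(10))
```

An empty counter for both columns would suggest the problem lies elsewhere, for example in how the values are parsed rather than in the concept table itself.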

ealonso-vicomtech (Author) commented

Hi,

Every concept_id in the condition_occurrence table has a corresponding entry in the concept table.

I am using OMOP CDM v5.4, and all the condition_source_ids are standard (SNOMED vocabulary).
