Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make data source definition more flexible to allow for dataset updates #37

Closed
mlondschien opened this issue Jul 28, 2023 · 7 comments
Closed

Comments

@mlondschien
Copy link
Contributor

Related: #33, #28.

The links (and thus versions) of data sources are currently hardcoded in the data-sources.json. This makes ricu unnecessarily inflexible. Adding a new data source requires changing the ricu source code (e.g., #30) as does benefitting from the data source's updates (e.g., caregiver_id was added to miiv tables in v2.2, which is currently not accessible via ricu).

@prockenschaub
Copy link
Collaborator

I agree with you that this is urgently needed.

This will probably require a bit more thought and some more involved restructuring/refactoring. Would you have a proposal on how to go about this and/or resources to support the implementation of such a proposal?

@mlondschien
Copy link
Contributor Author

Honestly, my R knowledge is too limited to be of technical help here.

Design-wise, I believe that updating existing data sources might be tricky if you want to keep backward compatibility.

@nbenn
Copy link
Member

nbenn commented Sep 29, 2023

@mlondschien I believe we briefly discussed this offline, but just to re-iterate: what you claim above is not true.

Adding a new data source requires changing the ricu source code (e.g., #30).

Data source definitions (as are concept definitions) are user configurable. Maybe the docs at ?ricu::load_src_cfg can serve as starting point for this. We have used this in several places ourselves (and I'm sure @dplecko could share his experience here). Having external data sets and integrating them with the "core" ricu data sets is a central piece of functionality offered by ricu. If you would want to use a newer version of one of ricu's data sets you can use this alongside the older version or in its place.

@mlondschien
Copy link
Contributor Author

Thanks @nbenn! Some guidance on how to do this, optimally an example, would be great.

@nbenn
Copy link
Member

nbenn commented Sep 29, 2023

  1. check out docs starting with ?ricu::load_src_cfg
  2. create your new source config as JSON (maybe call it something like mimic2, make sure (via the class_prefix entry, objects derived from this config inherit from the respective "mimic" classes; copy the mimic config ricu ships and change where necessary
  3. walk through the steps of download_src(), import_src() and attach_src(); most of this should work out of the box via inheritance; maybe check out docs here too
  4. make sure load_difftime(), load_src(), etc. work, but I believe they should, again via inheritance and assuming that there will not be changes between versions that affect the logic here
  5. add concepts for the new data source (mostly a copy/paste exercise again) to a local concept dictionary

I don't know in what ways newer mimic versions differ from what we had ~3 years ago. I'm assuming it's mostly adding some table(s), adding/removing patients. Maybe they dropped CareVue patients? That would simplify concept specification as fewer items have to be considered, but again seems largely backwards compatible. From a distance, I'm hoping that these changes do not really require changing mimic-specific logic but can mostly be handled by config.

Maybe also check in with @prockenschaub as he's working on adding new mimic to ricu I believe.

@prockenschaub
Copy link
Collaborator

I created a (temporary) repository that collects the all the different databases configs, including a short description of how to load them.

@mlondschien maybe you can have a look and check that it works for you. Long-term, I think it would be nice for backward compatibility to include those earlier versions with ricu. Maybe we can also turn the README in my repo into a proper vignette?

@nbenn while I agree that what Malte wants should already be possible, I think we need to provid more guidance for the user. From my experience, the learning curve of ricu can be quite steep. I think shipping configs for each version of MIMIC IV could already go a long way of easing people into it and also highlighting what parts of the config need to be changed to achieve what. load_difftime() etc. thankfully have not been affected by version changes so far. What may change is concepts, though, so a systematic way to not copy paste but inherit and, where necessary, overwrite concepts for new database versions may make sense.

@dplecko
Copy link
Member

dplecko commented Dec 29, 2023

I have discussed this with @nbenn. While I can see that having dataset versioning would be useful, this would create quite a lot of overhead in terms of re-structuring things. Therefore, we will not prioritize this currently, given the fact that other things have a better value/time invested ratio. Instead, we have opted for a different solution: we will try to be up-to-date with the "latest" version of each dataset.

For cases in which an older version of a dataset is needed, one may add this dataset locally. If you think examples for how this could be done would be helpful, please feel free to open a new issue. Adding local datasets is described to some degree here.

@dplecko dplecko closed this as completed Dec 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants