[Proposal] Multimodality #50

Open
mmcdermott opened this issue Dec 16, 2024 · 3 comments
Labels
Core Data: Relevant to the core data/**.parquet schema
enhancement: New feature or request
Pending Community Use: Issues that should be solved after sufficient community uptake and use to dictate method
priority:medium: Medium priority; should be triaged for inclusion in near-term releases.

Comments

@mmcdermott
Contributor

This is a tracking issue for discussions about supporting multimodality in general. A few starting ideas:

  1. In principle, single-point-in-time modalities such as images or notes are easy to support; they are just additional value columns in the MEDS schema. A few projects already use text_value columns for notes, for example. Other column types, e.g., images, would need to be typed/standardized, or we'd need a library of supported types tied to on-disk storage, e.g.,
  2. Time-varying modalities, such as waveforms, videos, or audio recordings, are more challenging to support. While some of these can be so short that they could functionally be considered single-point-in-time, they nevertheless challenge our current paradigm.
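
A minimal sketch of the first point, assuming events are plain rows with the core MEDS columns (subject_id, time, code, numeric_value) plus the text_value column mentioned above. The rows, the codes, and the `events_with_modality` helper are hypothetical; they only illustrate that a single-point-in-time modality can be treated as one more nullable value column:

```python
from datetime import datetime

# Illustrative rows only: core MEDS-style columns plus an optional `text_value`.
# The codes and values are made up for this sketch.
events = [
    {
        "subject_id": 1,
        "time": datetime(2024, 1, 1, 8, 0),
        "code": "LAB//HR",
        "numeric_value": 72.0,
        "text_value": None,
    },
    {
        "subject_id": 1,
        "time": datetime(2024, 1, 1, 9, 30),
        "code": "NOTE//DISCHARGE",
        "numeric_value": None,
        "text_value": "Patient stable.",
    },
]


def events_with_modality(rows, column):
    """Return the rows whose value in `column` is populated (hypothetical helper)."""
    return [row for row in rows if row.get(column) is not None]


notes = events_with_modality(events, "text_value")
```

Here `notes` holds only the discharge-note row; the lab measurement is untouched, so consumers that ignore text_value see the same data as before.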
@EthanSteinberg
Collaborator

I think the best solution for images is to introduce a new path in our MEDS extracts and then have strings that refer to those paths in the event/parquet files.

The worry is that the images are so large that including them in the parquets themselves will cause problems.

I agree that time-varying modalities require a lot more thought.
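
A rough sketch of the path-based idea for images, under stated assumptions: the column name `image_path`, the `images/` directory, the extract root name, and the `resolve_image_path` helper are all hypothetical and not part of any agreed MEDS schema. Event rows store a string relative to the extract root, and readers resolve it only when they need the image:

```python
from pathlib import Path, PurePosixPath


def resolve_image_path(extract_root: Path, image_path: str) -> Path:
    """Resolve a relative path stored in an event row against the extract root.

    Hypothetical helper: rejects absolute paths and paths containing `..`,
    so a stored string cannot point outside the extract.
    """
    rel = PurePosixPath(image_path)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"image path must stay inside the extract: {image_path!r}")
    return extract_root.joinpath(*rel.parts)


# Assumed layout: events in <extract_root>/data/**.parquet, images in <extract_root>/images/.
extract_root = Path("my_meds_extract")
event = {
    "subject_id": 1,
    "time": "2024-01-01T08:00:00",
    "code": "IMAGING//CXR",
    "image_path": "images/1/cxr_0001.png",  # string reference, not image bytes
}
resolved = resolve_image_path(extract_root, event["image_path"])
```

Storing relative rather than absolute paths keeps the extract relocatable, and the parquet files stay small because they hold only short strings.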

@mmcdermott added the enhancement, priority:medium, and Core Data labels Jan 7, 2025
@mmcdermott
Contributor Author

I agree that images should be stored via file paths.

@mmcdermott added the Pending Community Use label Jan 7, 2025
@mmcdermott
Contributor Author

I propose we keep this issue open but add sub-issues for individual modalities as they gain sufficient traction and need in the community, e.g., an issue for text_value for text features, one for paths to images, one for paths to short-duration waveform data, etc.


2 participants