Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation updates #35

Merged
merged 5 commits into from
Jul 30, 2024
Merged

Conversation

EthanSteinberg
Copy link
Collaborator

No description provided.

README.md Outdated
Comment on lines 17 to 20
2. A _measurement_ or _patient measurement_ or _observation_ in a MEDS dataset refers to a single measurable
quantity observed about the patient during their care. These observations can take on many forms, such as
observing a diagnostic code being applied to the patient, observing a patient's admission or transfer
from one unit to another, observing a laboratory test result, but always correspond to a single
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the measurement abstraction exists in MEDS anymore since we don't have nesting.

These are just "events" now. "event" is now the term for a single measurable quantity observed about a patient during their care.

This section should be rewritten to refer to "event"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an objection to using "event" instead of "measurement", but I want there to be some term to refer to all observations that occur at a single unique timestamp. This is useful in several contexts, even when we don't nest, such as:

  1. Providing a per-patient integral index of the ordered unique timestamps in their sequence that can be used during tabularization (meds tab uses this), for matching to label cohorts (meds-tab and meds-torch use this), or efficiently subselecting tensorized data over a certain time range (the nested ragged tensorization code uses this).
  2. Differentiating pre-processing transformations that affect the observations within a timepoint but do not affect the timepoints of the data from those that do affect the timepoints of the data.
  3. Differentiating models that operate over sequences of observations/measurements vs. those that operate over sequences of unique timestamps.

To me, the natural division is to use the term "event" to occur to all things at a unique timestamp and "measurement" or "observation" to refer to a single tuple of patient_id, timestamp, code, and *_value, but I'm open to using different verbiage, as long as we can express both concepts. Do you have an alternate nomenclature you'd prefer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think MEDS needs to define that term, or use it, since the schema/datamodel doesn't require referring to unique timestamps.

I'm not sure what I would call this abstraction since I have never used it (and don't really see a case where I would).

README.md Outdated Show resolved Hide resolved
README.md Outdated
Comment on lines 28 to 29
4. An _event_ or _patient event_ in a MEDS dataset corresponds to all observations about a patient that
occur at a unique timestamp (within the level of temporal granularity in the MEDS dataset).
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This abstraction no longer exists and this sentence should be deleted.

@EthanSteinberg
Copy link
Collaborator Author

I think this mostly looks good, the main issue is that it uses the double nested terminology of measurements vs events, when that is gone now. There are only "events" (single measurments) now.

@mmcdermott
Copy link
Contributor

mmcdermott commented Jul 30, 2024 via email

@EthanSteinberg
Copy link
Collaborator Author

I think it would be very weird if the Medical Event Data Standard did not use the word "event".

So I would prefer to keep using the name "event".

@mmcdermott
Copy link
Contributor

mmcdermott commented Jul 30, 2024 via email

@EthanSteinberg
Copy link
Collaborator Author

Keep in mind that in most other usage, "event stream" data has the connotation that events have unique time points

I've never seen that usage before, can you provide some examples?

@mmcdermott
Copy link
Contributor

I've never seen that usage before, can you provide some examples?

hmm. In looking through some references, while I find a lot of settings where it is implicitly clear that the mental model is one of unique timestamps (as many sites reference event streams as capturing real-time data, with events occurring in real time and being received in the stream as they happen, or as most diagrams of event streams show unique timestamps, etc.), I don't see anything explicitly claiming that they have unique timestamps, and when I asked ChatGPT (as a proxy for the "consensus" of the field), it states that there is not such a mandate for uniqueness, though this is attributed to limitations in the granularity of the time-unit being recorded, such as in high-frequency settings, not due to our setting where many observations are implicitly binned before being recorded.

So, I guess that argument falls flat. I would still put in a marked preference for us using, in so far as anything is defined here, the term "measurement" or "observation", both to reserve a natural term to refer to the concept that is in use in several settings already as I outlined above and because it more strongly agrees with my conception of these things. Then, if you want to use "event" in your code and terminology, you are free to do so, as we won't standardize that term at all and nothing says you have to use "measurement" or "observation". Do you have a strong objection to this?

@mmcdermott
Copy link
Contributor

Alternatively, I'm also happy banishing both terms from any official terminology and just referring to the main schema as "meds_schema" or something like this.

@EthanSteinberg
Copy link
Collaborator Author

This looks good. Feel free to merge it in

@mmcdermott mmcdermott marked this pull request as ready for review July 30, 2024 18:39
@mmcdermott mmcdermott merged commit e0cd043 into ethan-meds-3-rc Jul 30, 2024
3 checks passed
@mmcdermott mmcdermott deleted the mmd-meds-3-rc-documentation branch July 30, 2024 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants