-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VisitContext subset operator #93
Comments
This should be relatively straightforward. An outline of what's needed is follows:
Out of scope for this package but of note with this implementation (or any expansion of functionality) is that it will break any Happy to do this, but it might be better to get some knowledge transfer around this. |
Agreed with all except for the breaking packages part. My thoughts on this is that if we have a Strategus design that includes these new subset operators, prior Strategist designs should continue to function (I consider something 'beaking' when something that used to work no longer works). I could be misunderstanding your point, apologies if I do. We spoke about the case where a new design with new options would not work running on an older version (because the older version would not handle the new options) and that trying to execute a 'buildDesign' script that calls out to new functions (such as createVisitContextSubset) against an old version of the package would also not work, but this is more a case of 'forward compatability' vs. 'backwards compatability'. I think we described a mechanism where we can execute the build design scripts under the context of a renv session that loads the proper dependencies such that we won't run into this problem, but we don't have a solution to that yet. Onto the other questions you raised (and we should get some feedback from @gowthamrao since I think he would have the most benefit from this):
|
Thank you for looping me into this important topic. Yes, creating cohorts of visits to subset is essentially copying full visit table records - not efficient. This would mean using atleast the visit_occurrence and visit_detail tables and corresponding _concept_id and _source_concept_id (i can explain why _source_concept_id in a different thread) as part of the cohortSubset input specifications (i.e. as a subset operator) Use case: persons with anaphylaxis, subset to visit (inpatient, emergency room, doctor's office visit). Discussed here https://forums.ohdsi.org/t/phenotype-phebrurary-2023-p2-anaphylaxis/18193 . In this case, if we had taken a cohort of person evens with condition anaphyalxis, and subset by visitContext - we probably would have picked up the issue using the tool. I am exploring if this value proposition is true, by using visit cohort for now (which as describe above entails creating a cohort of all records in visit_occurrence table). |
(I know this request basically amounts to asking for circe logic embedded into the subset operator (or capR like logic). Just to be clear, I am NOT proposing that at all. Just sharing the use case) In this second post, i will explain my rationale for asking for support of _source_concept_id fields in visit_occurrence (and visit_detail) I need to identify visits that are TRUE 'Doctors office visit'. What I am trying to say here is that 'TRUE Doctors office visits' does not equate to 'Outpatient Visit'. Visits such as urgent care, or outpatient surgery, chemotherapy infusion visits are 'Outpatient Visit' but not the typical 'Doctors office visit' where we are going to a clinic. One logical way to do this is: But the following issues make generalization difficult, mostly driven by ETL conventions
An approach like this appears to provide universal solution -- hence the ask to support both visit_occurrence and visit_details + _concept_id and _source_concept_id |
@gowthamrao Supporting the How this list is maintained and used (e.g. in the phenotype library) is a different topic but it would absolutely be possible to create subsetting definitions that use a specific source code. Even supporting a list of 100s of standard codes in this package seems tricky. I think for a first pass we will just support basic top level names and then require users to enter specific concept ids or source ids for more complex definitions. |
I think you are saying the user has to provide an array of conceptIds and that the CohortGenerator package would not maintain such an array of conceptId's. I think that is acceptable For clarity - i mean to support the two conceptId fields visit_concept_id and visit_source_concept_id (not visit_source_code). visit_occurrence.visit_concept_id visit_detail.visit_detail_concept_id i dont know if we should support |
Regarding visit_type_concept_id, i would like to request that we support visit_type_concept_id if possible. Reason: See this cohort definition https://atlas.ohdsi.org/#/cohortdefinition/348/definition that limits to visits with the diagnosis in primary position vs this that does not https://github.com/OHDSI/PhenotypeLibrary/blob/main/inst/cohorts/235.json All other aspects are the same. Should we building and instantiating cohorts for everytime we want to see if the use of primary position makes any difference? Can we ask CohortGenerator to help us? |
Some additional thoughts based on recent discussion between @chrisknoll , @azimov and @anthonysena
|
I requested a reindex operator here #94 but was wondering how to solve these: Use case: i have a target cohort (anaphylaxis). I want to subset to inpatient visit. The above discussion would support it. But sometimes, the diagnosis of anaphylaxis maybe overlapping with inpatient visit such as that cohort_start_date of anaphylaxis cohort is > visit_start_date and < visit_end_date. This brings the situation of the output of the visitContext subset operator being
i.e. should we allow for reindexing in the visitContext subset operator? In the absence of reindexing in visitContext, we have to do this - which we want to not do. @chrisknoll
|
Subset operators aren't used to reassign cohort dates, but instead to subset the episodes found in a given cohort. You want to be able to go from a subset back to the parent target cohort by subject_id, start_date, end date (ie: subset is subsumed by T). Re-indexing is an idea, but I think beyond the scope of this issue. |
I was thinking about the team conversation the other day on this topic. I like the idea of modeling visit subset operator based on cohortSubset operator. Breaking down: we can think of two steps
|
Elaborating 1 Deriving a cohort like table first visit table(s):
that (when we apply as a subsetOperator in a subsequent step) yields an SQL that is something like
We can make a list of these something like list(visitDefinition1, visitDefinition2, visitDefinition3) that yields an SQL something like
|
Elaborating 2 - A subset operator that subsets the target cohorts with the output of 1. Now we can think of reusing the createSubsetCohortWindow and createCohortSubset like functionality with something like createSubsetVisitWindow and createVisitSubset |
Hi, @gowthamrao , I think these ideas are going beyond the intended purpose of the subset operator which is: identify the cohort episodes that have a period that is in proximity to a visit (based on the specified visit concept ids). Your examples seem to be finding visit_occurrence records, but the purpose of a subset operator is to find the COHORT records based on a subset query. The other confusing part about your queries is that they seem to result in nothing when you have > 1 day visits:
When you say visit_end_date >= visit_start_date, this will only be true for single day visits. Update: I misread that thinking it said start_date >= end_date, but the confusion remains: why even specify that WHERE clause when all visits will have end_dates => start dates? When reading your code, I'm trying to digest every thing you are specifying and find reason for it, but I can't find a reason for this one. Let's stay on target as to what a 'visit subset operator' is intended to do (find cohort episodes relative to a visit) and if you want to do something different like go from a first visit to the end of the last visit (ie: person_id, visit_start_date, max(visit_end_date) visit_end_date), then you'd create a Visit cohort (all events) with a gap days of 99999 such that the cohort episode will go from the first visit to the last visit (and include all time between). With that type of cohort, you'd just use your normal CohortSubsetOperator to do the job. |
Aah - this is because i am paranoid. Yes, by convention visit_end_date >= visit_start_date, but i have seen bad ETL. And this just handles a bad ETL. You can take it off if you dont like it. |
I think you are asking why the max(visit_end_date) ? The answer is -- unlike records in the cohort table, records in the visit table may be overlapping. So a max would remove possible overlaps. You can think of my proposed sql as follows if that helps
|
They key idea i was proposing is that we can think of the visitSubsetOperator as something like this
|
Ok, when you say 'delivering a cohort-like table', I'd say just resort to creating a cohort table with any rules you want to apply to visit occurrences to lump them together. We targeted this function to allow you to subset cohorts by visit context so all you really need is visit concept IDs and time windows. The method you describe above would have a person with a visit in 2015 and 2019 span a timewindow that starts in 2015 and ends in 2019...this is not exactly what visit context means, but rather you're defining a phenotype of 'experienced a visit` that combines different visits into a single one. When you do this, you loose track of intermediate visits, which may not matter for your specific usecase (which i say again, use the cohort subset functionality) but when thinking specifically about visit_occurrence we want to do more specific things like know their actual start/ends, maybe look at provider specialty, etc. When you merge visits together like that, you lose that context. |
I think you are saying is that after a certain level of complexity, we reach a point of diminishing return and could just use the general purpose CohortSubset operator where the subset cohort is another circe based cohort definition for visit. Yup - i agree with you there. So lets agree that the solution should be something as simple (represented in pseudo code)
|
That's fine, tho It might be interesting to incorporate provider into the subsetting (providers are associated by visits) so maybe that can be a phase 2 enhancement. |
Currently, in order to subset a cohort based on visit type (ER, inpatient, etc) we need to create different 'visit' cohorts and use cohortSubset operators on it. The problem is that making visit cohorts effectively copies the VISIT_OCCURRENCE table, which is bad.
The proposal here is to define a new 'visitContextSubset' operator that will take certain parameters (TBD) and yield the cohort subset based on the visit parameters provided.
The text was updated successfully, but these errors were encountered: