Cohort templates #134

azimov · 2024-04-08T18:27:52Z

See issue #133

TODO:
*Vignette on implementation (will wait for agreement on approach before writing)

… as first pass implementation

azimov · 2024-04-08T18:35:11Z

@mdlavallee92 I would appreciate your input here, from the perspective of Capr which could be used to generate templates for broad, highly sensitive phenotypes.

mdlavallee92 · 2024-04-09T13:12:50Z

Thanks for tagging me to the review @azimov. In terms of a code review, I will admit there is not much I can contribute since this falls outside of the Capr realm and my grasp of R6 is limited. Most of my comments are philosophical wrt the purpose of the tool.

Conceptual Understanding
If I understand this correctly, essentially this would be like helpers to instantiate a set of cohorts based on a vocabulary (i.e. all rxNorm Ingredients or all ATC 4th class). So, the workflow would be like build Circe study pop plus all drug ingredient cohorts in the database based on rxNorm (for example). What is the motivation for templating out ingredient cohorts? Are you using them for characterization or modelling features or both? Also how does this differentiate from using either of the eras tables in the cdm? If characterizing, an approach would be to use concept sets instead of cohorts.

Efficiency
Using the vocabularies to build simple cohorts up front is a far more efficient than using Capr (since I am not building all these vocab based cohorts on 'n' CIRCE sql). I think it would be something I would use.

Other templates
My other comment is towards intake of other cohort templates. Right now, you have three examples on clinical domains. A common one I encounter at my company are observation period cohorts for denominators in a basic prevalence calculation (say all males on year 2010 of database). Using non-circe variations of these cohorts would be very handy and also de-clutter an ATLAS instance 😏. Could you provide a road map for how to incorporate other cohort templates?

Non-CIRCE workflow
Final comment is more general. I am softening my stance towards using non-Circe cohorts, particularly in internal studies. While my preferred approach is to always be in CIRCE for its reproducibility attributes, there are many times I have to build non-trivial subsets on say measurements or a cdm domain not covered in circe (ie. episode or cost). Generally, my approach for an OHDSI study is that the study population (target) should be in Circe because it is something I can reproduce across different studies, iterate on changes with colleagues via ATLAS and I have an auditable file via json. While in an internal study in theory you can approach the CDM in any way you want, a more principled approach is to treat any study as if it were a network study. Having said this it becomes a real headache to keep track of all these non-CIRCE cohorts. The best I can come up with is do custom sql magic and then reroute this work into the nx4 cohort table structure to access this cohort (subset or ow) later on in my study. So, I appreciate ways to standardize some of these non-Circe routes of cohort building as you have done here.

azimov · 2024-04-09T17:02:53Z

My other comment is towards intake of other cohort templates. Right now, you have three examples on clinical domains. A common one I encounter at my company are observation period cohorts for denominators in a basic prevalence calculation (say all males on year 2010 of database). Using non-circe variations of these cohorts would be very handy and also de-clutter an ATLAS instance 😏. Could you provide a road map for how to incorporate other cohort templates?

I would say that this is already possible with subset operations included in this repository (saving, loading and using in strategus designs is already well supported too).

For your example question:

library(CohortGenerator)

cohortDefinitionSet <- ROhdsiWebApi::getMyCohorts(...) # could be the target cohorts in this PR

subsetDefinition <- createSubsetDefinition(
    name = "MALES, in 2010",
    definitionId = 1,
    subsetOperations = list(
       # Applied in order
       createLimitSubset(
           name = "Year 2010",
           calendarStartDate = "2010/1/1",
           calendarEndDate = "2010/12/31"
       ),
       createDemographicSubset(
          name = ">=18",
          ageMin = 18
        )
    )
)

cohortDefinitionSet <- cohortDefinitionSet |>
  addCohortSubsetDefinition(subsetDef, targetCohortIds = c(1,2,3,4)) # If not specified, it will apply to all

generateCohortDefinitionSet(...)

This will save you a lot of time and keep clutter out of atlas.

azimov · 2024-04-09T17:12:10Z

Conceptual Understanding
If I understand this correctly, essentially this would be like helpers to instantiate a set of cohorts based on a vocabulary (i.e. all rxNorm Ingredients or all ATC 4th class). So, the workflow would be like build Circe study pop plus all drug ingredient cohorts in the database based on rxNorm (for example).

I definitely see the benefits of getting characteristics for concept sets, but the motivation here is really to have cohorts generated from pure SQL to be treated in the same manner as all other cohorts in packages. At the moment we are doing sql in a lot of places anyway. Personally I have used a lot of hacks to get this done and I want an approach that is cleaner and re-usable in any tools.

What is the motivation for templating out ingredient cohorts?

We are using these in our comparator selection tool and other places to do things at scale. The logic is a bit more than the drug eras table (for example, requiring prior observation).

Using the vocabularies to build simple cohorts up front is a far more efficient than using Capr (since I am not building all these vocab based cohorts on 'n' CIRCE sql). I think it would be something I would use.

This is good to know. In a dream world I would like to go from a templated cohort to a circe/capr compatable definition with a work flow being like:

1. Template a bunch of sensitive cohorts
2. Evaluate which is best, propose potential changes
3. Extract circe definition
4. Add extra features (e.g. relevant measurement, procedures or other codes unrelated to primary diagnosis)
5. Publish to atlas as phenotype

mdlavallee92 · 2024-04-09T17:21:19Z

My other comment is towards intake of other cohort templates. Right now, you have three examples on clinical domains. A common one I encounter at my company are observation period cohorts for denominators in a basic prevalence calculation (say all males on year 2010 of database). Using non-circe variations of these cohorts would be very handy and also de-clutter an ATLAS instance 😏. Could you provide a road map for how to incorporate other cohort templates?

I would say that this is already possible with subset operations included in this repository (saving, loading and using in strategus designs is already well supported too).

For your example question:
library(CohortGenerator)

cohortDefinitionSet <- ROhdsiWebApi::getMyCohorts(...) # could be the target cohorts in this PR

subsetDefinition <- createSubsetDefinition(
    name = "MALES, in 2010",
    definitionId = 1,
    subsetOperations = list(
       # Applied in order
       createLimitSubset(
           name = "Year 2010",
           calendarStartDate = "2010/1/1",
           calendarEndDate = "2010/12/31"
       ),
       createDemographicSubset(
          name = ">=18",
          ageMin = 18
        )
    )
)

cohortDefinitionSet <- cohortDefinitionSet |>
  addCohortSubsetDefinition(subsetDef, targetCohortIds = c(1,2,3,4)) # If not specified, it will apply to all

generateCohortDefinitionSet(...)
This will save you a lot of time and keep clutter out of atlas.

I meant the other way around. denominator is the Males in 2010 for the entire db and the numerator is the study population. Can CG build a set of year, gender, race cohorts with no index event as well? The term "cohort" is used liberally here, more like id gender race year counts...could be out of scope of CG.

But yes, the subsetting you outline here is highly useful and have done it within studies.

azimov · 2024-04-09T23:05:33Z

I meant the other way around. denominator is the Males in 2010 for the entire db and the numerator is the study population. Can CG build a set of year, gender, race cohorts with no index event as well? The term "cohort" is used liberally here, more like id gender race year counts...could be out of scope of CG.

This could be done with templates, but it might make your DBA sweat as storing all demographics for males in a given year could be huge percentage of peeople, so I would say that it would be much better to just generate counts in this context (Patrick will have a lot to say to you about computing incidence rates in this sort of population though).

Generally something like this should live in the CohortIncidence package but there we're trying to be quite specific here - for meaningful measures you want a target population and an outcome as well as a time at risk.

# Conflicts: # R/CohortConstruction.R

codecov · 2024-10-24T18:55:28Z

Codecov Report

Attention: Patch coverage is 81.38686% with 51 lines in your changes missing coverage. Please review.

Project coverage is 95.45%. Comparing base (4ce1ae0) to head (73d3692).
Report is 59 commits behind head on develop.

Files with missing lines	Patch %	Lines
R/TemplateImplementations.R	66.15%	44 Missing ⚠️
R/CohortConstruction.R	89.47%	4 Missing ⚠️
R/TemplateCohorts.R	96.80%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #134      +/-   ##
===========================================
- Coverage    98.80%   95.45%   -3.36%     
===========================================
  Files           12       17       +5     
  Lines         1336     2154     +818     
===========================================
+ Hits          1320     2056     +736     
- Misses          16       98      +82

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚨 Try these New Features:

Flaky Tests Detection - Detect and resolve failed and flaky tests

azimov added 4 commits March 25, 2024 17:47

Initial classes and code for creating references with rxNorm template…

19b11b3

… as first pass implementation

Cohort generation of templates works and is tested

cd2910d

Added ATC class cohort templates

6db260f

Template code works with large ints

9aba45f

azimov marked this pull request as draft April 8, 2024 18:28

Snomed templates

1b64c1a

anthonysena linked an issue May 29, 2024 that may be closed by this pull request

SQL cohorts as first class citizens - SQL templates for non-standard cohorts or large, bulk operations #133

Open

Merge branch 'develop' into cohort_templates

73d3692

# Conflicts: # R/CohortConstruction.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cohort templates #134

Cohort templates #134

azimov commented Apr 8, 2024 •

edited

Loading

azimov commented Apr 8, 2024

mdlavallee92 commented Apr 9, 2024

azimov commented Apr 9, 2024 •

edited

Loading

azimov commented Apr 9, 2024

mdlavallee92 commented Apr 9, 2024 •

edited

Loading

azimov commented Apr 9, 2024

codecov bot commented Oct 24, 2024 •

edited

Loading

Cohort templates #134

Are you sure you want to change the base?

Cohort templates #134

Conversation

azimov commented Apr 8, 2024 • edited Loading

azimov commented Apr 8, 2024

mdlavallee92 commented Apr 9, 2024

azimov commented Apr 9, 2024 • edited Loading

azimov commented Apr 9, 2024

mdlavallee92 commented Apr 9, 2024 • edited Loading

azimov commented Apr 9, 2024

codecov bot commented Oct 24, 2024 • edited Loading

Codecov Report

azimov commented Apr 8, 2024 •

edited

Loading

azimov commented Apr 9, 2024 •

edited

Loading

mdlavallee92 commented Apr 9, 2024 •

edited

Loading

codecov bot commented Oct 24, 2024 •

edited

Loading