Release planning
This page describes the planned maintenance and improvement activities for the OHDSI Standardized Vocabularies. Treat it as a forecast. Below you can find the content of each release and an overview of the planned improvement activities (detailed content to be posted separately).
Because most community members refresh the Vocabularies and their data annually or semi-annually, releases are issued twice a year. This schedule improves productivity, makes the content of each release more transparent, and helps the community stay aligned on versions. The two regular releases (August and February) are aligned with the release schedules of the sources. An intermediate release in May 2023 is planned for work already accomplished.
Vocabulary work balances:
- Routine maintenance,
- Automation, usually across concepts and vocabularies of one domain at a time (overhauls, machinery improvements, etc.),
- Process improvement (e.g., external contribution guidelines or version control).
The roadmap is based on a continuous need assessment of the community, both in terms of vocabulary maintenance as well as process improvement.
The roadmap is made publicly available.
The plan for 2023 Q1 - 2024 Q2 includes refreshes of the following commonly used vocabularies: SNOMED-CT, the ICD family, Read, RxNorm, CVX, LOINC, HCPCS, ICD10PCS, MedDRA, MeSH, NAACCR and dm+d, as well as improvement activities tailored to the most commonly reported problems described above.
Table 1 outlines the vocabularies included in each release as per the roadmap above.
Table 1. Vocabularies and activities included in each release.
Vocabulary or activity | Version and modification | Team members |
---|---|---|
Spring release, May 2023 | ||
CVX | refresh (20230222 version) and refactored code | Maria, Timur |
dm+d | refresh (20220927 version) and refactored code | Oleg, Timur |
HCPCS | improvement + refresh (Apr 2023 version) | Masha, Timur |
MeSH | refresh (2022 version) and refactored code | Timur |
NAACCR | mapping addition | Vlad, Timur |
NDC | refresh (20230319 version) | Oleg |
RxNorm | refresh (20230306 version) | Oleg, Timur |
RxNorm Extension | refresh (May 2023 version) | Oleg |
Smoking hierarchy | mapping addition | Maria, Timur |
SPL | refresh (20230319 version) | Oleg |
Summer release, August 2023 | ||
CPT4 | refresh (Spring 2023 version) | Masha, Timur |
LOINC | refresh (2.74 version) | Maria, Timur |
NDC | refresh (Aug 2023 version) | Oleg |
RxNorm | refresh (Aug 2023 version) | Oleg, Timur |
RxNorm Extension | refresh (Aug 2023 version) | Oleg |
SPL | refresh (Aug 2023 version) | Oleg |
VANDF | refresh (20230306 version) | Oleg, Varvara, Timur |
External contribution guidelines (part 1) | coverage of basic use cases | Anna, Alex, Christian, Timur |
Vocabulary Quality System (part 1) | conformance checks publicly available with each release | Alex, Anna, Christian, Timur |
Winter release, February 2024 | ||
CVX | refresh (Summer-Fall 2023 version) | Maria, Timur |
LOINC | refresh (Summer-Fall 2023 version) | Maria, Timur |
Read | mapping refresh | Maria, Irina |
HCPCS | refresh (Oct 2023 version) | Masha, Timur |
ICD10PCS | refresh (2023 version) | Masha, Maria, Timur |
MedDRA | improvement + refresh (version 26, Mar 2023) | Mikita, Timur |
NDC | refresh (Jan 2023 version) | Oleg |
RxNorm | refresh (Dec 2023 version) | Oleg, Timur |
RxNorm Extension | refresh (Feb 2023 version) | Oleg, Timur |
SPL | refresh (Jan 2023 version) | Oleg |
SNOMED overhaul | overhaul | Oleg, Timur |
SNOMED UK | refresh (Spring-Summer 2023 version) | Oleg, Timur |
SNOMED Int | refresh (Spring 2023 version) | |
SNOMED US | refresh (Feb 2023 version) | |
ICD | machinery improvement | Irina, Oleg, Timur |
ICD9(CM) | mapping improvement | Irina, Oleg |
ICD10(CM) | refresh (2022/2023 versions) | |
ICD10 (int) | mapping improvement | |
ICD10CN (China) | mapping improvement | |
ICD10GM (Germ) | refresh (2023 version) | |
CIM10 (France) | refresh (2023 version) | |
External contribution guidelines (part 2) | coverage of complex use cases | Anna, Alex, Christian, Timur |
Vocabulary Quality System (part 2) | standardized system with more complex assessment | Alex, Anna, Christian, Timur |
Summer release, August 2024 | ||
ATC | overhaul + refresh (2024 version) | Anna, others tbd |
CPT4 | refresh (2024 version) | Masha, Timur |
CVX | refresh (2024 version) | Maria, Timur |
HCPCS | refresh (April 2024 version) | Masha, Timur |
ICD9(CM) | mapping improvement | Maria, Irina |
ICD10(CM) | refresh (2023/2024 versions) | |
ICD10 (int) | mapping improvement | |
ICD10CN (China) | mapping improvement | |
ICD10GM (Germ) | refresh (2023/2024 versions) | |
CIM10 (France) | refresh (2023/2024 versions) | |
LOINC | refresh (2024 version) | Maria, Timur |
MedDRA | refresh (2024 version) | Mikita, Timur |
NDC | refresh (Aug 2024 version) | Oleg |
OMOP Invest Drug | refresh (2024 version) | Oleg, Varvara, Timur |
Read | mapping refresh | Maria |
RxNorm | refresh (Feb 2024 version) | Oleg, Timur |
RxNorm Extension | refresh (Aug 2024 version) | Oleg |
SNOMED Int | refresh (Spring 2024 version) | Masha, Timur |
SNOMED UK | refresh (Spring-Summer 2024 version) | |
SNOMED US | refresh (Feb 2024 version) | |
SPL | refresh (Aug 2024 version) | Oleg |
VANDF | refresh (2024 version) | Varvara, Timur |
Vocabulary-specific overhauls and improvements include:
- Stable domain and concept class id assignment.
- Alignment of the validity dates with the source.
- Fix of the problem where replacement relationships (such as “Concept replaced by”) have no accompanying “Maps to” links, which prevents users from automatically following the “Maps to” relationships from non-standard concepts to their standard counterparts.
- Clean-up of existing legacy “Maps to” relationships originating from “Concept is a possible equivalent to”.
- De-standardization of the concepts in the Drug and other (Race, Provider) domains and their mapping to standard concepts, so that they can be used effectively with sources that use SNOMED-CT (such as CPRD).
- Splitting of pre-coordinated concepts (such as lab tests with results, or allergies to specific substances) and their mapping to the respective concepts.
- Documentation of SNOMED-CT processing, domain assignment and quality assurance.
- Mapping re-use across ICD family to identify the discrepancies and similarities across different versions of ICD and improve the consistency of mappings.
- Incorporation of the mappings provided by SNOMED-CT and other sources.
- Fix of the source (CIAML) file processing to capture the ICD concepts currently missing.
- Documentation of the current procedures for mapping and quality assurance.
- Design and documentation of a model that would allow MedDRA to be used as both a source and a classification terminology in the Condition domain.
- Development of a system that would allow us to re-use mappings from various sources (MedDRA-SNOMED initiative, UMLS), build our own mappings based on user needs, annotate them with metadata using SSSOM or other standards, and automatically transform them, using the generated metadata, into both horizontal and hierarchical relationships.
- Build of “Maps to” relationships from MedDRA to SNOMED.
- Build of hierarchical relationships between MedDRA and SNOMED.
- Adopt the data-driven approach of attribute selection (RxNorm and RxNorm Extension attributes for ATC codes) based on the data sources that have ATC codes (Z index, JMDC, others).
- Identification of discrepancies and similarities between code assignment in different data sources to establish more consistent and accurate mappings from ATC to RxNorm (Ext).
- Validation of the vocabulary using data-driven approaches (including the existing comparison for 1:1 matching to Clinical Drug Form, with further expansion to comparing the assignments for Clinical Drug and Branded Drug and to 1:many matching).
- If feasible, incorporation of WHO ATC-drug product links and DDD represented in the machine-readable form.
- Hierarchy review, fix and documentation.
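The replacement-relationship problem listed above can be illustrated with a minimal sketch. It assumes an in-memory list of tuples shaped like rows of the concept_relationship table; the concept IDs and the helper names `build_index` and `resolve_standard` are hypothetical and are not part of the Vocabulary team's tooling.

```python
from collections import defaultdict

# Hypothetical rows shaped like concept_relationship:
# (concept_id_1, relationship_id, concept_id_2). The IDs are made up.
RELATIONSHIPS = [
    (101, "Concept replaced by", 102),  # 101 was retired in favour of 102
    (102, "Maps to", 900),              # 102 maps to standard concept 900
    (103, "Maps to", 901),              # 103 already has a direct mapping
]


def build_index(rows):
    """Index relationship targets by (source concept, relationship)."""
    index = defaultdict(list)
    for source, relationship, target in rows:
        index[(source, relationship)].append(target)
    return index


def resolve_standard(concept_id, index, max_hops=10):
    """Return "Maps to" targets, following replacement links if needed.

    A user who only follows "Maps to" finds nothing for concept 101;
    walking "Concept replaced by" first recovers the mapping.
    """
    current = concept_id
    seen = set()
    for _ in range(max_hops):
        targets = index.get((current, "Maps to"))
        if targets:
            return sorted(targets)
        replacements = index.get((current, "Concept replaced by"))
        if not replacements or current in seen:
            return []  # dead end or a cycle of replacements
        seen.add(current)
        current = replacements[0]
    return []


index = build_index(RELATIONSHIPS)
print(resolve_standard(101, index))  # reaches 900 via the replacement 102
print(resolve_standard(103, index))  # direct mapping to 901
```

Once every replaced concept carries its own “Maps to” link, the extra hop in this sketch becomes unnecessary and a single lookup is enough.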
Process improvement activities include:
We divide the guidelines and processes into two parts, with the first part rolled out by the August 2023 release and the second part rolled out by the February 2024 release.
The first part will handle simple use cases such as changing “Maps to” relationships, changing concept names and domains, adding or deprecating relationships, or adding small vocabularies with no internal hierarchy. We will establish a pipeline for incoming requests with clear communication on when they will be incorporated. The pipeline involves submitting a request on GitHub with filled-in templates that follow the structure of the stage tables to facilitate incorporation, instructions on how to fill them in, and quality assurance checks that need to be performed on the requester's side. GitHub requests will facilitate version control and serve as educational material for other contributors. We will use existing requests that have not yet been fulfilled (such as ethnicity codes provided by the Health Equity WG, NIH provider codes and vocabulary, etc.) for dry runs and illustrative purposes.
The second part will target more complex use cases, such as adding new vocabularies and changing hierarchies. These require more comprehensive approaches (a common development environment, automated scripts for quality assurance, maintenance scripts if applicable) that build into a system for external contribution. Potential use cases for dry runs include ICPC2, which consists of adding a vocabulary, new codes and mappings to existing standard concepts.
As we have a standardized system for incorporating drug vocabularies (which, as opposed to other domains, influence standard vocabularies [RxNorm Extension] and therefore require more robust QA), drug vocabularies will be separated into a distinct chapter in the guidelines following the existing guides for external contributors.
External contribution guidelines will also include guidance and best practices on how to locally add new concepts (as concepts with IDs above 2 billion, the so-called “2 billion codes”) and relationships (in source_to_concept_map or concept_relationship), or modify relationships, to enable research in organizations and teams that need such modifications before they are released.
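A minimal sketch of such a local extension follows. It assumes the common OMOP convention that concept IDs above 2,000,000,000 are reserved for local use and that target concept ID 0 stands for an unmapped code; the source codes, the vocabulary ID `MySiteLab`, the target concept ID and the helper names are hypothetical, and the dictionaries carry only a subset of the columns of the concept and source_to_concept_map tables.

```python
from itertools import count

# Assumption: IDs above 2,000,000,000 are reserved for local concepts,
# so they cannot collide with IDs issued in official releases.
LOCAL_ID_FLOOR = 2_000_000_000


def make_local_concepts(source_codes, vocabulary_id, existing_ids=()):
    """Assign local concept IDs to site-specific (code, name) pairs.

    New IDs continue after the highest local ID already in use.
    """
    start = max([LOCAL_ID_FLOOR, *existing_ids]) + 1
    next_id = count(start)
    return [
        {
            "concept_id": next(next_id),
            "concept_code": code,
            "concept_name": name,
            "vocabulary_id": vocabulary_id,
            "standard_concept": None,  # kept non-standard in this sketch
        }
        for code, name in source_codes
    ]


def make_source_to_concept_map(local_concepts, targets):
    """Build source_to_concept_map-style rows; unmapped codes get target 0."""
    return [
        {
            "source_code": c["concept_code"],
            "source_concept_id": c["concept_id"],
            "source_vocabulary_id": c["vocabulary_id"],
            "target_concept_id": targets.get(c["concept_code"], 0),
        }
        for c in local_concepts
    ]


# Hypothetical local lab codes and a made-up standard target concept ID.
codes = [("LAB-001", "Local haemoglobin test"), ("LAB-002", "Local panel X")]
concepts = make_local_concepts(codes, "MySiteLab")
stcm = make_source_to_concept_map(concepts, {"LAB-001": 123456})
for row in stcm:
    print(row)
```

Keeping local IDs in a reserved range means a later official release can be loaded next to the local additions without ID clashes.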
The guidelines and approaches will be shared with the committee and subsequently with the community for feedback.
We divide the Vocabulary Quality System into two parts, with the first part rolled out by the August 2023 release and the second part rolled out by the February 2024 release.
The first part (quality control) includes describing the existing procedures, making the documentation publicly available, and adding to each release reports on the conformance checks passed and descriptive statistics (structure of the vocabularies, mapping coverage, gaps in hierarchies, orphan codes and more). It also includes expanding the tests to ensure comprehensive coverage based on previously reported problems.
The second part (quality management system) includes designing a quality system with more complex completeness and plausibility checks and external validation. A systematic approach needs to be developed, and existing practices in other ontologies will be taken into consideration. As there is a lack of frameworks (analogous to Kahn’s framework for data quality) for complex systems that harmonize and align multiple ontologies, this part will require more research and collaboration among the experts in the OHDSI community.
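As an illustration of the kind of descriptive statistics such a report could contain, the sketch below computes mapping coverage per vocabulary and lists non-standard concepts that lack a “Maps to” link. The rows are hypothetical and the function names are assumptions made for this example; the actual checks are defined by the Vocabulary team and would run against the release tables.

```python
from collections import Counter

# Hypothetical rows with a subset of the concept table columns.
CONCEPTS = [
    {"concept_id": 1, "vocabulary_id": "ICD10", "standard_concept": None},
    {"concept_id": 2, "vocabulary_id": "ICD10", "standard_concept": None},
    {"concept_id": 3, "vocabulary_id": "Read", "standard_concept": None},
    {"concept_id": 9, "vocabulary_id": "SNOMED", "standard_concept": "S"},
]
# Hypothetical (concept_id_1, relationship_id, concept_id_2) rows.
RELATIONSHIPS = [
    (1, "Maps to", 9),
    (3, "Maps to", 9),
]


def mapped_sources(relationships):
    """IDs of concepts that have at least one "Maps to" link."""
    return {src for src, rel, _ in relationships if rel == "Maps to"}


def unmapped_concepts(concepts, relationships):
    """Non-standard concepts with no "Maps to" link."""
    mapped = mapped_sources(relationships)
    return [
        c["concept_id"]
        for c in concepts
        if c["standard_concept"] != "S" and c["concept_id"] not in mapped
    ]


def mapping_coverage(concepts, relationships):
    """Share of non-standard concepts per vocabulary that are mapped."""
    mapped = mapped_sources(relationships)
    total, covered = Counter(), Counter()
    for c in concepts:
        if c["standard_concept"] == "S":
            continue  # standard concepts need no mapping
        total[c["vocabulary_id"]] += 1
        if c["concept_id"] in mapped:
            covered[c["vocabulary_id"]] += 1
    return {vocab: covered[vocab] / total[vocab] for vocab in total}


print(mapping_coverage(CONCEPTS, RELATIONSHIPS))
print(unmapped_concepts(CONCEPTS, RELATIONSHIPS))
```

Publishing numbers like these with every release would let users compare coverage between versions before they upgrade.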