Alexander edited this page May 3, 2021 · 5 revisions

UK Biobank


OHDSI forums release post: link.

Online showcase of UK Biobank resources: link.

UK Biobank is a large-scale biomedical database and research resource containing in-depth genetic and health information from half a million UK participants.

Scope of the vocabulary release

All concepts were included except the following: genomics data; cardiac monitoring data (measurement characteristics such as ECG trace, acceleration, impedance, and analysis date); health-related outcomes (except for a few fields from the HES dataset); and bulk data (invalid fields, DICOM files, chromosome genotype intensities, CRAM files, etc.).

Existing ICD9, ICD10, and OPCS4 concepts used as possible answers in UK Biobank were also excluded from the UK Biobank vocabulary to avoid concept duplication. Concepts of the SOC2000 vocabulary were excluded as well, unless SOC2000 is recognized as a separate OMOP vocabulary.

Concept names

All concepts are assigned the longest of the available names.

UK Biobank concepts have one distinct name per concept. However, the extended description of each question, which reveals its context, is stored in the UK Biobank notes. When converting UK Biobank to OMOP, we preserved these descriptions, with slight modifications, in the concept_synonym table. The modifications consist of removing non-relevant words and details such as ‘Question asked:...’, ‘Participant was asked:...’, ‘ACE touchscreen question:...’, etc.

| Source field | Source value | OMOP field | OMOP value |
|---|---|---|---|
| title | Low calorie drink intake | concept_name | Low calorie drink intake |
| notes | `Question asked: "How many glasses/cans of low calorie or diet drinks (e.g. fizzy, squash) did you drink yesterday?"<p>If the participant activated the Help feature they were shown the message:<p><i>Low calorie flavoured water should be recorded under low calorie drinks.</i>` | concept_synonym_name | How many glasses/cans of low calorie or diet drinks (e.g. fizzy, squash) did you drink yesterday? |
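The clean-up of the notes can be sketched on the example above. This is a minimal sketch, not the procedure actually used for the release: it assumes that keeping only the text before the first `<p>` drops the help message, and the prefix list is only the three phrases named above (the original list ends with "etc."). The function name `clean_note` is hypothetical.

```python
import re

# Phrases named in the text above; the real list is longer ("etc.").
PREFIXES = ("Question asked:", "Participant was asked:", "ACE touchscreen question:")


def clean_note(note: str) -> str:
    """Turn a UK Biobank note into a concept_synonym_name candidate (hypothetical helper)."""
    text = note.split("<p>", 1)[0]        # assumption: only the first paragraph is the question
    text = re.sub(r"<[^>]+>", "", text)   # drop any HTML tags left in that paragraph
    text = text.strip()
    for prefix in PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):].strip()
            break
    return text.strip('"').strip()        # remove the quotes around the question
```

Applied to the notes value in the table, this returns the concept_synonym_name shown there.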

Concept names for pre-coordinated pairs were built by concatenation in the following format:

‘Question name’: ‘Answer name’

‘Variable name’: ‘Value name’

Concept codes

To keep codes unique, the letter ‘c’ was prefixed to the category_id of each category, and the result was used as the concept_code, e.g., “c100078” for Biological samples.

For UK Biobank fields (Questions or Variables), the field_id was used as the concept_code, e.g., “30264” for Mean reticulocyte volume acquisition route.

For Answers/Values, the encoding_id and the value joined by a hyphen (‘encoding_id-value’) were used as the concept_code, e.g., “1401-17” for Agoraphobia.
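The three code conventions can be illustrated with a few helpers that build codes and recognize them again, which is also what an ETL needs when matching source data to concept_code. This is a sketch; all function names are hypothetical, and because the page does not describe the codes of pre-coordinated pairs, `code_kind` does not handle them.

```python
import re

# Hypothetical helpers illustrating the concept_code conventions described above.


def category_code(category_id: int) -> str:
    """Categories: 'c' + category_id, e.g. 100078 -> 'c100078'."""
    return f"c{category_id}"


def field_code(field_id: int) -> str:
    """Fields (Questions/Variables): the bare field_id, e.g. 30264 -> '30264'."""
    return str(field_id)


def answer_code(encoding_id: int, value: str) -> str:
    """Answers/Values: encoding_id and value joined by a hyphen, e.g. (1401, '17') -> '1401-17'."""
    return f"{encoding_id}-{value}"


def code_kind(concept_code: str) -> str:
    """Classify a category, field, or answer/value concept_code."""
    if re.fullmatch(r"c\d+", concept_code):
        return "category"
    if re.fullmatch(r"\d+", concept_code):
        return "field"
    if re.fullmatch(r"\d+-.+", concept_code):
        return "answer"
    raise ValueError(f"unrecognized concept_code: {concept_code!r}")


def split_answer_code(concept_code: str) -> tuple[int, str]:
    """Split at the first hyphen only, so a value that itself contains a hyphen stays intact."""
    encoding_id, value = concept_code.split("-", 1)
    return int(encoding_id), value
```

For example, `code_kind("c100078")` gives `"category"` and `split_answer_code("1401-17")` gives `(1401, "17")`.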

Domains

Domains for categories were assigned according to the CDM specification. For each UK Biobank concept, the domain was inferred from the domain of the concept’s category, its unit, and the answer/value type provided by the source.

All concepts used as answers to questions are Observations.

There are also four concepts in the Meas Value domain used as possible values for Measurements: False, True, Measure invalid, and Measure not cleanly recoverable from data.

Concept_classes

Category - all UK Biobank categories were assigned this concept class.

Question - assigned to a concept that represents a question the participant was asked.

Variable - assigned to a concept that belongs to the Measurement domain, represents technical details of a procedure or measurement, or represents a piece of information supplied by the provider.

Answer - assigned to a concept that is related to a concept of the ‘Question’ concept class.

Value - assigned to a concept that is related to a concept of the ‘Variable’ concept class.

Precoordinated pair - a new concept class introduced for Question/Answer and Variable/Value pairs to provide mappings for these combinations. These concepts are Non-standard and always have a mapping to Standard OMOP concepts.

Standard concepts

The standard_concept value was assigned according to the following rules:

| Concept characteristic | standard_concept | Examples |
|---|---|---|
| UK Biobank category | Classification | Alcohol; Baseline characteristics |
| Non-numeric Question/Variable without mapping | Standard | Added milk to espresso; Cheese spread intake |
| Non-numeric Question/Variable and Answers/Values with direct mapping provided separately by “Maps to” links | Non-standard | Treatment/medication code and aluzine 20mg tablet; Cancer code, self-reported and mouth cancer |
| Non-numeric Question/Variable with a full mapping equivalent provided through pre-coordinated pairs | Non-standard | HSV-1 seropositivity for Herpes Simplex virus-1; HHV-7 seropositivity for Human Herpesvirus-7 |
| Non-numeric Question/Variable with mapping provided through pre-coordinated pairs | Standard | Noisy workplace; Anaesthetic delivered post delivery |
| Numeric Question/Variable without mapping and with at least one meaningful predefined Answer/Value | Standard | Lifetime number of depressed periods; Number of days/week walked 10+ minutes |
| Numeric Question/Variable without any meaningful Answer/Value | Non-standard | Number of older siblings; Longest period of unenthusiasm / disinterest |
| Answer/Value with relevant underlying data | Standard | False; True |
| Answer/Value without relevant underlying data (flavours of NULL) | Non-standard | Measure invalid; Measure not cleanly recoverable from data |

Pre-coordinated pairs are Non-standard concepts but always have a mapping to Standard ones.
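The rules table can be read as a small decision function. The sketch below encodes only the rows listed above. It uses the OMOP convention of 'S' for Standard, 'C' for Classification, and NULL (`None`) for Non-standard. The `Mapping` categories and all function names are hypothetical, and deciding which category a real field falls into is left to the vocabulary build. A numeric field that has a mapping and meaningful answers is not covered by the table, so the sketch raises an error for it instead of guessing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# OMOP standard_concept values: 'S' Standard, 'C' Classification, NULL Non-standard.
STANDARD, CLASSIFICATION, NON_STANDARD = "S", "C", None


class Mapping(Enum):
    NONE = "none"                    # no mapping
    DIRECT = "direct"                # field and answers mapped separately via "Maps to"
    PRECOORD_FULL = "precoord_full"  # pre-coordinated pairs give a full mapping equivalent
    PRECOORD = "precoord"            # mapping provided through pre-coordinated pairs


@dataclass
class Field:
    """A UK Biobank Question/Variable, described only by the traits the rules use."""
    numeric: bool
    mapping: Mapping
    has_meaningful_answer: bool = False


def field_standard_concept(field: Field) -> Optional[str]:
    if field.numeric:
        if not field.has_meaningful_answer:
            return NON_STANDARD
        if field.mapping is Mapping.NONE:
            return STANDARD
        # Numeric fields with a mapping and meaningful answers are not in the table.
        raise ValueError("combination not covered by the rules table")
    if field.mapping in (Mapping.DIRECT, Mapping.PRECOORD_FULL):
        return NON_STANDARD
    return STANDARD  # Mapping.NONE or Mapping.PRECOORD


def answer_standard_concept(is_null_flavour: bool) -> Optional[str]:
    """Answers/Values carrying real data are Standard; flavours of NULL are not."""
    return NON_STANDARD if is_null_flavour else STANDARD


def category_standard_concept() -> str:
    return CLASSIFICATION
```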

Internal and external relationships

| From | Relationship | To |
|---|---|---|
| UK Biobank category | Category of | UK Biobank field (Question/Variable) |
| UK Biobank field (Question/Variable) | Has answer | UK Biobank answer (Answer/Value) |
| UK Biobank field (Question/Variable) | Maps to unit | OMOP Standardized units |
| UK Biobank field (Question/Variable) | Has precoord pair | Precoordinated pair of Question/Answer or Variable/Value |
| UK Biobank answer (Answer/Value) | Has precoord pair | Precoordinated pair of Question/Answer or Variable/Value |
| Precoordinated pair of Question/Answer or Variable/Value | Maps to | OMOP Standardized concept |
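The pre-coordinated path in this table means an ETL finds the pair shared by a field and its answer, then follows that pair's "Maps to" link. The sketch below shows this JOIN with Python's built-in sqlite3 module, using trimmed-down concept and concept_relationship tables. It assumes that the relationship_id values are the names listed in the relationships table ('Has precoord pair', 'Maps to', 'Maps to unit') and that the vocabulary_id is 'UK Biobank'. Every row is made-up toy data. A production query would also filter on invalid_reason and validity dates.

```python
import sqlite3

SCHEMA = """
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT,
    vocabulary_id TEXT,
    concept_code  TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,
    concept_id_2    INTEGER,
    relationship_id TEXT
);
"""

# Field code + answer code -> shared pre-coordinated pair -> 'Maps to' target.
PRECOORD_SQL = """
SELECT m.concept_id_2
FROM concept AS f
JOIN concept_relationship AS fp
  ON fp.concept_id_1 = f.concept_id AND fp.relationship_id = 'Has precoord pair'
JOIN concept_relationship AS ap
  ON ap.concept_id_2 = fp.concept_id_2 AND ap.relationship_id = 'Has precoord pair'
JOIN concept AS a
  ON a.concept_id = ap.concept_id_1
JOIN concept_relationship AS m
  ON m.concept_id_1 = fp.concept_id_2 AND m.relationship_id = 'Maps to'
WHERE f.vocabulary_id = 'UK Biobank' AND f.concept_code = ?
  AND a.vocabulary_id = 'UK Biobank' AND a.concept_code = ?
"""

UNIT_SQL = """
SELECT r.concept_id_2
FROM concept AS f
JOIN concept_relationship AS r
  ON r.concept_id_1 = f.concept_id AND r.relationship_id = 'Maps to unit'
WHERE f.vocabulary_id = 'UK Biobank' AND f.concept_code = ?
"""


def map_via_precoord_pair(conn, field_code, answer_code):
    """Return the concept_ids reached through the pair shared by field and answer."""
    return [row[0] for row in conn.execute(PRECOORD_SQL, (field_code, answer_code))]


def unit_for_field(conn, field_code):
    """Return the unit concept_id linked by 'Maps to unit', or None."""
    row = conn.execute(UNIT_SQL, (field_code,)).fetchone()
    return row[0] if row else None


def build_toy_db():
    """In-memory database with made-up rows (not real vocabulary content)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
        (1, "Toy question", "UK Biobank", "9999"),
        (2, "Toy answer", "UK Biobank", "888-1"),
        (3, "Toy question: Toy answer", "UK Biobank", "toy-pair"),
        (4, "Toy standard concept", "Toy vocabulary", "S1"),
        (5, "Toy unit", "Toy vocabulary", "U1"),
    ])
    conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?)", [
        (1, 3, "Has precoord pair"),
        (2, 3, "Has precoord pair"),
        (3, 4, "Maps to"),
        (1, 5, "Maps to unit"),
    ])
    return conn
```

With the toy data, `map_via_precoord_pair(build_toy_db(), "9999", "888-1")` returns `[4]`. Fields mapped directly need a plain "Maps to" JOIN on the field and on the answer separately.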

ETL guide

  1. Take into account the letter prefix added to category concept_codes and the hyphenated ‘encoding_id-value’ style used for answer/value codes.
  2. Use different JOINs depending on how the mapping is delivered: direct “Maps to” links or pre-coordinated pairs.
  3. Extract event dates from the date/timestamp variables associated with the variables of interest.
  4. Some variables are mapped to historical concepts based on their context, and for some of them the time period of the historical concept can be specified. Use values from such variables (for example, the value of the ‘Age angina diagnosed’ variable) to calculate the exact time period. Also use values from associated variables (for example, for the ‘Cancer code, self-reported’ variable, the associated variable is ‘Interpolated Year when cancer first diagnosed’) to calculate the exact time period.
  5. Use the “Maps to unit” links to populate the unit_source_value and unit_concept_id fields.
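Steps 3 and 4 leave the date arithmetic to the implementer. Below is a minimal sketch of one possible convention, not a rule of the vocabulary. It assumes that the interpolated year is a decimal year (such as 1998.5) and that only year and month of birth are known. Both function names are hypothetical. Any special, non-age answer codes must be filtered out before calling `age_to_period`.

```python
from datetime import date, timedelta


def fractional_year_to_date(year: float) -> date:
    """Convert a decimal year such as 1998.5 to a calendar date (fraction of that year's days)."""
    whole = int(year)
    start = date(whole, 1, 1)
    days_in_year = (date(whole + 1, 1, 1) - start).days
    return start + timedelta(days=int((year - whole) * days_in_year))


def age_to_period(year_of_birth: int, month_of_birth: int, age: int) -> tuple[date, date]:
    """Return the inclusive (start, end) window in which the participant was `age` years old.

    With the day of birth unknown, the birthday that starts the age falls somewhere in
    month_of_birth of year_of_birth + age, and the age ends by the end of that month a year later.
    """
    if age < 0:
        raise ValueError("negative age: filter out special answer codes first")
    start = date(year_of_birth + age, month_of_birth, 1)
    # First day of the month after the birth month, one year later.
    next_year = year_of_birth + age + 1 + month_of_birth // 12
    next_month = month_of_birth % 12 + 1
    end = date(next_year, next_month, 1) - timedelta(days=1)
    return start, end
```

For example, a participant born in March 1950 who reports an age of 40 at diagnosis gets the window 1 March 1990 to 31 March 1991.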
