diff --git a/src/03-modality-agnostic-files.md b/src/03-modality-agnostic-files.md index 68bf8550cf..d12ccad3cf 100644 --- a/src/03-modality-agnostic-files.md +++ b/src/03-modality-agnostic-files.md @@ -36,7 +36,7 @@ Example: ```JSON { "Name": "The mother of all experiments", - "BIDSVersion": "1.4.0", + "BIDSVersion": "1.6.0", "DatasetType": "raw", "License": "CC0", "Authors": [ @@ -57,7 +57,7 @@ Example: "Alzheimer A., & Kraepelin, E. (2015). Neural correlates of presenile dementia in humans. Journal of Neuroscientific Data, 2, 234001. doi:1920.8/jndata.2015.7" ], "DatasetDOI": "doi:10.0.2.3/dfjj.10", - "HEDVersion": "7.1.1" + "HEDVersion": "8.0.0" } ``` @@ -94,7 +94,7 @@ Example: ```JSON { "Name": "FMRIPREP Outputs", - "BIDSVersion": "1.4.0", + "BIDSVersion": "1.6.0", "DatasetType": "derivative", "GeneratedBy": [ { diff --git a/src/99-appendices/03-hed.md b/src/99-appendices/03-hed.md index 0466cb7870..bb217322a5 100644 --- a/src/99-appendices/03-hed.md +++ b/src/99-appendices/03-hed.md @@ -3,120 +3,47 @@ Hierarchical Event Descriptors (HED) are a controlled vocabulary of terms describing events in a machine-actionable form so that algorithms can use the information without manual recoding. -HED was originally developed with EEG in mind, but is applicable to -all behavioral experiments. - -Each level of a hierarchical tag is delimited with a forward slash (`/`). -A HED string contains one or more HED tags separated by commas (`,`). -Parentheses (brackets, `()`) group tags and enable specification of multiple items -and their attributes in a single **HED string** (see section 2.4 in -[HED Tagging Strategy Guide](https://www.hedtags.org/hed-docs/HEDTaggingStrategyGuide.pdf)). -For more information about HED and tools available to validate and match HED -strings, please visit [www.hedtags.org](https://www.hedtags.org). 
-Since dedicated fields already exist for the overall task classification in the -sidecar JSON files (`CogAtlasID` and `CogPOID`), HED tags from the `Paradigm` -HED subcategory should not be used to annotate events. - -## Annotating each event - -There are several ways to associate HED annotations with events within the BIDS -framework. -The most direct way is to use the `HED` column of the `*_events.tsv` -file to annotate events. - -Example: An `*_events.tsv` annotated using HED tags for individual events. - -```Text -onset duration HED -1.1 n/a Event/Category/Experimental stimulus, Event/Label/CrossFix, Sensory presentation/Visual, Item/Object/2D Shape/Cross -1.3 n/a Event/Category/Participant response, Event/Label/ButtonPress, Action/Button press -... -``` - -The direct approach requires that each line in the events file be annotated. -Since there are typically thousands of events in each experiment, -this method of annotation is not convenient unless the annotations are -automatically generated. -Usually annotations that appear in the `HED` column are specific to each individual event. -Information that is common to groups of events can be annotated by category. -Numerical values associated with each event can be annotated by value type. -Annotating by category and by value greatly reduces the effort required to HED tag -data and improves the clarity for data users. - -## Annotating events by categories - -In many experiments, the event instances fall into a much smaller number of -categories, and often these categories are labeled with numerical codes or short names. -This categorical information usually corresponds to one or more columns in `*_events.tsv` -representing categorical values. -Instead of tagging this information for each individual event, -you can assign HED tags for each distinct categorical value -in an accompanying `*_events.json` sidecar and allow the analysis tools to make -the association with individual event instances during analysis. 
-The column name in the `*_events.tsv` identifies the type of categorical variable. -The following `*_events.tsv` file has one categorical variable called `mycodes` that -takes on three possible values: `Fixation`, `Button`, and `Target`. - -Example: An `*_events.tsv` containing the `mycodes` categorical column. - -```Text -onset duration mycodes -1.1 n/a Fixation -1.3 n/a Button -1.8 n/a Target -... - -``` - -Example: An accompanying `*_events.json` sidecar describing the `mycodes` categorical variable. - -```JSON -{ - "mycodes": { - "LongName": "Local event type names", - "Description": "Main types of events that comprise a trial", - "Levels": { - "Fixation": "Fixation cross is displayed", - "Target": "Target image appears", - "Button": "Subject presses a button" - }, - "HED": { - "Fixation": "Event/Category/Experimental stimulus, Event/Label/CrossFix, - Event/Description/A cross appears at screen center to serve as a fixation point, - Sensory presentation/Visual, Item/Object/2D Shape/Cross, - Attribute/Visual/Fixation point, Attribute/Visual/Rendering type/Screen, - Attribute/Location/Screen/Center", - "Target": "Event/Label/TargetImage, Event/Category/Experimental stimulus, - Event/Description/A white airplane as the RSVP target superimposed on a satellite image is displayed., - Item/Object/Vehicle/Aircraft/Airplane, Participant/Effect/Cognitive/Target, - Sensory presentation/Visual/Rendering type/Screen/2D), - (Item/Natural scene/Aerial/Satellite, - Sensory presentation/Visual/Rendering type/Screen/2D)", - "Button": "Event/Category/Participant response, Event/Label/PressButton, - Event/Description/The participant presses the button as soon as the target is visible, - Action/Button press" - } - } -} -``` - -## Annotating events by value type - -Each column of `*_events.tsv` containing non-categorical values usually represents a -particular type of data, for example the `speed` of a stimulus object across the -screen or the filename of the stimulus image. 
-These variables could be annotated in the HED column of `*_events.tsv`. -However, that approach requires repeating the values appearing in the individual -columns in the HED column. -A better approach is to annotate the type of value contained in each of these -columns in the `*_events.json` sidecar. -Value variables are annotated in a manner similar to categorical values, -except that the HED string must contain exactly one `#` specifying a placeholder -for the actual column values. -Tools are responsible for substituting the actual column values for the `#` during analysis. - -Example: An `*_events.tsv` containing a categorical column (`trial_type`) and two value -columns (`response_time` and `stim_file`). +HED annotation can be used to describe any experimental events by combining +information from the dataset's `_events.tsv` files and `_events.json` sidecars. + +## HED annotations and vocabulary + +A HED annotation consists of terms selected from a controlled +hierarchical vocabulary (the HED schema). +Individual terms are comma-separated and may be grouped using parentheses to indicate +association. +See [https://www.hedtags.org/display_hed.html](https://www.hedtags.org/display_hed.html) +to view the HED schema and the +[HED documentation](https://hed-specification.readthedocs.io/en/latest/index.html) +for additional resources. + +Starting with HED version 8.0.0, HED allows users to annotate using individual +terms or partial paths in the HED vocabulary (for example `Red` or `Visual-presentation`) +rather than the full paths in the HED hierarchy +(`Property/Sensory-property/Sensory-attribute/Visual-attribute/Color/CSS-color/Red-color/Red` +or +`Property/Sensory-property/Sensory-presentation/Visual-presentation`). + +HED-specific tools MUST treat the short and long HED tag forms interchangeably, +converting between the forms when necessary, based on the HED schema. 
+Examples of test datasets using the various forms can be found in +[hed-examples/datasets](https://github.com/hed-standard/hed-examples/tree/main/datasets) +on GitHub. +**Using the short form for tags is strongly RECOMMENDED whenever possible**. + +## Annotating events + +Event-related data in BIDS appears in tab-separated value (`events.tsv`) +files in various places in the dataset hierarchy +(see [Events](../04-modality-specific-files/05-task-events.md)). + +`events.tsv` files MUST have `onset` and `duration` columns. +Dataset curators MAY also include additional columns and define their +meanings in associated JSON sidecar files (`events.json`). + +Example: An excerpt from an `events.tsv` file containing three columns +(`trial_type`, `response_time`, and `stim_file`) in addition to +the required `onset` and `duration` columns. ```Text onset duration trial_type response_time stim_file @@ -124,86 +51,137 @@ onset duration trial_type response_time stim_file 5.6 0.6 stop 1.739 images/blue_square.jpg ``` -Example: An accompanying `*_events.json` sidecar describing both categorical and value columns. +The `trial_type` column in the above example contains a limited number of distinct +values (`go` and `stop`). +This type of column is referred to as a *categorical* column, +and the column's meaning can be annotated by assigning HED tags to describe +each of these distinct values. +The JSON sidecar provides a [JSON object](https://www.json.org/json-en.html) of annotations for these categorical values. +That is, the object is a dictionary mapping the categorical values to corresponding HED annotations. + +In contrast, the `response_time` and `stim_file` columns could potentially contain +distinct values in every row. +These columns are referred to as *value* columns and are annotated by creating +a HED tag string to describe a general pattern for these values. 
+The HED annotation for a value column MUST include a `#` placeholder, +which dedicated HED tools MUST replace with the actual column value when the annotations +are assembled for analysis. + +Example: An accompanying `events.json` sidecar describing both categorical and +value columns of the previous example. +The `duration` column is also annotated as a value column. ```JSON { - "trial_type": { - "LongName": "Event category", - "Description": "Indicator of type of action that is expected", - "Levels": { - "go": "A red square is displayed to indicate starting", - "stop": "A blue square is displayed to indicate stopping", - }, - "HED": { - "go": "Event/Category/Experimental stimulus, Event/Label/RedSquare, - Event/Description/A red square is displayed to indicate starting, - Sensory presentation/Visual, Item/Object/2D Shape/Square, - Attribute/Visual/Color/Red, Attribute/Visual/Rendering type/Screen, - Attribute/Location/Screen/Center", - "stop": "Event/Category/Experimental stimulus, Event/Label/BlueSquare, - Event/Description/A blue square is displayed to indicate stopping, - Sensory presentation/Visual, Item/Object/2D Shape/Square, - Attribute/Visual/Color/Blue, Attribute/Visual/Rendering type/Screen, - Attribute/Location/Screen/Center", + "duration": { + "LongName": "Image duration", + "Description": "Duration of the image presentations", + "Units": "s", + "HED": "Duration/# s" + }, + "trial_type": { + "LongName": "Event category", + "Description": "Indicator of type of action that is expected", + "Levels": { + "go": "A red square is displayed to indicate starting", + "stop": "A blue square is displayed to indicate stopping" + }, + "HED": { + "go": "Sensory-event, Visual-presentation, ((Square, Blue),(Computer-screen, Center-of))", + "stop": "Sensory-event, Visual-presentation, ((Square, Blue), (Computer-screen, Center-of))" } }, "response_time": { "LongName": "Response time after stimulus", "Description": "Time from stimulus presentation until subject presses 
button", "Units": "ms", - "HED": "Attribute/Response start delay/# ms, Action/Button press" + "HED": "(Delay/# ms, Agent-action, (Experiment-participant, (Press, Mouse-button)))" }, "stim_file": { "LongName": "Stimulus filename", "Description": "Relative path of the stimulus image file", - "HED": "Attribute/File/#" + "HED": "Pathname/#" } } ``` -## Best practices - -Most studies will have event categorical variables and value variables that -are common across many of the datasets in the study. -You should try to annotate these columns in a `*_events.json` sidecar -as high in the study hierarchy as possible to avoid duplicate annotations. -Annotations that can be placed in sidecars are preferred to those placed -directly in the HED column, because they are simpler, more compact, and -less prone to inconsistent annotation. -Downstream tools should not distinguish between tags specified using -the explicit HED column and the categorical specifications, but should -form the union before analysis. -Further, the [inheritance principle](../02-common-principles.md#the-inheritance-principle) -applies, so the data dictionaries can appear higher in the BIDS hierarchy. - -You should try to annotate in as much detail as possible. -The HED path structure makes it easy for analysis tools to extract tags -at different levels of detail: For example a user can consider extracting -events associated with 2D shapes for stimuli, ignoring the particular -color or shape details for the stimuli. - -## HED schema and HED versions +Dedicated HED tools MUST assemble an annotation for each event by concatenating the +annotations for each column. 
+ +Example: The fully assembled annotation for the first event in the above +`events.tsv` file with onset `1.2` (the first row) is: + +```Text +Duration/0.6 s, Sensory-event, Visual-presentation, +((Square, Blue), (Computer-screen, Center-of)), +(Delay/1.435 ms, Agent-action, +(Experiment-participant, (Press, Mouse-button))), +Pathname/images/red_square.jpg +``` + +## Annotation using the `HED` column + +Another tagging strategy is to annotate individual events directly by +including a `HED` column in the `events.tsv` file. +This approach is necessary when each event has annotations that are unique +and do not fit into a standard set of patterns. + +Some acquisition or presentation software systems directly +write annotations during the experiment, and these MAY also be placed in the +`HED` column of the `events.tsv` file. + +Dedicated HED tools that assemble the full annotation for events MUST NOT distinguish +between HED annotations extracted from `_events.json` sidecars and those +appearing in the `HED` column of `_events.tsv` files. +The HED strings from all sources are concatenated to form the final +event annotations. + +Annotations placed in sidecars are the RECOMMENDED way +to annotate data using HED. +These annotations are preferred to those placed +directly in the `HED` column, because they are simpler, more compact, +more easily edited, and less prone to inconsistencies. + +## HED and the BIDS inheritance principle + +Most studies have event files whose columns contain categorical and +numerical values that are similar across the recordings in the study. +If possible, users should annotate these columns in a single +`events.json` sidecar placed at the top level in the dataset. + +If some recordings in the dataset have a column whose values deviate from a +standard pattern, then the annotations for that column MUST be placed in +sidecars located deeper in the dataset directory hierarchy. 
+According to the BIDS [Inheritance Principle](../02-common-principles.md#the-inheritance-principle), +once a column key in a sidecar (that is, the column name found in the `events.tsv` files) is set, +information about that column cannot be overridden by a sidecar appearing in a directory +closer to the dataset root. + +## HED schema versions The HED vocabulary is specified by a HED schema, which delineates the allowed HED path strings. -By default, BIDS uses the latest HED schema available in the -[hed-specification](https://github.com/hed-standard/hed-specification/tree/master/hedxml) repository -maintained by the hed-standard group. +The version of HED used in tagging a dataset should be provided in the `HEDVersion` +field of the `dataset_description.json` file located in the dataset root directory. +This allows for a proper validation of the HED annotations +(for example using the `bids-validator`). -You can override the default by providing a specific HED version number in the -`dataset_description.json` file using the `HEDVersion` field. -The preferred approach is to validate with the latest version (the default), -but to use the `HEDVersion` field to specify which version was used for later reference. - -Example: The following `dataset_description.json` file specifies that -`HED7.1.1.xml` from the [hed-specification](https://github.com/hed-standard/hed-specification/tree/master/hedxml) repository -should be used to validate the study event annotations. +Example: The following `dataset_description.json` file specifies that the +[`HED8.0.0.xml`](https://github.com/hed-standard/hed-specification/tree/master/hedxml/HED8.0.0.xml) +file from the `hedxml` directory of the +[`hed-specification`](https://github.com/hed-standard/hed-specification) +repository on GitHub should be used to validate the study event annotations. 
```JSON { - "Name": "The mother of all experiments", - "BIDSVersion": "1.4.0", - "HEDVersion": "7.1.1" + "Name": "A great experiment", + "BIDSVersion": "1.6.0", + "HEDVersion": "8.0.0" } ``` + +If you omit the `HEDVersion` field from the dataset description file, +any HED information present will be validated using the latest version of the HED schema, +which may differ from the version the annotations were written against and can therefore cause validation errors. +Hence, it is strongly RECOMMENDED that the `HEDVersion` field be included when using HED +in a BIDS dataset.
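
Reviewer note: the assembly rule introduced in the appendix (look up categorical values in the sidecar dictionary, substitute the column value for `#` in value columns, append any `HED` column, then concatenate) can be illustrated with a minimal Python sketch. This is not the behavior of any actual HED tool. The function name `assemble_hed`, the in-memory `dict` representation of a row and sidecar, the lower-case `duration` sidecar key, the skipping of `n/a` cells, and the `", "` separator are all assumptions made for illustration. The row used is the `stop` row that is visible in the `events.tsv` excerpt.

```Python
# Hypothetical helper, not part of any HED library: a sketch of the
# "concatenate the annotations for each column" rule from the appendix.

def assemble_hed(row, sidecar):
    """Build one HED string for a single events.tsv row.

    row     -- dict mapping column name to the cell value (as a string)
    sidecar -- dict parsed from an events.json sidecar
    """
    parts = []
    for column, value in row.items():
        if value == "n/a":
            # Assumption: missing values contribute no annotation.
            continue
        if column == "HED":
            # Annotations written directly in the HED column are used as-is.
            parts.append(value)
            continue
        hed = sidecar.get(column, {}).get("HED")
        if hed is None:
            # Column has no HED annotation in the sidecar (for example onset).
            continue
        if isinstance(hed, dict):
            # Categorical column: look up the annotation for this value.
            if value in hed:
                parts.append(hed[value])
        else:
            # Value column: substitute the cell value for the # placeholder.
            parts.append(hed.replace("#", value))
    return ", ".join(parts)


sidecar = {
    "duration": {"HED": "Duration/# s"},
    "trial_type": {
        "HED": {
            "stop": "Sensory-event, Visual-presentation, "
                    "((Square, Blue), (Computer-screen, Center-of))"
        }
    },
    "response_time": {
        "HED": "(Delay/# ms, Agent-action, "
               "(Experiment-participant, (Press, Mouse-button)))"
    },
    "stim_file": {"HED": "Pathname/#"},
}

row = {
    "onset": "5.6",
    "duration": "0.6",
    "trial_type": "stop",
    "response_time": "1.739",
    "stim_file": "images/blue_square.jpg",
}

assembled = assemble_hed(row, sidecar)
print(assembled)
```

Because the sidecar and the `HED` column feed the same list, the sketch treats both sources identically, which is the property the appendix requires of dedicated tools. Real tools would additionally validate the result against the schema named in `HEDVersion`.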