Python parser for combined EEG and eye-tracking data

Copyright (2022-2025) Hermine Berberyan, Wouter Kruijne, Sebastiaan Mathôt, Ana Vilotijević

About

A Python module for reading concurrently recorded EEG and eye-tracking data, and parsing this data into convenient objects for further analysis. For this to work, several assumptions need to be met, as described under Assumptions. At present, this module is largely for internal use, and focused on our own recording environment.

Key features:

Experimental variables (such as conditions) from the eye-tracking data are used as metadata for the EEG analysis.
Gaze and pupil data is added as channels to the EEG data.
Automated preprocessing of eye-tracking and EEG data.

Example

Parse the data.

import eeg_eyetracking_parser as eet

# eet.read_subject.clear()  # uncomment to clear the cache and reparse
raw, events, metadata = eet.read_subject(2)
raw.plot()

To avoid having to parse the data over and over again, read_subject() uses persistent memoization, which is a way to store the return values of a function on disk and return them right away on subsequent calls. To clear the memoization cache, either call the read_subject.clear() function or remove the .memoize folder.
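The idea behind persistent memoization can be illustrated with a minimal standard-library sketch. This is not how read_subject() or datamatrix implements it. The names disk_memoize, CACHE_DIR, and slow_parse are hypothetical and only mirror the behavior described above: the first call does the work and stores the result on disk, later calls with the same arguments read it back, and clear() empties the cache.

```python
import functools
import hashlib
import pickle
import shutil
import tempfile
from pathlib import Path

# Hypothetical cache location for this sketch (the module itself uses .memoize)
CACHE_DIR = Path(tempfile.mkdtemp(prefix='memoize_sketch_'))


def disk_memoize(func):
    """Stores return values on disk, keyed by function name and arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key_src = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
        path = CACHE_DIR / (hashlib.sha256(key_src).hexdigest() + '.pkl')
        if path.exists():
            # Cache hit: return the stored value without calling func
            return pickle.loads(path.read_bytes())
        result = func(*args, **kwargs)
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        path.write_bytes(pickle.dumps(result))
        return result

    def clear():
        # Similar in spirit to removing the cache folder by hand
        shutil.rmtree(CACHE_DIR, ignore_errors=True)

    wrapper.clear = clear
    return wrapper


calls = []


@disk_memoize
def slow_parse(subject_nr):
    calls.append(subject_nr)  # record that the real work was done
    return {'subject': subject_nr}


first = slow_parse(2)   # computed and written to disk
second = slow_parse(2)  # read back from disk; the function body is not run again
```

Because the cache key is derived from the arguments, changing any argument triggers a fresh computation, whereas changing the underlying data files does not. That is why clearing the cache is needed to reparse.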

Plot the voltage across five occipital and parietal electrodes, locked to cue onset, for three seconds. This is done separately for three different conditions, defined by cue_eccentricity. The function eet.autoreject_epochs() behaves similarly to mne.Epochs(), except that autorejection is applied and that, like read_subject(), it uses persistent memoization.

import numpy as np
import mne
from matplotlib import pyplot as plt
from datamatrix import convert as cnv

CUE_TRIGGER = 1
CHANNELS = 'O1', 'O2', 'Oz', 'P3', 'P4'

cue_epoch = eet.autoreject_epochs(raw, eet.epoch_trigger(events, CUE_TRIGGER),
                                  tmin=-.1, tmax=3, metadata=metadata,
                                  picks=CHANNELS)

We can convert the metadata, which is a DataFrame, to a DataMatrix, and add cue_epoch as a multidimensional column.

from datamatrix import convert as cnv
import time_series_test as tst

dm = cnv.from_pandas(metadata)
dm.erp = cnv.from_mne_epochs(cue_epoch)  # rows x channel x time
dm.mean_erp = dm.erp[:, ...]             # Average over channels: rows x time
tst.plot(dm, dv='mean_erp', hue_factor='cue_eccentricity')

Because the regular mne.Epochs() object doesn't play nice with non-data channels, such as pupil size, you need to use the eet.PupilEpochs() class instead. This class is otherwise identical, except that by default it removes trials where baseline pupil size is more than 2 SD from the mean baseline pupil size.

pupil_cue_epoch = eet.PupilEpochs(raw, eet.epoch_trigger(events, CUE_TRIGGER),
                                  tmin=0, tmax=3, metadata=metadata,
                                  baseline=(0, .05))
dm.pupil = cnv.from_mne_epochs(pupil_cue_epoch, ch_avg=True)  # only 1 channel
tst.plot(dm, dv='pupil', hue_factor='cue_eccentricity')

Installation

pip install eeg_eyetracking_parser

Dependencies

datamatrix >= 1.0
eyelinkparser
mne
autoreject
h5io
braindecode
python-picard
json_tricks

Assumptions

Data format

EEG data should be in BrainVision format (.vhdr), recorded at 1000 Hz
Eye-tracking data should be EyeLink format (.edf), recorded monocularly at 1000 Hz

File and folder structure

Files should be organized following BIDS.

# Container folder for all data
data/
    # Subject 2
    sub-02/
        # EEG data
        eeg/
            sub-02_task-attentionalbreadth_eeg.eeg
            sub-02_task-attentionalbreadth_eeg.vhdr
            sub-02_task-attentionalbreadth_eeg.vmrk
        # Behavioral data (usually not necessary)
        beh/
            sub-02_task-attentionalbreadth_beh.csv
        # Eye-tracking data
        eyetracking/
            sub-02_task-attentionalbreadth_physio.edf
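
The layout above can be expressed as a small path-building helper. This is only a sketch for checking where files are expected to live. The function name bids_paths is hypothetical and not part of this package, and the two-digit zero-padding follows the convention described for subject_nr in the function reference.

```python
from pathlib import PurePosixPath


def bids_paths(subject_nr, task, root='data'):
    """Returns the expected file locations for one subject (sketch only)."""
    # Integer subject numbers are zero-padded to two digits, e.g. 2 -> '02'
    sub = f'sub-{subject_nr:02d}'
    base = PurePosixPath(root) / sub
    stem = f'{sub}_task-{task}'
    return {
        'eeg': [base / 'eeg' / f'{stem}_eeg{ext}'
                for ext in ('.eeg', '.vhdr', '.vmrk')],
        'beh': base / 'beh' / f'{stem}_beh.csv',
        'eyetracking': base / 'eyetracking' / f'{stem}_physio.edf',
    }


paths = bids_paths(2, 'attentionalbreadth')
for name, path in paths.items():
    print(name, path)
```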

You can re-organize data files into the above structure automatically with the data2bids command, which is part of this package.

Assumptions:

all EEG files (.eeg, .vhdr, .vmrk) are named in a 'Subject-00X-timestamp' format (e.g. Subject-002-[2022.06.12-14.35.46].eeg)
eye-tracking files (.edf) are named in a 'sub_X' format (e.g. sub_2.edf)

For example, to re-organize data from participants 1, 2, 3, and 4 for a task called 'attentional-breadth', you can run the following command. This assumes that the unorganized files are in a subfolder called data and that the re-organized (BIDS-compatible) files are also in this subfolder, i.e. as shown above.

data2bids --source-path=data --target-path=data -s=1,2,3,4 -t=attentional-breadth

Trigger codes

The start of each trial is indicated by a counter that starts at 128 for the first trial, and wraps around after 255, such that trial 129 is indicated again by 128. This trigger does not need to be sent to the eye tracker, which uses its own start_trial message. A temporal offset between the start_trial message of the eye tracker and the start-trial trigger of the EEG is ok, and will be compensated for during parsing.

EE.PulseLines(128 + trialid % 128, 10)  # EE is the EventExchange object
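
The wrap-around can be checked with a few values. The sketch below assumes that trialid is a zero-based counter, so that the first trial maps onto 128. The helper name trial_trigger_code is hypothetical.

```python
def trial_trigger_code(trialid):
    """Maps a zero-based trial counter onto the 128-255 trigger range."""
    return 128 + trialid % 128


# First, second, 128th, 129th, and 130th trial
codes = [trial_trigger_code(i) for i in (0, 1, 127, 128, 129)]
print(codes)  # [128, 129, 255, 128, 129]
```

Because the code repeats every 128 trials, the trigger value alone does not identify a trial uniquely. Only the order in which triggers arrive does.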

The onset of each epoch is indicated by a counter that starts at 1 for the first epoch, and then increases for subsequent epochs. In other words, if the target presentation is the second epoch of the trial, then this would correspond to trigger 2 as in the example below. This trigger needs to be sent to both the EEG and the eye tracker at the exact same moment (a temporal offset is not ok).

target_trigger = 2
eyetracker.log(f'start_phase {target_trigger}')  # eyetracker is created by PyGaze
EE.PulseLines(target_trigger, 10)

Triggers should only be used for temporal information. Conditions are only logged in the eye-tracking data.

Function reference

autoreject_epochs(*args, ar_kwargs=None, **kwargs)

A factory function that creates an Epochs() object, applies autorejection, and then returns it.

Important: This function uses persistent memoization, which means that the results for a given set of arguments are stored on disk and returned right away for subsequent calls. For more information, see https://pydatamatrix.eu/memoization/

Parameters

*args: iterable

Arguments passed to mne.Epochs()
ar_kwargs: dict or None, optional

Keywords to be passed to AutoReject(). If n_interpolate is not specified, a default value of [1, 4, 8, 16] is used.
**kwargs: dict

Keywords passed to mne.Epochs()

Returns

Epochs:

An mne.Epochs() object with autorejection applied.
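
The default for n_interpolate described above could be filled in along the following lines. This is a sketch of the documented behavior, not the function's actual code. The helper name resolve_ar_kwargs is hypothetical.

```python
DEFAULT_N_INTERPOLATE = [1, 4, 8, 16]


def resolve_ar_kwargs(ar_kwargs=None):
    """Returns a copy of ar_kwargs with n_interpolate filled in if missing."""
    resolved = dict(ar_kwargs or {})
    resolved.setdefault('n_interpolate', list(DEFAULT_N_INTERPOLATE))
    return resolved


defaults = resolve_ar_kwargs()
custom = resolve_ar_kwargs({'n_interpolate': [1, 2], 'n_jobs': 2})
print(defaults)
print(custom)
```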

epoch_trigger(events, trigger)

Selects a single epoch trigger from a tuple with event information. Epoch triggers have values between 1 and 127 (inclusive).

Parameters

events: tuple

Event information as returned by read_subject().
trigger: int

A trigger code, which is a positive value.

Returns

array:

A numpy array with events as expected by mne.Epochs().

PupilEpochs(*args, baseline_trim=(-2, 2), channel='PupilSize', **kwargs)

An Epochs class for the PupilSize channel. This allows baseline correction to be applied to pupil size, even though this channel is not a regular data channel. In addition, this class allows pupil sizes to be excluded based on deviant baseline values, which is recommended for pupil analysis (but not typically done for EEG).

Parameters

*args: iterable

Arguments passed to mne.Epochs()
baseline_trim: tuple of int, optional

The range of acceptable baseline values. This refers to z-scores.
channel: str, optional

The channel name that contains pupil-size data
**kwargs: dict

Keywords passed to mne.Epochs()

Returns

Epochs:

An mne.Epochs() object for the pupil-size channel, with trials excluded based on baseline_trim.
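
To illustrate what trimming on baseline z-scores means, the sketch below z-scores a list of made-up baseline pupil sizes and keeps only the trials that fall within baseline_trim. It is not the class's implementation. The function name, the example values, and the use of the sample standard deviation are assumptions.

```python
import statistics


def baseline_keep_indices(baselines, baseline_trim=(-2, 2)):
    """Returns indices of trials whose baseline z-score lies within range."""
    mean = statistics.mean(baselines)
    sd = statistics.stdev(baselines)  # sample SD; an assumption of this sketch
    lower, upper = baseline_trim
    return [i for i, value in enumerate(baselines)
            if lower <= (value - mean) / sd <= upper]


# Made-up baseline pupil sizes; the last trial is a clear outlier
baseline_pupil = [1000, 1010, 990, 1005, 995, 1002, 998, 1001, 999, 1500]
kept = baseline_keep_indices(baseline_pupil)
print(kept)  # the last trial (index 9) is excluded
```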

read_subject(subject_nr, folder='data/', trigger_parser=None, eeg_margin=30, min_sacc_dur=10, min_sacc_size=100, min_blink_dur=10, blink_annotation='BLINK', saccade_annotation='SACCADE', eeg_preprocessing=True, save_preprocessing_output=True, plot_preprocessing=False, eye_kwargs={}, downsample_data_kwargs={}, drop_unused_channels_kwargs={}, rereference_channels_kwargs={}, create_eog_channels_kwargs={}, set_montage_kwargs={}, annotate_emg_kwargs={}, band_pass_filter_kwargs={}, autodetect_bad_channels_kwargs={}, run_ica_kwargs={}, auto_select_ica_kwargs={}, interpolate_bads_kwargs={})

Reads EEG, eye-tracking, and behavioral data for a single participant. This data should be organized according to the BIDS specification.

EEG data is assumed to be in BrainVision data format (.vhdr, .vmrk, .eeg). Eye-tracking data is assumed to be in EyeLink data format (.edf or .asc). Behavioral data is assumed to be in .csv format.

Metadata is taken from the behavioral .csv file if present, and from the eye-tracking data if not.

Important: This function uses persistent memoization, which means that the results for a given set of arguments are stored on disk and returned right away for subsequent calls. For more information, see https://pydatamatrix.eu/memoization/

Parameters

subject_nr: int or str

The subject number to parse. If an int is passed, the subject number is assumed to be zero-padded to length two (e.g. '01'). If a string is passed, the string is used directly.
folder: str, optional

The folder in which the data is stored.
trigger_parser: callable, optional

A function that converts annotations to events. If no function is specified, triggers are assumed to be encoded by the OpenVibe acquisition software and to follow the convention for indicating trial numbers and event onsets as described in the readme.
eeg_margin: int, optional

The number of seconds after the last trigger to keep. The rest of the data will be cropped to save memory (in case long periods of extraneous data were recorded).
min_sacc_dur: int, optional

The minimum duration of a saccade before it is annotated as a saccade.
min_sacc_size: int, optional

The minimum size of a saccade (in pixels) before it is annotated as a saccade.
min_blink_dur: int, optional

The minimum duration of a blink before it is annotated as a blink.
blink_annotation: str, optional

The annotation label to be used for blinks. Use a BAD_ prefix to treat blinks as bad annotations.
saccade_annotation: str, optional

The annotation label to be used for saccades. Use a BAD_ prefix to treat saccades as bad annotations.
eeg_preprocessing: bool or list, optional

Indicates whether EEG preprocessing should be performed. If True, then all preprocessing steps are performed. If a list is passed, then only those steps are performed for which the corresponding function name is in the list (e.g. ['downsample_data', 'set_montage'])
save_preprocessing_output: bool, optional

Indicates whether output generated during EEG preprocessing should be saved.
plot_preprocessing: bool, optional

Indicates whether plots should be shown during EEG preprocessing.
eye_kwargs: dict, optional

Optional keyword arguments to be passed on to the EyeLink parser. If traceprocessor is not provided, a default traceprocessor is used with advanced blink reconstruction enabled and 10x downsampling.
downsample_data_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
drop_unused_channels_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
rereference_channels_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
create_eog_channels_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
set_montage_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
annotate_emg_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
band_pass_filter_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
autodetect_bad_channels_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
run_ica_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
auto_select_ica_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.
interpolate_bads_kwargs: dict, optional

Passed as keyword arguments to corresponding preprocessing function.

Returns

tuple:

A raw (EEG data), events (EEG triggers), metadata (a table with experimental variables) tuple.
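
The way eeg_preprocessing selects steps can be sketched as follows. The step names are inferred from the *_kwargs parameters above. Their order, and the helper name selected_steps, are assumptions of this sketch rather than the module's actual code.

```python
# Step names inferred from the *_kwargs parameters of read_subject();
# the order shown here is an assumption of this sketch.
PREPROCESSING_STEPS = [
    'downsample_data', 'drop_unused_channels', 'rereference_channels',
    'create_eog_channels', 'set_montage', 'annotate_emg', 'band_pass_filter',
    'autodetect_bad_channels', 'run_ica', 'auto_select_ica',
    'interpolate_bads',
]


def selected_steps(eeg_preprocessing=True):
    """Returns the names of the preprocessing steps that would be run."""
    if eeg_preprocessing is True:
        return list(PREPROCESSING_STEPS)
    if not eeg_preprocessing:
        return []
    # A list restricts preprocessing to the named steps
    return [step for step in PREPROCESSING_STEPS if step in eeg_preprocessing]


print(selected_steps(['set_montage', 'downsample_data']))
```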

trial_trigger(events)

Selects all trial triggers from event information. Trial triggers have values between 128 and 255 (inclusive).

Parameters

events: tuple

Event information as returned by read_subject().

Returns

array:

A numpy array with events as expected by mne.Epochs().
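
The two value ranges (1 to 127 for epoch triggers, 128 to 255 for trial triggers) can be summarized in a small classifier. The sketch works on plain integers and makes no assumption about the structure of the events tuple. The function name trigger_kind and the example codes are hypothetical.

```python
def trigger_kind(code):
    """Classifies a trigger code by the value ranges used in this module."""
    if 1 <= code <= 127:
        return 'epoch'
    if 128 <= code <= 255:
        return 'trial'
    return 'unknown'


# Two trials, each with a trial trigger followed by two epoch triggers
codes = [128, 1, 2, 129, 1, 2]
epoch_codes = [c for c in codes if trigger_kind(c) == 'epoch']
trial_codes = [c for c in codes if trigger_kind(c) == 'trial']
print(epoch_codes, trial_codes)  # [1, 2, 1, 2] [128, 129]
```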

braindecode_utils.decode_subject(read_subject_kwargs, factors, epochs_kwargs, trigger, epochs_query='practice == "no"', epochs=4, window_size=200, window_stride=1, n_fold=4, crossdecode_read_subject_kwargs=None, crossdecode_factors=None, patch_data_func=None, read_subject_func=None, cuda=True, balance=True)

The main entry point for decoding a subject's data.

Parameters

read_subject_kwargs: dict

A dict with keyword arguments that are passed to eet.read_subject() to load the data. Additional preprocessing as specified in preprocess_raw() is applied afterwards.
factors: str or list of str

A factor or list of factors that should be decoded. Factors should be str and match column names in the metadata.
epochs_kwargs: dict

A dict with keyword arguments that are passed to mne.Epochs() to extract the to-be-decoded epoch.
trigger: int

The trigger code that defines the to-be-decoded epoch.
epochs_query: str, optional

A pandas-style query to select trials from the to-be-decoded epoch. The default assumes that there is a practice column from which we only want to have the 'no' values, i.e. that we want to exclude practice trials.
epochs: int, optional

The number of training epochs, i.e. the number of times that the data is fed into the model. This should be at least 2.
window_size: int, optional

The length (in samples) of the window to sample from the Epochs object. This should be slightly shorter than the actual Epochs to allow for jittered samples to be taken for the purpose of 'cropped decoding'.
window_stride: int, optional

The number of samples to jitter around the window for the purpose of cropped decoding.
n_fold: int, optional

The total number of splits (or folds). This should be at least 2.
crossdecode_read_subject_kwargs: dict or None, optional

When provided, these read_subject_kwargs are passed to read_subject_func for reading the to-be-decoded test dataset.
crossdecode_factors: str or list of str or None, optional

A factor or list of factors that should be decoded during testing. If provided, the classifier is trained using the factors specified in factors and tested using the factors specified in crossdecode_factors. In other words, specifying this keyword allows for crossdecoding.
patch_data_func: callable or None, optional

If provided, this should be a function that accepts a tuple of (raw, events, metadata) as returned by read_subject() and also returns a tuple of (raw, events, metadata). This function can modify aspects of the data before decoding is applied.
read_subject_func: callable or None, optional

If provided, this should be a function that accepts keywords as provided through the read_subject_kwargs argument, and returns a tuple of (raw, events, metadata). If not provided, the default read_subject() function is used.
cuda: bool, optional

If True, cuda will be used for GPU processing if it is available. If False, cuda won't be used, not even when it is available.
balance: bool, optional

Makes sure that a dataset contains an equal number of observations for each label by randomly duplicating observations from labels that have too few observations.
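
Balancing by random duplication, as described for the balance keyword, can be sketched on plain lists. This is an illustration of the general technique, not the code used by decode_subject(). The function name and the seed parameter are hypothetical.

```python
import random
from collections import Counter


def balance_by_duplication(observations, labels, seed=0):
    """Oversamples under-represented labels until all counts are equal."""
    rng = random.Random(seed)
    by_label = {}
    for obs, label in zip(observations, labels):
        by_label.setdefault(label, []).append(obs)
    target = max(len(group) for group in by_label.values())
    balanced_obs, balanced_labels = [], []
    for label, group in by_label.items():
        # Randomly duplicate existing observations to reach the target count
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced_obs.extend(group + extra)
        balanced_labels.extend([label] * target)
    return balanced_obs, balanced_labels


# Label 0 has three observations, label 1 has only two
obs, labels = balance_by_duplication(list('abcde'), [0, 0, 0, 1, 1])
print(Counter(labels))
```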

Returns

DataMatrix

Contains the original metadata plus four additional columns:
- braindecode_label is a numeric label that corresponds to the to-be-decoded factor, i.e. the ground truth
- braindecode_prediction is the predicted label
- braindecode_correct is 1 for correct predictions and 0 otherwise
- braindecode_probabilities is a SeriesColumn with the predicted probabilities for each label. The prediction itself corresponds to the index with the highest probability.
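
The relationship between these columns can be shown with plain lists: the prediction is the index of the highest probability, and correctness compares it to the ground-truth label. The function name and the example values are made up for illustration.

```python
def predictions_from_probabilities(probabilities, labels):
    """Derives predicted labels and correctness from per-label probabilities."""
    predictions = [row.index(max(row)) for row in probabilities]
    correct = [int(pred == label) for pred, label in zip(predictions, labels)]
    return predictions, correct


probabilities = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]  # one row per trial
labels = [1, 2]  # ground-truth labels
predictions, correct = predictions_from_probabilities(probabilities, labels)
print(predictions, correct)  # [1, 0] [1, 0]
```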

License

eeg_eyetracking_parser is licensed under the GNU General Public License v3.
