A nonlinear eigenvector algorithm for solving data-driven robust Common Spatial Pattern (CSP)
Common Spatial Pattern (CSP) [1, 2] is a popular signal-processing technique for feature extraction from electroencephalography (EEG) signals. It is a data-driven, supervised machine-learning algorithm that computes spatial filters which extract the distinguishing features of the EEG signals associated with the task at hand (such as a motor imagery task).
Preprocessing steps (such as channel selection, bandpass filtering, and time interval selection) of the raw EEG signals are crucial to the successful performance of CSP. Let
Each covariance matrix
The goal of CSP is to find filters such that the variance of the spatially filtered signals under one condition is minimized, while that of the other condition is maximized. Specifically, denoting spatial filters as
Collectively, the covariance matrix of all the spatially filtered signals
CSP imposes the following two constraints on the spatial filters:

- $X^T\overline{\Sigma}_{-}X = \Lambda_{-}$ and $X^T\overline{\Sigma}_{+}X = \Lambda_{+}$, where $\Lambda_{-}$ and $\Lambda_{+}$ are diagonal matrices.
- $X^T(\overline{\Sigma}_{-} + \overline{\Sigma}_{+})X = \Lambda_{-} + \Lambda_{+} = I_n$, where $I_n$ denotes the $n \times n$ identity matrix.
These two constraints imply that the spatial filters correspond to the eigenvectors of the generalized eigenvalue problem
Typically, CSP computes only a subset of the filters that best distinguish the variances between the two classes. Such spatial filters are the extreme eigenvectors of the generalized eigenvalue problem. Let
respectively.
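These extreme eigenvectors can be computed with a standard generalized symmetric-definite eigensolver. The sketch below is illustrative only: the covariance matrices are random stand-ins for the averaged class covariances $\overline{\Sigma}_{-}$ and $\overline{\Sigma}_{+}$, and the number of channels and filters are made-up sizes.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 6  # number of channels (illustrative)

# Random stand-ins for the averaged class covariance matrices
# (in practice these are estimated from band-passed EEG trials).
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
Sigma_minus = A @ A.T + n * np.eye(n)
Sigma_plus = B @ B.T + n * np.eye(n)

# Generalized eigenvalue problem Sigma_- x = lambda (Sigma_- + Sigma_+) x.
# eigh returns eigenvalues in ascending order, so the extreme (first and
# last) eigenvectors give the spatial filters that best separate the classes.
evals, X = eigh(Sigma_minus, Sigma_minus + Sigma_plus)

k = 2  # number of filters kept from each end of the spectrum
filters = np.hstack([X[:, :k], X[:, -k:]])

# Both CSP constraints hold: X^T Sigma_- X and X^T Sigma_+ X are diagonal,
# and X^T (Sigma_- + Sigma_+) X = I_n.
D_minus = X.T @ Sigma_minus @ X
```

Since `eigh` normalizes the eigenvectors against the second matrix, the identity constraint $X^T(\overline{\Sigma}_{-} + \overline{\Sigma}_{+})X = I_n$ comes for free.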
The use of average covariance matrices
Instead of limiting the covariance matrices to the fixed covariance, min-max CSP considers covariance matrices in the tolerance sets
where
A data-driven approach is used to construct the tolerance sets [5], where the norm is defined by a PCA-based approach on the data covariance matrices.
The parameters
- Vectorize each data covariance matrix $\Sigma_c^{(i)}$ by stacking its columns into an $n^2$-dimensional vector.
- Compute the covariance matrix $\Gamma_c$ of the vectorized covariance matrices.
- Compute the $m$ largest eigenvalues and corresponding eigenvectors (principal components) of $\Gamma_c$ as $\{w_{c}^{(i)}\}_{i=1}^m$ and $\{V_{c}^{(i)}\}_{i=1}^m$, respectively.
- Reshape the eigenvectors $\{V_c^{(i)}\}_{i=1}^m$ into $n \times n$ matrices and symmetrize them.
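The PCA construction above can be sketched as follows. The per-trial covariances, sizes, and variable names are all illustrative; note that because covariance matrices are symmetric, row-wise and column-wise vectorization give the same result.

```python
import numpy as np

rng = np.random.default_rng(1)
n, num_trials, m = 4, 50, 3  # channels, trials, principal components (illustrative)

# Illustrative per-trial data covariance matrices for one condition c.
trials = rng.standard_normal((num_trials, n, 20 * n))
covs = np.stack([T @ T.T / T.shape[1] for T in trials])

# Step 1: vectorize each covariance matrix into an n^2-dimensional vector
# (the matrices are symmetric, so row- and column-stacking coincide).
vecs = covs.reshape(num_trials, n * n)

# Step 2: covariance matrix Gamma_c of the vectorized covariances.
Gamma = np.cov(vecs, rowvar=False)

# Step 3: the m largest eigenpairs of Gamma (np.linalg.eigh is ascending,
# so reverse to put the leading principal components first).
w, V = np.linalg.eigh(Gamma)
w, V = w[::-1][:m], V[:, ::-1][:, :m]

# Step 4: reshape each principal component into an n x n matrix, then symmetrize.
V_mats = [(M + M.T) / 2 for M in (V[:, i].reshape(n, n) for i in range(m))]
```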
The data-driven min-max CSP becomes a nonlinear Rayleigh quotient (NRQ) optimization
where for
The other min-max CSP follows similarly.
With the spatial filters
and a linear classifier is defined as
where the sign of the classifier
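As a sketch of this classification step: CSP features are commonly taken as log-variances of the spatially filtered signals, and the class is read off from the sign of an affine function of the features. The filter matrix, classifier weights `w`, and bias `b` below are illustrative placeholders (in practice the weights would be trained, e.g. by LDA), not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 4  # channels and number of spatial filters (illustrative)
X_filt = rng.standard_normal((n, k))  # placeholder spatial filters (columns)

def features(trial, filters):
    """Log-variance features of a spatially filtered trial (n x T array)."""
    filtered = filters.T @ trial              # k filtered signals
    return np.log(np.var(filtered, axis=1))  # one feature per filter

# Linear classifier: the sign of w^T f + b selects the class.
w = rng.standard_normal(k)  # illustrative weights; in practice trained (e.g. LDA)
b = 0.0

trial = rng.standard_normal((n, 100))  # placeholder EEG trial
label = np.sign(w @ features(trial, X_filt) + b)
```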
One natural idea for solving the CSP-NRQ is to use the fixed-point iteration scheme:
which is equivalent to the self-consistent field (SCF) iteration
for solving a nonlinear eigenvalue problem (NEPv)
A major issue with this approach is that the solution of CSP-NRQ does not satisfy the above NEPv.
Let
If
corresponding to the smallest positive eigenvalue [6].
The SCF iteration for solving the correct NEPv is then
where
This SCF iteration is named CSP-nepv.
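A minimal SCF loop for a NEPv $H(x)x = \lambda x$ is sketched below. The matrix function `H` is a toy nonlinearity chosen only so that the example is self-contained; it is not the CSP-NRQ coefficient matrix. The loop does, however, follow the eigenvector-selection rule above: at each step it takes the eigenvector associated with the smallest positive eigenvalue of the current matrix [6].

```python
import numpy as np

n = 5
A = np.diag(np.arange(1.0, n + 1))

def H(x):
    # Toy symmetric matrix function: a small rank-one term depending on x.
    # (Stand-in only; not the CSP-NRQ coefficient matrix.)
    return A + 0.1 * np.outer(x, x) / (x @ x)

# SCF iteration for H(x) x = lambda x: at each step, take the eigenvector
# of H(x) associated with the smallest *positive* eigenvalue.
x = np.ones(n) / np.sqrt(n)
for _ in range(100):
    evals, evecs = np.linalg.eigh(H(x))  # eigenvalues in ascending order
    idx = next((i for i, ev in enumerate(evals) if ev > 0), 0)
    x_new = evecs[:, idx]
    if x_new @ x < 0:  # fix the sign ambiguity of the eigenvector
        x_new = -x_new
    converged = np.linalg.norm(x_new - x) < 1e-12
    x = x_new
    if converged:
        break
```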
Synthetic data is generated using signals created by the following linear mixing model:
where
For the convergence analysis, the nondiscriminative sources
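A sketch of such a linear mixing model is below. All sizes, the mixing matrix, and the variance scalings are illustrative assumptions, not the exact experimental setup: one source changes variance between the two conditions while the nondiscriminative sources keep the same variance in both.

```python
import numpy as np

rng = np.random.default_rng(3)
n_sources, n_channels, T = 4, 6, 500  # illustrative sizes

# Fixed mixing matrix: observed channels are linear mixtures of the sources,
# x(t) = A s(t).
A = rng.standard_normal((n_channels, n_sources))

def make_trial(condition):
    """One synthetic trial; source 0 is discriminative, the rest are not."""
    scales = np.ones(n_sources)
    scales[0] = 2.0 if condition == '+' else 0.5  # variance differs by class
    s = scales[:, None] * rng.standard_normal((n_sources, T))
    return A @ s

trial_plus, trial_minus = make_trial('+'), make_trial('-')
```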
The convergence plots (objective values and errors) show that the fixed-point iteration converges, whereas the SCF iteration for the correct NEPv (Alg. 1) converges rapidly, displaying local quadratic convergence. A comparison with the conjugate gradient (CG) and trust-region (TR) methods in Manopt (a Python manifold optimization package) also shows that the SCF iteration has an advantage.
The Berlin dataset (available at https://depositonce.tu-berlin.de/items/1b603748-34fe-411c-8fd2-1711925e4101) and the Gwangju dataset (available at http://gigadb.org/dataset/100295) are used to illustrate the classification-rate improvement of robust spatial filters over standard spatial filters.
(More classification results using different datasets are available in this repo.)
Berlin
Gwangju
In the scatter plots above, each dot represents a subject participating in the experiment. A dot above the red diagonal line means that the corresponding subject obtained an improved classification rate from using robust spatial filters. Specifically, 87.5% of Berlin subjects and 98.0% of Gwangju subjects had improved classification rates.
In the 'Datasets' folder, we provide a text file with a Google Drive link to the preprocessed datasets used in the examples. For those interested, the code for the preprocessing steps is also provided in the 'Preprocessing Data' folder. (The preprocessing code for the Distraction dataset was lost along the way; only the preprocessed datasets remain, and these are included in the 'Datasets' folder.)
[1] Zoltan Joseph Koles. The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG. Electroencephalography and Clinical Neurophysiology, 79(6):440–447, 1991.
[2] Benjamin Blankertz, Ryota Tomioka, Steven Lemm, Motoaki Kawanabe, and Klaus-Robert Müller. Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Processing Magazine, 25(1):41–56, 2007.
[3] Boris Reuderink and Mannes Poel. Robustness of the common spatial patterns algorithm in the BCI-pipeline. University of Twente, Tech. Rep., 2008.
[4] Xinyi Yong, Rabab K. Ward, and Gary E. Birch. Robust common spatial patterns for EEG signal preprocessing. In 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 2087–2090. IEEE, 2008.
[5] Motoaki Kawanabe, Wojciech Samek, Klaus-Robert Müller, and Carmen Vidaurre. Robust common spatial filters with a maxmin approach. Neural Computation, 26(2):349–376, 2014.
[6] Zhaojun Bai, Ding Lu, and Bart Vandereycken. Robust Rayleigh quotient minimization and nonlinear eigenvalue problems. SIAM Journal on Scientific Computing, 40(5):A3495–A3522, 2018.