kurtosis based ica
kashefy committed May 27, 2020

1 parent 4afa0d8 commit 99431c6
Showing 4 changed files with 106 additions and 62 deletions.
6 changes: 3 additions & 3 deletions notes/06_fastica/1_ica_ambiguous.tex
@@ -21,8 +21,8 @@ \section{Ambiguities in ICA and limitations}
\end{frame}

\notesonly{
ICA cannot resolve if the mixing matrix is $\vec A$ or a permuatated and/or scaled version of $\vec A$.
It can \textbf{also} not resolve if the independent sources are $\vec s$ or a permutated and/or scaled version of $\vec s$.
ICA cannot resolve if the mixing matrix is $\vec A$ or a permuted and/or scaled version of $\vec A$.
It can \textbf{also} not resolve if the independent sources are $\vec s$ or a permuted and/or scaled version of $\vec s$.
}
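\notesonly{
A short identity makes this explicit: for any permutation matrix $\vec P$ and invertible diagonal matrix $\vec \Lambda$,
\begin{equation*}
\vec x = \vec A \, \vec s = \left( \vec A \, \vec P^{-1} \vec \Lambda^{-1} \right) \left( \vec \Lambda \, \vec P \, \vec s \right)
\end{equation*}
so the pair $\left( \vec A \, \vec P^{-1} \vec \Lambda^{-1}, \; \vec \Lambda \, \vec P \, \vec s \right)$ explains the observations just as well as $(\vec A, \vec s)$.
}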

\begin{frame}{\secname}
@@ -130,7 +130,7 @@ \subsection{Implications of the ambiguities}
\E \lbrack \, \vec s \, \rbrack = \vec 0
\end{equation}

Substracting the mean from $\vec x$ does not change $\vec A$:
Subtracting the mean from $\vec x$ does not change $\vec A$:

\begin{equation}
\vec x - \E \lbrack \, \vec x \, \rbrack = \vec A \left( \vec s - \E \lbrack \, \vec s \, \rbrack \right)
2 changes: 1 addition & 1 deletion notes/06_fastica/3_badgaussians.tex
@@ -49,7 +49,7 @@ \subsubsection{A formal argument for why Gaussians are bad for ICA}

%\slidesonly{\textbf{A more formal argument (cont'd):}}

Now consider applying an orthognal mixing matrix $\widetilde{\vec A}$ that is \textbf{known}.
Now consider applying an orthogonal mixing matrix $\widetilde{\vec A}$ that is \textbf{known}.
\slidesonly{(orthogonal because we whitened the data $\vec x$)\\
Consequently:
}
128 changes: 79 additions & 49 deletions notes/06_fastica/4_kurt.tex
@@ -99,7 +99,7 @@ \section{ICA by maximizing nongaussianity}
\notesonly{
Recall that ICA cannot resolve the scale or the permutation of the sources; thirdly, it cannot resolve the sign.
This is not an issue.
The role of $\vec z_i$ is to route either $s_1$ or $s_2$ to $\widehat{\vec s}_i$. This covers the ambiguitiy in terms of permutation.
The role of $\vec z_i$ is to route either $s_1$ or $s_2$ to $\widehat{\vec s}_i$. This covers the ambiguity in terms of permutation.
We cannot have both independent sources contribute to $\widehat{s}_i$, only one can. Therefore, we only need a single non-zero component for $\vec z_i$.
Scaling $s_1$ by any factor before it reaches $\widehat{s}_i$ does not make it more or less independent of $s_2$. Choosing $1$ for the non-zero component is therefore sufficient.
Finally, negating the source by multiplying it by $(-1)$ also has no consequences on the independence criterion.
@@ -166,13 +166,13 @@ \section{ICA by maximizing nongaussianity}
\end{frame}
}

\section{Kurtosis as a measure for nongaussianity}
\subsection{Kurtosis as a measure for nongaussianity}

\begin{frame}{\secname}
\begin{frame}{\subsecname}

\notesonly{
Kurtosis represents the fourth-order cumulant\footnote{
Cumulants allow us to express the i-th moment in terms of a cumulative sum of the moments preceeding it.
Cumulants allow us to express the i-th moment in terms of a cumulative sum of the moments preceding it.
This simplifies the expression of higher-order statistics such as kurtosis, which involves the fourth-order moment.
} of a random variable.
}
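\notesonly{
For reference, for a zero-mean random variable $s$ the (excess) kurtosis is
\begin{equation*}
\kurt(s) = \E \lbrack \, s^4 \, \rbrack - 3 \left( \E \lbrack \, s^2 \, \rbrack \right)^2,
\end{equation*}
which reduces to $\E \lbrack \, s^4 \, \rbrack - 3$ for unit variance and vanishes for a Gaussian.
}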
@@ -239,11 +239,13 @@ \section{Kurtosis as a measure for nongaussianity}

\subsection{kurtosis-based ICA}

\begin{frame}
\begin{frame}{\subsecname}

\notesonly{
Two statistically independent sources with

$\langle s_i s_j \rangle = \delta_{ij} \quad \Leftrightarrow \quad \langle \vec s \, \vec s^\top \rangle = \vec I_N$ (any scaling can be attributed to $\vec A$)
}

\begin{equation*}
\widehat{s}_i \quad
@@ -252,21 +254,34 @@ \subsection{kurtosis-based ICA}
= \quad \vec{z}^\top \vec{s} \quad
= \quad z_1 s_1 + z_2 s_2
\end{equation*}

\vspace{1mm}
We want the covariance of our reconstructions to match that of the original sources.
We want the covariance of our reconstructions $\widehat{\vec s}$ to match that of the original sources $\vec s$.
\begin{equation*}
\langle \widehat{\vec s} \, \widehat{\vec s}^\top \rangle \eqexcl \langle \vec s \, \vec s^\top \rangle = \vec I_N
\end{equation*}
This implies,
\begin{align*}
\begin{align}
\var(\widehat{s}_i)
\; &= \; \langle \big( z_1 s_1 + z_2 s_2 \big)^2 \rangle_{P_{\vec s}}\\
\; &= \; \langle z_1^2 \, s_1^2 \rangle \;+\; 2 \, \langle z_1\, s_1\, z_2 \, s_2 \rangle \;+\; \langle z_2^2 \, s_2^2 \rangle \\
\; &= \; z_1^2 \, \langle s_1^2 \rangle \;+\; 2 \, z_1\, z_2 \, \underbrace{\langle s_1\, s_2 \rangle}_{= 0} \;+\; z_2^2 \, \langle s_2^2 \rangle \\
\; &= \; z_1^2 \, \langle s_1^2 \rangle \;+\; z_2^2 \,\langle s_2^2 \rangle \\
\; &= \; z_1^2 + z_2^2 \eqexcl 1
\end{align*}
Making the constraint of unit variance for $\widehat{s}_i$ is to match the variance assumed for the orgiinal sources $s_1$ and $s_2$. This implies that solutions for $\vec z$ are constrained to lie on a unit circle.
\end{align}

\end{frame}

\begin{frame}{\subsecname}

\slidesonly{
$$
\var(\widehat{s}_i)
\; = \; z_1^2 + z_2^2 \eqexcl 1
$$
}

The constraint of unit variance for $\widehat{s}_i$ matches the variance assumed for the original sources $s_1$ and $s_2$. This implies that solutions for $\vec z$ are constrained to lie on a unit circle.
\vspace{1mm}
\begin{align*}
\kurt(\widehat{s}) \;\; &= \;\; \kurt(z_1 s_1 + z_2 s_2) \;\; \\ &= \;\; \kurt(z_1 s_1) + \kurt(z_2 s_2) \; = \; z_1^4 \kurt(s_1) + z_2^4 \kurt(s_2)
@@ -332,9 +347,9 @@ \subsection{kurtosis-based ICA}

\end{frame}
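\notesonly{
On the unit circle $z_1^2 + z_2^2 = 1$ we have $z_i^4 \le z_i^2$, so
\begin{equation*}
| \kurt(\widehat{s}) | \; \le \; z_1^2 \, | \kurt(s_1) | + z_2^2 \, | \kurt(s_2) | \; \le \; \max \big( | \kurt(s_1) |, | \kurt(s_2) | \big),
\end{equation*}
with equality (assuming nonzero kurtoses) only at $\vec z = (\pm 1, 0)^\top$ or $\vec z = (0, \pm 1)^\top$, i.e.\ exactly when $\widehat{s}$ recovers a single source.
}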

\subsection{Kurtosis-based ICA: the gradient algorithm}
\subsubsection{Kurtosis-based ICA: the gradient algorithm}

\begin{frame}
\begin{frame}{\subsubsecname}

\notesonly{
$| \kurt{(\vec{b}^\top \vec{u})} |$ can be maximized by moving $\vec b$
@@ -371,10 +386,8 @@ \subsection{Kurtosis-based ICA: the gradient algorithm}

\end{frame}

\begin{frame}
\slidesonly{
\frametitle{Kurtosis-based ICA: the gradient algorithm}
}
\begin{frame}{\subsubsecname}

\begin{block}{I. batch learning:}
Initialization: random vector $\vec{b}$ of unit length
\begin{eqnarray*}
@@ -480,47 +493,55 @@ \subsection{Kurtosis-based ICA: the gradient algorithm}
\end{block}
\end{frame}
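A minimal numerical sketch of this batch update, assuming whitened data $\vec u$ and the simplified gradient $\langle \vec u \, (\vec b^\top \vec u)^3 \rangle$ (variable names, step size, and iteration count are illustrative, not from the notes):

\begin{verbatim}
import numpy as np

def kurtosis_ica(u, n_iter=500, eta=0.1, seed=0):
    # u: whitened data, shape (n_dims, n_samples)
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(u.shape[0])
    b /= np.linalg.norm(b)               # random init of unit length
    for _ in range(n_iter):
        y = b @ u                        # projection b^T u
        kurt = np.mean(y**4) - 3.0       # excess kurtosis (y approx. unit variance)
        grad = (u * y**3).mean(axis=1)   # gradient of E[y^4] w.r.t. b (up to a factor)
        b += eta * np.sign(kurt) * grad  # step toward larger |kurt|
        b /= np.linalg.norm(b)           # keep b on the unit sphere
    return b
\end{verbatim}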

\slidesonly{
\begin{frame}
\frametitle{Summary so far:}
\begin{enumerate}
\item \textcolor{gray}{
Initial ICA Problem: $\vec x = \vec A\, \vec s$
}
\item \textcolor{gray}{
New ICA Problem: $\vec u = \widetilde{\vec A}\, \vec s$,\\
where $\vec u = \vec D^{-\frac{1}{2}} \vec U^\top \vec x$ and $\vec \Sigma_u = \vec I_N$.
}
\item \textcolor{gray}{
$\vec u$ is the \emph{whitened} version of $\vec x$.
}
\item \textcolor{gray}{
$\vec D$ and $\vec U$ can be obtained via PCA on $\vec x$.
}
\item \textcolor{gray}{
Applying ICA on whitened data reduced the number of free parameters.
}
\item \textcolor{gray}{
PCA simplifies the ICA problem.
}
\item Ambiguities in ICA
\item Why are Gaussians bad for ICA?
\item ICA by maximizing nongaussianity
\item Kurtosis-based ICA

\end{enumerate}
%\slidesonly{
%\begin{frame}
%\frametitle{Summary so far:}
%\begin{enumerate}
%\item \textcolor{gray}{
%Initial ICA Problem: $\vec x = \vec A\, \vec s$
%}
%\item \textcolor{gray}{
%New ICA Problem: $\vec u = \widetilde{\vec A}\, \vec s$,\\
%where $\vec u = \vec D^{-\frac{1}{2}} \vec U^\top \vec x$ and $\vec \Sigma_u = \vec I_N$.
%}
%\item \textcolor{gray}{
%$\vec u$ is the \emph{whitened} version of $\vec x$.
%}
%\item \textcolor{gray}{
%$\vec D$ and $\vec U$ can be obtained via PCA on $\vec x$.
%}
%\item \textcolor{gray}{
%Applying ICA on whitened data reduced the number of free parameters.
%}
%\item \textcolor{gray}{
%PCA simplifies the ICA problem.
%}
%\item Ambiguities in ICA
%\item Why are Gaussians bad for ICA?
%\item ICA by maximizing nongaussianity
%\item Kurtosis-based ICA

%\end{enumerate}

\textbf{Next: Can we do better than kurtosis-based ICA?}


\end{frame}
}
%\end{frame}
%}
\notesonly{
Next, we will look for an alternative that mitigates the sensitivity to outliers which kurtosis-based ICA is prone to.
}

\begin{frame}

\slidesonly{

\textbf{Next: Can we do better than kurtosis-based ICA?}

\vspace{5mm}

\pause
}

Kurtosis is easy to compute but can be \emph{sensitive to outliers}.
This is a common problem with higher-order statistics.
\begin{block}{Example}
@@ -530,8 +551,17 @@ \subsection{Kurtosis-based ICA: the gradient algorithm}
\itl contribution to kurtosis: $ \geq 10^4/1000 -3 = 7$
\end{itemize}
\end{block}
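A quick numeric check of this example (a sketch; the standardized sample-kurtosis estimator below is the standard one, not code from the notes):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)     # 1000 Gaussian samples
kurt = lambda v: np.mean(v**4) / np.mean(v**2)**2 - 3
print(kurt(x))                    # approx. 0 for Gaussian data
x[0] = 10.0                       # a single sample at value 10
print(kurt(x))                    # jumps to roughly 7
\end{verbatim}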
\end{frame}

We therefore turn to an alternate measure for nongaussianity, namely \emph{negentropy} for brevity (not the same as negative entropy $-H(\cdot)$). Negentropy of the reconstructed source $\widehat{\vec s}$ measures the difference between the differential entropy of $\widehat{\vec s}$ and the differential entropy of a Gaussian distribution with the same variance as $\widehat{\vec s}$.
\pause

\slidesonly{
$\Rightarrow\;\;$ a more robust measure for nongaussianity\\
}
\notesonly{We therefore turn to an alternative measure for nongaussianity, namely }\emph{negentropy} \notesonly{for brevity }(not the same as negative entropy $-H(\cdot)$).\\

%\svspace{5mm}
\notesonly{
Negentropy of the reconstructed source $\widehat{\vec s}$ measures the difference between the differential entropy of $\widehat{\vec s}$ and the differential entropy of a Gaussian distribution with the same variance as $\widehat{\vec s}$.
}
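\notesonly{
In symbols (restating the prose above):
\begin{equation*}
J(\widehat{s}) = H(\widehat{s}_{\mathrm{gauss}}) - H(\widehat{s}),
\end{equation*}
where $\widehat{s}_{\mathrm{gauss}}$ is a Gaussian variable with the same variance as $\widehat{s}$ and $H(\cdot)$ denotes differential entropy.
}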

\end{frame}
32 changes: 23 additions & 9 deletions notes/06_fastica/5_fastica.tex
@@ -1,3 +1,15 @@
\subsection{Negentropy}

\mode<presentation>{
\begin{frame}
\begin{center} \huge
\subsecname
\end{center}
\begin{center}
A more robust alternative to Kurtosis-based ICA
\end{center}
\end{frame}
}

Negentropy $J(\widehat{s})$ of the reconstructed sources $\widehat{\vec s}$ is defined as:

@@ -39,7 +51,7 @@
}

\begin{itemize}
\itR theoretically well motivated measure. Considered in some cases the optimzal estimator for nongaussianity.
\itR theoretically well-motivated measure; in some cases considered the optimal estimator of nongaussianity.
\itR non-negative
\itR scale-invariant: $J(\alpha \widehat{s}) = J(\widehat{s}), \ \ \forall \alpha \ne 0$ (cf. exercise sheet)
\itR \textbf{Problem:} requires estimation of density $p(\widehat{s})$
@@ -49,9 +61,9 @@

\end{frame}

\subsection{Approximations of negentropy}
\subsubsection{Approximations of negentropy}

\begin{frame}
\begin{frame}{\subsubsecname}

\notesonly{
Estimating negentropy using the definition in \eqref{eq:negentropy} is computationally costly. It would require estimating the density of the random variable. We therefore resort to simpler approximations for negentropy, such as the following use of cumulants:
@@ -63,7 +75,7 @@ \subsection{Approximations of negentropy}
\end{equation}

\notesonly{
For symmetric distributions the first term in the approximation in \eqref{eq:negentropyapprox} is effectivley zero, which makes the approximation equivalent to the square of the kurtosis. The approximation would therefore from the same sensitvity to outliers.
For symmetric distributions the first term in the approximation in \eqref{eq:negentropyapprox} is effectively zero, which makes the approximation equivalent to the square of the kurtosis. The approximation would therefore suffer from the same sensitivity to outliers.
}
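\notesonly{
For reference, the standard cumulant-based approximation (presumably the form of \eqref{eq:negentropyapprox}, which the diff does not show) is
\begin{equation*}
J(\widehat{s}) \approx \frac{1}{12} \langle \widehat{s}^3 \rangle^2 + \frac{1}{48} \kurt(\widehat{s})^2,
\end{equation*}
whose first term vanishes for symmetric densities, leaving the squared kurtosis.
}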

\slidesonly{
@@ -79,10 +91,12 @@ \subsection{Approximations of negentropy}
\end{frame}

\clearpage
\begin{frame}{Common contrast functions}

\subsubsection{Contrast functions}

\begin{frame}{\subsubsecname}

\notesonly{
\textbf{Common contrast functions}

The contrast function can be chosen depending on the assumed shape of the source densities.
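
Commonly used choices, as in Hyv\"arinen and Oja's FastICA (listed here for reference, since the table itself is not shown in this hunk), include
\begin{equation*}
G_1(u) = \frac{1}{a_1} \log \cosh (a_1 u), \qquad
G_2(u) = -\exp (-u^2/2), \qquad
G_3(u) = \frac{1}{4} u^4,
\end{equation*}
with $1 \le a_1 \le 2$; $G_1$ is a good general-purpose choice, $G_2$ suits highly super-Gaussian sources, and $G_3$ recovers the kurtosis-based criterion.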

@@ -112,7 +126,7 @@ \subsection{Approximations of negentropy}

\begin{frame}
\slidesonly{
\frametitle{Common contrast functions:}
\frametitle{Common contrast functions}
}
\slidesonly{
\smaller
@@ -150,12 +164,12 @@ \subsection{Approximations of negentropy}
\end{frame}

\begin{frame}
cf. lecture slides for optmization of negentropy using contrast functions.
cf. lecture slides for optimization of negentropy using contrast functions.
\end{frame}

\begin{frame}
\question{How do we evaluate ICA?}\\

-cf. https://research.ics.aalto.fi/ica/icasso/
- Visualization methods\footnote{If interested cf. https://research.ics.aalto.fi/ica/icasso/}
\end{frame}
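One way to probe this in practice is to rerun ICA with different initializations and check that the recovered components agree up to sign and permutation, which is the idea behind Icasso-style stability analysis. A sketch using scikit-learn's FastICA (not the Icasso toolbox itself; the data and mixing matrix are made up):

\begin{verbatim}
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# two non-Gaussian sources, 2000 samples each
s = np.c_[np.sign(rng.standard_normal(2000)),
          rng.uniform(-1.0, 1.0, 2000)]
x = s @ np.array([[1.0, 0.5],
                  [0.3, 1.0]]).T            # mixed observations

# rerun FastICA with three different seeds
runs = [FastICA(n_components=2, random_state=k).fit_transform(x)
        for k in range(3)]
# stable estimates correlate across runs (up to sign/permutation)
c = np.corrcoef(runs[0].T, runs[1].T)[:2, 2:]
print(np.round(np.abs(c), 2))
\end{verbatim}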
