Skip to content

Commit

Permalink
Merge branch 'master' of github.com:kashefy/mi2notes
Browse files Browse the repository at this point in the history
  • Loading branch information
kashefy committed Mar 3, 2020
2 parents a2d14d3 + 5934d0e commit fe6594b
Show file tree
Hide file tree
Showing 54 changed files with 17,700 additions and 0 deletions.
310 changes: 310 additions & 0 deletions notes/04_density-transform/1_density-transform.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,310 @@

\section{The ICA problem:}

Let $\vec s = (s_1, s_2,...,s_N)^\top$ denote the concatenation of independent sources
and $\vec x \in \R^N$ describe our observations. $\vec x$ relates to $\vec s$ through a
\emph{linear transformation} $\vec A$:

\begin{equation}
\label{eq:ica}
\vec x = \vec A \, \vec s.
\end{equation}

We refer to $\vec A$ as the \emph{mixing matrix} and Eq.\ref{eq:ica} as the \emph{ICA problem},
which is recovering $\vec s$ from only observing $\vec x$.

\underline{Example scenario:}

Two speakers are placed in a room and emit signals $s_1$ and $s_2$.
The speakers operate indepdendent of one another.
Two microphones are placed in the room and start recording.
The first microphone is placed slightly closer to speaker 2, while
the second microphone is placed slightly closer to speaker 1.
$x_1$ and $x_2$ denote the recordings of the first and second microphone respectively.
When we listen to the recordings we expect to hear a mix of $s_1$ and $s_2$.
Since microphone 1 was placed closer to speaker 2, when we only listen to $x_1$ we hear more of $s_2$ than $s_1$.
The opposite can be said when we listen only to $x_2$.

Acoustic systems are linear. This means that $x_1$ is a superposition of \emph{both} sources $s_1$ and $s_2$.
We will assume here that the contribution of a source $s_i$
to an observation $x_j$ is inversely proportional to the distance between the source and the microphone.
The distance-contribution relationship is \emph{linear}. We don't need this to be any more realistic.

If we had a measurement of the distance between each microphone and each speaker,
we would tell exactly what the contribution of each of $s_1$ and $s_2$ is to each recorded observation.
If we know the exact contribution of a source to an observation, we can look at both observations and recover each source in full.

This is what ICA tries to solve, except that it does not have any knowledge about the spatial setting. It is blind.

\underline{Outline:}

Before we tackle ICA itself, we first look at the more basic principle of \emph{density transformation} and
the \emph{convservation of probability}.\\
We start with more specific cases of applying density transformations,
namely \emph{pseudo random number generators} and what the inverse of \emph{cumulative distribution functions (cdf)} can be used for.\\
Finally we discuss how to generalize this in order to transform one probability density function (pdf) into another.

\subsection{PRNG:}

How can we sample from the uniform distribution $\in \lbrack0, 1)$?

\begin{itemize}
\item Create a sequence, preferrably with a wrong period.
\item minimal pattern and sub-subsequences
\item determinism has an advantage:
\begin{enumerate}
\item \emph{reproducible} sequneces
\item efficiency; the starting element or ``seed'' and length of the sequence is suffcient is representative of the entire sequnece.
\end{enumerate}
\end{itemize}

\underline{Linear congruential generator (LCG):}

Start with a seed $y_0 \in \overbrace{\left\{0,\ldots,m-1\right\}}^{=:\;\mathcal{M}}$ with $m \in \N$ ($m$ controls the granularity).
The next sample $y_t$ is computed as:

\begin{equation}
y_t = \left( \, a \; y_{t-1} \; + \; b \, \right) \, \text{mod} \; m,
\end{equation}
where\\[-0.7cm]
\begin{align*}
a \in \mathcal{M}&\; \text{is the multiplier,} \\
b \in \mathcal{M}&\; \text{is the increment.}
\end{align*}

Then $u_i = \frac{y_i}{m} \approx \,\mathcal{U} \in \lbrack0, 1)$.

Although finicky and requiring careful parameterization, LCG gives us something for drawing from a uniform distribution.
Next, we look at how to draw samples of a random variable $X$ with a desired pdf $p_X(x)$.
using uniformly sampled values $\tilde z \in [0,1]$.

\subsection{Inverse CDF:}

If $F_{X}(x)$ is the cumulative
distribution function (cdf) of a random variable $X$, then the
random variable $Z = F_{X}(X)$ is uniformly distributed on the
interval $[0,1]$. This result provides
% (after inverting the relationship)
a general recipe to generate
samples $\tilde x$ of a random variable $X$ with a desired pdf $p_X(x)$
from uniformly distributed random numbers $\tilde z \in [0,1]$:
\begin{enumerate}
\item Compute the cdf $F_X(x)$ of the desired pdf $p_X(x)$

\begin{equation}
F_X(x) = P(X \leq x) = \int_{-\infty}^{x} p(y)\,dy
\end{equation}

The cdf is a one-to-one mapping of the domain of the cdf to the interval $[0,1]$.
If $Z$ is a uniform random variable, then $X=F_X^{-1}(Z)$ has the distribution $F$.

\item Determine the inverse transformation $F^{-1}$.

\item Sample uniformly distributed numbers (in $[0,1]$), $\tilde z$.
\item Get the samples $\tilde x=F^{-1}(\tilde z)$ from $X$.
\end{enumerate}

At this point we can use a PRNG to sample from the uniform distribution and
by plugging those samples into the inverse cdf we can obtain samples from a desired pdf.

\newpage

\section{Density Transformation:}

Let $X_1$ and $X_2$ be jointly continuous random variables with
density function $f_{X_1, X_2}$:

$$
f_{X_1, X_2}(x_1, x_2) = f(\vec x) \qquad \vec x \in \Omega \subset \R^2
$$

and let $\vec u = \vec u(\vec x) = ( u_1(\vec x), u_2(\vec x)) = ( u_1(x_1, x_2), u_2(x_1, x_2)) $ be a one-to-one mapping/transformation.

%\begin{figure}
%\centering
\includegraphics[width=0.7\textwidth]{img/u.pdf}
%\caption*{$\vec{\psi} \in \mathcal{F} = \overline{\operatorname{span} \vec{\phi}(\mathbb{R}^N)}$}
%\end{figure}

The area of the small rectangle in $\Omega$ is $A = dx_1\, dx_2$.\\

The goal is to show that in order for probability to be conserved, the areas on both spaces have to be equal.
We will therefore demonstrate that:
$$
\int_{\Omega} f(\vec{x}) \mathbf{d}\vec{x}
=\int_{u(\Omega)} f({\vec x(\vec u)}) \frac{1}{\left|\det \frac{\partial \vec{u}(\vec{x})}{\partial \vec{x}} \right|} \mathbf{d}\vec{u}.
$$

\begin{itemize}
\item Because $dx_1$ and $dx_2$ are \emph{infinitesimally small} we can consider the mapping
$\vec u$ to act as a linear transformation, resulting in a different shape in $u(\Omega)$.
\item The shape of the shaded area in $u(\Omega)$ is \emph{approximately} a parallelogram.
\item We will compute the ratio of the areas between the two transforms.
\item If we want to go back from $u(\Omega)$ to $\Omega$ we only need $\frac{1}{\text{ratio}}$.
\end{itemize}

\underline{Important:}
Approximating the above as a linear transformation (i.e. a small rectangle turns into a parallelogram) only holds because for very small $d\vec x$.

To get the area of the parallelogram in $u(\Omega)$ we need to find out what $\vec u(dx_1, dx_2))$ is.
The area of the parallelogram then becomes the magnitude of the cross product between the components of $\vec u(\vec x)$.

\newpage

We first only consider the vector due to $dx_1$ in blue:

\includegraphics[width=0.75\textwidth]{img/x1.pdf}

The difference vector between the two correspsonding points in $u(\Omega)$ becomes:

\begin{equation*}
\begin{array}{r}
\rmat{
u_1 (x_1 + dx_1, x_2)\\
u_2 (x_1 + dx_2, x_2)
} -
\rmat{
u_1 (x_1, x_2)\\
u_2 (x_1, x_2)
} \\[0.7cm]
=
\rmat{
u_1 (x_1 + dx_1, x_2) - u_1 (x_1, x_2)\\
u_2 (x_1 + dx_2, x_2) - u_2 (x_1, x_2)
}
\end{array}
\end{equation*}

Because $dx_1$ is so small, we can approximate the transformed vector by the derivative,
which is essentially taking the limit. The difference vector in $u(\Omega)$ becomes:

$$
u: \vec e_1 \mapsto
\rmat{
\frac{\partial u_1}{\partial x_1} \Delta x_1\\[0.2cm]
\frac{\partial u_2}{\partial x_1} \Delta x_1
}
$$

\question{What about $dx_2$?}

- We use the same procedure (one the red vector $\vec e_2$) and get:

$$
u: \vec e_2 \mapsto
\rmat{
\frac{\partial u_1}{\partial x_2} \Delta x_2\\[0.2cm]
\frac{\partial u_2}{\partial x_2} \Delta x_2
}
$$

The area of the parallelogram spanned by the two difference vectors in $u(\Omega)$
is the magnitude of their cross product:
\begin{align*}
&\left| \;
\rmat{
\frac{\partial u_1}{\partial x_1} \Delta x_1\\[0.2cm]
\frac{\partial u_2}{\partial x_1} \Delta x_1
}
\times
\rmat{
\frac{\partial u_1}{\partial x_2} \Delta x_2\\[0.2cm]
\frac{\partial u_2}{\partial x_2} \Delta x_2
}
\; \right|\\
&=
\left| \;
\underbrace{
\frac{\partial u_1}{\partial x_1} \frac{\partial u_2}{\partial x_2}
\; - \;
\frac{\partial u_1}{\partial x_2} \frac{\partial u_2}{\partial x_1}
}_{\text{the Jacobian determinant}}
\; \right| \Delta x_1 \Delta x_2 \\
&=
\left| \; \det \quad
\underbrace{
\left(
\frac{\partial (u_1, u_2)}{\partial (x_1,x_2)}
\right)
}_{\text{the Jacobian}}
\; \right| \Delta x_1 \Delta x_2
\end{align*}

This matrix of partial derivatives is called the \emph{Jacobian}:
$$
\frac{\partial (u_1, u_2)}{\partial (x_1,x_2)} =
\frac{\partial (u_1(\vec x), u_2(\vec x))}{\partial (x_1,x_2)} =
\frac{\partial (\vec u(\vec x))}{\partial (\vec x)} =
\underbrace{
\rmat{
{\partial u_1}/{\partial x_1} & {\partial u_1}/{\partial x_2}\\[0.2cm]
{\partial u_2}/{\partial x_1} & {\partial u_2}/{\partial x_2}
}%rmat
}_{
\substack{
\text{matrix of}\\
\text{partial derivatives}
}%substack
}
$$

If we are given an area spanned by too small vectors (e.g. $\vec e_1$, $\vec e_2$)
we can compute the area of the corresponding parallelogram in $u(\Omega)$
by multiplying the original area by the Jacobian determinant.

\question{What if we want to transform from $u(\Omega)$ back to $\Omega$?}

\includegraphics[width=0.7\textwidth]{img/reverse.pdf}

We apply the inverse mapping. A very small rectangle in $u(\Omega)$ transforms into a parallelogram in $\Omega$.
This reverse transformation would require the inverse of the above matrix of partial derivatives.
The matrix of partial derivatives tells us how to transform infinitesimally small vectors back and forth.

%Transformation between the spaces for infinitesimally small vectors:

%$$
%\rmat{
%u_1(x_1) & u_1(x_2)\\
%u_2(x_1) & u_2(x_2)
%}
%\rmat{
%a\\
%b
%}
%=
%\rmat{
%u_1(x_1)\\
%u_2(x_1)
%}
%a
%+
%\rmat{
%u_1(x_2)\\
%u_2(x_2)
%}
%b
%$$

\question{Do we really have to compute the inverse of the matrix?}

- No, the determinant of the inverse matrix is 1 / det of the original matrix:
$$
\frac{1}{\left| \det \left( \frac{\partial \vec u}{\partial \vec x} \right) \right|} =
{\left| \det \left( \frac{\partial \vec x}{\partial \vec u} \right) \right|}
$$

\underline{Transformation between probability densities:}

Conservation of probability: The area represents the probability of the event,
transforming it into another space should nt cause any increase or decrease in the probability of the event.

Therefore, if we multiply the area of the rectangle in $\Omega$ by the pdf $f(\vec x)$ at $\vec x$, we get the probability of the parallelogram in $u(\Omega)$:

$$
\int_{\Omega} f(\vec{x}) \mathbf{d}\vec{x}
=\int_{u(\Omega)} f({\vec x(\vec u)}) \left|\det \frac{\partial \vec{x}(\vec{u})}{\partial \vec{u}} \right| \mathbf{d}\vec{u}
$$
$$
=\int_{u(\Omega)} f({\vec x(\vec u)}) \frac{1}{\left|\det \frac{\partial \vec{u}(\vec{x})}{\partial \vec{x}} \right|} \mathbf{d}\vec{u}.
$$

40 changes: 40 additions & 0 deletions notes/04_density-transform/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
all: slides notes clean
#all: handout

projname = tutorial
targetname = $(projname)_$(shell basename $(CURDIR))
compile = pdflatex
projnameS = $(projname).slides
projnameH = $(projname).handout
projnameA = $(projname).notes

slides: $(projname).slides.tex $(projname).tex
$(compile) $(projname).slides.tex
# bibtex $(projname).slides
# $(compile) --interaction=batchmode $(projname).slides.tex
# $(compile) --interaction=batchmode $(projname).slides.tex
mv $(projname).slides.pdf $(targetname).slides.pdf

handout: $(projname).handout.tex $(projname).tex
$(compile) $(projname).handout.tex
mv $(projname).handout.pdf $(targetname).handout.pdf

# Repeat compilation for the references to show up correctly
notes: $(projname).notes.tex $(projname).tex
$(compile) $(projname).notes.tex
# bibtex $(projname).notes
# $(compile) --interaction=batchmode $(projname).notes.tex
$(compile) --interaction=batchmode $(projname).notes.tex
mv $(projname).notes.pdf $(targetname).notes.pdf

clean: cleans cleanh cleana

cleans:
rm -f $(projnameS).aux $(projnameS).bbl $(projnameS).log $(projnameS).out $(projnameS).toc $(projnameS).lof $(projnameS).glo $(projnameS).glsdefs $(projnameS).idx $(projnameS).ilg $(projnameS).ind $(projnameS).loa $(projnameS).lot $(projnameS).loe $(projnameS).snm $(projnameS).nav

cleanh:
rm -f $(projnameH).aux $(projnameH).bbl $(projnameH).log $(projnameH).out $(projnameH).toc $(projnameH).lof $(projnameH).glo $(projnameH).glsdefs $(projnameH).idx $(projnameH).ilg $(projnameH).ind $(projnameH).loa $(projnameH).lot $(projnameH).loe $(projnameH).snm $(projnameH).nav

cleana:
rm -f $(projnameA).aux $(projnameA).bbl $(projnameA).log $(projnameA).out $(projnameA).toc $(projnameA).lof $(projnameA).glo $(projnameA).glsdefs $(projnameA).idx $(projnameA).ilg $(projnameA).ind $(projnameA).loa $(projnameA).lot $(projnameA).loe $(projnameA).snm $(projnameA).nav

Loading

0 comments on commit fe6594b

Please sign in to comment.