diff --git a/tex/review.tex b/tex/review.tex
index 4204c55..2d0a931 100644
--- a/tex/review.tex
+++ b/tex/review.tex
@@ -5,12 +5,13 @@
 % to compile a camera-ready version, add the [final] option, e.g.:
 \usepackage[final]{nips_2017}
 
-\usepackage[utf8]{inputenc}
-\usepackage[T1]{fontenc}
-\usepackage[pagebackref=true]{hyperref}
-\usepackage{url}
-\usepackage{nicefrac}
-\usepackage{microtype}
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage[pagebackref=true]{hyperref}
+\usepackage{url}
+\usepackage{nicefrac}
+\usepackage{microtype}
+\usepackage[nolist,nohyperlinks]{acronym}
 
 %% Import general packages
 \usepackage{
@@ -43,7 +44,80 @@
 \hypertarget{paper-summary}{%
 \section{Paper Summary}\label{paper-summary}}
 
-\cite{gatys2016image}
+In their arXiv preprint, \textit{A Neural Algorithm of Artistic Style},
+\cite{gatys2016image} demonstrate a level of separability between the
+\textit{content} of an image and the \textit{style} that characterizes it.
+Using a \ac{CNN} trained to classify images on the ImageNet benchmark, they
+transfer the style of famous works of art onto the content of arbitrary
+photographs. To do so, they define loss functions between the activation maps
+of various layers in the network that measure content loss and style loss.
+Minimizing the joint loss between a white noise image $\mathbf{x}$, a content
+image $\mathbf{p}$, and a style image $\mathbf{a}$ transfers the global
+features of $\mathbf{p}$ and the local style of $\mathbf{a}$ onto the white
+noise image $\mathbf{x}$. Put simply, their algorithm paints photographs
+using arbitrary works of art as a palette.
+
+\subsection{Representation}
+
+\subsubsection{Content}
+
+This work relies heavily on an understanding of convolutional layers. As a
+collection of image filters, each layer extracts unique features from its
+input image. As such, \cite{gatys2016image} postulate that as the layer depth
+increases, the network cares more about the \textit{content} of the image.
+That is to say, deeper layers capture what composes an image, whereas the
+shallower layers primarily represent the image as raw pixel values. This
+leads to their definition of \textit{content representation} as the
+activations from deep layers in the network.
+
+To objectively measure the difference in content between two images,
+\cite{gatys2016image} define a loss function $\mathcal{L}_{content}$. Given a
+content image $\mathbf{p}$, a white noise image $\mathbf{x}$, and an
+arbitrary layer $l$ with $N_l$ filters whose activation maps each contain
+$M_l$ entries (height $\times$ width), the activations at $l$ for
+$\mathbf{p}$ and $\mathbf{x}$ are defined as the $N_l \times M_l$ matrices
+$P^l$ and $F^l$ respectively. The $\mathcal{L}_{content}$ loss is then the
+squared Euclidean distance between $P^l$ and $F^l$:
+
+\begin{equation}
+\mathcal{L}_{content}(\mathbf{p}, \mathbf{x}, l) =
+\frac{1}{2} \sum_{i=1}^{N_l}\sum_{j=1}^{M_l}{(F^l_{ij} - P^l_{ij})^2}
+\end{equation}
+
+\subsubsection{Style}
+
+Much like the content representation, the style representation relies on the
+feature responses of particular layers in the \ac{CNN}. However, this
+representation uses a different feature space. Converting each activation
+map to a \textit{Gram matrix} allows the extraction of just the
+\textit{texture} of a given image. It does so by computing the correlations
+between the different filters of an arbitrary convolutional layer $l$. More
+simply, the Gram matrix $G^l$ for an activation map $F^l$ contains the inner
+products between pairs of feature maps:
+
+\begin{equation}
+G_{i j}^l = \sum_{k=1}^{M_l} F_{i k}^l F_{j k}^l
+\end{equation}
+
+With this new feature space representing raw texture, \cite{gatys2016image}
+define an additional loss function $\mathcal{L}_{style}$ between an artwork
+image $\mathbf{a}$ and a white noise image $\mathbf{x}$. First, for each
+convolutional layer $l$, the activations for $\mathbf{a}$ and $\mathbf{x}$
+are transformed into their respective Gram matrices $A^l$ and $G^l$. Then,
+much like the content loss, the per-layer contribution $E_l$ is the
+normalized squared Euclidean distance between $A^l$ and $G^l$, and
+$\mathcal{L}_{style}$ is a weighted sum of these contributions over the
+layers:
+
+\begin{equation}
+E_l =
+\frac{1}{4 N_l^2 M_l^2}
+\sum_{i=1}^{N_l}\sum_{j=1}^{M_l}
+(G^l_{ij} - A^l_{ij})^2
+\end{equation}
+
+\begin{equation}
+\mathcal{L}_{style}(\mathbf{a}, \mathbf{x}) = \sum_{l=0}^L w_l E_l
+\end{equation}
+
+where $w_l$ is a weighting factor controlling layer $l$'s contribution to
+the total style loss.
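+
+To make these definitions concrete, the following is a minimal NumPy sketch
+(our own illustration, not code from \cite{gatys2016image}; the helper names
+and the assumption that activations arrive reshaped as $N_l \times M_l$
+matrices are ours):
+
+\begin{verbatim}
+import numpy as np
+
+def gram_matrix(F):
+    """Gram matrix of an activation map F with shape (N_l, M_l)."""
+    return F @ F.T  # G[i, j] = sum_k F[i, k] * F[j, k]
+
+def content_loss(F, P):
+    """Squared Euclidean distance between activations F and P."""
+    return 0.5 * np.sum((F - P) ** 2)
+
+def style_layer_loss(F, A_gram):
+    """Per-layer style term E_l, given activations F for the image
+    being synthesized and the artwork's precomputed Gram matrix."""
+    N, M = F.shape
+    G = gram_matrix(F)
+    return np.sum((G - A_gram) ** 2) / (4.0 * N**2 * M**2)
+\end{verbatim}
+
+Precomputing the artwork's Gram matrix once, as assumed here, mirrors the
+fact that $A^l$ stays fixed while only $\mathbf{x}$ is optimized.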
+
 \hypertarget{strengths}{%
 \section{Strengths}\label{strengths}}
@@ -60,4 +134,9 @@ \section{Questions \& Answers}\label{qa}}
 \bibliographystyle{my-unsrtnat}
 \bibliography{references}
 
+% a collection of Acronyms
+\begin{acronym}
+\acro{CNN}{Convolutional Neural Network}
+\end{acronym}
+
 \end{document}