Commit 906c99d

fixed spacing and page-breaks

FreakyBytes committed Mar 21, 2018
1 parent 59a6bf0 commit 906c99d
Showing 16 changed files with 47 additions and 36 deletions.
9 changes: 6 additions & 3 deletions tex/chapter/200-background-network.tex
@@ -82,7 +82,7 @@ \subsubsection{State-based Intrusion Detection Systems}
Unfortunately, the high complexity of tracking the state of every connection while simultaneously verifying it against a profile is a major drawback of this approach, as it requires substantial processing resources and memory capacity.
Further, this method is only able to detect violations of a protocol's fundamental behaviour.
Additionally, state-based analysis works best with stateful, connection-oriented protocols; it does not cover protocols relying on short, broadcast, self-contained commands such as those found in \gls{bas}. \parencite[p.~306]{Whitman2009}

\subsubsection{Anomaly-based Intrusion Detection Systems (A-IDS)}
\label{sec:background:network:ids:anomaly}

@@ -156,7 +156,8 @@ \subsection{Flow Monitoring}
Common protocols used for flow monitoring are \gls{netflow} \parencite{Claise2004} and \gls{ipfix} \parencite{Claise2013}.
%\todo{not predominantly used for security, but network congestion avoidance/mitigate.}

\begin{figure}
\newpage
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{figures/200-netflow-architecture.pdf}
\caption[Simplified flow monitoring system architecture]{Simplified flow monitoring system architecture. \parencite[cf.][]{Hofstede2014} \todo{adjust font size}}
@@ -245,7 +246,7 @@ \section{Anomaly, Outlier, and Novelty Detection}
It also requires pre-labelled data, but compared to the second type it works with data points labelled as \emph{normal} only. Consequently, the full gamut of \emph{normality} is required to train a precise model. On the other hand, no \emph{outlying} data points are required for training, which is highly beneficial in data-sparse scenarios where abnormal data is difficult to obtain.
However, the training data must not contain outliers, otherwise they will be assumed to be part of \emph{normality}.
Generally, this approach aims to establish a tight boundary around \emph{normality} and is therefore suitable for static as well as dynamic data.
If a new data point lies outside of this boundary it is considered an \emph{outlier}; otherwise it is part of the \emph{normal} data.
If a new data point lies outside of this boundary it is considered an \emph{outlier}; otherwise it is part of the \emph{normal} data. \parencite{Hodge2004}
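
A minimal sketch of this one-class training scheme, using scikit-learn's \code{OneClassSVM} purely as an illustration (the library choice and the parameters are assumptions for this example, not part of the described approach): the model is fitted on \emph{normal} data only and flags points outside the learnt boundary.

    # Sketch: boundary-based one-class model trained on normal-only data.
    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_normal = rng.normal(0, 1, size=(500, 2))  # training data: normality only

    clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)  # nu caps the outlier fraction
    clf.fit(X_normal)

    X_new = np.array([[0.0, 0.5], [5.0, 5.0]])
    print(clf.predict(X_new))  # typically [ 1 -1]: points outside the boundary yield -1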

% ------------------------------------------------------------------------------
\newpage
@@ -310,6 +311,7 @@ \subsubsection{Local Outlier Factor}
\newpage
\begin{wrapfigure}{l}{0.5\textwidth}
\centering
\vspace{-5pt}
\includegraphics[width=0.5\textwidth,trim={12mm 5mm 15mm 10mm},keepaspectratio,clip]{figures/200-background-lof.pdf}
\caption[Example visualisation of the LOF]{Visualisation of the \gls{lof} in a two-dimensional vector space. \emph{Green}: training data, \emph{Red}: outlier from test data, \emph{White}: inlier from test data. Background indicates the calculated \emph{LOF} value.}
\label{fig:background:network:novelty:lof}
@@ -320,6 +322,7 @@ \subsubsection{Local Outlier Factor}
An example of the \gls{lof} calculation in a two-dimensional vector space containing random test data is illustrated in Figure~\ref{fig:background:network:novelty:lof}.
The background in this figure indicates the \gls{lof} value for every possible point, whereby a blue background indicates values outside of the threshold for \emph{normality} and a red one marks values within those borders.

\vspace{8pt}
Moreover, the ability of \gls{lof} to work on unclean data and to account for different densities in clusters through locality makes it a good fit for \gls{ids} solutions, as \textcite{Lazarevic2003} show.
Also \textcite{Zanero2004} conducted an experiment on using unsupervised learning algorithms for intrusion detection. They found that proximity-based approaches like \gls{knn} need to address the problem of locality for some observations in order to achieve more precise predictions. The \gls{lof}, as presented by \textcite{Breunig2000}, focusses on this aspect.
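
To make the application concrete, the following sketch scores unseen points with a trained \gls{lof} model; scikit-learn's \code{LocalOutlierFactor} and its parameters are illustrative assumptions, not the setup used in the cited works.

    # Sketch: LOF in novelty mode, fitted on (mostly clean) training data.
    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(42)
    X_train = rng.normal(0, 1, size=(200, 2))     # "normal" feature vectors
    X_test = np.array([[0.1, -0.2], [4.0, 4.0]])  # one inlier, one outlier

    lof = LocalOutlierFactor(n_neighbors=20, novelty=True)  # novelty=True allows predict()
    lof.fit(X_train)
    print(lof.predict(X_test))  # [ 1 -1]: +1 = inlier, -1 = outlier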

9 changes: 6 additions & 3 deletions tex/chapter/300-prior-work.tex
@@ -41,11 +41,13 @@
The part of their research regarding flow analysis focuses on \gls{bac} over \gls{ip}, employs a simple volumetric approach (measuring the throughput), and calculates the entropy (cf. Section~\ref{sec:background:network:novelty:entropy}) of the flow data.
As a result, \textcite{Celeda2012} were able to detect and identify several attacks and a botnet in \gls{bac} installations, based on flow data.

\begin{wrapfigure}{r}{0.5\textwidth}
\begin{wrapfigure}{r}{0.6\textwidth}
\vspace{-18pt}
\centering
\includegraphics[width=0.45\textwidth,keepaspectratio]{figures/300-Pan2014-architecture.png}
\includegraphics[width=0.6\textwidth,keepaspectratio]{figures/300-Pan2014-architecture.png}
\caption[Anomaly detection framework architecture by Pan, Hairi, and Al-Nashif]{Anomaly detection framework architecture by \textcite{Pan2014}}
\label{fig:background:prior-work:pan-architecture}
\vspace{-20pt}
\end{wrapfigure}

Further investigations in \gls{bac} security are published by \textcite{Pan2014}. They present \enquote{a framework for a rule based anomaly detection} system in \gls{bas}, using \gls{bac} as an example.
@@ -57,7 +59,8 @@
The algorithm, first proposed by \textcite{Cohen1995}, is used in a two-class version on pre-labelled data and was applied by \textcite{Pan2014} to more than 7000 data points, resulting in a baseline model with 20 rules.
These rules are consequently used during the detection phase and applied to every flow-frame within a time window to improve the detection rate.
In case the rule framework detects an attack, the malicious packet flow is handed over to the attack classification module, which uses a decision table to classify the attack based on three attributes: the targeted protocol layer, the attack motivation, and the victim device.
The consideration of the targeted protocol layer accounts for different kinds of vulnerabilities within the protocol stack. \textcite{Pan2014} specifically focus on \gls{bac}'s \gls{apdu} and \gls{npdu}, which corresponds to the application and network layer.
The consideration of the targeted protocol layer accounts for different kinds of vulnerabilities within the protocol stack. \textcite{Pan2014} specifically focus on \gls{bac}'s \gls{apdu} and \gls{npdu}, which correspond to the application and network layer respectively.

Further, the attack classifier accounts for different attack motivations. This includes \emph{reconnaissance attacks}, which aim to collect information about the network and its traffic, \emph{device access attacks}, representing attempts to access devices without permission, and finally \emph{\gls{dos} attacks}, where the network and devices are saturated with useless commands to disturb normal operation.
The last attribute for attack classification is the targeted device, which uses domain knowledge about the network to assign roles to devices.
Finally, the classifications from the baseline model and the attack classifier are passed to the action handler module, which is designed to automatically trigger suitable mitigating measures. This includes extracting useful information, dropping packets or suspending connections based on the severity level of the attack, and producing an understandable alert message comprising the previously gathered information.
6 changes: 3 additions & 3 deletions tex/chapter/400-methods.tex
@@ -64,7 +64,7 @@
Finally, the last category describes \emph{reconnaissance attacks}. These attacks comprise the unauthorised detection and mapping of the network and its behaviour. Here only active sweeping approaches are considered, where an attacker probes each individual device in an address range.
Passive eavesdropping is not considered as it cannot be detected on higher protocol levels due to the bus character of the network. (cf. Section~\ref{sec:background:bas:knx:topo})

\section{Generating a Test Dataset including malicious activities}
\section[Generating a Test Dataset including malicious activities]{Generating a Test Dataset including\\ malicious activities}
\label{sec:methods:gen-test}

As \gls{bas} are only seldom considered within threat models, monitoring systems are rarely installed, if at all.
@@ -81,7 +81,7 @@ \section{Generating a Test Dataset including malicious activities}
Second, a \gls{dos} attack is performed starting at 2017-02-13 09:00 and targeting the entire line \code{3.4}. The attack is performed in three bursts of 15 minutes with five-minute breaks in between. In the \gls{dos} attack a flood of \code{A\_Restart} telegrams with \code{SYSTEM} priority is sent, which in reality would cause all targeted devices to restart continuously. Additionally this blocks all other traffic, since the \code{SYSTEM} priority is the highest specified. During the attack the telegrams were injected with a maximum of \(500 \ \sfrac{telegrams}{min}\).

As the third attack scenario, a device scan over the entire possible \gls{knx} address space was performed, starting at 2017-02-13 21:00.
To determine if a device is present, the management \gls{apci}\break\code{A\_DEVICE\_DESCRIPTOR\_READ} is sent to all addresses. Every \gls{knx} device is required to implement certain management routines, among them the query for the device descriptor. \code{A\_DEVICE\_DESCRIPTOR\_READ} is ideal since the requesting telegram does not require any parameters and the response only contains two bytes of additional payload. By choosing a request which adds as little overhead as possible, the throughput is increased, effectively reducing the time required for the scan. \parencite[cf.][p.~46]{DIN_EN_50090-4-1}
To determine if a device is present, the management \gls{apci} \code{A\_DEVICE\_DESCRIPTOR\_READ} is sent to all addresses. Every \gls{knx} device is required to implement certain management routines, among them the query for the device descriptor. \code{A\_DEVICE\_DESCRIPTOR\_READ} is ideal since the requesting telegram does not require any parameters and the response only contains two bytes of additional payload. By choosing a request which adds as little overhead as possible, the throughput is increased, effectively reducing the time required for the scan. \parencite[cf.][p.~46]{DIN_EN_50090-4-1}
As with the \gls{dos} attack, the telegrams are injected with a maximum of \(500 \ \sfrac{telegrams}{min}\).
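
The rate cap can be pictured as a simple paced send loop; the sketch below is illustrative only, and \code{send\_telegram} is a hypothetical stand-in for the actual injection interface of the generator scripts.

    import time

    MAX_RATE = 500              # telegrams per minute, as used in the attacks
    INTERVAL = 60.0 / MAX_RATE  # ca. 0.12 s pause between two telegrams

    def inject(send_telegram, telegrams, duration_s):
        """Send the given telegrams round-robin at the capped rate
        until the burst duration has elapsed."""
        deadline = time.monotonic() + duration_s
        i = 0
        while time.monotonic() < deadline:
            send_telegram(telegrams[i % len(telegrams)])
            i += 1
            time.sleep(INTERVAL)

    # e.g. three 15-minute DoS bursts with 5-minute breaks:
    # for _ in range(3):
    #     inject(send_telegram, [restart_telegram], 15 * 60)
    #     time.sleep(5 * 60)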

Finally, two new rogue devices are introduced with the addresses \code{3.6.26} and \code{3.5.18} during the entire day of 2017-02-14.
@@ -90,12 +90,12 @@ \section{Generating a Test Dataset including malicious activities}

The scripts to generate the malicious traffic and the datasets themselves can be found on the data disk in Appendix~\ref{app:disk}.

\newpage
\section{Evaluating the Detection Results}
\label{sec:methods:eval}

For each crafted attack, the different anomaly detection algorithms are benchmarked with regard to their ability to detect it.
This ability is classified by the following criteria:

\begin{enumerate}
\item General ability to detect the attack
\item Differentiation from background noise of the detection results
21 changes: 12 additions & 9 deletions tex/chapter/500-concept.tex
@@ -87,7 +87,7 @@ \section{Monitoring Pipeline}
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{figures/300-concept-architecture.pdf}
\caption[Pipeline Architecture]{Architecture of the monitoring pipeline \todo{explain symbols.} \todo{information flow back to agents, for time sync.}}
\caption[Pipeline Architecture]{Architecture of the monitoring pipeline concept. \todo{explain symbols.} \todo{information flow back to agents, for time sync.}}
\label{fig:concept:architecture}
\end{figure}

@@ -119,7 +119,7 @@ \section{Monitoring Pipeline}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{figures/500-knx-demo-topo-with-agents.pdf}
\caption[KNX network topology with Agents and Collector]{Exemplary logical topology of a \gls{knx} network with deployed Agents and one Collector.}
\caption[KNX network topology with Agents and Collector]{Exemplary logical topology of a \gls{knx} network with multiple deployed Agents and one Collector.}
\label{fig:concept:network}
\end{figure}

@@ -312,6 +312,7 @@ \subsection{Generating the Feature Vector}

For the construction of a feature vector, normally only the features with higher variance would be chosen, since constant fields do not add any additional information.
However, in anomaly detection, normally stable features could also be of interest, since a change in them would most certainly indicate an anomaly.

As a compromise between both points of view, the following fields were selected as feature vector dimensions:

\begin{itemize}
@@ -424,13 +425,6 @@ \subsection{The Support Vector Machine Analyser}
\subsection{The Entropy Analyser}
\label{sec:concept:anal:entropy}

\begin{figure}[h]
\centering
\includegraphics[]{figures/300-time-slots.pdf}
\caption{Example of shifted time slots used in the entropy analyser module.}
\label{fig:concept:time-slots}
\end{figure}

\begin{comment}
\begin{itemize}
\item cf.~Section~\ref{sec:background:network:novelty:entropy}
@@ -459,6 +453,14 @@ \subsection{The Entropy Analyser}
The base-model is generated during a dedicated training phase and contains a \glsfirst{pmf} for every dimension of the feature vector. Only the time dimension is excluded, since it is continuous.
Seasonal sensitivity is instead achieved, as described in Section~\ref{sec:background:network:features:time}, by dividing one period into multiple time chunks. Each of these chunks equates to one sub-model, which represents the activity during this time slot. To reduce hard breaks at the end of each chunk, another set of chunks is used, shifted by half the chunk length. Hence every point in time within the season period lies in two chunks. (see Figure~\ref{fig:concept:time-slots})
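
The lookup of the two chunks covering a point in time can be sketched as follows (the function and parameter names are illustrative; the chunk length and period are configuration values):

    def chunk_indices(t, period, chunk_len):
        """Return the indices of the two overlapping chunks covering time t.

        t: seconds since the start of the season period (e.g. a week);
        grid A starts at 0, grid B is shifted by half a chunk length."""
        t = t % period
        idx_a = int(t // chunk_len)                    # chunk [k*L, (k+1)*L)
        idx_b = int((t + chunk_len / 2) // chunk_len)  # chunk [k*L - L/2, k*L + L/2)
        return idx_a, idx_b

    # weekly period split into 4-hour chunks; Monday 09:00 falls into
    # base chunk 2 ([08:00, 12:00)) and shifted chunk 2 ([06:00, 10:00)):
    # chunk_indices(9 * 3600, 7 * 24 * 3600, 4 * 3600) -> (2, 2)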

\begin{figure}[h]
\centering
\includegraphics[]{figures/300-time-slots.pdf}
\caption{Example of shifted time slots used in the entropy analyser module.}
\label{fig:concept:time-slots}
\end{figure}

\newpage
Further, two types of baseline models will be trained: one general world-view model and many Agent-specific models.
The world-view model is used to identify general abnormal behaviour in the network and can be seen as a general-purpose, less sensitive model.
The Agent-specific models, on the other hand, are specialised in the traffic and behaviour unique to one Agent. Therefore, they are able to identify local anomalies, which might be completely normal when seen by another Agent.
@@ -473,6 +475,7 @@ \subsection{The Entropy Analyser}
\textcite{Toshniwal2014}, who proposed this concept initially, use a fixed number of clusters, into which new observations are try-fitted. If a new observation does not fit into any of these clusters, it is considered an outlier, whereby the change in the calculated entropy is used to decide whether an observation fits or not.
Further, \textcite{Toshniwal2014} only keep a sliding window of data in what would be the baseline model. This comes with the earlier described disadvantage of continuous training: an attacker can alter the modelled \emph{normality} by slowly injecting malicious packets.
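
The entropy-based try-fit can be pictured as follows; this is a loose sketch of the idea, with an illustrative acceptance threshold rather than the actual criterion of \textcite{Toshniwal2014}.

    import math

    def entropy(counts):
        """Shannon entropy of a frequency table {value: count}."""
        total = sum(counts.values())
        if not total:
            return 0.0
        return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

    def fits(cluster, value, threshold=0.05):
        """Accept an observation if adding it changes the
        cluster's entropy by less than the threshold."""
        before = entropy(cluster)
        after = dict(cluster)
        after[value] = after.get(value, 0) + 1
        return abs(entropy(after) - before) < threshold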

\newpage
\section{Monitoring and Alerting}
\label{sec:concept:mon}

6 changes: 4 additions & 2 deletions tex/chapter/600-prototype-implementation.tex
@@ -90,6 +90,7 @@ \section{The Collector Module}

In case one Agent's window is never received by the Collector, it waits for a configurable timeout of about $60$ seconds before relaying this window's time slot anyway. This ensures that a time slot is analysed even when an Agent fails, regardless of the failure mode. As this is an anomaly which can be easily queried in the monitoring and alerting system, it is detected there and consequently not handled in the Collector apart from a warning in the log.
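
A sketch of this relay decision (the names and the $60$-second default are illustrative):

    import time

    RELAY_TIMEOUT = 60.0  # configurable timeout in seconds

    def should_relay(received_agents, expected_agents, slot_opened_at):
        """Relay a time slot once every Agent's window has arrived,
        or once the timeout has expired (missing Agents are only logged)."""
        if expected_agents <= received_agents:  # subset test on sets of Agent ids
            return True
        if time.monotonic() - slot_opened_at >= RELAY_TIMEOUT:
            print("warning: relaying incomplete slot, missing:",
                  expected_agents - received_agents)
            return True
        return False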

\newpage
\section{The Agent Simulator}
\label{sec:impl:agent}

@@ -128,6 +129,7 @@ \section{The Agent Simulator}
The statistical window is not based on the feature vector (see Section~\ref{sec:concept:anal:feature-vector}). Instead, the occurrences of certain features are counted, e.g.\ the source address \code{1.1.15} appeared $15$ times.
All further processing, like vectorising and normalising, is done in the individual Agent modules.
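
Conceptually, such a window is little more than a set of per-field counters; a minimal sketch (the field names are illustrative):

    from collections import Counter

    class StatisticalWindow:
        """Counts occurrences of selected telegram features per window."""

        FIELDS = ("src", "dst", "priority", "apci")  # illustrative field set

        def __init__(self):
            self.counts = {f: Counter() for f in self.FIELDS}

        def add(self, telegram):
            for f in self.FIELDS:
                self.counts[f][telegram[f]] += 1

    # after processing a window: win.counts["src"]["1.1.15"] == 15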

\newpage
\section{The Analyser Base Module}
\label{sec:impl:base}

@@ -185,7 +187,6 @@ \section{The Address Analyser Module}

During the operational phase, the model containing all sets is loaded. Subsequently, all addresses in incoming windows are compared to their respective address set.
If an unknown address is discovered, a counter is increased. After a window is processed, a measurement containing these counters per Agent is pushed to the \gls{influxdb}, which contains:

\begin{itemize}
\item number of unique unknown source addresses
\item number of unique unknown destination addresses
@@ -206,9 +207,9 @@ \section{Converting a Window into a Feature Vector}
Within the proposed concept this means that a statistical window has to be transformed into a numerical vector representation.
The basic principle of this is described in Section~\ref{sec:concept:anal:feature-vector}, whereby this section focusses on the details of how each feature is encoded.
It is to be noted that a window contains discrete or categorical statistical data describing a set of events instead of just a single event, meaning the window produced by an Agent contains the number of occurrences over the period of the window.

The feature vector, however, will contain a normalised excerpt of the events' features, whereby each feature is encoded as one or more dimensions.
The mentioned excerpt contains a set of low-level, application-independent, and easy-to-measure fields:

\begin{itemize}
\item seconds of the week
\item source address
@@ -226,6 +227,7 @@ \section{Converting a Window into a Feature Vector}
\[
\dfrac{2 \cdot \begin{pmatrix}1 \\ 0\end{pmatrix} + 3 \cdot \begin{pmatrix}0 \\ 1\end{pmatrix}}{2 + 3} = \begin{pmatrix}0.4 \\ 0.6\end{pmatrix}
\]

The result is a vector encoding the probability of each bit occurring within this specific window.
The goal was to reduce the number of dimensions that would have been necessary if the entirety of possible addresses were mapped into the feature space -- each address using one dimension to encode the probability of occurrence.
For the two address types, this would result in $2 \cdot 2^{16} = 2 \cdot 65536 = 131072$ dimensions. By applying this adapted form of the hashing trick (cf. Section~\ref{sec:background:network:features:hashing}) it was possible to reduce it to $2 \cdot 16 = 32$ dimensions.
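
A sketch of this per-bit encoding for a single address field, reproducing the weighted example above (the function name is illustrative):

    import numpy as np

    def encode_address_bits(counts, bits=16):
        """Map {address: occurrence count} to one probability per bit
        position -- 16 dimensions instead of 2**16 per address field."""
        vec = np.zeros(bits)
        total = sum(counts.values())
        for addr, n in counts.items():
            for b in range(bits):
                if (addr >> b) & 1:
                    vec[b] += n
        return vec / total if total else vec

    # the example above: pattern 0b01 seen twice, 0b10 seen three times
    print(encode_address_bits({0b01: 2, 0b10: 3}, bits=2))  # [0.4 0.6]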