Skip to content

Commit

Permalink
first release.
Browse files Browse the repository at this point in the history
  • Loading branch information
ivg committed Oct 2, 2015
1 parent a3472ef commit ad80a6a
Show file tree
Hide file tree
Showing 2 changed files with 111 additions and 42 deletions.
42 changes: 26 additions & 16 deletions bil.ott
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,8 @@ grammar
stmt,s :: stmt_ ::=
| var := exp :: :: move
{{ com -- assign [[exp]] to [[var]] }}
| jmp exp :: :: jump
{{ com -- transfer control to given [[exp]] }}
| jmp e :: :: jump
{{ com -- transfer control to a given address [[e]] }}
| cpuexn ( num ) :: :: cpuexn
{{ com -- interrupt CPU with a given interrupt [[num]] }}
| special ( string ) :: :: special
Expand Down Expand Up @@ -71,9 +71,9 @@ grammar
| num : nat :: S :: word
| 1 : nat :: S :: one
| true :: S :: true
{{ com -- sugar for $1:1$ }}
{{ com -- sugar for 1:1 }}
| false :: S :: false
{{ com -- sugar for $0:1$ }}
{{ com -- sugar for 0:1 }}
| w1 .+ w2 :: S :: plus
{{ tex [[w1]] \stackrel{bv} + [[w2]] }}
{{ com -- plus }}
Expand Down Expand Up @@ -364,9 +364,10 @@ defns program :: '' ::=
defn delta , w , var ~> delta' , w' , var' :: :: step :: '' by

delta,w,var |-> {addr=w1; size=w2; code=bil}
delta,w1 |- bil ~> delta',w',{}
delta, w1.+w2 |- bil ~> delta',w3,{}
---------------------------------------------------------- :: step
delta,w,var ~> delta',w'.+w2,var
delta,w,var ~> delta',w3,var


defn delta,w,var |-> insn :: :: decode :: '' by

Expand Down Expand Up @@ -396,7 +397,6 @@ defns reduce_exp :: '' ::=
delta[var' <- v'] |- var ~> v


{{com if variable is not found, then it evaluates to unknown value}}

-------------------------------------------- :: var_unknown
[] |- id:type ~> unknown [str] : type
Expand All @@ -405,12 +405,6 @@ defns reduce_exp :: '' ::=
% LOAD %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

{{ com Load expression is evaluated semi-lazily, first we evaluate an
address, then memory expression. Once memory expression is reduced to
a value, we start to reduce a symbolic memory object, until we end up
with a non-reducible value. The memory object is a sequence of nested
store expressions. Whenever we hit a storage to an unknown address we
stop reduction and reduce the whole load expression to an unknown val}}

delta |- e2 ~> v2
------------------------------------- :: load_addr
Expand All @@ -423,7 +417,7 @@ defns reduce_exp :: '' ::=


----------------------------------------------------------- :: load_byte
delta |- (v1 with [w,ed]:8 <- num:8)[w,ed]:8 ~> num:8
delta |- (v1 with [w,ed]:8 <- num:8)[w,ed']:8 ~> num:8

------------------------------------------------------------------------------ :: load_un_addr
delta |- (v1 with [unknown[str]:t,ed]:8 <- v2)[v3,ed]:8 ~> unknown[str]:imm<8>
Expand All @@ -443,7 +437,7 @@ defns reduce_exp :: '' ::=

succ w = w'
--------------------------------------------------- :: load_word_el
delta |- v[w,el]:sz ~> v[w',el]:(sz-8) @ v[w,el]:8
delta |- v[w,el]:sz ~> v[w',el]:(sz-8) @ v[w,be]:8



Expand All @@ -468,6 +462,22 @@ defns reduce_exp :: '' ::=
------------------------------------------------------------------------- :: store_mem
delta |- e with [v1,ed]:sz <- val ~> v with [v1,ed] : sz <- val

succ w = w'
delta |- high:8[w] ~> w1
delta |- low:(sz-8)[w] ~> w2
delta |- v with [w,be]:8 <- w1 ~> v'
-------------------------------------------------------------------- :: store_word_be
delta |- v with [w,be]:sz <- val ~> v' with [w', be]:(sz-8) <- w2

succ w = w'
delta |- low:8[w] ~> w1
delta |- high:(sz-8)[w] ~> w2
delta |- v with [w,be]:8 <- w1 ~> v'
-------------------------------------------------------------------- :: store_word_el
delta |- v with [w,el]:sz <- val ~> v' with [w', el]:(sz-8) <- w2




delta |- e1 ~> v
------------------------------------------------ :: let_head
Expand Down Expand Up @@ -648,7 +658,7 @@ defns reduce_stmt :: '' ::=

delta |- e ~> w'
---------------------------------- :: jmp
delta,w |- jmp e ~> delta, w .+ w'
delta,w |- jmp e ~> delta, w'


------------------------------------ :: cpuexn
Expand Down
111 changes: 85 additions & 26 deletions bil.tex
Original file line number Diff line number Diff line change
Expand Up @@ -22,24 +22,22 @@ \section{Introduction}
This document describes the syntax and semantics of BAP Instruction
Language. The language is used to represent a semantics of machine
instructions. Each machine instruction is represented by a BIL program
that tries to capture all side effect of the instruction.


that captures side effect of the instruction.

\section{Syntax}
\label{sec:syntax}

\subsection{Metavariables}
\label{sec:meta}

We define a small set of metavariables that are used to denote string
and numeric literals and subscripts:
We define a small set of metavariables that are used to denote
subscripts, numerals and string literals:

\ottmetavars

\subsection{BIL syntax}

BIL program is reperesented as a sequence of BIL statements. Each
BIL program is reperesented as a sequence of statements. Each
statement performs some side-effectful computation.

\ottgrammartabular{
Expand All @@ -57,11 +55,6 @@ \subsection{BIL syntax}

\ottgrammartabular{
\ottexp\ottinterrule
\ottvar\ottinterrule
\ottbop\ottinterrule
\ottuop\ottinterrule
\ottendian\ottinterrule
\ottcast\ottinterrule
}

\ottgrammartabular{
Expand Down Expand Up @@ -92,10 +85,10 @@ \subsection{Bitvector syntax}
value and size. Operations \verb|ext| and \verb|exts| performs
extract/extend operation. The former is unsigned (i.e., it extends
with zeros), the latter is signed. This operation extracts bits from a
bitvector starting from $\mathit{hi}$ and ending $\mathit{lo}$ bit
(both ends including). If $\mathit{hi}$ is greater than the bitwidth
of the bitvector, then it is extended with zeros (for \verb|ext|
operation) or with a sign bit (for \verb|exts|) operation.
bitvector starting from $\mathit{hi}$ and ending with $\mathit{lo}$
bit (both ends included). If $\mathit{hi}$ is greater than the
bitwidth of the bitvector, then it is extended with zeros (for
\verb|ext| operation) or with a sign bit (for \verb|exts|) operation.

\ottgrammartabular{
\ottword\ottinterrule
Expand All @@ -108,8 +101,8 @@ \subsection{Value syntax}
expressions that are not reducible.

We have three kinds of values --- immediates, represented as
bitvectors; unknown values and storages (aka memories) represented
symbolically as a list of assignments:
bitvectors; unknown values and storages (memories in BIL parlance),
represented symbolically as a list of assignments:

\ottgrammartabular{
\ottval\ottinterrule
Expand All @@ -126,7 +119,7 @@ \subsection{Formula syntax}
$\Delta$ context is represented as list of pairs. We also add a small
set of operations over natural numbers, like comparison and
arithmetics. Natural numbers are mostly used to reason about sizes of
bitvectors, that why they are often referred as $\mathit{sz}$.
bitvectors, that's why they are often referred as $\mathit{sz}$.

We also add syntax for equality comparison for values and variables.

Expand All @@ -147,15 +140,16 @@ \subsection{Instruction syntax}
\label{sec:insn}

To reason about the whole program we introduce a syntax for
instruction. An instruction is a binary string with length
$\mathit{w_2}$, that was read by a decoder from an address
$\mathit{w_1}$. The semantics of an instruction is described by a
$\mathit{bil}$ program.
instruction. An instruction is a binary sequence of $\mathit{w_2}$
bytes, that was read by a decoder from an address $\mathit{w_1}$. The
semantics of an instruction is described by the $\mathit{bil}$ program.

\ottgrammartabular{
\ottinsn\ottinterrule
}

\clearpage

\section{Typing}
\label{sec:typing}

Expand All @@ -167,24 +161,89 @@ \section{Typing}

\ottdefnstypingXXexp

\clearpage


\section{Operational semantics}

\subsection{Model of a program}

Program is coinductively defined as an infinite stream of program
states, produced by a step rule. Each state is represented with a
triplet $\Delta, w, var$, where $\Delta$ is a mapping from variables
triplet $(\Delta, w, var)$, where $\Delta$ is a mapping from variables
to values, $w$ is a program counter, and $var$ is a variable
denoting currently active memory.

The \verb|step| rule defines how a machine instruction is
evaluated. We use ``magic'' function \verb|decode| that fetches
instruction from memory and decodes it to a BIL program.
evaluated. We use ``magic'' rule \verb|decode| that fetches
instructions from the memory and decodes them to a BIL program.

A program counter is updated after each instruction.
The BIL code is evaluated using reduction rules of statements (see
section \ref{sec:sema:stmt}). Then the program counter is updated with
the $w_3$, that initially points to a byte following current instruction.

\ottdefnsprogram

\section{Semantics of statements}
\label{sec:sema:stmt}

The reduction rule defines transformation of a state for each
statement. The state of the reduction rule consists of a pair
$(\Delta,w)$, where $\Delta$ is a mapping from variables to values and
$w$ is an address of a next instruction.

Two statements affect the state: \verb|Move| statement introduces new
$var \leftarrow v$ binding in $\Delta$, and \verb|Jmp| affects
program counter.

The \verb|if| and \verb|while| instructions introduce local control
flow.

There is no special semantics associated with \verb|special| and
\verb|cpuexn| statements.

\ottdefnsreduceXXstmt


\section {Semantics of expressions}
\label{sec:sema:exp}

This section describes a small step operational semantics for
expressions. A symbolic formula $\Delta \vdash e \rightarrow e' $
defines a step of transformation from expression $e$ to an expression
$e'$ under given context $\Delta$.

A well formed (well typed) expression evaluates to a value expression,
that is syntactic subset of expression grammar (see
section \ref{sec:values}).

A value can be either an immediate, represented by a bitvector, a
unknown value, or a memory storage.

A memory storage is represented symbolically as a sequence of
storages to the originally undefined memory. Each storage
operation of size greater than 8 bits is desugared into a sequence of
8 bit storages in a big endian order.

A load operation will first reduce all sub expressions of a memory
object to values and then recursively destruct the object until one of
the following conditions is met:


\begin{description}
\item[load-byte:] if the memory object is a storage of a \verb|value|
to an immediate (known) address that we're trying to load then the
load expression is reduced to \verb|value|.
\item[load-un-memory:] if the memory object is an \verb|unknown| value,
then the load expression evaluates to \verb|unknown|.
\item[load-un-addr:] if the memory object is a storage to
\verb|unknown| value address then load expression evaluates to
\verb|unknown|.
\end{description}

\ottdefnsreduceXXexp

\ottdefnshelpers


\end{document}

0 comments on commit ad80a6a

Please sign in to comment.