forked from BinaryAnalysisPlatform/bil
-
Notifications
You must be signed in to change notification settings - Fork 0
/
bil.tex
307 lines (224 loc) · 9.13 KB
/
bil.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb}
\usepackage{supertabular}
\usepackage{geometry}
\usepackage{ifthen}
\usepackage{alltt}%hack
\geometry{a4paper,dvips,twoside,left=22.5mm,right=22.5mm,top=20mm,bottom=30mm}
\usepackage{color}
\begin{document}
\include{ott}
\title{A formal specification for BIL: BIL Instruction Language}
\maketitle
\tableofcontents
\clearpage
\section{Introduction}
\label{sec:intro}
This document describes the syntax and semantics of BAP Instruction
Language. The language is used to represent a semantics of machine
instructions. Each machine instruction is represented by a BIL program
that captures side effect of the instruction.
\section{Syntax}
\label{sec:syntax}
\subsection{Metavariables}
\label{sec:meta}
We define a small set of metavariables that are used to denote
subscripts, numerals and string literals:
\ottmetavars
\subsection{BIL syntax}
BIL program is reperesented as a sequence of statements. Each
statement performs some side-effectful computation.
\ottgrammartabular{
\ottbil\ottinterrule
}
\ottgrammartabular{
\ottstmt\ottinterrule
}
BIL expressions are side-effect free. Expressions include a usual set
of operations on bitvectors, like arithmetic operations and converting
bitvectors of one size to bitvectors of another size (casting in BIL
parlance). We write $[e_1/var]e_2$ for the capture-avoiding
substitution of $e_1$ for free occurances of $var$ in $e_2$
\ottgrammartabular{
\ottexp\ottinterrule
}
\ottgrammartabular{
\ottvar\ottinterrule
}
\ottgrammartabular{
\ottbop\ottinterrule
\ottaop\ottinterrule
\ottlop\ottinterrule
\ottuop\ottinterrule
\ottendian\ottinterrule
\ottcast\ottinterrule
}
The type system of BIL consists of two type families - immediate
values, indexed by a bitwidth, and storagies (aka memories), indexed
with address bitwidth and values bitwidth.
\ottgrammartabular{
\otttype\ottinterrule
}
\subsection{Bitvector syntax}
\label{sec:bitvector}
We define a type of {\em words}, which are concrete bitvectors
represented by a pair of value and size. Many operations on words are
required by the semantics and are listed below.
Operations marked with \verb|sbv| are signed. All other operations are
unsigned (if it does matter). Operations \verb|ext| and \verb|exts|
performs extract/extend operation. The former is unsigned (i.e., it
extends with zeros), the latter is signed. This operation extracts
bits from a bitvector starting from $\mathit{hi}$ and ending with
$\mathit{lo}$ bit (both ends included). If $\mathit{hi}$ is greater
than the bitwidth of the bitvector, then it is extended with zeros
(for \verb|ext| operation) or with a sign bit (for \verb|exts|)
operation.
\ottgrammartabular{
\ottword\ottinterrule
}
\subsection{Value syntax}
\label{sec:values}
Values are syntactic subset of expressions. They are used to represent
expressions that are not reducible.
We have three kinds of values --- immediates, represented as
bitvectors; unknown values and storages (memories in BIL parlance),
represented symbolically as a list of assignments:
\ottgrammartabular{
\ottval\ottinterrule
}
\subsection{Context syntax}
\label{sec:context}
Contexts are used in the typing judgments to specify the types of all
variables. While each variable is annotated with its type, the context
ensures that all uses of a given variable have the same type.
\ottgrammartabular{
\ottgamma\ottinterrule
}
\subsection{Formula syntax}
\label{sec:formula}
The following syntax is used to specify symbolic formulas in premises
of judgments.
We use $\Delta$ to denote set of bindings of variables to values. The
$\Delta$ context is represented as list of pairs. We write
$(var,v) \in \Delta$ to indicate that the value $v$ is the right-most
binding of $var$ in $\Delta$. Additionally, we write
$\mathsf{dom}(\Delta)$ for $\Delta$'s {\em domain} (the set of
variables for which it contains values).
We also add a small set of operations over natural numbers, like
comparison and arithmetics. Natural numbers are mostly used to reason
about sizes of bitvectors, that's why they are often referred as
$\mathit{sz}$.
We also add syntax for equality comparison for values and variables.
\ottgrammartabular{
\ottdelta\ottinterrule
}
\ottgrammartabular{
\ottformula\ottinterrule
}
\ottgrammartabular{
\ottnat\ottinterrule
}
\subsection{Instruction syntax}
\label{sec:insn}
To reason about the whole program we introduce a syntax for
instruction. An instruction is a binary sequence of $\mathit{w_2}$
bytes, that was read by a decoder from an address $\mathit{w_1}$. The
semantics of an instruction is described by the $\mathit{bil}$ program.
\ottgrammartabular{
\ottinsn\ottinterrule
}
\clearpage
\section{Typing}
\label{sec:typing}
This section defines typing rules for BIL programs. We define four
judgements: $[[G |- bil is ok]]$ for programs, $[[G |- s is ok]]$ for
statements, $[[G |- e :: t]]$ for expressions, and $[[G is ok]]$ for
contexts.
BIL statement-level variables represent global state, and are
implicitly declared at their first use. While variables carry their
type with them, it is still necessary to track them in a context
during type checking. This is required to rule out programs like:
\begin{verbatim}
if (foo) then {x:imm<1> = 0} else {x:imm<32> = 42};
bar
\end{verbatim}
Such a program would leave the type associated with {\tt x} unclear in
{\tt bar}. This is ruled out because the typing rules are set up such
that $[[G |- bil is ok]]$ implies $[[G is ok]]$, which requires that
each variable has a single type.
\ottdefnstypingXXstmt
\ottdefnstypingXXexp
\ottdefnstypingXXtype
\ottdefnstypingXXgamma
\clearpage
\section{Operational semantics}
\subsection{Model of a program}
Program is coinductively defined as an infinite stream of program
states, produced by a step rule. Each state is represented with a
triplet $(\Delta, w, var)$, where $\Delta$ is a mapping from variables
to values, $w$ is a program counter, and $var$ is a variable
denoting currently active memory.
The \verb|step| rule defines how a machine instruction is
evaluated. We use ``magic'' rule \verb|decode| that fetches
instructions from the memory and decodes them to a BIL program.
The BIL code is evaluated using reduction rules of statements (see
section \ref{sec:sema:stmt}). Then the program counter is updated with
the $w_3$, that initially points to a byte following current instruction.
\ottdefnsprogram
\section{Semantics of statements}
\label{sec:sema:stmt}
The reduction rule defines transformation of a state for each
statement. The state of the reduction rule consists of a pair
$(\Delta,w)$, where $\Delta$ is a mapping from variables to values and
$w$ is an address of a next instruction.
Two statements affect the state: \verb|Move| statement introduces new
$var \leftarrow v$ binding in $\Delta$, and \verb|Jmp| affects
program counter.
The \verb|if| and \verb|while| instructions introduce local control
flow.
There is no special semantics associated with \verb|special| and
\verb|cpuexn| statements.
\ottdefnsreduceXXstmt
\section {Semantics of expressions}
\label{sec:sema:exp}
This section describes a small step operational semantics for
expressions. A symbolic formula $\Delta \vdash e \rightarrow e' $
defines a step of transformation from expression $e$ to an expression
$e'$ under given context $\Delta$.
A well formed (well typed) expression evaluates to a value expression,
that is syntactic subset of expression grammar (see
section \ref{sec:values}).
A value can be either an immediate, represented by a bitvector, a
unknown value, or a memory storage.
A memory storage is represented symbolically as a sequence of
storages to the originally undefined memory. Each storage
operation of size greater than 8 bits is desugared into a sequence of
8 bit storages in a big endian order.
A load operation will first reduce all sub expressions of a memory
object to values and then recursively destruct the object until one of
the following conditions is met:
\begin{description}
\item[load-byte:] if the memory object is a storage of a \verb|value|
to an immediate (known) address that we're trying to load then the
load expression is reduced to \verb|value|.
\item[load-un-memory:] if the memory object is an \verb|unknown| value,
then the load expression evaluates to \verb|unknown|.
\item[load-un-addr:] if the memory object is a storage to
\verb|unknown| value address then load expression evaluates to
\verb|unknown|.
\end{description}
This section also defines $\Delta \vdash e_1 \leadsto^{*} e_2$. This relation
is the reflexive, transitive closure of $\leadsto$. It is useful to describe
reductions that may take many steps. For example, in the evaluation rules for
statements, it is often necessary to evaluate an expression completely to a
value. The $\leadsto^{*}$ relation is used to allow reductions that take an
indeterminate number of individual $\leadsto$ steps. Such a derivation can be
built with repeated use of the $\ottdrulename{reduce}$ rule.
Some evaluation rules depend on the type of a value. Since there are two canonical
forms for each type, we avoid duplicating each rule by defining the following metafunction:
\medskip
\ottfundefncomputeXXtype
\ottdefnsreduceXXexp
\ottdefnshelpers
\ottdefnsmultistepXXexp
\end{document}