Skip to content

Commit

Permalink
adds the symbolic executor (#1105)
Browse files Browse the repository at this point in the history
* changes the trace representation and some initial design

* implements deduplication and classification incidence

* enables wide generators

Generators now are able to produce objects of random width modulo that the
random value underneath the hood is still an integer.

TODO: drop the width parameter from the byte generator.

* hardcodes generators for registers and memory in promiscuous mode

* makes wide generators available through memory-allocate

* disables alloc-on-free hack

it produces lots of false positives due to unwanted aliasing.

* fixes the memcheck rules for strn* family of functions

* EXPERIMENTAL: implements errno support

It uses malloc to allocate a place for the register and lisp
parameters to store the location.

We should also consider clearing errno as soon as possible as it is
the source of aliasing and lots of UAF and DF false positives.

* EXPERIMENTAL: enables buffer-on-creation overflow tracking

basically, the idea is that each function that defines a buffer or a
string (e.g., malloc, strdup, memcpy, etc) imposes constraints on the
buffer that we may track. For example, if we break the bounds of the
strdup generated string it could be a soft alarm.

* revamps Primus Generators

We're now using a much more efficient MSG algorithm with
the implementation that uses the new Bitvec module, so that
we can now generate arbitrary number of random bits and specify
any range we like.

* enables wide generators for memory

* hushes ida error message

it is just a warning, there is no need to overreact

* adds the generated observations for undefined vars and pointers

* the initial implementation of the Primus symbolic executor

* adds the oasis file for the symbolic executor

* makes z3 typechecker happy

coerces boolean operations to bitvectors (we will keep booleans only
for actual logic formulas, i.e., for meta logic - to answer questions
whether a path is feasible and so on).

* abstracts the input type (previously known as origin)

* adds the solver

* implements eq and neq using only only the BitVector logic

* adds salt to generators

Now each generator will generate its own sequence. Note, we still
need to create a separate generator for each independent data location.

* UNFINISHED: publishes the of_iterator function

TODO: document it
TODO: publish bitvector interfaces

* adds the random plugin

this plugin will control all the random generators... as much as they
could be controlled

* makes the generators syntax a little bit less ugly

* implements memory randomization

* registers symbolic executor components using the new interface only

* splits promiscuous mode into several independent components

* restructures the core system

* adds the translation from formula values to words

* don't resubscribe printers every time a new system is started

* implements the master-forker process

* fixes a few more z3/bil discrepancies

* propagates inputs from the worker to master

* fixes the worklist filter

Now we won't solve branches that were already visited by some other process.

* moves mark-visited to library

since so many components anyway track the visited terms it is better
to have it available for everyone, instead of reduplicating the work.

* symbolic executor now uses the mark-visited component

so that we can propagate visited terms between different runs.

* WIP: enables run-until-completion mode

in which we run until all terms are visited.

* cleans up the symexec code

* adds a few more lisp stubs

* adds the [del] operation to Env and Memory

This operation deletes the bindings of the variable or an address.

* a new component for handling unresolved calls

The component will prevent linker from raising the unresolved call
exception by trapping it. Note, that it will conflict with the
default trap that is unconditionally set in the primus.lisp file,
which we will remove in the next commit.

* removes the primus.lisp feature

First of all this feature shouldn't be in the posix module, second it
was used to install the handler.

* assumes the default C prototype if no available

* fixes the component name

* adds constraints to the formula solver and computer

* records the path constraint on each taken jump

* adds assume and assert primitives

* prefixes assert/assume

* fixes ordering primitives

they weren't using binops

* fixes a bug with missing constraints

* adds lisp primitives to the symbolic computation system

* adds the z3 dependency

* fixes more primitives

* fixes the component name in the legacy-main system

* fixes a typo in the abi specifications

* don't hush errors from Primus

* moves and fixes subroutine pertubator

it is now in the primus-random plugin and it is possible to control
each argument individually

* adds the global seed parameter

* documents and cleans up the primus-random plugin

also adds the global seed parameter and an option to keep page
protection.

* enables the upper limit if the limiter is used

It was quite unintuitive that to limit the maximum path you need to
enable the limiter and then set the limit to some finite number as
by default it is unlimited.

This were making sense, however, in the old days, when we had only one
system and all components were automatically included into
it. Therefore, to prevent the limiter from jumping in into the old
style bap:legacy-main and to enable seamless backward compatibility,
we did the following:

1) When either --primus-limit-length or --primus-limit-max-depth are
specified we add the component to the legacy-main system.
2) We also add this component when the promiscuous mode is enabled.

* fixes error messages in the main library

* publishes the ite operator in the interpreter interface

* rewrites the malloc allocator

* prettifies the error messages on the disassemble command

* FIXP to malloc

* reimplements errno without malloc

* FIXUP on limiter

* enables autocoercion in the Primus interpreter

* evaluates conditions in Primus Lisp transparently

* adds a few more primitives to the region library

- region-lower and region-upper return the corresponding bounds of the
region.
- region-count counts the number of regions in the set of regions.

* adds three more primitives to Lisp dictionaries

- dict-first d - is the first key in the dictionary;
- dict-next d k - is the next key after k
- dict-length d - is the length

The first two functions enables iteration over dictionaries.

* a couple of bugfixes in the lisp code

* adds prototypes to the ctypes functions

* cleans up the error messages in the run plugin

* adds a new eval-lisp command

This command can be used to evaluate lisp code without having to
provide a binary.

* drops bogus stubs

* implements support for modern C runtime

In the modern C runtime the __libc_start_main function is called not
via the PLT indirection, but through a .got.plt (or .got.plt, can't
remember which is which) table which is zeroed by default (and it is
expected that the loaded will fill in the values from the relocation
table).

Our existing support for C runtime was searching the libc_start_main
location from the entry point - the first function called by the entry
point is assumed to be th libc_start_main, and the first argument of
that function is assumed to the main function.

The problem is that since there is no longer the plt entry for
libc_start_main we can't really link its stub, so the fix is simple,
just insert the synthetic subroutine for it. And everything else works
more or less fine.

This commit also adds the stack canary provision code, for systems
that require it. It allocates the space for the canary and initializes
the FS_BASE variable.

Sine FS_BASE is not added to the Primus environment by default we also
modify the x86-primus-loader plugin and initialize all registers from
x86 family (including the ymms registers).

* makes branching and looping interpreted.

* fixes the path condition computation in the symbolic executor

the false conditions should be refuted

* makes symbols interning global

so that different machines have the same vision on symbols

* opens the bap package in primus systems

to enable writing the system name without the bap package

* implements primus closure context

this makes it easier to define several closures in the same module.

* makes Env.mem public

* makes addr_size and data_size public in the Descriptor module

* implements the symbolic primitives

now it is possible to create symbolic values and symbolic memories

* switches to Z.t from int64 as the word representation in Lisp

* simplifies the lisp primitives module

collapses all primitives into a single module, less code and merlin is
no that mad

* adds the system name and its components to the lisp context

* the initial implementation of the symbolic input/output

* intialializes global and static variables in Primus Lisp

We don't treat static any differently right now. Later we can enable
their mangling so that the static variable is only available in the
scope of its definition.

* tweaks symbolic files to start with a small initial size

this is a workaround of a bigger problem that we gonna handle later -
how to deal with large memory mappings. Also relevant for malloc.

* do not restart when no solution found

* [RANDOM] fixes a bug in the remembering random values

We were emitting the generated event for values read from static memory.

* enables a deeper search of the execution tree

Instead of investigating each linearly independent path we can
re-execute the same path several times, depending on how much
information it will give us. Still highly experimental.

* disables handling of OCaml exceptions in the closure application

We can't really catch them on that level, we need to modify the bind
operator. And we don't want to do this (even if we can), because this
will lead to stack overflows, as each binding will push an exception
handler to the stack.

* fixes the Lisp `msg` operator and tweaks the symbol printing

The OCaml side-effectful computations were escaping from the monad, so
the messages were sometimes clobbered.

Also tweaked printing the symbols, if a value has an interned symbol
we print it. There could be some false-positives, so probably we could
also check the size of symbol or introduce tagging.

* Lisp global variables should be properly initialized.

* fixes symbolic stdin initialization

also declares the symbolic context for all definitions in the file.

* fixes a potential bug in symbol-of-string primitive

do not use side-effects inside a monad.

* tracks only symbolic values in the symbolic computer.

There is no need to store numerals as we can always compute them on
flight.

We are also no longer creating tasks for non-symbolic path
conditions.

* switches to the partial model evaluation

* chains tasks into trees

Each task now puts the parent task into its constraints.

Note, it would probably be more optimal to to store the models, but I
am not sure that it would be correct. On the first thought, if we will
inherit the model from the parent task we will significantly limit the
solution space.

* threads fgets via fgetc to enable symbolic fgets

also fixes a bug in fgets that was never returning an EOF.

* fixes symbolic memory generators

They are generating random values of correct width

* implements symbolic streams

* adds the missing update_hash operation

we were running all jobs under the same stack-hash.

* fixes the Primus destructor behavior

The destructor wasn't called when the exception was thrown from the
start handler. Also adds the fini callback for finilization,
symmetical to the init.

(Honestly, not sure is it symmetrical, probably it should be run,
_after_ the fini observation, then it will be symmetrical, will think
about it later).

* fixes parent refutation and keeps parents inputs

First of all, parent task constraints weren't refuted and were added
as if we were always taking the yes-branch. We hotfix it to the
negation of the condition, when it was refuted, but in general we
should rething this parent constraints.

Next, we didn't consider the parent inputs so while the model included
them they were never set, since the child thread wasn't considering
them as inputs (as they were already set).

* a massive cleanup of the SMT interface

* disables input creation on symbolic-{value,memory} operations

We do not create new inputs explicitly but let the generators to do
their work. As a result if the input is a part of the model it will
no longer be treated as symbolic.

* adds symbolic getchar

* stores the symbolic context in a separate component

* stops spawning tasks before they are solved

* switches to priority queue for the worklist

right now the priority is just LIFO as it was before but we later may
prioritize by the amount of information that task will provide

* adds the model of fflush and flushes after each puts

* adds the model of fdopen and fileno

* fixes the fgets stub

There were a few off-by-one, forgot-the-corner-case bugs that were
revealed by the symbolic executor. Good boy, symbolic executor.

* fixes tons of bugs in the symbolic stdio

* implements the comparable interface for Formula

* removes the dependency on 'a printer from Regular in Bap.Std

it was impossible to use the printing functions without linking in the
regular library.

* adds the symexec tests

* adds the progress report to the symbolic executor.

* changes the cutoff algorithm

* code cleanup

* adds the fflush prototype

* adds the only-queue option to the run-entry-points parameter

* adds crc32 test to the testsuite

forgot to commit it

* adds symbols to the test binaries to enable test passing on Travis

* moves the glibc runtime support code to a separate plugin

Not only it is possible to enable/disable it, but it is also now
possible to provide alternative competing runtime support without
modifying the existing code.

However, the main motivaiton to move it was the desire to get access
to the project data structure that enables a much more robust runtime
identification that works even in the absence of symbols.

* fixes type errors in strcat and strncat

fixes a type error in strcat

fixes an error in strncat

* fixes issues in the tests

* uses core_kernel in primus-random to enable older ocaml versions

* fixes the cache test

* fixes few small bugs

1) the greedy-promiscuous-executor was still mentioned by other
systems
2) yet another type error in strings

* uses static binding for the errno-location

instead of the parameters

* renames symbolic-io to symbolic-stdio adds symbolic-stdlib

The convention is for each concrete feature `xxx` that needs a symbolic
counterpart to provide the `symbolic-xxx` feature.

The stdlib is right now empty we will start to fill it in with the
stuff after the release.

* adds the new Memory.add_region function

It is the same as the old `allocate` (which is still available, no
worries), but takes two words instead of one word and int, which makes
it possible to create regions whose size do not fit into OCaml int,
which is quite possible during execution in a randomized mode.

And makes the code much eaiser to write as now we don't need to deal
with conversion errors.

* switches to add_region in primus-loader

* switches the memory-allocate primitive to add_region

it will no longer fail on big numbers

* primus random now also uses add_region

so it is possible to map the whole 2^64 memory in one run

* switches the symbolic memory to add_region

* adds the documentation for the Generator.of_iterator function

* drops a bogus observation

Co-authored-by: oleg <[email protected]>
  • Loading branch information
ivg and gitoleg authored May 28, 2020
1 parent 53388e9 commit 9d8c51d
Show file tree
Hide file tree
Showing 85 changed files with 4,420 additions and 1,480 deletions.
4 changes: 2 additions & 2 deletions OMakefile
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,8 @@ include _oasis_lib.om
# At this point you may fine-tune the configuration, and modify defaults:
#
# OCAML_LIB_MODULE_SUFFIXES += .ml (install also .ml files with libs)
# OCAMLFINDFLAGS += -verbose (be verbose)
# OCAMLFLAGS += -bin-annot (create .cmt/.cmti files)
# OCAMLFINDFLAGS += -thread
OCAMLFLAGS += -bin-annot # (create .cmt/.cmti files)

OCAMLFLAGS_ANNOT = -annot -bin-annot
OCAMLFLAGS += $(OCAMLFLAGS_ANNOT)
Expand Down
56 changes: 28 additions & 28 deletions lib/bap/bap.mli
Original file line number Diff line number Diff line change
Expand Up @@ -680,7 +680,7 @@ module Std : sig

(** Lazy sequence *)
module Seq : module type of Seq
with type 'a t = 'a Sequence.t
with type 'a t = 'a Base.Sequence.t
(** type abbreviation for ['a Sequence.t] *)
type 'a seq = 'a Seq.t [@@deriving bin_io, compare, sexp]

Expand Down Expand Up @@ -776,7 +776,7 @@ module Std : sig
will create a printer for a [String.Trie] that is populated by
integers. *)
val pp : 'a printer -> 'a t printer
val pp : (Format.formatter -> 'a -> unit) -> (Format.formatter -> 'a t -> unit)
end
end
module V2 : sig
Expand Down Expand Up @@ -1339,7 +1339,7 @@ module Std : sig
Note, the [printf] function from examples refers to the
[Format.printf], thus it is assumed that the [Format] module
is open in the scope. *)
val pp : t printer
val pp : Format.formatter -> t -> unit

(** [printf "%a" pp_hex x] prints [x] in the hexadecimal format omitting
suffixes, and the prefix if it is not necessary.
Expand All @@ -1353,7 +1353,7 @@ module Std : sig
]}
@since 1.3
*)
val pp_hex : t printer
val pp_hex : Format.formatter -> t -> unit

(** [printf "%a" pp_dec x] prints [x] in the decimal format omitting
suffixes and prefixes.
Expand All @@ -1367,7 +1367,7 @@ module Std : sig
]}
@since 1.3
*)
val pp_dec : t printer
val pp_dec : Format.formatter -> t -> unit

(** [printf "%a" pp_oct x] prints [x] in the octal format omitting
suffixes, and the prefix if it is not necessary.
Expand All @@ -1379,7 +1379,7 @@ module Std : sig
# printf "%a\n" pp_oct (Word.of_int32 0x1);;
1
]} *)
val pp_oct : t printer
val pp_oct : Format.formatter -> t -> unit

(** [printf "%a" pp_bin x] prints [x] in the binary (0 and 1) format omitting
suffixes, and the prefix if it is not necessary.
Expand All @@ -1393,7 +1393,7 @@ module Std : sig
]}
@since 1.3
*)
val pp_bin : t printer
val pp_bin : Format.formatter -> t -> unit

(** [printf "%a" pp_hex x] prints [x] in the hexadecimal format omitting
suffixes, and the prefix if it is not necessary.
Expand All @@ -1405,7 +1405,7 @@ module Std : sig
# printf "%a\n" pp_hex (Word.of_int32 0x1);;
1
]} *)
val pp_hex : t printer
val pp_hex : Format.formatter -> t -> unit

(** [printf "%a" pp_dec x] prints [x] in the decimal format omitting
suffixes and prefixes.
Expand All @@ -1419,7 +1419,7 @@ module Std : sig
]}
@since 1.3
*)
val pp_dec : t printer
val pp_dec : Format.formatter -> t -> unit

(** [printf "%a" pp_oct x] prints [x] in the octal format omitting
suffixes, and the prefix if it is not necessary.
Expand All @@ -1433,7 +1433,7 @@ module Std : sig
]}
@since 1.3
*)
val pp_oct : t printer
val pp_oct : Format.formatter -> t -> unit

(** [printf "%a" pp_bin x] prints [x] in the binary (0 and 1)
format omitting suffixes, and the prefix if it is not necessary.
Expand All @@ -1447,7 +1447,7 @@ module Std : sig
]}
@since 1.3
*)
val pp_bin : t printer
val pp_bin : Format.formatter -> t -> unit

(** [printf "%a" pp_hex_full x] prints [x] in the hexadecimal format with
suffixes, and the prefix if it is necessary.
Expand All @@ -1459,7 +1459,7 @@ module Std : sig
# printf "%a\n" pp_hex_full (Word.of_int32 0x1);;
1:32u
]} *)
val pp_hex_full : t printer
val pp_hex_full : Format.formatter -> t -> unit

(** [printf "%a" pp_dec_full x] prints [x] in the decimal format with
suffixes and prefixes.
Expand All @@ -1473,7 +1473,7 @@ module Std : sig
]}
@since 1.3
*)
val pp_dec_full : t printer
val pp_dec_full : Format.formatter -> t -> unit

(** [printf "%a" pp_oct_full x] prints [x] in the octal format with
suffixes, and the prefix if it is necessary.
Expand All @@ -1485,7 +1485,7 @@ module Std : sig
# printf "%a\n" pp_oct_full (Word.of_int32 0x1);;
1:32u
]} *)
val pp_oct_full : t printer
val pp_oct_full : Format.formatter -> t -> unit

(** [printf "%a" pp_bin_full x] prints [x] in the binary (0 and 1)
format omitting suffixes, and the prefix if it is necessary.
Expand All @@ -1497,7 +1497,7 @@ module Std : sig
# printf "%a\n" pp_bin_full (Word.of_int32 0x1);;
1:32u
v} *)
val pp_bin_full : t printer
val pp_bin_full : Format.formatter -> t -> unit

(** [pp_generic ?case ?prefix ?suffix ?format ppf x] - a printer
to suit all tastes.
Expand Down Expand Up @@ -1534,7 +1534,7 @@ module Std : sig
?prefix:[`auto | `base | `none | `this of string ] ->
?suffix:[`none | `full | `size ] ->
?format:[`hex | `dec | `oct | `bin ] ->
t printer
Format.formatter -> t -> unit

(** [string_of_value ?hex x] returns a textual representation of
the [x] value, i.e., ignores size and signedness. If [hex] is
Expand Down Expand Up @@ -1969,15 +1969,15 @@ module Std : sig
val slot : (Theory.Program.Semantics.cls, stmt list) Knowledge.slot

(** [printf "%a" pp_binop op] prints a binary operation [op]. *)
val pp_binop : binop printer
val pp_binop : Format.formatter -> binop -> unit

(** [printf "%a" pp_unop op] prints an unary operation [op] *)
val pp_unop : unop printer
val pp_unop : Format.formatter -> unop -> unit

(** [printf "%a" pp_cast t] prints a cast type [t]
@since 1.3
*)
val pp_cast : cast printer
val pp_cast : Format.formatter -> cast -> unit

(** [string_of_binop op] is a textual representation of [op].
@since 1.3
Expand Down Expand Up @@ -3614,7 +3614,7 @@ module Std : sig
val eval : t -> Bil.value

include Regular.S with type t := t
val pp_adt : t printer
val pp_adt : Format.formatter -> t -> unit
end

(** [Regular] interface for BIL statements *)
Expand Down Expand Up @@ -3895,7 +3895,7 @@ module Std : sig

include Regular.S with type t := t

val pp_adt : t printer
val pp_adt : Format.formatter -> t -> unit
end

(** Architecture *)
Expand Down Expand Up @@ -4432,7 +4432,7 @@ module Std : sig

(** [pp pp_elem] creates a vector printer that uses [pp_elem] to
print elements. *)
val pp : 'a printer -> 'a t printer
val pp : (Format.formatter -> 'a -> unit) -> (Format.formatter -> 'a t -> unit)
end

(** BAP IR.
Expand Down Expand Up @@ -5214,7 +5214,7 @@ module Std : sig
val elements : ('a t -> 'a seq) ranged

(** [pp printer] - creates a printer for table from value printer *)
val pp : 'a printer -> 'a t printer
val pp : (Format.formatter -> 'a -> unit) -> (Format.formatter -> 'a t -> unit)
end

(** A locations of a chunk of memory *)
Expand Down Expand Up @@ -5687,7 +5687,7 @@ module Std : sig
include Container.S1 with type 'a t := 'a t

(** [pp pp_elem] constracts a printer for a memmap to the given element. *)
val pp : 'a printer -> 'a t printer
val pp : (Format.formatter -> 'a -> unit) -> (Format.formatter -> 'a t -> unit)
end

(** Symbolizer defines a method for assigning symbolic names to addresses *)
Expand Down Expand Up @@ -6321,7 +6321,7 @@ module Std : sig

(** [pp_adt] prints instruction in ADT format, suitable for reading
by evaluating in many languages, e.g. Python, Js, etc *)
val pp_adt : t printer
val pp_adt : Format.formatter -> t -> unit

(** {3 Prefix Tree}
This module provides a trie data structure where a sequence of
Expand Down Expand Up @@ -8206,10 +8206,10 @@ module Std : sig
include S with type ('a,'e) state = ('a,'e) Monad.State.t

(** print a set of taints *)
val pp_set : set printer
val pp_set : Format.formatter -> set -> unit

(** print a taint map *)
val pp_map : map printer
val pp_map : Format.formatter -> map -> unit

module Map : Regular.S with type t = map
end [@@deprecated "[since 2018-03] use the Bap Taint Framework instead"]
Expand Down Expand Up @@ -9052,7 +9052,7 @@ module Std : sig
value for the ['a] type. *)
type 'a converter

val converter : 'a parser -> 'a printer -> 'a -> 'a converter
val converter : 'a parser -> (Format.formatter -> 'a -> unit) -> 'a -> 'a converter

(** Default deprecation warning message, for easy deprecation of
parameters. *)
Expand Down
Loading

0 comments on commit 9d8c51d

Please sign in to comment.