BAP 2.0 (#960) · BinaryAnalysisPlatform/bap@2773e8c

Commit

BAP 2.0 (#960)

* new implementation of fsqrt

Based on approximation of sqrt(x + 1) with Ramez coefficients

* initial scrap of the bitvector, still wip

* bap-elementary, initial commit

* new and efficient bitvector implementation

the interface design is inspired (or more correctly enforced) by the
current optimization capabilities of OCaml, the end goal is that if
a bitvector operation could be computed without memory allocations,
then it should be, even if the width is not known in advance.

this is still a wip

* wrote docs

* fixed a typo

* finished the initial implementation

didn't provide the specializations, probably will add them later.

* renames bitvector to bitvec and adds auxiliary libraries

Since we're going to use Bitvec as our next new Bitvector
implementation and since we don't want to alias it to Word or
Addr anymore (because it will introduce confusion with Bap.Std.Word
and Bap.Std.Addr modules as well as significandly slow down
compilation), we need a shorter name. However, Bitv is already used
in the opam universe, and Bv is probably too short. So the middle
ground was chosen, which actually matches with the SmtLib name for
bitvectors.

We also added two sublibraries along with submodules which enable
support for bitvectors in the Janestreet suite of libraries.
- Bitvec_order provides support for the interfaces that depend on
ordering, like maps, sets, comparables, etc
- Bitvec_binprot provides support bin_io

Together they provide functions that make Bitvec.t a first-class
citizen of JS Core libraries. We're moving away from the old JS style
of providing full comparable or Regular interface for each module,
since it bloats the binaries and, more importantly, significandly
slows down the compilation. We are now relying on small interfaces
such as compartors, so now instead of `Bitvec.Set.t` we will
use `(Bitvec.t, bitvec_order) Set.t` and instead of `Bitvec.Set.Empty`
we will use `Set.empty bitvec_order`

See the module documentation for more information.

* makes bitvec-order compatible with newer versions of base

* adds a couple of missing functions

* fixes some leftovers from the merge conflict

* makes bap_build indepndent on core-kernel

* makes the build system indepdent on core-kernel

at least an attempt

* the core-theory no longer depends on BAP

so BAP can depend on the core theory now.
also made core theory less heavy.

* moves some of new primus-list stuff to the new bitvector

* wip

* updates value to reflec the new Bitvec interface

* finishes with the lib part

* switches to an unboxed modulus representation

Since many use cases of the Bitvec interface will use the generic
operations with modulus created on the fly from the integer
represented width, having it boxed will have sever performance
ramifications.

So, instead of representing modulus as a pair of width and the modulus
(i.e., 2^w-1 and w) we have to decide on one or another.

We decided to use 2^w-1 as the modulus representation as this is the
value that is used in normalization, so computing it every time will
add two extra operations which are not [@@noalloc]ed.

The ramification of this choice is that we still need to go back from
2^w-1 to w. The direct solution is to count bits, which is rather
efficient already. For the [msb] operation we don't really need to
know the width to test whether the msb is set, since

msb x m ::= (m = m lsr 1 lor x) mod m

(though I'm still not sure whichever will be more efficient, the
[test_bit (num_bits m)] or the implementation above, but both are
rather efficient)

We also reimplemented the shift operations. They no longer depend
on the signed extract and are implemented purely relying on bitwise
operations. We use `num_bits` to get the width there.

* moves bitvector to the bitvec base

* updates oasis files to add bap dependency

many of our packages were dependent on bap only indirectly

* makes bitvec sublibraries standalone

the subpackages won't work with separate installation, since they have
to share the same META file, and will break Dune and in general
fragile.

* The following code allows ymm registers to be cast to smaller registers. This new casting operation is used to extract xmm registers from ymm registers in order to support monomorphic functions in sse. The SysV ABI is updated to recognize functions that read and write to xmm0.

* prototype for sin function from double to double

* initial example of using Core Theory's floating point support to hook sin function and evaluate user-defined polynomials using Horner's method. Current example only computes a geometric series.

* fix warnings regarding Type not being in scope for Imm and run signature

* working merlin file

* initial functions in core theory IR for inaccurate sine range reduction

* intial sine and cosine implmentations using Remez's approximation.

* reverse coefficients order

* fix fceil bug, but need an fround implmentation for this to work.

* one more test

* made refactoring recommendations to seperate polynomial evaluation from range reduction and reconstruction.

* renaming

* wip: refining the knowledge theory

* added knowledge return operators to stop recomputing values

* renaming and better comment

* more refactoring

* new ceil! :)

* multiply by constant instead of division in fmod. As well as better representations for pi values

* one test passes sort of :Dgit add approximation.ml !

* undo initial ABI changes

* fixes indentation

* optimizes sin computation (10 times better now)

the idea is that we can flip the sign using a simple xor operation.
also, the reduction step stores whether the correction is needed in
a variable, so we don't need to check it twice

* nearly finished the implementation

only serialization is left

* adds serialization of values

* implements some domain functions

* drops the old modules, implements more sophisticated typing

* implementing different type classes for values and objects

* implemented merge and join

idea for tomorrow, allow users to provide the lattice data structure

* the knowledge module finally typechecks

* started rewriting the Core Theory in terms of KB

* at least the empty theory is ready

* adds symbols to knowledge

* finishes adding symbols to knowledge

decided to forfeit symbol importing right now.

* the theory manager now compiles

* vars are now using knowledge objects

* the parser also compiles

* core theory compiles!

* implements symbols visibility

now we keep track whether it is a public symbol or not

* adds namespaces importing

enough with namespaces... really tired of them :/

* abstracts peristence, adds class order

- instead of using Binprot directly we now abstract the persistence
class, so that later we can easily provide backing stores that support
ad-hoc data types (like Redis). The new interface is also easier to
use, as it provides functional combinanors.

- enables partial ordering between classes, adds some documentation
that clarifies class samenames and specificness

* adds support for knowledge data representation

(maybe we should add support for integer data, like bitvectors,
instead of letting it to the downstream libraries...)

* adds support for immediates

Not sure about the interface yet, but sooner or later we will need
them.

* primus 2.0 is nearly ready,

just trying to figure out a proper signature for run.

* ok, Primus doesn't require any components

the components part should be handled by the static knowledge

* satisfied the signature

* adds flat domains and made optional flat

A flat domain is stratifies the set into two classes, empty or
something. Empty is the bottom element. All elements that are
not empty are non-comparable.

Basically any set with an empty element satisfies the domain
structure, moreover this is how most of our domains are defined.

Even more, when we want to derive domain from the optional
representation we actually want the flat domain. For example,
the `addr option` representation actually means that there is either
some knowledge about the address of an object, or there is no such
knowlede. It doesn't mean that `Some 0xdead` has more knowledge than
`Some 0xbeef`.

* all promises are fullfiled when the knowledge monad is run

well, at least the promises specific to the class.

* bap.std finally compiles

* moves the new primus out of the bap_primus folder

we decided to keep the old primus intact, and maybe update it to
the new primus machine, instead of getting rid of it.

* restores primus from the bap-new-semantics branch

* fixes the print plugin

no longer pp_domains

* adds a missing depedency to the core-theory

* renames decide to fact

* simplifies the class type

Removes the hierarchies, we don't really need them. And when necessary
they could be organized for a concrete family of classes using the
same approach.

It is now possible to declare abstract classes and then refine them to
concrete, or declare a concrete class from the start.

Also, makes the class type constructor covariant, so that a class
could be defined as a polymoprhic instance.

* makes 'a cls,'a obj, and ('a,'p) slot covariant in 'a

* revises the sort system

Implemented a simple algebra of sorts, which consists of sort atoms
and applications of sorts (atoms could be either symbols or numbers).

The algebra is typed, but atoms are universally quantified. A Magic
module is provided which enables breaching of the type system, so
that a sort type could be restored after serialization. This is now
safer, when it was before, as the phantom types is now reconstructred
from the sort definition, rather then set arbitrary at the place of
sort definition.

* makes sort refinement more safe

* uses Theory.Var for vars, adds Unk branch to Bap.Std.typ

* wip, nearly finished rewriting IR in new semantics

* moves Ir_jmp to the new semantics

* moves the whole IR to the new semantics

* IR no longer uses BIL

* the Insn.t is now the Semantics.t

* renames links to labels

link is a binary attribute, while label is the unary

* extends the derived interface of Objects

* The Tid.t is now the Theory.label

We will now perform all side effects in the knowledge monad.

* moves bap to the new runtime

* creates program abstraction in CT and refines label

* updates major components to the new interface

* reverts primus components to the old primus

* fixes a bug in the bitvector representation

* fixes the value merging function

the order was confused

* fixes IR generator

still not sure why it is implemented so indirectly

* changed the merging strategy of insn

still need something better for merging values.

* adds the runtime lock uses the runtime in bap functions

* relaxes the lock

otherwise it is impossible to use. We still need to figure out how to
enable some sort of protection while preserving usability. For
example, we can at least allow to read from the state without any
check.

* makes bap_c_abi error printable

* refines IR jmp representation

Dropes the roles along with multiple links as the were increasing the
impendance mismatch between the representation and its reification
into BIL. The old BIL jmp could have up to two destinations, not N
destinations. Since the new representation was accepting many
destinations it was tempting to use the same jmp term to represent a
control flow which should be represented with several jumps.

In the new represenation we have up to two destinations, which we
denote with `dst` and `alt` detinations. Both are optional, therefore
we now can have 0, 1, or 2 destinations,

```
+-----+-----+-----------+
| dst | alt | semantics |
+-----+-----+-----------+
| - | - | undefined |
| x | - | goto x |
| - | y | call y |
| x | y | call y x |
+-----+-----+-----------+
```

where `call x` denotes `call x with noreturn` and
`call y x` denotes `call y return x`.

Here we loosing the `ret x` which is denoted with `alt x`
and then reified back to `call x` instead of `ret x`. In order to
recover the call correctly we need to know whether `x` denotes a
function entry or it is in the middle of the function, which is, in
general, an undecidable property. Therefore, we extend the notion of
call, to an interprocedural control flow transfer.

The value of the `dst` and `alt` fields is an abstract variant with
two options, `Indirect vec` parametrized with the vector expression,
denoting a set of possible destinations, and `Resolved tid`,
parametrised with the destination label.

* prints negative numbers as subtractions of positives

just easier to read

* the CFG is now more or less properly linked

* fixes call wiring

as a separate pass (not a real pass though) we rewire all
jmps to entry blocks as calls to corresponding subroutines.
Note, this will also affect recursive subroutines - all jmps
to the self entry block will be treated as interprocedural
jumps. As a nice side effect, this pass will enforce, that the entry
block has not incomming edges.

* fixes all bugs

now it works :)

* fixes a bug in variable susbtituter

Note, this is not a new bug, the substituter was always broken in BAP.

The variable substitution (exposed via the `substitute_var` function)
is not capture free, contrary to what the function description is
saying. The body was still substituted even when the variable is
captured with an enclosing let expression.

It is fixed now.

* hardens bap_bitvector implementation

A few bugs were due to bap_bitvector internal representation was
breaching the abstraction of the packed bitvector. The new
implementation carefully establishes boundaries, so that we can't
accidentally treat a packed representation as a bitvector.

Also the textual representation of the Bap_bitvector is restored to
match the previous representation, so that we can pass the tests.

* prints argument types using the old bap types if possible

If an argument type is representable in old BIL we use the old
notation in our IR printer to print it. Otherwise we fall back
to the normal sort representation.

* updates the elementary function

* fixes the signed comparison function

* makes disasm failure in powerpc tests more verbose

* fixes object string representation

* fixes representation of tids in piqi serializers

* fixes flat domains, unit tests are passing now

Previously, the derived flat domain was treating all values as
noncomparable if they weren't empty. Therefore `Some 2` and `Some 2`
were non-comparable. In fact, we can safely assume that equal values,
contain equal informational content, thus they are informationally
equal. Therefore, we decided to define flat domain via a pair of
values, the empty value, and the equality operator.

* removes some debugging information

* refines reconstructor, symtab, and IR lifter to fit new semantics

Symtab Changes
--------------

First of all, I've refined the symtab interface to make it clearer.
Now we have the shadow callgraph in the symtab, that stores each call
edge from the callsite, denoted with the basic block, to a callee,
which could be a real function also present in the symtab, or an
external, which represented just by its name. We distinguish between
two kinds of calls - explicit and implicit. An explicit call is
produced by a control flow instruction, e.g., jump, branch, or call,
while an implicit call is produced via a fallthrough of the control
flow from one subroutine to another (yes, this can happend, and we
support this since 1.6). Therefore, it is possible that one callsite
has two call edges - one explicit call and another implicit, e.g.,

```
__printf_chk:
call __stack_check
printf:
...
```

Which we should lift as two calls,

```
sub __printf_chk:
call __stack_check with return next
next:
call printf
printf:
...
```

In general, each block can have 0,1,2 call edges. This is modeled in
the API with two functions, `explicit_callee` and `implicit_callee`,
both returning an optional type. Call edges are inserted into the
callgraph using one `Symtab.insert_call` function, that takes an
optional argument to distinguish between two kinds of edges.

This commit also fixes the remove function, which wasn't removing
function calls, when one was removed.

Reconstructor Changes
---------------------

- Updates to the new symtab interface
- Fixes an externa call classification. It was previously inserted as
a fall edge, while it is actually an opposite, it is always an
explicit call.

IR Lifter Changes
-----------------

The IR lifter implementation no longer depends on BIL and is purely
implemented using the new interfaces. It is basically rewritten nearly
from scratch. It is still not passing tests (we still do not insert
synthetic subroutines, they are not needed anymore, but for the sake
of backward compatibility, we will probably introduce them back, maybe
as a separate pass). The new implementation precisely handels
different variants of calls, trying to keep the CFG representation as
tight as possible (i.e., do not introduce unnecessary blocks). It is
also, more conservative wrt to inter/intra classification of
jumps. When the whole program is lifted, we ensure that all goto's
will never leave the boundaries of their subroutines. We turn all
jumps for which we can't provide such guantees to calls. Basically, it
means, that in the output IR it is now impossible to find an
unresolved goto (all gotos now have a form of `goto %....`).
The new lifter is thoroughly commented, so please continue reading
there.

* WIP: a complete rew of the recursive descent disassembler

The new implementation is supporting delay slots (and out-of-order
execution in general), more conservative (it disasembles more than a
linear sweep disassembler), uses much less space, and is much easier
to understand and to maintain, since it is only 200 lines of code.

However, there are still some bugs, so it is a work in progress.

* makes the disassembler more robust

* prints the disasembly in the ascending order

previously we were printing in the reverse postorder, which actually
complicates reading of the dasm output, which is not printing the fall
destinations.

* cleans up the disassembler

looks like it works fine now

* adds a new constrained superset disassembler

* embeds the constrained superset disassembler into the KB monad

* fixes a bug in the ror x86 semantics

the order of expressions was confused

* switches to the new superset disassembler

still lots of work to be done, but looks promising

* fixes a couple of bugs

1) adds retraction if we fell of the memory
(it's fine to jump over the boundaries)

2) fixes retraction in case of a delay slot

3) fixes delay slot implementation
(it worked only for delays equal to 1)

* prints disassembly in the increasing addresses order

* changes the scanning mode

now, we only scan as much as needed, no more no less. We no longer
add roots on a failed instruction, but instead, if at the end of
a scan we didn't find all code, we restart we the first unsatisfied
instruction as the root, and so on, until we classify everything as
code or data.

* a memory footprint optimization

By using scoped objects for represeting links we can save enomorous
number of bytes. In the previous approach, we were creating a code
object for each by in the input memory and then fill its slots. We
even created a unique symbol for each instruction. Thus, just the
inital markup of 100Mbytes binary takes 50 Gbytes of RAM.

With the new approach it is not only much more memory
efficient (constant space), but also is 4 times faster (probably due
to the less time spent in GC)

* integrating the new disassembler into BAP

still unhappy with _all_ name choices

* few more renamings and refactorings

1) the basic disassembler now will keep the table of created
disassemblers and do the reference counting. We specifically
do not delete disassemblers, we just want to be at most one
disassembler per triple.

2) The speculative disassembler is now renamed to Disasm.Driver,
because it is not a speculative disassembler (one could create a
speculative disassembler using this driver).

3) The Driver will create disassemblers on the fly. Now it is
possible to disassemble a mutliarch program in one pass, and even
create a CFG which will contain blocks of different architectures.

* changes the type of Insn.t

it is not the program, but its semantics. So an instruction doesn't
include its address.

* fixed a fixed point

have to commit due to the fire alert :)

* adds an optional join method to the domain

This is a big change to the semantics of the knowledge system. Since
now when a piece of knowledge is provided it is joined with the
existing knowledge (if any). E.g., `provide s x p` must be consistent
with the existing `collect s x`, it's now impossible just to override
existing knowledge with something different (well modulo that we can't
really control the consistency of the order and join functions that
are provided by the clients of the library).

When a domain is defined it is possible now to pass a join function
that will provide a least upper bound of two values, or indicate a
conflict. Therefore, now we have a complete lattice stucture for all
properties, with conflict being the top value.

The join functon, if not provided, is inferred from the order
function.

Given the new semantics we now can reach a fixed point in less
iterations and reduce the burden of merging from the knowledge
providers, since now they can rely on a default merge function.

Note, it is still possible to define a recursive property with
arbitrary computation in it, and the system will compute the fixed
point of this computation as previously. It is just possible now to
automatically rely on the default join operator, so, to some extent,
this could be seen as an optimization.

A good example, where we really benefit from the new semantics, is
when we have several independent knowledge providers, like lifters
or jump destination providers. They all are not really interested in
what others are providing, as they just adding their information
(in a different slot, in case of lifter, or by Set.union in case of
dests).

* moving forward towards well-defined fixed point semantics

two new updates

1) promises are run only once
2) provide is joining and is forced to preserve monotonicity

Having promises to be computed every time a property is accessed is
stressing the whole system. Ideally, we shall rerun a promise only
if there is some new information, but we don't want to maintain the
whole dependency graph for each promise. Alternatively, a promise
itself may look into the properties and decide whether it has more
information or not. But this will essentially make all promises self
recursive, as the have to first look into the current value, and then
produce better if neccesary. Therefore, we will now evaluate each
property (slot+obj combination) only once (computing the fixed point
as it was before), when the slot of an object is not set. Afterwards,
after the initial value is computed the slot is read and set. Another
problem will re-evaluating promises is that they could every time
generate non-comparable results.

Provide was made monotonic probably in the previous commit. We need
this to enable information joining without introducing expicit
cycles. This change enforces independence on the order of execution,
and ensures fixed point termination. Although it sounds nice, it comes
with a price. Every time we have conflicting opinions (like different
names for the same function) we can't proceed without actually giving
precedence to one or another opinion. Sometimes, consensus just could
not be found, so we have to either create a more complicated lattice
to hold our information, e.g., using sets for all possible names and
aliases of a symbol/use lattice where each name is predicated with
the namer, or, somehow, prune the conflicting information
(thus bringing back to life the problem with the order).

An alternative solutions currently on the table

1) think about the knowledge ordering, like can we treat two different
names, e.g., `.init` and `init_proc` as equivalent from the
perspective of the informational contents. E.g., neither is bearing
more or less information than another. So are they NC or EQ?

2) rewoke the monotonicity requirement, but instead implement joining
only for promises (and probably extend the interface, so that we have
two functions, one that preserves monotonicity and another that
overrides)

3) explore possible data representations. This is the most valid and
theoretically sound approach, but it comes with a huge implementation
and reasoning burden. Like indeed, we have to admit that the same
entity might have different names and in differnt circumstances
different names are applicable. Therefore, a name, instead of being
just a string shall become a (discriminated) union of different
names. E.g., (name,name-provider). We can even enable ranking for
providers, to make it easier to compute a principal name, etc.

* wip: implementing the new agent based system

updates and descriptions will come a yet another fire-alarm push

* moves more service on the knowledge runtime

* renames Bap_state to Bap_toplevel

also Toplevel module was rewritten in the previous commit, we now have
a more top-level like interface. And, we are now having type `'a
Toplevel.t` which denotes the top-level computation monad, which is
declared as `'a Toplevel.t = 'a` in Bap.Std but will be turned back
into a monad outside of Bap.Std.

* adds aliases slot to the program label class

a program label may have multiple names all equally correct.

* fixes abi demangling and the usage of Tid.set_name

since variables in Knowledge should rename consistent, this function
should not be used to change the name. Moreover, demangled name is not
the true name, but rather a view, therefore the demangled name is
added to the set of object aliases.

* small renamings of the new symtab interface

* makes Typeid in Bap.Std.Value abstract, fixes pretty printing

this is the first step, before we will merge Bap.Std.Value into the
new Knowledge framework.

Basically, two options are researched, adding one slot that will
contain the dictionary of all values. Or, turning each tag into
a slot. The latter is more natural, but is harder to implement.

* unifies Bap.Std.Value with Knowledge

Now a tag is a weaker version of a slot, so that we can now store our
attributes directly in the knowledge base.

* switches the project to the new engine

* fixes insn delay computation

Instead of providing one promise to compute the semantics of
the instruction we were providing 37 promises (by the number of
branching instructions we support), each promise was asking for
the semantics (for the name), which was triggering the fixed point
computation, therefore we ended up with ~80 computations per
instruction.

Now, we register only one promise. We also request semantics only
after we match the architecture. (In fact, we do not need the
semantics at all, as we could get the name from the program object
itself, though to make mips less dependent on Bap.Std, and for the
sake of demonstration of the Knowledge capabilities, we decided to
keep it).

Note also, that it is fine to request semantics before the arch, it
will end up in a very slight inefficiency, as the delay provider will
be called twice (in case if arch matched, otherwise it will return an
empty instruction and won't be restartred), to make sure that the
fixed point is reached.

* extracts IR directly from semantics, instead of reruning the lifter

it was a reminiscent of an old infrastructure, now it is
faster (program reconstructon went from 10 seconds to 100ms on
/bin/ls), and is more general, as now the IR semantics doesn't
depend on BIL.

* adds the `join` parameter to domain definition functions

* lifts each instruction only once

previously we were creating a new instruction object every time a
lifter was called. The implementation will essentially guarantee
that for each instruction (address) we will call lifter only once,
for the rest (calls) we will get semantics ready in the knowledge
base

* some microoptimizations to the Exp/Bil/Bir domains

not sure whether they give us anything, so probably we could get rid
of them...

* a new implementation of universal maps

this is an ongoing experiment. We are going to substitute Core's
Univ_map/Map with a more optimal implementation.

The following optimizations are applied:

1) keys are inlined in the tree braches
2) the branches itself are inlined up to level 4
3) tree heights are not stored
4) balance is -1,0
5) data is only stored in leaves

The last optimization could actually be a pessimization. We will
experiment on this also.

* finishes switching to the new maps

so far it works correctly

* optimizes tree representation even more,

now we're storing values in the intermediate nodes, not only in
leaves.

* adds a few inlined attributes to the monads library

not sure if it will have any effect, but shouldn't make it worse

* revises the class/sort system

it was observed that our slots weren't polymorphic in classes,
therefore we were upcasting our values to base classes, which
wasn't in fact a no-op, since we had to change the class annotation
stored in the value. Therefore, an extra allocation for each Value.get
information.

This forced us to revise the class/sort system. Now, it is not only a
little bit more efficient, but is much more straightforward.

First of all, class is defined as a set of sorts. In terms of OCaml
class is indexed by the sort type variable, so that we can create
multiple instances of the same class parametrized by different sorts.

Slots no longer depend on the sort index, but only on the class index,
therefore all instances of a class, no matter the sort, could be
accessed without any manipulations.

On the Core Theory level we rectify the naming issues and now we have
two sorts: Theory.Value.Sort and Theoyr.Effect.Sort (previously we had
an assymetric sort vs effect, where effect was a sort of effect -
pretty confusing).

We also now have a Theory.Value module, which is symmetric to the
Theory.Effect module, as well as Theory.value and Theory.effect types.
This makes writing types much easier.

Finally, the sort system is also revised, since we now have a notion
of sort on the Knowledge level our Theory sorts fit nicely into
it. They are no longer classes, but, as expected, sorts.

Finally, since we have an ideal empty value for the expression
domain (since the introduction of the Unk type to BIL), we no longer
need to make expression domain optional. This also saves a few
allocations.

As a grain of salt, all those optimizations didn't win a lot, about
10% of performance (and 80% improvemnt in the memory footprint).

* yet another pass on dictionary optimization

while our dictionaries are twice as fast as Core's dictionaries for
small inputs, we loose the leadership drastically on larger inputs.

For inputs larger than 1000 we are ten times slowe than our
counterpart :(

A couple of optimizations improved performance, but the overall
picture is still the same.

Two things were optimized - key uid is precomputed and stored in the
key. We use a simple integer instead of relying on the extension
constructor id, since the latter turned out to be rather
expensive (lots of guards).

Second, when a value is inserted we consider cases when it should be
inserted into the head, as previously we were diving down to the leaf
level. And although it was expected that this optimization should fix
the performance gap for large trees it, unfortunately, didn't.

The next suspect is the balancing procedure. An obvious (but tedious)
optimization is instead of creating a disbalanced tree and then
balancing, do this in one step.

Also, it is interesting that core's balance function is not recursive,
which guarantees that there is no more than one balancing per
insertion. Can we guarantee the same?

* continues working on the optimized trees

we're already 10 times faster than core's tree,
and are using 6 times less memory

But still some corner cases are unfinished.

* an intermediate step in havl tree implementation

so far they work x10 faster than JS maps, but in some cases we are not
able to maintain our rank rules.

In the next iteration, we will try to use wavl approach with ranks and
ranks increments instead of classifying trees into LL/LR trees.

* wip on trees

* rank balanced wavl style tree is mostly ready

I would call it bonsai tree, but of course the name is already used(

* fixes a missing case in the rank_increases function

forgot about EQ in some cases.

* a more optimal lower level

maximizes the number of T4

* a few final optimizations and the cleanup.

Microbenchmaring shows that the less we touch the leaves the
better. In fact, if we will drop 90% of the insert function,
microbenchmarks will run even faster - but we will have a less
compact representation.

In any case, for small maps (<1000 entries) we are running 4 times
faster than JS maps. For very small maps (< 30 entries, actually our
expected payload), we are running 8 times faster than core maps and
even faster than Core's hash table

```
┌──────┬────────────┬─────────┬──────────┬──────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├──────┼────────────┼─────────┼──────────┼──────────┼────────────┤
│ dict │ 307.58ns │ 327.02w │ │ │ 12.22% │
│ core │ 2_517.40ns │ 892.04w │ 0.36w │ 0.36w │ 100.00% │
│ hash │ 531.14ns │ 201.00w │ │ │ 21.10% │
└──────┴────────────┴─────────┴──────────┴──────────┴────────────┘
```

For huge maps (> million of entries) we're still faster than core
maps, but most of the time we spend in GC and despite the fact that
we always allocate at least two times less than Core's map the time
difference is not that big, since OCaml's GC runtime does not depend
linearly on the number of bytes (but rather on the number of roots,
which should be basically the same).

* fixes the inline attribute (I keep confusing inline/inlined)

* adds caching to BIL lifter

* removes caching from the BIL lifter

Caching on the semantics level breaks the overall semantics of the
program, because semantically equal instructions should still have
different IR (in terms of the labels at least).

It might be indeed an error in the design of the IR lifter, which
probably, should be considered the semantics, but rather a container
for semantics. However, it is not quite obvious how we can separate
them, given that we have to use labels in control flow terms.

* updates the primus_machine to the new class representation

* switches to custom type keys

the type key from Core (provided by the Type_equal.Id) are relying
on the extension_constructor which is non robust (too many checks)
and is rather slow (again too many checks).

In our representation, we just use our own `int`-typed counter to
check the uniqueness of keys, if needed. In any case OCaml itself
is robust enough to check the keys equality, so we usually don't
need to do its work.

* adds GC setup code

This setup corresponds to `s=1M,w=20,i=64M,o=200`

A bigger minor heap helps to minimize the number of words that get
promoted, while still keeping the memory footprint rather low.

A fixed increment of 64 megawords is better than the relative
increment, as when the memory consumption comes close to dozens of
gigabytes, doubling the memory may lead to a disaster, while only a
few gigabytes are actually needed.

Finally, an increased smoothing window size should fit better with the
bap flow, where we have the reconstruction phase with a lot of
allocations and usually a much calmer analysis phase. TBH, I didn't
see any real effects from changing the window size.

* optimizes some maps in monads

Also,
- removes some dead code
- adds a few inline(d) attributes
- destroys indentation, ocp-indent is broken with attributes

* set the optimization level to 0 by default

* fixes the name of the external-reference ogre attribute

and fixes the indentation as well

* restores the correct merge function

* fixes opinions and ignores empty knowledge

So far, we were storing empty knowledge if it was provided directly or
via a promise or suggestion. Besides an increased memory footprint, it
wasn't a big problem, except with the opinions data structure. The
latter weren't doing anything special to handle empty values,
therefore the empty values were treated as normals and were
participating in elections. As a result, if majority of the providers
(or the authorative provider) didn't have the information (provided
the empty knowledge), it, the empty knowledge, was selected.

This commit fixes this issue, in addition we decided to ignore all
attempts to store an empty value in the knowledge base. It is a mere
optimization (for everything except opinions).

* ensures that the symtab symbolizer doesn't provide empty names

since it is an authorative source of information, we shoudl be sure
that it doesn't provide bogus informaton, as it will override any
information provided by other agents.

* enables image rooter and symbolizer when a projec is created

it might be better to provide it is as a separate plugin and do not
depende on image at all, but so far let's move in small steps.

* enhances IDA's plugin error messaging.

This error code 0 doesn't really say anything

* revamps the objdump plugin

First of all it fixes a bug when an entry with the name `FUN@plt-OFF`
was considered as a function with name FUN, e.g., both `memove@plt`
and `memmove@plt-0x10` were considered as the `memmove` function which
let to a lot of problems.

Note, this is a normal behavior of objdump, which if the name is not known
just finds the closest known location and adds an offset to it.

Also, since I was here, I gave a little bit of love to the plugin,
notably

- performance optimizations:
1) process the input on the fly (do not create the list of input,
lines this is slow and will OOM the plugin for rather large
input files)
2) compile the regular expression once (not for every line)
3) simplify the regexp and get rid of all string ops
(we can perfectly express what we need in regexp, there is no
need to apply extra checks or string operations)
4) get rid of extra allocations (do not create substrings, pairs,
or lists)
5) use Z.t instead of words to represent addresses. Takes less
space, faster, and easier to parse (no need to substring).

- adds an optional demangler
since objdump can do demangling for us, a new --obdump-demangler
option controls which demangler is used. Set it to `disabled` to
disable demangling.

* adds the rooter capability to the objdump plugin

Now the objdump plugin is also providing information about function
starts.

* stores interprocedural edges in the symtab

* publishes the barrier property

* improves the Program IR lifting

makes it more precise, by eliminating false edges based on instruction
semantics.

* respects inter falls only if there are no jumps

* adds support for kernel-style relocation back again

The idea is to fix the fixups as soon as possible to prevent the
problem from spreading to far away. Previously we had to touch a
lot of code to fix the havoc caused by unfixed jumps.

Therefore, the new solution is to fix the BIL code based on the
available relocation information, and everything else will work
correctly automatically.

The main problem here is the representation of the external call,
which we can't represent as a jump, since the latter requires a
concrete address. A simple (but not the cleanesst solution) would
be to use the special statment with a fixup string, which is later
resolved to a correctly named function. Since the solution is totally
confined within the BIL plugin it is rather safe and robust to
changes.

We also added back the synthetic procedures and will keep them until
the abi/api pass is not rewritten in the new framework.

* fixes the sexp printer in Bap.Std.Bitvector

it was not human readable.

Also registers the new serializers.

* fixes the label printer in Bap.Std.Jmp

It didn't print the `@` prefix before named tids.

* fixes the asm printer in the insn

since now the assembler representation could be set via the interface
it is no longer guaranteed that it will be normalized, so we're
normalizing it before printing.

* updates BAP Lisp to the changes in the Core Theory interface

* better error messages in byteweight

* fixes the bitvector comparison function

during optimization I accidentally removed the size comparison.

* makes the code start a function start if none provided

It is our old tradition to treat the first address as a root for
disassembly if no roots were intially provided by a user. This commit
restores this tradition.

* prevents two subroutines to claim the same identifier

We are distinguishing subroutines by names, but we can't guarantee
that our name providers won't provide equal names to different
subroutines, therefore, we should be extra cautious when we assign
a tid to a subroutine by its name and check if it was already
assigned.

* tracks call targets as subroutines

We shouldn't forget to mark targets of known calls as subroutines, so
that our callgraph reconstructor can correctly partition the whole
program CFG.

We are taking an extra care not to mark as subroutines targets of
calls that do not belong to a valid chain (i.e., which will be erased
from a CFG).

The disasm test is also updated to reflect the new changes:

1) The knowledge runtime doesn't allow changing the architecture
of an address, so we had to reset the knowledge.

2) The new CFG reconstruction algorithm is more conservative, so it
now reconstructs an unreachable block, so we had to update the
`call1_3ret` test, in which now our algorithm (correctly) produces a
CFG with 4 blocks.

* removes debugging output

* fixes instruction properties

* restores conditional edges in the old-style CFG

for tests at least

* simplifies IR lifter, since barrier implies return

well at least after we have fixed the instruction properties

* ensures that the lifter fills in the insn

* the driver now respects barriers

therefore we had to reclassify falls after barriers as new inits.

* a small doc update

* ensures that speculative falls are taken after speculative slots

in other words, when we speculate that the code after a barrier is
reachable, in case if we have (speculative) delayed slots, we should
speculate only after the speculative slot (since CPU is allowed to
speculate)

in other other words - previous fix didn't work well for mips, because
a delay slot is always taken even after a barrier.

* updates the microcfg test

a lone sub instruction may no longer form a valid graph, since it
doesn't form a chain that ends with a jump.

also updates the assert message for four nodes, it was saying "three
nodes", but checking for the three nodes.

* restores proper to_string/of_string functions for bitvectors

* updates the mini-projects test

since now BAP rejects invalid graphs, we have to pass some valid
input, to make a mini-project.

* hacks llvm to prevent linking errors during toplevel loading

Some LLVM installations produce an `.a` file with an unresolved
external (secondary) dependency, namely `setupterm`, e.g.,
```
/usr/lib/llvm-6.0/lib/libLLVMSupport.a(Process.cpp.o): reference to setupterm
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libtinfo.so: definition of setupterm
```

And although `llvm-config --link-static --system-libs` correctly
mentions `-ltinfo` the name still remains unresolved on modern Ubuntu
system. Adding `-lcurses` (or `-lncurses`) resolves the issue. Of
course, it may break installation on other targets and systems, so we
introduce this temporary and directly into the oasis file, to see how
it will behave in the wild.

We will later add some proper configuration once we will collect
enough information.

* provides at least basic information if no BIL is avaiable

* updates the testsuite

* downgrades code to the dialect understandable by older OCamls

not a big deal, but most likely we will be still bumping the lower
bound to 4.07 or even 4.08, since we are going to update to core v0.12
or even v0.13 before the final release.

* updates the CI opam file with the removes for newly added libs

* makes most of the important properties persistent

* makes knowledge state serializable

* removes the toplevel state from the project

* simplifies the toplevel representation

* adds knowledge caching to the frontend

* implements caching of the project state

We're splitting the project into the state and the derived
components. We do not want to store the latter, as they are
not portable and could be computed.

The state is currently on the state of the disassembler we
will later add the callgraph.

* adds serialization to the callgraph

as well as publishes a new function in the Graphlib.Std.Solution
module

* eta-expands a function to mitigate a bug in 4.06 compiler

* bumps version to 2.0.0-alpha

maybe beta is better suited, but let's be humble :)

* wires calls to the toplevel via the library

I'm keeping it undocumented right now, because this will be soon
overwritten by a separate toplevel library, which will be independent
on Bap.Std

* updates the testsuite

* enables boolean operations in the bitvec context

it is done by casting of bool expressions to bitvec via ite.

* uses Var.create along with Var.Ident.of_string in CT Parser

Well this should be done long time ago, and it was... but it looks
like that it wasn't done for all cases.

* stop reflecting temps to lets in the BIL Lifter

we used to do this previously, since we didn't have a proper support
for temps in BIL, but now since BIL is using CT Vars, we can safely
pass variable literals and let the CT Parser to handle them properly.

* disables BIL optimizations and normalizations

Neither are needed anymore, though could still be used, especially
passes, so we're not going to disable it.

* updates testsuite

* fixes scoped variables

for some reason they were made mutable, now it is corrected

* keeps signed/unsigned casts abstraction

they were accidentally lowered to primitive bit operations.

* enables two optimizations of the cast expressions

The two new rules are:

1. any two consequent casts of the same size and type could be reduced to one
case, e.g., `low:32[low:32[x]] is `low:32[x]` and so on for `high`,
`pad`, and `extend`, i.e., `cast:m[cast:n[x]] = cast:m[x]`

2) if there are two consequent casts of the same type, s.t. the
inner-most cast size is bigger than the outer-most cast size, then we
can drop the inner-most cast, e.g., `low:n[low:m[x]]` is `low:n[x]`,
if `m > n`. The same is true for all other cast types. In this
reasoning we do not assume that `low` and `high` are always narrowing and
`pad` and `extend` are always widening.

* publishes Bil semantics in the Bil_semantics interface

and loads it in the bil_main, so that later we can add a command line
option which will select an appropriate semantics.

* enables back the optimization

although BIL semantics is now self-optimizing it is still missing some
global optimizations (like elimination of useless assignments, aka
value propagation) which prevents our tests from working. We will
later disable the optimization passes, once our Bil denotational
optimizer will handle such cases.

* updates the testsuite

* fixes a small bug in the optimizer

it needs an architecture, if none is present in the context, then it
will just translate jumps to specials.

* fixes the semantics printer

1) The Value.pp_slots function didn't work at all,
it was assertfaulting because the sexp_of_f function
of Dict wasn't really implemented.

2) The IR printers weren't handling the empty filter as
"print everything" as it is should be by the documentation.

Also, adds the instruction destinations inspector.

* deletes a stale file

btw, it was a good idea to add useful small primitives as a syntax
module, but we can do this later.

* fixes (again) Bitvector sexp representation

it was still using the binary notation in sexp

* prettifies inspectors for a couple of domains

* adds --dump=knowledge option to `bap` that dumps the knowledge base

we will later add a much better tools for knowledge
introspection and visualization, but so far we are blocked by
the bap frontend work, since we need for that a better commandline
interface, than bap 1.x provides. But so far it is better than
nothing.

* implements the fair control flow in Primus

As of Primus 1.x and before this commit, the interprocedural control
flow wasn't following the return stack, but instead ignored the return
statements and was always following the control flow graph.

This had several ramifications
1) it was fragile as not all jumps could be properly classified as
returns statically
2) it didn't work with tail calls
3) it wasn't fair, because we were misinterpreting malicious program
as benign.

The new implementation is following the true control flow, i.e., we
are not ignoring any jumps and call the computed destinations. In case
when the dynamic control flow diverges from the static control flow,
we are raise the Cfi_violation exception, which is properly trapped in
the same manner as Division_by_zero and Segmentation_fault.

Implementation
--------------

We use a shadow stack approach, and each interprocedural call which we
expect to return pushes the return value (which we call a prompt) to
that stack. The the contination linked to the call destination is executed
bounded by the pushed prompt. It will stop when it reached the pushed
prompt (i.e., when the control is literally returned to us) or when
the continuation returns normally (e.g., when a Primus Lisp or
primitive is called). In either case, we pop a prompt (if it was
pushed), and compare the prompt value with the expected and raise (a
trappable) exception if they diverge.

* updated submodule with bap-veri

* fixes Bitvec.to_int[X] converters for negative numbers

* fixed minor bug in module naming

Loading branch information

ivg authored and gitoleg committed Aug 14, 2019

1 parent aefa92b commit 2773e8c

.gitignore

-Original file line number
+Diff line change
@@ Expand Up / @@ -40,3 +40,4 @@ _oasis @@
     /plugins/primus_greedy/.merlin
     /plugins/primus_lisp/primus_lisp_config.ml
     /setup.exe
+    /tools/bap_config

.merlin

-Original file line number
+Diff line change
@@ Expand Up / @@ -13,6 +13,9 @@ B _build @@
     B _build/lib/bap
     B _build/lib/bap_abi
     B _build/lib/bap_api
+    B _build/lib/bitvec
+    B _build/lib/bitvec_order
+    B _build/lib/bitvec_sexp
     B _build/lib/bap_bundle
     B _build/lib/bap_c
     B _build/lib/bap_config
@@ Expand All / @@ -24,13 +27,15 @@ B _build/lib/bap_image @@
     B _build/lib/bap_plugins
     B _build/lib/bap_sema
     B _build/lib/bap_types
+    B _build/lib/bap_core_theory
     B _build/lib/graphlib
     B _build/lib/monads
     B _build/lib/ogre
     B _build/lib/bap_primus
     B _build/lib/bap_llvm
     B _build/lib/regular
     B _build/lib/text_tags
+    B _build/lib/knowledge
     S lib/bap
     S lib/bap_abi
@@ Expand Down @@

lib/arm/arm_env.ml

-Original file line number
+Diff line change
@@ Expand Up / @@ -15,10 +15,6 @@ let reg32 reg = make_register reg reg32_t @@
     let spsr = reg32 `SPSR
     let cpsr = reg32 `CPSR
-    (* Memory definition *)
-    (* let mem = new_var "mem32" (TMem (Reg 32, Reg 8)) *)
     (* Arithmetic flags, individually *)
     let nf = "NF" %: bool_t
     let zf = "ZF" %: bool_t
@@ Expand Down Expand Up / @@ -85,4 +81,4 @@ let of_reg : reg -> var = function @@
       | #ccr_reg as reg -> var_of_ccr reg
     let new_var name = Var.create name reg32_t
-    let mem = Var.create "mem" (mem32_t `r32)
+    let mem = Var.create "mem" (mem32_t `r8)

lib/bap/.merlin

-Original file line number
+Diff line change
@@ Expand Up / @@ -6,5 +6,6 @@ B ../../_build/lib/bap_image @@
     B ../../_build/lib/bap_disasm
     B ../../_build/lib/bap_sema
     B ../../_build/lib/bap_bundle
+    B ../../_build/lib/knowledge
     REC

0 comments on commit `2773e8c`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `2773e8c`

Commit

There are no files selected for viewing

0 comments on commit 2773e8c

0 comments on commit `2773e8c`