* new implementation of fsqrt
  Based on an approximation of sqrt(x + 1) with Remez coefficients.
* initial scrap of the bitvector, still WIP
* bap-elementary, initial commit
* new and efficient bitvector implementation
  The interface design is inspired (or, more correctly, enforced) by the current optimization capabilities of OCaml. The end goal is that if a bitvector operation can be computed without memory allocations, then it should be, even if the width is not known in advance. This is still a WIP.
* wrote docs
* fixed a typo
* finished the initial implementation
  Didn't provide the specializations; will probably add them later.
* renames bitvector to bitvec and adds auxiliary libraries
  Since we're going to use Bitvec as our new Bitvector implementation, and since we don't want to alias it to Word or Addr anymore (because that would introduce confusion with the Bap.Std.Word and Bap.Std.Addr modules, as well as significantly slow down compilation), we need a shorter name. However, Bitv is already used in the opam universe, and Bv is probably too short. So a middle ground was chosen, which also matches the SmtLib name for bitvectors.
  We also added two sublibraries, along with submodules, which enable support for bitvectors in the Janestreet suite of libraries:
  - Bitvec_order provides support for the interfaces that depend on ordering, like maps, sets, comparables, etc.
  - Bitvec_binprot provides support for bin_io.
  Together they provide functions that make Bitvec.t a first-class citizen of the JS Core libraries. We're moving away from the old JS style of providing a full Comparable or Regular interface for each module, since it bloats the binaries and, more importantly, significantly slows down compilation. We now rely on small interfaces such as comparators, so instead of `Bitvec.Set.t` we will use `(Bitvec.t, bitvec_order) Set.t`, and instead of `Bitvec.Set.empty` we will use `Set.empty bitvec_order`. See the module documentation for more information.
* makes bitvec-order compatible with newer versions of base
* adds a couple of missing functions
* fixes some leftovers from the merge conflict
* makes bap_build independent of core-kernel
* makes the build system independent of core-kernel
  At least an attempt.
* the core-theory no longer depends on BAP
  So BAP can depend on the Core Theory now. Also made the Core Theory less heavy.
* moves some of the new primus-lisp stuff to the new bitvector
* wip
* updates value to reflect the new Bitvec interface
* finishes with the lib part
* switches to an unboxed modulus representation
  Since many use cases of the Bitvec interface will use the generic operations with a modulus created on the fly from an integer width, having it boxed would have severe performance ramifications. So, instead of representing the modulus as a pair of the width and the modulus (i.e., 2^w-1 and w), we had to decide on one or the other. We decided to use 2^w-1 as the modulus representation, as this is the value that is used in normalization, so computing it every time would add two extra operations which are not [@@noalloc]ed. The ramification of this choice is that we still need to go back from 2^w-1 to w. The direct solution is to count bits, which is rather efficient already.
  For the [msb] operation we don't really need to know the width to test whether the msb is set, since msb x m ::= (m = (m lsr 1) lor x) (though I'm still not sure which will be more efficient, [test_bit (num_bits m)] or the implementation above, but both are rather efficient). We also reimplemented the shift operations. They no longer depend on the signed extract and are implemented purely in terms of bitwise operations. We use `num_bits` to get the width there. (A small sketch of this representation follows this batch of commits.)
* moves bitvector to the bitvec base
* updates oasis files to add the bap dependency
  Many of our packages were dependent on bap only indirectly.
* makes bitvec sublibraries standalone
  The subpackages won't work with a separate installation, since they would have to share the same META file, which would break Dune and is fragile in general.
* The following code allows ymm registers to be cast to smaller registers. This new casting operation is used to extract xmm registers from ymm registers in order to support monomorphic functions in SSE. The SysV ABI is updated to recognize functions that read and write xmm0.
* prototype for sin function from double to double
* initial example of using Core Theory's floating point support to hook the sin function and evaluate user-defined polynomials using Horner's method (see the Horner sketch after this batch of commits). The current example only computes a geometric series.
* fix warnings regarding Type not being in scope for Imm and the run signature
* working merlin file
* initial functions in Core Theory IR for inaccurate sine range reduction
* initial sine and cosine implementations using Remez's approximation
* reverse coefficients order
* fix fceil bug, but need an fround implementation for this to work
* one more test
* made refactoring recommendations to separate polynomial evaluation from range reduction and reconstruction
* renaming
* wip: refining the knowledge theory
* added knowledge return operators to stop recomputing values
* renaming and better comment
* more refactoring
* new ceil! :)
* multiply by a constant instead of dividing in fmod, as well as better representations for pi values
* one test passes, sort of :D
* undo initial ABI changes
* fixes indentation
* optimizes sin computation (10 times better now)
  The idea is that we can flip the sign using a simple xor operation. Also, the reduction step stores whether the correction is needed in a variable, so we don't need to check it twice.
* nearly finished the implementation
  Only serialization is left.
* adds serialization of values
* implements some domain functions
* drops the old modules, implements more sophisticated typing
* implementing different type classes for values and objects
* implemented merge and join
  Idea for tomorrow: allow users to provide the lattice data structure.
* the knowledge module finally typechecks
* started rewriting the Core Theory in terms of KB
* at least the empty theory is ready
* adds symbols to knowledge
* finishes adding symbols to knowledge
  Decided to forfeit symbol importing for now.
* the theory manager now compiles
* vars are now using knowledge objects
* the parser also compiles
* core theory compiles!
* implements symbol visibility
  Now we keep track of whether a symbol is public or not.
* adds namespace importing
  Enough with namespaces... really tired of them :/
* abstracts persistence, adds class order
  - Instead of using Binprot directly we now abstract the persistence class, so that later we can easily provide backing stores that support ad-hoc data types (like Redis). The new interface is also easier to use, as it provides functional combinators.
  - Enables partial ordering between classes; adds some documentation that clarifies class sameness and specificness.
* adds support for knowledge data representation
  (Maybe we should add support for integer data, like bitvectors, instead of leaving it to the downstream libraries...)
* adds support for immediates
  Not sure about the interface yet, but sooner or later we will need them.
* primus 2.0 is nearly ready, just trying to figure out a proper signature for run
* ok, Primus doesn't require any components
  The components part should be handled by the static knowledge.
* satisfied the signature
* adds flat domains and makes optional flat
  A flat domain stratifies the set into two classes: empty, or something. Empty is the bottom element. All elements that are not empty are non-comparable. Basically, any set with an empty element satisfies the domain structure; moreover, this is how most of our domains are defined. Even more, when we want to derive a domain from the optional representation we actually want the flat domain. For example, the `addr option` representation actually means that there is either some knowledge about the address of an object, or there is no such knowledge. It doesn't mean that `Some 0xdead` has more knowledge than `Some 0xbeef`.
* all promises are fulfilled when the knowledge monad is run
  Well, at least the promises specific to the class.
* bap.std finally compiles
* moves the new primus out of the bap_primus folder
  We decided to keep the old primus intact, and maybe update it to the new primus machine, instead of getting rid of it.
* restores primus from the bap-new-semantics branch
* fixes the print plugin
  No more pp_domains.
* adds a missing dependency to the core-theory
* renames decide to fact
* simplifies the class type
  Removes the hierarchies; we don't really need them, and when necessary they can be organized for a concrete family of classes using the same approach. It is now possible to declare abstract classes and then refine them to concrete ones, or declare a concrete class from the start. Also, makes the class type constructor covariant, so that a class can be defined as a polymorphic instance.
* makes 'a cls, 'a obj, and ('a,'p) slot covariant in 'a
* revises the sort system
  Implemented a simple algebra of sorts, which consists of sort atoms and applications of sorts (atoms can be either symbols or numbers). The algebra is typed, but atoms are universally quantified. A Magic module is provided which enables breaching of the type system, so that a sort type can be restored after serialization. This is safer than it was before, as the phantom type is now reconstructed from the sort definition, rather than set arbitrarily at the place of the sort definition.
* makes sort refinement safer
* uses Theory.Var for vars, adds the Unk branch to Bap.Std.typ
* wip, nearly finished rewriting the IR in the new semantics
* moves Ir_jmp to the new semantics
* moves the whole IR to the new semantics
* IR no longer uses BIL
* the Insn.t is now the Semantics.t
* renames links to labels
  A link is a binary attribute, while a label is unary.
* extends the derived interface of Objects
* the Tid.t is now the Theory.label
  We will now perform all side effects in the knowledge monad.
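As an aside on the unboxed modulus commit above, here is a minimal sketch of the idea in plain OCaml. The names (`modulus`, `norm`, `num_bits`) are illustrative, not the actual Bitvec interface:

```ocaml
(* Sketch only: the modulus is the unboxed integer 2^w - 1, so
   normalization is a single [land] and needs no allocation. *)
type modulus = int                          (* 2^w - 1 *)

let modulus w : modulus = (1 lsl w) - 1     (* assumes w < Sys.int_size *)

let norm x (m : modulus) = x land m         (* x mod 2^w *)

(* recover the width by counting the bits of the modulus *)
let num_bits (m : modulus) =
  let rec go n m = if m = 0 then n else go (n + 1) (m lsr 1) in
  go 0 m

(* the msb of x is set iff m = (m lsr 1) lor x, no width needed *)
let msb x (m : modulus) = m = (m lsr 1) lor x

let add x y m = norm (x + y) m   (* e.g. add 200 100 (modulus 8) = 44 *)
```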
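And for the Horner's method commits above, a tiny sketch of the evaluation scheme on plain floats (the actual implementation builds Core Theory terms instead). Coefficients are assumed to be listed from the highest degree down, which is why the "reverse coefficients order" commit matters:

```ocaml
(* Horner's method: a_n*x^n + ... + a_0 as ((a_n*x + a_{n-1})*x + ...) *)
let horner coeffs x =
  List.fold_left (fun acc c -> acc *. x +. c) 0. coeffs

(* example: 2x^2 + 3x + 1 at x = 2 gives 15 *)
let () = assert (horner [2.; 3.; 1.] 2. = 15.)
```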
* moves bap to the new runtime
* creates the program abstraction in CT and refines label
* updates major components to the new interface
* reverts primus components to the old primus
* fixes a bug in the bitvector representation
* fixes the value merging function
  The order was confused.
* fixes the IR generator
  Still not sure why it is implemented so indirectly.
* changes the merging strategy of insn
  Still need something better for merging values.
* adds the runtime lock
  Uses the runtime in bap functions.
* relaxes the lock
  Otherwise it is impossible to use. We still need to figure out how to enable some sort of protection while preserving usability. For example, we can at least allow reading from the state without any check.
* makes the bap_c_abi error printable
* refines the IR jmp representation
  Drops the roles along with multiple links, as they were increasing the impedance mismatch between the representation and its reification into BIL. The old BIL jmp could have up to two destinations, not N destinations. Since the new representation was accepting many destinations, it was tempting to use the same jmp term to represent a control flow which should be represented with several jumps. In the new representation we have up to two destinations, which we denote with the `dst` and `alt` destinations. Both are optional, therefore we can now have 0, 1, or 2 destinations:
  ```
  +-----+-----+-----------+
  | dst | alt | semantics |
  +-----+-----+-----------+
  |  -  |  -  | undefined |
  |  x  |  -  | goto x    |
  |  -  |  y  | call y    |
  |  x  |  y  | call y x  |
  +-----+-----+-----------+
  ```
  where `call y` denotes `call y with noreturn` and `call y x` denotes `call y return x`. Here we lose the `ret x`, which is denoted with `alt x` and then reified back to `call x` instead of `ret x`. In order to recover the call correctly we need to know whether `x` denotes a function entry or is in the middle of a function, which is, in general, an undecidable property. Therefore, we extend the notion of a call to an interprocedural control flow transfer. The value of the `dst` and `alt` fields is an abstract variant with two options: `Indirect vec`, parametrized with the vector expression, denoting a set of possible destinations; and `Resolved tid`, parametrized with the destination label.
* prints negative numbers as subtractions of positives
  Just easier to read.
* the CFG is now more or less properly linked
* fixes call wiring as a separate pass (not a real pass though)
  We rewire all jmps to entry blocks as calls to the corresponding subroutines. Note, this will also affect recursive subroutines - all jmps to the subroutine's own entry block will be treated as interprocedural jumps. As a nice side effect, this pass will enforce that the entry block has no incoming edges.
* fixes all bugs
  Now it works :)
* fixes a bug in the variable substituter
  Note, this is not a new bug; the substituter was always broken in BAP. The variable substitution (exposed via the `substitute_var` function) is not capture-free, contrary to what the function description says. The body was still substituted even when the variable was captured by an enclosing let expression. It is fixed now.
* hardens the bap_bitvector implementation
  A few bugs were due to the bap_bitvector internal representation breaching the abstraction of the packed bitvector. The new implementation carefully establishes boundaries, so that we can't accidentally treat a packed representation as a bitvector. Also, the textual representation of Bap_bitvector is restored to match the previous representation, so that we can pass the tests.
* prints argument types using the old bap types if possible
  If an argument type is representable in old BIL, we use the old notation in our IR printer to print it. Otherwise we fall back to the normal sort representation.
* updates the elementary function
* fixes the signed comparison function
* makes disasm failure in powerpc tests more verbose
* fixes object string representation
* fixes representation of tids in piqi serializers
* fixes flat domains, unit tests are passing now
  Previously, the derived flat domain was treating all values as non-comparable if they weren't empty. Therefore `Some 2` and `Some 2` were non-comparable. In fact, we can safely assume that equal values contain equal informational content, thus they are informationally equal. Therefore, we decided to define the flat domain via a pair: the empty value and the equality operator (see the sketch after this batch of commits).
* removes some debugging information
* refines reconstructor, symtab, and IR lifter to fit the new semantics
  Symtab Changes
  --------------
  First of all, I've refined the symtab interface to make it clearer. Now we have a shadow callgraph in the symtab, which stores each call edge from the callsite, denoted by the basic block, to a callee, which could be a real function also present in the symtab, or an external, which is represented just by its name. We distinguish between two kinds of calls - explicit and implicit. An explicit call is produced by a control flow instruction, e.g., jump, branch, or call, while an implicit call is produced via a fallthrough of the control flow from one subroutine to another (yes, this can happen, and we have supported it since 1.6). Therefore, it is possible that one callsite has two call edges - one explicit call and another implicit, e.g.,
  ```
  __printf_chk:
      call __stack_check
  printf:
      ...
  ```
  which we should lift as two calls,
  ```
  sub __printf_chk:
      call __stack_check with return next
  next:
      call printf
  printf:
      ...
  ```
  In general, each block can have 0, 1, or 2 call edges. This is modeled in the API with two functions, `explicit_callee` and `implicit_callee`, both returning an optional type. Call edges are inserted into the callgraph using the single `Symtab.insert_call` function, which takes an optional argument to distinguish between the two kinds of edges. This commit also fixes the remove function, which wasn't removing function calls when a function was removed.
  Reconstructor Changes
  ---------------------
  - Updates to the new symtab interface.
  - Fixes an external call classification. It was previously inserted as a fall edge, while it is actually the opposite - it is always an explicit call.
  IR Lifter Changes
  -----------------
  The IR lifter implementation no longer depends on BIL and is purely implemented using the new interfaces. It is basically rewritten nearly from scratch. It is still not passing tests (we no longer insert synthetic subroutines; they are not needed anymore, but for the sake of backward compatibility we will probably introduce them back, maybe as a separate pass). The new implementation precisely handles the different variants of calls, trying to keep the CFG representation as tight as possible (i.e., it does not introduce unnecessary blocks). It is also more conservative with respect to the inter/intra classification of jumps. When the whole program is lifted, we ensure that all gotos will never leave the boundaries of their subroutines. We turn all jumps for which we can't provide such guarantees into calls. Basically, it means that in the output IR it is now impossible to find an unresolved goto (all gotos now have the form `goto %....`). The new lifter is thoroughly commented, so please continue reading there.
* WIP: a complete rewrite of the recursive descent disassembler
  The new implementation supports delay slots (and out-of-order execution in general), is more conservative (it disassembles more than a linear sweep disassembler), uses much less space, and is much easier to understand and maintain, since it is only 200 lines of code. However, there are still some bugs, so it is a work in progress.
* makes the disassembler more robust
* prints the disassembly in ascending order
  Previously we were printing in reverse postorder, which actually complicates reading the disasm output, since it does not print the fall destinations.
* cleans up the disassembler
  Looks like it works fine now.
* adds a new constrained superset disassembler
* embeds the constrained superset disassembler into the KB monad
* fixes a bug in the ror x86 semantics
  The order of expressions was confused.
* switches to the new superset disassembler
  Still lots of work to be done, but looks promising.
* fixes a couple of bugs
  1) adds retraction if we fall off the memory (it's fine to jump over the boundaries)
  2) fixes retraction in case of a delay slot
  3) fixes the delay slot implementation (it worked only for delays equal to 1)
* prints disassembly in increasing address order
* changes the scanning mode
  Now we only scan as much as needed, no more, no less. We no longer add roots on a failed instruction; instead, if at the end of a scan we didn't find all the code, we restart with the first unsatisfied instruction as the root, and so on, until we classify everything as code or data.
* a memory footprint optimization
  By using scoped objects for representing links we can save an enormous number of bytes. In the previous approach, we were creating a code object for each byte in the input memory and then filling its slots. We even created a unique symbol for each instruction. Thus, just the initial markup of a 100-megabyte binary took 50 gigabytes of RAM. The new approach is not only much more memory efficient (constant space), but also 4 times faster (probably due to less time spent in GC).
* integrating the new disassembler into BAP
  Still unhappy with _all_ name choices.
* few more renamings and refactorings
  1) The basic disassembler will now keep a table of created disassemblers and do reference counting. We specifically do not delete disassemblers; we just want there to be at most one disassembler per triple.
  2) The speculative disassembler is renamed to Disasm.Driver, because it is not a speculative disassembler (one could create a speculative disassembler using this driver).
  3) The Driver will create disassemblers on the fly. Now it is possible to disassemble a multiarch program in one pass, and even create a CFG which contains blocks of different architectures.
* changes the type of Insn.t
  It is not the program, but its semantics. So an instruction doesn't include its address.
* fixed a fixed point
  Have to commit due to the fire alarm :)
* adds an optional join method to the domain
  This is a big change to the semantics of the knowledge system, since now, when a piece of knowledge is provided, it is joined with the existing knowledge (if any). E.g., `provide s x p` must be consistent with the existing `collect s x`; it's now impossible to just override existing knowledge with something different (well, modulo that we can't really control the consistency of the order and join functions that are provided by the clients of the library). When a domain is defined, it is now possible to pass a join function that will provide the least upper bound of two values, or indicate a conflict. Therefore, we now have a complete lattice structure for all properties, with conflict being the top value. The join function, if not provided, is inferred from the order function. Given the new semantics, we can now reach a fixed point in fewer iterations and reduce the burden of merging on the knowledge providers, since they can now rely on a default merge function. Note, it is still possible to define a recursive property with an arbitrary computation in it, and the system will compute the fixed point of this computation as before. It is just now possible to rely on the default join operator, so, to some extent, this can be seen as an optimization. A good example where we really benefit from the new semantics is when we have several independent knowledge providers, like lifters or jump destination providers. They are not really interested in what the others are providing, as they just add their own information (in a different slot, in the case of a lifter, or by Set.union, in the case of dests).
* moving forward towards well-defined fixed point semantics
  Two new updates:
  1) promises are run only once
  2) provide is joining and is forced to preserve monotonicity
  Having promises computed every time a property is accessed stresses the whole system. Ideally, we should rerun a promise only if there is some new information, but we don't want to maintain a whole dependency graph for each promise. Alternatively, a promise itself may look into the properties and decide whether it has more information or not. But this would essentially make all promises self-recursive, as they would have to first look into the current value and then produce a better one if necessary. Therefore, we will now evaluate each property (slot+obj combination) only once (computing the fixed point as before), when the slot of an object is not set. Afterwards, once the initial value is computed, the slot is read and set. Another problem with re-evaluating promises is that they could generate non-comparable results every time.
  Provide was made monotonic, probably in the previous commit. We need this to enable information joining without introducing explicit cycles. This change enforces independence from the order of execution and ensures fixed point termination. Although it sounds nice, it comes with a price. Every time we have conflicting opinions (like different names for the same function), we can't proceed without actually giving precedence to one or another opinion. Sometimes a consensus just cannot be found, so we have to either create a more complicated lattice to hold our information, e.g., using sets of all possible names and aliases of a symbol, or a lattice where each name is predicated with the namer; or, somehow, prune the conflicting information (thus bringing back to life the problem with the order). Alternative solutions currently on the table:
  1) think about the knowledge ordering, e.g., can we treat two different names, like `.init` and `init_proc`, as equivalent from the perspective of their informational content? E.g., neither bears more or less information than the other. So are they NC or EQ?
  2) revoke the monotonicity requirement, but instead implement joining only for promises (and probably extend the interface, so that we have two functions, one that preserves monotonicity and another that overrides)
  3) explore possible data representations. This is the most valid and theoretically sound approach, but it comes with a huge implementation and reasoning burden. Indeed, we have to admit that the same entity might have different names, and in different circumstances different names are applicable. Therefore a name, instead of being just a string, shall become a (discriminated) union of different names, e.g., (name, name-provider). We can even enable ranking for providers, to make it easier to compute a principal name, etc.
* wip: implementing the new agent-based system
  Updates and descriptions will come; yet another fire-alarm push.
* moves more services onto the knowledge runtime
* renames Bap_state to Bap_toplevel
  Also, the Toplevel module was rewritten in the previous commit; we now have a more toplevel-like interface. And we now have the type `'a Toplevel.t`, which denotes the top-level computation monad; it is declared as `'a Toplevel.t = 'a` in Bap.Std, but will be turned back into a monad outside of Bap.Std.
* adds an aliases slot to the program label class
  A program label may have multiple names, all equally correct.
* fixes abi demangling and the usage of Tid.set_name
  Since names in the Knowledge should remain consistent, this function should not be used to change the name. Moreover, the demangled name is not the true name, but rather a view; therefore the demangled name is added to the set of object aliases.
* small renamings of the new symtab interface
* makes Typeid in Bap.Std.Value abstract, fixes pretty printing
  This is the first step before we merge Bap.Std.Value into the new Knowledge framework. Basically, two options are being researched: adding one slot that will contain the dictionary of all values, or turning each tag into a slot. The latter is more natural, but is harder to implement.
* unifies Bap.Std.Value with Knowledge
  Now a tag is a weaker version of a slot, so that we can store our attributes directly in the knowledge base.
* switches the project to the new engine
* fixes insn delay computation
  Instead of providing one promise to compute the semantics of an instruction, we were providing 37 promises (by the number of branching instructions we support); each promise was asking for the semantics (for the name), which was triggering the fixed point computation, so we ended up with ~80 computations per instruction. Now we register only one promise. We also request the semantics only after we match the architecture. (In fact, we do not need the semantics at all, as we could get the name from the program object itself, but to make mips less dependent on Bap.Std, and for the sake of demonstrating the Knowledge capabilities, we decided to keep it.) Note also that it is fine to request the semantics before the arch; it will end up in a very slight inefficiency, as the delay provider will be called twice (in case the arch matched; otherwise it will return an empty instruction and won't be restarted), to make sure that the fixed point is reached.
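A side note on the flat domain commits above ("adds flat domains" and "fixes flat domains"): the resulting order can be sketched in a few lines of OCaml, assuming illustrative names rather than the actual Knowledge API:

```ocaml
(* Sketch of a flat domain defined by an empty value and an equality
   test: empty is the bottom, everything else is comparable only to
   itself (equal values are informationally equal). *)
type order = LT | EQ | GT | NC

let flat_order ~empty ~equal x y =
  match equal x empty, equal y empty with
  | true,  true  -> EQ                  (* both empty *)
  | true,  false -> LT                  (* empty is below anything *)
  | false, true  -> GT
  | false, false -> if equal x y then EQ else NC
```

Under this definition `Some 2` and `Some 2` are EQ, while `Some 0xdead` and `Some 0xbeef` are NC, which is exactly the fix described above.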
* extracts IR directly from the semantics, instead of rerunning the lifter
  It was a remnant of the old infrastructure; now it is faster (program reconstruction went from 10 seconds to 100 ms on /bin/ls) and more general, as the IR semantics no longer depends on BIL.
* adds the `join` parameter to the domain definition functions
* lifts each instruction only once
  Previously we were creating a new instruction object every time a lifter was called. The implementation will essentially guarantee that for each instruction (address) we call the lifter only once; for the rest of the calls we get the semantics ready from the knowledge base.
* some microoptimizations to the Exp/Bil/Bir domains
  Not sure whether they give us anything, so we could probably get rid of them...
* a new implementation of universal maps
  This is an ongoing experiment. We are going to substitute Core's Univ_map/Map with a more optimal implementation. The following optimizations are applied:
  1) keys are inlined in the tree branches
  2) the branches themselves are inlined up to level 4
  3) tree heights are not stored
  4) balance is -1,0
  5) data is only stored in leaves
  The last optimization could actually be a pessimization. We will experiment with this as well.
* finishes switching to the new maps
  So far it works correctly.
* optimizes the tree representation even more
  Now we're storing values in the intermediate nodes, not only in leaves.
* adds a few inlined attributes to the monads library
  Not sure if it will have any effect, but it shouldn't make things worse.
* revises the class/sort system
  It was observed that our slots weren't polymorphic in classes, therefore we were upcasting our values to base classes, which wasn't in fact a no-op, since we had to change the class annotation stored in the value - hence an extra allocation for each Value.get operation. This forced us to revise the class/sort system. Now it is not only a little bit more efficient, but also much more straightforward. First of all, a class is defined as a set of sorts. In terms of OCaml, a class is indexed by the sort type variable, so that we can create multiple instances of the same class parametrized by different sorts. Slots no longer depend on the sort index, but only on the class index, therefore all instances of a class, no matter the sort, can be accessed without any manipulations. On the Core Theory level we rectify the naming issues: we now have two sorts, Theory.Value.Sort and Theory.Effect.Sort (previously we had an asymmetric sort vs effect, where effect was a sort of effect - pretty confusing). We also now have a Theory.Value module, which is symmetric to the Theory.Effect module, as well as the Theory.value and Theory.effect types. This makes writing types much easier. Finally, the sort system is also revised; since we now have a notion of sort on the Knowledge level, our Theory sorts fit nicely into it. They are no longer classes but, as expected, sorts. Finally, since we have an ideal empty value for the expression domain (since the introduction of the Unk type to BIL), we no longer need to make the expression domain optional. This also saves a few allocations. As a grain of salt, all these optimizations didn't win a lot - about 10% of performance (and an 80% improvement in the memory footprint).
* yet another pass on dictionary optimization
  While our dictionaries are twice as fast as Core's dictionaries for small inputs, we lose the lead drastically on larger inputs. For inputs larger than 1000 we are ten times slower than our counterpart :( A couple of optimizations improved performance, but the overall picture is still the same. Two things were optimized. First, the key uid is precomputed and stored in the key; we use a simple integer instead of relying on the extension constructor id, since the latter turned out to be rather expensive (lots of guards). Second, when a value is inserted, we consider the cases when it should be inserted at the head, as previously we were diving down to the leaf level. And although it was expected that this optimization would fix the performance gap for large trees, it, unfortunately, didn't. The next suspect is the balancing procedure. An obvious (but tedious) optimization: instead of creating an unbalanced tree and then balancing it, do both in one step. Also, it is interesting that Core's balance function is not recursive, which guarantees that there is no more than one rebalancing per insertion. Can we guarantee the same?
* continues working on the optimized trees
  We're already 10 times faster than Core's tree, and are using 6 times less memory. But some corner cases are still unfinished.
* an intermediate step in the AVL tree implementation
  So far they work 10x faster than JS maps, but in some cases we are not able to maintain our rank rules. In the next iteration, we will try to use the WAVL approach with ranks and rank increments instead of classifying trees into LL/LR trees.
* wip on trees
* the rank-balanced WAVL-style tree is mostly ready
  I would call it a bonsai tree, but of course the name is already taken :(
* fixes a missing case in the rank_increases function
  Forgot about EQ in some cases.
* a more optimal lower level
  Maximizes the number of T4.
* a few final optimizations and the cleanup
  Microbenchmarking shows that the less we touch the leaves the better. In fact, if we drop 90% of the insert function, microbenchmarks will run even faster - but we will have a less compact representation. In any case, for small maps (<1000 entries) we run 4 times faster than JS maps. For very small maps (<30 entries, actually our expected payload), we run 8 times faster than Core maps and even faster than Core's hash table:
  ```
  ┌──────┬────────────┬─────────┬──────────┬──────────┬────────────┐
  │ Name │   Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
  ├──────┼────────────┼─────────┼──────────┼──────────┼────────────┤
  │ dict │   307.58ns │ 327.02w │          │          │     12.22% │
  │ core │ 2_517.40ns │ 892.04w │    0.36w │    0.36w │    100.00% │
  │ hash │   531.14ns │ 201.00w │          │          │     21.10% │
  └──────┴────────────┴─────────┴──────────┴──────────┴────────────┘
  ```
  For huge maps (> a million entries) we're still faster than Core maps, but most of the time is spent in GC, and despite the fact that we always allocate at least two times less than Core's map, the time difference is not that big, since OCaml's GC runtime does not depend linearly on the number of bytes (but rather on the number of roots, which should be basically the same).
* fixes the inline attribute (I keep confusing inline/inlined)
* adds caching to the BIL lifter
* removes caching from the BIL lifter
  Caching on the semantics level breaks the overall semantics of the program, because semantically equal instructions should still have different IR (in terms of the labels, at least). It might indeed be an error in the design of the IR lifter, which, probably, should not be considered the semantics itself, but rather a container for semantics. However, it is not quite obvious how we can separate them, given that we have to use labels in control flow terms.
* updates the primus_machine to the new class representation
* switches to custom type keys
  The type keys from Core (provided by Type_equal.Id) rely on the extension constructor, which is not robust (too many checks) and is rather slow (again, too many checks). In our representation, we just use our own `int`-typed counter to check the uniqueness of keys, if needed. In any case, OCaml itself is robust enough to check key equality, so we usually don't need to do its work.
* adds GC setup code
  This setup corresponds to `s=1M,w=20,i=64M,o=200` (see the sketch after this batch of commits). A bigger minor heap helps to minimize the number of words that get promoted, while still keeping the memory footprint rather low. A fixed increment of 64 megawords is better than a relative increment: when the memory consumption comes close to dozens of gigabytes, doubling the memory may lead to a disaster, while only a few gigabytes are actually needed. Finally, an increased smoothing window size should fit better with the bap flow, where we have the reconstruction phase with a lot of allocations and usually a much calmer analysis phase. TBH, I didn't see any real effects from changing the window size.
* optimizes some maps in monads
  Also,
  - removes some dead code
  - adds a few inline(d) attributes
  - destroys indentation; ocp-indent is broken with attributes
* sets the optimization level to 0 by default
* fixes the name of the external-reference ogre attribute
  And fixes the indentation as well.
* restores the correct merge function
* fixes opinions and ignores empty knowledge
  So far, we were storing empty knowledge if it was provided directly or via a promise or suggestion. Besides an increased memory footprint, it wasn't a big problem, except with the opinions data structure. The latter wasn't doing anything special to handle empty values, therefore empty values were treated as normal ones and were participating in elections. As a result, if the majority of the providers (or the authoritative provider) didn't have the information (i.e., provided the empty knowledge), then the empty knowledge was selected. This commit fixes this issue; in addition, we decided to ignore all attempts to store an empty value in the knowledge base. It is a mere optimization (for everything except opinions).
* ensures that the symtab symbolizer doesn't provide empty names
  Since it is an authoritative source of information, we should be sure that it doesn't provide bogus information, as it will override any information provided by other agents.
* enables the image rooter and symbolizer when a project is created
  It might be better to provide this as a separate plugin and not depend on image at all, but for now let's move in small steps.
* enhances IDA's plugin error messaging
  The error code 0 doesn't really say anything.
* revamps the objdump plugin
  First of all, it fixes a bug where an entry with the name `FUN@plt-OFF` was considered a function with the name FUN, e.g., both `memmove@plt` and `memmove@plt-0x10` were considered the `memmove` function, which led to a lot of problems. Note, this is normal behavior for objdump: if the name is not known, it just finds the closest known location and adds an offset to it. Also, since I was here, I gave a little bit of love to the plugin, notably:
  - performance optimizations:
    1) process the input on the fly (do not create a list of input lines; this is slow and will OOM the plugin for rather large input files)
    2) compile the regular expression once (not for every line)
    3) simplify the regexp and get rid of all string ops (we can perfectly express what we need in the regexp; there is no need to apply extra checks or string operations)
    4) get rid of extra allocations (do not create substrings, pairs, or lists)
    5) use Z.t instead of words to represent addresses - takes less space, is faster, and is easier to parse (no need to substring)
  - adds an optional demangler: since objdump can do demangling for us, a new --objdump-demangler option controls which demangler is used. Set it to `disabled` to disable demangling.
* adds the rooter capability to the objdump plugin
  Now the objdump plugin also provides information about function starts.
* stores interprocedural edges in the symtab
* publishes the barrier property
* improves the Program IR lifting
  Makes it more precise by eliminating false edges based on instruction semantics.
* respects inter falls only if there are no jumps
* adds support for kernel-style relocations back again
  The idea is to fix the fixups as soon as possible to prevent the problem from spreading too far. Previously we had to touch a lot of code to fix the havoc caused by unfixed jumps. Therefore, the new solution is to fix the BIL code based on the available relocation information, and everything else will then work correctly automatically. The main problem here is the representation of an external call, which we can't represent as a jump, since the latter requires a concrete address. A simple (but not the cleanest) solution is to use a special statement with a fixup string, which is later resolved to a correctly named function. Since the solution is totally confined within the BIL plugin, it is rather safe and robust to changes. We also added back the synthetic procedures and will keep them until the abi/api pass is rewritten in the new framework.
* fixes the sexp printer in Bap.Std.Bitvector
  It was not human readable. Also registers the new serializers.
* fixes the label printer in Bap.Std.Jmp
  It didn't print the `@` prefix before named tids.
* fixes the asm printer in the insn
  Since the assembly representation can now be set via the interface, it is no longer guaranteed to be normalized, so we normalize it before printing.
* updates BAP Lisp to the changes in the Core Theory interface
* better error messages in byteweight
* fixes the bitvector comparison function
  During optimization I accidentally removed the size comparison.
* makes the code start a function start if none is provided
  It is our old tradition to treat the first address as a root for disassembly if no roots were initially provided by a user. This commit restores this tradition.
* prevents two subroutines from claiming the same identifier
  We distinguish subroutines by names, but we can't guarantee that our name providers won't provide equal names for different subroutines; therefore, we should be extra cautious when we assign a tid to a subroutine by its name, and check whether it was already assigned.
* tracks call targets as subroutines
  We shouldn't forget to mark the targets of known calls as subroutines, so that our callgraph reconstructor can correctly partition the whole-program CFG. We take extra care not to mark as subroutines the targets of calls that do not belong to a valid chain (i.e., which will be erased from the CFG). The disasm test is also updated to reflect the new changes:
  1) The knowledge runtime doesn't allow changing the architecture of an address, so we had to reset the knowledge.
  2) The new CFG reconstruction algorithm is more conservative, so it now reconstructs an unreachable block; we had to update the `call1_3ret` test, in which our algorithm now (correctly) produces a CFG with 4 blocks.
* removes debugging output
* fixes instruction properties
* restores conditional edges in the old-style CFG
  For tests, at least.
* simplifies the IR lifter, since barrier implies return
  Well, at least after we have fixed the instruction properties.
* ensures that the lifter fills in the insn
* the driver now respects barriers
  Therefore, we had to reclassify falls after barriers as new inits.
* a small doc update
* ensures that speculative falls are taken after speculative slots
  In other words, when we speculate that the code after a barrier is reachable, and we have (speculative) delay slots, we should speculate only after the speculative slot (since the CPU is allowed to speculate). In yet other words, the previous fix didn't work well for mips, because a delay slot is always taken, even after a barrier.
* updates the microcfg test
  A lone sub instruction may no longer form a valid graph, since it doesn't form a chain that ends with a jump. Also updates the assert message for four nodes; it was saying "three nodes" while checking for four.
* restores proper to_string/of_string functions for bitvectors
* updates the mini-projects test
  Since BAP now rejects invalid graphs, we have to pass some valid input to make a mini-project.
* hacks llvm to prevent linking errors during toplevel loading
  Some LLVM installations produce an `.a` file with an unresolved external (secondary) dependency, namely `setupterm`, e.g.,
  ```
  /usr/lib/llvm-6.0/lib/libLLVMSupport.a(Process.cpp.o): reference to setupterm
  /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libtinfo.so: definition of setupterm
  ```
  And although `llvm-config --link-static --system-libs` correctly mentions `-ltinfo`, the name still remains unresolved on modern Ubuntu systems. Adding `-lcurses` (or `-lncurses`) resolves the issue. Of course, it may break installation on other targets and systems, so we introduce this temporarily and directly in the oasis file, to see how it behaves in the wild. We will add proper configuration later, once we have collected enough information.
* provides at least basic information if no BIL is available
* updates the testsuite
* downgrades the code to a dialect understandable by older OCamls
  Not a big deal, but most likely we will still be bumping the lower bound to 4.07 or even 4.08, since we are going to update to core v0.12 or even v0.13 before the final release.
* updates the CI opam file with the removes for the newly added libs
* makes most of the important properties persistent
* makes the knowledge state serializable
* removes the toplevel state from the project
* simplifies the toplevel representation
* adds knowledge caching to the frontend
* implements caching of the project state
  We're splitting the project into the state and the derived components. We do not want to store the latter, as they are not portable and can be computed. The state currently consists of the state of the disassembler; we will later add the callgraph.
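For reference, the GC setup from the "adds GC setup code" commit (`s=1M,w=20,i=64M,o=200`) can be expressed with the stdlib Gc module roughly as follows. This is a sketch of the equivalent settings, not necessarily the commit's exact code:

```ocaml
(* GC settings equivalent to OCAMLRUNPARAM "s=1M,w=20,i=64M,o=200" *)
let setup_gc () =
  Gc.set { (Gc.get ()) with
           Gc.minor_heap_size = 1024 * 1024;        (* s=1M words *)
           major_heap_increment = 64 * 1024 * 1024; (* i=64M words; values > 1000 mean a fixed increment *)
           space_overhead = 200;                    (* o=200 *)
           window_size = 20 }                       (* w=20 smoothing window *)
```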
* adds serialization to the callgraph
  As well as publishes a new function in the Graphlib.Std.Solution module.
* eta-expands a function to mitigate a bug in the 4.06 compiler
* bumps version to 2.0.0-alpha
  Maybe beta is better suited, but let's be humble :)
* wires calls to the toplevel via the library
  I'm keeping it undocumented right now, because this will soon be overwritten by a separate toplevel library, which will be independent of Bap.Std.
* updates the testsuite
* enables boolean operations in the bitvec context
  This is done by casting bool expressions to bitvec via ite.
* uses Var.create along with Var.Ident.of_string in the CT Parser
  Well, this should have been done a long time ago, and it was... but it looks like it wasn't done for all cases.
* stops reflecting temps into lets in the BIL Lifter
  We used to do this previously, since we didn't have proper support for temps in BIL, but now, since BIL is using CT Vars, we can safely pass variable literals and let the CT Parser handle them properly.
* disables BIL optimizations and normalizations
  Neither is needed anymore, though they could still be useful, especially the passes, so we are not removing them entirely.
* updates the testsuite
* fixes scoped variables
  For some reason they were made mutable; now it is corrected.
* keeps the signed/unsigned casts abstraction
  They were accidentally lowered to primitive bit operations.
* enables two optimizations of the cast expressions
  The two new rules are (see the sketch after this batch of commits):
  1) any two consecutive casts of the same size and type can be reduced to one cast, e.g., `low:32[low:32[x]]` is `low:32[x]`, and so on for `high`, `pad`, and `extend`, i.e., `cast:m[cast:m[x]] = cast:m[x]`;
  2) if there are two consecutive casts of the same type, s.t. the inner-most cast size is bigger than the outer-most cast size, then we can drop the inner-most cast, e.g., `low:n[low:m[x]]` is `low:n[x]`, if `m > n`. The same is true for all other cast types.
  In this reasoning we do not assume that `low` and `high` are always narrowing or that `pad` and `extend` are always widening.
* publishes the Bil semantics in the Bil_semantics interface
  And loads it in bil_main, so that later we can add a command line option which will select an appropriate semantics.
* enables the optimization back
  Although the BIL semantics is now self-optimizing, it is still missing some global optimizations (like the elimination of useless assignments, aka value propagation), which prevents our tests from working. We will disable the optimization passes later, once our BIL denotational optimizer handles such cases.
* updates the testsuite
* fixes a small bug in the optimizer
  It needs an architecture; if none is present in the context, it will just translate jumps to specials.
* fixes the semantics printer
  1) The Value.pp_slots function didn't work at all; it was assert-faulting because the sexp_of_f function of Dict wasn't really implemented.
  2) The IR printers weren't handling the empty filter as "print everything", as they should per the documentation.
  Also, adds the instruction destinations inspector.
* deletes a stale file
  Btw, it was a good idea to add useful small primitives as a syntax module, but we can do this later.
* fixes (again) the Bitvector sexp representation
  It was still using the binary notation in sexp.
* prettifies inspectors for a couple of domains
* adds the --dump=knowledge option to `bap` that dumps the knowledge base
  We will later add much better tools for knowledge introspection and visualization, but right now we are blocked by the bap frontend work, since for that we need a better command-line interface than bap 1.x provides. But for now it is better than nothing.
* implements the fair control flow in Primus
  As of Primus 1.x and before this commit, the interprocedural control flow wasn't following the return stack; instead it ignored the return statements and always followed the control flow graph. This had several ramifications: 1) it was fragile, as not all jumps can be properly classified as returns statically; 2) it didn't work with tail calls; 3) it wasn't fair, because we were misinterpreting a malicious program as benign. The new implementation follows the true control flow, i.e., we do not ignore any jumps and we call the computed destinations. In case the dynamic control flow diverges from the static control flow, we raise the Cfi_violation exception, which is properly trapped in the same manner as Division_by_zero and Segmentation_fault.
  Implementation
  --------------
  We use a shadow stack approach (see the toy sketch after this batch of commits): each interprocedural call which we expect to return pushes the return value (which we call a prompt) onto that stack. Then the continuation linked to the call destination is executed, bounded by the pushed prompt. It will stop when it reaches the pushed prompt (i.e., when the control is literally returned to us) or when the continuation returns normally (e.g., when a Primus Lisp function or a primitive is called). In either case, we pop a prompt (if it was pushed), compare the prompt value with the expected one, and raise a (trappable) exception if they diverge.
* updated the submodule with bap-veri
* fixes the Bitvec.to_int[X] converters for negative numbers
* fixed a minor bug in module naming
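To make the shadow-stack idea above concrete, here is a deliberately simplified toy model in OCaml. The real Primus implementation is monadic and works with Primus values, not plain ints; names here are illustrative:

```ocaml
exception Cfi_violation

(* the shadow stack of prompts, i.e., expected return destinations *)
let shadow : int list ref = ref []

(* called on an interprocedural call that we expect to return *)
let push_prompt ret = shadow := ret :: !shadow

(* called when control returns: pop the prompt and compare it with
   the actual destination, trapping on divergence *)
let pop_prompt actual =
  match !shadow with
  | expected :: rest ->
    shadow := rest;
    if expected <> actual then raise Cfi_violation
  | [] -> raise Cfi_violation
```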