Skip to content

Latest commit

 

History

History
158 lines (118 loc) · 10.7 KB

MIGRATION.md

File metadata and controls

158 lines (118 loc) · 10.7 KB

Migrating to angr 7

The release of angr 7 introduces several departures from long-standing angr-isms. While the community has created a compatibility layer to give external code written for angr 6 a good chance of working on angr 7, the best thing to do is to port it to the new version. This document serves as a guide for this.

SimuVEX is gone

angr versions up through angr 6 split the program analysis into two modules: simuvex, which was responsible for analyzing the effects of a single piece of code (whether a basic block or a SimProcedure) on a program state, and angr, which aggregated analyses of these basic blocks into program-level analysis such as control-flow recovery, symbolic execution, and so forth. In theory, this would encourage for the encapsulation of block-level analyses, and allow other program analysis frameworks to build upon simuvex for their needs. In practice, no one (to our knowledge) used simuvex without angr, and the separation introduced frustrating limitations (such as not being able to reference the history of a state from a SimInspect breakpoint) and duplication of code (such as the need to synchronize data from state.scratch into path.history).

Realizing that SimuVEX wasn't a usable independent package, we brainstormed about merging it into angr and further noticed that this would allow us to address the frustrations resulting from their separation.

All of the SimuVEX concepts (SimStates, SimProcedures, calling conventions, types, etc) have been migrated into angr. The migration guide for common classes is bellow:

Before After
simuvex.SimState angr.SimState
simuvex.SimProcedure angr.SimProcedure
simuvex.SimEngine angr.SimEngine
simuvex.SimCC angr.SimCC

And for common modules:

Before After
simuvex.s_cc angr.calling_conventions
simuvex.s_state angr.sim_state
simuvex.s_procedure angr.sim_procedure
simuvex.plugins angr.state_plugins
simuvex.engines angr.engines
simuvex.concretization_strategies angr.concretization_strategies

Additionally, simuvex.SimProcedures has been renamed to angr.SIM_PROCEDURES, since it is a global variable and not a class. There have been some other changes to its semantics, see the section on SimProcedures for details.

Removal of angr.Path

In angr, a Path object maintained references to a SimState and its history. The fact that the history was separated from the state caused a lot of headaches when trying to analyze states inside a breakpoint, and caused overhead in synchronizing data from the state to its history.

In the new model, a state's history is maintained in a SimState plugin: state.history. Since the path would now simply point to the state, we got rid of it. The mapping of concepts is roughly as follows:

Before After
path state
path.state state
path.history state.history
path.callstack state.callstack
path.trace state.history.descriptions
path.addr_trace state.history.bbl_addrs
path.jumpkinds state.history.jumpkinds
path.guards state.history.guards
path.actions state.history.actions
path.events state.history.events
path.recent_actions state.history.recent_actions

An important behavior change about path.actions and path.recent_actions - actions are no longer tracked by default. If you would like them to be tracked again, please add angr.options.refs to your state.

Path Group -> Simulation Manager

Since there are no paths, there cannot be a path group. Instead, we have a Simulation Manager now (we recommend using the abbreviation "simgr" in places you were previously using "pg"), which is exactly the same as a path group except it holds states instead of paths. You can make one with project.factory.simulation_manager(...).

Errored Paths

Before, error resilience was handled at the path level, where stepping a path that caused an error would return a subclass of Path called ErroredPath, and these paths would be put in the errored stash of a path group. Now, error resilience is handled at the simulation manager level, and any state that throws an error during stepping will be wrapped in an ErrorRecord object, which is not a subclass of SimState, and put into the errored list attribute of the simulation manager, which is not a stash.

An ErrorRecord object has attributes for .state (the initial state that caused the error), .error (the error that was thrown), and .traceback (the traceback from the error). To debug these errors you can call .debug().

These changes are because we were uncomfortable making a subclass of SimState, and the ErrorRecord class then has sufficiently different semantics from a normal state that it cannot be placed in a stash.

Changes to SimProcedures

The most noticeable difference from the old version to the new version is that the catalog of built-in simprocedures are no longer organized strictly according to which library they live in. Now, they are organized according to which standards they conform to, which helps with re-using procedures between different libraries. For instance, the old SimProcedures['libc.so.6'] has been split up between SIM_PROCEDURES['libc'], SIM_PROCEDURES['posix'], and SIM_PROCEDURES['glibc'], depending on what specifications each function conforms to. This allows us to reuse the libc catalog in msvcrt.dll and the MUSL libc, for example.

In order to group SimProcedures together by libraries, we have introduced a new abstraction called the SimLibrary, the definitions for which are stored in angr.procedures.definitions. Each SimLibrary object stores information about a single shared library, and can contain SimProcedure implementations, calling convention information, and type information. SimLibraries are scraped from the filesystem at import time, just like SimProcedures, and placed into angr.SIM_LIBRARIES.

Syscalls are now categorized through a subclass of SimLibrary called SimSyscallLibrary. The API for managing syscalls through SimOS has been changed - check the API docs for the SimUserspace class.

One important implication of this change is that if you previously used a trick where you changed one of the SimProcedures present in the SimProcedures dict in order to change which SimProcedures would be used to hook over library functions by default, this will no longer work. Instead of SimProcedures[lib][func_name] = proc, you now need to say SIM_LIBRARIES[lib].add(func_name, proc). But really you should just be using hook_symbol anyway.

Changes to hooking

The Hook class is gone. Instead, we now can hook with individual instances of SimProcedure objects, as opposed to just the classes. A shallow copy of the SimProcedure will be made at runtime to preserve thread safety.

So, previously, where you would have done project.hook(addr, Hook(proc, ...)) or project.hook(addr, proc), you can now do project.hook(addr, proc(...)). In order to use simple functions as hooks, you can either say project.hook(addr, func) or decorate the declaration of your function with @project.hook(addr).

Having simprocedures as instances and letting them have access to the project cleans up a lot of other hacks that were present in the codebase, mostly related to the self.call(...) SimProcedure continuation system. It is no longer required to set IS_FUNCTION = True if you intend to use self.call() while writing a SimProcedure, and each call-return target you use will have a unique address associated with it. These addresses will be allocated lazily, which does have the side effect of making address allocation nondeterministic, sometimes based on dictionary-iteration order.

Changes to loading

The hook_symbol method will no longer attempt to redo relocations for the given symbol, instead just hooking directly over the address of the symbol in whatever library it comes from. This speeds up loading substancially and ensures more consistent behavior for when mixing and matching native library code and SimProcedure summaries.

The angr externs object has been moved into CLE, which will ALWAYS make sure that every dependency is resolved to something, never left unrelocated. Similarly, CLE provides the "kernel object" used to provide addresses for syscalls now.

Before After
project._extern_obj loader.extern_object
project._syscall_obj loader.kernel_object

Several properties and methods have been renamed in CLE in order to maintain a more consistent and explicit API. The most common changes are listed below:

Before After
loader.whats_at() loader.describe_addr
loader.addr_belongs_to_object() loader.find_object_containing()
loader.find_symbol_name() loader.find_symbol().name
whatever the hell you were doing before to look up a symbol loader.find_symbol(name or addr)
loader.find_module_name() loader.find_object_containing().provides
loader.find_symbol_got_entry() loader.find_relevant_relocations()
loader.main_bin loader.main_object
anything.get_min_addr() anything.min_addr
symbol.addr symbol.linked_addr

Changes to the solver interface

We cleaned up the menagerie of functions present on state.solver (if you're still referring to it as state.se you should stop) and simplified it into a cleaner interface:

  • solver.eval(expression) will give you one possible solution to the given expression.
  • solver.eval_one(expression) will give you the solution to the given expression, or throw an error if more than one solution is possible.
  • solver.eval_upto(expression, n) will give you up to n solutions to the given expression, returning fewer than n if fewer than n are possible.
  • solver.eval_atleast(expression, n) will give you n solutions to the given expression, throwing an error if fewer than n are possible.
  • solver.eval_exact(expression, n) will give you n solutions to the given expression, throwing an error if fewer or more than are possible.
  • solver.min(expression) will give you the minimum possible solution to the given expression.
  • solver.max(expression) will give you the maximum possible solution to the given expression.

Additionally, all of these methods can take the following keyword arguments:

  • extra_constraints can be passed as a tuple of constraints. These constraints will be taken into account for this evaluation, but will not be added to the state.
  • cast_to can be passed a data type to cast the result to. Currently, this can only be str, which will cause the method to return the byte representation of the underlying data. For example, state.solver.eval(state.solver.BVV(0x41424344, 32, cast_to=str) will return "ABCD".