The release of angr 7 introduces several departures from long-standing angr-isms. While the community has created a compatibility layer to give external code written for angr 6 a good chance of working on angr 7, the best thing to do is to port it to the new version. This document serves as a guide for this.
angr versions up through angr 6 split the program analysis into two modules: simuvex
, which was responsible for analyzing the effects of a single piece of code (whether a basic block or a SimProcedure) on a program state, and angr
, which aggregated analyses of these basic blocks into program-level analysis such as control-flow recovery, symbolic execution, and so forth.
In theory, this would encourage for the encapsulation of block-level analyses, and allow other program analysis frameworks to build upon simuvex
for their needs.
In practice, no one (to our knowledge) used simuvex
without angr
, and the separation introduced frustrating limitations (such as not being able to reference the history of a state from a SimInspect breakpoint) and duplication of code (such as the need to synchronize data from state.scratch
into path.history
).
Realizing that SimuVEX wasn't a usable independent package, we brainstormed about merging it into angr and further noticed that this would allow us to address the frustrations resulting from their separation.
All of the SimuVEX concepts (SimStates, SimProcedures, calling conventions, types, etc) have been migrated into angr. The migration guide for common classes is bellow:
Before | After |
---|---|
simuvex.SimState | angr.SimState |
simuvex.SimProcedure | angr.SimProcedure |
simuvex.SimEngine | angr.SimEngine |
simuvex.SimCC | angr.SimCC |
And for common modules:
Before | After |
---|---|
simuvex.s_cc | angr.calling_conventions |
simuvex.s_state | angr.sim_state |
simuvex.s_procedure | angr.sim_procedure |
simuvex.plugins | angr.state_plugins |
simuvex.engines | angr.engines |
simuvex.concretization_strategies | angr.concretization_strategies |
Additionally, simuvex.SimProcedures
has been renamed to angr.SIM_PROCEDURES
, since it is a global variable and not a class.
There have been some other changes to its semantics, see the section on SimProcedures for details.
In angr, a Path object maintained references to a SimState and its history. The fact that the history was separated from the state caused a lot of headaches when trying to analyze states inside a breakpoint, and caused overhead in synchronizing data from the state to its history.
In the new model, a state's history is maintained in a SimState plugin: state.history
.
Since the path would now simply point to the state, we got rid of it.
The mapping of concepts is roughly as follows:
Before | After |
---|---|
path | state |
path.state | state |
path.history | state.history |
path.callstack | state.callstack |
path.trace | state.history.descriptions |
path.addr_trace | state.history.bbl_addrs |
path.jumpkinds | state.history.jumpkinds |
path.guards | state.history.guards |
path.actions | state.history.actions |
path.events | state.history.events |
path.recent_actions | state.history.recent_actions |
An important behavior change about path.actions
and path.recent_actions
- actions are no longer tracked by default.
If you would like them to be tracked again, please add angr.options.refs
to your state.
Since there are no paths, there cannot be a path group.
Instead, we have a Simulation Manager now (we recommend using the abbreviation "simgr" in places you were previously using "pg"), which is exactly the same as a path group except it holds states instead of paths.
You can make one with project.factory.simulation_manager(...)
.
Before, error resilience was handled at the path level, where stepping a path that caused an error would return a subclass of Path called ErroredPath, and these paths would be put in the errored
stash of a path group.
Now, error resilience is handled at the simulation manager level, and any state that throws an error during stepping will be wrapped in an ErrorRecord object, which is not a subclass of SimState, and put into the errored
list attribute of the simulation manager, which is not a stash.
An ErrorRecord object has attributes for .state
(the initial state that caused the error), .error
(the error that was thrown), and .traceback
(the traceback from the error).
To debug these errors you can call .debug()
.
These changes are because we were uncomfortable making a subclass of SimState, and the ErrorRecord class then has sufficiently different semantics from a normal state that it cannot be placed in a stash.
The most noticeable difference from the old version to the new version is that the catalog of built-in simprocedures are no longer organized strictly according to which library they live in.
Now, they are organized according to which standards they conform to, which helps with re-using procedures between different libraries.
For instance, the old SimProcedures['libc.so.6']
has been split up between SIM_PROCEDURES['libc']
, SIM_PROCEDURES['posix']
, and SIM_PROCEDURES['glibc']
, depending on what specifications each function conforms to.
This allows us to reuse the libc
catalog in msvcrt.dll
and the MUSL libc, for example.
In order to group SimProcedures together by libraries, we have introduced a new abstraction called the SimLibrary, the definitions for which are stored in angr.procedures.definitions
.
Each SimLibrary object stores information about a single shared library, and can contain SimProcedure implementations, calling convention information, and type information.
SimLibraries are scraped from the filesystem at import time, just like SimProcedures, and placed into angr.SIM_LIBRARIES
.
Syscalls are now categorized through a subclass of SimLibrary called SimSyscallLibrary. The API for managing syscalls through SimOS has been changed - check the API docs for the SimUserspace class.
One important implication of this change is that if you previously used a trick where you changed one of the SimProcedures present in the SimProcedures
dict in order to change which SimProcedures would be used to hook over library functions by default, this will no longer work.
Instead of SimProcedures[lib][func_name] = proc
, you now need to say SIM_LIBRARIES[lib].add(func_name, proc)
.
But really you should just be using hook_symbol
anyway.
The Hook
class is gone.
Instead, we now can hook with individual instances of SimProcedure objects, as opposed to just the classes.
A shallow copy of the SimProcedure will be made at runtime to preserve thread safety.
So, previously, where you would have done project.hook(addr, Hook(proc, ...))
or project.hook(addr, proc)
, you can now do project.hook(addr, proc(...))
.
In order to use simple functions as hooks, you can either say project.hook(addr, func)
or decorate the declaration of your function with @project.hook(addr)
.
Having simprocedures as instances and letting them have access to the project cleans up a lot of other hacks that were present in the codebase, mostly related to the self.call(...)
SimProcedure continuation system.
It is no longer required to set IS_FUNCTION = True
if you intend to use self.call()
while writing a SimProcedure, and each call-return target you use will have a unique address associated with it.
These addresses will be allocated lazily, which does have the side effect of making address allocation nondeterministic, sometimes based on dictionary-iteration order.
The hook_symbol
method will no longer attempt to redo relocations for the given symbol, instead just hooking directly over the address of the symbol in whatever library it comes from.
This speeds up loading substancially and ensures more consistent behavior for when mixing and matching native library code and SimProcedure summaries.
The angr externs object has been moved into CLE, which will ALWAYS make sure that every dependency is resolved to something, never left unrelocated. Similarly, CLE provides the "kernel object" used to provide addresses for syscalls now.
Before | After |
---|---|
project._extern_obj |
loader.extern_object |
project._syscall_obj |
loader.kernel_object |
Several properties and methods have been renamed in CLE in order to maintain a more consistent and explicit API. The most common changes are listed below:
Before | After |
---|---|
loader.whats_at() |
loader.describe_addr |
loader.addr_belongs_to_object() |
loader.find_object_containing() |
loader.find_symbol_name() |
loader.find_symbol().name |
whatever the hell you were doing before to look up a symbol | loader.find_symbol(name or addr) |
loader.find_module_name() |
loader.find_object_containing().provides |
loader.find_symbol_got_entry() |
loader.find_relevant_relocations() |
loader.main_bin |
loader.main_object |
anything.get_min_addr() |
anything.min_addr |
symbol.addr |
symbol.linked_addr |
We cleaned up the menagerie of functions present on state.solver
(if you're still referring to it as state.se
you should stop) and simplified it into a cleaner interface:
solver.eval(expression)
will give you one possible solution to the given expression.solver.eval_one(expression)
will give you the solution to the given expression, or throw an error if more than one solution is possible.solver.eval_upto(expression, n)
will give you up to n solutions to the given expression, returning fewer than n if fewer than n are possible.solver.eval_atleast(expression, n)
will give you n solutions to the given expression, throwing an error if fewer than n are possible.solver.eval_exact(expression, n)
will give you n solutions to the given expression, throwing an error if fewer or more than are possible.solver.min(expression)
will give you the minimum possible solution to the given expression.solver.max(expression)
will give you the maximum possible solution to the given expression.
Additionally, all of these methods can take the following keyword arguments:
extra_constraints
can be passed as a tuple of constraints. These constraints will be taken into account for this evaluation, but will not be added to the state.cast_to
can be passed a data type to cast the result to. Currently, this can only bestr
, which will cause the method to return the byte representation of the underlying data. For example,state.solver.eval(state.solver.BVV(0x41424344, 32, cast_to=str)
will return"ABCD"
.