Organize api docs #398

Merged
merged 5 commits into from
Aug 11, 2023
3 changes: 3 additions & 0 deletions .readthedocs.yml
Expand Up @@ -17,3 +17,6 @@ python:
path: .
extra_requirements:
- docs

sphinx:
fail_on_warning: true
5 changes: 2 additions & 3 deletions README.md
Expand Up @@ -95,9 +95,8 @@ There are several backends that can be used to load a single file:
- PE is a backend to load Microsoft's Portable Executable format,
effectively Windows binaries. It uses the (optional) `pefile` module.

- Mach-O is a backend to load, you guessed it, Mach-O binaries. It is
subject to several limitations, which you can read about in the
[readme in the macho directory](backends/macho/README.md)
- Mach-O is a backend to load, you guessed it, Mach-O binaries. Support is
limited for this backend.

- Blob is a backend to load unknown data. It requires that you specify
the architecture it would be run on, in the form of a class from
Expand Down
21 changes: 0 additions & 21 deletions cle/backends/macho/README.md

This file was deleted.

16 changes: 8 additions & 8 deletions cle/backends/macho/macho.py
Expand Up @@ -75,16 +75,16 @@ def get_by_name_and_ordinal(self, name: str, ordinal: int, include_stab=False) -
class MachO(Backend):
"""
Mach-O binaries for CLE
-----------------------

The Mach-O format is notably different from other formats, as such:
* Sections are always part of a segment, self.sections will thus be empty
* Symbols cannot be categorized like in ELF
* Symbol resolution must be handled by the binary
* Rebasing in dyld is implemented via adding a small slide to addresses inside the binary, instead of
changing the base address of the binary and the addresses being relative. CLE needs relative addresses,
so there are a lot of AT.from_lva().to_rva() calls in this backend.
The Mach-O format is notably different from other formats. Specifically:

* ...
- Sections are always part of a segment, so `self.sections` will be empty.
- Symbols cannot be categorized like in ELF.
- Symbol resolution must be handled by the binary.
- Rebasing in dyld is implemented by adding a small slide to addresses inside the binary, instead of
changing the base address of the binary. Consequently, the addresses are absolute rather than relative.
CLE requires relative addresses, leading to numerous `AT.from_lva().to_rva()` calls in this backend.
"""

is_default = True # Tell CLE to automatically consider using the MachO backend
Expand Down
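The rebasing point in the docstring above can be illustrated with a small sketch. The helper names and the example addresses below are hypothetical and are not CLE's `AT` implementation; they only show, under that assumption, why sliding an absolute address and rebasing a relative address yield the same runtime address.

```python
# Hypothetical helpers for illustration only; CLE's real conversion is done by
# the AT.from_lva().to_rva() calls mentioned in the docstring.

def lva_to_rva(lva: int, linked_base: int) -> int:
    """Turn a linked (absolute) virtual address into an offset from the linked base."""
    return lva - linked_base


def rebase(rva: int, mapped_base: int) -> int:
    """Relative-address model: runtime address = new base + relative offset."""
    return mapped_base + rva


def apply_slide(lva: int, slide: int) -> int:
    """dyld-style model: runtime address = absolute address + slide."""
    return lva + slide


# Assumed example values, not taken from any real binary.
linked_base = 0x100000000
symbol_lva = 0x100003F50
slide = 0x4000

symbol_rva = lva_to_rva(symbol_lva, linked_base)

# Both models agree once the slide is folded into the base address.
runtime_via_slide = apply_slide(symbol_lva, slide)
runtime_via_rebase = rebase(symbol_rva, linked_base + slide)
```

Storing the relative form means only the base has to change when an object is mapped somewhere else, which is presumably why the backend converts addresses as it reads them.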
108 changes: 55 additions & 53 deletions cle/loader.py
Expand Up @@ -55,53 +55,6 @@ class Loader:
"""
The loader loads all the objects and exports an abstraction of the memory of the process. What you see here is an
address space with loaded and rebased binaries.

:param main_binary: The path to the main binary you're loading, or a file-like object with the binary
in it.

The following parameters are optional.

:param auto_load_libs: Whether to automatically load shared libraries that loaded objects depend on.
:param load_debug_info: Whether to automatically parse DWARF data and search for debug symbol files.
:param concrete_target: Whether to instantiate a concrete target for a concrete execution of the process.
If so, a SimConcreteEngine that wraps the ConcreteTarget provided by the user
must be instantiated.
:param force_load_libs: A list of libraries to load regardless of whether they're required by a loaded object.
:param skip_libs: A list of libraries to never load, even if they're required by a loaded object.
:param main_opts: A dictionary of options to be used loading the main binary.
:param lib_opts: A dictionary mapping library names to the dictionaries of options to be used when
loading them.
:param ld_path: A list of paths in which we can search for shared libraries.
:param use_system_libs: Whether or not to search the system load path for requested libraries. Default True.
:param ignore_import_version_numbers:
Whether libraries with different version numbers in the filename will be considered
equivalent, for example libc.so.6 and libc.so.0
:param case_insensitive: If this is set to True, filesystem loads will be done case-insensitively regardless of
the case-sensitivity of the underlying filesystem.
:param rebase_granularity: The alignment to use for rebasing shared objects
:param except_missing_libs: Throw an exception when a shared library can't be found.
:param aslr: Load libraries in symbolic address space. Do not use this option.
:param page_size: The granularity with which data is mapped into memory. Set to 0x1000 if you are working
in an environment where data will always be memory mapped in a page-granular way.
:param preload_libs: Similar to `force_load_libs` but will provide for symbol resolution, with precedence
over any dependencies.
:ivar memory: The loaded, rebased, and relocated memory of the program.
:vartype memory: cle.memory.Clemory
:ivar main_object: The object representing the main binary (i.e., the executable).
:ivar shared_objects: A dictionary mapping loaded library names to the objects representing them.
:ivar all_objects: A list containing representations of all the different objects loaded.
:ivar requested_names: A set containing the names of all the different shared libraries that were marked as a
dependency by somebody.
:ivar initial_load_objects: A list of all the objects that were loaded as a result of the initial load request.

When reference is made to a dictionary of options, it requires a dictionary with zero or more of the following keys:

- backend : "elf", "pe", "mach-o", "blob" : which loader backend to use
- arch : The archinfo.Arch object to use for the binary
- base_addr : The address to rebase the object at
- entry_point : The entry point to use for the object

More keys are defined on a per-backend basis.
"""

def __init__(
Expand All @@ -126,6 +79,55 @@ def __init__(
preload_libs: Iterable[Union[str, BinaryIO, Path]] = (),
arch: Union[archinfo.Arch, str, None] = None,
):
"""
:param main_binary: The path to the main binary you're loading, or a file-like object with the binary
in it.

:param auto_load_libs: Whether to automatically load shared libraries that loaded objects depend on.
:param load_debug_info: Whether to automatically parse DWARF data and search for debug symbol files.
:param concrete_target: Whether to instantiate a concrete target for a concrete execution of the process.
If so, a SimConcreteEngine that wraps the ConcreteTarget provided by the user
must be instantiated.
:param force_load_libs: A list of libraries to load regardless of whether they're required by a loaded object.
:param skip_libs: A list of libraries to never load, even if they're required by a loaded object.
:param main_opts: A dictionary of options to be used loading the main binary.
:param lib_opts: A dictionary mapping library names to the dictionaries of options to be used when
loading them.
:param ld_path: A list of paths in which we can search for shared libraries.
:param use_system_libs: Whether or not to search the system load path for requested libraries. Default True.
:param ignore_import_version_numbers:
Whether libraries with different version numbers in the filename will be considered
equivalent, for example libc.so.6 and libc.so.0.
:param case_insensitive: If this is set to True, filesystem loads will be done case-insensitively regardless
of the case-sensitivity of the underlying filesystem.
:param rebase_granularity: The alignment to use for rebasing shared objects.
:param except_missing_libs: Throw an exception when a shared library can't be found.
:param aslr: Load libraries in symbolic address space. Do not use this option.
:param page_size: The granularity with which data is mapped into memory. Set to 0x1000 if you are
working in an environment where data will always be memory mapped in a page-granular
way.
:param preload_libs: Similar to `force_load_libs` but will provide for symbol resolution, with precedence
over any dependencies.

:ivar memory: The loaded, rebased, and relocated memory of the program.
:vartype memory: cle.memory.Clemory
:ivar main_object: The object representing the main binary (i.e., the executable).
:ivar shared_objects: A dictionary mapping loaded library names to the objects representing them.
:ivar all_objects: A list containing representations of all the different objects loaded.
:ivar requested_names: A set containing the names of all the different shared libraries that were marked as
a dependency by somebody.
:ivar initial_load_objects: A list of all the objects that were loaded as a result of the initial load request.

When reference is made to a dictionary of options, it requires a dictionary with zero or more of the following
keys:

- backend : "elf", "pe", "mach-o", "blob" : which loader backend to use
- arch : The archinfo.Arch object to use for the binary
- base_addr : The address to rebase the object at
- entry_point : The entry point to use for the object

More keys are defined on a per-backend basis.
"""
if hasattr(main_binary, "seek") and hasattr(main_binary, "read"):
self._main_binary_path = None
self._main_binary_stream = main_binary
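The options dictionaries described in the docstring above can be sketched as follows. This is a minimal illustration, assuming the keyword names listed in the docstring; `split_options`, `load`, the library name, and the addresses are hypothetical. `cle` is imported lazily so that the dictionary handling can be read and run without CLE installed.

```python
from typing import Any, Dict, Tuple

# The four keys the docstring lists as common to every backend.
COMMON_OPTION_KEYS = {"backend", "arch", "base_addr", "entry_point"}


def split_options(opts: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate the common keys from backend-specific ones."""
    common = {k: v for k, v in opts.items() if k in COMMON_OPTION_KEYS}
    extra = {k: v for k, v in opts.items() if k not in COMMON_OPTION_KEYS}
    return common, extra


# Options for the main binary (addresses are made-up example values).
main_opts = {"backend": "elf", "base_addr": 0x400000}

# Per-library options, keyed by library name.
lib_opts = {"libc.so.6": {"base_addr": 0x7000000}}

loader_kwargs = {
    "auto_load_libs": False,
    "main_opts": main_opts,
    "lib_opts": lib_opts,
}


def load(path: str):
    """Build a Loader from the kwargs above; requires CLE to be installed."""
    import cle  # lazy import keeps the rest of the sketch dependency-free

    return cle.Loader(path, **loader_kwargs)
```

Keeping the kwargs in a plain dictionary makes it easy to reuse the same configuration for several binaries.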
Expand Down Expand Up @@ -289,12 +291,12 @@ def extern_object(self) -> ExternObject:

proposed model for how multiple extern objects should work:

1) extern objects are a linked list. the one in loader._extern_object is the head of the list
2) each round of explicit loads generates a new extern object if it has unresolved dependencies. this object
has exactly the size necessary to hold all its exports.
3) All requests for size are passed down the chain until they reach an object which has the space to service
it or an object which has not yet been mapped. If all objects have been mapped and are full, a new extern
object is mapped with a fixed size.
1) Extern objects are a linked list. The one in loader._extern_object is the head of the list.
2) Each round of explicit loads generates a new extern object if it has unresolved dependencies. This object
has exactly the size necessary to hold all its exports.
3) All requests for size are passed down the chain until they reach an object which has the space to service
it or an object which has not yet been mapped. If all objects have been mapped and are full, a new extern
object is mapped with a fixed size.
"""
if self._extern_object is None:
if self.main_object.arch.bits < 32:
Expand Down
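Points 1 and 3 of the proposed extern-object model can be sketched as a simple chain allocator. This is an illustrative toy, not CLE's implementation: `ExternNode`, `allocate`, and `FIXED_EXTERN_SIZE` are hypothetical names, and the sketch ignores the distinction between mapped and not-yet-mapped objects.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FIXED_EXTERN_SIZE = 0x8000  # assumed size for newly mapped extern objects


@dataclass
class ExternNode:
    """One extern object in the chain, with a simple bump allocator."""

    size: int
    used: int = 0
    next: Optional["ExternNode"] = None

    def try_allocate(self, request: int) -> Optional[int]:
        """Return the offset of the new allocation, or None if it does not fit."""
        if self.used + request > self.size:
            return None
        offset = self.used
        self.used += request
        return offset


def allocate(head: ExternNode, request: int) -> Tuple[int, int]:
    """Pass the request down the chain; return (node index, offset within node)."""
    node, index = head, 0
    while True:
        offset = node.try_allocate(request)
        if offset is not None:
            return index, offset
        if node.next is None:
            # Every object is full: append a new fixed-size object
            # (grown if a single request exceeds the fixed size).
            node.next = ExternNode(size=max(FIXED_EXTERN_SIZE, request))
        node, index = node.next, index + 1


head = ExternNode(size=0x10)
first = allocate(head, 0x8)
second = allocate(head, 0x8)
third = allocate(head, 0x8)  # head is full, so this lands in a new node
```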
Empty file added docs/_static/.gitkeep
Empty file.
128 changes: 0 additions & 128 deletions docs/api.rst

This file was deleted.

13 changes: 13 additions & 0 deletions docs/api/backend.rst
@@ -0,0 +1,13 @@
Backend Interface
=================

.. automodule:: cle.backends.backend
.. automodule:: cle.backends.symbol
.. automodule:: cle.backends.regions
.. automodule:: cle.backends.region
.. automodule:: cle.backends.named_region

.. automodule:: cle.backends.externs
.. automodule:: cle.backends.externs.simdata
.. automodule:: cle.backends.externs.simdata.simdata
.. automodule:: cle.backends.externs.simdata.common
4 changes: 4 additions & 0 deletions docs/api/backends/binja.rst
@@ -0,0 +1,4 @@
Binary Ninja
============

.. automodule:: cle.backends.binja
4 changes: 4 additions & 0 deletions docs/api/backends/blob.rst
@@ -0,0 +1,4 @@
Blob
====

.. automodule:: cle.backends.blob
6 changes: 6 additions & 0 deletions docs/api/backends/cgc.rst
@@ -0,0 +1,6 @@
CGC
===

.. automodule:: cle.backends.cgc
.. automodule:: cle.backends.cgc.cgc
.. automodule:: cle.backends.cgc.backedcgc
23 changes: 23 additions & 0 deletions docs/api/backends/elf.rst
@@ -0,0 +1,23 @@
ELF Backend
===========

.. autoclass:: cle.backends.ELF

.. autoclass:: cle.backends.elf.ELFCore

.. autoclass:: cle.backends.elf.MetaELF
.. autoclass:: cle.backends.elf.metaelf.Relro

.. autoclass:: cle.backends.elf.symbol.ELFSymbol
.. autoclass:: cle.backends.elf.symbol_type.ELFSymbolType

.. autoclass:: cle.backends.elf.regions.ELFSegment
.. autoclass:: cle.backends.elf.regions.ELFSection

.. automodule:: cle.backends.elf.variable
.. automodule:: cle.backends.elf.variable_type

.. automodule:: cle.backends.elf.lsda
.. automodule:: cle.backends.elf.hashtable
.. automodule:: cle.backends.elf.subprogram
.. automodule:: cle.backends.elf.compilation_unit
4 changes: 4 additions & 0 deletions docs/api/backends/ihex.rst
@@ -0,0 +1,4 @@
IHex
====

.. automodule:: cle.backends.ihex
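The new pages use bare `automodule` and `autoclass` directives, which only emit member documentation if the Sphinx configuration supplies default options. The excerpt below is a hypothetical `docs/conf.py` fragment showing one way that could be set up; the project's actual configuration is not part of this diff and may differ.

```python
# Hypothetical docs/conf.py excerpt; the real CLE configuration may differ.

# autodoc provides the automodule/autoclass directives used in docs/api/.
extensions = ["sphinx.ext.autodoc"]

# Defaults applied to every autodoc directive that sets no options itself.
autodoc_default_options = {
    "members": True,
    "undoc-members": True,
    "show-inheritance": True,
}
```

With `fail_on_warning: true` set in `.readthedocs.yml`, a module path that fails to import would presumably break the build rather than produce an empty page.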