Translation validation of x86 and decompiled LLVM IR

Problem

An important aspect of decompilation is semantic translation of recovered CFG (from binary) to a suitable IR (e.g. LLVM IR). There are decompilers like McSema, Fcd which are open sourced and output a compiler IR (LLVM IR).

The semantic translation has few problems:

Difficult to extend: If the semantic translation rules are manually specified (which is true for above mentioned tools), then extension to new/unsupported instructions is a pain.
The semantic translation could be buggy.

For example,

In decompilation tools like McSema, the semantic translation rules (from dis-assembled X86 instructions to LLVM IR) are manually specified (by human inspection of the manuals). McSema uses validation tests to test those translations( hence lacks exhaustiveness ). Problems are:
- Difficult to extend the instruction set.
- The semantic translation could be prone to mistake because of mis-interpretation of documentation or a bug in the documentation.
Tools like Strata (3) uses stochastic search based on tests & SMT solvers to automatically generate semantics of dis-assembled X86 instruction. Such tools can be pluged-in to decompilation tools like McSema, as a replacement of McSema's manually specified semantic translation(4). This solves the extension problems to an extend but are NOT guaranteed to generate accurate semantics.

Proposal

The current proposal (1) is about proving equivalence of two programs (say in LLVM IR language) which are generated by different semantic translation tools (e.g McSema's manually specified semantics Vs Strata's automatic generated semantics ) using KEQ(2).

Let's consider the following two example starting with a simpler one.

Compare semantic translation of McSema Vs semantic translation of Fcd (another decompilaton tool; alike McSema, its semantic rules are also manually specified ).

For each input program

Feed KEQ the following 3 inputs:
- Take the LLVM IR generated by McSema
- Take the LLVM IR generated by Fcd
- The K semantics for LLVM
Use KEQ to proof equivalence the McSema generated IR and Fcd generated IR.
In case the programs are not equivalent we may use the counter example to check which semantic translation is correct by running the counter example on CPU.

Compare manually crafted semantic rules of McSema (lets call in McSema semantics) with the auto generated semantic rules of Strata (Strata Semantics).

We could have two approaches here:

Approach 1

Create a decompilation tool D' by extending McSema so that it will use Strata's auto generated X86 semantics as a drop in replacement of its manually specified one.

For each input program

Feed KEQ the following 3 inputs:
- Take the LLVM IR generated by McSema
- Take the LLVM IR generated by D'
- The K semantics for LLVM
Use KEQ to proof equivalence of the two IRs.
In case the programs are not equivalent we may use the counter example to check which semantic translation is correct by running the counter example on CPU.

Approach 2

For each input program

Feed KEQ the following 4 inputs:
- Take the LLVM IR generated by McSema
- The the input x86 dis-assembled program.
- LLVM semantics in K.
- Strata's x86 semantics in K.
Use KEQ to prof equivalence of x86 disassembled program (accompanied with the Strata semantics in K) and McSema generated LLVM IR (with the LLVM IR semantics in K).
In case the programs are not equivalent we may use the counter example to check which semantic translation is correct by running the counter example on CPU.

Feedback/Comments

Even though here we are trusting the disassembler for converting the x86 to assembly, but that trust is for both the comparison candidates. That way a bug in disassembly will be manifested equally in both the compared semantics.
In this proposal, I refrained from using translation because I am trying to proof equivalence of 2 programs which are not related to "one translates to other" rather they are generated by different translations. Having said that, pointing out pairs of corresponding sync points in two programs might be difficult (as KEQ requires that for its functionality). But the reason I am optimistic is because the semantic translation in de-compilation process in not like the compilation process ( which does invoke a lot of optimizations and thus need external hints about the sync points) but rather they map 1-1 from dis-assembled output to LLVM IR, so I hope find corresponding sync points in those 2 LLVM IRs might not be that difficult.

Footnotes

Initially we thought about validating the entire decompilation of McSema (which consist of x86 --> dis-assembly + semantic translation --> LLVM IR) by trusting the strata semantics, which does not seem to be the right choice because of reasons mentioned below.
- How can we trust Strata's semantics as they are also not guaranteed to produce accurate semantics.
- Also another problem was to trust the dis-assembler in decompilation tools like McSema. The reason is Strata gives the semantics of assembly programs, so using the tool KEQ we can "translation validate" Assembly programs and corresponding LLVM IR programs, but this misses the dis-assembly which is the very first step of decompilation.
KEQ: K framework based language independent translation validation tool like KEQ. Even though the tool in its current form does not generate counter examples in the event when the input programs are not equivalent but that is not a fundamental limitation. KEQ as a tool needs 4 inputs:
1. The K semantic of language 1
2. The K semantic of language 2 (Language 1 & 2 may be the same)
3. Two programs in languages 1 and 2 to validate the translation of one program to the other. KEQ can even handle programs in same language.
Strata: Strata in its current form specify the semantics of a complex assembly instruction in terms of a sequence of simpler assembly instructions ( whose semantics are known ).
For plugin in Strata in McSema, Strata could be extended so that it will generated LLVM IR instead of assembly instructions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Translation validation of x86 and decompiled LLVM IR

Problem

Proposal

Compare semantic translation of McSema Vs semantic translation of Fcd (another decompilaton tool; alike McSema, its semantic rules are also manually specified ).

Compare manually crafted semantic rules of McSema (lets call in McSema semantics) with the auto generated semantic rules of Strata (Strata Semantics).

Approach 1

Approach 2

Feedback/Comments

Footnotes

Clone this wiki locally