RISCV PLT call causes subsequent instructions to be lost. #1606

Open
matt-j-griffin opened this issue Jun 11, 2024 · 1 comment

@matt-j-griffin

I've been using BAP to analyze a RISC-V build of cURL (libcurl.4.4.0).

Calling llvm-objdump on the binary results in this dump.

Generating BIL for the same binary using bap libcurl.4.4.0 -dbil.adt produces this file.

In the BIL output, once a jal instruction appears in a subroutine, all subsequent instructions of that subroutine are lost. In these cases, jal is used to call PLT stubs in the binary.

An example can be found in the curl_easy_getinfo subroutine given below:

000000000001be2c <curl_easy_getinfo>:
   1be2c: 5d 71        	addi	sp, sp, -0x50
   1be2e: 3e fc        	sd	a5, 0x38(sp)
   1be30: 1c 10        	addi	a5, sp, 0x20
   1be32: 06 ec        	sd	ra, 0x18(sp)
   1be34: 32 f0        	sd	a2, 0x20(sp)
   1be36: 36 f4        	sd	a3, 0x28(sp)
   1be38: 3a f8        	sd	a4, 0x30(sp)
   1be3a: c2 e0        	sd	a6, 0x40(sp)
   1be3c: c6 e4        	sd	a7, 0x48(sp)
   1be3e: 3e e4        	sd	a5, 0x8(sp)
   1be40: ef e0 ef ab  	jal	0x1a0fe <Curl_getinfo>
   1be44: e2 60        	ld	ra, 0x18(sp)
   1be46: 61 61        	addi	sp, sp, 0x50
   1be48: 82 80        	ret

The BIL for this subroutine is as follows:

1be2c: <curl_easy_getinfo>
1be2c:
1be2c: addi sp, sp, -0x50
(Move(Var("X2",Imm(64)),PLUS(Var("X2",Imm(64)),Int(18446744073709551536,64))))
1be2e: sd a5, 0x38(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(56,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1be30: addi a5, sp, 0x20
(Move(Var("X15",Imm(64)),PLUS(Var("X2",Imm(64)),Int(32,64))))
1be32: sd ra, 0x18(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(24,64)),Var("X1",Imm(64)),LittleEndian(),64)))
1be34: sd a2, 0x20(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(32,64)),Var("X12",Imm(64)),LittleEndian(),64)))
1be36: sd a3, 0x28(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(40,64)),Var("X13",Imm(64)),LittleEndian(),64)))
1be38: sd a4, 0x30(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(48,64)),Var("X14",Imm(64)),LittleEndian(),64)))
1be3a: sd a6, 0x40(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(64,64)),Var("X16",Imm(64)),LittleEndian(),64)))
1be3c: sd a7, 0x48(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(72,64)),Var("X17",Imm(64)),LittleEndian(),64)))
1be3e: sd a5, 0x8(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(8,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1be40: jal -0x1d42
(Move(Var("X1",Imm(64)),Int(114244,64)), Jmp(Int(106750,64)))

The instructions at 1be44, 1be46, and 1be48 (ld, addi, and ret) do not appear in the BIL output.
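To find every dropped instruction rather than checking by eye, the two listings can be compared by address. The following is a minimal sketch, assuming the llvm-objdump and -dbil.adt line formats look like the excerpts above; the helper name `missing_addresses` and both regular expressions are illustrative and not part of BAP or LLVM.

```python
import re

# objdump instruction line: "   1be2c: 5d 71   addi sp, sp, -0x50"
# (address, colon, then at least one two-digit hex byte).
OBJDUMP_INSN = re.compile(r"^[ \t]*([0-9a-f]+):\s+(?:[0-9a-f]{2}\s)+", re.M)
# BIL listing line: "1be2c: addi sp, sp, -0x50"
# (address, colon, then a mnemonic; label lines such as "1be2c: <name>" are skipped).
BIL_INSN = re.compile(r"^([0-9a-f]+):[ \t]+[a-z]", re.M)


def missing_addresses(objdump_text, bil_text):
    """Return addresses disassembled by objdump that have no line in the BIL listing."""
    dumped = [int(a, 16) for a in OBJDUMP_INSN.findall(objdump_text)]
    lifted = {int(a, 16) for a in BIL_INSN.findall(bil_text)}
    return [a for a in dumped if a not in lifted]


# Tail of curl_easy_getinfo, copied from the listings above.
OBJDUMP = """\
000000000001be2c <curl_easy_getinfo>:
   1be3e: 3e e4         sd a5, 0x8(sp)
   1be40: ef e0 ef ab   jal 0x1a0fe <Curl_getinfo>
   1be44: e2 60         ld ra, 0x18(sp)
   1be46: 61 61         addi sp, sp, 0x50
   1be48: 82 80         ret
"""

BIL = """\
1be2c: <curl_easy_getinfo>
1be2c:
1be3e: sd a5, 0x8(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(8,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1be40: jal -0x1d42
(Move(Var("X1",Imm(64)),Int(114244,64)), Jmp(Int(106750,64)))
"""

missing = missing_addresses(OBJDUMP, BIL)
print([hex(a) for a in missing])
```

Running the same function over the full dump and the full BIL file would list every address that was disassembled but never lifted, which should show whether the loss always begins right after a jal.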

Is there a workaround?

@bmourad01

Playing with bap mc, I can see that we get the following output:

$ bap mc --show-knowledge --addr 0x1be40 --target riscv64 -- ef e0 ef ab 
(in-package user)
(in-class core:program)
(bap:start-pseudo-node
  ((core:label-aliases (start-pseudo-node))
   (core:label-name (start-pseudo-node))))
(bap:exit-pseudo-node
  ((core:label-aliases (exit-pseudo-node))
   (core:label-name (exit-pseudo-node))))
(<0xa>
  ((bap:lisp-args
    ((((lisp-symbol (X1)) (bap:exp X1))
      ((bap:static-value (0xffffffffffffe2be)) (bap:exp 0xFFFFFFFFFFFFE2BE)))))
   (bap:lisp-name (llvm-riscv64:JAL))
   (primus:attributes ((core:context (context (target riscv)))))
   (bap:insn ((JAL X1 -0x1d42)))
   (bap:mem ("1be40: ef e0 ef ab"))
   (core:semantics
    ((bap:ir-graph "0000000d:
                    0000000e: X1 := 0x1BE44
                    00000011: goto %0000000f")
     (bap:insn-dests ((15)))
     (bap:insn-ops ((X1 -7490)))
     (bap:insn-asm "jal -7490")
     (bap:insn-opcode JAL)
     (bap:insn-properties
      ((:invalid false)
       (:jump true)
       (:cond false)
       (:indirect false)
       (:call false)
       (:return false)
       (:barrier true)
       (:affect-control-flow true)
       (:load false)
       (:store false)))
     (bap:bir (%0000000d))
     (bap:bil "{
                 X1 := 0x1BE44
                 jmp 0x1A0FE
               }")
     (core:insn-code ("ef e0 ef ab"))))
   (core:label-addr (0x1be40))
   (core:label-unit (3))
   (core:encoding bap:llvm-riscv64)))
(0x1a0fe ((core:label-addr (0x1a0fe)) (core:label-unit (3))))
(in-class bap:toplevel)
(bap:main
  ((bap:insn13 <opaque>)
   (bap:target-and-encoding12 <opaque>)
   (bap:last2 <opaque>)))
(in-class core:theory)
(core-internal:'(bap\:bir bap\:jump-dests bap\:bil-fp-emu)
  ((core:instance
    ((bap:bir bap:bil core:empty bap:jump-dests bap:bil-fp-emu)))))
(core-internal:'(bap\:bil-fp-emu)
  ((core:instance
    ((bap:bil core:empty bap:bil-fp-emu)
     "semantics in BIL, including FP emulation"))))
(core-internal:'(bap\:jump-dests)
  ((core:instance
    ((core:empty bap:jump-dests) "an approximation of jump destinations"))))
(core-internal:'(bap\:bir)
  ((core:instance
    ((bap:bir core:empty) "Builds the graphical representation of a program."))))
(in-class core:unit)
(unit
  ((bap:primus-lisp-context
    (context (patterns enabled) (x86-floating-points intrinsic-semantics)))
   (core:unit-source
    ((bap:typed-program (<opaque>))
     (bap:primus-lisp-program <opaque>)
     (core:source-language bap:primus-lisp)))
   (core:unit-target bap:riscv64)))

Specifically, we have (:call false) in bap:insn-properties. My hypothesis is that the fall-through edge from 1be40 to 1be44 is not present in the intraprocedural CFG for curl_easy_getinfo, since BAP doesn't recognize that this jal is a function call. Can you verify that the address 1be44 is part of the whole-program disassembly?
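As a cross-check of the values in the output above, the four bytes at 1be40 can be decoded by hand using the J-type immediate layout from the RISC-V base ISA. This is only a sketch of the arithmetic, not BAP's lifter. The helper names `decode_jal` and `looks_like_call` are hypothetical, and treating a write to x1 (ra) or x5 (t0) as a call is an assumption based on the standard RISC-V calling convention, not something BAP is confirmed to do.

```python
def decode_jal(word):
    """Decode a 32-bit RISC-V JAL word into (rd, signed byte offset)."""
    if word & 0x7F != 0x6F:
        raise ValueError("not a JAL instruction")
    rd = (word >> 7) & 0x1F
    # The J-type immediate is scattered across the word; reassemble it.
    imm = (
        ((word >> 31) & 0x1) << 20      # imm[20]
        | ((word >> 12) & 0xFF) << 12   # imm[19:12]
        | ((word >> 20) & 0x1) << 11    # imm[11]
        | ((word >> 21) & 0x3FF) << 1   # imm[10:1]
    )
    if imm & (1 << 20):                 # sign-extend the 21-bit value
        imm -= 1 << 21
    return rd, imm


def looks_like_call(rd):
    # Assumption: x1 (ra) and x5 (t0) are the link registers of the
    # standard calling convention, so a jal writing either is a call.
    return rd in (1, 5)


pc = 0x1BE40
word = int.from_bytes(bytes.fromhex("ef e0 ef ab"), "little")
rd, offset = decode_jal(word)
target = pc + offset      # where the jal transfers control
fallthrough = pc + 4      # value written to rd; the callee returns here

print(f"rd=x{rd} offset={offset:#x} target={target:#x} fallthrough={fallthrough:#x}")
print("call by link-register heuristic:", looks_like_call(rd))
```

The decoded offset, target, and link value agree with JAL X1 -0x1d42, jmp 0x1A0FE, and X1 := 0x1BE44 in the output above. Under the stated heuristic the instruction writes ra, so a CFG builder that treated it as a call would keep 1be44 reachable as the return site.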
