Bugs Found

Strata bug

From http://www.programgo.com/article/4441958781/ the relevant cases are the ones marked with ** below; they show that sub(A, 0) == A but add(A, 0) != A.

** 0.0 - 0.0  ->  0.0
(-0.0) - (-0.0)  ->  0.0
(-0.0) + (-0.0)  ->  -0.0
** (-0.0) + 0.0  ->  0.0
** 0.0 + (-0.0)  ->  0.0
** (-0.0) - 0.0  ->  -0.0
/home/sdasgup3/Github/strata/stoke/bin/specgen compare --circuit_dir /home/sdasgup3/Github/strata-data/circuits --opcode extractps_r64_xmm_imm8_66
Circuit for 'extractps $0x42, %xmm1, %rbx' (opcode extractps_r64_xmm_imm8_66)
not equivalent for '%rbx':
  strata:        0x0₃₂ ∘ sub_double(%ymm1[127:64], 0x0₆₄)[31:0]
  hand-written:  0x0₃₂ ∘ (%ymm1[127:0] >> (0x0₁₂₆ ∘ 0x2₂ << 0x5₁₂₈))[31:0]

What if we provide the input %ymm1[127:64] == -0.0? sub is not a problem, since sub_double(A, 0x0) == A holds even for A == -0.0 (see the cases marked ** above); it is add with zero that breaks the identity.
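
A minimal C++ check of the signed-zero cases listed above (plain IEEE 754 double arithmetic under the default rounding mode; nothing stoke-specific):

    #include <cstdio>
    #include <cmath>
    // x - 0.0 == x for every x including -0.0, but (-0.0) + 0.0 yields +0.0,
    // so sub(A, 0) == A is a valid rewrite while add(A, 0) == A is not.
    int main() {
      double pz = 0.0, nz = -0.0;
      std::printf("0.0 - 0.0       -> %c0.0\n", std::signbit(pz - pz) ? '-' : '+');  // +0.0
      std::printf("(-0.0) - (-0.0) -> %c0.0\n", std::signbit(nz - nz) ? '-' : '+');  // +0.0
      std::printf("(-0.0) + (-0.0) -> %c0.0\n", std::signbit(nz + nz) ? '-' : '+');  // -0.0
      std::printf("(-0.0) + 0.0    -> %c0.0\n", std::signbit(nz + pz) ? '-' : '+');  // +0.0
      std::printf("(-0.0) - 0.0    -> %c0.0\n", std::signbit(nz - pz) ? '-' : '+');  // -0.0
      return 0;
    }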

Examples are abundant in the match_stoke.01 logs for the immediate variants, as most of the non-equivalences are due to a UIF (uninterpreted function) appearing on one side and not on the other.

E.g.:
	modified:   concrete_instances/immediate-variants/extractps_r32_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/extractps_r64_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/palignr_xmm_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/pextrq_r64_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/pextrw_r32_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/pinsrd_xmm_r32_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/pshufd_xmm_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/vextractps_r32_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/vpextrd_r32_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/vpextrq_r64_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/vpextrw_r32_xmm_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/vpinsrd_xmm_xmm_r32_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/vpinsrq_xmm_xmm_r64_imm8/match_stoke.01.log
	modified:   concrete_instances/immediate-variants/vpshufd_xmm_xmm_imm8/match_stoke.01.log

Master Stoke

  • For the following, the 'of' flag needs to be corrected (caught using undef testing):
rclb_r8_cl
rclb_rh_cl
rcll_r32_cl
rclq_r64_cl
rclw_r16_cl
rcrb_r8_cl
rcrb_rh_cl
rcrl_r32_cl
rcrq_r64_cl
rcrw_r16_cl
rolb_r8_cl
rolb_rh_cl
roll_r32_cl
rolq_r64_cl
rolw_r16_cl
rorb_r8_cl
rorb_rh_cl
rorl_r32_cl
rorq_r64_cl
rorw_r16_cl
salb_r8_cl
salb_rh_cl
sall_r32_cl
salq_r64_cl
salw_r16_cl
sarb_r8_cl
sarb_rh_cl
sarl_r32_cl
sarq_r64_cl
sarw_r16_cl
shlb_r8_cl
shlb_rh_cl
shll_r32_cl
shlq_r64_cl
shlw_r16_cl
shrb_r8_cl
shrb_rh_cl
shrl_r32_cl
shrq_r64_cl
shrw_r16_cl
  • For the following, the 'sf' flag needs to be undef (caught using undef count testing):
imulb_r8
imulb_rh
imull_r32  
imull_r32_r32
imulq_r64
imulq_r64_r64
imulw_r16
imulw_r16_r16
  • For the following,
haddps_xmm_m128
vhaddps_xmm_xmm_m128
vhaddps_ymm_ymm_m256

The formulas are all correct as per the manual, but testing shows that they are not equivalent. That is because of the simplification rule add_single(A, B) == A if B == 0.0, which does not hold for A == -0.0 (see the sketch below). The same restriction applies to add_double/sub_single/sub_double. Proof
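
A small C++ demonstration (plain host code, not stoke) of why the rewrite is unsound at the bit level: for A == -0.0 the sum A + 0.0 has a different bit pattern than A, even though the two compare equal as floats.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    int main() {
      float a = -0.0f;
      float sum = a + 0.0f;            // IEEE 754: (-0.0) + (+0.0) == +0.0
      std::uint32_t ba, bs;
      std::memcpy(&ba, &a, sizeof ba);
      std::memcpy(&bs, &sum, sizeof bs);
      std::printf("A = %08x, A + 0.0 = %08x, equal as floats: %d\n",
                  (unsigned)ba, (unsigned)bs, a == sum);
      // prints: A = 80000000, A + 0.0 = 00000000, equal as floats: 1
      return 0;
    }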

  • Stoke tried to generalize vcvtpd2dq_xmm_m256 from the simple handler's vcvtpd2dq_xmm_ymm and comes up with a wrong formula, because strata's generalization is buggy. It needs to be generated the same way as the simple handler's vcvtpd2dq_xmm_ymm. To reproduce:
sf   --smtlib_format   --opc vcvtpd2dq_xmm_m256

Self Goal

  • For the following, we use the stoke formula instead of the strata formula. This gives two benefits: a simpler formula and correctness of the af flag. Testing also becomes much easier, since building the strata formula is much costlier.
cmpb_m8_imm8 cmpb_m8_rh cmpb_m8_r8 cmpb_rh_m8 cmpb_r8_m8 cmpl_m32_imm32 cmpl_m32_imm8 cmpl_m32_r32 cmpq_m64_imm32 cmpl_r32_m32 cmpq_m64_imm8 cmpq_m64_r64 cmpq_r64_m64 cmpw_m16_imm16 cmpw_m16_imm8 cmpw_m16_r16 cmpw_r16_m16 decb_m8 decl_m32 decq_m64 decw_m16 incw_m16 negb_m8 negl_m32 negw_m16 negq_m64 sbbb_m8_imm8 sbbb_m8_rh sbbb_m8_r8 sbbl_m32_imm32 sbbl_m32_imm8 sbbb_rh_m8 sbbb_r8_m8 sbbq_m64_imm8 sbbq_m64_imm32 sbbl_m32_r32 sbbw_m16_imm16 sbbw_m16_imm8 sbbl_r32_m32 sbbw_m16_r16 sbbw_r16_m16 sbbq_m64_r64 sbbq_r64_m64 subb_m8_imm8 subb_m8_r8 subb_m8_rh subb_r8_m8 subb_rh_m8 subl_m32_imm32 subl_m32_imm8 subl_m32_r32 subl_r32_m32 subq_m64_imm8 subq_m64_imm32 subq_m64_r64 subq_r64_m64 subw_m16_imm16 subw_m16_imm8 subw_m16_r16 subw_r16_m16

Note: all of these are taken from the non-strata handler, and af is tested to be OK.

  • For the following, we got non-equivalent formulas for cf between the strata handler and the simple handler.
    btq_m64_imm8::cf
    btl_m32_imm8::cf
    btl_m32_r32::cf
    btq_m64_r64::cf
    
    In this case, the difference does not mean that either (or both) of the formulas is wrong; we have tested both and they appear to be correct. The diff is due to the fact that the symbolic variables for memory are drawn from different addresses and have different widths. For the time being, we are accepting the formula from the simple handler (a small worked check follows the output below).

Please remember to generate the generic formula using the keep_imm_symbolic flag, as the address part depends on the imm operand.

To reproduce, run the following after allowing the instruction to use either the strata or the simple handler:

    sc --strata_path ../../circuits/  --smtlib_format   --code "btq \$0xa, (%rbx)" 
    sc   --smtlib_format   --code "btq \$0xa, (%rbx)" 
     ================== Simple Handler =====================
     %cf    : (let ((a!1 (concat ((_ extract 64 1)
                     (bvashr (concat TMP_BV_64_0 #b0)
                             (concat #b000000000000000000000000000000000000000000000000000000000
                                     #x0a)))
                   #b0)))
    ((_ extract 0 0)
    (bvashr a!1
            (concat #b000000000000000000000000000000000000000000000000000000000
                    #x01))))

   Information about memory reads:
  Value TMP_BV_64_0
 (8 bytes)
    was read at address %rbx
  ================== Simple Handler =====================
   %cf    : ((_ extract 0 0) (bvlshr TMP_BV_8_1 #x02))

   Information about memory reads:
  Value TMP_BV_64_0
 (8 bytes)
    was read at address %rbx

  Value TMP_BV_8_1
 (1 bytes)
    was read at address (bvadd %rbx #x0000000000000001)
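
    As a sanity check (plain C++ arithmetic, not stoke output), both formulas above pick out the same bit of memory for "btq $0xa, (%rbx)", namely bit 10 of the quadword starting at %rbx:

      #include <cstdio>
      int main() {
        unsigned bit_index   = 0xa;            // the immediate
        unsigned byte_offset = bit_index / 8;  // 1 -> one formula reads the byte at %rbx + 1
        unsigned bit_in_byte = bit_index % 8;  // 2 -> and extracts bit 2 (the bvlshr by #x02)
        // the other formula reads all 8 bytes at %rbx and shifts so that bit 10
        // of the 64-bit value lands in position 0
        std::printf("byte offset %u, bit within byte %u\n", byte_offset, bit_in_byte);
        return 0;
      }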
  • xchg_m16_r16 / xchg_m32_r32 / xchg_m64_r64 / xchg_m8_r8 / xchg_m8_rh / xchg_r16_m16 / xchg_r32_m32 / xchg_r64_m64 / xchg_r8_m8 / xchg_rh_m8

    For the instruction xchgb %al, (%rax), we got the following formula out of generalization:

    %rax   : %rax[63:8] ∘ TMP_BV_8_0
    Information about memory reads:
      Value TMP_BV_8_0 (1 bytes)
      was read at address %rax.
    
    Information about memory writes:
      Address %rax[63:8] ∘ TMP_BV_8_0 was updated to
      %rax[7:0] (1 bytes).
    

    which is wrong, as the address is messed up. This error only shows up for memory operands that reuse the same register (here the address register %rax overlaps the register operand %al). For register variants, or when the operands use different registers, it does not manifest. The correct fix depends on the order of the two updates, which differs between the memory and non-memory cases, and is implemented in master stoke as follows:

      add_opcode_str({"xchgb", "xchgw", "xchgl", "xchgq"},
      [this] (Operand dst, Operand src, SymBitVector a, SymBitVector b, SymState& ss) {
      if (src.is_typical_memory()) {
        ss.set(src, a);
        ss.set(dst, b);
      } else {
        ss.set(dst, b);
        ss.set(src, a);
      }
      });
    
  • movss_xmm_m32/movsd_xmm_m64 cannot be generalized, as the generalization is wrong. The register variant movsd_xmm_xmm has the following semantics:

     %ymm1  : %ymm1[255:128] ∘ (%ymm1[127:64] ∘ %ymm2[63:0])
    

    If we just generalize for the memory variant movsd_xmm_m64, then the resulting formula is going to be

     %ymm1  : %ymm1[255:128] ∘ (%ymm1[127:64] ∘ TEMP_READ_FROM_MEMORY)
    

    which is wrong, as the correct semantics is

     %ymm1  : %ymm1[255:128] ∘ (0x0₆₄ ∘ TMP_READ_FROM_MEMORY)
    

    The latest master branch has the correct formula manually implemented.
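
    A hedged sketch of what that manual rule has to do, written in the style of the xchg handler above (the lambda signature and the assumption that a holds the old destination value and b the source value are mine; this is not the code from master):

      // Illustrative only: the memory variant zeroes bits 127:64 of the
      // destination, while the register variant preserves them.
      add_opcode_str({"movsd"},
      [this] (Operand dst, Operand src, SymBitVector a, SymBitVector b, SymState& ss) {
        if (src.is_typical_memory()) {
          // %ymm1 := %ymm1[255:128] ∘ 0x0_64 ∘ (the 64 bits read from memory)
          ss.set(dst, a[255][128] || SymBitVector::constant(64, 0) || b[63][0]);
        } else {
          // %ymm1 := %ymm1[255:128] ∘ %ymm1[127:64] ∘ %ymm2[63:0]
          ss.set(dst, a[255][128] || a[127][64] || b[63][0]);
        }
      });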

  • roundss_xmm_m32_imm8

      stoke_debug_circuit  --strata_path ../../circuits/ --opc roundss_xmm_m32_imm8  --smtlib_format
    

    Gives a segfault while converting the bv formula to smtlib format. The reason is

     %ymm1  : %ymm1[255:128] ∘ (%ymm1[127:64] ∘ cvt_single_to_int32_rm(TMP_BV_32_0[63:0], 0x0₈))
    

    Note: TMP_BV_32_0[63:0] is wrong, as it extracts 64 bits out of a 32-bit bitvector; this is an error in my implementation of the semantics.

  • pextrw/pextrb: these instructions are not implemented in stoke, so I had to implement them. They have semantics similar to the register variants, but the implementation needs to take care of zero-size concatenations. Earlier we had implemented the rule as

SymBitVector::constant(dest_width - vec_len, 0) || (b >> offset*vec_len)[vec_len-1][0]

In the register cases dest_width > vec_len, so zeros are concatenated in front of the extracted element. In the memory cases the destination width equals vec_len, which means we would be concatenating a zero-width zero. We caught this bug while converting the formulas for the following:

vpextrb_m8_xmm_imm8 
vpextrw_m16_xmm_imm8
pextrw_m16_xmm_imm8 
pextrb_m8_xmm_imm8

The z3 conversion throws an exception for concatenations involving a zero-width bitvector. The formula for the memory cases is now fixed as follows:

 // extract the selected vec_len-bit element
 auto result = (b >> offset*vec_len)[vec_len-1][0];
 // only prepend zeros when there is room to extend; for the memory destinations
 // dest_width == vec_len, and a zero-width constant makes the z3 conversion throw
 if (dest_width > vec_len) {
   result = SymBitVector::constant(dest_width - vec_len, 0) || result;
 }

Intel Manual Bug

https://software.intel.com/en-us/forums/intel-isa-extensions/topic/773342

Stoke Bugs Reported

Definition of vaddsubpd/ps

https://github.com/StanfordPL/stoke/blob/2dc61f6f55a9991c25c471f5e6ffe5ff45962afb/src/validator/handlers/packed_handler.h#L54 https://github.com/StanfordPL/stoke/blob/2dc61f6f55a9991c25c471f5e6ffe5ff45962afb/src/validator/handlers/packed_handler.h#L60

%ymm1    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm2    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm3    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01

Sandbox and validator do not agree for 'vaddsubpd %ymm3, %ymm2, %ymm1' (opcode vaddsubpd_ymm_ymm_ymm)
  states do not agree for '%ymm1':
    validator: 0x0₆₄ ∘ 0x0₆₄ ∘ (sub_double(0x0₆₄, 0x1₆₄) ∘ 0x0₆₄)
    sandbox:   0x0₆₄ ∘ 0x0₆₄ ∘ 0x0₆₄ ∘ 0x8000000000000001₆₄
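
For reference, a minimal C++ reproduction of the sandbox's low lane using the AVX intrinsic (plain host-side check, compile with -mavx; not stoke code). vaddsubpd subtracts in the even lanes and adds in the odd lanes, so lane 0 becomes 0.0 minus the smallest denormal, whose bit pattern is 0x8000000000000001; this matches the sandbox and suggests the validator's handler has the add/sub lanes swapped.

    #include <immintrin.h>
    #include <cstdio>
    #include <cstring>
    int main() {
      __m256d src1 = _mm256_setzero_pd();              // %ymm2 = all zeros
      unsigned long long raw[4] = {0x1ULL, 0, 0, 0};   // %ymm3: lowest qword = 1
      __m256d src2;
      std::memcpy(&src2, raw, sizeof(src2));
      __m256d res = _mm256_addsub_pd(src1, src2);      // subtract even lanes, add odd lanes
      unsigned long long out[4];
      std::memcpy(out, &res, sizeof(out));
      std::printf("lane0 = 0x%016llx\n", out[0]);      // prints 0x8000000000000001
      return 0;
    }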

Definition of cvtsi2ssl/cvtsi2ssq/cvtsi2sdl/cvtsi2sdq

https://github.com/StanfordPL/stoke/blob/2dc61f6f55a9991c25c471f5e6ffe5ff45962afb/src/validator/handlers/packed_handler.h#L142 Different treatment is needed for the cvtsi2* and vcvtsi2* cases, so they cannot be accommodated in the packed handler; they are implemented in the simple handler instead.

cvtsi2ssl/cvtsi2ssq are implemented by strata as well, but the stratified formula is NOT equivalent to stoke's formula.