Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.7 decompiling full 3.7.3 library without error #235

Open
rocky opened this issue May 13, 2019 · 12 comments
Open

Python 3.7 decompiling full 3.7.3 library without error #235

rocky opened this issue May 13, 2019 · 12 comments
Assignees
Labels

Comments

@rocky
Copy link
Owner

rocky commented May 13, 2019

Description

$ pyenv local 3.7.3
$ cd test
$ ./test_pyenvlib.py --3.7.3 --max 2200 --verify

Remove 36 or so parser errors.

@x0ret as with the others, if this is not of interest or too much work, feel free to remove yourself.

Background

Tests

@rocky rocky added the Epic label May 13, 2019
@x0ret
Copy link
Collaborator

x0ret commented May 13, 2019

  File "/Users/x0ret/Desktop/dev/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1448, in print_super_classes3
    if i == start:
UnboundLocalError: local variable 'i' referenced before assignment
                start = n-2
                for i in range(start, 0, -1):
                    if not node[i].kind in ['expr', 'call', 'LOAD_CLASSNAME']:
                        break
                    pass

                if i == start:
                    return
             else:
-                start = n-2
+                i = start = n-2
                 for i in range(start, 0, -1):
                     if not node[i].kind in ['expr', 'call', 'LOAD_CLASSNAME']:

Need more investigation. I'm working on it.

rocky added a commit that referenced this issue May 13, 2019
Python 3.7 decompiling full 3.7.3 library without error #235
@x0ret
Copy link
Collaborator

x0ret commented May 13, 2019

@rocky i was investigating python3.7 datetime issue and noticed the issue is another chained compare, this is the the test case:

def f():
    if (month is None and
	isinstance(year, (bytes, str)) and len(year) == 4 and
	1 <= ord(year[2:3]) <= 12):
        return 5

finally this is the diff:

diff --git a/uncompyle6/parsers/parse37.py b/uncompyle6/parsers/parse37.py
index 1d2baa7e..25c6fd28 100644
--- a/uncompyle6/parsers/parse37.py
+++ b/uncompyle6/parsers/parse37.py
@@ -99,6 +99,8 @@ class Python37Parser(Python36Parser):

         compare_chained1_false_37 ::= expr DUP_TOP ROT_THREE COMPARE_OP POP_JUMP_IF_FALSE
                                       compare_chained2c_37 POP_TOP JUMP_FORWARD COME_FROM
+        compare_chained1_false_37 ::= expr DUP_TOP ROT_THREE COMPARE_OP POP_JUMP_IF_FALSE
+                                      compare_chained2d_37 POP_TOP JUMP_FORWARD COME_FROM
         compare_chained2_false_37      ::= expr DUP_TOP ROT_THREE COMPARE_OP POP_JUMP_IF_FALSE
                                       compare_chained2a_false_37 ELSE POP_TOP JUMP_BACK COME_FROM

@@ -110,6 +112,8 @@ class Python37Parser(Python36Parser):
         compare_chained2c_37       ::= expr DUP_TOP ROT_THREE COMPARE_OP come_from_opt POP_JUMP_IF_FALSE
                                        compare_chained2a_false_37 ELSE

+        compare_chained2d_37       ::= expr COMPARE_OP _come_froms POP_JUMP_IF_FALSE JUMP_FORWARD ELSE
+
         jf_cfs                     ::= JUMP_FORWARD come_froms
         ifelsestmt                 ::= testexpr c_stmts_opt jf_cfs else_suite opt_come_from_except

diff --git a/uncompyle6/semantics/customize37.py b/uncompyle6/semantics/customize37.py
index 71d84dc4..2d9dd26f 100644
--- a/uncompyle6/semantics/customize37.py
+++ b/uncompyle6/semantics/customize37.py
@@ -62,6 +62,9 @@ def customize_for_version37(self, version):
             (0, 19 ) ),
         'compare_chained2c_37': (
             '%[3]{pattr.replace("-", " ")} %p %p', (0, 19), (6, 19) ),
+        'compare_chained2d_37': (
+            '%[1]{pattr.replace("-", " ")} %p',
+            (0, 19) ),
         'if_exp_37a': ( '%p if %p else %p', (1, 'expr', 27), (0, 27), (4, 'expr', 27) ),
         'if_exp_37b': ( '%p if %p else %p', (2, 'expr', 27), (0, 'expr', 27), (5, 'expr', 27) ),

I'm not sure this diff is your favorite, anyway as you can see compare_chained2d_37 is just compare_chained2b_37 with accept multiple COME_FROM. maybe we can fix just 2b.

Please review this and guide me to edit it to fit in your style.

Thank you

@rocky
Copy link
Owner Author

rocky commented May 13, 2019

Thanks for the information. It looks to me like uncompyle6 is putting in COME_FROM to the wrong locations. Here is the offending bytecode reported:

def f--- This code section failed: ---

   2       0  LOAD_GLOBAL              'month'
           2  LOAD_CONST               None
           4  COMPARE_OP               'is'
           6  POP_JUMP_IF_FALSE    74  'to 74'

   3       8  LOAD_GLOBAL              'isinstance'
          10  LOAD_GLOBAL              'year'
          12  LOAD_GLOBAL              'bytes'
          14  LOAD_GLOBAL              'str'
          16  BUILD_TUPLE_2         2  ''
          18  CALL_FUNCTION_2       2  ''
          20  POP_JUMP_IF_FALSE    74  'to 74'
          22  LOAD_GLOBAL              'len'
          24  LOAD_GLOBAL              'year'
          26  CALL_FUNCTION_1       1  ''
          28  LOAD_CONST            4  4
          30  COMPARE_OP               '=='
          32  POP_JUMP_IF_FALSE    74  'to 74'

   4      34  LOAD_CONST            1  1
          36  LOAD_GLOBAL              'ord'
          38  LOAD_GLOBAL              'year'
          40  LOAD_CONST            2  2
          42  LOAD_CONST            3  3
          44  BUILD_SLICE_2         2  ''
          46  BINARY_SUBSCR    
          48  CALL_FUNCTION_1       1  ''
          50  DUP_TOP          
          52  ROT_THREE        
          54  COMPARE_OP               '<='
          56  POP_JUMP_IF_FALSE    66  'to 66'
          58  LOAD_CONST           12  12
          60  COMPARE_OP               '<='
        62_0  COME_FROM            32  '32'
        62_1  COME_FROM            20  '20'
        62_2  COME_FROM             6  '6'
          62  POP_JUMP_IF_FALSE    74  'to 74'
          64  JUMP_FORWARD         70  'to 70'
          66  ELSE                     '70'
          66  POP_TOP          
          68  JUMP_FORWARD         74  'to 74'
        70_0  COME_FROM            64  '64'

   5      70  LOAD_CONST            5  5
          72  RETURN_VALUE     
        74_0  COME_FROM            68  '68'
        74_1  COME_FROM            62  '62'

The jumps from offsets 32, 20, and 6 should be to offset 74, not offset 62. If the COME_FROMs were fixed, I wonder if this would work.

Wouldn't this be a better fix? Your thoughts?

@x0ret
Copy link
Collaborator

x0ret commented May 13, 2019

You are right, I didn't check for that. Honestly, I have no clue where this issue comes from. Could you please guide me (scope) to go for it?

@rocky
Copy link
Owner Author

rocky commented May 13, 2019

This is in scanners/scanner3.py, and specifically function find_jump_targets(). It returns a dictionary called targets which function ingest() uses to insert the COME_FROMs.

Although the detect_control_flow() is a big pile of crap and should be removed, making sure that that getting the targets of jumps point to the right offset shouldn't be so bad to fix.

@x0ret
Copy link
Collaborator

x0ret commented May 13, 2019

Ok, the issue is this line:

@@ -811,7 +814,8 @@ class Scanner3(Scanner):
                             self.fixed_jumps[offset] = fix or match[-1]
                             return
                     else:
-                        self.fixed_jumps[offset] = match[-1]
+                        self.fixed_jumps[offset] = target
                         return
             # op == POP_JUMP_IF_TRUE

According to the condition:

                # If we still have any offsets in set, start working on it
                if match:
                    is_jump_forward = self.is_jump_forward(pre_rtarget)
                    if (is_jump_forward and pre_rtarget not in self.stmts and
                        self.restrict_to_parent(self.get_target(pre_rtarget), parent) == rtarget):

is_jump_forward is False so else is executing. and then we have:

                   else:
                        self.fixed_jumps[offset] = match[-1]

however match is [6, 20, 32, 62] and always we have 62. instead based on fix we have target=74 and disasm will be like:

          64  JUMP_FORWARD         70  'to 70'
          66  ELSE                     '70'
          66  POP_TOP
          68  JUMP_FORWARD         74  'to 74'
        70_0  COME_FROM            64  '64'

   5      70  LOAD_CONST            5  5
          72  RETURN_VALUE
        74_0  COME_FROM            68  '68'
        74_1  COME_FROM            62  '62'
        74_2  COME_FROM            32  '32'
        74_3  COME_FROM            20  '20'
        74_4  COME_FROM             6  '6'
          74  LOAD_CONST               None
          76  RETURN_VALUE

but next we have another issue which i'm working on it:

def f():
    if month is None:
        if isinstance(year, (bytes, str)):
            if len(year) == 4:
                pass
    if not 1 <= ord(year[2:3]) <= 12:
        return 5

EDIT: This diff breaks test cases

@rocky
Copy link
Owner Author

rocky commented May 13, 2019

I am sorry the code is such a mess. (Compare with https://github.com/Mysterie/uncompyle2/blob/master/uncompyle2/scanner27.py#L326-L580 which is where this is derived from and corresponding uncompyle2's and you'll see it was always a mess)

Yes, I saw it breaks the tests.

But we should change the rules so that they don't have to be followed by a (sometimes bogus) "COME_FROM" (because the jump in 3.7 can sometimes go directly back to a loop iterator).

I have come to learn from painful mistakes that we have to understand what's going on and represent things the way they are. If the code for "and" sometimes does not have a COME_FROM at the end, uncompyle6 shouldn't be trying to add some sort of phony "COME_FROM" as may have been done in the past.

Instead all of this code needs an overhaul and replaced by something which is on a more sound ground: control flow analysis which understands the boundaries of loops and of blocks and nested blocks.

@rocky
Copy link
Owner Author

rocky commented May 13, 2019

By the way, if desired, there could be a reduce check added when the "and" rule doesn't have a "come from" because the jump is backwards.

@rocky
Copy link
Owner Author

rocky commented May 13, 2019

It occurs to me that if fixing up all the past sins is to arduous, for now you can just have the change you proposed apply for 3.7+ (self.version >= 3.7 or whatever) and over the weekend I can look at going back and fixing old old 3.x grammar rules that would be broken as a result of doing the right thing.

@x0ret
Copy link
Collaborator

x0ret commented May 14, 2019

@rocky, Please check this issue, i couldn't fix it up.

the issue is in recognizing or since the current grammer does not handle the python 3.7 case. precisely, the case is .pyenv/versions/3.7.3/lib/python3.7/html/__pycache__/__init__.cpython-37.opt-1.pyc or for better integration we have following test case:

def f():
    if 0xD800 <= num <= 0xDFFF or num > 0x10FFFF:
        return 5
    return 4

disasm:

   2       0  LOAD_CONST        55296  55296
           2  LOAD_GLOBAL              'num'
           4  DUP_TOP
           6  ROT_THREE
           8  COMPARE_OP               '<='
          10  POP_JUMP_IF_FALSE    20  'to 20'
          12  LOAD_CONST        57343  57343
          14  COMPARE_OP               '<='
          16  POP_JUMP_IF_TRUE     30  'to 30'
          18  JUMP_FORWARD         22  'to 22'
          20  ELSE                     '22'
          20  POP_TOP
        22_0  COME_FROM            18  '18'
          22  LOAD_GLOBAL              'num'
          24  LOAD_CONST       1114111  1114111
          26  COMPARE_OP               '>'
        28_0  COME_FROM            16  '16'
          28  POP_JUMP_IF_FALSE    34  'to 34'

   3      30  LOAD_CONST            5  5
          32  RETURN_END_IF
        34_0  COME_FROM            28  '28'

   4      34  LOAD_CONST            4  4
          36  RETURN_VALUE
          -1  RETURN_LAST

As of previous grammer for or there are POP_JUMP_IF_TRUE:

or ::= expr jmp_true expr COME_FROM (18)

but for 3.7 we don't. I test multiple thing but couldn't fix. Could you please check?

In last attempt, just 22 cases of all 2055 failed in python 3.7 library.

@rocky
Copy link
Owner Author

rocky commented May 14, 2019

In last attempt, just 22 cases of all 2055 failed in python 3.7 library.

You are awesome!

The problem here is I bet another one of those dishonest COME_FROM's

          16  POP_JUMP_IF_TRUE     30  'to 30'
          ...
          26  COMPARE_OP               '>'
          28_0  COME_FROM            16  '16'

So if you can find the place in that horrible scanner3 routine that is mucking that up and remove it for 3.7 and later (or maybe 3.6 as well) then I bet this would go away or we might have to adjust some rules, but that's okay.

I am sorry I got into this mess. At the time this stuff crept in, I didn't have

  • the better grammar tracing and error display we now have
  • reduction checks
  • the better tree display tools,
  • the grammars for the various versions were not as isolated, and
  • I didn't even have the ability to add comments to the grammar.

Nor did I understand what was going on as well.

Now that all of this has changed, there is this mess to clean up.

I really like the idea of setting a totally new decompiler project just for a single modern version like 3.7 (and later 3.7 and up) which has a stripped down grammar and semantic actions. And uses python-flow-control. I can drop the restriction of writing in a Python2-compatible way (because uncompyle6 will always be around). This can also be in its own github organization if that helps.

@x0ret
Copy link
Collaborator

x0ret commented May 14, 2019

So if you can find the place in that horrible scanner3 routine that is mucking that up and remove it for 3.7 and later (or maybe 3.6 as well) then I bet this would go away or we might have to adjust some rules, but that's okay.

You were right, i failed twice! : ).

@@ -753,8 +754,10 @@ class Scanner3(Scanner):
                     # FIXME: this is not accurate The commented out below
                     # is what it should be. However grammar rules right now
                     # assume the incorrect offsets.
-                    # self.fixed_jumps[offset] = target
-                    self.fixed_jumps[offset] = pretarget.offset
+                    if self.version < 3.6:
+                        self.fixed_jumps[offset] = pretarget.offset
+                    else:
+                        self.fixed_jumps[offset] = target
                     self.structs.append({'type': 'and/or',
                                          'start': start,
                                          'end': pretarget.offset})

This needs a little rule adjustment, i'm working on it.

I am sorry I got into this mess. At the time this stuff crept in, I didn't have

You shouldn't be. This project is awesome, believe me.

I really like the idea of setting a totally new decompiler project just for a single modern version like 3.7 (and later 3.7 and up) which has a stripped down grammar and semantic actions. And uses python-flow-control. I can drop the restriction of writing in a Python2-compatible way (because uncompyle6 will always be around). This can also be in its own github organization if that helps.

So we should go for it, let's do this for a while and see the result. Concentrating on one version and ability to use modern language features. Good idea.

Thanks.

@x0ret x0ret mentioned this issue Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants