Releases: rocky/python-uncompyle6
xdis again
More upheaval in xdis, which we need to track here.
April Fool
Back-port some of the changes from decompile3 here; this mostly helps 3.7 and 3.8 decompilation, although it may also help 3.6-ish versions too.
- Handle nested `async for` in `for ...` and better async comprehension detection via `xdis`. Still more work is needed. (See the sketch after this list.)
- Include the token number in listings when `-g` is given and there is a parser error
- Remove unneeded `Makefile`s now that remake 4.3+1.5dbg is a thing that has `-c`
- Fix bug in finding annotations in functions with docstrings
- Fix bug found by 2.4 sre_parse.py testing
- Fix `transform` module's `ifelseif` bugs
- Fix bug in 3.0 name module detection
- Fix docstring detection
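For reference, here is a small hand-written example (not from the project's test suite) of the constructs the first item refers to: an `async for` nested inside an ordinary `for`, plus an async comprehension.

```python
import asyncio

async def agen():
    # A tiny async generator to iterate over.
    for i in range(3):
        yield i

async def main():
    results = []
    for repeat in range(2):          # ordinary "for" loop ...
        async for value in agen():   # ... with a nested "async for" inside it
            results.append((repeat, value))
    squares = [v * v async for v in agen()]   # async comprehension
    print(results, squares)

asyncio.run(main())
```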
plateau
The main focus in this release was to fix some of the more glaring problems that crept in from the last release due to that refactoring.
The uncompyle6 code is at a plateau where what is most needed is a code refactoring. Until everything is refactored and replaced, decompilation may get worse along the way.
Therefore, this release largely serves as a checkpoint before more major upheaval.
The upheaval started in the last release; I believe its pinnacle was around c90ff51, which wasn't a release. I suppose I should tag that.
After c90ff51, I started down the road of redoing control flow in a more comprehensible, debuggable, and scalable way. See The Control Flow Mess.
The bulk of the refactoring is going on in the decompyle3 project, but I try to trickle the changes down here. It is tricky because the changes are large, and I have to figure out how to decompose things so they can be done in little testable pieces. There is also the problem that what is in decompyle3 is incomplete as well.
Other than control flow, another change that will probably happen in the next release is to redo the grammar for lambda expressions. Right now, we treat them as Python statements, you know, things with compound statements in them. But lambdas aren't that, so there is hackery to paper over the difference, and making a statement out of an expression turns out to be the wrong thing to do. For example, a return of an "and" expression can be expressed as nested "if" statements with returns inside them, but the "if" variant of the bytecode is not valid in a lambda.
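As a rough hand-written illustration of this (not code from the project), the two functions below behave the same and can produce similar-looking bytecode, yet only the expression form could appear in a lambda such as `lambda a, b, c: a and b and c`:

```python
def using_and(a, b, c):
    # A return of an "and" expression...
    return a and b and c

def using_nested_if(a, b, c):
    # ...rewritten as nested "if" statements with returns inside them.
    # This statement form is not valid inside a lambda body.
    if a:
        if b:
            return c
        return b
    return a
```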
In the decompyle3 code, I've gone down the road of making the grammar's goal symbol an expression. This also offers the opportunity to split the grammar, making parsing inside a lambda not only more reliable, because the wrong choices don't exist, but also simpler and faster, because all those statement rules simply don't need to exist when parsing.
I cringe thinking about how the code lived for so long without noticing such a simple stupidity and lapse of sufficient thought.
Some stats from testing. The table below gives the number of decompiled tests from Python's test suite which successfully ran:
Version test-suites passing
------- -------------------
2.4.6 243
2.5.6 265
2.6.9 305
3.3.7 300
3.4.10 304
3.5.9 260
3.6.10 236
3.7.6 306
3.8.1 114
Decompiled bytecode files distributed with Python (syntax check only):
2.7.17 647 files: 0 failed
3.2.6 900 files: 0 failed
3.3.7 1256 files: 0 failed
3.4.10 800 files: 0 failed
3.5.9 900 files: 0 failed
3.6.10 1300 files: 28 failed
Martin and Susanne
Of late, every release fixes major gaps and embarrassments of the last release...
And in some cases, like this one, exposes lacunae and rot.
I now have [control] flow under control, even if it isn't handled in the most optimal way.
I now have greatly expanded automated testing.
On the most recent Python versions, I regularly decompile thousands of Python programs that are distributed with Python. When it is possible, I then decompile Python's standard test suite and run the decompiled source code, which basically checks itself. This amounts to about 250 test programs per version. This is in addition to the 3 CI testing services, which do different things.
Does this mean the decompiler works perfectly? No. There are still a dozen or so failing programs, although the actual number of bugs is probably smaller.
However, in preparation for a more major refactoring of the parser grammar, this release was born.
In many cases, decompilation is better. But there are some cases where decompilation has gotten worse. For lack of time (and interest), 3.0 bytecode suffered a hit. Possibly some code in the 3.x range did too. In time, and with cleaner refactored code, this will come back.
Commit c90ff51 was a local maximum; after it, I started reworking the grammar to separate productions that are specific to loops from those that are not.
In the middle of that, I added another grammar simplification to remove singleton productions of the form `sstmts -> stmts`. These were always a bit ugly and complicated the output.
At any rate, if decompilation fails, you can try c90ff51, or another decompiler: unpyc37 is pretty good for 3.7, wibiti uncompyle2 is great for 2.7, and pycdc is mediocre for Python before 3.5 or so and not that good for the most recent Pythons. Generally, these programs will give some sort of answer even if it isn't correct.
decompyle3 isn't that good for 3.7 and is worse for 3.8, but right now it does things that no other Python decompiler, like unpyc37 or pycdc, does. For example, decompyle3 handles variable annotations. As always, the issue trackers for the various programs will give you a sense of what needs to be done. For now, I've given up on reporting issues in the other decompilers because there are already enough issues reported, and they are just not getting fixed anyway.
Samish
Yet again, the focus has been on just fixing bugs, mostly geared toward the later 3.x range. To get some sense of what still needs fixing, consult test/stdlib/runtests.sh. And that only has a portion of what's known.
make_function.py has gotten so complex that it was split into 3 parts to handle different version ranges: Python <3, Python 3.0..3.6, and Python 3.7+.
An important fix is that we had been dropping docstrings in Python 3 code as a result of an incomplete merge from the decompile3 base with respect to the transform phase.
Also important (at least to me) is that we can now handle 3.6+ variable type annotations. Some of the decompile3 code uses them in its source code, and I now use variable annotations in conjunction with mypy in some of my other Python projects.
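For reference, a small illustrative example (not from the project) of the PEP 526 variable annotations being referred to:

```python
from typing import Dict, Optional

# Module-level annotated assignments (Python 3.6+):
cache: Dict[str, int] = {}
default_name: Optional[str] = None

def tally(name: str) -> int:
    count: int = cache.get(name, 0) + 1   # annotated local variable
    cache[name] = count
    return count
```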
Code generation for imports, especially where the import is dotted, changed a bit in 3.7; with this release, we are just now tracking that change better. For this, I've added a pseudo instruction `IMPORT_NAME_ATTR`, derived from the `IMPORT_NAME` instruction, to indicate when an import contains a dotted import. Similarly, the code for 3.7 `import .. as` is basically the same as `from .. import`; the only difference is that the target of the name changes to an "alias" in the former. As a result, the disambiguation is now done on the semantic-action side rather than in the parsing grammar rules.
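As a hand-written illustration (not project code) of the two source forms being disambiguated, both bind a name to the same submodule, and per the note above their generated code is basically the same:

```python
# Two ways of binding a dotted submodule to a local name; since the 3.7
# bytecode for them is nearly the same, uncompyle6 now tells them apart
# in the semantic actions rather than in the grammar.
import os.path as p       # dotted import bound under an alias
from os import path       # "from" import of the same submodule

assert p is path           # both names refer to the os.path module
```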
Some small specific fixes:
- 3.7+: some chained-compare parsing has been fixed. Others remain.
- Better if/else rule checking in the 3.4 and below range.
- 3.4+: keyword-only parameter handling was fixed more generally.
- 3.3 .. 3.5: keyword-only parameter args in lambdas were fixed.
3.6.1
Overall, as in the past, the focus has been on just fixing bugs, geared more toward the later 3.x range. Handling "async for/with" in 3.8+ works better.
Numerous bugs around handling `lambda` with keyword-only and `*` args in 3.0-3.8 have been fixed. However, many still remain.
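As a small illustrative example (not taken from the project's tests), these are the kinds of lambda signatures involved:

```python
# Lambdas mixing "*" args and keyword-only parameters (Python 3 syntax):
scale = lambda *values, factor=2: [v * factor for v in values]
clamp = lambda x, *, lo=0, hi=10: max(lo, min(x, hi))

print(scale(1, 2, 3, factor=10))   # [10, 20, 30]
print(clamp(42, hi=5))             # 5
```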
`binary_expr` and `unary_expr` have been renamed to `bin_op` and `unary_op` to better correspond to the Python AST names.
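For reference, the corresponding names in Python's own ast module can be seen with a quick check (illustrative, not project code):

```python
import ast

# "-(a + b)" parses to a UnaryOp whose operand is a BinOp.
expr = ast.parse("-(a + b)").body[0].value
print(type(expr).__name__)           # UnaryOp
print(type(expr.operand).__name__)   # BinOp
```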
Some work was done for Python 3.7+ to handle `and` better; less was done along the lines of handling `or`. Much more is needed to improve parsing stability of 3.7+. More of what was done with `and` needs to be done with `or`, and this will happen first in the decompyle3 project.
Later, this will probably be extended backwards to handle the 3.6 and earlier versions better. This, however, comes with a big decompilation speed penalty. When we redo control flow, this should go back to normal, but for now, accuracy is more important than speed.
Another `assert` transform rule was added. Parser rules to distinguish `try/finally` in 3.8 were added, and we are more stringent about what can be turned into an `assert`. There was some grammar cleanup here too.
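To illustrate why the extra stringency matters, here is a hand-written sketch (not project code) of two source forms whose bytecode can look very similar; an overly eager decompiler would turn the second one into an `assert` even though it is not one:

```python
def check_positive(x):
    # A real assert: removed entirely when running under "python -O".
    assert x > 0, "x must be positive"
    return x

def check_positive_always(x):
    # A hand-written near-equivalent: it survives "python -O", so the
    # decompiler should not rewrite it as an assert statement.
    if not x > 0:
        raise AssertionError("x must be positive")
    return x
```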
A number of small bugs were fixed, along with some administrative changes to make `make check-short` really be short while checking more thoroughly what it does check. The minimum xdis version needed was bumped to take in the newer 3.6-3.9 releases. See the ChangeLog for details.
gecko gecko
The main focus in this release was more accurate decompilation, especially for 3.7 and 3.8. However, there are some improvements to Python 2.x as well, including one of the long-standing problems: detecting the difference between `try ...` and `try ... else`.
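A minimal hand-written illustration (not project code) of that distinction: the two functions below behave the same, and their bytecode is very similar, which is what makes telling them apart hard.

```python
def read_int(s):
    try:
        n = int(s)
    except ValueError:
        return None
    return n          # plain "try/except": code simply follows the try block

def read_int_else(s):
    try:
        n = int(s)
    except ValueError:
        return None
    else:
        return n      # "try/except/else": runs only when no exception was raised
```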
With this release, we now rebase Python 3.7 off of a 3.7 base; this is also how it (now) is in decompyle3. This facilitates removing some of the cruft in control-flow detection inherited from the 2.7 uncompyle2 base.
Alas, decompilation time for 3.7 and up is greatly increased. Hopefully this is temporary (cough, cough) until we can do a static control-flow pass.
Finally, running under 3.9-dev is tolerated: we can disassemble, but there are no parse tables yet.
JNC
- Pypy 3.3, 3.5, 3.6, and 3.6.9 support
- bump xdis version to handle newer Python releases, e.g. 2.7.17, 3.5.8, and 3.5.9
- Improve 3.0 decompilation
  - no parse errors on stdlib bytecode; however, accurate translation in control-flow and and/or detection needs work
- Remove extraneous iter() in "for" of list comprehension. Fixes #272
- "for" block without a
POP_BLOCK
and confusingJUMP_BACK
forCONTINUE
. Fixes #293 - Fix unmarshal incompletness detected in Pypy 3.6
- Miscellaneous bugs fixed
Stony Brook Ride
- Fix fragment bugs
  - missing `RETURN_LAST` introduced when adding transformation layer
  - more parent entries on tokens
- Preliminary support for decompiling Python 1.0, 1.1, 1.2, and 1.6
- Newer xdis version needed