Please move to Python 3 #13
I don't have any plans to port didjvu to Python 3. |
For me it seems like porting |
Python 2 is not maintained anymore regarding security. This means distros do not have a choice. |
Gamera developer @cdalitz says that the main branch has already been ported completely to Python 3 (https://github.com/hsnr-gamera/gamera-4), however it is marked as 'experimental' in the description and it doesn't seem to have an official release yet. |
Concerning the python 3 port of Gamera (gamera-4), this is indeed finished. It is nevertheless still marked as "experimental", because it is not extensively tested. As I no longer use Gamera myself in any of the projects that I currently work on, I do not have the opportunity to test and fix it. Thus, if someone finds any bugs, patches for fixing them are highly welcome. |
@cdalitz Okay, thanks for clarifying! |
I understand a virtual environment for Python 2 can be created on e.g. stable Debian and the program run inside it. I would appreciate fool-proof instructions on how to actually do it. |
@jsbien Python 2 is still available as official Debian package up to sid, so you probably don't have to worry about Python 2 for (at least) the next 5 years if you're on Debian. In general I think it might be better just not to use the djvu format anymore. The vast majority of djvu software is unmaintained, and outside the linux/bsd scope there are very few programs left that can open djvu at all. |
@mara004 As for DjVu: djview4 and djvulibre are very well maintained, and new software is being created, e.g. https://github.com/trufanov-nok/minidjvu-mod/. For me the compression ratio is the least important feature of DjVu; it has a lot of other advantages, which are demonstrated by our tools such as https://github.com/jsbien/djview4shapes and https://bitbucket.org/mrudolf/djview-poliqarp. Their use is demonstrated e.g. by https://github.com/jsbien/iLindeCSV and https://github.com/jsbien/Zaborowski-index4djview. |
I won't deny there is still some active djvu software, but it seems most of it is rather intended for research than for practical use. Development of djvulibre has been slowing down a lot, and the djvu format is barely used compared to PDF or TIFF. Since most macOS, Windows or mobile users won't be able to open djvu, it is also very unsuitable for sharing. |
At least Gamera has been ported to Python 3 (use the Gamera 4 version). If you encounter any problems with Gamera under Python 3, please consider filing a bug report there. This should thus not be an obstacle to porting didjvu to Python 3, I think. |
I made some experiments with Gamera 4 and encountered no problems.
Anybody willing to try this approach? |
I just had a look at porting didjvu to Python 3, with the following issues arising:
|
@FriedrichFroebel wrote:
I don't see your fork? |
@mara004 I agree PDF is much more common. I guess that if you put the MRC DjVu result of didjvu through DjVuToy to translate it to PDF, it will not be much bigger: JPEG 2000 takes the place of the FG44 and IW44 images, masked by the JBIG2 layer in a similar way as is done in a multilayer DjVu. |
See jwilk#13 for a list of known problems.
@rmast I did this on an old clone of this repository back then for testing and realized the aforementioned porting issues, so I did not upload these changes to GitHub. The incomplete/partially broken Python 3 port is now available in my fork. |
Thanks! I have too little Python-experience to do the full port myself, but I can focus on details.
I compiled Gamera-4 yesterday and ran 2to3 on didjvu, but I already got stuck on some arguments, which seems to be a quite standard porting issue, however I don’t know a site to look up all porting-errors and corresponding fixes. Do you know how to operate libcst to do most of the work?
|
The fork of @FriedrichFroebel just does the job in python3.8 on Mint 20.2 when I run didjvu encode, after compiling and installing Gamera-4 without wx. https://github.com/hsnr-gamera/gamera-4 I don't know how to call it to reproduce the issues that @FriedrichFroebel thought were still there? Edit: I found it: run It only gives a test-issue with tests.test_gamera.test_to_pil_rgb.test_color. So the output has to be judged to be able to point to the right repo to solve it. |
This shows the way to see the ycbcr-jpeg.tiff contains a given colorspace:
Both In.jpg and Out.jpg appear not to have any colorspace information: gives no APP* or other segment markers in the right column, as described here: So there is also no Adobe APP14 marker that could distinguish between RGB and YCbCr. However, the default color scheme for JPEG is YCbCr. The documentation of to_pil says it only supports RGB and grayscale, so putting in a YCbCr image probably already leads to an undefined situation. The tested code seems to try to work around some Gamera bugs, or to speed up to_pil with a custom to_pil_rgb. There might be a history in the commits that tells more about what happened and why it was introduced in the first place. |
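The marker check described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `iter_jpeg_segments` and `adobe_transform` are made up for this example and are not part of didjvu or Gamera, and the scan assumes a well-formed header in which every segment before the image data carries a length field.

```python
import struct


def iter_jpeg_segments(data):
    """Yield (marker_name, payload) for each header segment up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The two length bytes count themselves but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            name = f"APP{marker - 0xE0}"
        else:
            name = f"0x{marker:02X}"
        yield name, payload
        if marker == 0xDA:  # SOS: compressed image data follows
            return
        pos += 2 + length


def adobe_transform(data):
    """Return the Adobe APP14 transform flag, or None if there is no such segment."""
    for name, payload in iter_jpeg_segments(data):
        if name == "APP14" and payload.startswith(b"Adobe") and len(payload) >= 12:
            return payload[11]  # 0 = RGB or CMYK, 1 = YCbCr, 2 = YCCK
    return None
```

Reading a file with `open("In.jpg", "rb").read()` and passing the bytes to `adobe_transform` returns `None` when no Adobe APP14 segment is present, which matches the situation described for In.jpg and Out.jpg.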
Glad to see that the port is working, as I have only used the tests before (after 2to3 conversion and some manual fixes). While I have no clean solution for the aforementioned issues (#13 (comment)), I do not feel like a PR makes sense - besides the fact that Python 3 support does not seem to be considered useful by upstream. I am clearly not an expert on the colorspace stuff, so there is not much I can say about it. The commit history for the Gamera support does not seem to tell us much about it as well: 2337b8f, fdd6bf9. |
@FriedrichFroebel As I read those commits you pointed at it might be just an optimization step that made the assumption of RGB necessary, while most real-life images are usually YCrCb.
The only thing that should be thoroughly tested then is the behavior with source images of different color spaces. However, images in scanned input will usually behave consistently, so if the colorspace fails someone will know at the first try.
A PR is not necessary at the moment, as the Ubuntu 18 trick of getting the old, dropped python-gamera package to work on Ubuntu 20 with Python 2.7 is still valid. As soon as a valid python-gamera package is no longer reachable that way, because some dependencies of the Ubuntu 18 package get upgraded, @jwilk will have to decide how to keep didjvu usable. The package maintainer of Debian has abandoned python-gamera, as has its maintainer. gamera-4 might get out of alpha at some moment; that would be the moment to put effort into the upstream again, and probably even into getting gamera-4 back into the Debian packages.
I committed some Python 3 changes to my fork of the python3 branch as well, for getting the 'bundle' function to work properly. I also made another branch for supporting minidjvu-mod with the -2 parameter, to call when --pages-per-dict > 1. However, even with minidjvu-mod in place I see only a small reduction of the size. The resulting DjVu file size is still way bigger than I would expect from DjVuSolo 3.1. When I scan a letter with a colored logo, an autograph and some colored text at the bottom, there is mostly lots of blur in the background picture, but it takes way too much space in the DjVu file. I studied DjVuSolo 3.1: it behaves differently with different content, optimizing away layers that practically don't contain useful information, and it uses an FGbz instead of an FG44. I saw blur in the background picture behind the JB2 foreground mask. The official DjVu uses cheap-to-compress content behind the foreground mask, as it will not be shown. |
I just witnessed a case where the colorspace issue appeared with a posterized 8 color .png as input in the Python3.8 version, so the issue isn't only appearing in the test. Here a suggestion to use OpenCV for the conversion: But before conversion you should know what colorspace is used in the image. |
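To illustrate why the colorspace has to be known before conversion, here is a minimal sketch of the full-range YCbCr to RGB formula used by JFIF. It is a plain-Python stand-in for a single pixel, not the OpenCV call suggested above and not code from didjvu or Gamera; the function name `ycbcr_to_rgb` is chosen for this example only.

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range JFIF YCbCr pixel (each component 0..255) to RGB."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)

    def clamp(value):
        # Round to the nearest integer and keep the result inside 0..255.
        return max(0, min(255, int(round(value))))

    return clamp(r), clamp(g), clamp(b)
```

Only neutral grays, where Cb and Cr are both 128, look the same in either interpretation. Any colored pixel read as RGB while it is actually stored as YCbCr comes out with wrong colors, which is consistent with the problem only showing up on color input.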
The issue with the tested image is filed at Gamera-4: The issue with the posterized/palletized image can be solved by allowing mode P for PNG. |
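Whether a PNG is palettized can be read from its IHDR chunk without decoding the image. The following is a small sketch using only the standard library; the helper name `png_color_type` and the label strings are assumptions made for this example, and the function does not verify the chunk CRC.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type values defined by the PNG specification.
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor (RGB)",
    3: "indexed (palette)",
    4: "grayscale with alpha",
    6: "truecolor with alpha (RGBA)",
}


def png_color_type(data):
    """Return (color_type_label, bit_depth) read from the IHDR chunk of a PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: bad signature")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return COLOR_TYPES.get(color_type, f"unknown ({color_type})"), bit_depth
```

A posterized 8-color PNG would be expected to report "indexed (palette)", which corresponds to what Pillow calls mode P.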
@FriedrichFroebel, please take my Gamera-4-patch, on my fork-master to solve the to_pil_rgb-issue: When I run make test on didjvu with my new version of the python3-branch I run into issues with test_xmp.py, which attempts to use a deprecated way to del an imported module. As you are more experienced with Python, could you have a look? |
@rmast Are you sure the module problem is really related to your Gamera change and not to any of the three XMP backends (if I remember correctly, I did not install all of them for testing)? Which backend is this about? Do you have a specific error message I can use to have a look at? |
Friedrich,
I don’t expect the issue to be related to Gamera at all. I guess solving the errors of some tests makes some other tests appear that were previously not visible. If you run make test with my python3-branch within a minute I expect the errors I meant rolling over your screen.
|
@rmast I have been running each of the test files directly on its own, so I would not expect to see any change on it with your patch (with the same OS and Python as in your case). For this reason I asked which XMP backend libraries you have installed, as I used only one as far as I remember (probably |
I work on Mint 20.2. I just did apt-get upgrade to make it easy to follow. Lots of details of apt and pip install: I installed python-xmp-toolkit, it wasn't installed, but it didn't result in any difference. This is the exact error at the end of make test:
This is the programtext in that file:
If I try to run the single test I get:
Edit: I had no libxmp-dev installed via apt. That makes the Python 2.7 version skip the libxmp tests completely. Does your version just skip the XMP tests as well? What is XMP? What of it should remain in the upgraded Python 3 version? |
@FriedrichFroebel I forgot to tag you in above message with all details. |
@rmast Seems like I never actually ran the XMP tests beforehand - now I could actually reproduce your issue about the undefined variable From the docs of the
Wikipedia has some more information: https://en.wikipedia.org/wiki/Extensible_Metadata_Platform I am not sure whether all three backends should be kept. With my latest changes, By the way: Running only one test module can be done with a modified version of the implementation of the |
@FriedrichFroebel Yes! all tests run fine now on your branch python3 when I put the default Python to 3.8 and apt -uninstall all xmp-stuff. I've only issued a PR to your python3-branch for 3 write-lines in djvu_support.py that need an .encode() for the bundle-flow. |
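The `.encode()` fix mentioned above follows a common Python 3 porting pattern: a stream opened in binary mode rejects `str` and only accepts `bytes`. The sketch below shows the pattern in isolation. The helper `write_line` and the file name `page1.djvu` are hypothetical, and the actual lines changed in djvu_support.py are not reproduced here.

```python
import io


def write_line(stream, text):
    """Write one text line to a binary stream, encoding it first."""
    stream.write(text.encode("utf-8"))
    stream.write(b"\n")


buffer = io.BytesIO()
try:
    # Python 2 habit: passing str to a binary stream fails on Python 3.
    buffer.write("page1.djvu\n")
except TypeError as exc:
    print("str rejected:", exc)

write_line(buffer, "page1.djvu")
```

The same applies to pipes of external processes, which are binary by default.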
My PR at hsnr-gamera/gamera-4 has just been merged into master! |
@FriedrichFroebel I was looking for code test coverage, and I see there is a code coverage statistic in the source tree: I bet this shows the code that has no test coverage. So all those lines have to be inspected for the need to upgrade, for example the issue of writing bytes instead of a string. |
Yes! The lines your new test covers don't show up in the code coverage anymore, however the bytes-issue also shows up in a standard coverage package. Don't know if I solved it right, but the private/update-coverage runs: |
@rmast It actually is much simpler for Python3-only code: Just use |
@mara004 wrote
Ever seen this project? https://github.com/internetarchive/archive-pdf-tools |
I didn't know this yet, but it's highly interesting. I wonder whether the author of OCRmyPDF knows about |
I doubt it. I want to investigate how good it is; it probably only supports the documented happy flow. It chokes with complex Python errors when some of those parameters are left out.
|
You mean the project claims a reliability it does not offer? |
I’ve not seen any reliability claim for general use. Only its name, the Internet Archive, does promise some professional quality for the happy flow:
“While the code is already being used internally to create PDFs at the Internet Archive, the code still needs more documentation and cleaning up, so don't expect this to be super well documented just yet.”
|
@mara004: I've found an issue just with the first run that succeeded with a multipage scanned PDF. The background contains fuzz from the partial pixels just as djvumake. C44 performs better. The suggestion at the bottom of the repo https://github.com/internetarchive/archive-pdf-tools#examining-the-results would probably only be handy with a manual review in a workflow, as done in gscan2pdf and scantailor.
|
We should probably try to get it working on Python 3.10 as well: |
@rmast I am currently on Python 3.8.10 due to my distro, so no way to directly check it (leaving GitHub Actions aside). But it seems like didjvu uses subprocess calls in the corresponding djvulibre wrapper (https://github.com/jwilk/didjvu/blob/master/lib/djvu_support.py) instead of the native wrapper, so it should work in theory. |
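As a rough illustration of why a subprocess-based wrapper is largely independent of the Python minor version, here is a minimal sketch of calling an external tool and collecting its output. The helper `run_tool` is hypothetical and is not the function used in djvu_support.py; the example runs the Python interpreter itself so that it does not depend on djvulibre being installed.

```python
import subprocess
import sys


def run_tool(argv):
    """Run an external command and return its standard output as bytes."""
    result = subprocess.run(argv, capture_output=True, check=True)
    return result.stdout


# Stand-in for a djvulibre tool: any executable works the same way.
output = run_tool([sys.executable, "-c", "print('ok')"])
print(output.strip())
```

Note that the output arrives as bytes on Python 3, so any text handling around such calls needs explicit decoding or encoding.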
This instruction reveals Python3.10.2 at the moment: |
This instruction allows switching between default Python-versions: |
This non-LTS Ubuntu distro 21.04 has 3.10 in the package manager: https://packages.ubuntu.com/hirsute/python3.10-distutils |
I fixed a Python3.10 issue in Gamera-4: hsnr-gamera/gamera-4#39 With Python3.9 there still are some new gi-import-warnings with test_xmp |
I've now seen all tests run OK in Ubuntu 22.04 with these extra packages and my python3.9 branch.
sudo apt install python3-pip gir1.2-gexiv2-0.10 libexempi-dev libboost-python-dev libexiv2-dev libpng-dev libtiff-dev djvulibre-bin exiv2 python3-pil
pip install py3exiv2
Friedrich sees room for improvement of my GExiv2-fix. But I think we're near a viable Python3.10 version for the coming Ubuntu 22.04 |
Hi, I have upgraded my last machine from buster and the packages are no longer available, presumably because there is no Python 2.7 support. FriedrichFroebel#5 (comment) says that Gamera 4 is now officially released. Does anyone have any patches I can try to give the Python 3 port a test? I'm happy to build it myself, but I would appreciate some pointers on which dependencies I need and whether I need to build those myself as well (it looks like I'll need to at least build Gamera 4). Thanks for any pointers you can give me. |
https://github.com/FriedrichFroebel/didjvu |
Thank you @jsbien! |
Python 2 will be EOL end of 2019. Distributions will stop shipping it. https://pythonclock.org/