Major refactorings #142
Hi @KOLANICH. You're wasting your time asking anything here. Any moment now this character will start his usual song about money, donations, and so on. It's better to ask here: https://github.com/rbrito/pdfsizeopt. You will get at least some response there (not often, but still).
Thank you for these observations. The state of this repo (pts/pdfsizeopt) is a result of these requirements:
Also please note:
Implementing your suggestions would be easier in a project with different requirements. It would be an honor for me if such a project inspired by pdfsizeopt was started. However, it's unlikely that I would contribute much to such a project, because I think that the requirements above are very important, and they make pdfsizeopt uniquely useful. So the little time I have to contribute will go to projects which meet the requirements above.
This is a matter of preference. Some authors prefer larger source files, others prefer smaller ones. For me it would be OK to split main.py into 3, 4 or 5 files, but I would be less productive afterwards.
This is a tradeoff. In my experience, most bindings do not work reliably, because backends tend to be incompatibly upgraded over time. If pdfsizeopt starts using bindings, they will stop working in a year, and maintainers will be busy just updating bindings, possibly introducing incompatibility with other versions of the backends. The end result is many users abandoning pdfsizeopt, because it doesn't work for them.
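On the security concern about CLI tools: invoking an external tool is not inherently unsafe as long as arguments are passed as a list rather than through a shell, so file names cannot inject extra commands. A minimal sketch, assuming Python 3.7+; `run_tool` is a hypothetical helper, not taken from pdfsizeopt's code:

```python
import shutil
import subprocess
import sys
from typing import Sequence


def run_tool(tool: str, args: Sequence[str], timeout: float = 600.0) -> str:
    """Run an external command-line tool and return its stdout.

    Hypothetical helper: arguments go to the program as a list, with no shell
    involved, so spaces or shell metacharacters in file names stay literal.
    """
    exe = shutil.which(tool)
    if exe is None:
        raise FileNotFoundError(f"{tool!r} not found on PATH")
    result = subprocess.run(
        [exe, *args],
        capture_output=True,
        text=True,
        timeout=timeout,  # assumed limit; stops a hung backend from blocking forever
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # The "file name" below reaches the child process unchanged; nothing is executed.
    print(run_tool(sys.executable, ["-c", "import sys; print(sys.argv[1])", "my file; echo hi.pdf"]))
```

The remaining risks are the ones bindings share: bugs in the backend itself when it parses untrusted input.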
Thank you for recommending pdfminer! In pdfsizeopt, PDF parsing (in the PdfObj class) is coupled tightly with the PDF optimizations (rest of main.py). To replace PDF parsing, the entire main.py has to be rewritten, almost from scratch. Thus it's easier to start a new Python project, inspired by pdfsizeopt, but with a brand new codebase. There are some tradeoffs implicitly encoded in the current PDF parser, for example the automatic recovery of some corrupt PDFs, and lazy parsing of objects. Reproducing all these nuances would need way too much work (possibly changing code in pdfminer), and failing to reproduce the recovery features would break the workflow of existing users.
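To illustrate what lazy parsing of objects means in general: object bodies are kept as raw bytes and parsed only on first access, so an optimization that touches a few objects does not pay for parsing the whole file. A minimal sketch with a hypothetical `LazyObjectTable` class and a caller-supplied parse function; it is not pdfsizeopt's PdfObj and does no corruption recovery:

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class LazyObjectTable(Generic[T]):
    """Hold raw object bodies and parse each one only on first access.

    Hypothetical illustration of lazy parsing; not pdfsizeopt's PdfObj.
    """

    def __init__(self, raw: Dict[int, bytes], parse: Callable[[bytes], T]) -> None:
        self._raw = raw  # object number -> unparsed body bytes
        self._parse = parse
        self._cache: Dict[int, T] = {}
        self.parse_count = 0  # how many objects were actually parsed

    def __getitem__(self, obj_num: int) -> T:
        if obj_num not in self._cache:
            self._cache[obj_num] = self._parse(self._raw[obj_num])
            self.parse_count += 1
        return self._cache[obj_num]


# Usage: only object 2 is parsed, and only once, despite two lookups.
table = LazyObjectTable({1: b"<< /Type /Catalog >>", 2: b"42"}, lambda b: b.decode("latin-1"))
table[2]
table[2]
```

Swapping in another parser means reproducing choices like this one (when parsing happens, what is cached) as well as the recovery behavior.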
Citation needed. I'm not aware of any reason why JPype is more secure than running
Let's continue this discussion in #139.
Let's continue this discussion in #103.
How much refactoring is needed depends on what your plan is with the refactored code. For some plans, such as adding many new features (including lossy image optimizations, lossy vector graphics optimizations, TrueType-to-CFF conversion), pdfsizeopt would indeed benefit from heavy refactoring.
Just my software development experience. In one of my projects (a setuptools plugin for Kaitai Struct) where I replaced the CLI with directly tapping into the compiler (so called " For security, the same argument applies as with bindings.
For Python tools, the easiest way to install them is pip, the standard package manager on all platforms.
In my experience, the biggest problem with bindings is that they are almost always C extensions. This means a C extension has to be recompiled for each major Python release, and that often requires code modifications. Another pain with C extensions is that they are often CPython-specific and don't work with other implementations (PyPy and GraalPython have some support for C extensions).
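The per-release rebuild problem is visible from the interpreter itself: the file suffix CPython expects on compiled extension modules encodes the implementation and version. A minimal sketch of probing for an optional binding and falling back to the CLI when it is missing; `try_binding` and `extension_suffix` are hypothetical helper names:

```python
import importlib
import sysconfig
from types import ModuleType
from typing import Optional


def extension_suffix() -> str:
    """Suffix this interpreter expects on compiled extension modules.

    For example '.cpython-312-x86_64-linux-gnu.so' on CPython 3.12 for Linux.
    Implementation and version are part of the name, which is why a C extension
    is usually rebuilt for each release (the stable ABI, abi3, is the main exception).
    """
    return sysconfig.get_config_var("EXT_SUFFIX") or ""


def try_binding(module_name: str) -> Optional[ModuleType]:
    """Import an optional binding, or return None so the caller can use the CLI."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Not installed, or no build usable by this interpreter.
        return None
```

A caller would do something like `backend = try_binding("some_binding")` (a placeholder module name) and take the CLI path when it returns `None`, so a missing or mismatched C extension degrades to the slower route instead of breaking the tool.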
Unfortunately, yes, and it is too much work for me for this project.
Absolutely, and it has to be done. I guess the pdfminer community may also be interested in parsing corrupt PDFs (though apparently not enough: my PR was closed because I had not provided them with the PDFs, and I hadn't because sharing them would have been a copyright violation, while crafting such PDFs intentionally is too much work).
Sure, it would be awesome to make As of now,
To do this, packages can be created for the binary tools. On Linux we can rely on the distro package manager for them. For pip, the binaries can be packed into the packages; you only need a naming scheme to distinguish them from Python packages.
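One possible naming scheme, sketched under assumptions: each binary tool gets its own pip package named `<tool>-bin-<os>-<arch>`. The `-bin-` infix, the architecture aliases and the helper name are all hypothetical; the last step applies the PEP 503 name normalization used by package indexes.

```python
import platform
import re


def binary_package_name(tool: str, system: str = "", machine: str = "") -> str:
    """Build a hypothetical pip package name for a prebuilt binary tool.

    Defaults to the current platform when system/machine are not given.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    # Assumed aliases, so one architecture always maps to one name.
    machine = {"amd64": "x86_64", "arm64": "aarch64"}.get(machine, machine)
    name = f"{tool}-bin-{system}-{machine}"
    # PEP 503 normalization: runs of '-', '_' and '.' collapse to a single '-'.
    return re.sub(r"[-_.]+", "-", name).lower()
```

Platform-specific wheels with platform tags (such as `manylinux` or `win_amd64`) are the standard alternative to encoding the platform in the package name.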
1. There is too much text in too few files. The text editor I use just lags. The code should be split into more files.
2. Most CLI tools should be replaced by bindings. Calling CLI tools may be insecure.
3. There is some code related to PDF parsing. pdfminer has such code, probably of higher quality. It should replace as much of this tool as possible, where that makes sense.
4. Java tools should be used through JPype. It is faster and more secure than using the CLI.
5. #139: Python 2 support should be dropped, and all filename handling should be done via pathlib.
6. #103: `pip install pdfsizeopt`
7. Heavy refactoring is likely needed. I haven't dived deep, but projects with such poor code quality often have poor architecture.
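For item 5, a sketch of what filename handling with pathlib looks like (pathlib is in the standard library only from Python 3.4). The function names and the `.optimized` tag are hypothetical:

```python
from pathlib import Path
from typing import List


def output_path_for(input_path: Path, tag: str = "optimized") -> Path:
    """'dir/doc.pdf' -> 'dir/doc.optimized.pdf', without os.path.splitext juggling.

    The '.optimized' tag is a hypothetical naming convention.
    """
    return input_path.with_name(f"{input_path.stem}.{tag}{input_path.suffix}")


def find_pdfs(folder: Path) -> List[Path]:
    """All *.pdf files directly inside folder, sorted for a stable order."""
    return sorted(p for p in folder.glob("*.pdf") if p.is_file())
```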
I could send PRs for some of these points, but I am completely unsure whether they would be merged. I have had a lot of negative experience (the worst thing in the free software ecosystem, IMHO) with people just saying "I'm not interested, say anything about it and I will ban you", "I decide it is not needed", "python 3 is unneeded, I am OK with 2" and similar things, and simply closing PRs.