performance with high-res images #85
Comments
Two points about your comment. First, a DPI of 600 alone cannot make eynollah slower. The problem with high-resolution documents is that, without the allow_scaling option, they cannot be scaled down automatically. allow_scaling should be True, and if the columns are detected correctly, downscaling can take place. Second, allow_scaling lets you scale down documents with a DPI above 300, but downscaling only happens if it is needed: if the scale of the document is much bigger than the "training scale", downscaling is applied.
@vahidrezanezhad please help me understand:
How is that? I can see lots of CPU-bound image processing. Most algorithms are
Why not? Downsampling (with suitable interpolation algorithm) should be trivial – as opposed to upsampling, for which you built an elaborate model.
I am confused. Where does this actually happen?
Here: eynollah/qurator/eynollah/eynollah.py Lines 2007 to 2008 in 13bc237
eynollah/qurator/eynollah/eynollah.py Line 373 in 13bc237
eynollah/qurator/eynollah/eynollah.py Lines 318 to 323 in 13bc237
(So, essentially, if the column detector is confident enough, there can be downsampling.)
Sometimes the input comes with a DPI of 600 or beyond. It seems to me this makes eynollah much slower. A larger resolution might be needed for newspapers, but there is always a point beyond which result quality no longer increases. I would assume that a single downscaling interpolation after import should not be too costly.
The documentation of allow_scaling says that it would also scale down images. But the implementation does not look like that's the case: eynollah/qurator/eynollah/eynollah.py, lines 437 to 444 in 8d5079c
IIUC only too small images get upsampled. I'd expect a secondary DPI_THRESHOLD2 at which downsampling would begin.
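
To make the suggestion concrete, here is a minimal sketch of what such a second threshold could look like. This is not eynollah's code: DPI_THRESHOLD2 is only the name proposed above, and TARGET_DPI, downscaled_size, and the values 600 and 300 are assumptions chosen for illustration.

```python
# Hypothetical sketch, not eynollah's implementation.
DPI_THRESHOLD2 = 600  # assumed value: at or above this DPI, downsampling begins
TARGET_DPI = 300      # assumed value: resolution to downsample to


def downscaled_size(width, height, dpi,
                    threshold=DPI_THRESHOLD2, target=TARGET_DPI):
    """Return the (width, height) an image should be resized to.

    Images whose DPI is below the threshold keep their original size.
    """
    if dpi < threshold:
        return width, height
    factor = target / dpi  # e.g. 300 / 600 = 0.5
    # Never let a dimension collapse to zero pixels.
    return max(1, round(width * factor)), max(1, round(height * factor))
```

The resize itself could then be a single call after import, for example with OpenCV's cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_AREA), since area interpolation is the usual choice for shrinking. Whether that step should also depend on the column detector is a separate question.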