--maxcolseps 0 has no effect; ocropus-gpageseg still performs whitespace column segmentation #240

Open
freen opened this issue Aug 25, 2017 · 3 comments

freen commented Aug 25, 2017

Expected Behavior

I expect ocropus-gpageseg to split the image below into rows without doing any column segmentation. To that end I'm passing the arguments --maxcolseps 0 --maxseps 0 (the latter is already the default).

The command even registers that I have specified 0 as the value of maxcolseps:

root@acf1f5c7b5aa:/tmp# ocropus-gpageseg -d --maxcolseps=0 --maxseps=0 5823821_0.png
INFO:  
INFO:  ########## /usr/local/bin/ocropus-gpageseg -d --maxcolseps=0 --maxseps=
INFO:  
INFO:  5823821_0.png
INFO:  scale 21.977261
INFO:  computing segmentation
INFO:  computing column separators
INFO:  considering at most 0 whitespace column separators
INFO:  debug _1thresh.png
INFO:  debug _2grad.png
INFO:  debug _3seps.png
INFO:  debug _4seps.png
INFO:  debug _colwsseps.png
INFO:  computing lines
INFO:  debug _cleaned.png
INFO:  debug _lineseeds.png
INFO:  debug _seeds.png
INFO:  propagating labels
INFO:  spreading labels
INFO:  number of lines 200
INFO:  finding reading order
INFO:  writing lines
INFO:     195  5823821_0.png 22.0 196
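
To compare runs made with different flags, the two relevant values can be pulled out of a captured log. This is a minimal sketch that assumes the INFO line wording shown in the log above; `parse_gpageseg_log` is a hypothetical helper and not part of ocropy.

```python
import re

def parse_gpageseg_log(text):
    """Extract the column-separator limit and the line count from a log.

    Assumes the message wording seen in this thread; other ocropy
    versions may phrase these lines differently.
    """
    result = {"maxcolseps": None, "lines": None}
    m = re.search(r"considering at most (\d+) whitespace column separators", text)
    if m:
        result["maxcolseps"] = int(m.group(1))
    m = re.search(r"number of lines (\d+)", text)
    if m:
        result["lines"] = int(m.group(1))
    return result

# Two lines copied from the log above.
sample = (
    "INFO:  considering at most 0 whitespace column separators\n"
    "INFO:  number of lines 200\n"
)
print(parse_gpageseg_log(sample))
```

A value of 0 for `maxcolseps` only confirms that the flag was registered; it says nothing about how the later line-seed step groups text.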

Current Behavior

However, ocropus-gpageseg is doing column separation.

Here are a few examples of rows that should span the full page width but have instead been broken into whitespace-separated columns first, then into rows:

0100a0 bin
0100a1 bin
0100a2 bin
0100a3 bin
0100a4 bin

Possible Solution

Unsure!

Steps to Reproduce

Image: 5823821_0.png

5823821_0

  1. ocropus-nlbin 5823821_0.png
  2. ocropus-gpageseg -d --maxcolseps 0 5823821_0.png
  3. Or: ocropus-gpageseg -d --maxcolseps 0 --maxseps 0 5823821_0.png

Your Environment

root@acf1f5c7b5aa:/app/ocropy# git log -n1
commit ae84a8edaf0b76135f749ba66fc30c272d0726d0
Merge: 3b843f5 7eac431
Author: Tom Breuel <[email protected]>
Date:   Fri Aug 11 12:31:33 2017 -0700

    Merge branch 'master' of github.com:tmbdev/ocropy
  • Operating System and version:
    Debian GNU/Linux 8.6 (jessie)
@freen freen changed the title --maxcolseps 0 has no effect; ocropus-gpageseg still performs whitespace column segmentation --maxcolseps 0 has no effect; ocropus-gpageseg still performs whitespace column segmentation Aug 25, 2017

freen commented Aug 25, 2017

As an update: it looks like --hscale=100 does the trick. That said, the documentation for --maxcolseps 0 seems to indicate that no additional parameters should be needed for this effect.
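
For side-by-side testing, the baseline and workaround invocations can be generated from one place. This is a sketch only: `gpageseg_command` is a hypothetical helper, and it uses nothing beyond the flags already mentioned in this thread (-d, --maxcolseps, --maxseps, --hscale).

```python
import shlex

def gpageseg_command(image, maxcolseps=0, maxseps=0, hscale=None, debug=True):
    """Build an ocropus-gpageseg argument list from flags seen in this thread."""
    cmd = ["ocropus-gpageseg"]
    if debug:
        cmd.append("-d")
    cmd += ["--maxcolseps", str(maxcolseps), "--maxseps", str(maxseps)]
    if hscale is not None:
        # Only added when requested, so the baseline run keeps the tool's default.
        cmd += ["--hscale", str(hscale)]
    cmd.append(image)
    return cmd

baseline = gpageseg_command("5823821_0.png")
workaround = gpageseg_command("5823821_0.png", hscale=100)
print(shlex.join(baseline))
print(shlex.join(workaround))
# To execute (requires ocropy on PATH):
#   import subprocess; subprocess.run(workaround, check=True)
```

Running both variants on the same binarized page and comparing the debug images should show whether --hscale alone accounts for the difference.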

zuphilip (Collaborator) commented

The --maxcolseps parameter works as expected here, and some of the lines already take the full width:

lineseeds-maxcolseps-0

However, the fact that other lines do not take the full width is connected to the internals of the compute_line_seeds function. AFAIK it kind of smears over the text and allows some space before and after. Thus, if two text blocks are not too far apart horizontally, they will be connected. Otherwise they will be saved as two separate images.
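
The smearing effect described above can be illustrated with a toy model. This is not the code in compute_line_seeds: it is a one-dimensional dilation written only to show how a horizontal smearing radius decides whether two blocks on the same row merge. That a larger --hscale widens the real filter is an assumption based on the workaround reported earlier in this thread, not something verified against the source.

```python
def smear_row(row, radius):
    """Mark every pixel within `radius` columns of an ink pixel (1D dilation)."""
    n = len(row)
    out = [0] * n
    for i, v in enumerate(row):
        if v:
            for j in range(max(0, i - radius), min(n, i + radius + 1)):
                out[j] = 1
    return out

def count_segments(row):
    """Count maximal runs of nonzero pixels."""
    count = 0
    prev = 0
    for v in row:
        if v and not prev:
            count += 1
        prev = v
    return count

# One text row: two 10-pixel blocks of ink separated by a 12-pixel gap.
row = [1] * 10 + [0] * 12 + [1] * 10

for radius in (2, 5, 6):
    print(radius, count_segments(smear_row(row, radius)))
```

With a small radius the gap survives and the row yields two segments; once the radius covers half the gap, the two blocks join into one. In this model, a row split into columns simply means the gap was wider than the smearing could bridge.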

I see that this is not the desired output in your example. I don't know the reason or details for the implementation of this function. Your observation with --hscale is interesting. I have to admit that I don't know what this parameter is doing overall. Do you understand what this parameter is doing exactly? More or improved documentation would always be welcome here.


freen commented Sep 16, 2017

Thanks for your reply, @zuphilip.

> Your observation with --hscale is interesting. I have to admit that I don't know what this parameter is doing overall. Do you understand what this parameter is doing exactly? More or improved documentation would always be welcome here.

I don't know what the actual algorithmic effect of the parameter is; I arrived at this outcome just by experimenting with the various command line options. Sorry I can't provide more info.
