Adjust reading order when there are no columns #190

Open
wants to merge 1 commit into
base: master

@zuphilip zuphilip commented Mar 9, 2017

When there are neither white nor black column separators,
we determine the reading order simply by checking
which line lies above which other lines.
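
As a rough illustration of that idea (not the code in this commit), a purely vertical ordering could look like the sketch below. The function name `order_by_vertical_position` and the `(y0, x0, y1, x1)` box layout are assumptions made for this example only.

```python
def order_by_vertical_position(boxes):
    """Return the indices of boxes sorted from top to bottom.

    boxes: list of (y0, x0, y1, x1) tuples, y growing downwards.
    This layout is a hypothetical choice for the sketch.
    """
    # Sort by vertical centre; ties are broken by the left edge.
    return sorted(
        range(len(boxes)),
        key=lambda i: ((boxes[i][0] + boxes[i][2]) / 2.0, boxes[i][1]),
    )


# Three full-width lines given in scrambled order.
lines = [(50, 10, 60, 200), (10, 10, 20, 200), (30, 10, 40, 200)]
print(order_by_vertical_position(lines))  # [1, 2, 0]
```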

@zuphilip
Collaborator Author

The ordering of the textual parts on the same line is sometimes wrong.

@mittagessen

This might not be entirely pertinent to this pull request, but the inclusion of an equivalent switch came up a few days ago in kraken (prompted by an RTL reading order bug). I am not confident that there are no edge cases where horizontal intra-column ordering is desirable and which such a switch would disable. Have you looked into scenarios where such a switch would fail?

It isn't like the current segmenter is particularly good, and I still plan on replacing it reasonably soon, but it still shouldn't be made worse for some inputs.

@zuphilip
Collaborator Author

@mittagessen Another PR, #118, is also about the reading order. The picture there shows some cases that you might want to test as edge cases. The current code for this is hard to understand and only provides a partial ordering, which is later extended to its transitive closure.
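
For readers unfamiliar with that step: one common way to turn a partial ordering into a single reading sequence is a topological sort. The following is a minimal sketch under assumed conventions, not the segmenter's actual code; `total_order` and the boolean matrix `before` are hypothetical names, and the relation is assumed to be acyclic.

```python
def total_order(before):
    """Extend a partial order to one total order (topological sort).

    before[i][j] is True when line i must be read before line j.
    Assumes the relation contains no cycles.
    """
    n = len(before)
    visited = [False] * n
    result = []

    def visit(j):
        if visited[j]:
            return
        visited[j] = True
        # Emit every line that must precede j before j itself.
        for i in range(n):
            if before[i][j]:
                visit(i)
        result.append(j)

    for j in range(n):
        visit(j)
    return result


# Line 2 precedes line 0, and line 0 precedes line 1.
before = [
    [False, True, False],
    [False, False, False],
    [True, False, False],
]
print(total_order(before))  # [2, 0, 1]
```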

Maybe the question is also: do we expect that column recognition does not give us all the details, so that we then have to handle them in the reading order step?

@mittagessen

I would expect the reading order determination to be completely independent of column detection, just as it is right now, mainly because the metric for column separation is different from the one used to separate lines horizontally.

With the horizontal ordering disabled, minor vertical variations in typesetting cause reordering. A theoretical example I found is certain (single column) poetry that is justified through whitespace in the middle of the line; without the y_overlap logic the ordering of the two line parts would be mostly random.

Admittedly, that's an esoteric use case but the proposed behavior would be worse for these inputs than the old one.
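
To make the poetry case concrete, here is a small sketch of the effect. It is a simplified stand-in, not the segmenter's implementation: `y_overlap` and `reads_before` are written for this example, boxes are assumed to be `(y0, x0, y1, x1)` tuples, and a left-to-right script is assumed.

```python
def y_overlap(a, b):
    """True if the vertical extents of boxes a and b intersect."""
    return a[0] < b[2] and b[0] < a[2]


def reads_before(a, b, use_overlap=True):
    """True if box a should be read before box b (left-to-right script)."""
    if use_overlap and y_overlap(a, b):
        return a[1] < b[1]  # same text line: left fragment first
    return a[0] < b[0]      # otherwise: upper box first


# Two halves of one verse line; the right half sits 2 px higher.
left = (100, 10, 120, 200)
right = (98, 300, 118, 500)

print(reads_before(left, right))                     # True
print(reads_before(left, right, use_overlap=False))  # False
```

With the overlap check the left half is read first; with a purely vertical comparison the two-pixel offset alone decides the order.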
