The new code is detecting more rulings than the old code.
New rulings:
Old rulings:
That's why it is detecting 2 tables instead of 1; see the images:
New:
Old:
I think it is OK to detect 2 tables. What should we do in this case?
melisabok changed the title from "TestTableDetection.[35] Expected one table and detected two" to "TestTableDetection.[35]: Expected one table and detected two" on Mar 3, 2017
That's an interesting side effect of the improvements in PDFBox 2.0: the old version missed some lines.
Also, we've run into this case before. Sometimes the table detection algorithm picks up two "tables", one contained inside the other. Unfortunately, we haven't arrived at a decision on what to do. My guess is that we should build a tree of rectangles (using containedIn as the linkage criterion) and keep the outermost element. @jeremybmerrill any ideas?
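A minimal sketch of the "keep the outermost element" idea, for discussion only. It does not build an explicit tree: keeping every rectangle that is not contained in another one yields the same result as taking the roots of a containment tree. It uses plain `java.awt.geom.Rectangle2D` rather than the project's own rectangle class, and the class and method names (`OutermostTables`, `keepOutermost`) are hypothetical, not existing project code.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

class OutermostTables {
    // Returns the rectangles that are not contained in any other rectangle
    // of the input, i.e. the roots of the containment hierarchy.
    // Hypothetical helper for illustration; not part of the project.
    static List<Rectangle2D> keepOutermost(List<Rectangle2D> areas) {
        List<Rectangle2D> outermost = new ArrayList<>();
        for (int i = 0; i < areas.size(); i++) {
            Rectangle2D candidate = areas.get(i);
            boolean nested = false;
            for (int j = 0; j < areas.size() && !nested; j++) {
                if (i == j) {
                    continue;
                }
                Rectangle2D other = areas.get(j);
                if (other.contains(candidate)) {
                    // Identical rectangles contain each other; keep only
                    // the first occurrence so exactly one copy survives.
                    nested = !other.equals(candidate) || j < i;
                }
            }
            if (!nested) {
                outermost.add(candidate);
            }
        }
        return outermost;
    }
}
```

Unlike a comparator-based set, the result here does not depend on the order in which the detected areas are supplied. The pairwise check is quadratic, which is likely acceptable for the handful of candidate areas found on a page.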
I found the comparator in the NurminenDetectionAlgorithm and I made a fix to make the tests pass.
I'm not sure this is the right solution, because the comparator doesn't ensure that the TreeSet keeps the outermost table; that depends on the order of the tables passed to addAll:
tableSet.addAll(tableAreas);
With this fix all the TestTableDetection tests are passing.
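To illustrate the order dependence described above, here is a small self-contained sketch. The comparator is a simplified stand-in written for this example (it is an assumption, not the one in NurminenDetectionAlgorithm): it reports two rectangles as equal when one contains the other. Because `TreeSet.add` leaves the existing element in place when the comparator returns 0, whichever rectangle is inserted first is the one that stays. `TreeSetOrderDemo`, `survivor` and `largestFirst` are hypothetical names.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetOrderDemo {
    // Simplified, hypothetical comparator: nested rectangles compare as equal,
    // otherwise order by top edge, then by left edge.
    static final Comparator<Rectangle2D> NESTED_AS_EQUAL = (a, b) -> {
        if (a.contains(b) || b.contains(a)) {
            return 0;
        }
        int byY = Double.compare(a.getY(), b.getY());
        return byY != 0 ? byY : Double.compare(a.getX(), b.getX());
    };

    // Adds all areas to a TreeSet and returns the first element that survived.
    static Rectangle2D survivor(List<Rectangle2D> areas) {
        Set<Rectangle2D> tableSet = new TreeSet<>(NESTED_AS_EQUAL);
        tableSet.addAll(areas);
        return tableSet.iterator().next();
    }

    // One possible mitigation: insert larger areas first, so an outer
    // rectangle is already in the set when a nested one arrives.
    static List<Rectangle2D> largestFirst(List<Rectangle2D> areas) {
        List<Rectangle2D> sorted = new ArrayList<>(areas);
        sorted.sort(Comparator
                .comparingDouble((Rectangle2D r) -> r.getWidth() * r.getHeight())
                .reversed());
        return sorted;
    }
}
```

With an inner and an outer rectangle, `survivor` returns the inner one when it is listed first and the outer one when the list is reversed or passed through `largestFirst`. A comparator that returns 0 for containment is also not a consistent total order, so sorting before insertion is at best a workaround rather than a guarantee.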
File us-009.pdf