tables>processing>bordered_tables>cells>indentification.py identify_cells raise ZeroDivisionError: division by zero #220

Open
hbh112233abc opened this issue Sep 27, 2024 · 1 comment

Comments

@hbh112233abc

Test data that reproduces the error:

    h_lines_arr = np.array(
        [
            [250, 707, 2302, 707],
            [250, 825, 2302, 825],
            [250, 954, 2302, 954],
            [250, 1066, 1977, 1066],
            [1977, 1066, 2302, 1066],
            [250, 1192, 1703, 1192],
            [1977, 1192, 2302, 1192],
            [250, 1268, 1703, 1268],
            [1977, 1268, 2302, 1268],
            [250, 1346, 1703, 1346],
            [1977, 1346, 2302, 1346],
            [250, 1423, 1703, 1423],
            [1977, 1423, 2302, 1423],
            [250, 1500, 1703, 1500],
            [1977, 1500, 2302, 1500],
            [250, 1770, 1703, 1770],
            [1977, 1770, 2302, 1770],
            [250, 2118, 1703, 2118],
            [1977, 2118, 2302, 2118],
            [250, 2301, 1703, 2301],
            [1977, 2301, 2302, 2301],
            [250, 2401, 1703, 2401],
            [1977, 2401, 2302, 2401],
            [250, 2498, 1703, 2498],
            [1703, 2498, 1703, 2498],
            [1977, 2498, 2302, 2498],
            [250, 2601, 981, 2601],
            [1977, 2601, 2302, 2601],
            [366, 2736, 981, 2736],
            [1977, 2736, 2302, 2736],
            [366, 2872, 981, 2872],
            [1977, 2872, 2302, 2872],
            [366, 3007, 2302, 3007],
            [366, 3143, 2302, 3143],
            [250, 3278, 2302, 3278],
            [250, 2040, 2302, 2040],
            [250, 2194, 2302, 2194],
        ],
        np.int64,
    )
    v_lines_array = np.array(
        [
            [250, 707, 250, 3278],
            [366, 1066, 366, 2118],
            [366, 2601, 366, 3278],
            [523, 707, 523, 1066],
            [981, 1066, 981, 3278],
            [1222, 825, 1222, 1066],
            [1300, 1066, 1300, 2118],
            [1434, 707, 1434, 1066],
            [1703, 707, 1703, 2498],
            [1977, 825, 1977, 3007],
            [2302, 707, 2302, 3278],
        ],
        np.int64,
    )

If I remove its wrapper @njit("int64[:,:](int64[:,:],int64[:,:])", cache=True, fastmath=True), the exception is not raised.

@MathieuSeraphim

MathieuSeraphim commented Oct 25, 2024

Can confirm.

The problem occurs due to lines 30 and 31:

l_corresponds = -0.02 <= (x1i - x1j) / (x2i - x1i) <= 0.02
r_corresponds = -0.02 <= (x2i - x2j) / (x2i - x1i) <= 0.02

with, at line 22:

x1i, y1i, x2i, y2i = h_lines_arr[i][:]

Basically, if the two x coordinates of a horizontal line are equal (i.e. the line is 1 pixel wide), x2i - x1i is 0 and both expressions divide by zero.
In the example above, for instance, h_lines_arr contains the following line:
[1703, 2498, 1703, 2498],
Commenting out the @njit(...) wrapper at line 11 just turns the ZeroDivisionError into a RuntimeWarning on my end.
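
The difference between the two behaviours can be reproduced without Numba. The sketch below is only an illustration: `corresponds` is a hypothetical helper that mirrors the expression from line 30, and the value 250 for x1j is taken from another row of the test data. With plain NumPy int64 scalars, the division produces inf plus a RuntimeWarning, and the chained comparison quietly evaluates to False. As far as I know, Numba's default error model follows Python semantics instead, which would explain why the jitted version raises ZeroDivisionError.

```python
import warnings

import numpy as np


def corresponds(x1i, x2i, x1j, tol=0.02):
    # Hypothetical helper mirroring the expression at line 30 of the issue.
    return -tol <= (x1i - x1j) / (x2i - x1i) <= tol


# The degenerate row from the test data: both x coordinates are 1703.
row = np.array([1703, 2498, 1703, 2498], np.int64)
x1i, y1i, x2i, y2i = row

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # x1j = 250 is the left end of most other horizontal lines in the data.
    result = corresponds(x1i, x2i, np.int64(250))

got_runtime_warning = any(
    issubclass(w.category, RuntimeWarning) for w in caught
)
print(bool(result), got_runtime_warning)
```

So without the wrapper the degenerate line is never matched, but only by accident: the comparison against inf happens to be False.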

Ideally, one-pixel-wide horizontal lines (i.e. points) shouldn't have been identified as horizontal lines in the first place.
A quick fix would be to add this before line 30:

if x1i == x2i:
    continue
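
For context, here is a minimal sketch of both options, assuming a simplified pairing loop. `matching_pairs` and `drop_zero_length_h_lines` are hypothetical names, not functions from the library, and the loop structure is a guess at the shape of the real one. Only the guard and the two comparison expressions are taken from this thread.

```python
import numpy as np


def drop_zero_length_h_lines(h_lines):
    # Option 1 (upstream): discard rows whose two x coordinates are equal,
    # so degenerate lines never reach the cell identification step.
    return h_lines[h_lines[:, 0] != h_lines[:, 2]]


def matching_pairs(h_lines, tol=0.02):
    # Option 2 (quick fix): hypothetical stand-in for the pairing loop,
    # with the guard placed before the two divisions.
    pairs = []
    for i in range(len(h_lines)):
        x1i, y1i, x2i, y2i = h_lines[i]
        if x1i == x2i:
            continue  # zero-length line: x2i - x1i would be 0
        for j in range(i + 1, len(h_lines)):
            x1j, y1j, x2j, y2j = h_lines[j]
            l_corresponds = -tol <= (x1i - x1j) / (x2i - x1i) <= tol
            r_corresponds = -tol <= (x2i - x2j) / (x2i - x1i) <= tol
            if l_corresponds and r_corresponds:
                pairs.append((i, j))
    return pairs


# Three rows from the test data, including the degenerate one.
sample = np.array(
    [
        [250, 707, 2302, 707],
        [1703, 2498, 1703, 2498],
        [250, 825, 2302, 825],
    ],
    np.int64,
)

print(matching_pairs(sample))
print(drop_zero_length_h_lines(sample).shape)
```

The guard is the smaller change, while the filter keeps the degenerate rows out of every later step; which one fits better depends on where those rows are produced.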
