Feat: weighted average table metrics #3348

Merged on Nov 20, 2024 (14 commits)


Collaborator

@badGarnet badGarnet commented Jul 3, 2024

This PR replaces the unweighted average used for table metrics with an average weighted by the number of actual (ground truth) tables.

  • on pages with ground truth tables, the weight is proportional to the number of ground truth tables on that page
  • pages with no ground truth tables but with predicted tables (false positives) are assigned a weight of 1 table for the whole page when calculating the mean value of table_level_acc
  • pages with false positive tables do not contribute to the table structure or table content metrics
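The weighting rules above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `PageResult` record, its field names other than `total_tables` and `table_level_acc`, and the helper functions are all hypothetical, and giving a weight of 0 to pages with neither ground truth nor predicted tables is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageResult:
    """Hypothetical per-page record; not the PR's actual data model."""
    total_tables: int                 # ground truth tables on the page
    predicted_tables: int             # tables found by the model
    table_level_acc: float            # 0.0 for a false positive page
    structure_score: Optional[float]  # None when there is no ground truth


def page_weight(page: PageResult) -> int:
    """Weight of a page when averaging table_level_acc."""
    if page.total_tables > 0:
        return page.total_tables      # proportional to ground truth tables
    if page.predicted_tables > 0:
        return 1                      # false positive page: 1 table of weight
    return 0                          # assumption: nothing to score


def weighted_table_level_acc(pages: List[PageResult]) -> Optional[float]:
    weights = [page_weight(p) for p in pages]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * p.table_level_acc for w, p in zip(weights, pages)) / total


def weighted_structure_score(pages: List[PageResult]) -> Optional[float]:
    # False positive pages are skipped: there is no ground truth to compare to.
    scored = [p for p in pages if p.total_tables > 0 and p.structure_score is not None]
    total = sum(p.total_tables for p in scored)
    if total == 0:
        return None
    return sum(p.total_tables * p.structure_score for p in scored) / total


pages = [
    PageResult(total_tables=2, predicted_tables=2, table_level_acc=1.0, structure_score=1.0),
    PageResult(total_tables=1, predicted_tables=1, table_level_acc=0.5, structure_score=0.25),
    PageResult(total_tables=0, predicted_tables=1, table_level_acc=0.0, structure_score=None),
]
print(weighted_table_level_acc(pages))   # (2*1.0 + 1*0.5 + 1*0.0) / 4 = 0.625
print(weighted_structure_score(pages))   # (2*1.0 + 1*0.25) / 3 = 0.75
```

With these made-up numbers, an unweighted mean of table_level_acc over the three pages would be 0.5, so the page with two ground truth tables pulls the weighted value up.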

test

This PR updates the existing test for evaluating table metrics:

  • adds a second file with just 1 table alongside the existing file with 2 tables
  • tests that the weighted average is written to the report

@plutasnyy
Contributor

There is one scenario we need to account for: the ground truth file has 0 tables, but there are some false positives. I think it could be counted with weight=1; what do you think? Right now such a file is not counted at all, right?

@badGarnet
Collaborator Author

There is one scenario we need to account for: the ground truth file has 0 tables, but there are some false positives. I think it could be counted with weight=1; what do you think? Right now such a file is not counted at all, right?

good call; that makes sense

@badGarnet
Collaborator Author

@plutasnyy actually, the code already filters down to only the rows with non-zero "total_tables". If we intend to change that behavior, it would be better to do it in a separate PR, since it changes the existing behavior for tallying tables.
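For illustration, the kind of filter being described might look like the sketch below. Only the "total_tables" column name comes from the comment above; the row layout, the other keys, and the helper name are hypothetical, and the project's code may well do this on a dataframe rather than a list of dicts.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, float]]


def rows_with_ground_truth_tables(rows: List[Row]) -> List[Row]:
    """Keep only rows whose ground truth contains at least one table."""
    return [row for row in rows if row["total_tables"] > 0]


# Made-up report rows; "filename" and "table_level_acc" are illustrative keys.
rows: List[Row] = [
    {"filename": "two_tables.pdf", "total_tables": 2, "table_level_acc": 1.0},
    {"filename": "one_table.pdf", "total_tables": 1, "table_level_acc": 0.5},
    {"filename": "false_positive.pdf", "total_tables": 0, "table_level_acc": 0.0},
]

kept = rows_with_ground_truth_tables(rows)
print([row["filename"] for row in kept])
```

Under a filter like this, a file with only false positive tables is dropped before any averaging, which is the case raised in the question above.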

- false positive tables get a 0 score for the table level acc metric and a
  weight equal to 1 table per page
- false positive tables do not contribute to other table metrics since
  there is no ground truth to evaluate the structure or content of those
  tables
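
One way to write the resulting mean for the table level acc metric, as a sketch: the symbols are introduced here for illustration and do not appear in the PR. Let $n_i$ be the number of ground truth tables on page $i$, $m_i$ the number of predicted tables, and $a_i$ the page's table level acc score, with $a_i = 0$ when $n_i = 0$ and $m_i > 0$. The zero weight for pages with no tables of either kind is an assumption.

```latex
\bar{a} = \frac{\sum_i w_i \, a_i}{\sum_i w_i},
\qquad
w_i =
\begin{cases}
  n_i & \text{if } n_i > 0, \\
  1   & \text{if } n_i = 0 \text{ and } m_i > 0, \\
  0   & \text{otherwise.}
\end{cases}
```

For the other table metrics, the sum would run only over pages with $n_i > 0$, with $w_i = n_i$.
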
@badGarnet badGarnet added this pull request to the merge queue Jul 9, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 9, 2024
@badGarnet badGarnet added this pull request to the merge queue Jul 9, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 9, 2024
@badGarnet badGarnet added this pull request to the merge queue Jul 9, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 9, 2024
@badGarnet badGarnet added this pull request to the merge queue Nov 20, 2024
Merged via the queue into main with commit 3b9b01c Nov 20, 2024
41 checks passed
@badGarnet badGarnet deleted the feat/weighted-average-table-metrics branch November 20, 2024 17:48
2 participants