Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of rowspan in tables #1088

Merged
merged 32 commits into from
Feb 2, 2024

Conversation

mjasperse
Copy link

@mjasperse mjasperse commented Jan 3, 2024

Hi team! I've been working on this for a while, apologies in advance for the huge PR. The changes are primarily motivated by my use-case, so am happy to discuss whether things should be taken out or handled differently.

The key problem with rowspan is that the rows can't be treated independently any more and there needs to be multiple layout passes to choose how to allocate padding to ensure no cell is truncated. Furthermore the mapping between "column index" and "table index" is nontrivial because it's perturbed both by colspan and rowspan intrusions. This makes it quite hard to generate a valid table on the first try.

To resolve this I decided to do a table "normalisation" step, where a None entry is inserted into Row.cells to simplify keeping track of where a spanned cell exists, so the index has clear meaning and doesn't need remapping. This greatly simplified the debugging process because you can easily inspect the book-keeping to ensure it matches expectations. This is my third attempt at the rowspan algorithm because of how hard it is to handle edge-cases reliably...

I also removed the requirement that all rows have the same number of cells, instead scaling the whole table to the max number of cells in a row. This means that if you get a rowspan or colspan wrong somewhere, you can get some visual feedback from the PDF output on what is wrong instead of just getting an assertion error with table index, which really helps identify what needs fixing in the table. One option is to emit a warning in this case?

During development I also found it very helpful to be able to define spans using placeholder elements, as requested in #970. My solution is to use an enum object to identify where the span should occur, so it cannot be confused with valid cell contents. This is somewhat similar to extended markdown and I find more intuitive for generating from some table sources - see test_table_with_rowspan_and_pgbreak() for an example. Personally I found this a useful feature but I also see the objections in #970 so I'm open to separating it out.

I've added a bunch of test cases to track the expected behaviour - particularly mess relating to pagebreaks. I anticipate adding some more as I think of other edge cases or run into issues with using it in our application.

There will be merge conflicts with my other PRs, but would prefer to hold off rebasing until after review.

Resolves #1064

Checklist:

  • The GitHub pipeline is OK (green),
    meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@mjasperse
Copy link
Author

Hi @Lucas-C and @gmischler, hope you're well! Anything I can do to assist with review of my PRs?

@andersonhc
Copy link
Collaborator

@mjasperse can you please resolve the conflicts? I will work on reviewing this PR next.

No longer relevant after rowspan refactor
Returns to frozen dataclasses, row-rendering mechanism
Previous behaviour allowed cascading of font style, in particular test
case test_table_with_capitalized_font_family_and_emphasis
@codecov-commenter
Copy link

codecov-commenter commented Jan 27, 2024

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (6a8678b) 93.38% compared to head (3d606ae) 93.32%.

❗ Current head 3d606ae differs from pull request most recent head b9a862c. Consider uploading reports for the commit b9a862c to get more accurate results

Files Patch % Lines
fpdf/table.py 91.25% 8 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1088      +/-   ##
==========================================
- Coverage   93.38%   93.32%   -0.07%     
==========================================
  Files          29       29              
  Lines        8591     8701     +110     
  Branches     1894     1929      +35     
==========================================
+ Hits         8023     8120      +97     
- Misses        354      362       +8     
- Partials      214      219       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andersonhc
Copy link
Collaborator

andersonhc commented Feb 2, 2024

I ran several comparisons between current current master and mjasperse's branch, up to a 100k lines table (5500+ pages file) and there is no significant performance change - mjasperse's version is actually faster in most of the profile runs.

@andersonhc andersonhc merged commit 1effbb0 into py-pdf:master Feb 2, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tables with multiple header rows and merging cells
3 participants