-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds pandas-friendly formatting for multi-column and multi-index tables #986
Conversation
ok what's the best way to include a pandas df for testing (assuming we don't want to require pandas for testing) |
We have camelot.py into test\requirements.txt, and camelot depends on pandas, so pandas is already installed with our test requirements. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #986 +/- ##
==========================================
- Coverage 93.70% 93.40% -0.30%
==========================================
Files 28 28
Lines 8276 8309 +33
Branches 1514 1522 +8
==========================================
+ Hits 7755 7761 +6
- Misses 324 350 +26
- Partials 197 198 +1
☔ View full report in Codecov by Sentry. |
I added the code to create a table from a dataframe directly to the also added a test! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afriedman412, I appreciate your enthusiasm and can support the basic purpose of this PR. It is certainly desirable to make the use of data generated by other software packages as easily useable as possible. This applies to almost any data producing software, which happens to include pandas.
BUT.
The way you implemented this is not how it should be done.
There is absolutely no reason why the core classes of fpdf2 should have any knowledge about internal data structures of any one of those data generating packages. Sorry if this sounds unduly harsh, but from a software architecture perspective, this approach is just utterly and horribly wrong. If we allow this to happen and then continue on the the same path, fpdf2 would quickly turn into bloatware and become unmaintainable.
I've outlined the basic reasons for my objection and at least one possible solution in #970, just before and again after you joined the discussion. You seemed to have either not read, not understood, or just ignored this input. Is there a particular reason for that?
Apologies for not seeing that you already went through this above. I kind of misread the conversation, as what stuck out to me immediately was the mis-framing of the issue. While I agree that it's not ideal that Pandas doesn't do this formatting itself, multi-level dataframes are common enough that their formatting is worth accommodating. Automatically grabbing the number of levels is more intuitive than having to pass them as separate variables. It might not be elegant from a software engineer standpoint, but it's a small UX improvement that will save data people (such as myself) countless small headaches. But it's your package and if you don't want it that way, that's up to you! To the larger point -- and why I missed the earlier conversation -- my goal here was to build out FPDF's ability to understand what index and column labels are, if for no other reason than to make their independent styling less janky! |
Yes, #970 is approaching the topic from a rather particular angle. No worries about that.
That part is very much appreciated! The dilemma is simply about the optimal answer to "what goes where". What the table module definitively needs internally are a good way to handle row labels (are those called "indexes" outside the pandas ecosystem as well?) and row spans. Adding those two features would be a good foundation for any of the more ambitious ideas (and the latter also helpful for On top of that foundation, a good concept for what you call "multi-level dataframes" will need some careful thought. The trick here is not focus your tunnel vision on pandas. Consider that there are likely other packages out there that may produce similar data, but will deliver them in a slightly different form, using differently named and organized data structures. Ultimately we will want to support all of them (or at least make that possible), so we need some kind of translation layer in between, separate from the core fpdf2 modules. Whether this translation layer should become a part of the library (eg. in a new subdirectory "adapters"), would be better hosted as its own project, or some other approach entirely, that is up to discussion. But in any case, there needs to be a cleanly structured system with defined levels of abstraction. Data translation and conversion between different systems is a tricky topic, and doesn't get easier by just stirring everything together in one bowl. If you're still interested in doing this (assuming I haven't chased you away with my criticism 😉), I suggest to restart from a clean slate. You can probably reuse a good chunk of the code already written, but it needs to be organized quite differently, so trying to salvage this PR might more hinder your progress than helping it. |
fpdf/fpdf.py
Outdated
@@ -4882,6 +4882,23 @@ def table(self, *args, **kwargs): | |||
yield table | |||
table.render() | |||
|
|||
def create_table_from_dataframe(self, df, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such function should probably live in its own fpdf/pandas.py
module.
@gmischler already provided a detailed explanation on why this should not be there, in fpdf2
core source file 😊
I fully second @gmischler there 😊 I totally agree that this PR cannot be merged "as it is" now, We sure can find room in A good start could be to refactor your code in order minimizes the changes to the existing classes, |
appreciate the feedback! do you already have |
So far they all live in the "Mixing with other libs" section of the documentation...
I'd hope to keep it as generic as possible. fpdf/adapters/table_base.py There may well be other types of adapters in the future covering different aspects of working with fpdf2. I can't think of a good example right now, but users keep surprising me with their ideas... 😉 |
ok rad, will work on this |
can you clarify what you want the I got down the rabbit hole of heading/index functionality and realized I didn't have a destination! |
(partially) Fixes #970
• Tables can now handle multi-index formatting similar to how multi-column formatted is currently implemented.
• Code added to format multi-index and multi-header to match Pandas (repeat labels are now omitted but spaced correctly
Checklist:
The GitHub pipeline is OK (green),
meaning that both
pylint
(static code analyzer) andblack
(code formatter) are happy with the changes of this PR.A unit test is covering the code added / modified by this PR
This PR is ready to be merged
In case of a new feature, docstrings have been added, with also some documentation in the
docs/
folderA mention of the change is present in
CHANGELOG.md
By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.