Incorrect widget key retrieval : inconsistency is caused by a dot in the key string #770

Open
lionpeloux opened this issue Dec 12, 2024 · 2 comments

@lionpeloux

Version

PyPDFForm=1.4.37

Issue Description

In the attached PDF form, I am trying to rename the keys using update_widget_key (with a custom key -> key JSON map). During that process, I noticed that one widget key was retrieved incorrectly:

The checkbox Gain de 2 classes on page 2 should be named "Gain de 2 classes.0" (according to Acrobat DC), but PyPDFForm retrieves it as "0".

Moreover, when I try to rename this checkbox widget with:
update_widget_key ("0", "situation_finale_gain_2_classes")

what I get for the renamed widget is:
key="Gain de 2 classes. situation_finale_gain_2_classes"

It looks like something is broken because of the dot (.) in the widget key.

202402_AttestationTravauxMPRaccompagne-devis_WEBAI.pdf

@chinapandaman
Owner

Hey thanks for posting.

It appears that other non-Acrobat PDF viewers, such as DocFly, also see that checkbox with a key of 0:

image

I think both PyPDFForm and DocFly retrieve the key from the /T property of the widget. Acrobat, however, seems to build the key by combining the /TU and /T properties with a dot in between:

image

That would explain why, when you tried to modify the key with update_widget_key ("0", "situation_finale_gain_2_classes"), it only modified the part after the dot (although I'm not sure where that extra space comes from).

In the meantime, I suggest trying to modify that key to remove the dot with a PDF viewer instead of the library and seeing what happens. I need to do some research on the /TU property to see whether it should be part of a widget's key.

@chinapandaman
Owner

Hey, sorry for the much-delayed response.

I figured out the problem. See section 12.7.3.2, found on page 434 of the PDF standard. It turns out that Gain de 2 classes.0 is the full name of the checkbox, whereas the library only supported partial names.
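
As a rough illustration of the partial-name versus full-name distinction (not PyPDFForm's actual implementation), the sketch below models field dictionaries as plain Python dicts and joins each ancestor's /T value with a dot. The helper `full_field_name` and the parent/child structure are assumptions made for illustration; in particular, it assumes the checkbox's parent field carries the partial name "Gain de 2 classes".

```python
from typing import Optional


def full_field_name(field: dict) -> str:
    """Join the partial names (/T) from the root ancestor down to this field."""
    parts = []
    node: Optional[dict] = field
    while node is not None:
        partial = node.get("/T")
        if partial is not None:  # nodes without /T contribute no segment
            parts.append(partial)
        node = node.get("/Parent")
    # Segments were collected child-first, so reverse before joining.
    return ".".join(reversed(parts))


# Hypothetical structure mirroring the checkbox from this issue.
parent = {"/T": "Gain de 2 classes"}
checkbox = {"/T": "0", "/Parent": parent}

print(checkbox["/T"])             # partial name only
print(full_field_name(checkbox))  # full name, as shown by Acrobat

# Renaming only the child's partial name leaves the parent's segment in place.
checkbox["/T"] = "situation_finale_gain_2_classes"
print(full_field_name(checkbox))
```

Under these assumptions, renaming the key "0" only touches the last segment of the full name, which is consistent with the parent's name still appearing in front of the new key in the original report.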

I have added support for full names in the most recent bump. See the docs here. One thing to note, though: since full names involve two widgets, which would make modifying widget keys very difficult, that functionality is disabled when full names are enabled for a PdfWrapper object.

Let me know if you have more questions.
