Cloning errors when using context manager #2912

pubpub-zz · 2024-10-20T06:52:38Z

I agree with @alexaryn in that responsibility of closing a stream should rest purely on the caller of PdfWriter, and not PdfWriter itself. Calling pdf_writer.close() or using the context manager should only clean up the resources that PdfWriter itself has created as part of its operation. I apologize for not catching the current behavior when I reviewed #1193. 😬

Looking at the code, there is also the inefficiency where PdfWriter.__init__ is called twice. It's also permissible to call PdfWriter with a PdfReader as the first argument (to simplify cloning), but in doing so with contexts, will hit the following error:
>>> with PdfReader("./sample-files/001-trivial/minimal-document.pdf") as reader:
...   with PdfWriter(reader) as writer:
...     print(writer.fileobj)
...
__init__
__enter__
__init__
<pypdf._reader.PdfReader object at 0x10304e970>
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/Users/mpeveler/code/github/pypdf/pypdf/_writer.py", line 373, in __exit__
    self.write(self.fileobj)
  File "/Users/mpeveler/code/github/pypdf/pypdf/_writer.py", line 1396, in write
    self.write_stream(stream)
  File "/Users/mpeveler/code/github/pypdf/pypdf/_writer.py", line 1367, in write_stream
    object_positions, free_objects = self._write_pdf_structure(stream)
  File "/Users/mpeveler/code/github/pypdf/pypdf/_writer.py", line 1500, in _write_pdf_structure
    stream.write(self.pdf_header.encode() + b"\n")
AttributeError: 'PdfReader' object has no attribute 'write'
I'm not sure what the correct behavior here should be, especially with regards to cloning. Like, if I call:
with PdfWriter("foo.pdf") as writer:
and foo.pdf exists, do I expect that it'll be cloned into the writer? I'd think no? However, I think this does force the second __init__ call to be made as there's no way to tell in the first __init__ that we're in a context. I'd think this could be confusing for end users, but I guess not since no one has written in about this behavior.

Originally posted by @MasterOdin in #2905 (comment)

The text was updated successfully, but these errors were encountered:

closes py-pdf#2912

alexaryn · 2024-10-21T01:01:30Z

The API is ambiguous here, which makes a solution elusive. In a simple construction like PdfWriter(f) the semantics depend on if f is an empty file or not. Empty means write to f and non-empty means clone from f. This violates the principle of least astonishment. I would like it better if there were separate keyword arguments for "write to" and "clone from".

The other main problem is that the context manager __exit__() is going to try to write to (and close) the file object, even if it was of the "clone from" kind and not writable.

The double-init suggests that the logic and API can be tightened up.

stefan6419846 · 2024-10-21T07:32:58Z

I have already requested changes on the corresponding PR, which is currently flawed and just introduced a magic positional argument which would guess its correct destination - while this would work if the flaws are removed, this just complicates the pypdf code unnecessarily as the user should know best what is desired in each case.

My proposal was to make the API more explicit by enforcing keyword-only arguments in the future (after a deprecation period). This way, we should at least avoid the ambiguity and force the user to set the correct value to avoid the current side effects.

stefan6419846 changed the title ~~errors when using context manager~~ Cloning errors when using context manager Oct 20, 2024

stefan6419846 added the PdfWriter The PdfWriter component is affected label Oct 20, 2024

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Oct 20, 2024

BUG: Improve PdfWriter handing of context manager

fb1ee44

closes py-pdf#2912

pubpub-zz linked a pull request Oct 20, 2024 that will close this issue

BUG: Improve PdfWriter handing of context manager #2913

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cloning errors when using context manager #2912

Cloning errors when using context manager #2912

pubpub-zz commented Oct 20, 2024 •

edited by stefan6419846

Loading

alexaryn commented Oct 21, 2024 •

edited

Loading

stefan6419846 commented Oct 21, 2024

Cloning errors when using context manager #2912

Cloning errors when using context manager #2912

Comments

pubpub-zz commented Oct 20, 2024 • edited by stefan6419846 Loading

alexaryn commented Oct 21, 2024 • edited Loading

stefan6419846 commented Oct 21, 2024

pubpub-zz commented Oct 20, 2024 •

edited by stefan6419846

Loading

alexaryn commented Oct 21, 2024 •

edited

Loading