ENH: add decode_as_image() to ContentStreams #2615

Merged: 10 commits, Jun 9, 2024
23 changes: 23 additions & 0 deletions docs/user/extract-images.md
@@ -19,3 +19,26 @@ for image_file_object in page.images:
fp.write(image_file_object.data)
count += 1
```

# Other images

Some other objects can contain images, such as stamp annotations.

For example, this document contains such stamps:

[test_stamp.pdf](https://github.com/user-attachments/files/15751424/test_stamp.pdf)

You can extract the image from the annotation with the following code:

```python
from pypdf import PdfReader

reader = PdfReader("test_stamp.pdf")
im = (
    reader.pages[0]["/Annots"][0]
    .get_object()["/AP"]["/N"]["/Resources"]["/XObject"]["/Im4"]
    .decode_as_image()
)

im.show()
```
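
The snippet above hard-codes the annotation index and the XObject name (`/Im4`), which are specific to this file. As a rough sketch of a more general approach, the helper below walks every annotation on a page and yields each image XObject found in its normal appearance stream. The names `iter_stamp_images` and `extract_stamp_images` are hypothetical and not part of pypdf; the sketch assumes a pypdf version that provides `decode_as_image()` and that Pillow is installed.

```python
def iter_stamp_images(page):
    """Yield (name, xobject) for image XObjects in annotation appearances.

    Hypothetical helper, not part of pypdf. It only relies on dictionary
    access and get_object(), which pypdf objects provide.
    """
    if "/Annots" not in page:
        return
    for annot_ref in page["/Annots"]:
        annot = annot_ref.get_object()
        try:
            xobjects = annot["/AP"]["/N"]["/Resources"]["/XObject"]
        except KeyError:
            # No normal appearance, or the appearance has no XObjects.
            continue
        for name, ref in xobjects.items():
            xobj = ref.get_object()
            if "/Subtype" in xobj and xobj["/Subtype"] == "/Image":
                yield name, xobj


def extract_stamp_images(path):
    """Return a list of (name, PIL image) for every page of the PDF at path."""
    # Imported lazily so iter_stamp_images has no hard dependency on pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [
        (name, xobj.decode_as_image())
        for page in reader.pages
        for name, xobj in iter_stamp_images(page)
    ]
```

Calling `extract_stamp_images("test_stamp.pdf")` should then return the decoded stamp images with their resource names. Annotations whose `/N` entry is a dictionary of appearance states rather than a single stream are skipped by this sketch.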
6 changes: 3 additions & 3 deletions tests/test_images.py
@@ -442,7 +442,7 @@ def test_inline_image_extraction():
img = Image.open(BytesIO(get_data_from_url(url, name=name)))
assert image_similarity(reader.pages[0].images[0].image, img) == 1


@pytest.mark.enable_socket()
def test_extract_image_from_object(caplog):
url = "https://github.com/py-pdf/pypdf/files/15176076/B2.pdf"
@@ -455,10 +455,10 @@ def test_extract_image_from_object(caplog):
with pytest.raises(Exception):
co = reader.pages[0].get_contents()
co.decode_as_image()
- assert "does not seems to be an Image" in caplog.text
+ assert "does not seem to be an Image" in caplog.text
caplog.clear()
co.indirect_reference = "for_test"
with pytest.raises(Exception):
co = reader.pages[0].get_contents()
co.decode_as_image()
- assert "does not seems to be an Image" in caplog.text
+ assert "does not seem to be an Image" in caplog.text