Got "malloc(): unaligned tcache chunk detected Aborted (core dumped)" while using add_redact_annot/apply_redactions #3758

JiahuanChen · 2024-08-08T03:21:03Z

Description of the bug

I was trying to remove all text from PDF files. My Python script looks like the following:

import json

import fitz

document = fitz.open("input.pdf")  # placeholder path; the actual file is private

for page in document:
    info = json.loads(page.get_text('json', flags=fitz.TEXTFLAGS_TEXT))
    for block_ind, block in enumerate(info['blocks']):
        for line_ind, line in enumerate(block['lines']):
            for span_ind, span in enumerate(line['spans']):
                # print(span)
                page.add_redact_annot(fitz.Rect(*span['bbox']))
    page.apply_redactions()

This code works well, but notice the # print(span) line. If I uncomment it to print the span info, I get "malloc(): unaligned tcache chunk detected" followed by "Aborted (core dumped)".

This is really strange to me.

Do I need to upload the PDF files or other information? The files contain personal information, so to be honest I am not willing to upload them.

How to reproduce the bug

Simply commenting/uncommenting the print line reproduces the bug.

PyMuPDF version

1.24.9

Operating system

Linux

Python version

3.10


JorjMcKie · 2024-08-08T06:33:31Z

You can send me the file via mail, so it won't be exposed here.
Is this the only file showing the problem?
I am also a little confused:
Why do you extract all text at all if you want to remove it anyway? You can simply add one redaction annotation covering the full page.
But you should pass options to apply_redactions that prevent removal of images and graphics.
You currently don't, even though your text might overlap such objects.
Anyway, we cannot follow up the problem without a file at hand.

JiahuanChen · 2024-08-08T09:50:53Z

Hello, I just sent you an email with the problem file. It is the only file with the problem.

By the way, if I call apply_redactions each time after add_redact_annot, the code works well: no error, and the result is correct.

import json

import fitz

document = fitz.open("input.pdf")  # placeholder path; the actual file is private

for page in document:
    info = json.loads(page.get_text('json', flags=fitz.TEXTFLAGS_TEXT))
    for block_ind, block in enumerate(info['blocks']):
        for line_ind, line in enumerate(block['lines']):
            for span_ind, span in enumerate(line['spans']):
                print(span)
                page.add_redact_annot(fitz.Rect(*span['bbox']))
                page.apply_redactions()

JorjMcKie · 2024-08-08T19:48:47Z

Thanks for the file.
I was able to reproduce the problem, but only under Linux; it runs fine under Windows. By the way, I used the following simplified script: there is no need to make a JSON string that you immediately convert back to a Python dictionary.
Also note that there is no need to convert 4-tuples to rectangles: all PyMuPDF methods detect Python sequences where points, rectangles or matrices are expected and do the necessary conversions.

import pymupdf


doc = pymupdf.open("test.pdf")
page = doc[0]
blocks = page.get_text("dict", flags=pymupdf.TEXTFLAGS_TEXT)["blocks"]
spans = [s for b in blocks for l in b["lines"] for s in l["spans"]]
for s in spans:
    page.add_redact_annot(s["bbox"])
page.apply_redactions()
print(f"{len(spans)} annots created")
doc.ez_save("redacted.pdf")

This script runs under Windows, but gets the malloc error under Linux.

So how do you want to proceed? We will need to get the MuPDF team involved for a solution, so they would also need the reproducing file, for which I need your OK.
Of course, PyMuPDF and MuPDF are both maintained by the same company, Artifex, so confidentiality is secured in any case.

JiahuanChen · 2024-08-09T00:52:35Z

Yes sure, you could share the file with your team.

Thank you for the improved code.

wapiflapi · 2024-10-11T16:52:15Z

I'm seeing the same issue on some documents. Unfortunately I'm not able to share them.

Is there a place where we can follow the progress on this issue on MuPDF's side of things?

In the meantime, did someone find a workaround for this issue when it happens?

julian-smith-artifex-com · 2024-10-28T15:32:14Z

We have a fix for the problem in MuPDF.

I don't yet know when this will be available for use in a PyMuPDF release.


julian-smith-artifex-com added fix developed release schedule to be determined upstream bug bug outside this package labels Oct 28, 2024

julian-smith-artifex-com added a commit that referenced this issue Oct 28, 2024

tests/test_annots.py: added test_3758().

ab8ae83

For #3758. Note that the input file is not public so this test does nothing if it is not present.

julian-smith-artifex-com added a commit that referenced this issue Oct 28, 2024

tests/test_annots.py: added test_3758().

7d1dd31

For #3758. Note that the input file is not public so this test does nothing if it is not present.
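The actual test_3758() lives in tests/test_annots.py and is not reproduced here. The following is only a sketch of the pattern the commit message describes: a regression test that does nothing when its private input file is missing. INPUT_PATH and the test body are hypothetical.

```python
import os

# Hypothetical location of the non-public reproducer file.
INPUT_PATH = os.path.join("tests", "resources", "test_3758.pdf")


def test_3758():
    """Sketch: a regression test that is a no-op when its private input is absent."""
    if not os.path.exists(INPUT_PATH):
        print(f"test_3758(): not running because {INPUT_PATH!r} does not exist.")
        return
    import pymupdf  # imported here so the no-op path does not need PyMuPDF

    with pymupdf.open(INPUT_PATH) as doc:
        for page in doc:
            blocks = page.get_text("dict", flags=pymupdf.TEXTFLAGS_TEXT)["blocks"]
            spans = [s for b in blocks for l in b["lines"] for s in l["spans"]]
            for s in spans:
                page.add_redact_annot(s["bbox"])
            # Before the MuPDF fix this call could abort on Linux with
            # "malloc(): unaligned tcache chunk detected".
            page.apply_redactions()
```

Under pytest a missing file makes the test pass trivially; pytest.skip() would be an alternative if the skip should show up in the report.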

chris-liddell pushed a commit to ArtifexSoftware/mupdf that referenced this issue Oct 29, 2024

Fix reference counting error in pdf_redact_image_filter_pixels

e487b22

Fixes the issue in: pymupdf/PyMuPDF#3758
