PyMuPDF Pro cannot extract Chinese content from DOC and DOCX files #3976

maxyou2090 · 2024-10-22T03:35:33Z

Description of the bug

PyMuPDF Pro extracts only digits and English text from DOC and DOCX files. Chinese characters are silently dropped from the output.

How to reproduce the bug

Code Sample

import pymupdf.pro

pymupdf.pro.unlock()
doc = pymupdf.open("/Users/maxyou/Downloads/demo.docx")
for page in doc:
    print(page.get_text())
    break

Output

PyMuPDFPro: Restricted Mode. Please visit https://pymupdf.io/try-pro to request your trial key.
hello,,123456789

DOCX Content

hello,中文示例,123456789

DOCX File
demo.docx

PyMuPDF version

1.24.11

Operating system

macOS

Python version

3.12


julian-smith-artifex-com · 2024-10-28T12:29:18Z

Thanks for the bug report. This was fixed in PyMuPDFPro-1.24.12.

julian-smith-artifex-com closed this as completed Oct 28, 2024
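
To check whether an upgrade resolves this for a given file, one option is to compare the extracted text against the known document content and list the Chinese characters that went missing. The sketch below is only an illustration: `contains_cjk` and `missing_cjk` are hypothetical helpers, not part of PyMuPDF, and the check assumes the text falls in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). The two strings are the DOCX content and the 1.24.11 output from this report.

```python
def contains_cjk(text: str) -> bool:
    """Return True if text has at least one character in U+4E00..U+9FFF.

    Assumption: only the basic CJK Unified Ideographs block is checked;
    extension blocks and CJK punctuation are not covered.
    """
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def missing_cjk(expected: str, extracted: str) -> list[str]:
    """List CJK characters that appear in expected but not in extracted."""
    return [ch for ch in expected if contains_cjk(ch) and ch not in extracted]


# Strings copied from this report.
expected = "hello,中文示例,123456789"      # DOCX content
extracted_1_24_11 = "hello,,123456789"    # output seen with 1.24.11

# Prints the characters dropped during extraction.
print(missing_cjk(expected, extracted_1_24_11))
```

With a real document, the `extracted` argument would be the string returned by `page.get_text()` in the code sample above. An empty list suggests that no characters from the checked block were lost.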
