PyMuPDF Pro cannot extract Chinese content from DOC and DOCX files #3976

Closed
maxyou2090 opened this issue Oct 22, 2024 · 1 comment


@maxyou2090

Description of the bug

PyMuPDF Pro extracts only digits and English text from DOC and DOCX files. Chinese characters are silently dropped from the output of page.get_text().

How to reproduce the bug

Code Sample

import pymupdf.pro

pymupdf.pro.unlock()
doc = pymupdf.open("/Users/maxyou/Downloads/demo.docx")
for page in doc:
    print(page.get_text())
    break

Output

PyMuPDFPro: Restricted Mode. Please visit https://pymupdf.io/try-pro to request your trial key.
hello,,123456789

DOCX Content

hello,中文示例,123456789
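To make the gap between the document content and the extracted output explicit, here is a minimal sketch that lists which characters went missing. The helper names dropped_characters and is_cjk_ideograph are hypothetical and not part of PyMuPDF; the two strings are copied from the output and DOCX content shown in this report.

```python
def dropped_characters(expected: str, extracted: str) -> str:
    """Return the characters of `expected` that never occur in `extracted`, in order.

    Hypothetical helper for illustration only; it is not a PyMuPDF API.
    """
    seen = set(extracted)
    return "".join(ch for ch in expected if ch not in seen)


def is_cjk_ideograph(ch: str) -> bool:
    """True if `ch` lies in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


# Strings taken from this report: the DOCX content and the extracted output.
expected = "hello,中文示例,123456789"
extracted = "hello,,123456789"

missing = dropped_characters(expected, extracted)
print("dropped:", missing)
print("all dropped characters are CJK:", all(is_cjk_ideograph(ch) for ch in missing))
```

With the strings above, the dropped characters are exactly the four Chinese characters, while the ASCII letters, commas, and digits survive.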

DOCX File
demo.docx

PyMuPDF version

1.24.11

Operating system

macOS

Python version

3.12

@julian-smith-artifex-com
Collaborator

julian-smith-artifex-com commented Oct 28, 2024

Thanks for the bug report. This was fixed in PyMuPDFPro-1.24.12.
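For anyone checking whether their environment already has the fix, a rough sketch of a version check follows. It assumes the installed distribution is named "pymupdfpro" (the name used with pip); the helpers parse_version and has_fix are hypothetical, and the comparison only looks at the leading numeric parts of the version string.

```python
from importlib.metadata import PackageNotFoundError, version

# Release named in this thread as containing the fix.
FIXED_IN = (1, 24, 12)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string like '1.24.12' into (1, 24, 12), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch not in "0123456789":
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed: str) -> bool:
    """True if `installed` is at least the release that contains the fix."""
    return parse_version(installed) >= FIXED_IN


try:
    # Assumption: the distribution name is "pymupdfpro".
    installed = version("pymupdfpro")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("pymupdfpro is not installed")
elif has_fix(installed):
    print(f"pymupdfpro {installed} should include the fix")
else:
    print(f"pymupdfpro {installed} predates 1.24.12; consider upgrading")
```

The version reported in this issue, 1.24.11, falls just below the threshold, so has_fix("1.24.11") returns False.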

