I'm getting errors with PDFs encoded with latin-1. Here's an example.
The problem occurs at this line because the byte string isn't encoded as UTF-8. If I replace it with title.decode('iso-8859-1'), it works flawlessly.
I think one solution would be to extract the encoding of the info dictionary using pdfminer, but I couldn't find out how. Another possibility is to use chardet, or to try several encodings in turn and catch the exceptions.
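To illustrate the last option, here is a minimal sketch of trying several encodings in turn. The helper name `decode_with_fallback` and the default encoding list are my own assumptions, not part of this project or of pdfminer. Note that iso-8859-1 maps every byte to a character, so it never raises and effectively acts as the last resort.

```python
def decode_with_fallback(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode raw with the first encoding that succeeds.

    Hypothetical helper: the name and the default encoding order are
    assumptions made for illustration only.
    """
    if isinstance(raw, str):
        # Already text, nothing to decode.
        return raw
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if the caller passes a list without a catch-all
    # encoding; undecodable bytes are replaced instead of raising.
    return raw.decode("utf-8", errors="replace")


# A latin-1 encoded title fails as UTF-8 and falls back to iso-8859-1.
title = b"R\xe9sum\xe9"
print(decode_with_fallback(title))
```

The order matters: UTF-8 has to come first, because iso-8859-1 would also "succeed" on UTF-8 bytes and silently produce mojibake.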