TypeError: unhashable type: 'list' when processing a PDF file #1039
Hmm. According to the PDF spec:
Either pdfminer has gotten the PDF font dictionary and the font program confused, or whatever piece of software created the PDF did that, because an Encoding entry in the font dictionary can only be a name or a dictionary, whereas a Type 1 font's Encoding array looks exactly like what you've got in the log (it's full of ".notdef"). Since the log you've provided is just reporting what's in the file itself, I'm inclined to think that it's the PDF software's fault (especially since it claims that this is a TrueType font!). But of course pdfminer should be robust to these sorts of shenanigans. What software created the PDF?
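To make "robust to these sorts of shenanigans" concrete, here is a minimal sketch of how an Encoding entry could be normalized before it is used. It is not pdfminer's code: `normalize_encoding` is a hypothetical helper, plain Python values stand in for PDF objects (`str` for a name, `dict` for an encoding dictionary, `list` for the non-conforming array), and reading a bare array as a Differences list starting at code 0 is only an assumption about what the producer meant.

```python
from typing import Any, List, Optional, Tuple

DEFAULT_ENCODING = "StandardEncoding"


def normalize_encoding(encoding: Any) -> Tuple[str, Optional[List[Any]]]:
    """Return (base_encoding_name, differences) for a font's Encoding entry.

    Hypothetical helper: str = PDF name, dict = encoding dictionary,
    list = the non-conforming array seen in this issue.
    """
    if isinstance(encoding, str):
        # Conforming case 1: a predefined encoding name.
        return encoding, None
    if isinstance(encoding, dict):
        # Conforming case 2: a dictionary with optional BaseEncoding/Differences.
        return encoding.get("BaseEncoding", DEFAULT_ENCODING), encoding.get("Differences")
    if isinstance(encoding, list):
        # Non-conforming: a bare array of glyph names. ASSUMPTION: interpret it
        # as a Differences array that starts at character code 0.
        return DEFAULT_ENCODING, [0, *encoding]
    # Anything else (including a missing entry): fall back to the default.
    return DEFAULT_ENCODING, None
```

With this shape, the array from the log would be mapped to the default base encoding plus a Differences list, instead of being passed on as if it were a name.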
I am having the same issue with a similar-looking file (I also cannot provide it, for data sensitivity reasons).
TypeError: unhashable type: 'list' when processing a particular PDF file:
Sorry, I cannot provide the PDF file here as it is an internal document.
I debugged it live; the call flow is shown below (the other objids seem fine):
line: 384 in pdfminer/pdfinterp.py
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
stack value:
k = 'Font'
fontid = 'F220'
objid = 24
resources = {'Font': {'F151': PDFObjRef:192, 'F158': PDFObjRef:22, 'F165': PDFObjRef:23, 'F220': PDFObjRef:24, 'F222': PDFObjRef:25, 'F225': PDFObjRef:19, 'F229': PDFObjRef:26, 'F274': PDFObjRef:17, 'F296': PDFObjRef:27, 'F298': PDFObjRef:28, 'F318': PDFObjRef:15, 'F321': PDFObjRef:29, 'F363': PDFObjRef:30, 'F366': PDFObjRef:31, 'F373': PDFObjRef:32, 'F377': PDFObjRef:33, 'F378': PDFObjRef:34, 'F381': PDFObjRef:35, 'F97': PDFObjRef:14}, 'ProcSet': [/'PDF', /'ImageB', /'ImageC', /'Text'], 'Type': /'Resources', 'XObject': {'I100': PDFObjRef:56, 'I104': PDFObjRef:58, 'I108': PDFObjRef:60, 'I112': PDFObjRef:62, 'I116': PDFObjRef:64, 'I12': PDFObjRef:66, 'I120': PDFObjRef:68, 'I124': PDFObjRef:70, 'I128': PDFObjRef:72, 'I132': PDFObjRef:73, 'I136': PDFObjRef:75, 'I140': PDFObjRef:77, 'I144': PDFObjRef:79, 'I148': PDFObjRef:81, 'I152': PDFObjRef:83, 'I156': PDFObjRef:85, 'I16': PDFObjRef:87, 'I160': PDFObjRef:89, 'I164': PDFObjRef:91, ...}}
spec = {'BaseFont': /'3_of_9_Barcode', 'Encoding': [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', ...], 'FirstChar': 30, 'FontDescriptor': PDFObjRef:39, 'LastChar': 255, 'Subtype': /'TrueType', 'Type': /'Font', 'Widths': [750, 750, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, ...]}
==>
line: 219 in pdfminer/pdfinterp.py
font = PDFTrueTypeFont(self, spec)
==>
line: 992: pdfminer/pdffont.py
__init__(rsrcmgr, spec)
==>
line: 956: pdfminer/pdffont.py
PDFSimpleFont.__init__(self,
descriptor: Mapping[str, Any],
widths: FontWidthDict,
spec: Mapping[str, Any])
stack value:
descriptor = {'Ascent': 750, 'CapHeight': 0, 'Descent': -12, 'Flags': 42, 'FontBBox': [0, -7, 2197, 750], 'FontFile2': PDFObjRef:38, 'FontName': /'3_of_9_Barcode', 'ItalicAngle': 0, 'StemV': 0, 'Type': /'FontDescriptor'}
widths = {30: 750, 31: 750, 32: 580, 33: 580, 34: 580, 35: 580, 36: 580, 37: 580, 38: 580, 39: 580, 40: 580, 41: 580, 42: 580, 43: 580, 44: 580, 45: 580, 46: 580, 47: 580, 48: 580, 49: 580, 50: 580, 51: 580, 52: 580, 53: 580, 54: 580, 55: 580, 56: 580, 57: 580, 58: 580, 59: 580, 60: 580, 61: 580, 62: 580, 63: 580, 64: 580, 65: 580, 66: 580, 67: 580, 68: 580, 69: 580, 70: 580, 71: 580, 72: 580, 73: 580, 74: 580, 75: 580, 76: 580, 77: 580, 78: 580, 79: 580, 80: 580, 81: 580, 82: 580, 83: 580, 84: 580, 85: 580, 86: 580, 87: 580, 88: 580, ...}
spec = {'BaseFont': /'3_of_9_Barcode', 'Encoding': [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', ...], 'FirstChar': 30, 'FontDescriptor': PDFObjRef:39, 'LastChar': 255, 'Subtype': /'TrueType', 'Type': /'Font', 'Widths': [750, 750, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, ...]}
==>
line: 965: pdfminer/pdffont.py
stack value:
encoding = [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'space', /'exclam', /'universal', /'numbersign', /'existential', /'percent', /'ampersand', /'suchthat', /'parenleft', /'parenright', /'asteriskmath', /'plus', /'comma', /'minus', /'period', /'slash', /'zero', /'one', /'two', /'three', /'four', /'five', /'six', /'seven', /'eight', /'nine', /'colon', ...]
The code failed on:
self.cid2unicode = EncodingDB.get_encoding(literal_name(encoding))
The stack trace is:
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/high_level.py", line 211, in extract_pages
interpreter.process_page(page)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 997, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 1014, in render_contents
self.init_resources(resources)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 384, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 219, in get_font
font = PDFTrueTypeFont(self, spec)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdffont.py", line 1010, in __init__
data = self.fontfile.get_data()[:length1]
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdffont.py", line 969, in __init__
self.unicode_map = FileUnicodeMap()
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/encodingdb.py", line 113, in get_encoding
if diff:
TypeError: unhashable type: 'list'
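For readers without access to a failing file, the error class itself can be reproduced without pdfminer. The sketch below assumes that the Encoding value ends up being used as a dictionary key, which the traceback ending in get_encoding suggests but does not prove; `ENCODINGS`, `lookup_encoding` and `try_lookup` are hypothetical stand-ins, not pdfminer's actual names or tables.

```python
# Hypothetical stand-in for a table of encodings keyed by encoding name.
ENCODINGS = {
    "WinAnsiEncoding": {65: "A"},
    "MacRomanEncoding": {65: "A"},
}
STANDARD = {65: "A"}


def lookup_encoding(name):
    # dict.get() has to hash its key, so a list key raises TypeError
    # before the default value is ever considered.
    return ENCODINGS.get(name, STANDARD)


def try_lookup(name):
    """Return 'ok' on success, or the TypeError message on failure."""
    try:
        lookup_encoding(name)
        return "ok"
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_lookup("WinAnsiEncoding"))       # a name is hashable
    print(try_lookup([".notdef", "space"]))    # an array of names is not
```

A name, even an unknown one, falls through to the default table; only the list fails, with a message that mentions the unhashable list.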