In production at IA, probably caused by petabox downtime or a network error, I got the following exception and stack trace:
```
TypeError: sequence item 0: expected str instance, bytes found
  File "extraction_ungrobided.py", line 272, in <module>
    MRExtractUnGrobided.run()
  File "mrjob/job.py", line 424, in run
    mr_job.execute()
  File "mrjob/job.py", line 433, in execute
    self.run_mapper(self.options.step_num)
  File "mrjob/job.py", line 517, in run_mapper
    for out_key, out_value in mapper(key, value) or ():
  File "extraction_ungrobided.py", line 228, in mapper
    info, status = self.extract(info)
  File "extraction_ungrobided.py", line 143, in extract
    info['file:cdx']['c_size'])
  File "extraction_ungrobided.py", line 126, in fetch_warc_content
    gwb_record = rstore.load_resource(warc_uri, offset, c_size)
  File "wayback/resourcestore.py", line 65, in load_resource
    return create_resource(loader.load_block(bstart, blen))
  File "wayback/resource.py", line 583, in create_resource
    record, errors, offset = parser.parse(rs, 0, line)
  File "hanzo/warctools/warc.py", line 223, in parse
    % (",".join(self.KNOWN_VERSIONS)),
```
`self.KNOWN_VERSIONS` is defined with bytes values at https://github.com/internetarchive/warctools/blob/master/hanzo/warctools/warc.py#L177, but it is being joined with a `str` separator, which Python 3 rejects. One fix, though I'm not sure it would work in Python 2.7, would be:
`% (",".join([s.decode('utf-8') for s in self.KNOWN_VERSIONS])),`
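A minimal, standalone reproduction of the failure and of the decode-then-join fix is below. It assumes `KNOWN_VERSIONS` can be modelled as a tuple of bytes literals; the version values and the helper names `versions_broken`/`versions_fixed` are hypothetical stand-ins, and the real container in `hanzo/warctools/warc.py` may be a different type.

```python
# Hypothetical stand-in for WarcParser.KNOWN_VERSIONS: a tuple of bytes.
# The actual values and container type in warctools may differ.
KNOWN_VERSIONS = (b"1.0", b"0.17", b"0.18")


def versions_broken():
    # Under Python 3, str.join() refuses bytes items and raises
    # "TypeError: sequence item 0: expected str instance, bytes found".
    return ",".join(KNOWN_VERSIONS)


def versions_fixed():
    # Decode each bytes item to str first, then join with a str separator.
    return ",".join(v.decode("utf-8") for v in KNOWN_VERSIONS)


try:
    versions_broken()
except TypeError as exc:
    print("TypeError:", exc)

print(versions_fixed())
```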
There's probably a more idiomatic way, but I can submit a patch for that.
While we're at it, we might want to join on `", "` rather than `","` so the message is easier to read?
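One possibly more idiomatic variant, sketched here under the same assumption that the versions are bytes, is to join while everything is still bytes and decode once, using the `", "` separator suggested above. `format_versions` is a hypothetical helper name, not an existing warctools function. Sorting is an extra choice so the output stays stable if `KNOWN_VERSIONS` happens to be an unordered set.

```python
def format_versions(versions):
    """Hypothetical helper: render bytes version strings as one readable str."""
    # Join with a bytes separator (all items are bytes), then decode once.
    # sorted() gives a deterministic order even for an unordered set.
    return b", ".join(sorted(versions)).decode("ascii")


print(format_versions({b"1.0", b"0.17", b"0.18"}))
```

Because `b"..."` literals are plain `str` on Python 2.7, this form should also run there (returning `unicode`), though that has not been checked against warctools itself.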