test = u"\n\n\n\n\n\n\n\n\n\n\n\n\n\nHi <<First Name>>\nthis is filler text \xa325 more filler.\nadditilnal filler.\n\nyet more\xa0still more\xa0filler.\n\n\xa0\n\n\n\n\nmore\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nfiller.\x03\n\t\t\t\t\t\t almost there \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nthe end\n\n\n\n\n\n\n\n\n\n\n\n\n"
test.encode('utf8')
Out[23]: b'\n\n\n\n\n\n\n\n\n\n\n\n\n\nHi <<First Name>>\nthis is filler text \xc2\xa325 more filler.\nadditilnal filler.\n\nyet more\xc2\xa0still more\xc2\xa0filler.\n\n\xc2\xa0\n\n\n\n\nmore\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nfiller.\x03\n\t\t\t\t\t\t almost there \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nthe end\n\n\n\n\n\n\n\n\n\n\n\n\n'
cld2.detect(test.encode())
Traceback (most recent call last):
  File "/home/ivan/Documentos/scrapinghub/dev/web-rcnn-venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-24-68905466763d>", line 1, in <module>
    cld2.detect(test.encode())
  File "/home/ivan/Documentos/scrapinghub/dev/web-rcnn-venv/lib/python3.6/site-packages/cld2/__init__.py", line 396, in detect
    cld_results.bytes_found))
ValueError: input contains invalid UTF-8 around byte 158 (of -1117539408)
Some characters make the cld2 detector fail even though they are valid UTF-8. The example above is taken from mikemccand/chromium-compact-language-detector#22 (comment).
I'm using the workaround suggested in mikemccand/chromium-compact-language-detector#22 (comment).
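A minimal sketch of this kind of workaround, assuming the failure is triggered by non-printable control characters such as the `\x03` in the example. `strip_unprintable` is a hypothetical helper name, not part of cld2, and this is not necessarily the exact code from the linked comment:

```python
def strip_unprintable(text: str) -> str:
    """Drop characters that are neither printable nor whitespace.

    Hypothetical helper, not part of cld2. Control characters such as
    U+0003 are removed; newlines, tabs, and U+00A0 are kept because
    str.isspace() is true for them.
    """
    return "".join(ch for ch in text if ch.isprintable() or ch.isspace())


sample = "filler.\x03\n\t almost there \xa325\xa0more"
cleaned = strip_unprintable(sample)

# Only the control character "\x03" is removed; everything else is kept.
print(repr(cleaned))

# With cld2 installed, the cleaned text would then be passed on, e.g.:
#   cld2.detect(cleaned.encode("utf-8"))
```

Keeping whitespace rather than filtering on `isprintable()` alone avoids gluing words together across line breaks, which could otherwise affect detection. If the detector still rejects the input, a stricter filter may be needed.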