Unfortunately, the current HTML parser and tokenizer do not have great support for UTF-8 and other encodings, and they are not well tested with many languages. We started a full rewrite of the HTML parser (issue #80) to solve this and other issues, but it is not yet complete or well tested (pull request #191).
I am trying to crawl an Arabic site, but the content is being stored as question marks. English sites work fine.
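
For anyone trying to reproduce the symptom, here is a minimal sketch of one common way Arabic text ends up as question marks. It is only an illustration in Python and makes no claim about where this project actually loses the characters; the sample string and the choice of Latin-1 as the lossy charset are assumptions.

```python
# Hypothetical illustration, not this project's code path.
arabic = "مرحبا"  # sample Arabic text (assumed input)

# Encoding into a charset that has no Arabic letters, with substitution
# on failure, replaces every unencodable character with "?".
lossy = arabic.encode("latin-1", errors="replace").decode("latin-1")

# Encoding and decoding as UTF-8 round-trips the text without loss.
safe = arabic.encode("utf-8").decode("utf-8")

print(lossy)
print(safe == arabic)
```

If the stored content looks like the `lossy` value, some step between download and storage is probably converting the text through a charset that cannot represent Arabic.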