See the rawresponse class: it takes the Solr response to XML as a string parsable by etree. Note that the HTML tag removal can't happen here; it runs against the XML text blocks instead. The same is likely true of any encoding issues related to the unicode escape.
So: basic text cleanup, just enough to parse, and then the two other cleanup tasks run against the XML.
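A rough sketch of the "just enough to parse" step, for discussion. It assumes the standard library's `xml.etree.ElementTree` stands in for etree, and that the main thing blocking the parse is control characters that XML 1.0 does not allow; `parse_raw_response` is a hypothetical name, not the existing rawresponse code.

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def parse_raw_response(raw: str) -> ET.Element:
    """Hypothetical helper: apply only the cleanup needed to parse, then parse.

    HTML tag removal and encoding fixes are deliberately NOT done here;
    they would run later against the text of individual nodes.
    """
    cleaned = _ILLEGAL_XML_CHARS.sub("", raw).strip()
    return ET.fromstring(cleaned)


root = parse_raw_response("  <doc><field>a\x0bb</field></doc>\n")
print(root.find("field").text)
```

Keeping this step minimal means the text inside each node is changed as little as possible before the later cleanup tasks see it.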
Related: #3 encoding problems.
So there's a parsing pathway for the NLP pipeline (clean everything) and a pipeline to the triplestore (text from the node, untouched).
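The two pathways could look something like the following. This is only a sketch under assumptions: the function names are hypothetical, the standard library's `xml.etree.ElementTree` stands in for etree, and the regex-based tag stripping is a crude placeholder for whatever HTML removal is actually used.

```python
import html
import re
import xml.etree.ElementTree as ET

_TAG = re.compile(r"<[^>]+>")   # crude HTML tag matcher (assumption)
_WHITESPACE = re.compile(r"\s+")


def text_for_triplestore(node: ET.Element) -> str:
    """Triplestore pathway: the text from the node, untouched."""
    return node.text or ""


def text_for_nlp(node: ET.Element) -> str:
    """NLP pathway: clean everything (tags, entities, whitespace)."""
    text = _TAG.sub(" ", node.text or "")   # drop embedded HTML tags
    text = html.unescape(text)              # resolve leftover HTML entities
    return _WHITESPACE.sub(" ", text).strip()


doc = ET.fromstring(
    "<doc><abstract>&lt;p&gt;Sea &amp;amp; ice&lt;/p&gt;</abstract></doc>"
)
node = doc.find("abstract")
print(text_for_triplestore(node))
print(text_for_nlp(node))
```

Both functions read from the same parsed node, so the only difference between the pathways is how much cleanup is applied after parsing.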
Tasks: