I'm working with a corpus that primarily consists of longer documents. I'm seeking recommendations for the most effective approach to semantically tokenize them.
Examples:
Original Text: "I like the ambiance but the food was terrible."
Desired Output: ["I like the ambiance"] ["but the food was terrible."]
Original Text: "I don't know. I like the restaurant but not the food."
Desired Output: ["I don't know."] ["I like the restaurant"] ["but not the food."]
Any suggestions or advice on how to achieve this would be greatly appreciated!
We don't have anything that explicitly does what you're looking for. You could constituency-parse the sentence and take the top-level divisions, though; that might do a good job.
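To illustrate the suggestion: once you have a constituency parse in bracketed form (e.g., from Stanza or CoreNLP), you can read the top-level children of the `S` node under `ROOT` and join their leaf words. This is a minimal, dependency-free sketch; the bracketed parse string below is a hand-written example of what a parser might produce, not actual parser output:

```python
import re

def parse_tree(s):
    """Parse a bracketed constituency parse into (label, children) tuples;
    leaf words are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def helper():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(helper())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return helper()

def leaves(node):
    """Collect the leaf words under a node, in order."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [word for child in children for word in leaves(child)]

def top_level_chunks(parse_str):
    """Split a sentence at the top-level divisions of the S under ROOT."""
    root = parse_tree(parse_str)
    sentence = root[1][0]  # ROOT -> S
    return [" ".join(leaves(child)) for child in sentence[1]]

# Hand-written parse of the first example sentence (for illustration only).
parse = ("(ROOT (S (S (NP (PRP I)) (VP (VBP like) (NP (DT the) (NN ambiance))))"
         " (CC but)"
         " (S (NP (DT the) (NN food)) (VP (VBD was) (ADJP (JJ terrible))))"
         " (. .)))")
print(top_level_chunks(parse))
# -> ['I like the ambiance', 'but', 'the food was terrible', '.']
```

Note that the coordinating conjunction and final punctuation come out as their own top-level chunks; you'd likely want a small post-processing step that merges single-token non-clause chunks into the neighboring clause to match the desired output exactly.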