Add option to remove POS tagging before input to word cloud #991
Comments
Two small corrections:
Thank you for the report. I think we should discuss the best solution to this issue internally. Is there any other situation where you would like to have POS tags and then have them removed later, besides the following two:
Yes, I assume the POS tags (if present) make a difference not only in filtering but in any type of analysis (classification, clustering, network analysis, ...), but I'd like to have the choice not to show them in any type of visualization - not only Word Cloud but also, for instance, Annotated Corpus Map and even in Data Table. There, I think it also makes sense to merge different 'versions' of a word, like 'practitioner' in my screenshot above.
BTW, Annotated Corpus Map is clustering and visualization in one widget. It seems to make sense to consider the POS tags for clustering but not for the visualization.
This is a bit of a stale issue, but I gave it some thought. Word Cloud no longer shows POS tags. However, it does not merge two words with the same name into one.
I agree this could best be added to Preprocess Text. However, if you add it to POS Tagger, you have to activate POS Tagger twice: once before and once after Filtering. Perhaps it makes more sense as a final option in Filtering, where the current final option is filtering based on POS tags?
Duh, how did this not occur to me? 🤦‍♀️
Is your feature request related to a problem? Please describe.
In a workflow where I applied POS tagging (to allow selecting, for instance, just nouns and verbs), followed by Bag of Words, Distances, and Hierarchical Clustering, and then visualized the clusters in Word Cloud, the word cloud shows all words with their POS tags, and words that occur with different tags are shown multiple times:
Instead I would like to be able to see each word in Word Cloud only once, without POS tagging.
Unlike Bag of Words, widgets with similar functionality such as Document Embedding or Similarity Hashing do not produce output with POS tagging.
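To illustrate the behaviour I have in mind, here is a rough sketch in plain Python. It is not Orange code: the `word_TAG` token format with an underscore separator and the helper names `strip_pos_tag` and `merge_tagged_counts` are my assumptions for illustration only. It strips a trailing tag from each token and sums the counts of words that differ only in their tag.

```python
from collections import Counter


def strip_pos_tag(token: str, sep: str = "_") -> str:
    """Return the token without a trailing POS tag, if it appears to have one.

    Assumption: tagged tokens look like "word_TAG", where TAG is an
    upper-case tag such as NN, VBZ or PRP$.
    """
    word, found, tag = token.rpartition(sep)
    # Only treat the suffix as a tag if it is upper-case letters
    # (optionally ending in "$"), so "snake_case" is left alone.
    if found and word and tag.rstrip("$").isalpha() and tag.isupper():
        return word
    return token


def merge_tagged_counts(counts: dict, sep: str = "_") -> Counter:
    """Sum the counts of tokens that differ only in their POS tag."""
    merged = Counter()
    for token, n in counts.items():
        merged[strip_pos_tag(token, sep)] += n
    return merged


# Made-up counts, purely for illustration.
tagged = {
    "practitioner_NN": 3,
    "practitioner_NNP": 2,
    "study_VB": 4,
    "study_NN": 1,
    "data": 5,
}
merged = merge_tagged_counts(tagged)
```

With these made-up counts, 'practitioner' and 'study' each end up as a single entry. The tags would still be available for filtering or clustering earlier in the workflow; only the labels handed to the visualization are merged.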
Describe the solution you'd like
I think there are different options:
Describe alternatives you've considered
Couldn't find any