some inconsistencies in generating word clouds? #1073

Open
mw0000 opened this issue Mar 15, 2024 · 1 comment

@mw0000

mw0000 commented Mar 15, 2024

Dear Lexos team :)

I have recently started using Lexos, but I have noticed some inconsistencies in the word clouds it generates. I am attaching a file with the data; in the Scrub options, I chose to remove spaces. The system generates different word clouds from the same data (see lexos4.png and lexos5.png). The words 'woman' and 'man' are the most frequent in the text.

Did I make a mistake in preparing the data? The BubbleViz visualization works correctly.

Thanks!

words.txt
lexos4
lexos5

@scottkleinman
Contributor

Thank you for reporting this. The issue relates to our use of the d3.js library, which sometimes removes high-frequency terms if they don't fit the layout. The issue was addressed for older versions of Lexos here; however, the fix may have been dropped in the latest release. From what I can tell from the discussion in the d3 repo, the issue was never fixed in the d3 library itself, and individual users have come up with their own workarounds. We will investigate whether Lexos still has such a workaround and, if so, whether it can be improved.
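To illustrate how a layout can silently drop the most frequent words, here is a deliberately simplified Python sketch. It is not the d3 algorithm: the d3-cloud layout places words along a spiral from randomized starting points and checks collisions at the pixel level, whereas this sketch swaps in a deterministic row-by-row scan with rectangular bounding boxes and a crude width estimate (one font unit per character). The failure mode is the same: words are placed largest first, and any word with no free position, including a top word that is simply wider than the canvas, is skipped. The helper names (`layout`, `layout_with_retry`) are hypothetical, and shrinking the font until everything fits is only one possible workaround, not necessarily the one Lexos used or will use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """Axis-aligned bounding box for one word, in canvas units."""
    x: int
    y: int
    w: int
    h: int

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap unless one lies entirely left/right/above/below the other.
        return not (
            self.x + self.w <= other.x
            or other.x + other.w <= self.x
            or self.y + self.h <= other.y
            or other.y + other.h <= self.y
        )


def find_slot(w: int, h: int, width: int, height: int, placed: list) -> Optional[Box]:
    """Scan the canvas row by row for the first free spot; None if the word cannot fit."""
    for y in range(height - h + 1):
        for x in range(width - w + 1):
            box = Box(x, y, w, h)
            if not any(box.overlaps(other) for other in placed):
                return box
    return None


def layout(freqs: dict, width: int, height: int, max_font: int) -> dict:
    """Place the most frequent words first; any word with no free slot is silently dropped."""
    top = max(freqs.values())
    placed = {}
    for word, count in sorted(freqs.items(), key=lambda kv: -kv[1]):
        size = max(1, round(max_font * count / top))  # font size scales with frequency
        w, h = size * len(word), size  # crude width estimate: one font unit per character
        box = find_slot(w, h, width, height, list(placed.values()))
        if box is not None:
            placed[word] = box
    return placed


def layout_with_retry(freqs: dict, width: int, height: int, max_font: int):
    """Hypothetical workaround: shrink the maximum font size until every word is placed."""
    placed = {}
    for font in range(max_font, 0, -1):
        placed = layout(freqs, width, height, font)
        if len(placed) == len(freqs):
            return placed, font
    return placed, None  # some words do not fit even at font size 1


if __name__ == "__main__":
    freqs = {"woman": 10, "man": 8, "tree": 2}
    print(sorted(layout(freqs, 40, 20, 10)))  # ['man', 'tree']: 'woman' is too wide and dropped
    placed, font = layout_with_retry(freqs, 40, 20, 10)
    print(sorted(placed), font)  # ['man', 'tree', 'woman'] 8
```

Because the real d3 layout starts from random positions, a word that fails to fit on one run may fit on the next, which is presumably why regenerating the cloud often helps.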

In the meantime, the problem can often be solved by regenerating the word cloud (click the "Generate" button). If you are unsure whether the top terms are represented in the word cloud, you can check on the Prepare > Tokenize screen, which you can open in a new browser tab. Select the Raw and Descending radio buttons, and then click "Generate" to see the most frequent terms. If they don't match what you see in the word cloud, regenerate the word cloud until they do.
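The manual comparison described above can also be scripted outside Lexos. The following is a minimal Python sketch, assuming simple lowercasing and whitespace tokenization (Lexos's tokenizer and your Scrub settings may produce different counts) and assuming you have written down the words visible in the cloud as a set. `top_terms` and `missing_top_terms` are hypothetical helpers, not part of Lexos.

```python
from collections import Counter


def top_terms(text: str, n: int = 10) -> list:
    """Return the n most frequent whitespace-separated tokens, highest count first."""
    # Counter.most_common orders equal counts by first occurrence.
    counts = Counter(text.lower().split())
    return counts.most_common(n)


def missing_top_terms(text: str, cloud_words: set, n: int = 10) -> list:
    """List the top-n terms that do not appear among the words shown in the cloud."""
    return [term for term, _ in top_terms(text, n) if term not in cloud_words]


if __name__ == "__main__":
    sample = "woman man woman tree man woman river"
    rendered = {"tree", "river", "man"}  # hypothetical: words visible in the cloud
    print(top_terms(sample, 3))  # [('woman', 3), ('man', 2), ('tree', 1)]
    print(missing_top_terms(sample, rendered, 3))  # ['woman']
```

A non-empty result from `missing_top_terms` corresponds to the case where you would regenerate the word cloud.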
