Wordcloud
We chose option two: installing the wordcloud package with pip. Following the GitHub page https://github.com/amueller/word_cloud, the package was straightforward to integrate. We first preprocess our documents, henceforth called results. For each result data structure we take all the text elements, i.e. the query, answers, and accepted answer fields, and concatenate those strings into one big string. Not every query has an accepted answer field, so we wrap that access in a try/except. After preprocessing, we use wordcloud to generate the cloud and write it to a file. The final step is to return an HTML element for the cloud, to be included with each result.
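The steps above can be sketched roughly as follows. This is a minimal illustration, not our exact code: the field names `query`, `answers`, and `accepted_answer`, the helper names `collect_text` and `wordcloud_html`, the image size, and the assumption that `answers` is a list of strings are all hypothetical. Only `WordCloud`, `generate`, and `to_file` come from the wordcloud package.

```python
import html
from pathlib import Path


def collect_text(result: dict) -> str:
    """Concatenate the text fields of one result into a single string."""
    # Hypothetical field names; "answers" is assumed to be a list of strings.
    parts = [result["query"]]
    parts.extend(result["answers"])
    try:
        parts.append(result["accepted_answer"])
    except KeyError:
        # Not every query has an accepted answer, so the field is skipped.
        pass
    return " ".join(parts)


def wordcloud_html(result: dict, out_dir: str, name: str) -> str:
    """Write the cloud to a PNG file and return an <img> element for it."""
    # Imported here so collect_text can be used without wordcloud installed.
    from wordcloud import WordCloud

    path = Path(out_dir) / f"{name}.png"
    cloud = WordCloud(width=400, height=200).generate(collect_text(result))
    cloud.to_file(str(path))
    return f'<img src="{html.escape(str(path))}" alt="wordcloud">'
```

Calling `wordcloud_html(result, "static", "result_1")` would write `static/result_1.png` and return the matching `<img>` tag, assuming the `static` directory already exists.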
Example of a wordcloud of the collection of documents with topic 3D-Printing:
Another wordcloud with the topic Christianity:
Judging by these examples, the wordcloud appears to describe the topic accurately.
This method of making a wordcloud works very well. The only problem is that it is slow, because it creates a file for each document and then loads it. Another option would be to generate the wordcloud in JavaScript, but we only found this out after we had already written the Python file.