mercury297/Course5

Course5

Step 1

The data at the given link was scraped using BeautifulSoup4.

The required features were extracted and the data was cleaned.

This is done in the file step1.py.
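
The cleaning part of this step could look roughly like the sketch below. It assumes each review has already been pulled out of the page into a dict of raw strings (for example via BeautifulSoup's `get_text()`). The field names `title`, `rating` and `text`, the sample values, and the helper `clean_review` are hypothetical and are not taken from step1.py.

```python
import re


def clean_review(raw):
    """Normalise one scraped review (hypothetical field names)."""
    # Collapse runs of whitespace and newlines left over from the HTML layout.
    title = re.sub(r"\s+", " ", raw.get("title", "")).strip()
    text = re.sub(r"\s+", " ", raw.get("text", "")).strip()
    # Take the first number found in a rating string such as "4.0 out of 5 stars".
    match = re.search(r"\d+(?:\.\d+)?", raw.get("rating", ""))
    rating = float(match.group()) if match else None
    return {"title": title, "rating": rating, "text": text}


# Made-up sample record standing in for one scraped review.
raw_review = {
    "title": "  Great value \n",
    "rating": "4.0 out of 5 stars",
    "text": "Works well.\n\n   Would buy again. ",
}
cleaned = clean_review(raw_review)
print(cleaned)
```

Keeping the cleaning in a small function like this makes it easy to apply the same rules to every scraped record before writing the JSON file.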

Step 2

Step 1 generates a JSON file, which was imported into a MongoDB database using the following command: mongoimport --db test --collection productreviews --file data.json
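
By default mongoimport reads one JSON document per line; a file that holds a single top-level JSON array needs the `--jsonArray` flag instead. A minimal sketch of writing a file in the line-delimited form follows. The `reviews` list is made-up sample data and `write_json_lines` is a hypothetical helper, not the code in step1.py; the sketch writes to a temporary directory rather than the repository's own data.json.

```python
import json
import os
import tempfile


def write_json_lines(records, path):
    """Write one JSON document per line, the default input shape for mongoimport."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# Made-up sample records; the real fields come from the scraping step.
reviews = [
    {"title": "Great value", "rating": 4.0, "text": "Works well."},
    {"title": "Disappointed", "rating": 2.0, "text": "Stopped working."},
]

out_path = os.path.join(tempfile.mkdtemp(), "data.json")
write_json_lines(reviews, out_path)

# Read the file back to confirm each line is a standalone JSON document.
with open(out_path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()
print(len(lines), "documents written")
```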

Step 3

Text classification of the first 100 reviews by topic, using the Latent Dirichlet Allocation (LDA) algorithm.

The text is lemmatized and the stop words are removed.

The TF-IDF of the text is then computed and an LDA model is fitted, using the nltk library.

This gives the topics that might be associated with the given text.

More training would likely yield better results.
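
The preprocessing and weighting in this step can be illustrated with a small, dependency-free sketch. It covers only stop-word removal and TF-IDF; lemmatization and the LDA fit itself are left to a library and are not reproduced here. The stop-word list, the sample reviews and the function names are assumptions for illustration. The TF-IDF variant shown (raw term count times log(N/df)) is one common definition and not necessarily the one used in this repository.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "is", "a", "and", "it", "this", "was", "very"}


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Made-up sample reviews.
reviews = [
    "The battery is great and the screen is great.",
    "The battery was very bad.",
    "This screen is sharp.",
]
weights = tf_idf(reviews)
for doc_weights in weights:
    print(doc_weights)
```

Terms that occur in few reviews get higher weights than terms shared by many, which is what lets a downstream topic model focus on distinctive words.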

Step 4

Finally, sentiment analysis is done using the Afinn library.

This does not yield the best results, but I used it because of time constraints.
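
AFINN-style scoring assigns each word an integer valence and sums the valences of the words in a text. The sketch below imitates that idea with a tiny hand-made lexicon so that it runs without any dependency. It is not the Afinn library, and the word scores in `TOY_LEXICON` are illustrative. It also hints at one reason such results can be weak: a plain word sum ignores negation.

```python
# Hypothetical miniature lexicon; the real AFINN word list is far larger.
TOY_LEXICON = {
    "good": 3,
    "great": 3,
    "love": 3,
    "bad": -3,
    "terrible": -3,
    "poor": -2,
}


def score(text):
    """Sum the valence of every known word; unknown words count as 0."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return sum(TOY_LEXICON.get(w, 0) for w in words)


positive = score("Great phone, I love it!")
negative = score("Terrible battery and poor screen.")
# "Not good" still scores as positive because negation is not modelled.
negated = score("Not good at all.")
print(positive, negative, negated)
```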
