Course5

Step 1

The data from the given link was scraped using BeautifulSoup4.

The given features were extracted and the data was cleaned.

This is done in the file step1.py.
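As a rough illustration of the scrape-and-clean step, here is a minimal sketch. It is not the contents of step1.py: the HTML snippet, the class names (`review`, `author`, `rating`, `body`) and the output field names are all assumptions made up for the example, since the real page structure is not shown here.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical markup: the real page structure is not shown in this README.
SAMPLE_HTML = """
<div class="review">
  <span class="author"> Alice </span>
  <span class="rating">4</span>
  <p class="body">  Works well,   arrived on time. </p>
</div>
<div class="review">
  <span class="author">Bob</span>
  <span class="rating">2</span>
  <p class="body"></p>
</div>
"""


def extract_reviews(html):
    """Return one cleaned dict per review block that has non-empty text."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for block in soup.find_all("div", class_="review"):
        # Collapse runs of whitespace left over from the HTML layout.
        text = " ".join(block.find("p", class_="body").get_text().split())
        if not text:
            continue  # cleaning step: skip reviews with no body text
        reviews.append({
            "author": block.find("span", class_="author").get_text(strip=True),
            "rating": int(block.find("span", class_="rating").get_text(strip=True)),
            "text": text,
        })
    return reviews


reviews = extract_reviews(SAMPLE_HTML)
print(json.dumps(reviews, indent=2))
```

In a real run the HTML would come from the scraped page rather than an inline string, and the selectors would have to match that page.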

Step 2

Step 1 generates a JSON file, which was imported into a MongoDB database using the following command: mongoimport --db test --collection productreviews --file data.json
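By default, mongoimport reads one JSON document per line. The sketch below shows one way a file in that shape could be written from Python. The record fields and the helper name `write_for_mongoimport` are hypothetical, and the example writes to a temporary directory rather than the project's own data.json.

```python
import json
import os
import tempfile


def write_for_mongoimport(records, path):
    """Write one JSON document per line, the default input format of mongoimport."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical records; the real fields come from step1.py.
records = [
    {"author": "Alice", "rating": 4, "text": "Works well."},
    {"author": "Bob", "rating": 2, "text": "Stopped working."},
]

out_path = os.path.join(tempfile.mkdtemp(), "data.json")
write_for_mongoimport(records, out_path)

with open(out_path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()
print(len(lines), "documents written")
```

If the file holds a single JSON array instead of one document per line, mongoimport needs the additional `--jsonArray` flag.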

Step 3

Text classification of the first 100 reviews by topic, using the Latent Dirichlet Allocation (LDA) algorithm.

The text is lemmatized and the stop words are removed.

The TF-IDF of the text is then computed and passed to an LDA model, with the nltk library used in this step.

This yields the topics that are likely associated with the given text.

More training would likely yield better results.

Step 4

Finally, the sentiment analysis is done using the Afinn library (semantic.py).

This does not yield the best results, but I used it because of the time constraint.
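To show the idea behind word-list scoring of the AFINN kind, here is a minimal standard-library sketch. It is not the Afinn library and not the code in semantic.py: the lexicon below is a made-up toy with invented scores, and the function names are hypothetical.

```python
import re

# Tiny illustrative lexicon. These words and scores are made up for the
# sketch; the real AFINN list rates its entries from -5 to +5.
TOY_LEXICON = {
    "good": 3,
    "great": 3,
    "love": 3,
    "bad": -3,
    "terrible": -3,
    "broken": -1,
}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["Great phone, love it", "Terrible, arrived broken", "not good"]:
    score = lexicon_score(review)
    print(f"{review!r}: {score} ({label(score)})")
```

The last review illustrates one reason a plain word-list lookup falls short: "not good" is scored as positive because negation is ignored. With the afinn package, the comparable call should be `Afinn().score(text)`.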

