
Investigating "fake news" via text classification using TF-IDF vectorization in Python


ljsonnanburg/Classifying-Fake-News


This project classifies roughly 45,000 news articles as "real" or "fake" based on a pre-labeled training set, and extracts some other useful insights along the way.

The approach used TF-IDF vectorization to turn each article into a vector of word weights. A word scores highly when it is frequent in that article but rare across the corpus as a whole. A random forest classifier trained on these vectors then learned which weighted words are associated with each label, and it achieved scores above 99% on all evaluation metrics. These results may not generalize well, because the method used to collect the training data is unexplained; it appears that whoever labeled the data based the labels solely on the news source.
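For illustration, here is a minimal from-scratch sketch of the TF-IDF weighting step, written with only the standard library. It is not this project's code. It assumes the defaults of scikit-learn's `TfidfVectorizer`: lowercasing, tokens of two or more word characters, raw term counts, the smoothed idf `ln((1 + n) / (1 + df)) + 1`, and L2-normalized rows. The helper names `tokenize` and `tfidf` and the sample headlines are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer approximating scikit-learn's default TfidfVectorizer:
# lowercase the text, keep tokens of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, rows), one L2-normalized TF-IDF row per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency: terms found in fewer documents get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency within this document
        row = [counts[t] * idf[t] for t in vocab]
        # Scale each row to unit length so long and short articles are comparable.
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows


if __name__ == "__main__":
    # Hypothetical sample headlines, not drawn from the project's dataset.
    docs = [
        "Senate passes budget bill",
        "Shocking truth the media won't tell you",
        "Senate debates the budget",
    ]
    vocab, rows = tfidf(docs)
    for doc, row in zip(docs, rows):
        top = max(range(len(vocab)), key=lambda i: row[i])
        print(f"{doc!r}: highest-weighted term = {vocab[top]!r}")
```

In a full pipeline, rows like these would serve as the feature matrix for a classifier such as a random forest. Library implementations such as `TfidfVectorizer` store the matrix sparsely, because most words in the vocabulary do not appear in any given article.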
