StephenLee129/CIS419-Final-Project

Problem and Motivation

  • US public companies file reports on operational or financial changes with the Securities and Exchange Commission (SEC), each tagged with a corresponding label
  • Companies sometimes misclassify a concerning disclosure under label 8 (miscellaneous) even though a more appropriate label exists
  • Given the volume of reports, natural language processing is a practical alternative to manual review

Dataset

  • 3000 entries
  • 16 labels
  • Training set: reports that aren’t labeled as miscellaneous
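
The training-set rule above can be sketched as a simple filter. This is a minimal illustration, not the project's actual loading code: the `(label, text)` record layout, the `split_by_label` helper, and the sample reports are all hypothetical. Only the use of label 8 for miscellaneous comes from this README. The label-8 reports are set aside here on the assumption that they are the ones to be re-examined.

```python
# Label 8 is "miscellaneous" per this README; everything else here is assumed.
MISC_LABEL = 8

def split_by_label(records, misc_label=MISC_LABEL):
    """Split (label, text) records into training data and reports to review.

    Reports with a specific label form the training set; reports filed
    under the miscellaneous label are kept apart for later prediction.
    """
    training = [(label, text) for label, text in records if label != misc_label]
    to_review = [(label, text) for label, text in records if label == misc_label]
    return training, to_review

# Made-up sample records, for illustration only.
records = [
    (2, "Company completed acquisition of a subsidiary."),
    (8, "Chief financial officer resigned effective immediately."),
    (5, "Director appointed to the board."),
]
training, to_review = split_by_label(records)
print(len(training), "training reports;", len(to_review), "report(s) to review")
```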

NLP algorithm

  • Preprocess the data by removing stop words, stemming, lemmatizing, and tokenizing the input
  • Use a bag-of-words approach with inverse-frequency weighting
  • Predict using Multinomial Naive Bayes
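
The three steps above can be sketched end to end in plain Python. This is a rough sketch under several assumptions, not the project's code. The README does not say which library was used. The stop-word list is a tiny illustrative one. `crude_stem` is a toy suffix stripper standing in for the Porter stemmer and WordNet lemmatizer. The inverse-frequency weight is assumed to be a smoothed TF-IDF. The class name, the string labels, and the training sentences are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); a real run would use a full one.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "its", "is", "was", "for", "on"}

def crude_stem(token):
    """Toy suffix stripper; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

class TfidfMultinomialNB:
    """Multinomial Naive Bayes over inverse-frequency-weighted term counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts, labels):
        docs = [preprocess(t) for t in texts]
        self.n_docs = len(docs)
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + self.n_docs) / (1 + df)) + 1.0
                    for t, df in doc_freq.items()}
        self.class_docs = Counter(labels)
        self.term_weight = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for term, count in Counter(doc).items():
                self.term_weight[label][term] += count * self.idf[term]
        self.class_total = {c: sum(self.term_weight[c].values())
                            for c in self.class_docs}
        return self

    def predict(self, text):
        # Terms never seen in training are ignored.
        counts = Counter(t for t in preprocess(text) if t in self.idf)
        vocab_size = len(self.idf)
        best_label, best_score = None, -math.inf
        for label in self.class_docs:
            score = math.log(self.class_docs[label] / self.n_docs)  # log prior
            denom = self.class_total[label] + self.alpha * vocab_size
            for term, count in counts.items():
                likelihood = (self.term_weight[label][term] + self.alpha) / denom
                score += count * self.idf[term] * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training sentences and hypothetical labels, for illustration only.
train_texts = [
    "The company completed the acquisition of a subsidiary",
    "Acquisition of assets from another company was completed",
    "The chief financial officer resigned from the board",
    "A new director was appointed to the board",
]
train_labels = ["acquisition", "acquisition", "officer_change", "officer_change"]

model = TfidfMultinomialNB().fit(train_texts, train_labels)
print(model.predict("Director resigned from the board"))
```

With a library such as scikit-learn, the same idea is usually written as a vectorizer followed by a Naive Bayes classifier. The hand-rolled version is used here only to keep the sketch dependency-free.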

Prediction Result

  • 91% accuracy achieved using the Porter stemmer and the WordNet lemmatizer
