Implement Email Classification Using NLP #79

Open
darshbaxi opened this issue Mar 22, 2024 · 0 comments · May be fixed by #82 or #88
Labels
always open (Open throughout the competition), hard (Hard Level Question worth 7 points), round-2

Comments

@darshbaxi
Collaborator

Dataset to be used: Spam_classification.csv (located in the "Round-2 Dataset" folder).

Tasks:

Data Preprocessing:
Tokenization: Split the text of each email into individual words or tokens.
Normalization: Convert all text to lowercase, remove punctuation, and handle special cases (like email addresses or URLs).
Stopword Removal: Remove common words that don't carry much meaning (e.g., "the", "is", "and").
Feature Extraction: Represent each email as a numerical vector using techniques like bag-of-words, TF-IDF (Term Frequency-Inverse Document Frequency), or word embeddings.
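
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a required approach: the stopword list, the `urltoken`/`emailtoken` placeholders, the function names, and the sample emails are all assumptions made for the example, and the TF-IDF weighting shown is one common smoothed variant. In a real submission a library vectorizer (for example scikit-learn's `TfidfVectorizer`) would likely replace the hand-rolled part.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "is", "and", "a", "an", "to", "of", "in", "for", "you"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")


def preprocess(text):
    """Normalize, tokenize, and remove stopwords from one email body."""
    text = text.lower()
    # Special cases: collapse URLs and addresses into placeholder tokens.
    text = URL_RE.sub(" urltoken ", text)
    text = EMAIL_RE.sub(" emailtoken ", text)
    # Replace punctuation and any other non-alphanumeric character with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Tokenize on whitespace and drop stopwords.
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, vectors), one TF-IDF vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency, so no weight is ever zero or undefined.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty emails
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical sample emails, only to exercise the functions.
sample_docs = [
    "Win a FREE prize now!!! Visit http://spam.example.com",
    "Meeting at noon, see the agenda and email bob@example.com",
]
vocab, vectors = tfidf_vectors(sample_docs)
```

Replacing URLs and addresses with placeholder tokens, rather than deleting them, keeps the signal that an email contained a link or an address, which may be informative for spam detection.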

Model Creation:
Implement machine learning or deep learning models for email classification, and open a PR with the model that reaches the highest accuracy you can achieve.
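
As one possible baseline for the model step, here is a hedged sketch of a multinomial Naive Bayes classifier with Laplace smoothing, written with the standard library only so that it runs anywhere. The issue does not prescribe this model; the class name, the `tokenize` helper, the `"spam"`/`"ham"` label names, and the toy training data are all hypothetical, and the column layout of Spam_classification.csv is not assumed here. A library implementation or a deep learning model may well score higher on the real dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for tok in tokens:
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data; the real task uses Spam_classification.csv.
train_texts = [
    "Win a free prize now",
    "Claim your free cash reward",
    "Meeting agenda for Monday",
    "Please review the project report",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_texts = ["free prize reward", "project meeting report"]
test_labels = ["spam", "ham"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
predictions = model.predict(test_texts)
score = accuracy(test_labels, predictions)
```

On the real dataset, accuracy should be measured on a held-out split rather than on the training emails, otherwise the reported number will overstate how well the model generalizes.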

Submission format: a single Colab file containing only the best model, with accuracy as the reported metric (remove unnecessary models). Including your thought process (in comments or markdown cells) explaining why you took certain steps to increase the accuracy would give you an edge.

@darshbaxi added the hard (Hard Level Question worth 7 points), always open (Open throughout the competition), and round-2 labels on Mar 22, 2024
@Invincible1602 linked a pull request on Mar 23, 2024 that will close this issue
@Robinaditya1045 linked a pull request on Mar 23, 2024 that will close this issue