English Text Classification

This repository contains scripts for English text classification with several models: TextCNN, TextRNN, TextRCNN, and DPCNN. The code covers dataset preparation, vocabulary construction, model training, and evaluation.

Prerequisites

Python 3.x
PyTorch
NumPy
Pandas
tqdm
Gensim (for handling word embeddings)
Other dependencies listed in requirements.txt

Pre-trained Model Download

Place the following files in the directories shown:

./fasttext: wiki-news-300d-1M.vec
./glove: glove.6B.50d.txt
./GoogleNews-vectors-negative300: GoogleNews-vectors-negative300.bin
./datasets: vocab.pkl, labelled_newscatcher_dataset.csv
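As a rough illustration of what the text-format embedding files contain, the sketch below parses whitespace-separated word vectors in plain Python. The function names `parse_text_vectors` and `load_text_vectors` are hypothetical and are not taken from this repository. It assumes GloVe-style files have no header line and fastText-style `.vec` files begin with a `<count> <dim>` line. The binary GoogleNews file cannot be read this way; Gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` is the usual route for it.

```python
def parse_text_vectors(lines, limit=None):
    """Parse word vectors from an iterable of text lines (hypothetical helper).

    Accepts GloVe-style input (no header) and fastText/word2vec-style input
    whose first line is "<count> <dim>".
    """
    vectors = {}
    for line_no, line in enumerate(lines):
        parts = line.split()
        # Skip an optional "<count> <dim>" header on the first line.
        if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        # Skip blank or malformed lines.
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


def load_text_vectors(path, limit=None):
    """Open a .vec or GloVe .txt file and parse it line by line."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return parse_text_vectors(handle, limit=limit)
```

Passing `limit` keeps only the first entries of the file, which can help when a full embedding file does not fit comfortably in memory.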

Project Structure

tool.py: Utility functions for cleaning special characters and contractions.
train_eval.py: Script for training and evaluating the models.
run.py: Main script that runs the computation.
TextRNN.py: The TextRNN model from the paper "Recurrent Neural Network for Text Classification with Multi-Task Learning".
DPCNN.py: The DPCNN model proposed in the reference paper "Deep Pyramid Convolutional Neural Networks for Text Categorization"
README.md: Project documentation.
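The kind of cleaning attributed to tool.py above (special characters and contractions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the `CONTRACTIONS` map and the `clean_text` function are hypothetical, and the map is deliberately short.

```python
import re

# Hypothetical, deliberately small contraction map; a real one would be longer.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "it's": "it is",
    "i'm": "i am",
}

# One alternation pattern that matches any known contraction as a whole word.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b"
)


def clean_text(text):
    """Lowercase, expand known contractions, and strip special characters."""
    text = text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    text = _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(1)], text)
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


print(clean_text("I can't believe it's 50% off!!"))
# prints: i cannot believe it is 50 off
```

Expanding contractions before stripping punctuation matters here: once the apostrophe is gone, "can't" becomes "can t" and can no longer be matched.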

Usage

1. Data Preparation

1.1 Dataset Structure

train.csv: CSV file containing training data.
val.csv: CSV file containing validation data.
test.csv: CSV file containing test data.
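One way to produce the three files listed above from a single labelled CSV is a seeded shuffle followed by slicing. The sketch below is an assumption about how such a split could work, not a description of data_split.py: the helper names, the 80/10/10 ratio, and the seed are all hypothetical.

```python
import csv
import random


def split_rows(rows, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle rows reproducibly and return (train, val, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # a local RNG keeps the split repeatable
    n_val = int(len(rows) * val_frac)
    n_test = int(len(rows) * test_frac)
    val = rows[:n_val]
    test = rows[n_val:n_val + n_test]
    train = rows[n_val + n_test:]
    return train, val, test


def write_csv(path, header, rows):
    """Write a header line followed by the given rows."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

A typical call would read the source CSV with `csv.reader`, keep the first row as the header, pass the remaining rows to `split_rows`, and call `write_csv` once each for train.csv, val.csv, and test.csv.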

1.2 Building Vocabulary

Run the following scripts in order to split the data, preprocess it, and extract the pre-trained word vectors:

python data_split.py
python dataset_preprocessing.py
python extracting_pre-trained_word_vectors.py
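For readers unfamiliar with vocabulary construction, the sketch below shows one common approach: count tokens, keep the most frequent ones, and reserve indices for padding and unknown words. It is an illustration only. The token names `<PAD>` and `<UNK>`, the helper functions, and the dictionary layout are assumptions; the actual format of vocab.pkl in this repository is not documented here.

```python
import pickle
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"  # assumed special tokens


def build_vocab(texts, max_size=10000, min_freq=1):
    """Map the most frequent whitespace tokens to integer ids."""
    counts = Counter(tok for text in texts for tok in text.split())
    kept = [(word, c) for word, c in counts.items() if c >= min_freq]
    # Most frequent first; ties broken alphabetically for determinism.
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    vocab = {PAD: 0, UNK: 1}
    for word, _ in kept[:max_size]:
        vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, pad_size=8):
    """Convert text to a fixed-length list of ids, truncating or padding."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in text.split()][:pad_size]
    return ids + [vocab[PAD]] * (pad_size - len(ids))


def save_vocab(vocab, path="vocab.pkl"):
    """Pickle the vocabulary dictionary to disk."""
    with open(path, "wb") as handle:
        pickle.dump(vocab, handle)
```

Fixed-length id sequences like those returned by `encode` are what batch-oriented models such as TextCNN or DPCNN typically consume.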

2. Model Training

2.1 Configuration

Select the model and embedding type with command-line arguments to the train.py script:

python train.py --model TextCNN --embedding pre_trained

or

python train.py --model DPCNN --embedding pre_trained

The first command trains the TextCNN model with pre-trained word embeddings; the second does the same for DPCNN. Adjust the arguments to suit your requirements.
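The two flags used above could be parsed with `argparse` along the following lines. This is a hedged sketch rather than the script's real argument handling: `build_parser` is a hypothetical name, the model choices are simply the four models named in this README, and the `random` embedding option is an assumption added for illustration.

```python
import argparse

# Models named in this README; the real script may accept a different set.
MODELS = ["TextCNN", "TextRNN", "TextRCNN", "DPCNN"]


def build_parser():
    """Build a parser for the --model and --embedding flags."""
    parser = argparse.ArgumentParser(description="English text classification")
    parser.add_argument("--model", required=True, choices=MODELS,
                        help="which model architecture to train")
    parser.add_argument("--embedding", default="pre_trained",
                        choices=["pre_trained", "random"],  # "random" is assumed
                        help="how to initialise the embedding layer")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--model", "DPCNN", "--embedding", "pre_trained"])
print(args.model, args.embedding)
# prints: DPCNN pre_trained
```

Restricting `--model` with `choices` makes a misspelled model name fail at startup with a clear message instead of partway through training.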

Loss

[Figure: loss plot, see loss_plot.png in the repository]

License

This project is licensed under the MIT License; see the LICENSE file for details.
