Gender-Classification-ICDAR2013

This is a gender classifier that decides, from a given sample of handwritten text, whether the writer is male or female. The model won first place 🥇 in the competition held for the project, which fulfilled the classwork requirements of the neural networks course taught to computer engineering juniors at Cairo University.

Datasets & Preprocessing 💾

We initially considered handwritten samples from both the dataset collected from our class and the ICDAR2013 dataset, but the final model (which was known to be tested on a CMP23 test set) uses only the former. The "Preprocessing" folder contains the two files responsible for preprocessing (filtering and whitespace removal) and for reading the images.
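To illustrate what whitespace removal can look like, here is a minimal sketch that crops a grayscale page to the bounding box of its ink. It is not the code in the "Preprocessing" folder: the function name `crop_whitespace`, the list-of-rows image representation, and the threshold of 128 are all assumptions made for the example.

```python
def crop_whitespace(image, threshold=128):
    """Crop a grayscale image to the bounding box of its dark pixels.

    `image` is a list of equal-length rows with values from 0 (black)
    to 255 (white). Both the representation and the default threshold
    are assumptions for this sketch, not the project's actual choices.
    """
    # Rows that contain at least one dark ("ink") pixel.
    ink_rows = [r for r, row in enumerate(image)
                if any(pixel < threshold for pixel in row)]
    if not ink_rows:
        return []  # blank page: nothing to keep

    # Columns that contain at least one dark pixel.
    ink_cols = [c for c in range(len(image[0]))
                if any(row[c] < threshold for row in image)]

    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


page = [
    [255, 255, 255, 255],
    [255,   0,  30, 255],
    [255, 255,  10, 255],
    [255, 255, 255, 255],
]
cropped = crop_whitespace(page)  # keeps only the 2x2 region holding ink
```

A real pipeline would typically do the same thing on array images after filtering out noise, since a single stray dark pixel would otherwise widen the bounding box.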

This is a sample from the dataset:

Features Extracted 🤳

We considered GLCM, HoG, LBP, Fractal, COLD, and Hinge features, along with a feature chef that tried all possible combinations of them. Each of these is discussed in detail in the project's report. Only Hinge features made it to the final model. The "Features" and "Combined_Features" folders contain the relevant code.
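Trying all possible combinations of six feature types means scoring every non-empty subset, which is 63 candidates. The sketch below shows one way such a search could be organized. It is not the project's feature chef: the names `feature_subsets` and `best_subset` are hypothetical, and `score` stands in for whatever validation metric is used.

```python
from itertools import combinations

FEATURES = ["GLCM", "HoG", "LBP", "Fractal", "COLD", "Hinge"]


def feature_subsets(names):
    """Yield every non-empty subset of `names`, smallest subsets first."""
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            yield subset


def best_subset(names, score):
    """Return the subset with the highest score.

    `score` is a caller-supplied function (hypothetical here) that maps
    a tuple of feature names to a number, e.g. validation accuracy of a
    model trained on the concatenated feature vectors.
    """
    return max(feature_subsets(names), key=score)


all_subsets = list(feature_subsets(FEATURES))  # 2**6 - 1 candidates
```

Because the number of subsets doubles with every added feature type, an exhaustive search like this stays practical only for a small feature pool such as this one.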

This is an example of feature visualization (fractal features):

Models Considered 🕹️

We considered NN, CNN, Random Forest, SVM, XGBoost, and AdaBoost. Because both accuracy and performance mattered for the project (along with other constraints), only SVM made it to the final model. You can read more on that in the project's report. The "Deep Learning" and "Models" folders include the relevant models.

Running the Project 🚀

If you are a developer, you can navigate to the corresponding model, feature extractor, or preprocessing module and run it directly. Otherwise, to test the final model, run "evaluate.py" in the "Submissions" folder with the test data in the "test" folder and the labels in "groundtruth.txt". When it finishes, you will find the model results and the time taken in the "out" folder. The "test" and "out" folders, along with "evaluate.py" and "groundtruth.txt", are all located in the "Submissions" folder.
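For readers who want to check the reported numbers themselves, the sketch below shows how accuracy and per-sample time could be computed from predictions and ground-truth labels. It does not reproduce "evaluate.py": the helper names are hypothetical, and the one-integer-label-per-line format is an assumption about "groundtruth.txt" rather than something documented here.

```python
import time


def parse_labels(text):
    """Parse one integer label per non-empty line (assumed file format)."""
    return [int(line) for line in text.splitlines() if line.strip()]


def accuracy(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("prediction and label counts differ")
    if not expected:
        raise ValueError("no labels to score")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def timed_predictions(predict, samples):
    """Run `predict` on each sample; return (predictions, seconds per sample)."""
    predictions, durations = [], []
    for sample in samples:
        start = time.perf_counter()
        predictions.append(predict(sample))
        durations.append(time.perf_counter() - start)
    return predictions, durations


# Toy usage with a stand-in classifier (not the project's SVM).
labels = parse_labels("1\n0\n0\n1\n")
preds, times = timed_predictions(lambda s: len(s) % 2, ["a", "bb", "ccc", "d"])
score = accuracy(preds, labels)
```

Timing each sample separately, rather than the whole run, makes it easy to spot individual images that take unusually long to process.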