Building on prior work that classifies names based on the sequence of characters, we create a model that capitalizes on the sequence of sounds to classify names.
To capture the phonetic similarity between names, we first produce sound encodings of the names using Metaphone (https://pypi.org/project/Metaphone/#contents) and then train an LSTM on those encodings to test classification accuracy. We find that the accuracy is substantially lower than what we achieve when we apply an LSTM directly to the name strings. This suggests that spellings carry some information beyond the sound and, very plausibly, that sound-encoding algorithms do not fully capture how a name is pronounced.
In the future, we plan to ensemble the two models.
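
As a rough illustration of the encoding step described above, the sketch below turns phonetic codes into fixed-length integer sequences of the kind an LSTM can consume. It is not the code used in the notebooks: the function names, the padding scheme, and the `max_len` default are all assumptions made for illustration, and the phonetic encoder is passed in as a plain callable so the example stays self-contained.

```python
from typing import Callable, Dict, List, Tuple

PAD = 0  # index reserved for padding (assumed convention)
UNK = 1  # index reserved for symbols missing from the vocabulary


def build_vocab(codes: List[str]) -> Dict[str, int]:
    """Map every symbol seen in the phonetic codes to an integer >= 2."""
    symbols = sorted({ch for code in codes for ch in code})
    return {ch: i + 2 for i, ch in enumerate(symbols)}


def to_sequence(code: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert one phonetic code to a list of exactly max_len indices."""
    ids = [vocab.get(ch, UNK) for ch in code[:max_len]]  # truncate long codes
    return ids + [PAD] * (max_len - len(ids))            # right-pad short codes


def encode_names(
    names: List[str],
    encoder: Callable[[str], str],
    max_len: int = 10,
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Apply a phonetic encoder to each name, then index and pad the codes."""
    codes = [encoder(name) for name in names]
    vocab = build_vocab(codes)
    return [to_sequence(code, vocab, max_len) for code in codes], vocab
```

With the Metaphone package installed, one plausible choice of encoder is `lambda name: doublemetaphone(name)[0]`, after `from metaphone import doublemetaphone`. The resulting sequences would then be fed to an embedding layer followed by an LSTM.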
- Download FL Voter Data
- Prepared FL Voter Data
- LSTM Model Based on Metaphone Encoding
- Comparison of Ethnicolr, Soundnames, and a Naive Average of the Two Models
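
A naive average of two models can be sketched as follows, assuming each model returns a dictionary of class probabilities over the same set of labels. The function names and the labels in the usage note are hypothetical, and this is only one simple way to combine the two models.

```python
from typing import Dict


def naive_average(probs_a: Dict[str, float], probs_b: Dict[str, float]) -> Dict[str, float]:
    """Average two models' class probabilities label by label."""
    if probs_a.keys() != probs_b.keys():
        raise ValueError("both models must score the same set of labels")
    return {label: (probs_a[label] + probs_b[label]) / 2 for label in probs_a}


def predict(probs: Dict[str, float]) -> str:
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)
```

For example, `predict(naive_average(char_probs, sound_probs))` would give the averaged prediction for one name, where `char_probs` and `sound_probs` stand for the outputs of the character-based and sound-based models.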
Suriyan Laohaprapanon and Gaurav Sood