Sound Names: Predict Race and Ethnicity Based on the Sequence of Sounds

appeler/sound_names

Sound Names: Classify Names Using the Sequence of Sounds

Building on prior work that classifies names based on the sequence of characters, we create a model that capitalizes on the sequence of sounds to classify names.

To capture the phonetic similarity of different names, we first produce sound encodings of names using Metaphone (https://pypi.org/project/Metaphone/#contents) and then train an LSTM on those encodings to test classification accuracy. We find that the accuracy is substantially lower than what we achieve when we apply an LSTM directly to the name strings. This suggests that spellings carry some information beyond the sound and, very plausibly, that sound encoding algorithms do not fully capture how a name is read.
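The encoding step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: it assumes the PyPI `Metaphone` package exposes `doublemetaphone`, which returns a (primary, secondary) pair of codes, and the helpers `metaphone_code`, `build_vocab`, and `to_padded_ids` are hypothetical names introduced here. The reserved ids and the padding scheme are also assumptions.

```python
from typing import Dict, List

PAD_ID = 0      # assumed id for padding positions
UNKNOWN_ID = 1  # assumed id for symbols not seen when building the vocabulary


def metaphone_code(name: str) -> str:
    """Return the primary Double Metaphone code for a name.

    The import is done lazily so the rest of this sketch runs even when
    the Metaphone package is not installed.
    """
    from metaphone import doublemetaphone  # PyPI package: Metaphone
    primary, _secondary = doublemetaphone(name)
    return primary


def build_vocab(codes: List[str]) -> Dict[str, int]:
    """Map every symbol that appears in the codes to an integer id (starting at 2)."""
    symbols = sorted({ch for code in codes for ch in code})
    return {ch: i + 2 for i, ch in enumerate(symbols)}


def to_padded_ids(codes: List[str], vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Turn each code into a fixed-length list of ids, truncating or right-padding."""
    rows = []
    for code in codes:
        ids = [vocab.get(ch, UNKNOWN_ID) for ch in code[:max_len]]
        rows.append(ids + [PAD_ID] * (max_len - len(ids)))
    return rows


if __name__ == "__main__":
    # Illustrative code strings; in practice these would come from
    # metaphone_code(name) applied to each name in the data.
    sample_codes = ["SM0", "JNS"]
    vocab = build_vocab(sample_codes)
    print(to_padded_ids(sample_codes, vocab, max_len=5))
```

The fixed-length integer rows are the kind of input an embedding layer followed by an LSTM typically expects; the layer sizes and training setup are left out because the text above does not specify them.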

In the future, we plan to ensemble the two models.

Scripts

  1. Download FL Voter Data
  2. Prepare FL Voter Data
  3. LSTM Model Based on Metaphone Encoding
  4. Comparison of Ethnicolr, Soundnames, and a Naive Average of the Two Models
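
One way to read the naive average in the last script is as an unweighted mean of the class probabilities produced by the two models. The sketch below assumes each model returns a dictionary of class probabilities for a name; the function names, the class labels, and the numbers are all made up for illustration and are not taken from the repository.

```python
from typing import Dict


def average_probs(p_a: Dict[str, float], p_b: Dict[str, float]) -> Dict[str, float]:
    """Unweighted mean of two probability dictionaries.

    A class missing from one model's output is treated as probability 0.0,
    which is an assumption of this sketch.
    """
    labels = set(p_a) | set(p_b)
    return {label: (p_a.get(label, 0.0) + p_b.get(label, 0.0)) / 2 for label in labels}


def predict(probs: Dict[str, float]) -> str:
    """Return the class with the highest averaged probability."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # Hypothetical outputs for one name from a character-based model
    # and a sound-based model.
    char_model = {"class_a": 0.6, "class_b": 0.4}
    sound_model = {"class_a": 0.2, "class_b": 0.8}
    averaged = average_probs(char_model, sound_model)
    print(averaged, predict(averaged))
```

Equal weights are the simplest choice; a weighted average or a model trained on both sets of predictions would be natural alternatives for the planned ensemble.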

Authors

Suriyan Laohaprapanon and Gaurav Sood
