A small English dictionary (approximately 7,000 words) is broken down into units (letters, phones, or syllables), and a Markov process over those units is used to generate new words. When phones are the unit, a rudimentary spelling algorithm converts the generated phone sequences into English-pronounceable spellings.
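To illustrate what a rudimentary phone-to-spelling step can look like, the sketch below gives each ARPAbet phone (the phone set CMUDict uses, with stress digits on vowels) one fixed spelling and concatenates them. The `PHONE_SPELLINGS` table and the `spell` function are hypothetical and are not the project's algorithm. Using a single spelling per phone is exactly why output such as "kat" for K AE1 T appears.

```python
import re

# Hypothetical table: one fixed spelling for each of the 39 ARPAbet phones in CMUDict.
# A better speller would choose spellings based on neighbouring phones.
PHONE_SPELLINGS = {
    # vowels
    "AA": "o", "AE": "a", "AH": "u", "AO": "aw", "AW": "ow", "AY": "i",
    "EH": "e", "ER": "er", "EY": "ay", "IH": "i", "IY": "ee", "OW": "o",
    "OY": "oy", "UH": "u", "UW": "oo",
    # consonants
    "B": "b", "CH": "ch", "D": "d", "DH": "th", "F": "f", "G": "g",
    "HH": "h", "JH": "j", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ng", "P": "p", "R": "r", "S": "s", "SH": "sh", "T": "t",
    "TH": "th", "V": "v", "W": "w", "Y": "y", "Z": "z", "ZH": "zh",
}


def strip_stress(phone):
    """CMUDict marks vowel stress with a trailing digit (e.g. 'AE1'); drop it."""
    return re.sub(r"\d$", "", phone)


def spell(phones):
    """Spell a phone sequence by concatenating one fixed spelling per phone."""
    return "".join(PHONE_SPELLINGS[strip_stress(p)] for p in phones)


print(spell(["S", "IH1", "T"]))  # sit
print(spell(["K", "AE1", "T"]))  # kat, not "cat"
```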
$ git clone https://github.com/GeorgeNagel/randolang
$ cd randolang
# Create the Python virtual environment
$ virtualenv venv
# Activate it and install the project requirements
$ source venv/bin/activate
$ pip install -r requirements.txt
# Run the tests
$ venv/bin/nosetests
There are several methods for creating new words. Three of them use a Markov process over different building blocks: letters, phones, or syllables. A fourth method, tuples, creates new words by randomly combining existing words.
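Combining existing words can be pictured as drawing a few words from the dictionary and concatenating them. This is a minimal sketch, assuming plain concatenation and drawing with replacement; `combine_words` is a hypothetical name, and the project may choose or join words differently.

```python
import random


def combine_words(words, n, rng=random):
    """Join n randomly chosen dictionary words into one new word.

    Words are drawn with replacement, so the same word may appear twice.
    """
    return "".join(rng.choice(words) for _ in range(n))


dictionary = ["sun", "flower", "bed", "rock"]
print(combine_words(dictionary, 2, random.Random(0)))
```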
Usage: venv/bin/python generate_words.py <method> <number of words> <order>
where <order> is the order of the Markov process, i.e. how many preceding units are used to choose the next one. When using the 'words' method, <order> is the number of words to combine.
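To make the role of <order> concrete, here is a minimal sketch of an order-n Markov word generator over arbitrary units (letters, phones, or syllables). It is an illustration under assumptions, not the project's implementation: the `train` and `generate` functions, the `^`/`$` padding markers, and the `max_units` cap are all hypothetical.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical markers for word boundaries


def train(tokenized_words, order):
    """Record which unit follows each run of `order` units.

    Each word is a list of units. Successors are stored with repeats, so
    sampling from the list is weighted by how often each transition occurs.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    transitions = defaultdict(list)
    for units in tokenized_words:
        padded = [START] * order + list(units) + [END]
        for i in range(order, len(padded)):
            context = tuple(padded[i - order:i])
            transitions[context].append(padded[i])
    return transitions


def generate(transitions, order, rng=random, max_units=20):
    """Sample one new word, unit by unit, until END or max_units is reached."""
    context = [START] * order
    units = []
    while len(units) < max_units:
        nxt = rng.choice(transitions[tuple(context)])
        if nxt == END:
            break
        units.append(nxt)
        context = context[1:] + [nxt]  # slide the window forward by one unit
    return units


letters = train([list("banana"), list("bandana")], order=2)
print("".join(generate(letters, 2)))

syllables = train([["sub", "jec", "tion"], ["ob", "jec", "tive"]], order=1)
print("".join(generate(syllables, 1)))
```

A higher order makes each choice depend on more context, so output stays closer to real dictionary words; a lower order produces more novel (and less English-looking) words.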
$ venv/bin/python generate_words.py syllables 100 2
Number of words: 100. Method: syllables. Order: 2
Generating words...
New word: subjectionable
New word: stipulative
...
New word: preparagus
Done
Words are also saved in a .csv file in data/saved_words.
You can check whether domain names built from these new words are available. Usage: venv/bin/python check_domains.py <method> <tld> <skip_checked>
When <skip_checked> is '0', domains that were already checked according to the relevant .csv file are checked again.
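The skip logic can be sketched as a filter over the saved .csv file. This sketch assumes a hypothetical layout with `word` and `availability` columns, where an empty `availability` means the domain has not been checked yet; the project's actual columns and file handling may differ, and the availability lookup itself is not shown.

```python
import csv
import io


def domains_to_check(csv_file, tld, skip_checked):
    """Return the domains that still need an availability lookup.

    Assumes (hypothetically) 'word' and 'availability' columns; a row whose
    'availability' is empty has not been checked yet.
    """
    domains = []
    for row in csv.DictReader(csv_file):
        already_checked = bool(row.get("availability"))
        if skip_checked and already_checked:
            continue  # skip domains that already have a recorded result
        domains.append(row["word"] + tld)
    return domains


saved = "word,availability\nemony,unavailable\ninabilious,\n"
print(domains_to_check(io.StringIO(saved), ".com", skip_checked=True))
print(domains_to_check(io.StringIO(saved), ".com", skip_checked=False))
```

A command-line wrapper might convert the argument with `skip_checked = (arg != "0")`, matching the behaviour described for '0'.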
$ venv/bin/python check_domains.py syllables .com 1
Domain: emony.com. Availability: unavailable
Domain: inabilious.com. Availability: available
...
Domain: perfectionately.com. Availability: available
This updates the relevant .csv file in data/saved_words.
- Improve spelling when using the 'phones' method. Spelling new words from their phones is hard; it is hard even for existing English words.
- Allow dictionaries in other languages as inputs, so that generated words can sound, for example, more like Latin or more like German.
This project uses a few freely-available sources.
- mhyph (http://www.gutenberg.org/ebooks/3204) for a list of syllabified English words.
- CMUDict via nltk (http://www.nltk.org/_modules/nltk/corpus/reader/cmudict.html) for a list of pronunciations of English words.
- Jane Austen's Emma via nltk (http://www.gutenberg.org/ebooks/158) for a reasonable list of common English words.