Polyglot is an extensible, configurable experiment-automation framework for researching the performance of Recurrent Neural Network (RNN) language models.
It was built for our Bachelor of Science in Software Engineering final project, "Improving the performance of RNN-based LSTM language models using concurrent Machine Learning techniques", implemented with TensorFlow for Python.
In addition to the basic task of predicting the next word of a sentence, these are the additional tasks we trained the language model on to improve its performance:
- Part of Speech (POS) - Predicting what part of speech the next word should be (verb, noun, adjective etc.)
- Generated Classifier - Given the original dataset and a dataset generated by the language model itself, the classifier should detect which sentences are generated.
- Same as the Generated Classifier, but this time using an unlearning softmax function as the loss function.
- Write your own.
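To make the tasks above more concrete, here is a minimal, framework-free sketch of how training targets for some of them could be derived from tokenized sentences. It assumes sentences are already tokenized and, for the POS task, already tagged (for example with `nltk.pos_tag`). The function names are hypothetical and are not part of Polyglot's code.

```python
import random
from typing import List, Tuple


def next_word_pairs(tokens: List[str]) -> List[Tuple[str, str]]:
    """Basic language-model task: pair each word with the word that follows it."""
    return list(zip(tokens[:-1], tokens[1:]))


def next_pos_pairs(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """POS task: pair each word with the part-of-speech tag of the *next* word."""
    words = [word for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    return list(zip(words[:-1], tags[1:]))


def generated_classifier_dataset(
    original: List[str], generated: List[str], seed: int = 0
) -> List[Tuple[str, int]]:
    """Generated Classifier task: label original sentences 0 and generated ones 1, then shuffle."""
    labeled = [(s, 0) for s in original] + [(s, 1) for s in generated]
    random.Random(seed).shuffle(labeled)  # deterministic shuffle for reproducible experiments
    return labeled


if __name__ == "__main__":
    tagged = [("the", "DT"), ("cat", "NN"), ("sat", "VBD")]
    print(next_word_pairs([w for w, _ in tagged]))  # [('the', 'cat'), ('cat', 'sat')]
    print(next_pos_pairs(tagged))                   # [('the', 'NN'), ('cat', 'VBD')]
```

Note that the POS targets are the tags shifted by one position, exactly like the next-word targets, so both tasks can share the same input sequence.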
The tasks can be combined using the following training methods:
- Multitask Learning - training the same language model on different tasks concurrently.
- Transfer Learning - training the same language model on different tasks sequentially.
- Write your own.
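The difference between the two training methods is mainly the order in which task batches reach the shared model. The sketch below illustrates only that scheduling, assuming each task provides a list of batches. Round-robin alternation is one common way to approximate concurrent training (another is summing the task losses into a single objective). `train_step` stands in for a real TensorFlow training step and, like the other names here, is hypothetical.

```python
from itertools import zip_longest
from typing import Any, Callable, Dict, List, Sequence, Tuple

_MISSING = object()  # sentinel marking that a shorter task has run out of batches


def multitask_schedule(tasks: Dict[str, List[Any]]) -> List[Tuple[str, Any]]:
    """Multitask learning (one possible form): alternate batches of all tasks round-robin."""
    schedule = []
    for round_batches in zip_longest(*tasks.values(), fillvalue=_MISSING):
        for name, batch in zip(tasks.keys(), round_batches):
            if batch is not _MISSING:
                schedule.append((name, batch))
    return schedule


def transfer_schedule(tasks: Dict[str, List[Any]], order: Sequence[str]) -> List[Tuple[str, Any]]:
    """Transfer learning: finish all batches of one task before moving on to the next."""
    return [(name, batch) for name in order for batch in tasks[name]]


def run(schedule: List[Tuple[str, Any]], train_step: Callable[[str, Any], None]) -> None:
    """Feed the schedule to a (hypothetical) per-task training step that updates the shared model."""
    for task_name, batch in schedule:
        train_step(task_name, batch)


if __name__ == "__main__":
    tasks = {"lm": [1, 2, 3], "pos": ["a", "b"]}
    # [('lm', 1), ('pos', 'a'), ('lm', 2), ('pos', 'b'), ('lm', 3)]
    print(multitask_schedule(tasks))
    # [('pos', 'a'), ('pos', 'b'), ('lm', 1), ('lm', 2), ('lm', 3)]
    print(transfer_schedule(tasks, order=["pos", "lm"]))
```

Because both schedules feed the same `run` loop, switching between multitask and transfer learning only changes how the schedule is built, not the training step itself.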
- Install Docker: https://docs.docker.com/install/
- Install TensorFlow version 1.12: https://www.tensorflow.org/install
- Install the Python dependencies:
```shell
pip install -r requirements.txt
```
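Then open a Python console and download the required NLTK data:
```python
import nltk

nltk.download('punkt')                       # Punkt sentence tokenizer models
nltk.download('averaged_perceptron_tagger')  # POS tagger used by nltk.pos_tag
```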
- Build the image (note the dot at the end):
```shell
docker build -t lstm_fast .
```
- Run the image:
```shell
docker run lstm_fast
```
Make sure your current working directory is the project's root folder, then run:
```shell
python main.py
```
We assume the following:
- Your vocabulary file fits in memory (the train, test and validation datasets can be of any size).
- The number of batches of each dataset {train, test, validation} must be entered in hyperparameters.json, because tf.data.Dataset feeds your data in mini-batches without knowing how much data there is in total.
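Since the batch counts have to be computed up front, a small helper script can do it. The sketch below assumes one example per line in each dataset file and one token per line in the vocabulary file, and the `num_{split}_batches` key names are hypothetical; check the actual keys your hyperparameters.json uses. Datasets are streamed line by line, so only the vocabulary is ever held in memory, matching the assumptions above.

```python
import json
import math
from pathlib import Path
from typing import Dict


def load_vocab(path: Path) -> Dict[str, int]:
    """Load the whole vocabulary into memory (assumed format: one token per line)."""
    with path.open(encoding="utf-8") as f:
        return {token: idx for idx, token in enumerate(line.rstrip("\n") for line in f)}


def count_examples(path: Path) -> int:
    """Stream a dataset file (assumed: one example per line) so it never has to fit in memory."""
    with path.open(encoding="utf-8") as f:
        return sum(1 for _ in f)


def num_batches(num_examples: int, batch_size: int) -> int:
    """Batches per epoch when batching without drop_remainder: the last batch may be partial."""
    return math.ceil(num_examples / batch_size)


def write_num_batches(hparams_path: Path, splits: Dict[str, Path], batch_size: int) -> dict:
    """Store per-split batch counts in hyperparameters.json (key names are hypothetical)."""
    hparams = json.loads(hparams_path.read_text(encoding="utf-8"))
    for split, data_path in splits.items():
        hparams[f"num_{split}_batches"] = num_batches(count_examples(data_path), batch_size)
    hparams_path.write_text(json.dumps(hparams, indent=2), encoding="utf-8")
    return hparams
```

For example, `write_num_batches(Path("hyperparameters.json"), {"train": Path("data/train.txt")}, batch_size=64)` (paths hypothetical) would add a `num_train_batches` entry. If your pipeline calls `Dataset.batch(..., drop_remainder=True)`, use `num_examples // batch_size` instead, since the partial last batch is discarded.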
- Open experiment_config.json in PyCharm.
- Give the mapping any name you wish.
- Choose the schema version "JSON Schema Version 7".