Members:
- Leiko Ravelo
- Paolo Valdez
- Darwin Bautista
- Google slides
- LaTeX project is in Slack chat
Preferably, work in a virtual environment (Python 3). See this guide for installing virtual environments.
I also thought it would be a good idea to use Jupyter notebooks for this.
Packages (so far):
- TensorFlow
- Keras
- Jupyter
- IPython
pip install keras tensorflow jupyter ipython
The following command will open a browser with the Jupyter environment:
jupyter notebook
It's also possible to run a Jupyter notebook remotely. See running a notebook server. (Can access through VPN w/ Opera)
Add here some resources you think might be useful for the project.
- Hyperparameter optimization: hyperopt
- Save and Load Keras Models: link
- Stanford CS224n NLP with DL: link
- Hyperparameter optimization guide: link
- Machine Translation Best Practices "mini guide": link
Keras Tutorials:
Add here relevant research papers
Once we have the dataset available, curate it and format it properly (tab-delimited text file).
Convert the text data from the file into an acceptable representation. We may also need to remove punctuation (commas, periods; this must be settled early on). There's one-hot representation, which is pretty easy to implement. word2vec and GloVe are also available but may take some time to implement.
Relevant Resources:
- Chris Albon Tutorials
- One-hot representation tutorial
- Word embeddings tutorial
- Pretrained GloVe representation
- fastText
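The one-hot route mentioned above is easy to sketch in plain Python. This is a minimal version, assuming whitespace tokenization and reserving index 0 for unknown words — a convention I'm assuming here, not something we've settled:

```python
def build_vocab(sentences):
    """Map each distinct word to an integer index, starting at 1
    (index 0 is reserved for unknown/padding)."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def one_hot(word, vocab):
    """Return the one-hot vector for `word`; unknown words map to index 0."""
    vec = [0] * (len(vocab) + 1)
    vec[vocab.get(word, 0)] = 1
    return vec
```

In practice Keras has utilities for this, but the idea is the same: a vocabulary index plus a vector that is all zeros except at that index.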
The objective of the project is to create an optimal working machine translator. Define objectives and a success metric. There's the BLEU score, but let's wait for sir's definition. The standard RNN architecture ("Vanilla Model") used for machine translation can be seen in the Stanford lectures.
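If we do end up using BLEU: it's built on modified (clipped) n-gram precision. A minimal sketch of that core idea — not full BLEU (no brevity penalty, no geometric mean over n-gram orders), and assuming whitespace tokenization:

```python
from collections import Counter

def modified_ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision, the core component of BLEU (sketch only)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Clip each candidate n-gram count by its count in the reference,
    # so repeating a word can't inflate the score.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return matched / total if total else 0.0
```

For example, the candidate "the the the" against reference "the cat" scores only 1/3 at the unigram level because the count of "the" is clipped to 1.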
I think it's also important to decouple training and actual translation. We need a way to save the model and load it in a different Python file. -Leiko
Relevant Resources:
The task is to identify which hyperparameters are important and find generally accepted ranges for them. One solution is to do a random search through the hyperparameter space to find the model that gives optimal results.
I have found hyperopt, although I'm not sure how it works yet.
Possible hyperparameters include:
- Learning rate
- Gradient Descent Optimizer (SGD, Adam, RMSprop)
- Minibatch size
- Epochs
- RNN layers (1-4)
- Choice of RNN cell (vanilla RNN, GRU, LSTM)
- Misc (attention model, bidirectional LSTM, deep LSTM)
I'm not sure yet which of the above are important. -Leiko
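The random-search idea can be sketched like this. The candidate values below are illustrative placeholders, not agreed ranges, and `evaluate` stands in for whatever train-and-score routine we end up writing:

```python
import random

# Placeholder search space mirroring the list above; values are NOT settled.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "optimizer": ["sgd", "adam", "rmsprop"],
    "batch_size": [32, 64, 128],
    "epochs": [10, 20, 50],
    "rnn_layers": [1, 2, 3, 4],
    "cell": ["rnn", "gru", "lstm"],
}

def sample_config(space, rng=random):
    """Draw one random configuration from the search space."""
    return {name: rng.choice(choices) for name, choices in space.items()}

def random_search(evaluate, space, trials=20, rng=random):
    """Run `trials` random draws; return (best_score, best_config).
    `evaluate` maps a config dict to a score (higher is better)."""
    best_score, best_config = float("-inf"), None
    for _ in range(trials):
        config = sample_config(space, rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config
```

hyperopt does something smarter than uniform sampling (tree-structured Parzen estimators), but plain random search like this is a reasonable baseline and trivial to parallelize.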