
An example of how to run the code is in main.py. The environment for the code can be installed with conda using environment.yml.

To test the different policies, use the values 0, 1, and 2, which are listed below.

0. The standard risk-seeking policy, which uses the top alpha% of equations with reward-based gradients.
1. The unbiased risk-seeking policy, which uses the top alpha% of equations with a reward-based gradient.
2. A linear risk-seeking policy, which uses the top alpha% of equations with reward-based gradients; however, alpha changes each epoch according to the linear function given in the paper.
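The top-alpha% idea and the per-epoch alpha schedule can be illustrated with a minimal, framework-free sketch. It assumes the common quantile-baseline form of the risk-seeking gradient: keep only the samples whose reward is at or above the top-alpha threshold and weight each by (reward - threshold). All function names, alpha expressed as a fraction (0.05 for 5%), and the start/end values of the linear schedule are hypothetical and not taken from policies.py or the paper. The unbiased variant (policy 1) is not shown because its exact form is not described here. Log-probabilities are plain floats for clarity; in real training they would be autograd tensors so the surrogate loss can be backpropagated.

```python
import math
from typing import List, Tuple


def top_alpha_weights(rewards: List[float], alpha: float) -> Tuple[float, List[float]]:
    """Hypothetical helper: return the top-alpha reward threshold and
    per-sample weights. Samples at or above the threshold get weight
    (reward - threshold); all others get 0."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    n = len(rewards)
    k = max(1, math.ceil(alpha * n))  # how many top samples to keep
    threshold = sorted(rewards, reverse=True)[k - 1]  # k-th largest reward
    weights = [r - threshold if r >= threshold else 0.0 for r in rewards]
    return threshold, weights


def risk_seeking_loss(rewards: List[float], log_probs: List[float], alpha: float) -> float:
    """Hypothetical surrogate loss (to be minimized) whose gradient with
    respect to the policy parameters is the risk-seeking policy gradient,
    averaged over the kept top-alpha samples."""
    threshold, weights = top_alpha_weights(rewards, alpha)
    kept = [i for i, r in enumerate(rewards) if r >= threshold]
    return -sum(weights[i] * log_probs[i] for i in kept) / len(kept)


def linear_alpha(epoch: int, n_epochs: int, alpha_start: float, alpha_end: float) -> float:
    """Hypothetical linear schedule: interpolate alpha from alpha_start at
    epoch 0 to alpha_end at the final epoch (not the paper's exact function)."""
    if n_epochs <= 1:
        return alpha_end
    frac = epoch / (n_epochs - 1)
    return alpha_start + (alpha_end - alpha_start) * frac


if __name__ == "__main__":
    rewards = [0.1, 0.9, 0.5, 0.7]
    log_probs = [-1.0, -2.0, -1.5, -0.5]
    for epoch in range(3):
        alpha = linear_alpha(epoch, 3, 0.5, 0.25)
        print(epoch, alpha, risk_seeking_loss(rewards, log_probs, alpha))
```

With alpha fixed (policy 0), the same alpha is passed every epoch; the linear variant (policy 2) simply recomputes alpha from the epoch index before calling the loss.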

.idea
.ipynb_checkpoints
functions
.gitignore
BTS_Transformer_model.py
Exploration.ipynb
Prob_ML_Project-1-1.pdf
README.md
environment.yml
expression_tree.py
main.py
policies.py
requirements.txt
trainer.py
unit_tests.py
