This project implements several flat reinforcement learning algorithms on StarCraft II mini-games, based on the PySC2 library.
- Install PyTorch based on your own configuration: PyTorch Installation Link
- Install all packages in `requirements.txt`: `pip install -r requirements.txt`
- main.py: instantiates the Agent/Algorithm and an `SC2Env`, then runs the training or evaluation
- Algorithms: algorithms/agents to learn and test
- Networks: neural networks
- utils: some helpers
- NotRelated: some other files that are not related to the project
In the project:
- An `obs` is a dict with three keys: `minimap`, `screen` and `non_spatial`.
  - There are two kinds of `obs`, holding NumPy arrays or PyTorch tensors, marked with the `_np` and `_ts` suffixes respectively.
  - `obs` is used only inside the agent/algorithm; the states returned from `SC2Env` are called `state`.
- An `action` is a dict with three keys: `function_id`, `coordinate1` and `coordinate2`.
  - There are two kinds of `action`, holding NumPy arrays or PyTorch tensors, marked with the `_np` and `_ts` suffixes respectively.
  - For now, in all Actor-Critic algorithms, the critic network only predicts the logits, without a softmax layer; the softmax is applied in the `optimize()` function. The `logits` have the same structure as `action`, and since `action` and `logits` are currently identical in the project, we keep only the name `action`.
  - `action` (same as `logits`) is used only inside the agent/algorithm; the actions that interact with `SC2Env` are called `function_call`.
- The `non_spatial` entry in `obs`, packaged from the state, only contains `available_actions` for now.
- The predicted arguments only contain `screen`, `screen2` and `minimap` for now. In our experience, no function takes more than two of these arguments.
- We assume the height and the width of the minimap and the screen are the same.
- Models and checkpoints are saved in `save_path/model_name/token/MODELS_AND_INFORMATION`, where the token identifies a single model or checkpoint.
- For on-line algorithms that use trajectories instead of transitions, we define finishing one target as a trajectory and finishing the episode as an epoch.
- A `log_prob(s)` is a dict with three keys: `function_id`, `coordinate1` and `coordinate2`.
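
The relationship between `logits`, `action` and `log_prob(s)` described above can be illustrated with a small sketch. It is not the project's code. It uses plain Python lists instead of PyTorch tensors so it runs without dependencies. It also assumes that each coordinate is a flattened grid index, and that the joint log-probability is the plain sum over the three keys. The helper names `log_softmax` and `sample_with_log_prob` and the toy sizes are hypothetical.

```python
import math
import random

# The three keys shared by `action`, `logits` and `log_prob(s)`.
KEYS = ("function_id", "coordinate1", "coordinate2")


def log_softmax(logits):
    """Numerically stable log-softmax over a flat list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sample_with_log_prob(logits, rng):
    """Sample one index per key; return (action, log_prob) dicts.

    The softmax is applied here, outside the network, mirroring the note
    that the network itself only outputs raw logits.
    """
    action, log_prob = {}, {}
    for key in KEYS:
        log_p = log_softmax(logits[key])
        probs = [math.exp(lp) for lp in log_p]
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        action[key] = idx
        log_prob[key] = log_p[idx]
    return action, log_prob


# Toy logits (assumed sizes): 3 candidate functions, a 2x2 grid flattened to 4 cells.
logits = {
    "function_id": [2.0, 0.5, -1.0],
    "coordinate1": [0.0, 0.0, 0.0, 0.0],
    "coordinate2": [1.0, 1.0, -2.0, 0.0],
}

rng = random.Random(0)
action, log_prob = sample_with_log_prob(logits, rng)

# Assumption: the three heads are treated as independent, so log-probs add.
joint_log_prob = sum(log_prob[key] for key in KEYS)
print(action, joint_log_prob)
```

In a real agent the sum would likely skip coordinates that the chosen function does not use, and the `function_id` logits would be masked by `available_actions`. Both steps are left out here for brevity.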