Some Flat RL Algorithms on StarCraft II Mini-Games

This project implements several flat reinforcement learning algorithms for StarCraft II mini-games, built on the PySC2 library.

Installation

Install PyTorch for your own configuration (see the official PyTorch installation instructions).
Install all packages listed in requirements.txt: pip install -r requirements.txt

Project Structure

main.py: instantiates the Agent/Algorithm and an SC2Env, then runs training or evaluation
Algorithms: the algorithms/agents to train and test
Networks: the neural networks
utils: helper functions
NotRelated: other files that are not related to the project

Notes (TEMP):

In the project:

An obs is a dict with three keys: minimap, screen and non_spatial.
- There are two kinds of obs, with NumPy array and PyTorch tensor data types, marked by the _np and _ts suffixes respectively.
- obs is used only inside the agent/algorithm itself; the states returned from SC2Env are called state.
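As an illustration of how an obs_np dict might be packaged from a state, here is a minimal NumPy sketch. The function name state_to_obs_np, the PySC2-style keys feature_minimap, feature_screen and available_actions on the input mapping, and the multi-hot encoding of available_actions are all assumptions; the notes only state that non_spatial currently carries available_actions.

```python
import numpy as np


def state_to_obs_np(observation, num_functions):
    """Pack one SC2Env observation into an obs_np dict (hypothetical helper).

    `observation` is assumed to be a mapping with PySC2-style keys
    "feature_minimap", "feature_screen" and "available_actions".
    """
    # Spatial layers kept as float32 arrays of shape (channels, H, W).
    minimap = np.asarray(observation["feature_minimap"], dtype=np.float32)
    screen = np.asarray(observation["feature_screen"], dtype=np.float32)

    # non_spatial holds only available_actions, encoded here (by assumption)
    # as a multi-hot mask over all function ids.
    available = np.zeros(num_functions, dtype=np.float32)
    available[np.asarray(observation["available_actions"], dtype=np.int64)] = 1.0

    return {"minimap": minimap, "screen": screen, "non_spatial": available}


# Usage with a fake observation (shapes and ids are made up for illustration).
fake_observation = {
    "feature_minimap": np.zeros((7, 64, 64)),
    "feature_screen": np.zeros((17, 64, 64)),
    "available_actions": [0, 1, 7],
}
obs_np = state_to_obs_np(fake_observation, num_functions=10)
```

An obs_ts counterpart could then be built by applying torch.from_numpy to each value of obs_np.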
An action is a dict with three keys: function_id, coordinate1 and coordinate2.
- There are two kinds of action, with NumPy array and PyTorch tensor data types, marked by the _np and _ts suffixes respectively.
- For now, in all Actor-Critic algorithms the critic network predicts only the logits, without a softmax layer; the softmax is applied in the optimize() function instead. The logits have the same structure as action, and for now action and logits are exactly the same in the project, so only action is kept.
- action (same as logits) is used only inside the agent/algorithm itself; the actions sent to SC2Env are called function_call.
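Because the softmax is applied in optimize() rather than inside the network, the step from logits to probabilities might look like the NumPy sketch below. The helper names, masking the function_id logits with an available_actions mask, and treating each coordinate as logits over a flattened H*W grid are assumptions for illustration, not the project's actual optimize() code.

```python
import numpy as np


def masked_softmax(logits, mask=None):
    """Numerically stable softmax over the last axis (hypothetical helper).

    Entries where mask == 0 get probability 0. At least one entry is
    assumed to be unmasked; otherwise the result would be NaN.
    """
    logits = np.asarray(logits, dtype=np.float64)
    if mask is not None:
        logits = np.where(np.asarray(mask) > 0, logits, -np.inf)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def action_logits_to_probs(action, available_actions_mask):
    """Turn an action (logits) dict into per-key probability vectors."""
    return {
        # Unavailable function ids are masked out before normalising.
        "function_id": masked_softmax(action["function_id"], available_actions_mask),
        # Coordinates are assumed to be logits over a flattened H*W grid.
        "coordinate1": masked_softmax(action["coordinate1"]),
        "coordinate2": masked_softmax(action["coordinate2"]),
    }
```

Keeping the network output as raw logits also lets the same tensor feed a log-softmax when computing log-probabilities, which is numerically safer than taking the log of a softmax.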
The non_spatial entry of obs, packaged from the state, currently contains only available_actions.
The predicted arguments currently include only screen, screen2 and minimap. In our experience, no function takes more than two of them.
We assume the height and the width of the minimap and the screen are the same.
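The limits above (at most two spatial arguments, square screen and minimap) let an action be mapped onto a function_call with just two flat coordinate indices. The sketch below shows one way to do it. FunctionCall is a local stand-in with the same two fields as pysc2.lib.actions.FunctionCall. The helper name, passing the function's argument type names in as arg_names, row-major flat indices, and filling non-spatial arguments such as queued with 0 are all assumptions.

```python
from collections import namedtuple

# Local stand-in with the same fields as pysc2.lib.actions.FunctionCall.
FunctionCall = namedtuple("FunctionCall", ["function", "arguments"])

SPATIAL_ARGS = ("screen", "screen2", "minimap")


def action_to_function_call(action_np, arg_names, size):
    """Map an action_np dict onto a FunctionCall (hypothetical helper).

    arg_names: argument type names of the chosen function, e.g.
      ["queued", "screen"]; in PySC2 these would come from the function's
      argument spec, and are simply passed in here.
    size: side length of the square screen/minimap.
    coordinate1/coordinate2 are assumed to be row-major flat indices
    in [0, size * size).
    """
    coordinates = [int(action_np["coordinate1"]), int(action_np["coordinate2"])]
    arguments = []
    next_coord = 0
    for name in arg_names:
        if name in SPATIAL_ARGS:
            # No function uses more than two spatial arguments,
            # so two predicted coordinates are always enough.
            flat = coordinates[next_coord]
            next_coord += 1
            y, x = divmod(flat, size)
            arguments.append([x, y])  # spatial arguments given as [x, y]
        else:
            # Non-spatial arguments (e.g. queued) are not predicted; use 0.
            arguments.append([0])
    return FunctionCall(int(action_np["function_id"]), arguments)
```

With a real PySC2 install, the stand-in would be replaced by the library's own FunctionCall.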
Models and checkpoints are saved in save_path/model_name/token/MODELS_AND_INFORMATION, where the token identifies a single model or checkpoint.
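A minimal sketch of building that directory layout with pathlib follows. The helper name checkpoint_dir and the timestamp used as a default token are assumptions. The notes only say that a token identifies a single model or checkpoint, and the files written inside the directory are left to the caller.

```python
from datetime import datetime
from pathlib import Path


def checkpoint_dir(save_path, model_name, token=None):
    """Return save_path/model_name/token, creating it if needed (hypothetical).

    If no token is given, a timestamp is used as an assumed default.
    """
    if token is None:
        token = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = Path(save_path) / model_name / token
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Passing an explicit token reproduces the path of an existing model or checkpoint, e.g. when resuming or evaluating.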
For some on-line algorithms that use trajectories instead of transitions, completing one target is defined as a trajectory, and completing the episode as an epoch.
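The trajectory/epoch split can be sketched as cutting one episode into trajectories at each completed target. Which signal marks a finished target is game-specific, so it is passed in as a predicate; using a positive reward in the usage line is only an assumed example. Keeping the steps after the last completed target as a final partial trajectory is also a design assumption.

```python
def split_into_trajectories(episode, target_done):
    """Split one episode (one epoch) into trajectories (hypothetical helper).

    episode: list of transitions collected until the episode ends.
    target_done: predicate returning True when a transition finishes a target.
    """
    trajectories, current = [], []
    for transition in episode:
        current.append(transition)
        if target_done(transition):
            trajectories.append(current)
            current = []
    if current:
        # Steps after the last completed target form a partial trajectory.
        trajectories.append(current)
    return trajectories


# Assumed example: a positive reward marks a completed target.
episode = [{"reward": 0}, {"reward": 1}, {"reward": 0}, {"reward": 0}, {"reward": 1}, {"reward": 0}]
trajectories = split_into_trajectories(episode, lambda t: t["reward"] > 0)
```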
A log_prob(s) is a dict with three keys: function_id, coordinate1 and coordinate2.
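One plausible way to use such a log_prob dict is to sum it into the log-probability of the whole action. Only the coordinates the chosen function actually uses are included, which is 0, 1 or 2 per the notes. This factorisation and the helper below are assumptions for illustration, not necessarily how the project combines the entries.

```python
def joint_log_prob(log_probs, num_spatial_args):
    """Combine a log_probs dict into log pi(a|s) for one action (hypothetical).

    log_probs: log-probabilities of the sampled function_id, coordinate1
      and coordinate2.
    num_spatial_args: how many spatial arguments the chosen function takes.
    """
    total = float(log_probs["function_id"])
    # Coordinates the function does not use contribute nothing.
    for key in ("coordinate1", "coordinate2")[:num_spatial_args]:
        total += float(log_probs[key])
    return total
```

Skipping unused coordinates keeps the policy gradient from pushing on coordinate outputs that had no effect on the game.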

mahaozhe/SCII_RL

Folders and files

Latest commit

History

Repository files navigation

Some Flat RL Algorithms on StartCraft II Mini-Games

Installation

Project Structure

Notes (TEMP):

TODO

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages