To initialize a learner node, run `bash run_learner_node.sh $(learner_node_idx)` and write the corresponding learner configuration.
The learner configuration is a Python dict specifying which policy each GPU on each learner node trains. For example, if you have 2 learner nodes and want GPUs 0 and 1 of node 0 to train policy 0, and GPU 2 of node 0 plus GPU 0 of node 1 to train policy 1, the configuration is:
```yaml
learner_config:
  "0":
    "0": 0
    "1": 0
    "2": 1
  "1":
    "0": 1
```
To initialize a worker node, run `bash run_worker_node.sh $(worker_node_idx)`.
The number of worker nodes is specified by `seg_addrs`. For example, if you have 2 learner nodes and 2 worker nodes, `seg_addrs` should look like:
```yaml
seg_addrs:
  - - learner0_addr:port0
    - learner0_addr:port1
  - - learner1_addr:port0
    - learner1_addr:port1
```
so that every worker can communicate with all learners. `len(seg_addrs)` equals the number of learner nodes, and `len(seg_addrs[0])` equals the number of worker nodes.
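To make the indexing concrete, here is an illustrative snippet (not the repository's actual communication code) showing which endpoints a given worker would use, one address per learner:

```python
# Illustrative only: how a worker with index worker_idx picks its
# endpoint on every learner from seg_addrs.
seg_addrs = [
    ["learner0_addr:port0", "learner0_addr:port1"],
    ["learner1_addr:port0", "learner1_addr:port1"],
]

worker_idx = 1  # e.g. the second worker node
my_endpoints = [addrs[worker_idx] for addrs in seg_addrs]
print(my_endpoints)  # ['learner0_addr:port1', 'learner1_addr:port1']
```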
Specify node numbers (for ssh), container names, and learner/worker names in the config, then run `python run.py` in this directory (run this outside of the container, with Python 2). Note that `update.sh` updates your code before running; you can write your own `update.sh` and uncomment the corresponding code in `run.py` to auto-update before each run.
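The exact commented-out code in `run.py` is not reproduced here, but a hypothetical auto-update step could look like the sketch below, where the node names are placeholders:

```python
# Hypothetical sketch of an auto-update hook: run update.sh on each
# ssh-reachable node before launching. The actual commented-out code
# in run.py may look different.
import subprocess

nodes = ["node0", "node1"]  # placeholder node names from the config
for node in nodes:
    # Run update.sh remotely so every node has the latest code.
    subprocess.check_call(["ssh", node, "bash", "update.sh"])
```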
A monitor process runs on every container. The head monitor gathers system information (RX, TX, CPU utilization) from the other monitors every `interval` seconds and prints it to the log. If you want a graph of this information after a run, uncomment the `m.summary()` line in `run.py`.
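For reference, the kind of sampling loop a monitor performs might look like the sketch below; `psutil` is an assumption here, and the repository's `monitor/script.py` may collect these numbers differently:

```python
# Rough sketch of per-interval RX/TX/CPU sampling; psutil is an
# assumption, not necessarily what monitor/script.py uses.
import time
import psutil

interval = 5  # seconds between samples, mirroring the `interval` setting

prev = psutil.net_io_counters()
while True:
    time.sleep(interval)
    cur = psutil.net_io_counters()
    rx = cur.bytes_recv - prev.bytes_recv  # bytes received this interval
    tx = cur.bytes_sent - prev.bytes_sent  # bytes sent this interval
    cpu = psutil.cpu_percent()             # CPU utilization since last call
    print("RX=%dB TX=%dB CPU=%.1f%%" % (rx, tx, cpu))
    prev = cur
```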
You can modify `monitor/script.py` and `run.py` to suit your own preferences.