This part of the codebase is for training opponent-model-based multi-agent reinforcement learning policies for UAV swarms.
This will run the baseline experiment. Adjust the number of workers to match the number of logical CPU cores on your machine, but keep the total number of simulated environments close to that in the original command (a short sketch for choosing these values follows the command):
CUDA_VISIBLE_DEVICES=0 python -m swarm_rl.train --env=dogfight_multi --train_for_env_steps=160000000 --algo=APPO --use_rnn=False --rnn_type=lstm --num_heads=4 --attention_size=32 --num_workers=18 --num_envs_per_worker=2 --learning_rate=0.0003 --ppo_clip_value=5.0 --recurrence=1 --nonlinearity=tanh --actor_critic_share_weights=False --policy_initialization=xavier_uniform --adaptive_stddev=False --with_vtrace=False --max_policy_lag=100000000 --hidden_size=128 --encoder_custom=quad_multi_encoder --with_pbt=False --quads_neighbor_hidden_size=128 --quads_obstacle_hidden_size=128 --gae_lambda=1.00 --max_grad_norm=5.0 --exploration_loss_coeff=0.0 --rollout=128 --batch_size=1024 --quads_neighbor_encoder_type=attention --replay_buffer_sample_prob=0.75 --anneal_collision_steps=30000000 --experiment=dogfight4v4_v10 --num_good_agents=4 --num_adversaries=4 --num_landmarks=4 --num_neighbors_obs=3 --num_oppo_obs=4 --num_obstacle_obs=4 --use_spectral_norm=True --quads_num_agents=8 --seed=1 --oppo_model_ally=True --local_time_attention=False --global_time_attention=True --scenario_name=4v4/ShootMissile/Selfplay3_altitude_noheading2 --intention_model=False
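The command above uses 18 workers with 2 environments per worker (36 simulated environments in total). Below is a minimal Python sketch for picking comparable values on your own machine; the heuristic of one worker per logical core is an assumption for illustration, not a requirement of the trainer.

import os

# Logical CPU cores available on this machine (os.cpu_count() can return None).
logical_cores = os.cpu_count() or 1

# The original command uses 18 workers x 2 envs per worker = 36 simulated environments.
target_total_envs = 18 * 2

# One worker per logical core, then pick envs-per-worker so the total number of
# simulated environments stays close to the original 36.
num_workers = logical_cores
num_envs_per_worker = max(1, round(target_total_envs / num_workers))
print(f"--num_workers={num_workers} --num_envs_per_worker={num_envs_per_worker}")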
This will run the experiment with the intention model enabled (--intention_model=True). Adjust the number of workers to match the number of logical CPU cores on your machine, but keep the total number of simulated environments close to that in the original command:
CUDA_VISIBLE_DEVICES=0 python -m swarm_rl.train --env=dogfight_multi --train_for_env_steps=160000000 --algo=APPO --use_rnn=False --rnn_type=lstm --num_heads=4 --attention_size=32 --num_workers=18 --num_envs_per_worker=2 --learning_rate=0.0003 --ppo_clip_value=5.0 --recurrence=1 --nonlinearity=tanh --actor_critic_share_weights=False --policy_initialization=xavier_uniform --adaptive_stddev=False --with_vtrace=False --max_policy_lag=100000000 --hidden_size=128 --encoder_custom=quad_multi_encoder --with_pbt=False --quads_neighbor_hidden_size=128 --quads_obstacle_hidden_size=128 --gae_lambda=1.00 --max_grad_norm=5.0 --exploration_loss_coeff=0.0 --rollout=128 --batch_size=1024 --quads_episode_duration=110.0 --quads_collision_reward=5.0 --quads_neighbor_encoder_type=attention --replay_buffer_sample_prob=0.75 --anneal_collision_steps=30000000 --experiment=dogfight4v4_v10 --num_good_agents=4 --num_adversaries=4 --num_landmarks=4 --num_neighbors_obs=3 --num_oppo_obs=4 --num_obstacle_obs=4 --use_spectral_norm=True --quads_num_agents=8 --seed=1 --oppo_model_ally=True --local_time_attention=False --global_time_attention=True --scenario_name=4v4/ShootMissile/Selfplay3_altitude_noheading2 --intention_model=True
This will run the baseline experiment. Adjust the number of workers to match the number of logical CPU cores on your machine, but keep the total number of simulated environments close to that in the original command:
CUDA_VISIBLE_DEVICES=0 python -m swarm_rl.train --env=dogfight_multi --train_for_env_steps=160000000 --algo=APPO --use_rnn=False --rnn_type=lstm --num_heads=4 --attention_size=32 --num_workers=18 --num_envs_per_worker=2 --learning_rate=0.0003 --ppo_clip_value=5.0 --recurrence=1 --nonlinearity=tanh --actor_critic_share_weights=False --policy_initialization=xavier_uniform --adaptive_stddev=False --with_vtrace=False --max_policy_lag=100000000 --hidden_size=128 --encoder_custom=quad_multi_encoder --with_pbt=False --quads_neighbor_hidden_size=128 --quads_obstacle_hidden_size=128 --gae_lambda=1.00 --max_grad_norm=5.0 --exploration_loss_coeff=0.0 --rollout=128 --batch_size=1024 --quads_episode_duration=110.0 --quads_neighbor_encoder_type=attention --replay_buffer_sample_prob=0.75 --anneal_collision_steps=30000000 --experiment=dogfight4v4_v10 --num_good_agents=4 --num_adversaries=4 --num_landmarks=4 --num_neighbors_obs=3 --num_oppo_obs=4 --num_obstacle_obs=4 --use_spectral_norm=True --quads_num_agents=8 --seed=1 --oppo_model_ally=True --local_time_attention=False --global_time_attention=True --scenario_name=4v4/ShootMissile/Selfplay3_altitude_noheading2 --intention_model=False
We use an MSE loss together with the reparameterization trick to indirectly minimize the difference between the prior and posterior distributions while also minimizing the trajectory prediction error.
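The snippet below is an illustrative sketch of this idea, not the exact code in this repository: the decoder and the Gaussian parameters (mu, logvar) of the prior and posterior are hypothetical placeholders. Latents sampled from both distributions are decoded and penalized with an MSE against the ground-truth trajectory, so the shared target indirectly pulls the prior toward the posterior without an explicit KL term.

import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I), so sampling stays differentiable.
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def trajectory_prediction_loss(prior_mu, prior_logvar, post_mu, post_logvar, decoder, target_traj):
    # Draw one latent sample from the prior and one from the posterior.
    z_prior = reparameterize(prior_mu, prior_logvar)
    z_post = reparameterize(post_mu, post_logvar)
    # Decode both latents into predicted trajectories and apply an MSE against
    # the ground-truth trajectory; sharing the same target indirectly aligns
    # the prior with the posterior.
    loss_post = F.mse_loss(decoder(z_post), target_traj)
    loss_prior = F.mse_loss(decoder(z_prior), target_traj)
    return loss_post + loss_prior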
Env_steps = num_workers * num_envs_per_worker * num_agents * max_length * episodes = 160M
Real_Train_Env_steps = Env_steps / num_agents = 160M / 8 = 20M
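As a quick sanity check of this bookkeeping (using the flag values from the commands above):

# Flag values taken from the training commands above.
train_for_env_steps = 160_000_000  # --train_for_env_steps
num_agents = 8                     # --quads_num_agents

# Env steps are counted per agent, so the per-environment ("real") budget is
# the total divided by the number of agents.
real_train_env_steps = train_for_env_steps // num_agents
print(real_train_env_steps)  # 20000000, i.e. 20M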
If you use this repository in your work or otherwise wish to cite it, please cite our paper.
GitHub issues and pull requests are welcome.