Below is an example of a complete configuration file. The following sections will explain each part of the configuration file in more detail.
# Specify task environment
env: saferl.aerospace.tasks.docking.task.DockingEnv
# Task environment configuration
env_config:
# Define the name of the agent platform
agent: deputy
# Define simulation step size
step_size: 1
# Define a list of environment platforms
env_objs:
# First environment platform
- name: chief # Platform name
class: saferl.aerospace.models.cwhspacecraft.platforms.CWHSpacecraft2d # Platform class
config: # Platform dependent configuration
init: # Platform initialization parameters
initializer: saferl.environment.tasks.initializers.RandBoundsInitializer
x: 0
x_dot: 0
y: 0
y_dot: 0
- name: deputy # Platform name
class: saferl.aerospace.models.cwhspacecraft.platforms.CWHSpacecraft2d # Platform class
config: # Platform dependent configuration
controller:
class: saferl.environment.models.platforms.AgentController
actuators:
- name: thrust_x
space: discrete
points: 11
bounds: [-1, 1]
- name: thrust_y
space: discrete
points: 11
bounds: [-1, 1]
init: # Platform initialization parameters
initializer: saferl.aerospace.tasks.docking.initializers.ConstrainedDeputyPolarInitializer
ref: chief
radius: [100, 150]
angle: [0, 6.283185307179586]
- name: docking_region # Platform name
class: saferl.environment.models.geometry.RelativeCircle # Platform class
config: # Platform dependent configuration
ref: chief
x_offset: 0
y_offset: 0
radius: 0.5
init: # Platform initialization parameters
initializer: saferl.environment.tasks.initializers.RandBoundsInitializer
status: # List of status processors
- name: in_docking # Processor name
class: saferl.aerospace.tasks.docking.processors.InDockingStatusProcessor # Processor class
config: # Processor-dependent configuration
deputy: deputy
docking_region: docking_region
- name: docking_distance # Processor name
class: saferl.aerospace.tasks.docking.processors.DockingDistanceStatusProcessor # Processor class
config: # Processor-dependent configuration
deputy: deputy
docking_region: docking_region
- name: delta_v # Processor name
class: saferl.aerospace.tasks.docking.processors.DockingThrustDeltaVStatusProcessor # Processor class
config: # Processor-dependent configuration
target: deputy
- name: custom_metrics.delta_v_total # Processor name
class: saferl.aerospace.tasks.docking.processors.AccumulatorStatusProcessor # Processor class
config: # Processor-dependent configuration
status: delta_v
- name: max_vel_limit # Processor name
class: saferl.aerospace.tasks.docking.processors.DockingVelocityLimit # Processor class
config: # Processor-dependent configuration
target: deputy
dist_status: docking_distance
vel_threshold: 0.2
threshold_dist: 0.5
slope: 2
- name: max_vel_violation # Processor name
class: saferl.aerospace.tasks.docking.processors.DockingVelocityLimitViolation # Processor class
config: # Processor-dependent configuration
target: deputy
ref: chief
vel_limit_status: max_vel_limit
- name: max_vel_constraint # Processor name
class: saferl.aerospace.tasks.docking.processors.RelativeVelocityConstraint # Processor class
config: # Processor-dependent configuration
target: deputy
ref: chief
vel_limit_status: max_vel_limit
- name: failure # Processor name
class: saferl.aerospace.tasks.docking.processors.FailureStatusProcessor # Processor class
config: # Processor-dependent configuration
docking_distance: docking_distance
max_goal_distance: 40000
timeout: 2000
in_docking_status: in_docking
max_vel_constraint_status: max_vel_constraint
- name: success # Processor name
class: saferl.aerospace.tasks.docking.processors.SuccessStatusProcessor # Processor class
config: # Processor-dependent configuration
in_docking_status: in_docking
max_vel_constraint_status: max_vel_constraint
observation: # List of observation processors
- name: observation_processor # Processor name
class: saferl.aerospace.tasks.docking.processors.DockingObservationProcessor # Processor class
config: # Processor-dependent configuration
deputy: deputy
mode: 2d
reward: # List of reward processors
- name: time_reward # Processor name
class: saferl.aerospace.tasks.docking.processors.TimeRewardProcessor # Processor class
config: # Processor-dependent configuration
reward: -0.001
- name: dist_change_reward # Processor name
class: saferl.aerospace.tasks.docking.processors.DistanceChangeRewardProcessor # Processor class
config: # Processor-dependent configuration
deputy: deputy
docking_region: docking_region
reward: -1.0e-02
- name: delta_v # Processor name
class: saferl.environment.tasks.processor.reward.ProportionalRewardProcessor # Processor class
config: # Processor-dependent configuration
scale: 0
bias: 0
proportion_status: delta_v
- name: max_vel_constraint # Processor name
class: saferl.environment.tasks.processor.reward.ProportionalRewardProcessor # Processor class
config: # Processor-dependent configuration
scale: -0.001
bias: -0.01
proportion_status: max_vel_violation
cond_status: max_vel_constraint
cond_status_invert: True
- name: failure_reward # Processor name
class: saferl.aerospace.tasks.docking.processors.FailureRewardProcessor # Processor class
config: # Processor-dependent configuration
failure_status: failure
reward:
crash: -1
distance: -1
timeout: -5
- name: success_reward # Processor name
class: saferl.aerospace.tasks.docking.processors.SuccessRewardProcessor # Processor class
config: # Processor-dependent configuration
reward: 5
success_status: success
verbose: false # Training verbosity
env: saferl.aerospace.tasks.docking.task.DockingEnv
The env
key defines the Gym environment used during training.
Every env
object expects an env_config
configuration.
Specify the Gym environment you wish to use with the import path
of the environment class.
env_config:
The env_config key points to the custom environment configuration
for the environment specified by env
. All environments which
inherit from the SafeRL BaseEnv
class expect the follow keys
in the custom env_config
:
-
agent:
a string listing the name of the environment object which will take actions within the environment. -
step_size:
a float denoting the step size of the simulation. -
env_objs:
an ordered list ofEnvObj
config entries. This list contains everyEnvObj
present in the environment and their relative configurations. -
observation:
an ordered list ofObservationProcessor
config entries. This list contains everyObservationProcessor
present in the environment and their relative configurations. -
reward:
an ordered list ofRewardProcessor
config entries. This list contains everyRewardProcessor
present in the environment and their relative configurations. -
status:
an ordered list ofStatusProcessor
config entries. This list contains everyStatusProcessor
present in the environment and their relative configurations. -
verbose:
A boolean flag which sets the verbosity of environment setup and training.
Each of these entries in the env_config
configuration will be
explained in further detail below.
name: deputy # Platform name
class: saferl.aerospace.models.cwhspacecraft.platforms.CWHSpacecraft2d # Platform class
config: # Platform dependent configuration
controller:
class: saferl.environment.models.platforms.AgentController
actuators:
- name: thrust_x
space: discrete
points: 11
bounds: [-1, 1]
- name: thrust_y
space: discrete
points: 11
bounds: [-1, 1]
init: # Platform initialization parameters
initializer: saferl.aerospace.tasks.docking.initializers.ConstrainedDeputyPolarInitializer
ref: chief
radius: [100, 150]
angle: [0, 6.283185307179586]
All simulations in saferl
are made up of environment objects.
Environment objects represent physical and logical objects in the
simulation which may act as agents, goals, points of interest,
or obstacles. All environment objects are instantiated in the
order given by the configuration file at the beginning of the
simulation.
Environment object entries in a custom YAML configuration file consist of three primary keys:
name
class
config
The name key points to a string value denoting the name of the environment object in the simulation. The environment object can be easily referenced by its name throughout the rest of the configuration file.
The class key points to a string value containing the python
import path of the class definition that will be used to instantiate
the environment object in the simulation. This class must inherit
from the saferl.environment.models.platforms.BaseEnvObj
class.
The config key points to a (potentially empty) dictionary of key-value
pairs. The contents of this configuration dictionary include an init
configuration and a configuration which is dependent
on the expected configuration parameters of the environment object
class
. For example, the saferl.aerospace.models.cwhspacecraft.platforms.CWHSpacecraft2d
class can take a controller
key-value configuration which
defines the controller class and actuator parameters for the platform.
The config
dictionary also contains an init
configuration. This
configuration specifies the initializer
class used to initialize
the platform within the environment, as well as whatever custom
parameters the initializer
class expects. The initializer
class
is expected to inherit from saferl.environment.tasks.initializers.Initializer
.
name: observation_processor # Processor name
class: saferl.aerospace.tasks.docking.processors.DockingObservationProcessor # Processor class
config: # Processor-dependent configuration
deputy: deputy
mode: 2d
Observation processors define how observations are handled within a simulation. Many tasks require that an agent have only a partial knowledge of the full simulation state. Observation processors transform the full simulation state into an appropriate observation for the agent. All observation processors execute in the order given by the configuration file at each time step of the simulation.
Observation processor entries in a custom YAML configuration file consist of three primary keys:
name
class
config
The name key points to a string value denoting the name of the observation processor in the simulation. The processor can be easily referenced by its name throughout the rest of the configuration file.
The class key points to a string value containing the python
import path of the class definition that will be used to instantiate
the observation processor in the simulation. This class must inherit
from the saferl.environment.tasks.processor.processors.ObservationProcessor
class.
The config key points to a (potentially empty) dictionary of key-value
pairs. The contents of this configuration dictionary include a configuration which is dependent
on the expected configuration parameters of the observation processor
class
. For example, the saferl.aerospace.tasks.docking.processors.DockingObservationProcessor
class can take a deputy
key-value configuration, which registers
the name of the deputy environment object in the simulation, and a
mode
key-value configuration which defines whether observations
are 2D or 3D.
name: time_reward # Processor name
class: saferl.aerospace.tasks.docking.processors.TimeRewardProcessor # Processor class
config: # Processor-dependent configuration
reward: -0.001
Reward processors define how rewards are computed during agent training. Each reward processor uses information in the simulation state and/or the simulation status (see below) to compute a reward for an agent. All reward processors execute in the order given by the configuration file at each time step of the simulation.
Reward processor entries in a custom YAML configuration file consist of three primary keys:
name
class
config
The name key points to a string value denoting the name of the reward processor in the simulation. The processor can be easily referenced by its name throughout the rest of the configuration file.
The class key points to a string value containing the python
import path of the class definition used to instantiate
the reward processor in the simulation. This class must inherit
from the saferl.environment.tasks.processor.processors.RewardProcessor
class.
The config key points to a (potentially empty) dictionary of key-value
pairs. The contents of this configuration dictionary include a configuration which is dependent
on the expected configuration parameters of the observation processor
class
. For example, the saferl.aerospace.tasks.docking.processors.TimeRewardProcessor
class can take a reward
key-value configuration, which defines
the reward value associated with time taken to complete the task.
name: in_docking # Processor name
class: saferl.aerospace.tasks.docking.processors.InDockingStatusProcessor # Processor class
config: # Processor-dependent configuration
deputy: deputy
docking_region: docking_region
Status processors are used to compute specific status information of interest to the task from the raw simulation state (e.g. distance between two environment objects). These statuses can inform other status or reward processors. All status processors execute in the order given by the configuration file at each time step of the simulation.
Status processor entries in a custom YAML configuration file consist of three primary keys:
name
class
config
The name key points to a string value denoting the name of the status processor in the simulation. The processor can be easily referenced by its name throughout the rest of the configuration file.
The class key points to a string value containing the python
import path of the class definition used to instantiate
the status processor in the simulation. This class must inherit
from the saferl.environment.tasks.processor.processors.StatusProcessor
class.
The config key points to a (potentially empty) dictionary of key-value
pairs. The contents of this configuration dictionary include a configuration which is dependent
on the expected configuration parameters of the observation processor
class
. For example, the saferl.aerospace.tasks.docking.processors.InDockingStatusProcessor
class can take a deputy
key-value configuration, which registers
the name of the deputy environment object in the simulation, and a
docking_region
key-value configuration which registers the name
of the docking region environment object in the simulation.