
Understanding and creating agents

Defining an agent starts by creating a new class that inherits from navsim.agents.abstract_agent.AbstractAgent.

Let’s dig deeper into this class. It has to implement the following methods:

  • __init__():

    The constructor of the agent.

  • name()

    This has to return the name of the agent. The name is used as the filename of the evaluation CSV and can be set to an arbitrary value.

  • initialize()

    This is called once before the agent is used for inference for the first time. If multiple workers are used, every worker calls this method on its own instance of the agent. If you need to load a state dict or similar, do it here instead of in __init__.

  • get_sensor_config()

    Has to return a SensorConfig (see navsim.common.dataclasses.SensorConfig) that defines which sensor modalities should be loaded for the agent in each frame. The SensorConfig is a dataclass that stores, for each sensor, a List of indices of the history frames for which that sensor should be loaded. Alternatively, a boolean can be used per sensor to load either all available frames (True) or none (False). You can also return SensorConfig.build_all_sensors() if you want access to all available sensors. Details on the available sensors can be found below.

    Loading sensors has a big impact on runtime. If you don't need a sensor, consider setting it to False.

  • compute_trajectory()

    This is the main function of the agent. Given the AgentInput, which contains the ego state as well as the sensor modalities, it has to compute and return a future trajectory for the agent. Details on the output format can be found below.

    The future trajectory has to be returned as an object of type navsim.common.dataclasses.Trajectory. For examples, see the constant velocity agent or the human agent. A minimal skeleton implementing these methods is sketched below.
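To make the interface concrete, here is a minimal, illustrative skeleton. It is not the repository code: the constructor argument, the base-class constructor call, and the placeholder standstill trajectory are assumptions, so check the AbstractAgent definition in the repository for the exact signatures.

```python
import numpy as np

from navsim.agents.abstract_agent import AbstractAgent
from navsim.common.dataclasses import AgentInput, SensorConfig, Trajectory


class MyAgent(AbstractAgent):
    """Minimal sketch of a non-learning-based agent."""

    def __init__(self, checkpoint_path: str = ""):
        super().__init__()  # base-class arguments may differ between NAVSIM versions
        self._checkpoint_path = checkpoint_path

    def name(self) -> str:
        # used as the filename of the evaluation CSV
        return self.__class__.__name__

    def initialize(self) -> None:
        # called once per worker before inference; load checkpoints / state dicts here
        pass

    def get_sensor_config(self) -> SensorConfig:
        # load every available sensor; see the Inputs section for a leaner config
        return SensorConfig.build_all_sensors()

    def compute_trajectory(self, agent_input: AgentInput) -> Trajectory:
        # placeholder: an 8-pose standstill trajectory (x, y, heading in local coordinates);
        # a real agent would use agent_input here
        return Trajectory(poses=np.zeros((8, 3), dtype=np.float32))
```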

Learning-based Agents

Most likely, your agent will involve learning-based components. NAVSIM provides a lightweight and easy-to-use interface for training. To use it, your agent has to implement some further functionality. In addition to the methods mentioned above, you have to implement the methods below; a condensed sketch follows this list. Have a look at navsim.agents.ego_status_mlp_agent.EgoStatusMLPAgent for an example.

  • get_feature_builders() Has to return a List of feature builders (of type navsim.planning.training.abstract_feature_target_builder.AbstractFeatureBuilder). Feature builders take the AgentInput object and compute the feature tensors used for agent training and inference. One feature builder can compute multiple feature tensors; they have to be returned in a dictionary, which is then provided to the model in the forward pass. For examples, see the feature builders used by the provided baseline agents.

  • get_target_builders() Similar to get_feature_builders(), returns the target builders of type navsim.planning.training.abstract_feature_target_builder.AbstractTargetBuilder used in training. In contrast to feature builders, they have access to the Scene object which contains ground-truth information (instead of just the AgentInput).

  • forward() The forward pass through the model. Features are provided as a dictionary that contains all features generated by the feature builders. All tensors are already batched and on the same device as the model. The forward pass has to output a Dict, one entry of which has to be "trajectory" and contain a tensor representing the future trajectory, i.e., of shape [B, T, 3], where B is the batch size, T is the number of future timesteps, and 3 refers to x, y, and heading.

  • compute_loss() Given the features, the targets and the model predictions, this function computes the loss used for training. The loss has to be returned as a single Tensor.

  • get_optimizers() Use this function to define the optimizers used for training. Depending on whether you want to use a learning-rate scheduler or not, this function needs to either return just an Optimizer (of type torch.optim.Optimizer) or a dictionary that contains the Optimizer (key: "optimizer") and the learning-rate scheduler of type torch.optim.lr_scheduler.LRScheduler (key: "lr_scheduler").

  • get_training_callbacks() In this function, you can return a List of pl.Callback to monitor or visualize the training process of the learned model. We implemented a callback for TransFuser in navsim.agents.transfuser.transfuser_callback.TransfuserCallback, which can serve as a starting point.

  • compute_trajectory() In contrast to a non-learning-based agent, you don't have to implement this function. During inference, the trajectory is computed automatically using the feature builders and the forward method.
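Putting these pieces together, the following condensed sketch shows the overall shape of a learned agent. It is not the repository code: the builder method names (get_unique_name, compute_features, compute_targets), the ego-status attributes, the feature/target keys, and Scene.get_future_trajectory are assumptions, and MyAgent refers to the skeleton above. EgoStatusMLPAgent remains the authoritative reference.

```python
from typing import Dict, List

import torch

from navsim.common.dataclasses import AgentInput, Scene
from navsim.planning.training.abstract_feature_target_builder import (
    AbstractFeatureBuilder,
    AbstractTargetBuilder,
)


class EgoVelocityFeatureBuilder(AbstractFeatureBuilder):
    """Hypothetical feature builder: exposes the current ego velocity as a tensor."""

    def get_unique_name(self) -> str:
        return "ego_velocity_feature_builder"

    def compute_features(self, agent_input: AgentInput) -> Dict[str, torch.Tensor]:
        velocity = agent_input.ego_statuses[-1].ego_velocity  # assumed attribute names
        return {"ego_velocity": torch.tensor(velocity, dtype=torch.float32)}


class TrajectoryTargetBuilder(AbstractTargetBuilder):
    """Hypothetical target builder: exposes the ground-truth future trajectory."""

    def get_unique_name(self) -> str:
        return "trajectory_target_builder"

    def compute_targets(self, scene: Scene) -> Dict[str, torch.Tensor]:
        future = scene.get_future_trajectory(num_trajectory_frames=8)  # assumed helper
        return {"trajectory": torch.tensor(future.poses, dtype=torch.float32)}


class MyLearnedAgent(MyAgent):  # MyAgent is the skeleton from above
    def __init__(self):
        super().__init__()
        # tiny MLP mapping a 2-d velocity to 8 future poses of (x, y, heading)
        self._model = torch.nn.Sequential(
            torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8 * 3)
        )

    def get_feature_builders(self) -> List[AbstractFeatureBuilder]:
        return [EgoVelocityFeatureBuilder()]

    def get_target_builders(self) -> List[AbstractTargetBuilder]:
        return [TrajectoryTargetBuilder()]

    def forward(self, features: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        poses = self._model(features["ego_velocity"])   # [B, 24]
        return {"trajectory": poses.reshape(-1, 8, 3)}  # [B, T, 3]

    def compute_loss(self, features, targets, predictions) -> torch.Tensor:
        return torch.nn.functional.l1_loss(predictions["trajectory"], targets["trajectory"])

    def get_optimizers(self) -> torch.optim.Optimizer:
        # or return {"optimizer": ..., "lr_scheduler": ...} to add a scheduler
        return torch.optim.Adam(self.parameters(), lr=1e-4)
```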

Inputs

get_sensor_config() can be overridden to determine which sensors are accessible to the agent.

The available sensors depend on the dataset. For OpenScene, this includes 9 sensor modalities: 8 cameras and a merged point cloud (from 5 LiDARs). Each modality is available for a duration of 2 seconds into the past, at a frequency of 2 Hz (i.e., 4 frames). Only this data will be released for the test frames (no maps, tracks, occupancy, etc., which you may use during training but will not have access to for leaderboard submissions).

You can configure the set of sensor modalities to use and how much history you need for each frame with the navsim.common.dataclasses.SensorConfig dataclass.
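As a sketch, a config that only loads the most recent front-camera frame plus the full LiDAR history could look like the snippet below. The field names (cam_f0 through cam_b0 and lidar_pc) are assumptions based on the OpenScene sensor set, so verify them against the SensorConfig dataclass.

```python
from navsim.common.dataclasses import SensorConfig

# Field names are assumed to follow the OpenScene sensor naming; verify them
# against navsim.common.dataclasses.SensorConfig.
sensor_config = SensorConfig(
    cam_f0=[3],      # front camera: only the most recent of the 4 history frames
    cam_l0=False, cam_l1=False, cam_l2=False,  # skip unused cameras to save runtime
    cam_r0=False, cam_r1=False, cam_r2=False,
    cam_b0=False,
    lidar_pc=True,   # merged point cloud for all available frames
)

# Convenience constructor if you simply want everything:
sensor_config = SensorConfig.build_all_sensors()
```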

Why LiDAR? Recent literature on open-loop planning has moved away from LiDAR in favor of surround-view high-resolution cameras. This has significantly increased the compute requirements for training and testing SoTA planners. We hope that the availability of the LiDAR modality enables more computationally efficient submissions that use fewer (or lower-resolution) camera inputs.

Ego Status. Besides the sensor data, an agent also receives the ego pose, velocity, and acceleration in local coordinates. Finally, to disambiguate driver intention, we provide a discrete driving command indicating whether the intended route goes left, straight, or right. Importantly, the driving command in NAVSIM is based solely on the desired route and does not entangle information about obstacles and traffic signs (as was prevalent in prior benchmarks such as nuScenes). Note that the left and right driving commands cover turns, lane changes, and sharp curves.
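For illustration, reading these quantities from the AgentInput might look like the sketch below. The attribute names (ego_statuses, ego_velocity, ego_acceleration, driving_command) are assumptions; check navsim.common.dataclasses for the exact definitions.

```python
from navsim.common.dataclasses import AgentInput


def current_ego_state(agent_input: AgentInput):
    # attribute names are assumed; see navsim.common.dataclasses
    ego_status = agent_input.ego_statuses[-1]   # most recent frame
    velocity = ego_status.ego_velocity          # local-frame velocity
    acceleration = ego_status.ego_acceleration  # local-frame acceleration
    command = ego_status.driving_command        # left / straight / right intent
    return velocity, acceleration, command
```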

Output

Given this input, you need to override the compute_trajectory() method and output a Trajectory. This is an array of BEV poses (with x, y, and heading in local coordinates), together with a TrajectorySampling config object that indicates the duration and frequency of the trajectory. The PDM score is evaluated for a horizon of 4 seconds at a frequency of 10 Hz. The TrajectorySampling config facilitates interpolation when the output frequency differs from the one used during evaluation.
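As a sketch, constructing such an output could look like this. It assumes the Trajectory fields are named poses and trajectory_sampling and that TrajectorySampling comes from the nuPlan devkit; check navsim.common.dataclasses.Trajectory for the authoritative definition.

```python
import numpy as np

from navsim.common.dataclasses import Trajectory
from nuplan.planning.simulation.trajectory.trajectory_sampling import TrajectorySampling

# 8 future poses of (x, y, heading) in local coordinates, covering 4 s at 2 Hz;
# the evaluation interpolates to 10 Hz based on the TrajectorySampling config.
poses = np.zeros((8, 3), dtype=np.float32)  # replace with your predicted poses
trajectory = Trajectory(
    poses=poses,
    trajectory_sampling=TrajectorySampling(time_horizon=4, interval_length=0.5),
)
```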

Check out the baselines below for implementations of agents!

Baselines

NAVSIM provides several baselines, which serve as comparisons or as starting points for new end-to-end driving models. We provide model weights for all learned baselines on Hugging Face.

ConstantVelocityAgent:

The ConstantVelocityAgent is a naive baseline that follows the simplest possible driving logic: it maintains a constant speed and a constant heading angle, resulting in a straight-line output trajectory. You can use this agent to familiarize yourself with the AbstractAgent interface or to analyze samples that have a trivial solution for achieving a high PDM score.

Link to the implementation.
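The core idea can be sketched in a few lines; this is not the repository code, and the speed argument, pose count, and interval are placeholders.

```python
import numpy as np


def constant_velocity_poses(speed_mps: float, num_poses: int = 8, dt: float = 0.5) -> np.ndarray:
    """Straight-line extrapolation: constant speed, zero heading change."""
    t = np.arange(1, num_poses + 1) * dt       # future timestamps
    x = speed_mps * t                          # drive straight ahead in the ego frame
    y = np.zeros_like(x)
    heading = np.zeros_like(x)
    return np.stack([x, y, heading], axis=-1)  # shape [num_poses, 3]
```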

EgoStatusMLPAgent:

The EgoStatusMLPAgent is a blind baseline that ignores all sensors perceiving the environment. The agent applies a multilayer perceptron (MLP) to the state of the ego vehicle (i.e., the velocity, acceleration, and driving command). The EgoStatusMLP thereby serves as an upper bound on the performance that can be achieved by merely extrapolating the kinematic state of the ego vehicle. It is a lightweight learned example, showcasing the procedure of creating feature caches and training an agent in NAVSIM.

Link to the implementation.

TransfuserAgent:

Transfuser is an example of a sensor agent that utilizes both camera and LiDAR inputs. Its backbone applies CNNs to a front-view camera image and a discretized LiDAR BEV grid. The features from the camera and LiDAR branches are fused over several convolution stages with Transformers into a combined feature representation. The Transfuser architecture combines several auxiliary tasks with imitation learning and achieves strong closed-loop performance in end-to-end driving on the CARLA simulator.

In NAVSIM, we implement the Transfuser backbone from CARLA Garage and use BEV semantic segmentation and DETR-style bounding-box detection as auxiliary tasks. To provide the wide-angle camera view required by Transfuser, we stitch patches of the three front-facing cameras. Transfuser is a good starting point for sensor agents: it provides pre-processing for image and LiDAR sensors, training visualizations via callbacks, and more advanced loss functions (i.e., Hungarian matching for detection).

Link to the implementation.