A long-range and robust perception system plays a crucial role in the research and deployment of autonomous driving. 4D radar, as an emerging range sensor, offers greater resilience to adverse weather than lidar and provides elevation measurements that 3D radar lacks. Existing 4D radar datasets, which emphasize robust multimodal perception, typically combine camera, lidar, and 4D radar, but they often lack long-range capability due to limited annotation ranges, and their single short-focal-length camera is poorly matched to a long-range 4D radar. To overcome these limitations, we present a novel long-range multimodal dataset. It encompasses high-resolution, long-range sensors, including forward-facing cameras, a 360° lidar, and a front-mounted 4D radar, along with detailed 3D object annotations. Notably, our dataset is the first to introduce three cameras with different focal lengths, enabling the simultaneous capture of images with varying perception ranges, and it serves as a valuable resource for developing accurate long-range perception algorithms. Our dataset achieves the longest annotation range among comparable 4D radar datasets, spanning up to 220 meters. It supports applications such as 3D object detection and tracking, and facilitates the study of multimodal tasks. Through extensive experiments, we validate the efficacy of our dataset and offer insights into long-range 3D object detection.
[2024.04.17] We have released the dataset download link.
[2024.04.17] Our code currently supports several baselines, including DETR3D, PointPillars, SECOND, and PV-RCNN.
L-RadSet includes over 133K high-quality, manually annotated 3D ground-truth bounding boxes covering nine categories of static and dynamic objects at ranges of up to 220 meters. Each ground-truth box carries a track identifier to facilitate 3D object tracking. The dataset comprises 280 carefully selected driving scenes, each spanning 20 seconds and totaling 11.2K keyframes captured from diverse real-world traffic scenarios. We allocate 225 scenes for training and validation and reserve the remaining 55 scenes for testing.
Our ego vehicle is equipped with three front-view cameras with different focal lengths and field-of-view (FOV) characteristics, as depicted in Figure 2. This configuration allows distant objects to be captured in the images while maintaining a wide horizontal field of view. The top lidar captures data within a 360° horizontal field of view at ranges of up to 230 meters; this high-resolution, long-range capability enables precise and distant annotations in our dataset. The 4D radar covers a 120° forward field of view and can detect objects up to 300 meters away. Together, these sensors facilitate the development of effective long-range and robust perception algorithms. In addition, the platform incorporates a GNSS (Global Navigation Satellite System) receiver and an IMU (Inertial Measurement Unit) to provide accurate location information.
Figure 2. Sensor setup for our data collection platform
- The specification of the autonomous vehicle system platform. Our dataset is collected with three high-resolution cameras, an 80-line mechanical LiDAR (RS-Ruby Lite), and a front-mounted 4D radar (ARS548 RDI). The platform also provides GPS information for time synchronization. The sensor configurations are shown in Table 1; a back-of-envelope range-resolution check follows the table.
Table 1. The sensor specification of the data collection platform
| Sensors | Type | View | Number | Range res. | Azimuth res. | Elevation res. | Range FOV | Azimuth FOV | Elevation FOV | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Camera | SG2-IMX390C-5200-GMSL2 | Front | 3 | - | 1920 px | 1080 px | - | 30°/60°/120° | 17°/32°/62° | 30 |
| LiDAR | RS-Ruby Lite | 360° | 1 | 0.05 m | 0.2° | 0.1° | 230 m | 360° | -25°~+15° | 10 |
| 4D radar | ARS548 RDI | Front | 1 | 0.22 m | 1.2°@±15°, 1.68°@±45° | 2.3° | 300 m | ±60° | ±14°@<100 m, ±4°@300 m | 20 |
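As a rough back-of-envelope check on long-range coverage (not part of the dataset tools), the sketch below converts the azimuth resolutions in Table 1 into cross-range sample spacing at the 220 m annotation limit; the sensor values come from the table, and the small-angle approximation is assumed.

```python
import math

ANNOTATION_RANGE_M = 220.0  # longest annotation range reported for L-RadSet

# Azimuth resolutions from Table 1 (degrees); the camera entry uses an
# approximate per-pixel angle for the 30-degree lens (30 deg / 1920 px).
azimuth_resolution_deg = {
    "lidar (RS-Ruby Lite)": 0.2,
    "4D radar (ARS548 RDI, boresight)": 1.2,
    "camera (30-degree lens, per pixel)": 30.0 / 1920.0,
}

for sensor, res_deg in azimuth_resolution_deg.items():
    # Small-angle approximation: arc length = range * angle_in_radians.
    spacing_m = ANNOTATION_RANGE_M * math.radians(res_deg)
    print(f"{sensor}: ~{spacing_m:.2f} m between samples at {ANNOTATION_RANGE_M:.0f} m")
```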
- We analyze the distribution of frames across different weather conditions, times of day, and road types. As shown in Fig. 4(b) and (c), the data collection covers various weather conditions, including clear, cloudy, rainy, and foggy scenes, as well as different light conditions ranging from daytime to dusk and nighttime. Notably, our dataset includes a significant proportion of nighttime scenes, providing more challenging scenarios for developing robust detection algorithms. Additionally, to cover a comprehensive array of driving scenarios, we collected data from urban roads, highways, suburban roads, and tunnels, as presented in Fig. 4(d). This collection strategy ensures a diverse set of real-world traffic scenes and a more comprehensive dataset.
Figure 4. Distribution of sampled cities (a), weather conditions (b), time of day (c), and road types (d) in L-RadSet. The data are collected from two cities and encompass four weather conditions, three light conditions from day to night, and four road types.
- We present various driving scenes in Fig. 5. They encompass typical driving environments such as urban, suburban, and highway, as well as challenging scenarios like nighttime, adverse weather, and low-light tunnels. As can be seen in Figure 5(d), (e), and (h), the camera is highly affected by lighting, and when the light is extremely weak the captured images may lack the RGB information needed to observe objects. In such scenarios, lidar still acquires accurate spatial information, compensating for the camera's shortcomings. Adverse weather such as fog and rain also degrades image quality to some extent, but far less than low light does, as shown in Figure 5(b) and (c). Adverse weather, however, tends to introduce more noise into the lidar point cloud, which affects its detection performance. In this case, the semantic information of the image and the accurate 3D measurements provided by the 4D radar point cloud can be used to enhance detection performance.
Figure 5. Scene visualizations of L-RadSet captured by the front 60° camera under different weather conditions, light conditions, and road types. (a) urban, clear, daytime; (b) suburban, light fog, daytime; (c) urban, rainy, daytime; (d) urban, cloudy, nighttime; (e) tunnel, daytime; (f) suburban, cloudy, dusk; (g) highway, clear, daytime; (h) highway, cloudy, nighttime.
- Our dataset is freely available to researchers. Please download and sign our agreement and send it to the provided email addresses ([email protected], [email protected]). You will receive the download link within one week.
- When unzipping the data, please organize the files according to the structure below (a small extraction sketch follows the tree):
└─L-RadSet
├─ImageSets.zip
├─calibs.zip
├─labels.zip
├─timestamp.zip
├─images
│ ├─image_0.zip
│ ├─image_1.zip
│ ├─image_2.zip
├─lidar.zip
├─radar.zip
├─detection.json
├─detection_long.json
└─README.md
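The snippet below is a minimal extraction sketch using Python's standard zipfile module, not an official tool. It assumes the archives sit in a local downloads folder with the names shown above and that each archive unpacks into its own top-level folder; adjust the paths to your setup.

```python
import zipfile
from pathlib import Path

# Assumed locations: adjust to wherever the archives were downloaded.
download_dir = Path("./downloads")
dataset_root = Path("./L-RadSet")

# Archives that unpack directly under the dataset root (names from the tree above).
top_level_zips = ["ImageSets.zip", "calibs.zip", "labels.zip", "timestamp.zip",
                  "lidar.zip", "radar.zip"]
# Image archives live one level deeper, under images/.
image_zips = ["image_0.zip", "image_1.zip", "image_2.zip"]

for name in top_level_zips:
    with zipfile.ZipFile(download_dir / name) as zf:
        zf.extractall(dataset_root)

for name in image_zips:
    with zipfile.ZipFile(download_dir / "images" / name) as zf:
        zf.extractall(dataset_root / "images")
```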
- This folder contains 11,200 frames of labeled point clouds and image data. The folder can be soft-linked into the data folder. The entire structure is shown below (a minimal frame-loading sketch follows the tree):
└─L-RadSet
├─ImageSets
│ test.txt
│ train.txt
│ trainval.txt
│ val.txt
├─calib
│ ├─ 000000.txt
..........
├─image
│  ├─image_0
│  │  │ 000000.png # Undistorted images from the 30° camera.
│  │  │ ..........
│  ├─image_1
│  │  │ 000000.png # Undistorted images from the 60° camera.
│  │  │ ..........
│  ├─image_2
│  │  │ 000000.png # Undistorted images from the 120° camera.
│  │  │ ..........
├─lidar
│  │ 000000.bin # LiDAR point cloud in bin format.
│  │ ..........
├─radar
│  │ 000000.bin # 4D radar point cloud in bin format.
│  │ ..........
├─labels
│  │ 000000.txt # Labels in txt format.
│  │ ..........
├─timestamp
│  │ 000000.txt # Timestamps in txt format.
│  │ ..........
├─anchor_size.json
├─detection.json
├─detection_long.json
└─README.md
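Below is a minimal frame-loading sketch under stated assumptions: the paths follow the tree above, the lidar/radar .bin files store float32 records, and the number of fields per point (4 for lidar, 6 for radar in this sketch) is an assumption that must be matched to the actual files.

```python
import numpy as np
import cv2  # opencv-python

DATA_ROOT = "./data/l-radset"  # assumed root, matching the layout above

def load_points(path: str, num_features: int) -> np.ndarray:
    """Read a .bin point cloud as float32 and reshape to (N, num_features).

    The per-point field count is an assumption and must match the actual
    files (e.g. 4 for x, y, z, intensity in KITTI-style lidar dumps).
    """
    points = np.fromfile(path, dtype=np.float32)
    return points.reshape(-1, num_features)

def load_frame(frame_id: str, lidar_features: int = 4, radar_features: int = 6):
    """Load the three camera images plus lidar and radar points for one frame."""
    images = [cv2.imread(f"{DATA_ROOT}/image/image_{cam}/{frame_id}.png")
              for cam in range(3)]
    lidar = load_points(f"{DATA_ROOT}/lidar/{frame_id}.bin", lidar_features)
    radar = load_points(f"{DATA_ROOT}/radar/{frame_id}.bin", radar_features)
    return images, lidar, radar

if __name__ == "__main__":
    imgs, lidar_pts, radar_pts = load_frame("000000")
    print([img.shape for img in imgs], lidar_pts.shape, radar_pts.shape)
```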
- Each calib.txt contains three parts (a hypothetical parsing sketch follows this list):
  - Intrinsics of each camera: matrix P (4×4)
  - Extrinsics from LiDAR to each camera: matrix (4×4)
  - Distortion parameters: vector (1×5)
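Because the exact layout and key names inside calib.txt are not documented above, the following is only a hypothetical sketch: it assumes each part is stored as a `name: values` line per camera (the key names `P0` and `Tr_lidar_to_cam0` are placeholders), builds 4×4 matrices, and projects lidar points into an image while ignoring lens distortion. Adapt the key names to the real files.

```python
import numpy as np

def read_calib(path: str) -> dict:
    """Parse 'name: v0 v1 ...' lines into flat numpy arrays.

    The key names used below (e.g. 'P0', 'Tr_lidar_to_cam0') are
    placeholders; inspect a real calib .txt and adapt accordingly.
    """
    calib = {}
    with open(path) as f:
        for line in f:
            if ":" not in line:
                continue
            name, values = line.split(":", 1)
            calib[name.strip()] = np.array(values.split(), dtype=np.float64)
    return calib

def project_lidar_to_image(points_xyz: np.ndarray,
                           intrinsic_4x4: np.ndarray,
                           lidar_to_cam_4x4: np.ndarray) -> np.ndarray:
    """Project Nx3 lidar points to pixel coordinates (distortion ignored)."""
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])  # (N, 4)
    cam = lidar_to_cam_4x4 @ pts_h.T                                    # (4, N)
    img = (intrinsic_4x4 @ cam)[:3]                                     # (3, N)
    img[:2] /= img[2:3]                                                 # perspective divide
    return img[:2].T                                                    # (N, 2) pixels

# Hypothetical usage for camera 0 of frame 000000:
# calib = read_calib("data/l-radset/calib/000000.txt")
# uv = project_lidar_to_image(lidar_xyz,
#                             calib["P0"].reshape(4, 4),
#                             calib["Tr_lidar_to_cam0"].reshape(4, 4))
```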
- All values (numerical or strings) are separated by spaces, and each row corresponds to one object. The columns, in order, are as follows (a hypothetical parsing sketch follows the table):
| #Values | Name | Description |
| --- | --- | --- |
| 1 | type | Object type: 'Car', 'Bus', 'Truck', 'Motorbike', 'Bicycle', 'Person', 'Child', 'Traffic_cone', or 'Barrier' |
| 1 | truncated | Float from 0 (non-truncated) to 1 (truncated), where truncated refers to the object leaving the image boundaries |
| 1 | occluded | Integer (0, 1, 2, 3) indicating occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown |
| 1 | alpha | Observation angle of the object, in [-pi, pi] |
| 4 | bbox | 2D bounding box of the object in the image (0-based index): left, top, right, bottom pixel coordinates |
| 3 | dimensions | 3D object dimensions: height, width, length (in meters) |
| 3 | location | 3D object location x, y, z in LiDAR coordinates (in meters) |
| 1 | rotation_y | Rotation ry around the Z-axis in LiDAR coordinates, in [-pi, pi] |
| 1 | score | Only for results: float confidence in detection, needed for precision/recall curves; higher is better |
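A hypothetical label parser, assuming the column order listed above (type, truncated, occluded, alpha, four bbox values, three dimensions, three location values, rotation_y, and an optional score for results); any additional columns present in the actual files would need to be handled explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ObjectLabel:
    type: str
    truncated: float
    occluded: int
    alpha: float
    bbox: List[float]        # left, top, right, bottom (pixels)
    dimensions: List[float]  # height, width, length (m)
    location: List[float]    # x, y, z in LiDAR coordinates (m)
    rotation_y: float        # rotation around the Z-axis
    score: Optional[float]   # present only in detection results

def parse_label_file(path: str) -> List[ObjectLabel]:
    objects = []
    with open(path) as f:
        for line in f:
            v = line.split()
            if len(v) < 15:
                continue  # skip malformed rows
            objects.append(ObjectLabel(
                type=v[0],
                truncated=float(v[1]),
                occluded=int(float(v[2])),
                alpha=float(v[3]),
                bbox=[float(x) for x in v[4:8]],
                dimensions=[float(x) for x in v[8:11]],
                location=[float(x) for x in v[11:14]],
                rotation_y=float(v[14]),
                score=float(v[15]) if len(v) > 15 else None,
            ))
    return objects
```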
This is the documentation for how to use our detection frameworks with the L-RadSet dataset. We tested the detection frameworks in the following environment (a quick version check is sketched after the list):
- Python 3.8.16
- Ubuntu 18.04
- Torch 1.9.1+cu111 or higher
- CUDA 11.1 or higher
- mmdet3d 1.1.1
- mmdet 3.0.0rc5
- mmengine 0.7.4
- setuptools 58.0.4
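A minimal sanity check of the installed environment (it only prints versions and GPU availability; it does not install anything):

```python
import torch
import mmdet
import mmdet3d
import mmengine

# Print the versions that the L-RadSet baselines were tested against.
print("torch   :", torch.__version__, "| CUDA:", torch.version.cuda)
print("mmdet   :", mmdet.__version__)
print("mmdet3d :", mmdet3d.__version__)
print("mmengine:", mmengine.__version__)
print("GPU available:", torch.cuda.is_available())
```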
- After all files are downloaded, please arrange the workspace directory with the following structure.
- Organize your code under the mmdetection3d framework as follows:
mmdetection3d
├── checkpoints
├── configs
├── data
├── mmdet3d
├── projects
├── tools
├── work_dirs
- Organize the dataset under the data folder according to the following file structure:
└─L-RadSet
├─ImageSets
│ test.txt
│ train.txt
│ trainval.txt
│ val.txt
├─calib
│ ├─ 000000.txt
..........
├─image
│  ├─image_0
│  │  │ 000000.png # Undistorted images from the 30° camera.
│  │  │ ..........
│  ├─image_1
│  │  │ 000000.png # Undistorted images from the 60° camera.
│  │  │ ..........
│  ├─image_2
│  │  │ 000000.png # Undistorted images from the 120° camera.
│  │  │ ..........
├─lidar
│  │ 000000.bin # LiDAR point cloud in bin format.
│  │ ..........
├─radar
│  │ 000000.bin # 4D radar point cloud in bin format.
│  │ ..........
├─labels
│  │ 000000.txt # Labels in txt format.
│  │ ..........
├─timestamp
│  │ 000000.txt # Timestamps in txt format.
│  │ ..........
├─anchor_size.json
├─detection.json
├─detection_long.json
└─README.md
- Clone the repository
git clone https://github.com/crrasjtu/L-RadSet.git
- Create a conda environment
You can follow the official installation of mmdetection3d.
- Note: switch the mmdetection3d version to v1.1.1 before you compile it
git checkout v1.1.1
- Put our files into the corresponding folders
- Generate the data infos by running the following commands
- using lidar & image data
python tools/create_data.py l-radset --root-path ./data/l-radset --out-dir ./data/l-radset --extra-tag l-radset
- using 4D radar & image data
python tools/create_data.py radset --root-path ./data/l-radset --out-dir ./data/l-radset --extra-tag radset
- To train a model on a single GPU, prepare the full dataset and run
python tools/train.py ${CONFIG_FILE}
- To train a model on multiple GPUs, prepare the full dataset and run
tools/dist_train.sh ${CONFIG_FILE} ${NUM_GPUS}
- To evaluate a model on a single GPU, modify the path and run
python tools/test.py ${CONFIG_FILE} ${CKPT}
- To evaluate a model on multiple GPUs, modify the path and run
tools/dist_test.sh ${CONFIG_FILE} ${CKPT} ${NUM_GPUS}
- 3D object detection
Table 2. Experimental results of all baselines in 3D object detection
| Baselines | Sensor | mAP | CDS | mATE | mASE (1-IoU) | mAOE | ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DETR3D | C1 | 0.426 | 0.515 | 0.821 | 0.213 | 0.300 | model |
| DETR3D | C1, C2 | 0.451 | 0.524 | 0.757 | 0.207 | 0.278 | model |
| PointPillars | L | 0.648 | 0.681 | 0.303 | 0.198 | 0.360 | model |
| SECOND | L | 0.653 | 0.692 | 0.280 | 0.181 | 0.346 | model |
| PV-RCNN | L | 0.680 | 0.724 | 0.206 | 0.177 | 0.315 | model |
| PointPillars | R | 0.403 | 0.517 | 0.604 | 0.236 | 0.188 | model |
| SECOND | R | 0.324 | 0.451 | 0.670 | 0.262 | 0.331 | model |
| PV-RCNN | R | 0.351 | 0.475 | 0.649 | 0.255 | 0.302 | model |
- Many thanks to the open-source projects this work builds on, such as mmdetection3d.
- The computations were partly run on the Siyuan-1 cluster supported by the Center for High Performance Computing at Shanghai Jiao Tong University.