The Edge Annotationz Challenge was open from December 14, 2021, to February 10, 2022 (see here). However, you are always welcome to try out your ideas and let us know how your solutions work for this problem. This short document explains the challenge and how the evaluation is done. For a more detailed description of the challenge, please see the challenge description.
For a full overview of the dataset, we refer to the GitHub repo for the Zenseact Open Dataset and the description that can be found on AI Sweden's website here.
To download the challenge data (detections, trained models, etc.) in addition to the Zenseact Open Dataset, please send an email to [email protected]. In your email, please specify your name, your institution/organization, and an email address connected to Dropbox.
In the challenge data, you can find the following information:
- annotations_kitti - which contains the dynamic object annotations translated into the KITTI format (find more information below)
- detections - which contains the raw lidar and camera detections generated by our pretrained models (find more information below)
- trained_models - which contains the pretrained model weights (find more information below)
- train.json - which is a json file containing the paths to all relevant files to be used during development (a minimal loading sketch follows this list)
- test.json (will be released later) - which has the same format as train.json but contains the paths to the files that should be used when submitting your results
- unlabeled.json (attached for completeness) - which contains the paths to all files that do not have any dynamic object annotations, i.e., are irrelevant for the challenge
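As a minimal sketch, the json files can be read with the Python standard library. We make no assumptions here about the internal structure of the entries; consult the development-kit for the exact schema.

```python
import json

# Minimal sketch: load the development split listed in train.json.
# The internal structure of each entry is defined by the development-kit;
# nothing about it is assumed here.
with open("train.json") as f:
    train_entries = json.load(f)

print(f"Loaded {len(train_entries)} entries from train.json")
```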
To further facilitate this challenge, we have opted to convert our annotations to the format defined by the KITTI 3D object detection challenge. These converted annotations can be found in the annotations_kitti folder. This format means that each row contains the following:
Class Name | Truncation | Occlusion | Alpha | Bbox left | Bbox top | Bbox right | Bbox bottom | height | width | length | x | y | z | rotation_y
In these annotations you will find three different classes: Vehicle, Cyclist, and Pedestrian, which are the classes we would like you to generate pseudo-annotations for. Note that the original annotations contain other classes than these, and that those classes have been mapped either to one of the three classes stated above or to the DontCare class. If you want to see how this mapping has been made, we ask you to consult the script convert_annotations_to_kitti.py found in the development-kit.
An additional thing to mention here is that the alpha value has not been computed and is set to zero (0) for all ground truths.
Also, not all ground truths contain 3D properties, e.g., objects that are very far away and thus not seen in the lidar point cloud during annotation. For those objects we retain the 2D properties but set height, width, length, x, y, z, and rotation_y to 0. Note that these ground truths will not affect the BEV and 3D AP scores but will affect the 2D AP score.
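As a sketch of how such a row can be parsed, the snippet below follows the field order given above. The class and helper names are ours, not part of the development-kit.

```python
from dataclasses import dataclass


@dataclass
class KittiAnnotation:
    # Fields in the order of the KITTI 3D object detection format.
    class_name: str
    truncation: float
    occlusion: int
    alpha: float       # always 0 in these annotations
    bbox: tuple        # (left, top, right, bottom) in pixels
    dimensions: tuple  # (height, width, length) in meters
    location: tuple    # (x, y, z) in the camera coordinate system
    rotation_y: float


def parse_kitti_line(line: str) -> KittiAnnotation:
    """Parse one row from an annotations_kitti .txt file."""
    parts = line.split()
    values = [float(v) for v in parts[1:15]]
    return KittiAnnotation(
        class_name=parts[0],
        truncation=values[0],
        occlusion=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),
        dimensions=tuple(values[7:10]),
        location=tuple(values[10:13]),
        rotation_y=values[13],
    )


def has_3d_properties(ann: KittiAnnotation) -> bool:
    """Objects without lidar support have all 3D fields set to 0."""
    return any(v != 0 for v in (*ann.dimensions, *ann.location, ann.rotation_y))
```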
To provide you with basic detections on each frame, we have trained one lidar object detection network and one vision object detector. These can be found in the trained_models folder. Note that you will find three sets of weights for each model. This is because we have done folded trainings to provide you with unbiased detections on every single frame. For more information on how to use these (and for which frames you can expect unbiased detections given a certain set of model weights), we refer you to the README in that folder.
The detections folder contains the detections generated by our pretrained models. The detections are provided in the KITTI format (see above). We want to highlight two things here.
The camera detections only provide image plane information, i.e., a 2D bounding box. All 3D properties should therefore be considered invalid and are set to -1.
The lidar detections only provide 3D properties. All 2D properties should therefore be considered invalid and are set to 0. Furthermore, note that the detections are provided in the lidar coordinate system, i.e., x is right, y is forward, and z is up.
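A small sketch of how the validity conventions in the two notes above can be checked when reading a detection row. The field indices follow the KITTI layout described earlier; the function name is illustrative.

```python
def detection_kind(parts: list) -> str:
    """Classify a KITTI-format detection row as 'camera' or 'lidar'.

    `parts` is the whitespace-split row, with the class name at index 0.
    Camera detections have their 3D fields set to -1; lidar detections
    have their 2D bbox fields set to 0.
    """
    bbox = [float(v) for v in parts[4:8]]    # left, top, right, bottom
    dims = [float(v) for v in parts[8:11]]   # height, width, length
    if all(v == -1 for v in dims):
        return "camera"
    if all(v == 0 for v in bbox):
        return "lidar"
    return "unknown"
```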
The quality of the detections has been evaluated using the metrics defined in the metrics section below. The result of that evaluation (and how it was performed) can be found in the evaluation example in the development-kit.
While most sequences contain 21 lidar point clouds in range_lidar_data, some sequences contain fewer. This is the case for 78 out of the total 6666 sequences. We know that this might be troublesome and thank you for your indulgence in this matter.
Here we intend to explain the metric used to evaluate the pseudo-annotations provided in the Edge Annotationz Challenge.
Your solutions will be scored based on how good your pseudo-annotations are. We want you to provide us with annotations for three (3) different classes: Vehicle, Cyclist, and Pedestrian. The pseudo-annotations you provide should follow the KITTI format, i.e., one pseudo-annotation per row containing
Class Name | Truncation (-1) | Occlusion (-1) | Alpha (-10) | Bbox left | Bbox top | Bbox right | Bbox bottom | height | width | length | x | y | z | rotation_y | confidence score
Note that height, width, and length are in meters, that x, y, and z are in the camera coordinate system, and that the columns marked with (-1/-10) can be left as -1/-10.
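For illustration, one pseudo-annotation row could be assembled as below. Only the column order and the fixed -1/-10 placeholders come from the format above; the helper name and the numbers in the example are made up.

```python
def format_pseudo_annotation(class_name, bbox, dims, location, rotation_y, score):
    """Format one pseudo-annotation row in the expected KITTI-style layout.

    bbox     = (left, top, right, bottom) in pixels
    dims     = (height, width, length) in meters
    location = (x, y, z) in the camera coordinate system
    """
    fields = [
        class_name,
        -1,    # truncation, may be left as -1
        -1,    # occlusion, may be left as -1
        -10,   # alpha, may be left as -10
        *bbox,
        *dims,
        *location,
        rotation_y,
        score,  # confidence score
    ]
    return " ".join(f"{f:.2f}" if isinstance(f, float) else str(f) for f in fields)


# Example row (all values are made up):
row = format_pseudo_annotation(
    "Vehicle", (710.4, 144.0, 820.3, 307.9), (1.89, 1.85, 4.20), (1.84, 1.47, 8.41), 0.01, 0.93
)
```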
To evaluate your solutions, we will use the metrics defined in the KITTI 3D object detection challenge, with some slight modifications. We will assess your pseudo-annotations by averaging over the 2D AP, the bird's-eye-view (BEV) AP, and the 3D AP, as well as over each of the defined classes. To clarify, we will score your solution according to the following formula:
$$ score = \frac{AP^{2D}_{vehicle} + AP^{BEV}_{vehicle} + AP^{3D}_{vehicle} + AP^{2D}_{cyclist} + AP^{BEV}_{cyclist} + AP^{3D}_{cyclist} + AP^{2D}_{pedestrian} + AP^{BEV}_{pedestrian} + AP^{3D}_{pedestrian}}{9} $$
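In code, the final score is simply the mean of the nine AP values; the dictionary below is only an illustration of that averaging.

```python
# The nine AP values (2D, BEV, 3D for each of the three classes), as produced
# by the KITTI evaluation code. The zeros here are placeholders.
ap = {
    ("vehicle", "2d"): 0.0, ("vehicle", "bev"): 0.0, ("vehicle", "3d"): 0.0,
    ("cyclist", "2d"): 0.0, ("cyclist", "bev"): 0.0, ("cyclist", "3d"): 0.0,
    ("pedestrian", "2d"): 0.0, ("pedestrian", "bev"): 0.0, ("pedestrian", "3d"): 0.0,
}

score = sum(ap.values()) / len(ap)  # average over 9 = 3 classes x 3 metrics
```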
It is worth mentioning that in KITTI you are scored in three different difficulty categories (EASY, MODERATE, and HARD). These categories are defined by how occluded, how truncated, and how large (2D height) each ground truth is. To learn more about the categories, we refer you to the KITTI webpage linked above. During this challenge we will only assess you on the MODERATE difficulty class. That is, ground truth objects are allowed to be somewhat occluded and truncated, but have to be at least 25 px high in the image plane. To clarify, all objects that are smaller than that are ignored during the evaluation.
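As a rough sketch, the 25 px criterion on a ground-truth 2D box looks like this. The occlusion and truncation thresholds are left out; consult the KITTI definition for the full MODERATE criterion. The function name is ours.

```python
def tall_enough_for_moderate(bbox_top: float, bbox_bottom: float,
                             min_height_px: float = 25.0) -> bool:
    """Check only the 2D-height part of the MODERATE criterion.

    Ground truths smaller than 25 px in the image plane are ignored during
    evaluation. Occlusion and truncation thresholds are not handled here.
    """
    return (bbox_bottom - bbox_top) >= min_height_px
```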
Furthermore, the computation of the 2D, BEV, and 3D AP is done independently, but we still want you to supply only one set of pseudo-annotations per frame. To clarify, it is NOT OK to provide us with one set of 2D pseudo-annotations and another set of 3D pseudo-annotations. These shall all be in the same file, as specified under the What you should provide section.
To compute the metrics we will use this forked repository, which we have modified slightly to suit our needs. To evaluate your performance, you can clone that repository and follow the compilation steps found in the repo's README.
We have also created a Dockerfile that you can use. This can be found under the eval folder.
The pseudo-annotations you provide should all be gathered into one folder with one .txt file per core-frame, as described above. The filename should correspond to the frame index, e.g., dataset_root/submitted_pseudo_annotations/001337.txt.
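A minimal sketch of writing such a submission folder, assuming you already have a mapping from frame index to a list of formatted rows (the function name and default directory name are just examples):

```python
import os


def write_submission(rows_per_frame: dict, out_dir: str = "submitted_pseudo_annotations"):
    """Write one .txt file per core-frame, named by the zero-padded frame index.

    `rows_per_frame` maps a frame index (e.g. 1337) to a list of
    pseudo-annotation rows formatted as described above.
    """
    os.makedirs(out_dir, exist_ok=True)
    for frame_index, rows in rows_per_frame.items():
        path = os.path.join(out_dir, f"{int(frame_index):06d}.txt")
        with open(path, "w") as f:
            f.write("\n".join(rows) + ("\n" if rows else ""))
```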