[Case Study] Algolux Eos Embedded Perception vs. State-of-the-Art Models
Comparing the Algolux Eos real-time robust perception software with recent real-time object detection models: YOLOv3, RetinaNet-50, and SSD MobileNet V2.
May 1, 2021
Robust perception under varying illumination and weather is a key requirement for autonomous driving and driver assistance systems. In practice, however, these systems often fail under such conditions. A failure compromises the many dependent components, such as the planner and the controllers for automated emergency braking or automated driving, that rely on the perception system's output to work effectively.
Adverse imaging conditions such as low ambient lighting can result in images with a very low signal-to-noise ratio. Vision systems that do not handle suboptimal imaging conditions can easily fail to perform well in such cases. Furthermore, there is a trade-off between robustness and speed. Deeper and more computationally expensive models have higher capacity but operate at very low frame rates.
In this case study, we compare the Algolux Eos real-time robust perception software with recent real-time object detection models, namely, YOLOv3, RetinaNet-50, and SSD MobileNet V2.
Results: We determine that Eos is 38 to 55 mAP (mean Average Precision) points better in adverse imaging conditions, 8 to 27 mAP points better in clean, noise-free conditions, and runs 2.4x faster than YOLOv3.
To evaluate the robustness of different models in adverse imaging conditions, we captured a large dataset under varying imaging conditions. During training, camera calibration information is used to augment this dataset to 100x the captured RAW data, sampling a very broad range of illumination conditions.
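The exact augmentation pipeline is Algolux-internal, but the general idea of synthesizing low-light RAW variants from a calibrated clean capture can be sketched as below. The function name, noise model, and parameter ranges are illustrative assumptions, not the actual Eos pipeline.

```python
import numpy as np

def augment_raw(raw, gain_range=(0.02, 1.0), read_noise=2.0, rng=None):
    """Synthesize a low-light variant of a clean RAW frame (illustrative).

    Scales the signal down to simulate reduced exposure/illumination,
    then adds shot (Poisson) and read (Gaussian) noise, a common
    sensor noise model parameterized from camera calibration.
    """
    rng = rng or np.random.default_rng()
    gain = rng.uniform(*gain_range)           # simulated illumination drop
    dark = raw.astype(np.float64) * gain      # darkened signal
    shot = rng.poisson(np.clip(dark, 0, None)).astype(np.float64)
    noisy = shot + rng.normal(0.0, read_noise, size=raw.shape)
    return np.clip(noisy, 0, None), gain

# One captured frame can yield many synthetic training variants:
clean = np.full((4, 4), 1000.0)               # toy 4x4 RAW patch
variants = [augment_raw(clean, rng=np.random.default_rng(i)) for i in range(100)]
```

Sampling a fresh gain per variant is what lets a single capture cover a broad range of illumination levels.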
The images were captured using a FLIR Blackfly 23S6C-C camera, which uses a Sony IMX249 sensor with a 5.86×5.86 micron pixel pitch, paired with a high-quality Fujinon CF12.5HA-1 machine-vision lens. A subset of this dataset was used for training, with a separate, unseen set held out for testing.
The following classes were considered for both training and evaluation:
- Person (Cyclist/Pedestrian)
- Traffic Light
- Traffic Sign
To train the different models in good conditions, we combined our training data with publicly available object detection datasets, namely, COCO, Pascal VOC, and BDD100K, for a total of around 200K training images. From the public datasets, only the subset of images containing the relevant labels was considered.
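Filtering a public dataset down to the relevant labels can be sketched as follows, using a simplified COCO-style annotation layout. The label names and record structure here are illustrative assumptions, not the actual dataset schemas.

```python
# Classes of interest (illustrative names; real datasets use their own label sets).
RELEVANT = {"person", "cyclist", "traffic light", "traffic sign"}

def filter_dataset(images, annotations):
    """Keep only images that contain at least one relevant label,
    and only the annotations for those labels."""
    keep_ids = {a["image_id"] for a in annotations if a["label"] in RELEVANT}
    kept_images = [im for im in images if im["id"] in keep_ids]
    kept_anns = [a for a in annotations
                 if a["image_id"] in keep_ids and a["label"] in RELEVANT]
    return kept_images, kept_anns

images = [{"id": 1}, {"id": 2}, {"id": 3}]
annotations = [
    {"image_id": 1, "label": "person"},
    {"image_id": 2, "label": "dog"},           # irrelevant: image 2 is dropped
    {"image_id": 3, "label": "traffic light"},
]
imgs, anns = filter_dataset(images, annotations)
```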
For evaluation, we used a joint public validation set consisting of COCO, Pascal VOC, and BDD100K, and our custom validation set consisting of 18 scenes under different lighting and environmental conditions.
Models Compared in this Report
- Eos is a real-time end-to-end robust vision system that operates on a RAW image stream from one or more cameras and produces task-specific outputs such as bounding boxes with class labels, segmentation masks, lane markings, etc. Eos is designed to be robust to low light, noise, and adverse imaging conditions in general, while running at a minimum of 30 FPS on mid-range mobile GPUs (e.g., the 1152 CUDA core discrete GPU used in the NVIDIA Drive PX2) and in real time on embedded processors for automotive and security applications.
- YOLOv3 is a single-stage detector built on a 106-layer deep residual network with multiscale feature extraction and detection heads. It has state-of-the-art performance among publicly available object detection models.
- RetinaNet-50 is another high-performing publicly available single-stage object detection model that is built on top of a 50-layer residual network with a multiresolution Feature Pyramid Network (FPN) architecture.
- Single Shot Multibox Detector (SSD) with a MobileNet V2 backbone is one of the fastest single-shot detectors, designed to run on mobile devices with very little compute. Its key components are linear bottleneck layers and depthwise separable convolutions, which significantly reduce the number of operations.
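To see why depthwise separable convolutions reduce operation counts, the multiply-accumulate cost of a standard convolution can be compared with the depthwise-plus-pointwise factorization used in MobileNet-style backbones. The layer dimensions below are a made-up example, not a layer from any of the compared models.

```python
def conv_ops(h, w, k, c_in, c_out):
    """Multiply-accumulates for a standard k x k convolution
    over an h x w feature map with c_in input and c_out output channels."""
    return h * w * k * k * c_in * c_out

def depthwise_separable_ops(h, w, k, c_in, c_out):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Example layer: 56x56 feature map, 3x3 kernel, 128 -> 128 channels.
std = conv_ops(56, 56, 3, 128, 128)
sep = depthwise_separable_ops(56, 56, 3, 128, 128)
ratio = std / sep  # approaches k*k for large channel counts
```

For this example the reduction ratio is about 8.4x, close to the theoretical limit of k² = 9 for a 3×3 kernel.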
Number of operations
In terms of floating-point operations, YOLOv3 operates on images of size 608×608 and performs 140 GFLOPs (billions of floating-point operations) per frame. Algolux Eos operates directly on RAW data with 2.4x fewer operations.
Eos is implemented using TensorFlow and trained using a novel Algolux-developed joint optimization framework for 30 epochs. For YOLOv3, we used the implementation from the authors’ GitHub repository and added the necessary configuration files to train and perform inference with 6 labels. The model was trained from pre-trained darknet53 weights and converged after 64 epochs.
For RetinaNet-50 and SSD MobileNet V2, we used the implementations provided in the TensorFlow Object Detection API and trained the models until convergence.
Next, we show the results achieved with the described Eos end-to-end model (from RAW to detector output), compared against YOLOv3, RetinaNet-50, and SSD MobileNet V2 running on RGB images. Evaluation is done using the Pascal VOC Average Precision (AP) per class and the mean Average Precision (mAP) metric at 50% Intersection over Union (IoU).
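Under this metric, a detection counts as a true positive only if its box overlaps a ground-truth box with IoU of at least 0.5; AP is then computed per class from the resulting precision-recall curve, and mAP is the mean of the per-class APs. A minimal IoU check, with illustrative box coordinates, looks like this:

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A half-overlapping detection falls below the 0.5 threshold:
gt = (0, 0, 10, 10)
det = (5, 0, 15, 10)
score = iou(gt, det)   # intersection 50, union 150 -> 1/3
```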
1. Harsh (low-light, low-contrast, noisy) Validation Dataset
For the 1744 difficult images that were taken under adverse imaging conditions, Eos outperforms all the other models by a significant margin. Compared to the second-best model, YOLOv3, Eos is 38 Average Precision (AP) points better on average across all object categories. Some sample comparison images are shown in the Sample Detections section below.
Harsh Conditions Validation Dataset
2. COCO, Pascal VOC, and BDD100K Clean Image Validation Dataset
For the larger clean image validation set consisting of 20K images in good conditions, Eos performs better than all the other models used in this report. Compared to YOLOv3 (ranked 3rd), Eos (ranked 1st) is 8 AP points better on average, and 6 AP points better than RetinaNet-50, the second-best model for this validation dataset.
COCO, Pascal VOC and BDD100K Validation Dataset
Sample Detections from YOLOv3 and Eos
Below we provide some detection examples from YOLOv3 and Eos to illustrate the imaging conditions and detection performance. While YOLOv3 is able to detect some objects in low-light, noisy conditions, its performance degrades significantly with the imaging conditions, while Eos performs consistently regardless of them.
For instance, the first row of the following table shows detection under normal illumination conditions. Although YOLOv3 misses the person in front of the camera (Image 1), it otherwise provides dense detections. For the remaining rows, which contain low-light and noisy images, YOLOv3 performs significantly worse than under normal imaging conditions. Eos (right column), on the other hand, performs consistently well under varying imaging conditions.
Image 1. Normal lighting conditions where YOLOv3 misses the clear pedestrian in front of the car.
Image 2. Ultra low-light condition where Eos detects almost all the visible cars.
Image 3. Ultra low-light image where YOLOv3 fails to detect the clearly visible pedestrian that was detected by Eos.
Image 4. As in the previous image pair, Eos performs consistently better.
Image 5. Low contrast imaging condition where Eos detects the construction truck but YOLOv3 fails.
Image 6. Ultra low-light street scene where Eos detects the difficult-to-see pedestrian in front of the camera.
Image 7. Low-light indoor parking where Eos detects almost all cars and also a pedestrian at the back.
Image 8. Ultra low-light street scene where Eos performs significantly better than YOLOv3.
Image 8b. From the same sequence as before, where YOLOv3 detects a different pedestrian but Eos performs consistently.
Image 9. Low-light and low-contrast image where Eos detects the pedestrian (right image for visualization).