[Case Study] How an Automotive Tier 1 Supplier Improved Computer Vision Accuracy by up to 48% mAP within Days
Atlas was used to automatically maximize computer vision accuracy for a front-facing vision system from a leading automotive Tier 1.
July 7, 2021
The Atlas Camera Optimization Suite is the industry’s first set of cloud-based machine learning tools and workflows that automatically optimizes camera architectures for computer vision and image quality. In this case study, Atlas was used to automatically maximize computer vision accuracy for a front-facing vision system from a leading automotive Tier 1. Atlas optimized the image processing functions of the system’s Renesas V3H ISP paired with the Sony IMX490 HDR image sensor to determine an optimal ISP configuration for two trained object detection models.
The Atlas-optimized ISP configuration improved the accuracy of the customer’s YOLOv4 model by up to 28% Mean Average Precision (mAP) points compared to the original image quality tuning. Optimizing the ISP for another more robust embedded vision model improved the detection accuracy by up to 48% mAP points.
The automated workflow optimized the system in a few days, compared with the several months of expert effort typical of today’s manual ISP tuning workflows, which are not effective for computer vision. This enables significant resource scaling and cost savings.
- Computer vision accuracy can be improved by optimizing – rather than tuning – camera ISPs for either AI or classical computer vision models.
- Any lens, sensor, and ISP combination can be supported to optimize visual image quality or maximize computer vision results.
- Atlas optimization is highly scalable and can be done in days vs. today’s costly and months-long manually intensive ISP tuning process.
Computer Vision Accuracy Results - mAP (IoU >0.5)
1. Today’s Camera ISP Tuning Process
Cameras must be tuned every time a new lens or sensor is integrated with the ISP. The traditional goal is subjectively pleasing image quality (IQ) for viewing, but cameras are increasingly used for computer vision in many safety-critical applications.
The camera’s ISP has hundreds of parameters that control each pipeline block (Figure 1).
Figure 2. Today’s ISP tuning process
Image quality tuning is a highly iterative and complex manual process requiring a team of imaging experts over many weeks or months to determine the best parameter settings for each ISP block within the program schedule (Figure 2).
2. Tuning Challenges for Computer Vision
For human viewing, manual tuning can still achieve visually good IQ results, but it requires scarce expert resources with specific ISP domain knowledge. Outsourcing can address some gaps, but those teams are also resource-limited. As visual IQ is ultimately subjective, it is difficult to know when tuning is complete, and it must be repeated every time a camera component is changed. Costs may exceed $50,000 per camera program, and programs can run for many months. This approach is not scalable or predictable.
This process also does not provide the optimal output for computer vision.
- Imaging teams cannot subjectively see, and manually tune for, the best image quality for specific vision algorithms.
- “Rules of thumb”, such as increasing contrast and sharpness, do not generalize to model architecture and training sets.
- The very large ISP parameter space can’t be evaluated within a practical timeframe.
An alternative optimization-based approach is required to maximize computer vision results.
Atlas Camera Optimization for Computer Vision
Accuracy is the objective for any computer vision system, but robust accuracy in all conditions is paramount for safety-critical applications such as automotive. As we have seen, today’s subjective and manually intensive ISP tuning does not deliver this.
Achieving it requires a move away from traditional tuning workflows towards the automated, metric-driven ISP parameter optimization provided by the Atlas Camera Optimization Suite (Figure 3).
Atlas is the only camera / ISP optimization solution that addresses this challenge and is commercially available. It applies novel solvers to handle the massive and very rugged parameter space to maximize computer vision metrics for any camera and vision task.
Figure 3. Atlas workflow for computer vision
The Atlas workflow:
A small dataset of field RAW images is captured with the target camera module and annotated. It must contain a distribution of images that exercise the operational range of the camera and use case scenarios. The distribution covers the exposure times and gains the sensor is capable of in the expected low to bright light scenes, including High Dynamic Range (HDR), and low to high contrast. It also needs to have good coverage of the target detector classes and object sizes. If IQ metrics are to be evaluated or optimized, a small set of RAW lab chart images would also be captured.
Since the objective is to optimize an ISP and not to train a neural network, this dataset can be orders of magnitude smaller than a neural network training dataset, i.e., several hundred to a few thousand images rather than the hundreds of thousands or millions of annotated images typically used to train a network.
Each frame is tagged with exposure and gain information from the sensor module so the sensor state is known for each image. That metadata is also used in the optimization process so the ISP parameter modulation functions can be built to control the image quality for the particular capture conditions.
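As an illustration of such a modulation function, the sketch below interpolates a single ISP parameter against sensor gain. The knot gains, the parameter values, and the name `modulated_param` are hypothetical and are not taken from Atlas or from any particular ISP; a real pipeline would typically modulate many parameters, possibly against exposure time as well.

```python
import math
from bisect import bisect_right

# Hypothetical knots: sensor gains at which a parameter value was chosen,
# and the (made-up) denoise strength selected at each of those gains.
GAIN_KNOTS = [1.0, 4.0, 16.0, 64.0]
DENOISE_STRENGTH = [0.10, 0.25, 0.55, 0.90]

def modulated_param(gain, knots=GAIN_KNOTS, values=DENOISE_STRENGTH):
    """Piecewise-linear interpolation in log2(gain), clamped at both ends."""
    if gain <= knots[0]:
        return values[0]
    if gain >= knots[-1]:
        return values[-1]
    i = bisect_right(knots, gain) - 1          # segment containing `gain`
    x0, x1 = math.log2(knots[i]), math.log2(knots[i + 1])
    t = (math.log2(gain) - x0) / (x1 - x0)     # position inside the segment
    return values[i] + t * (values[i + 1] - values[i])
```

With these made-up knots, a frame tagged with a gain of 8 falls halfway (in log2 terms) between the 4x and 16x knots, so it receives a strength halfway between 0.25 and 0.55.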
This small RAW image dataset is run through the ISP and into the detector model. Computer vision accuracy metrics, such as average precision and recall of the detected object classes, are then evaluated for each trial ISP configuration.
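For reference, here is a minimal sketch of how average precision at IoU > 0.5 can be computed for one object class. It uses the common all-point interpolation of the precision-recall curve; the exact evaluation protocol used in this case study is not stated, so that choice is an assumption, as are the function names and data layout.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_thresh=0.5):
    """AP for one class.

    detections:   list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    # Visit detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_j, best_iou = -1, iou_thresh
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou and not matched[img][j]:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[img][best_j] = True      # each ground-truth box matches once
        tp_flags.append(best_j >= 0)
    # Running precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Make precision non-increasing, then integrate over recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_average_precision(ap_per_class):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

Running the same evaluation for every trial ISP configuration gives the optimizer one scalar per trial to compare.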
Visual image quality KPIs can be directly measured on the image, but for computer vision-only systems, how visually pleasing the images look is unimportant so long as the final ISP parameter configuration enables the vision model to produce more accurate detections.
A novel optimization framework is then applied to find the best ISP parameters for a specific pre-trained deep-learned or classical computer vision model by minimizing a loss function built from those accuracy KPIs.
These ISP pipelines are not differentiable and do not necessarily optimize well with typical gradient-based approaches. There can be null spaces in registers or non-linear stepwise behaviors as an ISP parameter crosses a threshold that is internal to an algorithm, so the optimizer needs to be robust and be able to split up and explore the solution space to escape the massive number of local minima.
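To make the difficulty concrete, the sketch below runs a simple derivative-free, multi-start local search over two integer "registers". It is not the Atlas solver, whose details are not described here. The loss function is a made-up stand-in for 1 - mAP that reproduces the two behaviors mentioned above: low bits of one register have no effect (a null space), and the other register is ignored below an internal threshold.

```python
import random

def toy_loss(params):
    """Hypothetical stand-in for 1 - mAP; stepwise and flat in places."""
    sharpen, denoise = params
    s = (sharpen // 8) * 8                   # low 3 bits are ignored
    d = denoise if denoise >= 32 else 0      # below 32 the block is bypassed
    return (abs(s - 96) + abs(d - 160)) / 255.0

def multi_start_search(loss, bounds, x0, trials=2000, restarts=4, seed=0):
    """Gradient-free search using roughly `trials` loss evaluations.

    The first restart begins at x0; later restarts begin at random points,
    so a start stuck on a flat region does not decide the final result.
    """
    rng = random.Random(seed)
    best_x, best_f = list(x0), loss(x0)
    per_restart = trials // restarts
    for r in range(restarts):
        x = list(x0) if r == 0 else [rng.randint(lo, hi) for lo, hi in bounds]
        fx = loss(x)
        for _ in range(per_restart):
            scale = rng.choice((64, 16, 4, 1))        # mix coarse and fine moves
            cand = [min(hi, max(lo, xi + rng.randint(-scale, scale)))
                    for xi, (lo, hi) in zip(x, bounds)]
            fc = loss(cand)
            if fc <= fx:                              # accept ties to cross plateaus
                x, fx = cand, fc
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

best_x, best_f = multi_start_search(toy_loss, [(0, 255), (0, 255)], x0=[0, 0])
```

A gradient estimated at the starting point `[0, 0]` would be zero in both directions, which is why the search relies on sampled moves at several scales and on restarts instead.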
Atlas projects require some initial optimizations to ensure proper ISP configuration and convergence, and then a final optimization to maximize results. This typically takes anywhere from 4000 to 8000 trials (or iterations) over a few days to explore the parameter space and converge on an optimized result, depending on factors such as number of parameters and KPI objectives that are being optimized.
This optimization process can also be applied to automate and significantly accelerate visual image quality tuning or optimize for both computer vision and visual image quality goals (Figure 4).
IV. Case Study
1. Imaging System Details
Algolux worked with a leading automotive industry Tier 1 supplier to enable the optimization of their front-facing imaging system using an IMX490 HDR image sensor and Renesas V3H ISP combined with a YOLOv4 object detection model.
The sensor was configured to capture an HDR image with a 24-bit linear range that was piecewise linearly companded to 16 bit for input to the ISP.
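The sketch below shows what piecewise-linear companding from 24 bits to 16 bits can look like, together with the matching expansion. The knee points are hypothetical; the actual curve programmed into the sensor is not given in this case study.

```python
from bisect import bisect_right

# Hypothetical knee points: 24-bit linear input code -> 16-bit companded code.
IN_KNEES = [0, 1 << 12, 1 << 16, 1 << 20, (1 << 24) - 1]
OUT_KNEES = [0, 1 << 12, 1 << 14, 1 << 15, (1 << 16) - 1]

def _interp(x, xs, ys):
    """Piecewise-linear interpolation, rounded to the nearest integer code."""
    x = min(max(x, xs[0]), xs[-1])                     # clamp to the valid range
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)      # segment index
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    return y0 + round((x - x0) * (y1 - y0) / (x1 - x0))

def compand(linear24):
    """Compress a 24-bit linear code to a 16-bit companded code."""
    return _interp(linear24, IN_KNEES, OUT_KNEES)

def expand(companded16):
    """Approximate inverse of compand(): recover a 24-bit linear code."""
    return _interp(companded16, OUT_KNEES, IN_KNEES)
```

With these knees the darkest 4096 codes pass through unchanged, while in the brightest segment roughly 480 linear codes share each output code, so precision is spent where the signal is smallest.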
The ISP outputs 14-bit YUV data, which requires 10 bits of range compression from the ISP.
The ISP operation was also subject to some additional constraints. As the image stream was also being used for other computer vision applications, the output image had to be linearizable, which means that transforms have to be global and invertible.
Due to the Tier 1’s OEM customer requirements, local tone mapping could not be used. Furthermore, unlike most camera configurations, the ISP had to run in a steady state for system stability, meaning adaptive gains could not be set automatically based on image statistics.
As such, the ISP needed to be optimized for a static configuration for all images that cover the full dynamic range.
The optimal tone mapping or range compression to apply in the pipeline for the detector was unknown. So, the Atlas optimization process had to determine the best transform to use for mapping the input data range to the most useful range for the detector model.
A YOLOv4 object detector that had been trained with the COCO dataset was used and only the car and pedestrian detections were considered for this application.
2. RAW Dataset Used for Renesas V3H ISP Optimization
- Optimization for YOLOv4
  - 2500 RAW images annotated
  - 90 used for Atlas optimization
  - Remainder used for validation
- Balanced distribution across lighting, classes, and object sizes
- Dataset built from 4 captures during:
  - Midday: Bright daytime images
  - Afternoon: Bright day to afternoon
  - Dusk: Afternoon to dusk
  - Night: Night low light
A set of 2500 RAW images was captured throughout a range of lighting and weather conditions and annotated.
Of those, just 90 images were needed for Atlas optimization. The remaining images were used for validation.
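One simple way to draw such a subset is a stratified split across the four captures, sketched below. The per-capture frame counts and the frame names are hypothetical, and the sketch balances only across captures; the selection described here also balanced object classes and sizes, which this example does not attempt.

```python
import random

def stratified_split(frames_by_capture, n_opt, seed=0):
    """Pick n_opt frames spread evenly across captures; the rest are validation."""
    rng = random.Random(seed)
    names = sorted(frames_by_capture)
    base, extra = divmod(n_opt, len(names))
    opt, val = [], []
    for k, name in enumerate(names):
        frames = list(frames_by_capture[name])
        rng.shuffle(frames)
        take = base + (1 if k < extra else 0)   # spread the remainder over the first captures
        opt.extend(frames[:take])
        val.extend(frames[take:])
    return opt, val

# Hypothetical layout: 2500 frames split evenly over the four captures.
captures = {
    name: [f"{name}_{i:04d}.raw" for i in range(625)]
    for name in ("midday", "afternoon", "dusk", "night")
}
opt_set, val_set = stratified_split(captures, n_opt=90)
```

Keeping the validation frames out of the optimization loop is what allows the reported gains to be read as generalization rather than as a fit to the 90 optimization images.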
3. ISP Parametrization for Optimization
The ISP was parameterized in such a way that ensured the output image met the system constraints while being as flexible as possible with regard to the image processing functions.
A specific group of blocks in the ISP and the parameters that control the way the algorithms manipulate the images were selected.
6 lookup tables and 27 scalar parameters were validated and co-optimized through the ISP. These controlled range compression and expansion functions, color accuracy & saturation, spatial filtering, demosaicing, denoising, edge enhancement, sharpening, and contrast (Figure 5).
Some of the compression and expansion functions were optimized using separate optimization loops that used test patterns exercising the full signal range.
This was done using a measure of quantization error introduced in the ISP to remove biases caused by the luminance distribution of the objects in the optimization set. This enabled analysis of any precision bottlenecks to ensure any data coming into the ISP was also present in the output.
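A minimal sketch of this kind of measurement follows. It pushes a ramp covering every 16-bit input code through a range compression curve to 14 bits, expands it again, and reports the worst round-trip error and the number of output codes actually used. The power-law curve and its `gamma` parameter are hypothetical stand-ins for the real lookup tables, and the error measure actually used in the program is not specified.

```python
IN_MAX = (1 << 16) - 1     # 16-bit companded input to the ISP
OUT_MAX = (1 << 14) - 1    # 14-bit ISP output

def compress(x, gamma):
    """Hypothetical global range compression curve (power law)."""
    return round(OUT_MAX * (x / IN_MAX) ** gamma)

def expand(y, gamma):
    """Inverse of compress(), up to rounding."""
    return round(IN_MAX * (y / OUT_MAX) ** (1.0 / gamma))

def ramp_quantization_error(gamma):
    """Feed a full-range ramp through compress/expand.

    Returns (worst absolute round-trip error in input codes,
             number of distinct output codes used).
    """
    worst = 0
    used = set()
    for x in range(IN_MAX + 1):
        y = compress(x, gamma)
        used.add(y)
        worst = max(worst, abs(expand(y, gamma) - x))
    return worst, len(used)

worst_lin, used_lin = ramp_quantization_error(1.0)
worst_half, used_half = ramp_quantization_error(0.5)
```

Because the ramp exercises every input level equally, the result does not depend on how bright the objects in the optimization images happen to be. In this toy setup the `gamma = 0.5` curve leaves some output codes unused and has a larger worst-case error near the top of the range than the linear curve.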
Per the program requirements, Atlas was configured to optimize computer vision accuracy with a single objective loss function, which was mean average precision (mAP).
Thirty instances of a bit-accurate software model of the ISP were run in parallel and consistently reached convergence after about 1000 ISP configuration trials over a runtime of about 36 hours (Figure 6).
The speed of each trial was constrained by the performance of the ISP model itself and processing time for the 5-megapixel images.
Each trial represents a unique ISP configuration, and the configuration settings and image results for each trial were stored by Atlas for later reference.
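The parallel evaluation and per-trial bookkeeping can be pictured with the sketch below. The function `run_isp_and_score` is a hypothetical placeholder for running the ISP model and the detector and computing mAP, and the single `gamma` field stands in for a full ISP configuration; none of these names come from Atlas.

```python
from concurrent.futures import ThreadPoolExecutor

def run_isp_and_score(config):
    """Hypothetical placeholder: ISP model + detector + mAP for one configuration."""
    return 1.0 - abs(config["gamma"] - 0.45)

def run_batch(configs, first_trial_id=0, workers=30):
    """Evaluate a batch of configurations in parallel and keep a record of each trial."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(run_isp_and_score, configs))   # map() preserves input order
    return [
        {"trial": first_trial_id + i, "config": cfg, "map": score}
        for i, (cfg, score) in enumerate(zip(configs, scores))
    ]

# One batch of 30 candidate configurations, one per worker.
batch = [{"gamma": round(0.30 + 0.01 * i, 2)} for i in range(30)]
records = run_batch(batch)
best = max(records, key=lambda rec: rec["map"])
```

Storing the configuration alongside its score for every trial is what makes later sensitivity analysis possible without rerunning the ISP model.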
The spikes and drops in accuracy seen in Figure 6 as the optimizer is moving toward convergence come from a continued intelligent sampling of parameters. This allows the optimizer to avoid getting trapped in the many local minima in the ISP’s parameter space.
4. Image Quality Tracking
Specific lab charts and test patterns were included in the optimization dataset for each trialed configuration to measure:
- Color Accuracy / Saturation
- Tonal Response
- Overshoot / Undershoot
The customer also evaluated and tracked various image quality factors during optimization and convergence. Lab-captured chart images were included in the RAW optimization dataset for reporting.
While these could have also been used by Atlas for optimization, this was not an objective of the program. Instead, the intent was to gain insights into the way the algorithms were configured and to do sensitivity analysis with the data.
5. Accuracy Results for YOLOv4
When the full dataset is analyzed across the different lighting conditions, there are substantial accuracy improvements in low light conditions over the baseline settings provided by the ISP vendor.
These improvements ranged from a very significant 28% mAP points in low light to smaller but meaningful increases for well-illuminated daytime conditions.
Optimization of a front-facing imaging system using an IMX490 HDR image sensor and Renesas V3H ISP combined with a YOLOv4 object detection model
Here are some example images from the ISP as they were passed into the YOLOv4 model.
- The original configuration appears dark as it wasn’t necessarily tuned to render the HDR images for a standard dynamic range color space.
- The converged global tone mapping produced low contrast images for the daytime scenes but this did not degrade the detector accuracy.
- Local contrast could be improved using a local tone mapper for dynamic range compression, but that would have violated one of the customer’s system constraints.
Note: In order to better visualize and compare the example results below for this Case Study, some basic brightness and global contrast stretching were applied to roughly equalize the images.
Despite lower contrast initially, the information required for object detection is still present in the baseline tuned images. The banding produced by the vendor-tuned ISP settings seen below indicates that the tone mapping was not well configured for the output bit depth, even at 14 bit, as there is a massive 140 dB range to cover on the input.
There is also a massive loss of data in low light conditions as the ISP had not initially been tuned well by the provider for the system constraints and the signal ranges. Clearly, much better detector results were achieved by Atlas optimization of the mAP accuracy metric.
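A quick back-of-the-envelope calculation shows why the tone mapping matters so much here. It uses the standard conversion of a linear code range to decibels, 20·log10(2^bits); the helper names are illustrative.

```python
import math

def linear_range_db(bits):
    """Dynamic range, in dB, that a linear encoding with `bits` bits can span."""
    return 20.0 * math.log10(2 ** bits)

def bits_for_db(db):
    """Number of linear bits needed to span `db` decibels."""
    return db / (20.0 * math.log10(2))

out_db = linear_range_db(14)    # about 84.3 dB for the 14-bit output
in_db = linear_range_db(24)     # about 144.5 dB for the 24-bit linear capture
needed = bits_for_db(140.0)     # about 23.3 bits for a 140 dB scene
```

A linear 14-bit output spans only about 84 dB, so covering a 140 dB input requires a nonlinear curve that decides where the available codes are spent. A curve that spends them poorly produces banding in some tonal ranges and loses detail in others.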
Examples of unequalized images that were passed into the detector:
Examples enhanced for visualization:
6. Optimizing the V3H ISP for the Algolux Eos Detector Backend
In addition to optimizing the Renesas V3H ISP for the customer’s YOLOv4 model, Atlas was also applied to the Algolux Eos detector backend to explore how much further performance could be improved when a much more accurate object detection model was used.
Eos is a highly robust embedded perception stack that includes both a differentiable ISP and detector backends and is trained end-to-end (see the Eos performance case study).
Reoptimizing the Renesas V3H ISP for the equivalently trained Eos detector delivered additional improvements over the YOLOv4 configuration. This was primarily due to the architectural advantages of Eos.
Note that applying parameters optimized for one vision model to another will improve that second model’s performance vs. the visual IQ tuning. But the highest accuracy results are achieved when optimizing the ISP for the specific vision model.
This highlights that each model has its own image quality “preference” based on a number of factors, such as how it was trained and its architecture.
7. Eos Detector Optimization Results
Optimization of a front-facing imaging system using an IMX490 HDR image sensor and Renesas V3H ISP combined with the Eos object detection model
Here are some example images from the ISP as they were passed into the Eos Perception model.
This case study demonstrated that it is possible to significantly improve computer vision accuracy by optimizing ISPs for pre-trained object detection models. The Atlas optimization framework is sensor and ISP-agnostic. It is not necessary to know the internal workings of the ISP algorithms, but choosing the right subset of registers and effective parameter ranges will speed up convergence time.
Over the course of the program with this automotive Tier 1 provider, Atlas improved their configuration of the widely used Renesas V3H ISP. This was done in a way that allows their camera system to run without any dynamic controls to ensure system stability, a critical requirement by the Tier 1’s OEM customer, while maximizing computer vision accuracy.
This was done in a matter of days, without any retraining of the vision model, by using the Atlas automated workflow to optimize the ISP parameters rather than attempt any additional lengthy expert-intensive manual tuning.