Fixing Tesla’s Highway Autopilot

May 30, 2021

Picture this: a long, boring highway drive. You try out every radio station, but there seem to be only commercials on. You turn the radio off. You stop for coffee. And back again you go. How about a podcast? Hard not to doze off, but you have to keep going.

Sound familiar? It has happened to every driver.

Now, imagine if you could just fall asleep at the wheel. The car would go on autopilot and you could simply rest. Well, a man in Alberta, Canada, thought that was possible. But as it turns out, Autopilot does not mean… autopilot.

Despite Tesla’s aggressive branding, their cars are not ready for L4 autonomy. But why is that? Why are they – and the rest of the industry – still struggling so much with something humans can do routinely?

Cameras can be great

Today’s vision architectures are designed to fail

The vast majority of automotive vision systems rely primarily on cameras, and that share will only keep growing. Recent reports project camera shipments to reach 400 million units by 2031, roughly 10x the volume of either radar or lidar.

Today, a camera consists of a lens/sensor module and an image signal processor (ISP) that has many parameters manually “tuned” for that specific lens/sensor configuration.

The ISP processes the RAW sensor data to produce a visually pleasing image for the vision model. And while this architecture usually delivers good results in good imaging conditions… in hard scenarios, such as low light and low contrast, not so much.
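
To make that conventional architecture concrete, here is a minimal sketch of the pipeline: RAW data goes through an ISP whose parameters are hand-tuned per lens/sensor combination, and only the processed, display-oriented image ever reaches the detector. All names and parameter values below are illustrative placeholders, not taken from any real ISP or product.

```python
# Toy model of the conventional camera + vision pipeline described above.
import numpy as np

ISP_TUNING = {                       # fixed per lens/sensor combination
    "black_level": 64,
    "wb_gains": (1.9, 1.0, 1.6),     # R, G, B white-balance gains
    "gamma": 2.2,
}

def toy_isp(raw: np.ndarray, tuning: dict) -> np.ndarray:
    """Toy ISP: black-level subtraction, white balance, tone curve."""
    img = np.clip(raw.astype(np.float32) - tuning["black_level"], 0, None)
    img /= img.max() + 1e-6                               # normalize to [0, 1]
    img = img[..., None] * np.array(tuning["wb_gains"])   # stand-in for demosaic to RGB
    return np.clip(img ** (1.0 / tuning["gamma"]), 0.0, 1.0)

def detect(rgb: np.ndarray) -> list:
    """Placeholder detector operating on the ISP output."""
    return []                                             # detections would go here

def perceive(raw_frame: np.ndarray) -> list:
    rgb = toy_isp(raw_frame, ISP_TUNING)   # tuned for a visually pleasing image...
    return detect(rgb)                     # ...not for the detector downstream
```

The key point is that the tuning is frozen per configuration and optimized for how the image looks, not for how well the downstream model detects objects in low light.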

This is Tesla’s latest Model S Autopilot. As you can see, the results are problematic. The vision system fails to detect two obvious objects: a large SUV and a pickup truck with a trailer. 

While this was taken at night, no one would describe these driving conditions as extreme. And yet, the industry still considers this an edge case.

Even worse, in this case Tesla is actually relying on radar/camera fusion and tracking to remove uncertainty. In other words, it is a beefed-up perception system that still manages to fail.

So to recap:

  1. Cameras are the primary sensor for automobiles.
  2. Today’s vision systems, even highly advanced ones, are not robust.

So if we’re tackling driving safety with camera-based vision systems, how can we ensure it actually works?

The need to rethink vision architectures

As we mentioned earlier, the typical vision system consists of a camera – with a lens, sensor, and image signal processor tuned for that lens/sensor configuration. And as we’ve just seen, it doesn’t work well.

So to tackle that design problem, we decided to take a different approach altogether. Instead of relying on the image processor, we uniquely take RAW sensor data as input, then fuse and process the data through an end-to-end deep neural network. This is our Eos Embedded Perception Software.
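
As a rough illustration of that idea (and only an illustration; this toy PyTorch model is not Eos), a RAW-in, task-out network lets gradients from the perception loss shape every stage that would otherwise be hand-tuned:

```python
# Hedged sketch of an end-to-end RAW-to-perception network.
import torch
import torch.nn as nn

class RawToDetections(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        # Learned "front end" replaces hand-tuned ISP stages.
        self.frontend = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Task head produces per-cell class scores (toy detector head).
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, raw: torch.Tensor) -> torch.Tensor:
        # raw: (B, 1, H, W) sensor data, e.g. a normalized Bayer mosaic
        return self.head(self.frontend(raw))

model = RawToDetections()
raw_frame = torch.rand(1, 1, 256, 256)     # stand-in for a RAW capture
scores = model(raw_frame)                  # gradients from the task loss can
                                           # flow back through the whole pipeline
```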

Eos has been engineered to achieve exceptionally accurate and scalable computer vision performance under these challenging driving scenarios while also improving perception under normal driving conditions.

Thanks to its end-to-end architecture, it delivers up to 3x improved accuracy, especially in low light and harsh weather, as benchmarked by leading OEM and Tier 1 customers against state-of-the-art public and commercial alternatives.

So how would Eos fare against Tesla’s Highway Autopilot?

As you can see, Eos (right) successfully detects not only close objects but also oncoming vehicles. Interestingly, while Tesla is using camera/radar fusion and tracking, we are running just a frame-by-frame, camera-only detection model. Despite their advantage, we significantly outperform them.

But what’s so special about our approach?

Robust & scalable by design

As we’ve seen, today’s camera-based vision systems suffer from inherent robustness limitations. Our Eos end-to-end learned architecture addresses this by combining image formation with the vision tasks themselves.

Using a novel AI training framework and methodology, we apply unsupervised and self-supervised learning to automatically adapt Eos to any customer lens and sensor combination in just days. This effectively removes the need to capture and annotate new training datasets, which would normally take many months and hundreds of thousands of dollars per configuration.
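
For a sense of how label-free adaptation can work in principle, here is a hedged sketch assuming a simple augmentation-consistency objective on unlabeled RAW captures from a new sensor. This is a generic self-supervised technique shown for illustration only, not the actual Eos training pipeline, and it reuses the toy `model` from the sketch above.

```python
# Illustrative self-supervised consistency objective for sensor adaptation.
import torch
import torch.nn.functional as F

def consistency_loss(model, raw_batch: torch.Tensor) -> torch.Tensor:
    # Two exposure-like "views" of the same unlabeled RAW frames,
    # produced by random per-frame gain factors in [0.8, 1.2].
    gain_a = torch.rand(raw_batch.size(0), 1, 1, 1) * 0.4 + 0.8
    gain_b = torch.rand(raw_batch.size(0), 1, 1, 1) * 0.4 + 0.8
    log_p_a = F.log_softmax(model(raw_batch * gain_a), dim=1)
    p_b = F.softmax(model(raw_batch * gain_b), dim=1)
    # Predictions should agree regardless of the gain perturbation,
    # so no labels for the new sensor are required.
    return F.kl_div(log_p_a, p_b, reduction="batchmean")
```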

This joint design and training of the optics, image processing, and vision tasks is what enables Eos to deliver its improved performance. Furthermore, the Eos implementation has been optimized for efficient real-time performance across common target processors, giving customers their choice of compute platform.
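
Continuing the toy example from above (again, an illustrative sketch rather than the Eos implementation), joint training simply means a single task loss backpropagates through both the learned image-formation front end and the vision head:

```python
# One joint training step over the toy RawToDetections model defined earlier.
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(raw_batch: torch.Tensor, target_maps: torch.Tensor) -> float:
    """raw_batch: (B, 1, H, W) RAW data; target_maps: (B, H/4, W/4) class labels."""
    optimizer.zero_grad()
    scores = model(raw_batch)                     # RAW in, task scores out
    loss = F.cross_entropy(scores, target_maps)   # a single task loss...
    loss.backward()                               # ...updates every stage,
    optimizer.step()                              # front end included
    return loss.item()
```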

Eos is a comprehensive solution that delivers a full set of highly robust perception components, addressing individual NCAP requirements, L2+ ADAS, and higher levels of autonomy – from highway autopilot and autonomous valet (self-parking) to L4 autonomous vehicles – as well as Smart City applications such as video security and fleet management.

Our stack’s key vision features include:

  • Object and vulnerable road user (VRU) detection and tracking
  • Free space and lane detection
  • Traffic light state and sign recognition
  • Obstructed sensor detection
  • Reflection removal
  • Multi-sensor fusion
  • And more.

You can see more video examples on our product page.

We’ve benchmarked against many public and commercial alternatives. Here we compare Eos’ frame-by-frame, camera-only detection against the latest-generation radar/camera fusion and tracking from Tesla.

Delivering a comprehensive portfolio of perception capabilities with next-generation robustness and scalability, Eos provides vision system teams with the quickest path to effective driver-assist and autonomy features.