Tesla’s Director of AI, Andrej Karpathy, took the stage at TRAIN AI 2018 to unpack the company’s approach to building its Autopilot computer vision solution. His talk was titled “Building the Software 2.0 Stack.”
Andrej set out to distinguish traditional rule-based programming from the kind of programming that applies when a neural network, one approach within machine learning and artificial intelligence, runs the show. In typical internet lingo, he dubbed neural net programming software 2.0, with rule-based programming taking up the software 1.0 moniker.
It turns out that the differences are considerable: programming a neural net is very different from programming a webpage or smartphone app. This has become increasingly evident in recent years in computer vision, where programmers have struggled to write rules for every possible object that could appear in an image. The difficulty did not stop programmers from trying, and even from executing extremely complex computer vision analysis.
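To make the software 1.0 versus software 2.0 contrast concrete, here is a deliberately tiny sketch. It is not a neural network and not anything shown in the talk: the function names, the brightness values, and the labels are all made up for illustration. The first function hard-codes a rule, while the second derives its rule from labeled examples, which is the basic shift Andrej describes.

```python
def is_bright_rule(value):
    """Software 1.0: a programmer hard-codes the decision rule."""
    return value > 128  # hand-picked threshold (illustrative assumption)


def fit_threshold(values, labels):
    """Software 2.0 in miniature: choose the threshold that makes the
    fewest mistakes on labeled examples, instead of hand-picking it."""
    best_threshold, best_errors = None, len(values) + 1
    for candidate in sorted(set(values)):
        # Predict True when the value is at or above the candidate threshold.
        errors = sum((v >= candidate) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical labeled data: a brightness value and whether a human
# labeler marked the sample as "bright".
values = [10, 40, 90, 100, 150, 200, 240]
labels = [False, False, True, True, True, True, True]

learned = fit_threshold(values, labels)
rule_errors = sum(is_bright_rule(v) != y for v, y in zip(values, labels))
learned_errors = sum((v >= learned) != y for v, y in zip(values, labels))
print(learned, rule_errors, learned_errors)
```

In this toy case the hand-written rule disagrees with the labelers on two samples, while the fitted threshold matches all of them. Changing the behavior means changing the labeled data rather than editing the rule.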
Early learnings from 1990–2010 in the analysis of photos laid the foundation for the modern focus on video analysis, which, with its higher frame rates, puts significantly more strain on computing resources. Applications like Tesla’s Autopilot require all processing to happen in real time, including using real-time data to predict what nearby drivers will or might do, in order to mitigate the impact of those actions. [Editor’s note: This heavy use of computer processing for high-quality autonomous driving is a key reason George Hotz, aka geohot, has stated that Tesla is head and shoulders above conventional automakers. See: “Geohot: Tesla Autopilot = Apple iOS, Comma.ai = Android (CleanTechnica Exclusive)” for more on that.]
Tesla’s Autopilot solution relies heavily on computer vision, rather than lidar and other sensors, as Tesla’s team believes that a vision-based approach is fundamentally superior and that a robust array of cameras is more than sufficient to support a full self-driving solution.
Andrej kicks things into high gear in minute 15, when he digs into the approach Tesla’s team used to crack the computer vision nut for Autopilot. Tesla’s Autopilot programming team is broken into two major groups. The first group builds the neural net itself, while the second focuses on the actual programming of the neural net, which consists of selecting the set of labeled images that the neural net will learn from.
Just as programming code has to be efficient and effective, Andrej notes that the set of images used to program the neural net must be large, varied, and clean. Programming a neural net is much more about identifying the abnormalities and programming the software 2.0 stack for the proper behavior in those cases than it is about programming the system for normal situations.
An easy way to think about programming a neural net with images is to consider the traffic signals at intersections. Most have the standard red-yellow-green stack. A red light can be modeled by providing images of red lights and labeling them as the signal indicating that the vehicle should stop. Conversely, a green light indicates that the vehicle can continue through the intersection. Yellow is an equally important indicator but appears much less frequently than its red and green bedfellows. The neural net must be programmed to understand all three equally well, even though yellow lights are much rarer in the real world.
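One common way to handle a rare class like yellow lights is to oversample it when assembling training batches. The sketch below is only an illustration of that general idea and is not Tesla’s method: the file names, the class counts, and the helper names `inverse_frequency_weights` and `sample_balanced_batch` are all assumptions.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each example a sampling weight so that every class is
    drawn equally often in expectation, however rare it is."""
    counts = Counter(labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[label]) for label in labels]


def sample_balanced_batch(examples, labels, batch_size, rng):
    """Draw a batch (with replacement) using the class-balancing weights."""
    weights = inverse_frequency_weights(labels)
    indices = rng.choices(range(len(examples)), weights=weights, k=batch_size)
    return [(examples[i], labels[i]) for i in indices]


# Hypothetical dataset in which yellow lights are rare (5% of images).
labels = ["red"] * 450 + ["green"] * 500 + ["yellow"] * 50
examples = [f"img_{i:04d}.jpg" for i in range(len(labels))]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch = sample_balanced_batch(examples, labels, batch_size=3000, rng=rng)
batch_counts = Counter(label for _, label in batch)
print(batch_counts)
```

With these weights, each of the three classes makes up roughly a third of the sampled batch even though yellow is only 5% of the underlying images. Oversampling repeats the same few yellow images, though, so it complements rather than replaces collecting more varied examples of the rare case.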
Fundamentally, Tesla believes its Autopilot solution will deliver a much safer driving experience while on the road with cars operated by humans. That’s meaningful and important today but only hints at the broader possibility of a vehicle that can drive itself in any situation on the road, anywhere in the world. Tesla’s self-driving cars deliver a 4× reduction in fatalities today, and CEO Elon Musk believes he can deliver at least a 10× improvement vs human drivers in the future.
Andrej noted that Tesla has the largest deployment of robots in the world, with 250,000 on the road today with varying degrees of autonomous driving capability, depending on the hardware each has onboard. Tesla has not yet achieved “Full Self-Driving,” but it is so confident that it will get there that it is already selling Full Self-Driving as an option on new Model S, X, and 3 orders.
Andrej’s full 30-minute talk is worth watching for the data geeks out there who want to stay up to date on the evolution of self-driving vehicles and computer vision … or you can skip to minute 15 for the update on Tesla’s approach to computer vision for its Autopilot solution.
Check out the video of his talk on Vimeo, as it cannot be embedded due to its security settings.