In this article, I’m going to try to take a very technical and complicated subject — namely, the technical details of Tesla’s and others’ autonomous driving systems — and make it understandable for non-engineers. I may or may not succeed in that goal. Let me know in the comments how I did. If it gets too technical and you end up learning something accidentally, I’m sorry, not sorry.
I’m going to go over some history of the choices that Tesla has made, look at some things that hackers have gleaned from looking at Tesla’s V8 and V9 networks, and explain why Tesla is likely to become the first company to get autonomous driving to work in the general case (anywhere a human can drive).
I’ve read many good articles on this site that explain various aspects of the components needed to build a full self-driving car. Michael Barnard’s article on the technical differences between LIDAR, radar, ultrasonic sensors, and passive video (cameras) gives you a great base of knowledge for understanding the choices designers face.
Basically, Tesla uses 3 mature and cheap technologies, each doing what it does best: radar, to see a good distance forward in all conditions (light, dark, and all weather); ultrasonic sensors, to measure proximity in all conditions; and passive video, to see long distances in all directions and recognize objects in any conditions that are safe for humans to drive in.
Additionally, Steve Hanley’s article last year on new research suggesting that LiDAR isn’t necessary for fully autonomous driving gives you a peek into Elon Musk’s thinking on the subject in a 6-minute video.
After reading these two articles, I knew Tesla was going a different way than the rest of the industry, but I still wondered why Tesla had made that decision. When I listened to Rob Maurer’s interview of Jimmy_D this week, it helped me put the pieces together as to why Tesla has decided to go down the path it has chosen.
Jimmy first explains that a neural network is a way of making a computer program. As a software engineer for 34 years, I am well aware that artificial intelligence in general and neural networks in particular have been over-promising and under-delivering for years. A quote from J. S. Denker summed up their reputation in the industry: “neural nets are the second best way to do almost anything.” This changed recently, but before I tell you about that, I need to explain how neural networks are different from regular programming and give you an example of a problem traditional programming has failed to solve.
In traditional programming, the developer writes a series of instructions to tell the computer what to do. With machine learning and specifically using neural networks to do machine learning, you don’t tell the computer how to do something, you just give it a lot of data and design a system that can learn the instructions.
The example Jimmy gave in the interview is looking at a picture and determining whether it shows a dog or a cat. The picture could be any resolution. It could be black and white or color. It could be from near or far and from any angle. The animal could be any age, breed, or color. The animal might be running, sitting, sleeping, or jumping. You can see that writing a traditional program out of “if, then, else” statements would either be impossible or take a large team of programmers years to develop. Yet a young child can do this easily. How? Because children learn it by example, not by memorizing rules the way we must for spelling in English:
i before e,
Except after c,
Or when sounded as “a,”
As in neighbor and weigh.
This example provides the contrast between how we learn to speak a language as a child and how we learn spelling. We learn speaking (and how to recognize dogs and cats) using our “neural network,” while we learn spelling like a traditional computer program: we have to remember all the rules and the exceptions to those rules.
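To make the contrast concrete, here is a minimal toy sketch of both approaches. Everything in it is made up for illustration: the “features” (ear pointiness, snout length) and the example data are invented, and a real image classifier is vastly more complex. The point is only that in one approach the programmer writes the rule, and in the other the program learns the rule from labeled examples.

```python
# Hypothetical toy example: two ways to decide "cat" vs. "dog".
# The features and numbers are invented for illustration only.

# Traditional programming: the developer writes the rule by hand.
def classify_by_rules(ear_pointiness, snout_length):
    # The programmer must anticipate every case in advance.
    if ear_pointiness > 0.5 and snout_length < 0.5:
        return "cat"
    return "dog"

# Machine learning: give the program labeled examples and let it
# learn the rule itself. Here, a single perceptron (the simplest
# possible "neural network") trained with the classic update rule.
examples = [
    ((0.9, 0.2), 1),  # cat -> label 1
    ((0.8, 0.3), 1),  # cat
    ((0.2, 0.9), 0),  # dog -> label 0
    ((0.3, 0.8), 0),  # dog
]

weights = [0.0, 0.0]
bias = 0.0
for _ in range(20):                       # a few passes over the data
    for (x1, x2), label in examples:
        prediction = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
        error = label - prediction        # 0 when the guess was right
        weights[0] += error * x1          # nudge the weights toward
        weights[1] += error * x2          # the correct answer
        bias += error

def classify_by_learning(ear_pointiness, snout_length):
    score = weights[0] * ear_pointiness + weights[1] * snout_length + bias
    return "cat" if score > 0 else "dog"
```

Note that the second function contains no hand-written animal knowledge at all; its behavior comes entirely from the examples it was trained on. That is the essential difference Jimmy describes.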
So, now that you understand the basic difference in approaches, it’s important to know that there was a breakthrough in 2012 that changed everything. There is an annual competition called the “Large Scale Visual Recognition Challenge,” and it was won by large, established teams of developers every year — until it wasn’t. In 2012, three grad students used a neural-network-based model they had developed over about 6 months and shocked the industry by winning!
Neural networks (think of the dog and cat example above) have always been superior to traditional programming at image recognition, but if you need 100 years of processing power to train the model, that superiority doesn’t do you any good. The innovation these students hit on was to train their model on the graphics processing units (GPUs) of high-end gaming cards instead of on general-purpose CPUs. General-purpose CPUs are very good at running a wide range of workloads and remain backward compatible with 30 years of old software, but all that flexibility makes them a “jack of all trades and master of none.” GPUs are very good at math and a few other things.
This breakthrough changed how leaders in the industry solved a lot of very difficult problems, including how Google translates several difficult language pairs (Japanese to English, for example) and how many companies do speech recognition. As I reported in my first article for CleanTechnica 3 months ago, Tesla has developed a new chip that is 10 times as fast at matrix multiplication (the main operation neural networks perform).
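Why does matrix multiplication matter so much? Because one layer of a neural network is, at its core, just a matrix multiply, an add, and a simple nonlinear function. The sketch below (plain Python, no frameworks, with invented weights purely for illustration) shows the shape of that computation. GPUs, and chips like Tesla’s, are built to do the `matmul_vec` step enormously faster than a CPU can.

```python
# A minimal sketch of one neural-network layer:
#   output = activation(W @ x + b)
# The weights and inputs below are made-up illustrative numbers.

def matmul_vec(W, x):
    """Multiply matrix W by vector x -- the operation that GPUs
    (and Tesla's new chip) are designed to accelerate."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def relu(values):
    """A common activation: keep positives, clamp negatives to zero."""
    return [max(0.0, v) for v in values]

def layer(W, b, x):
    """One layer: matrix multiply, add the bias, apply the nonlinearity."""
    return relu([wx + b_i for wx, b_i in zip(matmul_vec(W, x), b)])

# A tiny layer mapping 2 inputs to 3 hidden units.
W = [[0.5, -0.2],
     [0.1,  0.4],
     [-0.3, 0.8]]
b = [0.0, 0.1, -0.1]

hidden = layer(W, b, [1.0, 2.0])  # a 3-value activation vector
```

A real network stacks many such layers, with millions of weights instead of six, which is why the speed of the matrix multiply dominates everything else.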
So, why did Tesla go down this path while everyone else went down the LIDAR path? I think that can be explained by 3 factors.
1. Tesla was a small startup with little money, which encouraged the team to use sensors that are cheap (radar, ultrasonic, and cameras are all cheap). Others in the industry had or have deep pockets, so they can afford to work with expensive LIDAR systems.
2. Tesla started (or restarted) its effort after 2012 (the breakthrough) and realized the breakthrough “changed everything.”
3. Most importantly, Elon Musk realized that it is faster and cheaper to design a computer to drive like a human (using cameras for eyes) than to try to rebuild 100 years of roads around the world. You read lots of articles that say we won’t have self-driving cars until we have new roads, new 5G networks, or cheap LIDAR, but Tesla has determined that if we can just solve the image recognition problem (that the car/computer can tell a dog from a cat from a bicycle from a truck), the cars can drive like humans without waiting for new roads, networks, or LIDAR sensors.
Now that we understand a little bit about neural networks and that Tesla is developing the best hardware in the world to run them, let’s talk about what Jimmy D has discovered in Version 9 of Tesla’s software.
Version 9 uses one neural network with one set of weights to process all 8 cameras. This may be tougher to train, because each camera has a different view of the world, but Jimmy D hopes that this more general abstraction means that the neural net will have a deeper understanding of the look of objects from all views when trained this way.
Another enhancement he noticed is that the net processes 3 color channels and 2 frames at a time. He speculates that the 2 frames are from different times and can be used to detect motion, which helps the net tell what is in the background and what is in the foreground of a picture. Knowing the relative speed of an object also helps determine whether it is a car, bicycle, or pedestrian.
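To see why two frames reveal motion, consider the simplest possible version of the idea: subtract one frame from the other, pixel by pixel. Static background cancels out and only moving objects remain. To be clear, this frame-differencing sketch is my illustration of the underlying intuition, not Tesla’s actual method — in a neural network, whatever use is made of the two frames is learned, not hand-coded.

```python
# Illustrative sketch only: simple frame differencing, not Tesla's method.

def frame_diff(frame_a, frame_b):
    """Per-pixel absolute difference between two grayscale frames."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

# Two tiny 4x4 "frames": a bright object (value 9) moves one pixel right
# between time t0 and time t1, while the background stays at 0.
frame_t0 = [[0, 9, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
frame_t1 = [[0, 0, 9, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]

diff = frame_diff(frame_t0, frame_t1)
# Nonzero values appear only where something moved; everything
# that stayed still cancels to zero.
```

A moving bicycle “lights up” in a comparison like this while a parked one does not, which is exactly the kind of foreground/background and speed cue Jimmy speculates the two-frame input provides.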
The massive expansion of data processed by this release likely approaches the limit of the V2 and V2.5 hardware installed in all cars built since October 2016. To achieve further substantial gains in performance, we will have to wait for the V3 hardware expected next spring.
So, why did everyone else go down the LiDAR path? Because, in 2007, LiDAR-equipped vehicles made history in the DARPA Urban Challenge by successfully navigating an urban environment while obeying the traffic laws.
That was over a decade ago, so why don’t we have self-driving cars now? Because, as I learned from listening to Jimmy D on the podcast, navigating an urban environment and obeying the traffic laws is child’s play. What is difficult is driving like a human so you don’t upset all the other drivers on the road. If your self-driving car never runs into anything, but it causes many accidents (or even simple driver frustration) because it drives in a strange way that confuses those around it, society (government) will shut you down pretty quickly.

And if you have ever worked in a large company or government agency, you will know that once an organization has gone down one path, it is very difficult (due to the oversized egos of its executives as well as institutional inertia) to admit it’s the wrong path and commit resources to another — until competition embarrasses the organization. I think Google’s system has some chance of success, given the company’s massive financial muscle, but what really hurts the approach is its dependence on 3 things:
1. The cars can only drive in zones that have been extensively mapped and processed by Google.
2. They require LIDAR and can’t deploy millions of vehicles until LIDAR costs come down.
3. It requires a huge amount of handwritten software to get the cars to drive in ways that don’t confuse other drivers and pedestrians.
Since Google has the money to address the first two points and might have the software talent to pull off the third, I won’t count it out of the race. But once Google gets its system working in one city, it won’t be able to scale to other cities easily, so it might win the race locally yet still lose as it attempts to scale the solution to other markets.
Another issue is adapting to construction zones and detours. Any solution that depends on detailed mapping (because it can’t read signs) seems likely to fail as conditions change without warning. Using cameras like eyes and following construction and detour signs like a human should be more flexible and adaptable. Google’s team doesn’t yet seem to realize that it is spending all its time and money on the easy problem of not running into things and has apparently not even started on the problem of interpreting the signs that tell humans how to navigate the world in real time. The history of computing is the story of using ever-increasing general-purpose computational power to replace specialized devices. Think of the 50 things the smartphone you are reading this article on has replaced. [Sort of irrelevant but sort of fun editor’s note: Depending on the article, approximately 1/2 to 4/5 of you read our work on a smartphone. I am editing this piece on a laptop, which increasingly is something I have to consider while editing!]
The way I see this battle for the self-driving future, the evidence suggests that if Tesla can get its image recognition working before a competitor builds a commanding lead in the market, Tesla should be able to scale its solution faster than the LIDAR approach, for 2 reasons.
1. Tesla already has more vehicles on the road today with the necessary sensors.
2. Tesla’s approach doesn’t require detailed, pre-mapped knowledge of the roads, since it is designed to read and react to signs rather than rely on having all possible routes pre-calculated.
By 2019, we should see these two teams start to deploy more advanced implementations of their designs. It will be exciting to watch them compete. Each will need to track what is working both for its own team and for the other, and sometimes react quickly. With the self-driving market expected to grow by 36% a year over the next 5 years, this is an important market to watch!
Highly related article from 2015: Tesla Has The Right Approach To Self-Driving Cars
You can use my Tesla referral link to get 6 months free Supercharging on Model S, Model X, or Model 3. You can also get a 5-year extended warranty on solar panels. Here’s the code: https://ts.la/paul92237