For people working in software development, like I used to do, the 80–20 rule is infamous. It states that about 80% of the job is done in the first 20% of the time. The last 20% of the development will take about 80% of the time.
Knowing it and explaining it to your manager are two very different things, and it often results in software that is only 85% finished being released to market.
Training Summon & Navigate on Autopilot
Tesla “Full Self Driving” (FSD) is released piecemeal, one functionality (or part of a functionality) at a time. Summon is a good example of this. Every new release is a bit more capable. For something as complex as FSD, this is a good approach. This way, every new functionality will receive the attention and testing it needs. Summon is also a good example of the 80–20 rule. People start to ask why it takes so long to finish it, since it looks nearly ready. Tales from the early adopters (beta testers) are that the corner cases are biting them in the heels.
Those corner cases look small, but are large. The framework for Summon looks ready. It performs well on an empty street or in a parking lot. Only all those small exceptions have to be solved and learned. It is not only that exceptions are often more complex than standard, normal situations; you also have to recognize and understand when they apply. The pesky thing is that they happen less often, a whole lot less often. For training a neural net, you do need examples, and the more, the better. For complex situations, you need even more examples. Getting the necessary examples for all corner cases is the challenge Tesla is now facing with Summon. Hopefully the neural net will learn how to handle curbs, or pedestrians walking in the middle of the road, not only as part of the sub-function “Summon,” but as a generic capability.
Maneuvering a parking lot is considered one of the more complex tasks of driving. That is why we do it so slowly and are constantly watching all around us. In normal city driving, we use mostly the same capabilities, only at a higher speed in less complex surroundings. Luckily, a computer can watch 360 degrees around us at the same time, and it has no problem with speed or distractions. At least, as long as we stay within the processor’s parameters.
The other main function that is released in little increments is Navigate on Autopilot (NoA), driving in what is probably the simplest situation. All cars are going comparable speeds in the same directions, there’s no oncoming traffic, no crossing traffic, no parked cars.
Oops, there might be an accident, road maintenance, a flat tire on the shoulder, etc.
It is simple until it is not simple. Again, there are the exceptions, corner cases, interference, and situations that are not supposed or even allowed to happen, yet they still have to be dealt with. Again, there is a need for more examples to train the neural net, but there are few examples because, well, they are exceptions.
Divide & Conquer
The way to solve a large and complex problem such as autonomous driving is to nibble at the side of the problem and solve small pieces. Rinse and repeat. Do this until there is nothing left of the large, complex, and unyielding problem.
We can see this approach with NoA and Summon, but the improvements extend beyond those scenarios. When the solutions for corner cases in these well-defined areas are implemented as generic solutions, they will also be usable in the myriad situations encountered between parking lots and highways.
Seeing (recognizing) the surroundings is also attacked as many smaller tasks: learning road signs, understanding traffic lights, following traffic instructions from a cop or road worker, and telling the difference between static and potentially moving objects (cars can be in both categories). It all has to be learned.
Making a Machine Intelligent
The first attempts at artificial intelligence were programs with a lot of IF … THEN … type of statements. This was followed by programs that were “rule based,” meaning that condition-action pairs were in files that were scanned looking for the right set of actions to take.
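To make the "rule based" idea concrete, here is a minimal sketch of condition-action pairs kept as data and scanned for the first match. The rule contents and the names `RULES` and `choose_action` are hypothetical, chosen only for illustration; they do not come from any real driving system.

```python
# Hypothetical condition-action rules; the names are illustrative only.
RULES = [
    ({"obstacle_ahead": True}, "brake"),
    ({"obstacle_ahead": False, "light": "red"}, "stop"),
    ({"obstacle_ahead": False, "light": "green"}, "drive"),
]


def choose_action(situation, rules=RULES, default="slow_down"):
    """Scan the rule table and return the action of the first matching rule."""
    for condition, action in rules:
        # A rule matches when every key in its condition agrees with the situation.
        if all(situation.get(key) == value for key, value in condition.items()):
            return action
    # No rule matched: fall back to a cautious default.
    return default


print(choose_action({"obstacle_ahead": False, "light": "green"}))   # drive
print(choose_action({"obstacle_ahead": True, "light": "green"}))    # brake
print(choose_action({"obstacle_ahead": False, "light": "yellow"}))  # slow_down
```

The last call shows the weakness of this approach: any situation nobody wrote a rule for falls through to the default.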
The discovery of neural networks, so named because they were inspired by the structure of the brain, held great promise. Small demonstration programs were brilliant, but once they grew a bit bigger they became impossible to program, until someone had an aha moment and found a way to train them while making things more complex at the same time. This is called “deep learning.”
Now all present-day artificial intelligence uses neural networks (NN) capable of deep learning. What an NN is and what deep learning (called training) consists of is a very large topic for another day. Suffice it to say that the result is a program that produces the right result or action when confronted with a lot of input, without us humans actually knowing how the program came to that conclusion.
The training is done by feeding the NN a large set of condition-action pairs, like the rule-based data, but with an extra indicator. The rule is “right” or the rule is “wrong.”
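As a rough illustration of learning from labeled examples, the sketch below trains a single perceptron, which is far simpler than a deep network and is not a description of how Tesla trains anything. The data is made up: each condition is a pair (obstacle_close, light_is_red), the proposed action is always "keep driving," and the label is the extra indicator, 1 for "right" and 0 for "wrong." The names `EXAMPLES`, `predict`, and `train` are assumptions for this sketch.

```python
# Hypothetical training data: condition -> is "keep driving" right (1) or wrong (0)?
# Condition = (obstacle_close, light_is_red), both 0 or 1.
EXAMPLES = [
    ((0, 0), 1),  # clear road, light not red: keep driving is right
    ((0, 1), 0),  # red light: wrong
    ((1, 0), 0),  # obstacle close: wrong
    ((1, 1), 0),  # both: wrong
]


def predict(weights, bias, condition):
    """Return 1 ("right") if the weighted sum is positive, else 0 ("wrong")."""
    total = bias + sum(w * x for w, x in zip(weights, condition))
    return 1 if total > 0 else 0


def train(examples, epochs=10):
    """Classic perceptron rule: nudge the weights whenever a prediction is off."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for condition, label in examples:
            error = label - predict(weights, bias, condition)
            weights = [w + error * x for w, x in zip(weights, condition)]
            bias += error
    return weights, bias


weights, bias = train(EXAMPLES)
for condition, label in EXAMPLES:
    print(condition, "predicted:", predict(weights, bias, condition), "label:", label)
```

Nobody writes the rule "keep driving only when nothing is in the way" here; the weights end up encoding it because the examples were marked right or wrong.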
NN training is a terrain where I am out of my depth. It is pure magic to me. It is very different from normal algorithmic software development where the programmer tells the program what to do. I just don’t know enough about this topic. How will an NN use the bits of knowledge from parking lot maneuvering and highway driving when it is driving on the roads between those places? Can it generalize enough or use solutions of likely situations? If the NN can fill in the blanks when encountering something new, like most humans can, the rules change. The more it knows, the easier it can fill in the blanks about what it does not know.
Are We At 80% Yet?
Looking at what the FSD software of Tesla can now do, Navigate on Autopilot and Summon, and what it still has to learn, it looks like Tesla has not quite reached 80% of necessary capabilities. Maybe I am wrong, but that is how it looks.
What I have not seen much of is recognizing what objects are non-static and predicting the behavior of those objects, which is probably the most difficult to learn. This involves human behavior that needs to be abstracted to possible actions in the traffic environment. What behavioral actions are predictive of which traffic actions, and how do they have to be anticipated and resolved?
This is clearly in the 20%. The car can drive every road, as long as it does not encounter an unpredictable human, but there are about 8 billion of those bipeds potentially making the roads unsafe for self-driving cars.
Nevertheless, I don’t think the >20% we still have to go will take more than 80% of the total time of this project. It will not take 4 times the time we have already spent.
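The "4 times" figure is just the 80–20 rule taken literally: if the time spent so far is 20% of the total, the remaining 80% is four times as long. A small sketch of that arithmetic, with a hypothetical function name:

```python
from fractions import Fraction


def remaining_to_spent_ratio(time_share_spent):
    """How many times the time already spent is still to come."""
    return (1 - time_share_spent) / time_share_spent


# Strict 80-20 rule: 20% of the total time has been spent so far.
print(remaining_to_spent_ratio(Fraction(20, 100)))  # 4
```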
I expect the same type of acceleration of development as we have seen from the Human Genome Project. The software did get better, the computers got faster, the production of knowledge grew exponentially.