Elon Has Been Dropping Hints About Real-World AI — Here’s What That Could Mean
For a few years now, I’ve had a question written down that I told myself we should ask Elon if we ever get to interview him (we’re available any time, btw 😉): Where is the robot that builds the robot?
This article will explain that question and our recently renewed interest in this topic. As a quick introduction, the headline question is based on the combination of a few very specific puzzle pieces. The first is the methodology Tesla uses to solve the problem of self-driving, the second is Elon’s ambition for the machine that builds the machine, and the third is his desire to get humanity to Mars. Our renewed interest comes from the subtle hints about real-world AI that Elon has dropped in a couple of recent tweets.
First, let’s start with Autopilot. As Elon has described it multiple times, a human is just two cameras (the eyes) placed on a gimbal (the neck) with an on-board computer (the brain), and we are capable of driving a vehicle with so few accidents that insurance can cover the risk, which makes the complex world of transportation we have today possible. A Tesla vehicle has cameras and multiple other sensors on all sides and should theoretically be able to do the same job far better than two cameras on a gimbal. The biggest problem, however, is that AI just hasn’t been able to master an understanding of the real world surrounding it. This is one of the hardest problems in AI, and a solution could change the world in many more ways than just fully autonomous driving. That is what Elon has recently been hinting at.
Tesla’s approach is the key
Rather than training an AI to think like a person or a driver, competitors like Waymo train an AI by first driving it down a road many, many times, teaching the car how to drive down that specific road. After that, it is able to do so on its own. It will generally not work on streets it has not been trained on and is, by comparison, less able to deal with unexpected situations. In some ways, the technologies and unique AI training methodologies Tesla is developing are what matter more than anything. Most journalists will be compelled to point out the amount of data a neural net was trained on, but in reality, when you have as many vehicles on the road as Tesla has, you don’t save all that driving footage — you just can’t.
As Andrej Karpathy has pointed out in numerous presentations, the Autopilot team chooses an area to improve upon, like STOP signs, and then tasks all vehicles with sending in clips of what they think might be stop signs. That data is collected for a while, trained upon, and in the end deleted. The team then moves on to the next task. If they see some very good and rare corner cases, they might save those, but that is likely a rather small percentage of what they receive. Using this strategy, step by step, they teach the AI new things. Sort of like an assembly line, they keep addressing the weakest link.
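The loop described above can be sketched roughly as follows. To be clear, this is a toy illustration of the *shape* of that process, not Tesla’s actual pipeline: every category name, accuracy number, and helper function here is a hypothetical placeholder.

```python
import random

def weakest_category(accuracy):
    """Pick the category the network currently handles worst (the weakest link)."""
    return min(accuracy, key=accuracy.get)

def collect_clips(category, n=1000):
    """Stand-in for the fleet sending back clips it thinks match the category."""
    return [f"{category}_clip_{i}" for i in range(n)]

def train_on(accuracy, category, clips):
    """Stand-in for retraining: nudge the weakest category's accuracy upward."""
    accuracy[category] += min(0.03, 1.0 - accuracy[category])
    return accuracy

def data_engine_step(accuracy, corner_case_archive):
    """One pass: target weakest area, collect, train, keep rare cases, delete the rest."""
    category = weakest_category(accuracy)
    clips = collect_clips(category)
    accuracy = train_on(accuracy, category, clips)
    # Archive only a small fraction as rare corner cases; everything else is deleted.
    corner_case_archive.extend(random.sample(clips, k=len(clips) // 100))
    clips.clear()
    return accuracy

# Hypothetical starting accuracies for three detection categories.
accuracy = {"stop_signs": 0.91, "lane_lines": 0.97, "traffic_cones": 0.94}
archive = []
for _ in range(5):
    accuracy = data_engine_step(accuracy, archive)
```

After a few iterations, whichever category was weakest has been pulled up and only a tiny archive of clips remains, which is the point: the fleet’s footage is a consumable input to the loop, not a hoard of stored data.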
The most important thing is to reduce the amount of time needed to train on these specific aspects and situations the vehicles will encounter. When Dojo eventually comes online, that will be kicked up another notch, but a supercomputer alone won’t make as much of an impact without an automated improvement process.
To paraphrase the words of Andrej Karpathy and Elon Musk, they are trying to automate as much of the process as possible and that is precisely why the core Autopilot team only has 10–20 people. Coincidentally, when CleanTechnica was on tour at the Tesla factory in Fremont, from about 10 meters away, we saw Elon meeting with that team of approximately 15 people.
The more the process is automated, the faster the learning loop runs, and the faster the AI improves. There are two important constraints on this problem: the number of cars on the road that can collect data and, once the fleet is large enough, processing power. Below are 3 examples of what this does or could look like. Do you collect enough footage of STOP signs that:
Non-Dojo:
- The footage can be processed by the computer within a reasonable amount of time, improving the reliability a few percentage points to where it’s no longer the weakest link, so the team can move on to the next?
Dojo: