This actually isn’t breaking news — I’ve been meaning to write about it for weeks, ever since I discovered it on NVIDIA’s blog. It also echoes what Danny Shapiro, NVIDIA’s Vice President of Automotive, told me in a podcast interview in February. The short story is that NVIDIA GPUs power the supercomputers that Tesla is using to train its neural networks for better and better autonomous driving.
In the blog post, Shapiro noted that Andrej Karpathy, Tesla’s senior director of AI, gave a presentation on June 20 about Tesla’s autonomous driving work. In that presentation, Shapiro wrote, Karpathy “unveiled the in-house supercomputer the automaker is using to train deep neural networks for Autopilot and self-driving capabilities.” Karpathy said that it is possibly the 5th most powerful supercomputer in the world.
NVIDIA was happy to use the opportunity to point out that it powers that #5 supercomputer (and I would not be surprised if its GPUs are inside some of the other four as well). “The cluster uses 720 nodes of 8x NVIDIA A100 Tensor Core GPUs (5,760 GPUs total) to achieve an industry-leading 1.8 exaflops of performance.”
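For anyone who wants to sanity-check those numbers, here is a quick back-of-the-envelope sketch in Python. The node count, GPUs per node, and 1.8 exaflops come from the NVIDIA quote; the function name and the per-GPU breakdown are my own illustration, not something NVIDIA or Tesla published.

```python
# Figures quoted by NVIDIA for Tesla's training cluster.
NODES = 720
GPUS_PER_NODE = 8
TOTAL_EXAFLOPS = 1.8

TERAFLOPS_PER_EXAFLOP = 1_000_000  # 1 exaflop = 10^18 flops = 10^6 teraflops


def cluster_summary(nodes, gpus_per_node, total_exaflops):
    """Return (total GPU count, implied teraflops per GPU).

    Hypothetical helper for illustration only: it simply divides the
    quoted cluster total evenly across all GPUs.
    """
    total_gpus = nodes * gpus_per_node
    per_gpu_teraflops = total_exaflops * TERAFLOPS_PER_EXAFLOP / total_gpus
    return total_gpus, per_gpu_teraflops


total_gpus, per_gpu = cluster_summary(NODES, GPUS_PER_NODE, TOTAL_EXAFLOPS)
print(f"{total_gpus:,} GPUs, about {per_gpu:.1f} teraflops each")
```

The GPU count works out to the 5,760 in the quote, and the implied figure of roughly 312 teraflops per GPU lines up with the A100's published peak for reduced-precision Tensor Core math. In other words, the 1.8 exaflops looks like a theoretical peak for that kind of math rather than a measured benchmark result.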
If you follow the Tesla FSD Beta testers closely, you can see that an enormous amount of progress has been made on Tesla’s FSD firmware, with the progress increasingly coming from Tesla AI teaching itself. Presumably that’s the case, at least, and we’ll hear more on this later today during Tesla’s AI Day presentation. In the meantime, here are some more words from NVIDIA on why Tesla is using its hardware in its supercomputer:
“NVIDIA A100 GPUs deliver acceleration at every scale to power the world’s highest-performing data centers. Powered by the NVIDIA Ampere Architecture, the A100 GPU provides up to 20x higher performance over the prior generation and can be partitioned into seven GPU instances to dynamically adjust to shifting demands.”
Shapiro further summarizes how Tesla AI is working to improve autonomous driving in Tesla cars, and I don’t think I can do a better job than him, so here’s more on that:
“Tesla’s cyclical development begins in the car. A deep neural network running in ‘shadow mode’ quietly perceives and makes predictions while the car is driving without actually controlling the vehicle.
“These predictions are recorded, and any mistakes or misidentifications are logged. Tesla engineers then use these instances to create a training dataset of difficult and diverse scenarios to refine the DNN.
“The result is a collection of roughly 1 million 10-second clips recorded at 36 frames per second, totaling a whopping 1.5 petabytes of data. The DNN is then run through these scenarios in the data center over and over until it operates without a mistake. Finally, it’s sent back to the vehicle and begins the process again.
“Karpathy said training a DNN in this manner and on such a large amount of data requires ‘a huge amount of compute,’ which led Tesla to build and deploy the current generation supercomputer with high-performance A100 GPUs.”
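To put the dataset figures from that quote in perspective, here is a rough Python sketch of the arithmetic. The clip count, clip length, frame rate, and 1.5 petabytes are from the quote; the helper name, the use of decimal petabytes (10^15 bytes), and the per-clip and per-frame averages are my own assumptions. The quote doesn't say whether a "frame" covers one camera or all of the car's cameras, so treat the per-frame number as a loose average.

```python
# Figures from the NVIDIA blog quote.
CLIPS = 1_000_000
CLIP_SECONDS = 10
FRAMES_PER_SECOND = 36
TOTAL_BYTES = 1.5e15  # 1.5 petabytes, assuming decimal units (10^15 bytes)


def dataset_summary(clips, clip_seconds, fps, total_bytes):
    """Derive simple averages from the quoted dataset totals.

    Hypothetical helper for illustration; it assumes every clip has the
    same length and frame rate, which the source only states roughly.
    """
    frames_per_clip = clip_seconds * fps
    total_frames = clips * frames_per_clip
    return {
        "frames_per_clip": frames_per_clip,
        "total_frames": total_frames,
        "footage_hours": clips * clip_seconds / 3600,
        "gigabytes_per_clip": total_bytes / clips / 1e9,
        "megabytes_per_frame": total_bytes / total_frames / 1e6,
    }


summary = dataset_summary(CLIPS, CLIP_SECONDS, FRAMES_PER_SECOND, TOTAL_BYTES)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions, the training set comes to 360 million frames and a bit under 2,800 hours of footage, at about 1.5 GB per 10-second clip. That is a lot of data to push through a neural network "over and over," which helps explain the appetite for compute.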
You can gather more from the Karpathy presentation above, but a final highlight is that about 20 engineers work together on each neural net team. For much more on the Tesla Autopilot team in general, based on a chat I had with Technoking Elon Musk last year, see this piece: “Tesla Autopilot Innovation Comes From Team Of ~300 Jedi Engineers — Interview With Elon Musk.”
And here’s the chat I had with Danny Shapiro earlier this year: