Why Autonomous Vehicles Need Billions of Miles Before We Can Trust the Trend Lines
Jonathan Slotkin’s recent analysis of Waymo’s safety results, published as an op-ed in the New York Times, has sparked a new round of discussion about how society should think about autonomous vehicles. His framing is direct. He argues that the data looks like the kind of early clinical trial result that convinces researchers to stop a study because the treatment works. After roughly 100 million rider-only miles, Waymo’s autonomous system shows far fewer airbag deployments, far fewer injuries, and far fewer serious crashes compared to human drivers. Slotkin’s point is that the raw pattern is already visible. The question is not whether autonomy can work. The question is what pace of deployment we choose as a society. That position is compelling. It is also a starting point for a deeper examination of what high-quality safety evidence looks like and how to build it.
The discussion is playing out on CleanTechnica’s pages. Fellow authors Steven Hanley and Zachary Shahan both wrote pieces, and it was apparent that the law of small numbers wasn’t front of mind in either. I’m not remotely a statistician, but I am painfully aware of how bad humans are at statistics unless they force themselves deep into Kahneman and Tversky’s System Two thinking, so I thought it appropriate to add to the conversation. I also have a reasonable background in public health, having helped build the world’s most sophisticated communicable disease, outbreak, and vaccine management system in the aftermath of SARS, one used across Canada and in the Middle East during COVID. It’s worth framing the discussion to ensure we don’t lose sight of the baby when checking the bathwater for impurities.
The Gates Foundation’s experience with its small-schools initiative is a clear example of how the law of small numbers can mislead even well funded organizations. The foundation saw that the highest performing schools in several districts were often smaller ones, and concluded that reducing school size would reliably improve outcomes. What they missed was that very small populations naturally generate more volatile results, both very high and very low. The pattern they interpreted as success was largely statistical noise. When the model scaled, the expected gains did not materialize. The lesson is straightforward. If the underlying sample is too small, even strong looking results can collapse when exposed to broader and more stable datasets.
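The small-schools effect can be reproduced with a toy simulation. The sketch below is illustrative only: the school sizes, the number of schools, and the score distribution are made-up assumptions, not figures from the Gates Foundation's work. Every simulated student is drawn from the same distribution, so no school is actually better than any other, yet the small schools should still crowd both the top and the bottom of the rankings.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical setup: all students share one score distribution,
# so any difference between schools is pure sampling noise.
TRUE_MEAN, TRUE_SD = 100.0, 15.0
SMALL_SIZE, LARGE_SIZE = 50, 1000
SCHOOLS_PER_GROUP = 200
TOP_N = 20


def school_average(n_students):
    """Average score of one simulated school with n_students pupils."""
    return statistics.fmean(
        random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n_students)
    )


schools = [("small", school_average(SMALL_SIZE)) for _ in range(SCHOOLS_PER_GROUP)]
schools += [("large", school_average(LARGE_SIZE)) for _ in range(SCHOOLS_PER_GROUP)]

# Rank every school by its average score, best first.
ranked = sorted(schools, key=lambda school: school[1], reverse=True)
top, bottom = ranked[:TOP_N], ranked[-TOP_N:]

small_share_top = sum(1 for size, _ in top if size == "small") / TOP_N
small_share_bottom = sum(1 for size, _ in bottom if size == "small") / TOP_N

print(f"Share of small schools in the top {TOP_N}: {small_share_top:.0%}")
print(f"Share of small schools in the bottom {TOP_N}: {small_share_bottom:.0%}")
```

Looking only at the top of the ranking would suggest that small schools perform better. Looking at the bottom shows the same schools dominating the worst results, which is what volatility from small samples looks like.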
The simple truth that underlies the whole discussion is that road travel harms people at a staggering scale. Every country deals with the same basic reality. Vehicle collisions kill large numbers of people every year. They injure many more. They impose costs on families, hospitals, insurers, and governments. When we step back and view road crashes at a national level, the results dwarf most public health risks that receive far more attention. Human drivers, even careful ones, make errors. They get distracted. They misjudge speed and distance. They operate when tired. They break rules for convenience. Autonomous systems do not get tired and do not text while driving. If they can match human performance and slowly rise above it, safety outcomes will shift. If they can exceed human performance by a margin similar to the early Waymo results, the shift would be significant for public health and the economy. The baseline is tragic and expensive. Any consistent improvement on it matters.
The Waymo dataset is the main source of credible evidence so far. The company has reported rider-only miles that are large enough for preliminary comparisons with human crash rates. The reductions in airbag deployments and injury events look meaningful. Independent researchers reviewing the data have come to similar conclusions. Even with the small number of serious injury cases, the reductions relative to human baselines appear real. The data is built on real-world operations in several cities, not simulation runs. The transparency of the dataset matters too. Other firms still treat their safety data as a competitive asset. Waymo has at least published enough detail that comparisons are possible. When Slotkin draws his clinical trial analogy, it is because this dataset contains the minimum conditions required for an early signal. There is a clear denominator of miles traveled and a defined set of comparable outcomes.
The difficult part is that serious crashes are rare events even in human driving. Fatalities occur roughly once every hundred million miles. Waymo’s dataset covers only about 96 million miles with passengers, so at the human rate we would expect roughly one fatality over the same distance, and chance alone could easily produce zero, one, or several. That is far too small a sample to give a statistically valid result on fatalities. Serious injuries occur more often, but still at low frequencies. That is why the law of small numbers matters for autonomous vehicle evaluation. Early data can reflect real improvements, but randomness still plays a role when outcomes are rare. A small number of avoided crashes or unfortunate incidents can shift apparent rates. It takes very large and diverse datasets to overcome that statistical noise. This is not a criticism of the Waymo data. It is simply a feature of the underlying safety problem. Rare events require a high volume of exposure to establish strong confidence. Readers often underestimate how many miles are needed to produce robust statistical conclusions. For autonomy to be proven safe across many environments, the dataset will need to reach into the billions of miles and include a broader range of contexts.
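To put rough numbers on that, here is a minimal sketch that treats fatalities as a Poisson process. That modeling choice is my assumption for illustration, as are the variable names. The inputs are the round figures used above: one fatality per hundred million miles for human drivers, and 96 million miles of exposure.

```python
import math

# Round figures from the text; the Poisson model itself is an assumption.
HUMAN_FATALITIES_PER_MILE = 1 / 100_000_000
MILES_DRIVEN = 96_000_000


def poisson_pmf(k, mean):
    """Probability of exactly k events when `mean` events are expected."""
    return math.exp(-mean) * mean**k / math.factorial(k)


# Fatalities a human-driven fleet would be expected to have over the same miles.
expected_fatalities = HUMAN_FATALITIES_PER_MILE * MILES_DRIVEN

p_zero = poisson_pmf(0, expected_fatalities)
p_one = poisson_pmf(1, expected_fatalities)
p_two_or_more = 1 - p_zero - p_one

print(f"Expected fatalities at the human rate: {expected_fatalities:.2f}")
print(f"Chance of zero fatalities:        {p_zero:.0%}")
print(f"Chance of exactly one fatality:   {p_one:.0%}")
print(f"Chance of two or more fatalities: {p_two_or_more:.0%}")
```

Under these assumptions, human drivers covering the same distance would record zero fatalities about 38% of the time, exactly one about 37% of the time, and two or more about 25% of the time. A fleet with no fatalities over that distance is therefore entirely consistent with ordinary human-level risk, which is why the fatality figure on its own tells us little at this scale.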
I agree with Slotkin on the core point. Autonomous systems do not need to be perfect. They need to be measurably better than human drivers. If an autonomous system can reduce injuries or fatalities by even a modest percentage at scale, the public health benefit would be meaningful. Every avoided collision is a person who does not need to go to the hospital or a family that is not facing the aftermath of a serious incident. The Waymo results so far suggest that machines can reach that level. They show a pattern of improvement that aligns with the idea that a system without distraction or fatigue can outperform the average human. People tend to forget that the human baseline is not high. Many drivers are inexperienced or inattentive. Machines only need to clear that bar and then keep improving.
Where I diverge from Slotkin is in the level of confidence we can draw from the current dataset. The data is early. The operational design domain is limited to certain cities and certain conditions. The traffic mix is not uniform. Weather variation is limited in some deployment areas. Human driver benchmarks also vary by geography and time of day. Some types of crashes, such as interactions with pedestrians or cyclists, require large exposure sets because they depend on dense environments and complex behaviors. To claim that the clinical trial is already complete is to skip several steps in the process. The findings are promising. They point in the right direction. They do not yet represent the depth and breadth of evidence required for a technology that might one day replace human drivers across most conditions.
Experience from other clean technology transitions helps make sense of this. Solar deployments had to scale before failure modes and reliability curves were understood. Wind turbines had to reach high deployment levels before the industry learned how to address gearbox and blade issues. Battery systems had to accumulate many cycles in many environments before degradation patterns were clear. Scale reveals weaknesses. It also reveals strengths. Autonomous vehicles will face the same process. Early data gives an indication of the direction of travel. Only broad deployment resolves the unknowns.
The scaling problem becomes obvious when we look at the raw numbers. If fatalities occur at roughly one per hundred million miles, then strong confidence in a fatality rate that is lower than human drivers requires far more than one hundred million autonomous miles. It will require billions. That volume needs to be spread across cities, climates, day and night conditions, and road types. It also needs to include interactions with different kinds of road users. Without that diversity, the data risks overstating performance in one environment and understating risks in another. This is the normal path for any safety-critical system. Aviation, nuclear power, and medical devices all reached maturity through high exposure and detailed reporting. Autonomy will too.
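A back-of-envelope version of that calculation is sketched below. It assumes a Poisson model and a one-sided 95% test, both my choices for illustration rather than anything Waymo or Slotkin uses, and the function names are hypothetical. It asks how many miles are needed before a given fatality count would be surprising if the autonomous fleet were only as safe as human drivers.

```python
import math

HUMAN_FATALITIES_PER_MILE = 1 / 100_000_000  # round figure from the text


def poisson_cdf(k, mean):
    """Probability of k or fewer events when `mean` events are expected."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def miles_needed(observed_fatalities, benchmark_rate_per_mile, confidence=0.95):
    """Miles at which seeing this few fatalities would have less than a
    (1 - confidence) chance if the true rate equalled the benchmark."""
    alpha = 1 - confidence
    low, high = 0.0, 100.0  # search bracket for the expected fatality count
    for _ in range(200):  # bisection; the CDF falls as the mean rises
        mid = (low + high) / 2
        if poisson_cdf(observed_fatalities, mid) > alpha:
            low = mid
        else:
            high = mid
    return high / benchmark_rate_per_mile


for fatalities in (0, 1, 2, 5):
    miles = miles_needed(fatalities, HUMAN_FATALITIES_PER_MILE)
    print(f"{fatalities} fatalities observed: about {miles / 1e6:,.0f} million miles")
```

With zero fatalities the threshold is roughly 300 million miles, and each fatality that does occur pushes it higher, passing one billion miles by the time five have been recorded. This only tests whether the fleet is better than the human rate at all. Demonstrating a specific margin of improvement, or doing so separately for each city, climate, and road type, would require considerably more exposure.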
Policy plays an important role here. If autonomous vehicles are to be evaluated fairly, society must create the conditions that allow evidence to accumulate. This means allowing controlled expansion of operations into more cities. It means creating reporting requirements that apply to every firm in the sector, not only the ones that choose transparency. It means building consistent human driving benchmarks that actually match the environments in which autonomous systems operate. Most important, it means treating autonomy as a safety intervention rather than a novelty. If the goal is fewer injuries and fewer fatalities, then the evaluation framework must support evidence gathering at scale, not constrain it with burdens that human drivers never face.
It’s also worth highlighting that driving exposes people to very different health outcomes than walking, biking or taking transit. In countries with strong road safety standards, fatality rates for drivers and passengers still tend to fall in the range of several deaths per billion kilometers traveled. In the United States the rate is higher. The same comparison for rail transit, subways and buses usually shows fatality rates an order of magnitude lower for the same distances traveled by passengers. Transit vehicles place trained operators behind the controls, separate people from traffic in many settings and spread risk across many passengers. When a bus or train travels one kilometer with dozens of riders, the exposure per person drops sharply. That is why transit has much lower fatality rates per passenger kilometer than private cars in almost every dataset.
Walking and cycling need a different lens. On a pure exposure basis pedestrians and cyclists suffer higher fatality rates per kilometer traveled than drivers in many countries. The difference is driven by the fact that they are completely exposed to the kinetic energy of cars and trucks where physically separated bike lanes and pathways are not present. A low speed collision that a driver might survive with minor injuries can be catastrophic for a pedestrian or cyclist. If the only measure used was the fatality rate per kilometer traveled, walking and cycling would look unsafe compared to driving. That framing is incomplete. It ignores the broader population health outcomes associated with physical activity.
Public health studies show that daily walking and cycling reduce rates of heart disease, diabetes and several other chronic conditions. These gains accumulate over a lifetime. When researchers look at the net effect on mortality, they find that the health benefits of physical activity outweigh the injury risk of walking or cycling in most built environments. Even in cities with higher collision rates, the population level mortality reductions from active travel are larger than the increases from exposure to traffic. This is why public health agencies consistently recommend shifting short trips from cars to active modes. People who walk or bike regularly tend to live longer even when controlling for age and income.
Transit sits between these worlds. It is safe on a distance basis and encourages incidental walking at each end of the trip. Riders walk to stops, transfer between platforms and climb stairways. These short, repeated bouts of activity improve cardiovascular health. At the same time transit separates most of the journey from road traffic. The combination delivers low injury risk and modest but widespread health benefits. Cities that build strong transit networks often see gains in longevity that are attributed in part to increased daily activity.
Driving does not offer these secondary health benefits. It confines people to seats, removes physical effort from the journey and contributes to sedentary lifestyles. When sedentary time climbs, rates of metabolic disease climb as well. This effect is visible in population level studies of commute modes and health outcomes. Communities that depend heavily on driving see higher rates of obesity, diabetes and heart disease. These chronic conditions shorten life expectancy more than the collision risks associated with walking or cycling. The result is that even if walking and cycling appear riskier per kilometer than driving, the net health outcome for an individual who chooses active travel is usually better.
The comparison of fatalities by distance traveled does not reflect total health effects of each mode. Driving carries collision risk and sedentary risk. Transit carries very low collision risk and modest activity benefits. Walking and cycling carry higher collision risk but large activity benefits. When public health agencies assess these modes, they consistently conclude that the net safety and health profile of transit, walking and cycling is better than that of private cars. The implications for urban design are clear. As cities shift travel from cars to transit and active modes, collision rates fall and overall health improves.
In Moving People in USA Much Harder To Decarbonize Than in Rest of World, I pointed out that the United States is a radical global outlier in personal transportation habits. In the U.S. a very high share of trips—far above levels common in Europe or Asia—are done by car because post-World War II suburban development, sprawling geography, low density and highway infrastructure have made driving nearly essential for everyday errands, commuting, shopping, and social activity. Outside the U.S., many countries support more compact settlement patterns, stronger public transit, and lower reliance on private cars. That structural context means Americans drive many more miles per person than people in most other nations, in settings often characterized by longer distances and automobile dependency. Because of that driving intensity, the U.S. exhibits much higher per-capita road fatality rates and a higher risk per mile than peer countries where driving is less dominant and road use is more balanced among transit, walking, cycling and compact urban mobility.
In earlier work I argued that autonomous vehicles are likely to increase congestion in car centric countries, not relieve it. A decade ago in Autonomous Cars Likely To Increase Congestion, I pointed out that most urban congestion comes from bottlenecks and simple overuse of road space. Adding easier, cheaper robotaxis into that system encourages more vehicle miles, more empty repositioning trips, and more induced demand, which worsens bottlenecks rather than easing them. I revisited the theme in 2024 in Tesla Robotaxi Would Cause More Gridlock In USA’s Transit-Deprived Cities, where I highlighted that in an American context with weak transit and long trip distances, fleets of cheap autonomous rides would almost certainly increase car use and gridlock, not cut it.
If autonomy increases the total time people spend as passengers in cars each day, the public health implications go beyond crash statistics. More robotaxi and car miles mean more sedentary time, fewer reasons to walk short trips, and less use of transit that builds incidental physical activity into daily life. People who might have walked ten minutes to a local shop or to a bus stop will be tempted to summon a cheap, comfortable autonomous ride instead. Over millions of people and many years, that shift in behavior reduces physical activity across the population, which raises risks of heart disease, diabetes and other chronic conditions. Even if autonomous systems cut collision rates per mile, a transport system that encourages more sedentary passenger time and fewer active trips will likely worsen overall health outcomes at the population level.
Given that background, the fact that all of the publicly reported data for systems like Waymo comes from U.S.-only deployments brings both promise and a major caveat. On one hand, if an autonomous system can reduce crash rates in the U.S.—with its long driving distances, heavy reliance on cars, frequent highway and suburban driving, and cultural tendencies favoring private vehicles over transit—that is a strong real-world test under tough conditions. On the other hand, the U.S. context means the data may not generalize globally. An autonomous driving system that works well in U.S. urban and suburban driving patterns may face different risks in the denser, mixed-mode, multimodal-transport environments found in much of Europe or Asia. In short, success in the U.S. is necessary to prove viability under car-centric conditions, but it is not sufficient to demonstrate safety or benefit across vastly different transport systems worldwide.
Autonomy should be viewed as a public health technology. It has the potential to reduce one of the largest sources of preventable injury and death in modern cities. Treating it this way changes the discussion. It shifts attention from gadget appeal to outcome improvement. It places responsibility on regulators to measure performance accurately and on developers to publish comprehensive data. It encourages cities to think about integration with transit systems and urban planning that favors walking and biking, with integrated protected paths for both that rival those set aside for automobiles, whether humans or computers are at the wheel. It also provides a clearer answer to the question of why autonomy matters at all. The purpose is not convenience. The purpose is safety and economic efficiency.
The current moment calls for cautious optimism. The Waymo signals are promising and point toward a future where machines outperform humans on the road. Slotkin’s enthusiasm is grounded in real data, even if that data is still narrow. The law of small numbers reminds us that early results need to be expanded before they can be considered definitive. The path forward is clear. We need more miles, more transparency and more diversity of operating conditions. If we build the right evidence base, we will be able to decide with confidence how and where autonomous systems should be deployed. We also need to keep sight of the much stronger health value and lower societal costs of walking, biking and transit. Autonomous cars are a much greater benefit in the urban sprawl of the USA, which precludes most people from walking, biking or taking transit, than in most of the rest of the world.
