300+ autonomous driving models. Safe market expansion validated in weeks, not months.
The Challenge
Waymo's autonomous vehicle stack runs more than 300 machine learning models spanning perception, prediction, and planning, each trained on data from specific geographic markets, weather conditions, and sensor configurations. As Waymo expanded the Waymo One service from Phoenix to San Francisco, Los Angeles, and Austin, the safety engineering team faced a systematic governance gap: models validated for one operational domain had no formal mechanism to demonstrate readiness for another.
Perception models trained primarily on Phoenix desert road conditions showed measurable performance degradation in San Francisco's dense urban environment and fog patterns. The team caught the issue through manual testing, but the records of which models had been validated for which operating domains, which conditions triggered mandatory revalidation, and how ongoing fleet performance was monitored were scattered across disconnected spreadsheets and engineering wikis. NHTSA's voluntary safety reporting framework and emerging state-level AV regulations both called for systematic documentation that the existing process could not produce.
The Solution
The governance platform was connected to Waymo's model registry, simulation pipeline, and fleet telemetry system to establish a unified governance layer across the full autonomous driving stack. Every model in production is now registered with its operational design domain: the specific geographic, environmental, and traffic conditions it has been validated for. The platform enforces that no model is deployed to a new market without documented validation coverage for that market's conditions.
Fleet telemetry feeds continuous performance monitoring across every active Waymo One vehicle. When a perception or prediction model's behavior drifts outside its validated performance bounds in a specific operating context, the safety engineering team receives a structured alert before the issue accumulates into fleet-wide exposure. Safety engineers (not only ML researchers) access a plain-language model health dashboard that maps model status directly onto operational risk.
Operational Design Domain Enforcement
Each model in the autonomous driving stack is registered with its validated operational design domain: the specific set of geographic, environmental, sensor, and traffic conditions the model has been tested against. The governance platform enforces domain boundaries at deployment: a model validated for Phoenix suburban roads cannot be promoted to San Francisco service without documented evidence that it meets performance thresholds in dense urban, fog, and hill-gradient conditions. Market expansion decisions are now backed by a complete, auditable validation record rather than engineering judgment alone.
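The deployment gate described above can be sketched as a set-coverage check. This is a minimal illustration only, not the platform's actual implementation: the `ModelRecord` class, the `missing_coverage` and `can_promote` helpers, and the condition labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical registry entry: a model plus the conditions it was validated for."""
    name: str
    validated_conditions: FrozenSet[str]


def missing_coverage(model: ModelRecord, market_conditions: Set[str]) -> Set[str]:
    """Return the market conditions the model has no documented validation for."""
    return set(market_conditions) - model.validated_conditions


def can_promote(model: ModelRecord, market_conditions: Set[str]) -> bool:
    """Allow promotion only when every market condition is covered."""
    return not missing_coverage(model, market_conditions)


# Illustrative labels only; real operational design domains are far richer.
phoenix_model = ModelRecord(
    name="perception-example",
    validated_conditions=frozenset({"suburban", "desert_heat"}),
)
san_francisco = {"dense_urban", "fog", "hill_gradient"}

gaps = missing_coverage(phoenix_model, san_francisco)
allowed = can_promote(phoenix_model, san_francisco)
```

In this sketch the Phoenix-validated model is blocked from San Francisco, and `gaps` names exactly the conditions that still need documented validation evidence.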
Fleet-Wide Drift Monitoring Across Operating Contexts
With thousands of active vehicle hours accumulating daily across four cities, fleet telemetry provides a continuous signal on how each model is performing in the real world versus its validated baseline. The monitoring layer segments performance by operating context (time of day, weather category, road type, traffic density) so drift is detected in the specific conditions where it is occurring rather than averaged across the full fleet. Safety engineers receive context-specific alerts that map directly to the operational decisions they need to make.
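One way to picture context-segmented drift detection is to group telemetry scores by operating context and compare each group's mean against that context's validated baseline. The sketch below rests on stated assumptions: the `drifted_contexts` function, the `tolerance` and `min_samples` parameters, the score scale, and the sample values are all hypothetical and are not taken from Waymo's monitoring system.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Context = Tuple[str, str]  # hypothetical key: (time of day, weather category)


def drifted_contexts(
    samples: Iterable[Tuple[Context, float]],
    baselines: Dict[Context, float],
    tolerance: float = 0.05,
    min_samples: int = 3,
) -> Dict[Context, float]:
    """Return contexts whose mean score fell more than `tolerance` below baseline."""
    by_context = defaultdict(list)
    for context, score in samples:
        by_context[context].append(score)

    alerts = {}
    for context, scores in by_context.items():
        # Skip contexts with no validated baseline or too little data to judge.
        if context not in baselines or len(scores) < min_samples:
            continue
        observed = mean(scores)
        if baselines[context] - observed > tolerance:
            alerts[context] = observed
    return alerts


# Made-up illustrative values, not real fleet data.
baselines = {("night", "fog"): 0.95, ("day", "clear"): 0.95}
samples = [
    (("night", "fog"), 0.80), (("night", "fog"), 0.82), (("night", "fog"), 0.81),
    (("day", "clear"), 0.95), (("day", "clear"), 0.96), (("day", "clear"), 0.94),
]
alerts = drifted_contexts(samples, baselines)
```

Here only the night-fog segment is flagged, so the alert points at the specific conditions where performance dropped rather than at the fleet as a whole.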
NHTSA Voluntary Safety Documentation as a Pipeline Output
NHTSA's Voluntary Safety Self-Assessment framework and emerging state AV permit requirements ask operators to document their safety validation methodology, ongoing monitoring approach, and incident response procedures. The governance platform generates structured safety documentation from Waymo's model registry and monitoring configuration, drawing on validation evidence, operational design domain definitions, and drift response records. The documentation regulators expect is composed from data that already exists in the engineering pipeline.
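Treating documentation as a pipeline output could look roughly like the following: registry and monitoring records are assembled into one structured, serializable record per model. The field names and the `build_safety_record` helper are assumptions made for illustration; they do not reflect an NHTSA schema or the platform's real output format.

```python
import json
from typing import Dict, Iterable, List


def build_safety_record(
    model_name: str,
    odd_conditions: Iterable[str],
    validation_evidence: Iterable[str],
    drift_responses: Iterable[str],
) -> Dict[str, object]:
    """Assemble a hypothetical per-model safety record from existing pipeline data."""
    return {
        "model": model_name,
        # Sorted so the generated document is stable across runs.
        "operational_design_domain": sorted(odd_conditions),
        "validation_evidence": list(validation_evidence),
        "drift_response_records": list(drift_responses),
    }


# Placeholder entries standing in for real registry and monitoring records.
record = build_safety_record(
    model_name="perception-example",
    odd_conditions={"fog", "dense_urban"},
    validation_evidence=["simulation suite result (placeholder)"],
    drift_responses=["night-fog alert reviewed (placeholder)"],
)
report_json = json.dumps(record, indent=2, sort_keys=True)
```

Because the record is built from data the engineering pipeline already holds, regenerating the documentation after a new validation run is a matter of rerunning the export rather than rewriting a report by hand.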
Results
Every perception, prediction, and planning model across the Waymo One stack is registered with its validated operational design domain.
San Francisco, Phoenix, Los Angeles, and Austin each have a complete market-specific validation record before commercial launch.
Continuous telemetry monitoring across every active Waymo One vehicle detects model degradation by operating context before it accumulates into fleet-wide risk.
Structured operational design domain enforcement replaced ad-hoc testing processes, cutting the time needed to produce launch documentation from months to weeks.
See what Thndr AI can do for your team
Talk to our team about your specific AI governance challenges.


