XPENG Unveils The “World Model Accelerator” X-Cache, Which Requires No Training, Is Plug-And-Play, And Boosts Inference Speed By 2.7 Times

May 7, 2026, Press Release


Guangzhou, May 6, 2026 - XPENG (NYSE: XPEV, HKEX: 9868), a leading China-based high-tech company, previously released the X-World technical report and demonstrated the practical value of this technology in XPENG’s autonomous driving. XPENG has now announced a further advance in world model technology with the release of the X-Cache technical report.

X-Cache leverages the continuity of the physical world to identify image regions that can be safely reused, thereby reducing redundant computation. It is fast and lightweight, can be applied directly to existing world models without retraining, and speeds up their denoising inference by up to 2.7 times. This significantly improves efficiency and reduces resource consumption.

Reductive yet Reliable, Exploiting Physical Continuity for Cross-Segment Feature Reuse

As autonomous driving enters the model-driven era, high-fidelity simulation of the real world has become a cornerstone for the continuous evolution of driving models. While autoregressive video diffusion-based world models offer high-fidelity, multi-view video generation capabilities, their inference cost and latency remain bottlenecks constraining real-time interaction and large-scale deployment.

XPENG’s world model uses few-step distillation, a technique that produces visuals closely mirroring the real world in only a few denoising steps. With so few steps, however, traditional acceleration methods, which skip computation by exploiting similarities between denoising steps, have little redundancy left to remove and fail to resolve the issue of slow inference.

The core insight behind X-Cache stems from a physical fact: autonomous driving footage is continuous and evolves smoothly. During driving, elements such as the road surface, roadside trees, and distant buildings change little between the previous frame and the next. Consequently, X-Cache partitions the video into temporally continuous “segments” and compares the intermediate feature similarity within the same layer and at the same denoising step across adjacent segments. If the variation is minimal, previously computed intermediate results are directly reused, and the entire layer computation is skipped. This constitutes the cross-segment caching logic of X-Cache.
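
The cross-segment reuse described above can be illustrated with a minimal sketch. It is not XPENG’s implementation: the class name `CrossSegmentCache`, the use of cosine similarity on a layer’s input features, and the 0.98 threshold are all assumptions chosen for illustration, and the technical report may use a different similarity measure and caching granularity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CrossSegmentCache:
    """Hypothetical cache keyed by (layer, denoising step).

    Each entry stores the features seen in an earlier segment and the
    output computed for them at that layer and step.
    """

    def __init__(self, threshold=0.98):
        self.threshold = threshold  # assumed value, not taken from the report
        self.entries = {}           # (layer, step) -> (features, output)
        self.skipped = 0
        self.computed = 0

    def run_layer(self, layer, step, features, compute_fn):
        key = (layer, step)
        cached = self.entries.get(key)
        # Reuse only if the same layer and step saw nearly identical features.
        if cached is not None and cosine_similarity(cached[0], features) >= self.threshold:
            self.skipped += 1
            return cached[1]
        # Otherwise run the full layer and refresh the cache entry.
        output = compute_fn(features)
        self.entries[key] = (list(features), output)
        self.computed += 1
        return output


def expensive_block(x):
    """Stand-in for a costly transformer block."""
    return [2.0 * v for v in x]


cache = CrossSegmentCache()
seg0 = [1.0, 2.0, 3.0]
seg1 = [1.0, 2.01, 3.0]   # nearly unchanged scene
seg2 = [3.0, -1.0, 0.5]   # scene changed substantially

out0 = cache.run_layer(0, 0, seg0, expensive_block)  # computed
out1 = cache.run_layer(0, 0, seg1, expensive_block)  # reused from seg0
out2 = cache.run_layer(0, 0, seg2, expensive_block)  # recomputed
```

In this simplified version a skipped layer leaves the cached reference features untouched, so a long run of small changes is always compared against the last fully computed segment rather than the immediately preceding one.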

In essence, rather than relying on the “step” dimension, where redundancy is already eliminated by few-step distillation, X-Cache optimizes along the novel dimension of “continuous generated segments.”

[Figure: Overall architecture of X-Cache]

To ensure the accuracy of cross-segment reuse, X-Cache generates a “fingerprint”: it incorporates driving actions (e.g., aggressive steering) alongside visual structure to assess whether current road conditions resemble recent ones, enabling more intelligent reuse. Concurrently, X-Cache features a “safety mechanism” that triggers full computation at critical moments of scene transition, such as turning, lane changing, or traffic light switching (KV update frames), to prevent visual corruption caused by error accumulation.
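
A rough sketch of how a fingerprint and a safety mechanism might gate reuse is shown below. Everything in it is hypothetical: the `Fingerprint` fields, the weighted Euclidean distance, the `max_distance` value, and the `is_kv_update` flag are illustrative assumptions, not details taken from the X-Cache report.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical per-segment fingerprint."""
    action: tuple  # driving action, e.g. (steering, acceleration); assumed layout
    visual: tuple  # coarse descriptor of visual structure; assumed layout


def fingerprint_distance(a, b, action_weight=1.0, visual_weight=1.0):
    """Weighted sum of Euclidean distances over action and visual parts."""
    d_action = math.dist(a.action, b.action)
    d_visual = math.dist(a.visual, b.visual)
    return action_weight * d_action + visual_weight * d_visual


def may_reuse(current, previous, is_kv_update, max_distance=0.1):
    """Decide whether cached results may be reused for this segment.

    Full computation is forced when the frame is flagged as a KV update
    frame or when no previous fingerprint exists.
    """
    if is_kv_update or previous is None:
        return False
    return fingerprint_distance(current, previous) <= max_distance


prev = Fingerprint(action=(0.0, 0.5), visual=(0.2, 0.3))
steady = Fingerprint(action=(0.01, 0.5), visual=(0.2, 0.31))
sharp_turn = Fingerprint(action=(0.6, 0.5), visual=(0.2, 0.3))

reuse_steady = may_reuse(steady, prev, is_kv_update=False)     # similar conditions
reuse_turn = may_reuse(sharp_turn, prev, is_kv_update=False)   # large steering change
reuse_forced = may_reuse(steady, prev, is_kv_update=True)      # safety override
```

Including the driving action in the fingerprint means that an abrupt steering input blocks reuse even before the visual content has visibly diverged, which matches the intent described above.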

Consequently, X-Cache significantly enhances the inference efficiency of world models without sacrificing generation quality, offering a viable solution for applications requiring high concurrency and high-frequency invocation.

An Intelligent, Plug-and-Play Utility for Lossless World Model Acceleration

X-Cache is a training-free control logic with cache contents refreshed in real time during generation; its overhead remains manageable compared to the parameter count of the model itself.

Unlike solutions that remain confined to the experimental stage, this intelligent utility has been successfully deployed in XPENG’s autonomous driving world model, X-World, operating stably across diverse complex scenarios such as urban roads and highways. By enabling cross-segment computation reuse, X-Cache achieves high compute utilization and inference acceleration, while ensuring generation quality and system stability through multiple mechanisms—demonstrating engineering reliability suitable for large-scale deployment.

[Figure: Visual Comparison on Urban Expressways: Baseline Model vs. X-Cache]

[Figure: Visual Comparison on Turning Scenarios: Baseline Model vs. X-Cache]

X-Cache achieves a 71% block skip rate and delivers 2.6–2.7× measured inference speedup, with virtually no loss in visual quality.
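
As a back-of-the-envelope check on how these two figures relate, the sketch below applies a simple Amdahl-style model. The model is an assumption made here, not something stated in the report: it treats all blocks as equally costly and ignores the cost of the similarity checks themselves. Under that model, skipping 71% of block computation would give about 3.4× if blocks accounted for all of the runtime, so a measured 2.6–2.7× is consistent with some runtime lying outside the skippable blocks.

```python
def amdahl_speedup(skip_rate, skippable_fraction=1.0):
    """Speedup when `skip_rate` of the skippable work is avoided.

    `skippable_fraction` is the share of baseline runtime spent in
    skippable blocks (an assumed quantity, not reported in the source).
    """
    remaining = 1.0 - skippable_fraction * skip_rate
    if remaining <= 0.0:
        raise ValueError("skip_rate * skippable_fraction must be below 1")
    return 1.0 / remaining


def implied_skippable_fraction(skip_rate, measured_speedup):
    """Invert the model: runtime share in blocks implied by a measured speedup."""
    return (1.0 - 1.0 / measured_speedup) / skip_rate


ideal = amdahl_speedup(0.71)                          # upper bound under the model
frac_low = implied_skippable_fraction(0.71, 2.6)      # implied share at 2.6x
frac_high = implied_skippable_fraction(0.71, 2.7)     # implied share at 2.7x
```

Under these assumptions the implied share of runtime in skippable blocks comes out between roughly 0.87 and 0.89; the real breakdown may differ.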

As a physics-oriented simulation engine, X-World constructs inferable and interactive virtual environments, serving as the core infrastructure for model training and continuous evolution. Building on this foundation, X-Cache further addresses efficiency and cost challenges in large-scale simulation, endowing high-quality simulation with the engineering capability to be “runnable, fast-running, and cost-controllable.” Supported by this architecture, the performance ceiling of XPENG VLA 2.0 is significantly elevated.

In summary:

The XPENG VLA 2.0 handles perception and decision-making, acting as the user-facing output of capabilities.
X-World undertakes virtual-real mapping and scenario inference, serving as the core support for system evolution.
X-Cache provides efficient inference, functioning as the acceleration engine powering large-scale simulation.

Through this architecture, XPENG realizes closed-loop capabilities spanning data acquisition, model training, simulation verification, and continuous iteration, propelling autonomous driving from optimizing isolated capabilities toward a model-driven, full-stack closed-loop iteration.

New Breakthrough in Compute Infrastructure, Empowering Scalable Deployment and Ecosystem Expansion

From the debut of X-World to the development of X-Cache, XPENG has rapidly progressed from “constructing high-quality simulated worlds” to “efficiently utilizing simulated worlds.” This transcends mere inference acceleration; it empowers low-cost, high-concurrency closed-loop simulation to become a scalable, operational capability.

X-Cache demonstrates that in the era of Physical AI, the competitive focus extends beyond peak compute to how prior knowledge of the physical world can maximize the value of every unit of compute, ensuring that every calculation advances the exploration of the “unknown.”

Notably, X-Cache targets few-step autoregressive interactive simulation and can be directly extended to embodied AI and world models of similar architectures. It fulfills industrial-grade requirements such as autonomous driving closed-loop testing, online reinforcement learning, and low-compute chip deployment. Furthermore, it provides a reusable computational paradigm and ecological cornerstone for embodied AI, robotic simulation, and broader physical world interaction.

Looking ahead, XPENG will continue to explore more technological breakthroughs in the field of autonomous driving, enabling XPENG smart driving to train harder in the digital world and drive more steadily in the real world.

For more information, please refer to the full technical report and the official websites:

Paper Title: X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
Paper Link: https://arxiv.org/pdf/2604.20289
X-Cache Official Website: https://x-cache-1.github.io/
X-World Official Website: https://x-world-1.github.io/

About XPENG

Founded in 2014, XPENG is a leading Chinese AI-driven mobility company that designs, develops, manufactures, and markets Smart EVs, catering to a growing base of tech-savvy consumers. With the rapid advancement of AI, XPENG aspires to become a global leader in AI mobility, with a mission to drive the Smart EV revolution through cutting-edge technology, shaping the future of mobility.

To enhance the customer experience, XPENG develops its full-stack advanced driver-assistance system (ADAS) technology and intelligent in-car operating system in-house, along with core vehicle systems such as the powertrain and electrical/electronic architecture (EEA). Headquartered in Guangzhou, China, XPENG also operates key offices in Beijing, Shanghai, Silicon Valley, and Amsterdam. Its Smart EVs are primarily manufactured at its facilities in Zhaoqing and Guangzhou, Guangdong province.
XPENG is listed on the New York Stock Exchange (NYSE: XPEV) and Hong Kong Exchange (HKEX: 9868).

For more information, please visit https://www.xpeng.com/.
