AI Interaction Breakthrough: Skywork AI Launches Matrix-Game 3.0, Enabling Real-Time HD “World Generation” at 720p and 40 FPS

14.Apr.2026 06:33 · 3 min read

Skywork AI has launched Matrix-Game 3.0, which generates video in real time at 720p resolution and 40 frames per second (FPS) and addresses the long-standing “long-term memory” deficiency in AI video. The system generates interactive worlds with strong spatiotemporal consistency through a camera-aware memory mechanism and a large-scale data engine.

The Skywork AI team has released a new technical report announcing a major breakthrough in interactive world models. Its latest system, Matrix-Game 3.0, is the first to achieve real-time video generation at 720p HD resolution and 40 frames per second (FPS), while successfully addressing the long-standing “long-term memory” limitation in AI video generation.

Core Breakthrough: Solving AI Video’s “Amnesia” Problem

For years, AI video generation models have struggled with long interactive sequences, often suffering from spatial inconsistencies or style drift due to ineffective memory mechanisms. Matrix-Game 3.0 overcomes this bottleneck by introducing a camera-aware memory retrieval mechanism.

The system precisely retrieves historical frames based on the current camera pose and employs a unified self-attention architecture to jointly model long-term memory, recent history, and the current predicted frame within a shared space. Experiments show that even during complex interactions lasting several minutes, the model maintains strong spatiotemporal consistency—ensuring that when users revisit previously generated locations, scene details closely match the original renderings.
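As a rough illustration of retrieving historical frames by camera pose, the following minimal sketch ranks stored frames by how close their pose is to the current one. The article does not describe the actual retrieval criterion, so everything here is an assumption made for illustration: the `MemoryEntry` structure, the position-plus-yaw pose representation, the `pose_score` function, and the `angle_weight` value are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    frame_id: int
    position: tuple  # (x, y, z) camera position; hypothetical units
    yaw_deg: float   # camera heading in degrees; hypothetical pose format


def pose_score(entry, position, yaw_deg, angle_weight=0.05):
    """Lower is better: Euclidean distance plus a weighted heading difference.

    This scoring rule is an assumption, not the method from the report.
    """
    dist = math.dist(entry.position, position)
    # Wrap the heading difference into [0, 180] degrees.
    dyaw = abs((entry.yaw_deg - yaw_deg + 180.0) % 360.0 - 180.0)
    return dist + angle_weight * dyaw


def retrieve(memory, position, yaw_deg, k=2):
    """Return the ids of the k stored frames whose pose best matches the query."""
    ranked = sorted(memory, key=lambda e: pose_score(e, position, yaw_deg))
    return [e.frame_id for e in ranked[:k]]


memory = [
    MemoryEntry(0, (0.0, 0.0, 0.0), 0.0),
    MemoryEntry(1, (5.0, 0.0, 0.0), 90.0),
    MemoryEntry(2, (0.5, 0.0, 0.0), 10.0),
    MemoryEntry(3, (0.2, 0.0, 0.0), 180.0),  # nearby, but facing the other way
]

# The camera returns to the origin, looking almost straight ahead.
retrieved = retrieve(memory, (0.0, 0.0, 0.0), 5.0, k=2)
print(retrieved)  # [0, 2]
```

In this toy example, frame 3 is spatially close but faces the opposite direction, so it is ranked below frames 0 and 2. In the system described above, the retrieved frames would then be modeled jointly with recent history and the current frame by the self-attention architecture; that step is not sketched here.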

Industrial-Scale Data Engine: Massive AAA Game Data Integration

To enhance the model’s understanding of real-world physics and logic, the team built a large-scale “data factory” drawing from both synthetic and real-world sources:

  • Synchronized Virtual Generation: Powered by Unreal Engine 5 (UE5), the Unreal-Gen platform can automatically generate cinematic interactive videos with over 100 million character combinations.

  • Automated AAA Game Capture: Supports large-scale automated recording of high-quality interactive data from blockbuster titles such as Grand Theft Auto V and Cyberpunk 2077.

  • Multi-Dimensional Real-World Supplementation: Integrates more than 10,000 real-world 4K video sequences, covering indoor environments, urban scenes, and aerial footage.

Matrix-Game 3.0 system demo screenshot

Performance Optimization: Achieving Ultra-Fast Response Through Model Streamlining

To meet the strict low-latency requirements of real-time interaction, Matrix-Game 3.0 has undergone extensive optimization at the inference architecture level:

  • Adopts a multi-stage autoregressive distillation strategy to improve inference efficiency;

  • Prunes the VAE decoder at a rate of up to 75%, increasing decoding speed more than fivefold;

  • Applies INT8 quantization to further reduce computational overhead.

Even at a 5B parameter scale, the system delivers smooth performance while balancing visual quality and real-time responsiveness.
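For context, 40 FPS leaves a budget of 1000 / 40 = 25 ms per frame, which is why each of the optimizations above matters. The INT8 quantization step can be illustrated with a minimal sketch of symmetric per-tensor quantization. The article does not specify which quantization scheme Matrix-Game 3.0 uses, so the scheme below and the helper names `quantize_int8` and `dequantize` are assumptions for illustration only.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats onto integers in [-127, 127].

    Illustrative scheme only; the report's actual method may differ.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)  # [50, -127, 0, 100]
```

Each weight is stored in one byte instead of four (for FP32), and the rounding error per value is bounded by half the scale. This is the general trade-off behind quantization: lower memory traffic and cheaper arithmetic in exchange for a small, bounded loss of precision.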

Future Vision: Toward an “Infinitely Generative” Digital Universe

In addition to the 5B version, the team also showcased a 28B-parameter Mixture-of-Experts (MoE) model. As model scale increases, the system demonstrates stronger capabilities in dynamic simulation, scene transitions, and generalization.

Industry observers believe that Matrix-Game 3.0 provides a critical technical foundation for robotics training, XR (extended reality), and next-generation immersive entertainment—marking a shift in AI from “generating clips” to “building fully interactive worlds in real time.”

Paper link: https://arxiv.org/pdf/2604.08995