Thinking Machines Unveils ‘Full Duplex’ AI Model That Listens While It Talks
Mira Murati’s startup, Thinking Machines Lab, has announced a “full duplex” AI model designed to process input and generate responses simultaneously, aiming to make AI conversations feel more like real-time phone calls than turn-based text exchanges.

Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati, has announced a new approach to conversational AI that could fundamentally change how humans interact with large language models. Instead of the familiar turn-based format—where users speak or type, wait for a response, and then continue—the company is developing what it calls an “interaction model” capable of listening and responding at the same time.
From Turn-Based to Real-Time AI
Today’s mainstream AI systems operate in a sequential pattern. A user provides input, the model processes it, and then generates a reply. This structure mirrors text messaging more than natural conversation. Thinking Machines aims to replace that paradigm with a “full duplex” architecture, enabling simultaneous input processing and output generation.
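The contrast can be sketched in a few lines of Python: in a turn-based loop, listening must finish before speaking begins, whereas a full duplex loop runs both concurrently over shared state, letting the speaker react to partial input. The sketch below is purely illustrative; every name in it is hypothetical, and it assumes nothing about how Thinking Machines actually implements its model.

```python
import asyncio

# Toy illustration of "full duplex" interaction: a listener task and a
# speaker task run concurrently over shared transcripts, so responses can
# begin before the user has finished. All names here are hypothetical;
# this is not Thinking Machines' code.

async def full_duplex(incoming_words, heard, spoken):
    async def listen():
        for word in incoming_words:
            heard.append(word)          # ingest one chunk of user speech
            await asyncio.sleep(0)      # yield so the speaker can interleave

    async def speak():
        while len(spoken) < len(incoming_words):
            # React to whatever has been heard so far, mid-utterance.
            latest = heard[-1] if heard else "(silence)"
            spoken.append(f"ack:{latest}")
            await asyncio.sleep(0)

    # Both coroutines run at once, unlike a turn-based request/response loop.
    await asyncio.gather(listen(), speak())

heard, spoken = [], []
asyncio.run(full_duplex(["full", "duplex", "feels", "live"], heard, spoken))
```

A real system would run this control flow over streaming audio with speech detection rather than word lists; the point here is only that input and output are interleaved rather than sequenced.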
The company’s research preview model, dubbed TML-Interaction-Small, reportedly responds in approximately 0.40 seconds, close to the latency typical of human conversational turn-taking. According to Thinking Machines, this makes it significantly faster than comparable offerings from major AI providers, though independent benchmarking has yet to validate those claims.
Why Full Duplex Matters
In human dialogue, participants frequently interrupt, interject, or adjust their speech mid-sentence in response to subtle cues. Current AI voice assistants struggle with this dynamic flow because they must wait for input to conclude before generating output. A native full duplex model could enable smoother back-and-forth exchanges, more natural interruptions, and adaptive responses that evolve as a user continues speaking.
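One concrete piece of that dynamic flow is barge-in handling: the assistant checks for new user speech between output chunks and stops talking the moment the user interrupts. The toy asyncio sketch below illustrates the pattern; all names and dialogue are hypothetical and unrelated to any real product.

```python
import asyncio

# Toy "barge-in" sketch: the assistant streams its reply chunk by chunk and
# yields the floor as soon as the user starts speaking again. The names and
# flow are hypothetical, chosen only to illustrate the interaction pattern.

async def conversation():
    transcript = []
    user_spoke = asyncio.Event()

    async def assistant():
        for chunk in ["The", "capital", "of", "France", "is", "Paris"]:
            if user_spoke.is_set():       # user barged in: stop mid-sentence
                transcript.append("[assistant yields]")
                return
            transcript.append(chunk)
            await asyncio.sleep(0)        # keep listening between chunks

    async def user():
        await asyncio.sleep(0)            # let a few chunks go out first
        await asyncio.sleep(0)
        user_spoke.set()                  # interrupt the assistant
        transcript.append("user: never mind, thanks")

    await asyncio.gather(assistant(), user())
    return transcript

transcript = asyncio.run(conversation())
```

Because the interrupt is checked between chunks rather than after the full reply, the assistant abandons its sentence partway through, which is exactly the behavior turn-based systems cannot express.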
If successful, this shift could have implications beyond consumer chatbots. Real-time conversational AI is critical for applications such as virtual assistants, customer service automation, accessibility tools, tutoring systems, and collaborative work environments. Lower latency and simultaneous processing could reduce friction and make AI feel less mechanical.
Still a Research Preview
Despite the technical claims, Thinking Machines’ announcement remains firmly in the research phase. The company has not released the model publicly. A limited research preview is expected in the coming months, with a broader rollout anticipated later this year.
That means key questions remain unanswered: How well does the model handle noisy or overlapping speech? Can it maintain accuracy while generating responses in real time? And will the real-world user experience match the company’s performance benchmarks?
A Strategic Signal from a High-Profile Startup
The announcement is notable not only for its technical ambition but also for its source. Murati, who previously helped lead the development of ChatGPT and other OpenAI products, founded Thinking Machines Lab in 2025. The startup’s early focus on interaction-native AI suggests it may be targeting a foundational shift in how models are architected, rather than competing purely on model size or benchmark scores.
Whether full duplex interaction becomes the next standard for conversational AI will depend on execution, scalability, and developer adoption. But the move signals a growing recognition across the industry: making AI smarter may not be enough—making it feel more human could be just as important.