Episodes from AI Engineer about AI Infrastructure & Standards.

Real Time Video Diffusion on a Single GPU — Ziv Ilan, Nvidia
Jun 16, 2026 · 18:46
Nvidia's Ziv Ilan explains how combining quantization, caching, and step distillation enables near real-time video diffusion on a single Blackwell B200 GPU. Working with Black Forest Labs on Flux 2, dynamic quantization reduces memory and compute, caching skips redundant denoising steps, and distillation cuts steps from fifty to as few as one. The open-source FastGen repo packages these post-training and sharding techniques, achieving 10–200x speedups for real-time generation.

Why MCP and ChatGPT Apps Use Double Iframes — Frédéric Barthelet, Alpic
Jun 15, 2026 · 20:11
Frédéric Barthelet, CTO of Alpic, explains why ChatGPT and other MCP hosts render third-party app UI inside a double iframe. He traces how simpler approaches fail: `srcdoc` shares the parent origin, letting CSP block scripts and risking data access; sandboxing removes origin storage; and `allow-same-origin` recreates the escape. The resulting double iframe—an outer iframe from a controlled subdomain loading app HTML via `srcdoc` into an inner frame—ensures isolation and prevents cross-app storage collisions. Barthelet warns developers must declare every external domain their view uses in MCP app metadata or face submission rejection, and demos Skybridge's CSP inspector that diffs declared domains against actual network calls.

The Complete Guide to WebMCP — Tara Agyemang, Google Chrome
Jun 11, 2026 · 21:34
Tara Agyemang from the Google Chrome team introduces WebMCP, a proposed web standard that replaces brittle DOM scraping with structured tools for AI agents. She explains two implementation paths: the declarative API (adding HTML attributes to forms) and the imperative API (registering custom JavaScript tools). A live demo shows a concert ticket purchase completed in three tool calls: search, open page, purchase. WebMCP is in early preview on Chrome 146, with an eval CLI and inspector extension available for testing.

Your Attention Is the Bottleneck, Not Your Agents — Zack Proser, WorkOS
Jun 11, 2026 · 25:17
Zack Proser from WorkOS argues that human attention, not agent speed, is the real bottleneck in AI-assisted coding. He proposes a sustainable stack: signal layers to filter Slack and Linear, voice-first flows at 184 wpm, remote control of agents from a phone to leverage diffuse thinking, and weekly self-improvement passes over JSONL conversation history. He also integrates an Oura ring via MCP so Claude can nudge him about sleep, emphasizing balance over burnout.

Sovereign Escape Velocity: Ownership w Open Models — Gus Martins, & Ian Ballantyne, Google DeepMind
Jun 10, 2026 · 20:52
Gus Martins and Ian Ballantyne of Google DeepMind introduce Gemma 4, a family of open-weight models that deliver high quality per parameter, enabling deployment on a single GPU or even a phone. They argue that the models' efficiency — a 31B model rivals those twenty times larger — and the shift to Apache 2.0 licensing remove barriers for sovereign institutions like those in Ukraine, Bulgaria, and Brazil. Ian demonstrates multi-agent translation running locally on an M4 Mac, showcasing ownership and control over agentic workloads.

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind
Jun 9, 2026 · 19:34
Thor Schaeff from Google DeepMind presents the Gemini audio stack—Gemini 3 Flash Preview for deep audio understanding, Gemini 3.1 Flash Live for real-time sound-to-sound multimodal interaction, and Lyria 3 for music generation. He shows how a single API call extracts speaker labels, timestamps, emotions, language detection, and translation, and how speech generation uses a 'director's note' to modify a base voice's accent and tone. The talk culminates in a live demo where the Gemini Live model uses Lyria via tool calls to generate a German techno schlager about the UK startup scene.