A company discussed on AI Engineer.

Real Time Video Diffusion on a Single GPU — Ziv Ilan, Nvidia
Jun 16, 2026 · 18:46
Nvidia's Ziv Ilan explains how combining quantization, caching, and step distillation enables near real-time video diffusion on a single Blackwell B200 GPU. In work with Black Forest Labs on Flux 2, dynamic quantization reduces memory and compute, caching skips redundant denoising steps, and distillation cuts steps from fifty to as few as one. The open-source FastGen repo packages these post-training and sharding techniques, achieving 10–200x speedups for real-time generation.

Stop Making Models Bigger, Make Them Behave — Kobie Crawford, Snorkel
Jun 10, 2026 · 20:56
Kobie Crawford of Snorkel explains how a 4B-parameter model fine-tuned via RL for under $500 outperformed Qwen 3 235B on financial analysis tool use. The key was training tool discipline (inspecting schemas and self-correcting errors), not deeper reasoning. Single-table training alone raised the multi-table FinQA benchmark score from 13.9% to 26.6%, and breaking evals into rubrics identifies which behaviors to fix.