AI scaled

Transformers / LLMs / coding agents

Transformers and LLMs have enabled agentic AI systems that autonomously solve software engineering tasks, with frontier models now matching or exceeding professional developer capabilities on real-world codebase tasks.

What to watch next

Watch for: test-time compute scaling that lets reasoning models surpass human performance on complex multi-step planning; truly agentic systems that coordinate multiple specialized coding agents in parallel; and interpretable reasoning traces that make AI agent decisions verifiable and debuggable.

Key sub-ideas & techniques

Coding agents — Agentic LLMs that autonomously edit, run, and debug code have shifted programming from write-then-run to delegate-then-review — Cursor, Devin, Claude Code, and GitHub Copilot Workspace are now standard tooling for many engineering teams. [source]
Mixture-of-Experts (MoE) — Conditional computation that activates only a small fraction of parameters per token (~5.5% for DeepSeek-V3: 37B of 671B) keeps trillion-parameter models tractable to serve by decoupling parameter count from per-token cost. [source]
Long-context windows — Production context lengths exploded from 4K (GPT-3.5) to 1M+ tokens (Gemini, Claude Sonnet 4, Qwen2.5-1M), enabling whole-codebase reasoning, long-document analysis, and persistent agent memory. [source]
Multimodal frontier models — Native multimodal training (vision + audio + text + code in one model) replaced bolt-on adapters and is now the default for frontier systems, supporting voice agents, screen-reading, and embodied use cases. [source]
Open-weight catch-up — Open-weight models (Llama, Qwen, DeepSeek) closed most of the gap with closed-frontier systems, with DeepSeek-V3 and R1 published with permissive MIT-style licenses, restructuring the geopolitics of AI compute. [source]
Model Context Protocol (MCP) — Anthropic's open MCP standard (Nov 2024) became the lingua franca for connecting LLMs to external tools and data — adopted across Claude, OpenAI, and most agent frameworks, replacing one-off plugin systems with a single client/server protocol. [source]
AWS MCP Server (managed) — Managed MCP server providing secure, auditable agent access to AWS services with file uploads, long-running operations, sandboxed Python, IAM guardrails, and CloudTrail logging. [source]
GPT-Realtime-2 voice stack — Native real-time voice agent stack with GPT-5-class reasoning, 70+ language live translation, and streaming Whisper transcription, exposed via the OpenAI API. [source]
OpenAI GPT-5.5-Cyber (Trusted Access for Cyber) — May 7, 2026: a domain-specialized variant of GPT-5.5 for vetted cyber defenders, launched through the Trusted Access for Cyber identity tier. It permits bug hunting, malware analysis, and reverse engineering, but blocks credential theft and malware writing. Cisco and CrowdStrike are launch partners; sources cited by Axios put its vulnerability-finding capability roughly on par with Anthropic's Mythos Preview. [source]
OS-native agentic AI layer — A device-level agent that understands screen context, moves across apps and the browser, and completes multi-step tasks with the user in the loop — turning the smartphone OS itself into the agent surface. [source]
Streaming / predictive RAG for voice agents — Decouple retrieval from generation so the vector-DB round-trip never blocks speech: a background agent anticipates likely queries and pre-fetches into a fast cache while the speaking agent reads only from cache (or retrieves in parallel with user speech). [source]
IndexShare shared sparse-attention indexer — A sparse-attention efficiency method that reuses a single indexer across groups of four attention layers, reducing per-token FLOPs by 2.9x at 1M context, demonstrated at frontier quality in GLM-5.2. [source]
OpenAI GPT-Live (full-duplex voice) — Full-duplex voice model that listens and speaks simultaneously (unlike turn-based Advanced Voice Mode), delegating search and reasoning to GPT-5.5 in the background. GPT-Live-1 becomes the default ChatGPT Voice for Go, Plus, and Pro users and ships with new safety training targeting emotional reliance and self-harm. [source]
Kimi K3 — 2.8T-parameter open-weight MoE LLM from Moonshot AI, among the largest open models released to date
Inkling — 975B-param open-weight multimodal MoE model from Thinking Machines Lab
Gemini 3.6 Flash — Google DeepMind fast-tier model; 1M context, improved token-efficiency vs 3.5 Flash [source]
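The streaming/predictive RAG item in the list above can be sketched with asyncio. This is a minimal illustration under stated assumptions, not any vendor's implementation: `slow_vector_search` is a hypothetical stand-in for a vector-DB round trip (simulated as a 100 ms sleep), `predict_queries` is a hypothetical keyword heuristic standing in for the background "anticipating" agent, and `PredictiveCache` is an illustrative name. What it shows is the decoupling: retrieval runs in background tasks that overlap with user speech, while the speaking path only calls the non-blocking `get_nowait`.

```python
import asyncio
from typing import Optional


async def slow_vector_search(query: str) -> list[str]:
    # Hypothetical stand-in for a vector-DB round trip (simulated 100 ms latency).
    await asyncio.sleep(0.1)
    return [f"doc about {query}"]


def predict_queries(partial_transcript: str) -> list[str]:
    # Hypothetical heuristic standing in for the background anticipating agent.
    if "flight" in partial_transcript:
        return ["flight change policy", "baggage fees"]
    return []


class PredictiveCache:
    """Filled by background prefetch tasks; the speaking path only reads it."""

    def __init__(self, search):
        self._search = search
        self._cache: dict[str, list[str]] = {}
        self._inflight: dict[str, asyncio.Task] = {}

    def prefetch(self, query: str) -> None:
        # Launch retrieval in the background unless cached or already in flight.
        if query in self._cache or query in self._inflight:
            return
        task = asyncio.create_task(self._search(query))
        self._inflight[query] = task
        task.add_done_callback(lambda t, q=query: self._store(q, t))

    def _store(self, query: str, task: asyncio.Task) -> None:
        # Runs when a prefetch task finishes; keeps only successful results.
        self._inflight.pop(query, None)
        if not task.cancelled() and task.exception() is None:
            self._cache[query] = task.result()

    def get_nowait(self, query: str) -> Optional[list[str]]:
        # Never awaits: returns cached docs, or None on a miss.
        return self._cache.get(query)


async def demo():
    cache = PredictiveCache(slow_vector_search)
    # Background agent prefetches while the user is still mid-utterance.
    for q in predict_queries("I want to change my flight"):
        cache.prefetch(q)
    await asyncio.sleep(0.3)  # user keeps talking; retrieval overlaps with speech
    # Speaking agent: cache reads only, so no vector-DB latency on the speech path.
    return cache.get_nowait("flight change policy"), cache.get_nowait("weather")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A miss returns `None` immediately, so the speaking agent can fall back to a filler phrase or a parallel retrieval instead of stalling mid-sentence. A real deployment would also need cache eviction and staleness handling, which this sketch omits.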

Current frontier

Mixture of Experts (MoE) architecture has become the standard for frontier models, with DeepSeek-V3 activating only 5.5% of parameters per token, enabling efficient trillion-parameter deployment. [source]
Context length scaling to 1M tokens is now production-ready with Gemini 2.5 Pro (1M tokens), Claude Sonnet 4 (1M tokens), and open-source Qwen2.5-1M. [source]
Claude Opus 4.7 and GPT-5.5 represent 2026 frontier capabilities with improved instruction persistence and multi-step agentic orchestration. [source]
Claude Mythos Preview scores 93.9% on SWE-bench Verified but only 45.9% on the harder, contamination-resistant SWE-bench Pro, a gap that points to data contamination inflating benchmark scores. [source]
Diffusion transformers with patch-based representations have emerged as a scalable architecture for both language and vision, with Next-Latent Prediction extending self-supervised learning into latent space. [source]
GPT-5.5 Instant produces 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts in medicine, law, and finance (OpenAI release notes, May 5 2026). [source]
May 6, 2026 — Anthropic secures all of SpaceX Colossus 1 (~300 MW / 220K+ GPUs within the month), doubles Claude Code rate limits, raises Opus API limits, and expresses interest in multi-GW orbital compute with SpaceX. [source]
Gemini 3.5 Flash is a frontier-rivaling small model optimized for agentic coding, GA in Antigravity 2.0 and the Gemini API as of May 19, 2026. [source]
Claude Opus 4.8 released (May 28, 2026): 69.2% agentic coding, 84% computer-use on Online-Mind2Web, new dynamic workflows for Claude Code. [source]
Andrej Karpathy joined Anthropic's pretraining team in May 2026 to lead a "use Claude to accelerate pretraining" effort. [source]
MiniMax M3 (June 1, 2026): open-weight model with 1M-token context via MiniMax Sparse Attention, 59.0% SWE-Bench Pro (beats GPT-5.5 and Gemini 3.1 Pro, approaches Opus 4.7), native multimodality and computer use; weights+report within 10 days. [source]
Claude Fable 5, a new 'Mythos-class' tier sitting above Opus, is now generally available and described by Anthropic as SOTA on nearly all tested AI-capability benchmarks, with classifier fallback to Opus 4.8 on cyber/bio/distillation queries. [source]
GLM-5.2, an open-weight frontier coding/agentic model (744B MoE, MIT license, 1M context), is the first open model to exceed 80% on Terminal-Bench 2.1 (81.0) and scores 62.1 on SWE-bench Pro, beating GPT-5.5 at ~1/6 the cost. [source]
Two June 2026 Nature papers show agentic AI performing autonomous, multi-step, end-to-end clinical management at or above board-certified-physician level in simulated settings: MIRA (emergency department; 87.8% vs 78.1% diagnostic accuracy) and Google's AMIE (longitudinal outpatient care; non-inferior to primary care physicians). [source]
OpenAI opened a limited preview of GPT-5.6 Sol, Terra, and Luna; Sol sets a new SOTA on Terminal-Bench 2.1 at $5/$30 per million tokens. Broad release is gated by a US-government access process tied to a forthcoming cyber executive order. [source]
Claude Sonnet 5 (June 30) becomes the default model for Free and Pro users at introductory pricing of $2/$10 per million tokens (rising to $3/$15 after Aug 31); it improves on Sonnet 4.6 on BrowseComp, OSWorld-Verified, and Humanity's Last Exam, and in agentic tool use. [source]
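The MoE items in both lists rest on one mechanism: a router scores each token against every expert, but only the top-k experts actually run. Below is a minimal pure-Python sketch assuming Mixtral-style gating (softmax over the selected top-k router scores). DeepSeek-V3's actual router differs (it uses sigmoid affinity scores plus shared experts), and the toy experts and router weights here are hypothetical. The `active_param_fraction` helper reproduces the ~5.5% figure from DeepSeek-V3's published sizes: 37B activated out of 671B total parameters.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_w, experts, k=2):
    """Route one token vector x through its top-k experts (softmax top-k gating)."""
    # Router score per expert: dot product of the token with that expert's router row.
    scores = [sum(xi * wi for xi, wi in zip(x, row)) for row in router_w]
    # Keep the k highest-scoring experts; every other expert is skipped for this token.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)  # only k expert forward passes per token
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top


def active_param_fraction(active, total):
    # Share of the model's parameters touched per token.
    return active / total


if __name__ == "__main__":
    # Toy setup (hypothetical): 4 experts, expert i scales its input by (i + 1).
    experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(4)]
    router_w = [[0.1, 0.0], [0.9, 0.0], [0.5, 0.0], [0.7, 0.0]]
    out, top = moe_forward([1.0, 0.0], router_w, experts, k=2)
    print(top)  # → [1, 3]
    # DeepSeek-V3: 37B activated of 671B total parameters.
    print(f"{active_param_fraction(37, 671):.1%}")  # → 5.5%
```

Per-token compute scales with k expert passes rather than with the total expert count, which is why total parameters can grow far faster than serving cost.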

Key people

Ashish Vaswani Co-founder and CEO, Essential AI; Co-author of 'Attention Is All You Need' · Essential AI [source]
Noam Shazeer VP Engineering, Gemini Co-lead · Google DeepMind [source]
Dario Amodei CEO · Anthropic [source]
Ilya Sutskever CEO · Safe Superintelligence Inc. (SSI) [source]
Jeff Dean Chief Scientist, Co-lead Gemini · Google DeepMind [source]
Yann LeCun Founder and Chief Scientist · Advanced Machine Intelligence Labs (AMI Labs); Jacob T. Schwartz Chaired Professor, NYU Courant Institute [source]
John Jumper Joining Anthropic (research) after ~9 years leading AlphaFold at Google DeepMind · Anthropic [source]
Dawn Song Professor (AI safety & security, agentic AI, decentralized AI); co-founder Virtue AI; now leading AI safety/security at Meta Superintelligence Labs · UC Berkeley; Berkeley RDI (co-director); Virtue AI (co-founder); Meta Superintelligence Labs [source]

Startups & labs to watch

Cursor (Anysphere) Anysphere Inc. · STARTUP · Series D, $29.3B valuation (2026) — Fastest-growing AI coding IDE, reaching a $29.3B valuation and $2B+ ARR by early 2026; supports parallel multi-agent coding. [source]
Cognition Labs (Devin) Cognition Labs · STARTUP · Series C+, $400M at $10.2B valuation; targeting $25B (2026) — Autonomous AI software engineer scaling ARR from $1M to $73M in 9 months; acquired Windsurf; targeting $25B valuation in April 2026. [source]
Safe Superintelligence Inc. (SSI) SSI · STARTUP · Series B, $2B at $32B valuation (April 2025) — Sutskever's new lab pursuing a safety-first path to AGI, with $3B+ raised at a $32B valuation and significant backing from Alphabet and Nvidia. [source]
Advanced Machine Intelligence Labs (AMI Labs) AMI Labs · STARTUP · Series A, $1.03B at $3.5B (March 2026) — Yann LeCun's new venture focused on world models and open-source development; raised $1.03B at $3.5B pre-money valuation in March 2026. [source]
Anthropic enterprise-AI JV Anthropic / Blackstone / Hellman & Friedman / Goldman Sachs · STARTUP · $1.5B initial capitalization; backed by Blackstone, H&F, Goldman Sachs — First services-led commercialization vehicle for a frontier-lab model; converts mid-market deployments from pure API consumption to verticalized AI services. [source]
Z.ai (Zhipu AI) Z.ai / Zhipu AI · LAB · Well-funded Chinese AI 'tiger' (multi-billion valuation; public-listing track) — Leading Chinese open-weight frontier lab (GLM series); GLM-5.2 is the first open model >80% Terminal-Bench 2.1 and a credible MIT-licensed competitor to closed frontier coding models at ~1/6 cost. [source]
Moonshot AI Moonshot AI · LAB · Alibaba/Tencent-backed — Leading Chinese open-weight frontier lab; Kimi K3 (2.8T MoE) is among the largest open models released. [source]
Thinking Machines Lab Thinking Machines Lab · LAB · $2B seed round (2025) — Frontier lab founded by Mira Murati (former OpenAI CTO); shipped the open-weight Inkling (975B) model and the Tinker fine-tuning platform. [source]