AI
scaled
Transformers / LLMs / coding agents
Transformers and LLMs have enabled agentic AI systems that autonomously solve software engineering tasks, with frontier models now matching or exceeding professional developer capabilities on real-world codebase tasks.
What to watch next
Watch for test-time compute scaling to push reasoning past human performance on complex multi-step planning; the emergence of agentic systems that coordinate multiple specialized coding agents in parallel; and interpretable reasoning traces that make AI agent decisions verifiable and debuggable.
Key sub-ideas & techniques
- Coding agents — Agentic LLMs that autonomously edit, run, and debug code have shifted programming from write-then-run to delegate-then-review. Cursor, Devin, Claude Code, and GitHub Copilot Workspace are now standard tooling for many engineering teams. [source]
- Mixture-of-Experts (MoE) — Conditional computation that activates only a small fraction of parameters per token (DeepSeek-V3 ~5.5%) lets trillion-parameter models stay tractable to serve, decoupling parameter count from per-token cost. [source]
- Long-context windows — Production context lengths exploded from 4K (GPT-3.5) to 1M+ tokens (Gemini, Claude Sonnet 4, Qwen2.5-1M), enabling whole-codebase reasoning, long-document analysis, and persistent agent memory. [source]
- Multimodal frontier models — Native multimodal training (vision + audio + text + code in one model) replaced bolt-on adapters and is now the default for frontier systems, supporting voice agents, screen-reading, and embodied use cases. [source]
- Open-weight catch-up — Open-weight models (Llama, Qwen, DeepSeek) closed most of the gap with closed-frontier systems, with DeepSeek-V3 and R1 published with permissive MIT-style licenses, restructuring the geopolitics of AI compute. [source]
- Model Context Protocol (MCP) — Anthropic's open MCP standard (Nov 2024) became the lingua franca for connecting LLMs to external tools and data. It has been adopted across Claude, OpenAI, and most agent frameworks, replacing one-off plugin systems with a single client/server protocol. [source]
- AWS MCP Server (managed) — Managed MCP server providing secure, auditable agent access to AWS services with file uploads, long-running operations, sandboxed Python, IAM guardrails, and CloudTrail logging. [source]
- GPT-Realtime-2 voice stack — Native real-time voice agent stack with GPT-5-class reasoning, 70+ language live translation, and streaming Whisper transcription, exposed via the OpenAI API. [source]
- OpenAI GPT-5.5-Cyber (Trusted Access for Cyber) — May 7, 2026 — Domain-specialized variant of GPT-5.5 for vetted cyber defenders, launched via the Trusted Access for Cyber identity tier. It permits bug-hunting, malware analysis, and reverse engineering, and blocks credential theft and malware writing. Cisco and CrowdStrike are launch partners; Axios sources put its vulnerability-finding capability roughly on par with Anthropic's Mythos Preview. [source]
- OS-native agentic AI layer — A device-level agent that understands screen context, moves across apps and the browser, and completes multi-step tasks with the user in the loop — turning the smartphone OS itself into the agent surface. [source]
- Streaming / predictive RAG for voice agents — Decouple retrieval from generation so the vector-DB round-trip never blocks speech: a background agent anticipates likely queries and pre-fetches into a fast cache while the speaking agent reads only from cache (or retrieves in parallel with user speech). [source]
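The streaming / predictive RAG pattern in the last item above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's implementation: `slow_retrieve` is a hypothetical stand-in for a vector-DB call, `PrefetchCache` is an invented name, and the predicted queries are hard-coded rather than produced by a model.

```python
import asyncio
from typing import Optional


async def slow_retrieve(query: str) -> str:
    """Hypothetical vector-DB lookup; the sleep models the network round-trip."""
    await asyncio.sleep(0.05)
    return f"passages for: {query}"


class PrefetchCache:
    """In-memory cache that the speaking agent reads without awaiting retrieval."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    async def prefetch(self, predicted_queries: list[str]) -> None:
        # Background path: fetch every predicted query concurrently.
        results = await asyncio.gather(
            *(slow_retrieve(q) for q in predicted_queries)
        )
        self._store.update(zip(predicted_queries, results))

    def read(self, query: str) -> Optional[str]:
        # Speaking path: a plain dict lookup, so it never waits on the vector DB.
        return self._store.get(query)


async def demo() -> tuple[Optional[str], Optional[str]]:
    cache = PrefetchCache()
    # Kick off prefetching as a background task (e.g. while the user is speaking).
    task = asyncio.create_task(
        cache.prefetch(["refund policy", "shipping times"])
    )
    # The demo awaits the task only to make the result deterministic.
    await task
    return cache.read("refund policy"), cache.read("warranty")


hit, miss = asyncio.run(demo())
```

Here `hit` holds the pre-fetched passages while `miss` is `None`, which is the case where a real system would fall back to retrieving in parallel with user speech.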
Current frontier
- Mixture-of-Experts (MoE) architecture has become standard for frontier models: DeepSeek-V3 activates only about 5.5% of its parameters per token (37B of 671B), which keeps trillion-parameter-scale deployment efficient. [source]
- Context lengths of 1M tokens and beyond are now production-ready, with Gemini 2.5 Pro (2M tokens), Claude Sonnet 4 (1M tokens), and the open-weight Qwen2.5-1M. [source]
- Claude Opus 4.7 and GPT-5.5 represent 2026 frontier capabilities with improved instruction persistence and multi-step agentic orchestration. [source]
- Claude Mythos Preview achieves 93.9% on SWE-bench Verified but only 45.9% on the harder, contamination-free SWE-bench Pro, a gap that suggests data contamination inflates SWE-bench Verified scores. [source]
- Diffusion transformers with patch-based representations have emerged as a scalable architecture for both language and vision domains, with Next-Latent Prediction extending self-supervised learning to latent space. [source]
- GPT-5.5 Instant produces 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts in medicine, law, and finance (OpenAI release notes, May 5 2026). [source]
- May 6, 2026 — Anthropic secures all of SpaceX Colossus 1 (~300 MW / 220K+ GPUs within the month), doubles Claude Code rate limits, raises Opus API limits, and expresses interest in multi-GW orbital compute with SpaceX. [source]
- Gemini 3.5 Flash is a frontier-rivaling small model optimized for agentic coding, GA in Antigravity 2.0 and the Gemini API as of May 19, 2026. [source]
- Claude Opus 4.8 released (May 28, 2026): 69.2% agentic coding, 84% computer-use on Online-Mind2Web, new dynamic workflows for Claude Code. [source]
- Andrej Karpathy joined Anthropic's pretraining team in May 2026 to lead a 'use Claude to accelerate pretraining' effort. [source]
- MiniMax M3 (June 1, 2026): open-weight model with 1M-token context via MiniMax Sparse Attention, 59.0% on SWE-Bench Pro (beats GPT-5.5 and Gemini 3.1 Pro, approaches Opus 4.7), native multimodality and computer use; weights and report to follow within 10 days. [source]
- Claude Fable 5, a new 'Mythos-class' tier sitting above Opus, is now generally available and described by Anthropic as SOTA on nearly all tested AI-capability benchmarks, with classifier fallback to Opus 4.8 on cyber/bio/distillation queries. [source]
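The Mixture-of-Experts figure cited in this list can be checked with simple arithmetic, and the routing idea behind it fits in a short sketch. The 37B active / 671B total parameter counts are the publicly reported DeepSeek-V3 figures, not numbers stated on this page. The router below is a generic top-k softmax gate written for illustration; it is not DeepSeek's actual routing scheme, and the function names and example logits are hypothetical.

```python
import math


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used per token under conditional computation."""
    return active_params / total_params


def top_k_route(gate_logits: list[float], k: int) -> dict[int, float]:
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )[:k]
    peak = max(gate_logits[i] for i in top)  # subtract the max for stability
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Reported DeepSeek-V3 sizes (assumption from public figures, not this page).
frac = active_fraction(37e9, 671e9)

# Four hypothetical experts; only the two best-scoring ones run for this token.
weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
```

With these inputs `frac` rounds to 5.5%, and only experts 1 and 3 receive non-zero weight, which is how parameter count is decoupled from per-token cost.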
Key people
- Ashish Vaswani · Co-founder and CEO, Essential AI; Co-author of 'Attention Is All You Need' · Essential AI [source]
- Noam Shazeer · VP Engineering, Gemini Co-lead · Google DeepMind [source]
- Dario Amodei · CEO · Anthropic [source]
- Ilya Sutskever · CEO · Safe Superintelligence Inc. (SSI) [source]
- Jeff Dean · Chief Scientist, Co-lead Gemini · Google DeepMind [source]
- Yann LeCun · Founder and Chief Scientist · Advanced Machine Intelligence Labs (AMI Labs); Jacob T. Schwartz Chaired Professor, NYU Courant Institute [source]
Startups & labs to watch
- Cursor (Anysphere) · Anysphere Inc. · STARTUP · Series D, $29.3B valuation (2026) — Fastest-growing AI coding IDE, reaching $29.3B valuation with $2B+ ARR by early 2026, supporting parallel multi-agent coding. [source]
- Cognition Labs (Devin) · Cognition Labs · STARTUP · Series C+, $400M at $10.2B valuation; targeting $25B (2026) — Autonomous AI software engineer scaling ARR from $1M to $73M in 9 months; acquired Windsurf; targeting $25B valuation in April 2026. [source]
- Safe Superintelligence Inc. (SSI) · SSI · STARTUP · Series B, $2B at $32B valuation (April 2025) — Sutskever's new lab focused on safety-first AGI path with $3B+ raised and $32B valuation; significant backing from Alphabet and Nvidia. [source]
- Advanced Machine Intelligence Labs (AMI Labs) · AMI Labs · STARTUP · Series A, $1.03B at $3.5B (March 2026) — Yann LeCun's new venture focused on world models and open-source development; raised $1.03B at $3.5B pre-money valuation in March 2026. [source]
- Anthropic enterprise-AI JV · Anthropic / Blackstone / Hellman & Friedman / Goldman Sachs · STARTUP · $1.5B initial capitalization; backed by Blackstone, H&F, Goldman Sachs — First services-led commercialization vehicle for a frontier-lab model; converts mid-market deployments from pure API consumption to verticalized AI services. [source]