Hardware
scaled
AI accelerators / advanced compute
Between 2024 and 2026, GPU-dominated AI compute fractured into specialized lanes: wafer-scale (Cerebras), inference ASICs (Groq, Etched), photonic interconnect (Lightmatter), and per-hyperscaler custom silicon (TPU/Trainium/Maia/MTIA). Together these are pulling a claimed 20–70% of opex out of training and inference.
What to watch next
Rubin shipping (2H 2026) and the Cerebras-OpenAI 750 MW deployment going live; Lightmatter Passage co-packaged optics in production; neuromorphic chips (Loihi 3, NorthPole) crossing the practical edge-AI threshold; whether NVIDIA-Groq integration cements inference dominance.
Key sub-ideas & techniques
- GPU scaling (Hopper → Blackwell → Rubin) — NVIDIA's flagship line keeps at least doubling effective AI throughput per generation: Blackwell (208B transistors, ~20 PFLOPS FP4 per B200) ships at scale in 2025, and Rubin (~3× Blackwell; 1.2 EFLOPS FP8 training, a figure quoted at rack scale rather than per GPU) follows in 2H 2026. [source]
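- Wafer-scale processors — Cerebras WSE-3 (5nm, 4 trillion transistors, 900K cores, 44 GB on-chip SRAM, 21 PB/s memory bandwidth) keeps compute and memory on one wafer, removing chip-to-chip data movement inside the processor. Cerebras positions a single system for training multi-trillion-parameter models, with weights beyond the 44 GB of SRAM streamed from external memory. [source]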
- Inference-specialized ASICs — Fixed-function silicon optimized for narrow inference workloads: Groq's LPU, with its deterministic conveyor-belt architecture, and Etched's transformer-only Sohu ASIC compete with GPUs on $/token. [source]
- Hyperscaler custom silicon — Cloud platforms have abandoned GPU monoculture: Google Trillium / TPU v6, AWS Trainium 3, Meta MTIA 300/500, and Microsoft Maia 200 are each tuned to their owner's workloads and cost structure. [source]
- Photonic & neuromorphic compute — Lightmatter's photonic processor and the IBM NorthPole / Intel Loihi 3 neuromorphic chips break with the GPU paradigm, using photons or spiking neurons to target orders-of-magnitude better energy per inference on workloads that suit them. [source]
- Google four-partner inference TPU supply chain — Google is diversifying its custom AI silicon supply by adding Marvell (memory processing unit + next-gen inference TPU) alongside Broadcom and MediaTek, signaling a hyperscaler push to commoditize inference accelerators and reduce NVIDIA dependence. [source]
- NVIDIA RTX Spark (N1X) client AI superchip — An Arm CPU + Blackwell GPU SoC with 128 GB of unified memory that brings data-center-class local AI inference (120B-param LLMs, 1M-token context) to consumer Windows PCs. [source]
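The memory figures above (44 GB of on-chip SRAM on WSE-3, 128 GB of unified memory on RTX Spark for 120B-parameter models) can be sanity-checked with weights-only arithmetic. The sketch below is a rough illustration, not a vendor sizing method. It assumes decimal gigabytes and counts model weights only (no KV cache, activations, or runtime overhead); the helper names `weight_bytes`, `max_params`, and `fits` are hypothetical.

```python
"""Weights-only memory arithmetic for the capacities quoted above.

Assumptions (not from the source): decimal GB (1e9 bytes), weights only,
no KV cache, activations, or runtime overhead. Helper names are illustrative.
"""

GB = 1e9  # decimal gigabyte (assumption)


def weight_bytes(params: float, bits_per_param: int) -> float:
    """Bytes needed to store `params` weights at the given precision."""
    return params * bits_per_param / 8


def max_params(capacity_bytes: float, bits_per_param: int) -> float:
    """Largest weight count that fits in `capacity_bytes` at the given precision."""
    return capacity_bytes * 8 / bits_per_param


def fits(params: float, bits_per_param: int, capacity_bytes: float) -> bool:
    """True if the weights alone fit in the given capacity."""
    return weight_bytes(params, bits_per_param) <= capacity_bytes


if __name__ == "__main__":
    wse3_sram = 44 * GB       # on-chip SRAM quoted for WSE-3
    spark_unified = 128 * GB  # unified memory quoted for RTX Spark
    llm_params = 120e9        # 120B-parameter model quoted for RTX Spark

    for bits in (16, 8, 4):
        on_wafer = max_params(wse3_sram, bits) / 1e9
        need_gb = weight_bytes(llm_params, bits) / GB
        ok = fits(llm_params, bits, spark_unified)
        print(f"{bits:>2}-bit: WSE-3 SRAM holds ~{on_wafer:.0f}B weights; "
              f"120B model needs {need_gb:.0f} GB, fits in 128 GB: {ok}")
```

Under these assumptions, a 120B-parameter model's weights fit in 128 GB only at 8-bit precision or below, which is why quantization matters for client-side inference. The same arithmetic shows that 44 GB of SRAM holds about 22B weights at 16-bit, so trillion-parameter training on a wafer has to keep most weights off-wafer.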
Current frontier
- NVIDIA Rubin GPUs ship in 2H 2026 with ~3× Blackwell performance and 1.2 EFLOPS FP8 training capability; Rubin Ultra follows in 2027. [source]
- Cerebras Systems signed a >$10B agreement with OpenAI (Jan 2026) for ~750 MW of inference compute through 2028 and filed for IPO at ~$23B valuation in April 2026. [source]
- NVIDIA acquired Groq for ~$20B (Dec 2025) and integrated the LPU into the product stack it presented at GTC 2026; the Groq 3 LPU achieves ~1,500 tokens/sec on agentic AI inference. [source]
- Hyperscaler custom silicon scaled across all four majors: Google Trillium (4× LLM training perf vs v5e, 2× HBM bandwidth), AWS Trainium 3 (2.52 PFLOPS MXFP8, 144 GB HBM3e), Meta MTIA (4× HBM bandwidth across generations), and Microsoft Maia 200 (10 PFLOPS FP4). [source]
- Intel Loihi 3 (Jan 2026) delivers 8M neurons / 64B synapses at 4nm, running at ~1.2 W peak vs 300 W+ for GPU equivalents, a sign that neuromorphic compute is crossing the threshold for practical edge AI. [source]
- Cerebras' $3.5B IPO at ~$26.6B valuation, anchored by a $20B+ OpenAI compute deal, validates wafer-scale silicon as a credible alternative to GPU dominance for AI inference. [source]
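The $/token and watts comparisons in this section reduce to two small formulas: cost per token is hourly system cost divided by tokens produced per hour, and energy per inference is power divided by inference rate. The sketch below is illustrative only. The hourly cost and the shared inference rate are made-up placeholders, not figures from the source, and the function names are hypothetical; only the ~1,500 tokens/sec, ~1.2 W, and 300 W values come from the bullets above.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_sec: float) -> float:
    """USD per one million generated tokens at a sustained throughput."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_cost_usd / tokens_per_hour * 1e6


def joules_per_inference(power_watts: float, inferences_per_sec: float) -> float:
    """Energy per inference: watts are joules per second, so divide by the rate."""
    if inferences_per_sec <= 0:
        raise ValueError("inferences_per_sec must be positive")
    return power_watts / inferences_per_sec


def power_ratio(baseline_watts: float, candidate_watts: float) -> float:
    """How many times more power the baseline draws than the candidate."""
    return baseline_watts / candidate_watts


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not from the source).
    hypothetical_hourly_cost_usd = 10.0
    hypothetical_inferences_per_sec = 100.0

    # Figures quoted in the bullets above.
    lpu_tokens_per_sec = 1500.0
    loihi_watts, gpu_watts = 1.2, 300.0

    usd = cost_per_million_tokens(hypothetical_hourly_cost_usd, lpu_tokens_per_sec)
    print(f"Cost at the placeholder hourly rate: ${usd:.2f} per 1M tokens")

    ratio = power_ratio(gpu_watts, loihi_watts)
    print(f"Peak power ratio, GPU vs Loihi 3: ~{ratio:.0f}x")

    # Energy per inference only matches the power ratio if both chips
    # sustain the same rate, which this placeholder rate simply assumes.
    for name, watts in (("Loihi 3", loihi_watts), ("GPU", gpu_watts)):
        joules = joules_per_inference(watts, hypothetical_inferences_per_sec)
        print(f"{name}: {joules:.3f} J per inference at the placeholder rate")
```

A peak power ratio of roughly 250× becomes an equal energy-per-inference advantage only when both chips sustain the same inference rate on the workload, so the comparison should be rerun with measured throughput for any real deployment.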
Key people
- Jensen Huang, CEO & Co-founder · NVIDIA [source]
- Andrew Feldman, Co-founder & CEO · Cerebras Systems [source]
- Jonathan Ross, Founder · Groq (acquired by NVIDIA Dec 2025) [source]
- Jim Keller, CEO · Tenstorrent [source]
- Norm Jouppi, Engineering Fellow and lead architect of the Google TPU · Google [source]
- Nick Harris, Co-founder & CEO · Lightmatter [source]
Startups & labs to watch
- Tenstorrent · STARTUP · Series B (2024) led by Khosla Ventures and others — Jim Keller's open-source RISC-V + Metalium ASIC effort, pursuing GPU alternatives for hyperscale clusters via 12×400 Gbps Ethernet scale-out; Blackhole successor scaling. [source]
- Rebellions Inc. · STARTUP · Series C 2025, $1.4B valuation; Arm + Samsung Ventures + Synopsys — Korean accelerator maker with a 4nm UCIe-Advanced quad-chiplet architecture (REBEL-Quad) and 144 GB HBM3e, claiming ~3.2× tokens/W vs H200. [source]
- FuriosaAI (RNGD next-gen) · STARTUP · Rejected $800M Meta acquisition; raised capital via LG and OpenAI partnerships — Korean AI ASIC built on TSMC 5nm; the founder declined Meta's offer in March 2025 and is pursuing a 2027 IPO. [source]
- Lightmatter (photonic compute) · STARTUP · Series D ~$400M (2024); SoftBank Vision Fund, GV — Shipping the Envise photonic processor and Passage L200/L20 co-packaged optics; the leading commercial bet on photonic interconnect for AI datacenters. [source]