OpenAI Unveils Jalapeño — Its First Custom AI Inference Chip Built with Broadcom

OpenAI and Broadcom unveil Jalapeño, a custom LLM inference accelerator. Early testing shows substantially better performance-per-watt than current state-of-the-art, with gigawatt-scale deployment starting in late 2026.

OpenAI just took direct control of its hardware destiny. On June 24, the company unveiled Jalapeño, its first in-house AI inference processor — designed from scratch for LLM workloads and built in collaboration with Broadcom and Celestica.

This isn't a modified GPU or a repurposed accelerator. Jalapeño is a blank-slate design optimized specifically for large language model inference. Early testing shows "substantially better performance per watt than current state-of-the-art," according to OpenAI's official announcement, and engineering samples are already running production workloads, including GPT-5.3-Codex-Spark, in the lab.

Why This Matters for API Users

If you use OpenAI's API, ChatGPT, or Codex, Jalapeño's economics could eventually show up in your bill. Here's the chain:

• Better inference efficiency → lower cost per token

• Gigawatt-scale deployment (starting late 2026 with Microsoft and other data center partners) → more capacity for less spend

• Multi-generational roadmap → compounding improvements over time

OpenAI's President Greg Brockman put it directly: "By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI toward broader access." Lower inference costs mean OpenAI can either reduce API pricing or offer more capable models at the same price point — both scenarios benefit developers and businesses.
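
To make the chain above concrete, here is a minimal Python sketch of how a performance-per-watt gain would flow into energy cost per token and into capacity at a fixed power budget. Every number in it (the baseline tokens per joule, the 1.5x gain, the electricity price) is a hypothetical placeholder, since OpenAI has not published figures, and electricity is only one component of what a token costs to serve.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 million joules


def energy_cost_per_million_tokens(tokens_per_joule: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    joules = 1_000_000 / tokens_per_joule
    return joules / JOULES_PER_KWH * usd_per_kwh


def tokens_per_second(tokens_per_joule: float, power_watts: float) -> float:
    """Aggregate throughput at a fixed power budget (1 W = 1 J/s)."""
    return tokens_per_joule * power_watts


# Hypothetical inputs, for illustration only (not published figures).
BASELINE_TOKENS_PER_JOULE = 2.0
EFFICIENCY_GAIN = 1.5      # assumed perf-per-watt improvement
USD_PER_KWH = 0.08         # assumed electricity price
POWER_BUDGET_WATTS = 1e9   # one gigawatt

improved_tokens_per_joule = BASELINE_TOKENS_PER_JOULE * EFFICIENCY_GAIN

baseline_cost = energy_cost_per_million_tokens(BASELINE_TOKENS_PER_JOULE, USD_PER_KWH)
improved_cost = energy_cost_per_million_tokens(improved_tokens_per_joule, USD_PER_KWH)

baseline_capacity = tokens_per_second(BASELINE_TOKENS_PER_JOULE, POWER_BUDGET_WATTS)
improved_capacity = tokens_per_second(improved_tokens_per_joule, POWER_BUDGET_WATTS)

print(f"energy cost per 1M tokens: ${baseline_cost:.4f} -> ${improved_cost:.4f}")
print(f"capacity at 1 GW: {baseline_capacity:.2e} -> {improved_capacity:.2e} tokens/s")
```

Whatever the real inputs turn out to be, the relationship is the same: an efficiency gain of k divides energy cost per token by k and multiplies capacity at a fixed power budget by k.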

Nine Months from Design to Silicon

One of the more remarkable numbers in the announcement is the development timeline: Jalapeño went from initial design to manufacturing tape-out in just nine months. OpenAI says this is the fastest ASIC development cycle ever achieved in high-performance semiconductors.

The speed came from three things: deep software-hardware co-design between OpenAI's engineering teams, Broadcom's silicon implementation expertise, and, most interestingly, the use of OpenAI's own models to accelerate parts of the chip design and optimization process. The models you use today helped design the chips that will serve tomorrow's models.

What Jalapeño Actually Does Differently

The chip architecture was built around OpenAI's real-world serving patterns. Instead of being a general-purpose accelerator adapted for AI (which is how most AI chips work), Jalapeño was designed from the ground up around:

• The specific kernels OpenAI runs most frequently

• Memory movement patterns that dominate LLM inference

• Networking requirements for distributed serving

• Serving patterns for interactive products at scale

OpenAI claims the architecture "reduces data movement and balances compute, memory, and networking resources to achieve realized utilization much closer to theoretical peak performance." In plain English: less silicon wasted on idle cycles, more useful computation per watt.
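
One common way to reason about the gap between realized and peak performance is the roofline model, in which attainable throughput is capped by either compute or memory bandwidth. The Python sketch below is a generic illustration of that model, not OpenAI's methodology, and the peak and bandwidth figures are hypothetical placeholders rather than Jalapeño specifications.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float, flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute ceiling and the memory ceiling."""
    return min(peak_flops, mem_bandwidth * flops_per_byte)


def utilization(peak_flops: float, mem_bandwidth: float, flops_per_byte: float) -> float:
    """Fraction of theoretical peak compute that the workload can reach."""
    return attainable_flops(peak_flops, mem_bandwidth, flops_per_byte) / peak_flops


# Hypothetical chip parameters, for illustration only.
PEAK_FLOPS = 1e15      # operations per second
MEM_BANDWIDTH = 4e12   # bytes per second

# Arithmetic intensity = operations performed per byte moved from memory.
workloads = [
    ("low data reuse (memory-bound)", 1.0),
    ("high data reuse (compute-bound)", 500.0),
]

for label, intensity in workloads:
    share = utilization(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
    print(f"{label}: {share:.1%} of peak")
```

Under these made-up numbers, a workload that performs one operation per byte fetched reaches well under 1% of peak compute, which is why cutting data movement matters so much for realized utilization.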

Broadcom's Tomahawk networking silicon connects the chips at scale — critical for serving models that don't fit on a single die.

The Full-Stack Bet

Jalapeño makes OpenAI one of the few companies that own the entire AI stack: chips, models, serving infrastructure, and end-user products. The closest comparison is Google (TPU → Gemini → Google Cloud/Antigravity). Nvidia sells the shovels; OpenAI is building its own mine.

The timing is strategic. OpenAI is going public, and owning its inference silicon means better margins and more predictable capacity — two things Wall Street likes. It also reduces dependency on Nvidia's supply-constrained GPUs.

What's Still Unknown

OpenAI hasn't released detailed benchmark numbers yet. The company says a "detailed technical report on performance will be presented in the coming months." We don't know:

• Exact TOPS or FLOPS figures

• Power draw per chip

• How it compares against Nvidia B200/B300 or Google TPU v6

• Pricing or availability outside OpenAI's own infrastructure

What we do know: the first-generation chip is designed to work with all LLMs, not just OpenAI's models. If the chip is truly general across transformer-based models, it could eventually serve competitors' models too, which would open the door to an inference-as-a-service business down the line.

Bottom Line

Jalapeño is a signal. OpenAI is serious about controlling its infrastructure costs, which directly affects API pricing and product capabilities. If the chip performs as early tests suggest, by 2027 we could see materially cheaper GPT-class inference, or models that run much longer reasoning chains for the same price. Builders should watch the performance report expected in the coming months, which will tell us whether the "substantially better" claim holds up at scale.

Sources:

• OpenAI and Broadcom unveil LLM-optimized inference chip (Official, June 24, 2026)

• OpenAI unveils its first custom chip, built by Broadcom (TechCrunch, June 24, 2026)

• Broadcom corporate announcement