Open-Source Model Router Cuts AI API Costs 40-70% with a Single Endpoint Change — Workweave Router Deep Dive

Workweave Router is a new open-source tool that routes every LLM prompt to the optimal model in under 50ms, cutting API costs by 40-70%. We break down how it works, why model routing matters now more than ever, and the practical setup.

If you run an AI-powered product, your biggest headache after latency is probably cost. You're either overpaying because every prompt hits your most powerful model, or you're hurting quality because you manually routed the cheap one to a use case it can't handle.

A new open-source project called Workweave Router aims to solve exactly this — and the HN community gave it 144 points in its first day, which tells you developers are feeling the pain.

What Is Model Routing and Why Does It Matter Now?

Model routing is the practice of sending each LLM request to the cheapest model that can handle it well, rather than always using your most capable (and expensive) model for everything. It's not new — companies like OpenRouter and Portkey have offered API-level routing for a while. But the landscape shifted in mid-2026:

1. You have too many good models to choose from. Between OpenAI's GPT-5.5/5.6 family, Anthropic's Claude Mythos/Fable/Haiku, Google's Gemini 3.5 series, and various open-weight models, the list of capable LLMs has grown past 40+ production-ready options.

2. The price gap between premium and affordable models is widening. GPT-5.6 Sol costs $5/M input tokens, while fast models like Luna or Claude Haiku cost a fraction. If even 30% of your traffic can be handled by a cheap model, that's a massive savings.

3. Models have uneven strengths. One model might be great at code generation but bad at creative writing. A router that learns these patterns can send each task to its best-fit model — not just the cheapest.

How Workweave Router Works

The GitHub repo describes it simply: "Routes every prompt to the right model in <50ms. Cut costs 40-70% with just an endpoint change."

Here's the architecture:

• Smart routing: The router evaluates each incoming prompt (embedding-based classification, not just keyword matching) and decides which backend model should handle it

• Sub-50ms overhead: The routing decision itself is fast enough that it doesn't meaningfully add to your p95 latency

• Drop-in endpoint: You change your API URL to the router's endpoint, and it handles the dispatch to OpenAI, Anthropic, or whatever provider you configure

• Agent-friendly: Designed specifically for agentic systems (Claude Code, Codex, Cursor) where cost multiplies quickly because agents make many back-and-forth calls per task

It's built in Go with a frontend dashboard, so you get visibility into which models are being used and how much you're saving.

Why This Is Timely

The GPT-5.6 launch makes model routing even more relevant. OpenAI now has three tiers (Sol, Terra, Luna), and most teams will want to use all three for different purposes. Manually coding this logic into your application is brittle. An external router decouples the routing policy from your application code.

Meanwhile, the government restrictions on frontier models (Anthropic Mythos/Fable, OpenAI GPT-5.6 being limited to trusted partners) mean that building on a single model is a reliability risk. A router that can fall back to different providers gives you a layer of resilience.

What the Community Says

The 144+ HN points on the launch thread suggest real demand. Commenters raised a few practical concerns:

• Latency for real-time use: Sub-50ms is fine for most workloads, but if you're building a chatbot that needs instant responses, every millisecond counts

• OpenAI/Anthropic TOS: Some providers' terms of service restrict how you can route their models through third-party intermediaries — you'll want to check your specific agreement

• Classification quality: The router's decision-making is only as good as its embedding model. False positives (sending a hard reasoning task to a cheap model) can tank quality

Setting It Up

The project is open source on GitHub. The basic setup is:

1. Clone the repo from github.com/workweave/router 2. Configure your API keys in the env file 3. Run with Docker or Go directly 4. Point your application to the router endpoint instead of the provider directly

The frontend dashboard gives you visibility into routing decisions and cost savings in near real-time.

The Bigger Picture

Workweave Router is part of a broader trend. As AI API costs remain the #1 concern for developers building with LLMs (our own 2026 pricing guide tracked 42 models across 10 providers), smart routing is becoming standard infrastructure.

We're seeing a few patterns:

• Provider-agnostic tooling is winning. Lock-in to a single model or provider is increasingly seen as a risk, especially with government restrictions creating sudden availability gaps

• Cost optimization is moving from manual to automatic. Instead of developers hard-coding "use model A for code, model B for chat," routers can learn usage patterns

• The router market is heating up. OpenRouter, Portkey, and now Workweave — expect consolidation as this becomes table stakes for AI infrastructure

Practical Takeaway

If you're spending more than $500/month on LLM API calls and you aren't using some form of model routing, you're probably overpaying by 30-50%. Workweave Router is a solid open-source option to try before committing to a paid service. Given the widening price gaps between model tiers and the increasing number of capable options, this kind of tooling is becoming hard to justify living without.