Cloudflare Forces AI Companies to Separate Search and Training Crawlers — What Site Owners Need to Know

Starting September 15, Cloudflare will block bots that mix search indexing with AI training. The new policy gives site owners granular control over Search, Agent, and Training crawlers.

Cloudflare announced major changes to how website owners can manage AI traffic, rolling out a new bot classification system that distinguishes between Search, Agent, and Training crawlers — and giving site operators fine-grained control over each category.

Starting September 15, 2026, Cloudflare will block bots that combine search indexing with AI training in a single crawler, forcing AI companies to use separate crawlers for different purposes. This is a significant escalation in the ongoing battle between content publishers and AI companies over data usage.

The three-category system

Instead of treating all automated traffic as a monolith, Cloudflare is introducing a pragmatic three-way split:

Search crawlers: bots that index content so it can be used to answer queries later, such as Googlebot and Bingbot. Site owners can allow these (since they drive referral traffic) or block them.

Agent crawlers: bots acting in real time on behalf of a user, such as ChatGPT-User fetches, or Gemini and Claude driving a browser. These visit your site to complete a task for a human who is waiting.

Training crawlers: bots that scrape content to train or fine-tune AI models. Your content is permanently absorbed into the model's weights.

The key rule: if a bot claims to be a search crawler but also uses the data for training, it gets blocked starting September 15. Companies must run separate bots for search and training if they want search access.
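The rule can be modeled as a simple check over the purposes a single crawler declares. The Python sketch below is a hypothetical illustration, not Cloudflare's implementation: the `Purpose` enum, the `violates_separation_rule` function, and the example bot names are invented for the example, and it assumes each crawler's purposes are known up front.

```python
from enum import Enum
from typing import Set


class Purpose(Enum):
    """The three crawler categories described in the article."""
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


def violates_separation_rule(declared: Set[Purpose]) -> bool:
    """Return True if one crawler combines search indexing with training.

    Hypothetical model of the rule described above; not Cloudflare's code.
    """
    return Purpose.SEARCH in declared and Purpose.TRAINING in declared


# Hypothetical crawlers and the purposes each one declares.
crawlers = {
    "mixed-bot": {Purpose.SEARCH, Purpose.TRAINING},
    "search-only-bot": {Purpose.SEARCH},
    "training-only-bot": {Purpose.TRAINING},
}

for name, purposes in crawlers.items():
    if violates_separation_rule(purposes):
        status = "blocked"
    else:
        status = "allowed (subject to site settings)"
    print(f"{name}: {status}")
```

Under this model, a company that wants search access simply splits "mixed-bot" into two crawlers, each declaring a single purpose.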

Why this matters

The policy addresses a problem that small and medium site owners have faced for years: you can't block AI training without also blocking search discovery, because many crawlers do both. Google's crawler, for instance, both indexes for search and feeds into AI training pipelines.

Cloudflare's new system lets site owners say "yes to search, no to training" — or any combination they prefer.
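A "yes to search, no to training" stance boils down to a per-category allow/block table. The minimal Python sketch below assumes bots can be recognized by a User-Agent substring; the `KNOWN_BOTS` mapping, the `POLICY` table, and the `classify`/`is_allowed` helpers are hypothetical. The tokens are real published crawler names, but the category assigned to each is this sketch's assumption rather than Cloudflare's labeling. User-Agent headers are easy to spoof, so this only illustrates the policy logic and is not a substitute for the settings in the Cloudflare dashboard.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


# Hypothetical mapping of User-Agent tokens to categories (an assumption,
# not Cloudflare's classification). Matching is a case-insensitive substring test.
KNOWN_BOTS = {
    "Googlebot": Category.SEARCH,
    "bingbot": Category.SEARCH,
    "ChatGPT-User": Category.AGENT,
    "GPTBot": Category.TRAINING,
    "CCBot": Category.TRAINING,
}

# Hypothetical site policy: yes to search and agents, no to training.
POLICY = {
    Category.SEARCH: True,
    Category.AGENT: True,
    Category.TRAINING: False,
}


def classify(user_agent: str) -> Optional[Category]:
    """Return the category of the first known token found in the User-Agent."""
    ua = user_agent.lower()
    for token, category in KNOWN_BOTS.items():
        if token.lower() in ua:
            return category
    return None


def is_allowed(user_agent: str) -> bool:
    """Apply POLICY; unrecognized clients are allowed (a choice made for this sketch)."""
    category = classify(user_agent)
    if category is None:
        return True
    return POLICY[category]
```

With this policy, a request whose User-Agent contains `Googlebot` is allowed, while one containing `GPTBot` is blocked. Flipping a single entry in `POLICY` expresses any other combination.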

The company is also highlighting a Pay-Per-Crawl marketplace it launched last year, where AI companies can pay publishers for training data.

What site owners should do

If you run a website behind Cloudflare, here's what this changes for you:

1. Review your bot settings in the Cloudflare dashboard before September 15. The new AI options are now available to all customers — not just enterprise plans.

2. Decide your stance. If you want search traffic but don't want your content used for model training, you can now enforce that separation.

3. Watch for new bot classifications. Cloudflare is updating its bot scoring to label whether traffic falls under search, agent, or training behavior.

4. Consider the Pay-Per-Crawl marketplace if you're open to licensing your content for training.

A smarter approach than blanket blocking

The old binary — block AI bots or don't — was too blunt. Most site owners want search engines to find them, but don't want their hard work fed into a competitor's model for free. Cloudflare's new taxonomy gives publishers the granularity they've been asking for.

Cloudflare CEO Matthew Prince framed it as giving publishers "content independence" — the ability to choose how their content is used rather than accepting a take-it-or-leave-it deal from the big AI platforms.

The bigger picture

This move mirrors a broader industry trend. Publishers are pushing back harder against unpaid AI training. The New York Times, Reddit, and other major content platforms have either sued AI companies or struck licensing deals.

Cloudflare, which sits between more than 30% of the web and its visitors, is in a unique position to enforce this separation at the network level. If the policy works as described, it could become the de facto standard for how websites manage AI traffic, putting pressure on every major crawler to cleanly separate its purposes.

Sources:

• Cloudflare Blog: Your site, your rules — new AI traffic options

• TechCrunch: Cloudflare's new policy pushes AI companies to pay for publishers' content

• The Verge: Cloudflare is cracking down on multi-purpose crawlers
