
Anthropic's ClaudeBot crawls 23,951 pages for every single referral it sends back to website owners, according to Cloudflare Radar data from January through March 2026. OpenAI's GPTBot sits at 1,276:1, while DuckDuckGo achieves near-parity at 1.5:1. I spent weeks analyzing this data to build what I believe is the most actionable metric for Generative Engine Optimization (GEO) strategy: the crawl-to-refer ratio — the number of pages an AI crawler or LLM bot crawls divided by the referrals its parent platform (ChatGPT, Claude, Perplexity, Grok, Copilot) sends back.
Data source: Cloudflare Radar, Bot & Crawler Analytics (January 1 – March 16, 2026); analysis by SEOmator.

The crawl-to-refer ratio measures how many pages an AI crawler or LLM bot crawls from your website for every referral visit it sends back. A ratio of 100:1 means the bot crawled 100 of your pages before its parent platform directed a single user to your site.
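As a minimal sketch of this definition, the ratio is a single division. The function names `crawl_to_refer_ratio` and `format_ratio` are illustrative and do not come from Cloudflare Radar; treating zero referrals as an infinite ratio is also an assumption made here for convenience.

```python
def crawl_to_refer_ratio(pages_crawled: int, referrals: int) -> float:
    """Return pages crawled per referral; infinite if no referrals arrived."""
    if pages_crawled < 0 or referrals < 0:
        raise ValueError("counts must be non-negative")
    if referrals == 0:
        return float("inf")
    return pages_crawled / referrals


def format_ratio(ratio: float) -> str:
    """Render a ratio in the N:1 style used in this report."""
    if ratio == float("inf"):
        return "no referrals"
    # Large ratios are shown as whole numbers, small ones with one decimal.
    return f"{ratio:,.0f}:1" if ratio >= 10 else f"{ratio:.1f}:1"


print(format_ratio(crawl_to_refer_ratio(100, 1)))  # 100:1
```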
This metric matters because AI crawlers — GPTBot, ClaudeBot, PerplexityBot, and others — consume your server resources (bandwidth, compute, and crawl budget) while indexing your content for their LLM models. The referral side measures whether that investment pays off through actual traffic returned to your site.
For SEO and GEO professionals, this ratio functions as a return-on-investment metric for crawler access. Every page crawled by an LLM bot is a page that could have been crawled by Googlebot instead, directly affecting your site's crawl budget allocation. In my experience auditing enterprise sites, I've seen AI crawlers consume up to 40% of total crawl activity — resources that deliver zero organic search value.
The gap between the best and worst crawl-to-refer ratios spans four orders of magnitude. DuckDuckGo sends back roughly two visits for every three pages it crawls, while Anthropic's ClaudeBot takes nearly 24,000 pages of content for each referral returned.
| Operator | Crawl-to-Refer Ratio | What This Means |
|---|---|---|
| Anthropic (ClaudeBot) | 23,951:1 | Crawls 23,951 pages per 1 referral sent back |
| OpenAI (GPTBot) | 1,276:1 | Crawls 1,276 pages per 1 referral sent back |
| Perplexity (PerplexityBot) | 111:1 | Crawls 111 pages per 1 referral sent back |
| Microsoft (Copilot) | 33:1 | Includes Copilot and Bing AI features |
| Mistral | 22:1 | Relatively low crawl volume |
| Yandex | 21:1 | Russian search with growing AI features |
| Google (Gemini / AI Overviews) | 5:1 | Traditional search drives high referral volume |
| Baidu | 4.8:1 | Chinese search with established referral patterns |
| ByteDance | 3.1:1 | TikTok's parent company generates strong referrals |
| DuckDuckGo | 1.5:1 | Near-parity — the most efficient ratio of all operators |
Anthropic's ClaudeBot crawls content to train Claude, but Anthropic operates no consumer search product that returns traffic. OpenAI's GPTBot similarly trains models for ChatGPT, though ChatGPT Search has begun generating some referral traffic, which contributes to ChatGPT's 0.20% referrer share according to Cloudflare Radar. I've been tracking both bots in our server logs for the past six months, and the disparity is striking when you see it at the individual site level.

Anthropic's 23,951:1 ratio compared to Google's 5:1 ratio reflects a fundamental difference in business model rather than inefficiency. ClaudeBot operates as a training data crawler — it ingests web content to improve Claude's capabilities but Anthropic does not operate a search engine or any consumer-facing product that links back to source websites.
Google's 5:1 ratio benefits from Search's core function: sending users to websites. Every Google Search result click is a referral. Anthropic has no equivalent referral mechanism — Claude responses occasionally cite sources but do not generate clickable referrals tracked by analytics.
This structural gap means website owners who allow ClaudeBot access are subsidizing Anthropic's model training in exchange for almost no measurable referral traffic: roughly one visit per 23,951 pages crawled. Whether that trade-off is acceptable depends on whether publishers view LLM model inclusion as a form of brand presence or strictly as a traffic exchange. Based on what I've seen across dozens of client sites, most publishers haven't even realized this imbalance exists until they check their server logs. Understanding this ratio is becoming a core part of any serious AI search optimization strategy.
Anthropic's ratio has shown dramatic month-over-month improvement in 2026, dropping 74% from January to March. However, even the improved March ratio of 11,736:1 still dwarfs every other operator.
| Operator | January 2026 | February 2026 | March 1-16, 2026 | Trend |
|---|---|---|---|---|
| Anthropic (ClaudeBot) | 45,458:1 | 25,023:1 | 11,736:1 | Improving rapidly (-74%) |
| OpenAI (GPTBot) | 1,280:1 | 1,340:1 | 1,161:1 | Stable with slight improvement |
| Perplexity (PerplexityBot) | 111:1 | 114:1 | 109:1 | Remarkably stable (~110:1) |
| Microsoft (Copilot) | 35:1 | 34:1 | 28:1 | Improving (-20%) |
| Yandex | 18:1 | 21:1 | 26:1 | Worsening (+44%) |
| Google | 4.7:1 | 5.3:1 | 5.0:1 | Stable (~5:1) |
| ByteDance | 2.6:1 | 3.4:1 | 5.5:1 | Worsening (+112%) |
| DuckDuckGo | 1.43:1 | 1.48:1 | 1.50:1 | Stable (~1.5:1) |
ClaudeBot's improvement likely reflects reduced crawling intensity rather than increased referrals, since Anthropic has not launched a search product. The daily timeseries confirms this: ClaudeBot's ratio peaked at 136,416:1 on January 1 and steadily declined to 6,393:1 by mid-March.
ByteDance's worsening ratio — from 2.6:1 to 5.5:1 — coincides with TikTok's referral share declining from ~13% in early January to ~3% by mid-February and stabilizing there, while its crawling activity increased. I documented similar AI bot traffic patterns by country in our earlier analysis, where geographic crawling behavior showed comparable volatility.
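The trend percentages in the table can be reproduced from the January and March columns. The sketch below uses only figures from the table; the helper name `pct_change` is illustrative. A negative change means fewer pages crawled per referral, that is, an improving ratio.

```python
# (January ratio, March 1-16 ratio) taken from the monthly table
monthly_ratios = {
    "Anthropic (ClaudeBot)": (45_458, 11_736),
    "Microsoft (Copilot)": (35, 28),
    "Yandex": (18, 26),
    "ByteDance": (2.6, 5.5),
}


def pct_change(january: float, march: float) -> float:
    """Percentage change in the crawl-to-refer ratio from January to March."""
    return (march - january) / january * 100


for operator, (jan, mar) in monthly_ratios.items():
    change = pct_change(jan, mar)
    direction = "improving" if change < 0 else "worsening"
    print(f"{operator}: {change:+.0f}% ({direction})")
```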

The industry your website operates in dramatically affects what you receive in return from AI crawlers and LLM bots. Finance sites receive about 4.3x more referrals per crawl from Perplexity than shopping sites do (42:1 versus 182:1). If you're building a GEO strategy, your vertical determines which bots are worth allowing.
| Operator | Shopping & Retail | Tech & Electronics | Finance | Business & Industry |
|---|---|---|---|---|
| Anthropic (ClaudeBot) | 10,971:1 | 1,710:1 | 11,503:1 | 13,805:1 |
| OpenAI (GPTBot) | 584:1 | 583:1 | 352:1 | 879:1 |
| Perplexity (PerplexityBot) | 182:1 | 98:1 | 42:1 | 100:1 |
| Microsoft (Copilot) | 51:1 | 23:1 | 25:1 | 30:1 |
| Google | 8.2:1 | 4.6:1 | 3.1:1 | 4.3:1 |
| DuckDuckGo | 1.9:1 | 1.5:1 | 3.4:1 | 1.7:1 |
Key findings by industry:
When I run technical SEO audits for SaaS clients, I always check the industry-specific crawl patterns. A SaaS company in the finance vertical will get a fundamentally different return from AI crawlers than an e-commerce retailer selling consumer electronics.
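To compare verticals directly, it can help to invert the ratio into referrals per 1,000 pages crawled. This is a sketch using PerplexityBot's row from the table above; the function name `referrals_per_thousand_crawls` is illustrative.

```python
# PerplexityBot crawl-to-refer ratios by vertical, from the industry table
perplexity_ratios = {
    "Shopping & Retail": 182,
    "Tech & Electronics": 98,
    "Finance": 42,
    "Business & Industry": 100,
}


def referrals_per_thousand_crawls(ratio: float) -> float:
    """Invert an N:1 crawl-to-refer ratio into referrals per 1,000 crawls."""
    return 1000 / ratio


for vertical, ratio in perplexity_ratios.items():
    print(f"{vertical}: {referrals_per_thousand_crawls(ratio):.1f} referrals per 1,000 crawls")

# How much more a finance site gets back per crawl than a shopping site
advantage = perplexity_ratios["Shopping & Retail"] / perplexity_ratios["Finance"]
print(f"Finance vs. shopping: {advantage:.1f}x")
```

Expressed this way, the same crawl volume returns several times more visits for a finance site than for a retailer.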
AI crawlers and LLM bots now generate 5.06% of all traffic observed by Cloudflare Radar (human and automated combined), with another 3.57% classified as "mixed-purpose" bots that serve both AI training and traditional functions.
| Client Type | Share of All Traffic |
|---|---|
| Human | 46.24% |
| Non-AI Bot | 45.13% |
| AI Bot | 5.06% |
| Mixed Purpose | 3.57% |
When examining AI crawler volume by operator, two players dominate: Meta and OpenAI together account for over 70% of all AI crawler traffic.
| AI Crawler Operator | Share of AI Crawl Volume | Primary Bot |
|---|---|---|
| Meta | 36.10% | Meta-ExternalAgent |
| OpenAI | 34.44% | GPTBot |
| Amazon | 13.58% | Amazonbot |
| Google | 8.14% | Google-Extended |
| Huawei | 7.43% | PetalBot |
Meta's dominance is notable because Meta-ExternalAgent does not appear in the crawl-to-refer ratio data at all — it crawls content for Meta AI (the LLM powering Instagram, WhatsApp, and Facebook AI features) with no referral mechanism. Meta is the single largest AI crawler at 36.10% of AI traffic with zero return for publishers. I've recommended blocking Meta-ExternalAgent to every client I've worked with this year because there is simply no upside to allowing it.
Looking at total crawler user agent share across all bot traffic (not just AI), Googlebot still leads, but GPTBot and ClaudeBot now hold the second and third positions.
| User Agent | Share of All Crawler Traffic | Category |
|---|---|---|
| Googlebot | 23.91% | Search Engine |
| GPTBot (ChatGPT) | 17.07% | AI Crawler / LLM Bot |
| ClaudeBot (Claude) | 14.40% | AI Crawler / LLM Bot |
| Meta-ExternalAgent (Meta AI) | 12.26% | AI Crawler / LLM Bot |
| Bingbot | 8.01% | Search Engine |
| Amazonbot (Alexa AI) | 5.37% | AI Crawler / LLM Bot |
| facebookexternalhit | 3.08% | Social Preview |
| PetalBot (Huawei) | 2.59% | AI Crawler / LLM Bot |
| YandexBot | 2.54% | Search Engine |
AI crawlers and LLM bots (GPTBot + ClaudeBot + Meta-ExternalAgent + Amazonbot + PetalBot) collectively represent 51.69% of all crawler traffic — surpassing traditional search engine crawlers combined. Search engine crawlers (Googlebot + Bingbot + YandexBot) account for 34.46%.
LLM bots now consume more crawl resources than the search engines that actually drive the majority of referral traffic. This is a fundamental shift I first noticed in late 2025, and the trend has only accelerated. Our earlier research on AI SEO statistics predicted this crossover point, but it arrived faster than we expected.
Despite the heavy crawling activity of AI crawlers and LLM bots, referral traffic remains overwhelmingly dominated by Google. ChatGPT's referral share has nearly doubled since January but still contributes just 0.20% of all referrals.
| Referrer | Share of All Referral Traffic |
|---|---|
| Google | 83.95% |
| TikTok | 7.94% |
| Bing | 3.07% |
| Yandex | 1.69% |
| DuckDuckGo | 1.25% |
| Baidu (mobile) | 1.14% |
| Baidu (desktop) | 0.47% |
| ChatGPT | 0.20% |
| Bing China (cn.bing.com) | 0.14% |
ChatGPT's referral share grew from 0.13% on January 1 to 0.24% by mid-March — an 85% increase. At this growth rate, ChatGPT could reach 1% referral share by late 2026, which would place it alongside Baidu and DuckDuckGo as a meaningful traffic source. For GEO strategists, this trajectory confirms that optimizing for LLM-powered search surfaces is no longer optional.
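A rough sanity check of that projection follows. It extrapolates the two data points above under two assumed growth models, compounding and linear; neither model comes from Cloudflare, and two observations cannot establish which one holds.

```python
import math
from datetime import date, timedelta

start_share, end_share = 0.13, 0.24  # percent of all referrals
start_day, end_day = date(2026, 1, 1), date(2026, 3, 16)
target_share = 1.0                   # percent

elapsed_days = (end_day - start_day).days

# Assumption 1: growth compounds at a constant daily rate.
daily_rate = math.log(end_share / start_share) / elapsed_days
days_compounding = math.log(target_share / end_share) / daily_rate

# Assumption 2: the share rises by a constant amount each day.
daily_step = (end_share - start_share) / elapsed_days
days_linear = (target_share - end_share) / daily_step

print("Compounding:", end_day + timedelta(days=round(days_compounding)))
print("Linear:     ", end_day + timedelta(days=round(days_linear)))
```

Under the compounding assumption the 1% mark lands around September 2026; under the linear assumption it slips well into 2027, so the projection depends heavily on which growth pattern continues.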
TikTok's referral share dropped from 13.3% in early January to approximately 3.5% by mid-February and has remained there since — a 74% decline that coincides with changing content consumption patterns on the platform. I've been watching this decline closely because it has significant implications for content distribution strategies across our client portfolio.
Retail and software websites receive disproportionate attention from AI crawlers and LLM bots relative to their representation on the web.
| Industry | Share of AI Crawler Traffic |
|---|---|
| Retail | 20.56% |
| Computer Software | 17.32% |
| Gambling & Casinos | 6.55% |
| Marketing & Advertising | 6.40% |
| Information Technology | 5.54% |
| Media | 5.04% |
| Internet | 4.93% |
| Adult Entertainment | 4.02% |
| Telecommunications | 3.08% |
Retail sites absorb 20.56% of all AI crawler traffic while receiving some of the worst crawl-to-refer ratios — a double penalty that makes e-commerce sites the biggest subsidizers of LLM model training.
Website operators are responding to unfavorable crawl-to-refer ratios by blocking AI crawlers and LLM bots in their robots.txt files. According to Cloudflare Radar, which analyzed 3,973 robots.txt files from popular domains, blocking is already widespread.
| AI Crawler / LLM Bot | Fully Blocked | Partially Blocked | Total Blocking | % of Analyzed Domains |
|---|---|---|---|---|
| GPTBot (OpenAI / ChatGPT) | 330 | 108 | 438 | 11.0% |
| CCBot (Common Crawl) | 326 | 64 | 390 | 9.8% |
| ClaudeBot (Anthropic / Claude) | 286 | 80 | 366 | 9.2% |
| Google-Extended (Gemini) | 264 | 76 | 340 | 8.6% |
| Bytespider (ByteDance) | 278 | 42 | 320 | 8.1% |
GPTBot receives the most blocks (438 domains, 11.0% of analyzed sites), followed by CCBot and ClaudeBot. The most-blocked bots are among those that crawl the most and return the least, although the ordering is not exact: GPTBot is blocked more often than ClaudeBot despite having the better crawl-to-refer ratio.
Technology and business websites lead in AI bot blocking, with 904 and 798 domains respectively implementing Disallow rules. These industries — which are also among the most heavily crawled — are actively fighting back. If you're unsure how your own robots.txt is configured for LLM bots, I'd recommend reviewing our guide on what LLMs.txt is and how to generate it as part of a broader GEO and AI crawler access strategy.
| Domain Category | Domains Blocking AI Crawlers & LLM Bots |
|---|---|
| Technology | 904 |
| Business | 798 |
| E-commerce | 281 |
| Search Engines | 248 |
| Content Servers | 206 |
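To see which of these bots your own robots.txt blocks, Python's standard-library `urllib.robotparser` can evaluate the rules offline. This is a sketch: the `blocked_agents` helper and the sample file are illustrative, and the user-agent list is simply the five bots from the blocking table above.

```python
from urllib.robotparser import RobotFileParser

AI_USER_AGENTS = ["GPTBot", "CCBot", "ClaudeBot", "Google-Extended", "Bytespider"]


def blocked_agents(robots_txt: str, path: str = "/") -> dict:
    """Map each AI user agent to True if robots.txt disallows `path` for it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, path) for agent in AI_USER_AGENTS}


# Hypothetical robots.txt that blocks GPTBot and allows everything else
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(sample))
```

In this sample only GPTBot is reported as blocked. Passing a different `path` shows partial blocks, where a bot is barred from some directories but not the whole site.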
The decision to block AI crawlers and LLM bots depends on your industry, traffic goals, and long-term GEO strategy. Based on my analysis, the data suggests four distinct approaches based on operator behavior.
Block with confidence: Meta-ExternalAgent and any AI crawler with no referral mechanism. These bots provide zero return traffic and consume server resources exclusively for the operator's benefit. Meta is the single largest AI crawler (36.10% of AI traffic) with no referral product.
Evaluate carefully: ClaudeBot (23,951:1 ratio) and GPTBot (1,276:1 ratio). Both train LLM models on your content with minimal traffic return. However, blocking these bots means your content won't be represented in Claude or ChatGPT responses — a potential long-term GEO visibility risk as AI search optimization grows in importance.
Allow strategically: PerplexityBot (111:1) and Microsoft Copilot (33:1). These operators have moderate crawl-to-refer ratios and offer growing referral traffic through their LLM-powered search products. Perplexity in particular cites sources prominently in responses, providing brand visibility even when users don't click through.
Keep unblocked: Google (5:1), DuckDuckGo (1.5:1), and traditional search crawlers. These operators deliver measurable referral traffic that justifies their crawl volume.
For SEO and GEO professionals managing enterprise sites, the data supports selective blocking. A retail site receiving a 10,971:1 ratio from ClaudeBot has a quantifiable business case for blocking it, while a finance site receiving 42:1 from PerplexityBot has reason to keep access open.
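A selective policy like this can be turned into robots.txt rules mechanically. The sketch below is one possible encoding: the `POLICY` mapping, the tier labels, and the `build_robots_txt` helper are all illustrative, and the "evaluate" tier is left open by default because that decision depends on your own ratios. Note that robots.txt is advisory, so it only affects crawlers that choose to honor it.

```python
# Tier per user agent, following the approaches described above (illustrative)
POLICY = {
    "Meta-ExternalAgent": "block",
    "ClaudeBot": "evaluate",
    "GPTBot": "evaluate",
    "PerplexityBot": "allow",
    "Googlebot": "allow",
    "Bingbot": "allow",
}


def build_robots_txt(policy: dict, block_evaluated: bool = False) -> str:
    """Emit one robots.txt group per user agent in the policy."""
    lines = []
    for agent, decision in policy.items():
        blocked = decision == "block" or (decision == "evaluate" and block_evaluated)
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /" if blocked else "Allow: /")
        lines.append("")  # blank line separates groups
    return "\n".join(lines)


print(build_robots_txt(POLICY))
```

Calling `build_robots_txt(POLICY, block_evaluated=True)` produces the stricter variant, for example for a retail site that decides ClaudeBot's ratio does not justify access.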
The crawl-to-refer ratio gap between AI crawlers and traditional search engines reveals a structural tension in how the web economy works. Search engines historically operated on an implicit bargain: they crawled content and sent traffic back. LLM-powered platforms like ChatGPT, Claude, and Grok are breaking this bargain by crawling content without a reciprocal traffic mechanism.
Three trends from the data point to where this is heading:
1. ChatGPT's referral share is growing. From 0.13% to 0.24% in 75 days (85% growth), chatgpt.com is the fastest-growing referrer in Cloudflare's data. As ChatGPT Search matures, GPTBot's crawl-to-refer ratio should continue improving. This makes GEO optimization for ChatGPT increasingly worthwhile.
2. Blocking rates are increasing. With 11% of top domains already blocking GPTBot, LLM platforms face a data access problem. If blocking accelerates, AI models will train on progressively less representative data — potentially degrading response quality for users.
3. Industry-specific GEO strategies will emerge. The 4.3x difference in Perplexity's ratio between finance (42:1) and shopping (182:1) means publishers in different verticals will adopt different crawler access policies. One-size-fits-all robots.txt rules don't reflect the nuanced economics of AI crawler management.
Website owners who monitor these ratios quarterly and adjust their AI crawler access policies based on measurable returns will outperform those who either block everything or allow everything. This is the foundation of a data-driven GEO strategy.
While Cloudflare Radar provides aggregate data, individual site owners can approximate their own crawl-to-refer ratios using server logs and analytics. Here's the process I follow for our clients:
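One way to approximate this from a combined-format access log is sketched below; it is not necessarily the exact process referred to above. The regular expression, the user-agent tokens, and the referrer domains are assumptions you should adapt to your own logs, and the sample lines are synthetic. User-agent strings can be spoofed, and visits without a referrer header are not counted.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the request, status, size, referrer and user-agent fields
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed mappings from operator to crawler token and referrer domain
CRAWLER_TOKENS = {"OpenAI": "GPTBot", "Anthropic": "ClaudeBot", "Perplexity": "PerplexityBot"}
REFERRER_DOMAINS = {"OpenAI": "chatgpt.com", "Anthropic": "claude.ai", "Perplexity": "perplexity.ai"}


def crawl_and_referral_counts(log_lines):
    """Count crawler hits and human referral visits per operator."""
    crawls, referrals = Counter(), Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        agent = match["agent"].lower()
        host = urlparse(match["referrer"]).hostname or ""
        for operator, token in CRAWLER_TOKENS.items():
            if token.lower() in agent:
                crawls[operator] += 1
                break
        else:  # not a known crawler: check whether a platform referred the visit
            for operator, domain in REFERRER_DOMAINS.items():
                if host == domain or host.endswith("." + domain):
                    referrals[operator] += 1
    return crawls, referrals


def ratios(crawls, referrals):
    """Crawl-to-refer ratio per operator; infinite when no referrals were seen."""
    return {op: (crawls[op] / referrals[op] if referrals[op] else float("inf"))
            for op in crawls}


# Synthetic log lines for illustration only
sample = [
    '203.0.113.5 - - [10/Mar/2026:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [10/Mar/2026:10:00:01 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/Mar/2026:10:05:00 +0000] "GET /a HTTP/1.1" 200 512 "https://chatgpt.com/" "Mozilla/5.0 (Windows NT 10.0)"',
]
print(ratios(*crawl_and_referral_counts(sample)))
```

Run it over the same date range for both counts, and compare the result against the aggregate figures only loosely, since a single site's numbers will be far noisier than network-wide data.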
SEO tools like SEOmator can help automate the technical auditing side — analyzing your robots.txt configuration, monitoring AI crawler access patterns, and identifying which LLM bots are consuming your crawl budget without delivering proportional value.
All data in this report comes from Cloudflare Radar, which monitors traffic patterns across Cloudflare's global network. Cloudflare's network handles a significant portion of all internet traffic, making its bot and crawler data broadly representative of web-wide patterns.
Data dimensions used:
Date range: January 1 – March 16, 2026
Limitations: Crawl-to-refer ratios measure aggregate behavior across Cloudflare's network. Individual site ratios will vary based on content type, domain authority, and traffic patterns. Referral attribution may undercount AI-driven visits that arrive through intermediate pages or don't carry referrer headers.
Data sourced from Cloudflare Radar Bot & Crawler Analytics, January 1 – March 16, 2026. Last updated: March 17, 2026.
