GEO Data Report 2026: Which AI Crawlers & LLM Bots Take the Most and Give the Least?

ClaudeBot crawls 23,951 pages per referral. GPTBot: 1,276:1. I analyzed Cloudflare Radar data to measure which AI crawlers and LLM bots extract the most from publishers — and what it means for your GEO strategy.

Anthropic's ClaudeBot crawls 23,951 pages for every single referral it sends back to website owners, according to Cloudflare Radar data from January through March 2026. OpenAI's GPTBot sits at 1,276:1, while DuckDuckGo achieves near-parity at 1.5:1. I spent weeks analyzing this data to build what I believe is the most actionable metric for Generative Engine Optimization (GEO) strategy: the crawl-to-refer ratio — the number of pages an AI crawler or LLM bot crawls divided by the referrals its parent platform (ChatGPT, Claude, Perplexity, Grok, Copilot) sends back.

Data source: Cloudflare Radar, Bot & Crawler Analytics (January 1 – March 16, 2026). Analysis by SEOmator.

What Is the Crawl-to-Refer Ratio and Why Does It Matter for GEO and SEO?

The crawl-to-refer ratio measures how many pages an AI crawler or LLM bot crawls from your website for every referral visit it sends back. A ratio of 100:1 means the bot crawled 100 of your pages before its parent platform directed a single user to your site.

This metric matters because AI crawlers — GPTBot, ClaudeBot, PerplexityBot, and others — consume your server resources (bandwidth, compute, and crawl budget) while indexing your content for their LLM models. The referral side measures whether that investment pays off through actual traffic returned to your site.

For SEO and GEO professionals, this ratio functions as a return-on-investment metric for crawler access. Every page crawled by an LLM bot is a page that could have been crawled by Googlebot instead, directly affecting your site's crawl budget allocation. In my experience auditing enterprise sites, I've seen AI crawlers consume up to 40% of total crawl activity — resources that deliver zero organic search value.

How Do the Top AI Crawlers and LLM Bots Compare on Crawl-to-Refer Ratios?

The gap between the best and worst crawl-to-refer ratios spans four orders of magnitude. DuckDuckGo sends back roughly two visits for every three pages it crawls, while Anthropic's ClaudeBot takes nearly 24,000 pages of content for each referral returned.

Operator | Crawl-to-Refer Ratio | What This Means
Anthropic (ClaudeBot) | 23,951:1 | Crawls 23,951 pages per 1 referral sent back
OpenAI (GPTBot) | 1,276:1 | Crawls 1,276 pages per 1 referral sent back
Perplexity (PerplexityBot) | 111:1 | Crawls 111 pages per 1 referral sent back
Microsoft (Copilot) | 33:1 | Includes Copilot and Bing AI features
Mistral | 22:1 | Relatively low crawl volume
Yandex | 21:1 | Russian search with growing AI features
Google (Gemini / AI Overviews) | 5:1 | Traditional search drives high referral volume
Baidu | 4.8:1 | Chinese search with established referral patterns
ByteDance | 3.1:1 | TikTok's parent company generates strong referrals
DuckDuckGo | 1.5:1 | Near-parity, the most efficient ratio of all operators

Anthropic's ClaudeBot crawls content to train Claude but operates no consumer search product that returns traffic. OpenAI's GPTBot similarly trains models for ChatGPT, though ChatGPT Search has begun generating some referral traffic — contributing to its 0.20% referrer share according to Cloudflare Radar. I've been tracking both bots in our server logs for the past six months, and the disparity is striking when you see it at the individual site level.

Why Is ClaudeBot's Ratio 4,790x Worse Than Google's?

Anthropic's 23,951:1 ratio compared to Google's 5:1 ratio reflects a fundamental difference in business model rather than inefficiency. ClaudeBot operates as a training data crawler — it ingests web content to improve Claude's capabilities but Anthropic does not operate a search engine or any consumer-facing product that links back to source websites.

Google's 5:1 ratio benefits from Search's core function: sending users to websites. Every Google Search result click is a referral. Anthropic has no equivalent referral mechanism — Claude responses occasionally cite sources but do not generate clickable referrals tracked by analytics.

This structural gap means website owners who allow ClaudeBot access are subsidizing Anthropic's model training with almost no measurable return in referral traffic. Whether that trade-off is acceptable depends on whether publishers view inclusion in an LLM as a form of brand presence or strictly as a traffic exchange. Based on what I've seen across dozens of client sites, most publishers don't realize this imbalance exists until they check their server logs. Understanding this ratio is becoming a core part of any serious AI search optimization strategy.

Is ClaudeBot's Crawl-to-Refer Ratio Improving Over Time?

Anthropic's ratio has shown dramatic month-over-month improvement in 2026, dropping 74% from January to March. However, even the improved March ratio of 11,736:1 still dwarfs every other operator.

Operator | January 2026 | February 2026 | March 1-16, 2026 | Trend
Anthropic (ClaudeBot) | 45,458:1 | 25,023:1 | 11,736:1 | Improving rapidly (-74%)
OpenAI (GPTBot) | 1,280:1 | 1,340:1 | 1,161:1 | Stable with slight improvement
Perplexity (PerplexityBot) | 111:1 | 114:1 | 109:1 | Remarkably stable (~110:1)
Microsoft (Copilot) | 35:1 | 34:1 | 28:1 | Improving (-20%)
Yandex | 18:1 | 21:1 | 26:1 | Worsening (+44%)
Google | 4.7:1 | 5.3:1 | 5.0:1 | Stable (~5:1)
ByteDance | 2.6:1 | 3.4:1 | 5.5:1 | Worsening (+112%)
DuckDuckGo | 1.43:1 | 1.48:1 | 1.50:1 | Stable (~1.5:1)

ClaudeBot's improvement likely reflects reduced crawling intensity rather than increased referrals, since Anthropic has not launched a search product. The daily timeseries confirms this: ClaudeBot's ratio peaked at 136,416:1 on January 1 and steadily declined to 6,393:1 by mid-March.

ByteDance's worsening ratio (from 2.6:1 to 5.5:1) coincides with TikTok's referral share declining from ~13% in early January to ~3.5% by mid-February and stabilizing there, while its crawling activity increased. I documented similar AI bot traffic patterns by country in our earlier analysis, where geographic crawling behavior showed comparable volatility.
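The trend percentages in the table above can be reproduced with a simple percentage-change calculation. The sketch below is illustrative only: the function name `pct_change` is my own, and the inputs are the January and March 1-16 ratios copied from the table.

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end; a negative value means the ratio improved."""
    return (end - start) / start * 100


# (January, March 1-16) ratios copied from the table: pages crawled per referral.
ratios = {
    "Anthropic (ClaudeBot)": (45_458, 11_736),
    "Microsoft (Copilot)": (35, 28),
    "Yandex": (18, 26),
    "ByteDance": (2.6, 5.5),
}

for operator, (january, march) in ratios.items():
    print(f"{operator}: {pct_change(january, march):+.0f}%")
```

A falling ratio is good news for publishers (fewer pages crawled per referral), which is why the negative values are labeled "improving".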

How Do Crawl-to-Refer Ratios Differ by Industry?

The industry your website operates in dramatically affects what you receive in return from AI crawlers and LLM bots. Finance sites receive about 4.3x more referrals per crawl from Perplexity than shopping sites do (42:1 versus 182:1). If you're building a GEO strategy, your vertical determines which bots are worth allowing.

Operator | Shopping & Retail | Tech & Electronics | Finance | Business & Industry
Anthropic (ClaudeBot) | 10,971:1 | 1,710:1 | 11,503:1 | 13,805:1
OpenAI (GPTBot) | 584:1 | 583:1 | 352:1 | 879:1
Perplexity (PerplexityBot) | 182:1 | 98:1 | 42:1 | 100:1
Microsoft (Copilot) | 51:1 | 23:1 | 25:1 | 30:1
Google | 8.2:1 | 4.6:1 | 3.1:1 | 4.3:1
DuckDuckGo | 1.9:1 | 1.5:1 | 3.4:1 | 1.7:1

Key findings by industry:

  • Finance gets the best AI referral rates. Perplexity's 42:1 ratio for finance is 4.3x better than its 182:1 ratio for shopping. Financial queries require authoritative, up-to-date sources that LLM models must cite.
  • Tech and electronics sites fare best with ClaudeBot. At 1,710:1, tech sites receive 8x more referrals from Anthropic than business sites (13,805:1). Claude users disproportionately search for technical content.
  • Shopping sites get the worst deal from search-oriented operators. Perplexity (182:1), Microsoft (51:1), and Google (8.2:1) all show their highest crawl-to-refer ratios on e-commerce sites. LLM-powered search tools crawl product catalogs heavily but rarely refer shoppers to the source.
  • Google's ratio varies 2.6x by industry — from 3.1:1 for finance to 8.2:1 for shopping — confirming that even traditional search delivers uneven value across verticals.

When I run technical SEO audits for SaaS clients, I always check the industry-specific crawl patterns. A SaaS company in the finance vertical will get a fundamentally different return from AI crawlers than an e-commerce retailer selling consumer electronics.
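To make the per-vertical comparison concrete, here is a minimal sketch that ranks operators for a given vertical and computes how much better one vertical fares than another. The data is copied from the industry table above. The short vertical keys and the helper names `rank_operators` and `relative_yield` are assumptions made for illustration.

```python
# Crawl-to-refer ratios by vertical (pages crawled per referral), from the industry table.
RATIOS_BY_VERTICAL = {
    "Anthropic (ClaudeBot)": {"shopping": 10_971, "tech": 1_710, "finance": 11_503, "business": 13_805},
    "OpenAI (GPTBot)": {"shopping": 584, "tech": 583, "finance": 352, "business": 879},
    "Perplexity (PerplexityBot)": {"shopping": 182, "tech": 98, "finance": 42, "business": 100},
    "Microsoft (Copilot)": {"shopping": 51, "tech": 23, "finance": 25, "business": 30},
    "Google": {"shopping": 8.2, "tech": 4.6, "finance": 3.1, "business": 4.3},
    "DuckDuckGo": {"shopping": 1.9, "tech": 1.5, "finance": 3.4, "business": 1.7},
}


def rank_operators(vertical: str) -> list[tuple[str, float]]:
    """Return (operator, ratio) pairs sorted from best (lowest ratio) to worst."""
    pairs = ((op, by_vertical[vertical]) for op, by_vertical in RATIOS_BY_VERTICAL.items())
    return sorted(pairs, key=lambda pair: pair[1])


def relative_yield(operator: str, better: str, worse: str) -> float:
    """How many times more referrals per crawled page `better` receives than `worse`."""
    ratios = RATIOS_BY_VERTICAL[operator]
    return ratios[worse] / ratios[better]


for operator, ratio in rank_operators("finance"):
    print(f"{operator}: {ratio:,}:1")

print(round(relative_yield("Perplexity (PerplexityBot)", "finance", "shopping"), 1))
```

Swapping `"finance"` for `"shopping"` shows how the ranking shifts for a retail site, which is the comparison that matters when deciding which bots to allow.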

What Percentage of Web Traffic Comes from AI Crawlers and LLM Bots?

AI crawlers and LLM bots now generate 5.06% of all traffic observed by Cloudflare Radar, with another 3.57% classified as "mixed-purpose" bots that serve both AI training and traditional functions.

Client Type | Share of All Traffic
Human | 46.24%
Non-AI Bot | 45.13%
AI Bot | 5.06%
Mixed Purpose | 3.57%

When examining AI crawler volume by operator, two players dominate: Meta and OpenAI together account for over 70% of all AI crawler traffic.

AI Crawler Operator | Share of AI Crawl Volume | Primary Bot
Meta | 36.10% | Meta-ExternalAgent
OpenAI | 34.44% | GPTBot
Amazon | 13.58% | Amazonbot
Google | 8.14% | Google-Extended
Huawei | 7.43% | PetalBot

Meta's dominance is notable because Meta-ExternalAgent does not appear in the crawl-to-refer ratio data at all — it crawls content for Meta AI (the LLM powering Instagram, WhatsApp, and Facebook AI features) with no referral mechanism. Meta is the single largest AI crawler at 36.10% of AI traffic with zero return for publishers. I've recommended blocking Meta-ExternalAgent to every client I've worked with this year because there is simply no upside to allowing it.

Which AI Crawlers and LLM Bots Actually Crawl the Most Pages?

Looking at total crawler user agent share across all bot traffic (not just AI), Googlebot still leads, but GPTBot and ClaudeBot now hold the second and third positions.

User Agent | Share of All Crawler Traffic | Category
Googlebot | 23.91% | Search Engine
GPTBot (ChatGPT) | 17.07% | AI Crawler / LLM Bot
ClaudeBot (Claude) | 14.40% | AI Crawler / LLM Bot
Meta-ExternalAgent (Meta AI) | 12.26% | AI Crawler / LLM Bot
Bingbot | 8.01% | Search Engine
Amazonbot (Alexa AI) | 5.37% | AI Crawler / LLM Bot
facebookexternalhit | 3.08% | Social Preview
PetalBot (Huawei) | 2.59% | AI Crawler / LLM Bot
YandexBot | 2.54% | Search Engine

AI crawlers and LLM bots (GPTBot + ClaudeBot + Meta-ExternalAgent + Amazonbot + PetalBot) collectively represent 51.69% of all crawler traffic — surpassing traditional search engine crawlers combined. Search engine crawlers (Googlebot + Bingbot + YandexBot) account for 34.46%.
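The two totals above are sums over the Category column of the user agent table. A short sketch of that grouping follows, using the table's figures; the shortened category labels `ai`, `search`, and `social` are my own shorthand.

```python
from collections import defaultdict

# (user agent, share of all crawler traffic in %, category) copied from the table above.
SHARES = [
    ("Googlebot", 23.91, "search"),
    ("GPTBot", 17.07, "ai"),
    ("ClaudeBot", 14.40, "ai"),
    ("Meta-ExternalAgent", 12.26, "ai"),
    ("Bingbot", 8.01, "search"),
    ("Amazonbot", 5.37, "ai"),
    ("facebookexternalhit", 3.08, "social"),
    ("PetalBot", 2.59, "ai"),
    ("YandexBot", 2.54, "search"),
]

# Sum the shares per category.
totals = defaultdict(float)
for _agent, share, category in SHARES:
    totals[category] += share

for category, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{category}: {total:.2f}%")
```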

LLM bots now consume more crawl resources than the search engines that actually drive the majority of referral traffic. This is a fundamental shift I first noticed in late 2025, and the trend has only accelerated. Our earlier research on AI SEO statistics predicted this crossover point, but it arrived faster than we expected.

Where Does Referral Traffic Actually Come From?

Despite the heavy crawling activity of AI crawlers and LLM bots, referral traffic remains overwhelmingly dominated by Google. ChatGPT's referral share has nearly doubled since January but still contributes just 0.20% of all referrals.

Referrer | Share of All Referral Traffic
Google | 83.95%
TikTok | 7.94%
Bing | 3.07%
Yandex | 1.69%
DuckDuckGo | 1.25%
Baidu (mobile) | 1.14%
Baidu (desktop) | 0.47%
ChatGPT | 0.20%
Bing China (cn.bing.com) | 0.14%

ChatGPT's referral share grew from 0.13% on January 1 to 0.24% by mid-March — an 85% increase. At this growth rate, ChatGPT could reach 1% referral share by late 2026, which would place it alongside Baidu and DuckDuckGo as a meaningful traffic source. For GEO strategists, this trajectory confirms that optimizing for LLM-powered search surfaces is no longer optional.
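The projection depends heavily on the growth model. As a rough sketch, assuming the share keeps compounding at the same daily rate observed between January 1 and March 16 (a strong assumption, since early growth rarely stays constant), the time to reach 1% can be estimated as follows. The function name `days_to_reach` is hypothetical.

```python
import math
from datetime import date, timedelta


def days_to_reach(target: float, start_share: float, end_share: float, days_elapsed: int) -> float:
    """Days after the last observation until `target` is reached, assuming constant compound daily growth."""
    daily_log_growth = math.log(end_share / start_share) / days_elapsed
    return math.log(target / end_share) / daily_log_growth


first_obs, last_obs = date(2026, 1, 1), date(2026, 3, 16)

# 0.13% on January 1 and 0.24% by mid-March, as reported above.
days = days_to_reach(1.0, 0.13, 0.24, (last_obs - first_obs).days)
projected = last_obs + timedelta(days=math.ceil(days))
print(f"about {days:.0f} more days, around {projected.isoformat()}")
```

Under this compounding assumption the 1% mark falls in the second half of 2026. A linear extrapolation of the same two data points would push it well into 2027, so the estimate is best read as an optimistic bound.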

TikTok's referral share dropped from 13.3% in early January to approximately 3.5% by mid-February and has remained there since — a 74% decline that coincides with changing content consumption patterns on the platform. I've been watching this decline closely because it has significant implications for content distribution strategies across our client portfolio.

Which Industries Are Crawled the Most by AI Crawlers and LLM Bots?

Retail and software websites receive disproportionate attention from AI crawlers and LLM bots relative to their representation on the web.

Industry | Share of AI Crawler Traffic
Retail | 20.56%
Computer Software | 17.32%
Gambling & Casinos | 6.55%
Marketing & Advertising | 6.40%
Information Technology | 5.54%
Media | 5.04%
Internet | 4.93%
Adult Entertainment | 4.02%
Telecommunications | 3.08%

Retail sites absorb 20.56% of all AI crawler traffic while receiving some of the worst crawl-to-refer ratios — a double penalty that makes e-commerce sites the biggest subsidizers of LLM model training.

How Many Websites Are Blocking AI Crawlers and LLM Bots?

Website operators are responding to unfavorable crawl-to-refer ratios by blocking AI crawlers and LLM bots in their robots.txt files. According to Cloudflare Radar, which analyzed 3,973 robots.txt files from popular domains, blocking is already widespread.

AI Crawler / LLM Bot | Fully Blocked | Partially Blocked | Total Blocking | % of Analyzed Domains
GPTBot (OpenAI / ChatGPT) | 330 | 108 | 438 | 11.0%
CCBot (Common Crawl) | 326 | 64 | 390 | 9.8%
ClaudeBot (Anthropic / Claude) | 286 | 80 | 366 | 9.2%
Google-Extended (Gemini) | 264 | 76 | 340 | 8.6%
Bytespider (ByteDance) | 278 | 42 | 320 | 8.1%

GPTBot receives the most blocks (438 domains, 11.0% of analyzed sites), followed by CCBot and ClaudeBot. Blocking appears to track crawl aggressiveness: the bots that take the most and return the least face the most resistance.

Technology and business websites lead in AI bot blocking, with 904 and 798 domains respectively implementing Disallow rules. These industries — which are also among the most heavily crawled — are actively fighting back. If you're unsure how your own robots.txt is configured for LLM bots, I'd recommend reviewing our guide on what LLMs.txt is and how to generate it as part of a broader GEO and AI crawler access strategy.

Domain Category | Domains Blocking AI Crawlers & LLM Bots
Technology | 904
Business | 798
E-commerce | 281
Search Engines | 248
Content Servers | 206

Should You Block AI Crawlers and LLM Bots Based on This Data?

The decision to block AI crawlers and LLM bots depends on your industry, traffic goals, and long-term GEO strategy. The data suggests four distinct approaches, depending on operator behavior.

Block with confidence: Meta-ExternalAgent and any AI crawler with no referral mechanism. These bots provide zero return traffic and consume server resources exclusively for the operator's benefit. Meta is the single largest AI crawler (36.10% of AI traffic) with no referral product.

Evaluate carefully: ClaudeBot (23,951:1 ratio) and GPTBot (1,276:1 ratio). Both train LLM models on your content with minimal traffic return. However, blocking these bots means your content won't be represented in Claude or ChatGPT responses — a potential long-term GEO visibility risk as AI search optimization grows in importance.

Allow strategically: PerplexityBot (111:1) and Microsoft Copilot (33:1). These operators have moderate crawl-to-refer ratios and offer growing referral traffic through their LLM-powered search products. Perplexity in particular cites sources prominently in responses, providing brand visibility even when users don't click through.

Keep unblocked: Google (5:1), DuckDuckGo (1.5:1), and traditional search crawlers. These operators deliver measurable referral traffic that justifies their crawl volume.

For SEO and GEO professionals managing enterprise sites, the data supports selective blocking. A retail site receiving a 10,971:1 ratio from ClaudeBot has a quantifiable business case for blocking it, while a finance site receiving 42:1 from PerplexityBot has reason to keep access open.
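As a sketch of what selective blocking might look like for a hypothetical retail site, the snippet below defines an illustrative robots.txt policy (block Meta-ExternalAgent and ClaudeBot, allow everything else) and checks it with Python's standard-library `urllib.robotparser`. The policy itself, the `allowed` helper, and the `example.com` URL are assumptions for illustration, not a recommendation for every site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy for a hypothetical retail site: block two AI crawlers, allow the rest.
ROBOTS_TXT = """\
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/products/widget") -> bool:
    """Return True if the policy above permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for bot in ["Meta-ExternalAgent", "ClaudeBot", "GPTBot", "PerplexityBot", "Googlebot"]:
    print(f"{bot}: {'allowed' if allowed(bot) else 'blocked'}")
```

Keep in mind that robots.txt is advisory: it only works for crawlers that choose to honor it, so server logs remain the place to confirm actual behavior.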

What Does This Mean for the Future of GEO, AI Search, and Publisher Relations?

The crawl-to-refer ratio gap between AI crawlers and traditional search engines reveals a structural tension in how the web economy works. Search engines historically operated on an implicit bargain: they crawled content and sent traffic back. LLM-powered platforms like ChatGPT, Claude, and Grok are breaking this bargain by crawling content without a reciprocal traffic mechanism.

Three trends from the data point to where this is heading:

1. ChatGPT's referral share is growing. From 0.13% to 0.24% in 75 days (85% growth), chatgpt.com is the fastest-growing referrer in Cloudflare's data. As ChatGPT Search matures, GPTBot's crawl-to-refer ratio should continue improving. This makes GEO optimization for ChatGPT increasingly worthwhile.

2. Blocking rates are increasing. With 11% of top domains already blocking GPTBot, LLM platforms face a data access problem. If blocking accelerates, AI models will train on progressively less representative data — potentially degrading response quality for users.

3. Industry-specific GEO strategies will emerge. The 4.3x difference in Perplexity's ratio between finance (42:1) and shopping (182:1) means publishers in different verticals will adopt different crawler access policies. One-size-fits-all robots.txt rules don't reflect the nuanced economics of AI crawler management.

Website owners who monitor these ratios quarterly and adjust their AI crawler access policies based on measurable returns will outperform those who either block everything or allow everything. This is the foundation of a data-driven GEO strategy.

How to Monitor Your Site's Crawl-to-Refer Ratio

While Cloudflare Radar provides aggregate data, individual site owners can approximate their own crawl-to-refer ratios using server logs and analytics. Here's the process I follow for our clients:

  1. Server log analysis: Count requests from known AI crawler and LLM bot user agents (GPTBot, ClaudeBot, Anthropic-AI, Meta-ExternalAgent, PerplexityBot) over a 30-day period
  2. Referral tracking: In Google Analytics 4 or your analytics platform, filter referral traffic from chatgpt.com, perplexity.ai, bing.com (Copilot), and other LLM-powered search surfaces
  3. Calculate the ratio: Divide total AI crawler requests by total AI-referred visits
  4. Compare against benchmarks: Use the industry-specific ratios from this report as baselines for your GEO strategy
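
Steps 1 through 3 can be sketched in a few lines of Python. This is a minimal illustration, not a production log parser. It assumes access-log lines that contain the user-agent string somewhere in the line, matches bots by case-insensitive substring, and takes the AI-referred visit count as a number you have already pulled from your analytics platform. The sample log lines, their simplified user-agent strings, and the referral count of 2 are made up.

```python
from collections import Counter

# User-agent tokens for the bots named in step 1 (matched case-insensitively).
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "Anthropic-AI", "Meta-ExternalAgent", "PerplexityBot"]


def count_ai_bot_requests(log_lines):
    """Count requests per AI bot by searching each log line for a known token."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break  # attribute each request to at most one bot
    return counts


def crawl_to_refer_ratio(crawl_requests, referred_visits):
    """Pages crawled per referral; None when there are no referrals to divide by."""
    if referred_visits == 0:
        return None
    return crawl_requests / referred_visits


# Made-up sample lines in a combined-log-like layout, with simplified user agents.
sample_log = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '203.0.113.5 - - [01/Mar/2026:10:00:02 +0000] "GET /b HTTP/1.1" 200 498 "-" "GPTBot/1.0"',
    '203.0.113.9 - - [01/Mar/2026:10:01:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "ClaudeBot/1.0"',
    '198.51.100.7 - - [01/Mar/2026:10:02:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

counts = count_ai_bot_requests(sample_log)
total_crawls = sum(counts.values())
ai_referred_visits = 2  # hypothetical figure taken from your analytics platform
ratio = crawl_to_refer_ratio(total_crawls, ai_referred_visits)
print(dict(counts), f"{ratio:.1f}:1" if ratio is not None else "no referrals")
```

Substring matching on user agents is easy to spoof, so treat the counts as an approximation. Verifying crawler IP ranges against each operator's published list would tighten the numbers.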

SEO tools like SEOmator can help automate the technical auditing side — analyzing your robots.txt configuration, monitoring AI crawler access patterns, and identifying which LLM bots are consuming your crawl budget without delivering proportional value.

Methodology

All data in this report comes from Cloudflare Radar, which monitors traffic patterns across Cloudflare's global network. Cloudflare's network handles a significant portion of all internet traffic, making its bot and crawler data broadly representative of web-wide patterns.

Data dimensions used:

  • CRAWL_REFER_RATIO: Pages crawled per referral sent, by operator
  • REFERER: Referral traffic share by source domain
  • CLIENT_TYPE: Traffic classification (Human, Non-AI Bot, AI Bot, Mixed)
  • USER_AGENT: Crawler identification by user agent string
  • VERTICAL / INDUSTRY: Traffic segmentation by website category
  • robots.txt analysis: Directive parsing from 3,973 popular domains

Date range: January 1 – March 16, 2026

Limitations: Crawl-to-refer ratios measure aggregate behavior across Cloudflare's network. Individual site ratios will vary based on content type, domain authority, and traffic patterns. Referral attribution may undercount AI-driven visits that arrive through intermediate pages or don't carry referrer headers.

Data sourced from Cloudflare Radar Bot & Crawler Analytics, January 1 – March 16, 2026. Last updated: March 17, 2026.