The rise of the AI crawler


Real-world data from MERJ and Vercel shows distinct patterns from top AI crawlers.

AI crawlers have become a significant presence on the web. OpenAI’s GPTBot made 569 million requests across Vercel’s network in the past month, while Anthropic’s Claude followed with 370 million. For perspective, this combined volume represents about 20% of Googlebot’s 4.5 billion requests during the same period.

After analyzing how Googlebot handles JavaScript rendering with MERJ, we turned our attention to these AI assistants. Our new data reveals how OpenAI’s ChatGPT, Anthropic’s Claude, and other AI tools crawl and process web content.

We uncovered clear patterns in how these crawlers handle JavaScript, prioritize content types, and navigate the web, which directly impact how AI tools understand and interact with modern web applications.

Our primary data comes from monitoring nextjs.org and the Vercel network over the past few months. To verify our findings across different technology stacks, we also analyzed two job board websites: Resume Library, built with Next.js, and CV Library, which uses a custom monolithic framework. This diverse dataset helps ensure our observations about crawler behavior are consistent across different web architectures.

For more details on how we collected this data, see our first article.

Note: Microsoft Copilot was excluded from this study as it lacks a distinct user agent for tracking.
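
For illustration, a study like this comes down to bucketing server access logs by crawler user agent. The sketch below is hypothetical and is not the MERJ/Vercel pipeline; it assumes newline-delimited JSON access logs with a `userAgent` field, and the log path is a placeholder.

```ts
// Hypothetical sketch: count fetches per crawler by matching user-agent strings
// in newline-delimited JSON access logs. Not the actual MERJ/Vercel pipeline;
// the log format and file path are assumptions.
import { createReadStream } from 'node:fs'
import { createInterface } from 'node:readline'

const CRAWLERS: Record<string, RegExp> = {
  Googlebot: /Googlebot/i,
  GPTBot: /GPTBot/i,
  ClaudeBot: /ClaudeBot/i,
  Applebot: /Applebot/i,
  PerplexityBot: /PerplexityBot/i,
}

async function countFetches(logPath: string): Promise<Record<string, number>> {
  const counts: Record<string, number> = {}
  const lines = createInterface({ input: createReadStream(logPath) })

  for await (const line of lines) {
    const { userAgent = '' } = JSON.parse(line) as { userAgent?: string }
    for (const [name, pattern] of Object.entries(CRAWLERS)) {
      if (pattern.test(userAgent)) {
        counts[name] = (counts[name] ?? 0) + 1
        break // attribute each request to at most one crawler
      }
    }
  }
  return counts
}

countFetches('./access-log.ndjson').then(console.log)
```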

The volume of AI crawler traffic across Vercel’s network is substantial. In the past month:

  • Googlebot: 4.5 billion fetches across Gemini and Search

  • GPTBot (ChatGPT): 569 million fetches

  • Claude: 370 million fetches

  • AppleBot: 314 million fetches

  • PerplexityBot: 24.4 million fetches

While AI crawlers haven’t reached Googlebot’s scale, they represent a significant portion of web crawler traffic. For context, GPTBot, Claude, AppleBot, and PerplexityBot combined account for nearly 1.3 billion fetches, a little over 28% of Googlebot’s volume.

Geographic distribution

All AI crawlers we measured operate from U.S. data centers:

  • ChatGPT: Des Moines (Iowa), Phoenix (Arizona)

  • Claude: Columbus (Ohio)

In comparison, traditional search engines often distribute crawling across multiple regions. For example, Googlebot operates from seven different U.S. locations, including The Dalles (Oregon), Council Bluffs (Iowa), and Moncks Corner (South Carolina).

Our analysis shows a clear split in JavaScript rendering capabilities among AI crawlers. To verify our findings, we analyzed both Next.js applications and traditional web applications using different tech stacks.

The results consistently show that none of the major AI crawlers currently render JavaScript. This includes:

  • OpenAI (OAI-SearchBot, ChatGPT-User, GPTBot)

  • Anthropic (ClaudeBot)

  • Meta (Meta-ExternalAgent)

  • ByteDance (Bytespider)

  • Perplexity (PerplexityBot)

The results also show:

  • Google’s Gemini leverages Googlebot’s infrastructure, enabling full JavaScript rendering.

  • AppleBot renders JavaScript through a browser-based crawler, similar to Googlebot. It processes JavaScript, CSS, Ajax requests, and other resources needed for full-page rendering.

  • Common Crawl (CCBot), which is often used as a training dataset for Large Language Models (LLMs), does not render pages.

The data suggests that while ChatGPT and Claude crawlers do fetch JavaScript files (ChatGPT: 11.50%, Claude: 23.84% of requests), they don’t execute them. They can’t read client-side rendered content.

Note, however, that content included in the initial HTML response, like JSON data or deferred React Server Components, may still be indexed since AI models can interpret non-HTML content.

In contrast, Gemini’s use of Google’s infrastructure gives it the same rendering capabilities we documented in our Googlebot analysis, allowing it to process modern web applications fully.
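
A quick way to approximate what a non-rendering crawler receives is to fetch a page’s raw HTML, without executing any JavaScript, and check whether the content you care about is already present. A minimal sketch follows; the URL, phrase, and simplified user-agent string are placeholders.

```ts
// Minimal sketch: fetch raw HTML (no JavaScript execution) and check whether
// a phrase that should be server-rendered appears in the initial response.
// The URL, phrase, and simplified user-agent string are placeholders.
async function visibleWithoutJs(url: string, phrase: string): Promise<boolean> {
  const res = await fetch(url, { headers: { 'User-Agent': 'GPTBot' } })
  const html = await res.text()
  return html.includes(phrase) // true only if the content is in the initial HTML
}

visibleWithoutJs('https://example.com/docs/getting-started', 'Getting Started')
  .then((ok) => console.log(ok ? 'server-rendered' : 'likely client-rendered only'))
```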

AI crawlers show distinct preferences in the types of content they fetch on nextjs.org. The most notable patterns:

  • ChatGPT prioritizes HTML content (57.70% of fetches)

  • Claude focuses heavily on images (35.17% of total fetches)

  • Both crawlers spend significant time on JavaScript files (ChatGPT: 11.50%, Claude: 23.84%) despite not executing them

For comparison, Googlebot’s fetches (across Gemini and Search) are more evenly distributed:

  • 31.00% HTML content

  • 29.34% JSON data

  • 20.77% plain text

  • 15.25% JavaScript

These patterns suggest AI crawlers collect diverse content types (HTML, images, and even JavaScript files as text), likely to train their models on various forms of web content.

While traditional search engines like Google have optimized their crawling patterns specifically for search indexing, newer AI companies may still be refining their content prioritization strategies.

Our data shows significant inefficiencies in AI crawler behavior:

  • ChatGPT spends 34.82% of its fetches on 404 pages

  • Claude shows similar patterns, with 34.16% of fetches hitting 404s

  • ChatGPT spends an additional 14.36% of fetches following redirects

Analysis of 404 errors reveals that, excluding robots.txt, these crawlers frequently attempt to fetch outdated assets from the /static/ folder. This suggests a need for improved URL selection and handling strategies to avoid unnecessary requests.

These high rates of 404s and redirects contrast sharply with Googlebot, which spends only 8.22% of fetches on 404s and 1.49% on redirects, suggesting Google has invested more time optimizing its crawler to focus on real resources.
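
For Next.js sites, one way to cut down on crawler 404s for moved or outdated asset paths is a permanent redirect in the framework config. Below is a minimal sketch: the redirects() API is standard Next.js, but the source and destination paths are hypothetical examples only.

```ts
// next.config.ts (supported in recent Next.js versions): a minimal sketch.
// The source/destination paths are hypothetical examples of redirecting an
// outdated asset path that crawlers keep requesting to its current location.
import type { NextConfig } from 'next'

const config: NextConfig = {
  async redirects() {
    return [
      {
        source: '/static/docs/:slug*', // hypothetical outdated path
        destination: '/docs/:slug*',   // hypothetical current location
        permanent: true,               // 308, so crawlers update their stored URLs
      },
    ]
  },
}

export default config
```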

Our analysis of traffic patterns reveals interesting correlations between crawler behavior and site traffic. Based on data from nextjs.org:

  • Pages with higher organic traffic receive more frequent crawler visits

  • AI crawlers show less predictable patterns in their URL selection

  • High 404 rates suggest AI crawlers may need to improve their URL selection and validation processes, though the exact cause remains unclear

While traditional search engines have developed sophisticated prioritization algorithms, AI crawlers are seemingly still evolving their approach to web content discovery.

Ryan Siddle, Managing Director of MERJ

For site owners who want to be crawled

  • Prioritize server-side rendering for critical content. ChatGPT and Claude don’t execute JavaScript, so any important content should be server-rendered. This includes main content (articles, product information, documentation), meta information (titles, descriptions, categories), and navigation structures. SSR, ISR, and SSG keep your content accessible to all crawlers; see the sketch after this list.

  • Client-side rendering still works for enhancement features. Feel free to use client-side rendering for non-essential dynamic elements like view counters, interactive UI enhancements, live chat widgets, and social media feeds.

  • Efficient URL management matters more than ever. The high 404 rates from AI crawlers highlight the importance of maintaining proper redirects, keeping sitemaps up to date, and using consistent URL patterns across your site.
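
As a concrete illustration of the first point, here is a minimal sketch of a server-rendered page in the Next.js App Router (Next.js 13/14-style params). The route and the getDoc() data helper are hypothetical; the point is that the article text is emitted in the initial HTML response, so crawlers that never execute JavaScript can still read it.

```tsx
// app/docs/[slug]/page.tsx: a minimal sketch of server-rendered content.
// The route and getDoc() helper are hypothetical placeholders.
import { notFound } from 'next/navigation'

type Doc = { title: string; body: string }

// Hypothetical data loader; replace with your own CMS or database call.
async function getDoc(slug: string): Promise<Doc | null> {
  const res = await fetch(`https://api.example.com/docs/${slug}`, {
    next: { revalidate: 3600 }, // ISR: regenerate at most once per hour
  })
  return res.ok ? res.json() : null
}

export default async function DocPage({ params }: { params: { slug: string } }) {
  const doc = await getDoc(params.slug)
  if (!doc) notFound()

  // Rendered on the server: the text below is part of the initial HTML
  // response, so non-rendering crawlers like GPTBot and ClaudeBot can read it.
  return (
    <article>
      <h1>{doc.title}</h1>
      <p>{doc.body}</p>
    </article>
  )
}
```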

For site owners who don’t want to be crawled

  • Use robots.txt to manage crawler access. The robots.txt file is effective for all measured crawlers. Set specific rules for AI crawlers by specifying their user agent or product token to restrict access to sensitive or non-essential content. To find the user agents to disallow, you’ll need to check each company’s own documentation (for example, Applebot and OpenAI’s crawlers). A sketch follows this list.

  • Block AI crawlers with Vercel’s WAF. Our Block AI Bots Firewall Rule lets you block AI crawlers with one click. This rule automatically configures your firewall to deny their access.
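
For the robots.txt approach, Next.js can generate the file from an app/robots.ts route. A minimal sketch follows; the rules shown are examples only, and the exact user-agent tokens should be confirmed against each vendor’s documentation.

```ts
// app/robots.ts: a minimal sketch that generates robots.txt.
// The rules are examples; verify user-agent tokens in each vendor's docs.
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: 'GPTBot', disallow: '/' },    // block OpenAI's crawler
      { userAgent: 'ClaudeBot', disallow: '/' }, // block Anthropic's crawler
      { userAgent: '*', allow: '/' },            // everyone else may crawl
    ],
    sitemap: 'https://example.com/sitemap.xml',  // placeholder sitemap URL
  }
}
```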

For AI users

  • JavaScript-rendered content may be missing. Since ChatGPT and Claude don’t execute JavaScript, their responses about dynamic web applications may be incomplete or outdated.

  • Consider the source. High 404 rates (>34%) mean that when AI tools cite specific web pages, there’s a significant chance those URLs are incorrect or inaccessible. For critical information, always verify sources directly rather than relying on AI-provided links.

  • Expect inconsistent freshness. While Gemini leverages Google’s infrastructure for crawling, other AI assistants show less predictable patterns. Some may reference older cached data.

Interestingly, even when asking Claude or ChatGPT for recent Next.js docs data, we often don’t see immediate fetches in our server logs for nextjs.org. This suggests that AI models may rely on cached data or training data, even when they claim to have fetched the latest information.

Our analysis reveals that AI crawlers have rapidly become a significant presence on the web, with nearly 1 billion monthly requests across Vercel’s network.

However, their behavior differs markedly from traditional search engines when it comes to rendering capabilities, content priorities, and efficiency. Following established web development best practices, particularly around content accessibility, remains important.
