Hotelrank Research / robots.txt Study / March 2026

Do Hotels Block AI Crawlers?

We parsed 105,002 hotel robots.txt files. 96.7% have zero AI-specific blocking rules. The industry is wide open.

Have robots.txt: 82.2%
Block Any AI: 3.3%
France (Outlier): 7.5%
"Smart" Strategy: 2.1%

TL;DR

We parsed robots.txt files from 105,002 hotel websites across 7 countries. Only 3.3% block any AI crawler, and just 0.9% block all of them. GPTBot (OpenAI) is the most commonly blocked at 2.9%, while AI search bots like PerplexityBot and OAI-SearchBot are blocked by just 1.0%. The most interesting signal: 2.1% of hotels use a "smart" strategy, blocking training bots while allowing search bots through. France is a clear outlier at 7.5%, more than double the rate of any other country.

Executive Summary

The robots.txt file is a website's first line of crawl control. It tells crawlers what they may and may not access, although compliance is voluntary. With the rise of AI-powered search engines (ChatGPT, Perplexity, Gemini) and AI model training, hotels face a new decision: should they allow AI bots to crawl their content?

Our analysis of 105,002 hotel websites reveals that the vast majority have not yet made this decision, or have decided to leave the door wide open. Only 3.3% block any AI crawler at all. Put differently, 96.7% of hotel websites place no robots.txt restriction on AI training bots or AI-powered search engines.

The distinction between training and search bots matters. Training bots (GPTBot, ClaudeBot, CCBot) scrape content to build AI models. Search bots (PerplexityBot, OAI-SearchBot) fetch content to answer user queries in real time. Blocking the first protects your content from being used in training. Blocking the second removes you from AI-powered search results. Understanding this distinction is critical, and our article on the anatomy of ChatGPT hotel search explains how these bots work.

No AI blocking rules: 96.7%
Block at least one AI bot: 3.3%
Training vs search blocking gap: 2.5x

The key finding: Hotels that do block AI crawlers are making a deliberate distinction between training and search. Training bots are blocked ~2.5x more often than search bots. This "smart" strategy of blocking training while allowing search is emerging as the most sophisticated approach, adopted by 2.1% of hotels.

robots.txt Adoption

How many hotel websites have a robots.txt file? (n=105,002 hotels)

Have robots.txt: 82.2% (86,348 hotels)
Have Sitemap: 60.1% (63,110 hotels)
No robots.txt: 17.8% (18,654 hotels)
Blanket Disallow: 0.9% (958 hotels)

robots.txt status breakdown

The 60.1% sitemap rate is a positive signal. Hotels that declare a sitemap in their robots.txt are actively helping crawlers discover their content. Combined with the 82.2% robots.txt adoption rate, this suggests that most hotel websites have at least basic crawl management in place; they just haven't updated it for the AI era.
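To illustrate what "declaring a sitemap" means in practice, here is a minimal sketch that reads Sitemap lines with Python's standard urllib.robotparser (the site_maps() method needs Python 3.8 or later). It is not the code used for this study; the sample file and the hotel.example domain are made up.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt: str) -> list:
    """Return the Sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt: no crawl restrictions, one declared sitemap.
WITH_SITEMAP = """\
User-agent: *
Disallow:

Sitemap: https://hotel.example/sitemap.xml
"""

# Hypothetical robots.txt: blanket block, no sitemap.
WITHOUT_SITEMAP = """\
User-agent: *
Disallow: /
"""

print(declared_sitemaps(WITH_SITEMAP))
print(declared_sitemaps(WITHOUT_SITEMAP))
```

A site counts toward the sitemap rate when a check like this returns at least one URL.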

AI Blocking Overview

How does AI bot blocking compare to traditional search engine blocking?

Block Any AI: 3.3% (3,458 hotels)
Block All AI: 0.9% (957 hotels)
Block Googlebot: 1.3% (1,325 hotels)
Block Bingbot: 1.1% (1,160 hotels)

AI blocking vs traditional search engine blocking

AI blocking (3.3%) is higher than traditional search blocking. Hotels that block Googlebot (1.3%) or Bingbot (1.1%) are likely misconfigured, since blocking your primary search engines is almost never intentional. AI blocking at 3.3%, by contrast, largely represents a deliberate choice by hotels that specifically target AI crawlers while keeping traditional search open.

Per-Bot Blocking Rates

Which AI bots are hotels blocking? Colored by category: training, search, user agent.

AI bot blocking rates by bot

GPTBot leads at 2.9%. Training bots cluster between 2.5% and 2.9%, while search and user-agent bots hover around 1.0%. The ~2.5x gap between training and search bot blocking is the key finding: hotels that actively manage AI access are distinguishing between content scraping for model training and real-time search retrieval.

Full AI bot blocking data

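| Bot | Provider | Category | Hotels Blocking | % of Total |
| --- | --- | --- | --- | --- |
| GPTBot | OpenAI | training | 3,036 | 2.9% |
| Google-Extended | Google | training | 2,793 | 2.7% |
| CCBot | Common Crawl | training | 2,847 | 2.7% |
| Bytespider | ByteDance | training | 2,782 | 2.6% |
| ClaudeBot | Anthropic | training | 2,742 | 2.6% |
| Applebot-Extended | Apple | training | 2,669 | 2.5% |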
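The percentages and the training-versus-search gap can be recomputed from the raw counts. The sketch below uses the six training-bot counts from this table and the two search-bot counts reported later in this study (1,071 for OAI-SearchBot, 1,083 for PerplexityBot). Comparing the two group averages is our assumption about how to express the gap, not necessarily how the ~2.5x figure was derived.

```python
from statistics import mean

TOTAL_HOTELS = 105_002

# Hotels blocking each bot, as reported in this study.
training_counts = {
    "GPTBot": 3_036,
    "Google-Extended": 2_793,
    "CCBot": 2_847,
    "Bytespider": 2_782,
    "ClaudeBot": 2_742,
    "Applebot-Extended": 2_669,
}
search_counts = {
    "OAI-SearchBot": 1_071,
    "PerplexityBot": 1_083,
}


def pct(count: int) -> float:
    """Share of all hotels, in percent, rounded to one decimal."""
    return round(100 * count / TOTAL_HOTELS, 1)


# Assumption: the gap is the ratio of the average training-bot count
# to the average search-bot count.
gap = mean(training_counts.values()) / mean(search_counts.values())

for bot, count in {**training_counts, **search_counts}.items():
    print(f"{bot}: {pct(count)}%")
print(f"training/search gap: {gap:.1f}x")
```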

Selective Blocking: Training vs Search

Among hotels that block AI, what strategy are they using?

"Smart" Strategy (block training, allow search): 2.1%
Block Everything (all AI bots blocked): 0.9%
Reverse Strategy (block search, allow training): 0.1%

AI blocking strategy distribution

The "smart" strategy is the most interesting signal in this data. 2.1% of hotels (2,214 properties) block training bots like GPTBot and ClaudeBot while allowing search bots like PerplexityBot and OAI-SearchBot to crawl freely. This means they protect their content from model training while remaining visible in AI-powered search results. Only 58 hotels (0.1%) do the reverse, blocking search while allowing training, which suggests either misconfiguration or a very unusual strategy.
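For illustration, a robots.txt following the "smart" pattern could look like the string in the sketch below, which then checks it with Python's standard urllib.robotparser. The file contents and the hotel.example domain are assumptions, not any hotel's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "smart" robots.txt: training bots are blocked site-wide,
# every other crawler (including AI search bots) is allowed.
SMART_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, bot: str, url: str = "https://hotel.example/") -> bool:
    """Return True if `bot` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


for bot in ("GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"):
    print(bot, "allowed" if allowed(SMART_ROBOTS_TXT, bot) else "blocked")
```

Python's parser is only one interpretation of robots.txt; real crawlers may resolve edge cases differently, so treat the result as an approximation.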

Understanding OpenAI's 3 crawlers

Per OpenAI's official documentation, each crawler serves a distinct purpose:

Training

GPTBot: crawls content for training generative AI models. Blocking it means your content won't be used for training. This is the one most hotels should consider blocking.

Search

OAI-SearchBot: indexes content for ChatGPT's search features. Blocking it means your site won't be shown in ChatGPT search answers, though it can still appear as a navigational link. Hotels wanting AI search visibility should allow this.

User

ChatGPT-User: triggered when a user asks ChatGPT to browse a page. It's user-initiated, and OpenAI states "robots.txt rules may not apply." Blocking this bot is largely pointless, yet 1,190 hotels do it.

The visibility trade-off is real. Hotels blocking OAI-SearchBot opt out of ChatGPT search results. Hotels blocking PerplexityBot vanish from Perplexity. As AI search becomes a primary discovery channel for travelers, blocking search bots is equivalent to delisting from a search engine. Read our anatomy of ChatGPT hotel search to understand how search bots retrieve and present hotel information.

AI Blocking by Country

France is a clear outlier. Germany, despite being another GDPR market, is the lowest.

% of hotels blocking any AI crawler, by country

France at 7.5% is more than 3x the US (2.1%) or UK (2.0%) rate. But the number is misleading, because most of it comes from a single chain. See below.

The Logis Effect: One Chain Explains France's Outlier Status

Logis Hotels is a French cooperative of ~2,300 independent hotels, restaurants, and guesthouses. Their shared CMS/platform includes a robots.txt that blocks 6 AI training bots (GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Applebot-Extended) while allowing all search bots. This single decision affects 955 properties in our dataset.

Share of French blockers that are Logis: 72.1%
France's rate without Logis: 2.1% (= US rate, no longer an outlier)

Remove Logis from the data, and France drops from 7.5% to 2.1%, the same as the US rate. The "French GDPR culture" hypothesis largely evaporates. What looks like a national trend is actually a single platform decision by a cooperative that bundles AI blocking into its shared infrastructure.

France blocking breakdown (1,317 total blockers):

Logis Hotels: 950 (72.1%)
Independent/Other: 283 (21.5%)
Aparthotel chains: ~90 (6.9%)
Sofitel + Others: 13 (1.0%)
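The Logis adjustment is simple arithmetic on the figures above. The sketch below assumes the Logis properties stay in the denominator and are simply no longer counted as blockers. Dropping them from the denominator as well would give roughly 2.2% instead.

```python
# Figures reported in this study.
french_hotels = 17_634
french_blockers = 1_317
logis_blockers = 950

# France's headline rate and the Logis share of French blockers.
rate_with_logis = 100 * french_blockers / french_hotels
logis_share = 100 * logis_blockers / french_blockers

# Assumption: Logis hotels remain in the denominator but count as non-blockers.
rate_without_logis = 100 * (french_blockers - logis_blockers) / french_hotels

print(f"France, all hotels:    {rate_with_logis:.1f}%")
print(f"Logis share of blocks: {logis_share:.1f}%")
print(f"France, Logis removed: {rate_without_logis:.1f}%")
```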

The irony: Logis's blocking is actually the "smart" pattern. They block training bots while allowing search bots. Their hotels remain visible in ChatGPT search and Perplexity. This makes them the largest coordinated example of the training-only blocking strategy in our dataset.

Germany at 1.7% undercuts the GDPR hypothesis. If data protection regulation drove AI blocking, Germany, with its equally strong GDPR enforcement, would be expected to match France. Instead, it has the lowest rate of any country in our dataset. AI blocking in hospitality appears to be driven by platform decisions and agency culture, not regulation.

Full country-level blocking data

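| Country | Hotels | Has robots.txt | Blocks Any AI | GPTBot | ClaudeBot |
| --- | --- | --- | --- | --- | --- |
| France | 17,634 | 89.8% | 7.5% | 7.2% | 6.5% |
| Italy | 27,319 | 71.7% | 3.3% | 2.6% | 2.4% |
| Spain | 16,411 | 83.6% | 2.6% | 2.0% | 2.0% |
| Netherlands | 2,891 | 86.6% | 2.2% | 1.8% | 1.5% |
| USA | 7,445 | 90.8% | 2.1% | 1.8% | 1.7% |
| UK | 10,547 | 89.4% | 2.0% | 1.7% | 1.5% |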

AI Blocking by Star Classification

Does hotel quality affect AI blocking decisions?

AI blocking rates by star classification

No meaningful effect. The range is tight: 2.6% to 3.8% across all star classifications. Whether a hotel is 1-star or 5-star has no meaningful impact on whether it blocks AI crawlers. The slightly higher rate for "Unclassified" properties (3.8%) may reflect a different mix of website platforms rather than a deliberate strategic choice.

Blocking rates by star classification

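| Stars | Hotels | Blocks Any AI | Blocks All AI |
| --- | --- | --- | --- |
| 1-star | 2,699 | 2.6% | 1.2% |
| 2-star | 10,222 | 3.1% | 0.8% |
| 3-star | 30,199 | 3.0% | 1.1% |
| 4-star | 16,548 | 2.6% | 1.0% |
| 5-star | 2,062 | 3.1% | 0.6% |
| Unclassified | 43,272 | 3.8% | 0.8% |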

Blocking Distribution

How many AI bots do hotels block? The pattern is bimodal: 0 or most.

Number of AI bots blocked per hotel

Hotels either block 0 or block most/all. The distribution is bimodal: 101,544 hotels (96.7%) block zero AI bots, while 3,065 hotels (2.9%) block 9-14 bots. Very few hotels block just 1-3 bots (704, or 0.7%). This suggests that AI blocking is typically an all-or-nothing decision: when hotels add AI blocking rules, they tend to copy comprehensive blocklists rather than selectively choosing individual bots.

Hotels by number of AI bots blocked

| Bots Blocked | Hotels | % of Total |
| --- | --- | --- |
| 0 | 101,544 | 96.7% |
| 1-3 | 704 | 0.7% |
| 4-8 | 1,689 | 1.6% |
| 9-14 | 3,065 | 2.9% |
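The bucketing behind this table can be sketched as a small function. The bucket boundaries come from the table. The sample entries are illustrative labels for policies described in this study (6, 9, 1, and 14 bots blocked), not rows from the dataset.

```python
from collections import Counter


def bucket(bots_blocked: int) -> str:
    """Map a count of blocked AI bots (0-14) to the table's buckets."""
    if bots_blocked == 0:
        return "0"
    if bots_blocked <= 3:
        return "1-3"
    if bots_blocked <= 8:
        return "4-8"
    return "9-14"


# Illustrative policies and how many of the 14 tracked bots each blocks.
sample = {
    "training-only blocklist (6 bots)": 6,
    "mixed blocklist (9 bots)": 9,
    "Google-Extended only": 1,
    "blanket Disallow": 14,
    "no AI rules": 0,
}

histogram = Counter(bucket(count) for count in sample.values())
for label in ("0", "1-3", "4-8", "9-14"):
    print(label, histogram[label])
```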

Who's Actually Blocking?

The 3,458 blocking hotels aren't random. Most blocking is chain or platform-driven.

Logis Hotels: 944 properties (27% of all blockers)

6 bots blocked: GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Applebot-Extended

The French cooperative hotel chain accounts for the single largest share of AI blocking. Their blocking is training-only: they allow ChatGPT-User, OAI-SearchBot, and PerplexityBot. This is the "smart" pattern: block AI training, stay visible in AI search. Logis alone explains most of France's 7.5% outlier rate.

Block-everything hotels: 957 properties

14 bots blocked (all AI crawlers)

These hotels use a blanket Disallow: / for all user agents, which blocks every crawler including AI. Many are Italian resort booking platforms (bookitalyhotels.com, Greenblu) or vacation club networks (Diamond Resorts). Notable 5-star blockers: Grand Hotel Des Iles Borromee (Stresa, 4.7★), Aquatio Cave Luxury Hotel & Spa (Matera, 4.7★), Hotel Masseria San Domenico (Fasano, 4.7★).

Partial search bot blocking: ~90 properties

GPTBot fully blocked + OAI-SearchBot blocked on specific paths

Some hotel chains block GPTBot entirely (no training) but only restrict OAI-SearchBot from sensitive paths like /booking/. This is actually a nuanced, smart strategy: the hotel remains visible in ChatGPT Search for discovery queries, but protects its booking funnel. Our detection flags any Disallow rule as a "block," but these hotels are still discoverable.

Sercotel Hotels: 71 properties (Spain)

9 bots blocked: GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, Google-Extended, PerplexityBot, cohere-ai, YouBot, Applebot-Extended

The Spanish chain blocks both training and search bots, including PerplexityBot and YouBot. They allow OAI-SearchBot and Claude-Web but block ChatGPT-User. Per OpenAI's docs, blocking ChatGPT-User is largely pointless: it's user-initiated and "robots.txt rules may not apply." Meanwhile, blocking PerplexityBot means Sercotel hotels are invisible to Perplexity search, a real visibility loss.

Sofitel (Accor luxury): blocks only Google-Extended

1 bot blocked: Google-Extended

Sofitel Le Scribe Paris Opéra, Sofitel Paris le Faubourg, Sofitel Paris Arc de Triomphe, Sofitel London St James, and Sofitel Legend The Grand Amsterdam all block only Google-Extended (Gemini training). GPTBot, ClaudeBot, and search bots pass freely. This is the most minimal blocking policy: stop Google from training Gemini on your content, allow everything else.

Paris Spotlight: 35 hotels block AI

Of the more than 4,000 hotels in our Paris dataset, only 35 block any AI crawler. Notable 5-star blockers:

Hôtel Madame Rêve: 5★ · 4.6 rating · blocks 6 training bots
Sofitel Le Scribe Paris Opéra: 5★ · 4.6 rating · blocks Google-Extended only
Sofitel Paris le Faubourg: 5★ · 4.6 rating · blocks Google-Extended only
Sofitel Paris Arc de Triomphe: 5★ · 4.6 rating · blocks Google-Extended only

The majority of Paris blockers are Logis Hotels (via their CMS) and aparthotel chains (corporate policy). The palace hotels (Ritz, Plaza Athénée, Le Bristol, Four Seasons George V) do not block any AI crawler.

5-Star Hotels: 64 Block AI

Only 64 of the 2,062 five-star hotels (3.1%) block any AI crawler. The most notable:

| Hotel | Location | Rating | Blocking |
| --- | --- | --- | --- |
| Villa d'Este | Cernobbio, IT | 4.7★ | GPTBot + ChatGPT-User |
| Aman Venice | Venice, IT | 4.8★ | Google-Extended only |
| Villa la Massa | Candeli, IT | 4.8★ | GPTBot + ChatGPT-User |
| Grand Hotel Des Iles Borromee | Stresa, IT | 4.7★ | All 14 bots |
| Equinox Hotel New York | New York, US | 4.4★ | anthropic-ai only |
| The Royal Horseguards | London, GB | 4.4★ | 6 training bots |
| Hôtel Madame Rêve | Paris, FR | 4.6★ | 6 training bots |
| Gran Hotel Inglés | Madrid, ES | 4.7★ | 6 training bots |

Italy dominates the 5-star blocking list. Villa d'Este and Villa la Massa (both luxury Italian properties) block GPTBot and ChatGPT-User specifically, an anti-OpenAI stance. Aman blocks only Google-Extended. Equinox New York blocks only anthropic-ai. Each has a different, seemingly deliberate policy.

Most AI blocking is a chain or CMS decision, not an individual hotel decision. Logis alone (944 hotels) accounts for 27% of all blockers. Add other chains and blanket-blocking platforms (~957), and you've explained ~60% of all AI blocking with just a few patterns. The remaining ~40% is a mix of individual hotels, smaller chains, and hosting providers with default blocking rules.
Opting Out

1,071 Hotels Are Invisible to ChatGPT Search

Blocking OAI-SearchBot is not a training opt-out: it removes your hotel from ChatGPT's search answers.

Block OAI-SearchBot: 1,071 (1.0% of all hotels)
Block PerplexityBot: 1,083 (1.0% of all hotels)
Block only search bots: 58 (0.1%, deliberate opt-out)

Most hotels that block OAI-SearchBot do so as part of a blanket blocklist: they're blocking everything, not specifically targeting search. But 58 hotels block search bots while allowing training bots, which is the exact opposite of the "smart" strategy. These hotels are opting out of AI discovery while still letting their content be used for model training.

Three blocking patterns we observed

Full block

Blanket Disallow: / for all AI bots

~957 hotels block every crawler including all AI bots. These are typically platform-level decisions (booking platforms, resort networks) rather than individual hotel choices. The hotel is invisible to every AI search engine.

Partial block

Block OAI-SearchBot from specific paths only

Some hotel chains block OAI-SearchBot only from sensitive paths (e.g. /booking/) while allowing it on the rest of the site. This is actually a nuanced, smart strategy: the hotel remains visible in ChatGPT Search but protects its booking funnel from being scraped. Our detection counts these as "blocking OAI-SearchBot," but the hotel is still discoverable.

Smart pattern

Block GPTBot (training), allow OAI-SearchBot (search)

2.1% of hotels block training bots while keeping search bots open. This is the optimal approach: your content won't be used to train AI models, but your hotel still appears when travelers ask ChatGPT for recommendations. Some chains go further by also protecting booking paths from search bots, the most sophisticated policy we observed.

Common mistake: blocking ChatGPT-User instead of OAI-SearchBot

1,190 hotels block ChatGPT-User, but this is largely pointless. Per OpenAI's documentation: ChatGPT-User is triggered when a user asks ChatGPT to visit a page or interacts with a Custom GPT. It's user-initiated, not automated crawling, and robots.txt rules may not apply.

Critically, ChatGPT-User is not used to determine whether content appears in ChatGPT Search. That's OAI-SearchBot. Hotels blocking ChatGPT-User think they're opting out of ChatGPT, but they're blocking the wrong bot.

We also observed the reverse case: some hotel chains block GPTBot + ChatGPT-User but allow OAI-SearchBot. The result is correct (visible in search, opted out of training), but it was likely achieved by accident rather than by understanding the bot taxonomy.
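The difference between the two rules is easy to check mechanically. The sketch below uses Python's standard urllib.robotparser to report which of OpenAI's three agents a given robots.txt lets through. Both sample files and the hotel.example domain are hypothetical.

```python
from urllib.robotparser import RobotFileParser

OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

# Hypothetical file that blocks the user-initiated agent only.
BLOCKS_CHATGPT_USER = """\
User-agent: ChatGPT-User
Disallow: /
"""

# Hypothetical file that blocks the search indexer.
BLOCKS_SEARCHBOT = """\
User-agent: OAI-SearchBot
Disallow: /
"""


def openai_access(robots_txt: str) -> dict:
    """Map each OpenAI agent to True (may fetch the homepage) or False."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: parser.can_fetch(agent, "https://hotel.example/")
        for agent in OPENAI_AGENTS
    }


print(openai_access(BLOCKS_CHATGPT_USER))
print(openai_access(BLOCKS_SEARCHBOT))
```

In the first file OAI-SearchBot is still allowed, so the rule does nothing to opt the site out of ChatGPT Search. Only the second file does.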

Note on partial blocks

Our detection flags any Disallow rule targeting OAI-SearchBot as a "block." But some of the 1,071 hotels only block specific paths (like /booking/), not the entire site. These hotels are still discoverable in ChatGPT Search. The true "fully invisible" count is lower than 1,071, concentrated among blanket blockers and platform-level decisions.
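One way to separate full blocks from partial ones is to test the homepage and a few sensitive paths separately. This is a sketch of that idea, not the detection logic used in this study. The probe path list, the sample file, and the hotel.example domain are assumptions.

```python
from urllib.robotparser import RobotFileParser

BASE_URL = "https://hotel.example"  # hypothetical domain

# Hypothetical chain policy: no training, search allowed except the booking funnel.
CHAIN_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /booking/
"""


def block_scope(robots_txt: str, bot: str, probe_paths=("/booking/",)) -> str:
    """Classify a bot's access as 'full' block, 'partial' block, or 'open'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(bot, BASE_URL + "/"):
        return "full"  # even the homepage is off limits
    if any(not parser.can_fetch(bot, BASE_URL + path) for path in probe_paths):
        return "partial"  # homepage reachable, some probed path is not
    return "open"


for bot in ("GPTBot", "OAI-SearchBot", "PerplexityBot"):
    print(bot, block_scope(CHAIN_ROBOTS_TXT, bot))
```

A probe-based check only finds restrictions on the paths it tries, so a rule on a path outside `probe_paths` would be reported as "open".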

Blocking OAI-SearchBot is the new "noindex". When a hotel blocks OAI-SearchBot, it won't appear when travelers ask ChatGPT for recommendations, even if the hotel has great reviews and a strong web presence. As AI search grows as a discovery channel, this is equivalent to delisting yourself from a search engine. Hotels that want to opt out of ChatGPT Search should block OAI-SearchBot, not ChatGPT-User.

Methodology

Data Collection

  • Source: Same 105,002 reachable hotel websites from our schema adoption study
  • 7 countries: France (17.6K), Italy (27.3K), Spain (16.4K), Netherlands (2.9K), USA (7.4K), UK (10.5K), Germany (22.3K)
  • robots.txt fetched from each domain root with a Chrome-like user agent, 10-second timeout
  • Each robots.txt parsed for User-agent directives and Disallow rules
  • 14 AI-specific bots tracked across training, search, and user categories
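The fetch step described above could be sketched with Python's standard library as follows. This is an illustration under stated assumptions, not the crawler used for the study: the user-agent string is a placeholder for "Chrome-like", and treating every network or HTTP error as "no robots.txt" is our simplification.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Placeholder for a Chrome-like user agent; the exact string is an assumption.
CHROME_LIKE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def robots_url(site_url: str) -> str:
    """Build the robots.txt URL at the domain root of any page URL."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site_url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the robots.txt body, or None if it cannot be fetched."""
    try:
        request = urllib.request.Request(
            robots_url(site_url), headers={"User-Agent": CHROME_LIKE_UA}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts; ValueError covers bad URLs.
        return None


print(robots_url("https://hotel.example/rooms?lang=fr"))
```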

Bot Classification

  • Training bots: GPTBot, Google-Extended, CCBot, Bytespider, ClaudeBot, Applebot-Extended, anthropic-ai, cohere-ai, Diffbot
  • Search bots: PerplexityBot, OAI-SearchBot, YouBot
  • User agent bots: ChatGPT-User, Claude-Web
  • "Blocks any AI" = at least one AI bot has a Disallow rule
  • "Smart strategy" = blocks at least one training bot but allows all search bots
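The definitions above translate into a short classifier. The sketch below is one possible reading of them, not the study's code. It assumes that a bot with no rule group of its own falls back to the `User-agent: *` group, that any non-empty Disallow counts as a block, and the strategy labels ("none", "all", "smart", "reverse", "mixed") are names we chose.

```python
TRAINING_BOTS = {"GPTBot", "Google-Extended", "CCBot", "Bytespider", "ClaudeBot",
                 "Applebot-Extended", "anthropic-ai", "cohere-ai", "Diffbot"}
SEARCH_BOTS = {"PerplexityBot", "OAI-SearchBot", "YouBot"}
USER_BOTS = {"ChatGPT-User", "Claude-Web"}
ALL_BOTS = TRAINING_BOTS | SEARCH_BOTS | USER_BOTS


def disallow_rules(robots_txt: str) -> dict:
    """Map each lower-cased user-agent to its non-empty Disallow paths."""
    rules = {}
    group = []          # user-agents of the group currently being read
    seen_rule = False   # True once the current group has an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a rule line closed the previous group
                group, seen_rule = [], False
            group.append(value.lower())
            rules.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow" and value:
                for agent in group:
                    rules[agent].append(value)
    return rules


def blocked_bots(robots_txt: str) -> set:
    """AI bots with at least one Disallow rule (own group, else the * group)."""
    rules = disallow_rules(robots_txt)
    wildcard = rules.get("*", [])
    return {bot for bot in ALL_BOTS if rules.get(bot.lower(), wildcard)}


def classify(robots_txt: str) -> str:
    """Label a robots.txt with one of the blocking strategies."""
    blocked = blocked_bots(robots_txt)
    if not blocked:
        return "none"
    if blocked == ALL_BOTS:
        return "all"
    if blocked & TRAINING_BOTS and not blocked & SEARCH_BOTS:
        return "smart"
    if blocked & SEARCH_BOTS and not blocked & TRAINING_BOTS:
        return "reverse"
    return "mixed"


# Hypothetical file blocking six training bots and nothing else.
TRAINING_ONLY = "\n".join(
    f"User-agent: {bot}" for bot in
    ("GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider", "Applebot-Extended")
) + "\nDisallow: /\n"

print(classify(TRAINING_ONLY))
print(classify("User-agent: *\nDisallow: /\n"))
```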