Do Hotels Block AI Crawlers?
We parsed 105,002 hotel robots.txt files. 96.7% have zero AI-specific blocking rules. The industry is wide open.
TL;DR
We parsed robots.txt files from 105,002 hotel websites across 7 countries. Only 3.3% block any AI crawler, and just 0.9% block all of them. GPTBot (OpenAI) is the most commonly blocked at 2.9%, while AI search bots like PerplexityBot and OAI-SearchBot are blocked by just 1.0%. The most interesting signal: 2.1% of hotels use a "smart" strategy, blocking training bots while letting search bots through. France is a clear outlier at 7.5%, more than double the rate of any other country.
Executive Summary
The robots.txt file is the first thing a well-behaved crawler reads on a website: it tells crawlers what they can and cannot access. It is a request rather than an enforcement mechanism, since compliance is voluntary. With the rise of AI-powered search engines (ChatGPT, Perplexity, Gemini) and AI model training, hotels face a new decision: should they allow AI bots to crawl their content?
Our analysis of 105,002 hotel websites reveals that the vast majority have not yet made this decision, or have decided to leave the door wide open. Only 3.3% block any AI crawler at all. In other words, 96.7% of hotel websites are fully accessible to AI training bots and AI-powered search engines alike.
The distinction between training and search bots matters. Training bots (GPTBot, ClaudeBot, CCBot) scrape content to build AI models. Search bots (PerplexityBot, OAI-SearchBot) fetch content to answer user queries in real time. Blocking the first keeps your content out of training data collected by crawlers that honor robots.txt. Blocking the second removes you from AI-powered search results. Understanding this distinction is critical, and our article on the anatomy of ChatGPT hotel search explains exactly how these bots work.
robots.txt Adoption
How many hotel websites have a robots.txt file? (n=105,002 hotels)
robots.txt status breakdown
AI Blocking Overview
How does AI bot blocking compare to traditional search engine blocking?
AI blocking vs traditional search engine blocking
Per-Bot Blocking Rates
Which AI bots are hotels blocking? (Chart colored by category: training, search, user agent.)
AI bot blocking rates by bot
Full AI bot blocking data
| Bot | Provider | Category | Hotels Blocking | % of Total |
|---|---|---|---|---|
| GPTBot | OpenAI | training | 3,036 | 2.9% |
| CCBot | Common Crawl | training | 2,847 | 2.7% |
| Google-Extended | Google | training | 2,793 | 2.7% |
| Bytespider | ByteDance | training | 2,782 | 2.6% |
| ClaudeBot | Anthropic | training | 2,742 | 2.6% |
| Applebot-Extended | Apple | training | 2,669 | 2.5% |
Selective Blocking: Training vs Search
Among hotels that block AI, what strategy are they using?
AI blocking strategy distribution
Understanding OpenAI's 3 crawlers
Per OpenAI's official documentation, each crawler serves a distinct purpose:
GPTBot: crawls content for training generative AI models. Blocking it means your content won't be used for training. This is the one most hotels should consider blocking.
OAI-SearchBot: indexes content for ChatGPT's search features. Blocking it means your site won't appear in ChatGPT search answers, though it can still show up as a navigational link. Hotels wanting AI search visibility should allow this one.
ChatGPT-User: triggered when a user asks ChatGPT to browse a page. It's user-initiated, and OpenAI states that "robots.txt rules may not apply." Blocking this bot is largely pointless, yet 1,190 hotels do it.
AI Blocking by Country
France is a clear outlier. Germany, despite being another GDPR market, is the lowest.
% of hotels blocking any AI crawler, by country
The Logis Effect: One Chain Explains France's Outlier Status
Logis Hotels is a French cooperative of ~2,300 independent hotels, restaurants, and guesthouses. Their shared CMS/platform includes a robots.txt that blocks 6 AI training bots (GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Applebot-Extended) while allowing all search bots. This single decision affects 955 properties in our dataset.
Remove Logis from the data, and France drops from 7.5% to 2.1%, exactly the US rate. The "French GDPR culture" hypothesis largely evaporates. What looks like a national trend is actually a single platform decision by a cooperative that bundles AI blocking into its shared infrastructure.
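The arithmetic behind that drop can be reproduced in a few lines of Python. This is a worked check, not the study's pipeline. It uses the France figures reported in this article and assumes that all 955 Logis properties are French hotels that block AI. Keeping France's full hotel count as the denominator reproduces the 2.1% figure. Removing the Logis properties from the denominator as well gives roughly 2.2%.

```python
# Figures reported in this article. Assumption: all 955 Logis properties are
# French hotels that block AI, so removing Logis subtracts 955 blockers.
FRANCE_HOTELS = 17_634
FRANCE_BLOCKERS = 1_317
LOGIS_PROPERTIES = 955


def pct(part: int, whole: int) -> float:
    """Share as a percentage, rounded to one decimal like the tables in this article."""
    return round(100 * part / whole, 1)


non_logis_blockers = FRANCE_BLOCKERS - LOGIS_PROPERTIES  # 362

print(pct(FRANCE_BLOCKERS, FRANCE_HOTELS))                        # 7.5
print(pct(non_logis_blockers, FRANCE_HOTELS))                     # 2.1, denominator unchanged
print(pct(non_logis_blockers, FRANCE_HOTELS - LOGIS_PROPERTIES))  # 2.2, Logis also removed from denominator
```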
France blocking breakdown (1,317 total blockers)
The irony: Logis's blocking is actually the "smart" pattern. It blocks training bots while allowing search bots, so its hotels remain visible in ChatGPT search and Perplexity. That makes Logis the largest coordinated example of the training-only blocking strategy in our dataset.
Full country-level blocking data
| Country | Hotels | Has robots.txt | Blocks Any AI | GPTBot | ClaudeBot |
|---|---|---|---|---|---|
| France | 17,634 | 89.8% | 7.5% | 7.2% | 6.5% |
| Italy | 27,319 | 71.7% | 3.3% | 2.6% | 2.4% |
| Spain | 16,411 | 83.6% | 2.6% | 2.0% | 2.0% |
| Netherlands | 2,891 | 86.6% | 2.2% | 1.8% | 1.5% |
| USA | 7,445 | 90.8% | 2.1% | 1.8% | 1.7% |
| UK | 10,547 | 89.4% | 2.0% | 1.7% | 1.5% |
AI Blocking by Star Classification
Does hotel quality affect AI blocking decisions?
AI blocking rates by star classification
Blocking rates by star classification
| Stars | Hotels | Blocks Any AI | Blocks All AI |
|---|---|---|---|
| 1-star | 2,699 | 2.6% | 1.2% |
| 2-star | 10,222 | 3.1% | 0.8% |
| 3-star | 30,199 | 3.0% | 1.1% |
| 4-star | 16,548 | 2.6% | 1.0% |
| 5-star | 2,062 | 3.1% | 0.6% |
| Unclassified | 43,272 | 3.8% | 0.8% |
Blocking Distribution
How many AI bots do hotels block? The pattern is bimodal: 0 or most.
Number of AI bots blocked per hotel
Hotels by number of AI bots blocked
| Bots Blocked | Hotels | % of Total |
|---|---|---|
| 0 | 101,544 | 96.7% |
| 1-3 | 704 | 0.7% |
| 4-8 | 1,689 | 1.6% |
| 9-14 | 1,065 | 1.0% |
Who's Actually Blocking?
The 3,458 blocking hotels aren't random: most blocking is chain- or platform-driven.
Logis Hotels: 944 properties (27% of all blockers)
6 bots blocked: GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Applebot-Extended
The French cooperative hotel chain accounts for the single largest share of AI blocking. Its blocking is training-only: it allows ChatGPT-User, OAI-SearchBot, and PerplexityBot. This is the "smart" pattern: block AI training, stay visible in AI search. Logis alone explains most of France's 7.5% outlier rate.
Block-everything hotels: 957 properties
14 bots blocked (all AI crawlers)
These hotels use a blanket Disallow: / for all user agents, which blocks every crawler, AI or otherwise. Many are Italian resort booking platforms (bookitalyhotels.com, Greenblu) or vacation club networks (Diamond Resorts). Notable 5-star blockers: Grand Hotel Des Iles Borromee (Stresa, 4.7★), Aquatio Cave Luxury Hotel & Spa (Matera, 4.7★), and Hotel Masseria San Domenico (Fasano, 4.7★).
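The blanket pattern can be checked with Python's standard-library urllib.robotparser. In this minimal sketch, BLANKET and WITH_EXCEPTION are hypothetical files, not any hotel's real robots.txt. Googlebot is included only as a non-AI comparison. A bot without a group of its own falls back to the User-agent: * group, so the blanket file denies all 14 tracked AI bots. Once a bot has its own group, that group decides instead.

```python
from urllib.robotparser import RobotFileParser

# The 14 AI user-agent tokens tracked in this study (see Methodology).
AI_BOTS = [
    "GPTBot", "Google-Extended", "CCBot", "Bytespider", "ClaudeBot",
    "Applebot-Extended", "anthropic-ai", "cohere-ai", "Diffbot",
    "PerplexityBot", "OAI-SearchBot", "YouBot", "ChatGPT-User", "Claude-Web",
]

# Hypothetical blanket policy: a single wildcard group that disallows everything.
BLANKET = "User-agent: *\nDisallow: /\n"

# Hypothetical variant: the same blanket rule plus a dedicated Googlebot group.
WITH_EXCEPTION = BLANKET + "\nUser-agent: Googlebot\nAllow: /\n"


def allowed_bots(robots_txt: str, bots: list[str], path: str = "/") -> list[str]:
    """Return the bots that urllib.robotparser lets fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if parser.can_fetch(bot, path)]


print(allowed_bots(BLANKET, AI_BOTS + ["Googlebot"]))         # []
print(allowed_bots(WITH_EXCEPTION, AI_BOTS + ["Googlebot"]))  # ['Googlebot']
```

The second case is why a "block everything" file still lets through any bot the site owner explicitly names in its own group.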
Partial search bot blocking: ~90 properties
GPTBot fully blocked + OAI-SearchBot blocked on specific paths
Some hotel chains block GPTBot entirely (no training) but restrict OAI-SearchBot only from sensitive paths like /booking/. This is a nuanced, smart strategy: the hotel remains visible in ChatGPT Search for discovery queries but protects its booking funnel. Our detection flags any Disallow rule as a "block," but these hotels are still discoverable.
Sercotel Hotels: 71 properties (Spain)
9 bots blocked: GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, Google-Extended, PerplexityBot, cohere-ai, YouBot, Applebot-Extended
The Spanish chain blocks both training and search bots, including PerplexityBot and YouBot. It allows OAI-SearchBot and Claude-Web but blocks ChatGPT-User. Per OpenAI's docs, blocking ChatGPT-User is largely pointless: it's user-initiated and "robots.txt rules may not apply." Meanwhile, blocking PerplexityBot means Sercotel hotels are invisible to Perplexity search, a real visibility loss.
Sofitel (Accor luxury): blocks only Google-Extended
1 bot blocked: Google-Extended
Sofitel Le Scribe Paris Opéra, Sofitel Paris le Faubourg, Sofitel Paris Arc de Triomphe, Sofitel London St James, and Sofitel Legend The Grand Amsterdam all block only Google-Extended (Gemini training). GPTBot, ClaudeBot, and search bots pass freely. This is the most minimal blocking policy: stop Google from training Gemini on your content, allow everything else.
Paris Spotlight: 35 hotels block AI
Of the more than 4,000 hotels in our Paris dataset, only 35 block any AI crawler. Notable 5-star blockers:
Hôtel Madame Rêve
5★ · 4.6 rating · blocks 6 training bots
Sofitel Le Scribe Paris Opéra
5★ · 4.6 rating · blocks Google-Extended only
Sofitel Paris le Faubourg
5★ · 4.6 rating · blocks Google-Extended only
Sofitel Paris Arc de Triomphe
5★ · 4.6 rating · blocks Google-Extended only
The majority of Paris blockers are Logis Hotels (via their CMS) and aparthotel chains (corporate policy). The palace hotels (Ritz, Plaza Athénée, Le Bristol, Four Seasons George V) do not block any AI crawler.
5-Star Hotels: 64 Block AI
Only 64 of the 2,062 five-star hotels (3.1%) block any AI crawler. The most notable:
| Hotel | Location | Rating | Blocking |
|---|---|---|---|
| Villa d'Este | Cernobbio, IT | 4.7★ | GPTBot + ChatGPT-User |
| Aman Venice | Venice, IT | 4.8★ | Google-Extended only |
| Villa la Massa | Candeli, IT | 4.8★ | GPTBot + ChatGPT-User |
| Grand Hotel Des Iles Borromee | Stresa, IT | 4.7★ | All 14 bots |
| Equinox Hotel New York | New York, US | 4.4★ | anthropic-ai only |
| The Royal Horseguards | London, GB | 4.4★ | 6 training bots |
| Hôtel Madame Rêve | Paris, FR | 4.6★ | 6 training bots |
| Gran Hotel Inglés | Madrid, ES | 4.7★ | 6 training bots |
Italy dominates the 5-star blocking list. Villa d'Este and Villa la Massa (both luxury Italian properties) block GPTBot and ChatGPT-User specifically, which reads as an anti-OpenAI stance. Aman Venice blocks only Google-Extended, and Equinox New York blocks only anthropic-ai. Each has a different, seemingly deliberate policy.
1,071 Hotels Restrict ChatGPT's Search Crawler
Blocking OAI-SearchBot has nothing to do with training (that is GPTBot's role). Instead, it removes your hotel from ChatGPT's search answers.
Most hotels that block OAI-SearchBot do so as part of a blanket blocklist: they're blocking everything, not specifically targeting search. But 58 hotels block search bots while allowing training bots, the exact opposite of the "smart" strategy. These hotels are opting out of AI discovery while still letting their content be used for model training.
Three blocking patterns we observed
Blanket Disallow: / for all AI bots
957 hotels block every crawler, including all AI bots. These are typically platform-level decisions (booking platforms, resort networks) rather than individual hotel choices. These hotels are invisible to every AI search engine that respects robots.txt.
Block OAI-SearchBot from specific paths only
Some hotel chains block OAI-SearchBot only from sensitive paths (e.g. /booking/) while allowing it on the rest of the site. This is actually a nuanced, smart strategy: the hotel remains visible in ChatGPT Search but protects its booking funnel from being scraped. Our detection counts these as "blocking OAI-SearchBot," but the hotel is still discoverable.
Block GPTBot (training), allow OAI-SearchBot (search)
2.1% of hotels block training bots while keeping search bots open. This is the optimal approach: crawlers that honor robots.txt won't use your content to train AI models, but your hotel still appears when travelers ask ChatGPT for recommendations. Some chains go further by also protecting booking paths from search bots, the most sophisticated policy we observed.
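A training-only file of this kind can be generated from the bot lists in the Methodology section. The sketch below assumes that classification. training_only_robots_txt is a hypothetical helper, not a template any chain is known to use. It writes one group that names every training bot with Disallow: /. Python's urllib.robotparser then confirms that the search bots, which have no group and no wildcard rule to fall back on, remain allowed.

```python
from urllib.robotparser import RobotFileParser

# Bot classification from this article's Methodology section.
TRAINING_BOTS = [
    "GPTBot", "Google-Extended", "CCBot", "Bytespider", "ClaudeBot",
    "Applebot-Extended", "anthropic-ai", "cohere-ai", "Diffbot",
]
SEARCH_BOTS = ["PerplexityBot", "OAI-SearchBot", "YouBot"]


def training_only_robots_txt(training_bots: list[str]) -> str:
    """Hypothetical helper: one group naming every training bot, disallowing the whole site.

    Bots not named here (search bots included) get no group, and with no
    wildcard group to fall back on they stay unrestricted.
    """
    lines = [f"User-agent: {bot}" for bot in training_bots]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


robots_txt = training_only_robots_txt(TRAINING_BOTS)
print(robots_txt)

# Check each bot against the generated file.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for bot in TRAINING_BOTS + SEARCH_BOTS:
    status = "allowed" if parser.can_fetch(bot, "/") else "blocked"
    print(f"{bot:18} {status}")
```

Because the file has no User-agent: * group, bots not named in it stay unrestricted. Adding a wildcard Disallow: / would silently block the search bots too.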
Common mistake: blocking ChatGPT-User instead of OAI-SearchBot
1,190 hotels block ChatGPT-User, but this is largely pointless. Per OpenAI's documentation, ChatGPT-User is triggered when a user asks ChatGPT to visit a page or interacts with a Custom GPT. It's user-initiated rather than automated crawling, and robots.txt rules may not apply.
Critically, ChatGPT-User is not used to determine whether content appears in ChatGPT Search; that's OAI-SearchBot's job. Hotels blocking ChatGPT-User may think they're opting out of ChatGPT, but they're blocking the wrong bot.
We also observed a lucky variant of this mistake: some hotel chains block GPTBot + ChatGPT-User but allow OAI-SearchBot. The result is correct (visible in search, opted out of training), but it was likely achieved by accident rather than through an understanding of the bot taxonomy.
Note on partial blocks
Our detection flags any Disallow rule targeting OAI-SearchBot as a "block." But some of the 1,071 hotels only block specific paths (like /booking/), not the entire site. These hotels are still discoverable in ChatGPT Search. The true "fully invisible" count is lower than 1,071, and it is concentrated among blanket blockers and platform-level decisions.
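One way to tell site-wide blocks from path-level ones is to probe a few representative URLs with Python's standard-library urllib.robotparser. This is a heuristic sketch, not the study's detector. PARTIAL_POLICY is a hypothetical chain policy. The probe paths and the full/partial/none labels are illustrative assumptions, so a rule on an unprobed path would go unnoticed. Note also that urllib.robotparser applies the first matching rule, whereas RFC 9309 specifies the longest match. Files that mix Allow and Disallow for overlapping paths can therefore be judged differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical chain policy: GPTBot blocked site-wide, OAI-SearchBot kept
# out of the booking funnel only.
PARTIAL_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /booking/
"""

# Illustrative probe paths; a rule on any other path would not be detected.
PROBE_PATHS = ("/", "/rooms/", "/booking/")


def blocking_level(robots_txt: str, bot: str, probes=PROBE_PATHS) -> str:
    """Return 'full' if the homepage is disallowed, 'partial' if only some probes are, else 'none'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(bot, "/"):
        return "full"
    if any(not parser.can_fetch(bot, path) for path in probes):
        return "partial"
    return "none"


for bot in ("GPTBot", "OAI-SearchBot", "PerplexityBot"):
    print(bot, blocking_level(PARTIAL_POLICY, bot))
# GPTBot full
# OAI-SearchBot partial
# PerplexityBot none
```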
Methodology
Data Collection
- Source: the same 105,002 reachable hotel websites used in our schema adoption study
- 7 countries: France (17.6K), Italy (27.3K), Spain (16.4K), Netherlands (2.9K), USA (7.4K), UK (10.5K), Germany (22.3K)
- robots.txt fetched from each domain root with a Chrome-like user agent and a 10-second timeout
- Each robots.txt parsed for User-agent directives and Disallow rules
- 14 AI-specific bots tracked across training, search, and user categories
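The fetch step can be sketched with the Python standard library alone. The Chrome-like User-Agent string is a hypothetical stand-in, because the study does not publish the one it used. robots_url and fetch_text are illustrative helper names. Any network or HTTP failure, including a 404, is treated as "no robots.txt retrieved."

```python
import http.client
import urllib.request
from typing import Optional

# Hypothetical Chrome-like User-Agent; the study does not publish its exact string.
CHROME_LIKE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def robots_url(domain: str) -> str:
    """robots.txt always lives at the root of the host."""
    return f"https://{domain.strip().rstrip('/')}/robots.txt"


def fetch_text(url: str, timeout: float = 10.0) -> Optional[str]:
    """GET `url` with a Chrome-like UA and return the body, or None on failure.

    urllib follows redirects and raises HTTPError (a subclass of URLError,
    hence of OSError) for 4xx/5xx answers, so a 404 also yields None.
    """
    try:
        request = urllib.request.Request(url, headers={"User-Agent": CHROME_LIKE_UA})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # RFC 9309 specifies UTF-8; undecodable bytes are replaced rather than fatal.
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        return None
```

Usage: fetch_text(robots_url("hotel.example")) returns the file's text or None. Treating every failure as None keeps a 105,002-domain crawl moving. A real pipeline would probably record the failure reason separately, since a 404 (no file, so crawling is allowed) and a timeout mean different things.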
Bot Classification
- Training bots: GPTBot, Google-Extended, CCBot, Bytespider, ClaudeBot, Applebot-Extended, anthropic-ai, cohere-ai, Diffbot
- Search bots: PerplexityBot, OAI-SearchBot, YouBot
- User agent bots: ChatGPT-User, Claude-Web
- "Blocks any AI" = at least one AI bot has a Disallow rule
- "Smart strategy" = blocks at least one training bot but allows all search bots
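The definitions above can be expressed as a small classifier. This is a minimal sketch, not the study's code, and it makes its assumptions explicit:

1. Groups are parsed roughly as RFC 9309 describes: consecutive User-agent lines share a group, and tokens match case-insensitively.
2. A bot with its own group counts as blocked if that group has any non-empty Disallow rule.
3. As an added assumption, a bot without its own group counts as blocked only when the User-agent: * group disallows the whole site, which is the blanket pattern described earlier.

The logis_like input is a hypothetical file built from the six bots Logis is reported to block, not Logis's actual robots.txt.

```python
# Bot classification from this article's Methodology section, lower-cased
# because robots.txt user-agent matching is case-insensitive.
TRAINING = {"gptbot", "google-extended", "ccbot", "bytespider", "claudebot",
            "applebot-extended", "anthropic-ai", "cohere-ai", "diffbot"}
SEARCH = {"perplexitybot", "oai-searchbot", "youbot"}
USER = {"chatgpt-user", "claude-web"}
AI_BOTS = TRAINING | SEARCH | USER


def disallow_rules(robots_txt: str) -> dict[str, list[str]]:
    """Map each user-agent token to the non-empty Disallow values of its groups.

    Consecutive User-agent lines share one group; the first User-agent line
    after a rule line starts a new group. Blank lines are ignored.
    """
    rules: dict[str, list[str]] = {}
    agents: list[str] = []
    in_rules = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # the previous group is finished
                agents, in_rules = [], False
            token = value.lower()
            agents.append(token)
            rules.setdefault(token, [])  # register the group even if it only has Allow rules
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # an empty Disallow allows everything
                for agent in agents:
                    rules[agent].append(value)
    return rules


def blocked_ai_bots(robots_txt: str) -> set[str]:
    """Return the tracked AI bots that count as blocked under the assumptions above."""
    rules = disallow_rules(robots_txt)
    # Assumption: the wildcard group only counts when it blocks the whole site.
    wildcard_blocks_site = "/" in rules.get("*", [])
    blocked = set()
    for bot in AI_BOTS:
        if bot in rules:  # the bot has its own group, so the wildcard is ignored
            if rules[bot]:
                blocked.add(bot)
        elif wildcard_blocks_site:
            blocked.add(bot)
    return blocked


def classify(robots_txt: str) -> dict:
    blocked = blocked_ai_bots(robots_txt)
    return {
        "bots_blocked": len(blocked),
        "blocks_any_ai": bool(blocked),
        "blocks_all_ai": blocked == AI_BOTS,
        # "Smart strategy": at least one training bot blocked and no search bot blocked.
        "smart_strategy": bool(blocked & TRAINING) and not (blocked & SEARCH),
    }


# Hypothetical file listing the six training bots Logis is reported to block.
logis_like = "\n".join(
    f"User-agent: {bot}"
    for bot in ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider", "Applebot-Extended"]
) + "\nDisallow: /\n"

print(classify(logis_like))
# {'bots_blocked': 6, 'blocks_any_ai': True, 'blocks_all_ai': False, 'smart_strategy': True}
```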
Continue Reading
Explore more Hotelrank research on AI hotel search.