Should Publishers Block AI Bots From Scraping Their Content?

Credit: Overearth / iStock.com

Publishers face two pressing questions right now: “Should we block AI bots from scraping our content? And if so, how do we actually do it?”

Usage of ChatGPT, Claude, Google Gemini, and others has exploded to well over 15 billion monthly visits.[1]

And you’ve seen the headlines. National news and lifestyle publications report that AI-generated answers have cut their search traffic by 10% to 15%, and by as much as 50% in extreme cases.

The reality, however, is that not all publishers are impacted equally.

AI Threat Level by Publisher Type

National news and lifestyle publications shout the loudest because they are the most highly impacted by generative AI. However, the actual impact varies wildly depending on your niche:

  • National News (Very High Risk): These media organizations compete with hundreds of other sites reporting mostly on the same stories. They’re highly dependent on SEO and heavily reliant on programmatic advertising.
  • National Lifestyle / Hobby (High Risk): Publishers who cover topics like fashion, knitting, jazz, fishing, etc. are still dependent on SEO. Much of their content is commodity or “evergreen” where AI can easily summarize general knowledge topics.
  • City & Regional News (Medium-High Risk): These publishers have a local content advantage, but much of their content is still national. They often face significant competition from other local media outlets. And they’re still dependent on SEO and high page views for programmatic revenue.
  • City & Regional Lifestyle (Medium Risk): Magazines like D Magazine (Dallas) or Philadelphia Magazine have fresh, unique, local content. They are still somewhat SEO dependent, and some rely on programmatic backfill advertising.
  • National B2B Trade (Medium-Low Risk): B2B / professional association publishers (e.g., auto body, construction, restaurant owners) are much less dependent on SEO and can rely more heavily on other channels like email. Their content is highly specialized and low-commodity, and very few B2B publishers rely on programmatic backfill revenue.
  • City & Regional Business (Low Risk): Publications like BizTimes Milwaukee or Ottawa Business Journal are actually seeing their search traffic grow right now. They have highly unique, fresh local content. Email, Google News, Nextdoor News and other channels often drive more traffic than search.
[Figure: AI threat level by publisher type]

AI Visibility vs. Website Traffic

Before you block AI bots from your site, you must answer one core question: “Is visibility in generative AI more important than website traffic?”

If you allow bots to scrape your site, your content might appear in a ChatGPT or Gemini answer. Your content may gain some visibility as a source, but most generative AI users never click through to your site.

Most of the publishers I work with have decided that traffic is more important right now. They also don’t want to feed the AI models for free and are exploring ways to license their content instead.

And if you’re worried that blocking AI bots will hurt your search traffic, the data I have suggests otherwise. Here is a week-by-week view of a client where we blocked all AI bots in mid-September. Since then, organic search traffic has remained strong.

[Figure: AI bot blocking had no impact on search traffic]

Loopholes AI Companies Use

Be aware that AI companies can use loopholes to access your content even if you try to block them:

  • The Internet Archive Loophole: AI companies claim they honor robots.txt. However, reports suggest they bypass this by grabbing your content from Common Crawl or the Internet Archive (Wayback Machine). They don’t scrape your site directly; they scrape these Internet archives.
  • The Sublicensing Loophole: If you participate in content distribution programs like SmartNews or NewsBreak, check the terms and conditions of their agreements. You may have unwittingly given them permission to sub-license your content to AI companies.

How to Block AI Bots Effectively

So you’ve decided to block AI bots from your site? Use a multi-layered approach:

  1. Update your robots.txt file: Robots.txt gives instructions to bots that visit your site. You need to specifically list and block the user agent for every AI bot (the crawlers behind ChatGPT, Gemini, Claude, etc.). To close the Internet Archive loophole, you must also block the Common Crawl and Wayback Machine bots. Note that only the “ethical” AI bots respect robots.txt. Be very careful not to block legitimate search engine bots: you do not want to accidentally stop Google or Bing from indexing your site for standard search.
  2. Use a web application firewall (WAF): This happens at the website hosting level where you can block AI bots at the firewall so they don’t even hit your site. Cloudflare and AWS, for example, have specific AI bot filtering settings.
  3. Implement geo-blocking: If you are a local or regional publisher, consider blocking all traffic from outside your country. Many unscrupulous AI bots originate internationally. If international readership isn’t critical to your business model, this can be another effective tactic.
  4. Advanced blocking tactics: If you want to get really advanced, you can implement “AI honeypots” (invisible links that only bots follow, which trap them in a loop) or set behavioral rules. For example, if a site visitor requests your sitemap immediately upon arrival, or makes many requests in very quick succession, that is likely a bot, not a human.
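
For the robots.txt step, here is a minimal Python sketch of one way to generate the block rules and then confirm that Google and Bing are still allowed. The user-agent tokens in the list are my assumptions about what each crawler currently calls itself (the Wayback Machine token in particular), so check each vendor's documentation before relying on them. The function names and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; confirm each against the vendor's current docs.
AI_BOTS = [
    "GPTBot",           # OpenAI
    "ClaudeBot",        # Anthropic
    "Google-Extended",  # Google's AI control token (separate from Googlebot)
    "CCBot",            # Common Crawl
    "ia_archiver",      # assumption: token historically tied to the Wayback Machine
]
SEARCH_BOTS = ["Googlebot", "Bingbot"]


def build_robots_txt(blocked):
    """Return robots.txt text that disallows each listed bot and allows the rest."""
    groups = [f"User-agent: {bot}\nDisallow: /\n" for bot in blocked]
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)


def can_fetch(robots_txt, user_agent, url="https://example.com/article"):
    """Check whether the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_BOTS)

# Safety check: fail loudly if a search engine bot would be blocked.
for bot in SEARCH_BOTS:
    assert can_fetch(robots_txt, bot), f"{bot} would be blocked"

print(robots_txt)
```

Running the safety check every time you edit the list is the point of the sketch: it catches the mistake of blocking standard search crawlers before the file goes live. It does nothing about bots that ignore robots.txt, which is why the other layers matter.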
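
The behavioral rules in the advanced step can be sketched in a few lines of Python. This is an illustration, not a production filter: the class name, the thresholds, and the idea of keying visitors by IP address are all assumptions of mine, and in practice you would likely express the same rules in your WAF rather than in application code.

```python
from collections import defaultdict, deque

# Hypothetical thresholds; tune them against your own traffic logs.
MAX_REQUESTS = 10      # more than this many requests...
WINDOW_SECONDS = 1.0   # ...inside this window looks automated


class BotHeuristics:
    """Flags visitors whose request pattern looks automated."""

    def __init__(self):
        self._recent = defaultdict(deque)  # visitor id -> recent request times
        self._seen = set()                 # visitors with at least one request

    def looks_like_bot(self, visitor_id, path, timestamp):
        first_request = visitor_id not in self._seen
        self._seen.add(visitor_id)

        # Rule 1: the very first request is for the sitemap.
        if first_request and path.startswith("/sitemap"):
            return True

        # Rule 2: too many requests inside a sliding time window.
        window = self._recent[visitor_id]
        window.append(timestamp)
        while window and timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) > MAX_REQUESTS


detector = BotHeuristics()
print(detector.looks_like_bot("203.0.113.5", "/sitemap.xml", 0.0))
print(detector.looks_like_bot("198.51.100.7", "/news/local-story", 0.0))
```

One caution: legitimate search engine crawlers also request sitemaps, so a rule like the first one should exempt verified search bots before it blocks anything.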

The Bottom Line

For most publishers, blocking AI bots is the strategically advantageous choice right now. This may change in the future, but for the moment the visibility you gain is not worth the negative impact on your business.

Footnotes

  1. Similarweb, September 2025 Desktop and Mobile Visits
