Cloudflare has launched a new tool aimed at protecting its clients’ websites from unauthorized scraping by AI bots used by various companies to train large language models (LLMs). This tool, available for free to all Cloudflare users, allows website owners to easily block AI bots that may be harvesting content without permission.
What can Cloudflare’s AI bot block tool do?
Cloudflare’s AI bot tool is designed to protect websites from unauthorized scraping and to encourage ethical AI usage. Its main features are:
- Block AI bots: The tool allows website owners to easily block AI bots from accessing their websites. This includes bots used by AI companies to scrape content for training large language models (LLMs) without permission.
- Accessible to all: Available for free to all Cloudflare customers, including those on the free tier, the tool ensures that even smaller website operators can protect their content.
- Ease of use: Positioned as an “easy button” in the Cloudflare dashboard, the feature sits under the Security > Bots section, where users can toggle the AI Scrapers and Crawlers option to activate the block.
By blocking AI bots, Cloudflare helps protect the integrity and originality of content hosted on its clients’ websites. This is crucial for content creators who want to control how their content is used and ensure it isn’t misused for unauthorized purposes.
The tool also aligns with efforts to enforce transparency in AI usage, so that AI companies operate within ethical boundaries, respect website owners’ rights, and adhere to legal standards regarding data usage and copyright.
Cloudflare’s AI bot tool will be continuously updated, drawing on Cloudflare’s global network insights and machine learning models to detect and block new and evolving bot behaviors. It also provides mechanisms for reporting misbehaving AI bots, so customers can contribute to the ongoing improvement of bot detection and mitigation. The platform’s global network, which processes over 57 million requests per second on average, provides robust signals for identifying and mitigating bot-driven threats.
Overall, Cloudflare’s AI bot tool represents a proactive approach to safeguarding online content and promoting responsible artificial intelligence practices across its platform. By offering accessible and effective tools, Cloudflare supports a more secure Internet ecosystem for content creators and users alike.
Top AI bots by request volume
Cloudflare also tracks known AI bots. The most active ones are:
- Bytespider: Leads in both the number of requests and the extent of Internet property crawling. Operated by ByteDance, Bytespider gathers training data for large language models (LLMs) supporting various applications.
- GPTBot: Managed by OpenAI, GPTBot ranks prominently in crawling activity and is widely used for collecting training data for AI-driven products like ChatGPT.
- Amazonbot: Used for indexing content for Alexa’s question-answering capabilities, Amazonbot follows closely behind Bytespider in request volume.
- ClaudeBot: Operated by Anthropic to gather training data for the Claude chatbot, ClaudeBot has seen an increase in request volume, reflecting its role in AI model training.
Despite this level of activity, many website operators are unaware that AI crawlers are visiting their sites. In June, AI bots accessed around 39% of the top one million Internet properties using Cloudflare’s services.
Cloudflare’s data shows that only 2.98% of the top Internet properties actively block or challenge AI bot requests, highlighting a gap in proactive measures against unauthorized scraping.
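For operators who want a rough idea of whether these crawlers are visiting their own sites, the sketch below tallies requests by bot name from a list of User-Agent strings. It is only an illustration, not how Cloudflare detects bots: it assumes each bot announces itself with its name somewhere in the User-Agent header, and the function names and sample values are hypothetical.

```python
from collections import Counter

# Bot names taken from the list above. Matching them as case-insensitive
# substrings of the User-Agent header is an assumption of this sketch.
AI_BOT_TOKENS = ("Bytespider", "GPTBot", "Amazonbot", "ClaudeBot")


def identify_ai_bot(user_agent):
    """Return the first matching bot name, or None if no token is found."""
    lowered = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def count_ai_bot_requests(user_agents):
    """Tally requests per AI bot across an iterable of User-Agent strings."""
    counts = Counter()
    for user_agent in user_agents:
        bot = identify_ai_bot(user_agent)
        if bot is not None:
            counts[bot] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical User-Agent values, made up for illustration only.
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; Bytespider)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0",
        "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
    ]
    for bot, total in count_ai_bot_requests(sample).most_common():
        print(bot, total)
```

A check like this only sees bots that identify themselves honestly in the header. Crawlers that disguise their User-Agent would go uncounted, which is the gap that behavior-based detection is meant to cover.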
Featured image credit: Eray Eliaçık/Bing