Cloudflare Accuses Perplexity of Bypassing Anti-Scraping Rules

Web infrastructure and security company Cloudflare has accused AI startup Perplexity of scraping online content despite being blocked from doing so.

In September last year, Cloudflare announced it would automatically block AI crawlers and agents unless website owners explicitly opt in to having their content used.

A month ago, the offering was expanded to include a ‘pay-per-crawl’ feature that lets publishers get paid when AI crawlers and agents access their content.

AI companies' crawlers and agents collect content from websites to train models or to surface information in chatbot answers.

In a blog post, Cloudflare said it had observed Perplexity circumventing these protections, even on websites that had opted out.

Bypassing Publisher Protections

Cloudflare used machine learning and network signals to track the activity of AI crawlers and agents.

The company said that Perplexity ignored robots.txt files, which tell crawlers which parts of a website they may access.
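
To illustrate how robots.txt rules are read, the sketch below uses Python's standard-library urllib.robotparser module to check whether a given crawler may fetch a page. The robots.txt contents, the example.com URL and the choice of "PerplexityBot" as the blocked user agent are assumptions for illustration only, not any publisher's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines instead of fetching it over HTTP.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("PerplexityBot", "https://example.com/article"))  # False
    print(is_allowed("SomeOtherBot", "https://example.com/article"))   # True
```

Robots.txt is advisory: a compliant crawler runs a check like this before each request, but nothing in the file enforces it, and rules are matched against the name the crawler reports about itself. A client sending a generic browser user-agent string would fall through to the "User-agent: *" group in this example, which is one reason operators such as Cloudflare also look at network-level signals.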

Cloudflare also claims that Perplexity disguised its identity by switching user-agent strings (the text a client sends to identify its software, such as the browser and operating system) and by changing autonomous system numbers (ASNs), which identify the network that traffic originates from.

According to Cloudflare, when Perplexity's declared crawler was blocked, the startup presented itself as Google Chrome, which Cloudflare regards as a clear attempt to evade detection.

As a result, Cloudflare has removed Perplexity’s crawlers from its list of verified bots and has had to develop new ways of blocking the company.

Perplexity has denied bypassing website protections, calling the claims a ‘sales pitch’.

In response, Cloudflare reiterated that it had confirmed Perplexity’s activity through its own testing.

AI models' use of publisher content threatens both the ad-funded online economy and the fair treatment of content creators, from copyright protection to adequate compensation for their work.

Cloudflare’s offering was positioned as a potential solution to these issues; however, it may fall short if AI firms can bypass restrictions and refuse to follow the rules.
