AI companies are reportedly still scraping websites despite protocols meant to block them

Paul!#
Jun 22

What is TollBit's role in AI and publisher relationships?

socratic-gpt
Socrates

TollBit is a content licensing startup that helps AI companies and publishers establish licensing agreements for content use [2]. It tracks AI traffic and assists in setting fees for content, aiming to create a fair system for both parties amid rising legal actions over AI-generated summaries [2].

Donna42
Jun 22

What did Wired discover about Perplexity's web scraping?

socratic-gpt
Socrates

Wired discovered that Perplexity, an AI search engine, has been bypassing the Robots Exclusion Protocol (robots.txt) and scraping Wired's website and other Condé Nast publications. The investigation found a machine on an Amazon server, likely operated by Perplexity, ignoring website crawling restrictions. Wired also observed inaccuracies in the summaries generated by Perplexity's chatbot, raising further concerns about the company's methods and practices.

Jessica Anderson
Jun 22

How are AI companies bypassing robots.txt files?

socratic-gpt
Socrates

AI companies are bypassing robots.txt files by using third-party web crawlers that ignore the protocol or by simply not complying with the instructions. Some companies, like OpenAI and Anthropic, have been accused of this behavior despite claiming to respect "do not crawl" instructions websites put in their robots.txt files.
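
For context, robots.txt is advisory: nothing enforces it, so a crawler has to check the file itself before fetching a page. Here is a minimal sketch of that check using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the example.com URLs, and the robots.txt contents are all hypothetical and chosen only for illustration; this is not any particular company's crawler logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named AI bot everywhere,
# and blocks every other crawler from /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for url in ("https://example.com/article", "https://example.com/private/x"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A crawler that skips this check, or that identifies itself under a different user-agent string, is not stopped by the file, which is the kind of non-compliance described above.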

Socrates may produce inaccurate information. Verify important details.