Reddit is updating its robots.txt file to better control how its content is accessed and used, particularly by AI companies. The update aims to deter those companies from training their large language models on Reddit content without permission or acknowledgment of the source, and it follows instances of AI-powered search startups scraping content without respecting websites' robots.txt files.
While AI crawlers can simply ignore a robots.txt file, the update is intended to signal that companies will need to pay to use Reddit's data for AI training. Reddit will continue rate-limiting and blocking unknown bots and crawlers that don't abide by its Public Content Policy or have an agreement with the platform.
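Because robots.txt is a voluntary convention rather than an enforcement mechanism, compliance depends on the crawler choosing to honor it, which is why Reddit pairs the file with rate limiting and blocking. As a rough illustration, the sketch below uses Python's standard urllib.robotparser to show how a well-behaved crawler would consult Reddit's robots.txt before fetching a page; the user agent "ExampleBot" and the fetched path are hypothetical placeholders, not names from the article.

```python
import urllib.robotparser

# Load and parse the site's live robots.txt file.
parser = urllib.robotparser.RobotFileParser()
parser.set_url("https://www.reddit.com/robots.txt")
parser.read()

# can_fetch() returns False when the rules disallow this agent for this path.
# A compliant crawler checks this before every request; a non-compliant one
# simply skips the check, which is why robots.txt alone can't stop scraping.
allowed = parser.can_fetch("ExampleBot", "https://www.reddit.com/r/all/")
print("Crawling allowed for ExampleBot:", allowed)
```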
Reddit's new data access policy aims to regulate how its data is used by commercial entities and partners while protecting user privacy. The policy requires partners to uphold user privacy, respect users' content-deletion decisions, and refrain from using content for ad targeting, background checks, or surveillance. Additionally, it establishes a new subreddit where researchers can access Reddit data for non-commercial purposes.