Reddit blocks Wayback Machine over concerns about data being mined by AI
Reddit will restrict Wayback Machine access to most content to prevent AI from mining data, allowing only the homepage and popular headlines to be saved.
Quick summary:
Reddit limits the Wayback Machine to only saving the homepage and popular headlines.
Reason: concerns about AI companies mining data in violation of policy.
Reddit previously restricted API access and required search engines to pay for data.
Reddit confirmed that it had discovered several AI companies scraping Reddit data from the Internet Archive's Wayback Machine, in violation of its platform policies. As a result, the social network will limit the Wayback Machine's access, allowing it to archive only Reddit.com's homepage and a list of popular headlines, rather than entire posts, comments, or user profiles as before.

Spokesperson Tim Rathschmidt said Reddit requires the Internet Archive to comply with its privacy policies and take down removed content before full access is restored.
Reddit said the restrictions will be rolled out gradually starting today. The company contacted the Internet Archive in advance to inform it of the decision, and has previously raised concerns about content being scraped from the Wayback Machine.
This isn’t the first time Reddit has blocked data scrapers. In 2023, Reddit changed its API policy, reportedly because the API was being used to train AI. The change forced several third-party apps to shut down when they couldn’t afford to pay for data access.
Last year, Reddit signed a deal to provide data to Google for search and AI training, and began blocking other major search engines if they didn’t pay. The company also reached a deal with OpenAI, but sued Anthropic in June 2025 for allegedly continuing to scrape data despite saying it would stop.
Mark Graham, director of the Wayback Machine, said the Internet Archive has a longstanding relationship with Reddit and is still discussing the issue.