With the proliferation of AI tools in recent months, many fans have voiced concerns regarding data scraping and AI-generated works, and how these developments can affect AO3. We share your concerns, and we’d like to explain what we’ve been doing to combat data scraping and what our current policies on AI are.
Data scraping and AO3 fanworks
We’ve put in place certain technical measures to hinder large-scale data scraping on AO3, such as rate limiting, and we’re constantly monitoring our traffic for signs of abusive data collection. We do not make exceptions to these measures for researchers or those wishing to create datasets. However, we don’t have a policy against responsible data collection, such as that done by academic researchers, by fans backing up works to the Wayback Machine, or by Google’s search indexing. Putting systems in place that attempt to block all scraping would be difficult or impossible without also blocking legitimate uses of the site.
With that said, it is an unfortunate reality that anything that is publicly available online can be used for reasons other than its initial intended purposes. In many cases, AI data collection traffic relies on the same techniques as the legitimate use cases above.
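For readers unfamiliar with rate limiting, one of the technical measures mentioned above, here is a minimal sketch of the general idea. It is an illustration only and does not describe how AO3 actually implements its limits: the class name, the per-client identifier, and the numbers used are all assumptions made for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span.

    Hypothetical illustration; not AO3's actual implementation.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently allowed requests, keyed by client identifier.
        self._history = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Return True if the request may proceed, False if it should be throttled."""
        recent = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


# Example: 3 requests per 60 seconds (made-up numbers).
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60.0)
results = [limiter.allow("client-a", t) for t in (0, 1, 2, 3, 61)]
print(results)  # [True, True, True, False, True]
```

A limit like this slows down a client that requests many pages in quick succession, while an ordinary reader is unlikely to notice it. It also shows why such measures hinder large-scale scraping rather than prevent it: a scraper that stays under the limit looks the same as any other visitor.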
Once we became aware that data from AO3 was being included in the Common Crawl dataset, which is used to train AI models such as ChatGPT, we put code in place in December 2022 requesting that Common Crawl not scrape the Archive again.
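This statement does not specify what form that request took. A common mechanism for this kind of request is a robots.txt rule addressed to the crawler's user-agent token (Common Crawl documents its crawler as CCBot). The sketch below assumes that mechanism and uses a made-up robots.txt and the placeholder domain example.org, not AO3's actual file, to show how a compliant crawler would interpret such a rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; not AO3's actual file.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /search
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# CCBot is asked to stay away from the whole site.
print(is_allowed("CCBot", "https://example.org/works/123"))
# Other crawlers are only asked to avoid /search in this made-up file.
print(is_allowed("SomeOtherBot", "https://example.org/works/123"))
```

A rule like this is a request rather than a technical barrier: it only works for crawlers that choose to read and honor robots.txt, and it has no effect on data that was collected before the rule was added.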
We cannot go back in time to stop data collection that already occurred, or remove AO3’s content from existing datasets, as much as we may dislike that it happened. All we can do is attempt to reduce such collection in the future. The Archive’s development team will continue to be on the lookout for individual scrapers collecting AO3 data, and to take action as needed.
Likewise, our Legal committee has served and will continue to serve the OTW mission of protecting fanworks from legal challenge and commercial exploitation. This includes their position that users should be allowed to opt out of having their works incorporated into AI training sets, a position that they have presented to the U.S. Copyright Office. They, too, will continue to keep pace with this developing field.
What can I do to avoid data scraping?
You may want to restrict your work to Archive users only. While this will not block every potential scraper, it should provide some protection against large-scale scraping.
AI-generated works and AO3 policies
At the moment, there is nothing in our Terms of Service that prohibits fanworks that are fully or partly generated with AI tools from being posted to AO3, if they otherwise qualify as fanworks.
Our goals as an organization include maximum inclusivity of fanworks. This means not only the best fanworks, or the most popular fanworks, but all the fanworks that we can preserve. If fans are using AI to generate fanworks, then our current position is that this is also a type of work that is within our mandate to preserve.
Depending on the circumstances, AI-generated works could violate our anti-spam policies (e.g. if a creator posts a significant number of them in a short time). If you’re uncertain whether a work violates our Terms of Service, you can always report it to our Policy & Abuse team using the link at the bottom of any page, and they can investigate.
This statement reflects AO3’s policy at the time of writing. We want to be transparent with our users about our current stance and about what can be done, and is being done, to mitigate scraping for AI datasets. However, these policies are also under discussion internally among AO3 volunteers. If we agree on changes in the future, we will announce them publicly. Any proposed changes to the AO3 Terms of Service will also be made available for public comment, as is required for all changes to our Terms of Service.
We hope that this helps to make things clearer. This is a complicated situation, and we’re doing our very best to address it in a way that doesn’t compromise AO3’s principles of maximum fanwork inclusivity or legitimate uses of the site. As discussions and approaches evolve, we will keep our users updated.