On Thursday, Reddit is rolling out a new policy aimed at balancing its desire to license its content to larger tech companies, like Google, with protecting users’ privacy. The newly announced “Public Content Policy” joins Reddit’s existing privacy policy and content policy to guide how Reddit’s data is accessed and used by commercial entities and other partners. The company also announced a related subreddit dedicated to researchers working with Reddit’s data.
The announcement comes shortly after Reddit’s stock market debut, which sees the company positioning itself to grow revenue not only from the ads that run on its platform and API usage by developers but also from its corpus of data. The company in its IPO prospectus said it had already made $203 million through data licensing agreements and expects that number to increase over time.
While Reddit hadn’t historically blocked access to its data for AI training purposes, it changed course last year. Reddit CEO Steve Huffman told The New York Times that it didn’t make sense for Reddit to continue to give “all of that value to some of the largest companies in the world for free,” signaling the company’s plan to move into the data licensing space.
With those efforts now well underway, the new Public Content Policy will further lock down access to Reddit’s data without an agreement.
“Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content,” Reddit writes in its blog. “Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests. While we will continue our efforts to block known bad actors, we need to do more to restrict access to Reddit public content at scale to trusted actors who have agreed to abide by our policies. But we also need to continue to ensure that users, mods, researchers, and other good-faith, non-commercial actors have access.”
In other words, access to Reddit data for research and other non-commercial efforts will continue, but entities that want to use Reddit’s data for other purposes, including AI training, will have to pay. In a graphic shared on the blog, Reddit makes this clear, saying that businesses interested in using Reddit data to “power, augment or enhance your product for any commercial purposes” require a contract.
Advertisers, meanwhile, are directed to an ads API for managing campaigns and tracking their performance.
Because the company is essentially just a large website, indexable by search engines, this new policy aims to lock down Reddit content from any unauthorized collection while also respecting users’ rights.
For instance, Reddit says that its partners will have to uphold users’ decisions to delete their content. So if users don’t want their personal posts to become fodder for future AI engines, they should be able to opt out. Partners are also restricted by the new policy from using Reddit’s content to identify individuals or their personal information, including for ad targeting. Partners also can’t use Reddit content to spam or harass its users or to conduct “background checks, facial recognition, government surveillance, or help law enforcement do any of the above.”
The policy additionally restricts access to adult media and clarifies that Reddit won’t sell its users’ personal information. The company also notes that it will never license non-public content, such as private messages, or non-public account information, such as users’ emails or browsing history, among other things.
To help researchers who want to use Reddit data for non-commercial purposes, the company has established a new subreddit, r/reddit4researchers. The company says it’s partnering with OpenMined to also develop a program to guide and grow researchers’ collaboration with Reddit.