The Thief in the Public Square

Automattic, the owner of both Tumblr and WordPress, admitted that they’re working with AI companies to sell their users’ creations to help train AI models. This site has been on WordPress for a long time, and has moved WordPress installations between hosts at least twice.

This is so hard to take. So much of the web trusts WordPress with its work, and Tumblr users trusted it with their communities and art.

I’ve been thinking about how AI companies using content available on the web, regardless of its license, to train their generative models is different from search engines indexing our content in order to power their search products (and make money by selling ads around those results).

The difference is that search engines are directories, or maps, that take public data and use it to route people back to the source of that data. It’s a symbiotic relationship, where the publisher of that content eventually gets a potential reader/viewer/patron/customer pointed in their direction based on a query that person put into the search engine. The original source of that content is still the destination.

Generative AI doesn’t do that. It gives no credit to its “inspirations” and no creator ever gets a new potential patron. Why? Because the original source is now just a signal used to create a mediocre knockoff based on it and millions of other works, all created by people.

These companies are thieves in the public square, taking the property that others have created, giving them no credit, no way to make a new fan of their work, and producing knockoffs, polluting the world with… uninspired bullshit.

Can the entire world file a class action copyright lawsuit against these companies? How else do we tell them to make their models opt-in instead of opt-out, and make it possible to remove our content from their bottomless pits of copypasta?

In other news, I need to migrate my blog off of WordPress, and I really don’t want to.