Jordan Meyer and Mathew Dryhurst founded Spawning AI to create tools that help artists exert more control over how their works are used online. Their latest project, called Source.Plus, is intended to curate “non-infringing” media for AI model training.

The Source.Plus project’s first initiative is a dataset seeded with nearly 40 million public domain images and images under the Creative Commons’ CC0 license, which allows creators to waive nearly all legal interest in their works. Meyer claims that, although it is substantially smaller than some other generative AI training datasets, Source.Plus’ dataset is already “high-quality” enough to train a state-of-the-art image-generating model.

“With Source.Plus, we’re building a universal ‘opt-in’ platform,” Meyer said. “Our goal is to make it easy for rights holders to offer their media for use in generative AI training — on their own terms — and frictionless for developers to incorporate that media into their training workflows.”

Rights management

The debate around the ethics of training generative AI models, particularly art-generating models like Stable Diffusion and OpenAI’s DALL-E 3, continues unabated, and it has massive implications for artists however the dust settles.

Generative AI models “learn” to produce their outputs (e.g., photorealistic art) by training on a vast quantity of relevant data: images, in this case. Some developers of these models argue that fair use entitles them to scrape data from public sources, regardless of that data’s copyright status. Others have attempted to toe the line, compensating or at least crediting content owners for their contributions to training sets.

Meyer, Spawning’s CEO, believes that no one’s settled on a best approach — yet.

“AI training frequently defaults to using the easiest available data — which hasn’t always been the most fair or responsibly sourced,” he told TechCrunch in an interview. “Artists and rights holders have had little control over how their data is used for AI training, and developers have not had high-quality alternatives that make it easy to respect data rights.”

Source.Plus, available in limited beta, builds on Spawning’s existing tools for art provenance and usage rights management.

In 2022, Spawning created HaveIBeenTrained, a website that allows creators to opt out of the training datasets used by vendors who’ve partnered with Spawning, including Hugging Face and Stability AI. After raising $3 million in venture capital from investors including True Ventures and Seed Club Ventures, Spawning rolled out ai.txt, a way for websites to “set permissions” for AI, and Kudurru, a system to defend against data-scraping bots.

Source.Plus is Spawning’s first effort to build a media library — and curate that library in-house. The initial image dataset, PD/CC0, can be used for commercial or research applications, Meyer says.