{"id":80,"date":"2024-06-13T07:19:39","date_gmt":"2024-06-13T07:19:39","guid":{"rendered":"https:\/\/duosols.com\/?p=80"},"modified":"2024-06-13T07:19:39","modified_gmt":"2024-06-13T07:19:39","slug":"spawning-wants-to-build-more-ethical-ai-training-datasets","status":"publish","type":"post","link":"https:\/\/duosols.com\/spawning-wants-to-build-more-ethical-ai-training-datasets\/","title":{"rendered":"Spawning wants to build more ethical AI training datasets"},"content":{"rendered":"<p>Jordan Meyer and Mathew Dryhurst founded Spawning AI to create tools that help artists exert more control over how their works are used online. Their latest project, called Source.Plus, is intended to curate \u201cnon-infringing\u201d media for AI model training.<\/p>\n<p>The Source.Plus project\u2019s first initiative is a dataset seeded with nearly 40 million public domain images and images under the Creative Commons\u2019 CC0 license, which allows creators to waive nearly all legal interest in their works. Meyer claims that, despite the fact that it\u2019s substantially smaller than some other generative AI training data sets out there, Source.Plus\u2019 data set is already \u201chigh-quality\u201d enough to train a state-of-the-art image-generating model.<\/p>\n<p>\u201cWith Source.Plus, we\u2019re building a universal \u2018opt-in\u2019 platform,\u201d Meyer said. 
\u201cOur goal is to make it easy for rights holders to offer their media for use in generative AI training \u2014 on their own terms \u2014 and frictionless for developers to incorporate that media into their training workflows.\u201d<\/p>\n<h2>Rights management<\/h2>\n<p>The debate around the ethics of training generative AI models, particularly art-generating models like Stable Diffusion and OpenAI\u2019s DALL-E 3, continues unabated \u2014 and has massive implications for artists however the dust ends up settling.<\/p>\n<p>Generative AI models \u201clearn\u201d to produce their outputs (e.g., photorealistic art) by training on a vast quantity of relevant data \u2014 images, in that case. Some developers of these models argue that fair use entitles them to scrape data from public sources, regardless of that data\u2019s copyright status. Others have attempted to toe the line, compensating or at least crediting content owners for their contributions to training sets.<\/p>\n<p>Meyer, Spawning\u2019s CEO, believes that no one\u2019s settled on a best approach \u2014 yet.<\/p>\n<p>\u201cAI training frequently defaults to using the easiest available data \u2014 which hasn\u2019t always been the most fair or responsibly sourced,\u201d he told TechCrunch in an interview. \u201cArtists and rights holders have had little control over how their data is used for AI training, and developers have not had high-quality alternatives that make it easy to respect data rights.\u201d<\/p>\n<p>Source.Plus, available in limited beta, builds on Spawning\u2019s existing tools for art provenance and usage rights management.<\/p>\n<p>In 2022, Spawning created HaveIBeenTrained, a website that allows creators to opt out of the training datasets used by vendors who\u2019ve partnered with Spawning, including Hugging Face and Stability AI. 
After raising $3 million in venture capital from investors, including True Ventures and Seed Club Ventures, Spawning rolled out ai.txt, a way for websites to \u201cset permissions\u201d for AI, and a system \u2014 Kudurru \u2014 to defend against data-scraping bots.<\/p>\n<p>Source.Plus is Spawning\u2019s first effort to build a media library \u2014 and curate that library in-house. The initial image dataset, PD\/CC0, can be used for commercial or research applications, Meyer says.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Jordan Meyer and Mathew Dryhurst founded Spawning AI to create tools that help artists exert more control over how their works are used online. Their latest project, called Source.Plus, is intended to curate \u201cnon-infringing\u201d media for AI model training. The Source.Plus project\u2019s first initiative is a dataset seeded with nearly 40 million public domain images [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":81,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","rank_math_lock_modified_date":false,"inline_featured_image":false,"footnotes":""},"categories":[5],"tags":[],"class_list":["post-80","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-startups"],"_links":{"self":[{"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/posts\/80","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/comments?post=80"}],"version-history":[{"count":1,"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/posts\/80\/revisions"}],"predecessor-version":[{"id":82,"href":"https:\/\/duo
sols.com\/wp-json\/wp\/v2\/posts\/80\/revisions\/82"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/media\/81"}],"wp:attachment":[{"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/media?parent=80"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/categories?post=80"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/duosols.com\/wp-json\/wp\/v2\/tags?post=80"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}