Dataset Providers Alliance formed to advocate for ethical data sourcing in AI training

Seven leading dataset providers have come together to form the Dataset Providers Alliance (DPA), aiming to advocate for ethical data sourcing in the training of artificial intelligence (AI) systems. The alliance seeks to address concerns related to intellectual property rights and the rights of individuals depicted in datasets used for AI training. The group includes companies like Rightsify, vAIsual, Pixta, and Datarade.

The rise of generative AI technologies, which can replicate human creativity, has sparked controversy and copyright lawsuits against tech giants such as Google, Meta, and OpenAI. Many tech companies have trained their AI models on large amounts of content scraped from the internet, often without obtaining consent from content creators or rights owners. While these companies argue that such use is legal, they are also quietly seeking access to privately owned content to reduce their legal risk. This has given rise to a budding industry of companies that package and sell licensed data for AI systems.

In response to the growing demand for licensed data, the DPA has been established to set ethical standards for the industry. For instance, members of the alliance must not sell text data obtained through web crawling, or audio featuring people's voices without their explicit consent. The DPA also aims to promote legislation such as the NO FAKES Act, which would introduce penalties for creating unauthorized digital replicas of individuals' voices or likenesses. In addition, the alliance will advocate for greater transparency in training data, similar to the requirements set out in the European Union's AI Act and in the Generative AI Copyright Disclosure Act, introduced in the United States in April.

Alex Bestall, CEO of Rightsify and its licensing subsidiary GCX, who spearheaded the founding of the DPA, emphasized the importance of advocacy in the ongoing battles over AI and copyright. Bestall said that although positions on AI and copyright have been taken, many issues remain unresolved and will take time to settle. The DPA plans to release a white paper outlining its positions on these issues in July.

The formation of the Dataset Providers Alliance marks a significant step towards ensuring ethical data practices and intellectual property protection in AI training. By bringing together leading industry players, the alliance aims to establish standards that respect the rights of individuals and content owners. As AI continues to advance, it is crucial to balance technological innovation with ethical considerations so that AI systems are trained responsibly.


Written By

Jiri Bílek

In the vast realm of AI and U.N. directives, Jiri crafts tales that bridge tech divides. With every word, he champions a world where machines serve all, harmoniously.