Child abuse and exploitation are horrific crimes that no child should ever have to endure. Unfortunately, advances in technology have enabled abusers to exploit artificial intelligence (AI) to create disturbing and illegal content known as “deepfakes”: realistic AI-generated images and videos that depict child abuse, which offenders then use to manipulate and blackmail their victims.
In the UK, the creation of simulated child abuse imagery is already illegal, and both the Labour and Conservative parties agree that all explicit AI-generated images of real people should be banned. Globally, however, there is no consensus on how to regulate and police deepfake technology. The deeper challenge is that the capacity to create such explicit images is baked into the foundations of AI image generation itself.
To illustrate the severity of the issue, researchers at Stanford University made a disturbing discovery. Buried within one of the largest training sets for AI image generators, known as Laion (Large-scale Artificial Intelligence Open Network), they found hundreds, possibly thousands, of instances of child sexual abuse material (CSAM). The dataset, Laion-5B, contains references to approximately 5 billion images, far too many for researchers to examine manually in any reasonable amount of time. Instead, they built an automated system to scan the database and flag questionable images for further review by law enforcement.
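Reports on the Stanford work describe the flagging step as hash matching: computing a fingerprint of each image and comparing it against lists of known abuse material maintained by child-safety organisations. The sketch below illustrates that general technique using the open-source imagehash library; the blocklist file, directory name and distance threshold are assumptions for illustration, not the researchers' actual tooling.

```python
# A minimal sketch of hash-based flagging, assuming a blocklist of known-bad
# perceptual hashes supplied by a child-safety organisation (hypothetical file
# "known_hashes.txt"). Not the Stanford team's actual pipeline.
from pathlib import Path

import imagehash  # pip install imagehash
from PIL import Image

# Load the blocklist of known-bad perceptual hashes (one hex string per line).
KNOWN_HASHES = {
    imagehash.hex_to_hash(line.strip())
    for line in Path("known_hashes.txt").read_text().splitlines()
    if line.strip()
}

MAX_DISTANCE = 4  # how many differing bits still count as a near-match


def flag_image(path: Path) -> bool:
    """Return True if the image's perceptual hash is close to a known-bad hash."""
    candidate = imagehash.phash(Image.open(path))
    return any(candidate - known <= MAX_DISTANCE for known in KNOWN_HASHES)


# Scan a directory of downloaded images and record anything needing expert review.
flagged = [str(p) for p in Path("downloads").glob("*.jpg") if flag_image(p)]
print(f"{len(flagged)} images flagged for review by trained investigators")
```

The point of the approach is that no human has to look at the images to identify likely matches; only the small flagged subset is passed on to people qualified to handle it.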
The creators of Laion responded quickly to the discovery by withdrawing the dataset from download. They clarified that the dataset consists of URLs linking to images hosted elsewhere on the internet, and that they had never distributed the illegal images themselves. Nevertheless, the damage was already done. AI models trained on Laion-5B have been widely used worldwide, incorporating the illicit training data into their neural networks. As a result, these AI image generators can produce explicit content involving both adults and children, simply because they were exposed to it during training.
Laion is not the only dataset that may contain illegal content. It was created as an “open source” project, freely available for anyone to use for AI research. While open source datasets promote innovation and collaboration, they also carry risks: Stable Diffusion, one of the breakthrough image generators of 2022, was trained on Laion. Companies such as OpenAI take a different approach. They provide only a “model card” for their Dall-E 3 system, which states that the images used for training were sourced from publicly available and licensed content. OpenAI says it has made efforts to filter explicit content from its data, but whether it can fully guarantee a clean dataset remains uncertain.
One advantage of closed-source models like Dall-E 3 is that they offer greater control over the generated content. Users cannot download the model and run it on their own hardware; every request must pass through the company’s systems, which apply additional filtering to weed out explicit content. This approach, also employed by companies such as Google, provides an added layer of protection. AI safety experts argue that these server-side checks are more reliable than depending solely on the model itself having been trained to refuse explicit requests.
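The pattern is straightforward: because the model sits behind the company’s servers, both the incoming prompt and the generated image can be screened before anything reaches the user. The sketch below shows that request flow in outline; the classifier and generator functions are hypothetical placeholders, not any vendor’s real API.

```python
# A minimal sketch of the hosted-model pattern described above: every request
# passes through server-side checks before and after generation. The classifier
# and generator calls are hypothetical stand-ins, not any vendor's real API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationResult:
    image_bytes: Optional[bytes]
    refused: bool
    reason: str = ""


def prompt_is_disallowed(prompt: str) -> bool:
    """Placeholder for a text classifier that screens incoming prompts."""
    banned_terms = {"example-banned-term"}  # stand-in for a real policy model
    return any(term in prompt.lower() for term in banned_terms)


def image_is_disallowed(image_bytes: bytes) -> bool:
    """Placeholder for an image classifier that screens generated outputs."""
    return False  # a real service would run a safety model here


def generate_image(prompt: str) -> bytes:
    """Placeholder for the actual image model, which the user never touches."""
    return b"..."


def handle_request(prompt: str) -> GenerationResult:
    # Filter the request before it ever reaches the model.
    if prompt_is_disallowed(prompt):
        return GenerationResult(None, refused=True, reason="prompt rejected")

    image = generate_image(prompt)

    # Filter the output before it is returned to the user.
    if image_is_disallowed(image):
        return GenerationResult(None, refused=True, reason="output rejected")

    return GenerationResult(image, refused=False)
```

Because both checkpoints live on the provider’s infrastructure, they cannot be stripped out the way a safety filter bundled with a downloadable model can.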
Finding a balance between open-source development and tackling explicit AI-generated content is crucial. Open-source AI development fosters innovation and may hold the key to building effective tools against future harms. In the short term, however, proposed regulations mainly target purpose-built tools for creating explicit deepfakes, focusing enforcement on the people who create and host them.
The fight against explicit AI images raises broader questions about the limitations of a technology that is not fully understood. While banning purpose-built tools is a step in the right direction, tackling the deeper challenges of deepfake regulation requires a multifaceted approach. It involves addressing technical limitations, implementing stricter guidelines for dataset curation, and continuously adapting to the evolving landscape of AI technology.
Protecting children from exploitation is a collective responsibility, and it requires the collaboration of governments, tech companies, researchers, and society as a whole. By combining efforts, implementing robust regulations, and promoting responsible AI development, we can work towards a future where children are safe from the horrors of deepfake exploitation.