LAION-5B, a widely used artificial intelligence (AI) dataset for training text-to-image generators, has been taken down by its creator, the German nonprofit organization LAION, after it was found to contain suspected child sexual abuse material. A report from researchers at the Stanford Internet Observatory's Cyber Policy Center identified 3,226 suspected instances of such material in the dataset, which was released in March 2022 and contains 5.85 billion image-text pairs. While so small a fraction of the data may not drastically influence the output of models trained on it, the researchers caution that its presence could still affect what those models generate. LAION says it has a zero-tolerance policy for illegal content and has removed its datasets out of caution, intending to republish them once they are verified to be safe.