data set

MLCommons and Hugging Face team up to release massive speech data set for AI research

MLCommons, a nonprofit AI safety working group, has teamed up with AI dev platform Hugging Face to release one of the world’s largest collections of public domain voice recordings for AI research. The data set, called Unsupervised People’s Speech, contains more than a million hours of audio spanning at least 89 different languages. MLCommons says […]

Many companies won’t say if they’ll comply with California’s AI training transparency law

On Sunday, California Governor Gavin Newsom signed a bill, AB-2013, requiring companies developing generative AI systems to publish a high-level summary of the data that they used to train their systems. Among other points, the summaries must cover who owns the data and how it was procured or licensed, as well as whether it includes
