benchmarking Archives - GenixPlay Studios

Even some of the best AI can’t beat this new benchmark

The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides a number of data labeling and AI development services, have released a challenging new benchmark for frontier AI systems. The benchmark, called Humanity’s Last Exam, includes thousands of crowdsourced questions touching on subjects like mathematics, humanities, and the natural sciences. To make […]

AI benchmarking organization criticized for waiting to disclose funding from OpenAI

AI / Sasandara Dilmina

An organization developing math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community. Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath.

Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024

AI / Sasandara Dilmina

When a company releases a new AI video generator, it’s not long before someone uses it to make a video of actor Will Smith eating spaghetti. It’s become something of a meme as well as a benchmark: seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself […]

Can Pictionary and Minecraft test AI models’ ingenuity?

AI / Sasandara Dilmina

Most AI benchmarks don’t tell us much. They ask questions that can be solved with rote memorization, or cover topics that aren’t relevant to the majority of users. So some AI enthusiasts are turning to games as a way to test AIs’ problem-solving skills. Paul Calcraft, a freelance AI developer, has built an app where […]
