Benchmark

A test for AGI is closer to being solved — but it may be flawed

A well-known test for artificial general intelligence (AGI) is closer to being solved. But the test’s creators say this points to flaws in the test’s design, rather than a bona fide research breakthrough. In 2019, Francois Chollet, a leading figure in the AI world, introduced the ARC-AGI benchmark, short for “Abstraction and Reasoning Corpus for Artificial […]


The AI industry is obsessed with Chatbot Arena, but it might not be the best benchmark

Over the past few months, tech execs like Elon Musk have touted the performance of their companies’ AI models on a particular benchmark: Chatbot Arena. Maintained by a nonprofit known as LMSYS, Chatbot Arena has become something of an industry obsession. Posts about updates to its model leaderboards garner hundreds of views and reshares across Reddit and […]
