AI benchmarks

The rise of AI ‘reasoning’ models is making benchmarking more expensive

AI labs like OpenAI claim that their so-called “reasoning” AI models, which can “think” through problems step by step, are more capable than their non-reasoning counterparts in specific domains, such as physics. But while this generally appears to be the case, reasoning models are also much more expensive to benchmark, making it difficult to independently […]

People are benchmarking AI by having it make balls bounce in rotating shapes

The list of informal, weird AI benchmarks keeps growing. Over the past few days, some in the AI community on X have become obsessed with a test of how different AI models, particularly so-called reasoning models, handle prompts like this: “Write a Python script for a bouncing yellow ball within a shape. Make the shape […]
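To give a sense of what these prompts ask for, here is a minimal, headless sketch of the physics such a script needs. It assumes a circular container rather than the rotating polygons used in the viral prompts (a circle keeps the collision math short), and all names and parameters are illustrative, not taken from any particular model's output:

```python
import math

def simulate(steps=2000, dt=0.01, radius=1.0):
    """Simulate a ball bouncing inside a circle of the given radius.

    Returns the list of (x, y) positions, one per time step.
    """
    x, y = 0.0, 0.5          # initial position (inside the circle)
    vx, vy = 0.8, 0.0        # initial velocity
    g = -9.8                 # gravity (arbitrary units)
    positions = []
    for _ in range(steps):
        vy += g * dt
        x += vx * dt
        y += vy * dt
        dist = math.hypot(x, y)
        if dist >= radius:
            # Hit the wall: reflect the velocity about the surface normal,
            # then push the ball back onto the boundary.
            nx, ny = x / dist, y / dist
            dot = vx * nx + vy * ny
            vx -= 2 * dot * nx
            vy -= 2 * dot * ny
            x, y = nx * radius, ny * radius
        positions.append((x, y))
    return positions
```

The graphical versions models are asked to produce wrap this kind of update loop in a rendering library and add the rotation of the container, which is where many models reportedly go wrong.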

AI benchmarking organization criticized for waiting to disclose funding from OpenAI

An organization developing math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community. Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath.