
A test for AGI is closer to being solved — but it may be flawed

A well-known test for artificial general intelligence (AGI) is closer to being solved. But the test’s creators say this points to flaws in the test’s design rather than a bona fide research breakthrough. In 2019, Francois Chollet, a leading figure in the AI world, introduced the ARC-AGI benchmark, short for “Abstraction and Reasoning Corpus for Artificial General Intelligence.”


AI models get more election questions wrong when asked in Spanish, study shows

AI models are struggling to accurately answer election-related questions in Spanish. That’s according to a new study from the AI Democracy Projects, a collaboration between Proof News, fact-checking service Factchequeado and the Institute for Advanced Study in San Francisco. The study found a sharp disparity between the factuality of English- and Spanish-language responses produced by five AI models.


Study suggests that even the best AI models hallucinate a bunch

All generative AI models hallucinate, from Google’s Gemini to Anthropic’s Claude to the latest stealth release of OpenAI’s GPT-4o. The models are, in other words, unreliable narrators — sometimes to hilarious effect, other times problematically so. But not all models make things up at the same rate, and the kinds of mistruths they spout vary from model to model.


Many safety evaluations for AI models have significant limitations

Despite increasing demand for AI safety and accountability, today’s tests and benchmarks may fall short, according to a new report. Generative AI models — models that can analyze and output text, images, music, videos and so on — are coming under increased scrutiny for their tendency to make mistakes and generally behave unpredictably. Now, organizations are proposing new benchmarks to test these models’ safety.


Making AI models ‘forget’ undesirable data hurts their performance

So-called “unlearning” techniques are used to make a generative AI model forget specific and undesirable information it picked up from training data, like sensitive private data or copyrighted material. But current unlearning techniques are a double-edged sword: They could make a model like OpenAI’s GPT-4o or Meta’s Llama 3.1 405B much less capable of answering basic questions.


Study finds that AI models hold opposing views on controversial topics

Not all generative AI models are created equal, particularly when it comes to how they treat polarizing subject matter. In a recent study presented at the 2024 ACM Fairness, Accountability and Transparency (FAccT) conference, researchers at Carnegie Mellon, the University of Amsterdam and AI startup Hugging Face tested several open text-analyzing models, including Meta’s Llama.
