This Week in AI: It’s shockingly easy to make a Kamala Harris deepfake



Hiya, folks, welcome to TechCrunch’s regular AI newsletter. If you want this in your inbox every Wednesday, sign up here.

It was shockingly easy to create a convincing Kamala Harris audio deepfake on Election Day. It cost me $5 and took less than two minutes, illustrating how cheap, ubiquitous generative AI has opened the floodgates to disinformation.

Creating a Harris deepfake wasn’t my original intent. I was playing around with Cartesia’s Voice Changer, a model that transforms your voice into a different voice while preserving the original’s prosody. That second voice can be a “clone” of another person’s — Cartesia will create a digital voice double from any 10-second recording.

So, I wondered, would Voice Changer transform my voice into Harris’? I paid $5 to unlock Cartesia’s voice cloning feature, created a clone of Harris’ voice using recent campaign speeches, and selected that clone as the output in Voice Changer.

It worked like a charm.

I’m confident that Cartesia didn’t exactly intend for its tools to be used in this way. To enable voice cloning, Cartesia requires that you check a box indicating that you won’t generate anything harmful or illegal and that you consent to your speech recordings being cloned.

But that’s just an honor system. Absent any real safeguards, there’s nothing preventing a person from creating as many “harmful or illegal” deepfakes as they wish.

That’s a problem, it goes without saying. So what’s the solution? Is there one? Cartesia can implement voice verification, as some other platforms have done. But by the time it does, chances are a new, unfettered voice cloning tool will have emerged.

I spoke about this very issue with experts at TC’s Disrupt conference last week. Some were supportive of the idea of invisible watermarks so that it’s easier to tell whether content has been AI-generated. Others pointed to content moderation laws such as the Online Safety Act in the U.K., which they argued might help stem the tide of disinformation.

Call me a pessimist, but I think those ships have sailed. We’re looking at, as Imran Ahmed, CEO of the Center for Countering Digital Hate, put it, a “perpetual bulls— machine.”

Disinformation is spreading at an alarming rate. Some high-profile examples from the past year include a bot network on X targeting U.S. federal elections and a voicemail deepfake of President Joe Biden discouraging New Hampshire residents from voting. But U.S. voters and tech-savvy people aren’t the targets of most of this content, according to TrueMedia.org’s analysis, so we tend to underestimate its presence elsewhere.

The volume of AI-generated deepfakes grew 900% between 2019 and 2020, according to data from the World Economic Forum.

Meanwhile, there are relatively few deepfake-targeting laws on the books. And deepfake detection is poised to become a never-ending arms race. Some tools will inevitably forgo safety measures such as watermarking, or will be deployed with expressly malicious applications in mind.

Short of a sea change, I think the best we can do is be intensely skeptical of what’s out there — particularly viral content. It’s not as easy as it once was to tell truth from fiction online. But we’re still in control of what we share versus what we don’t. And that’s much more impactful than it might seem.

News

ChatGPT Search review: My colleague Max took OpenAI’s new search integration for ChatGPT, ChatGPT Search, for a spin. He found it to be impressive in some ways, but unreliable for short queries containing just a few words.

Amazon drones in Phoenix: A few months after ending its drone-based delivery program, Prime Air, in California, Amazon says that it’s begun making deliveries to select customers via drone in Phoenix, Arizona.

Ex-Meta AR lead joins OpenAI: The former head of Meta’s AR glasses efforts, including Orion, announced on Monday she’s joining OpenAI to lead robotics and consumer hardware. The news comes after OpenAI hired the co-founder of X (formerly Twitter) challenger Pebble.

Held back by compute: In a Reddit AMA, OpenAI CEO Sam Altman admitted that a lack of compute capacity is one major factor preventing the company from shipping products as often as it’d like.

AI-generated recaps: Amazon has launched “X-Ray Recaps,” a generative AI-powered feature that creates concise summaries of entire TV seasons, individual episodes, and even parts of episodes.

Anthropic hikes Haiku prices: Anthropic’s newest AI model has arrived: Claude 3.5 Haiku. But it’s pricier than the last generation, and unlike Anthropic’s other models, it can’t analyze images, graphs, or diagrams just yet.

Apple acquires Pixelmator: AI-powered image editor Pixelmator announced on Friday that it’s being acquired by Apple. The deal comes as Apple has grown more aggressive about integrating AI into its imaging apps.

An ‘agentic’ Alexa: Amazon CEO Andy Jassy last week hinted at an improved “agentic” version of the company’s Alexa assistant — one that could take actions on a user’s behalf. The revamped Alexa has reportedly faced delays and technical setbacks, and might not launch until sometime in 2025.

Research paper of the week

Pop-ups on the web can fool AI, too — not just grandparents.

In a new paper, researchers from Georgia Tech, the University of Hong Kong, and Stanford show that AI “agents” — AI models that can complete tasks — can be hijacked by “adversarial pop-ups” that instruct the models to do things like download malicious file extensions.

Adversarial AI pop-ups
Image Credits: Zhang et al.

Some of these pop-ups are quite obviously traps to the human eye — but AI isn’t as discerning. The researchers say that the image- and text-analyzing models they tested failed to ignore pop-ups 86% of the time, and — as a result — were 47% less likely to complete tasks.

Basic defenses, like instructing the models to ignore the pop-ups, weren’t effective. “Deploying computer-use agents still suffers from significant risks,” the co-authors of the study wrote, “and more robust agent systems are needed to ensure safe agent workflow.”

Model of the week

Meta announced yesterday that it’s working with partners to make its Llama “open” AI models available for defense applications. Today, one of those partners, Scale AI, announced Defense Llama, a model built on top of Meta’s Llama 3 that’s “customized and fine-tuned to support American national security missions.”

Defense Llama, which is available in Scale’s Donovan chatbot platform for U.S. government customers, was optimized for planning military and intelligence operations, Scale says. Defense Llama can answer defense-related questions, such as how an adversary might plan an attack against a U.S. military base.

So what makes Defense Llama different from stock Llama? Well, Scale says it was fine-tuned on content that might be relevant to military operations, like military doctrine and international humanitarian law, as well as the capabilities of various weapons and defense systems. It also isn’t restricted from answering questions about warfare, like a civilian chatbot might be:

Image Credits: Scale.ai

It’s not clear who might be inclined to use it, though.

The U.S. military has been slow to adopt generative AI — and skeptical of its ROI. So far, the U.S. Army is the only branch of the U.S. armed forces with a generative AI deployment. Military officials have expressed concerns about security vulnerabilities in commercial models, as well as legal challenges associated with intelligence data sharing and models’ unpredictability when faced with edge cases.

Grab bag

Spawning AI, a startup creating tools to enable creators to opt out of generative AI training, has released an image dataset for training AI models that it claims is fully public domain.

Most generative AI models are trained on public web data, some of which may be copyrighted or under a restrictive license. OpenAI and many other AI vendors argue that fair-use doctrine shields them from copyright claims. But that hasn’t stopped data owners from filing lawsuits.

Spawning AI says its training dataset of 12.4 million image-caption pairs includes only content with “known provenance” and “labeled with clear, unambiguous rights” for AI training. Unlike some other datasets, it’s also available for download from a dedicated host, eliminating the need to web-scrape.

“Significantly, the public-domain status of the dataset is integral to these larger goals,” Spawning writes in a blog post. “Datasets that include copyrighted images will continue to rely on web-scraping because hosting the images would violate copyright.”

Spawning’s dataset, PD12M, and a version curated for “aesthetically pleasing” images, PD3M, can be found at this link.



