Anthropic’s new AI model can control your PC



In a pitch to investors last spring, Anthropic said it intended to build AI to power virtual assistants that could perform research, answer emails, and handle other back-office jobs on their own. The company referred to this as a “next-gen algorithm for AI self-teaching” — one it believed that could, if all goes according to plan, automate large portions of the economy someday.

It took a while, but that AI is starting to arrive.

Anthropic on Tuesday released an upgraded version of its Claude 3.5 Sonnet model that can understand and interact with any desktop app. Via a new “Computer Use” API, now in open beta, the model can imitate keystrokes, button clicks, and mouse gestures, essentially emulating a person sitting at a PC.

“We trained Claude to see what’s happening on a screen and then use the software tools available to carry out tasks,” Anthropic wrote in a blog post shared with TechCrunch. “When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place.”

Developers can try out Computer Use via Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI platform. The new 3.5 Sonnet without Computer Use is rolling out to Claude apps, and brings various performance improvements over the outgoing 3.5 Sonnet model.

Automating apps

A tool that can automate tasks on a PC is hardly a novel idea. Countless companies offer such tools, from decades-old RPA vendors to newer upstarts like Relay, Induced AI and Automat.

In the race to develop so-called “AI agents,” the field has only become more crowded. AI agents remains an ill-defined term, but it generally refers to AI that can automate software.

Some analysts say AI agents could provide companies with an easier path to monetizing the billions of dollars that they’re pouring into AI. Companies seem to agree: according to a recent Capgemini survey, 10% of organizations already use AI agents and 82% will integrate them within the next three years.

Salesforce made splashy announcements about its AI agent tech this summer, while Microsoft touted new tools for building AI agents yesterday. OpenAI, which is plotting its own brand of AI agents, sees the tech as a step toward super-intelligent AI.

Anthropic calls its take on the AI agent concept an “action-execution layer” that lets the new 3.5 Sonnet perform desktop-level commands. Thanks to its ability to browse the web (not a first for AI models, but a first for Anthropic), 3.5 Sonnet can use any website and any application.

Claude 3.5 Sonnet new
Anthropic’s new AI can control apps on a PC. Image Credits:Anthropic

“Humans remain in control by providing specific prompts that direct Claude’s actions, like ‘use data from my computer and online to fill out this form,’” an Anthropic spokesperson told TechCrunch. “People enable access and limit access as needed. Claude breaks down the user’s prompts into computer commands (e.g.. moving the cursor, clicking, typing) to accomplish that specific task.”

Software development platform Replit has used an early version of the new 3.5 Sonnet model to create an “autonomous verifier” that can evaluate apps while they’re being built. Canva, meanwhile, says that it’s exploring ways in which the new model might be able to support the designing and editing process.

But how is this any different than the other AI agents out there? It’s a reasonable question. Consumer gadget startup Rabbit is building a web agent that can do things like buying movie tickets online; Adept, which was recently acqui-hired by Amazon, trains models to browse websites and navigate software; and Twin Labs is using off-the-shelf models, including OpenAI’s GPT-4o, to automate desktop processes.

Anthropic claims the new 3.5 Sonnet is simply a stronger, more robust model that can do better on coding tasks than even OpenAI’s flagship o1, per the SWE-bench Verified benchmark. Despite not being explicitly trained to do so, the upgraded 3.5 Sonnet self-corrects and retries tasks when it encounters obstacles, and can work toward objectives that require dozens or hundreds of steps.

Claude 3.5 Sonnet new
The new Claude 3.5 Sonnet model’s performance on various benchmarks. Image Credits:Anthropic

But don’t fire your secretary just yet.

In an evaluation designed to test an AI agent’s ability to help with airline booking tasks, like modifying a flight reservation, the new 3.5 Sonnet managed to complete less than half of the tasks successfully. In a separate test involving tasks like initiating a return, 3.5 Sonnet failed roughly a third of the time.

Anthropic admits the upgraded 3.5 Sonnet struggles with basic actions like scrolling and zooming, and that it can miss “short-lived” actions and notifications because of the way it takes screenshots and pieces them together.

“Claude’s Computer Use remains slow and often error-prone,” Anthropic writes in its post. “We encourage developers to begin exploration with low-risk tasks.”

Risky business

But is the new 3.5 Sonnet capable enough to be dangerous? Possibly.

A recent study found that models without the ability to use desktop apps, like OpenAI’s GPT-4o, were willing to engage in harmful “multi-step agent behavior,” such as ordering a fake passport from someone on the dark web, when “attacked” using jailbreaking techniques. Jailbreaks led to high rates of success in performing harmful tasks even for models protected by filters and safeguards, according to the researchers.

One can imagine how a model with desktop access could wreak more havoc — say, by exploiting app vulnerabilities to compromise personal info (or storing chats in plaintext). Aside from the software levers at its disposal, the model’s online and app connections could open up avenues for malicious jailbreakers.

Anthropic doesn’t deny that there’s risk in releasing the new 3.5 Sonnet. But the company argues that the benefits of observing how the model is used in the wild ultimately outweigh this risk.

“We think it’s far better to give access to computers to today’s more limited, relatively safer models,” the company wrote. “This means we can begin to observe and learn from any potential issues that arise at this lower level, building up computer use and safety mitigations gradually and simultaneously.”

Claude 3.5 Sonnet new
Image Credits:Anthropic

Anthropic also says it has taken steps to deter misuse, like not training the new 3.5 Sonnet on users’ screenshots and prompts, and preventing the model from accessing the web during training. The company says it developed classifiers to “nudge” 3.5 Sonnet away from actions perceived as high-risk, such as posting on social media, creating accounts and interacting with government websites.

As the U.S. general election nears, Anthropic says it is focused on mitigating election-related abuse of its models. The U.S. AI Safety Institute and U.K. Safety Institute, two separate but allied government agencies dedicated to evaluating AI model risk, tested the new 3.5 Sonnet prior to its deployment.

Anthropic told TechCrunch it has the ability to restrict access to additional websites and features “if necessary,” to protect against spam, fraud, and misinformation, for example. As a safety precaution, the company retains any screenshots captured by Computer Use for at least 30 days — a retention period that might alarm some devs.

We’ve asked Anthropic under which circumstances, if any, it would hand over screenshots to a third party (e.g., law enforcement) if asked, and will update this post if we hear back.

“There are no foolproof methods, and we will continuously evaluate and iterate on our safety measures to balance Claude’s capabilities with responsible use,” Anthropic said. “Those using the computer-use version of Claude should take the relevant precautions to minimize these kinds of risks, including isolating Claude from particularly sensitive data on their computer.”

Hopefully, that’ll be enough to prevent the worst from occurring.

A cheaper model

Today’s headliner might’ve been the upgraded 3.5 Sonnet model, but Anthropic also said an updated version of Haiku, the cheapest, most efficient model in its Claude series, is on the way.

Claude 3.5 Haiku, due in the coming weeks, will match the performance of Claude 3 Opus, once Anthropic’s state-of-the-art model, on certain benchmarks at the same cost and “approximate speed” of Claude 3 Haiku.

“With fast speeds, improved instruction following, and more accurate tool use, Claude 3.5 Haiku is well suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from huge volumes of data like purchase history, pricing or inventory data,” Anthropic wrote in a blog post.

3.5 Haiku will initially be available as a text-only model and later as part of a multimodal package that can analyze both text and images.

Claude 3.5 Haiku
3.5 Haiku’s benchmark performance. Image Credits:Anthropic

So once 3.5 Haiku is available, will there be much reason to use 3 Opus? What about 3.5 Opus, 3 Opus’ successor, which Anthropic teased back in June?

“All of the models in the Claude 3 model family have their individual uses for customers,” the Anthropic spokesperson said. “Claude 3.5 Opus is on our roadmap and we’ll be sure to share more as soon as we can.”




Source

The application allows you to download YouTube videos easily. Mexican news outlet Milenio on July 1 published its unofficial count of murders in Mexico for the first half of. Buppah takes control of the vehicle and causes it to crash into another truck. Sample Origin and Preparation Single valves from Arachnoidiscus and Cocconeis genera have been obtained from the AM sample of the Hustedt collection 13. I promise to always fight with you and for you, never against you. Fischer points out that before the imminent battle against the Philistines, Saul's loneliness and desperation is evident as recorded in 1 Samuel and Saul's sense of alienation is reinforced by the recollection of Samuel's death at 1 Samuel. For some crops such as tomatoes, peppers and celery, monitoring for cutworms must continue through to harvest. Burkitt's lymphoma BL and chronic lymphocytic leukemia CLL are examples of contrasting B-cell cancers BL is a highly aggressive lymphoid tumor, frequently affecting children, whereas CLL typically presents as an indolent, slow-progressing leukemia affecting the elderly. The general function of these muscles is to produce extension at the wrist and fingers. While the rooms for the retreat are full, commuters are still welcome. He said "her most holy soul was filled with the divine Spirit of Jesus Christ more than all other creatures of God taken together. To turn the water temperature up, move the outermost plastic piece to the left a few splines. After the steamboats, railroads were the next major form of commercial transportation to come to Cincinnati. The following picture provides a view of the two daughter boards at their original location at the rear of the device main board has been removed. Hidden categories: Webarchive template wayback links CS1 maint: archived copy as title CS1 maint: BOT: original-url status unknown Use mdy dates from February All articles with specifically marked weasel-worded phrases Articles with specifically marked weasel-worded phrases from May Wikipedia articles needing clarification from October All articles with unsourced statements Articles with unsourced statements from June Articles with unsourced statements from October.