Claude 3.5 can now operate your computer like a human

Anthropic is ushering in a new era of AI agents.

Oct 23, 2024

Anthropic's Claude has just leveled up. Now, not only can it think for you, but it can also perform actions, sparing you the effort of even lifting a finger.

Two major updates for Claude 3.5

Today, Anthropic unveiled two major updates to Claude 3.5:

The enhanced Claude 3.5 Sonnet features improved performance across the board, especially in programming. It introduces a groundbreaking new feature: computer use. Developers can now instruct Claude to use a computer as if it were a human—viewing the screen, moving the mouse, clicking buttons, and typing.
A new model, Claude 3.5 Haiku, which Anthropic describes as the next generation of its fastest model.

Claude 3.5 Sonnet: AI that uses a computer like a human

Let's start with Claude 3.5 Sonnet. This isn't just about typical voice commands or simple tasks like checking a calendar. This is real "computer use." Claude can comprehend complex commands and automatically execute a sequence of actions: browsing the web, finding information, filling out forms, and even testing or developing software. It looks like a real person remotely controlling your computer while you just watch.

For instance, in a demo, Claude was tasked with filling out a vendor request form using data spread across different files and systems. It checked the spreadsheet on screen, switched to the CRM system to find the needed data, and filled out and submitted the form—all without any manual intervention.

In another example, Claude built a basic website. It opened a browser, accessed claude.ai, and wrote the code for a '90s-style homepage. When local modifications were needed, Claude downloaded the files, opened them in VS Code, analyzed the content, identified issues, made corrections, and successfully started the server. It even fixed an error in the website by editing the code and restarting the server.

Claude can also assist with everyday tasks. In one demo, it planned a sunrise hike—finding the best spots in San Francisco to see the Golden Gate Bridge at dawn, calculating distances, and adding all the information to a calendar.

Anthropic has taken a different approach with this model. Instead of developing tools for specific tasks, they've taught Claude general computer skills, enabling it to use software designed for humans. With these capabilities, Claude can efficiently write documents, manage spreadsheets, and perform tasks across a variety of software platforms without requiring dedicated APIs.

Claude's ability to use a computer involves four main steps:

Initialization: Set up Claude with the computer tools and provide the task—e.g., saving images or managing files.
Decision Making: Claude assesses whether it can use a tool to fulfill the request. If yes, it generates a formatted request.
Tool Execution: The system executes Claude's request in a secure environment and provides feedback.
Continuous Operation: Claude checks the results and decides if further actions are required, repeating until the task is done.
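
The four steps above amount to a simple loop. Here is a minimal Python sketch of that loop, not Anthropic's actual implementation: ToolRequest, scripted_model, sandbox_execute, and run_agent are hypothetical names invented for illustration, and the model is replaced by a fixed script so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ToolRequest:
    """A formatted request from the model, e.g. action='click' (hypothetical shape)."""
    action: str
    args: dict = field(default_factory=dict)


History = List[Tuple[ToolRequest, str]]


def scripted_model(task: str, history: History) -> Optional[ToolRequest]:
    """Stand-in for the model: replays a fixed script, then returns None when done."""
    script = [
        ToolRequest("screenshot"),
        ToolRequest("click", {"x": 120, "y": 48}),
        ToolRequest("type", {"text": task}),
    ]
    return script[len(history)] if len(history) < len(script) else None


def sandbox_execute(request: ToolRequest) -> str:
    """Stand-in for the secure environment: pretends to run the action and reports back."""
    return f"ok: {request.action}"


def run_agent(
    task: str,
    model: Callable[[str, History], Optional[ToolRequest]],
    execute: Callable[[ToolRequest], str],
    max_steps: int = 10,
) -> History:
    """Step 1: the caller supplies the task and tools; steps 2-4 repeat below."""
    history: History = []
    for _ in range(max_steps):
        request = model(task, history)      # Step 2: decide whether a tool is needed
        if request is None:                 # No further action required: task is done
            break
        result = execute(request)           # Step 3: run the request, collect feedback
        history.append((request, result))   # Step 4: feed the result into the next decision
    return history


history = run_agent("save the image", scripted_model, sandbox_execute)
for request, result in history:
    print(request.action, "->", result)
```

In a real deployment, the model call would go to Claude and the executor would run inside an isolated environment. The max_steps cap is an assumption added here to keep a confused agent from looping forever.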

Currently, the computer use feature is in public beta and has limitations: for example, Claude struggles with scrolling and drag-and-drop actions. The demos also surfaced small bugs, such as Claude accidentally stopping a screen recording.

The updated Claude 3.5 Sonnet also shows notable improvements on industry benchmarks, particularly in coding and tool use. Its SWE-bench Verified coding score jumped from 33.4% to 49.0%, surpassing all publicly available models, including OpenAI's o1-preview. On TAU-bench, its tool use score rose from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain. Despite these advancements, Claude 3.5 Sonnet keeps the same price and speed as before.
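
To put those gains in perspective, this short Python snippet computes the absolute (percentage-point) and relative improvements from the scores quoted above. The variable and function names are illustrative; only the scores come from the article.

```python
# Before/after scores (in percent) as quoted in the article.
scores = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gains(before: float, after: float) -> tuple:
    """Return (percentage-point gain, relative gain in percent), each rounded to 0.1."""
    delta = after - before
    return round(delta, 1), round(delta / before * 100, 1)


for name, (before, after) in scores.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points} points ({relative}% relative)")
```

The SWE-bench Verified jump works out to 15.6 percentage points, or roughly a 47% relative improvement over the previous score.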

Claude 3.5 Haiku: cost-effective and fast

The new Claude 3.5 Haiku offers the best value in the Claude lineup. It improves on its predecessor, Claude 3 Haiku, in several areas while keeping cost and speed similar. On some benchmarks, it even outperforms the larger Claude 3 Opus model. Haiku will be available later this month through various platforms, including Amazon Bedrock and Google Cloud's Vertex AI.

Final thoughts

These updates from Anthropic are impressive at a time when large model advancements have become almost routine. While other AI systems focus on specialized tools, Anthropic aims for a more human-like versatility—giving AI the ability to interact with computers just like we do.

Anthropic appears to be leading the charge in creating AI that doesn't just think but acts in our digital environments. Their strategy could redefine how we interact with technology in the coming years.