Anthropic Pushes AI Boundaries with Groundbreaking Claude 3.5 Upgrades and Computer Use
Anthropic has announced major upgrades to its AI assistant Claude, marking a significant leap forward in AI capabilities. The improvements include enhanced models, a revolutionary computer use feature, and advanced data analysis tools.
Improved Models: Claude 3.5 Sonnet and Haiku 1
Anthropic has unveiled two model updates: an upgraded Claude 3.5 Sonnet and a new, smaller Claude 3.5 Haiku.
Although it keeps the name Claude 3.5 Sonnet, the upgraded model (version 2024-10-22) shows across-the-board improvements over its predecessor (version 2024-06-20), with its largest gains in coding. On the SWE-bench Verified benchmark, the new Claude 3.5 Sonnet’s score rose from 33.4% to 49.0%, surpassing all publicly available models, including specialized systems designed for coding tasks.
GitLab, which tested the model early, reported that the new Claude 3.5 Sonnet delivers stronger reasoning (up to 10% better across use cases) with no added latency, making it well suited to powering multi-step software development processes.
Beyond coding, the new Claude 3.5 Sonnet shows significant improvements in agentic tool use. On TAU-bench, which evaluates an AI’s ability to use tools, its score increased from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the more challenging airline domain. Benchmarks should always be taken with a grain of salt, but Anthropic is the first major AI company to publish benchmarks for agentic model behavior in its analyses.
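To put these jumps in perspective, the short Python calculation below converts the scores quoted above into absolute gains (percentage points) and relative gains. It uses only the figures reported in this article; the `gains` helper is purely illustrative.

```python
# Benchmark scores quoted in this article, in percent: (previous model, new model).
SCORES = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent), rounded to 0.1."""
    points = new - old
    relative = points / old * 100
    return round(points, 1), round(relative, 1)


for name, (old, new) in SCORES.items():
    points, relative = gains(old, new)
    print(f"{name}: {old}% -> {new}% (+{points} pts, +{relative}% relative)")
```

On SWE-bench Verified, for example, the 15.6-point gain corresponds to a relative improvement of roughly 47%.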
The new, smaller Claude 3.5 Haiku matches the performance of Claude 3 Opus, Anthropic’s previous largest model, on many evaluations, and it does so at the same cost and similar speed as the previous generation of Haiku. Claude 3.5 Haiku is particularly strong at coding, scoring 40.6% on SWE-bench Verified and outperforming many publicly available state-of-the-art models.
Revolutionary Computer Use Feature 1,2
Anthropic has introduced a groundbreaking “computer use” feature, now available in public beta. This innovative capability allows Claude to interact with computers in ways similar to humans – by observing the screen, moving a cursor, clicking buttons, and typing text.
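The computer use feature works through Anthropic’s API: Claude examines screenshots of the screen together with the user’s instructions and responds with actions such as cursor movements, clicks, and keystrokes, which developer-provided software then carries out on the computer. This allows Claude to perform tasks across various applications and interfaces, potentially changing how AI assistants interact with existing software ecosystems.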
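To make this observe-decide-act cycle concrete, here is a minimal Python sketch. It is not Anthropic’s API: the `Action` class, the `run_computer_use_loop` function and the scripted stand-in model are hypothetical. In the real public beta, Claude’s chosen action arrives as a tool-use request for Anthropic’s `computer` tool, and the developer’s code performs it and returns a fresh screenshot. The action names in the demo (`mouse_move`, `left_click`, `type`) mirror names used by that tool, but the data format here is deliberately simplified.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """Hypothetical, simplified action format (not the real API schema)."""
    kind: str                                     # e.g. "mouse_move", "left_click", "type"
    coordinate: Optional[tuple[int, int]] = None  # screen position in pixels, if needed
    text: Optional[str] = None                    # text to type, if needed


# ask_model(task, screenshot, history) returns the next Action, or None when done.
ModelFn = Callable[[str, bytes, list[Action]], Optional[Action]]


def run_computer_use_loop(
    task: str,
    ask_model: ModelFn,
    take_screenshot: Callable[[], bytes],
    perform: Callable[[Action], None],
    max_steps: int = 20,
) -> list[Action]:
    """Observe the screen, let the model choose an action, execute it, repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                 # observe: capture the current screen
        action = ask_model(task, screenshot, history)  # decide: the model picks the next step
        if action is None:                             # the model reports the task is finished
            break
        perform(action)                                # act: the harness, not the model, touches the machine
        history.append(action)
    return history


if __name__ == "__main__":
    # A scripted stand-in for the model, so the loop runs without any API access.
    script = [
        Action("mouse_move", coordinate=(200, 150)),
        Action("left_click"),
        Action("type", text="hello"),
    ]

    def fake_model(task: str, screenshot: bytes, history: list[Action]) -> Optional[Action]:
        return script[len(history)] if len(history) < len(script) else None

    steps = run_computer_use_loop("Type a greeting", fake_model, lambda: b"", print)
    print(len(steps), "actions performed")
```

Keeping execution inside the developer’s `perform` callback and capping the loop with `max_steps` is one simple way to keep early experiments contained.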
According to Anthropic’s official announcement: “With computer use, we’re trying something fundamentally new. Instead of creating specific tools for individual tasks, we’re teaching Claude general computer skills. This allows it to use a wide range of standard tools and software programs designed for people.”
Several companies, including Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company, have already begun exploring the possibilities of this new feature. For instance, Replit is utilizing Claude 3.5 Sonnet’s computer use capabilities to develop a key feature for their Replit Agent product, which evaluates apps as they’re being built.
On the OSWorld benchmark, which assesses AI models’ ability to use computers like humans, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category, significantly outperforming the next-best AI system’s score of 7.8%. When given more steps to complete tasks, Claude’s score increased to 22.0%.
However, Anthropic acknowledges that the computer use feature is still in its early stages and can be imperfect. Actions that humans perform effortlessly, such as scrolling, dragging, and zooming, currently present challenges for Claude. The company encourages developers to begin exploration with low-risk tasks and has developed new classifiers to identify when computer use is being employed and whether potential harm is occurring.
Enhanced Data Analysis and Visualization Capabilities 3
Alongside the model improvements and computer use feature, Anthropic has rolled out significant enhancements to Claude’s data analysis capabilities. The new analysis tool substantially extends Claude’s ability to work with and interpret complex datasets.
Key aspects of this new feature include:
- Visualization Creation: Claude can now create visualizations and charts directly from data, making it easier for users to interpret complex information visually.
- JavaScript Integration: The AI utilizes JavaScript for analysis and visualization tasks, allowing for more dynamic and interactive data representations.
- Code Writing and Execution: Claude has gained the ability to write and run code, significantly expanding its capabilities in data manipulation and analysis.
- Mathematical Precision: Because calculations are carried out by running code rather than estimated by the model, results are mathematically precise and reproducible.
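The analysis tool itself writes and runs JavaScript; the short Python sketch below only illustrates the principle behind the precision claim: figures obtained by executing code are exact and reproducible, unlike numbers a language model estimates in prose. The revenue data and both helper functions are hypothetical.

```python
import statistics

# Hypothetical monthly revenue in thousands of dollars (illustrative data only).
revenue = {"Jan": 120.0, "Feb": 135.5, "Mar": 128.25, "Apr": 150.0}


def summarize(series: dict[str, float]) -> dict[str, float]:
    """Compute summary figures by running code instead of estimating them."""
    values = list(series.values())
    first, last = values[0], values[-1]
    return {
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "growth_pct": round((last - first) / first * 100, 2),
    }


def ascii_bars(series: dict[str, float], width: int = 20) -> list[str]:
    """Render a minimal text bar chart, scaled so the largest value fills `width`."""
    peak = max(series.values())
    return [f"{label} {'#' * round(value / peak * width)} {value}" for label, value in series.items()]


print(summarize(revenue))
for line in ascii_bars(revenue):
    print(line)
```

Running the same code on the same data always yields the same totals and chart, which is what makes this approach reproducible.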
These improvements position Claude as a powerful assistant for data scientists, analysts, and researchers, capable of handling complex data tasks with increased accuracy and versatility.
Responsible AI Development and Future Outlook
As part of Anthropic’s commitment to responsible AI development, the company conducted joint pre-deployment testing of the new Claude 3.5 Sonnet model with the US AI Safety Institute (US AISI) and the UK AI Safety Institute (UK AISI). Anthropic also evaluated the upgraded model for catastrophic risks and determined that the ASL-2 Standard, as outlined in its Responsible Scaling Policy, remains appropriate.
The introduction of these new models and features represents a significant step forward in AI capabilities. By combining improved reasoning, enhanced coding abilities, innovative computer control features, and advanced data analysis tools, Claude becomes an even more important part of how we interact with technology and information.
As these new features roll out, it will be fascinating to observe how users and developers leverage Claude’s enhanced capabilities. The coming months will likely reveal new and unexpected applications for this AI assistant, which many already consider indispensable.
Anthropic’s announcement marks an exciting development in the world of AI, promising to make artificial intelligence more accessible, powerful, and integrated into our daily lives than ever before.
Sources: