Anthropic’s latest Claude 3.5 Sonnet AI model has a new feature in public beta that can interact with a computer by looking at a screen, moving a cursor, clicking buttons, and typing text. The new feature, called “computer use,” is available today on the API, allowing developers to direct Claude to work on a computer like a human does, as shown on a Mac in the video below.
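Computer use is exposed on Anthropic’s Messages API as a beta tool that the model can call. As a rough sketch, assuming the launch-era beta tool type (`computer_20241022`) and field names from Anthropic’s documentation, a request body enabling the computer tool might look like this; the display dimensions and prompt are illustrative:

```python
import json

def build_computer_use_request(prompt: str) -> dict:
    """Build a Messages API request body that enables the beta computer tool.

    The tool type string and field names match Anthropic's beta docs at
    launch and may change as the feature evolves.
    """
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "tools": [
            {
                "type": "computer_20241022",  # beta tool type at launch
                "name": "computer",
                "display_width_px": 1280,     # illustrative screen size
                "display_height_px": 800,
                "display_number": 1,
            }
        ],
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_computer_use_request("Open the browser and check today's weather.")
print(json.dumps(request, indent=2))
```

At launch, such a request also had to be sent with the beta header `anthropic-beta: computer-use-2024-10-22`; the API returns the actions Claude wants to take (clicks, keystrokes, screenshot requests), and it is up to the developer’s code to execute them.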
Microsoft’s Copilot Vision feature and OpenAI’s desktop app for ChatGPT have shown what their AI tools can do based on seeing your computer’s screen, and Google has similar capabilities in its Gemini app on Android phones. But they haven’t gone to the next step of widely releasing tools ready to click around and carry out tasks for you like this. Rabbit promised similar capabilities for its R1, which it has yet to deliver.
Anthropic does warn that computer use is still experimental and can be “cumbersome and error-prone.” The company says, “We’re releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time.”
There are many actions that people routinely do with computers (dragging, zooming, and so on) that Claude can’t yet attempt. The “flipbook” nature of Claude’s view of the screen—taking screenshots and piecing them together, rather than observing a more granular video stream—means that it can miss short-lived actions or notifications.
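The flipbook limitation falls out of the basic agent loop: the model only ever reasons over discrete screenshot frames, so anything that appears and disappears between two frames is invisible to it. A minimal sketch of that loop, where `take_screenshot`, `ask_model`, and `execute_action` are hypothetical stand-ins for the developer-supplied tooling (the API returns actions but leaves capturing the screen and executing input events to the caller):

```python
def take_screenshot() -> bytes:
    """Capture the current screen as one 'flipbook' frame (stub)."""
    return b"<png bytes>"

def ask_model(goal: str, screenshot: bytes) -> dict:
    """Send the goal plus the latest frame to the model and get the
    next requested action back (stub). A real implementation would
    call the Messages API with the computer tool enabled."""
    return {"type": "done"}

def execute_action(action: dict) -> None:
    """Perform the click, keystroke, or cursor move the model asked for (stub)."""

def run_agent(goal: str, max_steps: int = 20) -> int:
    """Screenshot -> model -> action loop; returns the number of model calls."""
    steps = 0
    for _ in range(max_steps):
        frame = take_screenshot()        # the model sees only this snapshot;
        action = ask_model(goal, frame)  # a notification shown between frames
        steps += 1                       # is simply never observed
        if action["type"] == "done":
            break
        execute_action(action)
    return steps

run_agent("file my expense report")
```

The `max_steps` cap is a common safeguard in agent loops like this, since an error-prone model can otherwise click in circles indefinitely.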
Also, this version of Claude has apparently been trained to steer clear of social media, with “measures to monitor when Claude is asked to engage in election-related activity, as well as systems for nudging Claude away from activities like generating and posting content on social media, registering web domains, or interacting with government websites.”
Meanwhile, Anthropic says its new Claude 3.5 Sonnet model has improvements in many benchmarks and is offered to customers at the same price and speed as its predecessor:
The updated Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models—including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the more challenging airline domain.