Chapter 3 — Computer Use

🖱️

"From 14.9% (Oct 2024) → 72.5% (Feb 2026). At human baseline.Anthropic Computer Use is not a demo anymore."

You'll learn

What Computer Use is + why 2025-2026 breakthrough
OSWorld-Verified benchmark: 14.9% → 72.5% in 16 months
4 enterprise cases (Rakuten, CRED, TELUS, Zapier)
Why Project Mariner shut down, Anthropic leads
Target: data entry, accounting, legacy software operators

01 What is Computer Use?

Computer Use = AI agent that can:

Take screenshots
Move + click mouse
Type keys
Scroll
Read UI text (OCR + accessibility tree)

→ Interact with computer like a human user, no API needed.

Why breakthrough?

90% of enterprise software has no good API:

Legacy ERP (SAP, Oracle, Baan)
Local SME software (VN: MISA, Bravo, Fast; similar pattern globally)
Internal custom tools
Web apps without public APIs

Before Computer Use: impossible to automate.

After Computer Use: agent clicks like human = automation possible.

02 Chronology — 16 months from 14.9% → 72.5%

Time	Milestone	OSWorld score
Oct 2024	Anthropic launches Computer Use beta	14.9%
Jan 2025	Google Project Mariner launch	~30% (WebVoyager)
Apr 2025	OpenAI Operator launch	~38%
Jul 2025	Sonnet 4.0 + Computer Use 2.0	~52%
Feb 2026	Sonnet 4.6	72.5%
Human baseline	(reference)	~72%

Feb 2026 = moment Computer Use reached human baseline performance.

03 OSWorld-Verified benchmark

OSWorld = benchmark measuring agent on desktop tasks.

Sample tasks

"Open Excel, create table from screenshot data, save to Documents"
"Search YouTube for '[query]', play first video, comment '[text]'"
"In Photoshop, open file, apply filter, save as 1080p PNG"
"In Outlook, search emails from [sender], reply with template [X]"

04 Enterprise cases 2025-2026

Case 1: Rakuten

Item	Detail
Industry	E-commerce
Use case	Auto data entry across vendor portals (legacy, no API)
Stack	Anthropic Computer Use + custom orchestrator
Result	Data entry time cut ~70%

Case 2: CRED (India)

Item	Detail
Industry	Fintech
Use case	Customer support agent navigates internal CRM
Result	Handles 40% support tickets without human

Case 3: TELUS (Canada)

Item	Detail
Industry	Telecom
Use case	Field tech tool navigation, troubleshoot script
Result	First-time-fix rate up 15%

Case 4: Zapier

Item	Detail
Use case	Computer Use ↔ Zapier for web apps without APIs
Result	Coverage up 30% with same Zapier engineering team

05 Why Google Project Mariner shut down

May 2026: Google officially shutdowns Project Mariner (May 4, 2026), absorbs into Gemini Agent.

Reasons (analyst speculation)

Performance gap: Mariner peaked ~83.5% WebVoyager but Anthropic Computer Use universal (web + desktop)
Single-purpose: Mariner browser-only, Anthropic full computer
MCP advantage: Anthropic ecosystem (97M downloads/month) vs Google standalone
Distribution: Anthropic API-first, Google consumer-first → enterprise prefers Anthropic

→ Anthropic leads computer use race 2026.

06 Computer Use workflow — architecture

Agent loop with Computer Use

1. Screenshot → 2. LLM analyze → 3. Action 
   (1080p)        (vision + plan)  (click/type/scroll)
                                        │
                                        ▼
                                 4. Wait + screenshot
                                        │
                                        ▼
                                 5. Verify expected state
                                        │
                          ┌─────────────┴────────┐
                       Continue              Done

Code example (Python)

python

from anthropic import Anthropic
client = Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=[{
        "type": "computer_20251124",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
    }],
    messages=[{
        "role": "user",
        "content": "Open Excel, create table with 5 rows, save as report.xlsx"
    }],
    extra_headers={"anthropic-beta": "computer-use-2025-11-24"}
)

07 5 Computer Use patterns

5 high-value use cases

Pattern 1: Legacy ERP automation

Target: SAP / Oracle / Baan / local accounting software
Use case: Daily reconciliation, data export, report gen
Cost saved: 1-2 FTE / department

Pattern 2: Cross-app workflow

Target: 5 software without integration
Use case: Quote → invoice → CRM → email
Cost saved: 30-60% sales ops time

Pattern 3: Web scraping no-API

Target: competitor websites, supplier portals
Use case: Daily price monitor, stock check
Cost saved: $0 vs $500/month scraping services

Pattern 4: QA testing

Target: web app, mobile emulator
Use case: Run regression tests, screenshot diff
Cost saved: 50% manual QA time

Pattern 5: Customer support escalation

Target: internal tool navigation
Use case: Lookup customer, refund, escalate
Cost saved: 20-40% T1 support

08 Prompt pack — Computer Use

5 templates

1. Step-by-step task (high accuracy)

Task: [describe end goal]

Constraints:
- Use [specific app: Excel / Chrome / etc.]
- Save output to [path]
- Don't modify [protected area]

Verify each step before proceeding:
- After click: check state changed
- After type: check text appeared
- After save: check file exists

Pause and ask if uncertain.

2. Cross-app workflow

You'll work across [App A] and [App B]:
1. In [App A]: extract data from [location]
2. In [App B]: paste + format + save
3. Send confirmation email to [user]

Maintain checklist of completed steps.
On error: screenshot + describe before retry.

3. Web automation with verification

Navigate to [URL], perform [task].

Wait for page load (look for [indicator]) before clicking.
If element not found within 10s, take screenshot + explain.
Don't navigate away from [domain] without confirming.

4. Data entry batch

You have [N] rows to enter in [system].
Each row: [field structure]

Process:
- Read row [1] → enter
- Verify saved → next
- Every 10 rows: save checkpoint screenshot
- Stop + report on any error

5. Eval / QA test

Test [feature] by:
1. Navigate to [page]
2. Perform [action sequence]
3. Verify [expected outcome]
4. Take screenshot at each step
5. Compare against [baseline screenshot]

Report: pass/fail per step + diff explanation.

09 Common pitfalls

🚨 7 Computer Use mistakes

1. No sandbox → wrong click can destroy data. Always run in VM/container.

2. No state verification → agent assumes click worked → cascading error. Verify each step.

3. Resolution mismatch → agent trained 1920x1080, target 4K = wrong coordinates. Set display size matching.

4. Sensitive data in screenshots → cookie/password leaks. Mask or use test env.

5. Race conditions → agent clicks before UI ready → fail. Wait for loading indicators.

6. Cost blowup → Computer Use vision tokens expensive ($3/MTok input). Batch screenshots, not every-keystroke.

7. Infinite agent loop → can't recover error → loops forever. Max iteration cap + timeout.

10 Use cases in emerging markets

5 high-pain enterprise niches

Pain	Target user	Computer Use case
Multiple accounting software (MISA + Bravo + Fast in VN)	Accounting SME	Auto sync data, generate reports
Tax filing via government portal	Accountant + biz	Auto-file VAT, CIT returns
HR onboarding 10 legacy systems	HR	Auto create accounts across systems
Procurement check supplier portals	SME buyer	Daily price monitor, order
Customer care check 3 systems (CRM + ERP + warehouse)	CS rep	Auto lookup + escalate

Economics

Item	Cost
Anthropic API (Sonnet 4.6)	$3/$15 per MTok
Compute (sandbox E2B / VM)	$20-100/month
Engineering setup	1-2 weeks dev
Total per agent	$100-500/month running

→ Replace 1 FTE local ($800-2K/month) → ROI 6-12 months.

🏛️ Legal considerations

Computer Use on local machine = OK
Computer Use into client systems = need consent + GDPR-equiv
Cybersecurity regulations (no sensitive data leaving country)
Disclosure: employees know agent uses their machine

🤝 Agency / consultant opportunity

Pitch SME / mid-market: "auto replace 1-2 data entry workers"
Project size: $5K-50K (build + 6-month support)
Recurring: $500-2K/month maintenance + upgrade

11 Practice exercises

✍️ 3 levels

Level 1 — 1 week

Setup Anthropic Computer Use beta
Run 5 example tasks (file manage, browser, Excel)
Measure success rate

Level 2 — 1 month

Pick 1 internal legacy workflow (e.g., daily reconciliation Excel + ERP)
Build full agent automation
Eval 30 tasks → measure accuracy + time saved

Level 3 — 3 months

Pitch 1 SME: automate 1 workflow
Deliver project (build + train + handover)
Charge $5K-15K

12 Continue reading

Final word

"90% of enterprise software has no good API.90% of office work uses that software.Computer Use isn't "cool AI."It's the final automation that unlocks 90% of work."

Category	Sample
Office (Excel, Word, PPT)	Data entry, table format, slide create
Web browser	Form fill, scrape, schedule
Multimedia	Photoshop, video edit
OS-level	File manage, install software
Daily tasks	Email, calendar, messaging

Chapter 3 — Computer Use ​

01 What is Computer Use? ​

Why breakthrough? ​

02 Chronology — 16 months from 14.9% → 72.5% ​

03 OSWorld-Verified benchmark ​

Sample tasks ​

Categories ​

04 Enterprise cases 2025-2026 ​

Case 1: Rakuten ​

Case 2: CRED (India) ​

Case 3: TELUS (Canada) ​

Case 4: Zapier ​

05 Why Google Project Mariner shut down ​

Reasons (analyst speculation) ​

06 Computer Use workflow — architecture ​

Code example (Python) ​

07 5 Computer Use patterns ​

Pattern 1: Legacy ERP automation ​

Pattern 2: Cross-app workflow ​

Pattern 3: Web scraping no-API ​

Pattern 4: QA testing ​

Pattern 5: Customer support escalation ​

08 Prompt pack — Computer Use ​

09 Common pitfalls ​

10 Use cases in emerging markets ​

5 high-pain enterprise niches ​

Economics ​

🏛️ Legal considerations ​

🤝 Agency / consultant opportunity ​

11 Practice exercises ​

12 Continue reading ​