Google’s New Gemini 2.5 Computer Use Model Lets AI Agents Click, Type, and Navigate Like Humans
Google has introduced a new AI model that lets software operate a computer much the way a person does. Called the Gemini 2.5 Computer Use model, it is a specialized version of Gemini 2.5 Pro designed to let AI agents drive websites and apps directly, not through APIs or code, but through the same graphical interfaces that humans use.

Teaching AI to “Use” a Computer
Instead of working behind the scenes through structured APIs, this new model can actually perform visible actions on a screen. It can click buttons, fill out forms, scroll through pages, select menu options, and even navigate login screens — all while understanding what’s being shown visually.
Think of it as teaching an AI to use a mouse and keyboard, not just send commands. Google says the model outperforms other AI systems on web and mobile control benchmarks while running with lower latency. It’s currently available in public preview through the Gemini API in Google AI Studio and Vertex AI.
Why It Exists
AI assistants and automation tools have long been good at connecting to apps through APIs, but most real-world workflows still depend on graphical interfaces. Booking an appointment, uploading a file, or completing a checkout form often can’t be automated easily unless something can interact with the UI itself.
That’s the gap Gemini 2.5 Computer Use is trying to close. It gives AI systems the ability to handle those front-end tasks directly — the same ones that human users would normally complete manually.
How It Works
Under the hood, the model works in a continuous loop. It starts with a few inputs — the user’s request, a screenshot of the current screen, and a record of recent actions. Using this, the AI figures out what’s on the page and what action to take next: maybe clicking a button, typing in a field, or scrolling down.
If the task involves something sensitive, like making a purchase, the system pauses to ask for confirmation. After each action, it takes a new screenshot, updates its context, and continues the loop until the job’s done or the user stops it. For now, it’s optimized for browsers, though Google says early mobile performance looks promising. Full desktop support isn’t available yet.
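To make the loop concrete, here is a minimal sketch of that observe, decide, act cycle, using Playwright to drive a browser. It is illustrative only: plan_next_action stands in for the Gemini model call, and the action dictionary format (type, x, y, needs_confirmation, and so on) is a hypothetical schema, not the actual API.

```python
# Illustrative observe -> decide -> act loop, as described above.
# plan_next_action() stands in for the Gemini model call; the action
# fields shown here are hypothetical, not the real API schema.
from playwright.sync_api import sync_playwright

def plan_next_action(goal, screenshot, history):
    """Placeholder for the model call: send the goal, the latest screenshot,
    and recent actions; receive a structured action such as
    {'type': 'click', 'x': 120, 'y': 340}."""
    raise NotImplementedError

def run_agent(goal, start_url, max_steps=20):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)
        history = []

        for _ in range(max_steps):
            screenshot = page.screenshot()        # observe the current screen
            action = plan_next_action(goal, screenshot, history)

            if action["type"] == "done":          # model reports the task is finished
                break
            if action.get("needs_confirmation"):  # pause on sensitive steps, e.g. a purchase
                if input(f"Allow '{action['type']}'? [y/N] ").strip().lower() != "y":
                    break

            if action["type"] == "click":
                page.mouse.click(action["x"], action["y"])
            elif action["type"] == "type":
                page.keyboard.type(action["text"])
            elif action["type"] == "scroll":
                page.mouse.wheel(0, action["delta_y"])

            history.append(action)                # context for the next turn

        browser.close()
```

In a real integration, the placeholder model call would return the model's proposed UI action, and the client code, as in the sketch, is what actually executes it and captures the next screenshot.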
Safety Comes First
Giving AI agents control over a computer obviously raises questions about safety. Google says the model has several layers of protection built in. Every action goes through a per-step safety review before execution, and developers can block or require confirmation for certain high-risk operations — like bypassing CAPTCHAs or accessing secure systems.
The company has also published a Gemini 2.5 Computer Use System Card, outlining its safety framework and best practices. Google’s message is clear: developers should test and monitor these systems thoroughly before letting them run independently.
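On the developer side, that "block or require confirmation" guidance can be as simple as a small policy layer in front of action execution. The sketch below is purely hypothetical (the action names and policy structure are mine, not Google's API); it only illustrates the distinction between refusing an action outright and routing it to a human.

```python
# Hypothetical client-side policy: block some actions, confirm others.
# Action names and the policy structure are illustrative, not the actual API.
BLOCKED = {"solve_captcha"}                         # never allow
CONFIRM = {"submit_payment", "enter_credentials"}   # ask a human first

def is_allowed(action_name: str, confirm) -> bool:
    """Return True if the action may run; `confirm` is a callable that
    asks an operator for approval."""
    if action_name in BLOCKED:
        return False
    if action_name in CONFIRM:
        return confirm(action_name)
    return True
```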
What It’s Being Used For
Inside Google, versions of this model are already being used for UI testing and agentic automation. It powers tools like Project Mariner, Firebase Testing Agent, and AI Mode in Search, where interface-level control helps simulate how users interact with apps.
Externally, developers in early access programs have tested it for workflow automation, personal assistant development, and quality assurance. Google has also shared interactive demos — one where the model organizes virtual sticky notes on a whiteboard, and another where it plays the 2048 puzzle game on a website.
Available to Try Now
Developers can experiment with Gemini 2.5 Computer Use right now through Google’s AI Studio and Vertex AI platforms. There’s even a live demo hosted by Browserbase, letting anyone see the model perform actions in real time. For those building locally, it also integrates with Playwright and Browserbase APIs for custom testing.
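For a sense of what a request looks like, here is a minimal sketch using the google-genai Python SDK. The computer-use tool configuration and the preview model ID reflect my reading of the public preview documentation and may change, so treat them as placeholders to verify against Google's docs.

```python
# Minimal sketch of asking the Computer Use model for its next UI action.
# Assumes the google-genai SDK; the tool configuration and model ID below
# follow the public preview docs as I understand them and may change.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER
        )
    )]
)

with open("screen.png", "rb") as f:
    screenshot = f.read()

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",  # preview model name; verify
    contents=[
        "Open the pricing page and find the monthly plan cost.",
        types.Part.from_bytes(data=screenshot, mime_type="image/png"),
    ],
    config=config,
)

# The response describes the next UI action (for example, click coordinates),
# which your own client code then executes before sending back a new screenshot.
print(response.candidates[0].content.parts)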
This new capability moves AI systems a step closer to acting as genuine digital collaborators — not just chatbots or copilots, but agents that can actually do things on your screen.










