Claude Desktop and OpenAI Atlas can now control your browser: click buttons, fill forms, navigate between tabs. I tested this by having it edit a Google Sheet and then create templated letters pulling data directly from that sheet. No APIs, no integrations, just plain-English instructions to an AI that can see your screen and move your mouse.
If you’re running web-based SaaS applications, this changes how you think about automation. Traditional RPA needed perfect APIs and rigid workflows. These AI agents work with the messy reality of how web apps actually behave: they adapt to layout changes, they handle exceptions, they figure out the next step without breaking when a button moves three pixels to the left.
Think about your internal tools right now. All those one-off scripts someone wrote to move data between systems, all those manual processes where someone copies from one web form and pastes into another, all those “it only takes 10 minutes” tasks that eat up hours across your team. That’s the automation sweet spot these browser agents are built for.
I’ll be honest: both are still slow right now. You’re not going to impress anyone watching these things think through each click in real time. But the trajectory is clear, and in a year or less the speed will be there. We’ve seen this pattern before with LLMs: the capability shows up first, then the performance catches up fast.
I’m watching Google scramble to get Gemini integrated into Chrome because they see what’s coming. If I were Microsoft or Apple, I’d be building this straight into the OS layer, not just the browser. The company that owns the interface layer where AI agents operate is going to have a massive advantage in the next phase of enterprise software.
The window between “interesting demo” and “competitive necessity” is getting shorter every quarter. Time to start testing.