AI coding agents are about to get a lot more accurate and reliable at web automation and development, thanks to a new tool from Vercel.
Using agent-browser with Claude Code in the CLI to perform Puppeteer-style web automation with natural-language prompts
These agents do excel at code generation, but what happens when it's time to actually test the code in a real browser, the way a human or Puppeteer would?
They've always struggled to navigate the browser autonomously and to identify and manipulate elements quickly and reliably.
Flaky selectors. Bloated DOM code. Screenshots that can't really be understood in the context of your prompts.
And this is exactly what the agent-browser tool from Vercel is here to fix.
It's a tiny CLI on top of Playwright, but with one genuinely clever idea that makes browser control way more reliable for AI.
The killer feature: "snapshot + refs"
Instead of asking an agent to guess CSS selectors or XPath, agent-browser does this:
- It takes a snapshot of the page's accessibility tree
- It assigns stable references like @e1, @e2, @e3 to elements
- Your agent clicks and types using those refs
So instead of the agent having to guess which element you mean from a simple prompt like:
"Find the blue submit button and click it"

you get:
    agent-browser snapshot -i
    # - button "Sign up" [ref=e7]
    agent-browser click @e7
No selector guessing or brittle DOM queries.
This one design choice makes browser automation way more deterministic for agents.
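To make the snapshot-and-refs flow concrete, here is a minimal Python sketch of the agent side: it parses snapshot text and looks up the ref for a given role and label. The line format is assumed from the single `- button "Sign up" [ref=e7]` line shown above, the other sample lines are made up for illustration, and `parse_snapshot` and `find_ref` are hypothetical helper names, not part of agent-browser.

```python
import re
from typing import List, NamedTuple, Optional


class SnapshotElement(NamedTuple):
    role: str
    name: str
    ref: str


# Assumed line shape, based on the example above: - button "Sign up" [ref=e7]
LINE_PATTERN = re.compile(
    r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>\w+)\]'
)


def parse_snapshot(text: str) -> List[SnapshotElement]:
    """Collect every line that matches the assumed snapshot format."""
    elements = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            elements.append(
                SnapshotElement(match["role"], match["name"], match["ref"])
            )
    return elements


def find_ref(elements: List[SnapshotElement], role: str, name: str) -> Optional[str]:
    """Return the ref (with a leading @) for the first role/name match, else None."""
    for element in elements:
        if element.role == role and element.name.lower() == name.lower():
            return "@" + element.ref
    return None


# Hypothetical snapshot text; only the button line comes from the article.
SAMPLE = """\
- heading "Create account" [ref=e1]
- textbox "Email" [ref=e5]
- button "Sign up" [ref=e7]
"""

elements = parse_snapshot(SAMPLE)
print(find_ref(elements, "button", "Sign up"))  # prints @e7
```

The point of the sketch is that the agent matches on role and label, which it can read directly from the snapshot, and then passes the returned ref to a command such as `agent-browser click @e7`.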
Why this is actually a big deal for AI agents
1. Way less flakiness
Traditional automation breaks all the time because selectors depend on DOM structure or class names.
Refs don't care about layout shifts or renamed CSS classes. They point to the exact element from the snapshot the agent just saw.
That alone eliminates a huge number of "it worked yesterday" failures.
2. Much cleaner "page understanding" for the model
Instead of dumping a massive DOM or a raw screenshot into the model context, you give it a compact, structured snapshot:
- headings
- inputs
- buttons
- links
- roles
- labels
- refs
That's a way more usable mental model for an LLM.
The agent just picks refs and issues actions. No token explosion or weird parsing hacks.
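The size difference is easy to see with a toy model. The Python sketch below is not how agent-browser works (the article says it snapshots the browser's accessibility tree); it only parses a small, made-up HTML fragment with the standard library and emits lines in the same assumed `- role "name" [ref=eN]` shape, so the two representations can be compared side by side. The class name `ToySnapshot`, the tag-to-role table, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser


class ToySnapshot(HTMLParser):
    """Toy stand-in for a snapshot: keeps only a few element types."""

    # Hypothetical tag-to-role table, far smaller than real accessibility roles.
    ROLES = {"h1": "heading", "h2": "heading", "button": "button",
             "a": "link", "input": "textbox"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._open = None  # (tag, role, text parts) while inside an element

    def handle_starttag(self, tag, attrs):
        role = self.ROLES.get(tag)
        if role is None:
            return
        if tag == "input":
            # Inputs have no inner text; use the placeholder as the name.
            self._emit(role, dict(attrs).get("placeholder") or "")
        else:
            self._open = (tag, role, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and self._open[0] == tag:
            _, role, parts = self._open
            self._emit(role, "".join(parts).strip())
            self._open = None

    def _emit(self, role, name):
        ref = f"e{len(self.lines) + 1}"
        self.lines.append(f'- {role} "{name}" [ref={ref}]')


# Made-up page fragment with typical wrapper and class-name noise.
PAGE_HTML = """
<div class="wrapper css-1x2y3z"><div class="col-md-6 card shadow-sm">
  <h1 class="title-lg text-center">Create account</h1>
  <form class="form-stack" novalidate>
    <input type="email" class="form-control input-lg" placeholder="Email">
    <button type="submit" class="btn btn-primary btn-block">Sign up</button>
  </form>
  <a href="/login" class="link-muted small">Log in</a>
</div></div>
"""

parser = ToySnapshot()
parser.feed(PAGE_HTML)
parser.close()
snapshot_lines = parser.lines
snapshot_text = "\n".join(snapshot_lines)

print(snapshot_text)
print(len(PAGE_HTML), "characters of HTML vs", len(snapshot_text), "characters of snapshot")
```

Even on this tiny fragment the snapshot form is shorter and carries only what the agent needs to act: a role, a label, and a ref.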
3. It's built for fast agent loops
agent-browser runs as a CLI + background daemon.
The first command starts a browser. Every command after that reuses it.
So your agent can do:
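A loop like this could be driven from a script. The Python sketch below is one way to do it under stated assumptions: `snapshot -i` and `click @e7` appear in the article, while `open` and `fill` are taken to be agent-browser subcommands with the argument order shown, which should be checked against the tool's own help output. The function names, the URL, the email address, and the hard-coded refs are hypothetical. Each step is a separate CLI invocation, which relies on the daemon described above to keep the same browser open between calls.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(args: Sequence[str]) -> str:
    """Run one agent-browser command and return its stdout."""
    result = subprocess.run(
        ["agent-browser", *args],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    return result.stdout


def sign_up_flow(url: str, email: str, run: Runner = run_cli) -> List[str]:
    """Open a page, snapshot it, then fill and click by ref."""
    outputs = []
    outputs.append(run(["open", url]))        # assumed subcommand
    outputs.append(run(["snapshot", "-i"]))   # from the article
    # A real agent would choose refs from the snapshot output.
    # @e5 and @e7 are hard-coded here purely for illustration.
    outputs.append(run(["fill", "@e5", email]))  # assumed subcommand
    outputs.append(run(["click", "@e7"]))        # from the article
    return outputs


# Dry run: record the commands instead of launching a browser.
recorded: List[List[str]] = []


def fake_run(args: Sequence[str]) -> str:
    recorded.append(list(args))
    return ""


sign_up_flow("https://example.com/signup", "test@example.com", run=fake_run)
for command in recorded:
    print("agent-browser", *command)
```

Passing the runner in as a parameter keeps the sequence testable without a browser; dropping the `run=fake_run` argument would send the same four commands to the real CLI.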