Agent Search Engine

Issue 001 / A living technical almanac


Record / midscene · Agent · Open source · Verified

midscene

AI-powered, vision-driven UI automation for every platform.

About midscene

Open-source, vision-driven UI testing — write tests in natural language, automate any platform.

Most UI automation, including AI tools that read the DOM or the accessibility tree, depends on page structure. That structure is fragile and incomplete: selectors break on every refactor, elements without semantic markup (icon-only buttons, custom controls) are invisible to it, native apps and cross-origin iframes are out of reach, and it cannot tell whether something actually looks right. Midscene works from the screenshot alone, and you describe each step in natural language.
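To make the screenshot-only idea concrete, here is a minimal sketch of a vision-driven step. Every name in it (VisionAgent, Device, LocateFn, and the stubbed model) is hypothetical and chosen for illustration; none of it is taken from Midscene's actual API, which the project's own documentation describes. The point is only the shape of the loop: take a screenshot, ask a model where the described element is, then act on the returned coordinates, with no selector or DOM lookup anywhere.

```typescript
// Hypothetical sketch: these types and names are illustrative, NOT Midscene's API.

/** A screenshot is the only page input the agent sees (no DOM, no selectors). */
type Screenshot = { width: number; height: number; pngBase64: string };

/** A screen coordinate the model says to act on. */
type Point = { x: number; y: number };

/** Stand-in for a vision-language model call (assumed signature). */
type LocateFn = (shot: Screenshot, description: string) => Promise<Point | null>;

/** Stand-in for a platform driver: browser, mobile device, desktop, and so on. */
interface Device {
  screenshot(): Promise<Screenshot>;
  tap(p: Point): Promise<void>;
}

class VisionAgent {
  private device: Device;
  private locate: LocateFn;

  constructor(device: Device, locate: LocateFn) {
    this.device = device;
    this.locate = locate;
  }

  /** Tap whatever matches the natural-language description in the current screenshot. */
  async tap(description: string): Promise<Point> {
    const shot = await this.device.screenshot();
    const point = await this.locate(shot, description);
    if (point === null) {
      throw new Error(`Could not find on screen: ${description}`);
    }
    await this.device.tap(point);
    return point;
  }
}

// Stubs so the sketch runs without a real model or device.
const taps: Point[] = [];
const fakeDevice: Device = {
  screenshot: async () => ({ width: 800, height: 600, pngBase64: "" }),
  tap: async (p) => {
    taps.push(p);
  },
};
// A fake "model" that always points at the centre of the screenshot.
const fakeLocate: LocateFn = async (shot) => ({
  x: shot.width / 2,
  y: shot.height / 2,
});

const agent = new VisionAgent(fakeDevice, fakeLocate);

// The step is described in plain language; no selector is involved.
agent
  .tap("the magnifier icon in the header")
  .then((p) => console.log("tapped at", p.x, p.y));
```

Because the agent depends only on a screenshot and a tap primitive, the same step description would in principle work against any platform that can supply those two operations, which is the property the paragraph above attributes to the vision-driven approach.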

Midscene is built for UI testing first, but the same vision-driven engine handles any UI automation task.

From the project's README

midscene is an open-source project written primarily in TypeScript, with 14k stars on GitHub. It was last updated in July 2026.


midscene vs. the alternatives

All browser & computer use
Agent                     Type       Stars  Pricing
midscene (this listing)   Agent      14k    Open source
UI-TARS-desktop           Agent      38k    Open source
skyvern                   Agent      22k    Open source
page-agent                Agent      22k    Open source
nanobrowser               Agent      13k    Open source
Agent-S                   Framework  12k    Open source