Where multiple locations are listed for this role, the position may be based in any of them, with priority following the order in which they are listed. We're looking for an engineer to work on the control layer: the system that translates an AI model's intent into precise, reliable actions on a real computer. This means mouse movements, keyboard input, window management, UI element detection, and error recovery across macOS, Windows, and Linux.
What you'll do
Work on the low-level computer control stack: mouse/keyboard injection, screen capture, coordinate mapping, and input simulation.
Implement UI element detection using accessibility APIs (AXUIElement, UI Automation), DOM/a11y trees, and visual grounding.
Help build the abstraction layer that lets our agent operate across OS platforms and application types.
Tackle reliability problems: element targeting under UI changes, window occlusion, resolution scaling, and cross-app focus management.
Contribute to feedback loops: How does the agent know its action worked? How does it recover when something unexpected happens?
Work closely with the model and planning team on the interface between intent and execution.
You might be a fit if
You've built OS-level input automation (CGEvent, SendInput, xdotool, or similar).
You understand accessibility frameworks: AXUIElement on macOS, UI Automation on Windows, AT-SPI on Linux.
You've dealt with flaky element selectors, timing issues, and resolution-dependent coordinates.
You think carefully about reliability and edge cases.
You've worked with tools like Playwright, Appium, PyAutoGUI, Hammerspoon, or similar.
Bonus
Experience with screen reader internals, remote desktop protocols (RDP/VNC), game automation, LLM agent tool-use systems, or mobile device automation (iOS UIAutomation / XCTest, Android UIAutomator / Accessibility).
Salary
$100,000 - $205,000
Location
Singapore, Palo Alto
Total raised
$26.5M
Last stage
Series A
Jiachen Yang
Co-Founder & CTO