Terminal Bench measures real-world engineering capability, not cherry-picked demos.
We build agents that ship
Focused on reliability, not demos. Open research, pragmatic engineering, and a ruthless bar for real-world performance.
Open by default
We publish research and interfaces so teams can build on top with confidence.
Terminal-native
Lives where engineers work. Understands large repos and executes end-to-end tasks.
Measured results
Benchmarked on real engineering suites—not cherry-picked demos.
Our mission
The AI agent built for teams that ship to production.
OB-1 combines deep codebase understanding with testing-first development and formal verification. It is terminal-native, multi-platform, and ranked #1 on Terminal Bench, built for real engineering challenges rather than cherry-picked demos. From fintech to healthcare, teams choose OB-1 when correctness matters.