Gambit is an open-source agent harness designed to simplify the development of reliable AI agents by providing an operating system-like environment. It allows defining agents in markdown or TypeScript, managing tool calling, planning, and context, and includes automatic evaluation 'graders' and test agents for synthetic data generation. The core idea is to shift from a compute-LLM-compute pipeline to an LLM-centric orchestration.
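To make the LLM-centric idea concrete, here is a minimal sketch of a harness-style loop in TypeScript. It is not Gambit's API: the names `Step`, `Model`, `Tool`, `runHarness`, and `stubModel` are hypothetical, and the model is a deterministic stub so the example runs without any network calls. The point is only that the model decides the next step on every turn, while compute (the tool) runs when the model asks for it.

```typescript
// Hypothetical types for illustration only; none of these come from Gambit.
type Step =
  | { kind: "tool"; name: string; input: string } // model requests compute
  | { kind: "final"; text: string };              // model ends the run

type Model = (transcript: string[]) => Step;
type Tool = (input: string) => string;

// Harness loop: the model drives control flow; tools run only on request.
function runHarness(
  question: string,
  model: Model,
  tools: Record<string, Tool>,
  maxTurns = 8,
): string {
  const transcript: string[] = [`user: ${question}`];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = model(transcript); // LLM decides what happens next
    if (step.kind === "final") return step.text;
    const tool = tools[step.name];
    const result = tool ? tool(step.input) : `unknown tool: ${step.name}`;
    transcript.push(`tool ${step.name}: ${result}`); // feed compute back in
  }
  throw new Error("turn limit reached");
}

// Deterministic stand-in for a real model: asks for one tool call, then answers.
const stubModel: Model = (transcript) => {
  const last = transcript[transcript.length - 1];
  const prefix = "tool add: ";
  if (last.startsWith(prefix)) {
    return { kind: "final", text: `The sum is ${last.slice(prefix.length)}` };
  }
  return { kind: "tool", name: "add", input: "2,3" };
};

const tools: Record<string, Tool> = {
  // Adds a comma-separated list of numbers.
  add: (input) =>
    String(input.split(",").map(Number).reduce((a, b) => a + b, 0)),
};

const answer = runHarness("What is 2 + 3?", stubModel, tools);
console.log(answer);
```

In a pipeline-style framework the developer's code would instead fix the order of steps and call the model at predetermined points; here the only developer-owned logic is the loop itself and the turn limit.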
Reach out to developers and AI engineers building complex agents, highlighting how Gambit's evaluation and testing features could integrate or enhance 'agent-eval-lab' for robust agent development.
Hey HN! Wanted to show our open source agent harness called Gambit.

If you’re not familiar, agent harnesses are sort of like an operating system for an agent... they handle tool calling, planning, and context window management, and don’t require as much developer orchestration.

Normally you might see an agent orchestration framework pipeline like:

compute -> compute -> compute -> LLM -> compute -> compute -> LLM

We invert this, so with an agent harness it’s more like:

LLM -> LLM -> LLM -> compute -> LLM -> LLM -> compute -> LLM

Essentially you describe each agent in either a self-contained markdown file or as a TypeScript program. Your root agent can bring in other agents as needed, and we create a type-safe way for you to define the interfaces between those agents. We call these decks.

Agents can call agents, and each agent can be designed with whatever model params make sense for your task.

Additionally, each step of the chain gets automatic evals, which we call graders. A grader is another deck type… but it’s designed to evaluate and score conversations (or individual conversation turns).

We also have test agents you can define on a deck-by-deck basi