AgentAssay

LIVE

Token-Efficient Agent Testing

AgentAssay is a regression testing framework for AI agents that cuts evaluation costs by 78-100% while maintaining statistical rigor. Using stochastic 3-valued verdicts (pass/fail/uncertain) and adaptive budgeting, it tests agent behavior changes efficiently across any framework.

$pip install agentassay
Read the Paper →GitHub →
78-100%
Cost Reduction
10
Framework Adapters
86%
Detection Rate
7,605
Trials Validated

Features

78-100% Cost Reduction

Adaptive budget allocation means you only spend tokens where they matter. SPRT-based early stopping.

10 Framework Adapters

LangGraph, CrewAI, AutoGen, OpenAI Agents, Semantic Kernel, DSPy, LlamaIndex, Haystack, Marvin, custom.

Stochastic 3-Valued Verdicts

Pass, fail, or uncertain — with confidence intervals. No more binary testing of probabilistic systems.

Behavioral Fingerprinting

86% detection rate for agent behavior changes. Track regression across model updates.

5D Coverage Model

Task, capability, context, interaction, and edge-case dimensions for comprehensive testing.

Mutation Testing

Automatically generate agent variants to verify your test suite catches real failures.

Use Cases

Regression testing after model updates
Cost-efficient CI/CD for agent pipelines
Multi-framework agent evaluation
Statistical testing of probabilistic agents

Research Paper

AgentAssay: Token-Efficient Agent Evaluation via Stochastic Verdicts

Varun Pratap Bhardwaj, 2026

Read on arXiv
Licensed under Apache-2.0