AgentAssayLIVE

Token-Efficient Agent Testing

Name: AgentAssay
Author: Varun Pratap Bhardwaj

AgentAssay is a regression testing framework for AI agents that cuts evaluation costs by 78-100% while maintaining statistical rigor. Using stochastic 3-valued verdicts (pass/fail/uncertain) and adaptive budgeting, it tests agent behavior changes efficiently across any framework.

$pip install agentassay

Read the Paper →GitHub →

78-100%

Cost Reduction

Framework Adapters

86%

Detection Rate

7,605

Trials Validated

Features

78-100% Cost Reduction

Adaptive budget allocation means you only spend tokens where they matter. SPRT-based early stopping.

10 Framework Adapters

LangGraph, CrewAI, AutoGen, OpenAI Agents, Semantic Kernel, DSPy, LlamaIndex, Haystack, Marvin, custom.

Stochastic 3-Valued Verdicts

Pass, fail, or uncertain — with confidence intervals. No more binary testing of probabilistic systems.

Behavioral Fingerprinting

86% detection rate for agent behavior changes. Track regression across model updates.

5D Coverage Model

Task, capability, context, interaction, and edge-case dimensions for comprehensive testing.

Mutation Testing

Automatically generate agent variants to verify your test suite catches real failures.

Use Cases

Regression testing after model updates

Cost-efficient CI/CD for agent pipelines

Multi-framework agent evaluation

Statistical testing of probabilistic agents

Research Paper

AgentAssay: Token-Efficient Agent Evaluation via Stochastic Verdicts

Varun Pratap Bhardwaj, 2026

Read on arXiv→

Licensed under AGPL-3.0