AgentForge

Testing Overview

Test agents deterministically with MockLLM, TestHarness, and StepDebugger.

Testing tools

Agents are non-deterministic, so AgentForge ships utilities that pin down behavior for tests:

  • MockLLM — scripted LLM responses for reproducible tests
  • TestHarness — run agents and inspect results with rich assertions
  • StepDebugger — step through agent execution one step at a time
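
To make the "scripted responses" idea concrete, here is a minimal sketch of how a scripted LLM stand-in can work: it returns queued responses in order and fails loudly once the script runs out. ScriptedLLM, ScriptedResponse, and next() are hypothetical names used for illustration only; they are not AgentForge's MockLLM implementation, whose internals this page does not describe.

```typescript
// Hypothetical stand-in types for illustration; not the AgentForge API.
type ScriptedResponse =
  | { text: string }
  | { toolCalls: { name: string; input: Record<string, unknown> }[] };

class ScriptedLLM {
  private readonly responses: ScriptedResponse[];
  private cursor = 0;

  constructor(responses: ScriptedResponse[]) {
    this.responses = responses;
  }

  // Return the next scripted response, or throw if the script is exhausted.
  next(): ScriptedResponse {
    const response = this.responses[this.cursor];
    if (response === undefined) {
      throw new Error(`Script exhausted after ${this.cursor} responses`);
    }
    this.cursor += 1;
    return response;
  }
}

// The same two-step script as in a typical tool-calling test:
// first a tool call, then a final text answer.
const llm = new ScriptedLLM([
  { toolCalls: [{ name: 'lookup', input: { id: '123' } }] },
  { text: 'Found the item.' },
]);
const first = llm.next();
const second = llm.next();
```

Because every run replays the same sequence, a test built on such a script produces the same tool calls and final text each time, which is what makes the assertions reproducible.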

Quick Start

import { createMockLLM, createTestHarness } from '@ahzan-agentforge/core';
import { describe, it, expect } from 'vitest';

describe('my agent', () => {
  it('should call the lookup tool', async () => {
    const mockLLM = createMockLLM({
      responses: [
        { toolCalls: [{ name: 'lookup', input: { id: '123' } }] },
        { text: 'Found the item.' },
      ],
    });

    const harness = createTestHarness({
      agent: myAgentConfig, // your agent configuration, defined elsewhere
      llm: mockLLM,
    });

    const result = await harness.run({ task: 'Find item 123' });

    expect(result.status).toBe('completed');
    expect(result.toolCalls('lookup')).toHaveLength(1);
    expect(result.hasError()).toBe(false);
  });
});

Key Points

  • MockLLM responses use the text field (not content) for text output
  • On a TestHarness result, toolCalls() and hasError() are methods, not properties, so call them with parentheses
  • toolCalls(toolName?) filters by tool name; with no argument it returns all calls
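
The method semantics above can be sketched with a small stand-in result object. SketchResult and RecordedToolCall are hypothetical names, and deriving hasError() from a per-call error field is an assumption made for this example; the real TestHarness result may track errors differently.

```typescript
// Hypothetical stand-in for a harness result; not the AgentForge implementation.
interface RecordedToolCall {
  name: string;
  input: Record<string, unknown>;
  error?: string; // assumed field, used only to drive hasError() in this sketch
}

class SketchResult {
  private readonly calls: RecordedToolCall[];

  constructor(calls: RecordedToolCall[]) {
    this.calls = calls;
  }

  // With a tool name, return only matching calls; with no argument, return all.
  toolCalls(toolName?: string): RecordedToolCall[] {
    return toolName === undefined
      ? [...this.calls]
      : this.calls.filter((call) => call.name === toolName);
  }

  // A method, so it must be invoked: hasError(), not hasError.
  hasError(): boolean {
    return this.calls.some((call) => call.error !== undefined);
  }
}

const sketch = new SketchResult([
  { name: 'lookup', input: { id: '123' } },
  { name: 'search', input: { query: 'item' } },
  { name: 'lookup', input: { id: '456' } },
]);
const allCalls = sketch.toolCalls();
const lookupCalls = sketch.toolCalls('lookup');
const failed = sketch.hasError();
```

Forgetting the parentheses is an easy mistake to miss: an assertion such as expect(result.hasError).toBe(false) compares the function itself with false and fails, while a bare if (result.hasError) check is always truthy.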

Next Steps