AgentForge

Replay

Diff and replay agent runs — compare executions and debug regressions.

buildReplayDiff

Compare the traces of two runs step by step to find where they differ:

import { buildReplayDiff, formatReplayDiff } from '@ahzan-agentforge/core';

// originalTrace and replayTrace are the RunTrace records of the two runs being compared
const diff = buildReplayDiff(originalTrace, replayTrace);
console.log(formatReplayDiff(diff));

ReplayDiff

interface ReplayDiff {
  original: RunTrace;
  replay: RunTrace;
  steps: StepComparison[];
  summary: {
    matched: number;
    diverged: number;
    added: number;
    removed: number;
  };
}
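This page does not say exactly how `summary` is derived from `steps`. A natural reading is that each counter counts the steps carrying the matching `status`. The sketch below assumes exactly that. `summarizeSteps` is a hypothetical helper, not an export of `@ahzan-agentforge/core`, and `StepRecord` is left opaque because its fields are not documented here.

```typescript
// Local stand-ins for the documented types. StepRecord's real shape is not
// shown on this page, so it is kept opaque.
type StepRecord = unknown;

type StepStatus = 'matched' | 'diverged' | 'added' | 'removed';

interface StepComparison {
  index: number;
  status: StepStatus;
  original?: StepRecord;
  replay?: StepRecord;
  differences?: string[];
}

interface ReplaySummary {
  matched: number;
  diverged: number;
  added: number;
  removed: number;
}

// Hypothetical helper: tally step statuses into the same four counters
// that ReplayDiff.summary exposes (assumes one count per step).
function summarizeSteps(steps: StepComparison[]): ReplaySummary {
  const summary: ReplaySummary = { matched: 0, diverged: 0, added: 0, removed: 0 };
  for (const step of steps) {
    summary[step.status] += 1;
  }
  return summary;
}

const example = summarizeSteps([
  { index: 0, status: 'matched' },
  { index: 1, status: 'diverged', differences: ['tool call differs'] },
  { index: 2, status: 'added' },
]);
// example → { matched: 1, diverged: 1, added: 1, removed: 0 }
```

If this reading holds, the four counters always add up to `steps.length`, which is a cheap sanity check when inspecting a diff.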

StepComparison

interface StepComparison {
  index: number;
  status: 'matched' | 'diverged' | 'added' | 'removed';
  original?: StepRecord;
  replay?: StepRecord;
  differences?: string[];
}
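To make the four statuses concrete, here is a minimal sketch that pairs steps purely by position: a step present in both runs is `matched` or `diverged`, a step only in the original is `removed`, and a step only in the replay is `added`. This is an assumption for illustration, not the library's actual alignment algorithm. `compareByIndex` and its `describeDifferences` callback are hypothetical, and the step type is generic because `StepRecord`'s fields are not documented here.

```typescript
type StepStatus = 'matched' | 'diverged' | 'added' | 'removed';

// Mirrors the documented StepComparison, generic over the step type.
interface Comparison<T> {
  index: number;
  status: StepStatus;
  original?: T;
  replay?: T;
  differences?: string[];
}

// Hypothetical positional comparison. `describeDifferences` returns an empty
// array when two steps are considered equal, or readable notes when they differ.
function compareByIndex<T>(
  original: T[],
  replay: T[],
  describeDifferences: (a: T, b: T) => string[],
): Comparison<T>[] {
  const length = Math.max(original.length, replay.length);
  const result: Comparison<T>[] = [];
  for (let index = 0; index < length; index++) {
    if (index >= replay.length) {
      // Step exists only in the original run.
      result.push({ index, status: 'removed', original: original[index] });
    } else if (index >= original.length) {
      // Step exists only in the replay.
      result.push({ index, status: 'added', replay: replay[index] });
    } else {
      const a = original[index];
      const b = replay[index];
      const differences = describeDifferences(a, b);
      if (differences.length === 0) {
        result.push({ index, status: 'matched', original: a, replay: b });
      } else {
        result.push({ index, status: 'diverged', original: a, replay: b, differences });
      }
    }
  }
  return result;
}

// Steps modelled as plain strings for the example.
const steps = compareByIndex(
  ['plan', 'search', 'answer'],
  ['plan', 'lookup'],
  (a, b) => (a === b ? [] : [`"${a}" became "${b}"`]),
);
// statuses in order: matched, diverged, removed
```

Positional pairing is the simplest possible strategy. A single inserted step shifts every later step and marks them all as diverged, so the real implementation may align steps differently. Treat this only as a model of the output shape.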

Usage

Replay is useful for:

  • Regression testing — ensure agent behavior is consistent across code changes
  • Debugging — compare a failing run against a working one
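  • Cost analysis — see how prompt changes affect token usage

import { buildReplayDiff } from '@ahzan-agentforge/core';

// Assumes `agent` is an already-configured agent and `buildTrace` turns a
// run result into a RunTrace; neither is defined on this page.

// Run the original
const result1 = await agent.run({ task: 'Process order' });

// Change something (prompt, tools, model, etc.)

// Run again with the same task
const result2 = await agent.run({ task: 'Process order' });

// Compare the two traces
const diff = buildReplayDiff(
  buildTrace(result1),
  buildTrace(result2),
);

console.log(`Matched: ${diff.summary.matched}, Diverged: ${diff.summary.diverged}`);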
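For regression testing, a summary line is often not enough; you usually want to know which steps changed and why. The sketch below walks `diff.steps` and produces one readable line per non-matching step, using only the `index`, `status`, and `differences` fields documented above. `regressionReport` and `ReplayDiffLike` are hypothetical names introduced here, not part of `@ahzan-agentforge/core`.

```typescript
// Local stand-ins mirroring the documented shapes; only the fields used here.
interface StepComparison {
  index: number;
  status: 'matched' | 'diverged' | 'added' | 'removed';
  differences?: string[];
}

interface ReplayDiffLike {
  steps: StepComparison[];
}

// Hypothetical helper: describe every step that did not match.
// An empty array means the replay reproduced the original run step for step.
function regressionReport(diff: ReplayDiffLike): string[] {
  return diff.steps
    .filter((step) => step.status !== 'matched')
    .map((step) => {
      const details = step.differences?.length ? `: ${step.differences.join('; ')}` : '';
      return `step ${step.index} ${step.status}${details}`;
    });
}

// Illustrative input only; a real diff would come from buildReplayDiff.
const sample: ReplayDiffLike = {
  steps: [
    { index: 0, status: 'matched' },
    { index: 1, status: 'diverged', differences: ['different tool arguments'] },
  ],
};
console.log(regressionReport(sample));
// → ['step 1 diverged: different tool arguments']
```

Assuming the `ReplayDiff` interface above, a real diff satisfies `ReplayDiffLike` structurally, so it can be passed in directly. Inside a test runner, `if (report.length > 0) throw new Error(report.join('\n'))` turns the comparison into a failing check.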
