SDK — Evals

The rivano.evals resource gives you programmatic access to quality evaluation data — scores from automated evaluators, performance baselines per agent, and regression alerts when quality drops.

Quality scores

rivano.evals.scores() returns the quality scores recorded across traces. Each score is produced by an evaluator that runs after the response is generated:

import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: 'rv_...' });

const scores = await rivano.evals.scores();
for (const score of scores) {
  console.log(`Trace ${score.traceId} — ${score.scorerName}: ${score.value}`);
}

Score fields

| Field | Type | Description |
| --- | --- | --- |
| traceId | string | The trace this score belongs to |
| scorerName | string | Evaluator that produced the score (e.g. faithfulness, relevance, toxicity) |
| value | number | Score value, typically 0–1 |
| label | string \| null | Optional human-readable label |
| createdAt | string | When the score was recorded |
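
Because every score carries a scorerName and a value, a common next step is to roll scores up per evaluator. The sketch below shows one way to do that. The averageByScorer helper and the sample data are hypothetical and not part of the SDK, and the Score interface is inferred from the field table above rather than taken from the SDK's own type definitions.

```typescript
// Shape inferred from the "Score fields" table; the SDK's real type may differ.
interface Score {
  traceId: string;
  scorerName: string;
  value: number;
  label: string | null;
  createdAt: string;
}

// Hypothetical helper (not part of the SDK): mean score value per evaluator.
function averageByScorer(scores: Score[]): Record<string, number> {
  const sums: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const { scorerName, value } of scores) {
    sums[scorerName] = (sums[scorerName] ?? 0) + value;
    counts[scorerName] = (counts[scorerName] ?? 0) + 1;
  }
  const averages: Record<string, number> = {};
  for (const name of Object.keys(sums)) {
    averages[name] = sums[name] / counts[name];
  }
  return averages;
}

// Made-up sample data; in practice, pass the array returned by rivano.evals.scores().
const sampleScores: Score[] = [
  { traceId: 'trace-1', scorerName: 'faithfulness', value: 0.5, label: null, createdAt: '2024-01-01T00:00:00Z' },
  { traceId: 'trace-2', scorerName: 'faithfulness', value: 1, label: null, createdAt: '2024-01-01T00:05:00Z' },
  { traceId: 'trace-2', scorerName: 'relevance', value: 0.25, label: 'low', createdAt: '2024-01-01T00:05:00Z' },
];

console.log(averageByScorer(sampleScores));
```

Keeping the aggregation in a pure function means it can be unit-tested without an API key or network access.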

Baselines

A baseline captures the expected quality score for a specific agent and scorer. Regressions are detected by comparing new scores against the baseline.

List baselines

import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: 'rv_...' });

const baselines = await rivano.evals.baselines();
for (const baseline of baselines) {
  console.log(`${baseline.agentName} / ${baseline.scorerName}: ${baseline.value} (set ${baseline.createdAt})`);
}

Create a baseline

import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: 'rv_...' });

// Capture the current average score as the baseline for this agent + scorer
const baseline = await rivano.evals.createBaseline({
  agentName: 'contract-summarizer',
  scorerName: 'faithfulness',
});

console.log('Baseline set:', baseline.value, 'at', baseline.createdAt);

Baseline create parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| agentName | string | Yes | Agent to baseline |
| scorerName | string | Yes | Evaluator to baseline |

Tip: Set a baseline immediately after a successful deployment. When you deploy the next version, run evals.regressions() to check whether quality held.
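
The workflow in the tip can be sketched as a post-deployment check. This is only one possible policy, not something the SDK prescribes: the EvalsLike and RegressionLike shapes are inferred from this page, and regressionsForAgent and checkThenRebaseline are hypothetical helper names. The sketch also assumes that createBaseline() and regressions() return promises, as the await calls elsewhere on this page suggest.

```typescript
// Minimal shapes inferred from this page; the SDK's real types may differ.
interface RegressionLike {
  agentName: string;
  scorerName: string;
  dropPercent: number;
}

interface EvalsLike {
  createBaseline(params: { agentName: string; scorerName: string }): Promise<unknown>;
  regressions(): Promise<RegressionLike[]>;
}

// Hypothetical helper: keep only the regressions that belong to one agent.
function regressionsForAgent(all: RegressionLike[], agentName: string): RegressionLike[] {
  return all.filter((r) => r.agentName === agentName);
}

// Hypothetical post-deployment check: fail if the agent regressed,
// otherwise record fresh baselines for the given scorers.
async function checkThenRebaseline(
  evals: EvalsLike,
  agentName: string,
  scorerNames: string[],
): Promise<void> {
  const found = regressionsForAgent(await evals.regressions(), agentName);
  if (found.length > 0) {
    const summary = found.map((r) => `${r.scorerName} (-${r.dropPercent}%)`).join(', ');
    throw new Error(`Quality regressed for ${agentName}: ${summary}`);
  }
  for (const scorerName of scorerNames) {
    await evals.createBaseline({ agentName, scorerName });
  }
}
```

In a deploy script this might be called as checkThenRebaseline(rivano.evals, 'contract-summarizer', ['faithfulness']), assuming rivano.evals satisfies EvalsLike. Throwing on a regression lets a CI job fail before the baseline is overwritten.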

Regressions

rivano.evals.regressions() returns the agent/scorer pairs whose current scores are significantly below their baseline:

import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: 'rv_...' });

const regressions = await rivano.evals.regressions();

if (regressions.length === 0) {
  console.log('No regressions detected');
} else {
  for (const reg of regressions) {
    console.log(`REGRESSION: ${reg.agentName} / ${reg.scorerName}`);
    console.log(`  Baseline: ${reg.baselineValue}, Current: ${reg.currentValue}`);
    console.log(`  Drop: ${reg.dropPercent}%`);
  }
}

Regression fields

| Field | Type | Description |
| --- | --- | --- |
| agentName | string | Agent where regression was detected |
| scorerName | string | Evaluator that regressed |
| baselineValue | number | Reference score |
| currentValue | number | Current rolling average |
| dropPercent | number | Percentage decrease from baseline |
| detectedAt | string | When the regression was first detected |
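
To make the relationship between baselineValue, currentValue, and dropPercent concrete, the sketch below assumes the usual definition of a percentage decrease, (baselineValue - currentValue) / baselineValue × 100. The page does not spell out the exact formula or any rounding, so treat percentDrop as an assumption; worstFirst is likewise a hypothetical helper for triaging results, and the Regression interface is inferred from the table above.

```typescript
// Shape inferred from the "Regression fields" table; the SDK's real type may differ.
interface Regression {
  agentName: string;
  scorerName: string;
  baselineValue: number;
  currentValue: number;
  dropPercent: number;
  detectedAt: string;
}

// Assumed formula for "percentage decrease from baseline".
// Returns 0 when the baseline is 0 to avoid dividing by zero.
function percentDrop(baselineValue: number, currentValue: number): number {
  if (baselineValue === 0) return 0;
  return ((baselineValue - currentValue) / baselineValue) * 100;
}

// Hypothetical helper: regressions at or above a minimum drop, largest drop first.
// filter() returns a new array, so the caller's array is left untouched by sort().
function worstFirst(regressions: Regression[], minDropPercent = 0): Regression[] {
  return regressions
    .filter((r) => r.dropPercent >= minDropPercent)
    .sort((a, b) => b.dropPercent - a.dropPercent);
}

console.log(percentDrop(0.75, 0.5625)); // 25
```

For example, worstFirst(regressions, 10) would surface only drops of 10% or more, which may be useful when a single call returns many agent/scorer pairs.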

Error handling

| Error | When it occurs |
| --- | --- |
| SdkAuthError | The API key is invalid |
| SdkForbiddenError | Your plan does not include evals (evals are not available on all plans) |
| SdkError | The agent or scorer name does not exist |
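
One way to act on these errors is to translate them into readable messages at the call site. The sketch below is an illustration under a stated assumption: it matches on the error's name property and assumes that name equals the class name listed in the table, which this page does not confirm. If the SDK exports the error classes, instanceof checks against them would likely be the more idiomatic choice. describeEvalsError and withEvalsErrors are hypothetical helpers, not SDK functions.

```typescript
// Hypothetical helper: map an evals failure to an operator-facing message.
// Assumes each SDK error's `name` property matches the class name in the table above.
function describeEvalsError(err: unknown): string {
  const name = err instanceof Error ? err.name : '';
  switch (name) {
    case 'SdkAuthError':
      return 'Invalid API key: check the apiKey passed to the Rivano client.';
    case 'SdkForbiddenError':
      return 'Evals are not available on the current plan.';
    case 'SdkError':
      return 'Request failed: check that the agent and scorer names exist.';
    default:
      return 'Unexpected error while calling the evals API.';
  }
}

// Hypothetical wrapper: run any evals call and log a readable message before rethrowing.
async function withEvalsErrors<T>(call: () => Promise<T>): Promise<T> {
  try {
    return await call();
  } catch (err) {
    console.error(describeEvalsError(err));
    throw err;
  }
}
```

A call such as withEvalsErrors(() => rivano.evals.regressions()) would then log the mapped message and still propagate the original error to the caller.
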

Related

  • SDK Traces: Traces contain the raw scores used for baseline comparison
  • SDK Agents: Agent history for correlating regressions with deployments
  • SDK Alerts: Configure notifications when regressions are detected