# SDK — Evals
The rivano.evals resource gives you programmatic access to quality evaluation data — scores from automated evaluators, performance baselines per agent, and regression alerts when quality drops.
## Quality scores
Returns quality scores recorded across traces. Each score is produced by an evaluator that runs post-response:
```ts
import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: 'rv_...' });

const scores = await rivano.evals.scores();
for (const score of scores) {
  console.log(`Trace ${score.traceId} — ${score.scorerName}: ${score.value}`);
}
```

### Score fields
| Field | Type | Description |
|---|---|---|
| traceId | string | The trace this score belongs to |
| scorerName | string | Evaluator that produced the score (e.g. faithfulness, relevance, toxicity) |
| value | number | Score value, typically 0–1 |
| label | string \| null | Optional human-readable label |
| createdAt | string | When the score was recorded |
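Since `scores()` returns one row per trace and scorer, a common next step is to roll the results up into a per-scorer average. A minimal sketch, assuming the field shapes above (the `Score` interface here is defined locally for illustration, not exported by the SDK):

```ts
// Local interface mirroring the score fields documented above;
// the SDK's own types may differ.
interface Score {
  traceId: string;
  scorerName: string;
  value: number;
  label: string | null;
  createdAt: string;
}

// Average each scorer's values across all returned scores.
function averageByScorer(scores: Score[]): Map<string, number> {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const s of scores) {
    const t = totals.get(s.scorerName) ?? { sum: 0, count: 0 };
    t.sum += s.value;
    t.count += 1;
    totals.set(s.scorerName, t);
  }
  const averages = new Map<string, number>();
  for (const [name, t] of totals) {
    averages.set(name, t.sum / t.count);
  }
  return averages;
}
```

Feeding the result of `rivano.evals.scores()` through a rollup like this gives you the same per-scorer view that baselines are built on.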
## Baselines
A baseline captures the expected quality score for a specific agent and scorer. Regressions are detected by comparing new scores against the baseline.
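The comparison can be sketched as a simple percentage-drop check. Note that the 10% threshold below is an illustrative assumption, not the service's actual cutoff:

```ts
// Illustrative sketch of baseline comparison. The dropPercent formula
// mirrors the fields returned by evals.regressions(), but the threshold
// used by the service is an assumption here.
function isRegression(baselineValue: number, currentValue: number, thresholdPct = 10): boolean {
  if (baselineValue === 0) return false; // nothing meaningful to compare against
  const dropPercent = ((baselineValue - currentValue) / baselineValue) * 100;
  return dropPercent >= thresholdPct;
}
```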
### List baselines
```ts
import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: 'rv_...' });

const baselines = await rivano.evals.baselines();
for (const baseline of baselines) {
  console.log(`${baseline.agentName} / ${baseline.scorerName}: ${baseline.value} (set ${baseline.createdAt})`);
}
```
### Create a baseline
```ts
import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: 'rv_...' });

// Capture the current average score as the baseline for this agent + scorer
const baseline = await rivano.evals.createBaseline({
  agentName: 'contract-summarizer',
  scorerName: 'faithfulness',
});
console.log('Baseline set:', baseline.value, 'at', baseline.createdAt);
```

### Baseline create parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| agentName | string | Yes | Agent to baseline |
| scorerName | string | Yes | Evaluator to baseline |
> 💡 Set a baseline immediately after a successful deployment. When you deploy the next version, run `evals.regressions()` to check whether quality held.
## Regressions
Returns a list of agent/scorer pairs where current scores are significantly below their baseline:
```ts
import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: 'rv_...' });

const regressions = await rivano.evals.regressions();
if (regressions.length === 0) {
  console.log('No regressions detected');
} else {
  for (const reg of regressions) {
    console.log(`REGRESSION: ${reg.agentName} / ${reg.scorerName}`);
    console.log(`  Baseline: ${reg.baselineValue}, Current: ${reg.currentValue}`);
    console.log(`  Drop: ${reg.dropPercent}%`);
  }
}
```

### Regression fields
| Field | Type | Description |
|---|---|---|
| agentName | string | Agent where the regression was detected |
| scorerName | string | Evaluator that regressed |
| baselineValue | number | Reference score |
| currentValue | number | Current rolling average |
| dropPercent | number | Percentage decrease from baseline |
| detectedAt | string | When the regression was first detected |
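In CI, the regression list is often reduced to a human-readable report plus a pass/fail signal. A minimal sketch that sorts by severity, assuming the field shapes above (the `Regression` interface is defined locally for illustration, not exported by the SDK):

```ts
// Local interface mirroring the regression fields documented above.
interface Regression {
  agentName: string;
  scorerName: string;
  baselineValue: number;
  currentValue: number;
  dropPercent: number;
  detectedAt: string;
}

// Render the worst regressions first; an empty result means the gate passes.
function formatRegressionReport(regressions: Regression[]): string[] {
  return [...regressions]
    .sort((a, b) => b.dropPercent - a.dropPercent)
    .map(
      (r) =>
        `${r.agentName}/${r.scorerName}: ${r.baselineValue} -> ${r.currentValue} (-${r.dropPercent}%)`
    );
}
```

Pass the output of `rivano.evals.regressions()` into a helper like this and fail the pipeline (e.g. `process.exit(1)`) whenever the list is non-empty.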
## Error handling
| Error | When it occurs |
|---|---|
| SdkAuthError | Invalid API key |
| SdkForbiddenError | Your plan does not include evals |
| SdkError | The specified agent or scorer name does not exist |
## Related
- SDK Traces — Traces contain the raw scores used for baseline comparison
- SDK Agents — Agent history for correlating regressions with deployments
- SDK Alerts — Configure notifications when regressions are detected