Cost Tracking
Rivano tracks the cost of every AI request in real time, with granular attribution by agent, model, team, and user. There is no sampling: every token is counted and priced.
How It Works
When a request flows through the Rivano proxy, three things happen:
- Token counting: Rivano counts input and output tokens from the provider’s response (using the provider’s reported usage, not estimates).
- Model pricing: Tokens are multiplied by the model’s current pricing. Rivano maintains an up-to-date pricing table for all supported providers.
- Attribution: Cost is attributed to the agent, and optionally to a team or user via the X-Rivano-User header.
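The pricing step above is simple arithmetic, and can be sketched as follows. This is an illustrative sketch rather than Rivano's implementation: the `PRICES` map, the `Usage` shape, and the `requestCost` function are hypothetical names, and the per-1M-token rates are copied from the Model Pricing table later on this page.

```typescript
// Hypothetical price entry: USD per 1M tokens, as listed in the Model Pricing table.
interface ModelPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Assumed subset of the pricing table, keyed by model identifier.
const PRICES: Record<string, ModelPrice> = {
  "gpt-4o": { inputPerMillion: 2.5, outputPerMillion: 10.0 },
  "gpt-4o-mini": { inputPerMillion: 0.15, outputPerMillion: 0.6 },
};

// Hypothetical shape for the token usage reported by the provider.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Cost in USD = (input tokens * input rate + output tokens * output rate) / 1,000,000.
function requestCost(model: string, usage: Usage): number {
  const price = PRICES[model];
  if (price === undefined) {
    throw new Error(`No pricing configured for model: ${model}`);
  }
  return (
    (usage.inputTokens * price.inputPerMillion +
      usage.outputTokens * price.outputPerMillion) /
    1_000_000
  );
}

// 1,200 input and 350 output tokens on gpt-4o:
// (1200 * 2.50 + 350 * 10.00) / 1,000,000 = 0.0065 USD
console.log(requestCost("gpt-4o", { inputTokens: 1200, outputTokens: 350 }).toFixed(4));
```

Throwing on an unknown model, rather than pricing it at zero, keeps unpriced traffic from silently disappearing from the totals.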
Adding User Attribution
Pass the X-Rivano-User header to attribute costs to specific users or teams:
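import OpenAI from "openai";

// Route requests through the Rivano proxy and tag each one for cost attribution.
const client = new OpenAI({
  baseURL: "https://proxy.rivano.ai/v1",
  defaultHeaders: {
    "X-Rivano-Agent": process.env.RIVANO_AGENT_ID,
    "X-Rivano-Key": process.env.RIVANO_API_KEY,
    "X-Rivano-User": "user:jane@example.com", // user or team to attribute costs to
  },
});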
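When one client serves many users, a fixed `defaultHeaders` value attributes everything to the same person. One possible approach, sketched below, is to build the headers per request. The `rivanoHeaders` helper is a hypothetical name, and the sketch assumes the proxy reads `X-Rivano-User` from each individual request.

```typescript
// Hypothetical helper: builds the Rivano headers for a single request.
// The user value is optional; without it, cost is attributed to the agent only.
function rivanoHeaders(
  agentId: string,
  apiKey: string,
  user?: string,
): Record<string, string> {
  const headers: Record<string, string> = {
    "X-Rivano-Agent": agentId,
    "X-Rivano-Key": apiKey,
  };
  if (user) {
    headers["X-Rivano-User"] = user;
  }
  return headers;
}

// Example: headers for a request made on behalf of one user.
const perRequest = rivanoHeaders("agent_abc123", "rv_live_abc123", "user:jane@example.com");
console.log(perRequest["X-Rivano-User"]);
```

The OpenAI Node SDK accepts per-request options as a second argument, so these headers could be passed as `client.chat.completions.create(params, { headers: perRequest })`.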
Viewing Costs
Dashboard
The Costs page shows a real-time breakdown of your AI spend:
- Total spend: daily, weekly, monthly, or custom range
- By agent: which agents are driving the most cost
- By model: compare spend across GPT-4o, Claude, Gemini, etc.
- By user: who on your team is using the most tokens (requires the X-Rivano-User header)
- Cost trend: line chart showing spend over time with anomaly highlighting
API
curl "https://api.rivano.ai/v1/costs/summary?period=7d" \
  -H "Authorization: Bearer rv_live_abc123"
Response:
{
  "period": "7d",
  "total_cost": 127.43,
  "total_requests": 8432,
  "total_input_tokens": 2145000,
  "total_output_tokens": 1876000,
  "by_agent": [
    { "agent_id": "agent_abc123", "name": "prod-assistant", "cost": 89.21 },
    { "agent_id": "agent_def456", "name": "support-bot", "cost": 38.22 }
  ],
  "by_model": [
    { "model": "gpt-4o", "cost": 102.15, "requests": 5200 },
    { "model": "gpt-4o-mini", "cost": 12.30, "requests": 2800 },
    { "model": "claude-sonnet-4-20250514", "cost": 12.98, "requests": 432 }
  ]
}
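Once the summary is parsed, a few lines are enough to turn it into per-model figures. The sketch below assumes only the fields shown in the sample response. The `CostSummary` type and the `modelBreakdown` function are hypothetical names, not part of a Rivano SDK.

```typescript
// Types covering only the fields of the sample response that this sketch uses.
interface ModelCost {
  model: string;
  cost: number;
  requests: number;
}

interface CostSummary {
  total_cost: number;
  by_model: ModelCost[];
}

interface ModelBreakdown {
  model: string;
  share: number; // fraction of total spend, 0..1
  costPerRequest: number; // average USD per request
}

// Computes each model's share of spend and average cost per request,
// sorted with the most expensive model first.
function modelBreakdown(summary: CostSummary): ModelBreakdown[] {
  return summary.by_model
    .map((m) => ({
      model: m.model,
      share: summary.total_cost > 0 ? m.cost / summary.total_cost : 0,
      costPerRequest: m.requests > 0 ? m.cost / m.requests : 0,
    }))
    .sort((a, b) => b.share - a.share);
}

// Data taken from the sample response above.
const sample: CostSummary = {
  total_cost: 127.43,
  by_model: [
    { model: "gpt-4o", cost: 102.15, requests: 5200 },
    { model: "gpt-4o-mini", cost: 12.3, requests: 2800 },
    { model: "claude-sonnet-4-20250514", cost: 12.98, requests: 432 },
  ],
};

for (const row of modelBreakdown(sample)) {
  console.log(row.model, (row.share * 100).toFixed(1) + "%", row.costPerRequest.toFixed(4));
}
```

In the sample data, gpt-4o accounts for roughly 80% of spend, which is the kind of concentration this view makes easy to spot.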
Budget Alerts
Set budget thresholds to get notified before costs spiral. Alerts can be configured per-agent, per-model, or organization-wide.
Via YAML Policy
name: daily-budget-alert
description: Alert when daily spend exceeds $50
status: active
priority: 20
conditions:
  direction: both
  budget:
    threshold: 50.00
    period: daily
    scope: organization
action: log
notifications:
  - type: email
    recipients:
      - ops@example.com
  - type: slack
    webhook: https://hooks.slack.com/services/T00/B00/xxx
  - type: webhook
    url: https://example.com/api/budget-alert
Budget Actions
| Threshold Type | Action | Description |
|---|---|---|
| Warning (80%) | notify | Send alert but allow requests |
| Limit (100%) | notify | Send alert, continue allowing requests |
| Hard limit | block | Reject new requests until the next period |
To enforce a hard spending cap:
name: hard-budget-cap
description: Block requests when daily spend exceeds $100
status: active
priority: 1
conditions:
  direction: inbound
  budget:
    threshold: 100.00
    period: daily
    scope: per_agent
    hard_limit: true
action: block
message: "Daily budget exceeded. Requests will resume tomorrow."
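The threshold types in the Budget Actions table can be read as a small decision function. The following is a sketch under stated assumptions, not Rivano's evaluation logic: `BudgetPolicy` and `evaluateBudget` are hypothetical names, and treating spend equal to the threshold as "reached" is an assumption the page does not confirm.

```typescript
type BudgetLevel = "ok" | "warning" | "limit" | "hard_limit";
type BudgetAction = "allow" | "notify" | "block";

// Hypothetical mirror of the policy's budget fields.
interface BudgetPolicy {
  threshold: number; // USD per period
  hardLimit: boolean; // corresponds to hard_limit: true in the YAML policy
}

interface BudgetDecision {
  level: BudgetLevel;
  action: BudgetAction;
}

// Maps current spend to a level and action following the Budget Actions table:
// 80% of threshold is a warning, 100% is the limit, and a hard limit blocks.
function evaluateBudget(spend: number, policy: BudgetPolicy): BudgetDecision {
  const used = spend / policy.threshold;
  if (used >= 1) {
    // Assumption: reaching the threshold exactly counts as exceeding it.
    return policy.hardLimit
      ? { level: "hard_limit", action: "block" }
      : { level: "limit", action: "notify" };
  }
  if (used >= 0.8) {
    return { level: "warning", action: "notify" };
  }
  return { level: "ok", action: "allow" };
}

// Example: $45 spent against a $50 soft budget falls in the warning band.
console.log(evaluateBudget(45, { threshold: 50, hardLimit: false }));
```

Only the hard limit changes request handling. The warning and limit levels notify and let traffic through.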
Model Pricing
Rivano’s pricing table is updated automatically when providers change prices. Current rates (as of January 2026):
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude Haiku 3.5 | $0.80 | $4.00 |
| Gemini 2.0 Flash | $0.075 | $0.30 |
For custom or fine-tuned models, set pricing manually in Settings → Model Pricing.
Tips for Reducing Costs
- Use the right model — route simple tasks to cheaper models (GPT-4o-mini, Haiku) and reserve expensive models for complex reasoning.
- Set per-agent budgets — prevent any single agent from consuming a disproportionate share.
- Monitor output token ratios — if output tokens consistently exceed input by 10x+, your prompts may be too open-ended.
- Cache frequent queries — Rivano’s cache hits are free and don’t count toward your budget.
- Review weekly — the Costs dashboard highlights anomalies automatically.
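The output-ratio tip above is easy to automate. The sketch below flags any agent whose output tokens are at least 10x its input tokens. The `TokenTotals` shape and the `flagOpenEnded` function are hypothetical, and the sketch assumes per-agent token counts are available, which the sample API response on this page does not show.

```typescript
// Hypothetical per-agent token totals for a reporting period.
interface TokenTotals {
  name: string;
  inputTokens: number;
  outputTokens: number;
}

// Returns the names of agents whose output/input token ratio meets or exceeds maxRatio.
// Agents with no input tokens are skipped to avoid dividing by zero.
function flagOpenEnded(totals: TokenTotals[], maxRatio = 10): string[] {
  return totals
    .filter((t) => t.inputTokens > 0 && t.outputTokens / t.inputTokens >= maxRatio)
    .map((t) => t.name);
}

// Illustrative data: the second entry reuses the organization-wide totals
// from the sample response (ratio of about 0.87, well under the threshold).
const flagged = flagOpenEnded([
  { name: "report-writer", inputTokens: 100, outputTokens: 1500 },
  { name: "prod-assistant", inputTokens: 2145000, outputTokens: 1876000 },
]);
console.log(flagged);
```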