Observability

Manage model, phone, and throughput costs without losing sight of the business metric.

Track model usage, phone usage, throughput, and evaluations in one place. Set budgets, compare models, and use evals to decide when a smaller or open-source model is good enough.

That is how teams stay model agnostic, keep ROI visible, and save money without guessing which quality tradeoff is acceptable.

Metric                 GPT-5 mini   GPT-5 nano   GPT-OSS 120B Fireworks   GPT-OSS 20B Fireworks
Total Cost             $0.659436    $0.225682    $0.542208                $0.390457
Avg Request Duration   14.020s      11.986s      10.805s                  10.908s
Total Tokens           1,374,832    1,654,896    3,304,304                5,248,592
Total Errors           0            0            0                        0
Total Calls            376          392          536                      616
Percentage Passed      81.12%       73.72%       71.27%                   55.84%
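One way to read a comparison like the table above is cost per passing call: divide each run's total cost by the number of calls that passed evals. This sketch uses the figures from the table; the pass counts are derived as total calls times pass rate.

```python
# Figures copied from the comparison table above.
runs = {
    "GPT-5 mini":             {"cost": 0.659436, "calls": 376, "pass_rate": 0.8112},
    "GPT-5 nano":             {"cost": 0.225682, "calls": 392, "pass_rate": 0.7372},
    "GPT-OSS 120B Fireworks": {"cost": 0.542208, "calls": 536, "pass_rate": 0.7127},
    "GPT-OSS 20B Fireworks":  {"cost": 0.390457, "calls": 616, "pass_rate": 0.5584},
}

for model, r in runs.items():
    # Cost per passed call weighs spend against quality in one number.
    passed = r["calls"] * r["pass_rate"]
    print(f"{model}: ${r['cost'] / passed:.6f} per passed call")
```

A cheaper model with a lower pass rate can still win on this metric, which is exactly the tradeoff the evals are there to surface.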
Pass/fail eval charts: Answer Relevancy and Citation Check, each comparing GPT-5 mini, GPT-5 nano, GPT-OSS 120B Fireworks, and GPT-OSS 20B Fireworks.
Why This Matters

Budgets and quality need to live in the same system

If model spend, phone spend, and throughput live in separate tools, teams optimize the wrong thing. BotDojo keeps cost, quality, and ROI visible together.

Budgets across model, phone, and throughput

Track LLM usage, telephony spend, and workflow volume together so budgets can be set before cost drifts.

Model-agnostic evaluations

Compare frontier, smaller hosted, and open-source models against the same workflow and the same eval suites.

ROI tied to the workflow

Measure quality, cost, call outcomes, and operator time saved against the business result that matters.

Evaluation Loop

Evaluate before rollout, monitor after rollout, and route for cost with evidence

The goal is not just to ship a better prompt. The goal is to know which model should run which task, what it costs across model and phone usage, and whether the change moved the business KPI.

Offline evaluations

Run golden sets, regression checks, and rubric-based tests before you promote a change into production.

Production monitoring

Track production quality, latency, safety, phone usage, and user outcomes continuously so problems show up as signals instead of surprises.

Budget-aware routing

Use the eval results to decide when a smaller or open-source model is good enough, then route work there with evidence.
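The routing decision above can be sketched as picking the cheapest model whose eval pass rate clears the task's quality bar. The model names, pass rates, and prices here are illustrative placeholders, not BotDojo APIs or real pricing.

```python
# Illustrative eval results and prices; in practice these come from
# your eval suites and provider rate cards.
EVAL_PASS_RATE = {"frontier": 0.93, "small-hosted": 0.88, "open-source": 0.81}
COST_PER_1K_TOKENS = {"frontier": 0.015, "small-hosted": 0.004, "open-source": 0.001}

def route(task_threshold: float) -> str:
    """Pick the cheapest model whose eval pass rate clears the task's bar."""
    eligible = [m for m, rate in EVAL_PASS_RATE.items() if rate >= task_threshold]
    if not eligible:
        raise ValueError("no model meets the quality bar; revisit the workflow")
    return min(eligible, key=COST_PER_1K_TOKENS.__getitem__)

print(route(0.85))  # small-hosted: clears the 0.85 bar and costs less than frontier
```

The point of the sketch is that the threshold, not a hunch, decides where the work goes: raise the bar and the router falls back to the costlier model with evidence for why.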

Evaluation run summary: dataset Golden Set; model v3.2 (temp 0.2); scoring Rubric & Metrics; score 92/100, pass rate improving.
Cost and ROI

Save money by using the right model for the right work

BotDojo stays model agnostic, so you can compare frontier models against smaller and open-source models in the same environment. When the evals say the cheaper model is good enough, you can route the work there with confidence.

Production proof

ContactWorks used BotDojo evaluation tools to identify tasks that could move to faster, cheaper models without sacrificing accuracy, cutting model costs by an additional 8x.

Track model, phone, and throughput

See model choice, phone usage, cost, latency, errors, and tool behavior at the workflow level instead of guessing where spend drift came from.

Set budgets and alerts

Budget against model usage, phone usage, and throughput so the team can act before overages become the story.
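A combined budget check across model, phone, and throughput can be as simple as comparing each meter against its limit and alerting well before 100%. This is a minimal sketch with made-up limits; the dataclass is illustrative, not a BotDojo API.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    name: str
    spent: float
    limit: float
    alert_at: float = 0.8  # warn at 80% so the team can act before the overage

    def status(self) -> str:
        ratio = self.spent / self.limit
        if ratio >= 1.0:
            return "over budget"
        if ratio >= self.alert_at:
            return "alert"
        return "ok"

# Illustrative spend levels for the three budget dimensions.
budgets = [
    Budget("model usage ($)", spent=412.0, limit=500.0),
    Budget("phone minutes", spent=9_100, limit=15_000),
    Budget("workflow runs", spent=48_000, limit=50_000),
]
for b in budgets:
    print(b.name, b.status())
```

Tracking the three meters in one place is what lets an alert on any of them fire before the overage becomes the story.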

Save money with eval-guided routing

Once the pass rates are visible, move lower-risk work to cheaper models and keep higher-cost models only where they earn the spend.
