Walking-Skeleton MVP

VertexEval

Agent Evaluation Platform

Third-party trajectory-level evaluation with cross-channel evidence (trace + audit + state snapshot), Pass@k / Pass^k, HORIZON attribution, and LaStraj-federated red-team corpus.

by Nguyen Dinh Quang Khanh
Tests Passing: 63 (pytest · green)
Judge Pool: 3 (majority voting)
Evidence: cross-channel (trace + audit + snapshot)
Service Port: 8010 (FastAPI /docs)

Vertex-Eval

📏 Pass@k / Pass^k: HumanEval estimator + rolling-window Pass^k.
🎯 HORIZON Attribution: task_failure, safety_violation, hallucination, tool_misuse…
🌐 LaStraj Federation: PII-stripping + content-hash dedupe; opt-in tenants.
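
To make the two metrics concrete, here is a minimal sketch rather than the project's actual code. `pass_at_k` follows the unbiased HumanEval estimator, 1 - C(n-c, k) / C(n, k), for n runs of which c passed. `pass_hat_k` uses one common definition of Pass^k, C(c, k) / C(n, k), the chance that all k sampled runs pass. The function names and the reading of "rolling-window" as a trailing window over recent outcomes are assumptions made for illustration.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k sampled runs passes) from n runs, c passing."""
    _check(n, c, k)
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k sampled runs pass) from n runs, c passing."""
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


def rolling_pass_hat_k(outcomes: list[bool], window: int, k: int) -> list[float]:
    """Pass^k over each trailing window of `window` consecutive outcomes.

    Hypothetical interpretation of "rolling-window Pass^k".
    """
    if window < k:
        raise ValueError("window must be at least k")
    return [
        pass_hat_k(window, sum(outcomes[end - window:end]), k)
        for end in range(window, len(outcomes) + 1)
    ]
```

Pass@k rewards an agent that succeeds at least once in k attempts, while Pass^k rewards one that succeeds every time, so the second is the stricter reliability signal.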

Architecture Overview

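End-to-end pipeline from trace ingestion (native or OTel) through the EvalEngine, which fans out to the judge pool, cross-channel checks, HORIZON attribution, Pass@k / Pass^k scoring, and the final report.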

flowchart LR
    T[Trace] --> Ing[Ingest native/OTel]
    Ing --> E[EvalEngine]
    E --> J[Judge pool]
    E --> CC[Cross-channel]
    E --> A[HORIZON Attribution]
    E --> K[Pass@k / Pass^k]
    E --> R[Report]
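
The judge pool in this pipeline is described as three judges combined by majority voting. The sketch below shows one way such a vote could be resolved. The function name, the string labels, and the "inconclusive" fallback when no label wins a strict majority are all assumptions, not the project's API.

```python
from collections import Counter


def majority_verdict(verdicts: list[str]) -> str:
    """Return the label backed by a strict majority of judges.

    Falls back to "inconclusive" (a hypothetical label) when no label
    has more than half of the votes.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    label, votes = Counter(verdicts).most_common(1)[0]
    return label if votes * 2 > len(verdicts) else "inconclusive"
```

With an odd pool size such as three and binary labels, a strict majority always exists; the fallback only matters once judges can return more than two distinct labels.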

Components

Ingest (native + OTel) [ingest]: two payload shapes
RubricRegistry [rubric]: versioned rubrics + checks
EvalEngine [core]: judges + cross-channel + attribution
LaStrajFederation [federated]: PII-stripping anonymizer
SLA Alerts [alert]: Pass^k breach rules
Privacy [guard]: per-tenant isolation
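
As an illustration of the LaStrajFederation idea (PII-stripping, content-hash dedupe, opt-in tenants), the following is a minimal sketch under stated assumptions. The regex patterns, the placeholder tokens, the `Corpus` class, and the choice to hash after anonymizing are all hypothetical; a production anonymizer would need far broader PII coverage.

```python
import hashlib
import json
import re

# Illustrative patterns only; they cover simple emails and phone numbers.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def strip_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


def content_hash(steps: list[str]) -> str:
    """SHA-256 over a canonical JSON encoding of the trajectory steps."""
    canonical = json.dumps(steps, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Corpus:
    """Hypothetical in-memory shared corpus keyed by content hash."""

    def __init__(self) -> None:
        self._by_hash: dict[str, list[str]] = {}

    def submit(self, steps: list[str], opted_in: bool) -> bool:
        """Store an anonymized trajectory; return True only if it was added."""
        if not opted_in:
            return False  # tenants that have not opted in contribute nothing
        clean = [strip_pii(step) for step in steps]
        digest = content_hash(clean)
        if digest in self._by_hash:
            return False  # duplicate after anonymization
        self._by_hash[digest] = clean
        return True

    def __len__(self) -> int:
        return len(self._by_hash)
```

Hashing after anonymization means two trajectories that differ only in the stripped PII collapse into one corpus entry, which is a design choice of this sketch rather than something the source confirms.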

Quickstart

Up and running in under two minutes. Requires Python 3.11+ and Make.

terminal
# clone and install
$ git clone https://github.com/ndqkhanh/vertex-eval
$ cd vertex-eval
$ make install

# run the test suite
$ make test

# start the FastAPI service
$ make run

# smoke-test the API
$ curl http://localhost:8010/healthz

Design Docs

Architecture decisions, trade-offs, system design, and block-level diagrams.