Walking-Skeleton MVP

VertexEval

Agent Evaluation Platform

Third-party trajectory-level evaluation with cross-channel evidence (trace + audit + state snapshot), Pass@k / Pass^k, HORIZON attribution, and LaStraj-federated red-team corpus.


by Nguyen Dinh Quang Khanh

Tests Passing: pytest · green

Judge Pool: majority voting

Evidence: cross-channel (trace + audit + snapshot)

Service Port: 8010 (FastAPI /docs)

Overview

📏

Pass@k / Pass^k

HumanEval estimator + rolling-window Pass^k.
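As a minimal sketch of the two metrics, assume each task is run n times and c of those runs succeed. Pass@k below uses the unbiased HumanEval estimator, 1 - C(n-c, k) / C(n, k). Pass^k uses C(c, k) / C(n, k), the chance that k runs drawn without replacement all succeed. The function names and the `RollingPassHatK` helper are illustrative assumptions, not VertexEval's actual API, and the rolling window here simply keeps the most recent outcomes.

```python
from collections import deque
from math import comb
from typing import Optional


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased HumanEval estimator: P(at least one of k samples passes)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Pass^k: P(all k samples pass), estimated as C(c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)


class RollingPassHatK:
    """Pass^k over the most recent `window` outcomes (hypothetical helper)."""

    def __init__(self, window: int, k: int) -> None:
        if not 1 <= k <= window:
            raise ValueError("need 1 <= k <= window")
        self.k = k
        # deque(maxlen=...) silently drops the oldest outcome when full.
        self.outcomes = deque(maxlen=window)

    def add(self, passed: bool) -> None:
        self.outcomes.append(bool(passed))

    def value(self) -> Optional[float]:
        n = len(self.outcomes)
        if n < self.k:
            return None  # not enough runs yet to draw k samples
        return pass_hat_k(n, sum(self.outcomes), self.k)
```

Pass@k rewards an agent that succeeds at least once in k attempts, while Pass^k only rewards consistent success, so the two diverge sharply for flaky agents.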

🎯

HORIZON Attribution

task_failure, safety_violation, hallucination, tool_misuse…

🌐

LaStraj Federation

PII-stripping + content-hash dedupe; opt-in tenants.
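The three ideas in this card (strip PII, dedupe by content hash, accept only opt-in tenants) can be sketched as follows. Everything here is an assumption for illustration: the `FederatedCorpus` class, the placeholder token, and the email-only regex are not taken from the project, and a real anonymizer would need to cover far more PII types than email addresses.

```python
import hashlib
import re

# Assumption: only email addresses are scrubbed in this sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def strip_pii(text: str) -> str:
    """Replace every email address with a fixed placeholder."""
    return EMAIL_RE.sub("<EMAIL>", text)


def content_hash(text: str) -> str:
    """SHA-256 over whitespace-normalized, lowercased text."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FederatedCorpus:
    """Hypothetical shared corpus that accepts only opt-in tenants."""

    def __init__(self, opted_in):
        self.opted_in = set(opted_in)
        self.entries = {}  # content hash -> anonymized trajectory text

    def contribute(self, tenant_id: str, trajectory: str) -> bool:
        """Return True only if the trajectory was newly stored."""
        if tenant_id not in self.opted_in:
            return False  # tenant has not opted in
        cleaned = strip_pii(trajectory)
        digest = content_hash(cleaned)
        if digest in self.entries:
            return False  # duplicate after anonymization
        self.entries[digest] = cleaned
        return True
```

Hashing after anonymization means two trajectories that differ only in the scrubbed PII collapse into one corpus entry.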

System Design

Architecture Overview

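End-to-end pipeline from trace ingest (native or OTel payloads) into the EvalEngine, which feeds the judge pool, cross-channel evidence checks, HORIZON attribution, Pass@k / Pass^k scoring, and the final report.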

flowchart LR
  T[Trace] --> Ing[Ingest native/OTel]
  Ing --> E[EvalEngine]
  E --> J[Judge pool]
  E --> CC[Cross-channel]
  E --> A[HORIZON Attribution]
  E --> K[Pass@k / Pass^k]
  E --> R[Report]

Implementation Detail

Components


⚙

Ingest (native + OTel)

ingest

two payload shapes

⚙

RubricRegistry

rubric

versioned rubrics + checks
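One way to picture "versioned rubrics + checks" is a registry keyed by (name, version), where each rubric carries a set of named check functions. This is a hedged sketch only: the `Rubric` dataclass, the integer versions, the dict-shaped trajectory, and the method names are all hypothetical and may not match the project's RubricRegistry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Assumption: a check takes a trajectory dict and returns pass/fail.
Check = Callable[[dict], bool]


@dataclass
class Rubric:
    name: str
    version: int
    checks: Dict[str, Check] = field(default_factory=dict)

    def evaluate(self, trajectory: dict) -> Dict[str, bool]:
        """Run every check and report its result by label."""
        return {label: check(trajectory) for label, check in self.checks.items()}


class RubricRegistry:
    """Hypothetical registry; published versions are never overwritten."""

    def __init__(self) -> None:
        self._rubrics: Dict[Tuple[str, int], Rubric] = {}

    def register(self, rubric: Rubric) -> None:
        key = (rubric.name, rubric.version)
        if key in self._rubrics:
            raise ValueError(f"{rubric.name} v{rubric.version} already registered")
        self._rubrics[key] = rubric

    def get(self, name: str, version: Optional[int] = None) -> Rubric:
        """Fetch a pinned version, or the highest version when none is given."""
        if version is None:
            versions = [v for (n, v) in self._rubrics if n == name]
            if not versions:
                raise KeyError(name)
            version = max(versions)
        return self._rubrics[(name, version)]
```

Pinning a version at evaluation time keeps older reports reproducible even after a rubric gains or changes checks.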

⚙

EvalEngine

core

judges + cross-channel + attribution
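The judge pool's majority voting can be sketched as a strict-majority vote over independent judge verdicts. The `Judge` callable type, the verdict labels, and the "inconclusive" fallback are assumptions made for this example; the source only states that the pool uses majority voting.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Assumption: each judge maps a trajectory dict to a verdict label.
Judge = Callable[[dict], str]


def majority_verdict(
    judges: Sequence[Judge],
    trajectory: dict,
    tie_verdict: str = "inconclusive",
) -> Tuple[str, float]:
    """Return (verdict, agreement), where agreement is the top vote share.

    A verdict wins only with a strict majority (more than half the judges);
    otherwise `tie_verdict` is returned.
    """
    if not judges:
        raise ValueError("judge pool is empty")
    votes = Counter(judge(trajectory) for judge in judges)
    top, top_count = votes.most_common(1)[0]
    agreement = top_count / len(judges)
    if top_count * 2 <= len(judges):
        return tie_verdict, agreement
    return top, agreement
```

Returning the agreement share alongside the verdict lets a report flag narrow decisions, for example a 2-of-3 split, for human review.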

⚙

LaStrajFederation

federated

PII-stripping anonymizer

⚙

SLA Alerts

alert

Pass^k breach rules
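A Pass^k breach rule could look like the sketch below: it fires once the metric has stayed under a threshold for a configurable number of consecutive readings, which avoids alerting on a single noisy window. The class name, fields, and debounce behavior are hypothetical choices for illustration, not the project's alerting logic.

```python
from dataclasses import dataclass, field


@dataclass
class PassHatKBreachRule:
    """Hypothetical SLA rule: alert when Pass^k stays below `threshold`."""

    threshold: float          # minimum acceptable Pass^k, between 0 and 1
    min_consecutive: int = 1  # breaching readings in a row before alerting
    _streak: int = field(default=0, init=False, repr=False)

    def observe(self, pass_hat_k: float) -> bool:
        """Record one Pass^k reading; return True while the alert is active."""
        if pass_hat_k < self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any healthy reading resets the streak
        return self._streak >= self.min_consecutive
```

With `min_consecutive=1` the rule alerts on the first breach; higher values trade detection latency for fewer false alarms.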

⚙

Privacy

guard

per-tenant isolation

Get Started

Quickstart

Up and running in under two minutes. Requires Python 3.11+ and Make.

terminal

# clone and install
$ git clone https://github.com/ndqkhanh/vertex-eval
$ cd vertex-eval
$ make install

# run the test suite
$ make test

# start the FastAPI service
$ make run

# smoke-test the API
$ curl http://localhost:8010/healthz