Agent Evaluation Platform
Third-party trajectory-level evaluation with cross-channel evidence (trace + audit + state snapshot), Pass@k / Pass^k, HORIZON attribution, and LaStraj-federated red-team corpus.
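Pass@k and Pass^k answer different questions about the same n repeated trials of a task. Pass@k asks whether at least one of k attempts succeeds; Pass^k asks whether all k succeed, which measures consistency. The sketch below assumes the platform uses the standard combinatorial estimators: the unbiased pass@k estimator from the HumanEval/Codex evaluation, and the pass^k estimator used by τ-bench. The function names `pass_at_k` and `pass_hat_k` are hypothetical and are not taken from the vertex-eval codebase.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n = trials run, c = successful trials, k = attempts being scored
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k trials drawn from n succeeds."""
    _check(n, c, k)
    # 1 - P(all k drawn trials are failures); comb(n - c, k) is 0 when k > n - c
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimated probability that all k trials drawn from n succeed."""
    _check(n, c, k)
    # P(all k drawn trials come from the c successes)
    return comb(c, k) / comb(n, k)
```

With 7 successes in 10 trials, both metrics equal 0.7 at k = 1. At k = 3 they diverge sharply: pass@3 = 1 - 1/120 ≈ 0.992, while pass^3 = 35/120 ≈ 0.292. An agent that succeeds "most of the time" can still be unreliable when it must succeed every time.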
End-to-end pipeline from request intake through policy gating, tool execution, and audit emission.
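The ordering of these stages is what makes the audit channel trustworthy. The policy check runs before any side effect, and every decision, including denials, produces an audit record. The sketch below illustrates that flow under stated assumptions. `Request`, `Pipeline`, the deny-list policy, and the in-memory tool registry are all hypothetical stand-ins, not vertex-eval's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Request:
    # Hypothetical request shape: which agent wants to call which tool, with what args
    agent_id: str
    tool: str
    args: dict[str, Any]


@dataclass
class Pipeline:
    # Hypothetical components: an in-memory tool registry and a simple deny-list policy
    tools: dict[str, Callable[..., Any]]
    denied_tools: set[str] = field(default_factory=set)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def handle(self, req: Request) -> Any:
        # 1. Policy gating: reject before the tool can cause any side effect
        if req.tool in self.denied_tools or req.tool not in self.tools:
            self._audit(req, decision="deny", result=None)
            raise PermissionError(f"tool {req.tool!r} blocked by policy")
        # 2. Tool execution
        result = self.tools[req.tool](**req.args)
        # 3. Audit emission: record the allowed call and its result
        self._audit(req, decision="allow", result=result)
        return result

    def _audit(self, req: Request, decision: str, result: Any) -> None:
        self.audit_log.append({
            "agent": req.agent_id,
            "tool": req.tool,
            "args": req.args,
            "decision": decision,
            "result": result,
        })
```

For example, `Pipeline(tools={"add": lambda a, b: a + b}, denied_tools={"rm"})` returns 5 for an `add` request with `{"a": 2, "b": 3}` and raises `PermissionError` for `rm`. Both outcomes appear in `audit_log`, so an evaluator can cross-check the agent's own trace against this independent record.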
Up and running in under two minutes. Requires Python 3.11+ and Make.
# clone and install
$ git clone https://github.com/ndqkhanh/vertex-eval
$ cd vertex-eval
$ make install

# run the test suite
$ make test

# start the FastAPI service
$ make run

# smoke-test the API
$ curl http://localhost:8010/healthz
Architecture decisions, trade-offs, system design, and block-level diagrams.