Brixbench#
brixbench is a Go test-based benchmark harness for comparing
AIBrix-routed serving paths with direct-engine baselines. It uses
reproducible scenario YAML files, runs benchmark clients inside the target
Kubernetes cluster, and stores shared result artifacts for later analysis.
Use brixbench when you want to compare AIBrix gateway behavior, routing
configuration, release versions, source commits, or direct vLLM baselines
under repeatable benchmark scenarios.
For the older Python benchmark, dataset generator, and workload generator, see Benchmark and Workload Generator.
Requirements#
Before running benchmarks, make sure the local environment has:
- Go 1.22 or newer.
- kubectl on PATH.
- Access to the target Kubernetes cluster through the active kubeconfig context.
- Permission to create, update, list, watch, and delete resources in the benchmark namespace.

Optional:

- uv, .venv, and matplotlib for comparison figure generation.
Quick cluster preflight:
kubectl config current-context
kubectl auth can-i create pods -n brixbench-adhoc
kubectl auth can-i delete namespace brixbench-adhoc
kubectl get ns
Core Concepts#
Each benchmark run is driven by a scenario file under
brixbench/benchmark/testdata/scenarios/. A scenario contains one or more
test cases. Each test case wires together three kinds of inputs:
- engine.manifest: Kubernetes workload manifest for the model serving path.
- benchmark: request workload YAML consumed by the benchmark client.
- gateway: optional AIBrix gateway image, environment, or resource overrides.
The runner deploys the selected serving path, waits for readiness, runs the benchmark client in the cluster, persists artifacts, and then tears down the run-scoped benchmark namespace.
By default, the runner resets the benchmark namespace before each test case
and cleans it up after the case finishes. Set
BENCHMARK_RESET_BEFORE_TEST=false or pass -benchmark.reset=false to
reuse the existing benchmark namespace at the start of a case. Set
BENCHMARK_CLEANUP_AFTER_TEST=false or pass
-benchmark.cleanup=false to leave resources in place after a test case
finishes.
Scenario Files#
Scenario files define the benchmark cases to run and the deployment inputs for each case.
Example:
Scenario: aibrix-batching-comparison
Tests:
  - name: aibrix-v0.6.0-disaggregated
    provider: aibrix
    fullstack: false
    version: v0.6.0
    engine:
      type: vllm
      manifest: testdata/deployments/aibrix/models/pd-model.yaml
    benchmark: testdata/benchmarks/vllm-chat-smoke-pd.yaml
    gateway:
      env:
        ROUTING_ALGORITHM: pd
      resources:
        - testdata/deployments/aibrix/gateway/overrides/gateway-plugins-scaleup.yaml
Important scenario fields:
| Field | Meaning |
|---|---|
| provider: aibrix | Use the AIBrix deployer. |
| provider (direct-baseline value) | Use the plain vLLM direct baseline. |
| fullstack: false | Reuse an existing shared AIBrix control plane and update only run-scoped workloads and gateway overrides. |
| fullstack: true | Reinstall the AIBrix control plane from the checked-out workspace and Helm chart. |
| version, commit, localPath | Select the AIBrix release, source revision, or local source tree used by the test case. |
| engine.type | Serving engine type. The supported benchmark paths currently use vllm. |
| engine.manifest | Kubernetes model deployment manifest. |
| benchmark | Request workload configuration consumed by the benchmark client. |
Benchmark Workload Config#
The benchmark field points to request-side workload YAML. This is where
RPS, concurrency, arrival rate, dataset shape, and routing-specific benchmark
settings live. It is separate from engine.manifest, which only describes
how to deploy the model workload.
Example benchmark config:
kind: vllm-bench
execution: cluster
image: aibrix-public-release-cn-beijing.cr.volces.com/aibrix/vllm-bench:v0.21.0-20260521
namespace: brixbench-adhoc
podName: vllm-bench-client
modelHostPath: /data01/models
rootHostPath: /root
artifacts:
  resultFilename: bench_results.json
  logDir: testdata/logs
vllmArgs:
  base-url: http://aibrix-gateway-plugins.aibrix-system.svc.cluster.local:10080
  endpoint: /v1/chat/completions
  backend: openai-chat
  model: qwen3-8b
  tokenizer: /data01/models/Qwen3-8B
  num-prompts: 200
  request-rate: 16
  concurrency: 32
  metric-percentiles: 50,90,95,99
  percentile-metrics: ttft,tpot,itl,e2el
  goodput: tpot:250
  routing-strategy: pd
  dataset-name: prefix_repetition
  prefix-repetition-num-prefixes: 20
  prefix-repetition-prefix-len: 6000
  prefix-repetition-suffix-len: 2000
  prefix-repetition-output-len: 1024
Key request-side fields:
- base-url: gateway or direct service endpoint used by the benchmark client. The runner can override this with the resolved serving endpoint at runtime.
- endpoint: OpenAI-compatible API path, such as /v1/chat/completions.
- model: served model name sent in benchmark requests.
- num-prompts: total number of benchmark requests.
- request-rate: target request arrival rate used by vllm bench serve.
- concurrency: maximum in-flight request concurrency.
- dataset-name: workload generator, such as random or prefix_repetition.
- goodput: optional latency SLO expression used by vllm bench reporting.
- routing-strategy: optional routing label for routing-specific benchmark cases.
- prefix-repetition-*: prefix-repetition dataset shape for shared-prefix workload tests.
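To get a feel for what the example values imply, the following sketch works through the arithmetic. It assumes that request-rate is expressed in requests per second, that prompts are spread evenly across prefixes, and that each prompt consists of one prefix plus one suffix; the workload type and its method names are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// workload mirrors a few vllmArgs values from the example config.
type workload struct {
	NumPrompts  int     // num-prompts
	RequestRate float64 // request-rate, assumed to be requests per second
	NumPrefixes int     // prefix-repetition-num-prefixes
	PrefixLen   int     // prefix-repetition-prefix-len (tokens)
	SuffixLen   int     // prefix-repetition-suffix-len (tokens)
	OutputLen   int     // prefix-repetition-output-len (tokens)
}

// arrivalWindowSeconds is the approximate time needed to issue every
// request at the target rate (ignoring arrival jitter).
func (w workload) arrivalWindowSeconds() float64 {
	return float64(w.NumPrompts) / w.RequestRate
}

// promptsPerPrefix assumes prompts are divided evenly across prefixes.
func (w workload) promptsPerPrefix() int { return w.NumPrompts / w.NumPrefixes }

// inputTokensPerPrompt assumes each prompt is one prefix plus one suffix.
func (w workload) inputTokensPerPrompt() int { return w.PrefixLen + w.SuffixLen }

// totalInputTokens is the input token volume across all prompts.
func (w workload) totalInputTokens() int { return w.NumPrompts * w.inputTokensPerPrompt() }

// totalOutputTokens is the requested output token volume across all prompts.
func (w workload) totalOutputTokens() int { return w.NumPrompts * w.OutputLen }

func main() {
	w := workload{
		NumPrompts: 200, RequestRate: 16, NumPrefixes: 20,
		PrefixLen: 6000, SuffixLen: 2000, OutputLen: 1024,
	}
	fmt.Printf("arrival window: %.1f s\n", w.arrivalWindowSeconds()) // 12.5 s
	fmt.Printf("prompts per prefix: %d\n", w.promptsPerPrefix())     // 10
	fmt.Printf("input tokens per prompt: %d\n", w.inputTokensPerPrompt()) // 8000
	fmt.Printf("total input tokens: %d\n", w.totalInputTokens())     // 1600000
	fmt.Printf("total requested output tokens: %d\n", w.totalOutputTokens()) // 204800
}
```

Under these assumptions, each of the 20 prefixes is reused by about 10 requests, which is the property that shared-prefix routing tests rely on.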
Existing request workload examples:
- brixbench/benchmark/testdata/benchmarks/vllm-chat-smoke.yaml
- brixbench/benchmark/testdata/benchmarks/vllm-chat-smoke-pd.yaml
- brixbench/benchmark/testdata/benchmarks/routing/*.yaml
Running Benchmarks#
Run from the brixbench/ directory.
Run the scenario selected by BENCHMARK_SCENARIO:
export BENCHMARK_SCENARIO=testdata/scenarios/aibrix-batching-comparison.yaml
go test -v -count=1 -timeout 30m ./benchmark/...
Run one scenario directly:
go test -v ./benchmark -run TestAIBrixBenchmarkSuite \
-scenario testdata/scenarios/aibrix-hello-world.yaml -count=1
Keep resources after the run for inspection:
BENCHMARK_CLEANUP_AFTER_TEST=false \
go test -v ./benchmark -run TestAIBrixBenchmarkSuite \
-scenario testdata/scenarios/aibrix-hello-world.yaml -count=1
Reuse existing benchmark namespace resources and keep them after the run:
BENCHMARK_RESET_BEFORE_TEST=false BENCHMARK_CLEANUP_AFTER_TEST=false \
go test -v ./benchmark -run TestAIBrixBenchmarkSuite \
-scenario testdata/scenarios/aibrix-hello-world.yaml -count=1
Run a single test case:
go test -v -count=1 -timeout 30m \
-run 'TestAIBrixBenchmarkSuite/aibrix-v0.6.0-disaggregated$' \
./benchmark/...
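The trailing $ in the -run pattern matters because go test splits the pattern on / and matches each element as an unanchored regular expression against the corresponding part of the test name. The sketch below approximates that selection rule to show the effect; matchesRun is a hypothetical helper, it ignores details such as slashes inside brackets and subtest name rewriting, and the -tuned case name is invented for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchesRun approximates how "go test -run" selects a subtest: the
// pattern is split on "/" and each element is matched, unanchored,
// against the element of the test name at the same depth.
// A name with fewer elements than the pattern is treated as not selected.
func matchesRun(pattern, name string) (bool, error) {
	pats := strings.Split(pattern, "/")
	parts := strings.Split(name, "/")
	if len(parts) < len(pats) {
		return false, nil
	}
	for i, p := range pats {
		re, err := regexp.Compile(p)
		if err != nil {
			return false, err
		}
		if !re.MatchString(parts[i]) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	pattern := "TestAIBrixBenchmarkSuite/aibrix-v0.6.0-disaggregated$"
	names := []string{
		"TestAIBrixBenchmarkSuite/aibrix-v0.6.0-disaggregated",
		"TestAIBrixBenchmarkSuite/aibrix-v0.6.0-disaggregated-tuned", // hypothetical case name
	}
	for _, n := range names {
		ok, err := matchesRun(pattern, n)
		if err != nil {
			fmt.Println("bad pattern:", err)
			return
		}
		fmt.Printf("%-5t %s\n", ok, n)
	}
}
```

Without the $, the second name would also be selected, since an unanchored expression matches any case name that contains the pattern. The dots in v0.6.0 are regular expression wildcards as well; escape them (for example with regexp.QuoteMeta) if an exact match is required.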
Deployment Inputs#
For provider: aibrix, the runner supports these source modes:
- version: download release artifacts on demand into .tmp/releases/.
- commit: check out the requested AIBrix source revision and use the workspace chart and overlays.
- localPath: stage a local AIBrix source tree and use its workspace chart and overlays.
The benchmark suite should not check in large generated AIBrix core or dependency manifests. Version-based runs should use downloaded release artifacts, and source-based runs should use the checked-out workspace and Helm chart path.
Metrics Export#
Metrics export is disabled by default.
- Without BENCHMARK_PUSHGATEWAY_URL, the runner skips external metric export.
- With BENCHMARK_PUSHGATEWAY_URL, the runner enables the Pushgateway exporter.
- The current Pushgateway exporter returns an explicit not-implemented error until real export behavior is implemented.
Artifacts#
Run artifacts are written under:
brixbench/benchmark/testdata/logs/<timestamp>-UTC-<scenario>/
Typical artifacts include:
- metadata.json
- summary.json
- summary.csv
- per-case bench_results.json
- per-case vllm-bench-client.log
- per-case vllm-bench-pod.yaml
- optional comparison figures under figures/
Optional Figure Generation#
If brixbench/.venv/bin/python and matplotlib are available, the Go
runner generates figures automatically after writing the scenario summary.
Manual setup:
uv venv .venv
source .venv/bin/activate
uv pip install matplotlib
Manual regeneration:
python benchmark/plot_summary_vllm_bench.py \
benchmark/testdata/logs/<timestamp>-UTC-<scenario>/summary.csv
Troubleshooting#
If a run stalls before benchmark execution, check pending model pods:
kubectl get pods -n brixbench-adhoc
kubectl describe pod -n brixbench-adhoc <pending-pod-name>
Common causes include insufficient GPU capacity or a missing shared AIBrix
control plane when using fullstack: false.