Local Mode#
Local mode runs the AIBrix gateway data path on a single machine without Kubernetes. It starts Envoy and the AIBrix gateway plugin as local processes and routes traffic to model servers listed in a static endpoint file.
Use local mode for gateway routing development, quick OpenAI-compatible request testing, and debugging routing algorithms. Use a Kubernetes installation when you need controllers, CRDs, autoscaling, model downloading, runtime sidecars, or the full batch and metadata stack.
Architecture#
Client
|
v
Envoy (:10080)
|
v
Gateway plugin
|
v
Static endpoints file
|
v
vLLM or another OpenAI-compatible backend
Prerequisites#
Install these tools on the local machine:
Go 1.22 or newer
Envoy
one or more OpenAI-compatible model servers, such as vLLM
On macOS, Envoy can be installed with Homebrew:
brew install envoy
On Linux, download an Envoy release binary:
curl -L -o envoy \
https://github.com/envoyproxy/envoy/releases/download/v1.37.1/envoy-1.37.1-linux-x86_64
chmod +x envoy
sudo mv envoy /usr/local/bin/envoy
Build the local gateway plugin binary from the repository root:
make build-gateway-plugins-nozmq
Start a backend model server. This example uses vLLM:
vllm serve Qwen/Qwen2.5-1.5B-Instruct --port 8000
Configure endpoints#
Local mode reads model endpoints from deployment/local/configs/endpoints.yaml. A minimal configuration looks like this:
models:
  - name: Qwen/Qwen2.5-1.5B-Instruct
    endpoints:
      - "127.0.0.1:8000"
The model value in each request must exactly match a configured model name.
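The matching rule can be illustrated with a small Python sketch. It assumes the endpoints file has already been parsed into a dictionary with the shape of the minimal configuration; `find_endpoints` is a hypothetical helper written for this page, not part of AIBrix, and the gateway plugin's own lookup may differ in detail.

```python
from typing import List, Optional

# Parsed form of the minimal endpoints.yaml (assumed shape, for illustration).
CONFIG = {
    "models": [
        {
            "name": "Qwen/Qwen2.5-1.5B-Instruct",
            "endpoints": ["127.0.0.1:8000"],
        }
    ]
}


def find_endpoints(config: dict, model: str) -> Optional[List[str]]:
    """Return the endpoints for `model`, or None when no name matches exactly."""
    for entry in config.get("models", []):
        # Whole-string, case-sensitive comparison: no prefix or fuzzy matching.
        if entry.get("name") == model:
            return list(entry.get("endpoints", []))
    return None
```

Under this assumption, a request that drops the `Qwen/` prefix or changes the letter case would not resolve to any endpoint.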
The same file can also describe prefill and decode role sets for prefill/decode disaggregation testing:
models:
  - name: Qwen/Qwen2.5-72B
    engine: vllm
    rolesets:
      - name: qwen-pd
        prefill:
          - "127.0.0.1:8100"
        decode:
          - "127.0.0.1:8200"
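Before starting local mode, it can help to sanity-check every address in the file. The following Python sketch walks a parsed configuration, covering both plain `endpoints` and roleset `prefill`/`decode` lists, and flags anything that is not in `host:port` form. The dictionary shape mirrors the examples on this page; `collect_addresses` and `is_host_port` are hypothetical helpers, not AIBrix APIs.

```python
from typing import List, Tuple

# Parsed form of the prefill/decode example (assumed shape, for illustration).
PD_CONFIG = {
    "models": [
        {
            "name": "Qwen/Qwen2.5-72B",
            "engine": "vllm",
            "rolesets": [
                {
                    "name": "qwen-pd",
                    "prefill": ["127.0.0.1:8100"],
                    "decode": ["127.0.0.1:8200"],
                }
            ],
        }
    ]
}


def is_host_port(address: str) -> bool:
    """True when `address` looks like host:port with a port in 1..65535."""
    host, sep, port = address.rpartition(":")
    if not host or sep != ":":
        return False
    if not (port.isascii() and port.isdigit()):
        return False
    return 0 < int(port) <= 65535


def collect_addresses(config: dict) -> List[Tuple[str, str, str]]:
    """Return (model name, role, address) for every configured address."""
    found = []
    for model in config.get("models", []):
        name = model.get("name", "")
        for address in model.get("endpoints", []):
            found.append((name, "endpoint", address))
        for roleset in model.get("rolesets", []):
            for role in ("prefill", "decode"):
                for address in roleset.get(role, []):
                    found.append((name, role, address))
    return found


def invalid_addresses(config: dict) -> List[Tuple[str, str, str]]:
    """Return only the entries whose address is not a valid host:port."""
    return [item for item in collect_addresses(config) if not is_host_port(item[2])]
```

An empty result from `invalid_addresses` only means the addresses are well formed; it does not show that a server is listening on them.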
Run local mode#
On Linux, the helper script starts Envoy and the gateway plugin:
cd deployment/local
./run-local.sh
Send a request through Envoy:
curl http://localhost:10080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-1.5B-Instruct",
"messages": [{"role": "user", "content": "Hello from AIBrix local mode"}]
}'
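The same request can be sent from a script. This is a minimal Python sketch using only the standard library; it assumes Envoy is listening on localhost:10080 as in the curl example, and the names `build_chat_request` and `send` are illustrative helpers, not part of AIBrix.

```python
import json
import urllib.request

# Envoy listener used by the curl example on this page.
ENVOY_URL = "http://localhost:10080/v1/chat/completions"


def build_chat_request(model: str, content: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request without sending it."""
    payload = {
        "model": model,  # must match a model name in endpoints.yaml
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        ENVOY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request through Envoy and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With local mode running, `send(build_chat_request("Qwen/Qwen2.5-1.5B-Instruct", "Hello from AIBrix local mode"))` should return the backend's JSON response. Building and sending are kept separate so the payload can be inspected without a running gateway.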
Stop the local processes:
./stop-local.sh
On macOS, or when you want direct process control, start the processes in separate terminals from the repository root:
# Terminal 1: gateway plugin in standalone mode
bin/gateway-plugins \
  --standalone \
  --endpoints-config=deployment/local/configs/endpoints.yaml
# Terminal 2: Envoy with the local mode config
envoy \
  -c deployment/local/configs/envoy.yaml \
  --use-dynamic-base-id \
  --log-level warn
To use a custom endpoints file or Envoy config, pass the paths to the helper script:
cd deployment/local
./run-local.sh \
-e /path/to/endpoints.yaml \
-c /path/to/envoy.yaml
Routing algorithms#
Set ROUTING_ALGORITHM before starting local mode:
ROUTING_ALGORITHM=least-request ./run-local.sh
Local mode supports the same routing algorithms as the gateway plugin, such as random routing, least-request routing, prefix-cache-aware routing, and prefill/decode routing. For production gateway configuration and algorithm behavior, see Gateway Routing.
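To give a feel for what a least-request policy does, here is a generic Python sketch of the idea: pick the endpoint with the fewest in-flight requests and break ties at random. This is a textbook illustration under assumed inputs, not the gateway plugin's implementation, which may use different metrics and tie-breaking; `pick_least_request` is a hypothetical name.

```python
import random
from typing import Dict, Optional


def pick_least_request(
    in_flight: Dict[str, int], rng: Optional[random.Random] = None
) -> str:
    """Return the address with the fewest in-flight requests.

    `in_flight` maps endpoint address -> current number of open requests.
    Ties are broken at random so equally loaded endpoints share traffic.
    """
    if not in_flight:
        raise ValueError("no endpoints to choose from")
    rng = rng or random.Random()
    lowest = min(in_flight.values())
    candidates = sorted(addr for addr, count in in_flight.items() if count == lowest)
    return rng.choice(candidates)
```

For example, with `{"127.0.0.1:8000": 3, "127.0.0.1:8001": 1}` the sketch selects `127.0.0.1:8001`, the endpoint with fewer open requests.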
Ports and logs#
| Component | Default | Purpose |
|---|---|---|
| Envoy HTTP listener | 10080 | OpenAI-compatible request entry point. |
| Envoy admin | | Envoy admin and diagnostics. |
| Gateway plugin metrics | | Gateway plugin metrics endpoint. |
| Health check | | Local gateway health check. |
The helper script writes logs under deployment/local/logs:
gateway-plugin.log
envoy.log
Troubleshooting#
no healthy upstream: confirm the backend model server is running and the host and port in endpoints.yaml are correct.
model not found: confirm the request model value exactly matches the configured model name.
Envoy cannot bind a port: stop the previous local mode process or change the listener ports in the Envoy config.
External processing errors: check gateway-plugin.log first, then verify Envoy is using deployment/local/configs/envoy.yaml.
Prefix-cache or prefill/decode routing does not behave as expected: confirm the selected ROUTING_ALGORITHM and the endpoint or roleset shape in endpoints.yaml.
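For the "no healthy upstream" case, a quick TCP probe can confirm that something is listening on each configured address. The Python sketch below uses only the standard library; `is_reachable` is a hypothetical helper, and a successful connect only shows that the port is open, not that the model server behind it is healthy.

```python
import socket


def is_reachable(address: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to `address` (host:port) succeeds."""
    host, _, port = address.rpartition(":")
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: refused, timed out, or unresolvable host.
        # ValueError: the port part is missing or not a number.
        return False
```

Running `is_reachable("127.0.0.1:8000")` against the vLLM example on this page is a fast first check before digging into the gateway or Envoy logs.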