Local Mode

Local mode runs the AIBrix gateway data path on a single machine without Kubernetes. It starts Envoy and the AIBrix gateway plugin as local processes and routes traffic to model servers listed in a static endpoint file.

Use local mode for gateway routing development, quick OpenAI-compatible request testing, and debugging routing algorithms. Use a Kubernetes installation when you need controllers, CRDs, autoscaling, model downloading, runtime sidecars, or the full batch and metadata stack.

Architecture

Client
   |
   v
Envoy (:10080)
   |
   v
Gateway plugin
   |
   v
Static endpoints file
   |
   v
vLLM or another OpenAI-compatible backend

Prerequisites

Install these tools on the local machine:

  • Go 1.22 or newer

  • Envoy

  • one or more OpenAI-compatible model servers, such as vLLM

On macOS, Envoy can be installed with Homebrew:

brew install envoy

On Linux, download an Envoy release binary. This example fetches the x86_64 build:

curl -L -o envoy \
  https://github.com/envoyproxy/envoy/releases/download/v1.37.1/envoy-1.37.1-linux-x86_64
chmod +x envoy
sudo mv envoy /usr/local/bin/envoy

Build the local gateway plugin binary from the repository root:

make build-gateway-plugins-nozmq
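A quick preflight check can confirm that the tools above are on PATH and that the build produced the binary used later on this page. The sketch below is a hypothetical helper, not part of AIBrix; it assumes a POSIX shell and that the plugin binary is written to bin/gateway-plugins, the path the direct-start commands run.

```shell
# Hypothetical preflight helper (not shipped with AIBrix): report every command
# that is missing from PATH. Returns non-zero if at least one tool is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage from the repository root (assumes the build wrote bin/gateway-plugins):
#   require_tools go envoy && go version && envoy --version
#   [ -x bin/gateway-plugins ] || echo "gateway plugin binary not built" >&2
```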

Start a backend model server. This example uses vLLM:

vllm serve Qwen/Qwen2.5-1.5B-Instruct --port 8000
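vLLM can take a while to load a model, so it helps to wait until the backend answers before starting the gateway. This sketch polls a URL with curl; wait_for_url is a hypothetical helper name, and the usage line relies on vLLM's GET /health endpoint on the port passed to vllm serve.

```shell
# Hypothetical helper: poll URL up to ATTEMPTS times, one second apart, until it
# returns a 2xx status. Returns 0 on success, 1 if every attempt failed.
wait_for_url() {
  url=$1
  attempts=${2:-120}
  tried=0
  while [ "$tried" -lt "$attempts" ]; do
    # -f makes HTTP errors a non-zero exit; -s keeps the loop quiet.
    if curl -sf --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    sleep 1
    tried=$((tried + 1))
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Usage: wait for the vLLM server started above to finish loading the model.
#   wait_for_url http://127.0.0.1:8000/health 600
```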

Configure endpoints

By default, local mode reads model endpoints from deployment/local/configs/endpoints.yaml. A minimal configuration looks like this:

models:
  - name: Qwen/Qwen2.5-1.5B-Instruct
    endpoints:
      - "127.0.0.1:8000"

The model field in each request must exactly match a configured model name.
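Because a mismatched name surfaces as a model not found error, you can check a name against the file before sending traffic. This is a minimal, line-based sketch rather than a YAML parser: model_configured is a hypothetical helper, and it assumes the two-space "  - name:" layout shown above, so more deeply indented roleset names are ignored.

```shell
# Hypothetical line-based check (not a YAML parser): succeed if FILE lists a
# top-level model entry whose name equals MODEL exactly (case-sensitive).
# Assumes the two-space "  - name: ..." layout used in the examples above.
model_configured() {
  awk -v want="$1" '
    /^  - name:/ {
      value = $0
      sub(/^  - name:[ \t]*/, "", value)   # drop the key
      sub(/[ \t]+$/, "", value)            # drop trailing whitespace
      gsub(/"/, "", value)                 # drop optional double quotes
      if (value == want) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$2"
}

# Usage:
#   model_configured Qwen/Qwen2.5-1.5B-Instruct deployment/local/configs/endpoints.yaml \
#     || echo "model name not found in endpoints.yaml" >&2
```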

The same file can also describe prefill and decode role sets for prefill/decode disaggregation testing:

models:
  - name: Qwen/Qwen2.5-72B
    engine: vllm
    rolesets:
      - name: qwen-pd
        prefill:
          - "127.0.0.1:8100"
        decode:
          - "127.0.0.1:8200"

Run local mode

On Linux, the helper script starts Envoy and the gateway plugin:

cd deployment/local
./run-local.sh

Send a request through Envoy:

curl http://localhost:10080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "messages": [{"role": "user", "content": "Hello from AIBrix local mode"}]
  }'
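Quoting JSON inline, as in the request above, becomes error-prone once a prompt contains quotes. The hypothetical chat_payload helper below builds the same request body so it can be piped to curl; it escapes only backslashes and double quotes, so it assumes a single-line prompt.

```shell
# Hypothetical helper: print a single-turn chat completion body for MODEL and PROMPT.
# Escapes backslashes and double quotes only; assumes PROMPT is a single line.
chat_payload() {
  model=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' "$model" "$prompt"
}

# Usage: curl reads the body from stdin with --data @-.
#   chat_payload Qwen/Qwen2.5-1.5B-Instruct 'Say "hi" to local mode' |
#     curl http://localhost:10080/v1/chat/completions \
#       -H "Content-Type: application/json" --data @-
```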

Stop the local processes:

./stop-local.sh

On macOS, or when you want direct process control, start each process in its own terminal from the repository root:

# Terminal 1: the gateway plugin in standalone mode
bin/gateway-plugins \
  --standalone \
  --endpoints-config=deployment/local/configs/endpoints.yaml

# Terminal 2: Envoy with the local config
envoy \
  -c deployment/local/configs/envoy.yaml \
  --use-dynamic-base-id \
  --log-level warn

Use a custom endpoint or Envoy config with the helper script:

cd deployment/local
./run-local.sh \
  -e /path/to/endpoints.yaml \
  -c /path/to/envoy.yaml

Routing algorithms

Set ROUTING_ALGORITHM before starting local mode:

ROUTING_ALGORITHM=least-request ./run-local.sh

Local mode supports the same routing algorithms as the gateway plugin, including random, least-request, prefix-cache-aware, and prefill/decode routing. For production gateway configuration and algorithm behavior, see Gateway Routing.
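When comparing routing algorithms, a rough latency measurement per algorithm can help. This sketch assumes the default Envoy listener and uses curl's -w '%{time_total}' write-out to record each request's total time; time_requests and mean are hypothetical helper names, and the numbers reflect end-to-end latency, not the routing decision alone.

```shell
# Hypothetical helper: send N identical requests through Envoy (default listener)
# and print each request's total time in seconds, one per line.
time_requests() {
  n=$1
  body=$2
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -w '%{time_total}\n' \
      -H "Content-Type: application/json" -d "$body" \
      http://localhost:10080/v1/chat/completions
    i=$((i + 1))
  done
}

# Hypothetical helper: read one number per line and print the mean with three
# decimals; prints nothing for empty input.
mean() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Usage: start local mode with one algorithm, measure, then repeat with another.
#   ROUTING_ALGORITHM=least-request ./run-local.sh
#   time_requests 20 '{"model": "Qwen/Qwen2.5-1.5B-Instruct", "messages": [{"role": "user", "content": "Hi"}]}' | mean
```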

Ports and logs

Component                 Default                    Purpose
Envoy HTTP listener       localhost:10080            OpenAI-compatible request entry point.
Envoy admin               localhost:9901             Envoy admin interface and diagnostics.
Gateway plugin metrics    localhost:8080/metrics     Gateway plugin metrics endpoint.
Health check              localhost:10080/healthz    Local gateway health check.
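To see which of these listeners is up, you can probe each one and print its HTTP status. The sketch below is a hypothetical helper that prints 000 when nothing answers; the usage loop assumes Envoy's admin /ready endpoint alongside the defaults listed in the table.

```shell
# Hypothetical helper: print the HTTP status code for URL, or 000 if nothing answered.
http_status() {
  code=$(curl -s -o /dev/null --max-time 2 -w '%{http_code}' "$1") || true
  echo "${code:-000}"
}

# Usage: check each default listener.
#   for url in http://localhost:10080/healthz http://localhost:9901/ready \
#              http://localhost:8080/metrics; do
#     echo "$url -> $(http_status "$url")"
#   done
```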

The helper script writes logs under deployment/local/logs:

  • gateway-plugin.log

  • envoy.log

Troubleshooting

  • no healthy upstream: confirm the backend model server is running and the host and port in endpoints.yaml are correct.

  • model not found: confirm the request model value exactly matches the configured model name.

  • Envoy cannot bind a port: stop the previous local mode processes (run ./stop-local.sh if you started them with the helper script) or change the listener ports in the Envoy config.

  • External processing errors: check gateway-plugin.log first, then verify Envoy is using deployment/local/configs/envoy.yaml.

  • Prefix-cache or prefill/decode routing does not behave as expected: confirm the selected ROUTING_ALGORITHM and the endpoint or roleset shape in endpoints.yaml.
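For no healthy upstream errors in particular, it can help to probe every backend listed in the endpoints file directly, bypassing Envoy. This sketch assumes endpoints are written as quoted host:port strings, as in the examples on this page, and that backends are vLLM servers exposing GET /health; list_endpoints is a hypothetical helper that only extracts strings and does not parse YAML. The usage also shows an lsof query for finding the process that holds a listener port.

```shell
# Hypothetical helper: print every quoted "host:port" entry in an endpoints file,
# including prefill and decode roleset members. Line-based, not a YAML parser.
list_endpoints() {
  grep -oE '"[^"]+:[0-9]+"' "$1" | tr -d '"'
}

# Usage: probe each backend directly, then check who holds the Envoy listener port.
#   for ep in $(list_endpoints deployment/local/configs/endpoints.yaml); do
#     if curl -sf --max-time 2 -o /dev/null "http://$ep/health"; then
#       echo "$ep ok"
#     else
#       echo "$ep unreachable"
#     fi
#   done
#   lsof -nP -iTCP:10080 -sTCP:LISTEN
```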