Quickstart

Install AIBrix

Get your Kubernetes cluster ready, then run the following commands to install the AIBrix components in your cluster.

Note

If you want to install only specific components or a specific version, see the installation guide for more installation options.

kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.2.1/aibrix-dependency-v0.2.1.yaml
kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.2.1/aibrix-core-v0.2.1.yaml

Wait a few minutes, then run kubectl get pods -n aibrix-system to check the pod status until all pods are ready.

NAME                                         READY   STATUS    RESTARTS   AGE
aibrix-controller-manager-56576666d6-gsl8s   1/1     Running   0          5h24m
aibrix-gateway-plugins-c6cb7545-r4xwj        1/1     Running   0          5h24m
aibrix-gpu-optimizer-89b9d9895-t8wnq         1/1     Running   0          5h24m
aibrix-kuberay-operator-6dcf94b49f-l4522     1/1     Running   0          5h24m
aibrix-metadata-service-6b4d44d5bd-h5g2r     1/1     Running   0          5h24m
aibrix-redis-master-84769768cb-fsq45         1/1     Running   0          5h24m
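
Instead of polling by hand, the readiness check can be scripted. The following is a minimal sketch, assuming every pod in the aibrix-system namespace is long-running, as in the listing above. The helper name wait_for_aibrix and the 300s default timeout are illustrative assumptions, not part of AIBrix.

```shell
# Hypothetical helper (not part of AIBrix): block until every pod in the
# aibrix-system namespace reports the Ready condition, or the timeout expires.
wait_for_aibrix() {
  timeout="${1:-300s}"   # assumed default; pass another duration to override
  kubectl wait --for=condition=Ready pods --all \
    -n aibrix-system --timeout="$timeout"
}
```

Calling wait_for_aibrix 600s exits with a non-zero status if any pod is still not ready after ten minutes, which makes it usable as a gate in an install script.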

Deploy base model

Save the following YAML as model.yaml and run kubectl apply -f model.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    model.aibrix.ai/name: deepseek-r1-distill-llama-8b # Note: The value of the `model.aibrix.ai/name` label must match the Service name.
    model.aibrix.ai/port: "8000"
  name: deepseek-r1-distill-llama-8b
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      model.aibrix.ai/name: deepseek-r1-distill-llama-8b
  template:
    metadata:
      labels:
        model.aibrix.ai/name: deepseek-r1-distill-llama-8b
    spec:
      containers:
        - command:
            - python3
            - -m
            - vllm.entrypoints.openai.api_server
            - --host
            - "0.0.0.0"
            - --port
            - "8000"
            - --uvicorn-log-level
            - warning
            - --model
            - deepseek-ai/DeepSeek-R1-Distill-Llama-8B
            - --served-model-name
            # Note: The `--served-model-name` argument value must also match the Service name and the Deployment label `model.aibrix.ai/name`
            - deepseek-r1-distill-llama-8b
            - --max-model-len
            - "12288" # 12k length. This avoids the error "The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache".
          image: vllm/vllm-openai:v0.7.1
          imagePullPolicy: IfNotPresent
          name: vllm-openai
          ports:
            - containerPort: 8000
              protocol: TCP
          resources:
            limits:
              nvidia.com/gpu: "1"
            requests:
              nvidia.com/gpu: "1"
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 120
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 5
            httpGet:
              path: /health
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 120
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1

---

apiVersion: v1
kind: Service
metadata:
  labels:
    model.aibrix.ai/name: deepseek-r1-distill-llama-8b
    prometheus-discovery: "true"
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
  name: deepseek-r1-distill-llama-8b # Note: The Service name must match the label value `model.aibrix.ai/name` in the Deployment
  namespace: default
spec:
  ports:
    - name: serve
      port: 8000
      protocol: TCP
      targetPort: 8000
    - name: http
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    model.aibrix.ai/name: deepseek-r1-distill-llama-8b
  type: ClusterIP
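
After the manifest above is applied, the model pod still has to pull the image and load the weights, and both probes wait 120 seconds before their first check. A minimal sketch of waiting for the rollout, assuming the Deployment name and the default namespace from the manifest; the helper name wait_for_model and the 15m default timeout are illustrative assumptions.

```shell
# Hypothetical helper (not part of AIBrix): wait until the model Deployment
# has rolled out, i.e. its pod passes the readiness probe.
wait_for_model() {
  name="${1:-deepseek-r1-distill-llama-8b}"   # Deployment name from model.yaml
  kubectl rollout status "deployment/${name}" \
    -n default --timeout="${2:-15m}"
}
```

Run wait_for_model with no arguments for the manifest above, or pass a different Deployment name and timeout as the first and second argument.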

Ensure that:

  1. The Service name matches the model.aibrix.ai/name label value in the Deployment.

  2. The --served-model-name argument value in the Deployment command is also consistent with the Service name and the model.aibrix.ai/name label.
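
The two checks above can also be run against the live cluster. The following is a sketch under stated assumptions: the Deployment is named after the model (as in the manifest above), the model server is the first container, and no word in its command contains whitespace. The helper name check_model_names is hypothetical.

```shell
# Hypothetical helper (not part of AIBrix): verify on the live cluster that the
# Service name, the Deployment label, and --served-model-name all agree.
check_model_names() {
  name="$1"
  ns="${2:-default}"

  # Value of the model.aibrix.ai/name label on the Deployment
  # (dots inside the label key are escaped for kubectl's JSONPath).
  label=$(kubectl get deployment "$name" -n "$ns" \
    -o jsonpath='{.metadata.labels.model\.aibrix\.ai/name}')

  # Name of the Service; stays empty if no Service with that name exists.
  service=$(kubectl get service "$name" -n "$ns" -o jsonpath='{.metadata.name}')

  # Walk the first container's command words and keep the one that follows
  # --served-model-name. Assumes no command word contains whitespace.
  served=""
  prev=""
  for word in $(kubectl get deployment "$name" -n "$ns" \
      -o jsonpath='{.spec.template.spec.containers[0].command[*]}'); do
    if [ "$prev" = "--served-model-name" ]; then
      served="$word"
    fi
    prev="$word"
  done

  echo "label=${label} service=${service} served-model-name=${served}"
  # Succeed only if all three values are non-empty and identical.
  [ -n "$label" ] && [ "$label" = "$service" ] && [ "$label" = "$served" ]
}
```

For the manifest above, check_model_names deepseek-r1-distill-llama-8b prints the three values and returns a non-zero status if any of them differs.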

Invoke the model endpoint using the gateway API

Depending on where you deployed AIBrix, use one of the following options to query the gateway.

# Option 1: Kubernetes cluster with LoadBalancer support
LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
ENDPOINT="${LB_IP}:80"

# Option 2: Dev environment without LoadBalancer support. Use port forwarding instead
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 &
ENDPOINT="localhost:8888"

Attention

Some cloud providers, such as AWS EKS, expose the endpoint in the hostname field. In that case, use .status.loadBalancer.ingress[0].hostname instead.
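
The ip and hostname cases can be handled in one place. A minimal sketch, assuming the gateway Service name and namespace used in Option 1; the helper name gateway_address is hypothetical.

```shell
# Hypothetical helper (not part of AIBrix): print the external address of the
# gateway Service, preferring the ip field and falling back to hostname.
gateway_address() {
  svc="envoy-aibrix-system-aibrix-eg-903790dc"
  ns="envoy-gateway-system"

  addr=$(kubectl get "svc/${svc}" -n "$ns" \
    -o=jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
  if [ -z "$addr" ]; then
    # Providers such as AWS EKS publish a hostname instead of an IP.
    addr=$(kubectl get "svc/${svc}" -n "$ns" \
      -o=jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null)
  fi

  # Fail if neither field is populated (for example, no LoadBalancer support).
  [ -n "$addr" ] || return 1
  printf '%s\n' "$addr"
}
```

With this helper, Option 1 becomes ENDPOINT="$(gateway_address):80". If the function fails, the Service has no external address yet and the port-forwarding option is the fallback.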

# completion api
curl -v http://${ENDPOINT}/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek-r1-distill-llama-8b",
        "prompt": "San Francisco is a",
        "max_tokens": 128,
        "temperature": 0
    }'

# chat completion api
curl http://${ENDPOINT}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "deepseek-r1-distill-llama-8b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "help me write a random generator in python"}
    ]
}'
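
Before reading full responses, it can help to confirm that the gateway answers with a successful status. A minimal sketch, assuming ENDPOINT is set by one of the options above; the function name smoke_test and the small max_tokens value are illustrative assumptions, while -w '%{http_code}' is curl's standard write-out option.

```shell
# Hypothetical helper (not part of AIBrix): send a tiny completion request and
# succeed only if the gateway answers with HTTP 200.
smoke_test() {
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    "http://${ENDPOINT}/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "deepseek-r1-distill-llama-8b", "prompt": "San Francisco is a", "max_tokens": 8, "temperature": 0}')
  echo "HTTP ${status}"
  [ "$status" = "200" ]
}
```

A non-200 status from smoke_test is a cue to rerun the completion request with curl -v and inspect the response headers and body.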

If you have problems exposing an external IP, debug with the following command. In the sample output, 101.18.0.4 is the external IP of the gateway service.

kubectl get svc -n envoy-gateway-system
NAME                                     TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                   AGE
envoy-aibrix-system-aibrix-eg-903790dc   LoadBalancer   10.96.239.246   101.18.0.4    80:32079/TCP                              10d
envoy-gateway                            ClusterIP      10.96.166.226   <none>        18000/TCP,18001/TCP,18002/TCP,19001/TCP   10d