Quickstart
Install AIBrix
With a Kubernetes cluster ready, run the following commands to install the AIBrix components in your cluster.
Note
If you only want to install specific components or a specific version, see the installation guide for more options.
kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.2.1/aibrix-dependency-v0.2.1.yaml
kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.2.1/aibrix-core-v0.2.1.yaml
Wait a few minutes, then run kubectl get pods -n aibrix-system to check the pod status until all pods are ready.
NAME                                         READY   STATUS    RESTARTS   AGE
aibrix-controller-manager-56576666d6-gsl8s   1/1     Running   0          5h24m
aibrix-gateway-plugins-c6cb7545-r4xwj        1/1     Running   0          5h24m
aibrix-gpu-optimizer-89b9d9895-t8wnq         1/1     Running   0          5h24m
aibrix-kuberay-operator-6dcf94b49f-l4522     1/1     Running   0          5h24m
aibrix-metadata-service-6b4d44d5bd-h5g2r     1/1     Running   0          5h24m
aibrix-redis-master-84769768cb-fsq45         1/1     Running   0          5h24m
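Instead of polling by hand, the wait can be scripted with kubectl wait. This is a minimal sketch, assuming every pod in the aibrix-system namespace is expected to reach the Ready condition. The helper name wait_for_aibrix and the 300s default timeout are illustrative choices, not part of AIBrix.

```shell
# Hypothetical helper: block until all pods in aibrix-system report Ready.
# $1 is an optional timeout (assumed default: 300s).
wait_for_aibrix() {
  kubectl wait --for=condition=Ready pods --all \
    -n aibrix-system --timeout="${1:-300s}"
}
```

Calling wait_for_aibrix 600s would wait up to ten minutes and return a non-zero status if any pod is still not ready by then.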
Deploy base model
Save the following YAML as model.yaml and run kubectl apply -f model.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    model.aibrix.ai/name: deepseek-r1-distill-llama-8b # Note: The value of the `model.aibrix.ai/name` label must match the Service name.
    model.aibrix.ai/port: "8000"
  name: deepseek-r1-distill-llama-8b
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      model.aibrix.ai/name: deepseek-r1-distill-llama-8b
  template:
    metadata:
      labels:
        model.aibrix.ai/name: deepseek-r1-distill-llama-8b
    spec:
      containers:
        - command:
            - python3
            - -m
            - vllm.entrypoints.openai.api_server
            - --host
            - "0.0.0.0"
            - --port
            - "8000"
            - --uvicorn-log-level
            - warning
            - --model
            - deepseek-ai/DeepSeek-R1-Distill-Llama-8B
            - --served-model-name
            # Note: The `--served-model-name` argument value must also match the Service name and the Deployment label `model.aibrix.ai/name`
            - deepseek-r1-distill-llama-8b
            - --max-model-len
            - "12288" # 12k length; this avoids the "The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache" error.
          image: vllm/vllm-openai:v0.7.1
          imagePullPolicy: IfNotPresent
          name: vllm-openai
          ports:
            - containerPort: 8000
              protocol: TCP
          resources:
            limits:
              nvidia.com/gpu: "1"
            requests:
              nvidia.com/gpu: "1"
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 120
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 5
            httpGet:
              path: /health
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 120
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  labels:
    model.aibrix.ai/name: deepseek-r1-distill-llama-8b
    prometheus-discovery: "true"
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
  name: deepseek-r1-distill-llama-8b # Note: The Service name must match the label value `model.aibrix.ai/name` in the Deployment
  namespace: default
spec:
  ports:
    - name: serve
      port: 8000
      protocol: TCP
      targetPort: 8000
    - name: http
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    model.aibrix.ai/name: deepseek-r1-distill-llama-8b
  type: ClusterIP
Ensure that:
The Service name matches the model.aibrix.ai/name label value in the Deployment.
The --served-model-name argument value in the Deployment command is also consistent with the Service name and the model.aibrix.ai/name label.
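These consistency rules can be checked against the live cluster after applying the manifest. The following is a sketch only, assuming the Deployment and Service share the same name and that the vLLM container is the first container in the pod template. check_model_names is a hypothetical helper, not an AIBrix command.

```shell
# Hypothetical helper: verify that the Deployment label, the Service name,
# and the --served-model-name argument all agree.
# $1 = expected model name, $2 = namespace (assumed default: "default").
check_model_names() {
  cm_name="$1"
  cm_ns="${2:-default}"
  # Dots inside the label key are escaped for kubectl's JSONPath parser.
  cm_label=$(kubectl get deployment "${cm_name}" -n "${cm_ns}" \
    -o jsonpath='{.metadata.labels.model\.aibrix\.ai/name}')
  # The container command is printed as space-separated arguments.
  cm_cmd=$(kubectl get deployment "${cm_name}" -n "${cm_ns}" \
    -o jsonpath='{.spec.template.spec.containers[0].command[*]}')
  if ! kubectl get service "${cm_name}" -n "${cm_ns}" >/dev/null 2>&1; then
    echo "MISMATCH: no Service named ${cm_name}"
    return 1
  fi
  if [ "${cm_label}" != "${cm_name}" ]; then
    echo "MISMATCH: label model.aibrix.ai/name is '${cm_label}'"
    return 1
  fi
  case " ${cm_cmd} " in
    *" --served-model-name ${cm_name} "*) ;;
    *) echo "MISMATCH: --served-model-name is not ${cm_name}"; return 1 ;;
  esac
  echo "OK: ${cm_name}"
}
```

Running check_model_names deepseek-r1-distill-llama-8b prints an OK line when all three values agree and a MISMATCH line otherwise.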
Invoke the model endpoint using the gateway API
Depending on where you deployed AIBrix, use one of the following options to query the gateway.
# Option 1: Kubernetes cluster with LoadBalancer support
LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
ENDPOINT="${LB_IP}:80"
# Option 2: Dev environment without LoadBalancer support. Use port forwarding instead.
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 &
ENDPOINT="localhost:8888"
Attention
Some cloud providers, such as AWS EKS, expose the endpoint in the hostname field. In that case, use .status.loadBalancer.ingress[0].hostname instead.
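The two address fields can also be tried in turn so that the same snippet works on either kind of provider. A minimal sketch, assuming the gateway Service name shown above; gateway_endpoint is a hypothetical helper name.

```shell
# Hypothetical helper: print "<address>:80" for the gateway Service,
# preferring the ip field and falling back to hostname.
gateway_endpoint() {
  gw_svc="envoy-aibrix-system-aibrix-eg-903790dc"
  gw_ns="envoy-gateway-system"
  gw_addr=""
  for gw_field in ip hostname; do
    gw_addr=$(kubectl get "svc/${gw_svc}" -n "${gw_ns}" \
      -o=jsonpath="{.status.loadBalancer.ingress[0].${gw_field}}")
    if [ -n "${gw_addr}" ]; then
      break
    fi
  done
  if [ -z "${gw_addr}" ]; then
    echo "no external address assigned yet" >&2
    return 1
  fi
  echo "${gw_addr}:80"
}
```

With this in place, ENDPOINT=$(gateway_endpoint) sets the variable used by the requests that follow.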
# completion api
curl -v http://${ENDPOINT}/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-llama-8b",
"prompt": "San Francisco is a",
"max_tokens": 128,
"temperature": 0
}'
# chat completion api
curl http://${ENDPOINT}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-llama-8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "help me write a random generator in python"}
]
}'
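For scripted checks it can be handy to look only at the HTTP status code rather than the full response. The sketch below reuses the completion request above with a shorter max_tokens value; completion_status is a hypothetical helper, and the default model name is the one deployed in this guide.

```shell
# Hypothetical helper: send a small completion request and print only
# the HTTP status code.
# $1 = endpoint (host:port), $2 = model name (assumed default shown below).
completion_status() {
  cs_endpoint="$1"
  cs_model="${2:-deepseek-r1-distill-llama-8b}"
  curl -s -o /dev/null -w '%{http_code}' "http://${cs_endpoint}/v1/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${cs_model}\", \"prompt\": \"San Francisco is a\", \"max_tokens\": 8, \"temperature\": 0}"
}
```

A call such as completion_status "${ENDPOINT}" printing 200 suggests the gateway routed the request to a ready model pod. Any other code is a cue to inspect the gateway and pod status.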
If you have trouble exposing an external IP, debug with the following command. In the sample output, 101.18.0.4 is the external IP of the gateway service.
kubectl get svc -n envoy-gateway-system
NAME                                     TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                   AGE
envoy-aibrix-system-aibrix-eg-903790dc   LoadBalancer   10.96.239.246   101.18.0.4    80:32079/TCP                              10d
envoy-gateway                            ClusterIP      10.96.166.226   <none>        18000/TCP,18001/TCP,18002/TCP,19001/TCP   10d
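The relevant column can be pulled out of that listing directly. A small sketch, assuming the default kubectl get svc column layout shown above (name, type, cluster IP, external IP); list_gateway_external_ips is a hypothetical helper name.

```shell
# Hypothetical helper: print "<service> <external-ip>" for every
# LoadBalancer Service in the envoy-gateway-system namespace.
list_gateway_external_ips() {
  kubectl get svc -n envoy-gateway-system --no-headers |
    awk '$2 == "LoadBalancer" { print $1, $4 }'
}
```

If the second field shows `<pending>` instead of an address, the cluster has not assigned an external IP, and the port-forwarding option is the fallback.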