# AIBrix Container Images

## Overview

AIBrix provides enhanced container images for vLLM and SGLang that bundle additional components for distributed inference and KV cache disaggregation:

- **aibrix_kvcache**: built from source to provide KV cache offloading and disaggregation support
- **nixl** and **nixl-cu12**: NIXL (NVIDIA Inference Xfer Library) packages, which use UCX for high-performance RDMA data transfer
- **UCX and RDMA tooling**: pre-installed debugging and performance-testing utilities such as `ucx_info`, `ibv_devices`, and `ib_write_bw`
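To confirm that a pulled image actually contains these components, you can query them from inside a throwaway container. The following is a minimal sketch, assuming Docker is available locally, that the image ships `python3` with `pip`, and that the Python distribution names are `aibrix_kvcache`, `nixl`, and `nixl-cu12` as listed above. The helper names `pkg_version` and `image_component_versions` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which AIBrix add-on components an image contains.
# ASSUMPTIONS: docker is installed locally, the image ships python3 with pip,
# and the Python distribution names are aibrix_kvcache, nixl and nixl-cu12.

# Hypothetical helper: read `pip show` output for ONE package on stdin and
# print the value of its "Version:" field (prints nothing if absent).
pkg_version() {
  awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Hypothetical helper: print "<package> <version>" for each Python add-on,
# then the UCX version banner reported by ucx_info inside the image.
image_component_versions() {
  local image="$1" pkg version
  for pkg in aibrix_kvcache nixl nixl-cu12; do
    version=$(docker run --rm --entrypoint python3 "$image" \
      -m pip show "$pkg" 2>/dev/null | pkg_version)
    printf '%s %s\n' "$pkg" "${version:-not-installed}"
  done
  docker run --rm --entrypoint ucx_info "$image" -v
}

# Example (requires docker and network access):
#   image_component_versions aibrix/vllm-openai:v0.10.2-aibrix-v0.5.0-nixl-0.7.1-20251123
```

Overriding the entrypoint keeps the check independent of how the image normally starts its inference server.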
## Image Naming Convention

AIBrix images extend the upstream inference engine images with additional capabilities:

| Upstream Image | AIBrix Enhanced Image | Use Case |
|---|---|---|
| `vllm/vllm-openai` | `aibrix/vllm-openai` | vLLM + KVCache + RDMA |
| `lmsysorg/sglang` | `aibrix/sglang` | SGLang + KVCache + RDMA |
## When to Use AIBrix Images

Use AIBrix-enhanced images when you need:

- **KV cache offloading**: offload the KV cache to host memory or remote storage
- **Prefill-decode disaggregation**: run prefill and decode workloads separately, transferring the KV cache between them via NIXL

Use upstream images for:

- Standard single-node inference without disaggregation
- Development and testing that do not require specialized networking
## Compatibility Matrix

The following table shows the tested component versions for AIBrix v0.5.0:

| Component | vLLM Image | SGLang Image | Notes |
|---|---|---|---|
| Engine version | v0.10.2 | v0.5.5.post3 | Stable inference engine release |
| CUDA version | 12.8 | 12.9 | CUDA toolkit version |
| PyTorch version | 2.8 | 2.9 | Auto-detected from the base image |
| AIBrix KVCache | v0.5.0 | v0.5.0 | KV cache disaggregation support |
| NIXL version | 0.7.1 | 0.7.1 | UCX-based RDMA networking |
| UCX version | 1.19.0 | 1.19.0 | Unified Communication X |
**Note:** The PyTorch version is extracted automatically from the upstream base image, and AIBrix KVCache is built against that exact version, so the two always match.
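Because the PyTorch version is inherited from the base image, it can be worth confirming what an image actually reports before deploying it. This is a minimal sketch, assuming Docker is available and `torch` is importable with the image's `python3`; the function names `version_matches` and `image_torch_versions` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the PyTorch/CUDA versions inside an image with the
# compatibility matrix. ASSUMPTIONS: docker is installed and torch is
# importable with the image's python3.

# Hypothetical helper: succeed if version $1 (e.g. "2.8.0+cu128") is release
# $2 (e.g. "2.8") itself or a patch/local build of it.
version_matches() {
  local reported="$1" expected="$2"
  [[ "$reported" == "$expected" || "$reported" == "$expected".* || "$reported" == "$expected"+* ]]
}

# Hypothetical helper: print "<torch version> <CUDA version>" as seen in the image.
image_torch_versions() {
  docker run --rm --entrypoint python3 "$1" \
    -c 'import torch; print(torch.__version__, torch.version.cuda)'
}

# Example (requires docker; expected values taken from the v0.5.0 vLLM column):
#   read -r torch_ver cuda_ver < <(image_torch_versions \
#     aibrix/vllm-openai:v0.10.2-aibrix-v0.5.0-nixl-0.7.1-20251123)
#   version_matches "$torch_ver" 2.8 && version_matches "$cuda_ver" 12.8
```

Matching on a release prefix rather than an exact string tolerates patch releases and local build suffixes such as `+cu128`, while still rejecting look-alikes such as `2.80`.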
## Released Images (v0.5.0)

The following prebuilt images are available for immediate use.

vLLM image:

```shell
docker pull aibrix/vllm-openai:v0.10.2-aibrix-v0.5.0-nixl-0.7.1-20251123
```

SGLang image:

```shell
docker pull aibrix/sglang:v0.5.5.post3-aibrix-v0.5.0-nixl-0.7.1-20251123
```
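The released tags appear to pack the engine version, the AIBrix version, the NIXL version, and what looks like a build date into one string, in the form `<engine>-aibrix-<aibrix>-nixl-<nixl>-<YYYYMMDD>`. This pattern is inferred from the two tags listed for v0.5.0, not from a documented specification. The hypothetical `parse_aibrix_tag` function below splits such a tag so that scripts can compare it against the compatibility matrix.

```shell
#!/usr/bin/env bash
# Sketch: split an AIBrix image tag into its components.
# ASSUMPTION: tags follow <engine>-aibrix-<aibrix>-nixl-<nixl>-<YYYYMMDD>,
# a pattern inferred from the released v0.5.0 tags, not a documented format.

# Hypothetical helper: prints "engine=<v> aibrix=<v> nixl=<v> date=<d>";
# returns 1 if the tag does not match the assumed pattern.
parse_aibrix_tag() {
  local tag="${1##*:}"   # drop any "repository:" prefix, keep only the tag
  local re='^(.+)-aibrix-(v[^-]+)-nixl-([^-]+)-([0-9]{8})$'
  if [[ "$tag" =~ $re ]]; then
    printf 'engine=%s aibrix=%s nixl=%s date=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
  else
    return 1
  fi
}

# Example:
#   parse_aibrix_tag aibrix/vllm-openai:v0.10.2-aibrix-v0.5.0-nixl-0.7.1-20251123
#   # engine=v0.10.2 aibrix=v0.5.0 nixl=0.7.1 date=20251123
```

Anchoring on the literal `-aibrix-` and `-nixl-` markers lets the engine version contain its own dots and suffixes, as in `v0.5.5.post3`.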
## Building Custom Images

For detailed build instructions and troubleshooting, see `build/container/README.md`.
## Version History

### v0.5.0

- vLLM: v0.10.2 with CUDA 12.8 and PyTorch 2.8
- SGLang: v0.5.5.post3 with CUDA 12.9 and PyTorch 2.9
- AIBrix KVCache: v0.5.0
- NIXL: 0.7.1
- UCX: 1.19.0

Features:

- Full KV cache offloading support
- RDMA networking for distributed inference
- Prefill-decode disaggregation support
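## Troubleshooting

### Performance Issues

For RDMA networking issues:

1. Verify that RDMA devices are available: `ibv_devices`
2. Check the UCX configuration (the devices and transports UCX detects): `ucx_info -d`
3. Test RDMA bandwidth with `ib_write_bw`. Started without a peer address, it runs as a server and waits for a client; run a second instance on another host, passing the server's address, to complete the measurement.
4. Ensure that security policies allow RDMA access.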
To run the debugging utilities included in the image from a running pod:

```shell
kubectl exec -it <pod-name> -- ucx_info -d
kubectl exec -it <pod-name> -- ibv_devices
kubectl exec -it <pod-name> -- ib_write_bw
```
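Because `ib_write_bw` (from the perftest suite) acts as a server when started without a peer address, a bandwidth measurement needs two pods: one server and one client pointed at it. The sketch below automates that with `kubectl`. The function names `missing_tools` and `rdma_bw_test`, the example device name `mlx5_0`, and the use of the pod IP for perftest's connection setup are all assumptions to adapt to your cluster; for example, RDMA over a secondary network may need a different address.

```shell
#!/usr/bin/env bash
# Sketch: measure RDMA write bandwidth between two pods running an AIBrix image.
# ASSUMPTIONS: kubectl targets the right cluster/namespace, both pods contain
# ib_write_bw (perftest), and the server pod's IP is reachable from the client
# for perftest's TCP connection setup. "mlx5_0" is only an example device name.

# Hypothetical helper: print each command from "$@" that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical helper: run an ib_write_bw server in one pod and a client in another.
rdma_bw_test() {
  local server_pod="$1" client_pod="$2" dev="${3:-mlx5_0}" server_ip server_job
  if [[ -n "$(missing_tools kubectl)" ]]; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  server_ip=$(kubectl get pod "$server_pod" -o jsonpath='{.status.podIP}') || return 1

  # Server side: with no peer address, ib_write_bw waits for one client, then exits.
  # If the client never connects, this background job keeps waiting; interrupt it.
  kubectl exec "$server_pod" -- ib_write_bw -d "$dev" &
  server_job=$!
  sleep 2   # crude wait for the server to start listening

  # Client side: connect to the server's address and print the bandwidth report.
  kubectl exec "$client_pod" -- ib_write_bw -d "$dev" "$server_ip"
  wait "$server_job"
}

# Example:
#   rdma_bw_test <server-pod> <client-pod> mlx5_0
```

On RoCE fabrics, perftest may also need an explicit GID index (its `-x` option) on both sides.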