Batch API#
The AIBrix Batch API provides an efficient way to process large volumes of LLM inference requests asynchronously. It is fully compatible with the OpenAI Batch API, allowing you to send batches of requests that are processed in the background with results retrieved later.
Overview#
Batch processing is ideal for workloads that don’t require immediate responses and can benefit from:
Cost efficiency: Process requests during off-peak hours with optimized resource utilization
Higher throughput: Handle large volumes of requests without rate limiting concerns
Simplified workflows: Submit thousands of requests in a single batch operation
Guaranteed processing: Built-in retry mechanisms and failure handling
The Batch API accepts a JSONL (JSON Lines) file containing multiple inference requests, processes them asynchronously using Kubernetes Jobs, and returns results in a corresponding JSONL output file.
Key Features#
OpenAI-compatible: Drop-in replacement for OpenAI’s Batch API with identical request/response format
Distributed execution: Leverages Kubernetes Jobs for scalable, fault-tolerant batch processing
Metadata server workflow: Centralized coordination for multi-node batch execution
Storage flexibility: Supports S3, Redis, and local storage backends
Request tracking: Each request has a custom_id for precise result matching
Status monitoring: Real-time progress tracking with detailed metrics
24-hour completion window: Automatic expiration for long-running batches
Comparison with OpenAI Batch API#
AIBrix Batch API maintains full compatibility with OpenAI’s Batch API while adding enterprise features:
AIBrix Enhancements:
Self-hosted: Full control over infrastructure and data
Kubernetes-native: Leverages K8s for scheduling and resource management
Flexible storage: S3, Redis, or local storage backends
Distributed execution: Metadata server coordinates multi-node processing
Cost control: Use existing infrastructure without per-request pricing
Migration from OpenAI:
Simply update the base_url in your OpenAI SDK configuration:
from openai import OpenAI
# Before: OpenAI
# client = OpenAI(api_key="sk-...")
# After: AIBrix
client = OpenAI(
base_url="http://your-aibrix-endpoint/v1",
api_key="your-key" # Optional
)
# All batch API calls work identically
Known Differences:
Pricing: No usage-based pricing; controlled by your infrastructure costs
Endpoints: Currently supports /v1/chat/completions (more coming soon)
Rate Limits: Determined by your cluster capacity, not API limits
Supported Endpoints:
/v1/chat/completions - Chat completion requests
/v1/completions - Text completion requests
/v1/embeddings - Embedding generation requests
All endpoints are fully supported if the underlying LLM engine supports them.
Current Limitations:
24-hour completion window (not configurable)
Moderation endpoint (/v1/moderations) not supported
Architecture#
Workflow Overview#
The Batch API follows a metadata server architecture for distributed processing:
┌──────────┐
│  Client  │
└────┬─────┘
     │ 1. Upload JSONL file
     ▼
┌─────────────────┐
│   Files API     │
│   (Metadata)    │
└────┬────────────┘
     │ 2. Create batch job
     ▼
┌──────────────────┐         ┌──────────────────┐
│   Batch API      │────────▶│  Job Scheduler   │
│   (Metadata)     │         │  (Kubernetes)    │
└────┬─────────────┘         └────┬─────────────┘
     │                            │ 3. Execute workers
     │ 4. Poll status             ▼
     │                       ┌─────────────┐
     │                       │  K8s Jobs   │
     │                       │  (Workers)  │
     │                       └────┬────────┘
     │                            │ 5. Process requests
     │                            │    & write outputs
     │ 6. Download output         │
     ▼                            ▼
┌──────────────────┐         ┌──────────────┐
│   Files API      │◀────────│   Storage    │
│   (Metadata)     │         │  (S3/Redis)  │
└──────────────────┘         └──────────────┘
Phase Transitions:
validating → in_progress → finalizing → completed
    ↓            ↓             ↓            ↓
Preparing      Worker      Collecting    Results
job files     execution     outputs       ready
Status Lifecycle:
validating: Metadata server validates input file and prepares job configuration
in_progress: Kubernetes Jobs are executing and processing batch requests
finalizing: Workers have completed, metadata server is aggregating results
completed: Output file is ready for download with all results
Failed/Cancelled States:
failed: Job execution encountered unrecoverable errors
cancelled: User explicitly cancelled the batch job
expired: Job exceeded the 24-hour completion window
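The states above fall into two groups: those a client should keep polling on, and terminal ones. A minimal sketch of that split, using only the status names listed in this section; the helper names `is_terminal` and `is_success` are illustrative and not part of AIBrix:

```python
# Terminal status names are taken from the lifecycle described above.
TERMINAL = {"completed", "failed", "cancelled", "expired"}


def is_terminal(status: str) -> bool:
    """Return True once a batch can no longer change state."""
    return status in TERMINAL


def is_success(status: str) -> bool:
    """Only `completed` means the output file is ready for download."""
    return status == "completed"


if __name__ == "__main__":
    for status in ["validating", "in_progress", "finalizing", "completed", "expired"]:
        print(f"{status}: terminal={is_terminal(status)} success={is_success(status)}")
```

Testing membership in the terminal set, rather than listing every in-flight state, means a status the client does not recognize is treated as "keep polling" instead of being mistaken for a final result.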
Components#
Metadata Server: Coordinates batch job lifecycle, manages files, and tracks progress
Job Scheduler: Creates and manages Kubernetes Jobs for batch execution
Worker Jobs: Kubernetes Jobs that process batch requests in parallel
Storage Backend: S3, Redis, or local filesystem for file storage and job state
Files API: OpenAI-compatible file upload/download endpoints
Deployment#
Storage Backend Configuration#
The Batch API requires a storage backend for file operations. AIBrix supports multiple storage backends including S3, TOS, and local storage. To enable cloud object storage, you need to configure credentials and enable the appropriate storage patches.
Enabling S3 Storage
To enable S3 as the storage backend for batch operations:
Generate S3 Credentials Secret:
Use the AIBrix secret generation tool to create the necessary Kubernetes secrets:
# Install the AIBrix package in development mode
cd python/aibrix && pip install -e .
# Generate S3 credentials secret
aibrix_gen_secrets s3 --bucket your-s3-bucket-name --namespace aibrix-system
# Generate S3 credentials secret for Job Executor
aibrix_gen_secrets s3 --bucket your-s3-bucket-name --namespace default
This command will:
Create a Kubernetes secret named aibrix-s3-credentials in the aibrix-system namespace
Configure the secret with your S3 bucket name and credentials
Set up the necessary environment variables for the metadata service
Enable S3 Environment Variables:
Uncomment the S3 patch in the metadata service configuration:
# Edit the kustomization file
vim config/metadata/kustomization.yaml
Find and uncomment the following line:
patches:
- path: s3-env-patch.yaml # Uncomment this line
The patch will inject the S3 environment variables into the metadata service deployment.
Apply the Configuration:
Deploy the job RBAC and the updated configuration:
kubectl apply -k config/job
kubectl apply -k config/default
Enabling TOS Storage
For TOS (Volcano Engine's Torch Object Storage), follow similar steps:
Generate TOS Credentials Secret:
# Install the AIBrix package in development mode
cd python/aibrix && pip install -e .
# Generate TOS credentials secret
aibrix_gen_secrets tos --bucket your-tos-bucket-name --namespace aibrix-system
# Generate TOS credentials secret for Job Executor
aibrix_gen_secrets tos --bucket your-tos-bucket-name --namespace default
Enable TOS Environment Variables:
Uncomment the TOS patch in the metadata service configuration:
# Edit the kustomization file
vim config/metadata/kustomization.yaml
Find and uncomment the following line:
patches:
- path: tos-env-patch.yaml # Uncomment this line
The patch will inject the TOS environment variables into the metadata service deployment.
Apply the Configuration:
Deploy the job RBAC and the updated configuration:
kubectl apply -k config/job
kubectl apply -k config/default
Examples#
End-to-End Example#
Here’s a complete example of processing a batch of chat completions:
Step 1: Prepare Input File
Create a file batch_input.jsonl with your requests:
{"custom_id": "task-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-120b", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain neural networks."}], "max_tokens": 200}}
{"custom_id": "task-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-120b", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is deep learning?"}], "max_tokens": 200}}
{"custom_id": "task-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-120b", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Describe transformers architecture."}], "max_tokens": 200}}
Step 2: Upload Input File
# Upload the input file
ENDPOINT="your-aibrix-endpoint:80"
# -s suppresses curl's progress meter so only the JSON body is captured
UPLOAD_RESPONSE=$(curl -s -X POST "http://${ENDPOINT}/v1/files" \
  -F "purpose=batch" \
  -F "file=@batch_input.jsonl")
echo "$UPLOAD_RESPONSE"
# {"id":"file-abc123","object":"file","bytes":1024,"created_at":1677610602,"filename":"batch_input.jsonl","purpose":"batch","status":"uploaded"}
# Extract file ID
FILE_ID=$(echo "$UPLOAD_RESPONSE" | jq -r '.id')
echo "Uploaded file ID: $FILE_ID"
Step 3: Create Batch Job
# Create batch job
BATCH_RESPONSE=$(curl -s -X POST "http://${ENDPOINT}/v1/batches" \
-H "Content-Type: application/json" \
-d "{
\"input_file_id\": \"${FILE_ID}\",
\"endpoint\": \"/v1/chat/completions\",
\"completion_window\": \"24h\"
}")
echo $BATCH_RESPONSE
# Extract batch ID
BATCH_ID=$(echo $BATCH_RESPONSE | jq -r '.id')
echo "Created batch ID: $BATCH_ID"
Step 4: Poll Batch Status
# Poll until completion (with timeout)
MAX_ATTEMPTS=60
ATTEMPT=0
while [ "$ATTEMPT" -lt "$MAX_ATTEMPTS" ]; do
  STATUS_RESPONSE=$(curl -s "http://${ENDPOINT}/v1/batches/${BATCH_ID}")
  STATUS=$(echo "$STATUS_RESPONSE" | jq -r '.status')
  echo "Attempt $ATTEMPT: Status = $STATUS"
  if [ "$STATUS" = "completed" ]; then
    echo "Batch completed successfully!"
    OUTPUT_FILE_ID=$(echo "$STATUS_RESPONSE" | jq -r '.output_file_id')
    break
  elif [ "$STATUS" = "failed" ] || [ "$STATUS" = "expired" ] || [ "$STATUS" = "cancelled" ]; then
    echo "Batch processing failed with status: $STATUS"
    exit 1
  fi
  ATTEMPT=$((ATTEMPT + 1))
  sleep 10
done
if [ "$ATTEMPT" -eq "$MAX_ATTEMPTS" ]; then
  echo "Batch did not complete within timeout"
  exit 1
fi
Step 5: Download Results
# Download output file
curl -o batch_output.jsonl http://${ENDPOINT}/v1/files/${OUTPUT_FILE_ID}/content
echo "Output saved to batch_output.jsonl"
# Display results
cat batch_output.jsonl | jq '.'
Step 6: Process Results
import json
# Parse output file
results = {}
with open('batch_output.jsonl', 'r') as f:
    for line in f:
        output = json.loads(line)
        custom_id = output['custom_id']
        response = output['response']
        if response['status_code'] == 200:
            content = response['body']['choices'][0]['message']['content']
            results[custom_id] = content
            print(f"{custom_id}: {content[:100]}...")
        else:
            print(f"{custom_id}: ERROR {response['status_code']}")
# Output:
# task-1: Neural networks are computational models inspired by biological neurons...
# task-2: Deep learning is a subset of machine learning that uses multi-layer...
# task-3: The Transformer architecture is a neural network design that relies...
Python SDK Example#
Using the OpenAI Python SDK (works with AIBrix as a drop-in replacement):
import json
import time
from openai import OpenAI
# Configure client for AIBrix
client = OpenAI(
base_url="http://your-aibrix-endpoint:80/v1",
api_key="dummy-key" # Replace with actual key if authentication is enabled
)
# Step 1: Create batch input file
batch_requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-oss-120b",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": f"Tell me a fact about the number {i}."}
            ],
            "max_tokens": 100
        }
    }
    for i in range(1, 11)  # 10 requests
]
# Write to JSONL file
with open("batch_requests.jsonl", "w") as f:
    for request in batch_requests:
        f.write(json.dumps(request) + "\n")
# Step 2: Upload file
with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(
        file=f,
        purpose="batch"
    )
print(f"Uploaded file: {batch_file.id}")
# Step 3: Create batch
batch = client.batches.create(
input_file_id=batch_file.id,
endpoint="/v1/chat/completions",
completion_window="24h"
)
print(f"Created batch: {batch.id}")
# Step 4: Wait for completion
while batch.status not in ["completed", "failed", "expired", "cancelled"]:
    time.sleep(10)
    batch = client.batches.retrieve(batch.id)
    print(f"Status: {batch.status}")
if batch.status == "completed":
    print("Batch completed!")
    print(f"Total requests: {batch.request_counts.total}")
    print(f"Completed: {batch.request_counts.completed}")
    print(f"Failed: {batch.request_counts.failed}")
    # Step 5: Download results
    output_file_id = batch.output_file_id
    result_content = client.files.content(output_file_id)
    # Save results
    with open("batch_results.jsonl", "wb") as f:
        f.write(result_content.content)
    # Process results
    with open("batch_results.jsonl", "r") as f:
        for line in f:
            result = json.loads(line)
            custom_id = result["custom_id"]
            content = result["response"]["body"]["choices"][0]["message"]["content"]
            print(f"{custom_id}: {content}")
else:
    print(f"Batch failed with status: {batch.status}")
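The wait loop in the SDK example has no upper bound, so a stuck batch would block the script until the 24-hour window expires. A minimal sketch of a bounded wait follows; `wait_for_batch` and its parameters are illustrative names, not part of the OpenAI SDK or AIBrix, and the sleep and clock functions are injectable only so the helper can be exercised without real delays:

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(
    retrieve: Callable[[], object],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
):
    """Poll `retrieve()` until the batch reaches a terminal status.

    `retrieve` is any zero-argument callable returning an object with a
    `.status` attribute. Raises TimeoutError if no terminal status is
    observed within `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    while True:
        batch = retrieve()
        if batch.status in TERMINAL:
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"batch still {batch.status!r} after {timeout_s}s")
        sleep(interval_s)
```

With the client from the example above, a call could look like `batch = wait_for_batch(lambda: client.batches.retrieve(batch.id))`. The default timeout and interval are arbitrary choices for illustration.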
Customization#
Customizing Job Executor#
You can customize the batch job execution environment by modifying the job template patch configuration. This allows you to specify custom container images, resource requirements, and other Kubernetes Job specifications.
Job Template Patch Configuration
The job executor behavior is controlled by the config/metadata/job_template_patch.yaml file. This file defines the Kubernetes Job template that will be used for batch processing:
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job-template
  namespace: default
spec:
  parallelism: 1 # Customizable. The number of parallel workers.
  completions: 1 # Customizable. Must equal parallelism.
  backoffLimit: 2 # Customizable, but usually no need to change.
  template:
    spec:
      containers:
        - name: batch-worker
          image: aibrix/runtime:nightly # Customizable, runtime image
        - name: llm-engine
          image: aibrix/vllm-mock:nightly # Customizable, LLM engine image
Customization Options:
parallelism: Number of parallel worker pods (affects throughput)
completions: Must match parallelism for proper job completion
backoffLimit: Number of retries for failed worker pods
batch-worker image: Runtime container that coordinates batch processing
llm-engine image: LLM inference engine container (e.g., vLLM, TensorRT-LLM)
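Because `completions` must match `parallelism`, it can help to check a modified template before applying it. The sketch below works on a template that has already been loaded into a Python dict (for example with a YAML parser); `check_job_template` is a hypothetical helper, and treating both `batch-worker` and `llm-engine` as required containers is an inference from the template shown above, not a documented rule:

```python
def check_job_template(job: dict) -> list[str]:
    """Return a list of problems found in a Job template dict (empty if none)."""
    problems = []
    spec = job.get("spec", {})

    # The template comments require completions to equal parallelism.
    parallelism = spec.get("parallelism", 1)
    completions = spec.get("completions", 1)
    if parallelism != completions:
        problems.append(
            f"completions ({completions}) must equal parallelism ({parallelism})"
        )

    # Assumption: both containers from the default template must be present.
    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    names = [c.get("name") for c in containers]
    for required in ("batch-worker", "llm-engine"):
        if required not in names:
            problems.append(f"missing container {required!r}")
    return problems


if __name__ == "__main__":
    template = {
        "spec": {
            "parallelism": 4,
            "completions": 1,
            "template": {"spec": {"containers": [{"name": "batch-worker"}]}},
        }
    }
    for problem in check_job_template(template):
        print(problem)
```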
Common Customizations:
Use Custom LLM Engine:
containers:
  - name: llm-engine
    image: your-registry/custom-vllm:latest
Increase Parallelism:
spec:
  parallelism: 4
  completions: 4
Add Resource Requirements:
containers:
  - name: llm-engine
    image: aibrix/vllm-mock:nightly
    resources:
      requests:
        nvidia.com/gpu: 1
        memory: "8Gi"
      limits:
        nvidia.com/gpu: 1
        memory: "16Gi"
Add Environment Variables:
containers:
  - name: llm-engine
    image: aibrix/vllm-mock:nightly
    env:
      - name: CUDA_VISIBLE_DEVICES
        value: "0"
      - name: MODEL_PATH
        value: "/models/your-model"
Applying Changes:
After modifying job_template_patch.yaml, apply the changes using:
kubectl apply -k config/default
Configuring Job Pool Size#
The batch job scheduler uses a job pool to control concurrent batch processing. The pool size can be configured via environment variable:
# Set job pool size (default: 1, min: 1, max: 100)
export AIBRIX_BATCH_JOB_POOL_SIZE=10
Configuration Options:
Default: 1 (conservative, prevents resource contention)
Recommended: 5-20 (depending on cluster resources)
Maximum: 100 (validated at runtime)
Setting in Deployment:
Add to metadata service deployment environment variables:
env:
  - name: AIBRIX_BATCH_JOB_POOL_SIZE
    value: "10"
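The documented bounds (default 1, minimum 1, maximum 100) can be checked before a deployment is rolled out. This is a sketch of such a check, not the metadata service's actual validation code; `job_pool_size` is a hypothetical helper, and the real service may clamp out-of-range values rather than reject them:

```python
import os
from typing import Mapping, Optional

ENV_NAME = "AIBRIX_BATCH_JOB_POOL_SIZE"


def job_pool_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the pool size from the environment and enforce the 1..100 range."""
    env = os.environ if env is None else env
    raw = env.get(ENV_NAME, "1")  # documented default is 1
    try:
        size = int(raw)
    except ValueError:
        raise ValueError(f"{ENV_NAME} must be an integer, got {raw!r}") from None
    if not 1 <= size <= 100:
        raise ValueError(f"{ENV_NAME} must be between 1 and 100, got {size}")
    return size


if __name__ == "__main__":
    print(job_pool_size({ENV_NAME: "10"}))
```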
Verification and Testing#
Verifying Batch API Functionality#
Follow these steps to verify that the Batch API is working correctly in your AIBrix deployment:
Step 1: Set Up Port Forwarding
First, create a port-forward to access the AIBrix services:
# Port-forward the gateway service to access AIBrix APIs
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 1>/dev/null 2>&1 &
# Verify the port-forward is working
curl -s http://localhost:8888/v1/batches
Step 2: Set Up Object Store Credentials
Configure S3 credentials for batch file storage:
# Navigate to the Python package directory
cd python/aibrix
# Install the AIBrix package in development mode
pip install -e .
# Generate S3 credentials secret (replace with your S3 bucket)
aibrix_gen_secrets s3 --bucket your-s3-bucket-name
# Example with specific bucket:
# aibrix_gen_secrets s3 --bucket my-aibrix-batch-storage
This command will:
Create the necessary Kubernetes secrets for S3 access
Configure the metadata service to use your S3 bucket for file storage
Set up proper IAM credentials for batch job file operations
Step 3: Run End-to-End Tests
Execute the comprehensive batch API test suite:
# Navigate to the Python package directory (if not already there)
cd python/aibrix
# Run the batch API end-to-end tests
pytest tests/e2e/test_batch_api.py -v
Expected Test Output:
tests/e2e/test_batch_api.py::test_batch_api_e2e_real_service PASSED
========================= 1 passed in 10.78s =========================
Test Coverage:
The test suite verifies:
File Upload/Download: Files API functionality with S3 backend
Batch Job Creation: Proper batch job submission and validation
Kubernetes Job Execution: Worker pod creation and execution
Status Monitoring: Real-time batch status tracking
Result Collection: Output file generation and retrieval
Troubleshooting Common Issues:
Port-forward Connection Issues:
# Check if port-forward is running
ps aux | grep port-forward
# Kill existing port-forwards and restart
pkill -f "port-forward.*8888"
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 &
S3 Credentials Issues:
# Verify S3 secret was created
kubectl get secret aibrix-s3-credentials -n aibrix-system
# Check secret contents
kubectl get secret aibrix-s3-credentials -n aibrix-system -o yaml
Test Failures:
# Run tests with more verbose output
pytest tests/e2e/test_batch_api.py -v -s --tb=long
Manual Verification:
You can also manually verify the batch API using curl commands as shown in the Examples section above, using localhost:8888 as your endpoint after setting up the port-forward.
API Reference#
Files API#
The Files API manages input and output files for batch processing. ENDPOINT is the metadata service endpoint.
kubectl port-forward svc/aibrix-metadata-service 8090:8090 -n aibrix-system
export ENDPOINT=localhost:8090
Upload File
curl -X POST http://${ENDPOINT}/v1/files \
-F "purpose=batch" \
-F "file=@batch_input.jsonl"
{
"id": "102983c4-92ef-4de9-a03b-8e05066b16fd",
"object": "file",
"bytes": 3104,
"created_at": 1677610602,
"filename": "batch_input.jsonl",
"purpose": "batch",
"status": "uploaded"
}
Get File Metadata
curl -X GET http://${ENDPOINT}/v1/files/{file_id}
{
"id": "102983c4-92ef-4de9-a03b-8e05066b16fd",
"object": "file",
"bytes": 3104,
"created_at": 1760131968,
"filename": "batch_input.jsonl",
"purpose": "batch",
"status": "uploaded",
"content_type": "application/octet-stream",
"etag": "e64b86a757f6b6e3bbbe65387158d47a",
"last_modified": 1760131968
}
List Files
List all files with optional filtering and pagination:
# List all files
curl -X GET http://${ENDPOINT}/v1/files
# List only batch files
curl -X GET "http://${ENDPOINT}/v1/files?purpose=batch"
# List with pagination (limit and cursor)
curl -X GET "http://${ENDPOINT}/v1/files?limit=10&after=file-abc123"
Response:
{
"object": "list",
"data": [
{
"id": "102983c4-92ef-4de9-a03b-8e05066b16fd",
"object": "file",
"bytes": 3104,
"created_at": 1760131968,
"filename": "batch_input.jsonl",
"purpose": "batch",
"status": "uploaded"
}
],
"has_more": false
}
Query Parameters:
purpose (optional): Filter by file purpose (e.g., “batch”)
limit (optional): Number of files to return (1-100, default 20)
after (optional): File ID to use as pagination cursor
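The `limit` and `after` parameters, together with `has_more` in the response, allow cursor-style iteration over all files. The sketch below assumes the cursor for the next page is the ID of the last file in the current page, as in the OpenAI API; that is an assumption about AIBrix. `fetch_page` is a hypothetical callable that issues `GET /v1/files` with the given query parameters and returns the decoded JSON body:

```python
from typing import Callable, Iterator, Optional


def iter_files(
    fetch_page: Callable[[dict], dict],
    limit: int = 20,
    purpose: Optional[str] = None,
) -> Iterator[dict]:
    """Yield every file object, following `after` until `has_more` is false."""
    params = {"limit": limit}
    if purpose:
        params["purpose"] = purpose
    while True:
        page = fetch_page(dict(params))  # pass a copy so callers can keep it
        data = page.get("data", [])
        yield from data
        if not page.get("has_more") or not data:
            return
        # Assumed cursor convention: ID of the last file seen so far.
        params["after"] = data[-1]["id"]
```

Keeping the HTTP call outside the generator leaves the choice of client (curl wrapper, `requests`, the OpenAI SDK) open and makes the paging logic easy to test with a stub.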
Download File
curl -X GET http://${ENDPOINT}/v1/files/{file_id}/content
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-120b", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Explain quantum computing in simple terms."}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-120b", "messages": [{"role": "system", "content": "You are a creative writing assistant."},{"role": "user", "content": "Write a short story about a robot discovering emotions."}],"max_tokens": 1000}}
{"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-120b", "messages": [{"role": "system", "content": "You are a code reviewer."},{"role": "user", "content": "Review this Python function: def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)"}],"max_tokens": 1000}}
{"custom_id": "request-4", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-120b", "messages": [{"role": "system", "content": "You are a cooking instructor."},{"role": "user", "content": "How do I make perfect scrambled eggs?"}],"max_tokens": 1000}}
{"custom_id": "request-5", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-120b", "messages": [{"role": "system", "content": "You are a travel advisor."},{"role": "user", "content": "What are the top 5 must-see attractions in Tokyo for first-time visitors?"}],"max_tokens": 1000}}
...
Response: Raw file content (JSONL format)
Batch API#
The Batch API manages batch job lifecycle.
Create Batch
curl -X POST http://${ENDPOINT}/v1/batches \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "102983c4-92ef-4de9-a03b-8e05066b16fd",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}'
{
"id": "6f646d68-1314-42f9-907b-b50a88061a9f",
"object": "batch",
"endpoint": "/v1/chat/completions",
"errors": null,
"input_file_id": "102983c4-92ef-4de9-a03b-8e05066b16fd",
"completion_window": "24h",
"status": "created",
"output_file_id": null,
"error_file_id": null,
"created_at": 1760132899,
"in_progress_at": null,
"expires_at": 1760219299,
"finalizing_at": null,
"completed_at": null,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": null,
"metadata": null
}
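In the sample response, `expires_at` is exactly 24 hours after `created_at`, which matches the 24-hour completion window. A small sketch of that arithmetic; that the server always derives `expires_at` this way is an assumption, and the function names are illustrative:

```python
COMPLETION_WINDOW_S = 24 * 60 * 60  # 24h window, in seconds


def expires_at(created_at: int) -> int:
    """Expected expiry timestamp (Unix seconds) for a batch created at `created_at`."""
    return created_at + COMPLETION_WINDOW_S


def seconds_remaining(expires_at_ts: int, now: int) -> int:
    """Seconds left before the batch expires, never negative."""
    return max(0, expires_at_ts - now)


if __name__ == "__main__":
    # created_at from the sample response above
    print(expires_at(1760132899))  # 1760219299, the sample's expires_at
```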
Get Batch Status
curl -X GET http://${ENDPOINT}/v1/batches/{batch_id}
{
"id": "6f646d68-1314-42f9-907b-b50a88061a9f",
"object": "batch",
"endpoint": "/v1/chat/completions",
"errors": null,
"input_file_id": "102983c4-92ef-4de9-a03b-8e05066b16fd",
"completion_window": "24h",
"status": "completed",
"output_file_id": "4d4c4f0d-43e2-3a76-8c44-06b95b5afc08",
"error_file_id": "eca1882e-5bf2-3c23-9b03-f54f98558302",
"created_at": 1760132899,
"in_progress_at": 1760132899,
"expires_at": 1760219299,
"finalizing_at": 1760132909,
"completed_at": 1760132909,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 10,
"completed": 10,
"failed": 0
},
"metadata": null
}
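The `request_counts` object can be turned into a progress figure while polling. A minimal sketch, treating both completed and failed requests as finished; `progress` is an illustrative helper name, and the handling of a null `request_counts` reflects the sample Create Batch response, where the field is `null`:

```python
from typing import Optional


def progress(request_counts: Optional[dict]) -> float:
    """Return the fraction of requests that have finished (completed or failed)."""
    if not request_counts or not request_counts.get("total"):
        return 0.0  # counts may be null or zero before processing starts
    done = request_counts.get("completed", 0) + request_counts.get("failed", 0)
    return done / request_counts["total"]


if __name__ == "__main__":
    counts = {"total": 10, "completed": 10, "failed": 0}
    print(f"{progress(counts):.0%}")
```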
List Batches
curl -X GET http://${ENDPOINT}/v1/batches
{
"object": "list",
"data": [
{
"id": "6f646d68-1314-42f9-907b-b50a88061a9f",
"object": "batch",
"endpoint": "/v1/chat/completions",
"errors": null,
"input_file_id": "102983c4-92ef-4de9-a03b-8e05066b16fd",
"completion_window": "24h",
"status": "completed",
"output_file_id": "4d4c4f0d-43e2-3a76-8c44-06b95b5afc08",
"error_file_id": "eca1882e-5bf2-3c23-9b03-f54f98558302",
"created_at": 1760132899,
"in_progress_at": 1760132899,
"expires_at": 1760219299,
"finalizing_at": 1760132909,
"completed_at": 1760132909,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 10,
"completed": 10,
"failed": 0
},
"metadata": null
}
],
"first_id": "6f646d68-1314-42f9-907b-b50a88061a9f",
"last_id": "6f646d68-1314-42f9-907b-b50a88061a9f",
"has_more": false
}
Input File Format#
Input files must be in JSONL format with one request per line. Each request requires:
custom_id: Unique identifier for matching results (required)
method: HTTP method, typically “POST” (required)
url: Endpoint path, e.g., “/v1/chat/completions” (required)
body: Request payload matching the endpoint’s format (required)
Example batch_input.jsonl:
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is AI?"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain quantum computing."}], "max_tokens": 150}}
{"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is machine learning?"}], "max_tokens": 100}}
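Checking a file locally before upload can catch malformed lines early. The sketch below verifies only what this section states: one JSON object per line, the four required fields, and unique `custom_id` values. `validate_jsonl` is a hypothetical helper, and the server-side validation may be stricter or differ in detail:

```python
import json

REQUIRED_FIELDS = ("custom_id", "method", "url", "body")


def validate_jsonl(text: str) -> list[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems = []
    seen_ids = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped here; the server may not allow them
        try:
            request = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(request, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in request]
        if missing:
            problems.append(f"line {lineno}: missing {', '.join(missing)}")
        custom_id = request.get("custom_id")
        if isinstance(custom_id, str):
            if custom_id in seen_ids:
                problems.append(f"line {lineno}: duplicate custom_id {custom_id!r}")
            seen_ids.add(custom_id)
    return problems
```

A typical call would be `validate_jsonl(open("batch_input.jsonl").read())`, uploading the file only if the returned list is empty.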
Output File Format#
Output files are in JSONL format with one result per line matching each input request:
Example batch_output.jsonl:
{"id": "batch-def456-0", "custom_id": "request-1", "response": {"status_code": 200, "request_id": "req_001", "body": {"id": "chatcmpl-001", "object": "chat.completion", "created": 1677610602, "model": "gpt-3.5-turbo", "choices": [{"index": 0, "message": {"role": "assistant", "content": "AI stands for Artificial Intelligence..."}, "finish_reason": "stop"}]}}}
{"id": "batch-def456-1", "custom_id": "request-2", "response": {"status_code": 200, "request_id": "req_002", "body": {"id": "chatcmpl-002", "object": "chat.completion", "created": 1677610603, "model": "gpt-3.5-turbo", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Quantum computing uses quantum mechanics..."}, "finish_reason": "stop"}]}}}
{"id": "batch-def456-2", "custom_id": "request-3", "response": {"status_code": 200, "request_id": "req_003", "body": {"id": "chatcmpl-003", "object": "chat.completion", "created": 1677610604, "model": "gpt-3.5-turbo", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Machine learning is a subset of AI..."}, "finish_reason": "stop"}]}}}
Each output line contains:
id: Unique result identifier
custom_id: Matches the input request’s custom_id
response: Contains status_code, request_id, and body with the actual result
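Since each output line carries the `custom_id` of its request, results are best joined to inputs by that key rather than by line position. A minimal sketch that splits an output file into successes and failures and reports inputs with no result; the helper names are illustrative, and whether AIBrix preserves input order in the output file is not assumed here:

```python
import json


def index_results(output_text: str) -> tuple[dict, dict]:
    """Split output lines into (successes, failures), both keyed by custom_id."""
    successes, failures = {}, {}
    for line in output_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        if response.get("status_code") == 200:
            successes[record["custom_id"]] = response["body"]
        else:
            failures[record["custom_id"]] = response
    return successes, failures


def missing_ids(input_ids: list[str], successes: dict, failures: dict) -> list[str]:
    """Return input custom_ids that have no line in the output file."""
    return [cid for cid in input_ids if cid not in successes and cid not in failures]
```

For a chat completion, the generated text for a given request would then be available as `successes["request-1"]["choices"][0]["message"]["content"]`.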