# Autoscaling
Automatically scale agent workloads based on queue depth, CPU usage, or custom metrics. Ensure your agents can handle traffic spikes.
## What is Autoscaling?
Autoscaling automatically adjusts the number of worker instances based on demand. When the queue grows, more workers spin up. When demand drops, workers scale down to save resources.
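The core idea can be sketched as a target-based calculation: pick enough workers so that each one handles roughly `target_value` pending runs, clamped to the configured bounds. This is a minimal illustration, not the platform's actual algorithm; the function and parameter names are hypothetical.

```python
import math

def desired_instances(queue_depth, target_per_worker, min_instances, max_instances):
    """Return the worker count that keeps per-worker load near the target.

    queue_depth: pending runs in the queue
    target_per_worker: desired pending runs per worker (the rule's target_value)
    """
    if queue_depth <= 0:
        needed = min_instances  # nothing queued: sit at the floor
    else:
        needed = math.ceil(queue_depth / target_per_worker)
    # Clamp to the rule's min/max bounds.
    return max(min_instances, min(max_instances, needed))
```

For example, with a target of 10 runs per worker and bounds of 1–10, a queue of 25 pending runs yields 3 workers, while a queue of 200 is capped at the maximum of 10.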
## Create a Scaling Rule
```bash
curl -X POST http://localhost:8000/api/v1/scaling/rules \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "runtime-worker-scaling",
    "metric": "queue_depth",
    "target_value": 10,
    "min_instances": 1,
    "max_instances": 10,
    "cooldown_seconds": 300,
    "scale_up_step": 2,
    "scale_down_step": 1
  }'
```

## Available Metrics
| Metric | Description |
|---|---|
| `queue_depth` | Number of pending runs in the Celery queue |
| `cpu_usage` | CPU utilization percentage of worker nodes |
| `memory_usage` | Memory utilization percentage |
| `latency_p99` | 99th percentile response latency |
Start with `queue_depth` as your primary metric: scale up when the queue exceeds 10 pending runs, scale down when it falls below 2, and set a 5-minute cooldown so the scaler doesn't thrash between adding and removing workers.
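Putting the thresholds, step sizes, and cooldown together, a scaler's decision step might look like the sketch below. This is an illustrative model of the behavior described above, not the platform's implementation; the `Scaler` class and its method names are assumptions.

```python
import time

class Scaler:
    """Threshold-based scaling with step sizes and a cooldown period."""

    def __init__(self, scale_up_at=10, scale_down_at=2,
                 scale_up_step=2, scale_down_step=1,
                 min_instances=1, max_instances=10, cooldown_seconds=300):
        self.scale_up_at = scale_up_at
        self.scale_down_at = scale_down_at
        self.scale_up_step = scale_up_step
        self.scale_down_step = scale_down_step
        self.min_instances = min_instances
        self.max_instances = max_instances
        self.cooldown_seconds = cooldown_seconds
        self.last_action = 0.0  # timestamp of the last scaling change

    def decide(self, queue_depth, current, now=None):
        """Return the new instance count given current queue depth."""
        now = time.time() if now is None else now
        # Cooldown: after a change, hold steady to avoid thrashing.
        if now - self.last_action < self.cooldown_seconds:
            return current
        if queue_depth > self.scale_up_at:
            new = min(self.max_instances, current + self.scale_up_step)
        elif queue_depth < self.scale_down_at:
            new = max(self.min_instances, current - self.scale_down_step)
        else:
            return current  # within the comfort band: no change
        if new != current:
            self.last_action = now
        return new
```

With the defaults above, a queue of 15 runs moves a 2-worker pool up to 4; a second check 100 seconds later is ignored because the 300-second cooldown has not elapsed.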