# Autoscaling
Automatically scale agent workloads based on queue depth, CPU usage, or custom metrics. Ensure your agents can handle traffic spikes.
## What is Autoscaling?
Autoscaling automatically adjusts the number of worker instances based on demand. When the queue grows, more workers spin up. When demand drops, workers scale down to save resources.
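The core idea can be sketched as a target-based calculation: pick enough workers so that each one handles roughly `target_value` pending runs, clamped to the configured bounds. This is a minimal illustration, not the platform's actual algorithm; the function and parameter names are hypothetical.

```python
import math

def desired_instances(queue_depth, target_per_worker, min_instances, max_instances):
    """Return the worker count that keeps per-worker load near the target.

    queue_depth: pending runs in the queue
    target_per_worker: desired pending runs per worker (the rule's target_value)
    """
    if queue_depth <= 0:
        needed = min_instances  # nothing queued: sit at the floor
    else:
        needed = math.ceil(queue_depth / target_per_worker)
    # Clamp to the rule's min/max bounds.
    return max(min_instances, min(max_instances, needed))
```

For example, with a target of 10 runs per worker and bounds of 1–10, a queue of 25 pending runs yields 3 workers, while a queue of 200 is capped at the maximum of 10.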
## Create a Scaling Rule
```bash
curl -X POST http://localhost:8000/api/v1/scaling/rules \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "runtime-worker-scaling",
    "metric": "queue_depth",
    "target_value": 10,
    "min_instances": 1,
    "max_instances": 10,
    "cooldown_seconds": 300,
    "scale_up_step": 2,
    "scale_down_step": 1
  }'
```

## Available Metrics
| Metric | Description |
|---|---|
| `queue_depth` | Number of pending runs in the Celery queue |
| `cpu_usage` | CPU utilization percentage of worker nodes |
| `memory_usage` | Memory utilization percentage |
| `latency_p99` | 99th percentile response latency |
Start with `queue_depth` as your primary metric: scale up when the queue exceeds 10 pending runs, scale down when it falls below 2, and set a 5-minute cooldown so the scaler doesn't thrash between adding and removing workers.
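Putting the thresholds, step sizes, and cooldown together, a scaler's decision step might look like the sketch below. This is an illustrative model of the behavior described above, not the platform's implementation; the `Scaler` class and its method names are assumptions.

```python
import time

class Scaler:
    """Threshold-based scaling with step sizes and a cooldown period."""

    def __init__(self, scale_up_at=10, scale_down_at=2,
                 scale_up_step=2, scale_down_step=1,
                 min_instances=1, max_instances=10, cooldown_seconds=300):
        self.scale_up_at = scale_up_at
        self.scale_down_at = scale_down_at
        self.scale_up_step = scale_up_step
        self.scale_down_step = scale_down_step
        self.min_instances = min_instances
        self.max_instances = max_instances
        self.cooldown_seconds = cooldown_seconds
        self.last_action = 0.0  # timestamp of the last scaling change

    def decide(self, queue_depth, current, now=None):
        """Return the new instance count given current queue depth."""
        now = time.time() if now is None else now
        # Cooldown: after a change, hold steady to avoid thrashing.
        if now - self.last_action < self.cooldown_seconds:
            return current
        if queue_depth > self.scale_up_at:
            new = min(self.max_instances, current + self.scale_up_step)
        elif queue_depth < self.scale_down_at:
            new = max(self.min_instances, current - self.scale_down_step)
        else:
            return current  # within the comfort band: no change
        if new != current:
            self.last_action = now
        return new
```

With the defaults above, a queue of 15 runs moves a 2-worker pool up to 4; a second check 100 seconds later is ignored because the 300-second cooldown has not elapsed.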