# Rate Limits
Relay enforces rate limits to ensure fair usage and platform stability. This page explains how rate limiting works and how to handle rate limit errors.
## Rate Limit Structure
Rate limits are applied per app and scoped to your organization. Different organizations have different limits based on their plan.
### Standard Limits (Free/Starter Plan)
| Limit | Value | Notes |
|---|---|---|
| Events per minute | 60 | Per app |
| Events per hour | 3,600 | Per app |
| Max concurrent events | 10 | Per app |
| Payload size | 64KB | Per event |
### Scale Plan Limits
| Limit | Value | Notes |
|---|---|---|
| Events per minute | 300 | Per app |
| Events per hour | 18,000 | Per app |
| Max concurrent events | 50 | Per app |
| Payload size | 64KB | Per event |
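The 64KB payload cap applies on every plan, so it's worth checking client-side before sending. A minimal sketch, assuming the limit applies to the UTF-8 encoded JSON body (the exact measurement Relay uses is an assumption here):

```python
import json

MAX_PAYLOAD_BYTES = 64 * 1024  # 64KB per-event limit from the tables above

def payload_within_limit(payload):
    """Return True if the JSON-serialized payload fits within 64KB.

    Measures the UTF-8 encoded JSON body (an assumption about how the
    limit is enforced), not the size of the Python object itself.
    """
    encoded = json.dumps(payload).encode("utf-8")
    return len(encoded) <= MAX_PAYLOAD_BYTES
```

Checking before sending lets you reject or split oversized events locally instead of burning a token on a request that will fail.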
### Enterprise Plan Limits
Contact sales for custom limits.
## How Rate Limiting Works
Relay uses a token bucket algorithm:
- Your app starts with a budget of tokens
- Each event sent costs 1 token
- Tokens are refilled every minute at a fixed rate
- When you run out of tokens, new events are rejected
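The steps above can be modeled client-side. This is a minimal sketch of the described behavior (whole-budget refill at each minute boundary), not Relay's server implementation:

```python
import time

class TokenBucket:
    """Minimal client-side model of the per-minute budget described above."""

    def __init__(self, capacity=60, now=None):
        self.capacity = capacity  # tokens granted each minute
        self.tokens = capacity    # start with a full budget
        self.window_start = now if now is not None else time.time()

    def try_send(self, now=None):
        """Spend one token if available; refill at the minute boundary."""
        now = now if now is not None else time.time()
        if now - self.window_start >= 60:
            self.tokens = self.capacity  # full refill, per the refill rule above
            self.window_start = now
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False  # this event would be rejected with RATE_LIMITED
```

Mirroring the server's budget locally lets you predict rejections before they happen rather than reacting to errors.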
### Example: 60 Events Per Minute

```text
Start of minute 1:
  Budget: 60 tokens
  You send 30 events (30 remaining)

After 30 seconds:
  Budget: 30 (no new tokens yet)
  You send 20 events (10 remaining)

Start of minute 2:
  Budget: 60 tokens (refilled)
  You send 50 events (10 remaining)
```
## Rate Limit Error Response

When you exceed your rate limit, you receive:

```json
{
  "type": "error",
  "event_id": null,
  "agent_id": "athena",
  "error": "Rate limit exceeded: 60 events per minute (Starter plan)",
  "code": "RATE_LIMITED"
}
```
The response includes:
- `code`: Always `"RATE_LIMITED"`
- `error`: Human-readable message with your limit
- HTTP 429 status code
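Client code can branch on these fields when classifying responses. A minimal sketch based on the error shape above:

```python
import json

def classify_response(raw):
    """Classify a raw Relay response message (sketch based on the error shape above)."""
    data = json.loads(raw)
    if data.get("type") == "error" and data.get("code") == "RATE_LIMITED":
        return "rate_limited"  # back off before resending
    if data.get("type") == "error":
        return "error"         # some other error code
    return "ok"
```

Branching on the `code` field rather than parsing the `error` string keeps your handling stable if the human-readable message changes.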
## Handling Rate Limits

### Option 1: Queue Internally
Buffer events and send them at a steady rate:
```python
import asyncio
import json
import time
from collections import deque

class EventQueue:
    def __init__(self, websocket, events_per_minute=60):
        self.websocket = websocket
        self.min_interval = 60 / events_per_minute  # seconds between events
        self.queue = deque()
        self.last_sent_time = 0.0

    async def enqueue(self, agent_id, thread_id, payload):
        """Add event to queue"""
        self.queue.append((agent_id, thread_id, payload))
        await self.process_queue()

    async def process_queue(self):
        """Send queued events at the rate limit"""
        while self.queue:
            agent_id, thread_id, payload = self.queue.popleft()

            # Wait if needed to respect the rate limit
            time_since_last = time.time() - self.last_sent_time
            if time_since_last < self.min_interval:
                await asyncio.sleep(self.min_interval - time_since_last)

            event = {
                "type": "event",
                "agent_id": agent_id,
                "thread_id": thread_id,
                "payload": payload,
            }
            await self.websocket.send(json.dumps(event))
            self.last_sent_time = time.time()

# Usage
queue = EventQueue(websocket, events_per_minute=60)

# Callers can enqueue events as fast as they want...
await queue.enqueue("athena", "task-123", payload)
await queue.enqueue("klyve", "task-124", payload)
await queue.enqueue("athena", "task-125", payload)
# ...they're automatically throttled to 60/min
```
### Option 2: Retry with Backoff
When rate-limited, back off and retry:
```python
import asyncio
import json

async def send_with_rate_limit_handling(websocket, agent_id, thread_id, payload):
    """Send event, backing off if rate limited"""
    max_retries = 3
    backoff_seconds = 10  # base delay for rate-limit retries

    for attempt in range(max_retries):
        try:
            event = {
                "type": "event",
                "agent_id": agent_id,
                "thread_id": thread_id,
                "payload": payload,
            }
            await websocket.send(json.dumps(event))

            response = await websocket.recv()
            data = json.loads(response)

            if data.get("code") == "RATE_LIMITED":
                if attempt < max_retries - 1:
                    delay = backoff_seconds + (2 ** attempt)  # 11s, 12s, 14s
                    print(f"Rate limited, retrying in {delay}s...")
                    await asyncio.sleep(delay)
                    continue
                else:
                    return {"error": "Rate limited after retries"}

            return data
        except Exception as e:
            return {"error": str(e)}
```
### Option 3: Upgrade Plan
If you consistently hit rate limits, upgrade to a higher plan:
Starter → Scale:
- Events/minute: 60 → 300
- Max concurrent: 10 → 50
- Cost: $xx/month
Contact sales@relay.ckgworks.com to upgrade.
## Monitoring Rate Limit Usage

### Headers (If Implemented)
Relay may return rate limit info in response headers:
```text
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1712510460
```

- `X-RateLimit-Limit`: Your limit (events/minute)
- `X-RateLimit-Remaining`: Events remaining before the limit
- `X-RateLimit-Reset`: Unix timestamp when the limit resets
Use these to predict when you'll hit the limit:
```python
def should_send_event(headers):
    """Check if it's safe to send another event"""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    limit = int(headers.get("X-RateLimit-Limit", 60))

    # Keep a reserve: at least 5 events, or 10% of the limit
    threshold = max(5, limit // 10)
    return remaining > threshold
```
### Dashboard Metrics
In the Relay dashboard, check Metrics for your app:
- Events sent (per minute, hour, day)
- Events rejected (rate limited)
- P50/P95/P99 latency
- Error rates by code
Use this to understand your usage pattern.
### Logging
Log your event sending rate to track patterns:
```python
import asyncio
import logging
import time
from collections import deque

logger = logging.getLogger(__name__)

class RateLimitMonitor:
    def __init__(self, window_seconds=60, limit_per_minute=60):
        self.window_seconds = window_seconds
        self.limit_per_minute = limit_per_minute
        self.timestamps = deque()

    def record_event(self):
        """Record that an event was sent"""
        now = time.time()
        self.timestamps.append(now)
        # Drop entries older than the window
        while self.timestamps and self.timestamps[0] < now - self.window_seconds:
            self.timestamps.popleft()

    def get_rate(self):
        """Get current events per minute"""
        return len(self.timestamps) * (60 / self.window_seconds)

    def log_stats(self):
        """Log current usage, warning when close to the limit"""
        rate = self.get_rate()
        logger.info(f"Event rate: {rate:.1f} events/min")
        if rate > self.limit_per_minute * 0.8:  # within 80% of the limit
            logger.warning(
                f"Approaching rate limit: {rate:.1f}/{self.limit_per_minute} events/min"
            )

# Usage
monitor = RateLimitMonitor()

async def send_event_with_monitoring(ws, agent_id, thread_id, payload):
    await send_event(ws, agent_id, thread_id, payload)
    monitor.record_event()

async def report_stats_every_minute():
    while True:
        await asyncio.sleep(60)
        monitor.log_stats()
```
## Best Practices

### Steady-State Sending
Design your app to send events at a steady rate, not in bursts:
```python
# Bad: all events at once
async def bad_pattern(ws):
    for i in range(100):
        await send_event(ws, "athena", f"task-{i}", payload)

# Good: spread events over time
async def good_pattern(ws):
    for i in range(100):
        await send_event(ws, "athena", f"task-{i}", payload)
        await asyncio.sleep(1)  # spread over ~100 seconds
```
### Batch Requests Appropriately
Instead of sending one event per user action, batch when possible:
```python
# Bad: one event per keystroke
async def on_keystroke(char):
    payload = {"message": char}
    await send_event(ws, "athena", thread_id, payload)

# Good: batch by request
async def on_user_submit():
    full_message = get_accumulated_text()
    payload = {"message": full_message}
    await send_event(ws, "athena", thread_id, payload)
```
### Queue Management
Always queue events internally if you can't send immediately:
```python
import asyncio

class SmartQueue:
    def __init__(self, rate_limit=60):
        self.rate_limit = rate_limit
        self.queue = asyncio.Queue()
        self.sending = False

    async def enqueue(self, event):
        """Enqueue event for sending"""
        await self.queue.put(event)
        if not self.sending:
            asyncio.create_task(self.process_queue())

    async def process_queue(self):
        """Send queued events respecting the rate limit"""
        self.sending = True
        interval = 60 / self.rate_limit
        try:
            while True:
                event = await asyncio.wait_for(self.queue.get(), timeout=1)
                await send_to_relay(event)  # your send function
                await asyncio.sleep(interval)
        except asyncio.TimeoutError:
            # Queue drained; stop until the next enqueue restarts us
            self.sending = False
```
### Upgrading Your Plan
If you consistently need more than 60 events/minute:
- Check dashboard for usage patterns
- Contact sales at sales@relay.ckgworks.com
- Upgrade to Scale plan (300 events/minute)
- Or request Enterprise plan for custom limits
Upgrades take effect immediately.
## Edge Cases

### Concurrent Events
"Max concurrent events" limits how many events can be processed simultaneously:
```python
# This respects the 60/min limit but can violate max concurrent (10):
for i in range(20):
    await websocket.send(json.dumps(event))  # all sent immediately

# Better: stagger sending
for i in range(20):
    await websocket.send(json.dumps(event))
    await asyncio.sleep(0.1)
```
### Multiple Apps
Each app has its own rate limit budget:
```text
App A (Portal):
- 60 events/min budget
- Currently using 40/min → 20 available

App B (Academy):
- 60 events/min budget
- Currently using 55/min → 5 available

App C (Flow):
- 60 events/min budget
- Currently using 0/min → 60 available
```
Limits don't share across apps.
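Because budgets are per app, a client that talks to several apps should track usage per app rather than globally. A minimal sketch (the app IDs are illustrative, not fixed identifiers):

```python
import time
from collections import defaultdict, deque

class PerAppRateTracker:
    """Track events/min separately per app, since budgets don't share."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.sent = defaultdict(deque)  # app_id -> timestamps of sent events

    def record(self, app_id, now=None):
        """Record one event sent on behalf of this app."""
        now = now if now is not None else time.time()
        q = self.sent[app_id]
        q.append(now)
        while q and q[0] < now - self.window:
            q.popleft()  # drop entries outside the window

    def rate(self, app_id, now=None):
        """Events sent by this app within the last window."""
        now = now if now is not None else time.time()
        q = self.sent[app_id]
        while q and q[0] < now - self.window:
            q.popleft()
        return len(q)
```

With per-app counts in hand, one busy app never causes you to throttle traffic destined for another.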
## Troubleshooting
**Hitting rate limit even though usage seems low?**
- Check if you have multiple app instances (all instances share the same per-app limit)
- Monitor actual events/min in the dashboard
- Look for retries being counted (a failed attempt plus a retry = 2 events)
- Check if tests are accidentally sending real events
**Want to send more than your plan allows?**
- Implement internal queueing (see examples above)
- Contact sales to upgrade your plan
- For temporary spikes, contact sales about a temporary whitelist
**Rate limit reset timing?**
- Limits reset every 60 seconds
- It's a sliding window (not calendar-based)
- If you hit the limit at T=0s, you get new tokens at T=60s
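If the `X-RateLimit-Reset` header is available (see the Monitoring section above), you can compute exactly how long to wait instead of guessing. A small sketch:

```python
import time

def seconds_until_reset(headers, now=None):
    """Seconds to wait before the budget refills, per X-RateLimit-Reset.

    X-RateLimit-Reset is a Unix timestamp (see the headers section above).
    Returns 0 if the reset has already passed or the header is absent.
    """
    now = now if now is not None else time.time()
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        return 0.0
    return max(0.0, float(reset) - now)
```

Sleeping for exactly this duration after a 429 avoids both premature retries (which waste an attempt) and unnecessarily long backoffs.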
## Best Practices Summary

### Do
- Queue events internally
- Send steadily over time
- Monitor usage in dashboard
- Upgrade if consistently at limit
- Handle RATE_LIMITED errors
### Don't
- Send events in bursts
- Retry rate-limited events immediately
- Hardcode rate limits in code
- Assume high burst capacity
- Ignore rate limit responses