Rate Limits

Relay enforces rate limits to ensure fair usage and platform stability. This page explains how rate limiting works and how to handle rate limit errors.

Rate Limit Structure

Rate limits are applied per app and scoped to your organization. Different organizations have different limits based on their plan.

Standard Limits (Free/Starter Plan)

Limit                  Value    Notes
Events per minute      60       Per app
Events per hour        3,600    Per app
Max concurrent events  10       Per app
Payload size           64 KB    Per event
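
The 64 KB payload cap can be guarded client-side before sending. This is an illustrative sketch; `MAX_PAYLOAD_BYTES` and `payload_within_limit` are hypothetical names, not part of any Relay SDK:

```python
import json

MAX_PAYLOAD_BYTES = 64 * 1024  # 64 KB per-event cap from the table above

def payload_within_limit(payload):
    """Return True if the JSON-serialized payload fits under the cap."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_PAYLOAD_BYTES
```

Checking before sending avoids burning a token on an event the server will reject anyway.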

Scale Plan Limits

Limit                  Value    Notes
Events per minute      300      Per app
Events per hour        18,000   Per app
Max concurrent events  50       Per app
Payload size           64 KB    Per event

Enterprise Plan Limits

Contact sales for custom limits.

How Rate Limiting Works

Relay uses a token bucket algorithm:

  1. Your app starts with a budget of tokens
  2. Each event sent costs 1 token
  3. Tokens are refilled every minute at a fixed rate
  4. When you run out of tokens, new events are rejected
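
The steps above can be sketched as a minimal client-side model. The `TokenBucket` class is illustrative only; the actual server-side implementation may differ:

```python
import time

class TokenBucket:
    """Minimal model of the refill behavior described above:
    a full refill of the budget once per interval."""

    def __init__(self, budget=60, refill_seconds=60):
        self.budget = budget
        self.tokens = budget
        self.refill_seconds = refill_seconds
        self.last_refill = time.monotonic()

    def try_send(self):
        now = time.monotonic()
        if now - self.last_refill >= self.refill_seconds:
            self.tokens = self.budget  # Refill to the full budget
            self.last_refill = now
        if self.tokens > 0:
            self.tokens -= 1           # Each event costs 1 token
            return True
        return False                   # Out of tokens: event rejected
```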

Example: 60 Events Per Minute

Start of minute 1:
Budget: 60 tokens
You send 30 events (30 remaining)

After 30 seconds:
Budget: 30 (no new tokens yet)
You send 20 events (10 remaining)

Start of minute 2:
Budget: 60 tokens (refilled)
You send 50 events (10 remaining)

Rate Limit Error Response

When you exceed your rate limit, you receive:

{
  "type": "error",
  "event_id": null,
  "agent_id": "athena",
  "error": "Rate limit exceeded: 60 events per minute (Starter plan)",
  "code": "RATE_LIMITED"
}

The response includes:

  • code: Always "RATE_LIMITED"
  • error: Human-readable message that includes your limit
  • HTTP requests also receive a 429 status code
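
A minimal sketch of detecting this error when parsing a response. `RateLimitError` is a hypothetical helper for illustration, not part of any Relay SDK:

```python
import json

class RateLimitError(Exception):
    """Raised when Relay reports code == "RATE_LIMITED"."""

def check_response(raw):
    """Parse a raw Relay response; raise if it is a rate limit error."""
    data = json.loads(raw)
    if data.get("code") == "RATE_LIMITED":
        raise RateLimitError(data.get("error", "Rate limit exceeded"))
    return data
```

Raising a dedicated exception lets callers route rate limit errors to backoff logic (see below) separately from other failures.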

Handling Rate Limits

Option 1: Queue Internally

Buffer events and send them at a steady rate:

import asyncio
import json
import time
from collections import deque

class EventQueue:
    def __init__(self, websocket, events_per_minute=60):
        self.websocket = websocket
        self.events_per_minute = events_per_minute
        self.min_interval = 60 / events_per_minute  # seconds per event
        self.queue = deque()
        self.last_sent_time = 0

    async def enqueue(self, agent_id, thread_id, payload):
        """Add event to queue"""
        self.queue.append((agent_id, thread_id, payload))
        await self.process_queue()

    async def process_queue(self):
        """Send queued events at the rate limit"""
        while self.queue:
            agent_id, thread_id, payload = self.queue.popleft()

            # Wait if needed to respect the rate limit
            time_since_last = time.time() - self.last_sent_time
            if time_since_last < self.min_interval:
                await asyncio.sleep(self.min_interval - time_since_last)

            event = {
                "type": "event",
                "agent_id": agent_id,
                "thread_id": thread_id,
                "payload": payload,
            }

            await self.websocket.send(json.dumps(event))
            self.last_sent_time = time.time()

# Usage
queue = EventQueue(websocket, events_per_minute=60)

# Callers can enqueue events as fast as they want
await queue.enqueue("athena", "task-123", payload)
await queue.enqueue("klyve", "task-124", payload)
await queue.enqueue("athena", "task-125", payload)

# They're automatically throttled to 60/min

Option 2: Retry with Backoff

When rate-limited, back off and retry:

import asyncio
import json

async def send_with_rate_limit_handling(websocket, agent_id, thread_id, payload):
    """Send event, backing off if rate limited"""
    max_retries = 3
    backoff_seconds = 10  # Start around 10s for rate limits

    for attempt in range(max_retries):
        try:
            event = {"type": "event", "agent_id": agent_id, "thread_id": thread_id, "payload": payload}
            await websocket.send(json.dumps(event))

            response = await websocket.recv()
            data = json.loads(response)

            if data.get("code") == "RATE_LIMITED":
                if attempt < max_retries - 1:
                    delay = backoff_seconds + (2 ** attempt)  # 11s, then 12s
                    print(f"Rate limited, retrying in {delay}s...")
                    await asyncio.sleep(delay)
                    continue
                else:
                    return {"error": "Rate limited after retries"}

            return data

        except Exception as e:
            return {"error": str(e)}

Option 3: Upgrade Plan

If you consistently hit rate limits, upgrade to a higher plan:

Starter → Scale:

  • Events/minute: 60 → 300
  • Max concurrent: 10 → 50
  • Cost: $xx/month

Contact sales@relay.ckgworks.com to upgrade.

Monitoring Rate Limit Usage

Headers (If Implemented)

Relay may return rate limit info in response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1712510460

  • X-RateLimit-Limit: Your limit (events/minute)
  • X-RateLimit-Remaining: Events remaining before you hit the limit
  • X-RateLimit-Reset: Unix timestamp when the limit resets

Use these to predict when you'll hit the limit:

def should_send_event(headers):
    """Check if it's safe to send another event"""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    limit = int(headers.get("X-RateLimit-Limit", 60))

    # Buffer: keep at least 5 events (or 10% of the limit) in reserve
    threshold = max(5, limit // 10)

    return remaining > threshold

Dashboard Metrics

In the Relay dashboard, check Metrics for your app:

  • Events sent (per minute, hour, day)
  • Events rejected (rate limited)
  • P50/P95/P99 latency
  • Error rates by code

Use this to understand your usage pattern.

Logging

Log your event sending rate to track patterns:

import asyncio
import logging
import time
from collections import deque

logger = logging.getLogger(__name__)

class RateLimitMonitor:
    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record_event(self):
        """Record that an event was sent"""
        now = time.time()
        self.timestamps.append(now)

        # Drop entries that have aged out of the window
        while self.timestamps and self.timestamps[0] < now - self.window_seconds:
            self.timestamps.popleft()

    def get_rate(self):
        """Get current events per minute"""
        return len(self.timestamps) * (60 / self.window_seconds)

    def log_stats(self):
        """Log current usage"""
        rate = self.get_rate()
        logger.info(f"Event rate: {rate:.1f} events/min")

        if rate > 50:  # Warn when approaching the 60/min limit
            logger.warning(f"Approaching rate limit: {rate:.1f}/60 events/min")

# Usage
monitor = RateLimitMonitor()

async def send_event_with_monitoring(ws, agent_id, thread_id, payload):
    await send_event(ws, agent_id, thread_id, payload)
    monitor.record_event()

# Log stats every minute
while True:
    await asyncio.sleep(60)
    monitor.log_stats()

Best Practices

Steady-State Sending

Design your app to send events at a steady rate, not in bursts:

# Bad: all events at once
async def bad_pattern(ws):
    for i in range(100):
        await send_event(ws, "athena", f"task-{i}", payload)

# Good: spread events over time
async def good_pattern(ws):
    for i in range(100):
        await send_event(ws, "athena", f"task-{i}", payload)
        await asyncio.sleep(1)  # Spread over ~100 seconds

Batch Requests Appropriately

Instead of sending one event per user action, batch when possible:

# Bad: one event per keystroke
async def on_keystroke(char):
    payload = {"message": char}
    await send_event(ws, "athena", thread_id, payload)

# Good: one event per submitted request
async def on_user_submit():
    full_message = get_accumulated_text()
    payload = {"message": full_message}
    await send_event(ws, "athena", thread_id, payload)

Queue Management

Always queue events internally if you can't send immediately:

class SmartQueue:
    def __init__(self, rate_limit=60):
        self.rate_limit = rate_limit
        self.queue = asyncio.Queue()
        self.sending = False

    async def enqueue(self, event):
        """Enqueue event for sending"""
        await self.queue.put(event)

        if not self.sending:
            # Mark before scheduling so a second enqueue can't start a duplicate task
            self.sending = True
            asyncio.create_task(self.process_queue())

    async def process_queue(self):
        """Send queued events respecting the rate limit"""
        interval = 60 / self.rate_limit

        try:
            while True:
                event = await asyncio.wait_for(self.queue.get(), timeout=1)
                await send_to_relay(event)
                await asyncio.sleep(interval)
        except asyncio.TimeoutError:
            self.sending = False

Upgrading Your Plan

If you consistently need more than 60 events/minute:

  1. Check dashboard for usage patterns
  2. Contact sales at sales@relay.ckgworks.com
  3. Upgrade to Scale plan (300 events/minute)
  4. Or request Enterprise plan for custom limits

Upgrades take effect immediately.

Edge Cases

Concurrent Events

"Max concurrent events" limits how many events can be processed simultaneously:

# This respects the 60/min limit but can exceed max concurrent (10):
for i in range(20):
    await websocket.send(json.dumps(event))  # All 20 sent immediately

# Better: stagger sending so fewer events are in flight at once
for i in range(20):
    await websocket.send(json.dumps(event))
    await asyncio.sleep(0.1)

Multiple Apps

Each app has its own rate limit budget:

App A (Portal):
- 60 events/min budget
- Currently using 40/min → 20 available

App B (Academy):
- 60 events/min budget
- Currently using 55/min → 5 available

App C (Flow):
- 60 events/min budget
- Currently using 0/min → 60 available

Limits are not shared across apps.

Troubleshooting

Hitting rate limit even though usage seems low?

  • Check if you have multiple app instances (each has own limit)
  • Monitor actual events/min in dashboard
  • Look for retries being counted (failed + retry = 2 events)
  • Check if tests are accidentally sending real events

Want to send more than your plan allows?

  • Implement internal queueing (see examples above)
  • Contact sales to upgrade plan
  • For temporary spikes, contact us about a temporary whitelist

Rate limit reset timing?

  • Limits reset every 60 seconds
  • It's a sliding window (not calendar-based)
  • If you hit limit at T=0s, you get new tokens at T=60s
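
If the X-RateLimit-Reset header is available (see Monitoring above, where it is described as optional), the wait until the next refill can be computed. A sketch, assuming the header carries a Unix timestamp as documented:

```python
import time

def seconds_until_reset(headers, now=None):
    """Seconds to wait before tokens refill, per X-RateLimit-Reset.

    Returns 0.0 if the header is missing, malformed, or in the past.
    """
    now = time.time() if now is None else now
    try:
        reset = int(headers.get("X-RateLimit-Reset", 0))
    except (TypeError, ValueError):
        return 0.0
    return max(0.0, reset - now)
```

Sleeping for this duration before retrying avoids the guesswork of fixed backoff delays.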

Best Practices Summary

Do

  • Queue events internally
  • Send steadily over time
  • Monitor usage in dashboard
  • Upgrade if consistently at limit
  • Handle RATE_LIMITED errors

Don't

  • Send events in bursts
  • Retry rate-limited events immediately
  • Hardcode rate limits in code
  • Assume high burst capacity
  • Ignore rate limit responses