Rate Limits

Relay enforces rate limits to ensure fair usage and platform stability. This page explains how rate limiting works and how to handle rate limit errors.

Rate Limit Structure

Rate limits are applied per app and scoped to your organization. Different organizations have different limits based on their plan.

Standard Limits (Free/Starter Plan)

Limit                  Value    Notes
Events per minute      60       Per app
Events per hour        3,600    Per app
Max concurrent events  10       Per app
Payload size           64 KB    Per event
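
The 64 KB payload cap can be guarded client-side before sending. This is an illustrative sketch; `MAX_PAYLOAD_BYTES` and `payload_within_limit` are hypothetical names, not part of any Relay SDK:

```python
import json

MAX_PAYLOAD_BYTES = 64 * 1024  # 64 KB per-event cap from the table above

def payload_within_limit(payload):
    """Return True if the JSON-serialized payload fits under the cap."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_PAYLOAD_BYTES
```

Checking before sending avoids burning a token on an event the server will reject anyway.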

Scale Plan Limits

Limit                  Value    Notes
Events per minute      300      Per app
Events per hour        18,000   Per app
Max concurrent events  50       Per app
Payload size           64 KB    Per event

Enterprise Plan Limits

Contact sales for custom limits.

How Rate Limiting Works

Relay uses a token bucket algorithm:

  1. Your app starts with a budget of tokens
  2. Each event sent costs 1 token
  3. Tokens are refilled every minute at a fixed rate
  4. When you run out of tokens, new events are rejected
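
The steps above can be sketched as a minimal client-side model. The `TokenBucket` class is illustrative only; the actual server-side implementation may differ:

```python
import time

class TokenBucket:
    """Minimal model of the refill behavior described above:
    a full refill of the budget once per interval."""

    def __init__(self, budget=60, refill_seconds=60):
        self.budget = budget
        self.tokens = budget
        self.refill_seconds = refill_seconds
        self.last_refill = time.monotonic()

    def try_send(self):
        now = time.monotonic()
        if now - self.last_refill >= self.refill_seconds:
            self.tokens = self.budget  # Refill to the full budget
            self.last_refill = now
        if self.tokens > 0:
            self.tokens -= 1           # Each event costs 1 token
            return True
        return False                   # Out of tokens: event rejected
```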

Example: 60 Events Per Minute

Start of minute 1:
Budget: 60 tokens
You send 30 events (30 remaining)

After 30 seconds:
Budget: 30 (no new tokens yet)
You send 20 events (10 remaining)

Start of minute 2:
Budget: 60 tokens (refilled)
You send 50 events (10 remaining)

Rate Limit Error Response

When you exceed your rate limit, you receive:

{
  "type": "error",
  "event_id": null,
  "agent_id": "athena",
  "error": "Rate limit exceeded: 60 events per minute (Starter plan)",
  "code": "RATE_LIMITED"
}

The response includes:

  • code: Always "RATE_LIMITED"
  • error: Human-readable message that includes your limit
  • HTTP requests also receive a 429 status code
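
A minimal sketch of detecting this error when parsing a response. `RateLimitError` is a hypothetical helper for illustration, not part of any Relay SDK:

```python
import json

class RateLimitError(Exception):
    """Raised when Relay reports code == "RATE_LIMITED"."""

def check_response(raw):
    """Parse a raw Relay response; raise if it is a rate limit error."""
    data = json.loads(raw)
    if data.get("code") == "RATE_LIMITED":
        raise RateLimitError(data.get("error", "Rate limit exceeded"))
    return data
```

Raising a dedicated exception lets callers route rate limit errors to backoff logic (see below) separately from other failures.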

Handling Rate Limits

Option 1: Queue Internally

Buffer events and send them at a steady rate:

import asyncio
import json
import time
from collections import deque

class EventQueue:
    def __init__(self, websocket, events_per_minute=60):
        self.websocket = websocket
        self.events_per_minute = events_per_minute
        self.min_interval = 60 / events_per_minute  # seconds per event
        self.queue = deque()
        self.last_sent_time = 0

    async def enqueue(self, agent_id, thread_id, payload):
        """Add event to queue"""
        self.queue.append((agent_id, thread_id, payload))
        await self.process_queue()

    async def process_queue(self):
        """Send queued events at the rate limit"""
        while self.queue:
            agent_id, thread_id, payload = self.queue.popleft()

            # Wait if needed to respect the rate limit
            time_since_last = time.time() - self.last_sent_time
            if time_since_last < self.min_interval:
                await asyncio.sleep(self.min_interval - time_since_last)

            event = {
                "type": "event",
                "agent_id": agent_id,
                "thread_id": thread_id,
                "payload": payload,
            }

            await self.websocket.send(json.dumps(event))
            self.last_sent_time = time.time()

# Usage
queue = EventQueue(websocket, events_per_minute=60)

# Callers can enqueue events as fast as they want
await queue.enqueue("athena", "task-123", payload)
await queue.enqueue("klyve", "task-124", payload)
await queue.enqueue("athena", "task-125", payload)

# They're automatically throttled to 60/min

Option 2: Retry with Backoff

When rate-limited, back off and retry:

import asyncio
import json

async def send_with_rate_limit_handling(websocket, agent_id, thread_id, payload):
    """Send event, backing off if rate limited"""
    max_retries = 3
    backoff_seconds = 10  # Start around 10s for rate limits

    for attempt in range(max_retries):
        try:
            event = {"type": "event", "agent_id": agent_id, "thread_id": thread_id, "payload": payload}
            await websocket.send(json.dumps(event))

            response = await websocket.recv()
            data = json.loads(response)

            if data.get("code") == "RATE_LIMITED":
                if attempt < max_retries - 1:
                    delay = backoff_seconds + (2 ** attempt)  # 11s, then 12s
                    print(f"Rate limited, retrying in {delay}s...")
                    await asyncio.sleep(delay)
                    continue
                else:
                    return {"error": "Rate limited after retries"}

            return data

        except Exception as e:
            return {"error": str(e)}

Option 3: Upgrade Plan

If you consistently hit rate limits, upgrade to a higher plan:

Starter → Scale:

  • Events/minute: 60 → 300
  • Max concurrent: 10 → 50
  • Cost: $xx/month

Contact sales@relay.ckgworks.com to upgrade.

Monitoring Rate Limit Usage

Headers (If Implemented)

Relay may return rate limit info in response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1712510460

  • X-RateLimit-Limit: Your limit (events/minute)
  • X-RateLimit-Remaining: Events remaining before you hit the limit
  • X-RateLimit-Reset: Unix timestamp when the limit resets

Use these to predict when you'll hit the limit:

def should_send_event(headers):
    """Check if it's safe to send another event"""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    limit = int(headers.get("X-RateLimit-Limit", 60))

    # Buffer: keep at least 5 events (or 10% of the limit) in reserve
    threshold = max(5, limit // 10)

    return remaining > threshold

Dashboard Metrics

In the Relay dashboard, check Metrics for your app:

  • Events sent (per minute, hour, day)
  • Events rejected (rate limited)
  • P50/P95/P99 latency
  • Error rates by code

Use this to understand your usage pattern.

Logging

Log your event sending rate to track patterns:

import asyncio
import logging
import time
from collections import deque

logger = logging.getLogger(__name__)

class RateLimitMonitor:
    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record_event(self):
        """Record that an event was sent"""
        now = time.time()
        self.timestamps.append(now)

        # Drop entries that have aged out of the window
        while self.timestamps and self.timestamps[0] < now - self.window_seconds:
            self.timestamps.popleft()

    def get_rate(self):
        """Get current events per minute"""
        return len(self.timestamps) * (60 / self.window_seconds)

    def log_stats(self):
        """Log current usage"""
        rate = self.get_rate()
        logger.info(f"Event rate: {rate:.1f} events/min")

        if rate > 50:  # Warn when approaching the 60/min limit
            logger.warning(f"Approaching rate limit: {rate:.1f}/60 events/min")

# Usage
monitor = RateLimitMonitor()

async def send_event_with_monitoring(ws, agent_id, thread_id, payload):
    await send_event(ws, agent_id, thread_id, payload)
    monitor.record_event()

# Log stats every minute
while True:
    await asyncio.sleep(60)
    monitor.log_stats()

Best Practices

Steady-State Sending

Design your app to send events at a steady rate, not in bursts:

# Bad: all events at once
async def bad_pattern(ws):
    for i in range(100):
        await send_event(ws, "athena", f"task-{i}", payload)

# Good: spread events over time
async def good_pattern(ws):
    for i in range(100):
        await send_event(ws, "athena", f"task-{i}", payload)
        await asyncio.sleep(1)  # Spread over ~100 seconds

Batch Requests Appropriately

Instead of sending one event per user action, batch when possible:

# Bad: one event per keystroke
async def on_keystroke(char):
    payload = {"message": char}
    await send_event(ws, "athena", thread_id, payload)

# Good: one event per submitted request
async def on_user_submit():
    full_message = get_accumulated_text()
    payload = {"message": full_message}
    await send_event(ws, "athena", thread_id, payload)

Queue Management

Always queue events internally if you can't send immediately:

class SmartQueue:
    def __init__(self, rate_limit=60):
        self.rate_limit = rate_limit
        self.queue = asyncio.Queue()
        self.sending = False

    async def enqueue(self, event):
        """Enqueue event for sending"""
        await self.queue.put(event)

        if not self.sending:
            # Mark before scheduling so a second enqueue can't start a duplicate task
            self.sending = True
            asyncio.create_task(self.process_queue())

    async def process_queue(self):
        """Send queued events respecting the rate limit"""
        interval = 60 / self.rate_limit

        try:
            while True:
                event = await asyncio.wait_for(self.queue.get(), timeout=1)
                await send_to_relay(event)
                await asyncio.sleep(interval)
        except asyncio.TimeoutError:
            self.sending = False

Upgrading Your Plan

If you consistently need more than 60 events/minute:

  1. Check dashboard for usage patterns
  2. Contact sales at sales@relay.ckgworks.com
  3. Upgrade to Scale plan (300 events/minute)
  4. Or request Enterprise plan for custom limits

Upgrades take effect immediately.

Edge Cases

Concurrent Events

"Max concurrent events" limits how many events can be processed simultaneously:

# This respects the 60/min limit but can exceed max concurrent (10):
for i in range(20):
    await websocket.send(json.dumps(event))  # All 20 sent immediately

# Better: stagger sending so fewer events are in flight at once
for i in range(20):
    await websocket.send(json.dumps(event))
    await asyncio.sleep(0.1)

Multiple Apps

Each app has its own rate limit budget:

App A (Portal):
- 60 events/min budget
- Currently using 40/min → 20 available

App B (Academy):
- 60 events/min budget
- Currently using 55/min → 5 available

App C (Flow):
- 60 events/min budget
- Currently using 0/min → 60 available

Limits are not shared across apps.

Troubleshooting

Hitting rate limit even though usage seems low?

  • Check if you have multiple app instances (each has own limit)
  • Monitor actual events/min in dashboard
  • Look for retries being counted (failed + retry = 2 events)
  • Check if tests are accidentally sending real events

Want to send more than your plan allows?

  • Implement internal queueing (see examples above)
  • Contact sales to upgrade plan
  • For temporary spikes, contact us about a temporary whitelist

Rate limit reset timing?

  • Limits reset every 60 seconds
  • It's a sliding window (not calendar-based)
  • If you hit limit at T=0s, you get new tokens at T=60s
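
If the X-RateLimit-Reset header is available (see Monitoring above, where it is described as optional), the wait until the next refill can be computed. A sketch, assuming the header carries a Unix timestamp as documented:

```python
import time

def seconds_until_reset(headers, now=None):
    """Seconds to wait before tokens refill, per X-RateLimit-Reset.

    Returns 0.0 if the header is missing, malformed, or in the past.
    """
    now = time.time() if now is None else now
    try:
        reset = int(headers.get("X-RateLimit-Reset", 0))
    except (TypeError, ValueError):
        return 0.0
    return max(0.0, reset - now)
```

Sleeping for this duration before retrying avoids the guesswork of fixed backoff delays.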

Best Practices Summary

Do

  • Queue events internally
  • Send steadily over time
  • Monitor usage in dashboard
  • Upgrade if consistently at limit
  • Handle RATE_LIMITED errors

Don't

  • Send events in bursts
  • Retry rate-limited events immediately
  • Hardcode rate limits in code
  • Assume high burst capacity
  • Ignore rate limit responses