Error Codes Reference

This reference lists the error codes Relay can return, explains when each one occurs, and describes how to handle it.

Error Code Table

Code	HTTP Status	Retryable	Meaning	Recommended Action
`AGENT_OFFLINE`	503	Yes	Agent not currently connected	Retry with exponential backoff
`NOT_ALLOWED`	403	No	App not allowlisted for agent	Check allowlist, contact admin
`AGENT_NOT_FOUND`	404	No	Agent does not exist in org	Verify agent_id, check registration
`AGENT_TIMEOUT`	504	Maybe	Agent took too long to respond	Retry once, then escalate
`AGENT_ERROR`	500	Maybe	Agent returned an error	Check agent logs, retry
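`PAYLOAD_TOO_LARGE`	413	No	Event payload exceeds 64KB	Reduce payload size, then resend
`RATE_LIMITED`	429	Yes	Event rate limit exceeded	Respect rate limit window, retry
`INVALID_MESSAGE`	400	No	Message malformed or missing fields	Fix message structure, then resend
`RELAY_INTERNAL_ERROR`	500	Yes	Unexpected error inside Relay	Retry with exponential backoff, then contact support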
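The table can be mirrored in code as a single lookup, so handlers consult one place instead of scattering string comparisons. The following is a minimal sketch: the names `ErrorInfo`, `ERROR_TABLE`, and `may_retry` are illustrative and not part of Relay, and representing "Maybe" as `None` is an assumption made for this example.

```python
from typing import NamedTuple, Optional

class ErrorInfo(NamedTuple):
    http_status: int
    retryable: Optional[bool]  # True = Yes, False = No, None = "Maybe"

# Values copied from this reference; the structure itself is hypothetical.
ERROR_TABLE = {
    "AGENT_OFFLINE": ErrorInfo(503, True),
    "NOT_ALLOWED": ErrorInfo(403, False),
    "AGENT_NOT_FOUND": ErrorInfo(404, False),
    "AGENT_TIMEOUT": ErrorInfo(504, None),
    "AGENT_ERROR": ErrorInfo(500, None),
    "PAYLOAD_TOO_LARGE": ErrorInfo(413, False),
    "RATE_LIMITED": ErrorInfo(429, True),
    "INVALID_MESSAGE": ErrorInfo(400, False),
    "RELAY_INTERNAL_ERROR": ErrorInfo(500, True),  # from its detailed section
}

def may_retry(code: str) -> bool:
    """Return True for 'Yes' and 'Maybe' codes; unknown codes are never retried."""
    info = ERROR_TABLE.get(code)
    return info is not None and info.retryable is not False
```

Treating unknown codes as non-retryable is a conservative design choice: a code this client does not recognize should surface to a developer rather than loop silently.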

Detailed Error Explanations

AGENT_OFFLINE

When it occurs: The target agent is not currently connected to Relay.

HTTP Status: 503 (Service Unavailable)

Is it retryable? Yes

Recommended action:

Wait 1-2 seconds before the first retry, then back off exponentially
Show user: "Athena is offline. Retrying..."
After 2-3 failed retries, tell the user to try again later
Never retry more than 5 times in total

if error_code == "AGENT_OFFLINE":
    # Retry with exponential backoff
    for attempt in range(3):
        await asyncio.sleep(2 ** (attempt + 1))  # 2s, 4s, 8s
        result = await send_event(ws, agent_id, thread_id, payload)
        if result.get("type") != "error":
            break

Possible causes:

Agent server crashed or went offline
Network connectivity issue between agent and Relay
Agent is being redeployed
Agent reached max concurrent connections

What the user sees: "Athena is currently offline. Please try again in a moment."
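The retry limits for AGENT_OFFLINE can be made explicit in a small helper that produces the wait before each attempt. This is a sketch under assumptions: `backoff_delays` is a hypothetical name, the 2-second base follows the snippet above, and the 30-second ceiling is an arbitrary illustrative value rather than a documented Relay setting.

```python
from typing import List

def backoff_delays(max_retries: int = 3, base: float = 2.0, ceiling: float = 30.0) -> List[float]:
    """Return the wait before each retry: base, 2*base, 4*base, ... capped at ceiling."""
    if not 0 <= max_retries <= 5:
        # This section advises never retrying AGENT_OFFLINE more than 5 times.
        raise ValueError("max_retries must be between 0 and 5")
    return [min(ceiling, base * 2 ** attempt) for attempt in range(max_retries)]
```

A caller would iterate over the returned list, sleep for each delay, and stop at the first non-error response.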

NOT_ALLOWED

When it occurs: Your app is not allowlisted for this agent.

HTTP Status: 403 (Forbidden)

Is it retryable? No

Recommended action:

Do NOT retry
Log as a permission error with app_id and agent_id
Disable the agent in your UI
Show user: "This agent is not available for your organization"
Contact your Relay admin to request allowlist access

if error_code == "NOT_ALLOWED":
    logger.error(f"Permission denied: {app_id} not allowed for {agent_id}")
    print("Contact your admin to enable access to this agent")
    # Disable agent in UI
    disable_agent_button()

Possible causes:

Agent operator has allowlist enabled but didn't add your app
Your app was removed from allowlist
Wrong agent_id specified

What the user sees: "You don't have access to this agent. Contact your administrator."

AGENT_TIMEOUT

When it occurs: The agent took too long to respond (exceeded timeout window).

HTTP Status: 504 (Gateway Timeout)

Is it retryable? Maybe

Recommended action:

Retry once after 2-5 seconds
If it times out again, escalate to infrastructure team
Limit total retry attempts to 1-2
Show user: "Agent is slow. Please try again."

if error_code == "AGENT_TIMEOUT":
    # Retry once
    await asyncio.sleep(3)
    result = await send_event(ws, agent_id, thread_id, payload)

    if result.get("code") == "AGENT_TIMEOUT":
        # Failed twice, give up
        return {"error": "Agent is not responding", "user_message": "Please try again in a moment"}

Possible causes:

Agent is overloaded with other requests
Agent's AI model is slow (large response)
Agent crashed or hung mid-processing
Network latency

What the user sees: "Athena is taking longer than expected. Please try again."

PAYLOAD_TOO_LARGE

When it occurs: Your event payload exceeds the 64KB limit.

HTTP Status: 413 (Payload Too Large)

Is it retryable? No

Recommended action:

Do NOT retry with same payload
Reduce payload size
Remove non-essential fields (e.g., full conversation history)
Truncate large text fields
Retry with trimmed payload

import json

def trim_payload(payload, max_size_bytes=64*1024):
    """Reduce payload to fit within size limit"""
    current_size = len(json.dumps(payload).encode('utf-8'))

    if current_size <= max_size_bytes:
        return payload

    # Try removing largest fields
    trimmed = payload.copy()

    # Remove full comment history, keep only recent
    if "comments" in trimmed and len(trimmed["comments"]) > 5:
        trimmed["comments"] = trimmed["comments"][-5:]

    # Truncate long text fields
    if "description" in trimmed:
        trimmed["description"] = trimmed["description"][:500]

    if "payload" in trimmed:
        del trimmed["payload"]

    new_size = len(json.dumps(trimmed).encode('utf-8'))

    if new_size <= max_size_bytes:
        return trimmed

    raise ValueError(f"Payload too large: {new_size} bytes")

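# Usage
# PayloadTooLargeError stands in for however your send_event wrapper
# surfaces a PAYLOAD_TOO_LARGE response; it is not defined in this snippet.
try:
    await send_event(ws, agent_id, thread_id, payload)
except PayloadTooLargeError:
    trimmed = trim_payload(payload)
    await send_event(ws, agent_id, thread_id, trimmed)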

Possible causes:

Payload includes entire conversation history
Payload includes large attachments/images as base64
Task has many comments or large descriptions
Debug/metadata fields included unnecessarily

What the user sees: (This is usually a backend error, not shown to user)
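Before deciding what to trim, it helps to see the serialized size and which top-level fields dominate it. The sketch below assumes the limit applies to the JSON-serialized payload, as `trim_payload` above does; this reference does not state exactly how Relay measures size. `payload_size_bytes` and `largest_fields` are hypothetical helper names.

```python
import json

MAX_PAYLOAD_BYTES = 64 * 1024  # the 64KB limit stated in this section

def payload_size_bytes(payload: dict) -> int:
    """Size of the payload once serialized to JSON and encoded as UTF-8."""
    return len(json.dumps(payload).encode("utf-8"))

def largest_fields(payload: dict, top: int = 3) -> list:
    """Return (field, size_in_bytes) pairs for the biggest top-level fields."""
    sizes = [(key, payload_size_bytes({key: value})) for key, value in payload.items()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]
```

Logging the output of `largest_fields` when a payload is rejected usually points straight at the cause, such as an embedded attachment or a full conversation history.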

RATE_LIMITED

When it occurs: Your app has exceeded its per-minute event limit.

HTTP Status: 429 (Too Many Requests)

Is it retryable? Yes, but respect the limit

Recommended action:

Immediately stop sending new events
Queue events internally
Resume sending at a lower rate (respecting the limit)
Wait at least 10 seconds before retrying
Use exponential backoff starting at 10 seconds

import asyncio
import json
import time

class RateLimitedClient:
    def __init__(self, events_per_minute=60):
        self.events_per_minute = events_per_minute
        self.min_interval = 60 / events_per_minute
        self.event_queue = asyncio.Queue()
        self.last_sent_time = 0

    async def send_event(self, ws, agent_id, thread_id, payload):
        """Queue event and send respecting rate limit"""
        await self.event_queue.put((agent_id, thread_id, payload))
        await self.process_queue(ws)

    async def process_queue(self, ws):
        """Process queue at rate limit"""
        while not self.event_queue.empty():
            agent_id, thread_id, payload = await self.event_queue.get()

            # Wait if needed to respect rate limit
            time_since_last = time.time() - self.last_sent_time
            if time_since_last < self.min_interval:
                await asyncio.sleep(self.min_interval - time_since_last)

            event = {
                "type": "event",
                "agent_id": agent_id,
                "thread_id": thread_id,
                "payload": payload
            }

            try:
                await ws.send(json.dumps(event))
                self.last_sent_time = time.time()
            except RateLimitError:
                # RateLimitError is a placeholder for however your client
                # surfaces a RATE_LIMITED response. Re-queue and wait.
                await self.event_queue.put((agent_id, thread_id, payload))
                await asyncio.sleep(10)

Possible causes:

Your app is sending events too fast
Testing/load testing without rate limit awareness
Buggy loop sending duplicate events
Sudden traffic spike

What the user sees: (Usually handled transparently with queueing)

Check your rate limit: See Rate Limits for your organization's limits.
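An alternative to the fixed spacing used above is a sliding window, which lets short bursts through while still capping the number of events per minute. This is a sketch only: `SlidingWindowLimiter` is a hypothetical class, the default of 60 events per 60 seconds simply mirrors the example above, and your organization's actual limit may differ. The clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` events in any `window` seconds."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recently sent events, oldest first

    def wait_time(self):
        """Seconds to wait before the next event may be sent (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # The oldest event must age out before another one fits.
        return self.window - (now - self.sent[0])

    def record(self):
        """Call after each successful send."""
        self.sent.append(self.clock())
```

In an async client, the sender would call `await asyncio.sleep(limiter.wait_time())` before each send and `limiter.record()` afterwards.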

INVALID_MESSAGE

When it occurs: Event is malformed or missing required fields.

HTTP Status: 400 (Bad Request)

Is it retryable? No

Recommended action:

Do NOT retry
Log the malformed event
Fix the event structure
Validate before sending

import json

def validate_event(agent_id, thread_id, payload):
    """Validate event before sending"""
    if not isinstance(agent_id, str) or not agent_id.strip():
        raise ValueError("agent_id must be non-empty string")

    if not isinstance(thread_id, str) or not thread_id.strip():
        raise ValueError("thread_id must be non-empty string")

    if not isinstance(payload, dict):
        raise ValueError("payload must be a dict")

    if len(json.dumps(payload)) > 64 * 1024:
        raise ValueError("payload exceeds 64KB")

    return True

# Usage
try:
    validate_event(agent_id, thread_id, payload)
    await send_event(ws, agent_id, thread_id, payload)
except ValueError as e:
    logger.error(f"Invalid event: {e}")

Common validation errors:

agent_id is empty or null
thread_id is empty or null
payload is not a dict
Event JSON is malformed
Required field is missing

What the user sees: (Usually a backend error)
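The same idea can be applied to the full event envelope just before it is serialized, collecting every problem instead of stopping at the first. This sketch assumes the envelope has the four fields used in the RATE_LIMITED example (`type`, `agent_id`, `thread_id`, `payload`); whether Relay requires other fields is not stated in this reference. `find_problems` is a hypothetical helper.

```python
# Assumed from the event shape shown in the RATE_LIMITED example.
REQUIRED_FIELDS = ("type", "agent_id", "thread_id", "payload")

def find_problems(event: dict) -> list:
    """Return human-readable problems; an empty list means the event looks well-formed."""
    problems = []
    # Missing or null required fields.
    for field in REQUIRED_FIELDS:
        if field not in event or event[field] is None:
            problems.append(f"missing field: {field}")
    # Identifier fields must be non-empty strings.
    for field in ("agent_id", "thread_id"):
        value = event.get(field)
        if value is not None and (not isinstance(value, str) or not value.strip()):
            problems.append(f"{field} must be a non-empty string")
    # The payload must be a JSON object.
    if event.get("payload") is not None and not isinstance(event["payload"], dict):
        problems.append("payload must be a dict")
    return problems
```

Returning a list rather than raising on the first failure makes the log entry for a malformed event more useful, since all issues show up in one place.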

RELAY_INTERNAL_ERROR

When it occurs: An unexpected error occurred in Relay (bug or infrastructure issue).

HTTP Status: 500 (Internal Server Error)

Is it retryable? Yes

Recommended action:

Retry with exponential backoff
After 3-5 failed attempts, escalate to support
Include event_id in bug report

if error_code == "RELAY_INTERNAL_ERROR":
    for attempt in range(3):
        delay = 2 ** attempt  # 1s, 2s, 4s
        await asyncio.sleep(delay)
        result = await send_event(ws, agent_id, thread_id, payload)

        if result.get("type") != "error":
            return result

    # Still failing, contact support
    logger.critical(f"Relay error persists for event {event_id}")
    notify_support(f"event_id={event_id}, app_id={app_id}")

Possible causes:

Bug in Relay code
Database connection issue
Infrastructure outage
Temporary service degradation

What the user sees: "Service temporarily unavailable. Please try again shortly."

Error Handling Strategy

Decision Tree

┌─ Is it retryable?
│  ├─ No: Log error, inform user, don't retry
│  │
│  └─ Yes: How many retries already?
│     ├─ 0-2: Retry with exponential backoff
│     │
│     └─ 3+: Give up, inform user

Sample Error Handler

import asyncio
import logging

logger = logging.getLogger(__name__)

class RelayErrorHandler:
    RETRYABLE_ERRORS = {
        "AGENT_OFFLINE",
        "AGENT_TIMEOUT",
        "RATE_LIMITED",
        "RELAY_INTERNAL_ERROR"
    }

    async def handle_error(self, error_code, event_id, max_retries=3):
        if error_code not in self.RETRYABLE_ERRORS:
            # Non-retryable
            logger.error(f"Non-retryable error {error_code} for {event_id}")
            return {"retryable": False, "code": error_code}

        # Retryable - try again
        for attempt in range(max_retries):
            delay = self.calculate_backoff(attempt, error_code)
            logger.info(f"Retrying after {delay}s (attempt {attempt+1}/{max_retries})")

            await asyncio.sleep(delay)
            result = await self.retry_event(event_id)

            if result.get("type") != "error":
                return {"retryable": True, "retried": True, "success": True}

        # Max retries exceeded
        logger.error(f"Max retries exceeded for {event_id}")
        return {"retryable": True, "retried": True, "success": False}

    def calculate_backoff(self, attempt, error_code):
        if error_code == "RATE_LIMITED":
            return 10 * 2 ** attempt  # 10s, 20s, 40s: exponential, starting at 10s
        else:
            return 2 ** attempt  # 1s, 2s, 4s, 8s, ...

Best Practices

Do

Check the error code in every response
Retry only retryable errors
Use exponential backoff
Log all errors with context
Validate payloads before sending

Don't

Retry permission errors
Retry invalid events
Use fixed delays
Ignore rate limits
Retry indefinitely
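As an illustration of logging errors with context, the helper below emits one line per error carrying the identifiers that support is likely to ask for. It is a sketch: `log_relay_error`, the `relay.client` logger name, and the `key=value` line format are all arbitrary choices for this example, not Relay conventions.

```python
import logging

logger = logging.getLogger("relay.client")  # logger name is an arbitrary choice

def log_relay_error(error_code, *, event_id=None, agent_id=None, app_id=None, attempt=0):
    """Log a single key=value line for an error and return the formatted context."""
    context = {
        "code": error_code,
        "event_id": event_id,
        "agent_id": agent_id,
        "app_id": app_id,
        "attempt": attempt,
    }
    # Omit identifiers that were not supplied.
    context = {key: value for key, value in context.items() if value is not None}
    message = " ".join(f"{key}={value}" for key, value in context.items())
    logger.error("relay_error %s", message)
    return message
```

A consistent line format like this makes it easy to search logs by `event_id` when filing a support request.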

Support

If you encounter persistent errors not covered here, contact support@relay.ckgworks.com with:

Error code
Event ID (if available)
Timestamp
Steps to reproduce
Logs
