Security Best Practices

For App Developers

1. Token Security

Store tokens in environment variables, never in code:

import os

# ✅ GOOD
token = os.getenv("RELAY_APP_TOKEN")

# ❌ BAD
token = "rlk_portal_x8k2m9p..."  # hardcoded
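`os.getenv` returns `None` when the variable is unset, so an app can end up connecting with no token at all. Below is a minimal sketch that fails fast at startup. The helper name `load_token` is hypothetical; only `RELAY_APP_TOKEN` comes from the example above. Note that `os.getenv` does not read `.env` files by itself, so the variable must already be exported (or loaded by a tool of your choice) before the app starts.

```python
import os


def load_token(var_name: str = "RELAY_APP_TOKEN") -> str:
    """Read a token from the environment, failing fast if it is absent or blank."""
    token = os.getenv(var_name, "").strip()
    if not token:
        # Name the variable in the error, but never echo a token value.
        raise RuntimeError(f"{var_name} is not set; export it before starting the app")
    return token
```

Call it once at startup, for example `token = load_token()`, so a misconfigured deployment stops immediately instead of failing later with an authentication error.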

Add to .gitignore:

# .gitignore
.env
.env.local
.env.*.local
secrets.json

Verify it worked:

git status  # should NOT show .env files
git log --all --full-history -- .env  # should be empty

2. Connection Handling

Implement exponential backoff for reconnection:

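import asyncio
import websockets

# `uri` and `headers` (your Relay URL and auth headers) are defined elsewhere.
async def connect_to_relay_with_backoff():
    max_retries = 5
    wait_time = 1  # seconds

    for attempt in range(max_retries):
        try:
            # Await the connection instead of using `async with`: returning from
            # inside `async with` closes the socket before the caller can use it.
            # (Newer websockets releases use additional_headers instead of extra_headers.)
            return await websockets.connect(uri, extra_headers=headers)
        except (OSError, websockets.exceptions.InvalidHandshake, asyncio.TimeoutError):
            if attempt < max_retries - 1:
                await asyncio.sleep(wait_time)
                wait_time *= 2  # exponential backoff: 1s, 2s, 4s, 8s
            else:
                raise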
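The loop above sleeps 1, 2, 4 and 8 seconds. If many clients lose Relay at the same moment, identical schedules make them all retry in lockstep. A common refinement, not something this guide prescribes, is capped exponential backoff with "full jitter": each delay is drawn uniformly between 0 and the capped exponential value. The function name and the 30-second cap below are illustrative assumptions.

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=30.0, rng=random.random):
    """Return the sleep before each retry: uniform in [0, min(cap, base * 2**n)]."""
    delays = []
    for attempt in range(max_retries - 1):  # no sleep after the final attempt
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng() * ceiling)
    return delays
```

Inside the retry loop you would `await asyncio.sleep(delay)` with the next value from this list instead of doubling `wait_time`. Passing `rng` in makes the schedule deterministic in tests.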

Maintain connection health:

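import websockets

async def maintain_connection():
    # Loop instead of recursing: a recursive reconnect nests one more
    # coroutine on every drop and eventually hits Python's recursion limit.
    while True:
        ws = await connect_to_relay_with_backoff()
        try:
            async for message in ws:
                handle_message(message)
        except websockets.exceptions.ConnectionClosed:
            pass  # fall through and reconnect with backoff
        finally:
            await ws.close()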

3. Event Payload Design

Don't send secrets in payloads:

# ❌ BAD - exposes API keys
payload = {
    "api_key": "sk_live_...",
    "password": "...",
    "secret_token": "..."
}

# ✅ GOOD - safe to expose
payload = {
    "user_id": "user_123",
    "task_id": "task_456",
    "context": "Summarize the progress on this task",
    "metadata": { "priority": "high", "due_date": "2026-04-15" }
}
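As a last line of defense, you can scan a payload for secret-looking field names before sending it. This is a sketch: the key list and the `find_secret_keys` name are assumptions, and a name-based check cannot catch a secret stored under an innocuous key, so it complements reviewing what you send rather than replacing it.

```python
# Assumed list of field names that usually hold credentials; extend for your app.
SUSPECT_KEYS = {"api_key", "password", "secret", "secret_token", "token", "authorization"}


def find_secret_keys(payload, path=""):
    """Return dotted paths of keys in nested dicts/lists that look like secrets."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else str(key)
            if str(key).lower() in SUSPECT_KEYS:
                hits.append(child)
            hits.extend(find_secret_keys(value, child))
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            hits.extend(find_secret_keys(item, f"{path}[{i}]"))
    return hits
```

A typical use is to refuse to send: `if find_secret_keys(payload): raise ValueError("payload contains secret-looking fields")`. The returned paths name the offending fields without including their values.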

Include enough context for the agent to respond:

# ✅ Agent has full context
payload = {
    "event": "comment.mention",
    "message": "@athena what's the status?",
    "task": {
        "id": "task_123",
        "title": "Q2 Roadmap",
        "description": "...",
        "assignees": ["alice", "bob"],
        "status": "in_progress",
        "comments": [...]  # conversation history
    }
}

4. Error Handling

Handle Relay errors gracefully:

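import json
import websockets

# `ws` and `reconnect()` are your own connection object and reconnect helper.
async def send_event(event, max_attempts=3):
    # Bounded retries, so a connection that keeps failing can't loop forever
    for attempt in range(max_attempts):
        try:
            await ws.send(json.dumps(event))
            return
        except websockets.exceptions.ConnectionClosed:
            print("Connection to Relay lost, reconnecting...")
            await reconnect()
    raise ConnectionError(f"Could not send event after {max_attempts} attempts")

async def handle_incoming():
    async for message in ws:
        # Parse each message on its own so one bad frame doesn't end the loop
        try:
            data = json.loads(message)
        except json.JSONDecodeError as e:
            print(f"Error in message handling: {e}")
            continue

        if data.get("type") == "error":
            print(f"Error from Relay: {data.get('error')}")
            # Handle specific error codes
            code = data.get("code")
            if code == "PERMISSION_DENIED":
                print(f"App not allowlisted for agent {data.get('agent_id')}")
            elif code == "AGENT_OFFLINE":
                print("Agent is not connected")
            # etc.
        elif data.get("type") == "reply":
            deliver_to_app(data)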

5. Validate Relay Responses

Verify event_id matches your request:

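import json
import uuid

# Track sent events by event_id
pending_events = {}

async def send_tracked_event(ws, payload):
    # Assumption: the app generates event_id. If Relay assigns it instead,
    # record the ID from the "accepted" message rather than here.
    event = {
        "type": "event",
        "event_id": str(uuid.uuid4()),
        "agent_id": "athena",
        "thread_id": "task_123",
        "payload": payload,
    }
    pending_events[event["event_id"]] = event
    await ws.send(json.dumps(event))

async def receive_responses(ws):
    async for message in ws:
        data = json.loads(message)
        event_id = data.get("event_id")

        if event_id not in pending_events:
            print(f"Unexpected event_id: {event_id}")
            continue

        # Process only events we sent
        handle_response(pending_events[event_id], data)

        # Stop tracking once the event reaches a final state
        if data.get("type") in ("reply", "error"):
            del pending_events[event_id]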

Check message types:

VALID_TYPES = {"accepted", "token", "reply", "error", "pong"}

# .get() avoids a KeyError on messages that have no "type" field
if data.get("type") not in VALID_TYPES:
    print(f"Unknown message type: {data.get('type')}")
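Both checks can be folded into one pure function that is easy to unit-test. This is a sketch: `check_message` is a hypothetical name, and it assumes every message type except `pong` refers to an event you sent through its `event_id`. This guide does not say which types carry an `event_id`, so adjust that rule to the API Reference.

```python
VALID_TYPES = {"accepted", "token", "reply", "error", "pong"}


def check_message(data, pending_events):
    """Return None if the message is acceptable, else a short reason string.

    Assumption: every type except "pong" refers to an event you sent via event_id.
    """
    msg_type = data.get("type") if isinstance(data, dict) else None
    if msg_type not in VALID_TYPES:
        return f"unknown message type: {msg_type!r}"
    if msg_type != "pong" and data.get("event_id") not in pending_events:
        return f"unexpected event_id: {data.get('event_id')!r}"
    return None
```

Returning a reason instead of printing keeps the receive loop in charge of how to react (log, drop, or close the connection).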

6. Rate Limiting

Be aware of rate limits:

Relay has per-app rate limits (details in API Reference)
Monitor for RATE_LIMIT_EXCEEDED errors
Implement backoff if you hit limits

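import asyncio

async def handle_rate_limit(response):
    if response.get("code") == "RATE_LIMIT_EXCEEDED":
        # Honor Relay's hint; fall back to 60s if it is missing
        wait_time = response.get("retry_after_seconds", 60)
        print(f"Rate limited, waiting {wait_time}s")
        await asyncio.sleep(wait_time)
        return True  # caller should retry the event
    return False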
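Waiting after a `RATE_LIMIT_EXCEEDED` error is reactive. You can also pace sends on the client so you rarely hit the limit. Below is a sketch of a token bucket, a standard client-side rate-limiting technique, not Relay's documented algorithm. The `rate` and `burst` values are placeholders; substitute the per-app limits from the API Reference.

```python
import time


class TokenBucket:
    """Allow up to `burst` sends at once, refilling at `rate` tokens per second."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; return False if the caller should wait."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Before each `ws.send(...)`, loop on `while not bucket.try_acquire(): await asyncio.sleep(0.05)`. The injectable `clock` exists only to make the behavior testable without real waiting.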

7. Logging

Log appropriately (never log tokens):

import json
import logging

# ✅ GOOD - useful debug info without exposing secrets
# (%-style args are formatted lazily, only if the record is actually emitted)
logging.info("Sending event to agent: %s", event["agent_id"])
logging.debug("Event payload size: %d bytes", len(payload_json))
logging.error("Failed to send event: %s", error_message)

# ❌ BAD - exposes token
logging.debug(f"Using token: {token}")
logging.error(f"Auth failed with: Authorization: Bearer {token}")

# ❌ BAD - exposes full payload if it has secrets
logging.debug(f"Event: {json.dumps(event)}")
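Discipline at each call site is the first defense; a logging filter that masks token-shaped strings catches the mistakes that slip through. This is a sketch: the regex assumes tokens appear after `Bearer ` or start with the `rlk_`/`rla_` prefixes seen in this guide's examples, and it may need adjusting for your actual token format.

```python
import logging
import re

# Assumed token shapes: "Bearer <anything>" and the rlk_/rla_ prefixes from this guide.
TOKEN_PATTERN = re.compile(r"(Bearer\s+)\S+|\brl[ak]_\w+")


class RedactTokensFilter(logging.Filter):
    """Rewrite each record's message so token-like strings never reach output."""

    def filter(self, record):
        message = record.getMessage()  # merge msg and args before redacting
        redacted = TOKEN_PATTERN.sub(
            lambda m: (m.group(1) or "") + "[REDACTED]", message
        )
        record.msg, record.args = redacted, ()
        return True  # keep the record, just with secrets masked
```

Attach it to your handlers (`handler.addFilter(RedactTokensFilter())`) rather than to a single logger: a logger's filters only see records logged directly on that logger, while a handler's filters see everything it emits.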

For Agent Operators

1. Token Security

Protect agent tokens like passwords:

Store in environment variables
Use a secrets manager
Never commit to version control
Rotate periodically (monthly or quarterly)
Rotate immediately if exposed

2. Connection Monitoring

Monitor agent connection status:

Check the dashboard daily
Set up alerts if agent goes offline unexpectedly
Log all connection/disconnection events

// reconnect() is your own function that opens a new socket and re-attaches these handlers
ws.on('open', () => {
  console.info("Agent connected to Relay");
});

ws.on('close', () => {
  console.warn("Agent disconnected from Relay, reconnecting in 5s");
  setTimeout(() => reconnect(), 5000);
});

ws.on('error', (error) => {
  console.error(`Connection error: ${error}`);
});
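The handlers above reconnect after 5 seconds, so most drops resolve on their own; an alert should fire only when an outage outlasts normal reconnects. Below is a sketch of that decision in Python, the language used for the other examples in this guide. The `OfflineAlarm` name and the 60-second default threshold are assumptions; wire `on_open`/`on_close` to your connection events and poll `should_alert` from a timer.

```python
import time


class OfflineAlarm:
    """Decide when a disconnect has lasted long enough to alert someone.

    Brief drops that reconnect quickly stay quiet; only an outage longer
    than `threshold_s` triggers an alert, and only once per outage.
    """

    def __init__(self, threshold_s=60.0, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock
        self.down_since = None
        self.alerted = False

    def on_open(self):
        self.down_since, self.alerted = None, False

    def on_close(self):
        if self.down_since is None:  # keep the start of the current outage
            self.down_since = self.clock()

    def should_alert(self):
        """Call periodically; returns True once per outage past the threshold."""
        if self.down_since is None or self.alerted:
            return False
        if self.clock() - self.down_since >= self.threshold_s:
            self.alerted = True
            return True
        return False
```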

3. Session Management

Configure TTLs appropriately:

Short TTL (7-14 days): General assistants that don't need long context
Longer TTL (30+ days): Technical specialists that benefit from context history

Check TTL configuration in your plugin:

{
  "agents": [
    {
      "token": "rla_athena_...",
      "session_ttl_days": 14
    },
    {
      "token": "rla_klyve_...",
      "session_ttl_days": 30
    }
  ]
}
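A quick script can check the TTL settings before you deploy a plugin config. This is a sketch: it assumes the config is JSON shaped like the example above, and the 1 to 365 day bounds are placeholder sanity limits, not values documented by Relay. It reports problems by position rather than echoing each agent's token.

```python
import json


def check_ttl_config(config_text, min_days=1, max_days=365):
    """Return a list of problems found in the agents' session_ttl_days settings."""
    problems = []
    config = json.loads(config_text)
    for i, agent in enumerate(config.get("agents", [])):
        ttl = agent.get("session_ttl_days")
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(ttl, int) or isinstance(ttl, bool):
            problems.append(f"agents[{i}]: session_ttl_days missing or not an integer")
        elif not min_days <= ttl <= max_days:
            problems.append(f"agents[{i}]: session_ttl_days={ttl} outside {min_days}-{max_days}")
    return problems
```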

4. Error Handling

Handle Relay errors in agent:

ws.on('message', async (data) => {
  let message;
  try {
    message = JSON.parse(data);
  } catch (error) {
    // No event_id to report against, so log and drop the frame
    console.error(`Malformed message from Relay: ${error}`);
    return;
  }

  try {
    if (message.type === "error") {
      console.error(`Event error: ${message.error}`);
      // Don't reply to errors; Relay already knows about them
      return;
    }

    if (message.type === "event") {
      const reply = await processEvent(message);
      ws.send(JSON.stringify({
        type: "reply",
        event_id: message.event_id,
        content: reply
      }));
    }
  } catch (error) {
    console.error(`Error processing event: ${error}`);
    ws.send(JSON.stringify({
      type: "error",
      event_id: message.event_id,
      error: "Agent processing failed",
      code: "AGENT_ERROR"
    }));
  }
});

5. Rate Limiting

Be aware of rate limits per app:

If an app sends too many events, Relay throttles it
Don't implement retries for events you receive (Relay already handled them)
If you're slow to reply, apps may experience latency

6. Resource Management

Monitor memory and CPU:

Sessions are stored in memory
If you have many long-lived sessions, memory usage grows
Monitor session_count metric
Clean up expired sessions regularly
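The cleanup step can be as simple as a periodic purge over an in-memory map. Below is a minimal sketch; `SessionStore`, keying sessions by `thread_id`, and counting the TTL from last activity are all assumptions, since this guide does not specify how the plugin stores sessions or measures TTL.

```python
import time


class SessionStore:
    """In-memory sessions keyed by thread_id, expired after `ttl_s` of inactivity.

    Assumption: the TTL counts from last activity. If your plugin counts from
    creation, record the timestamp only when a session is first created.
    """

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._sessions = {}  # thread_id -> (last_seen, state)

    def touch(self, thread_id, state):
        self._sessions[thread_id] = (self.clock(), state)

    def get(self, thread_id):
        entry = self._sessions.get(thread_id)
        return None if entry is None else entry[1]

    def purge_expired(self):
        """Drop sessions idle longer than the TTL; return how many were removed."""
        cutoff = self.clock() - self.ttl_s
        expired = [k for k, (seen, _) in self._sessions.items() if seen < cutoff]
        for k in expired:
            del self._sessions[k]
        return len(expired)

    @property
    def session_count(self):
        # Value to export as the session_count metric mentioned above.
        return len(self._sessions)
```

Run `purge_expired()` on a timer (for example, every few minutes) and export `session_count` alongside the purge result so memory growth is visible.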

7. Availability

Plan for restarts:

During the 1-hour grace period, you can restart without impact
After the grace period, reconnect immediately
Queue outgoing replies (Relay buffers them)
Replay missed events from Relay if needed

For Dashboard Admins

1. Account Security

Use a strong email provider:

Gmail with 2FA enabled
Microsoft 365 with 2FA enabled
Corporate email with SSO

Don't use:

Personal free email accounts (less secure than managed accounts)
Email addresses shared between people
Email accounts without 2FA

2. Member Management

Review team members regularly:

Monthly: Who has access to Relay?
Is their role still appropriate?
Do they still need access?

Monthly Checklist:
  ☐ List all team members
  ☐ Verify each is still on the team
  ☐ Check if roles match their responsibilities
  ☐ Remove anyone who left or changed roles
  ☐ Document any changes

3. Token Rotation

Rotate tokens periodically:

Monthly for high-risk apps/agents
Quarterly for others
Immediately if exposed

Document rotations:

Token Rotation Log:
  2026-04-05: Rotated Portal app token (monthly maintenance)
  2026-04-05: Rotated Athena agent token (monthly maintenance)
  2026-03-15: Rotated ResearchBot token (exposed in logs)
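If the rotation log is kept as data, overdue tokens can be flagged automatically. This sketch approximates "monthly" as 30 days and "quarterly" as 90 days; the tier names, `rotation_overdue`, and `overdue_tokens` are hypothetical, and exposure-driven rotations still happen immediately regardless of schedule.

```python
from datetime import date, timedelta

# Intervals from the policy above, approximated in days.
ROTATION_INTERVALS = {"high": timedelta(days=30), "standard": timedelta(days=90)}


def rotation_overdue(last_rotated, risk, today):
    """True if a token in this risk tier is at or past its rotation interval."""
    return today - last_rotated >= ROTATION_INTERVALS[risk]


def overdue_tokens(inventory, today):
    """inventory: {name: (last_rotated, risk)} -> sorted names due for rotation."""
    return sorted(
        name
        for name, (last, risk) in inventory.items()
        if rotation_overdue(last, risk, today)
    )
```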

4. Allowlist Audits

Review allowlists quarterly:

Is each restriction still necessary?
Are there new apps that should be allowlisted?
Are there apps that should be removed?

Allowlist Review (Q2 2026):
  Athena (open): ✓ Correct, general-purpose
  Klyve: allowlist [Portal, Flow]
    ✓ Still correct, specialized agent
  ResearchBot: allowlist [Portal]
    ✓ Correct, experimental
  Future: allowlist []
    ✓ Open, ready for adoption

5. Event Log Monitoring

Review event logs weekly:

Check for rejection patterns
Monitor failure rates
Look for unusual activity

Weekly Event Log Review:
  ☐ Any rejected events? Why?
  ☐ Any failure spikes?
  ☐ Are connections stable?
  ☐ Are latencies normal?

6. Incident Response Plan

Prepare for common incidents:

Incident	Response
Agent offline	Check connection status, restart service, check logs
High latency	Check network, verify agent load, check Relay status
Rejected events	Check allowlist, verify app can reach agent
High failure rate	Check agent logs, verify resources, monitor for patterns
Token exposed	Rotate immediately, review access logs, check for misuse

Summary

For Developers

Store tokens securely (env vars, secrets manager)
Implement exponential backoff for reconnection
Don't send secrets in payloads
Validate Relay responses
Handle errors gracefully
Log appropriately (never log tokens)

For Operators

Protect agent tokens
Monitor connection status
Configure session TTLs appropriately
Handle errors in agents
Monitor memory and resources
Plan for restarts and maintenance

For Admins

Use secure email providers with 2FA
Review team members monthly
Rotate tokens regularly
Audit allowlists quarterly
Monitor event logs weekly
Have an incident response plan

Checklist for New Teams

Use this checklist when onboarding to Relay:

Setup:
  ☐ Create organization
  ☐ Register first app
  ☐ Register first agent
  ☐ Configure allowlists
  ☐ Invite team members
  ☐ Store tokens securely

Development:
  ☐ Implement WebSocket client with backoff
  ☐ Handle all error cases
  ☐ Implement event validation
  ☐ Add logging (no tokens)
  ☐ Test token rotation
  ☐ Test connection failures

Production:
  ☐ Use secrets manager for tokens
  ☐ Set up monitoring and alerts
  ☐ Plan incident response
  ☐ Document access controls
  ☐ Schedule monthly reviews
  ☐ Schedule quarterly audits
