Security Best Practices

For App Developers

1. Token Security

Store tokens in environment variables, never in code:

import os

# ✅ GOOD
token = os.getenv("RELAY_APP_TOKEN")

# ❌ BAD
token = "rlk_portal_x8k2m9p..."  # hardcoded

Add to .gitignore:

# .gitignore
.env
.env.local
.env.*.local
secrets.json

Verify it worked:

git status # should NOT show .env files
git log --all --full-history -- .env # should be empty
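
A minimal sketch of loading the token at startup and failing fast when it is missing, so a misconfigured deployment never tries to connect with an empty credential. `load_relay_token` is a hypothetical helper, not part of any Relay SDK; only the `RELAY_APP_TOKEN` variable name comes from the example above.

```python
import os


def load_relay_token(env=None, var_name="RELAY_APP_TOKEN"):
    """Return the token from the environment, or raise if it is unset or blank."""
    # `env` is injectable for tests; by default the real process environment is used
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; refusing to start")
    return token
```

Calling `load_relay_token()` once at startup keeps the check in one place and surfaces a missing variable before any connection attempt.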

2. Connection Handling

Implement exponential backoff for reconnection:

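import asyncio
import websockets

async def connect_to_relay_with_backoff():
    max_retries = 5
    wait_time = 1  # seconds

    for attempt in range(max_retries):
        try:
            # Await the connection directly: returning from inside
            # "async with" would close the socket before the caller uses it
            return await websockets.connect(uri, extra_headers=headers)
        except OSError:  # includes ConnectionError, DNS failures, and timeouts
            if attempt < max_retries - 1:
                await asyncio.sleep(wait_time)
                wait_time *= 2  # exponential backoff
            else:
                raise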
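
The delay schedule can also be computed separately from the connection logic, which makes it easy to test. The sketch below is one possible variant, not Relay's prescribed behavior: `backoff_delays` is a hypothetical helper, and the cap and random jitter are additions that the example above does not use.

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=30.0, jitter=0.5, rng=random.random):
    """Yield one delay per retry: base * 2**attempt, capped, plus random jitter."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        # Add up to `jitter` * delay so many clients do not reconnect in lockstep
        yield delay + delay * jitter * rng()
```

With `jitter=0.0` the schedule is the plain doubling sequence; the cap keeps a long outage from producing multi-minute waits.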

Maintain connection health:

async def maintain_connection():
    # Loop instead of recursing, so repeated reconnects cannot exhaust the call stack
    while True:
        ws = await connect_to_relay_with_backoff()

        try:
            while True:
                message = await ws.recv()
                handle_message(message)
        except websockets.exceptions.ConnectionClosed:
            # Reconnect automatically on the next loop iteration
            continue

3. Event Payload Design

Don't send secrets in payloads:

# ❌ BAD - exposes API keys
payload = {
    "api_key": "sk_live_...",
    "password": "...",
    "secret_token": "..."
}

# ✅ GOOD - safe to expose
payload = {
    "user_id": "user_123",
    "task_id": "task_456",
    "context": "Summarize the progress on this task",
    "metadata": {"priority": "high", "due_date": "2026-04-15"}
}
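
One way to enforce this rule is to scan a payload for credential-like key names before sending it. The sketch below is an illustration only: `find_sensitive_keys` and the `SENSITIVE_KEY_HINTS` list are hypothetical, and a name-based check can miss secrets stored under innocuous keys or flag harmless keys such as `token_count`.

```python
SENSITIVE_KEY_HINTS = ("api_key", "password", "secret", "token")  # assumed list


def find_sensitive_keys(payload, path=""):
    """Return dotted paths of keys whose names look like credentials."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            key_path = f"{path}.{key}" if path else str(key)
            if any(hint in str(key).lower() for hint in SENSITIVE_KEY_HINTS):
                hits.append(key_path)
            # Recurse so nested objects are checked as well
            hits.extend(find_sensitive_keys(value, key_path))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_sensitive_keys(item, f"{path}[{index}]"))
    return hits
```

Refusing to send when the returned list is non-empty turns the guideline into a check that runs on every event.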

Include enough context for the agent to respond:

# ✅ Agent has full context
payload = {
    "event": "comment.mention",
    "message": "@athena what's the status?",
    "task": {
        "id": "task_123",
        "title": "Q2 Roadmap",
        "description": "...",
        "assignees": ["alice", "bob"],
        "status": "in_progress",
        "comments": [...]  # conversation history
    }
}

4. Error Handling

Handle Relay errors gracefully:

async def send_event(event, retries=3):
    try:
        await ws.send(json.dumps(event))
    except websockets.exceptions.ConnectionClosed:
        if retries == 0:
            raise  # give up instead of retrying forever
        print("Connection to Relay lost, reconnecting...")
        await reconnect()
        await send_event(event, retries - 1)  # bounded retry

async def handle_incoming():
    try:
        async for message in ws:
            data = json.loads(message)
            if data["type"] == "error":
                print(f"Error from Relay: {data['error']}")
                # Handle specific error codes
                if data["code"] == "PERMISSION_DENIED":
                    print(f"App not allowlisted for agent {data['agent_id']}")
                elif data["code"] == "AGENT_OFFLINE":
                    print("Agent is not connected")
                # etc.
            elif data["type"] == "reply":
                deliver_to_app(data)
    except Exception as e:
        print(f"Error in message handling: {e}")

5. Validate Relay Responses

Verify event_id matches your request:

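# Track sent events, keyed by event_id
pending_events = {}

await ws.send(json.dumps({
    "type": "event",
    "agent_id": "athena",
    "thread_id": "task_123",
    "payload": {...}
}))
# Record each event in pending_events as soon as its event_id is known
# (see the API Reference for where event_id is assigned); otherwise every
# response below is rejected as unexpected.

async for message in ws:
    data = json.loads(message)
    event_id = data.get("event_id")

    if event_id not in pending_events:
        print(f"Unexpected event_id: {event_id}")
        continue

    # Process only events we sent
    handle_response(pending_events[event_id], data)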

Check message types:

VALID_TYPES = {"accepted", "token", "reply", "error", "pong"}

if data["type"] not in VALID_TYPES:
    print(f"Unknown message type: {data['type']}")
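
Both checks can be combined into a single function that runs before any message is processed. This is a sketch under stated assumptions: `validate_message` is a hypothetical helper, and treating `pong` as a keepalive without an `event_id` is a guess about the protocol that should be confirmed against the API Reference.

```python
VALID_TYPES = {"accepted", "token", "reply", "error", "pong"}


def validate_message(data, pending_events):
    """Return (ok, reason) for a decoded Relay message."""
    if not isinstance(data, dict):
        return False, "message is not a JSON object"
    msg_type = data.get("type")
    if msg_type not in VALID_TYPES:
        return False, f"unknown message type: {msg_type!r}"
    # Assumption: "pong" is a keepalive and carries no event_id
    if msg_type != "pong" and data.get("event_id") not in pending_events:
        return False, f"unexpected event_id: {data.get('event_id')!r}"
    return True, "ok"
```

Returning a reason string alongside the boolean gives the caller something safe to log without dumping the whole message.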

6. Rate Limiting

Be aware of rate limits:

  • Relay has per-app rate limits (details in API Reference)
  • Monitor for RATE_LIMIT_EXCEEDED errors
  • Implement backoff if you hit limits

if response.get("code") == "RATE_LIMIT_EXCEEDED":
    wait_time = response.get("retry_after_seconds", 60)
    print(f"Rate limited, waiting {wait_time}s")
    await asyncio.sleep(wait_time)
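
A slightly more defensive variant clamps the wait so that a missing, malformed, or very large `retry_after_seconds` value cannot stall the client. `rate_limit_wait` and its 300-second ceiling are illustrative assumptions; the actual limits are defined in the API Reference.

```python
def rate_limit_wait(response, default=60.0, maximum=300.0):
    """Return seconds to wait for a RATE_LIMIT_EXCEEDED response, else 0."""
    if response.get("code") != "RATE_LIMIT_EXCEEDED":
        return 0.0
    try:
        wait = float(response.get("retry_after_seconds", default))
    except (TypeError, ValueError):
        wait = default  # value was present but not numeric
    # Clamp so an unexpected value cannot stall the client indefinitely
    return max(0.0, min(wait, maximum))
```

The result can be passed straight to `asyncio.sleep`, since a return value of 0 means no wait is needed.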

7. Logging

Log appropriately (never log tokens):

# ✅ GOOD - useful debug info without exposing secrets
logging.info(f"Sending event to agent: {event.agent_id}")
logging.debug(f"Event payload size: {len(payload_json)} bytes")
logging.error(f"Failed to send event: {error_message}")

# ❌ BAD - exposes token
logging.debug(f"Using token: {token}")
logging.error(f"Auth failed with: Authorization: Bearer {token}")

# ❌ BAD - exposes full payload if it has secrets
logging.debug(f"Event: {json.dumps(event)}")
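
As a safety net for cases where a token slips into a log call anyway, a `logging.Filter` can redact token-shaped strings before they are written. This is a sketch, assuming tokens always start with `rlk_` or `rla_` as they do in this guide's examples; `RedactTokensFilter` is a hypothetical class and does not replace the rule of never logging tokens.

```python
import logging
import re

# Assumption: tokens start with "rlk_" or "rla_", as in this guide's examples
TOKEN_PATTERN = re.compile(r"\brl[ka]_[A-Za-z0-9_\-\.]+")


class RedactTokensFilter(logging.Filter):
    """Replace anything that looks like a Relay token before the record is emitted."""

    def filter(self, record):
        message = record.getMessage()  # merge args into the final message text
        record.msg = TOKEN_PATTERN.sub("[REDACTED]", message)
        record.args = None  # args are already merged into record.msg
        return True  # keep the record, just with the token removed
```

Attaching the filter to each handler (`handler.addFilter(RedactTokensFilter())`) rather than to a single logger ensures records from child loggers are covered too.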

For Agent Operators

1. Token Security

Protect agent tokens like passwords:

  • Store in environment variables
  • Use secrets manager
  • Never commit to version control
  • Rotate periodically (monthly or quarterly)
  • Rotate immediately if exposed

2. Connection Monitoring

Monitor agent connection status:

  • Check the dashboard daily
  • Set up alerts if agent goes offline unexpectedly
  • Log all connection/disconnection events

ws.on('open', () => {
  logging.info("Agent connected to Relay");
});

ws.on('close', () => {
  logging.warn("Agent disconnected from Relay, reconnecting in 5s");
  setTimeout(() => reconnect(), 5000);
});

ws.on('error', (error) => {
  logging.error(`Connection error: ${error}`);
});

3. Session Management

Configure TTLs appropriately:

  • Short TTL (7-14 days): for general assistants that don't need long context
  • Longer TTL (30+ days): for technical specialists that benefit from context history

Check TTL configuration in your plugin:

{
  "agents": [
    {
      "token": "rla_athena_...",
      "session_ttl_days": 14
    },
    {
      "token": "rla_klyve_...",
      "session_ttl_days": 30
    }
  ]
}
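
A configuration like the one above can be sanity-checked before the plugin starts. The sketch below assumes the JSON shape shown here (`agents` list with a `session_ttl_days` field); `check_ttl_config` and the 1-365 day bounds are hypothetical and not limits documented by Relay.

```python
import json


def check_ttl_config(config_text, min_days=1, max_days=365):
    """Return a list of problems found in the agents' session_ttl_days settings."""
    problems = []
    for index, agent in enumerate(json.loads(config_text).get("agents", [])):
        ttl = agent.get("session_ttl_days")
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(ttl, int) or isinstance(ttl, bool):
            problems.append(f"agents[{index}]: session_ttl_days must be an integer")
        elif not min_days <= ttl <= max_days:
            problems.append(
                f"agents[{index}]: session_ttl_days={ttl} outside {min_days}-{max_days}"
            )
    return problems
```

The messages identify agents by position rather than by token, so the check itself never prints a credential.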

4. Error Handling

Handle Relay errors in agent:

ws.on('message', async (data) => {
  const message = JSON.parse(data);

  try {
    if (message.type === "error") {
      logging.error(`Event error: ${message.error}`);
      // Don't reply to errors, Relay already knows about them
      return;
    }

    if (message.type === "event") {
      const reply = await processEvent(message);
      ws.send(JSON.stringify({
        type: "reply",
        event_id: message.event_id,
        content: reply
      }));
    }
  } catch (error) {
    logging.error(`Error processing event: ${error}`);
    ws.send(JSON.stringify({
      type: "error",
      event_id: message.event_id,
      error: "Agent processing failed",
      code: "AGENT_ERROR"
    }));
  }
});

5. Rate Limiting

Be aware of rate limits per app:

  • If an app sends too many events, Relay throttles it
  • Don't implement retries for events you receive (Relay already handled them)
  • If you're slow to reply, apps may experience latency

6. Resource Management

Monitor memory and CPU:

  • Sessions are stored in memory
  • If you have many long-lived sessions, memory usage grows
  • Monitor session_count metric
  • Clean up expired sessions regularly
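
A minimal sketch of the periodic cleanup, assuming sessions live in a plain dictionary keyed by session ID and each entry records a `last_active` Unix timestamp. Both the data layout and `purge_expired_sessions` are hypothetical; adapt them to however your agent actually stores sessions.

```python
import time


def purge_expired_sessions(sessions, ttl_seconds, now=None):
    """Delete sessions idle longer than ttl_seconds; return how many were removed."""
    now = time.time() if now is None else now
    # Collect IDs first so the dict is not modified while iterating over it
    expired = [sid for sid, s in sessions.items() if now - s["last_active"] > ttl_seconds]
    for sid in expired:
        del sessions[sid]
    return len(expired)
```

Running this on a timer and reporting `len(sessions)` afterwards gives a value that can feed the session_count metric.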

7. Availability

Plan for restarts:

  • During the 1-hour grace period, you can restart without impact
  • After the grace period ends, reconnect immediately
  • Queue outgoing replies (Relay buffers them)
  • Replay missed events from Relay if needed

For Dashboard Admins

1. Account Security

Use a strong email provider:

  • Gmail with 2FA enabled
  • Microsoft 365 with 2FA enabled
  • Corporate email with SSO

Don't use:

  • Free email providers (less secure)
  • Email addresses shared between people
  • Email accounts without 2FA

2. Member Management

Review team members regularly:

  • Monthly: Who has access to Relay?
  • Is their role still appropriate?
  • Do they still need access?

Monthly Checklist:
☐ List all team members
☐ Verify each is still on the team
☐ Check if roles match their responsibilities
☐ Remove anyone who left or changed roles
☐ Document any changes

3. Token Rotation

Rotate tokens periodically:

  • Monthly for high-risk apps/agents
  • Quarterly for others
  • Immediately if exposed

Document rotations:

Token Rotation Log:
2026-04-05: Rotated Portal app token (monthly maintenance)
2026-04-05: Rotated Athena agent token (monthly maintenance)
2026-03-15: Rotated ResearchBot token (exposed in logs)
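
The rotation schedule can be checked automatically from the dates in such a log. The sketch below approximates "monthly" as 30 days and "quarterly" as 90 days, which is an assumption; `rotation_due` is a hypothetical helper.

```python
from datetime import date, timedelta


def rotation_due(last_rotated, today, high_risk=False):
    """True when a token is past its rotation interval (30 days high-risk, else 90)."""
    interval = timedelta(days=30 if high_risk else 90)
    return today - last_rotated >= interval
```

For example, a high-risk token last rotated on 2026-03-01 is due by 2026-04-05, while a standard token rotated on the same date is not.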

4. Allowlist Audits

Review allowlists quarterly:

  • Is each restriction still necessary?
  • Are there new apps that should be allowlisted?
  • Are there apps that should be removed?

Allowlist Review (Q2 2026):
Athena (open): ✓ Correct, general-purpose
Klyve: allowlist [Portal, Flow]
  ✓ Still correct, specialized agent
ResearchBot: allowlist [Portal]
  ✓ Correct, experimental
Future: allowlist []
  ✓ Open, ready for adoption

5. Event Log Monitoring

Review event logs weekly:

  • Check for rejection patterns
  • Monitor failure rates
  • Look for unusual activity

Weekly Event Log Review:
☐ Any rejected events? Why?
☐ Any failure spikes?
☐ Are connections stable?
☐ Are latencies normal?
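
If event logs can be exported, part of the weekly review can be scripted. The sketch below assumes each exported entry is a dictionary with a `status` field whose values include `failed` and `rejected`; that format and `summarize_events` are hypothetical, so map them to whatever your export actually contains.

```python
from collections import Counter


def summarize_events(events):
    """Count events by status and return (counts, failure_rate)."""
    counts = Counter(e.get("status", "unknown") for e in events)
    total = sum(counts.values())
    # Counter returns 0 for missing keys, so absent statuses are safe to add
    failed = counts["failed"] + counts["rejected"]
    return counts, (failed / total if total else 0.0)
```

Comparing the failure rate week over week makes spikes easier to spot than reading raw logs.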

6. Incident Response Plan

Prepare for common incidents:

Incident | Response
Agent offline | Check connection status, restart service, check logs
High latency | Check network, verify agent load, check Relay status
Rejected events | Check allowlist, verify app can reach agent
High failure rate | Check agent logs, verify resources, monitor for patterns
Token exposed | Rotate immediately, review access logs, check for misuse

Summary

For Developers

  • Store tokens securely (env vars, secrets manager)
  • Implement exponential backoff for reconnection
  • Don't send secrets in payloads
  • Validate Relay responses
  • Handle errors gracefully
  • Log appropriately (never log tokens)

For Operators

  • Protect agent tokens
  • Monitor connection status
  • Configure session TTLs appropriately
  • Handle errors in agents
  • Monitor memory and resources
  • Plan for restarts and maintenance

For Admins

  • Use secure email providers with 2FA
  • Review team members monthly
  • Rotate tokens regularly
  • Audit allowlists quarterly
  • Monitor event logs weekly
  • Have an incident response plan

Checklist for New Teams

Use this checklist when onboarding to Relay:

Setup:
☐ Create organization
☐ Register first app
☐ Register first agent
☐ Configure allowlists
☐ Invite team members
☐ Store tokens securely

Development:
☐ Implement WebSocket client with backoff
☐ Handle all error cases
☐ Implement event validation
☐ Add logging (no tokens)
☐ Test token rotation
☐ Test connection failures

Production:
☐ Use secrets manager for tokens
☐ Set up monitoring and alerts
☐ Plan incident response
☐ Document access controls
☐ Schedule monthly reviews
☐ Schedule quarterly audits
