Security Best Practices
For App Developers
1. Token Security
Store tokens in environment variables, never in code:
# ✅ GOOD
import os
token = os.getenv("RELAY_APP_TOKEN")
# ❌ BAD
token = "rlk_portal_x8k2m9p..." # hardcoded
Add to .gitignore:
# .gitignore
.env
.env.local
.env.*.local
secrets.json
Verify it worked:
git status # should NOT show .env files
git log --all --full-history -- .env # should be empty
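To catch a missing variable at startup rather than on the first failed connection, you can read the token once and fail fast. A minimal sketch: `load_relay_token` is a hypothetical helper, and the `rlk_` prefix check assumes app tokens look like the example above.

```python
import os


def load_relay_token(var_name="RELAY_APP_TOKEN", expected_prefix="rlk_"):
    """Read a Relay token from the environment and fail fast if it is unusable.

    Assumption: app tokens start with "rlk_", as in the example above.
    """
    token = os.getenv(var_name)
    if not token:
        raise RuntimeError(f"{var_name} is not set; add it to your environment")
    if not token.startswith(expected_prefix):
        # Report the variable name only, never the token value itself
        raise RuntimeError(f"{var_name} does not look like a Relay app token")
    return token
```

Call it once at startup and pass the result to your connection code, so a misconfigured deployment stops immediately with a clear message.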
2. Connection Handling
Implement exponential backoff for reconnection:
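import asyncio
import websockets

async def connect_to_relay_with_backoff(uri, headers, max_retries=5):
    wait_time = 1  # seconds
    for attempt in range(max_retries):
        try:
            # Await the connection instead of using `async with`, which would
            # close the socket as soon as this function returned it.
            # (websockets 14+ names this parameter `additional_headers`.)
            return await websockets.connect(uri, extra_headers=headers)
        except (OSError, asyncio.TimeoutError, websockets.exceptions.WebSocketException):
            if attempt == max_retries - 1:
                raise  # out of retries; let the caller decide what to do
            await asyncio.sleep(wait_time)
            wait_time *= 2  # exponential backoff: 1s, 2s, 4s, 8s
Maintain connection health:
async def maintain_connection(uri, headers):
    # Loop instead of recursing, so repeated reconnects don't grow the call stack
    while True:
        ws = await connect_to_relay_with_backoff(uri, headers)
        try:
            async for message in ws:
                handle_message(message)  # your own message handler
        except websockets.exceptions.ConnectionClosed:
            pass  # fall through and reconnect
        finally:
            await ws.close()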
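The loop above doubles the wait with no cap and no randomness, so many clients dropped at the same moment would retry in lockstep. A common refinement, not something this page says Relay requires, is to cap the delay and add "full jitter": wait a random time between zero and the capped exponential value. A sketch, where `backoff_delay` and its defaults are illustrative:

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Full jitter: pick uniformly from [0, min(cap, base * 2**attempt)),
    which spreads out clients that disconnected at the same time.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()
```

Inside the retry loop you would then call `await asyncio.sleep(backoff_delay(attempt))` instead of doubling `wait_time` by hand.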
3. Event Payload Design
Don't send secrets in payloads:
# ❌ BAD - exposes API keys
payload = {
"api_key": "sk_live_...",
"password": "...",
"secret_token": "..."
}
# ✅ GOOD - safe to expose
payload = {
"user_id": "user_123",
"task_id": "task_456",
"context": "Summarize the progress on this task",
"metadata": { "priority": "high", "due_date": "2026-04-15" }
}
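A cheap safety net is to refuse to send any payload whose keys look like credentials. A sketch: the names in `SENSITIVE_KEYS` are an illustrative assumption, and a key-name check will not catch secrets pasted into free-text values, so it supplements careful payload design rather than replacing it.

```python
SENSITIVE_KEYS = {"api_key", "password", "secret", "secret_token", "token", "authorization"}


def find_sensitive_keys(payload, path=""):
    """Return dotted paths of keys that look like credentials, searching nested dicts and lists."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}" if path else str(key)
            if str(key).lower() in SENSITIVE_KEYS:
                hits.append(here)
            hits.extend(find_sensitive_keys(value, here))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_sensitive_keys(item, f"{path}[{index}]"))
    return hits
```

Before sending, raise or log (without values) if `find_sensitive_keys(payload)` returns anything.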
Include enough context for the agent to respond:
# ✅ Agent has full context
payload = {
"event": "comment.mention",
"message": "@athena what's the status?",
"task": {
"id": "task_123",
"title": "Q2 Roadmap",
"description": "...",
"assignees": ["alice", "bob"],
"status": "in_progress",
"comments": [...] # conversation history
}
}
4. Error Handling
Handle Relay errors gracefully:
import json
import websockets

async def send_event(event, max_attempts=3):
    global ws
    for attempt in range(1, max_attempts + 1):
        try:
            await ws.send(json.dumps(event))
            return
        except websockets.exceptions.ConnectionClosed:
            if attempt == max_attempts:
                raise  # bounded retries instead of unbounded recursion
            print("Connection to Relay lost, reconnecting...")
            ws = await reconnect()  # your reconnect helper; must return the new socket
async def handle_incoming():
    async for message in ws:
        # Catch errors per message so one bad message doesn't stop the loop
        try:
            data = json.loads(message)
            if data.get("type") == "error":
                print(f"Error from Relay: {data.get('error')}")
                # Handle specific error codes
                code = data.get("code")
                if code == "PERMISSION_DENIED":
                    print(f"App not allowlisted for agent {data.get('agent_id')}")
                elif code == "AGENT_OFFLINE":
                    print("Agent is not connected")
                # etc.
            elif data.get("type") == "reply":
                deliver_to_app(data)
        except Exception as e:
            print(f"Error in message handling: {e}")
5. Validate Relay Responses
Verify event_id matches your request:
import json

# Track sent events, keyed by event_id. Record each event here as soon as you
# know its event_id (see the API Reference for how Relay reports it).
pending_events = {}
event = {
    "type": "event",
    "agent_id": "athena",
    "thread_id": "task_123",
    "payload": {...}
}
await ws.send(json.dumps(event))
async for message in ws:
    data = json.loads(message)
    event_id = data.get("event_id")
    if event_id not in pending_events:
        print(f"Unexpected event_id: {event_id}")
        continue
    # Process only events we sent
    handle_response(pending_events[event_id], data)
Check message types:
VALID_TYPES = {"accepted", "token", "reply", "error", "pong"}
if data["type"] not in VALID_TYPES:
    print(f"Unknown message type: {data['type']}")
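Both checks can be folded into one function that runs before any handler sees a message. A sketch: `validate_relay_message` is a hypothetical name, and it assumes every message type except those in `exempt_types` (by default only "pong") carries an `event_id` you are already tracking. If Relay assigns event IDs in its "accepted" message, add "accepted" to `exempt_types`; confirm the details against the API Reference.

```python
import json

VALID_TYPES = {"accepted", "token", "reply", "error", "pong"}


def validate_relay_message(raw, pending_events, exempt_types=frozenset({"pong"})):
    """Parse a raw Relay message and return it if it passes basic checks, else None.

    Assumption: every type not in `exempt_types` carries an event_id that is
    already a key in `pending_events`.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    msg_type = data.get("type")
    if not isinstance(msg_type, str) or msg_type not in VALID_TYPES:
        return None
    if msg_type not in exempt_types:
        event_id = data.get("event_id")
        # Only hashable IDs can be looked up; anything else is malformed
        if not isinstance(event_id, (str, int)) or event_id not in pending_events:
            return None
    return data
```

Returning `None` keeps the caller's loop simple: skip (and log) anything that fails, and dispatch the rest.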
6. Rate Limiting
Be aware of rate limits:
- Relay has per-app rate limits (details in API Reference)
- Monitor for RATE_LIMIT_EXCEEDED errors
- Implement backoff if you hit limits
import asyncio

# `response` is a parsed error message received from Relay
if response.get("code") == "RATE_LIMIT_EXCEEDED":
    wait_time = response.get("retry_after_seconds", 60)
    print(f"Rate limited, waiting {wait_time}s")
    await asyncio.sleep(wait_time)
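Waiting after a rate-limit error works, but you can avoid most of those errors by throttling on the client side. A sketch of a token bucket (here "tokens" are send permits, not auth tokens); the `rate` and `capacity` values you pass in are placeholders, so take the real per-app limits from the API Reference.

```python
import time


class TokenBucket:
    """Client-side throttle: refills at `rate` permits per second, bursts up to `capacity`.

    The rate and capacity are placeholders; use the real per-app limits
    from the Relay API Reference.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Take one permit if available; return False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Before each `ws.send`, call `try_acquire()`; when it returns False, sleep for about `1 / rate` seconds and try again.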
7. Logging
Log appropriately (never log tokens):
# ✅ GOOD - useful debug info without exposing secrets
import logging
logging.info(f"Sending event to agent: {event['agent_id']}")
logging.debug(f"Event payload size: {len(payload_json)} bytes")
logging.error(f"Failed to send event: {error_message}")
# ❌ BAD - exposes token
logging.debug(f"Using token: {token}")
logging.error(f"Auth failed with: Authorization: Bearer {token}")
# ❌ BAD - exposes full payload if it has secrets
logging.debug(f"Event: {json.dumps(event)}")
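Code review catches most of these mistakes, but a logging filter adds a backstop by scrubbing token-like strings from every record. A sketch, assuming Relay tokens start with `rlk_` (apps) or `rla_` (agents) as in the examples on this page; adjust the patterns to your real token format.

```python
import logging
import re

# Assumption: Relay tokens start with "rlk_" (apps) or "rla_" (agents), as in
# the examples on this page. Adjust the patterns to your real token format.
TOKEN_PATTERN = re.compile(r"\b(?:rlk|rla)_[A-Za-z0-9_.\-]+")
BEARER_PATTERN = re.compile(r"(Bearer\s+)\S+")


def redact(text):
    """Replace bearer credentials and token-like strings with [REDACTED]."""
    text = BEARER_PATTERN.sub(r"\1[REDACTED]", text)
    return TOKEN_PATTERN.sub("[REDACTED]", text)


class RedactTokensFilter(logging.Filter):
    """Scrub token-like strings from each record's message before it is emitted.

    Note: exception tracebacks (exc_info) are not scrubbed by this sketch.
    """

    def filter(self, record):
        # Merge %-style args first so tokens passed as arguments are caught too
        record.msg = redact(record.getMessage())
        record.args = ()
        return True
```

Attach the filter to your handlers, for example `handler.addFilter(RedactTokensFilter())`, rather than to a logger: filters on a logger do not run for records propagated up from its child loggers.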
For Agent Operators
1. Token Security
Protect agent tokens like passwords:
- Store in environment variables
- Use a secrets manager
- Never commit to version control
- Rotate periodically (monthly or quarterly)
- Rotate immediately if exposed
2. Connection Monitoring
Monitor agent connection status:
- Check the dashboard daily
- Set up alerts if agent goes offline unexpectedly
- Log all connection/disconnection events
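let retryDelayMs = 1000; // doubles after each disconnect, reset on success

ws.on('open', () => {
  console.info("Agent connected to Relay");
  retryDelayMs = 1000;
});
ws.on('close', (code) => {
  console.warn(`Agent disconnected from Relay (code ${code}), reconnecting in ${retryDelayMs / 1000}s`);
  setTimeout(() => reconnect(), retryDelayMs);
  retryDelayMs = Math.min(retryDelayMs * 2, 60000); // exponential backoff, capped at 60s
});
ws.on('error', (error) => {
  // Log only; reconnection is scheduled from the 'close' handler
  console.error(`Connection error: ${error.message}`);
});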
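For the "alert if the agent goes offline unexpectedly" item, one approach is a watchdog that notes when the connection dropped and fires an alert once it has stayed down too long. A sketch in Python: `ConnectionWatchdog`, the threshold, and the `alert` callback are illustrative assumptions, not part of Relay. Wire `on_connected` and `on_disconnected` to your client's open and close events, and call `check()` from a timer.

```python
import time


class ConnectionWatchdog:
    """Alert once when the agent has been disconnected longer than `max_offline_seconds`."""

    def __init__(self, alert, max_offline_seconds=120.0, clock=time.monotonic):
        self.alert = alert              # callable that receives a message string
        self.max_offline = max_offline_seconds
        self.clock = clock
        self.disconnected_at = None     # None while connected
        self.alerted = False

    def on_connected(self):
        self.disconnected_at = None
        self.alerted = False

    def on_disconnected(self):
        if self.disconnected_at is None:
            self.disconnected_at = self.clock()

    def check(self):
        """Call periodically; returns True if an alert fired on this call."""
        if self.disconnected_at is None or self.alerted:
            return False
        offline_for = self.clock() - self.disconnected_at
        if offline_for > self.max_offline:
            self.alert(f"Agent offline for {offline_for:.0f}s")
            self.alerted = True  # one alert per outage
            return True
        return False
```

The threshold gives ordinary reconnects room to succeed before anyone is paged.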
3. Session Management
Configure TTLs appropriately:
- Short TTL (7-14 days): general assistants that don't need long context
- Longer TTL (30+ days): technical specialists that benefit from context history
Check TTL configuration in your plugin:
{
"agents": [
{
"token": "rla_athena_...",
"session_ttl_days": 14
},
{
"token": "rla_klyve_...",
"session_ttl_days": 30
}
]
}
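To review these TTLs without printing the tokens themselves, a small script can read the plugin config and mask each token. A sketch, assuming the JSON layout shown above; `summarize_ttls` and the 8-character visible prefix are hypothetical choices.

```python
import json


def mask_token(token, visible=8):
    """Keep a short prefix for identification and hide the rest."""
    return token[:visible] + "..." if len(token) > visible else "***"


def summarize_ttls(config_text):
    """Return one line per agent with its masked token and TTL, flagging bad values."""
    lines = []
    for agent in json.loads(config_text).get("agents", []):
        label = mask_token(agent.get("token", ""))
        ttl = agent.get("session_ttl_days")
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(ttl, int) or isinstance(ttl, bool) or ttl <= 0:
            lines.append(f"{label}: invalid session_ttl_days {ttl!r}")
        else:
            lines.append(f"{label}: {ttl} days")
    return lines
```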
4. Error Handling
Handle Relay errors in agent:
ws.on('message', async (data) => {
  let message;
  try {
    // Parse inside try so malformed input is caught, not an unhandled rejection
    message = JSON.parse(data);
    if (message.type === "error") {
      console.error(`Event error: ${message.error}`);
      // Don't reply to errors; Relay already knows about them
      return;
    }
    if (message.type === "event") {
      const reply = await processEvent(message);
      ws.send(JSON.stringify({
        type: "reply",
        event_id: message.event_id,
        content: reply
      }));
    }
  } catch (error) {
    console.error(`Error processing event: ${error}`);
    // Only report failures that can be tied to a specific event
    if (message?.event_id) {
      ws.send(JSON.stringify({
        type: "error",
        event_id: message.event_id,
        error: "Agent processing failed",
        code: "AGENT_ERROR"
      }));
    }
  }
});
5. Rate Limiting
Be aware of rate limits per app:
- If an app sends too many events, Relay throttles it
- Don't implement retries for events you receive; Relay has already handled them
- If you're slow to reply, apps may experience latency
6. Resource Management
Monitor memory and CPU:
- Sessions are stored in memory
- If you have many long-lived sessions, memory usage grows
- Monitor the session_count metric
- Clean up expired sessions regularly
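If your agent keeps sessions in an in-process dictionary, expiry can be a periodic sweep. A sketch: `SessionStore`, measuring the TTL from last activity, and the injected clock are assumptions for illustration, not a description of how Relay or any plugin actually stores sessions.

```python
import time


class SessionStore:
    """In-memory sessions keyed by thread ID, expiring after `ttl_days` of inactivity."""

    def __init__(self, ttl_days, clock=time.time):
        self.ttl_seconds = ttl_days * 24 * 60 * 60
        self.clock = clock
        self._sessions = {}  # thread_id -> (last_active_timestamp, context)

    def touch(self, thread_id, context):
        """Create or refresh a session and record activity."""
        self._sessions[thread_id] = (self.clock(), context)

    def get(self, thread_id):
        entry = self._sessions.get(thread_id)
        return entry[1] if entry else None

    def cleanup_expired(self):
        """Drop inactive sessions; returns how many were removed."""
        cutoff = self.clock() - self.ttl_seconds
        expired = [tid for tid, (last, _) in self._sessions.items() if last < cutoff]
        for tid in expired:
            del self._sessions[tid]
        return len(expired)

    @property
    def session_count(self):
        """Value to export as the session_count metric."""
        return len(self._sessions)
```

Run `cleanup_expired()` on a timer (hourly, say) and export `session_count` to your metrics system so memory growth is visible.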
7. Availability
Plan for restarts:
- During the 1-hour grace period, you can restart without impact
- After the grace period, reconnect immediately
- Queue outgoing replies (Relay buffers them)
- Replay missed events from Relay if needed
For Dashboard Admins
1. Account Security
Use a strong email provider:
- Gmail with 2FA enabled
- Microsoft 365 with 2FA enabled
- Corporate email with SSO
Don't use:
- Free email providers (less secure)
- Email addresses shared between people
- Email accounts without 2FA
2. Member Management
Review team members regularly:
- Monthly: Who has access to Relay?
- Is their role still appropriate?
- Do they still need access?
Monthly Checklist:
☐ List all team members
☐ Verify each is still on the team
☐ Check if roles match their responsibilities
☐ Remove anyone who left or changed roles
☐ Document any changes
3. Token Rotation
Rotate tokens periodically:
- Monthly for high-risk apps/agents
- Quarterly for others
- Immediately if exposed
Document rotations:
Token Rotation Log:
2026-04-05: Rotated Portal app token (monthly maintenance)
2026-04-05: Rotated Athena agent token (monthly maintenance)
2026-03-15: Rotated ResearchBot token (exposed in logs)
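If you keep the rotation log as data, a short script can flag what is due. A sketch that approximates "monthly" as 30 days and "quarterly" as 90 days; the `ROTATION_LOG` entries mirror the example log above, and the risk labels attached to them are hypothetical.

```python
from datetime import date

# Approximations of the policy above: monthly for high-risk, quarterly otherwise
MAX_AGE_DAYS = {"high": 30, "normal": 90}

# name -> (risk level, date of last rotation); illustrative entries
ROTATION_LOG = {
    "Portal app token": ("high", date(2026, 4, 5)),
    "Athena agent token": ("high", date(2026, 4, 5)),
    "ResearchBot token": ("normal", date(2026, 3, 15)),
}


def overdue_tokens(log, today):
    """Return (name, days_since_rotation) for tokens past their rotation window."""
    overdue = []
    for name, (risk, last_rotated) in sorted(log.items()):
        age = (today - last_rotated).days
        if age > MAX_AGE_DAYS[risk]:
            overdue.append((name, age))
    return overdue
```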
4. Allowlist Audits
Review allowlists quarterly:
- Is each restriction still necessary?
- Are there new apps that should be allowlisted?
- Are there apps that should be removed?
Allowlist Review (Q2 2026):
Athena (open): ✓ Correct, general-purpose
Klyve: allowlist [Portal, Flow]
✓ Still correct, specialized agent
ResearchBot: allowlist [Portal]
✓ Correct, experimental
Future: allowlist []
✓ Open, ready for adoption
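Part of this review can be automated: list the open agents and flag allowlist entries for apps that are no longer registered. A sketch: the dictionaries stand in for whatever you export from the dashboard, and treating an empty allowlist as "open" follows the review example above.

```python
def audit_allowlists(allowlists, registered_apps):
    """Report open agents and allowlist entries that point at unknown apps.

    allowlists: agent name -> list of app names (empty list means open)
    registered_apps: set of app names currently registered
    """
    findings = []
    for agent, apps in sorted(allowlists.items()):
        if not apps:
            findings.append(f"{agent}: open to all apps")
            continue
        for app in apps:
            if app not in registered_apps:
                findings.append(f"{agent}: allowlists unknown app {app!r}")
    return findings
```

The output is a starting point for the questions above, not a decision: an open agent may be correct, as Athena is in the example.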
5. Event Log Monitoring
Review event logs weekly:
- Check for rejection patterns
- Monitor failure rates
- Look for unusual activity
Weekly Event Log Review:
☐ Any rejected events? Why?
☐ Any failure spikes?
☐ Are connections stable?
☐ Are latencies normal?
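If you can export the event log, the first two questions reduce to counting outcomes per app. A sketch: the record fields (`app`, `status`) and the status values are assumptions about the export format, and the 5% threshold is arbitrary.

```python
from collections import Counter


def flag_problem_apps(events, threshold=0.05):
    """Return {app: (rejected_rate, failed_rate)} for apps exceeding `threshold` on either rate.

    events: iterable of dicts like {"app": "Portal", "status": "delivered"},
    where status is assumed to be "delivered", "rejected", or "failed".
    """
    totals, rejected, failed = Counter(), Counter(), Counter()
    for event in events:
        app = event["app"]
        totals[app] += 1
        if event["status"] == "rejected":
            rejected[app] += 1
        elif event["status"] == "failed":
            failed[app] += 1
    flagged = {}
    for app, total in totals.items():
        rates = (rejected[app] / total, failed[app] / total)
        if max(rates) > threshold:
            flagged[app] = rates
    return flagged
```

Comparing this week's output with last week's is a simple way to spot the spikes the checklist asks about.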
6. Incident Response Plan
Prepare for common incidents:
| Incident | Response |
|---|---|
| Agent offline | Check connection status, restart service, check logs |
| High latency | Check network, verify agent load, check Relay status |
| Rejected events | Check allowlist, verify app can reach agent |
| High failure rate | Check agent logs, verify resources, monitor for patterns |
| Token exposed | Rotate immediately, review access logs, check for misuse |
Summary
For Developers
- Store tokens securely (env vars, secrets manager)
- Implement exponential backoff for reconnection
- Don't send secrets in payloads
- Validate Relay responses
- Handle errors gracefully
- Log appropriately (never log tokens)
For Operators
- Protect agent tokens
- Monitor connection status
- Configure session TTLs appropriately
- Handle errors in agents
- Monitor memory and resources
- Plan for restarts and maintenance
For Admins
- Use secure email providers with 2FA
- Review team members monthly
- Rotate tokens regularly
- Audit allowlists quarterly
- Monitor event logs weekly
- Have an incident response plan
Checklist for New Teams
Use this checklist when onboarding to Relay:
Setup:
☐ Create organization
☐ Register first app
☐ Register first agent
☐ Configure allowlists
☐ Invite team members
☐ Store tokens securely
Development:
☐ Implement WebSocket client with backoff
☐ Handle all error cases
☐ Implement event validation
☐ Add logging (no tokens)
☐ Test token rotation
☐ Test connection failures
Production:
☐ Use secrets manager for tokens
☐ Set up monitoring and alerts
☐ Plan incident response
☐ Document access controls
☐ Schedule monthly reviews
☐ Schedule quarterly audits
Next steps: