
OpenClaw Gateway Disconnected: 5 Fixes That Actually Work

A disconnected gateway silences every agent on your system simultaneously. The error looks catastrophic but usually traces to one of five fixable root causes. Identify the right one in under three minutes with this diagnostic flow.

M. Kim
AI Product Specialist
Feb 25, 2025
Updated Feb 25, 2025
Key Takeaways
  • Process crash is the most common cause — run as a systemd service with Restart=always to recover automatically
  • Network timeouts sever WebSocket connections — firewalls and load balancers with idle timeout settings are the usual culprits
  • Heartbeat misconfiguration makes the gateway appear offline even when it is running; tune intervalSeconds and missesBeforeOffline to match your network conditions
  • Memory exhaustion causes silent crashes — check free memory on your host before assuming a software bug
  • Always check logs immediately after a disconnect before attempting any fix

Every gateway disconnect has a paper trail. The gateway logs the reason before it goes down — or the OS does, if it was killed by the kernel. You have to read those logs before you touch anything else. The mistake that extends every incident is restarting the gateway without capturing the failure reason first.

Diagnosing Which Disconnect You're Dealing With

Run this diagnostic sequence immediately when you see a gateway disconnected error:

# Is the process still running? ([o] stops grep from matching its own command line)
ps aux | grep '[o]penclaw-gateway'

# What happened right before the disconnect?
openclaw logs --tail 50 --level debug

# Is it a systemd service? Check the unit status
systemctl status openclaw-gateway

# Is memory the issue?
free -h

The output of these four commands tells you which of the five root causes you're dealing with. Match what you see to the fix below.
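The four commands above, plus the kernel-log check from Fix 5, boil down to a handful of yes/no answers. As a minimal sketch, here is that mapping as a POSIX shell function. The name `classify_disconnect`, its argument order, and the order of the checks are hypothetical simplifications of the symptoms described in this guide, not an OpenClaw tool.

```shell
# Hypothetical helper: map diagnostic observations to one of the five fixes.
# Arguments (each "yes" or "no"):
#   1. the gateway process is running
#   2. the kernel log shows the OOM killer ending it (see Fix 5)
#   3. the gateway logs show a WebSocket close without a crash
#   4. disconnects recur at a fixed interval
# Prints the number of the fix to try first.
classify_disconnect() {
  local running="$1" oom="$2" ws_close="$3" regular="$4"
  if [ "$oom" = yes ]; then
    echo 5   # memory exhaustion; checked first because a restart can mask the kill
  elif [ "$running" = no ]; then
    echo 1   # process gone without OOM evidence: crash
  elif [ "$regular" = yes ]; then
    echo 3   # fixed-interval drops: firewall or load balancer idle timeout
  elif [ "$ws_close" = yes ]; then
    echo 2   # socket closed while the process stayed up: network timeout
  else
    echo 4   # process and network healthy but marked offline: heartbeat settings
  fi
}

# Example: process up, no OOM kill, WebSocket closes every 60 seconds
classify_disconnect yes no yes yes   # prints 3
```

OOM evidence wins because, with Restart=always, systemd may already have restarted a gateway that the kernel killed, so a running process does not rule out Fix 5.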

💡
Capture Logs Before Restarting
A restart clears the in-memory log buffer. If you restart before reading the logs, the failure reason is gone. Read openclaw logs --tail 100 and copy the output before touching anything else. This one habit preserves the evidence that every fix below depends on.
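A minimal sketch of that habit as a shell function, assuming the `openclaw` CLI and the `openclaw-gateway` unit name used throughout this article. The function name `capture_gateway_state` and the output file naming are hypothetical; every command is allowed to fail so that a missing tool does not stop the snapshot.

```shell
# Hypothetical helper: save what you need for a post-mortem into one file
# BEFORE restarting. Usage: capture_gateway_state [output-dir]
capture_gateway_state() {
  local outdir="${1:-.}"
  local out
  mkdir -p "$outdir"
  out="$outdir/gateway-incident-$(date +%Y%m%d-%H%M%S).txt"
  {
    echo "== process =="
    # [o] keeps grep from matching itself
    ps aux | grep '[o]penclaw-gateway' || echo "(no openclaw-gateway process)"
    echo "== gateway logs =="
    openclaw logs --tail 100 2>&1 || echo "(openclaw logs failed)"
    echo "== systemd unit =="
    systemctl status openclaw-gateway --no-pager 2>&1 || true  # non-zero when the unit is stopped
    echo "== memory =="
    free -h 2>&1 || echo "(free failed)"
    echo "== kernel OOM messages =="
    journalctl -k --no-pager 2>/dev/null | grep -i 'killed process' \
      || echo "(no OOM kills found in the kernel journal)"
  } > "$out"
  echo "$out"   # print the path so you can attach the file to the incident
}
```

Run it first, then restart: `capture_gateway_state ~/incidents`.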

Fix 1: Gateway Process Crashed

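Symptom: ps aux | grep '[o]penclaw-gateway' returns nothing. The process is gone. (A plain grep openclaw-gateway always matches its own grep process, so use the bracket form or pgrep -f openclaw-gateway.)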

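Check the log for the crash reason. Common causes: an unhandled exception in a skill, a corrupted config file, or a Node.js segfault. The last log lines before the process exited usually name the cause. A segfault can exit without writing anything to the application log; in that case systemctl status openclaw-gateway shows the signal the process died from.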

Fix the underlying cause, then make future crashes self-healing by running as a systemd service:

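# /etc/systemd/system/openclaw-gateway.service
[Unit]
Description=OpenClaw Gateway
# Wait until the network is actually up, not just configured
Wants=network-online.target
After=network-online.target

[Service]
# Assumes this command stays in the foreground (the default Type=simple expects that)
ExecStart=/usr/bin/openclaw gateway start
# Restart on any exit (crash, kill, or clean exit) after a 5-second pause
Restart=always
RestartSec=5
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target

# Activate with: sudo systemctl daemon-reload && sudo systemctl enable --now openclaw-gateway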

With Restart=always and RestartSec=5, systemd restarts a crashed gateway five seconds after it exits. Your agents then reconnect on their own, so channel downtime is typically under 30 seconds.
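To confirm the policy is actually in effect after reloading systemd, read it back with `systemctl show openclaw-gateway -p Restart -p RestartUSec`, which prints `key=value` lines (systemd reports RestartSec= under the property name RestartUSec). The parser below is a sketch with a hypothetical function name; it only checks for `Restart=always` because that is the setting this guide recommends.

```shell
# Hypothetical helper: read `systemctl show` key=value output on stdin and
# verify the restart policy. Returns non-zero if Restart is not "always".
check_restart_policy() {
  local restart="" delay="" key value
  while IFS='=' read -r key value; do
    case "$key" in
      Restart) restart="$value" ;;
      RestartUSec) delay="$value" ;;   # RestartSec= as reported by systemd, e.g. 5s
    esac
  done
  if [ "$restart" = always ]; then
    echo "ok: Restart=always, delay ${delay:-unknown}"
  else
    echo "warning: Restart=${restart:-unset} (this guide recommends always)"
    return 1
  fi
}

# Usage:
#   systemctl show openclaw-gateway -p Restart -p RestartUSec | check_restart_policy
```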

Fix 2: Network Timeout Severed the Connection

Symptom: The process is still running (ps aux shows it), but agents report disconnected. Gateway logs show a WebSocket close event without a crash.

WebSocket connections are long-lived TCP connections. If no data flows for a period, network equipment along the path may close the connection silently. The gateway and agent both think the connection is alive until the next message fails.

Fix: reduce the heartbeat interval so data flows frequently enough to keep the connection alive:

heartbeat:
  intervalSeconds: 20
  missesBeforeOffline: 3

With a 20-second interval, keepalive packets flow every 20 seconds — shorter than most firewall idle timeouts (typically 60–120 seconds).

Fix 3: Firewall or Load Balancer Closing Connections

Symptom: Disconnects happen at regular, predictable intervals (every 60 seconds, every 5 minutes). The timing matches your infrastructure's idle timeout setting.

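Cloud load balancers and proxies (AWS ALB, GCP LB, Cloudflare) ship with idle or request timeouts that can be as short as 30 to 100 seconds; the AWS ALB default idle timeout, for example, is 60 seconds. A WebSocket connection that stays quiet longer than that gets terminated.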
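One way to confirm this pattern is to look at the spacing between disconnects. The sketch below assumes you have already pulled disconnect times out of your logs as Unix epoch seconds, one per line in ascending order (with GNU date, `date -d '2025-02-25 10:00:00' +%s` converts a timestamp). The function name and the 5-second default tolerance are arbitrary choices, not OpenClaw settings.

```shell
# Hypothetical helper: read disconnect times (epoch seconds, ascending) on stdin,
# print every gap, then report whether the gaps are near-identical.
# Usage: disconnect_gaps [tolerance-seconds] < times.txt
disconnect_gaps() {
  awk -v tol="${1:-5}" '
    NR > 1 {
      gap = $1 - prev
      print "gap: " gap "s"
      if (n == 0 || gap < min) min = gap
      if (n == 0 || gap > max) max = gap
      sum += gap
      n++
    }
    { prev = $1 }
    END {
      if (n < 2) { print "need at least three disconnects"; exit }
      if (max - min <= tol) printf "regular: about every %.0fs\n", sum / n
      else print "irregular"
    }'
}

printf '%s\n' 1700000000 1700000060 1700000121 1700000180 | disconnect_gaps
# gap: 60s
# gap: 61s
# gap: 59s
# regular: about every 60s
```

A "regular" result close to a known timeout on your path (60 seconds in this example) points strongly at that device.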

Two options:

  • Increase the load balancer's idle timeout — set it to 3600 seconds (1 hour) to accommodate long-running agent conversations
  • Configure WebSocket keepalive at the gateway — set heartbeat.intervalSeconds to less than half your firewall's idle timeout
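The "less than half" rule is simple arithmetic, sketched below as two hypothetical shell functions. An interval under half the idle timeout puts at least two heartbeats inside every idle window, so one delayed or lost heartbeat does not leave the connection quiet long enough to be cut.

```shell
# Largest whole-second intervalSeconds still strictly below half the idle timeout.
max_heartbeat_interval() {
  echo $(( ($1 - 1) / 2 ))
}

# Succeeds (exit 0) if INTERVAL is strictly less than half of IDLE_TIMEOUT.
heartbeat_ok() {
  [ $(( $1 * 2 )) -lt "$2" ]
}

max_heartbeat_interval 60   # prints 29 for a 60-second idle timeout
heartbeat_ok 20 60 && echo "20s heartbeat is safe behind a 60s idle timeout"
heartbeat_ok 30 60 || echo "30s heartbeat is NOT safe behind a 60s idle timeout"
```

The 20-second interval from Fix 2 passes this check behind a 60-second load balancer, while the 30-second interval in the Fix 4 example does not.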

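For Nginx reverse proxies, add these directives to the location block that proxies to the gateway. They also work in the server block, but proxy_set_header is only inherited there when the location block defines no proxy_set_header of its own: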

proxy_read_timeout 3600;
proxy_send_timeout 3600;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

Fix 4: Heartbeat Misconfiguration

Symptom: The gateway process is healthy and the network connection is stable, but the dashboard shows "gateway offline." Agents send messages but nothing happens.

This is a false offline detection. The heartbeat settings are too strict for your network, so the gateway is marked offline before it has actually failed.

Check your heartbeat config:

heartbeat:
  intervalSeconds: 30     # How often to check
  missesBeforeOffline: 2  # How many missed beats = offline

With these values, the gateway is declared offline after 60 seconds (2 × 30) without a response. If your server has high latency or temporary network jitter, this fires incorrectly. Increase missesBeforeOffline to 3 or 4 to add tolerance.
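The detection window is intervalSeconds × missesBeforeOffline. The hypothetical helper below prints it for the values discussed here: each extra tolerated miss adds one full interval before the gateway is reported offline.

```shell
# Seconds without a heartbeat before the gateway is reported offline.
offline_after() {
  echo $(( $1 * $2 ))   # intervalSeconds * missesBeforeOffline
}

for misses in 2 3 4; do
  echo "intervalSeconds=30 missesBeforeOffline=$misses -> offline after $(offline_after 30 "$misses")s"
done
# intervalSeconds=30 missesBeforeOffline=2 -> offline after 60s
# intervalSeconds=30 missesBeforeOffline=3 -> offline after 90s
# intervalSeconds=30 missesBeforeOffline=4 -> offline after 120s
```

For comparison, the Fix 2 settings (intervalSeconds: 20, missesBeforeOffline: 3) also give a 60-second window but tolerate two consecutive missed beats instead of one, so a single delayed heartbeat is less likely to trigger a false offline.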

Fix 5: Memory Exhaustion Killing the Process

Symptom: Gateway disconnects happen after extended uptime, typically once memory usage has climbed to the host's limit. free -h shows little available memory, and the kernel log records the OOM killer terminating the process.

Check for OOM kills:

dmesg | grep -i "out of memory"
journalctl -k | grep -i "killed process"
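To see which processes the OOM killer ended, not just whether it fired, you can extract the PID and name from the kernel's `Killed process <pid> (<name>)` lines. This is a sketch with a hypothetical function name. The name in parentheses is the kernel's short process name (at most 15 characters), so a Node.js-based gateway may appear as `node` or as a truncated name rather than `openclaw-gateway`.

```shell
# Hypothetical helper: print "PID NAME" for each OOM kill in kernel log text on stdin.
# Matches lines such as:
#   Out of memory: Killed process 4242 (node) total-vm:2048000kB, anon-rss:1500000kB, ...
oom_victims() {
  sed -n 's/.*[Kk]illed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage (dmesg may need sudo on hosts with kernel.dmesg_restrict=1):
#   journalctl -k --no-pager | oom_victims
#   dmesg | oom_victims
```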

If the kernel is killing the gateway process due to memory pressure, the fix is:

  • Increase available RAM on the host
  • Add a swap file as a safety buffer
  • Reduce the number of concurrent agents or active conversations
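  • Cap the Node.js heap with NODE_OPTIONS=--max-old-space-size=512 (value in MB) so that Node.js usually fails with its own heap out-of-memory error, visible in the application log, before the kernel has to step in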
⚠️
Memory Is the Silent Killer
Memory exhaustion causes clean-looking process exits with no error in the application log. The only evidence is in the kernel log (dmesg or journalctl -k). Builders consistently blame the application for what is actually an infrastructure sizing problem.

Frequently Asked Questions

Why does the OpenClaw gateway keep disconnecting?

The most common causes: gateway process crashing, a network timeout severing the WebSocket connection, a firewall closing idle connections, a misconfigured heartbeat interval, or memory exhaustion killing the process. Check gateway logs immediately after each disconnect to identify which cause applies.

How do I make the OpenClaw gateway reconnect automatically?

Run the gateway as a systemd service with Restart=always and RestartSec=5. systemd then restarts the process five seconds after any crash, and agents reconnect automatically, so channel downtime is typically under 30 seconds.

What does 'gateway offline' mean in OpenClaw?

Gateway offline means the heartbeat check failed — the gateway didn't respond to consecutive heartbeat pings within the configured window. This can mean the process crashed, the network is unreachable, or the heartbeat interval is misconfigured and too aggressive for your connection latency.

How do I check if the OpenClaw gateway is running?

Run ps aux | grep '[o]penclaw-gateway' (or pgrep -f openclaw-gateway) to check if the process is alive; a plain grep openclaw-gateway always matches itself. For systemd-managed gateways, use systemctl status openclaw-gateway. You can also test the health endpoint directly: curl http://localhost:8080/health.
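For scripts and cron jobs, the process check and the health check can be combined into one exit status. This is a sketch: the function name is hypothetical, `http://localhost:8080/health` is the example endpoint from this answer and may differ in your deployment, and `pgrep -f` matches the full command line, so adjust the pattern if your gateway runs under a different name.

```shell
# Hypothetical helper: exit 0 only if the gateway process exists AND the health
# URL responds without an HTTP error (status below 400).
# Usage: gateway_up [health-url]
gateway_up() {
  local url="${1:-http://localhost:8080/health}"
  if ! pgrep -f openclaw-gateway >/dev/null 2>&1; then
    echo "process: not running"
    return 1
  fi
  # -f: treat HTTP 4xx/5xx as failure; -sS: quiet except for errors
  if curl -fsS --max-time 5 "$url" >/dev/null; then
    echo "health: ok"
  else
    echo "health: no successful response from $url"
    return 1
  fi
}

# Usage: gateway_up || echo "gateway down at $(date)"
```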

Can a firewall cause OpenClaw gateway disconnects?

Yes. Firewalls and load balancers that close idle TCP connections will sever WebSocket connections. Configure your firewall to allow long-lived connections on the gateway port, or set the heartbeat interval below half your firewall's idle timeout to keep connections active.

What log level should I use to debug gateway disconnects?

Set log level to debug temporarily: openclaw logs --level debug --follow. This shows every connection event, heartbeat exchange, and disconnect reason in real time. Revert to info or warn once you've identified the root cause.


M. Kim focuses on OpenClaw reliability and uptime across production deployments, has diagnosed and resolved over 200 gateway stability incidents, developed the internal runbook for systematic gateway troubleshooting, and reduced mean time to resolution by 65% through better diagnostic tooling.
