How to monitor a Railway deployment

Railway is great for getting things deployed quickly. Push your code, it builds, it runs, you get a URL. For a lot of projects, that's all you need.

But Railway doesn't monitor your app for you. It monitors the container. If your container is running, Railway is happy. Whether your app is actually responding to requests, returning the right status codes, or serving the right content is not something Railway tracks. Your container can be running while your app is stuck in a boot loop, throwing 500s, or deadlocked on a database connection.

This is the gap that uptime monitoring fills. You need something that checks your app from the outside, the way a real user would, and alerts you when it stops working.

What to monitor

Your public URL. This is the most important check. Whatever URL your users hit, that's what you should monitor. On Railway, this is either the auto-generated *.up.railway.app domain or your custom domain. Use HTTPS, and check for a 200 status code.

If your app has a dedicated health check endpoint (like /health or /up), use that instead. A good health check does more than return 200. It verifies the app can reach its database, any external services it depends on, and that it's actually ready to serve requests. A simple GET / might return 200 from a cached page even when the database is down.
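As a rough illustration, a health endpoint that checks its dependencies might look like the sketch below. It uses only the Python standard library. The check_database function, the CHECKS table, and the 503 response on failure are assumptions made for the example, not something Railway or Larm requires.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def check_database() -> bool:
    # Hypothetical placeholder: replace with a real query such as "SELECT 1".
    return True


# Name -> check function. Each check returns True when the dependency is usable.
CHECKS: Dict[str, Callable[[], bool]] = {"database": check_database}


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run every check and return (HTTP status, JSON-serializable payload)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    healthy = all(results.values())
    payload = {"status": "ok" if healthy else "error", "checks": results}
    return (200 if healthy else 503), payload


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, payload = run_health_checks(CHECKS)
        # Compact separators so the body contains the exact text "status":"ok".
        body = json.dumps(payload, separators=(",", ":")).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve() -> None:
    # Assumes the platform provides the listening port in the PORT variable.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Call serve() from your entrypoint to start the server. Returning a non-200 status when a dependency fails means a plain status code check is enough to detect the problem.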

Your API endpoints. If you're running an API on Railway, monitor the endpoints your users actually call. A health check might pass while a specific route is broken because of a missing env var or a failed migration. You can check for specific status codes, response body keywords, and response time thresholds.
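To make the status code and response time checks concrete, here is a minimal sketch of what an external probe does, written with the Python standard library. The ProbeResult type, the probe and is_healthy functions, and the default thresholds are illustrative assumptions, not how Larm is implemented.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    status: Optional[int]  # None when no HTTP response was received
    body: str
    elapsed: float  # seconds from request start to full response
    error: Optional[str] = None


def probe(url: str, timeout: float = 5.0) -> ProbeResult:
    """Send one GET request and record status, body, and elapsed time."""
    start = time.monotonic()
    error = None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # 4xx and 5xx responses still carry a status code and a body.
        status = exc.code
        body = exc.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # DNS failures, refused connections, TLS errors, and timeouts land here.
        status, body, error = None, "", str(exc)
    return ProbeResult(status, body, time.monotonic() - start, error)


def is_healthy(result: ProbeResult, expected_status: int = 200,
               max_seconds: float = 2.0) -> bool:
    """Pass only when the status matches and the response was fast enough."""
    return (
        result.error is None
        and result.status == expected_status
        and result.elapsed <= max_seconds
    )
```

For example, is_healthy(probe("https://your-app.example/api/items")) would check one route, where the URL is a placeholder for an endpoint your users actually call.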

Your custom domain SSL. If you've set up a custom domain on Railway, you have an SSL certificate that needs to stay valid. Railway manages the certificate for you, but renewal can fail if your DNS isn't configured correctly. Monitoring the SSL certificate catches this before your users see browser warnings.
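If you want to see what such a check involves, the sketch below reads a certificate's expiry date with Python's ssl module and converts it to days remaining. The function names and the idea of alerting below some number of days are assumptions for illustration.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate served for hostname."""
    context = ssl.create_default_context()
    # With the default context, an already expired or otherwise invalid
    # certificate makes the handshake raise ssl.SSLCertVerificationError.
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' to days left."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

A check could then call days_until_expiry(fetch_not_after("app.example.com")) and raise an alert when the result drops below a threshold you choose, such as 14 days. The hostname here is a placeholder.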

Setting it up

Create an HTTP monitor in Larm and enter your Railway app's URL. That's the basic setup. Larm will check it from multiple global locations and only alert you when multiple probes confirm the app is down, so you won't get false alerts from transient network issues.
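The idea behind multi-probe confirmation can be sketched in a few lines. This is not Larm's actual algorithm, only an illustration of the principle; the location names and the quorum of 2 are assumptions.

```python
from typing import Mapping


def confirmed_down(probe_ok: Mapping[str, bool], quorum: int = 2) -> bool:
    """Report downtime only when at least `quorum` probes saw a failure."""
    failures = sum(1 for ok in probe_ok.values() if not ok)
    return failures >= quorum


# One probe failing is treated as a transient network issue.
print(confirmed_down({"frankfurt": True, "virginia": False, "singapore": True}))
# Two probes failing is treated as a real outage.
print(confirmed_down({"frankfurt": False, "virginia": False, "singapore": True}))
```

The trade-off is that a failure visible from only one location does not alert on its own.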

A few things worth configuring:

Check interval. How often to check depends on how quickly you need to know about downtime. 1-minute checks are good for production services. 3-minute checks (available on the free plan) are fine when a few minutes of undetected downtime is acceptable.

Expected status code. Default is 200, which works for most cases. If your health endpoint returns 204, set that instead.

Keyword validation. For extra confidence, you can tell Larm to look for (or reject) specific strings in the response body. If your health check returns {"status":"ok"}, checking for "status":"ok" in the response body catches cases where the endpoint returns 200 but with an error message. Prefer a specific string over a bare "ok", which can also match unrelated text such as "token".
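Keyword validation comes down to substring checks on the response body. The sketch below shows the logic under that assumption. The function name and the message format are made up for the example and do not reflect Larm's internals.

```python
from typing import Iterable, List


def keyword_problems(body: str, required: Iterable[str] = (),
                     forbidden: Iterable[str] = ()) -> List[str]:
    """Return a list of problems; an empty list means the body passed."""
    problems = []
    for keyword in required:
        if keyword not in body:
            problems.append(f"missing required keyword: {keyword!r}")
    for keyword in forbidden:
        if keyword in body:
            problems.append(f"found forbidden keyword: {keyword!r}")
    return problems


# A 200 response whose body reports an error still fails the check.
print(keyword_problems('{"status":"error"}', required=['"status":"ok"']))
```

Matching is exact and case-sensitive here, so the keyword has to match the body's whitespace and quoting.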

Timeout. Railway services that are set to sleep when idle can have cold starts if they haven't received traffic in a while. If your service sleeps when inactive, set a generous timeout (10-15 seconds) to avoid false alerts from cold starts. If your service is always-on, 5 seconds is reasonable.

Railway-specific things to watch for

Deploys. Railway does rolling deploys by default, which means there's usually no downtime during deployment. But if your new version fails to start, Railway will keep the old version running. Your monitoring won't see a blip, but you also won't know that your deploy failed. Check Railway's deploy logs separately.

Private networking. If you have multiple services on Railway communicating over the private network, your external monitors can't see those connections. If your API depends on an internal Redis or Postgres service, a health check endpoint that verifies those connections is the only way to catch internal failures from outside.
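One simple way for a health endpoint to verify internal connections is a TCP connect test, sketched below. The hostnames and ports in INTERNAL_DEPENDENCIES are hypothetical placeholders, so substitute the internal addresses of your own services. A successful connect only shows that the port accepts connections. A real query (a Redis PING or a SELECT 1) is a stronger check.

```python
import socket
from typing import Dict, Tuple


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Hypothetical internal addresses; replace with your own services.
INTERNAL_DEPENDENCIES: Dict[str, Tuple[str, int]] = {
    "redis": ("redis.railway.internal", 6379),
    "postgres": ("postgres.railway.internal", 5432),
}


def check_internal_dependencies(
    deps: Dict[str, Tuple[str, int]] = INTERNAL_DEPENDENCIES,
) -> Dict[str, bool]:
    """Map each dependency name to whether it is reachable right now."""
    return {name: can_connect(host, port) for name, (host, port) in deps.items()}
```

The health endpoint can return a non-200 status whenever any value in the result is False, which makes the internal failure visible to an external monitor.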

Region-specific issues. Railway runs in specific regions. If you've deployed to us-west, your app might have higher latency from other continents. Monitoring from multiple locations helps you understand what your global users are experiencing, not just what things look like from your region.

Volume mounts. If your Railway service uses persistent volumes, a volume mount failure can cause your app to crash or behave unexpectedly on restart. Your app might start and respond to requests but fail when it tries to read or write data. Health checks that touch the filesystem catch this.
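A filesystem check can be as small as writing a file to the volume and reading it back. The sketch below assumes you know the mount path. The function name and the probe file naming are illustrative.

```python
import os
import uuid


def volume_is_writable(mount_path: str) -> bool:
    """Write a small file under mount_path, read it back, then delete it."""
    probe_path = os.path.join(mount_path, f".healthcheck-{uuid.uuid4().hex}")
    token = uuid.uuid4().hex
    try:
        with open(probe_path, "w", encoding="utf-8") as handle:
            handle.write(token)
            handle.flush()
            os.fsync(handle.fileno())  # push the write through to the disk
        with open(probe_path, "r", encoding="utf-8") as handle:
            return handle.read() == token
    except OSError:
        # Missing mount, read-only filesystem, full disk, and similar errors.
        return False
    finally:
        try:
            os.remove(probe_path)
        except OSError:
            pass
```

Calling volume_is_writable("/data") from the health endpoint (with /data standing in for your mount path) turns a broken volume into a failing health check.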

If you're running multiple services

Most Railway projects grow beyond a single service. You end up with an API, a frontend, a worker, maybe a database proxy. Each one of these is a separate failure point.

For HTTP services (APIs, frontends), set up an HTTP monitor for each one. For background workers that don't serve HTTP, use heartbeat monitoring. Have the worker ping a heartbeat URL at the end of each job cycle, and alert when the ping stops coming. This is covered in more detail in the cron job monitoring guide.
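A worker loop with a heartbeat might look like the following sketch. HEARTBEAT_URL is a hypothetical placeholder for the URL your monitoring tool gives you, and run_cycles is a made-up name. The point is that the ping is sent only after a job cycle completes without raising.

```python
import urllib.request
from typing import Callable, Iterable

# Hypothetical placeholder; use the heartbeat URL from your monitoring tool.
HEARTBEAT_URL = "https://example.com/heartbeat/your-token"


def send_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 5.0) -> bool:
    """Ping the heartbeat URL; return True on a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # A failed ping must never crash the worker itself.
        return False


def run_cycles(jobs: Iterable[Callable[[], None]],
               ping: Callable[[], bool] = send_heartbeat) -> int:
    """Run each job and ping after it succeeds. Returns the number of pings sent."""
    pings = 0
    for job in jobs:
        try:
            job()
        except Exception:
            # No ping for a failed cycle, so repeated failures trigger an alert.
            continue
        if ping():
            pings += 1
    return pings
```

Because a crashed or stuck worker simply stops pinging, the monitor needs no access to the worker. It only has to notice the silence.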

Railway makes deployment simple. Monitoring is the part you add yourself. If you want to get started, Larm's free plan gives you 15 monitors with multi-probe voting from all locations, which is enough to cover most Railway projects.
