How to monitor a Dokploy deployment

Dokploy is a self-hosted PaaS that handles the parts of running apps that nobody wants to do manually: builds, deploys, SSL, reverse proxying, and Docker orchestration. You push code, Dokploy builds a container and puts it behind Traefik with a Let's Encrypt certificate. It's like having your own mini-Heroku on a VPS.

What Dokploy shows you is whether your containers are running. The dashboard has green indicators for each service. But a running container is not the same thing as a working app. Your container can be "running" while the app inside it is crash-looping, stuck on a failed migration, or returning 500s on every request. Dokploy doesn't check that.

What to monitor

Each app by its public domain. If you're running three apps on Dokploy (say, a frontend, an API, and a docs site), each one has its own domain and its own failure modes. Monitor each domain separately. Don't assume that if one is up, the others are too.

HTTPS, not HTTP. Dokploy sets up SSL through Traefik and Let's Encrypt automatically. Monitor the HTTPS URL so you're also testing that the certificate is valid and Traefik is terminating TLS correctly. If the certificate expires or Traefik's TLS config breaks, a check against the plain http:// URL might still pass while users see browser warnings.

Health endpoints over root paths. GET / on a frontend will probably return 200 even if the backend it depends on is down. If your app has a health endpoint that checks its dependencies (database, Redis, external APIs), use that instead. The health check should return a non-200 status when something is wrong.
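
As a rough illustration of what such an endpoint can look like, here is a minimal sketch using only Python's standard library. The /health path, the port, and the two placeholder checks are assumptions made for the example; your app would plug in real database and Redis pings.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each dependency check and return (HTTP status, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = "failing"
    healthy = all(state == "ok" for state in results.values())
    body = json.dumps({"status": "ok" if healthy else "degraded", "checks": results})
    # 503 tells the monitor something is wrong even though the app is answering.
    return (200 if healthy else 503), body


# Hypothetical placeholder checks; replace with real dependency pings.
CHECKS: Dict[str, Callable[[], bool]] = {
    "database": lambda: True,
    "redis": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":  # assumed path for this sketch
            self.send_error(404)
            return
        status, body = run_health_checks(CHECKS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Start the example server; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() starts the server. The important part is run_health_checks: any failing or crashing dependency check turns the response into a 503, which is what an external monitor needs to see.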

Dokploy-specific failure modes

Traefik routing issues. Dokploy uses Traefik as its reverse proxy. Traefik picks up its routing rules from Docker labels and from the dynamic configuration Dokploy generates, and if a rule is wrong or missing, the routing breaks. The container runs, Dokploy says it's healthy, but requests to the domain return a 404 from Traefik or hit the wrong service. Monitoring from outside catches this immediately because your check gets the same 404 your users would.

SSL renewal failures. Traefik handles Let's Encrypt certificate issuance and renewal via ACME challenges. This can fail if your DNS isn't pointing to the server, if you've set up CAA records that don't include Let's Encrypt, or if Traefik's ACME storage gets corrupted. The certificate works fine until it expires, and then your site is effectively down. SSL expiry monitoring gives you weeks of warning.
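
If you want to see what an expiry check does under the hood, the sketch below reads the certificate a server presents and computes the days remaining. It uses only Python's standard library. The 21-day warning threshold and the check_host helper are assumptions for illustration, not something Dokploy or Traefik prescribes.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan 11 00:00:00 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now.timestamp()) / 86400


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the certificate's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def check_host(hostname: str, warn_days: float = 21.0) -> str:
    """Return a one-line status for `hostname` (warn_days is an assumed threshold)."""
    try:
        not_after = fetch_not_after(hostname)
    except ssl.SSLCertVerificationError as exc:
        # An already expired or otherwise invalid certificate fails the handshake.
        return f"CRITICAL: {hostname} certificate did not verify: {exc}"
    remaining = days_until_expiry(not_after, datetime.now(timezone.utc))
    if remaining < warn_days:
        return f"WARNING: {hostname} certificate expires in {remaining:.1f} days"
    return f"OK: {hostname} certificate valid for {remaining:.1f} more days"
```

Running check_host against one of your own domains on a schedule would give you a warning while the old certificate still works, which is the whole point: renewal failures are silent until the expiry date.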

Failed deploys that look like success. You push a change, and Dokploy builds and deploys it. But the new container fails to start. Dokploy keeps the old container running, so your app stays up, but the deploy didn't actually work. You think you shipped a fix, but you didn't. Uptime monitoring won't catch this directly (since the old container is still serving traffic), but if your "fix" was for a broken endpoint, monitoring that endpoint will tell you it's still broken.

Docker resource limits. If you've set memory limits on your containers and your app exceeds them, Docker kills the container. Dokploy's restart policy kicks in, but there's a gap between the kill and the moment the new container is ready, and requests fail during it. If the app consistently exceeds its memory limit, it enters a restart loop. The container is technically "running" (it keeps restarting), but it's never up long enough to serve requests reliably.
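
From the outside, a restart loop can show up either as a continuous outage or as a check that keeps flipping between up and down, depending on timing. One hedged way to spot the second pattern is to count state changes in the recent check history. The function names and the threshold of 4 transitions are assumptions for this sketch.

```python
from typing import Sequence


def count_transitions(results: Sequence[bool]) -> int:
    """Number of up->down or down->up changes between consecutive checks."""
    return sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)


def looks_like_restart_loop(results: Sequence[bool], min_transitions: int = 4) -> bool:
    """True when recent check results flip often enough to suggest flapping.

    `results` is the recent history, oldest first; True means the check passed.
    `min_transitions` is an assumed threshold and should be tuned per app.
    """
    return count_transitions(results) >= min_transitions
```

A history like up, down, up, down, up has four transitions and would be flagged, while a single outage that starts and stays down has only one.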

Volume mount failures. If your app uses Docker volumes for persistent data and the volume mount fails (permissions, full disk, corruption), the container starts but the app can't access its data. Depending on how the app handles this, it might crash, return errors, or silently serve stale content.

Shared server resource exhaustion. A typical Dokploy setup runs everything on a single server. If one app eats all the CPU or memory, every other app on the server suffers. A slow app can take down its neighbors. Monitoring each app independently helps you figure out which one is the problem.

Setting it up

Create an HTTP monitor in Larm for each of your Dokploy apps. Use the public domain, over HTTPS.

Timeout. Dokploy apps generally don't have cold starts since the containers run continuously. 5 seconds is a reasonable timeout. If your app is behind a slow database, you might need more.

Keyword validation. If your health endpoint returns JSON, check for a specific keyword in the response, such as the string your endpoint only emits when everything is healthy. This catches cases where the endpoint returns 200 but the body indicates a problem.
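
To make the timeout and keyword settings concrete, here is a small sketch of the logic such a check applies, written with Python's standard library. It is not how Larm implements its monitors; the function names are made up for the example, and the 5-second default simply mirrors the timeout suggested above.

```python
import urllib.error
import urllib.request
from typing import Tuple


def evaluate_response(status: int, body: str, keyword: str) -> Tuple[bool, str]:
    """Decide whether a response counts as 'up': status 200 and keyword present."""
    if status != 200:
        return False, f"unexpected status {status}"
    if keyword not in body:
        return False, f"keyword {keyword!r} missing from body"
    return True, "ok"


def check_url(url: str, keyword: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Fetch `url` with a timeout and apply the status and keyword rules."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return evaluate_response(response.status, body, keyword)
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx/5xx responses instead of returning them.
        return False, f"unexpected status {exc.code}"
    except OSError as exc:
        # Covers DNS failures, refused connections, TLS errors, and timeouts.
        return False, f"request failed: {exc}"
```

With a health endpoint that returns {"status": "ok"} when healthy, a call like check_url("https://api.example.com/health", '"status": "ok"') fails on a 200 response whose body says "degraded", which a status-only check would miss. The domain here is a placeholder.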

Check interval. 1-minute checks for anything user-facing. 3-minute checks (free plan) for staging environments or internal tools.

Monitoring the server itself

Dokploy runs on a server, and that server can have its own problems. If you're on Hetzner (which is where a lot of Dokploy instances run), the Hetzner monitoring guide covers server-level concerns like disk space, OOM kills, and network issues.

For a complete picture, you want both: server-level awareness (is the machine reachable?) and app-level monitoring (is each service working?). A TCP monitor on port 443 tells you the server is accepting connections. HTTP monitors on each app's domain tell you the apps are actually working.
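
A TCP check is the simplest of these. As a sketch, assuming nothing beyond Python's standard library, it amounts to opening a connection and seeing whether it succeeds:

```python
import socket


def tcp_port_open(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable hosts, DNS failures, and timeouts.
        return False
```

Note what this does and doesn't prove: a True result means something on the server (normally Traefik) is accepting connections on that port. It says nothing about whether any individual app behind it is working, which is why the per-domain HTTP checks are still needed.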

Dokploy handles the hard parts of self-hosting. Monitoring is the part you add on top. Larm's free plan includes 15 monitors, which is enough to cover a typical Dokploy server with several apps and still have room left.
