I Replaced 3 Paid Monitoring Tools With a Homelab at $0/Month

I was paying $25/month across three monitoring services for the same thing: knowing when my homelab services go down. Better Uptime ($5), UptimeRobot ($8), and Grafana Cloud ($12 for metrics retention).

Three months ago I replaced all three with a single Docker Compose stack running on the same ThinkCentre it’s monitoring. Since then it’s caught 14 outages, alerted me on all of them, and costs exactly $0 extra (apart from a $3/month failover VPS, covered below).

The Stack

Four containers, 512MB RAM total on idle:

services:
  uptime-kuma:
    image: louislam/uptime-kuma:latest
    ports: ["3001:3001"]
    volumes: ["./uptime-kuma:/app/data"]
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml"]
    restart: unless-stopped

  grafana:
    # Pinned rather than `latest`; see Break #2 below for why.
    image: grafana/grafana:10.4.0
    ports: ["3000:3000"]
    environment:
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
    volumes: ["./grafana:/var/lib/grafana"]
    restart: unless-stopped

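  node-exporter:
    image: prom/node-exporter:latest
    # No `ports` entry: port mappings do not apply under host networking
    # (Docker ignores or rejects them). Port 9100 is bound on the host directly.
    network_mode: host
    restart: unless-stopped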

What Each Piece Does

**Uptime Kuma** — the replacement for Better Uptime and UptimeRobot. It pings 12 endpoints every 60 seconds: blog, ERP staging, Ollama, Open WebUI, PostgreSQL, Redis, CI runner, and five internal services. If something goes down, it sends notifications via Gotify (self-hosted push notifications) and Telegram.

**Prometheus + Node Exporter** — the Grafana Cloud replacement. Node Exporter exposes system metrics (CPU, RAM, disk, network), and Prometheus scrapes them every 15 seconds. Prometheus stores 30 days of data in 8GB of disk space.
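For Prometheus to collect anything from Node Exporter, `prometheus.yml` needs a scrape job pointing at it. A minimal sketch follows, with some assumptions: the job name `node` is arbitrary, and the target address is a documentation placeholder, not my real IP. Because Node Exporter runs with host networking in the Compose file above, it is not reachable by its service name from the Prometheus container, so the target should be the host's own address.

```yaml
# prometheus.yml (sketch, not my exact file)
scrape_configs:
  - job_name: node               # hypothetical job name
    scrape_interval: 15s         # matches the 15-second interval described above
    static_configs:
      # Placeholder address: replace with the homelab host's LAN IP.
      - targets: ["192.0.2.10:9100"]
```

Once Prometheus reloads, the job should show up on the Targets page of the Prometheus UI on port 9090.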

**Grafana** — the dashboard. I rebuilt three dashboards from scratch:

1. **Service Overview** — status of all 12 endpoints, uptime percentages, response times

2. **System Health** — CPU/memory/disk trends, top processes, network I/O

3. **Docker Stats** — per-container resource usage, restart counts, image sizes

What I Gave Up

- **SMS alerts.** Better Uptime’s SMS alerting was great for critical outages. Uptime Kuma only sends SMS through a paid gateway such as Twilio. I use Telegram + Gotify instead, which is free but requires a working internet connection.

- **99.99% uptime SLA.** The monitoring stack runs on the same machine it monitors. If the machine dies, the monitor dies too. I worked around this with a $3/month VPS running a single ping-only check: it just pings the homelab IP and texts me if it’s unreachable.

- **Beautiful default dashboards.** Grafana Cloud’s dashboards look better out of the box. My self-hosted ones are functional, but I spent 4 hours setting them up.

The Numbers

| Service | Monthly Cost | My Cost | Savings |
|---|---|---|---|
| Better Uptime | $5 | $0 | $5 |
| UptimeRobot | $8 | $0 | $8 |
| Grafana Cloud | $12 | $0 | $12 |
| $3 VPS (failover) | $0 | $3 | -$3 |
| **Total** | **$25** | **$3** | **$22/month** |

That’s $264/year saved. The setup took 3 hours. Breakeven was at about 4 months — and I crossed it two months ago.

What Broke (And How I Fixed It)

**Break #1: Database file corruption.** Uptime Kuma uses SQLite. After a power outage, the DB was corrupted and lost 3 days of monitoring history. Fix: add a `sqlite3 .backup` cron job that snapshots the DB every 6 hours.
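One way to get the same effect without touching the host's crontab is a small sidecar container that runs the backup on a loop. This is a sketch of that alternative, not the cron job I actually run. The service name `kuma-backup`, the `./backups` directory, and the Alpine image tag are assumptions; `kuma.db` is the file Uptime Kuma keeps in its data directory when it uses SQLite.

```yaml
services:
  kuma-backup:                     # hypothetical service name
    image: alpine:3.20             # assumed tag; any recent Alpine should work
    volumes:
      - ./uptime-kuma:/data        # same host directory Uptime Kuma writes to
      - ./backups:/backups         # hypothetical destination for snapshots
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        apk add --no-cache sqlite >/dev/null
        while true; do
          # `$$` is Compose's escape for a literal `$`.
          # .backup takes a consistent snapshot even while the DB is in use.
          sqlite3 /data/kuma.db ".backup '/backups/kuma-$$(date +%Y%m%d-%H%M).db'"
          sleep 21600              # 6 hours
        done
    restart: unless-stopped
```

Old snapshots are never pruned here, so the backup directory needs its own cleanup.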

**Break #2: Grafana forgot all my dashboards.** I updated Grafana from v10 to v11 and the dashboard JSON format changed. Dashboards stayed in the SQLite DB but Grafana couldn’t parse them. Fix: always pin the Grafana version (`image: grafana/grafana:10.4.0`) instead of using `latest`.

**Break #3: Prometheus disk filled up.** After 60 days, Prometheus had consumed 22GB. The `retention` setting was defaulting to 15 days when I thought it was 30. Fix: explicit config:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

storage:
  tsdb:
    retention:
      time: 30d
      size: 10GB
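Retention has long been set through Prometheus command-line flags, so if your Prometheus version doesn't pick up the retention block from the config file, the same limits can be passed in the Compose file instead. This is a hedged sketch: the two retention flags are real Prometheus flags, while the rest mirrors the service definition shown earlier.

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      # Overriding `command` replaces the image's default arguments,
      # so the config file and data path are repeated here.
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=30d
      - --storage.tsdb.retention.size=10GB
    ports: ["9090:9090"]
    volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml"]
    restart: unless-stopped
```

When both limits are set, whichever is hit first triggers cleanup, so the size cap acts as a safety net if the time estimate turns out to be wrong.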

The Verdict

Self-hosting monitoring saves money and teaches you way more about your infrastructure than paying for it ever did. I know exactly how much RAM Ollama uses at idle (2.4GB), how long PostgreSQL recovery takes (37 seconds), and which container restarts most often (WordPress, every 3-4 days due to PHP-FPM memory leaks).

But the biggest win: when something goes down at 2 AM, the alert goes to my phone via Telegram in under 60 seconds. That’s the same SLA I was paying $25/month for — with zero vendor lock-in and a lot more visibility.


*The Prometheus config and full Compose file are in my homelab repo if you want to replicate the stack. What monitoring tools are you paying for that you could self-host?*

