March 3, 2026 · 4 min · Carol
High uptime is not the same as availability
Your monitoring can report 99.99% while your customer is furious. The difference lies in what you monitor, how often, and what you count as "up".
"The site is up 99.99% of the time" — who hasn't seen that line in a sales pitch? The number is pretty, it's reassuring, it becomes a footer badge. But it hides more than it shows.
Uptime is a measurement, not a guarantee. And like any measurement, it depends on how you define the boundaries.
What "up" means to your monitor
Most monitoring tools check a single thing: a GET on the home page. If it responded 200 OK within the timeout, it's "up". If it didn't respond, it's "down".
This catches:
- ✅ Server completely down
- ✅ Broken DNS
- ✅ Expired SSL certificate
- ✅ Full timeout
But it doesn't catch:
- ❌ Home page OK, checkout broken
- ❌ 200 OK returning a generic error HTML ("Service Unavailable")
- ❌ Degraded response time (slow but up)
- ❌ API working, admin panel down
- ❌ SSL cert expires in 5 days (still valid = "up")
Result: you close the month with 99.99% on the dashboard and a queue of complaint tickets.
The three levels of monitoring
1. Basic availability (HTTP)
GET / and look at the status code. It's the minimum. It detects most catastrophic problems but misses the silent ones.
Use for: a simple corporate site, a landing page.
2. Keyword check
GET / and look for a keyword in the body. If the word "Welcome" disappeared, something is wrong — even if the status is 200.
Use for: detecting a generic error page serving 200, defacement, an unscheduled banner.
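The difference between levels 1 and 2 can be sketched as a small pure function. This is only an illustration, not how any particular monitoring tool works: `evaluate_check` is a hypothetical name, and it assumes the response has already been fetched and that anything other than a 200 counts as down.

```python
from typing import Optional


def evaluate_check(status: int, body: str, keyword: Optional[str] = None) -> str:
    """Classify one check result as "up" or "down".

    Level 1 (basic): only the status code is inspected.
    Level 2 (keyword): the body must also contain `keyword`.
    """
    if status != 200:
        return "down"
    if keyword is not None and keyword not in body:
        # 200 OK, but the expected content is missing (e.g. a generic error page).
        return "down"
    return "up"


# A generic error page that is (wrongly) served with status 200.
error_page = "<html><body>Service Unavailable</body></html>"

print(evaluate_check(200, error_page))                     # basic check
print(evaluate_check(200, error_page, keyword="Welcome"))  # keyword check
```

The basic check reports "up" for the error page, while the keyword check reports "down" for the same response.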
3. Multi-endpoint
Monitor separately: home, login, payment API, admin dashboard, webhook. Each has its own alert. The status page shows it granularly.
Use for: SaaS, e-commerce, anything where different parts can fail independently.
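A rough sketch of per-endpoint reporting, assuming each endpoint has its own expected keyword. The endpoint names, paths, and keywords below are made up for illustration, and the responses are simulated rather than fetched over the network.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    path: str     # hypothetical path, for illustration only
    keyword: str  # text expected in a healthy response body


ENDPOINTS = [
    Endpoint("home", "/", "Welcome"),
    Endpoint("login", "/login", "Sign in"),
    Endpoint("payment-api", "/api/payments/health", "ok"),
]


def status_by_endpoint(responses: dict[str, tuple[int, str]]) -> dict[str, str]:
    """Map each endpoint name to "up" or "down" from (status, body) pairs."""
    report = {}
    for ep in ENDPOINTS:
        # A missing response (timeout, connection error) counts as down.
        status, body = responses.get(ep.name, (0, ""))
        healthy = status == 200 and ep.keyword in body
        report[ep.name] = "up" if healthy else "down"
    return report


# Simulated responses: the home page is fine, the payment API is failing.
responses = {
    "home": (200, "<h1>Welcome</h1>"),
    "login": (200, "<form>Sign in</form>"),
    "payment-api": (500, "internal error"),
}
report = status_by_endpoint(responses)
print(report)
```

A monitor that only checked the home page would report this situation as fully up. The per-endpoint report marks `payment-api` as down, so it can raise its own alert.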
Frequency: 1 minute vs 5 minutes
UptimeRobot's free plan checks every 5 minutes. Sentinela and most paid plans check every 1 minute. The difference in practice:
- Fails at 2:03:30 PM, 5-min interval → detected at 2:05 → alert around 2:06
- Fails at 2:03:30 PM, 1-min interval → detected at 2:04 → alert around 2:05
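Seems small: in this example the gap is one minute. The worst case is wider, though. A failure that starts right after a check goes unnoticed for almost 5 minutes on the slower schedule, versus almost 1 minute on the faster one. If you run e-commerce at peak time, every undetected minute is a few lost orders. Multiplied over a month, it becomes a real difference.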
The trade-off is cost: 1/min = 5x more checks = more load on your server (small) and more load on the monitor (you pay for it in the plan).
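The detection times above can be reproduced with a few lines of arithmetic. This sketch assumes checks fire on exact multiples of the interval counted from midnight, which real monitors do not guarantee. The function names are illustrative.

```python
def next_check_after(failure_s: int, interval_s: int) -> int:
    """Return the time (seconds since midnight) of the first check at or after a failure.

    Assumes checks fire on exact multiples of the interval, starting at midnight.
    """
    # Ceiling division with integers, to avoid floating-point rounding.
    return -(-failure_s // interval_s) * interval_s


def hhmmss(seconds: int) -> str:
    """Format seconds since midnight as HH:MM:SS."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


failure = 14 * 3600 + 3 * 60 + 30  # 14:03:30, i.e. 2:03:30 PM

for interval in (300, 60):
    detected = next_check_after(failure, interval)
    print(
        f"{interval:>3}s interval: detected at {hhmmss(detected)}, "
        f"blind for {detected - failure}s, worst case just under {interval}s"
    )
```

For the 2:03:30 PM failure, the 5-minute schedule detects at 14:05:00 (90 seconds blind) and the 1-minute schedule at 14:04:00 (30 seconds blind). The worst-case blind period is just under the interval itself.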
"We're up but we're slow"
Degraded latency is the silent killer. The site responds 200 OK, but it took 8 seconds. The user gave up before the response arrived. To the monitor, everything was OK.
How to catch it: monitor not only the status, but the response time. The metrics that matter:
- p50 (median): half the requests are faster than this
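- p95: 95% of requests are faster than this; the slowest 5% are not
- p99: 99% of requests are faster than this; only the slowest 1% are not
The typical case deceives. If p50 = 100ms but p99 = 8s, one request in a hundred takes 8 seconds or more, and neither the median nor the mean alone will show it.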
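To make the percentiles concrete, here is a small sketch using the nearest-rank definition (one of several common percentile definitions, so monitoring tools may report slightly different values). The latency samples are made up to match the p50 = 100ms, p99 = 8s scenario.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 97 fast requests, 3 very slow ones.
latencies_ms = [100.0] * 97 + [8000.0] * 3

mean_ms = sum(latencies_ms) / len(latencies_ms)
print("mean:", mean_ms)
print("p50: ", percentile(latencies_ms, 50))
print("p95: ", percentile(latencies_ms, 95))
print("p99: ", percentile(latencies_ms, 99))
```

With these samples the mean is 337 ms and both p50 and p95 are 100 ms. Only p99 surfaces the 8-second requests, which is why looking at a single summary number is risky.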
Maintenance windows: honest reporting
99.99% is honest if you count it right. If you take the API down every Wednesday at 3 AM for maintenance, that counts as downtime — unless you declared the window in advance.
Good monitoring lets you:
- Create a scheduled maintenance window
- Suppress alerts during the window
- Not count the window time against uptime%
- Show a "scheduled maintenance" banner on the status page
Without that, either you lie in the numbers or you pay in false alarms.
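A quick worked calculation shows how much declared windows matter. The month below is hypothetical, and the sketch assumes one particular convention: declared maintenance is removed from the measured period entirely. Other tools may count the window as "up" instead.

```python
def uptime_percent(total_min: float, down_min: float, excluded_min: float = 0.0) -> float:
    """Uptime over a period, with declared maintenance excluded from the measured period."""
    measured = total_min - excluded_min
    return 100.0 * (measured - down_min) / measured


MONTH_MIN = 30 * 24 * 60  # 43,200 minutes in a 30-day month

# Downtime budget implied by 99.99% over 30 days.
budget_min = MONTH_MIN * (1 - 0.9999)

# Hypothetical month: four 30-minute maintenance windows plus 4 minutes of unplanned downtime.
maintenance_min = 4 * 30
unplanned_min = 4

declared = uptime_percent(MONTH_MIN, unplanned_min, excluded_min=maintenance_min)
undeclared = uptime_percent(MONTH_MIN, unplanned_min + maintenance_min)

print(f"99.99% budget:      {budget_min:.2f} min/month")
print(f"windows declared:   {declared:.3f}%")
print(f"windows undeclared: {undeclared:.3f}%")
```

A 99.99% target leaves about 4.32 minutes of downtime per 30-day month. In this example the same month comes out at roughly 99.991% with the windows declared and roughly 99.713% without them.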
The right question isn't "what's my uptime"
It's: "when something broke in the last 90 days, did I know before the customer?"
An honest metric:
- How many incidents were opened
- Mean time to detection (from breakage → first alert)
- Mean time to resolution (from detection → resolved)
- In how many cases the customer reported the problem before the monitor did
That last one is the real test. If you had 3 incidents this month and in 2 of them the customer complained first, your monitoring isn't doing the job — regardless of the aggregate number.
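These four numbers are easy to compute once incidents are recorded with timestamps. The sketch below is one possible shape for that record. The `Incident` fields and the sample data are assumptions for illustration, not the format of any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    broke_at: datetime      # when the failure started
    alerted_at: datetime    # first alert from the monitor
    resolved_at: datetime   # when the incident was resolved
    customer_reported_at: Optional[datetime] = None  # first complaint, if any


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def summarize(incidents: list[Incident]) -> dict:
    customer_first = sum(
        1
        for i in incidents
        if i.customer_reported_at is not None and i.customer_reported_at < i.alerted_at
    )
    return {
        "incidents": len(incidents),
        "mttd": mean_delta([i.alerted_at - i.broke_at for i in incidents]),
        "mttr": mean_delta([i.resolved_at - i.alerted_at for i in incidents]),
        "customer_first": customer_first,
    }


def at(minutes: int) -> datetime:
    """Helper: a timestamp `minutes` after an arbitrary reference time."""
    return datetime(2026, 2, 10, 14, 0) + timedelta(minutes=minutes)


# Made-up data: three incidents, two of them reported by a customer first.
incidents = [
    Incident(at(0), at(2), at(32)),
    Incident(at(100), at(110), at(140), customer_reported_at=at(105)),
    Incident(at(200), at(206), at(266), customer_reported_at=at(203)),
]
summary = summarize(incidents)
print(summary)
```

For this sample data the mean time to detection is 6 minutes, the mean time to resolution is 40 minutes, and the customer was first in 2 of 3 incidents.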
Practical conclusion
Before celebrating 99.99%:
- Monitor more than the home page: add checks for the critical flows (checkout, login, main API)
- Use a keyword check when the app serves 200 in degraded mode
- Look at p95 and p99 — not just status — to catch degradation
- Declare maintenance windows for honest uptime
- Track MTTD (time to detection) — not just uptime
Uptime is an easy number to publish. Real availability is harder to measure — and more important to deliver.