Observability
Observability is the property of a system that lets engineers understand its internal state from external outputs — answering questions about how the system is behaving without modifying it. Modern observability is built on three pillars: structured logs (what happened), metrics (how much, how often), and distributed traces (request paths across services).
The distinction from monitoring is real but subtle: monitoring tells you that a known problem is occurring (a pre-built dashboard, an alert on a defined threshold), whereas observability lets you answer questions you did not anticipate ahead of time. Monitoring is for known unknowns; observability is for unknown unknowns. The shift matters most in distributed systems, where the failure modes don't all fit on a pre-built dashboard: answering 'why is checkout slow for some users in Brazil?' requires slicing traces by region, endpoint, customer, and request type on the fly. The OpenTelemetry project (CNCF) is becoming the industry-standard instrumentation layer.
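To make the idea of slicing traces on the fly concrete, here is a minimal sketch in plain Python. The `Span` class, the `slice_by` helper, the attribute names (`region`, `endpoint`), and the sample durations are all hypothetical and chosen for illustration; a real system would run this kind of query against a tracing backend rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical, simplified span: a name, a duration, and free-form attributes."""
    name: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)


def slice_by(spans, key, **filters):
    """Group span durations by attribute `key`, keeping only spans whose
    attributes match every key/value pair in `filters`."""
    groups = defaultdict(list)
    for span in spans:
        if all(span.attributes.get(k) == v for k, v in filters.items()):
            groups[span.attributes.get(key, "unknown")].append(span.duration_ms)
    return dict(groups)


# Invented sample data, not measurements from any real service.
spans = [
    Span("POST /checkout", 120.0, {"endpoint": "/checkout", "region": "BR"}),
    Span("POST /checkout", 2400.0, {"endpoint": "/checkout", "region": "BR"}),
    Span("POST /checkout", 110.0, {"endpoint": "/checkout", "region": "US"}),
    Span("POST /checkout", 130.0, {"endpoint": "/checkout", "region": "US"}),
    Span("GET /cart", 40.0, {"endpoint": "/cart", "region": "BR"}),
]

# Ad hoc question: how do checkout latencies differ by region?
by_region = slice_by(spans, "region", endpoint="/checkout")
slowest = {region: max(durations) for region, durations in by_region.items()}
print(slowest)
```

The point of the sketch is that neither the grouping key nor the filters are fixed in advance: the same spans can be re-sliced by customer or request type without new instrumentation, provided those attributes were recorded on the spans in the first place.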
Related terms
- Distributed tracing
Distributed tracing records the path of a single request as it traverses multiple services, producing a tree-like view of every span — a unit of work in a single service — with timings, parent-child relationships, and metadata.
- OpenTelemetry
OpenTelemetry (OTel) is the CNCF observability standard for instrumenting software to emit traces, metrics, and logs in a vendor-neutral format.
- SLO
A Service-Level Objective (SLO) is a reliability target for a service, typically expressed as a percentage of successful requests (or other good events) over a time window, for example 99.9% over 30 days.
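As a worked illustration of an SLO expressed as a percentage over a time window, the following sketch checks measured availability against a target and reports how much of the implied error budget has been used. The function name `slo_report`, its parameters, and the request counts are assumptions made for this example, not part of any standard API.

```python
def slo_report(slo_target, total_requests, failed_requests):
    """Compare measured availability in one time window against an SLO target.

    slo_target is a fraction strictly between 0 and 1 (e.g. 0.999 for 99.9%).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    availability = 1 - failed_requests / total_requests
    # The error budget is the number of failures the target tolerates.
    allowed_failures = (1 - slo_target) * total_requests
    return {
        "availability": availability,
        "allowed_failures": allowed_failures,
        "budget_consumed": failed_requests / allowed_failures,
        "slo_met": availability >= slo_target,
    }


# Invented numbers: a 99.9% target, one million requests, 400 failures.
report = slo_report(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(f"availability:    {report['availability']:.4%}")
print(f"budget consumed: {report['budget_consumed']:.0%}")
print(f"SLO met:         {report['slo_met']}")
```

With these numbers the target tolerates roughly 1,000 failed requests in the window, so 400 failures use about 40% of the budget and the objective is still met.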