Health

Beyond “Up” or “Down”: Understanding Application Health

Is your application healthy? This seems like a simple question, but the answer can be complex. Is a slow API healthy? Is an API that’s logging a high rate of errors healthy? To truly understand the state of a modern application, you need more than a simple ping; you need observability.

Observability is the ability to answer questions about your system’s internal state by examining the data it emits. SliceFlow is built around this principle and ships with a pre-configured stack that gives you deep insight into your application’s behavior. This stack rests on three pillars:

Logs: What happened? A detailed, event-by-event record of everything your application does. (Covered in detail in the Logging documentation).
Metrics: What is the overall status? High-level, aggregated data about your application’s performance, such as request rates, error percentages, CPU usage, and memory consumption.
Traces: What is the story of a single request? A detailed, step-by-step visualization of a request’s journey as it travels through your API and interacts with databases, caches, and other services.
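Metrics are aggregates rather than individual events. As a rough illustration, assume two hypothetical counters named requests_total and errors_total that only ever increase. A request rate and an error percentage can then be derived from two readings taken some time apart. This Python sketch is similar in spirit to what Prometheus’s rate() function computes, without its handling of counter resets and extrapolation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One reading of two hypothetical, monotonically increasing counters."""
    timestamp_s: float    # when the sample was taken, in seconds
    requests_total: int   # requests handled since the process started
    errors_total: int     # of those, requests that failed (e.g. 5xx responses)


def rate_and_error_pct(earlier: CounterSample, later: CounterSample) -> tuple[float, float]:
    """Return (requests per second, error percentage) for the window between two samples.

    Simplification: assumes neither counter was reset (for example by a restart) in between.
    """
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("later sample must be taken after the earlier one")
    new_requests = later.requests_total - earlier.requests_total
    new_errors = later.errors_total - earlier.errors_total
    rate = new_requests / elapsed
    # Avoid dividing by zero when no requests arrived in the window.
    error_pct = 100.0 * new_errors / new_requests if new_requests else 0.0
    return rate, error_pct
```

For example, two samples taken 60 seconds apart that show 1,000 → 1,600 requests and 10 → 22 errors give 10.0 requests per second and a 2.0% error rate. No individual request is visible in that answer, which is exactly why logs and traces complement it.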

The SliceFlow Observability Stack: A Complete Toolkit

Your SliceFlow development environment, powered by Docker, comes with a complete, integrated observability suite to cover these three pillars. You don’t need to set anything up; it’s all wired together and ready to go.

Prometheus: The time-series database. Your SliceFlow application exposes a /metrics endpoint, and the prometheus container continuously “scrapes” this endpoint to collect and store all your key performance indicators.
Jaeger: The distributed tracing system. Your application sends detailed trace data for every request to the jaeger container, which allows you to visualize the entire lifecycle of a request.
Loki: The log aggregation engine. All your application logs are sent to the loki container, making them centrally searchable.
Grafana: The unified dashboard. This is your single pane of glass. The grafana container is pre-configured with data sources for Prometheus, Jaeger, and Loki, allowing you to build dashboards that correlate metrics, traces, and logs all in one place. You can access it at http://localhost:3001.
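Scraping is simply an HTTP GET of the /metrics endpoint, which returns a plain-text exposition format with one sample per line. The Python sketch below shows roughly what Prometheus reads from that response. It is deliberately simplified: it ignores HELP/TYPE metadata and timestamps and does not unescape label values. The metric names in the sample payload are illustrative, not necessarily the names SliceFlow exports.

```python
from __future__ import annotations

import re

# Matches one label pair such as method="GET"; allows backslash-escaped characters in the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or HELP/TYPE metadata and comments
        if "{" in line:
            name, _, remainder = line.partition("{")
            label_text, _, rest = remainder.rpartition("}")
            labels = dict(LABEL_RE.findall(label_text))
        else:
            name, _, rest = line.partition(" ")
            labels = {}
        value = float(rest.split()[0])  # float() also accepts NaN, +Inf and -Inf
        samples.append((name.strip(), labels, value))
    return samples


# Illustrative payload in the shape a /metrics endpoint returns.
SAMPLE = """\
# HELP http_requests_total Total HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
http_requests_total{method="POST",code="500"} 3
process_cpu_seconds_total 12.5
"""
```

Prometheus stores each parsed sample as a point in a time series identified by its name plus labels, so the two http_requests_total lines above become two separate series.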

OpenTelemetry: The Engine Under the Hood

The magic that makes this all possible is OpenTelemetry (OTel). OTel is a vendor-neutral, open-source standard for instrumenting applications to generate telemetry data (metrics, logs, and traces).

SliceFlow comes with a rich OpenTelemetry configuration (ConfigureHealth in Program.cs) that automatically instruments the most important parts of your application:

ASP.NET Core: Captures metrics and traces for every incoming request.
HttpClient: Traces outgoing HTTP calls to other services.
Entity Framework Core: Measures and traces database queries.

This means you get a wealth of information about your application’s performance without writing any instrumentation code yourself. The application knows where to export this telemetry from a single setting in appsettings.json:

{
  "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317"
}

This tells your application to export its trace data over OTLP to the Jaeger container, which listens on port 4317, the standard port for OTLP over gRPC.
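The endpoint can also be overridden without editing the file. The Python sketch below models the lookup under two stated assumptions. The first is ASP.NET Core’s default configuration order, in which environment variables override appsettings.json. The second is the OpenTelemetry exporter conventions: OTLP over gRPC uses the endpoint as given, while OTLP over HTTP appends /v1/traces and defaults to port 4318. The resolve_traces_endpoint helper is hypothetical and is not how the .NET exporter is implemented.

```python
from __future__ import annotations

import json

# OpenTelemetry's default OTLP endpoints when nothing is configured.
DEFAULT_ENDPOINTS = {"grpc": "http://localhost:4317", "http/protobuf": "http://localhost:4318"}


def resolve_traces_endpoint(appsettings_text: str, env: dict[str, str], protocol: str = "grpc") -> str:
    """Hypothetical helper: where would trace data be exported?

    Assumes ASP.NET Core-style precedence (environment variable over appsettings.json)
    and plain JSON without comments.
    """
    settings = json.loads(appsettings_text)
    base = (
        env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
        or settings.get("OTEL_EXPORTER_OTLP_ENDPOINT")
        or DEFAULT_ENDPOINTS[protocol]
    )
    if protocol == "http/protobuf":
        # OTLP/HTTP appends the per-signal path to the base endpoint.
        return base.rstrip("/") + "/v1/traces"
    return base  # OTLP/gRPC uses the endpoint exactly as given
```

For example, setting OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317 in the environment would take precedence over the file. The jaeger hostname there is hypothetical and would only resolve if the application itself ran inside the Compose network.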

Health Checks: The Simple Handshake

While observability gives you deep insights, sometimes you just need a direct answer to the question “Are you okay?” That is the job of health check endpoints: lightweight endpoints that external tools (such as Docker’s health checker or a Kubernetes probe) can call to determine the status of your service.

SliceFlow configures two standard health check endpoints for you in development:

/health (Readiness): This endpoint checks the application itself and its critical dependencies. A “Healthy” status here means “I am running and I am ready to accept traffic.”
/alive (Liveness): This is a simpler check that only confirms the application process itself is running and responsive. A “Healthy” status means “I am alive.”
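The difference between the two endpoints is which checks each one runs. ASP.NET Core supports this by letting each mapped endpoint filter the registered checks through HealthCheckOptions.Predicate, commonly by tag. The Python model below assumes that arrangement: /health runs every check, and /alive runs only checks tagged live. The tag and check names are hypothetical. The worst-result aggregation and the status-code mapping (Healthy and Degraded return 200, Unhealthy returns 503) mirror ASP.NET Core’s defaults.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Status(IntEnum):
    # Ordered from best to worst so max() yields the overall (worst) status.
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2


# Mirrors ASP.NET Core's default HealthCheckOptions.ResultStatusCodes.
HTTP_CODES = {Status.HEALTHY: 200, Status.DEGRADED: 200, Status.UNHEALTHY: 503}


@dataclass(frozen=True)
class Check:
    name: str
    run: Callable[[], Status]
    tags: frozenset[str] = frozenset()


def evaluate(checks: list[Check], predicate: Callable[[Check], bool] = lambda c: True) -> tuple[Status, int]:
    """Run the checks the predicate selects; report the worst status and its HTTP code."""
    selected = [c for c in checks if predicate(c)]
    overall = max((c.run() for c in selected), default=Status.HEALTHY)
    return overall, HTTP_CODES[overall]


# Hypothetical registrations: a trivial process check tagged "live" and a database check.
checks = [
    Check("self", lambda: Status.HEALTHY, frozenset({"live"})),
    Check("database", lambda: Status.UNHEALTHY),  # simulate the database being down
]

readiness = evaluate(checks)                             # /health: every check runs
liveness = evaluate(checks, lambda c: "live" in c.tags)  # /alive: only "live" checks run
```

With the database down, the service reports not ready (503 on /health) but still alive (200 on /alive). That split lets an orchestrator such as Kubernetes stop routing traffic to the instance instead of restarting it, which would not fix a database outage anyway.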

You can see this pattern in the docker-compose.yml file: the postgres service defines a healthcheck, and the zitadel service depends on postgres being healthy, so Docker Compose does not start zitadel until postgres reports healthy. This is a crucial pattern for ensuring a stable startup order in a complex system.
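Compose does this waiting for you, but the same pattern is useful in your own scripts, for example a test harness that should not start until the API reports ready. This Python sketch is a simplified model of a healthcheck’s interval and retries settings, not a reimplementation of Compose (it ignores start_period, for instance). The http_probe helper treats any 2xx response as healthy, which under ASP.NET Core’s defaults includes Degraded.

```python
from __future__ import annotations

import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if url answers with a 2xx status (e.g. a Healthy or Degraded /health)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and HTTP errors such as 503 (urllib raises
        # HTTPError, an OSError subclass) all count as "not healthy yet".
        return False


def wait_until_healthy(probe: Callable[[], bool], interval_s: float, retries: int,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe until it succeeds, giving up after `retries` failed attempts."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(interval_s)  # wait between attempts, but not after the last one
    return False
```

Against a running API you might call wait_until_healthy(lambda: http_probe("http://localhost:<api-port>/health"), interval_s=2.0, retries=30), substituting the port your API actually listens on. The sleep parameter exists so the loop can be exercised without real waiting.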

By combining detailed observability with simple, actionable health checks, SliceFlow gives you a complete, production-grade solution for monitoring and understanding your application, from local development all the way to a production deployment.