
Health

Probes are essential for maintaining the health and reliability of your containers. This guide covers the three probe types: readinessProbe, livenessProbe, and startupProbe.

By appropriately configuring readinessProbe, livenessProbe, and startupProbe in your Kubernetes manifests, you can enhance the reliability and performance of your applications.

For Dossier Organizer, default settings for all three probes are already provided in the values packaged with the Helm chart.

Configuration

To enable the health and metrics endpoints, use the following settings in the Helm values:

management:
  endpoints:
    web.exposure.include: health, prometheus, metrics
  prometheus:
    metrics:
      export:
        enabled: true
  jmx:
    metrics:
      export:
        enabled: true

This example exposes the health, prometheus, and metrics endpoints over HTTP and turns on Prometheus and JMX metrics export.
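
To verify that the endpoints are reachable after deployment, you can port-forward to a running pod and query them with curl. This is a minimal sketch; the pod name and the container port 8080 are placeholders and may differ in your deployment:

kubectl port-forward pod/<pod-name> 8080:8080
# in a second terminal:
curl -s http://localhost:8080/actuator/metrics
curl -s http://localhost:8080/actuator/prometheus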

ReadinessProbe

The readinessProbe is responsible for determining if a container is ready to serve traffic. Kubernetes uses this probe to decide whether a pod should receive requests or not.

The readinessProbe configuration in the Kubernetes YAML file might look like this:

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: http
  initialDelaySeconds: 5 # time to wait before the first probe
  periodSeconds: 15      # time to wait between probes
  timeoutSeconds: 15     # time to wait for a probe to complete
  successThreshold: 1    # number of successful probes before the pod is considered ready
  failureThreshold: 10   # number of failed probes before the pod is considered not ready

In the default configuration provided, Kubernetes checks the /actuator/health/readiness endpoint to determine if the container is ready. With the values above, ten consecutive failures (around 150 seconds) mark the pod as not ready and take it out of the Service endpoints until the probe succeeds again.
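
When the readiness probe fails, the pod keeps running but stops receiving traffic. A quick way to observe this, assuming an app label and Service named dossier-organizer (adjust both to your release):

kubectl get pods -l app=dossier-organizer   # READY shows 0/1 while the probe is failing
kubectl get endpoints dossier-organizer     # the pod IP is removed from the Service endpoints
kubectl describe pod <pod-name>             # shows the configured readiness probe and recent failures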

LivenessProbe

The livenessProbe checks whether the container is alive or needs to be restarted. If the probe fails, Kubernetes restarts the container.

The livenessProbe configuration in the Kubernetes YAML file might look like this:

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: http
  initialDelaySeconds: 5 # time to wait before the first probe
  periodSeconds: 5       # time to wait between probes
  timeoutSeconds: 5      # time to wait for a probe to complete
  successThreshold: 1    # number of successful probes before the container is considered alive
  failureThreshold: 20   # number of failed probes before the container is restarted

This ensures that if the application stops responding at the /actuator/health/liveness endpoint, Kubernetes will restart the container. With the values above, twenty consecutive failures (around 100 seconds) trigger the restart.
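
Liveness-driven restarts are visible in the pod's restart count and events. A quick check, with the pod name as a placeholder:

kubectl get pods                      # the RESTARTS column increases after each liveness-triggered restart
kubectl describe pod <pod-name>       # the events list the failed liveness probes and the resulting restart
kubectl logs <pod-name> --previous    # logs of the container instance that was restarted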

StartupProbe

The startupProbe checks whether the application has started at all. The livenessProbe and readinessProbe are only performed once the startupProbe has succeeded. If the startupProbe fails, the container is restarted.

startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: http
  initialDelaySeconds: 5 # time to wait before the first probe
  periodSeconds: 3       # time to wait between probes
  timeoutSeconds: 5      # time to wait for a probe to complete
  successThreshold: 1    # number of successful probes before the container is considered started
  failureThreshold: 100  # number of failed probes before the container is restarted

This ensures Kubernetes waits for a successful response from the /actuator/health/liveness endpoint before considering the container as started. With periodSeconds: 3 and failureThreshold: 100, the application has roughly 300 seconds to start before the container is restarted.

Troubleshooting health issues

When monitoring the health of microservices, especially in a pod-based architecture, it's important to understand how the health of individual components impacts the overall health status of the pod. The http://localhost:8080/actuator/health endpoint provides a JSON output that details the status of each component.

Here’s an example of such output with a failed component:

{
  "status": "DOWN",
  "components": {
    "db": {
      "status": "DOWN",
      "details": {
        "error": "org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection"
      }
    },
    "diskSpace": {
      "status": "UP",
      ...
    },
    ...
  },
  "groups": ["liveness", "readiness"]
}
  • Overall Health Status: The status field at the top level indicates the overall health of the pod. If any critical component is down, this status will be DOWN.
  • Component-Level Status: Each component has its own status. This helps in identifying which specific component is causing issues.
  • Error Details: When a component is down, additional details are often provided under details. For example, in the db component, the error Failed to obtain JDBC Connection clearly indicates a database connection issue.
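
One way to obtain this output from a running pod is to call the endpoint from inside the container, or through a port-forward if curl is not available in the image. Both the pod name and port 8080 are placeholders:

kubectl exec <pod-name> -- curl -s http://localhost:8080/actuator/health
# or, without curl in the image:
kubectl port-forward <pod-name> 8080:8080
curl -s http://localhost:8080/actuator/health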

Configuring ingress to restrict public access to the actuator endpoint

Because the actuator endpoints can expose sensitive operational information, it's crucial to restrict public access to them. The following example shows how to block access to the /actuator endpoint via the ingress:

ingress:
  enabled: true
  className: "nginx"
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      server_tokens off;
      location /actuator {
        deny all;
        return 403;
      }
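
To verify the restriction, compare an external request with one made inside the cluster; the hostname below is a placeholder for your ingress host:

# from outside the cluster, the actuator path must be blocked:
curl -i https://dossier.example.com/actuator/health    # expect HTTP 403
# from inside the cluster (e.g. via port-forward), the endpoint still responds:
curl -s http://localhost:8080/actuator/health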