Health
Probes let Kubernetes track the health of your containers and react when something goes wrong. This guide covers the three probe types: readinessProbe, livenessProbe, and startupProbe.
Configuring these probes appropriately in your Kubernetes manifests improves the reliability of your applications.
For Output Organizer, default settings are already provided in the Helm values packaged with the Helm chart.
Configuration
To expose the health and metrics endpoints, use the following settings in the Helm values:
management:
  endpoints:
    web.exposure.include: health, prometheus, metrics
  prometheus:
    metrics:
      export:
        enabled: true
  jmx:
    metrics:
      export:
        enabled: true
This example exposes the health, prometheus, and metrics endpoints over HTTP and enables metrics export for Prometheus and JMX.
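The probe paths used later in this guide, /actuator/health/liveness and /actuator/health/readiness, look like Spring Boot Actuator health groups. Assuming the service is a Spring Boot application (an assumption based on those paths, not something stated by the chart), the groups can be switched on explicitly with the probes.enabled property. Spring Boot normally enables them on its own when it detects a Kubernetes environment, so this may already be in effect. A minimal sketch, at the same management level as the settings above:

```yaml
# Sketch only: assumes a Spring Boot Actuator application.
management:
  endpoint:
    health:
      probes:
        enabled: true   # exposes /actuator/health/liveness and /actuator/health/readiness
```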
ReadinessProbe
The readinessProbe is responsible for determining if a container is ready to serve traffic. Kubernetes uses this probe to decide whether a pod should receive requests or not.
The readinessProbe configuration in the Kubernetes YAML file might look like this:
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: http
  initialDelaySeconds: 5 # time to wait before first probe
  periodSeconds: 15 # time to wait between probes
  timeoutSeconds: 15 # time to wait for a probe to complete
  successThreshold: 1 # number of successful probes before pod is considered healthy
  failureThreshold: 10 # number of failed probes before pod is considered unhealthy
With these default settings, Kubernetes queries the /actuator/health/readiness endpoint every 15 seconds. After 10 consecutive failures the pod is marked as not ready and stops receiving traffic until a probe succeeds again.
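The value port: http refers to a container port by name rather than by number, so the container must declare a port with that name. The fragment below is a sketch of how that might look in a pod template. The container name, image, and port number 8080 are placeholders, not values taken from the chart.

```yaml
# Hypothetical container spec: name, image and containerPort are placeholders.
containers:
  - name: example-service
    image: example/service:1.0.0
    ports:
      - name: http            # the name that "port: http" in the probe resolves to
        containerPort: 8080
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: http
```

Referring to the port by name means the port number can change in one place without touching the probe definitions.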
LivenessProbe
The livenessProbe checks whether the container is alive or needs to be restarted. If the probe fails, Kubernetes restarts the container.
The livenessProbe configuration in the Kubernetes YAML file might look like this:
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: http
  initialDelaySeconds: 5 # time to wait before first probe
  periodSeconds: 5 # time to wait between probes
  timeoutSeconds: 5 # time to wait for a probe to complete
  successThreshold: 1 # number of successful probes before pod is considered healthy
  failureThreshold: 20 # number of failed probes before pod is considered unhealthy
With these settings, if the application stops responding at the /actuator/health/liveness endpoint for 20 consecutive probes (about 100 seconds at a 5-second period), Kubernetes restarts the container.
StartupProbe
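The startupProbe checks whether the application inside the container has finished starting. Kubernetes holds back the livenessProbe and readinessProbe until the startupProbe has succeeded. If the startupProbe does not succeed within its failure threshold, the container is restarted.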
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: http
  initialDelaySeconds: 5 # time to wait before first probe
  periodSeconds: 3 # time to wait between probes
  timeoutSeconds: 5 # time to wait for a probe to complete
  successThreshold: 1 # number of successful probes before pod is considered healthy
  failureThreshold: 100 # number of failed probes before pod is considered unhealthy
This makes Kubernetes wait for a successful response from the /actuator/health/liveness endpoint before it considers the container started.
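The startup window these values allow can be worked out from the probe settings: the first probe runs after initialDelaySeconds, and the container is restarted only after failureThreshold consecutive failures spaced periodSeconds apart. With the values shown above that is roughly 5 + 100 × 3 = 305 seconds. As a sketch, a service that needs about ten minutes to start could raise failureThreshold. The value 200 below is illustrative, not a recommendation from the chart.

```yaml
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: http
  initialDelaySeconds: 5
  periodSeconds: 3
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 200   # illustrative: roughly 5 + 200 * 3 = 605 seconds of startup budget
```

Raising failureThreshold on the startupProbe extends only the startup window. Once the application has started, the livenessProbe settings decide how quickly a hung container is restarted.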
Troubleshooting health issues
When troubleshooting a service, it helps to understand how the health of individual components determines the overall health status reported by the pod. The http://localhost:8080/actuator/health endpoint returns JSON that lists the status of each component.
Here’s an example of such output with a failed component:
{
  "status": "DOWN",
  "components": {
    "db": {
      "status": "DOWN",
      "details": {
        "error": "org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection"
      }
    },
    "diskSpace": {
      "status": "UP",
      ...
    },
    ...
  },
  "groups": ["liveness", "readiness"]
}
- Overall Health Status: The status field at the top level indicates the overall health of the pod. If any critical component is down, this status will be DOWN.
- Component-Level Status: Each component has its own status. This helps in identifying which specific component is causing issues.
- Error Details: When a component is down, additional details are often provided under details. For example, in the db component, the error Failed to obtain JDBC Connection clearly indicates a database connection issue.
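Whether the components and details sections appear in the response depends on the application's health endpoint settings. Assuming a Spring Boot Actuator application (an assumption based on the /actuator paths), the following sketch shows the properties that usually control this: show-details decides how much detail is returned, and a health group's include list decides which components count toward that group's status. Including db in the readiness group is only an example choice.

```yaml
# Sketch only: assumes Spring Boot Actuator; adjust to your security requirements.
management:
  endpoint:
    health:
      show-details: always              # other values: never, when-authorized
      group:
        readiness:
          include: readinessState,db    # readiness reports DOWN while the database is unreachable
```

With db in the readiness group, a failed database connection takes the pod out of service without restarting it, because only the livenessProbe and startupProbe trigger restarts.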
Configuring ingress to restrict public access to the actuator endpoint
Actuator endpoints can reveal sensitive information, so restrict public access to them. The following example shows how to block access to the /actuator endpoint through the ingress:
ingress:
  enabled: true
  className: "nginx"
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      server_tokens off;
      location /actuator {
        deny all;
        return 403;
      }