What are Health Probes?

Health probes help keep your Container Group stable by checking whether each instance has started, is still running, and is ready for traffic. Salad supports three types of health probes: Startup, Liveness, and Readiness. Health probes can be configured via the Salad Portal or the API.

Three Types of Health Probes

Startup Probe

Confirm a node is ready to serve inference. A startup probe is especially useful for applications with long startup times or complex initialization sequences. If a node fails a startup probe, then Salad will reallocate the instance.

Liveness Probe

Ensure the application is running and responding promptly. Liveness probes run after the startup probe has completed successfully. If a node fails a liveness probe, then Salad will reallocate the instance.

Readiness Probe

Evaluate if the application running your container is ready to accept traffic. Readiness probes are only available if Container Gateway or Job Queues are enabled. If a node fails a readiness probe, then Salad will avoid routing traffic to that instance.
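As a rough illustration of how an application might answer the three probe types, here is a minimal sketch using only the Python standard library. It assumes the probes are configured as HTTP checks against the container, which is an assumption not stated above. The paths `/started`, `/live`, and `/ready`, the port `8080`, and the `STATE` flags are all hypothetical names chosen for this example, not Salad conventions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state; a real application would flip these flags
# itself once initialization finishes and it can accept work.
STATE = {"started": False, "ready": False}


def probe_status(path: str, state: dict) -> int:
    """Return the HTTP status code a probe request to `path` should receive.

    The three paths are hypothetical, one per probe type.
    """
    if path == "/started":  # startup probe: has initialization finished?
        return 200 if state["started"] else 503
    if path == "/live":  # liveness probe: the process is up and answering
        return 200
    if path == "/ready":  # readiness probe: can the app accept traffic now?
        return 200 if state["started"] and state["ready"] else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    """Answers GET requests with the status code chosen by probe_status."""

    def do_GET(self):
        self.send_response(probe_status(self.path, STATE))
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the application logs


def serve(port: int = 8080) -> None:
    """Start the probe server on the given (hypothetical) port. Blocks forever."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

In this sketch the application would call `serve()` from a background thread, then set `STATE["started"]` and `STATE["ready"]` to `True` once it is able to handle requests. Keeping the status logic in a plain function makes it easy to check without opening a socket.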

General Tips for Setting up Health Probes

Misconfigured health probes can cause unnecessary termination of containers, leading to slower deployment and less reliable uptime. It's important to understand your containers' normal behavior on Salad Cloud to minimize false positives when configuring health probes. If your container often takes a long time to start, raise the thresholds to prevent premature reallocation of containers that would otherwise have started successfully.

Health Probe Sequence
- If a startup probe is defined, the readiness probe and liveness probe will not start until the startup probe passes.
Initialization Time
- Initial Delay Seconds is the number of seconds after the container starts before the probe will run. Make sure you give your application enough time to initialize or the probe will run prematurely.
Period Seconds
- Period Seconds is how often the probe will be run. Checking too frequently might add unnecessary load and instability to your container group.
Timeout Seconds
- Timeout Seconds is how long the probe waits for a response before it times out. Set it high enough that transient network issues do not cause unnecessary timeouts.
Success Threshold and Failure Threshold
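- For startup and liveness probes, set a higher Failure Threshold to allow more failed checks before Salad reallocates the instance. For readiness probes, a higher Failure Threshold allows more failed checks before Salad stops routing traffic to the instance.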
Failure Delay
- If the startup probe has not succeeded within initialDelaySeconds + (failureThreshold * periodSeconds) seconds of the container starting, the container will fail.
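The failure delay formula above can be worked through with a small sketch. The `ProbeTiming` class and both helper functions are hypothetical names for this illustration, not part of any Salad SDK, and the example values are arbitrary. The second helper assumes that the formula alone determines the deadline.

```python
import math
from dataclasses import dataclass


@dataclass
class ProbeTiming:
    """Hypothetical container for the timing settings described above."""
    initial_delay_seconds: int
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int


def startup_failure_deadline(p: ProbeTiming) -> int:
    """Seconds after container start at which an unsuccessful startup probe
    fails the container: initialDelaySeconds + failureThreshold * periodSeconds."""
    return p.initial_delay_seconds + p.failure_threshold * p.period_seconds


def min_failure_threshold(expected_startup_seconds: int,
                          initial_delay_seconds: int,
                          period_seconds: int) -> int:
    """Smallest failure threshold whose deadline is at least the expected
    startup time, assuming the formula above is the only limit."""
    remaining = expected_startup_seconds - initial_delay_seconds
    return max(1, math.ceil(remaining / period_seconds))


timing = ProbeTiming(initial_delay_seconds=30, period_seconds=10,
                     timeout_seconds=5, failure_threshold=6)
print(startup_failure_deadline(timing))   # 30 + 6 * 10 = 90
print(min_failure_threshold(120, 30, 10))  # ceil((120 - 30) / 10) = 9
```

With these example values, a container that needs about 120 seconds to initialize would be failed at 90 seconds. Raising the failure threshold to 9 moves the deadline to 120 seconds, and a little extra headroom on top of that is usually worth adding for slower nodes.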