If clients receive errors every time you scale-in,
it’s not a design problem, but a termination strategy problem.
>
>
The narrative of Graceful Shutdown, completing all tasks just before termination and safely concluding β the heart of a sailor tidying up and departing orderly amidst chaos.
π― What this article covers
- What happens to in-flight requests when a Pod terminates
- How SIGTERM and Graceful Shutdown work
- How to make termination safe with the terminationGracePeriodSeconds setting
- How to use preStop Hook
- Recommended timeout strategies in an MSA environment
π Introduction / Background
Kubernetes’ HPA (Horizontal Pod Autoscaler) automatically reduces the number of Pods when traffic decreases. This is an excellent feature.
However, a practical problem arises here.
“What happens to requests being processed by a Pod the moment it scales down?”
Without any handling, the following occurs:
- HPA decides to terminate a Pod
- The Pod is busy processing API requests
- The Pod is forcibly SIGKILLed and disappears immediately
- The client receives no response and the TCP connection is broken
- The client waits until timeout and then receives a Connection reset or 502/504 error
This article addresses how to solve this problem with a Graceful Shutdown strategy.

π Understanding Kubernetes Pod Termination Flow
When a Pod is deleted, Kubernetes operates in the following sequence:
1. kubectl delete pod / HPA scale-in λͺ
λ Ή
2. Pod μν β Terminating
3. Endpointμμ ν΄λΉ Pod IP μ κ±° (λ μ΄μ μ μμ²μ λ°μ§ μμ)
4. Containerμ SIGTERM μ νΈ μ μ‘
5. terminationGracePeriodSeconds λμ λκΈ° (κΈ°λ³Έ 30μ΄)
6. κΈ°κ° λ΄ μ’
λ£ μ λλ©΄ SIGKILLλ‘ κ°μ μ’
λ£
The key here is that steps 3 and 4 do not happen simultaneously.
There is a propagation delay of several seconds for kube-proxy and Ingress Controller to detect Endpoint changes. This means new requests can still come in even after SIGTERM is received.
π‘ Solution 1 β SIGTERM Handling + terminationGracePeriodSeconds
This is the most fundamental solution. When an application receives SIGTERM, it should not die immediately, but instead finish processing existing requests before shutting down.
Application-level handling (Node.js example)
const server = app.listen(3000);
process.on('SIGTERM', () => {
console.log('SIGTERM received. Closing server gracefully...');
// Do not accept new connections, wait for existing connections to complete processing
server.close(() => {
console.log('All connections closed. Exiting.');
process.exit(0);
});
// Safety net: Force shutdown after 25 seconds (shorter than gracePeriod)
setTimeout(() => {
console.error('Forced exit after timeout');
process.exit(1);
}, 25000);
});
Java Spring Boot example
With Spring Boot 2.3 or later, you can enable Graceful Shutdown with a single line of configuration.
# application.yaml
server:
shutdown: graceful # Default is immediate
spring:
lifecycle:
timeout-per-shutdown-phase: 30s
Kubernetes Deployment Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-api
spec:
template:
spec:
# Grace period until SIGKILL (default 30 seconds)
terminationGracePeriodSeconds: 60
containers:
- name: my-api
image: my-api:latest
lifecycle:
preStop:
exec:
# Sleep to cover Endpoint propagation delay
command: ["/bin/sh", "-c", "sleep 5"]
π‘ Role of preStop hook: It runs just before SIGTERM is sent. Adding a ‘sleep 5’ prevents new requests from coming in while kube-proxy removes the Endpoint. This may seem minor, but it’s a very effective pattern in practice.
π‘ Solution 2 β MSA Standard Pattern: Retry + Short Timeout
This is another recommended approach in an MSA (Microservices Architecture) environment.
While it’s important to make the server perfect, clients should also be designed to handle transient failures themselves.
The core principles are:
- Short Timeout: Detect failures quickly without waiting too long
- Retry with Backoff: Retry at intervals upon failure
- Circuit Breaker: Block requests on consecutive failures (e.g., Resilience4j, Istio)
// Resilience4j Retry configuration example
RetryConfig config = RetryConfig.custom()
.maxAttempts(3)
.waitDuration(Duration.ofMillis(500))
.retryOnException(e -> e instanceof ConnectException
|| e instanceof SocketTimeoutException)
.build();
The advantage of this approach is that it can handle various situations beyond just Pod termination, such as temporary network outages, redeployments, and node failures.
π» Recommended Combination in Practice
Combining both approaches results in the most stable configuration.
spec:
template:
spec:
terminationGracePeriodSeconds: 60 # Server-side grace period
containers:
- name: my-api
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5"] # Wait for Endpoint propagation
# Control traffic reception with Readiness Probe
readinessProbe:
httpGet:
path: /health/ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
And in the application:
- Create a /health/ready endpoint that returns 503 upon receiving SIGTERM
- Continue processing existing requests
- Terminate the process after all requests are completed
This creates a clean flow: readinessProbe fails β kube-proxy removes Endpoint β new requests are blocked β existing requests complete β Pod terminates.
β οΈ Cautions / Common Mistakes
π΄ Setting terminationGracePeriodSeconds too long
Setting it too long, like 600 seconds (10 minutes), drastically increases deployment time. Generally, 60-120 seconds is appropriate.
π΄ Relying only on SIGTERM without preStop sleep
Due to kube-proxy’s Endpoint propagation delay, new requests can still come in immediately after SIGTERM. A `preStop: sleep 5` is essential.
π΄ Beware of languages/frameworks without SIGTERM handlers
Some older frameworks or containers running shell scripts ignore SIGTERM by default. The process must be run as PID 1 using the `exec` command for SIGTERM to be delivered.
# Bad example β SIGTERM goes to sh, not delivered to app
CMD ["sh", "-c", "node server.js"]
# Correct example β node becomes PID 1 and receives SIGTERM directly
CMD ["node", "server.js"]
π΄ Long-polling / WebSocket require special handling
Unlike HTTP requests, persistent connections may not naturally terminate within the grace period. These connection types require explicit logic to close connections upon receiving SIGTERM.
β Summary / Conclusion
The problem of lost in-flight requests when a Pod terminates in Kubernetes is not a design flaw, but a configuration issue. Applying the following three points can cover most situations.
| Layer | Setting | Purpose |
| Kubernetes | terminationGracePeriodSeconds: 60 | Secure SIGKILL grace period |
| Kubernetes | preStop: sleep 5 | Wait for Endpoint propagation delay |
| Application | Implement SIGTERM handler | Complete existing requests then terminate |
| Client | Retry + Short Timeout | Automatic recovery from transient failures |
A system equipped with all four of these will virtually eliminate situations where clients receive errors during scale-in.
As a next step, introducing a Service Mesh like Istio or Linkerd can automatically handle all these patterns at the infrastructure level without application code changes.

Leave a Reply