From Crash to Calm: How to Debug and Resolve Redis Pod Issues

Introduction

In one of our high-traffic deployments, which sits at the intersection of web and AI use cases, we had a Redis cache pod shared between the web application and AI background workers. While this setup wasn’t ideal, it had worked well—until one day, we were paged: Redis was down‼️ When a Redis pod crashes or restarts frequently, a methodical, step-by-step investigation is the fastest way to find the root cause. Here is the process we followed.


🔍 1. Check Pod Status & Events

kubectl describe pod <pod-name> -n <namespace>

Look for:

- The container’s `Last State` and termination `Reason` (e.g., `OOMKilled`, `Error`)
- The `Restart Count`
- Recent events such as failed liveness probes, `FailedScheduling`, or image pull errors

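As a concrete illustration, the crash signals usually live in the container’s `Last State` block; here is a sketch that filters a `describe` dump down to them (the excerpt below is a fabricated sample, not real cluster output):

```shell
# Filter the crash signals out of a `kubectl describe pod` dump.
# /tmp/describe.txt is an assumed sample for illustration.
cat > /tmp/describe.txt <<'EOF'
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
    Restart Count:  14
EOF
grep -E 'Reason|Exit Code|Restart Count' /tmp/describe.txt
```

Exit code 137 means the container received SIGKILL (128 + 9) — in practice, almost always the kernel OOM killer.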
📄 2. View Pod Logs

kubectl logs <pod-name> -n <namespace> --previous

Useful to find:

- Out-of-memory errors, e.g. `Can't save in background: fork: Cannot allocate memory`
- Configuration errors printed at startup
- The last lines logged before the crash (`--previous` shows the prior container’s logs)

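A quick filter narrows long logs down to the usual suspects. The log lines below are a fabricated sample of what a memory-starved Redis might print; in a live cluster you would pipe `kubectl logs` output in instead:

```shell
# Scan a Redis log for common failure signatures.
# /tmp/redis.log is a fabricated sample for illustration.
cat > /tmp/redis.log <<'EOF'
1:M 10 Jan 2024 12:00:01.000 * 10000 changes in 60 seconds. Saving...
1:M 10 Jan 2024 12:00:01.100 # Can't save in background: fork: Cannot allocate memory
1:M 10 Jan 2024 12:00:02.000 # WARNING overcommit_memory is set to 0!
EOF
grep -iE "can't save|warning|error|fatal|oom" /tmp/redis.log
```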
📈 3. Check Resource Limits

kubectl get pod <pod-name> -o jsonpath="{.spec.containers[*].resources}"
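An empty result means no requests or limits are set at all, which makes OOM kills both more likely and harder to reason about. A typical `resources` stanza looks like the following — the values are illustrative, not recommendations:

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "768Mi"   # leave headroom above Redis's own maxmemory cap
```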

🧠 4. Log Into the Redis Pod

kubectl exec -it <pod-name> -n <namespace> -- sh
redis-cli

Once inside the Redis CLI, run the following diagnostics:


🔎 5. Redis Internal Diagnostics

➤ Memory Usage & Key Stats

INFO MEMORY
INFO KEYSPACE
redis-cli --bigkeys
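When you want the memory check in a script or alert rather than an interactive session, the `INFO` output is easy to parse. A sketch over fabricated sample values (real `INFO` output uses CRLF line endings, simplified here; in a pod you would pipe `redis-cli INFO MEMORY` in instead of `printf`):

```shell
# Pull the headline memory numbers out of INFO MEMORY-style output.
# The sample values below are fabricated for illustration.
printf 'used_memory:498073600\nused_memory_human:475.0M\nmaxmemory:536870912\n' |
  awk -F: '$1 == "used_memory" || $1 == "maxmemory" {print $1 "=" $2}'
```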

➤ Eviction Policy Check & Fix

Check current policy:

CONFIG GET maxmemory-policy

If it’s set to noeviction, Redis will reject writes with `OOM command not allowed when used memory > 'maxmemory'` once the cap is hit — and if no `maxmemory` cap is set at all, memory grows unbounded until the container is OOM-killed. Update it:

CONFIG SET maxmemory-policy allkeys-lru

⚠️ This is a runtime change — to persist, update your Helm chart, ConfigMap, or redis.conf.

Check memory cap:

CONFIG GET maxmemory

Set memory cap (e.g., 512MB):

CONFIG SET maxmemory 536870912

⚠️ This is a runtime change — to persist, update your Helm chart, ConfigMap, or redis.conf.
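Since both warnings above apply, here is what the persistent equivalent of the two runtime changes looks like in `redis.conf` (536870912 bytes is just 512 × 1024 × 1024, i.e. 512MB; the config file also accepts the shorthand unit):

```
# redis.conf — persistent equivalents of the CONFIG SET commands above
maxmemory 512mb
maxmemory-policy allkeys-lru
```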

➤ Additional Useful Commands

INFO clients          # Number of connected clients
INFO commandstats     # Command usage stats
MONITOR               # Live traffic (only use in dev/debug)
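`INFO commandstats` is particularly useful for spotting which commands dominate traffic; its `cmdstat_<command>:calls=...` lines are easy to rank with awk. A sketch over fabricated sample data (in a pod you would pipe `redis-cli INFO commandstats` in), sorted by call count:

```shell
# Rank commands by call count from INFO commandstats-style output.
# Sample lines are fabricated for illustration.
printf '%s\n' \
  'cmdstat_get:calls=90210,usec=301000,usec_per_call=3.34' \
  'cmdstat_set:calls=1200,usec=9000,usec_per_call=7.50' \
  'cmdstat_keys:calls=4,usec=880000,usec_per_call=220000.00' |
  awk -F'[:,=]' '{print $3, $1}' | sort -rn
```

Note that this ranks by volume, not latency — a rarely-called but expensive command like `KEYS` (220ms per call in the sample) can still be the real problem.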

🧰 6. Investigate Node & Cluster Health

Sometimes the problem lies outside Redis:

- `kubectl describe node <node-name>` — look for `MemoryPressure` or `DiskPressure` conditions
- `kubectl top nodes` / `kubectl top pods` — spot noisy neighbors starving the Redis pod
- Check whether the node has recently evicted pods

📦 7. Persistent Storage (if using AOF or RDB)

If persistence is enabled:

- Verify the PersistentVolumeClaim is bound and the volume isn’t full (`df -h` inside the pod)
- Look for background-save failures in the logs (`Can't save in background`)
- A corrupted append-only file can block startup; `redis-check-aof --fix <file>` can repair it

💾 8. Deployment Config

# redis-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7.2        # pin a version instead of :latest
        # Persist the runtime fixes from step 5 across restarts
        args: ["--maxmemory", "512mb", "--maxmemory-policy", "allkeys-lru"]
        ports:
        - containerPort: 6379
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "768Mi"     # keep headroom above maxmemory

Conclusion

Diagnosing Redis pod crashes requires a methodical approach that covers pod status, logs, resource limits, Redis internal diagnostics, node health, persistent storage, and deployment configurations. Through this investigation, I identified key issues, including a missing eviction policy in our cluster setup, which led to linearly increasing key counts and memory exhaustion, ultimately causing crashes. This experience also highlighted the importance of understanding usage patterns and configurations. Key takeaways include:

- Set an explicit `maxmemory` cap and an eviction policy that matches your access pattern — `noeviction` plus unbounded key growth is a crash waiting to happen.
- Persist `CONFIG SET` changes in your Helm chart, ConfigMap, or redis.conf; runtime changes are lost on restart.
- Avoid sharing a single Redis instance between workloads with very different usage patterns, such as a web application and AI background workers.
- Set container resource requests and limits so failures are predictable and diagnosable.
