From Crash to Calm: How to Debug and Resolve Redis Pod Issues
Introduction
In one of our high-traffic deployments, which sits at the intersection of web and AI use cases, we had a Redis cache pod shared between the web application and AI background workers. While this setup wasn’t ideal, it had worked well—until one day, we were paged: Redis was down‼️ When a Redis pod crashes or restarts frequently, a step-by-step investigation like the one below is the most reliable way to diagnose and resolve the root cause.
🔍 1. Check Pod Status & Events
kubectl describe pod <pod-name> -n <namespace>
Look for:
- Status: OOMKilled, CrashLoopBackOff, etc.
- Events: listed at the bottom; check for warnings, restarts, and failed probes.
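The interesting lines in that describe output are the last terminated state, exit code, restart count, and any BackOff events. A quick grep pulls them out in one pass; the sample below is fabricated for illustration, and in practice you would pipe the real kubectl command into the same grep:

```shell
# Fabricated excerpt of `kubectl describe pod` output; in practice use:
#   kubectl describe pod <pod-name> -n <namespace> | grep -E 'Reason:|Exit Code:|Restart Count:|BackOff'
describe_output='
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
    Restart Count:  14
Events:
  Warning  BackOff  kubelet  Back-off restarting failed container
'
# Surface the crash signature: termination reason, exit code (137 = SIGKILL,
# typical of OOMKilled), restart count, and BackOff events.
echo "$describe_output" | grep -E 'Reason:|Exit Code:|Restart Count:|BackOff'
```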
📄 2. View Pod Logs
kubectl logs <pod-name> -n <namespace> --previous
Useful to find:
- Crash traces
- Redis-specific errors (OOM command not allowed, MISCONF, etc.)
- File system or permission issues
📈 3. Check Resource Limits
kubectl get pod <pod-name> -o jsonpath="{.spec.containers[*].resources}"
- Look for memory limits that may be too low.
- Redis is memory-intensive and may be OOMKilled if usage exceeds limits.
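A useful sanity check is to compare the container's memory limit against the used_memory value Redis reports. The numbers below are made up for illustration; substitute the values from your own kubectl and INFO MEMORY output:

```shell
# Hypothetical values: a 256Mi container limit vs. Redis's used_memory (bytes,
# from INFO MEMORY). Replace with your own numbers.
limit_mi=256
used_memory=231928832

limit_bytes=$((limit_mi * 1024 * 1024))
pct=$((used_memory * 100 / limit_bytes))

echo "limit=${limit_bytes} bytes, used=${used_memory} bytes (${pct}%)"
# Anywhere above ~80% leaves little headroom for fragmentation and
# fork-based persistence, so an OOMKill becomes likely.
if [ "$pct" -ge 80 ]; then
  echo "WARNING: Redis is using ${pct}% of its limit; OOMKill risk"
fi
```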
🧠 4. Log Into the Redis Pod
kubectl exec -it <pod-name> -n <namespace> -- sh
redis-cli
Once inside the Redis CLI, run the following diagnostics:
🔎 5. Redis Internal Diagnostics
➤ Memory Usage & Key Stats
INFO MEMORY
INFO KEYSPACE
redis-cli --bigkeys
- INFO MEMORY: view used_memory, fragmentation, and overheads.
- INFO KEYSPACE: see the number of keys in each DB.
- --bigkeys: find the largest keys (can reveal memory hogs).
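Because INFO output is line-oriented (field:value), it is easy to script against. A sketch using a made-up excerpt; in practice you would pipe `redis-cli INFO MEMORY` in its place (and strip the trailing \r that the wire protocol adds to each line):

```shell
# Made-up INFO MEMORY excerpt; real output comes from `redis-cli INFO MEMORY`.
info='used_memory:231928832
used_memory_human:221.18M
used_memory_rss:260046848
mem_fragmentation_ratio:1.12
maxmemory:0
maxmemory_policy:noeviction'

# Extract the fields that matter most when debugging memory-related crashes.
used=$(echo "$info" | awk -F: '/^used_memory:/ {print $2}')
frag=$(echo "$info" | awk -F: '/^mem_fragmentation_ratio:/ {print $2}')
policy=$(echo "$info" | awk -F: '/^maxmemory_policy:/ {print $2}')

echo "used_memory=${used} fragmentation=${frag} policy=${policy}"
```

A maxmemory of 0 with a noeviction policy, as in this sample, is exactly the combination that lets the dataset grow until the container limit is hit.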
➤ Eviction Policy Check & Fix
Check current policy:
CONFIG GET maxmemory-policy
If it’s set to noeviction, Redis stops accepting writes (OOM command not allowed) once memory is exhausted, and without a memory cap it can keep growing until the pod is OOMKilled. Update it:
CONFIG SET maxmemory-policy allkeys-lru
⚠️ This is a runtime change — to persist, update your Helm chart, ConfigMap, or redis.conf.
Check memory cap:
CONFIG GET maxmemory
Set memory cap (e.g., 512MB):
CONFIG SET maxmemory 536870912
⚠️ This is a runtime change — to persist, update your Helm chart, ConfigMap, or redis.conf.
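For completeness, the persisted equivalents of the two CONFIG SET commands above look like this in redis.conf (or in the ConfigMap your chart renders it from); the values mirror the examples and should be sized for your workload:

```conf
# redis.conf: persisted equivalents of the runtime CONFIG SET commands.
maxmemory 512mb
maxmemory-policy allkeys-lru
```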
➤ Additional Useful Commands
INFO clients # Number of connected clients
INFO commandstats # Command usage stats
MONITOR # Live traffic (only use in dev/debug)
🧰 6. Investigate Node & Cluster Health
Sometimes the problem lies outside Redis:
- Run kubectl top node / kubectl top pod to check for high memory or CPU.
- Inspect node logs (journalctl, dmesg) for kernel-level OOM kills.
- Look for disk pressure or pod evictions.
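Kernel OOM kills leave a distinctive "Out of memory: Killed process" trace. A sketch of scanning for it, using a fabricated log line; on a real node you would grep the output of dmesg or journalctl -k instead:

```shell
# Fabricated kernel log line; on a real node use:
#   dmesg | grep -i 'killed process'
kernel_log='[12345.678] Out of memory: Killed process 4321 (redis-server) total-vm:1048576kB'

# The process name in parentheses tells you whether Redis was the victim.
echo "$kernel_log" | grep -i 'killed process' && echo "kernel-level OOM kill detected"
```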
📦 7. Persistent Storage (if using AOF or RDB)
- Confirm volume is mounted and writable.
- Check for disk full conditions or slow I/O.
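Both checks can be done in a couple of lines from inside the pod. The sketch below runs against a temp directory so it is runnable anywhere; inside the pod, point data_dir at the real mount (/data is the default for the official Redis image, adjust if yours differs):

```shell
# Sketch: check that the data directory's filesystem has space and is writable.
# Inside the pod, set data_dir to the real mount, e.g. data_dir=/data.
data_dir=$(mktemp -d)

df -h "$data_dir" | tail -1   # free space on the backing filesystem
if touch "$data_dir/.write_test" 2>/dev/null; then
  rm "$data_dir/.write_test"
  echo "writable"
else
  echo "NOT writable: check mount and permissions"
fi
```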
💾 8. Deployment Config
- Liveness/readiness probes: Are they tuned correctly?
- Volume mounts: Correct paths/permissions?
- Any sidecars affecting Redis behavior?
# redis-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:latest
        ports:
        - containerPort: 6379
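Note that this minimal manifest omits the very settings this guide keeps coming back to: resource limits and probes. A sketch of the same Deployment with those added; the image tag and all numeric values are placeholders to tune for your workload:

```yaml
# redis-deployment.yml (sketch; resource and probe values are placeholders)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7.2          # pin a version instead of latest
        ports:
        - containerPort: 6379
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"       # keep above Redis maxmemory plus overhead
        livenessProbe:
          tcpSocket:
            port: 6379
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          exec:
            command: ["redis-cli", "ping"]
          initialDelaySeconds: 5
          periodSeconds: 10
```

Keeping the container memory limit comfortably above Redis's maxmemory leaves headroom for fragmentation and fork-based persistence, which is what prevents the OOMKill scenario described earlier.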
Conclusion
Diagnosing Redis pod crashes requires a methodical approach that covers pod status, logs, resource limits, Redis internal diagnostics, node health, persistent storage, and deployment configurations. Through this investigation, I identified key issues, including missing eviction policies in our cluster setup, which led to linearly increasing key counts and memory exhaustion, ultimately causing crashes. This experience also highlighted the importance of understanding usage patterns and configurations. Key takeaways include:
- Monitor resource limits: Ensure Redis has sufficient memory and CPU resources.
- Configure eviction policies: Set a suitable eviction policy to prevent crashes due to memory exhaustion.
- Check node and cluster health: Identify potential issues outside of Redis.
- Verify persistent storage: Confirm volume mounts and writable permissions.
- Optimize deployment configurations: Tune liveness/readiness probes and volume mounts for optimal performance.

By applying these best practices, you can minimize Redis crashes and ensure a more robust system.