From Crash to Calm: How to Debug and Resolve Redis Pod Issues
Introduction
In one of our high-traffic deployments, which sits at the intersection of web and AI use cases, we had a Redis cache pod shared between the web application and AI background workers. While this setup wasn’t ideal, it had worked well—until one day, we were paged: Redis was down‼️ When a Redis pod crashes or restarts frequently, a step-by-step investigation like the one below is the most reliable way to diagnose and resolve the root cause.
🔍 1. Check Pod Status & Events
kubectl describe pod <pod-name> -n <namespace>
Look for:
- Status: OOMKilled, CrashLoopBackOff, etc.
- Events: listed at the bottom; check for warnings, restarts, and failed probes.
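The interesting lines in that describe output are the last terminated state, exit code, restart count, and any BackOff events. A quick grep pulls them out in one pass; the sample below is fabricated for illustration, and in practice you would pipe the real kubectl command into the same grep:

```shell
# Fabricated excerpt of `kubectl describe pod` output; in practice use:
#   kubectl describe pod <pod-name> -n <namespace> | grep -E 'Reason:|Exit Code:|Restart Count:|BackOff'
describe_output='
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
    Restart Count:  14
Events:
  Warning  BackOff  kubelet  Back-off restarting failed container
'
# Surface the crash signature: termination reason, exit code (137 = SIGKILL,
# typical of OOMKilled), restart count, and BackOff events.
echo "$describe_output" | grep -E 'Reason:|Exit Code:|Restart Count:|BackOff'
```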
📄 2. View Pod Logs
kubectl logs <pod-name> -n <namespace> --previous
Useful to find:
- Crash traces
- Redis-specific errors (OOM command not allowed, MISCONF, etc.)
- File system or permission issues
📈 3. Check Resource Limits
kubectl get pod <pod-name> -o jsonpath="{.spec.containers[*].resources}"
- Look for memory limits that may be too low.
- Redis is memory-intensive and may be OOMKilled if usage exceeds limits.
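A useful sanity check is to compare the container's memory limit against the used_memory value Redis reports. The numbers below are made up for illustration; substitute the values from your own kubectl and INFO MEMORY output:

```shell
# Hypothetical values: a 256Mi container limit vs. Redis's used_memory (bytes,
# from INFO MEMORY). Replace with your own numbers.
limit_mi=256
used_memory=231928832

limit_bytes=$((limit_mi * 1024 * 1024))
pct=$((used_memory * 100 / limit_bytes))

echo "limit=${limit_bytes} bytes, used=${used_memory} bytes (${pct}%)"
# Anywhere above ~80% leaves little headroom for fragmentation and
# fork-based persistence, so an OOMKill becomes likely.
if [ "$pct" -ge 80 ]; then
  echo "WARNING: Redis is using ${pct}% of its limit; OOMKill risk"
fi
```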
🧠 4. Log Into the Redis Pod
kubectl exec -it <pod-name> -n <namespace> -- sh
redis-cli
Once inside the Redis CLI, run the following diagnostics:
🔎 5. Redis Internal Diagnostics
➤ Memory Usage & Key Stats
INFO MEMORY
INFO KEYSPACE
redis-cli --bigkeys
- INFO MEMORY: view used_memory, fragmentation, and overheads.
- INFO KEYSPACE: see the number of keys in each DB.
- --bigkeys: find the largest keys (can reveal memory hogs).
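Because INFO output is line-oriented (field:value), it is easy to script against. A sketch using a made-up excerpt; in practice you would pipe `redis-cli INFO MEMORY` in its place (and strip the trailing \r that the wire protocol adds to each line):

```shell
# Made-up INFO MEMORY excerpt; real output comes from `redis-cli INFO MEMORY`.
info='used_memory:231928832
used_memory_human:221.18M
used_memory_rss:260046848
mem_fragmentation_ratio:1.12
maxmemory:0
maxmemory_policy:noeviction'

# Extract the fields that matter most when debugging memory-related crashes.
used=$(echo "$info" | awk -F: '/^used_memory:/ {print $2}')
frag=$(echo "$info" | awk -F: '/^mem_fragmentation_ratio:/ {print $2}')
policy=$(echo "$info" | awk -F: '/^maxmemory_policy:/ {print $2}')

echo "used_memory=${used} fragmentation=${frag} policy=${policy}"
```

A maxmemory of 0 with a noeviction policy, as in this sample, is exactly the combination that lets the dataset grow until the container limit is hit.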
➤ Eviction Policy Check & Fix
Check current policy:
CONFIG GET maxmemory-policy
If it’s set to noeviction, Redis stops accepting writes (OOM command not allowed) once memory is exhausted, and without a memory cap it can keep growing until the pod is OOMKilled. Update it:
CONFIG SET maxmemory-policy allkeys-lru
⚠️ This is a runtime change — to persist, update your Helm chart, ConfigMap, or redis.conf.
Check memory cap:
CONFIG GET maxmemory
Set memory cap (e.g., 512MB):
CONFIG SET maxmemory 536870912
⚠️ This is a runtime change — to persist, update your Helm chart, ConfigMap, or redis.conf.
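For completeness, the persisted equivalents of the two CONFIG SET commands above look like this in redis.conf (or in the ConfigMap your chart renders it from); the values mirror the examples and should be sized for your workload:

```conf
# redis.conf: persisted equivalents of the runtime CONFIG SET commands.
maxmemory 512mb
maxmemory-policy allkeys-lru
```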
➤ Additional Useful Commands
INFO clients # Number of connected clients
INFO commandstats # Command usage stats
MONITOR # Live traffic (only use in dev/debug)
🧰 6. Investigate Node & Cluster Health
Sometimes the problem lies outside Redis:
- Run kubectl top node / kubectl top pod to check for high memory or CPU.
- Inspect node logs (journalctl, dmesg) for kernel-level OOM kills.
- Look for disk pressure or pod evictions.
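Kernel OOM kills leave a distinctive "Out of memory: Killed process" trace. A sketch of scanning for it, using a fabricated log line; on a real node you would grep the output of dmesg or journalctl -k instead:

```shell
# Fabricated kernel log line; on a real node use:
#   dmesg | grep -i 'killed process'
kernel_log='[12345.678] Out of memory: Killed process 4321 (redis-server) total-vm:1048576kB'

# The process name in parentheses tells you whether Redis was the victim.
echo "$kernel_log" | grep -i 'killed process' && echo "kernel-level OOM kill detected"
```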
📦 7. Persistent Storage (if using AOF or RDB)
- Confirm volume is mounted and writable.
- Check for disk full conditions or slow I/O.
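Both checks can be done in a couple of lines from inside the pod. The sketch below runs against a temp directory so it is runnable anywhere; inside the pod, point data_dir at the real mount (/data is the default for the official Redis image, adjust if yours differs):

```shell
# Sketch: check that the data directory's filesystem has space and is writable.
# Inside the pod, set data_dir to the real mount, e.g. data_dir=/data.
data_dir=$(mktemp -d)

df -h "$data_dir" | tail -1   # free space on the backing filesystem
if touch "$data_dir/.write_test" 2>/dev/null; then
  rm "$data_dir/.write_test"
  echo "writable"
else
  echo "NOT writable: check mount and permissions"
fi
```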
💾 8. Deployment Config
- Liveness/readiness probes: Are they tuned correctly?
- Volume mounts: Correct paths/permissions?
- Any sidecars affecting Redis behavior?
# redis-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:latest
        ports:
        - containerPort: 6379
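Note that this minimal manifest omits the very settings this guide keeps coming back to: resource limits and probes. A sketch of the same Deployment with those added; the image tag and all numeric values are placeholders to tune for your workload:

```yaml
# redis-deployment.yml (sketch; resource and probe values are placeholders)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7.2          # pin a version instead of latest
        ports:
        - containerPort: 6379
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"       # keep above Redis maxmemory plus overhead
        livenessProbe:
          tcpSocket:
            port: 6379
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          exec:
            command: ["redis-cli", "ping"]
          initialDelaySeconds: 5
          periodSeconds: 10
```

Keeping the container memory limit comfortably above Redis's maxmemory leaves headroom for fragmentation and fork-based persistence, which is what prevents the OOMKill scenario described earlier.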
Conclusion
Diagnosing Redis pod crashes requires a methodical approach that covers pod status, logs, resource limits, Redis internal diagnostics, node health, persistent storage, and deployment configurations. Through this investigation, I identified key issues, including missing eviction policies in our cluster setup, which led to linearly increasing key counts and memory exhaustion, ultimately causing crashes. This experience also highlighted the importance of understanding usage patterns and configurations. Key takeaways include:
- Monitor resource limits: Ensure Redis has sufficient memory and CPU resources.
- Configure eviction policies: Set a suitable eviction policy to prevent crashes due to memory exhaustion.
- Check node and cluster health: Identify potential issues outside of Redis.
- Verify persistent storage: Confirm volume mounts and writable permissions.
- Optimize deployment configurations: Tune liveness/readiness probes and volume mounts for optimal performance.

By applying these best practices, you can minimize Redis crashes and ensure a more robust system.