Kubernetes etcd Issues Recipe

  1. Search the etcd logs (/var/log/pods/*etcd*/etcd/*) for:
    1. The warning "leader failed to send out heartbeat on time; took too long, leader is overloaded likely from slow disk"
    2. Compactions that took longer than 100ms:
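      grep -h "finished scheduled compaction" /var/log/pods/*etcd*/etcd/* | awk 'match($0, /"took":"[^"]+"/) { d = substr($0, RSTART + 8, RLENGTH - 9); ms = -1; if (d ~ /^[0-9.]+ms$/) ms = d + 0; else if (d ~ /^[0-9.]+s$/) ms = d * 1000; if (ms > 100) print ms "ms" }' | sort -nr | head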
  2. Consider whether defragmentation is necessary; it reclaims fragmented free space in the etcd database and may reduce the memory usage of kube-apiserver and etcd:
    1. Kubernetes defragmentation documentation
    2. OpenShift defragmentation documentation
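The two log checks in step 1 can be combined into a single pass. The following is a minimal sketch, not a tool shipped with etcd or Kubernetes: `check_etcd_logs` is a hypothetical name, and it assumes etcd's JSON log format in which compaction durations appear as Go duration strings such as "took":"123.4ms". Durations are normalized to milliseconds so that `sort -n` compares like with like; durations of a minute or more (printed like 1m2s) and sub-millisecond values are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of etcd or Kubernetes): summarize both log
# symptoms from step 1 for the etcd pod log files given as arguments.
# Assumes JSON-formatted etcd logs with Go duration strings, e.g. "took":"123.4ms".
check_etcd_logs() {
  # Count slow-heartbeat warnings across all files
  # (grep -c prints 0 but exits 1 when nothing matches, hence || true).
  beats=$(cat "$@" | grep -c "leader failed to send out heartbeat on time" || true)
  echo "heartbeat warnings: $beats"

  # Print the ten slowest compactions above 100ms, converted to milliseconds.
  cat "$@" | grep "finished scheduled compaction" |
    awk 'match($0, /"took":"[^"]+"/) {
      d = substr($0, RSTART + 8, RLENGTH - 9)    # duration value, e.g. 1.5s
      ms = -1                                     # other units are skipped
      if (d ~ /^[0-9.]+ms$/) ms = d + 0           # already milliseconds
      else if (d ~ /^[0-9.]+s$/) ms = d * 1000    # seconds to milliseconds
      if (ms > 100) print ms "ms"
    }' | sort -nr | head
}
```

On a control-plane node it might be invoked as `check_etcd_logs /var/log/pods/*etcd*/etcd/*`.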
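Whether defragmentation is worthwhile can be estimated by comparing the size of the database file with the space actually in use. The sketch below assumes these two byte counts are available, for example as the `dbSize` and `dbSizeInUse` fields of `etcdctl endpoint status -w json` output (assuming etcd 3.4 or later). `etcd_fragmentation` is a hypothetical helper, and its 50% default threshold is an illustrative assumption, not a value taken from the Kubernetes or OpenShift documentation.

```shell
#!/bin/sh
# Hypothetical helper: report what share of the etcd database file is free
# (fragmented) space. Arguments are byte counts: total size, size in use,
# and an optional threshold in percent.
etcd_fragmentation() {
  total=$1
  in_use=$2
  threshold=${3:-50}   # illustrative default, not an official recommendation
  awk -v t="$total" -v u="$in_use" -v th="$threshold" 'BEGIN {
    if (t <= 0) { print "invalid total size"; exit 1 }
    pct = (t - u) * 100 / t                      # free share of the file
    printf "fragmented: %.1f%%\n", pct
    if (pct >= th) print "defragmentation may be worthwhile"
    else print "defragmentation probably unnecessary"
  }'
}
```

For example, `etcd_fragmentation 1000 400` reports 60.0% fragmented space. If the result suggests defragmentation, `etcdctl defrag` can then be run as described in the documentation linked above.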