The commands and steps on this page can be used to check the most important Kubernetes resources, and they apply to Rancher Launched Kubernetes clusters.
Make sure you have configured the correct kubeconfig (for example, export KUBECONFIG=$PWD/kube_config_rancher-cluster.yml for Rancher HA) or that you are using the embedded kubectl via the UI.
export KUBECONFIG=$PWD/kube_config_rancher-cluster.yml
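As a quick sanity check that the kubeconfig is picked up and the API server is reachable, you can run two standard kubectl commands (nothing Rancher-specific is assumed here):
kubectl config current-context
kubectl cluster-info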
Run the command below and check that every node is listed and has STATUS Ready. If a node is not Ready, inspect the kubelet container logs on that node using docker logs kubelet.
kubectl get nodes -o wide
Example output:
NAME             STATUS   ROLES          AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
controlplane-0   Ready    controlplane   31m   v1.13.5   138.68.188.91   <none>        Ubuntu 18.04.2 LTS   4.15.0-47-generic   docker://18.9.5
etcd-0           Ready    etcd           31m   v1.13.5   138.68.180.33   <none>        Ubuntu 18.04.2 LTS   4.15.0-47-generic   docker://18.9.5
worker-0         Ready    worker         30m   v1.13.5   139.59.179.88   <none>        Ubuntu 18.04.2 LTS   4.15.0-47-generic   docker://18.9.5
Run the command below to list nodes with their Node Conditions:
kubectl get nodes -o go-template='{{range .items}}{{$node := .}}{{range .status.conditions}}{{$node.metadata.name}}{{": "}}{{.type}}{{":"}}{{.status}}{{"\n"}}{{end}}{{end}}'
Run the command below to list nodes with active Node Conditions that could prevent normal operation:
kubectl get nodes -o go-template='{{range .items}}{{$node := .}}{{range .status.conditions}}{{if ne .type "Ready"}}{{if eq .status "True"}}{{$node.metadata.name}}{{": "}}{{.type}}{{":"}}{{.status}}{{"\n"}}{{end}}{{else}}{{if ne .status "True"}}{{$node.metadata.name}}{{": "}}{{.type}}{{": "}}{{.status}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}'
Example output when a node has an active condition:
worker-0: DiskPressure:True
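To investigate an active condition, describe the affected node and review its Conditions and Events sections (worker-0 is taken from the example output above; substitute your own node name):
kubectl describe node worker-0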
The Kubernetes Controller Manager leader is determined by a leader election process. After the leader has been determined, the leader (holderIdentity) is saved in the kube-controller-manager endpoint (in this example, controlplane-0).
kubectl -n kube-system get endpoints kube-controller-manager -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
Example output:
{"holderIdentity":"controlplane-0_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","leaseDurationSeconds":15,"acquireTime":"2018-12-27T08:59:45Z","renewTime":"2018-12-27T09:44:57Z","leaderTransitions":0}
The Kubernetes Scheduler leader is determined by a leader election process. After the leader has been determined, the leader (holderIdentity) is saved in the kube-scheduler endpoint (in this example, controlplane-0).
kubectl -n kube-system get endpoints kube-scheduler -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
Example output:
{"holderIdentity":"controlplane-0_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","leaseDurationSeconds":15,"acquireTime":"2018-12-27T08:59:45Z","renewTime":"2018-12-27T09:44:57Z","leaderTransitions":0}
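Note: on newer Kubernetes versions, the leader election lock for both components may be stored in a Lease object rather than in an Endpoints annotation. If the commands above return nothing, this alternative check may apply (the spec.holderIdentity field carries the leader):
kubectl -n kube-system get lease kube-controller-manager -o yaml
kubectl -n kube-system get lease kube-scheduler -o yaml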
The default Ingress Controller is NGINX and is deployed as a DaemonSet in the ingress-nginx namespace. The pods are only scheduled to nodes with the worker role.
Check if the pods are running on all nodes:
kubectl -n ingress-nginx get pods -o wide
Example output:
NAME                                    READY   STATUS    RESTARTS   AGE   IP        NODE
default-http-backend-797c5bc547-kwwlq   1/1     Running   0          17m   x.x.x.x   worker-1
nginx-ingress-controller-4qd64          1/1     Running   0          14m   x.x.x.x   worker-1
nginx-ingress-controller-8wxhm          1/1     Running   0          13m   x.x.x.x   worker-0
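Because the controller runs as a DaemonSet, you can also compare the desired and ready pod counts directly. The DaemonSet name below matches the default RKE deployment; adjust it if yours differs:
kubectl -n ingress-nginx get daemonset nginx-ingress-controller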
If a pod is unable to run (its STATUS is not Running, its READY column does not show 1/1, or it shows a high RESTARTS count), check the pod details, logs, and namespace events:
kubectl -n ingress-nginx describe pods -l app=ingress-nginx
kubectl -n ingress-nginx logs -l app=ingress-nginx
kubectl -n ingress-nginx get events
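As a rough functional check, you can send a request for an unknown host to a worker node; with a healthy default backend, NGINX should answer with a 404. This is a sketch, assuming the controller listens on port 80 of the worker nodes (the RKE default); replace WORKER_IP with one of your worker node IPs:
curl -o /dev/null -s -w "%{http_code}\n" http://WORKER_IP/ -H "Host: test.example.com"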
Communication to the cluster (Kubernetes API via cattle-cluster-agent) and communication to the nodes (cluster provisioning via cattle-node-agent) are handled through Rancher agents.
Check if the cattle-node-agent pods are present on each node, have status Running and don’t have a high count of Restarts:
kubectl -n cattle-system get pods -l app=cattle-agent -o wide
Example output:
NAME                      READY   STATUS    RESTARTS   AGE   IP        NODE
cattle-node-agent-4gc2p   1/1     Running   0          2h    x.x.x.x   worker-1
cattle-node-agent-8cxkk   1/1     Running   0          2h    x.x.x.x   etcd-1
cattle-node-agent-kzrlg   1/1     Running   0          2h    x.x.x.x   etcd-0
cattle-node-agent-nclz9   1/1     Running   0          2h    x.x.x.x   controlplane-0
cattle-node-agent-pwxp7   1/1     Running   0          2h    x.x.x.x   worker-0
cattle-node-agent-t5484   1/1     Running   0          2h    x.x.x.x   controlplane-1
cattle-node-agent-t8mtz   1/1     Running   0          2h    x.x.x.x   etcd-2
Check the logs of a specific cattle-node-agent pod or of all cattle-node-agent pods:
kubectl -n cattle-system logs -l app=cattle-agent
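To follow the logs of one specific agent pod, you can tail a limited number of lines (the pod name here is taken from the example output above; substitute your own):
kubectl -n cattle-system logs -f --tail=100 cattle-node-agent-4gc2p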
Check if the cattle-cluster-agent pod is present in the cluster, has status Running and doesn’t have a high count of Restarts:
kubectl -n cattle-system get pods -l app=cattle-cluster-agent -o wide
Example output:
NAME                                    READY   STATUS    RESTARTS   AGE   IP        NODE
cattle-cluster-agent-54d7c6c54d-ht9h4   1/1     Running   0          2h    x.x.x.x   worker-1
Check the logs of the cattle-cluster-agent pod:
kubectl -n cattle-system logs -l app=cattle-cluster-agent
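If the cattle-cluster-agent is stuck in a bad state, deleting the pod is a common remediation; its Deployment will recreate it automatically:
kubectl -n cattle-system delete pod -l app=cattle-cluster-agent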
To check that all pods are running and all jobs have completed, run the command:
kubectl get pods --all-namespaces
If a pod is not in Running state, you can dig into the root cause by running:
kubectl describe pod POD_NAME -n NAMESPACE
kubectl logs POD_NAME -n NAMESPACE
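To quickly list pods that are neither running nor completed across all namespaces, a field selector on the pod phase works. Note that this filters on phase only, so it will not catch pods that are Running but not Ready:
kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded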
If a job is not in Completed state, you can dig into the root cause by running:
kubectl describe job JOB_NAME -n NAMESPACE
kubectl logs -l job-name=JOB_NAME -n NAMESPACE
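Following the same go-template pattern used elsewhere on this page, this sketch lists jobs that have not recorded any successful completions yet:
kubectl get jobs --all-namespaces -o go-template='{{range .items}}{{if not .status.succeeded}}{{.metadata.namespace}}{{" "}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}'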
Pods can be evicted by the kubelet based on eviction signals, for example low available memory or disk on a node.
Retrieve a list of evicted pods (podname and namespace):
kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{if eq .status.reason "Evicted"}}{{.metadata.name}}{{" "}}{{.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}'
To delete all evicted pods:
kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{if eq .status.reason "Evicted"}}{{.metadata.name}}{{" "}}{{.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}' | while read epod enamespace; do kubectl -n $enamespace delete pod $epod; done
Retrieve a list of evicted pods, scheduled node and the reason:
kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{if eq .status.reason "Evicted"}}{{.metadata.name}}{{" "}}{{.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}' | while read epod enamespace; do kubectl -n $enamespace get pod $epod -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,MSG:.status.message; done
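Eviction events also show up in the event stream; a way to surface them across all namespaces is a field selector on the event reason:
kubectl get events --all-namespaces --field-selector=reason=Evicted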