Kubernetes

Kubernetes (also known as k8s or kube) is an open-source system for automating deployment, scaling, and management of containerized applications.

Basic terms:

  • Pod: A collection of one or more containers running on the same node with shared resources such as storage and an IP address.
  • Deployment: A managed set of one or more identical pods.
  • Service: Wires pods together by exposing deployments to each other. A service is essentially a load balancer/reverse proxy in front of a set of pods chosen by a selector, with an access policy. A service is normally addressed as service-name.namespace:port and provides a permanent, internal host name for applications to use.
  • Operator: Manages application state and exposes interfaces to manage the application.

Architecture

A Kubernetes cluster is a set of nodes. Each node runs the kubelet agent to monitor pods, kube-proxy to maintain network rules, and a container runtime such as Docker, containerd, CRI-O, or any other Container Runtime Interface (CRI)-compliant runtime. Worker nodes run applications and master nodes manage the cluster.

Master Nodes

Master nodes, collectively called the control plane, administer the worker nodes. Each master node runs etcd for a highly available key-value store of cluster data, cloud-controller-manager to interact with any underlying cloud infrastructure, kube-apiserver to expose APIs for the control plane, kube-scheduler to assign pods to nodes, and kube-controller-manager to manage controllers (the last three may be called Master Services).

kubectl

kubectl is a command line interface to manage a Kubernetes cluster.

Cluster Context

kubectl may be configured with multiple clusters. The available contexts may be shown with the following command and the current context is denoted with *:

$ kubectl config get-contexts
CURRENT   NAME                            CLUSTER         AUTHINFO                NAMESPACE
          default/c103-:30595/IAM#email   c103-:30595     IAM#email/c103-:30595   testodo4
*         docker-desktop                  docker-desktop  docker-desktop

The API endpoints may be displayed with:

$ kubectl config view -o jsonpath='{"Cluster name\tServer\n"}{range .clusters[*]}{.name}{"\t"}{.cluster.server}{"\n"}{end}'
Cluster name      Server
c103-:30595     https://c103-.com:30595
docker-desktop  https://kubernetes.docker.internal:6443

Change Cluster Context

$ kubectl config use-context docker-desktop
Switched to context "docker-desktop".

Delete Cluster Context

$ kubectl config delete-context docker-desktop
deleted context docker-desktop from ~/.kube/config

etcd

etcd stores the current and desired states of the cluster, role-based access control (RBAC) rules, application environment information, and non-application user data.

High Availability

Run at least 3 master nodes for high availability and size each appropriately.
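
The reason for "at least 3" can be illustrated with quorum arithmetic. The following is a minimal sketch, assuming each master node runs one etcd member (as described above) and that writes need a majority of members; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Quorum arithmetic for a majority-based store such as etcd.

# quorum MEMBERS -> smallest majority of MEMBERS
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance MEMBERS -> members that may fail while a majority remains
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for members in 1 2 3 4 5; do
  echo "members=${members} quorum=$(quorum "${members}") tolerated_failures=$(fault_tolerance "${members}")"
done
```

With 1 or 2 members no failure is tolerated, 3 members tolerate one failure, and a 4th member adds no extra tolerance over 3, which is why odd member counts are commonly used.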

Objects

Kubernetes Objects represent the intended state of system resources. Controllers act through resources to try to achieve the desired state. The spec property is the desired state and the status property is the object's current status.

Labels

Objects may have metadata key/value pair labels and objects may be grouped by label(s) using selectors.
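
As an illustration of labels and selectors, the following sketch prints a Pod manifest carrying two labels. The function name, pod name, label keys and values, and image are assumptions chosen for the example:

```shell
#!/usr/bin/env bash
# pod_manifest NAME APP TIER -> prints a Pod manifest with two labels.
# All names, label keys, and the image are illustrative assumptions.
pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
  labels:
    app: $2
    tier: $3
spec:
  containers:
    - name: main
      image: fedora
      command: ["sleep", "3600"]
EOF
}

pod_manifest labeldemo1 myappname frontend

# With a cluster available (not run here):
#   pod_manifest labeldemo1 myappname frontend | kubectl apply -f -
#   kubectl get pods -l app=myappname,tier=frontend    # equality-based selector
#   kubectl get pods -l 'tier in (frontend,backend)'   # set-based selector
```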

Resources

Kubernetes Resources are API endpoints that store and control a collection of Kubernetes objects (e.g. pods). Common resources:

  • Deployment: Collections of pods.
  • ReplicaSet: Ensures that a specified number of replicas of a pod are running; generally Deployments (which manage ReplicaSets) are used directly instead.
  • StatefulSet: Like a Deployment but for stateful applications; pods keep a stable identity and storage.
  • Service: Provides internal network access to a logical set of pods (Deployments or StatefulSets).
  • Ingress: Provides external network access to a Service. In OpenShift, the analogous resource is called a Route.
  • ConfigMap: Non-confidential key-value configuration pairs.
  • Secret: Confidential key-value configuration pairs.
  • PersistentVolume: Persistent storage.
  • StorageClass: Groups storage by different classes of qualities-of-service and characteristics.

List resource kinds

$ kubectl api-resources
NAME     SHORTNAMES   APIGROUP    NAMESPACED   KIND
pods     po                       true         Pod
[...]

Namespace

A namespace is a logical isolation unit or "project" that groups objects/resources, policies to restrict users, constraints to enforce quotas through ResourceQuotas, and service accounts to automatically manage resources.

List namespaces

$ kubectl get namespaces --show-labels
NAME                   STATUS   AGE     LABELS
default                Active   8d      <none>
kube-node-lease        Active   8d      <none>
kube-public            Active   8d      <none>
kube-system            Active   8d      <none>
kubernetes-dashboard   Active   6d22h   <none>

Create namespace

kubectl create namespace testns1

Show current namespace (if any)

kubectl config view --minify | grep namespace

Change current namespace

kubectl config set-context --current --namespace=${NAMESPACE}

Reset to no namespace:

kubectl config set-context --current --namespace=

Nodes

List Nodes

$ kubectl get nodes -o wide
NAME             STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION     CONTAINER-RUNTIME
docker-desktop   Ready    master   8d    v1.19.7   192.168.65.4   <none>        Docker Desktop   5.10.25-linuxkit   docker://20.10.5

Controllers

Kubernetes Controllers run a reconciliation loop indefinitely while enabled and continuously attempt to control a set of resources to reach a desired state (e.g. minimum number of pods).
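
The reconciliation idea can be sketched as a toy loop. This is a simulation only, under the assumption that the "state" is a single replica count; a real controller watches the API server and does not exit:

```shell
#!/usr/bin/env bash
# Toy reconciliation loop: compare actual state to desired state and take
# one corrective step per iteration until they match. This sketch stops
# once converged, whereas a real controller keeps running.
reconcile() {
  local desired=$1 actual=$2 steps=0
  while [ "${actual}" -ne "${desired}" ]; do
    if [ "${actual}" -lt "${desired}" ]; then
      actual=$(( actual + 1 ))   # stand-in for "create a pod"
    else
      actual=$(( actual - 1 ))   # stand-in for "delete a pod"
    fi
    steps=$(( steps + 1 ))
  done
  echo "actual=${actual} steps=${steps}"
}

reconcile 3 0   # scale up from 0 to 3 replicas
reconcile 2 5   # scale down from 5 to 2 replicas
```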

Deployments

Deployments define a collection of one or more pods and configure container templates with a name, image, resources, storage volumes, and health checks, as well as a deployment strategy for how to create/recreate a deployment, and triggers for when to do so.
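
A sketch of what such a definition may look like follows. The function name is hypothetical; the container port 9080 matches the Liberty examples in this document, while the probe path, probe timings, and rolling update values are assumptions:

```shell
#!/usr/bin/env bash
# deployment_manifest NAME IMAGE REPLICAS -> prints a Deployment manifest
# with a pod template, a rolling update strategy, and a readiness probe.
# Port 9080, the probe path "/", and the strategy values are assumptions.
deployment_manifest() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $1
  labels:
    app: $1
spec:
  replicas: $3
  selector:
    matchLabels:
      app: $1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: $1
    spec:
      containers:
        - name: $1
          image: $2
          ports:
            - containerPort: 9080
          readinessProbe:
            httpGet:
              path: /
              port: 9080
            initialDelaySeconds: 10
            periodSeconds: 5
EOF
}

deployment_manifest liberty1 icr.io/appcafe/websphere-liberty 2

# With a cluster available (not run here):
#   deployment_manifest liberty1 icr.io/appcafe/websphere-liberty 2 | kubectl apply --namespace=testns1 -f -
```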

List Deployments

$ kubectl get deployments --all-namespaces
NAMESPACE              NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
kube-system            coredns                     2/2     2            2           8d
kube-system            metrics-server              1/1     1            1           6d22h
kubernetes-dashboard   dashboard-metrics-scraper   1/1     1            1           6d22h
kubernetes-dashboard   kubernetes-dashboard        1/1     1            1           6d22h

Create Deployment

kubectl create deployment ${DEPLOYMENT} --image=${FROM} --namespace=${NAMESPACE}

For example:

kubectl create deployment liberty1 --image=icr.io/appcafe/websphere-liberty --namespace=testns1

List pods for a Deployment

kubectl get pods -l=app=${DEPLOYMENT} --namespace=${NAMESPACE}

For example:

kubectl get pods -l=app=liberty1 --namespace=testns1

With custom columns:

$ kubectl get pods -l=app=liberty1 -o=custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace,STATUS:.status.phase,NODE:.spec.nodeName,STARTED:.status.startTime --namespace=testns1
NAME                       NAMESPACE   STATUS    NODE      STARTED
liberty1-585d8dfd6-2vb6c   testns1     Running   worker2   2022-04-25T18:34:58Z

Delete Deployment

kubectl delete deployment.apps/${DEPLOYMENT} --namespace=${NAMESPACE}

Scale Deployment

kubectl scale deployment ${DEPLOYMENT} --replicas=${PODS} --namespace=${NAMESPACE}

Print logs for all pods in a Deployment

kubectl logs "--selector=app=${DEPLOYMENT}" --prefix=true --all-containers=true --namespace=${NAMESPACE}

Pods

Create, run, and remote into a new pod

kubectl run -i --tty fedora --image=fedora -- sh

Operators

Kubernetes Operators are Kubernetes-native applications: controller pods that watch custom resources (CRs), each normally representing a logical application, and interact with the API server to automate actions. The schema of a custom resource is defined by a Custom Resource Definition (CRD).

OperatorHub is a public registry of operators.

Operator SDK is one way to build operators.

Operator logs

Find the operator's API resource:

$ kubectl api-resources | awk 'NR==1 || /containerdiagnostic/'
NAME                              SHORTNAMES   APIGROUP                       NAMESPACED   KIND
containerdiagnostics                           diagnostic.ibm.com             true         ContainerDiagnostic

Then find the pods for them:

$ kubectl get pods --all-namespaces | awk 'NR==1 || /containerdiag/'
NAMESPACE                      NAME                                                        READY   STATUS      RESTARTS   AGE
containerdiagoperator-system   containerdiagoperator-controller-manager-5976b5bb4c-2szb7   2/2     Running     0          19m

Print logs for the manager container:

$ kubectl logs containerdiagoperator-controller-manager-5976b5bb4c-2szb7 --namespace=containerdiagoperator-system --container=manager
[...]
2021-06-23T16:04:56.624Z    INFO    setup   starting manager

Operator Lifecycle Manager

The Operator Lifecycle Manager (OLM) may be used to install and manage operators in a Kubernetes cluster.

List operator catalogs

$ kubectl get catalogsource --all-namespaces
NAMESPACE               NAME                   DISPLAY                TYPE   PUBLISHER   AGE
openshift-marketplace   community-operators    Community Operators    grpc   Red Hat     69d
openshift-marketplace   certified-operators    Certified Operators    grpc   Red Hat     69d
openshift-marketplace   redhat-marketplace     Red Hat Marketplace    grpc   Red Hat     69d
openshift-marketplace   redhat-operators       Red Hat Operators      grpc   Red Hat     69d
openshift-marketplace   ibm-operator-catalog   IBM Operator Catalog   grpc   IBM         61d

List all operators

$ kubectl get packagemanifest --all-namespaces
NAMESPACE               NAME                               CATALOG                AGE
openshift-marketplace   ibm-spectrum-scale-csi-operator    Community Operators    69d
openshift-marketplace   syndesis                           Community Operators    69d
openshift-marketplace   openshift-nfd-operator             Community Operators    69d
[...]

CPU and Memory Resource Limits

A container may be configured with CPU and/or memory resource requests and limits. A request is the minimum amount of a resource that is required by (and reserved for) a container and is used to decide if a node has sufficient capacity to start a new container. A limit puts a cap on a container's usage of that resource. If there are sufficient available resources, a container may use more than the requested amount of resource, up to the limit. If only a limit is specified, the request is set equal to the limit.

Therefore, if requests are less than limits, the node may become overcommitted. For resources such as memory, this may lead to the Linux OOM Killer activating and killing processes, with Killed in application logs and kernel: Memory cgroup out of memory: Killed process in node logs (e.g. oc debug node/$NODE -t followed by chroot /host journalctl).
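
For illustration, the following sketch prints a Pod manifest whose container sets requests below limits. The function name, pod name, image, and the specific values are assumptions:

```shell
#!/usr/bin/env bash
# resources_pod_manifest NAME CPU_REQUEST CPU_LIMIT MEM_REQUEST MEM_LIMIT
# Prints a Pod manifest with resource requests and limits on its container.
# The pod name, image, and values used below are illustrative assumptions.
resources_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  containers:
    - name: main
      image: fedora
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: $2
          memory: $4
        limits:
          cpu: $3
          memory: $5
EOF
}

# Requests below limits: the container is guaranteed 250m/256Mi and may
# burst up to 500m/512Mi if the node has spare capacity.
resources_pod_manifest limitsdemo1 250m 500m 256Mi 512Mi

# With a cluster available (not run here):
#   resources_pod_manifest limitsdemo1 250m 500m 256Mi 512Mi | kubectl apply -f -
```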

CPU Resources

CPU resources are gauged in terms of a vCPU/core in cloud or a CPU hyperthread on bare metal. The m suffix means millicpu (or millicore), so 0.5 (or half) of one CPU is equivalent to 500m (or 500 millicpu), and CPU resources may be specified in either form (i.e. 0.5 or 500m) although the general recommendation is to use millicpu. CPU limits are evaluated every quota period per CPU and this defaults to 100ms.

For example, a CPU limit of 500m means that a container may use no more than half of 1 CPU in any 100ms period. Values larger than 1000m may be specified if there is more than one CPU. For details, review the Linux kernel CFS bandwidth control documentation.
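
The arithmetic behind this can be sketched as follows, assuming the default 100ms (100000 microsecond) quota period; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# cfs_quota_us MILLICPU [PERIOD_US]
# Prints the CPU time in microseconds that a limit of MILLICPU allows
# per quota period. PERIOD_US defaults to 100000 (100ms).
cfs_quota_us() {
  local millicpu=$1 period_us=${2:-100000}
  echo $(( millicpu * period_us / 1000 ))
}

cfs_quota_us 500    # 500m  -> 50000us (50ms) of CPU time per 100ms
cfs_quota_us 2500   # 2500m -> 250000us, spread over more than one CPU
```

Once a container has used its quota within a period, it is throttled until the next period begins.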

Many recommend using CPU limits. If containers exhaust node CPU, the kubelet process may become resource starved and cause the node to enter the NotReady state. The throttling metric counts the number of times the CPU limit is exceeded. However, there have been cases of throttling occurring even when the limit is not hit; these were generally fixed in Linux kernel versions 4.14.154, 4.19.84, and 5.3.9 and later. One solution is to increase CPU requests and limits, although this may reduce density on nodes. Some specify a CPU request without a limit. Review additional OpenShift guidance on overcommit.

Memory Resources

Memory resources are gauged in terms of bytes. The suffixes k, M, G, etc. may be used for multiples of 1000 (note the lowercase k), and the suffixes Ki, Mi, Gi, etc. may be used for multiples of 1024.
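
The difference between the two suffix families can be checked with a small converter. This is a sketch that handles only whole numbers and the suffixes listed in its case statement; the function name is hypothetical, and Kubernetes spells the 1000 multiple with a lowercase k:

```shell
#!/usr/bin/env bash
# to_bytes QUANTITY -> prints the number of bytes for quantities such as
# 1024, 64k, 128M, 128Mi, or 1Gi. Whole numbers only; other suffixes are
# rejected.
to_bytes() {
  local quantity=$1
  local number=${quantity%%[A-Za-z]*}   # leading digits
  local suffix=${quantity#"${number}"}  # remaining unit suffix, if any
  local factor
  case "${suffix}" in
    "") factor=1 ;;
    k)  factor=1000 ;;
    M)  factor=$(( 1000 ** 2 )) ;;
    G)  factor=$(( 1000 ** 3 )) ;;
    Ki) factor=1024 ;;
    Mi) factor=$(( 1024 ** 2 )) ;;
    Gi) factor=$(( 1024 ** 3 )) ;;
    *)  echo "unsupported suffix: ${suffix}" >&2; return 1 ;;
  esac
  echo $(( number * factor ))
}

to_bytes 128M    # 128000000
to_bytes 128Mi   # 134217728
```

The gap between 128M and 128Mi is about 6 MB, which matters when a limit is sized closely to an application's footprint.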

Events

View Latest Events

$ kubectl get events --all-namespaces
NAMESPACE              LAST SEEN   TYPE      REASON              OBJECT                                            MESSAGE
kube-system            7m8s        Normal    Scheduled           pod/metrics-server-6b5c979cf8-t8496               Successfully assigned kube-system/metrics-server-6b5c979cf8-t8496 to docker-desktop
kube-system            7m6s        Normal    Pulling             pod/metrics-server-6b5c979cf8-t8496               Pulling image "k8s.gcr.io/metrics-server/metrics-server:v0.4.3"
[...]

Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) scales the number of Pods in a replication controller, deployment, replica set or stateful set based on metrics such as CPU utilization.
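
The core scaling calculation can be sketched as the ceiling of currentReplicas * currentMetric / targetMetric. The sketch below uses integer metrics and ignores the tolerance band and the min/max replica bounds that the real autoscaler also applies; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
# Prints ceil(CURRENT_REPLICAS * CURRENT_METRIC / TARGET_METRIC) using
# integer arithmetic. Tolerance and min/max bounds are not modeled.
desired_replicas() {
  local current=$1 metric=$2 target=$3
  echo $(( (current * metric + target - 1) / target ))
}

desired_replicas 4 90 60   # 90% CPU against a 60% target: 4 -> 6 pods
desired_replicas 3 80 50   # ceil(4.8) = 5 pods
desired_replicas 4 30 60   # load halved: 4 -> 2 pods

# With a cluster and metrics available (not run here), an HPA may be created with:
#   kubectl autoscale deployment liberty1 --cpu-percent=60 --min=1 --max=10 --namespace=testns1
```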

Day X

Day 1 activities generally include installation and configuration activities.

Day 2 activities generally include scaling up and down, reconfiguration, updates, backups, failovers, restores, etc.

In general, operators are used to implement day 1 and day 2 activities.

Pod Affinity

Example ensuring that not all pods run on the same node:

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - myappname
              topologyKey: "kubernetes.io/hostname"

NodePorts

Set externalTrafficPolicy: Local on a Kubernetes Service of type NodePort so that each node forwards NodePort traffic only to pods running on that node; nodes without a matching pod do not serve the traffic. This also preserves the client source IP.
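
One way to apply this setting to an existing service is a patch. The sketch below only builds the patch document; the function name is hypothetical and the service name in the comment is taken from the examples in this document:

```shell
#!/usr/bin/env bash
# traffic_policy_patch POLICY -> prints a JSON patch body that sets
# spec.externalTrafficPolicy to POLICY (Local or Cluster).
traffic_policy_patch() {
  printf '{"spec":{"externalTrafficPolicy":"%s"}}\n' "$1"
}

traffic_policy_patch Local

# With a cluster available (not run here):
#   kubectl patch service liberty1 --namespace=testns1 -p "$(traffic_policy_patch Local)"
```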

Clustering

Jobs

A Job may be used to run one or more pods until a specified number have successfully completed. A CronJob is a Job on a repeating schedule. Note:

A Replication Controller manages Pods which are not expected to terminate (e.g. web servers), and a Job manages Pods that are expected to terminate (e.g. batch tasks).

List jobs

$ kubectl get jobs -o wide
NAME        COMPLETIONS   DURATION   AGE   CONTAINERS           IMAGES                     SELECTOR
myjobname   1/1           5s         34s   myjobcontainername   kgibm/containerdiagsmall   controller-uid=5078824a-fad1-4961-af97-62d387ef2fc7

Create job

printf '{"apiVersion": "batch/v1","kind": "Job", "metadata": {"name": "%s"}, "spec": {"template": {"spec": {"restartPolicy": "Never", "containers": [{"name": "%s", "image": "%s", "command": %s}]}}}}' myjobname myjobcontainername kgibm/containerdiagsmall '["ls", "-l"]' | kubectl create -f -

Describe job

$ kubectl describe job myjobname
[...]
Start Time:     Wed, 23 Jun 2021 08:20:59 -0700
Completed At:   Wed, 23 Jun 2021 08:21:04 -0700
Duration:       5s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
[...]
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  73s   job-controller  Created pod: myjobname-d9rr5
  Normal  Completed         68s   job-controller  Job completed
$ kubectl logs myjobname-d9rr5
total 60
lrwxrwxrwx   1 root root    7 Jan 26 06:05 bin -> usr/bin
[...]

DaemonSets

A DaemonSet may be used to run persistent pods on all or a subset of nodes. Note:

DaemonSets are similar to Deployments in that they both create Pods, and those Pods have processes which are not expected to terminate (e.g. web servers, storage servers). Use a Deployment for stateless services, like frontends, where scaling up and down the number of replicas and rolling out updates are more important than controlling exactly which host the Pod runs on. Use a DaemonSet when it is important that a copy of a Pod always run on all or certain hosts, and when it needs to start before other Pods.

Services

List Services

$ kubectl get services --all-namespaces
NAMESPACE              NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
default                kubernetes                  ClusterIP   10.96.0.1        <none>        443/TCP                  8d
kube-system            kube-dns                    ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   8d
kube-system            metrics-server              ClusterIP   10.102.139.243   <none>        443/TCP                  6d23h
kubernetes-dashboard   dashboard-metrics-scraper   ClusterIP   10.107.135.44    <none>        8000/TCP                 7d
kubernetes-dashboard   kubernetes-dashboard        ClusterIP   10.97.139.73     <none>        443/TCP                  7d

Create Service

By default, services are created with type ClusterIP, which is reachable only from inside the cluster.

kubectl expose deployment ${DEPLOYMENT} --port=${EXTERNALPORT} --target-port=${PODPORT} --namespace=${NAMESPACE}

For example:

kubectl expose deployment liberty1 --port=80 --target-port=9080 --namespace=testns1

To expose a service on a NodePort (i.e. a port on each node, allocated by default from the range 30000-32767):

kubectl expose deployment liberty1 --port=80 --target-port=9080 --type=NodePort --namespace=testns1

Then, access the service at the LoadBalancer Ingress host on port NodePort:

$ kubectl describe services liberty1 --namespace=testns1
Name:                     liberty1
Namespace:                testns1
Labels:                   app=liberty1
Annotations:              <none>
Selector:                 app=liberty1
Type:                     NodePort
IP:                       10.107.0.163
LoadBalancer Ingress:     localhost
Port:                     <unset>  80/TCP
TargetPort:               9080/TCP
NodePort:                 <unset>  30187/TCP
Endpoints:                10.1.0.36:9080,10.1.0.37:9080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

For example:

$ curl -I http://localhost:30187/
HTTP/1.1 200 OK
[...]

Delete Service

kubectl delete service/${DEPLOYMENT} --namespace=${NAMESPACE}

Ingresses

An Ingress exposes services outside of the cluster network. Before creating an ingress, you must create at least one Ingress Controller to manage the ingress. By default, no ingress controller is installed. A commonly used ingress controller which is supported by Kubernetes is the nginx ingress controller.

Create nginx Ingress controller

  1. kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.46.0/deploy/static/provider/cloud/deploy.yaml
  2. kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=120s

See https://kubernetes.github.io/ingress-nginx/deploy/

Create Ingress

printf '{"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"name":"%s","annotations":{"nginx.ingress.kubernetes.io/rewrite-target":"/"}},"spec":{"rules":[{"http":{"paths":[{"path":"%s","pathType":"Prefix","backend":{"service":{"name":"%s","port":{"number":80}}}}]}}]}}' "${INGRESS}" "${INGRESSPATH}" "${SERVICE}" | kubectl create -f - --namespace=${NAMESPACE}

For example:

printf '{"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"name":"%s","annotations":{"nginx.ingress.kubernetes.io/rewrite-target":"/"}},"spec":{"rules":[{"http":{"paths":[{"path":"%s","pathType":"Prefix","backend":{"service":{"name":"%s","port":{"number":80}}}}]}}]}}' "ingress1" "/" "liberty1" | kubectl create -f - --namespace=testns1

List Ingresses

$ kubectl get ingresses --all-namespaces
NAMESPACE   NAME       CLASS    HOSTS   ADDRESS     PORTS   AGE
testns1     ingress1   <none>   *       localhost   80      63s

Describe Ingress

$ kubectl describe ingress ${INGRESS} --namespace=${NAMESPACE}
Name:             ingress1
Namespace:        testns1
Address:          localhost
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host        Path  Backends
  ----        ----  --------
  *           
              /   liberty1:80 (10.1.0.44:9080,10.1.0.47:9080)
Annotations:  nginx.ingress.kubernetes.io/rewrite-target: /
Events:
  Type    Reason  Age                From                      Message
  ----    ------  ----               ----                      -------
  Normal  Sync    51s (x2 over 93s)  nginx-ingress-controller  Scheduled for sync

Delete Ingress

kubectl delete ingress/${INGRESS} --namespace=${NAMESPACE}

Authentication

Kubernetes authentication supports service accounts and normal users. Normal users are managed through external mechanisms rather than by Kubernetes itself:

It is assumed that a cluster-independent service manages normal users [...]

Kubernetes does not have objects which represent normal user accounts. Normal users cannot be added to a cluster through an API call.

[...] any user that presents a valid certificate signed by the cluster's certificate authority (CA) is considered authenticated.

[...] Kubernetes determines the username from the common name field in the 'subject' of the cert.

[...] client certificates can also indicate a user's group memberships using the certificate's organization fields. To include multiple group memberships for a user, include multiple organization fields in the certificate.

List Service Accounts

$ kubectl get serviceaccounts
NAME      SECRETS   AGE
default   1         136m

Retrieve Default Service Account Token

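The default service account token may be retrieved from its secret. (On Kubernetes 1.24 and later, token secrets are no longer created automatically for service accounts; kubectl create token default issues a short-lived token instead.)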

$ TOKEN=$(kubectl get secrets -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}" | base64 --decode)
$ echo ${TOKEN}

This may be then used in an API request. For example:

$ curl -X GET https://kubernetes.docker.internal:6443/api --header "Authorization: Bearer ${TOKEN}" --insecure
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
[...]

Role-Based Access Control

Role-Based Access Control (RBAC) implements authorization in Kubernetes. Roles are namespace-scoped and ClusterRoles are cluster-scoped. RoleBindings and ClusterRoleBindings grant the permissions of a Role or ClusterRole to users, groups, and/or service accounts, within a namespace or cluster-wide, respectively.
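
As an illustration, the following sketch prints a namespace-scoped Role that allows reading pods and a RoleBinding that grants it to a service account. The function name, the role and binding names (pod-reader, read-pods), and the choice of a service account as the subject are assumptions:

```shell
#!/usr/bin/env bash
# rbac_manifest NAMESPACE SERVICEACCOUNT
# Prints a Role permitting read access to pods and a RoleBinding granting
# that Role to the given service account. Names are illustrative.
rbac_manifest() {
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: $1
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: $1
subjects:
  - kind: ServiceAccount
    name: $2
    namespace: $1
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF
}

rbac_manifest testns1 default

# With a cluster available (not run here):
#   rbac_manifest testns1 default | kubectl apply -f -
#   kubectl auth can-i list pods --as=system:serviceaccount:testns1:default --namespace=testns1
```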

List Roles

$ kubectl get roles --all-namespaces              
NAMESPACE     NAME                                    CREATED AT
kube-public   system:controller:bootstrap-signer      2021-04-27T15:24:35Z
[...]

List Role Bindings

$ kubectl get rolebindings --all-namespaces
NAMESPACE     NAME                                    ROLE                                        AGE
kube-public   system:controller:bootstrap-signer      Role/system:controller:bootstrap-signer     138m
[...]

List Cluster Roles

$ kubectl get clusterroles
NAME                     CREATED AT
admin                    2021-04-27T15:24:34Z
cluster-admin            2021-04-27T15:24:34Z
edit                     2021-04-27T15:24:34Z
system:basic-user        2021-04-27T15:24:34Z
[...]

List Cluster Role Bindings

$ kubectl get clusterrolebindings
NAME                ROLE                              AGE
cluster-admin       ClusterRole/cluster-admin         135m
[...]

Monitoring

Show CPU and memory usage:

kubectl top pods --all-namespaces
kubectl top pods --containers --all-namespaces
kubectl top nodes

Tekton Pipelines

Tekton Pipelines describes CI/CD pipelines as code using Kubernetes custom resources. Terms:

  • Task: set of sequential steps
  • Pipeline: set of sequential tasks

Technologies such as OpenShift Pipelines, Jenkins, JenkinsX, etc. use Tekton to implement their CI/CD workflow on top of Kubernetes.

Appsody

Appsody was a way to create application stacks using predefined templates. It has been superseded by OpenShift do (odo).

Helm

Helm groups together YAML templates that define a logical application release and its required Kubernetes resources using helm charts.

Common commands

  • Show Helm CLI version: helm version
  • Show available options: helm show values .
  • Install a chart: helm install $NAME .
  • List installed charts: helm ls
  • Upgrade a chart: helm upgrade $NAME .
  • Rollback an upgrade: helm rollback $NAME 1

Kubernetes Dashboard

Kubernetes Dashboard is a simple web interface for Kubernetes. Example installation:

  1. kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml
  2. kubectl proxy
  3. Open http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
  4. Use a login token such as the default service account token
  5. Change the namespace at the top as needed and explore.

To delete the dashboard, use the same YAML as above: kubectl delete -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml

Kubernetes Metrics Server

Kubernetes Metrics Server provides basic container resource metrics for consumers such as Kubernetes Dashboard. Example installation:

  1. kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  2. For a development installation, allow insecure certificates: kubectl patch deployment metrics-server -n kube-system --type 'json' -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
  3. If using Kubernetes Dashboard, refresh the Pods view after a few minutes to see an overall CPU usage graph if it works.

To delete the metrics-server, use the same YAML as above: kubectl delete -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Knative

Knative helps deploy and manage serverless workloads.