
Kubernetes-101: Volumes, part 3


The Container Storage Interface (CSI) is a specification implemented by Kubernetes: a standard for exposing arbitrary file and block storage systems to containerized workloads in Kubernetes (or similar container orchestrators).

There are a large number of CSI-drivers available for various storage systems[1]. To make this article actionable I will concentrate on a single CSI-driver and go through a full example of using it. How to install a CSI-driver into your cluster, and how to use it, differs between clusters and drivers, so it is important to read the documentation for the CSI-driver you want to use.

In the rest of this article I will go beyond the local Minikube cluster I have been using so far in this series. Instead I will set up an Elastic Kubernetes Service (EKS) cluster on Amazon Web Services (AWS)[2]. A real cluster! We are finally getting somewhere!

Working with a CSI-driver
#

The general workflow when working with a CSI-driver is:

  1. Prepare your cluster
  2. Install the CSI-driver into your cluster
  3. Create PersistentVolumes
  4. Create PersistentVolumeClaims

What exactly the first two steps in this list include depends on the CSI-driver you select. In the example I will show, the first two steps are very easy and we will hardly notice them.

Create an AWS EKS Kubernetes cluster
#


Warning: if you follow the steps outlined below you will be charged a small amount of money! How much depends on how long you keep the cluster running. You will pay both for the EKS-cluster with its associated EC2-instances, and for the EBS-volume that will be the backing media for the PersistentVolume we create.


This article is not a tutorial on how to work with AWS. To follow along with the steps in this article you must have an AWS account. You will also need to install and configure the AWS CLI. The required steps are documented in the official AWS documentation: installation and configuration.

Install prerequisites
#

AWS, together with Weaveworks, has created a tool called eksctl that simplifies the creation of Kubernetes clusters (EKS) on AWS. This tool can be installed (on macOS) with Homebrew:

$ brew tap weaveworks/tap
$ brew install weaveworks/tap/eksctl

The first command (brew tap weaveworks/tap) adds the Weaveworks Brew repository. The second command (brew install weaveworks/tap/eksctl) installs the eksctl CLI tool from that repository. In the following sections I will use eksctl for various tasks; the full documentation for this tool can be found at eksctl.io.
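
To verify that the installation succeeded you can ask eksctl for its version (any recent version should work for the steps in this article):

$ eksctl version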

Creating the cluster
#

eksctl allows me to declaratively define what type of Kubernetes cluster I want to create using a custom YAML schema that looks similar to a regular Kubernetes manifest. In eksctl terms this file is called a config file. This is the config file for my Kubernetes cluster:

# cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster
  region: eu-north-1

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: ebs-csi-controller-sa
        namespace: kube-system
      wellKnownPolicies:
        ebsCSIController: true

nodeGroups:
  - name: nodegroup-1
    instanceType: m5.large
    desiredCapacity: 3
    iam:
      withAddonPolicies:
        ebs: true

In cluster.yaml I define settings for my cluster, named cluster (not very original). I say that I want my cluster to be created in the Swedish region (eu-north-1). I add a single node group to my cluster. A node group is a collection of nodes (virtual machines) with a given specification. I configure my cluster to have the necessary Service Accounts[3] and IAM-policies needed to work with the EBS CSI-driver. The details of this are beyond the scope of this article.

I create the cluster from my cluster config file with the following command:

$ eksctl create cluster -f cluster.yaml

Creating an EKS-cluster is time-consuming, so you will need to wait 15-20 minutes for this operation to complete.
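
While you wait you can follow the progress in the AWS CloudFormation console (eksctl creates the cluster through CloudFormation stacks), or ask eksctl to list the clusters it knows about in the region; the cluster shows up once creation has started:

$ eksctl get cluster --region eu-north-1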

Authenticate to the cluster
#

Now I have a Kubernetes cluster in the form of EKS.

To be able to communicate with my EKS-cluster I need to configure a kubeconfig file. We have not discussed kubeconfig files so far in this Kubernetes-101 series, and we will not really do so in this article either.

For now, just know that to be able to communicate with a given cluster we need some form of credentials. These credentials are stored in files referred to as kubeconfig files; we will go through them in more detail in a future article.

As it turns out, eksctl automatically sets up a kubeconfig for us in /Users/<username>/.kube/config (on macOS) and activates it, so we can immediately start working with our cluster.
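
A quick way to confirm that the kubeconfig is in place and pointing at the new cluster is to ask kubectl which context is active, and then list the nodes (you should see the three m5.large nodes from the node group):

$ kubectl config current-context
$ kubectl get nodes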

Installing the CSI-driver
#

After creating my cluster with eksctl I have completed the first step of my four-step guide to CSI-drivers.

I need to install my CSI-driver into my cluster. There are instructions available at the EBS CSI-driver GitHub page. The essential step in the instructions is that I should run kubectl apply with the manifests that they provide:

$ kubectl apply \
    -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.14"

To make sure that the CSI-driver has been installed I list the running Pods in the kube-system Namespace:

$ kubectl get pods -n kube-system

NAME                                 READY   STATUS    RESTARTS   AGE
aws-node-675g8                       1/1     Running   0          20m
aws-node-f5dg8                       1/1     Running   0          20m
aws-node-k46g7                       1/1     Running   0          20m
coredns-d5b9bfc4-8dfzb               1/1     Running   0          32m
coredns-d5b9bfc4-bjqcj               1/1     Running   0          32m
ebs-csi-controller-988ff97c9-lwlhf   6/6     Running   0          1m
ebs-csi-controller-988ff97c9-n72mp   6/6     Running   0          1m
ebs-csi-node-bsvl9                   3/3     Running   0          1m
ebs-csi-node-sqlvc                   3/3     Running   0          1m
ebs-csi-node-ssqtf                   3/3     Running   0          1m
kube-proxy-2n4rq                     1/1     Running   0          20m
kube-proxy-jmxmd                     1/1     Running   0          20m
kube-proxy-klhvf                     1/1     Running   0          20m

I can see two ebs-csi-controller-... Pods and three ebs-csi-node-... Pods. That completes step two of my four-step guide.
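
The driver also registers itself as a CSIDriver object in the cluster. As an extra sanity check you can list these objects; assuming the release you installed registers one (recent releases do), ebs.csi.aws.com should appear in the output:

$ kubectl get csidrivers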

Using the CSI-driver
#

Now I want to use the CSI-driver.

In the previous article on PersistentVolumes and PersistentVolumeClaims we learned that, as Kubernetes administrators, our first step is to create a PersistentVolume that our Kubernetes application developers can later claim through a PersistentVolumeClaim. In this section I will complete steps three and four of the four-step guide to CSI-drivers!

To be able to use the CSI-driver with an EBS-volume I must first create the EBS-volume itself. For this I use the AWS CLI aws ec2 create-volume command:

$ aws ec2 create-volume \
    --availability-zone eu-north-1a \
    --volume-type gp2 \
    --size 100 \
    --query VolumeId \
    --output text

vol-097ab0b089cd15bdf
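
If you want to double-check the volume before using it, you can query its state with the AWS CLI; a volume that is not yet attached to anything should report available:

$ aws ec2 describe-volumes \
    --volume-ids vol-097ab0b089cd15bdf \
    --query 'Volumes[0].State' \
    --output text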

I have created a general purpose (gp2) EBS-volume of size 100 gibibytes in the eu-north-1a availability zone of the Stockholm (eu-north-1) region. Next I can use the EBS-volume to create a PersistentVolume in my EKS-cluster. Note that the PersistentVolume below only declares 5Gi of capacity; for a statically provisioned volume the declared capacity does not have to match the size of the backing media. The manifest looks like this:

# pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 5Gi
  csi:
    driver: ebs.csi.aws.com
    fsType: ext4
    # provide the VolumeId for the EBS-volume
    volumeHandle: vol-097ab0b089cd15bdf
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - eu-north-1a

An EBS-volume is located in a single availability-zone[4], and any workload that wishes to use it must be placed on a node in the same availability-zone. In a production environment you would create a lot more than a single EBS-volume, and you would place them in different availability-zones[5]. In the manifest above I specify in .spec.nodeAffinity that this PersistentVolume can only be used by nodes in the eu-north-1a availability zone, because this is where my EBS-volume is located.
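
You can see which availability zone each node ended up in by printing the standard zone label that EKS sets on its nodes:

$ kubectl get nodes -L topology.kubernetes.io/zone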

I create my PersistentVolume using kubectl apply:

$ kubectl apply -f pv.yaml

persistentvolume/ebs-pv created

To make sure my PersistentVolume was created I run kubectl get pv:

$ kubectl get pv

NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
ebs-pv    5Gi        RWO            Retain           Available                                   30s

I will use this PersistentVolume from a simple application. I create a composite manifest for a Pod and a PersistentVolumeClaim. The PersistentVolumeClaim will ask for 5Gi storage. The manifest for this application looks like this:

# application.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  storageClassName: ""
  volumeName: ebs-pv # match the name of the PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args:
        ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: ebs-claim

In .spec.containers[*].args I have created an infinite loop that appends the current date and time to a file with the path /data/out.txt and then sleeps for five seconds. The EBS-volume is mounted at the /data path.

I create my application using kubectl apply:

$ kubectl apply -f application.yaml

persistentvolumeclaim/ebs-claim created
pod/app created

I verify that my Pod starts up using kubectl get pods:

$ kubectl get pods

NAME   READY   STATUS    RESTARTS   AGE
app    1/1     Running   0          1m12s
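
Since the PersistentVolume is restricted to eu-north-1a, it is worth confirming that the scheduler placed the Pod on a node in that zone; the wide output format includes the node name:

$ kubectl get pod app -o wide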

I can once again check what my PersistentVolume looks like, and this time I can see that it has been claimed:

$ kubectl get persistentvolumes

NAME     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
ebs-pv   5Gi        RWO            Retain           Bound    default/ebs-claim                          3m48s
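
The claim side shows the same binding:

$ kubectl get persistentvolumeclaims

A Bound status on both sides means the one-to-one mapping between my PersistentVolumeClaim and my PersistentVolume is in place.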

I can also verify that I have data in my Volume by watching how new data is appended to my output file:

$ kubectl exec -it app -- tail -f /data/out.txt

Thu Dec 29 06:33:09 UTC 2022
Thu Dec 29 06:33:14 UTC 2022
Thu Dec 29 06:33:19 UTC 2022
Thu Dec 29 06:33:24 UTC 2022
Thu Dec 29 06:33:29 UTC 2022
Thu Dec 29 06:33:34 UTC 2022
Thu Dec 29 06:33:39 UTC 2022
Thu Dec 29 06:33:44 UTC 2022
Thu Dec 29 06:33:49 UTC 2022
Thu Dec 29 06:33:54 UTC 2022
...
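
As a final test of persistence you can delete the Pod and recreate it; since the data lives on the EBS-volume, and not inside the container, the timestamps written by the first Pod are still there (give the new Pod a moment to reach Running first):

$ kubectl delete pod app
$ kubectl apply -f application.yaml
$ kubectl exec app -- head /data/out.txt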

That concludes this exercise! It was a lot of work to get to this point, but once we are here it is easy to use the EBS CSI-driver for all our Volume needs.

Deleting my cluster
#

Now that I am done with my test I can remove my cluster and all the workloads that are currently running on it using eksctl delete cluster:

$ eksctl delete cluster -f cluster.yaml

This process takes a few minutes.

I should also remove the EBS-volume I created; for this I use the AWS CLI:

$ aws ec2 delete-volume --volume-id vol-097ab0b089cd15bdf

Summary
#

What a ride! In this article we looked at CSI-drivers. CSI-drivers are a way for Kubernetes to work with a large number of storage backends using a common interface. I specifically showed you a CSI-driver for EBS-volumes in AWS. To this end I set up an EKS-cluster in AWS with a tool called eksctl.

In the next article in this series we will take a look at Jobs and CronJobs. These are workload resources that create Pods to perform a single task once, or to repeat a task at a fixed interval.


  1. See the list available at https://kubernetes-csi.github.io/docs/drivers.html; note that this is not an exhaustive list!

  2. I won't be able to cover how EKS works in this article, but if you are interested in learning more you could visit www.eksworkshop.com.

  3. We have not covered Service Accounts yet but we will do so in a future article.

  4. An AWS availability zone is a data-center located in an AWS region. A region is usually made up of three or more availability zones that are located close to each other, but not close enough that a natural disaster would take out more than one data-center (availability zone).

  5. You could also utilize dynamic provisioning of PersistentVolumes and have the backing media automatically created for you, but we will not do that in this article.
