What the blog?!

The story of how I almost got this blog totally wrong. Along the way you can learn about Kubernetes storage in general, ConfigMaps, StatefulSets and the NFS server provisioner.

When I looked for an engine for this blog I aimed for something lightweight and easy. A visceral hatred of WordPress is rooted deep in my soul, so I leaned towards Grav, a flat-file CMS written in PHP. I had used it before, it’s nice, and you can write posts in pure Markdown, which is a big plus. I thought it would fit the purpose. So I set up a new project, found a pretty simple theme, tweaked it a bit and deployed it to the cluster. So far, so good.

Grav, however, is a full-featured CMS, so you can edit content online via an admin UI. It was tempting to be able to quickly fix a typo without pushing the whole thing through CI. The posts are saved to the filesystem, though, and I think you can guess what the problem with that is. As soon as you edit something the app becomes stateful, and all the changes are lost when the pod restarts. Is it that bad? You may ask, and I’d answer ‘No’. That’s what persistent storage is for. Easy.

Persistent volumes, however, have some implications: most of them do not support the ReadWriteMany (aka RWX) access mode, where multiple pods can mount the volume with write access. ‘What does that matter to me?’ I thought. I’m good to go with one replica anyway. So I set up a PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: blog-pages
spec:
  storageClassName: scw-ssd-retain
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

and mounted it into the container:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: blog
  labels:
    app: blog
spec:
  selector:
    matchLabels:
      app: blog
  replicas: 1
  template:
    metadata:
      labels:
        app: blog
    spec:
      containers:
      - name: blog
        # ...
        volumeMounts:
        - name: pages
          mountPath: /app/user/pages
      volumes:
      - name: pages
        persistentVolumeClaim:
          claimName: blog-pages

Looks good to me. At first sight, at least. You might notice that only the pages folder (where the actual content is stored) is mounted from the volume, which means everything else (config, templates, style) is still baked in at image build time. I’m OK with that, except for one thing I should have realized right at the beginning: when I deploy a new image version, the workload restarts with quite some downtime. And that’s because the volume cannot be mounted in multiple pods at the same time. The Deployment update has to wait for the old pod to gracefully shut down and release the volume, so that the new one can start, mount the volume and become ready to serve requests. Of course it’s not the end of the world, but I didn’t want that.
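
This is the behaviour you effectively get with a ReadWriteOnce volume, and you can make it explicit in the Deployment. A minimal sketch (not part of my original manifest):

spec:
  # With a ReadWriteOnce volume the old pod has to release it before the
  # new one can attach it, so the update effectively behaves like Recreate.
  strategy:
    type: Recreate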

Bad ideas

#1 ConfigMap

Theoretically I could deploy a post as a ConfigMap, a native resource of Kubernetes. They are used to store information and provide it to containers as mounted files or environment variables. It would look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: first-post
  namespace: default
  labels:
    type: blog-post
data:
  first-post.md: >-
        This is an interesting post...

And I can use it like this in the deployment:

    # ...
    volumeMounts:
      - name: first-post
        mountPath: "/app/user/pages/first-post.md"
        subPath: "first-post.md"
  volumes:
    - name: first-post
      configMap:
        name: first-post
        items:
        - key: first-post.md
          path: first-post.md

There are reasons why this is a bad idea. One is that mounting a lot of them this way is less than practical; I would need a controller watching for newly labeled ConfigMaps and mounting them automatically. The second, and this is worse, is that ConfigMaps are stored in the central etcd, the brain of k8s. It is a really bad idea to overload it with lots of heavy content. This simply doesn’t scale well.

ConfigMap is great for exactly the purpose the name suggests: storing and providing configuration for containers. You probably have one or two per pod, but surely not dozens.

#2 StatefulSet

This isn’t even a bad idea, but a misconception on my part. A StatefulSet is a special kind of deployment where the pods and their persistent volumes are bound together, and this binding is maintained across re-schedules. It is achieved by stable identifiers on pods and volumes (like pod-0, vol-0 and pod-1, vol-1), so that each pod finds its own volume after a restart. The order of replicas is also taken into account: a rolling update replaces the pods one by one, from the highest ordinal down to pod-0. A StatefulSet is useful when your application instances store their own data individually and the application itself manages data synchronization or addressing between the nodes. Typically database clusters behave this way, whether you think of primary-follower setups or the sharding mechanism of a MongoDB cluster.
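
Just to illustrate the binding (a sketch, not something I deployed; the names and the mongo image are placeholders): each replica gets its own PersistentVolumeClaim stamped out from volumeClaimTemplates, so db-0 always re-attaches data-db-0 and db-1 re-attaches data-db-1:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: mongo # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi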

That’s truly amazing, but it has no relevance to my use-case.

#3 NFS server provisioner

Well, this isn’t inherently a bad idea. Actually it works pretty well. It’s just maybe a bit of an exaggeration in this case. The difficulty of ReadWriteMany volumes lies in the filesystems themselves, most of which are not prepared for concurrent writers. There are locking mechanisms in the kernel to coordinate processes writing to the same file, because really bad things could happen otherwise. Of course, if multiple kernel instances (pods running on different nodes) handled the same filesystem, those safeguards wouldn’t work. NFS, however, was built specifically for this: it allows exports to be mounted by multiple clients and handles locking on the server. What the NFS server provisioner does is the following:

  • creates a StatefulSet with a ReadWriteOnce persistent volume bound to it, acting as the NFS server
  • creates a specific StorageClass that supports ReadWriteMany access
  • the k8s dynamic provisioner watches for PersistentVolumeClaims using this StorageClass and provisions a PersistentVolume for them backed by an NFS export. All the needed wiring is done magically internally.

[Diagram: NFS Server Provisioner]
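
With the provisioner installed, a claim like the one below gets an NFS-backed volume that supports ReadWriteMany. The StorageClass name depends on how you install the provisioner (nfs is just an assumption here):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: blog-pages
spec:
  storageClassName: nfs # created by the NFS server provisioner; name is install-dependent
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi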

A very common use-case for this is when you want to horizontally scale WordPress. I did try it with my Grav blog and it certainly solved the downtime issue, because now the new pod could start and become ready before the old one was shut down. This is also known as a rolling update.
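
With the volume switched to ReadWriteMany, the Deployment can use an ordinary rolling update. Something along these lines (a sketch, not my exact manifest) keeps the old pod serving until the new one is ready:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0 # never take the only serving pod down before its replacement is ready
      maxSurge: 1       # start the new pod first, then retire the old one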

What did I end up with?

I was really close to releasing my first post to the world (i.e. tweeting it) with the above setup when I stopped for a minute. Is this the right solution to the problem? Or is it just more unnecessary overengineering, like this whole k8s project? This time I came to the conscious decision that I don’t need it. Although I learned a lot from this experiment, I let it go. I grabbed Hugo, an amazing static site generator written in Go, moved the post over from Grav and deployed the generated content using a simple NGINX container. Now I have to commit and push for even a minor change, but as Hugo has no dependencies my build time is very short and the site is blazing fast. It takes only about 2 minutes from git push until a change is deployed to the live site.
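
For completeness, the end result is boring in the best way (a sketch with a made-up image name): the Hugo output is baked into an NGINX image at build time, so the Deployment needs no volumes at all and rolls out without any storage gymnastics:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: blog
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blog
  template:
    metadata:
      labels:
        app: blog
    spec:
      containers:
      - name: blog
        image: registry.example.com/blog:latest # hypothetical image: NGINX serving the Hugo-generated site
        ports:
        - containerPort: 80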

Takeaways

  • Experiment! Discover ways to solve your problem, learn along the way, but don’t be afraid to switch concepts or implementations if they don’t feel quite right.
  • Storage in the container world is difficult. Avoid scattering state everywhere if possible; try to aggregate it in places built specifically for that purpose, like a database (yeah, I know, WordPress does exactly that, but…).
  • The NFS server provisioner is a damn good thing, I’m going to keep that. It is possible to deploy it manually, but it’s quite some work, so if you want to try it, I recommend using the Helm chart.

Discussion: https://twitter.com/iben12/status/1279802178660708354?s=20
