This blog is the first production workload on the Kubernetes cluster I’m building. Here I’m going to blog about the HOW, but before that I want to give some context by explaining the WHY.
Some history
As a freelancer I did total DevOps. Not because I was so enlightened back then, when DevOps wasn’t even a thing, but because it was a one-man show and I had to do everything myself. The workloads were some static and WordPress sites, later Laravel apps and my hobby projects. A bunch of them. I quickly realized that working with clients’ shared hosting solutions was awful in so many ways, so I figured I could operate a small piece of infrastructure and take things into my own hands. I had three little VMs (1CPU/1GB) on Aruba Cloud (they charged a ridiculous $1/VM/month at the time), and RunCloud proved to be a very useful platform to manage multiple LEMP stack instances in one place, mostly without the need to SSH onto the servers.
This went on for years, and pain points gradually emerged:
- If one of the servers went down (whether due to an incident or planned maintenance), the sites hosted on that machine went down as well.
- Deploying applications other than static sites or PHP was possible, but not easy.
- Balancing the load between servers was not trivial: moving a site to another server was not something you’d do frequently.
Of course, these weren’t Fortune 500 company websites, but I wanted to do better after feeling helpless while one of them was down for a few hours.
Warning: If you think that this whole project of mine is a fallacy and unnecessary at this scale, you're right. I'm not doing this because I'm convinced that this is the only way to run a bunch of simple, low-traffic workloads reliably. I'm doing this because I want to learn these technologies and experiment with them, to exploit their advantages and see their weaknesses.
Here comes the Whale
It was at DockerCon EU 2018 that I learned that people do use Docker in production and that a Swarm cluster would address most of the pains mentioned above. Except it doesn’t, but we’ll get back to that later.
It took me about a year to learn how to build a reliable Swarm cluster, dockerize all the workloads and move everything into the cluster. I switched to Scaleway’s cloud, since they have a very reasonable offering and a Docker Machine driver. However, there were several trade-offs along the way, because:
- Operating Swarm in HA requires at least 3 manager nodes that you usually don’t use for business workloads. That means you need at least 4 nodes to get started.
- Volume management in a multi-node cluster is (still) not really solved*.
I ended up with a one-node cluster on a 4CPU/8GB VM, and it works nicely to this day. Was that an improvement? Partly yes, as it seems to be more reliable and faster than the previous setup, though that may depend on many factors. Also, I can easily deploy any kind of application, be it a static NGINX site, WordPress/PHP or Node.js (see the sketch below). Otherwise it’s a bit worse, since now if that one VM goes down, all my sites go down with it.
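To give an idea of what deployment looks like on Swarm, here’s a minimal sketch of a stack file for a static NGINX site. The stack, service and image names are illustrative, not my actual config:

```yaml
# docker-stack.yml — a minimal sketch, not my real setup
version: "3.8"

services:
  site:
    image: nginx:alpine     # hypothetical; a real site would use a custom image
    ports:
      - "80:80"             # published on Swarm's ingress routing mesh
    deploy:
      replicas: 2           # Swarm reschedules replicas if their node fails
      restart_policy:
        condition: on-failure
```

A `docker stack deploy -c docker-stack.yml mysite` is all it takes to get it running; the catch is that on a one-node cluster both replicas live on the same VM, so the rescheduling buys you nothing.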
Operating a small multi-node Swarm cluster is no piece of cake and not really cost-effective, particularly if you need persistent storage, but…
You don’t say…, do you?
Yes, I do. I do say that in this very case Kubernetes can fix HA without much visible increase in complexity and cost. Some like to talk about k8s as if it would solve all your infrastructure problems immediately; I’d never say that. However, Kubernetes has certainly become the de facto container orchestrator in the cloud, so a lot more development goes into it and its ecosystem than into SwarmKit.
Let’s move to Kubernetes!
This isn’t my first k8s cluster, but it’s the first I wouldn’t call experimental. I’m building this for production, and the plan is to migrate everything from Swarm into this cluster in the near future. I can say I have some experience with how k8s works, and it carries a huge amount of complexity when you first face it. Coming from Docker, everything seems complicated and overwhelming. You need to give yourself time to get familiar with the concepts and get used to the manifests and the CLI of k8s. I won’t cover these here, because there are a lot of great resources like this where you can learn them. And of course, there are tools that can make your life easier, which I will definitely cover.
Rule #1: Don’t Do It Yourself!
No, really: the best advice I can give is to NOT operate your own k8s cluster, unless there are a hundred people with “system engineer” in their job description on your company’s payroll. Certainly, there are tools that help you deploy and configure k8s, but we don’t want to increase complexity too much, and operating a cluster is well beyond the amount of hassle most of us want to take on.
I recommend using a cloud provider’s k8s offering. Why? As you may remember, I mentioned that running Swarm in HA requires at least 3 manager nodes, and it’s no different with k8s. A Kubernetes cluster’s manager nodes run the control plane (the API server, etcd database, scheduler and other services) that is crucial for k8s to work. In a managed k8s cluster you don’t even see those nodes and services; they are completely managed by the provider on infrastructure hidden from the customer. There are plenty of managed k8s services to choose from (EKS on AWS, GKE on Google Cloud, AKS on Azure, IBM Cloud Kubernetes Service… you name it; my choice is Kapsule on Scaleway). The provider may or may not charge you for management; in Scaleway’s case it’s free, as it used to be on GKE. Now Google gives you only one cluster for free (from the second one they bill $0.10 per cluster per hour, similar to EKS), so watch out for those costs when comparing offers. Since management is handled for you, all you need beyond that is a couple of worker nodes, which are provisioned automatically from a user-configured node pool with auto-update, auto-healing or even auto-scaling.
So what are the benefits of a cloud provider’s k8s offering?
- deployment, updates and overall management of the cluster are handled for you, redundantly and with an SLA
- automatic node provisioning
- usually the provider also gives you other automatically provisioned resources, like block-storage-backed volumes via a custom storage class and load balancers to expose your workloads (see the sketch after this list)
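To make that last point concrete, here’s a minimal sketch of both: a PersistentVolumeClaim the provider fulfills with a block-storage volume, and a Service of type LoadBalancer that makes it spin up a cloud load balancer. The storage class name is provider-specific (I’m using Scaleway’s `scw-bssd` here; check your provider’s documentation), and the names and labels are hypothetical:

```yaml
# A volume backed by the provider's block storage,
# provisioned automatically via the custom storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: blog-data              # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: scw-bssd   # provider-specific storage class
  resources:
    requests:
      storage: 10Gi
---
# A LoadBalancer Service triggers provisioning of a cloud
# load balancer that routes traffic to the matching pods.
apiVersion: v1
kind: Service
metadata:
  name: blog                   # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: blog                  # hypothetical pod label
  ports:
    - port: 80
      targetPort: 8080
```

You `kubectl apply` these like any other manifest, and the provider does the heavy lifting behind the scenes.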
If I have created the false impression that you’re just a few clicks away from deploying your first applications on a managed k8s cluster, I have to quickly clarify that that’s not the case. You will probably need some additional tools to run applications reliably, and I’m going to get to those in the next post, when we start building the cluster.
* CSI (Container Storage Interface) support has long been promised for Swarm, and its new owner, Mirantis, confirmed their commitment to it, but we’re yet to see it shipped. There are solutions from NFS to GlusterFS, Ceph and Storidge, but these either require beefier nodes (with at least 3 distinct physical disks, for instance) or are ops-heavy to set up and manage. Or both. ← Back to where I was
Discussion: https://twitter.com/iben12/status/1278593614638190592?s=20