👋 I am Aurimas. I write the SwirlAI Newsletter with the goal of presenting complicated Data-related concepts in a simple and easy-to-digest way. My mission is to help You UpSkill and keep You updated on the latest news in Data Engineering, MLOps, Machine Learning and the overall Data space.
This is a 🔒 Paid Subscriber 🔒 only issue. If you want to read the full article, consider upgrading to a paid subscription.
This is the first part of my Guide to using Kubernetes.
I was introduced to Kubernetes (K8s) around 6 years ago while working as an ML Engineer. At that point the system was already popular and well known among technology enthusiasts, and it was used in companies with strong engineering functions that needed to scale applications seamlessly. From the moment I was introduced to the system I became obsessed with it and what it had to offer. As time went on I had touch points with K8s in most of the companies I’ve worked for, and even led a Cloud Native Transformation in one of them. When the Cloud Native Computing Foundation came out with their Kubernetes Certifications I naturally jumped on board and am currently a proud holder of all 3 of them - Certified Kubernetes Application Developer, Certified Kubernetes Administrator and Certified Kubernetes Security Specialist, better known by their acronyms - CKAD, CKA and CKS.
So why do I believe in Kubernetes? What problems does it solve? What do you need to know about it in your day-to-day life as a Data Professional? I will try to answer these questions in this multi-part series on Kubernetes.
On top of learning the theory behind K8s, we will also use it in most of the hands-on project implementations in this Newsletter. The first one is coming out next week, where I will cover the first piece of The SwirlAI Data Engineering Project Master Template - The Collector.
This episode of the Newsletter will cover most of what we will need for our future project deployment and that is why I wanted to roll it out first, before we go into any hands-on.
A thought to spark your excitement about the series: we will be deploying literally everything in our hands-on tutorials on Kubernetes, and to make it even more exciting - we will eventually do that with a single click. Plus, you will be able to bring all of that to any major cloud vendor seamlessly. I would also like you to be able to pass at least the CKAD exam after finishing this series and doing some hands-on practice. As we progress with the content I will give you tips on what to keep in mind when preparing for the exams if you choose to take them.
In the first part we cover:
Why Kubernetes for Data Engineering and Machine Learning.
General Kubernetes Resources for application configuration:
Namespaces.
ConfigMaps.
Secrets.
Deploying Applications:
Pods.
Deployments.
Services.
This only scratches the surface of Kubernetes, but you will already be able to craft useful applications just by understanding how to use these concepts.
Why Kubernetes for Data Engineering and Machine Learning.
First, let’s look into the Kubernetes system from a bird's eye view. What is Kubernetes (K8s)?
It is a container orchestrator that performs the scheduling, running and recovery of your containerised applications in a horizontally scalable and self-healing way. It is important to note that while I will mostly be using Docker icons to represent containers in the diagrams, K8s can orchestrate almost any type of container out there.
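To make "self-healing" a bit more tangible, here is a minimal toy sketch of the idea: compare the desired number of running copies of an application with what is actually running, and act on the difference. `ToyCluster`, its fields and the `app-N` names are all hypothetical and invented for illustration; this is not how Kubernetes is implemented, only the shape of the loop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster; not a Kubernetes API."""
    desired_replicas: int
    running: List[str] = field(default_factory=list)
    _counter: int = 0  # used only to generate unique names

    def reconcile(self) -> List[str]:
        """Compare desired vs. actual state and return the actions taken."""
        actions = []
        # Too few copies running: start replacements.
        while len(self.running) < self.desired_replicas:
            self._counter += 1
            name = f"app-{self._counter}"
            self.running.append(name)
            actions.append(f"start {name}")
        # Too many copies running: stop the extras.
        while len(self.running) > self.desired_replicas:
            name = self.running.pop()
            actions.append(f"stop {name}")
        return actions


cluster = ToyCluster(desired_replicas=3)
print(cluster.reconcile())       # starts three copies
cluster.running.remove("app-2")  # simulate a crashed container
print(cluster.reconcile())       # starts one replacement
```

The point of the sketch is that you never tell the system "restart app-2". You only state how many copies you want, and the loop works out what to do.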
Kubernetes architecture consists of two main logical groups:
Control plane - this is where the K8s system processes live that are responsible for scheduling the workloads you define and keeping the system healthy. The control plane is a big component of a Kubernetes cluster, but as an application developer you will almost never be exposed to it directly. Cloud hosted Kubernetes services also manage the Control Plane for you. We will look into the Control Plane in detail in one of the later episodes of the Kubernetes series.
Worker nodes - this is where containers are scheduled and run. As an application developer you will mostly be exposed to these nodes. The relevant concepts are: resources available on the nodes, colocation of different applications on the same node, persistent storage needed for your applications, etc.
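To illustrate why "resources available on the nodes" and colocation matter, here is a toy first-fit placement sketch. It assumes a single resource (CPU in millicores) and made-up node and pod names. The real Kubernetes scheduler filters and scores nodes on many more criteria, so treat this as an illustration of the constraint, not of the actual algorithm.

```python
from typing import Dict, Optional


def first_fit(pod_cpu: Dict[str, int], node_cpu: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Place each pod on the first node with enough free CPU (millicores)."""
    free = dict(node_cpu)  # remaining capacity per node
    placement: Dict[str, Optional[str]] = {}
    for pod, request in pod_cpu.items():
        placement[pod] = None  # stays None if no node can host the pod
        for node, available in free.items():
            if request <= available:
                free[node] = available - request
                placement[pod] = node
                break
    return placement


# Hypothetical nodes and workloads, for illustration only.
nodes = {"node-a": 1000, "node-b": 2000}
pods = {"api": 700, "worker": 500, "batch": 1500, "big-job": 4000}
print(first_fit(pods, nodes))
```

In this example `big-job` cannot be placed anywhere, which is roughly what you see in a real cluster when a workload asks for more than any node has free: it waits until capacity appears.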
How does Kubernetes help you?
You can have thousands of Nodes (usually you only need tens of them) in your K8s cluster, and each of them can host multiple Pods (groups of one or more containers). Nodes can be added to or removed from the cluster as needed. This enables horizontal scalability: when you need more capacity, you add more nodes rather than bigger ones.
Kubernetes provides an easy to use and understand declarative interface to deploy applications. Your application deployment definition can be described in YAML and submitted to the cluster, and the system will continuously work to keep the actual state of the application in line with the desired state you described. This is extremely important: by bringing the Kubernetes language to your organisation you also standardise it, and by doing so you unify how your developers communicate and explain architectures.
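As a rough sketch of what such a declarative definition looks like, the snippet below builds a Deployment definition as a plain Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The helper `deployment_manifest`, the application name `collector` and the image `example.com/collector:0.1.0` are placeholders I made up for illustration.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a minimal apps/v1 Deployment definition as a dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # desired number of Pods
            # The selector must match the labels on the Pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Placeholder name and image, not a real application.
manifest = deployment_manifest("collector", "example.com/collector:0.1.0", replicas=3)
print(json.dumps(manifest, indent=2))
```

If you save the output to a file such as `deployment.json`, you could submit it with `kubectl apply -f deployment.json`. Notice that nothing in the definition says how to start or restart anything. It only describes the state you want.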
Users are empowered to create and own their application architecture within boundaries pre-defined by Cluster Administrators. Administrators can ensure the isolation of a specific user or team, and they can define resource quotas and boundaries for where the applications deployed by a specific team can run.
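For a feel of what those administrator-defined boundaries can look like, here is a small sketch that builds a Namespace for a team together with a ResourceQuota limiting what the team can request inside it. The helper `team_boundaries`, the team name and the quota values are assumptions chosen purely for illustration.

```python
import json


def team_boundaries(team: str, cpu: str, memory: str, max_pods: int) -> list:
    """Build a Namespace and a ResourceQuota scoped to it for one team."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{team}-quota", "namespace": team},
        "spec": {
            "hard": {
                "requests.cpu": cpu,        # total CPU the team may request
                "requests.memory": memory,  # total memory the team may request
                "pods": str(max_pods),      # maximum number of Pods
            }
        },
    }
    return [namespace, quota]


# Hypothetical team and limits.
for obj in team_boundaries("data-team", cpu="4", memory="8Gi", max_pods=20):
    print(json.dumps(obj, indent=2))
```

This sketch covers only the quota side of the story. Restricting which nodes a team's applications can run on uses other mechanisms that are not shown here.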
So what added value can you expect out of this as a Data Engineer, ML or MLOps Engineer?