SAI #11: 5 Books for a Data Engineer of 2023, Central Role of The Model Registry and more...
5 Books for a Data Engineer of 2023, Central Role of The Model Registry, ACID Properties in DBMS.
👋 This is Aurimas. I write the weekly SAI Newsletter where my goal is to present complicated Data related concepts in a simple and easy to digest way. The goal is to help You UpSkill in Data Engineering, MLOps, Machine Learning and Data Science areas.
In this episode we cover:
5 Books for a Data Engineer of 2023.
Central Role of The Model Registry.
ACID Properties in DBMS.
5 Books for a Data Engineer of 2023
If I could choose only 5 books to read in 2023 as an aspiring Data Engineer, these would be the ones, in this specific order:
1️⃣ “Fundamentals of Data Engineering” - A book that I wish I had 5 years ago. After reading it you will understand the entire Data Engineering workflow. It will prepare you for further deep dives.
2️⃣ “Accelerate” - Data Engineers should follow the same practices that Software Engineers do and more. After reading this book you will understand DevOps practices inside and out.
3️⃣ “Designing Data-Intensive Applications” - Delve deeper into Data Engineering Fundamentals. After reading the book you will understand Storage Formats, Distributed Technologies, Distributed Consensus algorithms and more.
4️⃣ “Team Topologies” - Sometimes you might get confused about why a certain communication pattern is in place in the company you work for. After reading this book you will learn the Team Topologies model of organizational structure for fast flow. It will help you navigate the organizational dynamics and understand your role in the value chain.
5️⃣ “Data Mesh” - Data Mesh has become an extremely popular buzzword in recent years. After reading this book you will understand what the author who coined the term intended by it. Don’t be the one to throw around the term without understanding its meaning deeply.
[NOTE]: All of the books above cover Fundamental concepts. Even if you read all of them and decide that Data Engineering is not for you, you will be able to reuse the knowledge in any other Tech Role.
[ADDITIONAL NOTE]: I did not include any Fundamental Classics in the list as these can be picked up after you have already established yourself in the role.
Central Role of The Model Registry
Why is the Model Registry such an important element of the MLOps Stack?
We have already looked into the procedure for different types of ML Model deployments.
Let’s review the model training steps:
𝟭: Version Control: Machine Learning Training Pipeline is defined in code. Once merged to the main branch, it is built and triggered.
𝟮: Feature Preprocessing: Features are retrieved from the Feature Store, validated and passed to the next stage. Any feature related metadata that is tightly coupled to the Model being trained is saved to the Experiment Tracking System.
𝟯: Model Training: The Model is trained and validated on Preprocessed Data. Model related metadata is saved to the Experiment Tracking System.
𝟰: If Model Validation passes all checks - the Model Artifact is passed to the Model Registry, where it is stored.
✅ This is it - regardless of what deployment type will follow, the model is stored in The Model Registry. The Model Registry is what glues Training and Deployment Pipelines together, and this is where the handover of the Model Artifact happens.
𝟱: The same model can be packaged as a container for different deployment types by implementing the respective interface, e.g.:
👉 Flink application for Stream Processing Deployment.
👉 gRPC API for Request-Response.
👉 Plain model pointing to the Batch Serving Feature Store API for Batch.
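To make the handover more concrete, here is a minimal sketch of the flow above in plain Python. Everything in it is an assumption made for illustration: `InMemoryModelRegistry`, `train_and_register`, `serve_request` and `serve_batch` are hypothetical names, the "model" is just a function that predicts the mean label, and a real setup would use an actual Model Registry and Experiment Tracking System instead of a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Stand-in for a serialized Model Artifact: a callable from features to a prediction.
Model = Callable[[List[float]], float]


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    artifact: Model
    metrics: Dict[str, float]


class InMemoryModelRegistry:
    """Hypothetical registry that keeps versioned artifacts per model name."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, artifact: Model, metrics: Dict[str, float]) -> ModelVersion:
        versions = self._versions.setdefault(name, [])
        model_version = ModelVersion(name, len(versions) + 1, artifact, metrics)
        versions.append(model_version)
        return model_version

    def latest(self, name: str) -> ModelVersion:
        return self._versions[name][-1]


def train_and_register(
    registry: InMemoryModelRegistry,
    rows: List[List[float]],
    labels: List[float],
    max_mae: float = 1.0,
) -> Optional[ModelVersion]:
    """Steps 3 and 4: train, validate, and hand over only if validation passes."""
    mean_label = sum(labels) / len(labels)

    def model(row: List[float]) -> float:
        # Toy model (an assumption for this sketch): always predicts the mean label.
        return mean_label

    mae = sum(abs(model(r) - y) for r, y in zip(rows, labels)) / len(labels)
    if mae > max_mae:
        return None  # validation failed, nothing reaches the registry
    return registry.register("demo-model", model, {"mae": mae})


def serve_request(registry: InMemoryModelRegistry, name: str, row: List[float]) -> float:
    """Step 5, Request-Response flavour: one prediction per call."""
    return registry.latest(name).artifact(row)


def serve_batch(registry: InMemoryModelRegistry, name: str, rows: List[List[float]]) -> List[float]:
    """Step 5, Batch flavour: the same artifact behind a different interface."""
    model = registry.latest(name).artifact
    return [model(row) for row in rows]


if __name__ == "__main__":
    registry = InMemoryModelRegistry()
    registered = train_and_register(registry, [[1.0], [2.0], [3.0]], [2.0, 2.0, 2.0])
    print(registered.version, registered.metrics)
    print(serve_request(registry, "demo-model", [5.0]))
    print(serve_batch(registry, "demo-model", [[1.0], [2.0]]))
```

Note that both serving functions only talk to the registry and never to the training code. That is the point of the handover: the Training Pipeline writes to the registry, the Deployment Pipelines read from it.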