SAI #14: Data Latency in ML Systems.
Data Latency in ML Systems, MLOps Maturity Level 2, DBMS Architecture.
👋 This is Aurimas. I write the weekly SAI Newsletter, where my goal is to present complicated Data-related concepts in a simple and easy-to-digest way and help you upskill in the Data Engineering, MLOps, Machine Learning and Data Science areas.
This week we cover the following topics:
How do we define Data Latency in ML Systems when serving Online predictions?
What is MLOps Maturity Level 2 and how do we move there from Level 1?
How are DBMS (Database Management Systems) Architected?
Bonus: Moving to MLOps Engineering from other roles.
How do we define Data Latency in ML Systems when serving Online predictions?
There are two main components you can think about when it comes to Data Latency:
➡️ How Recent is the Data that the Model being served was trained on.
➡️ How Recent are the Features that are being fed to the Model that is making Online predictions (both are just time deltas, as the small sketch after this list shows).
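To make both components concrete, here is a tiny sketch that expresses them as plain time deltas. The timestamps are made-up placeholders; in a real system they would come from your training pipeline metadata and your Feature pipeline.

```python
# Both latency components are just "now minus a timestamp".
# All timestamps below are made-up placeholders.
from datetime import datetime, timezone

now = datetime(2023, 3, 1, 12, 0, tzinfo=timezone.utc)
model_trained_at = datetime(2023, 2, 28, 3, 0, tzinfo=timezone.utc)       # last training run
features_computed_at = datetime(2023, 3, 1, 11, 55, tzinfo=timezone.utc)  # freshest Feature values

model_data_latency = now - model_trained_at       # how old the data the served Model was trained on is
feature_latency = now - features_computed_at      # how stale the Features fed to the Model are

print(model_data_latency)   # 1 day, 9:00:00
print(feature_latency)      # 0:05:00
```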
Generally, you can split the maturity of ML System Architectures into 4 levels.
Level 1:
1: Offline Features from the Data Warehouse are used to train ML Models on a schedule.
2: Inference is applied to new incoming Features on a schedule.
3: Inference results are uploaded into a low-latency Prediction Store like Redis.
4: Product Applications source predictions from the Prediction Store.
5: New Features are piped from Product Apps back to the DWH.
👉 Model Data Latency equals the last time the Model was trained.
👉 Feature Latency equals the last time inference was applied, plus whatever the Feature Latency was at that point in time (a sketch of steps 2–4 of this flow follows below).
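A minimal sketch of the Level 1 flow described in steps 2–4, assuming a scikit-learn style model serialized with joblib and Redis as the Prediction Store. All key names, the model path and the shape of the feature rows are illustrative assumptions, not a prescribed implementation.

```python
import json

import joblib
import redis

def run_batch_inference(feature_rows):
    """Scheduled batch job: score new Feature rows pulled from the Data Warehouse
    and upload the predictions into the low-latency Prediction Store (Redis)."""
    model = joblib.load("model.joblib")                 # Model trained on a schedule (step 1)
    store = redis.Redis(host="localhost", port=6379)

    for entity_id, features in feature_rows:            # e.g. [(42, [0.1, 3.0]), ...]
        prediction = model.predict([features])[0]
        # Predictions are keyed by entity id; this is all the Product App will ever read.
        store.set(f"prediction:{entity_id}", json.dumps({"score": float(prediction)}))

def lookup_prediction(entity_id):
    """What the Product Application does at request time: a plain key lookup (step 4)."""
    store = redis.Redis(host="localhost", port=6379)
    cached = store.get(f"prediction:{entity_id}")
    return json.loads(cached) if cached is not None else None
```

The key property of this level is that `lookup_prediction` never touches the Model or the Features: whatever `run_batch_inference` last wrote is what gets served, which is why both latencies are tied to the schedule.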
Level 2:
6: The Data Warehouse is replaced by a Feature Store, which is used both for retrieving Offline Features for Model Training and for Feature Serving in Real Time.
7: The Prediction Store is replaced by an ML Model exposed as a Web Service that retrieves Features in Real Time from the Feature Store.
8: Product Applications request Inference directly from the Web Service.
9: New Features are piped from Product Apps back to the Feature Store. The upload is performed via a Batch Feature Ingestion API on a schedule.
👉 Model Data Latency equals the last time the Model was trained.
👉 Feature Latency is decoupled from it and equals the last time the Features were ingested into the Feature Store (a Web Service sketch follows below).
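A minimal sketch of the Level 2 setup from steps 7 and 8, assuming FastAPI for the Web Service and a Feast-style online Feature Store. The feature view name, feature names and entity key are placeholders.

```python
import joblib
from fastapi import FastAPI
from feast import FeatureStore

app = FastAPI()
model = joblib.load("model.joblib")              # loaded once at service startup
feature_store = FeatureStore(repo_path=".")      # points at the Feast feature repo

@app.get("/predict/{user_id}")
def predict(user_id: int):
    # Step 7: fetch the freshest Feature values for this entity from the online store.
    online_features = feature_store.get_online_features(
        features=[
            "user_features:avg_session_length",   # placeholder feature references
            "user_features:purchases_7d",
        ],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()

    feature_vector = [
        online_features["avg_session_length"][0],
        online_features["purchases_7d"][0],
    ]
    # Step 8: the Product Application calls this endpoint directly.
    return {"prediction": float(model.predict([feature_vector])[0])}
```

Feature Latency is now whatever the last Batch Feature Ingestion run (step 9) left in the online store, independently of when the Model was trained.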
Level 3:
10: New Features piped from Product Apps back to the Feature Store are Transformed in flight (e.g. by a Flink Application) and Ingested in Real Time via a Real Time Feature Ingestion API.
👉 Model Data Latency equals the last time the Model was trained.
👉 Feature Latency becomes Near Real Time (a stream-consumer sketch follows below).
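A minimal sketch of step 10. The text names a Flink Application for the transformation; to keep the sketch short, this version uses a plain Kafka consumer in Python instead, and the topic name, the ingestion endpoint URL and the payload shape are all assumptions.

```python
import json

import requests
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "product-events",                                  # raw events emitted by the Product Apps
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for event in consumer:
    payload = event.value
    # Transform the raw event into a Feature row on the fly (illustrative transformation).
    feature_row = {
        "user_id": payload["user_id"],
        "last_purchase_amount": float(payload["amount"]),
        "event_timestamp": payload["timestamp"],
    }
    # Push straight into the Feature Store via its Real Time Feature Ingestion API,
    # so Feature Latency becomes Near Real Time.
    requests.post("http://feature-store/api/v1/ingest", json=feature_row, timeout=2)
```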
Level 4:
11: Online Model Training is introduced.
12: ML Models are continuously trained in Real Time on new incoming data and continuously redeployed in production after ensuring stability and correctness.
👉 Both Model and Feature Latency become Near Real Time (a sketch of the online training loop follows below).
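A minimal sketch of the Level 4 idea from steps 11 and 12, using scikit-learn's `SGDClassifier.partial_fit` as a stand-in for a real online learner. The event handler, feature shape and label space are assumptions, and the stability/correctness gate before redeployment is only indicated in a comment.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()
known_classes = np.array([0, 1])   # the full label space must be declared up front for partial_fit

def on_new_labelled_event(features, label):
    """Called for every new (features, label) pair arriving from the stream (step 11)."""
    model.partial_fit([features], [label], classes=known_classes)
    # Step 12: in a real system the updated Model is only redeployed after
    # stability and correctness checks pass.

# Illustrative usage with made-up events:
on_new_labelled_event([0.2, 1.3], 0)
on_new_labelled_event([1.1, 0.4], 1)
```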
❗️ Having said this, it is always a good idea to start at Level 1. Given the current state of the industry, you will very rarely move beyond Level 2.
❗️ The industry is moving forward, and Real Time Feature Processing is becoming attainable for more companies, so you will see a lot more Level 3 systems in the future.
What is MLOps Maturity Level 2 and how do we move there from Level 1?