SAI #19: The Data Value Chain.
The Data Value Chain, Data Contracts in the Data Pipeline, The 4 types of ML Model Deployment.
👋 This is Aurimas. I write the weekly SAI Newsletter, where I present complicated Data-related concepts in a simple, easy-to-digest way. The goal is to help You UpSkill in Data Engineering, MLOps, Machine Learning and Data Science.
This week in the Newsletter:
The Data Value Chain.
Data Contracts in the Data Pipeline.
The 4 types of ML Model Deployment.
The Data Value Chain.
What does the Data Value Chain look like, and what do we need to know about the effort it takes to bring Data up the chain?
A well-designed Data Flow, together with involving the relevant non-business Stakeholders at the right point in time, helps greatly in understanding ROI.
Let’s say we are interested in building out a new Data Product powered by a Machine Learning Model.
Here are the Data Flow elements that matter. Note that we go from downstream to upstream, not vice versa:
1️⃣ Engineered Features: Do we already have Features available in the Feature Store that could be directly used for the ML Driven Product Idea that is being conceived?
👉 Investment: minimal. If we do have the data, we will be able to start a PoC really quickly.
👉 Involve: Data Scientists and Machine Learning Engineers.
2️⃣ Curated Data: Can we derive required Features from the Data that is already Curated?
👉 Investment: minimal to medium. Data here is already Golden: quality is assured and SLAs are met.
👉 Involve: Analytics Engineers and Data Engineers.
3️⃣ Raw Data: Could we derive any Features from Raw Data? If yes, is it possible for it to become Curated? Can we ensure the Quality? Is the Data current and are we receiving it at regular intervals? Is it possible to establish meaningful SLAs?
👉 Investment: medium to high.
👉 Involve: Data Engineers.
❗️This is where we will start seeing negative ROI.
4️⃣ Data Acquisition: The last frontier. If we don't have the needed Data anywhere, we might try to acquire it from external systems.
👉 Investment: High.
👉 Involve: Software Engineers, Data Engineers.
❗️This is where most of the ideas will result in negative ROI.
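The downstream-to-upstream walk above can be sketched in a few lines of Python. This is only an illustration: the `Stage` class, the `first_viable_stage` helper and the availability flags are hypothetical names made up for this example, while the stage names, investment levels and roles are taken from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    investment: str
    involve: tuple[str, ...]


# Ordered from downstream to upstream, exactly as in the list above.
STAGES = (
    Stage("Engineered Features", "minimal",
          ("Data Scientists", "Machine Learning Engineers")),
    Stage("Curated Data", "minimal to medium",
          ("Analytical Engineers", "Data Engineers")),
    Stage("Raw Data", "medium to high", ("Data Engineers",)),
    Stage("Data Acquisition", "high",
          ("Software Engineers", "Data Engineers")),
)


def first_viable_stage(available: dict[str, bool]) -> Stage:
    """Return the most downstream stage where the needed Data already exists."""
    for stage in STAGES:
        if available.get(stage.name, False):
            return stage
    # Nothing usable in house: external acquisition is the only option left.
    return STAGES[-1]


# Hypothetical situation: no ready-made Features, but Curated Data exists.
stage = first_viable_stage({"Engineered Features": False, "Curated Data": True})
print(stage.name, "|", stage.investment, "|", ", ".join(stage.involve))
```

Checking the stages in this order means we always stop at the cheapest point in the chain that can serve the Product Idea, and we know right away who to bring into the conversation.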
[Important]: every time you go one step upstream, remember to add all downstream investments when calculating ROI. Data that enters the chain further upstream still has to pass through every stage below it.
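To make the note above concrete, here is a rough sketch of how the investment accumulates. All numbers are invented cost units chosen only for illustration, and `total_investment` and `roi` are hypothetical helpers, not an established formula from any tool. ROI is assumed here to be the simple `(return - cost) / cost`.

```python
# Hypothetical cost units per stage, ordered downstream -> upstream.
STAGE_COSTS = [
    ("Engineered Features", 1.0),
    ("Curated Data", 3.0),
    ("Raw Data", 8.0),
    ("Data Acquisition", 20.0),
]


def total_investment(entry_stage: str) -> float:
    """Cost of the entry stage plus every stage downstream of it."""
    names = [name for name, _ in STAGE_COSTS]
    idx = names.index(entry_stage)  # raises ValueError for an unknown stage
    return sum(cost for _, cost in STAGE_COSTS[: idx + 1])


def roi(expected_return: float, entry_stage: str) -> float:
    """Simple ROI: (return - cost) / cost, using the cumulative cost."""
    cost = total_investment(entry_stage)
    return (expected_return - cost) / cost


# Assume the Product Idea is expected to return 10 units of value.
for name, _ in STAGE_COSTS:
    print(f"{name}: cost={total_investment(name)}, ROI={roi(10.0, name):+.2f}")
```

With these made-up numbers the same idea is clearly worth it when Features or Curated Data already exist, and turns negative once we have to start from Raw Data, because the curation and Feature engineering work still has to be paid for on top.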
Data Contracts in the Data Pipeline.