SAI #02: Feature Store, Splittable vs…

Aurimas Griciūnas

Oct 22, 2022

Splittable vs Non-Splittable Files, CDC (Change Data Capture), Machine Learning Pipeline, Feature Store.

3 Comments

Hadil

Oct 29, 2022

So good, so clear... Good job 👍🙏

Michael Andrés Mora

Jan 29, 2023

Hello Aurimas,

Thanks for this read. You explained some really useful cross-cutting topics 🤓

Michal

Dec 24, 2022

Hey, thank you so much for the content.

I have a question regarding the sentence from the Machine Learning Pipeline's Feature Retrieval section:

You pointed out that the train/test/validation split should be performed there, which is a good idea, but it isn't clear to me how to avoid data leakage at that point. If we calculate the features on the whole dataset, then the test and validation sets will be contaminated.

Do you propose creating different Feature Tables for all three datasets?

Kind Regards!
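
To make the leakage concern above concrete, here is a minimal sketch of one common way to avoid it: split first, fit any dataset-level feature statistics on the training rows only, then reuse those statistics for the validation and test rows. This is an illustration, not the newsletter's method. The helper names (`split_indices`, `fit_scaler`, `apply_scaler`), the 60/20/20 proportions, and the toy data are all assumptions made for the example.

```python
import random
import statistics

def split_indices(n, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle row indices and cut them into train/val/test (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def fit_scaler(values):
    """Compute mean and standard deviation from the given rows only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against a zero std
    return mean, std

def apply_scaler(values, params):
    """Standardize values using previously fitted statistics."""
    mean, std = params
    return [(v - mean) / std for v in values]

# Toy raw feature column (assumed data, for illustration only).
raw = [float(x) for x in range(100)]

# 1. Split BEFORE computing any dataset-level statistics.
train_idx, val_idx, test_idx = split_indices(len(raw))
train = [raw[i] for i in train_idx]
val = [raw[i] for i in val_idx]
test = [raw[i] for i in test_idx]

# 2. Fit the statistics on the training rows only.
params = fit_scaler(train)

# 3. Reuse the same training statistics for every split.
train_scaled = apply_scaler(train, params)
val_scaled = apply_scaler(val, params)
test_scaled = apply_scaler(test, params)
```

Under this approach a single feature table can still be shared: only transformations that aggregate across rows (means, standard deviations, target encodings and the like) need to be fitted on the training split alone.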

SwirlAI Newsletter