3 Comments
Oct 29, 2022 · Liked by Aurimas Griciūnas

So good, so clear... Good job 👍🙏

Jan 29, 2023 · Liked by Aurimas Griciūnas

Hello Aurimas,

Thanks for this read. Your explanation of the cross-cutting topics was really useful 🤓

Hey, thank you so much for the content.

I have a question regarding the sentence from the Machine Learning Pipeline's Feature Retrieval section:

You pointed out that the train/test/validation split should be performed there, which is a good idea. What isn't clear to me is how to avoid data leakage at this point: if the features are calculated on the whole dataset, the test and validation sets will be contaminated.

Do you propose creating different Feature Tables for all three datasets?
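
To make the concern concrete, here is a minimal sketch that is not taken from the article. It assumes a single numeric column and a standardization feature; `split`, `fit_scaler`, and `standardize` are hypothetical helper names. It contrasts statistics fitted on all rows (leaky) with statistics fitted on the training rows only and then reused for the other two sets, which is one common way to avoid the contamination.

```python
import random
import statistics


def split(rows, n_train, n_val, seed=0):
    """Shuffle rows and cut them into train/validation/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_scaler(values):
    """Learn mean and standard deviation from the given values only."""
    return statistics.mean(values), statistics.pstdev(values)


def standardize(values, mean, std):
    """Apply previously learned statistics; nothing is re-fitted here."""
    return [(v - mean) / std for v in values]


raw = [float(x) for x in range(1, 11)]  # hypothetical raw feature column
train, val, test = split(raw, n_train=6, n_val=2)

# Leaky: the statistics see every row, including validation and test rows.
leaky_mean, leaky_std = fit_scaler(raw)

# Leak-free: the statistics come from the training rows only and are reused.
train_mean, train_std = fit_scaler(train)
train_feat = standardize(train, train_mean, train_std)
val_feat = standardize(val, train_mean, train_std)
test_feat = standardize(test, train_mean, train_std)
```

In this sketch there is still one transformation, but its parameters are learned from the training split alone. Whether that fits the feature-store setup described in the article is exactly what I am asking.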

Kind Regards!
