3 Comments
Hadil

So good, so clear... Good job 👍🙏

Michael Andrés Mora

Hello Aurimas,

Thanks for this read; you explained cross-cutting topics in a really useful way 🤓

Michal

Hey, thank you so much for the content.

I have a question regarding the sentence from the Machine Learning Pipeline's Feature Retrieval section:

You pointed out that the train/test/validation split should be performed there, which is a good idea, but it isn't clear to me how to avoid data leakage at that point. If we calculate the features on the whole dataset, the test and validation sets will be contaminated.

Do you propose creating different Feature Tables for all three datasets?
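
To make the concern concrete, here is a minimal sketch of one common way to avoid this kind of leakage: split first, fit the feature statistics on the training split only, and reuse those fitted parameters for validation and test. It is only an illustration, not the article's method. The helper names (`split_indices`, `fit_scaler`, `transform`) and the toy data are hypothetical, and a simple standard-scaling feature stands in for whatever features the pipeline actually computes.

```python
import random
import statistics


def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle row indices and cut them into train/validation/test (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def fit_scaler(values):
    """Learn the feature statistics (mean, population std) from the given rows only."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, params):
    """Apply previously fitted statistics; nothing is re-estimated here."""
    mean, std = params
    return [(v - mean) / std for v in values]


# Toy raw column standing in for one raw feature (assumption for illustration).
raw = [float(x) for x in range(1, 101)]

train_idx, val_idx, test_idx = split_indices(len(raw))
train = [raw[i] for i in train_idx]
val = [raw[i] for i in val_idx]
test = [raw[i] for i in test_idx]

# Fit on the training split only, then reuse the same parameters everywhere.
params = fit_scaler(train)
train_scaled = transform(train, params)
val_scaled = transform(val, params)
test_scaled = transform(test, params)

# The leaky variant would be fit_scaler(raw): validation and test rows
# would then influence the statistics used to build training features.
```

With this pattern there is a single feature definition rather than three separate feature tables; only the fitted statistics are tied to the training split.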

Kind Regards!
