3 Comments
Hadil

So good, so clear... Good job 👍🙏

Michael Andrés Mora

Hello Aurimas,

Thanks for this read. You explained some really useful cross-cutting topics 🤓

Michal

Hey, thank you so much for the content.

I have a question regarding the sentence from the Machine Learning Pipeline's Feature Retrieval section:

You pointed out that the train/test/validation split should be performed there, which is a good idea, but it isn't clear to me how to avoid data leakage at this point. If we calculate the features on the whole dataset, then the test and validation sets will be contaminated.

Do you propose creating separate feature tables for each of the three datasets?

Kind Regards!