I have a question regarding the sentence from the Machine Learning Pipeline's Feature Retrieval section:
You pointed out that the train/test/validation split should be performed there, which is a good idea, but what isn't clear to me is how to avoid Data Leakage at this point; If we calculate the Features with the whole dataset, then the test and validation set will be contaminated.
Do you propose creating different Feature Tables for all three datasets?
So good, so clear... Good job 👍🙏
Hello Aurimas,
Thanks for this reading, you explained cross-cutting topics really useful 🤓
Hey, thank you so much for the content.
I have a question regarding the sentence from the Machine Learning Pipeline's Feature Retrieval section:
You pointed out that the train/test/validation split should be performed there, which is a good idea, but what isn't clear to me is how to avoid Data Leakage at this point; If we calculate the Features with the whole dataset, then the test and validation set will be contaminated.
Do you propose creating different Feature Tables for all three datasets?
Kind Regards!