Why do you need to split a machine learning dataset into training data and test data?
A. So you can try two different sets of features
B. To make sure your model is generalized for more than just the training data
C. To allow you to create unit tests in your code
D. So you can use one dataset for a wide model and one for a deep model
Answer: B
Explanation:
Evaluating a predictive model on its training data does not tell you how well the model generalizes to new, unseen data. A model selected for its accuracy on the training dataset, rather than on a held-out test dataset, is likely to score lower on unseen data, because it has specialized to the structure of the training dataset instead of learning patterns that carry over. This is called overfitting. Holding out a test set gives you an estimate of how the model performs on data it has not seen.
Reference: https://machinelearningmastery.com/a-simple-intuition-for-overfitting/
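As a rough illustration of the explanation above, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example and not part of the original question: the toy dataset, the 20% label noise, the 80/20 split, and the helper names `make_data`, `predict_1nn`, and `accuracy`. It uses a 1-nearest-neighbor classifier, which memorizes the training set, to show how training accuracy can look perfect while accuracy on held-out test data is lower.

```python
import random


def make_data(n, rng, noise=0.2):
    """Hypothetical toy data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # simulate a noisy, mislabeled example
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """Return the label of the training point closest to x (pure memorization)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Fraction of points in `data` that the 1-NN model built on `train` labels correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in data)
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_data(500, rng)
rng.shuffle(data)

# Hold out 20% of the data; the model never sees it during "training".
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

train_acc = accuracy(train, train)  # evaluated on data the model memorized
test_acc = accuracy(train, test)    # evaluated on unseen data

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Because each training point is its own nearest neighbor, the training accuracy says nothing about generalization. Only the score on the held-out test set reflects how the model handles new data, which is the point of answer B.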