A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as linear regression and logistic regression. During exploratory data analysis, the Specialist observes that many features are highly correlated with each other. This may make the model unstable.
What should be done to reduce the impact of having such a large number of features?
A. Perform one-hot encoding on highly correlated features.
B. Use matrix multiplication on highly correlated features.
C. Create a new feature space using principal component analysis (PCA).
D. Apply the Pearson correlation coefficient.
Answer: C
Explanation:
Principal component analysis (PCA) is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible. This is done by finding a new set of features called components, which are composites of the original features that are uncorrelated with one another. They are also constrained so that the first component accounts for the largest possible variability in the data, the second component the second most variability, and so on. By using PCA, the impact of having a large number of features that are highly correlated with each other can be reduced, as the new feature space will have fewer dimensions and less redundancy. This can make the linear models more stable and less prone to overfitting.
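As a rough illustration (a minimal sketch using scikit-learn rather than the SageMaker built-in PCA algorithm, and a synthetic dataset invented for the example), a correlated feature matrix can be projected onto a smaller set of uncorrelated components like this:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy dataset: 200 rows, 10 features, most of them linear combinations
# of 3 underlying signals, so the columns are highly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.hstack([base, base @ rng.normal(size=(3, 7)) + 0.05 * rng.normal(size=(200, 7))])

# Standardize first so PCA is not dominated by features with large scales.
X_scaled = StandardScaler().fit_transform(X)

# Keep enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)          # e.g. (200, 10) -> (200, 3)
print(pca.explained_variance_ratio_.round(3))  # variance captured by each component
```

The reduced, uncorrelated components can then be used as inputs to a linear or logistic regression model in place of the original correlated features.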
References:
Principal Component Analysis (PCA) Algorithm – Amazon SageMaker
Perform a large-scale principal component analysis faster using Amazon SageMaker | AWS Machine Learning Blog
Machine Learning – Principal Component Analysis | i2tutorials