An online reseller has a large, multi-column dataset with one column missing 30% of its data. A Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct the missing data.
Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?
A. Listwise deletion
B. Last observation carried forward
C. Multiple imputation
D. Mean substitution
Answer: C
Explanation:
Multiple imputation fits a model to the observed data and uses the relationships among the variables to generate several plausible values for each missing entry. The result is a set of completed datasets; each is analyzed separately and the results are pooled, so the uncertainty about the missing values carries through to the final estimates. This matches the scenario, where other columns are believed to predict the missing one. The other options ignore that information: listwise deletion would discard every row with a missing value (about 30% of the data), last observation carried forward only makes sense for ordered, repeated measurements, and mean substitution fills every gap with the same value, which shrinks the column's variance and weakens its correlations with other columns. Multiple imputation can therefore improve the accuracy and validity of statistical analyses and machine learning models built on the imputed data.
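The idea can be illustrated with a minimal Python sketch. It is a simplified, bootstrap-based variant of multiple imputation with a single predictor column, not the procedure of any particular AWS service or library. The function names (fit_line, multiply_impute, pool_mean) and the synthetic data are assumptions made up for this illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def multiply_impute(rows, m=5, seed=0):
    """rows: list of (x, y) where y may be None. Returns m completed copies."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in rows if y is not None]
    completed = []
    for _ in range(m):
        # Refit on a bootstrap resample so each copy reflects model uncertainty.
        sample = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([x for x, _ in sample], [y for _, y in sample])
        # Fill gaps with prediction plus random noise; keep observed values as-is.
        completed.append([
            (x, y if y is not None else a + b * x + rng.gauss(0, sd))
            for x, y in rows
        ])
    return completed


def pool_mean(completed):
    """Pool the mean of y across copies with Rubin's rules."""
    m = len(completed)
    estimates = [statistics.fmean([y for _, y in d]) for d in completed]
    within = statistics.fmean(
        [statistics.variance([y for _, y in d]) / len(d) for d in completed]
    )
    between = statistics.variance(estimates)
    total_var = within + (1 + 1 / m) * between
    return statistics.fmean(estimates), total_var


# Synthetic example (hypothetical data): y depends on x, 30% of y is missing.
data_rng = random.Random(42)
rows = []
for i in range(100):
    x = float(i)
    y = 2.0 * x + 1.0 + data_rng.gauss(0, 5)
    rows.append((x, None if i % 10 < 3 else y))

completed = multiply_impute(rows, m=5, seed=1)
estimate, total_var = pool_mean(completed)
print("pooled mean of y:", estimate, "variance of estimate:", total_var)
```

Each completed copy fills the gaps differently, and the spread between copies enters the pooled variance. A single mean-substituted column would hide that uncertainty.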
References:
Managing missing values in your target and related datasets with automated imputation support in Amazon Forecast
Imputation by feature importance (IBFI): A methodology to impute missing data in large datasets
Multiple Imputation by Chained Equations (MICE) Explained