In which of the following situations is it preferable to impute missing feature values with the median rather than the mean?
A. When the features are of the categorical type
B. When the features are of the boolean type
C. When the features contain a lot of extreme outliers
D. When the features contain no outliers
E. When the features contain no missing values
Answer: C
Explanation:
Imputing missing values with the median is often preferred over the mean when the data contains a lot of extreme outliers. The mean is pulled toward extreme values, whereas the median depends only on the middle of the sorted data, so it is a more robust measure of central tendency. Using the median keeps the imputed values close to a typical data point and distorts the dataset’s distribution less. The other options do not favor the median: a median is not meaningful for categorical or boolean features (A, B), the mean and median tend to be close when there are no outliers (D), and there is nothing to impute when no values are missing (E).
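The effect can be seen in a minimal sketch using only Python's standard library. The feature column `incomes` and its values are made up for illustration and do not come from the exam material; `None` stands in for a missing value.

```python
from statistics import mean, median

# Hypothetical feature column: None marks a missing value,
# and 1_000_000 is a single extreme outlier.
incomes = [42, 45, 47, None, 50, 52, 1_000_000]

# Compute both statistics from the observed (non-missing) values only.
observed = [x for x in incomes if x is not None]
mean_fill = mean(observed)      # dragged far upward by the outlier
median_fill = median(observed)  # stays among the typical values

# Fill the missing entry with each candidate value.
imputed_with_mean = [mean_fill if x is None else x for x in incomes]
imputed_with_median = [median_fill if x is None else x for x in incomes]

print("mean fill:  ", mean_fill)
print("median fill:", median_fill)
```

Here the mean-based fill lands far above every typical value because of the single outlier, while the median-based fill falls between 47 and 50, in line with the rest of the column.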
Reference: Data Imputation Techniques (Dealing with Outliers).