Which of the following classification metrics should be used to evaluate the model?
A health organization is developing a classification model to determine whether or not a patient currently has a specific type of infection. The organization's leaders want to maximize the number of positive cases identified by the model. Which of the following classification metrics should be used to evaluate the model?
A...
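The metric that directly measures how many actual positive cases a classifier identifies is recall, TP / (TP + FN). The sketch below computes it next to precision for contrast. The labels and the helper names `recall` and `precision` are made up for illustration and are not part of the question.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): the share of actual positives the model finds."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP): the share of predicted positives that are correct."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    if tp + fp == 0:
        raise ValueError("y_pred contains no positive predictions")
    return tp / (tp + fp)


# Hypothetical data: 1 = infected, 0 = not infected.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

print("recall:", recall(y_true, y_pred))        # 3 of 4 infections found
print("precision:", precision(y_true, y_pred))  # 3 of 5 positive predictions correct
```

A model can raise recall by predicting the positive class more often, usually at the cost of precision, which is why the two are normally reported together.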
Which of the following is a benefit of using vectorized pandas UDFs instead of standard PySpark UDFs?
A. The vectorized pandas UDFs allow for the use of type hints
B. The vectorized pandas UDFs process data in batches rather than one row at a time
C. The vectorized pandas UDFs...
Which of the following lines of code will return the metadata description?
A machine learning engineer has created a Feature Table new_table using Feature Store Client fs. When creating the table, they specified a metadata description with key information about the Feature Table. They now want to retrieve that metadata programmatically. Which of the following lines of code will return the metadata...
Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?
A. pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata
B. pandas API on Spark DataFrames are more performant than Spark DataFrames
C. pandas API on Spark DataFrames...
Which of the following pieces of code can be used to fill in the above blank to complete the task?
A machine learning engineer wants to parallelize the training of group-specific models using the Pandas Function API. They have developed the train_model function, and they want to apply it to each group of DataFrame df. They have written the following incomplete code block: Which of the following pieces of code...
Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?
A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API. Which of the following blocks of...
Which of the following describes why?
A data scientist wants to parallelize the training of trees in a gradient boosted tree to speed up the training process. A colleague suggests that parallelizing a boosted tree algorithm can be difficult. Which of the following describes why?
A. Gradient boosting is not a linear algebra-based algorithm which is...
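One common explanation is that boosting is iterative: each tree is fit to the residuals left by all the trees before it, so the trees cannot be trained independently of one another. The following is a minimal pure-Python sketch of that dependency, using one-split regression stumps on made-up 1-D data. The names `fit_stump` and `boost` are hypothetical, and this is not how any particular library implements boosting.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    # Skip the largest x so the right-hand leaf is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=5, learning_rate=0.5):
    """Train stumps one after another; return them with the MSE after each round."""
    predictions = [0.0] * len(xs)
    trees, errors = [], []
    for _ in range(n_trees):
        # These residuals depend on every tree trained so far, so round k
        # cannot start until round k - 1 has finished.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return trees, errors


# Hypothetical training data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.1, 3.0, 5.2, 5.1]

trees, errors = boost(xs, ys)
print(errors)  # training error after each sequential round
```

The work inside a single round (here, scanning candidate thresholds) can still be parallelized; it is the rounds themselves that have to run in order.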
Which of the following feature engineering tasks will be the least efficient to distribute?
A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process. Which of the following feature engineering tasks will be the least efficient to distribute?
A. One-hot encoding categorical features
B. Target encoding categorical features
C. Imputing missing feature values with the mean
D...
In which of the following situations is it preferable to impute missing feature values with their median value over the mean value?
A. When the features are of the categorical type
B. When the features are of the boolean type
C. When the features contain a lot of extreme outliers
D...
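The effect of extreme outliers on the two fill values can be checked with a small sketch. It uses Python's standard `statistics` module and a made-up numeric feature; the helper name `impute` and the data are illustrative assumptions, not part of the question.

```python
from statistics import mean, median


def impute(values, strategy):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mean":
        fill = mean(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Hypothetical feature with one missing value and one extreme outlier (1000).
incomes = [30, 32, 35, None, 31, 1000]

print(impute(incomes, "mean"))    # the fill value is pulled toward the outlier
print(impute(incomes, "median"))  # the fill value stays near the typical values
```

Here the mean of the observed values is 225.6, while the median is 32, so the median-imputed value looks like a typical observation and the mean-imputed one does not.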
Which of the following approaches can the data scientist take to spend the least amount of time refactoring their notebook to scale with big data?
A data scientist has written a data cleaning notebook that utilizes the pandas library, but their colleague has suggested that they refactor their notebook to scale with big data. Which of the following approaches can the data scientist take to spend the least amount of time refactoring their notebook to...