What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

A. Leave-one-out encoding
B. Target encoding
C. One-hot encoding
D. Categorical
E. String indexing

Answer: C

Explanation: The method that transforms categorical features into a series of binary indicator variables is...
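As a rough illustration of what one-hot encoding produces, here is a minimal sketch in plain Python. The helper name `one_hot_encode` and the sample colour values are made up for this example; libraries such as Spark ML or scikit-learn provide their own encoders with different interfaces.

```python
def one_hot_encode(values):
    """Map each categorical value to a list of 0/1 indicators, one per category."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one indicator is set per record
        rows.append(row)
    return categories, rows


# Hypothetical sample data: four records with three distinct categories.
categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
```

Each record ends up with exactly one indicator set to 1, in the column that corresponds to its category.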

September 11, 2024

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

A. pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata
B. pandas API on Spark DataFrames are more performant than Spark DataFrames
C. pandas API on Spark DataFrames...

September 10, 2024

Which of the following possible explanations for this difference is invalid?

A data scientist has created two linear regression models. The first model uses price as a label variable and the second model uses log(price) as a label variable. When evaluating the RMSE of each model by comparing the label predictions to the actual price values, the data scientist notices that...
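The excerpt is cut off, so the following is only a sketch of one general point behind this scenario: predictions from a model trained on log(price) are on the log scale and have to be exponentiated before an RMSE against actual prices is meaningful. The prices and predictions below are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical actual prices and log-scale predictions from the second model.
actual_price = [100.0, 200.0, 400.0]
log_predictions = [math.log(110.0), math.log(190.0), math.log(420.0)]

# Comparing log-scale predictions directly to prices mixes two scales.
rmse_mixed_scales = rmse(actual_price, log_predictions)

# Exponentiating first puts predictions back on the price scale.
rmse_price_scale = rmse(actual_price, [math.exp(p) for p in log_predictions])
```

With these made-up numbers, the mixed-scale RMSE is far larger than the price-scale RMSE, even though the underlying predictions are identical.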

September 10, 2024

Which of the following changes do they need to make to the above code block in order to accomplish the task?

A data scientist wants to tune a set of hyperparameters for a machine learning model. They have wrapped a Spark ML model in the objective function objective_function and they have defined the search space search_space. As a result, they have the following code block: Which of the following changes do...

September 10, 2024

Which of the following lines of code can be used to complete the code block to successfully complete the task?

A data scientist has defined a Pandas UDF function predict to parallelize the inference process for a single-node model: They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df: Which of the following lines of code can be used to complete...

September 10, 2024

Which of the following lines of code can the data scientist run to accomplish the task?

A data scientist wants to explore summary statistics for Spark DataFrame spark_df. They want to see the count, mean, standard deviation, minimum, maximum, and interquartile range (IQR) for each numerical feature. Which of the following lines of code can the data scientist run to accomplish the task?
A...

September 8, 2024

Which of the following approaches can the data scientist take to spend the least amount of time refactoring their notebook to scale with big data?

A data scientist has written a data cleaning notebook that utilizes the pandas library, but their colleague has suggested that they refactor their notebook to scale with big data. Which of the following approaches can the data scientist take to spend the least amount of time refactoring their notebook to...

September 8, 2024

Which of the following changes does the data scientist need to make to their objective_function in order to produce a more accurate model?

A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model. They elect to use the Hyperopt library's fmin operation to facilitate this process. Unfortunately, the final model is not very accurate. The data scientist suspects that there is an issue with the objective_function being passed as an...

September 4, 2024

Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?

A. Keras
B. pandas
C. PyTorch
D. Spark ML
E. Scikit-learn

Answer: D

Explanation: Spark ML (Machine Learning Library) is designed specifically for...

September 4, 2024

Which of the following changes can the data scientist make to address the concern?

A data scientist is using Spark ML to engineer features for an exploratory machine learning project. They decide they want to standardize their features using the following code block: Upon code review, a colleague expressed concern with the features being standardized prior to splitting the data into a training set...
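The original code block is not shown in this excerpt, so the following is only a plain-Python sketch of the general idea behind the colleague's concern, not Spark ML code: compute the scaling statistics on the training split alone, then reuse them for the test split so that no test information leaks into the features. The helper names `fit_scaler` and `transform` and the data values are hypothetical.

```python
import statistics


def fit_scaler(values):
    """Learn mean and (population) standard deviation from the given values."""
    return statistics.mean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize values using previously learned statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical feature values, already split into training and test sets.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0]

# Fit on the training split only, then apply to both splits.
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)
```

Because the statistics come from the training data only, the scaled training values have zero mean, while the test value is expressed relative to the training distribution rather than influencing it.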

September 1, 2024