Which of the Spark operations can be used to randomly split a Spark DataFrame into a training DataFrame and a test DataFrame for downstream use?

A. TrainValidationSplit
B. DataFrame.where
C. CrossValidator
D. TrainValidationSplitModel
E. DataFrame.randomSplit

Answer: E

Explanation: The correct method to randomly split a Spark DataFrame into training and...

August 29, 2024

Which of the following is a benefit of using vectorized pandas UDFs instead of standard PySpark UDFs?

A. The vectorized pandas UDFs allow for the use of type hints
B. The vectorized pandas UDFs process data in batches rather than one row at a time
C. The vectorized pandas UDFs...

August 28, 2024

Which of the following values represents the overall cross-validation root-mean-squared error?

A data scientist uses 3-fold cross-validation when optimizing model hyperparameters for a regression problem. The following root-mean-squared-error values are calculated on each of the validation folds:
• 10.0
• 12.0
• 17.0
Which of the following values represents the overall cross-validation root-mean-squared error?
A. 13.0
B. 17.0
C. 12.0
D. ...
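The arithmetic behind this question can be checked with a short sketch. It assumes the common convention that the overall cross-validation score is the unweighted mean of the per-fold scores; the variable names are illustrative and do not come from the original post.

```python
from statistics import mean

# Per-fold validation RMSE values quoted in the question.
fold_rmse = [10.0, 12.0, 17.0]

# Assumption: the overall cross-validation metric is the plain average
# of the fold metrics (each fold weighted equally).
overall_rmse = mean(fold_rmse)

print(overall_rmse)  # 13.0
```

Under that convention the result is neither the worst fold (17.0) nor the middle fold (12.0), but the average across all three.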

August 27, 2024

Which of the following describes why?

A data scientist wants to parallelize the training of trees in a gradient boosted tree to speed up the training process. A colleague suggests that parallelizing a boosted tree algorithm can be difficult. Which of the following describes why?
A. Gradient boosting is not a linear algebra-based algorithm which is...
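One commonly cited reason for the difficulty is that boosting is iterative: each tree is fit to the errors left by the trees before it. The toy sketch below illustrates that dependency with one-split decision stumps and squared error on made-up 1-D data. The helper names, data, and hyperparameters are all hypothetical, and this is not how Spark ML implements gradient boosted trees.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    # Skip the largest x so the right-hand side of the split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=5, learning_rate=0.5):
    predictions = [0.0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # The residuals depend on the predictions of ALL earlier stumps,
        # so iteration k cannot start before iteration k-1 has finished.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return stumps, predictions


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
stumps, predictions = boost(xs, ys)
```

The work inside a single iteration (for example, evaluating candidate thresholds) can still be spread across workers; it is the outer loop over trees that has to run in order.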

August 27, 2024

Which of the following feature engineering tasks will be the least efficient to distribute?

A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process. Which of the following feature engineering tasks will be the least efficient to distribute?
A. One-hot encoding categorical features
B. Target encoding categorical features
C. Imputing missing feature values with the mean
D...

August 25, 2024

Which of the following changes does the machine learning engineer need to make to complete the task?

A machine learning engineer would like to develop a linear regression model with Spark ML to predict the price of a hotel room. They are using the Spark DataFrame train_df to train the model. The Spark DataFrame train_df has the following schema: The machine learning engineer shares the following code...

August 24, 2024

Which of the following explanations justifies this suggestion?

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository. Which of the following explanations justifies this suggestion?
A. One-hot encoding is not supported by...

August 23, 2024

Which of the following code blocks will accomplish this task?

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column price is greater than 0. Which of the following code blocks will accomplish this task?
A. spark_df[spark_df["price"] > 0]
B. spark_df.filter(col("price") >...

August 23, 2024

In which of the following situations is it preferable to impute missing feature values with their median value over the mean value?

A. When the features are of the categorical type
B. When the features are of the boolean type
C. When the features contain a lot of extreme outliers
D...
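The effect of an extreme outlier on the two fill values is easy to see in a small sketch. The column values below are made up for illustration, and the code uses only the Python standard library rather than any Spark imputation API.

```python
from statistics import mean, median

# Hypothetical feature column with two missing entries and one extreme outlier.
values = [48.0, 50.0, None, 51.0, 52.0, 49.0, 5000.0, None]

observed = [v for v in values if v is not None]

mean_fill = mean(observed)      # pulled far upward by the single outlier
median_fill = median(observed)  # stays close to the bulk of the data

# Replace each missing entry with the median of the observed values.
imputed_with_median = [median_fill if v is None else v for v in values]

print(mean_fill)    # 875.0
print(median_fill)  # 50.5
```

Here a single outlier moves the mean to 875.0, which resembles none of the typical values, while the median stays at 50.5.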

August 23, 2024

Which of the following lines of code will return the metadata description?

A machine learning engineer has created a Feature Table new_table using Feature Store Client fs. When creating the table, they specified a metadata description with key information about the Feature Table. They now want to retrieve that metadata programmatically. Which of the following lines of code will return the metadata...

August 21, 2024