Which of the following changes does the machine learning engineer need to make to complete the task?
A machine learning engineer would like to develop a linear regression model with Spark ML to predict the price of a hotel room. They are using the Spark DataFrame train_df to train the model.
The Spark DataFrame train_df has the following schema:
The machine learning engineer shares the following code block:
Which of the following changes does the machine learning engineer need to make to complete the task?
A . They need to call the transform method on train df
B . They need to convert the features column to be a vector
C . They do not need to make any changes
D . They need to utilize a Pipeline to fit the model
E . They need to split the features column out into one column for each feature
Answer: B
Explanation:
In Spark ML, the linear regression model expects the feature column to be a vector type. However, if the features column in the DataFrame train_df is not already in this format (such as being a column of type UDT or a non-vectorized type), the engineer needs to convert it to a vector column using a transformer like VectorAssembler. This is a critical step in preparing the data for modeling as Spark ML models require input features to be combined into a single vector column.
Reference
Spark MLlib documentation for LinearRegression: https://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression
Latest Databricks Machine Learning Associate Dumps Valid Version with 74 Q&As
Latest And Valid Q&A | Instant Download | Once Fail, Full Refund