Which of the following changes does the machine learning engineer need to make to complete the task?

A machine learning engineer would like to develop a linear regression model with Spark ML to predict the price of a hotel room. They are using the Spark DataFrame train_df to train the model.

The Spark DataFrame train_df has the following schema:

The machine learning engineer shares the following code block:

Which of the following changes does the machine learning engineer need to make to complete the task?
A. They need to call the transform method on train_df
B. They need to convert the features column to be a vector
C. They do not need to make any changes
D. They need to utilize a Pipeline to fit the model
E. They need to split the features column out into one column for each feature

Answer: B

Explanation:

In Spark ML, LinearRegression expects the features column to be a vector type (VectorUDT). If the features column in train_df is stored as some other type, such as an array of doubles, the engineer needs to convert it to a vector column before calling fit. When the features live in separate numeric columns instead, a transformer like VectorAssembler combines them into a single vector column. Either way, this is a required data preparation step, because Spark ML estimators read all input features from one vector column.

Reference

Spark MLlib documentation for LinearRegression: https://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression
