Which of the following lines of code can be used to complete the code block to successfully complete the task?


10 months ago

A data scientist has defined a pandas UDF, predict, to parallelize inference for a single-node model:

They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df:

Which of the following lines of code can be used to complete the code block to successfully complete the task?
A. predict(*spark_df.columns)
B. mapInPandas(predict)
C. predict(Iterator(spark_df))
D. mapInPandas(predict(spark_df.columns))
E. predict(spark_df.columns)

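Answer: A

Explanation:

Because predict is a pandas UDF, it is called like any other column expression, with the input columns as its arguments. spark_df.columns is a Python list of column names, so predict(*spark_df.columns) unpacks that list and passes each column as a separate argument. This is the form that fits a column-expression slot, such as the second argument of withColumn. Option E passes the whole list as a single argument, which is not a valid column. Option C fails because Iterator is a type hint and cannot be called to wrap a DataFrame. Options B and D fail because mapInPandas is a DataFrame method that expects a plain Python function over an iterator of pandas DataFrames, together with a required output schema; it does not take a pandas UDF.

Reference: PySpark documentation (pandas_udf; DataFrame.mapInPandas).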
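For context, here is a minimal sketch of how a pandas UDF and mapInPandas are each invoked in PySpark. It is not the code from the question: MeanModel, predict_batches, predict_frames, and the two score_* helpers are hypothetical names, the "model" simply averages the feature columns, and spark_df is assumed to contain at least two numeric columns.

```python
from typing import Iterator, Tuple

import pandas as pd


class MeanModel:
    """Hypothetical stand-in for a single-node model with a predict() method."""

    def predict(self, pdf: pd.DataFrame) -> pd.Series:
        return pdf.mean(axis=1)


def predict_batches(batches: Iterator[Tuple[pd.Series, ...]]) -> Iterator[pd.Series]:
    """Iterator-style scoring function: one tuple of column Series per batch."""
    model = MeanModel()  # created once, then reused for every batch
    for columns in batches:
        pdf = pd.concat(columns, axis=1)  # rebuild a feature frame from the columns
        yield model.predict(pdf)


def predict_frames(frames: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Plain function in the shape mapInPandas expects: DataFrames in, DataFrames out."""
    model = MeanModel()
    for pdf in frames:
        yield pd.DataFrame({"prediction": model.predict(pdf)})


def score_with_pandas_udf(spark_df):
    """Requires pyspark. The pandas UDF is called with columns, like a column expression."""
    from pyspark.sql.functions import pandas_udf

    predict = pandas_udf(predict_batches, "double")
    return spark_df.withColumn("prediction", predict(*spark_df.columns))


def score_with_map_in_pandas(spark_df):
    """Requires pyspark. mapInPandas takes a plain function plus an output schema."""
    return spark_df.mapInPandas(predict_frames, schema="prediction double")
```

The first helper keeps the original columns and adds a prediction column, while the mapInPandas variant returns only the columns named in its schema.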
