
Which of the following lines of code can be used to complete the code block to successfully complete the task?

A data scientist has defined a pandas UDF, predict, to parallelize the inference process for a single-node model:

They have written the following incomplete code block that uses predict to score each record of the Spark DataFrame spark_df:

A. predict(*spark_df.columns)
B. mapInPandas(predict)
C. predict(Iterator(spark_df))
D. mapInPandas(predict(spark_df.columns))
E. predict(spark_df.columns)

Answer: B

Explanation:

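The mapInPandas method of a Spark DataFrame applies a Python function to each partition of that DataFrame. Spark passes the function an iterator of pandas DataFrames, one per batch of rows, and the function yields pandas DataFrames in return, so a single-node model can score records in parallel across the cluster. Option B hands predict itself to mapInPandas; the other options either call predict directly or call it first and pass its result along. In PySpark itself, mapInPandas is invoked on the DataFrame and also takes the output schema, as in spark_df.mapInPandas(predict, schema). Note that a function decorated with pandas_udf is normally applied as a column expression, for example inside select or withColumn, rather than passed to mapInPandas, so the right completion depends on how predict is defined.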
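The mapInPandas pattern described in the explanation can be sketched as follows. This is a minimal illustration, not the code from the original question: MeanModel is a hypothetical stand-in for the single-node model, score is a hypothetical helper, and predict is assumed here to be a plain Python generator (not decorated with pandas_udf) that appends a prediction column to each batch.

```python
from typing import Iterator

import pandas as pd


class MeanModel:
    """Hypothetical stand-in for a single-node model exposing predict()."""

    def predict(self, features: pd.DataFrame):
        # Toy rule: the prediction is the row-wise mean of the feature columns.
        return features.mean(axis=1).to_numpy()


def predict(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # Build (or load) the model once per partition, then reuse it for every batch.
    model = MeanModel()
    for pdf in batches:
        out = pdf.copy()
        out["prediction"] = model.predict(pdf)
        yield out


def score(spark_df):
    # Imported lazily so that predict can be exercised with pandas alone.
    from pyspark.sql.types import DoubleType, StructField, StructType

    # mapInPandas needs the schema of the DataFrames that predict yields:
    # here, the input columns plus one double-valued prediction column.
    out_schema = StructType(
        spark_df.schema.fields + [StructField("prediction", DoubleType())]
    )
    return spark_df.mapInPandas(predict, schema=out_schema)
```

Given a Spark DataFrame with numeric columns, score(spark_df) would return a new DataFrame with the extra prediction column. Constructing the model inside predict, before the loop, means it is set up once per partition rather than once per batch, which is the usual reason for choosing an iterator-based function for inference.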

Reference: PySpark DataFrame documentation (Using mapInPandas with UDFs).
