Exam4Training

Which of the following Spark operations can be used to randomly split a Spark DataFrame into a training DataFrame and a test DataFrame for downstream use?

A. TrainValidationSplit
B. DataFrame.where
C. CrossValidator
D. TrainValidationSplitModel
E. DataFrame.randomSplit

Answer: E

Explanation:

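The correct operation is DataFrame.randomSplit. It takes a list of weights, for example [0.8, 0.2], plus an optional seed for reproducibility, and returns one DataFrame per weight; weights that do not sum to 1 are normalized. Because rows are assigned at random, the resulting sizes only approximate the requested proportions. The other options do not produce a train/test split: DataFrame.where filters rows by a condition rather than at random, TrainValidationSplit and CrossValidator are hyperparameter-tuning estimators that perform their own internal splitting, and TrainValidationSplitModel is the fitted model that TrainValidationSplit returns.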
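As a rough illustration of the randomSplit call described above, here is a minimal PySpark sketch. The helper name split_train_test, the 80/20 default, the seed value 42, and the toy DataFrame in demo() are assumptions made for this example and do not come from the question.

```python
def split_train_test(df, train_fraction=0.8, seed=42):
    """Randomly split a Spark DataFrame into (train_df, test_df).

    Hypothetical helper for illustration; `train_fraction` and `seed`
    are example defaults, not values prescribed by Spark.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # randomSplit takes a list of weights and an optional seed, and
    # returns one DataFrame per weight.
    train_df, test_df = df.randomSplit(
        [train_fraction, 1.0 - train_fraction], seed=seed
    )
    return train_df, test_df


def demo():
    # Requires pyspark; imported lazily so the helper above has no
    # import-time dependency on it.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("random-split-demo")
        .getOrCreate()
    )
    df = spark.range(1000)  # toy DataFrame with a single `id` column
    train_df, test_df = split_train_test(df)
    # Counts only approximate 800/200 because rows are assigned at random.
    print(train_df.count(), test_df.count())
    spark.stop()
```

Calling demo() on a machine with PySpark installed prints the two row counts. Passing a fixed seed makes the split repeatable for the same input DataFrame and partitioning.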

Reference: Apache Spark DataFrame API documentation (DataFrame Operations: randomSplit).
