Databricks Certified Machine Learning Associate Exam Online Training
The questions for Databricks Machine Learning Associate were last updated on Nov 19, 2024.
- Exam Code: Databricks Machine Learning Associate
- Exam Name: Databricks Certified Machine Learning Associate Exam
- Certification Provider: Databricks
- Latest update: Nov 19, 2024
A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.
Which of the following feature engineering tasks will be the least efficient to distribute?
- A. One-hot encoding categorical features
- B. Target encoding categorical features
- C. Imputing missing feature values with the mean
- D. Imputing missing feature values with the true median
- E. Creating binary indicator features for missing values
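The difference between imputing with the mean and imputing with the true median can be sketched in plain Python. This is a minimal illustration, not Spark code: the `partitions` list is hypothetical data standing in for a column split across workers, and the function names are invented for this example. It shows that a mean can be assembled from small per-partition summaries, while an exact median needs all values brought together and ordered.

```python
from statistics import median

# Hypothetical column values, split across three partitions (assumption for illustration).
partitions = [[1.0, 2.0, 100.0], [3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]


def distributed_mean(parts):
    """Each partition emits a (sum, count) pair; only these small pairs are combined."""
    partials = [(sum(p), len(p)) for p in parts]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def median_of_partition_medians(parts):
    """Naive combination of per-partition medians; in general NOT the true median."""
    return median(median(p) for p in parts)


def true_median(parts):
    """Exact median: every value has to be gathered and sorted together."""
    return median(v for p in parts for v in p)


print(distributed_mean(partitions))
print(median_of_partition_medians(partitions))
print(true_median(partitions))
```

On this data the naive per-partition combination and the exact median disagree, which is why an exact median generally requires a global sort or shuffle rather than a cheap merge of partial results.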
Which of the following is a benefit of using vectorized pandas UDFs instead of standard PySpark UDFs?
- A. The vectorized pandas UDFs allow for the use of type hints
- B. The vectorized pandas UDFs process data in batches rather than one row at a time
- C. The vectorized pandas UDFs allow for pandas API use inside of the function
- D. The vectorized pandas UDFs work on distributed DataFrames
- E. The vectorized pandas UDFs process data in memory rather than spilling to disk
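The row-at-a-time versus batch distinction mentioned in the options can be modelled without Spark. The sketch below is a plain-Python stand-in, not the PySpark API: in PySpark a pandas UDF receives a pandas Series per batch (transferred via Apache Arrow), whereas here an ordinary list plays that role. The names `add_tax_row`, `add_tax_batch`, and `batches`, and the batch size of 4, are assumptions made for illustration. It counts how many times each style of function is invoked for the same input.

```python
from itertools import islice

# Invocation counters for the two styles of function.
calls = {"row": 0, "batch": 0}


def add_tax_row(price):
    """Row-at-a-time style: invoked once for every single value."""
    calls["row"] += 1
    return price * 1.1


def add_tax_batch(prices):
    """Batch style: invoked once per chunk and processes the whole chunk."""
    calls["batch"] += 1
    return [p * 1.1 for p in prices]


def batches(values, size):
    """Yield consecutive chunks of at most `size` items."""
    it = iter(values)
    while chunk := list(islice(it, size)):
        yield chunk


prices = [float(i) for i in range(10)]

row_result = [add_tax_row(p) for p in prices]
batch_result = [y for chunk in batches(prices, 4) for y in add_tax_batch(chunk)]

print(calls)
```

Both approaches produce the same values, but the batch version crosses the function-call boundary far fewer times, which is the kind of per-row overhead that batching amortizes.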
A data scientist wants to tune a set of hyperparameters for a machine learning model. They have wrapped a Spark ML model in the objective function objective_function and they have defined the search space search_space.
As a result, they have the following code block:
Which of the following changes do they need to make to the above code block in order to accomplish the task?
- A. Change SparkTrials() to Trials()
- B. Reduce num_evals to be less than 10
- C. Change fmin() to fmax()
- D. Remove the trials=trials argument
- E. Remove the algo=tpe.suggest argument
A machine learning engineer would like to develop a linear regression model with Spark ML to predict the price of a hotel room. They are using the Spark DataFrame train_df to train the model.
The Spark DataFrame train_df has the following schema:
The machine learning engineer shares the following code block:
Which of the following changes does the machine learning engineer need to make to complete the task?
- A. They need to call the transform method on train_df
- B. They need to convert the features column to be a vector
- C. They do not need to make any changes
- D. They need to utilize a Pipeline to fit the model
- E. They need to split the features column out into one column for each feature
Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?
- A. Keras
- B. pandas
- C. PyTorch
- D. Spark ML
- E. Scikit-learn