Microsoft DP-100 Designing and Implementing a Data Science Solution on Azure Online Training
Microsoft DP-100 Online Training
The questions for DP-100 were last updated at Jan 21,2025.
- Exam Code: DP-100
- Exam Name: Designing and Implementing a Data Science Solution on Azure
- Certification Provider: Microsoft
- Latest update: Jan 21,2025
You are analyzing a dataset containing historical data from a local taxi company. You are developing a regression a regression model.
You must predict the fare of a taxi trip.
You need to select performance metrics to correctly evaluate the- regression model.
Which two metrics can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
- A . an F1 score that is high
- B . an R Squared value dose to 1
- C . an R-Squared value close to 0
- D . a Root Mean Square Error value that is high
- E . a Root Mean Square Error value that is low
- F . an F 1 score that is low.
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the valuation metric.
Which visualization should you use?
- A . Binary classification confusion matrix
- B . box plot
- C . Gradient descent
- D . coefficient of determination
You create a classification model with a dataset that contains 100 samples with Class A and 10,000 samples with Class B
The variation of Class B is very high.
You need to resolve imbalances.
Which method should you use?
- A . Partition and Sample
- B . Cluster Centroids
- C . Tomek links
- D . Synthetic Minority Oversampling Technique (SMOTE)
HOTSPOT
You have a dataset that contains 2,000 rows. You are building a machine learning classification model by using Azure Learning Studio. You add a Partition and Sample module to the experiment.
You need to configure the module.
You must meet the following requirements:
✑ Divide the data into subsets
✑ Assign the rows into folds using a round-robin method
✑ Allow rows in the dataset to be reused
How should you configure the module? To answer, select the appropriate options in the dialog box in the answer area. NOTE: Each correct selection is worth one point.
HOTSPOT
You are using the Azure Machine Learning Service to automate hyperparameter exploration of your neural network classification model.
You must define the hyperparameter space to automatically tune hyperparameters using random sampling according to following requirements:
✑ The learning rate must be selected from a normal distribution with a mean value of 10 and a standard deviation of 3.
✑ Batch size must be 16, 32 and 64.
✑ Keep probability must be a value selected from a uniform distribution between the range of 0.05 and 0.1.
You need to use the param_sampling method of the Python API for the Azure Machine Learning Service.
How should you complete the code segment? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
You are a data scientist building a deep convolutional neural network (CNN) for image classification.
The CNN model you built shows signs of overfitting.
You need to reduce overfitting and converge the model to an optimal fit.
Which two actions should you perform? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
- A . Reduce the amount of training data.
- B . Add an additional dense layer with 64 input units
- C . Add L1/L2 regularization.
- D . Use training data augmentation
- E . Add an additional dense layer with 512 input units.
You are with a time series dataset in Azure Machine Learning Studio.
You need to split your dataset into training and testing subsets by using the Split Data module.
Which splitting mode should you use?
- A . Regular Expression Split
- B . Split Rows with the Randomized split parameter set to true
- C . Relative Expression Split
- D . Recommender Split
HOTSPOT
You create an experiment in Azure Machine Learning Studio- You add a training dataset that contains 10.000 rows. The first 9.000 rows represent class 0 (90 percent). The first 1.000 rows represent class 1 (10 percent).
The training set is unbalanced between two Classes. You must increase the number of training examples for class 1 to 4,000 by using data rows. You add the Synthetic Minority Oversampling Technique (SMOTE) module to the experiment.
You need to configure the module.
Which values should you use? To answer, select the appropriate options in the dialog box in the answer area. NOTE: Each correct selection is worth one point.
You are performing clustering by using the K-means algorithm.
You need to define the possible termination conditions.
Which three conditions can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
- A . A fixed number of iterations is executed.
- B . The residual sum of squares (RSS) rises above a threshold.
- C . The sum of distances between centroids reaches a maximum.
- D . The residual sum of squares (RSS) falls below a threshold.
- E . Centroids do not change between iterations.
You are building a regression model tot estimating the number of calls during an event.
You need to determine whether the feature values achieve the conditions to build a Poisson regression model.
Which two conditions must the feature set contain? I ach correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
- A . The label data must be a negative value.
- B . The label data can be positive or negative,
- C . The label data must be a positive value
- D . The label data must be non discrete.
- E . The data must be whole numbers.