Microsoft DP-100 Designing and Implementing a Data Science Solution on Azure Online Training
Microsoft DP-100 Online Training
The questions for DP-100 were last updated at Jan 21,2025.
- Exam Code: DP-100
- Exam Name: Designing and Implementing a Data Science Solution on Azure
- Certification Provider: Microsoft
- Latest update: Jan 21,2025
You are developing a data science workspace that uses an Azure Machine Learning service.
You need to select a compute target to deploy the workspace.
What should you use?
- A . Azure Data Lake Analytics
- B . Azure Databrick .
- C . Apache Spark for HDInsight.
- D . Azure Container Service
You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset that has missing values in many columns. The data does not require the application of predictors for each column. You plan to use the Clean Missing Data module to handle the missing data.
You need to select a data cleaning method.
Which method should you use?
- A . Synthetic Minority Oversampling Technique (SMOTE)
- B . Replace using MICE
- C . Replace using; Probabilistic PCA
- D . Normalization
You are determining if two sets of data are significantly different from one another by using Azure Machine Learning Studio.
Estimated values in one set of data may be more than or less than reference values in the other set of data. You must produce a distribution that has a constant Type I error as a function of the correlation.
You need to produce the distribution.
Which type of distribution should you produce?
- A . Paired t-test with a two-tail option
- B . Unpaired t-test with a two tail option
- C . Paired t-test with a one-tail option
- D . Unpaired t-test with a one-tail option
HOTSPOT
You are developing a machine learning, experiment by using Azure.
The following images show the input and output of a machine learning experiment:
Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic. NOTE: Each correct selection is worth one point.
You are creating a machine learning model.
You need to identify outliers in the data.
Which two visualizations can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point. NOTE: Each correct selection is worth one point.
- A . box plot
- B . scatter
- C . random forest diagram
- D . Venn diagram
- E . ROC curve
HOTSPOT
You arc I mating a deep learning model to identify cats and dogs. You have 25,000 color images.
You must meet the following requirements:
• Reduce the number of training epochs.
• Reduce the size of the neural network.
• Reduce over-fitting of the neural network. You need to select the image modification values.
Which value should you use? To answer, select the appropriate Options in the answer area. NOTE: Each correct selection is worth one point.
HOTSPOT
You are performing sentiment analysis using a CSV file that includes 12,000 customer reviews written in a short sentence format. You add the CSV file to Azure Machine Learning Studio and configure it as the starting point dataset of an experiment. You add the Extract N-Gram Features from Text module to the experiment to extract key phrases from the customer review column in the dataset.
You must create a new n-gram dictionary from the customer review text and set the maximum n-gram size to trigrams.
What should you select? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
You are analyzing a dataset by using Azure Machine Learning Studio.
YOU need to generate a statistical summary that contains the p value and the unique value count for each feature column.
Which two modules can you users? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
- A . Execute Python Script
- B . Export Count Table
- C . Convert to Indicator Values
- D . Summarize Data
- E . Compute linear Correlation
You are building a binary classification model by using a supplied training set.
The training set is imbalanced between two classes.
You need to resolve the data imbalance.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution NOTE: Each correct selection is worth one point.
- A . Penalize the classification
- B . Resample the data set using under sampling or oversampling
- C . Generate synthetic samples in the minority class.
- D . Use accuracy as the evaluation metric of the model.
- E . Normalize the training feature set.
You are building recurrent neural network to perform a binary classification.
The training loss, validation loss, training accuracy, and validation accuracy of each training epoch has been provided. You need to identify whether the classification model is over fitted.
Which of the following is correct?
- A . The training loss increases while the validation loss decreases when training the model.
- B . The training loss decreases while the validation loss increases when training the model.
- C . The training loss stays constant and the validation loss decreases when training the model.
- D . The training loss .stays constant and the validation loss stays on a constant value and close to the training loss value when training the model.