SAS Institute A00-406 SAS Viya Supervised Machine Learning Pipelines Online Training

Question #1

Refer to the exhibit below:

Based on the output from the Data Exploration node shown in the exhibit, which variable has the thinnest tails (the most platykurtic distribution)?

A . Logi_rfm4
B . Logi_rfm6
C . Logi_rfm8
D . Logi_rfm12

Correct Answer: D
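
For reference, a platykurtic distribution has negative excess kurtosis, meaning thinner tails than a normal distribution. The sketch below computes excess kurtosis with the simple population formula on made-up samples (the sample lists and the function name are assumptions, not values from the exhibit). The Data Exploration node may apply a sample-size correction, so its reported values need not match this formula exactly.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0

# Hypothetical samples, not taken from the exhibit.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]      # evenly spread, thin tails
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]    # mass at the center plus two outliers

print(excess_kurtosis(flat))    # negative: platykurtic
print(excess_kurtosis(peaked))  # positive: leptokurtic
```

Among several variables, the one with the most negative value would be the most platykurtic.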

Question #2

Given the following properties for a neural network model, which statement is true regarding hidden units in the model? The following SAS program is submitted:

A . There are no hidden units in the model.
B . The number of hidden units is 1.
C . The number of hidden units is 50.
D . The number of hidden units is 26.

Correct Answer: D

Question #3

Which of the following is an example of a NoSQL database that is commonly used to store unstructured data?

A . MySQL
B . MongoDB
C . Oracle Database
D . Microsoft SQL Server

Correct Answer: B

Question #4

What is the primary goal of A/B testing in the context of model deployment?

A . To evaluate the model’s accuracy
B . To compare two different versions of a model or strategy to determine which performs better
C . To assess data quality
D . To create synthetic data

Correct Answer: B

Question #5

What does the term "bias" in machine learning refer to?

A . A model’s inability to generalize to new data
B . Systematic errors that cause a model to consistently underpredict or overpredict
C . The simplicity of a model
D . The overall accuracy of a model

Correct Answer: B

Question #6

What is the significance of the "bias-variance trade-off" in machine learning?

A . It represents the trade-off between underfitting and overfitting.
B . It indicates the trade-off between accuracy and precision.
C . It refers to the trade-off between the number of features and the model’s complexity.
D . It is not relevant in machine learning.

Correct Answer: A

Question #7

What is the purpose of cross-validation in model building and evaluation?

A . Splitting the dataset into training and testing sets
B . Reducing the dataset size
C . Assessing the model’s generalization performance
D . Generating synthetic data

Correct Answer: C
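
To illustrate how cross-validation estimates generalization, here is a minimal sketch of k-fold index splitting. The function name and the sample sizes are assumptions for illustration. Each observation is held out exactly once, and the model would be trained on the remaining folds each time. Rows are not shuffled here, which a real pipeline would usually do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

# Hypothetical example: 10 rows split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Averaging the assessment metric over the held-out folds gives the cross-validated estimate.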

Question #8

What is the purpose of a "canary release" in the context of model deployment?

A . To assess data quality
B . To deploy a new model version to a small subset of users or systems for testing
C . To create synthetic data
D . To evaluate model accuracy

Correct Answer: B
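
One way a canary release can be wired up is deterministic, hash-based routing that sends a small percentage of users to the new model version. The sketch below is only an illustration under that assumption; the function name, the user ID format, and the 5% share are hypothetical.

```python
import hashlib

def route_to_canary(user_id, canary_percent):
    """Return True if this user falls in the canary share (0 to 100 percent)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99 for each user
    return bucket < canary_percent

# Hypothetical traffic: 10,000 users with about 5% routed to the new model.
users = [f"user-{i}" for i in range(10000)]
canary_users = [u for u in users if route_to_canary(u, 5)]
print(len(canary_users))
```

Hashing the user ID keeps each user on the same model version across requests, which makes the canary group's results easier to compare with the rest.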

Question #9

When deploying a machine learning model, what is "model drift"?

A . A sudden increase in the model’s accuracy
B . A change in the distribution of the input data or target variable over time
C . The process of feature extraction
D . A measure of feature importance

Correct Answer: B
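
A shift in an input's distribution can be quantified in several ways. One commonly used measure is the population stability index (PSI), sketched below on binned proportions. The bin proportions and names are made up for illustration, and this is not necessarily the measure a given SAS tool reports.

```python
import math

def population_stability_index(expected_props, actual_props, eps=1e-6):
    """PSI = sum((actual - expected) * ln(actual / expected)) over bins."""
    psi = 0.0
    for e, a in zip(expected_props, actual_props):
        e = max(e, eps)  # guard against empty bins
        a = max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical share of rows per bin at training time vs. at scoring time.
training_bins = [0.25, 0.25, 0.25, 0.25]
scoring_bins = [0.10, 0.20, 0.30, 0.40]

print(population_stability_index(training_bins, training_bins))  # no shift
print(population_stability_index(training_bins, scoring_bins))   # shifted
```

Identical distributions give a PSI of 0, and larger values indicate a larger shift.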

Question #10

Which algorithm is commonly used for decision-making tasks in classification models?

A . K-Means
B . Decision Trees
C . Principal Component Analysis (PCA)
D . Linear Regression

Correct Answer: B

Question #11

What is the primary purpose of model documentation in the model deployment phase?

A . To create synthetic data
B . To assess data quality
C . To provide information on the model’s development, architecture, and usage
D . To evaluate the model’s accuracy

Correct Answer: C

Question #12

Which of the following best describes unstructured data?

A . Data that is organized in rows and columns
B . Data that is difficult to process and lacks a predefined structure
C . Data stored in a relational database
D . Data with a clear schema

Correct Answer: B

Question #13

What is the main advantage of using a RESTful (Representational State Transfer) API as a data source?

A . Real-time data processing
B . Support for complex data structures
C . Simple and standardized communication
D . High security features

Correct Answer: C

Question #14

Which statements are true for the F1 score?

(Choose 2.)

A . F1 score is calculated based on a depth value.
B . F1 score is calculated based on a cut off value.
C . F1 score is applicable to a model with a binary target.
D . F1 score is applicable to a model with an interval target.

Correct Answer: BC
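
The dependence on a cutoff can be seen in a small sketch: predicted event probabilities are turned into 0/1 classifications at a cutoff, and F1 is the harmonic mean of the resulting precision and recall. The data and function name below are hypothetical.

```python
def f1_at_cutoff(y_true, p_event, cutoff=0.5):
    """F1 score for a binary target after classifying at the given cutoff."""
    preds = [1 if p >= cutoff else 0 for p in p_event]
    tp = sum(1 for y, yhat in zip(y_true, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(y_true, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(y_true, preds) if y == 1 and yhat == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical binary target and predicted event probabilities.
y_true = [1, 1, 1, 0, 0, 0]
p_event = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]

print(f1_at_cutoff(y_true, p_event, cutoff=0.5))  # 2/3
print(f1_at_cutoff(y_true, p_event, cutoff=0.3))  # 6/7
```

Changing the cutoff changes the classifications, and with them the F1 value, which is why the score is tied to a cutoff rather than a depth.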

Question #15

Which type of model is commonly used for anomaly detection in datasets?

A . Decision Trees
B . Clustering Models
C . Linear Regression
D . Principal Component Analysis (PCA)

Correct Answer: B

Question #16

What does API stand for in the context of data sources?

A . Application Programming Interface
B . Advanced Programming Integration
C . Automated Program Integration
D . Application Program Interface

Correct Answer: A

Question #17

Which data source allows for real-time data streaming and processing?

A . Data warehouses
B . Cloud storage
C . IoT devices
D . Static data files

Correct Answer: C

Question #18

What is the main advantage of ensemble methods in model building?

A . They require minimal data preprocessing
B . They produce simple and interpretable models
C . They combine multiple models to improve predictive performance
D . They work well with high-dimensional data

Correct Answer: C
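
As a rough illustration of combining models, the sketch below takes a majority vote over the class labels predicted by three hypothetical models. The predictions are constructed so that each model makes different mistakes; real ensembles are not guaranteed to improve on every member.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine class labels from several models: most common label per row."""
    combined = []
    for row_votes in zip(*predictions_per_model):
        combined.append(Counter(row_votes).most_common(1)[0][0])
    return combined

def accuracy(truth, preds):
    return sum(1 for t, p in zip(truth, preds) if t == p) / len(truth)

# Hypothetical labels: each model errs on different rows.
truth = [1, 0, 1, 1]
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble, accuracy(truth, ensemble))
```

Because the individual errors fall on different rows, the vote corrects them in this toy case.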

Question #19

Which machine learning technique is typically used for building a model to predict a numeric target variable?

A . Classification
B . Regression
C . Clustering
D . Dimensionality reduction

Correct Answer: B

Question #20

Refer to the treemap shown in the exhibit below:

Which statement is true about the treemap for a decision tree with a binary target?

A . The top bar represents the node with the highest probability of event.
B . The darker bars represent nodes with a lower probability of event.
C . The top bar represents the node with the highest count.
D . The wider bars represent nodes with a higher probability of event.

Correct Answer: C

Question #21

What is the primary purpose of model assessment in the context of data science and machine learning?

A . Data preprocessing
B . Model building
C . Evaluating and selecting the best-performing model
D . Data visualization

Correct Answer: C

Question #22

What is the main advantage of ensemble learning methods, such as Random Forest, in a machine learning pipeline?

A . They are simple and easy to interpret.
B . They are not suitable for large datasets.
C . They combine multiple models to improve predictive performance.
D . They require minimal data preprocessing.

Correct Answer: C

Question #23

When deploying a machine learning model, what is meant by "model latency"?

A . The time it takes to build a model
B . The time it takes to train a model
C . The time it takes for the model to make predictions once deployed
D . The time it takes to create synthetic data

Correct Answer: C

Question #24

What is the purpose of a Receiver Operating Characteristic (ROC) curve in model assessment?

A . To evaluate regression models
B . To visualize data distribution
C . To compare a model’s true positive rate with the false positive rate
D . To measure feature importance

Correct Answer: C
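
The points of an ROC curve can be sketched by sweeping a cutoff and recording the false positive rate and true positive rate at each value. The data, cutoffs, and function name below are hypothetical.

```python
def roc_points(y_true, p_event, cutoffs):
    """Return (false positive rate, true positive rate) for each cutoff."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for c in cutoffs:
        tp = sum(1 for y, p in zip(y_true, p_event) if y == 1 and p >= c)
        fp = sum(1 for y, p in zip(y_true, p_event) if y == 0 and p >= c)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical binary target and predicted event probabilities.
y_true = [1, 1, 1, 0, 0, 0]
p_event = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]

points = roc_points(y_true, p_event, cutoffs=[1.0, 0.5, 0.0])
print(points)
```

The curve runs from (0, 0) at the strictest cutoff to (1, 1) at the loosest. A model that ranks events above non-events bends the curve toward the upper-left corner.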

Question #25

Which data source typically provides access to real-time financial market data?

A . Social media platforms
B . Weather stations
C . Stock market APIs
D . Online news websites

Correct Answer: C

Question #26

In model assessment, what does "cross-validation" aim to address?

A . Training a model
B . Overfitting and generalization
C . Data preprocessing
D . Model deployment

Correct Answer: B

Question #27

Which metric is commonly used to evaluate the performance of a regression model?

A . F1 Score
B . Mean Absolute Error (MAE)
C . Precision
D . Confusion Matrix

Correct Answer: B
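
MAE is the average of the absolute differences between actual and predicted values. A minimal sketch with made-up numbers:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical interval target and model predictions.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(actual, predicted)
print(mae)  # 0.75
```

The other options (F1 score, precision, confusion matrix) apply to classification targets rather than numeric ones.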

Question #28

What is "model reevaluation" in the model deployment phase?

A . The process of data preprocessing
B . The process of selecting features
C . The periodic assessment of a deployed model’s performance and potential retraining
D . The evaluation of data distribution

Correct Answer: C

Question #29

What is overfitting in machine learning, and how can it be addressed in a pipeline?

A . Overfitting occurs when the model is too simple and underperforms.
B . Overfitting occurs when the model fits the training data too closely and may not generalize well. It can be addressed by regularization techniques.
C . Overfitting occurs when the model is too complex and overperforms.
D . Overfitting is not a concern in machine learning pipelines.

Correct Answer: B
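
Regularization can be illustrated with the simplest possible case: a one-feature, no-intercept ridge (L2) fit, where the penalty shrinks the slope toward zero. The data and function name are hypothetical, and this is only one of several regularization techniques.

```python
def ridge_slope(xs, ys, penalty):
    """Slope of a no-intercept 1-D ridge fit: sum(x*y) / (sum(x*x) + penalty)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(ridge_slope(xs, ys, penalty=0.0))   # 2.0, ordinary least squares
print(ridge_slope(xs, ys, penalty=14.0))  # 1.0, slope shrunk by the penalty
```

A larger penalty constrains the model more strongly, trading a closer fit to the training data for less sensitivity to noise in it.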

Question #30

What is a data lake?

A . A data storage solution designed for high-speed data retrieval
B . A centralized repository for storing all structured and unstructured data at any scale
C . A specialized database for time-series data
D . A backup system for relational databases

Correct Answer: B