Exam4Training

SAS Institute A00-406 SAS Viya Supervised Machine Learning Pipelines Online Training

Question #1

Refer to the exhibit below:

Based on the output from the Data Exploration node shown in the exhibit, which variable has the thinnest tails (the most platykurtic distribution)?

  • A . Logi_rfm4
  • B . Logi_rfm6
  • C . Logi_rfm8
  • D . Logi_rfm12

Correct Answer: D
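
A platykurtic distribution is one whose excess kurtosis is negative, so the variable with the thinnest tails is the one with the lowest kurtosis value in the output. The sketch below computes a simple population excess kurtosis on two made-up samples; the function name and data are illustrative assumptions, and SAS may report a sample-adjusted statistic whose values differ slightly.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance^2, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / (m2 ** 2) - 3.0

# Hypothetical samples: one evenly spread, one concentrated with two outliers.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]

print(excess_kurtosis(flat))    # negative: thin tails (platykurtic)
print(excess_kurtosis(peaked))  # positive: heavy tails (leptokurtic)
```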
Question #2

Given the following properties for a neural network model, which statement is true regarding hidden units in the model? The following SAS program is submitted:

  • A . There are no hidden units in the model.
  • B . The number of hidden units is 1.
  • C . The number of hidden units is 50.
  • D . The number of hidden units is 26.

Correct Answer: D
Question #3

Which of the following is an example of a NoSQL database that is commonly used to store unstructured data?

  • A . MySQL
  • B . MongoDB
  • C . Oracle Database
  • D . Microsoft SQL Server

Correct Answer: B
Question #4

What is the primary goal of A/B testing in the context of model deployment?

  • A . To evaluate the model’s accuracy
  • B . To compare two different versions of a model or strategy to determine which performs better
  • C . To assess data quality
  • D . To create synthetic data

Correct Answer: B
Question #5

What does the term "bias" in machine learning refer to?

  • A . A model’s inability to generalize to new data
  • B . Systematic errors that cause a model to consistently underpredict or overpredict
  • C . The simplicity of a model
  • D . The overall accuracy of a model

Correct Answer: B
Question #6

What is the significance of the "bias-variance trade-off" in machine learning?

  • A . It represents the trade-off between underfitting and overfitting.
  • B . It indicates the trade-off between accuracy and precision.
  • C . It refers to the trade-off between the number of features and the model’s complexity.
  • D . It is not relevant in machine learning.

Correct Answer: A
Question #7

What is the purpose of cross-validation in model building and evaluation?

  • A . Splitting the dataset into training and testing sets
  • B . Reducing the dataset size
  • C . Assessing the model’s generalization performance
  • D . Generating synthetic data

Correct Answer: C
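
Cross-validation estimates generalization performance by holding out each part of the data once and training on the rest. A minimal sketch of how k-fold splits could be generated follows; the helper name `k_fold_indices` and the contiguous, unshuffled folds are assumptions made for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, valid_idx in folds:
    print("train:", train_idx, "validate:", valid_idx)
```

Each observation appears in exactly one validation fold, so averaging the validation metric over the folds uses every row for assessment once.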
Question #8

What is the purpose of a "canary release" in the context of model deployment?

  • A . To assess data quality
  • B . To deploy a new model version to a small subset of users or systems for testing
  • C . To create synthetic data
  • D . To evaluate model accuracy

Correct Answer: B
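
One way a canary release might route traffic is to hash each user ID into a bucket and send only a small percentage of buckets to the new model version. The sketch below is an assumption about how such routing could look, not a description of any specific platform; `routes_to_canary` and the user IDs are hypothetical.

```python
import hashlib

def routes_to_canary(user_id, canary_percent):
    """Deterministically assign a user to the canary group via a hash bucket."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < canary_percent

users = [f"user-{i}" for i in range(1000)]
canary_users = [u for u in users if routes_to_canary(u, 5)]
print(len(canary_users), "of", len(users), "users see the new model version")
```

Hashing keeps the assignment stable across requests, so the same user always sees the same version while the canary is evaluated.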
Question #9

When deploying a machine learning model, what is "model drift"?

  • A . A sudden increase in the model’s accuracy
  • B . A change in the distribution of the input data or target variable over time
  • C . The process of feature extraction
  • D . A measure of feature importance

Correct Answer: B
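
A change in the input distribution can be quantified by comparing binned proportions at training time with those seen in production. The sketch below uses the population stability index (PSI) as one common heuristic; the bin proportions are invented for illustration, and the small `eps` floor is an assumption added to avoid taking the log of zero.

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI over matching bins: sum of (actual - expected) * ln(actual / expected)."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # guard against empty bins
        a = max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical share of observations per bin for one input variable.
baseline = [0.25, 0.25, 0.25, 0.25]   # at training time
shifted = [0.10, 0.20, 0.30, 0.40]    # in production

print(population_stability_index(baseline, baseline))  # identical distributions
print(population_stability_index(baseline, shifted))   # larger value suggests drift
```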
Question #10

Which algorithm is commonly used for decision-making tasks in classification models?

  • A . K-Means
  • B . Decision Trees
  • C . Principal Component Analysis (PCA)
  • D . Linear Regression

Correct Answer: B

Question #11

What is the primary purpose of model documentation in the model deployment phase?

  • A . To create synthetic data
  • B . To assess data quality
  • C . To provide information on the model’s development, architecture, and usage
  • D . To evaluate the model’s accuracy

Correct Answer: C
Question #12

Which of the following best describes unstructured data?

  • A . Data that is organized in rows and columns
  • B . Data that is difficult to process and lacks a predefined structure
  • C . Data stored in a relational database
  • D . Data with a clear schema

Correct Answer: B
Question #13

What is the main advantage of using a RESTful API (Representational State Transfer) as a data source?

  • A . Real-time data processing
  • B . Support for complex data structures
  • C . Simple and standardized communication
  • D . High security features

Correct Answer: C
Question #14

Which statements are true for the F1 score?

(Choose 2.)

  • A . F1 score is calculated based on a depth value.
  • B . F1 score is calculated based on a cut off value.
  • C . F1 score is applicable to a model with a binary target.
  • D . F1 score is applicable to a model with an interval target.

Correct Answer: BC
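
The F1 score depends on a cutoff because predicted probabilities must first be turned into event/non-event classifications, which is also why it applies to a binary target. A minimal sketch, with made-up labels and probabilities, shows the score changing as the cutoff moves; the function name `f1_at_cutoff` is an assumption.

```python
def f1_at_cutoff(y_true, y_prob, cutoff=0.5):
    """F1 = harmonic mean of precision and recall at a given probability cutoff."""
    y_pred = [1 if p >= cutoff else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical binary target and predicted event probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]

print(f1_at_cutoff(y_true, y_prob, cutoff=0.5))
print(f1_at_cutoff(y_true, y_prob, cutoff=0.3))
```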
Question #15

Which type of model is commonly used for anomaly detection in datasets?

  • A . Decision Trees
  • B . Clustering Models
  • C . Linear Regression
  • D . Principal Component Analysis (PCA)

Correct Answer: B
Question #16

What does API stand for in the context of data sources?

  • A . Application Programming Interface
  • B . Advanced Programming Integration
  • C . Automated Program Integration
  • D . Application Program Interface

Correct Answer: A
Question #17

Which data source allows for real-time data streaming and processing?

  • A . Data warehouses
  • B . Cloud storage
  • C . IoT devices
  • D . Static data files

Correct Answer: C
Question #18

What is the main advantage of ensemble methods in model building?

  • A . They require minimal data preprocessing
  • B . They produce simple and interpretable models
  • C . They combine multiple models to improve predictive performance
  • D . They work well with high-dimensional data

Correct Answer: C
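
To illustrate how combining models can help, the sketch below applies simple majority voting to three hypothetical classifiers that each make one mistake on a different observation. Majority voting is only one ensemble strategy (others include averaging, bagging, and boosting), and all names and predictions here are invented.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine class predictions by taking the most common vote per observation."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

truth = [1, 0, 1, 1]
model_a = [0, 0, 1, 1]  # wrong on observation 1
model_b = [1, 1, 1, 1]  # wrong on observation 2
model_c = [1, 0, 0, 1]  # wrong on observation 3

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Because the individual errors fall on different observations, the vote corrects each of them in this example; errors that are shared across models would not be corrected.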
Question #19

Which machine learning technique is typically used for building a model to predict a numeric target variable?

  • A . Classification
  • B . Regression
  • C . Clustering
  • D . Dimensionality reduction

Correct Answer: B
Question #20

Refer to the treemap shown in the exhibit below:

Which statement is true about the treemap for a decision tree with a binary target?

  • A . The top bar represents the node with the highest probability of event.
  • B . The darker bars represent nodes with a lower probability of event.
  • C . The top bar represents the node with the highest count.
  • D . The wider bars represent nodes with a higher probability of event.

Correct Answer: C

Question #21

What is the primary purpose of model assessment in the context of data science and machine learning?

  • A . Data preprocessing
  • B . Model building
  • C . Evaluating and selecting the best-performing model
  • D . Data visualization

Correct Answer: C
Question #22

What is the main advantage of ensemble learning methods, such as Random Forest, in a machine learning pipeline?

  • A . They are simple and easy to interpret.
  • B . They are not suitable for large datasets.
  • C . They combine multiple models to improve predictive performance.
  • D . They require minimal data preprocessing.

Correct Answer: C
Question #23

When deploying a machine learning model, what is meant by "model latency"?

  • A . The time it takes to build a model
  • B . The time it takes to train a model
  • C . The time it takes for the model to make predictions once deployed
  • D . The time it takes to create synthetic data

Correct Answer: C
Question #24

What is the purpose of an ROC curve (Receiver Operating Characteristic) in model assessment?

  • A . To evaluate regression models
  • B . To visualize data distribution
  • C . To compare a model’s true positive rate with the false positive rate
  • D . To measure feature importance

Correct Answer: C
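
An ROC curve is traced by sweeping the classification cutoff and recording the false positive rate and true positive rate at each step. The sketch below computes those points for a small invented set of labels and scores; `roc_points` is a hypothetical helper, not a library function.

```python
def roc_points(y_true, y_score):
    """Return (false positive rate, true positive rate) pairs, one per cutoff."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # cutoff above every score: nothing classified as event
    for cutoff in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= cutoff and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= cutoff and t == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical binary target and model scores.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

points = roc_points(y_true, y_score)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```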
Question #25

Which data source typically provides access to real-time financial market data?

  • A . Social media platforms
  • B . Weather stations
  • C . Stock market APIs
  • D . Online news websites

Correct Answer: C
Question #26

In model assessment, what does "cross-validation" aim to address?

  • A . Training a model
  • B . Overfitting and generalization
  • C . Data preprocessing
  • D . Model deployment

Correct Answer: B
Question #27

Which metric is commonly used to evaluate the performance of a regression model?

  • A . F1 Score
  • B . Mean Absolute Error (MAE)
  • C . Precision
  • D . Confusion Matrix

Correct Answer: B
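
Mean Absolute Error is the average of the absolute differences between actual and predicted values, which is why it suits a numeric (interval) target. A short sketch with invented values:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical interval target and model predictions.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

print(mean_absolute_error(actual, predicted))  # (0.5 + 0 + 2 + 1) / 4
```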
Question #28

What is "model reevaluation" in the model deployment phase?

  • A . The process of data preprocessing
  • B . The process of selecting features
  • C . The periodic assessment of a deployed model’s performance and potential retraining
  • D . The evaluation of data distribution

Correct Answer: C
Question #29

What is overfitting in machine learning, and how can it be addressed in a pipeline?

  • A . Overfitting occurs when the model is too simple and underperforms.
  • B . Overfitting occurs when the model fits the training data too closely and may not generalize well. It can be addressed by regularization techniques.
  • C . Overfitting occurs when the model is too complex and overperforms.
  • D . Overfitting is not a concern in machine learning pipelines.

Correct Answer: B
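
As a small illustration of regularization, the sketch below fits a one-feature linear model without an intercept and adds an L2 (ridge) penalty, which shrinks the coefficient toward zero as the penalty grows. The single-feature closed form, the function name, and the data are simplifying assumptions chosen to keep the example short.

```python
def ridge_slope(x, y, penalty):
    """Slope minimizing sum((y - w*x)^2) + penalty * w^2 for a no-intercept model."""
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    return sum_xy / (sum_xx + penalty)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

print(ridge_slope(x, y, penalty=0.0))   # unregularized least-squares slope
print(ridge_slope(x, y, penalty=14.0))  # penalized slope is pulled toward zero
```

A smaller coefficient makes the model less sensitive to noise in the training data, at the cost of some fit on that data.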
Question #30

What is a data lake?

  • A . A data storage solution designed for high-speed data retrieval
  • B . A centralized repository for storing all structured and unstructured data at any scale
  • C . A specialized database for time-series data
  • D . A backup system for relational databases

Correct Answer: B