Google Professional Machine Learning Engineer Online Training
The questions for Professional Machine Learning Engineer were last updated on Feb 18, 2025.
- Exam Code: Professional Machine Learning Engineer
- Exam Name: Google Professional Machine Learning Engineer
- Certification Provider: Google
- Latest update: Feb 18, 2025
You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way.
Which strategy should you choose?
- A . Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.
- B . Separate each data scientist’s work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.
- C . Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.
- D . Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using.
You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables. You need to prioritize detection of fraudulent transactions while minimizing false positives.
Which optimization objective should you use when training the model?
- A . An optimization objective that minimizes Log loss
- B . An optimization objective that maximizes the Precision at a Recall value of 0.50
- C . An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value
- D . An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value
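As a point of reference for the precision-recall terminology in these options, the following is a minimal sketch of how the area under a precision-recall curve can be approximated as average precision. The function names, labels, and scores are hypothetical and exist only for illustration; this is not how AutoML Tables computes the metric internally, and the sketch assumes all scores are distinct.

```python
def precision_recall_points(labels, scores):
    """Return one (recall, precision) pair per example, ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    points = []
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        points.append((true_positives / total_positives, true_positives / rank))
    return points


def average_precision(labels, scores):
    """Step-wise approximation of the area under the precision-recall curve."""
    area = 0.0
    previous_recall = 0.0
    for recall, precision in precision_recall_points(labels, scores):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Hypothetical imbalanced data: 2 fraudulent transactions out of 10.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ap = average_precision(labels, scores)
```

Because precision and recall are both computed from the positive class only, this summary is not inflated by the large number of correctly rejected legitimate transactions.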
Your company manages a video sharing website where users can watch and upload videos. You need to create an ML model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your company’s website.
Which result should you use to determine whether the model is successful?
- A . The model predicts videos as popular if the user who uploads them has over 10,000 likes.
- B . The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.
- C . The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.
- D . The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.
You are working on a Neural Network-based project. The dataset provided to you has columns with different ranges. While preparing the data for model training, you discover that gradient optimization is having difficulty moving weights to a good solution.
What should you do?
- A . Use feature construction to combine the strongest features.
- B . Use the representation transformation (normalization) technique.
- C . Improve the data cleaning step by removing features with missing values.
- D . Change the partitioning step to reduce the dimension of the test set and have a larger training set.
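To illustrate what normalization does to columns with different ranges, here is a small sketch of z-score standardization using only the Python standard library. The column names and values are invented for the example; a real pipeline would normally compute the mean and standard deviation on the training split only and reuse them elsewhere.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical features on very different scales.
age = [20, 30, 40, 50, 60]
income = [20_000, 40_000, 60_000, 80_000, 100_000]

age_z = standardize(age)
income_z = standardize(income)
```

After rescaling, both columns span the same range, so a single learning rate produces gradient steps of comparable size for the weights attached to each feature.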
You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent.
Which data transformation strategy would likely improve the performance of your classifier?
- A . Write your data in TFRecords.
- B . Z-normalize all the numeric features.
- C . Oversample the fraudulent transaction 10 times.
- D . Use one-hot encoding on all categorical features.
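For readers unfamiliar with oversampling, the sketch below shows one simple way it could be done: each minority-class row is repeated a fixed number of times. The helper name, the `fraud` field, and the data are hypothetical; in practice the duplication is usually applied to the training split only, so that evaluation still reflects the real class ratio.

```python
import random


def oversample_minority(rows, label_key, minority_label, factor, seed=0):
    """Return a shuffled copy of rows in which each minority row appears `factor` times."""
    rng = random.Random(seed)
    result = []
    for row in rows:
        copies = factor if row[label_key] == minority_label else 1
        result.extend([row] * copies)
    rng.shuffle(result)
    return result


# Hypothetical dataset: 100 transactions, 1% fraudulent.
rows = [{"amount": i, "fraud": 1 if i == 0 else 0} for i in range(100)]

balanced = oversample_minority(rows, "fraud", 1, factor=10)
fraud_share = sum(r["fraud"] for r in balanced) / len(balanced)
```

With a factor of 10, the single fraudulent row becomes 10 of 109 rows, raising the fraud share from 1% to roughly 9%.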
You are developing an ML model intended to classify whether X-ray images indicate bone fracture risk. You have trained a ResNet architecture on Vertex AI using a TPU as an accelerator; however, you are unsatisfied with the training time and memory usage. You want to quickly iterate on your training code while making minimal changes to it. You also want to minimize the impact on the model's accuracy.
What should you do?
- A . Configure your model to use bfloat16 instead of float32
- B . Reduce the global batch size from 1024 to 256
- C . Reduce the number of layers in the model architecture
- D . Reduce the dimensions of the images used in the model
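As background on the number formats mentioned above: bfloat16 keeps the sign bit and all 8 exponent bits of float32 but only 7 of its 23 mantissa bits, so each value takes 2 bytes instead of 4. The sketch below emulates the conversion by truncating the float32 bit pattern. Truncation is a simplifying assumption (hardware typically rounds), and the helper names and the parameter count are made up for the example.

```python
import struct


def to_bfloat16_bits(x):
    """Keep the upper 16 bits of the float32 encoding (truncation, no rounding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def from_bfloat16_bits(bits16):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return x


value = 3.14159265
approx = from_bfloat16_bits(to_bfloat16_bits(value))

# Storage for a hypothetical tensor of one million values.
num_values = 1_000_000
bytes_fp32 = num_values * 4
bytes_bf16 = num_values * 2
```

The round trip loses precision in the low mantissa bits but preserves the exponent, so the representable range stays the same as float32 while memory per value is halved.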
Your task is to classify whether a company logo is present in an image. You found out that 96% of the data does not include a logo. You are dealing with a data imbalance problem.
Which metric should you use to evaluate the model?
- A . F1 Score
- B . RMSE
- C . F Score with higher precision weighting than recall
- D . F Score with higher recall weighting than precision
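The F-score variants in these options differ only in a weighting parameter, usually written beta. A minimal sketch of the general formula follows; the confusion-matrix counts are invented for illustration. A beta greater than 1 weights recall more heavily, a beta below 1 weights precision more heavily, and beta equal to 1 gives the standard F1 score.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score computed from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: precision = 0.75, recall = 0.60.
tp, fp, fn = 30, 10, 20

f1 = f_beta(tp, fp, fn, beta=1.0)
f2 = f_beta(tp, fp, fn, beta=2.0)       # recall-weighted
f_half = f_beta(tp, fp, fn, beta=0.5)   # precision-weighted
```

Because recall is the lower of the two values here, the recall-weighted score is the smallest of the three and the precision-weighted score is the largest.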
You need to train a regression model based on a dataset containing 50,000 records that is stored in BigQuery. The data includes a total of 20 categorical and numerical features with a target variable that can include negative values. You need to minimize effort and training time while maximizing model performance.
What approach should you take to train this regression model?
- A . Create a custom TensorFlow DNN model.
- B . Use BQML XGBoost regression to train the model
- C . Use AutoML Tables to train the model without early stopping.
- D . Use AutoML Tables to train the model with RMSLE as the optimization objective
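One detail in the question is that the target can be negative, which matters for log-based error metrics. The sketch below compares RMSE with RMSLE written in its common form using log(1 + y); the function names and data are hypothetical. Python's `math.log1p` raises `ValueError` for inputs of -1 or less, which is how the sketch detects that the metric is undefined.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; undefined for values <= -1."""
    squared = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical targets that include a negative value.
y_true = [-5.0, 0.0, 12.0]
y_pred = [-4.0, 1.0, 10.0]

rmse_value = rmse(y_true, y_pred)

try:
    rmsle(y_true, y_pred)
    rmsle_defined = True
except ValueError:
    # log1p(-5.0) is outside the domain of the logarithm.
    rmsle_defined = False
```

RMSE is well defined on this data, while the logarithmic variant fails as soon as it meets the negative target.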
Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports autoscaling and monitoring for online prediction requests.
Which platform components should you choose for this system?
- A . Vertex AI Pipelines and App Engine
- B . Vertex AI Pipelines and AI Platform Prediction
- C . Cloud Composer, BigQuery ML, and AI Platform Prediction
- D . Cloud Composer, AI Platform Training with custom containers, and App Engine
While monitoring your model training’s GPU utilization, you discover that you have a native synchronous implementation. The training data is split into multiple files. You want to reduce the execution time of your input pipeline.
What should you do?
- A . Increase the CPU load
- B . Add caching to the pipeline
- C . Increase the network bandwidth
- D . Add parallel interleave to the pipeline
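To show what interleaving across multiple input files means, here is a standard-library sketch of the idea: several shards are read concurrently and their records are merged round-robin. This is only an analogy, not the TensorFlow API (in `tf.data` the corresponding operation is `Dataset.interleave` with `num_parallel_calls`). The helper names and shard files are hypothetical, and unlike a streaming pipeline this version loads each file fully into memory.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest


def read_records(path):
    """Read one shard and return its lines as records."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def parallel_interleave(paths, max_workers=4):
    """Read shards concurrently, then merge their records round-robin."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_file = list(pool.map(read_records, paths))
    missing = object()  # sentinel for shards that run out early
    return [
        record
        for group in zip_longest(*per_file, fillvalue=missing)
        for record in group
        if record is not missing
    ]


# Write three small hypothetical shards, then read them back interleaved.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for shard in range(3):
        path = os.path.join(tmp, f"shard-{shard}.txt")
        with open(path, "w") as f:
            f.write("".join(f"s{shard}-r{i}\n" for i in range(2)))
        paths.append(path)
    records = parallel_interleave(paths)
```

The reads overlap instead of running one file after another, which is the effect a parallel interleave stage aims for when file I/O is the bottleneck.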