Google Professional Machine Learning Engineer Online Training
The questions for Professional Machine Learning Engineer were last updated on Feb 20, 2025.
- Exam Code: Professional Machine Learning Engineer
- Exam Name: Google Professional Machine Learning Engineer
- Certification Provider: Google
- Latest update: Feb 20, 2025
You are profiling your TensorFlow model's training time and notice a performance issue caused by an inefficient input data pipeline: the dataset is a single 5-terabyte CSV file stored on Cloud Storage. You need to optimize the input pipeline's performance.
Which action should you try first to increase the efficiency of your pipeline?
- A . Preprocess the input CSV file into a TFRecord file.
- B . Randomly select a 10 gigabyte subset of the data to train your model.
- C . Split into multiple CSV files and use a parallel interleave transformation.
- D . Set the reshuffle_each_iteration parameter to true in the tf.data.Dataset.shuffle method.
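The parallel interleave in option C can be illustrated conceptually. The sketch below is a plain-Python stand-in for `tf.data.Dataset.interleave`: it round-robins across several shard readers so that no single file reader becomes the bottleneck. The `read_csv_shard` generator is a hypothetical stand-in that yields synthetic rows, not a real CSV parser.

```python
def read_csv_shard(path):
    # Hypothetical shard reader: yields synthetic rows in place of
    # actually parsing a CSV shard from Cloud Storage.
    for i in range(3):
        yield f"{path}:row{i}"

def interleave(shard_readers):
    """Round-robin over several shard readers, loosely mimicking
    tf.data.Dataset.interleave: rows from different files are mixed,
    so a single slow reader cannot stall the whole pipeline."""
    iterators = [iter(r) for r in shard_readers]
    while iterators:
        for it in list(iterators):
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)

rows = list(interleave([read_csv_shard("a.csv"), read_csv_shard("b.csv")]))
```

In real `tf.data` code the equivalent pattern is `tf.data.Dataset.list_files(...).interleave(tf.data.TextLineDataset, cycle_length=N, num_parallel_calls=tf.data.AUTOTUNE)`.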
Your company manages an application that aggregates news articles from many different online sources and sends them to users. You need to build a recommendation model that will suggest articles to readers that are similar to the articles they are currently reading.
Which approach should you use?
- A . Create a collaborative filtering system that recommends articles to a user based on the user’s past behavior.
- B . Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.
- C . Build a logistic regression model for each user that predicts whether an article should be recommended to a user.
- D . Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories.
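The similarity lookup in option B can be sketched with stdlib Python: embed each article as a vector (e.g. word2vec averaged over tokens) and recommend the stored article whose vector is closest to the current article's vector by cosine similarity. The two-dimensional embeddings below are toy values for illustration only.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(query_vec, article_vecs):
    """Return the id of the stored article most similar to query_vec."""
    return max(article_vecs,
               key=lambda aid: cosine_similarity(query_vec, article_vecs[aid]))

# Toy article embeddings (in practice these come from word2vec or similar).
articles = {"sports_story": [1.0, 0.0], "finance_story": [0.0, 1.0]}
```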
You work for a large social network service provider whose users post articles and discuss news. Millions of comments are posted online each day, and more than 200 human moderators constantly review comments and flag those that are inappropriate. Your team is building an ML model to help human moderators check content on the platform. The model scores each comment and flags suspicious comments to be reviewed by a human.
Which metric(s) should you use to monitor the model’s performance?
- A . Number of messages flagged by the model per minute
- B . Number of messages flagged by the model per minute confirmed as being inappropriate by humans.
- C . Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review
- D . Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute
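The distinction between options C and D rests on how precision and recall are computed: sampling only model-flagged messages can estimate precision, but false negatives (inappropriate comments the model never flagged) are invisible in that sample, so recall requires sampling raw traffic. A minimal sketch of the two metrics from confusion counts:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN).
    Returns 0.0 for an undefined ratio rather than dividing by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```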
You have been given a dataset with sales predictions based on your company’s marketing activities. The data is structured and stored in BigQuery, and has been carefully managed by a team of data analysts. You need to prepare a report providing insights into the predictive capabilities of the data. You were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks. You only have a few hours to gather the results of your experiments.
Which Google Cloud tools should you use to complete this task in the most efficient and self-serviced way?
- A . Use BigQuery ML to run several regression models, and analyze their performance.
- B . Read the data from BigQuery using Dataproc, and run several models using SparkML.
- C . Use Vertex AI Workbench user-managed notebooks with scikit-learn code for a variety of ML algorithms and performance metrics.
- D . Train a custom TensorFlow model with Vertex AI, reading the data from BigQuery featuring a variety of ML algorithms.
You are an ML engineer at a bank. You have developed a binary classification model using AutoML Tables to predict whether a customer will make loan payments on time. The output is used to approve or reject loan requests. One customer’s loan request has been rejected by your model, and the bank’s risks department is asking you to provide the reasons that contributed to the model’s decision.
What should you do?
- A . Use local feature importance from the predictions.
- B . Use the correlation with target values in the data summary page.
- C . Use the feature importance percentages in the model evaluation page.
- D . Vary features independently to identify the threshold per feature that changes the classification.
You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year. Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to AI Platform. You need to determine which customer attribute has the most predictive power for each prediction served by the model.
What should you do?
- A . Use AI Platform notebooks to perform a Lasso regression analysis on your model, which will eliminate features that do not provide a strong signal.
- B . Stream prediction results to BigQuery. Use BigQuery’s CORR (X1, X2) function to calculate the Pearson correlation coefficient between each feature and the target variable.
- C . Use the AI Explanations feature on AI Platform. Submit each prediction request with the ‘explain’ keyword to retrieve feature attributions using the sampled Shapley method.
- D . Use the What-If tool in Google Cloud to determine how your model will perform when individual features are excluded. Rank the feature importance in order of those that caused the most significant performance drop when removed from the model.
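The sampled Shapley method referenced in option C can be sketched in plain Python: average each feature's marginal contribution over random orderings in which features are switched one at a time from a baseline value to the instance's value. The toy linear model below is an illustration, not the AI Explanations implementation.

```python
import random

def sampled_shapley(model, instance, baseline, n_samples=100, seed=0):
    """Approximate Shapley values by averaging each feature's marginal
    contribution to the model output over random feature orderings."""
    rng = random.Random(seed)
    features = list(instance)
    phi = {f: 0.0 for f in features}
    for _ in range(n_samples):
        order = features[:]
        rng.shuffle(order)
        current = dict(baseline)
        prev = model(current)
        for f in order:
            current[f] = instance[f]   # switch this feature from baseline to instance value
            val = model(current)
            phi[f] += val - prev       # marginal contribution in this ordering
            prev = val
    return {f: total / n_samples for f, total in phi.items()}

# Toy linear model standing in for the deployed classifier's score.
toy_model = lambda x: 2 * x["a"] + 3 * x["b"]
phi = sampled_shapley(toy_model, {"a": 1.0, "b": 1.0}, {"a": 0.0, "b": 0.0})
```

For an additive model like this toy example, the attributions recover the weights exactly; for real models the estimate converges as `n_samples` grows.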
You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed.
Which metrics would give you the most confidence in your model?
- A . F-score where recall is weighed more than precision
- B . RMSE
- C . F1 score
- D . F-score where precision is weighed more than recall
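The trade-off in options A, C, and D is captured by the F-beta score: beta > 1 weights recall more heavily, beta < 1 weights precision more heavily, and beta = 1 is the ordinary F1. A minimal sketch:

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall.
    beta > 1 favors recall; beta < 1 favors precision; beta == 1 is F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```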
You work on the data science team for a multinational beverage company. You need to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. You are provided with historical data that includes product types, product sales volumes, expenses, and profits for all regions.
What should you use as the input and output for your model?
- A . Use latitude, longitude, and product type as features. Use profit as model output.
- B . Use latitude, longitude, and product type as features. Use revenue and expenses as model outputs.
- C . Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use profit as model output.
- D . Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use revenue and expenses as model outputs.
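The "feature cross of latitude with longitude, followed by binning" from options C and D can be sketched in plain Python: bucket each coordinate into a fixed number of bins, then combine the two bucket indices into a single categorical cell id, so the model can learn location-specific effects. The bin count and ranges here are illustrative choices, not prescribed values.

```python
def bucketize(value, lo, hi, n_bins):
    """Map a continuous value to an integer bin index in [0, n_bins)."""
    value = min(max(value, lo), hi)
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)

def lat_lng_cross(lat, lng, n_bins=10):
    """Cross the binned latitude and longitude into one categorical id,
    so each grid cell on the map becomes its own feature value."""
    return (bucketize(lat, -90.0, 90.0, n_bins) * n_bins
            + bucketize(lng, -180.0, 180.0, n_bins))
```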