You lead a data science team at a large international corporation. Most of the models your team trains are large-scale models using high-level TensorFlow APIs on AI Platform with GPUs. Your team usually
takes a few weeks or months to iterate on a new version of a model. You were recently asked to review your team’s spending.
How should you reduce your Google Cloud compute costs without impacting the model’s performance?
A . Use AI Platform to run distributed training jobs with checkpoints.
B . Use AI Platform to run distributed training jobs without checkpoints.
C . Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs with checkpoints.
D . Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs without checkpoints.
Answer: C
Explanation:
Option A is incorrect because running distributed training jobs on AI Platform with checkpoints does not by itself reduce compute costs: the on-demand AI Platform resources are still billed at full price, and storing checkpoints adds a small storage cost on top.
Option B is incorrect because using AI Platform to run distributed training jobs without checkpoints may reduce the compute costs, but it also risks losing the progress of the training if the job fails or is interrupted.
Option C is correct because migrating training to Kubeflow on Google Kubernetes Engine and using preemptible VMs with checkpoints can cut compute costs significantly: preemptible VMs are priced far below on-demand instances, and checkpoints preserve training state so a job can resume where it left off whenever a VM is preempted.
Option D is incorrect because using preemptible VMs without checkpoints may reduce compute costs, but any preemption discards all training progress, forcing the job to restart from scratch.
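The checkpoint-and-resume pattern that makes preemptible VMs safe can be sketched as follows. This is a minimal, framework-agnostic illustration in plain Python; in a real TensorFlow job you would instead use `tf.train.Checkpoint` or the Keras `BackupAndRestore` callback, and write checkpoints to Cloud Storage rather than a local file. The file path, step counter, and toy "weights" here are illustrative assumptions, not part of any Google Cloud API.

```python
import json
import os
import tempfile

# Illustrative local checkpoint path; a real job would use a GCS bucket.
CKPT = os.path.join(tempfile.gettempdir(), "train_ckpt.json")

def save_checkpoint(step, weights):
    # Persist training state so a preempted VM can resume instead of restarting.
    with open(CKPT, "w") as f:
        json.dump({"step": step, "weights": weights}, f)

def load_checkpoint():
    # Resume from the last saved state if one exists; otherwise start fresh.
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            state = json.load(f)
        return state["step"], state["weights"]
    return 0, [0.0]

def train(total_steps):
    step, weights = load_checkpoint()
    while step < total_steps:
        weights = [w + 0.1 for w in weights]  # stand-in for one training update
        step += 1
        save_checkpoint(step, weights)        # cheap insurance against preemption
    return step, weights
```

If the VM is preempted mid-run, the next invocation of `train` picks up at the last saved step, which is why option C keeps preemptible savings without sacrificing training progress.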
Reference:
- Kubeflow on Google Cloud
- Using preemptible VMs and GPUs
- Saving and loading models