You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure.
What should you do?
A . Significantly increase the max_batch_size TensorFlow Serving parameter
B . Switch to the tensorflow-model-server-universal version of TensorFlow Serving
C . Significantly increase the max_enqueued_batches TensorFlow Serving parameter
D . Recompile TensorFlow Serving from source to enable CPU-specific optimizations, and instruct GKE to choose an appropriate baseline minimum CPU platform for the serving nodes
Answer: D
Explanation:
Prebuilt TensorFlow Serving binaries are compiled for broad CPU compatibility and do not use newer instruction sets such as AVX2 and FMA. Recompiling from source with CPU-specific optimizations enabled lets the model server exploit those instructions, and setting a minimum CPU platform for the GKE node pool guarantees that every serving node actually supports them. This reduces per-request compute time, and therefore latency, without changing the serving architecture. By contrast, increasing max_batch_size or max_enqueued_batches trades latency for throughput, and the universal build of TensorFlow Serving is the least optimized variant.
https://www.tensorflow.org/tfx/serving/performance
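As a rough sketch of what option D involves (cluster and node-pool names here are hypothetical, and the exact build flags depend on the CPUs in your node pool):

```shell
# Build TensorFlow Serving from source with CPU-specific optimizations
# (AVX2 and FMA shown as examples), using the official devel Dockerfile.
git clone https://github.com/tensorflow/serving
cd serving
docker build --pull -t my-org/tensorflow-serving-devel \
  --build-arg TF_SERVING_BUILD_OPTIONS="--copt=-mavx2 --copt=-mfma" \
  -f tensorflow_serving/tools/docker/Dockerfile.devel .

# Ensure the GKE serving nodes support those instructions by setting a
# baseline minimum CPU platform on the node pool (names are placeholders).
gcloud container node-pools create serving-pool \
  --cluster my-serving-cluster \
  --min-cpu-platform "Intel Skylake"
```

Note that `--copt=-march=native` is only safe when the build machine matches the serving nodes; pinning the minimum CPU platform is what makes the optimized binary safe to run fleet-wide.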