You are an ML engineer at a mobile gaming company. A data scientist on your team recently trained a TensorFlow model, and you are responsible for deploying this model into a mobile application. You discover that the inference latency of the current model doesn’t meet production requirements. You need to reduce the inference time by 50%, and you are willing to accept a small decrease in model accuracy in order to reach the latency requirement.
Without training a new model, which model optimization technique for reducing latency should you try first?
A. Weight pruning
B. Dynamic range quantization
C. Model distillation
D. Dimensionality reduction
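Answer: B
Dynamic range quantization is a post-training technique applied when the model is converted to TensorFlow Lite. It needs no retraining and no calibration data, and it usually trades a small loss in accuracy for lower latency and a smaller model. Model distillation requires training a new student model, which the question rules out.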
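To make option B concrete, here is a minimal pure-Python sketch of the weight side of dynamic range quantization: float weights are mapped to 8-bit integers with a single scale factor, then mapped back to approximate floats at inference time. The function names (`quantize_weights`, `dequantize`) and the sample weights are hypothetical and chosen only for illustration; this is not TensorFlow Lite's actual implementation. In TensorFlow Lite itself, this mode is typically enabled by setting `converter.optimizations = [tf.lite.Optimize.DEFAULT]` on a `tf.lite.TFLiteConverter` before calling `convert()`.

```python
def quantize_weights(weights):
    """Symmetric per-tensor int8 quantization of a list of float weights.

    Illustrative assumption: one scale for the whole tensor, values
    clamped to [-127, 127].
    """
    max_abs = max(abs(w) for w in weights)
    # Avoid dividing by zero when every weight is 0.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, standing in for one layer of a trained model.
weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_weights(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The small rounding error is the source of the accuracy loss the question says is acceptable, while the 8-bit weights are what reduce model size and allow faster integer kernels.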