You are a data scientist at a retail company responsible for deploying a machine learning model that predicts customer purchase behavior. The model needs to serve real-time predictions with low latency to support the company’s recommendation engine on its e-commerce platform. The deployment solution must also be scalable to handle varying traffic loads during peak shopping periods, such as Black Friday and holiday sales. Additionally, you need to monitor the model’s performance and automatically roll out updates when a new version of the model is available.
Given these requirements, which AWS deployment service and configuration is the MOST SUITABLE for deploying the machine learning model?
A. Deploy the model on Amazon EC2 instances with a load balancer to distribute traffic, manually scaling the instances based on expected traffic during peak periods
B. Deploy the model on Amazon SageMaker with batch transform jobs, running the jobs periodically to generate predictions and storing the results in Amazon S3 for the recommendation engine
C. Deploy the model using Amazon SageMaker real-time hosting services with an auto-scaling endpoint, enabling you to automatically adjust the number of instances based on traffic demand
D. Use AWS Lambda to deploy the model as a serverless function, automatically scaling based on the number of requests, and store the model artifacts in Amazon S3
Answer: C
Explanation:
Correct option:
Deploy the model using Amazon SageMaker real-time hosting services with an auto-scaling endpoint, enabling you to automatically adjust the number of instances based on traffic demand
Amazon SageMaker hosting services provide a managed environment for deploying models in real-time, with support for auto-scaling based on traffic. This ensures low latency and scalability during peak periods, making it ideal for an e-commerce platform with fluctuating traffic. SageMaker also offers monitoring and versioning capabilities, allowing you to manage model updates efficiently.
Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support autoscaling.
via - https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html
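As a concrete illustration, auto-scaling for a SageMaker endpoint variant is configured through the Application Auto Scaling API. The sketch below builds the two request payloads involved, using the documented `sagemaker:variant:DesiredInstanceCount` dimension and the predefined `SageMakerVariantInvocationsPerInstance` metric; the endpoint name, variant name, capacity limits, and target value are hypothetical values for illustration.

```python
# Hypothetical endpoint and variant names for illustration only.
endpoint_name = "purchase-predictor"
variant_name = "AllTraffic"

# Application Auto Scaling identifies a SageMaker variant by this resource ID.
resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

# Payload for register_scalable_target: bound the instance count.
scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,   # keep at least one instance for low-latency serving
    "MaxCapacity": 8,   # cap spend during peaks such as Black Friday
}

# Payload for put_scaling_policy: target-tracking on invocations per instance.
scaling_policy = {
    "PolicyName": "InvocationsTargetTracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,  # assumed target: ~100 invocations/instance/min
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,  # scale in slowly to avoid thrashing
        "ScaleOutCooldown": 60,  # scale out quickly for traffic spikes
    },
}

# With valid AWS credentials, these payloads would be passed as keyword
# arguments to the Application Auto Scaling client, e.g.:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**scalable_target)
#   client.put_scaling_policy(**scaling_policy)
```

The target-tracking policy lets SageMaker add instances automatically as invocation volume rises and remove them when traffic subsides, which is exactly the behavior the scenario requires.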
Incorrect options:
Deploy the model on Amazon EC2 instances with a load balancer to distribute traffic, manually scaling the instances based on expected traffic during peak periods – Deploying on Amazon EC2 with a load balancer can work, but it requires manual scaling, which may not be responsive enough to sudden traffic spikes. It also lacks the integrated monitoring and management features provided by SageMaker.
Use AWS Lambda to deploy the model as a serverless function, automatically scaling based on the number of requests, and store the model artifacts in Amazon S3 – AWS Lambda is suitable for serverless deployments with lightweight models and use cases where request volume is unpredictable.
However, it may not be ideal for high-throughput, low-latency applications like real-time recommendation engines, especially if the model is large or requires significant compute resources.
Deploy the model on Amazon SageMaker with batch transform jobs, running the jobs periodically to generate predictions and storing the results in Amazon S3 for the recommendation engine – Batch transform jobs in SageMaker are designed for batch predictions rather than real-time inference. While this approach works for generating predictions in bulk, it does not meet the requirement for real-time, low-latency predictions needed by the recommendation engine.
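The contrast between the two SageMaker options can be made concrete by comparing their request shapes: a real-time prediction is a single synchronous `InvokeEndpoint` call, while batch transform is a `CreateTransformJob` request that reads from and writes to S3 asynchronously. All names, S3 URIs, and feature fields below are hypothetical placeholders.

```python
import json

# Hypothetical names and S3 paths for illustration only.

# Real-time: one synchronous request per customer interaction.
invoke_request = {
    "EndpointName": "purchase-predictor",
    "ContentType": "application/json",
    "Body": json.dumps({"customer_id": "C123",
                        "recent_views": ["sku-1", "sku-2"]}),
}

# Batch: a job over an entire S3 prefix, with results landing in S3 later.
batch_request = {
    "TransformJobName": "purchase-predictor-nightly",
    "ModelName": "purchase-predictor",
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-bucket/batch-input/",
            }
        },
        "ContentType": "application/json",
    },
    "TransformOutput": {"S3OutputPath": "s3://example-bucket/batch-output/"},
    "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
}

# Real-time: boto3.client("sagemaker-runtime").invoke_endpoint(**invoke_request)
#   returns a prediction in the response body, suitable for live page rendering.
# Batch:    boto3.client("sagemaker").create_transform_job(**batch_request)
#   completes minutes later and only writes files to S3 -- too slow for a
#   recommendation engine that must respond within a page load.
```

This is why option B fails the low-latency requirement even though it uses the same underlying model.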
References:
https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html
https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html