A Machine Learning Specialist needs to create a data repository to hold a large amount of time-based training data for a new model. In the source system, new files are added every hour. Throughout a single 24-hour period, the volume of hourly updates will change significantly. The Specialist always wants to train on the last 24 hours of the data.
Which type of data repository is the MOST cost-effective solution?
A. An Amazon EBS-backed Amazon EC2 instance with hourly directories
B. An Amazon RDS database with hourly table partitions
C. An Amazon S3 data lake with hourly object prefixes
D. An Amazon EMR cluster with hourly Hive partitions on Amazon EBS volumes
Answer: C
Explanation:
An Amazon S3 data lake is the most cost-effective solution for storing and analyzing a large amount of time-based training data. Amazon S3 is a highly scalable, durable, and secure object storage service that can store any amount of data in any format, and you pay only for the storage you use — there is no compute instance to keep running, unlike the EC2, RDS, and EMR options. Amazon S3 also offers lower-cost storage classes, such as S3 Standard-IA and S3 One Zone-IA, that reduce costs for infrequently accessed data. By using hourly object prefixes, the Specialist can organize the data into logical partitions based on ingestion time, which enables efficient data access and management: selecting the last 24 hours of data becomes a matter of listing the 24 most recent hourly prefixes. The Specialist can also use S3 Lifecycle policies to automatically transition older data to lower-cost storage classes or delete it after a set period, so the repository holds only what is needed for training while keeping storage costs optimized.
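To make the hourly-prefix idea concrete, here is a minimal sketch of how the prefixes covering the last 24 hours could be computed. The `training-data/YYYY/MM/DD/HH/` layout is an assumption for illustration, not something specified in the question; any time-ordered prefix scheme would work the same way.

```python
from datetime import datetime, timedelta, timezone

def last_24h_prefixes(base="training-data/", now=None):
    """Return the 24 hourly S3 object prefixes covering the last 24 hours.

    Assumes objects are keyed like training-data/2024/01/02/05/<file>,
    i.e. one prefix per UTC hour of ingestion.
    """
    now = now or datetime.now(timezone.utc)
    start = now.replace(minute=0, second=0, microsecond=0)
    return [
        (start - timedelta(hours=h)).strftime(f"{base}%Y/%m/%d/%H/")
        for h in range(24)
    ]

# Each returned prefix could then be passed to an S3 list call
# (e.g. list_objects_v2 with Prefix=...) to gather the training files.
prefixes = last_24h_prefixes(
    now=datetime(2024, 1, 2, 5, 30, tzinfo=timezone.utc)
)
# prefixes[0]  -> "training-data/2024/01/02/05/"  (current hour)
# prefixes[23] -> "training-data/2024/01/01/06/"  (24 hours back)
```

Because each hour's data lives under its own prefix, a training job never needs to scan or filter the whole bucket; it lists exactly 24 prefixes regardless of how much the hourly volume fluctuates.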
Reference: What is a data lake? – Amazon Web Services
Amazon S3 Storage Classes – Amazon Simple Storage Service
Managing your storage lifecycle – Amazon Simple Storage Service
Best Practices Design Patterns: Optimizing Amazon S3 Performance