What should the Specialist do to optimize the data for training on SageMaker?

A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The dataset is transformed into a numpy.array, which appears to be negatively affecting the speed of the training.

What should the Specialist do to optimize the data for training on SageMaker?
A. Use the SageMaker batch transform feature to transform the training data into a DataFrame
B. Use AWS Glue to compress the data into the Apache Parquet format
C. Transform the dataset into the RecordIO-protobuf format
D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data

Answer: C

Explanation:

RecordIO-protobuf is a binary data format that SageMaker's built-in algorithms are optimized to read. Many of those algorithms accept it as a content type (application/x-recordio-protobuf) for both training and inference. For training, it lets them use Pipe mode, in which data streams from Amazon S3 instead of being downloaded in full before the job starts. This typically gives faster data loading and a smaller local footprint than CSV or in-memory numpy arrays. Each record carries its own length, can hold dense or sparse feature tensors, and can store the label alongside the features. The data must be serialized into this format before training and deserialized when read back, for example with the helper functions in the SageMaker Python SDK.
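To make the serialize/deserialize step more concrete, here is a minimal standard-library sketch of RecordIO-style framing: each record is written as a magic number, a length, the payload, and zero padding up to a 4-byte boundary. The magic value, the little-endian layout, and the function names `write_record`, `read_records`, and `roundtrip` are assumptions made for illustration, and the payloads are placeholder bytes rather than real protobuf Record messages. In practice the SageMaker Python SDK provides helpers such as `write_numpy_to_dense_tensor` in `sagemaker.amazon.common`, which should be preferred over hand-rolled code.

```python
import io
import struct

# Assumed RecordIO magic number; treat as illustrative, not authoritative.
MAGIC = 0xCED7230A


def write_record(stream, payload: bytes) -> None:
    """Write one framed record: magic, length, payload, zero padding."""
    length = len(payload)
    pad = (-length) % 4  # bytes needed to reach a 4-byte boundary
    stream.write(struct.pack("<II", MAGIC, length))
    stream.write(payload)
    stream.write(b"\x00" * pad)


def read_records(stream):
    """Yield payloads one at a time until the stream is exhausted."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # clean end of stream
        magic, length = struct.unpack("<II", header)
        if magic != MAGIC:
            raise ValueError("unexpected magic number in record header")
        payload = stream.read(length)
        stream.read((-length) % 4)  # skip the padding bytes
        yield payload


def roundtrip(payloads):
    """Frame the payloads into a buffer, then read them back."""
    buf = io.BytesIO()
    for p in payloads:
        write_record(buf, p)
    size = buf.tell()
    buf.seek(0)
    return list(read_records(buf)), size
```

Because every record is length-prefixed, a reader can consume the stream sequentially without loading the whole dataset, which is the property that makes this kind of format a good fit for streaming input.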

References:

Common Data Formats for Training

Using RecordIO Format

Content Types Supported by Built-in Algorithms