What should the Specialist do to optimize the data for training on SageMaker?
A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The Specialist has transformed the dataset into a numpy.array, which appears to be negatively affecting the speed of the training.
What should the Specialist do to optimize the data for training on SageMaker?
A. Use the SageMaker batch transform feature to transform the training data into a DataFrame
B. Use AWS Glue to compress the data into the Apache Parquet format
C. Transform the dataset into the RecordIO-protobuf format
D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data
Answer: C
Explanation:
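RecordIO-protobuf is a binary format in which each example is serialized as a protobuf record and wrapped in RecordIO framing. Most SageMaker built-in algorithms perform best with it, in part because it lets them use Pipe mode, where training data is streamed from Amazon S3 instead of being copied to the training instance first. This typically gives faster data loading than CSV or a numpy array. The format supports both dense and sparse tensors and stores each label inside the record it belongs to. The data has to be serialized into the format before training, for example with the helper functions in the SageMaker Python SDK, and many built-in algorithms accept it as a content type (application/x-recordio-protobuf) for training and inference. The other options do not fix the input format: batch transform runs inference over a dataset, Parquet is not the format AWS documents as optimized for most built-in algorithms, and hyperparameter optimization tunes model hyperparameters, not the training data.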
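As a rough illustration of the serialization step, the sketch below converts a numpy array to RecordIO-protobuf bytes. It assumes version 2 of the SageMaker Python SDK, where `sagemaker.amazon.common` provides `write_numpy_to_dense_tensor`. The function name `to_recordio_protobuf` and the sample arrays are hypothetical and used only for illustration.

```python
import io
from typing import Optional

import numpy as np


def to_recordio_protobuf(features: np.ndarray,
                         labels: Optional[np.ndarray] = None) -> bytes:
    """Serialize a 2-D feature matrix (and optional 1-D labels) to
    RecordIO-protobuf bytes using the SageMaker Python SDK helper.

    Hypothetical wrapper for illustration; assumes SageMaker Python SDK v2.
    """
    # Imported lazily so the module loads even where the SDK is not installed.
    from sagemaker.amazon.common import write_numpy_to_dense_tensor

    buf = io.BytesIO()
    # The helper expects a 2-D array; float32 is a commonly used dtype
    # for the built-in algorithms.
    write_numpy_to_dense_tensor(
        buf,
        features.astype("float32"),
        None if labels is None else labels.astype("float32"),
    )
    return buf.getvalue()
```

The resulting bytes would then be uploaded to Amazon S3 and supplied to the training channel with the content type `application/x-recordio-protobuf`. For sparse data, the same SDK module offers `write_spmatrix_to_sparse_tensor`.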