How should the records be stored in Amazon S3 to improve query performance?

A monitoring service generates 1 TB of scale metrics record data every minute. A research team performs queries on this data using Amazon Athena. The queries run slowly due to the large volume of data, and the team requires better performance.

How should the records be stored in Amazon S3 to improve query performance?
A. CSV files
B. Parquet files
C. Compressed JSON
D. RecordIO

Answer: B

Explanation:

Parquet is a columnar storage format that compresses and encodes data efficiently. Because each column is stored separately, Athena reads only the columns a query references, which reduces the amount of data scanned. Parquet also keeps statistics such as minimum and maximum values for each row group, so filter conditions can be pushed down to the reader (predicate pushdown) and row groups that cannot contain matching rows are skipped. Athena bills by the amount of data scanned, so scanning less makes queries both faster and cheaper. CSV and JSON are row-based formats, so even when compressed Athena must read whole records to answer a query, and RecordIO is a format used for machine learning training data rather than one that Athena queries. Therefore, the records should be stored as Parquet files in Amazon S3 to improve query performance.
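The two ideas in the explanation, reading only the needed columns and skipping row groups by their min/max statistics, can be illustrated with a small toy model. This is only a sketch using the Python standard library: it does not read or write real Parquet files, and the record fields (host, ts, cpu, note), the group size, and the function names are all made up for illustration. It compares how many bytes a row-oriented scan and a columnar scan have to touch to answer the same query.

```python
# Toy model of a columnar layout with per-row-group min/max statistics.
# This is NOT the Parquet file format; it only illustrates column pruning
# and predicate pushdown. All field and function names are hypothetical.
import json


def make_records(n):
    """Build n made-up metric records."""
    return [
        {"host": f"host-{i % 4}", "ts": i, "cpu": (i * 7) % 100, "note": "x" * 50}
        for i in range(n)
    ]


def to_row_layout(records):
    """Row layout: each record is stored whole, one JSON document per record."""
    return [json.dumps(r).encode() for r in records]


def to_row_groups(records, group_size):
    """Columnar layout: split records into row groups, store every column
    separately, and keep the min/max of 'ts' for each group."""
    groups = []
    for start in range(0, len(records), group_size):
        chunk = records[start:start + group_size]
        columns = {
            name: json.dumps([r[name] for r in chunk]).encode()
            for name in chunk[0]
        }
        ts_values = [r["ts"] for r in chunk]
        groups.append(
            {"columns": columns, "ts_min": min(ts_values), "ts_max": max(ts_values)}
        )
    return groups


def scan_rows(row_layout, ts_from):
    """Sum 'cpu' where ts >= ts_from. Every record must be read in full."""
    scanned = 0
    total = 0
    for blob in row_layout:
        scanned += len(blob)
        record = json.loads(blob)
        if record["ts"] >= ts_from:
            total += record["cpu"]
    return total, scanned


def scan_columnar(groups, ts_from):
    """Same query, but skip row groups whose statistics rule them out and
    read only the 'ts' and 'cpu' columns of the remaining groups."""
    scanned = 0
    total = 0
    for group in groups:
        if group["ts_max"] < ts_from:
            continue  # predicate pushdown: no row in this group can match
        ts_blob = group["columns"]["ts"]
        cpu_blob = group["columns"]["cpu"]
        scanned += len(ts_blob) + len(cpu_blob)  # column pruning
        ts_values = json.loads(ts_blob)
        cpu_values = json.loads(cpu_blob)
        total += sum(c for t, c in zip(ts_values, cpu_values) if t >= ts_from)
    return total, scanned


records = make_records(1000)
row_total, row_scanned = scan_rows(to_row_layout(records), ts_from=900)
col_total, col_scanned = scan_columnar(to_row_groups(records, 100), ts_from=900)

print("same answer:", row_total == col_total)
print("bytes scanned (row layout):", row_scanned)
print("bytes scanned (columnar layout):", col_scanned)
```

Both scans return the same sum, but the columnar scan touches far fewer bytes because it ignores the unused host and note columns and skips every row group whose maximum ts is below the filter value. Real Parquet adds compression and column encodings on top of this, which the sketch leaves out.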

References:

Columnar Storage Formats – Amazon Athena

Parquet SerDe – Amazon Athena

Optimizing Amazon Athena Queries – Amazon Athena

Parquet – Apache Software Foundation