Why is this occurring?

A retailer uses a TRANSACTIONS table (100M rows, 1.2 TB) that has been clustered by the STORE_ID column (VARCHAR(50)). The vast majority of analyses on this table are grouped by STORE_ID to look at store performance.

The retailer operates 1,000 stores, but most sales come from only 20 of them. The Administrator notes that most queries are currently experiencing poor pruning, with large amounts of bytes processed by even simple queries.

Why is this occurring?
A. The STORE_ID should be numeric.
B. The table is not big enough to take advantage of the clustering key.
C. Sales across stores are not uniformly distributed.
D. The cardinality of the stores to transaction count ratio is too low to use the STORE_ID as a clustering key.

Answer: C

Explanation:

According to the Snowflake documentation [1], clustering keys are most effective when the data is evenly distributed across the key values. When the data is skewed, as it is here with most sales coming from only 20 of the 1,000 stores, the micro-partitions are not well clustered and pruning is poor. Queries therefore scan more bytes, even when they filter by STORE_ID.
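The effect of the skew on bytes scanned can be illustrated with a toy model. This is only a sketch under stated assumptions: the 80% share for the top 20 stores and the rows-per-micro-partition figure are invented for illustration (the question gives neither), rows are assumed to be perfectly sorted by store, and the helper names `rows_per_store` and `partitions_touched` are hypothetical. It does not reproduce how Snowflake actually lays out micro-partitions.

```python
# Toy model: rows are sorted by store, then cut into fixed-size micro-partitions.
# Every constant marked "assumed" is an illustrative guess, not a figure from
# the question or from Snowflake.
TOTAL_ROWS = 100_000_000      # from the question
NUM_STORES = 1000             # from the question
TOP_STORES = 20               # from the question
TOP_SHARE = 0.80              # assumed: top 20 stores produce 80% of rows
ROWS_PER_PARTITION = 100_000  # assumed micro-partition capacity


def rows_per_store():
    """Return a row count per store, top stores first."""
    top = int(TOTAL_ROWS * TOP_SHARE) // TOP_STORES
    rest = (TOTAL_ROWS - top * TOP_STORES) // (NUM_STORES - TOP_STORES)
    return [top] * TOP_STORES + [rest] * (NUM_STORES - TOP_STORES)


def partitions_touched(counts, store_index):
    """Count the micro-partitions that hold at least one row of the store."""
    start = sum(counts[:store_index])        # first row offset of the store
    end = start + counts[store_index] - 1    # last row offset of the store
    return end // ROWS_PER_PARTITION - start // ROWS_PER_PARTITION + 1


if __name__ == "__main__":
    counts = rows_per_store()
    total = -(-sum(counts) // ROWS_PER_PARTITION)  # ceiling division
    for label, idx in (("top store", 0), ("small store", 500)):
        touched = partitions_touched(counts, idx)
        print(f"{label}: {touched} of {total} partitions "
              f"({touched / total:.1%} of the table)")
```

Under these assumptions, a filter on one of the 20 busy stores still has to read a far larger slice of the table than a filter on a small store, which is consistent with the large byte counts the Administrator observes.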

Option A is incorrect because the data type of the clustering key does not affect pruning.

Option B is incorrect because the table is large enough to benefit from clustering, if the data were more balanced.

Option D is incorrect because the cardinality of the clustering key is not relevant for pruning, as long as the key values are distinct.

1: Considerations for Choosing Clustering for a Table | Snowflake Documentation