Which configuration parameter directly affects the size of a Spark partition upon ingestion of data into Spark?
A. spark.sql.files.maxPartitionBytes
B. spark.sql.autoBroadcastJoinThreshold
C. spark.sql.files.openCostInBytes
D. spark.sql.adaptive.coalescePartitions.minPartitionNum
E. spark.sql.adaptive.advisoryPartitionSizeInBytes

Answer: A

Explanation:

A is correct because spark.sql.files.maxPartitionBytes directly affects the size of a Spark partition when data is ingested into Spark. It sets the maximum number of bytes to pack into a single partition when reading files from file-based sources such as Parquet, JSON, and ORC. The default value is 128 MB, so each partition holds at most roughly 128 MB of input data; partitions come out smaller when the input consists of small files, and a single large splittable file is divided into multiple partitions at this boundary. The other options do not govern ingestion: B controls when a join side is broadcast, D and E tune shuffle partitions under adaptive query execution, and C only weights the cost of opening small files when packing them, rather than capping partition size.
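
For illustration, here is a minimal PySpark sketch (the application name and file path are placeholders) showing how lowering spark.sql.files.maxPartitionBytes from its 128 MB default increases the number of input partitions:

```python
from pyspark.sql import SparkSession

# Build a session that caps each input partition at 64 MB
# instead of the 128 MB default.
spark = (
    SparkSession.builder
    .appName("partition-size-demo")  # placeholder name
    .config("spark.sql.files.maxPartitionBytes", 64 * 1024 * 1024)
    .getOrCreate()
)

# Reading the same files now yields roughly twice as many
# input partitions as the default setting would produce.
df = spark.read.parquet("/path/to/data")  # placeholder path
print(df.rdd.getNumPartitions())
```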

Verified Reference: Databricks Certified Data Engineer Professional, "Spark Configuration" section; Databricks documentation, "Available Properties - spark.sql.files.maxPartitionBytes" section.
