Which configuration parameter directly affects the size of a Spark partition upon ingestion of data into Spark?
A . spark.sql.files.maxPartitionBytes
B . spark.sql.autoBroadcastJoinThreshold
C . spark.sql.files.openCostInBytes
D . spark.sql.adaptive.coalescePartitions.minPartitionNum
E . spark.sql.adaptive.advisoryPartitionSizeInBytes
Answer: A
Explanation:
spark.sql.files.maxPartitionBytes is the parameter that directly controls the size of Spark partitions created when data is ingested. It sets the maximum number of bytes to pack into a single partition when reading files from file-based sources such as Parquet, JSON, and ORC. The default value is 134217728 bytes (128 MB), so each input partition will hold roughly 128 MB of data unless the source consists of many small files or a single file that cannot be split. The other options govern broadcast join thresholds, file open cost estimation, and adaptive query execution post-shuffle partitioning, none of which determines partition size at ingestion time.
Verified Reference: [Databricks Certified Data Engineer Professional], under “Spark Configuration” section; Databricks Documentation, under “Available Properties – spark.sql.files.maxPartitionBytes” section.
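Below is a minimal PySpark sketch illustrating how lowering this setting increases the number of input partitions for the same data. The Parquet path and the 64 MB value are hypothetical choices for illustration only.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("partition-size-demo")
        # Pack at most 64 MB of file data into each input partition
        # (the default is 134217728 bytes, i.e. 128 MB).
        .config("spark.sql.files.maxPartitionBytes", 64 * 1024 * 1024)
        .getOrCreate()
    )

    # Hypothetical dataset path; any splittable file-based source works.
    df = spark.read.parquet("/path/to/large_dataset")

    # Fewer bytes per partition means more input partitions for the same data.
    print(df.rdd.getNumPartitions())

The setting can also be changed on an existing session with spark.conf.set("spark.sql.files.maxPartitionBytes", ...), since it is a runtime SQL configuration rather than a cluster-startup property.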