Which statement regarding stream-static joins and static Delta tables is correct?

A. Each microbatch of a stream-static join will use the most recent version of the static Delta table as of each microbatch. B. Each microbatch of a stream-static join will use the most recent version of the static Delta...

September 13, 2024

Which statement characterizes the general programming model used by Spark Structured Streaming?

A. Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput. B. Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka. C. Structured Streaming uses specialized hardware and I/O...

September 12, 2024

Which code block accomplishes this task while minimizing potential compute costs?

The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE". The data science team would like predictions saved...

September 12, 2024

Which statement describes Delta Lake Auto Compaction?

A. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 1 GB. B. Before a Jobs cluster terminates, optimize is executed on all tables modified...

September 12, 2024

Holding all other variables constant and assuming records need to be processed in less than 10 seconds, which adjustment will meet the requirement?

A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is processed in less than 3 seconds. During peak hours of the day, execution time for each microbatch becomes very inconsistent, sometimes exceeding 30...

September 11, 2024

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?

A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the code below is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table. Before executing the code,...

September 10, 2024

Which solution meets these requirements?

An upstream system is emitting change data capture (CDC) logs that are being written to a cloud object storage directory. Each record in the log indicates the change type (insert, update, or delete) and the values for each field after the change. The source table has a primary key identified...

September 9, 2024

When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low?

A. Cluster: New Job Cluster; Retries: Unlimited; Maximum Concurrent Runs: Unlimited. B. Cluster: New Job Cluster; Retries: None; Maximum Concurrent Runs: 1. C. Cluster: Existing All-Purpose Cluster; Retries: Unlimited; Maximum Concurrent Runs:...

September 9, 2024

Which statement describes how the Delta engine identifies which files to load?

A Delta table of weather records is partitioned by date and has the following schema:

date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT

To find all the records from within the Arctic Circle, you execute a query with the following filter:

latitude > 66.3

Which statement describes how...

September 9, 2024

If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?

An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable: Assume that the fields customer_id and order_id serve as a composite key...

September 9, 2024