Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?

To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries. The data engineering team has been made aware of new requirements from a customer-facing...

March 30, 2025

Which describes how Delta Lake can help to avoid data loss of this nature in the future?

A new data engineer notices that a critical field was omitted from an application that writes its Kafka source to Delta Lake. This happened even though the critical field was in the Kafka source. That field was further missing from data written to dependent, long-term storage. The retention threshold on...

March 28, 2025

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?

A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table. Before executing the code,...

March 27, 2025

Which statement describes the results of querying recent_orders?

A table is registered with the following code: Both users and orders are Delta Lake tables. Which statement describes the results of querying recent_orders? A. All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query...

March 27, 2025

Which statement describes the results returned by this query?

A table named user_ltv is being used to create a view that will be used by data analysts on various teams. Users in the workspace are configured into groups, which are used for setting up data access using ACLs. The user_ltv table has the following schema: email STRING, age INT,...

March 26, 2025

Which solution meets these requirements?

An upstream system is emitting change data capture (CDC) logs that are being written to a cloud object storage directory. Each record in the log indicates the change type (insert, update, or delete) and the values for each field after the change. The source table has a primary key identified...

March 16, 2025
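For the CDC question above, the core idea of replaying change records against a keyed table can be sketched in plain Python. This is only an illustration under stated assumptions, not the solution the question expects: the field names `pk`, `change_type`, `change_time`, and `values` are hypothetical, since the excerpt truncates before naming the primary key or the requirements.

```python
def apply_changes(state, changes):
    """Apply CDC records to a dict keyed by primary key, oldest change first.

    Each change is a hypothetical dict with the keys:
      pk          - primary key of the source row (name assumed)
      change_type - "insert", "update", or "delete"
      change_time - sortable value giving the order of changes (assumed)
      values      - field values after the change
    """
    for change in sorted(changes, key=lambda c: c["change_time"]):
        if change["change_type"] == "delete":
            # A delete removes the row; ignore deletes for unknown keys.
            state.pop(change["pk"], None)
        else:
            # Inserts and updates both carry the values after the change.
            state[change["pk"]] = change["values"]
    return state


changes = [
    {"pk": 1, "change_type": "insert", "change_time": 1, "values": {"name": "a"}},
    {"pk": 2, "change_type": "insert", "change_time": 2, "values": {"name": "b"}},
    {"pk": 1, "change_type": "update", "change_time": 3, "values": {"name": "a2"}},
    {"pk": 2, "change_type": "delete", "change_time": 4, "values": {}},
]
current = apply_changes({}, changes)
```

After replaying these four records, `current` holds only the latest values for key 1, because key 2 was inserted and then deleted.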

Which statement describes Delta Lake Auto Compaction?

A. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 1 GB. B. Before a Jobs cluster terminates, optimize is executed on all tables modified...

March 15, 2025

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

Streaming DataFrame df has the following...

March 7, 2025
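To make "non-overlapping five-minute interval" in the question above concrete, here is a minimal pure-Python sketch of tumbling-window averaging. It is not the Structured Streaming answer itself; in PySpark the same grouping is usually expressed with `pyspark.sql.functions.window` and a window duration of `"5 minutes"`. The event tuple layout `(event_time, temp, humidity)` is an assumption, because the excerpt truncates before showing the schema of df.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 5 * 60  # non-overlapping (tumbling) five-minute windows


def window_start(event_time):
    """Floor a timezone-aware event time to the start of its window."""
    epoch = int(event_time.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def average_by_window(events):
    """Return {window_start: (avg_temp, avg_humidity)}.

    Each event is a hypothetical (event_time, temp, humidity) tuple.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # temp sum, humidity sum, count
    for event_time, temp, humidity in events:
        bucket = sums[window_start(event_time)]
        bucket[0] += temp
        bucket[1] += humidity
        bucket[2] += 1
    return {start: (t / n, h / n) for start, (t, h, n) in sums.items()}


def utc(hour, minute):
    return datetime(2025, 3, 7, hour, minute, tzinfo=timezone.utc)


averages = average_by_window([
    (utc(12, 0), 20.0, 40.0),
    (utc(12, 4), 22.0, 50.0),
    (utc(12, 5), 30.0, 60.0),  # falls into the next window
])
```

Every event belongs to exactly one window, so the 12:00 and 12:04 readings are averaged together while the 12:05 reading starts a new window.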

Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores and only one Executor per VM. Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance? A. • Total VMs;...

March 6, 2025

Which statement explains the cause of this failure?

The downstream consumers of a Delta Lake table have been complaining about data quality issues impacting performance in their applications. Specifically, they have complained that invalid latitude and longitude values in the activity_details table have been breaking their ability to use other geolocation processes. A junior engineer has written the...

March 5, 2025
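As a rough illustration of what "invalid latitude and longitude values" means in the question above, the sketch below checks coordinates against the standard ranges (latitude from -90 to 90, longitude from -180 to 180) and separates offending rows. It does not reproduce the junior engineer's code, which the excerpt truncates; the row layout and the column names `latitude` and `longitude` are assumptions.

```python
def is_valid_coordinate(latitude, longitude):
    """True when both values fall inside the standard geographic ranges."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


def split_valid(rows):
    """Partition rows (hypothetical dicts) into (valid, invalid) lists."""
    valid, invalid = [], []
    for row in rows:
        ok = is_valid_coordinate(row["latitude"], row["longitude"])
        (valid if ok else invalid).append(row)
    return valid, invalid


rows = [
    {"id": 1, "latitude": 45.0, "longitude": -122.5},
    {"id": 2, "latitude": 95.0, "longitude": 10.0},    # latitude out of range
    {"id": 3, "latitude": -12.0, "longitude": 181.0},  # longitude out of range
]
valid_rows, invalid_rows = split_valid(rows)
```

Running a check like this over existing data before enforcing a rule on the table shows how many rows would already violate it.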