Which of the following is true of Delta Lake and the Lakehouse?

Which of the following is true of Delta Lake and the Lakehouse?
A. Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.
B. Delta Lake automatically collects statistics on the first 32 columns of each table, which are leveraged in...
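
Option B refers to Delta Lake's built-in statistics collection. A minimal sketch of how that behavior is controlled, assuming a Databricks runtime with Delta Lake and a hypothetical table named example_events; delta.dataSkippingNumIndexedCols is the table property governing how many leading columns have statistics collected (32 by default):

# Sketch only: example_events is a hypothetical table name.
# Delta Lake collects min/max statistics on the first 32 columns by default;
# this table property controls how many leading columns are indexed.
spark.sql("""
    ALTER TABLE example_events
    SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '32')
""")

# The collected statistics enable data skipping for selective queries such as:
spark.sql("SELECT * FROM example_events WHERE device_id = 42").show()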


Which code block should be used to create the date Python variable used in the above code block?

An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code: df = spark.read.format("parquet").load(f"/mnt/source/{date}") Which code block should be used to...
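
A minimal sketch of one way the notebook can pick up that parameter, assuming the Jobs API passes it as a notebook parameter named "date"; dbutils.widgets is the standard mechanism for reading notebook parameters:

# Sketch: the Jobs API parameter is assumed to be named "date".
# dbutils.widgets.get returns the parameter value as a string inside the notebook.
date = dbutils.widgets.get("date")

# The value is then interpolated into the source path with an f-string.
df = spark.read.format("parquet").load(f"/mnt/source/{date}")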


Which approach would simplify the identification of these changed records?

A table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from...
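
One Delta Lake feature commonly discussed for this kind of scenario is Change Data Feed, which records row-level changes between table versions. A hedged sketch, using the customer_churn_params table from the question and an illustrative starting version:

# Sketch: enable Change Data Feed on the table (a one-time property change).
spark.sql("""
    ALTER TABLE customer_churn_params
    SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')
""")

# Row-level inserts, updates, and deletes can then be read between versions;
# the starting version below is illustrative.
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 5)
           .table("customer_churn_params"))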


If this alert raises notifications for 3 consecutive minutes and then stops, which statement must be true?

The data engineering team has configured a Databricks SQL query and alert to monitor the values in a Delta Lake table. The recent_sensor_recordings table contains an identifying sensor_id alongside the timestamp and temperature for the most recent 5 minutes of recordings. The below query is used to create the alert:...
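
The alert query itself is truncated above. Purely as a hypothetical illustration of the general shape of such an alert, an aggregate per sensor over the most recent readings might look like the following (the threshold value is invented):

# Hypothetical illustration only; the actual alert query is truncated above.
spark.sql("""
    SELECT sensor_id, MEAN(temperature) AS avg_temperature
    FROM recent_sensor_recordings
    GROUP BY sensor_id
    HAVING MEAN(temperature) > 120   -- invented threshold
""").show()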


Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?

The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings. The team has decided to process all deletions from the previous week as...
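
With default Delta Lake table settings, a DELETE only removes rows from the current table version; the underlying data files remain reachable through time travel until VACUUM removes them after the retention period (7 days by default). A sketch with hypothetical table names:

# Sketch: user_messages and delete_requests are hypothetical table names.
# DELETE removes rows from the current version of the table only.
spark.sql("""
    DELETE FROM user_messages
    WHERE user_id IN (SELECT user_id FROM delete_requests)
""")

# VACUUM physically removes data files older than the retention threshold
# (168 hours = 7 days, the default with default table settings).
spark.sql("VACUUM user_messages RETAIN 168 HOURS")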


Which of the following accurately presents information about Delta Lake and Databricks that may impact their decision-making process?

A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings. The source data contains 100 unique fields in a highly nested JSON structure. The silver_device_recordings table will be used downstream to power several production monitoring dashboards and a production model. At present, 45 of the...
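
A sketch of one way such a silver table is often populated from nested JSON; the source path and field names below are assumptions, since the question text is truncated:

# Sketch: source path and nested field names are hypothetical.
from pyspark.sql import functions as F

raw = spark.read.json("/mnt/raw/device_recordings/")  # schema inferred from nested JSON

# Promote a handful of nested fields to top-level columns for downstream use.
silver = raw.select(
    F.col("device.id").alias("device_id"),
    F.col("reading.temperature").alias("temperature"),
    F.col("reading.humidity").alias("humidity"),
)

silver.write.format("delta").mode("append").saveAsTable("silver_device_recordings")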


Which of the following likely explains these smaller file sizes?

A production workload incrementally applies updates from an external Change Data Capture feed to a Delta Lake table as an always-on Structured Stream job. When data was initially migrated for this table, OPTIMIZE was executed and most data files were resized to 1 GB. Auto Optimize and Auto Compaction were...
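
Auto Optimize and Auto Compaction write and compact toward a smaller target file size (on the order of 128 MB) than a manual OPTIMIZE, which defaults to roughly 1 GB files. A sketch of the table properties involved, using a hypothetical table name:

# Sketch: cdc_target is a hypothetical table name.
# These properties enable optimized writes and automatic compaction,
# which target smaller files than a manual OPTIMIZE.
spark.sql("""
    ALTER TABLE cdc_target
    SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")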


Which statement describes what will happen when the above code is executed?

The security team is exploring whether the Databricks secrets module can be leveraged for connecting to an external database. After testing the code with all Python variables defined as strings, they upload the password to the secrets module and configure the correct permissions for the currently active...
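
A minimal sketch of reading a secret inside a notebook, with a hypothetical scope and key name; values returned by the secrets module are redacted when displayed in notebook output:

# Sketch: scope and key names are hypothetical.
password = dbutils.secrets.get(scope="jdbc_creds", key="db_password")

# The value can be used in code (e.g. a JDBC connection), but printing it
# shows [REDACTED] in the notebook output.
print(password)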


Which configuration parameter directly affects the size of a spark-partition upon ingestion of data into Spark?

Which configuration parameter directly affects the size of a spark-partition upon ingestion of data into Spark?
A. spark.sql.files.maxPartitionBytes
B. spark.sql.autoBroadcastJoinThreshold
C. spark.sql.files.openCostInBytes
D. spark.sql.adaptive.coalescePartitions.minPartitionNum
E. spark.sql.adaptive.advisoryPartitionSizeInBytes

Answer: A

Explanation: This is the correct answer because spark.sql.files.maxPartitionBytes is a configuration parameter that directly affects the size of a spark-partition upon ingestion...
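
A sketch of how this parameter is set and observed; the value and source path below are illustrative:

# Sketch: spark.sql.files.maxPartitionBytes caps how many bytes of file data
# go into a single Spark partition when reading files (128 MB by default).
spark.conf.set("spark.sql.files.maxPartitionBytes", 134217728)  # illustrative value, in bytes

df = spark.read.format("parquet").load("/mnt/source/")  # hypothetical path
print(df.rdd.getNumPartitions())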


A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device. Streaming DataFrame df has the following...
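
A sketch of a non-overlapping five-minute grouped aggregation; the schema of df is truncated above, so the column names time, device_id, temperature, and humidity are assumptions:

# Sketch: column names are assumed, since the DataFrame schema is truncated above.
from pyspark.sql import functions as F

agg = (df
       .withWatermark("time", "10 minutes")                  # illustrative watermark
       .groupBy(F.window("time", "5 minutes"), "device_id")  # tumbling 5-minute windows
       .agg(F.avg("temperature").alias("avg_temperature"),
            F.avg("humidity").alias("avg_humidity")))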
