Which of the following explains why the data files are no longer present?

A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.

Which of the following explains why the data files are no longer present?
A. The VACUUM command was run on the table
B. The TIME TRAVEL command was run on the table
C. The DELETE HISTORY command was run on the table
D. The OPTIMIZE command was run on the table
E. The HISTORY command was run on the table

Answer: A

Explanation:

The VACUUM command removes data files that are no longer referenced by the current version of a Delta table and are older than the retention threshold [1]. The default threshold is 7 days [2]. It is controlled by the delta.deletedFileRetentionDuration table property, while delta.logRetentionDuration separately controls how long the transaction log history is kept [3]. Files backing a 3-day-old version would survive a VACUUM run with the default threshold, so the command must have been run with a retention period shorter than 3 days, which deleted the data files needed for the restore. None of the other options deletes data files. TIME TRAVEL is not a command; time travel is a feature for querying a historical version of the table [4]. DELETE HISTORY is not a valid command in Delta Lake. The OPTIMIZE command improves performance by compacting small files into larger ones [5]; it writes new files but leaves physical removal of the old ones to VACUUM. HISTORY (DESCRIBE HISTORY) only returns information about the operations performed on the table.
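The retention arithmetic behind the answer can be illustrated with a small toy model. This is a pure-Python sketch, not Delta Lake code: the `DataFile` class, the `vacuum` and `can_restore` helpers, the file names, and the dates are all hypothetical, and the model assumes only that a file becomes eligible for deletion once the time since it was removed from the table exceeds the retention period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file tracked by a Delta-style log."""
    name: str
    # Time the file stopped being part of the current table version.
    # None means the current version still references it.
    removed_at: Optional[datetime] = None


def vacuum(files: List[DataFile], now: datetime, retention: timedelta) -> List[DataFile]:
    """Return the files left on disk: live files plus files removed within the retention window."""
    cutoff = now - retention
    return [f for f in files if f.removed_at is None or f.removed_at >= cutoff]


def can_restore(files_on_disk: List[DataFile], needed_names: List[str]) -> bool:
    """An old version is restorable only if every file it referenced still exists."""
    present = {f.name for f in files_on_disk}
    return set(needed_names) <= present


now = datetime(2024, 1, 10)
files = [
    # Rewritten by a daily update 2 days ago, so the 3-day-old version still needs it.
    DataFile("part-0.parquet", removed_at=now - timedelta(days=2)),
    # Still referenced by the current version.
    DataFile("part-1.parquet"),
]
needed_for_old_version = ["part-0.parquet"]

# Default-style 7-day retention keeps the old file, so the restore would work.
default_result = can_restore(vacuum(files, now, timedelta(days=7)), needed_for_old_version)

# A 24-hour retention deletes the old file, so the restore fails.
short_result = can_restore(vacuum(files, now, timedelta(hours=24)), needed_for_old_version)

print(default_result, short_result)
```

In this model the restore succeeds under a 7-day retention and fails under a 24-hour one, which mirrors why only a VACUUM with a shortened retention period explains the missing files in the question.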

Reference:

1: VACUUM | Databricks on AWS

2: Work with Delta Lake table history | Databricks on AWS

3: Delta Lake configuration | Databricks on AWS

4: Work with Delta Lake table history – Azure Databricks

5: OPTIMIZE | Databricks on AWS; HISTORY | Databricks on AWS
