Which of the following describes why the data files still exist and the metadata files were deleted?

A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data.

They run the following command:

DROP TABLE IF EXISTS my_table

While the object no longer appears when they run SHOW TABLES, the data files still exist.

Which of the following describes why the data files still exist and the metadata files were deleted?
A. The table’s data was larger than 10 GB
B. The table’s data was smaller than 10 GB
C. The table was external
D. The table did not have a location
E. The table was managed

Answer: C

Explanation:

An external table is registered in the metastore but points to a storage location that the user specifies, typically with the LOCATION clause. When an external table is dropped, only its metadata is removed from the metastore. The data files at that location are left untouched, because the metastore does not own them and other applications or tables may still depend on them.

A managed table, by contrast, has both its metadata and its data files managed by Spark. When a managed table is dropped, the metadata is removed from the metastore and the underlying data files are deleted from storage.

Here, DROP TABLE IF EXISTS my_table removed the table from SHOW TABLES but left the data files in place, which is the behavior of an external table, so the correct answer is C. Table size has no effect on what DROP TABLE deletes, which rules out A and B. A table created without a location is a managed table, so in both D and E the data files would have been deleted along with the metadata.
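The difference in drop behavior can be sketched with a small toy model. This is only an illustration in plain Python, not Spark's actual implementation: the ToyMetastore class, its method names, and the fake part-00000.parquet file are all hypothetical, and the model assumes the simple rule described above (no location means managed, an explicit location means external). On a real cluster, running DESCRIBE EXTENDED my_table before the drop reports the table type as MANAGED or EXTERNAL.

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Toy stand-in for a metastore; hypothetical, not Spark's real code."""

    def __init__(self, warehouse_dir: Path) -> None:
        self.warehouse_dir = warehouse_dir
        self.tables = {}  # table name -> (table type, data directory)

    def create_table(self, name, location=None):
        # No location: managed table stored under the warehouse directory.
        # Explicit location: external table pointing at that path.
        if location is None:
            table_type, path = "MANAGED", self.warehouse_dir / name
        else:
            table_type, path = "EXTERNAL", Path(location)
        path.mkdir(parents=True, exist_ok=True)
        (path / "part-00000.parquet").write_text("fake data")
        self.tables[name] = (table_type, path)

    def drop_table_if_exists(self, name):
        # The metadata entry is removed for every table type.
        entry = self.tables.pop(name, None)
        if entry is None:
            return
        table_type, path = entry
        # Data files are deleted only when the table is managed.
        if table_type == "MANAGED":
            shutil.rmtree(path)


def demo():
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        store = ToyMetastore(root / "warehouse")
        store.create_table("managed_table")
        store.create_table("my_table", location=root / "external" / "my_table")
        managed_path = store.tables["managed_table"][1]
        external_path = store.tables["my_table"][1]
        store.drop_table_if_exists("managed_table")
        store.drop_table_if_exists("my_table")
        return {
            "tables_left": sorted(store.tables),
            "managed_data_exists": managed_path.exists(),
            "external_data_exists": external_path.exists(),
        }


if __name__ == "__main__":
    print(demo())
```

In this sketch both tables disappear from the metadata, but only the external table's directory survives the drop, which mirrors the situation in the question.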

Reference: Databricks Documentation – Managed and External Tables, Databricks Documentation – Drop Table
