Which of the following commands could the data engineering team use to access sales in PySpark?

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following commands could the data engineering team use to access sales in PySpark?
A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")
D. spark.delta.table("sales")
E. spark.table("sales")

Answer: E

Explanation:

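The data engineering team can use the spark.table method to access the Delta table sales in PySpark. It returns the table as a DataFrame, which can then be used for further processing or testing. The method works for any table registered in the Hive metastore or the Spark catalog, regardless of the file format [1]. The other options fail for different reasons: A is a SQL statement rather than a Python call, B is false because SQL and PySpark share the same catalog, C passes a bare table name to spark.sql, which expects a complete SQL statement, and D refers to a delta attribute that SparkSession does not have. Alternatively, the team can load the table by its storage path with DeltaTable.forPath, which returns a DeltaTable object; calling toDF() on that object yields a DataFrame [2].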
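As a rough illustration of the explanation above, the sketch below wraps the two working access routes in small helpers and adds one sample cleanliness check. The helper names (load_sales, load_sales_via_sql, has_no_duplicate_rows) are hypothetical and not part of the question; the code assumes an active SparkSession is passed in, as is the case in a Databricks notebook where spark is predefined, and that a table named sales is registered in the catalog.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type hints; a real SparkSession must be supplied by the caller.
    from pyspark.sql import DataFrame, SparkSession


def load_sales(spark: "SparkSession") -> "DataFrame":
    """Return the registered table `sales` as a DataFrame (option E)."""
    return spark.table("sales")


def load_sales_via_sql(spark: "SparkSession") -> "DataFrame":
    """Same data through spark.sql, which needs a full SQL statement,
    not a bare table name as in option C."""
    return spark.sql("SELECT * FROM sales")


def has_no_duplicate_rows(df: "DataFrame") -> bool:
    """Hypothetical cleanliness check: True when no row appears more than once."""
    return df.count() == df.dropDuplicates().count()
```

In a notebook, a test could then be as short as assert has_no_duplicate_rows(load_sales(spark)). Which checks actually count as "clean" for sales is not stated in the question, so the duplicate check is only a placeholder.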

Reference:

1: SparkSession | PySpark 3.2.0 documentation

2: Welcome to Delta Lake’s Python documentation page ― delta-spark 2.4.0 documentation
