Microsoft DP-203 Data Engineering on Microsoft Azure Online Training
Microsoft DP-203 Online Training
The questions for DP-203 were last updated at Dec 20,2024.
- Exam Code: DP-203
- Exam Name: Data Engineering on Microsoft Azure
- Certification Provider: Microsoft
- Latest update: Dec 20,2024
You develop data engineering solutions for a company.
A project requires the deployment of data to Azure Data Lake Storage.
You need to implement role-based access control (RBAC) so that project members can manage the Azure Data Lake Storage resources.
Which three actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
- A . Assign Azure AD security groups to Azure Data Lake Storage.
- B . Configure end-user authentication for the Azure Data Lake Storage account.
- C . Configure service-to-service authentication for the Azure Data Lake Storage account.
- D . Create security groups in Azure Active Directory (Azure AD) and add project members.
- E . Configure access control lists (ACL) for the Azure Data Lake Storage account.
You are designing an Azure Synapse Analytics dedicated SQL pool.
You need to ensure that you can audit access to Personally Identifiable information (PII).
What should you include in the solution?
- A . dynamic data masking
- B . row-level security (RLS)
- C . sensitivity classifications
- D . column-level security
You are designing a sales transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will contains approximately 60 million rows per month and will be partitioned by month. The table will use a clustered column store index and round-robin distribution.
Approximately how many rows will there be for each combination of distribution and partition?
- A . 1 million
- B . 5 million
- C . 20 million
- D . 60 million
You are designing a dimension table for a data warehouse. The table will track the value of the dimension attributes over time and preserve the history of the data by adding new rows as the data changes.
Which type of slowly changing dimension (SCD) should use?
- A . Type 0
- B . Type 1
- C . Type 2
- D . Type 3
You are designing an inventory updates table in an Azure Synapse Analytics dedicated SQL pool.
The table will have a clustered columnstore index and will include the following columns:
You identify the following usage patterns:
✑ Analysts will most commonly analyze transactions for a warehouse.
✑ Queries will summarize by product category type, date, and/or inventory event type.
You need to recommend a partition strategy for the table to minimize query times.
On which column should you partition the table?
- A . ProductCategoryTypeID
- B . EventDate
- C . WarehouseID
- D . EventTypeID
HOTSPOT
You are designing an application that will store petabytes of medical imaging data
When the data is first created, the data will be accessed frequently during the first week. After one month, the data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the data will be accessed infrequently but must be accessible within five minutes. You need to select a storage strategy for the data. The solution must minimize costs.
Which storage tier should you use for each time frame? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
HOTSPOT
You develop a dataset named DBTBL1 by using Azure Databricks.
DBTBL1 contains the following columns:
✑ SensorTypeID
✑ GeographyRegionID
✑ Year
✑ Month
✑ Day
✑ Hour
✑ Minute
✑ Temperature
✑ WindSpeed
✑ Other
You need to store the data to support daily incremental load pipelines that vary for each GeographyRegionID. The solution must minimize storage costs.
How should you complete the code? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool.
You plan to keep a record of changes to the available fields.
The supplier data contains the following columns.
Which three additional columns should you add to the data to create a Type 2 SCD? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
- A . surrogate primary key
- B . foreign key
- C . effective start date
- D . effective end date
- E . last modified date
- F . business key
You plan to implement an Azure Data Lake Gen2 storage account.
You need to ensure that the data lake will remain available if a data center fails in the primary Azure region.
The solution must minimize costs.
Which type of replication should you use for the storage account?
- A . geo-redundant storage (GRS)
- B . zone-redundant storage (ZRS)
- C . locally-redundant storage (LRS)
- D . geo-zone-redundant storage (GZRS)
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data.
Transact-SQL queries similar to the following query will be executed daily.
SELECT
SupplierKey, StockItemKey, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
AND DateKey <= 20210131
GROUP By SupplierKey, StockItemKey
Which table distribution will minimize query times?
- A . round-robin
- B . replicated
- C . hash-distributed on DateKey
- D . hash-distributed on PurchaseKey