Which of the following describes Apache Flink?
- A . A distributed file storage system
- B . A graph processing framework
- C . A stream processing framework
- D . A batch processing engine
Which of the following are characteristics of the Hadoop ecosystem? (Select all that apply)
- A . Real-time processing
- B . Batch processing
- C . Low fault tolerance
- D . Scalability
- E . Single-node architecture
Which skill is crucial for a data engineer to effectively manage and optimize large-scale data processing systems?
- A . Data analysis
- B . Front-end development
- C . Cloud computing
- D . Graphic design
Which Python library formats data into dataframes?
- A . NLTK
- B . NumPy
- C . Pandas
- D . scikit-learn
Which of the following are components of the Apache Spark Streaming architecture? (Select all that apply)
- A . Driver node
- B . Worker node
- C . Discretized streams (DStreams)
- D . Kafka topics
- E . HDFS storage
In Python, which type of operator is often used in conditional statements to perform Boolean operations?
- A . Bitwise
- B . Assignment
- C . Arithmetic
- D . Logical
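As a small illustration of Boolean operations inside a conditional statement, the sketch below combines several conditions in one `if`. The function name and its parameters are hypothetical, chosen only for this example.

```python
def can_checkout(cart_items: int, logged_in: bool,
                 guest_allowed: bool, account_locked: bool) -> bool:
    """Return True when every checkout condition holds (hypothetical rules)."""
    # `and`, `or`, and `not` combine Boolean values; parentheses group the `or`.
    if cart_items > 0 and (logged_in or guest_allowed) and not account_locked:
        return True
    return False


print(can_checkout(2, logged_in=True, guest_allowed=False, account_locked=False))
print(can_checkout(2, logged_in=False, guest_allowed=False, account_locked=False))
```

The first call satisfies every condition; the second fails because neither `logged_in` nor `guest_allowed` is true.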
What type of expertise is required to be an effective data engineer?
- A . Big Data tools
- B . Business analysis
- C . Project management
- D . Accounting
In the ELT process, data is transformed after being loaded into the target system.
Which tools can be used for ELT? (Select all that apply)
- A . Apache Hadoop
- B . Apache Spark
- C . Talend
- D . Microsoft SSIS
- E . Amazon Redshift
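The load-then-transform order described above can be sketched roughly as follows. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for the target system; the table names and sample records are hypothetical and not tied to any tool in the list.

```python
import sqlite3

# Hypothetical raw records, standing in for data extracted from a source system.
raw_rows = [("  Alice ", "12.50"), ("BOB", "7"), ("carol", "3.25")]

# Assumption: an in-memory SQLite database plays the role of the target system.
conn = sqlite3.connect(":memory:")

# Load: store the data exactly as extracted, with no cleanup yet.
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_rows)

# Transform: clean and type the data inside the target, using its SQL engine.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT lower(trim(customer)) AS customer,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    """
)

orders = conn.execute(
    "SELECT customer, amount FROM orders ORDER BY customer"
).fetchall()
print(orders)
```

In an ETL pipeline the cleanup step would run before the insert; here the raw table is kept in the target and the cleaned table is derived from it afterwards.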
What skill is essential for a data engineer to efficiently transform and clean raw data into usable formats?
- A . Data visualization
- B . Machine learning
- C . Data warehousing
- D . ETL (Extract, Transform, Load) processes
A company needs to fill an opening for a key tactical data governance position. The new hire is to be a subject matter expert on customer data and ensure the data policies and standards are adhered to on a daily basis. This role bridges the gap between a data governance program and its execution.
What is a suitable job title for this role?
- A . Data miner
- B . Data scientist
- C . Data steward
- D . Data engineer
Which Cassandra data type represents a collection of key-value pairs and is written with curly brackets ({ }) and colons (:)?
- A . List
- B . Map
- C . Set
- D . UUID
What is the primary purpose of Apache Kafka in a data processing architecture?
- A . Storing historical data
- B . Running machine learning algorithms
- C . Processing real-time data streams
- D . Running complex SQL queries
What enables Pravega to rapidly ingest streaming data into durable storage?
- A . Apache Spark
- B . Append-only logs
- C . Relational database schema
- D . MongoDB
What is the purpose of sensor operators in Apache Airflow?
- A . Perform validation checks in parallel
- B . Move data sequentially from one system to another
- C . Use triggers to report each successive retry
- D . Use a poke method to monitor external processes
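For readers unfamiliar with polling-style monitoring, the sketch below shows the general idea in plain Python. It does not use the Airflow API; `wait_for`, `poke_interval`, and `timeout` are names chosen for this illustration, and the "external process" is simulated by a counter.

```python
import time
from typing import Callable


def wait_for(poke: Callable[[], bool], poke_interval: float, timeout: float) -> bool:
    """Call poke() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if poke():
            return True          # the external condition is met
        if time.monotonic() >= deadline:
            return False         # gave up waiting
        time.sleep(poke_interval)


# Hypothetical external condition: it becomes true on the third check.
checks = {"count": 0}


def file_has_arrived() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


succeeded = wait_for(file_has_arrived, poke_interval=0.01, timeout=1.0)
print(succeeded, checks["count"])
```

The check function is kept cheap and side-effect free apart from the counter, since it may run many times before the condition is met.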
Why is it important to control the capabilities of a workload during runtime in Kubernetes?
- A . Limit resource usage on a cluster
- B . Apply API authorization on a cluster
- C . Enable management of application secrets
- D . Support data exfiltration from a pod
How is replication implemented in Redis?
- A . Client-server communication protocol is used, whereby every client replicates data from the server
- B . Data is shared among many computers to enable fault tolerance and data accessibility
- C . Meta information is generated based on the client-server communication protocol
- D . The developer copies and pastes the data in multiple databases
What is a difference between Data Governance and Master Data Management (MDM)?
- A . Data governance is an informal system of decision making. MDM is a formal system of decision making.
- B . Data governance implementation involves people, policies, rules, and metrics. MDM is done automatically with no interference.
- C . Data governance focuses mainly on IT practices. MDM creates the rules and decisions for operational data processes.
- D . Data governance is a formalized system of decision making. MDM is a comprehensive method to link all data elements to a common point of reference.
What is the data governance role that leverages the data for business purposes, such as increasing sales or reducing manufacturing costs?
- A . Data steward
- B . Data custodian
- C . Security specialist
- D . End user
What is a characteristic of Pig?
- A . Performs real-time reads and writes in HDFS
- B . Uses HiveQL to translate SQL queries into MapReduce jobs
- C . Data warehouse infrastructure that manages jobs in the cluster
- D . Alternative language to Java programming for MapReduce
What ethical principle emphasizes that collected data should be used for its original intended purpose?
- A . Data minimization
- B . Purpose limitation
- C . Transparency
- D . Accountability
Which Sqoop validation type checks the row counts between the source and target databases, and tries to ensure that the counts match?
- A . ValidationThreshold
- B . Eval
- C . ValidationFailureHandler
- D . Validator
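The row-count comparison described in the question can be sketched in a few lines. This is not Sqoop code: it is a minimal illustration using two in-memory SQLite databases as hypothetical source and target, with helper names invented for the example.

```python
import sqlite3


def row_count(conn: sqlite3.Connection, table: str) -> int:
    # The table name is interpolated directly, which is acceptable here only
    # because it is a fixed, trusted literal in this example.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def counts_match(source: sqlite3.Connection, target: sqlite3.Connection,
                 table: str) -> bool:
    """Return True when source and target hold the same number of rows."""
    return row_count(source, table) == row_count(target, table)


source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

rows = [(1, "Ada"), (2, "Grace"), (3, "Edsger")]
source.executemany("INSERT INTO customers VALUES (?, ?)", rows)
target.executemany("INSERT INTO customers VALUES (?, ?)", rows[:2])  # incomplete copy

before = counts_match(source, target, "customers")
target.execute("INSERT INTO customers VALUES (?, ?)", rows[2])       # finish the copy
after = counts_match(source, target, "customers")
print(before, after)
```

A matching count does not prove the rows are identical; it only catches transfers that dropped or duplicated records.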
Which IoT data processing tool uses a storage abstraction called Stream for continuous and unbounded data?
- A . Apache NiFi
- B . Apache Storm
- C . Pravega
- D . EdgeX Foundry
You are designing a database for an e-commerce data warehouse. The data is in first normal form (1NF) and will be used to create data marts for different departments within the organization.
Which schema type should be used for this requirement?
- A . Star
- B . Columnar
- C . Highly Normalized
- D . Graph
An organization has decided to structure their governance model to allow either a single individual or a team to control and maintain the master data.
Which governance model is being implemented?
- A . Distributed
- B . Centralized
- C . Federated
- D . Hybrid
Which of the following are components of the Apache Spark architecture? (Select all that apply)
- A . Spark Core
- B . Spark SQL
- C . Spark HBase
- D . Spark Streaming
- E . Spark Cassandra
What are three programming languages supported by Apache Spark?
- A . Python, R, and Scala
- B . PL/SQL, R, and Scala
- C . Python, R, and C++
- D . Python, C, and Scala
What is the primary storage system used by the Hadoop ecosystem?
- A . MongoDB
- B . HDFS
- C . Amazon S3
- D . Cassandra
Which of the following are components of the ETL process? (Select all that apply)
- A . Extraction
- B . Transmission
- C . Loading
- D . Transformation
- E . Visualization
Which tool builds batch processing as a special case of stream data processing?
- A . Apache Hive
- B . Apache Flink
- C . Apache Spark
- D . Apache Oozie
What is the purpose of memory management in Apache Flink?
- A . Convert all of the data into Java objects
- B . Control how much memory the runtime operations use
- C . Eliminate the need for serialization of the data
- D . Ensure no disk space is ever required
Which tool provides fine-grained access control and security for Hadoop components?
- A . Apache Atlas
- B . Apache Ranger
- C . Apache Knox
- D . Apache HBase
In Python, which primitive data type is used to represent real numbers, both rational and irrational?
- A . String
- B . Integer
- C . Float
- D . Boolean
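As a brief illustration of how Python handles real-valued quantities, the sketch below inspects the type of a rational and an irrational result. The variable names are chosen for this example only. Note that the stored values are finite binary approximations, so exact equality checks can be surprising.

```python
import math

root_two = math.sqrt(2)   # irrational in mathematics
one_third = 1 / 3         # rational, but non-terminating in binary

print(type(root_two).__name__, type(one_third).__name__)

# Stored values are approximations, so compare with a tolerance.
exact_equal = (0.1 + 0.2 == 0.3)
close_enough = math.isclose(0.1 + 0.2, 0.3)
print(exact_equal, close_enough)
```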