Hortonworks Apache Hadoop Developer Hadoop 2.0 Certification exam for Pig and Hive Developer Online Training
The questions for Apache Hadoop Developer were last updated on Apr 18, 2025.
- Exam Code: Apache Hadoop Developer
- Exam Name: Hadoop 2.0 Certification exam for Pig and Hive Developer
- Certification Provider: Hortonworks
- Latest update: Apr 18, 2025
You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt.
How many files will be processed by the FileInputFormat.setInputPaths() method when it is given a Path object representing this directory?
- A . Four, all files will be processed
- B . Three, the pound sign is an invalid character for HDFS file names
- C . Two, file names with a leading period or underscore are ignored
- D . None, the directory cannot be named jobdata
- E . One, no special characters can prefix the name of an input file
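As a rough illustration of the behavior this question probes: Hadoop's FileInputFormat applies a hidden-file filter that skips any file whose name starts with an underscore or a period. The sketch below imitates that rule in plain Python. The function names `is_hidden` and `visible_inputs` are made up for this example and are not part of any Hadoop API.

```python
# Minimal sketch (not Hadoop code): mimic FileInputFormat's hidden-file rule,
# which ignores names beginning with "_" or ".".

def is_hidden(name: str) -> bool:
    """Return True for names the hidden-file filter would skip."""
    return name.startswith("_") or name.startswith(".")

def visible_inputs(names):
    """Keep only the file names that would be handed to the job as input."""
    return [n for n in names if not is_hidden(n)]

# The four files from the jobdata directory in the question.
files = ["_first.txt", "second.txt", ".third.txt", "#data.txt"]
processed = visible_inputs(files)
print(processed)
```

Note that the pound sign is not treated specially by this rule; only the leading underscore and period matter.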
In a large MapReduce job with m mappers and n reducers, how many distinct copy operations will there be in the sort/shuffle phase?
- A . mXn (i.e., m multiplied by n)
- B . n
- C . m
- D . m+n (i.e., m plus n)
- E . m^n (i.e., m to the power of n)
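One way to reason about the shuffle is to enumerate the transfers directly. The sketch below assumes the usual model in which every reducer fetches its own partition from every mapper's output. The helper `shuffle_copies` is a hypothetical name used only for this illustration.

```python
# Minimal sketch: enumerate shuffle transfers under the assumption that each
# reducer pulls one partition from each mapper's output.

def shuffle_copies(m: int, n: int):
    """Return every (mapper, reducer) pair that represents one copy operation."""
    return [(mapper, reducer) for mapper in range(m) for reducer in range(n)]

# Example with 3 mappers and 2 reducers.
copies = shuffle_copies(3, 2)
print(len(copies))
```

Each pair is distinct, so the number of copies grows as the product of the two counts rather than their sum.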
Which Hadoop component is responsible for managing the distributed file system metadata?
- A . NameNode
- B . Metanode
- C . DataNode
- D . NameSpaceManager
Review the following data and Pig code.
M,38,95111
F,29,95060
F,45,95192
M,62,95102
F,56,95102
A = LOAD 'data' USING PigStorage(',') as (gender:chararray, age:int, zip:chararray);
B = FOREACH A GENERATE age;
Which one of the following commands would save the results of B to a folder in HDFS named myoutput?
- A . STORE A INTO 'myoutput' USING PigStorage(',');
- B . DUMP B using PigStorage('myoutput');
- C . STORE B INTO 'myoutput';
- D . DUMP B INTO 'myoutput';
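To make the contents of relation B concrete, the following plain-Python sketch imitates the load-and-project steps on the sample rows. It is not Pig; the `load` helper is a hypothetical stand-in for a delimiter-based loader, and it assumes the rows are split on commas, as the sample data is comma-separated.

```python
# Minimal sketch (plain Python, not Pig): split each sample row on a delimiter
# and project the age column, mirroring a FOREACH ... GENERATE age step.

rows = ["M,38,95111", "F,29,95060", "F,45,95192", "M,62,95102", "F,56,95102"]

def load(lines, delimiter):
    """Split every line into a tuple of string fields."""
    return [tuple(line.split(delimiter)) for line in lines]

a = load(rows, ",")                          # tuples of (gender, age, zip)
b = [int(age) for _gender, age, _zip in a]   # keep only the age field
print(b)
```

In Pig itself, DUMP prints a relation to the console, while STORE writes it to a location in the file system.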
MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate daemons? Select two.
- A . Health status checks (heartbeats)
- B . Resource management
- C . Job scheduling/monitoring
- D . Job coordination between the ResourceManager and NodeManager
- E . Launching tasks
- F . Managing file system metadata
- G . MapReduce metric reporting
- H . Managing tasks
Assuming the following Hive query executes successfully:
Which one of the following statements describes the result set?
- A . A bigram of the top 80 sentences that contain the substring "you are" in the lines column of the inputdata table.
- B . An 80-value ngram of sentences that contain the words "you" or "are" in the lines column of the inputdata table.
- C . A trigram of the top 80 sentences that contain "you are" followed by a null space in the lines column of the inputdata table.
- D . A frequency distribution of the top 80 words that follow the subsequence "you are" in the lines column of the inputdata table.
Given the following Pig commands:
Which one of the following statements is true?
- A . The $1 variable represents the first column of data in ‘my.log’
- B . The $1 variable represents the second column of data in ‘my.log’
- C . The severe relation is not valid
- D . The grouped relation is not valid
What does Pig provide to the overall Hadoop solution?
- A . Legacy language integration with MapReduce framework
- B . Simple scripting language for writing MapReduce programs
- C . Database table and storage management services
- D . C++ interface to MapReduce and data warehouse infrastructure
What types of algorithms are difficult to express in MapReduce v1 (MRv1)?
- A . Algorithms that require applying the same mathematical function to large numbers of individual binary records.
- B . Relational operations on large amounts of structured and semi-structured data.
- C . Algorithms that require global, shared state.
- D . Large-scale graph algorithms that require one-step link traversal.
- E . Text analysis algorithms on large collections of unstructured text (e.g., Web crawls).