Which query interface would you recommend?
Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data. This colleague has a strong background in data flow languages and programming. Which query interface would you recommend?
A. Pig
B. Hive
C. Howl
D. HBase
Answer: A
For effective visualization, what is the chart's primary flaw?
Refer to the Exhibit. For effective visualization, what is the chart's primary flaw?
A. The use of 3 dimensions.
B. The slanting of axis labels.
C. The location of the legend.
D. The order of the columns.
Answer: A
Which tool should be recommended?
You are having a discussion with a business colleague. The colleague mentions that they want to perform K-means clustering on text file data stored in HDFS. Which tool should be recommended?
A. Mahout
B. HBase
C. Scribe
D. Sqoop
Answer: A
Which ROC curve represents a perfect model fit?
Which ROC curve represents a perfect model fit?
(Exhibits A through D: ROC curve images, not reproduced here.)
A. Exhibit A
B. Exhibit B
C. Exhibit C
D. Exhibit D
Answer: A
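Since the exhibits are not available, the following minimal sketch shows what a perfect fit looks like numerically rather than visually. The labels and scores are hypothetical and the helper names `roc_points` and `auc` are illustrative only. When every positive is scored above every negative, the ROC curve passes through the point (false positive rate 0, true positive rate 1) and the area under it is 1.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low.

    Assumes binary labels (1 = positive, 0 = negative) and no tied scores.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical data: three positives, three negatives.
labels = [1, 1, 1, 0, 0, 0]
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # positives always rank first
flawed_scores = [0.9, 0.4, 0.7, 0.8, 0.2, 0.1]   # one negative outranks two positives

perfect = roc_points(labels, perfect_scores)
flawed = roc_points(labels, flawed_scores)

print("perfect AUC:", auc(perfect))
print("flawed AUC:", auc(flawed))
```

On a plot, the perfect case hugs the left and top edges of the unit square, while any misranking pulls the curve toward the diagonal.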
When is the GROUP BY ROLLUP clause used in an OLAP query?
When is the GROUP BY ROLLUP clause used in an OLAP query?
A. All subtotals and grand totals are to be included in the output
B. Subtotals are only to be included in the output
C. Grand totals are only to be included in the output
D. Specific subtotals and...
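To make the shape of a ROLLUP result concrete, here is a small sketch that emulates in plain Python what `GROUP BY ROLLUP(region, product)` returns in SQL dialects that support it: the detail rows, one subtotal per leading column value, and a single grand total. The `sales` rows and the `rollup` helper are hypothetical, and `None` stands in for the NULL that SQL places in rolled-up columns.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, amount).
sales = [
    ("East", "apples", 10),
    ("East", "pears", 5),
    ("West", "apples", 7),
]


def rollup(rows):
    """Aggregate amounts at the three levels ROLLUP(region, product) produces."""
    totals = defaultdict(int)
    for region, product, amount in rows:
        totals[(region, product)] += amount  # detail level: (region, product)
        totals[(region, None)] += amount     # subtotal per region
        totals[(None, None)] += amount       # grand total
    return dict(totals)


result = rollup(sales)
for key, amount in result.items():
    print(key, amount)
```

Note that the subtotals follow the column hierarchy: there is a row per region but no row per product across regions, which is what distinguishes ROLLUP from CUBE.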
Which word or phrase completes the statement? Business Intelligence is to ad-hoc reporting and dashboards as Data Science is to __________.
Which word or phrase completes the statement? Business Intelligence is to ad-hoc reporting and dashboards as Data Science is to __________.
A. Optimization and Predictive Modeling
B. Alerts and Queries
C. Structured Data and Data Sources
D. Sales and profit reporting
Answer: A
When would you prefer a Naive Bayes model to a logistic regression model for classification?
When would you prefer a Naive Bayes model to a logistic regression model for classification?
A. When you are using several categorical input variables with over 1000 possible values each.
B. When you need to estimate the probability of an outcome, not just which class it is in.
C. When...
What else must be true?
You have run the association rules algorithm on your data set, and the two rules {banana, apple} => {grape} and {apple, orange} => {grape} have been found to be relevant. What else must be true?
A. {grape, apple, orange} must be a frequent itemset.
B. {banana, apple, grape, orange} must be...
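The reasoning behind this kind of question can be sketched in code. Assuming that "relevant" means each rule meets the minimum support threshold, the union of a rule's left and right sides is a frequent itemset, and by the Apriori (downward closure) property every non-empty subset of it is frequent too. The helper name `implied_frequent_itemsets` is illustrative only.

```python
from itertools import combinations

# The two rules from the question, as (antecedent, consequent) pairs.
rules = [
    ({"banana", "apple"}, {"grape"}),
    ({"apple", "orange"}, {"grape"}),
]


def implied_frequent_itemsets(rules):
    """Collect every itemset that must be frequent if the rules meet min support."""
    implied = set()
    for lhs, rhs in rules:
        items = sorted(lhs | rhs)  # all items the rule mentions
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                implied.add(frozenset(subset))  # downward closure
    return implied


implied = implied_frequent_itemsets(rules)
for itemset in sorted(implied, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
```

The four-item set that combines both rules does not appear in the result: nothing in the two rules guarantees that banana and orange occur together often enough.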
If the tf-idf metric is used to score relevance for search and retrieval, which term has the highest discriminatory power?
You have the following corpus of texts:
“The cat hit the dog.”
“The dog bit the mail carrier.”
“The mail carrier chased the truck.”
“The truck hit the wall while avoiding the dog that chased the cat.”
“The cat climbed the wall.”
If the tf-idf metric is used to score...
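A short sketch can show how discriminatory power is read off this corpus. It counts the document frequency (df) of each term and computes idf = log(N / df), which is one common variant of the formula; other variants add smoothing. The tokenizer is an assumption: it lowercases the text, keeps alphabetic runs only, and therefore treats "mail carrier" as two separate terms.

```python
import math
import re

corpus = [
    "The cat hit the dog.",
    "The dog bit the mail carrier.",
    "The mail carrier chased the truck.",
    "The truck hit the wall while avoiding the dog that chased the cat.",
    "The cat climbed the wall.",
]


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


docs = [tokenize(text) for text in corpus]
n_docs = len(docs)

# Document frequency: the number of documents containing each term.
df = {}
for doc in docs:
    for term in set(doc):
        df[term] = df.get(term, 0) + 1

# Inverse document frequency with the natural log; rarer terms score higher.
idf = {term: math.log(n_docs / count) for term, count in df.items()}

for term in sorted(idf, key=lambda t: (-idf[t], t)):
    print(f"{term:10s} df={df[term]} idf={idf[term]:.3f}")
```

A term that appears in every document, such as "the", gets an idf of zero and cannot separate one document from another, while terms confined to a single document get the highest idf.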
How many customer groups should you specify?
Refer to the exhibit. You are using K-means clustering to classify customer behavior for a large retailer. You need to determine the optimum number of customer groups. You plot the within-sum-of-squares (wss) data as shown in the exhibit. How many customer groups should you specify?
A. 2
B. 3
C...
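The exhibit is not reproduced here, so the sketch below only illustrates how a WSS curve is built and read. It uses hypothetical one-dimensional data with hand-assigned cluster labels for each candidate k instead of running K-means, and the `wss` helper is illustrative. The usual heuristic is to pick the k at the "elbow", where adding another cluster stops reducing WSS by much.

```python
from collections import defaultdict


def wss(points, labels):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    clusters = defaultdict(list)
    for x, label in zip(points, labels):
        clusters[label].append(x)
    total = 0.0
    for members in clusters.values():
        centroid = sum(members) / len(members)
        total += sum((x - centroid) ** 2 for x in members)
    return total


# Hypothetical data with three visibly separate groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

# Hand-assigned cluster labels for each candidate k (an assumption, not K-means output).
labelings = {
    1: [0, 0, 0, 0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    4: [0, 0, 0, 1, 1, 1, 2, 2, 3],
}

wss_by_k = {k: wss(points, labels) for k, labels in labelings.items()}
for k, value in wss_by_k.items():
    print(f"k={k} wss={value:.2f}")
```

In this toy data the drop from k=2 to k=3 is large and the drop from k=3 to k=4 is negligible, so the elbow sits at k=3. The same reading applies to the plotted curve in the exhibit, whatever its actual values are.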