Among big data computing tasks, which of the following descriptions of IO-intensive tasks is wrong?
- A . During the execution of IO-intensive tasks, most of the time is spent on IO
- B . By improving network transmission efficiency and read-write efficiency, performance can be greatly improved
- C . High CPU consumption
- D . The more tasks, the higher the CPU efficiency
Which of the following HDFS commands can be used to check the integrity of a data block?
- A . hdfs fsck
- B . hdfs fsck -delete
- C . hdfs dfsadmin -report
- D . hdfs balancer -threshold 1
Which of the following descriptions of the Flink barrier is wrong?
- A . The barrier is periodically inserted into the data stream and flows as part of the data stream
- B . The barrier is the core of Flink snapshots
- C . A barrier separates the data of this cycle snapshot from the data of the next cycle snapshot
- D . When the barrier is inserted, the data flow will be temporarily blocked
During operation, the Kafka cluster directly depends on which of the following components?
- A . Zookeeper
- B . HDFS
- C . Spark
- D . HBase
On the FusionInsight Manager interface, which of the following options is not included in the Loader operations?
- A . Switch Loader active and standby nodes
- B . Start the Loader instance
- C . Configure Loader parameters
- D . View Loader service status
In the FusionInsight HD system, which component does the Flume data stream not pass through within a node?
- A . Sink
- B . Topic
- C . Source
- D . Channel
Which of the following functions can the Kafka Cluster Mirroring tool achieve?
- A . Kafka cross-cluster data synchronization scheme
- B . Data backup in Kafka single cluster
- C . Data recovery within a single Kafka cluster
- D . None of the above is correct
Where is the HBase metadata Meta Region routing information stored?
- A . Root table
- B . Zookeeper
- C . HMaster
- D . Meta table
Which of the following factors have contributed to the vigorous development of the big data era?
- A . Reduction of hardware cost and improvement of network bandwidth
- B . The rise of cloud computing
- C . The popularity of smart terminals and the improvement of social needs
- D . All of the above are correct
Which of the following is not a state store supported by Flink?
- A . FsStateBackend
- B . RocksDBStateBackend
- C . MemoryStateBackend
- D . FileStateBackend
Which of the following descriptions about the characteristics of big data and traditional database data is wrong?
- A . Big data is the data processing of "fish in the pond", with a clear goal; the data processing of traditional databases is to judge whether other kinds of "fish" exist through certain "fish"
- B . There are many types of data processed by big data, including structured, unstructured and semi-structured data; the types of data in traditional databases are relatively single, often structured data
- C . In big data, there is no unified data tool, that is, "No size fits all"; in traditional databases, in a specific business scenario, often one tool can solve a problem, that is, "One size fits all"
- D . The data scale of big data is very large, and TB and PB are generally used as data processing units; the data scale in traditional databases is generally small, and MB is often used as data processing units
Which of the following statements about ZKFC is wrong?
- A . ZKFC (ZKFailoverController) is used as a client of Zookeeper cluster to monitor the status information of NameNode
- B . The ZKFC process needs to be deployed in the NameNode node and Zookeeper’s Leader node
- C . The Standby NameNode perceives the status of the Active NameNode through Zookeeper. Once the Active NameNode goes down, the Standby NameNode will perform the master promotion operation
- D . The ZKFC of HDFS NameNode connects to Zookeeper, and saves information such as the host name in Zookeeper
Flume supports monitoring and transferring new files in the directory, which can realize data transfer. Which type of source is described above?
- A . spooling directory source
- B . http source
- C . exec source
- D . syslog source
How is the main HBase Master (HMaster) elected?
- A . Randomly selected
- B . Adjudicated by RegionServer
- C . Judgment via Zookeeper
- D . HMaster is in dual-master mode and does not need to be adjudicated
A user needs to build a 350-node FusionInsight HD cluster. Which planning scheme is the best?
- A . Deployment of management nodes, control nodes, and data nodes in one, Layer 2 networking
- B . Integrated deployment of management nodes and control nodes, independent deployment of data nodes, and Layer 2 networking
- C . Independent deployment of management nodes, control nodes, and data nodes, and Layer 3 networking
- D . Integrated deployment of management nodes and data nodes, independent deployment of control nodes, and Layer 2 networking
A Zookeeper cluster with 5 nodes has the same disaster recovery capability as a cluster with how many nodes?
- A . 3
- B . 4
- C . 6
- D . None of the above
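
Note: the arithmetic behind this kind of question can be sketched as follows. This is a minimal illustration, assuming the ensemble stays available only while a strict majority of its servers is alive; the function name `tolerated_failures` is hypothetical and not part of any Zookeeper API.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a strict majority survives."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    quorum = ensemble_size // 2 + 1  # smallest strict majority
    return ensemble_size - quorum


# Compare a few ensemble sizes side by side.
for size in (3, 4, 5, 6):
    print(size, tolerated_failures(size))
```

Under this assumption, adding one server to an odd-sized ensemble does not raise the number of failures it can survive.
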
When viewing the partition details of a topic in Kafka, which of the following commands should be used?
- A . bin/kafka-topics.sh --create
- B . bin/kafka-topics.sh --list
- C . bin/kafka-topics.sh --describe
- D . bin/kafka-topics.sh --delete
Which of the following Redis optimization methods is wrong?
- A . Shorten key names and key values
- B . Turn off Transparent Huge Pages
- C . Modify the minimum number of tcp connections in Linux
- D . Modify the Linux kernel memory allocation strategy or execute sysctl
How many copies of each HDFS block are saved by default in the FusionInsight HD system?
- A . 3
- B . 2
- C . 1
- D . not sure
Which component controls the active/standby arbitration in HDFS?
- A . Zookeeper Failover Controller
- B . Node Manager
- C . Resource Manager
- D . HDFS Client
Which nodes need to communicate with external data sources before and after the FusionInsight HD Loader operation?
- A . Loader service master node
- B . Nodes running YARN service jobs
- C . Both of the first two are required
- D . Neither of the first two is required
In YARN’s label-based scheduling, which of the following is labeled?
- A . APPMaster
- B . Resource Manager
- C . Container
- D . Node Manager
Which of the following is not a mandatory option when creating a Loader job?
- A . name
- B . connection
- C . type
- D . Priority
The figure below shows the calculation model of Structured Streaming. Through observation, it can be concluded that the final calculation result of T3 is?
- A . Dog 1, owl 1
- B . Cat 2, dog 4, owl 2
- C . Cat 2, dog 3, owl 1
- D . Cat 1, cat 1, dog 2, dog 2, owl 2
Which of the following operations cannot be recorded in the FusionInsight HD system audit log?
- A . Clear warnings manually
- B . Start and stop the service instance
- C . Delete service instance
- D . Query History Monitoring
In the Flink technical architecture, which of the following is the computing engine for stream processing and batch processing?
- A . Standalone
- B . Runtime
- C . DataStream
- D . FlinkCore
FusionInsight HD uses the HBase client to write 10 records in a batch. A certain HRegionServer node hosts two Regions of the table, A and B. Of the 10 records, 6 belong to A and 4 belong to B. How many RPC requests need to be sent to the HRegionServer to write these 10 records?
- A . 2
- B . 1
- C . 6
- D . 10
The RowKey range of a table in HBase is split at the SplitKeys 9, e, a, z. How many Regions does this table have?
- A . 3
- B . 4
- C . 5
- D . 6
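
Note: the relation between split keys and Regions can be sketched as below. This is a minimal illustration, assuming each Region covers a half-open row-key range and that an empty string stands for an open start or end boundary; `regions_from_split_keys` is a hypothetical helper, not an HBase API.

```python
def regions_from_split_keys(split_keys):
    """Return the (start, end) row-key range of every resulting Region."""
    # HBase orders row keys bytewise; for ASCII keys Python's string
    # ordering gives the same result.
    keys = sorted(set(split_keys))
    bounds = [""] + keys + [""]  # "" marks an open boundary
    return list(zip(bounds[:-1], bounds[1:]))


regions = regions_from_split_keys(["9", "e", "a", "z"])
for start, end in regions:
    print(f"[{start!r}, {end!r})")
print(len(regions))
```

Each split key adds one boundary, so n distinct split keys yield n + 1 ranges.
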
What physical resources does Redis mainly consume?
- A . memory
- B . Bandwidth
- C . Hard disk space
- D . Disk I/O
In which of the following programming languages is Spark implemented?
- A . C
- B . C++
- C . Java
- D . Scala
Which of the following descriptions of the Hive architecture in FusionInsight HD is wrong?
- A . As long as one HiveServer is unavailable, the entire Hive cluster will be unavailable
- B . HiveServer is responsible for accepting client requests, parsing, executing HQL commands and returning query results
- C . MetaStore is used to provide metadata services and depends on DBServer
- D . At the same time, only one HiveServer is in the Active state, and the other is in the Standby state
When creating a Loader job, in which of the following steps can the filter type be set?
- A . Input settings
- B . Conversion
- C . Output
- D . Basic information
Which of the following is not a core element of Krb Server?
- A . KDC (Key Distribution Center)
- B . Kerberos Client
- C . Kerberos KDC Client
- D . Kerberos KDC Server
Which parameter needs to be configured to enable the log aggregation function of the YARN component on the Hadoop platform?
- A . yarn.nodemanager.local-dir
- B . yarn.nodemanager.log-dirs
- C . yarn.acl.enable
- D . yarn.log-aggregation-enable
When installing the Streaming component of FusionInsight HD, on how many nodes must the Nimbus role be installed?
- A . 4
- B . 3
- C . 2
- D . 1
Which of the following installation processes of FusionInsight HD is correct?
- A . Install Manager -> Execute preinstall -> Configure with LLD tool -> Install cluster -> Check after installation -> Configure after installation
- B . Configure with LLD tool -> Execute preinstall -> Install Manager -> Install cluster -> Check after installation -> Configure after installation
- C . Install Manager -> Configure with LLD tool -> Execute preinstall -> Install cluster -> Check after installation -> Configure after installation
- D . Configure with LLD tool -> Execute preinstall -> Install cluster -> Install Manager -> Check after installation -> Configure after installation
When the Loader in FusionInsight HD imports files from an SFTP server, which of the following file types is the fastest because it requires no encoding conversion or data conversion?
- A . sequence_file
- B . text_file
- C . binary_file
- D . graph_file
If the data producer needs to decide to send the data to a certain task of the target Bolt, which of the following message publishing strategies should be selected?
- A . Local field grouping
- B . Broadcast grouping
- C . Direct grouping
- D . Global grouping
In a FusionInsight Hadoop cluster, running df -hT on a certain node shows the following partitions:
/var/log
/srv/BigData
/srv/BigData/hadoop/data5
/srv/BigData/solr/solrserver3
/srv/BigData/dbdataom
What is the optimal combination of RAID levels for the disks corresponding to these partitions?
- A . Raid0 Raid1 Raid0 Non-Raid Raid1
- B . Raid1 Raid1 Non-Raid Non-Raid Raid1
- C . Raid0 Raid0 Raid0 Raid0 Raid0
- D . Non-Raid Non-Raid Non-Raid Non-Raid Raid1
Which of the following descriptions about the key features of Flink is wrong?
- A . Compared with Flink, Spark Streaming has lower latency
- B . The Flink stream processing engine can provide support for both stream processing and batch processing applications
- C . Compared with Streaming in FusionInsight HD, Flink has higher throughput
- D . checkpoint implements Flink’s fault tolerance
Which of the following options are suitable for MapReduce?
- A . Offline Computing
- B . Real-time interactive computing
- C . Iterative calculation
- D . Stream computing
What is the physical storage unit of a Region in HBase?
- A . Region
- B . ColumnFamily
- C . Column
- D . Row
In FusionInsight products, which of the following descriptions of creating Kafka topics is correct?
- A . When creating a Kafka Topic, the number of Partitions must be set
- B . When creating a Kafka Topic, you must set the number of Partition copies
- C . Setting up multiple copies can enhance the disaster recovery capability of Kafka service
- D . All of the above are correct
Which parameter needs to be configured to set the maximum amount of resources used by queue QueueA in YARN?
- A . yarn.scheduler.capacity.root.QueueA.user-limit-factor
- B . yarn.scheduler.capacity.root.QueueA.minimum-user-limit-percent
- C . yarn.scheduler.capacity.root.QueueA.state
- D . yarn.scheduler.capacity.root.QueueA.maximum-capacity
The __ interface in Flink is used for streaming data processing, and the ______ interface is used for batch processing.
- A . DataStream API, DataSet API
- B . Data batch API, DataStream API
- C . Stream API, Batch API
- D . Batch API, Stream API
Which of the following is not a Flume channel type?
- A . Memory Channel
- B . File Channel
- C . JDBC Channel
- D . HDFS Channel
In the task scheduling process of YARN, which of the following tasks is the responsibility of the Application Master?
- A . Applying for and receiving resources
- B . Set up the running environment for the task
- C . Allocate Containers
- D . Start the Map or Reduce task
As shown in the figure, which of the following descriptions of how a Kafka consumer reads messages is wrong?
- A . The blue in the picture is a topic of Kafka, which can be understood as a queue, and each grid represents a message.
- B . The messages generated by the producer are placed at the end of the topic one by one.
- C . Consumers read messages sequentially from right to left.
- D . Consumer uses offset to record the read position
In which of the following links is the data conversion operation of Flink completed?
- A . Source
- B . Transformation
- C . Sink
- D . Channel
Which of the following descriptions of the Supervisor in FusionInsight HD Streaming is correct?
- A . Supervisor is responsible for resource allocation and task scheduling
- B . Supervisor is responsible for accepting tasks assigned by Nimbus, starting and stopping Worker processes managed by itself
- C . Supervisor is a process that runs specific processing logic
- D . Supervisor is a component that receives data in Topology and then performs processing
In FusionInsight HD, if you need to view the user and permission groups currently logged in to HBase, which command can you execute in the HBase shell?
- A . use_permission
- B . whoami
- C . who
- D . get_user
Which of the following options cannot be realized through big data technology?
- A . Business model discovery
- B . Credit Evaluation
- C . Product recommendation
- D . Operational Analysis
What is the index name of ElasticSearch?
- A . ddcvc
- B . doc
- C . logstash-2020.01.17
- D . 3sDqsm8Bu-kTplz0jqhL
What components does HBase use by default as its underlying file storage system?
- A . File
- B . Kafka
- C . Memory
- D . HDFS
Which of the following HBase commands has the worst performance?
- A . put
- B . get
- C . list
- D . Scan
In the MRS platform, which component does the Flume data flow not need to pass through in the node?
- A . Sink
- B . Channel
- C . Topic
- D . Source
The figure below shows the Flume data transmission architecture. What is the component at the "?" in the figure?
- A . Interceptor
- B . Channel processor
- C . Channel selector
- D . None of the above is correct
Which of the following commands can clear the data of all databases under the Redis instance?
- A . dropall
- B . flushall
- C . flushdb
- D . dropdb
In the FusionInsight product, which of the following descriptions of Kafka topics is wrong?
- A . Each Topic can only be divided into one Partition
- B . The number of partitions of Topic can be configured at the time of creation
- C . The storage level of each Partition corresponds to a log file, and all information data is recorded in the log file
- D . Each message published to Kafka has a category. This category is called Topic, which can also be understood as a queue for storing messages.
What is the core module of Spark?
- A . Spark Streaming
- B . Spark Core
- C . MapReduce
- D . Spark SQL
Which of the following descriptions of Loader jobs in FusionInsight HD is correct?
- A . After the Loader submits the job to YARN for execution, if the Loader service becomes abnormal at this time, the job execution fails.
- B . After the Loader submits the job to YARN for execution, if a Mapper task fails to execute, it can be retried automatically
- C . After a Loader job fails to execute, garbage data will be generated, which needs to be cleared manually by the user
- D . After the Loader submits a job to YARN for execution, no other jobs can be submitted until the job has finished executing
Regarding the basic operations of Hive table creation, which of the following descriptions is correct?
- A . Once the table is created, the table name cannot be modified
- B . Once the table is built, no new columns can be added
- C . The external keyword needs to be specified when creating an external table
- D . Once the table is created, the column names cannot be modified
Regarding the Controller and NodeAgent in FusionInsight Manager, which statement is correct?
- A . Controller sends heartbeat to NodeAgent every 3 seconds
- B . NodeAgent accepts the commands issued by the Controller and executes specific actions
- C . Each node must deploy a Controller
- D . NodeAgent is open source enhanced
Which type of application scenario is HBase in Hadoop not suitable for?
- A . Large file application scenarios
- B . Mass data application scenarios
- C . High throughput application scenarios
- D . Semi-structured data application scenarios
Which of the following descriptions about Hive features is wrong?
- A . Flexible and convenient ETL
- B . Only supports MapReduce computing engine
- C . Can directly access HDFS files and HBase
- D . Easy to use and easy to program
What is the meaning of the command "ALTER TABLE employee1 ADD COLUMNS (column1 string);" in Hive?
- A . Drop table
- B . Add columns
- C . Create table
- D . Modify the file format
Which of the following scenarios is Hive not suitable for?
- A . Real-time online data analysis
- B . Non-real-time analysis, such as log analysis, statistical analysis
- C . Data mining, such as user behavior analysis, interest segmentation, and regional display
- D . Data summary, such as daily and weekly user clicks, click ranking
If you want to add 1 to the numeric value stored at a key, which command should you use?
- A . incr
- B . decr
- C . incrby
- D . decrby
Which of the following descriptions of Kafka is wrong?
- A . Used as the basis for activity streams and operational data processing pipelines
- B . Developed by Apache Hadoop and open sourced in 2011
- C . It has the characteristics of information persistence, high throughput, and real-time
- D . Implemented in Scala and Java
In the YARN service, if the capacity of the queue QueueA is to be set to 30%, which parameter should be configured?
- A . yarn.scheduler.capacity.root.QueueA.user-limit-factor
- B . yarn.scheduler.capacity.root.QueueA.minimum-user-limit-percent
- C . yarn.scheduler.capacity.root.QueueA.capacity
- D . yarn.scheduler.capacity.root.QueueA.state
What is the default Block Size of HDFS in the FusionInsight HD system?
- A . 32M
- B . 64M
- C . 128M
- D . 256M
How many shards does an index library of ElasticSearch have by default?
- A . 5
- B . 6
- C . 3
- D . 4
Which instance must be deployed with the Loader instance in FusionInsight HD?
- A . DataNodes
- B . RegionServer
- C . Resource Manager
- D . Node Manager
What is the module used to manage the active and standby status of the Loader Server process in the Loader?
- A . Job Scheduler
- B . HA Manager
- C . Job Manager
- D . Resource Manager
During the data collection process of Flume, which of the following components can filter and modify the data?
- A . Channel
- B . Channel Selector
- C . Interceptor
- D . Sink
The main role of Zookeeper in distributed applications does not include which of the following options?
- A . Election of Master nodes
- B . Ensure data consistency on each node
- C . Allocate cluster resources
- D . Store server information in the cluster
Regarding the alarm for insufficient Kafka disk capacity, which of the following analyses of the possible causes is wrong?
- A . The disk configuration used to store Kafka data (such as the number and size of disks) cannot meet the current business data traffic, causing the disk usage to reach the upper limit
- B . The data storage time is configured too long, and the data accumulation reaches the upper limit of disk usage
- C . Unreasonable business planning, resulting in uneven data distribution, causing some disks to reach the upper limit of usage
- D . It is caused by a Broker node failure
In Kafka HA, when the Leader corresponding to the Partition goes down, a new Leader needs to be elected from the Followers. Which of the following roles will perform it?
- A . Follower
- B . Controller
- C . Broker
- D . Leader
Which of the following descriptions of HBase’s Region Split process is wrong?
- A . The table will suspend service during the Split process
- B . Split splits a Region into two Regions in order to reduce the data size in the Region
- C . The Region that is split during the Split process will suspend the service
- D . During the Split process, the file is not really split, only the reference file is created
The DGC platform architecture provides enterprise-level metadata management, with visualized data asset management that supports drill-down and traceability. Which module uses the data map to visualize the data lineage and data panorama of data assets, and provides intelligent data search and operation monitoring?
- A . Data development
- B . Data asset management
- C . Specification design
- D . Data Integration
When the Spark application is running, what is the basis for Stage division?
- A . task
- B . taskSet
- C . action
- D . shuffle
Regarding DataSet, which of the following statements is wrong?
- A . DataSet is a strongly typed collection of domain-specific objects
- B . DataSet can perform most operations without deserialization
- C . DataSet performs sort, filter, shuffle and other operations that need to be deserialized
- D . DataSet is highly similar to RDD, and its performance is better than RDD
What does FusionInsight HD HBase use by default as its underlying file storage system?
- A . HDFS
- B . Hadoop
- C . Memory
- D . MapReduce
The underlying data of HBase is stored in the form of __?
- A . KeyValue
- B . Column storage
- C . Row storage
- D . Real-time storage
Regarding the relationship between Hive and other components of Hadoop, which of the following descriptions is wrong?
- A . Hive finally stores the data in HDFS
- B . Hive is a data warehouse tool for the Hadoop platform
- C . HQL can perform tasks through MapReduce
- D . Hive has a strong dependence on HBase
Which process is described in the Map stage shown in the figure below?
- A . Partition
- B . Sort
- C . Spill/Merge
- D . Combine
Zookeeper’s scheme authentication method does not include which of the following?
- A . sasl
- B . auth
- C . digest
- D . world
Which module in Hadoop is responsible for HDFS data storage?
- A . NameNode
- B . DataNode
- C . ZooKeeper
- D . JobTracker
Huawei FusionInsight HD is the first big data platform in China that complies with the national financial level protection. Which of the following aspects does its security cover?
- A . System Security
- B . Authority authentication
- C . Data Security
- D . All of the above are correct
Which of the following is not a scheme authentication method of Zookeeper?
- A . auth
- B . sasl
- C . digest
- D . world
When MRS Loader creates a job, what is the role of the connector?
- A . Configure the connection method between the job and the external data source
- B . Configure the connection method between the job and the internal data source
- C . Provide optimization parameters to improve data import and export performance
- D . Make sure there is a conversion step
Which of the following descriptions of the Kafka partition offset is wrong?
- A . Consumers track records through (offset, partition, topic)
- B . Uniquely mark a message
- C . offset is a string of type String
- D . The position of each message in the file is called the offset
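
Note: the offset model these options refer to can be sketched with a toy append-only log. This is a minimal illustration, not Kafka client code; the class `PartitionLog` and its methods are hypothetical. It assumes only that each message appended to a partition receives the next integer offset (Kafka offsets are 64-bit integers) and that a consumer tracks its position by that offset.

```python
class PartitionLog:
    """Toy append-only log standing in for one Kafka partition."""

    def __init__(self):
        self._messages = []

    def append(self, message: str) -> int:
        """Append at the end and return the offset assigned to the message."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int = 10):
        """Return up to max_messages messages starting at offset, in append order."""
        return self._messages[offset:offset + max_messages]


log = PartitionLog()
for text in ("m0", "m1", "m2"):
    log.append(text)

position = 0  # the consumer's current offset, an integer
batch = log.read(position, max_messages=2)
position += len(batch)  # advance past the messages just read
```

A real consumer additionally identifies the position by topic and partition, since offsets are only meaningful within one partition.
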
Which application scenarios are HBase not suitable for?
- A . High throughput application scenarios
- B . Application scenarios that require full ACID characteristics
- C . Semi-structured data application scenarios
- D . Massive data (TB, PB) application scenarios
Among FusionInsight HD products, which statement about Kafka is wrong?
- A . Kafka strongly depends on Zookeeper
- B . The number of instances deployed by Kafka must not be less than 2
- C . The Kafka server can generate messages
- D . Consumer acts as Kafka’s client role to consume messages
In the operation process of Flink, the role responsible for applying for resources is?
- A . Resource Manager
- B . Job Manager
- C . Client
- D . Task Manager
Regarding the comparison between Hive and traditional data warehouses, which of the following descriptions is wrong?
- A . Hive metadata storage is independent of data storage, thereby decoupling metadata and data, with high flexibility, while traditional data warehouses have single data applications and low flexibility
- B . Hive is based on HDFS storage, theoretically, the storage capacity can be expanded infinitely, while the storage capacity of traditional data warehouses will have an upper limit
- C . Since Hive data is stored in HDFS, it can guarantee high fault tolerance and high reliability of data
- D . Since Hive is based on a big data platform, the query efficiency is faster than traditional data warehouses
What is the default resource scheduler for queues in YARN?
- A . FIFO scheduler
- B . Capacity Scheduler
- C . Fair Scheduler
- D . All of the above are correct
In Hadoop, if yarn.scheduler.capacity.root.QueueA.minimum-user-limit-percent is set to 50, which of the following statements is wrong?
- A . A user submitting a task can use 100% of QueueA’s resources
- B . Each user in QueueA can only get up to 50% of the resources
- C . If there are already 2 user tasks running in QueueA, the task submitted by the third user needs to wait for the release of resources
- D . QueueA must guarantee that each user gets at least 50% of the resources
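
Note: the effect of `minimum-user-limit-percent` can be approximated with the arithmetic below. This is a simplified sketch, assuming every active user has pending demand and ignoring `user-limit-factor`, queue elasticity, and preemption; both function names are hypothetical.

```python
def per_user_limit_percent(active_users: int, minimum_user_limit_percent: int) -> float:
    """Approximate share of the queue one user may hold, in percent."""
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    # Resources are shared evenly, but the share never drops below the minimum.
    return max(100.0 / active_users, float(minimum_user_limit_percent))


def users_served_concurrently(minimum_user_limit_percent: int) -> int:
    """How many users can hold their guaranteed minimum at the same time."""
    return 100 // minimum_user_limit_percent


for users in (1, 2, 3):
    print(users, per_user_limit_percent(users, 50))
print(users_served_concurrently(50))
```

Under these assumptions the per-user share is bounded below by the configured minimum, so only a limited number of users can be served at once and later arrivals wait for resources to be released.
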
What is the file format of data storage in HBase?
- A . HFile
- B . HLog
- C . TextFile
- D . SequenceFile
Which of the following descriptions of Hive log collection on the FusionInsight Manager interface is wrong?
- A . Specific users can be designated for log collection, for example, only the logs generated by UserA are downloaded.
- B . You can specify a time period for log collection, for example, only collect logs from 2016-1-1 to 2016-1-10.
- C . You can specify an instance for log collection, for example, collecting only MetaStore logs.
- D . You can specify the node IP for log collection, for example, only download the logs of a certain IP.