Problem Scenario 25: You have been given the comma-separated employee information below. It needs to be added to the file /home/cloudera/flumetest/in.txt (to be read by a tail source).
sex, name, city
1, alok, mumbai
1, jatin, chennai
1, yogesh, kolkata
2, ragini, delhi
2, jyotsana, pune
1, valmiki, banglore
Create a Flume configuration file using the fastest non-durable channel, which writes data into the Hive warehouse directory, in two separate tables called flumemaleemployee1 and flumefemaleemployee1
(create the Hive tables for the given data as well). Please use a tail source with the file /home/cloudera/flumetest/in.txt.
flumemaleemployee1: will contain only male employees' data. flumefemaleemployee1: will contain only female employees' data.
Answer: Solution:
Step 1: Create the Hive tables flumemaleemployee1 and flumefemaleemployee1.
CREATE TABLE flumemaleemployee1
(
sex_type int, name string, city string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE flumefemaleemployee1
(
sex_type int, name string, city string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
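The effect of the field delimiter on the sample rows can be sketched in Python. This is only an illustration, assuming the delimiter is a single comma; split_row is a hypothetical helper, not part of Hive. It suggests that the spaces after each comma in the sample data would stay attached to the name and city values.

```python
# Hypothetical helper: mimics splitting one text row on the ',' field delimiter.
# It is an illustration only and does not call Hive.
def split_row(line, delimiter=","):
    sex_type, name, city = line.rstrip("\n").split(delimiter)
    # The first field has no surrounding spaces, so it converts cleanly to int.
    return int(sex_type), name, city


row = split_row("1, alok, mumbai")
print(row)  # (1, ' alok', ' mumbai')
```

If the leading spaces are unwanted, one option is to enter the data without a space after each comma.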
Step 2: Create the directory below and change into it.
mkdir /home/cloudera/flumetest/
cd /home/cloudera/flumetest/
Step 3: Create a Flume configuration file with the configuration below for the source, sinks and channels, and save it as flume5.conf.
agent.sources = tailsrc
agent.channels = mem1 mem2
agent.sinks = std1 std2
# Exec source that tails the input file
agent.sources.tailsrc.type = exec
agent.sources.tailsrc.command = tail -F /home/cloudera/flumetest/in.txt
agent.sources.tailsrc.batchSize = 1
# Interceptor: copy the leading digit of each line into the header "type"
agent.sources.tailsrc.interceptors = i1
agent.sources.tailsrc.interceptors.i1.type = regex_extractor
agent.sources.tailsrc.interceptors.i1.regex = ^(\\d)
agent.sources.tailsrc.interceptors.i1.serializers = t1
agent.sources.tailsrc.interceptors.i1.serializers.t1.name = type
# Selector: route events to a channel based on the "type" header
agent.sources.tailsrc.selector.type = multiplexing
agent.sources.tailsrc.selector.header = type
agent.sources.tailsrc.selector.mapping.1 = mem1
agent.sources.tailsrc.selector.mapping.2 = mem2
# Sink for male employees (type 1)
agent.sinks.std1.type = hdfs
agent.sinks.std1.channel = mem1
agent.sinks.std1.hdfs.batchSize = 1
agent.sinks.std1.hdfs.path = /user/hive/warehouse/flumemaleemployee1
agent.sinks.std1.hdfs.rollInterval = 0
agent.sinks.std1.hdfs.fileType = DataStream
# Sink for female employees (type 2)
agent.sinks.std2.type = hdfs
agent.sinks.std2.channel = mem2
agent.sinks.std2.hdfs.batchSize = 1
agent.sinks.std2.hdfs.path = /user/hive/warehouse/flumefemaleemployee1
agent.sinks.std2.hdfs.rollInterval = 0
agent.sinks.std2.hdfs.fileType = DataStream
# Memory channels (fastest, non-durable)
agent.channels.mem1.type = memory
agent.channels.mem1.capacity = 100
agent.channels.mem2.type = memory
agent.channels.mem2.capacity = 100
agent.sources.tailsrc.channels = mem1 mem2
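The routing done by the interceptor and the multiplexing selector can be sketched in Python. This is a rough simulation rather than Flume code. It assumes the regex is meant to capture the leading digit of each line (^(\d)) and that the mappings are 1 to mem1 and 2 to mem2. The function route and the constants are hypothetical names used only for illustration.

```python
import re

# Assumed values mirroring the intended Flume configuration.
REGEX = re.compile(r"^(\d)")           # interceptors.i1.regex
HEADER_NAME = "type"                   # serializers.t1.name and selector.header
MAPPING = {"1": "mem1", "2": "mem2"}   # selector.mapping.1 and selector.mapping.2


def route(line):
    """Return the channel name a line would be routed to, or None if unmapped."""
    headers = {}
    match = REGEX.search(line)
    if match:
        # The interceptor stores the first capture group under the header name.
        headers[HEADER_NAME] = match.group(1)
    # The selector looks up the header value in its mapping.
    return MAPPING.get(headers.get(HEADER_NAME))


for line in ["1, alok, mumbai", "2, ragini, delhi", "sex, name, city"]:
    print(line, "->", route(line))
```

In this sketch a line that does not start with a digit, such as the header line, maps to no channel, because no default channel is configured.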
Step 4: Run the command below, which uses this configuration file and appends data to HDFS.
Start the Flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume5.conf --name agent
Step 5: Open another terminal and create the file /home/cloudera/flumetest/in.txt.
Step 6: Enter the data below in the file and save it.
1, alok, mumbai
1, jatin, chennai
1, yogesh, kolkata
2, ragini, delhi
2, jyotsana, pune
1, valmiki, banglore
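Instead of typing the rows in an editor, they could also be appended programmatically. The Python sketch below is one possible way to do this, not part of the original solution. append_records is a hypothetical helper, and the demo writes to a temporary file so that it runs anywhere. On the exam machine the path would be /home/cloudera/flumetest/in.txt.

```python
import os
import tempfile

RECORDS = [
    "1, alok, mumbai",
    "1, jatin, chennai",
    "1, yogesh, kolkata",
    "2, ragini, delhi",
    "2, jyotsana, pune",
    "1, valmiki, banglore",
]


def append_records(path, records):
    """Append each record as its own line; existing content is kept."""
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            handle.write(record + "\n")


# Demo against a temporary file (assumption: stands in for in.txt).
demo_path = os.path.join(tempfile.mkdtemp(), "in.txt")
append_records(demo_path, RECORDS[:2])   # first batch
append_records(demo_path, RECORDS[2:])   # second batch is added after the first
with open(demo_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(len(lines))  # 6
```

Opening the file in append mode adds new lines at the end, which is the kind of change a tail -F command picks up.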
Step 7: Open Hue and check whether the data is available in the Hive tables.
Step 8: Stop the Flume service by pressing Ctrl+C.