Problem Scenario 25: You have been given the comma-separated employee information below. It needs to be added to the file /home/cloudera/flumetest/in.txt (to be read by a tail source).


sex, name, city

1, alok, mumbai

1, jatin, chennai

1, yogesh, kolkata

2, ragini, delhi

2, jyotsana, pune

1, valmiki, banglore

Create a Flume configuration file using the fastest non-durable channel, which writes the data into the Hive warehouse directory, in two separate tables called flumemaleemployee1 and flumefemaleemployee1.

(Create the Hive tables for the given data as well.) Please use a tail source with the file /home/cloudera/flumetest/in.txt.

flumemaleemployee1: will contain only male employees' data

flumefemaleemployee1: will contain only female employees' data

Solution:

Step 1: Create the Hive tables flumemaleemployee1 and flumefemaleemployee1.

-- Table for male employees (sex_type = 1)
CREATE TABLE flumemaleemployee1
(
sex_type int, name string, city string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Table for female employees (sex_type = 2)
CREATE TABLE flumefemaleemployee1
(
sex_type int, name string, city string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
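As a rough illustration of what a comma-delimited row format means for the sample data, the following Python sketch splits one input line into the three declared columns. It is not Hive code, and the helper name split_row is hypothetical. It only shows that the delimiter is the single comma, so in this sketch any space that follows a comma stays attached to the next field. Hive's own parsing may differ in details.

```python
def split_row(line, delimiter=","):
    """Split one text row into (sex_type, name, city) on a single-character delimiter."""
    sex_type, name, city = line.split(delimiter)
    # Only the first column is declared as int; the other two stay as strings.
    return int(sex_type), name, city


print(split_row("1, alok, mumbai"))  # → (1, ' alok', ' mumbai')
```

If the leading spaces are unwanted, one option is to write the input file without spaces after the commas.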

Step 2: Create the directory below and change into it.

mkdir /home/cloudera/flumetest/

cd /home/cloudera/flumetest/

Step 3: Create a Flume configuration file with the configuration below for the source, sinks and channels, and save it as flume5.conf.

# Components of this agent
agent.sources = tailsrc
agent.channels = mem1 mem2
agent.sinks = std1 std2

# Exec source that tails the input file
agent.sources.tailsrc.type = exec
agent.sources.tailsrc.command = tail -F /home/cloudera/flumetest/in.txt
agent.sources.tailsrc.batchSize = 1

# Interceptor: copy the leading digit of each line into the "type" header
agent.sources.tailsrc.interceptors = i1
agent.sources.tailsrc.interceptors.i1.type = regex_extractor
agent.sources.tailsrc.interceptors.i1.regex = ^(\\d)
agent.sources.tailsrc.interceptors.i1.serializers = t1
agent.sources.tailsrc.interceptors.i1.serializers.t1.name = type

# Multiplexing selector: route on the "type" header (1 = male, 2 = female)
agent.sources.tailsrc.selector.type = multiplexing
agent.sources.tailsrc.selector.header = type
agent.sources.tailsrc.selector.mapping.1 = mem1
agent.sources.tailsrc.selector.mapping.2 = mem2

# HDFS sink for male employees
agent.sinks.std1.type = hdfs
agent.sinks.std1.channel = mem1
agent.sinks.std1.hdfs.batchSize = 1
agent.sinks.std1.hdfs.path = /user/hive/warehouse/flumemaleemployee1
agent.sinks.std1.hdfs.rollInterval = 0
agent.sinks.std1.hdfs.fileType = DataStream

# HDFS sink for female employees
agent.sinks.std2.type = hdfs
agent.sinks.std2.channel = mem2
agent.sinks.std2.hdfs.batchSize = 1
agent.sinks.std2.hdfs.path = /user/hive/warehouse/flumefemaleemployee1
agent.sinks.std2.hdfs.rollInterval = 0
agent.sinks.std2.hdfs.fileType = DataStream

# Memory channels (fastest, non-durable)
agent.channels.mem1.type = memory
agent.channels.mem1.capacity = 100
agent.channels.mem2.type = memory
agent.channels.mem2.capacity = 100

# Bind the source to both channels
agent.sources.tailsrc.channels = mem1 mem2
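To make the routing in this configuration easier to follow, here is a minimal Python sketch that mimics what the regex_extractor interceptor and the multiplexing selector are meant to do with each tailed line. It is not Flume code. The names route_line, SEX_CODE and CHANNEL_MAPPING are hypothetical and chosen for illustration. Only the intended pattern (a leading digit), the header name type, and the channel names mem1 and mem2 come from the configuration.

```python
import re

# Mirrors selector.mapping.1 and selector.mapping.2 from the configuration.
CHANNEL_MAPPING = {"1": "mem1", "2": "mem2"}

# Intended interceptor pattern: capture the leading digit of the line.
SEX_CODE = re.compile(r"^(\d)")


def route_line(line):
    """Return the channel name a line would be routed to, or None if unmapped."""
    match = SEX_CODE.search(line)
    if match is None:
        # No header value extracted; this sketch treats the line as unrouted.
        return None
    headers = {"type": match.group(1)}
    return CHANNEL_MAPPING.get(headers["type"])


records = [
    "1, alok, mumbai",
    "1, jatin, chennai",
    "1, yogesh, kolkata",
    "2, ragini, delhi",
    "2, jyotsana, pune",
    "1, valmiki, banglore",
]

for record in records:
    print(route_line(record), "<-", record)
```

In this sketch the four records starting with 1 map to mem1 (the male table's sink) and the two starting with 2 map to mem2 (the female table's sink). A header line such as "sex, name, city" matches no mapping, which is one reason to leave it out of in.txt.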

Step 4: Run the command below, which uses this configuration file and appends the data to HDFS.

Start the Flume service:

flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume5.conf --name agent

Step 5: Open another terminal and create the file /home/cloudera/flumetest/in.txt.

Step 6: Enter the data below in the file and save it.

1, alok, mumbai

1, jatin, chennai

1, yogesh, kolkata

2, ragini, delhi

2, jyotsana, pune

1, valmiki, banglore
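Instead of typing the records in an editor, they can also be appended programmatically, so that each record arrives as a new line at the end of the file being tailed. The sketch below is one way to do that in Python. The helper name append_records is hypothetical, and the demonstration writes to a temporary directory so it can be run anywhere.

```python
import tempfile
from pathlib import Path

RECORDS = [
    "1, alok, mumbai",
    "1, jatin, chennai",
    "1, yogesh, kolkata",
    "2, ragini, delhi",
    "2, jyotsana, pune",
    "1, valmiki, banglore",
]


def append_records(path, records):
    """Append each record to the file at path as its own newline-terminated line."""
    target = Path(path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        for record in records:
            handle.write(record + "\n")


# Demonstration against a throwaway file rather than the real input path.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "flumetest" / "in.txt"
    append_records(demo_file, RECORDS)
    print(demo_file.read_text(encoding="utf-8"), end="")
```

On the Cloudera VM the call would instead be append_records("/home/cloudera/flumetest/in.txt", RECORDS), assuming the user running it may write to that directory.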

Step 7: Open Hue and check whether the data is available in the Hive tables.

Step 8: Stop the Flume service by pressing Ctrl+C.