Problem Scenario 30: You have been given three csv files in hdfs as below.
Problem Scenario 30: You have been given three csv files in hdfs as below.
EmployeeName.csv with the field (id, name)
EmployeeManager.csv (id, manager Name)
EmployeeSalary.csv (id, Salary)
Using Spark and its API you have to generate a joined output as below and save as a text tile (Separated by comma) for final distribution and output must be sorted by id.
ld, name, salary, managerName
EmployeeManager.csv
E01, Vishnu
E02, Satyam
E03, Shiv
E04, Sundar
E05, John
E06, Pallavi
E07, Tanvir
E08, Shekhar
E09, Vinod
E10, Jitendra
EmployeeName.csv
E01, Lokesh
E02, Bhupesh
E03, Amit
E04, Ratan
E05, Dinesh
E06, Pavan
E07, Tejas
E08, Sheela
E09, Kumar
E10, Venkat
EmployeeSalary.csv
E01, 50000
E02, 50000
E03, 45000
E04, 45000
E05, 50000
E06, 45000
E07, 50000
E08, 10000
E09, 10000
E10, 10000
Answer: Solution:
Step 1: Create all three files in hdfs in directory called sparkl (We will do using Hue}. However, you can first create in local filesystem and then
Step 2: Load EmployeeManager.csv file from hdfs and create PairRDDs
val manager = sc.textFile("spark1/EmployeeManager.csv")
val managerPairRDD = manager.map(x=> (x.split(", ")(0), x.split(", ")(1)))
Step 3: Load EmployeeName.csv file from hdfs and create PairRDDs
val name = sc.textFile("spark1/EmployeeName.csv")
val namePairRDD = name.map(x=> (x.split(", ")(0), x.split(‘")(1)))
Step 4: Load EmployeeSalary.csv file from hdfs and create PairRDDs
val salary = sc.textFile("spark1/EmployeeSalary.csv")
val salaryPairRDD = salary.map(x=> (x.split(", ")(0), x.split(", ")(1)))
Step 4: Join all pairRDDS
val joined = namePairRDD.join(salaryPairRDD}.join(managerPairRDD}
Step 5: Now sort the joined results, val joinedData = joined.sortByKey()
Step 6: Now generate comma separated data.
val finalData = joinedData.map(v=> (v._1, v._2._1._1, v._2._1._2, v._2._2))
Step 7: Save this output in hdfs as text file.
finalData.saveAsTextFile("spark1/result.txt")
Latest CCA175 Dumps Valid Version with 96 Q&As
Latest And Valid Q&A | Instant Download | Once Fail, Full Refund