TY - GEN
T1 - Performance Analysis of Multi-Node Hadoop Cluster Based on Large Data Sets
AU - Ahmed, N.
AU - Barczak, Andre L.C.
AU - Bazai, Sibghat Ullah
AU - Susnjak, Teo
AU - Rashid, Mohammed A.
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/16
Y1 - 2020/12/16
N2 - The purpose of this paper is to assess the performance of a Hadoop cluster in MapReduce and Spark using two different clusters, one with 5 slave nodes and another with 9 slave nodes. For the experiment, the HiBenchmark workloads WordCount and TeraSort are used with data sizes varying from 50 GB to 600 GB. We chose a few parameters and replaced their default values with tuned values, allowing us to analyze the effect of these changes on each job's runtime. The results show that for both the WordCount and TeraSort workloads, depending on the tuned parameters, MapReduce and Spark achieved performance improvements of 64% and around 60%, respectively, at each data point. We also observed a speed-up of 1% when using the extra slave nodes. These results show that cluster performance can be improved by changing the default values of a few parameters and adding slave nodes.
AB - The purpose of this paper is to assess the performance of a Hadoop cluster in MapReduce and Spark using two different clusters, one with 5 slave nodes and another with 9 slave nodes. For the experiment, the HiBenchmark workloads WordCount and TeraSort are used with data sizes varying from 50 GB to 600 GB. We chose a few parameters and replaced their default values with tuned values, allowing us to analyze the effect of these changes on each job's runtime. The results show that for both the WordCount and TeraSort workloads, depending on the tuned parameters, MapReduce and Spark achieved performance improvements of 64% and around 60%, respectively, at each data point. We also observed a speed-up of 1% when using the extra slave nodes. These results show that cluster performance can be improved by changing the default values of a few parameters and adding slave nodes.
UR - http://www.scopus.com/inward/record.url?scp=85105467451&partnerID=8YFLogxK
U2 - 10.1109/CSDE50874.2020.9411587
DO - 10.1109/CSDE50874.2020.9411587
M3 - Conference contribution
AN - SCOPUS:85105467451
T3 - 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2020
BT - 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2020
PB - IEEE, Institute of Electrical and Electronics Engineers
T2 - 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2020
Y2 - 16 December 2020 through 18 December 2020
ER -