SUST Repository

COMPARING THE PERFORMANCE OF APACHE SPARK AND APACHE HADOOP MAPREDUCE ON BIG DATA PROCESSING

dc.contributor.author SHUMO, ALAA ISMAIEL IBRAHIM
dc.contributor.author SALIH, ESRA ADIL GALAL
dc.contributor.author KHALED, SAJDA LOTFY AHMED
dc.contributor.author ALBASHEER, SARA HASSABO ABDALLAH
dc.contributor.author Supervisor, -AHMED HAMZA ABDL-MONIEM HAMZA
dc.date.accessioned 2018-08-02T08:55:06Z
dc.date.available 2018-08-02T08:55:06Z
dc.date.issued 2017-10-02
dc.identifier.citation SHUMO, ALAA ISMAIEL IBRAHIM. COMPARING THE PERFORMANCE OF APACHE SPARK AND APACHE HADOOP MAPREDUCE ON BIG DATA PROCESSING / ALAA ISMAIEL IBRAHIM SHUMO ... [et al.] ; AHMED HAMZA ABDL-MONIEM HAMZA .- Khartoum: Sudan University of Science & Technology, College of Computer Science, 2017.- 110 p. : ill. ; 28 cm.- search Bachelor en_US
dc.identifier.uri http://repository.sustech.edu/handle/123456789/21226
dc.description Search Bachelor en_US
dc.description.abstract The volume of data in the world is massive and grows every second. This data carries useful value that helps companies succeed and gain a competitive advantage, and it is called 'Big Data' because of its sheer Volume, Variety, Velocity and Veracity. It may be unstructured, structured or semi-structured. These large amounts of data created a need for new processing frameworks. Apache Hadoop MapReduce is a framework for processing large-scale datasets with parallel and distributed algorithms; it allows the distributed processing of large data sets across clusters of computers using simple programming models. More recently, a framework called Apache Spark has emerged, focused on micro-batch data processing, whose main feature is in-memory computation. In this research, we perform a comparative study of the performance of these two frameworks. We use the BigDataBench tool to load datasets of up to 420 million records. Experimental results show that Spark has better performance and lower overall runtimes than Apache Hadoop MapReduce. en_US
dc.description.sponsorship Sudan University of Science & Technology en_US
dc.language.iso en en_US
dc.publisher Sudan University of Science and Technology en_US
dc.subject Computer Science en_US
dc.subject APACHE SPARK en_US
dc.subject APACHE HADOOP MAPREDUCE en_US
dc.subject BIG DATA PROCESSING en_US
dc.title COMPARING THE PERFORMANCE OF APACHE SPARK AND APACHE HADOOP MAPREDUCE ON BIG DATA PROCESSING en_US
dc.type Thesis en_US

