COMPARING THE PERFORMANCE OF APACHE SPARK AND APACHE HADOOP MAPREDUCE ON BIG DATA PROCESSING

SHUMO, ALAA ISMAIEL IBRAHIM; SALIH, ESRA ADIL GALAL; KHALED, SAJDA LOTFY AHMED; ALBASHEER, SARA HASSABO ABDALLAH; Supervisor: AHMED HAMZA ABDL-MONIEM HAMZA

Please use this identifier to cite or link to this item: https://repository.sustech.edu/handle/123456789/21226

Full metadata record

DC Field	Value	Language
dc.contributor.author	SHUMO, ALAA ISMAIEL IBRAHIM
dc.contributor.author	SALIH, ESRA ADIL GALAL .
dc.contributor.author	KHALED, SAJDA LOTFY AHMED
dc.contributor.author	ALBASHEER, SARA HASSABO ABDALLAH
dc.contributor.author	Supervisor, -AHMED HAMZA ABDL-MONIEM HAMZA
dc.date.accessioned	2018-08-02T08:55:06Z
dc.date.available	2018-08-02T08:55:06Z
dc.date.issued	2017-10-02
dc.identifier.citation	SHUMO, ALAA ISMAIEL IBRAHIM . COMPARING THE PERFORMANCE OF APACHE SPARK AND APACHE HADOOP MAPREDUCE ON BIG DATA PROCESSING \ ALAA ISMAIEL IBRAHIM SHUMO ... .{etal} ; AHMED HAMZA ABDL-MONIEM HAMZA .- khartoum:Sudan University of Science & Technology,College Of Computer Science,2017.-110p.:ill.;28cm.-search Bachelor	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/21226
dc.description	Search Bachelor	en_US
dc.description.abstract	The volume of data in the world is massive and grows every second. This data carries valuable information that helps companies succeed and gain a competitive advantage. It is called 'Big Data' because of its Volume, Variety, Velocity and Veracity, and it may be structured, semi-structured or unstructured. Such large amounts of data created a need for new processing frameworks. Apache Hadoop MapReduce is a framework for processing large-scale datasets with parallel and distributed algorithms; it allows the distributed processing of large data sets across clusters of computers using simple programming models. More recently, a framework called Apache Spark has emerged, focused on micro-batch data processing, whose main feature is in-memory computation. In this research, we perform a comparative study of the performance of these two frameworks. We use the BigDataBench tool to load datasets of up to 420 million records. Experimental results show that Spark has better performance and lower overall runtimes than Apache Hadoop MapReduce.	en_US
dc.description.sponsorship	Sudan University of Science & Technology	en_US
dc.language.iso	en	en_US
dc.publisher	Sudan University of Science and Technology	en_US
dc.subject	Computer Science	en_US
dc.subject	APACHE SPARK	en_US
dc.subject	APACHE HADOOP MAPREDUCE	en_US
dc.subject	BIG DATA PROCESSING	en_US
dc.title	COMPARING THE PERFORMANCE OF APACHE SPARK AND APACHE HADOOP MAPREDUCE ON BIG DATA PROCESSING	en_US
dc.type	Thesis	en_US
Appears in Collections:	Bachelor of Computer Science and Information Technology

Files in This Item:

File	Description	Size	Format
COMPARING THE PERFORMANCE .....pdf	Research	4.16 MB	Adobe PDF
