Please use this identifier to cite or link to this item: https://repository.sustech.edu/handle/123456789/1506
Title: Detecting Similarity Among Multiple Data Sources for Categorized Data
Authors: Mohamed, Gamal Saad
Supervisor - Awad EL-Kareem Mohammed Yousof
Keywords: Data management
Issue Date: 1-Sep-2012
Publisher: Sudan University of Science and Technology
Citation: Mohamed, Gamal Saad. Detecting Similarity Among Multiple Data Sources / Gamal Saad Mohamed; Awad EL-Kareem Yousof. - Khartoum: Sudan University of Science & Technology, Computer Science, 2012. - 92 p.: ill.; 28 cm. - Ph.D.
Abstract: Efficient techniques for detecting similar data across many data sources have become one of the most important and challenging issues in areas such as databases, bioinformatics, and data mining. In this research, a three-phase framework for similarity detection is proposed. In the first phase, data sources are collected from the web according to how they relate to a predetermined domain. The base source is the available data source that describes the domain. In the second phase, the collected sources are filtered to select those with a greater probability of containing data that describes the domain, by examining the degree of similarity between the base source and each collected source (the "External Sources"). Only the external sources whose simi_degree value is less than or equal to the average of the simi_degree values of all sources are selected. In the third phase, content similarity is examined between the base source and all the external sources selected in the second phase, using the proposed "Probability Measure", which gives a value that determines whether the content of an external source is similar to the content of the base source. Experimental results show that the proposed similarity framework can achieve better-quality results than conventional approaches.
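The second-phase selection rule described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how simi_degree is computed, so the function name `select_external_sources`, the source names, and the numeric values below are hypothetical and only show the "less than or equal to the average" filter, not the thesis's actual implementation.

```python
from statistics import mean


def select_external_sources(simi_degree):
    """Return the names of sources whose simi_degree is <= the average.

    `simi_degree` maps a source name to its simi_degree value relative to
    the base source. How that value is computed is not given in the
    abstract, so it is taken here as an input (assumption).
    """
    if not simi_degree:
        return []
    threshold = mean(simi_degree.values())
    # Keep only sources at or below the average, as the abstract states.
    return [name for name, value in simi_degree.items() if value <= threshold]


# Hypothetical simi_degree values for three external sources.
scores = {"source_a": 0.25, "source_b": 0.5, "source_c": 0.75}
selected = select_external_sources(scores)
print(selected)
```

With these illustrative values the average is 0.5, so `source_a` and `source_b` pass the filter and `source_c` is discarded before the third-phase content comparison.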
Description: Thesis
URI: http://repository.sustech.edu/handle/123456789/1506
Appears in Collections: PhD theses : Computer Science and Information Technology

Files in This Item:
File | Description | Size | Format | Access
Detecting Similarity among ... .pdf | Title | 43.8 kB | Adobe PDF | Open
Abstract.pdf | Abstract | 74.02 kB | Adobe PDF | Open
chapter 1.pdf | chapter | 32 kB | Adobe PDF | Restricted Access
chapter 2.pdf | chapter | 103.9 kB | Adobe PDF | Restricted Access
chapter 3.pdf | chapter | 47.32 kB | Adobe PDF | Restricted Access
chapter 4.pdf | chapter | 100 kB | Adobe PDF | Restricted Access
chapter5.pdf | chapter | 213.15 kB | Adobe PDF | Restricted Access
chapter 6.pdf | chapter | 16.96 kB | Adobe PDF | Restricted Access
appendix.pdf | appendix | 61.35 kB | Adobe PDF | Open
refrence.pdf | refrence | 15.81 kB | Adobe PDF | Open


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.