SUST Repository

Building the Multilingual Hadith Corpus to Enhance Performance of Information Retrieval System for Hadith


dc.contributor.author Hassan, Samah Mohamed Osman
dc.contributor.author Supervised, -Eric Atwell
dc.date.accessioned 2017-10-23T07:16:26Z
dc.date.available 2017-10-23T07:16:26Z
dc.date.issued 2017-08-12
dc.identifier.citation Hassan, Samah Mohamed Osman. Building the Multilingual Hadith Corpus to Enhance Performance of Information Retrieval System for Hadith / Samah Mohamed Osman Hassan; Eric Atwell. - Khartoum: Sudan University of Science and Technology, College of Computer Science and Information Technology, 2017. - 139 p.: ill.; 28 cm. - PhD en_US
dc.identifier.uri http://repository.sustech.edu/handle/123456789/18855
dc.description Thesis en_US
dc.description.abstract Information retrieval (IR) systems retrieve information relevant to a user's query, which requires extracting related information from unstructured data such as text, sound, or images. An important problem in information retrieval, particularly from text files, is the reliance on exact matching between the words in the query and the same words in a text file. In many cases this loses useful results, because files that contain synonyms of the query words are not returned. The problem appears in most information retrieval systems for unstructured text and in most languages, especially Arabic. This research addresses information retrieval from the Hadith across several languages by building a parallel corpus that contains the Hadith in Arabic together with translations in English, French, and Russian. The corpus contains the text of 2030 Arabic Hadith along with their English, French, and Russian translations, for a total of 8120 Hadith texts consisting of 2,470,913 words. Our matching algorithm is applied to the data for the retrieval process: it calculates the weight of each word in the query based on its importance and compares this with the existing documents, which have been processed to calculate the importance of the words in each document. A similarity coefficient is then calculated between the query and the existing documents. To improve performance, the system keeps a dictionary of words that identifies all files containing each word, that is, an inverted index. We built a web portal that allows users to search via the World Wide Web. We designed and evaluated the proposed solution using a selection of important concepts, for which we determined the expected results manually in advance, without referring to the system.
The evaluation calculates the average precision and average recall for each language. The results showed that the proposed method retrieves well in all four languages: average precision and average recall were, respectively, 96.5% and 82% for Arabic, 98.4% and 90% for English, 97.5% and 91.7% for French, and 98% and 91% for Russian. en_US
dc.description.sponsorship Sudan University of Science and Technology en_US
dc.language.iso en en_US
dc.publisher Sudan University of Science and Technology en_US
dc.subject Building the Multilingual en_US
dc.subject Hadith Corpus to Enhance en_US
dc.subject Retrieval System for Hadith en_US
dc.title Building the Multilingual Hadith Corpus to Enhance Performance of Information Retrieval System for Hadith en_US
dc.title.alternative بناء مجاميع متعددة اللغات للحديث بغرض تحسين كفاءة نظام استرجاع الاحاديث النبوية en_US
dc.type Thesis en_US
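
The retrieval process summarized in the abstract (word weighting, a similarity coefficient, an inverted index, and precision/recall evaluation) can be sketched as below. The record does not name the exact weighting scheme or similarity measure, so this sketch assumes TF-IDF weights and cosine similarity, which may differ from the thesis. All function names, the whitespace tokenizer, and the three toy documents are hypothetical placeholders and are not taken from the corpus.

```python
import math
from collections import Counter, defaultdict

# Toy placeholder documents (invented for illustration, not corpus text).
DOCS = {
    "d1": "charity is rewarded",
    "d2": "fasting and charity in ramadan",
    "d3": "prayer at night",
}


def tokenize(text):
    """Lowercase and split on whitespace (assumption: no stemming or
    language-specific normalization, which a real system would need)."""
    return text.lower().split()


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def idf(index, n_docs, term):
    """Smoothed inverse document frequency; rarer terms weigh more."""
    df = len(index.get(term, {}))
    return math.log(1 + n_docs / df) if df else 0.0


def search(query, docs, index):
    """Rank documents by cosine similarity of TF-IDF vectors (assumed measure)."""
    n_docs = len(docs)
    q_weights = {t: tf * idf(index, n_docs, t)
                 for t, tf in Counter(tokenize(query)).items()}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))

    # Squared length of every document vector, computed from the index.
    doc_norm_sq = defaultdict(float)
    for term, postings in index.items():
        w = idf(index, n_docs, term)
        for doc_id, tf in postings.items():
            doc_norm_sq[doc_id] += (tf * w) ** 2

    # Only documents listed in the index under a query term are scored.
    scores = defaultdict(float)
    for term, qw in q_weights.items():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += qw * tf * idf(index, n_docs, term)

    ranked = [(d, s / (q_norm * math.sqrt(doc_norm_sq[d])))
              for d, s in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def precision_recall(retrieved, relevant):
    """Precision and recall of a result list against a manual judgment set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


INDEX = build_index(DOCS)
RESULTS = search("charity", DOCS, INDEX)
```

The inverted index means a query only touches documents that share at least one term with it, rather than scanning every file. Averaging the values returned by `precision_recall` over a set of manually judged queries would give per-language averages of the kind reported in the abstract.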

