Term Translation disambiguation in Cross-Language Information Retrieval

Mohammed, Ebtihal Mustafa Elamin; Supervisor: Ali Ahmed Al-faki

dc.contributor.author	Mohammed, Ebtihal Mustafa Elamin
dc.contributor.author	Al-faki, Ali Ahmed (Supervisor)
dc.date.accessioned	2016-02-21T10:18:09Z
dc.date.available	2016-02-21T10:18:09Z
dc.date.issued	2015-11-01
dc.identifier.citation	Mohammed, Ebtihal Mustafa Elamin. Term Translation disambiguation in Cross-Language Information Retrieval: Translation From Arabic To English / Ebtihal Mustafa Elamin Mohammed; Ali Ahmed Al-faki. Khartoum: Sudan University of Science and Technology, College of Computer Science and Information Technology, 2015. 53 p.; 28 cm. M.Sc.	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/12809
dc.description	Thesis	en_US
dc.description.abstract	Cross-language information retrieval (CLIR), where queries and documents are in different languages, has become one of the major topics within the information retrieval community. The most important step in CLIR is translation. This research proposes a term translation disambiguation method based on co-occurrence statistics for query translation in Arabic-English CLIR. There are multiple ways to perform query translation: employing machine translation techniques, using parallel corpora, or using bilingual dictionaries. The first two approaches are very labour-intensive. Manual hand-coding of linguistic, semantic and pragmatic knowledge is required for a machine translation engine to produce good translations, which can be overwhelming when the domain of coverage is wide. A great deal of work is also required to build parallel collections for the second approach. With the increasing availability of machine-readable bilingual dictionaries, the third approach has become viable for CLIR, but resolving term ambiguity is a crucial step in it. In this research the ambiguity problem is resolved using co-occurrence statistics. The co-occurrence technique is based on the hypothesis that correct translations tend to co-occur in the target-language collection. Therefore, the valid translation among the set of possible candidate translations of a given source query term is expected to co-occur frequently with the translations of the other terms in the same source query. The document set is first divided into fixed-size windows to overcome the problem of varying document length. The degree of association is then calculated using the mutual information measure, because it is simple and yields a high association between terms even when they do not appear very frequently in the document set.
The results of the developed method show that co-occurrence statistics can reduce the ambiguity problem and that the method works well in cases involving diacritics and homonyms.	en_US
dc.description.sponsorship	Sudan University of Science and Technology	en_US
dc.language.iso	en_US	en_US
dc.publisher	Sudan University of Science and Technology	en_US
dc.subject	Term Translation disambiguation	en_US
dc.subject	Cross-Language Information Retrieval	en_US
dc.subject	Translation From Arabic To English	en_US
dc.title	Term Translation disambiguation in Cross-Language Information Retrieval	en_US
dc.title.alternative	إزالة غموض الترجمة بإستخدام معامل الإرتباط في انظمة استرجاع المعلومات بين اللغات (Removing translation ambiguity using the correlation coefficient in cross-language information retrieval systems)	en_US
dc.type	Thesis	en_US
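
The disambiguation idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the window size, the toy English corpus, and the candidate lists are all hypothetical, and the score used here is pointwise mutual information over window counts, with pairs that never co-occur scored as 0.0 by assumption. Candidate combinations are enumerated exhaustively, which is only practical for short queries.

```python
import math
from collections import Counter
from itertools import combinations, product


def windows(tokens, size):
    """Split a token list into consecutive fixed-size windows."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def count_windows(docs, size):
    """Count the windows containing each term and each unordered term pair."""
    term_counts, pair_counts, total = Counter(), Counter(), 0
    for doc in docs:
        for win in windows(doc, size):
            terms = sorted(set(win))
            total += 1
            term_counts.update(terms)
            pair_counts.update(combinations(terms, 2))
    return term_counts, pair_counts, total


def mutual_information(a, b, term_counts, pair_counts, total):
    """Pointwise mutual information of two terms over window counts.

    Assumption: pairs that never share a window score 0.0.
    """
    pair = pair_counts[tuple(sorted((a, b)))]
    if pair == 0:
        return 0.0
    return math.log2(pair * total / (term_counts[a] * term_counts[b]))


def disambiguate(candidates, term_counts, pair_counts, total):
    """Pick one translation per query term, maximising summed pairwise MI."""
    def score(combo):
        return sum(
            mutual_information(a, b, term_counts, pair_counts, total)
            for a, b in combinations(combo, 2)
        )
    return max(product(*candidates), key=score)


# Hypothetical target-language (English) collection, already tokenised.
docs = [
    "computer science studies algorithms and computer science theory".split(),
    "the flag was raised over the city hall".split(),
    "a computer program helps science students".split(),
]
term_counts, pair_counts, total = count_windows(docs, size=4)

# Hypothetical dictionary output: one candidate list per source query term.
candidates = [["science", "flag"], ["computer"]]
best = disambiguate(candidates, term_counts, pair_counts, total)
print(best)
```

In this toy collection "science" shares windows with "computer" while "flag" never does, so the sketch selects "science" as the translation of the ambiguous first term.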