SUST Repository

Term Translation disambiguation in Cross-Language Information Retrieval

dc.contributor.author Mohammed, Ebtihal Mustafa Elamin
dc.contributor.author Al-faki, Ali Ahmed (Supervisor)
dc.date.accessioned 2016-02-21T10:18:09Z
dc.date.available 2016-02-21T10:18:09Z
dc.date.issued 2015-11-01
dc.identifier.citation Mohammed, Ebtihal Mustafa Elamin. Term Translation disambiguation in Cross-Language Information Retrieval: Translation From Arabic To English / Ebtihal Mustafa Elamin Mohammed; Ali Ahmed Al-faki. - Khartoum: Sudan University of Science and Technology, College of Computer Science and Information Technology, 2015. - 53 p.; 28 cm. - M.Sc. en_US
dc.identifier.uri http://repository.sustech.edu/handle/123456789/12809
dc.description Thesis en_US
dc.description.abstract Cross-language information retrieval (CLIR), where queries and documents are in different languages, has become one of the major topics within the information retrieval community. An important step in CLIR is translation. This research proposes a term translation disambiguation method based on co-occurrence statistics for query translation in Arabic-English CLIR. There are multiple ways to perform query translation: employing machine translation techniques, using parallel corpora, or using bilingual dictionaries. The first two approaches are very labour intensive. A machine translation engine requires manual hand-coding of linguistic, semantic and pragmatic knowledge to produce good translations, which can be overwhelming when the domain of coverage is wide. The second approach also requires a great deal of work to build parallel collections. With the increasing availability of machine-readable bilingual dictionaries, the third approach has become viable for CLIR, but with it, resolving term ambiguity is a crucial step. In this research the ambiguity problem is resolved using co-occurrence statistics. The co-occurrence technique is based on the hypothesis that correct translations tend to co-occur in the target-language collection. Therefore, the valid translation among a set of possible synonymous candidates for a given source query term is expected to co-occur frequently with the translations of the other terms in the same source query. The document set is first divided into fixed-size windows to overcome the problem of varying document length. The degree of association is then calculated using the mutual information measure, because it is simple and produces a high correlation between terms even when they do not appear very frequently in the document set.
The results of the developed method showed that co-occurrence statistics can reduce the ambiguity problem and that the method works well in cases involving diacritics and homonyms. en_US
dc.description.sponsorship Sudan University of Science and Technology en_US
dc.language.iso en_US en_US
dc.publisher Sudan University of Science and Technology en_US
dc.subject Term Translation disambiguation en_US
dc.subject Cross-Language Information Retrieval en_US
dc.subject Translation From Arabic To English en_US
dc.title Term Translation disambiguation in Cross-Language Information Retrieval en_US
dc.title.alternative إزالة غموض الترجمة بإستخدام معامل الإرتباط في انظمة استرجاع المعلومات بين اللغات [Translation disambiguation using the correlation coefficient in cross-language information retrieval systems] en_US
dc.type Thesis en_US
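
The disambiguation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy English collection, the window size of 5, and the use of pointwise mutual information over window counts are all assumptions made for illustration. Candidate pairs that never co-occur are scored 0.0, which is a simplification.

```python
import math
from collections import Counter
from itertools import combinations, product

def split_into_windows(tokens, size):
    """Cut a token list into consecutive fixed-size windows."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def count_windows(documents, size):
    """Count the windows containing each term and each pair of terms."""
    term_counts, pair_counts, total = Counter(), Counter(), 0
    for doc in documents:
        for window in split_into_windows(doc.lower().split(), size):
            terms = sorted(set(window))
            total += 1
            term_counts.update(terms)
            pair_counts.update(combinations(terms, 2))
    return term_counts, pair_counts, total

def mutual_information(a, b, term_counts, pair_counts, total):
    """Pointwise mutual information (assumed measure); 0.0 if no co-occurrence."""
    joint = pair_counts[tuple(sorted((a, b)))]
    if joint == 0:
        return 0.0
    return math.log2(joint * total / (term_counts[a] * term_counts[b]))

def disambiguate(candidates, stats):
    """Pick one translation per source term, maximising the summed pairwise MI."""
    best, best_score = None, -math.inf
    for combo in product(*candidates):
        score = sum(mutual_information(a, b, *stats)
                    for a, b in combinations(combo, 2))
        if score > best_score:
            best, best_score = combo, score
    return list(best)

# Hypothetical target-language collection and dictionary candidates.
docs = [
    "the bank lends money to customers the bank holds money",
    "the river shore is quiet and the shore has sand",
    "silver coins and money in the bank",
]
stats = count_windows(docs, size=5)
chosen = disambiguate([["bank", "shore"], ["money", "silver"]], stats)
print(chosen)
```

Each inner list holds the dictionary translations of one source query term. The exhaustive search over combinations is only practical for short queries, and a real system would need tokenisation and normalisation suited to its collection.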

