SUST Repository

Automatic Recognition and Identification for Mixed Sudanese Arabic – English Languages Speech


dc.contributor.author Elfahal, Mohammed Osman Eltayeb
dc.contributor.author Supervisor, Mohammed Elhafiz Mustafa
dc.contributor.author Co-Supervisor, Rashid A. Saeed
dc.date.accessioned 2025-11-25T19:25:54Z
dc.date.available 2025-11-25T19:25:54Z
dc.date.issued 2019-08-25
dc.identifier.citation Elfahal, Mohammed Osman Eltayeb. Automatic Recognition and Identification for Mixed Sudanese Arabic – English Languages Speech / Mohammed Osman Eltayeb Elfahal; Mohammed Elhafiz Mustafa, Rashid A. Saeed.- Khartoum : Sudan University of Science and Technology, College of Computer Science and Information Technology, 2019.- 137p. : ill. ; 28cm.- PhD en_US
dc.identifier.uri https://repository.sustech.edu/handle/123456789/28398
dc.description.abstract Mixed speech is the phenomenon of using more than one language in a single sentence. It occurs in communication between bilinguals, who express their ideas and thoughts using the vocabulary of both languages, and even among non-bilingual people, for example to describe a product that originates from a second language. This thesis addresses the problem of mixed speech communication in multilingual communities, a regional problem that suffers from a shortage of resources and studies. Sudanese Arabic and English are the two languages selected for this research to build a generalized mixed speech recognition and language identification model: the first is the common and formal language of Sudan, and the second is an international language, the language of science, and a primary subject in the Sudanese education system. For experimental purposes, a mixed speech corpus was built from the most frequent daily-life mixed Sudanese Arabic and English sentences, collected through a campaign on social media applications. 75% of this collection was read by 87 bilingual native Arabic speakers in an office environment, resulting in 2289 audio files with their transcriptions for training; speakers, code-switch types, and environment were considered as factors affecting the model's performance at recording time. Based on the assumption that the native language dominates the others in mixed speech, the proposed solution for generalizing the recognition model is centered on Sudanese Arabic. The solution keeps the original words of each language participating in the switching in all components of the model, such as the mixed phonetic dictionary and the mixed-language lexicon. The exception is the Acoustic Model (AM), where Arabic is used in place of a word's original language, on the assumption that a native speaker does not suddenly reconfigure his articulatory organs to produce sounds as native speakers of the other language do.
The open-source CMU Sphinx toolkit is adapted for this mixed speech task. The proposed model, which accounts for the effect of native language dominance, outperforms existing single-pass and multi-pass models, achieving an overall accuracy of 33.05% in terms of Word Error Rate (WER). Mixed speech produces a hybrid language that does not belong to either participating language, so an interface for further linguistic computation is provided to deal with this new language. The interface contains each recognized word, its order in the sentence, its recognition confidence, and its language identity. Language identification in the model is a simple lookup of the word's identity in the mixed-language lexicon, which avoids the effects of unclear language-discrimination attributes in such speech. The results achieved demonstrate that the model can be generalized based on the Arabic language. To add a new language to the model, a module for phoneme clustering and comparison is needed as a front end to detect phonemes of the new language that are not included in the existing phoneme set. en_US
dc.description.sponsorship Sudan University of Science and Technology en_US
dc.language.iso en en_US
dc.publisher Sudan University of Science and Technology en_US
dc.subject Automatic Recognition en_US
dc.subject Mixed Sudanese Arabic en_US
dc.subject Identification en_US
dc.subject English Languages Speech en_US
dc.subject Computer Science en_US
dc.title Automatic Recognition and Identification for Mixed Sudanese Arabic – English Languages Speech en_US
dc.title.alternative التعرف الآلي وتحديد اللغات في الكلام المختلط العربي – الانجليزي السوداني en_US
dc.type Thesis en_US
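
The abstract describes an output interface that reports, for each recognized word, its order in the sentence, its recognition confidence, and a language identity looked up in the mixed-language lexicon. The following is only an illustrative sketch of that idea, not the thesis's implementation: the lexicon entries, the language tags, the `RecognizedWord` and `build_interface` names, and the `(word, confidence)` input shape are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical mixed-language lexicon mapping each word to a language tag.
# The entries and the "ar"/"en" tags are invented for illustration.
MIXED_LEXICON = {
    "meeting": "en",
    "report": "en",
    "عندنا": "ar",
    "الليلة": "ar",
}


@dataclass
class RecognizedWord:
    word: str          # recognized word
    position: int      # 1-based order in the sentence
    confidence: float  # recognition confidence reported by the decoder
    language: str      # language identity taken from the lexicon


def build_interface(hypothesis, lexicon=MIXED_LEXICON):
    """Turn a list of (word, confidence) pairs into interface records.

    Language identity is a plain dictionary lookup; words missing from
    the lexicon are tagged "unknown" (an assumption of this sketch).
    """
    records = []
    for position, (word, confidence) in enumerate(hypothesis, start=1):
        language = lexicon.get(word.lower(), "unknown")
        records.append(RecognizedWord(word, position, confidence, language))
    return records
```

Calling `build_interface([("عندنا", 0.91), ("meeting", 0.84)])` would yield one record per word, each carrying its position, confidence, and looked-up language tag. Because the identity comes from the lexicon rather than from acoustic cues, it does not depend on how clearly the two languages are separated in the audio.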

