Abstract:
Speech recognition now plays a major role in designing natural voice interfaces for communication between humans and the digital devices of modern life. It also offers an easy way to cross the language barrier between monolingual individuals. However, an obvious problem in this field is the lack of broad support for many widely spoken languages and their dialects, even though most daily interaction takes place in them.
This research investigates the viability of designing an Automatic Speech Recognition (ASR) model for the Sudanese dialect. The researcher focused on building a dataset by collecting representative resources and pre-processing them. The ASR model was trained to recognize each character of the Sudanese dialect, and its architecture followed the end-to-end speech recognition approach. Each building block of the model was formed using Convolutional Neural Networks (CNNs) rather than Recurrent Neural Networks (RNNs), the usual choice for speech-related tasks, and training used the Connectionist Temporal Classification (CTC) loss.
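Because a character-level CTC model emits one label distribution per audio frame (including a special "blank" label), its frame-wise outputs must be collapsed into a character sequence. The following is a minimal sketch of CTC best-path (greedy) decoding, not the author's implementation; it assumes the model's per-frame argmax indices are already available, that index 0 is the blank, and it uses a hypothetical toy alphabet in place of the real Sudanese character set.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_ids, blank=BLANK):
    """CTC best-path rule: merge consecutive repeats, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank]


# Hypothetical character table (transliterated); index 0 is reserved for blank.
ALPHABET = ["<blank>", "s", "a", "l", "m"]


def ids_to_text(ids, alphabet=ALPHABET):
    """Map decoded label indices back to characters."""
    return "".join(alphabet[i] for i in ids)


# Per-frame argmax indices as a model might emit them (illustrative only).
frames = [1, 1, 0, 2, 3, 3, 0, 2, 4, 4]
print(ids_to_text(ctc_greedy_decode(frames)))  # salam
```

The blank is what lets CTC output a genuinely doubled character: frames `[3, 3, 0, 3]` decode to two `l` labels, while `[3, 3, 3]` decode to one.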
In this research, a Sudanese dialect dataset was built to overcome the lack of annotated data, and the model reached an average label error rate of 73.67%. The proposed model and the collected dataset can support future Natural Language Processing research targeting the Sudanese dialect. The performance of the designed model provides some insights into the current recognition task. The label error rate could be improved substantially by adding enhancements such as a language model. Applications of this research range from building text archives of Sudanese spoken content to developing real-time speech recognizers.
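The label error rate reported above is commonly defined as the edit (Levenshtein) distance between the predicted and reference label sequences, divided by the reference length. The sketch below assumes that definition and assumes the "average" is an unweighted mean over utterances; the abstract does not state exactly how the averaging was done, so this is illustrative rather than a reproduction of the reported figure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference label
                curr[j - 1] + 1,         # insertion of an extra label
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def label_error_rate(ref, hyp):
    """Edit distance normalised by reference length (ref must be non-empty)."""
    return edit_distance(ref, hyp) / len(ref)


def mean_label_error_rate(pairs):
    """Assumed averaging: unweighted mean of per-utterance rates."""
    return sum(label_error_rate(r, h) for r, h in pairs) / len(pairs)


print(label_error_rate("salam", "salm"))  # 0.2
print(mean_label_error_rate([("salam", "salm"), ("abc", "abc")]))  # 0.1
```

Because insertions are counted, this rate can exceed 100%, which is one reason a language model that suppresses implausible character sequences tends to lower it.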