Publisher: Sharif University of Technology
Journal: Scientia Iranica (ISSN 1026-3098), Vol. 30, No. 1, 2023-02-01, pp. 116-123
Title: Language recognition by convolutional neural networks
Article ID: 22870
DOI: 10.24200/sci.2022.59110.6064
Language: EN
Authors: L. Khosravani Pour (Department of Electrical Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran); A. Farrokhi (Department of Electrical Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran)
Type: Journal Article, 2021-09-26

Abstract: Speech recognition, in other words communication between computers and humans, is a subfield of computational linguistics or Natural Language Processing (NLP) with a long history. Automatic Speech Recognition (ASR), Text to Speech (TTS), Speech to Text (STT), Continuous Speech Recognition (CSR), and Interactive Voice Response (IVR) systems are different approaches to solving problems in this area. The hybrid deep neural network (DNN) - hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM) - HMM. This improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that prosodic features for the Persian language (Farsi) can be extracted by using CNNs to segment and label speech for short texts. Using 128 and 200 filters for the CNN and a special architecture, we reach an error of 19.46 in detection rate and lower time consumption than RNNs. Another advantage of using a CNN is a simpler learning procedure. Experimental results show that CNNs can be good feature extractors for speech recognition in Farsi or other languages.

https://scientiairanica.sharif.edu/article_22870_e5c98677da9e255feb77253c6e5c7355.pdf