Language recognition by convolutional neural networks

Document Type: Research Note


Department of Electrical Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran


Speech recognition, or more broadly spoken communication between humans and computers, has a long history as a subfield of computational linguistics and Natural Language Processing (NLP). Automatic Speech Recognition (ASR), Text to Speech (TTS), Speech to Text (STT), Continuous Speech Recognition (CSR), and Interactive Voice Response (IVR) systems are different approaches to problems in this area. Hybrid deep neural network (DNN)-hidden Markov model (HMM) systems have been shown to significantly improve speech recognition performance over conventional Gaussian mixture model (GMM)-HMM systems. The improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that prosodic features for the Persian language (Farsi) can be extracted by using convolutional neural networks (CNNs) to segment and label speech for short texts. Using CNNs with 128 and 200 filters and a special architecture, we reach an error of 19.46 in detection rate, along with lower time consumption than recurrent neural networks (RNNs). Another advantage of using CNNs is a simpler learning procedure. Experimental results show that CNNs can be a good feature extractor for speech recognition in Farsi and other languages.
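To make the idea of a CNN as a speech feature extractor concrete, the following is a minimal sketch rather than the architecture used in the paper. The abstract does not say how the 128 and 200 filters are arranged, so the sketch assumes two stacked 1-D convolution layers along the time axis; the kernel widths, the 13-dimensional input frames, the random weights, and the function names `conv1d_relu` and `max_pool_over_time` are all illustrative assumptions.

```python
import numpy as np

def conv1d_relu(frames, kernels, bias):
    """Valid 1-D convolution along time, followed by ReLU.

    frames:  (T, D) array, T time frames of D-dimensional features
    kernels: (F, K, D) array, F filters each spanning K consecutive frames
    bias:    (F,) array
    returns: (T - K + 1, F) feature map
    """
    T, D = frames.shape
    F, K, _ = kernels.shape
    # Stack every window of K consecutive frames: shape (T - K + 1, K, D)
    windows = np.stack([frames[t:t + K] for t in range(T - K + 1)])
    # Dot each window with each filter, add the bias, then apply ReLU
    out = np.einsum("tkd,fkd->tf", windows, kernels) + bias
    return np.maximum(out, 0.0)

def max_pool_over_time(feature_map):
    """Collapse the time axis, keeping the strongest response per filter."""
    return feature_map.max(axis=0)

rng = np.random.default_rng(0)
# Hypothetical input: 100 frames of 13 acoustic features each
frames = rng.standard_normal((100, 13))
# Assumed layout: 128 filters of width 5, then 200 filters of width 3
k1 = rng.standard_normal((128, 5, 13)) * 0.1
k2 = rng.standard_normal((200, 3, 128)) * 0.1

h1 = conv1d_relu(frames, k1, np.zeros(128))   # shape (96, 128)
h2 = conv1d_relu(h1, k2, np.zeros(200))       # shape (94, 200)
segment_vector = max_pool_over_time(h2)       # shape (200,)
```

In a trained system the filter weights would be learned from labeled speech, and the pooled vector would feed a classifier that assigns segment labels. Unlike a recurrent network, every window here is processed independently, which is one plausible reason for the lower time consumption the abstract reports.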

