Department of Biomedical Engineering, Amirkabir University of Technology
One of the most important challenges in automatic speech recognition is mismatch between training and test data. Conventional methods for improving recognition robustness seek to eliminate or reduce this mismatch, for example by enhancing the speech or adapting the statistical models. Training the model under different conditions is another example of these methods. The success of these techniques has been moderate compared to human performance. In this paper, inspired by human listeners, we develop and implement a new bidirectional neural network. Using bidirectional connections, this network is capable of modeling the phoneme sequence in an isolated word recognition task. It can correct the phoneme sequence obtained from the acoustic model toward the sequences learned in the training phase. By cascading the lexical and acoustic models, the acoustic feature vectors are enhanced using neural network inversion techniques. Speech enhancement by this method has a remarkable effect on eliminating the mismatch between training and test data. The effectiveness of the lexical model and speech enhancement is demonstrated by a 17.3 percent increase in the phoneme recognition correction ratio.