Syllable Duration Prediction for Farsi Text-to-Speech Systems


Department of Electrical Engineering, Sharif University of Technology


This paper applies two statistical approaches to syllable duration prediction for Farsi: neural networks (NN) and classification and regression trees (CART). In the first step, a database was created and a flexible feature extraction and selection module was developed. In the second step, the output of the feature selection module was used to train both models, and the trained models were analyzed to determine the parameters that most strongly affect syllable duration in Farsi. Model accuracy was evaluated using separate training and test data. In the third step, an automatic rule generator module was added to the CART model. The resulting duration prediction rules can be easily applied in a rule-based speech synthesis system.
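To make the CART-plus-rule-generation idea concrete, the following is a minimal sketch of a regression tree that is fitted to syllable durations and then flattened into IF/THEN rules. It is not the implementation used in this work: the feature names (`num_phonemes`, `is_stressed`, `is_phrase_final`), the toy durations, and the depth limit are all assumptions chosen only for illustration, and the tree is a simplified variance-reduction splitter rather than a full CART with pruning.

```python
from statistics import mean

# Hypothetical feature names and toy data (durations in ms); illustrative only,
# not the feature set or corpus described in the paper.
FEATURES = ["num_phonemes", "is_stressed", "is_phrase_final"]
DATA = [
    ((2, 0, 0), 120.0), ((2, 1, 0), 150.0),
    ((3, 0, 0), 170.0), ((3, 1, 0), 200.0),
    ((2, 0, 1), 180.0), ((2, 1, 1), 210.0),
    ((3, 0, 1), 230.0), ((3, 1, 1), 260.0),
]


def sse(rows):
    """Sum of squared errors of durations around their mean."""
    if not rows:
        return 0.0
    m = mean(y for _, y in rows)
    return sum((y - m) ** 2 for _, y in rows)


def best_split(rows):
    """Return (cost, feature, threshold, left, right) minimizing total SSE, or None."""
    best = None
    for f in range(len(FEATURES)):
        for t in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not left or not right:
                continue  # a split must separate the data
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    return best


def build_tree(rows, depth=0, max_depth=2):
    """Grow a regression tree; leaves store the mean duration of their rows."""
    split = best_split(rows) if depth < max_depth else None
    if split is None:
        return {"leaf": mean(y for _, y in rows)}
    _, f, t, left, right = split
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(left, depth + 1, max_depth),
        "right": build_tree(right, depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the splits down to a leaf and return its mean duration."""
    while "leaf" not in tree:
        go_left = x[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into one human-readable IF/THEN rule."""
    if "leaf" in tree:
        cond = " AND ".join(conditions) or "TRUE"
        return [f"IF {cond} THEN duration = {tree['leaf']:.1f} ms"]
    name, t = FEATURES[tree["feature"]], tree["threshold"]
    return (extract_rules(tree["left"], conditions + (f"{name} <= {t}",))
            + extract_rules(tree["right"], conditions + (f"{name} > {t}",)))


tree = build_tree(DATA)
rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

Each printed rule corresponds to one leaf of the tree, which is what makes a tree-based model convenient as a source of rules for a rule-based synthesizer. In a realistic setting the tree would be fitted on the training portion of the database only and its predictions compared against held-out test syllables.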