<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">
<ArticleSet>
<Article>
<Journal>
				<PublisherName>Sharif University of Technology</PublisherName>
				<JournalTitle>Scientia Iranica</JournalTitle>
				<Issn>1026-3098</Issn>
				<Volume>30</Volume>
				<Issue>1</Issue>
				<PubDate PubStatus="epublish">
					<Year>2023</Year>
					<Month>02</Month>
					<Day>01</Day>
				</PubDate>
			</Journal>
<ArticleTitle>Language recognition by convolutional neural networks</ArticleTitle>
<VernacularTitle></VernacularTitle>
			<FirstPage>116</FirstPage>
			<LastPage>123</LastPage>
			<ELocationID EIdType="pii">22870</ELocationID>
			
<ELocationID EIdType="doi">10.24200/sci.2022.59110.6064</ELocationID>
			
			<Language>EN</Language>
<AuthorList>
<Author>
					<FirstName>L.</FirstName>
					<LastName>Khosravani Pour</LastName>
<Affiliation>Department of Electrical Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran</Affiliation>

</Author>
<Author>
					<FirstName>A.</FirstName>
					<LastName>Farrokhi</LastName>
<Affiliation>Department of Electrical Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran</Affiliation>

</Author>
</AuthorList>
				<PublicationType>Journal Article</PublicationType>
			<History>
				<PubDate PubStatus="received">
					<Year>2021</Year>
					<Month>09</Month>
					<Day>26</Day>
				</PubDate>
			</History>
		<Abstract>Speech recognition, in other words communication between computers and humans, is a subfield of computational linguistics or Natural Language Processing (NLP) with a long history. Automatic Speech Recognition (ASR), Text to Speech (TTS), Speech to Text (STT), Continuous Speech Recognition (CSR), and Interactive Voice Response (IVR) systems are different approaches to solving problems in this area. Hybrid deep neural network (DNN) - hidden Markov model (HMM) systems have been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM) - HMM systems. The performance improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that prosodic features for the Persian language (Farsi) can be extracted by using convolutional neural networks (CNNs) to segment and label speech for short texts. Using 128 and 200 filters in the CNN and a special architecture, we reach an error of 19.46 in detection rate and lower time consumption than recurrent neural networks (RNNs). Another advantage of using CNNs is a simpler learning procedure. Experimental results show that CNNs can be good feature extractors for speech recognition in Farsi or other languages.</Abstract>
		<ObjectList>
			<Object Type="keyword">
			<Param Name="value">Speech Segmentation</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Convolutional Neural Networks</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Persian Language CSR</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Deep Neural Network</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Gaussian Mixture Model</Param>
			</Object>
		</ObjectList>
<ArchiveCopySource DocType="pdf">https://scientiairanica.sharif.edu/article_22870_e5c98677da9e255feb77253c6e5c7355.pdf</ArchiveCopySource>
</Article>
</ArticleSet>
