Robust Automatic Recognition of Speech with Background Music

Málek Jiří

dc.contributor.author	Málek Jiří	cs
dc.contributor.author	Žďánský Jindřich	cs
dc.contributor.author	Červa Petr	cs
dc.date.accessioned	2018-09-25T12:15:05Z
dc.date.available	2018-09-25T12:15:05Z
dc.date.issued	2017-01-01	cs
dc.description.abstract	This paper addresses Automatic Speech Recognition (ASR) with music in the background, a condition in which recognition accuracy may deteriorate significantly. To improve the robustness of ASR in this task, e.g. for broadcast news transcription or subtitle creation, we adopt two approaches: 1) multi-condition training of the acoustic models and 2) denoising autoencoders followed by acoustic model training on the preprocessed data. In the latter case, two types of autoencoders are considered: a fully connected network and a convolutional network. The presented experimental results show that all the investigated techniques significantly improve the recognition of speech distorted by music. For example, for artificial mixtures of speech and electronic music at a low Signal-to-Noise Ratio (SNR) of 0 dB, we achieved an absolute accuracy improvement of 35.8%. For real-world broadcast news at a high SNR (about 10 dB), we achieved an improvement of 2.4%. An important advantage of the studied approaches is that they do not substantially deteriorate the accuracy in scenarios with clean speech (the decrease is about 1%).	en
dc.format.extent	5	cs
dc.identifier.doi	10.1109/ICASSP.2017.7953150
dc.identifier.isbn	978-1-5090-4117-6	cs
dc.identifier.issn	1520-6149	cs
dc.identifier.uri	https://dspace.tul.cz/handle/15240/31348
dc.identifier.uri	https://ieeexplore.ieee.org/document/7953150
dc.language.iso	eng	cs
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	cs
dc.publisher.city	USA	cs
dc.relation.ispartofseries	0	cs
dc.subject	Robust recognition	cs
dc.subject	background music	cs
dc.subject	feature enhancement	cs
dc.subject	denoising autoencoder	cs
dc.subject	multi-condition training	cs
dc.title	Robust Automatic Recognition of Speech with Background Music	en
dc.title	Robust Automatic Recognition of Speech with Background Music	cs
local.citation.epage	5214	cs
local.citation.spage	5210	cs
local.identifier.publikace	4811
local.identifier.wok	414286205074	en

Files

Original bundle

Name: ROBUST AUTOMATIC RECOGNITION OF SPEECH WITH BACKGROUND MUSIC.pdf
Size: 140.79 KB
Format: Adobe Portable Document Format
Description: article
