Robust Automatic Recognition of Speech with Background Music
dc.contributor.author | Málek Jiří | cs |
dc.contributor.author | Žďánský Jindřich | cs |
dc.contributor.author | Červa Petr | cs |
dc.date.accessioned | 2018-09-25T12:15:05Z | |
dc.date.available | 2018-09-25T12:15:05Z | |
dc.date.issued | 2017-01-01 | cs |
dc.description.abstract | This paper addresses the task of Automatic Speech Recognition (ASR) with music in the background, where the accuracy of recognition may deteriorate significantly. To improve the robustness of ASR in this task, e.g. for broadcast news transcription or subtitle creation, we adopt two approaches: 1) multi-condition training of the acoustic models and 2) denoising autoencoders followed by acoustic model training on the preprocessed data. In the latter case, two types of autoencoders are considered: the fully connected and the convolutional network. The presented experimental results show that all the investigated techniques significantly improve the recognition of speech distorted by music. For example, in the case of artificial mixtures of speech and electronic music (low Signal-to-Noise Ratio (SNR) of 0 dB), we achieved an absolute accuracy improvement of 35.8%. For real-world broadcast news and a high SNR (about 10 dB), we achieved an improvement of 2.4%. An important advantage of the studied approaches is that they do not substantially deteriorate the accuracy in scenarios with clean speech (the decrease is about 1%). | en |
dc.format.extent | 5 | cs |
dc.identifier.doi | 10.1109/ICASSP.2017.7953150 | |
dc.identifier.isbn | 978-1-5090-4117-6 | cs |
dc.identifier.issn | 1520-6149 | cs |
dc.identifier.uri | https://dspace.tul.cz/handle/15240/31348 | |
dc.identifier.uri | https://ieeexplore.ieee.org/document/7953150 | |
dc.language.iso | eng | cs |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | cs |
dc.publisher.city | USA | cs |
dc.relation.ispartofseries | 0 | cs |
dc.subject | Robust recognition | cs |
dc.subject | background music | cs |
dc.subject | feature enhancement | cs |
dc.subject | denoising autoencoder | cs |
dc.subject | multi-condition training | cs |
dc.title | Robust Automatic Recognition of Speech with Background Music | en |
dc.title | Robust Automatic Recognition of Speech with Background Music | cs |
local.citation.epage | 5214 | cs |
local.citation.spage | 5210 | cs |
local.identifier.publikace | 4811 | |
local.identifier.wok | 414286205074 | en |
Files
Original bundle
- Name: ROBUST AUTOMATIC RECOGNITION OF SPEECH WITH BACKGROUND MUSIC.pdf
- Size: 140.79 KB
- Format: Adobe Portable Document Format
- Description: article