Speech Activity Detection in Online Broadcast Transcription Using Deep Neural Networks and Weighted Finite State Transducers

Matějů Lukáš


dc.contributor.author	Matějů Lukáš	cs
dc.contributor.author	Červa Petr	cs
dc.contributor.author	Žďánský Jindřich	cs
dc.contributor.author	Málek Jiří	cs
dc.date.accessioned	2018-09-25T12:15:06Z
dc.date.available	2018-09-25T12:15:06Z
dc.date.issued	2017-01-01	cs
dc.description.abstract	In this paper, a new approach to online Speech Activity Detection (SAD) is proposed. The approach is designed for use in a system that carries out 24/7 transcription of radio/TV broadcasts containing a large number of non-speech segments, such as advertisements or music. To improve the robustness of detection, we adopt Deep Neural Networks (DNNs) trained on artificially created mixtures of speech and non-speech signals at desired levels of signal-to-noise ratio (SNR). An integral part of our approach is an online decoder based on Weighted Finite State Transducers (WFSTs); this decoder smooths the output of the DNN. The employed transduction model is context-based, i.e., both speech and non-speech events are modeled using sequences of states. The presented experimental results show that our approach yields state-of-the-art results on the standardized QUT-NOISE-TIMIT data set for SAD and, at the same time, is capable of a) operating with low latency and b) reducing the computational demands and error rate of the target transcription system.	en
dc.format.extent	5	cs
dc.identifier.doi	10.1109/ICASSP.2017.7953200
dc.identifier.isbn	978-1-5090-4117-6	cs
dc.identifier.issn	1520-6149	cs
dc.identifier.uri	https://dspace.tul.cz/handle/15240/31351
dc.identifier.uri	https://ieeexplore.ieee.org/document/7953200
dc.language.iso	eng	cs
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	cs
dc.publisher.city	USA	cs
dc.relation.ispartofseries	0	cs
dc.subject	deep neural networks	cs
dc.subject	speech activity detection	cs
dc.subject	weighted finite state transducers	cs
dc.subject	speech recognition	cs
dc.title	Speech Activity Detection in Online Broadcast Transcription Using Deep Neural Networks and Weighted Finite State Transducers	en
dc.title	Speech Activity Detection in Online Broadcast Transcription Using Deep Neural Networks and Weighted Finite State Transducers	cs
local.citation.epage	5464	cs
local.citation.spage	5460	cs
local.identifier.publikace	4814
local.identifier.wok	414286205124	en

Files

Original bundle

Name: SPEECH ACTIVITY.pdf
Size: 280.46 KB
Format: Adobe Portable Document Format
Description: article
