Speech Activity Detection in Online Broadcast Transcription Using Deep Neural Networks and Weighted Finite State Transducers

dc.contributor.author: Matějů Lukáš [cs]
dc.contributor.author: Červa Petr [cs]
dc.contributor.author: Žďánský Jindřich [cs]
dc.contributor.author: Málek Jiří [cs]
dc.date.accessioned: 2018-09-25T12:15:06Z
dc.date.available: 2018-09-25T12:15:06Z
dc.date.issued: 2017 [cs]
dc.description.abstract: In this paper, a new approach to online Speech Activity Detection (SAD) is proposed. This approach is designed for use in a system that carries out 24/7 transcription of radio/TV broadcasts containing a large number of non-speech segments, such as advertisements or music. To improve the robustness of detection, we adopt Deep Neural Networks (DNNs) trained on artificially created mixtures of speech and non-speech signals at desired levels of signal-to-noise ratio (SNR). An integral part of our approach is an online decoder based on Weighted Finite State Transducers (WFSTs); this decoder smooths the output of the DNN. The employed transduction model is context-based, i.e., both speech and non-speech events are modeled using sequences of states. The presented experimental results show that our approach yields state-of-the-art results on the standardized QUT-NOISE-TIMIT data set for SAD and, at the same time, is capable of a) operating with low latency and b) reducing the computational demands and error rate of the target transcription system. [en]
dc.format.extent: 5 [cs]
dc.identifier.doi: 10.1109/ICASSP.2017.7953200
dc.identifier.isbn: 978-1-5090-4117-6 [cs]
dc.identifier.issn: 1520-6149 [cs]
dc.identifier.uri: https://dspace.tul.cz/handle/15240/31351
dc.identifier.uri: https://ieeexplore.ieee.org/document/7953200
dc.language.iso: eng [cs]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [cs]
dc.publisher.city: USA [cs]
dc.relation.ispartofseries: 0 [cs]
dc.subject: deep neural networks [cs]
dc.subject: speech activity detection [cs]
dc.subject: weighted finite state transducers [cs]
dc.subject: speech recognition [cs]
dc.title: Speech Activity Detection in Online Broadcast Transcription Using Deep Neural Networks and Weighted Finite State Transducers [en]
local.citation.epage: 5464 [cs]
local.citation.spage: 5460 [cs]
local.identifier.publikace: 4814
local.identifier.wok: 414286205124 [en]
Files
Original bundle
Name: SPEECH ACTIVITY.pdf
Size: 280.46 KB
Format: Adobe Portable Document Format
Description: article