Browsing by Author "Červa Petr"
Now showing 1 - 20 of 21
- An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs (ISCA, 2019). Matějů Lukáš; Červa Petr; Žďánský Jindřich
- ASR for South Slavic Languages Developed in Almost Automated Way (International Speech Communication Association, 2016). Nouza Jan; Šafařík Radek; Červa Petr
- Compensation of Nonlinear Distortions in Speech for Automatic Recognition (Institute of Electrical and Electronics Engineers Inc., 2015). Málek Jiří; Silovský Jan; Červa Petr; Koldovský Zbyněk; Nouza Jan; Žďánský Jindřich
- Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources (Fundacja Uniwersytetu im. Adama Mickiewicza w Poznaniu, 2015). Nouza Jan; Červa Petr; Šafařík Radek
- Dealing with Newly Emerging OOVs in Broadcast Programs by Daily Updates of the Lexicon and Language Model (Springer Nature Switzerland, 2020). Červa Petr; Volná Veronika; Weingartová Lenka
- Investigation into the use of deep neural networks for LVCSR of Czech (IEEE, 2015). Matějů Lukáš; Červa Petr; Žďánský Jindřich
- Investigation into the Use of WFSTs and DNNs for Speech Activity Detection in Broadcast Data Transcription (Springer Verlag, 2017). Matějů Lukáš; Červa Petr; Žďánský Jindřich
- Multilingual Multimedia Monitoring and Analyzing Platform (2017). Nouza Jan; Červa Petr; Žďánský Jindřich; Čihák Stanislav; Bureš Kamil
- MyVoice version 2.0 (2020). Chaloupka Josef; Červa Petr; Nouza Jan
- Optical Character Recognition for Audio-Visual Broadcast Transcription System (IEEE, 2020). Chaloupka Josef; Paleček Karel; Červa Petr; Žďánský Jindřich
- Robust Automatic Recognition of Speech with Background Music (Institute of Electrical and Electronics Engineers Inc., 2017). Málek Jiří; Žďánský Jindřich; Červa Petr
  Abstract: This paper addresses the task of Automatic Speech Recognition (ASR) with music in the background, where recognition accuracy may deteriorate significantly. To improve the robustness of ASR in this task, e.g. for broadcast news transcription or subtitle creation, we adopt two approaches: 1) multi-condition training of the acoustic models and 2) denoising autoencoders followed by acoustic model training on the preprocessed data. In the latter case, two types of autoencoders are considered: a fully connected and a convolutional network. The presented experimental results show that all the investigated techniques significantly improve the recognition of speech distorted by music. For example, on artificial mixtures of speech and electronic music at a low Signal-to-Noise Ratio (SNR) of 0 dB, we achieved an absolute accuracy improvement of 35.8%; on real-world broadcast news with a high SNR (about 10 dB), the improvement was 2.4%. An important advantage of the studied approaches is that they do not deteriorate accuracy in clean-speech scenarios (the decrease is about 1%).
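The multi-condition training data mentioned in the abstract above depends on mixing speech with music at controlled SNR levels. Below is a minimal sketch of how such a mixture can be produced; it is not the authors' code, and the function name and synthetic signals are illustrative assumptions only.

```python
# Sketch only: create an artificial speech+music mixture at a target SNR.
# Assumes both signals are mono numpy arrays at the same sampling rate.
import numpy as np

def mix_at_snr(speech: np.ndarray, music: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `music` so the mixture has the requested speech-to-music SNR."""
    # Loop or trim the music track to match the speech length.
    music = np.resize(music, speech.shape)
    p_speech = np.mean(speech ** 2)
    p_music = np.mean(music ** 2) + 1e-12          # avoid division by zero
    # SNR(dB) = 10*log10(p_speech / (gain^2 * p_music))  ->  solve for gain
    gain = np.sqrt(p_speech / (p_music * 10.0 ** (snr_db / 10.0)))
    return speech + gain * music

# Example: a 0 dB mixture (equal speech and music power), matching the
# paper's low-SNR condition. Random signals stand in for real audio.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)                # placeholder utterance
music = rng.standard_normal(48000)                 # placeholder music track
mixture = mix_at_snr(speech, music, snr_db=0.0)
```

Scaling the interfering track rather than the speech keeps the target signal's level intact for the recognizer, which is the usual convention when building such training mixtures.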
- Robust Recognition of Conversational Telephone Speech via Multi-Condition Training and Data Augmentation (Springer Verlag, 2018). Málek Jiří; Žďánský Jindřich; Červa Petr
- Robust Recognition of Speech with Background Music in Acoustically Under-Resourced Scenarios (IEEE, 2018). Málek Jiří; Žďánský Jindřich; Červa Petr
- Speech Activity Detection in Online Broadcast Transcription Using Deep Neural Networks and Weighted Finite State Transducers (Institute of Electrical and Electronics Engineers Inc., 2017). Matějů Lukáš; Červa Petr; Žďánský Jindřich; Málek Jiří
  Abstract: In this paper, a new approach to online Speech Activity Detection (SAD) is proposed. It is designed for use in a system that carries out 24/7 transcription of radio/TV broadcasts containing a large amount of non-speech segments, such as advertisements or music. To improve the robustness of detection, we adopt Deep Neural Networks (DNNs) trained on artificially created mixtures of speech and non-speech signals at desired levels of signal-to-noise ratio (SNR). An integral part of our approach is an online decoder based on Weighted Finite State Transducers (WFSTs), which smooths the output of the DNN. The employed transduction model is context-based, i.e., both speech and non-speech events are modeled using sequences of states. The presented experimental results show that our approach yields state-of-the-art results on the standardized QUT-NOISE-TIMIT dataset for SAD and, at the same time, is capable of a) operating with low latency and b) reducing the computational demands and error rate of the target transcription system.
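The abstract above describes smoothing frame-level DNN output with a context-based model in which each event (speech or non-speech) is a sequence of states, which enforces a minimum event duration. The sketch below illustrates that idea with a plain Viterbi pass over a two-chain state topology. It is an offline simplification under stated assumptions (minimum-duration chains, a fixed class-switch penalty), not the paper's online WFST decoder, and all names are illustrative.

```python
# Sketch only: minimum-duration smoothing of per-frame speech posteriors.
import numpy as np

def smooth_sad(speech_post, min_frames=30, switch_penalty=5.0):
    """Turn per-frame speech posteriors into a 0/1 speech mask."""
    eps = 1e-10
    T = len(speech_post)
    # Frame log-scores: column 0 = non-speech, column 1 = speech.
    ll = np.stack([np.log(1.0 - speech_post + eps),
                   np.log(speech_post + eps)], axis=1)
    N = min_frames                       # states per class chain
    S = 2 * N                            # [0..N-1] non-speech, [N..2N-1] speech
    emit = lambda t, s: ll[t, 1 if s >= N else 0]

    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = emit(0, 0)             # may start at the head of either chain
    score[0, N] = emit(0, N)
    for t in range(1, T):
        for s in range(S):
            preds = []                   # (previous state, transition bonus)
            if s % N == 0:               # chain head: reached by switching class
                preds.append((S - 1 if s == 0 else N - 1, -switch_penalty))
            else:                        # internal state: advance along the chain
                preds.append((s - 1, 0.0))
            if s % N == N - 1:           # chain tail: may also self-loop
                preds.append((s, 0.0))
            p, bonus = max(preds, key=lambda pb: score[t - 1, pb[0]] + pb[1])
            score[t, s] = score[t - 1, p] + bonus + emit(t, s)
            back[t, s] = p
    # Backtrace the best state sequence, then map states to classes.
    s = int(np.argmax(score[-1]))
    mask = np.empty(T, dtype=int)
    for t in range(T - 1, -1, -1):
        mask[t] = 1 if s >= N else 0
        s = back[t, s]
    return mask

# Example: noisy posteriors for 150 speech frames followed by 150 non-speech.
rng = np.random.default_rng(1)
post = np.clip(np.where(np.arange(300) < 150, 0.9, 0.1)
               + 0.1 * rng.standard_normal(300), 0.0, 1.0)
print(smooth_sad(post, min_frames=20).sum(), "of", len(post), "frames as speech")
```

The chain topology is what makes the smoothing context-based: a class change is only reachable through the tail of the current chain, so a spurious single-frame flip in the DNN output cannot switch the decision.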
- Speech-to-text summarization using automatic phrase extraction from recognized text (Springer International Publishing, 2016). Rott Michal; Červa Petr
- Study on Methods for Vector Representation of Text for Topic-based Clustering of News Articles (Fundacja Uniwersytetu im. Adama Mickiewicza w Poznaniu, 2015). Rott Michal; Červa Petr
- Study on the use and adaptation of bottleneck features for robust speech recognition of nonlinearly distorted speech (SciTePress, 2016). Málek Jiří; Červa Petr; Šeps Ladislav; Nouza Jan
- Study on the use of deep neural networks for speech activity detection in broadcast recordings (SciTePress, 2016). Matějů Lukáš; Červa Petr; Žďánský Jindřich
- System for Producing Subtitles to Internet Audio-Visual Documents (Institute of Electrical and Electronics Engineers Inc., 2015). Nouza Jan; Blavka Karel; Boháč Marek; Červa Petr; Málek Jiří
- Unique Software Technological Platform for Transcription of Archives of Historical and Contemporary ČRo Broadcasts and Making Them Accessible on the Web (2014). Nouza Jan; Červa Petr; Žďánský Jindřich; Blavka Karel; Boháč Marek; Silovský Jan; Chaloupka Josef; Kuchařová Michaela; Málek Jiří