Block-online multi-channel speech enhancement using deep neural network-supported relative transfer function estimates

dc.contributor.author: Málek, Jiří
dc.contributor.author: Koldovský, Zbyněk
dc.contributor.author: Boháč, Marek
dc.date.accessioned: 2020-05-13T07:39:39Z
dc.date.available: 2020-05-13T07:39:39Z
dc.date.issued: 2020-05-01
dc.description.abstract: This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when short utterances are processed, e.g., in voice assistant applications. We consider several variants of a system that performs beamforming supported by deep neural network-based voice activity detection, followed by post-filtering. The beamformer is steered toward the target speaker by estimating relative transfer functions (RTFs) between the microphones. Each block of the input signals is processed independently, which makes the method applicable in highly dynamic environments. Because the processed blocks are short, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared with the batch processing regime, in which each recording is treated as a single block. The experimental evaluation is performed on the large CHiME-4 dataset and on another dataset featuring a moving target speaker. The results are assessed in terms of objective and perceptual criteria. Moreover, the word error rate (WER) is evaluated for a speech recognition system that uses the method as its front-end. The results indicate that the proposed method is robust to short block lengths. Significant improvements in terms of these criteria and of WER are observed even for a block length of 250 ms.
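The abstract describes the processing chain only at a high level: each block is handled independently, voice activity detection separates speech frames from noise frames, an RTF estimate steers the beamformer, and a post-filter follows. The NumPy sketch below illustrates that block-wise structure under stated assumptions and is not the authors' implementation. Assumptions: a boolean per-frame speech mask stands in for the DNN-based VAD output; the RTF is estimated by simple covariance subtraction (the record does not say which RTF estimator the paper uses); an MVDR beamformer is used; the post-filter is omitted. All function names are hypothetical, and `frames_per_block` would be chosen so that one block spans, e.g., 250 ms.

```python
import numpy as np


def split_into_blocks(num_frames, frames_per_block):
    """Yield (start, stop) frame indices of consecutive, non-overlapping blocks."""
    for start in range(0, num_frames, frames_per_block):
        yield start, min(start + frames_per_block, num_frames)


def estimate_rtf(X, speech_mask, ref=0):
    """Covariance-subtraction RTF estimate for one frequency bin of one block.

    X: (M, T) complex STFT frames; speech_mask: (T,) bool, True = speech active.
    Returns the RTF h (normalised so that h[ref] == 1) and the noise covariance R_n.
    """
    noisy = X[:, speech_mask]
    noise = X[:, ~speech_mask]
    R_x = noisy @ noisy.conj().T / noisy.shape[1]  # speech-plus-noise covariance
    R_n = noise @ noise.conj().T / noise.shape[1]  # noise-only covariance
    R_s = R_x - R_n                                # approx. rank-1 speech covariance
    h = R_s[:, ref] / R_s[ref, ref]                # normalise to the reference mic
    return h, R_n


def mvdr_weights(h, R_n, diag_load=1e-6):
    """MVDR weights w = R_n^{-1} h / (h^H R_n^{-1} h), with light diagonal loading."""
    M = h.shape[0]
    R = R_n + diag_load * np.trace(R_n).real / M * np.eye(M)
    num = np.linalg.solve(R, h)
    return num / (h.conj() @ num)


def enhance_block_online(X, speech_mask, frames_per_block, ref=0):
    """Beamform each block using statistics from that block only.

    X: (F, M, T) complex STFT; speech_mask: (T,) bool. Returns (F, T) output.
    """
    F, _, T = X.shape
    Y = np.zeros((F, T), dtype=complex)
    for start, stop in split_into_blocks(T, frames_per_block):
        mask = speech_mask[start:stop]
        if mask.all() or not mask.any():
            # Speech/noise statistics cannot both be formed: pass the reference mic through.
            Y[:, start:stop] = X[:, ref, start:stop]
            continue
        for f in range(F):
            Xb = X[f, :, start:stop]
            h, R_n = estimate_rtf(Xb, mask, ref)
            w = mvdr_weights(h, R_n)
            Y[f, start:stop] = w.conj() @ Xb  # y = w^H x for every frame in the block
    return Y
```

Because every block re-estimates R_n and h from its own frames only, shorter blocks can follow a moving speaker more quickly but give noisier covariance estimates, which is the trade-off studied in the abstract.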
dc.format.extent: 10 pages
dc.identifier.WebofScienceResearcherID: V-6332-2019 Koldovský, Zbyněk
dc.identifier.doi: 10.1049/iet-spr.2019.0304
dc.identifier.orcid: 0000-0002-1791-5675 Koldovský, Zbyněk
dc.identifier.uri: https://dspace.tul.cz/handle/15240/154819
dc.identifier.uri: https://ieeexplore.ieee.org/document/9080890
dc.language.iso: cs
dc.publisher: Institution of Engineering and Technology
dc.relation.ispartof: IET Signal Processing
dc.title: Block-online multi-channel speech enhancement using deep neural network-supported relative transfer function estimates
local.citation.epage: 133
local.citation.spage: 124
local.relation.issue: 3
local.relation.volume: 14
Files
Original bundle: Block-online multi-channel speech.pdf (1.6 MB, Adobe Portable Document Format); description: article
License bundle: license.txt (1.71 KB, Item-specific license agreed upon to submission)