Investigations into the incorporation of the Ideal Binary Mask in ASR



While much work has been dedicated to exploring how best to incorporate the Ideal Binary Mask (IBM) in automatic speech recognition (ASR) for noisy signals, we demonstrate that the simple use of masked speech can outperform standard spectral reconstruction methods. We explore the effects of both the accuracy of the mask estimation and the strength of the language model on our results. The relative performance of these techniques is directly tied to the accuracy of the estimated mask. Although the use of masked speech fails when significant numbers of errors are present, the maximum performance for spectral reconstruction techniques also drops significantly. This implies improvements in mask estimation can provide greater gains in ASR performance than improvements in the incorporation of the IBM in ASR. Previous work may have ignored the direct use of masked speech due to its poor performance on tasks without a strong language model.
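For readers unfamiliar with the terms, the following is a minimal sketch of what "masked speech" means, not the authors' implementation. It uses the common definition of the IBM: a time-frequency unit is kept (1) when its local SNR exceeds a threshold, and discarded (0) otherwise. The function names, the 0 dB default threshold `lc_db`, the nested-list spectrogram layout, and the random bit-flip model of mask estimation errors are all assumptions made for illustration.

```python
import math
import random


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return a 0/1 mask: 1 where the local SNR in dB exceeds lc_db.

    speech_power and noise_power are lists of rows (frames), each row a
    list of per-frequency power values. This layout is an assumption.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                row.append(0)          # no speech energy: discard the unit
            elif n <= 0.0:
                row.append(1)          # no noise energy: keep the unit
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_power, mask):
    """Masked speech: zero out the units the mask marks as noise-dominated."""
    return [[m * x for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(mixture_power, mask)]


def corrupt_mask(mask, error_rate, seed=0):
    """Hypothetical error model: flip each mask bit with probability error_rate."""
    rng = random.Random(seed)
    return [[1 - m if rng.random() < error_rate else m for m in row]
            for row in mask]


# Toy 2x2 example (values are made up for illustration).
speech = [[4.0, 0.1], [0.5, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
# Assumption: mixture power approximated as speech power plus noise power.
mixture = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(speech, noise)]

ibm = ideal_binary_mask(speech, noise)
masked = apply_mask(mixture, ibm)
```

In a recognizer, `masked` would be passed on to feature extraction directly, whereas a spectral reconstruction method would first try to fill in the zeroed units. Calling `corrupt_mask(ibm, error_rate)` with increasing `error_rate` is one simple way to imitate an estimated mask of decreasing accuracy.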

William Hartmann, Eric Fosler-Lussier


ICASSP 2011 | Mask | Mask Estimation | Signal Processing | Spectral Reconstruction


Post Info

Added	21 Aug 2011
Updated	21 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	William Hartmann, Eric Fosler-Lussier
