Abstract:
A challenging, unsolved problem in the speech recognition
community is recognizing speech signals that are corrupted by
loud, highly nonstationary noise. One approach to noisy speech
recognition is to automatically remove the noise from the
cepstrum sequence before feeding it into a clean speech
recognizer. In previous work published in Eurospeech, we showed
how a probability model trained on clean speech and a
separate probability model trained on noise could be combined for
the purpose of estimating the noise-free speech from the noisy
speech. We showed how an iterative second-order vector Taylor series
approximation could be used for probabilistic inference in this
model. In many circumstances, it is not possible to obtain
examples of noise without speech. Noise statistics may change
significantly during an utterance, so that speech-free frames are
not sufficient for estimating the noise model. In this paper, we
show how the noise model can be learned even when the data
contains speech. In particular, the noise model can be learned
from the test utterance and then used to denoise the test
utterance. The
approximate inference technique is used as an approximate E step
in a generalized EM algorithm that learns the parameters of the
noise model from a test utterance. For both Wall Street Journal
data with added noise samples and the Aurora benchmark, we show
that the new noise-adaptive technique performs as well as or
significantly better than the non-adaptive algorithm, without the
need for a separate training set of noise examples.
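
As a concrete, heavily simplified illustration of the noise-adaptive scheme described above, the sketch below runs a generalized EM loop over a single noisy utterance: the approximate E step linearizes the speech-plus-noise interaction with a first-order vector Taylor series (the paper uses an iterative second-order expansion), and the M step re-estimates the noise mean and variance from per-frame posterior noise statistics. It works in the log-spectral rather than the cepstral domain, assumes diagonal covariances and a single-Gaussian noise model, and all names, shapes, and the initialization from the first few frames are illustrative assumptions rather than the authors' implementation.

import numpy as np

def log_gauss_diag(y, mean, var):
    # Log of a diagonal-covariance Gaussian density at a single frame y (D,).
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def noise_adaptive_em(Y, pi, mu_x, var_x, n_iters=10):
    # Y:  (T, D) noisy log-spectral frames of the test utterance.
    # pi, mu_x, var_x:  weights (K,), means (K, D), variances (K, D) of a
    #                   clean-speech GMM trained offline on clean data.
    # Returns the adapted noise mean and variance (mu_n, var_n).
    #
    # Generative assumption in the log-spectral domain:
    #     y = x + log(1 + exp(n - x)),
    # which the E step linearizes around (mu_x[k], mu_n).
    T, D = Y.shape
    K = len(pi)

    # Crude initialization of the noise model from the first few frames.
    mu_n = Y[:5].mean(axis=0)
    var_n = Y[:5].var(axis=0) + 1e-3

    for _ in range(n_iters):
        # ---- Approximate E step -------------------------------------------
        # b_k = dy/dn and a_k = dy/dx = 1 - b_k at the expansion point.
        b = 1.0 / (1.0 + np.exp(mu_x - mu_n))                 # (K, D)
        a = 1.0 - b
        mu_y = mu_x + np.logaddexp(0.0, mu_n - mu_x)          # predicted noisy means
        var_y = a ** 2 * var_x + b ** 2 * var_n               # predicted noisy variances

        # Responsibilities over clean-speech components for every frame.
        log_resp = np.array([[np.log(pi[k]) + log_gauss_diag(Y[t], mu_y[k], var_y[k])
                              for k in range(K)] for t in range(T)])
        log_resp -= log_resp.max(axis=1, keepdims=True)
        gamma = np.exp(log_resp)
        gamma /= gamma.sum(axis=1, keepdims=True)             # (T, K)

        # Posterior noise statistics per (frame, component) under the
        # linearized model, where cov(n, y | k) = var_n * b_k.
        gain = (var_n * b) / var_y                            # (K, D)
        n_mean = mu_n + gain[None] * (Y[:, None, :] - mu_y[None])   # (T, K, D)
        n_var = var_n - gain * (var_n * b)                    # (K, D)

        # Mix over components with the responsibilities.
        En = np.sum(gamma[:, :, None] * n_mean, axis=1)                        # (T, D)
        En2 = np.sum(gamma[:, :, None] * (n_var[None] + n_mean ** 2), axis=1)  # (T, D)

        # ---- M step: update the noise model from the whole utterance ------
        mu_n = En.mean(axis=0)
        var_n = np.maximum(En2.mean(axis=0) - mu_n ** 2, 1e-6)

    return mu_n, var_n

In a full system, the same linearized posteriors would also yield minimum mean-square estimates of the clean log-spectra (or cepstra), which are then passed to a recognizer trained on clean speech; the point of the sketch is only that the noise model is fit to the test utterance by EM rather than to a separate noise training set.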