Abstract:
We present a sequential Monte Carlo method applied to additive
noise compensation for robust speech recognition in time-varying
noise. The method draws a set of samples from the prior
distribution given by clean speech models and a noise prior
evolved from the previous estimate. An explicit model representing
noise effects on speech features is used, so that an extended
Kalman filter is constructed for each sample, generating the
updated continuous state estimate as the estimate of the noise
parameter, and the prediction likelihood used to weight each sample.
Minimum mean square error (MMSE) inference of the time-varying
noise parameter is carried out over these samples by fusing the
per-sample estimates according to their weights. A residual
resampling selection step and a Metropolis-Hastings smoothing
step are used to improve computational efficiency. Experiments were
conducted on speech recognition in simulated non-stationary
noises, where the noise power was changed artificially, and in highly
non-stationary Machinegun noise. In all the experiments carried
out, we observed that the method yields significantly better
recognition performance than noise compensation under a
stationary noise assumption.
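The weighting, MMSE fusion and residual resampling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a scalar noise parameter, takes the per-sample estimates and log prediction likelihoods as given (the extended Kalman filter update and the Metropolis-Hastings step are omitted), and the function names are hypothetical.

```python
import math
import random


def normalize_log_weights(log_likelihoods):
    """Turn per-sample log prediction likelihoods into normalized weights."""
    peak = max(log_likelihoods)  # subtract the max for numerical stability
    unnormalized = [math.exp(v - peak) for v in log_likelihoods]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def mmse_estimate(estimates, weights):
    """Weighted mean of the per-sample noise estimates (scalar case assumed)."""
    return sum(w * x for w, x in zip(weights, estimates))


def residual_resample(weights, rng=random):
    """Return the indices of the samples kept by residual resampling."""
    n = len(weights)
    # Deterministic part: sample i is copied floor(n * w_i) times.
    counts = [int(math.floor(n * w)) for w in weights]
    remaining = n - sum(counts)
    if remaining > 0:
        # Random part: draw the rest in proportion to the residual weights.
        residuals = [n * w - c for w, c in zip(weights, counts)]
        for i in rng.choices(range(n), weights=residuals, k=remaining):
            counts[i] += 1
    return [i for i, c in enumerate(counts) for _ in range(c)]
```

In use, the weights would come from the per-sample prediction likelihoods, `mmse_estimate` would give the fused noise estimate for the current frame, and `residual_resample` would select which samples are carried forward to the next frame.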