Speaker independent isolated digit recognition: Multilayer perceptron vs Dynamic Time Warping
Abstract: Former experiments have shown the benefit of using specific multi-layer architectures, the so-called time delay neural networks, for phoneme recognition (Waibel, Hanazawa, Hinton, Shikano, & Lang, 1988). Similar experiments on a speaker-independent task were also performed on a small set of minimal pairs (Bottou, 1988). In this paper we focus on a speaker-independent, global word recognition task with time delay networks. We first describe these networks as a way of learning feature extractors by constrained back-propagation. Such a time-delay network is shown to be capable of dealing with a near real-sized problem: French digit recognition. The results are discussed and compared, on the same data sets, with those obtained with a classical time warping system.
@article{bottou-90,
author = {Bottou, {L\'eon} and Fogelman Souli\'e, Fran\c{c}oise and Blanchet, Pascal and Lienard, {Jean Sylvain}},
title = {Speaker independent isolated digit recognition: {Multilayer} perceptron vs {Dynamic Time Warping}},
journal = {Neural Networks},
year = {1990},
volume = {3},
pages = {453--465},
url = {http://leon.bottou.org/papers/bottou-90},
}
