Speaker independent isolated digit recognition: Multilayer perceptron vs Dynamic Time Warping

Abstract: Earlier experiments have shown the benefit of using specific multi-layer architectures, the so-called time delay neural networks, for phoneme recognition (Waibel, Hanazawa, Hinton, Shikano, & Lang, 1988). Similar experiments on a speaker-independent task were also performed on a small set of minimal pairs (Bottou, 1988). In this paper we focus on a speaker-independent, global word recognition task with time delay networks. We first describe these networks as a way of learning feature extractors by constrained back-propagation. Such a time-delay network is shown to be capable of dealing with a near real-sized problem: French digit recognition. The results are discussed and compared, on the same data sets, with those obtained with a classical time warping system.

Léon Bottou, Françoise Fogelman Soulié, Pascal Blanchet and Jean Sylvain Lienard: Speaker independent isolated digit recognition: Multilayer perceptron vs Dynamic Time Warping, Neural Networks, 3:453-465, 1990.

nnj-1990.djvu nnj-1990.pdf nnj-1990.ps.gz

@article{bottou-90,
  author = {Bottou, {L\'eon} and Fogelman Souli\'e, Fran\c{c}oise and Blanchet, Pascal and Lienard, {Jean Sylvain}},
  title = {Speaker independent isolated digit recognition: Multilayer perceptron vs Dynamic Time Warping},
  journal = {Neural Networks},
  year = {1990},
  volume = {3},
  pages = {453--465},
  url = {http://leon.bottou.org/papers/bottou-90},
}