Building on earlier conceptual work by Yakov Z. Tsypkin and Vladimir Vapnik, I proposed one of the first interpretations of neural networks as statistical learning machines. This framework unifies a large number of neural network models and provides a common proof of convergence. I also proposed the concept of modular learning systems in which all components are trained globally by optimizing a single cost function. This global training approach was first applied to speech recognition systems that integrate neural networks and Hidden Markov Models, and was subsequently applied to a variety of pattern recognition problems, including optical character recognition, speaker recognition, and radar spot tracking. This work led to the first characterization of the so-called label-bias problem in complex learning systems.