I started working on structured learning systems in the
context of my Ph.D. thesis work on speech recognition.
I first focused on combining time-delay neural networks
with dynamic programming techniques. When Bourlard and Wellekens
published their first paper combining HMMs and neural networks,
I realized that the HMM framework offered far more ways
to approach the problem. It also led to the concept of *“global training”*.
This was developed in my 1991 Ph.D. thesis,
where I also identified a curious modeling problem that was
later termed *“the label bias problem”*.

Attempts to solve the label bias problem led to the non-probabilistic (LVQ-based) approach described in the very last paragraph of the IJCNN 1991 paper. Meanwhile, Burges and Denker solved the probabilistic puzzle in 1994, leading to the document analysis systems and the graph transformer network work. To approach this work, I would first recommend reading the 1996 draft, which I find much clearer than the published papers (see "Graph Transducer Networks explained").

My point of view evolved dramatically around 2010, when I started rethinking the connections between structured learning and the emerging deep learning methods. This research continues in the section on Machine Reasoning and Machine Learning.

- The slides about Graph Transformer Networks.
- The tutorial Energy Based Learning by Yann LeCun and his NYU collaborators.

Léon Bottou and Patrick Gallinari: **A Framework for the Cooperation of Learning Algorithms**, *Advances in Neural Information Processing Systems*, 3, Edited by D. Touretzky and R. Lippmann, Morgan Kaufmann, Denver, 1991.

Léon Bottou: **Une Approche théorique de l'Apprentissage Connexionniste: Applications à la Reconnaissance de la Parole**, Orsay, France, 1991.

Xavier Driancourt, Léon Bottou and Patrick Gallinari: **Learning Vector Quantization, Multi Layer Perceptron and Dynamic Time Warping: Comparison and Cooperation**, *Proceedings of the International Joint Conference on Neural Networks*, Seattle, 1991.

Léon Bottou, Yoshua Bengio and Yann LeCun: **Draft report: Document Analysis with Transducers**, July 1996.

Léon Bottou, Yann Le Cun and Yoshua Bengio: **Global Training of Document Processing Systems using Graph Transformer Networks**, *Proc. of Computer Vision and Pattern Recognition*, 489-493, IEEE, Puerto Rico, 1997.

Yann Le Cun, Léon Bottou and Yoshua Bengio: **Reading Checks with graph transformer networks**, *International Conference on Acoustics, Speech, and Signal Processing*, 1:151-154, IEEE, Munich, 1997.

Yann Le Cun, Léon Bottou, Yoshua Bengio and Patrick Haffner: **Gradient-Based Learning Applied to Document Recognition**, *Proceedings of the IEEE*, 86(11):2278-2324, 1998.

Yann Le Cun, Patrick Haffner, Léon Bottou and Yoshua Bengio: **Object Recognition with Gradient-Based Learning**, *Feature Grouping*, Edited by David Forsyth, Springer Verlag, 1999.

Léon Bottou and Yann LeCun: **Graph Transformer Networks for Image Recognition**, *Bulletin of the International Statistical Institute (ISI)*, 2005.