===== Natural Language Processing (Almost) from Scratch =====

//Abstract:// We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu and Pavel Kuksa: **Natural Language Processing (Almost) from Scratch**, //Journal of Machine Learning Research//, 12:2493--2537, Aug 2011.

[[http://jmlr.csail.mit.edu/papers/v12/collobert11a.html|JMLR Link]]

[[http://leon.bottou.org/publications/djvu/jmlr-2011.djvu|jmlr-2011.djvu]] [[http://leon.bottou.org/publications/pdf/jmlr-2011.pdf|jmlr-2011.pdf]] [[http://leon.bottou.org/publications/psgz/jmlr-2011.ps.gz|jmlr-2011.ps.gz]]

  @article{collobert-2011,
    author = {Collobert, Ronan and Weston, Jason and Bottou, L\'eon and Karlen, Michael and Kavukcuoglu, Koray and Kuksa, Pavel},
    title = {Natural Language Processing (Almost) from Scratch},
    journal = {Journal of Machine Learning Research},
    year = {2011},
    volume = {12},
    pages = {2493--2537},
    month = {Aug},
    url = {http://leon.bottou.org/papers/collobert-2011},
  }

==== Senna ====

The universal natural language tagger described in this paper is actively maintained by [[http://ronan.collobert.com|Ronan Collobert]]. It can be downloaded from [[http://ronan.collobert.com/senna/|Senna]].
Besides part-of-speech tagging, chunking, named entity extraction, and semantic role labeling, the latest version also outputs syntactic parse trees, still using the neural network architecture described in this paper.

  * State-of-the-art or near-state-of-the-art tagging accuracies.
  * Exceptional tagging speed (POS+NER+Chunk+SRL+Parse at more than 10000 words per second).
  * Small memory footprint (about 120MB).
  * Compact source code (about 3000 lines of C).