===== Natural Language Processing (Almost) from Scratch =====

//Abstract:// 
We propose a unified neural network architecture and learning algorithm
that can be applied to various natural language processing tasks
including part-of-speech tagging, chunking, named entity recognition,
and semantic role labeling.
This versatility is achieved by
trying to avoid task-specific engineering and therefore 
disregarding a lot of prior knowledge.
Instead of exploiting man-made
input features carefully optimized for each task, our system
learns internal representations on the basis of vast amounts
of mostly unlabeled training data.
This work is then used as a basis for building a freely available
tagging system with good performance and
minimal computational requirements.


<box 99% orange>
Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu and Pavel Kuksa:  **Natural Language Processing (Almost) from Scratch**,  //Journal of Machine Learning Research//, 12:2493--2537, Aug 2011.

[[http://jmlr.csail.mit.edu/papers/v12/collobert11a.html|JMLR Link]]
<html>&nbsp;&nbsp;</html>
[[http://leon.bottou.org/publications/djvu/jmlr-2011.djvu|jmlr-2011.djvu]]
[[http://leon.bottou.org/publications/pdf/jmlr-2011.pdf|jmlr-2011.pdf]]
[[http://leon.bottou.org/publications/psgz/jmlr-2011.ps.gz|jmlr-2011.ps.gz]]
</box>

  @article{collobert-2011,
    author = {Collobert, Ronan and Weston, Jason and Bottou, L\'eon and Karlen, Michael and Kavukcuoglu, Koray and Kuksa, Pavel},
    title = {Natural Language Processing (Almost) from Scratch},
    journal = {Journal of Machine Learning Research},
    year = {2011},
    volume = {12},
    pages = {2493--2537},
    month = {Aug},
    url = {http://leon.bottou.org/papers/collobert-2011},
  }


==== Senna ====

The universal natural language tagger described in this paper
is actively maintained by [[http://ronan.collobert.com|Ronan Collobert]].
It can be downloaded from [[http://ronan.collobert.com/senna/|Senna]].
Besides part-of-speech tagging, chunking, named entity recognition, and semantic role labeling,
the latest version also outputs syntactic parse trees, still using the neural network architecture described in this paper.

  * State-of-the-art or near-state-of-the-art tagging accuracies.
  * Exceptional tagging speed (POS+NER+Chunk+SRL+Parse at more than 10,000 words per second).
  * Small memory footprint (about 120MB).
  * Compact source code (about 3000 lines of C).