Deep Learning for NLP: SpaCy vs PyTorch vs AllenNLP

Deep neural networks have become really popular nowadays, producing state-of-the-art results in many areas of NLP, like sentiment analysis, text summarization, question answering, and more. In this blog post, we compare three popular NLP deep learning frameworks: SpaCy, PyTorch, and AllenNLP: what are their advantages, disadvantages, and use cases.


Pros: easy to use, very fast, ready for production

Cons: not customizable, internals are opaque


SpaCy is a mature and batteries-included framework that comes with prebuilt models for common NLP tasks like classification, named entity recognition, and part-of-speech tagging. It’s very easy to train a model with your data: all the gritty details like tokenization and word embeddings are handled for you. SpaCy is written in Cython which makes it faster than a pure Python implementation, so it’s ideal for production.

The design philosophy is the user should only worry about the task at hand, and not the underlying details. If a newer and more accurate model comes along, SpaCy can update itself to use the improved model, and the user doesn’t need to change anything. This is good for getting a model up and running quickly, but leaves little room for a NLP practitioner to customize the model if the task doesn’t exactly match one of SpaCy’s prebuilt models. For example, you can’t build a classifier that takes both text, numerical, and image data at the same time to produce a classification.


Pros: very customizable, widely used in deep learning research

Cons: fewer NLP abstractions, not optimized for speed


PyTorch is a deep learning framework by Facebook, popular among researchers for all kinds of DL models, like image classifiers or deep reinforcement learning or GANs. It uses a clear and flexible design where the model architecture is defined with straightforward Python code (rather than TensorFlow’s computational graph design).

NLP-specific functionality, like tokenization and managing word embeddings, are available in torchtext. However, PyTorch is a general purpose deep learning framework and has relatively few NLP abstractions compared to SpaCy and AllenNLP, which are designed for NLP.


Pros: excellent NLP functionality, designed for quick prototyping

Cons: not yet mature, not optimized for speed


AllenNLP is built on top of PyTorch, designed for rapid prototyping NLP models for research purposes. It supports a lot of NLP functionality out-of-the-box, like text preprocessing and character embeddings, and abstracts away the training loop (whereas in PyTorch you have to write the training loop yourself). Currently, AllenNLP is not yet at a 1.0 stable release, but looks very promising.

Unlike PyTorch, AllenNLP’s design decouples what a model “does” from the architectural details of “how” it’s done. For example, a Seq2VecEncoder is any component that takes a sequence of vectors and outputs a single vector. You can use GloVe embeddings and average them, or you can use an LSTM, or you can put in a CNN. All of these are Seq2VecEncoders so you can swap them out without affecting the model logic.

The talk “Writing code for NLP Research” presented at EMNLP 2018 gives a good overview of AllenNLP’s design philosophy and its differences from PyTorch.

Which is the best framework?

It depends on how much you care about flexibility, ease of use, and performance.

  • If your task is fairly standard, then SpaCy is the easiest to get up and running. You can train a model using a small amount of code, you don’t have to think about whether to use a CNN or RNN, and the API is clearly documented. It’s also well optimized to deploy to production.
  • AllenNLP is the best for research prototyping. It supports all the bells and whistles that you’d include in your next research paper, and encourages you to follow the best practices by design. Its functionality is a superset of PyTorch’s, so I’d recommend AllenNLP over PyTorch for all NLP applications.

There’s a few runner-ups that I will mention briefly:

  • NLTK / Stanford CoreNLP / Gensim are popular libraries for NLP. They’re good libraries, but they don’t do deep learning, so they can’t be directly compared here.
  • Tensorflow / Keras are also popular for research, especially for Google projects. Tensorflow is the only framework supported by Google’s TPUs, and it also has better multi-GPU support than PyTorch. However, multi-GPU setups are relatively uncommon in NLP, and furthermore, its computational graph model is harder to debug than PyTorch’s model, so I don’t recommend it for NLP.
  • PyText is a new framework by Facebook, also built on top of PyTorch. It defines a network using pre-built modules (similar to Keras) and supports exporting models to Caffe to be faster in production. However, it’s very new (only released earlier this month) and I haven’t worked with it myself to form an opinion about it yet.

That’s all, let me know if there’s any that I’ve missed!

2 thoughts on “Deep Learning for NLP: SpaCy vs PyTorch vs AllenNLP

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s