
Recurrent Neural Networks (RNN) – Architecture Perspective

Photo by André Sanano on Unsplash.

This article explains what Recurrent Neural Networks are, shows the most common model architectures, and covers popular real-world applications. After reading it, you’ll have vital knowledge about RNNs, as well as a sense of which RNN architecture you’ll need for a project you have in mind.

This article assumes that you have a fundamental understanding of what Artificial Intelligence is and how Neural Networks work. If that isn’t the case, read more in the provided links.

First of all, Recurrent Neural Networks (RNNs) are a subclass of general Artificial Neural Networks specialized in solving challenges related to sequence data. To do this, RNNs use their recurrent properties to carry information from one step of a sequence to the next. More details about how RNNs work will be provided in future posts.
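The recurrent property mentioned above can be sketched in a few lines. This is a minimal illustration, not a trained model: the weight matrices are random placeholders, and the hidden state `h` is what carries information across time steps.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, b):
    """One recurrent step: the new hidden state depends on both
    the current input and the previous hidden state."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
Wxh = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
Whh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

# Process a toy sequence of 5 time steps, carrying the state forward.
h = np.zeros(hidden_dim)
sequence = rng.normal(size=(5, input_dim))
for x_t in sequence:
    h = rnn_step(x_t, h, Wxh, Whh, b)

print(h.shape)  # the final hidden state summarizes the whole sequence
```

The same `rnn_step` is reused at every time step; only the state changes, which is what lets RNNs handle sequences of arbitrary length.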

RNN Applications

To solve many challenges regarding sequence data, we use Recurrent Neural Networks (RNN). There are many types of sequence data out there, but the most common are:

  • Audio (music, dialog sounds, etc.)
  • Text (articles, books, etc.)
  • Biological sequences (DNA, genomes, etc.)
  • Video (sports, surveillance, etc.)

Hence, utilizing RNN models and sequence datasets (like the ones mentioned above), you can solve many challenges, such as (but not limited to):

  • Speech recognition
  • Music generation
  • Sentiment analysis
  • Automated translations
  • Video action analysis
  • Genome and DNA sequence analysis

It is common for Neural Networks to have different input and output data types. For example, in the case of RNNs, speech recognition takes audio data as input and outputs text data. Alternatively, inputs and outputs can be of the same type. For example, automated translation can have audio data (or text data) on both ends. This really depends on the use case you have and the challenge you are solving.

RNN Architectures

Depending on the challenge you are solving, the RNN architecture can differ: from ones with a single input and output to ones with multiple inputs and outputs (with variations in between). To better illustrate this, below are some common examples of RNN architectures.

One to one

This model is similar to a standard Perceptron (single-layer neural network). Today it doesn’t see much use, as it only provides linear predictions.

This is a simple model, as it consists of a single input x and a single output y.

One to one architecture.
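Since the one-to-one case has no recurrence at all, a sketch of it is just a single linear mapping from one input to one output. The weights and input below are arbitrary values chosen for illustration.

```python
import numpy as np

# One-to-one: a single fixed-size input x mapped to a single output y,
# with no recurrence -- essentially a single-layer perceptron.
def one_to_one(x, W, b):
    return W @ x + b

W = np.array([[0.5, -1.0]])  # illustrative weights
b = np.array([0.1])          # illustrative bias
x = np.array([2.0, 1.0])

y = one_to_one(x, W, b)
print(y)  # a single linear prediction
```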

One to many

Often used when there is a single input, like an audio stream. On the output side, it can generate text or even a new audio stream, depending on the situation. In some cases, it propagates the outputs y_{0…n} to the next RNN units, as in the image below.

It consists of a single input x (as an audio stream for example), activation a, and multiple outputs y (that could represent new tones for generated music).

One to many architecture.
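The one-to-many unrolling described above can be sketched as follows. This is a toy illustration with random placeholder weights: a single input seeds the hidden state, and each generated output is fed back in as the next input, as in music generation.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, out_dim = 4, 2

Wxh = rng.normal(size=(hidden_dim, out_dim))     # input-to-hidden weights
Whh = rng.normal(size=(hidden_dim, hidden_dim))  # recurrent weights
Why = rng.normal(size=(out_dim, hidden_dim))     # hidden-to-output weights

x = rng.normal(size=out_dim)  # the single input x (e.g. a seed tone)
h = np.zeros(hidden_dim)
outputs = []
for _ in range(5):            # unroll for 5 steps, producing y_0..y_4
    h = np.tanh(Wxh @ x + Whh @ h)
    y = Why @ h
    outputs.append(y)
    x = y                     # propagate the output to the next RNN unit

print(len(outputs), outputs[0].shape)
```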

Many to one

Some of the common cases where this architecture comes in handy are sentiment analysis, text quality scoring, etc. It works by accumulating and processing the inputs through the network and generating a single output as the outcome.

It consists of multiple inputs x (like words from sentences), activations a, and a single output at the end (this output could be a sentiment class, for example).

Many to one architecture.
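A minimal sketch of the many-to-one flow, again with illustrative random weights: a sequence of inputs (standing in for word vectors) is accumulated into the hidden state, and a single output (standing in for a sentiment score) is produced only after the last step.

```python
import numpy as np

rng = np.random.default_rng(2)
input_dim, hidden_dim = 3, 4
Wxh = rng.normal(size=(hidden_dim, input_dim))
Whh = rng.normal(size=(hidden_dim, hidden_dim))
Why = rng.normal(size=(1, hidden_dim))  # maps the final state to one output

sentence = rng.normal(size=(6, input_dim))  # 6 toy "word" vectors
h = np.zeros(hidden_dim)
for x_t in sentence:
    h = np.tanh(Wxh @ x_t + Whh @ h)  # accumulate the inputs in the state

# A sigmoid squashes the single output into (0, 1), like a sentiment score.
score = 1.0 / (1.0 + np.exp(-(Why @ h)))
print(score.shape)
```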

Many to many (same input and output)

One use case where this architecture is helpful is video classification; more precisely, when we try to classify each frame of a video. Here, the input for each RNN unit is a single frame, while the output could be an object coordinate or a label for that frame.

As mentioned in the video example above, the video frames represent multiple inputs x, the activations a are propagated through the network units, and the outputs y are the classification results for each frame.

Many to many architecture with the same number of inputs and outputs.
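The per-frame classification just described can be sketched like this. Dimensions and weights are hypothetical placeholders: each input produces exactly one output, so the input and output sequences have the same length.

```python
import numpy as np

rng = np.random.default_rng(3)
frame_dim, hidden_dim, n_classes = 8, 4, 3
Wxh = rng.normal(size=(hidden_dim, frame_dim))
Whh = rng.normal(size=(hidden_dim, hidden_dim))
Why = rng.normal(size=(n_classes, hidden_dim))

frames = rng.normal(size=(10, frame_dim))  # 10 toy video frames
h = np.zeros(hidden_dim)
labels = []
for x_t in frames:
    h = np.tanh(Wxh @ x_t + Whh @ h)
    labels.append(int(np.argmax(Why @ h)))  # one class label per frame

print(len(labels))  # same length as the input sequence
```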

Many to many (different input and output)

This architecture is very handy because the input and output are not constrained to the same size. It consists of two parts: an encoder and a decoder. The encoder aggregates the input, and the decoder generates the output of the network. This model is often used in machine translation (like Google Translate), where languages are not translated word for word but in a more natural (semantic) manner.

Another difference, besides the different numbers of input and output units, is that the encoder and decoder parts are separated. This means that the decoder does not start generating output until the whole input has been loaded.

Many to many architecture with a different number of inputs and outputs.
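The encoder-decoder split can be sketched as two separate loops. All weights and sizes are illustrative placeholders: the encoder reads the whole input sequence into a context vector first, and only then does the decoder unroll for a different number of steps, as in machine translation.

```python
import numpy as np

rng = np.random.default_rng(4)
in_dim, hidden_dim, out_dim = 3, 4, 2
We = rng.normal(size=(hidden_dim, in_dim))      # encoder input weights
Ue = rng.normal(size=(hidden_dim, hidden_dim))  # encoder recurrent weights
Ud = rng.normal(size=(hidden_dim, hidden_dim))  # decoder recurrent weights
Wy = rng.normal(size=(out_dim, hidden_dim))     # decoder output weights

source = rng.normal(size=(7, in_dim))  # 7 input tokens
h = np.zeros(hidden_dim)
for x_t in source:                     # encoder: aggregate the whole input
    h = np.tanh(We @ x_t + Ue @ h)

translation = []
for _ in range(5):                     # decoder: generate 5 output tokens
    h = np.tanh(Ud @ h)
    translation.append(Wy @ h)

print(len(source), len(translation))   # input and output lengths differ
```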


You now have the necessary knowledge to decide which architecture suits your future project. Keep in mind that there are many variations of the presented architectures, as well as hybrids with other models, but the ones presented here are the most general and common RNN architectures.

Stay tuned for future posts to learn in-depth details on how RNN works, different layer types, how to build them, etc.
