There are many models that improve on LSTM; GRU (Gated Recurrent Unit) is one of them. In this tutorial, we will introduce GRU and compare it with LSTM.

## What is GRU?

The structure of a GRU cell is similar to that of an LSTM cell, but a GRU uses only two gates: an update gate \(z_t\) and a reset gate \(r_t\), and it has no separate cell state.
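For example, here is a minimal sketch of creating and running a GRU layer with PyTorch's `torch.nn.GRU`. The layer sizes below are arbitrary example values, not from the original text:

```python
import torch
import torch.nn as nn

# A minimal sketch: a single-layer GRU. The sizes are example values.
gru = nn.GRU(input_size=32, hidden_size=64, num_layers=1, batch_first=True)

x = torch.randn(8, 10, 32)   # (batch, time steps, features)
output, h_n = gru(x)

print(output.shape)  # torch.Size([8, 10, 64]) -- hidden state at every time step
print(h_n.shape)     # torch.Size([1, 8, 64])  -- final hidden state for each layer
```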

## Compare GRU with LSTM

The formulas of LSTM and GRU are below:

| LSTM | GRU |
| --- | --- |
| \(f_t = \sigma(W_{fx}x_t + W_{fh}h_{t-1} + b_f)\) | \(z_t = \sigma(W_{zx}x_t + W_{zh}h_{t-1} + b_z)\) |
| \(i_t = \sigma(W_{ix}x_t + W_{ih}h_{t-1} + b_i)\) | \(r_t = \sigma(W_{rx}x_t + W_{rh}h_{t-1} + b_r)\) |
| \(o_t = \sigma(W_{ox}x_t + W_{oh}h_{t-1} + b_o)\) | \(g_t = \tanh(W_{gx}x_t + W_{ch}(r_t \odot h_{t-1}) + b_g)\) |
| \(g_t = \tanh(W_{gx}x_t + W_{ch}h_{t-1} + b_g)\) | \(h_t = (1 - z_t) \odot h_{t-1} + z_t \odot g_t\) |
| \(c_t = f_t \odot c_{t-1} + i_t \odot g_t\) | |
| \(h_t = o_t \odot \tanh(c_t)\) | |
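To make these formulas concrete, here is a minimal NumPy sketch of a single time step of each cell. The weight layout (one matrix per gate applied to the concatenation of \(x_t\) and \(h_{t-1}\)) and the sizes are illustrative assumptions, not part of the original text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[k]: (hidden, input + hidden), b[k]: (hidden,)."""
    z = np.concatenate([x_t, h_prev])
    f_t = sigmoid(W["f"] @ z + b["f"])    # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])    # input gate
    o_t = sigmoid(W["o"] @ z + b["o"])    # output gate
    g_t = np.tanh(W["g"] @ z + b["g"])    # candidate input
    c_t = f_t * c_prev + i_t * g_t        # new cell state
    h_t = o_t * np.tanh(c_t)              # new hidden state
    return h_t, c_t

def gru_step(x_t, h_prev, W, b):
    """One GRU step with the same weight layout."""
    z_in = np.concatenate([x_t, h_prev])
    z_t = sigmoid(W["z"] @ z_in + b["z"])  # update gate
    r_t = sigmoid(W["r"] @ z_in + b["r"])  # reset gate
    g_t = np.tanh(W["g"] @ np.concatenate([x_t, r_t * h_prev]) + b["g"])
    h_t = (1 - z_t) * h_prev + z_t * g_t   # new hidden state
    return h_t

# Tiny usage demo with random weights.
rng = np.random.default_rng(0)
n_in, n_h = 4, 3
W = {k: rng.standard_normal((n_h, n_in + n_h)) for k in "fiogzr"}
b = {k: np.zeros(n_h) for k in "fiogzr"}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), W, b)
h2 = gru_step(rng.standard_normal(n_in), np.zeros(n_h), W, b)
print(h.shape, c.shape, h2.shape)  # (3,) (3,) (3,)
```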

We can see that GRU is essentially a form of LSTM; it only differs in how the output is computed.

We will explain this conclusion step by step.

1. Look at the output equation of LSTM.

\(h_t = o_t \odot \tanh(c_t)\)

Here \(h_t\) is the output of the LSTM cell. What if we use \(\tanh(c_t)\) directly as the output of the LSTM cell instead?

2. Move the gate \(o_t\) to the input of the next LSTM cell.

In an LSTM cell, the candidate input is computed as:

\(g_t = \tanh(W_{gx}x_t + W_{ch}h_{t-1} + b_g)\)

We then use the output gate \(o_t\) to control \(h_{t-1}\). The modified candidate input becomes:

\(g_t = \tanh(W_{gx}x_t + W_{ch}(o_t \odot h_{t-1}) + b_g)\)

This is exactly the candidate input of GRU, with \(o_t\) playing the role of the reset gate \(r_t\).

However, GRU only applies \(o_t\) when computing the candidate input. If we use \(o_t \odot h_{t-1}\) to replace \(h_{t-1}\) in the candidate input and use \(\tanh(c_t)\) directly as the output, the modified LSTM behaves much like a GRU. In this sense, GRU can be viewed as a simplified LSTM.
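To see this correspondence in code, here is a minimal sketch of such a modified LSTM step, using the same assumed weight layout as the earlier example (this is an illustration of the idea, not the author's implementation): the candidate input controls \(h_{t-1}\) with \(o_t\), and the output skips the output gate, mirroring the GRU candidate with \(o_t\) in the role of \(r_t\):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modified_lstm_step(x_t, h_prev, c_prev, W, b):
    """An LSTM step rewritten to resemble GRU. W[k]: (hidden, input + hidden)."""
    z = np.concatenate([x_t, h_prev])
    f_t = sigmoid(W["f"] @ z + b["f"])    # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])    # input gate
    o_t = sigmoid(W["o"] @ z + b["o"])    # plays the role of the GRU reset gate r_t
    # Candidate input uses o_t * h_{t-1}, as in the GRU candidate.
    g_t = np.tanh(W["g"] @ np.concatenate([x_t, o_t * h_prev]) + b["g"])
    c_t = f_t * c_prev + i_t * g_t        # new cell state
    h_t = np.tanh(c_t)                    # output is tanh(c_t) directly, no output gate
    return h_t, c_t
```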

**There is one question: if we do not use \(o_t\) (it is \(r_t\) in GRU), will the performance of GRU decrease?**

The answer is no. You can read this tutorial for details:

Can We Remove Reset Gate in GRU? Can It Decrease the Performance of GRU? – Deep Learning Tutorial