Gated recurrent unit

Syntax

``Y = gru(X,H0,weights,recurrentWeights,bias)``
``[Y,hiddenState] = gru(X,H0,weights,recurrentWeights,bias)``
``[___] = gru(___,'DataFormat',FMT)``

Description

The gated recurrent unit (GRU) operation allows a network to learn dependencies between time steps in time series and sequence data.

This function applies the deep learning GRU operation to `dlarray` data. If you want to apply a GRU operation within a `layerGraph` object or `Layer` array, use `gruLayer`.

`Y = gru(X,H0,weights,recurrentWeights,bias)` applies a gated recurrent unit (GRU) calculation to input `X` using the initial hidden state `H0` and the parameters `weights`, `recurrentWeights`, and `bias`. The input `X` must be a formatted `dlarray`. The output `Y` is a formatted `dlarray` with the same dimension format as `X`, except for any `'S'` dimensions.

The `gru` function updates the hidden state using the hyperbolic tangent function (tanh) as the state activation function. The `gru` function uses the sigmoid function, $\sigma(x) = (1 + e^{-x})^{-1}$, as the gate activation function.

`[Y,hiddenState] = gru(X,H0,weights,recurrentWeights,bias)` also returns the hidden state after the GRU operation.

`[___] = gru(___,'DataFormat',FMT)` also specifies the dimension format `FMT` when `X` is not a formatted `dlarray`. The output `Y` is an unformatted `dlarray` with the same dimension order as `X`, except for any `'S'` dimensions.
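To make the two activation functions concrete, the following is a minimal sketch of one GRU time step for a single observation in plain MATLAB, with no `dlarray` involved. It rests on two assumptions: the `3*NumHiddenUnits` rows of the parameters are stacked as reset gate, update gate, candidate state (the order described for `gruLayer`), and the reset gate scales the recurrent term after the matrix multiplication. The names `sigmoidFn`, `W`, `R`, `b`, and `hCand` are illustrative only, and the sketch is not guaranteed to reproduce the numerical output of `gru`.

```matlab
% One GRU time step for one observation, plain MATLAB (no dlarray).
% ASSUMPTIONS: row blocks ordered reset, update, candidate; reset gate applied
% after the recurrent multiplication. All names here are illustrative.
sigmoidFn = @(x) 1 ./ (1 + exp(-x));   % gate activation

numHiddenUnits = 4;
inputSize = 3;
W = randn(3*numHiddenUnits,inputSize);        % input weights
R = randn(3*numHiddenUnits,numHiddenUnits);   % recurrent weights
b = randn(3*numHiddenUnits,1);                % bias

% Row indices of the three assumed blocks
iR = 1:numHiddenUnits;                        % reset gate
iZ = numHiddenUnits + (1:numHiddenUnits);     % update gate
iH = 2*numHiddenUnits + (1:numHiddenUnits);   % candidate state

x = randn(inputSize,1);             % input at time t
hPrev = zeros(numHiddenUnits,1);    % hidden state at time t-1

r = sigmoidFn(W(iR,:)*x + R(iR,:)*hPrev + b(iR));
z = sigmoidFn(W(iZ,:)*x + R(iZ,:)*hPrev + b(iZ));
hCand = tanh(W(iH,:)*x + r .* (R(iH,:)*hPrev) + b(iH));   % state activation
h = (1 - z) .* hCand + z .* hPrev;  % hidden state at time t
```

Because the gates pass through the sigmoid, every entry of `r` and `z` lies strictly between 0 and 1, so `z` interpolates between the previous state and the tanh-bounded candidate.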

Examples

Perform a GRU operation using 100 hidden units.

Create the input sequence data as 32 observations with ten channels and a sequence length of 64.

```
numFeatures = 10;
numObservations = 32;
sequenceLength = 64;
X = randn(numFeatures,numObservations,sequenceLength);
dlX = dlarray(X,'CBT');   % label dimensions as channel, batch, time
```

Create the initial hidden state with 100 hidden units. Use the same initial hidden state for all observations.

```
numHiddenUnits = 100;
H0 = zeros(numHiddenUnits,1);   % one column, shared by all observations
```

Create the learnable parameters for the GRU operation.

```
% Each parameter has 3*numHiddenUnits rows (three stacked blocks)
weights = dlarray(randn(3*numHiddenUnits,numFeatures));
recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits));
bias = dlarray(randn(3*numHiddenUnits,1));
```

Perform the GRU calculation.

```
[dlY,hiddenState] = gru(dlX,H0,weights,recurrentWeights,bias);
```

View the size and dimension format of `dlY`.

```
size(dlY)
```
```
ans = 1×3

   100    32    64
```
```
dlY.dims
```
```
ans = 'CBT'
```

View the size of `hiddenState`.

```
size(hiddenState)
```
```
ans = 1×2

   100    32
```

You can use the hidden state to keep track of the state of the GRU operation: to process further sequential data, pass `hiddenState` as the initial hidden state `H0` of the next `gru` call.
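As a sketch of continuing a sequence, the following splits one batch of 64-step sequences into two chunks of 32 steps and passes the `hiddenState` returned for the first chunk as `H0` for the second. It reuses the sizes from the example above but is self-contained; the chunk variables `dlX1` and `dlX2` are illustrative. Because each step depends only on the previous hidden state, the output for the second chunk should match the last 32 time steps of a single call on the full sequence, up to floating-point round-off.

```matlab
numFeatures = 10; numObservations = 32; numHiddenUnits = 100;
weights = dlarray(randn(3*numHiddenUnits,numFeatures));
recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits));
bias = dlarray(randn(3*numHiddenUnits,1));

% One batch of 64-step sequences, split into two 32-step chunks
X = randn(numFeatures,numObservations,64);
dlX1 = dlarray(X(:,:,1:32),'CBT');
dlX2 = dlarray(X(:,:,33:64),'CBT');

H0 = zeros(numHiddenUnits,1);
[dlY1,hiddenState] = gru(dlX1,H0,weights,recurrentWeights,bias);

% Carry the state over: the returned state becomes the next initial state
[dlY2,hiddenState] = gru(dlX2,hiddenState,weights,recurrentWeights,bias);
```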

Input Arguments

Input data `X`, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array. When `X` is not a formatted `dlarray`, you must specify the dimension label format using `'DataFormat',FMT`. If `X` is a numeric array, at least one of `H0`, `weights`, `recurrentWeights`, or `bias` must be a `dlarray`.

`X` must contain a sequence dimension labeled `'T'`. If `X` has any spatial dimensions labeled `'S'`, they are flattened into the `'C'` channel dimension. If `X` does not have a channel dimension, then one is added. If `X` has any unspecified dimensions labeled `'U'`, they must be singleton.

Data Types: `single` | `double`
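The flattening of `'S'` dimensions into the `'C'` dimension can be pictured with an ordinary `reshape`. This sketch uses a hypothetical input laid out as `'SSCBT'` (an 8-by-8 grid with 3 channels, 16 observations, and 20 time steps) and only illustrates the resulting sizes; the element order inside the merged channel dimension (column-major here) is an assumption, not something this page specifies.

```matlab
% Hypothetical 'SSCBT' input: 8-by-8 grid, 3 channels, 16 observations, 20 time steps
X = randn(8,8,3,16,20);
sz = size(X);

% Merge both 'S' dimensions and the 'C' dimension into one channel dimension
inputSize = sz(1)*sz(2)*sz(3);              % 8*8*3 = 192
Xflat = reshape(X,inputSize,sz(4),sz(5));   % now laid out as 'CBT'

size(Xflat)                                 % returns [192 16 20]
```

The merged size, 192 here, is the `InputSize` that the number of columns of `weights` must match.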

Initial hidden state vector `H0`, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

If `H0` is a formatted `dlarray`, it must contain a channel dimension labeled `'C'` and optionally a batch dimension labeled `'B'` with the same size as the `'B'` dimension of `X`. If `H0` does not have a `'B'` dimension, the function uses the same hidden state vector for each observation in `X`.

If `H0` is a formatted `dlarray`, then the size of the `'C'` dimension determines the number of hidden units. Otherwise, the size of the first dimension determines the number of hidden units.

Data Types: `single` | `double`
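For example, to give each observation its own starting state, you can pass a formatted `H0` with `'C'` and `'B'` dimensions, where the `'B'` size matches that of `X`. This sketch reuses the sizes from the example above; the random initial states are purely illustrative.

```matlab
numFeatures = 10; numObservations = 32; sequenceLength = 64; numHiddenUnits = 100;
dlX = dlarray(randn(numFeatures,numObservations,sequenceLength),'CBT');
weights = dlarray(randn(3*numHiddenUnits,numFeatures));
recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits));
bias = dlarray(randn(3*numHiddenUnits,1));

% One initial state per observation: 'C' sets the number of hidden units,
% 'B' must have the same size as the 'B' dimension of dlX
H0 = dlarray(randn(numHiddenUnits,numObservations),'CB');

[dlY,hiddenState] = gru(dlX,H0,weights,recurrentWeights,bias);
```

Since `H0` is formatted here, the returned `hiddenState` is a formatted `dlarray` with format `'CB'`.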

Input weights `weights`, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

Specify `weights` as a matrix of size `3*NumHiddenUnits`-by-`InputSize`, where `NumHiddenUnits` is the size of the `'C'` dimension of `H0`, and `InputSize` is the size of the `'C'` dimension of `X` multiplied by the size of each `'S'` dimension of `X`, where present.

If `weights` is a formatted `dlarray`, it must contain a `'C'` dimension of size `3*NumHiddenUnits` and a `'U'` dimension of size `InputSize`.

Data Types: `single` | `double`

Recurrent weights `recurrentWeights`, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

Specify `recurrentWeights` as a matrix of size `3*NumHiddenUnits`-by-`NumHiddenUnits`, where `NumHiddenUnits` is the size of the `'C'` dimension of `H0`.

If `recurrentWeights` is a formatted `dlarray`, it must contain a `'C'` dimension of size `3*NumHiddenUnits` and a `'U'` dimension of size `NumHiddenUnits`.

Data Types: `single` | `double`

Bias `bias`, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

Specify `bias` as a vector of length `3*NumHiddenUnits`, where `NumHiddenUnits` is the size of the `'C'` dimension of `H0`.

If `bias` is a formatted `dlarray`, the nonsingleton dimension must be labeled with `'C'`.

Data Types: `single` | `double`
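The following sketch puts the size rules for `weights`, `recurrentWeights`, and `bias` together for an input with a spatial dimension. With a hypothetical `'SCBT'` input of 5 spatial positions and 4 channels, `InputSize` is 5*4 = 20. The parameters are passed as formatted `dlarray` objects labeled `'CU'`, which is one way to satisfy the labeling rules above; unformatted or numeric parameters are also accepted.

```matlab
numHiddenUnits = 8; numObservations = 2; sequenceLength = 6;
% Hypothetical 'SCBT' input: 5 spatial positions, 4 channels
dlX = dlarray(randn(5,4,numObservations,sequenceLength),'SCBT');
inputSize = 5*4;   % size of 'C' times size of the 'S' dimension

% 'C' labels the 3*numHiddenUnits rows, 'U' labels the remaining dimension
weights = dlarray(randn(3*numHiddenUnits,inputSize),'CU');
recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits),'CU');
bias = dlarray(randn(3*numHiddenUnits,1),'CU');   % nonsingleton dimension is 'C'
H0 = zeros(numHiddenUnits,1);

[dlY,hiddenState] = gru(dlX,H0,weights,recurrentWeights,bias);
```

The `'S'` dimension does not appear in the output, so `dlY` has format `'CBT'` with a `'C'` size equal to `numHiddenUnits`.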

Dimension order of unformatted input data, specified as the comma-separated pair consisting of `'DataFormat'` and a character array or string `FMT` that provides a label for each dimension of the data. Each character in `FMT` must be one of the following:

• `'S'` — Spatial

• `'C'` — Channel

• `'B'` — Batch (for example, samples and observations)

• `'T'` — Time (for example, sequences)

• `'U'` — Unspecified

You can specify multiple dimensions labeled `'S'` or `'U'`. You can use the labels `'C'`, `'B'`, and `'T'` at most once.

You must specify `'DataFormat',FMT` when the input data is not a formatted `dlarray`.

Example: `'DataFormat','CBT'`

Data Types: `char` | `string`
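As a sketch of the unformatted case, the following passes a plain numeric `X` with `'DataFormat','CBT'`. Because `X` is numeric, at least one parameter must be a `dlarray`; here only `weights` is, which is an arbitrary illustrative choice.

```matlab
numFeatures = 10; numObservations = 32; sequenceLength = 64; numHiddenUnits = 100;
X = randn(numFeatures,numObservations,sequenceLength);   % plain numeric array
H0 = zeros(numHiddenUnits,1);
weights = dlarray(randn(3*numHiddenUnits,numFeatures));  % the one dlarray argument
recurrentWeights = randn(3*numHiddenUnits,numHiddenUnits);
bias = randn(3*numHiddenUnits,1);

% FMT labels the dimensions of X in order: channel, batch, time
Y = gru(X,H0,weights,recurrentWeights,bias,'DataFormat','CBT');
```

`Y` is an unformatted `dlarray` whose dimensions follow the same order as `X`.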

Output Arguments

GRU output `Y`, returned as a `dlarray`. The output `Y` has the same underlying data type as the input `X`.

If the input data `X` is a formatted `dlarray`, `Y` has the same dimension format as `X`, except for any `'S'` dimensions. If the input data is not a formatted `dlarray`, `Y` is an unformatted `dlarray` with the same dimension order as the input data, except for any `'S'` dimensions.

The size of the `'C'` dimension of `Y` is the same as the number of hidden units, specified by the size of the `'C'` dimension of `H0`.

Hidden state vector `hiddenState` for each observation, returned as a `dlarray` or a numeric array with the same data type as `H0`.

If the input `H0` is a formatted `dlarray`, then the output `hiddenState` is a formatted `dlarray` with the format `'CB'`.

Limitations

• `functionToLayerGraph` does not support the `gru` function. If you use `functionToLayerGraph` with a function that contains the `gru` operation, the resulting `LayerGraph` contains placeholder layers.

Gated Recurrent Unit

The GRU operation, introduced by Cho et al. [1], allows a network to learn dependencies between time steps in time series and sequence data. For more information, see the Gated Recurrent Unit Layer definition on the `gruLayer` reference page.
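The equations below sketch the usual GRU update at time step $t$, with $\sigma$ the sigmoid gate activation and $\odot$ elementwise multiplication. The symbols are introduced here for illustration: $W$, $R$, and $b$ stand for row blocks of `weights`, `recurrentWeights`, and `bias`, assumed to be stacked in the order reset gate ($r$), update gate ($z$), candidate state ($\tilde{h}$). This version applies the reset gate after the recurrent multiplication; treat the `gruLayer` page as the authority on exactly what MATLAB computes.

$$
\begin{aligned}
r_t &= \sigma\left(W_r x_t + R_r h_{t-1} + b_r\right) \\
z_t &= \sigma\left(W_z x_t + R_z h_{t-1} + b_z\right) \\
\tilde{h}_t &= \tanh\left(W_{\tilde{h}} x_t + r_t \odot \left(R_{\tilde{h}} h_{t-1}\right) + b_{\tilde{h}}\right) \\
h_t &= \left(1 - z_t\right) \odot \tilde{h}_t + z_t \odot h_{t-1}
\end{aligned}
$$

In the original formulation of [1], the reset gate is applied before the recurrent multiplication instead, giving $\tilde{h}_t = \tanh\left(W_{\tilde{h}} x_t + R_{\tilde{h}}\left(r_t \odot h_{t-1}\right) + b_{\tilde{h}}\right)$.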

References

[1] Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).

Version History

Introduced in R2020a