Gaussian error linear unit (GELU) layer

Since R2022b

## Description

A Gaussian error linear unit (GELU) layer weights each input element by the probability that a standard Gaussian random variable is less than or equal to that element.

This operation is given by

`$\text{GELU}\left(x\right)=\frac{x}{2}\left(1+\text{erf}\left(\frac{x}{\sqrt{2}}\right)\right),$`

where erf denotes the error function.
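As a rough illustration of the formula above, the operation can be evaluated elementwise on an ordinary numeric array using the built-in `erf` function. The function handle name `gelu` and the sample inputs are chosen here for illustration only; this is a sketch of the math, not the layer's internal implementation.

```matlab
% Elementwise GELU(x) = x/2 * (1 + erf(x/sqrt(2))).
% "gelu" is a hypothetical helper name used only for this sketch.
gelu = @(x) x./2 .* (1 + erf(x./sqrt(2)));

x = [-3 -1 0 1 3];
y = gelu(x);

% GELU(0) is exactly 0, and GELU(1) is approximately 0.8413,
% which is the standard normal CDF evaluated at 1.
disp(y)
```

Large negative inputs are mapped close to zero and large positive inputs pass through almost unchanged, which is why GELU is often described as a smooth alternative to ReLU.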

## Creation

### Syntax

``layer = geluLayer``
``layer = geluLayer(Name=Value)``

### Description

`layer = geluLayer` returns a GELU layer.
`layer = geluLayer(Name=Value)` sets the optional `Approximation` and `Name` properties using name-value arguments. For example, `geluLayer(Name="gelu")` creates a GELU layer with the name `"gelu"`.
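A minimal sketch of the second syntax, setting both optional properties at once. The layer name `"gelu_tanh"` is an arbitrary choice made for this example.

```matlab
% Create a GELU layer that uses the tanh approximation.
% The name "gelu_tanh" is illustrative only.
layer = geluLayer(Approximation="tanh",Name="gelu_tanh");

% Inspect the properties that were set.
disp(layer.Approximation)
disp(layer.Name)
```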

## Properties

### GELU

Approximation method for the GELU operation (the `Approximation` property), specified as one of these values:

• `'none'` — Do not use approximation.

• `'tanh'` — Approximate the underlying error function using

`$\text{erf}\left(\frac{x}{\sqrt{2}}\right)\approx \text{tanh}\left(\sqrt{\frac{2}{\pi }}\left(x+0.044715{x}^{3}\right)\right).$`

Tip

In MATLAB®, computing the tanh approximation is typically less accurate and, for large input sizes, slower than computing the GELU activation without an approximation. Use the tanh approximation when you want to reproduce models that use this approximation, such as BERT and GPT-2.
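To get a feel for how closely the two options agree, the sketch below evaluates both formulas on plain numeric arrays and reports the largest absolute difference over a sample range. The function handle names, the range `[-5, 5]`, and the number of sample points are assumptions made for this illustration; the code works on the formulas directly and does not call the layer.

```matlab
% Exact GELU using the error function.
geluExact = @(x) x./2 .* (1 + erf(x./sqrt(2)));

% GELU with erf(x/sqrt(2)) replaced by the tanh approximation.
geluTanh = @(x) x./2 .* (1 + tanh(sqrt(2/pi) .* (x + 0.044715.*x.^3)));

% Compare the two on a grid of sample points (range chosen arbitrarily).
x = linspace(-5,5,1001);
maxAbsDiff = max(abs(geluExact(x) - geluTanh(x)));
disp(maxAbsDiff)
```

The difference is small but nonzero, so a network trained with one option may not reproduce its outputs exactly when evaluated with the other.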

### Layer

Layer name (the `Name` property), specified as a character vector or a string scalar. For `Layer` array input, the `trainnet`, `trainNetwork`, `assembleNetwork`, `layerGraph`, and `dlnetwork` functions automatically assign names to layers with the name `""`.

The `GELULayer` object stores this property as a character vector.

Data Types: `char` | `string`

Number of inputs to the layer (the `NumInputs` property), returned as `1`. This layer accepts a single input only.

Data Types: `double`

Input names (the `InputNames` property), returned as `{'in'}`. This layer accepts a single input only.

Data Types: `cell`

Number of outputs from the layer (the `NumOutputs` property), returned as `1`. This layer has a single output only.

Data Types: `double`

Output names (the `OutputNames` property), returned as `{'out'}`. This layer has a single output only.

Data Types: `cell`
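The read-only input and output properties described above can be checked on a default layer. This sketch assumes the standard layer property names `NumInputs`, `InputNames`, `NumOutputs`, and `OutputNames`.

```matlab
% Create a default GELU layer and query its input/output properties.
layer = geluLayer;

numIn    = layer.NumInputs;    % returned as 1
inNames  = layer.InputNames;   % returned as {'in'}
numOut   = layer.NumOutputs;   % returned as 1
outNames = layer.OutputNames;  % returned as {'out'}
```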

## Examples

Create a GELU layer.

`layer = geluLayer`
```
layer = 
  GELULayer with properties:

             Name: ''

   Hyperparameters
    Approximation: 'none'
```

Include a GELU layer in a `Layer` array.

```
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    geluLayer
    maxPooling2dLayer(2,Stride=2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer]
```
```
layers = 
  7x1 Layer array with layers:

     1   ''   Image Input             28x28x1 images with 'zerocenter' normalization
     2   ''   2-D Convolution         20 5x5 convolutions with stride [1 1] and padding [0 0 0 0]
     3   ''   GELU                    GELU
     4   ''   2-D Max Pooling         2x2 max pooling with stride [2 2] and padding [0 0 0 0]
     5   ''   Fully Connected         10 fully connected layer
     6   ''   Softmax                 softmax
     7   ''   Classification Output   crossentropyex
```

## References

[1] Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." Preprint, submitted June 27, 2016. https://arxiv.org/abs/1606.08415

## Version History

Introduced in R2022b

