Gaussian process regression (GPR) models are nonparametric kernel-based
probabilistic models. You can train a GPR model using the `fitrgp` function.

Consider the training set $$\{({x}_{i},{y}_{i});i=1,2,\mathrm{...},n\}$$, where $${x}_{i}\in {\mathbb{R}}^{d}$$ and $${y}_{i}\in \mathbb{R}$$, drawn from an unknown distribution. A GPR model addresses the question of predicting the value of a response variable $${y}_{new}$$, given the new input vector $${x}_{new}$$, and the training data. A linear regression model is of the form

$$y={x}^{T}\beta +\epsilon ,$$

where $$\epsilon \sim N(0,{\sigma}^{2})$$.
The error variance $${\sigma}^{2}$$ and
the coefficients *β* are estimated from the
data. A GPR model explains the response by introducing latent variables, $$f\left({x}_{i}\right),\ i=1,2,\dots ,n$$,
from a Gaussian process (GP), and explicit basis functions, *h*.
The covariance function of the latent variables captures the smoothness
of the response, and the basis functions project the inputs $$x$$ into
a *p*-dimensional feature space.

A GP is a set of random variables, such that any finite number
of them have a joint Gaussian distribution. If $$\left\{f\left(x\right),x\in {\mathbb{R}}^{d}\right\}$$ is
a GP, then given *n* observations $${x}_{1},{x}_{2},\mathrm{...},{x}_{n}$$,
the joint distribution of the random variables $$f({x}_{1}),f({x}_{2}),\mathrm{...},f({x}_{n})$$ is
Gaussian. A GP is defined by its mean function $$m\left(x\right)$$ and
covariance function, $$k\left(x,{x}^{\prime}\right)$$.
That is, if $$\left\{f\left(x\right),x\in {\mathbb{R}}^{d}\right\}$$ is
a Gaussian process, then $$E\left(f\left(x\right)\right)=m\left(x\right)$$ and $$Cov\left[f\left(x\right),f\left({x}^{\prime}\right)\right]=E\left[\left\{f\left(x\right)-m\left(x\right)\right\}\left\{f\left({x}^{\prime}\right)-m\left({x}^{\prime}\right)\right\}\right]=k\left(x,{x}^{\prime}\right).$$
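
To make the finite-dimensional definition concrete, the following minimal sketch draws one sample of $$f({x}_{1}),\dots ,f({x}_{n})$$ from a zero-mean GP at a handful of scalar inputs. The squared exponential kernel, its parameter values, the helper names (`sq_exp_kernel`, `cholesky`, `sample_gp_prior`), and the small diagonal jitter are illustrative assumptions and are not taken from `fitrgp`.

```python
import math
import random


def sq_exp_kernel(x, x_prime, sigma_f=1.0, length_scale=1.0):
    """Assumed covariance function k(x, x') for scalar inputs."""
    return sigma_f ** 2 * math.exp(-0.5 * ((x - x_prime) / length_scale) ** 2)


def cholesky(A):
    """Return lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_gp_prior(xs, mean_fn, kernel, rng, jitter=1e-9):
    """Draw f(x_1), ..., f(x_n) ~ N(m, K), with m_i = m(x_i), K_ij = k(x_i, x_j)."""
    n = len(xs)
    # The jitter on the diagonal only guards against round-off in the factorization.
    K = [[kernel(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent N(0, 1) draws
    # m + L z has mean m and covariance L L^T = K.
    return [mean_fn(xs[i]) + sum(L[i][k] * z[k] for k in range(i + 1))
            for i in range(n)]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
f_sample = sample_gp_prior(xs, lambda x: 0.0, sq_exp_kernel, random.Random(0))
```

Nearby inputs receive strongly correlated values because $$k\left(x,{x}^{\prime}\right)$$ is large when $$x$$ and $${x}^{\prime}$$ are close, which is how the covariance function encodes smoothness.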

Now consider the following model.

$$h{(x)}^{T}\beta +f(x),$$

where $$f\left(x\right)\sim GP\left(0,k\left(x,{x}^{\prime}\right)\right)$$,
that is, *f*(*x*) are from a zero-mean
GP with covariance function $$k\left(x,{x}^{\prime}\right)$$. *h*(*x*)
is a set of basis functions that transform the original feature vector *x* in
$${\mathbb{R}}^{d}$$ into a new feature
vector *h*(*x*) in $${\mathbb{R}}^{p}$$. *β* is
a *p*-by-1 vector of basis function coefficients.
This model represents a GPR model. An instance of response *y* can
be modeled as

$$P\left({y}_{i}|f\left({x}_{i}\right),{x}_{i}\right)\sim N\left({y}_{i}|h{\left({x}_{i}\right)}^{T}\beta +f\left({x}_{i}\right),{\sigma}^{2}\right)$$

Hence, a GPR model is a probabilistic model. There is a latent
variable $$f\left({x}_{i}\right)$$
introduced for each observation $${x}_{i}$$,
which makes the GPR model nonparametric. In vector form, this model
is equivalent to

$$P(y|f,X)\sim N(y|H\beta +f,{\sigma}^{2}I),$$

where

$$X=\left(\begin{array}{c}{x}_{1}^{T}\\ {x}_{2}^{T}\\ \vdots \\ {x}_{n}^{T}\end{array}\right),\quad y=\left(\begin{array}{c}{y}_{1}\\ {y}_{2}\\ \vdots \\ {y}_{n}\end{array}\right),\quad H=\left(\begin{array}{c}h{\left({x}_{1}\right)}^{T}\\ h{\left({x}_{2}\right)}^{T}\\ \vdots \\ h{\left({x}_{n}\right)}^{T}\end{array}\right),\quad f=\left(\begin{array}{c}f\left({x}_{1}\right)\\ f\left({x}_{2}\right)\\ \vdots \\ f\left({x}_{n}\right)\end{array}\right).$$
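
As a small illustration of how the basis matrix is assembled, the sketch below builds $$H$$ row by row and evaluates the term $$H\beta $$. The linear basis $$h(x)=(1,{x}_{1},\dots ,{x}_{d})$$, the helper name `linear_basis`, and the numeric values are hypothetical choices made for the example.

```python
def linear_basis(x):
    """Assumed basis h(x) = [1, x_1, ..., x_d], mapping R^d to R^p with p = d + 1."""
    return [1.0] + list(x)


# n = 3 observations with d = 2 predictors; each inner list is one row x_i^T of X.
X = [[0.0, 1.0],
     [2.0, 3.0],
     [4.0, 5.0]]

beta = [0.5, 1.0, -0.5]  # p-by-1 coefficient vector (made-up values)

# Row i of H holds the basis functions evaluated at x_i, so H is n-by-p.
H = [linear_basis(x) for x in X]

# H * beta, one entry per observation.
H_beta = [sum(h_ij * b_j for h_ij, b_j in zip(row, beta)) for row in H]
```

With these values, `H_beta` contains one entry per observation, and the response vector is modeled as this term plus the latent vector $$f$$ and the noise.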

The joint distribution of latent variables $$f\left({x}_{1}\right),f\left({x}_{2}\right),\dots ,f\left({x}_{n}\right)$$ in the GPR model is as follows:

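$$P(f|X)\sim N\left(f|0,K\left(X,X\right)\right),$$

which makes the GPR model close to a linear regression model, with the latent vector $$f$$ as an additional term. Here, the covariance matrix $$K\left(X,X\right)$$ is as follows: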

$$K\left(X,X\right)=\left(\begin{array}{cccc}k\left({x}_{1},{x}_{1}\right)& k\left({x}_{1},{x}_{2}\right)& \cdots & k\left({x}_{1},{x}_{n}\right)\\ k\left({x}_{2},{x}_{1}\right)& k\left({x}_{2},{x}_{2}\right)& \cdots & k\left({x}_{2},{x}_{n}\right)\\ \vdots & \vdots & \ddots & \vdots \\ k\left({x}_{n},{x}_{1}\right)& k\left({x}_{n},{x}_{2}\right)& \cdots & k\left({x}_{n},{x}_{n}\right)\end{array}\right).$$

The covariance function $$k\left(x,{x}^{\prime}\right)$$ is usually parameterized by a set of kernel parameters or hyperparameters, $$\theta $$. Often $$k\left(x,{x}^{\prime}\right)$$ is written as $$k\left(x,{x}^{\prime}|\theta \right)$$ to explicitly indicate the dependence on $$\theta $$.
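
The conditional distribution of $$y$$ given $$f$$ and the distribution of $$f$$ given $$X$$ together determine the marginal distribution of the responses. As a brief worked step (a standard Gaussian identity, added here for illustration), integrating out the latent vector $$f$$ gives

$$P(y|X)=\int P(y|f,X)\,P(f|X)\,df=N\left(y|H\beta ,K\left(X,X|\theta \right)+{\sigma}^{2}I\right),$$

so the log of the marginal likelihood, viewed as a function of $$\beta $$, $$\theta $$, and $${\sigma}^{2}$$, is

$$\log P\left(y|X,\beta ,\theta ,{\sigma}^{2}\right)=-\frac{1}{2}{\left(y-H\beta \right)}^{T}{\left[K\left(X,X|\theta \right)+{\sigma}^{2}I\right]}^{-1}\left(y-H\beta \right)-\frac{n}{2}\log 2\pi -\frac{1}{2}\log \left|K\left(X,X|\theta \right)+{\sigma}^{2}I\right|.$$

Maximizing an expression of this kind over the parameters is the usual way to fit a GPR model to data.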

`fitrgp` estimates the basis
function coefficients, $$\beta $$,
the noise variance, $${\sigma}^{2}$$,
and the hyperparameters, $$\theta $$,
of the kernel function from the data while training the GPR model.
You can specify the basis function, the kernel (covariance) function,
and the initial values for the parameters.

Because a GPR model is probabilistic, it is possible to compute
the prediction intervals using the trained model (see `predict`
and `resubPredict`).

Consider some data observed from the function $$g(x)=x\sin (x)$$,
and assume that they are noise free. The subplot on the left in the following figure
illustrates the observations, the GPR fit, and the actual function.
It is more realistic that the observed values are not the exact function
values, but a noisy realization of them. The subplot on the right
illustrates this case. When observations are noise free (as in the
subplot on the left), the GPR fit passes through the observations, and the
standard deviation of the predicted response is zero at those points. Hence, you do
not see prediction intervals around these values.
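
The noise-free and noisy cases can be reproduced numerically with a minimal sketch of GP prediction. It assumes a zero-mean GP with no basis term, a squared exponential kernel with fixed (not estimated) parameters, made-up training inputs, and a 95% interval of the form mean ± 1.96 standard deviations. The helper names are hypothetical, and the code does not reproduce the internals of `fitrgp` or `predict`.

```python
import math
import random


def g(x):
    return x * math.sin(x)


def kernel(a, b, sigma_f=1.0, length_scale=1.0):
    """Assumed squared exponential covariance with fixed parameters."""
    return sigma_f ** 2 * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Return lower-triangular L with L L^T = A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_new, sigma):
    """Return mean and standard deviation of the predicted response at x_new."""
    n = len(x_train)
    A = [[kernel(x_train[i], x_train[j]) + (sigma ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]           # K(X, X) + sigma^2 I
    L = cholesky(A)
    k_star = [kernel(x_new, xi) for xi in x_train]        # K(x_new, X)
    mean = sum(k * a for k, a in zip(k_star, chol_solve(L, y_train)))
    latent_var = kernel(x_new, x_new) - sum(
        k * v for k, v in zip(k_star, chol_solve(L, k_star)))
    # Clip tiny negative round-off, then add the noise variance of the response.
    return mean, math.sqrt(max(latent_var, 0.0) + sigma ** 2)


x_train = [1.0, 3.0, 5.0, 6.0, 7.0, 8.0]
y_clean = [g(x) for x in x_train]

# Noise-free case: predict at a training input with sigma = 0.
mean_nf, sd_nf = gp_predict(x_train, y_clean, 3.0, sigma=0.0)

# Noisy case: perturb the observations and model the noise with sigma = 0.5.
rng = random.Random(0)
y_noisy = [y + rng.gauss(0.0, 0.5) for y in y_clean]
mean_n, sd_n = gp_predict(x_train, y_noisy, 3.0, sigma=0.5)
interval_n = (mean_n - 1.96 * sd_n, mean_n + 1.96 * sd_n)
```

In the noise-free case the predicted mean at a training input matches the observed value and the standard deviation is zero up to round-off, so the interval collapses to a point. In the noisy case the standard deviation is at least the assumed noise level, which produces a visible interval.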

You can also compute the regression error using the trained
GPR model (see `loss` and `resubLoss`).
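
One common measure of regression error is the mean squared error between observed and predicted responses. The sketch below computes it by hand, assuming that mean squared error is the measure of interest. The function name `mean_squared_error` and the numbers are illustrative only.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted responses."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0]   # observed responses (made-up values)
y_pred = [1.5, 2.0, 2.0]   # model predictions (made-up values)

mse = mean_squared_error(y_true, y_pred)  # (0.25 + 0.0 + 1.0) / 3
```

Evaluating this quantity on the training data measures how well the model reproduces what it was fitted to, while evaluating it on new data measures generalization.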

[1] Rasmussen, C. E. and C. K. I. Williams. *Gaussian
Processes for Machine Learning.* MIT Press. Cambridge,
Massachusetts, 2006.