ActiveSetMethod: entropy | GPR

Question

Marius Marinescu on 3 Dec 2021


https://nl.mathworks.com/matlabcentral/answers/1602400-activesetmethod-entropy-gpr

Answered: Aditya on 4 Feb 2025

Hello,

I was wondering which option to select for ActiveSetMethod when fitting a Gaussian process model. Since I have too much data, I use the subset of data points option ('FitMethod','sd') with 'ActiveSetSize',2000 to select only two thousand points. As far as I understand, fitrgp randomly selects 2000 points from the data set. Some questions arise:

Does GPR use the other points in the data set (for training)? Where? I saw that the RegressionGP object stores all the data, and some matrices have the size of the full data set (for example, the matrices W, Alpha, ...).
Besides choosing the points randomly, MATLAB has the option 'ActiveSetMethod' with four possible values: random (default), sgma, entropy, likelihood. Is there any documentation of what each option does specifically? When I choose entropy, fitrgp takes much longer than with random (21 min vs. less than 5 min). Why is it so different?


Answer 1

Aditya on 4 Feb 2025


https://nl.mathworks.com/matlabcentral/answers/1602400-activesetmethod-entropy-gpr#answer_1558943

Hi Marius,

When using Gaussian process regression (GPR) with a large dataset in MATLAB, you can use the 'FitMethod','sd' option to fit the model on a subset of the data points, known as the active set. This keeps the computational cost manageable by reducing the number of data points used in training. Here is a breakdown of the available options:

ActiveSetMethod Options

random: Selects data points randomly for the active set. This is the fastest option because it doesn't involve any optimization or criterion-based selection.
sgma (sparse greedy matrix approximation): Greedily selects points so that the active set gives a good low-rank approximation of the kernel (covariance) matrix. This is more computationally intensive than random selection, but aims to choose a more informative subset.
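entropy (differential entropy-based selection): Greedily adds the points whose inclusion most reduces the differential entropy (uncertainty) of the posterior, which tends to favor points where the current model is least certain. Every inclusion requires scoring candidate points and updating the approximation, and fitrgp repeats the selection interleaved with hyperparameter estimation (controlled by 'NumActiveSetRepeats'). This explains the much longer runtime compared with random selection.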
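likelihood (subset of regressors log likelihood-based selection): Greedily adds the points that most increase the log likelihood of the subset of regressors (SR) approximation. This is also computationally intensive, because each inclusion requires evaluating candidate points against that likelihood.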
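To illustrate why the greedy options cost more than 'random', here is a minimal NumPy sketch of greedy active set selection. It is not fitrgp's implementation. It assumes a squared-exponential kernel with fixed hyperparameters and a fixed noise variance. It uses the differential-entropy score for 'entropy' and the trace of the Nyström residual as a stand-in for the approximation error that 'sgma' targets, and it omits 'likelihood'. select_active_set, n_candidates, and noise_var are hypothetical names. Scoring a random candidate subset of 59 points per inclusion is an assumption meant to echo fitrgp's RandomSearchSetSize option.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel with unit signal variance, so k(x, x) = 1.
    Fixed hyperparameters are an assumption of this sketch."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def select_active_set(X, m, method="random", noise_var=0.1, n_candidates=59, seed=0):
    """Hypothetical helper: return indices of an m-point active set (rows of X).

    The greedy methods keep a factor V so that the current residual matrix is
    K - V @ V.T, score a random subset of n_candidates points per inclusion
    (assumed, echoing RandomSearchSetSize), and add the best-scoring point
    with a rank-one update.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    if method == "random":
        # No scoring and no matrix updates: sample m distinct indices.
        return rng.choice(n, size=m, replace=False)

    V = np.zeros((n, 0))            # grows by one column per inclusion
    d = np.ones(n)                  # diagonal of K - V @ V.T (k(x, x) = 1)
    selected = np.zeros(n, dtype=bool)
    order = []
    for _ in range(m):
        remaining = np.flatnonzero(~selected)
        cand = rng.choice(remaining, size=min(n_candidates, remaining.size),
                          replace=False)
        if method == "entropy":
            # A noisy observation at x_j lowers the posterior entropy by
            # 0.5 * log(1 + var_j / noise_var), where var_j = d[j].
            scores = 0.5 * np.log1p(d[cand] / noise_var)
        elif method == "sgma":
            # Making x_j a pivot lowers trace(K - V V^T) by ||r_j||^2 / r_jj,
            # where r_j is the current residual column for x_j.
            R = rbf_kernel(X, X[cand]) - V @ V[cand].T
            scores = np.sum(R**2, axis=0) / np.maximum(d[cand], 1e-12)
        else:
            raise ValueError(f"unknown method: {method}")
        j = cand[np.argmax(scores)]

        # Residual column of the chosen point, then a rank-one update of V.
        col = rbf_kernel(X, X[j:j + 1])[:, 0] - V @ V[j]
        denom = d[j] + noise_var if method == "entropy" else d[j]
        v = col / np.sqrt(max(denom, 1e-12))
        V = np.column_stack([V, v])
        d = d - v**2
        selected[j] = True
        order.append(j)
    return np.array(order)


# Usage on synthetic 2-D inputs (illustrative only).
X = np.random.default_rng(1).uniform(-3.0, 3.0, size=(300, 2))
idx_random = select_active_set(X, 20, method="random")
idx_entropy = select_active_set(X, 20, method="entropy")
idx_sgma = select_active_set(X, 20, method="sgma")
```

'random' returns immediately. Each greedy inclusion touches all n points to update the residual, and the 'sgma' variant also evaluates kernel columns for every candidate. The greedy cost therefore grows roughly with n·m² over m inclusions. That is consistent with the runtime gap reported in the question, though it is not a measurement of fitrgp itself.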
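The subset-of-data idea can be made concrete with a minimal NumPy sketch, which is not fitrgp's implementation. It conditions an exact GP on the active set only, with fixed kernel hyperparameters and noise variance. This is an assumption, since fitrgp also estimates these. fit_predict_sd and its parameters are hypothetical names. In this sketch, alpha has one entry per active point, and rows outside the active set play no role in the fit or in the predictions.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel with unit signal variance, so k(x, x) = 1."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def fit_predict_sd(X, y, active_idx, X_test, noise_var=0.1, length_scale=1.0):
    """Hypothetical helper: subset-of-data GPR with fixed (assumed) hyperparameters.

    Only the rows in active_idx enter the fit; all other rows are ignored.
    """
    Xa, ya = X[active_idx], y[active_idx]
    K = rbf_kernel(Xa, Xa, length_scale) + noise_var * np.eye(len(active_idx))
    L = np.linalg.cholesky(K)                              # K = L @ L.T, O(m^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ya))   # K^{-1} y_A, length m
    Ks = rbf_kernel(X_test, Xa, length_scale)              # (n_test, m)
    mean = Ks @ alpha
    W = np.linalg.solve(L, Ks.T)                           # L^{-1} K_sA^T
    var = 1.0 - np.sum(W**2, axis=0)                       # latent variance, k(x, x) = 1
    return mean, var


# Usage: 5000 noisy samples of sin(x), random active set of 200 points.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(5000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(5000)
active = rng.choice(5000, size=200, replace=False)
X_test = np.linspace(-2.5, 2.5, 50)[:, None]
mean, var = fit_predict_sd(X, y, active, X_test)
```

The Cholesky factorization of the m-by-m matrix costs O(m³). This is why capping the active set (for example at 2000 points) keeps the fit tractable when a factorization over all n points would not be.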
