- random: Selects data points randomly for the active set. This is the fastest option because it doesn't involve any optimization or criterion-based selection.
- sgma (sparse greedy matrix approximation): Uses a greedy approach to select points that together give a good low-rank approximation of the kernel matrix. This method is more computationally intensive than random selection but aims to choose a more informative subset.
- entropy: Differential entropy-based selection. Points are added to the active set greedily according to a differential entropy criterion, so the method tries to pick the most informative points. Scoring candidates at every step is computationally expensive, which explains the longer runtime compared to random selection.
- likelihood: Subset of regressors log likelihood-based selection. Points are added greedily according to how much they improve an approximate log likelihood. This method is also computationally intensive, since candidate points have to be scored at each step.
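To see the runtime difference between the four options on your own machine, a rough timing loop like the one below can help. This is only a sketch: the synthetic data, the active set size of 200, and the variable names are assumptions made for illustration, and the timings will depend heavily on the data size and hardware.

```matlab
% Synthetic regression data (illustrative assumption, not the asker's data)
rng(0);                                  % make the random parts reproducible
n = 3000;
X = rand(n, 2);
y = sin(4*X(:,1)) + cos(3*X(:,2)) + 0.1*randn(n, 1);

% The four documented values of 'ActiveSetMethod'
selectionMethods = {'random', 'sgma', 'entropy', 'likelihood'};

for k = 1:numel(selectionMethods)
    tic;
    mdl = fitrgp(X, y, ...
        'FitMethod', 'sd', ...           % subset-of-data approximation
        'ActiveSetSize', 200, ...        % small active set to keep the test quick
        'ActiveSetMethod', selectionMethods{k});
    elapsed = toc;

    % Resubstitution loss (MSE on the training data) as a rough quality check
    fprintf('%-10s  fit time: %6.2f s  resub. loss: %.4f\n', ...
        selectionMethods{k}, elapsed, resubLoss(mdl));
end
```

If the criterion-based methods do not give a noticeably lower loss than random on your data, the extra fitting time may not be worth it.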
ActiveSetMethod: entropy | GPR
Hello,
I was wondering what option to select for ActiveSetMethod when fitting a Gaussian process model. Since I have too much data, I use the subset of data points option ('FitMethod','sd') and -'ActiveSetSize',2000- to select only two thousand points. As far as I understand, fitrgp randomly selects 2000 points from the data set. Some questions arise:
- Does GPR use the other points in the data set (for training)? Where? I saw that the RegressionGP object stores all the data and that some matrices have the size of the full data set (for example matrix W, Alpha,...).
- Besides choosing the points randomly, MATLAB has the option 'ActiveSetMethod' with four possible values: random (default), sgma, entropy, likelihood. Is there any documentation of what each option does specifically? When I choose entropy, fitrgp takes much longer than with random (21 min. vs. less than 5). Why is it so different?
Accepted Answer
Aditya
on 4 Feb 2025 at 4:52
Hi Marius,
When using Gaussian Process Regression (GPR) with a large dataset in MATLAB, you can use the 'FitMethod','sd' option to fit the model on a subset of the data points, known as the active set. This keeps the computational cost manageable by reducing the number of data points used in training. Here's a breakdown of your questions and of the ActiveSetMethod options available.
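As a minimal sketch of this setup, the snippet below fits a model with 'FitMethod','sd' and then inspects the fitted RegressionGP object to see which observations were selected. The synthetic data and the active set size of 500 are assumptions made for illustration. Checking PredictMethod and the length of Alpha is one way to investigate why some stored quantities have the size of the full data set.

```matlab
% Synthetic data (illustrative assumption, not the asker's data set)
rng(0);                                  % reproducible random selection
n = 5000;
X = linspace(0, 10, n)';
y = sin(X) + 0.1*randn(n, 1);

% Fit using the subset-of-data approximation with a 500-point active set
mdl = fitrgp(X, y, ...
    'FitMethod', 'sd', ...
    'ActiveSetSize', 500, ...
    'ActiveSetMethod', 'random');

% Logical indicator (one entry per observation) marking the active set
inActiveSet = mdl.IsActiveSetVector;
fprintf('Observations flagged as active: %d of %d\n', nnz(inActiveSet), n);

% Fitting and prediction methods are stored separately on the model
fprintf('FitMethod: %s, PredictMethod: %s\n', mdl.FitMethod, mdl.PredictMethod);
fprintf('Number of elements in Alpha: %d\n', numel(mdl.Alpha));
```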