GEV Mixture Model (as opposed to GMM)

Andrew Feenan on 19 Jun 2023
Commented: Ayush Kashyap on 22 Jun 2023
Hi,
I'm looking for advice on using a GEV mixture model for clustering as opposed to a Gaussian Mixture Model. I want to compare the results of the two. I'm wondering if it is feasible?
The data set I will be using is approx. 1TB of time series data.

Answers (1)

Ayush Kashyap on 19 Jun 2023
If you have a good understanding of the data's underlying distribution, you can use a Generalized Extreme Value (GEV) mixture model for clustering. The GEV distribution is a continuous probability distribution that is frequently used to model extreme events, such as storm intensities or peak flood levels. As with any mixture model, a GEV mixture model assumes that your data come from a mixture of several subpopulations, each with its own parameters, and attempts to estimate those parameters from the data.
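For reference, the density described above can be written out as follows. This is the standard univariate form; the symbols (K components, mixing weights \pi_k, location \mu_k, scale \sigma_k, shape \xi_k) are notation introduced here and do not come from the thread.

```latex
% Mixture of K univariate GEV components
p(x) = \sum_{k=1}^{K} \pi_k \, f(x;\, \mu_k, \sigma_k, \xi_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% GEV density of a single component, with scale \sigma > 0
f(x;\, \mu, \sigma, \xi) = \frac{1}{\sigma}\, t(x)^{\xi + 1}\, e^{-t(x)},
\qquad
t(x) =
\begin{cases}
  \left(1 + \xi\,\dfrac{x - \mu}{\sigma}\right)^{-1/\xi}, & \xi \neq 0, \\[1.5ex]
  \exp\!\left(-\dfrac{x - \mu}{\sigma}\right), & \xi = 0
\end{cases}

% For \xi \neq 0 the density is zero unless 1 + \xi (x - \mu)/\sigma > 0
```

The support condition in the last line matters in practice: a point outside a component's support has zero density under that component, which any fitting routine has to handle.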
GEV mixture models can capture distribution shapes that Gaussian mixture models handle poorly, particularly in the tails. As a result, if your data have extreme values or long tails, you might find that a GEV mixture model fits the data better than a Gaussian mixture model.
However, you should be aware that working with GEV mixture models can be more time-consuming than working with Gaussian mixture models, particularly with large datasets like the one you describe. A Gaussian component is described by its mean and covariance, which have closed-form updates when the mixture is fitted with the EM algorithm. A GEV component has three parameters (location, scale, and shape) whose maximum-likelihood estimates have no closed form, so each component must be fitted by numerical optimization, which is more complicated and takes longer to compute. To process your data in a reasonable time, you may need computing resources such as GPUs or clusters.
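To make the cost concrete, here is a minimal sketch of the expectation step of an EM fit for a univariate GEV mixture. It is written in Python with only the standard library rather than MATLAB, and it is an illustration under stated assumptions, not a recommended implementation. The names `gev_logpdf` and `e_step` are hypothetical, the mixing weights are assumed to be strictly positive, and the maximization step (a weighted numerical fit of each component's three parameters) is deliberately left out.

```python
import math

def gev_logpdf(x, mu, sigma, xi):
    """Log-density of a GEV(mu, sigma, xi) distribution; -inf outside the support."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit (shape parameter equal to zero)
        return -math.log(sigma) - z - math.exp(-z)
    s = 1.0 + xi * z
    if s <= 0.0:
        # x lies outside the support of this component
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(s) - s ** (-1.0 / xi)

def e_step(data, weights, params):
    """Return responsibilities r[i][k] and the total mixture log-likelihood.

    weights: mixing weights (assumed > 0, summing to 1)
    params:  one (mu, sigma, xi) tuple per component
    """
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + gev_logpdf(x, *p) for w, p in zip(weights, params)]
        m = max(logs)
        if m == -math.inf:
            raise ValueError(f"point {x} is outside every component's support")
        # log-sum-exp keeps the normalization numerically stable
        lse = m + math.log(sum(math.exp(l - m) for l in logs))
        resp.append([math.exp(l - lse) for l in logs])
        loglik += lse
    return resp, loglik
```

Each point is processed independently, so on a very large dataset this step can run chunk by chunk with the log-likelihood accumulated across chunks. In MATLAB, the Statistics and Machine Learning Toolbox function `gevpdf` provides the component density.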
In conclusion, if your data have extreme values or heavy tails, a GEV mixture model can be used for clustering. However, these models can be more computationally demanding than Gaussian mixture models and may require specialized computing resources. By comparing the results of the two models, you can determine which one represents and fits your data better, which will also help you understand the underlying distribution of your data.
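One common way to compare the two fits is an information criterion such as BIC, which penalizes the extra shape parameter that each GEV component carries. The answer does not prescribe BIC; it is an assumption made for this sketch, as are the hypothetical helper names below. The sketch is in Python and covers the univariate case only.

```python
import math

def n_params_gev_mixture(k):
    """Free parameters of a k-component univariate GEV mixture."""
    # location, scale, and shape per component, plus k - 1 free mixing weights
    return 3 * k + (k - 1)

def n_params_gmm_1d(k):
    """Free parameters of a k-component univariate Gaussian mixture."""
    # mean and variance per component, plus k - 1 free mixing weights
    return 2 * k + (k - 1)

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_obs) - 2.0 * loglik
```

Given the maximized log-likelihood of each fitted model on the same `n_obs` observations, the model with the lower `bic` value is preferred. On the MATLAB side, `fitgmdist` fits the Gaussian mixture; the GEV mixture's log-likelihood would have to come from your own fitting routine.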
  2 Comments
Andrew Feenan on 19 Jun 2023
Thank you for your response!
The data does fit the GEV distribution better than the Gaussian distribution. My understanding of clustering is very limited. How would you recommend going about creating a GEV mixture model? I have found it difficult to find resources or research papers touching on the topic, whereas GMM is a very common approach.
I would really appreciate any advice you may have to offer.
Ayush Kashyap on 22 Jun 2023
You may refer to the following documentation for a better understanding of clustering, and of GEV in particular:


Release

R2023a
