Fit a multivariate Gaussian distribution

6 views (last 30 days)
Sumera Zem
Sumera Zem on 1 Mar 2023
Answered: Sai Pavan on 18 Apr 2024
I have a A that I am analyzing using Principal Component Analysis (PCA) to reduce the dimensionality. Once I have reduced the dimensionality, I am attempting to fit a multivariate Gaussian distribution probability density function. Here is the code I used.
A = rand(32, 10); % generate a matrix
[m,n] = size(A);
[U, ~, ~] = svd(1/m * (A)' * A); % singular value decomposition
k = 2; % choose number of principal components
A_p = (U(:,1:k)' * A')';
% Fit Gaussian density function to data
Mu = mean(A_p , 2); % mean
Sigma = cov(A_p'); % covariance
P = @(z) 1/((2*pi)^(n/2)*det(Sigma)^0.5)*exp(-0.5*(z-Mu)'*pinv(Sigma)*(z-Mu));
A_val = rand(32, 1); % new data for evaluation
delta = linspace(min(P(A_val)), max(P(A_val)), 1000);
Unfortunately, I have encountered three problems with my code that I am struggling to resolve. Firstly, I believe that P(A_val) should be a vector, but the code is not generating a vector. Secondly, the determinant of Sigma is becoming zero, resulting in an infinite delta value. Thirdly, I am using the formulation A_p = U(:,1:k)' * A for PCA, but I am encountering an error due to a size mismatch in the matrices. For this reason, change it to A_p = (U(:,1:k)' * A')'.

Answers (1)

Sai Pavan
Sai Pavan on 18 Apr 2024
Hi Sumera,
I understand that you are facing multiple issues while performing Principal Component Analysis (PCA) on the data and then attempting to fit a multivariate Gaussian distribution probability density function on the dimension reduced data.
Please refer to the below workflow to perform PCA on the data:
  • Standardize the data to have a mean of 0 and a standard deviation of 1 as PCA is affected by scale.
  • Compute the covariance matrix of the standardized data.
  • Decompose the covariance matrix into its eigenvalues and eigenvectors. The eigenvectors represent the directions of the maximum variance (principal components), and the eigenvalues represent the magnitude of the variance in the directions of the corresponding eigenvectors.
  • Project the standardized data onto the principal components (eigenvectors) to transform the data into the new space.
  • Select a subset of the principal components that explain most of the variance.
A = rand(32, 10);
% Standardize the data
A_standardized = (A - mean(A)) ./ std(A);
% Compute the covariance matrix
covMatrix = cov(A_standardized);
% Eigenvalue decomposition
[eigVectors, eigValues] = eig(covMatrix);
% Project the data
A_pc = A_standardized * eigVectors;
% Choose the number of components
numComponents = 2;
A_reduced = A_pc(:, 1:numComponents);
% The simpler approach is to use MATLAB's PCA function on standardized data
% [coeff, score, latent] = pca(A_standardized);
% To reduce the data to 2 principal components using pca function
% A_reduced_builtin = score(:, 1:2);
This procedure of performing PCA resolves the issues of determinant of Sigma becoming zero and size mismatch of matrices. The function "P" is designed to evaluate the probability density of a single point "z" at a time. To apply "P" to multiple points and get a vector of densities, we need to iterate over each point in "A_val" or vectorize the operation as shown in the below code snippet:
densities = arrayfun(@(idx) P(A_val(idx, :)), 1:size(A_val, 1));
Please refer to the below documentation to learn more about:
Hope it helps!

Categories

Find more on Dimensionality Reduction and Feature Extraction in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!