Clear Filters
Clear Filters

Fit meta data to a custom equation

28 views (last 30 days)
Yang Hu
Yang Hu on 23 Jul 2024 at 17:55
Answered: Star Strider on 24 Jul 2024 at 19:02
Hello everyone,
I have been having trouble fitting a set of meta data (attached called IndiMSD) to this custom equation, doing fit to a custom equation and output parameters of the equation.
The first row of data points of x and y is zero, it will output Inf of NaN when do the fit. So I disgard all in x and y matrix.
now the x and y are below:
x_input = readmatrix('IndiMSD.xlsx');
x = x_input(2:end,:); % first zero rows are disgarded, now x is 96 by 112, instead of 97 by 112
y = 15:15:1440;% first zero data point is disgarded, now y is 96 by 1
I want to fit each paif of column of x with y as above. For example, x(:,1) will be fitted with y, output its own S, P value with goodness of fit; , x(:,2) fits with y, output its own S, P value with goodness of fit, and so on until the last column. All fitted results of S, P and goodness of fit will be stored in separate matrice.
The fitted equation is
y = 2*((S^2)*(P^2)) * ((x./P) - 1 + exp(-(x./P)));
I really appreciate it if anyone has some ideas of how to do it.

Answers (2)

Milan Bansal
Milan Bansal on 23 Jul 2024 at 19:17
Edited: Milan Bansal on 23 Jul 2024 at 19:19
Hi Yang Hu,
As per my understanding you wish to fit a each column of x with given y and determine the S and P parameters along with goodness of fit for each column in the attached table. Here, I am assuming the metrics for goodness of fit is R^2 value.
You can use lsqcurvefit function for this task. It is based on least square method. It adjusts the parameters of a given model to best fit a set of data points by minimizing the sum of the squares of the residuals. Please refer to the documentation to learn more about lsqcurvefit function : https://www.mathworks.com/help/optim/ug/lsqcurvefit.html
Please refer to the steps in following code snippet to learn how lsqcurvefit can be used:
% Load the data
x_input = readmatrix('IndiMSD.xlsx');
x = x_input(2:end, :); % Discard the first row
y = (15:15:1440)'; % Make sure y is a column vector
% Preallocate
numCols = size(x, 2);
S_values = zeros(1, numCols); % S parameter
P_values = zeros(1, numCols); % P parameter
gof_values = zeros(1, numCols); % goodness of fit
% Custom equation as a function handle
customEquation = @(params, x) 2 * ((params(1)^2) * (params(2)^2)) * ((x./params(2)) - 1 + exp(-(x./params(2))));
% Define the options for lsqcurvefit
options = optimset('Display', 'off');
% Loop through each column of x
for col = 1:numCols
x_col = x(:, col);
% Initial guess for parameters [S, P], please modify this as per your
% requirement
initialGuess = [1, 1];
% Perform the curve fitting
[fittedParams, ~, ~, exitflag, output] = lsqcurvefit(customEquation, initialGuess, x_col, y, [], [], options);
% Store the fitted parameters
S_values(col) = fittedParams(1);
P_values(col) = fittedParams(2);
% Calculate goodness of fit (e.g., R-squared)
y_fit = customEquation(fittedParams, x_col);
SS_res = sum((y - y_fit).^2);
SS_tot = sum((y - mean(y)).^2);
gof_values(col) = 1 - (SS_res / SS_tot);
end
% Display Results
% disp('Fitted S:'); disp(S_values);
% disp('Fitted P:'); disp(P_values);
% disp('Goodness of fit (R-squared):'); disp(gof_values);
Hope this helps!
  1 Comment
Yang Hu
Yang Hu on 24 Jul 2024 at 3:22
I found the fitted results are depend on the intial guess, do you know if there is another way to get rid of it?

Sign in to comment.


Star Strider
Star Strider on 24 Jul 2024 at 19:02
@Yang Hu — Your regression equation essentially produces a linear fit to your data. I ran selected columns through the genetic algorithm funciton and got reasonable approsimations, however only lilnear, not matching the data.
The results for specific columns:
Column S P
______ _________ _______
1 0.092765 2.8623
11 0.19424 2.5388
21 0.0067009 605.66
31 0.069583 12.167
41 0.011039 376.5
51 0.11991 2.3603
61 0.0039749 729.5
71 0.73355 0.16846
81 0.1022 3.4403
91 0.0041122 491.22
101 0.09321 2.8053
111 0.055063 12.485
I initially used a two-point approximation using the Symbolic Math Toolbox to see what the matgnitudes of the parametters should be. That actually produces a decent fit to the data, and gave me some idea of what the results should be. I could not get a good fit using the complete data vectors.
x_input = readmatrix('IndiMSD.xlsx');
x = x_input(2:end,:); % first zero rows are disgarded, now x is 96 by 112, instead of 97 by 112
y = 15:15:1440;% first zero data point is disgarded, now y is 96 by 1
Szx = size(x)
Szx = 1x2
96 112
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
figure
semilogx(x, y)
grid
xlabel('x')
ylabel('y')
xv = x(:,50);
yv = y;
xi = [0.25 1.0]*1E+4
xi = 1x2
2500 10000
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
yi = interp1(xv, yv, xi)
yi = 1x2
344.3111 742.2230
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
syms S P x y
yf(x) = 2*((S^2)*(P^2)) * ((x./P) - 1 + exp(-(x./P)));
[S,P] = solve(yf(yi) == xi, [S,P])
Warning: Unable to solve symbolically. Returning a numeric solution using vpasolve.
S = 
P = 
788.25581191190670442399341118728
yfcn = @(S,P,x) 2*((S^2)*(P^2)) * ((x./P) - 1 + exp(-(x./P)));
figure
plot(yv, xv)
hold on
plot(yv, yfcn(double(S), double(P), yv))
hold off
grid
xlabel('y')
ylabel('x')
return
NOTE — There is something wrong with your ‘y’ function. I reversed the ‘x’ and ‘y’ arguments to the fitness funciton ‘ftns’ because it otherwise does not work at all to approximate tthe data.
The MATLAB Online code —
clear all
close all
file = websave('IndiMSD.xlsx','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1739776/IndiMSD.xlsx')
x_input = readmatrix(file);
% x_input = readmatrix('IndiMSD.xlsx');
x = x_input(2:end,:); % first zero rows are disgarded, now x is 96 by 112, instead of 97 by 112
y = 15:15:1440;% first zero data point is disgarded, now y is 96 by 1
yfcn = @(S,P,x) 2*((S^2)*(P^2)) * ((x./P) - 1 + exp(-(x./P)));
ftns = @(b,x,y) norm(x - yfcn(b(1),b(2),y));
PopSz = 100;
Parms = 2;
optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[-1E-6 1E-1], 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems
kv = [];
k_iter = 1:10:size(x,2);
kv = k_iter;
B = zeros(2,numel(k_iter));
for k = 1:numel(k_iter)
disp(["X Column = "+k])
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
[theta,fval,exitflag,output,population,scores] = ga(@(b)ftns(b,y,x(:,k)), Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);
t1 = clock;
B(:,k) = theta(:)
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
GA_Time = etime(t1,t0)
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
figure
plot(x(:,k), y, '.', 'DisplayName','Data')
hold on
plot(x(:,k), yfcn(theta(1),theta(2),x(:,k)), '-r', 'DisplayName','Regression')
hold off
grid
xlabel('X')
ylabel('Y')
title("Column "+k)
legend('Location','best')
end
B
Result = table(k_iter(:), B(1,:).', B(2,:).', 'VariableNames',{'Column','S','P'})
The offline code —
x_input = readmatrix('IndiMSD.xlsx');
x = x_input(2:end,:); % first zero rows are disgarded, now x is 96 by 112, instead of 97 by 112
y = 15:15:1440;% first zero data point is disgarded, now y is 96 by 1
yfcn = @(S,P,x) 2*((S^2)*(P^2)) * ((x./P) - 1 + exp(-(x./P)));
ftns = @(b,x,y) norm(x - yfcn(b(1),b(2),y));
PopSz = 100;
Parms = 2;
optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms).*[-1E-6 1E-1], 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems
kv = [];
k_iter = 1:10:size(x,2);
kv = k_iter;
B = zeros(2,numel(k_iter));
for k = 1:numel(k_iter)
disp(["X Column = "+k])
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
[theta,fval,exitflag,output,population,scores] = ga(@(b)ftns(b,y,x(:,k)), Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);
t1 = clock;
B(:,k) = theta(:)
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
GA_Time = etime(t1,t0)
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
figure
plot(x(:,k), y, '.', 'DisplayName','Data')
hold on
plot(x(:,k), yfcn(theta(1),theta(2),x(:,k)), '-r', 'DisplayName','Regression')
hold off
grid
xlabel('X')
ylabel('Y')
title("Column "+k)
legend('Location','best')
end
B
Result = table(k_iter(:), B(1,:).', B(2,:).', 'VariableNames',{'Column','S','P'})
You will likely want to loop through the entire data set (I used every tenth column here, for efficiency) once you correct the problem with your function.
.

Categories

Find more on Descriptive Statistics in Help Center and File Exchange

Products


Release

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!