Is there any option to run a polyfit on a scatter plot?

Question

Wolfgang McCormack on 28 May 2021

https://nl.mathworks.com/matlabcentral/answers/842620-is-there-any-option-to-run-a-polyfit-on-a-scatter-plot

Answered: Image Analyst on 29 May 2021

Hi all,

I have a scatter plot with some points on it. Is there any option to get the X and Y values of those points from the scatter plot? Furthermore, is there any option to run polyfit on those points directly on the scatter plot?

Thanks

8 Comments

Wolfgang McCormack on 29 May 2021

@Star Strider Yes, there are 10k points, but the variable y takes only 6 distinct values. For example, there are 3000 values of 4.36, 2000 of 6.54, and so on. The same thing happens in variable x too. Are they duplicates? Not actually, because the data is hourly, but in another sense you can say they are the same in many hours. There is no third dimension in my case, but thanks for pointing that out. It'll def help me in future. :D I guess I can write a book based on all my questions so far on MATLAB, naming it MATLAB for junior researchers :D Thank you all

Star Strider on 29 May 2021

@Wolfgang McCormack — I am doing my best to understand what the data are in the absence of the data themselves.

So they are actually something like this, then —

N = 25;
x = rand(1,N);
y = repmat(randn(6,1),1,N);
figure
scatter(x, y, 'filled')
grid

or this —

figure
scatter(x, y(randi(6,1,N)), 'filled')
grid

?

.


Answer 1

Cris LaPierre on 29 May 2021

https://nl.mathworks.com/matlabcentral/answers/842620-is-there-any-option-to-run-a-polyfit-on-a-scatter-plot#answer_712080

polyfit is not going to return the X and Y values of those 6 points. It returns the polynomial coefficients of the equation that best fits the data. You then evaluate that equation at whatever X values you want to obtain the corresponding Y values. Using those, you can plot the fit line.
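A minimal sketch of that workflow, on made-up data because the real x and y were never posted. The variable names and the straight-line model are assumptions for illustration. It reads the coordinates back from the scatter object through its XData and YData properties, fits with polyfit, and overlays the fit using polyval:

```matlab
% Synthetic stand-in data (assumption: the original data were not shared).
rng(0);                               % reproducible random numbers
x = linspace(0, 10, 50);
y = 2*x + 1 + randn(size(x));

figure
h = scatter(x, y, 'filled');          % keep the handle to the scatter object
grid on

% Read the plotted coordinates back from the scatter object.
xs = h.XData;
ys = h.YData;

% Fit a first-order polynomial; p holds the coefficients, highest power first.
p = polyfit(xs, ys, 1);

% Evaluate the fit on a smooth grid and overlay it on the scatter plot.
xFit = linspace(min(xs), max(xs), 100);
yFit = polyval(p, xFit);
hold on
plot(xFit, yFit, 'r-', 'LineWidth', 2)
hold off
```

If the handle was not saved when the plot was created, findobj(gca, 'Type', 'scatter') should return the scatter objects in the current axes.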

If you need the X and Y of the 6 groups, I'd suggest using something like kmeans clustering to identify 6 clusters and return their centroids first.

[idx,C] = kmeans(___)

Use the centroids as inputs to polyfit.
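One way this could look, as a sketch only. The data are synthetic, the six levels and the quadratic degree are assumptions, and kmeans requires the Statistics and Machine Learning Toolbox:

```matlab
% Synthetic stand-in data: six tight groups along a curve (assumption).
rng(1);
levels = (1:6)';                        % hypothetical x level of each group
grp = randi(6, 600, 1);                 % group membership of each point
x = levels(grp) + 0.05*randn(600, 1);
y = 0.5*x.^2 + 0.1*randn(600, 1);

% Cluster the (x, y) points into 6 groups.
% C is 6-by-2: column 1 holds centroid x values, column 2 centroid y values.
k = 6;
[idx, C] = kmeans([x y], k, 'Replicates', 5);

% Sort the centroids by x, then fit a polynomial through them.
C = sortrows(C, 1);
p = polyfit(C(:,1), C(:,2), 2);         % quadratic chosen for illustration
```

Each centroid counts once in this fit, no matter how many points its cluster contains.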

Calculating R^2 and RMSE is fairly simple. You just need to do some math to calculate SSR and SST. See this answer.
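A sketch of that calculation on made-up data. It assumes SSR means the sum of squared residuals and SST the total sum of squares about the mean; the linked answer may define them differently:

```matlab
% Synthetic stand-in data (assumption: the original data were not shared).
rng(2);
x = linspace(0, 5, 40);
y = 3*x - 2 + 0.5*randn(size(x));

% Fit a line and evaluate it at the original x values.
p = polyfit(x, y, 1);
yHat = polyval(p, x);

% Sums of squares.
SSR = sum((y - yHat).^2);          % sum of squared residuals (assumed meaning)
SST = sum((y - mean(y)).^2);       % total sum of squares about the mean

% Coefficient of determination and root-mean-square error.
R2 = 1 - SSR/SST;
RMSE = sqrt(mean((y - yHat).^2));
```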

2 Comments

Wolfgang McCormack on 29 May 2021

@Cris LaPierre Thank you Cris, this will def help me in future. However, in my current case each variable takes only 6 distinct values. For example, there are 3000 values of 3.46 in X, 5000 of 4.6, and so on.

Cris LaPierre on 29 May 2021

Share your data. It will be easier than trying to guess what is going on. Save your variables to a mat file and attach them to your post using the paperclip icon.

Answer 2

Image Analyst on 29 May 2021

https://nl.mathworks.com/matlabcentral/answers/842620-is-there-any-option-to-run-a-polyfit-on-a-scatter-plot#answer_712185

Cris suggests a nice trick. Also attach your data, as he asks in his comment above.

save('answers.mat', 'DataA', 'DataB');

Use the paperclip icon.

Another trick I've used when you have quantized data (multiple points with the same x value) is to add a very slight amount of noise to the x data. Add enough noise to make the x values unique and avoid any trouble with repeated values, but not enough to noticeably change the formula polyfit will find:

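% Determine range of data.
minx = min(x);
maxx = max(x);
% Add a fraction of a percent of random noise to x to make the values unique.
xNoisy = x + 0.00001 * (maxx - minx) * rand(size(x));
% Determine the formula with the noisy x instead of the actual x.
% Below we will use a second order polynomial.
coefficients = polyfit(xNoisy, y, 2); % Fit a quadratic.
% Get estimated y from arbitrary x (thisX is whatever x value or vector you choose).
% polyfit returns coefficients in descending powers, so coefficients(1) multiplies x^2.
estimatedY = coefficients(1) * thisX .^2 + coefficients(2) * thisX + coefficients(3);
% Equivalent: estimatedY = polyval(coefficients, thisX);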

Note that this will give a different formula than Cris's. This approach takes into account how many points are in each cluster, so clusters with more points influence the fit more. Cris's approach uses the centroids of the clusters, which ignores how many points each cluster contains. If you have about the same number of points in each cluster, it won't make much of a difference, but if some clusters have wildly different numbers of points than others, it could make a noticeable difference.
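The difference can be seen in a small sketch on synthetic data. The levels and counts are hypothetical and deliberately unbalanced, and group means per distinct x value stand in for the cluster centroids:

```matlab
% Quantized x with very unequal group sizes (hypothetical numbers).
rng(3);
xLevels = [1; 2; 3; 4];
counts  = [3000; 50; 50; 50];
x = repelem(xLevels, counts);          % each level repeated counts(i) times
y = x.^2 + randn(size(x));             % curved trend, so a line cannot fit all groups

% Fit 1: all points. Large groups dominate the least-squares fit.
pAll = polyfit(x, y, 1);

% Fit 2: one point per group (the group mean). Every group counts equally.
[g, xGroup] = findgroups(x);
yGroup = splitapply(@mean, y, g);
pCentroid = polyfit(xGroup, yGroup, 1);

disp(pAll)        % slope and intercept from all points
disp(pCentroid)   % slope and intercept from the group means
```

With balanced counts the two sets of coefficients would come out nearly the same. Here the large group at x = 1 pulls the all-points fit toward itself.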
