# Error when calculating the correlation coefficient of two data sets using "reduce_data_points"

Wesley on 10 Mar 2022
Commented: Walter Roberson on 10 Mar 2022
When I use "reduce_data_points" to calculate the correlation coefficient of two sets of data, I get the following error:

```
Index exceeds the number of array elements (147).
Error in reduce_data_points (line 63)
varargout{i} = data{i}(save);
```

Thank you in advance for any corrections.
```matlab
%%%%%%% Import the data for the two curves %%%%%%%
X = m1(:,1);
Y = m1(:,2);
x = A(:,1);
y = A(:,2);
a = size(X);
b = size(x);
c = min(a,b);
d = c(:,1);

%%%%%%% Reduce the number of data points %%%%%%%
[x_new,X_new] = reduce_data_points(x,X,d);
index1 = find(X_new); % or 'first'
Y_new = Y(index1);
index2 = find(x_new); % or 'first'
y_new = y(index2);
r = corrcoef(y_new,Y_new);
```
Wesley on 10 Mar 2022
Edited: Walter Roberson on 10 Mar 2022
```matlab
%==========================================================================
%
% reduce_data_points  Reduces the number of data points in a data set to a
% specified number.
%
%   [x1_new,...,xn_new] = reduce_data_points(x1,...,xn,N)
%
% Last Update: 2021-08-28
% Contact: tamas.a.kis@outlook.com
%
%--------------------------------------------------------------------------
%
% ------
% INPUT:
% ------
%   x1,...,xn   - original data set with N0 data points
%                 --> each vector is a 1×N0 or N0×1 double
%   N           - (1×1 double) desired number of data points
%
% -------
% OUTPUT:
% -------
%   x1_new,...,xn_new - updated data set with N data points
%                       --> each vector is a 1×N or N×1 double
%
% -----
% NOTE:
% -----
%   --> n = dimension of a data point
%         • For example, if a single data point is represented by an
%           ordered triple (x1,x2,x3), then n = 3.
%   --> N0 = original number of data points
%   --> N = (desired) new number of data points
%   --> Sometimes, the function will not be able to return exactly N points
%       (due to rounding issues).
%   --> The main purpose of this function is to reduce the size of a data
%       set when not all the points are needed. For example, plotting
%       y = x^2 with 100 points rather than with 1000 points will (to the
%       naked eye) be visually identical, but will be a lot faster for the
%       computer to perform.
%
%==========================================================================
function varargout = reduce_data_points(varargin)

    % extracts data set and number of data points to save
    data = varargin(1:(end-1));
    N = varargin{end};

    % determines original number of data points
    N0 = length(data{1});

    % determines indices of data points to save
    save_ratio = round(N0/N);
    save = 1:save_ratio:N0;

    % preallocates output argument
    varargout = cell(1,nargin-1);

    % shrinks data set
    for i = 1:(nargin-1)
        varargout{i} = data{i}(save);
    end

end
```

Walter Roberson on 10 Mar 2022
```matlab
[x_new,X_new] = reduce_data_points(x,X,d);
```
Your x and X are different sizes. The first input needs to be no longer than the second: the indices to keep are generated from the length of the first input, so any later input that is shorter gets indexed past its end.
The reduce_data_points function does not apply the requested number of points (the last parameter) independently to each input: it assumes that all parameters other than the last one are the same size.
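To see why the length of the first input matters, the sketch below repeats the index computation from reduce_data_points with the sizes mentioned in this thread (about 857 points in x, 147 in X, and d = 147). The vectors are random stand-ins, not the actual .mat contents, and the name idx is used instead of save only to avoid shadowing MATLAB's built-in save function.

```matlab
% Hypothetical stand-ins with the sizes from the thread
x = rand(857, 1);   % longer first input
X = rand(147, 1);   % shorter second input
d = 147;            % requested number of points

% Same index computation as reduce_data_points
N0 = length(x);             % indices are based on the FIRST input only
save_ratio = round(N0/d);   % round(5.83...) = 6
idx = 1:save_ratio:N0;      % 1, 7, 13, ..., 853

fprintf('save_ratio = %d, %d indices, largest = %d\n', ...
        save_ratio, numel(idx), max(idx));

x_new = x(idx);             % fine: x has 857 elements

% The same indices overrun X, which has only 147 elements
failed = false;
try
    X_new = X(idx);
catch err
    failed = true;
    disp(err.message)       % index-out-of-bounds error for X
end
```

This reproduces the failure in the question: 143 indices are generated, the largest of which is 853, while X has only 147 elements.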
Walter Roberson on 10 Mar 2022
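That reduction code will not reduce the first input at all unless the number of requested points is at most 2/3 of the number of points in the first input, and when it does reduce, it will return the wrong number of points unless the request is quite close to an integer fraction of the original (1/2, 1/3, and so on). The reason is that the step between kept indices is round(N0/N): any ratio below 1.5 rounds to 1, so every point is kept, and in general the output has about N0/round(N0/N) points rather than N.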
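A small check of both claims, using the same round(N0/N) step as the function on an arbitrary example with N0 = 100 points (the requested counts are illustrative only):

```matlab
% Number of points reduce_data_points would return for the first input,
% using the same index computation; N0 = 100 is an arbitrary example.
N0 = 100;
requested = [80 67 66 50 40 34 25];
returned  = zeros(size(requested));

for k = 1:numel(requested)
    save_ratio  = round(N0 / requested(k));   % step between kept indices
    returned(k) = numel(1:save_ratio:N0);     % points actually kept
end

disp([requested; returned])
```

With these values, requesting 80 or 67 points (more than 2/3 of 100) returns all 100, requesting 66 or 50 returns 50, requesting 40 or 34 returns 34, and requesting 25 returns exactly 25.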
The code seems to be written assuming that corresponding points are selected from all of the inputs. It is not intended for the case where the inputs are different sizes; it is intended more for (x,y) or (x,y,z) pairs that are to be thinned down to corresponding locations.
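For the intended case, here is a minimal sketch of thinning equal-length (x,y) pairs, using the y = x^2 example from the function's help text. The index computation is inlined so the snippet runs without the function file; because one index vector is applied to both inputs, every retained pair is still a point on the original curve.

```matlab
% Equal-length (x, y) pairs: y = x^2 sampled at 1000 points
x = linspace(0, 1, 1000);
y = x.^2;

N  = 100;                 % desired number of points
N0 = numel(x);            % both inputs have N0 points
idx = 1:round(N0/N):N0;   % one index vector shared by every input

x_new = x(idx);           % 100 points: 1, 11, 21, ..., 991
y_new = y(idx);

% Each kept pair still lies exactly on the original curve
max(abs(y_new - x_new.^2))
```

If reduce_data_points.m is on the path, `[x_new, y_new] = reduce_data_points(x, y, 100)` should give the same vectors here, since 1000/100 divides evenly.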
If your x and your X are intended to hold corresponding values, then something is quite wrong with your data: your .mat file has about 857 entries for one of them and about 147 entries for the other, so it is not clear how they could match 1:1.
If, however, what you want is to extract the same number of samples from each of the two variables, more or less arbitrarily, then there are better ways of handling that. For example, evenly spaced indices chosen separately for each variable:
```matlab
x_new = x(round(linspace(1, numel(x), d)));
X_new = X(round(linspace(1, numel(X), d)));
```
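Applied to the original question, a sketch might look like the following. The curves are hypothetical stand-ins (a sine sampled at 857 and at 147 points over the same range) for the columns of m1 and A. The key assumption is that both data sets cover the same span with roughly uniform spacing, because the k-th kept sample of one curve is paired with the k-th kept sample of the other. Within each curve, the same index vector is used for the x and y columns so the pairs stay aligned.

```matlab
% Hypothetical stand-ins for m1 and A, both covering the same x range
t1 = linspace(0, 2*pi, 857)';
t2 = linspace(0, 2*pi, 147)';
X = t1;  Y = sin(t1);          % columns of m1 (857 points)
x = t2;  y = sin(t2);          % columns of A  (147 points)

d = min(numel(x), numel(X));   % common length, here 147

% Evenly spaced indices, computed separately for each curve
idx1 = round(linspace(1, numel(x), d));
idx2 = round(linspace(1, numel(X), d));

% Apply each curve's indices to both of its columns
x_new = x(idx1);  y_new = y(idx1);
X_new = X(idx2);  Y_new = Y(idx2);

R = corrcoef(y_new, Y_new);    % 2x2 correlation matrix
r = R(1, 2)                    % correlation between the two curves
```

If the sampling is not uniform or the two curves cover different ranges, interpolating both onto a common grid (for example with interp1) would be a more robust alternative to index-based thinning.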