Inserting blank columns at specific points

So I have a set of data that I wish to take the means of. The first set worked fine for me as there were 22 values, where there were 11 items with 2 repeats. The issue I am now having is that my second set has 29 values, where there are 10 items, where 9 items have 3 repeats and a annoying 1 item has 2 repeats.
for j = 3:16
for i=1:21
MeanR{j,i} = mean(Numbers(j,i:(i+1)));
end
end
MeanR(:,2:2:end)=[];
I am not sure if there was an easier way, but I wanted mean values of 1&2, 3&4 etc, but this takes means values of 1&2, 2&3 etc so I just deleted every second item. But this way will not work for my second set. I tried to insert an empty column so I can use the same method, but I cannot seem to do so. Any ideas?
I have attached the file of data that I am using. I use xls read to extract the data into numbers and text, then from that work out means

3 Comments

What is it you want to do? Do you want to take the mean of unique values? And if you want to calculate the mean of pairs of consecutive numbers, it is probably easier to add them and divide by two.
Could you attach a screenshot/file or at least a description of how your data is organised?
  • Is your data stored in a column vector?
  • Or in a tabular way i.e. rows = 'items' - columns = no of repeats?
Lastly, how do you get your data into Matlab? Do you import it as a matrix? Table?
It will be easier for the community to address your problem if you provide the above details.
I have attached the data that I am using. I import it using xls read to get Numbers and Text. Then from that Calculate means. I want to calculate the means of every 2 consecutive numbers, for the first case, and then every 3 consecutive numbers for the second case. But to do that for the second case I need to insert a blank column. I suppose i could go into the xls file and do it manually, but is there a way in matlab to do it?

Sign in to comment.

 Accepted Answer

% import data to Matlab as a Table
T_data = readtable('Data_of_interest.csv');
% get rid of meaningless columns/rows
T_data(1,:) = [];
T_data(:,53:end) = [];
% 1st column seems to be a description so use it as such
T_data.Properties.RowNames = T_data.(1);
T_data.(1) = [];
% Split the data into
% - SET1: first 22 columns
% - SET2: the rest
T_set1 = T_data(:,1:22);
T_set2 = T_data(:,23:end);
% Let's focus now on the 'T_set2' as it has irregularl
% number of the repeats
% One way to deal with the problem is to count
% the uniqe tests and their repeats.
% Your column names have a fixed format'.
% Make sure you only keep the relevant bit of the name, that is:
% 'x201X_XX_XX' = 11 CHARS
% If you loose the suffix, you'll be able to determine:
% 1) how many unique names you have
% 2) what is the number of their repeats
% 3) their location within table
% get the colum names
columNames = T_set2.Properties.VariableNames;
% transpose them for easier human readibility
columNames = columNames'
% remove the suffix
for i=1:length(columNames)
% before truncation
columNames(i)
columNames{i}(11+1:end) = [];
%after truncation
columNames(i)
end
% find unique names (logical vector)
uniqueNameCounter = false(length(columNames),1);
uniqueNameCounter(1) = true;
for i=2:length(columNames)
if columNames{i}==columNames{i-1}
uniqueNameCounter(i) = false;
else
uniqueNameCounter(i) = true;
end
end
% now use this logical vector to find the indexes
% of your uniqe tests
allindexes = 1:size(T_set2,2);
uniqueIndexes = allindexes(uniqueNameCounter);
% now use 'uniqueIndexes' to get the number
% of the repeats per test
noOfrepeats = uniqueIndexes(2:end)-uniqueIndexes(1:end-1)
% notice that you have 10 unique names but 9 'noOfRepeats'
% 10th number of the repeats is produced in a following way:
noOfrepeats(end+1) = size(T_set2,2) - uniqueIndexes(end)+1
% Now that you know where your uniqe names are
% And the number of the repeats per unique name
% You can create a code that accesses these data
% And applies whatever calculation you need
% i.e. Add MEAN
% preallocate memory for your calcualtion
tempMat = zeros(size(T_data,1),length(uniqueIndexes));
% calculate MEAN
for i = 1:length(uniqueIndexes)
% show selected fragment of the table
T_set2(:,uniqueIndexes(i):uniqueIndexes(i)+noOfrepeats(i)-1)
% make room for new data
aveVal = mean(T_set2{:,uniqueIndexes(i):uniqueIndexes(i)+noOfrepeats(i)-1},2);
tempMat(:,i) = aveVal
end
% Store MEAN results to a table
T_AveVal = array2table(tempMat);
% new column names
newNames = cell(1, length(uniqueIndexes));
for i=1:length(uniqueIndexes)
newNames{i}= [columNames{uniqueIndexes(i)} '_Mean']
end
T_AveVal.Properties.VariableNames = newNames;
% Display MEAN
T_AveVal

4 Comments

I mean, I'm new to matlab and this site so I'm not sure if your response is a common one, but I'm amazed at the amount of work you put into your response. Thank you.
So, that calculation where you've said we can calculate whatever we need, it would work for standard deviation simply by replacing mean with std right?
Note:
for i=1:length(columNames)
% before truncation
columNames(i)
columNames{i}(11+1:end) = [];
%after truncation
columNames(i)
end
uniqueNameCounter = false(length(columNames),1);
uniqueNameCounter(1) = true;
for i=2:length(columNames)
if columNames{i}==columNames{i-1}
uniqueNameCounter(i) = false;
else
uniqueNameCounter(i) = true;
end
end
can be simplified to the single line:
uniqueNameCounter = [true, ...
strncmp(columNames(2:end), columNames(1:end-1), 12)]
This compares the first 12 characters of the column names only.
Hi,
I was just wondering if you could help me again. I have been plotting the data, and for the first 22 values it works fine as there are 2 repeats, but like the issue with calculating means for the second 29 values, I am having the same issue with plotting. I have managed to do it one way, where i basically plot the values all separately, but its a pretty long loop and I'm not even sure it works properly. Any ideas?
  • How are you plotting the data? What type of plot are do you use/want to use?
  • Could you submit your code so far?

Sign in to comment.

More Answers (2)

What about:
[s1, s2] = size(Numbers);
MeanR = squeeze(mean(reshape(Numbers, s1, 2, s2/2), 2));
But this works only, if s2 is even. At least, no loop is required.
Your loop could be modified:
for j = 3:16
for i = 1:2:21 % Instead of 1:21
MeanR{j, (i+1)/2} = mean(Numbers(j, i:(i+1)));
end
end
How do you determine "repeats"? Maybe splitapply is much easier for your purpose.

3 Comments

Repeats are just predetermined. I have attached the file that I am using. The first 22 dates are my first set, and the next lot of dates are my second set. Repeats are just ones taken on the same day.
The code you just proposed works, but something like that won't work for my second set as like i said, there is an exception where there are 2 repeats instead of 3. Which is why I wanted to add an empty column in of suppose 0s so that it will not affect the means. Does that make sense?
Zeroes will affect the mean, so you should use NaN values to pad the array if you're going to, and use 'omitnan' in your call to mean
@Jacob: I asked for how the "repeats" are recognized, because this would allow to post a matching suggestion with splitapply or accumarray. But I can invent an example:
x = rand(1, 20) > 0.6;
group = 1 + cumsum(x); % Perhaps [1,1,2,3,3,3,4,5,5,5,...]
data = rand(1, 20); % Any data
meanData = splitapply(@mean, data, group)
Now meanData contains the mean value over all elements of data belonging to the same value in group. This considers the number of elements automatically and this is cleaner than inserting dummy data only to ignore it later.

Sign in to comment.

Pawel Jastrzebski
Pawel Jastrzebski on 23 Feb 2018
Edited: Pawel Jastrzebski on 23 Feb 2018
  • In general, if you post a specific question and provide all the information, the community is more likely to help you out.
  • In regards to my code, I consider myself a beginner as well and there's probably plenty of way it can be improved. Jan Simon's suggestion is one for starters.
  • I think that if you swap 'mean' calculation for the 'standard deviation' - the code will still work. However, from the statistical point of view - is getting a std.dev from 2 or 3 data points is going to provide a relevant measure for you?
  • Lastly, this code only works, because you have a unique date for each test. If you were to test multiple tests on the same day with multiple repeats you would either have to think of different file naming convention or tweak the bit of the code that counts the unique names.

3 Comments

I need to calculate the standard deviation so that I can calculate the standard error of the means. I tried replacing mean with std but it gives me an error
Check out the documentation for MEAN and STD - for the STD you need to specify 3 inputs:
So the code will be more like this:
stdval = std(T_set2{:,uniqueIndexes(i):uniqueIndexes(i)+noOfrepeats(i)-1},0,2);
Or:
meanData = splitapply(@std, data, group)

Sign in to comment.

Categories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!