Inserting blank columns at specific points
7 views (last 30 days)
Show older comments
Jacob Barrett-Newton
on 22 Feb 2018
Commented: Pawel Jastrzebski
on 12 Mar 2018
So I have a set of data that I wish to take the means of. The first set worked fine for me as there were 22 values, where there were 11 items with 2 repeats. The issue I am now having is that my second set has 29 values, where there are 10 items, where 9 items have 3 repeats and a annoying 1 item has 2 repeats.
for j = 3:16
for i=1:21
MeanR{j,i} = mean(Numbers(j,i:(i+1)));
end
end
MeanR(:,2:2:end)=[];
I am not sure if there was an easier way, but I wanted mean values of 1&2, 3&4 etc, but this takes means values of 1&2, 2&3 etc so I just deleted every second item. But this way will not work for my second set. I tried to insert an empty column so I can use the same method, but I cannot seem to do so. Any ideas?
I have attached the file of data that I am using. I use xls read to extract the data into numbers and text, then from that work out means
3 Comments
Pawel Jastrzebski
on 22 Feb 2018
Could you attach a screenshot/file or at least a description of how your data is organised?
- Is your data stored in a column vector?
- Or in a tabular way i.e. rows = 'items' - columns = no of repeats?
Lastly, how do you get your data into Matlab? Do you import it as a matrix? Table?
It will be easier for the community to address your problem if you provide the above details.
Accepted Answer
Pawel Jastrzebski
on 22 Feb 2018
% import data to Matlab as a Table
T_data = readtable('Data_of_interest.csv');
% get rid of meaningless columns/rows
T_data(1,:) = [];
T_data(:,53:end) = [];
% 1st column seems to be a description so use it as such
T_data.Properties.RowNames = T_data.(1);
T_data.(1) = [];
% Split the data into
% - SET1: first 22 columns
% - SET2: the rest
T_set1 = T_data(:,1:22);
T_set2 = T_data(:,23:end);
% Let's focus now on the 'T_set2' as it has irregularl
% number of the repeats
% One way to deal with the problem is to count
% the uniqe tests and their repeats.
% Your column names have a fixed format'.
% Make sure you only keep the relevant bit of the name, that is:
% 'x201X_XX_XX' = 11 CHARS
% If you loose the suffix, you'll be able to determine:
% 1) how many unique names you have
% 2) what is the number of their repeats
% 3) their location within table
% get the colum names
columNames = T_set2.Properties.VariableNames;
% transpose them for easier human readibility
columNames = columNames'
% remove the suffix
for i=1:length(columNames)
% before truncation
columNames(i)
columNames{i}(11+1:end) = [];
%after truncation
columNames(i)
end
% find unique names (logical vector)
uniqueNameCounter = false(length(columNames),1);
uniqueNameCounter(1) = true;
for i=2:length(columNames)
if columNames{i}==columNames{i-1}
uniqueNameCounter(i) = false;
else
uniqueNameCounter(i) = true;
end
end
% now use this logical vector to find the indexes
% of your uniqe tests
allindexes = 1:size(T_set2,2);
uniqueIndexes = allindexes(uniqueNameCounter);
% now use 'uniqueIndexes' to get the number
% of the repeats per test
noOfrepeats = uniqueIndexes(2:end)-uniqueIndexes(1:end-1)
% notice that you have 10 unique names but 9 'noOfRepeats'
% 10th number of the repeats is produced in a following way:
noOfrepeats(end+1) = size(T_set2,2) - uniqueIndexes(end)+1
% Now that you know where your uniqe names are
% And the number of the repeats per unique name
% You can create a code that accesses these data
% And applies whatever calculation you need
% i.e. Add MEAN
% preallocate memory for your calcualtion
tempMat = zeros(size(T_data,1),length(uniqueIndexes));
% calculate MEAN
for i = 1:length(uniqueIndexes)
% show selected fragment of the table
T_set2(:,uniqueIndexes(i):uniqueIndexes(i)+noOfrepeats(i)-1)
% make room for new data
aveVal = mean(T_set2{:,uniqueIndexes(i):uniqueIndexes(i)+noOfrepeats(i)-1},2);
tempMat(:,i) = aveVal
end
% Store MEAN results to a table
T_AveVal = array2table(tempMat);
% new column names
newNames = cell(1, length(uniqueIndexes));
for i=1:length(uniqueIndexes)
newNames{i}= [columNames{uniqueIndexes(i)} '_Mean']
end
T_AveVal.Properties.VariableNames = newNames;
% Display MEAN
T_AveVal
4 Comments
Pawel Jastrzebski
on 12 Mar 2018
- How are you plotting the data? What type of plot are do you use/want to use?
- Could you submit your code so far?
More Answers (2)
Jan
on 22 Feb 2018
What about:
[s1, s2] = size(Numbers);
MeanR = squeeze(mean(reshape(Numbers, s1, 2, s2/2), 2));
But this works only, if s2 is even. At least, no loop is required.
Your loop could be modified:
for j = 3:16
for i = 1:2:21 % Instead of 1:21
MeanR{j, (i+1)/2} = mean(Numbers(j, i:(i+1)));
end
end
How do you determine "repeats"? Maybe splitapply is much easier for your purpose.
3 Comments
Rik
on 22 Feb 2018
Zeroes will affect the mean, so you should use NaN values to pad the array if you're going to, and use 'omitnan' in your call to mean
Jan
on 22 Feb 2018
@Jacob: I asked for how the "repeats" are recognized, because this would allow to post a matching suggestion with splitapply or accumarray. But I can invent an example:
x = rand(1, 20) > 0.6;
group = 1 + cumsum(x); % Perhaps [1,1,2,3,3,3,4,5,5,5,...]
data = rand(1, 20); % Any data
meanData = splitapply(@mean, data, group)
Now meanData contains the mean value over all elements of data belonging to the same value in group. This considers the number of elements automatically and this is cleaner than inserting dummy data only to ignore it later.
Pawel Jastrzebski
on 23 Feb 2018
Edited: Pawel Jastrzebski
on 23 Feb 2018
Jacob Barrett-Newton thank you.
- In general, if you post a specific question and provide all the information, the community is more likely to help you out.
- In regards to my code, I consider myself a beginner as well and there's probably plenty of way it can be improved. Jan Simon's suggestion is one for starters.
- I think that if you swap 'mean' calculation for the 'standard deviation' - the code will still work. However, from the statistical point of view - is getting a std.dev from 2 or 3 data points is going to provide a relevant measure for you?
- Lastly, this code only works, because you have a unique date for each test. If you were to test multiple tests on the same day with multiple repeats you would either have to think of different file naming convention or tweak the bit of the code that counts the unique names.
3 Comments
Pawel Jastrzebski
on 26 Feb 2018
Check out the documentation for MEAN and STD - for the STD you need to specify 3 inputs:
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/188049/image.png)
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/188050/image.png)
So the code will be more like this:
stdval = std(T_set2{:,uniqueIndexes(i):uniqueIndexes(i)+noOfrepeats(i)-1},0,2);
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!