# Count repeated numbers from a data column and produce new columns

Georgios Tsiledakis on 24 Mar 2023
Dear experts,
I have one column of 10000 numbers (the input) in which each value is repeated several times. For example, in this 20-value excerpt, 692 is repeated 7 times, 3988 6 times, 5248 4 times and 5313 3 times, and so on.
I would like to produce 3 new columns, shown below as output1, output2 and output3. If a number is repeated 4 times, output1 should count the repetitions as 1,2,3,4; output2 should repeat the last value of output1 (here "4") 4 times; and output3 should be a counter of the distinct numbers, i.e. 1 on every row of the first repeated number, 2 on every row of the second, etc.
The input is a single-column text file, ceadata.txt (only 20 values are shown here):
692
692
692
692
692
692
692
3988
3988
3988
3988
3988
3988
5248
5248
5248
5248
5313
5313
5313
With this piece of code:
-------------------------------------
data = fscanf(fid, '%f');
nRows = data(1);
data = reshape(data(1:end), 20, 1).';
c = unique(data);
for i = 1:length(data)
counts(i,1) = sum(data==data(i)); % number of times each unique value is repeated
end
------------------------------------
I get output2!
The problem now is how to get output1 and output3.
input output1 output2 output3
692 1 7 1
692 2 7 1
692 3 7 1
692 4 7 1
692 5 7 1
692 6 7 1
692 7 7 1
3988 1 6 2
3988 2 6 2
3988 3 6 2
3988 4 6 2
3988 5 6 2
3988 6 6 2
5248 1 4 3
5248 2 4 3
5248 3 4 3
5248 4 4 3
5313 1 3 4
5313 2 3 4
5313 3 3 4
I am looking forward to hearing from you.
Thanks a lot
Georgios

Dyuman Joshi on 24 Mar 2023
Edited: Dyuman Joshi on 25 Mar 2023
Code edited to include output3 added by OP.
@Georgios Tsiledakis, use repelem() rather than looping over the data (especially without preallocation) to obtain output2.
I'm confused as to why you are reshaping the data in the code mentioned above.
%Using random data (1e4x1) as we do not have your data
%10000 random integers in the range [100,1000]
in = randi([1e2 1e3],1e4,1);
%unique elements
un = unique(in);
%frequency of unique elements
freq = histcounts(in,[un;Inf]);
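%for each frequency f, build the counter 1:f
vec = arrayfun(@(x) 1:x, freq, 'uni', 0);
%output1: running count within each group of equal values
out1 = [vec{:}]';
%output2: group size, repeated for every member of the group
out2 = repelem(freq,freq)';
%output3: group index, repeated for every member of the group
out3 = repelem(1:numel(freq),freq)';
%the outputs correspond to the sorted input
disp([sort(in) out1 out2 out3])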
This is an interesting problem, specifically to find a vectorized solution for output 1.
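For what it is worth, output 1 can probably be obtained without arrayfun as well. The lines below are only a sketch of one possible approach, not part of the answer above: they assume the data is sorted (as in the disp call), reuse the names in, un and freq from the code above, and introduce a helper variable starts that is purely illustrative. The idea is that the position inside a run equals the global row index minus the index at which the run starts, plus 1.

```matlab
%sample data from the question: run lengths 7, 6, 4 and 3
in = repelem([692; 3988; 5248; 5313], [7; 6; 4; 3]);
un = unique(in);
freq = histcounts(in, [un; Inf]);        %row vector [7 6 4 3]

%row index at which each run starts in the sorted data: [1 8 14 18]
%(starts is a hypothetical helper name)
starts = cumsum([1, freq(1:end-1)]);

%position within the run = global index - start of the run + 1
out1 = ((1:sum(freq)) - repelem(starts, freq) + 1)';
```

For the 20 sample values this should give 1..7, 1..6, 1..4, 1..3 stacked in one column, the same as the arrayfun version.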
Georgios Tsiledakis on 27 Mar 2023
I am really grateful for your help!
It works perfectly with histc()!
Thank you so much
Best regards
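The histc() variant mentioned in this reply is not shown in the thread. A plausible reconstruction, offered only as an assumption, is to replace the histcounts call with histc, using the unique values as bin edges. histc is an older function that MathWorks no longer recommends, but it may be the only option on releases that predate histcounts.

```matlab
%sample data from the question
in = repelem([692; 3988; 5248; 5313], [7; 6; 4; 3]);
un = unique(in);

%histc counts un(k) <= x < un(k+1) per bin; the last bin counts x == un(end)
freq = histc(in, un);
freq = freq(:);                          %force a column vector

out1 = cell2mat(arrayfun(@(x) (1:x)', freq, 'uni', 0));
out2 = repelem(freq, freq);
out3 = repelem((1:numel(freq))', freq);
disp([sort(in) out1 out2 out3])
```

Because the edges are exactly the unique values, every element of the input falls into exactly one bin, so freq should match the histcounts result.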

Atsushi Ueno on 24 Mar 2023
Edited: Atsushi Ueno on 24 Mar 2023
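%sample data from the question (equal values must be contiguous)
data = repelem([692;3988;5248;5313],[7;6;4;3]);
%index of the last element of each run
n1 = find(diff([data; inf]));
%length of each run
n2 = [n1(1); diff(n1)];
%columns: input, output1, output2
[data, cell2mat(arrayfun(@(x) (1:x)',n2,'uni',false)), repelem(n2,n2)]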
ans = 20×3
692 1 7
692 2 7
692 3 7
692 4 7
692 5 7
692 6 7
692 7 7
3988 1 6
3988 2 6
3988 3 6
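This answer produces the input, output1 and output2 columns only. A possible extension for output3, sketched here as an assumption rather than as part of the original answer, is to repeat the run index with repelem in the same way. The variable name out3 is illustrative.

```matlab
data = repelem([692;3988;5248;5313],[7;6;4;3]);
n1 = find(diff([data; inf]));            %last index of each run: [7;13;17;20]
n2 = [n1(1); diff(n1)];                  %run lengths: [7;6;4;3]

%output3: index of the run, repeated once per element of that run
out3 = repelem((1:numel(n2))', n2);

result = [data, cell2mat(arrayfun(@(x) (1:x)',n2,'uni',false)), repelem(n2,n2), out3];
disp(result)
```

Note that diff detects consecutive runs, so this variant relies on equal values being adjacent. If the file is not already grouped, sorting data first would be needed.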