# Histogram of signal gap widths

Michal on 25 May 2021
Edited: Michal on 26 May 2021
I am looking for an efficient, vectorized algorithm to compute a histogram of gap (NaN) widths, defined as follows:
1. Signals are stored as the columns of an (Nsamples x Nsig) array.
2. Gaps in a signal are encoded as NaNs.
3. The width of a gap is the number of consecutive NaNs in the signal.
4. The gap width histogram gives, for each signal, how many gaps of each width occur.
The following conditions must hold:
[Nsamples,Nsig ]= size(signals)
isequal(size(signals),size(gapwidthhist)) % true
isequal(sum(gapwidthhist.*(1:Nsamples)',1),sum(isnan(signals),1)) % true
Of course, a compressed form of gapwidthhist (represented by two cell arrays, "gapwidthhist_compressed_widths" and "gapwidthhist_compressed_freqs") is required too.
Example:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN]' % signal No. 2
gapwidthhist = [1 1 1 1 0 0 0 0 0 0 0 0 0 0; % gap histogram for signal No. 1
3 1 0 0 1 0 0 0 0 0 0 0 0 0]' % gap histogram for signal No. 2
where the integer histogram bins (gap widths) are 1:Nsamples (here Nsamples = 14).
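As a quick sanity check, the two conditions stated earlier can be evaluated on this example. This is only a sketch; the variable names sizeOk, countOk, nanFromHist and nanCounted are introduced here for illustration and are not part of the original post.

```matlab
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN;
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN]';
gapwidthhist = [1 1 1 1 0 0 0 0 0 0 0 0 0 0;
                3 1 0 0 1 0 0 0 0 0 0 0 0 0]';
[Nsamples, Nsig] = size(signals);

% Condition 1: one histogram bin per possible gap width, for every signal.
sizeOk = isequal(size(signals), size(gapwidthhist));

% Condition 2: sum over bins of (width * frequency) equals the NaN count.
nanFromHist = sum(gapwidthhist .* (1:Nsamples)', 1);   % [10 10]
nanCounted  = sum(isnan(signals), 1);                  % [10 10]
countOk = isequal(nanFromHist, nanCounted);
```

For signal No. 1 the gaps have widths 3, 1, 4 and 2, so 1 + 2 + 3 + 4 = 10 NaNs; for signal No. 2 the widths are 1, 1, 1, 5 and 2, so 3*1 + 2 + 5 = 10 NaNs.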
The corresponding compressed gap histogram looks like:
gapwidthhist_compressed_widths = cell(1,Nsig)
gapwidthhist_compressed_widths =
1×2 cell array
{[1 2 3 4]} {[1 2 5]}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
gapwidthhist_compressed_freqs = cell(1, Nsig)
gapwidthhist_compressed_freqs =
1×2 cell array
{[1 1 1 1]} {[3 1 1]}
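One possible way to get from the dense histogram to the compressed form is to keep only the nonzero bins of each column. This is a minimal sketch that assumes gapwidthhist has already been computed; the loop variable k and the temporary w are illustrative names.

```matlab
gapwidthhist = [1 1 1 1 0 0 0 0 0 0 0 0 0 0;
                3 1 0 0 1 0 0 0 0 0 0 0 0 0]';
Nsig = size(gapwidthhist, 2);

gapwidthhist_compressed_widths = cell(1, Nsig);
gapwidthhist_compressed_freqs  = cell(1, Nsig);
for k = 1:Nsig
    % Row indices of the nonzero bins are the gap widths that occur.
    w = find(gapwidthhist(:, k));
    gapwidthhist_compressed_widths{k} = w';
    % The bin values at those rows are the matching frequencies.
    gapwidthhist_compressed_freqs{k} = gapwidthhist(w, k)';
end
```

The loop runs over signals only (Nsig iterations), so its cost is small compared with finding the gaps themselves.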
Typical problem dimensions:
Nsamples = 1e5 to 1e6
Nsig = 1e2 to 1e3
Thanks in advance for any help.

Image Analyst on 25 May 2021
If you have the Image Processing Toolbox and can use regionprops() to count the number and length of NaN regions, you can do this:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN]' % signal No. 2
[numData, numSignals] = size(signals)
% One bin per possible gap width (1..numData), matching size(signals).
gapwidthhist = zeros(numData, numSignals);
for column = 1 : numSignals
    thisSignal = signals(:, column); % Extract this column.
    % Find lengths of all NaN runs.
    props = regionprops(isnan(thisSignal), 'Area');
    allLengths = [props.Area];
    % Explicit integer edges so that bin k counts gaps of width k.
    gapwidthhist(:, column) = histcounts(allLengths, 1:numData+1);
end
% Should be
% gapwidthhist = [1 1 1 1 0 0 0 0 0 0 0 0 0 0; % gap histogram for signal No. 1
% 3 1 0 0 1 0 0 0 0 0 0 0 0 0]' % gap histogram for signal No. 2
% What it is:
gapwidthhist
Michal on 25 May 2021
Well done ... Thanks! Your code is pretty fast even for large problem dimensions.
Still, I am looking for pure MATLAB code without any toolbox functions, because the final user has only base MATLAB.
The core functionality cannot be extracted from the source code, because the function "regionprops" calls some internal built-in functions.
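For a single column, the run lengths that regionprops returns as 'Area' can also be obtained with base MATLAB only, by differencing a padded NaN mask. This is a sketch of that idea, not the internals of regionprops; the names isGap, d, runStarts and runEnds are illustrative.

```matlab
thisSignal = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN]';
isGap = isnan(thisSignal);

% Pad with a non-gap sample at both ends so that runs touching the
% borders still get a start and an end. The 0s make the vector double.
d = diff([0; isGap; 0]);

runStarts = find(d == 1);    % index of the first NaN of each run
runEnds   = find(d == -1);   % index one past the last NaN of each run
allLengths = (runEnds - runStarts)';   % [3 1 4 2]
```

The result can be passed to histcounts with the edges 1:numData+1 in the same way as the regionprops output.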

Michal on 26 May 2021
Edited: Michal on 26 May 2021
This is a much simpler MATLAB implementation, but it is still not optimal (and not vectorized):
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
gaps = zeros(numData+1,numSignals);
auxnan = isnan(signals);
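for i = 1:numSignals
    c = 0; % length of the NaN run currently being counted
    for j = 1:numData
        if auxnan(j,i)
            c = c + 1;
        else
            % A valid sample closes the run; store its length (0 if no run).
            gaps(j,i) = c;
            c = 0;
        end
    end
    % Extra row holds a run that extends to the end of the signal.
    gaps(numData+1,i) = c;
    % Edges start at 1, so the stored zeros are not counted.
    gapwidthhist(:,i) = histcounts(gaps(:,i),1:numData+1);
end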
gapwidthhist
Any idea how to optimize (vectorize) this code to make it more efficient?
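One possible loop-free variant, using only base MATLAB, is to difference a padded NaN mask down all columns at once and count (width, signal) pairs with accumarray. This is a sketch under the assumption that the whole array fits in memory; the names d, startRow, startCol, endRow and widths are illustrative.

```matlab
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN;   % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN;    % signal No. 2
           1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]';    % signal No. 3
[numData, numSignals] = size(signals);

% Pad every column with a non-gap row above and below, then difference
% down the columns: +1 marks a gap start, -1 marks one past a gap end.
d = diff([zeros(1, numSignals); isnan(signals); zeros(1, numSignals)], 1, 1);

% find scans column by column, and starts and ends alternate within a
% column, so the k-th start and the k-th end belong to the same gap.
[startRow, startCol] = find(d == 1);
[endRow, ~] = find(d == -1);
widths = endRow - startRow;

% Count how many gaps fall on each (width, signal) pair.
gapwidthhist = accumarray([widths, startCol], 1, [numData, numSignals]);
```

At the upper end of the stated sizes (1e6 x 1e3), a dense double array takes about 8 GB, so the intermediate d and the dense result may have to be processed in blocks of columns. Building the result with sparse(widths, startCol, 1, numData, numSignals) instead of accumarray is one way to keep the output small, since sparse sums repeated index pairs.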


Release: R2021a
