# How to remove artifacts from a signal

66 views (last 30 days)
maxroucool on 23 Sep 2017
Edited: Cedric on 1 Oct 2017
Hello,
I would like to remove artifacts from a signal, typically a lack of information in a thermocouple signal due to a shortcut. In fact I don't want to calculate the moving average because it would alter the "good information" and extrapolate data that don't exist. But I want to detect wrong information (artifacts) based on this moving average, and replace it by NaN or 0.
I tried to code it, but it is not efficient at all (based on for loops), and I am sure I am not the first one to do this ;)
Thanks a lot for your help!
Max

maxroucool on 30 Sep 2017
Hello guys,
Tanks a lot for all tour help. But : - Image Analyst, your way of doing delete to much information around the artifact - Star Rider, your way is too "binary", the artifact does not necessarily have an absolute value (threshold) - Cedric, your way seems efficient, but I don't understand it :)
So I was inspired by all of your solution and made my own that is based on a moving average, that allows me to apply a threshold as a percentage :
function signal = deleteArtifactMax(signal, threshold, interpolation)
global samplingFrequency maxInterpolationDuration;
signal(isnan(signal)) = 0;
time = repmat([1:size(signal,1)]',1,size(signal,2));
movingAverage = (...
signal(1:end-9,:) + ...
signal(2:end-8,:) + ...
signal(3:end-7,:) + ...
signal(4:end-6,:) + ...
signal(5:end-5,:) + ...
signal(6:end-4,:) + ...
signal(7:end-3,:) + ...
signal(8:end-2,:) + ...
signal(9:end-1,:) + ...
signal(10:end,:))/10;
isValid = abs(signal(5:end-5,:) - movingAverage) <= movingAverage*threshold;
notValid = logical([zeros(4,size(signal,2)); ~isValid; zeros(5,size(signal,2))]);
signal(notValid) = NaN;
signal(signal == 0) = NaN;
timeCut = time .* notValid;
for i = 1:size(signal,2)
timeCutCell{i} = find(timeCut(:,i) ~= 0);
end
if interpolation == true
for i = 1:size(signal,2)
if ~isempty(timeCutCell{1,i})
% Delete too long interpolation
iNotToInterpolate = [];
dTimeCutCell = [0; diff(timeCutCell{:,i})];
k=0;
for j=1:size(dTimeCutCell)
if dTimeCutCell(j) == 1
k=k+1;
else
if k > maxInterpolationDuration*60/samplingFrequency
iNotToInterpolate = [iNotToInterpolate j-k:j-1];
end
k=0;
end
end
% Selection of points to interpolate
iToInterpolate = setdiff(timeCutCell{1,i},timeCutCell{1,i}(iNotToInterpolate));
interpTime = time(:,i);
interpTime(iToInterpolate) = [];
interpSignal = signal(:,i);
interpSignal(iToInterpolate) = [];
% Interpolation
signal(iToInterpolate,i) = interp1(interpTime, interpSignal, iToInterpolate, 'linear');
end
end
end
end
Max
Cedric on 30 Sep 2017
Edited: Cedric on 1 Oct 2017
This looks much simpler than my 4 lines, no doubt ;-) You could have asked, that's why the forum is for!
In short, I flag places where the derivative is larger (in absolute value) than any possible valid value for the smooth signal that you have, which defines the positions where you signal goes down or up abruptly. This generates a vector of logicals like
(abs( d ) > 10) ):
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0
----------------- -------
glitch 1 glitch 2
but what I want ultimately is a vector of logicals that flags also the interior of glitches, so I can select them (and e.g. replace the content). One simple way to do this is to multiply by the minus sign of the derivative (which is -1 when going down and +1 when going up):
-sign( d ) .* (abs( d ) > 10):
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 1 0 0 -1 0 0 0 0 0
------------------ --------
glitch 1 glitch 2
and to compute a cumulative sum:
cumsum( -sign( d ) .* (abs( d ) > 10) ):
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
---------------- ------
glitch 1 glitch 2
Then convert to logical and use it or its complement depending what we want to index.
Note that you may need to enlarge a little the width of the selected zones to be on the safe side and this can be done using a basic convolution (I use a random vector of flags to illustrate):
flags = [0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0] ;
extFlags = conv( flags, [1 1 1], 'same' ) > 0 ;
Displaying flags and extFlags respectively, we can see what this did:
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0

Cedric on 24 Sep 2017
Edited: Cedric on 24 Sep 2017
Why not simply:
d = [0; diff( signal )] ;
isValid = ~logical( cumsum( -sign( d ) .* (abs( d ) > 10) )) ;
then e.g. set invalid entries to NaN
signalCut = signal ;
signalCut(~isValid) = NaN ;
or interpolate:
t = 1 : numel( signal ) ;
signalClean = interp1( t(isValid), signal(isValid), t ) ;
This is a quick technical trick that you have to tweak and check visually. It is not robust (in comparison to the two other answers), but if all you want to do is a one shot elimination of the glitches ..
With that you get:
subplot( 3, 1, 1) ; plot( signal, 'b', 'LineWidth', 2 ) ;
grid ; title( 'Signal' ) ;
subplot( 3, 1, 2 ) ; plot( isValid, 'r', 'LineWidth', 2 ) ;
grid ; title( '"is valid"' ) ;
subplot( 4, 1, 3) ; plot( signalCut, 'g', 'LineWidth', 2 ) ;
grid ; title( 'Cut signal' ) ;
subplot( 4, 1, 4) ; plot( signalClean, 'm', 'LineWidth', 2 ) ;
grid ; title( 'Cleaned signal' ) ;

Star Strider on 23 Sep 2017
Try this:
signal = D.signal;
t = (0:length(signal)-1)'; % Create Time Vector
p = polyfit(t, signal, 2); % Polynomial Fit For Detrending
dtrnd_sig = signal - polyval(p, t); % Detrend
t2 = t; % Temporary Time Vector
signal(dtrnd_sig < -100) = []; % Set Signal ‘Dips’ To ‘Empty’
t2(dtrnd_sig < -100) = []; % Set Corresponding Time Of ‘Dips’ To ‘Empty’
signal_new = interp1(t2, signal, t, 'linear'); % Interpolate To Fill Missing Data
figure(1)
plot(t,signal_new,'-r')
There may be other ways to accomplish this objective. I used linear interpolation, 'spline' is another option you may want to experiment with, as is the Signal Processing Toolbox resample function.
maxroucool on 24 Sep 2017
Hi,
Thanks you Star Strider. This solution seems efficient. In fact, I did not wanted to interpolate but given the good result, I will activate or not this option based on the signal I am treating.
Thanks a lot to you both!
Max
Star Strider on 24 Sep 2017
Edited: Star Strider on 25 Sep 2017
Our pleasure!
If you do not want to interpolate, change this part of my code to output the non-interpolated data and time vectors as ‘SplitSignal’:
for k1 = 1:nr_sig
Msignal = Csignal{k1}; % Create Copy
MTime = Ct{k1}; % Create Copy
Msignal(Mask) = []; % Set Signal ‘<Threshold’ To ‘[]’
MTime(Mask) = []; % Set Time ‘<Threshold’ To ‘[]’
CIsignal{k1} = interp1(MTime, Msignal, Ct{k1}, 'linear'); % Interpolate
SplitSignal{k1} = [MTime, Msignal]; % Store Time & Signal Without Interpolation
end

Image Analyst on 23 Sep 2017
Edited: Image Analyst on 23 Sep 2017
You can detect "bad" elements either by computing the MAD and then thresholding to identify them, and then replacing them with zero or NAN (whichever you want)
badIndexes = abs(signal - conv(signal, ones(1, windowWidth)/windowWidth, 'same')) > 10;
signal(badIndexes) = nan; % or 0
Image Analyst on 23 Sep 2017
Two lines where I blur and subtract are too complex? Let me make it longer where I make intermediate signals where you can see what's going on. Bad locations are nans, as you requested, so those points are not plotted so that's why there are "gaps" in the curve, but that's what you asked for.
signal = s.signal;
subplot(4, 1, 1);
plot(signal, 'b-');
grid on;
% Now fix the signal.
windowWidth = 231;
kernel = ones(1, windowWidth) / windowWidth;
blurredSignal = conv(signal, kernel, 'same');
subplot(4, 1, 2);
plot(blurredSignal, 'b-');
grid on;
subplot(4, 1, 3);
grid on;
fixedSignal = signal; % Initialize.
fixedSignal(badIndexes) = nan; % or 0
subplot(4, 1, 4);
plot(fixedSignal, 'b-');
grid on;
maxroucool on 24 Sep 2017
Hi Image Analyst,
Thanks for the update. In fact I missed the "/ windowWidth" and I used too narrow windows. But know it works perfectly.
Max

maxroucool on 23 Sep 2017
Hello,
Thank you very much to you both!
I tried both solutions : The first one is easy to understand and work well on the sample I provided. But in fact, I gave only the beginning of the curve. So the polyfit does not fit well enough the whole curve. I tried various polynomial order but it just don't fit...
The second one is more complex to understand. I checked what is the mathematical function used and I think I understood. However, I tried for several "windowWidth" but none seems to work... Any idea on how to make it work?
Thanks a lot,
Max
Jan on 27 Sep 2017