Distortion introduced by filtering interpolated data.

My goal is to filter the column g correctly.
Since there are duplicates in column t, I firstly used unique() to remove the duplicates and then applied a linear interpolation (gI). Then I applied a low pass filter to g and gI, respectively, just for comparison. I found a big distortion on filtered gI (attached .png). What happened and how to fix it?
PS.: The interpolation is a must, because I have other datasets to merge with.
Thank you in advance!
Best,
Jasmine
clear;
dat = load('data.txt');
t = dat(:,1);
g = dat(:,2);
% remove duplicates and interpolate
tI = t(1):t(end);
tI = tI';
[~,indt] = unique(t);
gI = interp1(t(indt),g(indt),tI,'linear','extrap');
% low pass filter
fl = 60;
B = fir1(fl,1/fl,blackman(fl+1));
gf1=filter(B,1,g);
gf2=filter(B,1,gI);
% [~,gf1] = gaussfilt(t,g,fl); %% filter original data
% [~,gf2] = gaussfilt(tI,gI,fl); %% filter interpolated data
clf
plot(t(fl+1:end),gf1(fl+1:end))
hold on
plot(tI(fl+1:end),gf2(fl+1:end))
hold on
xlabel('t')
ylabel('g')
legend('filter orig data','filter interp data')

1 Comment

For whatever reason, even using the Signal Processing Toolbox resample funciton (more reliable for signal processing applications than interp1) on the original signal, either without first using unique, there is a significant discontinuity beginning at t=500, and that is likely responsible for the transient there. (You can see this by subtracting the two signals and plotting the difference.)
dat = readmatrix('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1296480/data.txt')
dat = 2001×2
1.0e+06 * 0.0000 0.9768 0.0000 0.9759 0.0000 0.9815 0.0000 0.9895 0.0000 0.9941 0.0000 0.9923 0.0000 0.9853 0.0000 0.9767 0.0000 0.9702 0.0000 0.9682
t = dat(:,1);
g = dat(:,2);
tI = (t(1):size(dat,1)).';
gI = g;
fl = 60;
B = fir1(fl,1/fl,blackman(fl+1));
gf1=filter(B,1,g);
gf2=filter(B,1,gI);
figure
plot(t(fl+1:end),gf1(fl+1:end))
hold on
plot(tI(fl+1:end),gf2(fl+1:end))
hold on
xlabel('t')
ylabel('g')
legend('filter orig data','filter interp data')
Re-defining the time vector so that all the time values are sequential (instead of some being duplicated) may be the easiest fix.
.

Sign in to comment.

Answers (1)

hello
when you do non unique data removal and interpolation, you are introducing a little distorsion in your original signal , like a small local step or impulse.
Then, depending of what kind of filter you apply (and especially how low is the cut off frequency) , you will see the filter transient response vs. this input distorsion, on top of what it is supposed the "true" filtered signal itself.
See that changing the cut off frequency of your low pass filter will have a significant effect on the distortion : if I shift the corner frequency to a higher value , the amplitude of the distorsion gets reduced. This clearly shows that what we are seing here is the filter transient response , and the transient is something WE have created during the duplicates removal / interpolation process.
So that may sound crazy or at least contra intuitive, but the best thing to do is to do the low pass filtering on the raw data first and unique / interpolation stage after.
In this case you get no distorsion at all
clear;
dat = load('data.txt');
t = dat(:,1);
g = dat(:,2);
% low pass filter (first)
fl = 60;
B = fir1(fl,1/fl,blackman(fl+1));
gf1=filter(B,1,g);
% remove duplicates and interpolate (second)
[tu,indt,~] = unique(t);
gfu = gf1(indt);
tI = (tu(1):tu(end))';
gfI = interp1(tu,gfu,tI,'linear');
clf
figure(1)
plot(t(fl+1:end),gf1(fl+1:end),'*r',tI(fl+1:end),gfI(fl+1:end),'g')
xlabel('t')
ylabel('g')
legend('filter orig data','filter interp data')

3 Comments

The data has two separate issues as I understand it. Some data points are given the same time stamp and some time stamps are missing. For example, here is a short window of the time vector
dat = load(websave('data.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1296480/data.txt'));
t = dat(:,1);
g = dat(:,2);
t(498:510).'
ans = 1×13
498 499 500 502 503 504 505 505 506 507 508 510 511
We see that a data point is missing at t = 501 and t = 505 is repeated (with two different values I believe). Of course, filter doesn't care about those issues, it just processes the input through the specified difference equation. But is it correct to do so in this case where the input sequence doesn't represent samples equally spaced in time? It seems in this case that some sort of corrective action would be needed on the input before applying filter()?
I saw that too (missing / repeated values)
have tried a few different corrective actions, but all "failed" somehow as they would create a kind of local distorsion on the input signal that will create that "bump" transient on the filtered output. The bump amplitude is worse as you apply a lower cut off frequency filter. NB : the bump seems big but in fact when you look at the bump amplitude vs the main y value is not that big at all.
It surprised me a bit but from a practical point of view (and maybe not from a true mathematical point of view) I found the filtering on the raw data and then only the duplicates removal and interpolation better than the other way around.
Still not 100% scientific I admit but much better in term of plot result.
There may be still some work to do on this topic, but here it's going to be late , so maybe tomorrow I'll have a look again.
The same confusion as Paul's. Doesn't a filtering process require evenly spaced data?

Sign in to comment.

Categories

Products

Release

R2022b

Asked:

on 15 Feb 2023

Commented:

on 17 Feb 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!