Fast Fourier transform for time series data
Dear all,
I am a beginner, and this is the first time in my life I am programming an FFT.
The problem:
I acquired data (1052 values), each value captured at a time period that is not perfectly uniform: roughly 0.01132 s (+/- 0.000005 s).
The time vector was recorded as well and likewise contains 1052 values. Below is the plot of the time-domain signal.

My objective:
Plot the FFT of this data to identify its spectrum.
What I did so far:
I followed this page to write the code: https://www.mathworks.com/help/matlab/ref/fft.html, but I did not get anything meaningful, because I do not understand the process of how to do it.

Thanks in advance for all of your thoughts and suggestions.
10 Comments
dpb
on 4 Aug 2022
Show us the exact code you used -- and, it never hurts to attach a .mat file with the data itself.
The MATLAB FFT is not designed for non-uniform sampling, but your jitter does not appear to be terribly large, so it will probably not have much effect; you could also try interpolating onto an absolutely uniform time base.
Have you looked at the detail of the time trace? It's so compact on the given plot that I can't tell for sure -- are you sure the signal was sampled at a high enough rate to avoid aliasing?
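A minimal sketch of that interpolation idea (synthetic jittered samples for illustration; substitute your own recorded time and signal vectors for t and x):

```matlab
% Synthetic jittered time base, standing in for the OP's recorded times
rng(1)
Ts = 0.01132;  n = 1052;
t  = (0:n-1)'*Ts + 5e-6*randn(n,1);   % jittered sample times
x  = sin(2*pi*5*t);                   % example 5 Hz signal
% Resample onto a strictly uniform time base
tu = linspace(t(1), t(end), n)';
xu = interp1(t, x, tu, 'pchip');      % shape-preserving interpolation
Fs = 1/mean(diff(tu));                % now an exact, constant sample rate
```

After this, fft(xu) with sample rate Fs is on solid footing; with jitter this small the correction is minor, but it removes the ambiguity.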
Mohammed Lamine Mekhalfia
on 4 Aug 2022
Why would Fs be 200 Hz with sampling period 0.01132? Should it be:
Ts = 0.01132;
Fs = 1/Ts
Mohammed Lamine Mekhalfia
on 4 Aug 2022
Well, you can't change the sampling frequency except by interpolation/decimation of the original data; the actual Fs is what it is from the relationship Fs = 1/dt, as @Paul showed above.
With ad hoc sampling instead of actual fixed-rate sampling via an SSH A/D, I'd use
Fs=1/mean(dt);
as the best estimate of the average rate, but it won't change the results much from the point estimate unless there's a real outlier, which your prior post indicated was not the case. The change from 200 Hz to 88 Hz, however, should have changed the result by over a factor of 2 from where it had been before; if that didn't happen, you did something fundamentally wrong.
Again, if you want some real feedback, attach a .mat file with the actual data; I'm still not convinced the data weren't undersampled and that what you're actually seeing isn't the result of drastic aliasing.
What is the quantity under measurement and how was it measured? Were any analog anti-aliasing filters used to ensure there was no frequency content above 75-80 Hz? Anything above Fs/2 (the Nyquist frequency) will be aliased and cannot be distinguished from the real signal once the data have been sampled; anti-aliasing has to be applied before digitization/sampling.
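To illustrate the aliasing point (a hypothetical sketch with made-up values, not the OP's data): a tone above the Nyquist frequency folds back into baseband and is indistinguishable from a genuine lower-frequency tone after sampling. Here a 60 Hz tone sampled at 88.3 Hz shows up near 28.3 Hz:

```matlab
Fs = 88.3;                         % sample rate close to the thread's ~88 Hz
n  = 1024;
t  = (0:n-1)/Fs;
x  = sin(2*pi*60*t);               % true 60 Hz tone, above Nyquist (44.15 Hz)
Y  = abs(fft(x));
f  = (0:n-1)*Fs/n;
[~,i] = max(Y(1:n/2));             % strongest baseband peak
apparent = f(i);                   % ~28.3 Hz = |60 - 88.3|, not 60
```

Once sampled, nothing in the data can reveal that the true tone was at 60 Hz; that's why the filtering has to happen in the analog domain.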
The noisy signal all the way from left to right and the symmetry of the plot you attached just don't match the code you posted -- it looks like a two-sided FFT, and the fact that there's no settling of the noise on top of the overall trend is highly suspicious.
Jeffrey Clark
on 7 Aug 2022
- I agree with comments above about needing the data file to look into this further
- You talk about relatively small time intervals, but your first figure shows time from 0 to 190e6 when it should represent ~11.9 seconds?
- The amplitude mean of that figure is ~6e-4, which could translate to a strong DC response in and near the first frequency bin: if the frequency of interest is near the first bins of Y, it will be lost in the DC response. You probably want to shift the amplitudes to X-mean(X)?
- As in the other comment, the second figure surely didn't come from the code you provided on 4 Aug 2022 at 14:47; perhaps you plotted the raw FFT output and had zero-padded the X to 2000?
dpb
on 7 Aug 2022
I just assumed the time trace must have been plotted against ordinal number, but OP says there are only 1000 or so points, and the plot would have been just a solid band if that were really the case, so that isn't it -- no way of telling what was done.
The DC component comment is also true, although for the PSD it would simply be a spike in the first bin -- but the amplitude of the signal about the mean is only a fraction of that, so it would indeed be reduced in magnitude; one would almost surely have to use a log scale or convert to dB to see any structure.
All in all, whatever OP showed us is simply bound not to be what he/she thinks it is -- but unless he/she is willing to attach a data file, we've done all we can do...
Mohammed Lamine Mekhalfia
on 8 Aug 2022
Bjorn Gustavsson
on 8 Aug 2022
The data I get out of that xlsx-file looks nothing like the plot you've presented. What is going on?
Mohammed Lamine Mekhalfia
on 8 Aug 2022
Answers (3)
dpb
on 8 Aug 2022
No disagreements at all with what @Bjorn Gustavsson notes, just do a baseband PSD and look at the signal some...
First, look at a short section of the sampled waveform in enough detail to see what it shows us --

which indicates, as I suspected, that the signal is undersampled for a best estimate of amplitude -- note the peaks are often flattened off, approximating square waves, and most are just a spike; a truly adequate sample rate would show those as more rounded peaks. The actual magnitude of the waveform at a peak will almost always fall at a time instant that wasn't sampled.
Second, for the points that were saturated -- what caused that to happen, btw? That ought to be fixed when you collect additional data with better technique/higher sampling rate. I looked at a region of those as well and just arbitrarily filled in from an adjacent section -- trying to interpolate such an undersampled signal is pretty much a pure guess anyway...

Did same thing for other locations as well...
That done, I subtracted the mean from the resulting time signal to remove the DC component and did a one-sided PSD with the mean sampling frequency.

The top is on a linear scale, the lower in dB; there is a strong peak at 28.4 Hz. Not sure what that would be related to, but there is only the one dominant frequency in the signal; everything else appears to be just noise.
For further work, you need to either increase the sample rate to something approaching 150 Hz or so (3-4X the highest frequency of interest is a common recommendation) or incorporate the use of analog lowpass filters before the A/D to limit the higher-frequency input energy.
I've no clue how/what use you are trying to make of this; I don't see how it relates to the problem description at all, but there are certainly clever techniques for solving problems I'm not aware of -- do you have a reference link to the method in application?
The above was done with code that (roughly) matches the following (it's not exact; I just poked around at the command line and then pulled out the pertinent pieces to attach...)
% examine narrow range to ensure not aliased badly...
tData=readtable('Data.xlsx');
tData.T=tData.timeInS*1000;
plot(tData.T,tData.Displacement)
ylim([-0.520 -0.519])
xlim([tData.T(1) tData.T(100)])
Then explored the bad data sections...
find(tData.Displacement<-0.6)
xlim([tData.T(290) tData.T(320)])
ylim([-0.525 -0.519])
hold on
hL(1)=plot(tData.T(313:316),tData.Displacement([313 310 311 316]),'x-r'); % the plot presented...
tData.Displacement(314:315)=tData.Displacement(310:311); % substitute for bad points
N1=757;tData.Displacement(N1:N1+1)=tData.Displacement([N1:N1+1]-4); % ditto
N1=848;tData.Displacement(N1:N1+1)=tData.Displacement([N1:N1+1]-4);
tData.D=tData.Displacement-mean(tData.Displacement); % remove DC
tData.T=tData.T-tData.T(1); % convert T to 0 origin
figure
plot(tData.T,tData.D)
xlabel('Time (msec)')
ylabel('Displacement from mean')
figure
plot(tData.T(1:100),tData.D(1:100)) % another look at the detail after detrending
xlim([0 tData.T(100)])
ylim([-4 4]*1E-4)
yline(0)
% now do a PSD
L = height(tData);
Y=fft(tData.D);
P2=abs(Y)/L;
P1=P2(1:L/2+1);
Fs=1/mean(diff(tData.T/1000)); % mean sample rate
f = (0:L/2)*(Fs/L); % compute frequency vector to go with...
P1(2:end-1)=2*P1(2:end-1); % one-sided PSD is half total energy except DC and Fmax
figure,tiledlayout(2,1)
nexttile
plot(f,P1)
xlabel('Frequency (Hz)')
ylabel('Displacement Magnitude')
nexttile
plot(f,convert2db(P1)) % convert2db is a local helper, e.g. @(x)20*log10(x)
ylim([-140 -70])
xlabel('Frequency (Hz)')
ylabel('Displacement Mag (dB)')
1 Comment
Bjorn Gustavsson
on 9 Aug 2022
To add to @dpb's answer: you might get slightly better results if you use nufft, since that function takes the jitter in the sampling into account. But none of our suggestions will overcome the fundamental problem that you're a bit too close to the Nyquist frequency.
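A minimal sketch of the nufft idea (synthetic jittered data standing in for the OP's file; assumes a MATLAB release with nufft, R2020a or later):

```matlab
% Synthetic stand-in: a 28.4 Hz tone sampled on a jittered time base
rng(0)
n  = 1024;
Ts = 0.01132;
t  = (0:n-1)'*Ts + 5e-6*randn(n,1);   % non-uniform sample times, seconds
x  = sin(2*pi*28.4*t);
Fs = 1/mean(diff(t));                 % average sample rate
f  = (0:floor(n/2))'*(Fs/n);          % one-sided frequency grid, Hz
Y  = nufft(x, t, f);                  % non-uniform DFT evaluated at f
[~,i] = max(abs(Y));
peakHz = f(i);                        % ~28.4 Hz
```

Because nufft uses the actual sample times, the jitter is handled exactly rather than approximated by an average rate; near Nyquist, though, the underlying undersampling problem remains.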
Bjorn Gustavsson
on 8 Aug 2022
It seems that you're a bit too close to the Nyquist frequency. If you try:
[NUM,TXT,RAW]=xlsread('data.xlsx');
dt = mean(diff(NUM(:,5)));
fs = 1/dt;
% Calculate stft/spectrogram
[S,F,T] = spectrogram(NUM(1:1024,1),128,32,[],fs);
subplot(3,1,1)
plot(NUM(:,5),NUM(:,1),'.-')
hold on
% identify the local max and min
ilM = find(NUM(1:end-2,1)<NUM(2:end-1,1)&NUM(2:end-1,1)>NUM(3:end,1))+1;
ilm = find(NUM(1:end-2,1)>NUM(2:end-1,1)&NUM(2:end-1,1)<NUM(3:end,1))+1;
plot(NUM(ilM,5),NUM(ilM,1),'r.')
plot(NUM(ilm,5),NUM(ilm,1),'c.')
subplot(3,1,2)
pcolor(T,F,log10(abs(S))),shading flat
subplot(3,1,3)
% look at the variability of the period-time between local max and local
% min:
plot(diff(ilm),'c.-')
hold on
plot(diff(ilM),'r.-')
From there you could also zoom in in the top panel to look at the signal.
HTH
4 Comments
Mohammed Lamine Mekhalfia
on 8 Aug 2022
Bjorn Gustavsson
on 8 Aug 2022
If you take a look at the period between the peaks (red) or minima (light blue) in the bottom panel, you see that the spacing varies back and forth between 3 and 4 samples between consecutive peaks. That would be fine if the period were something like 3.25 samples -- however, the 4-sample intervals occur with variable separation. This means that either the frequency varies a bit throughout the measurement period, or the noise level is large enough to cause this type of fluctuation. The solution, as best I can judge, is a higher sampling frequency.
dpb
on 8 Aug 2022
I think if you don't somehow remove the bad data spikes, they completely corrupt everything else -- if, instead of making the substitutions noted above before computing the PSD, you just remove the mean of the initial time series, the result looks like

The magnitude of those three spikes has so overwhelmed everything else in the signal that all you have are the artifacts from those three peaks alone. To prove that, I then zeroed out everything in the input signal except for those six points:
>> ixSpike=find(tD.Displacement<-0.6).';
ixSpike =
314 315 757 758 848 849
>> tD.DD=zeros(height(tD),1); % initialize to all zeros first
>> tD.DD(ixSpike)=tD.Displacement(ixSpike);
and redid the PSD with it and plotted it on top of the above -- almost indistinguishable results, showing that the actual data you were interested in is totally swamped by those three anomalous readings. I'm guessing that's causing much of the difficulty in the other interpretation, too.

Both of @Bjorn Gustavsson's comments above are true as well, I believe -- the data are definitely undersampled, and there's enough jitter in the time sampling to have some effect. Both of those problems in the data collection should be fixed rather than trying to make something from the present data set, which also suffers from bad data that's really tough to replace given the above issues -- although even just substituting the mean is probably not nearly as bad as the effect of leaving the spikes in.
Look up "anti-aliasing" -- if you don't have any filters at hand, I've a pair of Wavetek units with corner frequencies from 1 Hz to 100 kHz, two channels each. I've retired entirely from the consulting gig and would let them go cheap; they've just been sitting on the bench for years now...

Jeffrey Clark
on 10 Aug 2022
@Mohammed Lamine Mekhalfia, I think the answer provided by @dpb is valid, although @Bjorn Gustavsson has valid points as well. By better approximating the peak's actual amplitude, frequency, and phase, a cosine can be overlaid on the signal; it correlates well with the mean-adjusted measurements both where the "noise" seems strong and where it is mostly absent:


Their points about the missing data, slight time variances, and probable noise level do need to be addressed in future samples. The noise could be mitigated by a much longer sample period. For the record, my calculations building on dpb's code produced:
- amplitude: 2.0559e-04
- frequency: 28.3870 Hz, no chirp detectable
- phase: -2.6946 at time=0
%Compute cosine model of signal using dpb's code given in another answer
L2 = 2^(nextpow2(L)+1);
Y=fft(tData.D,L2);
P2=abs(Y)/L;
P1=P2(1:L2/2+1);
f = (0:L2/2)*(Fs/L2);
P1(2:end-1)=2*P1(2:end-1);
[~,P1i] = max(P1);
P1il = find(P1(1:P1i-1)>P1(2:P1i),1,"last")+1;
P1iu = find(P1(P1i+1:end)>P1(P1i:end-1),1,"first")+P1i-1;
if P1(P1il)<P1(P1iu)
P1il = find(P1(P1il+1:P1i-1)>=P1(P1iu),1,"first")+P1il;
elseif P1(P1il)>P1(P1iu)
P1iu = find(P1(P1i+1:P1iu-1)>P1(P1il),1,"last")+P1i-1;
end
figure
plot(f(1:P1il),P1(1:P1il),'b-',f(P1il:P1iu),P1(P1il:P1iu),'r-',f(P1iu:end),P1(P1iu:end),'b-')
Ypl = unwrap(angle(Y(P1i:-1:P1il)));
Yp = [Ypl(end:-1:2);unwrap(angle(Y(P1i:P1iu)))]';
hold on
Yps = max(abs(diff(P1(P1i-1:P1i+1))))/(max(Yp)-min(Yp));
csP1 = cumsum(P1(P1il:P1iu).^2);
pP1 = polyfit(f(P1i-1:P1i+1),P1(P1i-1:P1i+1),2);
pYp = polyfit(f(P1i-1:P1i+1),Yp([P1i-1:P1i+1]-P1il+1),2);
Xpo = -pP1(2)/(2*pP1(1));
Ypo = polyval(pP1,Xpo);
Zpo = polyval(pYp,Xpo);
plot(f(P1il:P1iu),Yp*Yps+Ypo,'g-',Xpo,Ypo,'r*')
axis([f([P1il-1 P1iu+1]) min(P1(P1il-1:P1iu+1)) max(0,max(Yp)*Yps)+Ypo*1.1])
Tx4 = [0:0.25:length(tData.T)]/Fs;
figure
plot(tData.T/1000,tData.D,Tx4,Ypo*cos(Zpo+Tx4*Xpo*2*pi))
12 Comments
dpb
on 10 Aug 2022
Good demo that one can make silk purses of sow's ears, sometimes... :)
I was too lazy before to do so, but even with only the 1K points averaging will help --
[PSD,ff]=pwelch(tD.D,[],[],[],Fs);

which is much smoother, averaging out the noise. However, this still had the bad points substituted for; leaving them in just generates nothing but trash; averaging can't make up for the amount of "energy" contained in those few cycles as compared to the total of the rest of the entire signal.
Mohammed Lamine Mekhalfia
on 11 Aug 2022
dpb
on 11 Aug 2022
That plot looks like a double-sided spectrum plotted against a baseband frequency vector...
Mohammed Lamine Mekhalfia
on 11 Aug 2022
dpb
on 11 Aug 2022
But you should plot the double-sided spectrum against -Fmax:+Fmax, not 0:Fs -- the two peaks are symmetric about DC (0 Hz).
If you're sampling at ~88 Hz, the Nyquist frequency is about 44 Hz, NOT 88. There are two peaks at +/-28.4 Hz in that signal, not at 28.4 and 56.8 that your plot shows.
You've got the same problem as another question answered just yesterday: <Why-fwhm-in-the-frequency-domain-twice-the-value-expected?>
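A minimal sketch of that two-sided plotting convention (synthetic tone for illustration): fftshift reorders the spectrum so DC sits in the middle and the axis runs from -Fs/2 to +Fs/2, putting the mirrored peaks at +/-28.4 Hz rather than at 28.4 and 56.8 Hz:

```matlab
Fs = 88.3;  n = 1024;
t  = (0:n-1)/Fs;
x  = sin(2*pi*28.4*t);
Y  = fftshift(fft(x));            % reorder so DC is in the middle
f  = (-n/2:n/2-1)*(Fs/n);         % two-sided axis, -Fs/2 to +Fs/2
[~,i] = sort(abs(Y),'descend');
peaks = sort(f(i(1:2)));          % the two dominant bins: ~ -28.4 and +28.4 Hz
plot(f, abs(Y)/n)
xlabel('Frequency (Hz)')
```

Plotting abs(fft(x)) against 0:Fs instead leaves the negative-frequency peak sitting at Fs - 28.4 = 59.9 Hz, which is the "extra" peak being discussed.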
Jeffrey Clark
on 11 Aug 2022
Edited: Jeffrey Clark
on 11 Aug 2022
@Mohammed Lamine Mekhalfia, regarding the dpb comments above: you did not do the processing you said you did in your comment that included the code suggestions from MATLAB's fft "noisy signal" example. The L/2 used when computing the single-sided spectrum P1 and the corresponding f folds the + and - halves together and reduces the 88 Hz span to the usable 44 Hz.
With regard to your statement "I would like to explain my experiment as I would appreciate if someone with your great expertise ca advice on how to analyse the problem and resolve it ofc if you would like to hear", I would like to know as much as you can provide about:
- You talk about several sensors, but only one being used in the data you provided and a second one you have now looked at. Are all of these sensors recording during the same test or just one for each test?
- Are all the sensors the same technology, and/or do they have different sensitivity or response characteristics?
- Is the object rotated such that different surface types, reflectivity or other conditions might be examined during the 11 seconds of data collection?
- Is this a 3D object being rotated simultaneously about 2 axes?
If you have data from N sensors recorded during the same test, effectively providing N samples per rotation, their different recording amplitudes and times could be combined into one analysis. Or, if their relative positions are known (e.g., 4 sensors at 0, 90, 180, 270 degrees) but they were only recorded against the same time stamps, that geometry could be used to estimate their individual sample times.
Mohammed Lamine Mekhalfia
on 12 Aug 2022
Jeffrey Clark
on 12 Aug 2022
@Mohammed Lamine Mekhalfia, thank you for the info. I have questions about the data.xlsx that you provided, containing the Displacement, time, and time in S columns that we have all been working from.
- Displacement is a floating point representation that we assumed was some sort of measure of oscillation, but all negative values. Is this from the one sensor that you show as the speed sensor or one blade detector sensor? And how is Displacement provided by the machine?
- time is an integer and time in S is simply the formula time*2e-8 but how is time provided by the machine?
I have a concern that the data.xlsx is not being provided by the machine, but rather that some transfer of data in bits, bytes, or other form, potentially in real time, is being captured and then re-encoded into the values in the spreadsheet without correctly representing them. Consider the issue of the delta time not being consistent. I had assumed this was perhaps due to truncation of more precise data and conversion from the machine's measurement units, but a histogram of the differences is not a normal distribution, and it looks very similar to the distribution of the Displacement data; there should not be two peaks in either of these:


dpb
on 12 Aug 2022
@Jeffrey Clark -- while the dt should ideally be one value, and undoubtedly would be if a fixed-rate SSH A/D with an independent timebase were being used for the data collection, the observed data looks very much like what would be expected if the sampling were instead a free-running tight loop, running either at whatever speed the collection device is capable of or off a software timer interrupt.
In the latter case, the variability in time from loop to loop would be expected, and having two (or, in reality, three here) peaks isn't at all surprising. I'd venture, even without knowing more about the setup, that the displacement data actually reflects that timing by showing the same general shape -- the displacement change is larger over the longer sample interval at a constant shaft speed, and that's what the above shows.
Good thing to look at, though; one could probably use that observation to convert the variably-sampled data to an equivalent fixed rate by scaling the displacement to the dt of each sample.
Jeffrey Clark
on 13 Aug 2022
Edited: Jeffrey Clark
on 13 Aug 2022
@Mohammed Lamine Mekhalfia, @dpb, @Bjorn Gustavsson, it seems that what we presumed to be the signal in the displacement data is also corrupting the time data. I still think it goes back to how and by whom the data.xlsx file was created, and whether we can get the actual data. This is the result of an FFT of diff(timeInS):

Mohammed Lamine Mekhalfia
on 15 Aug 2022
dpb
on 15 Aug 2022
Yeah, you'd already mentioned this -- I'm so ingrained with the experience of using a tach to measure shaft speed independently and then collecting the vibration data with conventional multi-channel analyzers that I still responded to @Jeffrey Clark's comment as if that were the case here as well.
The data then match the expectation, as you note; what that implies is that one should definitely use the procedures appropriate for non-uniform sampling. It's probably not the best idea to try to resample back to a uniform time base, since the DUT itself is actually somewhat variable.