Why does smoothdata give bad results at the beginning and end of a dataset?

When I use smoothdata to process my data set, I am getting bad results at the beginning and end.
At the beginning, the smoothed curve is clearly much higher than what it should be. At the end, it is subtly lower.
It almost feels like the smoothdata function is "reflecting" the dataset to compensate for missing data.
Am I missing some sort of configuration for smoothdata or am I using entirely the wrong function?
Jonas on 8 Aug 2022
How did you call the function exactly (so we can see the algorithm type, window length, etc.)? Without knowing that, you could try pre- and post-padding your data with the first/last value of your actual data vector.
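A minimal sketch of the padding idea, assuming a moving-mean smoother and a pad length equal to a hypothetical window length `w` of 21 samples. The padded samples are trimmed off again after smoothing. Padding with a single noisy end sample can itself bias the ends, so treat this as something to try and check, not a guaranteed fix.

```matlab
% Sketch: pad both ends with the first/last sample, smooth, then trim.
rng(0);                                  % reproducible noise
x = linspace(0,2*pi,200);
y = cos(x) + randn(size(x))/10;

w = 21;                                  % hypothetical window length (samples)
ypad = [repmat(y(1),1,w), y, repmat(y(end),1,w)];  % constant pre/post padding
yspad = smoothdata(ypad,'movmean',w);    % smooth the padded vector
ys = yspad(w+1:end-w);                   % drop the padded samples again

plot(x,y,'ro',x,ys,'b-')
```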


Accepted Answer

John D'Errico on 9 Aug 2022
Think about it. This is really classic behavior, completely expected. Imagine the smoothing routine as a mathematical implementation of a thin, moderately flexible beam, that is forced to pass roughly through the data. But the beam will not be TOO flexible, as you don't want it to chase every bump in the curve. Make sense?
One problem, though, is that near the ends of your data, curvature is difficult to deal with: the smoother cannot distinguish between curvature there and noise. Some smoothers are better than others at predicting well near the ends.
I'll create some simple data to try to explain what happens.
x = linspace(0,2*pi,200);
y1 = cos(x) + randn(size(x))/10;
y1smooth = smoothdata(y1);
plot(x,y1,'ro',x,y1smooth,'b-')
Do you see that the smooth misses the data the most in the regions where the curve has high curvature? Think of a smoothing tool as a low-pass filter: it tries to filter out the high-frequency content, leaving behind only the low-frequency content. I am sure I could have made smoothdata do a better job here had I not simply used the defaults, but this exhibits what I want you to see.
A smoothing tool tries to kill off any signal with high curvature, because that is often a symptom of noise. The default method in smoothdata is a moving mean filter, and here it clearly uses a fairly wide moving window. This is a good scheme for very noisy data. (Different filters have different characteristics. For example, moving median filters are great when you have noisy data that is compromised by outliers.)
But your data has a decent signal-to-noise ratio. So you either need to use a narrow window in the moving filter or, better yet, a different method. Personally, I like Savitzky-Golay smoothers for problems with a reasonably strong signal.
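To isolate the curvature effect from the noise, here is a small sketch that smooths the noise-free curve cos(x) with a moving mean. The 21-sample window is an arbitrary choice for illustration, not the default that smoothdata picks. Even without noise, the smoothed curve is pulled toward the middle wherever the curve bends sharply (the peak at x = 0 and the trough near x = pi).

```matlab
% Sketch: a moving mean flattens regions of high curvature, even with no noise.
x = linspace(0,2*pi,200);
ytrue = cos(x);                          % noise-free signal
ys = smoothdata(ytrue,'movmean',21);     % 21-sample window, chosen for illustration

err = ys - ytrue;
[~, iWorst] = max(abs(err));
fprintf('Largest error %.3f at x = %.2f\n', err(iWorst), x(iWorst));

plot(x,ytrue,'k-',x,ys,'b-')
legend('cos(x)','moving mean')
```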
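A sketch of both points: the second output of smoothdata reports the window length it chose, and a moving median with the same window shrugs off outliers that drag a moving mean around. The three outliers injected here are hypothetical, added only for illustration.

```matlab
% Sketch: see which window smoothdata chose, and compare mean vs. median
% filters on data with a few injected outliers.
rng(0);
x = linspace(0,2*pi,200);
y = cos(x) + randn(size(x))/10;
y([50 120 170]) = y([50 120 170]) + 3;   % hypothetical outliers, for illustration

[ymean, win] = smoothdata(y);            % default method is 'movmean'
ymed = smoothdata(y,'movmedian',win);    % same window length, median instead of mean
fprintf('Window length chosen by smoothdata: %g\n', win);

plot(x,y,'ro',x,ymean,'b-',x,ymed,'k-')
legend('data','movmean (default)','movmedian')
```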
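One way to check that trade-off is to measure error against the known noise-free curve over the first and last 15 samples. The narrower window of 9 samples and the 15-sample edge region are arbitrary choices for this sketch. A narrower window tracks the ends better but lets more noise through, so the numbers depend on the noise level.

```matlab
% Sketch: compare the default window, a narrower window, and sgolay,
% measuring RMS error against the noise-free curve near each end.
rng(0);
x = linspace(0,2*pi,200);
ytrue = cos(x);
y = ytrue + randn(size(x))/10;

[ydef, win] = smoothdata(y);              % default movmean window
ynar = smoothdata(y,'movmean',9);         % narrower window; 9 is an arbitrary example
ysg  = smoothdata(y,'sgolay','Degree',3); % Savitzky-Golay, cubic

edgeIdx = [1:15, numel(x)-14:numel(x)];   % first and last 15 samples
rmsEdge = @(ys) sqrt(mean((ys(edgeIdx) - ytrue(edgeIdx)).^2));
fprintf('Edge RMS error: default (w=%g) %.3f, w=9 %.3f, sgolay %.3f\n', ...
    win, rmsEdge(ydef), rmsEdge(ynar), rmsEdge(ysg));
```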
y1smoothSG = smoothdata(y1,'sgolay','degree',3);
plot(x,y1,'ro',x,y1smoothSG,'b-')
Never use too high an order in the Savitzky-Golay smoother. A degree of 3 should be an OK compromise here.
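A quick way to see the effect of the order is to hold the window fixed and sweep the degree, again measuring error against the noise-free curve. The window of 25 samples is an assumption for this sketch; the degree must stay below the window length. Low degrees underfit the curvature, while high degrees start to follow the noise, so the best value depends on your data.

```matlab
% Sketch: Savitzky-Golay error vs. polynomial degree, fixed window length.
rng(0);
x = linspace(0,2*pi,200);
ytrue = cos(x);
y = ytrue + randn(size(x))/10;

w = 25;                                  % fixed window, chosen for illustration
rmsByDeg = zeros(1,5);
for d = 1:5
    ys = smoothdata(y,'sgolay',w,'Degree',d);
    rmsByDeg(d) = sqrt(mean((ys - ytrue).^2));
    fprintf('Degree %d: RMS error vs. noise-free curve = %.4f\n', d, rmsByDeg(d));
end
```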
Rick Yuan on 11 Aug 2022
Edited: Rick Yuan on 11 Aug 2022
Thanks! Using Savitzky-Golay really helped smooth out all regions of my dataset.
The main reason I wanted to smooth the data was to take the time derivative of my temperature data. Applying another smoothing pass to the derivative helped greatly with that too. It's actually really interesting how the derivative of the Savitzky-Golay smooth doesn't jump around until we get into the middle of the dataset. That's probably another artifact of how smoothing algorithms handle the start and end of a dataset.
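A minimal sketch of that smooth, differentiate, smooth-again workflow. The temperature curve, time base, and noise level are all hypothetical stand-ins, since the actual data was not posted; `gradient` with a coordinate vector gives central differences in the interior and one-sided differences at the two ends.

```matlab
% Sketch: smooth, differentiate with gradient, then smooth the derivative.
rng(0);
t = 0:0.5:300;                                   % hypothetical time base, s
Ttrue = 20 + 5*(1 - exp(-t/60));                 % hypothetical heating curve, degC
T = Ttrue + randn(size(t))*0.05;                 % hypothetical sensor noise

Ts = smoothdata(T,'sgolay','Degree',3);          % smooth the temperature
dTdt = gradient(Ts, t);                          % derivative of the smooth, degC/s
dTdt_s = smoothdata(dTdt,'sgolay','Degree',3);   % second smoothing pass

plot(t, gradient(Ttrue,t), 'k-', t, dTdt, 'r.', t, dTdt_s, 'b-')
legend('true derivative','gradient of smoothed T','smoothed again')
xlabel('t (s)'); ylabel('dT/dt (degC/s)')
```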


More Answers (1)

William Rose on 9 Aug 2022
Edited: William Rose on 9 Aug 2022
[edit: change "on one side" part of my description to "inside" and "outside", which is more clear, I hope]
You are not using the wrong function.
x=1:250;
y=12.2+4.5*sin(x*2*pi/1200)+rand(size(x))/10;
plot(x,y,'-b')
Looks kind of like your data. Try smoothing with default options.
y1=smoothdata(y);
hold on; plot(x,y1,'-r')
This has the same problems as your example, only worse. Here's what's happening, I think:
smoothdata by default uses a flat moving-average window centered on each data point. Near the edges, the window is no longer centered: it is truncated on the outside but not on the inside, and at the very first and last points the window extends only inside the data, not at all outside. For a rising curve like this one, that one-sided average pulls the smoothed value up at the left edge and down at the right edge, which is what you see in your example and in the plot above. Try other method options to see if you like one of the others better.
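This sketch makes the one-sided window concrete using `movmean` directly, whose default endpoint handling ('shrink') averages only the samples that exist. It assumes smoothdata's 'movmean' method behaves the same way near the ends, which matches the plot above. The window of 21 samples is an arbitrary odd choice, and the signal is left noise-free so the bias is visible on its own.

```matlab
% Sketch: with the default 'shrink' endpoints, movmean averages only the
% samples that exist, so the window becomes one-sided at the edges.
x = 1:250;
y = 12.2 + 4.5*sin(x*2*pi/1200);   % noise-free, rising on this range
k = 21;                            % odd window: 10 samples on either side
m = (k-1)/2;
ys = movmean(y, k);

% At the first sample the window covers only y(1:m+1), all to the inside.
fprintf('left edge:  y = %.4f, movmean = %.4f, mean(y(1:%d)) = %.4f\n', ...
    y(1), ys(1), m+1, mean(y(1:m+1)));
% At the last sample it covers only y(end-m:end).
fprintf('right edge: y = %.4f, movmean = %.4f\n', y(end), ys(end));

% 'discard' keeps only the points where the full window fits.
ysFull = movmean(y, k, 'Endpoints', 'discard');
```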
William Rose on 9 Aug 2022
Edited: William Rose on 9 Aug 2022
Let's try the other methods. You could also try adjusting the smoothing factor or window width, etc.
x=1:250;
y=12.2+4.5*sin(x*2*pi/1200)+rand(size(x))/10;
y1=smoothdata(y,'movmean'); y2=smoothdata(y,'movmedian');
y3=smoothdata(y,'gaussian'); y4=smoothdata(y,'lowess');
y5=smoothdata(y,'loess'); y6=smoothdata(y,'rlowess');
y7=smoothdata(y,'rloess'); y8=smoothdata(y,'sgolay');
plot(x,y,'k.',x,y1,'-r',x,y2,'-g',x,y3,'-b',x,y4,'-c',...
x,y5,'-m',x,y6,'--c',x,y7,'--m',x,y8,'-y');
legend('raw','movmean','movmedian','gaussian','lowess','loess',...
'rlowess','rloess','sgolay','Location','southeast')
xlim([0 70]); ylim([12 13.6])
At the left edge, shown above, the methods loess, lowess, their robust versions, and sgolay all do much better than movmean, movmedian, and gaussian. At the right edge (not shown), loess, rloess, and sgolay are the best, at least in this example.
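To put numbers on both edges instead of eyeballing plots, this sketch computes the RMS error of each method over the first and last 20 samples against the noise-free curve. It uses zero-mean uniform noise, unlike the rand/10 above, which adds a +0.05 offset that would otherwise show up as error for every method. The 20-sample edge width and the random seed are arbitrary choices, so the ranking may shift with different data.

```matlab
% Sketch: RMS error of each smoothdata method near each edge, measured
% against the noise-free curve.
rng(1);
x = 1:250;
ytrue = 12.2 + 4.5*sin(x*2*pi/1200);
y = ytrue + (rand(size(x)) - 0.5)/10;            % zero-mean uniform noise

methodList = {'movmean','movmedian','gaussian','lowess', ...
              'loess','rlowess','rloess','sgolay'};
nEdge = 20;                                      % samples counted as "edge"
edgeRMS = zeros(numel(methodList), 2);
for k = 1:numel(methodList)
    err = smoothdata(y, methodList{k}) - ytrue;
    edgeRMS(k,1) = sqrt(mean(err(1:nEdge).^2));          % left edge
    edgeRMS(k,2) = sqrt(mean(err(end-nEdge+1:end).^2));  % right edge
end
T = table(methodList', edgeRMS(:,1), edgeRMS(:,2), ...
    'VariableNames', {'Method','LeftRMS','RightRMS'});
disp(T)
```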


Release

R2019b
