Calculating moving standard deviation (bounds) for time series

4 views (last 30 days)
I have a N x 6 matrix and of interest are columns 1 and 5. Column 1 is time data in Excel's serial date format (e.g. 4.264400001156250e+04 = 01-Oct-2016 00:00:00) and Column 5 is Measurement data.
The time data spans across M days and has interval in seconds (not regular), so the value of N is quite large (~40,000).
What I would like to do is this: For a Measurement data on day X, calculate the standard deviation(σ) for that day based on the Measurement data for past Y days. How can I script this out? I can do this in Excel, but due to the sheer number of rows, it tends to get somewhat slow.
  2 Comments
John Chilleri
John Chilleri on 18 Jan 2017
Edited: John Chilleri on 18 Jan 2017
Would you mind attaching the data file?
I'd prefer checking my code after running it with your data to ensure its correctness.
Also, do you want the resulting standard deviations stored as their own column-vector variable, written to a csv file, or printed to the command window?
Or if you'd prefer, I can outline the process of how to code this and let you do it yourself!
BenL
BenL on 19 Jan 2017
Hi John, sure I have attached the data file. Column-vector variable would be great actually. Looking forward to your advice regarding this. Thank you

Sign in to comment.

Answers (1)

John Chilleri
John Chilleri on 19 Jan 2017
Hello,
I created the following code which should do as you wanted. An explanation of the code is below.
%
% function standard_deviations = pickfcnname(data, Y)
%
% Inputs:
%
% data N x 6 matrix where column 1 is time, column 5 is measurements
% Y Number of days (if available) to use in std calculation
%
% Outputs:
%
% stdevs column of length N containing associated standard deviations
%
function stdevs = pickfcnname(data, Y)
%
% Initialize
N = size(data,1);
stdevs = zeros(N,1);
%
% Note: the first few standard deviations will be computed from few points
% and will thus be weird.
stdevs(1) = 0; % NaN probably more truthful
stdevs(2) = 0;
%
Dates = floor(data(:,1));
Values = data(:,5);
for i = 3:N
%
% Acquire starting point
if (Dates(1) < Dates(i)-Y)
j = 1;
while (Dates(j) ~= Dates(i)-Y)
j = j+1;
end
lbound = j;
else
lbound = 1;
end
%
% Compute standard deviation from (starting point:previous point)
stdevs(i) = std(Values(lbound:i-1));
end
end
Explanation:
The code takes in the N x 6 data and an integer Y number of days (i.e. Y = 3 uses the past 3 days to compute the standard deviation).
The code sets the first two standard deviations to 0 as there aren't enough data points to compute a standard deviation. You should also recognize that the first few points after that have standard deviations made from few points.
The code then loops from the third to final data point, setting a lower bound lbound at Y days prior if possible, 1 otherwise (for initial stds). The standard deviations are then computed using the Y days ago to previous point's measurements and stored in the output vector stdevs, which has the same number of rows as your input data.
Hope this helps!
  2 Comments
BenL
BenL on 23 Jan 2017
Edited: BenL on 23 Jan 2017
Dear John
Sorry for the delay in response. The code runs well. Have a question if you don't mind?
Assuming I have data from Day 1,Day 2, Day 3, Day 4, Day 5, and if Y = 3, then technically the standard deviation values should only come from Day 4 onwards right? (there isn't enough Y days for Days 1, 2 and 3).
I understand that you set the first 2 standard deviations to 0 and seek the starting point using j and lbound. Would it be possible to simply use :
for i = Y+1:N .... end
Sorry I am quite new at this
John Chilleri
John Chilleri on 23 Jan 2017
Edited: John Chilleri on 23 Jan 2017
Yes, you can definitely start at 3 days delayed, which would require a check for which point indicates the fourth day.
As it is now, it computes the standard deviation based on what's available, so on day 3 it uses days 1 and 2 only since those are all that are available.
If you want to make it so no standard deviations are computed until day 4, we can change that pretty easily:
start = 1;
while (Dates(start) < Dates(1)+3)
start = start + 1;
end
for i = start:N
% etc
The above finds the first data point corresponding to day 4 and labels that as start, where the for loop can begin.
I believe that would do the job!

Sign in to comment.

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!