Random generator seed for parallel simulation using fast restart

I'm going to perform a monte carlo simulation by simulate a simulink model thousands of times. The model are built by blocks from SimEvent-toolbox and some standard blocks.
I use one simulink function block that are calling a MATLAB-function block. In here I try to simulate the break down of sensors during one year of service in the machine (which is the rest of the model).
MATLAB-function:
function [outputPort,log] = failureRate(time,lambda,nSensors)
persistent failed failedCount
if isempty(failed)
failed = false(1,nSensors);
failedCount = 0;
rng('shuffle')
end
% Failure rate function and random variable
failRate = @(t,lam) 1 - exp(-t*lam);
r = rand(1,nSensors);
nBroken = 0;
% Check if sensors break or not
if any(r < failRate(time,lambda))
tmp = failed;
failed = max(failed, r < failRate(time,lambda));
if failedCount < sum(failed)
failedCount = sum(failed);
nBroken = sum(failed - tmp);
end
end
if all(failed)
% All sensors broken
outputPort = 2;
log = [nBroken,time];
else
% Some sensors still working
outputPort = 1;
log = [0, time];
end
end
This function is called each time an entity enters a "Entity Server"-block, which will decide which path it will go (failed sensor path or not).
I can run this model using a normal for loop using fast restart but as soon as I try to use Fast restart for any parallel computing the randomization becomes repeated.
I've tried following:
for i = 1:n
simIn(i) = Simulink.SimulationInput('SensorFailure');
simIn(i) = simIn(i).setPreSimFcn(@(x) rng('shuffle'));
end
but it does not work when Fast restart is actiaved using the parsim command. Also I've tried with and without the "SetupFnc" option in parsim (and all different combinations of these three.
out = parsim(simIn,...
'ShowSimulationManager','off',...
'ShowProgress','on',...
'TransferBaseWorkspaceVariables','on',...
'UseFastRestart','on',...
'SetupFcn',@() rng(randi(9999999)),...
'StopOnError','on');
I also tried using parfor loop but as long as fast restart is active the randomization will be repeated.
parfor i = 1:n
rng('shuffle') %tried with shuffle and randi(10000)
out = sim('SensorFailure','FastRestart','on');
data{i} = out.sensorfailed(out.sensorfailed(:,1) > 0,2);
if mod(i,10) == 0
disp(i)
end
end
Since I'm looking to run thousands of simulation and problably things will need to be redone once the results are analyzed any gain in simulation time is worth some time to investigate!
I've tried using rng('shuffle') and rng(randi(100000)) for all possible situations since I've understood that "shuffle" is not optimal for parallel computing. I've also tried these commands in the simulink models callback (initializing and start callbacks)

 Accepted Answer

After a while I managed to find a solution that solved my problem!
By adding one more input parameter to the matlab function and feeding that input with a constant value-block:
function [outputPort,log] = failureRate(time,lambda,nSensors,seed)
persistent failed failedCount
if isempty(failed)
failed = false(1,nSensors);
failedCount = 0;
rng(seed) % Initiate with seed from constant block outside of function.
end
% Failure rate function and random variable
failRate = @(t,lam) 1 - exp(-t*lam);
r = rand(1,nSensors);
nBroken = 0;
if any(r < failRate(time,lambda))
tmp = failed;
failed = max(failed, r < failRate(time,lambda));
log = [0,0];
if failedCount < sum(failed)
failedCount = sum(failed);
nBroken = sum(failed - tmp);
end
end
if all(failed)
% All sensors broken
outputPort = 2;
log = [nBroken,time];
else
% Still working
outputPort = 1;
log = [nBroken > 0, time];
end
end
Then for each simulation in the SimulationInput-object randomize the seed value as below:
seedBlockPath = 'SensorFailure/Subsystem/Simulink Function/Seed';
for i = 1:n
% Set random seed for each simulation
simIn(i) = Simulink.SimulationInput('SensorFailure');
r = randi(99999);
simIn(i) = simIn(i).setBlockParameter(seedBlockPath,'Value',num2str(r));
end
out = parsim(simIn,...
'ShowSimulationManager','off',...
'ShowProgress','on',...
'TransferBaseWorkspaceVariables','on',...
'UseFastRestart','on',...
'StopOnError','on');

13 Comments

Niklas,
Did you try using the output of a Uniform Random Number block as the input to the Matlab Function? If that works, then (I think) you can keep your Simulink model separated from the base Matla random generators, which may be advantageous. Unless you don't prefer the generator that the block uses.
Setting that aside, if I understand correctly I think the code is trying to model the time to failure of each sensor as an exponential random variable. If that's the case, did you verify that the failure times produced by model follow that exponential distribution across the Monte Carlo runs?
No I never tried using that block because the very thing I wanted was to have different randomization on each run. Since you have to specify the seed for the Uniform Random Number-block it will always give the same result, if I'm not mistaken?
Good point regarding the distribution. See results below. Each graph represents 10 000 simulation runs using a set number of sensors. If one sensor fails the simulation is stopped at that time. So the time to failure decreases as the sensors increases, which is reasonable.
From my experience I would say that the cases with 10 and 20 does represent a distribution from the exponential-family, like a gamma-distribution or so.
For the other two it might be hard to see due to the low failrate. Just plotting the fail rate-function within this time frame gives a konstant slope-graph ( y = x ). What do you think?
But you can specify the seed to the block through simIn just like any other parameter in the model, at least I think you can.
What exactly are those plots? After 10000 runs, how are those plots constructed?
Yes, that's true. But I guess this would result in the same thing? I still take random samples from rand()-function which is a uniform distribution.
Oh sorry, should have mentioned, they are histograms. So each bar represents the amount of times a sensor broke down during a simulation. Then you need to replace this sensor.
I was just curious if using that block would solve the original problem with parsim/fastrestart. If it does, it might or might not have advantages.
As for the histograms, I'm stil not sure what to think. What I do recommend, if you haven't done it already, is run the simulation with a single sensor and stop the simulation when the sensor fails. Plot the hisogram of the simulation stop time and compare it to the pdf of an exponential random variable (check exppdf() ).
I think that the function failureRate is being called at a fixed time step in the simulation and I assume that the time input is time since simulation start, in which case the function failRate (which isn't a failure rate, it's really a probability of failure) doesn't look right. But maybe I'm misunderstanding the code.
What you recommend is exactly what I've done with the graphs you see. The upper-left graph is run with one sensor only. The distribution there does have a similar look as the exponential functon I use, since I've turne the equation around (by 1 - exppdf() instead of just exppdf()) it does get a logarithmic apperance. It is however not perfect but for the current use case it works fine!
And that is true, the name FailRate is wrong, since it was the fail rate I had when I started develop this function I just gave it some name.
Hi Niklas,
If you're satisfied with what you have, then I guess no further discussion needed.
What equation have you turned around? What exactly does 1 - exppdf() mean? Or did you mean 1 - expcdf()? Is the "time" input argument to the function time since t = 0? Or is it the time since the previous call to failureRate()?
Hi again!
You are very well informed or wise my friend! My understanding is now that the function I use and are supposed to use is the CDF of the probabilty. Which would be the integral of the probability distribution, right?
Therefore would it make sens that I compare that function at each time step of the simulation, since this will increase the risk of a sensor break down as the time goes. The function I was talking about in the pevious post is seen in the graph below.
Why I used this function:
I have used a fail rate density function (for when you have a constant fail rate) as below:
and the primitive function (integral) of this is the one used in the graph below.
Do you understand and would you agree?
I understand that you are modeling the time-of-failure (T) as a random variable with an exponential distribution. Its CDF is
P(T < t) = F(t) = 1 - exp(-lambda*t), for t>=0, and 0 otherwise. The parameter lamda is constant and is the failure rate. The expected value of T is E(T) = 1/lambda.
Its PDF, the derivative of the CDF, is
f(t) = lambda*exp(-lambda*t), for t>=0, and 0 otherwise.
So the question becomes, how does one generate the failure time, T, in a simulation such that across many simulations T has an exponential distribution. Assuming (and this is a very important assumption) that the function failureRate() is called with a constant sample time, the code below emulates the process in the simulation and workflow as I understand it.
lambda = 0.5; % example failure rate
nruns = 1000; % number of Monte Carlo runs
dt = 0.01; % fixed sample time of failurRate()
failuretime = nan(nruns,1); % pre-allocate results
rng(100); % for repeatability
for ii = 1:nruns % Monte Carlo loop
% initialize simulation
time = 0;
failed = false;
while ~failed % run simulation
failed = rand < 1 - exp(-lambda*time); % the test
if failed
failuretime(ii) = time;
else
time = time + dt;
end
end
end
% check the mean time of failure. Should be 1/0.5 = 2
mean(failuretime)
ans = 0.1753
% plot the histogram and compare to theoretical pdf
figure
histogram(failuretime,'Normalization','pdf')
hold on
tvals = 0:.01:8;
plot(tvals,exppdf(tvals,1/lambda))
xlim([0 2])
We see that the results of the simulation do not match expected value nor the desired pdf. The reason is that the test for failure is using time since t = 0. However, at each time step where the item is determined to have not failed, the probability "resets," which is a property of the exponential random variable. So the test needs to change as in the following code:
failuretime = nan(nruns,1); % pre-allocate results
rng(100); % for repeatability
for ii = 1:nruns % Monte Carlo loop
% initialize simulation
time = 0;
failed = false;
while ~failed % run simulation
failed = rand < 1 - exp(-lambda*dt); % the modified test, use dt
if failed
failuretime(ii) = time;
else
time = time + dt;
end
end
end
% check the mean time of failure. Should be 1/0.5 = 2
mean(failuretime)
ans = 2.0654
% plot the histogram and compare to theoretical pdf
figure
histogram(failuretime,'Normalization','pdf')
hold on
tvals = 0:.01:8;
plot(tvals,exppdf(tvals,1/lambda))
Now we have a good match between the simulation and the theoretical results (increase nruns to make it match even better). Note that failuretime(ii) is not really the time of failure, it's really the trailing edge of the window, defined by failuretime(ii) - dt to failuretime(ii), in which the failure occurred. So this approach is really an approximation because it's not telling you whent the failure occurred. But it might be a reasonable approximation for your application depending on the value of dt relative to failure rate.
Another option that may be applicable (depends on what you're actually simulating) is to avoid the dynamic computation of failuretime in the simuation altogether. Instead, compute the failure times a priori with exprnd() and use them as input parameters to the simulation.
You have a valid point! My mean value does not corresponde to the inverse of my lambda. Your very important assumption however does not stand in my simulation as I've used the variable discrete solver. I will come back when I've tried both the sample time and changing the T to dt.
Thank you so much for your help and involvement!
The main problem for getting the mean value correct was just as you said, the assumption that the function is called at each time step and changing the time to dt instead of the current time.
Before was it only called once an entity arrived at a "Entity Server"-block, which was far from each time step by natural cause. I just moved out the MATLAB-function and let it run "freely" in the simulation as well as change the solver to a fixed step solver.
Se result below. I would say it is reasonable that using 10 sensors instead of 1 would increase the risk of failure since more sensors are at risk of breaking at each time step while the machine is running. The reason I do this is due to my application. The Fail Rate units is [1/hours] and MTBF in [hours]
FYI: When I changed the simulation to run the function at each timestep the simulations of course run slower but also running multiplie simulation runs (like 1000 simulation for 1 sensor and a certan fail rate, then another 1000 simultion with 10 sensors and a certain fail rate) were not possible for my computer. To fix it I had to turn of the command
'TransferBaseWorkspaceVariables'
and just input all variables using SimulationInput commands. Since the saved data increased drastically due to all the simulation steps I guess I ran out of memory.
Upon further thought, I'd like to change my very important assumption to be that the function failureRate() is called once per time step. So the variable step discrete solver should work fine as long as
the function failureRate keeps track of (or gets an input that is) the dt on each call, and
the dt on every time step is small enough so that the model still achieves a good approximation to the exponential distribution.
I guess that would work too if, as you say, the dt inbetween the calls are small enought. For my case I dont think it is.
Either way, thank you very much for your time and expertise!

Sign in to comment.

More Answers (1)

I do not know the solution to your problem, but I just want to mention that I faced a problem with SimEvents and parsim() with fast restart a while ago in R2020a, and I wrote about it here.
They reported it as a bug report here.
But the problem persists in R2021a. Maybe your problem relates to this bug.

1 Comment

That bug reports indicates the problem was fixed in R2020b. If you can reproduce your problem in R2020b (or later), then something else must be going wrong - please contact MathWorks support with your reproduction steps.

Sign in to comment.

Categories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!