How do I explain the random functions to my professor?

Question

Young Chan Jung on 17 Jun 2019

Commented: Walter Roberson on 18 Jun 2019

OK, so my work usually does not require MATLAB, but in one case I had to generate random values from several distributions, such as normal, lognormal, and Weibull.

I assume all of the distribution functions that generate random values work the same way.

I am to generate 3 random arrays. For example:

A=normrnd(3,10,[1,5])
B=normrnd(30,100,[1,5])
C=normrnd(300,1000,[1,5])

So I get three 1x5 vectors named A, B, and C.

Everything was going well until my professor, who does not know anything about MATLAB or coding but claims to be an expert, heard something from someone about the random functions and is now challenging me with claims that he cannot explain. The sad part is that his words might have a point, since he probably did hear this from someone who knows MATLAB.

He said he does not know how MATLAB generates random values with the rand functions, but that having the random function pick all 5 values at once ([1,5] in the example above) is different from picking 1 value at a time, 5 times.
He also said that picking one value each from A, B, and C, 5 times over, is different from picking all 5 from A, then moving on to B, then C. This part does not make sense to me, since I know MATLAB finishes generating the vector for A before going on to B and C.

For the first part, in other words, he is saying that if the random function picks one value at a time, 5 times, each generated value is affected by the previously generated one. I asked him whether he was talking about a permutation/combination situation, but he said it is more complicated than that.

This professor is the guy who says Minitab generates random values the 'right way', not because he knows the generation algorithm and process, but because that program is made strictly for statistical analysis, unlike MATLAB. I told him I would generate the values in Minitab as he likes and then use them in MATLAB, but he said 'That is not the researcher's way of doing things'...

I tried to look at the documentation for the rand functions, but honestly I cannot understand it.

Can anyone help me convince this professor so I can just use these functions?

8 Comments

Rik on 18 Jun 2019

"That is not the researcher's way of doing things"

This is not something he should have said. You should be using the correct tool for each job. If he is convinced Matlab has a wrong method of generating random values, then you can use another tool to generate those values. If the later steps in your analysis are easier/faster in Matlab, using it should be fine.

He could make an argument that using Matlab is off-limits for a researcher, because it is closed-source, but so is Minitab.

It is good research practice to always be able to explain what you're doing, with what tool, and why. There is no reason why mixing and matching programs would be bad (apart from the pipeline being a bit more difficult to understand).

This argument only suggests he doesn't understand Matlab or the way you're using it. "that's not how we do things round here" is an attitude that is hostile to good research when applied incorrectly. That sentiment should only apply to research ethics. In every other case there should be an underlying argument explaining why it would be a bad idea to do something in a certain way.

In short, that is a non-argument. He should not use it.

Bjorn Gustavsson on 18 Jun 2019

1. To make John's comment more exact: in a "meaningless" way the professor is not wrong, as Walter explains below.

2. @Rik, I think that might be an allowed thing to say, in a Socratic way. Since the OP did not grasp the documentation well enough to outline how MATLAB generates random numbers, it is not indefensible to send him/her back to work that out. It might just be a way to force the student to take some steps towards verifying the algorithm design...

Adam Danz on 18 Jun 2019

These higher level topics are my favorite types of conversations to follow here (reaching back into bowl of popcorn). Thanks for posting the question, Young Chan Jung!

Answer 1

John D'Errico on 17 Jun 2019

Edited: John D'Errico on 17 Jun 2019

I wonder if your professor needs to learn a few things. For starters, that Minitab is not doing anything any better than is MATLAB. The values that Minitab generates are not truly random either. Minitab uses a generator that employs a starting seed, that then forms a pseudo-random sequence.

https://en.wikipedia.org/wiki/Pseudorandom_number_generator

Each number is never going to be truly random in any such sequence. Computer random number generators use algorithms for the numbers, so each number from the sequence has properties that are as close to truly random as mathematicians can make them. There are limits to what you can do.

First, let's look at MATLAB. There is NO difference between 5 successive calls to rand and a single call to rand(1,5). The way to show that is to make two sets of calls, each preceded by a reset of the random state to the default.

>> rng('default')
>> rand(1,5)
ans =
      0.81472      0.90579      0.12699      0.91338      0.63236
>> rng('default')
>> rand(1)
ans =
      0.81472
>> rand(1)
ans =
      0.90579
>> rand(1)
ans =
      0.12699
>> rand(1)
ans =
      0.91338
>> rand(1)
ans =
      0.63236
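
The comparison above can also be checked programmatically rather than by eye. This is only a minimal sketch; the variable names allAtOnce and oneAtATime are made up for the example.

```matlab
% Reset the generator, then draw five values in a single call.
rng('default');
allAtOnce = rand(1,5);

% Reset again, then draw the five values one call at a time.
rng('default');
oneAtATime = zeros(1,5);
for k = 1:5
    oneAtATime(k) = rand(1);
end

% Both approaches walk through the same underlying sequence,
% so this displays logical true.
disp(isequal(allAtOnce, oneAtATime))
```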

You can learn about the methods used in MATLAB to generate a pseudo-random sequence in

doc rng

There you will see various random schemes that are offered. In R2019a, you will see the statement:

" In this release, the default settings are the Mersenne Twister with seed 0."

You can choose the scheme employed, IF your professor thinks one of the alternatives is better than the default.
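
As a rough illustration of choosing the scheme, rng accepts a generator name as a second argument. The seed 0 and the pick of 'combRecursive' as the alternative are arbitrary choices for this sketch, not recommendations.

```matlab
% Seed 0 with the Mersenne Twister (the documented default scheme).
rng(0, 'twister');
s = rng;              % structure describing the current generator settings
disp(s.Type)          % name of the generator now in use
x = rand(1,3);

% Same seed, different generator: a different sequence is expected.
rng(0, 'combRecursive');
y = rand(1,3);
disp(isequal(x, y))

% Return to the default settings when done.
rng('default');
```

Whichever scheme is chosen, state it (together with the seed) when describing the simulation, so that someone else can repeat it.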

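As for the second comment, about the order in which you draw for A, B, C, etc.: it is meaningless in any statistical sense. The elements of such a pseudo-random sequence are designed to behave as independent draws, so the "order" is irrelevant to the distribution of each variable.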

However, if you did this in Minitab too? You would see similar behavior. I don't have Minitab. But I can read the documentation. For example, here:

https://support.minitab.com/en-us/minitab/18/help-and-how-to/probability-distributions-and-random-data/how-to/random-data-and-set-base/before-you-start/overview/

In there, we see that Minitab also uses a random seed. In fact, this is good behavior for simulations. For some simulations, you WANT to be able to have a REPRODUCIBLE random sequence, which might seem to be a bit of an incongruity. Such behavior can be a good thing, as long as the sequence itself has the properties you want to see in a pseudo-random sequence.
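
In MATLAB, that kind of reproducibility might look like the following sketch. The seed value 2019 is an arbitrary choice for the example.

```matlab
rng(2019);            % arbitrary seed chosen for this example
run1 = rand(1,5);

rng(2019);            % same seed again
run2 = rand(1,5);

% The two runs match exactly, so this displays logical true.
disp(isequal(run1, run2))
```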

Yes, you can apparently obtain truly random numbers from some sources. But Minitab does not make any such claim. The use of a random seed tells us that the sequence it provides is no better than that provided by MATLAB.

If we look online for true random numbers, I find this as an example:

https://www.random.org

In there, they claim:

"RANDOM.ORG offers true random numbers to anyone on the Internet. The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs."

But what we don't know is how those random numbers behave. Are they TRULY UNIFORMLY random? For example, you can have a random sequence that is not uniform either. In the end, you can trust what they claim, or you can spend the time needed to devise and perform truly extensive tests. Oh well. Take it for what it is worth.

If you are looking for a sequence that is not quite so predictable? Well, you can do a shuffle, AT THE BEGINNING!

rng('shuffle') seeds the random number generator based on the current time.

Do NOT try to shuffle the seed before each call! Some people think they can get a truly random sequence by shuffling before every call to rand. That is a mistake, as it may result in a non-random sequence.
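
A sketch of the intended pattern, assuming you also want to be able to repeat the run later: shuffle once, record the generator settings, and then just keep drawing. The variable names here are made up for the example.

```matlab
% Shuffle ONCE, at the start of the session or script.
rng('shuffle');
saved = rng;          % record the settings so the run can be repeated later

x = rand(1,5);
y = rand(1,5);        % keep drawing; do not reseed between calls

% Restoring the recorded settings replays the same sequence,
% so this displays logical true.
rng(saved);
xAgain = rand(1,5);
disp(isequal(x, xAgain))
```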

Answer 2

Walter Roberson on 18 Jun 2019

Edited: Walter Roberson on 18 Jun 2019

"Also, it will be different from picking one value each from A,B,C for 5 times compared to picking all 5 from A, then moving on to B, then C. Now this part does not make sense, since I know matlab finishes generating vector for A and goes to B and C.

The professor is correct about this, but not in any useful way.

Suppose you have to generate two random numbers for each of two different variables, A and B.

Now suppose that you are using a pseudorandom generator. Pseudorandom generators generate one value at a time and then update the state, and use the new state to generate a new value. Let four generated values in a row be designated as R1, R2, R3, R4.

Now if you generate two values and assign them to A, and then generate two values and assign them to B, then A will hold R1, R2, and B will hold R3, R4. But if you generate one value and assign it to A, then a value and assign it to B, then another value for A, then another value for B, then A will hold R1, R3, and B will hold R2, R4. As this A=R1 R3 and B=R2 R4 is not the same as the previous A=R1 R2, B=R3 R4, then your professor is correct that you get different results in MATLAB depending upon the order you assign the values to the variables.

If you use the same random seed before the two series of assignments, you can see this explicitly:

rng(13576);
A1(1) = rand(1); A1(2) = rand(1); B1(1) = rand(1); B1(2) = rand(1);
rng(13576);
A2(1) = rand(1); B2(1) = rand(1); A2(2) = rand(1); B2(2) = rand(1);
A1, B1
A2, B2

A1 =

0.602986056559806 0.417270312145994

B1 =

0.372524018219478 0.543302974095882

A2 =

0.602986056559806 0.372524018219478

B2 =

0.417270312145994 0.543302974095882

You can see that the exact same values are generated, but the order is different because of how you did the assignments, and so your professor is correct that the order matters.

However, we have to ask whether this is unexpected or a disadvantage compared to the other possibilities.

The exact pseudo-random generator in use would make a difference as to exactly which values are generated in the above code, but would make no difference at all to the pattern of the outcome. Complaining that the order matters is like complaining that, given a queue of values 1, 2, 3, 4, assigning the first two to A and the second two to B gives a different outcome than assigning them alternately. Well, of course it does.

Consider the counterfactual: what would a system need in order for the outcome not to depend on the order in which you did the assignments?
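
The queue picture can be made literal with a small sketch: draw the four values once, then hand them out in the two different orders. The names A_block, B_block, A_alt and B_alt are made up for the example.

```matlab
rng(13576);
R = rand(1,4);              % the "queue" R1, R2, R3, R4

% Block assignment: first two to A, next two to B.
A_block = R(1:2);   B_block = R(3:4);

% Alternating assignment: odd positions to A, even positions to B.
A_alt = R([1 3]);   B_alt = R([2 4]);

% Same four values overall, just handed out differently,
% so this displays logical true.
disp(isequal(sort([A_block B_block]), sort([A_alt B_alt])))
```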

The most direct way that could be true is if the generator only ever returned one value, like if the queue of four values only contained 5, 5, 5, 5, then it does not matter whether you give the first two 5's to A and the second two to B, or if you alternate them. This solution would, of course, promptly fail any reasonable randomness test.

The alternate way that it could happen is if the system could somehow predict the order you are going to use the values in, and then internally hold as many values as needed in order to still generate randomly but somehow the values end up assigned in the same order as if you had assigned sequentially, no matter what order you assign in. This is not something that can be done.

2 Comments

Walter Roberson on 18 Jun 2019

We can also consider the question of true random number generators. We could construct a hypothetical Trand ("true random") function

rng(13576);
A1(1) = Trand(1); A1(2) = Trand(1); B1(1) = Trand(1); B1(2) = Trand(1);
rng(13576);
A2(1) = Trand(1); B2(1) = Trand(1); A2(2) = Trand(1); B2(2) = Trand(1);
A1, B1
A2, B2

but with true random numbers, seeding does not control the output values, so A1(1) would be different than A2(1) even though at that point exactly the same ordering of assignment had been used. It is not useful to complain that the order of assignment matters for true random numbers considering that re-running the exact same code under circumstances as close as you can get to identical would produce different results (otherwise the generation cannot be truly random.)

Walter Roberson on 18 Jun 2019

A=normrnd(3,10,[1,5])
B=normrnd(30,100,[1,5])

Looking at that, we can form the hypothesis that the professor might believe that the different means and standard deviations for A and B imply that the random number generator is used differently in the two cases, and that therefore there could be a subtle order dependence as to exactly what was generated.

Imagine, for example, a situation in which generating for standard deviation 100 "used up" about sqrt(10) ~= 3.16 times more internal random numbers than is the case for standard deviation 10, in order to meet internal numeric tolerances. Suppose for example standard deviation 10 needed 3 internal random numbers to meet tolerances, R1, R2, R3, and that standard deviation 100 needed 10 internal random numbers to meet tolerances, R4 through R13, and then generate a second with standard deviation 10, using up R14, R15, R16. Now compare that hypothetical situation to generating three in a row with standard deviation 10, using up R1 R2 R3, R4 R5 R6, R7 R8 R9 . We can see that in this hypothetical situation, the "third" random number generated is standard deviation 10 in both cases, but is internally derived from different internal random numbers, R14 R15 R16 versus R7 R8 R9.

However.... this scenario is not how normrnd works at all. normrnd calls upon randn() to generate one random number, and multiplies it by the standard deviation and adds the mean. The standard deviation has no effect on how many internal random numbers are "used up". R1 is "used up" for the first one no matter what the standard deviation, R2 is used up for the second one no matter what the standard deviation, R3 is used up for the third one no matter what the standard deviation.
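
A minimal sketch of the scaling just described, written with randn directly so that it runs without the Statistics and Machine Learning Toolbox. The seed 1 is arbitrary, and the claim that normrnd does exactly this internally is taken from the explanation above rather than verified here.

```matlab
mu = 3;

rng(1);
z = randn(1,5);                 % standard normal draws

rng(1);
a10 = mu + 10*randn(1,5);       % standard deviation 10
nextAfter10 = rand;             % next value in the stream

rng(1);
a100 = mu + 100*randn(1,5);     % standard deviation 100
nextAfter100 = rand;

% Undoing the shift and scale recovers the same standard normal draws
% (up to floating-point rounding), whatever the standard deviation.
disp(max(abs((a10  - mu)/10  - z)) < 1e-12)
disp(max(abs((a100 - mu)/100 - z)) < 1e-12)

% The stream has advanced by the same amount in both cases.
disp(isequal(nextAfter10, nextAfter100))
```

All three lines display logical true: the standard deviation is applied after the draw, so it cannot change how many internal values are consumed.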

On the other hand, this situation of each call "using up" exactly one internal random number does not always hold between rand() [uniform random] and randn() [normal distribution]. Some of the random generators use a "ziggurat" algorithm for randn() that can "use up" more than one internal random number to get the proper distribution towards the tail ends. When those random number generators are configured, then

rng(TheSeed); randn(1); A = rand(1);
rng(TheSeed); rand(1); B = rand(1);

can potentially give different results for A and B due to more internal random numbers possibly being "used up" by randn() if the randn() was generating towards a tail of the distribution. MATLAB documents the different random number generator algorithms that are available.
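
To try this yourself, a sketch along the following lines runs the comparison under a few of the generators that rng accepts. The seed and the particular list of generators are arbitrary choices for the example, and no specific outcome is claimed: the result depends on the generator and on how many internal values the discarded randn call happened to consume.

```matlab
theSeed = 13576;                                   % arbitrary seed
gens = {'twister', 'combRecursive', 'multFibonacci'};

A = zeros(1, numel(gens));
B = zeros(1, numel(gens));
for k = 1:numel(gens)
    rng(theSeed, gens{k});  randn(1);  A(k) = rand(1);   % discard one normal draw
    rng(theSeed, gens{k});  rand(1);   B(k) = rand(1);   % discard one uniform draw
    fprintf('%s: A equals B? %d\n', gens{k}, isequal(A(k), B(k)));
end

rng('default');                                    % restore default settings
```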
