I have noticed this too. Even if you call rng("shuffle") only once instead of in every generation, it will still be slower than rng("default"). Of course, depending on your code, the difference might be unnoticeable.
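If you want to check how much the difference matters on your machine, a rough timing sketch like the one below may help. The call count `nCalls` and the variable names are made up for illustration, and the numbers will vary with MATLAB version and hardware.

```matlab
% Rough comparison of the cost of reseeding (illustrative only).
nCalls = 1000;                 % hypothetical number of repetitions

tic;
for k = 1:nCalls
    rng("shuffle");            % reseed from the current time
end
tShuffle = toc;

tic;
for k = 1:nCalls
    rng("default");            % reset to the default generator and seed
end
tDefault = toc;

fprintf("shuffle: %.4f s, default: %.4f s\n", tShuffle, tDefault);
```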
But... we do not know the nature of your iterations. If you are using parallelization (e.g. parfor), then you need to call rng("shuffle") right after the parfor definition line, i.e. as the first statement inside the loop body. Otherwise, after each individual loop iteration is done, it goes back to rng("default").
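A minimal sketch of the parfor case, assuming a loop that just draws one random number per iteration (`nIter` and `vals` are placeholder names, not from your code):

```matlab
nIter = 8;                     % hypothetical number of iterations
vals = zeros(1, nIter);

parfor k = 1:nIter
    rng("shuffle");            % reseed inside the loop, right after the parfor line
    vals(k) = rand;            % draw from the reseeded generator
end

disp(vals);
```

Since rng("shuffle") seeds from the current time, it may be worth checking that iterations which start very close together do not end up with identical values.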
If your iterations are not using parallelization, then you should be fine just calling rng("shuffle") once at the beginning of your code.
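For the non-parallel case, that could look something like this (again, `nIter` and `vals` are placeholder names used only for the example):

```matlab
% Seed once from the clock, before any random numbers are drawn.
rng("shuffle");

nIter = 5;                     % hypothetical number of iterations
vals = zeros(1, nIter);
for k = 1:nIter
    vals(k) = rand;            % each draw continues the same shuffled stream
end

disp(vals);
```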