# How does one verify the results of simulation from genetic algorithm

19 views (last 30 days)
Kah Wei on 5 Aug 2014
Commented: Geoff Hayes on 9 Aug 2014
Recently I am using genetic algorithm to optimize a 2 by 2 controller.The optimization runs well and I manage to obtain the result.
However from the simulation result it is shown that the mean fitness value does not change much during the simulation as in it is almost a straight line for the whole simulation.I tried to change a few options like crossover,mutation,selection and even increase the population and the range of the initial population but there is no change on how the mean fitness proceed during the simulation.I attached a diagram here for illustration.I read a few examples on MATLAB and usually the way mean fitness proceed is almost like the best fitness value where there is a slight curve in a decreasing manner as we proceed from one generation to another generation.So my questions are below
1)Is it normal for the mean fitness value of my optimization process to behave in such a manner?If it is an indication of bad result,how can I proceed to improve the result of the simulation?
2)How does one confirm or verify that the results from the optimization via GA (genetic algorithm) is good?
Please kindly give me some guidance on this.Thanks.

Geoff Hayes on 5 Aug 2014
Yes, this behaviour is possible (though may not be desirable). It could be that the population has prematurely converged to a sub-optimal solution. So it could be a "good" solution, just not the best one.
What changes to the mutation, crossover, and selection operators did you make? What population sizes have you been using? Because you can modify these parameters to help avoid or delay premature convergence (i.e. increase the mutation in a population).
On multiple runs of the GA, do you always get the same result (does the population always converge to the same solution)? The documentation states that the initial population is randomly generated using a uniform random number generator (doesn't say which one), but it could be that the same set of random numbers is being used for the initial populations at each run. (I suppose you could verify this if you can print out the initial populations.) It may be useful (and it can't hurt!) to seed the random number generators via rng. Use rng('shuffle') which seeds the random number generator based on the current time.
Kah Wei on 6 Aug 2014
Hi Geoff.Nice to talk to you again.
The changes that I made on the options of the genetic algorithm are shown in the program that I attached (calculation_error2.m)
On multiple runs (5 runs to be exact) I am getting almost the same reading for the best fitness and the mean fitness value.I also initiate rng ('shuffle') as recommended by you.The value for best fitness also stay roughly the same during the simulation as shown on the command window from the diagnostic information.
How can I modify the parameters to avoid premature convergence and get the best solution?Is there any scientific way to achieve that rather than trial and error which is time consuming and not efficient?

Geoff Hayes on 6 Aug 2014
Hi Kah,
Five runs without much change could mean either that your code has found the optimal solution or it is just getting stuck.
I noticed that you are using the following for the crossover
options = gaoptimset(options,'CrossoverFraction',0.8,...
'CrossoverFcn',@crossovertwopoint);
According to the documentation (see crossover options), this crossover function works as follows
Two point (@crossovertwopoint) selects two random integers m and n between 1 and Number of variables. The function selects
Vector entries numbered less than or equal to m from the first parent Vector entries numbered from m+1 to n, inclusive, from the second parent Vector entries numbered greater than n from the first parent. The algorithm then concatenates these genes to form a single gene.
For example, if p1 and p2 are the parents [a b c d e f g h] and [1 2 3 4 5 6 7 8] respectively, and the crossover points are 3 and 6, the function returns the following child [a b c 4 5 6 g h].
So while it does produce a child that is different from either parent, i.e. the child has a subset of the variables from both parents, it does not change the variables at all - they still have the same values as the parent variables. This means that the variable values will only change due to a mutation.
I suggest you try the crossoverintermediate, crossoverarithmetic, or crossoverheuristic. All three indicate that the child variable will be some combination of the two from the parents - so it won't necessarily be the same. The first two should ensure (if you leave the default Ratio as is) that the resulting child variable is still within the interval defined for that variable. The crossoverheuristic may not do that unless you set the Ratio to be one or less. It has an interesting algorithm whereby the child variable will be closer to the variable of the parent with the better fitness. So you could try this one first with a ratio of 0.95 say. The default ratio is 1.2 so you could leave as is, though your new child variable value may fall outside of the interval (unless the GA code can be told to enforce intervals).
_____________________________
I also noticed that the intervals for your 11 variables are very narrow. If you look at the difference between the lower and upper bound for each one, we see them as
1.123
3.38
0.999
0.68177
2.476
1.049
0.3739
2.193
0.3287
1.0671
0.691
For the seventh and ninth variables, the difference between their lower and upper bounds are 0.3739 and 0.3287 respectively. I don't know what either variable represents, but I just wonder what would happen if you fix both to something and then reduce your problem to a nine variable optimization problem, would that lead to a better solution?
Try changing the crossover operator to the heuristic one - leave the default ratio of 1.2 and see what happens. Continue to decrease it on subsequent runs to 0.95, 0.85,...,0.45 (maybe no lower than this as then the parent with the worse fitness will have its variables weighted more). If you see no change in the convergence of your solution, then try reducing your problem from 11 variables to 10, to 9 for those with the small intervals.
_____________________________
By the way, how long does it take one run of the GA (so population 3000, 100 iterations/generations)?

Geoff Hayes on 9 Aug 2014
Hi Kah - so you could run 100s of simulations then without it taking too long. Since you always seem to be getting the same solution, I suppose you could try to "penalize" that one in the fitness function so that other optimal solutions may be encouraged to dominate. In the fitness function, you could write some code that would check to see if the 11 variables are close (to within some tolerance) of those of the usual solution. If true, then just increase (double?) the fitness function output.
Kah Wei on 9 Aug 2014
Hi Geoff.
I don't understand what you mean above.Do you mind give me some example to illustrate what you mean?
Today I tried to run multiple simulations by manipulating the rate and ratio for @mutationuniform and @crossoverheuristics respectively and finally I manage to obtain optimized controller that give me good result compared to the default controller. I am running 10 simulations for each case as shown below.
Simulation 1 - @mutationuniform at rate of 0.3 1)@crossoverheuristics at ratio of 0.45 (10 simulations) 2)@crossoverheuristics at ratio of 0.85 (10 simulations) 3)@crossoverheuristics at ratio of 0.95 (10 simulations)
Simulation 2 - @mutationuniform at rate of 0.5 1)@crossoverheuristics at ratio of 0.45 (10 simulations) 2)@crossoverheuristics at ratio of 0.85 (10 simulations) 3)@crossoverheuristics at ratio of 0.95 (10 simulations)
I manage to get an optimized parameters that will improve the performance of the controller in the performance criterion that I am researching.The optimized controller is obtained when the rate for @mutationuniform is 0.5 and the ratio for @crossoverheuristics is 0.95 at 4th simulation.
The value for best fitness value and the mean fitness value are still quite close though.
:)
Geoff Hayes on 9 Aug 2014
Awesome that you got some good results, Kah!
For the "penalty", I had meant something along the lines of: suppose that x1,x2,x3,…,x11 are the optimized values for each of the 11 values. If you pass these 11 variables into the fitness function, the scalar result is y (say). And you know all of this from your previous runs/simulations. Now suppose that a parent or child has 11 variables that are considered near-identical to x1,x2,…,x11 i.e. if x is the input vector to the fitness function, then within this function do
tolerance = 0.000001;
if abs(x1-x(1))<tolerance && ...
abs(x2-x(2))<tolerance && ...
%etc.
abs(x11-x(11))<tolerance
% the input variables are considered near-identical to the
% optimal solution found on previous runs, so just double
% the fitness function value
outFitnessVal = 2*y;
end
So now, this parent or child has a fitness function value that is double what it should be, and so it may be discouraged from "reproducing" and so another solution may become optimal.
Since your results improved with varying the mutation rate and the crossover ratio, you can now ignore this. Though it would be interesting to see what happens if you run 100s of simulations.
As for The value for best fitness value and the mean fitness value are still quite close though, I think that this is expected - over time, the population will converge to the optimal solution, so the average should start to edge closer to this optimal solution.