EE6227 - Genetic Algorithms: Overview, Optimization Strategies, and Mathematical Theorems


Algorithms Covered

GAs, PSO, constraint handling, evolution strategies, differential evolution, multi-objective algorithms, and multimodal optimization.

Principles of Evolutionary Algorithms

  • Proper encoding is necessary.
  • There is a certain index that quantifies the fitness of every individual.
  • The selection of parents and the reproduction of parents are to some extent random.
  • The population of solutions improves as the number of generations increases.

Introduction to Optimization

The problem can be described as:

Minimize $f(x)$ over $x \in \Omega$

$\Omega$ is the permissible region of the decision variables, which is defined by constraints.

A discrete solution can be considered an approximate solution to the continuous problem. For example:

Minimize $f(x) = x^{2}$ where $-1 \le x \le 1$

The value of $x$ can be discretized, for example using 32 bits plus a sign bit, so in total there are $2^{33}$ discrete solutions.
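As an illustration (my own sketch, not from the course material), one possible decoding of such a bit string, assuming a sign bit followed by 32 magnitude bits:

```python
def decode(bits):
    # bits: 33 ints (0/1); bits[0] is the sign bit and bits[1:]
    # encode the magnitude, scaled into [0, 1].
    magnitude = int(''.join(map(str, bits[1:])), 2) / (2**32 - 1)
    return -magnitude if bits[0] == 1 else magnitude

x = decode([1] + [0]*31 + [1])  # a candidate solution slightly below zero
print(x, x**2)                  # the solution and its objective value
```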

However, this is not efficient, because such problems are likely to require a large number of enumerations. Consequently, real-parameter evolutionary algorithms are much better suited.

Global Optimum vs. Computational Cost

Due to the computational time required, in most cases it is impossible and unnecessary to find the global solution. Algorithms such as the GA or simulated annealing can be used because these approaches make use of randomness to escape from local optima. However, no method can guarantee finding the global optimal solution (especially for complex multimodal problems).

Nonlinear Programming Problems

  • Can in principle be solved by methods of calculus, but it is difficult to solve these problems analytically.
  • The most popular method is sequential quadratic programming (SQP).
  • Iterative numerical algorithms are the practical way to obtain acceptable solutions.
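For instance, the toy problem above can be handed to an SQP-type solver; a minimal sketch using SciPy, whose `SLSQP` method is a sequential least-squares QP variant:

```python
from scipy.optimize import minimize

# Minimize f(x) = x^2 subject to -1 <= x <= 1, starting from x0 = 0.7.
result = minimize(lambda x: x[0]**2, x0=[0.7],
                  method='SLSQP', bounds=[(-1, 1)])
print(result.x, result.fun)  # approximately [0.] and 0.0
```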

Properties of GAs

  • A simple way to obtain acceptable solutions.
  • Do not require the search space to be continuous, differentiable, or unimodal.
  • Use non-deterministic rules, which allows them to escape from local optimal solutions.
  • Work with the objective function alone, requiring neither gradients nor other auxiliary information.
  • GAs usually operate on coded variables instead of directly on the decision variables themselves; they perform better when the decision variables are binary.

Simple Genetic Algorithm (SGA)

  • Process of SGA

    1. Encoding & Initial Population
    2. Selection (Roulette Wheel)
    3. Crossover (1-point)
    4. Mutation
    5. Iteration
  • Simple Example

    Mutation example: if a random number drawn for a bit is less than the mutation probability, flip that bit (0→1 or 1→0).

    (The slides illustrate this process with a figure, omitted here.)

  • Things to note

    • Due to the nature of the roulette wheel, the fitness of each individual must be positive; otherwise, a linear translation should be applied to make all fitness values positive.

    • The mutation probability should be set so that in every generation there is, on average, at least one mutated bit.
    • The total fitness and average fitness can be used to quantify the performance of a generation.

The computer implementation is presented clearly and simply in Suganthan's slides, and implementations can be found all over GitHub, so it is not covered in detail in this material.
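Still, a minimal sketch of one SGA generation may be useful for orientation (my own illustration of steps 2-4 above, assuming binary encoding and a positive fitness function; all names are illustrative):

```python
import random

def roulette_select(pop, fitness):
    # Spin the roulette wheel once; fitness values must be positive.
    total = sum(fitness)
    r = random.uniform(0, total)
    acc = 0.0
    for ind, f in zip(pop, fitness):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

def one_point_crossover(p1, p2):
    # Cut both parents at the same random point and swap the tails.
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(ind, pm):
    # Flip each bit independently with probability pm.
    return [1 - b if random.random() < pm else b for b in ind]

def sga_generation(pop, fitness_fn, pc=0.9, pm=0.01):
    # One generation: selection, crossover, mutation, full replacement.
    fitness = [fitness_fn(ind) for ind in pop]
    next_pop = []
    while len(next_pop) < len(pop):
        p1, p2 = roulette_select(pop, fitness), roulette_select(pop, fitness)
        c1, c2 = one_point_crossover(p1, p2) if random.random() < pc else (p1[:], p2[:])
        next_pop += [mutate(c1, pm), mutate(c2, pm)]
    return next_pop[:len(pop)]
```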


Fitness Scaling

Motivation: In early generations, some highly fit individuals may dominate the selection and lead to premature convergence.

  • Linear Fitness Scaling

    Scaled function: $g = af + b$

    where $f$ is the fitness function, and $a$ and $b$ are parameters that can be calculated by:

    Maintaining the average fitness before and after scaling: $g_{avg} = f_{avg}$

    A rule of thumb: $g_{max} = C f_{avg}$; for example, to allow the best individual at most 2 expected copies in the mating pool, set $C = 2$.

    Then: $a = \frac{f_{avg}(C-1)}{f_{max}-f_{avg}}$, $b = f_{avg}(1-a)$

    However, this scaled function may produce negative values in later generations.

    One solution is to modify it to: $a = \frac{f_{avg}}{f_{max}-f_{avg}}$, $b = f_{avg}(1-a)$

    Fitness scaling is applied only during the selection step.

  • Sigma Truncation

    This method can be used before linear scaling.

    $f' = f - (f_{avg} - c\sigma)$

    $\sigma$ is the standard deviation of the population fitness, and $c$ is a scale factor usually set around 2-3.

    Any negative value of $f'$ is set to zero.

  • Power Law Scaling

    $g(f) = f^{\alpha}$

    A rule of thumb:

    • If $0 \le f \le 1$ and you wish to increase the separation, then choose $0 < \alpha < 1$.
    • If $f > 1$ and you wish to increase the separation, then choose $\alpha > 1$.

    Reduce the separation in early generations and increase the separation when approaching termination. (A code sketch of these scaling schemes follows this list.)
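Here is the promised sketch of the three scaling schemes (my own illustration; the default parameter values and the uniform-population fallback are assumptions):

```python
def linear_scaling(fitness, C=2.0):
    # g = a*f + b, keeping g_avg = f_avg and setting g_max = C * f_avg.
    f_avg = sum(fitness) / len(fitness)
    f_max = max(fitness)
    if f_max == f_avg:               # uniform population: nothing to scale
        return list(fitness)
    a = f_avg * (C - 1) / (f_max - f_avg)
    b = f_avg * (1 - a)
    return [a * f + b for f in fitness]

def sigma_truncation(fitness, c=2.0):
    # f' = f - (f_avg - c*sigma), with negatives clipped to zero.
    n = len(fitness)
    f_avg = sum(fitness) / n
    sigma = (sum((f - f_avg) ** 2 for f in fitness) / n) ** 0.5
    return [max(0.0, f - (f_avg - c * sigma)) for f in fitness]

def power_law_scaling(fitness, alpha=1.5):
    # g = f**alpha; choose alpha to widen or narrow the separation.
    return [f ** alpha for f in fitness]
```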

Improvement Strategies

  • Selection

    Motivation: In early generations, individuals with high fitness may be selected too frequently, so that they dominate the population.

    • Stochastic Universal Selection

      Fix n equally spaced points on the wheel's background and rotate the wheel once to obtain n individuals, where n is the population size.

    • Tournament Selection

      Each time, 2 (or more) individuals are randomly selected and their fitness values are compared. The best one is included in the mating pool, and all the individuals are returned to the population.

      Repeat this process until the mating pool is full (see the sketch after this list).

    There is an error in the Ranking Selection material in Suganthan's slides, so it is not covered here.

  • Crossover

    • 2-point Crossover

      Randomly select 2 crossover points instead of 1, which can combine more features of the individuals.

      For example: the parents are 110000 and 001111, and the crossover points are after positions 2 and 4.

      Exchanging the segment between the two points gives the offspring 111100 and 000011.

    • Uniform Crossover

      For each bit position, the offspring randomly takes the value from one of its parents. For example, the parents are 110000 and 001111.

      One of the offspring may be: 010101.

  • Population Replacement

    Motivation: In the SGA, the whole generation is replaced by the offspring; however, the offspring may be worse than their parents. This replacement strategy is called non-overlapping.

    • Elitism

      Copy the best (or a few of the best) parents into the next generation to replace the worst (or some of the worst) children.

    • Steady-State Reproduction

      Only replace a few members in every generation instead of replacing all the individuals.

    • $\mu + \lambda$ Selection

      The next generation's parents are chosen from the combined pool of $\mu$ parents and $\lambda$ offspring.

      If only a very small number of parents survive in this way, it behaves like the elitist strategy.
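As referenced above, a minimal sketch of tournament selection, uniform crossover, and elitist replacement (my own illustration; parameter names such as `n_elite` are assumptions):

```python
import random

def tournament_select(pop, fitness_fn, k=2):
    # Compare k randomly chosen individuals and return the best;
    # all contestants are returned to the population afterwards.
    return max(random.sample(pop, k), key=fitness_fn)

def uniform_crossover(p1, p2):
    # Each bit of the offspring comes from either parent with equal probability.
    return [random.choice(pair) for pair in zip(p1, p2)]

def elitist_replace(parents, children, fitness_fn, n_elite=1):
    # Copy the n_elite best parents over the worst children.
    elite = sorted(parents, key=fitness_fn, reverse=True)[:n_elite]
    kept = sorted(children, key=fitness_fn, reverse=True)[:len(children) - n_elite]
    return kept + elite
```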

Mathematical Analysis of SGA

  • Schemas

    The wildcard character '*' is used to represent a subset of individuals (strings).

    *101 represents the subset {1101, 0101}

    If the string length $l$ is 5, then there are $3^{5}$ different schemas in total (for binary coding).

  • Order of Schema

    The order of a schema $H$ is denoted by $o(H)$ and indicates the number of fixed positions in the string.

    E.g. o(011***) = 3, o(1*****) = 1

  • Defining Length of Schema

    The defining length is the distance between the first and last fixed positions, denoted $\delta(H)$.

    E.g. δ(1**1) = 4 − 1 = 3, δ(011*11*) = 6 − 1 = 5

  • Survival & Propagation of Schema

    $m(H,t)$ denotes the number of instances of schema H that exist in the population at generation t.

    101* is more likely to be destroyed than 1*** in the evolution process, since it has more fixed positions.

    1***1 is more likely to be destroyed than 11*** in the crossover process, since it has a longer defining length.

    Theorem 1: Short, low-order schemas with above-average fitness receive exponentially increasing numbers of instances in subsequent generations. Formally (the schema theorem): $m(H,t+1) \ge m(H,t)\,\frac{f(H)}{\bar{f}}\left[1 - p_c\frac{\delta(H)}{l-1} - o(H)\,p_m\right]$, where $f(H)$ is the average fitness of the instances of H, $\bar{f}$ is the population average fitness, and $p_c$, $p_m$ are the crossover and mutation probabilities.
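To make these quantities concrete, a small sketch (my own, not from the slides) computing the order, defining length, and the schema-theorem survival bound for a schema given as a string:

```python
def schema_order(h):
    # o(H): number of fixed (non-'*') positions.
    return sum(c != '*' for c in h)

def schema_defining_length(h):
    # delta(H): distance between first and last fixed positions.
    fixed = [i for i, c in enumerate(h) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def survival_bound(h, pc, pm):
    # Schema-theorem survival factor under 1-point crossover and
    # bitwise mutation; (1 - pm)**o(H) ~ 1 - o(H)*pm for small pm.
    l = len(h)
    crossover_term = 1 - pc * schema_defining_length(h) / (l - 1)
    mutation_term = (1 - pm) ** schema_order(h)
    return crossover_term * mutation_term

print(schema_order('011***'))          # 3
print(schema_defining_length('1**1'))  # 3
print(survival_bound('101*', pc=0.9, pm=0.01))
```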

Note: The author will introduce the bandit problem in more detail later, with the PSO algorithm, since it is less important in this section.

  • GA and Bandit Problem

    Just as in the k-armed bandit problem, we allocate exponentially increasing numbers of trials to the best schema.

    In general, for schemas of order $j$ and string length $l$, there are $C^{l}_{j}$ different $2^{j}$-armed bandit problems.

    E.g. with $l = 7$ and $j = 3$, there are $C^{7}_{3} = 35$ different 8-armed problems.

  • Building Block Hypothesis

    As stated in Theorem 1, short, low-order schemas with above-average fitness are called building blocks.

    The number of building blocks increases exponentially, so blocks with higher fitness are obtained over time.

    Hypothesis: through the genetic operators, building blocks combine into longer, higher-order, higher-fitness schemas and finally yield the optimal solution.

  • GA-Deceptive Problems

    If the short, low-order schemas do not contain the global optimal solution, the algorithm will converge prematurely; such a problem is called a deceptive problem.

  • Minimal Deceptive Problem (MDP)

    No order-1 problem is GA-deceptive. The smallest possible GA-deceptive problem is an order-2 problem.

    Consider the 4 order-2 schemas sharing the same 2 fixed positions:

    [0**0*] with fitness f(00)

    [1**0*] with fitness f(10)

    [0**1*] with fitness f(01)

    [1**1*] with fitness f(11)

    We take $f(11)$ to be the global optimum.

    There are two situations:

    Type 1: $f(01) > f(00)$

    Type 2: $f(01) < f(00)$

    For Type 1, the GA can almost always reach the global optimum, while for Type 2 the GA sometimes reaches only a local optimum.

    (The numerical results are shown as figures in Suganthan's slides and are omitted here.)
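As a quick illustration (my own sketch, assuming the standard MDP setup where deception means the order-1 schema 0* beats 1* on average):

```python
def classify_mdp(f00, f01, f10, f11):
    # Assumes f(11) is the global optimum, as in the setup above.
    assert f11 == max(f00, f01, f10, f11)
    # Deception: the order-1 schema 0* beats 1* on average,
    # pulling the GA away from the optimum at 11.
    deceptive = (f00 + f01) / 2 > (f10 + f11) / 2
    mdp_type = 1 if f01 > f00 else 2
    return deceptive, mdp_type

# Example: a Type-2 deceptive instance.
print(classify_mdp(f00=1.0, f01=0.9, f10=0.5, f11=1.1))  # (True, 2)
```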



