Monday, May 4, 2015

Particle Swarm Optimization in F#

Why is particle swarm optimization good?

Lets say you have a function that takes an array of inputs and produces a single output. Your have an objective, you want to find what input results in the lowest possible output for this function. If the function were differentiable you could just compute the gradient and follow it back to reach your objective, but in this hypothetical scenario it's not. If the range of inputs were quite small you could just search the entire input space to find the best output, but again not in this scenario. If you've reached this point, this is when particle swarm optimization(referred to as PSO from here on) might be a good idea.

When would you use it?

One example might be a function runs a simulation of how an aircraft flies. The inputs are the wing span, tail length, different kinds of engines that can be used, etc. The output is a how much fuel the aircraft uses in the simulation. Or perhaps you could have functions simulating an economic model.

What is it?

In PSO you have many "particles" which will each execute the function with their own set of parameters for it. After each execution, each particle will look at the result they got and attempt to find better input parameters. They do this by both looking at there own best result, attempting to move towards that, and also the best result of all the other particles. In this way the particles search through the input space but in such a way that they prioritize successful areas. Also with there being many particles searching each with different sets of parameters, you hope to avoid problems of local minima.

 velocity = inertia_weights * previous_velocity  
         +constant1*random1*(local_best_parameters-parameters[time-1])   
         + constant2*random2*(global_best_parameters-previous_parameters)  
 parameters[time] = parameters[time-1]+velocity[time]  

Here inertia_weight is just an array of number, all most likely between 0 and 1 used to dampen the previous velocity, to stop the particles heading off into the middle of nowhere. Also often velocity is constrained by a maximum value. constant1 and constant2 are used to control how much the particle is affected by the local and global bests. Normally they are set to between 1 and 2.

Is there any intuitive way of understanding this?

The thing that works for me is imagining a collection of bugs, one for each particle. Each bug is trying to find some food in an area. They can't see the food, but there sense of smell tells them how close to it they are, this is the loss function and there position is the parameters for the function. Imagine all the bugs spread out in the area, all moving around it. Each iteration they reconsider what direction they are going? Based on there smell they consider is my current position better than the best position I have ever been if it isn't they adjust there direction to move towards that. But these bugs are also social creatures and have there hive mind, so they also consider how does my current position compare to the best position any of the bugs have ever reached and adjust to move towards that. Over time each bug is getting a better and better personal best and the global best is also slowly improving. Imagine them swarming across the area gradually grouping more and more tightly around the successful points. Hope that makes things less not more confusing...

Why not use a genetic algorithm?

There is some overlap, genetic algorithms more naturally fit with discreet variables(e.g. bool) vs PSO fits better with continuous. But either can be modified to take the full range. I think when working with continuous variable PSO will work out to be more efficient in finding good solutions, but this is probably massively variable depending on the data. This is one of those annoying situations in machine learning where you have to try things and just see what looks like it's working.

Why F#?

The best thing about F# is how easy is to write easy to read parallel code and the lightweight syntax. PSO is can achieve massive benefits from running in parallel. Also my day job is mostly working in C#, Java and Python so F# makes a nice change of pace.

Show me the code

I wanted to write all the code in a pure functional style. This means aiming for all functions being deterministic, the output is solely dependent on the input. No side effects or state. For the PSO algorithm there are a lot of variables such as constant 1, constant 2, inertia weight, etc. These can all be set as class variables, but because I wanted to this to be pure they should all be passed in as arguments to functions that use them. This would mean a lot of messy long functions, so I made an args type that would hold all these variables.

 type Args(inertia_weight : list<float>, ?particles : int, ?iterations : int, ?max_velocity : float, ?success_threshold : float, ?c1 : float, ?c2 : float) =  
   member this.inertia_weight = inertia_weight  
   member this.particles = defaultArg particles 10  
   member this.iterations = defaultArg iterations 100  
   member this.max_velocity = defaultArg max_velocity 10.0  
   member this.success_threshold = defaultArg success_threshold 0.0001  
   member this.c1 = defaultArg c1 2.0  
   member this.c2 = defaultArg c2 2.0  

Next we have the particle class. Note I went for lists as the type for all the variables. I didn't want to use Arrays because they are not immutable. Sequences are the most functionally pure choice, but in reality they are quite a lot slower than lists. Also structuredFormatDisplay is used so they print nicely.

 [<StructuredFormatDisplay("Particle {Parameters}")>]  
 type Particle =  
   val Parameters : list<float>  
   val Velocity : list<float>  
   val Local_best : list<float>  
   val Local_best_loss : float  
   new(parameters, velocity, local_best, local_best_loss) =   
     { Parameters = parameters; Velocity = velocity; Local_best = local_best; Local_best_loss = local_best_loss }  

Here is the main algorythm

 let update_particle (args : Args) loss_func (particle : Particle) (global_best : list<float>) : Particle =  
   let limit_velocity velocity max_velocity =  
     velocity |&gt; List.map (fun x -&gt; if x &gt; 0.0 then  
                       min x max_velocity  
                     else  
                       min x -max_velocity)  
   let random = new System.Random()  
   let r1 = random.NextDouble()*args.c1  
   let r2 = random.NextDouble()*args.c2  
   
   let velocity = (List.map2 (fun w v -&gt; w*v) args.inertia_weight particle.Velocity, // multiple last velocity by inertia weight  
           List.map2 (fun l p -&gt; r1*(l-p)) particle.Local_best particle.Parameters, //get attraction of local best  
           List.map2 (fun g p -&gt; r2*(g-p)) global_best particle.Parameters)//get attration of global best  
           |||&gt; List.map3 (fun x y z -&gt; x+y+z)//add the result of these 3 calculations together  
           |&gt; limit_velocity &lt;| args.max_velocity //limit velocity by max  
   
   let new_parameters = (particle.Parameters, velocity) ||&gt; List.map2 (fun x y -&gt; x + y)  
   let new_loss = loss_func new_parameters  
   
   if new_loss &lt; particle.Local_best_loss then  
     Particle(new_parameters, velocity, new_parameters, new_loss)   
   else   
     Particle(new_parameters, velocity, particle.Local_best, particle.Local_best_loss)  

For PSO we need to run this on a full collection of particles like so


 let update_particles (args : Args) (particles : list<Particle>) (global_best_params : list<float>) (global_best_loss : float) loss_func : list<Particle> * list<float> * float =    
   let updated_particles = particles |> List.map (fun x -> update_particle args loss_func x global_best_params)  
   
   let best_from_this_iteration = updated_particles |> List.minBy (fun x -> x.Local_best_loss)  
   
   if global_best_loss < best_from_this_iteration.Local_best_loss then  
     (updated_particles, global_best_params, global_best_loss)  
   else  
     (updated_particles, best_from_this_iteration.Local_best, best_from_this_iteration.Local_best_loss)  

A method for running this loop until our stop condition is reached

 let rec private run_until_stop_condition (args : Args) (particles : list<Particle>) (global_best_params : list<float>) (global_best_loss : float) loss_func iterations_to_run =  
   let stop_condition (args : Args) iterations global_best_loss =  
     iterations <= 0 || global_best_loss <= args.success_threshold  
   
   let (new_particles, new_global_best_params, new_global_best_loss) = update_particles args particles global_best_params global_best_loss loss_func  
   let new_iterations_to_run = iterations_to_run - 1  
   
   if stop_condition args iterations_to_run new_global_best_loss then  
     (new_global_best_params, new_global_best_loss, new_particles)  
   else  
     run_until_stop_condition args new_particles new_global_best_params new_global_best_loss loss_func new_iterations_to_run  

Now example usage looks like this.

 let target_point = [23.0;54.0]  
 let loss_func (x : list<float>) = x |> List.map2 (fun x y -> sin 1.0/x-sin 1.0/y) target_point |> List.sum |> abs  
 let random = new System.Random()  
 let initial_weights = seq{ while true do yield [random.NextDouble()*1000.0;random.NextDouble()*1000.0]};  
   
 let args = Args(inertia_weight=[0.8; 0.8], particles=10, success_threshold=0.01, iterations = 10);  
 let (global_best_params, global_best_loss, particles) = execute args loss_func initial_weights;;  


Here is a link to the full code on git hub. In the next post I look at options for parallelizing the code.

2 comments:

  1. Hello Daniel,

    it looks like the 'next post' link here points back to this document instead of the parallelizing entry. Not a big deal as it can be reached higher up on the right, but I just thought I would let you know.

    Thanks,
    Kelly

    ReplyDelete