Tuesday, December 29, 2015

How to run learning agents against PyGame


PyGame is the best supported library for programming games in Python. There are 1000's of open source games that have been built with it.
I wanted to do some reinforcement learning neural networks in games and PyGame seemed the best choice.
But I was a bit disappointed that most examples involved hacking the original game files.
I wanted to see if I could write a general framework for running learning agents in PyGame that would require zero touching of the games files.

If you are interested in writing your own implementation of this then read on, but if you just want to dive straight in and start playing with your own learning agents I've uploaded a project here which contains examples of setting up Pong and Tetris.

Getting started

  • You will need Python 2 or 3 installed.
  • You will need to install PyGame which can be obtained here. You can also install it via pip, but this can be tricky in windows.
  • You will need numpy, downloadable from here or again can be installed from pip
  • You will also need a game of some sort to try and set up. Here is a good simple pong game.

The plan


There are 3 things a learning agent needs to be able to play a game:
  1. An array of all the pixels on the screen when ever it is updated.
  2. The ability to output a set of key presses back into the game.
  3. A feedback value to tell it when it is doing well/badly.
We will tackle them in order.

Grabbing the screen buffer


In PyGame there is this handy method which will give us a 3 dimensional numpy array for each colour of every pixel on screen.
 screen = pygame.surfarray.array3d(pygame.display.get_surface())  

We could write our learning agent by hacking the game file and inserting this line into the main loop after the screen has been updated. But a better way to do it(and a way that allows us to have zero touches of the game file) is to intercept any calls to the pygame.display.update method and then grab the buffer then, like so:
 import pygame  
 import pygame.surfarray  

 # function that we can give two functions to and will return us a new function that calls both
 def function_combine(screen_update_func, our_intercepting_func):  
   def wrap(*args, **kwargs):  
     screen_update_func(*args,  
               **kwargs) # call the screen update func we intercepted so the screen buffer is updated  
     our_intercepting_func() # call our own function to get the screen buffer  
   return wrap  

 def on_screen_update():  
   surface_array = pygame.surfarray.array3d(pygame.display.get_surface())  
   print("We got the screen array")  
   print(surface_array)  

 # set our on_screen_update function to always get called whenever the screen updated  
 pygame.display.update = function_combine(pygame.display.update, on_screen_update)  
 # FYI the screen can also be modified via flip, so this might be needed for some games  
 pygame.display.flip = function_combine(pygame.display.flip, on_screen_update)  

You can try this out by inserting this code before you start your game and it will print out the screen buffer as it comes in.

Intercepting key presses


The normal method in PyGame detecting key presses is via this method:
 events = pygame.event.get()  

So we can intercept it and have it return our learning agents key presses:
 import pygame  
 from pygame.constants import K_DOWN, KEYDOWN  

 def function_intercept(intercepted_func, intercepting_func):  
   def wrap(*args, **kwargs):  
     # call the function we are intercepting and get it's result  
     real_results = intercepted_func(*args, **kwargs)
     # call our own function and return our new results  
     new_results = intercepting_func(real_results, *args, **kwargs)  
     return new_results  
   return wrap  

 def just_press_down_key(actual_events, *args, **kwargs):  
   return [pygame.event.Event(KEYDOWN, {"key": K_DOWN})]  

 pygame.event.get = function_intercept(pygame.event.get, just_press_down_key)  

I should also warn you that the pygame.event.get method can be called with args to filter out which events are needed. If your running a game that uses these you will either need to handle them or just use my complete implementation here.

Getting feedback to the player


The final piece of the puzzle is handling the feedback/reward from the game. Unfortunately there is no standard way of doing scoring in PyGame so this will always require some amount of going through the game code, but it can still be done with zero touches.

For the pong game the scores are stored in two global variables bar1_score and bar2_score, which can be imported. Our reward is when the score changes in our favor.

 last_bar1_score = last_bar2_score = 0  

 def get_feedback():  
   global last_bar1_score, last_bar2_score  

   # import must be done inside the method because otherwise importing would cause the game to start playing  
   from games.pong import bar1_score, bar2_score  
   # get the difference in score between this and the last run  
   score_change = (bar1_score - last_bar1_score) - (bar2_score - last_bar2_score)  
   last_bar1_score = bar1_score  
   last_bar2_score = bar2_score  
   return score_change  

But for other games, such as Tetris there may not be a globally scoped score variable we can grab. But there may be a method or a set of that methods that we know are good/bad. Such as a player_takes_damage, level_up or kill_enemy. We can use our function_intercept code from before to grab these. Here is an example in Tetris using the result of removeCompleteLines to reward our agent:
 import tetris  

 new_reward = 0.0
 def add_removed_lines_to_reward(lines_removed, *args, **kwargs):  
   global new_reward  
   new_reward += lines_removed  
   return lines_removed  

 tetris.removeCompleteLines = function_intercept(tetris.removeCompleteLines,  
                            add_removed_lines_to_reward)  

 def get_reward():  
   global new_reward  
   temp = new_reward  
   new_reward = 0.0  
   return temp  

Dealing with frame rates


One final issue that you may need to consider is that the learning agent will significantly impact the execution speed of the game. In a lot of games the physics is scaled by the elapsed time since the last update. If your agent takes 1 second to process a single frame in pong then in the next update loop the ball will have already passed off the screen. The agent may also struggle to learn if there is significant variance in the movement of different frames.
This can be handled by intercepting the pygame.time.get_ticks method and pygame.time.Clock in the same way as we have the other functions. See this file for details.

Pong in PyGamePlayer


Now all that remains is too stitch all those parts together and plug in the learning agent. In my project I've chosen to do this in a class, but it would be fine as a script.
Below is an example of the full thing set up to learn against Pong using the PyGamePlayer module. The PongPlayer simply needs to inherit from the PyGamePlayer class and implement the get_keys_pressed and get_feeback methods, the framework handles everything else.
 from pygame.constants import K_DOWN  
 from pygame_player import PyGamePlayer  

 class PongPlayer(PyGamePlayer):  
   def __init__(self):  
     """  
     Example class for playing Pong  
     """  
     super(PongPlayer, self).__init__(force_game_fps=10) # we want to run at 10 frames per second
     self.last_bar1_score = 0.0  
     self.last_bar2_score = 0.0  

   def get_keys_pressed(self, screen_array, feedback):  
     # The code for running the actual learning agent would go here with the screen_array and feeback as the inputs
     # and an output for each key in the game, activating them if they are over some threshold.
     return [K_DOWN]  

   def get_feedback(self):  
     # import must be done here because otherwise importing would cause the game to start playing  
     from games.pong import bar1_score, bar2_score  

     # get the difference in score between this and the last run  
     score_change = (bar1_score - self.last_bar1_score) - (bar2_score - self.last_bar2_score)  
     self.last_bar1_score = bar1_score  
     self.last_bar2_score = bar2_score  
     return score_change

 if __name__ == '__main__':  
   player = PongPlayer()  
   player.start()
   # importing pong will start the game playing  
   import games.pong  

So hazar! We now have the worlds worst Pong AI.

In my next post I'll go through writing a good reinforcement learning agent for this.
If you have any questions/correction please don't hesitate to contact me.

Full source code here.

8 comments:

  1. What's the use of the FixedFPSClock class?
    It does not seem to cause any delay. I changed the line grabbing the screen buffer to just initializing it to an empty array, and my game started to run at an insane fps. I think the major delay is caused by the large copying operation rather than the tick which is actually supposed to cause the delay.
    I think a solution would be to make a separate intercepting function for Clock.tick() which changes the argument of the call. What do you think?

    ReplyDelete
    Replies
    1. Thanks for commenting :) I think I've probably named it badly. The idea is that even if the real frame rate is say 300 frames per second the FixedFPSClock will make it "seem" to the AI/game like it is running at whatever the desired_fps is.
      As you say if you can run at 300 fps through some optimization or because you have a beast of a computer, you want to do that so the AI trains quicker, but you want it to still seem to the AI like it is running at normal frame rate so successive frames are both significant and consistent. Does that make sense? Can you think of a better name I could use?

      Delete
    2. Oh okay so causing delay is not the purpose. That clears it up, thanks. I think just mentioning this fact in the docstring should be enough.
      But when someone would like to, say, demo the AI, there is no mechanism to control the fps. I think this would be a great addition to the class. I'll try to add this and send you a pull request. :)

      Delete
    3. Cool, I've tried to clear up the doc strings so the functionality makes a bit more sense.
      I've also added this feature, the param is called run_real_time.
      Though if you want to make your own changes and do a pull request that would be awesome :)

      Delete
  2. Thanks for sharing the code, this is really great! Is it possible to suppress the display or do some other modification so that pygame can be used on AWS? One gets the following error on AWS:

    screen = pygame.display.set_mode((640,480),0,32)
    pygame.error: Unable to memory map the video hardware

    ReplyDelete
    Replies
    1. Hi Anthony, glad you like it :)

      Running without display would be a really useful feature. I looked into it briefly when I first wrote the code and couldn't find an easy way to do it. I will try and have a longer look next weekend.

      Delete