[pio-develop] Core of the Genetic Algorithm committed
Rodrigo Espiga Gómez <rodrigoespiga <at> hotmail.com>
2014-01-29 20:40:58 GMT
Hi Roland and everyone!
I've been somewhat missing those months, and one of the reasons is that I had to change the way my algorithm will decide how many turns it will take it to perform a given action. I realized that the way it did it before was absolutely wrong most of the time, and I've been scratching my head quite a while to come to an acceptable solution.
I was using Average Resource Supply as a measure of the resources obtained each 36 turns, but without having in mind that is not the same, for example, having one settlement near a 6 and another near an 8 than having a city near an 8, even though the AVR would be the same in both cases (10). Many other cases share the same AVR while the probability of getting different resources is completely different. Also the probabilty of getting a certain combination did not account for the fact that each resource is NOT an independent event, so depending of the source of that AVR the calculations were completely misleading for the algorithm.
There is a mathematical way of solving the problem, but it takes so much computer time to solve it that, having in mind is a calculation that has to be made about a hundred of times each turn, is no realistic to use it. At least not during the evolution of the algorithm. Maybe when every variable is fixed. Maybe.
In this new approach the AVR is replaced by a resourceSupplyMatrix that will hold the amount of every resource I get from every dice outcome, from 2 to 12. Using a simulation it will use it to calculate the number of turns needed to perform every action, single or doubled. As I explained when I post the old code the actions are:
SET - Settlement
CIT - City
DEV - Development Card
RSET - Road to Settlement, it means building a Road leading to a node and then a Settlement in that node
RRSET - Long Road to Settlement, idem to RSET but with two Roads and a Settlement
and all the combinations of two of them.
The function numberOfTurnsForProbability will simulate MAX_SIMS simulations of dice rolls for every turn, and will calculate in how many of those simulations the player would get the resources needed for every possible action. It will go on simulating turn after turn, checking whether enough simulations (a percentage of the total determined by probability) fulfill the requirements at a particular turn for performing a particular action. At that point, numberOfTurnsForProbability for that action will be set to the current turn being simulated. At the end of the simulations turnsToAction will hold the number of turns needed to perform every possible action (single and double).
Probability will be evolved through the Top Level G.A., and it will basically say how greedy or how foreseeing the player is. The higher the probability, the more it will value the actions it can do now as opposite of those it will be able to do in the future (because from its point of view, that "future" will take longer to come).
The function bestStrategy will use numberOfTurnsForProbability to set turnsToAction, and given the value given to every action in actionValue and a Turn number will dictate the best strategy. If the action chosen is RSET or RRSET and it's possible to build the Road (or Roads) right now it will say so too.
It's important to notice that actionValue values will be derived from a number of factors not present in the current impementation, for simplicity sake. First of all, each location in the map is given a certain value through a resourcesValueMatrix, that will give a certain weight to every kind of resource
at every point in the game (VP of the player), and a depreciation constant k, that will decrease the value given to resources I have as opposite to resources I don't have. These are values that will evolve through the Top Level Genetic Algorithm, and will cause that certain players value certain resources more than others, or at different moments of the game. A high depreciation constant k will make the player try to get resources sources it does not have more eagerly than those it already posses. With a low k, that will not make any difference to it.
After all those calculations, the best possible Settlement, City, Settlement after a Road and Settlement after two Roads values will be set in actionValue.
Knowing actionValue and the time needed to perform each one of them, it can calculate, for any given turn, the expected profit of doing every possible action. That is what strategyProfit calculates. Here Turn is again something that will be evolved through the Top Level GA, and in combination with Probability will make different players behave in different ways.
Comparing different courses of action, bestStrategy will return the pair of preferred actions to perform, and the expected profit of that strategy at a given turn.
There is yet another function, updateTradingMatrix, that using bestStrategy with modified resources calculates if there is a way of improving my profit after doing a 1:1 trade, or a 4:1 trade.
As a summary, values that will be evolved through the top level algorithm (every player will have its own, and evolution will decide which ones are better) are:
Probability (the higher, the geedier)
Depreciation constant k
ResourcesValueMatrix (in case you were wondering, DEV value will come from this, too)
You can compile and run the program. It will give you the option of inputting actionValue, your actualResources, and a value for probability. resourcesSupply matrix is based on a fictional state of a game, and can be modified in the code if you want (it would be tedious to input that on every run). You have the option of seeing how the simulation modifies conditionsMet matrix to see how it works (only 30 of the 1000 simulations are actually shown).
The program has been checked on a 1920X1080 display, and on smaller screens can (will) output in a weird way. Sorry.
I appreciate any suggestions about the best way of implementing this behaviour in the genetic.c AI. For me it would be enough right now to get it work with a set of fixed values first, and then I will have to use and external file to input a different "chromosome" to every player and "Let'em fight!" I'm still struggling with the code of greedy.c to understand how it makes it play ;)
Thank you to every one with enough patience to read this long post!
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import
a virtual appliance and go from zero to informed in seconds.
Pio-develop mailing list
Pio-develop <at> lists.sourceforge.net