This tutorial shows how to build a model from experimental data using GPdotNET. Experimental data consists of input parameters and an output variable. For now GPdotNET supports only one output variable, and up to 1000 input variables. The previous post described the format the experimental data needs to have. So create a csv file with the experimental data, as well as data for prediction, which lets you evaluate how well the model describes data outside the range of the experimental data.
Suppose we have experimental data that needs to be modeled. First, start GPdotNET, and from the File menu choose New, as in the picture below.
Select the Data tab and click the Load Testing Data button at the bottom of the main window. Browse to the csv file you prepared earlier. You can also load testing data for prediction.
The experimental data is now loaded into GPdotNET for modeling with genetic programming. Next we need to set the GP parameters, as well as the function set.
Choose the Settings tab, where you can see the various GP parameters that need to be defined correctly.
First, define the population size. The default value is 500, but you can define another value. For this tutorial set the population size to 3000. Then choose the selection method, the initialization method, and the fitness function.
Choose the Fitness proportionate selection method for this example. You can find more information about selection in GP, as well as general information about genetic programming, at www.genetic-programming.org
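To make the idea of fitness proportionate selection concrete, here is a minimal Python sketch of the usual roulette-wheel scheme. It is not GPdotNET's code: the function name, the use of Python, and the assumption that larger fitness is better and never negative are all mine.

```python
import random

def fitness_proportionate_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness.

    Hypothetical helper for illustration; assumes non-negative fitness
    values where larger means better.
    """
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    # A random point on a wheel whose slices are sized by fitness.
    pick = rng.random() * total
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

if __name__ == "__main__":
    rng = random.Random(42)
    parents = [fitness_proportionate_select(["a", "b", "c"], [1.0, 3.0, 6.0], rng)
               for _ in range(10)]
    print(parents)
```

An individual with fitness 6 is picked about six times as often as one with fitness 1, so good solutions breed more often while weak ones still get a chance.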
Choose ramped half and half as the initialization method. This method is a mix of two other methods, commonly known as full and grow.
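As a rough illustration of ramped half and half, the Python sketch below builds part of the population with a "full" builder and part with a "grow" builder, over a ramp of depths. Everything in it is an assumption made for illustration: the tuple representation of trees, the function and terminal names, the 30% early-stop chance in grow, and the way depths and builders are cycled. GPdotNET's own implementation may differ.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # name -> number of arguments (illustrative)
TERMINALS = ["X1", "X2", "R1"]         # inputs and constants (illustrative)

def full_tree(depth, rng):
    """Every branch reaches exactly `depth`."""
    if depth == 0:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(full_tree(depth - 1, rng) for _ in range(FUNCTIONS[name]))

def grow_tree(depth, rng):
    """Branches may stop early, so trees have irregular shapes."""
    if depth == 0 or rng.random() < 0.3:   # 0.3 is an arbitrary choice
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(grow_tree(depth - 1, rng) for _ in range(FUNCTIONS[name]))

def tree_depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])

def ramped_half_and_half(pop_size, min_depth, max_depth, rng):
    """Cycle through the depth ramp, switching builder after each full cycle."""
    depths = list(range(min_depth, max_depth + 1))
    population = []
    for i in range(pop_size):
        depth = depths[i % len(depths)]
        builder = full_tree if (i // len(depths)) % 2 == 0 else grow_tree
        population.append(builder(depth, rng))
    return population

if __name__ == "__main__":
    for tree in ramped_half_and_half(6, 2, 4, random.Random(1)):
        print(tree_depth(tree), tree)
```

Mixing the two builders over several depths gives the initial population a variety of tree sizes and shapes.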
Maximum tree depth is a group of values that limit the size of the S-Expression during initialization and during mating of individuals in the population. Depending on these values, GPdotNET can consume a very large amount of memory and processor time, so be careful when defining these parameters.
For example, if your function set contains only functions that take two arguments, one chromosome with a tree depth of 15 can require 2^15 - 1 = 32,767 nodes. With a population size of 1000, you can imagine how much memory is consumed. .NET also limits the size of an object to 2GB, so keep these considerations in mind when defining the GP parameters.
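The arithmetic above can be checked with a few lines of Python. The node counts follow directly from the numbers in the text. The bytes-per-node figure is purely an assumed placeholder, since the real cost of a node in GPdotNET is not stated here.

```python
def full_tree_nodes(arity, levels):
    """Nodes in a full tree with `levels` levels where every function takes `arity` arguments."""
    return sum(arity ** level for level in range(levels))

nodes_per_chromosome = full_tree_nodes(2, 15)      # 2^15 - 1
population_size = 1000
total_nodes = nodes_per_chromosome * population_size

BYTES_PER_NODE = 32  # assumption for illustration only, not a GPdotNET figure
estimated_bytes = total_nodes * BYTES_PER_NODE

if __name__ == "__main__":
    print(nodes_per_chromosome)                    # 32767
    print(total_nodes)                             # 32767000
    print(round(estimated_bytes / 1e9, 2), "GB under the assumed node size")
```

Each extra level roughly doubles the node count for two-argument functions, which is why a small change in maximum depth has such a large effect on memory.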
GPdotNET supports parallel processing using the new ParallelFX features in .NET 4.0, so the algorithm can run approximately 3 times faster on a quad-core processor. Why is the speed-up not the full 4 times? The answer may come in the future :).
Random constants are very important to GP, so you can generate as many as you wish.
Setting the probabilities of the GP operations is straightforward. Give crossover a probability close to 1 (100%), and mutation and the other operations about 0.2 (20%).
The parameters have been defined, so let's define the function set in the Function tab. GPdotNET supports 50 primitive functions and distributions, and you can choose any of them. For this tutorial choose the first three functions (Add3, Sub3, Mul4) and sqrt. You can select or deselect a function by ticking its check box cell in the DataGridView control. Increase the weight attribute of the functions that you want to appear in the GP model more often than the others. For example, we gave the + function a weight of 4, which means that “+” is 4 times more likely to be chosen than a function with a weight of 1.
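To see what a weight of 4 means in practice, here is a small Python sketch of weighted function selection. The weights table and helper names are hypothetical and only mirror the example in the text. How GPdotNET applies weights internally is not shown here.

```python
import random
from collections import Counter

# Hypothetical function set: "+" has weight 4, the rest have weight 1.
WEIGHTS = {"+": 4, "-": 1, "*": 1, "sqrt": 1}

def selection_probability(weights, name):
    """Chance that `name` is drawn when choosing one function."""
    return weights[name] / sum(weights.values())

def pick_function(weights, rng):
    """Draw one function name, with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

if __name__ == "__main__":
    rng = random.Random(7)
    draws = Counter(pick_function(WEIGHTS, rng) for _ in range(7000))
    print(draws)
    print(selection_probability(WEIGHTS, "+"), selection_probability(WEIGHTS, "-"))
```

With these weights "+" is drawn with probability 4/7 and each other function with probability 1/7, so "+" shows up about four times as often in the generated trees.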
Choose the Run tab. To run the algorithm you have to choose a termination condition. You can run for a fixed number of generations, or run until the fitness, or RSquare, exceeds the value specified in the edit box. For this tutorial choose to evolve until the generation number reaches 500.
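The two kinds of termination condition can be sketched as a single check that the evolution loop calls once per generation. This is a hypothetical Python helper, not GPdotNET code, and the parameter names are made up for illustration.

```python
def should_stop(generation, best_fitness, max_generations=None, target_fitness=None):
    """Return True when either configured termination condition is met.

    Hypothetical helper: stop after `max_generations` generations, or as
    soon as the best fitness is greater than `target_fitness`.
    """
    if max_generations is not None and generation >= max_generations:
        return True
    if target_fitness is not None and best_fitness > target_fitness:
        return True
    return False

if __name__ == "__main__":
    # The setting used in this tutorial: evolve until generation 500.
    generation = 0
    while not should_stop(generation, best_fitness=0.0, max_generations=500):
        generation += 1   # a real run would evolve the population here
    print(generation)
```

A fixed generation count gives a predictable running time, while a fitness threshold stops as soon as the model is good enough but may never be reached.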
On the right side you can see two chart controls, one for the fitness simulation and one for the model simulation.
Run the algorithm by pressing the Start button, and stop it by pressing the Stop button. During the run, the simulation shows how close the algorithm is to the experimental data by displaying the best solution of every generation.
During the run, you can also see how well the current best solution describes the prediction data by choosing the Model tab and then the Prediction sub tab.
To see the current best model in S-Expression form, choose the Model sub tab and click the View S-Expression button.
After some time, depending on your problem, you can see the result. In this example I found the best solution after about two hours. The picture below shows the S-Expression of my solution. The GP algorithm is a stochastic method, so you can't expect the same solution if you repeat the run, except for simple problems.
X1, X2, … represent the input variables, and R1, R2, … represent the random constants.
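To show how an S-Expression with X and R symbols turns into a number, here is a minimal Python evaluator. The nested-tuple representation, the operator table, and the sample expression are all assumptions made for illustration. This is not the S-Expression from my run, and GPdotNET's functions may behave differently (for example for the square root of a negative number).

```python
import math
import operator

# Illustrative operator table, not GPdotNET's function set.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "sqrt": math.sqrt}

def evaluate(expr, inputs, constants):
    """Evaluate a nested-tuple S-Expression.

    "X1", "X2", ... read from `inputs`; "R1", "R2", ... read from `constants`.
    """
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        index = int(expr[1:]) - 1          # symbols are numbered from 1
        if expr.startswith("X"):
            return inputs[index]
        if expr.startswith("R"):
            return constants[index]
        raise ValueError("unknown symbol: " + expr)
    name, *args = expr
    return OPS[name](*(evaluate(arg, inputs, constants) for arg in args))

if __name__ == "__main__":
    # Made-up model: (+ (* X1 R1) (sqrt X2))
    model = ("+", ("*", "X1", "R1"), ("sqrt", "X2"))
    print(evaluate(model, inputs=[3.0, 16.0], constants=[2.0]))  # 3*2 + 4 = 10.0
```

Evaluating the model for every row of the experimental data and comparing the results with the measured output is what a fitness function does in principle.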
In the next post you will learn how to model time series data.