Modelling with GPdotNET

This tutorial shows how to calculate a model from experimental data using GPdotNET. Experimental data contains input parameters and an output variable. For now, GPdotNET supports only one output variable and up to 1000 input variables. A previous post described the required format of the experimental data. Create a CSV file with the experimental data, as well as a file with data for prediction, so you can evaluate how well the model describes data outside the experimental data range.

Suppose we have experimental data that needs to be modeled. First, start GPdotNET and choose New from the File menu, as shown in the picture below.

Select the Data tab item and click the button for loading training data at the bottom of the main window. Browse the disk for the CSV file you prepared earlier. You can also load testing data for prediction.

The experimental data is now loaded in GPdotNET for modeling with genetic programming. Next, we need to set the GP parameters as well as the function set.

Choose the Settings tab item to see the various GP parameters that need to be defined correctly.

First, define the population size. The default value is 500, but you can define another value. For this tutorial, set the population size to 3000. Then choose the selection method, the initialization method, and the fitness function.

Choose the Fitness proportionate selection method for this example. You can see more info about selection in GP, as well as general information about genetic programming at

Choose the ramped half-and-half initialization method. This method is a mix of the other two methods, Full and Grow.

Maximum tree depth is a group of values that limit the S-expressions created during initialization and mating of individuals in the population. Depending on these values, GPdotNET can consume a very large amount of memory and processor time, so be careful when defining these parameters.

For example, if the function set contains only two-argument functions, one chromosome with a tree of depth 15 can require up to 2^15 - 1 = 32 767 nodes. With a population size of 1000, you can imagine how much memory is consumed. .NET also limits a single object to 2 GB, so these considerations must be taken into account when defining GP parameters.
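
The arithmetic above can be checked with a short Python sketch. It is only a rough estimate: the 32 bytes per node is an assumed figure for illustration, not a measured value from GPdotNET, and the helper name `full_tree_nodes` is hypothetical.

```python
def full_tree_nodes(arity: int, levels: int) -> int:
    """Number of nodes in a full tree where every function takes `arity` arguments."""
    if arity == 1:
        return levels
    # Geometric series: 1 + arity + arity^2 + ... + arity^(levels - 1)
    return (arity ** levels - 1) // (arity - 1)


nodes_per_tree = full_tree_nodes(arity=2, levels=15)  # 2^15 - 1
population_size = 1000
total_nodes = nodes_per_tree * population_size

BYTES_PER_NODE = 32  # assumption, for illustration only
estimated_mib = total_nodes * BYTES_PER_NODE / 2**20

print(f"nodes per tree: {nodes_per_tree}")
print(f"nodes in population: {total_nodes}")
print(f"estimated memory: {estimated_mib:.0f} MiB")
```

Under this assumption, a population of full depth-15 trees already needs memory on the order of a gigabyte, which is why a smaller maximum depth is usually the safer choice.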

GPdotNET supports parallel processing using the new ParallelFX features in .NET 4.0, so the algorithm can run approximately 3 times faster on a quad-core processor. Why the speedup is not the full 4 times is still an open question; the answer may come in the future :).

Random constants are very important to GP, so you can generate as many as you wish.

Setting the probabilities of the GP operations is straightforward. Set the crossover probability close to 1 (100%), and the probability of mutation and the other operations to about 0.2 (20%).

The parameters have been defined, so let's define the function set in the Function tab item. GPdotNET supports 50 primitive functions and distributions, and you can choose any of them. For this tutorial, choose the first three functions (Add3, Sub3, Mul4) and sqrt. You select or deselect a function by checking the check box cell in the DataGridView control. Increase the weight attribute of the functions you want to appear in the GP model more often than the others. Here the + function is given weight 4, which means that "+" is 4 times more likely to be chosen than a function with weight 1.
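
A minimal Python sketch of how such weights could translate into selection probabilities. This is an illustration, not GPdotNET's actual code: only the weight 4 for "+" comes from the text, the weight 1 for the remaining functions is assumed, and the names `function_weights`, `selection_probabilities` and `pick_function` are hypothetical.

```python
import random

# Weight 4 for "+" follows the tutorial; weight 1 for the rest is an assumption.
function_weights = {"+": 4, "-": 1, "*": 1, "sqrt": 1}


def selection_probabilities(weights):
    """Probability of each function being picked, proportional to its weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def pick_function(rng):
    """Draw one function name with probability proportional to its weight."""
    names = list(function_weights)
    weights = list(function_weights.values())
    return rng.choices(names, weights=weights, k=1)[0]


probabilities = selection_probabilities(function_weights)
rng = random.Random(42)
sample = [pick_function(rng) for _ in range(10)]
print(probabilities)
print(sample)
```

With these weights, "+" is drawn with probability 4/7 and each of the other functions with probability 1/7.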

Choose the Run tab item. To run the algorithm, you have to choose a termination condition. You can run for a given number of generations, or run until the fitness (or RSquare) becomes greater than the value specified in the edit box. For this tutorial, choose to evolve until the generation number equals 500.

On the right side you can see two chart controls for fitness simulation and model simulation.

Run the algorithm by pressing the Start button, and stop it by pressing the Stop button. During the run, the simulation shows how close the algorithm is to the experimental data by displaying the best solution in every generation.

During the run, you can also see how the current best solution describes the prediction data by choosing the Model tab item and then the Prediction sub tab item.

To see the current best model in S-expression form, choose the Model sub tab item and click the View S-Expression button.

After some time, depending on your problem, you can see the result. In this example I found the best solution in about two hours. The picture below shows the S-expression of my solution. The GP algorithm is a stochastic method, so you cannot expect to get the same solution if you repeat the run, except for simple problems.


X1, X2, … represent input variables, and R1, R2, … represent random constants.
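
To make the notation concrete, here is a small Python sketch that evaluates an S-expression written as nested tuples. The expression, the input values and the constant value are made up for illustration and are not the solution from the run described above. The evaluator is a generic one, not GPdotNET's implementation.

```python
import math

# Supported primitive functions; a small assumed subset for illustration.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "sqrt": lambda a: math.sqrt(a),
}


def evaluate(expr, inputs, constants):
    """Recursively evaluate an S-expression given as nested tuples."""
    if isinstance(expr, str):
        # A terminal: either an input variable (X1, X2, ...) or a constant (R1, R2, ...).
        return inputs[expr] if expr in inputs else constants[expr]
    op, *args = expr
    return OPS[op](*(evaluate(arg, inputs, constants) for arg in args))


# Hypothetical model: (+ (* X1 R1) (sqrt X2))
model = ("+", ("*", "X1", "R1"), ("sqrt", "X2"))
result = evaluate(model, inputs={"X1": 2.0, "X2": 9.0}, constants={"R1": 1.5})
print(result)  # 2.0 * 1.5 + sqrt(9.0) = 6.0
```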

In the next post you will learn how to model time series data.


GPdotNET – Quick Tour

GPdotNET is a tree-based genetic programming application for solving problems based on symbolic regression. It can be applied to various engineering problems of modeling and optimization, as well as to time series modeling. The project contains a C# library with implementations of genetic programming algorithms and a Windows Forms application for graphical presentation of results. GPdotNET also supports parallel processing on multicore processors, based on the ParallelFx library.


GPdotNET requires data to be stored in CSV format, which you can load into the application. You can also load data for testing the prediction model. The following picture shows the Data dialog for loading training and testing data. Loaded data is presented in both tabular and graphical form.

Data format

Data that you load into the application must be in CSV format. The following picture shows the data format in the Notepad editor. Columns are separated by semicolons, and rows by new lines.


Note that the decimal separator is a comma (European standard). When loading training and testing data, the program creates a model with the last column as the output variable and the other columns as inputs. GPdotNET also defines the variables of the terminal set from the training data.
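
The format described above can be read with a few lines of Python. This is a sketch under stated assumptions: the sample values are invented, the file is assumed to have no header row, and the function name `load_rows` is hypothetical.

```python
import csv
import io

# Invented sample: two input columns and one output column,
# semicolon-separated, with a comma as the decimal separator.
SAMPLE = "1,5;2,0;3,5\n2,5;4,0;6,5\n"


def load_rows(text):
    """Split each row into input values and the output value (last column)."""
    inputs, outputs = [], []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip empty lines
        values = [float(cell.replace(",", ".")) for cell in row]
        inputs.append(values[:-1])
        outputs.append(values[-1])
    return inputs, outputs


inputs, outputs = load_rows(SAMPLE)
print(inputs)
print(outputs)
```

If your spreadsheet exports with a point as the decimal separator or a comma as the column separator, the file has to be converted before it matches this format.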

Settings parameters of Genetic programming

GP has various parameters that have to be set correctly. The following picture shows the GP parameters that GPdotNET supports.

Selection Methods in Genetic programming

GPdotNET supports 6 kinds of selection:

  1. Rank
  2. Roulette
  3. Tournament
  4. SUS
  5. FUS
  6. SS

Tournament and SS (Skrgic selection) take an additional parameter, which can be specified by the user.
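
For readers unfamiliar with these methods, here is a generic Python sketch of two of them, roulette (fitness proportionate) and tournament selection. It assumes that higher fitness is better and that the additional tournament parameter is the tournament size. Both are assumptions about the general technique, not a description of GPdotNET's code.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Fitness proportionate selection: pick with probability fitness / total fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def tournament_select(population, fitnesses, rng, size=3):
    """Pick `size` random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), size)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


rng = random.Random(1)
population = ["A", "B", "C", "D"]
fitnesses = [10.0, 20.0, 30.0, 40.0]

print(roulette_select(population, fitnesses, rng))
print(tournament_select(population, fitnesses, rng, size=2))
```

A larger tournament size increases selection pressure, because the best individual in the population is more likely to be among the contenders.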

Initialization methods

GPdotNET supports the following initialization methods:

1. Full

2. Grow

3. Ramped half-and-half
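
The three methods in the list above can be sketched in Python as follows. This follows the textbook description of tree initialization, not GPdotNET's source. The function set, the terminal set, the depth range and all names are assumptions chosen for illustration.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "sqrt": 1}  # name -> number of arguments
TERMINALS = ["X1", "X2", "R1"]


def build_tree(max_depth, full, rng, depth=0):
    """Full: only functions until max_depth. Grow: functions or terminals at any level."""
    if depth == max_depth:
        return rng.choice(TERMINALS)
    terminal_share = len(TERMINALS) / (len(TERMINALS) + len(FUNCTIONS))
    if not full and rng.random() < terminal_share:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    children = (build_tree(max_depth, full, rng, depth + 1) for _ in range(FUNCTIONS[name]))
    return (name, *children)


def tree_depth(tree):
    """Depth counted in edges from the root; a single terminal has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])


def ramped_half_and_half(pop_size, min_depth, max_depth, rng):
    """Cycle through the depth range; alternate between full and grow trees."""
    depths = range(min_depth, max_depth + 1)
    population = []
    for i in range(pop_size):
        depth = depths[(i // 2) % len(depths)]
        population.append(build_tree(depth, full=(i % 2 == 0), rng=rng))
    return population


population = ramped_half_and_half(20, min_depth=2, max_depth=6, rng=random.Random(0))
print(population[0])
print([tree_depth(tree) for tree in population])
```

Mixing both methods over a range of depths gives the initial population a variety of tree shapes and sizes, which is the usual motivation for ramped half-and-half.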

Primitive computer programs

Every genetic programming algorithm contains a set of primitive computer programs from which the user can choose when building a model. GPdotNET supports nearly 50 functions that can be included in a GP model. The user chooses which of them to include by checking the Selected column. The Weight column defines the relative probability of a function being chosen during the GP run.

Running GP algorithm

When you have set up all the information needed for running GP, click the Run button to start the algorithm. On this tab page (see picture below) you can see information about:

1. Current generation

2. Current best fitness

4. Maximum fitness

5. Average fitness in population


The "Involve until" option determines whether the program runs until a given fitness value or a given generation number is reached. In the combo box you can choose:

Involve until:

1. Generation number

2. Fitness >=

Based on the chosen option, you can enter a value in the edit control.
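
The two termination options can be expressed as a simple check inside the evolution loop. The Python sketch below is illustrative only: the mode names, the function names and the stand-in fitness formula are all hypothetical, and no real GP is executed.

```python
def should_stop(mode, generation, best_fitness, limit):
    """Return True when the chosen termination condition is met."""
    if mode == "generation":
        return generation >= limit
    if mode == "fitness":
        return best_fitness >= limit
    raise ValueError(f"unknown termination mode: {mode}")


def run(mode, limit, max_generations=10_000):
    """Loop until the termination condition holds; returns (generation, best fitness)."""
    generation = 0
    best_fitness = 0.0
    while not should_stop(mode, generation, best_fitness, limit):
        generation += 1
        # Stand-in for one real GP generation: fitness slowly approaches 1000.
        best_fitness = 1000.0 * generation / (generation + 50)
        if generation >= max_generations:
            break  # safety guard in case the fitness target is unreachable
    return generation, best_fitness


print(run("generation", 500))
print(run("fitness", 900.0))
```

Running until a fitness target can take an unpredictable number of generations, so a generation limit is the more predictable choice for a first experiment.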

You can also see timing information, such as the estimated time left until the program completes (if you selected "Involve until" generation number).
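
One simple way to produce such an estimate is to multiply the average time per finished generation by the number of generations still to run. The Python sketch below shows that idea. It is an assumption about how the estimate could be computed, not a description of what GPdotNET does internally, and the function name is hypothetical.

```python
def estimate_time_left(elapsed_seconds, generations_done, generations_total):
    """Average time per generation so far, times the generations still to run."""
    if generations_done <= 0:
        return None  # nothing measured yet, so no estimate is possible
    average_per_generation = elapsed_seconds / generations_done
    return average_per_generation * (generations_total - generations_done)


# Example: 100 of 500 generations finished after 100 seconds.
remaining = estimate_time_left(100.0, 100, 500)
print(remaining)  # 1.0 s per generation * 400 remaining generations = 400.0
```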

Presenting the results

GPdotNET is a multithreaded application, so while the program runs you can see the resulting model, as well as the result in S-expression form.

Click the View S-Expression button to see the result.


Testing GP model

If you loaded test data in the Data tab page, then when the program finishes searching for the best chromosome you can see the prediction based on the current result. See the picture below. You can also see the prediction for the testing data while the program runs.

There will be more in the next posts, which will include some tutorials about modelling in GPdotNET.

GPdotNET on


This is a project I have been developing for several years for my postgraduate studies. It applies the genetic programming method to engineering problems of modeling and optimization. The project has been published in an open source version, and I believe it will be helpful both to engineers who use it for modeling and to programmers interested in further development and in how artificial intelligence algorithms are implemented.

More information about genetic programming can be found in the article on genetic programming, as well as in various sources on the internet. The creator of this method is John Koza, who published it in 1990: A Paradigm for Genetically Breeding Populations of Computer Programs to Solve Problems. Stanford University Computer Science Department technical report STAN-CS-90-1314. June 1990.

All information related to this project can be found on the GPdotNET page of the blog, and the source code and the application on the codeplex page.