Step by step CNTK Object Detection on Custom Dataset with Python

Recently, I was playing with CNTK object detection API, and produced very interesting model which can recognize the Nokia3310 mobile phone. As you probably already know Nokia3310 is legendary mobile phone which was popular 15 years ago, and recently re-branded by Nokia.

In this blog post I will provide you with step by step introductions how to:

  • prepare images for training
  • generate training data for selected images by using VOOT tool,
  • prepare Python code for object detection using FasterRCNN alogirithm implemented with CNTK,
  • testing custom image in order to detect Nokia3310 on image.

Preparing Image for model training

Finding appropriate images for our model is very easy. Just go to and type “Nokia3310” and bum, there are plenty of images.

Find at least 20 images, and put into the Nokia3310 image folder. Once we collect enough image for the model, we can move to the next step.

Generating data from the image data set using VOTT tool

In order to train image detection model by using FasterRCNN algoritm, we have to provide three kinds of data separated in three different files:

  1. class_map file – which contains list of available objects which the model should recognize on the image,
  2. train_image file – which contains the list of image file paths
  3. train roi file – which contains “region of interest” data. The data is consisting of list of 4 numbers which represent the top, left, right and bottom coordinate producing rectangle of the object.

Seems pretty much job for simple object detection, but hopefully there is a tool which can generate all data for us. It is called  VoTT: Visual Object Tagging Tool, and it can be found at :

Generating Image data with VOTT

Here we will explain in detail how to generate image data by using VOTT tool.

1. Open VOTT tool, from File menu and select folder we previously collected with images.

2. Enter “nokia3310”  in Labels edit box and click Continue button. In case we have more than one

3. Then for each image, make a rectangle on each object which represents the Nokia3310.

4. Once you finish with tagging for one image, press Next, and do the same for all selected images.

5. Once the process of tagging is finished, then the export action can be performed.

6. With Export option data is generated for each rectangle we made, and the two files are generated for each image in data set. Also once the tagging process is completed VOTT tool generated three folders:

a) negative – contains images which have no any tagged rectangle (no nokia3310 on images),

b) positive – contains approximate 70% of all images which we tagged Nokia3310 object, and this folder will be used for training the model,

c) testImages – contains approximate 30% of all images which we tagged Nokia3310 object, and this folder will be used for evaluation and testing the model.

The VOOT classified all images in three folders. In case there are images with no tagging, images will be moved to negatives, all other images is separated into positive and testImages folder.

From each image two files are generated:

[imagename].bboxes.labels.tsv – which consist of all labels tagged in image file.

[imagename].bboxes.tsv – rectangle coordinates of all tags in the image.

Processing VOTT generated data into CNTK training and testing dataset files

Once we have VOTT generated data, we need to transform them into cntk format. First we will generate: class_map file.txt

7. Create new “class_map file.txt”  file, and put the following text into it:

__background__	0
Nokia3310	1

As can be seen there is only one class which we want to detect, and ti is Nokia3310, (the __backgroud__ is reserved tag which is added by default and cannot be removed). Now we need to generate the second file:
8. Create new “train_image_file.txt” file, and put text similar with this one:

0 positive/img01.jpg 0
1 positive/img05.jpg 0
2 positive/img10.jpg 0

The content of the file is list of all images placed in positive folder, with ID on the left side and zero on the right side, separated by tabulator. Image path should be relative.
9. Create new “train_roi_file.txt”, and put data similar with this one:

0 |roiAndLabel 10	418	340	520 1
1 |roiAndLabel 631	75	731	298 1
2 |roiAndLabel 47	12	222	364 1
3 |roiAndLabel 137	67	186	184 1	188	69	234	180 1

As can be seen first four numbers are rectangle coordinates, which follow the 1 number indicates classValue. Since we have only one class 1 is always after 4 numbers. Also in case image contains more than one rectangle which is the case of line 3, after every four  numbers it goes class value.

This is procedure how can we make three files for training, needed to run CNTK object detection. Also for testing data we need image and ROI files. Whole data set and corresponded files can be found on GitHub page.

Implementation of Object Detection

CNTK comes with example how to implement object detection which can be found at:

So I took the source code from there, and modify it for my case, and published at git hub which can be found here.

10. Before downloading source code, be sure the CNTK 2.3 is installed on your machine with Anaconda 4.1.1, in the environment with Python 3.5 version.

11. Clone the Github repository and open it in Visual Studio or Visual Studio Code.

12. First thing you should do is to download pre-trained “Alex net” model. You can easily download it, by running the python code placed in PretrainedModels folder.

13. Process of training is started when you run python file. Beside pre-trained model, no other resources are required in order to run the project. The folowing picture shows main parts of the solution.

Once the training process is finished, once image is evaluated and shown in order to evaluate how model is good in detecting the phone. Such image is shows at the beginning of the blog post.

All source code with image dataset you can download from GitHub at


Using CNTK 2.2 and Python to learn from Iris data

Now that we have setup CNTK 2.2 and Python we can start with first example. For the first time, we can take the Iris data. The data set has categorical output value which contains three classes : Sentosa, Virglica and Versicolor. The features consist of the 4 real value inputs. The Iris data set can be easily found on  the internet. One of the places is on

Usually, the Iris data is given in the flowing format:

Since we are going to use CNTK we should prepare the data in cntk file format, which is far from the format we can see on the previous image. This format has different structure and looks like on the flowing image:

The difference is obvious. To transform the previous file format in to the cntk format it tooks me several minutes and now we can continue with the implementation.

First, lets implement simple python function to read the cntk format. For the implementation we are going to use CNTK MinibatchSource, which is specially developed to handle file data. The flowing python code reads the file and return the MinibatchSource.

import cntk

# The data in the file must satisfied the following format:
# |labels 0 0 1 |features 2.1 7.0 2.2 - the format consist of 4 features and one 3 component hot vector
#represents the iris flowers
def create_reader(path, is_training, input_dim, num_label_classes):

#create the streams separately for the label and for the features
labelStream ='label', shape=num_label_classes, is_sparse=False)
featureStream ='features', shape=input_dim, is_sparse=False)

#create deserializer by providing the file path, and related streams
deserailizer =, = labelStream, features = featureStream))

#create mini batch source as function return
mb =, randomize = is_training, max_sweeps = if is_training else 1)
return mb

The code above take several arguments:

-path – the file path where the data is stored,

-is_training – Boolean variable which indicates if the data is for training or testing. In case of training the data will be randomized.

– input_dim, num_label_classes are the numbers of the input features and the output hot vector size. Those two arguments are important in order to properly parse the file.

The first method creates the two streams , which are passed as argument in order to create deserializer, and then for minibatchsource creation. The function returns minibatchsource object which the trainer uses for data handling.

Once that we implemented the data reader, we need the python function for model creation. For the Iris data set we are going to create 4-50-3 feed forward neural network, which consist of one input layer with 4 neurons, one hidden layer with 50 neurons and the output layer with 4 neurons. The hidden layer will contain tanh- activation function.

The function which creates the NN model will looks like on the flowing code snippet:

#model creation
# FFNN with one input, one hidden and one output layer 
def create_model(features, hid_dim, out_dim):
    #perform some initialization 
    with cntk.layers.default_options(init = cntk.glorot_uniform()):
        #hidden layer with hid_def number of neurons and tanh activation function
        h1=cntk.layers.Dense(hid_dim, activation= cntk.ops.tanh, name='hidLayer')(features)
        #output layer with out_dim neurons
        o = cntk.layers.Dense(out_dim, activation = None)(h1)
        return o

As can be seen Dense function creates the layer where the user has to specify the dimension of the layer, activation function and the input variable. When the hidden layer is created, input variable is set to the input data. The output layer is created for the hidden layer as input.

The one more helper function would be showing the progress of the learner. The flowing function takes the three arguments and prints the current status of the trainer.

# Function that prints the training progress
def print_training_progress(trainer, mb, frequency):
    training_loss = "NA"
    eval_error = "NA"

    if mb%frequency == 0:
        training_loss = trainer.previous_minibatch_loss_average
        eval_error = trainer.previous_minibatch_evaluation_average
        print ("Minibatch: {0}, Loss: {1:.4f}, Error: {2:.2f}%".format(mb, training_loss, eval_error*100))   
    return mb, training_loss, eval_error

Once we implemented all three functions we can start with CNTK learning on the Iris data.

At the beginning,  we have to specify some helper variable which we will use later.

#setting up the NN type
hidden_dim = 50
input = cntk.input_variable(input_dim)
label = cntk.input_variable(num_output_classes)

Create the reader for data batching.

# Create the reader to training data set
reader_train= create_reader("C:/sc/Offline/trainData_cntk.txt",True,input_dim, num_output_classes)

Then create the NN model, with Loss and Error functions:

#Create model and Loss and Error function
z= create_model(input, hidden_dim,num_output_classes);
loss = cntk.cross_entropy_with_softmax(z, label)
label_error = cntk.classification_error(z, label)

Then we defined how look like the trainer. The trainer will be with Stochastic Gradient Decadent learner, with learning rate of 0.2

# Instantiate the trainer object to drive the model training
learning_rate = 0.2
lr_schedule = cntk.learning_parameter_schedule(learning_rate)
learner = cntk.sgd(z.parameters, lr_schedule)
trainer = cntk.Trainer(z, (loss, label_error), [learner])

Now we need to defined parameters for learning, and showing results.

# Initialize the parameters for the trainer
minibatch_size = 120 #mini batch size will be full data set
num_iterations = 20 #number of iterations 

# Map the data streams to the input and labels.
input_map = {
label  : reader_train.streams.labels,
input  : reader_train.streams.features
# Run the trainer on and perform model training
training_progress_output_freq = 1

plotdata = {"batchsize":[], "loss":[], "error":[]}

As can be seen the batchsize is set to dataset size which is typical for small data sets.  Since we defined minibach to dataset size, the iteration should be very small value since Iris data is very simple and the learner will find good result very fast.

Running the trainer looks very simple. For each iteration, the reader load the batch size amount of the data, and pass to the trainer. The trainer performs the learning process using SGD learner, and returns the Loss and the error value for the current iteration. Then we call print function to show the progress of the trainer.

for i in range(0, int(num_iterations)):
        # Read a mini batch from the training data file
        data=reader_train.next_minibatch(minibatch_size, input_map=input_map) 
        batchsize, loss, error = print_training_progress(trainer, i, training_progress_output_freq)
        if not (loss == "NA" or error =="NA"):

Once the learning process completes, we can perform some result presentation.

# Plot the training loss and the training error
import matplotlib.pyplot as plt

plt.plot(plotdata["batchsize"], plotdata["loss"], 'b--')
plt.xlabel('Minibatch number')
plt.title('Minibatch run vs. Training loss')

plt.plot(plotdata["batchsize"], plotdata["error"], 'r--')
plt.xlabel('Minibatch number')
plt.ylabel('Label Prediction Error')
plt.title('Minibatch run vs. Label Prediction Error')

We plot the Loss and Error function converted in to total accuracy of the classifier. The folowing pictures shows those graphs.

The last part of the ML procedure is the testing or validating the model. FOr the Iris data set we prepare 20 samples which will be used for the testing. The code i similar to the previous, except we call create_reader with different file name. Then we try to evaluate the model and grab the Loss and error values, and print out.

# Read the training data
reader_test = create_reader("C:/sc/Offline/testData_cntk.txt",False, input_dim, num_output_classes)

test_input_map = {
    label  : reader_test.streams.labels,
    input  : reader_test.streams.features,

# Test data for trained model
test_minibatch_size = 20
num_samples = 20
num_minibatches_to_test = num_samples // test_minibatch_size
test_result = 0.0

for i in range(num_minibatches_to_test):
    data = reader_test.next_minibatch(test_minibatch_size,input_map = test_input_map)
    eval_error = trainer.test_minibatch(data)
    test_result = test_result + eval_error

# Average of evaluation errors of all test minibatches
print("Average test error: {0:.2f}%".format(test_result*100 / num_minibatches_to_test))

Full sample with python code and data set can be found here.

Introduction to CNTK – Microsoft Cognitive Toolkit

This summer Microsoft released the CNTK 2.0,  C++ open source, cross-platform and cross-OS library for deep learning based on deep neural network (NN). As they said this is the fastest NN library today, which is several times faster than related competitors libraries: TensorFlow, Caffe, Theno and Torch. From the demos which are included in the library it can be said this is very powerfully library, which can be of huge help for those who is doing Data Science and Machine Learning.

Actually CNTK is created by Microsoft speech researcher in 2012, and few years after it became the open source library at the codeplex site in early 2015. A year later it is moved to GitHub, by announcing CNTK 1.0. The first version released in January 2016 as the open source project on GitHub .

In June this year the CNTK 2.0 is released with lot of improvements and benchmarks. In September CNTK 2.2 is released by fully supporting the .NET platform which allows .NET developers to include the library in .NET based applications.

Beside C#, CNTK support C++ as native support, as well as Python which is proven to be first class citizen for this library.

Later the library will be ported on Java and R.