…continued from the previous post.
Once the model is built and the Loss and Validation functions meet our expectations, we need to validate and test the model using data that was not part of the training data set (unseen data). Model validation is very important because we want to see whether the model is trained well, so that it evaluates unseen data approximately as well as the training data. A model that cannot predict the output for unseen data is called an overfitted model. Overfitting can happen when the model is trained for so long that it shows very high performance on the training data set but produces poor results on the testing data.
We will continue with the implementation from the previous two posts and implement model validation. After the model is trained, the model and the trainer are passed to the evaluation method. The evaluation method loads the testing data and calculates the output using the passed model. It then compares the calculated (predicted) values with the expected output from the testing data set and calculates the accuracy. The following source code shows the evaluation implementation.
// Namespaces required at the top of the file:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using CNTK;

private static void EvaluateIrisModel(Function ffnn_model, Trainer trainer, DeviceDescriptor device)
{
    var dataFolder = "Data"; // files must be in the same folder as the program
    var testPath = Path.Combine(dataFolder, "testIris_cntk.txt");
    var featureStreamName = "features";
    var labelsStreamName = "label";

    // extract the input (features) and output (label) variables from the model
    var feature = ffnn_model.Arguments[0];
    var label = ffnn_model.Output;

    // stream configuration to distinguish features and labels in the file
    var streamConfig = new StreamConfiguration[]
    {
        new StreamConfiguration(featureStreamName, feature.Shape[0]),
        new StreamConfiguration(labelsStreamName, label.Shape[0])
    };

    // prepare testing data
    var testMinibatchSource = MinibatchSource.TextFormatMinibatchSource(
        testPath, streamConfig, MinibatchSource.InfinitelyRepeat, true);
    var featureStreamInfo = testMinibatchSource.StreamInfo(featureStreamName);
    var labelStreamInfo = testMinibatchSource.StreamInfo(labelsStreamName);

    int batchSize = 20;
    // both counters start at 0 so that only evaluated samples are counted
    int miscountTotal = 0, totalCount = 0;
    while (true)
    {
        var minibatchData = testMinibatchSource.GetNextMinibatch((uint)batchSize, device);
        if (minibatchData == null || minibatchData.Count == 0)
            break;
        totalCount += (int)minibatchData[featureStreamInfo].numberOfSamples;

        // expected labels are in the minibatch data
        var labelData = minibatchData[labelStreamInfo].data.GetDenseData<float>(label);
        var expectedLabels = labelData.Select(l => l.IndexOf(l.Max())).ToList();

        // evaluate the model on the features of the current minibatch
        var inputDataMap = new Dictionary<Variable, Value>() { { feature, minibatchData[featureStreamInfo].data } };
        var outputDataMap = new Dictionary<Variable, Value>() { { label, null } };
        ffnn_model.Evaluate(inputDataMap, outputDataMap, device);

        // the predicted class is the index of the largest output value
        var outputData = outputDataMap[label].GetDenseData<float>(label);
        var actualLabels = outputData.Select(l => l.IndexOf(l.Max())).ToList();

        int misMatches = actualLabels.Zip(expectedLabels, (a, b) => a.Equals(b) ? 0 : 1).Sum();
        miscountTotal += misMatches;
        Console.WriteLine($"Validating Model: Total Samples = {totalCount}, Mis-classify Count = {miscountTotal}");

        // the minibatch source repeats infinitely, so stop after 20 samples
        if (totalCount >= 20)
            break;
    }

    Console.WriteLine($"---------------");
    Console.WriteLine($"------TESTING SUMMARY--------");
    // cast to float: integer division would truncate the error rate to 0
    float accuracy = 1.0F - (float)miscountTotal / totalCount;
    Console.WriteLine($"Model Accuracy = {accuracy}");
}
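The comparison step of the evaluation can be tried in isolation, without CNTK. The following is only a minimal sketch: the class name AccuracySketch, its two helper methods, and the sample values are made up for illustration and are not part of the original project. It picks the index of the largest value in each output row, counts the rows where the predicted and expected indices differ, and converts that count into an accuracy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, not part of the original project.
public static class AccuracySketch
{
    // Index of the largest value in one row (the predicted or expected class).
    public static int ArgMax(IList<float> row) => row.IndexOf(row.Max());

    // Fraction of rows whose predicted class matches the expected class.
    public static float Accuracy(IList<IList<float>> predicted, IList<IList<float>> expected)
    {
        int mismatches = predicted
            .Zip(expected, (p, e) => ArgMax(p) == ArgMax(e) ? 0 : 1)
            .Sum();
        // Cast before dividing: int / int would truncate the error rate to 0.
        return 1.0f - (float)mismatches / predicted.Count;
    }

    public static void Main()
    {
        // Made-up network outputs for three samples (illustrative values only).
        var predicted = new List<IList<float>>
        {
            new List<float> { 0.8f, 0.1f, 0.1f },
            new List<float> { 0.2f, 0.3f, 0.5f },
            new List<float> { 0.1f, 0.7f, 0.2f },
        };
        // One-hot expected labels; the second sample is misclassified.
        var expected = new List<IList<float>>
        {
            new List<float> { 1f, 0f, 0f },
            new List<float> { 0f, 1f, 0f },
            new List<float> { 0f, 1f, 0f },
        };
        Console.WriteLine($"Accuracy = {Accuracy(predicted, expected)}");
    }
}
```

With one mismatch out of three samples, the accuracy is two thirds. If the count of mismatches were divided by the total as integers, the quotient would be 0 and the reported accuracy would be 1 whenever at least one sample is classified correctly.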
The implemented method is called from the previously implemented training method:
EvaluateIrisModel(ffnn_model, trainer, device);
As can be seen, model validation shows that the model predicts the data with high accuracy, as shown in the following picture.
This was the latest post in the series of blog posts about using feed forward neural networks to train on the Iris data using CNTK and C#.
The full source code for all three samples can be found here.