numl - a machine learning library for .NET developers

In a previous post, Machine learning resources for .NET developers, I introduced a machine learning library called numl.net.  numl.net is a machine learning library for .NET created by Seth Juarez.  You can find the library here and Seth's blog here.  When I began researching the library, I quickly learned that one of Seth's goals in writing numl.net was to abstract away the complexities that stop many software developers from trying their hand at machine learning.  In my opinion, he has done a wonderful job of accomplishing this goal!

Tutorial

I've put together a small tutorial to show you just how easy it is to use numl.net to perform predictions.  This tutorial uses supervised learning by way of a decision tree.  I will use the famous Iris data set, which contains measurements for three different types of Iris flowers.  Before we get into code, let's look at some basic terminology.

With numl.net you create a POCO (plain old CLR object) to use for training as well as for predictions.  Some properties hold known values (features), and you use them to predict the value of an unknown property (the label).  numl.net makes identifying features and labels easy: you simply mark your properties with the [Feature] attribute or the [Label] attribute (there is also a [StringLabel] attribute).  Here is the Iris class that we will use in this tutorial.

using numl.Model;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace NumlDemo
{
    /// <summary>
    /// Represents an Iris in the famous Iris classification dataset (Fisher, 1936)
    /// Each feature property will be used for training as well as prediction.  The label
    /// property is the value to be predicted.  In this case, it's which type of Iris we are dealing with.
    /// </summary>
    public class Iris
    {
        //Length in centimeters
        [Feature]
        public double SepalLength { get; set; }

        //Width in centimeters
        [Feature]
        public double SepalWidth { get; set; }

        //Length in centimeters
        [Feature]
        public double PetalLength { get; set; }

        //Width in centimeters
        [Feature]
        public double PetalWidth { get; set; }

        
        //The three Iris classes in the data set:
        //-- Iris Setosa
        //-- Iris Versicolour
        //-- Iris Virginica
        public enum IrisTypes
        {
            IrisSetosa,
            IrisVersicolour,
            IrisVirginica
        }

        [Label]
        public IrisTypes IrisClass { get; set; } //This is the label or value that we wish to predict based on the supplied features
    }
}

As you can see, we have a simple POCO Iris class that defines four features and one label.  The Iris training data can be found here.  Here is an example of the data found in the file.

 

5.1,3.5,1.4,0.2,Iris-setosa
6.3,2.5,4.9,1.5,Iris-versicolor
6.0,3.0,4.8,1.8,Iris-virginica
 
 
The first four values are doubles that represent the features: sepal length, sepal width, petal length, and petal width.  The final value is a string naming the class of Iris.  It is the label that we will predict, and the parser maps it to the IrisTypes enum.
 
We have the Iris class, so now we need a method to parse the training data file and populate a static List<Iris> collection.  Here is the code:
 
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace NumlDemo
{
    /// <summary>
    /// Provides the services to parse the training data files 
    /// </summary>
    public static class IrisDataParserService
    {
        //provides the training data to create the predictive model
        public static List<Iris> TrainingIrisData { get; set; }

        /// <summary>
        /// Reads the trainingDataFile and populates the TrainingIrisData list
        /// </summary>
        /// <param name="trainingDataFile">File full of Iris data</param>
        public static void LoadIrisTrainingData(string trainingDataFile)
        {
            //if we don't have a training data file
            if (string.IsNullOrEmpty(trainingDataFile))
                throw new ArgumentNullException("trainingDataFile");

            //if the file doesn't exist on the file system
            if (!File.Exists(trainingDataFile))
                throw new FileNotFoundException("Training data file not found.", trainingDataFile);

            if (TrainingIrisData == null)
                //initialize the return training data set
                TrainingIrisData = new List<Iris>();

            //read the file one line at a time
            using (var fileReader = new StreamReader(new FileStream(trainingDataFile, FileMode.Open)))
            {
                string fileLineContents;
                while ((fileLineContents = fileReader.ReadLine()) != null)
                {
                    //split the current line into an array of values
                    var irisValues = fileLineContents.Split(',');

                    double sepalLength = 0.0;
                    double sepalWidth = 0.0;

                    double petalLength = 0.0;
                    double petalWidth = 0.0;

                    if (irisValues.Length == 5)
                    {
                        Iris currentIris = new Iris();

                        double.TryParse(irisValues[0], out sepalLength);
                        currentIris.SepalLength = sepalLength;

                        double.TryParse(irisValues[1], out sepalWidth);
                        currentIris.SepalWidth = sepalWidth;

                        double.TryParse(irisValues[2], out petalLength);
                        currentIris.PetalLength = petalLength;

                        double.TryParse(irisValues[3], out petalWidth);
                        currentIris.PetalWidth = petalWidth;

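                        //map the class name in the file to the IrisTypes enum
                        var irisClassName = irisValues[4].Trim();
                        if (irisClassName == "Iris-setosa")
                            currentIris.IrisClass = Iris.IrisTypes.IrisSetosa;
                        else if (irisClassName == "Iris-versicolor")
                            currentIris.IrisClass = Iris.IrisTypes.IrisVersicolour;
                        else if (irisClassName == "Iris-virginica")
                            currentIris.IrisClass = Iris.IrisTypes.IrisVirginica;
                        else
                            continue; //skip rows with an unrecognized class name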

                        IrisDataParserService.TrainingIrisData.Add(currentIris);
                    }
                }
            }
        }
    }
}
This code is pretty standard.  We simply read each line in the file, split the values out into an array, and populate a List<Iris> collection of Iris objects based on the data found in the file.
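
One thing to watch in a parser like this: double.TryParse uses the current culture, and if its return value is ignored, a malformed number quietly becomes 0.0.  On a machine whose culture uses a comma as the decimal separator, a value such as 5.1 may also be misread.  Here is a minimal sketch of a stricter line parser.  The names IrisLineParser and TryParseLine are my own for illustration; they are not part of numl.net or of the code above.

```csharp
using System;
using System.Globalization;

namespace NumlDemo
{
    /// <summary>
    /// Hypothetical helper (not part of numl.net): parses one line of the Iris file strictly.
    /// </summary>
    public static class IrisLineParser
    {
        /// <summary>
        /// Returns true only if the line has four numeric values followed by a class name.
        /// </summary>
        public static bool TryParseLine(string line, out double[] features, out string className)
        {
            features = null;
            className = null;

            //blank lines are rejected rather than parsed
            if (string.IsNullOrWhiteSpace(line))
                return false;

            var parts = line.Split(',');
            if (parts.Length != 5)
                return false;

            var parsed = new double[4];
            for (int i = 0; i < 4; i++)
            {
                //InvariantCulture always treats '.' as the decimal separator
                if (!double.TryParse(parts[i], NumberStyles.Float,
                        CultureInfo.InvariantCulture, out parsed[i]))
                    return false;
            }

            features = parsed;
            className = parts[4].Trim();
            return true;
        }
    }
}
```

A caller could skip any line for which TryParseLine returns false, or collect those lines and report them, instead of adding an Iris with zeroed features to the training data.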
 

Now the magic

Using the numl.net library, we need only three classes to perform a prediction based on the Iris data set.  We start with a Descriptor, which describes the class that we will learn from and predict.  Next, we instantiate a DecisionTreeGenerator, passing the descriptor to the constructor.  Finally, we create our prediction model by calling the Generate method of the DecisionTreeGenerator, passing it the training data (IEnumerable<Iris>).  The Generate method returns a model that we can use to perform our predictions.

Here is the code:

using numl;
using numl.Model;
using numl.Supervised;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace NumlDemo
{
    class Program
    {
        public static void Main(string[] args)
        {
            //get the descriptor that describes the features and label from the Iris training objects
            var irisDescriptor = Descriptor.Create<Iris>();

            //create a decision tree generator and teach it about the Iris descriptor
            var decisionTreeGenerator = new DecisionTreeGenerator(irisDescriptor);

            //load the training data
            IrisDataParserService.LoadIrisTrainingData(@"D:\Development\machinelearning\Iris Dataset\bezdekIris.data");

            //create a model based on our training data using the decision tree generator
            var decisionTreeModel = decisionTreeGenerator.Generate(IrisDataParserService.TrainingIrisData);

            //create an iris that should be an Iris Setosa
            var irisSetosa = new Iris
            {
                SepalLength = 5.1,
                SepalWidth = 3.5,
                PetalLength = 1.4,
                PetalWidth = 0.2
            };

            //create an iris that should be an Iris Versicolor
            var irisVersiColor = new Iris
            {
                SepalLength = 6.1,
                SepalWidth = 2.8,
                PetalLength = 4.0,
                PetalWidth = 1.3
            };

            //create an iris that should be an Iris Virginica
            var irisVirginica = new Iris
            {
                SepalLength = 7.7,
                SepalWidth = 2.8,
                PetalLength = 6.7,
                PetalWidth = 2.0
            };

            var irisSetosaClass = decisionTreeModel.Predict<Iris>(irisSetosa);
            var irisVersiColorClass = decisionTreeModel.Predict<Iris>(irisVersiColor);
            var irisVirginicaClass = decisionTreeModel.Predict<Iris>(irisVirginica);

            Console.WriteLine("The Iris Setosa was predicted as {0}",
                irisSetosaClass.IrisClass.ToString());

            Console.WriteLine("The Iris Versicolor was predicted as {0}",
                irisVersiColorClass.IrisClass.ToString());

            Console.WriteLine("The Iris Virginica was predicted as {0}",
                irisVirginicaClass.IrisClass.ToString());

            Console.ReadKey();
        }
    }
}

And that's all there is to it.  As you can see, you can build and query a prediction model in a few lines, and there's no math, only simple abstractions.
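
Three hand-picked flowers don't say much about how accurate the model is overall.  A common check is to hold back part of the data, train on the rest, and count how many held-back items are predicted correctly.  Below is a minimal sketch of that idea in plain C#.  HoldoutEvaluator, Shuffle, and Accuracy are hypothetical names of my own, not part of numl.net, and the helper knows nothing about numl.net itself: you hand it two functions, one that reads the actual label and one that produces the predicted label.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace NumlDemo
{
    /// <summary>
    /// Hypothetical helper for a hold-out test (not part of numl.net).
    /// </summary>
    public static class HoldoutEvaluator
    {
        //returns a shuffled copy of the items (Fisher-Yates); a fixed seed makes the split repeatable
        public static List<T> Shuffle<T>(IEnumerable<T> items, int seed)
        {
            var shuffled = items.ToList();
            var random = new Random(seed);
            for (int i = shuffled.Count - 1; i > 0; i--)
            {
                int j = random.Next(i + 1); //0 <= j <= i
                T temp = shuffled[i];
                shuffled[i] = shuffled[j];
                shuffled[j] = temp;
            }
            return shuffled;
        }

        //returns the fraction of test items whose predicted label equals the actual label
        public static double Accuracy<T, TLabel>(
            IEnumerable<T> testItems,
            Func<T, TLabel> actualLabel,
            Func<T, TLabel> predictedLabel)
        {
            int total = 0;
            int correct = 0;
            foreach (var item in testItems)
            {
                //read the actual label first, in case predicting overwrites it on the item
                TLabel actual = actualLabel(item);
                TLabel predicted = predictedLabel(item);
                total++;
                if (EqualityComparer<TLabel>.Default.Equals(actual, predicted))
                    correct++;
            }

            if (total == 0)
                throw new ArgumentException("testItems is empty", "testItems");

            return (double)correct / total;
        }
    }
}
```

With the types from this tutorial, you might shuffle IrisDataParserService.TrainingIrisData, pass the first 80% to Generate, and then call HoldoutEvaluator.Accuracy(testItems, i => i.IrisClass, i => model.Predict<Iris>(i).IrisClass) on the remainder.  I'm assuming here that Predict may set IrisClass on the object you pass in, which is why the helper reads the actual label before predicting.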

I hope this has piqued your interest in the numl.net library for machine learning in .NET.

Feel free to post any questions or opinions.

Thanks for reading!

Buddy James

 



Comments (8) -

sam
3/27/2013 11:47:06 PM #

Hi, nice stuff, but I am getting the error "Invalid descriptor: Empty feature set!"

var data = Value.GetData();
var description = Descriptor.Create<Value>();
var generator = new DecisionTreeGenerator(50);
var model = generator.Generate(description, data);

where Value is:
public class Value
{
    public int V1 { get; set; }
    public int V2 { get; set; }
    public int R { get; set; }
}

Buddy James
3/28/2013 7:25:41 AM #

Sam,

You need to mark some of your properties with the [Feature] attribute.  That would be my first guess.  Give that a shot and see if that doesn't take care of your problem.

Thanks for reading!

Buddy James

sam
3/28/2013 7:56:47 AM #

Thanks Buddy, will try it today.  It looks good, but there is no documentation at all.

Buddy James
3/28/2013 2:40:01 PM #

Sam,

I've forked the GitHub repository and I'm working to help provide some documentation as I learn the library.

sam
3/28/2013 5:50:47 PM #

I just posted an issue.  It seems it can't handle big data: it only works when the sample data has fewer than 10 items and the data is somehow consistent.  For example, if you feed it from this, it will error:
public static IEnumerable<Value> GetData()
{
    for (int i = 0; i < 1000; i++)
    {
        yield return new Value { V1 = 1, V2 = i, R = (i > 50) ? "l" : "s" };
    }
}

Buddy James
3/29/2013 1:31:30 AM #

Sam,

I appreciate the heads up.

I've submitted the bug and I'll let you know when it's been fixed.

Thanks,

Buddy James

sam
3/30/2013 2:22:55 PM #

Hi Buddy,
Thank you very much for your help, but it seems numl is not what I am looking for, and its developer confirmed that.  I was trying to make it predict the level of a student from his degrees, which is supposed to be an easy job.
I advise you to do some deep testing before considering it for any real-life project.
I am going to try:
www.codeproject.com/.../AForge-NET-open-source-framework

Buddy James
3/31/2013 1:07:24 AM #

Sam,

I appreciate your feedback.

The library is a young work in progress.  It's on github and I feel it's worth the time.  I'm having a lot of fun working on it and I'm learning a lot in the process.

I suggest trying all the libraries and picking the one that calls out to you.  NND is another project that I'm involved in that is geared more toward neural networks; however, it's vast... Maybe you should give it a try.  Be sure to stop by and let me know what works for you.  Thanks for reading!


About the author

My name is Buddy James.  I'm a Microsoft Certified Solutions Developer from the Nashville, TN area.  I'm a Software Engineer, an author, a blogger (http://www.refactorthis.net), a mentor, a thought leader, a technologist, a data scientist, and a husband.  I enjoy working with design patterns, data mining, C#, WPF, Silverlight, WinRT, XAML, ASP.NET, Python, CouchDB, RavenDB, Hadoop, Android (MonoDroid), iOS (MonoTouch), and Machine Learning.  I love technology and I love to develop software, collect data, analyze the data, and learn from the data.  When I'm not coding, I'm determined to make a difference in the world by using data and machine learning techniques (follow me at @budbjames).
