
RapidMiner tutorial: How to explore correlations in your data to discover the relevance of attributes

What is correlation?

From Wikipedia:

In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence.

In layman's terms, correlation is a relationship between data attributes.  For a quick refresher: in data mining, a dataset is made up of different attributes, and we use these attributes to classify or predict a label.  Some attributes have more "meaning", or influence, over the label's value.  If you can determine how much influence specific attributes have over your data, you are in a better position to build a classification model, because you will know which attributes to focus on.

In this example, I will use the kaggle.com Titanic data mining challenge dataset.  This post will not uncover any information that is not readily available in the tutorial posted on kaggle.com.

Here are two screenshots.  The first screenshot will show you some statistics about the dataset.  The second screenshot will show a sample of the data.

Meta data view of the Titanic data mining challenge Training dataset

A data view of the dataset

The correlation matrix

Start by importing the Titanic training dataset into RapidMiner.  You can use the Read CSV, Read Excel, or Read Database operator for this step.  Next, search for the "Correlation Matrix" operator and drag it onto the process surface.  Connect the output port of the Titanic training dataset to the example set input port of the Correlation Matrix operator.  Your process should look like this.

 

Now run the process and observe the output.

You are presented with several different result views.  The first is the Correlation Matrix Attribute Weights view, which displays the "weight" of each attribute.  This tutorial focuses on a different view, so click on the Correlation Matrix view.  This matrix shows the correlation coefficients, which measure the strength of the relationship between each pair of attributes.  An easy way to get oriented is to notice that where an attribute intersects with itself, you have a dark blue cell with a value of 1, the strongest possible value, because any attribute is perfectly correlated with itself.

A correlation coefficient can be positive or negative.  A negative value does not mean that there is less of a relationship between the two attributes; it means that one tends to decrease as the other increases.  The larger the coefficient in either direction, the stronger the relationship between the two attributes.  If we follow the top row of the matrix (survived), we can see which attributes have the strongest correlation with the label that we are trying to predict.
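To make the coefficient less abstract, here is a minimal sketch of how a Pearson correlation coefficient can be computed by hand.  It assumes that the values in the matrix are Pearson coefficients and that nominal attributes have already been mapped to numbers.  The class name, method name, and toy values are hypothetical; they are not taken from RapidMiner or from the Titanic data.

```csharp
using System;
using System.Linq;

public static class CorrelationDemo
{
    // Pearson correlation coefficient between two equally sized samples.
    public static double Pearson(double[] x, double[] y)
    {
        if (x == null || y == null)
            throw new ArgumentNullException();
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Both samples need the same length (at least 2).");

        double meanX = x.Average();
        double meanY = y.Average();

        double covariance = 0.0, varianceX = 0.0, varianceY = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }

        // The result is undefined (NaN) when either attribute is constant.
        return covariance / Math.Sqrt(varianceX * varianceY);
    }

    public static void Main()
    {
        // Hypothetical toy data, not taken from the Titanic dataset.
        double[] survived = { 1, 0, 1, 1, 0, 0 };
        double[] sameAsLabel = { 1, 0, 1, 1, 0, 0 };
        double[] oppositeOfLabel = { 0, 1, 0, 0, 1, 1 };

        Console.WriteLine(Pearson(survived, sameAsLabel));
        Console.WriteLine(Pearson(survived, oppositeOfLabel));
    }
}
```

Running it prints 1 for the attribute that matches the label and -1 for the attribute that is its exact opposite: the sign gives the direction of the relationship, while the magnitude gives its strength.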

Just as the kaggle.com tutorial specifies, the attributes with the strongest correlation with the label (survived) are

sex(0.295), pclass(0.115), and fare(0.66) 

Remember that the value as well as the color will help you to visually identify the stronger correlation between attributes.

If you are working with a classification problem, I'm sure you can see how valuable the correlation matrix can be in showing you the relationships between your label and attributes.  Such insights can provide a great starting point for deciding where to focus your attention when building your classification model.

Thanks for reading and keep your eyes open for my next tutorial! 




About the author

My name is Buddy James.  I'm a Microsoft Certified Solutions Developer from the Nashville, TN area.  I'm a Software Engineer, an author, a blogger (http://www.refactorthis.net), a mentor, a thought leader, a technologist, a data scientist, and a husband.  I enjoy working with design patterns, data mining, c#, WPF, Silverlight, WinRT, XAML, ASP.NET, python, CouchDB, RavenDB, Hadoop, Android(MonoDroid), iOS (MonoTouch), and Machine Learning. I love technology and I love to develop software, collect data, analyze the data, and learn from the data.  When I'm not coding,  I'm determined to make a difference in the world by using data and machine learning techniques. (follow me at @budbjames).  


RapidMiner tips and tricks #1 How to use SQL Server named instances with RapidMiner Read/Write database operators

 Tips and tricks. Tip #1 How to use SQL Server named instances with RapidMiner Read/Write to database operators

Hello and welcome to the first of many tips and tricks for RapidMiner.  If you are unfamiliar with RapidMiner, it's an open source, Java-based data mining solution.  You can visit the official RapidMiner website by clicking here.  My plan is to write short articles that provide solutions to problems that I encounter as I learn more about this awesome application.

RapidMiner and database connectivity

There are many operators in RapidMiner that take input data sets and generate models for prediction and analysis.  Often, you will want to write the result set of the model to a database.  To do this you use the "Write Database" operator.

I was using RapidMiner for web mining by way of the Crawl Web operator.  The example set output of the Crawl Web operator was connected to the input of the Write Database operator.  At the time I was using a SQL Server database that I pay for through my web hosting account.  Just like most everything in RapidMiner, the setup was easy and worked like a charm.  My database size quota was 200MB with my current hosting plan, and it became apparent to me that I would quickly run out of space.  As such, I decided to use the local SQL Express 2012 named instance on my machine.  This is where the problem was introduced.  I couldn't figure out how to successfully set up the database connection in RapidMiner.

RapidMiner, Named Instances, and Integrated Security

The issues that I encountered when trying to set up my local SQL Server 2012 named instance were as follows:

  1. If I used the named instance as the server name (localhost\SQLExpress), I was unable to connect.  I didn't encounter this problem with my hosting server's database because it was a direct hostname (xxx.sqlserverdb.com).  There was no instance name, so the configuration was easy.
  2. I wasn't sure how to specify integrated security, as this is something that you usually specify in the connection string.  I didn't encounter this problem with my hosting database server either, because I was given a user name and password to connect to the server.

After some research and banging my head against my laptop, I finally figured out the resolution to my problems and I'm here to save someone else the headache.

For the named instance issue, there is a trick that is not readily apparent.  You set your database server name as usual (in my case, localhost).  However, when you specify the database name, you append a semicolon (;) followed by instance=<instance name>.  So for my local server instance (localhost\sqlexpress), I set the Host value to localhost and the Database scheme value to mydatabasename;instance=sqlexpress .
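For context, this trick appears to work because of how jTDS connection URLs are structured: everything after the database name is read as a semicolon-separated list of driver properties, and instance is one of those properties.  Assuming RapidMiner simply concatenates the Host, Port, and Database scheme fields into the URL (an assumption about its behavior, not something confirmed here), the general form and the resulting URL would look roughly like this.  The port 1433 is only illustrative.

```text
jdbc:jtds:<server_type>://<server>[:<port>][/<database>][;<property>=<value>[;...]]

jdbc:jtds:sqlserver://localhost:1433/mydatabasename;instance=sqlexpress
```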

As for the integrated security requirement, all you need to do is make sure that you have the latest jTDS SQL Server driver from here.  Once you download the zip file, you'll need to extract the file jtds-1.3.0-dist.zip\x86\SSO\ntlmauth.dll and place it in your windows\system32 directory.  This will ensure that the driver is able to use integrated security.  Once this file is in place, you simply leave the username and password values blank. Here is a screenshot of the Manage Database Connections window in RapidMiner for your reference.

 

Well that about wraps it up.  Please leave a comment if you have any questions.

Until next time,

Buddy James




Machine learning resources for .NET developers

Machine learning for .NET

Greetings friends, and welcome to this article on machine learning libraries for .NET developers.  Machine learning is a hot topic right now, and for good reason.  Personally, I haven't been so excited about a technology since my computer used my 2800 baud modem to dial into a BBS over 17 years ago.  The thought that my computer could communicate with another computer was fascinating to me.  That moment would forever change my life.  I learned a lot about DOS by writing batch scripts and running other programs that allowed me to visit and then run a BBS system.  It eventually led me to QBasic.  I wanted to learn to write BBS door games, and QBasic was included as a part of a standard DOS installation back then.

Fast forward 17 years and I'm still in love with computers, programming, and the concept of communication between machines.  The magic never disappeared.  So when I first learned about the concept of machine learning, I felt like that 13 year old kid again.  The idea that a machine can learn to do things that it has not been programmed to do is now a passion of mine.  Machine learning has a steep learning curve; however, I believe that we as humans can do anything that we put our minds to.  So I began looking around for tutorials on machine learning.  I found many great tutorials and books; however, most of them involved using Python.  I have nothing against Python.  As a matter of fact, I find it ironic that I started with BASIC and now, in this moment of "rebirth", I'm beginning to use Python, which looks a lot like BASIC in many ways.  The fact of the matter remains: I'm a .NET developer.  I've spent the last 9 years in the .NET framework and I love the technology.  C# is an awesome programming language and it's hard to imagine life without Visual Studio.  What can I say, the IDE has spoiled me.

While I scoured the internet looking for machine learning tutorials for .NET developers, I wished that there were a single resource that gathered them all in one place.

Well that's what this article is all about.  In this article, I will introduce you to some .NET libraries that will assist you in your quest to learn about Machine learning.

NND Neural Network Designer by Bragisoft

The Neural Network Designer project (NND) is a database management system (DBMS) for neural networks that was created by Jan Bogaerts.  The designer application is developed using WPF and provides a user interface that allows you to design your neural network, query the network, and create and configure chat bots that are capable of asking questions and learning from your feedback.  The chat bots can even scrape the internet for information to return in their output as well as to use for learning.  The project includes a custom language called NNL (neural network language) that you can use to configure your machine learning project.  The source code is designed so that the libraries can be used in your own custom applications, so you don't have to start from scratch with such a complex set of technologies.  It is an open source project, and I am a part of it.  Some of the possibilities offered by this awesome project include predictions, image and pattern recognition, value inspection, memory profiling, and much more.  Stop by the Bragisoft NND website and download the application to give it a try.

 Screen shots of the neural network designer by Bragisoft

A DBMS for neural networks

A DBMS for neural networks

 

Mind map rand forrest

Machine learning

The chat bot designer and other tools

GUIs and debuggers

Accord.net

Here is a description from the Accord.NET project website 

Accord.NET is a framework for scientific computing in .NET. The framework builds upon AForge.NET, an also popular framework for image processing, supplying new tools and libraries. Those libraries encompass a wide range of scientific computing applications, such as statistical data processing, machine learning, pattern recognition, including but not limited to, computer vision and computer audition. The framework offers a large number of probability distributions, hypothesis tests, kernel functions and support for most popular performance measurements techniques.

The most impressive parts of this library have got to be the documentation and the sample applications that are distributed with the project.  They make it easy to get started with the library.  I also like the ability to perform operations like audio processing (beat detection and more) and video processing (easy integration with your web cam, vision capabilities, and object recognition).  This is an excellent place to start when approaching machine learning with the .NET framework.  Here are two videos that should whet your appetite.

Hand writing recognition with Accord.NET

 

Here is an example of head tracking with Accord.NET (super cool)

 

AIMLBot Program# AIML Chat bot library

AIMLBot (Program#) is a small, fast, standards-compliant yet easily customizable implementation of an AIML (Artificial Intelligence Markup Language) based chatter bot in C#. AIMLBot has been tested on both Microsoft's runtime environment and Mono. Put simply, it will allow you to chat (by entering text) with your computer using natural language.  The project is located here.

Math.NET

Machine learning algorithms are extremely math heavy.  Math.NET is a library  that can assist with the math that is required to solve machine learning related problems.

Math.NET Numerics aims to provide methods and algorithms for numerical computations in science, engineering and every day use. Covered topics include special functions, linear algebra, probability models, random numbers, interpolation, integral transforms and more.

DotNumerics

DotNumerics is a website dedicated to numerical computing for .NET. DotNumerics includes a Numerical Library for .NET. The library is written in pure C# and has more than 100,000 lines of code with the most advanced algorithms for Linear Algebra, Differential Equations and Optimization problems. The Linear Algebra library includes CSLapack, CSBlas and CSEispack, these libraries are the translation from Fortran to C# of LAPACK, BLAS and EISPACK, respectively.

You can find the library here. 

ALGLIB

ALGLIB is a cross-platform numerical analysis and data processing library. It supports several programming languages (C++, C#, Pascal, VBA) and several operating systems (Windows, Linux, Solaris). ALGLIB features include:

Accessing ‘R’ from C#–Lessons learned

Here are instructions for using the R statistical framework from within C#.

ILNumerics

You can check out the library at http://www.ilnumerics.net

NuML.net http://numl.net

A nice site about the basics of machine learning in C# by Seth Juarez.  NuML.NET is a machine learning library for .NET developers written by Seth Juarez.  I've recently tried this library and I'm impressed!  Seth has stated publicly that his intention behind the numl.net library is to abstract the scary math away from machine learning and provide tools that are more approachable for software developers, and boy did he deliver!  I had been working with this library for a little more than an hour when I had a working prediction app written in C#.  You can find the numl.net library source on GitHub.

Encog Machine Learning Framework

Here is what the official Heaton Research website has to say about Encog:

Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models and Genetic Algorithms are supported. Most Encog training algorithms are multi-threaded and scale well to multicore hardware. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train machine learning algorithms. Encog has been in active development since 2008.

Encog is available for Java, .Net and C/C++.

Jeff Heaton knows a great deal about machine learning algorithms, and he's created a wonderful library called Encog.  I was able to write a neural network application that solved the classic XOR problem within 20 minutes of installing the library.  What really amazes me is that he has an Encog library for JavaScript, and his website includes live samples of JavaScript + Encog solving problems like the Traveling Salesman Problem and Conway's Game of Life, all in a browser!  This library can even use your GPU for the heavy lifting if that's your choice.  I would highly recommend that you at least check out his site and download the library to look at the examples.  You can find the Encog library here.

 

Conclusion

This concludes my article on Machine learning resources for the .NET developer.  If you have any suggestions regarding a project that you know of or you are working on related to Machine learning in .NET, please don't hesitate to leave a comment and I will update the article to mention the project.  This article has shown that we as .NET developers have many resources available to us to use to implement Machine learning based solutions.  I appreciate your time in reading this article and I hope you found it useful.  Please subscribe to my RSS feed.  Until next time..

Buddy James



Comments (6) -

Seth Juarez
3/4/2013 11:11:59 AM

Hey! I also made something: http://numl.net.

Buddy James
3/9/2013 4:22:53 AM

Seth,

Thank you for contributing.  I'm going to add your project to my list.

The code looks great.  The site design is really awesome too!  Kudos!

Buddy James

terrell26
3/21/2013 12:44:32 PM

You seem to know a great deal about this subject

Buddy James
3/21/2013 5:33:28 PM

I appreciate the compliment.  I'm very passionate about machine learning and I'm constantly learning.

Thanks again!

Buddy James

Don Syme
7/2/2013 4:46:37 AM

Great links!

For F# developers (or C# developers adding an F# project to their solution) see also

    http://fsharp.org/machine-learning

Buddy James
7/4/2013 8:37:45 PM

Thanks for reading @Don.  I hear great things about F# and machine learning.  F# is on my list of languages to learn.  Thanks again!

Buddy


ASP.NET MVC Basics: How to create a HtmlHelper

Hello, and welcome to another tutorial on refactorthis.net.  This is the first tutorial in a series on ASP.NET MVC development.  This first installment will cover how to create an HtmlHelper.

What is ASP.NET MVC?

I apologize if I'm stating the obvious; however, this is a tutorial on the basics, so I'm approaching it with the assumption that the reader has no knowledge of ASP.NET, MVC, or ASP.NET MVC.  I'm sure you noticed that I made three references to the technology.  That's because ASP.NET MVC is more than one technology.

First we have ASP.NET, which is a server side web development technology created by Microsoft that utilizes the .NET framework.  Simply put, ASP.NET allows you to write dynamic web pages in the .NET language of your choice (C#, VB.NET, etc.).  ASP.NET is an open source technology and it can be developed using the .NET framework or the Mono runtime.  You can find all sorts of wonderful information at http://www.asp.net/ .  ASP.NET comes in two "flavors": ASP.NET WebForms, which is the original flavor, and ASP.NET MVC.

ASP.NET MVC is made up of two concepts.  First and foremost there is the MVC design pattern.  The MVC, or Model-View-Controller, design pattern was introduced by Trygve Reenskaug in the 1970s.  The design pattern has also been explained by Martin Fowler in his wonderful book on design patterns called Patterns of Enterprise Application Architecture.  The idea behind the MVC pattern is that you have data that you wish to display (Model), you have a presentation layer in which you wish to display your Model (View), and you have a class that handles the interaction between the View and the Model (Controller).  The MVC design pattern is an excellent example of a design with proper separation of concerns.  Applications that implement the MVC design pattern are generally loosely coupled and easy to unit test.

In ASP.NET MVC, the View is a web page, the Model can be any class that holds data that you want to display in the View, and the Controller is a class that handles the interaction between the two.

Now that I've explained ASP.NET and MVC, I'll explain ASP.NET MVC.  Not only is ASP.NET MVC an implementation of the Model-View-Controller design pattern, it's also a framework built by Microsoft to support the implementation of that pattern.  You will see as you begin working with MVC that the framework is a large part of what makes ASP.NET MVC what it is.  You could implement the MVC pattern in ASP.NET without the framework; however, I don't recommend it.  The framework will create controllers for you, create Views that are bound to a strongly typed model of your choice, and much more.  Using the ASP.NET MVC framework, you can create a basic CRUD web application in a very short amount of time.

I hope that this gives you a nice overview of the ASP.NET MVC framework and design pattern.

HtmlHelper: What they are and how to create them

I'll explain HtmlHelpers by taking you through implementing one.  We'll start by firing up Visual Studio and creating a new ASP.NET MVC 4 project.  (Note: I'm using VS 2012; however, if you have an earlier version installed, simply choose whichever ASP.NET MVC version you have available.  If you don't have any ASP.NET MVC templates, you can use the Web Platform Installer to install ASP.NET MVC on your system.)

The next dialog allows you to choose which type of ASP.NET MVC application you'd like to create.  We will pick an Internet Application, with the Razor view engine and a unit testing project, as shown below.

Now click OK and your project will be created for you.  The ASP.NET MVC framework will create a lot of boilerplate code behind the scenes.  This includes forms-based authentication in the web.config file as well as an AccountController, which is a controller that handles authentication to the site.  There is also a default controller called HomeController.
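HtmlHelpers, which the rest of this tutorial builds, are written as C# extension methods.  As a minimal, framework-free sketch of that mechanism, here is an extension method on string.  The StringExtensions class and the WrapInTag method are hypothetical names made up for this illustration; they are not part of ASP.NET MVC.

```csharp
using System;

public static class StringExtensions
{
    // The "this" modifier on the first parameter turns this static method
    // into an extension method on string.
    public static string WrapInTag(this string text, string tagName)
    {
        if (string.IsNullOrEmpty(tagName))
            throw new ArgumentNullException("tagName");

        return "<" + tagName + ">" + text + "</" + tagName + ">";
    }
}

public static class Program
{
    public static void Main()
    {
        // Called as if it were an instance method on string...
        Console.WriteLine("Hello".WrapInTag("td"));

        // ...but it is an ordinary static call underneath.
        Console.WriteLine(StringExtensions.WrapInTag("Hello", "td"));
    }
}
```

Both calls print <td>Hello</td>.  An HtmlHelper works the same way, except that the extended type is HtmlHelper instead of string.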
We are going to create an HTML helper that creates an HTML table.  This is merely an example and is not particularly useful; however, it will show you how to create and use an HTML helper.  An HTML helper is nothing more than an extension method which returns a string of HTML (wrapped in an MvcHtmlString so that Razor does not encode it again).  Here is the official definition of an extension method from MSDN:

Extension methods enable you to "add" methods to existing types without creating a new derived type, recompiling, or otherwise modifying the original type. Extension methods are a special kind of static method, but they are called as if they were instance methods on the extended type. For client code written in C# and Visual Basic, there is no apparent difference between calling an extension method and the methods that are actually defined in a type.

Creating the HtmlHelper

Create a folder called "Helpers" in your solution.  In the folder, create a static class called TableExtensions in a file named TableExtensions.cs.  The class will look like this:

TableExtensions.cs

    using System;
    using System.Text;
    using System.Web.Mvc;

    namespace HtmlHelpers.Helpers
    {
        public static class TableExtensions
        {
            // Builds a rows x columns HTML table; each cell shows its row index.
            public static MvcHtmlString Table(this HtmlHelper helper, string id, string name, int rows, int columns)
            {
                // Validate the arguments before building any markup
                if (string.IsNullOrEmpty(id))
                    throw new ArgumentNullException("id");

                if (string.IsNullOrEmpty(name))
                    throw new ArgumentNullException("name");

                if (rows <= 0)
                    throw new ArgumentOutOfRangeException("rows");

                if (columns <= 0)
                    throw new ArgumentOutOfRangeException("columns");

                StringBuilder tableBuilder = new StringBuilder();

                tableBuilder.AppendFormat("<table id=\"{0}\" name=\"{1}\">", id, name);

                for (int rowCounter = 0; rowCounter < rows; rowCounter++)
                {
                    tableBuilder.Append("<tr>");

                    for (int columnCounter = 0; columnCounter < columns; columnCounter++)
                    {
                        tableBuilder.Append("<td>");
                        tableBuilder.Append(rowCounter.ToString());
                        tableBuilder.Append("</td>");
                    }

                    tableBuilder.Append("</tr>");
                }

                tableBuilder.Append("</table>");

                return new MvcHtmlString(tableBuilder.ToString());
            }
        }
    }

As you can see, we've created a static class and a static method.  The first method parameter is

    this HtmlHelper helper

This is where the magic happens.  It allows us to use the method from the View like all other HTML helpers.  The id and name parameters set the corresponding attributes on the table element, and the remaining parameters specify the number of rows and columns that should be built in the HTML table.

Here is an example of how to use the helper:

    @using HtmlHelpers.Helpers;
    @{
        ViewBag.Title = "Index";
    }

    <h2>Index</h2>

    @Html.Table("myTable", "myTableName", 4, 4)

The output looks like this:

And there you have it, folks.  I hope you enjoyed this tutorial.  If you have any questions, please don't hesitate to leave a comment.  Thanks for reading.




numl - a machine learning library for .NET developers

In one of my previous posts, Machine learning resources for .NET developers, I introduced a machine learning library called numl.net.  numl.net is a machine learning library for .NET created by Seth Juarez.  Both the library and Seth's blog can be found online.  When I began researching the library, I quickly learned that one of Seth's goals in writing numl.net was to abstract away the complexities that stop many software developers from trying their hand at machine learning.  In my opinion, he has done a wonderful job of accomplishing this goal!

Tutorial

I've decided to put together a small tutorial to show you just how easy it is to use numl.net to make predictions.  This tutorial uses supervised learning by way of a decision tree.  I will use the famous Iris data set, which contains measurements for 3 different types of Iris flowers.  Before we get into code, let's look at some basic terminology.

With numl.net you create a POCO (plain old CLR object) to use for training as well as predictions.  Some properties hold known values (features), and these are used to predict the value of an unknown property (the label).  numl.net makes identifying features and labels easy: you simply mark your properties with the [Feature] attribute or the [Label] attribute (there is also a [StringLabel] attribute).  Here is the Iris class that we will use in this tutorial.

    using numl.Model;

    namespace NumlDemo
    {
        /// <summary>
        /// Represents an Iris in the famous Iris classification dataset (Fisher, 1936).
        /// Each feature property will be used for training as well as prediction. The label
        /// property is the value to be predicted. In this case, it's which type of Iris we are dealing with.
        /// </summary>
        public class Iris
        {
            //Length in centimeters
            [Feature]
            public double SepalLength { get; set; }

            //Width in centimeters
            [Feature]
            public double SepalWidth { get; set; }

            //Length in centimeters
            [Feature]
            public double PetalLength { get; set; }

            //Width in centimeters
            [Feature]
            public double PetalWidth { get; set; }

            //-- Iris Setosa
            //-- Iris Versicolour
            //-- Iris Virginica
            public enum IrisTypes
            {
                IrisSetosa,
                IrisVersicolour,
                IrisVirginica
            }

            //This is the label or value that we wish to predict based on the supplied features
            [Label]
            public IrisTypes IrisClass { get; set; }
        }
    }

As you can see, we have a simple POCO Iris class, which defines four features and one label.  The Iris training data (the bezdekIris.data file used later in this tutorial) is available from the UCI Machine Learning Repository.  Here is an example of the data found in the file.

    5.1,3.5,1.4,0.2,Iris-setosa
    6.3,2.5,4.9,1.5,Iris-versicolor
    6.0,3.0,4.8,1.8,Iris-virginica

The first four values are doubles which represent the features Sepal Length, Sepal Width, Petal Length, and Petal Width.  The final value is a string naming the class of Iris.  We map it to the IrisTypes enum, and it is the label that we will predict.

We have the Iris class, so now we need a method to parse the training data file and populate a static List<Iris> collection.
Here is the code:

    using System;
    using System.Collections.Generic;
    using System.IO;

    namespace NumlDemo
    {
        /// <summary>
        /// Provides the services to parse the training data files
        /// </summary>
        public static class IrisDataParserService
        {
            //provides the training data to create the predictive model
            public static List<Iris> TrainingIrisData { get; set; }

            /// <summary>
            /// Reads the trainingDataFile and populates the TrainingIrisData list
            /// </summary>
            /// <param name="trainingDataFile">File full of Iris data</param>
            public static void LoadIrisTrainingData(string trainingDataFile)
            {
                //if we don't have a training data file
                if (string.IsNullOrEmpty(trainingDataFile))
                    throw new ArgumentNullException("trainingDataFile");

                //if the file doesn't exist on the file system
                if (!File.Exists(trainingDataFile))
                    throw new FileNotFoundException("Training data file not found.", trainingDataFile);

                //initialize the training data set on first use
                if (TrainingIrisData == null)
                    TrainingIrisData = new List<Iris>();

                //read the file one line at a time
                using (var fileReader = new StreamReader(new FileStream(trainingDataFile, FileMode.Open)))
                {
                    string fileLineContents;
                    while ((fileLineContents = fileReader.ReadLine()) != null)
                    {
                        //split the current line into an array of values
                        var irisValues = fileLineContents.Split(',');

                        double sepalLength = 0.0;
                        double sepalWidth = 0.0;
                        double petalLength = 0.0;
                        double petalWidth = 0.0;

                        //skip blank or incomplete lines
                        if (irisValues.Length == 5)
                        {
                            Iris currentIris = new Iris();

                            double.TryParse(irisValues[0], out sepalLength);
                            currentIris.SepalLength = sepalLength;

                            double.TryParse(irisValues[1], out sepalWidth);
                            currentIris.SepalWidth = sepalWidth;

                            double.TryParse(irisValues[2], out petalLength);
                            currentIris.PetalLength = petalLength;

                            double.TryParse(irisValues[3], out petalWidth);
                            currentIris.PetalWidth = petalWidth;

                            //any label other than setosa or versicolor is treated as virginica
                            if (irisValues[4] == "Iris-setosa")
                                currentIris.IrisClass = Iris.IrisTypes.IrisSetosa;
                            else if (irisValues[4] == "Iris-versicolor")
                                currentIris.IrisClass = Iris.IrisTypes.IrisVersicolour;
                            else
                                currentIris.IrisClass = Iris.IrisTypes.IrisVirginica;

                            IrisDataParserService.TrainingIrisData.Add(currentIris);
                        }
                    }
                }
            }
        }
    }

This code is pretty standard.  We simply read each line in the file, split the values out into an array, and populate a List<Iris> collection of Iris objects based on the data found in the file.

Now the magic

Using the numl.net library, we need only three classes to perform a prediction based on the Iris data set.  We start with a Descriptor, which identifies the class that we will learn from and predict.  Next, we instantiate a DecisionTreeGenerator, passing the descriptor to the constructor.  Finally, we create our prediction model by calling the Generate method of the DecisionTreeGenerator, passing it the training data (IEnumerable<Iris>).  The Generate method returns a model that we can use to perform our predictions.

Here is the code:

    using numl;
    using numl.Model;
    using numl.Supervised;
    using System;

    namespace NumlDemo
    {
        class Program
        {
            public static void Main(string[] args)
            {
                //get the descriptor that describes the features and label from the Iris training objects
                var irisDescriptor = Descriptor.Create<Iris>();

                //create a decision tree generator and teach it about the Iris descriptor
                var decisionTreeGenerator = new DecisionTreeGenerator(irisDescriptor);

                //load the training data
                IrisDataParserService.LoadIrisTrainingData(@"D:\Development\machinelearning\Iris Dataset\bezdekIris.data");

                //create a model based on our training data using the decision tree generator
                var decisionTreeModel = decisionTreeGenerator.Generate(IrisDataParserService.TrainingIrisData);

                //create an iris that should be an Iris Setosa
                var irisSetosa = new Iris
                {
                    SepalLength = 5.1,
                    SepalWidth = 3.5,
                    PetalLength = 1.4,
                    PetalWidth = 0.2
                };

                //create an iris that should be an Iris Versicolor
                var irisVersiColor = new Iris
                {
                    SepalLength = 6.1,
                    SepalWidth = 2.8,
                    PetalLength = 4.0,
                    PetalWidth = 1.3
                };

                //create an iris that should be an Iris Virginica
                var irisVirginica = new Iris
                {
                    SepalLength = 7.7,
                    SepalWidth = 2.8,
                    PetalLength = 6.7,
                    PetalWidth = 2.0
                };

                //predict the class of each iris; the result carries the predicted label
                var irisSetosaClass = decisionTreeModel.Predict<Iris>(irisSetosa);
                var irisVersiColorClass = decisionTreeModel.Predict<Iris>(irisVersiColor);
                var irisVirginicaClass = decisionTreeModel.Predict<Iris>(irisVirginica);

                Console.WriteLine("The Iris Setosa was predicted as {0}", irisSetosaClass.IrisClass.ToString());
                Console.WriteLine("The Iris Versicolor was predicted as {0}", irisVersiColorClass.IrisClass.ToString());
                Console.WriteLine("The Iris Virginica was predicted as {0}", irisVirginicaClass.IrisClass.ToString());

                Console.ReadKey();
            }
        }
    }

And that's all there is to it.  As you can see, the prediction model is easy to use: there's no math to work through, only simple abstractions.

I hope this has piqued your interest in the numl.net library for machine learning in .NET.  Feel free to post any questions or opinions.

Thanks for reading!

Buddy James
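A note on the parsing code above.  It calls double.TryParse without specifying a culture and ignores the return value, so on a machine whose regional settings use a comma as the decimal separator, a value like 5.1 could fail to parse and quietly end up in the training data as 0.  Here is a minimal sketch of a stricter line parser.  The IrisLineParser class and its TryParseLine method are hypothetical names of my own; they are not part of numl.net.

```csharp
using System;
using System.Globalization;

namespace NumlDemo
{
    // Hypothetical helper for illustration; not part of numl.net or the parser above.
    public static class IrisLineParser
    {
        // Parses one line such as "5.1,3.5,1.4,0.2,Iris-setosa".
        // Returns false for malformed lines instead of silently storing zeros.
        public static bool TryParseLine(string line, out double[] features, out string label)
        {
            features = new double[4];
            label = null;

            if (string.IsNullOrWhiteSpace(line))
                return false;

            var parts = line.Split(',');
            if (parts.Length != 5)
                return false;

            for (int i = 0; i < features.Length; i++)
            {
                // InvariantCulture always treats '.' as the decimal separator,
                // regardless of the regional settings of the machine.
                if (!double.TryParse(parts[i], NumberStyles.Float,
                                     CultureInfo.InvariantCulture, out features[i]))
                    return false;
            }

            label = parts[4].Trim();
            return label.Length > 0;
        }
    }

    public static class IrisLineParserDemo
    {
        public static void Main()
        {
            double[] features;
            string label;

            bool ok = IrisLineParser.TryParseLine("5.1,3.5,1.4,0.2,Iris-setosa", out features, out label);
            Console.WriteLine("Parsed: {0}, label: {1}", ok, label);

            bool bad = IrisLineParser.TryParseLine("5.1,abc,1.4,0.2,Iris-setosa", out features, out label);
            Console.WriteLine("Malformed line accepted: {0}", bad);
        }
    }
}
```

If you drop this into the tutorial project, remove the IrisLineParserDemo class, since the project already has a Main method.  The loading loop could then skip any line for which TryParseLine returns false.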


Complete coverage of your source code with NDepend part 1

What is NDepend?

This article is part one of a two-part series about one of the most practical and dynamic tools in existence for .NET development.  I’m talking about NDepend (http://www.NDepend.com).  I was approached about writing a review of NDepend, so I downloaded the application to give it a try.  As with all of my reviews, let it be known that if I think a product is mediocre, then that’s what I’m going to write.  So what follows is no exaggeration: I really do feel this strongly about this tool.  I’m sure that by the end of this article, I will have piqued your interest too.  If you are interested, please read on.

NDepend pro product suite

From NDepend.com, “NDepend is a Visual Studio tool to manage complex .NET code and achieve high Code Quality.”  This tool allows you to visualize your source code in many different ways so that you can analyze the quality of your code and work out how to improve it.  The product comes complete with a Visual Studio add-in, a standalone GUI tool, and a set of console-based power tools, which makes the product suite extremely versatile.  Whether you are pressed for time and need to analyze your code from within Visual Studio, you prefer a standalone GUI, or you are addicted to the command line, this product is made to fit your needs.

Installation

The NDepend installation process is very straightforward.  The download is a zip file that contains the complete product suite.  You simply pick a folder to install to and unzip the archive.  If you’ve purchased the pro version, you will be provided with a license in the form of an XML file, which needs to be placed in the directory where you installed the product.

Installing the Visual Studio 2012 add-in

Once you’ve unzipped the archive, you need to run the NDepend.Install.VisualStudioAddin.exe executable to install the Visual Studio add-in.
Running the install

The installation completed

Adding an NDepend project to your solution

When you use the Visual Studio integration, you need to create an NDepend project in the solution that you wish to analyze.

NDepend will tell you anything that you wish to know about your source code.  This is powerful; however, it comes with a caveat.  In order to be productive with NDepend, you must first define what information you wish to discover about your source code and how you plan to use that information.  If you haven’t decided this, you will not get much use from the product.  The information that it provides is very useful, but you must take some time to plan out how you will use it to benefit you and your coding efforts.

You may wish to make sure that your code maintains a consistent amount of test coverage.  Perhaps you wish to make sure that all methods in your codebase stay below a certain threshold for the number of lines of code that they contain.  NDepend is capable of telling you this and much more about your source code.

One of the coolest features that I’ve seen in the product is Code Query LINQ (CQLinq).  This allows you to query your source code using LINQ syntax to bring back anything that you wish to know about it.  You can query an assembly, a class, even a method.  The product comes with predefined CQLinq rules but also allows you to create your own rules as well as edit existing rules.

I plan to write another blog post that explains my personal experience with the product.  I’ve recently joined an open source project, a framework that handles some very advanced topics such as artificial intelligence, machine learning, and language design.  The project is called neural network designer (http://bragisoft.com/).  I chose this project because the source code is vast, and I believe that a large code base is a perfect target for getting the most benefit out of NDepend.
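To give a feel for CQLinq before the next post, here is a minimal sketch of a rule for the “methods should stay below a certain number of lines” example mentioned earlier.  The threshold of 30 is an arbitrary value that I picked for illustration, and the rule name is my own.  It is a query that runs inside NDepend’s query editor, not a standalone C# program.

```csharp
// <Name>Methods too big (illustrative threshold)</Name>
warnif count > 0
from m in JustMyCode.Methods
where m.NbLinesOfCode > 30          // arbitrary limit chosen for this sketch
orderby m.NbLinesOfCode descending
select new { m, m.NbLinesOfCode }
```

The warnif count > 0 prefix turns the query into a rule: NDepend reports a warning whenever at least one method matches.  JustMyCode limits the search to hand-written code, so generated code is not flagged.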
I plan to use the product and answer the following questions:

  What information do I want to know about my code base?
  When do I wish to be presented with this information?
  How do I plan on using this information to improve my code?
  How can I use NDepend to provide this information?

I think that if you wish to get any use out of the product, it will be very important that you answer these questions.  The product is vast and diverse, but it can also be a bit intimidating.  With that said, I plan to use my next post to illustrate how I used NDepend to define the metrics that I needed from my code, and how I used it to provide those metrics to me.

Stay tuned for the next installment, which will explain my experience with using NDepend to improve my development efforts and my source code.

Thanks for reading,

Buddy James


About the author

My name is Buddy James.  I'm a Microsoft Certified Solutions Developer from the Nashville, TN area.  I'm a Software Engineer, an author, a blogger (http://www.refactorthis.net), a mentor, a thought leader, a technologist, a data scientist, and a husband.  I enjoy working with design patterns, data mining, C#, WPF, Silverlight, WinRT, XAML, ASP.NET, Python, CouchDB, RavenDB, Hadoop, Android (MonoDroid), iOS (MonoTouch), and Machine Learning.  I love technology and I love to develop software, collect data, analyze the data, and learn from the data.  When I'm not coding, I'm determined to make a difference in the world by using data and machine learning techniques.  Follow me at @budbjames.
