
numl - a machine learning library for .NET developers

In one of my previous posts, Machine learning resources for .NET developers, I introduced a machine learning library called numl.net. numl.net is a machine learning library for .NET created by Seth Juarez. You can find the library here and Seth's blog here. When I began researching the library, I quickly learned that one of Seth's goals in writing numl.net was to abstract away the complexities that stop many software developers from trying their hand at machine learning. In my opinion, he has done a wonderful job of accomplishing this goal!

Tutorial

I've decided to throw together a small tutorial to show you just how easy it is to use numl.net to perform predictions. This tutorial will use supervised learning by way of a decision tree to perform predictions. I will use the famous Iris data set, which contains measurements for 3 different types of Iris flowers. Before we get into code, let's look at some basic terminology.

With numl.net you create a POCO (plain old CLR object) to use for training as well as predictions. Some properties hold known values (features), and you use them to predict the value of an unknown property (the label). numl.net makes identifying features and labels easy: you simply mark your properties with the [Feature] attribute or the [Label] attribute (there is also a [StringLabel] attribute). Here is the Iris class that we will use in this tutorial.

    using numl.Model;

    namespace NumlDemo
    {
        /// <summary>
        /// Represents an Iris in the famous Iris classification dataset (Fisher, 1936).
        /// Each feature property will be used for training as well as prediction. The label
        /// property is the value to be predicted. In this case, it's which type of Iris we are dealing with.
        /// </summary>
        public class Iris
        {
            //Length in centimeters
            [Feature]
            public double SepalLength { get; set; }

            //Width in centimeters
            [Feature]
            public double SepalWidth { get; set; }

            //Length in centimeters
            [Feature]
            public double PetalLength { get; set; }

            //Width in centimeters
            [Feature]
            public double PetalWidth { get; set; }

            //-- Iris Setosa
            //-- Iris Versicolour
            //-- Iris Virginica
            public enum IrisTypes
            {
                IrisSetosa,
                IrisVersicolour,
                IrisVirginica
            }

            //This is the label or value that we wish to predict based on the supplied features
            [Label]
            public IrisTypes IrisClass { get; set; }
        }
    }

As you can see, we have a simple POCO Iris class, which defines four features and one label. The Iris training data can be found here. Here is an example of the data found in the file.

    5.1,3.5,1.4,0.2,Iris-setosa
    6.3,2.5,4.9,1.5,Iris-versicolor
    6.0,3.0,4.8,1.8,Iris-virginica

The first four values are doubles which represent the features Sepal Length, Sepal Width, Petal Length, and Petal Width. The final value is the class name of the Iris, which we map to the enum label that we will predict.

We have the Iris class, so now we need a method to parse the training data file and generate a static List<Iris> collection.
Here is the code:

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.IO;

    namespace NumlDemo
    {
        /// <summary>
        /// Provides the services to parse the training data files
        /// </summary>
        public static class IrisDataParserService
        {
            //provides the training data to create the predictive model
            public static List<Iris> TrainingIrisData { get; set; }

            /// <summary>
            /// Reads the trainingDataFile and populates the TrainingIrisData list
            /// </summary>
            /// <param name="trainingDataFile">File full of Iris data</param>
            public static void LoadIrisTrainingData(string trainingDataFile)
            {
                //if we don't have a training data file
                if (string.IsNullOrEmpty(trainingDataFile))
                    throw new ArgumentNullException("trainingDataFile");

                //if the file doesn't exist on the file system
                if (!File.Exists(trainingDataFile))
                    throw new FileNotFoundException("Training data file not found.", trainingDataFile);

                //initialize the training data set
                if (TrainingIrisData == null)
                    TrainingIrisData = new List<Iris>();

                //read the file one line at a time
                using (var fileReader = new StreamReader(new FileStream(trainingDataFile, FileMode.Open)))
                {
                    string fileLineContents;
                    while ((fileLineContents = fileReader.ReadLine()) != null)
                    {
                        //split the current line into an array of values
                        var irisValues = fileLineContents.Split(',');

                        double sepalLength = 0.0;
                        double sepalWidth = 0.0;
                        double petalLength = 0.0;
                        double petalWidth = 0.0;

                        //skip blank or malformed lines
                        if (irisValues.Length == 5)
                        {
                            Iris currentIris = new Iris();

                            //parse with the invariant culture so that "5.1" is read correctly
                            //regardless of the machine's regional settings
                            double.TryParse(irisValues[0], NumberStyles.Float, CultureInfo.InvariantCulture, out sepalLength);
                            currentIris.SepalLength = sepalLength;

                            double.TryParse(irisValues[1], NumberStyles.Float, CultureInfo.InvariantCulture, out sepalWidth);
                            currentIris.SepalWidth = sepalWidth;

                            double.TryParse(irisValues[2], NumberStyles.Float, CultureInfo.InvariantCulture, out petalLength);
                            currentIris.PetalLength = petalLength;

                            double.TryParse(irisValues[3], NumberStyles.Float, CultureInfo.InvariantCulture, out petalWidth);
                            currentIris.PetalWidth = petalWidth;

                            //map the class name in the file to the enum label
                            if (irisValues[4] == "Iris-setosa")
                                currentIris.IrisClass = Iris.IrisTypes.IrisSetosa;
                            else if (irisValues[4] == "Iris-versicolor")
                                currentIris.IrisClass = Iris.IrisTypes.IrisVersicolour;
                            else
                                currentIris.IrisClass = Iris.IrisTypes.IrisVirginica;

                            IrisDataParserService.TrainingIrisData.Add(currentIris);
                        }
                    }
                }
            }
        }
    }

This code is pretty standard. We simply read each line in the file, split the values out into an array, and populate a List<Iris> collection of Iris objects based on the data found in the file.

Now the magic

Using the numl.net library, we need only three classes to perform a prediction based on the Iris data set. We start with a Descriptor, which describes the class that we will learn from and predict. Next, we instantiate a DecisionTreeGenerator, passing the descriptor to the constructor. Finally, we create our prediction model by calling the Generate method of the DecisionTreeGenerator, passing it the training data (IEnumerable<Iris>). The Generate method gives us a model that we can use to perform our predictions. Here is the code:

    using numl;
    using numl.Model;
    using numl.Supervised;
    using System;

    namespace NumlDemo
    {
        class Program
        {
            public static void Main(string[] args)
            {
                //get the descriptor that describes the features and label from the Iris training objects
                var irisDescriptor = Descriptor.Create<Iris>();

                //create a decision tree generator and teach it about the Iris descriptor
                var decisionTreeGenerator = new DecisionTreeGenerator(irisDescriptor);

                //load the training data
                IrisDataParserService.LoadIrisTrainingData(@"D:\Development\machinelearning\Iris Dataset\bezdekIris.data");

                //create a model based on our training data using the decision tree generator
                var decisionTreeModel = decisionTreeGenerator.Generate(IrisDataParserService.TrainingIrisData);

                //create an iris that should be an Iris Setosa
                var irisSetosa = new Iris
                {
                    SepalLength = 5.1,
                    SepalWidth = 3.5,
                    PetalLength = 1.4,
                    PetalWidth = 0.2
                };

                //create an iris that should be an Iris Versicolor
                var irisVersiColor = new Iris
                {
                    SepalLength = 6.1,
                    SepalWidth = 2.8,
                    PetalLength = 4.0,
                    PetalWidth = 1.3
                };

                //create an iris that should be an Iris Virginica
                var irisVirginica = new Iris
                {
                    SepalLength = 7.7,
                    SepalWidth = 2.8,
                    PetalLength = 6.7,
                    PetalWidth = 2.0
                };

                //predict the label of each iris from its features
                var irisSetosaClass = decisionTreeModel.Predict<Iris>(irisSetosa);
                var irisVersiColorClass = decisionTreeModel.Predict<Iris>(irisVersiColor);
                var irisVirginicaClass = decisionTreeModel.Predict<Iris>(irisVirginica);

                Console.WriteLine("The Iris Setosa was predicted as {0}", irisSetosaClass.IrisClass.ToString());
                Console.WriteLine("The Iris Versicolor was predicted as {0}", irisVersiColorClass.IrisClass.ToString());
                Console.WriteLine("The Iris Virginica was predicted as {0}", irisVirginicaClass.IrisClass.ToString());

                Console.ReadKey();
            }
        }
    }

And that's all there is to it. As you can see, you can build and use a prediction model without writing any math, only simple abstractions. I hope this has piqued your interest in the numl.net library for machine learning in .NET.

Feel free to post any questions or opinions.

Thanks for reading!

Buddy James
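A postscript for readers wondering what a decision tree model amounts to once it has been trained: conceptually it is a chain of threshold tests on the features. The sketch below is a hand-written stand-in, not numl.net code and not the tree that numl.net actually learns. The names IrisKind and HandWrittenIrisTree are hypothetical, and the two thresholds are my own illustrative picks that only roughly separate the three classes.

```csharp
using System;

// Hypothetical label type for this sketch (not the tutorial's Iris.IrisTypes enum).
public enum IrisKind { Setosa, Versicolor, Virginica }

// A hand-written two-level decision tree. A trained tree has the same shape,
// but the library chooses the features and thresholds from the training data.
public static class HandWrittenIrisTree
{
    // Assumption: thresholds picked by hand for illustration only.
    private const double PetalLengthSplit = 2.5;  // centimeters
    private const double PetalWidthSplit = 1.75;  // centimeters

    public static IrisKind Classify(double petalLength, double petalWidth)
    {
        // First split: short petals are treated as Setosa.
        if (petalLength < PetalLengthSplit)
            return IrisKind.Setosa;

        // Second split: among the rest, narrower petals are treated as Versicolor.
        return petalWidth < PetalWidthSplit
            ? IrisKind.Versicolor
            : IrisKind.Virginica;
    }
}
```

With the three sample flowers from the tutorial, HandWrittenIrisTree.Classify(1.4, 0.2) gives Setosa, Classify(4.0, 1.3) gives Versicolor, and Classify(6.7, 2.0) gives Virginica. The point of numl.net is that you never write these rules yourself: the generator works them out from the training data.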


Assistant Professor receives $518,434 to apply Machine Learning to network analysis

Assistant Professor Maxim Raginsky of the University of Illinois at Urbana-Champaign College of Engineering has received $518,434 to apply Machine Learning techniques to network analysis, in an effort to discover how to make networks more efficient.

From the article http://csl.illinois.edu/news/raginsky-receives-career-award-apply-information-theory-machine-learning-problems

“The overall design objective is to make sure that the network resources are allocated in a smart way, and each user receives only the data they need without significant waste of bandwidth or power,” said Raginsky, a member of Illinois' electrical and computer engineering faculty.

Raginsky uses ecological monitoring as an example. If someone is tracking a rare bird species in a specific habitat and wants to record how many of these birds fly in and out of the area, it would be a waste of resources to continuously stream video if what the person really wants is just the arrivals and departures of the birds. A big part of the problem is learning to detect events of interest and to reliably communicate only the data describing these events.

“So I want to make sure that only the relevant information gets to those who need it, despite the fact that everyone is using the same network and the kinds of information that are relevant to one user are different than the kinds of information that are relevant to somebody else,” Raginsky said.

These problems are messy and complex, and there is no hope to come up with an accurate model for all kinds of data being transmitted and received over networks because of the increasing size and complexity of both the networks and the data, Raginsky said. Machine learning offers a variety of tools for extracting predictively relevant information from observations, but to date most of the research on machine learning has not focused on the network aspect and all the resource constraints that it imposes.

This project will systematically explore what is and is not possible in these types of large networks with multiple learning agents, specifically identifying the effect of bandwidth limitations, losses, delays and lack of central coordination on the performance of statistical learning algorithms, thus helping develop efficient and robust coding/decoding schemes.

The NSF CAREER Award is awarded by the National Science Foundation specifically to “junior faculty members who demonstrate their roles through outstanding research and education,” according to NSF’s website. Raginsky said that because these awards are for 5-year projects, the proposals take a lot of time and effort.

“You propose to research something you’re really passionate about, and presumably you want to work on this topic even if it did not get funded,” Raginsky said. “So, when I heard about my proposal being recommended for funding, of course it was a relief. I will have a good time working on this problem.”

Raginsky is a member of the Decision and Control group at CSL.

I think that this is a wonderful problem domain in which Machine learning can prove useful. Machine learning is a powerful set of technologies, and we have yet to even scratch the surface of what it can do for humankind. This goes to show you that there are other great uses besides targeted advertising systems, though that is where most of the jobs are at the moment. Do you have any ideas for practical applications of Machine learning that have yet to be tested? Please share by leaving a comment.


Machine learning: bitly can do a lot more for you than shrink your URLs

bitly's contributions to BigData and Machine learning

Greetings to all of my fellow technologists. I wanted to write an article to let you know about some very interesting resources that bitly has made available to developers and data lovers alike. Just in case you've been living under a rock, bitly provides a URL shortening service. What you may not know is that they offer much more than that.

Popular links can tell us a lot about the world

If you have been working with machine learning and big data, I'm sure you know that access to data is extremely useful, and that one of the best places to find useful data about what's happening on the Internet and in the world is the "Social web". Everyone uses Facebook, Twitter, and Google+. A lot of people use services like these to share with the world the subjects that they find important. That makes these networks extremely valuable for understanding what people are looking at on the Internet, and thus what people care about throughout the world. The same people who use these social networking tools to communicate with the world also use the bitly service to share the URLs of the pages that they find interesting or important, so bitly's data could tell us a lot about what's going on in the world, and in real time.

Hilary Mason, remember the name

A new hero of mine as an activist for machine learning and BigData is none other than Hilary Mason. On her blog at http://www.hilarymason.com Hilary Mason states "I'm the Chief Scientist at bitly, co-organizer of DataGotham, Co-Founder of HackNY, member of NYC Resistor". Her blog post titled "Bitly Social Data APIs" describes some really interesting services that bitly offers to the public for obtaining useful information about links shared through bitly. These services let you:

- see where in the world people are consuming a particular bitly link,
- see what the world is paying attention to right now (called "bursting phrases"),
- and, my personal favorite, use a real-time search engine that returns results ranked by what people are clicking on the day of the search.

Hilary Mason and bitly have provided some great tools for developers who are interested in machine learning and data science. I urge you to check out some of her talks on http://www.youtube.com.

Links

http://www.hilarymason.com is Hilary Mason's blog
http://www.hilarymason.com/blog/bitly-social-data-apis/ is the blog post on bitly's social data APIs
YouTube videos: Hilary Mason on machine learning

That about does it for this article. Please leave a comment and tell us about someone that you admire in the field of machine learning and BigData.

Thanks for reading.

Buddy James


Machine learning resources for .NET developers

Greetings friends and welcome to this article on Machine learning libraries for .NET developers. Machine learning is a hot topic right now and for good reason. Personally, I haven't been so excited about a technology since my computer used my 2800 baud modem to dial into a BBS over 17 years ago. The thought that my computer could communicate with another computer was so fascinating to me. That moment was the very moment that would forever change my life. I learned a lot about DOS by writing batch scripts and running other programs that allowed me to visit and then run a BBS system. It eventually led me to QBasic. I wanted to learn to write BBS door games, and QBasic was included as a part of a standard DOS installation back then.

Fast forward 17 years and I'm still in love with computers, programming, and the concept of communication between machines. The magic never disappeared. So when I first learned about the concept of Machine learning, I felt like that 13 year old kid again. The idea that a machine can learn to do things that it has not been programmed to do is now a passion of mine. Machine learning has a steep learning curve; however, I believe that we as humans can do anything that we put our minds to. So I began looking around for tutorials on machine learning. I found many great tutorials and books; however, most of them involved using python. I have nothing against python. As a matter of fact, I find it ironic that I started with BASIC and now, in this moment of "rebirth", I'm beginning to use python, which looks a lot like BASIC in many ways. The fact of the matter remains: I'm a .NET developer. I've spent the last 9 years in the .NET framework and I love the technology. C# is an awesome programming language and it's hard to imagine life without Visual Studio. What can I say, the IDE has spoiled me.

While I scoured the internet looking for Machine learning resources for .NET developers, I wished that there was a single resource that would assist me in my search. Well, that's what this article is all about. In this article, I will introduce you to some .NET libraries that will assist you in your quest to learn about Machine learning.

NND Neural Network Designer by Bragisoft

The Neural Network Designer project (NND) is a DBMS for neural networks that was created by Jan Bogaerts. The designer application is developed using WPF. Its user interface allows you to design your neural network, query the network, and create and configure chat bots that are capable of asking questions and learning from your feedback. The chat bots can even scrape the internet for information to return in their output as well as to use for learning. The project includes a custom language called NNL (neural network language) that you can use in configuring your machine learning project. The source code is designed so that the libraries can be used in your own custom applications, so you don't have to start from scratch with such a complex set of technologies. It is an open source project that I am a part of. Some of the possibilities offered by this awesome project include predictions, image and pattern recognition, value inspection, memory profiling and much more. Stop by the Bragisoft NND website and download the application to give it a try.

Screen shots of the neural network designer by Bragisoft:
- A DBMS for neural networks
- Mind map rand forrest
- The chat bot designer and other tools

Accord.NET

Here is a description from the Accord.NET project website:

Accord.NET is a framework for scientific computing in .NET. The framework builds upon AForge.NET, an also popular framework for image processing, supplying new tools and libraries. Those libraries encompass a wide range of scientific computing applications, such as statistical data processing, machine learning, pattern recognition, including but not limited to, computer vision and computer audition. The framework offers a large number of probability distributions, hypothesis tests, kernel functions and support for most popular performance measurements techniques.

The most impressive parts of this library have got to be the documentation and the sample applications that are distributed with the project. They make it easy to get started with the library. I also like the ability to perform operations like audio processing (beat detection and more) and video processing (easy integration with your web cam, vision capabilities and object recognition). This is an excellent place to start when approaching Machine learning with the .NET framework. Here are two videos that should whet your appetite.

- Handwriting recognition with Accord.NET
- Here is an example of head tracking with Accord.NET (super cool)

AIMLBot (Program#) AIML chat bot library

AIMLBot (Program#) is a small, fast, standards-compliant yet easily customizable implementation of an AIML (Artificial Intelligence Markup Language) based chatter bot in C#. AIMLBot has been tested on both Microsoft's runtime environment and Mono. Put simply, it will allow you to chat (by entering text) with your computer using natural language. The project is located here.

Math.NET

Machine learning algorithms are extremely math heavy. Math.NET is a library that can assist with the math that is required to solve machine learning related problems.

Math.NET Numerics aims to provide methods and algorithms for numerical computations in science, engineering and every day use. Covered topics include special functions, linear algebra, probability models, random numbers, interpolation, integral transforms and more.

DotNumerics

DotNumerics is a website dedicated to numerical computing for .NET. DotNumerics includes a Numerical Library for .NET. The library is written in pure C# and has more than 100,000 lines of code with the most advanced algorithms for Linear Algebra, Differential Equations and Optimization problems. The Linear Algebra library includes CSLapack, CSBlas and CSEispack; these libraries are the translation from Fortran to C# of LAPACK, BLAS and EISPACK, respectively. You can find the library here.

ALGLIB

ALGLIB is a cross-platform numerical analysis and data processing library. It supports several programming languages (C++, C#, Pascal, VBA) and several operating systems (Windows, Linux, Solaris). ALGLIB features include:

Accessing 'R' from C#: Lessons learned

Here are instructions to use the R statistical framework from within C#.

ILNumerics

You can check out the library at http://www.ilnumerics.net

NuML.net

http://numl.net A nice site about the basics of machine learning in C# by Seth Juarez.

NuML.NET is a machine learning library for .NET developers written by Seth Juarez. I've recently tried this library and I'm impressed! Seth has stated publicly that his intention behind the numl.net library is to abstract the scary math away from machine learning and provide tools that are more approachable for software developers, and boy did he deliver! I've been working with this library for a little more than an hour and I've already written a prediction app in C#. You can find his numl.net library source on github.

Encog Machine Learning Framework

Here is what the official Heaton Research website has to say about Encog:

Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models and Genetic Algorithms are supported. Most Encog training algorithms are multi-threaded and scale well to multicore hardware. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train machine learning algorithms. Encog has been in active development since 2008. Encog is available for Java, .Net and C/C++.

Jeff Heaton knows a great deal about machine learning algorithms and he's created a wonderful library called Encog. I was able to write a neural network application that solved the classic XOR problem within 20 minutes of installing the library. What really amazes me is that he has an Encog library for JavaScript, with live samples on his website of JavaScript + Encog solving problems like the Traveling Salesman Problem and Conway's Game of Life, all in a browser! This library can even use your GPU for the heavy lifting if that's your choice. I would highly recommend that you at least check out his site and download the library to look at the examples. You can find the Encog library here.

Conclusion

This concludes my article on Machine learning resources for the .NET developer. If you have any suggestions regarding a project related to Machine learning in .NET that you know of or are working on, please don't hesitate to leave a comment and I will update the article to mention the project. This article has shown that we as .NET developers have many resources available to us for implementing Machine learning based solutions. I appreciate your time in reading this article and I hope you found it useful. Please subscribe to my RSS feed.

Until next time.

Buddy James
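A postscript on the XOR problem mentioned in the Encog section, for anyone who hasn't run into it: XOR is the classic example of a function that a single neuron cannot represent but a network with one hidden layer can. The sketch below does not use Encog and involves no training at all. The class name XorNetworkSketch is hypothetical, and the weights are set by hand purely to show the shape of such a network. A library like Encog would learn comparable weights from examples.

```csharp
using System;

// A tiny 2-2-1 network with hand-set weights (assumption: chosen by hand, not learned).
public static class XorNetworkSketch
{
    // Step activation: the unit fires (1) when its weighted input is positive.
    private static int Step(double weightedInput)
    {
        return weightedInput > 0 ? 1 : 0;
    }

    public static int Predict(int a, int b)
    {
        // Hidden unit 1 behaves like OR: fires when at least one input is 1.
        int orUnit = Step(a + b - 0.5);

        // Hidden unit 2 behaves like AND: fires only when both inputs are 1.
        int andUnit = Step(a + b - 1.5);

        // Output unit fires when OR is on and AND is off, which is XOR.
        return Step(orUnit - andUnit - 0.5);
    }
}
```

XorNetworkSketch.Predict returns 0 for the inputs (0, 0) and (1, 1), and 1 for (0, 1) and (1, 0). The hidden layer is what makes this possible: it recodes the inputs as "OR" and "AND", and the output unit only has to combine those two.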


About the author

My name is Buddy James.  I'm a Microsoft Certified Solutions Developer from the Nashville, TN area.  I'm a Software Engineer, an author, a blogger (http://www.refactorthis.net), a mentor, a thought leader, a technologist, a data scientist, and a husband.  I enjoy working with design patterns, data mining, c#, WPF, Silverlight, WinRT, XAML, ASP.NET, python, CouchDB, RavenDB, Hadoop, Android(MonoDroid), iOS (MonoTouch), and Machine Learning. I love technology and I love to develop software, collect data, analyze the data, and learn from the data.  When I'm not coding,  I'm determined to make a difference in the world by using data and machine learning techniques. (follow me at @budbjames).  
