It's with great pleasure that I let you know about my ebook, WPF Succinctly, published by the great team over at Syncfusion. This book is part of their Succinctly series, which offers a wide variety of free ebooks on the hottest software development topics. I'm proud of the book and especially of the final product. I tried to start with the basics (controls, XAML basics) and then work my way up to more advanced topics like MVVM and Commands. The book is full of examples for you to follow along with.
I'm currently working on a follow-up to WPF Succinctly called "Orubase Succinctly", which is a book on Syncfusion's mobile hybrid platform. You can learn more about the product on the official Orubase page.
Don't forget to check out the other FREE books in the Succinctly series here. Also, I'd like to recommend that you check out the trial versions of Syncfusion's Essential Studio suite of controls. The suite includes top-notch controls for ASP.NET, ASP.NET MVC, WPF, Silverlight, Windows Phone, WinRT, WinForms and more.
I would love any feedback regarding the book. And please be on the lookout for Orubase Succinctly, which I should be wrapping up soon.
Until next time,
Greetings! And welcome to another wam bam, thank you ma'am, mind blowing, flex showing, machine learning tutorial here at refactorthis.net!
This tutorial is based on a machine learning toolkit called RapidMiner by Rapid-I. RapidMiner is a full-featured, Java-based, open source machine learning toolkit with support for all of the popular machine learning algorithms used in data analytics today. The library supports the following machine learning algorithms (to name a few):
Naive Bayes (kernel)
Decision Tree (Weight-based, Multiway)
Vector Linear Regression
Support Vector Machine (Linear, Evolutionary, PSO)
k-Means (kernel, fast)
And much much more!!
Excited yet? I thought so!
How to create a decision tree using RapidMiner
When I first ran across screenshots of RapidMiner online, I thought to myself, "Oh boy.. I wonder how much this is going to cost...". The UI looked so amazing. It's like Visual Studio for data mining and machine learning! Much to my surprise, I found out that the application is open source and free!
Here is a quote from the RapidMiner site:
RapidMiner is unquestionably the world-leading open-source system for data mining. It is available as a stand-alone application for data analysis and as a data mining engine for the integration into own products. Thousands of applications of RapidMiner in more than 40 countries give their users a competitive edge.
I've been trying some machine learning "challenges" recently to sharpen my skills as a data scientist, and I decided to use RapidMiner to tackle the kaggle.com machine learning challenge called "Titanic: Machine Learning from Disaster". The data set is a CSV file that contains information on many of the passengers of the infamous Titanic voyage. The goal of the challenge is to take one CSV file containing training data (all of the attributes as well as the label Survived) and a testing data file containing only the attributes (no Survived label), and to predict the Survived label of the testing set based on the training set.
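To make the two-file setup concrete, here is a minimal Python sketch of separating the Survived label from the attributes. The rows are tiny made-up stand-ins, not the real challenge data, and the helper names (load_rows, split_label) are my own for illustration.

```python
import csv
import io

# Made-up stand-ins for the two CSV files; the real files have more columns and rows.
TRAIN_CSV = """PassengerId,Survived,Sex,Age
1,0,male,22
2,1,female,38
3,1,female,26
"""
TEST_CSV = """PassengerId,Sex,Age
4,male,35
5,female,27
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def split_label(rows, label="Survived"):
    """Separate the label column from the attribute columns."""
    attributes = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return attributes, labels

train_attributes, train_labels = split_label(load_rows(TRAIN_CSV))
test_attributes = load_rows(TEST_CSV)

print(train_labels)        # labels are known for the training rows
print(test_attributes[0])  # testing rows carry attributes only
```

The model learns from train_attributes and train_labels, and the challenge is to fill in the missing label for every row in test_attributes.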
Warning: Although I'm not going to provide the complete solution to this challenge, I warn you, if you are working on this challenge, then you should probably stop reading this tutorial. I do provide some insights into the survival data found in the training data set. It's best to try to work the challenge out on your own. After all, we learn by TRYING, FAILING, TRYING AGAIN, THEN SUCCEEDING. I'd also like to say that I'm going to do my very best to go easy on the THEORY of this post.. I know that some of my readers like to get straight to the action :) You have been warned..
Why a decision tree?
A decision tree model is a great way to visualize a data set to determine which attributes of a data set influenced a particular classification (label). A decision tree looks like a tree with branches, flipped upside down.. Perhaps a (cheesy) image will illustrate..
After you are finished laughing at my drawing, we may proceed....... OK
In my example, imagine that we have a data set that relates lifestyle to heart disease. Each row has a person, their sex, age, Smoker (y/n), Diet (good/poor), and a label Risk (Less Risk/More Risk). The data indicates that the biggest influence on Risk turns out to be the Smoker attribute, so Smoker becomes the first branch in our tree. For smokers, the next most influential attribute happens to be Age; for non-smokers, however, the data indicates that Diet has a bigger influence on the risk. The tree will keep branching into nodes until a classification is reached or the maximum "depth" that we establish is reached. So as you can see, a decision tree can be a great way to visualize how a decision is derived based on the attributes in your data.
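If it helps, a finished decision tree is really just a set of nested if statements. Here is a rough Python sketch of the tree from my example. The age threshold of 50 and the outcome at each leaf are invented for illustration, since they would normally come from the data.

```python
def classify_risk(smoker, age, diet, age_threshold=50):
    """Walk a hand-written tree: Smoker first, then Age (smokers) or Diet (non-smokers)."""
    if smoker:
        # For smokers, Age is the next most influential attribute.
        # The threshold is an assumption, not something learned from data.
        return "More Risk" if age >= age_threshold else "Less Risk"
    # For non-smokers, Diet has the bigger influence.
    return "More Risk" if diet == "poor" else "Less Risk"

print(classify_risk(smoker=True, age=60, diet="good"))
print(classify_risk(smoker=False, age=60, diet="good"))
```

A tool like RapidMiner works out the branches and thresholds from the training data, so you don't have to write them by hand.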
RapidMiner and data modeling
Ready to see how easy it is to create a prediction model using RapidMiner? I thought so!
Create a new process
When you are working in RapidMiner, your project is known as a process. So we will start by running RapidMiner and creating a new process.
The version of RapidMiner used in this tutorial is version 5.3. Once the application is open, you will be presented with the following start screen.
From this screen you will click on New Process
You are presented with the main user interface for RapidMiner. One of the most compelling aspects of RapidMiner is its ease of use and intuitive user interface. The basic flow of this process is as follows:
Import your test and training data from CSV files into your RapidMiner repository. This can be found in the repository menu under Import CSV file
Once your data has been imported into your repository, the datasets can be dragged onto your process surface for you to apply operators
You will add your training data to the process
Next, you will add your testing data to the process
Search the operators for Decision Tree and add the operator
In order to use your training data to generate a prediction on your testing data using the Decision Tree model, we will add an "Apply Model" operator to the process. This operator has an input that you will associate with the output model of your Decision Tree operator. There is also an input that takes "unlearned" data from the output of your testing dataset.
You will attach the outputs of Apply Model to the results connectors on the right side of the process surface.
Once you have designed your model, RapidMiner will show you any problems with your process and will offer "Quick fixes", if they exist, that you can double click to resolve.
Once all problems have been resolved, you can run your process and you will see the results that you wired up to the results side of the process surface.
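For readers who think better in code, the flow above (learn a model from labeled training data, then apply it to unlabeled testing data) can be sketched in plain Python. This is not RapidMiner's algorithm or API. It is a one-level tree (often called a decision stump) that I wrote for illustration, and the rows reuse the made-up Smoker/Diet example from earlier.

```python
from collections import Counter, defaultdict

def learn_stump(rows, labels):
    """Learn a one-level tree: pick the attribute whose values best predict the label."""
    best = None
    for attr in rows[0]:
        # Count the labels seen for each value of this attribute.
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attr]][label] += 1
        # Predict the majority label per value and count correctly classified rows.
        rules = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        correct = sum(counts[rules[value]] for value, counts in by_value.items())
        if best is None or correct > best[0]:
            best = (correct, attr, rules)
    default = Counter(labels).most_common(1)[0][0]
    return {"attribute": best[1], "rules": best[2], "default": default}

def apply_model(model, rows):
    """Label unseen rows with the learned rules, falling back to the majority label."""
    return [model["rules"].get(row[model["attribute"]], model["default"]) for row in rows]

# Made-up training data (attributes plus label) and testing data (attributes only).
training = [
    {"Smoker": "y", "Diet": "good"},
    {"Smoker": "y", "Diet": "poor"},
    {"Smoker": "n", "Diet": "good"},
    {"Smoker": "n", "Diet": "poor"},
]
risk = ["More Risk", "More Risk", "Less Risk", "Less Risk"]
testing = [{"Smoker": "n", "Diet": "poor"}, {"Smoker": "y", "Diet": "good"}]

model = learn_stump(training, risk)   # the Decision Tree operator's role
print(model["attribute"])
print(apply_model(model, testing))    # the Apply Model operator's role
```

The two function calls at the bottom mirror the wiring in the process: training data goes into the learner, and the resulting model plus the testing data go into Apply Model.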
Here are screenshots of the entire process for your review
Add the training data from the repository by dragging and dropping the dataset that you imported from your CSV file
Repeat the process and add the testing data underneath the training data
Now you can search in the operators window for the Decision Tree operator. Add it to your process.
The way that you associate the inputs and outputs of operators and data sets is by clicking on the output of one item and connecting it by clicking on the input of another item. Here we are connecting the output of the training dataset to the input of the Decision Tree operator.
Next we will add the Apply Model operator
Then we will create the appropriate connections for the model
Observe the quick fixes in the problems window at the bottom. You can double click the quick fixes to resolve the issues.
You will be prompted to make a simple decision regarding the problem that was detected. Once you resolve one problem, other problems may appear. Be sure to resolve all problems so that you can run your process.
Here is the process after resolving all problems.
Next, I select the decision tree operator and I adjust the following parameters:
Maximum Depth: change from 20 to 5.
Check both boxes to make sure that the tree is not "pruned".
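To show what a depth limit does, here is a small Python sketch of a recursive tree builder with a max_depth parameter. It is my own simplified learner (it picks the attribute with the fewest misclassified rows), not RapidMiner's implementation, the rows are made up, and the way I count depth is an assumption that may differ from how RapidMiner counts it.

```python
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def split(rows, labels, attr):
    """Group the labels by the value each row has for attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups

def build_tree(rows, labels, attributes, max_depth):
    """Grow a tree until labels are pure, attributes run out, or the depth budget is spent."""
    if len(set(labels)) == 1 or not attributes or max_depth == 0:
        return majority(labels)  # leaf: predict the majority label

    def errors(attr):
        # Training rows left misclassified if we split on attr and stop.
        return sum(len(g) - Counter(g).most_common(1)[0][1]
                   for g in split(rows, labels, attr).values())

    best = min(attributes, key=errors)
    remaining = [a for a in attributes if a != best]
    children = {}
    for value in sorted(split(rows, labels, best)):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = build_tree([r for r, _ in pairs], [l for _, l in pairs],
                                     remaining, max_depth - 1)
    return {"attribute": best, "children": children}

def depth(tree):
    """Number of split levels between the root and the deepest leaf."""
    if not isinstance(tree, dict):
        return 0
    return 1 + max(depth(child) for child in tree["children"].values())

# Made-up rows: (Smoker, Age group, Diet, Risk)
DATA = [
    ("y", "old", "good", "More Risk"), ("y", "old", "good", "More Risk"),
    ("y", "old", "good", "More Risk"), ("y", "young", "good", "Less Risk"),
    ("n", "old", "good", "Less Risk"), ("n", "old", "good", "Less Risk"),
    ("n", "young", "good", "Less Risk"), ("n", "young", "poor", "More Risk"),
]
ATTRS = ["Smoker", "Age", "Diet"]
ROWS = [dict(zip(ATTRS, d[:3])) for d in DATA]
LABELS = [d[3] for d in DATA]

full = build_tree(ROWS, LABELS, ATTRS, max_depth=5)
shallow = build_tree(ROWS, LABELS, ATTRS, max_depth=1)
print(depth(full), depth(shallow))
```

With a generous limit the tree splits on Smoker and then on Age or Diet. With a limit of 1 it stops after the first split and every branch ends in a majority-vote leaf, which is easier to read but less precise.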
Once this has been done, you can Run your process and observe the results. Since we connected both the model as well as the labeled result to the output connectors of the process, we are presented with a visual display of our Decision Tree (model) as well as the Test data set with the prediction applied.
(Decision Tree Model)
(The example test result set with the predictions applied)
As you can see, RapidMiner makes complex data analysis and machine learning tasks extremely easy with very little effort.
This concludes my tutorial on creating Decision Trees in RapidMiner.
Until next time,
Greetings friends and welcome to this article on Machine learning libraries for .NET developers. Machine learning is a hot topic right now and for good reason. Personally, I haven't been so excited about a technology since my computer used my 2800 baud modem to dial into a BBS over 17 years ago. The thought that my computer could communicate with another computer was so fascinating to me. That moment was the very moment that would forever change my life. I learned a lot about DOS by writing batch scripts and running other programs that allowed me to visit and then run a BBS system. It eventually led me to QBasic. I wanted to learn to write BBS door games and QBasic was included as a part of a standard DOS installation back then.
Fast forward 17 years and I'm still in love with computers, programming, and the concept of communication between machines. The magic never disappeared. So when I first learned about the concept of Machine learning, I felt like that 13 year old kid again. The idea that a machine can learn to do things that it has not been programmed to do is now a passion of mine. The concepts of Machine learning have an extreme learning curve, however, I believe that we as humans can do anything that we put our mind to. So I began looking around for tutorials on machine learning. I found many great tutorials and books, however, most of them involved using Python. I have nothing against Python. As a matter of fact, I find it ironic that I started with BASIC and now in this moment of "rebirth" I'm beginning to use Python, which looks a lot like BASIC in many ways. The fact of the matter remains, I'm a .NET developer. I've spent the last 9 years in the .NET framework and I love the technology. C# is an awesome programming language and it's hard to imagine life without Visual Studio. What can I say, the IDE has spoiled me.
While I scoured the internet looking for tutorials and Machine learning resources for .NET developers, I wished that there was one resource that would assist me in my search and help me achieve my goal.
Well that's what this article is all about. In this article, I will introduce you to some .NET libraries that will assist you in your quest to learn about Machine learning.
NND Neural Network Designer by Bragisoft
The Neural Network Designer project (NND) is a database management system (DBMS) for neural networks that was created by Jan Bogaerts. The designer application is developed using WPF, and is a user interface which allows you to design your neural network, query the network, and create and configure chat bots that are capable of asking questions and learning from your feedback. The chat bots can even scrape the internet for information to return in their output as well as to use for learning. The project includes a custom language syntax called NNL (neural network language) that you can use in configuring your machine learning project. The source code is designed so that the libraries can be used in your own custom applications, so you don't have to start from scratch with such a complex set of technologies. The project is actually an open source project that I am a part of. Some of the possibilities offered by this awesome project include predictions, image and pattern recognition, value inspection, memory profiling and much more. Stop by the Bragisoft NND website and download the application to give it a try.
Screen shots of the neural network designer by Bragisoft
A DBMS for neural networks
Mind map rand forrest
The chat bot designer and other tools
Here is a description from the Accord.NET project website
Accord.NET is a framework for scientific computing in .NET. The framework builds upon AForge.NET, an also popular framework for image processing, supplying new tools and libraries. Those libraries encompass a wide range of scientific computing applications, such as statistical data processing, machine learning, pattern recognition, including but not limited to, computer vision and computer audition. The framework offers a large number of probability distributions, hypothesis tests, kernel functions and support for most popular performance measurements techniques.
The most impressive parts of this library have got to be the documentation and sample applications that are distributed with the project. They make the library easy to get started with. I also like the ability to perform operations like audio processing (beat detection and more) and video processing (easy integration with your web cam, vision capabilities and object recognition). This is an excellent place to start when approaching Machine learning with the .NET framework. Here are two videos that should whet your appetite.
Hand writing recognition with Accord.NET
Here is an example of head tracking with Accord.NET (super cool)
AIMLBot Program# AIML Chat bot library
AIMLBot (Program#) is a small, fast, standards-compliant yet easily customizable implementation of an AIML (Artificial Intelligence Markup Language) based chatter bot in C#. AIMLBot has been tested on both Microsoft's runtime environment and Mono. Put simply, it will allow you to chat (by entering text) with your computer using natural language. The project is located here.
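To give a feel for how an AIML-style bot answers, here is a tiny Python sketch of the idea: each category pairs a pattern with a template, and a * wildcard in the pattern captures words from the input. This is not AIMLBot's API and it skips most of the AIML standard. The categories, the function names, and the simple first-match ordering are all my own assumptions for illustration.

```python
import re

# Hypothetical categories: (pattern, template). "*" matches one or more words.
CATEGORIES = [
    ("HELLO", "Hi there!"),
    ("MY NAME IS *", "Nice to meet you, {star}."),
    ("*", "Tell me more."),
]

def normalize(text):
    """Uppercase the input and strip punctuation before matching."""
    return re.sub(r"[^A-Z0-9 ]", "", text.upper()).strip()

def respond(text):
    """Return the template of the first category whose pattern matches the input."""
    sentence = normalize(text)
    for pattern, template in CATEGORIES:
        # Turn the pattern into a regex where "*" becomes a capturing group.
        regex = "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"
        match = re.match(regex, sentence)
        if match:
            star = match.group(1).title() if match.groups() else ""
            return template.format(star=star)
    return ""

print(respond("Hello"))
print(respond("My name is ada!"))
print(respond("What is WPF?"))
```

A real AIML interpreter loads its categories from XML files and uses a smarter matching order, but the pattern-in, template-out idea is the same.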
Machine learning algorithms are extremely math heavy. Math.NET is a library that can assist with the math that is required to solve machine learning related problems.
Math.NET Numerics aims to provide methods and algorithms for numerical computations in science, engineering and every day use. Covered topics include special functions, linear algebra, probability models, random numbers, interpolation, integral transforms and more.
DotNumerics is a website dedicated to numerical computing for .NET. DotNumerics includes a Numerical Library for .NET. The library is written in pure C# and has more than 100,000 lines of code with the most advanced algorithms for Linear Algebra, Differential Equations and Optimization problems. The Linear Algebra library includes CSLapack, CSBlas and CSEispack; these libraries are translations from Fortran to C# of LAPACK, BLAS and EISPACK, respectively.
You can find the library here.
ALGLIB is a cross-platform numerical analysis and data processing library. It supports several programming languages (C++, C#, Pascal, VBA) and several operating systems (Windows, Linux, Solaris). ALGLIB features include:
Accessing ‘R’ from C#–Lessons learned
Here are instructions to use the R statistical framework from within C#
You can check out the library at http://www.ilnumerics.net
A nice site about the basics of machine learning in C# by Seth Juarez. NuML.NET is a machine learning library for .NET developers written by Seth Juarez. I've recently tried this library and I'm impressed! Seth has stated publicly that his intention behind the numl.net library is to abstract the scary math away from machine learning to provide tools that are more approachable by software developers, and boy did he deliver! I've been working with this library for a little more than an hour and I've written a prediction app in C#. You can find his numl.net library source on github.
Encog Machine Learning Framework
Here is what the official Heaton Research website has to say about Encog:
Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models and Genetic Algorithms are supported. Most Encog training algorithms are multi-threaded and scale well to multicore hardware. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train machine learning algorithms. Encog has been in active development since 2008.
Encog is available for Java, .Net and C/C++.
This concludes my article on Machine learning resources for the .NET developer. If you have any suggestions regarding a project that you know of or you are working on related to Machine learning in .NET, please don't hesitate to leave a comment and I will update the article to mention the project. This article has shown that we as .NET developers have many resources available to us to use to implement Machine learning based solutions. I appreciate your time in reading this article and I hope you found it useful. Please subscribe to my RSS feed. Until next time..
Mono 3.0.4 released
Greetings to all of you open source patrons out there!
I've just received news of the latest release of Mono (3.0.4). The new release includes several major improvements and bug fixes. In this article, I'd like to provide a brief overview highlighting the major changes in the latest release of Mono.
So without further ado, here is a quick overview of what's offered in this version of the Mono project.
Improved garbage collection
The GC implementation has been given a makeover. These changes include:
A new approach called "cementing" has been added to the SGen concurrent garbage collector.
Mono allocates all new small objects in a defined memory space referred to as the nursery. When a collection occurs, the surviving objects become root objects and are copied to the major heap. Typically, few references that are allocated to the nursery survive to become roots, so the majority of the objects are instantly collected which leaves plenty of allocation space for new objects. These nursery collections minimize the work that must be done by the collector.
One of the problems with the garbage collection in previous versions of Mono involved instances in which objects are "pinned" in the nursery (due to managed/unmanaged references or other operations). Objects that are "pinned" cannot be moved to the major heap.
Typically the collector must keep track of these "pinned" objects (and their relationships), and it rescans them on each collection attempt to see if they have been released and are able to be moved. This approach was inefficient. This is where cementing comes into play.
Cementing is a process by which references in the nursery that are pinned are simply marked as root objects, but they remain in the nursery since they can't be moved to the heap. This dramatically reduces overhead related to pinned nursery objects and their relationships.
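Here is a toy Python model of the idea as I've described it, just to make the moving parts visible. It is not Mono's implementation. The class names, fields, and the collect_nursery method are all invented for illustration, and a real collector works on memory and references, not on Python lists.

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    name: str
    reachable: bool = True   # still referenced by the program
    pinned: bool = False     # cannot be moved out of the nursery

@dataclass
class Heap:
    nursery: list = field(default_factory=list)
    major: list = field(default_factory=list)
    cemented: set = field(default_factory=set)  # names of pinned objects treated as roots

    def collect_nursery(self):
        """Toy nursery collection: promote survivors, cement pinned objects, drop the rest."""
        kept = []
        for obj in self.nursery:
            if obj.pinned:
                # Pinned objects can't be copied, so mark them as roots and leave them in place.
                self.cemented.add(obj.name)
                kept.append(obj)
            elif obj.reachable:
                # Survivors are copied to the major heap.
                self.major.append(obj)
            # Unreachable, unpinned objects are simply discarded.
        self.nursery = kept

heap = Heap(nursery=[
    Obj("a"),
    Obj("b", reachable=False),
    Obj("c", pinned=True),
])
heap.collect_nursery()
print([o.name for o in heap.major], [o.name for o in heap.nursery], heap.cemented)
```

After the collection, the surviving object has moved to the major heap, the dead one is gone, and the pinned one stays in the nursery but is now recorded as a root, so the collector in this toy model doesn't need to track it separately.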
There are also several bug fixes related to garbage collection, including the #9928 pointer free deadlock problem and bugs in mono_gc_weak_link_get.
Improved StreamReader/StreamWriter asynchronous operations
The asynchronous operations have been rewritten to resolve bug #9761, which caused the operations to fail on subsequent calls.
OSX Homebrew installation conflict resolution
Mono no longer installs a /usr/bin/pkg-config file on OSX, which resolves an issue that affected Homebrew installations.
The installation only contains the new Gtk+ stack that allows the new Xamarin Studio to run on OSX with 3.0.
This is exciting news!
Conclusion (for now)
Well that about wraps it up. Oh, one more thing..
In case you haven't heard, Xamarin has released Xamarin 2.0, which includes iOS development from within Visual Studio and a brand new IDE called Xamarin Studio that is geared toward developing mobile apps for Android and iOS. The IDE runs on Windows, Linux and OSX!
I would like to mention that I will be delivering a detailed refactorthis.net product review on the new and exciting features of Xamarin 2.0.
So check back for my review and thanks for reading!