What is correlation?

From Wikipedia:

In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence.

In layman's terms, correlation is a relationship between data attributes.  As a quick refresher: in data mining, a dataset is made up of attributes, and we use those attributes to classify or predict a label.  Some attributes have more influence over the label's value than others.  If you can determine how much influence specific attributes have, you are in a better position to build a classification model, because you know which attributes to focus on.
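To make the idea of "influence" concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python.  The `pearson` helper and the toy columns (`label`, `informative`, `uninformative`) are made up for illustration; they are not taken from the Titanic dataset, and this is not necessarily how RapidMiner computes its values internally.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data, invented for this sketch.
label = [1, 1, 0, 0, 1, 0]
informative = [9.0, 8.0, 2.0, 1.0, 7.0, 3.0]    # high when the label is 1
uninformative = [5.0, 1.0, 5.0, 1.0, 3.0, 3.0]  # unrelated to the label

print("informative:  ", round(pearson(informative, label), 3))    # close to 1
print("uninformative:", round(pearson(uninformative, label), 3))  # zero
```

An attribute like `informative` would be worth focusing on when building a model, while `uninformative` tells you nothing about the label on its own.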

In this example, I will use the kaggle.com Titanic data mining challenge dataset.  This post will not uncover any information that is not already available in the tutorial posted on kaggle.com.

Here are two screenshots.  The first screenshot will show you some statistics about the dataset.  The second screenshot will show a sample of the data.

Meta data view of the Titanic data mining challenge Training dataset

A data view of the dataset

The correlation matrix

Start by importing the Titanic training dataset into RapidMiner.  You can use the Read CSV, Read Excel, or Read Database operator for this step.  Next, search for the "Correlation Matrix" operator and drag it onto the process surface.  Connect the output port of the Titanic training dataset to the example set input port of the Correlation Matrix operator.  Your process should look like this.

Now run the process and observe the output.

You are presented with several result views.  The first is the Correlation Matrix Attribute Weights view, which displays the "weight" of each attribute.  This tutorial focuses on a different view, so click on the Correlation Matrix view.  This matrix shows the correlation coefficients, which measure the strength of the relationship between each pair of attributes.

An easy way to get oriented is to notice that wherever an attribute intersects with itself, you have a dark blue cell with the value 1, the strongest possible value, because any attribute is perfectly correlated with itself.  A correlation coefficient can be positive or negative, ranging from -1 to 1.  A negative value does not mean a weaker relationship; it means that one attribute tends to decrease as the other increases.  The larger the absolute value of the coefficient, the stronger the relationship between the two attributes.  If we follow the top row of the matrix (survived), we see which attributes correlate most strongly with the label we are trying to predict.
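The properties described above (a diagonal of 1, and sign indicating direction rather than strength) can be checked with a small sketch.  The `correlation_matrix` helper and the column names `attr_up` and `attr_down` are hypothetical; the data is invented and the code only illustrates the idea, not RapidMiner's implementation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy columns, invented for this sketch.
columns = {
    "label":     [1, 1, 0, 0, 1, 0],
    "attr_up":   [9.0, 8.0, 2.0, 1.0, 7.0, 3.0],  # rises with the label
    "attr_down": [1.0, 2.0, 8.0, 9.0, 3.0, 7.0],  # mirror image: 10 - attr_up
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    # One row per attribute; each cell is the coefficient for one pair.
    print(name.ljust(10), "  ".join(f"{row[other]:+.3f}" for other in matrix))
```

In the printed matrix, `attr_up` and `attr_down` have coefficients of the same magnitude against `label` but with opposite signs, so both are equally strong predictors.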

Just as the kaggle.com tutorial specifies, the attributes with the strongest correlation with the label (survived) are

sex(0.295), pclass(0.115), and fare(0.66)

Remember that the value as well as the color will help you to visually identify the stronger correlation between attributes.
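Reading the label's row and picking out the strongest values can also be done programmatically.  The sketch below ranks attributes by the absolute value of their correlation with the label; `rank_by_label_correlation`, the attribute names, and the data are all hypothetical and only illustrate the idea.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_by_label_correlation(columns, label_name):
    """Sort attributes by |correlation with the label|, strongest first."""
    label = columns[label_name]
    scores = {
        name: pearson(values, label)
        for name, values in columns.items()
        if name != label_name
    }
    # Use the absolute value so strong negative correlations rank high too.
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy columns, invented for this sketch.
columns = {
    "label":  [1, 1, 0, 0, 1, 0],
    "attr_a": [1.0, 2.0, 8.0, 9.0, 3.0, 7.0],  # strong negative relationship
    "attr_b": [4.0, 2.0, 3.0, 1.0, 5.0, 3.0],  # moderate positive relationship
    "attr_c": [5.0, 1.0, 5.0, 1.0, 3.0, 3.0],  # no linear relationship
}

ranking = rank_by_label_correlation(columns, "label")
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

Sorting on the absolute value is the design choice that matters here: `attr_a` ranks first even though its coefficient is negative.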

If you are working with a classification problem, I'm sure you can see how valuable the correlation matrix can be in showing you the relationships between your label and attributes.  Such insights can provide a great start on where to focus your attention when building your classification model.

Thanks for reading and keep your eyes open for my next tutorial!


My name is Buddy James.  I'm a Microsoft Certified Solutions Developer from the Nashville, TN area.  I'm a Software Engineer, an author, a blogger (http://www.refactorthis.net), a mentor, a thought leader, a technologist, a data scientist, and a husband.  I enjoy working with design patterns, data mining, C#, WPF, Silverlight, WinRT, XAML, ASP.NET, Python, CouchDB, RavenDB, Hadoop, Android (MonoDroid), iOS (MonoTouch), and Machine Learning.  I love technology and I love to develop software, collect data, analyze the data, and learn from the data.  When I'm not coding, I'm determined to make a difference in the world by using data and machine learning techniques (follow me at @budbjames).


refactorthis.net | All posts tagged 'Silverlight'

Prism 4.1 release for .NET 4.5 and Windows 8

This is a short post, but I'm happy to let you know that the Microsoft patterns & practices team is currently testing a new release of Prism, the composite user interface framework.  There are plans for the new release to support Windows 8 WinRT applications as well as .NET applications.  I'm happy about this release because I was worried that this project might have been abandoned like Silverlight.  Read about it here.  Thanks

XAML Basics: Styles

Styles in XAML

If you are a Microsoft Windows developer, chances are you are using XAML with one of the prominent development technologies (WPF, Windows Phone, Silverlight, WinRT).  XAML is a wonderful markup language that provides a whole new approach to developing applications for the Microsoft Windows operating system.  In this series, I plan to cover the basics of XAML as they apply to these technologies.  Microsoft has put a lot behind XAML, and I think it has a bright future when it comes to Windows development.

What are XAML styles?

You can think of a style in XAML much as you would a CSS style in HTML.  Styles provide a way for you to change the visual properties of a given control or set of controls.  Never before has the developer had this much freedom when it comes to changing the appearance of the standard Windows controls.  Using styles, you can, for example, declare the font size and family of all buttons in an application.  You can also create style resource files that serve as style libraries shared between applications.

This article uses WPF to explain XAML styles; however, most if not all of the principles apply to the other XAML-based technologies.

Defining styles in your application

The way that you define styles is very much the same in WPF, Silverlight, and WinRT applications, with subtle differences.  You can define styles at the Window, Page, UserControl, Application, and Resource Dictionary levels.

We will start with an example of a WPF window called MainWindow.xaml.
MainWindow.xaml

    <Window x:Class="WPF_styles_article_example.MainWindow"
            xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
            xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
            Title="MainWindow" Height="350" Width="525">
        <Grid>
            <StackPanel Orientation="Vertical">
                <Button Margin="20" Name="btnOne" Content="Button One" Width="260" Height="40" />
                <Button Margin="20" Name="btnTwo" Content="Button two" Width="260" Height="40" />
            </StackPanel>
        </Grid>
    </Window>

Here's the output

As you can see, we have two buttons contained inside of a StackPanel.  They both share the same height and width.

You typically want a consistent look for all elements of your application.  If you were to follow the method used in this example, you would need to find each button in the entire application to make any change to the properties of your buttons.  The more properties that are set on each button, the more properties you would potentially need to change.

The XAML style object

Styles are defined in a Style object.  The Style's TargetType property specifies the type of control to which the style can be applied, and the Style contains a collection of Setter objects.  Each Setter specifies a property of the control to change and the value to set for it.  Styles are defined in Resource Dictionaries.  Once you've created your Style and populated its setters, you can reference the resource in the Style property of the control that you wish to style.

Here is an example of changing the style of a control by referencing a style resource in the control's Style property.

    <Window x:Class="WPF_styles_article_example.MainWindow"
            xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
            xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
            Title="MainWindow" Height="350" Width="525">
        <Window.Resources>
            <!-- Here we define our Style object and populate its setter collection
                 with property names and values. Notice that we specify the TargetType
                 as the type of control that can use the style. -->
            <Style x:Key="CustomButtonStyle" TargetType="Button">
                <Setter Property="FontFamily" Value="Verdana" />
                <Setter Property="FontSize" Value="30" />
                <Setter Property="Width" Value="260" />
                <Setter Property="Height" Value="40" />
            </Style>
        </Window.Resources>
        <Grid>
            <StackPanel Orientation="Vertical">
                <!-- Notice that we have moved all visual property settings into the style resource. -->
                <Button Margin="20" x:Name="btnOne" Content="Button One"
                        Style="{StaticResource CustomButtonStyle}" />
                <Button Margin="20" Name="btnTwo" Content="Button two" Width="260" Height="40" />
                <!-- Notice that setting the properties on the Button will override
                     the properties specified in the style. -->
                <Button Margin="20" Name="btnThree" Content="Button three"
                        Style="{StaticResource CustomButtonStyle}"
                        FontSize="15" FontFamily="Wide Latin" />
            </StackPanel>
        </Grid>
    </Window>

Here's the output

As you can see, we've taken the visual properties of btnOne and moved them into a Style resource.  We specify the x:Key property so that the style resource can be referenced by controls.  We specify the type of control that can use the style by setting the TargetType property, and we populate the Style's Setter collection with Setter objects that define the property names and values of the style.

You'll also notice that we added a third button and set its Style property just as we did with btnOne; however, btnThree overrides some of the properties that we defined in the Style resource.  Properties set at the control level take precedence over the properties defined in the Style resource.
Creating a default style for a specified control type

In the previous example, we defined a Style resource and referenced the resource key in the Style properties of two of the three controls.  Since we didn't specify the Style property of btnTwo, it was not affected by the Style's Setter values.

What if we wanted to define a default Style for all of the buttons in the current Window?  As it turns out, this is fairly simple.  All we have to do is remove the Style's x:Key property.  This causes all of the controls of the specified TargetType to apply the style.  Here is an example.

    <Window x:Class="WPF_styles_article_example.MainWindow"
            xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
            xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
            Title="MainWindow" Height="350" Width="525">
        <Window.Resources>
            <!-- I've removed the x:Key attribute to apply the style to all Buttons in the window. -->
            <Style TargetType="Button">
                <Setter Property="FontFamily" Value="Verdana" />
                <Setter Property="FontSize" Value="30" />
                <Setter Property="Width" Value="260" />
                <Setter Property="Height" Value="40" />
            </Style>
        </Window.Resources>
        <Grid>
            <StackPanel Orientation="Vertical">
                <!-- No styles set -->
                <Button Margin="20" x:Name="btnOne" Content="Button One" />
                <Button Margin="20" Name="btnTwo" Content="Button two" />
                <Button Margin="20" Name="btnThree" Content="Button three" />
            </StackPanel>
        </Grid>
    </Window>

Here's the output

Applying styles across an entire application

The previous examples demonstrate defining Style resources at the Window level.  The same approach applies to Pages and UserControls.  If you want to make styles available to the entire application, simply create an Application-level resource.  Here is an example.

    <Application x:Class="WPF_styles_article_example.App"
                 xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
                 xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
                 StartupUri="MainWindow.xaml">
        <Application.Resources>
            <Style TargetType="Button">
                <Setter Property="FontFamily" Value="Verdana" />
                <Setter Property="FontSize" Value="30" />
                <Setter Property="Width" Value="260" />
                <Setter Property="Height" Value="40" />
            </Style>
        </Application.Resources>
    </Application>

Here's the output

Conclusion

This concludes the first article in the XAML basics series.  The next article in the series will build on the concept of XAML styles by introducing Control Templates.  Stay tuned!

C# 5.0: INotifyPropertyChanged with the [CallerMemberName] attribute

WPF/Silverlight and the INotifyPropertyChanged interface

Greetings and welcome to another post on refactorthis.net.  Today's topic is an interface that all WPF and Silverlight developers have grown to love.  The INotifyPropertyChanged interface lets your ViewModel notify the data binding mechanisms of WPF and Silverlight, and therefore your view, that a property value has changed.  Here is a simple illustration.

ClassicViewModel.cs

    using System;
    using System.ComponentModel;

    namespace INotifyPropertyChangedExample
    {
        public class ClassicViewModel : INotifyPropertyChanged
        {
            private string _personName;

            public event PropertyChangedEventHandler PropertyChanged;

            public string PersonName
            {
                get { return _personName; }
                set
                {
                    if (value != _personName)
                    {
                        _personName = value;
                        // No type safety here. If you mistype the name, the binding silently breaks.
                        OnPropertyChanged("PersonName");
                    }
                }
            }

            public void OnPropertyChanged(string property)
            {
                if (property == null)
                    throw new ArgumentNullException("property");

                // Copy the delegate so an unsubscribe on another thread cannot null it mid-call.
                var handler = PropertyChanged;
                if (handler != null)
                    handler(this, new PropertyChangedEventArgs(property));
            }
        }
    }

When calling the method that raises the property changed event, you need to pass the property name as a string.  If you make any mistake in typing the property name, you will have to take time to debug your data binding.

A new and improved solution

C# 5.0 provides some nifty compiler-level attributes to assist with this scenario.  The [CallerMemberName] attribute from the System.Runtime.CompilerServices namespace can be applied to an optional method parameter.  When the method is called and no value is specified for the parameter decorated with this attribute, the compiler supplies the name of the member that called the method.  This takes the risk of a typing mishap out of your hands and allows you to rely on the compiler to provide this information for you!  Here is an updated example!
ImprovedViewModel.cs

    using System.ComponentModel;
    using System.Runtime.CompilerServices;

    namespace INotifyPropertyChangedExample
    {
        public class ImprovedViewModel : INotifyPropertyChanged
        {
            private string _personName;

            public event PropertyChangedEventHandler PropertyChanged;

            public string PersonName
            {
                get { return _personName; }
                set
                {
                    if (value != _personName)
                    {
                        _personName = value;
                        // No argument needed: the compiler passes "PersonName" for us.
                        OnPropertyChanged();
                    }
                }
            }

            public void OnPropertyChanged([CallerMemberName] string property = null)
            {
                // Copy the delegate so an unsubscribe on another thread cannot null it mid-call.
                var handler = PropertyChanged;
                if (handler != null)
                    handler(this, new PropertyChangedEventArgs(property));
            }
        }
    }

As you can see, this is a great little addition to the C# language that will make your life a little easier.  I'd like to thank Patrick Steele for his article in Visual Studio Magazine that inspired this article.  Please check out his article for more uses of this and other new attributes and features of the C# 5.0 language.

http://visualstudiomagazine.com/Articles/2012/11/01/More-Than-Just-Async.aspx