
Create a Naive Bayes Mining Model

The Microsoft Naive Bayes algorithm is a classification algorithm provided by Microsoft SQL Server Analysis Services for use in predictive modeling. The name Naive Bayes derives from the fact that the algorithm uses Bayes' theorem but does not take into account dependencies that may exist between the input attributes, and therefore its assumptions are said to be naive.
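To make the "naive" assumption concrete, here is a minimal Python sketch of a Naive Bayes classifier over discrete attributes. It is not the Analysis Services implementation: the functions `train` and `predict`, the column names, and the six sample rows are all hypothetical and chosen only for illustration. Each input attribute contributes its own factor P(state | class), and the factors are simply multiplied as if the attributes were independent.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count class frequencies and per-class attribute-state frequencies."""
    class_counts = Counter(row[target] for row in rows)
    state_counts = defaultdict(Counter)  # (attribute, class) -> Counter of states
    for row in rows:
        for attr, state in row.items():
            if attr != target:
                state_counts[(attr, row[target])][state] += 1
    return class_counts, state_counts

def predict(class_counts, state_counts, case):
    """Return normalized class probabilities for one case."""
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior P(class)
        for attr, state in case.items():
            # Naive step: multiply by P(state | class) for each attribute separately.
            score *= state_counts[(attr, cls)][state] / n
        scores[cls] = score
    norm = sum(scores.values())
    if norm == 0:  # unseen combination: fall back to the priors
        return {cls: n / total for cls, n in class_counts.items()}
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical sample data, not taken from the tutorial dataset.
rows = [
    {"Commute": "1-2", "Area": "very", "Buyer": "yes"},
    {"Commute": "1-2", "Area": "less", "Buyer": "yes"},
    {"Commute": "10+", "Area": "very", "Buyer": "yes"},
    {"Commute": "1-2", "Area": "less", "Buyer": "no"},
    {"Commute": "10+", "Area": "less", "Buyer": "no"},
    {"Commute": "10+", "Area": "very", "Buyer": "no"},
]
class_counts, state_counts = train(rows, "Buyer")
probs = predict(class_counts, state_counts, {"Commute": "1-2", "Area": "very"})
print(probs)
```

The sketch uses raw frequencies with no smoothing, so a state never seen for a class drives that class's score to zero. A production implementation would normally handle that case more carefully.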

This algorithm is less computationally intensive than other Microsoft algorithms, and is therefore useful for quickly generating mining models to discover relationships between input columns and predictable columns. You can use this algorithm for initial exploration of the data, and later apply the results to create additional mining models with other algorithms that are more computationally intensive and more accurate.

The Microsoft Naive Bayes Viewer lists each input column in the dataset, and shows how the states of each column are distributed, given each state of the predictable column. You can use this view to identify the input columns that are important for differentiating between states of the predictable column. For example, in the Commute Distance column, we will see that a commute of one to two miles has a probability of 0.387 among customers who buy a bike and 0.287 among customers who do not buy a bike.
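The distribution the viewer shows for one input column can be sketched as a conditional frequency table. The Python snippet below is an illustration only: `state_distributions` is a hypothetical helper, and the rows are invented sample data, so the resulting numbers do not match the tutorial dataset.

```python
from collections import Counter, defaultdict

def state_distributions(rows, attribute, target):
    """Return {target_state: {attribute_state: probability}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[target]][row[attribute]] += 1
    return {
        cls: {state: n / sum(states.values()) for state, n in states.items()}
        for cls, states in counts.items()
    }

# Hypothetical sample data: (commute distance, bike buyer flag).
pairs = [
    ("1-2 Miles", "yes"), ("1-2 Miles", "yes"), ("0-1 Miles", "yes"), ("10+ Miles", "yes"),
    ("1-2 Miles", "no"), ("10+ Miles", "no"), ("10+ Miles", "no"), ("10+ Miles", "no"),
]
rows = [{"Commute Distance": c, "Bike Buyer Flag": b} for c, b in pairs]

dist = state_distributions(rows, "Commute Distance", "Bike Buyer Flag")
for flag, states in dist.items():
    print(flag, states)
```

Each inner dictionary sums to 1 for its state of the predictable column, which is why the two probabilities quoted for a single input state need not add up to 1.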

  • In the Mining Models tab of Data Mining Designer, right-click the Structure column, and select New Mining Model.
  • In the New Mining Model dialog box, under Model name, type STM-NaiveBayes.
  • In Algorithm name, select Microsoft Naive Bayes, then click OK.
A message may appear stating that the Microsoft Naive Bayes algorithm does not support columns with continuous values.
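Continuous values have to be turned into discrete states before an algorithm of this kind can use them. The following Python sketch shows one simple approach, equal-width binning. It is an assumption made for illustration, not the method Analysis Services applies, and the function `discretize` and the sample ages are hypothetical.

```python
def discretize(values, bins):
    """Map each number to an equal-width bucket index from 0 to bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero when all values match
    # The maximum value would land in bucket `bins`, so clamp it to the last bucket.
    return [min(int((v - lo) / width), bins - 1) for v in values]

ages = [18, 25, 33, 41, 58, 70]  # hypothetical continuous column
buckets = discretize(ages, 3)
print(buckets)
```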

Deploying and Processing the Model

In Data Mining Designer, we can process a mining structure, a specific mining model that is associated with a mining structure, or the structure and all the models that are associated with that structure. Here we process the structure and all of its models.

  • In the Mining Model menu, select Process Mining Structure and All Models.
  • Click Run in the Processing Mining Structure - Targeted Mailing dialog box.

The Process Progress dialog box opens to display the details of model processing. Model processing might take some time, depending on your computer.

  • Click Close in the Process Progress dialog box after the models have completed processing.

Exploring the Naive Bayes Model

The Microsoft Naive Bayes Viewer provides several methods for displaying the interaction between bike buying and the input attributes.

The Dependency Network tab works in the same way as the Dependency Network tab for the Microsoft Tree Viewer.

Each node in the viewer represents an attribute, and the lines between nodes represent relationships. In the viewer, you can see all the attributes that affect the state of the predictable attribute, Bike Buyer Flag.

  • Use the Mining Model list at the top of the Mining Model Viewer tab to switch to the STM-NaiveBayes model.
  • Click the Bike Buyer Flag node to identify its dependencies.

The pink shading indicates that all of the attributes have an effect on bike buying.

  • Adjust the slider to identify the most influential attribute.

As you lower the slider, only the attributes that have the greatest effect on the [Bike Buyer Flag] column remain. By adjusting the slider, you can discover that a few of the most influential attributes are: Age Group, Social Class Category, and Bike Friendly Area.

The Attribute Profiles tab describes how different states of the input attributes affect the outcome of the predictable attribute.

To explore the model in the Attribute Profiles tab:

  • In the Predictable box, verify that Bike Buyer Flag is selected. If the Mining Legend is blocking the display of the attribute profiles, move it out of the way.
  • In the Histogram bars box, select 5.

In our model, 5 is the maximum number of states for any one variable.

The attributes that affect the state of this predictable attribute are listed together with the values of each state of the input attributes and their distributions in each state of the predictable attribute.

In the Attributes column, find Bike Friendly Area. Notice the differences in the histograms for bike buyers (column labeled “yes”) and non-buyers (column labeled “no”). A person who lives in a Very Bike-Friendly Area is much more likely to buy a bike than a person who lives in a Less Bike-Friendly Area.

  • Right-click the Bike Friendly Area cell in the bike buyer column (labeled “yes”).

The Mining Legend displays a more detailed view and shows the distribution of the data in that group.

With the Attribute Characteristics tab, you can select an attribute and value to see how frequently values for other attributes appear in the selected value cases.

To explore the model in the Attribute Characteristics tab:

  • In the Attribute list, verify that Bike Buyer Flag is selected.
  • Set the Value to “yes”.

In the viewer, you will see that customers who have poor public transportation are more likely to buy a bike.

With the Attribute Discrimination tab, you can investigate the relationship between two discrete values of bike buying and other attribute values. Because the STM-NaiveBayes model has only two states, “yes” and “no”, you do not have to make any changes to the viewer.

In the viewer, you can see that people who have poor public transportation and live in a Very Bike-Friendly Area tend to buy bicycles, and people from the Upper Class who live in a Less Bike-Friendly Area tend not to buy bicycles.
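One simple way to think about discrimination between the two states is to compare how often an attribute state occurs among buyers and among non-buyers. The Python sketch below ranks states by the difference P(state | yes) - P(state | no). This is a simplified stand-in, not the score the viewer actually computes, and the function `discrimination` and the probabilities are hypothetical.

```python
def discrimination(profile_yes, profile_no):
    """Rank attribute states by P(state | yes) - P(state | no), highest first."""
    states = set(profile_yes) | set(profile_no)
    diffs = {s: profile_yes.get(s, 0.0) - profile_no.get(s, 0.0) for s in states}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)

# Hypothetical conditional probabilities for one attribute.
profile_yes = {"Very Bike-Friendly": 0.6, "Less Bike-Friendly": 0.4}
profile_no = {"Very Bike-Friendly": 0.3, "Less Bike-Friendly": 0.7}

ranked = discrimination(profile_yes, profile_no)
for state, diff in ranked:
    favors = "yes" if diff > 0 else "no"
    print(f"{state}: favors {favors} ({diff:+.2f})")
```

States with a positive difference lean toward “yes”, and states with a negative difference lean toward “no”.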

bicn01/dm05.txt · Last modified: 2018/12/04 08:39 (external edit)