The Microsoft Naive Bayes algorithm is a classification algorithm provided by Microsoft SQL Server Analysis Services for use in predictive modeling. The name Naive Bayes derives from the fact that the algorithm uses Bayes' theorem but does not take into account dependencies that may exist between the input attributes, and therefore its assumptions are said to be naive.
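To make the naive assumption concrete, the following is a minimal sketch of how a Naive Bayes classifier combines evidence: it multiplies one conditional probability per input attribute, as if the attributes never influenced each other. The function name, the attribute states, and all of the numbers are hypothetical and chosen only for illustration; this is not the Analysis Services implementation.

```python
from math import prod


def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(class | observed), treating inputs as independent given the class."""
    scores = {}
    for cls, prior in priors.items():
        # Naive step: multiply the per-attribute conditionals as if the
        # attributes never interact with each other.
        scores[cls] = prior * prod(
            likelihoods[cls][attr][value] for attr, value in observed.items()
        )
    total = sum(scores.values())
    # Normalize so the class probabilities sum to 1.
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical priors and conditional probabilities, for illustration only.
priors = {"yes": 0.5, "no": 0.5}
likelihoods = {
    "yes": {"Commute Distance": {"1-2 Miles": 0.4}, "Bike Friendly Area": {"Very": 0.5}},
    "no": {"Commute Distance": {"1-2 Miles": 0.3}, "Bike Friendly Area": {"Very": 0.2}},
}
observed = {"Commute Distance": "1-2 Miles", "Bike Friendly Area": "Very"}

posterior = naive_bayes_posterior(priors, likelihoods, observed)
print(posterior)
```

With these made-up inputs the "yes" class ends up with the larger share, because both observed states are more common among buyers than among non-buyers.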
This algorithm is less computationally intensive than other Microsoft algorithms, and therefore is useful for quickly generating mining models to discover relationships between input columns and predictable columns. You can use this algorithm to do initial explorations of data, and then later you can apply the results to create additional mining models with other algorithms that are more computationally intensive and more accurate.
The Microsoft Naive Bayes Viewer lists each input column in the dataset, and shows how the states of each column are distributed, given each state of the predictable column. You can use this view to identify the input columns that are important for differentiating between states of the predictable column. For example, in the Commute Distance column, the state for a commute of one to two miles has a value of 0.387 among customers who buy a bike and 0.287 among customers who do not. Because each distribution is conditioned on the state of the predictable column, these values describe how common that commute distance is within each group, so the gap between them suggests that Commute Distance helps separate buyers from non-buyers.
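The kind of distribution the viewer displays can be approximated by counting. The sketch below groups cases by the state of the predictable column and then computes the share of each input state within each group. The function name, the column name "Bike Buyer", and the sample rows are assumptions made for illustration; they are not taken from the tutorial dataset.

```python
from collections import Counter, defaultdict


def conditional_distribution(rows, input_col, target_col):
    """Return {target_state: {input_state: P(input_state | target_state)}}."""
    counts = defaultdict(Counter)
    for row in rows:
        # Count each input state separately for every state of the target.
        counts[row[target_col]][row[input_col]] += 1
    return {
        target: {state: n / sum(c.values()) for state, n in c.items()}
        for target, c in counts.items()
    }


# Hypothetical cases, for illustration only.
rows = [
    {"Commute Distance": "1-2 Miles", "Bike Buyer": "yes"},
    {"Commute Distance": "1-2 Miles", "Bike Buyer": "yes"},
    {"Commute Distance": "10+ Miles", "Bike Buyer": "yes"},
    {"Commute Distance": "1-2 Miles", "Bike Buyer": "no"},
    {"Commute Distance": "10+ Miles", "Bike Buyer": "no"},
    {"Commute Distance": "10+ Miles", "Bike Buyer": "no"},
    {"Commute Distance": "10+ Miles", "Bike Buyer": "no"},
]

dist = conditional_distribution(rows, "Commute Distance", "Bike Buyer")
print(dist)
```

Each inner dictionary sums to 1, which is why two values for the same input state across different target states do not need to add up to 1.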
In Data Mining Designer, you can process a mining structure, a specific mining model that is associated with a mining structure, or the structure together with all the models that are associated with it.
The Process Progress dialog box opens to display the details of model processing. Model processing might take some time, depending on your computer.
The Microsoft Naive Bayes Viewer provides several tabs for displaying the interaction between bike buying and the input attributes.
The Dependency Network tab works in the same way as the Dependency Network tab for the Microsoft Tree Viewer.
Each node in the viewer represents an attribute, and the lines between nodes represent relationships. In the viewer, you can see all the attributes that affect the state of the predictable attribute, Bike Buyer Flag.
The pink shading indicates that all of the attributes have an effect on bike buying.
As you lower the slider, only the attributes that have the greatest effect on the [Bike Buyer Flag] column remain. By adjusting the slider, you can discover that a few of the most influential attributes are: Age Group, Social Class Category, and Bike Friendly Area.
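The effect of the slider can be imitated by scoring every input attribute against the predictable attribute and keeping only the top few. How the viewer actually scores link strength is not described here, so the sketch below substitutes mutual information as a stand-in measure. The function names, the "Gender" and "Region" columns, and the data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def strongest_links(columns, target, keep):
    """Rank input columns by their score against the target and keep the top ones."""
    scores = {name: mutual_information(values, target) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


# Hypothetical data, for illustration only.
target = ["yes", "yes", "no", "no"]
columns = {
    "Gender": ["M", "F", "M", "F"],               # unrelated to the target
    "Region": ["N", "N", "N", "S"],               # weakly related
    "Age Group": ["young", "young", "old", "old"],  # fully determines the target
}

top_two = strongest_links(columns, target, keep=2)
print(top_two)
```

Lowering `keep` plays the role of lowering the slider: weaker links drop out first and only the most influential attributes remain.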
The Attribute Profiles tab describes how different states of the input attributes affect the outcome of the predictable attribute.
To explore the model in the Attribute Profiles tab:
In our model, 5 is the maximum number of states for any one variable.
The attributes that affect the state of this predictable attribute are listed together with the values of each state of the input attributes and their distributions in each state of the predictable attribute.
In the Attributes column, find Bike Friendly Area. Notice the differences in the histograms for bike buyers (column labeled “yes”) and non-buyers (column labeled “no”). A person who lives in a Very Bike-Friendly Area is much more likely to buy a bike than a person who lives in a Less Bike-Friendly Area.
* Right-click the **Bike Friendly Area** cell in the bike buyer (column labeled "yes") column.
The Mining Legend displays a more detailed view and shows the distribution of the data in that group.
With the Attribute Characteristics tab, you can select an attribute and value to see how frequently values of the other attributes appear in the cases that have the selected value.
To explore the model in the Attribute Characteristics tab:
In the viewer, you will see that customers who have poor public transportation are more likely to buy a bike.
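The idea behind this tab can be sketched as a filter followed by a frequency count: select the cases that have the chosen value, then measure how often each value of the other attributes occurs among them. The function name, the column names, and the rows below are assumptions for illustration and do not come from the tutorial dataset.

```python
from collections import Counter


def characteristics(rows, target_col, target_value):
    """List (attribute, value) pairs by frequency among cases with the selected value."""
    selected = [r for r in rows if r[target_col] == target_value]
    counts = Counter(
        (col, val) for r in selected for col, val in r.items() if col != target_col
    )
    # Most frequent pairs first, expressed as a share of the selected cases.
    return [(pair, n / len(selected)) for pair, n in counts.most_common()]


# Hypothetical cases, for illustration only.
rows = [
    {"Public Transportation": "Poor", "Bike Friendly Area": "Very", "Bike Buyer": "yes"},
    {"Public Transportation": "Poor", "Bike Friendly Area": "Less", "Bike Buyer": "yes"},
    {"Public Transportation": "Poor", "Bike Friendly Area": "Very", "Bike Buyer": "yes"},
    {"Public Transportation": "Good", "Bike Friendly Area": "Less", "Bike Buyer": "yes"},
    {"Public Transportation": "Good", "Bike Friendly Area": "Less", "Bike Buyer": "no"},
]

buyer_traits = characteristics(rows, "Bike Buyer", "yes")
print(buyer_traits[0])
```

In this made-up sample, poor public transportation is the most common trait among buyers, which mirrors the kind of pattern the tab is meant to surface.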
With the Attribute Discrimination tab, you can investigate the relationship between two discrete values of bike buying and other attribute values. Because the predictable attribute in the STM-NaiveBayes model has only two states, “yes” and “no”, you do not have to make any changes to the viewer.
In the viewer, you can see that people who live in areas with poor public transportation and in a Very Bike-Friendly Area tend to buy bicycles, and people from the Upper Class who live in a Less Bike-Friendly Area tend not to buy bicycles.
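A comparison of this kind can be approximated by measuring, for each attribute value, how much more common it is in one state than in the other. The viewer's own scoring method is not described here, so the sketch below uses a plain difference of within-group frequencies as a stand-in. The function name, column names, and rows are hypothetical, and the code assumes each state has at least one case.

```python
from collections import Counter


def discrimination(rows, target_col, state_a, state_b):
    """Rank (attribute, value) pairs by how strongly they favor state_a over state_b."""

    def freq(state):
        selected = [r for r in rows if r[target_col] == state]
        counts = Counter(
            (c, v) for r in selected for c, v in r.items() if c != target_col
        )
        return {pair: n / len(selected) for pair, n in counts.items()}

    fa, fb = freq(state_a), freq(state_b)
    # Positive difference favors state_a, negative difference favors state_b.
    diffs = {pair: fa.get(pair, 0.0) - fb.get(pair, 0.0) for pair in fa.keys() | fb.keys()}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical cases, for illustration only.
rows = [
    {"Public Transportation": "Poor", "Bike Buyer": "yes"},
    {"Public Transportation": "Poor", "Bike Buyer": "yes"},
    {"Public Transportation": "Good", "Bike Buyer": "yes"},
    {"Public Transportation": "Good", "Bike Buyer": "no"},
    {"Public Transportation": "Good", "Bike Buyer": "no"},
    {"Public Transportation": "Poor", "Bike Buyer": "no"},
]

ranking = discrimination(rows, "Bike Buyer", "yes", "no")
print(ranking)
```

Values at the top of the list lean toward "yes" and values at the bottom lean toward "no", which is the same two-sided reading the tab offers.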