The model that performs the best will be used by the AdventureBikes marketing department to identify the customers for their targeted mailing campaign.
Consider a group of people who share similar demographic information and who buy similar products from the AdventureBikes company. This group of people represents a cluster of data. Several such clusters may exist in a database. By observing the columns that make up a cluster, you can see more clearly how the records in a dataset are related to one another.
You can customize the way the algorithm works by specifying a clustering technique, limiting the maximum number of clusters, or changing the amount of support required to create a cluster.
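To make these three settings concrete, here is a minimal Python sketch of K-means clustering with a cap on the number of clusters and a minimum-support filter. It is an illustration under stated assumptions, not the Microsoft Clustering implementation: the names `kmeans`, `cluster_count` and `min_support` are hypothetical, the farthest-first seeding is a choice made here for deterministic results, and dropping clusters with fewer than `min_support` cases is only one plausible reading of "support".

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def _assign(points: List[Point], centroids: List[List[float]]) -> List[List[Point]]:
    """Put every point into the group of its nearest centroid."""
    groups: List[List[Point]] = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: _dist(p, centroids[i]))
        groups[nearest].append(p)
    return groups


def kmeans(points: List[Point], cluster_count: int,
           min_support: int = 1, iterations: int = 100) -> List[List[float]]:
    """Hypothetical K-means sketch (not the Microsoft Clustering algorithm).

    cluster_count caps the number of clusters; clusters that end up with
    fewer than min_support cases are dropped (an assumption of this sketch).
    Expects at least one point.
    """
    k = min(cluster_count, len(points))
    # Farthest-first seeding: start from the first point, then repeatedly
    # add the point that is farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        groups = _assign(points, centroids)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [_mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    # Support filter on the final assignment.
    groups = _assign(points, centroids)
    return [c for c, g in zip(centroids, groups) if len(g) >= min_support]
```

With two well-separated groups plus one isolated point, `cluster_count=3` finds three clusters, and raising `min_support` to 2 removes the single-case cluster.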
The new model now appears in the Mining Models tab of Data Mining Designer. This model, built with the Microsoft Clustering algorithm, groups customers with similar characteristics into clusters and predicts bike buying for each cluster. Although you can modify the column usage and properties for the new model, no changes to the STM-Clustering model are necessary.
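The idea of predicting bike buying per cluster can be sketched in a few lines of Python: once every case carries a cluster label, the prediction for a cluster is its most frequent Bike Buyer state, with that state's share as the probability. This is a simplified assumption for illustration; the `cluster_predictions` function and its input format are hypothetical, and the Microsoft Clustering algorithm may weight cases differently (for example, when cases belong to clusters probabilistically).

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def cluster_predictions(
    cases: Iterable[Tuple[Hashable, str]],
) -> Dict[Hashable, Tuple[str, float]]:
    """Map each cluster id to (most likely Bike Buyer state, its probability).

    `cases` is a hypothetical list of (cluster_id, bike_buyer_state) pairs.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for cluster_id, bike_buyer in cases:
        counts[cluster_id][bike_buyer] += 1
    predictions = {}
    for cluster_id, states in counts.items():
        # The predicted state is the most frequent one inside the cluster.
        state, n = states.most_common(1)[0]
        predictions[cluster_id] = (state, n / sum(states.values()))
    return predictions
```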
In Data Mining Designer, you can process the mining structure, a specific mining model that is associated with the structure, or the structure together with all of its associated models.
The Process Progress dialog box opens to display the details of model processing. Model processing might take some time, depending on your computer.
The Microsoft Clustering algorithm groups cases into clusters that contain similar characteristics. These groupings are useful for exploring data, identifying anomalies in the data, and creating predictions.
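One way to see how clusters help identify anomalies: a case that lies far from every cluster center does not fit any group well. The Python sketch below scores cases that way. It assumes numeric features and already known cluster centers, and both the `flag_anomalies` helper and the distance threshold are hypothetical choices for illustration, not the method the Microsoft Clustering algorithm uses.

```python
import math
from typing import List, Sequence, Tuple


def flag_anomalies(points: List[Sequence[float]],
                   centroids: List[Sequence[float]],
                   threshold: float) -> List[Tuple[int, float]]:
    """Return (index, distance) for every point whose distance to the
    nearest centroid exceeds `threshold` (a hypothetical cut-off)."""
    flagged = []
    for index, p in enumerate(points):
        # Distance to the closest cluster center; large values mean the
        # case does not fit any cluster well.
        nearest = min(math.dist(p, c) for c in centroids)
        if nearest > threshold:
            flagged.append((index, nearest))
    return flagged
```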
The default shading variable is Population, but you can change it to any attribute in the model to discover which clusters contain members that have the attributes you are interested in.
A tooltip displays the percentage of cases that have the attribute Bike Buyer = yes.
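The shading described above amounts to one number per cluster. A minimal Python sketch, assuming each case is a dictionary with a hypothetical `Cluster` key: for Population the value is the cluster's share of all cases, and for an attribute such as Bike Buyer it is the share of the cluster's cases that have the chosen state (the tooltip's percentage, expressed as a fraction). The `shading_values` helper is hypothetical, and how the viewer maps these values to colors is not modeled.

```python
from collections import Counter
from typing import Dict, Hashable, List, Mapping, Optional


def shading_values(cases: List[Mapping[str, object]],
                   variable: str = "Population",
                   state: Optional[object] = None,
                   cluster_key: str = "Cluster") -> Dict[Hashable, float]:
    """Per-cluster value used to shade a cluster diagram (hypothetical helper).

    For "Population" the value is the cluster's share of all cases; for any
    other attribute it is the share of the cluster's cases in `state`.
    """
    sizes = Counter(case[cluster_key] for case in cases)
    if variable == "Population":
        total = len(cases)
        return {cid: n / total for cid, n in sizes.items()}
    hits = Counter(case[cluster_key] for case in cases if case.get(variable) == state)
    return {cid: hits[cid] / n for cid, n in sizes.items()}
```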
When you select a cluster, the lines that connect it to other clusters are highlighted, so that you can easily see all of that cluster's relationships. When no cluster is selected, the darkness of the lines shows how strong the relationships are among all the clusters in the diagram. If the shading is light or nonexistent, the clusters are not very similar.
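A rough way to reproduce the idea of line darkness is to score every pair of clusters by how similar their attribute distributions are. The Python sketch below averages the overlap of discrete distributions over the attributes two clusters share; this similarity measure, the profile format, and the `link_strengths` helper are assumptions for illustration, not the formula the Cluster Diagram uses.

```python
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical profile format: attribute -> state -> probability within a cluster.
Profile = Dict[str, Dict[str, float]]


def overlap(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Overlap of two discrete distributions: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(p.get(s, 0.0), q.get(s, 0.0)) for s in set(p) | set(q))


def link_strengths(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], float]:
    """Average overlap over shared attributes for every pair of clusters."""
    strengths = {}
    for a, b in combinations(sorted(profiles), 2):
        shared = set(profiles[a]) & set(profiles[b])
        total = sum(overlap(profiles[a][x], profiles[b][x]) for x in shared)
        strengths[(a, b)] = total / len(shared) if shared else 0.0
    return strengths
```

A darker line would correspond to a higher score; pairs near 0 would get little or no shading.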
The Cluster Profiles tab contains a column for each cluster in the model. The first column lists the attributes that are associated with at least one cluster. The rest of the viewer contains the distribution of the states of an attribute for each cluster.
The distribution of a discrete variable is shown as a colored bar, and the maximum number of bars displayed is set in the Histogram bars list.
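As a sketch of what each cell of the profile holds: the share of each state of a discrete attribute within one cluster, capped at a maximum number of bars. The Python below is illustrative only; the function names are hypothetical, and folding the least frequent states into an `Other` bar when the cap is exceeded is an assumption about how such a cap could be applied.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Mapping


def state_histogram(values: List[Hashable], max_bars: int = 4) -> Dict[Hashable, float]:
    """Share of each state; if there are more than max_bars states, keep the
    max_bars - 1 most frequent and merge the rest into an "Other" bar."""
    counts = Counter(values)
    total = len(values)
    if len(counts) <= max_bars:
        kept = counts.most_common()
    else:
        kept = counts.most_common(max_bars - 1)
    hist = {state: n / total for state, n in kept}
    other = total - sum(n for _, n in kept)
    if other:
        hist["Other"] = other / total
    return hist


def cluster_profiles(cases: List[Mapping[str, Hashable]], attribute: str,
                     max_bars: int = 4,
                     cluster_key: str = "Cluster") -> Dict[Hashable, Dict[Hashable, float]]:
    """One histogram of `attribute` per cluster, like one row of the profile grid."""
    by_cluster: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for case in cases:
        by_cluster[case[cluster_key]].append(case[attribute])
    return {cid: state_histogram(vals, max_bars) for cid, vals in by_cluster.items()}
```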
The Variables column is sorted in order of importance for the Bike Buyers High cluster. Scroll through the column and review the characteristics of this cluster.
For example, the customers in this cluster are more likely to be in the age groups between 46 and 55 or between 56 and 65.
The Mining Legend displays a more detailed view, in which you can see the age range of these customers as well as their mean age.
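The two ideas in this part of the walkthrough, ordering attribute states by their importance for one cluster and summarizing a continuous attribute such as age, can be sketched in Python. Both helpers are hypothetical: the importance score used here (how much more frequent a state is inside the cluster than across all cases) is a stand-in chosen for illustration, not the measure the viewer uses, and cluster membership is treated as exclusive, which is a simplification.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, List, Mapping, Tuple

Case = Mapping[str, object]


def rank_by_importance(cases: List[Case], cluster_id: Hashable, attributes: List[str],
                       cluster_key: str = "Cluster") -> List[Tuple[str, object, float]]:
    """Rank (attribute, state) pairs by how much more common the state is
    inside the cluster than across all cases (hypothetical score)."""
    members = [c for c in cases if c[cluster_key] == cluster_id]
    if not members:
        return []
    scores = []
    for attr in attributes:
        overall = Counter(c[attr] for c in cases)
        inside = Counter(c[attr] for c in members)
        for state, n in inside.items():
            gap = n / len(members) - overall[state] / len(cases)
            scores.append((attr, state, gap))
    # Largest gap first; ties broken by name so the order is reproducible.
    scores.sort(key=lambda t: (-t[2], t[0], str(t[1])))
    return scores


def continuous_summary(cases: List[Case], cluster_id: Hashable, attribute: str = "Age",
                       cluster_key: str = "Cluster") -> dict:
    """Range and mean of a numeric attribute among one cluster's cases."""
    values = [c[attribute] for c in cases if c[cluster_key] == cluster_id]
    if not values:
        raise ValueError(f"no cases in cluster {cluster_id!r}")
    return {"min": min(values), "max": max(values), "mean": mean(values)}
```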
With the Cluster Characteristics tab, you can examine in more detail the characteristics that make up a cluster. Instead of comparing the characteristics of all of the clusters, you can explore one cluster at a time.
For example, if you select Bike Buyers High from the Cluster list, you can see the characteristics of the customers in this cluster. Although the display differs from the Cluster Profiles viewer, the findings are the same.
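What the Cluster Characteristics tab lists can be approximated as the probability of each attribute state among one cluster's members, sorted from most to least probable. In the Python sketch below, the input format, the `cluster_characteristics` name, and the `min_probability` cut-off are hypothetical, and cluster membership is treated as exclusive.

```python
from collections import Counter
from typing import Hashable, List, Mapping, Tuple


def cluster_characteristics(cases: List[Mapping[str, Hashable]], cluster_id: Hashable,
                            attributes: List[str], min_probability: float = 0.0,
                            cluster_key: str = "Cluster") -> List[Tuple[str, Hashable, float]]:
    """List (attribute, state, probability) for one cluster, most probable first."""
    members = [c for c in cases if c[cluster_key] == cluster_id]
    rows = []
    for attr in attributes:
        for state, n in Counter(c[attr] for c in members).items():
            probability = n / len(members)
            if probability >= min_probability:
                rows.append((attr, state, probability))
    # Most probable first; ties broken by name so the order is reproducible.
    rows.sort(key=lambda r: (-r[2], r[0], str(r[1])))
    return rows
```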
With the Cluster Discrimination tab, you can explore the characteristics that distinguish one cluster from another. After you select two clusters, one from the Cluster 1 list and one from the Cluster 2 list, the viewer calculates the differences between the clusters and displays a list of the attributes that distinguish them most.
Click the Variables column heading to sort the attributes alphabetically.
Some of the more substantial differences between the customers in the Bike Buyers Low and Bike Buyers High clusters include Age Group (age between 56 and 65), Month of Sales (March and December), and Distance to Sales Office (less than 5 km).
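The comparison performed on the Cluster Discrimination tab can be sketched as a per-state difference in probability between the two selected clusters, ranked by size, with the sign telling which cluster the state favors. The Python below is a simplified illustration: the scoring (a plain difference of probabilities), the `discriminate` helper, and its input format are assumptions, not the measure the viewer computes.

```python
from collections import Counter
from typing import Hashable, List, Mapping, Tuple


def discriminate(cases: List[Mapping[str, Hashable]], cluster_1: Hashable,
                 cluster_2: Hashable, attributes: List[str], top: int = 10,
                 cluster_key: str = "Cluster") -> List[Tuple[str, Hashable, Hashable, float]]:
    """Return (attribute, state, favored cluster, score) for the states whose
    probability differs most between two clusters."""
    g1 = [c for c in cases if c[cluster_key] == cluster_1]
    g2 = [c for c in cases if c[cluster_key] == cluster_2]
    if not g1 or not g2:
        raise ValueError("both clusters need at least one case")
    rows = []
    for attr in attributes:
        p1 = Counter(c[attr] for c in g1)
        p2 = Counter(c[attr] for c in g2)
        for state in set(p1) | set(p2):
            diff = p1[state] / len(g1) - p2[state] / len(g2)
            if diff != 0:
                # A positive difference favors cluster 1, a negative one cluster 2.
                favors = cluster_1 if diff > 0 else cluster_2
                rows.append((attr, state, favors, abs(diff)))
    # Largest difference first; ties broken by name so the order is reproducible.
    rows.sort(key=lambda r: (-r[3], r[0], str(r[1])))
    return rows[:top]
```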