Analysis Services enables you to analyze large quantities of data. With it, you can design, create, and manage multidimensional structures that contain detail and aggregated data from multiple data sources. An Analysis Services database also contains mining structures and mining models, and supplemental objects such as data sources and data source views.
Business Intelligence Development Studio is the primary environment for data warehouse designer that will use to develop business solutions that include Analysis Services, Integration Services, and Reporting Services projects. Each project type supplies templates for creating the objects required for business intelligence solutions, and provides a variety of designers, tools, and wizards to work with the objects.
To create new OLAP cubes, designers use Business Intelligence Development Studio. Business Intelligence Development Studio is the environment that you use to develop OLAP cubes and data mining models.
You create Analysis Services projects in Business Intelligence Development Studio by using the Analysis Services Projects template. The Analysis Services project helps you organize and manage the items that are required to build, debug and deploy an Analysis Services database.
The prospective customers addresses came from a recent marketing campaign in Bike Magazine where AdventureBikes Germany invited the readers to enter a raffle drawing to win a 7-day adventure tripp to North America.
The design surface in Data Tools is specialized for each object that you work with in Analysis Services. For example, there is a designer for working with data mining models, named Data Mining Designer, and another designer for working with cubes, named Cube Designer.
The Solution Explorer, located to the right of the design surface, provides a mechanism for navigating between design surfaces and managing the items in a project. Business Intelligence Development Studio also contains a deployment window that displays the progress of a deployment, and a properties window, enabling you to change the properties of selected objects.
When you are working with an Analysis Services project, you must build and deploy the project to an Analysis Services instance to instantiate the objects. Processing the objects in an instance of Analysis Services copies the data from the underlying data sources into the cube objects.
A data source is a data connection that is saved and managed in your project and deployed to your Microsoft SQL Server Analysis Services database. The data source contains the names of the server and database where your source data resides, in addition to any other required connection properties.
The new data source, Adventure Bikes CRM DataMart appears in the Data Sources folder in Solution Explorer.
Projects in Analysis Services are designed based on a logical data model of related tables, views, and SQL queries from one or more data sources. This logical data model is called a data source view. A data source view contains the metadata from selected data source objects, including any relationships between these objects defined in the underlying data source or defined in the data source view.
The new data source view opens in the Customer DataSets.dsv
We take a look at the data sets.
Scrolling through the data, Susan notices that there is a lot of interesting information which can be used for analyzing customer sales.
The marketing department subscribe to the Nielson's data marketplace where they can download sociology-demographic and market research data about different aspects. One aspect is geographical data. They receive a monthly data-set that contains data about cities. The cities are identified by the ZIP code, which is used in postal addresses. From the ZIP code, the creator of the Actual Customer DataSet - perhaps the data warehouse team - has enriched the customer data with.
Have a look at the Nielsons ZIP Data. You will find a lot of useful information about the places where your customers live.
Wouldn't it make sense to create a customer age group.
Next, we want to identify all customers who bought a bicycle. If you look on the data, you will notice, that you can derive this information from the product category column.
Finally, we want to connect the Nielson ZIP Data to the DataSet Actual Customer. This is an easy job, because both, the ZIP code and the ZIP city name is available in both tables.
Next, we manually create a relationship between the Actual Customer Data and the Nielson's ZIP data.
Your data set should look like this.
By investigating the attributes of previous bike buyers, Susan hopes to discover patterns that they can then apply to potential customers.
In this task Susan will create the initial mining structure, based on the Microsoft Decision Trees algorithm.
The Microsoft Decision Trees algorithm is an algorithm that incorporates different methods for creating a tree.
The first step in creating a targeted mailing scenario is to use the Data Mining Wizard in Data Tools to create a new mining structure and decision tree mining model.
You can click Browse to view the tables in the data source view and then click Close to return to the wizard.
You will use the DataSet Prospective Customer table later for testing; ignore it for now.
Verify that the checkbox in the Key column is selected in the Customer ID row.
Select the checkboxes in the Input column in the following rows. These attributes will be used by the data mining algorithmn.
On the far left column - the structure columns - of the page, also select the check boxes in the following rows.
The Microsoft Decision Trees algorithm supports modeling of both discrete and continuous attributes.
Typically, the wizard will detect numbers and assign an appropriate numeric data type, but there are many scenarios where you might want to handle a number as text instead. For example, the GeographyKey should be handled as text, because it would be inappropriate to perform mathematical operations on this identifier.
Separating data into training and testing sets is an important part of evaluating data mining models. Typically, when you partition a data set into a training set and testing set, most of the data is used for training, and a smaller portion of the data is used for testing. Analysis Services randomly samples the data to help ensure that the testing and training partitions are similar. By using similar data for training and testing, you can minimize the effects of data discrepancies and better understand the characteristics of the model.
After a model has been processed by using the training set, you test the model by making predictions against the test set. Because the data in the testing set already contains known values for the attribute that you want to predict, it is easy to determine whether the model's guesses are correct.
Drill-through can be enabled on models and on structures. The check-box in this window enables drill-through on the named model and enables you to retrieve detailed information from the model cases that were used to train the model.
In Data Mining Designer, you can process a mining structure, a specific mining model that is associated with a mining structure, or the structure and all the models that are associated with that structure. For this task, we will process the structure and the model at the same time.
If you made changes to the structure, you will be prompted to build and deploy the project before processing the models.
The Process Progress dialog box opens to display the details of model processing. Model processing might take some time, depending on your computer.
After the models in your project are processed, you can explore them in Business Intelligence Development Studio to look for interesting trends. Because the results of mining models are complex and can be difficult to understand in a raw format, visually investigating the data is often the easiest way to understand the rules and relationships that the algorithms have discovered within the data. Exploring also helps you to understand the behavior of the model and discover which model performs best before you deploy it.
Each model you created is listed in the Mining Model Viewer tab in Data Mining Designer. Each algorithm that you used to build a model in Analysis Services returns a different type of result. Therefore, Analysis Services provides a separate viewer for each algorithm.
Analysis Services also provides a generic viewer that works for all model types. The Generic Content Tree Viewer displays detailed model content information that varies depending on the algorithm that was used.
On the Decision Tree tab, you can examine all the tree models that make up a mining model.
By default, the designer opens to the first model that was added to the structure – in this case, Decision-Tree.
By default, the Microsoft Tree Viewer shows only the first three levels of the tree. If the tree contains fewer than three levels, the viewer shows only the existing levels. You can view more levels by using the Show Level slider or the Default Expansion list.
By changing the Background setting, you can quickly see the number of cases in each node that have the target value of “yes” for the attribute Bike Buyer Flag. Remember that in this particular scenario, each case represents a customer. The value “yes” indicates that the customer previously purchased a bicycle; the value “no” indicates that the customer has not purchased a bicycle.
Place your cursor over the node labeled All. A tool tip will display the following information:
Alternately, place your cursor over any node in the tree to see the condition that is required to reach that node from the node that comes before it. You can also view this same information in the Mining Legend.
The histogram is displayed as a thin horizontal bar across the node and represents the distribution of customers in this age range who previously did (pink) and did not (blue) purchase a bike. The Viewer shows us that customers with an age under 25 living in an Very Bike-Friendly Region or Middle Bike-Friendly Region are likely to purchase a bicycle. If you open the Middle Bike-Friendly Region node you can note, that the sales is also determined by the landscape.
Because you enabled drill-through when he created the structure and model, he can retrieve detailed information from the model cases and mining structure, including those columns that were not included in the mining model (e.g., City name, Address, FirstName).
The details for each training case are displayed in spreadsheet format. These details come from the DataSet Actual Customers view that you selected as the case table when building the mining structure.
The same spreadsheet displays with the structure columns appended to the end.