It is important to understand the data mining process. This hands-ons consists of two steps.
In the Customer Relationship Management system, you will find a data set about our customers, the customer orders and the data about the ZIP codes.
034
Please note, that there is only information about the customer. Right click on Explore data to see some sample data.
035
We need to add additional fields about social democatics about the zip codes to the data.
This is a great thing. You can use the entire language of SQL and make really good things with it.
A named query lets you extend the relational schema of existing tables without modifying the underlying data source. For example, a named query can be used to join multiple database tables from a data source into a single data source view table.
In the free space of the query definition editor, select Add Tables.
Named Query - Login failed for user
Add:
Upps! You cannot mark any column. Bug or feature? That’s how it’s done (Workaround):
Voila! Now the column names are there.
Build two joins. Just drag-and-drop the column names for the actual customer table to the orders table.
1. Join: V_CRM_ACTUAL_CUSTOMER.CUSTOMER_ID -> V_CRM_CUSTOMER_ORDERS.CUSTOMER_ID
and
2. Join: * V_CRM_ACTUAL_CUSTOMER.AddressZIP -> V_CRM_ZIP_CODES.ZIP, and * V_CRM_ACTUAL_CUSTOMER.AddressCity -> V_CRM_ZIP_CODES.City
Now we have nearly created the data basis. What’s missing?
036
We want to identify the Bike Buyer. How do we do that?
We create a flag BikeBuyerFlag as a new field in the Named Query.
Do not just copy-paste the html text into the query editor. Use notepad to convert html to plain text.
Use BikeBuyerFlag as the alias name for this new column.
How to add the BikeBuyerFlag to the Data Source
Now we are ready.
Let’s go with Data Mining!!!
037
You will now train, test, and explore a data mining models. Then, you are ready to use the models to identify the people most likely to respond to the new targeted mailing campaign.
Let us create a new mining structure.
The Data Mining Wizard starts.
The method we want to use to define the mining structure is from existing relational database.
There is only one data source view available. Select it.
The Specify the Training Data – window is very important. Select
The next question is, which attributes determines a sales of a bike?
Let us discuss this together…
If you don’t want to wait, select
In the Specify Columns‘ Content and Data Type Window, it is important to set the correct Content Type.
It is important, that the BikeBuyerFlag is a discrete data type.
Before you complete the wizard, name the Mining structure and Mining model.
Before you deploy your mining model on the server, check under Project → Properties if the deployment server is set to NO MDB
Let’s go! Run Build → Deploy.
You should get
To analyse your data in the Mining Model Viewer, set the background to yes. Study the decision tree. Click on a dark box and read the legend. Do you understand it?
Use the clustering algorithm to have a second algorithm.
…
Analyse it.
Then, in the Input Selection Window, set the Prediction Value to yes.
In the Mining Accuracy Chart, look at the Lift Chart. Do you understand it?
Close Visual Studio.
038
Idea: get the data from the CRM system, apply the profiling on it, calculate the probability and write the data back to CRM.
We use Integration Services to define a data flow.
Steps:
In Connection Manager, create an OLEDB Connection to the CRM system. We get the prospective customers from there.
Create a new SSIS Package, or rename the default package to Lead_Generation.dtsx
Create a Data Flow Task and name it Lead Generation.
Tipp: if the SSIS Toolbox does not show up, you can display it via Menu → SSIS → SSIS Toolbox.
Add an OLE DB Source to get the prospects from the CRM system.
Next, we add a new column BikeBuyerFlag to our prospective data. The BikeBuyerFlag is used to set the prediction and your username will help us later to identify your data.
Next, we will add the attributes from the ZIP data to the data set.
Create a connection to the ODS and select the object that contains the ZIP codes. In the columns window, create a relationship between ZIP and City and select all attributes starting with ZIP .
Apply Data Mining to your Data Set
Next, we have to apply the data mining transformation to our data.
Add a Data Mining Query to your data flow.
In the input selection window, select lookup match output.
Open the Data Mining Query Transformation.
The connection should point to your aCRM_Lead_Generation project.
Create a new connection.
Select your mining structure and the mining model.
Next, we have to build a mining query.
Next, we want to calculate the probability that indicates the percent value that a prospect is likely to buy a bike.
Next, we have to check the mapping between the mining model and the input model.
Click on any line between the models and select Modify Mapping . The mapping should connect
and
How to write data back to CRM? A good CRM or ERP system contains retractors (versus extractors). Retractor programs insert or update data in a system. The hide the complexity of a data struction.
Our retractor is called crm_retractor_leads.
It required to following parameter:
Add an OLEDB command transformation and name it CRM retractor Leads.
In the connection manager window, select your CRM system under “connection manager”
In the Component Properties, set the SQLCommand to
The ? is used as a parameter place holder.
In the column mapping, map
Run the package.
Login to the CRM system and check your leads.
Open Customer Relationship Management → SD - Your Leads.
Select ZIP-Codes that start with a 7…
Click on Predict BikeBuyer.
The prediction may run a few seconds. Have a look at the results. The probability column shows the probability that a prospective customer is likely to buy a bike.
040