User Tools

Site Tools


bicn01:game_etl



The ETL Process

Guideline: Load Data from ODS into the Data Mart

Microsoft Integration Services is a platform for building enterprise-level data integration and data transformations solutions. You use Integration Services to solve complex business problems by copying or downloading files, sending e-mail messages in response to events, updating data warehouses, cleaning and mining data, and managing SQL Server objects and data. The packages can work alone or in concert with other packages to address complex business needs. Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.

Integration Services includes a rich set of built-in tasks and transformations; tools for constructing packages; and the Integration Services service for running and managing packages. You can use the graphical Integration Services tools to create solutions without writing a single line of code; or you can program the extensive Integration Services object model to create packages programmatically and code custom tasks and other package objects.

Start Microsoft Visual Studio Data Tools.

Create a new IntegrationServices Project.

:!: In the location field, select the desktop as the project location.

Name it NO SID LoadDataMart

The ETL designer occurs. You will see a solution explorer, a list with ETL tools and an empty window which will serve as a graphical designer for the data flows.

Create Database Connections

Create a connection to your DataMart

In the solution explorer, right click on Connection ManagersNew Connection Manager

From the list of connection types search for OLEDB and click on Add

  • Server name: NO RDB
  • Use SQL Server identification
  • User name: NO SID
  • Password: NO PAS
  • Database: select the database from the drop-down menu: pick NO SID_DATAMART

Rename the connection to DATAMART.conmgr.

Create a connection to the DataStore

Click on Connection Managers → New Connection Manager

The correct Connection Type is OLEDB and click Add → New

  • Server name: dwh.hdm-server.eu
  • Use SQL Server identification
  • Username: NO ODS
  • Password: NO ODS
  • Database: select it from the dropdown menu: Pick AdventureBikes_ODS_NO GID

Rename the connection to DATASTORE.conmgr.

Save your project.

042

Create an SSIS package to load the Data

Next, we create the data flow for the dimension tables.

Right-Click on SSIS Package → New SSIS package.

  • Rename the SSIS Package: LoadDimensions.dtsx

Load Calendar

From the SSIS Toolbox, drag and drop the DataFlow Task to the Control Flow window.

  • Rename the Data Flow Task to Load Calendar.
The Control Flow contains functions in order to control the data flow. Example: send an email, loop through files, get data with ftp.

Double-click to edit the data flow task.

The functions in the SSIS toolbox have changed. Drag and drop the OLE DB Source to the Data Flow Window.

  • Double click to edit the data source parameters
  • Type the following connection parameters to the OLE DB Source Editor:
    • Name of the Connection Manager: DataStore
    • Name of the table: V_MDS_CALENDAR_MONTH

:-A No Connection in Connection Manager found

  • Preview the data
  • Add an OLE DB Destination
  • Connect the OLE DB Source with the OLE DB Destination

  • Add the following connection parameters to the OLE DB Destination Editor:
    • Name of the Connection Manager: DATAMART
    • Name of the table: D_TIME
  • In the Mappings option, connect the following objects

:-A Could not retrieve the table information

Input Target
ID_CALMONTH ID_Time
CALENDAR_MONTH_ISO Calendar_Month_ISO
CALENDAR_QUARTER Calendar_Quarter
CALENDAR_YEAR Calendar_Year
  • Start the data flow. Does it work?

:-A Invalid character value for cast specification

Open SQL Server Management Studio, connect to NO RDB with user NO SID and password NO PAS and have a look at the calendar data (Databases → NO SID → Tables → D_Time → Select Top 1000 rows). You will notice, there is data in the dimension table D_TIME

  • Switch back to SSIS and run the ETL package again.
  • To run it again, click on stop debugging and click on run

  • Did you get an Error Message?
  • To read the error message, klick on Progress, and copy the error message into notepad.

You are correct: duplicate Data rows!

To avoid the message, we will empty the table before we load the data.

  • Add an execute SQL Task before your Control Flow.

  • Right-click to edit the SQL Task.
  • Add the data mart connection into connection parameter and add the following SQL statement:
DELETE FROM D_TIME

  • Start the package again.

Did it work?

Save your project.

Load Store

Create a new data flow task in the Control Flow and rename it to “Load Store”. Decide, if you want tor run the data flow task after the calendar load or in parallel.

  • Add an OLEDB source. The source table: V_MDS_STORE
  • Add an OLEDB destination. The destination table: D_STORE
  • Map the column fields.
Input Target
ID_STOREID_Store
SALES_OFFICEStore_Name
ADDRESSStore_Address
LATITUDEStore_Latitude
LONGITUDEStore_Longitute
STORE_MANAGERStore_Manager
SALES_REGIONSales_Region
COUNTRYSales_Country

  • Add the delete statement to the sql task. Click into the SQL Statement. A little icon occurs at the end of the field. If you click on it, a text editor occurs which allows you to enhance the SQL statement in a comfortable way.
DELETE FROM D_STORE
  • Execute the control flow.

041

Save your project.

Load Campaign

Create a new data flow task in the Control Flow and rename it to Load Campaign. Decide, if you want tor run the data flow task after another load or in parallel.

  • Add an OLEDB source. The source tabe: V_MDS_CAMPAIGN
  • Add an OLEDB destination. The destination table: D_CAMPAIGN
Input Destination
ID_CAMPAIGN ID_Campaign
CAMPAIGN_NAME Campaign_Name
CAMPAIGN_TYPE Campaign_Type
CAMPAIGN_DESCR Campaign_Description
  • Add the delete statement to the SQL Statement
DELETE FROM D_CAMPAIGN
  • Execute the package

Save your project.

Load Employee

Create a new data flow task in the Control Flow and rename it to “Load Employee”. Decide, if you want tor run the data flow task after another load or in parallel.

  • Add an OLEDB source. The source tabe: V_MDS_EMPLOYEE
  • Add an OLEDB destination. The destination table: D_EMPLOYEE
Input Destination
ID_EMPLOYEE ID_Employee
EMP_NUMBER Employee_Number
EMP_POSITION Employee_Position
EMP_NAME Employee_Name
  • Add the delete statement to the SQL Statement
DELETE FROM D_EMPLOYEE
  • Execute the package.

Save your project.

Load Product

Create a new data flow task in the Control Flow and rename it to “Load Product”. Decide, if you want tor run the data flow task after another load or in parallel.

  • Add an OLEDB source. The source tabe: V_MDS_MATERIAL
  • Add an OLEDB destination. The destination table: D_PRODUCT
Source Destination
ID_MATERIAL ID_Product
MATERIAL_NUMBER Material Number
PRODUCT_CATEGORY Product_Category
PRICE_SEGMENT Price_Seqment
MATERIAL_DESC Material_Description
SALES_PRICE_EUR Material_Sales_price
TRANSFER_PRICE Material_Transfer_Price
  • Add the delete statement to the SQL Statement
DELETE FROM D_PRODUCT
  • Execute the package.

Save your project.

Load Cost Record

Create a new data flow task in the Control Flow and rename it to “Load Cost Record”. Decide, if you want tor run the data flow task after another load or in parallel.

  • Add an OLEDB source. The source tabe: V_MDS_COST RECORD
  • Add an OLEDB destination. The destination table: D_COST_RECORD
Input Destination
ID_COST_RECORD ID_Cost_Record
COST_CATEGORY Cost_Category
COMMENT Cost_Description
  • Add the delete statement to the SQL Statement
DELETE FROM D_COST_RECORD
  • Execute the package .

The final data flow package should look like this.

Save your project.

Load Fact Tables

In the control flow window, create a new data flow task. Rename it to “Load Monthly Sales”.

Connect the data flow task to the existing data flows.

  • Add an OLEDB source. The source tabe: V_ODS_MONTHLY_SALES
  • Add an OLEDB destination. The destination table: F_MONTHLY_SALES
  • Try to map the input to the destination column.

You may notice, that there is not column for Gross Profit.

So, we have to calculate the Gross Profit

  • Add the transformation Derived Column and connect it from the OLEDB Source.
  • Rename it to Calculate GrossProfit.

  • Right click to Edit.
  • Derived Column Name: GROSS_PROFIT
  • Derived Column: <add as new column>
  • Expression: REVENUE - TRANSFER_PRICE_EU - DISCOUNT_IN_EUR
  • Exit the Derived Column Transformation.
  • Add an OLEDB destination. The destination table: F_MONTHLY_SALES
  • Connect the Derived Column Transformation to the OLEDB destination
  • Check, if the mapping of the input field to the destination fields are okay.

Add the delete statement to the SQL Task.

  DELETE FROM F_MONTHLY_SALES
  • run the package.

In the control flow window, create a new data flow task and connect it to the Load Monthly Sales task. Rename it to “Load Monthly Costs”.

  • Add an OLEDB source. The source tabe: V_ODS_MONTHLY_COSTS
  • Add an OLEDB destination. The destination table: F_MONTHLY_COSTS
  • Connect the source and the destination.
  • Check, if the mapping is correct.

Add the delete statement to the SQL Task.

  DELETE FROM F_MONTHLY_COSTS
  • run the package.

In the control flow window, create a new data flow task and connect it to the Load Monthly Sales task. Rename it to “Load Monthly Marketing.

  • Add an OLEDB source. The source tabe: V_ODS_MONTHLY_MARKETING
  • Add an OLEDB destination. The destination table: F_MONTHLY_MARKETING
  • Connect the source and the destination.
  • Check, if the mapping is correct.

Add the delete statement to the SQL Task.

  DELETE FROM F_MONTHLY_MARKETING
  • run the package.

In the control flow window, create a new data flow task and connect it to the Load Monthly Sales task. Rename it to “Load Monthly Stock.

  • Add an OLEDB source. The source tabe: V_ODS_MONTHLY_STOCK_BOOKINGS
  • Add an OLEDB destination. The destination table: F_MONTHLY_STOCK
  • Connect the source and the destination.
  • Check, if the mapping is correct.

Add the delete statement to the SQL Task.

  DELETE FROM F_MONTHLY_STOCK
  • run the package.

Save your project.

25.11.2019 Use SSMS instead of VS Object Explorer

bicn01/game_etl.txt · Last modified: 2019/11/25 09:10 by admin