Structured Data Pipeline

Objective

The primary objective of this project is to gain practical skills in building a data pipeline. The deliverable is a data pipeline over relational data, built with Azure's database service and Data Factory. I will be using the AdventureWorks data set published by Microsoft. Here's a notebook that contains all of the CREATE TABLE and CREATE VIEW queries.

Approach

Step 1: Create a new Schema

The new schema holds the newly created views and data models.

Step 2: Create three views

Three views were created from the tables in the existing "SalesLT" schema.
Each view uses a different combination of the available tables.
The views act as the source of the data pipeline.
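
To illustrate what "a view built from a combination of tables" looks like, here is a minimal sketch. It runs against an in-memory SQLite database rather than Azure, so there is no schema prefix, and the table, column, and view names (customer, sales_order, vw_customer_orders) are simplified stand-ins I made up for the example, not the actual objects in this project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for two source tables (columns are illustrative).
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        total_due   REAL
    );
    INSERT INTO customer VALUES (1, 'Gee'), (2, 'Harris');
    INSERT INTO sales_order VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

    -- Hypothetical view joining the two tables; it plays the pipeline source.
    CREATE VIEW vw_customer_orders AS
    SELECT c.customer_id, c.last_name, o.order_id, o.total_due
    FROM customer AS c
    JOIN sales_order AS o ON o.customer_id = c.customer_id;
""")

# Reading from the view returns one row per order, with customer details attached.
view_rows = conn.execute(
    "SELECT * FROM vw_customer_orders ORDER BY order_id"
).fetchall()
print(view_rows)
```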

Step 3: Create three data models

Three data models were created in the form of three empty tables, one for each view.
The tables act as the sink of the data pipeline.
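
As a rough sketch of this step, the snippet below creates an empty sink table whose columns line up with a source view, then checks that the column names match. It uses an in-memory SQLite database as a local stand-in for Azure, and the names vw_customers and model_customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE VIEW vw_customers AS SELECT customer_id, last_name FROM customer;

    -- Hypothetical sink table: same columns as the view, created empty.
    CREATE TABLE model_customers (
        customer_id INTEGER PRIMARY KEY,
        last_name   TEXT
    );
""")

def column_names(name):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # It works for views as well as tables.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({name})")]

source_columns = column_names("vw_customers")
sink_columns = column_names("model_customers")
sink_row_count = conn.execute("SELECT COUNT(*) FROM model_customers").fetchone()[0]
print(source_columns == sink_columns, sink_row_count)
```

Checking the column lists up front is a cheap way to catch a mismatch between a view and its table before any copy is attempted.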

Step 4: Copy data from the views into the newly modeled tables

Using Azure Data Factory, the data in each view was copied into its corresponding modeled table.
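
The copy itself was configured in Data Factory, not written by hand. Purely to show the effect of a view-to-table copy, here is a small local sketch that does the equivalent with a plain INSERT ... SELECT against an in-memory SQLite database. This is not how Data Factory is configured, and the names vw_customers and model_customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, last_name TEXT);
    INSERT INTO customer VALUES (1, 'Gee'), (2, 'Harris'), (3, 'Carreras');

    -- Source view and empty sink table (both names are illustrative).
    CREATE VIEW vw_customers AS SELECT customer_id, last_name FROM customer;
    CREATE TABLE model_customers (customer_id INTEGER PRIMARY KEY, last_name TEXT);
""")

# Copy every row from the view into the table in a single transaction,
# so a failure part-way through leaves the sink table unchanged.
with conn:
    cursor = conn.execute(
        "INSERT INTO model_customers (customer_id, last_name) "
        "SELECT customer_id, last_name FROM vw_customers"
    )
    copied_rows = cursor.rowcount

print(copied_rows)
```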

Step 5: Validate that all records were copied correctly

A simple SELECT COUNT(*) SQL query was run against each view and its matching table to verify that all records were copied.
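
The count check can be sketched as a small helper that runs SELECT COUNT(*) on a view and on its table and compares the two numbers. The example runs on an in-memory SQLite database as a stand-in for Azure; compare_counts and the object names are hypothetical.

```python
import sqlite3

def compare_counts(conn, source, sink):
    """Return (source_count, sink_count) for a view and its matching table."""
    # Object names cannot be bound as query parameters, so pass trusted names only.
    source_count = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    sink_count = conn.execute(f"SELECT COUNT(*) FROM {sink}").fetchone()[0]
    return source_count, sink_count

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, last_name TEXT);
    INSERT INTO customer VALUES (1, 'Gee'), (2, 'Harris');
    CREATE VIEW vw_customers AS SELECT customer_id, last_name FROM customer;
    CREATE TABLE model_customers (customer_id INTEGER PRIMARY KEY, last_name TEXT);
    INSERT INTO model_customers SELECT customer_id, last_name FROM vw_customers;
""")

result = compare_counts(conn, "vw_customers", "model_customers")
print(result, "match" if result[0] == result[1] else "MISMATCH")
```

Matching counts show that no rows were dropped or duplicated, but they do not prove that the values in each row are identical. A stricter check would compare the rows themselves.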

Check out the project's GitHub Repo

Semi-Structured Data Pipeline