Datawarehousing Concepts Basics (Fact and Dimension Table) - YouTube

Channel: Tech Coach

[0]
HI guys my name is Vivek and today I will be explaining you the data warehousing concepts
[5]
in a simplified manner Before we go ahead let’s understand what
[9]
do we mean by a data warehouse ? Data warehouse is a system which is used for
[15]
reporting and data analysis.
[17]
From this data analysis we derive some logical conclusions that helps us in improving our
[23]
businesses.
[24]
We will be explaining the entire concept with a simple example today.
[29]
Mark who is the CEO of an ecommerce website is willing to increase his profit by streamlining
[35]
the process
[36]
This is the basic model of seller point Seller points has 3 components customers,
[44]
products and courier services
[45]
So these customers can buy these products and these products are shipped with these
[51]
Courier Services So let’s take an example Vivek can buy a
[55]
shoe from boot makers and that will be shipped by speed fast.
[62]
Mark is analyzing the data for last one month to identify which products or which customer
[70]
has made money for him and what are the reasons why he has incurred loss.
[75]
So based on that data analysis he will come up with some logical conclusions to improve
[81]
his business line.
[83]
Let’s analyze the data as I just mentioned we have three entities customers products
[90]
and careers and these are the details of the item that have been sold say over the last
[96]
one month.
[97]
So if you see we have certain components that have been highlighted in red these are the
[102]
items that have been returned an item that has been returned has made no profit for the
[107]
company rather it has incurred losses and the reason being an item that was returned
[115]
means it was shipped and then it was sent back
[119]
That means it didn’t make any money for the company,so what are the reasons why these
[124]
items were returned For the 1st 2 items the customer said he didn’t
[129]
like the product, Vivek is the customer who bought a shirt and
[132]
jeans and he said that he didn't like the product also note that the Delivery status
[139]
for this particular jeans was delayed.
[143]
The other two items, the reason for return were “The items were damaged”
[149]
These items were shipped by batman couriers, That means 2 items that were delivered by
[155]
batman couriers were damaged also if you see 2 items that were delivered by local couriers
[160]
were delayed.
[163]
Based on this we can come up with some logical conclusions like
[167]
Few items were delivered late Few products were not sold at all
[171]
Vivek returned 2 out of his 3 items And batman couriers delivered 2 broken items
[178]
out of 3 Obviously mark can come up with logical conclusions
[183]
with the data we just saw
[184]
He can remove the batman couriers services He can remove the unsold product line with
[190]
new similar products He should warn the local couriers service
[193]
to expedite their courier process He should remove cash on delivery for vivek
[198]
as he is returning majority of his items And he should give an additional discount
[203]
to neeraj on his next purchase as his previous experience was not smooth.
[209]
This will save him the name of the company because neeraj will not give him a bad review
[213]
in case if he gives him an additional discount.
[218]
If we want to understand the entire concept in terms of data warehousing it’s very simple
[225]
The 3 entities customers, products and courier services are dimensions
[230]
And the transaction table that we saw is the fact table.
[235]
The fact table has keys of all the dimensions and some additional columns we call measures
[243]
A fact table has the primary key of all the associated dimensions and the measures.
[250]
So measures are used to evaluate the profit and loss game
[253]
For example here the measure we can calculate which product made me how much money
[262]
Dimensions are the true driving table as they provide descriptive information for the measurements
[267]
recording in the fact The vital thing to remember in a data warehouse
[271]
is that the dimensions will always be loaded first.
[278]
Because you can’t sell a product that doesn’t exist.
[281]
Also if you see dimensions are relatively small in size because a customer can buy multiple
[288]
products.
[289]
Which means vivek he can buy 10 items and that will be 10 entries in the fact table
[296]
but he still is a single customer
[298]
In a data warehouse 80% of the data usually is consumed by a fact table
[305]
As I already explained the objective of data warehouse is to leverage the data for precise
[310]
and effective decision making . Also the data in warehouse is persistent.
[315]
If I try to explain the data flow of data for reporting, First of all the data enters
[323]
in the OLTP system from there it goes to the integration layer between OLTP and the data
[329]
warehouse which is ods (Operational data store).
[333]
From the data warehouse the data goes to data mart where the reporting is usually done.
[339]
So we still have some open questions
[341]
What is an OLTP system ? What is ODS?
[345]
What is data mart?
[346]
Why the reporting is not usually done on OLTP or ODS and is done on warehouse or data mart?
[354]
What kind of reporting tools are being used ?
[356]
All these answers I will give in my next tutorial