Correlation and Regression Analysis: Simplest Way To Learn With Examples | Get Complete Clarity - YouTube
Channel: unknown
[0]
Hello Friends,
In the last video, on scatter plots, we
[3]
saw how to detect a relationship between two
or more variables by drawing a scatter plot and
[9]
looking at its trend.
[11]
But only obvious relationships can be detected
easily by looking at the graph.
[16]
In some cases, it is not possible to judge
the relationship between two variables only
[21]
by looking at the graph.
[22]
But the good thing is, we can comment on
the relationship between these variables using
[27]
statistics.
[28]
In this video, I am going to explain how to use statistics
to determine the relationship between
[34]
these variables, and how to predict the
response of one variable if we know the value
[38]
of the other variable.
[40]
So let’s begin……
[42]
Correlation:
The term correlation is a combination of
[45]
two words, 'co' (together) and 'relation'
(connection), and describes a connection between two quantities.
[51]
Two variables are correlated when, in a study
of the two, a change in one variable
[56]
is accompanied by a corresponding change in the other
variable, in the same or the opposite direction.
[60]
Otherwise, the variables are said to be uncorrelated:
movement in one variable is not
[68]
accompanied by movement of the other variable in
any specific direction.
[72]
Correlation is a statistical measure of
the strength of the connection between pairs
[77]
of variables.
[79]
Correlation can be positive or negative.
[82]
When the two variables move in the same direction,
i.e. an increase in one variable is accompanied
[87]
by a corresponding increase in the other variable
and vice versa, the variables are considered
[92]
to be positively correlated.
[94]
For example: profit and investment.
[98]
On the contrary, when the two variables move
in opposite directions, so that
[103]
an increase in one variable is accompanied by
a decrease in the other variable and vice versa,
[108]
the situation is known as negative correlation.
[111]
For example: Price and demand of a product.
[116]
Correlation Analysis
[118]
The degree of association is measured by
a correlation coefficient, denoted by r.
[123]
It is sometimes called Pearson's correlation
coefficient, after its originator, and is a
[129]
measure of linear association.
[130]
If a curved line is needed to express the
relationship, other, more complicated measures
[136]
of correlation must be used.
[138]
The correlation coefficient is measured on
a scale that varies from +1 through 0 to
[144]
-1.
[145]
Perfect correlation between two variables
is expressed by either +1 or -1.
[150]
When one variable increases as the other increases
the correlation is positive; when one decreases
[155]
as the other increases it is negative.
[160]
Complete absence of correlation is represented
by 0.
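For anyone who wants to see what sits behind the number r, here is a minimal Python sketch that computes Pearson's correlation coefficient from its definition. The function name `pearson_r` and the small data lists are made up for illustration; they are not the data used in this video.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of cross-deviations divided by the square root
    of the product of the two sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y move together around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: how much each variable moves on its own
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only (hypothetical, not from the video)
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # y rises with x: r = 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))  # y falls as x rises: r = -1.0
print(pearson_r(x, [1, 2, 3, 2, 1]))   # y rises then falls: r = 0.0
```

The third case is worth noticing: y clearly depends on x, but the relationship is curved, so this measure of linear association comes out as 0.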
[163]
This figure gives some graphical representations
of correlation.
[169]
Calculating Correlation Coefficient
This coefficient is easily calculated using
[174]
Microsoft Excel or Minitab.
[180]
Correlation Coefficient by Excel
There are two methods to calculate the correlation
[185]
coefficient in Excel.
[186]
Method 1:
Let’s take the earlier example of Temperature
[190]
of the day vs Ice cream sales.
[193]
To determine the correlation between these
two variables, use the function CORREL.
[197]
Type =CORREL( and select the first variable's
column of values, then type a comma.
[202]
After that, select the second variable's column, close the bracket and
press Enter.
[207]
We will get a correlation coefficient
of 0.96.
[210]
It means there is a strong positive relationship between
the variables.
[215]
Method 2:
Let’s continue with the same example.
[219]
To determine the relationship between these two
variables, use the Analysis ToolPak add-in
[224]
in Excel, which quickly generates correlation coefficients
between multiple variables, and execute the
[231]
following steps.
[233]
1.
[234]
On the Data tab, in the Analysis group, click
Data Analysis.
[239]
2.
[240]
Select Correlation and click OK.
[243]
3.
[245]
Select the range A1:B13 as the Input Range,
check Labels in first row, and choose an output
[251]
range anywhere in the sheet, or the new worksheet
option, and click OK.
[255]
4.
[256]
We will get the final results as below.
They show the same value of the correlation coefficient,
[262]
i.e. 0.96, which indicates a strong relationship between
the variables.
[269]
Regression
We have seen that correlation describes the
[272]
strength of an association between two variables
and is completely symmetrical: the correlation
[278]
between A and B is the same as the correlation
between B and A.
[283]
However, if the two variables are related,
it means that when one changes by a certain
[288]
amount, the other changes on average by
a certain amount.
[292]
If y represents the dependent variable and
x the independent variable, this relationship
[298]
is described as the regression of y on x.
[302]
The relationship can be represented by a simple
equation called the regression equation.
[308]
In this context "regression" simply means
that the average value of y is a "function"
[312]
of x, that is, it changes with x.
[316]
The regression equation representing how much
y changes with any given change of x, can
[322]
be used to construct a regression line on
a scatter diagram, and in the simplest case
[327]
this is assumed to be a straight line.
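To make the straight-line case concrete, here is a minimal Python sketch of an ordinary least-squares fit, which is the standard way to choose the regression line. The function name `fit_line` and the data are hypothetical and chosen so the result is easy to check by hand; they are not the ice cream data.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, relative to how x varies alone
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical points lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

Note that swapping `xs` and `ys` gives a different line in general, so unlike correlation, the regression of y on x is not symmetrical.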
[329]
I am not going into further detail, as we can
easily find it using simple software like
[335]
Microsoft Excel.
[336]
Let’s continue with the same example of
Temperature of the day and ice cream sale
[341]
to understand regression analysis in Excel
and how to interpret the Summary Output.
[347]
1.
[349]
On the Data tab, in the Analysis group, click
Data Analysis.
[354]
2.
[356]
Select Regression and click OK.
[358]
3.
[360]
Select the Y Range (B1:B13).
[362]
This is the response variable (also called
the dependent variable).
[367]
4.
[368]
Select the X Range (A1:A13).
[371]
This is the explanatory variable (also
called the independent variable).
[375]
If there are several explanatory variables, their columns must be adjacent to each other.
[378]
5.
[380]
Check Labels.
6.
[381]
Click in the Output Range box and select cell
A16.
[386]
(You can select anywhere in the sheet as well
as in a separate worksheet)
[389]
7.
[390]
Check Residuals.
8.
[392]
Click OK.
[395]
Excel produces the following Summary Output
(rounded to 3 decimal places).
[401]
R Square
R Square equals 0.917, which is a very good
[406]
fit.
[407]
92% of the variation in Ice cream sales is
explained by the Temperature of the day.
[412]
The closer to 1, the better the regression
line fits the data.
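As a rough illustration of what R Square measures, the Python sketch below computes it as one minus the ratio of unexplained variation to total variation. The function name `r_squared` and the numbers are hypothetical; the predictions come from the least-squares line y = 0.5 + 1.6x for x = 1 to 4, not from the video's data.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by the fitted line."""
    mean_y = sum(actual) / len(actual)
    # Variation left over after the fit (squared residuals)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the actual values around their mean
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical values: observed y, and predictions from y = 0.5 + 1.6x
actual = [2, 4, 5, 7]
predicted = [2.1, 3.7, 5.3, 6.9]
print(round(r_squared(actual, predicted), 3))  # 0.985
```

With a single explanatory variable, R Square is the square of the correlation coefficient r, which is why an r of about 0.96 and an R Square of 0.917 go together (0.96 is itself a rounded value).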
[418]
Significance F and P-values
To check if your results are reliable (statistically
[424]
significant), look at Significance F (0.000).
[428]
If this value is less than 0.05, you're OK.
[433]
If Significance F is greater than 0.05, it's
probably better to stop using this set of
[439]
independent variables.
[441]
Delete a variable with a high P-value (greater
than 0.05) and rerun the regression until
[447]
Significance F drops below 0.05.
[449]
Most or all P-values should be below 0.05.
[452]
In our example, this is the case
[456]
(0.015 and 0.000).
[462]
Coefficients
The regression line is: y = Ice cream sales
[466]
= -159.474 + 30.088 * Temperature of the day.
[475]
In other words, for each unit increase in
Temperature, Ice cream sales increase on average by 30.088
[481]
units.
[482]
This is valuable information.
[484]
You can also use these coefficients to do
a forecast.
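A forecast is just the regression equation evaluated at a new temperature. The Python sketch below uses the two coefficients from the Summary Output; the function name `predict_sales` and the temperature of 20 are assumptions chosen for illustration.

```python
# Coefficients as reported in the Summary Output (rounded to 3 decimals)
INTERCEPT = -159.474
SLOPE = 30.088

def predict_sales(temperature):
    """Predicted ice cream sales for a given temperature of the day."""
    return INTERCEPT + SLOPE * temperature

# Hypothetical temperature, not taken from the video's data
print(round(predict_sales(20.0), 3))  # 442.286
```

Such a forecast is most trustworthy for temperatures inside the range that was actually observed; far outside it, the straight line may no longer hold.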
[488]
Residuals
The residuals show you how far away the actual
[492]
data points are from the predicted data points
(using the equation).
[496]
For example, the first data point equals 215.
[500]
Using the equation, the predicted data point
equals
[503]
-159.474 + 30.088 * 14.2 = 267.773, giving a
residual of 215 - 267.773 = -52.773.
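The same residual can be checked in a few lines of Python, taking the residual as the actual value minus the predicted value. The function name `residual` is made up for this sketch. With the rounded coefficients the prediction comes out near 267.776 rather than 267.773; the small gap presumably arises because Excel works with unrounded coefficients, which is an assumption here.

```python
# Coefficients as reported in the Summary Output (rounded to 3 decimals)
INTERCEPT = -159.474
SLOPE = 30.088

def residual(temperature, actual_sales):
    """Residual = actual value minus the value predicted by the line."""
    predicted = INTERCEPT + SLOPE * temperature
    return actual_sales - predicted

# First data point from the video: temperature 14.2, sales 215
print(round(residual(14.2, 215), 3))  # -52.776
```

A negative residual means the actual sales were lower than the line predicts for that temperature.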
[515]
Conclusion:
With the above discussion, it is evident,
[528]
that there is a big difference between these
two mathematical concepts, although these
[532]
two are studied together.
[534]
Correlation is used when the researcher wants
to know whether the variables under study
[539]
are correlated and, if so, what
the strength of their association is.
[545]
In regression analysis, a functional relationship
between two variables is established so as
[550]
to make future projections
[572]
on events.