Correlation and Regression Analysis: Simplest Way To Learn With Examples | Get Complete Clarity - YouTube

Channel: unknown

[0]
Hello friends. In the last video, on scatter plots, we
[3]
saw how to detect a relationship between two or more variables by drawing a scatter plot and
[9]
looking at its trend.
[11]
But only obvious relationships can be detected easily by looking at the graph.
[16]
In some cases, it is not possible to judge the relationship between two variables only by
[21]
looking at the graph.
[22]
The good thing is that we can quantify the relationship between these variables using
[27]
statistics.
[28]
In this video, I am going to explain how to use statistics to determine the relationship between
[34]
these variables, and how to predict the response of one variable when we know the value
[38]
of another variable.
[40]
So let’s begin.
[42]
Correlation: The term correlation is a combination of
[45]
two words, ‘co’ (together) and ‘relation’ (a connection between two quantities).
[51]
Two variables are correlated when, over the period of study, a change in one variable
[56]
is accompanied by a corresponding change in the other, in the same or the opposite direction.
[60]
Otherwise, the variables are said to be uncorrelated: a movement in one variable is not
[68]
accompanied by a movement of the other variable in any specific direction.
[72]
Correlation is a statistical technique that measures the strength of the connection between pairs
[77]
of variables.
[79]
Correlation can be positive or negative.
[82]
When the two variables move in the same direction, i.e. an increase in one variable is accompanied
[87]
by a corresponding increase in the other variable and vice versa, the variables are considered
[92]
to be positively correlated.
[94]
For example: profit and investment.
[98]
On the contrary, when the two variables move in opposite directions, in such a way that
[103]
an increase in one variable is accompanied by a decrease in the other variable and vice versa,
[108]
this situation is known as negative correlation.
[111]
For example: Price and demand of a product.
[116]
Correlation Analysis
[118]
The degree of association is measured by a correlation coefficient, denoted by r.
[123]
It is sometimes called Pearson's correlation coefficient, after its originator, and is a
[129]
measure of linear association.
[130]
If a curved line is needed to express the relationship, other, more complicated measures
[136]
of correlation must be used.
[138]
The correlation coefficient is measured on a scale that varies from +1 through 0 to
[144]
-1.
[145]
Complete correlation between two variables is expressed by either +1 or -1.
[150]
When one variable increases as the other increases the correlation is positive; when one decreases
[155]
as the other increases it is negative.
[160]
Complete absence of correlation is represented by 0.
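To make the scale of r concrete, here is a minimal Python sketch of Pearson's correlation coefficient. The function name `pearson_r` and the small data lists are hypothetical illustrations, not the data used in the video.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (numerator).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (denominator terms).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical illustration data (not the video's data set).
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves exactly with x
falling = [10, 8, 6, 4, 2]   # moves exactly against x

print(pearson_r(x, rising))   # complete positive correlation
print(pearson_r(x, falling))  # complete negative correlation
```

Real data rarely fall exactly on a line, so in practice r lies somewhere between these two extremes.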
[163]
This figure gives some graphical representations of correlation.
[169]
Calculating the Correlation Coefficient: This coefficient is easily calculated using
[174]
Microsoft Excel or Minitab.
[180]
Correlation Coefficient in Excel: There are two methods to calculate the correlation
[185]
coefficient in Excel.
[186]
Method 1: Let’s take the earlier example of temperature
[190]
of the day vs. ice cream sales.
[193]
To determine the correlation between these two variables, use the function CORREL.
[197]
Type =CORREL( and select the values in the first variable's column, then type a comma.
[202]
After that, select the second variable's column, close the parenthesis and press Enter.
[207]
We get a correlation coefficient of 0.96.
[210]
It means there is a strong positive linear relationship between the variables.
[215]
Method 2: Let’s continue with the same example.
[219]
To determine the relationship between these two variables, use the Analysis ToolPak add-in
[224]
in Excel, which quickly generates correlation coefficients between multiple variables. Execute the
[231]
following steps.
[233]
1.
[234]
On the Data tab, in the Analysis group, click Data Analysis.
[239]
2.
[240]
Select Correlation and click OK.
[243]
3.
[245]
Select the range A1:B13 as the Input Range, check Labels in First Row, select an output
[251]
range anywhere in the sheet (or the new worksheet option) and click OK.
[255]
4.
[256]
We get the final result shown below: it gives the same value of the correlation coefficient,
[262]
i.e. 0.96, which indicates a strong relationship between the variables.
[269]
Regression: We have seen that correlation describes the
[272]
strength of an association between two variables and is completely symmetrical: the correlation
[278]
between A and B is the same as the correlation between B and A.
[283]
However, if the two variables are related, it means that when one changes by a certain
[288]
amount, the other changes on average by a certain amount.
[292]
If y represents the dependent variable and x the independent variable, this relationship
[298]
is described as the regression of y on x.
[302]
The relationship can be represented by a simple equation called the regression equation.
[308]
In this context "regression" simply means that the average value of y is a "function"
[312]
of x, that is, it changes with x.
[316]
The regression equation representing how much y changes with any given change of x, can
[322]
be used to construct a regression line on a scatter diagram, and in the simplest case
[327]
this is assumed to be a straight line.
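As a rough sketch of how such a straight line can be found, the following Python snippet uses ordinary least squares, which is the usual method for simple linear regression. The function name `fit_line` and the data are hypothetical illustrations, not the video's data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical illustration data (not the video's data set).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
print(f"y = {intercept:.3f} + {slope:.3f} * x")
```

The slope is the average change in y for a one-unit change in x, which is exactly what the regression equation is meant to express.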
[329]
I am not going into further detail, as we can easily find it using simple software like
[335]
Microsoft Excel.
[336]
Let’s continue with the same example of temperature of the day and ice cream sales
[341]
to understand regression analysis in Excel and how to interpret the Summary Output.
[347]
1.
[349]
On the Data tab, in the Analysis group, click Data Analysis.
[354]
2.
[356]
Select Regression and click OK.
[358]
3.
[360]
Select the Y Range (B1:B13).
[362]
This is the response variable (also called the dependent variable).
[367]
4.
[368]
Select the X Range (A1:A13).
[371]
These are the explanatory variables (also called independent variables).
[375]
If there are several explanatory variables, their columns must be adjacent to each other.
[378]
5.
[380]
Check Labels.
6.
[381]
Click in the Output Range box and select cell A16.
[386]
(You can select anywhere in the sheet as well as in a separate worksheet)
[389]
7.
[390]
Check Residuals.
8.
[392]
Click OK.
[395]
Excel produces the following Summary Output (rounded to 3 decimal places).
[401]
R Square: R Square equals 0.917, which is a very good
[406]
fit.
[407]
92% of the variation in Ice cream sales is explained by the Temperature of the day.
[412]
The closer R Square is to 1, the better the regression line fits the data.
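To show what "variation explained" means, here is a small Python sketch that computes R Square as 1 minus the ratio of unexplained variation to total variation. The function name `r_squared` and the numbers are hypothetical illustrations, not the video's data; the predicted values are assumed to come from a line y = 2.2 + 0.6x fitted to x = 1..5.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions explain."""
    mean_y = sum(actual) / len(actual)
    # Unexplained variation: squared distances between actual and predicted values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation: squared distances between actual values and their mean.
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical illustration data (not the video's data set).
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # assumed fitted values from y = 2.2 + 0.6x

r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

When the line passes through every point, the unexplained variation is zero and R Square equals 1.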
[418]
Significance F and P-values: To check if your results are reliable (statistically
[424]
significant), look at Significance F (0.000).
[428]
If this value is less than 0.05, you're OK.
[433]
If Significance F is greater than 0.05, it's probably better to stop using this set of
[439]
independent variables.
[441]
Delete a variable with a high P-value (greater than 0.05) and rerun the regression until
[447]
Significance F drops below 0.05.
[449]
Most or all P-values should be below 0.05.
[452]
In our example, this is the case
[456]
(0.015 and 0.000).
[462]
Coefficients: The regression line is: y = Ice cream sales
[466]
= -159.474 + 30.088 * Temperature of the day.
[475]
In other words, for each unit increase in temperature, ice cream sales increase on average by 30.088
[481]
units.
[482]
This is valuable information.
[484]
You can also use these coefficients to do a forecast.
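A forecast with these coefficients can be sketched in Python as follows. The intercept and slope are the rounded values from the Summary Output; the function name `predict_sales` and the temperature of 20.0 are hypothetical choices for illustration.

```python
# Coefficients from the Summary Output (rounded to 3 decimal places).
INTERCEPT = -159.474
SLOPE = 30.088

def predict_sales(temperature):
    """Predicted ice cream sales for a given temperature of the day."""
    return INTERCEPT + SLOPE * temperature

# Hypothetical temperature, chosen only to illustrate a forecast.
forecast = predict_sales(20.0)
print(round(forecast, 3))
```

Forecasts are most trustworthy for temperatures within the range of the original data; far outside it, the straight-line assumption may no longer hold.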
[488]
Residuals: The residuals show you how far away the actual
[492]
data points are from the predicted data points (using the equation).
[496]
For example, the first data point has a temperature of 14.2 and actual sales of 215.
[500]
Using the equation, the predicted data point equals
[503]
-159.474 + 30.088 * 14.2 = 267.773, giving a residual of 215 - 267.773 = -52.773.
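The residual calculation for the first data point can be reproduced with a few lines of Python. This is only a sketch: it uses the rounded coefficients, so the result differs from the quoted figures in the third decimal place, presumably because Excel works with unrounded coefficients.

```python
# Rounded coefficients from the Summary Output.
intercept = -159.474
slope = 30.088

# First data point from the example: temperature 14.2, actual sales 215.
temperature = 14.2
actual = 215

predicted = intercept + slope * temperature
# Residual = actual value minus predicted value.
residual = actual - predicted

print(round(predicted, 2), round(residual, 2))
```

A negative residual means the actual sales were lower than the regression line predicted for that temperature.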
[515]
Conclusion: From the above discussion, it is evident
[528]
that there is a big difference between these two statistical concepts, although the
[532]
two are studied together.
[534]
Correlation is used when the researcher wants to know whether the variables under study
[539]
are correlated and, if so, how strong their association is.
[545]
In regression analysis, a functional relationship between two variables is established so as
[550]
to make future projections
[572]
on events.