How to Calculate and Interpret a Correlation (Pearson's r) - YouTube

[0]
okay now let's take a look at calculating a correlation coefficient
[4]
suppose we have the following values with scores for five people on the
[9]
variables x and y and it's a small example because we're going to calculate
[13]
this by hand so I want to keep it relatively simple here we have for
[17]
example the first person has a score of 1 on X and a score of 6 on Y the second
[23]
person 2 on X 4 on Y and so on so we have 5 different people here and the 5
[30]
people produced a score on both x and y so let's go ahead and calculate the
[36]
correlation or Pearson's r on this example the first thing we want to do
[41]
though is state our null and alternative hypotheses so the null hypothesis states
[46]
this is called rho here it's the population correlation so rho XY equals
[52]
0 or in other words the correlation between x and y in the population equals
[58]
0 and that means that there is no
[61]
relationship between x and y in the population and once again this means rho
[67]
and the alternative hypothesis is really just the opposite of that notice
[70]
everything's the same except equals for null not equals for alternative so the
[76]
alternative hypothesis is that the population correlation between x and y
[80]
is not equal to zero as you can see here and this is really saying that there is
[87]
a relationship between x and y in the population it could be positive
[90]
relationship or a negative relationship or in other words we're conducting a
[95]
two-tailed test here in terms of the calculations first we'll begin with
[100]
calculating the mean of X and the mean of Y so to find the mean of X and Y
[105]
starting with X we're just going to add all of these values together and we're
[111]
going to divide by the number of values that there are so 1 plus 2 plus 3 plus
[115]
4 plus 5 divided by 5 total gives us a mean of 3 for X and then we'll do the
[122]
same for y so we add all these values and divide by 5 and that gives us a mean
[127]
of 4 okay next we need to calculate what are known as deviation scores for each
[134]
variable deviation scores subtract the mean from each value they're called
[140]
deviation scores because they indicate how far each value deviates or departs
[146]
from the mean so let's take a look at that now you know there's a lot of
[152]
information here on the screen but let me walk you through it it's really not
[156]
that bad recall that the mean of X was 3 and the
[160]
mean of Y was 4 so here are my X values and all I'm doing here notice the 3 for
[166]
the mean is taking the x value and subtracting the mean from it as I
[171]
mentioned a few moments ago so here we're just going to go 1 minus 3 so you
[176]
see that here 1 minus 3 is equal to negative 2 then we take 2 minus 3 our
[183]
mean again equal to negative 1 3 minus 3 0 4 minus the mean of 3 gives us 1 and 5
[192]
minus the mean of 3 gives us 2 an interesting thing here if your
[196]
calculations are correct these deviation scores should all add up
[201]
to zero so notice here we have negative 2
[204]
negative 1 that's negative 3 0 we can ignore that then we have positive 3 so
[210]
negative 3 plus positive 3 is 0 notice how these all add up to 0 and that's by
[216]
definition because the mean of 3 here is a balance point in the distribution and
[222]
it always balances out the deviation scores below the mean with the deviation
[228]
scores above the mean so this should equal 0 when you add these up if it does
[233]
not that means that an error was made in the calculations okay so now let's find
[238]
the deviation scores for y so once again we just take the value for Y and
[243]
subtract the mean which is 4 in this case from each value so 6 minus 4 is 2 4
[251]
minus 4 is 0 5 minus 4 is 1 3 minus 4 is negative 1 and 2 minus 4 is negative 2
[259]
now let's take a look at this we have positive 3 negative 3 notice how those
[265]
add up to 0 again so that looks good ok so that's it for the deviation scores so
[272]
next what we need to do is we need to square each of these values we need to
[276]
square the deviation scores and the reason for that is it gets rid of the
[279]
negative numbers remember how when I showed you that the deviation scores
[283]
always sum or add up to 0 we can't really do much with it if our answer is
[288]
0 so when we square them that gets rid of the negative values and then later
[292]
we'll take care of that square by taking the square root at the end I'll show you
[297]
that later but for now step 3 is squaring the deviation scores so these
[304]
were our deviation scores from earlier that we calculated so all we're doing
[308]
now for X is squaring each one and that gives us these values and for y we're
[314]
squaring each of those deviation scores and we get these values here okay so
[318]
that's done we squared all those now all we want to do is add up those squared
[324]
values now technically speaking we do call this the sum
[328]
of the squared deviation scores and you may see this in your text or in other
[333]
places usually short-handed as SS or sum of the squares it's called as
[340]
shorthand okay so sum of the squares or SS is equal to adding up all of the
[346]
squared deviation scores so that's what we'll do here so we have our squared
[352]
deviation scores here for X and for y so all we do now is just add those up and
[358]
SS or sum of the squares for X when we add these values together gives us 10
[364]
and SS for Y also gives us 10 now it's not always the case when you calculate a
[371]
correlation that SS X and SS Y will be equal to each other that just happened
[376]
to occur in this example so don't expect that you have to see these as equal they
[381]
absolutely do not have to be equal whatsoever but they are on occasion okay
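The steps so far (means, deviation scores, squared deviations, and SS) can be sketched in a few lines of Python. This is only an illustrative sketch: the Y scores are read off the worked deviation calculations above, and the function names `mean`, `deviation_scores`, and `sum_of_squares` are made up for this example rather than taken from the video.

```python
# Scores for the five people in the example (Y values taken from the
# deviation-score walkthrough: 6, 4, 5, 3, 2)
x = [1, 2, 3, 4, 5]
y = [6, 4, 5, 3, 2]


def mean(values):
    """Add the values together and divide by how many there are."""
    return sum(values) / len(values)


def deviation_scores(values):
    """Subtract the mean from each value."""
    m = mean(values)
    return [v - m for v in values]


def sum_of_squares(values):
    """SS: square each deviation score, then add the squares up."""
    return sum(d ** 2 for d in deviation_scores(values))


dev_x = deviation_scores(x)
dev_y = deviation_scores(y)

# Deviation scores should add up to zero; if not, an error was made.
assert sum(dev_x) == 0 and sum(dev_y) == 0

ss_x = sum_of_squares(x)
ss_y = sum_of_squares(y)
print(mean(x), mean(y))  # 3.0 4.0
print(ss_x, ss_y)        # 10.0 10.0
```

The `assert` line mirrors the check described earlier: if the deviation scores do not sum to zero, something went wrong in the arithmetic.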
[386]
so finally what we need to do is find what are called the cross products now
[391]
cross products just multiply that's the word product right multiply the two deviation
[398]
scores together so let's take a look at this next now when you first see this it
[404]
may look a little intimidating but let me walk you through it it's really not
[407]
that bad because we've done most of this already
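Since the on-screen table is not visible in a transcript, the idea of summing the cross products can be written compactly as below. The notation is an assumption added here for readability: the index i runs over the n = 5 people, and the bars denote the means of X and Y.

```latex
SP = \sum_{i=1}^{n} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)
```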
[410]
recall we found the deviation score for X right these values here and we found
[416]
the deviation scores for y we've already done that now all we do to find the sum
[421]
of the products is we multiply the deviation score for X by the deviation
[428]
score for y for a given person so here we have negative two times positive 2
[434]
you see that here this product is negative 4 multiplying the deviation
[439]
scores for the second person negative one times zero you see that here that's
[443]
zero third person 0 times 1 is 0 the fourth person one times negative one is
[450]
negative one and then finally for the last person two times negative two is
[455]
negative four now those are the cross products and
[459]
what we have to do is find the sum of them so we have to add them up okay and
[465]
that's what we do right here we have negative 4 0 0 negative 1 negative 4 so
[470]
that's what you see here all added together and that gives us the sum of
[473]
the products of negative 9 a very quick review we found the mean then we found
[480]
the deviation scores these here and then we squared them added them together and
[486]
then our last step here we found the cross products and then we summed those up
[492]
so we've found the sum of the products or SP so we have everything we need to
[497]
calculate a correlation so here's our formula for the
[501]
correlation coefficient or Pearson's r sum of the products divided by square
[505]
root of SS x times square root of SS y and we found all of these values so
[511]
we're ready to go recall that SS X and SS Y were both 10 in this example and the SP
[517]
was negative 9 so we'll just go ahead and plug these values in we have
[521]
negative 9 over square root of 10 times square root of 10 and that gives us when
[528]
we work it out an r of negative 0.9 okay so once again our value of Pearson's r
[535]
is negative 0.9 we can stop right here and just report that our r was negative
[541]
0.9 and we can be done however if we want to know whether this value is
[546]
statistically significant that is whether it's significantly different
[550]
from zero then we need to conduct a hypothesis test we'll do that next
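To check the hand calculation, the whole procedure can be sketched as one short Python function: sum of the products divided by the square root of SS X times the square root of SS Y. This is a minimal sketch, not the video's own code; the function name `pearson_r` is hypothetical, and the data are the five pairs of scores from the example.

```python
import math

# The five people's scores from the worked example
x = [1, 2, 3, 4, 5]
y = [6, 4, 5, 3, 2]


def pearson_r(xs, ys):
    """Pearson's r computed as SP / (sqrt(SS_x) * sqrt(SS_y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviation scores: each value minus its mean
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]

    # Sum of the cross products (SP) and sums of squares (SS)
    sp = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx ** 2 for dx in dev_x)
    ss_y = sum(dy ** 2 for dy in dev_y)

    return sp / (math.sqrt(ss_x) * math.sqrt(ss_y))


r = pearson_r(x, y)
# Rounded because square roots introduce tiny floating-point error
print(round(r, 4))  # -0.9
```

On Python 3.10 or later, `statistics.correlation(x, y)` from the standard library should give the same value and can serve as a cross-check.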