馃攳
Basic Excel Business Analytics #45: Covariance and Correlation to Measure Linear Relationship - YouTube
Channel: unknown
[0]
welcome to Highline bi 348 class video
[3]
number 45 if you want to download this
[6]
workbook bi 348 chapter 4 or the
[9]
PowerPoint click on the link below the
[12]
video yeah I'm gonna hit shift f5 and
[15]
start this PowerPoint now in last video
[17]
we talked about scatter plots one two
[20]
three to determine what type of
[22]
relationship there was between two
[24]
variables but this was a visual means
[27]
for determining whether we had positive
[30]
negative or no relationship now I'm
[33]
gonna jump ahead to slide number nine we
[36]
want to talk about a numerical measure
[39]
called covariance and correlation these
[41]
will be measures to investigate if
[43]
there's a relationship and these
[45]
numerical measures will be more precise
[47]
than that positive negative and no
[51]
relationship that we had with scatter
[53]
charts now next slide before we actually
[57]
look at the math and the formulas for
[60]
calculating covariance and correlation
[62]
we actually want to see how to create
[65]
this chart here and this chart will
[67]
involve yes plotting the scatter points
[70]
but we want to plot an X bar line and a
[75]
y bar line
[76]
now these plotted lines will help us
[80]
understand the math behind covariance
[83]
and correlation not only that but later
[86]
on the class will be plotting in
[87]
particular this y bar quite a lot so
[91]
we're gonna go over to excel and here we
[94]
have our data set weekly add expense x
[98]
weekly sells Y and just like we did in
[101]
last video we plotted in and we have all
[103]
of our markers out a certain X up a
[106]
certain Y now our goal is to plot the Y
[109]
bar line and the X bar now down here
[113]
I've already for X calculated the min
[117]
the max and look at this I've calculated
[122]
Y bar which this is just the average
[125]
from the Y column and I've listed it
[128]
twice why because we need one two
[133]
points to plot a line so this point will
[136]
go out 14,000 and then up to the Y bar
[141]
of about 370 that'll be one point and
[144]
then 64,000 700 that's the max up here
[149]
all the way up to Y of 370 that'll be
[152]
the second point and then we'll draw a
[154]
line all right so we come up to the
[156]
chart and we can right click select data
[160]
or we can go up to design select data
[164]
and lonely click on this button this is
[167]
the real power of charting because we're
[169]
allowed to and series of numbers edit
[172]
remove and change the horizontal axis so
[175]
I'm gonna click the Add button and our
[179]
series name when I click on Y bar
[182]
that'll be the name of the series or the
[184]
numbers inside our select data dialog
[187]
box that'll also show up in the legend
[190]
our X values we have our two x's tab and
[194]
we want to make sure in delete that
[195]
little array and then highlight our Y's
[198]
when I click OK click OK I have plotted
[203]
those two lines now notice they showed
[205]
up as two markers just like these
[207]
because that's what this chart is about
[209]
but we can change the chart type for
[212]
just one of the series I'm going to
[214]
click on the data point and right click
[217]
and I can point to change series chart
[219]
type or I can come up to change chart
[222]
type and look at this in 2013 and later
[225]
they list each one of the series the
[228]
lists of numbers and we can change just
[231]
that particular series so I'm gonna say
[233]
hey markers in line click OK now we want
[238]
to do the exact same thing in right
[240]
click select data add series name that's
[247]
X bar this time tab the X values both of
[252]
the X bars tab I'm going to delete that
[254]
and the two y's the min and the max when
[259]
I click OK click OK now we have our two
[263]
markers right click change series chart
[268]
x-bar and we want scatter with smooth
[272]
lines or markers bowl click OK and there
[276]
we have X bar and y bar now we want to
[280]
think about what these markers mean in
[283]
relation to X bar and y bar now Y bar is
[287]
370,000 and X bar is 40000 let's just
[290]
look at this point right here you can
[292]
see in the screen tip the little pop-up
[294]
there it says 57,000 for the x value and
[299]
731 thousand for the Y while compare
[302]
730,000 to the 370 that's a huge
[306]
positive deviation if we compare that
[309]
57,000 to the 40,000 for X bar that's an
[313]
X bar deviation positive that means that
[317]
particular point is above the Y bar and
[320]
above the X bar if we were to take both
[323]
of those deviations positive positive
[326]
and multiply them we get some big
[329]
positive number now let's think about a
[332]
number down here and this region number
[334]
3 you can see from the screen tip 17,000
[338]
for the X well if this is 40,000 and
[341]
this is 17,000 that's a huge minus
[345]
deviation now look at that screen -
[347]
110,000 for the Y Y bar is 370 if I'm
[352]
taking the particular value and
[353]
subtracting the Y bar that's a huge
[355]
negative deviation now think about this
[357]
if I take X bar deviation and Y bar
[361]
they're both negative negative and
[363]
multiply them guess what I get a huge
[366]
positive number so that's the idea
[368]
behind our first calculation covariance
[372]
we're gonna take all the deviations for
[373]
every single point multiply them and
[377]
then add them now for this particular
[379]
set of markers since most of the markers
[383]
are in Quadrant 1 and 3 that means when
[387]
we add them all up we're gonna get some
[388]
huge positive number
[390]
whereas in Quadrant 2 and 4 think about
[394]
that point right there we have a
[396]
negative deviation for X and a positive
[399]
deviation
[400]
for why so we're gonna get a bunch of
[402]
negative times positives we will get a
[404]
huge set of negative numbers over here
[407]
and negative numbers down here so when
[409]
the preponderance is in two and four and
[412]
we add all the products we're gonna get
[415]
some huge negative number now I want to
[418]
go over to the sheet covariance and
[420]
correlation now here's our formula for
[424]
sample covariance we're gonna multiply
[426]
in the numerator the deviation for x
[430]
times the deviation for y and then add
[432]
and then we're gonna divide by n minus
[435]
one now here's our data said here's the
[439]
weekly add expense and the weekly sales
[443]
this is the Y this is the X so we're
[445]
gonna calculate deviations and then
[447]
multiply them now I actually already did
[451]
the count of all of the records got a
[453]
hundred and nine n minus one our average
[457]
that's X bar Y bar
[459]
there's our sample standard deviation
[462]
and there's our samples deviation for
[464]
our Y's
[466]
alright so you're ready our deviations
[468]
hey equals and there's our particular X
[471]
minus our X bar there's our X bar I'm
[475]
gonna make sure an f4 to lock it
[477]
control-enter and copy it down now these
[481]
are actually deviations and this should
[483]
be familiar right because we did a lot
[485]
of this when we were calculating
[486]
standard deviation particular value
[489]
minus the X bar when we add up all of
[492]
our deviations alt equals of course
[495]
we're gonna get 0 all right so now we're
[498]
gonna calculate deviation for y there's
[500]
the particular Y minus our y bar right
[505]
there f4 to lock it ctrl enter and copy
[510]
it down when I add these up all 2 equals
[514]
hey those are deviations so the sum of
[515]
those are always 0 now we can multiply
[518]
equals I'm going to take X deviation
[521]
times y deviation control enter copy it
[526]
down and when I have these up alt equals
[530]
that's going to be the numerator
[533]
now we can come down here and actually
[535]
calculate covariance equals the sum of
[539]
all the products of the deviations
[542]
divided by and there's our N minus 1 and
[547]
when I hit enter
[548]
that's a big positive number so the
[551]
linear association between X and Y is
[557]
positive now we don't have to go through
[560]
all these steps manually to calculate it
[563]
because there's a built-in function but
[564]
certainly see in this chart and how
[568]
we're multiplying and the logic behind
[570]
that helps but once we understand that
[573]
we can simply use covariance and there's
[576]
a P for population and an S for sample
[579]
and it has array 1 or a 2 we've been
[582]
using the convention we're gonna put Y
[584]
first because some of our other linear
[586]
regression functions require that but
[588]
array want to you can put it in any
[591]
order and when I hit enter exactly the
[594]
same thing now covariance is great
[596]
except for there is one problem imagine
[600]
if instead of dollars up here we had
[602]
feet like someone's height or something
[605]
and we calculated covariance and we got
[607]
a number now imagine if we have the
[609]
exact same data points but they were in
[612]
inches we'd get a much larger covariance
[617]
even though the two data sets had the
[619]
same linear Association so there's a
[622]
problem with units for covariance with
[624]
this so guess what we're actually going
[626]
to take covariance put it in the
[628]
numerator and divide it by standard
[632]
deviation of x times standard deviation
[634]
of Y this is a way of standardizing it
[638]
in an essence getting covariance per
[640]
unit of standard deviation now why are
[644]
we multiplying what you can see up here
[645]
we multiplied the deviations right so
[647]
we're going to multiply these down here
[649]
and the beauty of this number is that it
[652]
will always come out between minus 1 and
[654]
1 if you had a perfect negative
[658]
correlation you'd have a line like this
[661]
and all of the points would be exactly
[664]
on the line if you are a perfect pie
[667]
of relationship you'd have that line
[669]
down the middle and all the points would
[671]
be exactly on the line so if we get a
[673]
negative number from correlation that's
[676]
close to minus one then we'll have a
[678]
strong negative relationship if it's
[681]
close to one it'll be a strong positive
[683]
relationship we could actually see a
[685]
picture of this over in our powerpoints
[688]
there you go our R comes out to zero
[692]
that means our data points in all four
[694]
quadrants if it came out exactly
[696]
negative one it would look like this
[698]
positive one it would look like this an
[700]
example of a strong positive
[702]
relationship would be something like
[704]
point eight nine an example of a strong
[707]
negative relationship would be something
[709]
like minus point eight nine and a
[711]
moderate direct relationship you can see
[714]
there's some dots but it looks like as x
[717]
increases Y is increasing we get
[719]
something like R equals point six so the
[723]
closer to minus one the better the
[726]
negative relationship the closer to one
[728]
the better the positive relationship now
[730]
let's go calculate this here's our
[732]
coefficient of correlation hey I'm just
[735]
gonna take my covariance I already
[738]
calculated and divide it by now I need
[741]
to multiply the standard deviation for x
[744]
and y so I'm going to use the product
[747]
function and simply highlight standard
[750]
deviation for x and y and wanna hit
[753]
enter point nine one seven so that's
[756]
pretty strong because it's close to one
[759]
it's a positive number so we know it's
[762]
direct now we don't have to do all the
[765]
steps for our calculation because
[767]
there's actually two functions equals P
[770]
e Pearson Pearson is named after the
[773]
statistician who invented this so they
[776]
named this function and we're gonna have
[778]
a ray one or a two I'm gonna put the Y's
[780]
comma and then the X's I'll get exactly
[784]
the same thing
[785]
not only is there Pearson but we have
[789]
Correll for correlation array one comma
[794]
R ring too
[795]
and there it is when I hit enter I get
[798]
exactly the same thing Pearson
[800]
or Correll now here we saw a positive
[805]
relationship I want to go over to the
[807]
sheet covariance and correlation -
[810]
actually this should be blue this should
[812]
be a blue sheet here's the example we
[815]
used earlier bike weight in pounds as
[818]
our X and price as our Y these are
[822]
racing bikes that we did our scatter
[825]
chart and we saw that it looked like a
[826]
negative or indirect relationship so if
[830]
we do covariance equal CoV and we do the
[834]
dot s for sample array 1 comma array 2
[841]
and so this will come out negative right
[844]
and there it is if we do coefficient of
[847]
correlation equals Pearson I'm gonna say
[851]
array 1 comma array 2 and there we go so
[857]
minus 0.897 the negative says it's
[862]
inverse or negative relationship that
[864]
means as X is increasing Y is decreasing
[868]
and it's close to minus 1 so that's
[871]
really strong now let's scroll over and
[874]
see one last examples here's the plot of
[878]
an XY scatter you could see quadrant one
[881]
two three four there's markers
[883]
everywhere so there's not going to be
[885]
much of anything equals covariance of
[888]
the sample array 1 comma array - and I'm
[894]
gonna cheat watch this I'm gonna
[895]
highlight in control-c because I'm gonna
[897]
use that again enter so that's very very
[901]
very close to zero not much linear
[904]
Association at all from Co variance
[906]
equals Pearson tab and I'm going to
[909]
control V when I hit enter look at that
[912]
absolutely almost zero there's no
[916]
correlation between these markers
[918]
there's no positive relationship or
[922]
negative relationship it's just all over
[924]
the map alright so in this video we saw
[930]
how to calculate covariance and
[933]
coefficient of
[934]
correlation and we even saw how to plot
[936]
our X&Y lines to visually look at where
[941]
the markers were in relation to the two
[944]
means all right now next video when we
[947]
come back we'll see how to create our
[949]
equation to make predictions by
[952]
calculating the slope and the
[954]
y-intercept all right we'll see you next
Most Recent Videos:
You can go back to the homepage right here: Homepage





