Bivariate relationship linearity, strength and direction | AP Statistics | Khan Academy - YouTube
Channel: Khan Academy
[0]
what we have here is six different
[2]
scatter plots that show the relationship
[4]
between different variables so for
[6]
example in this one here on the
[7]
horizontal axis we might have something
[9]
like age and then here could be accident
[12]
frequency
[14]
accident frequency and i'm just making
[17]
this up
[18]
and i could just show these data points
[20]
maybe for some kind of statistical
[22]
survey that when the age is this
[24]
whatever number this is maybe this is 20
[26]
years old this is the accident frequency
[28]
and it could be a number of accidents
[30]
per hundred and that when the age is 21
[33]
years old this is the frequency and so
[35]
these data scientists or statisticians
[37]
went and plotted all of these in this
[40]
scatter plot this is often known as
[42]
bivariate data which is a very fancy way
[44]
of saying hey you're plotting things
[47]
that take two variables into
[49]
consideration and you're trying to see
[50]
whether there's a pattern with how they
[53]
relate and what we're going to do in
[55]
this video is think about well can we
[58]
try to fit a line does it look like
[61]
there's a linear or non-linear
[63]
relationship between the variables on
[65]
the different axes how strong is that
[68]
relationship is it a positive is it a
[70]
negative relationship and then we'll
[71]
think about this idea of outliers so
[74]
let's just first think about whether
[76]
there's a linear or non-linear
[77]
relationship and i'll get my little
[79]
ruler tool out here
[81]
so this data right over here it looks
[82]
like i could put a line
[85]
through it that gets pretty close
[87]
to the data
[88]
it's very unlikely you're going to be
[89]
able to go through all of the data
[91]
points but you can try to get a line and
[93]
i'm just doing this there's more
[94]
numerical more precise ways of doing
[96]
this but i'm just eyeballing it right
[98]
over here and it looks like i could plot
[100]
a line that looks something like that
[103]
that goes roughly through the data so
[105]
this looks pretty linear so i would call
[108]
this a linear
[109]
relationship
[111]
and since as we increase one variable it
[113]
looks like the other variable decreases
[115]
this is a downward sloping line i would
[118]
say this is a negative
[120]
this is a negative
[122]
linear relationship but this one looks
[124]
pretty strong so
[126]
because the dots aren't that far from my
[130]
line this one gets a little bit further
[131]
but it's not you know there's not some
[133]
dots way out there so most of them are
[135]
pretty close to the line so i'll call
[137]
this a negative reasonably strong linear
[140]
relationship
[141]
negative
[143]
strong i'll call reasonably i'll just
[145]
say strong but reasonably strong
[148]
linear
[150]
linear relationship between these two
[152]
variables
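The direction and strength judged by eye above can also be summarized with a number. The following is a minimal sketch, assuming the Pearson correlation coefficient r as the measure (the video itself does not name one): the sign of r gives the direction, and how close |r| is to 1 gives the strength of the linear pattern. The data values and the 0.7 cutoff are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r, strong_cutoff=0.7):
    """Turn r into words. strong_cutoff is an arbitrary, illustrative threshold."""
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return f"{direction} {strength}"

# Hypothetical data: age versus accidents per hundred (invented numbers)
ages = [20, 21, 22, 23, 24, 25, 26, 27]
accidents = [9.0, 8.4, 8.1, 7.2, 6.9, 6.1, 5.8, 5.0]

r = pearson_r(ages, accidents)
print(round(r, 3), describe(r))
```

Note that r only describes how well a straight line summarizes the data, so it complements looking at the scatter plot rather than replacing it.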
[153]
now let's look at this one and pause
[155]
this video and think about what
[156]
this one would be for you
[159]
well let's see i'll get my ruler tool
[161]
out again
[162]
it looks like i can try to put a line
[164]
looks like generally speaking as one
[166]
variable increases the other variable
[169]
increases as well so something like this
[172]
goes through the data and
[174]
approximates the direction and this
[176]
looks positive as one variable increases
[179]
the other variable increases roughly so
[182]
this is a positive relationship
[184]
but this is weak a lot of the data is
[188]
off well off of the line so positive
[192]
weak
[193]
but i'd say this is still linear it
[195]
seems that as we increase one the other
[198]
one increases at roughly the same rate
[200]
although these data points are all over
[201]
the place so i would still call this
[204]
linear
[205]
now there's also this notion of outliers
[207]
if i said hey this line is trying to
[209]
describe the data well we have some data
[212]
that is
[213]
fairly off the line so for example
[216]
even though we're saying it's a positive
[218]
weak linear relationship this one over
[220]
here is reasonably high on the vertical
[223]
variable but it's low on the horizontal
[226]
variable and so this one right over here
[228]
is an outlier it's quite far away from
[230]
the line you could view that as an
[231]
outlier and this is a little bit
[233]
subjective outliers well what looks
[235]
pretty far from the rest of the data
[237]
this could also be an outlier let me
[240]
label these
[242]
outlier
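One way to make the outlier call less subjective is to measure how far each point sits from the fitted line. The sketch below assumes a least-squares line and flags points whose vertical distance (residual) exceeds twice the root-mean-square residual. Both the fitting method and the cutoff of 2 are assumptions for illustration, not rules stated in the video, and the data is invented.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(xs, ys, cutoff=2.0):
    """Return points whose residual exceeds cutoff times the RMS residual."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    rms = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    return [(x, y) for x, y, e in zip(xs, ys, residuals) if abs(e) > cutoff * rms]

# Invented data: roughly y = x, plus one point low on x but high on y
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 9.0, 3.1, 3.8, 5.2, 6.1, 6.8, 8.1]

flagged = flag_outliers(xs, ys)
print(flagged)
```

A different cutoff would flag a different set of points, which mirrors the subjectivity mentioned above.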
[244]
now pause the video and see if you can
[245]
think about this one is this positive or
[247]
negative is it linear non-linear is it
[249]
strong or weak
[251]
i'll get my ruler tool out here so this
[253]
goes here
[255]
it seems like i can fit a line pretty
[257]
well to this
[259]
so i could fit
[261]
maybe i'll do the line in purple
[263]
i could fit a line that looks like that
[266]
and so this one looks like it's positive
[268]
as one variable increases the other one
[270]
does for these data points so it's a
[272]
positive i'd say this is pretty strong
[275]
the dots are pretty close to the line
[277]
there it really does look like a little
[278]
bit of a fat line if you just look at
[280]
the dots so positive strong
[284]
linear
[286]
linear relationship
[288]
and none of these data points are really
[290]
strong outliers this one's a little bit
[292]
further out but they're all pretty close
[294]
to the line and seem to describe that
[296]
trend roughly
[298]
all right now let's look at this
[300]
data right over here
[302]
so let me get my line tool out again
[306]
so
[307]
it looks like i can fit a line and it
[309]
looks like it's a positive
[310]
relationship the line would be upward
[312]
sloping it would look something like
[314]
this and once again i'm eyeballing it
[317]
you can use computers and other methods
[319]
to actually find a more precise line
[321]
that minimizes the collective distance
[323]
to all of the points
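As a minimal sketch of what "a more precise line" can mean, the code below uses ordinary least squares, which picks the slope and intercept that minimize the sum of squared vertical distances from the points to the line. This is one common choice and is assumed here; the video does not say which method it has in mind. The data points are invented.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the mean point
    return slope, intercept

# Invented data with an upward trend
xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.0]

slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

A positive slope corresponds to the upward-sloping line described here, and a negative slope to a downward-sloping one.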
[324]
but it looks like
[326]
there is a
[327]
positive
[328]
but i would say this one is a weak
[331]
linear relation because you have a lot
[332]
of points that are far off the line so
[335]
not so strong so i would call this a
[337]
positive
[338]
weak
[339]
linear
[340]
relationship and there's a lot of
[342]
outliers here you know this one over
[344]
here is pretty far
[346]
pretty far out
[348]
now let's look at this one
[349]
pause this video and think about is it
[350]
positive or negative is it strong or weak
[352]
is this linear non-linear
[354]
well the first thing we want to do is
[355]
just think about linear non-linear i
[357]
could try to put a line on it
[360]
but if i try to put a line on it it's
[362]
actually quite difficult if i try
[363]
to put a line like this notice everything is
[365]
kind of bending away from the line it
[368]
looks like generally as one variable
[370]
increases the other variable decreases
[372]
but they're not doing it in a linear
[373]
fashion
[374]
it looks like there's some other type of
[376]
curve at play
[377]
so i could try to do a fancier curve
[380]
that looks something like this and this
[382]
seems to fit the data a lot better so
[384]
this one i would describe as
[387]
non-linear
[388]
and it is a negative relationship as one
[390]
variable increases the other variable
[392]
decreases
[393]
so this is a negative
[396]
i would say reasonably strong non-linear
[398]
relationship
[400]
pretty strong
[401]
pretty strong this is subjective
[404]
so i'll say negative reasonably strong
[408]
non-linear relationship
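The "bending away from the line" observation can be checked numerically. The sketch below, assuming a least-squares line and invented data that curves like the plot described, looks at the sign of each point's residual in order of increasing x. Scatter around a truly linear trend tends to give mixed signs, while a curved trend gives long runs of the same sign.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented data: decreasing, but dropping fast at first and then leveling off
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [50, 26, 17, 12, 10, 8.5, 7, 6.5]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# "+" means the point lies above the line, "-" means below it
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)
```

Here the points sit above the line at both ends and below it in the middle, which is the pattern a straight line produces when the underlying relationship is curved.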
[410]
and maybe you could call this one an
[413]
outlier but it's not that far and i
[415]
might even be able to fit a curve that
[416]
gets a little bit closer to that once
[417]
again i'm eyeballing this now let's do
[419]
this last one
[421]
so this one
[422]
looks like a negative linear
[425]
relationship to me
[427]
a fairly strong negative linear
[429]
relationship although there's some
[431]
outliers
[432]
so
[433]
let me draw this line
[435]
so that seems to fit the data pretty
[437]
well so this is a negative
[440]
reasonably strong
[442]
reasonably strong
[444]
linear relationship
[446]
but these are very clear outliers these
[448]
are well away from the data or from the
[451]
cluster of where most of the points are
[453]
so with at least
[455]
these two significant outliers here so
[458]
hopefully this makes you a little bit
[459]
familiar with some of this terminology
[461]
and it's important to keep in mind this
[463]
is a little bit subjective there'll be
[465]
some cases that are more obvious than
[467]
others and oftentimes you want to
[469]
make a comparison that this is a
[471]
stronger positive linear
[472]
relationship than this one is right over
[474]
here because you can see most of the
[475]
data is closer to the line this one is
[479]
for sure this is more non-linear than
[482]
linear it depends how you want to
[484]
describe it
[485]
oftentimes you're making a comparison or making
[487]
a subjective call on how to describe the
[489]
data





