Sampling error and variation in statistics and data science

Channel: Dr Nic's Maths and Stats

[0]
Variation and sampling error. Hi, I'm Dr Nic, and in this video I'm going to tell you about why statistical methods are needed. I will also explain variation and error, including sampling error and non-sampling error.

[14]
Variation is everywhere. It has practical implications, and that is why statistical methods are used to make sense of data.
[27]
There are several sources of variation in data. Today we are going to talk about four: (1) natural or real variation, (2) explainable variation, sometimes called confounding, (3) sampling error or sampling variation, and (4) variation due to non-sampling error.

[45]
For example, say we want to know how many texts, on average, a 19-year-old student sends in a day on their mobile phone. This could be useful when setting up a pricing scheme to attract students. We could ask one 19-year-old student, but we know that not all 19-year-old students send the same number of texts in a day. There is natural variation. If all students sent the same number of texts in a day, taking a sample of one student would give us all the information we need. But students don't send the same number of texts in a day.
[84]
There is variation, and this is true of almost all human, natural and manufacturing processes. There will be variation. In some cases, such as the weights of medicines, the variation may be very small, but it will still be there, even if we can't measure it.

[102]
There are other reasons why asking one student wouldn't give us the information we need about the texting habits of 19-year-old students.
[111]
the men and women could send different
[113]
numbers of texts and students from
[116]
different countries might have different
[117]
texting habits this can be called
[120]
explainable variation in some
[123]
statistical tests we try to find out
[126]
things like the differences between
[127]
groups such as males and females or
[130]
maybe the relationship between age of
[132]
the student and number of
[133]
texts we're trying to see what variation
[136]
we can attribute to known factors
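One way to make "attributing variation to known factors" concrete is the sum-of-squares split used in one-way analysis of variance: total variation equals the variation between group means plus the variation within groups. This is a minimal sketch under assumptions: the two groups and their text counts are hypothetical, made-up numbers, and `variance_decomposition` is a helper written for this example, not part of any library.

```python
from statistics import mean

# Hypothetical texts-per-day counts for two groups (made-up numbers for illustration only).
groups = {
    "group_a": [40, 55, 62, 48, 51],
    "group_b": [70, 66, 81, 74, 69],
}

def variance_decomposition(groups):
    """Split the total sum of squares into a between-group part (variation
    attributable to the known factor) and a within-group part (what is left over)."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    total_ss = sum((x - grand_mean) ** 2 for x in all_values)
    # Between-group: how far each group mean sits from the grand mean, weighted by group size.
    between_ss = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group: natural variation of individuals around their own group mean.
    within_ss = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return total_ss, between_ss, within_ss

total_ss, between_ss, within_ss = variance_decomposition(groups)
print(f"total={total_ss:.1f}  between={between_ss:.1f}  within={within_ss:.1f}")
print(f"share of variation explained by group: {between_ss / total_ss:.2f}")
```

The between-group share is the part of the variation that the grouping "explains"; the within-group part is the natural variation that remains even after the known factor is accounted for.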
[139]
statistical inference is the process of
[142]
drawing conclusions about population
[144]
parameters based on a sample taken from
[147]
the population because of natural and
[151]
explainable variation each sample will
[153]
only approximate the population say I
[157]
took a sample of five students and you
[159]
took a sample of five students from the
[161]
same class and asked them about the
[164]
number of texts they send yesterday do
[166]
you think you and I would get the same
[167]
mean or average I presume you've said no
[170]
because that is correct because our
[173]
samples will include different people we
[175]
are unlikely to get the same mean or
[177]
average that is another kind of
[179]
variation the fact that our samples
[183]
included different people is the
[184]
underlying cause of sampling error or
[186]
sampling variation because we are
[189]
looking at a subset of the population we
[191]
are not giving all the information about
[193]
the population and there are almost
[196]
infinite different samples we could take
[198]
each was its own distribution the term
[201]
sampling error is a bit of a problem as
[204]
the word error suggests that a mistake
[207]
has been made but this is not actually
[209]
the case
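The two-samples-of-five thought experiment can be simulated directly. This is a minimal sketch assuming a hypothetical class of 30 students with made-up daily text counts; both samples are simple random samples from the same class, so any difference between their means is sampling variation, not a mistake.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical class of 30 students: made-up numbers of texts sent yesterday.
population = [rng.randint(5, 120) for _ in range(30)]

# You and I each take our own simple random sample of five students from the same class.
my_sample = rng.sample(population, 5)
your_sample = rng.sample(population, 5)

print("class mean:      ", round(mean(population), 1))
print("my sample mean:  ", mean(my_sample))
print("your sample mean:", mean(your_sample))

# Repeating the draw many times shows that the sample mean changes from sample to sample.
sample_means = [mean(rng.sample(population, 5)) for _ in range(1000)]
print("range of 1000 sample means:", min(sample_means), "to", max(sample_means))
```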
[210]
Sampling error arises due to the variability that occurs by chance because a sample, rather than an entire population, is surveyed. When samples are small, the potential effect of sampling error is greater, but even with a big sample we can never eliminate sampling error completely. Typically, a larger sample size leads to an increase in the precision of a statistic as an estimate of a population parameter.

[239]
You might think that if we are really careful to take a random sample, we might avoid sampling error, but even perfectly random samples are subject to sampling error.
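A quick simulation can illustrate both points: larger samples give more precise estimates, yet even perfectly random samples never remove sampling error. This is a sketch under assumptions: the population of text counts is hypothetical (drawn from an exponential distribution purely for illustration), and `spread_of_sample_means` is a helper defined here. For simple random sampling from a large population, the spread of the sample mean should be close to σ/√n, which the code prints alongside for comparison.

```python
import math
import random
from statistics import mean, pstdev, stdev

rng = random.Random(42)  # fixed seed so results are repeatable

# Hypothetical population of 50,000 daily text counts (exponential shape, mean about 40).
population = [int(rng.expovariate(1 / 40)) for _ in range(50_000)]
sigma = pstdev(population)  # population standard deviation

def spread_of_sample_means(n, reps=1000):
    """Standard deviation of the sample mean across many simple random samples of size n."""
    means = [mean(rng.sample(population, n)) for _ in range(reps)]
    return stdev(means)

spreads = {}
for n in (5, 20, 80, 320):
    spreads[n] = spread_of_sample_means(n)
    print(f"n={n:>3}  simulated spread={spreads[n]:.2f}  sigma/sqrt(n)={sigma / math.sqrt(n):.2f}")
```

Quadrupling n roughly halves the spread, but the spread never reaches zero for any sample smaller than the whole population.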
[249]
sampling error exists because a sample
[252]
is not the complete picture of the
[254]
population you might think that if you
[258]
take a larger percentage of the
[259]
population that will reduce your
[261]
sampling error but strangely enough and
[264]
this the population is really small what
[266]
matters is
[267]
size of the sample not the percentage of
[269]
the population that represents so a
[272]
sample of 100 from our population of 1
[274]
million is just as useful as a sample of
[277]
100 from a population of 10,000 the
[280]
sample size known as in is what meters
[283]
the larger the sample is the less and
[286]
fluent sampling error will have on the
[288]
information we get
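The claim that sample size matters more than sampling fraction follows from the standard error of a mean under simple random sampling without replacement, which includes a finite population correction: SE = (σ/√n)·√((N − n)/(N − 1)). A minimal sketch, assuming a hypothetical population standard deviation of 30 texts per day; `standard_error_of_mean` is a helper written for this example.

```python
import math

def standard_error_of_mean(sigma, n, N):
    """Standard error of the mean for a simple random sample of size n drawn
    without replacement from a population of size N (with finite population correction)."""
    fpc = math.sqrt((N - n) / (N - 1))
    return sigma / math.sqrt(n) * fpc

sigma = 30.0  # hypothetical population standard deviation (texts per day)
n = 100
se = {N: standard_error_of_mean(sigma, n, N) for N in (200, 10_000, 1_000_000)}
for N, value in se.items():
    print(f"N={N:>9,}  sample fraction={n / N:.4%}  SE={value:.3f}")
```

With n = 100, the two large populations give standard errors of about 2.985 and 3.000, even though the sample is 1% of one population and 0.01% of the other. Only when the population is small (here a hypothetical N = 200, where the sample is half the population) does the correction matter noticeably, giving about 2.127.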
[289]
we cannot generally measure the extent
[292]
of sampling error to do this we would
[294]
need to know the true population
[296]
parameters such as the mean median and
[298]
standard deviation as well as the
[300]
equivalent sample statistics and if we
[303]
knew the true population parameters we
[305]
would not need to take a sample but with
[308]
probability modelling and procedures
[311]
like bootstrapping
[312]
we can get a good idea of the effect of
[314]
sampling error probability models
[317]
underpins statistical tests like the
[320]
t-test chi-square test and if test
[323]
confidence intervals Express estimates
[326]
of population parameters based on the
[328]
information gained from a sample
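Bootstrapping can be sketched in a few lines: resample the observed data with replacement many times, recompute the statistic each time, and use the spread of those values to gauge sampling error. The version below is a percentile bootstrap confidence interval for a mean. It is a sketch only: the sample values are hypothetical, and `bootstrap_ci` is a helper written for this example, not a library function. In practice a library routine such as SciPy's `scipy.stats.bootstrap` would usually be preferred.

```python
import random
from statistics import mean

def bootstrap_ci(sample, reps=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample the observed data with replacement, recording the mean of each resample.
    boot_means = sorted(mean(rng.choices(sample, k=n)) for _ in range(reps))
    # Take the empirical percentiles that bound the central `level` share of the means.
    lower_idx = int((1 - level) / 2 * reps)
    upper_idx = int((1 + level) / 2 * reps) - 1
    return boot_means[lower_idx], boot_means[upper_idx]

# Hypothetical sample: texts sent yesterday by 12 students (made-up values).
texts = [12, 45, 30, 8, 60, 22, 35, 41, 18, 27, 55, 33]
low, high = bootstrap_ci(texts)
print(f"sample mean = {mean(texts):.1f}, 95% bootstrap CI approx ({low:.1f}, {high:.1f})")
```

The width of the interval reflects sampling error estimated from the sample alone, without knowing the true population mean.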
[330]
statistical tests generally require that
[333]
the sample is random
[334]
and unbiased and with as little error as
[336]
possible there are many ways that the
[339]
distribution of a sample may not
[341]
represent the true distribution of the
[343]
population the other errors that may
[345]
occur when a sample is used to infer
[348]
about a population are categorized is
[350]
non sampling error examples of non
[353]
sampling error are badly worded
[356]
questions and self selection of sample
[358]
members both of these can cause bias
[361]
which is a non sampling error there are
[364]
many sources of non sampling error which
[366]
we will talk about in a companion video
[368]
bias and non sampling error this video
[372]
was brought to you by statistics
[374]
Learning Center within our website for
[376]
more resources to help you learn
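The difference between sampling error and bias from self-selection can be illustrated with a simulation: the error of a random sample's mean shrinks as n grows, but a self-selected sample stays off target however large it gets. This is a sketch under assumptions: the population is hypothetical, and self-selection is modelled (hypothetically) by making each student's chance of volunteering proportional to their text count plus one. `random.choices` draws with replacement, which is a simplification for this sketch.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so results are repeatable

# Hypothetical population of 50,000 daily text counts (mean about 40).
population = [int(rng.expovariate(1 / 40)) for _ in range(50_000)]
true_mean = mean(population)

# Hypothetical self-selection: chance of volunteering proportional to texts sent + 1,
# so heavy texters are over-represented among volunteers.
volunteer_weights = [x + 1 for x in population]

results = {}
for n in (100, 1_000, 10_000):
    random_error = mean(rng.sample(population, n)) - true_mean
    # random.choices samples with replacement; a simplification for this sketch.
    biased_error = mean(rng.choices(population, weights=volunteer_weights, k=n)) - true_mean
    results[n] = (random_error, biased_error)
    print(f"n={n:>6}  random-sample error={random_error:+.2f}  self-selected error={biased_error:+.2f}")
```

The random-sample error stays near zero and shrinks as n grows, while the self-selected error stays large and positive: a bigger sample reduces sampling error but does nothing to fix bias.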