Real-world application of the Central Limit Theorem (CLT) - YouTube
Channel: 365 Data Science
[0]
Hi everyone and welcome! In this video, we'll
talk about the real-world application of one
[5]
of the most widely used theorems in data science:
The Central Limit Theorem. For more super
[10]
practical videos like this one, make sure
to subscribe to our channel right now.
[14]
The Central Limit Theorem is the core of "hypothesis
testing", an approach in statistics that
[19]
lets you use data to evaluate your ideas.
In fact, this theorem can be applied to a
[23]
variety of real-life problems.
Let's illustrate with an example.
[28]
Say you own a business in the fish
market, more specifically a trout farm.
[33]
You own hundreds of fish reservoirs where
you keep and breed trout in order to sell
[37]
them to the main fish stores which supply
the biggest cities in the country. The farm
[41]
operates in the following way: you buy and
breed fish, which you later sell. Your clients
[46]
range from single fish vendors to supermarket
chains.
[49]
Quite straightforward, right?
But to transform the above procedure into
[53]
a cycle, you need to use your capacity of
reservoirs appropriately. Here's how it
[57]
works. First, you have to label each reservoir
depending on the approximate size of the fish
[62]
in it. There are three labels: newly hatched,
middle size, and first-class. As the fish
[68]
grow, you move them from newly hatched to
middle size and from middle size to first-class
[72]
and once they're fully grown, you sell them.
Meanwhile, you've stocked the pool with
[77]
newly hatched trout and the process goes on.
Now, what's crucial to know is that first-class
[82]
fish are the largest among all the fish-groups.
Why is that so important? Well, as a business
[87]
owner, your goal is to maximize profit, and,
to achieve that, you must sell fish when they
[92]
reach the largest possible size, as customers
pay by the pound. What's more, there's
[97]
a regulation set by the government that allows
you to keep 1,000 fish maximum in the first-class
[102]
reservoirs.
All things considered, selling the first-class
[105]
fish as large as possible would be your best
strategy to increase profit. Therefore, you
[110]
need to maximize the length of each fish in
every single tank.
[114]
Easy to say, but how can you do that? How
long is it going to take? Is the effort worth
[119]
the time you will lose while in competition
with other fish farms?
[123]
Let's think about it for a second. One option
is to try to measure each fish separately...
[128]
But there are 1,000 fish in each tank, and
more than 20 tanks stocked with first-class
[132]
fish. So, manually measuring each fish doesn't
sound like a good idea. It will simply take
[138]
too long. You and your employees will be stuck
measuring fish from dawn till dusk which is
[143]
highly inefficient. What's more, if you
can quickly find out the average size of your
[147]
fish, you can project how long it will take
for the fish in each tank to reach the necessary size.
[152]
This will allow you to plan key resources
such as staff and fish food supplies.
[156]
Finally, having an edge in the business depends
on your ability to stay competitive and agile.
[162]
Knowing what kind of sales volumes you can
produce daily helps you to be prepared when
[166]
a customer calls to purchase a certain number
of tanks due on a particular date.
[171]
And this is where you can truly benefit from
some maths knowledge to optimize the process.
[175]
More precisely, you can use the Central Limit
Theorem - it will help you tremendously with
[180]
time-saving and, what's more, you will also
maximize your profit at the same time.
[185]
So, what is the Central Limit Theorem and
how does it work?
[189]
The Central Limit Theorem is a theorem in
probability theory, whose first version was
[193]
proposed by the French mathematician Abraham
de Moivre in 1733. De Moivre published an article
[199]
where he used a normal distribution to approximate
the distribution of the number of heads resulting
[204]
from many tosses of a fair coin. The finding
was nearly forgotten until the French mathematician
[209]
Pierre-Simon Laplace expanded it in his monumental
work in the 19th century. Over the years,
[215]
numerous versions of it have been discovered
and proven by other mathematicians.
[219]
In its base form, the Central Limit Theorem
states that if we have a population and we
[224]
take sufficiently large random samples from
it, then the sample means will be approximately
[229]
normally distributed. We call the average
of each sample a sample mean.
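To see the theorem at work, here is a minimal Python sketch. It assumes a made-up tank of 1,000 fish whose lengths are deliberately skewed: the `population` list, its 40 cm floor, and the exponential shape are illustrative assumptions, not figures from the video. It draws 2,000 samples of 30 fish each and checks that the sample means cluster around the population mean, with roughly two thirds of them within one standard deviation.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1,000 fish lengths in cm, deliberately skewed
# (exponential tail) to show the population itself need not be normal.
population = [40 + random.expovariate(1 / 5) for _ in range(1000)]


def sample_mean(pop, n):
    """Mean length of n fish drawn at random without replacement."""
    return statistics.mean(random.sample(pop, n))


# Draw many samples of 30 fish and record each sample mean.
sample_means = [sample_mean(population, 30) for _ in range(2000)]

pop_mean = statistics.mean(population)
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Share of sample means within one standard deviation of their own mean;
# for a normal distribution this share is about 0.68.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / len(sample_means)

print(f"population mean:        {pop_mean:.2f} cm")
print(f"mean of sample means:   {mean_of_means:.2f} cm")
print(f"share within one sd:    {within_one_sd:.2f}")
```

The population here is far from bell-shaped, yet the sample means should still behave approximately like a normal distribution, which is the point of the theorem.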
[234]
In case you're wondering what a sample mean
is, let's go back to our fish example and
[238]
see.
If you simply take groups of fish from each
[240]
first-class reservoir and record the average
size in each group, those averages are the so-called
[245]
sample means. In fact, that's exactly how
the Central Limit Theorem can be applied to
[250]
solve our fish-measuring dilemma.
Let's see it in practice!
[253]
It's a rule of thumb that the minimum sample
size for applying the CLT is 30. That's why
[259]
we start with a sample size of 30. That means,
each group of fish you select from your 1,000
[264]
first-class fish reservoir will consist of
30 fish. Then, you'll increase the sample
[269]
size to 50, 100, etc. And each time, you record
the sample mean. The idea is to gradually
[275]
increase the sample size because the bigger
the sample the better the theorem applies.
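Here is a rough Python sketch of why a bigger sample helps. It assumes a hypothetical tank (`tank`, with lengths spread evenly between 35 and 55 cm, numbers chosen only for illustration) and measures how much the sample means vary for sample sizes of 30, 50, and 100.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical tank: 1,000 fish lengths in cm, uniform between 35 and 55.
tank = [random.uniform(35, 55) for _ in range(1000)]


def spread_of_sample_means(pop, n, repeats=1000):
    """Standard deviation of `repeats` sample means, each from n fish."""
    means = [statistics.mean(random.sample(pop, n)) for _ in range(repeats)]
    return statistics.stdev(means)


# One spread value per sample size mentioned in the video.
spread = {n: spread_of_sample_means(tank, n) for n in (30, 50, 100)}

for n, s in spread.items():
    print(f"sample size {n:>3}: sample means vary by about {s:.2f} cm")
```

The spread should shrink as the sample size grows, so a larger sample gives a tighter estimate of the tank's average length, at the cost of measuring more fish.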
[280]
However, you should not try too many different
sample sizes because you want to be able
[284]
to act as quickly as possible for each first-class
tank. So, there are two possible outcomes:
[290]
Either the fish in the respective pool have reached
the maximum size, 50 cm, which means you
[295]
can sell the whole tank,
or you must keep feeding them and take measurements
[299]
at a future date.
Then you do this for each first-class reservoir.
[303]
After recording and plotting the sample means,
you can see that the plot follows the bell-shaped
[307]
curve of the normal distribution.
Therefore, going forward, you are in a position
[312]
to perform a statistical analysis using the
properties of this distribution.
[316]
From the normal distribution graph, you see
that the middle is denoted by μ, the mean
[321]
of sample means, which divides the area into
two equal and symmetric halves. Moreover,
[326]
the total area under the curve, which is this area
here, is equal to 1.
[330]
The first key observation is that approximately
two thirds of the collected means are within one
[335]
standard deviation of the mean of sample
means and almost all of them (about 95%) lie
[340]
within two standard deviations of it.
But how does all this affect your fish measuring
[344]
business?
Well, for example, if the mean of your sample means
[347]
is 48 cm with a standard deviation of 2 cm for a tank,
then the theorem says that approximately two
[353]
thirds of your observed sample means are in
the range between 46 and 50. Moreover, almost
[358]
all of the sample means are between 44 and
52. This tells you that you must feed the
[363]
fish in the tank a little bit more and check again at
the next measurement. This can have a massive
[367]
effect on your planning, as it helps you track
the rate of growth in length of fish.
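The arithmetic behind those ranges can be sketched in a few lines of Python. This assumes, as in the example, that the sample means are normally distributed with a mean of 48 cm and a standard deviation of 2 cm; the helper name `band` is made up for this illustration.

```python
from statistics import NormalDist

mean_of_means = 48.0  # cm, from the example
sd_of_means = 2.0     # cm, assumed to be the spread of the sample means

dist = NormalDist(mean_of_means, sd_of_means)


def band(k):
    """Range of k standard deviations around the mean, plus its share."""
    low = mean_of_means - k * sd_of_means
    high = mean_of_means + k * sd_of_means
    share = dist.cdf(high) - dist.cdf(low)
    return low, high, share


one_sd = band(1)  # range 46 to 50 cm
two_sd = band(2)  # range 44 to 52 cm

for low, high, share in (one_sd, two_sd):
    print(f"{low:.0f} to {high:.0f} cm covers about {share:.0%} of sample means")
```

The shares come out at roughly 68% and 95%, which is where "two thirds" and "almost all" come from.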
[371]
Also, the sample means are approximately normally distributed
random variables, which means that we can
[376]
standardize them. More precisely, standardization
in this case means transforming each
[381]
variable so that its mean is 0 and its variance
is 1. This way, we can easily find information
[387]
in statistical tables about the area under
the curve of the standard normal distribution.
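As a small sketch of standardization, the snippet below subtracts the mean and divides by the standard deviation, then computes the area under the standard normal curve to the left of the result, which is the number a statistical table would give you. The values 48 and 2 are reused from the earlier example, and the function names are made up for this illustration.

```python
import math


def standardize(x, mean, sd):
    """Convert x to a z-score: how many standard deviations from the mean."""
    return (x - mean) / sd


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A sample mean of 50 cm, with mean 48 cm and standard deviation 2 cm.
z = standardize(50.0, 48.0, 2.0)
area_left = standard_normal_cdf(z)

print(f"z-score: {z:.1f}")
print(f"area to the left: {area_left:.4f}")
```

A sample mean of 50 cm sits one standard deviation above the mean, so its z-score is 1, and about 84% of the curve lies to its left.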
[393]
Knowing all this, now we can answer some very
interesting questions:
[397]
What is the probability of seeing the average
of the 5th sample of 30 fish in the range
[401]
between 45 and 50? Or what is the probability
that the mean of the 10th sample is
[407]
greater than 48 cm?
Extracting those probabilities from the table
[412]
above will give you a more intuitive picture
of what's happening in your tanks.
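If you would rather compute than look up a table, here is a hedged Python sketch for the two questions. It assumes the sample means follow a normal distribution with mean 48 cm and standard deviation 2 cm, borrowed from the earlier example; the video does not state these values for this step. Under that assumption every sample behaves the same way, so it makes no difference whether it is the 5th or the 10th.

```python
from statistics import NormalDist

# Assumed distribution of the sample means (values from the earlier example).
sample_mean_dist = NormalDist(mu=48.0, sigma=2.0)

# Probability that a sample mean falls between 45 and 50 cm.
p_between_45_and_50 = sample_mean_dist.cdf(50.0) - sample_mean_dist.cdf(45.0)

# Probability that a sample mean is greater than 48 cm.
p_above_48 = 1.0 - sample_mean_dist.cdf(48.0)

print(f"P(45 < mean < 50) = {p_between_45_and_50:.3f}")
print(f"P(mean > 48)      = {p_above_48:.3f}")
```

With these assumed numbers, the first probability is about 0.77 and the second is 0.5, since 48 cm is the center of the distribution.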
[416]
Alright!
So, this is how the Central Limit Theorem
[419]
can be applied in a real-world scenario. The
power of this theorem lies in the fact that
[424]
it makes it possible to analyze data even
with incomplete information about it and allows
[429]
large datasets to be approximated
with high accuracy.
[433]
If you enjoyed this video, don't forget
to hit the "like" or "share" button!
[438]
And if you'd like to become an expert in
all things data science, subscribe to our
[441]
channel for more great videos every week!
Thanks for watching!