Real-world application of the Central Limit Theorem (CLT) - YouTube

Channel: 365 Data Science

[0]

Hi everyone and welcome! In this video, we’ll talk about the real-world application of one

[5]

of the most widely used theorems in data science: The Central Limit Theorem. For more super

[10]

practical videos like this one, make sure to subscribe to our channel right now.

[14]

The Central Limit Theorem is at the core of hypothesis testing, an approach in statistics that

[19]

lets you use data to evaluate your ideas. In fact, this theorem can be applied to a

[23]

variety of real-life problems. Let’s illustrate with an example.

[28]

Say you own a business in the fish market, more specifically a trout farm.

[33]

You own hundreds of fish reservoirs where you keep and breed trout in order to sell

[37]

them to the main fish stores which supply the biggest cities in the country. The farm

[41]

operates in the following way: you buy and breed fish, which you later sell. Your clients

[46]

range from single fish vendors to supermarket chains.

[49]

Quite straightforward, right? But to turn this procedure into

[53]

a repeating cycle, you need to use your reservoir capacity appropriately. Here’s how it

[57]

works. First, you have to label each reservoir according to the approximate size of the fish

[62]

in it. There are three labels: newly hatched, middle size, and first-class. As the fish

[68]

grow, you move them from newly hatched to middle size and from middle size to first-class

[72]

and once they’re fully grown, you sell them. Meanwhile, you’ve stocked the pool with

[77]

newly hatched trout and the process goes on. Now, what’s crucial to know is that first-class

[82]

fish are the largest among all the fish-groups. Why is that so important? Well, as a business

[87]

owner, your goal is to maximize profit, and, to achieve that, you must sell fish when they

[92]

reach the largest possible size, as customers pay by the pound. What’s more, there’s

[97]

a regulation set by the government that allows you to keep 1,000 fish maximum in the first-class

[102]

reservoirs. All things considered, selling the first-class

[105]

fish as large as possible would be your best strategy to increase profit. Therefore, you

[110]

need to maximize the length of each fish in every single tank.

[114]

Easy to say, but how can you do that? How long is it going to take? Is the effort worth

[119]

the time you will lose while in competition with other fish farms?

[123]

Let’s think about it for a second. One option is to try to measure each fish separately...

[128]

But there are 1,000 fish in each tank, and more than 20 tanks stocked with first-class

[132]

fish. So, manually measuring each fish doesn’t sound like a good idea. It will simply take

[138]

too long. You and your employees will be stuck measuring fish from dawn till dusk, which is

[143]

highly inefficient. What’s more, if you can quickly find out the average size of your

[147]

fish, you can project how long it will take for the fish in each tank to grow to the necessary size.

[152]

This will allow you to plan key resources such as staff and fish food supplies.

[156]

Finally, having an edge in the business depends on your ability to stay competitive and agile.

[162]

Knowing what kind of sales volumes you can produce daily helps you to be prepared when

[166]

a customer calls to purchase a certain number of tanks due on a particular date.

[171]

And this is where you can truly benefit from some maths knowledge to optimize the process.

[175]

More precisely, you can use the Central Limit Theorem. It will save you a great deal of

[180]

time and, what’s more, help you maximize your profit at the same time.

[185]

So, what is the Central Limit Theorem and how does it work?

[189]

The Central Limit Theorem is a theorem in probability theory, whose first version was

[193]

proposed by the French mathematician Abraham de Moivre in 1733. De Moivre published an article

[199]

where he used a normal distribution to approximate the distribution of the number of heads resulting

[204]

from many tosses of a fair coin. The finding was nearly forgotten until the French mathematician

[209]

Pierre-Simon Laplace expanded it in his monumental work in the 19th century. Over the years,

[215]

numerous versions of it have been discovered and proven by other mathematicians.

[219]

In its basic form, the Central Limit Theorem states that if we have a population and we

[224]

take sufficiently large random samples from it, then the sample means will be approximately

[229]

normally distributed. We call the average of each sample a sample mean.
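
The statement above can be tried out with a small simulation. This is only a sketch under assumptions that are not in the video: the population is a made-up, deliberately skewed set of 1,000 values, and the helper name `sample_means` is hypothetical. It draws many random samples of 30, records each sample mean, and checks how closely those means behave like a normal distribution.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical, deliberately skewed population (not from the video):
# 1,000 values drawn from an exponential distribution with mean 10.
population = [random.expovariate(1 / 10) for _ in range(1000)]

def sample_means(pop, sample_size, n_samples):
    """Return the mean of each of n_samples random samples (drawn without replacement)."""
    return [statistics.mean(random.sample(pop, sample_size)) for _ in range(n_samples)]

means = sample_means(population, sample_size=30, n_samples=2000)

pop_mean = statistics.mean(population)
center = statistics.mean(means)   # the "mean of sample means"
spread = statistics.stdev(means)  # how much the sample means vary

# Share of sample means within one standard deviation of their center;
# for a normal distribution this share is about 0.68.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"population mean:      {pop_mean:.2f}")
print(f"mean of sample means: {center:.2f}")
print(f"share within 1 SD:    {within_one_sd:.2f}")
```

Even though the population itself is far from bell-shaped, the mean of the sample means should land close to the population mean, and roughly two thirds of the sample means should fall within one standard deviation of it.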

[234]

In case you’re wondering what a sample mean is, let’s go back to our fish example and

[238]

see. If you simply take groups of fish from each

[240]

first-class reservoir and record the average size in each group, those averages are the so-called

[245]

sample means. In fact, that’s exactly how the Central Limit Theorem can be applied to

[250]

resolve our fish-measuring dilemma. Let’s see it in practice!

[253]

As a rule of thumb, the minimum sample size for applying the CLT is 30. That’s why

[259]

we start with a sample size of 30. That means each group of fish you select from your reservoir of 1,000

[264]

first-class fish will consist of 30 fish. Then, you’ll increase the sample

[269]

size to 50, 100, etc., and each time you record the sample mean. The idea is to gradually

[275]

increase the sample size because the bigger the sample, the better the normal approximation holds.
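
To see why larger samples help, here is a rough sketch that compares how much the sample means vary for sample sizes of 30, 50, and 100. The tank contents are invented for illustration (1,000 lengths scattered around 48 cm), and `spread_of_sample_means` is a hypothetical helper, not something shown in the video.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical tank: 1,000 fish lengths in cm (illustrative values, not from the video).
tank = [random.gauss(48, 4) for _ in range(1000)]

def spread_of_sample_means(pop, sample_size, n_samples=1000):
    """Standard deviation of the means of n_samples random samples of the given size."""
    means = [statistics.mean(random.sample(pop, sample_size)) for _ in range(n_samples)]
    return statistics.stdev(means)

# Same tank, three sample sizes: larger samples give steadier sample means.
spreads = {n: spread_of_sample_means(tank, n) for n in (30, 50, 100)}

for n, s in spreads.items():
    print(f"sample size {n:>3}: spread of sample means = {s:.2f} cm")
```

The spread should shrink as the sample size grows, which is the practical payoff of measuring more fish per sample: each sample mean becomes a more reliable estimate of the tank average.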

[280]

However, you should not try too many different sample sizes, because you want to be able

[284]

to act as quickly as possible for each first-class tank. So, there are two possible outcomes:

[290]

either the fish in the respective pool have reached the maximum size, 50 cm, which means you

[295]

can sell the whole tank, or you must keep feeding them and take measurements

[299]

at a future date. Then you do this for each first-class reservoir.

[303]

After recording and plotting the sample means, you can see that the plot follows the bell-shaped

[307]

curve of the normal distribution. Therefore, going forward, you are in a position

[312]

to perform a statistical analysis using the properties of this distribution.

[316]

From the normal distribution graph, you see that the middle is denoted by µ, the mean

[321]

of sample means, which divides the area into two equal and symmetric halves. Moreover,

[326]

the total area under the curve is equal to 1.

[330]

The first key observation is that approximately two thirds of the collected means are within one

[335]

standard deviation of the mean of sample means, and about 95% of them lie

[340]

within two standard deviations of it. But how does all this affect your fish measuring

[344]

business? Well, for example, if the sample means for a tank have a mean

[347]

of 48 cm and a standard deviation of 2 cm, then the normal distribution tells you that approximately two

[353]

thirds of your observed sample means are in the range between 46 and 50 cm. Moreover, about

[358]

95% of the sample means are between 44 and 52 cm. Since the average is still below the 50 cm target, this tells you that you must feed the

[363]

fish in the tank a little longer and check again at the next measurement. This can have a massive

[367]

effect on your planning, as it helps you track the rate of growth in length of fish.
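
The ranges quoted in this example can be checked with Python's standard library. This is a minimal sketch that takes the figures from the example (mean 48 cm, standard deviation 2 cm) and assumes the sample means are exactly normally distributed; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Figures from the example: mean of sample means 48 cm, standard deviation 2 cm.
dist = NormalDist(mu=48, sigma=2)

def share_within(k):
    """Range and probability of lying within k standard deviations of the mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return lo, hi, dist.cdf(hi) - dist.cdf(lo)

for k in (1, 2):
    lo, hi, share = share_within(k)
    print(f"{lo:.0f} to {hi:.0f} cm: {share:.1%}")
# prints:
# 46 to 50 cm: 68.3%
# 44 to 52 cm: 95.4%
```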

[371]

Also, the sample means are approximately normally distributed random variables, which means we can

[376]

standardize them. More precisely, standardization here means rescaling the

[381]

sample means so that their mean is 0 and their variance is 1. This way, we can easily find information

[387]

in statistical tables about the area under the curve of the standard normal distribution.
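
Standardization itself is a one-line formula: subtract the mean and divide by the standard deviation. The sketch below applies it to a short list of made-up sample means (the values are illustrative, not from the video) and confirms that the result has mean 0 and variance 1.

```python
import statistics

# Hypothetical sample means in cm (illustrative only).
sample_means = [46.1, 47.3, 48.0, 48.4, 49.2, 50.5, 47.8, 48.9]

mu = statistics.mean(sample_means)
# Population form of the standard deviation, so the z-scores have variance exactly 1.
sigma = statistics.pstdev(sample_means)

# z-score: how many standard deviations each sample mean lies from the center.
z_scores = [(m - mu) / sigma for m in sample_means]

print([round(z, 2) for z in z_scores])
print(round(statistics.mean(z_scores), 6), round(statistics.pvariance(z_scores), 6))
```

Once the values are on this common scale, a single table for the standard normal distribution is enough to look up areas under the curve.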

[393]

Knowing all this, now we can answer some very interesting questions:

[397]

What is the probability of seeing the average of the 5th sample of 30 fish in the range

[401]

between 45 and 50 cm? Or what is the probability that the mean of the 10th sample is

[407]

bigger than 48 cm? Extracting those probabilities from the statistical table

[412]

will give you a more intuitive picture of what’s happening in your tanks.
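
Instead of a printed table, the same lookups can be done in code. This sketch assumes the figures from the earlier tank example (mean of sample means 48 cm, standard deviation 2 cm), since the video does not state which values apply to these two questions. It standardizes each boundary and reads the area from the standard normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Assumed figures, reused from the earlier tank example.
MU, SIGMA = 48, 2

def z(x):
    """Standardize a length in cm to a z-score."""
    return (x - MU) / SIGMA

# P(45 < sample mean < 50): area between the two standardized boundaries.
p_between_45_and_50 = standard_normal.cdf(z(50)) - standard_normal.cdf(z(45))

# P(sample mean > 48): area to the right of the standardized boundary.
p_above_48 = 1 - standard_normal.cdf(z(48))

print(round(p_between_45_and_50, 3), round(p_above_48, 3))  # prints: 0.775 0.5
```

Under these assumptions, about 77% of sample means would fall between 45 and 50 cm, and exactly half would exceed 48 cm because 48 cm is the center of the distribution.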

[416]

Alright! So, this is how the Central Limit Theorem

[419]

can be applied in a real-world scenario. The power of this theorem lies in the fact that

[424]

it makes it possible to analyze data even with incomplete information about the population, and allows

[429]

large datasets to be approximated accurately from relatively small samples.

[433]

If you enjoyed this video, don’t forget to hit the “like” or “share” button!

[438]

And if you’d like to become an expert in all things data science, subscribe to our

[441]

channel for more great videos every week! Thanks for watching!
