Levels of variation and intraclass correlation - YouTube

Channel: unknown

[0]

One of the first things that we analyze when we start working

[3]

with multilevel data is on which level each individual observation varies.

[8]

We also calculate intraclass correlation to quantify these variances.

[12]

Let's take a look at an example to understand

[16]

what levels of variation in an individual variable mean.

[19]

Here we have profitability data for a company. So we have five observations for a single company

[25]

and the average profitability for this company is about fourteen percent and the individual

[30]

observations vary randomly around the average profitability because companies sometimes have

[36]

good years and sometimes they have bad years, so the performance is not always the same. So there's

[40]

always some year to year variation. But that doesn't really fully explain why a large data

[49]

set of profitability figures would vary because there can also be other levels of variation.

[54]

For example there can be company level variation. These red mounds each represent one company and

[61]

they are all working within an industry and this blue area here represents the variation

[68]

of the performance of all companies within that industry. So we can see that different companies

[74]

vary. Their performance varies within company but there is also variation between companies. So

[79]

that this company here is consistently less profitable than this company here.

[84]

So we have two levels. We have the within company level and we

[89]

have the between company level which is also the within industry level.

[92]

We can also add more levels. There is no limit on

[94]

how many levels we can do but let's go for an industry level.

[98]

So we have these five different industries in blue and the industries are different in their

[106]

profitability. Some industries are highly profitable - others are not so and we can see that

[112]

the individual variation of the data here is a function of these three sources of variation - the

[119]

between industry level the between company level and the year-to-year variation within companies.
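
As a rough illustration of these three sources of variation, the following Python sketch simulates profitability as a grand mean plus an industry effect, a company effect, and year-to-year noise. The standard deviations, the grand mean, and all variable names are assumptions made for the example, not values from the video.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical values chosen only for illustration.
GRAND_MEAN = 0.10
SD_INDUSTRY, SD_COMPANY, SD_YEAR = 0.05, 0.03, 0.02

rows = []  # each row: (industry, company, year, profitability)
for industry in range(5):
    # One draw per industry: shared by every company in that industry.
    industry_effect = random.gauss(0, SD_INDUSTRY)
    for company in range(5):
        # One draw per company: shared by all of that company's years.
        company_effect = random.gauss(0, SD_COMPANY)
        for year in range(5):
            # A fresh draw for every observation: year-to-year variation.
            year_noise = random.gauss(0, SD_YEAR)
            value = GRAND_MEAN + industry_effect + company_effect + year_noise
            rows.append((industry, company, year, value))

print(len(rows))  # 5 industries x 5 companies x 5 years
print(statistics.pvariance([row[3] for row in rows]))  # total variance
```

Each observation is the sum of one draw per level, which is why the total variation in such data reflects all three sources at once.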

[125]

To understand our data and understand the phenomenon that the data represent we typically

[132]

need to decompose the variance, to somehow come up with percentages or some other statistics

[138]

that quantify how much of the variation is here and how much of the variation is here in our data.

[144]

If our data set is small we typically start with a graphical analysis. So we can just plot the

[151]

data. This is 25 observations: 5 observations for each of 5 companies within one industry.

[159]

We can see that there are some patterns. For example, for this company there is not much

[164]

variation in performance. This company is less profitable than that company and so on.

[169]

This kind of analysis works well when you have a small set of observations. If we have a large

[176]

number of observations but still a fairly manageable number of clusters let's say up

[183]

to 30 companies or 30 industries or whatever is our level two unit - we can use box plots.

[189]

Box plots are graphical presentations of individual variables and we can do

[195]

box plots by groups. The idea of a box plot is that we first calculate for a variable

[202]

the median and the median gets this thick line here. That marks the median. Then

[209]

we calculate the first quartile and the third quartile of the data. So the first quartile

[215]

means that below this line lies 25 percent of our observations and above the line 75

[221]

percent. The median is half-and-half, and for the third quartile it is 75% below and 25% above.

[228]

We draw a box between the first quartile and the third quartile and half of our data is

[235]

within this box. Then we have these whiskers that indicate the minimum and maximum and

[241]

sometimes we also have outliers that the box plot algorithm identifies as circles.
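
The pieces of a box plot described here can be computed directly. The sketch below uses Python's standard `statistics` module on made-up profitability figures. The 1.5 x IQR rule for flagging outliers is a common convention and an assumption of this example; the video does not say which rule its plots use.

```python
import statistics

# Hypothetical profitability figures (percent), already sorted for readability.
values = [9, 11, 12, 14, 15, 16, 18, 40]

# With n=4, quantiles() returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Assumed convention: points more than 1.5 x IQR outside the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [v for v in values if lower_fence <= v <= upper_fence]
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# Whiskers reach the smallest and largest observations that are not outliers.
whisker_low, whisker_high = min(inside), max(inside)

# Observations that fall inside the box (between Q1 and Q3).
in_box = [v for v in values if q1 <= v <= q3]

print(q1, median, q3)              # 11.75 14.5 16.5
print(whisker_low, whisker_high)   # 9 18
print(outliers)                    # [40]
print(len(in_box) / len(values))   # 0.5
```

With this rule the whiskers show the minimum and maximum only after outliers are set aside, which is why the value 40 appears as a separate point rather than as the end of the upper whisker.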

[247]

So why is this box plot presentation useful and how can we analyze the box plots?

[252]

We can first of all start to understand the between and within variance by looking at the

[259]

box plots. We can compare these medians or we can do box plots with means and we can check how

[267]

much variation there is between these means or medians and that is our between variation.

[273]

We can also take a look at how high the boxes are and that quantifies the within

[280]

variance and comparing these two dimensions tells us if the variation in this variable

[289]

is more due to the differences between firms or is it just random variation or some other

[295]

variation within firms. So is it a within firm or between variation that explains the data.
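
The visual comparison can be mimicked numerically. The sketch below, using made-up data for three hypothetical firms, compares how far apart the medians are with how tall the boxes are. The two summary numbers are an informal illustration of the eyeballing step, not an established statistic.

```python
import statistics

# Hypothetical profitability (percent), five yearly observations per firm.
firms = {
    "A": [3, 4, 5, 6, 7],
    "B": [8, 9, 10, 11, 12],
    "C": [13, 14, 15, 16, 17],
}

def box_height(values):
    """Interquartile range: the distance between Q1 and Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

medians = [statistics.median(values) for values in firms.values()]
heights = [box_height(values) for values in firms.values()]

# Between variation: how far apart the firm medians are.
spread_of_medians = max(medians) - min(medians)
# Within variation: how tall a typical box is.
typical_box_height = statistics.mean(heights)

print(spread_of_medians)    # 10
print(typical_box_height)   # 2.0
```

Here the medians are spread much more widely than any single box is tall, so most of the variation in this toy data lies between firms rather than within them.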

[300]

We can quantify the level of variation between two levels also numerically by

[307]

calculating the within variance and between variance. This is our data and we start by

[313]

calculating group means. So we take each of these companies and we calculate a mean

[319]

of this. So these are the group means or cluster means for these five firms. And we

[327]

check how much these means vary. The variation is quantified here with this statistic and then

[335]

we calculate how much these individual observations vary from the group mean.

[341]

In practice we do group mean centering. So we take each of these observations. We subtract

[347]

the group mean and that gives us the group mean centered values. Then we calculate how much the

[354]

group mean centered data varies. This is our between variation, this is our within variation

[361]

and this is our total variation which is the sum of the between variation and the within variation.
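
The steps just described (group means, group mean centering, then the three variances) can be sketched in Python as follows. The data are made up, and the sketch uses population variances (dividing by n), which is an assumption: with that choice the total variance equals between plus within exactly, whereas the statistic shown in the video may use a different denominator.

```python
import statistics

# Hypothetical profitability (percent): 5 yearly observations for each of 5 firms.
data = {
    "firm_1": [12, 14, 13, 15, 16],
    "firm_2": [8, 9, 7, 10, 11],
    "firm_3": [20, 19, 21, 20, 20],
    "firm_4": [5, 7, 6, 4, 8],
    "firm_5": [16, 15, 17, 18, 14],
}

# Step 1: group (cluster) means.
group_means = {firm: statistics.mean(values) for firm, values in data.items()}

# Step 2: group mean centering, subtracting each firm's mean from its observations.
centered = [x - group_means[firm] for firm, values in data.items() for x in values]

# Step 3: the three variances.
# Between: variance of the group means, one copy per observation so that
# unequal group sizes would be weighted correctly.
means_per_obs = [group_means[firm] for firm, values in data.items() for _ in values]
between = statistics.pvariance(means_per_obs)
within = statistics.pvariance(centered)
total = statistics.pvariance([x for values in data.values() for x in values])

print(round(between, 2))  # 24.8
print(round(within, 2))   # 1.68
print(round(total, 2))    # 26.48
```

In this toy data nearly all of the variance sits between firms, which matches the picture of boxes that are short but far apart.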

[368]

So the variation or variance is a statistic that depends on the

[373]

scale. It would be useful to have a scale-free way to explain on which

[379]

level the data varies and this is where the intraclass correlation comes into play.

[385]

So intraclass correlation is simply calculated as variance between groups divided by the total

[392]

variance. So it answers the question how much of the variation in the data is attributed to the

[397]

groups and how much is attributed to the variation within the groups. This

[403]

is called ICC 1 for the reason that there are many other kinds of intraclass correlations.

[409]

So intraclass correlation generally refers to correlation between observations and

[415]

because there are many, this one is called ICC 1. There are a few others but

[420]

this is the most important one that you need to understand when you work with multi-level data.

[424]

Other intraclass correlations are mostly about reliability of multiple

[430]

raters but ICC 1 is this simple equation that simply quantifies variation.
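
Written out, the equation described in words above looks like this. The symbols are notation assumed for this sketch; the video's slide may use different ones.

```latex
\mathrm{ICC}(1)
  = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{total}}}
  = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}
```

Because the same units appear in the numerator and the denominator, the ratio is scale-free and lies between 0 and 1.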

[435]

When ICC 1 is 0 then that indicates that there is no variance between

[444]

groups. So the box plots are all on the same level here. There is no difference

[451]

between means, and in this case the medians are close as well, and all variation

[456]

is simply because there is variation within these groups.

[459]

Then when intraclass correlation is 1 then that means there is no variance within

[466]

clusters at all. All observations equal the level two means. So this firm's profitability

[473]

is always here. This firm's always here and so on. So there's no within unit variation.
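
The two extreme cases can be checked with a small Python sketch. The function name `icc1` and the toy data are assumptions for illustration, and the sketch uses population variances so that between plus within equals the total.

```python
import statistics

def icc1(groups):
    """Between-group variance divided by total variance (population variances)."""
    all_values = [x for values in groups for x in values]
    # Replace every observation by its group mean; the variance of these
    # values is the between-group variance.
    means_per_obs = [statistics.mean(values) for values in groups for _ in values]
    between = statistics.pvariance(means_per_obs)
    total = statistics.pvariance(all_values)
    return between / total  # undefined if the data do not vary at all

# Every group has the same mean: all variation is within groups.
no_between = [[9, 10, 11], [10, 11, 9], [11, 9, 10]]
# Every observation equals its group mean: all variation is between groups.
no_within = [[5, 5, 5], [10, 10, 10], [15, 15, 15]]
# A mixed case with variation on both levels.
mixed = [[1, 3], [5, 7]]

print(icc1(no_between))  # 0.0
print(icc1(no_within))   # 1.0
print(icc1(mixed))       # 0.8
```

The mixed case has a between variance of 4 and a total variance of 5, so 80% of its variation is attributed to the groups.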

[479]

Why do people calculate intraclass correlation and how is it typically reported? The role of

[488]

intraclass correlation - the first role is to make a decision whether something needs to be done for

[496]

the clustering. If all the observations within a cluster have the same value then you can just pick

[501]

one observation for each cluster and use those in regression analysis and it doesn't really

[506]

matter that you have the remaining observations because they don't provide you any more data.

[510]

If ICC 1 is 0 then there is no clustering in that variable and if all your ICC 1 values are very

[518]

low then there is no meaningful clustering in your data and it's possibly safe to go without

[525]

multi-level modeling. There are exceptions to that rule but generally when ICC 1 is close to

[532]

0 or when ICC 1 is close to 1 then multi-level modeling may not be needed but if it's somewhere

[539]

between like it's 50% then you typically need to take levels into account in your analysis somehow.

[545]

Let's take a look at the example of how ICC 1 has been reported in published research. So

[551]

this comes from Hausknecht's paper and this is a good example because they first explain what

[560]

the statistic is. So quite often people just report a statistic or report a number without

[564]

explaining what ICC 1 is. And this study provides a concise description. ICC 1 values

[573]

can be interpreted as the total amount of variance in the dependent variable that

[577]

is attributable to between unit rather than within unit differences over time.

[581]

So that explains what the statistic's interpretation is and also if the

[586]

values are high then regression analysis could be inappropriate and

[593]

then you would have to do something else, for example use cluster-robust standard

[598]

errors or multi-level modeling. And then they go on and they explain what is the

[603]

actual statistic. Abstention point 76 and then they explain what the statistic means.

[609]

So giving this short introduction to your statistics is very useful for your readers

[616]

because your readers may not be experts in using multi-level data so make it easier for them.
