How to choose bin sizes for histograms - YouTube

Channel: Prof. Essa

[0]
This is Stephanie from StatisticsHowTo.com and in this video I will be showing you how
[4]
to choose bin sizes to make a histogram. There isn't an equation to choose bins.
[9]
I have a few general rules you'll want to follow. Bins should be the same size.
[13]
You can see on this histogram here, the bins are 8 units wide, from 118 to 126, 126 to
[21]
134 and so on. You need to include all of the data including
[25]
any outliers. Use whole numbers whenever possible.
[30]
And choose 5 to 20 bins. If you have a small data set, you are going
[34]
to want to go with 5 bins; for a very large data set, 20 bins.
[38]
It is a judgment call. Let me give you an example.
[41]
Here is a list of 20 items in my data set. The first thing I need to do is find my lowest
[47]
value and my highest value. Here is my lowest value, 1.
[52]
And here is my highest value, 50.1. Now, rule 3 says we need to use whole numbers,
[59]
so I am going to round 50.1 up to 51. My bottom number is a whole number, so that
[68]
is fine. The next step is I need to decide how many
[71]
bins to use. Rule 4 says, I should use between 5 and 20
[77]
bins. I only have a small number of items in my
[80]
data set, so I am going to go with the lowest number.
[82]
I am going to go with 5 bins and see how that works out for me.
[86]
Now remember, we need to use whole numbers. I want to find out whether my number of bins
[95]
is going to divide into my range. My range is my highest number minus my lowest
[100]
number, so 51 minus 1 is 50. And 50 is divisible by 5.
[108]
In fact, 50 divided by 5 gives me 10. And I am going to use that number to create
[120]
my bin boundaries. My bin boundaries are these numbers right
[124]
here on the ticks, 118, 126 and 134. For my data set, these are going to be a little
[130]
different. So I am going to create my bin boundaries.
[135]
I am going to want to start my histogram at 1, the lowest number.
[145]
And then I am going to add 10 to get my next bin boundary, that gives me 11.
[153]
I am going to add 10 again, 21. Then add 10 to get 31, 41, and then I am going
[163]
to stop at 51, because that has included all of my numbers.
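The steps in this first example can be sketched in Python. This is only a minimal illustration, not something shown in the video: the function name `whole_number_edges` is made up here, and it assumes the lowest value is rounded down and the highest value is rounded up to whole numbers before the range is divided by the number of bins.

```python
import math

def whole_number_edges(lowest, highest, n_bins):
    """Return equally spaced, whole-number bin boundaries covering the data.

    Hypothetical helper for illustration; assumes highest > lowest.
    """
    low = math.floor(lowest)    # 1 stays 1
    high = math.ceil(highest)   # 50.1 is rounded up to 51
    # Round the bin width up so the bins always reach the highest value.
    width = math.ceil((high - low) / n_bins)
    return [low + i * width for i in range(n_bins + 1)]

# Values from the example: lowest 1, highest 50.1, 5 bins.
edges = whole_number_edges(1, 50.1, 5)
print(edges)
```

With the numbers from the example, the range is 51 minus 1, or 50, the width is 10, and the boundaries come out as 1, 11, 21, 31, 41 and 51.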
[169]
Now of course, things are not always going to be that simple.
[172]
You are not always going to choose the right bins and your numbers might not cooperate.
[179]
Well, you can round up or round down. Let me give you another example.
[184]
Let us say I have a list of 100 numbers, which means I am going to want to have a middling
[190]
number of bins, probably around 10. And let us say these numbers go from 1 to
[200]
48. Well, if I subtract to find my range,
[206]
I find my range is 47. 47 is a prime number.
[211]
I am not going to be able to find any number of bins that is going to divide evenly into 47.
[218]
So what I would do here is round up to 50.
[222]
And let us say my hypothetical range is 50. That way I can choose bins starting at 1.
[229]
Let us say I am going to have 10 bins. So my range 50 divided by 10 is 5.
[237]
So I am going to go up by 5 each time. So 1 plus 5 is 6.
[243]
6 plus 5 is 11. I am going to keep on adding 5.
[254]
And eventually, my last bin boundary will be 51.
[260]
Now, my range was 1 to 48, and I will have included all my numbers in these bins 1 through
[268]
51. Just remember, always use whole numbers.
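The rounding in this second example can be written out as a short Python sketch. It assumes, as in the example, that the data run from 1 to 48 and that 10 bins are wanted; the variable names are illustrative only, and rounding the range up to the next multiple of the bin count is one way to read "round up to 50".

```python
import math

lowest, highest = 1, 48   # data limits from the example
n_bins = 10               # the chosen number of bins

true_range = highest - lowest                            # 48 - 1 = 47, a prime
# Round the range up to the next multiple of the number of bins.
padded_range = math.ceil(true_range / n_bins) * n_bins   # 47 becomes 50
width = padded_range // n_bins                           # 50 / 10 = 5

# Start at the lowest value and keep adding the width.
edges = [lowest + i * width for i in range(n_bins + 1)]
print(edges)

# Every whole number from 1 to 48 lies between the first and last boundary.
all_covered = all(edges[0] <= v <= edges[-1] for v in range(lowest, highest + 1))
print(all_covered)
```

The last boundary lands on 51, which is above the true maximum of 48, so the final bin is only partly used. That is the price of keeping every boundary a whole number.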
[272]
This is very much a judgment call. If you are using a program like Excel, you
[276]
will be better able to fiddle around with bin sizes and see which ones work and which
[281]
ones do not. But that is basically how to find bins for
[284]
histograms. Visit us at StatisticsHowTo.com for more articles
[289]
and videos on elementary statistics and AP statistics.
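The trial-and-error approach mentioned for Excel can also be tried in code. The sketch below is an assumption-laden illustration: the data are randomly generated stand-ins (the video does not list its 100 numbers), and `bin_counts` is a hypothetical helper. It counts how many values fall into each bin for 5, 10 and 20 bins so the results can be compared.

```python
import random

def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Hypothetical helper; a value on the final boundary is put in the last bin.
    """
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - start) // width), n_bins - 1)
        counts[index] += 1
    return counts

# Made-up data: 100 whole numbers between 1 and 48 (not the video's data).
random.seed(0)
data = [random.randint(1, 48) for _ in range(100)]

results = {}
for n_bins in (5, 10, 20):
    span = max(data) - min(data)
    # Ceiling division keeps the width a whole number that covers the data.
    width = max(1, -(-span // n_bins))
    results[n_bins] = bin_counts(data, min(data), width, n_bins)
    print(n_bins, "bins, width", width, "->", results[n_bins])
```

Looking at the counts side by side makes the judgment call easier: too few bins hide the shape of the data, while too many leave bins that are nearly or completely empty.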