馃攳
Dynamic Arrays, aka ArrayLists - YouTube
Channel: Algorithms with Attitude
This video is on dynamic, or growable, or resizable arrays, also known as ArrayLists in Java or vectors in C++. It is part of a playlist, linked here. To understand ArrayLists, first you have to know how arrays work. We build ArrayLists using arrays, analyze them, go over their operations, and finish with a playlist including advanced optimization issues.
So an array is something that holds a fixed number of elements. For notation, an array has some variable name, and then an index position inside square brackets, so this would be the 4th position within the array. Some languages use this same notation but allow you to index by anything, like a string, but I'm not talking about that. I don't know what the freaking "cat"th position of an array is.

A real array is implemented as a contiguous block of computer memory. You could start numbering at 1, but for this video, I start at 0, so an array of size 8 has items indexed from 0 to 7. No fancy-schmancy number skipping.
We can store whatever we like in the array, but the size of each item is the same. So, we can have an array of integers, or characters, or object references, or even fixed-size objects. Some of that is language dependent, but abstractly, just imagine a bunch of items lined up in some fixed amount of memory that has been allocated to your program.
So the great thing about arrays is that they give direct access to any location. Like any variable, the array name is associated with a memory location, and then the index just gives an offset from that location to a new memory location. If we want the fourth item, we just move forwards by four times the size of each item in the array. I will use integers here, so we just move forwards to get the fourth integer after the one that starts the array. This is a very quick operation.
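To make that offset arithmetic concrete, here is a tiny sketch. The base address and element size are made-up numbers, since Java doesn't expose real memory addresses; this just models the computation.

```java
public class OffsetDemo {
    // Conceptual sketch: the start of element `index` is the array's base
    // address plus index times the size of each element.
    public static long addressOf(long base, int elementSize, int index) {
        return base + (long) index * elementSize;
    }

    public static void main(String[] args) {
        long base = 0x1000;   // hypothetical start address of the array
        int intSize = 4;      // an int occupies 4 bytes
        // The 4th item lives 4 * 4 = 16 bytes past the start of the array.
        System.out.println(Long.toHexString(addressOf(base, intSize, 4))); // prints 1010
    }
}
```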
Because arrays map directly to your computer memory, you need to specify how large the array will be when you allocate memory for it, which is their weakness: sometimes you don't know how many spaces you need. If you don't allocate enough space, you will run out of room, but if you allocate too much space, you waste memory.
The big idea behind a dynamic array is that if you want to add something to an array that is already full, just allocate a bigger array, copy over all of the items from the old array, and now you have space for more elements. Freakin' genius, right? No, it's really simple, but you need to do it the right way to get good performance.
A bad way to go would be to just allocate exactly the space you know you need right now. So from my example, if I need room for a 9th element, I could copy over everything into an array with 9 spaces, but later if I want to add a 10th item, I would then need to move to an array of size 10. With this approach, I can really hose myself. To add the ith item takes time proportional to i, so if I add n total items, it takes order n squared time.
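To see how badly the grow-by-one policy hoses us, here is a small copy-counting sketch comparing it against doubling (the class and method names are mine, not from the video):

```java
public class GrowthCost {
    // Count element copies made while appending n items, growing one slot
    // at a time: adding item number `size` copies the size - 1 old items.
    public static long copiesGrowByOne(int n) {
        long copies = 0;
        for (int size = 1; size <= n; size++) {
            copies += size - 1;
        }
        return copies;   // n(n-1)/2 total: order n squared
    }

    // Count element copies when the capacity doubles each time we run out
    // of room, starting from capacity 1.
    public static long copiesDoubling(int n) {
        long copies = 0;
        int capacity = 1;
        for (int size = 1; size <= n; size++) {
            if (size > capacity) {      // full: copy everything to a bigger array
                copies += capacity;
                capacity *= 2;
            }
        }
        return copies;   // 1 + 2 + 4 + ... : always fewer than 2n copies
    }

    public static void main(String[] args) {
        System.out.println(copiesGrowByOne(1000)); // about n^2 / 2 copies
        System.out.println(copiesDoubling(1000));  // fewer than 2n copies
    }
}
```

For 1000 appends, growing one slot at a time performs 499,500 copies, while doubling performs only 1,023.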
A better way is to make the new array proportionally bigger than its current size, even if that's more than you immediately need. Starting with this size 10 array, when we run out of space, we could double the array size. So, once you realize that you have an 11th item and your array only has 10 spaces, you allocate a new size 20 array, copy over the first 10 items, and then add the 11th. We keep track of the fact that our data structure has size 11, but capacity 20.
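A minimal sketch of that grow-on-demand append might look like this; the `IntList` class is hypothetical, just to show the size/capacity distinction (real implementations like java.util.ArrayList are more careful):

```java
import java.util.Arrays;

// Minimal dynamic array sketch: a plain int[] plus a size counter.
public class IntList {
    private int[] data = new int[10]; // capacity 10, to match the example
    private int size = 0;             // how many slots are actually used

    public void add(int x) {
        if (size == data.length) {
            // Full: allocate a twice-as-big array and copy everything over.
            data = Arrays.copyOf(data, 2 * data.length);
        }
        data[size++] = x; // stick it in the first unfilled location
    }

    public int get(int i) { return data[i]; }
    public int size()     { return size; }
    public int capacity() { return data.length; }

    public static void main(String[] args) {
        IntList list = new IntList();
        for (int i = 0; i < 11; i++) list.add(i);  // 11th add triggers a doubling
        System.out.println(list.size() + " / " + list.capacity()); // prints 11 / 20
    }
}
```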
Now, there's still more room in the array, and if we want to insert a 12th item at the end of the data structure, we just stick it there and increment the size. For this data structure, by default, we always add new elements to the first unfilled location. We don't allow skipping, and if you add an item at an earlier location, you have to shift items over to make room for it, which kills your performance.
While inserting only at the end is a bit of a limitation, the underlying array structure does let us directly access any item in constant time. If the size is less than our capacity, we can also add to or delete from the end of the list in constant time.
Of course, if you really want to insert somewhere in the middle of the array, you can, but it takes time proportional to the number of items after the position you are inserting to, because we will shift all of those items over to make room. Deletion from the middle also forces a shift to fill the blank space.
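The shifting that middle insertion and deletion force can be sketched like this (the helper names are mine):

```java
import java.util.Arrays;

// Sketch of middle insertion/deletion on a plain array, to show the shifting
// loop that indexed operations hide.
public class ShiftDemo {
    // Insert x at position i, shifting a[i..size-1] one slot to the right.
    // Assumes the array still has room (size < a.length).
    public static void insertAt(int[] a, int size, int i, int x) {
        for (int j = size; j > i; j--) {
            a[j] = a[j - 1];       // everything after position i moves over
        }
        a[i] = x;
    }

    // Delete position i, shifting a[i+1..size-1] left to fill the blank space.
    public static void removeAt(int[] a, int size, int i) {
        for (int j = i; j < size - 1; j++) {
            a[j] = a[j + 1];
        }
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30, 40, 0};     // size 4, capacity 5
        insertAt(a, 4, 1, 15);             // shifts three items to make room
        System.out.println(Arrays.toString(a)); // prints [10, 15, 20, 30, 40]
    }
}
```

Both loops touch every item after the changed position, which is exactly why these operations cost time proportional to how far from the end you are.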
Insertion and deletion from the middle of the array aren't really natural operations; they force us to loop over everything after the changed location. The ArrayList class in Java has methods for indexed insertion or deletion, but it is internally implemented with a loop. They may have optimized that loop to try to make it fast, but you shouldn't forget it's there. The default position for insertion or deletion is the end of the list. Okay, what happens when we want to insert at the end, but the array is full? We can't just stick the item in place and increment the size anymore. Instead, we need to allocate new space and copy all of the items over, taking linear time in the total size. Can we fix that?
Well, it turns out that we don't need to. Because we double the array size when growing, those expensive insertions can't happen very often. Inserting at the end is expensive when going from size 10 to 11, but then we get a whole bunch of constant time insertions to go all the way up to 20. Then we get another expensive insertion, twice as expensive as the previous one, but then we get cheap insertions all the way up to size 40. Some insertions are expensive, but on average they take constant time each.

This is called amortized analysis: instead of looking at the cost of each operation individually, we consider the total runtime for an entire sequence of operations, and use this to get the average performance per operation. For this data structure, the average performance of inserting at the back of the list is constant time. You can look at best case average performance per insertion, or the worst case average performance, or even the average case average performance; they are all constant time.
How do we prove that insertion takes on average constant time per operation? Imagination and incense. No, going back to our half-full array of capacity 20, imagine that every time we insert into the array, if we have space, we not only pay for the insertion, but also prepay to copy 2 values to a new location sometime in the future. Including this extra imagined payment, insertion still takes constant time, even if it is a few times larger than the real time insertion takes. Clearly, adding that extra time to my time analysis isn't going to let me undercount how much real time is used to insert items.
When we go to insert the 21st item, we first allocate the new array, size 40, and copy over 20 items to it. But for those last 10 insertions I did, I have already included the cost of 2 copies each in my time analysis. In our analysis, we withdraw against our prepaid copy operations, and we have already deposited enough to copy over all 20 items. The actual copying doesn't happen until now, but if I am analyzing all of the time spent from the beginning, we see that we have already accounted for the time of these copies during the last 10 insertions. For the 21st item, I only still need to pay for its insertion, and with it I will also prepay for two copy operations for the next time the array grows.
This is called the accounting method for amortized analysis. Your analysis overcharges for some operations, and uses those prepaid overages to pay for expensive operations which may come later. As long as the analysis for any sequence from the start charges at least the real-life amount, you can use it to get valid average upper bounds.
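The accounting argument can even be checked by simulation: deposit 2 credits with each insertion, withdraw one credit per item copied at each growth, and watch the balance. This is a sketch with my own names, starting from the video's half-full capacity-20 array:

```java
public class AccountingDemo {
    // Returns the lowest the credit balance ever gets over a run of
    // end-insertions; it never going negative means the prepayments
    // always cover the copying done when the array grows.
    public static long minBalance(int insertions) {
        int capacity = 20, size = 10;  // half-full capacity-20 array
        long credits = 0, min = 0;
        for (int k = 0; k < insertions; k++) {
            if (size == capacity) {
                credits -= capacity;   // withdraw: copy `capacity` items over
                min = Math.min(min, credits);
                capacity *= 2;         // grow by doubling
            }
            size++;
            credits += 2;              // prepay 2 future copies per insertion
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(minBalance(1_000_000)); // prints 0
    }
}
```

The balance lands on exactly 0 at every growth: the insertions since the last doubling deposit precisely enough to pay for copying the whole array.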
Okay, how about deletion? If you want to delete from the end of the list, clearly you can do that in constant time, but is there any reason to shrink the size of the underlying array if it is too empty? Before figuring out how to do that, is it worth doing? It is.
Imagine that we are simulating a set of a billion items moving around between a thousand lists. On average, the lists have a million items each, but occasionally, maybe most of the items end up being in one list. When that happens, that list's size will grow to hold a billion items. If different lists are crowded at different times, and you never shrink any of them, eventually you end up trying to allocate a thousand arrays of size a billion each, even though you only have a billion items. More generally, if you never shrink, m lists holding n items total can take order nm space, instead of n + m space.
So when should we shrink the arrays? First of all, it depends on your growth factor when you grow your arrays. We have been using growth factor 2, so you can't shrink them the moment they go under half full, or you could end up with a bunch of expensive operations if you ever alternated between inserting and deleting right near a boundary: between sizes 19 and 21 here, for example, it would be slow. But if you double when growing and only shrink when you go under, let's say, one quarter full, you will be safe. Maybe have some minimum size too.
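A sketch of that rule, double when growing but halve only when under one quarter full, with a minimum capacity (the class name and threshold handling are mine):

```java
import java.util.Arrays;

// Dynamic array sketch with shrinking on deletion.
public class ShrinkingList {
    private static final int MIN_CAPACITY = 10;
    private int[] data = new int[MIN_CAPACITY];
    private int size = 0;

    public void add(int x) {
        if (size == data.length) {
            data = Arrays.copyOf(data, 2 * data.length); // grow by doubling
        }
        data[size++] = x;
    }

    public int removeLast() {
        int x = data[--size];
        // Shrink only when under a quarter full, so alternating inserts and
        // deletes near a boundary can't trigger repeated expensive resizes.
        if (size < data.length / 4 && data.length / 2 >= MIN_CAPACITY) {
            data = Arrays.copyOf(data, data.length / 2);
        }
        return x;
    }

    public int size()     { return size; }
    public int capacity() { return data.length; }

    public static void main(String[] args) {
        ShrinkingList list = new ShrinkingList();
        for (int i = 0; i < 41; i++) list.add(i);   // grows 10 -> 20 -> 40 -> 80
        while (list.size() > 19) list.removeLast(); // drops under a quarter full
        System.out.println(list.capacity());        // prints 40
    }
}
```

After shrinking, the array is exactly half full, so the next resize in either direction is a linear number of cheap operations away.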
Using analysis similar to the insertion's, where every deletion removes one item from the end of the list but also prepays for a copy operation, we can prove that each deletion also takes amortized constant time. If we implement shrinking, ArrayLists use only linear space in their size.
Now that we know how ArrayLists are implemented, it is easy to see what operations they give, and also how to make an iterator for them. They support constant average time insertion and deletion at the last position, as well as constant time indexed item retrieval. Because of that quick retrieval, to make an iterator, the only information you need to store is an index, but that iterator shouldn't really be used to mess around with the middle of the list by inserting or deleting there.
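Such an index-only iterator might look like this over a plain array (a sketch with my own names; Java's real ArrayList iterator also tracks a modification count to fail fast on concurrent changes):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Iterator over the first `size` slots of an int array: the only state it
// needs is the current index, because indexed retrieval is constant time.
public class IndexIterator implements Iterator<Integer> {
    private final int[] data;
    private final int size;
    private int index = 0;

    public IndexIterator(int[] data, int size) {
        this.data = data;
        this.size = size;
    }

    @Override public boolean hasNext() { return index < size; }

    @Override public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return data[index++];   // constant-time indexed retrieval
    }

    public static void main(String[] args) {
        IndexIterator it = new IndexIterator(new int[]{1, 2, 3}, 3);
        while (it.hasNext()) System.out.println(it.next()); // prints 1, 2, 3
    }
}
```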
If you want to get into boss level topics like how to optimize your ArrayList growth factor, I will cover that in a separate video in the playlist, but that's it for this one. I've got to go work on my pirate accent. ArrrrrayList.