馃攳
Dynamic Arrays, aka ArrayLists - YouTube
Channel: Algorithms with Attitude
This video is on dynamic, or growable, or resizable arrays, also known as ArrayLists in Java or vectors in C++. It is part of a playlist, linked here. To understand ArrayLists, first you have to know how arrays work. We build ArrayLists using arrays, analyze them, go over their operations, and finish with a playlist including advanced optimization issues.
So an array is something that holds a fixed number of elements. For notation, an array has some variable name, and then an index position inside square brackets, so this would be the 4th position within the array. Some languages use this same notation but allow you to index by anything, like a string, but I'm not talking about that. I don't know what the freaking "cat"th position of an array is.

A real array is implemented as a contiguous block of computer memory. You could start numbering at 1, but for this video, I start at 0, so an array of size 8 has items indexed from 0 to 7. No fancy-schmancy number skipping.
We can store whatever we like in the array, but the size of each item is the same. So, we can have an array of integers, or characters, or object references, or even fixed-size objects. Some of that is language dependent, but abstractly, just imagine a bunch of items lined up in some fixed amount of memory that has been allocated to your program.
So the great thing about arrays is that they give direct access to any location. Like any variable, the array name is associated with a memory location, and then the index just gives an offset from that location to a new memory location. If we want the fourth item, we just move forwards by four times the size of each item in the array. I will use integers here, so we just move forwards to get the fourth integer after the one that starts the array. This is a very quick operation.
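To make that offset arithmetic concrete, here is a tiny sketch. The base address and element size are made-up numbers, since Java doesn't expose real memory addresses; this just models the computation.

```java
public class OffsetDemo {
    // Conceptual sketch: the start of element `index` is the array's base
    // address plus index times the size of each element.
    public static long addressOf(long base, int elementSize, int index) {
        return base + (long) index * elementSize;
    }

    public static void main(String[] args) {
        long base = 0x1000;   // hypothetical start address of the array
        int intSize = 4;      // an int occupies 4 bytes
        // The 4th item lives 4 * 4 = 16 bytes past the start of the array.
        System.out.println(Long.toHexString(addressOf(base, intSize, 4))); // prints 1010
    }
}
```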
Because arrays map directly to your computer memory, you need to specify how large the array will be when you allocate memory for it, which is their weakness: sometimes you don't know how many spaces you need. If you don't allocate enough space, you will run out of room, but if you allocate too much space, you waste memory.
The big idea behind a dynamic array is that if you want to add something to an array that is already full, just allocate a bigger array, copy over all of the items from the old array, and now you have space for more elements. Freakin' genius, right? No, it's really simple, but you need to do it the right way to get good performance.
A bad way to go would be to just allocate exactly the space you know you need right now. So from my example, if I need room for a 9th element, I could copy over everything into an array with 9 spaces, but later if I want to add a 10th item, I would then need to move to an array of size 10. With this approach, I can really hose myself. To add the ith item takes time proportional to i, so if I add n total items, it takes order n squared time.
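To see how badly the grow-by-one policy hoses us, here is a small copy-counting sketch comparing it against doubling (the class and method names are mine, not from the video):

```java
public class GrowthCost {
    // Count element copies made while appending n items, growing one slot
    // at a time: adding item number `size` copies the size - 1 old items.
    public static long copiesGrowByOne(int n) {
        long copies = 0;
        for (int size = 1; size <= n; size++) {
            copies += size - 1;
        }
        return copies;   // n(n-1)/2 total: order n squared
    }

    // Count element copies when the capacity doubles each time we run out
    // of room, starting from capacity 1.
    public static long copiesDoubling(int n) {
        long copies = 0;
        int capacity = 1;
        for (int size = 1; size <= n; size++) {
            if (size > capacity) {      // full: copy everything to a bigger array
                copies += capacity;
                capacity *= 2;
            }
        }
        return copies;   // 1 + 2 + 4 + ... : always fewer than 2n copies
    }

    public static void main(String[] args) {
        System.out.println(copiesGrowByOne(1000)); // about n^2 / 2 copies
        System.out.println(copiesDoubling(1000));  // fewer than 2n copies
    }
}
```

For 1000 appends, growing one slot at a time performs 499,500 copies, while doubling performs only 1,023.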
A better way is to make the new array proportionally bigger than its current size, even if that's more than you immediately need. Starting with this size 10 array, when we run out of space, we could double the array size. So, once you realize that you have an 11th item and your array only has 10 spaces, you allocate a new size 20 array, copy over the first 10 items, and then add the 11th. We keep track of the fact that our data structure has size 11, but capacity 20.
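A minimal sketch of that grow-on-demand append might look like this; the `IntList` class is hypothetical, just to show the size/capacity distinction (real implementations like java.util.ArrayList are more careful):

```java
import java.util.Arrays;

// Minimal dynamic array sketch: a plain int[] plus a size counter.
public class IntList {
    private int[] data = new int[10]; // capacity 10, to match the example
    private int size = 0;             // how many slots are actually used

    public void add(int x) {
        if (size == data.length) {
            // Full: allocate a twice-as-big array and copy everything over.
            data = Arrays.copyOf(data, 2 * data.length);
        }
        data[size++] = x; // stick it in the first unfilled location
    }

    public int get(int i) { return data[i]; }
    public int size()     { return size; }
    public int capacity() { return data.length; }

    public static void main(String[] args) {
        IntList list = new IntList();
        for (int i = 0; i < 11; i++) list.add(i);  // 11th add triggers a doubling
        System.out.println(list.size() + " / " + list.capacity()); // prints 11 / 20
    }
}
```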
Now, there's still more room in the array, and if we want to insert a 12th item at the end of the data structure, we just stick it there and increment the size. For this data structure, by default, we always add new elements to the first unfilled location. We don't allow skipping, and if you add an item at an earlier location, you have to shift items over to make room for it, which kills your performance.
While inserting only at the end is a bit of a limitation, the underlying array structure does let us directly access any item in constant time. If the size is less than our capacity, we can also add to or delete from the end of the list in constant time.
Of course, if you really want to insert somewhere in the middle of the array, you can, but it takes time proportional to the number of items after the position you are inserting to, because we will shift all of those items over to make room. Deletion from the middle also forces a shift to fill the blank space.
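The shifting that middle insertion and deletion force can be sketched like this (the helper names are mine):

```java
import java.util.Arrays;

// Sketch of middle insertion/deletion on a plain array, to show the shifting
// loop that indexed operations hide.
public class ShiftDemo {
    // Insert x at position i, shifting a[i..size-1] one slot to the right.
    // Assumes the array still has room (size < a.length).
    public static void insertAt(int[] a, int size, int i, int x) {
        for (int j = size; j > i; j--) {
            a[j] = a[j - 1];       // everything after position i moves over
        }
        a[i] = x;
    }

    // Delete position i, shifting a[i+1..size-1] left to fill the blank space.
    public static void removeAt(int[] a, int size, int i) {
        for (int j = i; j < size - 1; j++) {
            a[j] = a[j + 1];
        }
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30, 40, 0};     // size 4, capacity 5
        insertAt(a, 4, 1, 15);             // shifts three items to make room
        System.out.println(Arrays.toString(a)); // prints [10, 15, 20, 30, 40]
    }
}
```

Both loops touch every item after the changed position, which is exactly why these operations cost time proportional to how far from the end you are.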
Insertion and deletion from the middle of the array aren't really natural operations; they force us to loop over everything after the changed location. The ArrayList class in Java has methods for indexed insertion or deletion, but it is internally implemented with a loop. They may have optimized that loop to try to make it fast, but you shouldn't forget it's there. The default position for insertion or deletion is the end of the list. Okay, what happens when we want to insert at the end, but the array is full? We can't just stick the item in place and increment the size anymore. Instead, we need to allocate new space and copy all of the items over, taking linear time in the total size. Can we fix that?
Well, it turns out that we don't need to. Because we double the array size when growing, those expensive insertions can't happen very often. Inserting at the end is expensive when going from size 10 to 11, but then we get a whole bunch of constant time insertions to go all the way up to 20. Then we get another expensive insertion, twice as expensive as the previous one, but then we get cheap insertions all the way up to size 40. Some insertions are expensive, but on average they take constant time each.

This is called amortized analysis: instead of looking at the cost of each operation individually, we consider the total runtime for an entire sequence of operations, and use this to get the average performance per operation. For this data structure, the average performance of inserting at the back of the list is constant time. You can look at best case average performance per insertion, or the worst case average performance, or even the average case average performance; they are all constant time.
How do we prove that insertion takes on average constant time per operation? Imagination and incense. No, going back to our half-full array of capacity 20, imagine that every time we insert into the array, if we have space, we not only pay for the insertion, but also prepay to copy 2 values to a new location sometime in the future. Including this extra imagined payment, insertion still takes constant time, even if it is a few times larger than the real time insertion takes. Clearly, adding that extra time to my time analysis isn't going to let me undercount how much real time is used to insert items.
When we go to insert the 21st item, we first allocate the new array, size 40, and copy over 20 items to it. But for those last 10 insertions I did, I have already included the cost of 2 copies each in my time analysis. In our analysis, we withdraw against our prepaid copy operations, and we have already deposited enough to copy over all 20 items. The actual copying doesn't happen until now, but if I am analyzing all of the time spent from the beginning, we see that we have already accounted for the time of these copies during the last 10 insertions. For the 21st item, I only still need to pay for its insertion, and with it I will also prepay for two copy operations for the next time the array grows.
This is called the accounting method for amortized analysis. Your analysis overcharges for some operations, and uses those prepaid overages to pay for expensive operations which may come later. As long as the analysis for any sequence from the start charges at least the real-life amount, you can use it to get valid average upper bounds.
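The accounting argument can even be checked by simulation: deposit 2 credits with each insertion, withdraw one credit per item copied at each growth, and watch the balance. This is a sketch with my own names, starting from the video's half-full capacity-20 array:

```java
public class AccountingDemo {
    // Returns the lowest the credit balance ever gets over a run of
    // end-insertions; it never going negative means the prepayments
    // always cover the copying done when the array grows.
    public static long minBalance(int insertions) {
        int capacity = 20, size = 10;  // half-full capacity-20 array
        long credits = 0, min = 0;
        for (int k = 0; k < insertions; k++) {
            if (size == capacity) {
                credits -= capacity;   // withdraw: copy `capacity` items over
                min = Math.min(min, credits);
                capacity *= 2;         // grow by doubling
            }
            size++;
            credits += 2;              // prepay 2 future copies per insertion
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(minBalance(1_000_000)); // prints 0
    }
}
```

The balance lands on exactly 0 at every growth: the insertions since the last doubling deposit precisely enough to pay for copying the whole array.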
Okay, how about deletion? If you want to delete from the end of the list, clearly you can do that in constant time, but is there any reason to shrink the size of the underlying array if it is too empty? Before figuring out how to do that, is it worth doing? It is.
Imagine that we are simulating a set of a billion items moving around between a thousand lists. On average, the lists have a million items each, but occasionally, maybe most of the items end up being in one list. When that happens, that list's size will grow to hold a billion items. If different lists are crowded at different times, and you never shrink any of them, eventually you end up trying to allocate a thousand arrays of size a billion each, even though you only have a billion items. More generally, if you never shrink, m lists holding n items total can take order nm space, instead of n + m space.
So when should we shrink the arrays? First of all, it depends on your growth factor when you grow your arrays. We have been using growth factor 2, so you can't shrink them the moment they go under half full, or you could end up with a bunch of expensive operations if you ever alternated between inserting and deleting right near a boundary: between sizes 19 and 21 here, for example, it would be slow. But if you double when growing and only shrink when you go under, let's say, one quarter full, you will be safe. Maybe have some minimum size too.
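A sketch of that rule, double when growing but halve only when under one quarter full, with a minimum capacity (the class name and threshold handling are mine):

```java
import java.util.Arrays;

// Dynamic array sketch with shrinking on deletion.
public class ShrinkingList {
    private static final int MIN_CAPACITY = 10;
    private int[] data = new int[MIN_CAPACITY];
    private int size = 0;

    public void add(int x) {
        if (size == data.length) {
            data = Arrays.copyOf(data, 2 * data.length); // grow by doubling
        }
        data[size++] = x;
    }

    public int removeLast() {
        int x = data[--size];
        // Shrink only when under a quarter full, so alternating inserts and
        // deletes near a boundary can't trigger repeated expensive resizes.
        if (size < data.length / 4 && data.length / 2 >= MIN_CAPACITY) {
            data = Arrays.copyOf(data, data.length / 2);
        }
        return x;
    }

    public int size()     { return size; }
    public int capacity() { return data.length; }

    public static void main(String[] args) {
        ShrinkingList list = new ShrinkingList();
        for (int i = 0; i < 41; i++) list.add(i);   // grows 10 -> 20 -> 40 -> 80
        while (list.size() > 19) list.removeLast(); // drops under a quarter full
        System.out.println(list.capacity());        // prints 40
    }
}
```

After shrinking, the array is exactly half full, so the next resize in either direction is a linear number of cheap operations away.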
Using analysis similar to the insertion's, where every deletion removes one item from the end of the list but also prepays for a copy operation, we can prove that each deletion also takes amortized constant time. If we implement shrinking, ArrayLists use only linear space in their size.
Now that we know how ArrayLists are implemented, it is easy to see what operations they give, and also how to make an iterator for them. They support constant average time insertion and deletion at the last position, as well as constant time indexed item retrieval. Because of that quick retrieval, to make an iterator, the only information you need to store is an index, but that iterator shouldn't really be used to mess around with the middle of the list by inserting or deleting there.
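Such an index-only iterator might look like this over a plain array (a sketch with my own names; Java's real ArrayList iterator also tracks a modification count to fail fast on concurrent changes):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Iterator over the first `size` slots of an int array: the only state it
// needs is the current index, because indexed retrieval is constant time.
public class IndexIterator implements Iterator<Integer> {
    private final int[] data;
    private final int size;
    private int index = 0;

    public IndexIterator(int[] data, int size) {
        this.data = data;
        this.size = size;
    }

    @Override public boolean hasNext() { return index < size; }

    @Override public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return data[index++];   // constant-time indexed retrieval
    }

    public static void main(String[] args) {
        IndexIterator it = new IndexIterator(new int[]{1, 2, 3}, 3);
        while (it.hasNext()) System.out.println(it.next()); // prints 1, 2, 3
    }
}
```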
If you want to get into boss level topics like how to optimize your ArrayList growth factor, I will cover that in a separate video in the playlist, but that's it for this one. I've got to go work on my pirate accent. ArrrrrayList.