Data Structures: Crash Course Computer Science #14 - YouTube

Channel: CrashCourse

[3]
Hi, I'm Carrie Anne, and welcome to Crash Course Computer Science!
[5]
Last episode, we discussed a few example classic algorithms, like sorting a list of numbers
[9]
and finding the shortest path in a graph.
[11]
What we didn’t talk much about, is how the data the algorithms ran on was stored in computer
[16]
memory.
[16]
You don’t want your data to be like John Green’s college dorm room, with food, clothing
[19]
and papers strewn everywhere.
[21]
Instead, we want our data to be structured, so that it’s organized, allowing things
[25]
to be easily retrieved and read.
[27]
For this, computer scientists use Data Structures!
[29]
INTRO
[39]
We already introduced one basic data structure last episode, Arrays, also called lists or
[43]
Vectors in some languages.
[45]
These are a series of values stored in memory.
[47]
So instead of just a single value being saved into a variable, like ‘j equals 5’, we
[51]
can define a whole series of numbers, and save that into an array variable.
[55]
To be able to find a particular value in this array, we have to specify an index.
[59]
Almost all programing languages start arrays at index 0, and use a square bracket syntax
[64]
to denote array access.
[65]
So, for example, if we want to add the values in the first and third spots of our array
[70]
‘j’, and save that into a variable ‘a’, we would write a line of code like this.
[74]
How an array is stored in memory is pretty straightforward.
[76]
For simplicity, let’s say that the compiler chose to store ours at memory location 1,000.
[81]
The array contains 7 numbers, and these are stored one after another in memory, as seen here.
[86]
So when we write “j index of 0”, the computer goes to memory location 1,000, with an offset
[91]
of 0, and we get the value 5.
[93]
If we wanted to retrieve “j index of 5”, our program goes to memory location 1000,
[98]
plus an offset of 5, which in this case, holds a value of 4.
[102]
It’s easy to confuse the fifth number in the array with the number at index 5.
[105]
They are not the same.
[106]
Remember, the number at index 5 is the 6th number in the array because the first number
[110]
is at index 0.
[111]
Arrays are extremely versatile data structures, used all the time, and so there are many functions
[115]
that can handle them to do useful things.
[117]
For example, pretty much every programming language comes with a built-in sort function,
[121]
where you just pass in your array, and it comes back sorted.
[124]
So there’s no need to write that algorithm from scratch.
[126]
Very closely related are Strings, which are just arrays of characters, like letters, numbers,
[131]
punctuation and other written symbols.
[133]
We talked about how computers store characters way back in Episode 4.
[136]
Most often, to save a string into memory, you just put it in quotes, like so.
[140]
Although it doesn’t look like an array, it is.
[143]
Behind the scenes, the memory looks like this.
[144]
Note that the string ends with a zero in memory.
[147]
It’s not the character zero, but the binary value 0.
[151]
This is called the null character, and denotes the end of the string in memory.
[154]
This is important because if I call a function like “print quote”, which writes the string
[158]
to the screen, it prints out each character in turn starting at the first memory location,
[162]
but it needs to know when to stop!
[164]
Otherwise, it would print out every single thing in memory as text.
[167]
The zero tells string functions when to stop.
[170]
Because computers work with text so often, there are many functions that specifically
[174]
handle strings.
[175]
For example, many programming languages have a string concatenation function, or “strcat”,
[179]
which takes in two strings, and copies the second one to the end of the first.
[183]
We can use arrays for making one dimensional lists, but sometimes you want to manipulate
[187]
data that is two dimensional, like a grid of numbers in a spreadsheet, or the pixels
[191]
on your computer screen.
[192]
For this, we need a Matrix.
[194]
You can think of a Matrix as an array of arrays!
[197]
So a 3 by 3 matrix is really 2 an array of size 3, with each index storing an array of
[201]
size 3.
[203]
We can initialize a matrix like so.
[204]
In memory, this is packed together in order like this.
[207]
To access a value, you need to specify two indexes, like “J index of 2, then index
[212]
of 1” - this tells the computer you’re looking for the item in subarray 2 at position 1.
[217]
And this would give us the value 12.
[219]
The cool thing about matrices is we’re not limited to 3 by 3 -- we can make them any
[222]
size we want -- and we can also make them any number of dimensions we want.
[226]
For example, we can create a five dimensional matrix and access it like this.
[230]
That’s right, you now know how to access a five dimensional matrix -- tell your friends!
[234]
So far, we’ve been storing individual numbers or letters into our arrays or matrices.
[239]
But often it’s useful to store a block of related variables together.
[242]
Like, you might want to store a bank account number along with its balance.
[245]
Groups of variables like these can be bundled together into a Struct.
[249]
Now we can create variables that aren’t just single numbers, but are compound data
[252]
structures, able to store several pieces of data at once.
[256]
We can even make arrays of structs that we define, which are automatically bundled together
[260]
in memory.
[261]
If we access, for example, J index of 0, we get back the whole struct stored there, and
[265]
we can pull the specific account number and balance data we want.
[268]
This array of structs, like any other array, gets created at a fixed size that can’t
[272]
be enlarged to add more items.
[274]
Also, arrays must be stored in order in memory, making it hard to add a new item to the middle.
[278]
But, the struct data structure can be used for building more complicated data structures
[282]
that avoid these restrictions.
[284]
Let’s take a look at this struct that’s called a “node”.
[286]
It stores a variable, like a number, and also a pointer.
[289]
A pointer is a special variable that points, hence the name, to a location in memory.
[293]
Using this struct, we can create a linked list, which is a flexible data structure that
[297]
can store many nodes.
[299]
It does this by having each node point to the next node in the list.
[302]
Let’s imagine we have three node structs saved in memory, at locations 1000, 1002 and 1008.
[309]
They might be spaced apart, because they were created at different times, and other data
[313]
can sit between them.
[314]
So, you see that the first node contains the value 7, and the location 1008 in its “next”
[319]
pointer.
[320]
This means that the next node in the linked list is located at memory location 1008.
[324]
Looking down the linked list, to the next node, we see it stores the value 112 and points
[328]
to another node at location 1002.
[331]
If we follow that, we find a node that contains the value 14 and points back to the first
[336]
node at location 1000.
[337]
So this linked list happened to be circular, but it could also have been terminated by
[341]
using a next pointer value of 0 -- the null value -- which would indicate we’ve reached
[345]
the end of the list.
[347]
When programmers use linked lists, they rarely look at the memory values stored in the next
[350]
pointers.
[351]
Instead, they can use an abstraction of a linked list, that looks like this, which is
[355]
much easier to conceptualize.
[356]
Unlike an array, whose size has to be pre-defined, linked lists can be dynamically extended or
[361]
shortened.
[362]
For example, we can allocate a new node in memory, and insert it into this list, just
[366]
by changing the next pointers.
[367]
Linked Lists can also easily be re-ordered, trimmed, split, reversed, and so on.
[372]
Which is pretty nifty!
[373]
And pretty useful for algorithms like sorting, which we talked about last week.
[376]
Owing to this flexibility, many more-complex data structures are built on top of linked lists
[381]
The most famous and universal are queues and stacks.
[383]
A queue – like the line at your post office – goes in order of arrival.
[387]
The person who has been waiting the longest, gets served first.
[390]
No matter how frustrating it is that all you want to do is buy stamps and the person in
[393]
front of you seems to be mailing 23 packages.
[396]
But, regardless, this behavior is called First-In First-Out, or FIFO.
[399]
That’s the first part.
[401]
Not the 23 packages thing.
[402]
Imagine we have a pointer, named “post office queue”, that points to the first node in
[406]
our linked list.
[407]
Once we’re done serving Hank, we can read Hank’s next pointer, and update our “post
[411]
office queue” pointer to the next person in the line.
[414]
We’ve successfully dequeued Hank -- he’s gone, done, finished.
[418]
If we want to enqueue someone, that is, add them to the line, we have to traverse down
[422]
the linked list until we hit the end, and then change that next pointer to point to
[425]
the new person.
[426]
With just a small change, we can use linked lists as stacks, which are LIFO

[430]
Last-In First-Out.
[431]
You can think of this like a stack of pancakes... as you make them, you add them to the top
[435]
of stack.
[436]
And when you want to eat one, you take them from the top of the stack.
[439]
Delicious!
[440]
Instead of enqueueing and dequeuing, data is pushed onto the stack and popped from the stacks.
[445]
Yep, those are the official terms!
[447]
If we update our node struct to contain not just one, but two pointers, we can build trees,
[452]
another data structure that’s used in many algorithms.
[454]
Again, programmers rarely look at the values of these pointers, and instead conceptualize
[458]
trees like this: The top most node is called the root.
[462]
And any nodes that hang from other nodes are called children nodes.
[465]
As you might expect, nodes above children are called parent nodes.
[468]
Does this example imply that Thomas Jefferson is the parent of Aaron Burr?
[471]
I’ll leave that to your fanfiction to decide.
[474]
And finally, any nodes that have no children -- where the tree ends -- are called Leaf Nodes.
[478]
In our example, nodes can have up to two children, and for that reason, this particular data
[483]
structure is called a binary tree.
[484]
But you could just as easily have trees with three, four or any number of children by modifying
[489]
the data structure accordingly.
[490]
You can even have tree nodes that use linked lists to store all the nodes they point to.
[495]
An important property of trees – both in real life and in data structures – is that
[498]
there’s a one-way path from roots to leaves.
[500]
It’d be weird if roots connected to leaves, that connected to roots.
[503]
For data that links arbitrarily, that include things like loops, we can use a graph data
[508]
structure instead.
[509]
Remember our graph from last episode of cities connected by roads?
[512]
This can be stored as nodes with many pointers, very much like a tree, but there is no notion
[516]
of roots and leaves, and children and parents

[518]
Anything can point to anything!
[520]
So that’s a whirlwind overview of pretty much all of the fundamental data structures
[523]
used in computer science.
[525]
On top of these basic building blocks, programmers have built all sorts of clever variants, with
[529]
slightly different properties -- data structures like red-black trees and heaps, which we don’t
[533]
have time to cover.
[534]
These different data structures have properties that are useful for particular computations.
[539]
The right choice of data structure can make your job a lot easier, so it pays off to think
[542]
about how you want to structure your data before you jump in.
[545]
Fortunately, most programming languages come with libraries packed full of ready-made data structures.
[550]
For example, C++ has its Standard Template Library, and Java has the Java Class Library.
[555]
These mean programmers don’t have to waste time implementing things from scratch, and
[559]
can instead wield the power of data structures to do more interesting things, once again
[564]
allowing us to operate at a new level of abstraction!
[567]
I’ll see you next week.