Our data is GONE... Again - Petabyte Project Recovery Part 1 - YouTube
[0]
- When I say we store
and handle a lotta data
[2]
for a YouTube channel, I mean it.
[5]
I mean, we've built some sick
[6]
hundred plus terabyte servers
[8]
for some of our fellow YouTubers,
[10]
but those are nothing
[11]
compared to the two plus
petabytes of archival storage
[16]
that we currently have in
production in our server room
[18]
that is storing all the footage
[20]
for every video we have
ever made, at full quality.
[25]
For the uninitiated,
[26]
that is over 11,000 Warzone
installs' worth of data.
[31]
But with great power comes
great responsibility,
[34]
and we weren't responsible.
[38]
Despite our super dope hardware,
we made a little oopsie
[43]
that resulted in us
permanently losing data
[45]
that we don't have any backup for.
[48]
We still don't know how much,
[49]
but what we do know is what went wrong
[52]
and we've got a plan
to recover what we can,
[54]
but it is going to take
some work, and some money,
[58]
thanks to our sponsor, Hetzner.
[60]
Hetzner offers
high-performance Cloud Servers
[62]
for an amazing price.
[64]
With their new US location
in Ashburn, Virginia,
[66]
you can deploy cloud servers
in four different locations
[69]
and benefit from features
like Load Balancers,
[70]
Block Storage and more.
[72]
Use code LTT22 at the
link below for $20 off.
[79]
(upbeat music)
[85]
Let's start with a bit of
background on our servers.
[87]
Our archival storage is composed
[89]
of two discrete GlusterFS clusters.
[92]
Both of them spread across two
45Drives Storinator servers,
[96]
each with 60 hard drives.
[99]
The original petabyte project,
[101]
is made up of the Delta
1 and Delta 2 servers,
[104]
and goes by the moniker Old Vault.
[107]
Petabyte project two, or the New Vault
[109]
is Delta 3 and Delta 4.
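For reference, a two-server GlusterFS cluster like this is normally stitched together from one "brick" per machine; the volume name and brick paths below are made-up placeholders rather than our real layout.

    # Hypothetical sketch of how a cluster like Old Vault gets assembled,
    # assuming each Delta server exposes its local storage as one brick.
    gluster peer probe delta2
    gluster volume create oldvault delta1:/tank/brick delta2:/tank/brick
    gluster volume start oldvault
    # Clients then mount one combined namespace:
    mount -t glusterfs delta1:/oldvault /mnt/oldvault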
[112]
Now, because of the nature of our content,
[113]
most of our employees
are pretty tech literate
[117]
with many of them even falling
[118]
into the tech wizard category.
[120]
So, we've always had
substantially lower need
[122]
for tech support than the average company.
[125]
And as a result, we have never
hired a full-time IT person,
[129]
despite the handful of times,
perhaps including this one,
[132]
that it probably would have been helpful.
[134]
So, in the early days, I
managed the infrastructure,
[138]
and since then I've had some
help from both outside sources,
[144]
and other members of the writing team.
[149]
We all have different strengths,
[151]
but what we all have in common
[153]
is that we have other jobs to do,
[155]
meaning that it's never really been clear
[158]
who exactly is supposed to be accountable
[161]
when something slips through the cracks.
[163]
And unfortunately, while obvious issues
[166]
like, a replacement power cable
[168]
and a handful of failed
drives over the years
[171]
were handled by Anthony,
[172]
we never really tasked anyone
[174]
with performing preventative maintenance
[176]
on our precious petabyte servers.
[179]
A quick point of clarification
[180]
before we get into the rest of this.
[182]
Nothing that happened was
the result of anything
[184]
other than us messing up.
[186]
The hardware, both from
45Drives and from Seagate
[190]
who provided the bulk of what makes up
[191]
our petabyte project servers,
[193]
has performed beyond our expectations
[196]
and we would recommend
checking out both of them,
[198]
if you or your business has
serious data storage needs.
[201]
We're gonna have links to them down below.
[203]
But even the best hardware
in the world can be let down
[207]
by misconfigured software.
[209]
And Jake, who tasked himself
[211]
with auditing our current infrastructure,
[213]
found just such a thing.
[215]
Everything was actually going pretty well.
[217]
He was setting up monitoring and alerts,
[219]
verifying that every machine
would gracefully shut down
[222]
when the power goes out,
[223]
which happens a lot here for some reason,
[225]
but he eventually worked his way around
[227]
to the petabyte project servers
[228]
and checked the status of
the ZFS pools or Zpools
[232]
on each of them.
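Checking that status is just a couple of read-only commands; this is a minimal sketch and makes no assumptions about our actual pool names.

    # Overall pool health, any faulted drives, and files with unrecoverable errors
    zpool status -v
    # One-line summary per pool
    zpool list -o name,size,health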
[233]
And this is where the caca hit the fan.
[236]
Right off the bat, Delta 1 had
two of its 60 drives faulted
[240]
in the same Vdev.
[242]
And you can think of a Vdev,
[243]
kind of like its own mini RAID array
[246]
within a larger pool of
multiple RAID arrays.
[250]
So, in our configuration
where we're running RAID-Z2,
[253]
if another disk out of our 15-drive Vdev
[256]
was to have any kind of problem,
[258]
we would incur irrecoverable data loss.
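To make that concrete, here is a minimal sketch of a pool built from 15-drive RAID-Z2 vdevs like the ones described; the pool and device names are invented.

    # Each raidz2 group below is one vdev and can survive losing any 2 of its 15 drives.
    # A third failure inside the SAME vdev means unrecoverable data loss for the whole pool.
    zpool create vault \
      raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo \
      raidz2 sdp sdq sdr sds sdt sdu sdv sdw sdx sdy sdz sdaa sdab sdac sdad
    # ...two more 15-drive raidz2 vdevs would round this out to 60 drives.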
[261]
Upon further inspection,
both of the drives
[263]
were completely dead,
[265]
which does happen with mechanical devices
[267]
and had dropped from the system.
[269]
So, we replaced them and let
the array start rebuilding.
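Swapping a dead disk is a one-liner per drive, and ZFS calls the rebuild a resilver; the pool and device names below are placeholders.

    # Rebuild onto the new drive; resilvering starts automatically
    zpool replace vault sdf /dev/disk/by-id/NEW-DRIVE
    # Watch resilver progress and remaining errors
    zpool status vault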
[272]
That's pretty scary, but not
in and of itself a lost cause.
[277]
More on that later though.
[278]
Far scarier was when Delta 3,
[280]
which is part of the New Vault cluster
[283]
had five drives in a faulted state
[286]
with two of the Vdevs
having two drives down.
[289]
That's very dangerous.
[292]
Interestingly, these drives
weren't actually dead,
[296]
instead, they had just faulted
[297]
due to having too many errors.
[300]
So, read and write errors like this
[302]
are usually caused by a
faulty cable or a connection,
[305]
but they can also be the
sign of a dying drive.
[308]
In our case, these errors
probably cropped up
[310]
due to a sudden power loss
[311]
or due to naturally occurring bit rot,
[314]
as they were never configured
[315]
to shut down nicely while on backup power,
[317]
in the case of an outage.
[318]
And we've had quite a few
of those over the years.
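The usual fix is a UPS daemon that triggers a clean shutdown on battery. Here is a minimal sketch using apcupsd as an example, which may or may not match the gear in our rack, with illustrative values only.

    # /etc/apcupsd/apcupsd.conf (illustrative values only)
    UPSCABLE usb
    UPSTYPE usb
    BATTERYLEVEL 20   # shut down once 20% battery remains
    MINUTES 5         # ...or once ~5 minutes of runtime are left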
[321]
Now, storage systems are usually designed
[324]
to be able to recover from such an event,
[326]
especially ZFS, which is known for being
[328]
one of the most resilient ones out there.
[330]
After booting back up from a power loss,
[332]
ZFS pools and most other RAID
or RAID-like storage arrays,
[336]
should do something called
a scrub or a re-sync,
[339]
which in the case of ZFS means
[341]
that every block of data gets checked
[343]
to ensure that there are no errors.
[345]
And if there are any errors,
[346]
these errors are automatically fixed
[348]
with the parity data that
is stored in the array.
[351]
On most NAS operating systems,
[353]
like TrueNAS, Unraid or any pre-built NAS,
[357]
this process should just
happen automatically.
[359]
And even if nothing goes wrong,
[361]
they should also run a scheduled
scrub every month or so.
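On a hand-rolled setup, both of those have to be wired up manually; a minimal sketch, with an invented pool name, looks something like this.

    # Check every block in the pool against its checksums and repair from parity
    zpool scrub vault
    # See progress plus repaired and unrepairable error counts
    zpool status vault
    # And a monthly schedule, e.g. in /etc/cron.d/zfs-scrub:
    #   0 3 1 * * root /usr/sbin/zpool scrub vault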
[364]
But our servers were set up by
us a long time ago on CentOS
[370]
and never updated.
[371]
So, neither a scheduled nor
a power on recovery scrub
[376]
was ever configured.
[377]
Meaning the only time data integrity
[379]
would have been checked on these arrays,
[381]
is when a block of data got read.
[384]
This function should theoretically
protect against bit rot,
[387]
but since we have thousands of old videos,
[390]
of which a very, very small portion
[393]
ever actually gets accessed,
[395]
the rest were essentially
left to slowly rot
[398]
and power-loss themselves
into an unrecoverable mess.
[402]
When we found the drive issues,
[404]
we weren't even aware of all this yet.
[405]
And even though the five drives
weren't technically dead,
[408]
we erred on the side of caution
[410]
and started a replacement
operation on all of them.
[413]
It was while we were
rebuilding the array on Delta 3
[415]
with the new disks,
[416]
that we started to uncover the
absolute mess of data errors.
[421]
ZFS has reported around 169 million errors
[425]
at the time of recording this.
[427]
And no, it's not nice.
[429]
In fact, there are so
many errors on Delta 3
[432]
that with two faulted drives
in both of the first two Vdevs,
[435]
there is not enough parity
data to fix the errors.
[439]
And this caused the
array to offline itself
[441]
to protect against further degradation.
[444]
And unfortunately, much
further along in the process,
[447]
the same thing happened on Delta 1.
[450]
That means that both the original
and new petabyte projects,
[454]
Old and New Vault, have suffered
nonrecoverable data loss.
[461]
So, now what do we do?
[463]
In regards to the corrupted and
lost data, honestly nothing.
[467]
I mean, it's very likely
[468]
that even with 169 million data errors,
[471]
we still have virtually
[472]
all of the original bits
in the right places.
[476]
But as far as we know,
[477]
there's no way to just tell ZFS,
[480]
"Yo dawg! Ignore those errors, you know,
[482]
"Pretend like they never happened,
[484]
"tow easy ZFS" or something.
[486]
Instead then, the plan is to build a new
[489]
properly configured 1.2 petabyte server,
[492]
featuring Seagate's shiny
new 20 terabyte drives,
[495]
which we're really excited about like,
[496]
these things are almost as shiny
[498]
as our reflective hard
drive shirt, lttstore.com.
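The 1.2 petabyte figure lines up with a 60-bay chassis full of 20 terabyte drives, though that exact drive count is an assumption here.

    # 60 drives x 20 TB = 1,200 TB raw, i.e. ~1.2 PB before RAID-Z2 parity overhead
    echo $((60 * 20))   # 1200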
[502]
And once that's complete,
[503]
we intend to move all of the data
[505]
from the New Vault cluster
onto this New, New Vault.
[508]
- [Jake] All three.
[509]
- New New Vault.
[511]
Then we'll reset up New Vault,
[514]
ensure all the drives are good
[515]
and repeat the process to
move Old Vault's data onto it.
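The copy tool isn't named on camera, but a migration like this is commonly just rsync between the two mounted volumes, since it can be stopped and resumed over weeks of copying; the mount points below are placeholders.

    # Copy everything from New Vault to the new server, preserving metadata,
    # and skip anything that already made it across on a previous run
    rsync -aHAX --partial --info=progress2 /mnt/newvault/ /mnt/newnewvault/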
[519]
Then we can reformat Old
Vault, probably upgrade it a bit
[523]
and use it for new data.
[524]
Maybe we'll rename it
to New, New, New Vault.
[527]
Get subscribed, so, you
don't miss any of that.
[529]
We'll hopefully be building
that new server this week.
[532]
Now, if everything were set up properly
[534]
with regularly scheduled
and post power loss scrubs,
[537]
this entire problem would
probably have never happened.
[541]
And if we had a backup of that data,
[543]
we would be able to
simply restore from that.
[545]
But here's the thing, backing
up over a petabyte of data
[549]
is really expensive.
[551]
Either we would need to build
a duplicate server array
[554]
to backup to, or we could
back up to the cloud.
[557]
But even using the economical
option, Backblaze B2,
[560]
it would cost us somewhere between
[561]
5,000 and 10,000 US dollars per month,
[566]
to store that kind of data.
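That range checks out against Backblaze B2's list price of roughly $5 per terabyte per month at the time; treat the exact rate as an assumption and check current pricing.

    # ~$5/TB/month x 1,000 TB (1 PB) = ~$5,000/month
    # ~$5/TB/month x 2,000 TB (2 PB) = ~$10,000/month
    echo $((5 * 1000)) $((5 * 2000))   # 5000 10000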
[567]
Now, if it was mission critical,
[569]
then by all means it
should have been backed up
[571]
in both of those ways,
[573]
but having all of our archival footage
[575]
from day one of the channel
[576]
has always been a nice to have
[579]
and an excuse for us to
explore really cool tech
[582]
that we otherwise wouldn't
have any reason to play with.
[584]
I mean, it takes a little bit more effort
[586]
and it yields lower quality results,
[588]
but we have a backup of
all of our old videos.
[591]
It's called downloading
them off of YouTube
[593]
or Floatplane, if we wanted
a higher quality copy.
[596]
So, the good news is that
our production Whonnock server
[599]
is running great
[601]
With proper backups configured,
[602]
and this isn't gonna have
any kind of lasting effect
[604]
on our business,
[606]
but I am still hopeful that
[607]
if all goes well with
the recovery efforts,
[609]
we'll be able to get back
the majority of the data,
[611]
mostly error free.
[613]
But only time will tell, a lot of time
[616]
because transferring all
those petabytes of data
[618]
off of hard drives to other hard drives,
[620]
is gonna take weeks or even months.
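Back-of-the-envelope, assuming a sustained ~1 GB/s end to end, which is optimistic for spinning disks over a network:

    # 1 PB ~= 1,000,000 GB; at 1 GB/s that's ~1,000,000 seconds
    # 1,000,000 / 86,400 seconds per day ~= 11.6 days per petabyte, best case
    # At a few hundred MB/s sustained, that stretches to a month or more per petabyte
    echo "scale=1; 1000000 / 86400" | bc   # ~11.5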
[623]
So, let this be a lesson,
[624]
follow proper storage
practices, have a backup
[628]
and probably hire someone
to take care of your data
[631]
if you don't have the time.
[632]
Especially if you measure
it in anything other
[633]
than tens of terabytes,
[636]
or you might lose all of it.
[638]
But you won't lose our sponsor, Lambda.
[641]
Are you training deep learning models
[642]
for the next big breakthrough
in artificial intelligence?
[644]
Then you should know about Lambda,
[646]
the deep learning company.
[647]
Founded by machine learning engineers,
[649]
Lambda builds GPU workstations, servers,
[651]
and cloud infrastructure for
creating deep learning models.
[654]
They've helped all five
of the big tech companies
[656]
and 47 of the top 50 research universities
[659]
accelerate their machine
learning workflows.
[661]
Lambda's easy to use
configurators let you spec out
[663]
exactly the hardware you need
[665]
from GPU laptops and workstations
[667]
all the way up to custom server clusters
[669]
and all Lambda machines come pre-installed
[671]
with Lambda Stack,
[672]
keeping your Linux machine
learning environment up to date
[675]
and out of dependency hell.
[676]
And with Lambda Cloud,
[677]
you can spin up a virtual
machine in minutes,
[679]
train models with 4 NVIDIA A6000s,
[682]
at just a fraction of the cost
of the big cloud providers.
[685]
So, go to Lambdalabs.com/linus
[687]
to configure your own workstation
[688]
or try out Lambda Cloud today.
[690]
If you liked this video,
[691]
maybe check out the time I almost lost
[693]
all of our active projects
when the OG Whonnock server failed.
[698]
That was a far more stressful situation.
[701]
I'm actually pretty relaxed right now
[705]
for someone with this
much data on the line.
[707]
- [Jake] Yeah, must be nice.
[708]
- Yeah, I'm doing okay, thanks for asking.
[712]
I mean, I'd prefer to get
it back, you know.(chuckles)