Speaker:

You found The Backup Wrap-Up, your go-to podcast for all things

Speaker:

backup, recovery, and cyber recovery.

Speaker:

In this episode, you'll hear the harrowing tale of what I'm

Speaker:

calling the backup from hell.

Speaker:

A project that started as a simple one-time backup, a 40 terabyte

Speaker:

of two Synology boxes that turned into a 400-terabyte nightmare

Speaker:

that took months to complete.

Speaker:

We're talking hundreds of millions of files with one directory alone

Speaker:

containing 99 million of them.

Speaker:

I'll share how I dealt with failing tape drives, ridiculously slow

Speaker:

backup speeds, and the ultimate solution that finally got the job done.

Speaker:

If you've ever wondered what happens when everything that could go wrong

Speaker:

with a backup actually goes wrong,

Speaker:

this episode is for you. Plus, you'll learn some valuable lessons about what to check

Speaker:

before starting a massive backup job.

Speaker:

By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.

Speaker:

Backup, and I've been passionate about backup and recovery for

Speaker:

over 30 years, ever since

Speaker:

I had to tell my boss that we had no backups of the production

Speaker:

database that we just lost.

Speaker:

I don't want that to happen to you, and that's why I do this show.

Speaker:

On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.

Speaker:

This is The Backup Wrap-Up.

Speaker:

Welcome to the show, and if I could ask you to just take one quick second

Speaker:

and, uh, subscribe or follow us so you can make sure that you get all of this

Speaker:

great content, that would be great.

Speaker:

I'm W. Curtis Preston, AKA Mr.

Speaker:

Backup, and I have with me a guy that apparently owes Ben Kingsley

Speaker:

a huge apology, Prasanna Malaiyandin.

Speaker:

how's it going?

Speaker:

Prasanna, why do you owe

Speaker:

an apology?

Speaker:

So everyone's probably like, who's Ben Kingsley?

Speaker:

So if you don't know, he is an actor and he also played Gandhi in the movie Gandhi.

Speaker:

He did.

Speaker:

Right?

Speaker:

And for the longest time I was a little, not upset, but like the fact that you have

Speaker:

like probably one of the most important Indian people in history being played

Speaker:

By a guy with the name Ben Kingsley.

Speaker:

Exactly.

Speaker:

Yeah.

Speaker:

Ben Kingsley.

Speaker:

And so today I found out that Ben Kingsley is actually Indian.

Speaker:

Half

Speaker:

How about that?

Speaker:

should say.

Speaker:

Yeah,

Speaker:

what?

Speaker:

he's Anglo Indian.

Speaker:

Anglo Indian.

Speaker:

Yes.

Speaker:

It's like us.

Speaker:

You and me we're Indian.

Speaker:

so his paternal side is from Gujarat.

Speaker:

Right.

Speaker:

And his mom's side I think is European.

Speaker:

His dad was a physician who was born in Kenya.

Speaker:

And Ben Kingsley's name is not actually Ben Kingsley.

Speaker:

It's like Krishna Bhanji, I think.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And he realized that he wasn't getting called into the right casting

Speaker:

roles when he was looking for, when he was starting off his career.

Speaker:

So he is like, let me change my name.

Speaker:

And so he changed his name to Ben Kingsley and people started calling

Speaker:

him in and he started getting roles.

Speaker:

Racism in early Hollywood? Say it isn't so.

Speaker:

Racism in current Hollywood.

Speaker:

Say it isn't so. He wouldn't be the only one to do so.

Speaker:

Yeah.

Speaker:

yeah, so I apologize to Sir Ben Kingsley, uh, for all these years.

Speaker:

Yeah.

Speaker:

You were putting it in the same category as the quote unquote

Speaker:

Indian guy from the Short Circuit movie, which I don't know his name,

Speaker:

but he is very much not an Indian

Speaker:

person.

Speaker:

Do you know who it was?

Speaker:

the name?

Speaker:

I'm looking it up.

Speaker:

Or it's also like how Apu from, uh, the Simpsons is not Indian,

Speaker:

Yeah, he's, he's played by, um.

Speaker:

Oh, I know that.

Speaker:

I know the actor, but his name is escaping me.

Speaker:

So Fisher Stevens, is that

Speaker:

Fisher Stevens.

Speaker:

Yeah, Fisher Stevens.

Speaker:

Who?

Speaker:

Those of you that watch Succession

Speaker:

will, uh, uh, Fisher Stevens was in Succession.

Speaker:

He was, he was a lawyer, a smarmy lawyer. He

Speaker:

always plays smarmy characters

Speaker:

Yeah, I was just thinking, because I remember him from The Blacklist

Speaker:

where he plays Marvin, the lawyer.

Speaker:

Yeah, he's got kind of the lawyer face.

Speaker:

I'm glad that you, you finally realized the error of your ways.

Speaker:

But did you know he was

Speaker:

No, no, I didn't.

Speaker:

I guess I always brought it up just like you, like I would bring

Speaker:

Ben Kingsley playing Gandhi and, um, as just another example of, uh, you

Speaker:

know, what would we call it, brown face, I guess we'd call it brown face.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

But people taking actor and there's been a lot of those great roles throughout the

Speaker:

Great.

Speaker:

You know, great roles played by very not,

Speaker:

you know, people that are not of that ethnic group.

Speaker:

Yeah.

Speaker:

and I think maybe also at the time, right, there weren't many

Speaker:

Indian actors in Hollywood at all.

Speaker:

And I would rather have the fact, or I would rather it like the movie be made

Speaker:

with someone who is non-Indian, rather, because it's a great movie.

Speaker:

I

Speaker:

don't know.

Speaker:

You've seen it,

Speaker:

good movie.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

So I would rather have that rather than not having the movie at all.

Speaker:

Hmm, I see what you're saying.

Speaker:

I see what you're saying.

Speaker:

Yeah.

Speaker:

And of course, you know, we have the same challenge with, uh, Asian, uh, actors,

Speaker:

right?

Speaker:

Uh, there's literally only three Chinese actors in all of Hollywood.

Speaker:

Like if you, if you look at like the Chinese roles, they've gone to

Speaker:

literally like one, there's one guy.

Speaker:

Uh, I forgot how many roles he's had, but he has had a prolific career playing

Speaker:

every Chinese person that you know.

Speaker:

Um, but, um, anyway, so we're gonna talk about something that we've

Speaker:

alluded to a little bit on the podcast.

Speaker:

Uh, sort of tell the final saga of what I'm calling the backup from Hell.

Speaker:

I may maybe, uh, we should probably phrase that slightly differently.

Speaker:

It's probably the,

Speaker:

the backup that keeps giving.

Speaker:

the back, the backup that, yeah.

Speaker:

Uh, what a mess.

Speaker:

The beginning of the story

Speaker:

is that I was asked to do a backup of two Synology boxes that they

Speaker:

were, uh, repurposing, right?

Speaker:

So they were, um, going to move the data.

Speaker:

They, they were gonna reuse these servers, but they wanted to get a backup of the, of

Speaker:

the, the data before they moved it off of

Speaker:

Backup is good.

Speaker:

Yeah,

Speaker:

Backup is good.

Speaker:

Yeah.

Speaker:

Apparently they hadn't had a backup of the, of these servers before.

Speaker:

And, um, they said it was

Speaker:

about 40 terabytes of data.

Speaker:

That's the information that I was given and after I had started doing

Speaker:

the backup, I very quickly realized that 40 terabytes might have been

Speaker:

an understatement.

Speaker:

You, found additional data around

Speaker:

right as you

Speaker:

data.

Speaker:

Yeah.

Speaker:

Uh, so it turned out that it wasn't like 40 terabytes of data.

Speaker:

It was more like 400 terabytes of

Speaker:

Yeah, and

Speaker:

I'm guessing because these were systems that were kind of probably off on the

Speaker:

side, they hadn't been used in a while.

Speaker:

Like that's, I think, the problem, and I think we talked about this in one of

Speaker:

our episodes about sort of systems that kind of get stored away in the corner.

Speaker:

No one worries about

Speaker:

it.

Speaker:

Right?

Speaker:

And do you leave it powered on your old backup systems?

Speaker:

Right.

Speaker:

We just talked about that.

Speaker:

And so I think that becomes a challenge.

Speaker:

It's when you have these systems that are no longer actively being

Speaker:

used, it kind of gets away from you.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And so the customer really didn't have any idea just how much data that they

Speaker:

were dealing with here. It turned out to be, like I said, close to half a petabyte of

Speaker:

Yeah.

Speaker:

And, and for you, that changes things significantly because

Speaker:

changes the backup design like massively.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

because your backup target, I think you had mentioned previously

Speaker:

that it was like a server, right?

Speaker:

That you were backing this data up to

Speaker:

I was backing it up via a server, a Windows server.

Speaker:

And, um, and tape, right?

Speaker:

We had tape, but it was sized for, um, you know, 40 terabytes.

Speaker:

And so, which is, which is basically the, the, the server and the tape

Speaker:

library was the perfect size for that.

Speaker:

But then I started realizing that it was filling up.

Speaker:

And again, this, this is my fault for not really looking at the size of the

Speaker:

data before really jumping in there, but basically I realized very quickly

Speaker:

that this was a whole lot more data

Speaker:

than,

Speaker:

than,

Speaker:

Than you expected. And I think just kind of looking at lessons

Speaker:

learned as you're a backup admin who is being told, Hey, this new

Speaker:

application is coming online.

Speaker:

Make sure that you understand like what is the expected growth of that application.

Speaker:

Because what you size for, say, a five terabyte database with a 1% growth

Speaker:

is very different than like a file server with like a 50% growth rate.

Speaker:

Yeah, exactly.
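To put rough numbers on that comparison, here is a small Python sketch. It assumes the growth rates are annual and compound, and it picks a three-year horizon; neither the period nor the horizon was stated in the conversation, and `projected_size_tb` is just an illustrative name.

```python
def projected_size_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting size by a fixed annual growth rate (assumed model)."""
    return start_tb * (1 + annual_growth) ** years


# Same 5 TB starting point, two very different growth assumptions.
for label, rate in (("database at 1% per year", 0.01),
                    ("file server at 50% per year", 0.50)):
    size = projected_size_tb(5.0, rate, 3)
    print(f"5 TB {label}, after 3 years: {size:.2f} TB")
```

Under those assumptions the database barely moves while the file server more than triples, which is why the growth rate matters as much as the starting size when sizing a backup target.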

Speaker:

Um, and just because somebody says they have 10 terabytes of data doesn't mean

Speaker:

that they have 10 terabytes of data.
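One way to check a stated size before committing to a design is to count files and bytes yourself. The following is a minimal Python sketch, not what was used in this project; `inventory` is a hypothetical helper, and on a tree with tens of millions of files it would itself take a long time, especially over a network share, so it is best run locally on the storage box if that is possible.

```python
import os


def inventory(root: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all regular files under root."""
    files = 0
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        files += 1
                        total += entry.stat(follow_symlinks=False).st_size
        except OSError:
            # Unreadable directory: skip it rather than abort the survey.
            continue
    return files, total
```

Both numbers matter: the byte total sizes the target, and the file count hints at whether per-file overhead will dominate the run time.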

Speaker:

So you mentioned you had a backup server, you had a tape drive.

Speaker:

Is there a reason you chose to use tape

Speaker:

Well, the, I mean, tape is great for long-term retention of data,

Speaker:

which is what this customer wanted.

Speaker:

They wanted to hold onto this data for a long period of time,

Speaker:

and that's where tape is great.

Speaker:

And tape also, uh, you know, if you're able to properly feed it,

Speaker:

tape can actually be quite fast.

Speaker:

The challenge that I had when backing up this data was that, for various reasons,

Speaker:

which I think I, I think by the end I sort of figured out the, the

Speaker:

core reason for various reasons.

Speaker:

Individual backups off of the, these filers, the, they were just

Speaker:

slow, just, um, you know, they were

Speaker:

Like how slow, slow.

Speaker:

like, slow, was like, like, like three and a half kilobytes a second slow.

Speaker:

So like slower than like a 56 K modem back

Speaker:

Yeah.

Speaker:

Right.

Speaker:

And you can multiplex all you want.

Speaker:

So first off, you know, I, I was using NetBackup, which, you know, NetBackup, it

Speaker:

did a great job at what we had available.

Speaker:

Um, the challenge was that I couldn't put

Speaker:

the client on the filers themselves.

Speaker:

So there was a way, allegedly, to put a backup client on the filer,

Speaker:

but I could never get that to work.

Speaker:

And so I had to back up over SMB. Because I'm backing up

Speaker:

over SMB, I'm just, I'm just.

Speaker:

I'm just limited at what that was, right?

Speaker:

What,

Speaker:

I could get, and because I'm backing up over SMB, the client

Speaker:

is just the backup server,

Speaker:

right?

Speaker:

So instead of running a backup from two clients, I'm running a backup from one

Speaker:

client because that's the backup server.

Speaker:

I'm backing it up over SMB.

Speaker:

And because of that, I'm limited to the number of jobs I can run at one time.

Speaker:

NetBackup, um, says 99 jobs, which you'd say, gee, that

Speaker:

sounds like a

Speaker:

99 problems.

Speaker:

right?

Speaker:

But, but the thing is, towards the end, as I was running a lot of these backups,

Speaker:

the aggregate speed of like 99 backups was only like 30, 40 megabytes a second,

Speaker:

you

Speaker:

you're talking about 400 terabytes of data to

Speaker:

400 terabytes of data. Doing the math,

Speaker:

I backed up for months,

Speaker:

right?
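That math can be sketched in a few lines of Python. This assumes decimal units (1 TB = 1,000,000 MB) and a perfectly sustained rate with no failures or restarts, which is more generous than what actually happened; `days_to_transfer` is an illustrative name.

```python
def days_to_transfer(total_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to move total_tb at a sustained rate, using decimal units."""
    total_mb = total_tb * 1_000_000  # 1 TB = 1,000,000 MB (decimal)
    seconds = total_mb / rate_mb_per_s
    return seconds / 86_400  # seconds per day


# The aggregate rates quoted above: 30 to 40 MB/s across all 99 jobs.
for rate in (30, 40):
    print(f"400 TB at {rate} MB/s: about {days_to_transfer(400, rate):.0f} days")
```

Even in the best case that works out to roughly four to five months of uninterrupted running, before any retries.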

Speaker:

And I tried all these different things.

Speaker:

Uh, you know, num, you know, was I running too many backups at a time?

Speaker:

Was I running not enough backups at a time?

Speaker:

You know, it, um, you know, and then the problem is every, every

Speaker:

test would take days or weeks.

Speaker:

Think we should mention one thing.

Speaker:

You were talking about these tests taking days or

Speaker:

mm-hmm.

Speaker:

and then do you wanna mention sort of some of the issues you ran into with these long

Speaker:

running jobs just due to infrastructure or

Speaker:

Yeah.

Speaker:

other issues in the environment?

Speaker:

Yeah, you know, backups are not made to run over weeks or months.

Speaker:

Just backup infrastructure isn't made to work like that.

Speaker:

And so when you do backups over weeks or months,

Speaker:

weird things happen that cause, you know, consternation. One of the things

Speaker:

is LTO tape drives are great, but we were using, like, the half-height LTO

Speaker:

drives and as far as I could tell, their duty cycle was not meant to

Speaker:

be a hundred percent for two months.

Speaker:

Right.

Speaker:

Um, they're meant to be backed up for, you know, several hours and then give

Speaker:

'em a rest and then back up several hours and then give 'em a rest.

Speaker:

I was just beating the crap outta these things for weeks or months at a time.

Speaker:

And what would happen is after some significant period of time,

Speaker:

it would just go write error.

Speaker:

And that's fine when a backup runs for a few hours and then just try again.

Speaker:

But if you, but if it took you two weeks or three weeks to get to that point

Speaker:

and then you get a write error, um,

Speaker:

then

Speaker:

it's not like you could restart these jobs either, right?

Speaker:

I think you're running into

Speaker:

Yeah.

Speaker:

Well,

Speaker:

I mean, I mean, I could restart em, but, but it's like after

Speaker:

a period of time I became, I.

Speaker:

I eventually got to the point where I said, tape is not my friend.

Speaker:

I, anybody who

Speaker:

this is coming from Mr.

Speaker:

Backup.

Speaker:

You know, anybody who listens to this podcast knows that I am a friend of tape,

Speaker:

right?

Speaker:

I believe strongly in tape for a lot of reasons, but I don't think that, uh,

Speaker:

specific and, and you know, maybe the, my LTO friends can chime in here, but I don't

Speaker:

think that these tape drives were designed to be backed up to like this for weeks and

Speaker:

months at a time, 24/7, with no rest. I was multiplexing

Speaker:

as many backups together as I could.

Speaker:

And when one backup would finish, I would just add another backup onto it, right?

Speaker:

Because

Speaker:

I, I could, I could, I.

Speaker:

what I couldn't do is I couldn't say, well, let's do these 10 backups, let

Speaker:

them run until they're finished, and then we'll do the next 10 backups.

Speaker:

And that would've given the tape drives a, a moment to breathe, I think.

Speaker:

But, uh, I couldn't do that because the, because we, we just

Speaker:

didn't have that kind of time.

Speaker:

And so I

Speaker:

was just, I was just try, you know, tagging it

Speaker:

and, and I know you've always talked about like the shoe shining problem,

Speaker:

given that you're not going very fast with these backups, right.

Speaker:

Do you think that also led to some issues as well for the tape drives?

Speaker:

yeah.

Speaker:

So again, the core problem was that each individual backup was running slow.

Speaker:

No matter how many of them I multiplexed together, it was not enough

Speaker:

speed to make the tape drive happy.

Speaker:

And so, yes, the tape drive was shoe-shining.

Speaker:

And when a tape drive is continually shoe-shining, the tape drive will fail.

Speaker:

And so everything I remember learning about tape drives was

Speaker:

coming back to haunt me, right?

Speaker:

Um, this is all of the design that I was, that I had done throughout

Speaker:

the years on backup, um, you know,

Speaker:

um, backup system

Speaker:

And system.

Speaker:

all of the things that, you know, what do you do when the backups, you know?

Speaker:

And so I came to understand

Speaker:

that the only way I was gonna finish this backup was to do it to disc.

Speaker:

And just quickly before you move on, I think along the way, didn't

Speaker:

you also have a tape drive that failed that you then had to go

Speaker:

Oh, multiple. Multiple times.

Speaker:

Swap out tape drives, reboot tape drives, put cleaning tapes in tape drives.

Speaker:

And by the way, that's another thing: the way tape drives normally work

Speaker:

is you run them for a certain number of hours and then there's a cleaning

Speaker:

tape that goes in there and cleans it.

Speaker:

And when you have a robotic library, that happens automatically.

Speaker:

Well, when you just run the tape drive for

Speaker:

two months, you know, that

Speaker:

And so at some point the tape drive just fails.

Speaker:

Yeah.

Speaker:

um, yeah.

Speaker:

And so I ultimately decided that the only way to get this done was to, um, you know,

Speaker:

buy, uh, enough disc to back this up.

Speaker:

And that wasn't cheap.

Speaker:

Uh, but I, I didn't think that there was any other way that this was ever

Speaker:

going to get done 'cause again, the core problem that we've had with tape

Speaker:

for the last three decades has been that the backup, if the backup isn't

Speaker:

fast enough for the tape drive, it's a fundamental mismatch,

Speaker:

right?

Speaker:

And so we use multiplexing to make that better.

Speaker:

But if the multi, but if the speed you're dealing with is in kilobytes a second,

Speaker:

Yeah.

Speaker:

Well, and especially 'cause you're limited by those two, uh, Synology boxes, right?

Speaker:

Which are limiting your bandwidth, right?

Speaker:

It's not like

Speaker:

Yeah.

Speaker:

Synology boxes you can then pull from,

Speaker:

Yeah, and I was, I was watching, like, I was running every kind of tool I could

Speaker:

run to see, and, like, I wasn't overtaxing them.

Speaker:

The, that was the really weird part is that the, it's not like the

Speaker:

Synology boxes were saying, you're really beating the crap out of it.

Speaker:

You shouldn't do so

Speaker:

backups at a time.

Speaker:

It wasn't, it, it was, I didn't have a high I/O wait.

Speaker:

I didn't have high CPU, I didn't have high RAM.

Speaker:

There, there was no, there was no

Speaker:

rhyme or reason as to why. We'll get to the rhyme or reason later.

Speaker:

I figured it out.

Speaker:

Um, but I knew that tape wasn't gonna work.

Speaker:

So, so I had to bring in, uh, a couple of other Synology disc arrays, by the

Speaker:

way, and populate them with enough disc to handle all of this, uh, this backup.

Speaker:

Right.

Speaker:

Yeah,

Speaker:

And, um.

Speaker:

Then

Speaker:

but that wasn't without its issues either.

Speaker:

Right?

Speaker:

When you, when you brought those in, that wasn't without its issues either.

Speaker:

No, it wasn't without issues.

Speaker:

And the other thing, what I needed to do was to, I felt that with, in terms of the

Speaker:

number of directories that were remaining, I wasn't sure like the different sizes.

Speaker:

So what I did was I split

Speaker:

those jobs into many smaller jobs.

Speaker:

NetBackup is really good at like running thousands of jobs, right?

Speaker:

So rather than just have a hundred jobs, I turned that into like 2,400 jobs.

Speaker:

Like I went,

Speaker:

I went another level deep and created a policy for each of these

Speaker:

directories, and then I ran those and it was running for a while.

Speaker:

It was, it was, you know, again, more time.

Speaker:

And what I started seeing

Speaker:

were these jobs, like an individual job that was running for an

Speaker:

inordinate amount of time.

Speaker:

but you also had some jobs that would finish like super fast, right?

Speaker:

Like

Speaker:

They'd finish five, they'd finish in

Speaker:

Some of 'em, some of 'em finished in five minutes, some 'em would finish.

Speaker:

But I noticed that over time there were certain policies that were running for

Speaker:

really, really long periods of time, and eventually I started poking around.

Speaker:

That's when I discovered what ultimately was the true culprit.

Speaker:

And, uh, anyone who's been around backup for a long time

Speaker:

has seen this culprit before.

Speaker:

It's just, this is the worst example of this culprit that I've ever seen.

Speaker:

And what is that?

Speaker:

We affectionately refer to it as the million file problem.

Speaker:

Hmm.

Speaker:

Because remember, again, going back to that, um, that client back from

Speaker:

25 years ago, we had one server.

Speaker:

That was going to be storing a bunch of images and it was going

Speaker:

to result in millions of files.

Speaker:

And we knew back then that the million file problem is a real problem.

Speaker:

And the million file problem over the network is even worse, right?

Speaker:

Because everything is, is, is a

Speaker:

round trip.

Speaker:

The way we fixed it back then was we used a product back then called

Speaker:

FlashBackup, which would back up at the raw level but store the

Speaker:

information, and that was not available to me.

Speaker:

Why?

Speaker:

Because that product no longer exists

Speaker:

No.

Speaker:

because it doesn't run on a Synology box.

Speaker:

Right.

Speaker:

Remember, I'm not on the Synology.

Speaker:

All it was was an SMB mount to me.

Speaker:

Right?

Speaker:

And by the way, for those curious, yes, I tested SMB, I tested NFS.

Speaker:

It didn't matter.

Speaker:

It didn't matter.

Speaker:

Um, the um.

Speaker:

And

Speaker:

by the way, this was a constant, you know, you know the phrase, never, never

Speaker:

go into battle with an untested weapon.

Speaker:

This was a constant example of: I am in the battle, I'm in the stuff,

Speaker:

and now I'm trying to test stuff

Speaker:

and everything I did to try to make things better just made it take longer

Speaker:

and the client just had to wait.

Speaker:

And the client was incredibly patient, honestly.

Speaker:

And, and you know, I did my best to say, look, I, I've been doing this for 30

Speaker:

years, I've never seen anything like this.

Speaker:

Right.

Speaker:

And that, that helped.

Speaker:

But in the end, I was backing up.

Speaker:

You know, we got down to, I, I learned a way to identify which

Speaker:

were the problem directories.

Speaker:

So I would kick off a policy and I would watch, and I would notice

Speaker:

that it had run for, let's say, an hour.

Speaker:

And it listed, let's say 300,000 files backed up.

Speaker:

No kilobytes.

Speaker:

Hmm.

Speaker:

Literally there's, there's a kilobyte column that

Speaker:

shows kilobytes, and there's no value in there.

Speaker:

We backed up 300,000 files, no kilobytes.
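That heuristic, many files listed but almost no data written after a meaningful amount of time, can be written down as a simple check. The sketch below is purely illustrative: `JobSnapshot`, its fields, and the thresholds are hypothetical and are not NetBackup's data model or API; the numbers would have to be copied from whatever job monitor you use.

```python
from dataclasses import dataclass


@dataclass
class JobSnapshot:
    """Hypothetical record of one running job, filled in by hand or by a script."""
    policy: str
    files_backed_up: int
    kilobytes_written: int
    elapsed_hours: float


def looks_like_small_file_problem(job: JobSnapshot,
                                  min_hours: float = 1.0,
                                  max_kb_per_file: float = 1.0) -> bool:
    """Flag jobs that have run a while yet average under max_kb_per_file KB per file."""
    if job.elapsed_hours < min_hours or job.files_backed_up == 0:
        return False  # too early to judge, or nothing has been listed yet
    return job.kilobytes_written / job.files_backed_up < max_kb_per_file


jobs = [
    JobSnapshot("dir_a", 300_000, 0, 1.0),          # the pattern described above
    JobSnapshot("dir_b", 10_000, 50_000_000, 1.0),  # large files, healthy throughput
]
suspects = [j.policy for j in jobs if looks_like_small_file_problem(j)]
print(suspects)
```

The thresholds are guesses; the point is only that file count and bytes written, taken together, separate the problem directories from the ones that will finish on their own.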

Speaker:

so that, that helped me identify these problem

Speaker:

Problem child.

Speaker:

Yeah.

Speaker:

it and let the other non-problem policies finish.

Speaker:

And

Speaker:

Right.

Speaker:

Yeah.

Speaker:

up getting down to like 150 policies that were the problem policies.

Speaker:

And so I backed them up and I was able to get them.

Speaker:

Over time, I was able to get them backed up, and then finally I got down to about

Speaker:

20 policies, I think somewhere around

Speaker:

policies.

Speaker:

Go ahead.

Speaker:

And at this point when you're down to the 20, like some of these have

Speaker:

been running for a long time, right?

Speaker:

Like how?

Speaker:

like two months backups that have been running for two months,

Speaker:

successfully running for two months.

Speaker:

Yeah.

Speaker:

And what was good was at this point again.

Speaker:

Like this is information that would've been really helpful to have at the

Speaker:

beginning, but it was information that, to get all this information at the

Speaker:

beginning, it would've taken time to, like we, we just wanted to get started.

Speaker:

Yeah.

Speaker:

What I ended up finding was that, um, these backups, um.

Speaker:

The, the, there were millions and millions and millions, like one of the, one

Speaker:

of the directories that I was backing up, it had 99 million files in it,

Speaker:

one directory, 99 million files, and eventually what I realized was that

Speaker:

again, the problem this time was just SMB.

Speaker:

So the fact that every one of these files results in a round

Speaker:

trip conversation, possibly multiple round trip conversations.

Speaker:

Yep.
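A back-of-the-envelope model shows why per-file round trips add up. The round-trip counts and latencies below are illustrative assumptions, not measurements from this environment, and the model counts only time spent waiting on the network for a single stream; it ignores server-side work per file, which was evidently significant here.

```python
def days_of_round_trips(file_count: int,
                        round_trips_per_file: int,
                        rtt_ms: float) -> float:
    """Days spent purely waiting on network round trips for one serial stream."""
    seconds = file_count * round_trips_per_file * (rtt_ms / 1000.0)
    return seconds / 86_400


# 99 million files, an assumed 3 round trips each, at a few assumed latencies.
for rtt in (0.5, 1.0, 10.0):
    days = days_of_round_trips(99_000_000, 3, rtt)
    print(f"rtt={rtt} ms: {days:.1f} days of waiting")
```

The data volume never appears in the formula. With tiny files, the wait is set by how many conversations have to happen, not by how many bytes move.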

Speaker:

And I realized that the only way I was gonna back up these truly problem

Speaker:

directories was to back them up locally.

Speaker:

But how do I back them up locally?

Speaker:

Well, luckily this is when I just, you know, basically go back

Speaker:

to dumb, dumb old backup tools.

Speaker:

And so I was able to run a backup using tar logged in locally

Speaker:

on the filers, and then just

Speaker:

directing the tarball across the network. That finally worked.
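The same idea, reading the files locally and sending one continuous archive stream over the wire, can be sketched with Python's standard `tarfile` module. This is not the command that was actually run (that was plain `tar`); `stream_tar` is a hypothetical helper, and the sink could be any writable binary stream, such as a pipe or a socket file object.

```python
import os
import tarfile
from typing import BinaryIO


def stream_tar(source_dir: str, sink: BinaryIO) -> None:
    """Write an uncompressed tar stream of source_dir to sink without seeking."""
    # Mode "w|" produces a pure stream, so sink only needs a write() method.
    with tarfile.open(fileobj=sink, mode="w|") as archive:
        top = os.path.basename(os.path.normpath(source_dir))
        archive.add(source_dir, arcname=top)  # recursive by default
```

Called with `sys.stdout.buffer` as the sink, a script built on this could be piped into whatever carries the bytes to the backup target. The design point is that the millions of per-file operations stay on the local file system, and the network sees a single sequential stream.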

Speaker:

That's crazy.

Speaker:

So you had these 20 jobs, right?

Speaker:

And some of them you said were running for 60 plus days, and then you sort of

Speaker:

were like, okay, let me start this over.

Speaker:

And by the way, you were kind of forced to start them over

Speaker:

because something happened right?

Speaker:

At

Speaker:

yeah.

Speaker:

Something, some unknown thing.

Speaker:

Um, I think I.

Speaker:

I, I, I don't know.

Speaker:

I, I actually don't know

Speaker:

what caused it, but they, they did fail

Speaker:

and,

Speaker:

And you were like, I'm not gonna start these

Speaker:

yeah.

Speaker:

I'm not gonna start 'em again.

Speaker:

It's just, yeah.

Speaker:

Well, Because

Speaker:

like, one of the jobs, the one with 99 million

Speaker:

files, we were nowhere near.

Speaker:

I.

Speaker:

yeah.

Speaker:

After 60 days you were barely

Speaker:

yeah, yeah.

Speaker:

We're barely, barely scratching the surface.

Speaker:

so I'm like, I, I, I don't have, I don't have that, you know, I, I don't

Speaker:

have the amount of time that it would take, so, so I switched to, you know,

Speaker:

experimentally once again, experimentally, I'm experimenting on the fly, I'm

Speaker:

doing development in production.

Speaker:

Uh, I was like, well, let me see how quickly a tarball would run.

Speaker:

I ran a tarball.

Speaker:

I remember for like a day, you remember this?

Speaker:

I ran a

Speaker:

a day and it, I, I had a du of the size of the directory and after a day it had

Speaker:

done like, like a half of it or something.

Speaker:

Yeah.

Speaker:

You're like, what?

Speaker:

One's taking 66 days and barely scratching the

Speaker:

yeah,

Speaker:

You are mainly done.

Speaker:

Almost done within a day.

Speaker:

yeah.

Speaker:

And so I was like, this is the way.

Speaker:

Right.

Speaker:

So it, it, it wasn't, it wasn't a way for everything because the, the, this

Speaker:

was, um, because I, you know, I'm glad that I used NetBackup for the

Speaker:

bulk of it, because then I have the catalog data and, you know, and, um,

Speaker:

but

Speaker:

on the restore side.

Speaker:

yeah, yeah.

Speaker:

So this will.

Speaker:

The restores will be more difficult for these

Speaker:

like remaining 20 directories.

Speaker:

I mean, not, not astronomically.

Speaker:

So like,

Speaker:

you know, can create a tarball, a

Speaker:

list of this.

Speaker:

So, you know, lessons learned, like,

Speaker:

do that.

Speaker:

Don't store millions of files on the other side of an SMB box.

Speaker:

I guess

Speaker:

Yeah, so, well, and I think a couple things, even if it's not SMB, right?

Speaker:

Just having that many files, because I think what people don't realize is

Speaker:

even though the size of every disc has gotten significantly larger, right?

Speaker:

You're talking like 18 terabyte, 20 terabyte disk

Speaker:

Yeah.

Speaker:

They can only handle so many operations per disc, right?

Speaker:

That number hasn't changed.

Speaker:

It's about a hundred per second.

Speaker:

And so no matter how many, how big your disc is, right?

Speaker:

If it was 20 one-terabyte discs, right, then you get 20 times a hundred IOPS.

Speaker:

Versus if it's one 20-terabyte disc, you only still get that a hundred.

Speaker:

So that's a big thing that people don't realize with these larger size discs.
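The arithmetic behind that point, as a short Python sketch. The figure of roughly 100 operations per second per disc is the one quoted above; the sketch ignores RAID overhead and caching, so treat it as a rough comparison rather than a performance model.

```python
IOPS_PER_DISC = 100  # rough per-spindle figure quoted in the conversation


def aggregate_iops(disc_count: int, iops_per_disc: int = IOPS_PER_DISC) -> int:
    """Total operations per second across disc_count independent discs."""
    return disc_count * iops_per_disc


def iops_per_tb(disc_count: int, tb_per_disc: float) -> float:
    """Operations per second available for each terabyte stored."""
    return aggregate_iops(disc_count) / (disc_count * tb_per_disc)


many_small = aggregate_iops(20)  # twenty 1 TB discs
one_large = aggregate_iops(1)    # one 20 TB disc
print(many_small, one_large)
print(iops_per_tb(20, 1.0), iops_per_tb(1, 20.0))
```

Same 20 TB of capacity either way, but the single large disc has one twentieth of the operations available per terabyte.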

Speaker:

Yeah.

Speaker:

And, and the thing was that the.

Speaker:

That many files.

Speaker:

So, because the problem, the, ultimately the problem wasn't disc I/O, the problem

Speaker:

I/O.

Speaker:

Right?

Speaker:

Network latency.

Speaker:

So, because

Speaker:

when I actually ran, I ran two tarballs.

Speaker:

I.

Speaker:

Simultaneously is what I did.

Speaker:

I using

Speaker:

I just, I ran, I was always running two at a time.

Speaker:

When I was running two at a time, I/O wait was sitting at 10,

Speaker:

which is, is high,

Speaker:

but I was like, well, it's got nothing else going on, so I'm, I'm

Speaker:

it go.

Speaker:

Right?

Speaker:

The highest I/O wait I saw during all of those hundreds of

Speaker:

simultaneous backups was like four.

Speaker:

yeah,

Speaker:

So like I wasn't disc bound.

Speaker:

I was

Speaker:

bound, but not network bound in terms of throughput, network bound, in terms of

Speaker:

latency,

Speaker:

and

Speaker:

of operations, just because SMB is very chatty.

Speaker:

very chatty.

Speaker:

It's probably the chattiest of the protocols,

Speaker:

and

Speaker:

we, you

Speaker:

it was just a really bad combination.

Speaker:

Yeah.

Speaker:

And, you know, this is why backup vendors have their own protocols,

Speaker:

like Data Domain has Boost, right?

Speaker:

To help alleviate and solve some of these issues.

Speaker:

Yeah.

Speaker:

You talked about, don't, don't do the somewhere we were talking about.

Speaker:

Just don't do this.

Speaker:

I, I'd like, I'd like to talk today.

Speaker:

When I looked at these, these, uh, these directories that had these

Speaker:

tens of millions of files, it was a structure that was very clearly

Speaker:

created by some application.

Speaker:

Each one of these directories had a common structure created by some

Speaker:

I'm gonna say stupid application that thought this was perfectly fine.

Speaker:

That it was perfectly fine to create 99 million files for

Speaker:

Do you know, I,

Speaker:

item.

Speaker:

I bet they were using the file system as a database

Speaker:

I don't know.

Speaker:

what it was.

Speaker:

given just like the number of files and the size of those files.

Speaker:

I know it was forensic type information

Speaker:

and I, I don't, I clearly

Speaker:

That, that's fine.

Speaker:

Yeah, yeah,

Speaker:

No, I'm just saying I clearly don't know enough about forensic stuff

Speaker:

to know why they would want tens of

Speaker:

of files,

Speaker:

but

Speaker:

So where are you?

Speaker:

So you talked about these 20 jobs that you were starting to do tarballs with.

Speaker:

So where are you right now?

Speaker:

So, so we finished all of them, but one, there was one that for some reason

Speaker:

it, it, the file didn't look right.

Speaker:

It was weird.

Speaker:

Um, it, the, the, the backup completed, but the, some reason, the, the tarball,

Speaker:

it just, it just didn't look right.

Speaker:

I don't wanna go into details.

Speaker:

It just didn't look

Speaker:

so I'm rerunning that one.

Speaker:

So it, based on its size and how well it's doing, it should

Speaker:

finish in about a day or so.

Speaker:

Um, and what I'm

Speaker:

is a significant improvement in terms of

Speaker:

A significant improvement a day versus, you know, a year, um,

Speaker:

Or two, I think actually it might have been two.

Speaker:

Yeah,

Speaker:

Agreed.

Speaker:

Um, and what I'm doing is I'm, because again, I don't have the catalog.

Speaker:

What I'm currently running is I'm running a tar tvf

Speaker:

On all of those files and creating tarballs or creating, I'm sorry, text

Speaker:

files, a list.

Speaker:

of the, the files that are in there.

Speaker:

And then I'm gonna do a count on the files that are in there and

Speaker:

check it against the count of the files that are in the directory.

Speaker:

And, and hopefully those numbers should be the same.
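
The count check described here could be sketched roughly as follows. This is not the script used in the episode (that was tar tvf output piped into text files); it is a minimal Python equivalent, and the function names are made up for illustration. It counts regular files only on both sides so that directory entries do not skew the comparison.

```python
import os
import tarfile


def count_tar_members(tar_path):
    """Count regular-file entries in a tar archive."""
    count = 0
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if member.isfile():
                count += 1
    return count


def count_files(directory):
    """Count regular files (not directories or symlinks) under a directory."""
    count = 0
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                count += 1
    return count


def counts_match(tar_path, directory):
    """True when the archive and the source directory hold the same number of files."""
    return count_tar_members(tar_path) == count_files(directory)
```

A matching count does not prove the contents are intact, but a mismatch is a cheap early warning that a tarball needs to be rerun.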

Speaker:

Yeah, because I believe you are even saying that to run things

Speaker:

like a find to get a list of all the files in a directory or a du

Speaker:

Yeah.

Speaker:

hours, right?

Speaker:

Well, it was days actually.

Speaker:

In

Speaker:

fact, it was why I didn't have this information in the beginning

Speaker:

because everything was so big and every find, every du, every command

Speaker:

that I had... du is quicker than find.

Speaker:

du is.

Speaker:

It just does less work than find.

Speaker:

But the problem that I ultimately realized was that du wasn't

Speaker:

really being helpful in terms of.

Speaker:

The

Speaker:

scope of the job, what was the scope of the job was determined

Speaker:

by the number of these files.

Speaker:

And I couldn't get those numbers because that was the thing that took forever.

Speaker:

the number of jobs dwindled down to about 20, that's when I

Speaker:

was able to run these, uh, the

Speaker:

and they would, they would actually complete.

Speaker:

And that's when I realized just how bad it was.

Speaker:

so if you had to start this over, and hopefully you never do, but I'm just

Speaker:

saying, if you had to go back to day one, what would you do differently?

Speaker:

I know you talked about making sure you understand the size of your backups.

Speaker:

Right.

Speaker:

It just feels like some of these, you just have to go through the process

Speaker:

though because you don't know what to do.

Speaker:

Like it's not like you could just start day one and be like,

Speaker:

oh, I know I need to go to disk.

Speaker:

I need to do X, Y, and Z.

Speaker:

Right?

Speaker:

It's sort of like a learning process.

Speaker:

would say that I.

Speaker:

Yeah, because the problem is you're going off into the unknown,

Speaker:

you're doing a backup of something that you don't know what it is.

Speaker:

And I, I would say if possible, if at all possible, get things like

Speaker:

du's, uh, you know, disk usage. It's a Unix command, but you can load those

Speaker:

tools in Windows as well. Like, if you're going to back up, if you're

Speaker:

gonna back up a hundred directories,

Speaker:

get a du of every one of those directories so that you have an idea

Speaker:

of just what you're dealing with,

Speaker:

if at all possible.

Speaker:

Also, look and see the number of files. And if you're

Speaker:

trying to do a, you know, it's not that hard, you just run a find dot dash,

Speaker:

you know, I didn't even do a print, just find dot piped to wc -l, right?

Speaker:

That was it.

Speaker:

Right?

Speaker:

Um, to, to get the number of files.
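
The pre-flight survey described here (a du plus a file count for each directory) could look something like the sketch below. It is an illustrative Python equivalent of running du and find piped to wc -l, not what was actually run; the function names are hypothetical. On a share with tens of millions of files it will still be slow, because it has to touch every entry.

```python
import os


def survey(path):
    """Return (file_count, total_bytes) for everything under path."""
    file_count = 0
    total_bytes = 0
    stack = [path]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    file_count += 1
                    total_bytes += entry.stat(follow_symlinks=False).st_size
    return file_count, total_bytes


def survey_children(root):
    """Survey each immediate subdirectory of root, highest file count first."""
    results = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                results[entry.name] = survey(entry.path)
    ordered = sorted(results.items(), key=lambda item: item[1][0], reverse=True)
    return dict(ordered)
```

Sorting by file count rather than size reflects the lesson of this story: the number of files, not the number of terabytes, determined how long each job took.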

Speaker:

I'd say if again.

Speaker:

If I could go back in time, I, I would say maybe do a little bit more of this

Speaker:

research prior to beginning the job.

Speaker:

Um, but it's easy to say that now,

Speaker:

um, because I know what

Speaker:

I know.

Speaker:

Right.

Speaker:

Um, but the, you know, the core problem was that you've

Speaker:

got these millions of files.

Speaker:

I mean, which is

Speaker:

already gonna be a problem if you're backing it up in any sort of normal way.

Speaker:

But if you're

Speaker:

up remotely over the network, it's going to kill you.

Speaker:

Yeah.

Speaker:

So, um, you gotta figure out a way to do that.

Speaker:

And then I would just say, see if there's anything that you can do with the, with

Speaker:

the application that's created this data

Speaker:

which is why it's important to get involved early on, right when an

Speaker:

application is being developed or deployed, right, to get involved so

Speaker:

they understand the backup requirements.

Speaker:

yeah.

Speaker:

And so, this backup that would never finish, I literally was, I

Speaker:

was starting to think that this thing was never gonna finish.

Speaker:

Um.

Speaker:

It's essentially finally, I mean, it's not, at this point, it's

Speaker:

not a hundred percent, but I'm, I'm now, you know, it's just, I'm

Speaker:

at the finish line.

Speaker:

Yeah.

Speaker:

at the finish line.

Speaker:

Yeah.

Speaker:

Um, it's nice.

Speaker:

I know one of the other things you mentioned that you were using

Speaker:

NetBackup, but you had also looked at other tools out there as well, right?

Speaker:

That could potentially help you with this effort.

Speaker:

Right.

Speaker:

So do you think that that becomes valuable, like either looking at other

Speaker:

tools, um, I know you had reached out to like Synology support, you

Speaker:

had reached out to some experts, like

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

The problem there, there were, there were, you could do, like with Synology,

Speaker:

you can like copy the data from A to B.

Speaker:

Mm-Hmm.

Speaker:

They have this ability essentially like, you know, for lack of a

Speaker:

better word, they have SnapMirror.

Speaker:

they have the equivalent of SnapMirror.

Speaker:

Yep.

Speaker:

from one Synology box to another.

Speaker:

But to me that wasn't really a backup like I wanted in a, in a format, you know,

Speaker:

the end I was forced to not do what I wanted with the tar.

Speaker:

Um, but I wanted it in a cataloged format.

Speaker:

So we looked at a couple of, the problem was never NetBackup.

Speaker:

Right?

Speaker:

NetBackup made it, um, easy to script this whole thing because it was the

Speaker:

only way I could make sense of it.

Speaker:

'cause it was, it was thousands of directories and, um, and even

Speaker:

more thousands of sub directories under those directories.

Speaker:

And the only way I could make sense of this was to script it all.

Speaker:

And, um, the, the fact that NetBackup allowed me to do that was great.

Speaker:

Um, there are some other tools these days, some of the newer tools,

Speaker:

they want to make it easy for you.

Speaker:

But if you get into a complicated situation like this, some of the newer

Speaker:

tools don't even have the ability to sort of grab it by the horns.

Speaker:

The

Speaker:

able to do in NetBackup,

Speaker:

Yeah.

Speaker:

I think the other thing also that you were doing, which I thought was interesting,

Speaker:

was also your scripting, right?

Speaker:

Trying to automate this, like, uh, I know like scheduling your,

Speaker:

the backup policies to run, right?

Speaker:

And then you were sort of doing load balancing to make sure

Speaker:

that you keep the two filers

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

I couldn't, yeah, that was the thing.

Speaker:

I couldn't normally, I, I just, I believe in just throwing

Speaker:

everything in the NetBackup scheduler and let it figure it out.

Speaker:

But because again, because of the limitations of the weird thing I had,

Speaker:

I, I couldn't figure out a way to load balance across the two target filers.

Speaker:

the NetBackup scheduler.

Speaker:

Um, maybe I could have, uh, done that better.

Speaker:

I don't know.

Speaker:

But, uh, so the way I was doing it was I was just assigning a backup.

Speaker:

When a backup would finish, I would assign the next backup to the filer that

Speaker:

now had more space available to it.

Speaker:

Right.

Speaker:

So I just had a while loop that was running, you

Speaker:

know, checking to see if a backup job was done.
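
The while loop described here could be sketched along these lines. This is a minimal illustration, not the actual script: it deliberately avoids naming any real NetBackup commands, so starting a job, checking whether it is done, and reading free space are all passed in as caller-supplied functions (start_job, is_done, and get_free_bytes are hypothetical names).

```python
import time
from collections import deque


def pick_target(free_bytes):
    """Return the name of the target filer with the most free space."""
    return max(free_bytes, key=free_bytes.get)


def run_queue(jobs, get_free_bytes, start_job, is_done,
              max_running=2, poll_seconds=60, sleep=time.sleep):
    """Keep up to max_running jobs going, sending each new job to the emptiest target.

    get_free_bytes() returns a dict of target name to free bytes,
    start_job(job, target) starts a backup and returns a handle, and
    is_done(handle) reports whether that backup has finished.
    Returns the list of (job, target) assignments in the order they were made.
    """
    pending = deque(jobs)
    running = []
    assignments = []
    while pending or running:
        # Drop finished jobs, then refill the free slots.
        running = [handle for handle in running if not is_done(handle)]
        while pending and len(running) < max_running:
            job = pending.popleft()
            target = pick_target(get_free_bytes())
            running.append(start_job(job, target))
            assignments.append((job, target))
        if running:
            sleep(poll_seconds)
    return assignments
```

Passing the three operations in as functions keeps the loop testable without a backup server, and whatever command-line tools the backup product provides can be wrapped behind them.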

Speaker:

but I think that's important, right?

Speaker:

You can always script some of these things that if it doesn't

Speaker:

exist in the native tools, right?

Speaker:

Don't be afraid.

Speaker:

Yeah.

Speaker:

Don't be afraid.

Speaker:

you know, obviously I'm, I'm pretty good at scripting and

Speaker:

I'm pretty good in the backup.

Speaker:

And, um, there are, and, and thanks.

Speaker:

Thanks very much to Veritas for keeping their, uh, their documentation online.

Speaker:

Uh, the number of times I Googled.

Speaker:

You know, backup job, you know, how do, how do I list, uh, you know, and

Speaker:

I know there's a, there's, I know there's a command to, to do this.

Speaker:

How do I do that?

Speaker:

And, you know, and then a man page would come up and I would read it

Speaker:

and I was like, oh, yeah, yeah, yeah.

Speaker:

It's

Speaker:

been a while.

Speaker:

Yeah.

Speaker:

Um.

Speaker:

you have to also thank Cygwin, of course.

Speaker:

Yes, special thanks to Cygwin. Without Cygwin...

Speaker:

That is the tool that you can download and run on any Windows

Speaker:

server to give you Unix capabilities.

Speaker:

I will say there were, there were moments where Cygwin was both helpful and

Speaker:

terrorizing me because it was the whole like backslash versus forward slash thing.

Speaker:

Because in Windows, you know, the file separator is a backslash, which

Speaker:

in Unix is an escape character,

Speaker:

Yep.

Speaker:

and Cygwin wasn't consistent.

Speaker:

When that escape character would be an escape character.

Speaker:

Like, like if you piped it into a file, it would do one thing.

Speaker:

If you piped it into a command, it would do it, it would behave differently.

Speaker:

And, um, so that, that definitely l lent.

Speaker:

The fact that I was doing constant file manipulation on directories

Speaker:

that were seven levels deep,

Speaker:

Yeah.

Speaker:

did not help.

Speaker:

Yeah.

Speaker:

Oh, and then I couldn't, the, the, the, the one thing with

Speaker:

Cygwin is that it doesn't see.

Speaker:

It doesn't see the, to point the backups to NetBackup, I have to point

Speaker:

'em at backslash backslash filer name

Speaker:

share name.

Speaker:

Cygwin doesn't see that.

Speaker:

Cygwin sees only mapped drive names

Speaker:

and

Speaker:

have to map it using

Speaker:

you have to map it to a drive name.

Speaker:

Let's say you map it to,

Speaker:

to letter F, and then in Cygwin you would see /cygdrive/f.

Speaker:

Which would be the same as this backslash backslash mount.

Speaker:

know, I was constantly having to go back and forth between

Speaker:

those two and, and that was fun.
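
The back-and-forth between the two path styles could be handled with a small helper like the sketch below. It is an illustration only: the function names and the drive_map dictionary (drive letter to the share it is mapped to) are assumptions, and the filer and share names in any call are placeholders rather than the real ones.

```python
def unc_to_cygdrive(unc_path, drive_map):
    r"""Translate a path like \\filer\share\dir into /cygdrive/f/dir.

    drive_map maps an uppercase drive letter to the UNC share it is
    mapped to, for example {"F": r"\\filer\share"}.
    """
    normalized = unc_path.replace("/", "\\")
    for letter, share in drive_map.items():
        share = share.rstrip("\\")
        lowered = normalized.lower()
        if lowered == share.lower() or lowered.startswith(share.lower() + "\\"):
            rest = normalized[len(share):].strip("\\")
            base = "/cygdrive/" + letter.lower()
            return base + "/" + rest.replace("\\", "/") if rest else base
    raise ValueError("no mapped drive covers " + unc_path)


def cygdrive_to_unc(cyg_path, drive_map):
    r"""Translate a path like /cygdrive/f/dir back into \\filer\share\dir."""
    parts = cyg_path.split("/")
    # "/cygdrive/f/dir" splits into ["", "cygdrive", "f", "dir"].
    if len(parts) < 3 or parts[0] != "" or parts[1] != "cygdrive":
        raise ValueError("not a /cygdrive path: " + cyg_path)
    letter = parts[2].upper()
    if letter not in drive_map:
        raise ValueError("drive letter is not mapped: " + letter)
    share = drive_map[letter].rstrip("\\")
    rest = [part for part in parts[3:] if part]
    return "\\".join([share] + rest)
```

Keeping the conversion in one place means scripts can build paths in whichever form a given tool expects without hand-editing slashes.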

Speaker:

Um,

Speaker:

scripting

Speaker:

here's the thing.

Speaker:

After all of this experience and everything you've learned, you're probably

Speaker:

never gonna use any of this again.

Speaker:

I don't know about that.

Speaker:

I dunno about that.

Speaker:

I tell you what, I'm, I'm taking a tar, all those scripts that

Speaker:

I wrote, um, because I will say this, that, that the NetBackup

Speaker:

documentation while, uh, extensive, it doesn't give a lot of examples.

Speaker:

And so like, I'm thinking of like, um, like the bpduplicate command,

Speaker:

which is the command to copy backups from one place to another.

Speaker:

I couldn't, I couldn't figure out from reading the man page how to

Speaker:

actually do, to do what I needed to do.

Speaker:

So I would, I would like.

Speaker:

I would do, I would have to run tests, you

Speaker:

know, I'd, you know, um, and, um, the, you know, not like now that Cohesity's

Speaker:

acquiring them, it's not like they're now gonna rewrite their man pages.

Speaker:

I just thought that they could have used some more, some more examples.

Speaker:

But

Speaker:

Yeah.

Speaker:

I figured it out eventually.

Speaker:

You know, I think someone used to have a forum that people would post on about.

Speaker:

Yeah, someone used to have that and then, but people stopped posting

Speaker:

on that forum, so I don't know

Speaker:

You know?

Speaker:

Um, where people are getting their help now,

Speaker:

but, uh,

Speaker:

Well, I'm glad that this is almost over,

Speaker:

yeah.

Speaker:

Yeah.

Speaker:

nearly over and I'm glad you're still alive,

Speaker:

I am alive.

Speaker:

I didn't kill anyone along the way.

Speaker:

I didn't scream at anyone.

Speaker:

Like the, the story that

Speaker:

you have heard were, were Curtis Cuss Preston.

Speaker:

I didn't scream at anyone.

Speaker:

yeah.

Speaker:

but I really, really, really think you should do an office space on those filers.

Speaker:

yeah.

Speaker:

Well, that would sort of defeat the purpose of the

Speaker:

but, uh, I, yeah, I, like that idea.

Speaker:

Hmm.

Speaker:

Anyway.

Speaker:

Well, uh, thanks Prasanna for helping me, uh, sort of through this.

Speaker:

You were my constant counselor through this.

Speaker:

I think I learned a bunch.

Speaker:

I know usually I'm all about YouTube knowledge, but in this case it was

Speaker:

the Preston knowledge, so it was good.

Speaker:

I.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

uh, thanks everybody else for, uh, uh, listening along with this sad, sad story

Speaker:

with I think a decent, happy ending.

Speaker:

That is a wrap.

Speaker:

The Backup Wrap-Up is written, recorded and produced by me, W. Curtis Preston.

Speaker:

If you need backup or DR

Speaker:

consulting, content generation, or expert witness work,

Speaker:

check out backupcentral.com.

Speaker:

You can also find links to my O'Reilly books on the same website.

Speaker:

Remember, this is an independent podcast and any opinions that you

Speaker:

hear are those of the speaker

Speaker:

and not necessarily an employer.

Speaker:

Thanks for listening.