Speaker:

You found The Backup Wrap-Up, your go-to podcast for all things

Speaker:

backup recovery and cyber recovery.

Speaker:

In this episode, we jump into disaster recovery testing, and trust me, you don't

Speaker:

wanna learn these lessons the hard way.

Speaker:

I've got some wild stories about DR

Speaker:

tests gone wrong,

Speaker:

including one from my early days at a bank that'll make you cringe.

Speaker:

My co-host Persona and I break down exactly how to approach DR

Speaker:

testing the right way, starting with the basics and working your way up.

Speaker:

We'll tell you why non-destructive testing is absolutely critical.

Speaker:

Seriously, you don't wanna blow up your production environment just to test DR.

Speaker:

And how to set realistic success criteria that won't make you cry. Another episode

Speaker:

from The Lessons From the Trenches.

Speaker:

I hope you like it.

Speaker:

By the way, if you don't know who I am because you're a first-time listener,

Speaker:

I'm W. Curtis Preston, AKA Mr.

Speaker:

Backup, and I've been passionate about backup and recovery for over 30 years,

Speaker:

ever since

Speaker:

I had to tell my boss that we had no backups of the production

Speaker:

database that we had just lost.

Speaker:

I don't want that to happen to you.

Speaker:

That's why I do this podcast.

Speaker:

On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.

Speaker:

This is The Backup Wrap-Up.

Speaker:

Welcome to the show.

Speaker:

Hi, I am your host, W. Curtis Preston, AKA Mr.

Speaker:

Backup, and if you could take just a quick moment to subscribe

Speaker:

or follow so that you'll get our great content, that would be great.

Speaker:

I am sitting here with none other than the guy who's very concerned

Speaker:

about my obsession with serial killers lately, or at least one serial killer.

Speaker:

How's it going?

Speaker:

How's it going, Persona?

Speaker:

I am good, Curtis.

Speaker:

Yeah,

Speaker:

about me?

Speaker:

so I think after this you should watch Hannibal,

Speaker:

the TV show, not the movie.

Speaker:

They both are good, but

Speaker:

Right?

Speaker:

Right?

Speaker:

And then we could discuss whether people taste differently

Speaker:

depending on what they eat.

Speaker:

that's always something you talk about.

Speaker:

Uh,

Speaker:

your latest obsession is Dexter.

Speaker:

Yeah.

Speaker:

Which, which, for the record, I'm rewatching, right?

Speaker:

I, I enjoyed Dexter when it was on and when I had to wait a week, right?

Speaker:

going back to those old days

Speaker:

oh Lord,

Speaker:

I, you know, I'm, I'm kind.

Speaker:

I think you're, aren't you?

Speaker:

Like, don't you watch you, you do not watch shows when they're on.

Speaker:

You wait until they're done and then you binge them,

Speaker:

right?

Speaker:

Yeah, for

Speaker:

the most part, although these days I don't actually have like broadcast tv,

Speaker:

Right.

Speaker:

it's whatever's available on Netflix or Amazon or Take your pick.

Speaker:

We, we, we actually have YouTube tv, so we have broadcasts and there's

Speaker:

some shows that we watch on there.

Speaker:

Um, but there are shows where, like, you know, this episode, that episode,

Speaker:

they're, they're all the same, you know?

Speaker:

And

Speaker:

Yeah.

Speaker:

you don't, it's not like, uh, it's not like

Speaker:

Dexter where there's an ongoing

Speaker:

so here's the funny thing, right?

Speaker:

So it was, it started off first on

Speaker:

TV.

Speaker:

Right.

Speaker:

And like you said, you had to wait a

Speaker:

Showtime.

Speaker:

Yeah.

Speaker:

And then what I used to do, right, so I didn't have Showtime,

Speaker:

Mm-Hmm.

Speaker:

so I had to wait for it to come out on Netflix DVDs,

Speaker:

Oh,

Speaker:

right.

Speaker:

I would request the DVDs.

Speaker:

And of course I was cheap, frugal, however you want to say it, right?

Speaker:

And so I had like the two DVD plan, so I'd request like two DVDs, right?

Speaker:

So you'd get like six episodes.

Speaker:

Yeah.

Speaker:

Five or six episodes, you'd binge watch those and then you'd send them back.

Speaker:

And then you'd have to wait a week to then get the next set of DVDs

Speaker:

so funny.

Speaker:

in

Speaker:

order to watch it.

Speaker:

And that's I think, how I ended up watching Dexter.

Speaker:

And I think Breaking Bad might've been the same way too.

Speaker:

Yeah, you're just, you're cheaper than me.

Speaker:

Dexter is currently on Netflix,

Speaker:

so, um, yeah.

Speaker:

Anyway, so, and you could do, you can watch all these things while you do

Speaker:

disaster recovery testing,

Speaker:

'cause there's a lot of time, there's a lot of time when you do DR testing,

Speaker:

there's a lot of downtime, right?

Speaker:

You sit there and you stare at the screen.

Speaker:

Um, and, um, I'm gonna, I'm

Speaker:

gonna, I'm gonna start out a story.

Speaker:

What's that?

Speaker:

but can you really?

Speaker:

What,

Speaker:

I understand like if you're watching Dexter or pick one of these

Speaker:

very, the shows that pull you in,

Speaker:

uh huh.

Speaker:

Would you actually be focused or would your DR testing basically balloon like

Speaker:

10 times the normal amount of time?

Speaker:

Because you're

Speaker:

like, oh yeah,

Speaker:

I forgot to get back to that.

Speaker:

well, it de, you know what, it's gonna depend on the type of DR test

Speaker:

you do because some DR tests, there's a lot of waiting, there's a lot of,

Speaker:

I'm gonna start the restore part, and then I sit there for many, many hours.

Speaker:

And if you got one of those, then

Speaker:

it doesn't really matter whether you're focused or not, as long

Speaker:

as somebody's keeping an eye on the, uh, percentage done.

Speaker:

And I'm gonna start this with a story from back in the day.

Speaker:

The first backup, uh, the first restore test I ever did.

Speaker:

The first, like, well, at least the first one I can really remember.

Speaker:

And this was when we had, um, you know, I was at the bank,

Speaker:

MBNA, which at the time was the second largest credit card corporation,

Speaker:

and I was in charge of backups.

Speaker:

And we had, I had talked the boss into moving to what would become

Speaker:

the first of many commercial backup products that I had, uh, used.

Speaker:

And that product was a product called SM-arch, which as I've mentioned

Speaker:

before, should have been called SM-back because it was not an archive

Speaker:

product, it was a backup product.

Speaker:

Well, they were out, they were out of, uh, Minnesota area and,

Speaker:

and we had converted to them, but like a good

Speaker:

little backup guy,

Speaker:

I had done like a parallel implementation and so I was still

Speaker:

running my old dump tapes and I was running this new, uh, fancy tool.

Speaker:

And one of the things that this tool had was built-in, uh, compression.

Speaker:

Um, I wasn't using the compression on the tape drives.

Speaker:

I was using compression in the

Speaker:

software,

Speaker:

uh, to compress, uh, to go to the tape drives.

Speaker:

So we had our first major

Speaker:

failure, uh, file server, hpfs01.

Speaker:

I still remember the name of the server

Speaker:

just like get burned into your

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And, and this is related to another story that we, to a friend of mine that,

Speaker:

that I've referred to before where she was a consultant and she accidentally

Speaker:

basically re, this was a self-inflicted disaster where a consultant was trying

Speaker:

to clean up home directories and she just did a really, really good job

Speaker:

of cleaning up all the home directories.

Speaker:

So I, I put my, uh, SM-arch tapes,

Speaker:

uh, you know, in my front pocket and I put my backup tapes in my back, my

Speaker:

back pocket, and I went down there like Mighty Mouse: "Here I come to save the day!"

Speaker:

And um, some of you will get

Speaker:

Not available on iTunes, I

Speaker:

Not available at iTunes.

Speaker:

And we went in there and I put, these were DDS, uh, tapes, right?

Speaker:

And these, 'cause this, we were all HP, HP loved DDS.

Speaker:

And so I popped in the DDS tape, I pulled up my SM-arch software

Speaker:

and I started the restore test.

Speaker:

And, well, it wasn't a restore test.

Speaker:

It was, it was, I was testing, I was testing in anger, uh, you know, we

Speaker:

were testing like this was for real.

Speaker:

And, and I had not done any actual testing.

Speaker:

And so I kicked off the restore and, um,

Speaker:

I'm, I'm watching and I, I, I, you know, like a good little Unix boy,

Speaker:

I had a while loop

Speaker:

running and it was doing a, a df on the, on the, um, on the, you know, to

Speaker:

display the size of the file system.

Speaker:

And I'm watching, and like a long time was going by and there

Speaker:

was no change in the size of

Speaker:

the file system.
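
[Editor's sketch] The progress check Curtis describes, a loop that keeps running df against the restore target, can be approximated in Python. This is a minimal sketch and not what was run at the bank; the function names, the sample count, and the stall rule are assumptions made for illustration.

```python
import shutil
import time


def watch_usage(path, interval_s=5.0, samples=3, sleep=time.sleep):
    """Sample used bytes on the filesystem holding `path`, like a loop around df."""
    readings = []
    for i in range(samples):
        used = shutil.disk_usage(path).used
        readings.append(used)
        print(f"{path}: {used / 1e9:.2f} GB used")
        if i < samples - 1:
            sleep(interval_s)  # wait between samples, not after the last one
    return readings


def is_stalled(readings):
    """Hypothetical rule: the restore looks stalled if usage did not grow."""
    return len(readings) >= 2 and readings[-1] <= readings[0]


if __name__ == "__main__":
    # Point this at the filesystem being restored; "." is only a placeholder.
    print("stalled:", is_stalled(watch_usage(".", interval_s=1.0)))
```

A flat reading over a long window is the same signal Curtis saw: the restore was running, but the file system was not filling up.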

Speaker:

And I was like, this is weird.

Speaker:

And then I went over and I looked, I just.

Speaker:

Just outta curiosity, I looked at the

Speaker:

tape drive, right.

Speaker:

You know, it's kinda like when your car dies, you know, you

Speaker:

open up the hood like I have any idea what's going on inside there.

Speaker:

You know, look in there

Speaker:

and I see the, I see the light on the tape drive and, and it

Speaker:

goes blink, blink, blink,

Speaker:

one one thousand, two one thousand, three one thousand, four one thousand,

Speaker:

blink, blink.

Speaker:

Right?

Speaker:

And there were these giant pauses in between.

Speaker:

The blinks.

Speaker:

And so I'm like, that's strange.

Speaker:

It always blinks when it's, you know, when it's reading or writing data, right?

Speaker:

So I called up, I called up the guys and I said, Hey,

Speaker:

The

Speaker:

vendor.

Speaker:

Yeah,

Speaker:

vendor.

Speaker:

Yeah.

Speaker:

Called the vendor.

Speaker:

What's going on here?

Speaker:

They said, well, by any chance did you use the compression feature?

Speaker:

And I said yes.

Speaker:

Yes I did.

Speaker:

They go, yeah.

Speaker:

So let us explain to you how the compression feature works.

Speaker:

So when we're backing up files, um, basically, um, we do the equivalent

Speaker:

of compress -c, I think it was, to, to send the result to standard out

Speaker:

and it sends it straight to the tape.

Speaker:

But we don't know how to do that on the way back in.

Speaker:

And so what we do is

Speaker:

we, uh, we, we read the tape, we read the entire file that we're gonna restore,

Speaker:

and we restore it to temp, and then we run uncompress in place in

Speaker:

temp, and then we move the res, the

Speaker:

uncompressed restored file

Speaker:

to where it's gonna go.

Speaker:

And we do that one file at a time,

Speaker:

Oh,

Speaker:

because we're concerned that we're gonna fill up temp.

Speaker:

And I'm like, oh.

Speaker:

So like if temp is, if I have a single file that's bigger than

Speaker:

temp, it's just not gonna work.

Speaker:

They're like, yeah.

Speaker:

And they're like, if this is a concern, we suggest perhaps

Speaker:

you don't use this feature.
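
[Editor's sketch] The restore path the vendor describes stages each file in temp, uncompresses it there, and then moves it, so a file larger than temp cannot be restored. The sketch below contrasts that with streaming decompression, which never holds the whole file in a staging area. It uses Python's zlib as a stand-in for compress and uncompress; it is not the vendor's code, and the space check assumes the compressed and expanded copies briefly coexist in temp.

```python
import io
import zlib


def restore_streaming(src, dst, chunk_size=64 * 1024):
    """Decompress from src to dst chunk by chunk, with no staged copy in temp."""
    decomp = zlib.decompressobj()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = decomp.decompress(chunk)
        dst.write(out)
        total += len(out)
    tail = decomp.flush()  # emit anything still buffered in the decompressor
    dst.write(tail)
    return total + len(tail)


def fits_in_temp(compressed_size, expanded_size, temp_free):
    """Assumed worst case for a staged restore: both copies sit in temp at once."""
    return compressed_size + expanded_size <= temp_free


if __name__ == "__main__":
    original = b"home directory data " * 10_000
    tape = io.BytesIO(zlib.compress(original))  # stands in for the tape stream
    restored = io.BytesIO()
    print(restore_streaming(tape, restored), "bytes restored")
    print("staged restore fits:", fits_in_temp(len(tape.getvalue()), len(original), 100_000))
```

The streaming version writes output as it reads input, so its extra space requirement is roughly one chunk rather than the size of the largest file.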

Speaker:

it would've been helpful to know before I needed

Speaker:

Right, right.

Speaker:

And so thank God I had the, I had the backup tape in my back

Speaker:

pocket and I pulled 'em out.

Speaker:

I was like, okay, we're just using, you know,

Speaker:

uh, dump here.

Speaker:

And, um, and we restored and everything was basic, right?

Speaker:

Um, and luckily I had backups from the previous night.

Speaker:

And really the moral of this story is.

Speaker:

Don't do that.

Speaker:

Right.

Speaker:

Don't wait to test your backups until

Speaker:

you need them.

Speaker:

Right.

Speaker:

Do a DR test and that's what we're talking about today.

Speaker:

Do a DR test before you actually need to do DR. And, and this may, we

Speaker:

will see. You know, sometimes when we have these episodes, Persona,

Speaker:

you know, you know, we have these conversations beforehand and I'm

Speaker:

like, I don't know if we can fill up an entire episode over this topic.

Speaker:

We said that I think over the last one.

Speaker:

Yeah,

Speaker:

And it ended up not being a problem at all.

Speaker:

This is not one of those episodes.

Speaker:

This is one of those episodes where I think we might go long and we'll end

Speaker:

up turning this into two episodes, um, because there's a lot to talk about here

Speaker:

because before we do any testing, right, um, what do you think we need to do?

Speaker:

Well, you, well, you need to understand,

Speaker:

there.

Speaker:

By the way, there's probably no wrong answer here.

Speaker:

Uh, but

Speaker:

well, well, I think before you could

Speaker:

un unless your answer is, unless your answer is nothing,

Speaker:

um,

Speaker:

no, no.

Speaker:

So I was, I was going through my head.

Speaker:

I was thinking even before you get to testing,

Speaker:

Yeah.

Speaker:

you need to understand what are the business requirements for how quickly

Speaker:

you need to bring up that site, which actually you don't

Speaker:

necessarily think about as testing, but that's actually part of your

Speaker:

backup system design before you even get to the testing part.

Speaker:

Yeah.

Speaker:

So we, we have to agree on what success would be, right?

Speaker:

And we have to agree on obviously what we're gonna test, but we have to

Speaker:

agree on the parameters of that test.

Speaker:

And so, um.

Speaker:

What?

Speaker:

What do you think are going what?

Speaker:

Go ahead.

Speaker:

I was just going to ask about, like, your parameters, right?

Speaker:

What are

Speaker:

those parameters in my mind?

Speaker:

Some of those are how quickly do I need to be able to bring things up, and what the scope is,

Speaker:

right.

Speaker:

Am I failing over and

Speaker:

trying to recover a file, an application, a data center, right?

Speaker:

What am I looking to actually test?

Speaker:

So it depends what you're trying to test.

Speaker:

And what the extent is that you want to test.

Speaker:

And also, I think the other thing is how close do you want to get to testing

Speaker:

an actual disaster as part of that?
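
[Editor's sketch] The parameters discussed here (what is in scope, how quickly it must come back, and whether a disaster is actually declared) can be written down before the test starts so that success is judged against something agreed in advance. The class and field names below are hypothetical and only illustrate one way to record them.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Scope(Enum):
    FILE = "file"
    APPLICATION = "application"
    DATA_CENTER = "data center"


@dataclass(frozen=True)
class DrTestPlan:
    name: str
    scope: Scope
    target_recovery_time: timedelta  # agreed with the business up front
    declares_disaster: bool          # False for a plain restore test

    def passed(self, actual: timedelta, data_verified: bool) -> bool:
        """Success means the data checked out and it came back within the target."""
        return data_verified and actual <= self.target_recovery_time


if __name__ == "__main__":
    plan = DrTestPlan("restore one home directory", Scope.FILE,
                      timedelta(hours=1), declares_disaster=False)
    print(plan.passed(timedelta(minutes=20), data_verified=True))
```

Keeping the target time and the scope in the plan itself makes it harder to redefine success after the test has run.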

Speaker:

Right.

Speaker:

Um, and I, I would say that there are different kinds of DR tests and, um.

Speaker:

The, there's, and, and if you haven't done any, I would say let's start small, right?

Speaker:

If we've never, if, if we've never done a DR test of any kind, I would

Speaker:

probably start with, what are you

Speaker:

I was thinking about the test you don't wanna try. Who is the guy in Alaska?

Speaker:

Oh yeah,

Speaker:

yeah.

Speaker:

Okay.

Speaker:

We definitely have to put a link to that episode.

Speaker:

That was the most amazing episode ever.

Speaker:

In fact, you know what?

Speaker:

We'll probably replay it over, uh, over the holiday break. Uh,

Speaker:

by the way, we love him and it, and, and it had a happy ending, but oh my God.

Speaker:

What, what a

Speaker:

nightmare,

Speaker:

I think he was rebuilding a RAID array, if I recall.

Speaker:

He was swapping the disks around

Speaker:

it was self-inflicted in that he said, I want to move the disks around.

Speaker:

Because they were like.

Speaker:

different sizes I

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Like, and he's like, the only way I can do this is to just, to just

Speaker:

wipe everything and start over.

Speaker:

And he is like, yeah, okay.

Speaker:

So that's what I'll do.

Speaker:

I'll just wipe everything and then I'll restore everything.

Speaker:

My s my backup system works.

Speaker:

And this is how he tested his backup system.

Speaker:

Please don't.

Speaker:

He definitely learned some things along the way, but, um, yeah, so don't do

Speaker:

Don't do that.

Speaker:

Yeah,

Speaker:

Don't do that.

Speaker:

Uh, but

Speaker:

learn from his example.

Speaker:

Listen to that episode and, and,

Speaker:

and have

Speaker:

a heart attack as you're listening.

Speaker:

because the thing that ran through my head was someone who's never done DR

Speaker:

testing would've been like, yeah, I'm just gonna shoot in the head my production

Speaker:

site with all the applications, and I make sure that it, or just test it out

Speaker:

and see does it fail over properly?

Speaker:

Yes.

Speaker:

The key here is non-destructive DR testing.

Speaker:

Right?

Speaker:

Um, and so, yeah, so, so to go back to like setting the scope, I, if this

Speaker:

is your first time doing DR testing, I would set the scope as small as possible.

Speaker:

In fact, if you've never done DR testing, I would not even be doing DR testing.

Speaker:

I would just be doing restore testing

Speaker:

and

Speaker:

What's the difference, Curtis?

Speaker:

What's that?

Speaker:

What's the

Speaker:

the question is, the question is are we going to declare a disaster

Speaker:

and say that this site is down in some way, um, and then we're

Speaker:

gonna fail over to another site?

Speaker:

Or are we simply just going to bring another server online?

Speaker:

Right.

Speaker:

De depending on how you define, uh, some, and, you know, depending on the

Speaker:

day of the week and the day of the year and which year it is, I have called

Speaker:

every time you have to restore, uh, a server of any kind a disaster.

Speaker:

It's just a disaster of different levels, right?

Speaker:

So if the server, if a server caught on fire, that's a disaster,

Speaker:

right?

Speaker:

Um, and, or, or if a server just died, right?

Speaker:

Or if, like the, the story in the beginning, I think that was a disaster

Speaker:

because it took out a production workload,

Speaker:

right?

Speaker:

So I would say perhaps start with a single

Speaker:

production workload and restore it.

Speaker:

First off, just restore it, you know, like in place, not, not in place, not

Speaker:

in place, I meant like within the data

Speaker:

center or within whatever computing environment you're using.

Speaker:

And then start talking about VPNs and

Speaker:

you know, that sort of stuff.

Speaker:

So, uh, that would be defining the scope as small as you can

Speaker:

for the first test that you can.

Speaker:

so maybe like, uh, like maybe it might be a directory within a file server.

Speaker:

If you've never done

Speaker:

any restore testing, yes.

Speaker:

A simple directory within a file server, does our backup system work at all?

Speaker:

Right.
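
[Editor's sketch] A first, non-destructive restore test of a single directory can be checked mechanically: restore to a scratch location, never over the original, and compare file hashes against the source. The helper names below are hypothetical, and the comparison assumes the source has not changed since the backup was taken; files that legitimately changed afterward will also show up as differences.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            # read_bytes loads the whole file; fine for a sketch, not for huge files
            digests[p.relative_to(root).as_posix()] = hashlib.sha256(p.read_bytes()).hexdigest()
    return digests


def compare_restore(source_dir, restored_dir):
    """Return (missing, changed): files absent from the restore, files that differ."""
    src, dst = tree_digest(source_dir), tree_digest(restored_dir)
    missing = sorted(set(src) - set(dst))
    changed = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return missing, changed
```

Calling compare_restore on the original directory and the scratch restore and getting two empty lists back is a reasonable pass condition for this smallest test.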

Speaker:

Um, and then the next, the next is going to be some type of

Speaker:

recovery of an entire server.

Speaker:

Hopefully you have that server virtualized because doing that is going to be,

Speaker:

uh, obviously much easier assuming you're treating that VM as a VM.

Speaker:

Uh, if it's not virtualized, then we're gonna start going down the

Speaker:

level of, um, bare metal recovery,

Speaker:

right?

Speaker:

BMR. By the way, we should probably do an episode on that.

Speaker:

I

Speaker:

didn't think about that.

Speaker:

We should do an episode on that.

Speaker:

I know you keep mentioning server as you're talking about this.

Speaker:

Would you also qualify that that could be server or an application?

Speaker:

thanks for bringing that up.

Speaker:

That's why you keep me around, you know?

Speaker:

So it depends on what you mean by application, right?

Speaker:

Um.

Speaker:

You could, these are like the different

Speaker:

level.

Speaker:

You talk about restoring a directory.

Speaker:

I would also look at restoring a single database within a server, right?

Speaker:

If that's what you mean by application, then I would say yes.

Speaker:

Sometimes when we say application, actually a lot of times when we

Speaker:

say application, what we mean is

Speaker:

application

Speaker:

Multiple

Speaker:

on multiple servers and multiple things, all interconnected.

Speaker:

And if that's what you're talking about, I'm gonna say no

Speaker:

yeah.

Speaker:

yeah, I

Speaker:

was thinking like a, my,

Speaker:

time out.

Speaker:

yeah, I was thinking more like restore your MySQL database.

Speaker:

Exactly

Speaker:

right.

Speaker:

Um,

Speaker:

table within your MySQL database.

Speaker:

Right.

Speaker:

yeah.

Speaker:

Um, and again, non-destructively,

Speaker:

Yeah,

Speaker:

right.

Speaker:

Um, I.

Speaker:

Uh, just, just a, just an entity and you can try all of these things, right?

Speaker:

You can try restoring a database.

Speaker:

You can try restoring an entire, all of the, all of the databases on a server.

Speaker:

You can try restoring a server with the databases, um, and or any other

Speaker:

applications that might be on it, like a web server.

Speaker:

You can try all of these things individually and make sure that

Speaker:

you've got those pieces down.

Speaker:

This is about defining the scope.

Speaker:

Try all of the different pieces first and make sure you've got the,

Speaker:

the recovery path for each of those parts of your infrastructure down

Speaker:

before you decide, okay, we're gonna pretend we're gonna blow up the data

Speaker:

center.

Speaker:

and the other reason to bring that up, it's important to test these different

Speaker:

types of components because there are gonna be different nuances like how

Speaker:

you deal with databases and recovering.

Speaker:

That is definitely gonna be different than file servers, which is probably

Speaker:

also gonna be different than servers.

Speaker:

And so it's important to understand the nuances of what is possible and

Speaker:

the steps for each one of these.

Speaker:

And also the different backup systems.

Speaker:

So if you've got

Speaker:

physical, you know, you've got physical servers that are not virtualized,

Speaker:

you've got servers running in Hyper-V or VMware or, you know, any, any, any sort

Speaker:

of on-premises virtualization set.

Speaker:

What was the third one

Speaker:

Broadcom,

Speaker:

brought?

Speaker:

VMware,

Speaker:

It's, it will always be

Speaker:

VMware to me.

Speaker:

It will always be VMworld. Um, the, um, if you've got

Speaker:

VMs running in, uh, AWS, GCP, Azure,

Speaker:

If you've got basically all of the different places that you

Speaker:

have infrastructure, you probably have different backup and recovery

Speaker:

methodologies for each of them.

Speaker:

And so you should also be looking at testing all of those.

Speaker:

And you should be looking at testing each of them individually as a,

Speaker:

an overall process that you're working towards developing, uh,

Speaker:

declaring a much bigger disaster.

Speaker:

Yeah.

Speaker:

I, I think it's important also to be familiar because sometimes you might have

Speaker:

a real disaster that doesn't require you

Speaker:

to recover every single component within that higher level application, right?

Speaker:

So being familiar with the individual components is also important

Speaker:

depending on what the disaster is.

Speaker:

Yeah.

Speaker:

Agreed.

Speaker:

Um, and you know, there, there, there's an application that we haven't

Speaker:

talked about in terms of including or not including it in your DR

Speaker:

scope, and that is, what about SaaS applications like Microsoft 365?

Speaker:

Um, there are a couple of different scenarios there.

Speaker:

One is your

Speaker:

um, account is damaged in some sort of logical way, meaning logical

Speaker:

corruption, meaning a ransomware attack.

Speaker:

You deleted it.

Speaker:

Right?

Speaker:

We, we've covered that

Speaker:

on,

Speaker:

provider damaged it.

Speaker:

yeah.

Speaker:

The provider.

Speaker:

Well, that's, that, that was, I'm gonna list that as like a

Speaker:

third.

Speaker:

Well, if they damaged your account.

Speaker:

And there is an example of that.

Speaker:

Uh, for example, the Salesforce

Speaker:

Yep.

Speaker:

That's what I was thinking.

Speaker:

story where they went and blew up everybody's permissions and everybody

Speaker:

had to restore that themselves.

Speaker:

Um, that man, I hate that story.

Speaker:

I really do, because Salesforce, in my opinion, did not own up to,

Speaker:

uh, they, they didn't step up to the plate

Speaker:

Yeah.

Speaker:

at, at the time.

Speaker:

I remember writing a blog post for, uh, Druva.

Speaker:

I was working for Druva at the time, and I remember writing a blog

Speaker:

post that said something like, proof that, that Salesforce should not be

Speaker:

trusted with your backup infrastructure.

Speaker:

Um, 'cause they clearly don't know what they're doing.

Speaker:

But, so there's your account being damaged in some way.

Speaker:

And then there's OVHcloud

Speaker:

and what happened there, where the entire infrastructure goes

Speaker:

poof.

Speaker:

Right.

Speaker:

So, uh, I think you should have that as, as you, you should come up with that

Speaker:

as a scenario that you need to test.

Speaker:

It's going to be challenging in most cases because just let's just

Speaker:

talk about the different scenarios.

Speaker:

Let's say it's AWS. The way most people back up AWS, if AWS goes down

Speaker:

and takes your backups with it.

Speaker:

You're screwed.

Speaker:

You're screwed.

Speaker:

Um, the way that most people back up most cloud infrastructure.

Speaker:

And then there's the fact that most people trust their SaaS provider for data

Speaker:

protection, which they should not, right?

Speaker:

I talk about that all the time.

Speaker:

They should not. But, but even if you're backing up your, your

Speaker:

data, um, to a third party and, and it's not, they should not.

Speaker:

This has been

Speaker:

an age-old

Speaker:

problem, though.

Speaker:

This is an age old problem, but we're talking about testing today.

Speaker:

So by the way, this is definitely gonna be two episodes.

Speaker:

We haven't even gotten to the testing yet.

Speaker:

All we're talking about is setting up the requirements.

Speaker:

So,

Speaker:

so this is definitely gonna be a second

Speaker:

episode.

Speaker:

Um, so we talk about, we talk about setting the scope and we

Speaker:

talk about starting small first.

Speaker:

Each of these individual components in your infrastructure and um,

Speaker:

were you about to say something?

Speaker:

that this is not destructive and

Speaker:

not disruptive.

Speaker:

Yeah.

Speaker:

Nice, nice.

Speaker:

I like that.

Speaker:

Non-destructive, non-disruptive DR testing.

Speaker:

I

Speaker:

like it.

Speaker:

because you don't wanna take down your production.

Speaker:

You don't wanna affect, say, ongoing DR

Speaker:

resiliency that's available on the secondary site because you're

Speaker:

about to do some of this testing.

Speaker:

There are cases where you do want to impact that, but when you're doing sort

Speaker:

of these individual component levels, you may not want to impact your overall DR

Speaker:

posture while you're doing this.

Speaker:

Yeah, that last one is probably the hardest and it

Speaker:

may actually be impossible.

Speaker:

What are we talking about there?

Speaker:

We're saying that if you have a DR system, I hope you have a DR system

Speaker:

if you have one, you know, see if there's a way

Speaker:

to test your DR without messing up your DR.

Speaker:

Um, I'm not sure if that's possible

Speaker:

in, in many scenarios, but.

Speaker:

but one of the things I think about right is a lot of data protection

Speaker:

vendors, they do test and dev, right?

Speaker:

You can spin

Speaker:

up a copy off of your storage that is a writeable copy that you can then use

Speaker:

for testing out your recovery systems

Speaker:

without impacting the actual recovery instance.

Speaker:

right.

Speaker:

And so that's what I'm thinking about is just like those sort of scenarios or maybe

Speaker:

you're able to wheel in an extra server or beg, borrow, or steal an extra server to use

Speaker:

for your recovery testing or DR testing,

Speaker:

Yeah.

Speaker:

You know, it's, it's funny that you say that because I, I, you know, when

Speaker:

you, I like this idea wheeling in a, a

Speaker:

server.

Speaker:

I mean, I, I, I think many of our listeners, you know, and, and I, and every

Speaker:

time I, by default, I'm always talking about data center, even though I, you

Speaker:

know, and then my brain goes, Hey,

Speaker:

nobody has a data center anymore.

Speaker:

Um, I don't think many people are wheeling in a server.

Speaker:

I think that they're, they're, you know, they're looking at

Speaker:

cloud

Speaker:

infrastructure as a way.

Speaker:

I, I know that not everybody can do that.

Speaker:

And that's obviously another thing that you have to decide upfront is

Speaker:

how are we going to do this recovery?

Speaker:

Are we going to do it in the cloud?

Speaker:

Are we gonna do it in

Speaker:

the alternate infras, alternate infrastructure? Um, these are all

Speaker:

things that you have to decide upfront.

Speaker:

We have to decide what it is we're gonna restore.

Speaker:

We have to decide, um, where we're going to restore and, and how.

Speaker:

Right, and

Speaker:

the deciding the where,

Speaker:

Again, this is all about planning.

Speaker:

This is all stuff that needs to be

Speaker:

decided upfront.

Speaker:

This is also something that's going to be part of your backup and DR

Speaker:

design. You will have decided upfront, you know, how we're going to do DR,

Speaker:

uh, well, disaster recovery, how we're going to do it.

Speaker:

And that place is most likely the place that you're going

Speaker:

to, uh, be doing DR testing.

Speaker:

And, you know, there are a lot of choices here.

Speaker:

Cloud infrastructure, I think, is the best choice for most environments.

Speaker:

Another choice is aging infrastructure, right?

Speaker:

So

Speaker:

you move, you know, you move your older stuff out and that

Speaker:

becomes your DR environment.

Speaker:

Um, another choice, another very common choice is basically, um,

Speaker:

there, there's two ways to basically

Speaker:

rent infrastructure for the purposes of disaster recovery.

Speaker:

One way is to contract with, um, you know, a company that will provide

Speaker:

what you need if and when you need it.

Speaker:

Uh, and that costs, you know, let's say this much.

Speaker:

And then there's a company that will provide everything you need,

Speaker:

always available to you all the time.

Speaker:

Even when you don't need it.

Speaker:

And that will cost this much.

Speaker:

Yeah, exactly.

Speaker:

I can't do the fingers, I gotta do the hands.

Speaker:

Right.

Speaker:

It's significantly more.

Speaker:

Uh, and, and this is why I push everybody to the cloud as much as you can.

Speaker:

'cause that's the beautiful thing about the cloud, is that you can just literally

Speaker:

snap your fingers and, um, you know, and use, and you can also use infrastructure

Speaker:

as code so that you can just make all of the hardware that you need

Speaker:

magically appear.

Speaker:

What's that?

Speaker:

and work right.

Speaker:

And work.

Speaker:

Yeah.

Speaker:

Magically appear when you need it.

Speaker:

And then the moment you no longer need it because your test is over, you can

Speaker:

also snap your fingers and it all goes away and you just pay for it only when,

Speaker:

uh, it's, you know, up and running.

Speaker:

Um, so the, the next thing to talk about is, so we decided

Speaker:

what it is we're gonna test.

Speaker:

We decided where we're gonna test it, how we're gonna test it,

Speaker:

what about our success criteria?

Speaker:

Yeah.

Speaker:

I think this is important to note upfront what it means to be

Speaker:

successful, but I think it's also important to be realistic, right,

Speaker:

with your success criteria, especially if this is sort of your

Speaker:

first time doing this, because I remember the story you tell Curtis

Speaker:

about your, your runbooks that you

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

when you used to work at the bank.

Speaker:

And how a lot of times they were not able to get through and

Speaker:

actually do the recovery because a step would be skipped or

Speaker:

something wouldn't work properly.

Speaker:

And so I think it's sort of a learning process.

Speaker:

So don't be too hard on yourself

Speaker:

if the first 10, 20, a hundred times you try doing your recovery testing, it's

Speaker:

not, like, a hundred percent successful.

Speaker:

Yeah.

Speaker:

It may not be your fault.

Speaker:

Things change, environments change,

Speaker:

hardware may change, right?

Speaker:

There's so many other factors,

Speaker:

but.

Speaker:

This is very closely related to the discussions we've had over, um,

Speaker:

doing, uh, cyber recovery testing.

Speaker:

Right?

Speaker:

And your disaster recovery very likely will be part of an

Speaker:

overall cyber recovery process.

Speaker:

And I agree with you.

Speaker:

Um, it's interesting.

Speaker:

Mentally, I was going somewhere completely different.

Speaker:

But, once again, this is why we make such a good team.

Speaker:

Um, you were more like, hey, make the, um,

Speaker:

you know, the requirements,

Speaker:

uh, be nice to yourself, especially if it's the first time out.

Speaker:

You know, requirement number one, no one dies, right?

Speaker:

Yep.

Speaker:

No fires are created.

Speaker:

No one quits.

Speaker:

Um.

Speaker:

And, um, you know, set your expectations low and nobody can

Speaker:

take 'em away from you.

Speaker:

I'm borrowing heavily from, uh, Michael p Connolly, the comedian,

Speaker:

who says that the key to happiness in life is to lower your expectations.

Speaker:

And he's like, my goal for today is to go to the bathroom outside my pants.

Speaker:

Um,

Speaker:

so yeah.

Speaker:

Set the, uh, you know, the success criteria low in the beginning.

Speaker:

Um, where I was going was, of course, RTO and RPO, which we talked about before.

Speaker:

That is definitely going to determine the overall success:

Speaker:

you click the stopwatch when you begin the recovery process,

Speaker:

and then you click it again once everything is back up and running and you've

Speaker:

tested that whatever it is that you destroyed, or pretended you

Speaker:

destroyed, is now up and running and fully functional. Again, as we've mentioned

Speaker:

with RTO and RPO, that doesn't just mean the time of the restore, it's,

Speaker:

it's, you know, it's...

Speaker:

Ah, here's a question.

Speaker:

Is it RTO and RPO or is it RTA and RPA?

Speaker:

That is the objective,

Speaker:

right?

Speaker:

The RTO and RPO are the objective that we are shooting for.

Speaker:

And so the goal, um, in a recovery test of any kind is that the RTA and RPA, that's recovery

Speaker:

time actual, uh, and recovery point actual, are less than, uh, or equal to

Speaker:

the RTO and RPO.
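
The pass/fail check described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `RecoveryResult`, its fields, and the sample durations are hypothetical and do not come from the episode or from any particular backup product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryResult:
    """Objectives agreed upfront versus what a DR test actually measured."""
    rto: timedelta  # recovery time objective (the target)
    rpo: timedelta  # recovery point objective (the target)
    rta: timedelta  # recovery time actual (stopwatch start to verified working)
    rpa: timedelta  # recovery point actual (how much data was really lost)

    def met_objectives(self) -> bool:
        # The test passes only if both actuals are at or under their objectives.
        return self.rta <= self.rto and self.rpa <= self.rpo


# Hypothetical numbers, for illustration only.
result = RecoveryResult(
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    rta=timedelta(hours=3, minutes=30),
    rpa=timedelta(minutes=45),
)
print(result.met_objectives())
```

Note that the RTA in this sketch is meant to cover the whole stopwatch interval, from the start of the recovery until the restored system has been verified as working, not just the restore itself.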

Speaker:

Go back and listen to our episode.

Speaker:

We covered it, I think a couple episodes ago, right?

Speaker:

No, it was, well, well, I don't know.

Speaker:

It depends on when this one gets published.

Speaker:

We just, we just did it.

Speaker:

Yeah.

Speaker:

That's why I'm saying we just did it not.

Speaker:

Oh, was that really the last episode?

Speaker:

Well, it just published.

Speaker:

Um, uh, but again, I don't know, you know,

Speaker:

we'll see how these things, you know... But, um, yeah, if RTO and RPO

Speaker:

don't just roll off your tongue and you don't know what they are and,

Speaker:

you know, all of that stuff, they literally should be in every conversation

Speaker:

having anything to do with backup and DR

Speaker:

design.

Speaker:

But that to me is the ultimate success criteria, right?

Speaker:

Another success criteria is, and this is gonna be

Speaker:

a percentage, um, a percentage achieved, and that is the degree to which

Speaker:

you were able to follow your runbook

Speaker:

and just do what's in the runbook.
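
One possible way to turn that "percentage achieved" into a number is sketched below in Python. The function name `runbook_adherence` and the step names are hypothetical, and scoring every step equally is an assumption; the episode does not say how the percentage was actually calculated.

```python
def runbook_adherence(steps: dict[str, bool]) -> float:
    """Return the percentage of runbook steps completed exactly as written.

    `steps` maps each runbook step to True if the tester could follow it
    as documented, or False if it was skipped, failed, or needed outside help.
    """
    if not steps:
        raise ValueError("the runbook has no steps to score")
    followed = sum(steps.values())  # True counts as 1, False as 0
    return 100.0 * followed / len(steps)


# Hypothetical test run: one step could not be followed as written.
test_run = {
    "provision recovery environment": True,
    "restore database": True,
    "restore application servers": False,
    "verify application is functional": True,
}
print(f"{runbook_adherence(test_run):.0f}% of the runbook followed as written")
```

Every step marked False is a place where the runbook needs fixing before the next test.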

Speaker:

This is

Speaker:

what you, this is what you were alluding to before, right?

Speaker:

Well, and hopefully you have a runbook.

Speaker:

Yeah.

Speaker:

Like you said, that assumes you have a runbook.

Speaker:

Yeah.

Speaker:

Um.

Speaker:

And, and the way we always did this at the bank, as we mentioned

Speaker:

before, is that we would have someone other than the person who runs the

Speaker:

backup system do the DR testing.

Speaker:

Right?

Speaker:

Here's the runbook, please follow it because we're gonna pretend

Speaker:

that Curtis got hit by a bus.

Speaker:

It was never anything nice.

Speaker:

It was never

Speaker:

that I won the lottery and then just flew the coop.

Speaker:

It was always Curtis got hit by a bus or got swallowed up in the, in the sinkhole.

Speaker:

The great sinkhole of 2024.

Speaker:

Um,

Speaker:

How long do you think it's gonna be until people don't even know what a bus is?

Speaker:

I think, I think we're good.

Speaker:

I think we're

Speaker:

good.

Speaker:

You know, I, um, it's funny, definitely an aside, when I was working the

Speaker:

election, uh, the last two weeks, we were at a school, well,

Speaker:

we were at, like, a multifunction building next to a school.

Speaker:

And one of the hardest times was during pickup and drop off because

Speaker:

there are all these parents that are picking up and dropping off their kids.

Speaker:

And I was like,

Speaker:

you know, if only there was like a large vehicle that we could put all the kids

Speaker:

in and like we could like paint it yellow so that people could see it and then like

Speaker:

have a sign that comes out and make sure, oh, traffic stops in both directions.

Speaker:

If only we could do all that.

Speaker:

And then all these parents wouldn't have to like, uh, take their kids

Speaker:

to school and waste all of that gas and all of those cars sitting there.

Speaker:

And I'm sure the gas is running while they're sitting

Speaker:

there for a half hour waiting for their kid to come outta school.

Speaker:

Anyway, I digress.

Speaker:

I don't know.

Speaker:

Hmm.

Speaker:

But, um, I think we'll stop there because

Speaker:

basically, the number one thing, the number one success criteria that

Speaker:

I'm gonna say is that you know what your success criteria are beforehand.

Speaker:

What are we gonna restore?

Speaker:

How are we gonna restore it?

Speaker:

Where are we going to restore it, and how long is it, you know, what, what

Speaker:

timeframe are we trying to fit in?

Speaker:

And also, like the thing we talked about,

Speaker:

the, uh, other criteria being that we can

Speaker:

do it without Curtis' help, right?

Speaker:

Or, you know, whatever your Curtis is, right?

Speaker:

Um, decide on what all of those are upfront and you have a much

Speaker:

better chance of being successful when you actually do the recovery.

Speaker:

What do you think?

Speaker:

No, that makes sense.

Speaker:

Okay.

Speaker:

And with that, I will thank you once again for being a great co-host, Persona.

Speaker:

I try, I try.

Speaker:

This was fun. I like talking about disaster recovery testing.

Speaker:

Yeah, absolutely.

Speaker:

And we, uh, hope you guys enjoyed this as well.

Speaker:

Uh, that is a wrap.

Speaker:

The Backup Wrap-Up is written, recorded, and produced by me, W. Curtis Preston.

Speaker:

If you need backup or DR

Speaker:

consulting, content generation, or expert witness work,

Speaker:

check out backupcentral.com.

Speaker:

You can also find links to my O'Reilly books on the same website.

Speaker:

Remember, this is an independent podcast and any opinions that

Speaker:

you hear are those of the speaker and not necessarily an employer.

Speaker:

Thanks for listening.