Speaker:

You found The Backup Wrap-Up, your go-to podcast for all things

Speaker:

backup, recovery, and cyber recovery.

Speaker:

In our final episode about DR testing, the rubber meets the road.

Speaker:

Last time we talked about getting ready for your DR test, and this time we're

Speaker:

talking about actually running the test.

Speaker:

We'll cover what you need to do during the test, like coordinating between

Speaker:

teams, documenting what goes wrong, because something always goes wrong,

Speaker:

and making sure that you've got backup communication methods ready.

Speaker:

By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.

Speaker:

Backup, and I've been passionate about backup and recovery for over 30 years,

Speaker:

ever since I had to tell my boss that

Speaker:

we had no backups of that really important production database that we had just lost.

Speaker:

I don't want that to happen to you, and that's why I do this.

Speaker:

On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.

Speaker:

This is The Backup Wrap-Up.

Speaker:

Welcome to the show.

Speaker:

Hi, I am W. Curtis Preston, AKA Mr.

Speaker:

Backup, and if you could just take a couple of seconds to

Speaker:

either like or subscribe or,

Speaker:

uh, follow the channel so that you can, uh, always get our great content.

Speaker:

That would be awesome.

Speaker:

I am once again joined by a guy who has finally put some of his car

Speaker:

knowledge to use, Prasanna Malaiyandi.

Speaker:

I'm doing well, Curtis, and yes, I am finally putting some of that car knowledge

Speaker:

to use. Uh, for viewers who may not, or listeners who may not be aware,

Speaker:

I tend to watch a lot of car YouTube stuff, um, a lot of it

Speaker:

tends to be around fabrication, engine rebuilding, drag racing.

Speaker:

It's a really odd mix, but a lot of it is just YouTube knowledge.

Speaker:

And so I finally decided to try something different, and I've been taking auto

Speaker:

shop classes at my local community college, which has been amazing.

Speaker:

And so as part of it, you actually have a hands-on lab section where you get to

Speaker:

actually work on cars like your own car.

Speaker:

And right now it's all basic stuff, right?

Speaker:

So oil changes, underhood inspections, inspecting cooling systems.

Speaker:

But we actually get to do things like charging tests, uh, compression tests,

Speaker:

leak down tests, replacing spark plugs.

Speaker:

So excited.

Speaker:

I'm actually using these hands for things.

Speaker:

And you did a, you did an oil change yesterday on your wife's car,

Speaker:

I did do an oil change on my wife's car.

Speaker:

Yep.

Speaker:

How filthy was the oil in your wife's car?

Speaker:

Yeah, it looked almost brand new.

Speaker:

Um, it didn't have many miles since the last oil change.

Speaker:

I'd probably say five or 600 miles, but it was a sacrifice since I needed to

Speaker:

actually do an oil change for the class.

Speaker:

So.

Speaker:

Right, right. Now, this isn't the first time you've done an oil change, right?

Speaker:

No.

Speaker:

Okay.

Speaker:

I've done one in the past, but this is the first time I've done it on a lift,

Speaker:

which oh my God, is so much nicer.

Speaker:

Oh, you brought, you brought her car into the class.

Speaker:

class.

Speaker:

And we put it

Speaker:

I see, I see.

Speaker:

and I.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Everything's nicer on a lift.

Speaker:

Absolutely.

Speaker:

When you're not like struggling underneath the car, trying not to drop the hot oil

Speaker:

on you, and you're actually able to get a large, like container underneath, like the

Speaker:

drum was probably like three feet wide.

Speaker:

Right, right.

Speaker:

Yeah.

Speaker:

You had one of those that you can wheel around, right?

Speaker:

Yep.

Speaker:

So significantly easier.

Speaker:

And then I was just thinking, I was like, do I have room in

Speaker:

my garage for a two post lift?

Speaker:

Even a short one, but no.

Speaker:

Trust me, I have thought about it back when I was doing a

Speaker:

lot more work on my cars.

Speaker:

I definitely looked into it and I was like, okay, I don't, I

Speaker:

can't spend that kind of money.

Speaker:

So let's talk about something that we actually know a little bit of something

Speaker:

about, uh, so last, so two weeks ago, for those of you that follow the,

Speaker:

uh, episode, or for those of you that follow the show, uh, two weeks ago

Speaker:

we did DR testing part one, and then.

Speaker:

Um, we aired, uh, a great, speaking of DR

Speaker:

testing, a great episode from 2021, which was the best DR testing story ever, right?

Speaker:

Yep.

Speaker:

Oh yeah.

Speaker:

The scariest, I would say.

Speaker:

Yeah.

Speaker:

Where a guy for reasons that he goes into in the show, he essentially purposefully

Speaker:

destroys his production environment, not just for DR testing, but as a.

Speaker:

As a matter of how everything happened, he ends up testing his DR system

Speaker:

and it, it does work, but oh my God.

Speaker:

There was, there was a, there was a quote in there that said something like, he had

Speaker:

a long weekend that lasted like five days.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

So, yeah.

Speaker:

things it's like, and well, and the other challenge is he was up in Alaska,

Speaker:

Yeah,

Speaker:

If he needed to get parts or other things like

Speaker:

right.

Speaker:

luck.

Speaker:

Yeah, exactly

Speaker:

if I'm about to do like a house repair or something else like that,

Speaker:

it's like, you know not to do it on a Saturday or a Sunday or a Friday,

Speaker:

right.

Speaker:

if you have to call someone or you need to pick up something

Speaker:

and you don't do it at night.

Speaker:

Yeah, definitely.

Speaker:

Definitely don't do it at night.

Speaker:

Right.

Speaker:

Yeah.

Speaker:

Um, yeah, so that's a great episode if you didn't listen to that episode.

Speaker:

That is a great episode.

Speaker:

Um, and, uh, uh, yeah, listen to that.

Speaker:

So this one, the, the.

Speaker:

Two weeks ago, we talked essentially about getting ready to do the DR test,

Speaker:

preparing for it, setting the scope for it, agreeing on what's going to be a

Speaker:

success, and then this week we're gonna talk about actually executing the DR test.

Speaker:

And again, this is a DR test.

Speaker:

What would you say is the purpose of a DR test, Prasanna?

Speaker:

To make sure that, in the case of an actual disaster,

Speaker:

you're able to recover as agreed upon, whatever your agreement was.

Speaker:

Yeah, I, I think that's sort of the general, yeah.

Speaker:

Obviously that's the purpose of a test in general, right.

Speaker:

Is to, is to, is to.

Speaker:

To test whether or not you could do it when you, when you need it.

Speaker:

But since most tests fail, I'm going to say that the other purpose and

Speaker:

perhaps the bigger purpose is to fix the parts of your, of the DR system that

Speaker:

you discover are broken in some way.

Speaker:

Right?

Speaker:

Um, and, uh, so probably one of the biggest

Speaker:

outcomes of a DR test is to feed back into the DR plan, right?

Speaker:

Yeah.

Speaker:

just in terms of what fails, I know sometimes people are like,

Speaker:

oh, it's just thinking about like, I can't restore the data.

Speaker:

But a lot of times what really fails is the dependencies that you didn't consider.

Speaker:

Right.

Speaker:

Did you make sure you're able to fail over and recover your Active

Speaker:

Directory in your DR site before you can bring your applications online?
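
A minimal sketch of that dependency point, assuming a hypothetical set of services (the names and the dependency map below are illustrative, not from the show). It uses Python's standard `graphlib` module to work out a recovery order in which things like Active Directory come up before the applications that need them.

```python
from graphlib import TopologicalSorter

# Hypothetical services, each mapped to what must be running before it can start.
depends_on = {
    "network": set(),
    "active-directory": {"network"},
    "dns": {"network"},
    "database": {"active-directory", "dns"},
    "app-server": {"database", "active-directory"},
}

def recovery_order(graph):
    """Return the services in an order where every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())

order = recovery_order(depends_on)
print(" -> ".join(order))
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding to write down before the test starts.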

Speaker:

You, you know, um, I'm glad you brought that up because I aired

Speaker:

another classic episode about the actual disaster recovery on an island.

Speaker:

And, uh, again, well, it's with the islands, right?

Speaker:

Because Alaska was Kodiak Island.

Speaker:

Um, but this was in a Caribbean island.

Speaker:

And they do an actual deal, you know, an actual recovery because there

Speaker:

was a hurricane that took it out.

Speaker:

And one of those dependencies that you talked about was the lack of internet

Speaker:

Yeah.

Speaker:

and, uh, lack, the lack of power, the lack of internet.

Speaker:

These are all things that we come to expect on a normal everyday basis, which

Speaker:

In

Speaker:

an actual disaster is, is not,

Speaker:

Yep.

Speaker:

not that right.

Speaker:

Yeah.

Speaker:

And we also had that other episode.

Speaker:

Do you remember maybe, I don't know if you want to air that or not.

Speaker:

The derecho one.

Speaker:

That's right, the one that talked about the derecho.

Speaker:

I'm gonna have to, I have to go find that one.

Speaker:

'cause it's not titled the Derecho episode.

Speaker:

It was, um.

Speaker:

I'll have to find, if I can find that, I'll rebroadcast it in the keeping

Speaker:

of the, the disaster recovery theme.

Speaker:

I'll, I'll definitely see if I can find that when I br

Speaker:

'cause that was also very good.

Speaker:

I didn't even know what a derecho was.

Speaker:

Derecho is a land hurricane.

Speaker:

Uh, a hurricane that forms over land.

Speaker:

I don't know why it's called derecho, but that is what it is.

Speaker:

Right.

Speaker:

Yep.

Speaker:

Uh, to me that just means Right, you know, to the right in Spanish.

Speaker:

But I.

Speaker:

You know, it is what it is.

Speaker:

Um, so, uh, so, so we talk about if we're executing the DR test.

Speaker:

Uh, we, we, you know, we, we, we've, we've agreed on what we're gonna test.

Speaker:

We've agreed on what the success criteria is.

Speaker:

It's time to actually start walking through the, the test

Speaker:

we're, we're going to have.

Speaker:

And, and also we created a, an environment that we're going to test in.

Speaker:

We're not doing what our friend from Alaska did.

Speaker:

I, I was just thinking, are you just gonna go around like the TV shows

Speaker:

when they get hit with an attack and they're just like, unplug,

Speaker:

unplug the cables, unplug the cables.

Speaker:

Yeah, don't do that.

Speaker:

We, we have some sort of test environment.

Speaker:

Generally speaking, these days, it's generally gonna be the

Speaker:

cloud and we're going to start executing the, the, um, this test.

Speaker:

Can you think of, uh, and, and one of the things, again, this is more of

Speaker:

a setup thing, but one of the things you wanna make sure is to allocate

Speaker:

enough time, uh, for this, you know.

Speaker:

For this process to unfold in its natural, um, evolution.

Speaker:

I would say time.

Speaker:

And then also make sure you have the resources right.

Speaker:

And I'm not, I don't mean compute resources, but people because.

Speaker:

Right?

Speaker:

Make sure that people are available, right?

Speaker:

yeah.

Speaker:

Um.

Speaker:

Don't do this at, like, uh, quarter end because people may

Speaker:

be firefighting other things or.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

The company that I, the, the bank, we did it on a weekend.

Speaker:

Um, but it was a dedicated, you know, a, a dedicated weekend where we're

Speaker:

going to do the DR test, and we did that because again, you're, you're

Speaker:

making all these resources available for the DR test, which means they're not

Speaker:

available to do their day job, and their day job would happen during the week.

Speaker:

So we chose to do it on the weekend and.

Speaker:

I'd say the bigger, the bigger you're going, the bigger te the bigger,

Speaker:

this isn't coming out in English.

Speaker:

Uh, the bigger the test, the bigger the need to prepare and to, to have, um,

Speaker:

you know, to make sure you have those resources and to not do it when the

Speaker:

normal production stuff is going on.

Speaker:

It requires buy-in from the business, communication, right?

Speaker:

All these

Speaker:

Yeah.

Speaker:

right?

Speaker:

Yeah.

Speaker:

Make sure you.

Speaker:

Make sure you communicate to all the powers that be, that you are doing

Speaker:

a DR test, especially if you're gonna do any kind of failover.

Speaker:

Um,

Speaker:

it too, right?

Speaker:

Because you want this to be done on an ongoing basis.

Speaker:

right,

Speaker:

to convince 'em upfront, Hey, here's why it's valuable, such that when you go back

Speaker:

and after the results, right, you're like, Hey, we now need to do another DR test.

Speaker:

Maybe six months down the line,

Speaker:

right.

Speaker:

already bought it.

Speaker:

Another thing as, as we're going through the DR test, we're documenting

Speaker:

what went right, what went wrong, especially what went wrong.

Speaker:

Right.

Speaker:

Um, go ahead.

Speaker:

so

Speaker:

This is an interesting thing 'cause when we had Mike on the podcast, right, and

Speaker:

he was talking about sort of doing these tabletop exercises, right?

Speaker:

I think it's important the person documenting kind of needs to

Speaker:

take an objective perspective.

Speaker:

Mm-hmm.

Speaker:

Right, because you may be showing some biases or the person documenting

Speaker:

may not want to document certain things, or may just sort of dismiss

Speaker:

it as, Hey, this isn't important,

Speaker:

Right,

Speaker:

Versus actually capturing what happened throughout the process.
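
One way to keep that record objective is to capture every event the same way, good or bad, at the moment it happens. The following is only a sketch under assumed names: the `DrTestLog` class, its fields, and the outcome labels are hypothetical.

```python
import json
from datetime import datetime, timezone

class DrTestLog:
    """In-memory log of what happened during a DR test, good and bad."""

    def __init__(self):
        self.entries = []

    def record(self, team, what_happened, outcome):
        # outcome labels ("ok", "failed", "workaround") are assumed for this sketch
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "team": team,
            "what_happened": what_happened,
            "outcome": outcome,
        }
        self.entries.append(entry)
        return entry

    def problems(self):
        """Everything that did not simply work, kept for the post-test review."""
        return [e for e in self.entries if e["outcome"] != "ok"]

    def to_jsonl(self):
        """One JSON object per line, easy to store outside the main environment."""
        return "\n".join(json.dumps(e) for e in self.entries)

log = DrTestLog()
log.record("dba", "database restore started", "ok")
log.record("network", "VPN to DR site would not connect", "failed")
print(log.to_jsonl())
```

Because every entry is timestamped when it is recorded, nobody has to decide after the fact whether something was important enough to write down.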

Speaker:

right.

Speaker:

Agreed.

Speaker:

Um, the next thing is, and, and we covered this, uh, in the previous

Speaker:

episode, but once you've, you know, we talked about testing little parts

Speaker:

of the infrastructure, but once we grow, once we've tested this piece

Speaker:

and this piece and this piece,

Speaker:

I do think it's important to test, you know, you look at the scenario, what

Speaker:

would this scenario do to our company?

Speaker:

Right?

Speaker:

The scenario is a disaster.

Speaker:

The scenario is a fire, a.

Speaker:

A terrorist action, um, and it's gonna take out all of this infrastructure.

Speaker:

What would that do to us?

Speaker:

So for example, you might not need to test your ability to recover from a SaaS outage

Speaker:

when your, if you have a data center and your data center goes out, right?

Speaker:

It's a, it's gonna be scenario dependent.

Speaker:

What you're gonna test, but, um, you, you might wanna, what would be the impact to

Speaker:

our business and our ability to use the different parts of our infrastructure?

Speaker:

And so speaking of dependencies, if we don't have internet of any kind, it

Speaker:

is, it is kind of a SaaS outage, right?

Speaker:

Right.

Speaker:

Um, so, uh, we're gonna, we want to test as many of those parts

Speaker:

of our infrastructure that are going to be impacted by the

Speaker:

scenario that we're testing, right.

Speaker:

Yeah, and sometimes it's a little bit about.

Speaker:

consequences or identifying gaps.

Speaker:

It's like when you're writing code, right?

Speaker:

You normally do unit tests,

Speaker:

Um.

Speaker:

but then when you actually test the end-to-end functionality, you're

Speaker:

like, oh, I didn't realize that this interacts with this other thing

Speaker:

this way, and things don't work.

Speaker:

That's why we also do end-to-end testing in addition to unit tests.

Speaker:

Yeah.

Speaker:

And, and, and again, this is why I went, why back in the beginning I

Speaker:

was saying that the purpose of the DR test is to identify these gaps, right?

Speaker:

The, yeah.

Speaker:

I mean, we can have, um,

Speaker:

You know, we can have that perfect test that goes well and that's great and

Speaker:

everybody feels better, but it's just as valuable to find the DR test that

Speaker:

had, that had a big hole or a small hole and, um, you know, the, uh, and, and,

Speaker:

and to document that and address that.

Speaker:

And this is why we do it on a regular basis.

Speaker:

I have a question for you, Curtis.

Speaker:

Yeah.

Speaker:

Do you think DR

Speaker:

testing...? So,

Speaker:

So most organizations have a risk management team,

Speaker:

Mm-hmm.

Speaker:

right?

Speaker:

Which usually has a lot of this information in terms of, okay,

Speaker:

what are the business risks and everything else like that.

Speaker:

But they're also probably the ones who are coordinating across the business

Speaker:

in order to say, okay, let's do a test.

Speaker:

Mm-hmm.

Speaker:

Right, whereas the infrastructure DR testing that we're talking about here

Speaker:

is probably one portion of that overall

Speaker:

Mm-hmm.

Speaker:

Mm-hmm.

Speaker:

Do you think that's fair?

Speaker:

Yeah, I think that's fair.

Speaker:

And you know, this is, we're going to.

Speaker:

I think that if we're doing a, a real DR test, we're going to this.

Speaker:

This is a business test as much as it is a technology test, right?

Speaker:

Yeah.

Speaker:

There is this, that overlap between business continuity planning

Speaker:

and disaster recovery planning.

Speaker:

And maybe for a DR test, we're not concerned so much with, um.

Speaker:

Uh, like if, if it's just a DR test, we're not concerned with, let's say,

Speaker:

uh, uh, buildings and people, places for people to, to work and things like that.

Speaker:

We're concerned more with getting the technology back up and running.

Speaker:

But I, I'm glad you brought that up.

Speaker:

That is a, a separate aspect that does need to be taken into account.

Speaker:

Well, and the benefit with this is if there's already a team that is looking

Speaker:

at that business continuity aspect,

Speaker:

Mm-hmm.

Speaker:

You may not need to convince the business as much, right?

Speaker:

In order to be

Speaker:

Right.

Speaker:

right, you should partner with people who already, like that is their job,

Speaker:

Agreed.

Speaker:

Agreed.

Speaker:

them.

Speaker:

Agreed.

Speaker:

We talked about documenting things that we discover here.

Speaker:

I, I think that we should be maintaining like a log of, you

Speaker:

know, all of the tests and the things that we've learned from them.

Speaker:

Because again, that may be helpful for, uh, you know, for

Speaker:

future generations of tests.

Speaker:

You know, it's important to have a DR

Speaker:

runbook and to, to, to have this, you know, one of the pur

Speaker:

the, one of the purposes of the test is to update that runbook.

Speaker:

So let's just talk about that.

Speaker:

Um, the, the, the thing about having a DR

Speaker:

runbook: I do believe in having an electronic copy of the DR

Speaker:

runbook, uh, but, uh, also have the ability to easily update

Speaker:

a paper copy of that runbook.

Speaker:

So the way to do that is to have some sort of documentation system

Speaker:

online that you can easily update.

Speaker:

Um, and then if you want to have a paper copy and you want to have a paper copy,

Speaker:

then um, the best way to have that is a, is a loose leaf type notebook system right

Speaker:

where you can update pages of it, where you don't have to update the entire book.

Speaker:

I have a comment about the electronic copy.

Speaker:

Sure.

Speaker:

I would recommend also keeping a copy out of your normal corporate infrastructure.

Speaker:

Agreed.

Speaker:

Right, right.

Speaker:

in case, say you get hit with ransomware and you no longer have access to that

Speaker:

infrastructure, or someone deletes your account that hosted that data, right?

Speaker:

So make sure it's something completely disconnected as well.

Speaker:

A copy just in case.

Speaker:

And I go back to think about the Pixar story, right, where they just happen to

Speaker:

be lucky with Toy Story 2 and have a copy offsite, offline, to save the movie.

Speaker:

Exactly.

Speaker:

Um, yeah, I, I, I think obviously we, we have to keep security in mind.

Speaker:

We have to make sure that what, wherever that system, wherever that other

Speaker:

copy is, it's protected by security.

Speaker:

But the whole point of it is to have it outside the normal security.

Speaker:

So, uh, there, there's a, there's a, um, a balance that you need to have there.

Speaker:

Right.

Speaker:

Um, what about communications during the DR tests?

Speaker:

Um, we need to keep everyone.

Speaker:

Abreast of what's going on.

Speaker:

You wanna talk about that a little bit?

Speaker:

Yeah, so you wanna make sure people aren't working in silos and because during a

Speaker:

DR test things are gonna be chaotic.

Speaker:

But since this is more of a controlled environment, you want to establish

Speaker:

those patterns and say, this is a normal way that we communicate.

Speaker:

It might be via phones, it might be emails.

Speaker:

You might jump into a video conference, right?

Speaker:

Whatever it is that you use, make sure that you have all the right

Speaker:

stakeholders in that session.

Speaker:

Right,

Speaker:

in order.

Speaker:

So, so then everyone knows what's going on.

Speaker:

The other thing though, uh, to mention is make sure you also have

Speaker:

alternate methods, just like what we talked about with the runbook itself.

Speaker:

Make

Speaker:

right.

Speaker:

in case your voice-over-IP phones are down in your corporate environment

Speaker:

or your chat, Slack, is down, or whatever else you're using,

Speaker:

Right.

Speaker:

Make sure you have an alternate mechanism to get in touch with people.

Speaker:

Yeah.

Speaker:

That's a real challenge.

Speaker:

Um, I mean, it, it is

Speaker:

Smoke

Speaker:

to have communication during it.

Speaker:

What'd you say?

Speaker:

signals.

Speaker:

Smoke signals. It's, it's a real challenge because we depend so much on technology

Speaker:

and I would say that that, um.

Speaker:

Again, if it's an outage, generally the outage is for you

Speaker:

and not for everything else.

Speaker:

So for example, if you're relying on Zoom, um, as your mechanism, Zoom will

Speaker:

probably be up when you have your outage.

Speaker:

You just need to make sure that everybody can get to Zoom.

Speaker:

So, um, if for example, your your, your challenge there

Speaker:

will be if you are using, um.

Speaker:

You know, a, a, a third party authentication mechanism to get into Zoom

Speaker:

and then you don't have access to that, that could be, that could be a problem.
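
That hidden-dependency problem can be checked ahead of time. A rough sketch, assuming a hypothetical inventory of channels and what each one relies on (none of these entries come from the show, and treating personal cell phones as having no corporate dependencies is an assumption):

```python
# Hypothetical communication channels and the systems each one needs to work.
channel_needs = {
    "zoom": {"internet", "sso"},
    "slack": {"internet", "sso"},
    "voip-phones": {"corporate-network"},
    "personal-cell-phones": set(),  # assumed independent of corporate systems
}

def usable_channels(needs, down):
    """Return the channels whose requirements are all still up."""
    return sorted(name for name, reqs in needs.items() if not (reqs & down))

# Scenario: single sign-on and the corporate network are both out.
print(usable_channels(channel_needs, down={"sso", "corporate-network"}))
# prints ['personal-cell-phones']
```

Running each planned outage scenario through a list like this shows quickly whether the "backup" channel quietly depends on the same thing as the primary one.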

Speaker:

So these are the things you wanna make sure, you wanna be able to make sure that

Speaker:

you can communicate during the outage.

Speaker:

Um, and I can definitely think of a, you know, of a multi-headed zoom call where

Speaker:

everybody's just sort of keeping everybody abreast of what's going on, right.

Speaker:

Um, and we wanna make sure that the stakeholders are aware of everything

Speaker:

that's going on, as well as the people that are executing the, um,

Speaker:

that are ex executing the test.

Speaker:

Um, and then what about, um, I, I, I think, by the way, the Zoom call,

Speaker:

I think is the best way to have, or something like a Zoom call to have

Speaker:

coordination between the teams if there are multiple teams that are happening.

Speaker:

You don't necessarily have to have everybody who's

Speaker:

doing something with the DR,

Speaker:

uh,

Speaker:

to, to be on the Zoom call, but the purpose of the Zoom call, I

Speaker:

think is probably to keep, keep all of the different teams aware

Speaker:

of what the other teams are doing.

Speaker:

Right?

Speaker:

It's almost like a war room, if you will.

Speaker:

Right?

Speaker:

exactly.

Speaker:

The big, again, the bigger the test, the bigger it is, the bigger

Speaker:

the need is to have, uh, some type of communication like this.

Speaker:

Right?

Speaker:

Uh, and then you've also got escalation procedures.

Speaker:

What happens if something doesn't go right?

Speaker:

Who do we call?

Speaker:

Um, yeah.

Speaker:

You could throw a monkey wrench in things and be like, someone who

Speaker:

normally is part of the DR test,

Speaker:

Right?

Speaker:

Or would be responsible for something.

Speaker:

You could be like, that person is home sick with the flu

Speaker:

and cannot be in the office.

Speaker:

Now what do you do?

Speaker:

Yeah.

Speaker:

Um, yeah.

Speaker:

If your DR

Speaker:

test says, you know, "call Steve,"

Speaker:

Um, this, this is the, the, you know, the more you have something like that,

Speaker:

the bigger that, that, that kind of thing is gonna be a problem, right?

Speaker:

you bring this up.

Speaker:

So I was reading The Register this morning

Speaker:

Mm-hmm.

Speaker:

there was a call in or a write in from a, a reader,

Speaker:

Mm-hmm.

Speaker:

and they were saying that they had worked at a company

Speaker:

in IT and managed a bunch of infrastructure and they had built

Speaker:

this system to automate all of their, uh, software deployment stuff.

Speaker:

Mm-hmm.

Speaker:

Um, and then they had quit the company, but no one knew how to

Speaker:

operate it, and he had left his number, it's in the closet, was a machine.

Speaker:

He had left his number.

Speaker:

It said, do not reboot, call Steve or whatever his name was.

Speaker:

And he got the call and this was like 20 years later,

Speaker:

Wow.

Speaker:

he got a call and he was like, I don't remember the password.

Speaker:

I'm sorry.

Speaker:

You gotta figure it out on your own.

Speaker:

Wow.

Speaker:

That's crazy.

Speaker:

That's just crazy.

Speaker:

So call Steve.

Speaker:

That's funny.

Speaker:

Um, yeah, don't, don't be like that.

Speaker:

Um, so, uh, let's just say we get to the end of the test, right?

Speaker:

We've successfully recovered all of the, all of the aspects

Speaker:

if we're doing a full DR test.

Speaker:

What needs to happen is a full sort of end-to-end functional test of the

Speaker:

different parts of the business to make sure that not just that the, that a

Speaker:

system was recovered or a database was recovered, but the application and the.

Speaker:

The, the system around that application is able to function.

Speaker:

And again, this is why we go into things like phone systems, right?

Speaker:

Yeah.

Speaker:

Um, you know, if, if the, the application that we're recovering is our customer

Speaker:

call center, um, but we don't have phones, uh, great, uh, you know, all of that

Speaker:

stuff, all of that stuff has to work.

Speaker:

And you've got to do the functional end-to-end test to make sure that all

Speaker:

the parts that you are pretending

Speaker:

were damaged are now fully functional.

Speaker:

I agree to that, but I think it's also one of the things, you have to

Speaker:

be careful not to boil the ocean.

Speaker:

Yes.

Speaker:

Yeah, yeah.

Speaker:

Well, again, this is about,

Speaker:

Yeah.

Speaker:

what's that?

Speaker:

a balance.

Speaker:

Well, what I'm saying is, whatever it is, this is, I, I think what you're

Speaker:

talking about is, is more about scope,

Speaker:

Yes.

Speaker:

Because.

Speaker:

Even if we just agreed to test this one part of the application, you

Speaker:

need to do a functional test of whatever it is that you recovered.

Speaker:

E even if it's just a small part of the environment.

Speaker:

That, that's all I'm saying.

Speaker:

Yeah.

Speaker:

Right.

Speaker:

That, that, that we often focus a little bit too much time on the

Speaker:

recovery, the restore, and we say, okay, the application's restored.

Speaker:

I can walk away.

Speaker:

No, the application's restored

Speaker:

when people can do the thing, whatever

Speaker:

it is, that the application was supposed to do.

Speaker:

I was intent, I was thinking more about, be careful about thinking

Speaker:

about all the failure scenarios.

Speaker:

Like I was saying, the person gets sick with the flu, right.

Speaker:

Oh, yeah, yeah, yeah.

Speaker:

Yeah.

Speaker:

about going down that rabbit hole because you will never come back

Speaker:

out because it might be, what if the butterfly flaps its wings halfway around

Speaker:

the world and causes X, Y, Z, right?

Speaker:

So,

Speaker:

The butterfly will die.

Speaker:

right.

Speaker:

So don't get overwhelmed by these scenarios is

Speaker:

Yeah.

Speaker:

And, and speaking of not being overwhelmed when we get to the

Speaker:

post, you know, when we get to the, uh, the post game analysis, right?

Speaker:

Let's measure against the success criteria that we agreed to.
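
As an illustration of measuring against agreed criteria, here is a small sketch. It assumes the criteria were written down as a recovery time objective (RTO) per system plus a functional check; the `Result` fields and the sample numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Result:
    system: str
    rto_minutes: int      # recovery time agreed on before the test (assumed criterion)
    actual_minutes: int   # time measured during the test
    functional: bool      # did the end-to-end functional check pass?

def evaluate(results):
    """Return (system, 'met' or 'missed', minutes over or under the RTO)."""
    report = []
    for r in results:
        met = r.functional and r.actual_minutes <= r.rto_minutes
        report.append((r.system, "met" if met else "missed",
                       r.actual_minutes - r.rto_minutes))
    return report

results = [
    Result("database", rto_minutes=120, actual_minutes=90, functional=True),
    Result("call-center", rto_minutes=60, actual_minutes=45, functional=False),
    Result("file-server", rto_minutes=30, actual_minutes=50, functional=True),
]
for system, status, delta in evaluate(results):
    print(f"{system}: {status} ({delta:+d} min vs. RTO)")
```

Note that the call center counts as missed even though it came back in time, because in this sketch a system only passes if people could actually use it.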

Speaker:

Um, we, we look at the things that didn't work and the

Speaker:

bottlenecks and things like that.

Speaker:

The key, again here is to better the world, not to prove that

Speaker:

you were the best or whatever.

Speaker:

Um, I know it can be really difficult.

Speaker:

Say that again.

Speaker:

you were the worst.

Speaker:

You were the worst.

Speaker:

Yeah.

Speaker:

Um, you know, we're looking for things that we can improve.

Speaker:

We're looking for procedures that we can update based on, you know,

Speaker:

the lessons that we learned.

Speaker:

Um, any other post-game analysis

Speaker:

that you can think of?

Speaker:

I would also say.

Speaker:

If this is your first time doing this, I think it's also good

Speaker:

to say what things went well.

Speaker:

I think a lot of times we tend to focus on the negatives,

Speaker:

Right.

Speaker:

right?

Speaker:

But if this is your first time, like this is really hard.

Speaker:

This is a hard thing to do.

Speaker:

Yeah.

Speaker:

And you should acknowledge that and realize if you got through, like I

Speaker:

know Curtis, you've always talked about the bank and your DR tests, right?

Speaker:

And how I don't think you guys ever completed a hundred percent,

Speaker:

No.

Speaker:

right?

Speaker:

No.

Speaker:

Yeah, yeah.

Speaker:

So don't be too hard on yourself.

Speaker:

Congratulate yourself first off on doing the test in the first place,

Speaker:

and second, making it to the end of the test, even if everyone is dead.

Speaker:

Um, you know, and then, and then, and then, you know, yeah,

Speaker:

don't be too hard on yourself.

Speaker:

Right.

Speaker:

Uh, because these things, these things rarely do they go well, uh,

Speaker:

unless it's like fully automated.

Speaker:

And, you know, the, the more I will say, the more you can

Speaker:

automate things, the better.

Speaker:

Right?

Speaker:

Yeah.

Speaker:

So you ran the tests,

Speaker:

Mm-hmm.

Speaker:

things that went well, things that went wrong.

Speaker:

I think the next step after that is

Speaker:

identifying how you close the gaps,

Speaker:

Right.

Speaker:

And coming up with a plan, because you don't want to

Speaker:

just let these things linger,

Speaker:

Right.

Speaker:

create a plan.

Speaker:

Identify what are the most critical elements that you want to address first

Speaker:

Mm-hmm.

Speaker:

timeframes, and make sure you get buy-in across the board to fix those things.
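
A sketch of what such a plan might look like as data, assuming each gap gets an owner, a severity, and a due date (the `ActionItem` fields and the sample entries are hypothetical):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ActionItem:
    gap: str
    owner: str
    severity: int   # 1 = most critical (scale assumed for this sketch)
    due: date

def plan(items):
    """Order the list so the most critical, soonest-due gaps come first."""
    return sorted(items, key=lambda i: (i.severity, i.due))

items = [
    ActionItem("Runbook step 12 is out of date", "ops", 3, date(2025, 9, 1)),
    ActionItem("Active Directory not recoverable at DR site", "identity", 1, date(2025, 7, 15)),
    ActionItem("No alternate chat channel", "it-support", 2, date(2025, 8, 1)),
]
for item in plan(items):
    print(f"[sev {item.severity}] {item.gap} -> {item.owner}, due {item.due}")
```

Keeping owner and due date on every item is what stops the gaps from lingering until the next test.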

Speaker:

Yeah.

Speaker:

Agreed.

Speaker:

Right.

Speaker:

Um, you, you, you have a, you have an action item list and who's responsible

Speaker:

for addressing the different things, and then of course, what's the next thing?

Speaker:

You do it again.

Speaker:

Right?

Speaker:

Um.

Speaker:

When,

Speaker:

Uh, soon.

Speaker:

Right.

Speaker:

Um, I would say I'm a fan of more frequent, smaller tests

Speaker:

than like an annual huge test.

Speaker:

Right.

Speaker:

Um, I think the more often we do that, the more we get into the, the

Speaker:

mindset of thinking about the things that can go wrong, because a, a lot

Speaker:

of things are, are, you know, um.

Speaker:

They're the same in different disciplines across the, uh, the,

Speaker:

the, uh, the organization, right?

Speaker:

So the more often we test, the more often we get to a recovery mindset and

Speaker:

we start including those things in the system design from the very beginning.

Speaker:

Yeah.

Speaker:

Right?

Speaker:

Um, again, that's the other purpose.

Speaker:

I would add that to.

Speaker:

My original question, that's the other purpose of a DR test, is

Speaker:

to get people to a DR mindset,

Speaker:

Yeah.

Speaker:

um, to a recovery mindset of saying, um, we need to design the infrastructure and

Speaker:

the processes around the infrastructure so that they are easy to recover.

Speaker:

Right.

Speaker:

Yep.

Speaker:

Or at least even think about it to start with rather than, oh yeah, this failed.

Speaker:

Now what?

Speaker:

What are we gonna do with our DR?

Speaker:

Yeah.

Speaker:

And, and, and lemme just give you a, a, a silly but simple example of what happens

Speaker:

when you don't have a recovery mindset.

Speaker:

So I go back to the bank, right?

Speaker:

I have so many good stories from the days of the bank, right?

Speaker:

And when we bought a T 1000, which was, uh, an HP server, it

Speaker:

was a really big server and it had, um, it was a huge server.

Speaker:

It had a hundred gigabytes of data.

Speaker:

Ginormous, wait.

Speaker:

Let me go grab my flash drive.

Speaker:

It was a huge server for the time, and it came with a two gigabyte tape drive.

Speaker:

Right.

Speaker:

I think with compression it was like a four gigabyte tape drive

Speaker:

that, that was a system design.

Speaker:

And there, there were no changes.

Speaker:

No, we, we, we added 30%.

Speaker:

With one server, we added 30% to the capacity of the

Speaker:

data center with one server.

Speaker:

There wasn't a single discussion about what we should do from a

Speaker:

backup and recovery perspective.
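
The mismatch in that story is easy to put in numbers. A back-of-the-envelope sketch, assuming the server is full and a single full backup goes to the bundled drive (the helper name is made up for illustration):

```python
import math

def tapes_needed(data_gb, tape_gb):
    """Number of tapes for one full backup, rounding up partial tapes."""
    return math.ceil(data_gb / tape_gb)

print(tapes_needed(100, 2))  # 50 tapes at 2 GB native capacity
print(tapes_needed(100, 4))  # 25 tapes at roughly 4 GB with compression
```

Even in the best case that is 25 tape changes for one full backup, which is the kind of thing a recovery mindset would have flagged at design time.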

Speaker:

That's what happens when you don't have a recovery mindset,

Speaker:

Yeah.

Speaker:

right?

Speaker:

Is that you, you do things, you add things to the system without any thought

Speaker:

to what they would, you know, how that would impact the recovery system.

Speaker:

So that's why we want to have a recovery mindset.

Speaker:

Yep.

Speaker:

Okey dokey.

Speaker:

I think we covered everything.

Speaker:

Yeah.

Speaker:

I think so, yeah, everything you could possibly want to know about

Speaker:

DR in, uh, four episodes with the two, the two, maybe five.

Speaker:

We'll see.

Speaker:

We'll see if I can find that other episode.

Speaker:

Thanks Prasanna for, uh, you know, once again, uh, you know, great team.

Speaker:

Woo hoo.

Speaker:

Go team.

Speaker:

Go.

Speaker:

Team go and uh, I want to thank you once again to our listeners.

Speaker:

We'd be nothing without you.

Speaker:

That is a wrap.