Speaker:

You found the Backup Wrap-up, your go-to podcast for all things

Speaker:

backup recovery and cyber recovery.

Speaker:

In this episode, we're tackling one of the biggest lies in IT,

Speaker:

your recovery time objective.

Speaker:

I don't care what your RTO documentation says or what you

Speaker:

believe you've promised your bosses.

Speaker:

If you haven't tested it, you can't meet it.

Speaker:

Period. Prasanna and I break down why most organizations are living in fantasy

Speaker:

land when it comes to recovery time objective, and more importantly, what

Speaker:

you can actually do to address that gap.

Speaker:

If you've ever felt that pit in your stomach when someone

Speaker:

asks you about recovery times.

Speaker:

This is your episode.

Speaker:

Let's get real about RTO.

Speaker:

By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.

Speaker:

Backup, and I've been passionate about backup and recovery ever since

Speaker:

I had to tell my boss that there were no backups of that production

Speaker:

database that we had just lost.

Speaker:

I don't want that to happen to you, and that's why I do this.

Speaker:

On this podcast, we turn unappreciated backup admins into cyber recovery heroes.

Speaker:

This is the Backup Wrap-up.

Speaker:

Hi, and welcome to the Backup Wrap-up.

Speaker:

I'm your host, W. Curtis Preston, AKA Mr. Backup, and I have with

Speaker:

me the rarest of all beasts lately.

Speaker:

Anyway, Prasanna Malaiyandi, how's it going?

Speaker:

Prasanna, I.

Speaker:

I am good, Curtis.

Speaker:

I know it's been a

Speaker:

It has been, it's been a minute.

Speaker:

we've

Speaker:

why, that's why the listeners have been listening to like repeats.

Speaker:

Uh, yeah.

Speaker:

'cause you of course you're gonna blame it on me with I know, I know.

Speaker:

I was working the election, I was working the, uh, I, for those who don't

Speaker:

know, I'm gonna, I'm a poll worker, you know, and I'm not doing other things.

Speaker:

Site

Speaker:

Yeah.

Speaker:

I, I am a site manager of the Yeah, the Bonsall Vote Center.

Speaker:

In San Diego.

Speaker:

And so we did have our special election and so I worked for 11 days, not

Speaker:

including the setup and tear down day.

Speaker:

So I've been a little busy.

Speaker:

Okay.

Speaker:

What, how many voters?

Speaker:

Yeah, you've been a little busy and how many voters Yeah.

Speaker:

Did you have

Speaker:

Uh, we had like, like 75 over the first 10 days.

Speaker:

And then on the last day we had about 400.

Speaker:

Um.

Speaker:

Oh

Speaker:

Which was, which is, which is a lot.

Speaker:

Um, and I, you know, I love, I love, I love democracy.

Speaker:

I love people.

Speaker:

I want everybody to, uh, to, to, to, to vote.

Speaker:

Um, you know, if you don't vote, you don't get to bitch.

Speaker:

That's my,

Speaker:

But please don't wait

Speaker:

But yeah, for the love of God, look into, look into your, your site

Speaker:

most, or your state most likely has.

Speaker:

Early voting, look into early voting and early vote or vote by mail.

Speaker:

Right?

Speaker:

Um, those 400 people could have come any time in the previous

Speaker:

10 days, and we would've, they could have voted just the same.

Speaker:

Um, yeah.

Speaker:

Anyway, so please vote.

Speaker:

Um,

Speaker:

Well, welcome back.

Speaker:

Yeah.

Speaker:

So, um.

Speaker:

I wanted to, we're, we're gonna kind of, you know, we're kind of

Speaker:

redoing things after a, you know, a couple different phases here.

Speaker:

And, uh, we're gonna just try to do some, some hot topics

Speaker:

that, um, I think are important.

Speaker:

And one of them that we're gonna talk about this week is RTO.

Speaker:

And specifically which recovery time objective.

Speaker:

Of course, we're, we're gonna what?

Speaker:

What?

Speaker:

Yeah.

Speaker:

Return to office.

Speaker:

So we're gonna talk about return to office and then, um, and you know, what it is

Speaker:

and why it is fantasy for most people.

Speaker:

And then what, what they could do, um, you know, to, to address that.

Speaker:

So first off, you want to define recovery time objective.

Speaker:

Yeah, it's basically your objective, right?

Speaker:

Your goal for how long it should take you to recover from some disaster, right?

Speaker:

And get back to a good known spot.

Speaker:

This is including things like recovering your data, reconfiguring

Speaker:

your network, right? And different

Speaker:

disasters might have different recovery time objectives, so it's also important

Speaker:

to remember, like recovering a file may be a lot less in terms of RTO

Speaker:

than, say, recovering an entire data center.

Speaker:

If it, uh, something

Speaker:

Yeah, it's interesting you brought up that that's actually a hotly debated topic as

Speaker:

to whether or not RTO should ever change.

Speaker:

I agree with you that, um, the RTO is situational, situationally dependent,

Speaker:

um, and that, you know, if, if you've been attacked by ransomware.

Speaker:

For example, there's no way you're gonna meet sort of what

Speaker:

I would call a normal RTO.

Speaker:

Uh, and the same, you know, and the same with like, if it's a complete

Speaker:

disaster that wipes out your entire data center and you have to physically

Speaker:

build a building into which to put your servers or something like that,

Speaker:

that RTO should be, um, larger than, you know, we lost a single server.

Speaker:

Or like, you know, you said, we lost a single file.

Speaker:

Or if you go tell like your application admin who needs to recover data, oh, by

Speaker:

the way, it's gonna take one week or two weeks to recover your data because that's

Speaker:

the RTO you set for like site disasters.

Speaker:

They're also probably gonna be unhappy,
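
The idea that different failure scenarios carry different recovery time objectives can be sketched as a small lookup table. This is only an illustration: the scenario names and the hour values are hypothetical placeholders, and the one-week site-disaster value simply borrows the "one week or two weeks" figure mentioned above. In practice the business sets these numbers.

```python
from datetime import timedelta

# Hypothetical RTO tiers; the business, not the backup admin, sets the real values.
RTO_BY_SCENARIO = {
    "single_file": timedelta(hours=1),
    "single_server": timedelta(hours=4),
    "site_disaster": timedelta(weeks=1),
}


def rto_for(scenario: str) -> timedelta:
    """Return the agreed RTO for a scenario, failing loudly if none was agreed."""
    try:
        return RTO_BY_SCENARIO[scenario]
    except KeyError:
        raise ValueError(f"no RTO agreed for scenario: {scenario!r}") from None


# A file restore should not inherit the site-disaster objective.
print(rto_for("single_file"), "vs", rto_for("site_disaster"))
```

Failing loudly on an unknown scenario is a deliberate choice here: a missing entry means nobody has agreed on an objective yet, which is worth surfacing rather than defaulting.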

Speaker:

Yeah, absolutely.

Speaker:

wait, wait, wait.

Speaker:

That makes

Speaker:

So that's another really important thing that you brought up right there,

Speaker:

which is the, and, and this is a really important concept that goes

Speaker:

through almost everything that we teach, and that is that you, meaning

Speaker:

the backup admin, the sysadmin in charge of backups, whoever you happen

Speaker:

to be, you do not determine the RTO.

Speaker:

Right.

Speaker:

The business unit determines the RTO or whatever, whatever term is appropriate at

Speaker:

your governmental entity or NGO, right?

Speaker:

Um, that is the, the entity that determines, uh, the the

Speaker:

recovery time objective because it's based on finances, right?

Speaker:

It's based on, uh, you know, like if it's a business, it's based on

Speaker:

how much money are we going to lose.

Speaker:

While we are down, right?

Speaker:

Um, if it's a governmental organization, it's based, it,

Speaker:

it's very different, right?

Speaker:

It, it, it's more along the lines of how much damage to our organization

Speaker:

like reputationally will happen based on how long we're down.

Speaker:

And also how much more difficult will it be to redo the things that

Speaker:

we, you know, to, to do the things we had to do while we were down.

Speaker:

Uh, you might have to switch to, you know, to paper in the meantime.

Speaker:

And, uh, so you, but, but the point is, all of these calculations are

Speaker:

things that the, the business or management should be doing, not

Speaker:

those, uh, in charge of backups.

Speaker:

Uh, what role do you think the, the, the backup people play in determining,

Speaker:

uh, recovery time objective?

Speaker:

I think it is basically to figure out, okay, based on what the business has asked

Speaker:

for, say they come back and say, okay, my RTO, recovery time objective, is one

Speaker:

day.

Speaker:

Based on that, here are some options that we can do technology wise,

Speaker:

and I think their goal is to come back and say, okay, here's how much

Speaker:

it will cost you if you want to support that recovery time objective

Speaker:

Yeah.

Speaker:

And, and you know, in the very beginning this is gonna be ballpark numbers, right?

Speaker:

Um, well the first thing I would say is you come back and you go, okay,

Speaker:

you've asked for A, we do B, right?

Speaker:

We do A times four.

Speaker:

Um, right.

Speaker:

So, well, let let me ask you this.

Speaker:

Why do you think, uh, I have, I have my opinions, uh, I'm

Speaker:

curious, why do you think.

Speaker:

Most organizations, if they have an RTO or even if it's poorly documented,

Speaker:

et cetera, et cetera, et cetera.

Speaker:

If we have an agreed-upon RTO, why are most organizations

Speaker:

completely unable to meet that RTO?

Speaker:

Well, the biggest thing is they probably haven't tested.

Speaker:

To understand like is it actually like, and that's why I said like when you

Speaker:

asked the definition of RTO, right?

Speaker:

It's your desire, it's your objective.

Speaker:

It doesn't mean what you will

Speaker:

actually hit because there are so many other things involved.

Speaker:

Like we talked about.

Speaker:

Maybe part of your RTO is just bringing back the data or the

Speaker:

application, but then what about.

Speaker:

Like making sure I'm able to procure the servers to recover or get those

Speaker:

up and running.

Speaker:

Uh,

Speaker:

maybe I,

Speaker:

need to bring up Active Directory or Entra or whatever it's called

Speaker:

now, whatever Microsoft calls

Speaker:

it's, it was rebranded while we were on this recording.

Speaker:

Yeah.

Speaker:

right?

Speaker:

But all of these other things, which maybe you don't necessarily have control

Speaker:

of, and maybe you're only thinking as a backup admin of, Hey, I need to recover

Speaker:

the application or just the data, restore

Speaker:

Well, and also in addition, and again, let, let's make sure that

Speaker:

we, we, we say that the recovery time objective has been met or not

Speaker:

met when you, when the application.

Speaker:

Is fully up and running and available for use by the user, right?

Speaker:

It's not, oh, well I did my restore.

Speaker:

We got a four-hour RTO. I, I did my restore and it only took four hours.

Speaker:

No, the question is, is the application back up and running?

Speaker:

And that includes it, like I said, any hardware, uh, procurement, which, which

Speaker:

hopefully you're doing in the cloud.

Speaker:

But any hardware procurement, any.

Speaker:

Stuff you gotta do.

Speaker:

And if we're talking about ransomware, all of the, the, the stuff you gotta

Speaker:

do to make sure that the, the server is ready to, uh, be restored and the

Speaker:

restore, depending on what type of thing you're recovering from, the

Speaker:

actual restore may be the smallest part of the, uh, recovery time.

Speaker:

Actual is the term that we use.

Speaker:

Based on your sort of consulting,

Speaker:

Yeah.

Speaker:

Mm-hmm.

Speaker:

this right, what would you estimate is that ratio between time to

Speaker:

actually restore the data or the application versus the end-to-end?

Speaker:

RTO

Speaker:

and I?

Speaker:

I'm

Speaker:

Yeah.

Speaker:

If, if we're not talking about ransomware, it's like 80/20.

Speaker:

Right.

Speaker:

Uh, meaning 80% of the time spent doing the, the restore, the other 20%

Speaker:

in a modern day scenario where we're probably, uh, gonna do this in the

Speaker:

cloud so that we can, you know, snap our fingers and we have the hardware that

Speaker:

we need, uh, we're doing the restore.

Speaker:

It's well, you know, well tested, uh, although often it is not right.

Speaker:

And then there's some amount of time to do some initial functionality

Speaker:

testing to make sure that all the dependencies have been met.

Speaker:

And then, um, you know, and then we're, we're ready to roll.

Speaker:

Right?

Speaker:

So I'd say it's like 80/20. In a, in a ransomware scenario,

Speaker:

it's like, you know, 10/90,

Speaker:

Yeah,

Speaker:

right?

Speaker:

yeah,

Speaker:

gonna spend most of your time making sure that you're recovering

Speaker:

to a, uh, pristine environment.
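
The 80/20 and 10/90 rules of thumb above translate into simple arithmetic: if the restore is only a share of the end-to-end recovery, the total is the restore time divided by that share. The sketch below assumes a hypothetical four-hour measured restore; the function name and that figure are illustrative, and the ratios are the rough estimates given in the conversation, not measured data.

```python
def end_to_end_hours(restore_hours: float, restore_percent: int) -> float:
    """Estimate total recovery time when the restore is restore_percent of it."""
    if not 0 < restore_percent <= 100:
        raise ValueError("restore_percent must be between 1 and 100")
    return restore_hours * 100 / restore_percent


restore_hours = 4.0  # hypothetical measured restore time

# Non-ransomware: restore is roughly 80% of the total.
print(end_to_end_hours(restore_hours, 80))  # 5.0
# Ransomware: restore is roughly 10% of the total.
print(end_to_end_hours(restore_hours, 10))  # 40.0
```

Under these assumptions the same four-hour restore fits inside a five-hour recovery in the ordinary case and stretches to about forty hours in the ransomware case.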

Speaker:

yeah.

Speaker:

And I think that's one thing you just touched upon in your previous statement,

Speaker:

which was like, sometimes things change.

Speaker:

And we are talking and sort of like understanding why do most people's

Speaker:

RTOs not meet what is expected?

Speaker:

Do you wanna touch on some

Speaker:

Well,

Speaker:

I know

Speaker:

yeah, so I'm gonna say the number one reason is that they simply don't have a

Speaker:

backup or disaster recovery system that is capable of meeting that RTO just period.

Speaker:

Hmm.

Speaker:

They didn't do, uh, and, and this is.

Speaker:

This is quite possibly they, I, I remember when, when I, you know,

Speaker:

back, go back 30 years, right?

Speaker:

That, that we knew we abs everyone knew that our backup system wasn't

Speaker:

anywhere near capable of meeting the RTOs that we had discussed.

Speaker:

Even though we didn't use that term back then, I, I'm sure the term was

Speaker:

available, but I didn't use it and the.

Speaker:

We just knew that it, it just wasn't, wasn't possible.

Speaker:

Right.

Speaker:

Um, I mean, in some cases it was laughably impossible, right?

Speaker:

Um, and that we had servers that it took us a week.

Speaker:

It took, took us a week to get a full backup.

Speaker:

Okay.

Speaker:

Like, how are we gonna meet a four-hour R-P, RTO if it takes

Speaker:

a week to do a full backup?

Speaker:

Right?

Speaker:

Uh, and by the way, our next episode we're gonna talk about

Speaker:

RPO recovery point objective.

Speaker:

So it's a, it's very much a sister episode to this, but that, I'd say

Speaker:

that's the number one reason is that people's backup systems, and, and again,

Speaker:

I used the term backup very broadly.

Speaker:

Anything that.

Speaker:

That brings the server back to the way it looked, you know, before the

Speaker:

disaster is a backup system to me.

Speaker:

There.

Speaker:

Well, you just use 'em for different purposes, disaster recovery, et cetera.

Speaker:

Um, but that's the number one reason.

Speaker:

The other reason that's very closely related to that is

Speaker:

that they have no idea, right?

Speaker:

They, they haven't tested right.

Speaker:

They, they've got a system, they've got a clue.

Speaker:

Right.

Speaker:

And they're like, oh, well it takes us, you know, three hours

Speaker:

to, or four hours to back up.

Speaker:

Therefore, we should be able to do a four hour restore.

Speaker:

There's a lot of ifs in that, right?

Speaker:

Yeah.

Speaker:

The, the other thing is, as you know, restores are often a

Speaker:

lot slower than backup, right?

Speaker:

For a number of reasons that, you know, are all over the

Speaker:

place that, that they, they

Speaker:

the

Speaker:

ahead.

Speaker:

is like incremental

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Forever incremental.

Speaker:

Right.

Speaker:

That I would probably say is the biggest

Speaker:

Yeah.

Speaker:

That you're, that you're piecing together a restore from many, many, many stuff.

Speaker:

Uh, I, I think if you're, if you, if you have a proper design that alone

Speaker:

shouldn't, um, you know, impact you.

Speaker:

If you are doing a, a, you know, sort of the old school full restore, followed

Speaker:

by each incremental restore, and that means you're actually restoring some

Speaker:

files multiple times, then Absolutely.

Speaker:

Right.

Speaker:

If you're doing, if you, if you have a, if you have a, a system that is

Speaker:

properly de uh, developed, right?

Speaker:

That fixes that issue where if we know a file has changed, then we're not

Speaker:

gonna restore that file multiple times.

Speaker:

We're just gonna restore the latest version of the file.

Speaker:

If you have that, that's not really the problem.
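
The difference between replaying every incremental and restoring only the latest version of each file can be sketched as follows. This is a toy model under stated assumptions: each backup is a dictionary mapping a file path to a version number, the file names are hypothetical, and deletions are ignored. It is not how any particular backup product implements it.

```python
# Each backup maps file path -> version captured. All names are hypothetical.
full = {"a.db": 1, "b.log": 1, "c.cfg": 1}
incrementals = [
    {"a.db": 2},
    {"a.db": 3, "b.log": 2},
]


def naive_restore_writes(full, incrementals):
    """Old-school replay: restore the full, then every incremental in order."""
    return len(full) + sum(len(inc) for inc in incrementals)


def latest_version_plan(full, incrementals):
    """Pick only the newest version of each file, so each file is written once."""
    plan = dict(full)
    for inc in incrementals:  # oldest to newest; later versions win
        plan.update(inc)
    return plan


plan = latest_version_plan(full, incrementals)
print("naive file writes:", naive_restore_writes(full, incrementals))  # 6
print("latest-version file writes:", len(plan))  # 3
```

In the naive replay, `a.db` is written three times; in the latest-version plan it is written once, which is the saving described above.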

Speaker:

But you do have the issue of dedupe, right?

Speaker:

You have the, the dedupe tax that quite often really rears its ugly

Speaker:

head when we go to do a restore.

Speaker:

Now why would that be?

Speaker:

Why would that be the case Prasanna?

Speaker:

Because when you're deduplicating data, you're throwing away a whole

Speaker:

Mm-hmm.

Speaker:

But the problem is when you need to read it, you're basically doing random reads

Speaker:

across the entire system in order to be able to recreate that single file.

Speaker:

Because you might have old blocks from one part of the disk and a

Speaker:

different part of the file from a different part of the disk.

Speaker:

And so you end up with all these random reads, which as we know, our disk

Speaker:

drives are not very good at doing random

Speaker:

Yeah, it, it's the, it's the ultimate fragmented file system, right?

Speaker:

Uh, you are just, you're absolutely guaranteeing that everything

Speaker:

you need is everywhere, right?

Speaker:

Um, and, um, the, that, that is absolutely one of the cases.

Speaker:

And, and if we're coming from tape.

Speaker:

Right.

Speaker:

Uh, which is probably less likely for most people.

Speaker:

But if we're coming from tape, then we really do start talking about that,

Speaker:

the, the, the forever incremental stuff and, uh, you know, because you're having

Speaker:

to load all these tapes, uh, but also there's a network can get in the way.

Speaker:

There's also, depending on what.

Speaker:

RAID, uh, we're using, right?

Speaker:

If we're using RAID and, and we're using RAID, right?

Speaker:

Everybody's using RAID.

Speaker:

Yeah,

Speaker:

depending on whether or not you opted for, does anybody

Speaker:

opt for RAID 10 these days?

Speaker:

I don't know.

Speaker:

I

Speaker:

I don't think so.

Speaker:

I think everybody does RAID 6, right?

Speaker:

Or, or, or something.

Speaker:

So RAID, dual parity or whatever, and that has a write penalty, right?

Speaker:

So for a number of reasons, restores are often slower than

Speaker:

backup and you will never know.

Speaker:

Until you do what Prasanna.

Speaker:

You

Speaker:

Exactly.

Speaker:

And again, go ahead.

Speaker:

Well, I'm gonna bring, I'm gonna bring out a story.

Speaker:

Sorry.

Speaker:

Uh, going,

Speaker:

okay.

Speaker:

going back to going back to my first, you know, the first time that things were

Speaker:

really, really bad was that time when we had a new backup system and we had

Speaker:

used a compression feature on, on the way in, and it was software compression.

Speaker:

And long story short, when we went to, uh, there, there was a. There

Speaker:

was the, um, we went to do the first major restore after we needed it.

Speaker:

Right?

Speaker:

We, we didn't test restores, we only tested backups.

Speaker:

And uh, when we went to do it, uh, the, it was a DDS, and it was like,

Speaker:

blink, blink, long pause, right?

Speaker:

Blink.

Speaker:

Blink.

Speaker:

And once we called into support and they were like, yeah, it's working as designed.

Speaker:

And basically we had not.

Speaker:

Tested this at all.

Speaker:

And not only was it slow, it wouldn't work.

Speaker:

It, it was just literally without going, without taking too much time,

Speaker:

it just literally wouldn't work.

Speaker:

Right.

Speaker:

And, uh, unless we like tripled the size of RAM or something.

Speaker:

Right.

Speaker:

And, um, so yeah, you, you just do not know how your system is going to

Speaker:

perform until you go to do a restore.

Speaker:

Yeah.

Speaker:

And.

Speaker:

One thing similar to that story is you should also do a realistic restore test.

Speaker:

Don't just be like, oh, I'm gonna just restore a file, or

Speaker:

I'm just gonna restore a VM.

Speaker:

I'm good

Speaker:

Yeah.

Speaker:

Right?

Speaker:

Because that may not be a realistic scenario for when you have to

Speaker:

recover a full application suite or your entire environment.

Speaker:

So make sure you're doing the right little type of

Speaker:

Yeah, absolutely.

Speaker:

It should,

Speaker:

any

Speaker:

it should be whatever the, whatever the thing is that we're

Speaker:

setting the RTO for, right?

Speaker:

Uh, you don't have to restore the entire environment, but you need to do

Speaker:

representative restore tests, right?

Speaker:

Um, entire servers, entire environments, entire recovery groups.

Speaker:

What's a recovery group Prasanna?

Speaker:

It is a group of things you need to restore in order for

Speaker:

your application to come back.

Speaker:

So it might be your database server plus your storage, plus your Active

Speaker:

directory or uh, system, right?

Speaker:

Plus whatever else is needed in order to get that production application back

Speaker:

exactly.

Speaker:

And so, and, and so that becomes important too.

Speaker:

I was actually gonna comment on that, because there's an order

Speaker:

of operations you have to do, and so you have to account for that.

Speaker:

When you calculate your RTO, it's not like, oh, I can just restore my, uh,

Speaker:

database, have it up and running before I have Active Directory up and running.

Speaker:

It's not
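
The order of operations inside a recovery group is a dependency-ordering problem, which can be sketched with a topological sort. The group below is a hypothetical example built from the components named in the conversation (database, storage, Active Directory) plus an assumed network and application layer; the dependency edges are illustrative, not a prescription.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical recovery group: each item maps to the things it depends on.
recovery_group = {
    "network": set(),
    "active_directory": {"network"},
    "storage": {"network"},
    "database": {"active_directory", "storage"},
    "application": {"database"},
}

# static_order() yields every item only after all of its dependencies.
restore_order = list(TopologicalSorter(recovery_group).static_order())
print(restore_order)
```

Whatever order comes out, the database always appears after Active Directory and storage, so the time for those earlier steps has to be counted when estimating whether the RTO can be met.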

Speaker:

Right.

Speaker:

And, and by the way, that one, one of the things that prompted this episode,

Speaker:

uh, Kaseya did a 2025 state of the backup industry, uh, and they said that more

Speaker:

than 60% of respondents believed that they could recover under, in, under a day.

Speaker:

However, only 35% could actually do that in reality, which is, that's

Speaker:

quite, that's quite a, a gap there.

Speaker:

Um.

Speaker:

Yeah.

Speaker:

Another interesting thing was that only 10% of businesses reported

Speaker:

no outages in the last 12 months.

Speaker:

Uh, which means that 90% tested their backup systems the hard way.

Speaker:

Uh, not quite as hard as, uh, our Alaskan friend, but, uh, which for those of

Speaker:

you that haven't heard that episode, uh, he tested his DR system by deleting

Speaker:

the entire server for the entire

Speaker:

Data center and then restoring, and that was his first test.

Speaker:

And it was like, gee, I hope it works.

Speaker:

Don't do it like that.

Speaker:

Um,

Speaker:

but hey, it worked out

Speaker:

yeah, exactly right.

Speaker:

Um, and remember, again, going back to the things that fit into the RTO, right?

Speaker:

You know, you, you also have to, to include things like

Speaker:

detecting that there's a problem.

Speaker:

Right, because the RTO clock starts the moment the outage happens, not

Speaker:

the moment the restore happens, right?

Speaker:

So, uh, the moment you have the outage and then you're like, what's going on?

Speaker:

Right?

Speaker:

Because so many times the, the symptom that gets your attention has nothing to

Speaker:

do with the thing that actually went bad.

Speaker:

Right.

Speaker:

Uh, I mean, it does have something to do with it, but it's not

Speaker:

the thing that went bad, right?

Speaker:

So you gotta figure that out.

Speaker:

You gotta understand how bad it is if it's a ransomware attack.

Speaker:

Again, you gotta figure out, you know, how bad this, you know, how big the scope is.

Speaker:

You might have to get approvals, um, you know, all these different things, right?
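
Because the clock starts at the outage rather than at the restore, the recovery time actual is the sum of every phase, not just the restore. The sketch below adds up hypothetical durations for one incident; only the phase names come from the discussion, and every number, including the four-hour objective, is an assumption for illustration.

```python
from datetime import timedelta

# Hypothetical durations for one incident; only the phase names come from the discussion.
phases = {
    "detect the problem": timedelta(minutes=45),
    "diagnose and scope": timedelta(hours=2),
    "get approvals": timedelta(minutes=30),
    "restore": timedelta(hours=4),
    "verify application is usable": timedelta(hours=1),
}

rto = timedelta(hours=4)  # hypothetical agreed objective

# Sum all phases, starting from a zero timedelta.
recovery_time_actual = sum(phases.values(), timedelta())

print("recovery time actual:", recovery_time_actual)
print("met RTO:", recovery_time_actual <= rto)
```

With these made-up numbers the restore alone exactly matches the four-hour objective, yet the end-to-end recovery takes more than twice as long and misses it.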

Speaker:

Yeah.

Speaker:

And well, just one thing to add to that, because I was thinking

Speaker:

about the, uh, what was the.

Speaker:

Company,

Speaker:

Rackspace.

Speaker:

with their Hosted Exchange.

Speaker:

Right.

Speaker:

I think one of the things to also consider.

Speaker:

Uh, when you're thinking about RTO is, in order to bring my app back up and

Speaker:

running, do I need to restore all my data?

Speaker:

As an example,

Speaker:

Maybe I only need a subset of my data in order for my application to come up

Speaker:

and I can slowly backfill all my old data that's archived or other things

Speaker:

like that, I can still get people up and running and ready to go without

Speaker:

waiting for everything to be done.

Speaker:

And so there might also be slight nuances depending on the application

Speaker:

of what the expectations are.

Speaker:

Yeah.

Speaker:

The other thing I would say regarding that Rackspace outage, if you're in

Speaker:

the middle of your recovery or you're about to begin your recovery, don't

Speaker:

change all the rules, right? In, in their case, they're like, we tested how to

Speaker:

do this recovery, but you know, just before they went to do the recovery,

Speaker:

they're like, ah, what if we just move everything over to Microsoft 365?

Speaker:

Right?

Speaker:

And it's like, oh, well that would mean that we have to like.

Speaker:

Basically you, you can't, you can no longer restore the Exchange

Speaker:

databases directly into the user.

Speaker:

Uh, you have to, um, you'll have to restore it and then migrate

Speaker:

the data over individually, which is a much bigger process.

Speaker:

Much, it's gonna just take much, much longer, and it ended up taking months.

Speaker:

You may recall, and there was a, uh, some lawsuits regarding that.

Speaker:

So make sure that whatever scenario in which you do, uh, you do the testing,

Speaker:

you, you, you have to do the testing.

Speaker:

So, um,

Speaker:

Yep.

Speaker:

I know we talked earlier about, okay, that 24 hour RTO for some businesses,

Speaker:

but there are some industries, right?

Speaker:

Where even like seconds make a big difference, right?

Speaker:

Yeah, definitely.

Speaker:

Yeah, definitely like financial trading firms, banking organizations, the more you

Speaker:

can attach a real number when you can say one hour of downtime costs us this much.

Speaker:

If, if you can do that, if the business can do that.

Speaker:

One, $1 billion.

Speaker:

Um, yeah, I'm sorry, I, I gotta do the, the pinky, right?

Speaker:

Um, if you can do that, the more you can do that, the, the, the, the

Speaker:

much more equipped you will be as a, you know, backup and DR person

Speaker:

to be able to make enhancements to the backup and recovery system if needed.

Speaker:

Right.

Speaker:

So let's talk about, uh, some of the things that you

Speaker:

could do to close this gap.

Speaker:

Obviously, the first, the first thing is if you can have an

Speaker:

iterative discussion on, uh, okay.

Speaker:

You said you want one minute, we can do 10 hours.

Speaker:

Right.

Speaker:

Let's figure out, you know, let's get the, let's get the RTO set to somewhere near.

Speaker:

Um, you know, uh, realistic that we, that we can actually meet, right?

Speaker:

And you can, you can say, we're gonna set the RTO for now at this.

Speaker:

We're gonna move towards, uh, a better RTO at a later, a later time.

Speaker:

Um, any thoughts there?

Speaker:

Yeah, no, I think that makes sense because it also takes time to implement

Speaker:

new technologies because if, say for instance, your RTO is 10 hours based

Speaker:

on your existing infrastructure, and now they're like, oh, we need

Speaker:

an hour or 10 minutes, right?

Speaker:

You're now going to need to think of something very different that's

Speaker:

gonna elongate the time it takes, and so it really is important to

Speaker:

ask the question, do you need.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

The help, just,

Speaker:

start with

Speaker:

yeah, everybody's gonna say zero and zero for your RTO and RPO, right?

Speaker:

So it is just, you gotta justify it and you gotta say, well, if it's

Speaker:

really worth $10 million every minute.

Speaker:

Then you need to give us, you know, whatever the number is.
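
The justification argument above is a cost-of-downtime calculation. The sketch below uses the deliberately exaggerated figures from the conversation ($10 million per minute, a one-minute request against a ten-hour capability) purely as illustration; the function and variable names are hypothetical.

```python
def downtime_cost(cost_per_minute: int, outage_minutes: int) -> int:
    """Money lost while down: cost rate multiplied by outage duration."""
    return cost_per_minute * outage_minutes


COST_PER_MINUTE = 10_000_000  # the "$10 million every minute" figure from the discussion
current_minutes = 10 * 60     # "we can do 10 hours"
requested_minutes = 1         # "you want one minute"

# Exposure the business carries by staying at the current capability.
exposure_gap = downtime_cost(COST_PER_MINUTE, current_minutes) - downtime_cost(
    COST_PER_MINUTE, requested_minutes
)
print(f"extra exposure per outage: ${exposure_gap:,}")
```

If the business genuinely stands behind the per-minute figure, this difference is the number to weigh against the price of a system that can meet the tighter objective.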

Speaker:

Right.

Speaker:

Um, so then if we're gonna do testing, if we can automate that testing, the more we

Speaker:

can automate testing, the more the, you know, the better that things are gonna be.

Speaker:

Doing it very regularly, doing it small, it's sort of like, uh, the

Speaker:

same as opinions that I have on testing your, it's kind of like in

Speaker:

cybersecurity where you have a company that actively tries to send phishing.

Speaker:

You know, phishing tests to the users to see if, um, to

Speaker:

see if they fall for it, right?

Speaker:

Same thing here, where over there, more frequent, smaller bite-sized testing is

Speaker:

preferred to the once a year I have to do this and it takes two hours, right?

Speaker:

Keeping it on the mind, keeping a recovery mindset is really important.

Speaker:

So I think regular DR drills are part of that.

Speaker:

And I think having the regular DR drills is important because if something

Speaker:

changes in your environment, you

Speaker:

Yeah,

Speaker:

rather than sort of that

Speaker:

exactly.

Speaker:

And then there's also the concept of chaos engineering, um, whi, which, you

Speaker:

know, like the chaos monkey, right?

Speaker:

You wanna talk about that?

Speaker:

Yep.

Speaker:

Yep, yep, yep.

Speaker:

Yeah.

Speaker:

So.

Speaker:

just try breaking things in your environment, see what happens and

Speaker:

see did I miss something that I wasn't backing up as an example.

Speaker:

Maybe you forgot to back up Active Directory, and something

Speaker:

happened to it, you lost all the data there and you realize, oh, I can't recover

Speaker:

my application because I don't actually have a backup of Active Directory.

Speaker:

And so you start to understand the dependencies in your

Speaker:

environment and point out sort of

Speaker:

issues that you might not foresee, different failure scenarios, or like if

Speaker:

the network goes down. Or, the other thing is, it's not even just technology, right?

Speaker:

It might be even a person.

Speaker:

I know Curtis, you used to talk about at the bank doing

Speaker:

testing with someone who did not

Speaker:

Yeah, exactly.

Speaker:

Exactly.

Speaker:

Uh, well back then, I didn't write the book, but Yeah.

Speaker:

Yeah.

Speaker:

I wa I wasn't Mr. Backup yet.

Speaker:

I wa I was, I was Mr. Backup junior.

Speaker:

Um, and, and then, you know, the idea is if, if you do this, the whole, the

Speaker:

whole idea of doing this on a frequent basis to get better at it, to create

Speaker:

and, and, and improve your runbooks, to create and improve decision trees.

Speaker:

What do we do when this happens?

Speaker:

Right?

Speaker:

Um, we also, we didn't talk, uh, at all about, um, uh, tabletop exercises.

Speaker:

Those are, uh, obviously a great, uh, uh, tool here.

Speaker:

Uh, you know, do 'em at lunch.

Speaker:

Do 'em so that they're not like, so again, do them frequently in smaller.

Speaker:

We're not where the whole world isn't, you know, don't do 'em like just before

Speaker:

your, your performance review time.

Speaker:

Which makes them like much more stressful.

Speaker:

Do them frequently and, and, and have fun at it.

Speaker:

And, and then learn from it and improve your runbooks,

Speaker:

improve your decision trees.

Speaker:

Cross train your teams.

Speaker:

Do what we were talking about before.

Speaker:

Don't use the, don't rely on one person.

Speaker:

Uh, you know, you know, because that one person might not be available.

Speaker:

Uh.

Speaker:

You know, uh, at that time, right?

Speaker:

And then measure, again, measure and report reality.

Speaker:

Here's where we are.

Speaker:

Make sure that everyone is on the same page.

Speaker:

We've asked for this, we've agreed to this for now, we would like to get to here.

Speaker:

Here's where we are.

Speaker:

Report those gaps.

Speaker:

Uh, and then let, let business leadership decide.

Speaker:

What to do about that.

Speaker:

It is not your responsibility.

Speaker:

Right.

Speaker:

I do remember like

Speaker:

Yep.

Speaker:

bad because the backup system wasn't capable, but it's like, I'm not magic.

Speaker:

All I can do is make recommendations.

Speaker:

Right.

Speaker:

And I do remember, by the way, when I had a shell

Speaker:

script that was doing everything, right?

Speaker:

We had, I dunno, like 50 servers, and I was doing all this with, like, a Unix,

Speaker:

you know, shell script.

Speaker:

Right.

Speaker:

And at some point I couldn't. And all of it was based on the assumption that each

Speaker:

server could fit on a tape.

Speaker:

And then one day we bought a server that didn't fit on

Speaker:

50 tape drives, right?

Speaker:

On 50 tapes.

Speaker:

And that, and other servers that weren't

Speaker:

quite that bad, it just broke.

Speaker:

It broke my ability to do it right.

Speaker:

And I said, I'm just not, I can't do that.

Speaker:

And then I just went to the boss and I said, hey, I can't do this.

Speaker:

And she said, well, aren't there like commercial products that do this,

Speaker:

that we can like spend money on?

Speaker:

Oh, because you flipping

Speaker:

I was flipping out about what I was doing,

Speaker:

and I remember feeling like a failure because I couldn't fix this.

Speaker:

Right.

Speaker:

I wasn't that good at scripting.

Speaker:

I don't think anybody could deal with the 50, you know, right?

Speaker:

But I remember at the time feeling like a failure, and I guess I'm

Speaker:

saying try not to feel that way.

Speaker:

Right?

Speaker:

Go and give an honest assessment of where you're at.

Speaker:

And, um, even if you're the one that put yourself in that scenario, right?

Speaker:

Um, I remember another story of a guy that told me that he bought a

Speaker:

particular vendor's dedupe product that had a 90% dedupe tax, meaning that the restore

Speaker:

speed was 10% of the backup speed.

Speaker:

And he's like, I don't know what to do.

Speaker:

I'm like, well, you have to tell your boss.

Speaker:

And he's like, I'm the one that recommended the system.

Speaker:

It's okay.

Speaker:

You gotta be honest.

Speaker:

Because you can't get there from here

Speaker:

if you don't address that.

Speaker:

Right?
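
To put rough numbers on the dedupe tax described above, here is a minimal sketch. The function name and the sample figures (20 TB of data, 5 TB per hour backup throughput) are illustrative assumptions, not numbers from the story; only the 90% tax comes from the conversation.

```python
def restore_hours(data_tb: float, backup_tb_per_hour: float, dedupe_tax: float) -> float:
    """Estimate restore time when restores run slower than backups.

    dedupe_tax is the fraction of backup speed lost on restore,
    so a 0.90 tax means restores run at 10% of backup speed.
    """
    if not 0 <= dedupe_tax < 1:
        raise ValueError("dedupe_tax must be at least 0 and less than 1")
    restore_tb_per_hour = backup_tb_per_hour * (1 - dedupe_tax)
    return data_tb / restore_tb_per_hour


if __name__ == "__main__":
    # Hypothetical system: 20 TB backed up at 5 TB/hour.
    backup_time = restore_hours(20, 5, 0.0)   # no tax: same speed as the backup
    restore_time = restore_hours(20, 5, 0.90)  # 90% dedupe tax
    print(f"Backup takes about {backup_time:.1f} hours")
    print(f"Restore takes about {restore_time:.1f} hours")
```

With those assumed numbers, a backup that finishes in about 4 hours turns into a restore of roughly 40 hours, which is the kind of gap that has to be reported rather than hidden.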

Speaker:

Yeah, so one question I wanted

Speaker:

Yeah.

Speaker:

Curtis.

Speaker:

There is a term though, right?

Speaker:

So you

Speaker:

Mm-hmm.

Speaker:

and then you have like, okay, you're doing these tests, you

Speaker:

actually figure out like, okay,

Speaker:

Yeah.

Speaker:

long it takes.

Speaker:

There's a term for that though, and I don't think it's

Speaker:

No, it's nowhere near.

Speaker:

Yeah,

Speaker:

it's

Speaker:

nowhere near as, as...

Speaker:

Yeah, thanks.

Speaker:

Nowhere near as widely used as RTO.

Speaker:

Right?

Speaker:

And it's RTA: recovery time actual.

Speaker:

Some people say recovery time reality. It doesn't matter.

Speaker:

Just have a different term.

Speaker:

Don't say

Speaker:

"our RTO is an hour" when what you're saying is this is how fast you can

Speaker:

recover. Your RTO is your objective.

Speaker:

Your other thing, I don't care what you call it, recovery time actual is good.

Speaker:

This is where we are.

Speaker:

The difference between your recovery time actual and your recovery

Speaker:

time objective is the gap that you need to address with whatever changes in

Speaker:

process, documentation, or quite possibly enhancements to your backup system.
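
As a rough illustration of reporting that gap, here is a minimal sketch that compares a measured recovery time actual against the agreed recovery time objective for each system. The class name, field names, and sample systems are hypothetical; the only idea taken from the conversation is that the gap is RTA minus RTO.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTest:
    system: str          # hypothetical system name
    rto_minutes: float   # agreed objective
    rta_minutes: float   # measured in the most recent recovery test

    @property
    def gap_minutes(self) -> float:
        # A system that recovers faster than its objective has no gap.
        return max(0.0, self.rta_minutes - self.rto_minutes)


def gap_report(tests):
    """Return one line per system, largest gap first."""
    lines = []
    for t in sorted(tests, key=lambda t: t.gap_minutes, reverse=True):
        if t.gap_minutes == 0:
            status = "MEETS RTO"
        else:
            status = f"GAP {t.gap_minutes:.0f} min"
        lines.append(
            f"{t.system}: RTO {t.rto_minutes:.0f} min, "
            f"RTA {t.rta_minutes:.0f} min, {status}"
        )
    return lines


if __name__ == "__main__":
    # Made-up test results for illustration only.
    results = [
        RecoveryTest("file-server", rto_minutes=480, rta_minutes=300),
        RecoveryTest("billing-db", rto_minutes=60, rta_minutes=240),
    ]
    for line in gap_report(results):
        print(line)
```

Sorting by gap puts the worst mismatch at the top, which is the part business leadership needs to see first.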

Speaker:

Yeah.

Speaker:

All right.

Speaker:

I think, I think we've covered enough.

Speaker:

And then next week we're gonna cover recovery

Speaker:

point objective, which is

Speaker:

It is.

Speaker:

Yeah.

Speaker:

It's weird.

Speaker:

Like all the, yeah.

Speaker:

Uh, and this is going to be basically how much data we agree that we

Speaker:

can lose, which is something very different than how long the system is down.

Speaker:

Yeah, exactly.

Speaker:

We should, we should recover in zero minutes and we should lose zero data.

Speaker:

We all agree.

Speaker:

That would be amazing.

Speaker:

Uh, it's also not gonna happen.

Speaker:

Well, thanks for joining me again.

Speaker:

I am.

Speaker:

Enjoy these.

Speaker:

I

Speaker:

Yeah, I'm glad.

Speaker:

Yeah.

Speaker:

more.

Speaker:

I think now that you've figured out your world and I've figured out my

Speaker:

new world, we should be good.

Speaker:

And, uh, thanks to the listeners. You're why we do this.

Speaker:

That is a wrap.