You found the backup wrap up, your go-to podcast for all things
Speaker:backup recovery and cyber recovery.
Speaker:In this episode, we're tackling one of the biggest lies in IT,
Speaker:your recovery time objective.
Speaker:I don't care what your RTO documentation says or what you
Speaker:believe you've promised your bosses.
Speaker:If you haven't tested it, you can't meet it.
Speaker:Period. Prasanna and I break down why most organizations are living in fantasy
Speaker:land when it comes to recovery time objective, and more importantly, what
Speaker:you can actually do to address that gap.
Speaker:If you've ever felt that pit in your stomach when someone
Speaker:asks you about recovery times.
Speaker:This is your episode.
Speaker:Let's get real about RTO.
Speaker:By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.
Speaker:Backup, and I've been passionate about backup and recovery ever since
Speaker:I had to tell my boss that there were no backups of that production
Speaker:database that we had just lost.
Speaker:I don't want that to happen to you, and that's why I do this.
Speaker:On this podcast, we turn unappreciated backup admins into cyber recovery heroes.
Speaker:This is the backup wrap up.
Speaker:Hi, and welcome to the backup wrap up.
Speaker:I'm your host, W. Curtis Preston, AKA Mr. Backup, and I have with
Speaker:me the rarest of All Beasts lately.
Speaker:Anyway, Prasanna Malaiyandi how's it going?
Speaker:Prasanna, I.
Speaker:I am good, Curtis.
Speaker:I know it's been a
Speaker:It has been, it's been a minute.
Speaker:we've
Speaker:why, that's why the listeners have been listening to like repeats.
Speaker:Uh, yeah.
Speaker:'cause you of course you're gonna blame it on me with I know, I know.
Speaker:I was working the election, I was working the, uh, I, for those who don't
Speaker:know, I'm gonna, I'm a poll worker, you know, and I'm not doing other things.
Speaker:Site
Speaker:Yeah.
Speaker:I, I am a site manager of the Yeah, the Bonsall Vote Center.
Speaker:In San Diego.
Speaker:And so we did have our special election and so I worked for 11 days, not
Speaker:including the setup and tear down day.
Speaker:So I've been a little busy.
Speaker:Okay.
Speaker:What, how many voters?
Speaker:Yeah, you've been a little busy and how many voters Yeah.
Speaker:Did you have
Speaker:Uh, we had like, like 75 over the first 10 days.
Speaker:And then on the last day we had about 400.
Speaker:Um.
Speaker:Oh
Speaker:Which was, which is, which is a lot.
Speaker:Um, and I, you know, I love, I love, I love democracy.
Speaker:I love people.
Speaker:I want everybody to, uh, to, to, to, to vote.
Speaker:Um, you know, if you don't vote, you don't get to bitch.
Speaker:That's my,
Speaker:But please don't wait
Speaker:but yeah, for the love of God, look into, look into your, your state
Speaker:most, or your state most likely has.
Speaker:Early voting, look into early voting and early vote or vote by mail.
Speaker:Right?
Speaker:Um, those 400 people could have come any time in the previous
Speaker:10 days, and we would've, they could have voted just the same.
Speaker:Um, yeah.
Speaker:Anyway, so please vote.
Speaker:Um,
Speaker:Well, welcome back.
Speaker:Yeah.
Speaker:So, um.
Speaker:I wanted to, we're, we're gonna kind of, you know, we're kind of
Speaker:redoing things after a, you know, a couple different phases here.
Speaker:And, uh, we're gonna just try to do some, some hot topics
Speaker:that, um, I think are important.
Speaker:And one of them that we're gonna talk about this week is RTO.
Speaker:And specifically which recovery time objective.
Speaker:Of course, we're, we're gonna what?
Speaker:What?
Speaker:Yeah.
Speaker:Return to office.
Speaker:So we're gonna talk about return to office and then, um, and you know, what it is
Speaker:and why it is fantasy for most people.
Speaker:And then what, what they could do, um, you know, to, to address that.
Speaker:So first off, you want to define recovery time objective.
Speaker:Yeah, it's basically your objective, right?
Speaker:Your goal for how long it should take you to recover from some disaster, right?
Speaker:And get back to a good known spot.
Speaker:This is including things like recovering your data, reconfiguring
Speaker:your network, right, and different.
Speaker:disasters might have different recovery time objectives, so it's also important
Speaker:to remember, like recovering a file may be a lot less in terms of RTO
Speaker:than, say, recovering an entire data center.
Speaker:If it, uh, something
Speaker:Yeah, it's interesting you brought up that that's actually a hotly debated topic as
Speaker:to whether or not RTO should ever change.
Speaker:I agree with you that, um, the RTO is situational, situationally dependent,
Speaker:um, and that, you know, if, if you've been attacked by ransomware.
Speaker:For example, there's no way you're gonna meet sort of what
Speaker:I would call a normal RTO.
Speaker:Uh, and the same, you know, and the same with like, if it's a complete
Speaker:disaster that wipes out your entire data center and you have to physically
Speaker:build a building into which to put your servers or something like that,
Speaker:that RTO should be, um, larger than, you know, we lost a single server.
Speaker:Or like, you know, you said a lost a single file.
Speaker:Or if you go tell like your application admin who needs to recover data, oh, by
Speaker:the way, it's gonna take one week or two weeks to recover your data because that's
Speaker:the RTO you set for like site disasters.
Speaker:They're also probably gonna be unhappy,
Speaker:Yeah, absolutely.
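The idea that different disasters carry different objectives can be written down as a small lookup. This is only a sketch: the scenario names and durations below are illustrative assumptions, not figures from the episode, and the real values come from the business.

```python
from datetime import timedelta

# Hypothetical per-scenario objectives; the business sets the real values.
RTO_BY_SCENARIO = {
    "single_file": timedelta(hours=1),
    "single_server": timedelta(hours=4),
    "ransomware": timedelta(days=3),
    "site_disaster": timedelta(weeks=2),
}


def rto_for(scenario: str) -> timedelta:
    """Return the agreed objective for a scenario, failing loudly if none was set."""
    try:
        return RTO_BY_SCENARIO[scenario]
    except KeyError:
        raise ValueError(f"no RTO agreed for scenario {scenario!r}") from None


print(rto_for("single_file"), "vs", rto_for("site_disaster"))
```

Failing on an unknown scenario is deliberate: a missing objective is something to take back to the business, not something to guess.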
Speaker:wait, wait, wait.
Speaker:That makes
Speaker:So that's another really important thing that you brought up right there,
Speaker:which is the, and, and this is a really important concept that goes
Speaker:through almost everything that we teach, and that is that you, meaning
Speaker:the backup admin, the sysadmin in charge of backups, whoever you happen
Speaker:to be, you do not determine the RTO.
Speaker:Right.
Speaker:The business unit determines the RTO or whatever, whatever term is appropriate at
Speaker:your governmental entity or NGO, right?
Speaker:Um, that is the, the entity that determines, uh, the the
Speaker:recovery time objective because it's based on finances, right?
Speaker:It's based on, uh, you know, like if it's a business, it's based on
Speaker:how much money are we going to lose.
Speaker:While we are down, right?
Speaker:Um, if it's a governmental organization, it's based, it,
Speaker:it's very different, right?
Speaker:It, it, it's more along the lines of how much damage to our organization
Speaker:like reputationally will happen based on how long we're down.
Speaker:And also how much more difficult will it be to redo the things that
Speaker:we, you know, to, to do the things we had to do while we were down.
Speaker:Uh, you might have to switch to, you know, to paper in the meantime.
Speaker:And, uh, so you, but, but the point is, all of these calculations are
Speaker:things that the, the business or management should be doing, not
Speaker:those, uh, in charge of backups.
Speaker:Uh, what role do you think the, the, the backup people play in determining,
Speaker:uh, recovery time objective?
Speaker:I think it is basically to figure out, okay, based on what the business has asked
Speaker:for, say they come back and say, okay, my RTO, recovery time objective, is one
Speaker:day.
Speaker:Based on that, here are some options that we can do technology wise,
Speaker:and I think their goal is to come back and say, okay, here's how much
Speaker:it will cost you if you want to support that recovery time objective
Speaker:Yeah.
Speaker:And, and you know, in the very beginning this is gonna be ballpark numbers, right?
Speaker:Um, well the first thing I would say is you come back and you go, okay,
Speaker:you've asked for A, we do B, right?
Speaker:We do A times four.
Speaker:Um, right.
Speaker:So, well, let let me ask you this.
Speaker:Why do you think, uh, I have, I have my opinions, uh, I'm
Speaker:curious, why do you think.
Speaker:Most organizations, if they have an RTO or even if it's poorly documented,
Speaker:et cetera, et cetera, et cetera.
Speaker:If we have an agreed-upon RTO, why are most organizations
Speaker:completely unable to meet that RTO?
Speaker:Well, the biggest thing is they probably haven't tested.
Speaker:To understand like is it actually like, and that's why I said like when you
Speaker:asked the definition of RTO, right?
Speaker:It's your desire, it's your objective.
Speaker:It doesn't mean what you will
Speaker:actually hit because there are so many other things involved.
Speaker:Like we talked about.
Speaker:Maybe part of your RTO is just bringing back the data or the
Speaker:application, but then what about.
Speaker:Like making sure I'm able to procure the servers to recover or get those
Speaker:up and running.
Speaker:Uh,
Speaker:maybe I,
Speaker:need to bring up Active Directory or Entra or whatever it's called
Speaker:now, whatever Microsoft calls
Speaker:it's, it was rebranded while we were on this recording.
Speaker:Yeah.
Speaker:right?
Speaker:But all of these other things, which maybe you don't necessarily have control
Speaker:of, and maybe you're only thinking as a backup admin of, Hey, I need to recover
Speaker:the application or just the data, restore
Speaker:Well, and also in addition, and again, let, let's make sure that
Speaker:we, we, we say that the recovery time objective has been met or not
Speaker:met when you, when the application.
Speaker:Is fully up and running and available for use by the user, right?
Speaker:It's not, oh, well I did my restore.
Speaker:We got a four-hour RTO. I, I did my restore and it only took four hours.
Speaker:No, the question is, is the application back up and running?
Speaker:And that includes it, like I said, any hardware, uh, procurement, which, which
Speaker:hopefully you're doing in the cloud.
Speaker:But any hardware procurement, any.
Speaker:Stuff you gotta do.
Speaker:And if we're talking about ransomware, all of the, the, the stuff you gotta
Speaker:do to make sure that the, the server is ready to, uh, be restored and the
Speaker:restore, depending on what type of thing you're recovering from, the
Speaker:actual restore may be the smallest part of the, uh, recovery time
Speaker:actual, which is the term that we use.
Speaker:Based on your sort of consulting,
Speaker:Yeah.
Speaker:Mm-hmm.
Speaker:this right, what would you estimate is that ratio between time to
Speaker:actually restore the data or the application versus the end-to-end
Speaker:RTO?
Speaker:and I?
Speaker:I'm
Speaker:Yeah.
Speaker:If, if we're not talking about ransomware, it's like 80/20.
Speaker:Right.
Speaker:Uh, meaning 80% of the time spent doing the restore, the other 20%
Speaker:in a modern day scenario where we're probably, uh, gonna do this in the
Speaker:cloud so that we can, you know, snap our fingers and we have the hardware that
Speaker:we need, uh, we're doing the restore.
Speaker:It's well, you know, well tested, uh, although often it is not, right.
Speaker:And then there's some amount of time to do some initial functionality
Speaker:testing to make sure that all the dependencies have been met.
Speaker:And then, um, you know, and then we're, we're ready to roll.
Speaker:Right?
Speaker:So I'd say it's like 80/20. In a, in a ransomware scenario,
Speaker:it's like, you know, 10/90,
Speaker:Yeah,
Speaker:right?
Speaker:yeah,
Speaker:gonna spend most of your time making sure that you're recovering
Speaker:to a, uh, pristine environment.
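The 80/20 and 10/90 ratios translate into simple arithmetic: if the restore is a given share of the whole recovery, the end-to-end time is the restore time divided by that share. The sketch below assumes a four-hour restore purely for illustration; the function name and that figure are not from the episode.

```python
def end_to_end_hours(restore_hours: float, restore_share: float) -> float:
    """Estimate total recovery time when the restore is `restore_share` of it."""
    if not 0 < restore_share <= 1:
        raise ValueError("restore_share must be in (0, 1]")
    return restore_hours / restore_share


restore_hours = 4.0  # assumed duration of the data restore itself

normal = end_to_end_hours(restore_hours, 0.80)      # restore is 80% of the work
ransomware = end_to_end_hours(restore_hours, 0.10)  # restore is 10% of the work

print(f"normal outage: {normal:.1f} h, ransomware: {ransomware:.1f} h")
```

Under those assumptions the same four-hour restore sits inside roughly a five-hour recovery in the ordinary case and roughly a forty-hour recovery in the ransomware case.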
Speaker:yeah.
Speaker:And I think that's one thing you just touched upon in your previous statement,
Speaker:which was like, sometimes things change.
Speaker:And we are talking and sort of like understanding why do most people's
Speaker:RTOs not meet what is expected?
Speaker:Do you wanna touch on some
Speaker:Well,
Speaker:I know
Speaker:yeah, so I'm gonna say the number one reason is that they simply don't have a
Speaker:backup or disaster recovery system that is capable of meeting that RTO just period.
Speaker:Hmm.
Speaker:They didn't do, uh, and, and this is.
Speaker:This is quite possibly they, I, I remember when, when I, you know,
Speaker:back, go back 30 years, right?
Speaker:That, that we knew we abs everyone knew that our backup system wasn't
Speaker:anywhere near capable of meeting the RTOs that we had discussed.
Speaker:Even though we didn't use that term back then, I, I'm sure the term was
Speaker:available, but I didn't use it and the.
Speaker:We just knew that it, it just wasn't, wasn't possible.
Speaker:Right.
Speaker:Um, I mean, in some cases it was laughably impossible, right?
Speaker:Um, and that we had servers that it took us a week.
Speaker:It took, took us a week to get a full backup.
Speaker:Okay.
Speaker:Like, how are we gonna meet a four-hour RTO if it takes
Speaker:a week to do a full backup?
Speaker:Right?
Speaker:Uh, and by the way, our next episode we're gonna talk about
Speaker:RPO recovery point objective.
Speaker:So it's a, it's very much a sister episode to this, but that, I'd say
Speaker:that's the number one reason is that people's backup systems, and, and again,
Speaker:I used the term backup very broadly.
Speaker:Anything that.
Speaker:That brings the server back to the way it looked, you know, before the
Speaker:disaster is a backup system to me.
Speaker:There.
Speaker:Well, you just use 'em for different purposes, disaster recovery, et cetera.
Speaker:Um, but that's the number one reason.
Speaker:The other reason that's very closely related to that is
Speaker:that they have no idea, right?
Speaker:They, they haven't tested right.
Speaker:They, they've got a system, they've got a clue.
Speaker:Right.
Speaker:And they're like, oh, well it takes us, you know, three hours
Speaker:to, or four hours to back up.
Speaker:Therefore, we should be able to do a four hour restore.
Speaker:There's a lot of ifs in that, right?
Speaker:Yeah.
Speaker:the, the other thing is, as you know, restores are often a
Speaker:lot slower than backup, right?
Speaker:For a number of reasons that, you know, are all over the
Speaker:place that, that they, they
Speaker:the
Speaker:ahead.
Speaker:is like incremental
Speaker:Yeah.
Speaker:Yeah.
Speaker:Forever incremental.
Speaker:Right.
Speaker:That I would probably say is the biggest
Speaker:Yeah.
Speaker:That you're, that you're piecing together a restore from many, many, many stuff.
Speaker:Uh, I, I think if you're, if you, if you have a proper design that alone
Speaker:shouldn't, um, you know, impact you.
Speaker:If you are doing a, a, you know, sort of the old school full restore, followed
Speaker:by each incremental restore, and that means you're actually restoring some
Speaker:files multiple times, then Absolutely.
Speaker:Right.
Speaker:If you're doing, if you, if you have a, if you have a. A system that is
Speaker:properly de uh, developed, right?
Speaker:That fixes that issue where if we know a file has changed, then we're not
Speaker:gonna restore that file multiple times.
Speaker:We're just gonna restore the latest version of the file.
Speaker:If you have that, that's not really the problem.
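The difference between the old-school approach and a properly designed one can be sketched by counting file writes. This is a simplified model, assuming each backup is just a set of file paths (oldest first) and ignoring deletions; the paths and function names are hypothetical.

```python
def naive_restore_writes(backups: list[set[str]]) -> int:
    """Full restore followed by every incremental: every stored copy gets written."""
    return sum(len(files) for files in backups)


def latest_version_plan(backups: list[set[str]]) -> dict[str, int]:
    """Map each file to the index of the newest backup that contains it."""
    plan: dict[str, int] = {}
    for index, files in enumerate(backups):  # oldest first
        for path in files:
            plan[path] = index  # a later backup replaces the earlier entry
    return plan


backups = [
    {"/db/data.mdf", "/etc/app.conf", "/var/log/app.log"},  # index 0: full
    {"/db/data.mdf", "/var/log/app.log"},                    # index 1: incremental
    {"/db/data.mdf"},                                        # index 2: incremental
]

plan = latest_version_plan(backups)
print("naive writes:", naive_restore_writes(backups))
print("planned writes:", len(plan))
```

In this toy data set the naive approach writes six files while the plan writes three, because the frequently changing file is restored once instead of three times.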
Speaker:But you do have the issue of dedupe, right?
Speaker:You have the, the dedupe tax that quite often really rears its ugly
Speaker:head when we go to do a restore.
Speaker:Now why would that be?
Speaker:Why would that be the case Prasanna?
Speaker:Because when you're deduplicating data, throwing away a whole
Speaker:Mm-hmm.
Speaker:But the problem is when you need to read it, you're basically doing random reads
Speaker:across the entire system in order to be able to recreate that single file.
Speaker:Because you might have old blocks from one part of the disk and a
Speaker:different part of the file from a different part of the disk.
Speaker:And so you end up with all these random reads, which as we know, our disk
Speaker:drives are not very good at doing random
Speaker:Yeah, it, it's the, it's the ultimate fragmented file system, right?
Speaker:Uh, you are just, you're absolutely guaranteeing that everything
Speaker:you need is everywhere, right?
Speaker:Um, and, um, the, that, that is absolutely one of the cases.
Speaker:And, and if we're coming from tape.
Speaker:Right.
Speaker:Uh, which is probably less likely for most people.
Speaker:But if we're coming from tape, then we really do start talking about that,
Speaker:the, the, the forever incremental stuff and, uh, you know, because you're having
Speaker:to load all these tapes, uh, but also there's a network can get in the way.
Speaker:There's also, depending on what
Speaker:RAID, uh, we're using, right?
Speaker:If we're using RAID and, and we're using RAID, right?
Speaker:Everybody's using RAID.
Speaker:Yeah,
Speaker:depending on whether or not you opted for, does anybody
Speaker:opt for RAID 10 these days?
Speaker:I don't know.
Speaker:I
Speaker:I don't think so.
Speaker:I think everybody does RAID 6, right?
Speaker:Or, or, or something.
Speaker:So RAID, dual parity or whatever, and that has a write penalty, right?
Speaker:So for a number of reasons, restores are often slower than
Speaker:backup and you will never know.
Speaker:Until you do what Prasanna.
Speaker:You
Speaker:Exactly.
Speaker:And again, go ahead.
Speaker:Well, I'm gonna bring, I'm gonna bring out a story.
Speaker:Sorry.
Speaker:Uh, going,
Speaker:okay.
Speaker:going back to going back to my first, you know, the first time that things were
Speaker:really, really bad was that time when we had a new backup system and we had
Speaker:used a compression feature on, on the way in, and it was software compression.
Speaker:And long story short, when we went to, uh, there, there was a. There
Speaker:was the, um, we went to do the first major restore after we needed it.
Speaker:Right?
Speaker:We, we didn't test restores, we only tested backups.
Speaker:And uh, when we went to do it, uh, the, it was a DDS, and it was like,
Speaker:blink, blink, long pause, right?
Speaker:Blink.
Speaker:Blink.
Speaker:And once we called into support and they were like, yeah, it's working as designed.
Speaker:And basically we had not.
Speaker:Tested this at all.
Speaker:And not only was it slow, it wouldn't work.
Speaker:It, it was just literally without going, without taking too much time,
Speaker:it just literally wouldn't work.
Speaker:Right.
Speaker:And, uh, unless we like tripled the size of RAM or something.
Speaker:Right.
Speaker:And, um, so yeah, you, you just do not know how your system is going to
Speaker:perform until you go to do a restore.
Speaker:Yeah.
Speaker:And.
Speaker:One thing similar to that story is you should also do a realistic restore test.
Speaker:Don't just be like, oh, I'm gonna just restore a file, or
Speaker:I'm just gonna restore a VM.
Speaker:I'm good
Speaker:Yeah.
Speaker:Right?
Speaker:Because that may not be a realistic scenario for when you have to
Speaker:recover a full application suite or your entire environment.
Speaker:So make sure you're doing the right little type of
Speaker:Yeah, absolutely.
Speaker:It should,
Speaker:any
Speaker:it should be whatever the, whatever the thing is that we're
Speaker:setting the RTO for, right?
Speaker:Uh, you don't have to restore the entire environment, but you need to do
Speaker:representative restore tests, right?
Speaker:Um, entire servers, entire environments, entire recovery groups.
Speaker:What's a recovery group Prasanna?
Speaker:It is a group of things you need to restore in order for
Speaker:your application to come back.
Speaker:So it might be your database server plus your storage, plus your Active
Speaker:Directory or, uh, system, right?
Speaker:Plus whatever else is needed in order to get that production application back
Speaker:exactly.
Speaker:And so, and, and so that becomes important too.
Speaker:I was actually gonna comment on that, because there's an order
Speaker:of operations you have to do, and so you have to account for that.
Speaker:When you calculate your RTO, it's not like, oh, I can just restore my, uh,
Speaker:database, have it up and running before I have Active Directory up and running.
Speaker:It's not
Speaker:Right.
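The order-of-operations point can be sketched as a dependency graph: each system in a recovery group can only be restored after the systems it depends on. The example below uses Python's standard `graphlib.TopologicalSorter`; the group members, dependencies, and restore durations are hypothetical, and it assumes independent systems can be restored in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical recovery group: each key depends on the systems in its set.
depends_on = {
    "storage": set(),
    "active_directory": set(),
    "database": {"storage", "active_directory"},
    "application": {"database"},
}

# Hypothetical restore durations in hours.
restore_hours = {
    "storage": 2.0,
    "active_directory": 1.0,
    "database": 3.0,
    "application": 0.5,
}

# A valid restore order: every system appears after its dependencies.
order = list(TopologicalSorter(depends_on).static_order())

# Earliest finish time per system if independent restores run in parallel.
finish: dict[str, float] = {}
for system in order:
    start = max((finish[dep] for dep in depends_on[system]), default=0.0)
    finish[system] = start + restore_hours[system]

print("restore order:", order)
print(f"group usable after {finish['application']:.1f} h")
```

The number that matters for the RTO is the finish time of the last system in the chain, not the duration of any single restore.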
Speaker:And, and by the way, that one, one of the things that prompted this episode,
Speaker:uh, Kaseya did a 2025 state of the backup industry, uh, and they said that more
Speaker:than 60% of respondents believed that they could recover under, in, under a day.
Speaker:However, only 35% could actually do that in reality, which is, that's
Speaker:quite, that's quite a, a gap there.
Speaker:Um.
Speaker:Yeah.
Speaker:Another interesting thing was that only 10% of businesses reported
Speaker:no outages in the last 12 months.
Speaker:Uh, which means that 90% tested their backup systems the hard way.
Speaker:Uh, not quite as hard as, uh, our Alaskan friend, but, uh, which for those of
Speaker:you that haven't heard that episode, uh, he tested DR system by deleting
Speaker:the entire surfer for the entire.
Speaker:Data center and then restoring, and that was his first test.
Speaker:And it was like, gee, I hope it works.
Speaker:Don't do it like that.
Speaker:Um,
Speaker:but hey, it worked out
Speaker:yeah, exactly right.
Speaker:Um, and remember, again, going back to the things that fit into the RTO, right?
Speaker:You know, you, you also have to, to include things like
Speaker:detecting that there's a problem.
Speaker:Right, because the RTO clock starts the moment the outage happens, not
Speaker:the moment the restore happens, right?
Speaker:So, uh, the moment you have the outage and then you're like, what's going on?
Speaker:Right?
Speaker:Because so many times the, the symptom that gets your attention has nothing to
Speaker:do with the thing that actually went bad.
Speaker:Right.
Speaker:Uh, I mean, it does have something to do with it, but it's not
Speaker:the thing that went bad, right?
Speaker:So you gotta figure that out.
Speaker:You gotta understand how bad it is if it's a ransomware attack.
Speaker:Again, you gotta figure out, you know, how bad this, you know, how big the scope is.
Speaker:You might have to get approvals, um, you know, all these different things, right?
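The point about when the clock starts can be shown with timestamps. This is a minimal sketch with an invented incident timeline and an assumed four-hour objective; none of these times come from the episode.

```python
from datetime import datetime, timedelta

# Hypothetical timeline of one incident.
timeline = {
    "outage_began": datetime(2025, 1, 6, 2, 0),
    "problem_detected": datetime(2025, 1, 6, 3, 30),
    "scope_and_approvals_done": datetime(2025, 1, 6, 6, 0),
    "restore_started": datetime(2025, 1, 6, 6, 15),
    "restore_finished": datetime(2025, 1, 6, 10, 15),
    "application_available": datetime(2025, 1, 6, 11, 0),
}

rto = timedelta(hours=4)  # assumed objective

# What the backup admin is tempted to report: the restore job alone.
restore_only = timeline["restore_finished"] - timeline["restore_started"]

# What the business experiences: outage start until users can work again.
recovery_time_actual = timeline["application_available"] - timeline["outage_began"]

print("restore job:", restore_only, "| meets RTO:", restore_only <= rto)
print("end to end: ", recovery_time_actual, "| meets RTO:", recovery_time_actual <= rto)
```

In this made-up timeline the restore job takes exactly four hours, yet the end-to-end recovery takes nine, so the objective is missed even though the restore itself looks fine.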
Speaker:Yeah.
Speaker:And well, just one thing to add to that, because I was thinking
Speaker:about the, uh, what was the.
Speaker:Company,
Speaker:Rackspace.
Speaker:with their hosted Exchange.
Speaker:Right.
Speaker:I think one of the things to also consider,
Speaker:uh, when you're thinking about RTO is, in order to bring my app back up and
Speaker:running, do I need to restore all my data?
Speaker:As an example,
Speaker:maybe I only need a subset of my data in order for my application to come up
Speaker:and I can slowly backfill all my old data that's archived or other things
Speaker:like that, I can still get people up and running and ready to go without
Speaker:waiting for everything to be done.
Speaker:And so there might also be slight nuances depending on the application
Speaker:of what the expectations are.
Speaker:Yeah.
Speaker:The other thing I would say regarding that Rackspace outage, if you're in
Speaker:the middle of your recovery or you're about to begin your recovery, don't
Speaker:change all the rules, right? In, in their case, they're like, we tested how to
Speaker:do this recovery, but you know, just before they went to do the recovery,
Speaker:they're like, ah, what if we just move everything over to Microsoft 365?
Speaker:Right?
Speaker:And it's like, oh, well that would mean that we have to like.
Speaker:Basically you, you can't, you can no longer restore the Exchange
Speaker:databases directly into the user.
Speaker:Uh, you have to, um, you'll have to restore it and then migrate
Speaker:the data over individually, which is a much bigger process.
Speaker:Much, it's gonna just take much, much longer, and it ended up taking months.
Speaker:You may recall, and there was a, uh, some lawsuits regarding that.
Speaker:So make sure that whatever scenario in which you do, uh, you do the testing,
Speaker:you, you, you have to do the testing.
Speaker:So, um,
Speaker:Yep.
Speaker:I know we talked earlier about, okay, that 24 hour RTO for some businesses,
Speaker:but there are some industries, right?
Speaker:Where even like seconds make a big difference, right?
Speaker:Yeah, definitely.
Speaker:Yeah, definitely like financial trading firms, banking organizations, the more you
Speaker:can attach a real number when you can say one hour of downtime costs us this much.
Speaker:If, if you can do that, if the business can do that.
Speaker:One, $1 billion.
Speaker:Um, yeah, I'm sorry, I, I gotta do the, the pinky, right?
Speaker:Um, if you can do that, the more you can do that, the, the, the, the
Speaker:much more equipped you will be as a, you know, backup and dr. Person.
Speaker:To be able to make enhancements to the backup and recovery system if needed.
Speaker:Right.
Speaker:So let's talk about, uh, some of the things that you
Speaker:could do to close this gap.
Speaker:Obviously, the first, the first thing is if you can have an
Speaker:iterative discussion on, uh, okay.
Speaker:You said you want one minute, we can do 10 hours.
Speaker:Right.
Speaker:Let's figure out, you know, let's get the, let's get the RTO set to somewhere near.
Speaker:Um, you know, uh, realistic that we, that we can actually meet, right?
Speaker:And you can, you can say, we're gonna set the RTO for now at this.
Speaker:We're gonna move towards, uh, a better RTO at a later, a later time.
Speaker:Um, any thoughts there?
Speaker:Yeah, no, I think that makes sense because it also takes time to implement
Speaker:new technologies because if, say for instance, your RTO is 10 hours based
Speaker:on your existing infrastructure, and now they're like, oh, we need
Speaker:an hour or 10 minutes, right?
Speaker:You're now going to need to think of something very different that's
Speaker:gonna elongate the time it takes, and so it really is important to
Speaker:ask the question, do you need.
Speaker:Yeah.
Speaker:Yeah.
Speaker:The help, just,
Speaker:start with
Speaker:yeah, everybody's gonna say zero and zero for your RTO and RPO, right?
Speaker:So it is just, you gotta justify it and you gotta say, well, if it's
Speaker:really worth $10 million every minute.
Speaker:Then you need to give us, you know, whatever the number is.
Speaker:Right.
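Justifying a tighter objective comes down to comparing the downtime cost avoided against the cost of the improvement. The sketch below is only an illustration: the per-minute cost, recovery times, and upgrade price are invented, and the function names are hypothetical.

```python
def downtime_cost(cost_per_minute: float, outage_minutes: float) -> float:
    """Money lost while the application is down."""
    return cost_per_minute * outage_minutes


def is_upgrade_justified(
    cost_per_minute: float,
    current_minutes: float,
    improved_minutes: float,
    upgrade_cost: float,
    expected_outages: int = 1,
) -> bool:
    """True if the downtime cost avoided exceeds what the upgrade costs."""
    minutes_saved = current_minutes - improved_minutes
    avoided = downtime_cost(cost_per_minute, minutes_saved) * expected_outages
    return avoided > upgrade_cost


# Assumed figures: $1,000 per minute, recovery improves from 10 hours to 1 hour.
print(is_upgrade_justified(1_000, 600, 60, upgrade_cost=250_000))  # worth it
print(is_upgrade_justified(1_000, 600, 60, upgrade_cost=600_000))  # not worth it
```

With these assumed numbers one outage avoids $540,000 of downtime, so a $250,000 upgrade pays for itself and a $600,000 one does not unless more than one outage is expected.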
Speaker:Um, so then if we're gonna do testing, if we can automate that testing, the more we
Speaker:can automate testing, the more the, you know, the better that things are gonna be.
Speaker:Doing it very regularly, doing it small, it's sort of like, uh, the
Speaker:same as opinions that I have on testing your, it's kind of like in
Speaker:cybersecurity where you have a company that actively tries to send phishing.
Speaker:You know, phishing tests to the users to see if, um, to
Speaker:see if they fall for it, right?
Speaker:Same thing here, where over there, more frequent, smaller bite-sized testing is
Speaker:preferred to the once a year I have to do this and it takes two hours, right?
Speaker:Keeping it on the mind, keeping a recovery mindset is really important.
Speaker:So I think regular DR drills is part of that.
Speaker:And I think having the regular DR drills is important because if something
Speaker:changes in your environment, you
Speaker:Yeah,
Speaker:rather than sort of that
Speaker:exactly.
Speaker:And then there's also the concept of chaos engineering, um, whi, which, you
Speaker:know, like the chaos monkey, right?
Speaker:You wanna talk about that?
Speaker:Yep.
Speaker:Yep, yep, yep.
Speaker:Yeah.
Speaker:So.
Speaker:just try breaking things in your environment, see what happens and
Speaker:see did I miss something that I wasn't backing up as an example.
Speaker:Maybe you forgot to back up Active Directory, and now something
Speaker:happened to it, you lost all the data there and you realize, oh, I can't recover
Speaker:my application because I don't actually have a backup of Active Directory.
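The kind of gap described here, a dependency that nobody is backing up, can also be checked on paper before anything is broken on purpose. This is a minimal sketch that assumes you keep an inventory of what each application needs and what is actually backed up; the application and system names are hypothetical.

```python
# Hypothetical inventory: what each application needs in order to run.
application_dependencies = {
    "order_app": {"database", "storage", "active_directory"},
    "wiki": {"database", "storage"},
}

# Hypothetical list of systems that currently have backups.
backed_up_systems = {"database", "storage"}


def unprotected_dependencies(
    dependencies: dict[str, set[str]], backed_up: set[str]
) -> dict[str, list[str]]:
    """Return, per application, the dependencies that have no backup."""
    gaps = {}
    for app, needed in dependencies.items():
        missing = needed - backed_up
        if missing:
            gaps[app] = sorted(missing)
    return gaps


print(unprotected_dependencies(application_dependencies, backed_up_systems))
```

A check like this only finds gaps in what has been written down; breaking things in a test environment is what reveals the dependencies nobody documented.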
Speaker:And so you start to understand the dependencies in your
Speaker:environment and point out sort of.
Speaker:Issues that you might not foresee different failure scenarios or like if
Speaker:the network goes down or The other thing is, it's not even just technology, right?
Speaker:It might be even a person.
Speaker:I know Curtis, you used to talk about at the bank doing
Speaker:testing with someone who did not
Speaker:Yeah, exactly.
Speaker:Exactly.
Speaker:Uh, well back then, I didn't write the book, but Yeah.
Speaker:Yeah.
Speaker:I wa I wasn't Mr. Backup yet.
Speaker:I wa I was, I was Mr. Backup junior.
Speaker:Um, and, and then, you know, the idea is if, if you do this, the whole, the
Speaker:whole idea of doing this on a frequent basis to get better at it, to create
Speaker:and, and, and improve your runbooks, to create and improve decision trees.
Speaker:What do we do when this happens?
Speaker:Right?
Speaker:Um, we also, we didn't talk, uh, at all about, um, uh, tabletop exercises.
Speaker:Those are, uh, obviously a great, uh, uh, tool here.
Speaker:Uh, you know, do 'em at lunch.
Speaker:Do 'em so that they're not like, so again, do them frequently in smaller.
Speaker:We're not where the whole world isn't, you know, don't do 'em like just before
Speaker:your, your performance review time.
Speaker:Which makes them like much more stressful.
Speaker:Do them frequently and, and, and have fun at it.
Speaker:And, and then learn from it and improve your runbooks,
Speaker:improve your decision trees.
Speaker:Cross train your teams.
Speaker:Do what we were talking about before.
Speaker:Don't use the, don't rely on one person.
Speaker:Uh, you know, you know, because that one person might not be available.
Speaker:Uh.
Speaker:You know, uh, at that time, right?
Speaker:And then measure, again, measure and report reality.
Speaker:Here's where we are.
Speaker:Make sure that everyone is on the same page.
Speaker:We've asked for this, we've agreed to this for now, we would like to get to here.
Speaker:Here's where we are.
Speaker:Report those gaps.
Speaker:Uh, and then let, let business leadership decide.
Speaker:What to do about that.
Speaker:It is not your responsibility.
Speaker:Right.
Speaker:I do remember like
Speaker:Yep.
Speaker:bad because the backup system wasn't capable, but it's like, I'm not magic.
Speaker:All I can do is make recommendations.
Speaker:Right.
Speaker:And I do remember, by the way, I do remember when I, and I had a shell
Speaker:script that was doing everything right.
Speaker:We had like, I dunno, like 50 servers and I was doing all this with like a, a Unix.
Speaker:You know, shell script.
Speaker:Right.
Speaker:And at some point I couldn't, but, and all of it was based on that each
Speaker:server could fit on a tape drive.
Speaker:And then one day we bought a server that it didn't fit on.
Speaker:50 tape drives, right?
Speaker:On 50 tapes.
Speaker:And, and it, it just, that and other servers that weren't
Speaker:quite that bad, it just broke.
Speaker:It broke my ability to do it right.
Speaker:And I said, I'm just not, I can't do that.
Speaker:And then, and I just went to the boss and I said, Hey, I can't do this.
Speaker:And she said, well, aren't there like commercial products that do this,
Speaker:that we can like spend money on?
Speaker:Oh, because you flipping
Speaker:I was, I was flipping, I was flipping out what I was doing,
Speaker:and I, I remember feeling like a failure because I couldn't fix this.
Speaker:Right.
Speaker:I wasn't that good at scripting.
Speaker:I, I, I don't think anybody could deal with the 50, you know, the right.
Speaker:But I, I remember at the time feeling like a failure, and I guess I'm
Speaker:saying, don't, try not to feel that way.
Speaker:Right?
Speaker:Go and give an honest assessment of where you're at.
Speaker:And, um, even if you're the one that put you in that scenario, right.
Speaker:Um, I, I remember another story of a guy that told me that he bought a
Speaker:particular vendor's dedupe product that had a 90% dedupe tax, meaning that the restore
Speaker:speed was 10% of the backup speed.
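The 90% tax in that story can be turned into a restore-time estimate. The sketch below assumes the restore moves the same amount of data as the backup and that a four-hour backup window is the starting point; both are assumptions for illustration, and the function name is hypothetical.

```python
def estimated_restore_hours(backup_hours: float, restore_tax: float) -> float:
    """Estimate restore time when restores run at (1 - restore_tax) of backup speed.

    restore_tax is the fraction of backup speed lost on restore, e.g. 0.9 for 90%.
    """
    if not 0 <= restore_tax < 1:
        raise ValueError("restore_tax must be in [0, 1)")
    restore_speed_fraction = 1 - restore_tax
    return backup_hours / restore_speed_fraction


backup_hours = 4.0  # assumed backup window

print(f"no tax:  {estimated_restore_hours(backup_hours, 0.0):.1f} h")
print(f"90% tax: {estimated_restore_hours(backup_hours, 0.9):.1f} h")
```

Under those assumptions a system that backs up in four hours would need about forty hours to restore, which is why a backup duration says little about a four-hour objective until a restore has been timed.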
Speaker:And, and he's like, I don't know what to do.
Speaker:I'm like, well, you have to tell your boss.
Speaker:And he's like, I'm the one that recommended the system.
Speaker:It's okay.
Speaker:You gotta be honest.
Speaker:You get, because you can't get there from, you can't get there from here
Speaker:if you, if you don't address that.
Speaker:Right?
Speaker:Yeah, so one question I wanted
Speaker:Yeah.
Speaker:Curtis.
Speaker:There is a term though, right?
Speaker:So you
Speaker:Mm-hmm.
Speaker:and then you have like, okay, you're doing these tests, you
Speaker:actually figure out like, okay,
Speaker:Yeah.
Speaker:long it takes.
Speaker:There's a term for that though, and I don't think it's
Speaker:No, it's nowhere near.
Speaker:Yeah,
Speaker:it's
Speaker:nowhere near as As.
Speaker:Yeah, thanks.
Speaker:Nowhere near as widely used as RTO.
Speaker:Right?
Speaker:And it's RTA, recovery time actual.
Speaker:Some people say recovery time reality, it doesn't matter.
Speaker:Just have a different term.
Speaker:Don't say
Speaker:our RTO is an hour when what you're saying is this is how fast you can
Speaker:recover. Your RTO is your objective.
Speaker:Your other thing, I don't care what you call it, recovery time actual is good.
Speaker:This is where we are.
Speaker:The difference between your recovery time actual and your recovery
Speaker:time objective is the gap that you need to address with whatever changes in
Speaker:process, documentation, or quite possibly enhancements to your backup system.
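Keeping the two terms separate makes the gap easy to report. The following is a small sketch of such a report, assuming one measured test result per application; the application names and hour values are invented.

```python
from dataclasses import dataclass


@dataclass
class RecoveryMeasurement:
    name: str
    rto_hours: float  # objective agreed with the business
    rta_hours: float  # actual time measured in the latest recovery test

    @property
    def gap_hours(self) -> float:
        """Hours by which the measured recovery exceeds the objective (0 if met)."""
        return max(0.0, self.rta_hours - self.rto_hours)


# Hypothetical test results.
measurements = [
    RecoveryMeasurement("billing", rto_hours=4, rta_hours=9),
    RecoveryMeasurement("file_shares", rto_hours=24, rta_hours=6),
]

for m in measurements:
    if m.gap_hours == 0:
        status = "meets objective"
    else:
        status = f"misses by {m.gap_hours:.1f} h"
    print(f"{m.name}: RTO {m.rto_hours} h, RTA {m.rta_hours} h, {status}")
```

A report in this shape states where things stand and leaves the decision about closing the gap to business leadership.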
Speaker:Yeah.
Speaker:All right.
Speaker:I think, I think we've covered enough.
Speaker:And then next, next week we're gonna cover recovery
Speaker:point objective, which is.
Speaker:It It is.
Speaker:Yeah.
Speaker:It's weird.
Speaker:Like all the, yeah.
Speaker:Uh, and this is going to be basically how much data we agree that we
Speaker:can lose, which is something very different than how long the system is.
Speaker:Yeah, exactly.
Speaker:We should, we should recover in zero minutes and we should lose zero data.
Speaker:We all agree.
Speaker:That would be amazing.
Speaker:Uh, it's also not gonna happen.
Speaker:Well, thanks for, uh, thanks for joining me again.
Speaker:I am.
Speaker:Enjoy these.
Speaker:I
Speaker:Yeah, I'm glad.
Speaker:Yeah.
Speaker:more.
Speaker:I think now, now that we, you've figured out your world and I figured out my
Speaker:new world, uh, we, we should be good.
Speaker:And, uh, thanks to the listeners, you're why we do this.
Speaker:Uh, that is a wrap.