You found The Backup Wrap-up, your go-to podcast for all things
Speaker:backup recovery and cyber recovery.
Speaker:In this episode, we jump into disaster recovery testing, and trust me, you don't
Speaker:wanna learn these lessons the hard way.
Speaker:I've got some wild stories about DR tests gone wrong,
Speaker:including one from my early days at a bank that'll make you cringe.
Speaker:My co-host Prasanna and I break down exactly how to approach DR
Speaker:testing the right way, starting with the basics and working your way up.
Speaker:We'll tell you why non-destructive testing is absolutely critical.
Speaker:Seriously, you don't wanna blow up your production environment just to test DR.
Speaker:And we'll tell you how to set realistic success criteria that won't make you cry. Another episode
Speaker:from Lessons From the Trenches.
Speaker:I hope you like it.
Speaker:By the way, if you don't know who I am because you're a first-time listener,
Speaker:I'm W. Curtis Preston, AKA Mr.
Speaker:Backup, and I've been passionate about backup and recovery for over 30 years,
Speaker:ever since
Speaker:I had to tell my boss that we had no backups of the production
Speaker:database that we had just lost.
Speaker:I don't want that to happen to you.
Speaker:That's why I do this podcast.
Speaker:On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.
Speaker:This is The Backup Wrap-up.
Speaker:Welcome to the show.
Speaker:Hi, I am your host, W. Curtis Preston, AKA Mr.
Speaker:Backup, and if you could take just a quick moment to subscribe
Speaker:or follow so that you'll get our great content, that would be great.
Speaker:I am sitting here with none other than the guy who's very concerned
Speaker:about my obsession with serial killers lately, or at least one serial killer.
Speaker:How's it going,
Speaker:Prasanna?
Speaker:I am good, Curtis.
Speaker:Yeah,
Speaker:about me?
Speaker:so I think after this you should watch Hannibal,
Speaker:the TV show, not the movie.
Speaker:They both are good, but
Speaker:Right?
Speaker:Right?
Speaker:And then we could discuss whether people taste differently
Speaker:depending on what they eat.
Speaker:that's always something you talk about.
Speaker:Uh,
Speaker:your latest obsession is Dexter.
Speaker:Yeah.
Speaker:Which, which, for the record, I'm rewatching, right?
Speaker:I, I enjoyed Dexter when it was on and when I had to wait a week, right?
Speaker:going back to those old days
Speaker:oh Lord,
Speaker:I, you know, I'm, I'm kind.
Speaker:I think you're... aren't you?
Speaker:Like, you do not watch shows when they're on.
Speaker:You wait until they're done and then you binge them,
Speaker:right?
Speaker:Yeah, for
Speaker:the most part, although these days I don't actually have like broadcast tv,
Speaker:Right.
Speaker:it's whatever's available on Netflix or Amazon or Take your pick.
Speaker:We, we, we actually have YouTube tv, so we have broadcasts and there's
Speaker:some shows that we watch on there.
Speaker:Um, but they're shows where, like, you know, this episode, that episode,
Speaker:they're, they're all the same, you know?
Speaker:And
Speaker:Yeah.
Speaker:you don't, it's not like, uh, it's not like
Speaker:Dexter where there's an ongoing
Speaker:so here's the funny thing, right?
Speaker:So it was, it started off first on.
Speaker:tv.
Speaker:Right.
Speaker:And like you said, you had to wait a
Speaker:Showtime.
Speaker:Yeah.
Speaker:And then what I used to do, right, so I didn't have Showtime,
Speaker:Mm-Hmm.
Speaker:so I had to wait for it to come out on Netflix DVDs,
Speaker:Oh,
Speaker:right.
Speaker:I would request the DVDs.
Speaker:And of course I was cheap, frugal, however you want to say it, right?
Speaker:And so I had like the two DVD plan, so I'd request like two DVDs, right?
Speaker:So you'd get like six episodes.
Speaker:Yeah.
Speaker:Five or six episodes, you'd binge watch those and then you'd send them back.
Speaker:And then you'd have to wait a week to then get the next set of DVDs
Speaker:so funny.
Speaker:in
Speaker:order to watch it.
Speaker:And that's I think, how I ended up watching Dexter.
Speaker:And I think Breaking Bad might've been the same way too.
Speaker:Yeah, you're just, you're cheaper than me.
Speaker:Dexter is currently on Netflix,
Speaker:so, um, yeah.
Speaker:Anyway, you can watch all these things while you do
Speaker:disaster recovery testing,
Speaker:'cause when you do DR testing,
Speaker:there's a lot of downtime, right?
Speaker:You sit there and you stare at the screen.
Speaker:Um, and I'm
Speaker:gonna start out with a story.
Speaker:What's that?
Speaker:but can you really?
Speaker:What,
Speaker:I understand, like, if you're watching Dexter or one of these
Speaker:shows that pull you in,
Speaker:uh huh.
Speaker:Would you actually be focused or would your DR testing basically balloon like
Speaker:10 times the normal amount of time?
Speaker:Because you're
Speaker:like, oh yeah,
Speaker:I forgot to get back to that.
Speaker:Well, you know what, it's gonna depend on the type of DR test
Speaker:you do, because with some DR tests there's a lot of waiting. There's a lot of,
Speaker:"I'm gonna start the restore part," and then I sit there for many, many hours.
Speaker:And if you've got one of those, then it
Speaker:doesn't really matter whether you're focused or not, as long
Speaker:as somebody's keeping an eye on the, uh, percentage done.
Speaker:And I'm gonna start this with a story from back in the day.
Speaker:The first backup, uh, the first restore test I ever did.
Speaker:The first, like, well, at least the first one I can really remember.
Speaker:And this was when, um, you know, I was at the bank,
Speaker:which was MBNA, which at the time was the second largest credit card corporation,
Speaker:and I was in charge of backups.
Speaker:And we had, I had talked the boss into moving to what would become
Speaker:the first of many commercial backup products that I had, uh, used.
Speaker:And that product was called SM-arc, which, as I've mentioned
Speaker:before, should have been called SM-back, because it was not an archive
Speaker:product, it was a backup product.
Speaker:Well, they were out of, uh, the Minnesota area,
Speaker:and we had converted to them, but like a
Speaker:good little backup guy,
Speaker:I had done like a parallel implementation, and so I was still
Speaker:running my old dump tapes and I was running this new, uh, fancy tool.
Speaker:And one of the things that this tool had was built-in, uh, compression.
Speaker:Um, I wasn't using the compression on the tape drives.
Speaker:I was using compression in the
Speaker:software,
Speaker:uh, to compress, uh, to go to the tape drives.
Speaker:So we had our first major
Speaker:failure, uh, a file server, HPFS01.
Speaker:I still remember the name of the server
Speaker:just like get burned into your
Speaker:Yeah.
Speaker:Yeah.
Speaker:And this is related to another story, about a friend of mine
Speaker:that I've referred to before, where she was a consultant and she accidentally,
Speaker:basically... this was a self-inflicted disaster where a consultant was trying
Speaker:to clean up home directories and she just did a really, really good job
Speaker:of cleaning up all the home directories.
Speaker:So I put my, uh, SM-arc tapes,
Speaker:uh, you know, in my front pocket and I put my backup tapes in my
Speaker:back pocket, and I went down there like Mighty Mouse: "Here I come to save the day!"
Speaker:And um, some of you will get
Speaker:Not available on iTunes, I
Speaker:Not available at iTunes.
Speaker:And we went in there, and these were DDS, uh, tapes, right?
Speaker:'Cause we were all HP, and HP loved DDS.
Speaker:And so I popped in the DDS tape, I pulled up my SM-arc software,
Speaker:and I started the restore test.
Speaker:And, well, it wasn't a restore test.
Speaker:I was testing in anger, uh, you know, we
Speaker:were testing like this was for real.
Speaker:And I had not done any actual testing.
Speaker:And so I kicked off the restore, and, um,
Speaker:I'm watching, and, you know, like a good little Unix boy,
Speaker:I had a while loop
Speaker:running, and it was doing a df on the, um, you know, to
Speaker:display the size of the file system.
Speaker:And I'm watching, and like a long time was going by and there
Speaker:was no change in the size of
Speaker:the file system.
Speaker:And I was like, this is weird.
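A loop like the one described here can be sketched in Python. This is only an illustration, not the original shell loop: the function names `watch_restore` and `looks_stalled`, the polling interval, and the stall threshold are all assumptions made for the example. It polls the used space on the restore target, which is the same number `df` reports, and flags a restore whose file system is not growing.

```python
import shutil
import time


def watch_restore(path, interval_s=30.0, checks=10, sleep=time.sleep):
    """Poll used space under `path` and return the growth seen at each check.

    `interval_s` and `checks` are illustrative defaults, not recommendations.
    """
    samples = []
    previous = shutil.disk_usage(path).used
    for _ in range(checks):
        sleep(interval_s)
        current = shutil.disk_usage(path).used
        delta = current - previous
        samples.append(delta)
        print(f"{path}: {current / 1e9:.2f} GB used ({delta:+d} bytes)")
        previous = current
    return samples


def looks_stalled(samples, min_growth_bytes=1):
    """True if no check showed at least `min_growth_bytes` of growth."""
    return all(delta < min_growth_bytes for delta in samples)
```

Calling `looks_stalled(watch_restore("/restore/target"))` (a hypothetical mount point) would have raised the alarm in this story long before anyone walked over to look at the tape drive light.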
Speaker:And then I went over and,
Speaker:just outta curiosity, I looked at the
Speaker:tape drive, right?
Speaker:You know, it's kinda like when your car dies, you know, you
Speaker:open up the hood like I have any idea what's going on inside there.
Speaker:You know, I look in there
Speaker:and I see the light on the tape drive, and it
Speaker:goes blink, blink, blink,
Speaker:one one-thousand, two one-thousand, three one-thousand, four one-thousand,
Speaker:blink, blink.
Speaker:Right?
Speaker:And there were these giant pauses in between.
Speaker:The blinks.
Speaker:And so I'm like, that's strange.
Speaker:It always blinks when it's, you know, when it's reading or writing data, right?
Speaker:So I called up, I called up the guys and I said, Hey,
Speaker:The
Speaker:vendor.
Speaker:Yeah,
Speaker:vendor.
Speaker:Yeah.
Speaker:Called the vendor.
Speaker:What's going on here?
Speaker:They said, well, by any chance did you use the compression feature?
Speaker:And I said yes.
Speaker:Yes I did.
Speaker:They go, yeah.
Speaker:So let us explain to you how the compression feature works.
Speaker:So when we're backing up files, um, basically, we do the equivalent
Speaker:of compress -c, I think it was, to send the result to standard out,
Speaker:and it sends it straight to the tape.
Speaker:But we don't know how to do that on the way back in.
Speaker:And so what we do is
Speaker:we read the tape, we read the entire file that we're gonna restore,
Speaker:and we restore it to temp, and then we run uncompress in place in
Speaker:temp, and then we move the
Speaker:uncompressed restored file
Speaker:to where it's gonna go.
Speaker:And we do that one file at a
Speaker:Oh,
Speaker:because we're concerned that we're gonna fill up temp.
Speaker:And I'm like, oh.
Speaker:So like if temp is, if I have a single file that's bigger than
Speaker:temp, it's just not gonna work.
Speaker:They're like, yeah.
Speaker:And they're like, if this is a concern, we suggest perhaps
Speaker:you don't use this feature.
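The failure mode the vendor describes can be checked for ahead of time. The sketch below is a hypothetical pre-flight check, not anything the product shipped: the function name, the `file_sizes` mapping (which a real tool would have to pull from its backup catalog), and the choice of the system temp directory are all assumptions. It lists files that could never be staged because a single file is larger than the free space in the staging area.

```python
import shutil
import tempfile


def files_too_big_for_temp(file_sizes, temp_dir=None, free_bytes=None):
    """Return names of files whose uncompressed size exceeds free staging space.

    file_sizes: mapping of file name -> uncompressed size in bytes
                (hypothetical input for this sketch).
    free_bytes: override for the free space, mainly useful for testing;
                by default it is measured on `temp_dir`.
    """
    if free_bytes is None:
        temp_dir = temp_dir or tempfile.gettempdir()
        free_bytes = shutil.disk_usage(temp_dir).free
    return sorted(name for name, size in file_sizes.items() if size > free_bytes)
```

This is a lower bound on the problem. A stage-then-uncompress restore may briefly need room for both the compressed and the uncompressed copy, so a file that passes this check could still fail.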
Speaker:it would've been helpful to know before I needed
Speaker:Right, right.
Speaker:And so thank God I had the backup tapes in my back
Speaker:pocket, and I pulled 'em out.
Speaker:I was like, okay, we're just using, you know,
Speaker:uh, dump here.
Speaker:And, um, and we restored and everything was basic, right?
Speaker:Um, and luckily I had backups from the previous night.
Speaker:And really the moral of this story is:
Speaker:don't do that.
Speaker:Right?
Speaker:Test your backups before
Speaker:you need them.
Speaker:Right.
Speaker:Do a DR test and that's what we're talking about today.
Speaker:Do a DR test before you actually need to do DR. And this may... we
Speaker:will see. You know, sometimes when we have these episodes, Prasanna,
Speaker:you know, we have these conversations beforehand and I'm
Speaker:like, I don't know if we can fill up an entire episode on this topic.
Speaker:We said that, I think, about the last one.
Speaker:Yeah,
Speaker:And it ended up not being a problem at all.
Speaker:This is not one of those episodes.
Speaker:This is one of those episodes where I think we might go long and we'll end
Speaker:up turning this into two episodes, um, because there's a lot to talk about here.
Speaker:So before we do any testing, right, um, what do you think we need to do?
Speaker:Well, you, well, you need to understand,
Speaker:there.
Speaker:By the way, there's probably no wrong answer here.
Speaker:Uh, but
Speaker:well, well, I think before you could
Speaker:un unless your answer is, unless your answer is nothing,
Speaker:um,
Speaker:no, no.
Speaker:So I was, I was going through my head.
Speaker:I was thinking even before you get to testing,
Speaker:Yeah.
Speaker:you need to understand what the business requirements are for how quickly
Speaker:you need to bring up that site, which you don't
Speaker:necessarily think about as testing, but that's actually part of your
Speaker:backup system design before you even get to the testing part.
Speaker:Yeah.
Speaker:So we, we have to agree on what success would be, right?
Speaker:And we have to agree on obviously what we're gonna test, but we have to
Speaker:agree on the parameters of that test.
Speaker:And so, um.
Speaker:What?
Speaker:What do you think are going what?
Speaker:Go ahead.
Speaker:I was just going to ask about, like, your parameters, right?
Speaker:What are
Speaker:those parameters? In my mind,
Speaker:some of those are how quickly do I need to be able to bring things up, what the scope is,
Speaker:right.
Speaker:Am I failing over and
Speaker:trying to recover a file, an application, a data center, right?
Speaker:What am I looking to actually test?
Speaker:So it depends what you're trying to test.
Speaker:And what the extent is that you want to test.
Speaker:And also, I think the other thing is how close do you want to get to testing
Speaker:an actual disaster as part of that?
Speaker:Right.
Speaker:Um, and I would say that there are different kinds of DR tests, and, um,
Speaker:if you haven't done any, I would say let's start small, right?
Speaker:If we've never done a DR test of any kind, I would
Speaker:probably start with, what are you
Speaker:I was thinking about the test you don't wanna try. Who was the guy in Alaska?
Speaker:Oh yeah,
Speaker:yeah.
Speaker:Okay.
Speaker:We definitely have to put a link to that episode.
Speaker:That was the most amazing episode ever.
Speaker:In fact, you know what?
Speaker:We'll probably replay it over, uh, the holiday break. Uh,
Speaker:by the way, we love him and it, and, and it had a happy ending, but oh my God.
Speaker:What, what a
Speaker:nightmare,
Speaker:think he was rebuilding a RAID array, if I recall.
Speaker:He was swapping the disks around
Speaker:it was self-inflicted in that he said, I want to move the discs around.
Speaker:Because they were like.
Speaker:different sizes I
Speaker:Yeah.
Speaker:Yeah.
Speaker:Like, and he's like, the only way I can do this is to just, to just
Speaker:wipe everything and start over.
Speaker:And he is like, yeah, okay.
Speaker:So that's what I'll do.
Speaker:I'll just wipe everything and then I'll restore everything.
Speaker:My backup system works.
Speaker:And this is how he tested his backup system.
Speaker:Please don't.
Speaker:He definitely learned some things along the way, but, um, yeah, so don't do
Speaker:Don't do that.
Speaker:Yeah,
Speaker:Don't do that.
Speaker:Uh, but
Speaker:learn from his example.
Speaker:Listen to that episode and, and,
Speaker:and have
Speaker:a heart attack as you're listening.
Speaker:because the thing that ran through my head was someone who's never done DR
Speaker:testing would've been like, yeah, I'm just gonna shoot my production
Speaker:site in the head, with all the applications, and just test it out
Speaker:and see, does it fail over properly?
Speaker:Yes.
Speaker:The key here is non-destructive DR testing.
Speaker:Right?
Speaker:Um, and so, yeah, so, so to go back to like setting the scope, I, if this
Speaker:is your first time doing DR testing, I would set the scope as small as possible.
Speaker:In fact, if you've never done DR testing, I would not even be doing DR testing.
Speaker:I would just be doing restore testing
Speaker:and
Speaker:What's the difference, Curtis?
Speaker:What's that?
Speaker:What's the
Speaker:the question is, the question is are we going to declare a disaster
Speaker:and say that this site is down in some way, um, and then we're
Speaker:gonna fail over to another site?
Speaker:Or are we simply just going to bring another server online?
Speaker:Right.
Speaker:Depending on how you define it, and, you know, depending on the
Speaker:day of the week and the day of the year and which year it is, I have called
Speaker:every time you have to restore, uh, a server of any kind a disaster.
Speaker:It's just a disaster of different levels, right?
Speaker:So if the server, if a server caught on fire, that's a disaster,
Speaker:right?
Speaker:Um, and, or, or if a server just died, right?
Speaker:Or if, like the, the story in the beginning, I think that was a disaster
Speaker:because it took out a production workload,
Speaker:right?
Speaker:So I would say perhaps start with a single
Speaker:production workload and restore it.
Speaker:First off, just restore it, you know, not in place,
Speaker:but, like, within the data
Speaker:center or within whatever computing environment you're using.
Speaker:And then start talking about VPNs and,
Speaker:you know, that sort of stuff.
Speaker:So, uh, that would be defining the scope as small as you can
Speaker:for the first test.
Speaker:so maybe like, uh, like maybe it might be a directory within a file server.
Speaker:If you've never done
Speaker:any restore testing, yes.
Speaker:A simple directory within a file server, does our backup system work at all?
Speaker:Right.
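One way to answer "does our backup system work at all?" for a single directory is to restore it to an alternate location and compare it with the source. The sketch below assumes that approach. The function names are made up for illustration, and the restore step itself is left to whatever backup product is in use.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest.

    Reads each file fully into memory, which is fine for a small test
    directory but not for very large files.
    """
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(source_dir, restored_dir):
    """Return (missing, mismatched) relative paths in the restored copy."""
    src = tree_digests(source_dir)
    dst = tree_digests(restored_dir)
    missing = sorted(set(src) - set(dst))
    mismatched = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, mismatched
```

Because the restore goes to a separate directory, nothing in production is overwritten. Files that changed after the backup ran will show up as mismatches, so a quiet directory makes a better first test.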
Speaker:Um, and then the next, the next is going to be some type of
Speaker:recovery of an entire server.
Speaker:Hopefully you have that server virtualized because doing that is going to be,
Speaker:uh, obviously much easier, assuming you're treating that VM as a VM.
Speaker:Uh, if it's not virtualized, then we're gonna start going down to the
Speaker:level of, um, bare metal recovery,
Speaker:right?
Speaker:BMR. By the way, we should probably do an episode on that.
Speaker:I
Speaker:didn't think about that.
Speaker:We should do an episode on that.
Speaker:I know you keep mentioning server as you're talking about this.
Speaker:Would you also qualify that that could be a server or an application?
Speaker:thanks for bringing that up.
Speaker:That's why you keep me around, you know?
Speaker:So it depends on what you mean by application, right?
Speaker:Um.
Speaker:You could, these are like the different
Speaker:level.
Speaker:You talk about restoring a directory.
Speaker:I would also look at restoring a single database within a server, right?
Speaker:If that's what you mean by application, then I would say yes.
Speaker:Sometimes when we say application, actually a lot of times when we
Speaker:say application, what we mean is
Speaker:application
Speaker:Multiple
Speaker:on multiple servers and multiple things, all interconnected.
Speaker:And if that's what you're talking about, I'm gonna say no
Speaker:yeah.
Speaker:yeah, I
Speaker:was thinking like a, my,
Speaker:time out.
Speaker:yeah, I was thinking more like restore your MySQL database.
Speaker:Exactly
Speaker:right.
Speaker:Um,
Speaker:table within your MySQL database.
Speaker:Right.
Speaker:yeah.
Speaker:Um, and again, non-destructively,
Speaker:Yeah,
Speaker:right.
Speaker:Um,
Speaker:uh, just an entity, and you can try all of these things, right?
Speaker:You can try restoring a database.
Speaker:You can try restoring all of the databases on a server.
Speaker:You can try restoring a server with the databases, um, and/or any other
Speaker:applications it might have, like a web server.
Speaker:You can try all of these things individually and make sure that
Speaker:you've got those pieces down.
Speaker:This is about defining the scope.
Speaker:Try all of the different pieces first and make sure you've got
Speaker:the recovery path for each of those parts of your infrastructure down
Speaker:before you decide, okay, we're gonna pretend we're gonna blow up the data
Speaker:center.
Speaker:and the other reason to bring that up: it's important to test these different
Speaker:types of components because there are gonna be different nuances. Like, how
Speaker:you deal with databases and recovering
Speaker:them is definitely gonna be different than file servers, which is probably
Speaker:also gonna be different than servers.
Speaker:And so it's important to understand the nuances of what is possible and
Speaker:the steps for each one of these.
Speaker:And also the different backup systems.
Speaker:So if you've got
Speaker:physical servers that are not virtualized,
Speaker:you've got servers running in Hyper-V or VMware or, you know, any sort
Speaker:of on-premises virtualization setup.
Speaker:What was the third one
Speaker:Broadcom,
Speaker:brought?
Speaker:VMware,
Speaker:It's, it will always be
Speaker:VMware to me.
Speaker:It will always be VMware. Um, if you've got
Speaker:VMs running in, uh, AWS, GCP, Azure,
Speaker:basically, for all of the different places that you
Speaker:have infrastructure, you probably have different backup and recovery
Speaker:methodologies for each of them.
Speaker:And so you should also be looking at testing all of those.
Speaker:And you should be looking at testing each of them individually as part of
Speaker:an overall process in which you're working towards, uh,
Speaker:declaring a much bigger disaster.
Speaker:Yeah.
Speaker:I, I think it's important also to be familiar because sometimes you might have
Speaker:a real disaster that doesn't require you
Speaker:to recover every single component within that higher level application, right?
Speaker:So being familiar with the individual components is also important
Speaker:depending on what the disaster is.
Speaker:Yeah.
Speaker:Agreed.
Speaker:Um, and you know, there's an application that we haven't
Speaker:talked about in terms of including or not including it in your DR
Speaker:scope, and that is: what about SaaS applications like Microsoft 365?
Speaker:Um, there are a couple of different scenarios there.
Speaker:One is your,
Speaker:um, account is damaged in some sort of logical way, meaning logical
Speaker:corruption, meaning a ransomware attack.
Speaker:You deleted it.
Speaker:Right?
Speaker:We, we've covered that
Speaker:on,
Speaker:provider damaged it.
Speaker:yeah.
Speaker:The provider.
Speaker:Well, that's, that, that was, I'm gonna list that as like a
Speaker:third.
Speaker:Well, if they damaged your account.
Speaker:And there is an example of that.
Speaker:Uh, for example, the Salesforce
Speaker:Yep.
Speaker:That's what I was thinking.
Speaker:story where they went and blew up everybody's permissions and everybody
Speaker:had to restore that themselves.
Speaker:Um, man, I hate that story.
Speaker:I really do, because Salesforce, in my opinion, did not own up to it.
Speaker:Uh, they didn't step up to the plate
Speaker:Yeah.
Speaker:at, at the time.
Speaker:I remember writing a blog post for, uh, Druva.
Speaker:I was working for Druva at the time, and I remember writing a blog
Speaker:post that said something like, proof that Salesforce should not be
Speaker:trusted with your backup infrastructure,
Speaker:um, 'cause they clearly don't know what they're doing.
Speaker:But, so, there's your account being damaged in some way.
Speaker:And then there's OVHcloud,
Speaker:and what happened there, where the entire infrastructure goes
Speaker:poof.
Speaker:Right.
Speaker:So, uh, I think you should come up with that
Speaker:as a scenario that you need to test.
Speaker:It's going to be challenging in most cases. Let's just
Speaker:talk about the different scenarios.
Speaker:Let's say it's AWS. The way most people back up AWS, if AWS goes down
Speaker:and takes your backups with it,
Speaker:you're screwed.
Speaker:You're screwed.
Speaker:Um, that's the way that most people back up most cloud infrastructure.
Speaker:And then there's the fact that most people trust their SaaS provider for data
Speaker:protection, which they should not, right?
Speaker:I talk about that all the time.
Speaker:They should not. But even if you're backing up your
Speaker:data, um, to a third party, and it's not...
Speaker:This has been
Speaker:an age-old
Speaker:problem, though.
Speaker:This is an age-old problem, but we're talking about testing today.
Speaker:So by the way, this is definitely gonna be two episodes.
Speaker:We haven't even gotten to the testing yet.
Speaker:All we're talking about is setting up the requirements.
Speaker:So,
Speaker:so this is definitely gonna be a second
Speaker:episode.
Speaker:Um, so we talk about, we talk about setting the scope and we
Speaker:talk about starting small first, with
Speaker:each of these individual components in your infrastructure, and, um,
Speaker:were you about to say something?
Speaker:that this is not destructive and
Speaker:not disruptive.
Speaker:Yeah.
Speaker:Nice, nice.
Speaker:I like that.
Speaker:Non-destructive, non-disruptive DR testing.
Speaker:I
Speaker:like it.
Speaker:because you don't wanna take down your production.
Speaker:You don't wanna affect, say, ongoing DR
Speaker:resiliency that's available on the secondary site because you're
Speaker:about to do some of this testing.
Speaker:There are cases where you do want to impact that, but when you're doing sort
Speaker:of these individual component-level tests, you may not want to impact your overall DR
Speaker:posture while you're doing this.
Speaker:Yeah, that last one is probably the hardest, and it
Speaker:may actually be impossible.
Speaker:What are we talking about there?
Speaker:We're saying that if you have a DR system (I hope you have a DR system),
Speaker:if you have one, you know, see if there's a way
Speaker:to test your DR without messing up your DR.
Speaker:Um, I'm not sure if that's possible
Speaker:in, in many scenarios, but.
Speaker:but one of the things I think about right is a lot of data protection
Speaker:vendors, they do test and dev, right?
Speaker:You can spin
Speaker:up a copy off of your storage that is a writeable copy that you can then use
Speaker:for testing out your recovery systems
Speaker:without impacting the actual recovery instance.
Speaker:right.
Speaker:And so that's what I'm thinking about is just like those sort of scenarios or maybe
Speaker:you're able to wheel in an extra server or beg, borrow, or steal an extra server to use
Speaker:for your recovery testing or DR testing,
Speaker:Yeah.
Speaker:You know, it's funny that you say that, because, you know,
Speaker:I like this idea of wheeling in a
Speaker:server.
Speaker:I mean, I think many of our listeners, you know... and every
Speaker:time, by default, I'm always talking about a data center, even though, you
Speaker:know, then my brain goes, hey,
Speaker:nobody has a data center anymore.
Speaker:Um, I don't think many people are wheeling in a server.
Speaker:I think that they're, you know, looking at
Speaker:cloud
Speaker:infrastructure as a way.
Speaker:I, I know that not everybody can do that.
Speaker:And that's obviously another thing that you have to decide upfront is
Speaker:how are we going to do this recovery?
Speaker:Are we going to do it in the cloud?
Speaker:Are we gonna do it on
Speaker:alternate infrastructure? Um, these are all
Speaker:things that you have to decide upfront.
Speaker:We have to decide what it is we're gonna restore.
Speaker:We have to decide, um, where we're going to restore, and how.
Speaker:Right?
Speaker:And on deciding the where:
Speaker:Again, this is all about planning.
Speaker:This is all stuff that needs to be
Speaker:decided upfront.
Speaker:This is also something that's going to be part of your backup and DR
Speaker:design. You will have decided upfront, you know, how we're going to do DR,
Speaker:uh, well, disaster recovery, how we're going to do it.
Speaker:And that place is most likely the place that you're going
Speaker:to, uh, be doing DR testing.
Speaker:And, you know, there are a lot of choices here.
Speaker:Cloud infrastructure, I think, is the best choice for most environments.
Speaker:Another choice is aging infrastructure, right?
Speaker:So
Speaker:you move, you know, you move your older stuff out and that
Speaker:becomes your DR environment.
Speaker:Um, another very common choice is basically, um,
Speaker:there's two ways to basically
Speaker:rent infrastructure for the purposes of disaster recovery.
Speaker:One way is to contract with, um, you know, a company that will provide
Speaker:what you need if and when you need it.
Speaker:Uh, and that costs, you know, let's say this much.
Speaker:And then there's a company that will provide everything you need,
Speaker:always available to you all the time.
Speaker:Even when you don't need it.
Speaker:And that will cost this much.
Speaker:Yeah, exactly.
Speaker:I can't do the fingers, I gotta do the hands.
Speaker:Right.
Speaker:It's significantly more.
Speaker:Uh, and, and this is why I push everybody to the cloud as much as you can.
Speaker:'cause that's the beautiful thing about the cloud, is that you can just literally
Speaker:snap your fingers, and, um, you know, you can also use infrastructure
Speaker:as code so that you can just make all of the hardware that you need
Speaker:magically appear.
Speaker:What's that?
Speaker:and work right.
Speaker:And work.
Speaker:Yeah.
Speaker:Magically appear when you need it.
Speaker:And then the moment you no longer need it because your test is over, you can
Speaker:also snap your fingers and it all goes away and you just pay for it only when,
Speaker:uh, it's, you know, up and running.
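The "appear when you need it, go away when the test is over" pattern can be sketched as a context manager. This is a minimal illustration under stated assumptions: `provision` and `teardown` are hypothetical callables standing in for whatever infrastructure-as-code tooling is actually in use, and no real cloud API is called here.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_dr_environment(provision, teardown):
    """Provision a DR test environment and always tear it down afterwards.

    `provision` returns some handle for the environment, and `teardown`
    receives that handle. Both are placeholders for real tooling.
    """
    env = provision()
    try:
        yield env
    finally:
        # Runs even if the test inside the `with` block fails, so a
        # failed restore test does not leave billable resources running.
        teardown(env)
```

The design choice worth noting is the `finally` clause. Paying only while the environment is up depends on teardown happening every time, including on the runs where the recovery test itself blows up.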
Speaker:Um, so the next thing to talk about is, so we decided
Speaker:what it is we're gonna test.
Speaker:We decided where we're gonna test it, how we're gonna test it,
Speaker:what about our success criteria?
Speaker:Yeah.
Speaker:I think this is important to note upfront what it means to be
Speaker:successful, but I think it's also important to be realistic, right.
Speaker:With your success criteria, especially if this is sort of your
Speaker:first time doing this, because I remember the story you tell, Curtis,
Speaker:about your runbooks that you
Speaker:Yeah.
Speaker:Yeah.
Speaker:when you used to work at the bank.
Speaker:And how a lot of times they were not able to get through and
Speaker:actually do the recovery because a step would be skipped or
Speaker:something wouldn't work properly.
Speaker:And so I think it's sort of a learning process.
Speaker:So don't be too hard on yourself.
Speaker:If the first 10, 20, a hundred times you try doing your recovery testing, it's
Speaker:not like a hundred percent successful.
Speaker:Yeah.
Speaker:may not be your fault.
Speaker:Things change, environments change,
Speaker:hardware may change, right?
Speaker:There's so many other factors,
Speaker:but.
Speaker:this is very closely related to the discussions we've had about, um,
Speaker:doing, uh, cyber recovery testing.
Speaker:Right?
Speaker:And your disaster recovery very likely will be part of an
Speaker:overall cyber recovery process.
Speaker:And I agree with you.
Speaker:Um, it's interesting.
Speaker:Mentally, I was going somewhere completely different.
Speaker:But once again, this is why we make such a good team.
Speaker:Um, you were more like, hey, make the, um,
Speaker:you know, the requirements,
Speaker:uh, be nice to yourself, especially if it's the first time out.
Speaker:You know, requirement number one, no one dies, right?
Speaker:Yep.
Speaker:No fires are created.
Speaker:No one quits.
Speaker:Um.
Speaker:And, um, you know, set your expectations low and nobody can
Speaker:take 'em away from you.
Speaker:I'm borrowing heavily from, uh, Michael P. Connolly, the comedian,
Speaker:who says the secret to happiness in life is to lower your expectations.
Speaker:And he's like, my goal for today is to go to the bathroom outside my pants.
Speaker:Um,
Speaker:so yeah.
Speaker:Set the, uh, you know, the success criteria low in the beginning.
Speaker:Um, where I was going was, of course, RTO and RPO, which we talked about before.
Speaker:That is definitely going to determine the overall success,
Speaker:once you click the stopwatch and you begin the recovery process.
Speaker:And then you click it again when everything is back up and running and you've
Speaker:tested that whatever it is that you destroyed, or you pretended you
Speaker:destroyed, is now up and running and fully functional again. As we've mentioned
Speaker:with RTO and RPO, that doesn't just mean the time of the restore, it's,
Speaker:it's, you know, it's.
Speaker:Ah, here's a question.
Speaker:Is it RTO and RPO or is it RTA and RPA?
Speaker:That is the objective,
Speaker:right?
Speaker:The RTO and RPO are, are the objective that we are shooting for.
Speaker:And so the goal, um, in a recovery test of any kind is that the RTA, that's recovery
Speaker:time, actual, uh, and recovery point actual, are less than, uh, or equal to
Speaker:the RTO and RPO.
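The comparison described here (recovery time actual and recovery point actual at or under the RTO and RPO) can be sketched in a few lines of Python. This is only an illustration, not anything used on the show: the function name, the parameter names, and the sample times are all made up, and it assumes the stopwatch starts when recovery begins and stops once the restored system has been verified as working.

```python
from datetime import datetime, timedelta


def evaluate_dr_test(recovery_started, verified_working, last_recoverable_point,
                     rto, rpo):
    """Compare actuals (RTA, RPA) against objectives (RTO, RPO).

    All names here are hypothetical. RTA is measured from the start of the
    recovery to the moment the system is verified as working, not just the
    duration of the restore itself. RPA is how far back the recovered data
    goes relative to the start of the recovery.
    """
    rta = verified_working - recovery_started
    rpa = recovery_started - last_recoverable_point
    passed = rta <= rto and rpa <= rpo
    return rta, rpa, passed


# Made-up example: recovery starts at 09:00, the newest usable backup is
# from 05:30, and the system is verified as working at 12:15.
rta, rpa, passed = evaluate_dr_test(
    recovery_started=datetime(2024, 11, 12, 9, 0),
    verified_working=datetime(2024, 11, 12, 12, 15),
    last_recoverable_point=datetime(2024, 11, 12, 5, 30),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=4),
)
print(f"RTA={rta}, RPA={rpa}, passed={passed}")
```

With these sample numbers the test passes, because both actuals come in under the four-hour objectives.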
Speaker:Go back and listen to our episode.
Speaker:We covered it, I think a couple episodes ago, right?
Speaker:No, it was, well, well, I don't know.
Speaker:It depends on when this one gets published.
Speaker:We just, we just did it.
Speaker:Yeah.
Speaker:That's why I'm saying we just did it not.
Speaker:Oh, was that really the last episode?
Speaker:Well, it just published.
Speaker:Um, but again, I don't know.
Speaker:We'll see how these things go. But, um, yeah, if RTO and RPO
Speaker:don't just roll off your tongue and you don't know what they are
Speaker:and all of that stuff, they literally should be in every conversation
Speaker:having anything to do with backup and DR design.
Speaker:But that to me is the ultimate success criteria, right?
Speaker:Another success criterion is the degree, and this is gonna be
Speaker:a percentage achieved, and that is the degree to which
Speaker:you were able to follow your runbook
Speaker:and just do what's in the runbook.
Speaker:This is
Speaker:what you, this is what you were alluding to before, right?
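One hedged way to turn "the degree to which you were able to follow your runbook" into a number is to record, for each runbook step, whether the tester could do it exactly as written. The sketch below assumes that simple yes/no model; the function name and the sample steps are invented for illustration.

```python
def runbook_adherence(steps):
    """Return the percentage of runbook steps followed exactly as written.

    `steps` is a list of (description, followed_as_written) pairs. This
    yes/no model per step is an assumption made for the sketch.
    """
    if not steps:
        raise ValueError("runbook has no steps to score")
    followed = sum(1 for _, ok in steps if ok)
    return 100.0 * followed / len(steps)


# Made-up results from a test run by someone other than the backup admin.
results = [
    ("Provision recovery environment", True),
    ("Restore database from latest backup", True),
    ("Reconfigure application connection strings", False),  # step was unclear
    ("Verify application login works", True),
]
print(f"Runbook followed as written: {runbook_adherence(results):.0f}%")
```

Steps marked as not followed are also a ready-made list of what to fix in the runbook before the next test.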
Speaker:Well, and hopefully you have a runbook.
Speaker:Yeah.
Speaker:Like you said, that assumes you have a runbook.
Speaker:Yeah.
Speaker:Um.
Speaker:And, and the way we always did this at the bank, as we mentioned
Speaker:before, is that we would have someone other than the person who runs the
Speaker:backup system do the DR testing.
Speaker:Right?
Speaker:Here's the runbook, please follow it because we're gonna pretend
Speaker:that Curtis got hit by a bus.
Speaker:It was never anything nice.
Speaker:It was never
Speaker:that I won the lottery and then just flew the coop.
Speaker:It was always Curtis got hit by a bus or got swallowed up in the sinkhole,
Speaker:the great sinkhole of 2024.
Speaker:Um,
Speaker:How long do you think it's gonna be until people don't even know what a bus is?
Speaker:I think, I think we're good.
Speaker:I think we're
Speaker:good.
Speaker:You know, um, it's funny, definitely an aside: when I was working the
Speaker:election, uh, the last two weeks, we were at, like,
Speaker:a multifunction building next to a school.
Speaker:And one of the hardest times was during pickup and drop off because
Speaker:there are all these parents that are picking up and dropping off their kids.
Speaker:And I was like.
Speaker:You know, if only there was like a large vehicle that we could put all the kids
Speaker:in and like we could like paint it yellow so that people could see it and then like
Speaker:have a sign that comes out and make sure, oh, traffic stops in both directions.
Speaker:If only we could do all that.
Speaker:And then all these parents wouldn't have to like, uh, take their kids
Speaker:to school and waste all of that gas and all of those cars sitting there.
Speaker:And I'm sure they're, the gas is running while they're sitting
Speaker:there for a half hour waiting for their kid to come outta school.
Speaker:Anyway, I digress.
Speaker:I don't know.
Speaker:Hmm.
Speaker:But, um, I think we'll stop there, because
Speaker:basically the number one success criteria that
Speaker:I'm gonna say is that you know what your success criteria are beforehand.
Speaker:What are we gonna restore?
Speaker:How are we gonna restore it?
Speaker:Where are we going to restore it, and what
Speaker:timeframe are we trying to fit in?
Speaker:And also any other criteria, like the thing we talked about,
Speaker:uh, the other criteria being that we can
Speaker:do it without Curtis' help, right?
Speaker:Or, you know, whatever your Curtis is, right?
Speaker:Um, decide on what all of those are upfront and you have a much
Speaker:better chance of being successful when you actually do the recovery.
Speaker:What do you think?
Speaker:No, that makes sense.
Speaker:Okay.
Speaker:And with that, I will thank you once again for being a great co-host, Persona.
Speaker:I try, I try.
Speaker:This was fun. I like talking about disaster recovery testing.
Speaker:Yeah, absolutely.
Speaker:And we, uh, hope you guys enjoyed this as well.
Speaker:Uh, that is a wrap.
Speaker:The backup wrap up is written, recorded, and produced by me, W. Curtis Preston.
Speaker:If you need backup or DR consulting, content generation, or expert witness work,
Speaker:check out backupcentral.com.
Speaker:You can also find links to my O'Reilly books on the same website.
Speaker:Remember, this is an independent podcast and any opinions that
Speaker:you hear are those of the speaker and not necessarily an employer.
Speaker:Thanks for listening.