Gary Williams:The thing is, if you want to do a team-building exercise,
Gary Williams:forget all the assault courses and other things they have. You get
Gary Williams:the team together and do a restore.
Gary Williams:Some of the best...
Gary Williams:seriously.
W. Curtis Preston:It's a bit like the trust exercises where you lean
W. Curtis Preston:backwards and someone catches you. It's like that.
W. Curtis Preston:Hi, and welcome to Backup Central's Restore it All podcast.
W. Curtis Preston:I'm your host, W. Curtis Preston, AKA Mr. Backup,
W. Curtis Preston:and I have with me my table saw safety enthusiast, Prasanna Malaiyandi.
W. Curtis Preston:How's it going Prasanna?
Prasanna Malaiyandi:I'm good, Curtis.
Prasanna Malaiyandi:I don't know if I'd call myself a safety enthusiast, but
W. Curtis Preston:You don't believe in safety.
Prasanna Malaiyandi:no, not at all.
Prasanna Malaiyandi:Plus, I think you could say I'm a bad influence on you, seeing how much
Prasanna Malaiyandi:equipment you've now started to accrue.
W. Curtis Preston:Yeah, last night I watched, I don't know.
W. Curtis Preston:I'm going to say two solid hours of just table saw safety videos.
Prasanna Malaiyandi:Yeah, but it is good for you to refresh your
Prasanna Malaiyandi:mind on what table saw safety means.
W. Curtis Preston:Yeah.
W. Curtis Preston:You do recall that a table saw is the reason that this finger is missing
W. Curtis Preston:the end.
W. Curtis Preston:I'm missing the end of the middle finger on my left hand,
W. Curtis Preston:for those of you listening.
W. Curtis Preston:so it's actually really hard for me to watch some of those videos.
Prasanna Malaiyandi:Is it like when you're doing driver's
Prasanna Malaiyandi:education, learning to drive, and they show, what is it, Red Asphalt?
Prasanna Malaiyandi:Was that the name of the movie where it's like, accidents happen?
W. Curtis Preston:Blood on the asphalt, I think is what that one's called.
W. Curtis Preston:I do remember that one. But there's one where a guy actually
W. Curtis Preston:shows in the video that he doesn't have the board completely clear the blade
W. Curtis Preston:when he takes his hand off of it.
W. Curtis Preston:And the blade grabs the board and tosses it essentially at his groin area.
W. Curtis Preston:And the thing is when you watch it, he looks at it one frame at a time.
W. Curtis Preston:And the board goes from being on the other side of the blade to his groin
W. Curtis Preston:in less than a frame of the video.
W. Curtis Preston:So that's one-thirtieth of a second, probably.
W. Curtis Preston:yeah.
W. Curtis Preston:And he's like, don't do that.
W. Curtis Preston:but yeah, it's been interesting, but the thing that's got me super
W. Curtis Preston:excited right now, has been this new video editing or just editing tool.
W. Curtis Preston:It's both video and audio and, and it's this thing called,
Prasanna Malaiyandi:Descript,
W. Curtis Preston:Descript.
W. Curtis Preston:Yeah.
W. Curtis Preston:And it's just.
Prasanna Malaiyandi:you sounded so excited when you texted me.
W. Curtis Preston:Oh, my God.
W. Curtis Preston:It's hard to describe how amazing this tool is. In my
W. Curtis Preston:case, because we're using video clips of these episodes,
W. Curtis Preston:I'm inputting the video, and I edit the video, and then I excerpt
W. Curtis Preston:the audio for the audio excerpts.
W. Curtis Preston:It's made mainly for talking-head videos like these, right?
W. Curtis Preston:Or audio. You input the audio or video, and it does automated transcription,
W. Curtis Preston:which gets about 95% accurate.
W. Curtis Preston:And then you go through and, obviously, you can correct the things that it
W. Curtis Preston:got wrong. But the really amazing part is, if you start a sentence and you
W. Curtis Preston:change your mind, or you have a lot of words leading up to that sentence,
W. Curtis Preston:all you have to do is highlight those words in the document and
W. Curtis Preston:it cuts them out of the video.
Prasanna Malaiyandi:It's like magic
W. Curtis Preston:It's like magic.
W. Curtis Preston:And then if that's not enough magic, the part that I'm super excited
W. Curtis Preston:about trying is sometimes you say one word when you meant to say another.
Prasanna Malaiyandi:that never happens to you, Curtis.
W. Curtis Preston:Like the podcast I was editing yesterday.
W. Curtis Preston:It was you and I talking about 365, and "you don't want this
W. Curtis Preston:to happen on your worst day."
W. Curtis Preston:That's what I meant to say.
W. Curtis Preston:But for some reason I said "last day." So with this tool, first
W. Curtis Preston:off, I train it with my voice.
W. Curtis Preston:I literally speak into the microphone, a bunch of stuff.
W. Curtis Preston:It can then synthesize my voice.
W. Curtis Preston:And I can select that word and change the word last to worst,
W. Curtis Preston:and it will put my voice there, a synthesized version of my voice.
Prasanna Malaiyandi:So here's a question, Curtis, do we actually
Prasanna Malaiyandi:need to have this podcast anymore?
Prasanna Malaiyandi:Or can we just have not even just type it out.
Prasanna Malaiyandi:Can we just have something auto-generate based on all of our past podcasts and
Prasanna Malaiyandi:just have it start creating new podcasts.
W. Curtis Preston:It'll just be a recording that says
W. Curtis Preston:3, 2, 1 rule over and over.
Prasanna Malaiyandi:No, but you know how they've trained AI
Prasanna Malaiyandi:to now do paintings and things like that?
Prasanna Malaiyandi:I wonder if we could basically have,
W. Curtis Preston:AI based.
W. Curtis Preston:yeah.
W. Curtis Preston:First,
W. Curtis Preston:I have to get all the audio and then feed that into a thing.
W. Curtis Preston:Yeah.
W. Curtis Preston:We don't need you and me anymore.
Prasanna Malaiyandi:Exactly.
W. Curtis Preston:How hard is it to just say, back up your stuff, back up all the
W. Curtis Preston:stuff, and make sure you test your backups?
Prasanna Malaiyandi:And then you just do it based off of whatever's
Prasanna Malaiyandi:trending on Twitter and the data protection, security space.
Prasanna Malaiyandi:And it comes up with a new podcast episode for us.
W. Curtis Preston:That may have already happened.
W. Curtis Preston:Who knows?
W. Curtis Preston:You don't know whether this is an auto-generated video and auto-generated audio. Who
W. Curtis Preston:knows? But speaking of testing backups, I was thinking about this concept: as
W. Curtis Preston:long as you don't test your backups, your backup is both a complete success
W. Curtis Preston:and a complete failure, which reminds me of the concept of Schrodinger's cat.
Prasanna Malaiyandi:I like the former, rather than thinking
Prasanna Malaiyandi:about the latter, but that's
W. Curtis Preston:Yeah, but,
Prasanna Malaiyandi:rather than the realist.
W. Curtis Preston:So you're familiar with the concept of Schrodinger's cat, right?
Prasanna Malaiyandi:Based on TV shows, movies,
W. Curtis Preston:Okay.
W. Curtis Preston:Yeah.
W. Curtis Preston:So, as I understand the concept, you have this cat in
W. Curtis Preston:a box, and as long as you don't look in the box, the cat is both alive and dead.
W. Curtis Preston:But once you look in the box, you will know that the cat is alive or dead.
W. Curtis Preston:That's the concept of Schrodinger's cat.
W. Curtis Preston:And the reason why this is relevant today is that we have the author of
W. Curtis Preston:a blog called Schrodinger's Backup: when good documentation goes bad.
W. Curtis Preston:He's been in the IT industry almost as long as I have.
W. Curtis Preston:He comes to us from the UK.
W. Curtis Preston:Welcome to the podcast, Gary Williams.
Gary Williams:Thank you and thank you for the invite.
W. Curtis Preston:I saw that title.
W. Curtis Preston:And I was like, I gotta get this guy on the podcast.
Prasanna Malaiyandi:Curtis was so excited.
Prasanna Malaiyandi:Gary, you have no idea.
Prasanna Malaiyandi:This is like one of his favorite topics.
Gary Williams:Thank you.
Gary Williams:I don't know if I coined the term.
Gary Williams:I have seen it used since. I'd like to think I coined the term,
Gary Williams:but I don't know for certain.
W. Curtis Preston:Why not?
Gary Williams:It might be something that I heard and I just copied, because
Gary Williams:it just sounds really cool, and it's perfectly accurate, I think.
Gary Williams:It was all three or four companies ago.
Gary Williams:The lessons we learned still definitely apply today, but this
Gary Williams:happened about three companies back.
Gary Williams:So about 10 years ago.
W. Curtis Preston:So what was your role at the time?
Gary Williams:So my role at the time was a senior network engineer or senior
Gary Williams:support engineer, something like that.
W. Curtis Preston:OK, And you had the, the gall to, to ask about backups.
Gary Williams:No, I didn't.
Gary Williams:I was overconfident with our backups, let's say. So we had the backup
Gary Williams:software, I think it was Backup Exec.
Gary Williams:And, we had all the servers being backed up.
Gary Williams:We had everything going to dual tapes.
Gary Williams:The tapes were going off site.
Gary Williams:Everything was working.
W. Curtis Preston:Jewel, jewel tapes?
Gary Williams:Dual tapes.
Gary Williams:We actually had the backup software writing
Gary Williams:effectively RAID-1 backups.
Gary Williams:So it was writing to two tapes.
W. Curtis Preston:Oh, dual tapes.
W. Curtis Preston:Okay.
W. Curtis Preston:I heard, for some reason I heard Jewel.
W. Curtis Preston:I don't know why.
Gary Williams:It's the English accent.
Gary Williams:And yeah.
Gary Williams:So it's going to two tapes simultaneously.
Gary Williams:So the idea was that even if a tape broke, or if something happened to the
Gary Williams:backup that we weren't entirely sure of, or you couldn't restore from one of the
Gary Williams:tapes, you could then get the other tape and use that tape to do the restore.
Gary Williams:So we had all that stuff going on.
Gary Williams:And we got all the emails and of course we're getting the emails
Gary Williams:saying all the backups are good, everything must be absolutely fine.
Gary Williams:Why would we test them?
Gary Williams:We're busy enough with other tickets and other stuff going on and projects.
Gary Williams:We haven't got time to test them.
Gary Williams:What's the point?
Gary Williams:We know they work.
Prasanna Malaiyandi:And so it looks like you were doing all the
Prasanna Malaiyandi:right things in terms of setting up backups, Following the 3, 2, 1 rule.
Prasanna Malaiyandi:Right?
Prasanna Malaiyandi:Making sure your copies were offsite and.
W. Curtis Preston:Yeah.
Prasanna Malaiyandi:I think that's probably better than maybe
Prasanna Malaiyandi:like 70% of the people out there.
Prasanna Malaiyandi:Who try to do backups.
Prasanna Malaiyandi:You're like doing the right things.
Prasanna Malaiyandi:You're like, oh, I'm good to go.
Gary Williams:Yeah, absolutely.
Gary Williams:As I say, we had the emails. We even checked the emails.
Gary Williams:I think we even had a shared folder or something like that, where all
Gary Williams:the backup emails went, and if one of us saw that the folder had
Gary Williams:an unread one, we'd check it.
Gary Williams:If there was an error, someone would get a ticket, and it would get sorted out.
Gary Williams:If the error went on for several days, there would be a conversation.
Gary Williams:We would get these things fixed.
Gary Williams:Where's the problem?
Gary Williams:We know our backups are good.
W. Curtis Preston:So you, you were a.
W. Curtis Preston:You were, I don't know.
W. Curtis Preston:I don't know what to call it, but so instead of being a proponent
W. Curtis Preston:of testing the backups, you were a proponent of oh, everything's fine.
Gary Williams:Unfortunately at that time.
Gary Williams:Yes, I was sitting there, quite fat, dumb and happy, going:
Gary Williams:We've got the emails, the backups work.
Gary Williams:We know they work.
Gary Williams:Where's the problem.
Gary Williams:I didn't see any issue here at all.
W. Curtis Preston:For what it's worth.
W. Curtis Preston:I had a similar point in my career and there was a time.
W. Curtis Preston:I remember when I was at a company, I won't give the actual name of the
W. Curtis Preston:company, but I will just say it's a very, well-known electronics manufacturer.
W. Curtis Preston:And I had helped them set up their backup system, and I wasn't
W. Curtis Preston:there just to do the backups.
W. Curtis Preston:I was there to do sysadmin stuff.
W. Curtis Preston:And they were a mess.
W. Curtis Preston:This was a small department in this bigger electronics company.
W. Curtis Preston:It was an interesting department.
W. Curtis Preston:They called it Simulation Modeling and Research.
W. Curtis Preston:So it was a revolutionary idea at the time of the idea of modeling,
W. Curtis Preston:like in a computer, what would happen if you drop this device?
W. Curtis Preston:And so they were doing this in a computer.
W. Curtis Preston:It was a fascinating, new-at-the-time field of science.
W. Curtis Preston:So I was there to fix a whole bunch of problems.
W. Curtis Preston:One of which, for example, was that every workstation, it was all
W. Curtis Preston:Unix workstations, and every person had root on their workstation.
W. Curtis Preston:And that was the first thing I was going to fix.
W. Curtis Preston:But I also set up their backup system and, the backups worked.
W. Curtis Preston:So I assumed the restores would work, and it was some time later,
W. Curtis Preston:I was there long enough, that I actually, at some
W. Curtis Preston:point, needed to do a restore.
W. Curtis Preston:And I found out that those tape drives were really good at writing data.
W. Curtis Preston:And they were completely incapable of reading data.
W. Curtis Preston:Again, I don't want.
W. Curtis Preston:I'm sure there was something wrong with these drives, but
W. Curtis Preston:they were IBM 3590 drives.
W. Curtis Preston:Normally IBM drives are top of the line or whatever, but there was something wrong
W. Curtis Preston:with these drives that I was completely.
W. Curtis Preston:So I guess what I'm saying is you're not alone.
W. Curtis Preston:Even me, and I've spent my career in this. Although honestly,
W. Curtis Preston:that event is on the list of things that I think back to when
Prasanna Malaiyandi:Yeah.
W. Curtis Preston:when I try to get other people to do it.
Gary Williams:Absolutely same with me.
Gary Williams:the backups that we were taking, as I say, we were only a small
Gary Williams:team and we had all the emails.
Gary Williams:We had everything in place.
Gary Williams:We had the two tape libraries doing the backups.
Gary Williams:So we thought we were in a really good position because we had
Gary Williams:everything working the way it should.
Gary Williams:We even had documentation for how all this stuff was put together.
Gary Williams:We actually had a consultancy come in and help us put all this stuff together.
Gary Williams:Because at the time I was working for a financial institution, we
Gary Williams:had to have certain boxes ticked, and we had those boxes ticked
Gary Williams:because we had the documentation.
Gary Williams:We had the backups, they were going off site.
Gary Williams:They were being looked after for us.
Gary Williams:We even recalled tapes to make sure we could do the process
Gary Williams:and no tapes were getting lost.
Gary Williams:So we did that level of testing, but what we never actually tested was
Gary Williams:actually restoring the data itself.
Gary Williams:And it was a bit of an epiphany when we actually had someone come
Gary Williams:into the team who was brand new to IT.
Gary Williams:Had never worked in IT before.
Gary Williams:Always wanted to work in IT.
Gary Williams:Was actually employed in the business in a completely different role.
Gary Williams:And then he actually said to me, one day, I'd like to move into IT.
Gary Williams:I thought he was joking.
Gary Williams:It turns out no, he was actually serious.
Gary Williams:He was an ex-finance person wanting to move into IT.
Gary Williams:So he applied internally, he got the job and he started with us and he started
Gary Williams:looking through some old tickets and he was saying things like, why did you
Gary Williams:do such and such a change this way?
Gary Williams:So there's a whole education thing going on there.
Gary Williams:And that's when he asked the question.
Gary Williams:When did you test the backups?
Gary Williams:What do you mean test them.
Gary Williams:We've got the emails.
Gary Williams:Look, here, you can see the servers.
Gary Williams:Here's the tape drives.
Gary Williams:Here's the tapes.
Gary Williams:We recall the tapes.
Gary Williams:Yeah, sure.
Gary Williams:But when did you restore something?
Gary Williams:And it's something I won't ever forget, because there was this look.
Gary Williams:There were only four of us in the IT team.
Gary Williams:We were a really small team for a company of about 300 and there's
Gary Williams:this look going around the whole office and everyone's going well, we
Gary Williams:haven't actually tested them have we?
Prasanna Malaiyandi:It's like a light bulb goes off and it's yeah.
Prasanna Malaiyandi:It's Ooh.
Gary Williams:We were like, hang on.
Gary Williams:Yeah, we should probably test one of them, shouldn't we?
Gary Williams:Okay.
Gary Williams:What should we test? And looking back on it, it was a really insane moment,
Gary Williams:just to think that we'd had, easily,
Gary Williams:I think, the emails coming in for over a year.
Gary Williams:And yes, we'd had the odd backup failure where something timed out, or there
Gary Williams:was a fault with one of the tape drives.
Gary Williams:These tape drives were quite old.
Gary Williams:So they actually had physical SCSI cables that would sometimes play up.
Gary Williams:So you had to make sure the SCSI cables were all firmly
Gary Williams:attached, the terminator was in.
Gary Williams:The good old days.
Gary Williams:And.
Prasanna Malaiyandi:never had to deal with restores?
W. Curtis Preston:Yeah.
W. Curtis Preston:And of course you had both active and passive terminators as well.
Gary Williams:Yeah, exactly.
Gary Williams:We did actually have to do some restores, but we had a storage array, and the
Gary Williams:storage provider let us do snapshots.
Gary Williams:So 99% of the restores that we needed
Gary Williams:were just copy and paste from the snapshot.
Gary Williams:Not a problem.
Gary Williams:You deleted that file? Not a problem.
Gary Williams:There it is.
Gary Williams:If something was deleted from a desktop, the common response was, we
Gary Williams:don't back things up on your desktop.
Gary Williams:Sorry.
Gary Williams:That's tough.
Gary Williams:If you want it backed up, put it onto the server, put it into your
Gary Williams:home drive or something like that.
Gary Williams:It will get backed up.
Gary Williams:So that was the general understood consensus because it was a small company.
Gary Williams:Most of the time, this wasn't an issue, and as I say, people deleted a file.
Gary Williams:I remember one time we had an Excel file.
Gary Williams:That was a real pain because of all these financial macros.
Gary Williams:And we restored that from a snapshot.
Gary Williams:And it was still corrupt and we had to go back a week or so, we
Gary Williams:managed to get the file back and it was working and we actually said
Gary Williams:it and I remember it quite well.
Gary Williams:We said it within the team.
Gary Williams:"That was lucky.
Gary Williams:We might have actually had to get the tapes on site and do a restore from
Gary Williams:the tapes, but the snapshot worked.
Gary Williams:Everything's fine."
Prasanna Malaiyandi:Now you've decided, okay, we haven't tested.
Prasanna Malaiyandi:Maybe we should actually try doing the test.
Prasanna Malaiyandi:How did you decide what to test?
Gary Williams:Funnily enough, it was the new guy.
Gary Williams:the discussion was actually, okay, you're the person sitting
Gary Williams:there looking through the tickets.
Gary Williams:You're looking through the documentation.
Gary Williams:You're new to it all.
Gary Williams:You want us to prove to you that the restore process works.
Gary Williams:We know it does.
Gary Williams:Pick something.
Gary Williams:And then he sat there and he went, "How about the Exchange server?"
Prasanna Malaiyandi:Swinging for the fences!
Gary Williams:Fine.
Gary Williams:So we thought, okay, fine.
Gary Williams:we'll get the tapes back on site.
Gary Williams:We'll do the restore.
Gary Williams:We'll prove that the backups work and we can go back to what we're normally doing.
Gary Williams:all the project work, that kind of thing.
Gary Williams:We could spend a day on this.
Gary Williams:It will be good for us.
Gary Williams:Not a problem.
Gary Williams:We even went to the documentation and got the documentation out and said,
Gary Williams:look, we've got the documentation.
Gary Williams:The tapes are coming in.
Gary Williams:This is going to be easy.
Gary Williams:And it wasn't.
Prasanna Malaiyandi:Of course not.
Prasanna Malaiyandi:So when you decided to do the restore.
Prasanna Malaiyandi:Did you bring down your production or were you like, I'm going to
Prasanna Malaiyandi:restore this into a safe spot and
Gary Williams:Yeah, we couldn't bring down production because the nature
Gary Williams:of the business was that we needed to keep the server up and running.
Gary Williams:We actually had a spare server, and I think we maybe had two spare servers.
Gary Williams:VMs were just starting to come on the scene and we actually
Gary Williams:had a spare server racked.
Gary Williams:And the idea was that if we had a server failure, we could take the
Gary Williams:physical discs out of one server.
Gary Williams:put them into another server, power it on, and be back running.
Gary Williams:This was also before the days of replicas.
Gary Williams:They were, again, just coming out, and a lot of the software was super expensive and
W. Curtis Preston:You're giving me flashbacks, Gary.
Gary Williams:the good old days.
W. Curtis Preston:Yeah.
Gary Williams:We had this physical server and it had plenty
Gary Williams:of disc space to handle this.
Gary Williams:So we said, okay, we've not actually even powered this server on.
Gary Williams:I think maybe it was powered on when
Gary Williams:we bought it and that was it.
Gary Williams:So we said we should test that server out anyway.
Gary Williams:Yeah.
Gary Williams:Let's power it on.
Gary Williams:Let's get the data restored to that server and bring Exchange up.
Gary Williams:We can bring it up in an isolated network.
Gary Williams:Do some very basic tests on it, because it was a small team.
Gary Williams:We had access to the networking guys.
Gary Williams:I'll say networking guys.
Gary Williams:We each did a little bit of networking, and there was one guy who did a lot
Gary Williams:of the really key networking tasks.
Gary Williams:So none of that was a problem.
Gary Williams:We didn't have to wait months for tickets to get done or anything like that.
Gary Williams:So we set up this isolated network, we got the tapes on site and we
Gary Williams:started doing the restore and that's when it all went horribly wrong.
Prasanna Malaiyandi:So who was doing the restore?
Gary Williams:If I recall, it was actually our help desk guy.
Gary Williams:We said to him, look, you came up with this.
W. Curtis Preston:You put a lot on this guy.
W. Curtis Preston:It was his idea.
W. Curtis Preston:And you're like, what, if you think testing backups is so
W. Curtis Preston:important, why don't you do it?
Gary Williams:Pretty much. We did put it on him,
Gary Williams:because it was his idea.
Gary Williams:And we said, look, this is a really good exercise for you to do.
Gary Williams:Again, unfortunately, I'm going to put my hands up to this.
Gary Williams:It's a bad thing to have done, but we said, I'm a senior IT person.
Gary Williams:I know the backups are good.
Gary Williams:here you go.
Gary Williams:Here's the tapes.
Gary Williams:Here's the documentation.
Gary Williams:See you later and off he goes and he comes back.
Gary Williams:I think it was about two, three hours later, something like that.
Gary Williams:And he went, I can't get this working.
W. Curtis Preston:Yeah.
Gary Williams:What do you mean you can't get it working.
Gary Williams:What's the problem.
Gary Williams:And I don't actually recall what all the problems were, but
Gary Williams:I know that the server itself didn't have enough disc space, even though
Gary Williams:it was supposed to have the disc space, because the documentation said,
Gary Williams:you need partition sizes like this.
Gary Williams:And it had actually changed since then.
Gary Williams:And we didn't realize, and that was really the start of a lot of problems.
W. Curtis Preston:Yeah.
W. Curtis Preston:First off, I will say that
W. Curtis Preston:I like the way you did it.
W. Curtis Preston:That is to say, even though the way you got there was wrong,
W. Curtis Preston:the fact that you had this person
W. Curtis Preston:do it, who wasn't the person that made the documentation,
W. Curtis Preston:That's actually something I push pretty heavily.
W. Curtis Preston:And it's an idea that came from back in my days when I was at a bank
W. Curtis Preston:and we very much did test restores.
W. Curtis Preston:first off we didn't have snapshots.
W. Curtis Preston:We didn't have any of that stuff.
W. Curtis Preston:And we had 10,000 employees and any one of them was allowed to
W. Curtis Preston:call into the help desk and ask for a restore on any given day.
W. Curtis Preston:And so we would get 10 to 15 restores a day.
W. Curtis Preston:So we tested pretty regularly, to that degree. But
W. Curtis Preston:the thing that we had to test in the way that you did were these large
W. Curtis Preston:server restores. We did a DR test, and it was an absolute imperative
W. Curtis Preston:from the powers that be that:
W. Curtis Preston:Curtis wrote the documentation.
W. Curtis Preston:Curtis cannot be the person actually doing the test.
W. Curtis Preston:Curtis needs to be standing back there, listening closely to the problems that
W. Curtis Preston:are happening. Which was actually kind of nice, although
W. Curtis Preston:it's nerve-wracking to be the person who wrote the documentation and then
W. Curtis Preston:sit there watching someone. You think you've answered all the questions,
W. Curtis Preston:but it's not like that. In this case,
W. Curtis Preston:you had the classic example of documentation that might've been
W. Curtis Preston:correct, but it was out of date.
Gary Williams:It was correct at the time. The irony is, it's very similar with you.
Gary Williams:I didn't actually write the documentation.
Gary Williams:It was written by the contractors and consultants that came on.
Gary Williams:I actually signed off on the documentation, saying, yes, all
Gary Williams:the version numbers are correct.
Gary Williams:And I think I'd done a couple of updates.
Gary Williams:And then we'd had other changes, and the other people
Gary Williams:had forgotten, or I'd forgotten.
Gary Williams:Probably I'd forgotten to update the documentation, because we
Gary Williams:were busy and only a small team.
Gary Williams:And so, very slowly, not just that document but every other
Gary Williams:document that we had about the environment became out of date, and it was this
Gary Williams:snowball of errors that had crept in.
Gary Williams:And the thing that we realized is actually having no documentation
Gary Williams:would have been better because the documentation was lying to us.
Gary Williams:this poor guy is sitting there going, I followed steps three, four,
Gary Williams:and five, but I can't do step six because step five doesn't work.
Gary Williams:What do you mean?
Gary Williams:It doesn't work.
Gary Williams:And that's when we found that there was a service pack that
Gary Williams:was missing from Exchange.
Gary Williams:So it couldn't go any further and it just kept on building and building like this.
Prasanna Malaiyandi:That is an interesting problem.
Prasanna Malaiyandi:How do you keep your documentation up to date as you're making these
Prasanna Malaiyandi:changes and making sure everyone across the environment knows like where the
Prasanna Malaiyandi:documentation is and all the rest of that.
Gary Williams:Today, we use a wiki solution for all of our documentation.
Gary Williams:The idea behind that of course, is the Wiki is so easy to edit.
Gary Williams:But you still don't or sometimes you still don't.
Gary Williams:You make a note, I'll do that tomorrow or next week.
Gary Williams:So there is still the exact same risk.
Gary Williams:And even in my current place, we've seen this. We do testing as well.
Gary Williams:We do a lot more testing now than, anywhere I've ever worked before.
Gary Williams:And even with a lot of the modern systems with Amazon.
Gary Williams:Backups to S3 and all this kind of stuff.
Gary Williams:We still test to make sure that everything's correct,
Gary Williams:that we know what we're doing.
Gary Williams:That those Wiki pages are fully up to date.
Gary Williams:We did some AD restore testing not so long ago and we found, not major errors,
Gary Williams:but there were a couple of little issues there with the restore process, which
Gary Williams:just needed a few corrections in the documentation. It was like a permissions
Gary Williams:error type of thing, where we couldn't actually get access to the bucket.
Gary Williams:So we had to make some changes there.
Gary Williams:So even with all the modern backup software.
Gary Williams:It's still so important.
W. Curtis Preston:I talked about those DR tests that we did back in the day,
W. Curtis Preston:and the fact that we always had someone who wasn't me doing the
W. Curtis Preston:tests, and frequent listeners to the podcast will have heard this before.
W. Curtis Preston:But if we define a successful restore as "we got from A to Z without having to ask
W. Curtis Preston:Curtis what this line means," not a single one of the restores was successful.
W. Curtis Preston:So if Curtis ever got blown up or whatever, the chances of a restore
W. Curtis Preston:going completely without a hitch were zero, which is why, as you said
W. Curtis Preston:about updating, there's always little things that you have to update.
W. Curtis Preston:I would suggest that with that original documentation,
W. Curtis Preston:and again, take this for what it's worth, anybody who's listening,
W. Curtis Preston:the first mistake was writing the documentation in a way that
W. Curtis Preston:it can easily get outdated.
W. Curtis Preston:"Our Exchange server is 75
W. Curtis Preston:tera..."
W. Curtis Preston:Right.
W. Curtis Preston:That's a problem.
W. Curtis Preston:So if you're going to hand someone restore documentation, what it should say
W. Curtis Preston:is: before beginning the restore, go look at the size of the backups and figure
W. Curtis Preston:out how big the current Exchange server is, and then size the volume accordingly.
W. Curtis Preston:Yeah, that line wouldn't have gone out of date as quickly.
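The point about measuring instead of hard-coding a size can be sketched in a few lines of Python. This is only an illustration, not anything described on the podcast: the function names, the 25% headroom figure, and the sample backup size are all assumptions, and in practice the backup size would come from the backup product's own catalog or job report.

```python
import math
import shutil


def required_bytes(backup_bytes: int, headroom: float = 0.25) -> int:
    """Space to provision: the measured backup size plus a safety margin.

    The 25% default headroom is an arbitrary illustrative choice.
    """
    return math.ceil(backup_bytes * (1 + headroom))


def volume_has_room(backup_bytes: int, target_path: str, headroom: float = 0.25) -> bool:
    """Return True if the volume holding target_path has enough free space."""
    free = shutil.disk_usage(target_path).free
    return free >= required_bytes(backup_bytes, headroom)


if __name__ == "__main__":
    # Hypothetical figure: read the real one from the latest backup job,
    # not from a number written into the documentation.
    latest_backup_bytes = 750 * 1024**3
    print(volume_has_room(latest_backup_bytes, "."))
```

A runbook step built this way says "check the current size, then provision," so it stays correct even after the server grows.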
W. Curtis Preston:It is a real challenge, by the way,
W. Curtis Preston:this idea of what it's like to update documentation. Back in
W. Curtis Preston:the day we were using WordPerfect.
W. Curtis Preston:Yeah.
W. Curtis Preston:And I remember the official company standard was WordPerfect,
W. Curtis Preston:because we could use it on, we had Unix versions of WordPerfect.
W. Curtis Preston:By the way, curses-based WordPerfect.
W. Curtis Preston:Not this fancy Windows
W. Curtis Preston:what-you-see-is-what-you-get editing stuff.
W. Curtis Preston:This was text on a screen.
W. Curtis Preston:And I remember getting in a fight over it.
W. Curtis Preston:There was this one guy that was new and he wanted to use Word
W. Curtis Preston:because nobody used WordPerfect.
W. Curtis Preston:And we were like, we don't care.
W. Curtis Preston:We use WordPerfect here for our documentation.
W. Curtis Preston:And if you want your documentation to fit into our documentation,
W. Curtis Preston:you will use WordPerfect.
W. Curtis Preston:And you will like it.
Gary Williams:I remember our first days of moving across
Gary Williams:to Word, where
Gary Williams:Word had the ability to mimic WordPerfect key presses.
Gary Williams:So you could transition easily.
Gary Williams:Good old days.
W. Curtis Preston:Good old days, but I think what you're doing now with the Wiki,
W. Curtis Preston:I think that's a much better approach.
Gary Williams:It is.
Gary Williams:There's a permissions list behind it, obviously, so that not everyone
Gary Williams:can get access to it, but the right people can get access.
Gary Williams:But what it means is everyone in the team can get access.
Gary Williams:They can all update it.
Gary Williams:There's a history as well.
Gary Williams:So the other thing that we didn't have is a backup of the documentation. The documentation
Gary Williams:was on the server we were backing up.
Prasanna Malaiyandi:Oh,
Gary Williams:Exactly.
Gary Williams:So we, all we had was that documentation and looking back on it, we made
Gary Williams:quite a few mistakes like this.
Gary Williams:We had the, let's say we had the documentation on the file server.
Gary Williams:So if the file server was lost.
Gary Williams:How did you get your documentation?
Gary Williams:And it was, again, something that the helpdesk guy pointed out to us.
Gary Williams:How did you get your documentation?
Gary Williams:That's fine, actually.
Gary Williams:How would we.
Prasanna Malaiyandi:Sometimes it's an outside perspective, or
Prasanna Malaiyandi:someone saying, "Hey, how are you actually going to get this stuff done?"
Gary Williams:Something I think it's really important to know is at
Gary Williams:the time I was a senior IT person.
Gary Williams:There's a colleague of mine who was senior and we had a network guy.
Gary Williams:All of us, were reasonably senior.
Gary Williams:This guy was a junior.
Gary Williams:He'd been working in finance for three or four years beforehand.
Gary Williams:And then he'd just moved into IT.
Gary Williams:And he had such a fresh perspective on everything that it really opened our eyes.
Gary Williams:And that was the day I learned that it doesn't matter if you got 50
Gary Williams:years IT experience or five minutes.
Gary Williams:There's always something you can learn from someone.
Gary Williams:And sometimes the most valuable thing you can learn is from someone
Gary Williams:who is very new to the team, fresh eyes, fresh perspective.
Gary Williams:It's invaluable.
Prasanna Malaiyandi:100% agree.
W. Curtis Preston:There is a perspective that you can only
W. Curtis Preston:gain by being completely ignorant.
W. Curtis Preston:He could have been not junior to IT. In this case
W. Curtis Preston:he was, but even if he's a senior IT person who's joining your organization
W. Curtis Preston:for the first time, you look at this person another way
W. Curtis Preston:when they ask "stupid" questions like, "So how often do we test our backups here?"
W. Curtis Preston:And you're like, we don't do that.
Gary Williams:With my current place, any new person we get into our IT team, we
Gary Williams:literally do that sort of thing with them
Gary Williams:now, where we say, have a look through the tickets.
Gary Williams:If you've got any questions,
Gary Williams:ask. Have a look through the Wiki, and again,
Gary Williams:if you've got any questions, ask, because
Gary Williams:There's so many things in there.
Gary Williams:There's like the whole corporate culture and there's corporate acronyms.
Gary Williams:And if they don't know what they are, we've just found a problem.
Gary Williams:Because there's one acronym we have... my brain's gone.
Gary Williams:Sorry.
Gary Williams:There's one acronym that we have that's very similar to an IT acronym.
Gary Williams:I can't remember what it is off the top of my head.
Gary Williams:Yeah.
Gary Williams:But when you look at it, you think of the IT term because you're an IT person,
Gary Williams:but it actually means the corporate one.
Gary Williams:So there's that kind of thing.
Gary Williams:it's always important to spell out these acronyms at the start of any
Gary Williams:documentation so that everyone knows this is what you are referring to.
Prasanna Malaiyandi:Especially
W. Curtis Preston:It's Prasanna's job on the podcast.
W. Curtis Preston:If anybody ever brings up an acronym that they don't spell out,
W. Curtis Preston:Prasanna's always making them spell it out.
Prasanna Malaiyandi:Yep.
Prasanna Malaiyandi:I'm like, what does that really mean?
Prasanna Malaiyandi:Please tell me.
Gary Williams:And this is the thing.
Gary Williams:You can walk into a meeting with all the IT acronyms and every IT
Gary Williams:person sitting there will probably think it's something different.
Gary Williams:I think DC is a good one, because DC is direct current, data center,
Gary Williams:things like that.
Gary Williams:And this is the sort of thing that we've experienced several times, at a few different
Gary Williams:companies I've worked for, and it's always valuable to get that new person's insight.
Gary Williams:Because they don't know the corporate terminology, they don't
Gary Williams:know the corporate acronyms.
Gary Williams:So it's worth getting them on board and going through all this stuff because
Gary Williams:they've got this fresh insight before they learn that stuff and they can spot these
Gary Williams:problems before they become a problem.
W. Curtis Preston:I just realized I haven't thrown out our
W. Curtis Preston:usual disclaimer, Prasanna and I work for different companies.
W. Curtis Preston:I work for Druva and he works for Zoom.
W. Curtis Preston:And this is not a podcast of either company.
W. Curtis Preston:And the opinions that you hear are ours.
W. Curtis Preston:Please rate this podcast at ratethispodcast.com/restore.
W. Curtis Preston:And if you're like our guest here today, Gary, just an IT person
W. Curtis Preston:out there, and you want to talk about your favorite subject, or, you know
W. Curtis Preston:what, maybe if you don't understand why
Prasanna Malaiyandi:Come challenge Mr. Backup.
W. Curtis Preston:some crazy person would actually like them, then come on
W. Curtis Preston:here. Related topics, cybersecurity, data privacy, any number of related topics.
W. Curtis Preston:We'd love to have you on as a guest and, and reach out
W. Curtis Preston:to me at wcurtispreston@gmail or at @wcpreston on Twitter.
W. Curtis Preston:And we'll get you on here.
W. Curtis Preston:So, how did it turn out
W. Curtis Preston:with your restore?
Gary Williams:So eventually we got there. We actually got the
Gary Williams:Exchange server fully restored correctly, with the documentation.
Gary Williams:and I think it took three or four days, something like that.
Gary Williams:And the thing is, if you want to do a team building exercise, forget all the
Gary Williams:assault courses and other things they have. You get the team together and
Gary Williams:do a restore. Some of the best... seriously.
W. Curtis Preston:It's a bit like the trust exercises where you lean
W. Curtis Preston:backwards and someone catches you. It's like that.
Gary Williams:I've also never seen so many whiteboards being used to
Gary Williams:describe issues and draw diagrams of how things hung together.
Gary Williams:And it was actually really good.
Gary Williams:And I will admit we ended up putting some projects, not exactly on pause,
Gary Williams:but we put them to one side as all of us started getting involved in
Gary Williams:this restore, because we realized we actually had a very serious problem.
Gary Williams:I'll be honest.
Gary Williams:We gave the help desk guy, this guy who was junior in IT, the documentation.
Gary Williams:And we did expect him to trip over a few things.
Gary Williams:He's a new person, some of the terminology is new, fine, not a problem.
Gary Williams:We know we're there to help.
Gary Williams:What we didn't expect was us to trip over the same issues.
Gary Williams:We honestly thought that, like you were saying earlier, Curtis, that he
Gary Williams:was going to ask us some questions.
Gary Williams:We could do some updates to the documentation, do it again,
Gary Williams:and everything would be fine.
Gary Williams:But we didn't expect to get stumped by our own documentation.
Gary Williams:And unfortunately we actually did, we're sitting there going
Gary Williams:through the documentation going well, hang on a minute.
Gary Williams:We know the password is in this password safe and that password
Gary Williams:should work, but something had changed. I think at one point we'd
Gary Williams:actually changed the security model.
Gary Williams:So it was requiring stronger passwords.
Gary Williams:So you couldn't actually use a password that was on the backup.
Gary Williams:You had to go and reset an account.
Gary Williams:And it was lots of...
Gary Williams:There was nothing seriously wrong with the backup as such.
Gary Williams:And there's nothing seriously wrong with the documentation,
Gary Williams:but it was lots of little things that just piled up and piled up.
Gary Williams:And every time we took a couple of steps forward, we thought, that's it.
Gary Williams:We've got this solved, we'll get this restored.
Gary Williams:And then we got it all up and running and got the server running, and
Gary Williams:the Exchange Server service wouldn't start.
Gary Williams:We couldn't figure out why.
Gary Williams:I think that one took us a day to go through and we ended up having
Gary Williams:to run some additional commands.
Gary Williams:And finally, we got there, we got it all up and running.
Gary Williams:And I still remember, I think it was actually like a Friday or something,
Gary Williams:we're sitting there in the office and went, yeah, that was a really good
Gary Williams:question, you know, can we restore the data?
Gary Williams:Thank you for asking it.
Gary Williams:we had a bit of a celebration over that one.
W. Curtis Preston:I would say that, I like what you were saying
W. Curtis Preston:about, it sounded like there was a lot of collaboration.
W. Curtis Preston:It sounds like there's a lot of whiteboards going on
W. Curtis Preston:and you were learning a lot.
W. Curtis Preston:I would argue that the reason that was the case is that you
W. Curtis Preston:weren't doing it under duress.
W. Curtis Preston:You were doing this as a test.
W. Curtis Preston:if your exchange had been down for three or four days, that would have
W. Curtis Preston:been a very different experience.
Gary Williams:Completely.
Gary Williams:It's something that we actually discussed, that Friday afternoon, we've got the
Gary Williams:exchange server up and running and the conversation was what happens if
Gary Williams:this happens for real, because sure.
Gary Williams:We got the backup restored.
Gary Williams:We know that the backup is good.
Gary Williams:I mean, we were told it was good, but the restore process wasn't good.
Gary Williams:And I think we focused way too much on the backup itself and
Gary Williams:not the restore at that point.
Gary Williams:As I said, we had that conversation and it was a matter of what would happen.
Gary Williams:And we knew that we were a small company.
Gary Williams:We knew we would have the CEO down in the office, screaming at us.
Gary Williams:I need this back.
Gary Williams:We can't conduct business. And I'll be honest, that day
Gary Williams:we got a healthy lot of respect, both for the backups, for documentation
Gary Williams:and the accuracy of documentation, and for the server itself.
Gary Williams:Because we knew that the company at that point, the company relied on email so
Gary Williams:much that if that server did disappear, and we took that long to get back up and
Gary Williams:running the loss, the financial loss to the company and the reputational loss
Gary Williams:to the company would have been huge.
Gary Williams:And that also actually helped form some push forward for additional resilience
Gary Williams:in, like, the servers and moving more towards things like virtual machines,
Gary Williams:so that we had the ability to clone and do other bits and pieces, because
Gary Williams:we could use that as an experience.
Gary Williams:It's, look, this is how long, potentially, worst case scenario, it will take.
Gary Williams:It shouldn't because we're learning and we need to do this a lot more often.
Gary Williams:We need to allocate time to do this.
Gary Williams:And the beauty was again, being such a small company.
Gary Williams:We actually had the ear of a couple of directors, so you could
Gary Williams:put this case forward and they were really receptive to it.
W. Curtis Preston:I want to tack on something you said there.
W. Curtis Preston:the fact that you and I have been in that timeframe.
W. Curtis Preston:young kids today, they don't understand what it was like back then, when you had
W. Curtis Preston:no resiliency, you had no redundancy.
W. Curtis Preston:You had nothing.
W. Curtis Preston:So we had a server, and that server had a disk drive.
W. Curtis Preston:We didn't have mirroring.
W. Curtis Preston:We didn't have
Gary Williams:RAID.
Gary Williams:Although we had very, we had rightful life.
Gary Williams:We got, we were really market.
W. Curtis Preston:we didn't, when I was back in the day, we
W. Curtis Preston:literally were installing data directly on individual disk drives.
W. Curtis Preston:I think we might've had redundant power supplies on the servers that
W. Curtis Preston:we were using, and that was it.
W. Curtis Preston:And so the loss of any one of those components could take down the server.
W. Curtis Preston:Right.
W. Curtis Preston:And nowadays we move forward to the days of virtualization, and
W. Curtis Preston:you can just, if there's a little problem with this server, you just
W. Curtis Preston:move your VM over to another server.
W. Curtis Preston:In fact, you can vMotion it and Storage vMotion it, and you can
W. Curtis Preston:move it while it's running, which continues to boggle my brain.
Gary Williams:likewise.
W. Curtis Preston:And also the devices that would, that so
W. Curtis Preston:many of us have grown used to.
W. Curtis Preston:I, at home I pretty much live a solid state life.
W. Curtis Preston:My TiVo has a solid state hard drive.
W. Curtis Preston:and so those are so much more reliable than the moving part
W. Curtis Preston:drives that you and I grew up on.
W. Curtis Preston:And I think as a result, they don't have
W. Curtis Preston:the respect that you need to have to test backups the way you should.
W. Curtis Preston:I don't know.
W. Curtis Preston:Just a quick editor's note.
W. Curtis Preston:In the next section, Gary is going to mention something called iLO and iDRAC.
W. Curtis Preston:And we forgot to have him define it.
W. Curtis Preston:So I'm doing that now.
W. Curtis Preston:They are systems from HP and Dell: HP Integrated Lights-Out (iLO)
W. Curtis Preston:and the Integrated Dell Remote Access Controller (iDRAC).
W. Curtis Preston:They're both systems that help increase the uptime of the server by notifying
W. Curtis Preston:you of potential failures or issues.
W. Curtis Preston:Back to your podcast.
Gary Williams:One of the things that we still do today, and this is
Gary Williams:probably me being paranoid coming from that environment, we didn't
Gary Williams:get alerts on a server if a disk failed, because it didn't really know.
Gary Williams:The iLOs and iDRACs were way too expensive for us to have at that point.
Gary Williams:So daily server room checks: go around.
Gary Williams:Are there any flashing lights that shouldn't be flashing?
Gary Williams:And we still do that in our data centers today.
Gary Williams:And we still do that with some of our machines.
Gary Williams:We've actually got this philosophy in place now where if a machine is up for
Gary Williams:more than 30 days, it needs to be rebooted because we don't know if it's reboot safe.
Gary Williams:So we're starting to put uptime alarms in.
Gary Williams:Certainly on Windows.
Gary Williams:Linux is a bit different, but with Windows, when it hits a 30-day point.
Gary Williams:If we get an uptime alarm, it means that there's possibly
Gary Williams:a patching issue with that.
Gary Williams:We should get an alarm from the patching system as well.
Gary Williams:So we go off and we check.
Gary Williams:but the other thing we do something similar with Linux as well.
Gary Williams:We're trying to get all the Linux servers rebooted because generally
Gary Williams:with those, we can patch them hot, but we still want to get them rebooted.
Gary Williams:Are they reboot safe?
Gary Williams:Because if we do lose power or a machine crashes, it's great having all that stuff
Gary Williams:there, but if it doesn't reboot, we've got a problem. And we may have a backup, but if
Gary Williams:that backup has inherited that corruption or that problem, we're in a bad place.
Gary Williams:So we do try to make sure that we've got these servers rebooted on a fairly regular basis.
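The uptime alarm Gary describes can be sketched in a few lines. This is only an illustration, not his team's tooling: the 30-day threshold comes from the conversation, while the function name, the host names, and the idea of passing in boot times as a dictionary are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumption: 30 days is the threshold mentioned in the episode.
MAX_UPTIME = timedelta(days=30)


def uptime_alarms(boot_times, now, max_uptime=MAX_UPTIME):
    """Return hosts that have been up longer than max_uptime.

    boot_times maps a (hypothetical) host name to the datetime of its
    last boot. Hosts are returned with the longest uptime first, so the
    machines most overdue for a reboot test appear at the top.
    """
    overdue = {
        host: now - booted
        for host, booted in boot_times.items()
        if now - booted > max_uptime
    }
    return sorted(overdue, key=overdue.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = {
        "file-server": datetime(2024, 1, 1),
        "mail-server": datetime(2024, 2, 20),
    }
    for host in uptime_alarms(inventory, now=datetime(2024, 3, 1)):
        print(f"{host}: uptime over 30 days, schedule a reboot test")
```

How the boot times are collected (an agent, a monitoring system, a patching tool) is left open here, since the episode does not say.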
Prasanna Malaiyandi:That's actually very interesting.
Prasanna Malaiyandi:I never thought about that, about the fact that you need to reboot the systems
Prasanna Malaiyandi:and just make sure the hardware and OS and everything else could be...
Gary Williams:Absolutely.
Gary Williams:The other thing that we've done is we've actually turned uptime
Gary Williams:on its head now. In the old days,
Gary Williams:these uptime figures of two years, three years, would be put on the
Gary Williams:internet, and it's "look at my uptime."
Gary Williams:Now it's the other way around.
Gary Williams:It's like, yeah, there's an uptime of 45 days.
Gary Williams:Oh, look at my uptime.
Gary Williams:That's bad.
Gary Williams:We need to get this rebooted.
Gary Williams:And check it is reboot safe.
Gary Williams:Trying to find reboot windows sometimes is a bit difficult, even with all the
Gary Williams:resilience, to just take systems down.
Gary Williams:but we do have some sort of bargaining going on with various teams where
Gary Williams:we do try and reboot the systems at least once a month, just to make
Gary Williams:sure that they are reboot safe.
W. Curtis Preston:So help me understand that phrase.
W. Curtis Preston:What do you mean when you say reboot safe?
Gary Williams:So reboot safe is simply this: potentially a change can be
Gary Williams:made to a machine that means the machine, when it reboots, is going to crash, or
Gary Williams:there's going to be a problem where it can't complete the boot, a corrupted
Gary Williams:bootloader or something like that.
Gary Williams:We've seen issues in the past where a
Gary Williams:Microsoft update has corrupted the bootloader.
Gary Williams:So when you go to reboot, it doesn't restart properly.
Gary Williams:So we've actually got the term reboot safe, which just means I know
Gary Williams:if I have to reboot that server, I don't have to worry about it.
Gary Williams:It will come up.
Gary Williams:Your operating system will start, all the services that need to start will start,
Gary Williams:because we've had issues in the past where certain key services don't start.
Gary Williams:So we get a ticket.
Gary Williams:Can you please reboot this machine?
Gary Williams:Sure.
Gary Williams:Reboot it, you walk off.
Gary Williams:You think it's done, but the services don't start.
Gary Williams:Now, the alerting will alert on that.
Gary Williams:But in the meantime, you're potentially still down for a bit longer than you need to be.
Gary Williams:So we do these tests where we just want to make sure all the services
Gary Williams:that need to start actually start.
Gary Williams:And it comes up completely clean and working exactly how it should.
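A minimal sketch of the "reboot safe" check described here, assuming you already have some way to list the services running after the reboot. The function and the service names are hypothetical; the only idea taken from the conversation is comparing the services that need to start against the ones that actually did.

```python
def reboot_safe_report(required, running):
    """Compare the services that must be up after a reboot with those that are.

    required: names of services that need to start (hypothetical list,
              maintained alongside the server's documentation).
    running:  names of services observed running after the reboot.
    Returns a dict with a boolean verdict and the sorted missing services.
    """
    missing = sorted(set(required) - set(running))
    return {"reboot_safe": not missing, "missing": missing}


if __name__ == "__main__":
    # Illustrative service names only.
    required = ["mail-transport", "mail-store", "backup-agent"]
    running_after_reboot = ["mail-transport", "backup-agent", "time-sync"]
    report = reboot_safe_report(required, running_after_reboot)
    if report["reboot_safe"]:
        print("Server came up clean.")
    else:
        print("Not reboot safe, missing:", ", ".join(report["missing"]))
```

Running a check like this right after the reboot, rather than waiting for the alerting to notice, is what shortens the extra downtime Gary mentions.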
W. Curtis Preston:You're giving me... yeah.
W. Curtis Preston:And by the way, I agree with you on this idea of the occasional
W. Curtis Preston:reboots, and I agree that it's a practice that has gone by
W. Curtis Preston:the wayside for a lot of people.
W. Curtis Preston:And I can remember the first time I left my,
W. Curtis Preston:this is before I got the Mr. Backup
W. Curtis Preston:moniker, and I got a different moniker, and I'll explain it in a minute.
W. Curtis Preston:I was at a large oil and gas company and no one had administered the data
W. Curtis Preston:center, like a real sysadmin in years.
W. Curtis Preston:And so I was going in there and I was doing crazy things like
W. Curtis Preston:installing the latest patch set.
W. Curtis Preston:And these were Solaris systems, and it required a reboot
W. Curtis Preston:in order to, to install the patches.
W. Curtis Preston:And what was happening was I was like, 0 for 10, in terms
W. Curtis Preston:of I would install a patch.
W. Curtis Preston:I would reboot the server and it wouldn't come back.
W. Curtis Preston:And so I picked up the nickname Crash, because that's what I was,
W. Curtis Preston:I was literally, it's like the cure is worse than the disease.
W. Curtis Preston:So it's, we need to do this, but I was proactively
W. Curtis Preston:doing damage to the environment
W. Curtis Preston:by doing the things I was doing. What I did get really good at, though, is restoring
W. Curtis Preston:their environment, because it kept...
W. Curtis Preston:So it turned out the things that were really,
W. Curtis Preston:I don't know, in trouble were the disks themselves, because we actually
W. Curtis Preston:powered down the servers for some of them.
W. Curtis Preston:And that's when things really went awry because the disk
W. Curtis Preston:drives had never been turned off.
W. Curtis Preston:And then, yeah.
W. Curtis Preston:And then, they wouldn't come back on.
W. Curtis Preston:So I had to get all new disk drives and then, and then do the restore, but yeah.
Prasanna Malaiyandi:Yeah.
Gary Williams:Yeah, but even with the virtual machines, we still like to
Gary Williams:reboot them and to make sure all the services that should come up do come up.
Gary Williams:We've even in some cases taken that paranoia to the next level where we'll
Gary Williams:do a reboot test before we install a patch or before we do something,
Gary Williams:just to make sure that it's not, that patch that has caused a problem.
Gary Williams:Now, we generally don't do that for the Microsoft patches, but we do
Gary Williams:that for certain application patches.
Gary Williams:And it's almost a sanity check.
Gary Williams:Because that way, if there is a problem, we know that it is that patch
Gary Williams:that has caused a problem and not something lurking from beforehand.
Prasanna Malaiyandi:Going back to the article you wrote, Gary, one of the
Prasanna Malaiyandi:things I liked in it was you talked about this spreadsheet, if you will, that
Prasanna Malaiyandi:tracked the assets that were backed up, and you had a methodology that you
Prasanna Malaiyandi:called out in the article in terms of how long you would wait before something
Prasanna Malaiyandi:had to be tested, or the longest something could go without being tested.
Prasanna Malaiyandi:And there were certain things that were critical in your environment that sort
Prasanna Malaiyandi:of had to be done more periodically.
Gary Williams:Yeah.
Gary Williams:So what we did is.
Gary Williams:we had a spreadsheet, the list of all the backups anyway, and one of the
Gary Williams:things we tried to do was make sure that there was no clashing backups.
Gary Williams:So the exchange server would get backed up at say, 10:00 PM.
Gary Williams:The file server would get backed up at 11:00 PM.
Gary Williams:That kind of thing, because otherwise we found there was a
Gary Williams:lot of issues on the network and latency and all this kind of thing.
Gary Williams:So we wanted to stagger the backups as much as possible.
Gary Williams:But what we did was we actually added a column to that spreadsheet that said
Gary Williams:"restore last tested," "documentation last updated," that kind of thing.
Gary Williams:So that we knew when the backups were tested and we knew when that
Gary Williams:documentation was last updated.
Gary Williams:And what we did is, there was a formula in
Gary Williams:it that would color the cells.
Gary Williams:And if it was all green, everything's fine.
Gary Williams:We've done a recent test.
Gary Williams:I think recent was like six months, 12 months, something like that.
Gary Williams:And if anything was outside of that window, it would go red.
Gary Williams:So I think the exchange was every six months, the active
Gary Williams:directory was once a year.
Gary Williams:The file server was I think we would restore a folder or a file
Gary Williams:every month, something like that.
Gary Williams:and we did this quite a lot and we actually slowed down some
Gary Williams:of the tapes going off site for things like the file server.
Gary Williams:So we could do a backup, a couple of days later do a restore test,
Gary Williams:update the date in the documentation.
Gary Williams:We know that's good.
Gary Williams:Send the tape off-site and that's actually funny enough.
Gary Williams:That was a financial reason as well because of the cost
Gary Williams:of sending the tapes offsite.
Gary Williams:but yeah, we started to do that and we started to get quite good
Gary Williams:at being able to do these restores.
Gary Williams:We were even able to get some additional hardware, and we even
Gary Williams:started to do some tests where we were restoring to virtual machines.
Gary Williams:Because doing that process.
Gary Williams:We found we could get them up and running a lot quicker.
Gary Williams:We had a bit more room to breathe and we could have a
Gary Williams:much better virtual environment.
Gary Williams:And then we've got into some other really clever stuff where we had a physical
Gary Williams:domain controller and a virtual domain controller, and we tested fail-over
Gary Williams:and all this, we got really advanced
W. Curtis Preston:So you're saying that, the green column was
W. Curtis Preston:actually the color of that column was automatically determined by
Gary Williams:the age of the last test.
W. Curtis Preston:That's pretty cool
Prasanna Malaiyandi:Conditional formatting, Curtis in Excel.
Gary Williams:that's it.
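The same red/green rule that the spreadsheet applied with conditional formatting can be written out as a small function. This is a sketch only: the test windows below approximate the intervals Gary recalls (and he hedges them himself), and the system names and function name are made up for the example.

```python
from datetime import date, timedelta

# Assumption: windows approximating "every six months", "once a year",
# and "every month" as recalled in the episode.
TEST_WINDOWS = {
    "exchange": timedelta(days=183),
    "active_directory": timedelta(days=365),
    "file_server": timedelta(days=31),
}


def restore_test_colour(system, last_tested, today):
    """Return "green" if the last restore test is inside the window, else "red"."""
    window = TEST_WINDOWS[system]
    return "green" if today - last_tested <= window else "red"


if __name__ == "__main__":
    # Hypothetical "restore last tested" dates, one row per system.
    today = date(2024, 6, 1)
    rows = {
        "exchange": date(2024, 1, 15),
        "active_directory": date(2023, 5, 1),
        "file_server": date(2024, 4, 1),
    }
    for system, last_tested in rows.items():
        print(system, last_tested, restore_test_colour(system, last_tested, today))
```

Keeping the window per system, rather than one global threshold, mirrors the point that critical systems were tested on different schedules.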
W. Curtis Preston:You're probably better at Excel than I am,
W. Curtis Preston:but, Gary, this has been great.
W. Curtis Preston:I, I love this story.
W. Curtis Preston:I love that it, like the other story we had, where, I don't know if you
W. Curtis Preston:listen to the podcast at all, Gary, but we had an episode where someone, they
W. Curtis Preston:tested their backups by essentially deleting their entire data center.
Prasanna Malaiyandi:Paul van Dyke, episode 135.
Gary Williams:Wow.
Gary Williams:I haven't heard that one.
Gary Williams:I have heard some of the others and I have to say I'm a fan.
W. Curtis Preston:And it was that one that would just, it
W. Curtis Preston:hurt to, to listen to his story.
W. Curtis Preston:And it was, he agrees that it was a really dumb idea.
W. Curtis Preston:It did eventually work out, but it took him a while.
Gary Williams:I can imagine.
Gary Williams:I just remember the pain of the Exchange server, and whilst I've not had
Gary Williams:a repeat of that pain since, because
Gary Williams:The software is better these days, the restores are a lot quicker and you do
Gary Williams:have a lot more options to play with.
Gary Williams:we still have that pain from time to time when trying to do certain restores
Gary Williams:and testing the environment out.
Gary Williams:So I am still not that brave to do something like that, but, yeah,
Gary Williams:I think we're getting there and.
W. Curtis Preston:Brave might not be the word I would use, but...
Gary Williams:Now we have talks about bringing in things like the chaos
Gary Williams:monkey and taking down things, but yeah, that's a test for another day.
W. Curtis Preston:Yeah.
W. Curtis Preston:thanks Prasanna for your usual great questions
Prasanna Malaiyandi:Always and nice chatting with you, Gary.
Prasanna Malaiyandi:That was fun.
Gary Williams:Thank you.
W. Curtis Preston:and, thanks to the listeners again.
W. Curtis Preston:You're why we're here.
W. Curtis Preston:You're why we sit here and talk to us.
Prasanna Malaiyandi:Curtis.
Prasanna Malaiyandi:And we'll talk to each other anyway.
Prasanna Malaiyandi:It doesn't matter.
W. Curtis Preston:Yeah.
W. Curtis Preston:Yeah, exactly.
W. Curtis Preston:We'll probably be talking about table saws or video editing tools,
W. Curtis Preston:but, anyway, remember to subscribe so that you can restore it all.