There are dozens of things that people do to protect their data
Speaker:from loss, but many of them are worthless when you actually need them.
Speaker:In this episode, we'll learn what turns a copy into a backup so that
Speaker:you can make sure that anything you think is a backup actually is one.
Speaker:We'll also talk about some important backup concepts like
Speaker:multiplexing, incremental backups, block-level incremental backups,
Speaker:and source-side deduplication.
Speaker:Hi, I'm W. Curtis Preston, AKA Mr. Backup.
Speaker:And I've been specializing in backup and disaster recovery for over 30 years.
Speaker:My podcast turns unappreciated backup admins into cyber recovery heroes.
Speaker:This is The Backup Wrap-Up.
Speaker:Hi and welcome to the show.
Speaker:And I have with me a guy who I think is super jelly of the new
Speaker:toy that I put in yesterday.
Speaker:Prasanna Malaiyandi.
Speaker:How's it going, Prasanna?
Speaker:I'm good, Curtis, and yes, I am jealous of your toy, but I will have to start
Speaker:sending you a bill for consulting fees when, or if,
Speaker:you inevitably run into issues.
Speaker:Huh?
Speaker:Yeah, because as you recall, I purchased this without even talking
Speaker:to you, which is rather atypical of me,
Speaker:because we talk about so much. And basically what we're talking about
Speaker:is a Firewalla, which I've had my eye on for a while. And then,
Speaker:so I've switched internet service providers, and now they're telling me
Speaker:that I'm hitting the bandwidth limit already, which is highly possible
Speaker:given what I do. I realized that I had no bandwidth monitoring tools.
Speaker:I have this really nice, mesh router system.
Speaker:The Wi-Fi mesh, but that's been put in access
Speaker:point mode, which of course offers no bandwidth monitoring.
Speaker:And then I had this Cox router, which offered me nothing.
Speaker:And so I replaced the Cox router with the Firewalla Purple SE, and man.
Speaker:Super simple to put in there.
Speaker:And now I have these, like, super stats.
Speaker:And at some point I'm going to have to disable
Speaker:the notifications, because it's like, Curtis is playing games on his phone.
Speaker:Curtis is watching YouTube videos on MacBook Pro A, right?
Speaker:It's literally like, Curtis has downloaded 3.56 gigabytes of video on his, and I'm like, okay, this
Speaker:is going to get old pretty quickly.
Speaker:I have two questions for you.
Speaker:Yeah,
Speaker:The first question is, did you figure out what was consuming
Speaker:your data cap, or data usage cap?
Speaker:Not yet.
Speaker:'Cause it hasn't even been 24 hours, but I do have a pretty good guess
Speaker:and I think it's right in front of me.
Speaker:We'll see.
Speaker:And again, I didn't want to talk about this too much, but what is weird is I
Speaker:get some of these notifications, like, MacBook Pro uploaded 3.5 megabytes of data to LinkedIn at 3:45 a.m.
Speaker:And I'm like, what? Why is my laptop uploading three and a half
Speaker:megabytes of anything while it's just sitting here and I'm sleeping somewhere?
Speaker:That's weird.
Speaker:Weird.
Speaker:Yeah.
Speaker:So anyway, so yeah, go ahead.
Speaker:Yeah.
Speaker:Since you've decided to get rid of the Cox router, can you not just
Speaker:use your Wi-Fi mesh as a router?
Speaker:I could have, but then I would have to completely redo my
Speaker:network architecture, which as you recall, was a really big thing.
Speaker:That's true.
Speaker:And I actually really liked the firewall features of this.
Speaker:That's what really drew me to it.
Speaker:And I'd been thinking about it, and this was that final excuse to get it.
Speaker:And I'm really enjoying the security aspects.
Speaker:Glad.
Speaker:So there, for people: if Mr. Backup can learn networking and firewalls, so can you.
Speaker:Yeah.
Speaker:It's certainly not my forte.
Speaker:But it's time for the news of the week.
Speaker:The big news, I think, of the entire IT world, everybody seems to be
Speaker:talking about it, is this MGM hack.
Speaker:I can't... can you imagine?
Speaker:If you've been living in a hole: they shut down MGM and Caesars and all
Speaker:of the hotels attached to MGM and Caesars, which is like half the Strip.
Speaker:And they shut down like card keys, slot machines,
Speaker:ATMs.
Speaker:everything.
Speaker:ATMs,
Speaker:So for our listeners who may not know about this,
Speaker:MGM is a hotel chain, right?
Speaker:They have a bunch of things as well, right?
Speaker:Various hotels like Caesars and MGM, and they are in Las Vegas,
Speaker:and they're casinos, right?
Speaker:So you can stay there, you can gamble there, right?
Speaker:They make lots and lots and lots and lots of money.
Speaker:But not in the last week or so.
Speaker:And they got hit by a cyber attack the week of September
Speaker:20th or so, I'm guessing.
Speaker:Yeah, and I think that's the saddest part about this. And by the way, as of
Speaker:today there have been half a dozen lawsuits filed, because there's the threat of a
Speaker:PII leak, a personal information leak.
Speaker:And so there's been all sorts of worries about that.
Speaker:So there's a half a dozen, what do you call it?
Speaker:Class action lawsuits, or lawsuits that are attempting
Speaker:to achieve class action status, that have been filed.
Speaker:I think the saddest part here, and we'll put the
Speaker:link to this particular article in the show description, is the
Speaker:heading here: targeting layer eight.
Speaker:I've heard of the seven-layer networking model, again, with my extensive
Speaker:networking experience: the OSI model.
Speaker:What is Layer 8,
Speaker:People?
Speaker:it's people.
Speaker:Yes.
Speaker:So layer 8 is people, which is probably the weakest part
Speaker:of the entire stack, I'd say.
Speaker:Yeah.
Speaker:You are the weakest link!
Speaker:Yeah, so what, how did they get in?
Speaker:So they basically targeted an employee, right,
Speaker:who had the right level of access, and they basically were able to gain access to
Speaker:their Okta environment as a super admin.
Speaker:But how did they do that?
Speaker:That's the
Speaker:Oh, so how did they do that?
Speaker:They basically tricked their IT help desk.
Speaker:That is so bad, right?
Speaker:They somehow got
Speaker:access to a privileged account, right?
Speaker:According to the powers that be, they stole a password or they
Speaker:hacked Active Directory somehow.
Speaker:So they were able to attempt to log in, but they were stopped by MFA,
Speaker:which is a good thing, Okta. But then
Speaker:they were able to convince the help desk that they were the person in
Speaker:question and get them to reset MFA.
Speaker:Now here's a question.
Speaker:Do you think that employee is still there at the company?
Speaker:And
Speaker:is one of those
Speaker:you be blaming the person
Speaker:So I'm going to fast-forward like 30 years.
Speaker:Okay.
Speaker:So 20, what would that be?
Speaker:2053.
Speaker:There's a guy, he's going to be called Mr. MFA, and he's going to have a podcast dedicated to security, because my
Speaker:career started with a screw-up like this.
Speaker:Not quite this magnitude, but my career started with this.
Speaker:And so, my personal opinion: I don't know if this person has been fired.
Speaker:I think they should only be fired if they didn't follow the processes that had been
Speaker:established and
Speaker:set out for them.
Speaker:Potentially they should be disciplined.
Speaker:I don't know if termination is appropriate; they should be disciplined.
Speaker:If they followed the procedures that had been laid out for
Speaker:them, process, people, right?
Speaker:Then technology.
Speaker:We just had a podcast about that.
Speaker:If they followed the procedures
Speaker:that have been given to them,
Speaker:then I think some massive leniency. Then you update your procedures,
Speaker:et cetera, et cetera, et cetera.
Speaker:I think of a massive outage that was caused at a major software
Speaker:vendor that I worked with.
Speaker:I'm trying to be very cagey here. The backup
Speaker:operator followed his procedure. They had two parts of the app
Speaker:that had to be shut down in order to do a backup, because they couldn't
Speaker:synchronize the two backup systems.
Speaker:And so every two weeks they would shut down these apps
Speaker:and then do a backup offline.
Speaker:And this person,
Speaker:the backup operator, just did what they were told to do and shut down these
Speaker:apps at the most critical time of the year, when the apps were needed, right?
Speaker:The person was just doing their job.
Speaker:That person should not be fired.
Speaker:You know, you change the procedure.
Speaker:So I don't know what happened here.
Speaker:Yeah.
Speaker:You train.
Speaker:Yeah.
Speaker:I hope some leniency was there.
Speaker:If the person was fired, I'd love to have them on the podcast.
Speaker:But anyway, what can we learn from this news,
Speaker:Prasanna?
Speaker:Basically that, one, even if you have the greatest technologies in
Speaker:place and the greatest processes in place, people will always exist.
Speaker:Never underestimate the power of people to do dumb things.
Speaker:I do think that perhaps what's in order here is an update to process.
Speaker:And the process should be, because you have to be able to reset MFA:
Speaker:when resetting MFA, it should require many more bells and whistles
Speaker:and levels of authentication.
Speaker:And we need to identify that this person who
Speaker:calls in and says that they're Steve, we need a way to verify
Speaker:that Steve is actually Steve.
Speaker:So you create a process around that, that really verifies that someone is who they say they are,
Speaker:and especially,
Speaker:when you reset MFA,
Speaker:and especially when it's someone with that level of privilege.
Speaker:Especially, super especially. That's not a word, but yeah.
Speaker:Oh, I feel for these guys.
Speaker:Keep abreast of this story, because it is going to get worse before it gets better.
Speaker:And that's the news for this week.
Speaker:So what I thought we would talk about this week in the Backup to Basics series is
Speaker:what I've got defined as backup methods that support a traditional restore.
Speaker:So basically the backup methods that I grew up with that are still,
Speaker:in
Speaker:Relevant.
Speaker:Yeah.
Speaker:in, yeah, right?
Speaker:we like to live in a world where everybody's using the
Speaker:latest and greatest, right?
Speaker:And nobody's doing this old, full and incremental backups and stuff.
Speaker:Nobody's doing that.
Speaker:And that's just not right.
Speaker:So we need to talk about these, these methods and see what we can get out there.
Speaker:The first thing, again, we're doing this based
Speaker:on my book, Modern Data Protection. There's a cover for those of you
Speaker:watching via video, all three listeners that are watching via video.
Speaker:I think it's 10.
Speaker:There's maybe 10.
Speaker:the number's actually gone up since we've been putting them on YouTube.
Speaker:Oh, there you go.
Speaker:the, I've got this thing in here.
Speaker:So this is from chapter nine, talking about backup and
Speaker:recovery software methods.
Speaker:And the first thing I had in there was: is everything a backup?
Speaker:So there was a time when backup was well defined.
Speaker:Backup was copy something to tape and then put that tape in a box,
Speaker:right?
Speaker:It was so simple back then.
Speaker:Yeah, it was so simple back then.
Speaker:Yes.
Speaker:So I, as, quote, Mr. Backup, see backup a lot more broadly
Speaker:than I think a lot of people do.
Speaker:A lot of people, when they say backup, they go, oh, this isn't backup,
Speaker:this is... To me, backup is really anything that protects the data the
Speaker:way backup protects data, right?
Speaker:And so I'm defining backup rather broadly as anything that is a copy of data
Speaker:stored separately from the original
Speaker:that can be used to restore the original if it is damaged.
Speaker:There's a lot of things that qualify as backup under that definition,
Speaker:so let me just give you some examples and see if you think they qualify.
Speaker:Okay.
Speaker:So take a copy on tape,
Speaker:Yes.
Speaker:a copy in AWS S3.
Speaker:A copy of the data that's in S3, which is separate from
Speaker:The
Speaker:your not.
Speaker:Yeah.
Speaker:yes.
Speaker:okay?
Speaker:a copy replicated from one storage system to another storage
Speaker:system from the same vendor.
Speaker:as long as...
Speaker:there's a caveat here, because you used the word replication.
Speaker:I need the ability, is it replicated in such a way that if
Speaker:I damage production, so
Speaker:that doesn't qualify as being stored separately.
Speaker:Replicated with separate retention of the copies on the destination.
Speaker:Okay.
Speaker:Yes, I would call that a
Speaker:Okay, snapshots on a production system, on a production storage
Speaker:array that does not include AWS S3,
Speaker:thank you for, yeah, so snapshots on the same array.
Speaker:No. End of story. Not a backup until it's copied somewhere else.
Speaker:Okay, and then doing what you were recently doing when
Speaker:editing the podcast, right?
Speaker:Downloading a copy from the cloud onto your local system, copying it
Speaker:to a different directory, and then copying it to yet a third directory.
Speaker:On your local system.
Speaker:Is that local system considered backups, each of those copies?
Speaker:again, we're storing the data in a separate place that
Speaker:has a separate risk profile.
Speaker:Etc.
Speaker:yes,
Speaker:As long as the copy, the original
Speaker:copy was in the, cloud.
Speaker:It's also about the purpose of why I'm doing it, right?
Speaker:If the purpose of downloading that is to serve as possibly a backup, right?
Speaker:Because there's a lot of times that we download data that
Speaker:is not for backup purposes.
Speaker:Now, it could accidentally become a backup if it's the only
Speaker:copy that you have available.
Speaker:But just because I copy something doesn't necessarily make it a backup.
Speaker:It might be an archive.
Speaker:And then the last example.
Speaker:taking pictures on your iPhone and using iCloud to sync
Speaker:your copies to iCloud photos.
Speaker:Not a backup.
Speaker:Because,
Speaker:is that?
Speaker:for two reasons.
Speaker:One, which is really the primary.
Speaker:And that is specifically in terms of Apple iCloud.
Speaker:But the biggest thing is that it's synchronized.
Speaker:that's the key.
Speaker:That's what you asked earlier: you delete a picture on your phone, or some
Speaker:app, like ransomware, deletes a bunch of pictures on your phone.
Speaker:It synchronizes that deletion up in the cloud and they go bye-bye, right?
Speaker:It is a synchronized copy, not a backup.
Speaker:it is stored separately, but if you delete it here and it gets deleted
Speaker:there, that's not a backup, right?
Speaker:Just like we were talking, before.
Speaker:And that's one really important reason, possibly the most important reason.
Speaker:But the other is that there's a feature in iPhone that...
Speaker:It says we can store low res copies on the phone and the high res copies in
Speaker:the cloud, which means that not only is it a synchronized copy, the only true
Speaker:copy of your photo is in the cloud.
Speaker:It's only one copy, which means you need to be backing up iCloud.
Speaker:And by extension, Google Photos too, if you're an Android
Speaker:person. So yeah, not a backup.
Speaker:okay, no,
Speaker:which we had a whole podcast episode about that.
Speaker:How to properly back up your iCloud account.
Speaker:yeah,
Speaker:were good examples.
Speaker:I think those are a lot of things, like you said, right?
Speaker:It's not always easy to say, is it a backup or not?
Speaker:Unless you dive into the next level of questions and ask, okay, is it really a
Speaker:yeah,
Speaker:Does it meet these
Speaker:I think.
Speaker:or not?
Speaker:I think you did a good job with the different categories, like that
Speaker:thing of, if it's fully synchronized, whether synchronous or asynchronous,
Speaker:and if I delete the production and
Speaker:it deletes the copy,
Speaker:that's not a backup,
Speaker:right?
Speaker:Unless that copy has the ability to undo that.
Speaker:If it does, then, I would change my answer, right?
Speaker:And so like a NetApp synchronized filer, I would consider that
Speaker:other copy, that would be backup,
Speaker:Other things that are not a backup, one that you didn't mention, would be
Speaker:the recycle bin in your Microsoft 365.
Speaker:That is not a backup, right?
Speaker:It's not stored separately.
Speaker:It's just records in a database that have been flagged as deleted.
Speaker:They haven't gone anywhere.
Speaker:They're sitting right next to the production data.
Speaker:So yeah,
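The working definition and the examples above boil down to a couple of yes/no questions. Here is a minimal Python sketch of that rule of thumb; the `Copy` class, its fields, and the way each example is described are hypothetical simplifications of the cases discussed, not a formal standard. It deliberately ignores the separate question of purpose (backup versus archive).

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """Hypothetical description of a copy of some data."""
    name: str
    stored_separately: bool   # lives outside the production system or array
    deletes_propagate: bool   # deleting the original also deletes this copy
    own_retention: bool       # keeps older versions independently of production


def is_backup(candidate: Copy) -> bool:
    """Rule of thumb from the discussion: a copy is a backup only if it is
    stored separately AND survives damage to (or deletion of) the original."""
    if not candidate.stored_separately:
        return False
    # A purely synchronized copy mirrors deletions, unless it keeps its own
    # versions/retention that let you undo them.
    if candidate.deletes_propagate and not candidate.own_retention:
        return False
    return True


examples = [
    Copy("copy on tape", True, False, True),
    Copy("copy in AWS S3", True, False, True),
    Copy("replica with separate retention", True, True, True),
    Copy("snapshot on the same array", False, False, True),
    Copy("iCloud Photos sync", True, True, False),
    Copy("Microsoft 365 recycle bin", False, False, False),
]

for c in examples:
    print(f"{c.name}: {'backup' if is_backup(c) else 'not a backup'}")
```

Turning on `own_retention` for a synchronized copy flips the answer, which mirrors the "unless that copy has the ability to undo that" caveat.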
Speaker:Okay.
Speaker:And then the other one is,
Speaker:So in your opinion, does backup require you to always be able to go
Speaker:back to a point in time that could plausibly have existed in the system?
Speaker:And the reason I'm asking this is if I look at, I know email archiving comes
Speaker:up a lot and sometimes people are like, oh, that's the same as backup.
Speaker:But with email archive, you're just getting all the data that's there,
Speaker:whether or not your mailbox actually looked like that, your inbox looked
Speaker:like that or not at any point in time.
Speaker:Yeah.
Speaker:So backup.
Speaker:requires restore, right?
Speaker:For it to be a backup, you need to be able to restore it to the way it
Speaker:looked at some point in time, right?
Speaker:yeah, that's a really good question, Prasanna.
Speaker:It's one thing to say a file, but if you cannot bring
Speaker:the thing that's been damaged back to before it
Speaker:was damaged, so that it comes back the same way it was before it was
Speaker:damaged, then you don't have a backup.
Speaker:You have a copy of the data, right?
Speaker:And an email archive is a perfect example of that.
Speaker:You have a copy of the data, but it was stored for a different purpose.
Speaker:It was stored for archive, which means it wasn't designed to be put back into the,
Speaker:the state it was in,
Speaker:yeah, the state that it was in,
Speaker:right?
Speaker:so you might be able to restore all the email, but you won't be able to
Speaker:restore folders and things like that.
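The difference between a point-in-time backup and an archive can be shown with a toy mailbox. This is a minimal sketch under simplifying assumptions: the mailbox is a hypothetical mapping of folder names to message IDs, the "backup" is a full copy of that structure at a moment in time, and the "archive" keeps every message it has ever seen without any folder information.

```python
from copy import deepcopy

# Hypothetical mailbox: folder name -> set of message IDs.
mailbox = {"Inbox": {"m1", "m2"}, "Projects": {"m3"}}
backups = {}     # point in time -> copy of the whole mailbox structure
archive = set()  # archive-style copy: every message ever seen, no folder info


def take_backup(when):
    """Capture the mailbox exactly as it looks right now, folders included."""
    backups[when] = deepcopy(mailbox)


def archive_messages():
    """Capture message IDs only, ignoring where each message was filed."""
    for messages in mailbox.values():
        archive.update(messages)


take_backup("Mon")
archive_messages()

# Tuesday: the user moves m2 into Projects and deletes m1.
mailbox["Inbox"].discard("m2")
mailbox["Projects"].add("m2")
mailbox["Inbox"].discard("m1")
take_backup("Tue")
archive_messages()

# The backup can put Monday's mailbox back exactly as it looked.
restored_monday = deepcopy(backups["Mon"])
print(restored_monday)
# The archive still has every message, but not which folder it was in, or when.
print(sorted(archive))
```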
Speaker:A good backup should bring the thing back to the way it was before it was damaged.
Speaker:However, let's go back to a time when tape drives started getting... so here,
Speaker:we're going to talk about a feature that is now, for many people, passé, right?
Speaker:It's not really necessary, because they no longer use tape as their primary target
Speaker:or their initial target for backups.
Speaker:and that is this concept of multiplexing.
Speaker:And it goes back to, there was a time when we
Speaker:Way back in the days.
Speaker:right back in the day.
Speaker:So multiplexing, do you want to define multiplexing or explain it?
Speaker:Yeah, multiplexing.
Speaker:Yeah, let me attempt to. I wasn't aware of this before we started
Speaker:doing the podcast and you explained everything about tape, and I know we've
Speaker:had a bunch of tape experts on the podcast as well, but multiplexing is...
Speaker:to solve an issue where tape requires you to write at a certain speed.
Speaker:If you don't, it's bad.
Speaker:And tapes got faster and faster, but pumping data into the tape
Speaker:device itself wasn't keeping up with how quickly tape speeds were increasing.
Speaker:And so in order to solve that, what they decided to do was say, okay, let's have
Speaker:multiple clients feed data into the tape device at the same time, and we will
Speaker:multiplex, basically write all those streams to the tape drive at the same
Speaker:time, keeping the tape device happy
Speaker:while still being able to do all the backups.
Speaker:Yeah, another word for it would be interleaving.
Speaker:You did great.
Speaker:Basically, chopping them up into pieces and then
Speaker:putting them together, turning a bunch of streams into one stream.
Speaker:And when we first started, we used multiplexing settings of four,
Speaker:which means four different
Speaker:Yeah, four different clients being combined into a stream to
Speaker:make a tape drive happy. But tape drives got faster and faster,
Speaker:and the clients didn't get faster.
Speaker:And so by the time I used my last tape drive in
Speaker:production, we were up to 36, right?
Speaker:We were up to 36 streams together to make an individual tape drive happy.
Speaker:And the reason,
Speaker:I was gonna ask why.
Speaker:Yeah.
Speaker:Why were clients not fast enough
Speaker:yeah.
Speaker:So the reason that this was bad is: what's the only reason we back up?
Speaker:To restore,
Speaker:right?
Speaker:So when you
Speaker:go to
Speaker:do a restore,
Speaker:Yeah.
Speaker:yeah.
Speaker:When you go to do a restore, you have to read all 36 streams
Speaker:and throw 35 of them away.
Speaker:So the speed of your restore is going to be 1/36th
Speaker:of what it could potentially be if it hadn't been multiplexed.
Speaker:But if you're never doing restore tests, it doesn't really matter.
Speaker:Until you actually need to restore the data.
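The interleaving described above, and why it hurt restores, can be modeled in a few lines. This is a toy sketch assuming fixed-size blocks and a simple round-robin order; real backup products used their own on-tape formats. With 36 streams multiplexed together, restoring one client means reading every block and keeping only 1/36 of them.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several client streams round-robin into one
    tape stream, tagging each block with the client it came from."""
    clients = list(streams)
    tape = []
    for round_of_blocks in zip_longest(*(streams[c] for c in clients)):
        for client, block in zip(clients, round_of_blocks):
            if block is not None:  # a shorter stream has run out
                tape.append((client, block))
    return tape


def restore(tape, client):
    """Restoring one client means reading the entire multiplexed stream and
    throwing away every block that belongs to another client."""
    wanted = [block for owner, block in tape if owner == client]
    useful_fraction = len(wanted) / len(tape)
    return wanted, useful_fraction


# 36 hypothetical clients, 100 blocks each, multiplexed onto one "tape".
streams = {f"client{i:02d}": [b"x"] * 100 for i in range(36)}
tape = multiplex(streams)
data, useful = restore(tape, "client07")
print(len(tape), len(data), f"{useful:.4f}")
```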
Speaker:Yeah, you're killing me, you're killing me. Yeah, so it was one
Speaker:of these things where it was cutting off your nose to spite your face, right?
Speaker:So we felt that it was a necessary evil.
Speaker:You could only restore if you got backups done, and we could only get
Speaker:backups done reliably if we were using multiplexing, but we knew that it was
Speaker:creating this problem, and ultimately this was the undoing of tape from
Speaker:a backup and recovery perspective.
Speaker:We switched to disk staging and
Speaker:these other things to undo this necessary evil.
Speaker:But it was a mess.
Speaker:But that's what multiplexing is.
Speaker:So if you've heard about multiplexing: you don't need to do multiplexing
Speaker:if you're backing up to disk,
Speaker:because disk can write at whatever speed you tell it to write at.
Speaker:And it can write a bunch of things at the same time.
Speaker:You can give it 36 streams and it can write them all at the same time in
Speaker:separate places on the disk, in such a way that when you go to do a restore,
Speaker:you don't have to read all of them to read one of them.
Speaker:What?
Speaker:That was my yes.
Speaker:disk is fast enough, but
Speaker:Yeah.
Speaker:Well, it's not...
Speaker:a disk
Speaker:drive has a certain
Speaker:number of IOPS it can handle.
Speaker:And therefore, as long as your system is big enough
Speaker:to handle all of them in parallel.
Speaker:Yes.
Speaker:Disk drives don't have unlimited bandwidth, unlimited IO,
Speaker:et cetera, et cetera, et cetera.
Speaker:Yes.
Speaker:But the point is the way that it lays the data:
Speaker:you can lay the data however you want and then read it however you want.
Speaker:Again, there are limits to everything, depending on how
Speaker:much you fragment the data and all that kind of stuff, right?
Speaker:But it's still way better than tape from that perspective.
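For contrast, here is a minimal sketch of the disk case, assuming each client's stream is written to its own file in a temporary directory (a stand-in for however a real disk target lays data out). All 36 streams are written concurrently, yet restoring one client reads only that client's data.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def backup_stream(directory, client, blocks):
    """Write one client's stream to its own file, so streams never interleave."""
    path = os.path.join(directory, f"{client}.bak")
    with open(path, "wb") as f:
        for block in blocks:
            f.write(block)
    return path


def restore_stream(directory, client):
    """Read back only the requested client's file; no other stream is touched."""
    with open(os.path.join(directory, f"{client}.bak"), "rb") as f:
        return f.read()


# 36 hypothetical clients, 100 blocks of 4 KiB each.
streams = {f"client{i:02d}": [b"x" * 4096] * 100 for i in range(36)}

with tempfile.TemporaryDirectory() as d:
    # All 36 streams are written at the same time, as a disk target allows.
    with ThreadPoolExecutor(max_workers=36) as pool:
        list(pool.map(lambda item: backup_stream(d, *item), streams.items()))
    data = restore_stream(d, "client07")

print(len(data))
```

The IOPS and fragmentation limits mentioned above still apply to a real disk system; the sketch only shows that the layout does not force a restore to read other clients' data.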
Speaker:All right, next one's a whole lot easier.
Speaker:What comes next?
Speaker:What's the first type of, what's
Speaker:let you tackle
Speaker:what, no, I'll let you tackle this, Curtis.
Speaker:So what's the, what is it?
Speaker:The first type of backup that everyone should cut their teeth on.
Speaker:what a full backup?
Speaker:Is that
Speaker:Yeah.
Speaker:what you're saying?
Speaker:Yeah.
Speaker:so basically we're just going to talk about this concept of
Speaker:full and incremental backups.
Speaker:And probably everybody knows this, but this is the Backup to Basics series.
Speaker:So a full backup backs up everything; an incremental backup
Speaker:backs up things that have changed.
Speaker:And there are different types of incremental backups, right?
Speaker:And different people have different names for these different types, right?
Speaker:Terms you've probably heard: incremental, differential, cumulative incremental.
Speaker:For a lot of people, cumulative incremental and
Speaker:differential are the same thing.
Speaker:For people that got stuck in Windows land, not necessarily. So what's the
Speaker:difference between an incremental and these other two things?
Speaker:A cumulative incremental.
Speaker:So an incremental is basically, Typically, Sunday you do a full backup, right?
Speaker:Monday you need to do another backup.
Speaker:Now, you don't want to do necessarily the entire full backup again,
Speaker:because maybe that's too much data, you don't have enough time, etc.
Speaker:So you'll do an incremental, which is basically whatever has
Speaker:changed since the last full.
Speaker:So since Sunday.
Speaker:Sorry, since the last time you did a backup, I should say.
Speaker:exactly, whatever's changed since the last time you did a
Speaker:Yeah, so in that case, it was Sunday, so then Monday you get the incrementals,
Speaker:now Tuesday you're going to do backup, and so you do another incremental, which
Speaker:is whatever has changed since Monday,
Speaker:Exactly,
Speaker:and we just keep doing that, right?
Speaker:Yeah, and
Speaker:then if it's, yeah, if it's Sunday, right?
Speaker:And now it's Saturday, how many tapes do I need to do a restore?
Speaker:do you need...
Speaker:The previous Sunday, plus the Monday, plus the Tuesday, plus
Speaker:the Wednesday, Thursday, Friday.
Speaker:You basically need to replay
Speaker:By the way, I really channeled the old Curtis there.
Speaker:I did it without even meaning to. I said tapes, right?
Speaker:'Cause
Speaker:that was the problem back then.
Speaker:We literally had to grab seven tapes, right?
Speaker:Nowadays, we don't have to grab seven tapes, but,
Speaker:but you still have to do all those restores though, right?
Speaker:So even in the case where a file existed Sunday,
Speaker:was deleted Monday, and then came back on Tuesday, you would still end up having to restore all of that
Speaker:data. Basically you're replaying, like a log, all the data that would
Speaker:have existed on each of those days.
Speaker:right.
Speaker:The real problem is a file that was changed every single day.
Speaker:You would actually restore that file seven times.
Speaker:It's a lot of wasted effort.
Speaker:That's just the idea of a regular incremental.
Speaker:Then we have a differential, or cumulative incremental.
Speaker:And the difference is that it's going to do
Speaker:the thing that you said earlier, which is back up everything
Speaker:that's changed since the full.
Speaker:And so what some people did is they stopped doing
Speaker:incrementals and switched to differentials, or cumulative incrementals,
Speaker:every day, and that way at the end of the week, I would need at most two tapes.
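The "how many tapes do I need?" question can be answered mechanically. A minimal sketch, assuming a schedule that starts with a full and uses the definitions from this discussion: an incremental holds changes since the previous backup of any kind, and a differential (cumulative incremental) holds changes since the last full. `restore_chain` is a hypothetical helper, not part of any backup product.

```python
def restore_chain(schedule, restore_day):
    """Return the backups needed to restore the state as of `restore_day`.

    schedule: chronological list of (day, kind), where kind is "full",
    "incremental" (changes since the previous backup of any kind), or
    "differential" (changes since the last full).
    """
    days = [day for day, _ in schedule]
    upto = schedule[: days.index(restore_day) + 1]
    # Everything starts from the most recent full.
    last_full = max(i for i, (_, kind) in enumerate(upto) if kind == "full")
    chain = [upto[last_full][0]]
    later = upto[last_full + 1:]
    # The newest differential after that full supersedes everything before it.
    diffs = [i for i, (_, kind) in enumerate(later) if kind == "differential"]
    start = 0
    if diffs:
        chain.append(later[diffs[-1]][0])
        start = diffs[-1] + 1
    # Every incremental after that point must be replayed, in order.
    chain += [day for day, kind in later[start:] if kind == "incremental"]
    return chain


week = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
incrementals = [("Sun", "full")] + [(d, "incremental") for d in week[1:]]
differentials = [("Sun", "full")] + [(d, "differential") for d in week[1:]]

print(restore_chain(incrementals, "Sat"))
print(restore_chain(differentials, "Sat"))
```

With daily incrementals, a Saturday restore needs all seven backups (so a file that changed every day gets restored seven times); with daily differentials, it needs two.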
Speaker:Right now, this whole thing has pretty much gone away in the world of
Speaker:disk-based backups, right?
Speaker:Because the whole reason that we did backups this way... First off, let me back up.
Speaker:We used to do weekly fulls followed by daily incrementals.
Speaker:Then we switched, because when we went to automated tape libraries,
Speaker:the whole process of managing the different tapes wasn't as
Speaker:big of a deal.
Speaker:So we went to monthly fulls followed by daily incrementals, or maybe
Speaker:a weekly cumulative, right?
Speaker:So you'd still need a maximum of seven tapes to do a restore. But when
Speaker:we switched to disk, this whole thing just became kind of silly and moot,
Speaker:and you could back up however you wanted to back up. And dedupe,
Speaker:which we're going to talk about in a minute, dedupe really changed the game,
Speaker:because it didn't matter whether you backed up full or incremental or whatever;
Speaker:you still stored the same amount of data.
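A toy illustration of why dedupe made the full-versus-incremental question matter less. This sketch assumes fixed-size chunking and SHA-256 fingerprints; real products differ (many use variable-size chunking), and this is not any vendor's design. A second full backup of unchanged data adds no new chunks, and changing one byte adds only one chunk.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}

    def backup(self, data: bytes) -> list:
        """Store `data` and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            piece = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)  # only never-seen chunks take up space
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list) -> bytes:
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(piece) for piece in self.chunks.values())


store = ChunkStore()
dataset = os.urandom(CHUNK_SIZE * 100)      # 100 chunks of random data
first_full = store.backup(dataset)
after_first = store.stored_bytes()

second_full = store.backup(dataset)         # a repeat full of unchanged data
after_second = store.stored_bytes()

changed = bytearray(dataset)
changed[0] ^= 0xFF                          # modify one byte in the first chunk
third_full = store.backup(bytes(changed))   # only that chunk is new
after_third = store.stored_bytes()

print(after_first, after_second, after_third)
```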
Speaker:go
Speaker:Before we jump, though, one thing that I think people might also hear, in addition
Speaker:to fulls, incrementals, differentials, and cumulative incrementals, is levels.
Speaker:So maybe you could talk about levels.
Speaker:I know sometimes it's specific to, like, Oracle
Speaker:and some databases, but maybe it might
Speaker:no, that's a good point.
Speaker:Yeah, thanks.
Speaker:So the concept of a backup level, literally, this goes
Speaker:back to the days of dump, right?
Speaker:Which was the command to back up Unix file systems.
Speaker:A level zero was a full. If you wanted to do
Speaker:what we call incremental backups, you would do a zero
Speaker:followed by a one, followed by a two, followed by a three, followed by a four.
Speaker:And it got interesting, because if you then lowered the number,
Speaker:it would behave like a
Speaker:cumulative incremental, right?
Speaker:So you could do a zero and then a one.
Speaker:If you then did another one, if you kept doing ones, you would get a differential.
Speaker:You would get a cumulative incremental every day.
Speaker:If you did a 0, a 1, and then a 2, and then a 1 again, it
Speaker:basically always pointed back to the most recent dump whose
Speaker:number was lower than itself. And so it got complicated, and so
Speaker:there were actually some people that
Speaker:Is it they prefer
Speaker:called Towers of
Speaker:Hanoi. Yeah, which is based on the game, and I've got it in the book,
Speaker:the Towers of Hanoi progression, but I can't... it's like 0, 3, 2, 4. So
Speaker:basically, without doing cumulative incrementals,
Speaker:every file that was changed would end up being on two tapes, which was just
Speaker:an interesting way to minimize tape. Again, this is all because we were doing
Speaker:tapes, but nobody has tapes anymore.
Speaker:So nobody cares.
Speaker:But that's what levels were.
Speaker:It went all the way up to nine,
Speaker:and they still have this concept in, in things like Oracle Backup.
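The level rule described above (a dump at level N is relative to the most recent earlier dump with a lower level, which is how the classic Unix dump command defines incremental levels) can be expressed as a short function. For each dump in a sequence, it returns the index of the dump it is relative to; `dump_references` is a hypothetical helper for illustration, not part of any dump tool.

```python
def dump_references(levels):
    """For each dump level in `levels`, return the index of the earlier dump it
    is relative to: the most recent dump with a strictly LOWER level.
    A dump with no lower-level predecessor (e.g. level 0) is a full: None."""
    refs = []
    for i, level in enumerate(levels):
        ref = None
        for j in range(i - 1, -1, -1):  # walk backwards through earlier dumps
            if levels[j] < level:
                ref = j
                break
        refs.append(ref)
    return refs


# The three sequences discussed: plain incrementals, repeated 1s, and 0-1-2-1.
for sequence in ([0, 1, 2, 3, 4], [0, 1, 1, 1], [0, 1, 2, 1]):
    print(sequence, "->", dump_references(sequence))
```

Rising levels chain each dump to the one before it (incremental behavior), while repeating a level keeps pointing back to the same lower-level dump (cumulative behavior).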
Speaker:So the next thing to talk about is this concept called
Speaker:file-level incremental forever.
Speaker:And the company that really put this out there was IBM, with their product TSM.
Speaker:And back in the day,
Speaker:has been renamed,
Speaker:idea is you
Speaker:do one full, what's that?
Speaker:Hasn't it been renamed?
Speaker:It has, but I'm just saying they came out with it when they
Speaker:came out, it was called TSM.
Speaker:It's now like IBM spectrum protect, but, the idea was you do one full and then
Speaker:everything is an incremental forever.
Speaker:we never again do a full and this really saved a lot of
Speaker:bandwidth and saved a lot of tape.
Speaker:It came with a mess, though: over time, and again, this is tape,
Speaker:you could end up needing hundreds of tapes to restore a single file system.
Speaker:You would need just one file from this tape and one file from that tape.
Speaker:And the hardest part of tape is that it took something like
Speaker:two and a half minutes just to get a tape in, get it loaded, and seek
Speaker:to the average point on the tape.
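A quick worked calculation shows why that per-tape overhead hurt so much. This sketch uses the roughly two and a half minutes per tape quoted above and assumes a single drive loading tapes one after another; the tape counts are hypothetical.

```python
MOUNT_AND_SEEK_MINUTES = 2.5  # rough per-tape figure quoted in the episode


def tape_overhead_hours(tapes_needed: int, minutes_per_tape: float = MOUNT_AND_SEEK_MINUTES) -> float:
    """Hours spent only loading and positioning tapes, before any data is actually read.

    Assumes one drive and strictly sequential tape mounts.
    """
    return tapes_needed * minutes_per_tape / 60


# Hypothetical restores that touch 1, 50, and 200 tapes.
for tapes in (1, 50, 200):
    print(f"{tapes} tapes: {tape_overhead_hours(tapes):.1f} hours of mount-and-seek time")
```

At 200 tapes that is more than eight hours of pure overhead, which is why restoring a file system scattered across hundreds of tapes was so painful.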
Speaker:So I was not a fan of doing backups this way when we were talking about tape.
Speaker:Was there a reason?
Speaker:What was the use case at the time for that?
Speaker:It was about saving tape, saving storage.
Speaker:It was about saving bandwidth.
Speaker:There's nothing wrong with the idea of incremental forever.
Speaker:It's just their implementation back in the day, when it was all tape, even when they had disk staging.
Speaker:They would stage to disk. They wouldn't multiplex, by the way.
Speaker:They would stage to disk and then they would do the backups to tape.
Speaker:And this only applied to file system backups.
Speaker:It didn't apply to database backups.
Speaker:But literally, you would need hundreds and hundreds of tapes
Speaker:to restore a single file system.
Speaker:I was never a fan of doing backups that way,
Speaker:at least as long as we were backing up to tape. They had ways to deal with it: they had collocation
Speaker:and this thing called reclamation, because when you're doing backups that way,
Speaker:you end up with a lot of tapes holding files that have expired and are no longer needed,
Speaker:alongside other files that are still needed.
Speaker:And so you'd have to copy those forward
Speaker:Yeah.
Speaker:so that you could reclaim that whole tape and then reuse it.
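The reclamation idea can be sketched as follows: once enough of a tape's contents have expired, copy the still-needed files forward to another tape and free the old one for reuse. The `Tape` class, the 60% threshold, and the function names are hypothetical illustrations of the concept, not TSM's actual parameters or behavior.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: reclaim a tape once at least 60% of what was written to it has expired.
RECLAIM_THRESHOLD = 0.6


@dataclass
class Tape:
    label: str
    files: dict[str, int] = field(default_factory=dict)  # file name -> size in MB
    expired: set[str] = field(default_factory=set)       # names of files past retention

    def reclaimable_fraction(self) -> float:
        """Share of the data written to this tape that has expired."""
        written = sum(self.files.values())
        if written == 0:
            return 0.0
        dead = sum(size for name, size in self.files.items() if name in self.expired)
        return dead / written


def reclaim(tapes: list[Tape], new_tape: Tape, threshold: float = RECLAIM_THRESHOLD) -> list[str]:
    """Copy still-valid files forward onto new_tape; return labels of tapes freed for reuse."""
    freed = []
    for tape in tapes:
        if tape.reclaimable_fraction() >= threshold:
            for name, size in tape.files.items():
                if name not in tape.expired:
                    new_tape.files[name] = size  # copy forward the data that is still needed
            tape.files.clear()                   # the whole tape can now be rewritten
            tape.expired.clear()
            freed.append(tape.label)
    return freed


t1 = Tape("T001", {"a": 100, "b": 300}, {"b"})  # 75% expired
t2 = Tape("T002", {"c": 200, "d": 200}, {"c"})  # 50% expired
scratch = Tape("T003")
print(reclaim([t1, t2], scratch))  # ['T001']
print(scratch.files)               # {'a': 100}
```

On disk none of this is needed: deleting an expired backup simply frees its space.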
Speaker:And...
Speaker:That sounds like a management nightmare.
Speaker:An interesting engineering problem, but...
Speaker:Yeah, I was never a fan of doing backups that way,
Speaker:and I'm even less of a fan now that we don't have to worry about tape.
Speaker:Now we can just do incremental forever without all that
Speaker:collocation and reclamation stuff.
Speaker:Because on disk, to reclaim space, you just delete a file, right?
Speaker:On tape, if you delete a file in the middle of a tape,
Speaker:you have to reclaim the whole tape.
Speaker:So that's file-level incremental forever.
Speaker:And then came the advent of backing up to disk, which finally
Speaker:happened, I don't know, 20 years ago.
Speaker:It's so funny.
Speaker:We say "the advent" of something that happened 20 years ago.
Speaker:Once everybody finally went to disk... and by the way,
Speaker:not everybody is backing up to disk. There's still
Speaker:a small contingent of people who back up to tape, so those people will really
Speaker:enjoy the first half of this episode.
Speaker:Now we have this concept of block-level incremental forever.
Speaker:Would you like to explain that?
Speaker:Yeah. I know there are various places you can apply block-level incremental, but where I think of it
Speaker:is with virtual machines and other larger objects,
Speaker:where it doesn't make sense to back up an entire VM with full or incremental
Speaker:backups the way you would have done file-level backups, right?
Speaker:Why would I...
Speaker:Now, why would that be?
Speaker:Because I have a file which represents a disk, and the entire file
Speaker:doesn't change every time, right?
Speaker:Only parts of the file do.
Speaker:So we're talking about a VMDK file, or for Hyper-V, a VHDX. I can't say that.
Speaker:I think it's VHDX.
Speaker:Yeah,
Speaker:I think you're right.
Speaker:So you're saying if anything changes on there,
Speaker:you're backing up the entire thing?
Speaker:If you do an incremental, exactly.
Speaker:The entire file changed, right?
Speaker:So you're backing up the entire thing, but that doesn't make sense when
Speaker:you have files that are, say, 10, 50, 100, 200 gigabytes and you're backing
Speaker:them up every single time. So with block-level incrementals, what they
Speaker:basically do is say, okay, what blocks have changed in this VMDK?
Speaker:Let me just back those up, right?
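To make the difference concrete, here is a minimal Python sketch comparing what a file-level incremental sends (the whole file) with what a block-level incremental sends (only the changed blocks). It finds changed blocks by hashing both versions block by block, which is a simplification: in practice the list of changed blocks usually comes from the vendor (VMware, Oracle) rather than from rereading the whole file. The 4 KiB block size and function names are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size; real products use their own


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 of each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `new` that differ from, or did not exist in, `old`."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


# A tiny 10-block "disk image" where two bytes change between backups.
old = bytes(BLOCK_SIZE * 10)
new = bytearray(old)
new[3 * BLOCK_SIZE] = 1
new[7 * BLOCK_SIZE + 100] = 1

changed = changed_blocks(old, bytes(new))
print(changed)  # [3, 7]
print(f"file-level incremental: {len(new)} bytes, block-level: {len(changed) * BLOCK_SIZE} bytes")
```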
Speaker:Oracle does something similar for databases, right?
Speaker:It's, hey, let me only back up the blocks within an Oracle datafile
Speaker:that have changed rather than backing up the entire Oracle database.
Speaker:And how does the backup product know which blocks have changed?
Speaker:Usually you have to rely on that vendor to tell you.
Speaker:So in the case of Oracle, right?
Speaker:You're usually integrating with Oracle RMAN via SBT or some other
Speaker:mechanism where Oracle knows, okay, I keep track of the database blocks.
Speaker:I know which ones are new.
Speaker:Here is a list of blocks that you need to care about.
Speaker:Same thing with VMware. What is their SDK called?
Speaker:VADP.
Speaker:Yeah.
Speaker:They've changed the name.
Speaker:Yeah.
Speaker:They've changed the name, but basically they have an API to talk to,
Speaker:and they maintain a bitmap, right?
Speaker:And then they just give you a map of the blocks that you need to go get.
Speaker:These are the blocks that have changed.
Speaker:They maintain that, and then there's an API for asking for those blocks.
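The bitmap idea can be sketched as a toy virtual disk that flips one bit per block whenever that block is written, so a backup can ask for just those blocks and then reset the map. The `TrackedDisk` class is hypothetical and is not VMware's API; VMware's real version of this idea is called Changed Block Tracking (CBT) and works differently in detail.

```python
class TrackedDisk:
    """Hypothetical virtual disk that records which blocks were written since the last backup."""

    def __init__(self, num_blocks: int, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.changed = bytearray((num_blocks + 7) // 8)  # one bit per block

    def write(self, block_no: int, data: bytes) -> None:
        # Pad or truncate the data to exactly one block, then mark that block as changed.
        self.blocks[block_no] = data.ljust(self.block_size, b"\0")[: self.block_size]
        self.changed[block_no // 8] |= 1 << (block_no % 8)

    def changed_since_last_backup(self) -> list[int]:
        """Block numbers whose bit is set in the bitmap."""
        return [i for i in range(len(self.blocks))
                if self.changed[i // 8] & (1 << (i % 8))]

    def incremental_backup(self) -> dict[int, bytes]:
        """Read only the changed blocks, then reset the bitmap for the next cycle."""
        backup = {i: self.blocks[i] for i in self.changed_since_last_backup()}
        self.changed = bytearray(len(self.changed))
        return backup


disk = TrackedDisk(num_blocks=1000)
disk.write(42, b"hello")
disk.write(900, b"world")
disk.write(42, b"hello again")  # same block written twice: still one changed bit
print(sorted(disk.incremental_backup()))  # [42, 900]
print(disk.changed_since_last_backup())   # []
```

The backup never has to scan the whole disk: the bitmap already says which blocks to read.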
Speaker:Now, this is great for disk-based systems, because these are
Speaker:all random spots in a file, and so you can dump them out. Now it's up to you to figure
Speaker:out how you want to do this, and I know we'll talk a little bit later
Speaker:about deduplicated storage, but in the case of Oracle, typically you would just
Speaker:dump the incremental blocks into a file, and now you
Speaker:have all those blocks captured together.
Speaker:In the case of VMware, they started doing that.
Speaker:A lot of backup vendors would just dump it out as raw blocks, which makes sense.
Speaker:But there are other optimizations you can do to be smarter about
Speaker:it, because with incremental block-based backups, you still have to
Speaker:restore from multiple files in order to stitch together the final image.
Speaker:Yeah.
Speaker:And you still have the problem
Speaker:we talked about earlier, where you may restore an individual block multiple
Speaker:times if it changes multiple times, right?
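Here is a minimal sketch of that stitching problem, with each backup modeled as a hypothetical mapping from block number to block contents. Applying the full and then every incremental in order writes a block once per version, so a block that changed three times is restored three times. One possible optimization, shown second, walks the backups newest first and writes each block only once; this is an illustration of the idea, not any particular vendor's restore logic.

```python
def restore_oldest_first(full: dict[int, bytes],
                         incrementals: list[dict[int, bytes]]) -> tuple[dict[int, bytes], int]:
    """Lay down the full, then apply each incremental in order; count every block write."""
    image = dict(full)
    writes = len(full)
    for inc in incrementals:  # oldest to newest; later versions overwrite earlier ones
        image.update(inc)
        writes += len(inc)
    return image, writes


def restore_newest_first(full: dict[int, bytes],
                         incrementals: list[dict[int, bytes]]) -> tuple[dict[int, bytes], int]:
    """Walk newest to oldest and keep only the first (newest) version of each block."""
    image: dict[int, bytes] = {}
    for layer in [*reversed(incrementals), full]:
        for block, data in layer.items():
            image.setdefault(block, data)  # older versions of an already-restored block are skipped
    return image, len(image)


full = {0: b"A0", 1: b"B0", 2: b"C0"}
incs = [{1: b"B1"}, {1: b"B2", 2: b"C1"}, {1: b"B3"}]

image, writes = restore_oldest_first(full, incs)
print(image)   # {0: b'A0', 1: b'B3', 2: b'C1'}
print(writes)  # 7 block writes to produce a 3-block image

image2, writes2 = restore_newest_first(full, incs)
print(image2 == image, writes2)  # True 3
```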
Speaker:The advantage is it's incredibly efficient.
Speaker:And when we talk about backing up VMs, I agree with you.
Speaker:That's where this really shines.
Speaker:Because back in the day, if we backed up VMs and just pretended they were
Speaker:physical machines, running full and incremental backups on them,
Speaker:we were beating the crap out of these VMs.
Speaker:So this is much more I/O friendly to the VMs, right?
Speaker:That's why we want to talk to the VMware API and get just
Speaker:the blocks that have changed.
Speaker:And it doesn't really come with any major downside compared to
Speaker:the alternative, because we're storing the data on disk.
Speaker:Can I ask one question?
Speaker:Yeah, sure.
Speaker:So we've talked about using block level incrementals for VMware, for databases.
Speaker:Is there a reason it hasn't really caught on for files?
Speaker:Because if I take a file and kind of split it up into blocks, right?
Speaker:Could I get the same benefit?
Speaker:Or is there a reason that it makes a lot more sense for
Speaker:things like VMs?
Speaker:The benefit will be relative to the size of the file, right?
Speaker:The bigger the file, the bigger the benefit that you're going to get.
Speaker:And I would say that the reason it hasn't caught on is the next
Speaker:thing we're going to discuss, right?
Speaker:That solved that problem.
Speaker:But yeah, I think about files like PST files, or maybe a big Access
Speaker:database, or backing up MySQL.
Speaker:That's not a file.
Speaker:I mean, it is a file, but it's actually a database.
Speaker:Right.
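The point that the benefit grows with file size is simple arithmetic: if the same amount of data changes, a block-level incremental skips a larger share of a big file than of a small one. The sizes below (4 MB of changed blocks in an 8 MB document, a 10 GB PST file, and a 200 GB VM disk) are hypothetical.

```python
def block_level_savings(file_size_mb: float, changed_mb: float) -> float:
    """Fraction of bytes a block-level incremental avoids, versus recopying the whole file."""
    if not 0 <= changed_mb <= file_size_mb:
        raise ValueError("changed_mb must be between 0 and file_size_mb")
    return 1 - changed_mb / file_size_mb


# Hypothetical sizes: 4 MB of changed blocks in files of very different sizes.
for name, size_mb in [("8 MB document", 8), ("10 GB PST file", 10_240), ("200 GB VM disk", 204_800)]:
    print(f"{name}: {block_level_savings(size_mb, changed_mb=4):.3%} of the bytes skipped")
```

For the small document, block-level tracking saves half the bytes; for the big files it saves nearly all of them.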
Speaker:I'd say the reason they didn't put a lot of effort into it is deduplication, so
Speaker:why don't we just talk about that now?
Speaker:I know we've covered dedupe, but just really quickly, for those who don't
Speaker:understand what dedupe is, the idea is that we're going to identify duplicate
Speaker:segments of the data, and duplicate means that we've seen this data before:
Speaker:we've done a full backup or an incremental backup, and we've
Speaker:seen this part of the data before.
Speaker:And for it to be truly considered dedupe, it's got to be subfile, right?
Speaker:It's got to look inside, like we were talking about, the VMDK
Speaker:or the VHDX or a PST file.
Speaker:We've got to be looking inside the file, slicing it up into chunks,
Speaker:and then deciding: this chunk, we've seen before; this chunk, we have not.
Speaker:And so there are two different places that dedupe happens.
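As a rough illustration of subfile dedupe, here is a Python sketch that slices backup streams into chunks, hashes each chunk, and stores only chunks whose hash has not been seen before. The fixed 8 KiB chunk size and the function name are assumptions; many real products use variable-size chunking instead, and a real system would also record references to the duplicate chunks so it can rebuild each backup.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # hypothetical fixed chunk size


def dedupe(streams: list[bytes], chunk_size: int = CHUNK_SIZE) -> tuple[dict[str, bytes], int]:
    """Slice each stream into chunks, hash them, and keep only chunks not seen before.

    Returns the chunk store (hash -> chunk) and the number of logical bytes ingested.
    """
    store: dict[str, bytes] = {}
    logical = 0
    for data in streams:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            logical += len(chunk)
            if digest not in store:  # new chunk: store it
                store[digest] = chunk
            # duplicate chunk: a real system would record a reference here
    return store, logical


# Two "full backups" of a 1 MiB file made of 128 distinct chunks; one byte changes in between.
monday = b"".join(i.to_bytes(4, "big") * 2048 for i in range(128))
tuesday = bytearray(monday)
tuesday[500_000] ^= 0xFF

store, logical = dedupe([monday, bytes(tuesday)])
stored = sum(len(c) for c in store.values())
print(f"ingested {logical} bytes, stored {stored} bytes ({logical / stored:.1f}:1)")
```

The second full backup adds only the one chunk that contains the changed byte.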
Speaker:One is at the target, which is a box like a Data Domain
Speaker:or a Quantum box or ExaGrid.
Speaker:These boxes are target dedupe.
Speaker:And then there's this thing called source dedupe, which
Speaker:really took off with a company called Avamar.
Speaker:That company got sold to EMC, which I know you spent a little
Speaker:time with back in the day.
Speaker:And our previous employer, where we both worked, did source-side deduplication.
Speaker:Yeah, so the target side is great because you could take it and
Speaker:plug it in anywhere, right?
Speaker:As long as it supports whatever protocol your client is using,
Speaker:you could just ingest the data and get all the benefits of deduplication.
Speaker:So Data Domain was
Speaker:very popular initially as a virtual tape library, right?
Speaker:You had tapes, right?
Speaker:People were constantly doing fulls and incremental backups.
Speaker:That's perfect to deduplicate.
Speaker:You plug in a Data Domain, it emulates the tape interface,
Speaker:and your clients just continue writing to it, and all
Speaker:your data gets deduplicated, right?
Speaker:And so it doesn't matter if it's NFS or SMB or tape, right?
Speaker:It just works.
Speaker:Yeah, it's like that Firewalla box that I
Speaker:bought, right?
Speaker:It just goes in and then it just works, right?
Speaker:You didn't have to change anything.
Speaker:With source dedupe, the idea is... well, there are three parts
Speaker:of the deduplication process.
Speaker:There's the slicing and dicing, right?
Speaker:There's the creation of a hash:
Speaker:you run the chunk of data through some sort of cryptographic hash algorithm, like SHA-something,
Speaker:and that gives you a value.
Speaker:And then you have to look up that value in some
Speaker:sort of hash table, right?
Speaker:With target deduplication, all three of those actions happen on the
Speaker:target, which is why it works so well.
Speaker:You just send the backups the way you're used to sending them,
Speaker:and then it does the magic.
Speaker:It slices and dices, it hashes, and it does the lookup, and it figures out which
Speaker:chunks of data are new based on that hash.
Speaker:With source side, the first two happen on the source, right?
Speaker:We slice up the data before we back it up.
Speaker:We create a hash of the data, and then we ask
Speaker:some magic person in the cloud, has this hash been seen before?
Speaker:And the decision is made on the other end:
Speaker:yes, we've seen this, or no, we haven't, and then we
Speaker:send or don't send the data.
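The difference between the two placements shows up in what crosses the network. This sketch uses a hypothetical in-memory `DedupeServer`: with target dedupe the client streams every byte and the server does the slicing, hashing, and lookup; with source dedupe the client slices and hashes, asks the server whether each hash is known, and sends only new chunks. The class and function names are illustrative, and the byte counts ignore the small hash messages themselves.

```python
import hashlib
from collections.abc import Iterator


def chunks(data: bytes, size: int = 8192) -> Iterator[bytes]:
    for i in range(0, len(data), size):
        yield data[i:i + size]


class DedupeServer:
    """Hypothetical backup server holding a hash index and chunk store."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    # Target dedupe: slicing, hashing, and lookup all happen here, after the data arrives.
    def ingest_stream(self, data: bytes) -> None:
        for chunk in chunks(data):
            self.store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)

    # Source dedupe: the client asks "have you seen this hash?" before sending anything.
    def has(self, digest: str) -> bool:
        return digest in self.store

    def put(self, digest: str, chunk: bytes) -> None:
        self.store[digest] = chunk


def target_backup(server: DedupeServer, data: bytes) -> int:
    server.ingest_stream(data)
    return len(data)  # every byte crosses the network


def source_backup(server: DedupeServer, data: bytes) -> int:
    sent = 0
    for chunk in chunks(data):                       # slicing happens on the client
        digest = hashlib.sha256(chunk).hexdigest()   # hashing happens on the client
        if not server.has(digest):                   # the lookup is decided on the server
            server.put(digest, chunk)
            sent += len(chunk)
    return sent  # data bytes sent; hash queries are not counted


data = b"".join(i.to_bytes(4, "big") * 2048 for i in range(128))  # 1 MiB, 128 distinct chunks
t, s = DedupeServer(), DedupeServer()
print(target_backup(t, data), target_backup(t, data))  # 1048576 1048576
print(source_backup(s, data), source_backup(s, data))  # 1048576 0
```

Both servers end up storing the same chunks; the difference is that the second source-side backup sends no chunk data at all.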
Speaker:To me, source dedupe is much more efficient than target dedupe.
Speaker:The difficulty is that it's a little bit of a baby-and-bathwater
Speaker:situation, right?
Speaker:Because in order to get it,
Speaker:you've got to do a forklift upgrade.
Speaker:You've got to stop using... again, things have changed, but
Speaker:back in the day, you had to stop using NetBackup and start using Avamar, right?
Speaker:Stop using NetWorker or TSM and switch to Druva, right?
Speaker:You had to change your backup product to get this done.
Speaker:Things have changed a little bit over time, right?
Speaker:A lot of these products now support source dedupe.
Speaker:But that was, and still is, the main downside.
Speaker:If you want source dedupe, you've got to change your backup product,
Speaker:or you've got to change how you use your backup product,
Speaker:assuming it starts supporting it.
Speaker:Yeah, and I would say at this point, probably a good chunk of products
Speaker:either have their own source-side deduplication mechanism or they
Speaker:work with deduplicating targets that allow for source-side deduplication,
Speaker:for instance, integrating with ExaGrid or Data Domain from TSM, Veeam...
Speaker:Exactly.
Speaker:Yeah, there are some who criticize it, saying that the slicing and
Speaker:dicing and the creation of the hash put a load on the client.
Speaker:I have always argued that, if done properly, the load created by the
Speaker:slicing and dicing and hashing is offset by the significant reduction
Speaker:in load from not transporting 99 percent of the data,
Speaker:right?
Speaker:Yeah.
Speaker:Other critiques of it have been that the restore speed wasn't great because of
Speaker:how the data was stored on the other end.
Speaker:And I would argue that's an implementation problem.
Speaker:It's not a problem with the concept.
Speaker:And the other thing to mention about source-side deduplication is that
Speaker:these products typically use proprietary protocols, so you don't end up with
Speaker:a lot of the security issues you have with, say, a target dedupe appliance
Speaker:with NFS or SMB open to the world.
Speaker:Yep.
Speaker:Agreed.
Speaker:There is a security advantage to having the data sliced
Speaker:and diced and then encrypted before you send it to the other
Speaker:system, instead of sending it over an unsecured protocol like NFS or SMB.
Speaker:Exactly.
Speaker:All right.
Speaker:This episode, I think, got a little longer than we had intended,
Speaker:but we covered a lot.
Speaker:So basically, we learned about
Speaker:what is and is not a backup.
Speaker:We learned about multiplexing, full and incremental backups,
Speaker:file-level and block-level incremental forever, and source-side deduplication.
Speaker:It's a big episode.
Speaker:What do you think?
Speaker:Yeah, that covers a lot of what everyone talks about when they
Speaker:refer to backup and restore. You've got to know these backup
Speaker:technologies in order to be able to restore and protect your company.
Speaker:These are things that you need to know.
Speaker:All right.
Speaker:And with that, I once again want to thank our listeners.
Speaker:You are why we do this. And Prasanna, once again,
Speaker:great insights and questions as well.
Speaker:Thank you, sir.
Speaker:Thank you, sir, for keeping me honest.
Speaker:And remember, this show, The Backup Wrap-Up, is an independent podcast, and
Speaker:the opinions that you hear are ours,
Speaker:not anyone else's. This is a production of BackupCentral.com,
Speaker:produced and edited by yours truly.
Speaker:And I just want to say, that's a wrap.