Speaker:

There are dozens of things that people do to protect their data

Speaker:

from loss, but many of them are worthless when you actually need them.

Speaker:

In this episode, we'll learn what turns a copy into a backup so that

Speaker:

you can make sure that anything you think is a backup actually is one.

Speaker:

We'll also talk about some important backup concepts like

Speaker:

multiplexing, incremental backups, block-level incremental backups

Speaker:

and source side deduplication.

Speaker:

Hi, I'm W. Curtis Preston, AKA Mr. Backup.

Speaker:

And I've been specializing in backup and disaster recovery for over 30 years.

Speaker:

My podcast turns unappreciated backup admins into cyber recovery heroes.

Speaker:

This is The Backup Wrap-Up.

Speaker:

Hi, and welcome to the show.

Speaker:

and I have with me a guy who I think is super jelly of the new

Speaker:

toy that I put in yesterday.

Speaker:

Prasanna Malaiyandi.

Speaker:

How's it going, Prasanna?

Speaker:

I'm good Curtis, and yes, I am jealous of your toy, but I will have to start

Speaker:

sending you a bill for consulting fees, when you inevitably, or if

Speaker:

you inevitably run into issues.

Speaker:

Huh?

Speaker:

Yeah, because as you recall, I didn't, I purchased this like without even talking

Speaker:

to you, which is rather atypical of me.

Speaker:

because I, we talk about so much and, and basically what we're talking about

Speaker:

is a Firewalla, which I've had my eye on for a while and then after I realized,

Speaker:

so I've switched internet service providers and now they're telling me

Speaker:

that I'm hitting the bandwidth limit already and which is highly possible

Speaker:

given that I do, this, I realized that I had no bandwidth monitoring tools.

Speaker:

I have this really nice, mesh router system.

Speaker:

The Wi-Fi mesh, but that's been put in access

Speaker:

point mode, which then of course offers no bandwidth monitoring.

Speaker:

And then I had this Cox router, which offered me nothing.

Speaker:

And so I replaced the Cox router with the Firewalla Purple SE and man.

Speaker:

Super simple to put in there.

Speaker:

And now I have these like super, stats.

Speaker:

And I get these, I'm going to, at some point I'm going to have to disable

Speaker:

the notifications because it's like Curtis is playing games on his phone.

Speaker:

Curtis is watching YouTube videos on MacBook Pro A, right?

Speaker:

It's literally, it's like Curtis has downloaded 3.56 gigabytes of video on his, and I'm like, okay, this

Speaker:

is going to get old pretty

Speaker:

I have two questions for you.

Speaker:

Yeah,

Speaker:

The first question is, Did you figure out what was consuming

Speaker:

your data cap or data usage cap?

Speaker:

not yet.

Speaker:

Cause it's only, it hasn't even been 24 hours, but I do have a pretty good guess

Speaker:

and I think it's right in front of me.

Speaker:

we'll see what is weird.

Speaker:

And again, I didn't want to talk about this too much, but what is weird is I

Speaker:

get these, some of the notifications, it's: MacBook Pro uploaded 3.5 megabytes of data to LinkedIn at 3:45 a.m.

Speaker:

And I'm like, what, like, why is my laptop uploading three and a half

Speaker:

megabytes of anything while it's just sitting here and I'm sleeping somewhere?

Speaker:

that's weird.

Speaker:

weird.

Speaker:

Yeah.

Speaker:

So anyway, so yeah, go ahead.

Speaker:

Yeah.

Speaker:

since you've decided to get rid of the Cox router, can you not just

Speaker:

use your Wi-Fi mesh as a router?

Speaker:

I could have, but then I would have to completely redo my

Speaker:

network architecture, which as you recall, was a really big thing.

Speaker:

That's true.

Speaker:

And I actually really liked the firewall features of this.

Speaker:

That's, that was what would, what really drew me to it.

Speaker:

And I'd been thinking about it and this was that final excuse to get it.

Speaker:

and, I'm really enjoying the security aspects

Speaker:

glad.

Speaker:

So there, for people, if Mr. Backup can learn networking and firewalls, so can you.

Speaker:

Yeah.

Speaker:

It's certainly not my, my forte.

Speaker:

but I want to talk, it's time for the news of the week.

Speaker:

The big news, I think, of the entire IT world, everybody seems to be

Speaker:

talking about it, is this MGM hack.

Speaker:

I can't, can you imagine?

Speaker:

if you've been living in a hole, They shut down MGM and Caesars and all

Speaker:

of the hotels attached to MGM and Caesars, which is like half the strip.

Speaker:

And they shut down like card keys, slot machines,

Speaker:

ATMs.

Speaker:

everything.

Speaker:

ATMs,

Speaker:

So for, so for our listeners, Who may not know about this,

Speaker:

MGM is a hotel chain, right?

Speaker:

They have a bunch of things as well, right?

Speaker:

Like various hotels like Caesars and MGM, and they are in Las Vegas

Speaker:

and there are casinos, right?

Speaker:

So you can stay there, you can gamble there, right?

Speaker:

They make lots and lots and lots and lots of money.

Speaker:

but not in the last week or so,

Speaker:

And they got hit by a cyber attack, on the week of September

Speaker:

20th or so, I'm guessing.

Speaker:

yeah, and The I think that's the saddest part about this and by the way as of

Speaker:

today there's been half a dozen lawsuits attached because there's the threat of a

Speaker:

PII leak, personal information leak.

Speaker:

And so there's been all sorts of worries about that.

Speaker:

So there's, a half a dozen, what do you call it?

Speaker:

Class action or lawsuits that are attempting.

Speaker:

To achieve class action status that have been filed.

Speaker:

I think the saddest part here and the way I like and we'll put the

Speaker:

link to this particular article in the show description, the

Speaker:

heading here targeting layer eight.

Speaker:

I've heard of the seven-layer networking model, again, with my extensive

Speaker:

networking experience, the OSI model.

Speaker:

What is Layer 8,

Speaker:

People?

Speaker:

it's people.

Speaker:

Yes.

Speaker:

So layer 8 is people, which is probably the weakest part

Speaker:

of the entire stack, I'd say.

Speaker:

Yeah.

Speaker:

You are the weakest link!

Speaker:

Yeah, so what, how did they get in?

Speaker:

So they basically targeted an employee, right?

Speaker:

Who had the right level of access and They basically were able to gain access into

Speaker:

their Okta environment as a super admin.

Speaker:

But how did they do that?

Speaker:

That's the

Speaker:

Oh, so how did they do that?

Speaker:

They basically tricked their IT help desk?

Speaker:

That is so bad, right?

Speaker:

they somehow got...

Speaker:

Access to a privileged account, right?

Speaker:

according to the powers that be that they stole a password or they

Speaker:

hacked Active Directory somehow.

Speaker:

So they were able to attempt to log in, but they were stopped by MFA

Speaker:

which is a good thing, Okta, but then

Speaker:

They were able to convince the help desk that they were the person in

Speaker:

question and get them to reset MFA.

Speaker:

Now here's a question.

Speaker:

Do you think that employee is still there at the company?

Speaker:

And

Speaker:

is one of those

Speaker:

you be blaming the person

Speaker:

so I'm going to fast forward like 30 years.

Speaker:

Okay.

Speaker:

so 20, what would that be?

Speaker:

2053.

Speaker:

There's a guy, he's going to be called Mr. MFA, and he's going to have a podcast dedicated to security because like my

Speaker:

career started with a screw up of this.

Speaker:

not quite this magnitude, but my career started with this.

Speaker:

And so I, My personal opinion, I don't know if this person is, has been fired.

Speaker:

I think they should only be fired if they didn't follow the processes that had been,

Speaker:

established and they

Speaker:

set out for them.

Speaker:

Potentially they should be disciplined.

Speaker:

I don't know if firing, if.

Speaker:

If termination is the appropriate, they should be disciplined.

Speaker:

If they followed the procedures that had been laid out for

Speaker:

them, process, people, right?

Speaker:

Then technology.

Speaker:

If they had been, we just had a podcast about that.

Speaker:

If they followed the procedures.

Speaker:

have been given to them.

Speaker:

Then I think some massive leniency, then you update your procedures,

Speaker:

et cetera, et cetera, et cetera.

Speaker:

I think of a massive, outage that was caused at a major, software

Speaker:

vendor that I worked with.

Speaker:

I'm trying to be, I'm trying to be very cagey here, where the backup

Speaker:

operator followed his Procedure that they had two parts of the app

Speaker:

that had to be shut down in order to do a backup because they couldn't

Speaker:

synchronize the two backup systems.

Speaker:

And so every two weeks they would shut down these apps and,

Speaker:

and then do a backup offline.

Speaker:

And this person.

Speaker:

The backup operator just did what they were told to do and shut down these

Speaker:

apps at the most critical time of the year when the apps were needed, right?

Speaker:

The person was just doing their job.

Speaker:

That person should not be fired.

Speaker:

That person should be, you know, you change the procedure.

Speaker:

So I don't know what happened here.

Speaker:

Yeah.

Speaker:

You train.

Speaker:

Yeah.

Speaker:

I hope some leniency was there.

Speaker:

if the person was fired and, I'd love to have them on the podcast.

Speaker:

But anyway, what could we learn from this, from this news here?

Speaker:

Prasanna?

Speaker:

basically that, one is even if you have the greatest technologies in

Speaker:

place and the greatest processes in place, people will always exist;

Speaker:

never underestimate the power of people to do dumb things.

Speaker:

I do think that perhaps what's in order here is an update to process.

Speaker:

And the process should be when, cause you have to be able to reset MFA.

Speaker:

When resetting MFA, it should require many more, bells and whistles

Speaker:

and levels of authentication.

Speaker:

And we need to identify, we need to identify that this person who

Speaker:

calls in that says that they're Steve, we need a way to identify

Speaker:

that Steve is actually Steve.

Speaker:

so you create a process around that, that really verifies that someone is who they say they are,

Speaker:

and especially,

Speaker:

you reset MFA.

Speaker:

and especially when it's someone with that level of privilege.

Speaker:

Especially, super especially, that's not a word, but yeah,

Speaker:

that, oh, I feel for these guys.

Speaker:

Keep abreast of this story because it is going to get worse before it gets better.

Speaker:

And that's the news for this week.

Speaker:

So what I thought we would talk about this week in the backup to basic series is,

Speaker:

I've got it defined as, backup methods that support a traditional restore.

Speaker:

So basically the backup methods that I grew up with that are still,

Speaker:

in

Speaker:

Relevant.

Speaker:

Yeah.

Speaker:

in, yeah, right?

Speaker:

we like to live in a world where everybody's using the

Speaker:

latest and greatest, right?

Speaker:

And nobody's doing this old, full and incremental backups and stuff.

Speaker:

Nobody's doing that.

Speaker:

And that's just not right.

Speaker:

So we need to talk about these, these methods and see what we can get out there.

Speaker:

the first thing, I just have to, again, I'm, I'm, we're doing this based

Speaker:

on my book, Modern Data Protection. There's the cover for those of you

Speaker:

watching via video, all, all three listeners that are watching via video.

Speaker:

I think it's 10.

Speaker:

There's maybe 10.

Speaker:

the number's actually gone up since we've been putting them on YouTube.

Speaker:

Oh, there you go.

Speaker:

the, I've got this thing in here.

Speaker:

so this is from chapter nine and talking about backup and

Speaker:

recovery software methods.

Speaker:

And the first thing I had in there was, is everything backup?

Speaker:

So there was a time when backup was well defined.

Speaker:

Backup was copy something to tape and then put that tape in a box,

Speaker:

right?

Speaker:

It was so simple back then.

Speaker:

Yeah, it was so simple back then.

Speaker:

Yes.

Speaker:

so I, as quote, Mr.

Speaker:

Backup, I see backup a lot broader than I think a lot of people do.

Speaker:

A lot of people, when they say backup, they go, Oh, this isn't backup.

Speaker:

This is, to me, backup is anything really that protects the data, the

Speaker:

way backup protects data, right?

Speaker:

And so I'm defining backup rather broadly as anything that is a copy of data

Speaker:

stored separately from the original.

Speaker:

that can be used to restore the original if it is damaged.

Speaker:

There's a lot of things that qualify as backup under that

Speaker:

so let me just give you some examples and see if you think they qualify.

Speaker:

Okay.

Speaker:

So take a copy on tape,

Speaker:

Yes.

Speaker:

a copy in AWS S3.

Speaker:

A copy of the data that's in S3, which is separate from

Speaker:

The

Speaker:

your not.

Speaker:

Yeah.

Speaker:

yes.

Speaker:

okay?

Speaker:

a copy replicated from one storage system to another storage

Speaker:

system from the same vendor.

Speaker:

as long as...

Speaker:

there's a caveat here, because you used the word replication.

Speaker:

I need the ability, is it replicated in such a way that if

Speaker:

I damage production, so

Speaker:

that doesn't qualify as being stored separately.

Speaker:

Replicated with separate retention of the copies on the destination.

Speaker:

Okay.

Speaker:

Yes, I would call that a

Speaker:

Okay, snapshots on a production system, on a production storage

Speaker:

array that does not include AWS S3,

Speaker:

thank you for, yeah, so snapshots on the same array.

Speaker:

no, End of story, not a backup until it's copied somewhere

Speaker:

Okay, and then doing what you were recently doing when

Speaker:

editing the podcast, right?

Speaker:

Downloading a copy from the cloud onto your local system, copying it

Speaker:

to a different directory, and then copying it to yet a third directory.

Speaker:

On your local system.

Speaker:

Is that local system considered backups, each of those copies?

Speaker:

again, we're storing the data in a separate place that

Speaker:

has a separate risk profile.

Speaker:

Etc.

Speaker:

yes,

Speaker:

As long as the copy, the original

Speaker:

copy was in the, cloud.

Speaker:

it's also about, the purpose of why I'm doing it, right?

Speaker:

If the purpose of downloading that is to serve as possibly a backup, right?

Speaker:

Because there's a lot of times that we download data That

Speaker:

is not for backup purposes.

Speaker:

Now, it could accidentally become a backup if it's the only

Speaker:

copy that you have available.

Speaker:

But just because I copy it doesn't necessarily make it a backup.

Speaker:

It might be an archive.

Speaker:

And then the last example.

Speaker:

taking pictures on your iPhone and using iCloud to sync

Speaker:

your copies to iCloud photos.

Speaker:

Not a backup.

Speaker:

Because,

Speaker:

is that?

Speaker:

for two reasons.

Speaker:

One, which is really the primary.

Speaker:

And that is specifically in terms of Apple iCloud.

Speaker:

But the biggest thing is that it's synchronized.

Speaker:

that's the key.

Speaker:

That's, you're, you asked earlier, you delete a picture in your phone or some

Speaker:

app, like ransomware, deletes a bunch of pictures in your phone.

Speaker:

It synchronizes that deletion up in the cloud and they go byebye, right?

Speaker:

It is a synchronized copy, not a backup.

Speaker:

it is stored separately, but if you delete it here and it gets deleted

Speaker:

there, that's not a backup, right?
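
To make the sync-versus-backup distinction concrete, here is a minimal Python sketch. The class names (`SyncedCopy`, `VersionedBackup`) and the photo filename are made up for illustration and do not model how iCloud or any real product works. The only point is that a synced copy replays deletions, while a copy with its own retention does not.

```python
class SyncedCopy:
    """A mirror: whatever happens to the source happens to the copy."""

    def __init__(self):
        self.files = {}

    def sync(self, source):
        # Overwrite the copy with the current source state, deletions included.
        self.files = dict(source)


class VersionedBackup:
    """Keeps every point-in-time copy under its own retention."""

    def __init__(self):
        self.versions = []

    def backup(self, source):
        # Store an independent snapshot; older snapshots are left untouched.
        self.versions.append(dict(source))

    def restore(self, version_index):
        return dict(self.versions[version_index])


phone = {"IMG_0001.jpg": "beach photo"}
synced, backup = SyncedCopy(), VersionedBackup()
synced.sync(phone)
backup.backup(phone)

# Ransomware (or a slip of the thumb) deletes the photo on the phone.
del phone["IMG_0001.jpg"]
synced.sync(phone)
backup.backup(phone)

photo_in_sync = "IMG_0001.jpg" in synced.files         # deletion was replayed
photo_in_backup = "IMG_0001.jpg" in backup.restore(0)  # earlier version survives
print(photo_in_sync, photo_in_backup)
```

The synced copy loses the photo the moment the deletion syncs; the versioned copy can still restore it from the first snapshot.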

Speaker:

Just like we were talking, before.

Speaker:

And that's one really important reason, possibly the most important reason.

Speaker:

But the other is that there's a feature in iPhone that...

Speaker:

It says we can store low res copies on the phone and the high res copies in

Speaker:

the cloud, which means that not only is it a synchronized copy, the only true

Speaker:

copy of your photo is in the cloud.

Speaker:

It's only one copy, which means you need to be backing up iCloud.

Speaker:

and by extension also, Google photos if you're an Android

Speaker:

person, so yeah, not a backup,

Speaker:

okay, no,

Speaker:

which we had a whole podcast episode about that.

Speaker:

How to properly back up your iCloud account.

Speaker:

yeah,

Speaker:

were good examples.

Speaker:

I think those are a lot of things, like you said, right?

Speaker:

It's not always easy to say, is it a backup or not?

Speaker:

Unless you dive into the next level of questions and ask, okay, is it really a

Speaker:

yeah,

Speaker:

Does it meet these

Speaker:

I think.

Speaker:

or not?

Speaker:

I think you did a good job of, the different categories, like that

Speaker:

thing of, if it's fully synchronized, whether synchronous or asynchronous,

Speaker:

if it's fully synchronized and if I delete the production and

Speaker:

it deletes the data, the copy,

Speaker:

that, That's not a backup.

Speaker:

right?

Speaker:

unless that copy has the ability to undo that.

Speaker:

If it does, then, I would change my answer, right?

Speaker:

And so like a NetApp synchronized filer, I would consider that

Speaker:

other copy, that would be backup,

Speaker:

other things that are not a backup, one that you didn't mention would be,

Speaker:

the recycle bin in your Microsoft 365.

Speaker:

That is not a backup, right?

Speaker:

It's not stored separately.

Speaker:

it's just, records in a database that have been flagged as deleted.

Speaker:

They haven't gone anywhere.

Speaker:

They're sitting right next to the production data.

Speaker:

So yeah,

Speaker:

Okay.

Speaker:

And then the other one is,

Speaker:

So in your opinion, does backup require you to always be able to go

Speaker:

back to a point in time that could plausibly have existed in the system?

Speaker:

And the reason I'm asking this is if I look at, I know email archiving comes

Speaker:

up a lot and sometimes people are like, oh, that's the same as backup.

Speaker:

But with email archive, you're just getting all the data that's there,

Speaker:

whether or not your mailbox actually looked like that, your inbox looked

Speaker:

like that or not at any point in time.

Speaker:

Yeah.

Speaker:

So backup.

Speaker:

requires restore, right?

Speaker:

For it to be a backup, you need to be able to restore it to the way it

Speaker:

looked at some point in time, right?

Speaker:

yeah, that's a really good question, Prasanna.

Speaker:

it's one thing to say a file, but, if you cannot, if you cannot bring

Speaker:

the thing that's been damaged back to its You know, back to before it

Speaker:

was damaged and that it comes back to the same way as it was before it was

Speaker:

damaged, then you don't have a backup.

Speaker:

You have a copy of the data, right?

Speaker:

And an email archive is a perfect example of that.

Speaker:

You have a copy of the data, but it was stored for a different purpose.

Speaker:

It was stored for archive, which means it wasn't designed to be put back into the,

Speaker:

the state it was in,

Speaker:

yeah, the state that it was in,

Speaker:

right?

Speaker:

so you might be able to restore all the email, but you won't be able to

Speaker:

restore folders and things like that.

Speaker:

A good backup should bring the thing back to the way it was before it was damaged,

Speaker:

however it... Let's go back to a time when tape drives started getting, so here,

Speaker:

we're going to talk about a feature that is now for many people, passe, right?

Speaker:

it's not really necessary because they no longer use tape as their primary target

Speaker:

or their initial target of backups.

Speaker:

and that is this concept of multiplexing.

Speaker:

And it goes back to, there was a time when we

Speaker:

Way back in the days.

Speaker:

right back in the day.

Speaker:

So multiplexing, do you want to define multiplexing or explain it?

Speaker:

Yeah, multiplexing.

Speaker:

Yeah, I, let me attempt to, I know I wasn't aware of this before we started

Speaker:

doing the podcast and you explained everything about tape and I know we've

Speaker:

had a bunch of folks, tape experts on the podcast as well, but multiplexing is...

Speaker:

to solve an issue where tape requires you to write at a certain speed.

Speaker:

If you don't, it's bad.

Speaker:

And tapes got faster and faster, but the problem was pumping data into the tape

Speaker:

device itself wasn't going as quickly as the tape speeds were increasing.

Speaker:

And so in order to solve that, what they decided to do was say, okay, Let's have

Speaker:

multiple clients feed data into the tape device at the same time, and we will

Speaker:

multiplex or basically write all those streams into the tape drive at the same

Speaker:

time, keeping the tape device happy.

Speaker:

While still being able to do all the backups.

Speaker:

Yeah, another word for it would be interleaving.

Speaker:

You did great.

Speaker:

basically putting all, chopping them up into pieces and then

Speaker:

putting together into one, turning a bunch of streams into one stream.
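
As a rough illustration of interleaving, the sketch below takes blocks from several clients in round-robin order and writes them as one tagged stream. The function names, the client names, and the round-robin policy are assumptions for the example; real backup products use their own block formats and scheduling.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several client streams into one tape stream.

    streams maps a client name to its list of blocks. Each block written to
    "tape" is tagged with its client so it can be separated again later.
    """
    tape = []
    for round_of_blocks in zip_longest(*streams.values()):
        for client, block in zip(streams, round_of_blocks):
            if block is not None:  # this client ran out of data earlier
                tape.append((client, block))
    return tape


def demultiplex(tape, client):
    """Read the whole tape and keep only the blocks for one client."""
    return [block for owner, block in tape if owner == client]


streams = {
    "client_a": ["a1", "a2", "a3"],
    "client_b": ["b1", "b2"],
    "client_c": ["c1", "c2", "c3"],
    "client_d": ["d1"],
}
tape = multiplex(streams)
print(tape[:4])
print(demultiplex(tape, "client_b"))
```

Note that `demultiplex` has to walk every block on the tape to pull out one client's data, which is the cost of writing the streams this way.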

Speaker:

And when we first started, we used multiplexing settings of four

Speaker:

Which means four different

Speaker:

turn and.

Speaker:

Yeah, four different clients being combined into a stream to

Speaker:

make a tape drive happy, but tape drives got faster and faster.

Speaker:

The clients didn't get faster.

Speaker:

And so by the time I left, by the time I used my last tape drive in

Speaker:

production, we were up to 36, right?

Speaker:

We were up to 36 streams together to, to make an individual tape drive happy.

Speaker:

And the reason,

Speaker:

I was gonna ask why.

Speaker:

Yeah.

Speaker:

Why were clients not fast enough

Speaker:

yeah.

Speaker:

So the reason that this was bad is that, what, why is the only reason we back up,

Speaker:

to restore

Speaker:

right?

Speaker:

So when you

Speaker:

go to

Speaker:

do a restore,

Speaker:

Yeah.

Speaker:

yeah.

Speaker:

When you go to do a restore, you have to read all 36 streams

Speaker:

and throw 35 of them away.

Speaker:

So your tape drive, the speed of your restore is going to be 1/36th of what it could potentially be if it hadn't been multiplexed,

Speaker:

But if you're never doing restore tests, it doesn't really matter.

Speaker:

Until you actually need to restore the data.
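
The restore penalty is simple arithmetic: if the drive has to read every interleaved stream and keep only one, then with 36 streams only one block in 36 is useful. The sketch below assumes equally sized, evenly interleaved streams, and the 300 MB/s drive speed is an arbitrary example figure rather than a spec for any real drive.

```python
def restore_read_fraction(stream_count):
    """Fraction of the blocks read during a restore that belong to the one
    stream being restored, assuming equally sized, evenly interleaved streams."""
    return 1 / stream_count


def effective_restore_speed(drive_mb_per_s, stream_count):
    """Useful restore throughput when the drive must read every stream."""
    return drive_mb_per_s * restore_read_fraction(stream_count)


# 300 MB/s is an arbitrary example figure, not a spec for any real drive.
for stream_count in (1, 4, 36):
    print(stream_count, round(effective_restore_speed(300, stream_count), 1))
```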

Speaker:

yeah, if you're You're killing me you're killing me yeah, so it was one

Speaker:

of these things where it was a cut your nose off to spite your face, right?

Speaker:

So we felt that it was... but it was a necessary evil.

Speaker:

You could only restore if you got backups done and we could only get

Speaker:

backups done reliably if we were using multiplexing, but we knew that it was

Speaker:

creating this problem and ultimately this was the undoing of tape from

Speaker:

a backup and recovery perspective.

Speaker:

We switched to destaging and.

Speaker:

these other things to undo this, necessary evil.

Speaker:

But, it, it was a mess.

Speaker:

But that's what multiplexing is.

Speaker:

So if you've heard about multiplexing, you don't need to do multiplexing

Speaker:

if you're backing up to disk.

Speaker:

Because disk can write at whatever speed you tell it to write at.

Speaker:

And it can write a bunch of things at the same time.

Speaker:

And you can give it 36 streams and it can write them all at the same time in

Speaker:

separate places of the disk in such a way that when you go to do a restore,

Speaker:

you don't, you're not, you don't have to read all of them to read one of them.

Speaker:

What?

Speaker:

That was my yes.

Speaker:

disk is fast enough, but

Speaker:

Yeah.

Speaker:

Well, it's not,

Speaker:

a disk

Speaker:

drive has a certain

Speaker:

number of IOPS it could handle.

Speaker:

And therefore, as long as your system is big enough.

Speaker:

to handle all of them in parallel.

Speaker:

yes.

Speaker:

they're, disk drives are not, Unlimited bandwidth, unlimited IO,

Speaker:

et cetera, et cetera, et cetera.

Speaker:

Yes.

Speaker:

but the point of the way that it lays the data, you don't have to lay the,

Speaker:

you can lay the data however you want and then read it however you want.

Speaker:

there are, again, there are limits to everything depending on how

Speaker:

much you fragment the data and all that kind of stuff, right?

Speaker:

But it's still way better than tape from that perspective.

Speaker:

All right, next one's a whole lot easier.

Speaker:

What comes next?

Speaker:

What's the first type of, what's

Speaker:

let you tackle

Speaker:

what, no, I'll let you tackle this, Curtis.

Speaker:

So what's the, what is it?

Speaker:

The first type of backup that everyone should cut their teeth on.

Speaker:

what a full backup?

Speaker:

Is that

Speaker:

Yeah.

Speaker:

what you're saying?

Speaker:

Yeah.

Speaker:

so basically we're just going to talk about this concept of

Speaker:

full and incremental backups.

Speaker:

And probably everybody knows this, but this is a backup to basic series.

Speaker:

So a full backup backs up everything, an incremental backup

Speaker:

backs up things that have changed.

Speaker:

And the, there are different types of incremental backups, right?

Speaker:

And different people have different names for these different types, right?

Speaker:

terms you've probably heard, incremental, differential, cumulative incremental.

Speaker:

For a lot of people, cumulative incremental and

Speaker:

differential are the same thing.

Speaker:

for people that got stuck in Windows land, not necessarily so what's the

Speaker:

difference between an incremental and these other two things?

Speaker:

A cumulative incremental.

Speaker:

So an incremental is basically, Typically, Sunday you do a full backup, right?

Speaker:

Monday you need to do another backup.

Speaker:

Now, you don't want to do necessarily the entire full backup again,

Speaker:

because maybe that's too much data, you don't have enough time, etc.

Speaker:

So you'll do an incremental, which is basically whatever has

Speaker:

changed since the last full.

Speaker:

So since Sunday.

Speaker:

Sorry, since the last time you did a backup, I should say.

Speaker:

exactly, whatever's changed since the last time you did a

Speaker:

Yeah, so in that case, it was Sunday, so then Monday you get the incrementals,

Speaker:

now Tuesday you're going to do backup, and so you do another incremental, which

Speaker:

is whatever has changed since Monday,

Speaker:

Exactly,

Speaker:

and we just keep doing that, right?

Speaker:

Yeah, and

Speaker:

then if it's, yeah, if it's Sunday, right?

Speaker:

And now it's Saturday, how many tapes do I need to do a restore?

Speaker:

do you need...

Speaker:

The previous Sunday, plus the Monday, plus the Tuesday, plus

Speaker:

the Wednesday, Thursday, Friday.

Speaker:

You basically need to replay

Speaker:

by the way, by the way, I really, I really channeled the old Curtis there.

Speaker:

I did it without even meaning to, I said tapes, right?

Speaker:

Cause

Speaker:

that was the problem back then.

Speaker:

We literally had to grab for seven tapes, right?

Speaker:

Nowadays, we don't have to grab for seven tapes, but,

Speaker:

but you still have to do all those restores though, right?

Speaker:

So even in the case of, if a file existed Sunday, and then was deleted Monday,

Speaker:

and then came back on Tuesday, you would still end up having to do all of those

Speaker:

data, like basically you're replaying like a log, all the data that would

Speaker:

have existed on each of those days.

Speaker:

right.

Speaker:

The real problem is a file that was changed every single day.

Speaker:

You would actually restore that file seven times.

Speaker:

It's a lot of wasted effort.

Speaker:

That's just the idea of an incremental, or regular incremental.

Speaker:

Then we have a differential or a cumulative incremental.

Speaker:

And the difference between that is that it's going to, it's going to do

Speaker:

the thing that you said earlier, which is it's going to back up everything

Speaker:

that's changed since the full.

Speaker:

And so what some people do is that they've stopped, they stopped doing

Speaker:

incrementals and they switched to differentials or cumulative incrementals

Speaker:

every day, and that way at the end of the week, I would need at most two tapes.
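
The tape counts can be worked out with a small sketch. It assumes a one-week schedule that starts with a full on Sunday, and it uses the definitions from the discussion: an incremental holds changes since the previous backup, and a differential (cumulative incremental) holds changes since the last full. The function name and the schedule format are invented for the example.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def backups_for_restore(schedule):
    """Return the days whose backups are needed to restore the last day.

    schedule lists one backup type per day, starting with "full":
    "incremental" means changes since the previous backup, and
    "differential" means changes since the last full.
    """
    needed = []
    day = len(schedule) - 1
    while day >= 0:
        kind = schedule[day]
        needed.append(day)
        if kind == "full":
            break
        if kind == "differential":
            # Covers everything since the last full, so jump straight to it.
            day = max(d for d in range(day) if schedule[d] == "full")
        else:
            # Incremental: whatever ran just before it is needed as well.
            day -= 1
    return [DAYS[d] for d in reversed(needed)]


incremental_week = ["full"] + ["incremental"] * 6
differential_week = ["full"] + ["differential"] * 6

print(backups_for_restore(incremental_week))
print(backups_for_restore(differential_week))
```

With daily incrementals, a restore as of Saturday touches all seven backups; with daily differentials it touches only Sunday's full and Saturday's differential.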

Speaker:

Right now, this whole thing has pretty much gone away in the world of.

Speaker:

disk based backups, right?

Speaker:

Because the whole reason that we did backups this way, is

Speaker:

that, first off, let me back up.

Speaker:

We used to do weekly fulls followed by daily incrementals.

Speaker:

Then we switched for, because when we went to automated tape libraries,

Speaker:

the whole process of managing the different tapes wasn't as big of a deal.

Speaker:

So we went to monthly fulls followed by daily incrementals or maybe

Speaker:

a weekly cumulative and right?

Speaker:

So you'd still need a maximum of seven tapes to do a restore But when

Speaker:

we switched to disk, this whole thing just became kind of silly and moot and

Speaker:

whatever and you could back up, however, you wanted to back up and dedupe,

Speaker:

which we're going to talk about in a minute, dedupe really changed the game.

Speaker:

And, because it didn't matter whether you backed up full or incremental or whatever,

Speaker:

you still stored the same amount of data.

Speaker:

go

Speaker:

before we jump though, one thing that I think people might also hear in addition

Speaker:

to fulls, incrementals, differentials, and cumulative incrementals is also levels.

Speaker:

So maybe you could talk about levels.

Speaker:

I know sometimes it's specific to like Oracle.

Speaker:

And some databases, but maybe it might

Speaker:

no, that's a good point.

Speaker:

Yeah, thanks.

Speaker:

so the concept of a backup level, literally, this goes

Speaker:

back to the days of dump, right?

Speaker:

which was the command to backup Unix file systems.

Speaker:

A level zero was a full, a level one, And if you wanted to do increment, if you

Speaker:

want to do what we call the incremental backups, the way we, you would do a zero

Speaker:

followed by a one, followed by a two, followed by a three, followed by a four.

Speaker:

And, it got interesting because if you then lowered the number.

Speaker:

It would behave like a,

Speaker:

cumulative incremental, right?

Speaker:

so like you could do a zero and then you do a one.

Speaker:

If you then did another one, if you kept doing ones, you would get a differential.

Speaker:

You would get a cumulative incremental every day.

Speaker:

If you did a 0, a 1, and then a 2, and then a 1 again, it's just, it

Speaker:

basically, it always pointed back to the number that was the most recent

Speaker:

number that was lower than itself, and so it got complicated, and so

Speaker:

there were actually some people that

Speaker:

Is it they prefer

Speaker:

called Towers of

Speaker:

Hanoi, Yeah, which is based on the game, and I've got it in the book,

Speaker:

the Towers of Hanoi progressive thing, but I can't, it's like 0, 3, 2, 4, so

Speaker:

basically every backup, without doing cumulative incrementals, every backup,

Speaker:

every file that was changed would end up being on two tapes, which was just

Speaker:

an interesting way to minimize tape. Again, this is all because we're doing

Speaker:

tapes, but nobody has tapes anymore.

Speaker:

So nobody cares.

Speaker:

But that's what levels were.

Speaker:

It was all the way up to nine.

Speaker:

and they still have this concept in, in things like Oracle Backup.
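
The level rule described above can be sketched in a few lines of Python. This is a minimal illustration, not dump's actual code: the function name `reference_backup` is made up for the example, and the last sequence is the Towers of Hanoi style ordering usually quoted from the dump documentation, taken here as an assumption.

```python
def reference_backup(levels):
    """For each backup in the sequence, return the index of the earlier backup
    it is taken relative to, or None when nothing lower exists (a full)."""
    refs = []
    for i, level in enumerate(levels):
        ref = None
        # Walk backwards to the most recent backup with a strictly lower level.
        for j in range(i - 1, -1, -1):
            if levels[j] < level:
                ref = j
                break
        refs.append(ref)
    return refs


print(reference_backup([0, 1, 2, 3, 4]))  # → [None, 0, 1, 2, 3]  plain incrementals
print(reference_backup([0, 1, 1, 1]))     # → [None, 0, 0, 0]     cumulative incrementals
print(reference_backup([0, 1, 2, 1]))     # → [None, 0, 1, 0]     dropping back to a 1
print(reference_backup([0, 3, 2, 5, 4]))  # → [None, 0, 0, 2, 2]  Towers of Hanoi style
```

Each number in the output is the position of the backup that the new one captures changes since, which is why repeating a level behaves like a cumulative incremental.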

Speaker:

So the next thing to talk about is this concept called file

Speaker:

level incremental forever.

Speaker:

And the company that really put this out there was IBM with their product TSM.

Speaker:

And back in the day,

Speaker:

has been renamed,

Speaker:

idea is you

Speaker:

do one full, what's that?

Speaker:

Hasn't it been renamed?

Speaker:

It has, but I'm just saying they came out with it when they

Speaker:

came out, it was called TSM.

Speaker:

It's now like IBM Spectrum Protect, but, the idea was you do one full and then

Speaker:

everything is an incremental forever.

Speaker:

we never again do a full and this really saved a lot of

Speaker:

bandwidth and saved a lot of tape.

Speaker:

It came with a mess and that was over time and again, tape over time,

Speaker:

you could end up needing hundreds of tapes to restore a single file system.

Speaker:

you would need just one file from this tape and one file from that tape.

Speaker:

And since the hardest part of a tape is like, it was like

Speaker:

two and a half minutes just to get a tape in and, get it loaded and seek

Speaker:

to the average point in a tape.

Speaker:

So I was not a fan of doing backups this way when we were talking about tape.

Speaker:

Was there a reason?

Speaker:

what was the use case at the time for that?

Speaker:

it was about saving tape, saving

Speaker:

storage.

Speaker:

It was about saving bandwidth.

Speaker:

the idea, there's nothing wrong with the idea of incremental forever.

Speaker:

It's just that their implementation.

Speaker:

Back in the day when it was all tape, even when they had disk staging.

Speaker:

So they would stage to disk.

Speaker:

So they wouldn't multiplex, by the way, they wouldn't multiplex.

Speaker:

They would stage to disk and then they would do the backups to tape.

Speaker:

And this only applied to file system backups.

Speaker:

It didn't apply to database backups.

Speaker:

And, but literally you would need hundreds and hundreds of tapes

Speaker:

to restore a single file system.

Speaker:

And it just, I was never a fan of doing backups that way.

Speaker:

As long as we were backing up to tape. And they had ways to, they had collocation

Speaker:

and these various, and this thing called reclamation, because when you're doing

Speaker:

backups that way, you end up with a lot of tapes that have files on them that have

Speaker:

expired that are no longer needed, but you have other files on there that are needed.

Speaker:

And so you'd have to copy forward.

Speaker:

Yeah.

Speaker:

so that you could reclaim that whole tape and then reuse it.

Speaker:

And

Speaker:

That sounds like a

Speaker:

management nightmare.

Speaker:

An interesting engineering problem, but...

Speaker:

yeah, I was never a fan of doing backups that way.

Speaker:

and I'm even less of a fan now that we don't have to worry about tape.

Speaker:

Now we can just do incremental forever and just do it without all that

Speaker:

collocation and reclamation stuff.

Speaker:

Cause on disk, to reclaim, you just delete a file, right?

Speaker:

On tape, you delete a file in the middle of a tape.

Speaker:

You have to reclaim the tape.

Speaker:

so that's file level incremental forever.
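
The reclamation idea can be sketched roughly like this. It is a toy model under stated assumptions, not how any real product does it: the `reclaim` function, the 50 percent threshold, the `new_tape` name, and counting by number of files rather than bytes are all invented for illustration.

```python
def reclaim(tapes, threshold=0.5):
    """tapes maps tape name -> {file name: still_needed (bool)}.

    Any tape whose fraction of still-needed files is below the threshold has
    those files copied forward to a new tape, and the old tape becomes reusable.
    Returns (tapes after reclamation, list of reusable tape names)."""
    copy_forward = {}
    reusable = []
    for name, files in tapes.items():
        live = [f for f, needed in files.items() if needed]
        if files and len(live) / len(files) < threshold:
            for f in live:
                copy_forward[f] = True  # copy the live file to the new tape
            reusable.append(name)
    remaining = {n: f for n, f in tapes.items() if n not in reusable}
    if copy_forward:
        remaining["new_tape"] = copy_forward
    return remaining, reusable


tapes = {
    "T1": {"a": False, "b": False, "c": True},  # mostly expired files
    "T2": {"d": True, "e": True},               # everything still needed
}
remaining, reusable = reclaim(tapes)
print(reusable)   # → ['T1']
print(remaining)  # → {'T2': {'d': True, 'e': True}, 'new_tape': {'c': True}}
```

On disk the same cleanup is just deleting the expired files, which is the contrast being drawn here.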

Speaker:

And then, with the advent of backing up to disk, Which finally

Speaker:

happened, I don't know, 20 years ago.

Speaker:

It's so funny.

Speaker:

We, we say the advent of something that happened 20 years ago.

Speaker:

When we finally started doing it, and once everybody finally went to, and by

Speaker:

the way, everybody still is not backing up to disk, it's still, there's still

Speaker:

a small contingent of people who back up to tape, so those people will really

Speaker:

enjoy the first half of this episode.

Speaker:

Now we have this concept of block level incremental forever.

Speaker:

Would you like to explain that?

Speaker:

Yeah, with block level incremental, I guess where I think of block level

Speaker:

incremental, I know there's various places you can think about it, is

Speaker:

when it applies to virtual machines and other sort of larger objects.

Speaker:

where it doesn't make sense, to back up an entire VM, doing full, or, incremental

Speaker:

backups that way, if you think about how you would have done file level backups, right?

Speaker:

Why would I

Speaker:

Now, what, why would that be?

Speaker:

because I have a file which represents a disk, the entire file

Speaker:

doesn't change every time, right?

Speaker:

Parts of the file

Speaker:

it's, so we're talking about a VMDK file or VDK, for, for,

Speaker:

Hyper-V, VDDK, I can't say VDDK.

Speaker:

I think it's VDDK.

Speaker:

Yeah,

Speaker:

I think you're right.

Speaker:

so you're saying if anything changes on there,

Speaker:

you're backing up the entire whole

Speaker:

do an incremental, exactly.

Speaker:

You're going

Speaker:

it's the entire file changed, right?

Speaker:

So you're backing up the entire thing, but that doesn't make sense when

Speaker:

you have files which are say 10, 50, 100, 200 gigabytes and you're backing

Speaker:

that up every single time and so with block level incrementals What they

Speaker:

basically have done is say, okay What blocks have changed in this VMDK?

Speaker:

Let me just back those up, right?

Speaker:

Oracle also for databases, they do something similar, right?

Speaker:

Where it's hey Let me only back up the blocks within an Oracle data

Speaker:

file that have changed rather than backing up the entire Oracle database.

Speaker:

And how does the backup product know which blocks have changed?

Speaker:

Usually you have to rely on that vendor to tell you.

Speaker:

So in the case of Oracle, right?

Speaker:

You're usually integrating with Oracle RMAN via SBT or some other

Speaker:

mechanism where Oracle knows, okay, I keep track of the database blocks.

Speaker:

I know which ones are new.

Speaker:

Here is a list of blocks that you need to care about.

Speaker:

Same thing with VMware, when you have their, what is their SDK called?

Speaker:

VADP.

Speaker:

Yeah.

Speaker:

they've changed

Speaker:

the name.

Speaker:

Yeah.

Speaker:

They've changed the name, but basically they're, they have an API to talk to,

Speaker:

and they maintain a bitmap, right?

Speaker:

And then they just give you, here's a map of the bits that you need to go get.

Speaker:

These are the bits that have changed.

Speaker:

They maintain that.

Speaker:

And then the, there's an API for asking for those blocks,

Speaker:

now this is great for disk based systems because if you think about these are

Speaker:

all random spots in a file and so you can dump it out. Now, it's up to you to figure

Speaker:

out like how you want to do this and I know we'll talk a little bit later

Speaker:

about deduplicated storage, but in the case of Oracle, typically you would just

Speaker:

dump it out as incremental blocks, and just dump it into a file, and now you

Speaker:

have all those blocks captured together.
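
As a rough sketch of the idea, assuming a tiny fixed block size and using a simple comparison as a stand-in for the change tracking that the database or hypervisor would actually maintain (the function names here are hypothetical, not any vendor's API):

```python
BLOCK_SIZE = 4  # tiny block size so the example is easy to read


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE):
    """Stand-in for vendor change tracking: list the indexes of blocks that differ."""
    count = (max(len(old), len(new)) + block_size - 1) // block_size
    return [
        i for i in range(count)
        if old[i * block_size:(i + 1) * block_size]
        != new[i * block_size:(i + 1) * block_size]
    ]


def incremental_backup(disk: bytes, changed, block_size: int = BLOCK_SIZE):
    """Copy only the blocks named in the change list, keyed by block index."""
    return {i: disk[i * block_size:(i + 1) * block_size] for i in changed}


monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAAXXXXCCCCDDDD"   # only the second block was rewritten

changed = changed_blocks(monday, tuesday)
incr = incremental_backup(tuesday, changed)
print(changed)  # → [1]
print(incr)     # → {1: b'XXXX'}
```

In a real integration the backup product never compares images itself; it asks the vendor for the list of changed blocks and reads just those.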

Speaker:

In the case of VMware, they started doing that.

Speaker:

A lot of backup vendors would just dump it out as raw blocks, which makes sense.

Speaker:

but then, there are other optimizations you can do to do smarter things with

Speaker:

it, because with incremental block based backups, you still have to

Speaker:

restore from multiple files in order to stitch together the final actual image.

Speaker:

Yeah.

Speaker:

And you still have that problem.

Speaker:

That we talked about earlier where you may restore an individual block multiple

Speaker:

times if it changes multiple times, right?
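
A small sketch of that stitching, under the assumption that each incremental is simply a map of block index to block contents (the `restore` function and the 4-byte block size are made up for the example):

```python
def restore(full: bytes, incrementals, block_size: int = 4):
    """Apply block-level incrementals, oldest first, on top of the full image.

    Returns the restored image and the number of block writes performed."""
    image = bytearray(full)
    writes = 0
    for incr in incrementals:
        for index, data in sorted(incr.items()):
            start = index * block_size
            image[start:start + len(data)] = data
            writes += 1
    return bytes(image), writes


full = b"AAAABBBBCCCCDDDD"
tuesday = {1: b"XXXX"}
wednesday = {1: b"YYYY", 3: b"ZZZZ"}   # block 1 changed again

image, writes = restore(full, [tuesday, wednesday])
print(image)   # → b'AAAAYYYYCCCCZZZZ'
print(writes)  # → 3
```

Block 1 is written twice here even though only its latest version matters. One of the smarter approaches is to work out the newest version of each block first, so every block is restored once.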

Speaker:

the advantage is it's incredibly efficient.

Speaker:

And the, like when we talk about backing up VMs, I agree with you.

Speaker:

That's where this really shines.

Speaker:

Because back in the day, if we backed up VMs, and we just pretended they were,

Speaker:

physical machines and we were running full and incremental backups on them.

Speaker:

We were beating the crap out of these VMs.

Speaker:

So this is much more IO friendly, to the VMs, right?

Speaker:

So it's much friendlier on the VMs.

Speaker:

That's why we want to talk to the VMware API and get just

Speaker:

the blocks that have changed.

Speaker:

And it doesn't really come with any major downside compared to

Speaker:

the alternative, because we're storing the data on disk.

Speaker:

Can I

Speaker:

ask one

Speaker:

yeah,

Speaker:

sure.

Speaker:

So we've talked about using block level incrementals for VMware, for databases.

Speaker:

Is there a reason it hasn't really caught on for files?

Speaker:

Because if I take a file and kind of split it up into blocks, right?

Speaker:

Could I get the same benefit?

Speaker:

Or is there a reason that it makes a lot more sense for

Speaker:

like VMs or virtual machines?

Speaker:

the benefit will be relative to the size of the file, right?

Speaker:

The bigger the file, the bigger the benefit that you're going to get.

Speaker:

And I would say that the reason it hasn't caught on is because of the next

Speaker:

thing we're going to discuss, right?

Speaker:

That solved that problem.

Speaker:

But yeah, I think about files like PST files or maybe a big Access

Speaker:

database or backing up like MySQL.

Speaker:

That's not a file.

Speaker:

I mean, it is a file, but it's, it's actually a database.

Speaker:

Right.

Speaker:

I'd say the reason they didn't put a lot of effort is deduplication, which,

Speaker:

why don't we just talk about that now?

Speaker:

I know we've covered dedupe, just really quickly for those that don't

Speaker:

understand what dedupe is, the idea is that we're going to identify duplicate

Speaker:

segments of the data, and duplicate means that we've seen this data before.

Speaker:

we've done a full backup or we've done an incremental backup and we've

Speaker:

seen this part of the data before.

Speaker:

And for it to be truly considered dedupe, you've gotta look at,

Speaker:

it's gotta be subfile, right?

Speaker:

It's gotta be part of, like we were talking about the VMDK

Speaker:

or the VDDK or a PST file.

Speaker:

We've gotta be looking inside the file, slicing that up into chunks,

Speaker:

and then deciding this chunk.

Speaker:

We've seen it before; this chunk, we have not. And so there are two

Speaker:

different places that dedupe happens.

Speaker:

One is at the target, which is, like a box, like a Data Domain

Speaker:

or a Quantum box or ExaGrid.

Speaker:

these boxes are target dedupe.

Speaker:

And then there's this thing called source dedupe, which

Speaker:

really took off from a company that was called Avamar.

Speaker:

That company got sold to EMC, which I know you spent a little

Speaker:

time with, back in the day.

Speaker:

And our shared previous employer did source side deduplication.

Speaker:

Yeah, so target side is great because you could take it and

Speaker:

plug it in place anywhere, right?

Speaker:

Because as long as it supports whatever the protocol your client is using, right?

Speaker:

You could just ingest the data and you get all the benefits of deduplication.

Speaker:

So Data Domain was

Speaker:

very popular initially for virtual tape libraries, right?

Speaker:

So you had tapes, right?

Speaker:

People are constantly doing fulls and incremental backups.

Speaker:

That's perfect to deduplicate.

Speaker:

you plug in a Data Domain, it emulates the tape interface.

Speaker:

And now you just, your clients still continue writing to there and then all

Speaker:

your data gets deduplicated, right?

Speaker:

And so it doesn't matter if it's NFS or if it's SMB or if it's tape, right?

Speaker:

It just works.

Speaker:

yeah, it's like that Firewalla box that I

Speaker:

bought, right?

Speaker:

It just, it just, it goes in and then it just works, right?

Speaker:

You didn't have to change anything.

Speaker:

With source dedupe, the idea is that, there's three parts

Speaker:

of the deduplication process.

Speaker:

There's the slicing and dicing, right?

Speaker:

There's the creation of a hash.

Speaker:

You run the chunk of data through some sort of cryptographic algorithm, like SHA,

Speaker:

something, and then that gives you a value

Speaker:

and then that value, you have to look up that value in some

Speaker:

sort of hash table, right?

Speaker:

with target deduplication, all three of those actions happen on the

Speaker:

target, which is why it works so well.

Speaker:

You just send the backups the way you're used to sending them,

Speaker:

and then it does the magic.

Speaker:

It slices and dices, it hashes, and it does the lookup, and it figures out which

Speaker:

chunks of data are new based on that hash.
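
Those three steps can be sketched like this. It is a minimal illustration under assumptions, not any appliance's implementation: the `DedupeStore` class is hypothetical, chunks are a fixed 4 bytes to keep the output readable (real products often use larger or variable-size chunks), and SHA-256 stands in for "SHA something".

```python
import hashlib


class DedupeStore:
    """Toy target-side dedupe store: slice, hash, and look up on ingest."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # hash -> chunk bytes (the hash table)

    def ingest(self, data: bytes):
        """Store only chunks not seen before; return the list of hashes (the recipe)."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]           # 1. slice and dice
            digest = hashlib.sha256(chunk).hexdigest()    # 2. create the hash
            if digest not in self.chunks:                 # 3. look it up
                self.chunks[digest] = chunk
            recipe.append(digest)
        return recipe

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())

    def restore(self, recipe):
        return b"".join(self.chunks[d] for d in recipe)


store = DedupeStore()
first_full = store.ingest(b"AAAABBBBCCCCDDDD")
print(store.stored_bytes())  # → 16
second_full = store.ingest(b"AAAAXXXXCCCCDDDD")  # another full, one chunk changed
print(store.stored_bytes())  # → 20
print(store.restore(second_full))  # → b'AAAAXXXXCCCCDDDD'
```

The second full backup adds only the one new chunk, which is why, with dedupe, sending fulls or incrementals ends up storing about the same amount of data.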

Speaker:

Source side, the first two happen on the source, right?

Speaker:

We slice up the data before we back it up.

Speaker:

We slice up the data, we create a hash of the data, and then we ask

Speaker:

some magic person in the cloud, has this hash been seen before?

Speaker:

And the decision is made on the other end.

Speaker:

Yes, we've seen this, or we haven't seen this, and then we

Speaker:

send or don't send the data.
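
A rough sketch of that exchange, assuming a hypothetical `BackupServer` with `has` and `put` calls (no real product's protocol is implied), fixed 4-byte chunks, and SHA-256 as the hash:

```python
import hashlib


class BackupServer:
    """Hypothetical far end: holds the hash table and answers 'seen it or not'."""

    def __init__(self):
        self.chunks = {}

    def has(self, digest: str) -> bool:
        return digest in self.chunks

    def put(self, digest: str, chunk: bytes) -> None:
        self.chunks[digest] = chunk


def source_side_backup(data: bytes, server: BackupServer, chunk_size: int = 4) -> int:
    """Slice and hash on the client; send a chunk only if the server has not seen it.

    Returns the number of chunk bytes sent (the hash queries are not counted)."""
    sent = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if not server.has(digest):      # the decision is made on the other end
            server.put(digest, chunk)   # new chunk: send the data
            sent += len(chunk)
    return sent


server = BackupServer()
print(source_side_backup(b"AAAABBBBCCCCDDDD", server))  # → 16
print(source_side_backup(b"AAAAXXXXCCCCDDDD", server))  # → 4
```

The second backup sends 4 bytes instead of 16, which is where the bandwidth saving over target dedupe comes from: duplicate data never leaves the client.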

Speaker:

To me, source dedupe is much more efficient than target dedupe.

Speaker:

The difficulty is that it is a much, it's a little bit of a baby and the

Speaker:

bathwater situation, right?

Speaker:

Because in order to get it,

Speaker:

You've got to do a forklift upgrade.

Speaker:

You've got to stop using, let's say, again, this is things have changed, but

Speaker:

back in the day, you had to stop using NetBackup and start using Avamar, right?

Speaker:

Stop using Networker or TSM and switch to, Druva, right?

Speaker:

You had to change your backup product to get this done.

Speaker:

Things change a little bit over time, right?

Speaker:

A lot of these products now support source dedupe.

Speaker:

But that was the main downside or still is the main downside.

Speaker:

If you want source dedupe, you've got to change your backup product,

Speaker:

uh, or you've got to change how you use your backup product,

Speaker:

assuming it starts supporting

Speaker:

yeah, and I would say at this point, probably a good chunk of products

Speaker:

either have their own source side deduplication mechanism or they

Speaker:

work with deduplicated targets which allow for source side deduplication.

Speaker:

for instance, integrating with ExaGrid or Data Domain from like TSM, Veeam,

Speaker:

Exactly.

Speaker:

Yeah, there are some that criticize it saying that, the slicing and

Speaker:

dicing and the creation of the hash puts a load on the client.

Speaker:

I have always argued that if done properly, that load created by the

Speaker:

slicing and dicing and hashing is offset by the significant reduction

Speaker:

of the load of transporting or not transporting 99 percent of the data,

Speaker:

right?

Speaker:

Yeah.

Speaker:

Other critiques of it have been that the restore speed wasn't great because of

Speaker:

how the data was stored on the other end.

Speaker:

And I would argue that's an implementation problem.

Speaker:

it's not a problem with the concept.

Speaker:

It's a problem with the implementation of the

Speaker:

And then the other thing to also mention about source side deduplication is

Speaker:

typically these are also using proprietary protocols, so you don't end up with a

Speaker:

lot of security issues you have around, say, having a target dedupe appliance

Speaker:

with NFS or SMB open to the world.

Speaker:

Yep.

Speaker:

Yep.

Speaker:

Agreed.

Speaker:

Yes.

Speaker:

Agreed that there is a security advantage to having the data sliced

Speaker:

and diced way before and then encrypted before you send it to the other

Speaker:

system instead of doing it over an unsecured protocol like NFS or SMB.

Speaker:

Exactly.

Speaker:

All right.

Speaker:

this episode, I think, got a little longer than we had intended for it to

Speaker:

get, but we covered a lot.

Speaker:

We covered a lot in this episode.

Speaker:

so basically, we learned about.

Speaker:

what is and is not a backup.

Speaker:

We learned about, multiplexing, full and incremental backups,

Speaker:

file level incremental backups, and source side deduplication.

Speaker:

Uh, it's a big episode.

Speaker:

what do you think?

Speaker:

Yeah, no, that covers a lot of what everyone talks about when you...

Speaker:

Whenever you refer to backup and restore, you gotta know these backup

Speaker:

technologies in order to be able to restore and protect your company.

Speaker:

These are things that you need to know.

Speaker:

All right.

Speaker:

And with that, I once again want to thank our listeners.

Speaker:

You are why we do this. And Prasanna,

Speaker:

once again,

Speaker:

I'm grateful for your insights and questions as well.

Speaker:

Thank you, sir.

Speaker:

Thank you, sir.

Speaker:

Keeping me honest.

Speaker:

And, remember this show, the backup wrap up is an independent podcast and

Speaker:

the opinions that you hear are ours.

Speaker:

Not anyone else's, and also this is a production of BackupCentral.com

Speaker:

and, uh, produced and edited by yours truly.

Speaker:

And I just want to say, that's a wrap.