Speaker: 00:00:00

There are dozens of things that people do to protect their data

Speaker: 00:00:03

from loss, but many of them are worthless when you actually need them.

Speaker: 00:00:07

In this episode, we'll learn what turns a copy into a backup so that

Speaker: 00:00:13

you can make sure that anything you think is a backup actually is one.

Speaker: 00:00:17

We'll also talk about some important backup concepts like

Speaker: 00:00:20

multiplexing, incremental backups, block-level incremental backups

Speaker: 00:00:25

and source side deduplication.

Speaker: 00:00:27

Hi, I'm W. Curtis Preston, AKA Mr. Backup.

Speaker: 00:00:31

And I've been specializing in backup and disaster recovery for over 30 years.

Speaker: 00:00:36

My podcast turns unappreciated backup admins into cyber recovery heroes.

Speaker: 00:00:41

This is the backup wrap up.

Speaker: 00:01:04

Hi, and welcome to the show.

Speaker: 00:01:05

And I have with me a guy who I think is super jelly of the new

Speaker: 00:01:10

toy that I put in yesterday.

Speaker: 00:01:12

Prasanna Malaiyandi.

Speaker: 00:01:14

How's it going, Prasanna?

Speaker: 00:01:15

I'm good Curtis, and yes, I am jealous of your toy, but I will have to start

Speaker: 00:01:21

sending you a bill for consulting fees, when you inevitably, or if

Speaker: 00:01:26

you inevitably run into issues.

Speaker: 00:01:29

Huh?

Speaker: 00:01:30

Yeah, because as you recall, I purchased this without even talking

Speaker: 00:01:35

to you, which is rather atypical of me.

Speaker: 00:01:38

Because we talk about so much. And basically what we're talking about

Speaker: 00:01:43

is a Firewalla, which I've had my eye on for a while and then after I realized,

Speaker: 00:01:49

So I've switched internet service providers, and now they're telling me

Speaker: 00:01:53

that I'm already hitting the bandwidth limit, which is highly possible

Speaker: 00:01:56

given that I do this. And I realized that I had no bandwidth monitoring tools.

Speaker: 00:02:01

I have this really nice mesh router system.

Speaker: 00:02:04

The Wi-Fi mesh, but that's been put in access

Speaker: 00:02:11

point mode, which of course offers no bandwidth monitoring.

Speaker: 00:02:15

And then I had this Cox router, which offered me nothing.

Speaker: 00:02:19

And so I replaced the Cox router with the Firewalla Purple SE, and man,

Speaker: 00:02:26

Super simple to put in there.

Speaker: 00:02:28

And now I have these, like, super stats.

Speaker: 00:02:31

At some point I'm going to have to disable

Speaker: 00:02:35

the notifications, because it's like, "Curtis is playing games on his phone."

Speaker: 00:02:42

"Curtis is watching YouTube videos on MacBook Pro A," right?

Speaker: 00:02:49

It's literally like, "Curtis has downloaded 3.56 gigabytes of video on his...," and I'm like, okay, this

Speaker: 00:02:57

is going to get old pretty quickly.

Speaker: 00:02:58

I have two questions for you.

Speaker: 00:03:01

Yeah,

Speaker: 00:03:01

The first question is: did you figure out what was consuming

Speaker: 00:03:05

your data cap or data usage cap?

Speaker: 00:03:08

Not yet.

Speaker: 00:03:09

'Cause it hasn't even been 24 hours, but I do have a pretty good guess

Speaker: 00:03:14

and I think it's right in front of me.

Speaker: 00:03:19

We'll see.

Speaker: 00:03:21

And again, I didn't want to talk about this too much, but what is weird is I

Speaker: 00:03:24

get these notifications, like, "MacBook Pro uploaded 3.5 megabytes of data to LinkedIn

Speaker: 00:03:34

at 3:45 a.m."

Speaker: 00:03:36

And I'm like, what, like, why is my laptop uploading three and a half

Speaker: 00:03:42

megabytes of anything while it's just sitting here and I'm sleeping somewhere?

Speaker: 00:03:49

That's weird.

Speaker: 00:03:51

Weird.

Speaker: 00:03:52

Yeah.

Speaker: 00:03:52

So anyway, so yeah, go ahead.

Speaker: 00:03:54

Yeah.

Speaker: 00:03:56

Since you've decided to get rid of the Cox router, can you not just

Speaker: 00:04:00

use your Wi-Fi mesh as a router?

Speaker: 00:04:04

I could have, but then I would have to completely redo my

Speaker: 00:04:07

network architecture, which as you recall, was a really big thing.

Speaker: 00:04:12

That's true.

Speaker: 00:04:13

And I actually really liked the firewall features of this.

Speaker: 00:04:16

That's what really drew me to it.

Speaker: 00:04:18

And I'd been thinking about it and this was that final excuse to get it.

Speaker: 00:04:22

And I'm really enjoying the security aspects.

Speaker: 00:04:25

Glad.

Speaker: 00:04:26

So there, folks: if Mr. Backup can learn networking and firewalls, so can you.

Speaker: 00:04:34

Yeah.

Speaker: 00:04:34

It's certainly not my forte.

Speaker: 00:04:37

But it's time for the news of the week.

Speaker: 00:04:42

The big news, I think, of the entire IT world, everybody seems to be

Speaker: 00:04:47

talking about it, is this MGM hack.

Speaker: 00:04:51

I can't, can you imagine?

Speaker: 00:04:53

If you've been living in a hole: they shut down MGM and Caesars and all

Speaker: 00:04:58

of the hotels attached to MGM and Caesars, which is like half the strip.

Speaker: 00:05:03

And they shut down like card keys, slot machines,

Speaker: 00:05:08

ATMs.

Speaker: 00:05:09

Everything.

Speaker: 00:05:10

ATMs,

Speaker: 00:05:12

So for our listeners who may not know about this,

Speaker: 00:05:15

MGM is a hotel chain, right?

Speaker: 00:05:17

They have a bunch of things as well, right?

Speaker: 00:05:19

Like various hotels like Caesars and MGM, and they are in Las Vegas

Speaker: 00:05:24

and they're casinos, right?

Speaker: 00:05:26

So you can stay there, you can gamble there, right?

Speaker: 00:05:29

They make lots and lots and lots and lots of money.

Speaker: 00:05:32

But not in the last week or so.

Speaker: 00:05:35

And they got hit by a cyberattack the week of September

Speaker: 00:05:39

20th or so, I'm guessing.

Speaker: 00:05:41

Yeah, and I think that's the saddest part about this. And by the way, as of

Speaker: 00:05:49

today there have been half a dozen lawsuits filed, because there's the threat of a

Speaker: 00:05:57

PII leak, personal information leak.

Speaker: 00:05:59

And so there's been all sorts of worries about that.

Speaker: 00:06:03

So there's half a dozen, what do you call it?

Speaker: 00:06:05

Class action lawsuits, or lawsuits that are attempting

Speaker: 00:06:08

to achieve class action status, that have been filed.

Speaker: 00:06:11

I think the saddest part here, and we'll put the

Speaker: 00:06:16

link to this particular article in the show description, is the

Speaker: 00:06:21

heading here: "Targeting Layer 8."

Speaker: 00:06:24

I've heard of the seven-layer networking model, again with my extensive

Speaker: 00:06:28

networking experience: the OSI model.

Speaker: 00:06:31

What is Layer 8?

Speaker: 00:06:34

People?

Speaker: 00:06:35

It's people.

Speaker: 00:06:35

Yes.

Speaker: 00:06:36

So layer 8 is people, which is probably the weakest part

Speaker: 00:06:42

of the entire stack, I'd say.

Speaker: 00:06:45

Yeah.

Speaker: 00:06:45

You are the weakest link!

Speaker: 00:06:50

Yeah, so how did they get in?

Speaker: 00:06:51

So they basically targeted an employee, right?

Speaker: 00:06:55

who had the right level of access, and they basically were able to gain access into

Speaker: 00:07:03

their Okta environment as a super admin.

Speaker: 00:07:06

But how did they do that?

Speaker: 00:07:08

That's the

Speaker: 00:07:08

Oh, so how did they do that?

Speaker: 00:07:10

They basically tricked their IT help desk.

Speaker: 00:07:14

That is so bad, right?

Speaker: 00:07:17

they somehow got...

Speaker: 00:07:19

Access to a privileged account, right?

Speaker: 00:07:21

According to the powers that be, they stole a password or they

Speaker: 00:07:25

hacked Active Directory somehow.

Speaker: 00:07:27

So they were able to attempt to log in, but they were stopped by MFA,

Speaker: 00:07:32

which is a good thing, Okta. But then

Speaker: 00:07:35

They were able to convince the help desk that they were the person in

Speaker: 00:07:38

question and get them to reset MFA.

Speaker: 00:07:41

Now here's a question.

Speaker: 00:07:42

Do you think that employee is still there at the company?

Speaker: 00:07:47

And

Speaker: 00:07:47

is one of those

Speaker: 00:07:48

Would you be blaming the person?

Speaker: 00:07:51

So I'm going to fast-forward, like, 30 years.

Speaker: 00:07:54

Okay.

Speaker: 00:07:55

So 20... what would that be?

Speaker: 00:07:58

2053.

Speaker: 00:08:00

There's a guy who's going to be called Mr. MFA, and he's going to have a podcast dedicated to security, because, like, my

Speaker: 00:08:09

career started with a screwup like this.

Speaker: 00:08:12

Not quite this magnitude, but my career started with this.

Speaker: 00:08:16

And so, my personal opinion: I don't know if this person has been fired.

Speaker: 00:08:23

I think they should only be fired if they didn't follow the processes that had been

Speaker: 00:08:27

established and

Speaker: 00:08:28

set out for them.

Speaker: 00:08:29

Potentially they should be disciplined.

Speaker: 00:08:31

I don't know if firing,

Speaker: 00:08:32

if termination is appropriate, but they should be disciplined.

Speaker: 00:08:35

If they followed the procedures that had been laid out for

Speaker: 00:08:38

them, process, people, right?

Speaker: 00:08:41

Then technology.

Speaker: 00:08:43

If they had been... we just had a podcast about that.

Speaker: 00:08:46

If they followed the procedures

Speaker: 00:08:48

that had been given to them,

Speaker: 00:08:49

then I think some massive leniency is in order. Then you update your procedures,

Speaker: 00:08:54

et cetera, et cetera, et cetera.

Speaker: 00:08:55

I think of a massive outage that was caused at a major software

Speaker: 00:09:02

vendor that I worked with.

Speaker: 00:09:03

I'm trying to be very cagey here. The backup

Speaker: 00:09:09

operator followed his procedure. There were two parts of the app

Speaker: 00:09:16

that had to be shut down in order to do a backup, because they couldn't

Speaker: 00:09:19

synchronize the two backup systems.

Speaker: 00:09:22

And so every two weeks they would shut down these apps

Speaker: 00:09:27

and then do a backup offline.

Speaker: 00:09:30

And this person,

Speaker: 00:09:31

the backup operator, just did what they were told to do and shut down these

Speaker: 00:09:36

apps at the most critical time of the year when the apps were needed, right?

Speaker: 00:09:42

The person was just doing their job.

Speaker: 00:09:44

That person should not be fired.

Speaker: 00:09:45

That person should be... you know, you change the procedure.

Speaker: 00:09:49

So I don't know what happened here.

Speaker: 00:09:50

Yeah.

Speaker: 00:09:50

You train.

Speaker: 00:09:51

Yeah.

Speaker: 00:09:52

I hope some leniency was there.

Speaker: 00:09:55

If the person was fired, I'd love to have them on the podcast.

Speaker: 00:10:01

But anyway, what can we learn from this news here?

Speaker: 00:10:04

Prasanna?

Speaker: 00:10:05

Basically, one is that even if you have the greatest technologies in

Speaker: 00:10:11

place and the greatest processes in place, people will always exist.

Speaker: 00:10:18

Never underestimate the power of people to do dumb things.

Speaker: 00:10:22

I do think that perhaps what's in order here is an update to process.

Speaker: 00:10:27

And the process should be, when... 'cause you have to be able to reset MFA.

Speaker: 00:10:32

When resetting MFA, it should require many more bells and whistles

Speaker: 00:10:36

and levels of authentication.

Speaker: 00:10:38

And we need to identify that this person who

Speaker: 00:10:42

calls in and says that they're Steve... we need a way to verify

Speaker: 00:10:45

that Steve is actually Steve.

Speaker: 00:10:48

So you create a process around that, that really verifies that someone is who they are,

Speaker: 00:10:52

and especially,

Speaker: 00:10:53

you reset MFA.

Speaker: 00:10:54

and especially when it's someone with that level of privilege.

Speaker: 00:10:59

Especially, super especially. That's not a word, but yeah.

Speaker: 00:11:02

Oh, I feel for these guys.

Speaker: 00:11:06

Keep abreast of this story, because it is going to get worse before it gets better.
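As a rough illustration of the process change discussed above, here is a minimal Python sketch of a help-desk rule that refuses an MFA reset until the caller has been verified out of band, with an extra approval required for privileged accounts. The ResetRequest fields and the specific checks (a callback to the number on file, manager approval, a second admin's sign-off) are hypothetical choices for illustration, not MGM's or Okta's actual procedure.

```python
from dataclasses import dataclass


@dataclass
class ResetRequest:
    # Hypothetical fields; a real help desk would pull these from its identity and HR systems.
    claimed_user: str
    is_privileged: bool          # e.g., a super admin in the identity provider
    callback_verified: bool      # help desk called back the number already on file
    manager_approved: bool       # the user's manager confirmed the request
    second_admin_approved: bool  # an independent second admin signed off


def may_reset_mfa(req: ResetRequest) -> bool:
    """Allow an MFA reset only if every required check has passed."""
    # Baseline for everyone: verify out of band instead of trusting the caller.
    if not (req.callback_verified and req.manager_approved):
        return False
    # Privileged accounts need an extra, independent approval on top of the baseline.
    if req.is_privileged and not req.second_admin_approved:
        return False
    return True


# A caller claiming to be "Steve," a super admin, with nothing verified yet.
caller = ResetRequest("steve", is_privileged=True, callback_verified=False,
                      manager_approved=False, second_admin_approved=False)
print(may_reset_mfa(caller))  # False
```

The privilege check adds to the baseline rather than replacing it, so a super-admin reset is never easier than a regular one.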

Speaker: 00:11:13

And that's the news for this week.

Speaker: 00:11:18

So what I thought we would talk about this week in the Backup to Basics series is

Speaker: 00:11:23

what I've got defined as backup methods that support a traditional restore.

Speaker: 00:11:29

So basically, the backup methods that I grew up with that are still

Speaker: 00:11:33

relevant.

Speaker: 00:11:34

Yeah.

Speaker: 00:11:34

in, yeah, right?

Speaker: 00:11:37

we like to live in a world where everybody's using the

Speaker: 00:11:39

latest and greatest, right?

Speaker: 00:11:42

And nobody's doing this old full-and-incremental backup stuff.

Speaker: 00:11:46

Nobody's doing that.

Speaker: 00:11:48

And that's just not right.

Speaker: 00:11:50

So we need to talk about these methods and see what we can get out there.

Speaker: 00:11:55

The first thing, again, is that we're doing this based

Speaker: 00:12:00

on my book, Modern Data Protection. There's the cover for those of you

Speaker: 00:12:04

watching via video, all three listeners that are watching via video.

Speaker: 00:12:10

I think it's 10.

Speaker: 00:12:12

There's maybe 10.

Speaker: 00:12:13

the number's actually gone up since we've been putting them on YouTube.

Speaker: 00:12:15

Oh, there you go.

Speaker: 00:12:18

I've got this thing in here.

Speaker: 00:12:21

So this is from Chapter 9, talking about backup and

Speaker: 00:12:24

recovery software methods.

Speaker: 00:12:25

And the first thing I had in there was: is everything a backup?

Speaker: 00:12:28

So there was a time when backup was well defined.

Speaker: 00:12:31

Backup was copy something to tape and then put that tape in a box,

Speaker: 00:12:35

right?

Speaker: 00:12:35

It was so simple back then.

Speaker: 00:12:37

Yeah, it was so simple back then.

Speaker: 00:12:39

Yes.

Speaker: 00:12:40

So I, as, quote, "Mr. Backup," see backup a lot more broadly than I think a lot of people do.

Speaker: 00:12:49

A lot of people, when they talk about backup, go, "Oh, this isn't backup."

Speaker: 00:12:51

To me, backup is anything, really, that protects the data the

Speaker: 00:12:56

way backup protects data, right?

Speaker: 00:12:57

And so I'm defining backup rather broadly as anything that is a copy of data

Speaker: 00:13:02

stored separately from the original

Speaker: 00:13:04

that can be used to restore the original if it is damaged.
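That definition can be read as a checklist. Here is a minimal Python sketch that encodes it, assuming each copy can be described by three yes/no attributes; the attribute names are hypothetical, and damage_propagates captures the replication and sync caveat that comes up later in the episode.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    # Hypothetical attributes, chosen only to mirror the definition above.
    description: str
    stored_separately: bool     # lives somewhere with a different risk profile than the original
    damage_propagates: bool     # would deleting or corrupting the original also hit this copy?
    can_restore_original: bool  # can it put the original back the way it was?


def qualifies_as_backup(c: Copy) -> bool:
    """A copy counts only if it is separate, survives damage to the original, and can restore it."""
    return c.stored_separately and not c.damage_propagates and c.can_restore_original


examples = [
    Copy("copy on tape", True, False, True),
    Copy("snapshot on the same array", False, True, True),
    Copy("replica with no separate retention", True, True, True),
]
for c in examples:
    print(f"{c.description}: {qualifies_as_backup(c)}")
```

Only the tape copy passes; the snapshot fails on separation, and the replica fails because damage to production flows straight through to it.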

Speaker: 00:13:08

There are a lot of things that qualify as backup under that definition,

Speaker: 00:13:12

so let me just give you some examples and see if you think they qualify.

Speaker: 00:13:17

Okay.

Speaker: 00:13:17

So take a copy on tape,

Speaker: 00:13:20

Yes.

Speaker: 00:13:21

A copy in AWS S3.

Speaker: 00:13:23

A copy of the data that's in S3, which is separate from

Speaker: 00:13:28

The

Speaker: 00:13:28

your not.

Speaker: 00:13:30

Yeah.

Speaker: 00:13:30

yes.

Speaker: 00:13:31

okay?

Speaker: 00:13:31

A copy replicated from one storage system to another storage

Speaker: 00:13:36

system from the same vendor.

Speaker: 00:13:37

As long as...

Speaker: 00:13:39

There's a caveat here, because you used the word replication.

Speaker: 00:13:44

Is it replicated in such a way that if

Speaker: 00:13:49

I damage production, the copy gets damaged too? Because

Speaker: 00:13:51

that doesn't qualify as being stored separately.

Speaker: 00:13:54

Replicated with separate retention of the copies on the destination.

Speaker: 00:13:59

Okay.

Speaker: 00:14:00

Yes, I would call that a backup.

Speaker: 00:14:01

Okay, snapshots on a production system, on a production storage

Speaker: 00:14:05

array, and that does not include AWS S3.

Speaker: 00:14:10

Thank you for that. Yeah, so, snapshots on the same array:

Speaker: 00:14:15

No. End of story. Not a backup until it's copied somewhere else.

Speaker: 00:14:20

Okay, and then doing what you were recently doing when

Speaker: 00:14:23

editing the podcast, right?

Speaker: 00:14:25

Downloading a copy from the cloud onto your local system, copying it

Speaker: 00:14:29

to a different directory, and then copying it to yet a third directory.

Speaker: 00:14:34

On your local system.

Speaker: 00:14:35

Are each of those copies on the local system considered backups?

Speaker: 00:14:39

Again, we're storing the data in a separate place that

Speaker: 00:14:42

has a separate risk profile.

Speaker: 00:14:45

Etc.

Speaker: 00:14:46

Yes,

Speaker: 00:14:46

as long as the original copy was in the cloud.

Speaker: 00:14:49

It's also about the purpose of why I'm doing it, right?

Speaker: 00:14:52

If the purpose of downloading that is to serve as possibly a backup, right?

Speaker: 00:14:59

Because there are a lot of times that we download data that

Speaker: 00:15:02

is not for backup purposes.

Speaker: 00:15:04

Now, it could accidentally become a backup if it's the only

Speaker: 00:15:07

copy that you have available.

Speaker: 00:15:08

But just because I copy something doesn't necessarily make it a backup.

Speaker: 00:15:12

It might be an archive.

Speaker: 00:15:13

And then the last example.

Speaker: 00:15:14

Taking pictures on your iPhone and using iCloud to sync

Speaker: 00:15:19

your copies to iCloud Photos.

Speaker: 00:15:23

Not a backup.

Speaker: 00:15:26

Because,

Speaker: 00:15:26

Why is that?

Speaker: 00:15:27

For two reasons.

Speaker: 00:15:29

One, which is really the primary one,

Speaker: 00:15:30

and that is specifically in terms of Apple iCloud.

Speaker: 00:15:36

But the biggest thing is that it's synchronized.

Speaker: 00:15:39

That's the key.

Speaker: 00:15:40

Like you asked earlier: you delete a picture on your phone, or some

Speaker: 00:15:46

app, like ransomware, deletes a bunch of pictures on your phone.

Speaker: 00:15:50

It synchronizes that deletion up to the cloud and they go bye-bye, right?

Speaker: 00:15:54

It is a synchronized copy, not a backup.

Speaker: 00:15:57

It is stored separately, but if you delete it here and it gets deleted

Speaker: 00:16:02

there, that's not a backup, right?

Speaker: 00:16:04

Just like we were talking about before.

Speaker: 00:16:06

And that's one really important reason, possibly the most important reason.
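To make the sync-versus-backup distinction concrete, here is a toy Python model, assuming plain dictionaries stand in for the phone, a synchronized cloud mirror, and one retained point-in-time backup. It is not how iCloud works internally; it only shows why a mirror inherits the deletion while a retained copy does not.

```python
def sync(source, mirror):
    """Make the mirror match the source exactly, including deletions."""
    mirror.clear()
    mirror.update(source)


phone = {"beach.jpg": "v1", "dog.jpg": "v1"}
cloud_mirror = dict(phone)   # synchronized copy: always matches the phone
backups = [dict(phone)]      # retained point-in-time copy, never touched by sync

# Ransomware (or a slip of the thumb) deletes a photo on the phone, then sync runs.
del phone["dog.jpg"]
sync(phone, cloud_mirror)

print("dog.jpg" in cloud_mirror)  # False: the deletion was synchronized
print("dog.jpg" in backups[0])    # True: the retained copy still has it
```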

Speaker: 00:16:12

But the other is that there's a feature on the iPhone that...

Speaker: 00:16:17

lets you store low-res copies on the phone and the high-res copies in

Speaker: 00:16:21

the cloud, which means that not only is it a synchronized copy, the only true

Speaker: 00:16:26

copy of your photo is in the cloud.

Speaker: 00:16:28

It's only one copy, which means you need to be backing up iCloud.

Speaker: 00:16:32

And by extension, Google Photos, if you're an Android

Speaker: 00:16:36

person. So yeah, not a backup.

Speaker: 00:16:38

Okay, no.

Speaker: 00:16:39

We had a whole podcast episode about that:

Speaker: 00:16:41

How to properly back up your iCloud account.

Speaker: 00:16:44

yeah,

Speaker: 00:16:45

Those were good examples.

Speaker: 00:16:46

I think those are a lot of things, like you said, right?

Speaker: 00:16:48

It's not always easy to say, is it a backup or not?

Speaker: 00:16:51

Unless you dive into the next level of questions and ask, okay, is it really a backup?

Speaker: 00:16:55

Yeah,

Speaker: 00:16:56

Does it meet these criteria

Speaker: 00:16:57

I think.

Speaker: 00:16:57

or not?

Speaker: 00:16:58

I think you did a good job with the different categories, like that

Speaker: 00:17:02

thing of: if it's fully synchronized, whether synchronous or asynchronous,

Speaker: 00:17:06

and I delete production and

Speaker: 00:17:09

it deletes the copy,

Speaker: 00:17:12

that's not a backup.

Speaker: 00:17:13

right?

Speaker: 00:17:14

Unless that copy has the ability to undo that.

Speaker: 00:17:18

If it does, then, I would change my answer, right?

Speaker: 00:17:20

And so with, like, a synchronized NetApp filer, I would consider that

Speaker: 00:17:25

other copy a backup.

Speaker: 00:17:28

Other things that are not a backup: one that you didn't mention would be

Speaker: 00:17:34

the recycle bin in your Microsoft 365.

Speaker: 00:17:37

That is not a backup, right?

Speaker: 00:17:39

It's not stored separately.

Speaker: 00:17:40

It's just records in a database that have been flagged as deleted.

Speaker: 00:17:44

They haven't gone anywhere.

Speaker: 00:17:46

They're sitting right next to the production data.

Speaker: 00:17:48

So yeah,

Speaker: 00:17:49

Okay.

Speaker: 00:17:50

And then the other one is,

Speaker: 00:17:55

So in your opinion, does backup require you to always be able to go

Speaker: 00:18:03

back to a point in time that could plausibly have existed in the system?

Speaker: 00:18:12

And the reason I'm asking this is, I know email archiving comes

Speaker: 00:18:16

up a lot and sometimes people are like, oh, that's the same as backup.

Speaker: 00:18:20

But with email archive, you're just getting all the data that's there,

Speaker: 00:18:22

whether or not your mailbox actually looked like that, your inbox looked

Speaker: 00:18:26

like that or not at any point in time.

Speaker: 00:18:29

Yeah.

Speaker: 00:18:30

So backup requires restore, right?

Speaker: 00:18:37

For it to be a backup, you need to be able to restore it to the way it

Speaker: 00:18:41

looked at some point in time, right?

Speaker: 00:18:45

yeah, that's a really good question, Prasanna.

Speaker: 00:18:48

It's one thing to say a file, but if you cannot bring

Speaker: 00:18:54

the thing that's been damaged back to before it

Speaker: 00:19:01

was damaged, so that it comes back the same way as it was before it was

Speaker: 00:19:06

damaged, then you don't have a backup.

Speaker: 00:19:09

You have a copy of the data, right?

Speaker: 00:19:12

And an email archive is a perfect example of that.

Speaker: 00:19:15

You have a copy of the data, but it was stored for a different purpose.

Speaker: 00:19:18

It was stored for archive, which means it wasn't designed to be put back into

Speaker: 00:19:25

the state it was in,

Speaker: 00:19:27

yeah, the state that it was in,

Speaker: 00:19:28

right?

Speaker: 00:19:28

so you might be able to restore all the email, but you won't be able to

Speaker: 00:19:31

restore folders and things like that.
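A toy Python model of that difference, assuming a mailbox is just folders holding message IDs (all data hypothetical): the backup captures the structure as it looked at a point in time, while the archive keeps every message but not where it lived.

```python
# A tiny mailbox: folder -> list of message IDs (hypothetical data).
mailbox = {"Inbox": ["m1"], "Projects": ["m2", "m3"]}

# Backup: capture the whole structure as it looked at this point in time.
backup = {folder: list(msgs) for folder, msgs in mailbox.items()}

# Archive: keep every message, but only as a flat collection.
archive = {m for msgs in mailbox.values() for m in msgs}

# Damage: the Projects folder is deleted.
del mailbox["Projects"]

restored_from_backup = {f: list(m) for f, m in backup.items()}
restored_from_archive = {"Restored": sorted(archive)}  # messages come back, folders do not

print(restored_from_backup)   # {'Inbox': ['m1'], 'Projects': ['m2', 'm3']}
print(restored_from_archive)  # {'Restored': ['m1', 'm2', 'm3']}
```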

Speaker: 00:19:33

A good backup should bring the thing back to the way it was before it was damaged.

Speaker: 00:19:38

However, let's go back to a time when tape drives started getting... so here,

Speaker: 00:19:43

we're going to talk about a feature that is now, for many people, passé, right?

Speaker: 00:19:49

It's not really necessary, because they no longer use tape as their primary target

Speaker: 00:19:54

or their initial target of backups.

Speaker: 00:19:57

And that is this concept of multiplexing.

Speaker: 00:19:59

And it goes back to... there was a time when we

Speaker: 00:20:03

Way back in the day.

Speaker: 00:20:05

right back in the day.

Speaker: 00:20:07

So multiplexing, do you want to define multiplexing or explain it?

Speaker: 00:20:11

Yeah, multiplexing.

Speaker: 00:20:12

Yeah, let me attempt to. I know I wasn't aware of this before we started

Speaker: 00:20:16

doing the podcast and you explained everything about tape and I know we've

Speaker: 00:20:20

had a bunch of folks, tape experts, on the podcast as well, but multiplexing is...

Speaker: 00:20:26

a way to solve an issue where tape requires you to write at a certain speed.

Speaker: 00:20:34

If you don't, it's bad.

Speaker: 00:20:36

And tapes got faster and faster, but the problem was that pumping data into the tape

Speaker: 00:20:39

device itself wasn't keeping up with how quickly tape speeds were increasing.

Speaker: 00:20:45

And so in order to solve that, what they decided to do was say, okay, let's have

Speaker: 00:20:50

multiple clients feed data into the tape device at the same time, and we will

Speaker: 00:20:55

multiplex or basically write all those streams into the tape drive at the same

Speaker: 00:20:58

time, keeping the tape device happy.

Speaker: 00:21:01

While still being able to do all the backups.

Speaker: 00:21:04

Yeah, another word for it would be interleaving.

Speaker: 00:21:06

You did great.

Speaker: 00:21:07

Basically, chopping them up into pieces and then

Speaker: 00:21:10

putting them together into one, turning a bunch of streams into one stream.
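Here is a minimal Python sketch of the interleaving idea, assuming a simple round-robin over fixed-size chunks with each chunk tagged by the client it came from. Real backup products interleave based on which client has data ready, so treat the round-robin as a simplification. The goal of multiplexing is to keep the drive streaming instead of stopping and repositioning whenever data arrives too slowly (often called shoe-shining).

```python
from itertools import zip_longest


def multiplex(streams, chunk_size):
    """Interleave several client streams into one, tagging each chunk with its source."""
    chunked = {
        name: [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        for name, data in streams.items()
    }
    tape = []
    # Round-robin: one chunk from each client in turn, until every client is drained.
    for round_ in zip_longest(*chunked.values()):
        for name, chunk in zip(chunked.keys(), round_):
            if chunk is not None:
                tape.append((name, chunk))
    return tape


clients = {"A": "aaaaaa", "B": "bbbb", "C": "cc"}
print(multiplex(clients, chunk_size=2))
# [('A', 'aa'), ('B', 'bb'), ('C', 'cc'), ('A', 'aa'), ('B', 'bb'), ('A', 'aa')]
```

Three slow clients become one continuous stream, which is exactly what keeps a fast drive busy, and also why a single client's data ends up scattered along the tape.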

Speaker: 00:21:14

And when we first started, we used multiplexing settings of four,

Speaker: 00:21:20

which means four different...

Speaker: 00:21:22

Yeah, four different clients being combined into a stream to

Speaker: 00:21:25

make a tape drive happy, but tape drives got faster and faster.

Speaker: 00:21:29

The clients didn't get faster.

Speaker: 00:21:31

And so by the time I used my last tape drive in

Speaker: 00:21:36

production, we were up to 36, right?

Speaker: 00:21:39

We were up to 36 streams together to, to make an individual tape drive happy.

Speaker: 00:21:45

And the reason,

Speaker: 00:21:47

I was gonna ask why.

Speaker: 00:21:48

Yeah.

Speaker: 00:21:49

Why were clients not fast enough?

Speaker: 00:21:51

yeah.

Speaker: 00:21:52

So the reason that this was bad is... what's the only reason we back up?

Speaker: 00:22:00

To restore,

Speaker: 00:22:01

right?

Speaker: 00:22:02

So when you

Speaker: 00:22:02

go to

Speaker: 00:22:03

do a restore,

Speaker: 00:22:04

Yeah.

Speaker: 00:22:05

yeah.

Speaker: 00:22:05

When you go to do a restore, you have to read all 36 streams

Speaker: 00:22:10

and throw 35 of them away.

Speaker: 00:22:13

So the speed of your restore is going to be 1/36th

Speaker: 00:22:20

of what it could potentially be if it hadn't been multiplexed.

Speaker: 00:22:25

But if you're never doing restore tests, it doesn't really matter.

Speaker: 00:22:27

Until you actually need to restore the data.

Speaker: 00:22:31

Yeah, you're killing me, you're killing me. Yeah, so it was one

Speaker: 00:22:36

of these cut-off-your-nose-to-spite-your-face things, right?

Speaker: 00:22:44

So we felt that it was a necessary evil.

Speaker: 00:22:50

You could only restore if you got backups done, and we could only get

Speaker: 00:22:54

backups done reliably if we were using multiplexing, but we knew that it was

Speaker: 00:22:59

creating this problem and ultimately this was the undoing of tape from

Speaker: 00:23:03

a backup and recovery perspective.

Speaker: 00:23:05

We switched to disk staging and

Speaker: 00:23:08

these other things to undo this necessary evil.

Speaker: 00:23:11

But it was a mess.

Speaker: 00:23:13

But that's what multiplexing is.

Speaker: 00:23:14

So if you've heard about multiplexing, you don't need to do multiplexing

Speaker: 00:23:18

if you're backing up to disk.

Speaker: 00:23:19

Because disk can write at whatever speed you tell it to write at.

Speaker: 00:23:23

And it can write a bunch of things at the same time.

Speaker: 00:23:26

And you can give it 36 streams and it can write them all at the same time in

Speaker: 00:23:30

separate places of the disk in such a way that when you go to do a restore,

Speaker: 00:23:33

you don't have to read all of them to read one of them.

Speaker: 00:23:39

What?

Speaker: 00:23:41

That was my yes.

Speaker: 00:23:44

Disk is fast enough, but

Speaker: 00:23:48

Yeah.

Speaker: 00:23:49

Well, it's not,

Speaker: 00:23:50

a disk

Speaker: 00:23:50

drive has a certain

Speaker: 00:23:51

number of IOPS it could handle.

Speaker: 00:23:53

And therefore, as long as your system is big enough

Speaker: 00:23:57

to handle all of them in parallel.

Speaker: 00:23:58

yes.

Speaker: 00:23:59

Disk drives don't have unlimited bandwidth, unlimited I/O,

Speaker: 00:24:04

et cetera, et cetera, et cetera.

Speaker: 00:24:05

Yes.

Speaker: 00:24:06

But the point is, with the way it lays out the data,

Speaker: 00:24:10

you can lay the data out however you want and then read it however you want.

Speaker: 00:24:13

Again, there are limits to everything, depending on how

Speaker: 00:24:18

much you fragment the data and all that kind of stuff, right?

Speaker: 00:24:20

But it's still way better than tape from that perspective.
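The restore-side cost of multiplexing, and why disk avoids it, can be sketched in Python. This assumes 36 equal-size client streams interleaved evenly on a sequential tape, versus the same streams written to separate regions on disk, and it ignores seek and positioning time. Restoring one client from tape means reading every chunk and keeping only 1 in 36 of them.

```python
def restore_from_tape(tape, client):
    """Tape is sequential: read every chunk, keep only the ones for this client."""
    chunks_read = 0
    kept = []
    for name, chunk in tape:
        chunks_read += 1
        if name == client:
            kept.append(chunk)
    return "".join(kept), chunks_read


def restore_from_disk(disk, client):
    """On disk, each stream was written to its own region, so read only that region."""
    chunks = disk[client]
    return "".join(chunks), len(chunks)


n_clients, chunks_each = 36, 100
# Multiplexed tape: c0, c1, ..., c35, c0, c1, ... repeated chunks_each times.
tape = [(f"c{i}", "x") for _ in range(chunks_each) for i in range(n_clients)]
disk = {f"c{i}": ["x"] * chunks_each for i in range(n_clients)}

_, tape_reads = restore_from_tape(tape, "c0")
_, disk_reads = restore_from_disk(disk, "c0")
print(tape_reads, disk_reads)  # 3600 100
```

Both paths return the same data; the tape path simply reads 36 times as much to get it, which is where the 1/36th effective restore speed comes from.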

Speaker: 00:24:24

All right, next one's a whole lot easier.

Speaker: 00:24:27

What comes next?

Speaker: 00:24:28

What's the first type of, what's

Speaker: 00:24:29

let you tackle

Speaker: 00:24:30

what, no, I'll let you tackle this, Curtis.

Speaker: 00:24:32

So what's the, what is it?

Speaker: 00:24:34

The first type of backup that everyone should cut their teeth on.

Speaker: 00:24:40

What, a full backup?

Speaker: 00:24:41

Is that

Speaker: 00:24:42

Yeah.

Speaker: 00:24:43

what you're saying?

Speaker: 00:24:44

Yeah.

Speaker: 00:24:45

So basically we're just going to talk about this concept of

Speaker: 00:24:47

full and incremental backups.

Speaker: 00:24:49

And probably everybody knows this, but this is the Backup to Basics series.

Speaker: 00:24:55

So a full backup backs up everything, an incremental backup

Speaker: 00:24:59

backs up things that have changed.

Speaker: 00:25:02

And there are different types of incremental backups, right?

Speaker: 00:25:07

And different people have different names for these different types, right?

Speaker: 00:25:13

Terms you've probably heard: incremental, differential, cumulative incremental.

Speaker: 00:25:18

For a lot of people, cumulative incremental and

Speaker: 00:25:21

differential are the same thing.

Speaker: 00:25:23

For people that got stuck in Windows land, not necessarily. So what's the

Speaker: 00:25:29

difference between an incremental and these other two things?

Speaker: 00:25:32

A cumulative incremental.

Speaker: 00:25:34

So with an incremental, basically, typically Sunday you do a full backup, right?

Speaker: 00:25:41

Monday you need to do another backup.

Speaker: 00:25:43

Now, you don't necessarily want to do the entire full backup again,

Speaker: 00:25:47

because maybe that's too much data, you don't have enough time, etc.

Speaker: 00:25:50

So you'll do an incremental, which is basically whatever has

Speaker: 00:25:54

changed since the last full.

Speaker: 00:25:56

So since Sunday.

Speaker: 00:25:57

Sorry, since the last time you did a backup, I should say.

Speaker: 00:25:59

Exactly, whatever's changed since the last time you did a backup.

Speaker: 00:26:02

Yeah, so in that case, it was Sunday, so then Monday you get the incremental,

Speaker: 00:26:06

now Tuesday you're going to do backup, and so you do another incremental, which

Speaker: 00:26:09

is whatever has changed since Monday,

Speaker: 00:26:12

Exactly,

Speaker: 00:26:13

And we just keep doing that, right?

Speaker: 00:26:15

Yeah, and

Speaker: 00:26:17

then if it's, yeah, if it's Sunday, right?

Speaker: 00:26:20

And now it's Saturday, how many tapes do I need to do a restore?

Speaker: 00:26:25

do you need...

Speaker: 00:26:26

The previous Sunday, plus the Monday, plus the Tuesday, plus

Speaker: 00:26:29

the Wednesday, Thursday, Friday.

Speaker: 00:26:32

You basically need to replay

Speaker: 00:26:33

By the way, I really channeled the old Curtis there.

Speaker: 00:26:37

I did it without even meaning to, I said tapes, right?

Speaker: 00:26:40

'Cause

Speaker: 00:26:40

that was the problem back then.

Speaker: 00:26:42

We literally had to grab for seven tapes, right?

Speaker: 00:26:46

Nowadays, we don't have to grab for seven tapes, but,

Speaker: 00:26:48

But you still have to do all those restores, though, right?

Speaker: 00:26:50

So even in the case of, if a file existed Sunday, and then was deleted Monday,

Speaker: 00:26:56

and then came back on Tuesday, you would still end up having to restore all of that

Speaker: 00:27:01

data. Basically you're replaying a log of all the data that would

Speaker: 00:27:05

have existed on each of those days.

Speaker: 00:27:08

Right.

Speaker: 00:27:08

The real problem is a file that was changed every single day.

Speaker: 00:27:13

You would actually restore that file seven times.

Speaker: 00:27:16

It's a lot of wasted effort.
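A minimal Python sketch of the replay problem, assuming a simplified model in which each backup is a dict mapping a file path to its contents (all file names and values are hypothetical, and deletions are ignored). Restoring means applying the Sunday full and then every incremental in order, so a file that changes every day is written once per backup:

```python
from collections import Counter


def restore_by_replay(backups):
    """Replay a full plus incrementals, oldest first.

    Returns the restored files and how many times each file was written.
    Simplified model: deletions are not tracked.
    """
    restored = {}
    writes = Counter()
    for backup in backups:
        for path, contents in backup.items():
            restored[path] = contents  # a newer version overwrites the older one
            writes[path] += 1
    return restored, writes


days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
# Hypothetical week: a Sunday full, then one incremental per day.
full = {"report.doc": "v0", "budget.xls": "Sun", "notes.txt": "v0"}
incrementals = [{"budget.xls": day} for day in days[1:]]  # changes every day
incrementals[2]["notes.txt"] = "v1"                       # changes only on Wednesday

files, writes = restore_by_replay([full] + incrementals)
print(files["budget.xls"], writes["budget.xls"])  # Sat 7
print(writes["report.doc"], writes["notes.txt"])  # 1 2
```

Only the last write of each file survives, so six of the seven writes of `budget.xls` are the wasted effort described above.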

Speaker: 00:27:17

That's just the idea of a regular incremental.

Speaker: 00:27:21

Then we have a differential or a cumulative incremental.

Speaker: 00:27:25

And the difference is that it's going to do

Speaker: 00:27:27

the thing that you said earlier, which is it's going to back up everything

Speaker: 00:27:30

that's changed since the full.

Speaker: 00:27:32

And so what some people did is they stopped doing

Speaker: 00:27:36

incrementals and they switched to differentials or cumulative incrementals

Speaker: 00:27:41

every day, and that way at the end of the week, I would need at most two tapes.
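The difference in what a Saturday restore needs can be sketched the same way. This hypothetical `backups_needed` helper assumes a week with a Sunday full and one backup per day after it, where an incremental depends on the backup just before it and a differential (cumulative incremental) depends only on the most recent full:

```python
def backups_needed(schedule, restore_day):
    """Return the indices of the backups needed to restore as of restore_day.

    schedule[i] is "full", "incr" (changes since the previous backup of any
    kind) or "diff" (changes since the most recent full). Assumes a full
    exists at or before restore_day.
    """
    needed = []
    i = restore_day
    while True:
        needed.append(i)
        kind = schedule[i]
        if kind == "full":
            break
        if kind == "diff":
            # A differential only depends on the most recent full.
            i = max(j for j in range(i) if schedule[j] == "full")
        else:
            # An incremental depends on whatever backup ran just before it.
            i -= 1
    return sorted(needed)


incremental_week = ["full"] + ["incr"] * 6   # Sunday full, Mon-Sat incrementals
differential_week = ["full"] + ["diff"] * 6  # Sunday full, Mon-Sat differentials

print(len(backups_needed(incremental_week, 6)))   # 7
print(len(backups_needed(differential_week, 6)))  # 2
```

The trade-off is that each differential grows through the week, since it keeps re-copying everything changed since Sunday.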

Speaker: 00:27:46

Now, this whole thing has pretty much gone away in the world of

Speaker: 00:27:53

disk-based backups, right?

Speaker: 00:27:55

Because the whole reason that we did backups this way is

Speaker: 00:27:59

that... first off, let me back up.

Speaker: 00:28:01

We used to do weekly fulls followed by daily incrementals.

Speaker: 00:28:04

Then we switched, because when we went to automated tape libraries,

Speaker: 00:28:10

the whole process of managing the different tapes wasn't as big

Speaker: 00:28:14

of a deal.

Speaker: 00:28:15

So we went to monthly fulls followed by daily incrementals, or maybe

Speaker: 00:28:18

a weekly cumulative incremental, right?

Speaker: 00:28:21

So you'd still need a maximum of seven tapes to do a restore. But when

Speaker: 00:28:25

we switched to disk, this whole thing just became kind of silly and moot and

Speaker: 00:28:30

whatever, and you could back up however you wanted to back up. And dedupe,

Speaker: 00:28:34

which we're going to talk about in a minute, really changed the game.

Speaker: 00:28:39

Because it didn't matter whether you backed up full or incremental or whatever,

Speaker: 00:28:43

you still stored the same amount of data.

Speaker: 00:28:46

Before we jump, though, one thing that I think people might also hear in addition

Speaker: 00:28:53

to fulls, incrementals, differentials, and cumulative incrementals is also levels.

Speaker: 00:29:00

So maybe you could talk about levels.

Speaker: 00:29:01

I know sometimes it's specific to, like, Oracle

Speaker: 00:29:04

and some databases, but maybe it might

Speaker: 00:29:06

No, that's a good point.

Speaker: 00:29:06

Yeah, thanks.

Speaker: 00:29:08

So the concept of a backup level, literally, this goes

Speaker: 00:29:13

back to the days of dump, right,

Speaker: 00:29:16

which was the command to back up Unix file systems.

Speaker: 00:29:20

A level zero was a full. And if you wanted to do

Speaker: 00:29:26

what we now call incremental backups, you would do a zero

Speaker: 00:29:29

followed by a one, followed by a two, followed by a three, followed by a four.

Speaker: 00:29:34

And it got interesting, because if you then lowered the number,

Speaker: 00:29:40

it would behave like a

Speaker: 00:29:41

cumulative incremental, right?

Speaker: 00:29:43

So you could do a zero and then a one.

Speaker: 00:29:49

If you then did another one, if you kept doing ones, you would get a differential.

Speaker: 00:29:53

You would get a cumulative incremental every day.

Speaker: 00:29:56

If you did a 0, a 1, then a 2, and then a 1 again, it

Speaker: 00:30:02

basically always pointed back to the most recent backup with a

Speaker: 00:30:08

level lower than its own, and so it got complicated, and so

Speaker: 00:30:12

there were actually some people that

Speaker: 00:30:14

Is it they prefer

Speaker: 00:30:15

called Towers of

Speaker: 00:30:16

Hanoi. Yeah, which is based on the game, and I've got it in the book,

Speaker: 00:30:22

the Towers of Hanoi progression, but I can't remember it exactly; it's like 0, 3, 2, 5, 4, so

Speaker: 00:30:30

basically, without doing cumulative incrementals, every backup,

Speaker: 00:30:34

every file that was changed would end up being on two tapes, which was just

Speaker: 00:30:39

an interesting way to minimize tape. Again, this is all because we're doing

Speaker: 00:30:43

tapes, but nobody has tapes anymore.

Speaker: 00:30:44

So nobody cares.

Speaker: 00:30:45

But that's what levels were.

Speaker: 00:30:46

It went all the way up to nine.

Speaker: 00:30:48

And they still have this concept in things like Oracle backup.
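The level rule described above can be modeled in a few lines. This sketch assumes only the rule as stated in the conversation (a level-N dump is relative to the most recent earlier dump with a lower level); it does not call the real `dump` command. The Tower of Hanoi sequence at the end, 0 3 2 5 4 7 6 9 8, is an assumed example of that style of schedule, not necessarily the exact one in the book:

```python
def dump_base(levels, i):
    """Return the index that backup i is relative to, or None if it acts as a full."""
    # A level-N dump copies everything changed since the most recent
    # earlier dump whose level is lower than N.
    for j in range(i - 1, -1, -1):
        if levels[j] < levels[i]:
            return j
    return None  # level 0, or no lower-level dump on record


def restore_chain(levels, i):
    """Backups to restore, oldest first, to rebuild the state as of backup i."""
    chain = []
    while i is not None:
        chain.append(i)
        i = dump_base(levels, i)
    return chain[::-1]


def copies_of_change(levels, day):
    """Backups that contain a file changed between backup day-1 and backup day."""
    copies = []
    for k in range(day, len(levels)):
        base = dump_base(levels, k)
        if base is None or base < day:
            copies.append(k)
    return copies


print(restore_chain([0, 1, 2, 3, 4], 4))  # [0, 1, 2, 3, 4]: classic daily incrementals
print(restore_chain([0, 1, 1, 1], 3))     # [0, 3]: repeated 1s behave cumulatively
print(restore_chain([0, 1, 2, 1], 3))     # [0, 3]: the last 1 points back past the 2 to the 0

hanoi = [0, 3, 2, 5, 4, 7, 6, 9, 8]       # assumed Tower of Hanoi style sequence
print(restore_chain(hanoi, 8))            # [0, 2, 4, 6, 8]
print([len(copies_of_change(hanoi, d)) for d in range(1, len(hanoi))])  # [2, 1, 2, 1, 2, 1, 2, 1]
```

In this run each change lands on one or two backups and never more, which is what made the scheme attractive for saving tape.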

Speaker: 00:30:53

So the next thing to talk about is this concept called file

Speaker: 00:30:56

level incremental forever.

Speaker: 00:30:58

And the company that really put this out there was IBM with their product TSM.

Speaker: 00:31:06

And back in the day,

Speaker: 00:31:07

has been renamed,

Speaker: 00:31:08

idea is you

Speaker: 00:31:08

do one full, what's that?

Speaker: 00:31:10

Hasn't it been renamed?

Speaker: 00:31:12

It has, but I'm just saying they came out with it when they

Speaker: 00:31:15

came out, it was called TSM.

Speaker: 00:31:17

It's now IBM Spectrum Protect, but the idea was you do one full and then

Speaker: 00:31:23

everything is an incremental forever.

Speaker: 00:31:25

We never again do a full, and this really saved a lot of

Speaker: 00:31:30

bandwidth and saved a lot of tape.

Speaker: 00:31:32

It came with a mess, and that was that over time, and again, this is tape, over time,

Speaker: 00:31:41

you could end up needing hundreds of tapes to restore a single file system.

Speaker: 00:31:48

You would need just one file from this tape and one file from that tape.

Speaker: 00:31:51

And the hardest part of a tape is loading it. It was like

Speaker: 00:31:54

two and a half minutes just to get a tape in, get it loaded, and seek

Speaker: 00:31:58

to the average point in the tape.

Speaker: 00:32:01

So I was not a fan of doing backups this way when we were talking about tape.
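Why incremental forever hurt on tape is easiest to see from the restore side. In this sketch (a hypothetical catalog format, not TSM's actual database), each file has versions recorded as `(backup_number, tape_label)`, the restore picks the newest version of each file, and the tape count is multiplied by the roughly 2.5-minute load-and-seek time mentioned above:

```python
def plan_restore(catalog, as_of):
    """Pick the newest version of each file at or before backup number `as_of`.

    catalog maps path -> list of (backup_number, tape_label) versions.
    Returns {path: tape_label} for the versions to restore.
    """
    plan = {}
    for path, versions in catalog.items():
        eligible = [v for v in versions if v[0] <= as_of]
        if eligible:
            _, tape = max(eligible)  # highest backup number wins
            plan[path] = tape
    return plan


MINUTES_PER_TAPE = 2.5  # rough load-and-seek figure quoted in the episode

# Hypothetical catalog: files last changed at very different times.
catalog = {
    "/etc/hosts": [(1, "T001")],
    "/home/a/todo": [(1, "T001"), (40, "T017"), (95, "T052")],
    "/var/db/app.cfg": [(1, "T001"), (210, "T118")],
}
plan = plan_restore(catalog, as_of=300)
tapes = sorted(set(plan.values()))
print(tapes)                                                        # ['T001', 'T052', 'T118']
print(len(tapes) * MINUTES_PER_TAPE, "minutes of mount/seek overhead")  # 7.5 minutes of mount/seek overhead
```

At 200 tapes, the same arithmetic gives 200 × 2.5 = 500 minutes, more than eight hours of robot and drive time before counting any actual data transfer.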

Speaker: 00:32:09

Was there a reason?

Speaker: 00:32:11

What was the use case at the time for that?

Speaker: 00:32:13

It was about saving tape, saving

Speaker: 00:32:15

storage.

Speaker: 00:32:16

It was about saving bandwidth.

Speaker: 00:32:17

There's nothing wrong with the idea of incremental forever.

Speaker: 00:32:20

It's just their implementation

Speaker: 00:32:23

back in the day when it was all tape, even when they had disk staging.

Speaker: 00:32:27

So they would stage to disk.

Speaker: 00:32:29

And they wouldn't multiplex, by the way.

Speaker: 00:32:31

They would stage to disk and then they would do the backups to tape.

Speaker: 00:32:36

And this only applied to file system backups.

Speaker: 00:32:39

It didn't apply to database backups.

Speaker: 00:32:41

But literally you would need hundreds and hundreds of tapes

Speaker: 00:32:46

to restore a single file system.

Speaker: 00:32:48

I was just never a fan of doing backups that way,

Speaker: 00:32:52

as long as we were backing up to tape. And they had ways to deal with it: they had collocation

Speaker: 00:32:57

and this thing called reclamation, because when you're doing

Speaker: 00:33:01

backups that way, you end up with a lot of tapes that have files on them that have

Speaker: 00:33:07

expired that are no longer needed, but you have other files on there that are needed.

Speaker: 00:33:12

And so you'd have to copy forward.

Speaker: 00:33:15

Yeah.

Speaker: 00:33:15

so that you could reclaim that whole tape and then reuse it.

Speaker: 00:33:18

And

Speaker: 00:33:19

That sounds like a

Speaker: 00:33:20

management nightmare.

Speaker: 00:33:21

An interesting engineering problem, but...

Speaker: 00:33:24

Yeah, I was never a fan of doing backups that way.

Speaker: 00:33:28

And I'm even less of a fan now that we don't have to worry about tape.

Speaker: 00:33:33

Now we can just do incremental forever and just do it without all that

Speaker: 00:33:35

collocation and reclamation stuff.

Speaker: 00:33:37

'Cause on disk, to reclaim, you just delete a file, right?

Speaker: 00:33:41

On tape, you delete a file in the middle of a tape.

Speaker: 00:33:43

You have to reclaim the tape.

Speaker: 00:33:45

So that's file-level incremental forever.
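Reclamation can be sketched as a pass over tape contents. The `reclaim` function and the 30% live-data threshold below are assumptions for illustration; real products expose their own configurable thresholds. A tape whose live (unexpired) data falls below the threshold has its live files copied forward so the whole tape can go back to scratch:

```python
def reclaim(tapes, threshold=0.3):
    """Copy live files off mostly-expired tapes so those tapes can be reused.

    tapes maps tape_label -> list of (path, size, expired) entries.
    Returns (live entries copied forward, tape labels returned to scratch).
    """
    copied_forward = []
    scratch = []
    for label, entries in tapes.items():
        total = sum(size for _, size, _ in entries)
        live = [(path, size) for path, size, expired in entries if not expired]
        live_bytes = sum(size for _, size in live)
        if total and live_bytes / total < threshold:
            copied_forward.extend(live)  # these go to a new tape first
            scratch.append(label)        # then the old tape can be overwritten
    return copied_forward, scratch


# Hypothetical tapes: T001 is 20% live, T002 is 50% live.
tapes = {
    "T001": [("a", 10, True), ("b", 10, True), ("c", 5, False)],
    "T002": [("d", 10, False), ("e", 10, True)],
}
moved, freed = reclaim(tapes)
print(moved, freed)  # [('c', 5)] ['T001']
```

On disk none of this is needed, because an expired backup file can simply be deleted.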

Speaker: 00:33:49

And then, with the advent of backing up to disk, which finally

Speaker: 00:33:54

happened, I don't know, 20 years ago.

Speaker: 00:33:58

It's so funny.

Speaker: 00:33:59

We say the advent of something that happened 20 years ago,

Speaker: 00:34:02

when we finally started doing it. And once everybody finally went to, and by

Speaker: 00:34:07

the way, not everybody is backing up to disk yet; there's still

Speaker: 00:34:10

a small contingent of people that back up to tape, so those people will really

Speaker: 00:34:13

enjoy the first half of this episode.

Speaker: 00:34:16

Now we have this concept of block level incremental forever.

Speaker: 00:34:19

Would you like to explain that?

Speaker: 00:34:21

Yeah, with block-level incrementals, where I think of block-level

Speaker: 00:34:31

incrementals (I know there are various places you can think about it) is

Speaker: 00:34:33

when it applies to virtual machines and other larger objects,

Speaker: 00:34:39

where it doesn't make sense to back up an entire VM doing full or incremental

Speaker: 00:34:45

backups the way you would have done file-level backups, right?

Speaker: 00:34:50

Why would I

Speaker: 00:34:50

Now, why would that be?

Speaker: 00:34:52

Because I have a file which represents a disk, and the entire file

Speaker: 00:34:57

doesn't change every time, right?

Speaker: 00:34:59

Parts of the file

Speaker: 00:35:01

So we're talking about a VMDK file, or a VHD for, for

Speaker: 00:35:06

Hyper-V, VHDX, I can't say VHDX.

Speaker: 00:35:10

I think it's VHDX.

Speaker: 00:35:11

Yeah,

Speaker: 00:35:12

I think you're right.

Speaker: 00:35:13

So you're saying if anything changes in there,

Speaker: 00:35:16

you're backing up the entire whole

Speaker: 00:35:17

do an incremental, exactly.

Speaker: 00:35:19

You're going

Speaker: 00:35:19

It's the entire file that changed, right?

Speaker: 00:35:21

So you're backing up the entire thing, but that doesn't make sense when

Speaker: 00:35:23

you have files which are say 10, 50, 100, 200 gigabytes and you're backing

Speaker: 00:35:29

that up every single time. And so with block-level incrementals, what they

Speaker: 00:35:34

basically have done is say, okay, what blocks have changed in this VMDK?

Speaker: 00:35:41

Let me just back those up, right?

Speaker: 00:35:43

Oracle also does something similar for databases, right?

Speaker: 00:35:47

Where it's, hey, let me only back up the blocks within an Oracle data

Speaker: 00:35:52

file that have changed rather than backing up the entire Oracle database.

Speaker: 00:35:57

And how does the backup product know which blocks have changed?

Speaker: 00:36:01

Usually you have to rely on that vendor to tell you.

Speaker: 00:36:05

So in the case of Oracle, right?

Speaker: 00:36:08

You're usually integrating with Oracle RMAN via SBT or some other

Speaker: 00:36:12

mechanism where Oracle knows, okay, I keep track of the database blocks.

Speaker: 00:36:16

I know which ones are new.

Speaker: 00:36:18

Here is a list of blocks that you need to care about.

Speaker: 00:36:20

Same thing with VMware, with their, what is their SDK called?

Speaker: 00:36:27

VADP.

Speaker: 00:36:28

Yeah.

Speaker: 00:36:29

They've changed

Speaker: 00:36:30

the name.

Speaker: 00:36:31

Yeah.

Speaker: 00:36:31

They've changed the name, but basically they have an API to talk to,

Speaker: 00:36:36

and they maintain a bitmap, right?

Speaker: 00:36:39

And then they just give you, here's a map of the blocks that you need to go get.

Speaker: 00:36:44

These are the blocks that have changed.

Speaker: 00:36:46

They maintain that.

Speaker: 00:36:47

And then there's an API for asking for those blocks.

Speaker: 00:36:52

Now, this is great for disk-based systems, because if you think about it, these are

Speaker: 00:36:56

all random spots in a file, and so you can dump them out. Now it's up to you to figure

Speaker: 00:37:02

out how you want to do this, and I know we'll talk a little bit later

Speaker: 00:37:06

about deduplicated storage, but in the case of Oracle, typically you would just

Speaker: 00:37:10

dump it out as incremental blocks, and just dump it into a file, and now you

Speaker: 00:37:14

have all those blocks captured together.
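The vendor-maintained bitmap can be illustrated with a toy tracker. This is not VMware's Changed Block Tracking API or Oracle's block change tracking; `ChangeTracker`, `incremental_backup`, and the 4 KiB block size are hypothetical. Every write sets the bit for each block it touches, and an incremental backup copies only the marked blocks and then clears the map:

```python
BLOCK_SIZE = 4096  # hypothetical tracking granularity


class ChangeTracker:
    """Toy changed-block tracker: one bit per block, set on every write."""

    def __init__(self, disk_size):
        self.nblocks = (disk_size + BLOCK_SIZE - 1) // BLOCK_SIZE
        self.bitmap = bytearray((self.nblocks + 7) // 8)

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` (> 0) bytes at `offset`."""
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            self.bitmap[block // 8] |= 1 << (block % 8)

    def changed_blocks(self):
        return [b for b in range(self.nblocks) if (self.bitmap[b // 8] >> (b % 8)) & 1]

    def reset(self):
        self.bitmap = bytearray(len(self.bitmap))


def incremental_backup(disk, tracker):
    """Copy only the changed blocks into a {block_number: bytes} incremental."""
    delta = {
        b: bytes(disk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE])
        for b in tracker.changed_blocks()
    }
    tracker.reset()  # the next incremental starts from here
    return delta


disk = bytearray(10 * BLOCK_SIZE)  # a 40 KiB "virtual disk"
tracker = ChangeTracker(len(disk))
disk[5000:5010] = b"x" * 10        # the guest writes into block 1
tracker.record_write(5000, 10)
disk[8190:8200] = b"y" * 10        # a write that straddles blocks 1 and 2
tracker.record_write(8190, 10)
delta = incremental_backup(disk, tracker)
print(sorted(delta))  # [1, 2]
```

The backup reads 2 of 10 blocks instead of the whole disk, which is where the I/O savings on the VM come from.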

Speaker: 00:37:16

In the case of VMware, they started doing that.

Speaker: 00:37:19

A lot of backup vendors would just dump it out as raw blocks, which makes sense.

Speaker: 00:37:25

But then there are other optimizations you can do to do smarter things with

Speaker: 00:37:30

it, because with incremental block-based backups, you still have to

Speaker: 00:37:34

restore from multiple files in order to stitch together the final actual image.
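Stitching the final image back together can be sketched as applying block deltas over a full copy. Both functions below are hypothetical and assume deltas shaped as `{block_number: bytes}`, oldest first. The first replays every delta in order, so a block changed three times is written three times; the second walks from the newest delta backwards and skips blocks it has already filled, one example of the smarter optimizations mentioned above:

```python
def synthesize_image(base, incrementals, block_size):
    """Oldest-first replay: every delta is applied, later ones overwrite earlier ones."""
    image = bytearray(base)
    writes = 0
    for delta in incrementals:
        for block, data in delta.items():
            image[block * block_size:(block + 1) * block_size] = data
            writes += 1
    return bytes(image), writes


def synthesize_image_newest_first(base, incrementals, block_size):
    """Newest-first: each block is written once, from its most recent copy."""
    image = bytearray(base)
    done = set()
    for delta in reversed(incrementals):
        for block, data in delta.items():
            if block not in done:  # skip older copies of a block already restored
                image[block * block_size:(block + 1) * block_size] = data
                done.add(block)
    return bytes(image), len(done)


BS = 4
base = b"AAAABBBBCCCC"  # a full image made of three 4-byte blocks
incs = [{0: b"aaaa"}, {0: b"1111", 2: b"cccc"}, {0: b"2222"}]
print(synthesize_image(base, incs, BS))               # (b'2222BBBBcccc', 4)
print(synthesize_image_newest_first(base, incs, BS))  # (b'2222BBBBcccc', 2)
```

Both produce the same image; the second simply avoids restoring block 0 three times.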

Speaker: 00:37:41

Yeah.

Speaker: 00:37:41

And you still have that problem.

Speaker: 00:37:44

that we talked about earlier, where you may restore an individual block multiple

Speaker: 00:37:48

times if it changes multiple times, right?

Speaker: 00:37:51

The advantage is it's incredibly efficient.

Speaker: 00:37:55

And when we talk about backing up VMs, I agree with you.

Speaker: 00:38:00

That's where this really shines.

Speaker: 00:38:02

Because back in the day, if we backed up VMs and we just pretended they were

Speaker: 00:38:08

physical machines and we were running full and incremental backups on them.

Speaker: 00:38:12

We were beating the crap out of these VMs.

Speaker: 00:38:13

So this is much more I/O-friendly to the VMs, right?

Speaker: 00:38:19

So it's much friendlier on the VMs.

Speaker: 00:38:21

That's why we want to talk to the VMware API and get just

Speaker: 00:38:25

the blocks that have changed.

Speaker: 00:38:27

And it doesn't really come with any major downside compared to

Speaker: 00:38:32

the alternative, because we're storing the data on disk.

Speaker: 00:38:35

Can I

Speaker: 00:38:36

ask one

Speaker: 00:38:37

yeah,

Speaker: 00:38:37

sure.

Speaker: 00:38:38

So we've talked about using block level incrementals for VMware, for databases.

Speaker: 00:38:45

Is there a reason it hasn't really caught on for files?

Speaker: 00:38:50

Because if I take a file and kind of split it up into blocks, right?

Speaker: 00:38:56

Could I get the same benefit?

Speaker: 00:38:58

Or is there a reason that it makes a lot more sense for

Speaker: 00:39:00

like VMs or virtual machines?

Speaker: 00:39:04

The benefit will be relative to the size of the file, right?

Speaker: 00:39:08

The bigger the file, the bigger the benefit that you're going to get.

Speaker: 00:39:12

And I would say that the reason it hasn't caught on is because of the next

Speaker: 00:39:17

thing we're going to discuss, right?

Speaker: 00:39:19

That solved that problem.

Speaker: 00:39:21

But yeah, I think about files like PST files or maybe a big Access

Speaker: 00:39:27

database or backing up like MySQL.

Speaker: 00:39:29

That's not a file.

Speaker: 00:39:30

I mean, it is a file, but it's, it's actually a database.

Speaker: 00:39:32

Right.

Speaker: 00:39:33

I'd say the reason they didn't put a lot of effort into it is deduplication, which,

Speaker: 00:39:40

why don't we just talk about that now?

Speaker: 00:39:42

I know we've covered dedupe before, but just really quickly, for those that don't

Speaker: 00:39:45

understand what dedupe is, the idea is that we're going to identify duplicate

Speaker: 00:39:50

segments of the data, and duplicate means that we've seen this data before.

Speaker: 00:39:58

We've done a full backup or we've done an incremental backup and we've

Speaker: 00:40:02

seen this part of the data before.

Speaker: 00:40:05

And for it to be truly considered dedupe, you've gotta look at,

Speaker: 00:40:09

it's gotta be subfile, right?

Speaker: 00:40:10

It's gotta be part of, like we were talking about, the VMDK

Speaker: 00:40:14

or the VHDX or a PST file.

Speaker: 00:40:17

We've gotta be looking inside the file, slicing that up into chunks,

Speaker: 00:40:21

and then deciding: this chunk,

Speaker: 00:40:22

we've seen it before; this chunk, we have not. And so there are two

Speaker: 00:40:26

different places that dedupe happens.

Speaker: 00:40:28

One is at the target, which is a box like a Data Domain

Speaker: 00:40:33

or a Quantum box or ExaGrid.

Speaker: 00:40:35

These boxes do target dedupe.

Speaker: 00:40:39

And then there's this thing called source dedupe, which

Speaker: 00:40:42

really took off with a company called Avamar.

Speaker: 00:40:46

That company got sold to EMC, which I know you spent a little

Speaker: 00:40:49

time with, back in the day.

Speaker: 00:40:51

And the previous employer we both worked for did source-side deduplication.

Speaker: 00:40:56

Yeah, so the target side is great because you could take it and

Speaker: 00:41:01

plug it in and place it anywhere, right?

Speaker: 00:41:03

Because as long as it supports whatever protocol your client is using, right?

Speaker: 00:41:09

You could just ingest the data and you get all the benefits of deduplication.

Speaker: 00:41:12

So Data Domain was

Speaker: 00:41:15

very popular initially as a virtual tape library, right?

Speaker: 00:41:19

So you had tapes, right?

Speaker: 00:41:21

People are constantly doing fulls and incremental backups.

Speaker: 00:41:23

That's perfect to deduplicate.

Speaker: 00:41:25

You plug in a Data Domain, and it emulates the tape interface.

Speaker: 00:41:29

And now your clients just continue writing to it, and all

Speaker: 00:41:32

your data gets deduplicated, right?

Speaker: 00:41:34

And so it doesn't matter if it's NFS or if it's SMB or if it's tape, right?

Speaker: 00:41:39

It just works.

Speaker: 00:41:41

Yeah, it's like that Firewalla box that I

Speaker: 00:41:43

bought, right?

Speaker: 00:41:44

It just goes in and then it just works, right?

Speaker: 00:41:47

You didn't have to change anything.

Speaker: 00:41:49

With source dedupe, the idea is that there are three parts

Speaker: 00:41:52

of the deduplication process.

Speaker: 00:41:54

There's the slicing and dicing, right?

Speaker: 00:41:57

There's the creation of a hash.

Speaker: 00:41:58

You run the chunk of data through some sort of cryptographic hash algorithm, like SHA or

Speaker: 00:42:04

something, and then that gives you a value

Speaker: 00:42:08

and then you have to look up that value in some

Speaker: 00:42:11

sort of hash table, right?

Speaker: 00:42:13

With target deduplication, all three of those actions happen on the

Speaker: 00:42:17

target, which is why it works so well.

Speaker: 00:42:19

You just send the backups the way you're used to sending them,

Speaker: 00:42:21

and then it does the magic.

Speaker: 00:42:23

It slices and dices, it hashes, and it does the lookup, and it figures out which

Speaker: 00:42:26

chunks of data are new based on that hash.
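The three steps (slice, hash, look up) fit in a short sketch. This assumes fixed-size 8-byte chunks purely for readability; real systems use much larger chunks, and many use variable, content-defined boundaries so an insertion does not shift every following chunk. `DedupeStore` is a hypothetical name, and SHA-256 comes from Python's standard `hashlib`:

```python
import hashlib

CHUNK_SIZE = 8  # tiny for illustration only


def chunks(data, size=CHUNK_SIZE):
    """Step 1: slice and dice (fixed-size chunking, a simplification)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeStore:
    """Toy target-side dedupe store: chunk, hash, look up, keep only new chunks."""

    def __init__(self):
        self.chunk_store = {}  # hash -> chunk bytes, the table we look up in

    def ingest(self, data):
        recipe, new = [], 0
        for chunk in chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()  # step 2: hash
            if digest not in self.chunk_store:          # step 3: look up
                self.chunk_store[digest] = chunk
                new += 1
            recipe.append(digest)  # the ordered hashes needed to rebuild this backup
        return recipe, new

    def rebuild(self, recipe):
        return b"".join(self.chunk_store[d] for d in recipe)


store = DedupeStore()
monday = b"header--" + b"payload!" * 3 + b"footer--"
tuesday = b"header--" + b"payload!" * 3 + b"CHANGED!"
r1, new1 = store.ingest(monday)   # first full: every distinct chunk is new
r2, new2 = store.ingest(tuesday)  # second full: only the changed chunk is stored
print(new1, new2)  # 3 1
```

The second "full" stores only one new chunk, which is why, once dedupe arrived, full versus incremental stopped mattering much for storage.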

Speaker: 00:42:30

With source side, the first two happen on the source, right?

Speaker: 00:42:34

We slice up the data before we back it up.

Speaker: 00:42:36

We slice up the data, we create a hash of the data, and then we ask

Speaker: 00:42:41

some magic person in the cloud, has this hash been seen before?

Speaker: 00:42:46

And the decision is made on the other end.

Speaker: 00:42:50

Yes, we've seen this, or we haven't seen this, and then we

Speaker: 00:42:53

send or don't send the data.
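Source-side dedupe moves the first two steps to the client and turns the third into a question sent to the server. In this sketch, `BackupServer`, `missing`, `upload`, and `source_side_backup` are hypothetical stand-ins for whatever protocol a real product uses; only hashes are sent for the lookup, and only chunks the server has not seen are sent in full:

```python
import hashlib


def chunk_hashes(data, size=8):
    """Client side: slice the data and hash each chunk (steps 1 and 2)."""
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    return [(hashlib.sha256(p).hexdigest(), p) for p in pieces]


class BackupServer:
    """Server side: owns the hash index and decides what it has seen (step 3)."""

    def __init__(self):
        self.index = {}

    def missing(self, digests):
        # dict.fromkeys drops duplicates within one request but keeps order.
        return [d for d in dict.fromkeys(digests) if d not in self.index]

    def upload(self, chunks_by_digest):
        self.index.update(chunks_by_digest)


def source_side_backup(data, server):
    """Returns (ordered hashes for this backup, bytes of chunk data actually sent)."""
    hashed = chunk_hashes(data)
    need = set(server.missing([d for d, _ in hashed]))  # only hashes cross the wire here
    payload = {d: p for d, p in hashed if d in need}     # then only the unseen chunks
    server.upload(payload)
    return [d for d, _ in hashed], sum(len(p) for p in payload.values())


server = BackupServer()
first = b"AAAAAAAA" * 4 + b"BBBBBBBB"
second = b"AAAAAAAA" * 4 + b"CCCCCCCC"
_, sent1 = source_side_backup(first, server)
_, sent2 = source_side_backup(second, server)
print(sent1, sent2)  # 16 8
```

Each backup here is 40 bytes of logical data, but the second one sends only 8 bytes of chunk data plus its list of hashes.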

Speaker: 00:42:58

To me, source dedupe is much more efficient than target dedupe.

Speaker: 00:43:03

The difficulty is that it's a little bit of a baby with the

Speaker: 00:43:08

bathwater situation, right?

Speaker: 00:43:10

Because in order to get it,

Speaker: 00:43:11

you've got to do a forklift upgrade.

Speaker: 00:43:13

You've got to stop using, let's say, and again, things have changed, but

Speaker: 00:43:19

back in the day, you had to stop using NetBackup and start using Avamar, right?

Speaker: 00:43:24

Stop using NetWorker or TSM and switch to Druva, right?

Speaker: 00:43:28

You had to change your backup product to get this done.

Speaker: 00:43:32

Things change a little bit over time, right?

Speaker: 00:43:34

A lot of these products now support source dedupe.

Speaker: 00:43:38

But that was the main downside or still is the main downside.

Speaker: 00:43:41

If you want source dedupe, you've got to change your backup product,

Speaker: 00:43:45

or you've got to change how you use your backup product,

Speaker: 00:43:49

assuming it starts supporting

Speaker: 00:43:50

Yeah, and I would say at this point, probably a good chunk of products

Speaker: 00:43:55

either have their own source-side deduplication mechanism or they

Speaker: 00:44:00

work with dedupe targets that allow for source-side deduplication,

Speaker: 00:44:04

for instance, integrating with ExaGrid or Data Domain from, like, TSM, Veeam,

Speaker: 00:44:11

Exactly.

Speaker: 00:44:12

Yeah, there are some that criticize it, saying that the slicing and

Speaker: 00:44:18

dicing and the creation of the hash puts a load on the client.

Speaker: 00:44:21

I have always argued that if done properly, the load created by the

Speaker: 00:44:26

slicing and dicing and hashing is offset by the significant reduction

Speaker: 00:44:31

in load from not transporting 99 percent of the data,

Speaker: 00:44:36

right?

Speaker: 00:44:37

Yeah.

Speaker: 00:44:38

Other critiques of it have been that the restore speed wasn't great because of

Speaker: 00:44:43

how the data was stored on the other end.

Speaker: 00:44:45

And I would argue that's an implementation problem.

Speaker: 00:44:48

It's not a problem with the concept.

Speaker: 00:44:49

It's a problem with the implementation of the

Speaker: 00:44:51

And then the other thing to also mention about source side deduplication is

Speaker: 00:44:55

typically these are also using proprietary protocols, so you don't end up with a

Speaker: 00:44:58

lot of security issues you have around, say, having a target dedupe appliance

Speaker: 00:45:02

with NFS or SMB open to the world.

Speaker: 00:45:07

Yep.

Speaker: 00:45:07

Yep.

Speaker: 00:45:07

Agreed.

Speaker: 00:45:08

Yes.

Speaker: 00:45:08

Agreed that there is a security advantage to having the data sliced

Speaker: 00:45:13

and diced beforehand and then encrypted before you send it to the other

Speaker: 00:45:17

system instead of doing it over an unsecured protocol like NFS or SMB.

Speaker: 00:45:21

Exactly.

Speaker: 00:45:22

All right.

Speaker: 00:45:23

This episode, I think, got a little longer than we had intended for it to

Speaker: 00:45:26

get, but we covered a lot.

Speaker: 00:45:29

We covered a lot in this episode.

Speaker: 00:45:31

So basically, we learned about

Speaker: 00:45:33

what is and is not a backup.

Speaker: 00:45:35

We learned about multiplexing, full and incremental backups,

Speaker: 00:45:38

file-level incremental backups, and source-side deduplication.

Speaker: 00:45:43

It's a big episode.

Speaker: 00:45:45

What do you think?

Speaker: 00:45:46

Yeah, no, that covers a lot of what everyone talks about when you...

Speaker: 00:45:51

refer to backup and restore. You gotta know these backup

Speaker: 00:45:54

technologies in order to be able to restore and protect your company.

Speaker: 00:45:57

These are things that you need to know.

Speaker: 00:46:00

All right.

Speaker: 00:46:00

And with that, I once again want to thank our listeners.

Speaker: 00:46:05

You are why we do this. And Prasanna,

Speaker: 00:46:07

once again,

Speaker: 00:46:08

great to have your insights and questions as well.

Speaker: 00:46:11

Thank you, sir.

Speaker: 00:46:12

Thank you, sir.

Speaker: 00:46:14

Keeping me honest.

Speaker: 00:46:15

And remember, this show, The Backup Wrap-up, is an independent podcast and

Speaker: 00:46:20

the opinions that you hear are ours.

Speaker: 00:46:22

Not anyone else's. And also, this is a production of BackupCentral.com,

Speaker: 00:46:26

and it's produced and edited by yours truly.

Speaker: 00:46:30

And I just want to say, that's a wrap.