Speaker:

If you're responsible for backup and DR, at some point someone is going to tell you about their amazing product based on continuous data protection, or CDP. They say they can meet an RTO and RPO of zero, which sounds great. Why don't we do all backup and DR using this method? Hi, I'm W. Curtis Preston, AKA Mr. Backup, and I started this podcast to turn unappreciated backup admins into cyber recovery heroes. This episode will answer all your questions about CDP, which some say is the next great thing in backup and DR. This is The Backup Wrap-up.

Speaker:

Hi, and welcome to the show. And once again, I have a guy who cost me money: Prasanna Malaiyandi. How's it going, Prasanna?

Speaker:

I'm good. I'm worried about what I'm going to be blamed for now.

Speaker:

Well, I think that, you know, the fact that I have new AirPods is your fault.

Speaker:

What do you mean? The fact that you lost your AirPods?

Speaker:

I think that you manifested it. You were suggesting that I needed new AirPods, and I think my current AirPods got upset and then they literally flew out of my pocket.

Speaker:

They're like, doo doo doo doo doo doo.

Speaker:

Yeah, it was the weirdest thing. I've done a really good job of holding on to my AirPods. And then I was at a restaurant, having a date with my lovely wife, which was great, and I pulled the thing out of my pocket and the case literally flipped open and the AirPod just went flying. It was such a weird thing that I didn't even realize it had happened when it happened. It wasn't until I got home that I realized that both my AirPods were no longer in it. So, just like that, in a moment, I lost my AirPods.

Speaker:

So I think what you need is a case that has one of those clasps on it, right? So you have to undo the clasp in order for it to open.

Speaker:

Oh.

Speaker:

Right? Because just the normal silicone ones, I don't think, will be sufficient for you.

Speaker:

Yeah, apparently not.

Speaker:

But, hey, I've got the new fancy AirPods Pro (2nd generation) with USB-C, which really should be called Generation 3. Because, you know, I had to very specifically make sure that I bought the one with USB-C.

Speaker:

Well, it's because the actual thing is the same.

Speaker:

Yeah, well, yeah, but you know what I'm saying.

Speaker:

I mean, I know.

Speaker:

Like, I was going to buy it at Costco, but Costco only has the older ones.

Speaker:

Yep.

Speaker:

Yeah, but, uh...

Speaker:

Tough life you live, Curtis. I'm just waiting to hear what I'll be blamed for next.

Speaker:

Absolutely. So, speaking of blaming, we've got blame to go around. I think we should take credit for this piece of news. What do you think?

Speaker:

Oh, Curtis. Is it our ability to explain to the listeners, "Hey, here's what ransomware is," that...

Speaker:

Yeah, I think you're right. We...

Speaker:

You think, you think. But is it a good or a bad thing, though? That's my question about this article.

Speaker:

Well, I actually think it's a good thing. So let's talk about it. The headline, and it's from a story in The Register, is "Ransomware attacks register..." It's funny, I just realized that the word "register" was in the title, and it messed me up there. "Ransomware attacks register record speeds thanks to successful infosec industry." So when I first heard that, I was like, wait... That one literally threw me. The subtitle here is "Dwell times drop to hours rather than days for the first time." So first off, do you want to explain what a dwell time is, for those of our listeners who don't know?

Speaker:

Yeah. So in the past with ransomware, it used to be measured, like you said, in days: four and a half to five and a half days over the last couple of years.

Speaker:

Right.

Speaker:

This is basically the amount of time that ransomware is in your system. Someone has attacked and infiltrated your systems, and they've dropped a package. It hasn't done anything, though.

Speaker:

Right.

Speaker:

It's just sitting there and waiting.

Speaker:

Right.

Speaker:

And while it's waiting, it could be discovering other things, figuring out what's important and what's not. But while it's waiting, there's always a risk that it could be detected, that it could be destroyed. And so previously, like you were saying, it was four and a half to five and a half days of dwell time, where it would just sit in your environment, not doing anything.

Speaker:

Yeah, I remember when... Well, there's definitely a difference between the median and the mean, but I remember when the mean dwell time was measured in many days, right? It was as high as 45.

Speaker:

Right.

Speaker:

And now they're saying, and I don't know if they're using the mean or the median, that it's down to 24 hours. And they're saying that in more than 10 percent of the incidents, the actual ransomware part was deployed within five hours of the initial attack, which is good, right? Because, like the title said, that means people are detecting it faster.

Speaker:

And ransomware crews and ransomware-as-a-service affiliates realize, yeah, we can't just let it sit there. We have to be in and out as quickly as possible.

Speaker:

Right, yeah, and that's why the headline says what it says: because we've gotten better at detecting it, they've basically had to realize that once they're in, they've got to do bad stuff right away. Otherwise, they're going to get detected. Um, go ahead.

Speaker:

Another interesting fact I saw in the article: I know we always talk about double extortion, right? Someone comes in, they encrypt your environment, but they also exfiltrate data. So now you have to pay, because otherwise, who wants to have their data released? I think, actually, as we're recording this, there is a company that is potentially going to have its data exposed because it decided not to pay the ransomware operators.

Speaker:

Right.

Speaker:

Right? And that's the double extortion. Now, in the article, though, they said that the number of times they're seeing double extortion, from the people they've surveyed, is only 13 percent. That seems really low, but given that attackers only have 24 hours, maybe it makes sense. They don't have enough time to do more damage.

Speaker:

Right. Yeah, and I see that as good news, as I'm sure you understand, because the thing I'm most worried about is exfiltration. Backup just can't help with that, right? Once the data has been exfiltrated, all bets are off. So I saw that as good. And that part came from the annual threat intelligence report from Microsoft.

Speaker:

So that is really interesting. The other reason I think this is a good thing is that the shorter the dwell time, the easier the recovery. When you have a dwell time measured in days or weeks, and the attacker is doing something along the way, especially encrypting data along the way, how do you recover from that? The good point in time is three weeks ago, right? Do you really want to roll your primary file server back three weeks, for example? This was the one I was always worried about. If you encrypt VMs or databases, it's easy to notice: the moment you encrypt anything, everything stops working, and you know what the point in time is. But with a file server, or someone's workstation that has a lot of files on it, if you're able to encrypt data over time and not be noticed, restoring from that is significantly more complicated than restoring from an encryption attack that takes place over hours. So I think this is a much better scenario.

Speaker:

It does mean we have to stay vigilant and keep detecting, so that attackers continue to have dwell times this small. And this also goes to the importance of backups, right? Because if it does hit, like you were saying, you want to be able to restore. And if you don't have a backup that you can restore from, then you're going to lose data.

Speaker:

Right. There was another thing they were saying here: because of ransomware-as-a-service businesses, June broke the single-month record for ransomware attacks. That was thanks to a single exploit, the MOVEit MFT exploit, which I actually don't know much about, but that single exploit allowed them to break the record for the number of attacks in a month.

Speaker:

That doesn't sound good.

Speaker:

None of this sounds good, I guess. It's just that I do like a quicker attack, because a quicker attack is, I think, easier to defend against. Or let me rephrase that: a quicker attack is easier to recover from.

Speaker:

Yeah, a hundred percent agree with you, Curtis. So do you still want to claim credit, that our podcast is helping improve things?

Speaker:

It's due to how much we've gotten the word out there that long dwell times are bad that the attackers have made their dwell times short.

Speaker:

So any attackers out there, if you would like to come on the podcast and talk about this, please reach out and let us know.

Speaker:

Can you imagine? Can you imagine that? Once again, another thing from here: the two highest-profile attacks of 2023 were the result of unpatched infrastructure, right?

Speaker:

Which we like to talk about on the podcast, right?

Speaker:

Yeah, yeah.

Speaker:

Use MFA, patch your systems, do backups.

Speaker:

Exactly. That would stop the vast majority of ransomware attacks that we see.

Speaker:

Well, with that, that is the news of the day. This week's episode is a continuation of our Backup to Basics series, and this week we're going to be talking about a product category that at one point was red hot. Was it not? Do you remember when this product category was red hot? Like, everybody had to have a CDP product. Do you...

Speaker:

I want to say it was like 2002, 2003.

Speaker:

Yeah. What I remember was being at Storage Networking World and half of the booths were CDP products, remember?

Speaker:

Is CDP, Curtis, or...

Speaker:

Yeah, we're gonna talk about that in just a second, but just the sheer number... I remember thinking all of these can't succeed, and little did I know that almost none of them would succeed.

Speaker:

I want to say there's like four left in the world.

Speaker:

Yeah, well, and most of them got acquired and are simply a checkbox in another product's portfolio. So what is CDP? It stands for continuous data protection. You may recall that in a previous episode, we talked about replication and what, as far as I'm concerned, is the primary problem with using replication as continuous data protection, or basically as a replacement for backup. What's the primary problem with it?

Speaker:

Whatever you do here happens there.

Speaker:

Exactly. It is very efficient at replicating stupidity, right? Or ransomware attacks, or any sort of cyber attack. So replication is great at giving you an RPO of zero, right? A recovery point objective of zero. But it's also going to replicate anything that happens at a logical level. And so CDP was born, and I describe CDP as replication with a back button. What do you think of that definition?

Speaker:

I like it, but you know what I used to call CDP?

Speaker:

What?

Speaker:

I used to say it's TiVo for your data.

Speaker:

Yeah. I remember vendors describing it like that. The problem is, now nobody knows what TiVo is.

Speaker:

I know, that's why I said it for the five listeners who may know what TiVo is. And the two of us both had TiVos, right, so we understand that name. And also, if you do watch Psych, there are references.

Speaker:

Are there TiVo references in Psych?

Speaker:

Oh, yeah.

Speaker:

All right. Well, you would know better, 'cause you've been binging Psych lately, so...

Speaker:

But yes, I agree with your point. It is a back button for replication. And specifically, what you mean is that with replication, you have that one copy; with CDP, you can go backwards from that one copy to other points in time.

Speaker:

Yeah. The reason I call it replication with a back button is that the process of getting the data... well, we've discussed that. I see all of these things as backup. A lot of people see backup as putting something on tape, or as a backup that changes the data's format, right? A lot of people try to define old-school backup as something that requires a restore; there are different ways people try to define what old-school backup is. And I just see that as the old way we did backup. This is now a new way that we do backup. Backup is just a method of putting the data in a different place so that we can restore it when something bad happens. And this is one of the newer ways. The thing is, unlike traditional backup, CDP is not a batch process. Traditionally, backup ran once a night. Sometimes you might run it multiple times a day: once an hour, or even every five minutes. But traditional backup is a batch process. CDP, by definition, and that's what the C stands for, is happening continuously, all the time, just like replication. There were some finer points there that you and I argued about regarding what "continuous" means, but the idea is that it is happening truly continuously: every time a block of data is changed on the primary system, it gets replicated to the target system right away.

Speaker:

Yeah, we can debate that.

Speaker:

But basically, it's not a batch process. It's happening continuously throughout the day. And then we can talk about how that is stored on the other end. Are you okay with that part of the definition?

Speaker:

I'm good with that.
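
To make the batch-versus-continuous distinction concrete, here is a minimal Python sketch comparing worst-case data loss (the RPO a design can actually promise) for interval backups and for asynchronous CDP. It rests on simplifying assumptions made only for illustration: each batch job captures its point in time when it starts, a failure lands just before the next job finishes, and the two-second replication lag is a made-up figure. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo_batch(interval: timedelta, job_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case data loss for a batch backup that runs every `interval`.

    Assumes each job captures its point in time when it starts. If a failure
    hits just before the next job completes, everything written since the
    previous job started is lost: one full interval plus the job's runtime.
    """
    return interval + job_duration


def worst_case_rpo_cdp(replication_lag: timedelta) -> timedelta:
    """Worst-case data loss for asynchronous CDP.

    Every write is shipped as it happens, so only writes still in flight
    (bounded by the replication lag) are at risk.
    """
    return replication_lag


if __name__ == "__main__":
    schedules = {
        "nightly": timedelta(hours=24),
        "hourly": timedelta(hours=1),
        "every 5 minutes": timedelta(minutes=5),
    }
    for name, interval in schedules.items():
        print(f"batch, {name}: up to {worst_case_rpo_batch(interval)} of data lost")
    # Hypothetical lag; real lag depends on write rate, link speed, and target speed.
    print(f"async CDP: up to {worst_case_rpo_cdp(timedelta(seconds=2))} of data lost")
```

Even a five-minute schedule leaves a window that grows with the interval; CDP's exposure depends only on how far the target is behind, which is why the asynchronous lag discussed later matters so much.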

Speaker:

And I think the one other thing we should touch on is that, as technologies have evolved, so has CDP, in terms of where in the stack you're actually intercepting or forwarding the I/O and the data. Way back in the day, when you were probably at SNIA, all these CDP vendors were saying, okay, here's an appliance that you put in. The writes come into it, get split off, and go to two different places. That's one method some people used to make sure you have two copies, continuously replicating. Another method some vendors used is that you write to your primary, and the primary forwards it off to an appliance or to something else, which then writes it on the target system.

Speaker:

Right.

Speaker:

That's another mechanism people used. All of that is infrastructure level, down at the storage array or networking level. Actually, some people even did it at the storage area network level, where they would have that appliance in the middle. And basically that's that first use case, where you would write to two different storage arrays. The other thing, moving up the stack, is virtualization. People said, hey, the same thing you did with storage-level CDP, let's do that at the VM level as well. And so you had technologies that would allow you to split writes at the VM level. You could forward them off to another ESXi cluster in a different location and have a continuously replicated VM somewhere else.

Speaker:

Right, basically the concept was the same for all of them. The question is, at what point are we going to split the write? We take one copy and send it where we would always send it, to primary storage, and the other copy of that write gets sent to some magic process or box or whatever that will then store it for CDP purposes. Sometimes it can happen in the storage array. There have been boxes that you can buy that go between your storage array and your server. That box might be an actual appliance, or it might be a piece of software, right? We had DataCore on here. DataCore was one of those vendors where you could put their software on a box in between your storage array and your server. And, like you said, it might be in the hypervisor, or it might even be something that's being done in the cloud. But the idea is that, literally as the data is being written, it gets piped off into two places, the second of which is the CDP copy.
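
The write-splitting idea can be sketched in a few lines of Python. This is a toy model, not how any appliance, hypervisor filter, or DataCore product is actually implemented: `SplittingVolume` is a hypothetical name, the "primary" is a dict mapping block numbers to bytes, and the CDP side is just an in-memory list that a real system would ship to a separate journal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SplittingVolume:
    """Toy write splitter (hypothetical): every write goes two places."""

    primary: Dict[int, bytes] = field(default_factory=dict)
    # CDP copy of every write, in order: (sequence number, block number, data)
    cdp_stream: List[Tuple[int, int, bytes]] = field(default_factory=list)
    _seq: int = 0

    def write(self, block: int, data: bytes) -> None:
        # Path 1: the write lands on primary storage exactly as it normally would.
        self.primary[block] = data
        # Path 2: a copy of the same write is sent to the CDP side, tagged with
        # a sequence number so the order of writes can be reconstructed later.
        self._seq += 1
        self.cdp_stream.append((self._seq, block, data))


if __name__ == "__main__":
    vol = SplittingVolume()
    vol.write(7, b"v1")
    vol.write(7, b"v2")
    print(vol.primary)     # only the latest contents of block 7
    print(vol.cdp_stream)  # both writes, in order
```

Where the split happens (array, SAN appliance, hypervisor, cloud) changes only who calls `write`; the primary still sees just the latest block while the CDP side sees every version.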

Speaker:

Since we're talking about CDP, do you consider database-level things like Oracle's Data Guard to be CDP? Or Exchange used to have something, what was it called, CCR, and all the rest, where a write comes in and they forward over the log. Because that technically is CDP.

Speaker:

That is application-level replication. It is not application-level CDP, because I don't think that with an active database you can just go backwards in time. I know that if it crashes, you can do media recovery against it, but I don't think going back in time is built in. So I'll just say, if that's built into it, then sure.

Speaker:

Right.

Speaker:

But if it's just replicating the changes and doesn't have the ability to go back in time, then no, it's not CDP. That is a very crucial aspect of CDP.

Speaker:

Yeah, and one way to think about this: with databases, we think about redo logs, right, which allow you to go forward in time. With CDP, you actually want undo logs. How do I go backwards in time from the most recent version on the target system?

Speaker:

That's a really good point. And I don't think anybody calls them undo logs. Everybody calls them either redo logs or transaction logs.

Speaker:

No, I mean, in the...

Speaker:

Oh, in the database, they call them redo logs or transaction logs, because the idea is that you have a traditional backup from a point in time and then use those logs to redo the transactions that happened during and since that point in time. But with CDP, you are correct: the most important thing is being able to go back in time, which is not something that a typical database replication scenario is going to be able to do.
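
The undo-log framing can be made concrete with a small Python sketch: a replica that, before applying each incoming write, saves the block's previous contents (its before-image). Rolling those before-images back in reverse order reconstructs the volume as of any earlier write. `UndoJournal`, `apply`, and `image_as_of` are hypothetical names, and this sketch indexes history by sequence number only; a real product would also track timestamps and keep the journal on disk.

```python
from typing import Dict, List, Optional, Tuple


class UndoJournal:
    """Hypothetical CDP target: a current replica plus an undo log."""

    def __init__(self) -> None:
        self.replica: Dict[int, bytes] = {}
        # One entry per applied write: (sequence number, block, before-image).
        # A before-image of None means the block did not exist yet.
        self.undo: List[Tuple[int, int, Optional[bytes]]] = []
        self.seq = 0

    def apply(self, block: int, data: bytes) -> int:
        """Apply a replicated write, saving what it overwrote; return its sequence number."""
        self.seq += 1
        self.undo.append((self.seq, block, self.replica.get(block)))
        self.replica[block] = data
        return self.seq

    def image_as_of(self, seq: int) -> Dict[int, bytes]:
        """Rebuild the volume as it looked right after write number `seq`."""
        image = dict(self.replica)
        # Walk backwards from the newest write, undoing everything after `seq`.
        for entry_seq, block, before in reversed(self.undo):
            if entry_seq <= seq:
                break
            if before is None:
                image.pop(block, None)
            else:
                image[block] = before
        return image


if __name__ == "__main__":
    j = UndoJournal()
    good = j.apply(0, b"quarterly report")
    j.apply(0, b"<encrypted by ransomware>")
    # Plain replication would hold only the encrypted version; the undo log
    # lets us step back to the last good write.
    print(j.replica[0])
    print(j.image_as_of(good)[0])
```

This is the "back button": the replica on its own behaves exactly like plain replication, and the before-images are what make earlier points in time recoverable. They are also the extra storage the hosts go on to discuss.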

Speaker:

You mentioned the ability to go back in time. How far back in time should we be able to go with a CDP system?

Speaker:

It depends on what your requirements are, right? I would say with a CDP system, it depends on what other infrastructure you have. For instance, if you have backups that you're taking periodically, separately, outside of the CDP system, your CDP system may only need seven days' worth of data, so you can recover within those seven days at, sort of, I/O granularity.

Speaker:

Right. Or record-level granularity, or whatever we want to call it.

Speaker:

Right. But as long as you have that backup system, that's fine. Going back, say, 30 or 90 days, or trying to replace your backup system with the CDP system, is a little crazy, because I think we need to talk about what's required on the target side in order to handle CDP.

Speaker:

Right, because in order to be able to go back in time, I need much more storage on the target side than I need on the primary side. If I've got a hundred terabytes of storage and I'm going to do CDP for it, how much do I need? Realize that on the target side, I need to store the hundred terabytes plus every block that changes in that 100 terabytes during the retention period you've set.

Speaker:

Yeah. And on the target side, we should probably differentiate depending on the technology, right? Your target system itself may not need all the extra space, but the target appliance, which is dealing with these transactions or changed blocks coming in, might need to hold that space, sort of as a log.

Speaker:

And this problem right here is, I think, why CDP failed in terms of the dream of CDP. I remember meeting with CDP CEOs, and they were like, this solves everything. We can recover to any point in time. Why would you do it any other way? And the answer is cost.

Speaker:

It's the cost, because the thing you have to think about is that you have to store the data, right? Not only the metadata about what came in and the data that's there, but if these are undo logs, you also need to store what the previous data was, because you have to be able to go backwards in time. And so you have to store all of this information in that appliance. Some people say you might have a 2 percent change rate per day. But that 2 percent is the net change measured over the entire day. If you're adding up every single transaction, that might turn out to be more like 5 percent actual change.

Speaker:

Right.

Speaker:

Or 10 percent, right?

Speaker:

If a block is updated, and we're talking about block-level CDP here, which is generally what we're talking about, if a block changes multiple times during the day, you have to store every version of that block throughout the day. And you're right, it could be a significant percentage. And by the way, you have no idea what that number is until you deploy CDP.

Speaker:

Right. And the other thing, since you were just mentioning that you don't know what you'll need until you deploy it: you also have to deploy it on pretty fast and expensive hardware. If you think about it, you're getting this constant stream of writes that you have to store, and you have to replay them down to your target storage location as well. And so your destination system, or the infrastructure it requires, might need to be beefier than what you even have in production, right? So going back to that cost aspect, that starts to add up pretty fast.
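
The cost argument is easy to put into rough numbers. The sketch below uses the figures from the conversation (100 TB protected, about 2 percent net daily change, 5 to 10 percent once every write is counted, seven days of retention) and assumes the target keeps one full replica plus every write in the retention window, with no compression, deduplication, or journal metadata. Treat it as back-of-the-envelope arithmetic with hypothetical function names, not a sizing tool.

```python
def cdp_journal_tb(protected_tb: float, daily_write_pct: float, retention_days: int) -> float:
    """Journal space if every write (not just the net daily change) is kept."""
    return protected_tb * daily_write_pct / 100 * retention_days


def cdp_target_tb(protected_tb: float, daily_write_pct: float, retention_days: int) -> float:
    """A full replica of the protected data plus the journal needed to go back in time."""
    return protected_tb + cdp_journal_tb(protected_tb, daily_write_pct, retention_days)


if __name__ == "__main__":
    scenarios = [
        ("2% (net daily change only)", 2),
        ("5% (every write counted)", 5),
        ("10% (every write counted)", 10),
    ]
    for label, pct in scenarios:
        total = cdp_target_tb(100, pct, 7)
        print(f"{label}: {total:.0f} TB on the target for 100 TB protected")
    # prints 114, 135, and 170 TB respectively
```

The journal alone grows from 14 TB to 70 TB between the first and last rows, so sizing from the net change rate can understate it by a factor of five.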

Speaker:

Yeah, we can go back to the episode on replication. The synchronous versus asynchronous aspect is important to understand here. Generally, CDP will be done asynchronously. Do you remember synchronous CDP?

Speaker:

I think there was one vendor who did it, but yes.

Speaker:

Okay. So you could do it, but I think most people do it asynchronously. And the point is, asynchronous is fine. Obviously your RPO won't be zero; it'll be something close to zero. But the problem with asynchronous is that if the target system gets behind on those writes, at some point the buffer gets backed up...

Speaker:

Your back pressure is going to have to... yeah.

Speaker:

Yeah. That's a good term, the back pressure. I like that. Right. You will eventually have more writes than the buffer can hold, at which point it essentially becomes synchronous, or you have to start dropping writes.

Speaker:

Because...

Speaker:

Which you don't want to do.

Speaker:

...you'd be slowing down the primary system.

Speaker:

So you'd end up having to dump the buffer, and you'd end up losing bits along the way. And that's just not something that you would want to do.
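
Here is a minimal, single-threaded Python sketch of the back-pressure problem, assuming a fixed-size buffer between primary and target. When the buffer is full, the hypothetical `AsyncReplicator` either makes the primary wait for the target to take a write (policy `"block"`, which effectively turns asynchronous replication synchronous) or dumps the buffer and loses those pending writes (policy `"drop"`, which leaves a gap that would need a resync). Both policies are illustrative only; real products handle overflow in their own ways.

```python
from collections import deque
from typing import Deque, List


class AsyncReplicator:
    """Toy asynchronous replicator with a bounded buffer (hypothetical)."""

    def __init__(self, capacity: int, policy: str = "block") -> None:
        if policy not in ("block", "drop"):
            raise ValueError("policy must be 'block' or 'drop'")
        self.capacity = capacity
        self.policy = policy
        self.buffer: Deque[int] = deque()
        self.target: List[int] = []  # writes the target has applied
        self.primary_stalls = 0      # times the primary had to wait
        self.lost_writes = 0         # writes discarded when the buffer was dumped

    def submit(self, write_id: int) -> None:
        """Called by the primary for every write."""
        if len(self.buffer) >= self.capacity:
            if self.policy == "block":
                # Back pressure: primary waits until the target takes one write.
                self.primary_stalls += 1
                self.drain(1)
            else:
                # Dump the buffer: the target now has a gap it cannot replay.
                self.lost_writes += len(self.buffer)
                self.buffer.clear()
        self.buffer.append(write_id)

    def drain(self, n: int) -> None:
        """The target applies up to n pending writes, oldest first."""
        for _ in range(min(n, len(self.buffer))):
            self.target.append(self.buffer.popleft())


if __name__ == "__main__":
    for policy in ("block", "drop"):
        r = AsyncReplicator(capacity=3, policy=policy)
        for w in range(1, 6):  # five writes arrive; the slow target drains nothing
            r.submit(w)
        print(policy, "stalls:", r.primary_stalls, "lost:", r.lost_writes,
              "target:", r.target, "buffer:", list(r.buffer))
```

With a capacity of three and five writes, the `"block"` policy stalls the primary twice, while `"drop"` keeps the primary fast but throws away three writes. That is the trade-off described above.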

Speaker:

Now, one of the benefits I would say, though, with the CDP-like approach is that you can do CDP to dissimilar systems, right? So you might be going from a NetApp to an EMC, or from a Pure to a Hitachi. It gives you flexibility, because the CDP appliance or software package, or whatever it is, just needs access to devices on both sides. It's doing all the replication and managing everything. So for cases where you're dealing with different costs or availability of equipment, it is an option rather than being locked into a particular vendor.
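
Dissimilar arrays work because the CDP layer only needs a generic block interface on each side. A minimal Python sketch, assuming a hypothetical `BlockDevice` protocol with `read` and `write` by logical block address; the two in-memory device classes merely stand in for arrays from different vendors and are not real vendor APIs.

```python
from typing import Dict, Iterable, Protocol


class BlockDevice(Protocol):
    """The only thing the (hypothetical) CDP layer needs from an array."""

    block_size: int

    def read(self, lba: int) -> bytes: ...

    def write(self, lba: int, data: bytes) -> None: ...


class DictDevice:
    """Stand-in for one vendor's array: sparse map of block address -> bytes."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.blocks: Dict[int, bytes] = {}

    def read(self, lba: int) -> bytes:
        return self.blocks.get(lba, bytes(self.block_size))  # unwritten blocks read as zeros

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("write must be exactly one block")
        self.blocks[lba] = data


class FlatDevice:
    """Stand-in for a different vendor's array: one contiguous buffer."""

    def __init__(self, num_blocks: int, block_size: int = 4) -> None:
        self.block_size = block_size
        self.buf = bytearray(num_blocks * block_size)

    def _offset(self, lba: int) -> int:
        start = lba * self.block_size
        if lba < 0 or start + self.block_size > len(self.buf):
            raise IndexError("block address out of range")
        return start

    def read(self, lba: int) -> bytes:
        start = self._offset(lba)
        return bytes(self.buf[start:start + self.block_size])

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("write must be exactly one block")
        start = self._offset(lba)
        self.buf[start:start + self.block_size] = data


def replicate(changed_lbas: Iterable[int], source: BlockDevice, target: BlockDevice) -> int:
    """Copy changed blocks from source to target; neither side's internals matter."""
    if source.block_size != target.block_size:
        raise ValueError("block sizes must match")
    copied = 0
    for lba in changed_lbas:
        target.write(lba, source.read(lba))
        copied += 1
    return copied


if __name__ == "__main__":
    src = DictDevice()
    src.write(1, b"ABCD")
    src.write(3, b"WXYZ")
    dst = FlatDevice(num_blocks=8)
    replicate([1, 3], src, dst)
    print(dst.read(1), dst.read(3))  # b'ABCD' b'WXYZ'
```

`replicate` never looks inside either device, which is the property that lets a storage-independent CDP product pair any source array with any target.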

Speaker:

Right.

Speaker:

Most of the CDP vendors that I know.

Speaker:

Uh, are, are independent of the storage, right?

Speaker:

So you can use whatever storage you want on, on both sides.

Speaker:

The thing is, I mean, we've, we've been, we've been harping on

Speaker:

it for a little bit, but I mean, the, the idea of CDP is amazing.

Speaker:

The idea that I can just go back to any point in time is amazing.

Speaker:

And I don't have to do anything special on the front end.

Speaker:

Um, but it does come with these downsides.

Speaker:

And so there were some things that happened over.

Speaker:

As CDP was deployed in more and more environments, customers, I

Speaker:

think, demanded certain features.

Speaker:

One of them was this term called right coalescing.

Speaker:

Do you want to talk about that a little bit?

Speaker:

Yeah.

Speaker:

So write coalescing is, I know Curtis, you talked about before where you had

Speaker:

multiple changes to a single block.

Speaker:

That would happen.

Speaker:

Uh, and that's great.

Speaker:

But at the end, would I need to replay something?

Speaker:

I don't need to know all the versions, right?

Speaker:

I could just say, look, just give me this version of the data.

Speaker:

That's all I care about.

Speaker:

And so being able to reduce down some of that data.

Speaker:

So maybe instead of having every transaction for the last

Speaker:

seven days, maybe for the last

Speaker:

36 hours, I have every transaction, and then after that I'm going

Speaker:

to coalesce writes down.

Speaker:

So I have singular points in time rather than having every single

Speaker:

point in time available to me.

Speaker:

Because honestly, if I go back seven days, do I really care about this I/O

Speaker:

versus this I/O?

Speaker:

Right?

Speaker:

Like, how do I even find that point in time, you know?

Speaker:

That's the biggest challenge as well.

Speaker:

You'll be happy to have anything.

Speaker:

So you could start with true CDP.

Speaker:

You could, you could always replicate every change.

Speaker:

The system holds on to a certain amount of, you know, all of the

Speaker:

changes for a certain amount of time,

Speaker:

configurable by the customer.

Speaker:

And then it starts coalescing and saying, okay, we're just going to

Speaker:

make sure we have all the blocks we need to represent this point in time.

Speaker:

And you might go with hourly snapshots after that. Well, they're not snapshots, but

Speaker:

they're not snapshots in terms of what we traditionally think of as

Speaker:

They're points in time, yeah.

Speaker:

They're points in time.

Speaker:

So you have hourly points in time that you can recover.

Speaker:

And then maybe you go to daily and even weekly.

Speaker:

And that's where some CDP systems, that's what, that's the way some

Speaker:

CDP systems were trying to push out that amount of time, so that they could

Speaker:

essentially replace the backup system.

Speaker:

But even then, it just doesn't really think the way a regular

Speaker:

backup system would, and so it still ends up storing a lot more data.

Speaker:

And just being more costly in general.
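To make write coalescing concrete, here is a minimal Python sketch. It assumes a hypothetical journal of `(timestamp, block, data)` writes and a made-up policy matching the example above: keep every write for the last 36 hours, then keep only the last write to each block per hour. `Write` and `coalesce_journal` are illustrative names, not any CDP product's API.

```python
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass(frozen=True)
class Write:
    """One journaled block write (hypothetical CDP journal entry)."""
    ts: int      # seconds since an arbitrary epoch
    block: int   # block address on the protected volume
    data: bytes  # new contents of that block


def coalesce_journal(journal, now, full_window=36 * HOUR, bucket=HOUR):
    """Keep every write newer than `full_window`. For older writes, keep only
    the last write to each block within each `bucket`-sized interval, so the
    volume can still be rebuilt at each bucket boundary (e.g. hourly)."""
    recent = []
    last_in_bucket = {}  # (bucket index, block) -> newest Write in that bucket
    for w in sorted(journal, key=lambda w: w.ts):
        if now - w.ts <= full_window:
            recent.append(w)
        else:
            # A later write to the same block in the same bucket replaces
            # the earlier one; the intermediate version is discarded.
            last_in_bucket[(w.ts // bucket, w.block)] = w
    return sorted([*last_in_bucket.values(), *recent], key=lambda w: w.ts)


if __name__ == "__main__":
    journal = [
        Write(ts=10 * HOUR + 5, block=7, data=b"a"),
        Write(ts=10 * HOUR + 50, block=7, data=b"b"),  # same block, same hour
        Write(ts=99 * HOUR, block=7, data=b"c"),       # inside the 36-hour window
    ]
    print([w.data for w in coalesce_journal(journal, now=100 * HOUR)])
    # prints [b'b', b'c']
```

Running the same function again over older data with a larger `bucket` (say, `24 * HOUR`) would approximate the daily tier described above; real products manage these tiers internally and in their own ways.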

Speaker:

I know, I remember another challenge with CDP systems when it comes to backup.

Speaker:

I know we talk a lot about application consistency, right?

Speaker:

Making sure I have an application-consistent point in time that Oracle,

Speaker:

for instance, can quickly recover and I don't need to worry about media

Speaker:

recovery and all the other processes.

Speaker:

With CDP systems, a lot of them missed out.

Speaker:

Now, they've gotten better, but back then, none of them really supported

Speaker:

application integration in a proper way.

Speaker:

Yes, some would do VSS integration to allow you to do, like, poor man's

Speaker:

backup, but for the most part, CDP systems operated

Speaker:

at an infrastructure level.

Speaker:

And so it didn't have that capability, and honestly, as a backup

Speaker:

person, you cared about the application more than the storage, right?

Speaker:

You needed to make sure you had an application-consistent backup that you

Speaker:

knew was good, that you could recover from.

Speaker:

Yeah, exactly.

Speaker:

One of the challenges is that you say, well, you give me

Speaker:

infinite recovery points, right?

Speaker:

I just want one good one, one point that I know is good. And so

Speaker:

a lot of the CDP products started integrating more with the database.

Speaker:

So while they could still give you infinite points in time, they could

Speaker:

say, Hey, we also put the database in backup mode at these points in time

Speaker:

so that we know that that point in time is one that is truly consistent

Speaker:

that you could, uh, recover from.

Speaker:

You, you could also use the other points in time, but we're giving you this one

Speaker:

that we know for sure that it's good.

Speaker:

It's special.

Speaker:

So, yeah, it's, it's special, right?

Speaker:

It's still not a snapshot, but it's a point in time when we can say

Speaker:

that, uh, when we can say that we know we can recover to that point.
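The idea of tagging a few known-good points inside an otherwise continuous journal can be sketched in a few lines of Python. This assumes, hypothetically, that the product exposes its application-consistent bookmarks as an ascending list of timestamps; `latest_consistent_point` is an illustrative helper, not a real product API.

```python
import bisect


def latest_consistent_point(bookmarks, target_ts):
    """Return the newest application-consistent bookmark at or before
    `target_ts`, or None if no such bookmark exists.

    `bookmarks` is an ascending list of timestamps at which (hypothetically)
    the CDP product put the database in backup mode and tagged its journal.
    Every other timestamp is still a recovery point, just not one that is
    known to be application consistent."""
    i = bisect.bisect_right(bookmarks, target_ts)
    return bookmarks[i - 1] if i > 0 else None


if __name__ == "__main__":
    marks = [100, 200, 300]
    print(latest_consistent_point(marks, 250))  # 200
    print(latest_consistent_point(marks, 99))   # None
```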

Speaker:

One other thing about backup, right?

Speaker:

I know we always talk about test your backups, test your

Speaker:

backups, test your backups.

Speaker:

CDP becomes difficult to test in most environments.

Speaker:

Unless you have a lot of additional space and storage, because you don't

Speaker:

necessarily want to stop the copy being updated on the target site.

Speaker:

So now the question becomes, how do I now spin up a separate copy with

Speaker:

that particular point in time that I'm interested in so I can test and verify,

Speaker:

my Oracle database backup, right?

Speaker:

Is that a good point in time or not?

Speaker:

Exactly.

Speaker:

Yeah.

Speaker:

So it was like, it gave you, it, it gave you almost too much.

Speaker:

Right?

Speaker:

The thing that it gave you that nothing else could give you, except

Speaker:

for replication was that RPO of zero.

Speaker:

But it did come with other, it came with other

Speaker:

complications that you had to deal with.

Speaker:

It's like, I often say in IT, we never fix problems.

Speaker:

We just move them.

Speaker:

Right.

Speaker:

So, so we, we solved one problem.

Speaker:

We created, we created some others.

Speaker:

So the other thing I want to talk about is how the,

Speaker:

the, how the data was stored on the target end.

Speaker:

There are two ways, as I understand it, that data was stored on the other end.

Speaker:

There were sort of two ways that the recovery system manifested itself.

Speaker:

One was that there was a volume that we were continuously updating so that if you

Speaker:

needed to do a recovery, that volume was already replicated to the point in time,

Speaker:

the most recent point in time that you wanted to restore to, and then it also

Speaker:

had a log and the ability to undo

Speaker:

the changes to that volume so that you could take this,

Speaker:

this LUN, right, bring it back in time.

Speaker:

That was the one thing that was, that was really cool.

Speaker:

The, the advantage to that method was that if what you wanted was

Speaker:

right now, you had it immediately.

Speaker:

If you wanted to go back a little bit earlier, the farther back you wanted

Speaker:

to go, the more work had to be done.

Speaker:

And so the longer the recovery took. But that, uh, was the

Speaker:

primary, I think that was the most common way CDP manifested itself.
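A minimal Python sketch of this first method, assuming a simple in-memory model: the target volume is a dict of blocks kept current, and every replicated write also appends an undo record holding the data it overwrote. `RollbackReplica` is a hypothetical name. Rolling back means replaying undo records newest-first, so the work grows with how far back you go. For brevity this sketch discards the undone writes; a real product would keep them so you could roll forward again.

```python
class RollbackReplica:
    """Hypothetical target-side LUN kept current, plus an undo journal."""

    def __init__(self):
        self.blocks = {}  # block -> current data on the target volume
        self.undo = []    # (ts, block, previous data or None), in write order

    def apply(self, ts, block, data):
        """Replicate one write: remember what it overwrote, then write it."""
        self.undo.append((ts, block, self.blocks.get(block)))
        self.blocks[block] = data

    def roll_back_to(self, ts):
        """Undo every write newer than `ts`, newest first, and return the
        number of undo steps performed."""
        steps = 0
        while self.undo and self.undo[-1][0] > ts:
            _, block, old = self.undo.pop()
            if old is None:
                self.blocks.pop(block, None)  # block did not exist before
            else:
                self.blocks[block] = old
            steps += 1
        return steps


if __name__ == "__main__":
    lun = RollbackReplica()
    lun.apply(1, 0, b"a")
    lun.apply(2, 0, b"b")
    lun.apply(3, 1, b"c")
    print(lun.roll_back_to(3))  # 0: "right now" is already there
    print(lun.roll_back_to(1))  # 2 undo steps
    print(lun.blocks)           # {0: b'a'}
```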

Speaker:

Yeah.

Speaker:

And I think, like you mentioned, that's a great...

Speaker:

opportunity, because most of the time you're probably recovering to the latest

Speaker:

or somewhere near the latest point in time, rather than, hey, I need to go back.

Speaker:

Let me restore all my data from three weeks ago and now replay all

Speaker:

my backups going forward, which leads to a much longer time to recover.

Speaker:

Exactly.

Speaker:

There was this other way where they didn't create the volume; there was no volume

Speaker:

that they were continuously updating.

Speaker:

They essentially had all of the bits necessary to create

Speaker:

the volume at any time.

Speaker:

And then, I, I, this feels very NetApp-y, and, and, right, although, and by that

Speaker:

I don't mean this is the way NetApp did it, it's just, you know, the way with

Speaker:

NetApp is, is a given snapshot is really just a bunch of pointers to blocks at

Speaker:

a particular point in time, right? And so, when you restore a volume to

Speaker:

a particular point in time, all you're doing is moving all the snapshots.

Speaker:

What they're doing is they have all of the bits and pieces that are necessary to

Speaker:

represent the volume at any point in time.

Speaker:

And then you,

Speaker:

Stitch it

Speaker:

you just had to create all the pointers, right?

Speaker:

There was no, there wasn't a restore so much as there was this,

Speaker:

I don't know what to call it.

Speaker:

It's unlike anything I've ever seen.

Speaker:

I really need a whiteboard.

Speaker:

I think so too,

Speaker:

to illustrate this method. The real advantage of this was that

Speaker:

the recovery time was always the same regardless of whether or not

Speaker:

you wanted to go to the most recent point in time or three weeks ago.

Speaker:

Because you're just, at that point, manipulating pointers and metadata and

Speaker:

not actually copying and restoring data.
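The second method can be sketched the same way, again as a hypothetical in-memory model rather than any vendor's design: every version of every block is kept, and a point-in-time "volume" is just a map of pointers (here, version indexes) into that store. `VersionStore` and its methods are illustrative names. Building a view does not get more expensive the farther back you go, and several views can exist side by side.

```python
import bisect
from collections import defaultdict


class VersionStore:
    """Hypothetical model of the pointer-based method: every version of every
    block is kept, and a point-in-time volume is just a map of pointers."""

    def __init__(self):
        self.times = defaultdict(list)  # block -> ascending version timestamps
        self.data = defaultdict(list)   # block -> version contents, parallel to times

    def record(self, ts, block, data):
        """Store a new version of a block (assumes writes arrive in ts order)."""
        self.times[block].append(ts)
        self.data[block].append(data)

    def view_at(self, ts):
        """Return {block: version index} describing the volume as of `ts`.
        No data is copied; each entry is only a pointer into the store, and a
        binary search per block means the work does not grow with how far
        back `ts` is."""
        view = {}
        for block, times in self.times.items():
            i = bisect.bisect_right(times, ts) - 1  # last version at or before ts
            if i >= 0:
                view[block] = i
        return view

    def read(self, view, block):
        """Dereference one pointer from a view."""
        return self.data[block][view[block]]


if __name__ == "__main__":
    store = VersionStore()
    store.record(1, 0, b"a")
    store.record(2, 0, b"b")
    store.record(3, 1, b"c")
    old, new = store.view_at(1), store.view_at(3)  # two points in time at once
    print(store.read(old, 0), store.read(new, 0))  # b'a' b'b'
```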

Speaker:

And I know that some of those systems, they also, we started talking, we started

Speaker:

using this term copy data management when we started talking about some of

Speaker:

the systems because they could say, hey, here's this, here's this volume from

Speaker:

this point in time and from this point in time and from this point in time, and

Speaker:

you can have all three of them at the same time, because you could not do that.

Speaker:

That was the other feature

Speaker:

of the other method.

Speaker:

You could not have the same volume at multiple points in time.

Speaker:

This method allows you to have as many. No? You think you could?

Speaker:

There are ways with newer technologies to get that.

Speaker:

One method

Speaker:

some vendors used was to update that copy, take a snapshot

Speaker:

and present the snapshot out.

Speaker:

Right, so that's one method.

Speaker:

Now, not always the most optimal, but yeah,

Speaker:

right.

Speaker:

Yeah.

Speaker:

I was just thinking that like a single volume can't be presented at multiple

Speaker:

points in time, but you're right.

Speaker:

If you do a snapshot, then yes, you could do, you could do exactly that.

Speaker:

But it's more management.

Speaker:

did it like that?

Speaker:

I said, I wonder what vendor was really good at doing snapshots.

Speaker:

So CDP, Continuous Data Protection, is the system that allows you to have an RPO

Speaker:

and an RTO of zero without the risk that you have with replication, where, where

Speaker:

if you have something bad happening to your primary data, uh, from a logical

Speaker:

basis, you, you drop a table, you do something stupid, you get a cyber attack,

Speaker:

it gives you that power that you had with replication, but it also gives you

Speaker:

the power to be able to go back in time.

Speaker:

So it gives you basically the best of both worlds.

Speaker:

It gives you an infinite number of recovery points, but

Speaker:

infinite, it turns out, might not...

Speaker:

I like it.

Speaker:

Anybody living space.

Speaker:

Yeah, exactly that.

Speaker:

See, you know where I was going.

Speaker:

Yeah, exactly.

Speaker:

Uh, but it.

Speaker:

It turns out that infinite is, is not as amazing as it seems.

Speaker:

Infinite number of recovery points comes with its own challenges, but the biggest

Speaker:

challenge I think with CDP is just cost.

Speaker:

Very few people were comfortable with the cost of using CDP as their only data

Speaker:

protection method for a given set of data.

Speaker:

And so they would, what you would most commonly see is we're only going to

Speaker:

use it for our most critical apps.

Speaker:

Or we're going to use it, but we're also going to use a traditional backup.

Speaker:

Because that's what we're, I don't know, I don't know about you, but I,

Speaker:

I'm always on the lookout for something that can do, that can give me everything

Speaker:

that I want, give me that long-term retention to be able to go back when

Speaker:

I realize that I did something stupid three months ago, and also have an

Speaker:

RPO and an RTO of, of close to zero.

Speaker:

You want a unicorn.

Speaker:

I want a unicorn, but there are, there are ways, and we're going to talk about

Speaker:

some of those ways, to give you an RPO and an RTO way better than what we

Speaker:

traditionally had without perhaps the cost and the, the downsides and the

Speaker:

logistical challenges that CDP offered.

Speaker:

I think in the end, there are CDP products, and for certain

Speaker:

applications, for certain environments, it's like the way to do it, right?

Speaker:

It's just, I think what you're seeing is the complexities of this

Speaker:

and the costs associated with this.

Speaker:

This is why it's still a niche play.

Speaker:

And that's why there's only a handful of these products available out there.

Speaker:

What do you

Speaker:

I agree.

Speaker:

Yes.

Speaker:

All right.

Speaker:

Well, hopefully, uh, for those of you that have always wondered what CDP is, now

Speaker:

you know, and now you know why it didn't solve all problems in data protection.

Speaker:

But I would just say, if you want an RPO and an RTO of zero, and you don't want to

Speaker:

have the issue with replication, right?

Speaker:

Which means, right, we've already talked about the, if you don't want to have

Speaker:

the issues that replication causes, then, uh, CDP is really the only game in town.

Speaker:

So hopefully this honest assessment of CDP will allow the very small

Speaker:

percentage of you that need it to know, "That sounds exactly like what I need."

Speaker:

Uh, and with that, that's a wrap.

Speaker:

The Backup Wrap-Up is a production of backupcentral.com, where you'll find my

Speaker:

blog and a list of services I can provide.

Speaker:

This is an independent podcast.

Speaker:

And any opinions that you hear are those of the speaker and not necessarily

Speaker:

those of any companies that they work for.

Speaker:

We'll see you next week on The Backup Wrap-Up.