Speaker:

I'm not sure who needs to hear this, but snapshots are not backups.

Speaker:

And in this context, I'm talking about virtual snapshots, like what you do with the storage array.

Speaker:

We're not talking about cloud snapshots, like what you do with EBS volumes in AWS.

Speaker:

If you don't know the difference, or you can't articulate why these snapshots are not backups,

Speaker:

Well, have I got a podcast for you?

Speaker:

Hi, I'm W. Curtis Preston, a.k.a. Mr. Backup,

Speaker:

And my podcast turns unappreciated backup admins into cyber recovery heroes.

Speaker:

This is the Backup Wrap-Up.

Speaker:

Hi, and welcome to the Backup Wrap-Up.

Speaker:

I'm your host, W. Curtis Preston, and I have with me my Spanish-language encourager, Prasanna Malaiyandi.

Speaker:

I gave you a positive thing.

Speaker:

I know, I'm impressed.

Speaker:

And I'm also impressed that you are done now with, what, Spanish 2, right?

Speaker:

Spanish 2, uh, sí, sí, terminé con [yes, yes, I finished with] Span-, uh, I almost said Spanish. Español 2, um, yeah.

Speaker:

So now I need to do a little review, because for an English speaker, uh, the challenges, as I've told you before, are things like... the gender thing is not a big deal.

Speaker:

You get used to that.

Speaker:

What's challenging is words that end in -ma, like poema. You would think that would be feminine, but it's actually a masculine word, uh, because it ends in, uh, -ma.

Speaker:

I have to pound that stuff into my head and just say it over and over.

Speaker:

I will be using the thing that I made, where I'm actually using technology to make an audio recording that I can listen to. Uh, it's very, very cool.

Speaker:

Uh, but I am excited to move on to Spanish 3 and drag you along with me.

Speaker:

It's helpful.

Speaker:

I used to be pretty good at Spanish because I took it in high school, and then I lost everything.

Speaker:

I can order things off a menu, but that's about it.

Speaker:

Right.

Speaker:

Right.

Speaker:

So, uh, I think it's time for the news of the week.

Speaker:

This one is frustrating. Uh, we'll do a frustrating one first and then some good news about a vendor.

Speaker:

Let's talk about this 1Password hack. And it frustrates me for many reasons. One is, we're such fans of password managers, and there are those who are not fans of password managers. There are also those who are not fans of the cloud. And this, uh, touches both of those things. So, uh, do you want to talk a little bit about the 1Password/Okta hack?

Speaker:

So basically what happened is, I don't know if it was 1Password or Okta, but basically some hackers got into 1Password. They were able to change some things in their Okta instance and try to get a list of admins, which I'm sure they were going to use to target people and deploy things so they could cause bigger damage.

Speaker:

Right.

Speaker:

But as they were starting to sort of peel back the onion and figure out, okay, what happened, they realized what actually happened is it started on the Okta side.

Speaker:

Right.

Speaker:

The reason they were able to do what they ended up being able to do was because Okta had already been hacked.

Speaker:

And so it's interesting in this case, because it wasn't like, oh, they just got into Okta and then they ping-ponged and got into 1Password.

Speaker:

They actually looked at support files that someone at 1Password, an engineer, had uploaded because they probably had some issue or something like that. They were reaching out to Okta, and they uploaded basically a support bundle. Um, and that support bundle, in addition to the information needed to troubleshoot, also contained session cookies.

Speaker:

Right.

Speaker:

Which they were then able to use.

Speaker:

Right.

Speaker:

Yeah, and so they were able to basically impersonate the 1Password employee and log into the system, and that's when they started causing havoc.

Speaker:

Yeah.

Speaker:

And the good news is that 1Password did see what was happening. Uh, they saw this weird session coming from an odd IP, and they shut it down, so basically all the attackers did was attempt to get a list of admins. It looks like they didn't actually get the list.

Speaker:

Um, but the whole thing, again, you know, went back to the fact that the hacker had initially compromised the Okta support system, which is where they got this support file.

Speaker:

And, you know, in retrospect, it looks like everybody did their job.

Speaker:

It, it looks like, right.

Speaker:

That, that, that.

Speaker:

Things were noticed, things were stopped.

Speaker:

I think that in the case of Okta, maybe things weren't noticed quickly enough, because 1Password was actually just one of several companies that were compromised because of this.

Speaker:

For those of you that are 1Password customers, it doesn't appear that any customer data was accessed, you know,

Speaker:

unlike the, what was the other one?

Speaker:

The LastPass hack, where they actually got the vault.

Speaker:

And then you had to worry if you had insecure passwords, right?

Speaker:

Passwords that were guessable. But it doesn't appear that that was the case here.

Speaker:

They found it pretty quickly.

Speaker:

I think another interesting thing that I found... by the way, this is an article on The Register; we'll put a link in the show notes.

Speaker:

Right.

Speaker:

The one thing I found interesting is they were trying to figure out, okay, what actually happened? Like, when was this information stolen? And so they went back and looked at their logs on the Okta side, and they were trying to figure out, okay, was the archive accessed before the support engineer on the Okta side opened it? And they were able to figure out, no, the support engineer hadn't opened it yet, so it wasn't a rogue support engineer on the Okta side.

Speaker:

And then they looked at the 1Password side, and they saw, okay, the person who uploaded it was on public Wi-Fi at a hotel. And they were like, oh, maybe it could have been stolen during the upload, because, you know, those connections are often unencrypted and all the rest.

Speaker:

But they looked and they're like, no, the upload process had TLS end to end, all the way to Okta. And so everything was encrypted. It wasn't stolen in the process. So there must've been something else that had accessed it on the Okta side.

Speaker:

So it's interesting how they're able to piece this all together.

Speaker:

It's almost like CSI, right?

Speaker:

You're piecing together all these clues, trying to figure out, okay, what happened, what went wrong?

Speaker:

Well, that's what you have to do in incident response, right?

Speaker:

You've got to, you know, piece together what you can from the logs that you have.

Speaker:

Logs, logs, logs, right?

Speaker:

That's what it's all about is the logs.

Speaker:

The only thing that's somewhat distressing is that it appeared that they made some tweaks, some upgrades to their MFA. Uh, and by upgrades, I mean they changed some of their stuff.

Speaker:

They're like, maybe we shouldn't allow anyone to log into Okta who isn't at a 1Password IP, right? Maybe we shouldn't allow that.

Speaker:

That seems like

Speaker:

Uh, that's,

Speaker:

you should have decided that before, but yeah,

Speaker:

I don't know about that, Curtis, because there are times, like, you might be traveling, and you might be the superuser, and you have to change something, and you're not at a desk. Or imagine that 1Password blows up, something happens to their site, and they need to reset passwords, and they could end up locked out.

Speaker:

But maybe this is an

Speaker:

I would think, yeah, that would be an exception case that you would deal with, maybe, but the default would be that you would turn it off.
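
The policy discussed here, admin logins only from known corporate addresses by default, with an explicit exception path for cases like the traveling superuser, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Okta's or 1Password's configuration: the network ranges are hypothetical documentation prefixes, and `login_allowed` and its `break_glass` flag are names made up for this example.

```python
import ipaddress

# Hypothetical approved egress ranges. These are reserved documentation
# prefixes (RFC 5737 for IPv4, RFC 3849 for IPv6), not real corporate networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def login_allowed(source_ip: str, break_glass: bool = False) -> bool:
    """Default-deny check for an admin login.

    Logins are allowed only from approved networks unless an explicit,
    separately approved exception (the hypothetical break_glass flag) applies.
    """
    if break_glass:
        return True
    addr = ipaddress.ip_address(source_ip)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply not a member.
    return any(addr in net for net in ALLOWED_NETWORKS)


print(login_allowed("192.0.2.10"))    # True: inside the approved range
print(login_allowed("198.51.100.7"))  # False: e.g. a hotel Wi-Fi address
```

Keeping the exception as a separate, explicit input is the design choice: the off-site case is handled deliberately instead of leaving the door open for every address.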

Speaker:

The other thing was that it appears they're now using YubiKeys. For those that aren't familiar with that, Y-U-B-I-K-E-Y. This is a hardware token.

Speaker:

It's pretty inexpensive as these systems go. It's an actual hardware device that you can buy for your own personal use. I took a look, and I think it was a hundred dollars for two of them. Um,

Speaker:

you want to.

Speaker:

you know, and you want two.

Speaker:

Yeah, and so it looks like they're now using YubiKeys, where maybe before they weren't.

Speaker:

I'm glad that they took those steps. It's just a shame that we always make changes to systems after we've had an exposure. But good on them for finding it, good on them for responding and saying, hey, we could make this better.

Speaker:

Uh, I'm dinging Okta a little here. There was a thing at the end where Okta basically said that perhaps if you're sending a support file, uh, what was the type of that file?

Speaker:

A HAR file?

Speaker:

Um, an HTTP Archive file.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And they're like, perhaps you should sanitize that before you send it to us. And I'm like, well, maybe your system that creates the HAR file should sanitize it for the customer before they upload it.
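
What "sanitizing" a HAR (HTTP Archive) file before upload might look like: a minimal Python sketch that empties the parsed cookie lists and redacts the values of cookie and authorization headers, following the HAR 1.2 JSON layout (`log.entries[]`, each with `request` and `response` objects carrying `headers` and `cookies` arrays). This is not Okta's tooling, and the list of sensitive headers is an assumption; tokens can also hide in query strings, POST bodies, or response content, which this sketch does not touch.

```python
import copy

# Header names whose values are redacted in this sketch (an assumption; extend as needed).
SENSITIVE_HEADERS = {"cookie", "set-cookie", "authorization", "proxy-authorization"}


def sanitize_har(har: dict) -> dict:
    """Return a sanitized copy of a parsed HAR document; the input is not modified."""
    clean = copy.deepcopy(har)
    for entry in clean.get("log", {}).get("entries", []):
        for part in ("request", "response"):
            message = entry.get(part, {})
            # Drop parsed cookies entirely: session cookies allow a session to be hijacked.
            if "cookies" in message:
                message["cookies"] = []
            # Keep header names (useful for troubleshooting) but redact secret values.
            for header in message.get("headers", []):
                if header.get("name", "").lower() in SENSITIVE_HEADERS:
                    header["value"] = "REDACTED"
    return clean


sample = {
    "log": {
        "entries": [
            {
                "request": {
                    "headers": [
                        {"name": "Cookie", "value": "sid=abc123"},
                        {"name": "Accept", "value": "*/*"},
                    ],
                    "cookies": [{"name": "sid", "value": "abc123"}],
                },
                "response": {"headers": [], "cookies": []},
            }
        ]
    }
}
clean = sanitize_har(sample)
```

For a file on disk you would `json.load` it, pass the result through `sanitize_har`, and `json.dump` the sanitized copy before uploading.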

Speaker:

Uh, there was a little bit of victim blaming, I thought, towards the end there. But, um, anyway, one that I think is going to be a lot easier to talk about: our former employer, Druva.

Speaker:

Uh, the headline here, it's another TechTarget article, is that they've added a gen AI assistant to their cloud backup tool.

Speaker:

And I watched a video of a demo, and it basically looked like, uh, you know, an NLM that you can use to interface with the Druva cloud platform. And so you could ask it things like, hey, show me my backups, like, hey, show me which backups have failed.

Speaker:

I think the big thing is, the focus on AI, making it easily available to a large audience without needing to know all the training, right, and without being an expert in it, has now made it sort of commonplace, right? It's easy to pick up an AI model that has been pretrained and use it for some purpose.

Speaker:

right?

Speaker:

In their case, they're using, uh, Amazon's, uh, no great surprise there. They're using Amazon's, uh, NLM model,

Speaker:

you referring to?

Speaker:

Sorry, are you referring to NLP or LLM?

Speaker:

You're calling it NLM.

Speaker:

thank you.

Speaker:

Oh, yeah, sorry.

Speaker:

NLP. Thank you.

Speaker:

NL... NLP.

Speaker:

So they're using Amazon's, uh, product to do this. No great surprise. Druva lives in Amazon, and, uh, it seems like the biggest thing that they had to do was make sure that it was integrated with the security.

Speaker:

That's the biggest thing that I see. Yeah, you have to make sure, because if you just pick up any random LLM, and I'll use LLM because that's a large language model, right, which a lot of the AI models are built on top of, if you pick that, then depending on what it's trained on, you might get back bogus answers, random answers, answers that aren't even safe, right? And so the fact that Druva is putting guardrails in place to make sure that what gets returned is sensible and safe is, I think, a good step.

Speaker:

Yeah.

Speaker:

So we've seen, uh, AI used with Alcion.

Speaker:

We saw it with, um, I'm trying to think.

Speaker:

Yeah.

Speaker:

Cohesity came out with one.

Speaker:

Do you remember who else?

Speaker:

It was Druva.

Speaker:

I'm trying to, I know there's another one.

Speaker:

Um,

Speaker:

Was it Commvault?

Speaker:

in the thing.

Speaker:

Oh, Dell. Yeah, I don't think I've seen anything from Commvault, but, uh, Dell is also doing this. Uh, yeah, I merged, I think, uh, NLP and LLM

Speaker:

Into NLM.

Speaker:

It's, it's a Curtis only thing.

Speaker:

So, yeah, so that's interesting.

Speaker:

Uh, I think anything that makes it easier to interface with your backup system is a good thing, as long as they have those guardrails in place and

Speaker:

And I know you always like to talk about this: what's one of the most important jobs that you give to the most junior person at a company?

Speaker:

exactly.

Speaker:

Backup, right?

Speaker:

And so,

Speaker:

why would you do

Speaker:

Yeah, and so giving an assistant, if you will, to that junior backup person is helpful as they're learning the ropes.

Speaker:

Yeah.

Speaker:

Well, that is the news of the week.

Speaker:

As you know, every episode of the Backup Wrap-Up is going to dive deep into one particular topic.

Speaker:

This week's topic is snapshot, snapshot, snapshot.

Speaker:

Click, click, click,

Speaker:

It's a topic that comes up a lot. This word comes up a lot on this show. So it's time to dive deep into a world where, I bet, at one point in your career, Prasanna, you must've heard this word a hundred times a day. What do you think?

Speaker:

At least.

Speaker:

At least,

Speaker:

least.

Speaker:

Because there's certainly one company that thinks they do snapshots differently and better than everyone else.

Speaker:

And certainly, at one point in time, that was true. I think a number of other vendors are now doing snapshots the way NetApp did snapshots.

Speaker:

So just a quick story to sort of bring home why the different ways to do snapshots really matter.

Speaker:

I was at a... I'm live-translating the story in my brain, uh, because I was at one of the largest companies in the world at a consulting gig. And we were helping them pick a new storage and backup system.

Speaker:

They were looking for an integrated system that would do both storage and data protection as part of that storage. They were already a NetApp customer, which meant that they knew every foible of NetApp. So, uh, they knew every bad thing about NetApp, but they also knew the good things.

Speaker:

And I think, of all of the consulting gigs that I've done throughout the years, they had done the best job of presenting: these are our requirements, they are well defined, and there are reasons behind every one of them.

Speaker:

Mm hmm.

Speaker:

And one of their requirements was that they wanted end users, just regular Joe and Jane sitting at the desktop, to be able to do their own restores.

Speaker:

Right.

Speaker:

And they wanted them to have the ability to do that from points in time that were, like, an hour apart throughout the day, going back, you know, like 90 days. So specifically they said, this is why we want 90 days of user-browsable snapshots.

Speaker:

At the time, that was like... not everybody did that, right? Depending on how you did snapshots as a vendor, you either could or could not meet that requirement. End of story.

Speaker:

And so the responses ranged from, uh, no problem, right? Literally, no problem. To, I remember one vendor coming in and going, that's the dumbest requirement we've ever

Speaker:

I bet I can

Speaker:

You want 90 days of user-browsable snapshots?

Speaker:

Yeah, I think you know exactly who that was.

Speaker:

It was just chaos. And by the way, there was a vendor that came in and they had this, let me restate that, I'm pretty sure it was that vendor that had this sort of very convoluted system that was based on, like, you had this block system, and then you had this other system that had the blocks, and you could... it was just really, really complicated.

Speaker:

And they had this wizard of a presenter, who was just an amazing presenter: very, um, you know, charming and very smart, and he just presented all of the stuff. Uh, and even though that presenter did their best, the customer just was not having it. Even though he was a really great presenter, it just didn't play.

Speaker:

Anyway, the key there is that how you do snapshots very much dictates how everything's going to work.

Speaker:

Going back to the requirements, right? You said that they had a very clear list of things. Do you know why they had that requirement for 90 days? Was it to, like... what was the purpose behind having that? Because I'm sure today everyone kind of thinks, oh, that's just reasonable, right? Oh, of course, why don't I have that? But back then it seemed like that was something very...

Speaker:

They had very specific numbers on the number of restores that had been done.

Speaker:

And I know this as a backup person: if you look at all of the restores that are ever done anywhere, you know, for the history of restores, 99 percent of them are done from data from yesterday.

Speaker:

Right.

Speaker:

Or from the most recent snapshot. And then there is this cliff of usage that just gets smaller and smaller. It's just this ever-decreasing line down toward zero.

Speaker:

And I think in their world, what they showed was that this ever-decreasing line basically dropped off to zero at 90 days. And so they said, we need 90 days of user-browsable snapshots.
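
The analysis described here, finding the age at which restore requests effectively stop, is easy to reproduce from your own restore logs. A minimal Python sketch, assuming you can export the age in days of the data each restore asked for; `retention_days_for` is a hypothetical helper, and the sample ages below are made up for illustration, not the customer's numbers.

```python
def retention_days_for(restore_ages_days: list[int], coverage_pct: int) -> int:
    """Smallest retention (in days) that would have satisfied at least
    coverage_pct percent of the historical restore requests."""
    if not restore_ages_days:
        raise ValueError("need at least one restore record")
    if not 0 < coverage_pct <= 100:
        raise ValueError("coverage_pct must be in (0, 100]")
    ages = sorted(restore_ages_days)
    # Integer ceiling of coverage_pct% of the restore count (avoids float rounding).
    needed = -(-coverage_pct * len(ages) // 100)
    return ages[needed - 1]


# Hypothetical restore log: age of the requested data, in days.
sample_ages = [0, 1, 1, 2, 3, 5, 10, 30, 60, 85]
print(retention_days_for(sample_ages, 90))  # 60
```

Running it at a few coverage levels gives exactly the kind of justification described here: a number you can put into a requirement, with the data behind it.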

Speaker:

That's what I meant: they were really good at articulating what their requirements were and also why they had those requirements.

Speaker:

So like, you

Speaker:

yeah.

Speaker:

And I think that's a good lesson for backup admins, though, right? You should be looking at the metrics of these systems, because if you want to make a case for, say, new technologies or process improvements, having the data that shows why you need something, so you can put it into requirements, is critical.

Speaker:

exactly, exactly.

Speaker:

So let's define

Speaker:

What's a

Speaker:

what we mean.

Speaker:

Yeah.

Speaker:

What is a snapshot?

Speaker:

Let me give you my definition, and let's see how closely it lines up with what you call it.

Speaker:

Alright.

Speaker:

What's your definition

Speaker:

of a

Speaker:

My definition of a snapshot is a point-in-time copy of the data that existed at some point in time, so it has to have been plausible, that can be preserved such that it's not modified when the primary copy gets modified.

Speaker:

So I would take your definition and insert one word, I think, to make it perfect.

Speaker:

Okay.

Speaker:

And that is the word "virtual" at the beginning, because it's a virtual copy. That is really what differentiates a snapshot from a copy, because you just said a copy, right?

Speaker:

So it is, I like to use the word view. It's a view into the volume that you're protecting with the snapshot, at a different point in time. And I like the word view, which is very much a database term, right? It's a way to look at your current volume at a different point in time.

Speaker:

I think the most important thing that differentiates a true snapshot from a lot of other things that we call snapshots is that it is a virtual copy that relies on the primary volume it is protecting for most of the blocks of data. When you're reading that snapshot, the bulk of those blocks are going to come from the current volume.

Speaker:

Are you with me?

Speaker:

Right.

Speaker:

Basically, the changed data is going to come from the

Speaker:

snapshot.

Speaker:

Some... the snapshot, right. How that happens, that's the difference between copy-on-write and redirect-on-write, but the bulk of the data is going to come from the current volume.

Speaker:

That's basically what I'm

Speaker:

Okay.

Speaker:

I

Speaker:

we on the same page?

Speaker:

Okay.

Speaker:

All right.

Speaker:

So, I mean, only one of us actually worked at a vendor that did snapshots, so, you know, I want to make sure I'm getting things right here.

Speaker:

Um, this is really the key difference between a snapshot and a copy, or a snapshot and a backup. Why does that matter from a backup perspective?

Speaker:

Why does that

Speaker:

snapshots.

Speaker:

They're not independent.

Speaker:

And that's why those of us who care about things like backup and recovery just like to scream and say, snapshots are not backups.

Speaker:

Right.

Speaker:

Um, now, I will say that you can use snapshots as a way to get a backup, but a snapshot by itself on a volume is, I like to call it, a convenience copy, right? It is a way to go back in time as long as you don't have a media failure.

Speaker:

Right, right.

Speaker:

You don't have a double disk failure in a RAID 5 array, or a triple disk failure in a RAID 6 array.
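
A toy model makes the dependency concrete. In this Python sketch (the `Volume` and `SnapshotView` classes are hypothetical, not any vendor's implementation), a snapshot view serves only its preserved blocks from its own store and reads everything else from the live volume, so losing a block on the primary media also loses it from the snapshot. The preserved block is recorded by hand here; how blocks get preserved is exactly the copy-on-write versus redirect-on-write question.

```python
class Volume:
    """Toy block device: a list of block contents; None marks an unreadable block."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class SnapshotView:
    """Read-only view of a volume as it looked when the snapshot was taken."""

    def __init__(self, volume):
        self.volume = volume
        self.preserved = {}  # block index -> contents at snapshot time

    def read(self, i):
        if i in self.preserved:
            return self.preserved[i]
        data = self.volume.blocks[i]  # shared with the live volume
        if data is None:
            raise OSError(f"block {i} lost on primary media; the snapshot cannot serve it")
        return data

    def fraction_from_primary(self):
        n = len(self.volume.blocks)
        return (n - len(self.preserved)) / n


vol = Volume([b"a", b"b", b"c", b"d"])
snap = SnapshotView(vol)
snap.preserved[1] = vol.blocks[1]  # pretend block 1 was preserved before this overwrite
vol.blocks[1] = b"B"
vol.blocks[3] = None               # simulate a failed disk region under block 3
```

Three of the four blocks in this snapshot are the live volume's own blocks, which is why a media failure under the primary takes the snapshot with it.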

Speaker:

Yeah, and like you're saying, you could use a snapshot to enable other backup mechanisms: take a copy, preserve it at a point in time, then move that data and back it up, right? And if you're integrating with applications, you can now take an application-consistent point in time: while your database is in hot backup mode, take that snapshot, and now you've preserved that point in time. You can...

Speaker:

Uh, thaw the database, so it can continue operating like normal. Thaw the database.

Speaker:

What word are you saying there?

Speaker:

I'll follow.

Speaker:

I just, I don't know what word I thought you were saying there.

Speaker:

So basically you're saying, because you freeze it, you freeze it and now you thaw it.

Speaker:

Okay, all

Speaker:

Yeah, or you could quiesce and unquiesce.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Okay.

Speaker:

I just don't usually use that term, so it really threw me.

Speaker:

right.

Speaker:

And then you do your backup off of that snapshot, right? So you now have a frozen copy that you can back up, and it's all good to go.
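
The freeze, snapshot, thaw, then back up sequence can be sketched in Python. Every function here is a hypothetical stand-in (no real database or storage API is being called), and the `events` list only records ordering. The point it shows: the freeze window lasts only as long as the snapshot call, while the long-running backup reads the frozen view after the database has been thawed.

```python
from contextlib import contextmanager

events = []  # records the order of operations in this toy example


def freeze_database():
    """Hypothetical stand-in for putting the database in hot backup mode (quiesce)."""
    events.append("freeze")


def thaw_database():
    """Hypothetical stand-in for ending hot backup mode (unquiesce)."""
    events.append("thaw")


def take_snapshot(volume: str) -> str:
    """Hypothetical storage call that returns a snapshot identifier."""
    events.append(f"snapshot:{volume}")
    return f"{volume}@snap1"


def back_up(snapshot_id: str) -> None:
    """Hypothetical: copy the frozen snapshot view to separate backup storage."""
    events.append(f"backup:{snapshot_id}")


@contextmanager
def quiesced():
    freeze_database()
    try:
        yield
    finally:
        thaw_database()  # always thaw, even if the snapshot call raises


def snapshot_then_backup(volume: str) -> str:
    with quiesced():
        snapshot_id = take_snapshot(volume)  # the freeze lasts only this long
    back_up(snapshot_id)                     # the slow copy runs after the thaw
    return snapshot_id
```

The `try`/`finally` is the design choice that matters: if the snapshot call fails, the database is still thawed rather than left in hot backup mode.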

Speaker:

Right.

Speaker:

Um, I'd say outside of storage arrays, the most common snapshot that is used in that way is VSS, right? The Windows Volume Shadow Copy Service.

Speaker:

And it's basically integrated into the operating system. It's integrated into the applications.

Speaker:

It is.

Speaker:

And backup apps can integrate with VSS.

Speaker:

These are all done with APIs, and a backup app can show up and say, hey, I am here to do a backup of this box. Um, I'm going to really simplify it: please take a snapshot of everything that needs to have a snapshot taken before I take a backup. Then it can take a backup of that snapshot, even though the volume continues to change.

Speaker:

It's given this view into the volume that is static. And then it can back up that view and get a perfectly application-consistent version of the volume. Uh, even if the backup takes two hours, it doesn't matter: all of the blocks will be from the same exact point in time. And then when it's done, it can tell VSS to delete that snapshot, or it can keep it around. It's up to you. It's just a configuration thing.

Speaker:

One thing I do want to mention: like we said, a snapshot just gives you that point-in-time copy, right? It's a read-only, point-in-time copy.

Speaker:

Now, sometimes you will also hear, and I don't think we're covering it later, about clones being used, right? Where I take a virtual copy of the volume and start using it, just going back to the previous discussion, Curtis. The difference is clones are writable, so you're making changes to them.

Speaker:

Right, a snapshot is a read-only copy that preserves that point in time. Nothing's going to change it, and it's always there for you to go back to and pull your file, your data, out of it. Clones, on the other hand, give you a copy of the volume at a point in time so you can use it for some purpose, like testing out restore capabilities. Can I verify my backups? Those sorts of things. And also doing, like, database recovery against that clone copy, and other things like that. So clones are different from snapshots, even though they both might start from a snapshot copy.
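
A clone can be modeled as a writable layer over a read-only snapshot: reads fall through to the snapshot unless the clone has written that block, and writes never reach the snapshot. A minimal Python sketch under that assumption; the `Clone` class is hypothetical and ignores how a real array allocates space for the clone's changes.

```python
from typing import Callable


class Clone:
    """Writable clone layered on a read-only base (for example, a snapshot view)."""

    def __init__(self, base_read: Callable[[int], bytes]):
        self.base_read = base_read  # reads block i of the snapshot
        self.overlay = {}           # block index -> data written to the clone

    def read(self, i: int) -> bytes:
        if i in self.overlay:
            return self.overlay[i]
        return self.base_read(i)

    def write(self, i: int, data: bytes) -> None:
        # Changes land only in the clone's own space; the snapshot is untouched.
        self.overlay[i] = data


snapshot_blocks = (b"a", b"b", b"c")  # immutable stand-in for a snapshot
clone = Clone(lambda i: snapshot_blocks[i])
clone.write(0, b"test-restore")       # e.g. run a recovery test against the clone
```

Because the snapshot underneath never changes, you can throw the clone away after a restore test or database recovery drill and the point-in-time copy is still intact.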

Speaker:

Right.

Speaker:

Um, yeah, I would probably just call it a read-write snapshot, but maybe that's a contradiction in terms.

Speaker:

Um,

Speaker:

Yes, a snapshot is a point in time, Curtis.

Speaker:

yeah, exactly.

Speaker:

There are three ways that snapshots are created.

Speaker:

The most common way, would you say it's still the most common way, the copy-on-write method?

Speaker:

Uh, no, I don't think, I don't think so anymore.

Speaker:

Okay.

Speaker:

Well, historically, what used to be the most common method, before one vendor ruined it for everybody...

Speaker:

Made something

Speaker:

is called the copy-on-write method.

Speaker:

And the reason it's called the copy-on-write method is that we create this storage area that is going to hold the snapshot blocks. And when we go to update a block, because it's a storage volume, right, we say, hey, uh, there's a snapshot for this block, so we're going to copy that block out to the snapshot area before you write. So that's why it's called copy-on-write.

Speaker:

And that is very expensive because, if you think about it, let me count: there is an extra read and an extra write for every write.

Speaker:

There's... and it's not just the one read and write that happens on the snapshot side. You also have a bunch of metadata updates, updating indirect blocks, and a whole bunch of other things. So, yeah, doing a copy-on-write might lead to, say, 10 additional I/O operations, or 12,

Speaker:

Yeah.

Speaker:

for that one block.
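
The copy-on-write write path described here can be sketched as a toy Python class (hypothetical, not any array's code). It counts only data-block I/Os: the first overwrite of a block after a snapshot costs an extra read and an extra write before the real write, while later overwrites of the same block cost one write. The metadata and indirect-block updates mentioned in the conversation are deliberately not modeled, so real counts would be higher.

```python
class CopyOnWriteVolume:
    """Toy copy-on-write snapshots: the live volume is updated in place, and
    the old contents of a block are copied to each snapshot's save area the
    first time that block is overwritten after the snapshot was taken."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.save_areas = []  # one dict per snapshot: block index -> old contents
        self.io_count = 0     # data-block reads and writes only

    def create_snapshot(self) -> int:
        self.save_areas.append({})
        return len(self.save_areas) - 1

    def write(self, i, data):
        for save_area in self.save_areas:
            if i not in save_area:    # first overwrite since this snapshot
                old = self.blocks[i]  # extra read of the current block
                self.io_count += 1
                save_area[i] = old    # extra write into the save area
                self.io_count += 1
        self.blocks[i] = data         # the write the application asked for
        self.io_count += 1

    def read_snapshot(self, snap_id, i):
        save_area = self.save_areas[snap_id]
        # Preserved blocks come from the save area; everything else is still
        # shared with the live volume.
        return save_area[i] if i in save_area else self.blocks[i]


vol = CopyOnWriteVolume([b"a", b"b"])
snap = vol.create_snapshot()
vol.write(0, b"A")   # first overwrite: read + copy + write = 3 I/Os
vol.write(0, b"A2")  # already preserved: 1 I/O
```

As more blocks are overwritten, more of the snapshot's reads are served from the save area instead of the live volume, which is the read-side effect described in this conversation.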

Speaker:

And so what happens over time: remember, initially I mentioned that the bulk of the data is going to come from the primary volume. But over time, after a snapshot has been created, more and more blocks are going to be copied into that snapshot area, which means that at some point, you know, if I'm reading my snapshot, a significant portion of it is going to come from the snapshot area.

Speaker:

And so there are multiple reasons that there's a performance hit. I think the biggest performance hit is, you know, all of the I/Os that have to happen every time we update, uh, every time we do a copy-on-write. The other is that as time goes on and I have to get more and more data from my snapshot area when I'm doing a read, the performance goes down. And this goes back to that vendor: they basically suggested that if we had 90 days of user-browsable snapshots, their performance was going to be like half of what, uh, what it typically

Speaker:

And I would say that is probably based on older technology, Curtis. I think when you had traditional RAID arrays, or RAID groups that you were creating, there was more of a performance impact. Now, since you end up aggregating a bunch of disks and then carving out volumes, you can share in the performance. I think that's not as big of a concern anymore as it used to be. Like, I know those systems you're talking about, they now support a thousand snapshots, right, for

Speaker:

So you think that the main hit from a performance standpoint, in a copy-on-write snapshot scenario with modern technology, is mainly that I/O hit the first time you write to a block after the snapshot? And by the way, remember that it has to do that calculation every time it does a write. It has to calculate: is there a snapshot that is looking at this block as it exists at this point in time? And then you have to copy it, uh, for that snapshot.

Speaker:

And there are different mechanisms you could use for that check. You could use bitmaps, you could use other things.
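
One cheap way to answer "has this block already been preserved for the snapshot?" is a bitmap with one bit per block, set once the block's snapshot-time contents have been copied out. A minimal Python sketch of that idea; `PreservedBitmap` and `write_block` are hypothetical names, and real arrays keep such structures in their own metadata formats.

```python
from typing import Callable


class PreservedBitmap:
    """One bit per block: set once that block's snapshot-time contents are saved."""

    def __init__(self, n_blocks: int):
        self.bits = bytearray((n_blocks + 7) // 8)

    def is_set(self, i: int) -> bool:
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def set(self, i: int) -> None:
        self.bits[i // 8] |= 1 << (i % 8)


def write_block(bitmap: PreservedBitmap, i: int,
                copy_out: Callable[[int], None],
                do_write: Callable[[int], None]) -> None:
    """Copy-on-write path: copy the old block out only on its first overwrite."""
    if not bitmap.is_set(i):
        copy_out(i)    # preserve snapshot-time contents first
        bitmap.set(i)  # mark only after the copy has completed
    do_write(i)


bitmap = PreservedBitmap(20)  # 20 blocks fit in 3 bytes
copies, writes = [], []
write_block(bitmap, 9, copies.append, writes.append)
write_block(bitmap, 9, copies.append, writes.append)
```

The check itself is a single bit test per write, which is why the per-write lookup is cheap compared with the extra read and write of the copy it guards.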

Speaker:

So, it's not the end of the world, and I think that a lot of these storage vendors

Speaker:

have optimized if they are continuing to use copy on write technologies.

Speaker:

But I would say a good chunk of them have moved away from

Speaker:

copy on write because of the I/O penalties that we've talked about.

Speaker:

Right, right.

Speaker:

So there was this vendor that came out, a little vendor

Speaker:

that at one point was called Network Appliance.

Speaker:

And it had a little screw and bolt as its logo.

Speaker:

Right.

Speaker:

At one point they said, we're just going to change our

Speaker:

name to NetApp because that's what

Speaker:

That's what everyone calls them.

Speaker:

I remember working at a startup and getting my hands on my

Speaker:

first NetApp appliance, and I was like, wow, these are kind of cool.

Speaker:

And this was before I started working there.

Speaker:

And I was like, wow, this is really amazing and simple

Speaker:

for what it does, and easy to use.

Speaker:

Yeah, exactly.

Speaker:

And they used a completely different way to do snapshots that at the time

Speaker:

was revolutionary, which I think has now been adopted by a lot of storage vendors.

Speaker:

And that is what we call redirect on write.

Speaker:

Do you want to describe how that

Speaker:

works?

Speaker:

So, before you get there,

Speaker:

I think we need to talk about their Write Anywhere File Layout,

Speaker:

which is what enables the snapshots.

Speaker:

Yep, WAFL. And I think a lot of other vendors do something similar these

Speaker:

days. But what it is, going back to the copy on write example: if you are writing

Speaker:

a block of data in a copy on write sort of file system, like previous file systems,

Speaker:

you would always write to the same spot.

Speaker:

And because you're always writing to the same spot, that's why you have to first

Speaker:

copy out the data and then update it.

Speaker:

With a write anywhere file layout, it doesn't

Speaker:

matter which actual block you end up writing to, you basically construct the

Speaker:

metadata tree to reference that block.

Speaker:

Even though I might be updating block 100, block 100 might actually

Speaker:

physically be at, like, block 1000.

Speaker:

And because I have all my metadata that tells me exactly where that data

Speaker:

exists, I just need to update the metadata to say, okay, if someone

Speaker:

tries to access block 100, it's actually physically on block 1000.

Speaker:

And so you can end up writing to any location in the file system

Speaker:

and not having to worry about always hitting that same location.

Speaker:

And so that's kind of WAFL in a nutshell.
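
The write-anywhere idea can be sketched as a logical-to-physical block map. This is a simplified, hypothetical model (the `WriteAnywhereVolume` class and its bump allocator are assumptions, and real WAFL is far more involved): every write goes to a fresh physical block, and only the metadata pointer for the logical block changes.

```python
class WriteAnywhereVolume:
    """Hypothetical write-anywhere layout: data is never overwritten in place."""

    def __init__(self, physical_blocks):
        self.disk = [None] * physical_blocks  # physical storage
        self.block_map = {}                   # metadata: logical block -> physical block
        self.next_free = 0                    # naive allocator: next unused physical block

    def _allocate(self):
        if self.next_free >= len(self.disk):
            raise RuntimeError("out of physical space")
        pbn = self.next_free
        self.next_free += 1
        return pbn

    def write(self, lbn, data):
        pbn = self._allocate()      # pick any free location, never the old one
        self.disk[pbn] = data
        self.block_map[lbn] = pbn   # the only "update" is to the metadata pointer

    def read(self, lbn):
        # Follow the metadata to wherever the block physically lives.
        return self.disk[self.block_map[lbn]]
```

Writing logical block 100 twice leaves the first version untouched at physical block 0 and points block 100 at physical block 1; nothing is overwritten in place. A real allocator would also reuse freed blocks, which this sketch omits.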

Speaker:

So basically what you have then is this metadata system that has a pointer

Speaker:

to every block and it doesn't really matter where those blocks happen to be.

Speaker:

And you can have thousands of these snapshots, right?

Speaker:

These pointers at the top.

Speaker:

Right, so then when we go to do an update and we have a snapshot, it just

Speaker:

means that we're going to change the pointer.

Speaker:

We're going to say, okay, there's this block that's sitting here.

Speaker:

And we know we're not supposed to update that block because we

Speaker:

have a snapshot that's relying on that block.

Speaker:

So we're just going to redirect.

Speaker:

We're going to write a new block for the

Speaker:

new version of that old block.

Speaker:

And then we're going to change the pointer, right?

Speaker:

We're going to redirect the pointer

Speaker:

to this new location of that block.

Speaker:

Meanwhile, the old block is still sitting there and we've got a

Speaker:

snapshot that's just pointing to it.
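
A redirect-on-write snapshot can be modeled as a frozen copy of the pointer map. This is again a hypothetical sketch (`RowVolume`, `take_snapshot`, and the whole-map copy are assumptions made for brevity): an update writes one new block and redirects the live pointer, while the snapshot keeps pointing at the old block.

```python
class RowVolume:
    """Hypothetical redirect-on-write volume built on a write-anywhere layout."""

    def __init__(self, physical_blocks):
        self.disk = [None] * physical_blocks  # physical blocks
        self.active = {}                      # live file system: logical -> physical
        self.snapshots = {}                   # snapshot name -> frozen pointer map
        self.next_free = 0                    # naive bump allocator (no reuse)
        self.io_count = 0                     # data-block writes performed

    def _allocate(self):
        if self.next_free >= len(self.disk):
            raise RuntimeError("volume full")
        pbn = self.next_free
        self.next_free += 1
        return pbn

    def take_snapshot(self, name):
        # A snapshot is just a copy of the pointers; no data blocks move.
        # (Copying the whole map is a simplification; real designs share metadata too.)
        self.snapshots[name] = dict(self.active)

    def write(self, lbn, data):
        # Always write the new version to a fresh block...
        pbn = self._allocate()
        self.disk[pbn] = data
        self.io_count += 1
        # ...then redirect the live pointer. Any snapshot still points at the old block.
        self.active[lbn] = pbn

    def read(self, lbn, snapshot=None):
        table = self.active if snapshot is None else self.snapshots[snapshot]
        return self.disk[table[lbn]]
```

Note that a write costs one data-block write whether or not any snapshot exists, and the write path never asks whether the old block is still needed.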

Speaker:

Right.

Speaker:

And so you've got, well, it's not an infinite number of snapshots, but

Speaker:

you have a very high number of snapshots, and they're all

Speaker:

just a whole bunch of metadata pointing to a whole bunch of blocks that are

Speaker:

just sitting all around the volume.

Speaker:

So this all sounds amazing, right?

Speaker:

Because you're like, oh, writes are super fast.

Speaker:

I don't have to worry about it.

Speaker:

In fact, Curtis, just one correction.

Speaker:

When you're going to actually write a new block, you never actually have to

Speaker:

look up the old block because you're always writing to a new location.

Speaker:

So you don't care if that old block is occupied by a snapshot or not, right?

Speaker:

And here's the downside of snapshots, though, especially with the

Speaker:

write anywhere file layout.

Speaker:

You now need a process that, when a snapshot gets deleted, goes through and figures out

Speaker:

which blocks are no longer being actively used, because they don't

Speaker:

belong to any snapshot and they're not currently part of the volume.

Speaker:

So you have a garbage collection process, or whatever a given vendor calls it,

Speaker:

some process to go through and reclaim all of those

Speaker:

free blocks so they can be reused.

Speaker:

I think you said the same thing I said in different words, but

Speaker:

yeah, I see what you're saying.

Speaker:

I guess I was saying that it's making a decision on what to do

Speaker:

based on whether or not... I think if I could just change that part

Speaker:

of my answer, instead of asking, is this block being used by anything else?

Speaker:

I guess is,

Speaker:

Yeah, more

Speaker:

I guess that's the question I'm asking.

Speaker:

Yeah.

Speaker:

And actually, I guess what you're saying is it doesn't even make

Speaker:

that decision at that point in time.

Speaker:

If it's going to modify a block, it just writes a new block.

Speaker:

And then what happens to that old block is a completely separate process.

Speaker:

If there is a snapshot that's pointing to that

Speaker:

old block, then it will stay.

Speaker:

If there are no snapshots that are pointing to that block, then at some

Speaker:

point the garbage collection process will come and make it go away.
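
The separate cleanup process can be sketched as a mark-and-sweep over the pointer maps: any allocated physical block that neither the live file system nor any remaining snapshot points to is free to reuse. The function name and data shapes below are hypothetical, and a real system would track this incrementally instead of rescanning every map.

```python
def reclaimable_blocks(allocated, active, snapshots):
    """Return physical blocks that no pointer map references any more.

    allocated: set of physical block numbers currently marked in use
    active:    dict logical -> physical for the live file system
    snapshots: dict snapshot name -> (dict logical -> physical)
    """
    # Mark: everything the live file system or any snapshot still points to.
    referenced = set(active.values())
    for table in snapshots.values():
        referenced.update(table.values())
    # Sweep: whatever is allocated but unreferenced can be reclaimed.
    return allocated - referenced
```

An old block survives exactly as long as some snapshot points to it; once the last such snapshot is deleted, the next pass reclaims it.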

Speaker:

Is that a better description?

Speaker:

Yes, that is a better description.

Speaker:

Now this is

Speaker:

I don't want to argue with a former NetApp employee

Speaker:

about how NetApp works.

Speaker:

Now, I should say this is based on NetApp's technology. Different

Speaker:

vendors may do different things, but for the most part, most

Speaker:

vendors do something similar-ish.

Speaker:

Specifically, it's based on how NetApp worked when you worked

Speaker:

there, which was a while ago.

Speaker:

And things may be different now.

Speaker:

That is also true.

Speaker:

not.

Speaker:

I mean, WAFL is at the core of their...

Speaker:

You know, the core of their technology.

Speaker:

And that's why, with redirect on write, you could essentially

Speaker:

have an unlimited number of snapshots with zero performance penalty.

Speaker:

Basically, the performance

Speaker:

penalty of doing a write update

Speaker:

is the same whether you have a snapshot or you don't have a snapshot.

Speaker:

The only penalty, if you want to call it that, is that, well, one

Speaker:

big difference between this way and the other way is that there is no snapshot area.

Speaker:

Right.

Speaker:

So in the other method, the snapshot area could fill up if

Speaker:

you held snapshots for too long.

Speaker:

In this configuration, there is no snapshot area.

Speaker:

The snapshot area is the volume, and if you have too many updates and you

Speaker:

keep too many snapshots, you would fill up the volume with snapshots.

Speaker:

So you've gotta get rid of the older ones.

Speaker:

Well, that would apply if you're talking, in NetApp terminology, about

Speaker:

traditional volumes, but most of it has moved over to flexible volumes, where once

Speaker:

again, you have an aggregate, a shared pool of storage, and for each of the

Speaker:

volumes, you typically also set a limit on how much space snapshots can occupy.

Speaker:

So you could say, I am allowing 20 percent of my overall

Speaker:

volume capacity for snapshots, in which case it'll start

Speaker:

And what happens when you hit that wall?

Speaker:

I don't know what the current behavior is.

Speaker:

Previously,

Speaker:

I believe it would let you start automatically pruning

Speaker:

snapshots to free up space.
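
A rough sketch of the reserve idea: if snapshots are allowed, say, 20 percent of volume capacity, something has to delete snapshots once they exceed it. The delete-oldest-first policy and the function name are assumptions for illustration, not NetApp's actual autodelete logic, and the sketch deliberately ignores blocks shared between snapshots.

```python
def prune_to_reserve(snapshots, volume_capacity, reserve_pct=20):
    """Delete oldest snapshots until the rest fit in the snapshot reserve.

    snapshots: list of (name, space_held) pairs, ordered oldest to newest.
    Simplification: each snapshot's space is treated as its own, ignoring
    blocks shared between snapshots.
    Returns (kept, deleted_names).
    """
    limit = volume_capacity * reserve_pct // 100
    kept = list(snapshots)
    deleted = []
    while kept and sum(size for _, size in kept) > limit:
        name, _ = kept.pop(0)   # oldest first; live data is never touched
        deleted.append(name)
    return kept, deleted
```

With a 400-unit volume and a 20 percent reserve, the limit is 80 units, so snapshots holding 50, 30, and 10 units (90 total) force the oldest one out.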

Speaker:

Right, right.

Speaker:

Because obviously it's not going to prune

Speaker:

production, current data.

Speaker:

So, yeah, it could, but again, we're talking specifically about NetApp.

Speaker:

But something has to happen, right,

Speaker:

if you're using this method.

Speaker:

Something has to happen.

Speaker:

Either we have to stop creating new snapshots, right?

Speaker:

Or stop updating the snapshots that we have.

Speaker:

Or we need to delete older snapshots, or maybe we need to delete

Speaker:

certain ones in the middle, right?

Speaker:

Basically, you've got to do some kind of pruning or else you're going to

Speaker:

Yeah, the other challenge is figuring out which snapshot to delete,

Speaker:

because blocks are being shared, right?

Speaker:

You might be like, hey, this snapshot is huge, and go delete it, but

Speaker:

because those blocks are being shared by other snapshots, you're not actually

Speaker:

going to free any space, right?

Speaker:

So you need to be able to figure out which snapshot actually

Speaker:

contains unique blocks, so that deleting it will actually save you space.
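
The shared-block problem can be shown by computing, for each snapshot, the physical blocks that only it references; deleting a snapshot frees exactly those. The functions and data shapes below are hypothetical.

```python
def unique_blocks(snapshots, active):
    """For each snapshot, the physical blocks that no one else references.

    snapshots: dict name -> (dict logical -> physical)
    active:    dict logical -> physical for the live file system
    """
    result = {}
    for name, table in snapshots.items():
        # Everything referenced by the live file system or any *other* snapshot.
        others = set(active.values())
        for other_name, other_table in snapshots.items():
            if other_name != name:
                others.update(other_table.values())
        result[name] = set(table.values()) - others
    return result


def best_snapshot_to_delete(snapshots, active):
    # Pick the snapshot whose deletion frees the most blocks, not the "biggest" one.
    unique = unique_blocks(snapshots, active)
    return max(unique, key=lambda name: len(unique[name]))
```

A snapshot that looks large can have an empty unique set if the live file system or other snapshots share all of its blocks, in which case deleting it frees nothing.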

Speaker:

Complicated.

Speaker:

Storage management.

Speaker:

I

Speaker:

I don't miss production storage management.

Speaker:

Any other final thoughts on redirect on write?

Speaker:

I think that covers it.

Speaker:

I mean, my personal opinion: if you're going to do snapshots on a storage array, I

Speaker:

think redirect on write is the way to go.

Speaker:

It sounds like, from what you're saying, they've made copy on write better,

Speaker:

but I still think redirect on write is just significantly better.

Speaker:

It might be more complicated if you're coding it, though.

Speaker:

Right.

Speaker:

So the next one is what I'm going to call the dumbest of all snapshot methods.

Speaker:

That's not what I have in the book.

Speaker:

I gave it a much nicer name in the book.

Speaker:

And guess who does this method?

Speaker:

The leading hypervisor company in the world.

Speaker:

Yes.

Speaker:

I think that's a fair statement, right?

Speaker:

company in the world.

Speaker:

I think it still is.

Speaker:

Yeah.

Speaker:

And that would be VMware.

Speaker:

So the way VMware does snapshots is just literally the dumbest

Speaker:

implementation of snapshots that I've ever seen and I don't know how they

Speaker:

haven't addressed it, but here it is.

Speaker:

When you create a snapshot in VMware, it literally holds all the writes.

Speaker:

Now, by the way, if I'm wrong, Broadcom, don't sue me.

Speaker:

This is based on my understanding of VMware snapshots.

Speaker:

I've checked every once in a while, and

Speaker:

no one seems bothered by this.

Speaker:

But if this has changed, and anybody

Speaker:

works for Broadcom/VMware, feel free to update

Speaker:

me and I will update this episode.

Speaker:

And I'll just delete this section.

Speaker:

But here's the way it works.

Speaker:

When you create a single snapshot on a VMware volume, it halts

Speaker:

all writes on the current volume.

Speaker:

And then it keeps all writes in a snapshot area.

Speaker:

And then when you delete that snapshot, it replays all those

Speaker:

writes against the production volume.

Speaker:

And this is why, when you make a snapshot

Speaker:

and hold that snapshot for a long time and then

Speaker:

delete that snapshot, it has a big performance hit

Speaker:

against the production volume.
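
A hypothetical model of the hold-all-writes behavior as described in this episode: while a snapshot exists, the base disk is frozen and writes collect in a delta area, and deleting the snapshot writes every changed block back. The class and method names are assumptions; this mirrors the behavior described here, not VMware's actual delta-disk format or consolidation code.

```python
class HoldAllWritesDisk:
    """Hypothetical 'hold all writes' snapshot model."""

    def __init__(self, data):
        self.base = list(data)   # production blocks, frozen while a snapshot exists
        self.delta = None        # None means no snapshot exists

    def take_snapshot(self):
        self.delta = {}          # from now on, writes go here instead of the base

    def write(self, lba, value):
        if self.delta is None:
            self.base[lba] = value
        else:
            self.delta[lba] = value   # base stays untouched

    def read(self, lba):
        # Current view: newest data from the delta, otherwise the frozen base.
        if self.delta is not None and lba in self.delta:
            return self.delta[lba]
        return self.base[lba]

    def read_snapshot(self, lba):
        # While a snapshot exists, the frozen base *is* the snapshot.
        return self.base[lba]

    def delete_snapshot(self):
        """Merge held writes into the base; returns how many blocks were written back."""
        if self.delta is None:
            return 0
        replayed = 0
        for lba, value in self.delta.items():
            self.base[lba] = value
            replayed += 1
        self.delta = None
        return replayed
```

The number of blocks `delete_snapshot()` has to write back grows with how much changed while the snapshot was held, which is why deleting a long-held snapshot hurts.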

Speaker:

But no one

Speaker:

this is

Speaker:

snapshot of a VM

Speaker:

Yeah, this is why you do not do this.

Speaker:

You don't use VMware-level snapshots the way you do any other

Speaker:

snapshots, because... and by the way, I used VMware for years before knowing this.

Speaker:

That's why I want to make sure I mentioned it.

Speaker:

And that is that if you create a snapshot and then hold it for a

Speaker:

long period of time, you're going to get hit with a massive I/O hit

Speaker:

when you delete that snapshot.

Speaker:

So if you're using VMware-level snapshots, then you use them the way we

Speaker:

talked about earlier, where you create a snapshot, you make a backup,

Speaker:

and then you delete the snapshot.

Speaker:

Maybe you take a VMware level snapshot, and then you take a storage level

Speaker:

snapshot of that snapshot, and then you delete the VMware level snapshot.

Speaker:

If this is the way your snapshot system works, you

Speaker:

cannot leave the snapshots around for any significant period of time.

Speaker:

I was going to chime in.

Speaker:

Thank you for covering that.

Speaker:

This is specifically VMware software snapshots,

Speaker:

if you want to call them that.

Speaker:

Right, that are only done at the VMware level.

Speaker:

Now, there are integrations that various storage vendors offer

Speaker:

by plugging into the VMware API.

Speaker:

So whenever you trigger a VMware snapshot, it actually triggers

Speaker:

a storage level snapshot.

Speaker:

That avoids some of these issues, but not everyone is aware of it.

Speaker:

Not everyone is using a third party storage array that integrates with VMware.

Speaker:

So.

Speaker:

Just

Speaker:

Yeah.

Speaker:

So, right.

Speaker:

Thanks for, thanks for clarifying that.

Speaker:

This is specifically VMware level snapshots that are done by VMware.

Speaker:

And without any third party storage.

Speaker:

Yeah.

Speaker:

And I don't know why VMware did this, but it's bonkers.

Speaker:

It's literally one of the weirdest

Speaker:

coded things I've ever heard of.

Speaker:

Why would you do it that way?

Speaker:

Somewhere in a meeting, this is how they decided to

Speaker:

do it.

Speaker:

It was probably easier

Speaker:

And yeah, maybe it was easier,

Speaker:

and they didn't talk to the

Speaker:

I wonder about that.

Speaker:

Exactly, they did not talk to the backup folks.

Speaker:

Well, uh, I think we have, uh, summarized the world of snapshots.

Speaker:

That?

Speaker:

No, I think we did a good job with that.

Speaker:

So, copy on write, redirect on write, dumbest method ever.

Speaker:

Those are the three types.

Speaker:

Officially, in the book, I've got this labeled

Speaker:

as the hold all writes method.

Speaker:

I should really just change that to the dumbest method ever.

Speaker:

But, um, yeah.

Speaker:

So snapshots are a great tool

Speaker:

in the backup and recovery arsenal.

Speaker:

They are the basis for one

Speaker:

of my favorite ways to do backup.

Speaker:

And we're going to talk about that in another episode.

Speaker:

Hint, it's called near CDP, not CDP.

Speaker:

It's called near CDP.

Speaker:

And the number one thing you have to understand

Speaker:

about snapshots is that unless you have copied the snapshot to

Speaker:

another location via some mechanism,

Speaker:

which could be backup,

Speaker:

could be replication of the volume,

Speaker:

could be a number of things,

Speaker:

you do not have a backup.
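
To make that point concrete, here is a toy model (all names hypothetical) in which a snapshot is just pointers into the array's own blocks, while a backup is a copy of the data held somewhere else. When the array's blocks are lost, only the backup can still be read.

```python
def make_array():
    # Hypothetical array: physical blocks plus a pointer map for the volume.
    return {"disk": {0: b"payroll", 1: b"orders"}, "block_map": {0: 0, 1: 1}}


def snapshot(array):
    # Only the pointers are copied; the data still lives on the same array.
    return dict(array["block_map"])


def backup(array):
    # The data itself is copied, standing in for a copy on other media or at another site.
    return {lbn: array["disk"][pbn] for lbn, pbn in array["block_map"].items()}


def read_snapshot(array, snap, lbn):
    # Reading a snapshot still depends on the array's physical blocks.
    return array["disk"][snap[lbn]]


def read_backup(backup_copy, lbn):
    # Reading a backup needs nothing from the original array.
    return backup_copy[lbn]
```

If the array's blocks are wiped, `read_backup` still returns the data, while `read_snapshot` fails with a `KeyError` because its pointers lead nowhere.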

Speaker:

You have a picture of your volume.

Speaker:

And that picture of your volume is as worthless as a picture of your

Speaker:

house after your house burns down.

Speaker:

It'll just be a nice memory and a really bad day.

Speaker:

So that's really the most important thing to

Speaker:

understand about snapshots.

Speaker:

And if this is the first time snapshots have been explained to you,

Speaker:

now you understand why I don't like that they call

Speaker:

what AWS does snapshots, because that very much does not meet

Speaker:

the definition that we just gave.

Speaker:

And I'm glad I brought this up, because it's important: what we're talking

Speaker:

about is storage-level snapshots.

Speaker:

Darn it.

Speaker:

I don't know

Speaker:

You can't call it that, yeah, because

Speaker:

this.

Speaker:

Yeah, these are traditional snapshots.

Speaker:

There are other things out there that people call snapshots

Speaker:

that don't work like this.

Speaker:

AWS snapshots don't work like this.

Speaker:

They are an actual image copy.

Speaker:

When you make an AWS snapshot, it actually copies

Speaker:

that point in time out to another area of storage, which happens to be S3.

Speaker:

you?

Speaker:

And I think specifically you're talking about an AWS EBS snapshot.

Speaker:

Thank you.

Speaker:

I am talking about an AWS EBS snapshot.

Speaker:

My former employer, Druva, calls what they do snapshots.

Speaker:

They call their backups snapshots.

Speaker:

I never liked that, but you know, nobody asked me.

Speaker:

But what we're talking about here is traditional snapshots.

Speaker:

And a lot of other people will call what they do a snapshot.

Speaker:

The problem is, like a lot of terms in the backup world,

Speaker:

so many of our terms are

Speaker:

just English words that are used in so many different contexts.

Speaker:

Like when we had the CDP episode, we couldn't figure out what to call

Speaker:

those points in time, because a lot of the CDP vendors called them snapshots.

Speaker:

Yeah, exactly.

Speaker:

Yeah, exactly.

Speaker:

All right.

Speaker:

Well, I guess the only thing left for me to say is, that's a wrap.