I'm not sure who needs to hear this, but snapshots are not backups.
Speaker:And in this context, I'm talking about virtual snapshots, like
Speaker:what you do with a storage array.
Speaker:We're not talking about cloud snapshots, like what you do
Speaker:with EBS volumes in AWS.
Speaker:If you don't know the difference, or you can't articulate why
Speaker:these snapshots are not backups,
Speaker:well, have I got a podcast for you.
Speaker:Hi, I'm W. Curtis Preston, AKA Mr. Backup.
Speaker:And my podcast turns unappreciated backup admins into cyber recovery heroes.
Speaker:This is the backup wrap-up.
Speaker:Hi, and welcome to the Backup Wrap Up.
Speaker:I'm your host, W. Curtis Preston, and I have with me my Spanish language
Speaker:encourager, Prasanna Malaiyandi.
Speaker:I gave you a positive thing
Speaker:I know, I'm impressed.
Speaker:And I'm also impressed that you are done now with, what, Spanish 2, right?
Speaker:Spanish 2, uh, sí, sí, terminé con Span, uh, I almost said
Speaker:Spanish, Español 2, um, yeah.
Speaker:So now I need to do a little review because, you know, for an English
Speaker:speaker, as I know I've told you, the gender thing is not a big deal.
Speaker:You get used to that.
Speaker:What's challenging is a word that ends in -ma: you would
Speaker:think that would be feminine, but it's actually a masculine word,
Speaker:uh, because it ends in -ma.
Speaker:I have to pound that stuff into my head and just say it over and over.
Speaker:I will be using the thing that I made, where I'm actually
Speaker:using technology to make a verbal recording that I can listen to.
Speaker:Uh, it's very, very cool.
Speaker:Uh, but I am excited to move on to Spanish 3 and
Speaker:drag you along with me.
Speaker:It's helpful.
Speaker:I used to be pretty good at Spanish because I took it in high
Speaker:school and then I lost everything.
Speaker:I can order things off a menu, but that's about it.
Speaker:Right.
Speaker:Right.
Speaker:So, uh, I think it's time for the news of the week.
Speaker:This one is frustrating.
Speaker:Uh, we'll do a frustrating one first and then
Speaker:some good news about a vendor.
Speaker:Let's talk about this 1Password hack.
Speaker:And it frustrates me for many reasons.
Speaker:One is, we're such a fan of password managers.
Speaker:And there are those who are not fans of password managers.
Speaker:There are also those who are not a fan of the cloud.
Speaker:And this just, uh, is both of those things.
Speaker:So, uh, do you want to talk a little bit about the 1Password/Okta hack?
Speaker:So basically what happened is 1Password, I don't know if it was
Speaker:1Password or Okta, but basically some hackers got into 1Password.
Speaker:They were able to change some things in their Okta instance and try to get a list
Speaker:of admins, which I'm sure they were going to use to target and be able to deploy
Speaker:things so they could cause bigger damage.
Speaker:Right.
Speaker:But as they were starting to sort of peel back the onion and
Speaker:figure out, okay, what happened?
Speaker:They realized what actually happened is it started on the Okta side,
Speaker:Right.
Speaker:The reason they were able to do what, what they ended up being able
Speaker:to do was because Okta had already been, had already been hacked.
Speaker:And so it's interesting in this case because it wasn't like, oh, they
Speaker:just got into Okta and then they ping-ponged and got into 1Password.
Speaker:They actually looked at support files that someone in 1Password, an engineer,
Speaker:had uploaded because they probably had some issue or something else like that.
Speaker:They were reaching out to Okta and they uploaded basically a support bundle.
Speaker:Um, and that support bundle contained, in addition to sort of like
Speaker:the information needed to troubleshoot, session cookies,
Speaker:Right.
Speaker:Which they were then able to use.
Speaker:Right.
Speaker:Yeah, and so they were able to basically impersonate the 1Password employee
Speaker:and log into the system and that's when they started causing havoc.
Speaker:Yeah.
Speaker:And the good news is that 1Password did see what was happening.
Speaker:Uh, they did see this weird session coming from an odd IP.
Speaker:And they shut it down; basically, all the attackers did was
Speaker:attempt to get a list of admins.
Speaker:They didn't actually get the list of admins, is what it looks like.
Speaker:Um, but the whole thing, again, you know, went back to
Speaker:the fact that the hacker had initially compromised
Speaker:the Okta support system, from which they then got this support file.
Speaker:And, you know, in retrospect, it looks like everybody did their job.
Speaker:It looks like things were noticed, things were stopped.
Speaker:I think that in the case of Okta, maybe they weren't noticed quickly
Speaker:enough, because 1Password was actually just one company of several that
Speaker:were compromised because of this.
Speaker:For those of you that are 1Password customers, it doesn't
Speaker:appear that any customer data was accessed, you know,
Speaker:unlike, what was the other one?
Speaker:The LastPass hack, where they actually got the vault.
Speaker:And then you had to worry if you had insecure passwords, right,
Speaker:ones that were guessable. But it doesn't appear
Speaker:that that was the case here.
Speaker:They found it pretty quickly.
Speaker:I think another interesting thing that I found, by the way, this is an
Speaker:article on The Register, we'll put a link in the show notes description.
Speaker:Right.
Speaker:The one thing I found interesting is they were trying to figure
Speaker:out, okay, what actually happened?
Speaker:Like, when was this information stolen? And so they went back and they looked at
Speaker:their logs on the Okta side, and they were trying to figure out, okay, was the archive
Speaker:accessed before the support engineer on the Okta side accessed it? And they were able
Speaker:to figure out, no, the support engineer hadn't opened it yet, so it wasn't a
Speaker:rogue support engineer on the Okta side.
Speaker:And then they looked at the 1Password side and they saw,
Speaker:okay, the person who uploaded it was on public Wi-Fi at a hotel.
Speaker:And they were like, oh, maybe it could have been stolen during the upload,
Speaker:because, you know, those connections are often unencrypted and all the rest.
Speaker:But they looked and they're like, no, the upload process
Speaker:had TLS end to end up to Okta.
Speaker:And so everything was encrypted.
Speaker:It wasn't stolen in the process.
Speaker:So there must've been something else that had accessed it on the Okta side.
Speaker:So it's interesting how they're able to piece this all together.
Speaker:It's almost like CSI, right?
Speaker:You're piecing together all these clues, trying to figure out, okay,
Speaker:what happened, what went wrong?
Speaker:Well, that's what you have to do in incident response, right?
Speaker:You've got to, you know, piece together what you can from the logs that you have.
Speaker:Logs, logs, logs, right?
Speaker:That's what it's all about is the logs.
Speaker:The only thing that's somewhat distressing is that it
Speaker:appeared that they made some tweaks,
Speaker:some upgrades to their MFA.
Speaker:Uh, and by upgrades, I mean like they changed some of their stuff.
Speaker:They're like, maybe we shouldn't allow anyone to log into Okta who isn't
Speaker:at a, um, a 1Password IP, right?
Speaker:Maybe, maybe we shouldn't allow that.
Speaker:That seems like
Speaker:Uh, that's,
Speaker:you should have decided that before, but yeah,
Speaker:I don't know about that, Curtis, because there are times, like, you might
Speaker:be traveling, and you might be the super user, and you have to change
Speaker:something, and you're not at a desk.
Speaker:Or, imagine that 1Password blows up, something happens to their site,
Speaker:and they need to reset passwords, and they could eventually get locked out.
Speaker:But maybe this is an
Speaker:I would think, yeah, I would think that that would be an
Speaker:exception case that you would deal with separately, and maybe the default
Speaker:would be that you would turn it off.
Speaker:The other thing was that it appears that now they're using YubiKeys.
Speaker:For those that aren't familiar with that, Y U B I K E Y.
Speaker:This is a hardware token.
Speaker:It's pretty inexpensive as these systems go.
Speaker:But it is, you know, an actual system that you can actually
Speaker:buy for your own personal use. I took a look, and I think it was a
Speaker:hundred dollars for two of them. Um,
Speaker:You want two.
Speaker:You know, and you want two.
Speaker:Yeah, and so, you know, it looks like they're
Speaker:now using YubiKeys where maybe before they weren't using YubiKeys.
Speaker:I'm glad that they took those steps.
Speaker:It's just a shame that we always make changes to
Speaker:systems after we've been, you know, after we've had an exposure, but good
Speaker:on them for finding it, good on them for responding and
Speaker:saying, Hey, we could make this better.
Speaker:Uh, I'm dinging Okta a little here.
Speaker:There was a thing at the end that Okta basically said that perhaps if you're
Speaker:sending a support file, uh, what was the, what was the type of that file?
Speaker:A HAR file?
Speaker:Um, some sort of archive file.
Speaker:Yeah.
Speaker:Yeah.
Speaker:And they're like, perhaps you should sanitize
Speaker:that before you send it to us.
Speaker:And I'm like, well, maybe your system that creates the HAR file
Speaker:should sanitize it for the customer
Speaker:before they upload it.
Speaker:Uh, there was a little bit, I thought, of victim blaming, uh, towards the end there,
Speaker:but, um, anyway, on to one that I think is going to be a lot easier to
Speaker:talk about: our former employer, Druva.
Speaker:Uh, the headline here, and it's a TechTarget article, is
Speaker:that they've added a gen AI assistant to their cloud backup tool.
Speaker:And I watched, like, a video of a demo, and basically it looked like, uh,
Speaker:you know, an NLM that you can use to interface with the Druva cloud platform.
Speaker:And so you could ask it things like, Hey, show me my backups, or, Hey,
Speaker:show me which backups have failed.
Speaker:I think the big thing is, especially with, uh, with the sort of focus on
Speaker:AI and making it easily available to a large audience without needing
Speaker:to know all the training, Right.
Speaker:And being an expert in it, I think has now made it sort of commonplace, right?
Speaker:It's easy to pick up an AI model that has been pre trained
Speaker:and use it for some purpose.
Speaker:right?
Speaker:In their case, they're using, uh, Amazon's, uh, no
Speaker:great surprise there,
Speaker:they're using Amazon's, uh, NLM model,
Speaker:you referring to?
Speaker:Sorry, are you referring to NLP or LLM?
Speaker:You're calling it NLM.
Speaker:thank you.
Speaker:Oh, yeah, sorry.
Speaker:NLP Thank you.
Speaker:NL NLP.
Speaker:So they're, they're using Amazon's, uh, product to do this.
Speaker:No great surprise.
Speaker:Druva lives in Amazon, and, uh, it seems like the biggest thing that
Speaker:they had to do was to make sure that it was integrated with the security.
Speaker:That's the biggest thing that I see: yeah, you have to make sure, because if
Speaker:you just pick up any random LLM, and I'll use LLM because that's a large language
Speaker:model, right, which a lot of the AI models are built on top of, if you
Speaker:pick that, then depending on what it was trained on, you might get back bogus
Speaker:answers, random answers, answers that aren't even safe, right?
Speaker:And so at least the fact that Druva is putting guardrails in place to make
Speaker:sure that what gets returned is sensible and safe, I think is a good step.
Speaker:Yeah.
Speaker:So we've seen, uh, we've seen AI use with Alcion.
Speaker:We saw it with, um, I'm trying to think.
Speaker:Yeah.
Speaker:Cohesity came out with one.
Speaker:Do you remember who else?
Speaker:It was Druva.
Speaker:I'm trying to, I know there's another one.
Speaker:Um,
Speaker:Was it Commvault?
Speaker:in the thing.
Speaker:Oh, Dell, Dell, Dell said, yeah, I haven't, I don't think I've seen
Speaker:anything from Commvault, but, uh, Dell is also doing, uh, yeah, I merged,
Speaker:I think, in, uh, uh, uh, NLP and LLM
Speaker:To NLM.
Speaker:It's, it's a Curtis only thing.
Speaker:So, yeah, so that's interesting.
Speaker:Uh, I think anything that makes it easier to interface with your
Speaker:backup system is a good thing.
Speaker:As long as they have those guardrails in place and
Speaker:And I know you always like to talk about how, what's one of
Speaker:the most important jobs that you give to the most junior person at a company?
Speaker:exactly.
Speaker:Backup, right?
Speaker:And so,
Speaker:why would you do
Speaker:yeah, and so giving an assistant, if you will, right, to that junior backup person
Speaker:is helpful as they're learning the ropes.
Speaker:Yeah.
Speaker:Well, that is the news of the week.
Speaker:As you know, every episode of the Backup Wrap Up is going to dive
Speaker:deep into one particular topic.
Speaker:This week's topic is snapshot, snapshot, snapshot.
Speaker:Click, click, click,
Speaker:It's a topic that comes up a lot.
Speaker:This word comes up a lot on this show.
Speaker:So it's time to dive deep into a word that, I bet, at one point in your career,
Speaker:Prasanna, you must've heard
Speaker:a hundred times a day.
Speaker:What do you think?
Speaker:At least.
Speaker:At least,
Speaker:least.
Speaker:because there's certainly one company that thinks they do snapshots different
Speaker:and better than everyone else.
Speaker:And certainly, at one point in time,
Speaker:that was true.
Speaker:I think a number of other vendors are now doing snapshots
Speaker:the way NetApp did snapshots.
Speaker:So, just a quick story to sort of bring out why different ways
Speaker:to do snapshots really matter.
Speaker:I'm live-translating the story in my brain, uh, because I
Speaker:was at one of the largest companies in the world on a consulting gig.
Speaker:And we were helping them to pick a new storage and backup system.
Speaker:They were looking for an integrated system that would do both storage and
Speaker:data protection as part of that storage.
Speaker:They were already a NetApp customer, and that meant that
Speaker:they knew every foible of NetApp.
Speaker:So, uh, they knew every bad thing about NetApp, but they
Speaker:also knew the good things.
Speaker:And I think of all of the consulting gigs that I've done throughout the years,
Speaker:they had done the best at presenting:
Speaker:these are our requirements,
Speaker:and they are well defined, and there are reasons behind every one of them.
Speaker:Mm hmm.
Speaker:And one of their requirements was that they wanted end users, just regular Joe
Speaker:and Jane sitting at the desktop,
Speaker:to be able to do their own restores.
Speaker:Right.
Speaker:And they wanted them to have the ability to do that at points in
Speaker:time that were like an hour at a time throughout the day, going back
Speaker:to, you know, it was like 90 days.
Speaker:So specifically they said, this is why we want 90 days of user browsable snapshots.
Speaker:At the time, that was, like,
Speaker:not something everybody did, right?
Speaker:Depending on how you did snapshots as a vendor, you either could or
Speaker:could not meet that requirement.
Speaker:Just end of story.
Speaker:And so the responses ranged from, uh, no problem, right?
Speaker:Literally no problem.
Speaker:To, I remember one vendor coming in and going, That's the
Speaker:dumbest requirement we've ever
Speaker:I bet I can
Speaker:you want 90 days of user browsable snapshots?
Speaker:Yeah, I think you know exactly who that was.
Speaker:It was just chaos. And by the way, there was a vendor that came in, and
Speaker:let me restate that: I'm pretty sure it was that vendor that had this
Speaker:sort of very convoluted system that was based on, like, you had this block system
Speaker:and then you had this other system that had the blocks and you could,
Speaker:Stephen
Speaker:it was just really, really complicated.
Speaker:And they had this wizard of a presenter that
Speaker:was just an amazing presenter,
Speaker:very, um, you know, charming and very smart, and
Speaker:just presented all of the stuff.
Speaker:Uh, and even though that presenter did their best, the
Speaker:customer just was not having it.
Speaker:Even though he was a really great presenter,
Speaker:it just didn't play.
Speaker:Anyway, the key there is that how you do snapshots very much
Speaker:dictates how everything's going to work.
Speaker:Going back to the requirements, right?
Speaker:You said that they had a very clear list of things.
Speaker:Do you know why they had that requirement for 90 days?
Speaker:Was it to like?
Speaker:What was the purpose behind having that?
Speaker:Because I'm sure today everyone kind of thinks oh, that's just reasonable, right?
Speaker:Oh, of course, why don't I have that?
Speaker:But back then it seemed like that was something very,
Speaker:they had very specific numbers on the number of restores that had been done.
Speaker:And I know this as a backup person, but if you look at just
Speaker:all of the restores that are done,
Speaker:ever, anywhere, you know, for the history of restores, 99 percent of
Speaker:them are done from data from yesterday.
Speaker:Right.
Speaker:Or from the most recent snapshot, and then there is this cliff
Speaker:of usage that just gets smaller and smaller.
Speaker:And it's just this steadily decreasing line to zero.
Speaker:And I think in their world, what they showed was that that line to
Speaker:zero basically dropped off at 90 days.
Speaker:And so they said, we need 90 days of user browsable snapshots.
Speaker:That's what I meant: they were really good at articulating what
Speaker:their requirements were and also
Speaker:why they had those requirements.
Speaker:So like, you
Speaker:yeah.
Speaker:And I think that's a good lesson though for backup admins, right?
Speaker:You should be looking at the metrics of these systems because if you want
Speaker:to make a case for say new technologies or new process improvements or other
Speaker:things like that, having this data to show why you need something so you could
Speaker:put it into requirements is critical.
Speaker:exactly, exactly.
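The point about turning restore metrics into a requirement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the restore log below is made up, and `retention_days_for_coverage` is a hypothetical helper, not something from any backup product. It finds the smallest retention window that would have satisfied a given percentage of past restores.

```python
def retention_days_for_coverage(restore_ages_days, coverage_percent=99):
    """Smallest retention (in days) that would have served the given
    percentage of past restores. Hypothetical helper for illustration."""
    if not restore_ages_days:
        raise ValueError("need at least one restore record")
    ages = sorted(restore_ages_days)
    # Number of restores that must be covered, rounded up (integer math
    # avoids floating-point surprises).
    needed = (coverage_percent * len(ages) + 99) // 100
    return ages[max(needed, 1) - 1]


# Made-up restore log: the age, in days, of the data each restore asked for.
restore_ages = [1] * 90 + [7] * 5 + [30] * 3 + [60, 90]

for pct in (90, 99, 100):
    print(pct, "percent of restores covered by",
          retention_days_for_coverage(restore_ages, pct), "days of retention")
```

With real numbers from your own restore history, the same calculation gives you something concrete to put next to a requirement like "90 days of user browsable snapshots."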
Speaker:So let's define
Speaker:What's a
Speaker:what we mean.
Speaker:Yeah.
Speaker:What is a snapshot?
Speaker:Let me give you my definition, let's see how closely it
Speaker:lines up with what you call it.
Speaker:Alright.
Speaker:What's your definition
Speaker:of a
Speaker:my definition of a snapshot is a point in time copy of the data that existed
Speaker:at some point in time, so it has to have been plausible, that can be
Speaker:preserved such that it's not modified when the primary copy gets modified.
Speaker:So, I would take your definition and I would insert one word
Speaker:I think to make it perfect
Speaker:Okay?
Speaker:and that is the word virtual at the beginning, because it's a virtual
Speaker:copy, because that is really what differentiates a snapshot from a copy,
Speaker:because you just said a copy, right?
Speaker:So it is a, I like to use the word view.
Speaker:It's a view
Speaker:into the volume that you're protecting with the snapshot
Speaker:at a different point in time.
Speaker:And I like the word view, which is very much a database term, right?
Speaker:It's just a different, it's a way to look at your current volume
Speaker:at a different point in time.
Speaker:I think the most important thing that differentiates a true snapshot from a lot
Speaker:of other things that we call snapshots
Speaker:is that it is a virtual copy, in that it relies on the primary volume that it is
Speaker:protecting for most of the blocks of data.
Speaker:When you're reading that snapshot, the bulk
Speaker:of those blocks are going to come from the current volume.
Speaker:Are you with me?
Speaker:Right.
Speaker:Basically, the changed data is going to come from
Speaker:the snapshot.
Speaker:The snapshot, right. How that happens, that's the difference between
Speaker:copy on write and redirect on write, but the bulk of the data is going
Speaker:to come from the current volume.
Speaker:That's basically what I'm
Speaker:Okay.
Speaker:I
Speaker:we on the same page?
Speaker:Okay.
Speaker:All right.
Speaker:So we can go on. I mean, only one of us actually worked at a vendor that did
Speaker:snapshots, so, you know, I want to make sure I'm getting things right here.
Speaker:Um, this is really the key to the difference between a snapshot and a
Speaker:copy, or a snapshot and a backup. Why does that matter from a backup perspective?
Speaker:Why does that
Speaker:snapshots.
Speaker:They're not independent.
Speaker:And that's why, you know, those of us that, you know, care about
Speaker:things like backup and recovery.
Speaker:We just like to scream and say, snapshots are not backup.
Speaker:Right.
Speaker:Um, Now, I will say that you can use snapshots as a way to get backup, but a
Speaker:snapshot by itself on a volume is, I like to call it a convenience copy, right?
Speaker:It is a way to go back in time as long as you don't have media failure.
Speaker:Right, right.
Speaker:You don't have a double disk failure in a RAID 5 array or a triple
Speaker:disk failure in a RAID 6 array.
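To make the "not independent" point concrete, here is a toy Python sketch. It is an assumption-laden model, not any vendor's design: `ToyArray` and `MediaFailure` are hypothetical names, the snapshot area lives on the same simulated media as the primary volume, and a plain dictionary stands in for a backup stored somewhere else.

```python
class MediaFailure(Exception):
    """Raised when the underlying disks can no longer be read."""


class ToyArray:
    """Toy storage array: primary blocks plus a snapshot area on the same media."""

    def __init__(self, blocks):
        self.primary = dict(blocks)
        self.snapshot_area = None  # no snapshot taken yet
        self.failed = False

    def take_snapshot(self):
        # A new snapshot starts empty: it borrows every block from primary.
        self.snapshot_area = {}

    def write(self, block_no, data):
        # Preserve the old block the first time it is overwritten.
        if self.snapshot_area is not None and block_no not in self.snapshot_area:
            self.snapshot_area[block_no] = self.primary[block_no]
        self.primary[block_no] = data

    def read_snapshot(self, block_no):
        if self.failed:
            raise MediaFailure("array is gone, and the snapshot with it")
        if block_no in self.snapshot_area:
            return self.snapshot_area[block_no]
        return self.primary[block_no]  # unchanged blocks come from primary


array = ToyArray({0: "payroll", 1: "invoices"})
array.take_snapshot()
array.write(1, "invoices-v2")

# A backup is an independent copy on other media (here, a separate dict).
backup = {n: array.read_snapshot(n) for n in array.primary}

array.failed = True  # simulate losing more disks than the RAID level tolerates
try:
    array.read_snapshot(0)
    snapshot_survived = True
except MediaFailure:
    snapshot_survived = False

print("snapshot survived:", snapshot_survived)
print("backup still has:", backup)
```

The snapshot is a convenient way to go back in time while the array is healthy, but once the media fails, only the copy that was moved elsewhere is still readable.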
Speaker:Yeah, and like you're saying, you could use a snapshot to enable other backup
Speaker:mechanisms: take a copy, preserve it at a point in time, and now you move that
Speaker:data and you back up that data, right?
Speaker:And if you're integrating with applications, you can take an
Speaker:application-consistent point in time: while your database is in hot
Speaker:backup mode, take that snapshot, and now you preserve that point in time.
Speaker:You can,
Speaker:uh, thaw the database, so it can continue operating like normal. Thaw the database.
Speaker:What word are you saying there?
Speaker:I'll follow.
Speaker:I just, I don't know what word I thought you were saying there.
Speaker:So basically you're saying, because you freeze it.
Speaker:You're saying you freeze it and now you thaw it.
Speaker:Okay, all
Speaker:Yeah, or you could quiesce and unquiesce.
Speaker:Yeah.
Speaker:Yeah.
Speaker:Okay.
Speaker:I just, I don't usually use that term.
Speaker:So it really threw me.
Speaker:right.
Speaker:And then you do your backup off of that snapshot, right?
Speaker:So you now have a copy that's frozen that you can now do your
Speaker:backup and it's all good to go.
Speaker:Right.
Speaker:Um, I'd say outside of storage arrays, the most common snapshot
Speaker:that is used in that way is VSS, right?
Speaker:The Windows Volume Shadow Copy Service.
Speaker:It's basically integrated into the operating system.
Speaker:It's integrated into the applications.
Speaker:It is.
Speaker:And backup apps can integrate with VSS.
Speaker:These are all done with APIs, and a backup app can show up and say, Hey,
Speaker:I am here to do a backup of this box.
Speaker:Um, please, and I'm going to really simplify it,
Speaker:please take a snapshot of everything that needs to have a snapshot taken of
Speaker:it before I take a backup. And then it can
Speaker:take a backup of that snapshot, even though the volume continues to change.
Speaker:It's given this view into the volume that is static.
Speaker:And then it can back up that volume, uh, and get that perfectly application
Speaker:consistent version of the volume.
Speaker:Uh, even if the backup takes two hours, it doesn't matter.
Speaker:It has it, all of the blocks will be from the same exact point in time.
Speaker:And then when it's done, it can tell VSS to delete that snapshot
Speaker:or it can keep it around.
Speaker:It's up to you.
Speaker:It's just, it's a configuration thing.
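The freeze, snapshot, thaw, then back up sequence described above can be sketched roughly like this in Python. To be clear about the assumptions: this is not the VSS API. `ToyVolume` and `backup_with_snapshot` are hypothetical names, and the "snapshot" here is a plain frozen copy standing in for the static view a real snapshot provider would hand back.

```python
class ToyVolume:
    """Toy volume that an application writes to (hypothetical, not VSS)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.frozen = False

    def quiesce(self):
        self.frozen = True   # application pauses writes ("freeze")

    def thaw(self):
        self.frozen = False  # application resumes normal operation

    def write(self, block_no, data):
        if self.frozen:
            raise RuntimeError("volume is quiesced; write must wait")
        self.blocks[block_no] = data

    def take_snapshot(self):
        # Stand-in for a static point-in-time view of the volume.
        return dict(self.blocks)


def backup_with_snapshot(volume, writes_during_backup=()):
    """Freeze briefly, snapshot, thaw, then read the backup from the snapshot."""
    volume.quiesce()
    try:
        snap = volume.take_snapshot()
    finally:
        volume.thaw()  # the freeze only lasts as long as the snapshot takes

    pending = list(writes_during_backup)
    backup = {}
    for block_no in sorted(snap):
        if pending:
            # The application keeps changing the live volume mid-backup.
            volume.write(*pending.pop(0))
        backup[block_no] = snap[block_no]
    return backup


vol = ToyVolume({0: "a", 1: "b"})
result = backup_with_snapshot(vol, writes_during_backup=[(0, "A"), (1, "B")])
print("backup:", result)
print("live volume:", vol.blocks)
```

However long the copy loop runs, every block in the backup comes from the same point in time, while the live volume has already moved on.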
Speaker:One thing I do want to mention
Speaker:is that, like we said, a snapshot just gives you that point in time copy, right?
Speaker:It's a read only, point in time copy.
Speaker:Now sometimes you will also hear about, and I don't think we're covering
Speaker:it later, clones being used, right?
Speaker:Where I take a virtual copy of the volume and start using it, just going
Speaker:back to the previous discussion, Curtis.
Speaker:The difference is clones are writable.
Speaker:So you're making changes to it.
Speaker:Right, a snapshot is a read only copy where you preserve that point in time,
Speaker:nothing's gonna change it, and it's always there for you to go back to, so you
Speaker:can pull your file, your data out of it.
Speaker:Clones, on the other hand, give you a copy of the volume at a point in time,
Speaker:but it's so you can use it for some purpose, like for testing out restore
Speaker:capabilities.
Speaker:Can I verify my backups?
Speaker:Those sort of things.
Speaker:And also doing, like, database recovery against that copy, the clone
Speaker:copy, and other things like that.
Speaker:So, clones are different than snapshots, even though they both
Speaker:might start from a snapshot copy.
Speaker:Right.
Speaker:Um, yeah, I, I would probably just call it a read write snapshot, but
Speaker:maybe that's a contradiction in terms.
Speaker:Um,
Speaker:Yes, a snapshot is a point in time, Curtis.
Speaker:yeah, exactly.
Speaker:There are three ways that snapshots are created.
Speaker:The most common way, would you say it's still the most common
Speaker:way, the copy on write method?
Speaker:Uh, no, I don't think, I don't think so anymore.
Speaker:Okay.
Speaker:Well, historically what used to be the most common method before
Speaker:one vendor ruined it for everybody
Speaker:Made something
Speaker:is called the copy on write method.
Speaker:And the reason it's called the copy on write method is that we create
Speaker:this storage area that is going to hold the, um, the snapshot blocks.
Speaker:And when we go to update a block, because it's a storage volume, right,
Speaker:we say, Hey, uh, there's
Speaker:a snapshot for this block.
Speaker:We're going to copy that block out to the snapshot area before you write.
Speaker:So that's why it's called copy on write.
Speaker:And that is very expensive because, if you think about it, let me count,
Speaker:there is a read
Speaker:and a write for every write.
Speaker:And it's not just the one read and write that happens on the snapshot side.
Speaker:You also have a bunch of metadata and updating indirect blocks and
Speaker:a whole bunch of other things.
Speaker:So, yeah, doing a copy on write might lead to, say, 10 or 12 additional I/O
Speaker:operations
Speaker:Yeah.
Speaker:for that one block.
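The copy on write mechanism just described can be modeled in a short Python sketch. This is a toy under stated assumptions, not how any particular array implements it: `CowVolume` is a hypothetical class, each snapshot gets its own save area, and the `extra_io` counter tracks only the one extra read and one extra write per preserved block, ignoring the metadata and indirect-block updates mentioned above.

```python
class CowVolume:
    """Toy copy-on-write volume (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # primary volume, always updated in place
        self.snapshots = []         # one dict per snapshot: block_no -> old data
        self.extra_io = 0           # extra I/Os caused by preserving old blocks

    def snapshot(self):
        # Creating a snapshot copies nothing; the save area starts empty.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_no, data):
        for saved in self.snapshots:
            if block_no not in saved:
                # Copy on write: read the old block, write it to the
                # snapshot area, and only then overwrite it in place.
                saved[block_no] = self.blocks[block_no]
                self.extra_io += 2  # one extra read plus one extra write
        self.blocks[block_no] = data

    def read_snapshot(self, snap_id, block_no):
        saved = self.snapshots[snap_id]
        if block_no in saved:
            return saved[block_no]      # changed since the snapshot
        return self.blocks[block_no]    # unchanged: served from primary


vol = CowVolume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")
vol.write(1, "B2")  # block already preserved, so no second copy

print("live block 1:", vol.blocks[1])
print("snapshot block 1:", vol.read_snapshot(snap, 1))
print("snapshot block 0 (from primary):", vol.read_snapshot(snap, 0))
print("extra I/Os:", vol.extra_io)
```

Note that in this model the copy happens only the first time a block is overwritten after a snapshot, but the check for "does a snapshot still need this block" runs on every write.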
Speaker:And so what happens over time, remember
Speaker:when I initially mentioned that the
Speaker:bulk of the data is going to come from
Speaker:the primary volume? Over time,
Speaker:after a snapshot has been created, more and more blocks are going to be copied into
Speaker:that snapshot area, which means that at some point, you know, if I'm
Speaker:reading my snapshot, a significant portion is going to come from the snapshot area.
Speaker:And so there are multiple reasons that there's a performance hit.
Speaker:I think the biggest performance hit is,
Speaker:you know, all of the I/Os that have to happen every time
Speaker:we do a copy on write.
Speaker:The other is that as time goes on, the more data that I have to get
Speaker:from my snapshot area when I'm doing a read, the more the performance goes down.
Speaker:And this goes back to that vendor: they basically suggested that if we had
Speaker:90 days of user browsable snapshots, their performance was going to be
Speaker:like half of what, uh, what it typically
Speaker:And I would say that is probably based on older technology, Curtis.
Speaker:I think when you had traditional RAID arrays or RAID groups
Speaker:that you were creating,
Speaker:there was more of a performance impact. I think now, since you end
Speaker:up aggregating a bunch of disks and then carving out volumes, and so
Speaker:you can share in the performance,
Speaker:I think that's not as big of a concern anymore as it used to be.
Speaker:Like, I know those systems you're talking about now support
Speaker:a thousand snapshots, right, for
Speaker:So you think that the main hit from a performance standpoint in a copy
Speaker:on write snapshot scenario in modern technology is mainly that I/O hit
Speaker:when you go to update a block.
Speaker:And by the way, remember that
Speaker:every time it does a write,
Speaker:it has to calculate, is there a snapshot that is looking at this
Speaker:block as it exists at this point in time, and then it has to copy it,
Speaker:uh, for that snapshot.
Speaker:Which, there are different mechanisms you could use.
Speaker:You could use bitmaps, you could use other things.
Speaker:So, it's not the end of the world, and I think that a lot of these storage vendors
Speaker:have optimized if they are continuing to use copy on write technologies.
Speaker:But I would say a good chunk of them have moved away from
Speaker:copy on write because of the I/O
Speaker:penalties that we've talked about.
Speaker:right, right.
Speaker:So there was this vendor that came out, a little vendor
Speaker:that at one point used to be called Network Appliance.
Speaker:And it had a little screw and bolt as its logo.
Speaker:right.
Speaker:Um, at one point they said, we're just going to change our
Speaker:name to NetApp because that's what
Speaker:That's what everyone calls them.
Speaker:I remember actually working at a startup and got my hands on my
Speaker:first NetApp appliance and was like, yeah, wow, these are kind of cool.
Speaker:And this was before I started working there.
Speaker:And I was like, wow, this is really amazing and simple
Speaker:for what it does and easy to
Speaker:Yeah, exactly.
Speaker:And they used a completely different way to, um, to do snapshots that at the time
Speaker:was revolutionary, which I think has now been adopted by a lot of storage vendors.
Speaker:And that is, we call it redirect on write.
Speaker:Do you want to describe how that
Speaker:works?
Speaker:So before you get there,
Speaker:I think we need to talk about their Write Anywhere File Layout,
Speaker:which is what then allows the snapshots.
Speaker:Yep, WAFL. A lot of other vendors, I think, do something similar as well these
Speaker:days. But what it is: going back to the copy on write example, if you are writing
Speaker:a block of data in a copy on write sort of file system, previous file systems,
Speaker:you would always write to the same spot.
Speaker:And because you're always writing to the same spot, that's why you have to first
Speaker:copy out the data and then update it.
Speaker:With a write anywhere file layout, what you end up doing is, it doesn't
Speaker:matter which actual block you end up writing to, you basically construct the
Speaker:metadata tree to reference that block.
Speaker:Even though I might be updating block 100, block 100 might actually
Speaker:physically be at, like, block 1000.
Speaker:And because I have all my metadata that tells me exactly where that data
Speaker:exists, I just need to update the metadata to say, okay, if someone
Speaker:tries to access block 100, it's actually physically on block 1000.
Speaker:And so you can end up writing to any location in the file system
Speaker:and not having to worry about always hitting that same location.
Speaker:And so that's kind of WAFL in a nutshell.
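The write anywhere idea described above can be sketched in a few lines of Python. This is only an illustration of the logical-to-physical mapping, not NetApp's actual implementation; the class name `WriteAnywhereVolume`, its fields, and the starting physical block number are all hypothetical.

```python
class WriteAnywhereVolume:
    """Toy model: every write lands on a fresh physical block."""

    def __init__(self):
        self.physical = {}     # physical block number -> data
        self.block_map = {}    # logical block number -> physical block number
        self.next_free = 1000  # arbitrary starting point, for illustration only

    def write(self, logical, data):
        # Never overwrite in place: take the next free physical block.
        phys = self.next_free
        self.next_free += 1
        self.physical[phys] = data
        # Updating the metadata is what makes the new data "block 100".
        self.block_map[logical] = phys
        return phys

    def read(self, logical):
        # Follow the metadata pointer to wherever the data physically lives.
        return self.physical[self.block_map[logical]]


vol = WriteAnywhereVolume()
first = vol.write(100, b"version 1")
second = vol.write(100, b"version 2")
```

After the second write, logical block 100 points at a different physical block, and the first physical block still holds the old data until something reclaims it.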
Speaker:So basically what you have then is this metadata system that has a pointer
Speaker:to every block and it doesn't really matter where those blocks happen to be.
Speaker:And you can have thousands of these snapshots, right?
Speaker:These pointers at the top.
Speaker:right, so then when we go to do an update and we have a snapshot, it just
Speaker:means that what we're going to do is we're going to change the pointer.
Speaker:We're going to say, okay, this, there's this block that's sitting here.
Speaker:And we know we're not supposed to update that block because we
Speaker:have a snapshot that's relying on that, that's requiring that block.
Speaker:And then we're going to just redirect.
Speaker:We're going to, we're going to write a new block for the
Speaker:new version of that old block.
Speaker:And then we're going to change the pointer, right?
Speaker:We're going to redirect the pointer.
Speaker:To this new location of that block.
Speaker:Meanwhile, the old block is still sitting there and we've got a
Speaker:snapshot that's just pointing to it.
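A rough Python sketch of redirect on write as just described, assuming a snapshot is simply a frozen copy of the pointer table. The names (`RedirectOnWriteVolume`, `active`, `snapshots`) are made up for illustration, and real arrays share the metadata tree rather than copying a whole dictionary.

```python
class RedirectOnWriteVolume:
    """Toy model of redirect-on-write snapshots."""

    def __init__(self):
        self.physical = {}   # physical block number -> data
        self.active = {}     # live volume: logical block -> physical block
        self.snapshots = {}  # snapshot name -> frozen copy of the pointer table
        self.next_free = 0

    def write(self, logical, data):
        # Write the new version to a new block, then redirect the pointer.
        phys = self.next_free
        self.next_free += 1
        self.physical[phys] = data
        self.active[logical] = phys

    def snapshot(self, name):
        # Only pointers are captured; no data blocks are copied.
        self.snapshots[name] = dict(self.active)

    def read(self, logical, snapshot=None):
        table = self.active if snapshot is None else self.snapshots[snapshot]
        return self.physical[table[logical]]


vol = RedirectOnWriteVolume()
vol.write(7, "old contents")
vol.snapshot("before-change")
vol.write(7, "new contents")
```

The write path is identical whether or not a snapshot exists, which is why the write cost does not change as snapshots are added in this model.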
Speaker:Right.
Speaker:Um, and so you've got this infinite number of snapshots that are pointing
Speaker:to an infinite number of, well, it's not an infinite number of snapshots, but
Speaker:you have a very high number of snapshots that are, that, that, and they're all.
Speaker:Just a whole bunch of metadata pointing to a whole bunch of blocks that are
Speaker:just sitting all around the volume.
Speaker:And the one thing, so this all sounds amazing, right?
Speaker:Because you're like, Oh, writes are, writes are super fast.
Speaker:I don't have to worry about it.
Speaker:In fact, Curtis, just one correction.
Speaker:When you're going to actually write a new block, you never actually have to
Speaker:look up the old block because you're always writing to a new location.
Speaker:So you don't care if that old block is occupied by a snapshot or not, right?
Speaker:Because this is the downside though, of snapshots, especially with the
Speaker:write anywhere file layout is.
Speaker:You now need a process to go through and say, when a snapshot gets deleted,
Speaker:what blocks are no longer being used actively because they don't
Speaker:belong to snapshots and they're not currently part of the volume.
Speaker:So you have a garbage reclamation process or different vendors call it
Speaker:different things, but some process to go through and reclaim all of those
Speaker:free blocks so they can be reused.
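The reclamation pass described here can be sketched as a simple "find everything still referenced, free the rest" sweep. This is an assumption about the general shape of such a process, not any vendor's algorithm; the function name `reclaim_free_blocks` and the dictionary layout are hypothetical.

```python
def reclaim_free_blocks(physical, active_map, snapshot_maps):
    """Free physical blocks that neither the live volume nor any snapshot uses.

    physical:      dict of physical block number -> data
    active_map:    dict of logical block -> physical block for the live volume
    snapshot_maps: dict of snapshot name -> (logical block -> physical block)
    Returns the sorted list of physical block numbers that were freed.
    """
    referenced = set(active_map.values())
    for table in snapshot_maps.values():
        referenced.update(table.values())

    freed = sorted(set(physical) - referenced)
    for phys in freed:
        del physical[phys]  # the block can now be reused
    return freed


physical = {0: "v1", 1: "v2", 2: "v3"}
active_map = {100: 2}            # live volume points at the newest version
snapshot_maps = {"s1": {100: 1}}  # one snapshot still holds the middle version

freed_first = reclaim_free_blocks(physical, active_map, snapshot_maps)
del snapshot_maps["s1"]           # deleting the snapshot releases its block
freed_second = reclaim_free_blocks(physical, active_map, snapshot_maps)
```

Note that the write path never consults this logic; it runs separately, after the fact.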
Speaker:I think you said the same thing I said in different words, but
Speaker:yeah, I see what you're saying.
Speaker:I guess I was saying that, that it's making a decision on what to do.
Speaker:Based on whether or not, I think if I could just change my part
Speaker:of my answer, instead of, is this block being used by anything else?
Speaker:I guess is,
Speaker:Yeah, more
Speaker:I guess that's the question I'm asking.
Speaker:Yeah.
Speaker:Uh, and actually, I guess what you're saying is it actually doesn't even make
Speaker:that decision at that point in time.
Speaker:If it's going to modify a new block, if it's going to modify a
Speaker:block, it just writes a new block.
Speaker:And then what happens to that old block is a completely separate process.
Speaker:If there's, if there is a snapshot that's pointing to that, that
Speaker:old block, then it will stay.
Speaker:If there are no snapshots that are pointing to that block, then at some
Speaker:point the garbage collection process will come and make it go away.
Speaker:Is that a better, is that a better
Speaker:Yes, that is a better description.
Speaker:Now this is
Speaker:don't want to argue with a former NetApp employee
Speaker:about how NetApps
Speaker:now I should say this is based on NetApp's technology, different
Speaker:vendors may do different things, but for the most part, most of the
Speaker:vendors do something similar ish.
Speaker:Specifically, it's based on how NetApp worked when you worked
Speaker:there, which was a while ago.
Speaker:And things may be different now,
Speaker:that is also true.
Speaker:not.
Speaker:I mean, WAFL is like at the core of their...
Speaker:You know, the core of their technology.
Speaker:And so that's why, with redirect on write, you could essentially
Speaker:have an infinite number of snapshots with zero performance penalty.
Speaker:Basically, the performance
Speaker:penalty of doing a write update
Speaker:is the same whether you have a snapshot or you don't have a snapshot.
Speaker:The, the only penalty, if you want to call it that, is that, um, one, one
Speaker:big difference between this way and the other way is there is no snapshot area.
Speaker:Right.
Speaker:So in the other method, the snapshot area could fill up if
Speaker:you held snapshots for too long.
Speaker:In this configuration, there is no snapshot area.
Speaker:The snapshot area is the volume, and if you have too many updates and you
Speaker:keep too many snapshots, you would fill up the volume with snapshots.
Speaker:So you've gotta get rid of the older
Speaker:Well, and this is where That would apply if you're talking NetApp terminology
Speaker:traditional volumes, but most of it has moved over to virtual volumes, where once
Speaker:again, you have an aggregate, a shared pool of common data, and for each of the
Speaker:volumes, typically you also set a limit on how much space a snapshot can occupy.
Speaker:So you could say, I am allowing 20 percent for snapshots of my overall
Speaker:volume capacity, in which case it'll start
Speaker:And what happens when you hit that wall?
Speaker:I, I don't know what the current behavior is.
Speaker:Previously.
Speaker:I believe it would like let you like start automatically pruning
Speaker:snapshots and trying to free up space.
Speaker:Right, right.
Speaker:Because it, obviously, it's not going to, it's not going to prune, uh,
Speaker:production, you know, current data.
Speaker:So, yeah, I, it could, but again, we're, we're talking specifically NetApp,
Speaker:but something has to happen, right?
Speaker:If you're using this method.
Speaker:Something has to happen.
Speaker:Either we have to stop creating new snapshots, right?
Speaker:Or stop updating the snapshots that we have.
Speaker:And, uh, we need to delete older snapshots or we need to maybe delete, you know,
Speaker:certain ones in the middle, right?
Speaker:Basically you've got to do some kind of pruning or else you're going to
Speaker:Yeah, the other challenge is also figuring out what snapshot to delete
Speaker:because blocks are being shared, right?
Speaker:You might be like, hey, this snapshot is huge and you go delete it, but
Speaker:because those blocks are being shared by other snapshots, you're not actually
Speaker:going to free any space, right?
Speaker:So you need to be able to figure out like which snapshot actually
Speaker:contains unique blocks that if I delete it will actually save me space.
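To make the shared-block problem concrete, here is a small Python sketch that works out which blocks deleting one snapshot would actually free. It assumes the same toy pointer-table model as a plain dictionary per snapshot; the function name `blocks_freed_by_deleting` and the sample data are invented for illustration.

```python
def blocks_freed_by_deleting(name, active_map, snapshot_maps):
    """Return the physical blocks referenced ONLY by snapshot `name`."""
    still_needed = set(active_map.values())
    for other_name, table in snapshot_maps.items():
        if other_name != name:
            still_needed.update(table.values())
    return set(snapshot_maps[name].values()) - still_needed


active_map = {1: 10, 2: 21}
snapshot_maps = {
    "mon": {1: 10, 2: 20},
    "tue": {1: 10, 2: 20},  # shares every block with "mon"
    "wed": {1: 10, 2: 22},  # block 22 exists only in this snapshot
}

freed_by_mon = blocks_freed_by_deleting("mon", active_map, snapshot_maps)
freed_by_wed = blocks_freed_by_deleting("wed", active_map, snapshot_maps)
```

In this example, deleting "mon" frees nothing because "tue" still references the same blocks, while deleting "wed" frees exactly one block.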
Speaker:complicated.
Speaker:Storage management.
Speaker:I
Speaker:I don't miss production storage management.
Speaker:Any other final thoughts on redirect on write?
Speaker:think that covers it
Speaker:I mean, My personal, if you're going to do snapshots on a storage array, I
Speaker:think redirect on write is the way to go.
Speaker:It sounds like what you're saying, they've made copy on write better,
Speaker:but I still think redirect on write is just significantly better.
Speaker:So, um, but it might be more complicated, then, if you're coding it.
Speaker:Right.
Speaker:So the next one is what I'm going to call the dumbest of all snapshot methods.
Speaker:That's not what I have in the book.
Speaker:I gave it a much nicer name in the book.
Speaker:And guess who does this method?
Speaker:The leading hypervisor company in the world.
Speaker:Yes.
Speaker:I think that's a fair statement, right?
Speaker:company in the world.
Speaker:I think, I think it still is.
Speaker:Yeah.
Speaker:And that would be VMware.
Speaker:So the way VMware does snapshots is just literally the dumbest
Speaker:implementation of snapshots that I've ever seen and I don't know how they
Speaker:haven't addressed it, but here it is.
Speaker:When you create a snapshot in VMware, it literally holds all the writes.
Speaker:Now, by the way, if I'm wrong, by the way, you know, Broadcom, don't sue me.
Speaker:This is based, this is based on my understanding of VMware snapshots.
Speaker:Uh, you know, I've, I've checked every once in a while and they,
Speaker:no one seems bothered by this.
Speaker:Uh, but if this has changed, any of you that are, you know, if anybody
Speaker:works for Broadcom slash VMware, then, you know, feel free to update
Speaker:me and I will update this episode.
Speaker:And I'll just delete this section.
Speaker:But here's the way it works.
Speaker:When you create a single snapshot on a VMware volume, it halts
Speaker:all writes on the current volume.
Speaker:And then it keeps all writes in a snapshot area.
Speaker:And then when you delete that snapshot, it replays all those
Speaker:writes against the production volume.
Speaker:And this is why when you make a snapshot.
Speaker:And then you, if you hold that snapshot for a long time and then
Speaker:you delete that snapshot, this is why it has a big performance hit
Speaker:against the production volume.
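The behavior described here can be modeled in a few lines of Python. This is a sketch of the speaker's description only (hold writes in a side area, replay them on delete), not VMware's actual code or on-disk format; the class `HoldAllWritesDisk` and its methods are hypothetical.

```python
class HoldAllWritesDisk:
    """Toy model of a hold-all-writes snapshot."""

    def __init__(self, blocks):
        self.base = dict(blocks)  # the production volume
        self.delta = None         # held writes; None means no snapshot exists

    def create_snapshot(self):
        # From here on the base is frozen and writes pile up in the delta.
        self.delta = {}

    def write(self, block, data):
        if self.delta is not None:
            self.delta[block] = data
        else:
            self.base[block] = data

    def read(self, block):
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base[block]

    def delete_snapshot(self):
        # Replay every held write against the base. The work done here
        # grows with how much changed while the snapshot was held.
        if self.delta is None:
            return 0
        replayed = 0
        for block, data in self.delta.items():
            self.base[block] = data
            replayed += 1
        self.delta = None
        return replayed


disk = HoldAllWritesDisk({0: "a", 1: "b"})
disk.create_snapshot()
disk.write(0, "A")
disk.write(1, "B")
disk.write(0, "AA")
base_while_held = dict(disk.base)
latest_read = disk.read(0)
replayed = disk.delete_snapshot()
```

The longer the snapshot is held, the larger the delta gets, and all of that replay I/O lands on the production volume at delete time.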
Speaker:But no one
Speaker:this is
Speaker:snapshot of a VM
Speaker:yeah, this is why you do not do this.
Speaker:You don't use VMware level snapshots the way you do any other
Speaker:snapshots, because, and by the way, I used VMware for years before knowing this.
Speaker:That's why I want to make sure I mentioned it.
Speaker:And, and that is that if you create a snapshot and then hold it for a
Speaker:long period of time, you're going to get hit with a massive IO hit
Speaker:when you delete that snapshot.
Speaker:So if you're using VMware, VMware level snapshots, then you use them the way we
Speaker:talked about earlier, where you create a snapshot, you make a, you make a backup.
Speaker:And then you delete the snapshot.
Speaker:Maybe you take a VMware level snapshot, and then you take a storage level
Speaker:snapshot of that snapshot, and then you delete the, the VMware level snapshot.
Speaker:You should, if this is the way your snapshot system works, you
Speaker:cannot leave the snapshots around for any significant period of time.
Speaker:I was going to chime in.
Speaker:Thank you for covering that.
Speaker:This specifically is VMware software snapshots,
Speaker:if you wanna call it that.
Speaker:Right, that are only done at the VMware level.
Speaker:Now, there are integrations that various storage vendors offer
Speaker:by plugging into the VMware API.
Speaker:So whenever you trigger a VMware snapshot, it actually triggers
Speaker:a storage level snapshot.
Speaker:So avoiding some of these issues, but not everyone is aware of it.
Speaker:Not everyone is using a third party storage array that integrates with VMware.
Speaker:So.
Speaker:Just
Speaker:Yeah.
Speaker:So, right.
Speaker:Thanks for, thanks for clarifying that.
Speaker:This is specifically VMware level snapshots that are done by VMware.
Speaker:And without any third party storage.
Speaker:Yeah.
Speaker:And I don't know why VMware did this, but it's bonkers.
Speaker:It's just literally one of the weirdest
Speaker:coded things I've ever heard.
Speaker:Why would you do it that way?
Speaker:Somewhere in a meeting, this is how they decided to
Speaker:it.
Speaker:was probably easier
Speaker:and yeah, yeah, maybe it was easier,
Speaker:and they didn't talk to the
Speaker:I wonder about that.
Speaker:They didn't, they exactly, they did not talk to the backup folks.
Speaker:Well, uh, I think we have, uh, summarized the world of snapshots.
Speaker:That?
Speaker:No, I think we did a good job with that.
Speaker:So, copy on write, redirect on write, dumbest method ever.
Speaker:Those are the three types.
Speaker:I've got it officially in the book, uh, I've got this labeled
Speaker:as the hold all writes method.
Speaker:Uh, I should really just change that to the dumbest method ever.
Speaker:But, um, yeah.
Speaker:So, you know, snapshots are a great tool.
Speaker:In the backup and recovery arsenal.
Speaker:They are the great sort of basis upon which we're going to talk about one
Speaker:of my favorite ways to do backup.
Speaker:And we're going to talk about that in another episode.
Speaker:Hint, it's called near CDP, not CDP.
Speaker:It's called near CDP.
Speaker:And, uh, it's just, just the number one thing you have to understand
Speaker:about snapshots is that unless you have copied this snapshot to
Speaker:another location via some mechanism.
Speaker:Which could be backup.
Speaker:It could be replication of the volume.
Speaker:It could be a number of things.
Speaker:You do not have a backup.
Speaker:You have a picture of your volume.
Speaker:And that picture of your volume is as worthless as a picture of your
Speaker:house after your house burns down.
Speaker:It'll just be a nice memory and, uh, and a really bad day.
Speaker:So it's just, that's the really, the most important thing to
Speaker:understand about snapshots.
Speaker:And now, if this is the first time snapshots have been explained to you,
Speaker:now you understand why I don't like it that they call
Speaker:what AWS does snapshots, because that very much does not meet
Speaker:the definition that we just had.
Speaker:And I'm glad I brought this up because it's important: what we're talking
Speaker:about is storage level snapshots.
Speaker:Darn it.
Speaker:I don't know
Speaker:You can't call it that, yeah, because
Speaker:this.
Speaker:Yeah, these are traditional snapshots.
Speaker:There are other things out there that people call snapshots
Speaker:that don't work like this.
Speaker:AWS snapshots don't work like this.
Speaker:Uh, they are an actual image copy.
Speaker:It's actually, they actually, when you make an AWS snapshot, it actually copies
Speaker:that, that point in time out to another area of storage, which happens to be S3,
Speaker:you?
Speaker:And I think specifically you're talking about an AWS EBS snapshot
Speaker:thank you.
Speaker:I am talking about an AWS EBS snapshot.
Speaker:Um, my former employer, Druva, they call what they do snapshots.
Speaker:They call their backups snapshots.
Speaker:I never liked that, but you know, nobody asked me.
Speaker:Uh, so, but what we're talking about here is traditional snapshots.
Speaker:And, um, a lot of other people will call what they do a snapshot.
Speaker:Um, the problem is like a lot of terms in the, in the backup world.
Speaker:It's a term like so many of our terms are, um, they're words that are used
Speaker:just, they're just English words that are used in so many different contexts.
Speaker:Like when we had the CDP episode, we couldn't figure out what to call
Speaker:those points in time, because a lot of the CDP vendors called them snapshots.
Speaker:Yeah, exactly.
Speaker:Yeah, exactly.
Speaker:All right.
Speaker:Well, uh, I guess the only thing left for me to say is that's a wrap