You've found The Backup Wrap-up, your go-to podcast for all things
Speaker:backup, recovery, and cyber recovery.
Speaker:In this episode, we take a look at the use of artificial intelligence in backup.
Speaker:Can AI make your backup environment actually better?
Speaker:Prasanna Malaiyandi and I discuss AI and how it can help from
Speaker:possibly everything from scheduling backups to detecting ransomware.
Speaker:We talk about using it for deduplication, for capacity planning,
Speaker:and even helping you to write better disaster recovery plans.
Speaker:It's time to talk about AI and backups.
Speaker:Hope you enjoy it.
Speaker:By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.
Speaker:Backup, and I've been passionate about backup and recovery for over 30 years,
Speaker:ever since I had to tell my boss that we had no backups of that really
Speaker:important database that we had just lost.
Speaker:I don't want that to happen to you, and that's why I do this podcast.
Speaker:On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.
Speaker:This is The Backup Wrap-up.
Speaker:Welcome to the show.
Speaker:Hi, I'm W. Curtis Preston, AKA Mr. Backup, and I have with me a guy who apparently
Speaker:doesn't know how to hold a coffee cup.
Speaker:Prasanna Malaiyandi, how's it going?
Speaker:Prasanna
Speaker:I am good, Curtis.
Speaker:So I think we need to clarify a
Speaker:are you defending yourself?
Speaker:Are you gonna try to defend your weirdness?
Speaker:I think we have to talk about multiple things.
Speaker:First.
Speaker:In India, they don't typically use like a mug.
Speaker:They use like a stainless steel cup, right?
Speaker:So if
Speaker:you're drinking hot beverages, you can only hold it from like the very,
Speaker:you saw it when we went to the Indian
Speaker:restaurant in San Diego,
Speaker:yeah, yeah.
Speaker:you have to hold it from the very top, otherwise you'll burn your hand.
Speaker:Right.
Speaker:And then most mugs, it just feels weird.
Speaker:Like I got, I got chunky fingers, like sausages, right?
Speaker:And so like putting it inside the mug, like the handle part of the mug.
Speaker:I feel like, especially if it's like a curve, not like a straight, I feel like
Speaker:there's not enough stability there.
Speaker:That's fascinating.
Speaker:So for people watching the video, who, by the
Speaker:way, we do publish a video on YouTube if you want to see our
Speaker:glorious faces and our expressions.
Speaker:But yeah, so when I hold a mug, I don't hold it like this through the
Speaker:handle.
Speaker:I basically grab it either from the top or I hold it like on the side.
Speaker:And then of course, the pinky's kind
Speaker:The pinky,
Speaker:the bottom.
Speaker:But what's weird though is the pinky supporting the bottom thing.
Speaker:I know you've complained to me many times, but that's also how I hold my phone.
Speaker:you end up covering your microphone.
Speaker:I always hold the phone and
Speaker:then my pinky's kind of on the bottom, and so it always blocks the microphone.
Speaker:So Curtis is always
Speaker:like, were you underwater?
Speaker:Did you swallow your phone?
Speaker:What's going on?
Speaker:So regarding your defense from, you know, how they hold, do things in India.
Speaker:What part of India were you born in?
Speaker:Uh, just remind
Speaker:Yeah, I was, uh, born in not India, but,
Speaker:but at home.
Speaker:Right.
Speaker:But yeah,
Speaker:you were, you were raised by people born in
Speaker:India, and so you were, you were taught,
Speaker:yeah.
Speaker:And so actually I prefer, so even drinking water.
Speaker:I don't drink from a glass cup.
Speaker:I drink from a stainless steel cup.
Speaker:Right.
Speaker:Which is, if you haven't spent any time around, you know,
Speaker:Indians, you wouldn't know that.
Speaker:It's just that you use a lot, you use stainless steel for cups, for plates,
Speaker:right.
Speaker:As Curtis knows, when I'm loading the dishwasher,
Speaker:he's like, what is that racket?
Speaker:What is happening over there?
Speaker:Because everything's so noisy.
Speaker:They last longer and you don't have to worry about them breaking.
Speaker:That's, you know, I can't, I can't complain.
Speaker:Yeah.
Speaker:Uh, but yeah, I don't get the whole knot, you know?
Speaker:Here I am with four fingers in my mug.
Speaker:I'm just saying.
Speaker:Okay, so now what if that mug was smaller and the handle was curved,
Speaker:Well, then that's like a, that's like a girly mug and then,
Speaker:then you use two fingers like
Speaker:this.
Speaker:feel like it gives you enough stability?
Speaker:And yet I've never dropped a mug.
Speaker:I'm
Speaker:just saying.
Speaker:It's not from dropping the mug.
Speaker:It's from like when you, yeah.
Speaker:See when you're drinking it, it just feels like it's a little like,
Speaker:Yeah.
Speaker:Um,
Speaker:all over you.
Speaker:I just think you don't know how to hold a mug, but.
Speaker:Our listeners are probably like, what are these people talking about?
Speaker:By the way, this is a new format starting in the new year.
Speaker:We are now gonna just be talking about coffee and all the crazy
Speaker:things that Prasanna does.
Speaker:Yeah, absolutely.
Speaker:Um, or maybe we might actually talk about some stuff.
Speaker:So I thought, um, you know, we've been seeing, uh, AI on the news a lot,
Speaker:right?
Speaker:AI? I've never heard about it.
Speaker:Yeah, I've never, never heard of it.
Speaker:Yeah.
Speaker:So artificial intelligence, and if, if you've been following the backup
Speaker:industry much, you probably saw a few announcements from your, uh, backup
Speaker:company, or maybe backup companies you're interested in, about the use of AI
Speaker:within backup.
Speaker:And so I thought we'd talk about that a little bit,
Speaker:um, in this episode, and
Speaker:whether or not it has a use, right?
Speaker:And just to clarify, I think when a lot of these backup vendors launched AI,
Speaker:they were using AI not for the core product, right?
Speaker:So they were using AI for their support agent, or to help answer questions, right?
Speaker:Which I think we all understand, we all know about, but I think in this
Speaker:episode, I think we should focus on like the core part of backup.
Speaker:Yeah.
Speaker:So let's talk just a little bit about, you know,
Speaker:what we mean when we say AI.
Speaker:There are different categories of AI, and then there's also machine learning, which
Speaker:is very closely related. And honestly,
Speaker:you know, I think I could describe the difference between machine
Speaker:learning and AI, but then there's something that
Speaker:changes, you know, that messes me up when we talk about that.
Speaker:Um, I'll just, for those of you that actually really know what AI
Speaker:is and machine learning is, you're gonna be offended by something
Speaker:I say during this episode.
Speaker:I, I'll just tell you that.
Speaker:But we're gonna use the terms almost interchangeably, but they're not.
Speaker:Uh, but I do want to distinguish
Speaker:what is referred to as generative AI, right?
Speaker:Which is, you know, a large language model that is
Speaker:going to create things.
Speaker:It's not ex nihilo, right?
Speaker:It's not from nothing.
Speaker:It has to have been trained on a large data set.
Speaker:But those are the kinds of things that they're using,
Speaker:like you talked about,
Speaker:for support
Speaker:models, right?
Speaker:And, And,
Speaker:just as examples of large language models, you might've heard about
Speaker:Meta's Llama, Llama 3, Llama 4; there's ChatGPT, or OpenAI's...
Speaker:What is it?
Speaker:OPT?
Speaker:What,
Speaker:the, the, actual model.
Speaker:the
Speaker:underlying model.
Speaker:Oh, okay.
Speaker:I would've just said ChatGPT,
Speaker:'cause everybody knows what ChatGPT
Speaker:is, right?
Speaker:I mean, you've got Copilot, you've
Speaker:got... Yeah, so you've got Claude from Anthropic.
Speaker:Um, a lot of people, you know, confuse the company with the product.
Speaker:But, um, these are the, these are the ones that are grabbing
Speaker:all the headlines, right?
Speaker:They're also writing large bodies of text.
Speaker:They're helping people to write books.
Speaker:They're helping people to do art.
Speaker:That, and there's a lot of, um.
Speaker:A lot of legal discussions around that, around the use of things like
Speaker:the books that I've written as, um, you know, feeding into that and, um,
Speaker:the, we're not talking about that,
Speaker:right?
Speaker:Um, we're not gonna talk about, hey, ChatGPT,
Speaker:my restore didn't work.
Speaker:Can you recreate all my documents?
Speaker:Um, it's not,
Speaker:there's not gonna be anything like that, at least not yet.
Speaker:Um, the, um, we're gonna talk about how AI can be used to basically
Speaker:enhance the core functionality.
Speaker:I mean, you said this in way fewer words a few minutes ago,
Speaker:but, uh, basically how it could be used to make backups better.
Speaker:And I think a good chunk of this is really, like you said, more
Speaker:around machine learning models,
Speaker:right,
Speaker:right,
Speaker:large language models.
Speaker:right.
Speaker:So in the first section, we'll just talk about how, potentially...
Speaker:this is just sort of thinking out loud.
Speaker:I know that we have a lot of vendors that listen to the podcast.
Speaker:We are
Speaker:technically aimed at the people who actually use backup and
Speaker:recovery, but I know a lot of vendors listen to the podcast, so feel free to
Speaker:take this episode and run with it and
Speaker:do stuff.
Speaker:So I guess the first question would be, do we think that, uh, machine learning
Speaker:can be used to help just improve the efficiency of the backup process itself?
Speaker:What do you think about
Speaker:Oh, a thousand percent.
Speaker:A billion percent, Curtis.
Speaker:So I've never actually had to implement a backup system.
Speaker:But you've done
Speaker:tons of this, right?
Speaker:And how do you go about just planning your backup, right?
Speaker:How to back up an infrastructure, right?
Speaker:It's like, just walk us through that, right?
Speaker:And how many spreadsheets and all the rest that you have in
Speaker:order to try to optimize these.
Speaker:Yeah, I think about that a lot.
Speaker:And the answer is gonna depend greatly on the
Speaker:product that you're using, right?
Speaker:You know, I can think of...
Speaker:The traditional way is that you're going to create some kind of schedule, some
Speaker:kind of, uh, automatic backup schedule.
Speaker:Um, and you're going to do a, again, traditionally we'll
Speaker:do three categories here.
Speaker:Traditionally you've got some full backups and you're gonna do some
Speaker:full backups every once in a while.
Speaker:Um, and if you had to do full backups, I was
Speaker:always a proponent of doing those
Speaker:no
Speaker:more often than once a month.
Speaker:Um, back in the days of tape, it was once a week, because
Speaker:doing them less often
Speaker:complicated the restore process.
Speaker:Yeah.
Speaker:But, um, you know, doing it no more often than once a month, but depending on your
Speaker:backup product, you might be able to, to
Speaker:spread that out even over like three months.
Speaker:And then you also want to schedule, if your backup product
Speaker:is capable of doing it, you wanna schedule a cumulative incremental.
Speaker:A differential, some products call it.
Speaker:Um, and then of course the daily incremental.
Speaker:Right.
Speaker:So spreading
Speaker:that all
Speaker:for one application you're talking about,
Speaker:Exactly.
Speaker:You're doing this per application, per server.
Speaker:Um, and, and you're trying to load balance things out because if you've
Speaker:properly designed your system, it's probably not capable of doing a full
Speaker:backup of your environment in one night.
Speaker:Right.
Speaker:Um, because that would just be really expensive, and then the rest of the
Speaker:time it would go completely unused.
Speaker:Right?
Speaker:Um, so you size it so that it's big
Speaker:enough to do a full backup over time.
Speaker:And, um, you're right that scheduling that out is problematic, right?
Speaker:Um, and you definitely could use, um, AI
Speaker:or ML to do that.
Speaker:And even for the scheduling aspect.
Speaker:So we talked about the applications, and then you were talking about sort
Speaker:of that infrastructure piece, which is shared and you now have to worry
Speaker:about it across all of these things.
Speaker:And I'm sure you had these bonkers spreadsheets that you
Speaker:were creating, trying to do this.
Speaker:Did it stretch all the way to the moon and back, by the way?
Speaker:Well, you know me. For me, it wasn't even a spreadsheet, it was just, uh, a
Speaker:script.
Speaker:Right.
Speaker:I would, I would just script all this nonsense.
Speaker:Right?
Speaker:Um, but the bigger the environment, the more
Speaker:doing it programmatically made sense, right?
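The load-balancing problem described here (spreading full backups out so no single night is overloaded) can be sketched in a few lines. This is only an illustration, not how any backup product does it: the server names and sizes are hypothetical, and the approach is a simple greedy heuristic that places the largest full first on whichever night currently has the least data scheduled.

```python
import heapq

def spread_fulls(full_sizes_gb, nights):
    """Assign each server's full backup to a night, largest first,
    always choosing the night with the least data scheduled so far."""
    # Min-heap of (scheduled_gb, night_index): the lightest night pops first.
    heap = [(0, night) for night in range(nights)]
    heapq.heapify(heap)
    schedule = {}
    for server, size in sorted(full_sizes_gb.items(), key=lambda kv: -kv[1]):
        load, night = heapq.heappop(heap)
        schedule[server] = night
        heapq.heappush(heap, (load + size, night))
    return schedule

def nightly_load(schedule, full_sizes_gb, nights):
    """Total GB of full backups scheduled on each night."""
    load = [0] * nights
    for server, night in schedule.items():
        load[night] += full_sizes_gb[server]
    return load

# Hypothetical full-backup sizes in GB.
sizes = {"db01": 900, "db02": 700, "fs01": 400, "fs02": 300, "web01": 100}
plan = spread_fulls(sizes, nights=3)
print(plan)
print(nightly_load(plan, sizes, nights=3))
```

A real scheduler would also have to account for backup windows, throughput per server, and incrementals, which is where a learned model could plausibly do better than a static script.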
Speaker:Um, and, and by the way, even if you have a more modern backup tool
Speaker:that does incremental forever, there are many applications that
Speaker:won't, that won't let you do
Speaker:that.
Speaker:Right?
Speaker:I think of like database backups still need to be done every, you know, a full
Speaker:backup every so often, and you have to schedule these out,
Speaker:And that's the
Speaker:second category.
Speaker:'cause I know you talked about three categories.
Speaker:Yeah.
Speaker:Oh yeah.
Speaker:Oh, well the three categories were, yes.
Speaker:Uh, thank you.
Speaker:I'm glad I have you here sometimes, you know.
Speaker:Yeah.
Speaker:So you have the, the, the old school full and incremental,
Speaker:which old school is still current
Speaker:school.
Speaker:If we're talking about regular apps, then there's the forever incremental type.
Speaker:Um, and you don't, you, you do have to worry about scheduling those,
Speaker:but generally you just sort of tell 'em all to start at once and then
Speaker:they queue and then it is not, it's, it's a lot simpler to do those.
Speaker:But then the final category are ones that actually, um, and I
Speaker:think the one that probably stands out the most here would be Rubrik,
Speaker:right?
Speaker:Rubrik doesn't let you schedule, um, that
Speaker:stuff.
Speaker:You tell it what your RTO
Speaker:is and your RPO, and it just does the backups.
Speaker:I mean, in fact, there are people that complain that you cannot, at least
Speaker:last time I checked, you could not do
Speaker:a manually scheduled backup if you wanted to tell it when to do stuff.
Speaker:Um, I, I think this is probably the first use of some sort of machine learning
Speaker:or artificial intelligence that I can think of with regards to scheduling.
Speaker:Which, which I was also gonna chime in.
Speaker:So the first two methods you talked about, right?
Speaker:You're kind of statically doing this upfront, setting the schedules and
Speaker:hoping that forever that it will be good,
Speaker:Right.
Speaker:You'll always be able to meet it, but say that there's an additional load or a
Speaker:server goes down or something else, right.
Speaker:There's no way to fine tune and adjust that,
Speaker:Well, well, I, Well, there, I mean, there is, but there's
Speaker:no way to automatically fine
Speaker:tune and Yeah.
Speaker:Yeah.
Speaker:Right.
Speaker:And so you're just like, okay, maybe it'll fail a couple times
Speaker:and then I'll adjust the policies and then I'll be fine, but Right.
Speaker:Versus something like an SLA-based approach, which I actually have
Speaker:looked at Rubrik's in the past,
Speaker:and I find that very enticing because really in the end, you
Speaker:care about what your RPO and RTO,
Speaker:Yeah.
Speaker:No one cares if you can back up.
Speaker:They only care if you can restore.
Speaker:the problem though is it's such a big paradigm shift for a lot of backup admins
Speaker:that it's very difficult to understand because it's like when people move
Speaker:from on-premises to the cloud and they were concerned because they're like,
Speaker:I can't touch and feel my equipment.
Speaker:Right.
Speaker:It's not something I could actually do.
Speaker:I think that's the same challenge you get when you move
Speaker:from sort of, uh, schedule-based backups to sort of SLA-based backups.
Speaker:Yeah, I liked the idea a lot.
Speaker:I still, again, you know, if I was running Rubrik,
Speaker:I would give people the ability to do a manual backup if they
Speaker:wanted to.
Speaker:But, but I do really like the idea of SLA driven backups,
Speaker:because I like the idea of SLAs.
Speaker:You know, we've talked about SLAs on here, and I like the idea of
Speaker:knowing the backups were being done often enough to meet my SLAs.
Speaker:I
Speaker:really liked that idea.
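To make the SLA-driven idea concrete, here is a minimal sketch. It is not Rubrik's implementation, and all names and numbers are hypothetical: instead of a fixed schedule, it compares the age of each workload's most recent backup against that workload's RPO and lists the ones that need a backup soon, most urgent first.

```python
from datetime import datetime, timedelta

def backups_due(last_backup, rpo, now, safety=0.8):
    """Return workloads whose latest backup has consumed at least
    `safety` of the RPO window, most urgent first."""
    due = []
    for name, finished_at in last_backup.items():
        consumed = (now - finished_at) / rpo[name]  # timedelta / timedelta is a float
        if consumed >= safety:
            due.append((consumed, name))
    return [name for consumed, name in sorted(due, reverse=True)]

now = datetime(2025, 1, 15, 12, 0)
# Hypothetical workloads: when each last completed a backup, and its RPO.
last_backup = {
    "orders-db": now - timedelta(hours=3.5),
    "file-share": now - timedelta(hours=20),
    "wiki": now - timedelta(hours=2),
}
rpo = {
    "orders-db": timedelta(hours=4),
    "file-share": timedelta(hours=24),
    "wiki": timedelta(hours=24),
}
print(backups_due(last_backup, rpo, now))
```

The point of the design is that the admin states the outcome (the RPO) and the system decides when to run, so adding a workload means adding one entry rather than reworking a schedule.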
Speaker:The one thing I think that is useful with these sort of approaches is
Speaker:we've talked about the fact that, like, your environment doesn't stay static.
Speaker:Right.
Speaker:So as you're adding new workloads, as things are changing, you don't
Speaker:want to have to go recompute your entire spreadsheet or your
Speaker:script every single time.
Speaker:So it's nice to have sort of these models that can automatically help fine tune and
Speaker:optimize so you're not wasting your time because it's more than likely that you're
Speaker:not gonna get it right the first time if you manually try to reset some of these
Speaker:things.
Speaker:And so having this automatic thing that constantly is
Speaker:adjusting just seems amazing.
Speaker:Yeah, it does.
Speaker:And I, and outside of Rubrik, I'm not aware of any tools that do that.
Speaker:Uh, but I, I think that this could certainly be a way where
Speaker:they could use AI to do that.
Speaker:And I was thinking about, again, going back to it, it's been a
Speaker:while since I've had to do this in a production environment, but
Speaker:the first thing that you have to find out is how big is everything, right?
Speaker:How big is, is everything from a database perspective and
Speaker:how, how long does it take?
Speaker:'cause there's all these different, and that's the thing that nobody knows.
Speaker:Right.
Speaker:How big is your, how big is your data center?
Speaker:And they're like, I don't know.
Speaker:I don't know.
Speaker:And so like, you have to do a full backup first
Speaker:before you have any idea.
Speaker:And not every server backs up at the same speed and all these different things.
Speaker:So yeah, it it is a
Speaker:complicated
Speaker:and you may not be able to back up everything at the same
Speaker:time because there might be
Speaker:different hours, right?
Speaker:That
Speaker:a server is sort of offline or has less load that you can actually do it.
Speaker:Yeah, so having some sort of AI or ML, um, figure that out sounds amazing.
Speaker:Right?
Speaker:Another area where I think that this could help is very, very closely related,
Speaker:and some backup products do have this, and that is making sure
Speaker:that everything in my data center
Speaker:is backed up in some
Speaker:way, right?
Speaker:Usually where you see this is an integration with like, um, uh,
Speaker:VMware or, uh, AWS, et cetera, right?
Speaker:Um, basically just connect to my entire, uh, you know, control
Speaker:panel and then just look and make sure that everything is connected
Speaker:to some type of policy to back it
Speaker:up.
Speaker:I, I think.
Speaker:a default policy if anything is created, so at least everything
Speaker:is protected, even though
Speaker:it may not be protected with the right thing, but at least it's
Speaker:being protected and you don't have to worry about these gaps.
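The coverage check described here boils down to a set difference between what exists and what has a policy. A minimal sketch follows, with the inventory hard-coded; in practice it would come from a hypervisor or cloud API, which is not shown. The VM names and policy names are hypothetical.

```python
def find_unprotected(inventory, policy_assignments):
    """Return the names in the inventory that have no backup policy."""
    return sorted(set(inventory) - set(policy_assignments))

def apply_default_policy(inventory, policy_assignments, default="default-daily"):
    """Return a new mapping where every unprotected item gets the default policy."""
    updated = dict(policy_assignments)
    for name in find_unprotected(inventory, policy_assignments):
        updated[name] = default
    return updated

# Hypothetical inventory and existing policy assignments.
inventory = ["vm-app01", "vm-db01", "vm-test07"]
assignments = {"vm-app01": "gold", "vm-db01": "sql-hourly"}

print(find_unprotected(inventory, assignments))
print(apply_default_policy(inventory, assignments))
```

The default policy may not be the right one for a given system, but it closes the gap until someone assigns a better one.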
Speaker:Exactly.
Speaker:Exactly.
Speaker:Um, and I, I think you do see this in a lot of backup products.
Speaker:Usually again, it's with integration
Speaker:with, uh, big things like VMware, Hyper-V, AWS, um,
Speaker:you know, et cetera.
Speaker:you need the companies, those vendors, to actually provide the APIs to be
Speaker:able to do these sort of queries, and I think that's where there's kind
Speaker:of a little bit of a tension there,
Speaker:Yeah.
Speaker:Yeah.
Speaker:I mean, theoretically you could scour the data center, right?
Speaker:Uh, looking for new computers.
Speaker:Again, I, I know I mentioned this before, but you know, back
Speaker:in the day we did that, right?
Speaker:And back in the day we did that with Visio.
Speaker:Um, there used to be a very
Speaker:expensive version of Visio that would just literally crawl your data center.
Speaker:And it used, uh, some very interesting technology.
Speaker:Um, I forgot the name of this, but like, Nmap
Speaker:does this, where what it does is it sends a malformed packet.
Speaker:It finds an IP address, it sends a malformed packet to that IP address
Speaker:to see how it responds, and different things respond in different ways.
Speaker:And that's how it, that's how it, um,
Speaker:That
Speaker:is crazy that they built that.
Speaker:Yeah.
Speaker:Yeah.
Speaker:Um, and so you could theoretically do that, but agreed, it's much easier
Speaker:if you just have, everything's gonna be in VMware or AWS and then just talk to AWS.
Speaker:Now again, going to VMware and AWS, there can be multiple virtual data centers.
Speaker:There can be
Speaker:multiple AWS accounts.
Speaker:So you, you, you want to make sure that, that you have some way to, to
Speaker:do that.
Speaker:And I, and I do like that idea.
Speaker:Shadow IT.
Speaker:Yeah, shadow IT, bad,
Speaker:especially when it comes to backup.
Speaker:Right.
Speaker:Um, again, I'll tell a story from back in the day: the time that someone came to
Speaker:me, and they were DBAs, and they gave me a directory of a database
Speaker:they wanted me to restore,
Speaker:and it was temp, um, /tmp, on an HP box.
Speaker:And for those that don't know, /tmp on an HP box, specifically HP-UX, was in RAM.
Speaker:So when you rebooted it, /tmp went away.
Speaker:And this, um,
Speaker:Was it source code?
Speaker:It was source code.
Speaker:Yeah.
Speaker:And they were developing for months, like an entire team of
Speaker:developers developing source code of this new application in temp.
Speaker:And then we rebooted the server and they, and they came to me
Speaker:and asked me to restore it.
Speaker:And I was like, dude, we don't back up temp. I don't know
Speaker:what you're talking about.
Speaker:Like, and they're like, dude, this is really important,
Speaker:like heads are gonna roll.
Speaker:And I'm like, yeah, not mine.
Speaker:Like everybody knows we don't back up temp.
Speaker:Except for you, apparently.
Speaker:Oh
Speaker:Uh, so, you know, it's really bad when you have
Speaker:a functioning system and it's not being backed up.
Speaker:Again, another story: we used to have, um, a naming convention.
Speaker:Ours was very boring.
Speaker:Um, it was HPDBSVA, right?
Speaker:HP database server A, and there was HPFS01, et
Speaker:cetera, right?
Speaker:And I remember, and I had this form that you had to fill out.
Speaker:This was an actual piece of paper.
Speaker:We did
Speaker:not have web pages.
Speaker:Right?
Speaker:You had this form that you fill out, and it said
Speaker:on there, simply filling out this form does not meet the requirement.
Speaker:Do not consider your system backed up until you have a signed form back from me.
Speaker:Right?
Speaker:And then one day somebody handed me a form, and
Speaker:they wanted me to back up HPDBSVM, right?
Speaker:And I go, M? That's interesting.
Speaker:The last server I remember hearing about was H. So that means there's an I, a
Speaker:J, a K, and an L out there somewhere.
Speaker:hasn't been backed up.
Speaker:That hasn't been backed up.
Speaker:Yeah.
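The deduction in this story (M exists, so I through L probably do too) is easy to automate when names follow a strict convention. A small sketch, assuming names are a fixed prefix plus one uppercase letter, as with HPDBSVA; the function name is hypothetical.

```python
import string

def missing_servers(known_names, prefix):
    """For names of the form prefix + one uppercase letter, list the names
    below the highest known letter that nobody has told us about."""
    suffixes = {name[len(prefix):] for name in known_names if name.startswith(prefix)}
    letters = sorted(s for s in suffixes if len(s) == 1 and s in string.ascii_uppercase)
    if not letters:
        return []
    highest = letters[-1]
    return [prefix + c for c in string.ascii_uppercase
            if c < highest and c not in letters]

# Servers A through H are known; a form for M just arrived.
known = ["HPDBSV" + c for c in "ABCDEFGH"] + ["HPDBSVM"]
print(missing_servers(known, "HPDBSV"))
```

This only flags candidates. Some of the missing names may never have been built, so each one still needs to be checked.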
Speaker:Um, so this idea of automatically
Speaker:detecting servers and applications sounds like a great
Speaker:idea.
Speaker:And also not just VMs, but also detect, it would be really
Speaker:nice if it detected the type of
Speaker:VM and said, this appears to be a SQL instance.
Speaker:We should back it up with the default SQL
Speaker:policy.
Speaker:That would be great.
Speaker:So in addition to making things more efficient, um, there are some
Speaker:other things we could do, uh, with AI that also would be interesting.
Speaker:Uh, what do
Speaker:you think is the, the first one?
Speaker:No.
Speaker:So I think one of the ones, and we've talked about it so much, so often,
Speaker:and vendors are starting to do this, it's around anomaly detection and
Speaker:it could be used in various ways.
Speaker:So one thing is like, hey, by the way, this server, all of a sudden it's backing
Speaker:up 10 times what it normally does.
Speaker:Maybe this indicates malware or ransomware on the system.
Speaker:Right?
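The simplest form of this check needs no machine learning at all: compare the latest backup against a baseline for the same server. The sketch below is an assumption-laden illustration (the threshold factor and the sample sizes are made up), using the median of recent backups as the baseline.

```python
from statistics import median

def is_size_anomaly(history_gb, latest_gb, factor=5.0):
    """Flag the latest backup if it is more than `factor` times the
    median size of recent backups for the same server."""
    if not history_gb:
        return False  # nothing to compare against yet
    return latest_gb > factor * median(history_gb)

# Hypothetical nightly incremental sizes in GB for one server.
history = [40, 42, 38, 41, 45, 39, 43]
print(is_size_anomaly(history, 410))  # roughly 10x the usual size
print(is_size_anomaly(history, 47))   # within the normal range
```

A learned model could refine this by accounting for weekly cycles or month-end spikes, which a fixed multiplier would flag as false positives.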
Speaker:Um.
Speaker:Or, hey, I've noticed that there's a bunch of data that's starting
Speaker:to look, based on entropy,
Speaker:like it's been encrypted, and that doesn't look normal.
Speaker:Okay, maybe I should go investigate it, right?
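Entropy here means Shannon entropy of the byte values: ordinary documents score well below 8 bits per byte, while encrypted data looks close to uniformly random and scores near 8. The following sketch shows the measurement only; `os.urandom` stands in for encrypted content, and the sample text is made up.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data,
    up to 8.0 for uniformly distributed byte values."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())

text = b"quarterly report draft, quarterly report draft " * 200
random_like = os.urandom(len(text))  # stands in for encrypted content

print(round(shannon_entropy(text), 2))
print(round(shannon_entropy(random_like), 2))
```

High entropy alone is not proof of encryption, since compressed files (ZIP, JPEG, video) also score high. What matters for detection is a change: files that used to be low-entropy suddenly becoming high-entropy.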
Speaker:So, or it could even be security things like, Hey, you're logging
Speaker:in from a different place than normal as a backup admin.
Speaker:Is this the right thing or not?
Speaker:Yeah.
Speaker:And also very closely related to the stuff you said before, uh,
Speaker:are files where the file type, based on the first few bytes of the file,
Speaker:does not match the extension of the
Speaker:file.
Speaker:So it says it's a .doc, but the first few bytes of the file
Speaker:show that it's an application, for
Speaker:Sorry, one
Speaker:Yeah, that's an interesting use case around, uh, the first few bytes, because
Speaker:that could detect things that are being encrypted or other things that don't
Speaker:make sense, or potentially even malware.
Speaker:Right.
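Many file formats start with a fixed signature (a "magic number"), so the check described here can be sketched as a lookup. The signatures below are well-known ones; the table is deliberately tiny, and the function names and the extension mapping are illustrative assumptions, not any product's logic.

```python
import os

# A few well-known file signatures ("magic numbers").
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",                         # also .docx/.xlsx (ZIP containers)
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy .doc/.xls
    b"MZ": "exe",                                 # Windows executables
    b"\x7fELF": "elf",                            # Linux executables
    b"\x89PNG\r\n\x1a\n": "png",
}

# Which signature each extension is expected to carry.
EXPECTED = {
    ".pdf": "pdf", ".zip": "zip", ".docx": "zip", ".xlsx": "zip",
    ".doc": "ole2", ".xls": "ole2", ".exe": "exe", ".png": "png",
}

def detect_type(header: bytes):
    """Return the type implied by the first bytes, or None if unrecognized."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None

def extension_mismatch(filename: str, header: bytes) -> bool:
    """True if the extension is one we know and the header says otherwise."""
    expected = EXPECTED.get(os.path.splitext(filename)[1].lower())
    if expected is None:
        return False  # unknown extension: no opinion
    return detect_type(header) != expected

print(extension_mismatch("invoice.doc", b"MZ\x90\x00"))  # executable posing as .doc
print(extension_mismatch("invoice.doc", b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"))
```

An encrypted file usually fails this check too, because its header no longer matches any known signature, which is why the same test can surface both disguised executables and ransomware damage.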
Speaker:Yeah, uh, it's something we do. You know, my, uh, employer is S2|DATA,
Speaker:and we do a lot of restores of old stuff, um, where we're pulling
Speaker:data off of tape, often for e-discovery purposes and lawsuit
Speaker:purposes and, um, investigation purposes.
Speaker:And one of the things that we do as we're pulling data, 'cause we
Speaker:use a, a, a proprietary tool that we've written to restore data off
Speaker:of most backups rather than use the built-in tool, for a lot of reasons.
Speaker:Um, and this is one of them: we check the file type against the file
Speaker:contents, and, uh, it can also indicate,
Speaker:um, uh, subterfuge,
Speaker:right?
Speaker:Um, it can indicate somebody trying to hide something.
Speaker:Um, but yeah, so anomaly detection, I think is a really big one.
Speaker:Uh, right.
Speaker:Definitely, this is a, it looks like you've got ransomware, right?
Speaker:You need
Speaker:to solve that.
Speaker:That was probably the, the first big use of AI that I
Speaker:remember, uh, in, in the backup world.
Speaker:And I will say that if
Speaker:the way that you know that you have ransomware is that your backup
Speaker:product told you, something is wrong. But, uh, it can
Speaker:happen.
Speaker:Right.
Speaker:Um, another one that I'd bring up is data classification.
Speaker:Again, I think that
Speaker:this is probably a very simple one, but the
Speaker:idea of like, looking at all the different data types and helping you to
Speaker:understand what is in your environment.
Speaker:This is not that new.
Speaker:Um, but perhaps the AI use case could be helping you to identify trends,
Speaker:um, and, and where the data's moving, where it's being created, where
Speaker:it's being changed, uh, et cetera.
Speaker:Um, and, and then, which is very closely related to my
Speaker:other idea, which is predictive
Speaker:analytics.
Speaker:Right.
Speaker:Um, again, going back to, uh, you know, back in the day,
Speaker:one of the things I remember being the hardest to do is capacity prediction.
Speaker:You
Speaker:know, predicting whether or not I have enough capacity to
Speaker:do my backups for the next six
Speaker:and you know what makes it even harder?
Speaker:What's that?
Speaker:Dedupe. It does, dedupe makes it way harder.
Speaker:And you know what, AI, right?
Speaker:AI/ML could be used, because it's smarter than I am.
Speaker:Smarter than you are.
Speaker:It could actually understand the trends
Speaker:as to... Now, let's talk about that. Not
Speaker:everybody might understand
Speaker:why dedupe makes capacity,
Speaker:Sure.
Speaker:uh, management so
Speaker:So let's talk about, before we get to dedupe, let's talk about like
Speaker:traditional storage or tape, right?
Speaker:So
Speaker:you're doing a full backup, you know how big your database is, therefore,
Speaker:you know, okay, my full backup is gonna take this much space and
Speaker:you know, with compression, maybe it's gonna be 2x, or half the space, right?
Speaker:And then, you know, okay, my daily change rate is say 5%, and based on the
Speaker:total size, I know what that's gonna be.
Speaker:And so
Speaker:if I'm doing weekly fulls, daily incrementals, I know how much
Speaker:storage I'm gonna need for a week.
Speaker:Yeah.
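The arithmetic in this exchange can be written down directly. The sketch below assumes a hypothetical 1,000 GB database, the 5% daily change rate and 2x compression mentioned above, and one weekly full plus six daily incrementals. The function names are illustrative.

```python
def weekly_capacity_gb(db_size_gb, daily_change_rate,
                       compression_ratio=2.0, incrementals_per_week=6):
    """Storage for one weekly full plus daily incrementals, after compression."""
    full = db_size_gb
    incrementals = db_size_gb * daily_change_rate * incrementals_per_week
    return (full + incrementals) / compression_ratio

def retained_capacity_gb(db_size_gb, daily_change_rate, weeks_retained, **kwargs):
    """Total storage when a given number of weekly cycles is kept."""
    return weeks_retained * weekly_capacity_gb(db_size_gb, daily_change_rate, **kwargs)

# 1,000 GB full + 6 x 50 GB incrementals = 1,300 GB, halved by compression.
print(weekly_capacity_gb(1000, 0.05))
print(retained_capacity_gb(1000, 0.05, weeks_retained=4))
```

The same numbers work in reverse: expiring one weekly cycle frees exactly one week's worth of storage, which is the predictability that deduplication takes away.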
Speaker:And just as important, you also know how
Speaker:much storage, when you delete
Speaker:the, you know, the older backups.
Speaker:Yeah.
Speaker:You know how much storage will be freed up, which is just as, if not even more,
Speaker:important.
Speaker:Now the problem with deduplication is they talk about these great rates like
Speaker:40x, 30x, 20x, take your pick, right?
Speaker:And that's all great
Speaker:if a lot of your data is very similar, but it's hard
Speaker:to tell whether your data is similar or not until you actually start doing it.
Speaker:So if you're trying to buy storage for, say, three years
Speaker:ahead of time, as a capacity plan,
Speaker:it becomes really difficult.
Speaker:And so you guess, right?
Speaker:You'll take a stab and maybe you look at some of your data and you're like,
Speaker:Hey, these kind of look the same, but you don't know if that's right or not
Speaker:until you actually start backing it up.
Speaker:And like you said, Curtis, if you go delete your backup, you may not
Speaker:actually free up that space because it's been de-duplicated against something
Speaker:else that you're still preserving.
Speaker:right,
Speaker:Say I go delete my backup for six months ago for one application.
Speaker:Another application might have, uh, common blocks with that data or with that other
Speaker:application.
Speaker:And so even though I deleted the first application's backup,
Speaker:it's not gonna free up space.
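A toy model makes this behavior visible. The sketch below is an illustration only, not any vendor's design: a content-addressed store where each unique block is kept once and reference-counted across backups. The block contents and backup names are hypothetical.

```python
import hashlib
from collections import Counter

class DedupStore:
    """Toy deduplicating store: each unique block is stored once and
    reference-counted across backups."""

    def __init__(self):
        self.refcount = Counter()  # block hash -> number of backups using it
        self.block_size = {}       # block hash -> size in bytes
        self.backups = {}          # backup name -> list of block hashes

    def add_backup(self, name, blocks):
        hashes = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            self.block_size[digest] = len(block)
            hashes.append(digest)
        for digest in set(hashes):  # count each block once per backup
            self.refcount[digest] += 1
        self.backups[name] = hashes

    def stored_bytes(self):
        return sum(self.block_size[d] for d in self.refcount)

    def delete_backup(self, name):
        """Delete a backup and return the bytes actually freed."""
        freed = 0
        for digest in set(self.backups.pop(name)):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:
                del self.refcount[digest]
                freed += self.block_size.pop(digest)
        return freed

A, B, C, D = (bytes([i]) * 4096 for i in range(4))
store = DedupStore()
store.add_backup("app1", [A, B, C])   # 12,288 logical bytes
store.add_backup("app2", [A, B, D])   # shares A and B with app1
print(store.stored_bytes())           # 16384: four unique blocks
print(store.delete_backup("app1"))    # 4096: only block C was unique to app1
```

Deleting a 12 KB backup frees only 4 KB here because two of its three blocks are still referenced. This toy frees space immediately; real systems typically reclaim it later in a separate cleanup pass.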
Speaker:And so you end up with this problem and this challenge.
Speaker:And that's one of the hardest things about deduplication.
Speaker:Having worked at a company that did deduplication, customers
Speaker:always struggled with it,
Speaker:Yeah,
Speaker:And some of the
Speaker:things we would do is we would be like, Hey, let's scan your
Speaker:application and just understand what sort of dedupe rates you may get.
Speaker:And even that's a guess, because maybe you move an application from one storage
Speaker:appliance to a different appliance and now your dedupe rates are different.
Speaker:Yeah.
Speaker:And again,
Speaker:one of the most frustrating things could be if
Speaker:you're running outta capacity, right?
Speaker:And so you say, listen, I know we said we wanted to keep backups for
Speaker:three years, but we're running outta capacity and so we're gonna start
Speaker:deleting three years minus a month.
Speaker:And you do that and you get
Speaker:back 0.1% of your capacity. It can be very difficult.
Speaker:Um,
Speaker:And there's the fact that freeing up that space takes time.
Speaker:Because typically with a lot of these systems, there's a background process
Speaker:typically called garbage collection,
Speaker:which goes and now needs to free up all this data and that does take time to run.
Speaker:Yeah, it is a two-stage process where you, um, flag that
Speaker:block for deletion and then another
Speaker:process, typically run when backups aren't running, actually reclaims it.
Speaker:Um, and you probably have to force the garbage collection process.
Speaker:Um, so go ahead.
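The two-stage process described here could be sketched roughly as follows. This is a toy model, not any vendor's implementation: `ChunkStore` and its methods are hypothetical names, and the store simply keeps a reference count per chunk fingerprint.

```python
class ChunkStore:
    """Toy deduplicated store with deferred (two-stage) space reclamation."""

    def __init__(self):
        self.chunks = {}      # fingerprint -> chunk bytes
        self.refcount = {}    # fingerprint -> number of backups referencing it
        self.pending = set()  # fingerprints flagged for deletion

    def add_backup(self, chunks_by_fingerprint):
        for fp, data in chunks_by_fingerprint.items():
            self.chunks.setdefault(fp, data)
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            self.pending.discard(fp)  # referenced again, so no longer garbage

    def delete_backup(self, fingerprints):
        # Stage 1: drop references and flag unreferenced chunks. No space is freed yet.
        for fp in fingerprints:
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:
                self.pending.add(fp)

    def collect_garbage(self):
        # Stage 2: the background pass that actually reclaims flagged chunks.
        freed = 0
        for fp in list(self.pending):
            if self.refcount.get(fp, 0) == 0:
                freed += len(self.chunks.pop(fp))
                del self.refcount[fp]
        self.pending.clear()
        return freed

    def used_bytes(self):
        return sum(len(data) for data in self.chunks.values())

store = ChunkStore()
store.add_backup({"a": b"x" * 100, "b": b"y" * 100})
store.add_backup({"a": b"x" * 100, "c": b"z" * 100})
store.delete_backup(["a", "b"])
print(store.used_bytes())       # 300: deleting alone frees nothing
print(store.collect_garbage())  # 100: only chunk "b" was unshared
print(store.used_bytes())       # 200
```

Until `collect_garbage` runs, the deleted backup still occupies space, which is why capacity does not come back right away.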
Speaker:so I was just thinking as we were talking about the first time
Speaker:that I heard about AI in storage,
Speaker:and I think the first company that I can recall, and I'm sure there
Speaker:were others, was actually Nimble
Speaker:Storage. And Nimble,
Speaker:what they did is, so
Speaker:they provided primary storage.
Speaker:And with their first product, they basically were like, Hey, we are optimized for SQL.
Speaker:We are optimized for VMware.
Speaker:We are optimized for these different applications. And I was like, oh, that's pretty awesome.
Speaker:They're doing it dynamically.
Speaker:But I think at the time it was kind of a static thing where you
Speaker:would say, Hey, I have VMware.
Speaker:I'm writing into this data store.
Speaker:And it would basically pick different
Speaker:block sizes for deduplication.
Speaker:Right, right, right.
Speaker:Yeah.
Speaker:That's interesting.
Speaker:I think, going back to the thing
Speaker:we were talking about, of like
Speaker:using AI to basically help me understand when do I need to order more storage:
Speaker:it can, to the best of its ability,
Speaker:actually look at all of the dedupe rates, right?
Speaker:It could look at the dedupe rate of each individual backup, right?
Speaker:You told me the backup is this much and this is
Speaker:how much it stored, and so we can actually
Speaker:run all those calculations and I can actually figure out:
Speaker:well, based on if everything stays the
Speaker:same, in six months you're gonna be
Speaker:outta storage.
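A minimal sketch of that kind of projection, assuming nothing fancier than a straight-line fit over daily post-dedupe usage. The function name `days_until_full` and the sample numbers are made up for illustration; a real product would presumably account for retention expiry, seasonality, and changing dedupe rates.

```python
def days_until_full(used_tb_by_day, capacity_tb):
    """Project days until the store is full from a least-squares growth rate."""
    n = len(used_tb_by_day)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2
    mean_y = sum(used_tb_by_day) / n
    # Ordinary least-squares slope: TB of growth per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_tb_by_day)) \
        / sum((x - mean_x) ** 2 for x in range(n))
    if slope <= 0:
        return None  # usage is flat or shrinking; no projected fill date
    # Project forward from the most recent observation.
    remaining = max(capacity_tb - used_tb_by_day[-1], 0)
    return remaining / slope

usage = [50, 51, 52, 53, 54]  # TB used after dedupe, one sample per day
print(days_until_full(usage, capacity_tb=100))  # 46.0
```

The "if everything stays the same" caveat from the conversation is exactly the assumption baked into the straight line.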
Speaker:So
Speaker:many vendors actually do.
Speaker:Yeah.
Speaker:Yeah.
Speaker:Um, so the, the, um,
Speaker:Because I think storage capacity is a little easier
Speaker:to predict, because like you said, you're not really changing things, right?
Speaker:You know what your policy is.
Speaker:You know what data's coming in, you know how long you're keeping it,
Speaker:you know what your deduplication rates are, you know how much it's filling up.
Speaker:So I think it's a little easier than what we had talked about previously
Speaker:where it's like, okay, now let me plan out my entire backup infrastructure
Speaker:and start scheduling that.
Speaker:Yeah.
Speaker:Speaking of dedupe, can AI help dedupe itself?
Speaker:Do you think that?
Speaker:It can.
Speaker:So I think my biggest
Speaker:challenge would be that running AI requires compute,
Speaker:and usually with backup
Speaker:you want to go as fast as you can,
Speaker:Mm-hmm.
Speaker:right?
Speaker:And so I think there's that tension
Speaker:that exists between running as fast as you can versus introducing
Speaker:something in the pipeline that could potentially slow things down.
Speaker:And you'd have to also ask at what cost, right?
Speaker:Like, are you going to be saving, say, 70% additional versus
Speaker:traditional algorithms, or is it gonna be much less?
Speaker:Yeah, I think in the backup world, there
Speaker:have been two main ways to do dedupe. Well, there has been
Speaker:something that isn't really dedupe, but
Speaker:there were products that called themselves dedupe products that did this,
Speaker:uh, and that would be block-level
Speaker:incremental, essentially.
Speaker:Right?
Speaker:Not
Speaker:actually de-duping things against each other, but just
Speaker:using technology to lower the amount of new data that's
Speaker:backed up from each workload.
Speaker:But then the traditional dedupe, the way it works for those that don't know
Speaker:this, is that you slice everything up into what are
Speaker:typically called shards or chunks.
Speaker:You run some type of algorithm on it that gives you some type of thing.
Speaker:Like, like
Speaker:A fingerprint.
Speaker:the original SHA-2,
Speaker:SHA-256.
Speaker:And again, here, the better the algorithm, um, the better the dedupe,
Speaker:but the better the algorithm, the more compute it takes, going back to
Speaker:your trade-off thing.
Speaker:And so, um, basically every chunk is run through that, you come
Speaker:up with this alphanumeric string, that alphanumeric string is compared
Speaker:with every other alphanumeric string,
Speaker:um, and then that's how you identify redundant data.
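As a rough illustration of the slice, fingerprint, and compare steps, here is a minimal sketch using fixed-size chunks and SHA-256 from Python's standard `hashlib`. The function name `dedupe_fixed`, the 4 KiB chunk size, and the in-memory `index` dictionary are assumptions for the example; a real product would keep the index on disk and handle far more detail.

```python
import hashlib

def dedupe_fixed(data: bytes, chunk_size: int, index: dict) -> list:
    """Slice data into fixed-size chunks and store each unique chunk once.

    Returns the backup's "recipe": the ordered fingerprints needed to rebuild it.
    """
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()  # the alphanumeric fingerprint
        index.setdefault(fp, chunk)             # seen before? then store nothing new
        recipe.append(fp)
    return recipe

index = {}
monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
tuesday = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # only the last chunk changed

monday_recipe = dedupe_fixed(monday, 4096, index)
tuesday_recipe = dedupe_fixed(tuesday, 4096, index)

logical = len(monday) + len(tuesday)
stored = sum(len(chunk) for chunk in index.values())
print(logical, stored)  # 24576 16384

# A restore just follows the recipe.
assert b"".join(index[fp] for fp in tuesday_recipe) == tuesday
```

The dictionary lookup plays the role of comparing each fingerprint against every other one: two backups of six chunks total end up storing only four.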
Speaker:And one of the challenges you have with that method is that, uh, the data slides,
Speaker:um, and so if you don't slice the data at exactly the same spot, it's duplicate
Speaker:data, but you don't identify it.
Speaker:There is a completely different way, which, um, if you
Speaker:look at the way VAST does things,
Speaker:they do something completely different, right?
Speaker:So they have an algorithm, and I'm guessing they
Speaker:use AI or ML to do this.
Speaker:They have an algorithm that, um, basically identifies data that
Speaker:is probably redundant, right?
Speaker:Um, so they've got two different ways to do dedupe. So
Speaker:there are potentially, again, potentially,
Speaker:AI or ML could be used to identify a new way to identify duplicate
Speaker:data that is maybe, maybe
Speaker:more efficient from a compute and storage standpoint.
Speaker:Like even if it was just more efficient from a compute standpoint,
Speaker:but got the same amount of dedupe, that would still
Speaker:be great.
Speaker:Um, but
Speaker:potentially this is something
Speaker:that I think, uh, AI could
Speaker:and the one thing I did also want to comment on Curtis is, uh, going back to
Speaker:your comment about, okay, if the data shifts, then now you have to make sure
Speaker:that you're doing the right blocks, right?
Speaker:Uh, what you're
Speaker:talking about is called fixed block,
Speaker:fixed-block deduplication,
Speaker:right?
Speaker:There are
Speaker:many vendors out there, though, who do variable size,
Speaker:variable-block, uh, deduplication, which allows it to vary such that if
Speaker:you do get an offset, right, because of some data change, it's still able to
Speaker:dedupe everything else after that because
Speaker:of how it's actually computing the chunks, the segments, right?
Speaker:Each of
Speaker:the blocks.
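The difference between fixed-block and variable-block chunking shows up clearly when a single byte is inserted at the front of the data. The sketch below is a simplified stand-in for what vendors actually do: it picks chunk boundaries wherever a CRC32 of the last 16 bytes hits a chosen value, whereas production systems typically use an efficient rolling hash. `WINDOW`, `DIVISOR`, and the function names are all illustrative assumptions.

```python
import hashlib
import random
import zlib

WINDOW = 16   # bytes of context that decide a boundary (illustrative value)
DIVISOR = 64  # cut when hash % DIVISOR == 0, so chunks average roughly 64 bytes

def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Fixed-block: boundaries depend only on the byte offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data: bytes) -> list:
    """Variable-block: boundaries depend only on the content just before them."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fingerprints(chunks) -> set:
    return {hashlib.sha256(c).hexdigest() for c in chunks}

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
shifted = b"X" + original  # one inserted byte slides everything after it

for name, chunker in [("fixed", fixed_chunks), ("variable", content_defined_chunks)]:
    old = fingerprints(chunker(original))
    new = fingerprints(chunker(shifted))
    print(name, "chunks not seen before:", len(new - old), "of", len(new))
```

With fixed blocks, every chunk after the inserted byte gets a new fingerprint. With content-defined boundaries, the cuts land at the same places in the content, so only the chunk or two around the insertion are new and everything after it still dedupes.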
Speaker:Yep.
Speaker:Um, so that's certainly an area where AI could potentially
Speaker:help. Next, do you think it could help with recovery testing?
Speaker:Oh yeah, I would.
Speaker:So one thing I foresee is like, most people probably don't
Speaker:know how to write a DR plan,
Speaker:Mm-hmm.
Speaker:Mm-hmm.
Speaker:right.
Speaker:Um, I wonder if you took AI, like even, and I'm going back to the first
Speaker:set, right, the large language models,
Speaker:Yep.
Speaker:So the thing we said we
Speaker:weren't talking about, I think we're gonna talk about it here.
Speaker:Yeah.
Speaker:I think at least to start with, it's like, Hey, here's all my data.
Speaker:Here's my applications.
Speaker:Help me build a DR test plan.
Speaker:Yeah,
Speaker:I like that idea.
Speaker:And
Speaker:see what it pops out because, and it may not be perfect, and don't just
Speaker:blindly trust what it provides, but use it as a starting point, right?
Speaker:And then go use that.
Speaker:Because I think a lot of people struggle with, where do I even start?
Speaker:Yeah.
Speaker:And you could also, um, you could use it like a chaos monkey,
Speaker:right?
Speaker:You could use it.
Speaker:Help me come up with some interesting scenarios.
Speaker:You know, one of the things that we talked about
Speaker:in terms of, uh, cyber testing, uh, was, um,
Speaker:you know, when we had Mike on, the idea of like, doing this and
Speaker:making it fun, making it a
Speaker:game. Uh, I like that idea a
Speaker:lot and I think maybe AI could help
Speaker:there.
Speaker:Um,
Speaker:if it helps you do recovery testing more often, um, and, uh,
Speaker:helps you identify potential, uh, plot, I was gonna say plot
Speaker:holes, uh, potential holes in your program, uh, then that
Speaker:I think could be, um, very
Speaker:helpful.
Speaker:And Curtis, since you threw out a term, Chaos Monkey is a tool that was released
Speaker:by Netflix, and literally what it is used for is to just test IT resiliency.
Speaker:So it'll go randomly kill services, kill locations, kill
Speaker:network connections, just to see:
Speaker:is streaming interrupted? Are, uh, end users having any sort of
Speaker:issues? And it's able to do this at scale and in an automated fashion
Speaker:versus someone like trying to think about all the combinations,
Speaker:permutations, and scenarios, because they're probably gonna miss things.
Speaker:And so Netflix designed this thing to actually go out and
Speaker:test their infrastructure.
Speaker:It is pretty impressive.
Speaker:Uh, you know, their infrastructure in general is pretty impressive.
Speaker:It's not flawless.
Speaker:Um, I did watch part of the, uh,
Speaker:Tyson fight a little while ago, and that was on Netflix
Speaker:and it was not good, right?
Speaker:That wasn't so much a resiliency thing as it was,
Speaker:again, they could have used perhaps a little bit better
Speaker:AI to predict what kind of load they were gonna have.
Speaker:But yeah.
Speaker:But the idea of predicting crazy things that will happen, uh, Netflix
Speaker:is pretty darn resilient, uh, when it comes to their infrastructure,
Speaker:Yep.
Speaker:yeah, I like that idea a lot.
Speaker:Um, and I think this is something that,
Speaker:again, an LLM could actually help with, right?
Speaker:So, like I said, the thing that we said we weren't gonna talk about,
Speaker:we could talk about it, right?
Speaker:Um, and for those of you who've never used ChatGPT or Claude,
Speaker:uh, I think it's very useful
Speaker:here, right?
Speaker:You could say, Hey, I'm this kind of company.
Speaker:This is the type of company, you know, and I understand
Speaker:the privacy concerns of what you
Speaker:share with ChatGPT or Claude.
Speaker:Uh, by the way, there are on-prem versions that
Speaker:you can run, uh, of these LLMs too, so that you can keep the
Speaker:data to yourself.
Speaker:But you have a conversation with it.
Speaker:Here's the type of company I am, here's the type of computing environment I have.
Speaker:What could go
Speaker:wrong?
Speaker:Um, you know, what could I build a DR scenario
Speaker:around?
Speaker:Any final thoughts?
Speaker:Can you think of, uh, any other areas where we could use AI in backup?
Speaker:Not so much.
Speaker:I think the one thing I do wanna call out though is AI is here to stay.
Speaker:ML is here to stay.
Speaker:Don't be afraid of it.
Speaker:Use it,
Speaker:right, in the right ways, and don't be afraid, and just start thinking about it.
Speaker:Uh, the one other thing I will call out is as companies are starting
Speaker:to dig into AI and ML for their own applications, production applications
Speaker:and other things, as a backup admin, you need to start thinking
Speaker:about how do I protect this, right?
Speaker:How do I back it up?
Speaker:How would I potentially restore it?
Speaker:Because there's a lot of data, and training these models
Speaker:is really, really expensive.
Speaker:Mm.
Speaker:And so you wanna make sure you have mechanisms to protect the models
Speaker:that emerge from all of this training so you can restore them if needed.
Speaker:So use backup to, to make AI more resilient while AI makes backup more
Speaker:resilient.
Speaker:I like that.
Speaker:We'll call that a symbiosis.
Speaker:I like that a lot.
Speaker:Uh, my one final thought is that potentially you could use, again,
Speaker:going back to the thing we said we weren't gonna talk about.
Speaker:You could use LLMs to help select vendors, right?
Speaker:You could say, Hey, here are all my requirements and here are all the
Speaker:documents that they gave me, this 57-page response to my 10-page RFI.
Speaker:Can you help me make sense of it?
Speaker:Um, and, uh, you could use that. Again, trust but
Speaker:verify when using an LLM, for
Speaker:sure.
Speaker:All right, well, thanks again, Prasanna, uh, for a good chat.
Speaker:Thank you, Curtis.
Speaker:And I am not gonna change how I hold a coffee mug.
Speaker:I'm sorry.
Speaker:I, I would expect no less.
Speaker:And thanks to our listeners, uh, we'd be nothing without you.
Speaker:That is a wrap.