Speaker:

You've found The Backup Wrap-Up, your go-to podcast for all things

Speaker:

backup recovery and cyber recovery.

Speaker:

In this episode, we take a look at the use of artificial intelligence in backup.

Speaker:

Can AI make your backup environment actually better?

Speaker:

Prasanna Malaiyandi and I discuss AI and how it can help from

Speaker:

possibly everything from scheduling backups to detecting ransomware.

Speaker:

We talk about using it for deduplication, for capacity planning,

Speaker:

and even helping you to write better disaster recovery plans.

Speaker:

It's time to talk about AI and backups.

Speaker:

Hope you enjoy it.

Speaker:

By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.

Speaker:

Backup, and I've been passionate about backup and recovery for over 30 years.

Speaker:

Ever since I had to tell my boss that we had no backups of that really

Speaker:

important database that we had just lost.

Speaker:

I don't want that to happen to you, and that's why I do this podcast.

Speaker:

On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.

Speaker:

This is The Backup Wrap-Up.

Speaker:

Welcome to the show.

Speaker:

Hi, I'm W. Curtis Preston, AKA Mr. Backup, and I have with me a guy who apparently

Speaker:

doesn't know how to hold a coffee cup.

Speaker:

Prasanna Malaiyandi, how's it going?

Speaker:

Prasanna

Speaker:

I am good, Curtis.

Speaker:

So I think we need to clarify a

Speaker:

are you defending yourself?

Speaker:

Are you gonna try to defend your weirdness?

Speaker:

I think we have to talk about multiple things.

Speaker:

First.

Speaker:

In India, they don't typically use like a mug.

Speaker:

They use like a stainless steel cup, right?

Speaker:

So if

Speaker:

you're drinking hot beverages, you can only hold it from like the very,

Speaker:

you saw it when we went to the Indian

Speaker:

restaurant in San Diego,

Speaker:

yeah, yeah.

Speaker:

you have to hold it from the very top, otherwise you'll burn your hand.

Speaker:

Right.

Speaker:

And then most mugs, it just feels weird.

Speaker:

Like I got, I got chunky fingers, like sausages, right?

Speaker:

And so like putting it inside the mug, like the handle part of the mug.

Speaker:

I feel like, especially if it's like a curve, not like a straight, I feel like

Speaker:

there's not enough stability there.

Speaker:

That's fascinating.

Speaker:

So for people watching the video, who, by the

Speaker:

way, we do publish a video on YouTube if you want to see our

Speaker:

glorious faces and our expressions.

Speaker:

But yeah, so when I hold a mug, I don't hold it like this through the

Speaker:

handle.

Speaker:

I basically grab it either from the top or I hold it like on the side.

Speaker:

And then of course, the pinky's kind

Speaker:

The pinky,

Speaker:

the bottom.

Speaker:

But what's weird though is the pinky supporting the bottom thing.

Speaker:

I know you've complained to me many times, but that's also how I hold my phone.

Speaker:

you end up covering your microphone.

Speaker:

always hold the phone and

Speaker:

then my pinky's kind of on the bottom, and so it always blocks the microphone.

Speaker:

So Curtis is always

Speaker:

like, were you underwater?

Speaker:

Did you swallow your phone?

Speaker:

What's going on?

Speaker:

So regarding your defense from, you know, how they hold, do things in India.

Speaker:

What part of India were you born in?

Speaker:

Uh, just remind

Speaker:

Yeah, I was, uh, born in not India, but,

Speaker:

but at home.

Speaker:

Right.

Speaker:

But yeah,

Speaker:

you were, you were raised by people born in

Speaker:

India, and so you were, you were taught,

Speaker:

yeah.

Speaker:

And so actually I prefer, so even drinking water.

Speaker:

I don't drink from a glass cup.

Speaker:

I drink from a stainless steel cup.

Speaker:

Right.

Speaker:

Which is, if you haven't spent any time around, you know,

Speaker:

Indians, you wouldn't know that.

Speaker:

It's just that you use a lot, you use stainless steel for cups, for plates,

Speaker:

right.

Speaker:

As Curtis knows when I'm loading the dishwasher, and

Speaker:

he's like, what is that racket?

Speaker:

what is happening over there?

Speaker:

Because everything's so noisy.

Speaker:

They last longer and you don't have to worry about them breaking.

Speaker:

That's, you know, I can't, I can't complain.

Speaker:

Yeah.

Speaker:

Uh, but yeah, I don't get the whole knot, you know?

Speaker:

Here I am with four fingers in my mug.

Speaker:

I'm just saying.

Speaker:

Okay, so now what if that mug was smaller and the handle was curved,

Speaker:

Well, then that's like a, that's like a girly mug and then,

Speaker:

then you use two fingers like

Speaker:

this.

Speaker:

feel like it gives you enough stability?

Speaker:

And yet I've never dropped a mug.

Speaker:

I'm

Speaker:

just saying.

Speaker:

It's not from dropping the mug.

Speaker:

It's from like when you, yeah.

Speaker:

See when you're drinking it, it just feels like it's a little like,

Speaker:

Yeah.

Speaker:

Um,

Speaker:

all over you.

Speaker:

I just think you don't know how to hold a mug, but.

Speaker:

Our listeners are probably like, what are these people talking about?

Speaker:

By the way, this is a new format starting in the new year.

Speaker:

We are now gonna just be talking about coffee and all the crazy

Speaker:

things that Prasanna does.

Speaker:

Yeah, absolutely.

Speaker:

Um, or maybe we might actually talk about some stuff.

Speaker:

So I thought, um, you know, we've been seeing, uh, AI in the news a lot,

Speaker:

right?

Speaker:

AI? I've never heard about it.

Speaker:

Yeah, I've never, never heard of it.

Speaker:

Yeah.

Speaker:

So artificial intelligence, and if, if you've been following the backup

Speaker:

industry much, you probably saw a few announcements from your, uh, backup

Speaker:

company or maybe backup companies you're interested in about the use of AI

Speaker:

within backup.

Speaker:

And so I thought we'd talk about that a little bit,

Speaker:

um, in this episode, and

Speaker:

whether or not it has a use, right?

Speaker:

And can, just to clarify, I think when a lot of these backup vendors launched AI,

Speaker:

they were using AI for like the, not for the core product, right?

Speaker:

So they were using AI for their support agent, or to help answer questions, right?

Speaker:

Which I think we all understand, we all know about, but I think in this

Speaker:

episode, I think we should focus on like the core part of backup.

Speaker:

Yeah.

Speaker:

So, so let's talk a just a little bit about, you know,

Speaker:

what we mean when we say AI.

Speaker:

There are different categories of AI and then also there's machine learning, which

Speaker:

is very closely, and honestly, I, I, I,

Speaker:

you know, I think I could describe the difference between machine

Speaker:

learning and AI, but then there's something that, that

Speaker:

changes, you know, that, that messes me up when we talk about that.

Speaker:

Um, I'll just, for those of you that actually really know what AI

Speaker:

is and machine learning is, you're gonna be offended by something

Speaker:

I say during this episode.

Speaker:

I, I'll just tell you that.

Speaker:

But we're gonna use the terms almost interchangeably, but they're not.

Speaker:

Uh, but I do want to distinguish between

Speaker:

what is referred to as generative AI, right?

Speaker:

Which is a, you know, a large language model that is

Speaker:

going to create things there.

Speaker:

It's not ex nihilo, right?

Speaker:

It's not from, it's not from nothing.

Speaker:

It's it, it has to, it has to have been trained on a large data set.

Speaker:

But, those are the kinds of things that they're using,

Speaker:

like you talked about there.

Speaker:

for support

Speaker:

models, right?

Speaker:

And, And,

Speaker:

just as examples of large language models, you might've heard about

Speaker:

Meta's Llama, Llama 3, Llama 4. There's ChatGPT, or OpenAI's,

Speaker:

What is it?

Speaker:

OPT?

Speaker:

What,

Speaker:

the, the, actual model.

Speaker:

the

Speaker:

underlying model.

Speaker:

Oh, okay.

Speaker:

I, I, I would just, I would've just said ChatGPT.

Speaker:

'cause everybody knows what ChatGPT

Speaker:

is, right?

Speaker:

I mean, you've got Copilot, you've got, you've

Speaker:

got, Yeah, you, so you've got Claude from Anthropic.

Speaker:

Um, there are a lot of people, you know, um, who confuse the company with the product.

Speaker:

But, um, these are the, these are the ones that are grabbing

Speaker:

all the headlines, right?

Speaker:

They're also, they're also writing large bodies of texts.

Speaker:

They're helping people to write books.

Speaker:

They're helping people to do art.

Speaker:

That, and there's a lot of, um.

Speaker:

A lot of legal discussions around that, around the use of things like

Speaker:

the books that I've written as, um, you know, feeding into that and, um,

Speaker:

the, we're not talking about that,

Speaker:

right?

Speaker:

Um, we're not gonna talk about, Hey, um, ChatGPT,

Speaker:

My restore didn't work.

Speaker:

Can you recreate all my documents?

Speaker:

Um, it's not,

Speaker:

there's not gonna be anything like that, at least not yet.

Speaker:

Um, the, um, we're gonna talk about how AI can be used to basically

Speaker:

enhance the core functionality.

Speaker:

I mean, you said this in way fewer words a few minutes ago,

Speaker:

but, uh, basically how it could be used to make backups better.

Speaker:

And I think a good chunk of this is really, like you said, more

Speaker:

around machine learning models,

Speaker:

right,

Speaker:

right,

Speaker:

large language models.

Speaker:

right.

Speaker:

So the, the first section we will just talk about how potentially just talk about

Speaker:

this is just sort of thoughts out loud.

Speaker:

I know that we have a lot of vendors that listen to the podcast.

Speaker:

We are.

Speaker:

Technically aimed at the, the people who actually use backup and

Speaker:

recovery, but I know a lot of vendors use the podcast, so feel free to

Speaker:

take this episode and run with it and

Speaker:

do stuff.

Speaker:

So I, I guess the first question would be, do we think that, uh, machine learning

Speaker:

can be used to help just improve the efficiency of the backup process itself?

Speaker:

What do you think about

Speaker:

Oh, a thousand percent.

Speaker:

A billion percent, Curtis.

Speaker:

So I've never actually had to implement a backup system.

Speaker:

But you've done

Speaker:

tons of this, right?

Speaker:

And how do you go about just planning your backup, right?

Speaker:

How to back up an infrastructure, right?

Speaker:

It's like, just walk us through that, right?

Speaker:

And how many spreadsheets and all the rest that you have in

Speaker:

order to try to optimize these.

Speaker:

Yeah, I, I think about that a lot.

Speaker:

And, and, and, and, and the answer is gonna depend greatly on the

Speaker:

product that you're using, right?

Speaker:

You know, I, I can think of.

Speaker:

The traditional way is that you're going to create some kind of schedule, some

Speaker:

kind of, uh, automatic backup schedule.

Speaker:

Um, and you're going to do a, again, traditionally we'll

Speaker:

do three categories here.

Speaker:

Traditionally you've got some full backups and you're gonna do some

Speaker:

full backups every once in a while.

Speaker:

Um, and I was always a proponent if you had to do full backups, I was

Speaker:

always a proponent of doing those

Speaker:

no

Speaker:

more often than once a month.

Speaker:

Um, back in the days of tape, it was once a week because

Speaker:

it, was

Speaker:

complicated the restore process.

Speaker:

Yeah.

Speaker:

But, um, you know, doing it no more often than once a month, but depending on your

Speaker:

backup product, you might be able to, to

Speaker:

spread that out even over like three months.

Speaker:

And then you also want to schedule, if your backup product

Speaker:

is capable of doing it, you wanna schedule a cumulative incremental.

Speaker:

A differential, some products call it.

Speaker:

Um, and then of course the daily incremental.

Speaker:

Right.

Speaker:

So spreading

Speaker:

that all

Speaker:

for one application you're talking about,

Speaker:

E exactly.

Speaker:

You're doing this per application, per server.

Speaker:

Um, and, and you're trying to load balance things out because if you've

Speaker:

properly designed your system, it's probably not capable of doing a full

Speaker:

backup of your environment in one night.

Speaker:

Right.

Speaker:

Um, because that would just be really expensive, and then the rest of the

Speaker:

time it would go completely unused.

Speaker:

Right?

Speaker:

Um, so you, you buy it so that it's you, you size it so that it's big

Speaker:

enough to do a full backup over time.

Speaker:

And, um, you're right that, that, that scheduling that out is problematic, right?

Speaker:

Um, and you, you definitely could use, um, uh, AI

Speaker:

or ML to, to do that.
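
The load-balancing problem described here, spreading full backups across nights because the system cannot do every full in one night, can be sketched with a simple greedy heuristic. This is only an illustration of the kind of scripting Curtis mentions, not any vendor's scheduler and not an ML model; the server names, sizes, and the `spread_fulls` and `nightly_load` helpers are all hypothetical.

```python
import heapq

def spread_fulls(sizes_tb, nights):
    """Assign each server's full backup to a night, largest server first,
    always choosing the night with the least data scheduled so far."""
    heap = [(0.0, night) for night in range(nights)]  # (scheduled TB, night index)
    heapq.heapify(heap)
    plan = {night: [] for night in range(nights)}
    # Sort by size descending, then by name so the result is deterministic.
    for server, size in sorted(sizes_tb.items(), key=lambda kv: (-kv[1], kv[0])):
        load, night = heapq.heappop(heap)
        plan[night].append(server)
        heapq.heappush(heap, (load + size, night))
    return plan

def nightly_load(plan, sizes_tb):
    """Total TB of full backups scheduled on each night."""
    return {night: sum(sizes_tb[s] for s in servers) for night, servers in plan.items()}

if __name__ == "__main__":
    sizes = {"db1": 8, "fs1": 5, "fs2": 4, "app1": 3}  # hypothetical sizes in TB
    plan = spread_fulls(sizes, nights=2)
    print(plan, nightly_load(plan, sizes))
```

A static plan like this is exactly what goes stale when servers are added or grow, which is the gap the hosts suggest AI or ML could close by re-planning continuously.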

Speaker:

And even for the scheduling aspect.

Speaker:

So we talked about the applications, and then you were talking about sort

Speaker:

of that infrastructure piece, which is shared and you now have to worry

Speaker:

about it across all of these things.

Speaker:

And I'm sure you had these bonkers spreadsheets that you

Speaker:

were creating, trying to do this.

Speaker:

Did it stretch all the way to the moon and back, by the way?

Speaker:

Well, you know me for, it wasn't even a spreadsheet, it was just, uh, it, it was a

Speaker:

script.

Speaker:

Right.

Speaker:

I would, I would just script all this nonsense.

Speaker:

Right?

Speaker:

Um, but the bigger the environment, the more

Speaker:

that doing it programmatically made sense, right?

Speaker:

Um, and, and by the way, even if you have a more modern backup tool

Speaker:

that does incremental forever, there are many applications that

Speaker:

won't, that won't let you do

Speaker:

that.

Speaker:

Right?

Speaker:

I think of like database backups still need to be done every, you know, a full

Speaker:

backup every so often, and you have to schedule these out,

Speaker:

And that's the

Speaker:

second category.

Speaker:

'cause I know you talked about three categories.

Speaker:

Yeah.

Speaker:

Oh yeah.

Speaker:

Oh, well the three categories were, yes.

Speaker:

Uh, thank you.

Speaker:

I'm glad I have you here sometimes, you know.

Speaker:

Yeah.

Speaker:

So you have the, the, the old school full and incremental,

Speaker:

which old school is still current

Speaker:

school.

Speaker:

If we're talking about regular apps, then there's the forever incremental type.

Speaker:

Um, and you don't, you, you do have to worry about scheduling those,

Speaker:

but generally you just sort of tell 'em all to start at once and then

Speaker:

they queue and then it is not, it's, it's a lot simpler to do those.

Speaker:

But then the final category are ones that actually, um, and I

Speaker:

think the one that probably stands out the most here would be Rubrik,

Speaker:

right?

Speaker:

Rubrik doesn't let you schedule, um, that

Speaker:

stuff.

Speaker:

You tell it what your RTO

Speaker:

is and your RPO, and it just does the backups.

Speaker:

I mean, in fact, there are people that complain that you cannot, at least

Speaker:

last time I checked, you could not do

Speaker:

a manually scheduled backup if you wanted to tell it when to do stuff.

Speaker:

Um, I, I think this is probably the first use of some sort of machine learning

Speaker:

or artificial intelligence that I can think of with regards to scheduling.

Speaker:

Which, which I was also gonna chime in.

Speaker:

So the first two methods you talked about, right?

Speaker:

You're kind of statically doing this upfront, setting the schedules and

Speaker:

hoping that forever that it will be good,

Speaker:

Right.

Speaker:

You'll always be able to meet it, but say that there's an additional load or a

Speaker:

server goes down or something else, right.

Speaker:

There's no way to fine tune and adjust that,

Speaker:

Well, well, I, Well, there, I mean, there is, but there's

Speaker:

no way to automatically fine

Speaker:

tune and Yeah.

Speaker:

Yeah.

Speaker:

Right.

Speaker:

And so you're just like, okay, maybe it'll fail a couple times

Speaker:

and then I'll adjust the policies and then I'll be fine, but Right.

Speaker:

Versus something like an SLA based, which I, I actually have

Speaker:

looked at Rubrik's in the past,

Speaker:

and I find that very enticing because really in the end, you

Speaker:

care about what your RPO and RTO,

Speaker:

Yeah.

Speaker:

No one cares if you can back up.

Speaker:

They only care if you can restore.

Speaker:

the problem though is it's such a big paradigm shift for a lot of backup admins

Speaker:

that it's very difficult to understand because it's like when people move

Speaker:

from on-premises to the cloud and they were concerned because they're like,

Speaker:

I can't touch and feel my equipment.

Speaker:

Right.

Speaker:

It's not something I could actually do.

Speaker:

I think that's also the same challenges you get when you move

Speaker:

from sort of, uh, schedule-based backups to sort of SLA based backups.

Speaker:

Yeah, I, I liked, I liked the idea a lot.

Speaker:

I, I, I still, again, you know, if I was, if I was running Rubrik,

Speaker:

I would give people the ability to do a manual backup if they

Speaker:

wanted to.

Speaker:

But, but I do really like the idea of SLA driven backups,

Speaker:

because I like the idea of SLAs.

Speaker:

You know, we've talked about SLAs on here, and I like the idea of

Speaker:

knowing the backups were being done often enough to meet my SLAs.

Speaker:

I

Speaker:

really liked that idea.
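
To make the SLA-driven idea concrete, here is a minimal sketch of how a scheduler could derive "when must this backup start" from an RPO instead of from a fixed calendar. It is not a description of how Rubrik works internally. The simplifying assumption, labeled here and in the code, is that the recovery point is the time the last backup finished; the workload names and the `latest_start` and `due_order` helpers are hypothetical.

```python
from datetime import datetime, timedelta

def latest_start(last_success, rpo, expected_duration):
    """Latest time the next backup can start and still finish inside the RPO.
    Assumption: the recovery point is the finish time of the last good backup."""
    return last_success + rpo - expected_duration

def due_order(workloads, now):
    """Workload names sorted by remaining slack, most urgent first.
    Negative slack means the RPO is already at risk."""
    slack = {
        name: latest_start(w["last_success"], w["rpo"], w["duration"]) - now
        for name, w in workloads.items()
    }
    return sorted(slack, key=slack.get)

if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    workloads = {
        "sql": {"last_success": datetime(2024, 1, 1, 8, 0),
                "rpo": timedelta(hours=4), "duration": timedelta(hours=1)},
        "fs": {"last_success": datetime(2024, 1, 1, 0, 0),
               "rpo": timedelta(hours=24), "duration": timedelta(hours=2)},
    }
    print(due_order(workloads, now))
```

Because the order is recomputed from current state each time, a failed backup or a new workload changes the queue automatically, which is the adjustment a static schedule cannot make on its own.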

Speaker:

The one thing I think that is useful with these sort of approaches is

Speaker:

we've talked about the fact that like your environment doesn't stay static.

Speaker:

Right.

Speaker:

So as you're adding new workloads, as things are changing, you don't

Speaker:

want to have to go recompute your entire spreadsheet or your

Speaker:

script every single time.

Speaker:

So it's nice to have sort of these models that can automatically help fine tune and

Speaker:

optimize so you're not wasting your time because it's more than likely that you're

Speaker:

not gonna get it right the first time if you manually try to reset some of these

Speaker:

things.

Speaker:

And so having this automatic thing that constantly is

Speaker:

adjusting just seems amazing.

Speaker:

Yeah, it does.

Speaker:

And I, and outside of Rubrik, I'm not aware of any tools that do that.

Speaker:

Uh, but I, I think that this could certainly be a way where

Speaker:

they could use AI to do that.

Speaker:

Um, the.

Speaker:

And I, and I was thinking about, again, going back to it, it's been a

Speaker:

while since I've had to do this in a production environment, but the, the

Speaker:

the first thing that you have to find out is how big is everything, right?

Speaker:

How big is, is everything from a database perspective and

Speaker:

how, how long does it take?

Speaker:

'cause there's all these different, and that's the thing that nobody knows.

Speaker:

Right.

Speaker:

How big is your, how big is your data center?

Speaker:

And they're like, I don't know.

Speaker:

I don't know.

Speaker:

And so like, you have to do a full backup first

Speaker:

before you have any idea.

Speaker:

And not every server backs up at the same speed and all these different things.

Speaker:

So yeah, it it is a

Speaker:

complicated

Speaker:

and you may not be able to back up everything at the same

Speaker:

time because there might be

Speaker:

different hours, right?

Speaker:

That

Speaker:

a server is sort of offline or has less load that you can actually do it.

Speaker:

Yeah, so having some sort of AI or ML, um, figure that out sounds amazing.

Speaker:

Right?

Speaker:

Another area where I think that this could help is very, very closely related, and

Speaker:

that is, and, and some backup products do have this and that is making sure

Speaker:

that everything in my data center

Speaker:

is backed up in some

Speaker:

way, right?

Speaker:

Usually where you see this is an integration with like, um, uh,

Speaker:

VMware or, uh, AWS, et cetera, right?

Speaker:

Um, basically just connect to my entire, uh, you know, control

Speaker:

panel and then just look and make sure that everything is connected

Speaker:

to some type of policy to back it

Speaker:

up.

Speaker:

I, I think.

Speaker:

a default policy if anything is created, so at least everything

Speaker:

is protected, even though

Speaker:

it may not be protected with the right thing, but at least it's

Speaker:

being protected and you don't have to worry about these gaps.

Speaker:

Exactly.

Speaker:

Exactly.

Speaker:

Um, and I, I think you do see this in a lot of backup products.

Speaker:

Usually again, it's with integration

Speaker:

with, uh, big things like VMware, Hyper-V, AWS, um,

Speaker:

you know, et cetera.

Speaker:

you need the companies, those vendors, to actually provide the APIs to be

Speaker:

able to do these sort of queries, and I think that's where there's kind

Speaker:

of a little bit of a tension there,

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

I mean, theoretically you could scour the data center, right?

Speaker:

Uh, looking for new computers.

Speaker:

Again, I, I know I mentioned this before, but you know, back

Speaker:

in the day we did that, right?

Speaker:

And back in the day we did that with Visio.

Speaker:

Um, there used to be a very

Speaker:

expensive version of Visio that would just literally crawl your data center.

Speaker:

And it used, uh, some very interesting technology.

Speaker:

Um, I forgot the, the name of this, but like, Nmap

Speaker:

does this, where it, what it does is it sends a malformed packet.

Speaker:

It finds an IP address, it sends a malformed packet to that IP address

Speaker:

to see how it responds, and different things respond in different ways.

Speaker:

And that's how it, that's how it, um,

Speaker:

That

Speaker:

is crazy that they built that.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Um, and so you, you could theoretically do that, but, agreed, it's much easier

Speaker:

if you just have, everything's gonna be in VMware or AWS and then just talk to AWS.

Speaker:

Now again, going to VMware and AWS, there can be multiple virtual data centers.

Speaker:

There can be

Speaker:

multiple AWS accounts.

Speaker:

So you, you, you want to make sure that, that you have some way to, to

Speaker:

do that.

Speaker:

And I, and I do like that idea.

Speaker:

Shadow IT.

Speaker:

Yeah, shadow IT bad,

Speaker:

especially when it comes to backup.

Speaker:

Right.

Speaker:

Um, again, I'll tell a story from back in the day was the time that someone came to

Speaker:

me and they had, they were DBAs and they, they gave me a directory of a database.

Speaker:

They wanted me to restore.

Speaker:

and it was temp, um, /tmp on an HP box.

Speaker:

And for those that don't know, /tmp on an HP box, specifically HP-UX, was in RAM.

Speaker:

So when you rebooted it, temp went away.

Speaker:

And this, um,

Speaker:

this

Speaker:

it source code,

Speaker:

what I.

Speaker:

it?

Speaker:

Source code

Speaker:

It was source code.

Speaker:

Yeah.

Speaker:

And they were developing for months, like an entire team of

Speaker:

developers developing source code of this new application in temp.

Speaker:

And then we rebooted the server and they, and they came to me

Speaker:

and asked me to restore it.

Speaker:

And I was like, dude, we don't back up temp. I don't know

Speaker:

what you're talking about.

Speaker:

Like, and they're like, dude, this is really important,

Speaker:

like heads are gonna roll.

Speaker:

And I'm like, yeah, not mine.

Speaker:

Like everybody knows we don't back up temp.

Speaker:

Except for you, apparently.

Speaker:

Oh

Speaker:

Uh, so it's, I'm just, you know, it's really bad when you have

Speaker:

a functioning system and then it's not being backed up again.

Speaker:

Another story we used to have, um, we had a, a naming convention.

Speaker:

Ours was very boring.

Speaker:

Um, it, it was, it H-P-D-B-S-V-A, right?

Speaker:

HP database server A, and there was HPFS01, et

Speaker:

cetera, right?

Speaker:

And I remember, and I had this form that you had to fill out.

Speaker:

This was an actual piece of paper.

Speaker:

We did

Speaker:

not have web pages.

Speaker:

Right?

Speaker:

You had this form that you fill out and, and you had to, and, and it, it said

Speaker:

on there, simply filling out this form is not, does not meet the requirement.

Speaker:

You do not consider your system backed up until you have a signed form back from me.

Speaker:

Right?

Speaker:

And then one day somebody handed me a form and it said like.

Speaker:

They wanted, like me to back up H-P-D-B-S-V-M, right?

Speaker:

And I go, M that's interesting.

Speaker:

The last server I remember hearing about was H. So that means there's an I, a

Speaker:

J, a K, and an L out there somewhere.

Speaker:

hasn't been backed up.

Speaker:

That hasn't been backed up.

Speaker:

Yeah.

Speaker:

Um, so this idea of automatically

Speaker:

detecting servers and applications sounds like a great

Speaker:

idea.

Speaker:

And also not just VMs, but also detect, it would be really

Speaker:

nice if it detected the type of

Speaker:

VM and said, this appears to be a SQL instance.

Speaker:

We should back it up with the default SQL

Speaker:

policy.

Speaker:

That would be great.
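
The two ideas in this stretch, finding anything in the inventory that has no backup policy and giving it a sensible default based on what it appears to be, reduce to a set difference plus a lookup. The sketch below assumes the inventory has already been pulled from something like a hypervisor or cloud API and that each workload has been tagged with a type; how that tagging is done (the part where ML might help) is out of scope. All names, types, and policy labels are hypothetical.

```python
def find_unprotected(inventory, policies):
    """Names that exist in the inventory but have no backup policy."""
    return sorted(set(inventory) - set(policies))

def assign_defaults(inventory, policies, default_by_type, fallback="default"):
    """Return a copy of `policies` in which every inventoried workload has a policy.
    Existing assignments are never overwritten."""
    result = dict(policies)
    for name, kind in inventory.items():
        if name not in result:
            result[name] = default_by_type.get(kind, fallback)
    return result

if __name__ == "__main__":
    inventory = {"vm-web": "generic", "vm-sql": "sql", "vm-new": "sql", "vm-odd": "nas"}
    policies = {"vm-web": "gold", "vm-sql": "sql-daily"}
    print(find_unprotected(inventory, policies))
    print(assign_defaults(inventory, policies, {"sql": "default-sql"}))
```

As Prasanna notes, the default may not be the right policy, but it closes the gap until someone reviews it.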

Speaker:

So in addition to making things more efficient, um, there are some

Speaker:

other things we could do, uh, with AI that also would be interesting.

Speaker:

Uh, what do

Speaker:

you think is the, the first one?

Speaker:

No.

Speaker:

So I think one of the ones, and we've talked about it so much, so often,

Speaker:

and vendors are starting to do this, it's around anomaly detection and

Speaker:

it could be used in various fashion.

Speaker:

So one thing is like, Hey, by the way, this server, all of a sudden it's backing

Speaker:

up 10 times what it normally does.

Speaker:

Maybe this might indicate like a malware or ransomware on the system.

Speaker:

Right?

Speaker:

Um.

Speaker:

Or Hey, I've noticed that there's a bunch of data that's starting

Speaker:

to look like based on entropy.

Speaker:

That it's been encrypted, that doesn't look normal.

Speaker:

Okay, maybe I should go investigate it, right?

Speaker:

So, or it could even be security things like, Hey, you're logging

Speaker:

in from a different place than normal as a backup admin.

Speaker:

Is this the right thing or not?
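
Two of the signals mentioned here are easy to illustrate: a backup that is suddenly about ten times its usual size, and data whose byte entropy looks like encrypted output. The sketch below uses plain statistics rather than a trained model, and the tenfold threshold simply mirrors the example in the conversation. High entropy alone is not proof of ransomware, since compressed files and media also score high, so treat this as a prompt to investigate. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median

def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: near 0 for constant data, up to 8 for uniformly random bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

def size_is_anomalous(history_gb, latest_gb, factor=10.0):
    """Flag a backup that is at least `factor` times the median of recent backups."""
    return latest_gb >= factor * median(history_gb)

if __name__ == "__main__":
    print(shannon_entropy(b"aaaaaaaa"))          # constant data, very low entropy
    print(shannon_entropy(bytes(range(256))))    # every byte value once, maximum entropy
    print(size_is_anomalous([50, 52, 48, 51], 520))
```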

Speaker:

Yeah.

Speaker:

And also very closely related to the stuff you said before was, uh,

Speaker:

are files where the file type based on the first few bytes of the file,

Speaker:

does not match the extension of the

Speaker:

file.

Speaker:

So it says it's a dot doc, but the first few bytes of the file

Speaker:

show that it's an application, for

Speaker:

Sorry, one

Speaker:

Yeah, that's an interesting use case around, uh, the first few bytes because

Speaker:

that could detect things that are being encrypted or other things that don't

Speaker:

make sense, or potentially even malware.
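
The check described here, comparing a file's extension with the signature in its first few bytes, can be sketched with a small table of well-known magic numbers. This is an illustration only and is unrelated to the proprietary tool Curtis mentions; the table is deliberately tiny and the `sniff` and `mismatch` helpers are hypothetical. A file whose extension is in the table but whose header matches nothing known (for example, because it has been encrypted) is also reported as a mismatch.

```python
import os

# A few well-known file signatures ("magic numbers").
SIGNATURES = {
    b"%PDF": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",                          # also the container for .docx/.xlsx
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",   # legacy .doc/.xls
    b"MZ": "exe",                                  # Windows executables
}

# What each extension is expected to look like on disk.
EXPECTED = {".pdf": "pdf", ".png": "png", ".zip": "zip",
            ".docx": "zip", ".doc": "ole2", ".exe": "exe"}

def sniff(header: bytes):
    """Return the detected type for the leading bytes, or None if unknown."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None

def mismatch(filename: str, header: bytes) -> bool:
    """True when the extension is one we know and the content does not match it."""
    expected = EXPECTED.get(os.path.splitext(filename)[1].lower())
    return expected is not None and sniff(header) != expected

if __name__ == "__main__":
    print(mismatch("report.doc", b"MZ\x90\x00"))   # says .doc, content is an executable
    print(mismatch("manual.pdf", b"%PDF-1.7"))
```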

Speaker:

Right.

Speaker:

Yeah, it, uh, it's something we do, you know, my, uh, employer is S2|DATA,

Speaker:

and we do a lot of restores of old stuff, um, where we're pulling

Speaker:

data off of tape often for, um, for e-discovery purposes and lawsuit

Speaker:

purposes and, um, investigation purposes.

Speaker:

And one of the things that we do as we're pulling data, 'cause we

Speaker:

use a, a, a proprietary tool that we've written to restore data off

Speaker:

of most backups rather than use the built-in tool for a lot of reasons.

Speaker:

Um, and this is one of them is that we check the file type against the file

Speaker:

contents and, uh, it can, it can also indicate.

Speaker:

Um, uh, subterfuge,

Speaker:

right?

Speaker:

Um, it can indicate somebody trying to hide something.

Speaker:

Um, but yeah, so anomaly detection, I think is a really big one.

Speaker:

Uh, right.

Speaker:

Definitely that this is a, this is a, it looks like you've got ransomware, right?

Speaker:

You need

Speaker:

to solve that.

Speaker:

That was probably the, the first big use of AI that I

Speaker:

remember, uh, in, in the backup world.

Speaker:

And I, I, I will say that if

Speaker:

the way that you know that you have ransomware is that your backup

Speaker:

product told you something is wrong, but, uh, but it, but it can

Speaker:

happen.

Speaker:

Right.

Speaker:

Um, another one that I'll talk, uh, that I'd bring up is, is data classification.

Speaker:

Again, I think that

Speaker:

this is, this is probably a very simple one, but the

Speaker:

idea of like, looking at all the different data types and helping you to

Speaker:

understand what is in your environment.

Speaker:

This is not that new.

Speaker:

Um, but perhaps the AI use case could be helping you to identify trends,

Speaker:

um, and, and where the data's moving, where it's being created, where

Speaker:

it's being changed, uh, et cetera.

Speaker:

Um, and, and then, which is very closely related to my

Speaker:

other idea, which is predictive

Speaker:

analytics.

Speaker:

Right.

Speaker:

Um, again, going back to, uh, you know, back in the day,

Speaker:

one of the things I remember being the hardest to do is capacity prediction.

Speaker:

You

Speaker:

know, predicting whether or not I have enough capacity to

Speaker:

do my backups for the next six

Speaker:

and you know what makes it even harder?

Speaker:

What's that?

Speaker:

Dedupe. Dedupe makes it way harder.

Speaker:

And you know what? AI, right?

Speaker:

AI/ML could be used because it's smarter than I am.

Speaker:

Smarter than you are.

Speaker:

It could actually understand the trends

Speaker:

as to now what, what, let's talk about that Non, not every,

Speaker:

everybody might not understand.

Speaker:

why dedupe makes capacity,

Speaker:

Sure.

Speaker:

uh, management so

Speaker:

So let's talk about the, before we get to dedupe, let's talk about like

Speaker:

traditional storage or tape, right?

Speaker:

So

Speaker:

you're doing a full backup, you know how big your database is, therefore,

Speaker:

you know, okay, my full backup is gonna take this much space and

Speaker:

you know, with compression, maybe it's gonna be two x or half the space, right?

Speaker:

And then, you know, okay, my daily change rate is say 5%, and based on the

Speaker:

total size, I know what that's gonna be.

Speaker:

And so

Speaker:

if I'm doing weekly fulls, daily incrementals, I know how much

Speaker:

storage I'm gonna need for a week.

Speaker:

Yeah.

Speaker:

And, and just as, and just as important, you also know how

Speaker:

much storage, when you delete

Speaker:

the, you know, the older backups.

Speaker:

Yeah.

Speaker:

You know how much storage will be freed up, which is just if, if not even more

Speaker:

important.

Speaker:

Now the problem with deduplication is they talk about these great rates like

Speaker:

40x, 30x, 20x, take your pick, right?

Speaker:

And that's all great

Speaker:

if a lot of your data is very similar, but it's hard

Speaker:

to tell whether your data is similar or not until you actually start doing it.

Speaker:

So if you're trying to buy storage for, say, three years

Speaker:

ahead of time, as a capacity plan,

Speaker:

it becomes really difficult.

Speaker:

And so you guess, right?

Speaker:

You'll take a stab and maybe you look at some of your data and you're like,

Speaker:

Hey, these kind of look the same, but you don't know if that's right or not

Speaker:

until you actually start backing it up.

Speaker:

And like you said, Curtis, if you go delete your backup, you may not

Speaker:

actually free up that space because it's been de-duplicated against something

Speaker:

else that you're still preserving.

Speaker:

right,

Speaker:

Say I go delete my backup from six months ago for one application.

Speaker:

Another application might have, uh, common blocks with that data or with that other

Speaker:

application.

Speaker:

And so even though I deleted the first application's backup,

Speaker:

it's not gonna free up space.
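
A toy model makes the shared-block problem concrete. The sketch below is an assumption-laden illustration, not how any particular product works: the `DedupeStore` class, the 4 KiB blocks, and the two application names are all hypothetical. Each unique block is stored once and tracked by how many backups reference it, so deleting a backup only frees blocks that no other backup still uses.

```python
import hashlib
from collections import Counter


class DedupeStore:
    """Toy deduplicating store that counts references per unique block."""

    def __init__(self):
        self.blocks = {}       # fingerprint -> block bytes (stored once)
        self.refs = Counter()  # fingerprint -> number of backups using it
        self.backups = {}      # backup name -> ordered list of fingerprints

    def write_backup(self, name, blocks):
        fingerprints = []
        for block in blocks:
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)
            fingerprints.append(fp)
        for fp in set(fingerprints):
            self.refs[fp] += 1
        self.backups[name] = fingerprints

    def delete_backup(self, name):
        """Delete a backup and return how many bytes were actually freed."""
        freed = 0
        for fp in set(self.backups.pop(name)):
            self.refs[fp] -= 1
            if self.refs[fp] == 0:
                freed += len(self.blocks.pop(fp))
                del self.refs[fp]
        return freed

    def used_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupeStore()
store.write_backup("app_a", [b"A" * 4096, b"B" * 4096, b"C" * 4096])
store.write_backup("app_b", [b"A" * 4096, b"B" * 4096, b"D" * 4096])
before = store.used_bytes()
freed = store.delete_backup("app_a")
print(f"stored {before} bytes; deleting app_a (12288 logical bytes) freed {freed}")
```

In this example the deleted backup was three blocks long, but only the one block unique to it is released, because the other two are still referenced by the second application's backup.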

Speaker:

And so you end up with this problem and this challenge.

Speaker:

And that's one of the hardest things about deduplication.

Speaker:

Having worked at a company that did deduplication, customers

Speaker:

always struggled with it,

Speaker:

Yeah,

Speaker:

And some of the

Speaker:

things we would do is we would be like, Hey, let's scan your

Speaker:

application and just understand what sort of dedupe rates you may get.

Speaker:

And even that's a guess, because maybe you move an application from one storage

Speaker:

appliance to a different appliance and now your dedupe rates are different.

Speaker:

Yeah.

Speaker:

And again,

Speaker:

one of the most frustrating things could be if you start

Speaker:

running outta capacity, right?

Speaker:

And so you say, listen, I know we said we wanted to keep backups for

Speaker:

three years, but we're running outta capacity and so we're gonna start

Speaker:

deleting three years minus a month.

Speaker:

And you do that and you get

Speaker:

back 0.1% of your, it can be very difficult.

Speaker:

Um,

Speaker:

fact that to free up that space takes time.

Speaker:

Because typically with a lot of these systems, there's a background process

Speaker:

typically called garbage collection,

Speaker:

which goes and now needs to free up all this data and that does take time to run.

Speaker:

Yeah, it is a two-stage process where you flag that

Speaker:

block for deletion, and then another

Speaker:

process runs, typically when backups aren't running.

Speaker:

And you probably have to force the garbage collection process.
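
The two-stage behavior can be sketched as follows. This is a simplified illustration under stated assumptions: the `DeferredDeleteStore` class and its method names are hypothetical, and real systems track far more state. The point is only that flagging a block changes nothing about used capacity until the separate collection pass runs.

```python
class DeferredDeleteStore:
    """Toy store where deletion is flagged first and reclaimed later."""

    def __init__(self):
        self.blocks = {}      # block id -> size in bytes
        self.flagged = set()  # block ids marked for deletion, not yet reclaimed

    def write(self, block_id, size):
        self.blocks[block_id] = size
        self.flagged.discard(block_id)  # a rewritten block is live again

    def flag_for_deletion(self, block_id):
        # Stage one: only record the intent to delete.
        if block_id in self.blocks:
            self.flagged.add(block_id)

    def used_bytes(self):
        return sum(self.blocks.values())

    def garbage_collect(self):
        # Stage two: actually release the flagged blocks.
        freed = sum(self.blocks.pop(block_id) for block_id in self.flagged)
        self.flagged.clear()
        return freed


gc_store = DeferredDeleteStore()
for block_id in ("b1", "b2", "b3"):
    gc_store.write(block_id, 4096)
gc_store.flag_for_deletion("b1")
gc_store.flag_for_deletion("b2")
used_after_flag = gc_store.used_bytes()   # nothing reclaimed yet
reclaimed = gc_store.garbage_collect()
used_after_gc = gc_store.used_bytes()
print(used_after_flag, reclaimed, used_after_gc)
```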

Speaker:

Um, so go, go ahead.

Speaker:

so I was just thinking as we were talking about the first time

Speaker:

that I heard about AI in storage,

Speaker:

and I think the first company that I can recall, and I'm sure there

Speaker:

were others, was actually Nimble

Speaker:

Storage. And Nimble,

Speaker:

what they did with their first product when they built it, so

Speaker:

they provided primary storage.

Speaker:

And their first product, they basically were like, Hey, we are optimized for SQL.

Speaker:

We are optimized for VMware.

Speaker:

We are optimized for these different, and I was like, oh, that's pretty awesome.

Speaker:

They're doing it dynamically.

Speaker:

But I think at the time it was kind of a static thing where you

Speaker:

would say, Hey, I have VMware.

Speaker:

I'm writing into this data store.

Speaker:

And it would basically pick different

Speaker:

block sizes for deduplication

Speaker:

Right, right, right.

Speaker:

Yeah.

Speaker:

That's interesting.

Speaker:

I think, going back to the thing

Speaker:

we were talking about, of, like,

Speaker:

using AI to basically help me understand when I need to order more storage,

Speaker:

it can, to the best of its ability.

Speaker:

It can actually look at all of the dedupe rates, right?

Speaker:

At all of the... at what?

Speaker:

It could look at the dedupe rate of each individual backup, right?

Speaker:

You told me to back up this much, and this is

Speaker:

how much space it used, and so we can actually

Speaker:

run all those calculations and actually figure out:

Speaker:

"Well, if everything stays the

Speaker:

same, in six months you're gonna be

Speaker:

outta storage."
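
The simplest version of that "if everything stays the same" projection is a straight-line fit, sketched below. To be clear about the assumptions: this is an ordinary least-squares trend line, not an AI or ML model, and the function name, the daily sampling interval, and the sample numbers are all hypothetical.

```python
def days_until_full(used_history_tb, capacity_tb):
    """Fit a line to daily usage samples and extrapolate to full capacity.

    Returns the estimated number of days after the last sample, or None if
    usage is flat or shrinking (no fill date under this simple model).
    """
    n = len(used_history_tb)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(used_history_tb) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in enumerate(used_history_tb))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance          # TB of growth per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_tb - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Hypothetical samples: usage grows by 1 TB per day on a 100 TB appliance.
history = [50.0, 51.0, 52.0, 53.0, 54.0]
remaining = days_until_full(history, capacity_tb=100.0)
print(f"Estimated days until full: {remaining:.0f}")
```

A straight line ignores retention expiry, changing dedupe rates, and new workloads, which is exactly where a smarter model would have room to do better.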

Speaker:

So

Speaker:

many vendors actually do.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Um, so the, the, um,

Speaker:

Because I think storage capacity is a little easier

Speaker:

to predict, because like you said, you're not really changing things, right?

Speaker:

You know what your policy is.

Speaker:

You know what data's coming in, you know how long you're keeping it,

Speaker:

you know what your deduplication rates are, you know how much it's filling up.

Speaker:

So I think it's a little easier than what we had talked about previously

Speaker:

where it's like, okay, now let me plan out my entire backup infrastructure

Speaker:

and start scheduling that.

Speaker:

Yeah.

Speaker:

Speaking of dedupe, can AI help dedupe itself?

Speaker:

Do you think that?

Speaker:

can.

Speaker:

So I think my biggest

Speaker:

challenge would be that running AI requires compute,

Speaker:

and usually with backup

Speaker:

you want to go as fast as you can,

Speaker:

Mm-hmm.

Speaker:

right?

Speaker:

And so I think there's that tension

Speaker:

that exists between running as fast as you can versus introducing

Speaker:

something in the pipeline that could potentially slow things down.

Speaker:

And you'd have to also ask at what cost, right?

Speaker:

Like, are you going to be saving, say, 70% additional versus traditional

Speaker:

algorithms, or is it gonna be much less?

Speaker:

Yeah, I think with dedupe in the backup world, there

Speaker:

have been two main ways to do dedupe. There has been

Speaker:

something that isn't really dedupe, but

Speaker:

there were products that called themselves dedupe products that did this,

Speaker:

and that would be block-level

Speaker:

incremental, essentially.

Speaker:

Right?

Speaker:

Not

Speaker:

actually deduping things against each other, but just

Speaker:

using technology to reduce the amount of new data that's

Speaker:

backed up from each workload.

Speaker:

But then traditional dedupe, the way it works, for those that don't know

Speaker:

this, is that you slice it up, you slice everything up into what are

Speaker:

typically called shards or chunks.

Speaker:

You run some type of algorithm on it that gives you some type of thing.

Speaker:

Like, like

Speaker:

A fingerprint.

Speaker:

the original SHA-2,

Speaker:

SHA-256.

Speaker:

And again, here, the better the algorithm, the better the dedupe,

Speaker:

but the better the algorithm, the more compute it takes, going back to

Speaker:

your trade-off thing.

Speaker:

And so basically every chunk is run through that, you come

Speaker:

up with this alphanumeric string, and that alphanumeric string is compared

Speaker:

with every other alphanumeric string,

Speaker:

and then that's how you identify redundant data.
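
The chunk-and-fingerprint approach described above can be sketched in Python. This is a minimal illustration, assuming fixed 4 KiB chunks and SHA-256 fingerprints; the function names and the in-memory dictionary standing in for the fingerprint index are hypothetical, and real products differ in chunk size, hash choice, and index design.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for this sketch


def dedupe_fixed(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and keep one copy per fingerprint."""
    index = {}    # fingerprint -> chunk bytes (the unique data actually stored)
    recipe = []   # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        index.setdefault(fingerprint, chunk)  # store only if not seen before
        recipe.append(fingerprint)
    return index, recipe


def restore(index, recipe):
    """Rebuild the original data by following the recipe of fingerprints."""
    return b"".join(index[fingerprint] for fingerprint in recipe)


data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
index, recipe = dedupe_fixed(data)
print(f"{len(recipe)} chunks written, {len(index)} stored after dedupe")
```

The recipe is what makes a restore possible: the store keeps each unique chunk once, and every backup is just an ordered list of fingerprints pointing into it.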

Speaker:

And one of the challenges you have with that method is that the data slides,

Speaker:

and so if you don't slice the data at exactly the same spot, it's duplicate

Speaker:

data, but you don't identify it.

Speaker:

There is a completely different way. If you

Speaker:

look at the way VAST does things,

Speaker:

they do something completely different, right?

Speaker:

So they have an algorithm, and I'm guessing they

Speaker:

use AI or ML to do this.

Speaker:

They have an algorithm that, um, basically identifies data that

Speaker:

is probably redundant, right?

Speaker:

So they've got two different ways to do dedupe. And so,

Speaker:

potentially, again, potentially,

Speaker:

AI or ML could be used to find a new way to identify duplicate

Speaker:

data that is maybe, maybe

Speaker:

more efficient from a compute and storage standpoint.

Speaker:

Like even if it was just more efficient from a compute standpoint,

Speaker:

but got the same amount of dedupe, that would still

Speaker:

be great.

Speaker:

Um, but

Speaker:

potentially this is something

Speaker:

that I think, uh, AI could

Speaker:

And the one thing I did also want to comment on, Curtis, is, going back to

Speaker:

your comment about, okay, if the data shifts, then now you have to make sure

Speaker:

that you're doing the right blocks, right?

Speaker:

This is where companies, though, have done, sort of... What you're

Speaker:

talking about is called fixed block,

Speaker:

fixed-block deduplication,

Speaker:

right?

Speaker:

There are

Speaker:

many vendors out there, though, who do variable-size,

Speaker:

variable-block deduplication, which allows the block size to vary such that if

Speaker:

you do get an offset, right, because of some data change, it's still able to

Speaker:

dedupe everything else after that because

Speaker:

of how it's actually computing the chunks, the segments, right?

Speaker:

Each of

Speaker:

the blocks.
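
The difference between fixed and variable blocks shows up clearly when a single byte is inserted at the front of the data. The sketch below is a toy illustration under stated assumptions: it decides chunk boundaries from a CRC32 of the last 16 bytes, recomputed at every position for clarity, whereas production systems typically use an incrementally updated rolling hash. The window size, mask, and function names are all hypothetical.

```python
import hashlib
import random
import zlib

WINDOW = 16          # bytes of context that decide whether to cut (assumed)
MASK = (1 << 8) - 1  # cut when the low 8 bits are zero: ~256-byte chunks


def fixed_chunks(data, size=256):
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data):
    """Cut wherever the hash of the trailing window matches a pattern."""
    chunks, start = [], 0
    for i in range(WINDOW - 1, len(data)):
        window = data[i - WINDOW + 1:i + 1]
        if zlib.crc32(window) & MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last cut
    return chunks


def fingerprints(chunks):
    return {hashlib.sha256(chunk).hexdigest() for chunk in chunks}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
shifted = b"X" + original  # one inserted byte offsets everything after it

fixed_shared = len(fingerprints(fixed_chunks(original))
                   & fingerprints(fixed_chunks(shifted)))
cdc_original = content_defined_chunks(original)
cdc_shifted = content_defined_chunks(shifted)
cdc_shared = len(fingerprints(cdc_original) & fingerprints(cdc_shifted))

print(f"fixed-block chunks still matching: {fixed_shared}")
print(f"variable-block chunks still matching: {cdc_shared} of {len(cdc_original)}")
```

Because the cut points depend only on nearby content rather than on absolute offsets, the boundaries move along with the data, and everything after the first chunk lines up again.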

Speaker:

Yep.

Speaker:

So that's certainly an area where AI could potentially

Speaker:

help. Next: do you think it could help with recovery testing?

Speaker:

Oh yeah, I would.

Speaker:

So one thing I see is, like, most people probably don't

Speaker:

know how to write a DR plan,

Speaker:

Mm-hmm.

Speaker:

Mm-hmm.

Speaker:

right.

Speaker:

I wonder if you took AI, like even, and I'm going back to the first

Speaker:

set, right, the large language models,

Speaker:

Yep.

Speaker:

So the thing we said we

Speaker:

weren't talking about, I think we're gonna talk about it here.

Speaker:

Yeah.

Speaker:

I think at least to start with, it's like, Hey, here's all my data.

Speaker:

Here's my applications.

Speaker:

Help me build a DR test plan.

Speaker:

Yeah,

Speaker:

I like that idea.

Speaker:

And

Speaker:

see what it pops out because, and it may not be perfect, and don't just

Speaker:

blindly trust what it provides, but use it as a starting point, right?

Speaker:

And then go use that.

Speaker:

Because I think a lot of people struggle with, where do I even start?

Speaker:

Yeah.

Speaker:

And you could also, um, you could use it like a chaos monkey,

Speaker:

right?

Speaker:

You could use it.

Speaker:

Help me come up with some interesting scenarios.

Speaker:

Just to make the... You know, one of the things that we talked about in

Speaker:

terms of cyber testing,

Speaker:

when we had Mike on, was the idea of, like, doing this and

Speaker:

making it fun, making it a game. I like that idea a

Speaker:

lot and I think maybe AI could help

Speaker:

there.

Speaker:

Um,

Speaker:

If it helps you do recovery testing more often, and

Speaker:

helps you identify potential, I was gonna say plot

Speaker:

holes, potential holes in your program, then that

Speaker:

I think could be very

Speaker:

helpful.

Speaker:

And Curtis, since you threw out a term, Chaos Monkey is a tool that was released

Speaker:

by Netflix, and literally what it is used for is to just test IT resiliency.

Speaker:

So it'll go randomly kill services, kill locations, kill

Speaker:

network connections, just to see:

Speaker:

Is streaming interrupted? Are end users having any sort of

Speaker:

issues? And it's able to do this at scale and in an automated fashion

Speaker:

versus someone like trying to think about all the combinations,

Speaker:

permutations, and scenarios, because they're probably gonna miss things.

Speaker:

And so Netflix designed this thing to actually go out and

Speaker:

test their infrastructure.
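
The same random-failure idea can be borrowed for planning a recovery exercise. The sketch below is not Netflix's tool or its API; it is a hypothetical helper that only picks which components to pretend have failed, and it does not touch any real system. The inventory names are made up for illustration.

```python
import random


def pick_failures(inventory, count, seed=None):
    """Randomly choose components to 'fail' in a recovery exercise."""
    rng = random.Random(seed)
    candidates = [(kind, name)
                  for kind, names in inventory.items()
                  for name in names]
    return rng.sample(candidates, k=min(count, len(candidates)))


# Hypothetical inventory of things a DR test might take away.
inventory = {
    "service": ["backup-catalog", "media-server"],
    "location": ["primary-dc", "dr-site"],
    "network link": ["wan-to-dr"],
}
scenario = pick_failures(inventory, count=2, seed=7)
for kind, name in scenario:
    print(f"Simulate loss of {kind}: {name}")
```

Passing a seed makes a scenario repeatable, which helps when you want to rerun the same exercise after fixing whatever it exposed.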

Speaker:

It is pretty impressive.

Speaker:

Uh, you know, their infrastructure in general is pretty impressive.

Speaker:

It's not flawless.

Speaker:

I did watch part of the

Speaker:

Tyson fight a little while ago, and that was on Netflix

Speaker:

and it was not good, right?

Speaker:

That wasn't so much a resilience thing as it was...

Speaker:

They just, again, could have used perhaps a little bit better

Speaker:

AI to predict what kind of load they were gonna have.

Speaker:

But yeah.

Speaker:

But the idea of predicting crazy things that will happen, uh, Netflix

Speaker:

is pretty darn resilient, uh, when it comes to their infrastructure,

Speaker:

Yep.

Speaker:

yeah, I, I like that idea a lot.

Speaker:

And I think this is something that,

Speaker:

again, an LLM could actually help with, right?

Speaker:

So, like I said, the thing that we said we weren't gonna talk about,

Speaker:

we could talk about it, right?

Speaker:

And for those of you who've never used a ChatGPT or a Claude,

Speaker:

uh, I think it's very useful

Speaker:

here, right?

Speaker:

You could say, Hey, I'm this kind of company.

Speaker:

This is the type of company, you know, and I understand

Speaker:

the privacy concerns of what you

Speaker:

share with a ChatGPT or a Claude.

Speaker:

There are, by the way, on-prem versions

Speaker:

of these LLMs that you can run too, so that you can keep the

Speaker:

data to yourself.

Speaker:

But you have a conversation with it:

Speaker:

Here's the type of company I am, here's the type of computing environment I have.

Speaker:

What could go

Speaker:

wrong?

Speaker:

You know, what could I build a DR scenario

Speaker:

around?

Speaker:

Any final thoughts?

Speaker:

Can you think of any other areas where we could use AI in backup?

Speaker:

Not so much.

Speaker:

I think the one thing I do wanna call out though is AI is here to stay.

Speaker:

ML is here to stay.

Speaker:

Don't be afraid of it.

Speaker:

Use it,

Speaker:

right, in the right ways, and don't be afraid, and just start thinking about it.

Speaker:

Uh, the one other thing I will call out is as companies are starting

Speaker:

to dig into AI and ML for their own applications, production applications

Speaker:

and other things, as a backup admin, you need to start thinking

Speaker:

about how do I protect this, right?

Speaker:

How do I back it up?

Speaker:

How would I potentially restore it?

Speaker:

Because there's a lot of data, and training these models

Speaker:

is really, really expensive.

Speaker:

Mm.

Speaker:

And so you wanna make sure you have mechanisms to protect the models

Speaker:

that emerge from all of this training so you can restore them if needed.

Speaker:

So use backup to make AI more resilient while AI makes backup more

Speaker:

resilient.

Speaker:

I like that.

Speaker:

We'll call that a symbiosis.

Speaker:

I like that a lot.

Speaker:

My one final thought is that potentially you could use, again,

Speaker:

going back to the thing we said we weren't gonna talk about.

Speaker:

You could use LLMs to help select vendors, right?

Speaker:

You could say, Hey, here are all my requirements and here's all the

Speaker:

documents that they gave me, this 57-page response to my 10-page RFI.

Speaker:

Can you help me make sense of it?

Speaker:

And you could use that. Again, trust but

Speaker:

verify when using an LLM for

Speaker:

sure.

Speaker:

All right, well, thanks again, Prasanna, uh, for a good chat.

Speaker:

Thank you, Curtis.

Speaker:

And I am not gonna change how I hold a coffee mug.

Speaker:

I'm sorry.

Speaker:

I, I would expect no less.

Speaker:

And thanks to our listeners, uh, we'd be nothing without you.

Speaker:

That is a wrap.