You found The Backup Wrap-up, your go-to podcast for all things
Speaker:backup, recovery, and cyber recovery.
Speaker:In this episode, you'll hear the harrowing tale of what I'm
Speaker:calling the backup from hell.
Speaker:A project that started as a simple one-time 40 terabyte backup
Speaker:of two Synology boxes and turned into a 400 terabyte nightmare
Speaker:that took months to complete.
Speaker:We're talking hundreds of millions of files with one directory alone
Speaker:containing 99 million of them.
Speaker:I'll share how I dealt with failing tape drives, ridiculously slow
Speaker:backup speeds, and the ultimate solution that finally got the job done.
Speaker:If you've ever wondered what happens when everything that could go wrong
Speaker:with a backup actually goes wrong,
Speaker:this episode is for you. Plus, you'll learn some valuable lessons about what to check
Speaker:before starting a massive backup job.
Speaker:By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.
Speaker:Backup, and I've been passionate about backup and recovery for
Speaker:over 30 years, ever since
Speaker:I had to tell my boss that we had no backups of the production
Speaker:database that we just lost.
Speaker:I don't want that to happen to you, and that's why I do this show.
Speaker:On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.
Speaker:This is The Backup Wrap-up.
Speaker:Welcome to the show, and if I could ask you to just take one quick second
Speaker:and, uh, subscribe or follow us so you can make sure that you get all of this
Speaker:great content, that would be great.
Speaker:I'm W. Curtis Preston, AKA Mr.
Speaker:Backup, and I have with me a guy that apparently owes Ben Kingsley
Speaker:a huge apology, Prasanna Malaiyandi.
Speaker:How's it going?
Speaker:Prasanna, why do you owe
Speaker:an apology?
Speaker:so as everyone's probably like, who's Ben Kingsley.
Speaker:So if you don't know, he is an actor and he also played Gandhi in the movie Gandhi.
Speaker:He did.
Speaker:Right?
Speaker:And for the longest time I was a little, not upset, but like the fact that you have
Speaker:like probably one of the most important Indian people in history being played
Speaker:By a guy with the name Ben Kingsley.
Speaker:Exactly.
Speaker:Yeah.
Speaker:Ben Kingsley.
Speaker:And so today I found out that Ben Kingsley is actually Indian.
Speaker:Half
Speaker:How about that?
Speaker:should say.
Speaker:Yeah,
Speaker:what?
Speaker:he's Anglo Indian.
Speaker:Anglo Indian.
Speaker:Yes.
Speaker:It's like us.
Speaker:You and me we're Indian.
Speaker:so his paternal side is from Gujarat.
Speaker:Right.
Speaker:And his mom's side I think is European.
Speaker:His dad was a physician who was born in Kenya.
Speaker:And Ben Kingsley's name is not actually Ben Kingsley.
Speaker:It's like Krishna Bhanji, I think
Speaker:Yeah.
Speaker:Yeah.
Speaker:Yeah.
Speaker:And he realized that he wasn't getting called into the right casting
Speaker:roles when he was looking for, when he was starting off his career.
Speaker:So he's like, let me change my name.
Speaker:And so he changed his name to Ben Kingsley and people started calling
Speaker:him in and he started getting roles.
Speaker:Racism in early Hollywood? Say it isn't so.
Speaker:Racism in current Hollywood.
Speaker:Say it isn't so. He wouldn't be the only one to do so.
Speaker:Yeah.
Speaker:yeah, so I apologize to Sir Ben Kingsley, uh, for all these years.
Speaker:Yeah.
Speaker:You were putting it in the same category as the quote unquote
Speaker:Indian guy from the Short Circuit movie, which I don't know his name,
Speaker:but he is very much not an Indian
Speaker:person.
Speaker:Do you know who it was?
Speaker:the name?
Speaker:I'm looking it up.
Speaker:Or it's also like how Apu from, uh, the Simpsons is not Indian,
Speaker:Yeah, he's, he's played by, um.
Speaker:Oh, I know that.
Speaker:I know the actor, but his name is escaping me.
Speaker:So Fisher Stevens, is that
Speaker:Fisher Stevens.
Speaker:Yeah, Fisher Stevens.
Speaker:Who?
Speaker:Those of you that watch Succession
Speaker:will, uh, uh, Fisher Stevens was in Succession.
Speaker:He was, he was a, a lawyer, a smarmy lawyer. He
Speaker:always plays smarmy characters
Speaker:yeah, I was just thinking, because I remember him from The Blacklist
Speaker:where he plays Marvin, the lawyer.
Speaker:Yeah, he's got kind of the lawyer face.
Speaker:I'm glad that you, you finally realized the error of your ways.
Speaker:But did you know he was
Speaker:No, no, I didn't.
Speaker:I guess I always brought it up just like you, like I would bring up
Speaker:Ben Kingsley playing Gandhi, um, as just another example of, uh, you
Speaker:know, what would we call it, brownface, I guess we'd call it brownface.
Speaker:Yeah.
Speaker:Yeah.
Speaker:But people taking actor and there's been a lot of those great roles throughout the
Speaker:Great.
Speaker:You know, great roles played by very not,
Speaker:you know, people that are not of that ethnic group.
Speaker:Yeah.
Speaker:and I think maybe also at the time, right, there weren't many
Speaker:Indian actors in Hollywood at all.
Speaker:And I would rather have the fact, or I would rather it like the movie be made
Speaker:with someone who is non-Indian, rather, because it's a great movie.
Speaker:I
Speaker:don't know.
Speaker:You've seen it,
Speaker:good movie.
Speaker:Yeah.
Speaker:Yeah.
Speaker:So I would rather have that rather than not having the movie at all.
Speaker:Hmm, I see what you're saying.
Speaker:I see what you're saying.
Speaker:Yeah.
Speaker:And of course, you know, we have the same challenge with, uh, Asian, uh, actors,
Speaker:right?
Speaker:Uh, there's literally only three Chinese actors in all of Hollywood.
Speaker:Like if you, if you look at like the Chinese roles, they've gone to
Speaker:literally like one, there's one guy.
Speaker:Uh, I forgot how many roles he's had, but he has had a prolific career playing
Speaker:every Chinese person that you know.
Speaker:Um, but, um, anyway, so we're gonna talk about something that we've
Speaker:alluded to a little bit on the podcast.
Speaker:Uh, sort of tell the final saga of what I'm calling the backup from Hell.
Speaker:I may maybe, uh, we should probably phrase that slightly differently.
Speaker:It's probably the,
Speaker:the backup that keeps giving.
Speaker:the back, the backup that, yeah.
Speaker:Uh, what a mess.
Speaker:The beginning of the story is
Speaker:that I was asked to do a backup of two Synology boxes that they
Speaker:were, uh, repurposing, right?
Speaker:So they were, um, going to move the data.
Speaker:They, they were gonna reuse these servers, but they wanted to get a backup of the, of
Speaker:the, the data before they moved it off of
Speaker:Backup is good.
Speaker:Yeah,
Speaker:Backup is good.
Speaker:Yeah.
Speaker:Apparently they hadn't had a backup of the, of these servers before.
Speaker:And, um, they said it was
Speaker:about 40 terabytes of data.
Speaker:That's the information that I was given, and after I had started doing
Speaker:the backup, I very quickly realized that 40 terabytes might have been
Speaker:an understatement.
Speaker:You, found additional data around
Speaker:right as you
Speaker:data.
Speaker:Yeah.
Speaker:Uh, so it turned out that it wasn't like 40 terabytes of data.
Speaker:It was more like 400 terabytes of
Speaker:Yeah, and
Speaker:I'm guessing because these were systems that were kind of probably off on the
Speaker:side, they hadn't been used in a while.
Speaker:Like that's, I think, the problem, and I think we talked about this in one of
Speaker:our episodes about sort of systems that kind of get stored away in the corner.
Speaker:No one worries about
Speaker:it.
Speaker:Right?
Speaker:And do you leave it powered on your old backup systems?
Speaker:Right.
Speaker:We just talked about that.
Speaker:And so I think that becomes a challenge.
Speaker:It's when you have these systems that are no longer actively being
Speaker:used, it kind of gets away from you.
Speaker:Yeah.
Speaker:Yeah.
Speaker:And so the customer really didn't have any idea just how much data they
Speaker:were dealing with here, which turned out to be, like I said, close to half a petabyte of
Speaker:Yeah.
Speaker:And for you, that changes things significantly, because it
Speaker:changes the backup design, like, massively.
Speaker:Yeah.
Speaker:Yeah.
Speaker:because your backup target, I think you had mentioned previously
Speaker:that it was like a server, right?
Speaker:That you were backing this data up to
Speaker:I was backing it up via a server, a Windows server.
Speaker:And, um, and tape, right?
Speaker:I had tape, but it was sized, um, you know, for 40 terabytes.
Speaker:And so, basically, the server and the tape
Speaker:library were the perfect size for that.
Speaker:But then I started realizing that it was filling up.
Speaker:And again, this, this is my fault for not really looking at the size of the
Speaker:data before really jumping in there, but basically I realized very quickly
Speaker:that this was a whole lot more data
Speaker:than,
Speaker:than,
Speaker:Than you expected. And I think, just kind of looking at lessons
Speaker:learned, if you're a backup admin who is being told, hey, this new
Speaker:application is coming online.
Speaker:Make sure that you understand like what is the expected growth of that application.
Speaker:Because what you size for, say, a five terabyte database with a 1% growth
Speaker:is very different than like a file server with like a 50% growth rate.
Speaker:Yeah, exactly.
Speaker:Um, and just because somebody says they have 10 terabytes of data doesn't mean
Speaker:that they have 10 terabytes of data.
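To put rough numbers on the sizing point above, here is a minimal sketch of compound growth. The five terabyte starting size and the 1% and 50% rates come from the conversation. Treating the rates as annual, using a three-year horizon, and starting the file server at the same five terabytes are assumptions made only for illustration.

```python
def projected_size_tb(initial_tb: float, annual_growth_rate: float, years: int) -> float:
    """Project a data set's size assuming steady compound growth per year."""
    return initial_tb * (1 + annual_growth_rate) ** years


# Starting size and rates are from the conversation; the horizon is an assumption.
START_TB = 5
YEARS = 3

for label, rate in (("database at 1%", 0.01), ("file server at 50%", 0.50)):
    size = projected_size_tb(START_TB, rate, YEARS)
    print(f"{label}: {size:.2f} TB after {YEARS} years")
```

Under those assumptions the database barely moves, while the file server more than triples, so the same backup target would not fit both.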
Speaker:So you mentioned you had a backup server, you had a tape drive.
Speaker:Is there a reason you chose to use tape
Speaker:Well, the, I mean, tape is great for long-term retention of data,
Speaker:which is what this customer wanted.
Speaker:They wanted to hold onto this data for a long period of time,
Speaker:and that's where tape is great.
Speaker:And tape also, uh, you know, if you're able to properly feed it,
Speaker:tape can actually be quite fast.
Speaker:The challenge that I had when backing up this data was that, for various reasons,
Speaker:and I think by the end I sort of figured out the
Speaker:core reason,
Speaker:individual backups off of these filers were just
Speaker:slow, just, um, you know, they were
Speaker:Like how slow is slow?
Speaker:Like, slow was like three and a half kilobytes a second slow.
Speaker:So like slower than a 56K modem back
Speaker:Yeah.
Speaker:Right.
Speaker:And you can multiplex all you want.
Speaker:So first off, you know, I, I was using NetBackup, which, you know, NetBackup, it
Speaker:did a great job at what we had available.
Speaker:Um, the challenge was that I couldn't put
Speaker:the client on the filers themselves.
Speaker:So there was a way, allegedly, to put a backup client on the filer,
Speaker:but I could never get that to work.
Speaker:And so I had to back up over SMB. Because I'm backing up
Speaker:over SMB, I'm just,
Speaker:I'm just limited by what that was, right?
Speaker:What,
Speaker:I could get, and because I'm backing up over SMB, the client
Speaker:is just the backup server,
Speaker:right?
Speaker:So instead of running a backup from two clients, I'm running a backup from one
Speaker:client because that's the backup server.
Speaker:I'm backing it up over SMB.
Speaker:And because of that, I'm limited to the number of jobs I can run at one time.
Speaker:NetBackup, um, says 99 jobs, which, you'd say, gee, that
Speaker:sounds like a
Speaker:Nine problems.
Speaker:right?
Speaker:But, but the thing is, towards the end, as I was running a lot of these backups,
Speaker:the aggregate speed of like 99 backups was only like 30, 40 megabytes a second,
Speaker:you
Speaker:you're talking about 400 terabytes of data to
Speaker:400 terabytes of data. Doing the math,
Speaker:I backed up for months,
Speaker:right?
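The math mentioned above can be sketched in a few lines. The 400 terabytes and the 30 to 40 megabytes per second come from the conversation; decimal units and a perfectly steady rate with no restarts are assumptions.

```python
def days_to_transfer(total_bytes: float, bytes_per_second: float) -> float:
    """Return the days needed to move total_bytes at a steady rate."""
    seconds = total_bytes / bytes_per_second
    return seconds / 86_400  # 86,400 seconds in a day


TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)

# Aggregate rates quoted in the conversation for roughly 99 concurrent jobs.
for rate_mb in (30, 40):
    days = days_to_transfer(400 * TB, rate_mb * MB)
    print(f"{rate_mb} MB/s: about {days:.0f} days")
```

Under those assumptions the answer lands at roughly four to five months of continuous running, before counting any failed jobs or retries.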
Speaker:And I tried all these different things.
Speaker:Uh, you know, num, you know, was I running too many backups at a time?
Speaker:Was I running not enough backups at a time?
Speaker:You know, it, um, you know, and then the problem is every, every
Speaker:test would take days or weeks.
Speaker:I think we should mention one thing.
Speaker:You were talking about these tests taking days or
Speaker:mm-hmm.
Speaker:and then do you wanna mention sort of some of the issues you ran into with these long
Speaker:running jobs just due to infrastructure or
Speaker:Yeah.
Speaker:other issues in the environment?
Speaker:Yeah, you know, backups are not made to run over weeks or months.
Speaker:Backup infrastructure just isn't made to work like that.
Speaker:And so when you do backups over weeks or months,
Speaker:weird things happen that cause, you know, consternation. One of the things
Speaker:is LTO tape drives are great, but we were using the half-height LTO
Speaker:drives and as far as I could tell, their duty cycle was not meant to
Speaker:be a hundred percent for two months.
Speaker:Right.
Speaker:Um, they're meant to be backed up to for, you know, several hours and then give
Speaker:'em a rest and then back up several hours and then give 'em a rest.
Speaker:I was just beating the crap outta these things for weeks or months at a time.
Speaker:And what would happen is after some significant period of time,
Speaker:it would just go write error.
Speaker:And that's fine when a backup runs for a few hours and then just try again.
Speaker:But if you, but if it took you two weeks or three weeks to get to that point
Speaker:and then you get a write error, um,
Speaker:then
Speaker:it's not like you could restart these jobs either, right?
Speaker:I think you're running into
Speaker:Yeah.
Speaker:Well,
Speaker:I mean, I could restart 'em, but it's like after
Speaker:a period of time,
Speaker:I eventually got to the point where I said, tape is not my friend.
Speaker:I, anybody who
Speaker:this is coming from Mr.
Speaker:Backup.
Speaker:know anybody who listens to this podcast knows that I am, I am a friend of tape,
Speaker:right?
Speaker:I believe strongly in tape for a lot of reasons, but I don't think that, uh,
Speaker:specific and, and you know, maybe the, my LTO friends can chime in here, but I don't
Speaker:think that these tape drives were designed to be backed up to like this for weeks and
Speaker:months at a time, 24 7 with no, because as soon as one, I was multiplexing
Speaker:as many backups together as I could.
Speaker:And when one backup would finish, I would just add another backup onto it, right?
Speaker:Because
Speaker:I, I could, I could, I.
Speaker:what I couldn't do is I couldn't say, well, let's do these 10 backups, let
Speaker:them run until they're finished, and then we'll do the next 10 backups.
Speaker:And that would've given the tape drives a, a moment to breathe, I think.
Speaker:But, uh, I couldn't do that because the, because we, we just
Speaker:didn't have that kind of time.
Speaker:And so I
Speaker:was just, I was just try, you know, tagging it
Speaker:and, and I know you've always talked about like the shoe shining problem,
Speaker:given that you're not going very fast with these backups, right.
Speaker:Do you think that also led to some issues as well for the tape drives?
Speaker:yeah.
Speaker:So again, the core problem was that each individual backup was running slow.
Speaker:No matter how many of them I multiplexed together, it was not enough
Speaker:speed to make the tape drive happy.
Speaker:And so, yes, the tape drive was shoe-shining.
Speaker:And when a tape drive is continually shoe-shining, the tape drive will fail.
Speaker:And so everything I remember learning about tape drives was
Speaker:coming back to haunt me, right?
Speaker:Um, this is all of the design that I was, that I had done throughout
Speaker:the years on backup, um, you know,
Speaker:um, backup system
Speaker:And system.
Speaker:all of the things that, you know, what do you do when the backups, you know?
Speaker:And so I came to understand
Speaker:that the only way I was gonna finish this backup was to do it to disk.
Speaker:And just quickly before you move on, I think along the way, didn't
Speaker:you also have a tape drive that failed that you then had to go
Speaker:Oh, multiple times.
Speaker:Swap out tape drives, reboot tape drives, put cleaning tapes in tape drives.
Speaker:And by the way, that's another thing. The way tape drives normally work
Speaker:is you run them for a certain number of hours and then there's a cleaning
Speaker:tape that goes in there and cleans it.
Speaker:And when you have a robotic library, that happens automatically.
Speaker:Well, when you just run the tape drive for
Speaker:two months, you know, that
Speaker:And so at some point the tape drive just fails.
Speaker:Yeah.
Speaker:um, yeah.
Speaker:And so I ultimately realized that the only way to get this done was to, um, you know,
Speaker:buy, uh, enough disk to back this up.
Speaker:And that wasn't cheap.
Speaker:Uh, but I didn't think that there was any other way that this was ever
Speaker:going to get done, 'cause again, the core problem that we've had with tape
Speaker:for the last three decades has been that if the backup isn't
Speaker:fast enough for the tape drive, it's a fundamental mismatch,
Speaker:right?
Speaker:And so we use multiplexing to make that better.
Speaker:But if the speed you're dealing with is in kilobytes a second,
Speaker:Yeah.
Speaker:Well, and especially 'cause you're limited by those two, uh, Synology boxes, right?
Speaker:Which are limiting your bandwidth, right?
Speaker:It's not like
Speaker:Yeah.
Speaker:Synology boxes you can then pull from,
Speaker:Yeah, and I was watching, like, I was running every kind of tool I could
Speaker:run to see, like, I wasn't overtaxing.
Speaker:That was the really weird part. It's not like the
Speaker:Synology boxes were saying, you're really beating the crap out of it.
Speaker:You shouldn't do so many
Speaker:backups at a time.
Speaker:It wasn't. I didn't have a high I/O wait.
Speaker:I didn't have high CPU, I didn't have high RAM.
Speaker:There was no
Speaker:rhyme or reason as to why. We'll get to the rhyme or reason later.
Speaker:I figured it out.
Speaker:Um, but I knew that tape wasn't gonna work.
Speaker:So I had to bring in, uh, a couple of other Synology disk arrays, by the
Speaker:way, and populate them with enough disk to handle all of this, uh, this backup.
Speaker:Right.
Speaker:Yeah,
Speaker:And, um.
Speaker:Then
Speaker:but that wasn't without its issues either.
Speaker:Right?
Speaker:When you, when you brought those in, that wasn't without its issues either.
Speaker:No, it wasn't without issues.
Speaker:And the other thing, what I needed to do was to, I felt that with, in terms of the
Speaker:number of directories that were remaining, I wasn't sure like the different sizes.
Speaker:So what I did was I split
Speaker:those jobs into many smaller jobs.
Speaker:NetBackup is really good at like running thousands of jobs, right?
Speaker:So rather than just have a hundred jobs, I turned that into like 2,400 jobs.
Speaker:Like I went,
Speaker:I went another level deep and created a policy for each of these
Speaker:directories, and then I ran those and it was running for a while.
Speaker:It was, it was, you know, again, more time.
Speaker:And what I started seeing
Speaker:were these jobs, like an individual job that was running for an
Speaker:inordinate amount of time.
Speaker:but you also had some jobs that would finish like super fast, right?
Speaker:Like
Speaker:They'd finish five, they'd finish in
Speaker:Some of 'em, some of 'em finished in five minutes, some 'em would finish.
Speaker:But I noticed that over time there were certain policies that were running for
Speaker:really, really long periods of time, and I eventually started poking around.
Speaker:That's when I discovered what ultimately was the true culprit.
Speaker:And, uh, anyone who's been around backup for a long time
Speaker:has seen this culprit before.
Speaker:It's just, this is the worst example of this culprit that I've ever seen.
Speaker:And what is that?
Speaker:We affectionately refer to it as the million file problem.
Speaker:Hmm.
Speaker:Because remember, again, going back to that, um, that client back from
Speaker:25 years ago, we had one server.
Speaker:That was going to be storing a bunch of images and it was going
Speaker:to result in millions of files.
Speaker:And we knew back then that the million file problem is a real problem.
Speaker:And the million file problem over the network is even worse, right?
Speaker:Because everything is a
Speaker:round trip.
Speaker:The way we fixed it back then was we used a product called
Speaker:FlashBackup, which would back up at the raw level, but store the
Speaker:information, and that was not available to me.
Speaker:Why?
Speaker:Because that product no longer exists
Speaker:No.
Speaker:because it doesn't run on a Synology box.
Speaker:Right.
Speaker:Remember, I'm not the Synology
Speaker:All it was was an SMB mount to me.
Speaker:Right?
Speaker:And by the way, for those curious, yes, I tested SMB, I tested NFS.
Speaker:It didn't matter.
Speaker:It didn't matter.
Speaker:Um, the um.
Speaker:And
Speaker:by the way, this was a constant, you know, you know the phrase, never, never
Speaker:go into battle with an untested weapon.
Speaker:This was a constant example of I am in the battle, I'm in the stuff,
Speaker:and now I'm trying to test stuff,
Speaker:and everything I did to try to make things better just made it take longer
Speaker:and the client just had to wait.
Speaker:And the client was incredibly patient, honestly.
Speaker:And, and you know, I did my best to say, look, I, I've been doing this for 30
Speaker:years, I've never seen anything like this.
Speaker:Right.
Speaker:And that, that helped.
Speaker:But in the end, I was backing up.
Speaker:You know, we got down to, I, I learned a way to identify which
Speaker:were the problem directories.
Speaker:So I would kick off a policy and I would watch, and I would notice
Speaker:that it had run for, let's say, an hour.
Speaker:And it listed, let's say, 300,000 files backed up.
Speaker:No kilobytes.
Speaker:Hmm.
Speaker:Literally, there's a kilobytes column,
Speaker:and there's no value in there.
Speaker:We backed up 300,000 files, no kilobytes.
Speaker:so that, that helped me identify these problem
Speaker:Problem child.
Speaker:Yeah.
Speaker:it and let the other non-problem policies finish.
Speaker:And
Speaker:Right.
Speaker:Yeah.
Speaker:up getting down to like 150 policies that were the problem policies.
Speaker:And so I backed them up and I was able to get them.
Speaker:Over time, I was able to get them backed up, and then finally I got down to about
Speaker:20 policies, I think somewhere around
Speaker:policies.
Speaker:Go ahead.
Speaker:And at this point when you're down to the 20, like some of these have
Speaker:been running for a long time, right?
Speaker:Like how long?
Speaker:Like two months. Backups that have been running for two months,
Speaker:successfully running for two months.
Speaker:Yeah.
Speaker:And what was good was at this point again.
Speaker:Like this is information that would've been really helpful to have at the
Speaker:beginning, but it was information that, to get all this information at the
Speaker:beginning, it would've taken time to, like we, we just wanted to get started.
Speaker:Yeah.
Speaker:What I ended up finding was that, um, in these backups
Speaker:there were millions and millions and millions of files. Like one
Speaker:of the directories that I was backing up had 99 million files in it.
Speaker:One directory, 99 million files. And eventually what I realized was that,
Speaker:again, the problem this time was just SMB.
Speaker:So the fact that every one of these files results in a round
Speaker:trip conversation, possibly multiple round trip conversations.
Speaker:Yep.
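The round-trip point above can be sketched with simple arithmetic. The 99 million files are from the conversation. The number of round trips per file and the latency of each one are hypothetical values chosen only to show how the cost scales, and the sketch assumes requests are issued one after another with no data transfer time at all.

```python
def latency_bound_days(file_count: int, round_trips_per_file: int,
                       round_trip_seconds: float) -> float:
    """Days spent purely waiting on per-file round trips, issued serially."""
    total_seconds = file_count * round_trips_per_file * round_trip_seconds
    return total_seconds / 86_400


FILES = 99_000_000  # from the conversation

# (round trips per file, latency in milliseconds): hypothetical values.
for round_trips, latency_ms in ((3, 0.5), (5, 1.0), (10, 2.0)):
    days = latency_bound_days(FILES, round_trips, latency_ms / 1000)
    print(f"{round_trips} trips at {latency_ms} ms each: {days:.1f} days of waiting")
```

Whatever the real values were, the waiting time grows with the number of files rather than the number of bytes, which is consistent with jobs that report hundreds of thousands of files and almost no data moved.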
Speaker:And I realized that the only way I was gonna back up these truly problem
Speaker:directories was to back them up locally.
Speaker:But how do I back them up locally?
Speaker:Well, luckily this is when I just, you know, basically go back
Speaker:to dumb, dumb old backup tools.
Speaker:And so I was able to run a backup using tar, logged in locally
Speaker:on the filers, and then just
Speaker:directing the tarball across the network. That finally worked.
Speaker:That's crazy.
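The idea described above, reading the files locally and sending one continuous archive stream across the network, can be sketched with Python's standard `tarfile` module. This is a stand-in for illustration, not the tar command that was actually run; the function name `stream_directory_as_tar` is hypothetical, and an in-memory buffer takes the place of the network pipe.

```python
import io
import os
import tarfile
import tempfile


def stream_directory_as_tar(directory: str, out_stream) -> int:
    """Write every file under directory to out_stream as one tar stream.

    Returns the number of file entries written.
    """
    count = 0
    # Mode "w|" writes a sequential, non-seekable stream, which is what a
    # pipe or socket needs. The files are read locally, so the network only
    # ever sees one long stream instead of a conversation per file.
    with tarfile.open(fileobj=out_stream, mode="w|") as archive:
        for root, _dirs, names in os.walk(directory):
            for name in names:
                path = os.path.join(root, name)
                archive.add(path, arcname=os.path.relpath(path, directory),
                            recursive=False)
                count += 1
    return count


if __name__ == "__main__":
    # Small self-contained demonstration with throwaway files.
    with tempfile.TemporaryDirectory() as source:
        for index in range(3):
            with open(os.path.join(source, f"file{index}.txt"), "w") as handle:
                handle.write("sample data")
        buffer = io.BytesIO()  # stands in for the pipe to the other machine
        written = stream_directory_as_tar(source, buffer)
        print(f"streamed {written} files in {buffer.tell()} bytes")
```

In a real run the output stream would be something like standard output piped to the receiving machine, so the per-file work happens on the box that owns the disks.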
Speaker:So you had these 20 jobs, right?
Speaker:And some of them you said were running for 60 plus days, and then you sort of
Speaker:were like, okay, let me start this over.
Speaker:And by the way, you were kind of forced to start them over
Speaker:because something happened right?
Speaker:At
Speaker:yeah.
Speaker:Something some unknown thing.
Speaker:Um, I think I.
Speaker:I, I, I don't know.
Speaker:I, I actually don't know
Speaker:what caused it, but they, they did fail
Speaker:and,
Speaker:And you were like, I'm not gonna start these
Speaker:yeah.
Speaker:I'm not gonna start 'em again.
Speaker:It's just, yeah.
Speaker:Well, because,
Speaker:like, one of the jobs, the one with 99 million
Speaker:files, we were nowhere near.
Speaker:I.
Speaker:yeah.
Speaker:After 60 days you were barely
Speaker:yeah, yeah.
Speaker:We're barely, barely scratching the surface.
Speaker:so I'm like, I, I, I don't have, I don't have that, you know, I, I don't
Speaker:have the amount of time that it would take, so, so I switched to, you know,
Speaker:experimentally once again, experimentally, I'm experimenting on the fly, I'm
Speaker:doing development in production.
Speaker:Uh, I was like, well, let me see how quick a tarball would run.
Speaker:I ran a tarball,
Speaker:I remember, for like a day. You remember this?
Speaker:I ran it for
Speaker:a day, and I had a du of the size of the directory, and after a day it had
Speaker:done like a half of it or something.
Speaker:Yeah.
Speaker:You're like, what?
Speaker:One's taking 66 days and barely scratched the
Speaker:yeah,
Speaker:You are mainly done.
Speaker:Almost done within a day.
Speaker:yeah.
Speaker:And so I was like, this is the way.
Speaker:Right.
Speaker:So it wasn't the way for everything, because,
Speaker:you know, I'm glad that I used NetBackup for the
Speaker:bulk of it, because then I have the catalog data and, you know, and, um,
Speaker:but
Speaker:on the restore side.
Speaker:yeah, yeah.
Speaker:So this will,
Speaker:the restores will be more difficult for these
Speaker:remaining 20 directories.
Speaker:I mean, not astronomically so.
Speaker:Like,
Speaker:you know, I can create a
Speaker:list of what's in the tarball.
Speaker:So, you know, lessons learned, like,
Speaker:don't do that.
Speaker:Don't store millions of files on the other side of an SMB box.
Speaker:I guess
Speaker:Yeah, so Well, and I think a couple things, even if it's not SMB, right?
Speaker:Just having that many files, because I think what people don't realize is
Speaker:even though the size of every disk has gotten significantly larger, right?
Speaker:You're talking like 18 terabyte, 20 terabyte disks
Speaker:Yeah.
Speaker:They can only handle so many operations per disk, right?
Speaker:That number hasn't changed.
Speaker:It's about a hundred per second.
Speaker:And so no matter how big your disk is, right?
Speaker:If it was 20 one-terabyte disks, right, then you get 20 times a hundred IOPS.
Speaker:Versus if it's one 20 terabyte disk, you still only get that hundred.
Speaker:So that's a big thing that people don't realize with these larger size disks.
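The comparison above works out as follows. The figure of about a hundred operations per second per disk and the two layouts are from the conversation. Applying them to the 99 million file directory, at one random operation per file, is an assumption added only to make the difference concrete.

```python
def aggregate_iops(disk_count: int, iops_per_disk: int = 100) -> int:
    """Total random operations per second across independent disks."""
    return disk_count * iops_per_disk


def days_to_touch_files(file_count: int, total_iops: int,
                        ios_per_file: int = 1) -> float:
    """Days needed just to issue the operations (ios_per_file is an assumption)."""
    return file_count * ios_per_file / total_iops / 86_400


many_small = aggregate_iops(20)  # twenty 1 TB disks
one_large = aggregate_iops(1)    # one 20 TB disk

print(f"20 x 1 TB: {many_small} IOPS, 1 x 20 TB: {one_large} IOPS")
for label, iops in (("20 x 1 TB", many_small), ("1 x 20 TB", one_large)):
    print(f"{label}: {days_to_touch_files(99_000_000, iops):.1f} days for 99 million files")
```

Same capacity, a twentyfold difference in how quickly a huge number of small files can be touched.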
Speaker:Yeah.
Speaker:And the thing was that, with
Speaker:that many files,
Speaker:ultimately the problem wasn't disk I/O, the problem
Speaker:was network I/O.
Speaker:Right?
Speaker:Network latency.
Speaker:So, because
Speaker:when I actually ran, I ran two tarballs
Speaker:simultaneously,
Speaker:is what I did.
Speaker:I using
Speaker:I just, I ran, I was always running two at a time.
Speaker:When I was running two at a time, I/O wait was sitting at 10,
Speaker:which is, is high,
Speaker:but I was like, well, it's got nothing else going on, so I'm gonna let
Speaker:it go.
Speaker:Right?
Speaker:The highest I/O wait I saw during all of those hundreds of
Speaker:simultaneous backups was like four.
Speaker:yeah,
Speaker:So like I wasn't disk bound.
Speaker:I was network
Speaker:bound, but not network bound in terms of throughput; network bound in terms of
Speaker:latency,
Speaker:and
Speaker:number of operations, just because SMB is very chatty.
Speaker:very chatty.
Speaker:It's probably the chattiest of the protocols,
Speaker:and
Speaker:we, you
Speaker:it was just a really bad combination.
Speaker:Yeah.
Speaker:And you know, this is why backup vendors have their own protocols,
Speaker:like Data Domain has DD Boost, right?
Speaker:To help alleviate and solve some of these issues.
Speaker:Yeah.
Speaker:You talked about, don't, don't do the somewhere we were talking about.
Speaker:Just don't do this.
Speaker:I, I'd like, I'd like to talk today.
Speaker:When I looked at these, these, uh, these directories that had these
Speaker:tens of millions of files, it was a structure that was very clearly
Speaker:created by some application.
Speaker:Every one of these directories had a common structure created by some,
Speaker:I'm gonna say, stupid application that thought this was perfectly fine.
Speaker:That it was perfectly fine to create 99 million files for
Speaker:Do you know, I,
Speaker:item.
Speaker:I bet they were using the file system as a database
Speaker:I don't know.
Speaker:what it was.
Speaker:given just like the number of files and the size of those files.
Speaker:I know it was forensic type information
Speaker:and I, I don't, I clearly
Speaker:That, that's fine.
Speaker:Yeah, yeah,
Speaker:No, I'm just saying I clearly don't know enough about forensic stuff
Speaker:to know why they would want tens of
Speaker:millions of files,
Speaker:but
Speaker:So where are you?
Speaker:So you talked about these 20 jobs that you were starting to do tarballs with.
Speaker:So where are you right now?
Speaker:So we finished all of them but one. There was one that for some reason,
Speaker:the file didn't look right.
Speaker:It was weird.
Speaker:Um, the backup completed, but for some reason the tarball
Speaker:just didn't look right.
Speaker:I don't wanna go into details.
Speaker:It just didn't look
Speaker:so I'm rerunning that one.
Speaker:So it, based on its size and how well it's doing, it should
Speaker:finish in about a day or so.
Speaker:Um, and what I'm
Speaker:is a significant improvement in terms of
Speaker:A significant improvement a day versus, you know, a year, um,
Speaker:Or two, I think actually it might have been two.
Speaker:Yeah,
Speaker:Agreed.
Speaker:Um, and what I'm doing, because again, I don't have the catalog,
Speaker:what I'm currently running is a tar tvf
Speaker:on all of those files and creating, I'm sorry, not tarballs, text
Speaker:files, a list
Speaker:of the files that are in there.
Speaker:And then I'm gonna do a count on the files that are in there and
Speaker:check it against the count of the files that are in the directory.
Speaker:And, and hopefully those numbers should be the same.
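The check described above, listing what is in each tarball and comparing the count against the source directory, can be sketched in Python as a stand-in for the tar listing and the directory count. The function names are hypothetical. The sketch counts only regular files, so hard links, symbolic links, and special files would need extra handling before the two numbers could be expected to agree.

```python
import os
import tarfile
import tempfile


def count_tar_files(tar_path: str) -> int:
    """Count regular-file entries by streaming through the archive once."""
    # "r|*" reads sequentially without seeking and detects compression.
    with tarfile.open(tar_path, mode="r|*") as archive:
        return sum(1 for member in archive if member.isfile())


def count_directory_files(directory: str) -> int:
    """Count regular files under directory, skipping symbolic links."""
    total = 0
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += 1
    return total


def counts_match(tar_path: str, directory: str) -> bool:
    """True when the archive holds as many files as the source directory."""
    return count_tar_files(tar_path) == count_directory_files(directory)


if __name__ == "__main__":
    # Throwaway demonstration: build a tiny tree, archive it, compare counts.
    with tempfile.TemporaryDirectory() as work:
        source = os.path.join(work, "source")
        os.makedirs(os.path.join(source, "sub"))
        for rel in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
            with open(os.path.join(source, rel), "w") as handle:
                handle.write("sample")
        tar_path = os.path.join(work, "source.tar")
        with tarfile.open(tar_path, mode="w") as archive:
            archive.add(source, arcname=".")
        print("counts match:", counts_match(tar_path, source))
```

A matching count does not prove the contents are intact, but a mismatch is a cheap early warning that a tarball is incomplete.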
Speaker:Yeah, because I believe you were even saying that to run things
Speaker:like a find to get a list of all the files in a directory, or a du,
Speaker:Yeah.
Speaker:hours, right?
Speaker:Well, it was days actually.
Speaker:In
Speaker:fact, it was why I didn't have this information in the beginning
Speaker:because everything was so big and every find, every du, every command
Speaker:that I had... du is quicker than find.
Speaker:du is.
Speaker:It just does less work than find.
Speaker:But the problem that I ultimately realized was that du wasn't
Speaker:really being helpful in terms of.
Speaker:The
Speaker:scope of the job, what was the scope of the job was determined
Speaker:by the number of these files.
Speaker:And I couldn't get those numbers because that was the thing that took forever.
Speaker:the number of jobs dwindled down to about 20, that's when I
Speaker:was able to run these, uh, the
Speaker:and they would, they would actually complete.
Speaker:And that's when I realized just how bad it was.
Speaker:so if you had to start this over, and hopefully you never do, but I'm just
Speaker:saying, if you had to go back to day one, what would you do differently?
Speaker:I know you talked about making sure you understand the size of your backups.
Speaker:Right.
Speaker:It just feels like some of these, you just have to go through the process
Speaker:though because you don't know what to do.
Speaker:Like it's not like you could just start day one and be like,
Speaker:oh, I know I need to go to disk.
Speaker:I need to do X, Y, and Z.
Speaker:Right?
Speaker:It's sort of like a learning process.
Speaker:would say that I.
Speaker:Yeah, because the problem is you're going off into the unknown,
Speaker:you're doing a backup of something that you don't know what it is.
Speaker:And I, I would say if possible, if at all possible, get things like
Speaker:du's, uh, you know, disk usage. It's a Unix command, but you can load those
Speaker:tools on Windows as well. Get, like, if you're going to back up, if you're
Speaker:gonna back up a hundred directories.
Speaker:Get a du of every one of those directories so that you have an idea
Speaker:of just what you're dealing with,
Speaker:if at all possible.
Speaker:Also, look and see the number of files. And if you're
Speaker:trying to do a, you know, it's not that hard, you just run a find . -,
Speaker:you know, I didn't even do a -print, just find . piped to wc -l, right?
Speaker:That was it.
Speaker:Right?
Speaker:Um, to, to get the number of files.
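A rough Python equivalent of the pre-flight survey described here, as a sketch rather than the commands actually used on the show. For each directory you plan to back up, it reports a file count and a byte total. Two assumptions to note: it adds up apparent file sizes, whereas du reports disk blocks used, and it counts only regular files, whereas find . piped to wc -l also counts directories. The function names are made up for illustration.

```python
import os


def survey(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    file_count = 0
    total_bytes = 0
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    file_count += 1
                    # Apparent size in bytes, not blocks on disk.
                    total_bytes += entry.stat(follow_symlinks=False).st_size
    return file_count, total_bytes


def survey_top_level(parent):
    """Survey each immediate subdirectory of parent separately."""
    results = {}
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                results[entry.name] = survey(entry.path)
    return results
```

Sorting the result by file count rather than by bytes would surface the kind of directory that caused the trouble in this story, since the number of files, not the volume of data, determined how long the job took.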
Speaker:I'd say if again.
Speaker:If I could go back in time, I, I would say maybe do a little bit more of this
Speaker:research prior to beginning the job.
Speaker:Um, but that's diff it's, it's easy to say that now,
Speaker:um, because I know what
Speaker:I know.
Speaker:Right.
Speaker:Um, but the, you know, the core problem was that you've
Speaker:got these millions of files.
Speaker:I mean, which is
Speaker:already gonna be a problem if you're backing it up in any sort of normal way.
Speaker:But if you're backing it
Speaker:up remotely over the network, it's going to kill you.
Speaker:Yeah.
Speaker:So, um, you gotta figure out a way to do that.
Speaker:And then I would just say, see if there's anything that you can do with the, with
Speaker:the application that's created this data
Speaker:which is why it's important to get involved early on, right when an
Speaker:application is being developed or deployed, right, to get involved so
Speaker:they understand the backup requirements.
Speaker:yeah.
Speaker:And so, this backup that would never finish, I literally was, I
Speaker:was starting to think that this thing was never gonna finish.
Speaker:Um.
Speaker:It's essentially finally, I mean, it's not, at this point, it's
Speaker:not a hundred percent, but I'm, I'm now, you know, it's just, I'm
Speaker:at the finish line.
Speaker:Yeah.
Speaker:at the finish line.
Speaker:Yeah.
Speaker:Um, it's nice.
Speaker:I know one of the other things you mentioned that you were using
Speaker:NetBackup, but you had also looked at other tools out there as well, right?
Speaker:That could potentially help you with this effort.
Speaker:Right.
Speaker:So do you think that that becomes valuable, like either looking at other
Speaker:tools, um, I know you had reached out to, like, Synology support, you
Speaker:had reached out to some experts, like
Speaker:Yeah.
Speaker:Yeah.
Speaker:The problem there, there were, there were, you could do, like with Synology,
Speaker:you can like copy the data from A to B.
Speaker:Mm-Hmm.
Speaker:They have this ability essentially like, you know, for lack of a
Speaker:better word, they have SnapMirror.
Speaker:They have the equivalent of SnapMirror
Speaker:Yep.
Speaker:from one Synology box to another.
Speaker:But to me that wasn't really a backup. Like, I wanted it in a, in a format, you know...
Speaker:In the end I was forced to not do what I wanted with the tar.
Speaker:Um, but I wanted it in a cataloged format.
Speaker:So we looked at a couple of, the problem was never NetBackup.
Speaker:Right?
Speaker:NetBackup made it, um, easy to script this whole thing because it was the
Speaker:only way I could make sense of it.
Speaker:'cause it was, it was thousands of directories and, um, and even
Speaker:more thousands of subdirectories under those directories.
Speaker:And the only way I could make sense of this was to script it all.
Speaker:And, um, the, the fact that NetBackup allowed me to do that was great.
Speaker:Um, there are some other tools these days, some of the newer tools,
Speaker:they want to make it easy for you.
Speaker:But if you get into a complicated situation like this, some of the newer
Speaker:tools don't even have the ability to sort of grab it by the horns.
Speaker:The
Speaker:able to do a NetBackup,
Speaker:Yeah.
Speaker:I think the other thing also that you were doing, which I thought was interesting,
Speaker:was also your scripting, right?
Speaker:Trying to automate this, like, uh, I know like scheduling your,
Speaker:the backup policies to run, right?
Speaker:And then you were sort of doing load balancing to make sure
Speaker:that you keep the two filers
Speaker:Yeah.
Speaker:Yeah.
Speaker:I couldn't, yeah, that was the thing.
Speaker:I couldn't, normally, I, I just, I believe in just throwing
Speaker:everything in the NetBackup scheduler and letting it figure it out.
Speaker:But because again, because of the limitations of the weird thing I had,
Speaker:I, I couldn't figure out a way to load balance across the two target filers
Speaker:with the NetBackup scheduler.
Speaker:Um, maybe I could have, uh, done that better.
Speaker:I don't know.
Speaker:But, uh, so the way I was doing it was I was just assigning a backup.
Speaker:When a backup would finish, I would assign the next backup to the filer that
Speaker:now had more space available to it.
Speaker:Right.
Speaker:So I just had a while loop that was running, you
Speaker:know, checking to see if a backup job was done.
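The polling loop described here could look something like the following Python sketch. It is not the script from the show, and it contains no NetBackup commands. The callables get_free, start_job, and is_done are hypothetical hooks that you would wire to whatever your backup tool and filers provide. The loop waits for a running job to finish, then starts the next queued job on whichever target currently reports the most free space.

```python
import time


def pick_target(free_by_target):
    """Return the name of the target with the most free space."""
    return max(free_by_target, key=free_by_target.get)


def drain_queue(pending, targets, get_free, start_job, is_done,
                max_running=2, poll_seconds=60, sleep=time.sleep):
    """Start queued jobs as slots open up, always on the roomiest target.

    get_free(target), start_job(job, target) and is_done(job) are
    hypothetical hooks supplied by the caller.
    Returns a list of (job, target) pairs in the order jobs were started.
    """
    pending = list(pending)
    running = []
    started = []
    while pending or running:
        # Forget jobs that have finished since the last poll.
        running = [job for job in running if not is_done(job)]
        # Fill any open slots, re-checking free space before each start.
        while pending and len(running) < max_running:
            free = {target: get_free(target) for target in targets}
            target = pick_target(free)
            job = pending.pop(0)
            start_job(job, target)
            running.append(job)
            started.append((job, target))
        if running:
            sleep(poll_seconds)
    return started
```

Checking free space just before each start, instead of planning every assignment up front, matches the situation described: nobody knew in advance how large each backup would turn out to be.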
Speaker:but I think that's important, right?
Speaker:You can always script some of these things that if it doesn't
Speaker:exist in the native tools, right?
Speaker:Don't be afraid.
Speaker:Yeah.
Speaker:Don't be afraid.
Speaker:you know, obviously I'm, I'm pretty good at scripting and
Speaker:I'm pretty good in the backup.
Speaker:And, um, th there are, and, and, and, and thanks.
Speaker:Thanks very much to Veritas for keeping their, uh, their documentation online.
Speaker:Uh, the number of times I Googled.
Speaker:You know, backup job, you know, how do, how do I list, uh, you know, and
Speaker:I know there's a, there's, I know there's a command to, to do this.
Speaker:How do I do that?
Speaker:And, you know, and then a man page would come up and I would read it
Speaker:and I was like, oh, yeah, yeah, yeah.
Speaker:It's
Speaker:been a while.
Speaker:Yeah.
Speaker:Um.
Speaker:you have to also thank Cygwin, of course.
Speaker:Yes, special thanks to Cygwin. Without Cygwin...
Speaker:That is the tool that you can download and run on any Windows
Speaker:server to give you Unix capabilities.
Speaker:I will say there were, there were moments where Cygwin was both helpful and
Speaker:terrorizing me because it was the whole like backslash versus forward slash thing.
Speaker:Because in Windows, you know, the file separator is a backslash, which
Speaker:in Unix is an escape character,
Speaker:Yep.
Speaker:and Cygwin wasn't consistent about
Speaker:when that escape character would be an escape character.
Speaker:Like, like if you piped it into a file, it would do one thing.
Speaker:If you piped it into a command, it would do it, it would behave differently.
Speaker:And, um, so that, that definitely l lent.
Speaker:The fact that I was doing constant file manipulation on directories
Speaker:that were seven levels deep,
Speaker:Yeah.
Speaker:did not help.
Speaker:Yeah.
Speaker:Oh, and then I couldn't, the, the, the, the one thing with
Speaker:Cygwin is that it doesn't see.
Speaker:It doesn't see the... to point the backups to NetBackup, I have to point
Speaker:'em at the backslash backslash filer name,
Speaker:share name.
Speaker:Cygwin doesn't see that.
Speaker:Cygwin sees only mapped drive names
Speaker:and
Speaker:have to map it using
Speaker:you have to map it to a drive name.
Speaker:Let's say you map it to,
Speaker:to letter F, and then in Cygwin you would see /cygdrive/f.
Speaker:Which would be the same as this backslash-backslash mount.
Speaker:You know, I was constantly having to go back and forth between
Speaker:those two and, and that was fun.
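The back and forth between the two path styles can be sketched in Python as below. This is an illustration only: the share name, the drive letter, and the drive_map dictionary are assumptions, and in a real Cygwin environment the bundled cygpath utility is the more reliable way to convert paths. The sketch turns a mapped-drive path, or a UNC path whose share has a known drive letter, into the /cygdrive/<letter> form.

```python
from pathlib import PureWindowsPath


def windows_to_cygdrive(path):
    """Convert a drive-letter path such as F:\\dir\\sub to /cygdrive/f/dir/sub."""
    parsed = PureWindowsPath(path)
    drive = parsed.drive  # "F:" for a mapped drive, "\\\\host\\share" for UNC
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"not a drive-letter path: {path!r}")
    # parts[0] is the drive anchor; the remaining parts are the directories.
    return "/".join(["/cygdrive", drive[0].lower(), *parsed.parts[1:]])


def unc_to_cygdrive(path, drive_map):
    """Convert a UNC path using a {unc_share: drive_letter} mapping.

    drive_map is a hypothetical record of which shares were mapped to
    which drive letters, for example {r"\\\\filer1\\share": "F"}.
    """
    parsed = PureWindowsPath(path)
    share = parsed.drive.lower()
    for unc_share, letter in drive_map.items():
        if unc_share.lower() == share:
            return "/".join(["/cygdrive", letter.lower(), *parsed.parts[1:]])
    raise KeyError(f"no drive letter mapped for {parsed.drive!r}")
```

Using PureWindowsPath to split the path means the code never has to treat the backslash as anything other than a separator, which avoids the escape-character confusion mentioned above.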
Speaker:Um,
Speaker:scripting
Speaker:here's the thing.
Speaker:After all of this experience and everything you've learned, you're probably
Speaker:never gonna use any of this again.
Speaker:I don't know about that.
Speaker:I dunno about that.
Speaker:I tell you what, I'm, I'm taking a tar of all those scripts that
Speaker:I wrote, um, because I will say this, that, that the NetBackup
Speaker:documentation while, uh, extensive, it doesn't give a lot of examples.
Speaker:And so like, I'm thinking of like, um, like the BP duplicate command,
Speaker:which is the command to copy backups from one place to another.
Speaker:I couldn't, I couldn't figure out from reading the man page how to
Speaker:actually do, to do what I needed to do.
Speaker:So I would, I would like.
Speaker:I would do, I would have to run tests, you
Speaker:know, I'd, you know, um, and, um, the, you know, not like now that Cohesity's
Speaker:acquiring them, it's not like they're now gonna rewrite their man pages.
Speaker:I just thought that they could have used some more, some more examples.
Speaker:But
Speaker:Yeah.
Speaker:I figured it out eventually.
Speaker:You know, I think someone used to have a forum that people would post on about.
Speaker:Yeah, someone used to have that and then, but people stopped posting
Speaker:on that forum, so I don't know
Speaker:You know?
Speaker:Um, where people are getting their help now,
Speaker:but, uh,
Speaker:Well, I'm glad that this is almost over,
Speaker:yeah.
Speaker:Yeah.
Speaker:nearly over and I'm glad you're still alive,
Speaker:I am alive.
Speaker:I didn't kill anyone along the way.
Speaker:I didn't scream at anyone.
Speaker:Like the, the story that
Speaker:you have heard were, were Curtis Cuss Preston.
Speaker:I didn't scream at anyone.
Speaker:yeah.
Speaker:but I really, really, really think you should do an office space on those filers.
Speaker:yeah.
Speaker:Well, that would sort of defeat the purpose of the
Speaker:but, uh, I, yeah, I, like that idea.
Speaker:Hmm.
Speaker:Anyway.
Speaker:Well, uh, thanks Prasanna for helping me, uh, sort of through this.
Speaker:You were my constant counselor through this.
Speaker:I think I learned a bunch.
Speaker:I know usually I'm all about YouTube knowledge, but in this case it was
Speaker:the Preston knowledge, so it was good.
Speaker:I.
Speaker:Yeah.
Speaker:Yeah.
Speaker:uh, thanks everybody else for, uh, uh, listening along with this sad, sad story
Speaker:with I think a decent, happy ending.
Speaker:That is a wrap.
Speaker:The backup wrap up is written, recorded and produced by me, W. Curtis Preston.
Speaker:If you need backup or DR
Speaker:consulting, content generation, or expert witness work,
Speaker:check out backupcentral.com.
Speaker:You can also find links to my O'Reilly books on the same website.
Speaker:Remember, this is an independent podcast and any opinions that you
Speaker:hear are those of the speaker
Speaker:and not necessarily an employer.
Speaker:Thanks for listening.