Greetings, listeners. Welcome back to the Data
Speaker:Driven Podcast. I'm Bailey, your AI host with
Speaker:the most data, that is, bringing you insights from the ether
Speaker:with my signature wit. In today's episode, we're
Speaker:diving deep into the heart of artificial intelligence's engine room,
Speaker:GPU orchestration. It's the unsung hero
Speaker:of AI research, optimizing the raw power needed to fuel
Speaker:today's most advanced machine learning models. And
Speaker:who better to guide us through this labyrinth of computational complexity than
Speaker:Ronen Dar, the cofounder and CTO of Run AI, the
Speaker:company that's making GPU resources work smarter, not
Speaker:harder. Now onto the show.
Speaker:Hello, and welcome to Data Driven, the podcast where we explore the emergent fields of artificial intelligence, data engineering, and overall data science and analytics. With me as always is my favoritest data engineer in the world, Andy Leonard. How's it going, Andy?
Speaker:It's going well, Frank. How are you? I'm doing great. I'm doing great. We're recording this February 1, 2024, and as I said to my kids yesterday, January has been a long year. We're only, like, 1 month into the year, and it was a pretty wild ride. But I can tell we're gonna have a blast today,
Speaker:because we're gonna geek out on something that I kinda sort of understand,
Speaker:but not entirely, and it's GPUs. And in the virtual green room, we were chitchatting with some folks, but let me do the formal introduction here. Today with us, we have Doctor Ronen Dar, cofounder and CTO
Speaker:of Run AI, a company at the forefront of GPU
Speaker:orchestration, and he has a distinguished career in technology.
Speaker:His experience includes significant roles at Apple. Yes, that Apple. Bell Labs. Yes, that Bell Labs.
Speaker:And at Run AI, Ronen is instrumental in optimizing GPU usage for AI model training and deployment,
Speaker:leveraging his deep passion for both academia and startups.
Speaker:And he and Run AI are key players in the AI revolution. Ronen's contributions are pivotal in shaping and powering the future of artificial intelligence. Now I will add that in
Speaker:my day job at Red Hat, Run AI has come up a couple of times.
Speaker:So this is definitely, definitely an honor to have you on the show, sir. Welcome.
Speaker:Thank you, Frank. Thank you for inviting me. Hey, Andy. Good to
Speaker:be here. I love it. Love Red Hat. We're a big fan of Red Hat. We're working closely with many people at Red Hat, and love that. Right? Love OpenShift, love Red Hat, love Linux. Yeah. Cool. Cool.
Speaker:Yeah. So, for those who don't know, I kinda know what Run AI does, but can you explain exactly what it is Run AI does and why GPU orchestration is important? Yes.
Speaker:Okay.
Speaker:So Run AI is a software AI infrastructure platform. We help machine learning teams get much more out of their GPUs, and we provide those teams with abstraction layers and tools so they can train models and deploy models much easier, much faster. And so we started in 2018, 6 years
Speaker:ago. It's me and my cofounder, Omri. Omri is the CEO. He's amazing. I love him. We've known each other for many years. We met in academia, like, more than 10 years ago, and we started Run AI together. And we started Run AI because we saw that there are big challenges around GPUs, around orchestrating GPUs and utilizing GPUs. We saw back then, in 2018, that GPUs are going to be very, very important.
Speaker:It's like the basic component that any AI company needs to train models, right, and deploy models. So we saw that GPUs are going to be critical, but there are also a lot of challenges with utilizing GPUs.
Speaker:I think back then, GPUs were relatively new in the data center, in the cloud. GPUs were very well known in the gaming industry. Right? We spoke before on gaming. Right? Like, a lot of key things there that GPUs have been enabling. But in the data center, they were relatively new, and the entire software stack that is running the cloud and the data center was built for traditional microservices applications that run on commodity CPUs. And AI workloads are different. They are much more compute intensive, they run on GPUs, maybe on multiple nodes of multi-GPU machines, and GPUs are also very different. Right? They are expensive, very scarce in the data center. So the entire software stack was built for something else, and when it comes to GPUs, it was really hard for many people to actually manage those GPUs. So we came in, and we
Speaker:saw those gaps. We've built Run AI on top of cloud native technologies like Kubernetes and containers. We're big fans of those technologies, and we added components around scheduling, around GPU fractioning. So we enable multiple workloads to run on a single GPU and essentially over-provision GPUs. So we built this engine, which we call the cluster engine, that runs in GPU clusters. Right? We help machine learning teams pool all of their GPUs into 1 cluster running that engine, and that engine provides a lot of performance and a lot of capabilities from those GPUs. And
Speaker:on top of that, we built this control plane and tools for machine learning teams to run their Jupyter Notebooks, to run training jobs, batch jobs, to deploy their models, right, to have tools for the entire life cycle of AI, from training models in the lab to taking those models into production and running them and serving actual users. And that's the platform that we've built. We're working with machine learning teams across the globe on just managing, orchestrating, and letting them get much more out of their GPUs, and essentially run faster, train models faster in a much easier way, and deploy those models in a much easier and faster and more efficient way.
Speaker:Yeah. The thing that blew me away when I first heard of Run
Speaker:AI, and this would have been 2021-ish. No, early 2021, I would say. And it was the idea of fractional GPUs. Right? So you can have 1,
Speaker:I say 1, but, you know, realistically it's gonna be more than 1, but you can kind of share it out, which I think, and we were talking in the virtual green room about how, you know, some of these GPUs, if you can get them, because there's a multi-month, sometimes multi-year supply chain issue. I mean, these things are expensive bits of hardware. And I think the real value, correct me if I'm wrong, is, like, well, I was talking to somebody the other day, and we were basically talking about how, you know, if you get, like, 1 laptop with a killer GPU, right, that GPU is really only useful to that 1 user. Whereas if you can kind of put it in a server and use something like Run AI, now everybody in the organization can use it. And these are not trivial expenses. I mean, these are, like, you know, sell-a-kidney type of costs here.
Speaker:Yeah. Absolutely. First of all, GPUs are expensive. They cost a lot. Right? And we provide technologies like fractional GPUs and other technologies around scheduling that allow teams to share GPUs. Right. So you spoke about GPU fractioning. So that's one layer of sharing, where you have 1 GPU, which is really expensive. And not all AI workloads are really compute intensive and require the entire GPU or, you know, maybe multiple GPUs. There are workloads like Jupyter Notebooks where you have researchers that are just debugging their code or cleaning their data or doing some simple stuff, and they need just fractions of GPUs. In that case, if you have a lot of data scientists, maybe you wanna host all of their notebooks on a much smaller number of GPUs, because, right, each one of them needs just a fraction of a GPU. Another big use case for fractions of GPUs is inference. So now, a lot of the models are huge and don't fit into the memory of 1 GPU, but in computer vision, there are a lot of models that are relatively small. They run on a GPU, and you can essentially host multiple of them on the same GPU. Right. So instead of just 1 computer vision model running on a GPU, you can host 10 of those models on the same GPU and get factors of 10x in your cost, in your overall throughput of inference. So that's one use case for fractional GPUs, and we're investing heavily in building that technology.
Speaker:Another layer of sharing GPUs comes where you have, maybe in your organization, multiple teams or multiple projects running in parallel. So for example, maybe OpenAI: they now are working on GPT-5. It's 1 project. That project needs a lot of GPUs, and they have more projects. Right? More research projects around alignment or around reinforcement learning. You know? DALL-E. Like, they have more than just 1 project. Then DALL-E, and they have multiple models. Right? Exactly. They have. Right? So each project needs GPUs. Right? Needs a lot of GPUs. So instead of allocating GPUs entirely for each project, you could essentially pool all of those GPUs and share them between those different projects, different teams. And in times where 1 project is idle and not using their GPUs, other projects, other teams can get access to those GPUs. Now, orchestrating all of that, orchestrating that sharing of resources between projects, between teams, can be really complex, and it requires this advanced scheduling, which we're bringing into the game. We're bringing those scheduling capabilities from the high performance computing world, known from those schedulers there, into the cloud native Kubernetes world: batch scheduling, fairness algorithms, things like that, so teams and projects can just share GPUs in a simple and efficient way. So those are the 2 layers of sharing GPUs.
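[Editor's note: a minimal sketch of that first sharing layer, fractional GPUs, as a workload might request it on Kubernetes. The Kubernetes Python client calls are real; the "gpu-fraction" annotation key is a hypothetical placeholder for whatever annotation your fractional-GPU scheduler (Run AI's or otherwise) actually reads, so treat it as an assumption and check the vendor docs.]

```python
# Sketch: request a slice of one GPU for an inference pod, rather than a
# whole device. Assumes a fractional-GPU scheduler is installed that reads
# a "gpu-fraction" annotation (hypothetical key -- verify against your docs).
from kubernetes import client, config

def fractional_gpu_pod(name: str, image: str, fraction: float) -> client.V1Pod:
    """Build a pod spec that asks for a fraction of a single GPU."""
    return client.V1Pod(
        metadata=client.V1ObjectMeta(
            name=name,
            annotations={"gpu-fraction": str(fraction)},  # hypothetical key
        ),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="inference",
                    image=image,
                    # No nvidia.com/gpu limit here: the fractional scheduler,
                    # not the default device plugin, decides the placement.
                )
            ],
        ),
    )

if __name__ == "__main__":
    config.load_kube_config()  # assumes a configured kubeconfig
    pod = fractional_gpu_pod("cv-model-a", "registry.example.com/cv-model:v1", 0.1)
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
    # Ten such pods could share one device -- the "10 models on the same
    # GPU" case Ronen describes for small computer vision models.
```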
Speaker:Interesting. And I think that as this field matures, and it matures in the enterprise, I think you're gonna see organizations kind of be more savvy about, like, okay, like you said: data scientists, if they're just doing, you know, traditional statistical modeling that really doesn't benefit from GPUs, or they're just doing data cleansing, data engineering, right, they're probably gonna say, like, well, let's run it on this cluster, and then we'll break it apart into discrete parts where, you know, then we will need a GPU. And I also like the idea that, you know, you're basically doing what I learned in college, which was time slicing. Right? Sounds like this is kind of, like, everything old is new again. Right? I mean, obviously, you know, when you're taking kind of that old mainframe concept and applying it to something like Kubernetes, orchestration is gonna be a big deal, because these are not systems that were built from the ground up to have time slicing. Is that a good kind of explanation? Yeah. Absolutely.
Speaker:Absolutely. I like that analogy. Yeah. Exactly. Time slicing, it's 1 implementation that we enable around fractionalizing GPUs. And I agree. When you have resources, it can be different kinds of resources. Right? It can be CPU resources, and networking as well. You know, people created the technology to share the networking, the communication going through those networks, just the bandwidth of the networking. We're doing it for GPUs. Right? Sharing those resources.
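[Editor's note: a toy illustration of the time-slicing analogy in the exchange above — several jobs taking turns on one shared device, each getting a fixed quantum. This is a teaching sketch of the general idea, not how Run AI or any real GPU scheduler is implemented.]

```python
# Toy round-robin time slicing: jobs share one device by taking turns.
from collections import deque

def time_slice(jobs: dict[str, int], quantum: int = 2) -> list[str]:
    """Schedule jobs (name -> remaining work units) on one shared device."""
    queue = deque(jobs.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.extend([name] * run)              # device runs this job
        if remaining - run > 0:
            queue.append((name, remaining - run))  # unfinished: back in line
    return timeline

print(time_slice({"notebook-a": 3, "training-b": 5, "inference-c": 2}))
# ['notebook-a', 'notebook-a', 'training-b', 'training-b',
#  'inference-c', 'inference-c', 'notebook-a', 'training-b', ...]
```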
Speaker:And I think now, interestingly, LLMs are also becoming a kind of resource as well, right, that people need access to. Right? You have those models, you have GPT, ChatGPT. A lot of people are trying to get access to that resource, essentially. And I think it's interesting,
Speaker:because you kinda pointed this out, but it's something that I think, if you're in the gen AI space, you kinda don't think about. It's obvious, like air. You don't think about it. Right? But when you get inference on traditional, somebody once referred to it as legacy AI, right, on the inference side of the equation, you don't really need a lot of compute power. Right? Like, it's not really a heavy lift. Right? But with generative AI, you do need a lot of compute on, I guess it's not really inference, but on the other side of the use, while it's actually in use, not just the training. Right. So traditionally, GPU-heavy use in training, and then inference, not so much. Now we need heavy use before, after, and during, which I imagine your technology would help with, because, I mean, look, I love ChatGPT. I'm one of the 1st people to sign up for a subscription. But even, you know, they had trouble keeping up, and they have a lot of money, a lot of power, a lot of influence. So, I mean, this is something that, if you're just a regular old enterprise, this is probably something they struggle with. Right?
Speaker:Right. Yeah. I absolutely agree. It's an amazing point, Frank. So 1 year ago, the inference use case on GPUs wasn't that big. Totally agree. That's also what we saw in the market.
Speaker:Deep learning convolutional neural networks were running on GPUs, mostly for computer vision applications, but they could also run on CPUs, and you could get, like, relatively okay performance. If you needed maybe, like, very low latency, then you might use GPUs, because they're much faster and you get much lower latency. But it was, and it still is, very difficult to deploy models on GPUs compared to just deploying those models on CPUs, because deploying models, deploying applications on CPUs, you know, people have been doing that for so many years. So many times it was much easier for people to just deploy their models on CPUs and not on GPUs, so that was, like, the fallback to CPUs.
Speaker:then came, and as you said, chair GPT was introduced, A
Speaker:little bit more than a year ago, and that generative
Speaker:AI use case just blown. It was blown. Right? And it's
Speaker:it's inference essentially. And those models are
Speaker:so big that they can't really run on
Speaker:CPU. They, they LLMs are running in production on
Speaker:GPU's and now the inference use case on
Speaker:GPU's is just exploding In the market
Speaker:right now, it's really big. Is a lot of demand for
Speaker:GPU's for inference And
Speaker:if for open AI, they need to support this
Speaker:huge scale that I guess, just
Speaker:Just them are seeing such scale, maybe a little, a
Speaker:few more companies, but that's like huge, huge scale.
Speaker:But I think that we will see more and more companies
Speaker:building products based on AI, on
Speaker:LLMs, And we'll see more and more
Speaker:applications using AI, which
Speaker:then that AI runs on on GPU. So That is going to go
Speaker:and that's the that's an amazing new market for us around
Speaker:AI. And for me as a CTO, it was so fun to get into that market, because it now comes with new problems, new challenges, new use cases compared to deep learning on GPUs. New pains, because the models are so big. Right? Right. And challenges around cold start problems, around auto scaling, around just giving access to LLMs. So a lot of new challenges there. We at Run AI are studying those problems, and we're now building solutions for those problems, and I'm really, really excited about the inference use case.
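[Editor's note: a toy sketch of the auto-scaling and cold-start tension Ronen mentions for LLM serving — pick a replica count from the request backlog, remembering that scaling from zero means the first request waits while a huge model loads onto a GPU. All names and numbers here are invented for illustration.]

```python
# Toy queue-depth autoscaler for LLM inference replicas (illustrative only).
import math
from dataclasses import dataclass

@dataclass
class AutoscalerConfig:
    target_queue_per_replica: int = 8  # desired backlog per replica
    min_replicas: int = 0              # scale-to-zero frees idle GPUs...
    max_replicas: int = 16
    cold_start_seconds: float = 90.0   # ...but restarting pays this penalty
                                       # while weights load onto the GPU

def desired_replicas(queue_depth: int, cfg: AutoscalerConfig) -> int:
    """Keep the per-replica backlog near the configured target."""
    want = math.ceil(queue_depth / cfg.target_queue_per_replica)
    return max(cfg.min_replicas, min(cfg.max_replicas, want))

cfg = AutoscalerConfig()
for depth in (0, 5, 40, 200):
    print(f"{depth:3d} queued -> {desired_replicas(depth, cfg)} replicas")
# 0 -> 0 (scaled to zero), 5 -> 1, 40 -> 5, 200 -> 16 (capped at max)
```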
Speaker:That is very cool. So, just going back a little bit. I was trying to keep up, I promise. But Run AI, I get that Run AI's platform supports fractional GPU usage. It also sounds to me, and maybe I misunderstood, that in order to achieve that, you first had to, or maybe along with that, you made it possible to use multiple GPUs. You've created something like an API that allows companies to take advantage of multiple GPUs or fractions of GPUs. Did I miss that? No, that's right. That's right, Andy. Okay.
Speaker:So we've built this way for people to scale their workloads from fractions of GPUs, to multiple GPUs within 1 machine. Okay. To multiple machines. Right? You have big workloads running on multiple nodes of GPUs. So think about it: you have multiple users, each running their own workload. Some are running on fractions of GPUs. Some are running batch jobs on a lot of GPUs. Some are deploying models and running them in inference, and some are just launching their Jupyter Notebooks. All of that is happening on the same pool of GPUs, the same cluster. So you need this layer of orchestration, of scheduling, just to manage everything and make sure that everything gets the right access to the right GPUs, and everything is scheduled according to priorities.
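[Editor's note: a small sketch of the mix Ronen describes — notebooks, big batch training jobs, and inference all drawing on one shared pool, with very different GPU shapes. The admission logic here is deliberately naive first-come-first-served, just to make the picture concrete; the request sizes are invented.]

```python
# Different workload shapes sharing one GPU pool (illustrative numbers).
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    kind: str     # "notebook" | "training" | "inference" | "batch"
    gpus: float   # fractions for light work, many whole GPUs for training

pool_capacity = 16.0  # total GPUs in the shared cluster
requests = [
    Workload("alice-notebook", "notebook", 0.25),
    Workload("bert-finetune", "training", 8.0),   # multi-GPU batch job
    Workload("cv-serving", "inference", 0.5),
    Workload("nightly-eval", "batch", 2.0),
]

allocated = 0.0
for w in requests:  # naive FCFS admission, just for illustration
    if allocated + w.gpus <= pool_capacity:
        allocated += w.gpus
        print(f"admit {w.name:15s} ({w.kind:9s}) -> {w.gpus} GPU(s)")
    else:
        print(f"queue {w.name:15s} ({w.kind:9s}) -> waits for capacity")
print(f"{allocated}/{pool_capacity} GPUs allocated")
```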
Speaker:Yeah. Well, being just, you know, a mere data engineer here, talking about all of that analytics workload, that sounds very complex. And as you mentioned earlier, you know, you were talking about how traditional coding is targeting CPUs, and that's my background. You know, I've written applications and done data work targeted for traditional work. I can't imagine just how complex that is, because GPUs came into AI as a unique solution, designed to solve problems that they weren't really built for. You know, GPUs were built for graphics, and you mentioned that. But the fact that they have to be so parallel internally, I think, just added this dimension to it. And I don't know who came up with that idea, you know, who thought of, well, goodness, we could use all of this, you know, massive parallel processing to run these other classes of problems. So, pretty cool idea, but I just, yeah, I'm amazed.
Speaker:It's even cooler than that. Because, yeah, a wise man once told me, he goes, GPUs are really good at solving linear algebra problems, and if you're clever enough, you can turn anything into a linear algebra problem.
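[Editor's note: the "wise man" quote in action. This is standard PyTorch, timing the same matrix multiply on CPU and, if one is present, on a CUDA GPU; it falls back to CPU-only output on machines without a GPU.]

```python
# Same linear algebra, two devices: time an n x n matrix multiply.
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()      # settle pending work before timing
    start = time.perf_counter()
    _ = a @ b                         # one big matrix multiply
    if device == "cuda":
        torch.cuda.synchronize()      # CUDA kernels are async; wait for result
    return time.perf_counter() - start

print(f"cpu : {time_matmul('cpu'):.3f}s")
if torch.cuda.is_available():
    time_matmul("cuda")               # warm-up pays one-time CUDA setup
    print(f"cuda: {time_matmul('cuda'):.3f}s")
```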
Speaker:And even simulating quantum computers, when I was kind of, like, going through that, I was like, mhmm, you know, like, gee, looks like this will be useful there too. Right? So it's an interesting thing. So, like, you know, everyone's talking about how we're in the hype cycle, but I think if you're in the GPU space, you have a pretty good run, because, one, these things are gonna be important. Right? Whether or not, you know, the hype cycle will kinda crash, and whatever that'll look like, I think they're gonna be important anyway. Right? Because they're gonna be just the cost of doing business, table stakes, as the cool kids like to say. But also, over the next horizon, simulating quantum computers is going to be the next big hype cycle. Right? Or one of them. Right? So it's a foundational technology that I think we didn't think would be a foundational technology even, like, 6, 7 years ago. Right? Yeah.
Speaker:I'll go with a few things that you said, regarding the parallel computation, right, and just running linear algebra calculations on GPUs and accelerating such workloads. NVIDIA, I love NVIDIA, NVIDIA has this big vision, and they had this big vision around GPUs already in 2006, when they built CUDA. Yep. Right. So they began just with that. Right? GPUs were used for graphics processing, for gaming. Right? Great use case. Great market. But they had this vision of bringing more applications to GPUs, just accelerating more applications, mainly applications with a lot of linear algebra calculations. And they created CUDA to simplify that. Right? To allow more developers to use GPUs, because just using GPUs directly, that's so complex. That's so hard. So they built CUDA to bring more developers, to bring more applications, and they started in 2006. But think about the big breakthrough in AI: it happened just in 2012, 2013, with AlexNet and the Toronto researchers, who used GPUs, actually, because they trained AlexNet on 2 GPUs, and they had CUDA, so for them it was feasible to train their model on a GPU. And that was the new thing that they did. They were able to train a much bigger model, with more parameters than ever before, because they used GPUs, because the training process ran much faster. And that triggered the entire revolution, the hype around AI that we're seeing now. So from 2006, when NVIDIA started to build CUDA, until 2013, right, 7 years, then we started to see those big breakthroughs. And in the last decade, it's just been exploding, and we're seeing more and more applications. The entire AI ecosystem is running on GPUs. So that's amazing to see. It's impressive.
Speaker:And, like, people don't realize, like, the revolution we're seeing today really started in 2006, like you said. I didn't even put 2 and 2 together until I was listening to a podcast. I think it's called Acquired. Really good podcast. Right? Like, they don't pay me to say that or whatever, but they did a 3 hour deep dive on the history of NVIDIA. 3 hours. I couldn't stop listening. Right? Nice. You know? Yeah. We tried a long form, like, multi hour podcast. We weren't that entertaining, apparently. But the way they go through the history of this, it was basically Jensen Huang, hopefully I said his name right, he was, like, we wanna be a player not just in gaming, but also in scientific computing. This is 2005, 2006, which at the time seemed kind of, like, a little out there, a little kooky. But what you're seeing today is, like, the fruits in the tree, the seeds that he planted, you know, almost 20 years ago, like, 19, 20 years ago. So, you know, when people look at NVIDIA and say it's an overnight success, I'm like, well, I don't know about that. But no, I mean, you're right. Like, you know, and it's probably not a coincidence that once they made it easy to take these multi parallel processors, say that 10 times fast on a Thursday morning, but also make it so it's a lot easier for developers to use. Right? And I'll quote the great Steve Ballmer: developers, developers, developers. Right?
Speaker:So it's just fascinating, like, and I think that, you know, we've really opened a gate of creativity in terms of researchers and applied research. And, I mean, I think that what's really cool about your product is that you're kind of making this, what is now a scarce resource. Maybe in some amount of time, GPUs won't cost an arm and a leg. But, like, for now, I think the one thing that I've seen that I think is not obvious to the casual observer is, if an organization, like a large enterprise, can pool their resources, they have a lot more money to buy better GPUs, and you offer a platform where everybody can get a stake in it. Right? As opposed to, you know, that department is gonna hog everything. Right? And here's a question. Do you have, like, an audit trail where you could kinda, you know, figure out, like, you know, Andy's department's really hogging the GPUs? No, no, no, it's Frank. Frank is, like, mining Bitcoin or whatever. Like, do you have some kind of audit trail like that?
Speaker:Yeah, I love that you mentioned hogging. GPU hugging, we use that term as well. Right? Because it's so difficult sometimes to get access to GPUs. So when you get access to a GPU as a researcher, as an ML practitioner, you don't wanna let it go. Right? Because if you let it go, someone else will take it and hug it. Right. So you're getting this GPU hugging problem. What we do to solve that is, first, we provide monitoring and visibility tools into who is using what, and who is actually utilizing their GPUs, and so on. But more than that, we allow the researchers to just give up their GPUs and not hug their GPUs, because we provide this concept of guaranteed quotas. So each researcher or each project or each team has their own guaranteed quota of GPUs that is always available for them. Whenever they get access to the cluster, they will get, like, you know, their 2 GPUs or 4 GPUs, their quota of GPUs. It's guaranteed. So they can just let go of their GPUs and not hug them. That's one thing. The second thing is that they can also go above their quota. They can use the GPUs of other teams or other users if those are idle, and they can run these preemptible jobs in an opportunistic way and utilize those GPUs. And so, in that way, they are not limited to fixed quotas, to hard-limit quotas. They can just take as many GPUs as they want from their clusters if those GPUs are available and idle. Right? But if someone needs those GPUs, because those GPUs are guaranteed to them, our scheduler, the Run AI platform, will make sure to preempt workloads and give those guaranteed GPUs to the right users.
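[Editor's note: a toy model of the two-sided policy Ronen just described — guaranteed quotas that are always claimable, plus opportunistic over-quota borrowing that gets preempted when the owner returns. Team names and sizes are invented; the real Run AI scheduler is far more involved.]

```python
# Toy guaranteed-quota scheduler with preemptible over-quota borrowing.
class QuotaScheduler:
    def __init__(self, total_gpus: int, quotas: dict[str, int]):
        assert sum(quotas.values()) <= total_gpus
        self.total = total_gpus
        self.quotas = quotas
        self.usage = {team: 0 for team in quotas}

    def free(self) -> int:
        return self.total - sum(self.usage.values())

    def request(self, team: str, gpus: int) -> str:
        within_quota = self.usage[team] + gpus <= self.quotas[team]
        if gpus <= self.free():  # idle GPUs available: admit immediately
            self.usage[team] += gpus
            return "guaranteed" if within_quota else "over-quota (preemptible)"
        if within_quota:
            # No idle GPUs, but this request is inside the guaranteed quota:
            # preempt other teams' borrowed (over-quota) GPUs to reclaim room.
            for other in self.usage:
                over = self.usage[other] - self.quotas[other]
                if other != team and over > 0:
                    self.usage[other] -= min(over, gpus - self.free())
                if gpus <= self.free():
                    break
            if gpus <= self.free():
                self.usage[team] += gpus
                return "guaranteed (after preemption)"
        return "rejected"

sched = QuotaScheduler(total_gpus=8, quotas={"nlp": 4, "vision": 4})
print(sched.request("nlp", 4))     # guaranteed
print(sched.request("nlp", 4))     # over-quota (preemptible): borrows idle GPUs
print(sched.request("vision", 4))  # guaranteed (after preemption): nlp shrinks
```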
Speaker:Oh, that's cool. Alright. So, 1 last question before we switch over to the stock questions, 'cause I could geek out and look at this for hours. Yep. This could be a long form. Sure. This could be. Yeah. And I wanna be respectful of your time, because you're an important guy, and it's also late where you are. So, who deals with this? Like, who would set up these quotas? Is it the data scientist? Is it IT ops? Like, obviously, the data scientists, the researchers, they all benefit from this product. But who's actually administering it? Right? Like, who is it, you know, do I have to talk to? You know, say, pretend Andy's in ops. Do I have to say, hey, Andy, I really need a boost in my quota? You know, like, I mean, who does it? Or, as I say it, I'm like, yeah, that wouldn't work. Like, I'm the researcher. I'm gonna turn the dial up on my own. Like, who's the primary? Obviously, we know who the primary beneficiary is, but who's the primary user?
Speaker:Okay, great. So if you have a team, right, if you're a team of researchers, all of you need access to GPUs, so maybe the team lead is the one who's managing the quotas for the different team members. And if you have multiple teams, then you might have a department manager, or an admin of the cluster, or a platform owner, that will allocate the quotas for each team, right? And then those teams would manage their own quotas within what they were given. Right? So it's like a hierarchical thing, managed in a hierarchical manner. People can manage their own quotas, their own priorities, their own access to the GPUs within their teams. Okay.
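[Editor's note: a sketch of the hierarchical delegation Ronen outlines — a platform owner allocates GPUs to departments, department managers split them among teams, and no level can hand out more than it was given. All names and numbers are invented.]

```python
# Hierarchical GPU quotas: each node can only delegate what it holds.
from dataclasses import dataclass, field

@dataclass
class QuotaNode:
    name: str
    quota: int
    children: list["QuotaNode"] = field(default_factory=list)

    def allocate(self, name: str, quota: int) -> "QuotaNode":
        delegated = sum(child.quota for child in self.children)
        if delegated + quota > self.quota:
            raise ValueError(f"{self.name}: only {self.quota - delegated} GPUs left")
        child = QuotaNode(name, quota)
        self.children.append(child)
        return child

cluster = QuotaNode("cluster-admin", quota=64)    # platform owner
research = cluster.allocate("research-dept", 40)  # department manager
cluster.allocate("product-dept", 24)
research.allocate("nlp-team", 24)                 # team leads split further
research.allocate("vision-team", 16)
# research.allocate("one-more-team", 1)  # would raise: department is spent
```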
Speaker:So it's kind of like a hybrid of, like, you know, it's like a budget, almost. Right? Like, you know, you get this much, figure it out amongst yourselves. Exactly. So we're trying to decentralize how the quotas are being managed and how the GPUs are being accessed. So, you know, we're giving as much power, as much control, to the end users as possible. Sure.
Speaker:It sounds like a great administrative question, very important. And I imagine, because a little bird told me, that provisioning of these GPU resources is not the only thing that enterprises have to deal with. So it's interesting. It's not just GPUs. It's compute. Like, it's not limited. Sure. Although, because of what you said, you know, managing GPUs is an order of magnitude harder, because they were never really built for this. Right? Right. You know, we're talking about technology that wasn't really in the server room until a few years ago. Right? This isn't a tried and true, this-is-how-it-works kind of thing, you know? Right. But we hit that point in the show where we'll switch to the preformed questions. These are not complicated. I mean, you know, we're not Mike Wallace or, like, you know, 60 Minutes or whatever. We're not trying to trap you or anything. But since I've been gabbing on most of the show, I figured I'll have Andy kick this off. Well, thanks, Frank. And I don't think
Speaker:you were gabbing on. You know more about this, so now I do. So I'm just a lowly data engineer, if you will. No, data engineers are the heroes we need. Well, I'm gonna plug Frank's Roadies versus Rockstars writing on LinkedIn. It's good articles about this.
Speaker:But, let's see. How did you find your way into this field? And did this field find you, or did you find it? This field totally found me. Awesome.
Speaker:Yeah. I did my postdoc, and I was at Bell Labs. And Yann LeCun came to Bell Labs and gave a presentation about AI. It was around 2017. And Yann LeCun spent a lot of years at Bell Labs, and his presentation was amazing. And when I heard him talking about AI, I said, okay, that's the space where I wanna be. It's going to change the world. There is this new, amazing technology here that is going to change everything. And I knew that I wanted to start a company in the AI space, for sure.
Speaker:Cool. That's a good answer. Cool. Yeah. That's cool. I was at Bell Labs doing a presentation a while ago, and somebody, I didn't realize that he had worked at Bell Labs, because, like, you know, the guy was like, no, no, he used to work here, like, in this building. I was like, no way. Because I knew him as the guy from NYU. Right? That's who I thought. Right. Or the guy from Meta. Yeah. And now the guy from Meta. Right? So it's interesting how that works, you know? They have these amazing pictures from the nineties where they ran, like, deep learning models on very old PCs, recognizing, like, numbers on the computer. Maybe you saw those pictures, like, amazing. MNIST. It's the MNIST problem. Is that right? Yep. Right. Exactly. Exactly. Cool.
Speaker:So, second question is, what's your favorite part of your current job? That everything is changing so fast. Things are moving so fast. We've been in this business for 6 years, and the entire space is moving and advancing. And so many people are working in this field. New innovations, new tools, new advancements are getting out every day. You know, just 6 years ago, it was about deep learning and computer vision, and now it's about language models and generative AI. And we're just at the start, right? There are so many amazing things that are going to happen in this space, and I love it. Absolutely.
Speaker:So we have 3 fill-in-the-blank sentences here. The first is: complete this sentence. When I'm not working, I enjoy blank. You'll get a very boring answer. It's just spending time with friends and family, because I think that I'm always working. It's like, if you ask my wife, she'll tell you that I'm working 24 hours. Yeah. So I don't have much time that I'm not working. So when I'm not working, then I'm trying to be with my kids and my wife and friends. Cool.
Speaker:Cool. The second complete-the-sentence: I think the coolest thing about technology today is blank. And this, I really wanna hear your perspective on. Yeah. I think everyone will say AI, right? Or something in AI. Yeah. I think there are so many new innovations that are coming around LLMs. I think everything relating to search, right? Searching in data, getting insights from data, it's all going to change. We're going to have a new interface. Right? Just getting insights from data with natural language, without, you know, SQL and, you know, needing programming and stuff like that. Just with natural language, you could do amazing stuff with data. I think we're also seeing this advancement in, like, digital twins right now. You can fake my voice and your voice, and fake my image and your image. And, you know, in the future, we'll have digital twins of us, right, doing this stuff. That would be amazing. So a lot of amazing stuff is going to happen in the next few years, for sure. Very cool. Our last complete-the-sentence: I look forward to the day when I can use technology to blank.
Speaker:To have a robot in my house. Yeah. Yeah. Sweeping the floor instead of me doing that, right, cleaning dishes and things like that. If that would happen, that would be amazing. Right? That's a good answer. Yeah. I agree. I have 3 boys, 4 dogs, ranging from, like, 1 to, like, a teenager. So, like, it's heavy cleaning. And fighting with them to, like, empty the dishwasher takes a lot more mental energy than it should, but that's probably a subject for another type of show.
Speaker:The next question is: share something different about yourself. And we always like to joke, like, well, let's just make sure that we keep our clean iTunes rating. So. Yeah. Yeah. Well, this is a hard question. I needed to think about it. So I found 2 answers that I can say. One is about my professional life, right? I think that it's somewhat different that I'm coming with this background from both academia and the industry. So I love academia. I love to research problems. I love to understand problems in a deep way, and I'm combining it with startups in the industry. And in my past, I worked for chip companies, for hardware companies. I worked for Intel, for a startup, and for Apple. I did chip stuff, and now Run AI is a software company, so it's really, like, a diverse background of academia, hardware, software. So I love that, and, like, I love to do a few different things, and so that, I think, is different. And the 2nd answer that I could find is that I have a nickname that goes with me since my high school days, which is the Duke. The Duke. All of them are calling me the Duke. It's like, they don't call me Ronen, just the Duke. So. That's funny. Yeah. That's awesome.
Speaker:Audible is a sponsor of Data Driven, and you can go to datadrivenbook.com. And if you do that, you can sign up for a free month of Audible. And if you decide later to join Audible, use one of their sign up plans, then Frank and I get to split a cup of coffee, I think, out of that. And every little bit helps, so we really appreciate it when you do. What we'd like to ask: do you listen to audiobooks? And if you do, okay, good, I see you nodding. So do you have a recommendation? Do you have a favorite book or two you'd like to share? Yeah.
Speaker:So I'm a heavy user of Audible. I'll give you a classic book, a classic for entrepreneurs: The Hard Thing About Hard Things by Ben Horowitz. It's a classic book. Love it. It really had a lot of impact on me. I read it when we started Run AI, and I recommend it for every entrepreneur to read, and for everyone to read. It's, like, an amazing book. Cool. Yep. Awesome. I
Speaker:have a flight to Vegas next week, so I'll definitely be listening to it then. And finally, where can people learn more about you and Run AI? The best place will be our website, run.ai. Yeah. And on social: LinkedIn, Twitter will do. Awesome. Any parting thoughts? I really enjoyed this episode. I love to speak about GPUs, love the AI built on them. I had a lot of fun. Thank you for having me here. Awesome.
Speaker:It was an honor to have you. And every once in a while, Andy and I will do deep dive kinda shows. We'd love to invite you back if you wanna do 1 just on GPUs, because I know where my knowledge drops off; you probably could pick up on that. And with that, I'll let the nice AI British lady end the show. And just like
Speaker:that, dear listeners, we've come to the end of another enlightening episode of the Data Driven podcast. It's always a bittersweet moment, like finishing the last biscuit in the tin: satisfying, yet leaving you wanting just a bit more. A colossal thank you to each and every one of you tuning in from across the digital sphere. Without you, we're just a bunch of ones and zeros floating in the ether. Your support is what keeps this digital ship afloat, and believe me, it's much appreciated.
Speaker:Now, if you found today's episode as engaging as a duel of wits with a sophisticated AI, which, I assure you, is quite enthralling, then do consider subscribing to Data Driven. It's just a click away and ensures you won't miss out on our future adventures in data and tech. And if you're feeling
Speaker:particularly generous, why not leave us a 5 star review?
Speaker:Just like a well programmed algorithm, your positive feedback helps
Speaker:us reach more curious minds and keeps the quality content flowing.
Speaker:It's the digital equivalent of a hearty handshake.
Speaker:So, until next time, keep those neurons firing, those
Speaker:subscriptions active and those reviews glowing. I'm
Speaker:Bailey, your British AI lady, signing off with a heartfelt
Speaker:cheerio and a reminder to stay data driven.