Speaker:

Greetings, listeners. Welcome back to the Data

Speaker:

Driven Podcast. I'm Bailey, your AI host with

Speaker:

the most data, that is, bringing you insights from the ether

Speaker:

with my signature wit. In today's episode, we're

Speaker:

diving deep into the heart of artificial intelligence's engine room,

Speaker:

GPU orchestration. It's the unsung hero

Speaker:

of AI research, optimizing the raw power needed to fuel

Speaker:

today's most advanced machine learning models. And

Speaker:

who better to guide us through this labyrinth of computational complexity than

Speaker:

Ronen Dar, the cofounder and CTO of Run AI, the

Speaker:

company that's making GPU resources work smarter, not

Speaker:

harder. Now onto the show.

Speaker:

Hello, and welcome to Data Driven, the podcast where we explore the emergent fields

Speaker:

of artificial intelligence, data engineering, and overall data

Speaker:

science and analytics. With me as always is my favoritest

Speaker:

data engineer in the world, Andy Leonard. How's it going, Andy? It's

Speaker:

going well, Frank. How are you? I'm doing great. I'm doing great. It's been,

Speaker:

We're recording this February 1, 2024. And as I said to my

Speaker:

kids yesterday, January has been a long year.

Speaker:

We're only, like, one month into the year, and it was a pretty

Speaker:

wild ride. But I can tell we're gonna have a blast today,

Speaker:

because we're gonna geek out on something that I kinda sort of understand,

Speaker:

but not entirely, and it's GPUs. And in the virtual green room, we were chit

Speaker:

chatting with some folks. But let me do the formal introduction

Speaker:

here. Today with us, we have Doctor Ronen Dar, cofounder and CTO

Speaker:

of Run AI, a company at the forefront of GPU

Speaker:

orchestration, and he has a distinguished career in technology.

Speaker:

His experience includes significant roles at Apple. Yes, that

Speaker:

Apple. Bell Labs. Yes, that Bell Labs.

Speaker:

And at Run AI, Ronan is instrumental in optimizing

Speaker:

GPU usage for AI model training and deployment,

Speaker:

leveraging his deep passion for both academia and startups.

Speaker:

And he is, he

Speaker:

and Run AI are key players in the AI revolution. Ronen's

Speaker:

contributions are pivotal in shaping and powering the

Speaker:

future of artificial intelligence. Now I will add that in

Speaker:

my day job at Red Hat, Run AI has come up a couple of times.

Speaker:

So this is definitely, definitely

Speaker:

an honor to have you on the show, sir. Welcome.

Speaker:

Thank you, Frank. Thank you for inviting me. Hey, Andy. Good to

Speaker:

be here. I love it. Love Red Hat. We're a big

Speaker:

fan of Red Hat. We're working closely with many people at

Speaker:

Red Hat, and love that. Right? Love OpenShift,

Speaker:

love Red Hat, love Linux. Yeah. Cool. Cool.

Speaker:

Yeah. So for those who don't know exactly... I kinda know

Speaker:

what Run AI does, but can you explain exactly

Speaker:

what it is Run AI does and why GPU

Speaker:

orchestration is important? Yes.

Speaker:

Okay.

Speaker:

So Run AI is a software

Speaker:

AI infrastructure platform. So we

Speaker:

help machine learning teams to get much more

Speaker:

out of their GPUs. And we provide

Speaker:

those teams with abstraction layers and tools

Speaker:

so they can train models and deploy models

Speaker:

much easier, much faster. And

Speaker:

so we started in 2018, 6 years

Speaker:

ago. It's me and my cofounder, Omri. Omri is the CEO.

Speaker:

He's amazing. I love him. We've known each other for many

Speaker:

years. We met in academia, like, more than 10 years ago,

Speaker:

and we started Run AI together. And we started

Speaker:

Run AI because we saw that there are big challenges

Speaker:

around GPUs, around orchestrating

Speaker:

GPUs and utilizing GPUs. We saw back then,

Speaker:

in 2018, that GPUs were going to be very, very important.

Speaker:

It's like the basic component

Speaker:

that any AI company needs to train models,

Speaker:

right, and deploy models. So we saw that GPUs were going to be critical, but

Speaker:

there were also a lot of challenges with utilizing GPUs.

Speaker:

I think back then, GPUs were relatively new in

Speaker:

the data center, in the cloud.

Speaker:

GPUs were very well known in the gaming

Speaker:

industry. Right? We spoke before about gaming. Right? Like, a lot of

Speaker:

key things there that GPUs have been

Speaker:

enabling. But in the data center, they were relatively new, and the

Speaker:

entire software stack that

Speaker:

is running the cloud and the data center was built for

Speaker:

traditional microservices applications that are running

Speaker:

on commodity CPUs. And AI workloads are different. They are

Speaker:

much more compute intensive, they

Speaker:

run on GPUs, maybe on multiple nodes, on multi-GPU

Speaker:

machines, and GPUs are also very different.

Speaker:

Right? They are expensive, very scarce in the data center.

Speaker:

So the entire software stack was built for something else,

Speaker:

and when it comes to GPUs, it was really hard for many people to

Speaker:

actually manage those GPUs. So we came in, and we

Speaker:

saw those gaps. We built Run AI on top of

Speaker:

cloud native technologies like Kubernetes and containers. We're

Speaker:

big fans of those technologies, and

Speaker:

we added components around scheduling, around

Speaker:

GPU fractioning. So we enable

Speaker:

multiple workloads to run on a single GPU and

Speaker:

essentially over-provision GPUs. So we built this engine, which we

Speaker:

call the cluster engine, that runs in GPU

Speaker:

clusters. Right? We help machine learning teams to pool all of their GPUs into

Speaker:

one cluster running that engine, and that engine provides a lot of

Speaker:

performance and a lot of capabilities from those GPUs. And

Speaker:

on top of that, we built this control plane

Speaker:

and tools for machine learning

Speaker:

teams to run their Jupyter Notebooks, to run

Speaker:

training jobs, batch jobs, to deploy their models, right, just to

Speaker:

have tools for the entire life cycle of AI,

Speaker:

from training models in the lab to taking those models into

Speaker:

production and running them and serving actual users.

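To make the platform description above concrete, here is a minimal Python sketch of the core idea: one pooled cluster of GPUs handing out whole, multiple, or fractional allocations to notebooks, training jobs, and inference servers. This is a toy illustration under stated assumptions, not Run AI's actual API; all class and workload names are made up.

```python
# Hypothetical sketch: a cluster-wide GPU pool with fractional allocation.
from dataclasses import dataclass, field

@dataclass
class GPUPool:
    """A pooled cluster of GPUs supporting fractional and multi-GPU asks."""
    total_gpus: float
    allocations: dict = field(default_factory=dict)

    def free(self) -> float:
        return self.total_gpus - sum(self.allocations.values())

    def allocate(self, workload: str, gpus: float) -> bool:
        # Accepts fractions (0.25 of a GPU) as well as whole/multi-GPU asks.
        if gpus <= self.free():
            self.allocations[workload] = self.allocations.get(workload, 0.0) + gpus
            return True
        return False  # in a real scheduler the workload would be queued

    def release(self, workload: str) -> None:
        self.allocations.pop(workload, None)

pool = GPUPool(total_gpus=8)
pool.allocate("jupyter-notebook-alice", 0.25)  # light interactive work
pool.allocate("training-job-bert", 4)          # heavy batch training
pool.allocate("inference-resnet", 0.5)         # small serving workload
print(f"free GPUs: {pool.free()}")             # -> 3.25
```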
Speaker:

And that's the platform that we've built. And we're working with machine

Speaker:

learning teams across the globe on just managing,

Speaker:

orchestrating, and letting them get much more out of their GPUs, and essentially

Speaker:

run faster, train models faster in a much easier way, and

Speaker:

deploy those models in a much easier and faster and more efficient

Speaker:

way. Yeah. The thing that blew me away when I first heard of Run

Speaker:

AI, and this would have been 2021-

Speaker:

ish. No, early

Speaker:

2021, I would say. And it was the

Speaker:

idea of fractional GPUs. Right? So you can have one,

Speaker:

I say one, but, you know, realistically it's gonna be more than one, but you can

Speaker:

kind of share it out, which I think and we were talking in the virtual

Speaker:

green room about how, you know, some of these GPUs,

Speaker:

if you can get them, because there's a multi-month, sometimes multi-

Speaker:

year supply chain issue. I mean, these things are expensive bits of

Speaker:

hardware, and I think the real value, correct

Speaker:

me if I'm wrong, is, like, well... you know, I was talking to

Speaker:

somebody the other day, and we're basically talking about how we can,

Speaker:

you know, if you get, like, one laptop with a killer

Speaker:

GPU, right, that GPU is really only useful to that one

Speaker:

user. Whereas if you can kind of put it in a

Speaker:

server and use something like Run AI, now everybody in the organization can do

Speaker:

that. And these are not trivial expenses. I mean, these are, like, you know,

Speaker:

you-sell-a-kidney type of costs here.

Speaker:

Yeah. Absolutely. First of all, GPUs

Speaker:

are expensive. They cost a lot. Right?

Speaker:

And we provide technologies like fractional GPUs and

Speaker:

other technologies around scheduling that allow

Speaker:

teams to share GPUs. Right? So we just spoke about

Speaker:

GPU fractioning. So that's one layer of

Speaker:

sharing, where you have one GPU, which is really expensive.

Speaker:

And not all of the

Speaker:

AI workloads are really compute intensive and require the

Speaker:

entire GPU or, you know, maybe multiple GPUs. There are

Speaker:

workloads like Jupyter Notebooks where you have

Speaker:

researchers that are just

Speaker:

debugging their code or cleaning their data or doing some simple stuff,

Speaker:

and they need just fractions of GPUs.

Speaker:

In that case, if you have a lot of data scientists,

Speaker:

maybe you wanna host all of their notebooks on

Speaker:

a much smaller number of GPUs, because, right, each

Speaker:

one of them needs just fractions of a GPU. Another big use case

Speaker:

for fractions of GPUs is inference.

Speaker:

So now, all of these models are huge

Speaker:

and don't fit into the memory of one

Speaker:

GPU. But in computer vision,

Speaker:

there are a lot of models that are relatively small.

Speaker:

They run on a GPU, and you can essentially host multiple of

Speaker:

them on the same GPU. Right? So instead of

Speaker:

just one computer vision model running on a GPU, you can host 10

Speaker:

of those models on the same GPU and get a factor of

Speaker:

10x in your cost, in your

Speaker:

overall throughput of inference. So that's one

Speaker:

use case for fractional GPUs, and we're investing heavily in just

Speaker:

building that technology.

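A small sketch of the inference use case just described, assuming PyTorch and torchvision are installed and a CUDA device is present: several small vision models packed onto one GPU, so a single device serves many model endpoints instead of one. The model choice and replica count are illustrative.

```python
import torch
from torchvision import models

device = torch.device("cuda:0")

# Ten small classifiers resident on the same GPU at the same time.
replicas = [models.resnet18(num_classes=10).to(device).eval()
            for _ in range(10)]

batch = torch.randn(8, 3, 224, 224, device=device)  # dummy request batch

with torch.no_grad():
    # Each model would serve its own request stream; here we just loop.
    outputs = [m(batch) for m in replicas]

print(torch.cuda.memory_allocated(device) / 2**20, "MiB in use")
```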
Speaker:

Another layer of sharing GPUs comes where you

Speaker:

have maybe in your organization multiple teams

Speaker:

or multiple projects running in parallel. So

Speaker:

for example, take OpenAI. They now are working

Speaker:

on GPT-5. It's one project. That project needs a

Speaker:

lot of GPUs. And they have more projects. Right?

Speaker:

More research projects, around alignment or around

Speaker:

reinforcement learning. You know? DALL

Speaker:

E. Like, they have more than just one project. Then DALL-E, and

Speaker:

they have multiple models. Right? Exactly. They do. Right? So each

Speaker:

project needs GPUs. Right? Needs a lot of

Speaker:

GPUs. So if you can instead of

Speaker:

allocating GPUs entirely for each project,

Speaker:

you could essentially pool all of those GPUs and share

Speaker:

them between those different projects, different teams.

Speaker:

And in times where one project is idle and not

Speaker:

using its GPUs, other projects, other teams

Speaker:

can get access to those GPUs. Now orchestrating all of

Speaker:

that, orchestrating that sharing of resources between

Speaker:

projects, between teams, can be really complex and

Speaker:

requires this advanced scheduling,

Speaker:

which we're bringing into the game. We're bringing

Speaker:

those scheduling capabilities from the high performance computing world

Speaker:

known for those schedulers. And so we're bringing capabilities

Speaker:

from that world into the cloud native Kubernetes

Speaker:

world. Scheduling around batch scheduling,

Speaker:

fairness algorithms, things like that, so teams and projects

Speaker:

can just share GPUs in a simple and efficient

Speaker:

way. So those

Speaker:

are the two layers of sharing GPUs.

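A toy sketch of that second layer of sharing, in Python: each project has a guaranteed share of the pool, and GPUs left idle by one project are lent out to busier ones. This is a simplified policy for illustration, not Run AI's actual scheduling algorithm.

```python
def fair_share(guaranteed: dict[str, int], demand: dict[str, int]) -> dict[str, int]:
    """Give each project min(demand, guarantee), then lend out idle GPUs.

    Assumes the pool size equals the sum of the guarantees.
    """
    alloc = {p: min(demand[p], g) for p, g in guaranteed.items()}
    spare = sum(guaranteed.values()) - sum(alloc.values())
    hungry = [p for p in alloc if demand[p] > alloc[p]]
    while spare > 0 and hungry:
        for p in list(hungry):           # hand out spare GPUs one at a time
            if spare == 0:
                break
            alloc[p] += 1
            spare -= 1
            if alloc[p] == demand[p]:
                hungry.remove(p)
    return alloc

# Project B is idle, so A and C can temporarily exceed their guarantees.
print(fair_share({"A": 4, "B": 4, "C": 4}, {"A": 7, "B": 0, "C": 6}))
# -> {'A': 6, 'B': 0, 'C': 6}
```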
Speaker:

Interesting. And I think as this field matures

Speaker:

and it matures in the enterprise, I think you're gonna see organizations

Speaker:

kind of be

Speaker:

more, I think, savvy about, like, okay, like you said, data scientists,

Speaker:

if they're just doing, like, you know, traditional statistical modeling, it really doesn't benefit

Speaker:

from GPUs, or they're just doing data cleansing, data engineering.

Speaker:

Right? They're probably gonna say, like, well, let's run it on this cluster, and

Speaker:

then we'll break it apart into discrete parts where, you

Speaker:

know, then we will need a GPU. And I also like the idea that, you

Speaker:

know, you're basically doing what I learned in college,

Speaker:

which was time slicing. Right? Sounds like this is kind of, like, everything old is

Speaker:

new again. Right? I mean, this is, obviously, you know, when you're

Speaker:

taking kind of that old mainframe concept and applying it to something like Kubernetes,

Speaker:

orchestration is gonna be a big deal, because these are not systems that were

Speaker:

built from the ground up to have time slicing. Is that a

Speaker:

good kind of explanation? Yeah. Absolutely.

Speaker:

Absolutely. I like that analogy. Yeah. Exactly. Time

Speaker:

slicing, it's one

Speaker:

implementation, yeah, that we

Speaker:

enable around fractionalizing GPUs,

Speaker:

and I agree. When you have resources, it

Speaker:

can be different kinds of resources. Right? It can be CPU

Speaker:

resources, and networking also.

Speaker:

You know, people created technology to share the

Speaker:

networking, the communication going through those networks,

Speaker:

the bandwidth of the networking. We're doing it

Speaker:

for GPUs. Right? Sharing those

Speaker:

resources. And I think now, interestingly,

Speaker:

LLMs are also becoming a kind

Speaker:

of resource as well, right, that people need access

Speaker:

to. Right? You have those models, you have GPT, ChatGPT.

Speaker:

A lot of people are trying to get access to

Speaker:

that resource, essentially. And I think it's interesting,

Speaker:

because you kinda pointed this out, but it's something that, I think,

Speaker:

if you're in the gen AI space, you kinda don't... it's so obvious,

Speaker:

like air. You don't think about it. Right? But when you

Speaker:

get inference on traditional AI, somebody once referred to it

Speaker:

as legacy AI, right, where, on

Speaker:

the inference side of the equation, you don't really need a lot of compute power.

Speaker:

Right? Like, it's not really a heavy lift. Right? But with generative

Speaker:

AI, you do need a lot of compute on

Speaker:

I guess it's not really inference, but on the other side of the use,

Speaker:

while it's actually in use, not just the training. Right? So traditionally,

Speaker:

GPU heavy use in training, and then inference, not so

Speaker:

much. Now we need heavy use before, after, and during,

Speaker:

which I imagine your technology would help with, because, I mean, look, I

Speaker:

love ChatGPT. I'm one of the first people to sign up for

Speaker:

a subscription. But even, you know, they had trouble keeping

Speaker:

up, and they have a lot of money, a lot of power, a lot of

Speaker:

influence. So I mean, this is something that if you're just a

Speaker:

regular old enterprise, this is probably something they struggle

Speaker:

with. Right? Right. Yeah. I absolutely

Speaker:

agree. It's an amazing point, Frank.

Speaker:

So a year

Speaker:

ago, the inference use case on

Speaker:

GPUs wasn't that big. Totally agree. That's also what we

Speaker:

saw in the market.

Speaker:

Deep learning convolutional neural networks were

Speaker:

running on GPUs,

Speaker:

mostly for computer vision applications,

Speaker:

but they could also run on CPUs, and you could get,

Speaker:

like, relatively okay performance.

Speaker:

If you needed maybe, like, a very low latency, then

Speaker:

you might use GPUs because they're much faster and you get much

Speaker:

lower latency. But

Speaker:

it was, and it still is, very

Speaker:

difficult to deploy models on GPUs compared to just deploying

Speaker:

those models on CPUs, because deploying models, deploying applications on

Speaker:

CPUs, you know, people have been doing for so many years.

Speaker:

So

Speaker:

many times it was much easier for people to just deploy their

Speaker:

models on CPUs and not on GPUs. So that was, like, the

Speaker:

fallback to CPUs. But

Speaker:

then, as you said, ChatGPT was introduced a

Speaker:

little bit more than a year ago, and the generative

Speaker:

AI use case just blew up. Right? And it's

Speaker:

it's inference essentially. And those models are

Speaker:

so big that they can't really run on

Speaker:

CPUs. LLMs are running in production on

Speaker:

GPUs, and now the inference use case on

Speaker:

GPUs is just exploding in the market

Speaker:

right now. It's really big. There is a lot of demand for

Speaker:

GPUs for inference. And

Speaker:

for OpenAI, they need to support this

Speaker:

huge scale that, I guess, just

Speaker:

them are seeing, maybe a

Speaker:

few more companies, but that's, like, huge, huge scale.

Speaker:

But I think that we will see more and more companies

Speaker:

building products based on AI, on

Speaker:

LLMs. And we'll see more and more

Speaker:

applications using AI, which

Speaker:

then that AI runs on GPUs. So that is going to grow,

Speaker:

and that's an amazing new market for us around

Speaker:

AI. And for me as a CTO, it was so fun to

Speaker:

get into that market, because it now comes with

Speaker:

new problems, new challenges,

Speaker:

new use cases compared to deep learning

Speaker:

on GPUs. New pains, because

Speaker:

the models are so big. Right? Right. And

Speaker:

challenges around cold start problems, about auto scaling,

Speaker:

about

Speaker:

just giving access to LLMs. So a lot of

Speaker:

challenges, new challenges there. We at Run AI are studying those problems,

Speaker:

and we're now building solutions for those problems,

Speaker:

and I'm really, really excited about the inference use case.

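One of those new inference challenges, autoscaling, can be sketched as a toy policy: grow the replica count with request-queue depth, while remembering that every new replica pays a model-loading cold start. The thresholds, timings, and function names below are hypothetical, not Run AI's implementation.

```python
TARGET_QUEUE_PER_REPLICA = 16   # in-flight requests one replica can absorb
COLD_START_SECONDS = 90         # assumed time to load LLM weights onto a GPU

def desired_replicas(queue_len: int, max_replicas: int = 8) -> int:
    want = -(-queue_len // TARGET_QUEUE_PER_REPLICA)  # ceiling division
    return max(1, min(want, max_replicas))

def reconcile(queue_len: int, current: int) -> int:
    want = desired_replicas(queue_len)
    if want > current:
        print(f"scale up {current} -> {want}: each new replica pays "
              f"~{COLD_START_SECONDS}s of model-loading cold start")
    elif want < current:
        print(f"scale down {current} -> {want}: freed GPUs return to the pool")
    return want

replicas = reconcile(queue_len=120, current=2)  # a traffic burst -> 8 replicas
```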
Speaker:

That is very cool. So just going back a little bit.

Speaker:

I was trying to keep up. I promise. But Run AI is

Speaker:

I get it. Run AI's platform

Speaker:

supports fractional GPU usage.

Speaker:

It also sounds to me, maybe I misunderstood,

Speaker:

that in order to achieve that, you first had to, or

Speaker:

or maybe along with that, you made it possible to use multiple

Speaker:

GPUs. You've created something like

Speaker:

an API that allows companies

Speaker:

to take advantage of multiple GPUs or fractions of

Speaker:

GPUs. Did I miss that? No, that's

Speaker:

right. That's right, Andy. And Okay.

Speaker:

So we've built this way

Speaker:

for people to scale their workloads from fractions

Speaker:

of GPUs to multiple GPUs within one machine,

Speaker:

Okay. To multiple machines. Right? You

Speaker:

have big workloads running on multiple nodes

Speaker:

of GPUs. So think about it: you have

Speaker:

multiple users each running their own

Speaker:

workload. Some are running on fractions of GPUs. Some are

Speaker:

running batch jobs on a lot of

Speaker:

GPUs. Some are deploying models and running them

Speaker:

in inference, and some are just launching their Jupyter

Speaker:

Notebooks. All of that is happening on the same

Speaker:

pool of GPUs, same cluster. So you need

Speaker:

this layer of orchestration, of scheduling, just to

Speaker:

manage everything and make sure that everything is getting the

Speaker:

right access to the right

Speaker:

GPUs, and everything is scheduled according to

Speaker:

priorities. Yeah. Well, being just, you know, a

Speaker:

mere data engineer here, talking about all of that

Speaker:

analytics workload, that sounds very

Speaker:

complex. So, as you

Speaker:

mentioned earlier, you know, you were talking about how traditional coding

Speaker:

is targeting CPUs, and that's my background.

Speaker:

You know, I've written applications and done data work targeted for

Speaker:

traditional work. I can't imagine just how complex

Speaker:

that is, because GPUs came into AI

Speaker:

as a unique solution,

Speaker:

designed to solve problems that they weren't really built

Speaker:

for. You know, GPUs were built for graphics, and you mentioned that.

Speaker:

But the fact that they have to be

Speaker:

so parallel internally, I think, just added this

Speaker:

dimension to it. And I don't know who came up

Speaker:

with that idea, you know, who thought of, well, goodness, we could

Speaker:

use all of this, you know, massive parallel processing to

Speaker:

run this other class of problems. So pretty

Speaker:

cool idea. But I just, yeah, I'm amazed. It's even

Speaker:

cooler than that. Because, yeah, a wise man once told me,

Speaker:

he goes, GPUs are really good at solving linear

Speaker:

algebra problems, and if you're clever enough, you can

Speaker:

turn anything into a linear algebra problem.

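That linear algebra point can be made concrete with a short sketch, assuming NumPy and CuPy are installed and a CUDA device is present: the same matrix multiply runs on the CPU and the GPU, and the results agree up to float32 rounding.

```python
import numpy as np
import cupy as cp

n = 4096
a_cpu = np.random.rand(n, n).astype(np.float32)
b_cpu = np.random.rand(n, n).astype(np.float32)

c_cpu = a_cpu @ b_cpu              # CPU matrix multiply via NumPy

a_gpu = cp.asarray(a_cpu)          # copy the operands into GPU memory
b_gpu = cp.asarray(b_cpu)
c_gpu = a_gpu @ b_gpu              # the same math, dispatched to the GPU
cp.cuda.Stream.null.synchronize()  # wait for the GPU kernel to finish

print(np.allclose(c_cpu, cp.asnumpy(c_gpu), atol=1e-2))  # -> True
```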
Speaker:

And even simulating quantum computers when I was kind of, like, going through that,

Speaker:

I was like, mhmm, you know, like, gee, looks like this

Speaker:

will be useful there too. Right? So it's an interesting,

Speaker:

It's an interesting thing. So, like, you know, everyone is, you know,

Speaker:

everyone's talking about how this is, you know, we're in the hype cycle, but I

Speaker:

think if you're in the GPU space, you have a pretty good run, because one,

Speaker:

these things are gonna be important. Right? Whether or not, you

Speaker:

know, the hype cycle will kinda crash, and what that'll look like,

Speaker:

I think they're gonna be important anyway. Right? Because they're gonna be just the cost of

Speaker:

doing business, table stakes, as the cool kids like to say. But

Speaker:

also, over the next horizon, simulating quantum

Speaker:

computers is going to be the next big hype cycle.

Speaker:

Right? Or one of them. Right? So, like, it's

Speaker:

a foundational technology that I think we

Speaker:

didn't think would be a foundational technology even, like, 6, 7 years

Speaker:

ago. Right? Yeah.

Speaker:

I'll go with a few things that you said,

Speaker:

regarding the parallel computation, right, and just running

Speaker:

linear algebra calculations on GPUs

Speaker:

and accelerating such workloads.

Speaker:

Nvidia, I love Nvidia. Nvidia

Speaker:

has this big vision, and they had a big

Speaker:

vision around GPUs already in 2006 when

Speaker:

they built CUDA. Yep. Right. So

Speaker:

they were built just for that. Right? GPUs were

Speaker:

used for graphics processing, for gaming.

Speaker:

Right? Great use case. Great market.

Speaker:

But they had this vision of bringing more

Speaker:

applications to GPUs, just accelerating more applications,

Speaker:

and mainly applications with a lot of linear

Speaker:

algebra calculations. And they

Speaker:

created that, they created CUDA

Speaker:

to simplify that. Right? To allow more

Speaker:

developers to use GPUs because just using GPUs

Speaker:

directly, that's so complex. That's so hard.

Speaker:

So they built CUDA to bring more developers, to bring more

Speaker:

applications. And they started in

Speaker:

2006. But think about it: the

Speaker:

big breakthrough in AI happened just in

Speaker:

2012, 2013 with

Speaker:

AlexNet and the Toronto researchers

Speaker:

who used 2 GPUs, actually, because they

Speaker:

trained AlexNet on 2 GPUs, and they had

Speaker:

CUDA, so for them it was feasible to train their

Speaker:

model on a GPU. And that was the new thing that they did.

Speaker:

They were able to train a much bigger model with

Speaker:

more parameters than ever before, because they used

Speaker:

GPUs, because the training process ran much

Speaker:

faster. And

Speaker:

that triggered the entire

Speaker:

revolution, the AI hype, that we're seeing now. So

Speaker:

from 2006, when Nvidia started to build CUDA, until

Speaker:

2013, right, 7 years, then we started to see

Speaker:

those big breakthroughs. And in the last decade,

Speaker:

it's just exploding, and we're seeing more and more applications.

Speaker:

The entire AI ecosystem is running

Speaker:

on GPUs. So that's amazing to see. It's impressive.

Speaker:

And, like, people don't realize the revolution we're seeing today

Speaker:

really started in 2006, like you said. I didn't even put 2 and 2

Speaker:

together until I was listening to a podcast. I think it's called Acquired.

Speaker:

Really good podcast. Right? Like, they don't pay me to say that or

Speaker:

whatever, but they did a 3 hour deep dive on the history of

Speaker:

NVIDIA. 3 hours. I couldn't stop listening.

Speaker:

Right? Like... Nice. You know, yeah, we tried a long-form, like, multi-hour

Speaker:

podcast. We weren't that entertaining, apparently. But the way they

Speaker:

go through the history of this where it was basically Jensen Huang. Hopefully, I said

Speaker:

his name right. He was, like, we wanna be a player, not just in gaming,

Speaker:

but also in scientific computing. This is 2005, 2006,

Speaker:

which at the time seemed kind of, like, a little out there, a little kooky.

Speaker:

But what you're seeing today is, like, the fruits of the

Speaker:

seeds that he planted, you know, almost 20 years ago, like, 19,

Speaker:

20 years ago. So, you know, when people look at

Speaker:

NVIDIA and say it's an overnight success, I'm like, well, I don't know about that,

Speaker:

you know, but no. I mean, you're right. And it's

Speaker:

probably not a coincidence that once they made it easy to take these

Speaker:

massively parallel processor, say that 10 times

Speaker:

fast on a Thursday morning. But also

Speaker:

make it so it's a lot easier for developers to use. Right? And I'll quote

Speaker:

the great Steve Ballmer, developers, developers, developers. Right?

Speaker:

So it's just fascinating. And

Speaker:

I think that, you know, we've really unleashed a flood

Speaker:

gate of creativity in terms of researchers and applied

Speaker:

research. And, I mean, I think that what's really cool

Speaker:

about your product is that you're kind of making this, what is

Speaker:

now a scarce resource... maybe at some point

Speaker:

in time, GPUs won't cost an arm and a leg.

Speaker:

But, like, for now, I think the one thing that I've seen

Speaker:

that I think is not obvious for the casual

Speaker:

observer is, if an

Speaker:

organization, like a large enterprise, can pool their resources, they have a lot more

Speaker:

money to buy better GPUs. And you offer a platform where

Speaker:

everybody can get a stake in it. Right? As opposed to, you know,

Speaker:

that department is gonna hog everything. Right? And,

Speaker:

here's a question. Do you have, like, an audit trail where you could

Speaker:

kinda, you know, figure out, like, you know, Andy's department's really

Speaker:

hogging the GPUs? No, no, no, it's Frank. Frank is, like, mining Bitcoin or

Speaker:

whatever. Like, do you have some kind of audit trail like that?

Speaker:

Yeah. I love that you mentioned hogging.

Speaker:

GPU hogging. Mhmm. We use that term as well.

Speaker:

Right? Because it's so difficult sometimes to get

Speaker:

access to GPUs. So when you get access to a GPU

Speaker:

as a researcher, as an ML practitioner,

Speaker:

you don't wanna let it go. Right? 'Cause if

Speaker:

you let it go, someone else will take it and hog it. Right?

Speaker:

So you're getting this GPU hogging problem.

Speaker:

What we do to solve that is

Speaker:

that we do provide monitoring and visibility

Speaker:

tools into who is using what, and who is actually

Speaker:

utilizing their GPUs, and so on. But more

Speaker:

than that, we

Speaker:

allow the researchers just to give up their GPUs and not hog their

Speaker:

GPUs, because we provide this concept of

Speaker:

guaranteed quotas. So each researcher or

Speaker:

each project or each team has their own guaranteed

Speaker:

quotas of GPUs that are always available for them.

Speaker:

Whenever they get access to the cluster, they will get, like, you

Speaker:

know, their 2 GPUs or 4 GPUs, or a quarter of a

Speaker:

GPU; it's guaranteed. So they can

Speaker:

just let go of their GPUs and not hog them. That's one

Speaker:

thing. The second thing is that they

Speaker:

can also go above their quota. They can

Speaker:

use the GPUs of other teams or other users, if

Speaker:

they are idle, and they can run these preemptible jobs

Speaker:

in an opportunistic way and utilize those GPUs.

Speaker:

And so in that way, they are not limited

Speaker:

to fixed quotas, to hard

Speaker:

limits. They can just take as many GPUs

Speaker:

as they want from their clusters, if those GPUs are available

Speaker:

and idle. Right? But if someone needs those GPUs,

Speaker:

because those GPUs are guaranteed to them, we will make sure, our

Speaker:

scheduler, the Run AI scheduler, the Run AI platform, will make

Speaker:

sure to preempt workloads

Speaker:

and give those guaranteed GPUs to the right users.

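A toy Python sketch of the policy described here: jobs within a team's guaranteed quota always run, jobs over quota run opportunistically on idle GPUs, and the scheduler preempts those over-quota jobs when an owner needs its guaranteed GPUs back. This is a simplified illustration, not the actual Run AI scheduler; team names and numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class Job:
    team: str
    gpus: int
    preemptible: bool = False  # set when the job runs over its team's quota

class Scheduler:
    def __init__(self, total_gpus: int, quotas: dict[str, int]):
        self.total, self.quotas, self.running = total_gpus, quotas, []

    def used(self, team: str) -> int:
        return sum(j.gpus for j in self.running if j.team == team)

    def free(self) -> int:
        return self.total - sum(j.gpus for j in self.running)

    def submit(self, job: Job) -> bool:
        over_quota = self.used(job.team) + job.gpus > self.quotas[job.team]
        if job.gpus > self.free():
            if over_quota:
                return False  # opportunistic jobs simply wait
            # Guaranteed demand: evict over-quota (preemptible) jobs first.
            for victim in [j for j in self.running if j.preemptible]:
                self.running.remove(victim)  # would be re-queued in reality
                if job.gpus <= self.free():
                    break
            if job.gpus > self.free():
                return False
        job.preemptible = over_quota
        self.running.append(job)
        return True

sched = Scheduler(total_gpus=8, quotas={"nlp": 4, "vision": 4})
sched.submit(Job("nlp", 4))     # within quota: guaranteed
sched.submit(Job("nlp", 4))     # over quota, GPUs idle: runs preemptibly
sched.submit(Job("vision", 4))  # owner arrives: preempts the extra nlp job
print([(j.team, j.gpus, j.preemptible) for j in sched.running])
# -> [('nlp', 4, False), ('vision', 4, False)]
```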
Speaker:

Oh, that's cool. Alright. So one last

Speaker:

question before we switch over to the stock questions, 'cause I could geek

Speaker:

out and look at this for hours. Yep. This could be a

Speaker:

long form. Sure. This could be. Yeah. And I wanna be respectful

Speaker:

of your time because you're an important guy, and it's also late where you are.

Speaker:

So who deals with this? Like, who would set up these quotas? Is it

Speaker:

the data scientist? Is it IT ops? Like, who...

Speaker:

Do you... obviously, the data scientists, the researchers, they all

Speaker:

benefit from this product. But who's actually administering it? Right? Like,

Speaker:

who is it you know, do I have to talk to, you know,

Speaker:

Say, pretend Andy's in ops. Do I have to say, hey, Andy, I really need

Speaker:

a boost in my quota? You know, like, I mean, who does it? Or,

Speaker:

maybe... this sounds like, as I say it, I'm like, yeah, that wouldn't

Speaker:

work. Like, I'm the researcher. I'm gonna turn the dial up on my own. Like

Speaker:

Like, who's the primary? Obviously, we know who the

Speaker:

primary beneficiary is, but who's the primary user?

Speaker:

So, okay. Great. So if you have a team, right, if

Speaker:

you're a team of researchers, all of you need access to

Speaker:

GPUs, so maybe the team lead

Speaker:

is the one who's managing the quotas for the different

Speaker:

team members. And if you have multiple teams,

Speaker:

then you might have a department manager or an admin of the

Speaker:

cluster or a platform owner that will allocate the

Speaker:

quotas for each team, right? And then those teams would

Speaker:

manage their own quotas within what

Speaker:

they were given. Right? So it's like a hierarchical

Speaker:

thing. In a hierarchical manner, people can manage their own

Speaker:

quotas, their own priorities, their own access to the

Speaker:

GPUs within their teams.

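The hierarchy just described can be pictured as nested quota budgets, from cluster admin to team lead to individual researcher. A tiny hypothetical sketch, with made-up names and numbers:

```python
# Nested GPU quotas: admins allocate to teams, team leads to members.
cluster_quota = {
    "research-dept": {
        "nlp-team": {"alice": 2, "bob": 2},
        "vision-team": {"carol": 3, "dave": 1},
    },
}

for dept, teams in cluster_quota.items():
    for team, members in teams.items():
        total = sum(members.values())
        print(f"{dept}/{team}: {total} GPUs guaranteed across {len(members)} users")
```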
Speaker:

Okay. So it's kind of like a hybrid, like, you know, it's like a budget

Speaker:

almost. Right? Like, you know, you get this much, figure it out

Speaker:

amongst yourselves. Exactly. So we're trying to decentralize

Speaker:

how the quotas are being managed and how the GPUs are being accessed.

Speaker:

So, you know, we're giving as much power, as much

Speaker:

control to the end users as possible. Sure. That's

Speaker:

it sounds like a great administrative question, very

Speaker:

important. And I imagine, because a little bird told

Speaker:

me that, you know, your

Speaker:

provisioning of these GPU resources

Speaker:

is not the only thing that, enterprises have to deal

Speaker:

with. So it's interesting. It's not just GPUs.

Speaker:

It's compute. Like, sure, it's not limited. Although, because

Speaker:

of what you said, you know, managing GPUs is an order of magnitude harder,

Speaker:

because they were never really built for this. Right? Right. You

Speaker:

know, we're talking about technology that wasn't really in the server room until a few

Speaker:

years ago. Right? This isn't a tried-and-true, this-is-

Speaker:

how-it-works kind of thing, you know? Right. But we hit that point in the

Speaker:

show where we'll switch to the pre-formed questions.

Speaker:

These are not complicated. I mean, you know, we're not Mike

Speaker:

Wallace or, like, you know, 60 Minutes or whatever. We're not trying to trap you

Speaker:

or anything. But since I've been gabbing on most of the show, I

Speaker:

figured I'll let Andy kick this off. Well, thanks, Frank. And I don't think

Speaker:

you were gabbing on. You know more about this, so now I do. So I'm

Speaker:

just a lowly data engineer. I'll plug... No. You, if you

Speaker:

will. Data engineers are the heroes we need.

Speaker:

Well, I'm gonna plug Frank's Roadies versus Rock Stars

Speaker:

writing on LinkedIn. They're good articles about this.

Speaker:

But, let's see. How did you,

Speaker:

how did you find your way into this field?

Speaker:

And did this field find you, or did you find it?

Speaker:

This field totally found me. Awesome.

Speaker:

Yeah. I've...

Speaker:

I did my postdoc, and I've been at Bell Labs.

Speaker:

And Yann LeCun came to Bell Labs and

Speaker:

gave a presentation about AI. It was around 2017,

Speaker:

And Yann LeCun spent a lot of years at Bell Labs,

Speaker:

and his presentation was amazing. And

Speaker:

When I heard him talking about AI,

Speaker:

I said, okay, that's the space where I wanna be. It's going to change

Speaker:

the world. There is this new, amazing technology here that

Speaker:

is going to change everything. And I knew that I wanted to start

Speaker:

a company in the AI space for sure.

Speaker:

Cool. That's a good answer. So cool.

Speaker:

Yeah. That's cool. I was at Bell Labs,

Speaker:

doing a presentation a while ago, and I didn't realize that he

Speaker:

worked at Bell Labs because, like, you know, the guy was like, no. No.

Speaker:

He used to work here, like, in this building. I was like, no way. Because

Speaker:

I knew him as the guy from NYU. Right? Like, that's who I thought. Right.

Speaker:

Or the guy from Meta. Yeah. And now the guy from Meta. Right? Like

Speaker:

so it's interesting how that, you know... They have

Speaker:

these amazing pictures from the nineties where they

Speaker:

ran, like, deep learning models on very old machines,

Speaker:

recognizing, like,

Speaker:

numbers on the computer. Maybe you saw those pictures. Like, amazing.

Speaker:

MNIST. It's the MNIST problem. Is that... Yep.

Speaker:

Right. Exactly. Exactly. Cool.

Speaker:

So second question is, what's your favorite part of your current job?

Speaker:

That everything is changing so fast.

Speaker:

Things are moving so fast. We've been in this business for 6

Speaker:

years, and the entire

Speaker:

space is moving and

Speaker:

advancing. And so many people are working in

Speaker:

this field. New innovations, new tools,

Speaker:

new advancements are getting out every day.

Speaker:

You know, just 6 years ago, it was about deep learning and computer

Speaker:

vision. And now it's about language models

Speaker:

and generative AI. And we're just at the start,

Speaker:

right, there are so many amazing things that are going to happen

Speaker:

in this space, and I love it. Absolutely.

Speaker:

So we have 3 fill-in-the-blank

Speaker:

sentences here. The first is: complete this

Speaker:

sentence. When I'm not working, I enjoy blank.

Speaker:

You'll get a very boring answer. And

Speaker:

so, this is just spending time with

Speaker:

friends and family, because I think

Speaker:

that I'm always working. Like, if you ask my wife,

Speaker:

she'll tell you that I'm working 24 hours. And

Speaker:

Yeah. So I don't have much time when I'm not

Speaker:

working. So when I'm

Speaker:

not working, then I'm trying to be with my kids and my

Speaker:

wife and friends. Cool.

Speaker:

Cool. The second complete-the-sentence: I think

Speaker:

the coolest thing about technology today is

Speaker:

blank. And this, I really wanna hear your perspective on that.

Speaker:

Yeah. I think everyone will say AI, right? Or something in

Speaker:

AI. Yeah.

Speaker:

I think there are so many

Speaker:

new innovations that are coming around LLMs.

Speaker:

I think everything relating to

Speaker:

search, right? Searching in data, getting

Speaker:

insights from data, it's all going to change. We're going to have

Speaker:

a new interface. Right? Just getting

Speaker:

insights from data with

Speaker:

natural language, you know, no SQL and, you

Speaker:

know, no needing to program and stuff like that.

Speaker:

Just with natural language, you could

Speaker:

do amazing stuff with data. I think,

Speaker:

We're seeing this,

Speaker:

advancement in, like,

Speaker:

digital twins right now. You can

Speaker:

fake my voice

Speaker:

and your voice, and fake my image and your image. And,

Speaker:

you know, in the

Speaker:

future, we'll have digital twins of us, right,

Speaker:

doing this stuff. That would be amazing. So a lot of

Speaker:

amazing stuff is going to happen in the next few years

Speaker:

for sure. Very cool. Our last complete-the-sentence:

Speaker:

I look forward to the day when I can use technology to

Speaker:

blank.

Speaker:

To have a robot in my house.

Speaker:

Yeah. Yeah. Sweeping the floor instead of

Speaker:

me doing that, right, cleaning dishes and things like that.

Speaker:

If that would happen, that would be amazing. Right? That's a

Speaker:

good answer. Yeah. I agree. I have 3

Speaker:

boys, 4 dogs. So, like, cleaning is constant.

Speaker:

Yeah. Yeah. It's heavy cleaning. They range from, like, 1 to, like,

Speaker:

a teenager. So, and fighting

Speaker:

with them to, like, empty the dishwasher takes a lot more mental

Speaker:

energy than it should, but that's probably a subject for another

Speaker:

type of show.

Speaker:

The next question is share something different about yourself,

Speaker:

and we always like to joke, like, well, let's just make sure that we keep

Speaker:

our clean iTunes rating. So... Yeah. Yeah. What...

Speaker:

Yeah. Well, this

Speaker:

is a hard question; I needed to think about it.

Speaker:

So, I found 2 answers that I can say. So one

Speaker:

is about my professional life, right, I think that

Speaker:

it's somewhat different that I'm coming with this background from

Speaker:

both academia and the industry. So I love academia. I love to research

Speaker:

problems. I love to understand problems in a deep

Speaker:

way, and combining it with startups in the industry.

Speaker:

And in my past, I worked for chip companies, for hardware

Speaker:

companies. I worked for Intel, for a startup, and for Apple. I

Speaker:

did chip stuff, and now Run AI is a software company. So it's really

Speaker:

like a diverse background of academia, hardware,

Speaker:

software. So I love that, and, like, I love to do

Speaker:

a few things, and so that, I think, is different.

Speaker:

And the second answer that I could find

Speaker:

is, that I have a nickname that goes with me

Speaker:

since my high school days, which is the Duke.

Speaker:

The Duke. All of them are calling me the Duke. It's like,

Speaker:

they don't call me Ronen, just the Duke. So... That's funny.

Speaker:

Yeah. That's awesome.

Speaker:

Audible is a sponsor of Data Driven,

Speaker:

And you can go to the datadrivenbook.com.

Speaker:

And if you do that, you can sign up for a free

Speaker:

month of Audible. And if you decide later to

Speaker:

then join Audible, use one of their sign-up plans,

Speaker:

then Frank and I get to split a cup of coffee, I think,

Speaker:

out of that. And, every little bit helps. So we really

Speaker:

appreciate that when you do. What we'd like to ask

Speaker:

is: do you listen to audiobooks? And if you

Speaker:

do... Okay, good, I see you nodding. So do you have a recommendation? Do you

Speaker:

have a favorite book or two you'd like to share? Yeah.

Speaker:

So I'm a heavy user of Audible. I'll give you

Speaker:

a classic book, a classic for

Speaker:

entrepreneurs: The Hard Thing

Speaker:

About Hard Things by Ben Horowitz.

Speaker:

It's a classic book, love it. It really had a lot of impact

Speaker:

on me. I read it when we started Run AI,

Speaker:

and I recommend it for every

Speaker:

entrepreneur to read, and for everyone to read. It's, like, a

Speaker:

Cool. Amazing book. Yep. Awesome. I

Speaker:

have a flight to Vegas this next week, so I'll definitely be listening to

Speaker:

it then. And finally, where can people learn more about you

Speaker:

and Run AI? The best

Speaker:

place will be on our website, run.ai.

Speaker:

Yeah. And on social: LinkedIn, Twitter will

Speaker:

do. Awesome. Any parting thoughts?

Speaker:

I really enjoyed this episode. Love to speak about GPUs, love the AI based

Speaker:

on it. I had a lot of fun. Thank you for having me here. Awesome.

Speaker:

It was an honor to have you, and every once in a while, Andy

Speaker:

and I will do deep-dive kinda shows. We'd love to invite you back if

Speaker:

you wanna do one just on GPUs, because I know where my knowledge

Speaker:

drops off; you probably could pick up on

Speaker:

that. And with that, I'll let the nice

Speaker:

AI British lady end the show. And just like

Speaker:

that, dear listeners, we've come to the end of another enlightening

Speaker:

episode of the Data Driven podcast. It's always a

Speaker:

bittersweet moment like finishing the last biscuit in the tin,

Speaker:

satisfying, yet leaving you wanting just a bit more. A

Speaker:

colossal thank you to each and every one of you tuning in from across the

Speaker:

digital sphere. Without you, we're just a bunch of

Speaker:

ones and zeros floating in the ether. Your support is what

Speaker:

keeps this digital ship afloat, and believe me, it's much appreciated.

Speaker:

Now, if you found today's episode as engaging as a duel of wits with

Speaker:

a sophisticated AI, which I assure you, is quite

Speaker:

enthralling, then do consider subscribing to Data Driven.

Speaker:

It's just a click away and ensures you won't miss out on our future true

Speaker:

adventures in data and tech. And if you're feeling

Speaker:

particularly generous, why not leave us a 5 star review?

Speaker:

Just like a well programmed algorithm, your positive feedback helps

Speaker:

us reach more curious minds and keeps the quality content flowing.

Speaker:

It's the digital equivalent of a hearty handshake.

Speaker:

So, until next time, keep those neurons firing, those

Speaker:

subscriptions active and those reviews glowing. I'm

Speaker:

Bailey, your British AI lady, signing off with a heartfelt

Speaker:

cheerio and a reminder to stay data driven.