1
00:00:00,030 --> 00:00:05,700
I'm Miko Pawlikowski,
and this is HockeyStick.

2
00:00:06,370 --> 00:00:11,190
Today, we're talking about LLMOps
and how it differs from MLOps,

3
00:00:11,209 --> 00:00:13,479
MLE, and other cool acronyms.

4
00:00:13,640 --> 00:00:14,980
Do we need a new discipline?

5
00:00:15,230 --> 00:00:18,709
How different is it really to work
with large language models compared

6
00:00:18,710 --> 00:00:20,069
to any other piece of software?

7
00:00:20,430 --> 00:00:22,700
Why do models deteriorate over time?

8
00:00:22,990 --> 00:00:27,720
I'm joined by Abi Aryan, the author
of LLMOps: Managing Large Language

9
00:00:27,740 --> 00:00:32,740
Models in Production, as well as What
Is LLMOps?, published by O'Reilly.

10
00:00:33,235 --> 00:00:36,175
Abi is a founder of Abide AI.

11
00:00:36,714 --> 00:00:39,960
Welcome to this episode and thank
you for flying HockeyStick.

12
00:00:40,916 --> 00:00:45,426
LLMOps versus MLOps versus MLE.

13
00:00:46,376 --> 00:00:48,666
Can you tell me what's the
difference between the three of them?

14
00:00:49,646 --> 00:00:54,706
So, in very simple words, I
think MLOps and LLMOps:

15
00:00:54,726 --> 00:00:55,706
those are frameworks.

16
00:00:55,776 --> 00:00:59,516
Machine learning engineering is
a discipline or an engineering

17
00:00:59,556 --> 00:01:00,586
practice, I would say.

18
00:01:00,946 --> 00:01:03,416
So it's more like a role or a practice.

19
00:01:03,696 --> 00:01:06,566
I would keep that separate
from both of those.

20
00:01:06,856 --> 00:01:10,776
But let me define the difference
between MLOps and LLMOps.

21
00:01:11,226 --> 00:01:14,356
So most of the conventional machine
learning models that we have seen till

22
00:01:14,376 --> 00:01:19,246
date were discriminative models, which means
they were very predictable in the way

23
00:01:19,246 --> 00:01:21,476
that they were making their inferences.

24
00:01:22,236 --> 00:01:25,116
The models that we are working with
right now, large language models,

25
00:01:25,216 --> 00:01:26,646
they are generative in nature.

26
00:01:27,066 --> 00:01:32,586
So one of the core differences
between MLOps and LLMOps is what

27
00:01:32,586 --> 00:01:34,286
kind of model are we working with?

28
00:01:34,316 --> 00:01:37,056
Are we working with a
discriminative model or are we

29
00:01:37,096 --> 00:01:38,696
working with a generative model?

30
00:01:39,536 --> 00:01:44,366
The big difference really happens
because when we're talking about

31
00:01:44,406 --> 00:01:49,276
generative models, they're not really
generating things at the same scale as

32
00:01:49,456 --> 00:01:51,126
conventional machine learning models.

33
00:01:51,466 --> 00:01:56,276
The size is much, much bigger, because
they need a lot of information to be able

34
00:01:56,276 --> 00:01:58,501
to create more information themselves.

35
00:01:58,901 --> 00:02:03,581
There are two big problems: the first is
evaluation, and the second is basically

36
00:02:03,591 --> 00:02:05,631
the scale or the size of the models.

37
00:02:05,691 --> 00:02:08,841
With conventional machine learning
models, a lot of focus was

38
00:02:09,221 --> 00:02:12,291
'let's collect the data', then
'let's do feature engineering'.

39
00:02:12,341 --> 00:02:17,431
It was very much experimental; we were
trying to fit a model to a very specific

40
00:02:17,431 --> 00:02:21,501
task, but large language models are
task-agnostic, which means they're more generalized

41
00:01:21,541 --> 00:01:28,411
models. There's a shift from building
task-specific software to building

42
00:01:28,411 --> 00:01:32,011
task-agnostic software, and that's where
large language models come into play.

43
00:02:32,021 --> 00:02:35,611
Anytime you're building any
sort of unbounded solution, that

44
00:02:35,621 --> 00:02:37,161
comes with its own challenges.

45
00:02:37,661 --> 00:02:43,591
So I would say large language model
operations is inspired by MLOps,

46
00:02:43,611 --> 00:02:46,921
because it shares some of the same things.

47
00:02:47,761 --> 00:02:50,561
There's maybe some stuff that
we are doing in the engineering.

48
00:02:50,571 --> 00:02:54,051
Yes, we're still fine-tuning the models,
even though the fine-tuning we're doing

49
00:02:54,051 --> 00:02:59,541
is very different: we can't really afford
to update all of the weights of the model.

50
00:02:59,601 --> 00:03:03,606
So we're using approaches that only
update some of the weights, or we

51
00:03:03,606 --> 00:03:06,456
are using other techniques, for
example, prompt engineering, which

52
00:03:06,466 --> 00:03:11,576
is new with these models, specifically
because, again, updating all of the

53
00:03:11,576 --> 00:03:15,006
weights of the model during the
fine-tuning or the training process,

54
00:03:15,096 --> 00:03:16,906
if we are to call fine-tuning

55
00:03:17,141 --> 00:03:18,831
very similar to training itself,

56
00:03:19,371 --> 00:03:20,301
is very costly.

57
00:03:20,846 --> 00:03:21,136
Okay.

58
00:03:21,136 --> 00:03:22,516
So you're blowing my mind a little bit.

59
00:03:22,556 --> 00:03:29,646
I thought the answer was LLMOps is a niche
within MLOps, and just leave it at that.

60
00:03:29,646 --> 00:03:31,986
But it sounds like there is more to it.

61
00:03:32,286 --> 00:03:33,716
How much of that is fashion?

62
00:03:33,926 --> 00:03:37,856
People who work on the fancy new LLMs
who don't want to be called

63
00:03:37,856 --> 00:03:40,036
old-fashioned machine learning engineers?

64
00:03:40,781 --> 00:03:45,491
It's a complicated thing. I
don't think you essentially

65
00:03:45,491 --> 00:03:47,291
ever work on a technology.

66
00:03:47,291 --> 00:03:50,511
There are people who work on
technology for the sake of technology.

67
00:03:50,511 --> 00:03:53,491
I would call those people
researchers; ML scientists

68
00:03:53,541 --> 00:03:55,011
essentially are those kind of people.

69
00:03:55,011 --> 00:03:59,516
But then the next step is people
who are working on technology

70
00:03:59,526 --> 00:04:01,066
because it's solving a problem.

71
00:04:01,076 --> 00:04:05,226
So whether it's using a very simple
decision tree or whether it's still

72
00:04:05,226 --> 00:04:10,646
using CatBoost or XGBoost, I don't
think there needs to be much difference

73
00:04:10,646 --> 00:04:15,226
in terms of how people approach these
kinds of technologies, because

74
00:04:15,496 --> 00:04:19,466
the focus people need to have is 'this
is the problem we are trying to solve.

75
00:04:19,466 --> 00:04:21,386
What kind of problem is it?

76
00:04:21,596 --> 00:04:25,306
Is it a discriminative problem, or
is it a generative problem?'

77
00:04:25,546 --> 00:04:29,566
If it's a generative problem, yes, I'm
going to implement this technology, but

78
00:04:29,571 --> 00:04:34,866
it doesn't really mean that both of those
fields are in competition with each

79
00:04:34,866 --> 00:04:36,666
other; I think they complement each other.

80
00:04:37,166 --> 00:04:43,286
You picked LLMOps as the topic of your
next book, and for anybody listening

81
00:04:43,286 --> 00:04:48,706
to this, the book will very soon be
available in early-access preview.

82
00:04:49,396 --> 00:04:53,946
It's called "LLMOps: Managing Large
Language Models in Production".

83
00:04:54,691 --> 00:04:58,061
Which takes quite a bit to
pronounce all of this together.

84
00:04:58,461 --> 00:05:01,501
Tell us a little bit about how
you came up with the topic

85
00:05:01,501 --> 00:05:04,351
of the book, the origin story.

86
00:05:04,351 --> 00:05:06,631
I always wanted to write a technical book.

87
00:05:06,981 --> 00:05:12,081
Manning approached me back in 2018
to write a book on interpretability.

88
00:05:12,086 --> 00:05:15,026
I didn't feel like I was ready
to write a book back then.

89
00:05:15,726 --> 00:05:19,206
But the person who was the assistant
acquisition editor, the person

90
00:05:19,206 --> 00:05:22,266
who reached out, is
essentially my acquisition editor.

91
00:05:22,266 --> 00:05:24,066
So she's now at O'Reilly.

92
00:05:24,996 --> 00:05:29,346
It's a very small world, eventually.
In terms of what inspired me to write

93
00:05:29,346 --> 00:05:33,106
a book, especially on this topic and
not any other topic, let's say,

94
00:05:33,786 --> 00:05:38,726
it was basically seeing that shift, which
is, as the scale is increasing with

95
00:05:38,736 --> 00:05:42,886
these models, yes, they're not really
good at doing discriminative tasks

96
00:05:42,946 --> 00:05:47,736
right now, but eventually, because
these are generalized models, we'll be

97
00:05:47,826 --> 00:05:52,256
expanding on the capabilities
of these models. But these models

98
00:05:52,256 --> 00:05:54,136
are not getting smaller anytime soon.

99
00:05:55,171 --> 00:05:59,381
Because of the scale of these models,
there will be a few questions for people

100
00:05:59,421 --> 00:06:03,871
to ask, because we're interacting so
closely with these models compared

101
00:06:03,871 --> 00:06:08,851
to before. Earlier, mainly the
people who were interacting directly

102
00:06:08,871 --> 00:06:11,811
with the model were machine learning
engineers and data scientists.

103
00:06:12,181 --> 00:06:17,031
Now, these are in the form of
chatbots, which the entire user base is

104
00:06:17,821 --> 00:06:20,636
interacting with, and people are
playing with them, trying to hack them.

105
00:06:20,996 --> 00:06:25,696
So while the field of security operations
wasn't super relevant for a lot of

106
00:06:25,736 --> 00:06:30,806
other companies, now it has become
the center of the show, in a way.

107
00:06:30,876 --> 00:06:35,326
Everybody can build a large language
model, and that's one of the core

108
00:06:35,326 --> 00:06:41,086
differences: in MLOps, the focus was more
like, how do we build a model?

109
00:06:41,156 --> 00:06:44,166
How do we host and deploy it
in production, which is: how

110
00:06:44,166 --> 00:06:45,886
do we self-serve these models?

111
00:06:46,346 --> 00:06:51,206
Now the focus has shifted: it's so damn
easy to build a large language model

112
00:06:51,246 --> 00:06:54,576
for your particular application. You
may not have to build it from scratch.

113
00:06:54,606 --> 00:06:56,826
You don't need to train
a model from scratch.

114
00:06:57,116 --> 00:06:58,666
You can put wrappers around it.

115
00:06:58,696 --> 00:07:00,266
You can put guardrails around it.

116
00:07:00,606 --> 00:07:01,946
You can still fine-tune it.

117
00:07:01,976 --> 00:07:06,956
You can integrate it with a RAG system
and use it for your particular use case.

118
00:07:07,246 --> 00:07:12,936
So for me, understanding the fact that
there's a big market of people who were

119
00:07:12,946 --> 00:07:17,776
software engineers, who didn't really
have access to machine learning systems

120
00:07:17,826 --> 00:07:21,286
because they didn't have the skill
set. Machine learning has always been

121
00:07:21,286 --> 00:07:25,736
posed like, 'Oh my God, you need to know
linear algebra to understand how these

122
00:07:25,756 --> 00:07:32,221
models work', to now, where they can just
use an API key and implement a machine

123
00:07:32,231 --> 00:07:36,701
learning model itself. So that ease of
use means the entire software engineering

124
00:07:36,711 --> 00:07:41,691
community or anybody who can code will
now be able to build or host their own

125
00:07:41,711 --> 00:07:45,411
machine learning model and in this case,
specifically, it will be large language

126
00:07:45,441 --> 00:07:48,001
models, but I recognized that shift.

127
00:07:48,001 --> 00:07:50,501
And I was like, this
is a substantial shift.

128
00:07:50,661 --> 00:07:55,511
It's not that this technology is
limited to a very specific set of people anymore.

129
00:07:56,031 --> 00:07:58,051
Now, so many people can use it.

130
00:07:58,091 --> 00:07:59,761
So many people can build on it.

131
00:08:00,561 --> 00:08:05,701
And given the fact that the market has expanded
and also the fact that these models do

132
00:08:05,701 --> 00:08:08,271
present additional challenges as well,

133
00:08:08,451 --> 00:08:11,101
this is the right point
for us to write a book,

134
00:08:11,116 --> 00:08:15,416
because it's a way bigger
market than before.

135
00:08:16,154 --> 00:08:20,974
There's something really scary about
putting a model in production and letting

136
00:08:21,034 --> 00:08:25,244
clients talk to it, when you never
know what it's going to do for sure.

137
00:08:25,244 --> 00:08:25,274
Okay.

138
00:08:25,989 --> 00:08:28,339
And I understand the shift
that you're describing.

139
00:08:28,339 --> 00:08:30,724
So who came up with the term LLMOps?

140
00:08:30,774 --> 00:08:33,644
Basically, when I sent my
proposal, I used that term.

141
00:08:34,044 --> 00:08:40,264
It was in February last year, when
I sent my proposal and I used that

142
00:08:40,264 --> 00:08:44,574
term. The proposal was sent to a couple
of reviewers who were like, 'Oh, we

143
00:08:44,574 --> 00:08:46,194
don't think it's going to stick'.

144
00:08:46,194 --> 00:08:51,284
And then eventually, I think Weights &
Biases came up with their own blog post,

145
00:08:51,284 --> 00:08:52,744
about what's really the difference.

146
00:08:52,784 --> 00:08:55,944
Then Arize came up with their
own blog post, as in what's the

147
00:08:55,944 --> 00:08:57,564
difference between LLMOps and MLOps.

148
00:08:58,084 --> 00:09:01,964
And eventually, everybody was like,
'Oh my God, this term is sticking'.

149
00:09:02,264 --> 00:09:06,084
And by then we had already signed
the contract with my editor.

150
00:09:06,084 --> 00:09:10,544
She took a gamble on me, because I
said this is going to stick, because it's

151
00:09:10,544 --> 00:09:16,904
a substantial shift in what we're trying
to do. In MLOps, the focus was different.

152
00:09:17,024 --> 00:09:18,524
Here, the focus is different.

153
00:09:18,844 --> 00:09:22,644
The number of outages, the number of
reliability issues, the degree of

154
00:09:23,014 --> 00:09:27,304
unreliability of these models is way
higher than the conventional machine

155
00:09:27,304 --> 00:09:28,994
learning models that we were using.

156
00:09:29,364 --> 00:09:33,454
There are very few people who were
doing distributed training, who

157
00:09:33,514 --> 00:09:36,014
understand that scope of problems.

158
00:09:36,514 --> 00:09:40,704
The engineering was not really done
at the scale that is being done

159
00:09:40,754 --> 00:09:42,324
right now for large language models.

160
00:09:42,614 --> 00:09:48,414
I feel like this is going to be a big
thing where there needs to be education.

161
00:09:48,734 --> 00:09:53,294
And I basically went in to create
that education in this space.

162
00:09:53,854 --> 00:09:58,664
If I were to start in the field
today as a 17- or 19-year-old,

163
00:09:59,424 --> 00:10:01,214
what would I want to learn?

164
00:10:01,324 --> 00:10:04,794
I come from a background in maths,
computer science and statistics.

165
00:10:05,194 --> 00:10:08,634
So I don't want people to feel
like, 'oh, that's a barrier for

166
00:10:08,634 --> 00:10:09,944
me to get into machine learning'.

167
00:10:09,944 --> 00:10:11,614
No, that's not really a barrier.

168
00:10:13,154 --> 00:10:13,534
Right?

169
00:10:14,144 --> 00:10:19,424
And so, for anybody listening, like we said,
the book is still a little way out.

170
00:10:19,484 --> 00:10:22,754
It's coming soon, but
it's not available today.

171
00:10:23,119 --> 00:10:26,419
What is available is that
new report that you authored.

172
00:10:26,569 --> 00:10:27,999
What is LLMOps?

173
00:10:28,529 --> 00:10:29,059
What's that?

174
00:10:29,069 --> 00:10:33,269
Basically to prepare people to
start using that term to make sure

175
00:10:33,269 --> 00:10:37,009
that everybody's on the same page:
'Okay, guys, we're doing LLMOps.

176
00:10:37,009 --> 00:10:37,839
This is the term.

177
00:10:37,869 --> 00:10:38,779
Let's go with it'.

178
00:10:39,426 --> 00:10:44,736
So the reason the report came out was
because we got very critical reviews

179
00:10:44,776 --> 00:10:49,146
from a lot of people early last year,
people who were saying LLMs are not going

180
00:10:49,146 --> 00:10:51,566
to stick, LLMOps is not going to stick.

181
00:10:51,566 --> 00:10:56,156
So we were like, let's at least tell
people what it is, and then if there's

182
00:10:56,156 --> 00:10:57,536
enough interest, we'll write the book.

183
00:10:57,556 --> 00:11:00,546
We had signed contracts for
both of the things, but we were like,

184
00:11:00,596 --> 00:11:04,486
let's test out if people really
understand what the difference is.

185
00:11:04,536 --> 00:11:09,276
Once people know why this is
substantial, then we can take them

186
00:11:09,276 --> 00:11:11,396
to how you're supposed to do it.

187
00:11:12,189 --> 00:11:17,509
As for the report, I
would probably say we're a little bit

188
00:11:17,999 --> 00:11:22,079
late in the market, where it has already
stuck: people have already

189
00:11:22,079 --> 00:11:26,049
understood. There's a shift in terms
of companies that are building their

190
00:11:26,079 --> 00:11:29,819
own large language model or generative
AI teams, if I can use that word.

191
00:11:30,339 --> 00:11:32,809
The implementation has already started.

192
00:11:32,809 --> 00:11:35,089
They've already started
looking at the issues.

193
00:11:35,139 --> 00:11:38,659
They've started realizing that
they need a new discipline.

194
00:11:39,019 --> 00:11:43,979
So I'm having talks with a lot of
companies, in a consulting capacity,

195
00:11:44,409 --> 00:11:47,879
that are trying to figure out how
to build a specialized engineering

196
00:11:47,899 --> 00:11:49,259
practice around these models.

197
00:11:49,259 --> 00:11:53,009
What would the shift look like
when it comes to these models?

198
00:11:53,289 --> 00:11:55,289
What would the team structure look like?

199
00:11:55,309 --> 00:11:56,979
What would the metrics look like?

200
00:11:57,319 --> 00:12:00,869
What are the key expectations
that they can get?

201
00:12:00,899 --> 00:12:05,059
And if we want to keep investing
in the space, then how do we

202
00:12:05,059 --> 00:12:06,709
justify that investment as well?

203
00:12:07,579 --> 00:12:11,569
How do we make sure we tie these
models to our KPIs, given

204
00:12:11,569 --> 00:12:16,409
the fact that these models are still a
little bit unpredictable, or at least

205
00:12:16,409 --> 00:12:17,949
some people would call them unpredictable.

206
00:12:18,499 --> 00:12:21,419
I don't particularly think
they're unpredictable.

207
00:12:21,739 --> 00:12:22,849
They still exist.

208
00:12:23,249 --> 00:12:28,279
Any inference that is being made does
exist, and it's probably within the space of

209
00:12:28,279 --> 00:12:30,889
the input data that you're providing.

210
00:12:30,979 --> 00:12:35,149
So to me, while they're still very
probabilistic models, they're still

211
00:12:35,239 --> 00:12:40,279
a little bit untameable, if I
can say it in that sense, which is: it's

212
00:12:40,279 --> 00:12:45,619
very hard to predict if the model goes
off, and that's not essentially because the

213
00:12:45,629 --> 00:12:49,559
model is built that way, but because
of the number of people interacting with

214
00:12:49,559 --> 00:12:54,479
it, and the way the models are being
structured is basically to help the user.

215
00:12:54,809 --> 00:12:57,239
There are so many people
trying to hack the solutions.

216
00:12:57,309 --> 00:13:00,959
Basically, you're building a product
for your enemies.

217
00:13:01,039 --> 00:13:01,479
Okay.

218
00:13:01,579 --> 00:13:02,879
Naming is hard.

219
00:13:02,889 --> 00:13:07,789
It's probably the hardest problem in
computer science, but we've got a term.

220
00:13:07,939 --> 00:13:11,509
I think at this stage we
understand what it means.

221
00:13:12,289 --> 00:13:17,499
There's a report in case you want to
prove to somebody, 'Hey, LLMOps means this'.

222
00:13:17,499 --> 00:13:18,749
You can just point them to that.

223
00:13:19,159 --> 00:13:20,479
And the book is coming out soon.

224
00:13:20,479 --> 00:13:27,644
So let's talk a little bit about
what LLMOps really is in practice.

225
00:13:27,644 --> 00:13:30,754
And I'm browsing through
your report right now.

226
00:13:30,754 --> 00:13:36,264
And I see things like safety, scalability,
robustness, the LLM lifecycle.

227
00:13:37,064 --> 00:13:38,724
Let's talk about these things a little bit.

228
00:13:38,744 --> 00:13:39,754
Where should we start?

229
00:13:39,814 --> 00:13:44,294
What's the most painful
part of running LLMs today?

230
00:13:45,779 --> 00:13:50,419
So I would say the three goals are
where we should ideally start,

231
00:13:50,419 --> 00:13:55,064
which is: why do we need this field,
or why do we need this new practice?

232
00:13:55,734 --> 00:13:59,074
The first thing is essentially
safety, which is making sure that

233
00:13:59,164 --> 00:14:00,944
the model is playing by the rules.

234
00:14:01,344 --> 00:14:05,904
Because, again, it's not just
machine learning engineers trying

235
00:14:05,904 --> 00:14:07,274
to build on these models today.

236
00:14:07,314 --> 00:14:08,304
It's software engineers.

237
00:14:08,634 --> 00:14:10,454
It's a lot of other people as well.

238
00:14:10,724 --> 00:14:15,424
There needs to be a new playbook for
people who are working with these models.

239
00:14:15,804 --> 00:14:20,159
Because, again, the models do
pose a lot of risk, which is yes,

240
00:14:20,159 --> 00:14:21,939
there's operational risk as well.

241
00:14:22,719 --> 00:14:26,489
But there's a lot of risk that people don't
really understand. It's very easy to

242
00:14:26,509 --> 00:14:30,489
integrate code, integrate libraries,
but a lot of people don't really

243
00:14:30,499 --> 00:14:35,329
think about supply chain risk, which
is: if I'm using a package from some

244
00:14:35,649 --> 00:14:38,609
website, is the package secure enough?

245
00:14:38,844 --> 00:14:43,654
How do I make sure that I'm not
installing malware on my system?

246
00:14:44,214 --> 00:14:47,644
Those things are not really well
understood, because the entire

247
00:14:47,644 --> 00:14:52,294
field of cybersecurity and security
operations was isolated from practice.

248
00:14:52,294 --> 00:14:57,109
And now that has to become
a key, integrated part of this.

249
00:14:57,599 --> 00:15:03,409
The second thing I would say is
scalability, which is basically

250
00:15:03,409 --> 00:15:08,329
making sure that the model does scale
to the number of people that are

251
00:15:08,499 --> 00:15:09,979
interacting with the model as well.

252
00:15:10,239 --> 00:15:15,659
We're essentially going from where maybe
a couple of people were interacting

253
00:15:15,669 --> 00:15:19,389
with these models to a large number
of people interacting by the minute,

254
00:15:19,439 --> 00:15:23,609
which is, you're not going to OpenAI's
ChatGPT to write one thing, right?

255
00:15:23,959 --> 00:15:28,529
You're having a conversation, which may
take about five, 10, 15, 20 minutes,

256
00:15:28,819 --> 00:15:31,039
and there are varying workloads as well.

257
00:15:31,039 --> 00:15:34,229
And there are varying workloads
from different locations.

258
00:15:34,959 --> 00:15:39,599
So we need to think about how do we
make sure that the latency is fine?

259
00:15:39,869 --> 00:15:44,449
How do we make sure that the models are
able to deal with the traffic, whether it's

260
00:15:44,979 --> 00:15:50,179
usual or unusual? How we build
an architecture around making sure

261
00:15:50,199 --> 00:15:56,849
that the model can serve and can adapt
to those requirements is the central

262
00:15:56,849 --> 00:16:01,209
thing. But also, because
these models are so huge, running inference

263
00:16:01,449 --> 00:16:03,459
every single time does cost you.

264
00:16:03,459 --> 00:16:04,879
So how do we do caching?

265
00:16:04,989 --> 00:16:06,659
How do we do load testing?

266
00:16:06,669 --> 00:16:08,419
How do we do performance testing?

267
00:16:08,429 --> 00:16:13,169
All of those questions become central
that weren't really central before.

268
00:16:13,619 --> 00:16:15,839
Then the next part is
basically robustness.

269
00:16:16,089 --> 00:16:18,039
And by robust, I mean the model keeps

270
00:16:18,404 --> 00:16:22,734
having the same kind of reactions,
which conventionally we used to

271
00:16:22,734 --> 00:16:26,464
call it reproducibility, which is you
can reproduce what was already there.

272
00:16:27,084 --> 00:16:30,424
Robustness is a little bit different,
since a lot of people are building

273
00:16:30,444 --> 00:16:33,384
on closed source, a lot of people
are building on open source, but

274
00:16:33,404 --> 00:16:38,104
the model's behavior changes with
how many people are interacting;

275
00:16:38,134 --> 00:16:39,944
it's getting a lot of live data as well.

276
00:16:40,384 --> 00:16:43,784
So there's some kind of model
degradation that happens with time.

277
00:16:44,094 --> 00:16:47,844
Also, every single time the model gets
updated as well, the behavior changes.

278
00:16:47,854 --> 00:16:52,224
So the entire prompt pipeline that you
built up can break easily, which is what a

279
00:16:52,224 --> 00:16:58,549
lot of companies eventually realized in the middle of
last year: they built up these very

280
00:16:58,569 --> 00:17:03,309
intricate prompt pipelines, and then
OpenAI does one update and those prompt

281
00:17:03,399 --> 00:17:05,049
pipelines don't really work anymore.

282
00:17:05,049 --> 00:17:10,189
So how do you build a system that keeps
on being predictable in that scenario,

283
00:17:10,189 --> 00:17:14,617
which is: any sort of infrastructure
that you build on top of the model

284
00:17:14,667 --> 00:17:18,537
shouldn't need to be rebuilt for
every single iteration or every single

285
00:17:18,577 --> 00:17:22,237
time you're moving from OpenAI to,
let's say, Claude or to some other model

286
00:17:22,297 --> 00:17:27,577
as well, because you need to keep
improving and making sure that you're

287
00:17:27,577 --> 00:17:28,977
working with the new data as well.

288
00:17:29,007 --> 00:17:33,067
So the three questions that come up
there are the questions of data drift,

289
00:17:33,807 --> 00:17:39,297
which is based on how the input changes
over time, which is basically how many

290
00:17:39,297 --> 00:17:40,877
people are interacting with the model.

291
00:17:41,237 --> 00:17:44,947
And that causes one of the
shifts in the model behavior.

292
00:17:45,227 --> 00:17:49,357
The second is concept drift, which
is every single time, there's new

293
00:17:49,357 --> 00:17:50,757
information that comes out there.

294
00:17:50,757 --> 00:17:55,597
So a good example to give over there
would be Corona used to be a beer brand.

295
00:17:55,867 --> 00:18:02,347
So any models that were built up until,
let's say, about 2019, 2020, understood

296
00:18:02,437 --> 00:18:06,197
Corona as a beer, so they would
always answer from that

297
00:18:06,197 --> 00:18:10,177
perspective. The models that are being
built now understand that it could be

298
00:18:10,207 --> 00:18:13,417
a beer or it could be the virus.

299
00:18:14,767 --> 00:18:16,647
So that is essentially concept drift.

300
00:18:16,747 --> 00:18:19,897
The prime minister, the president,
or any new information that comes

301
00:18:19,897 --> 00:18:24,467
up that changes the behavioral or
functional capabilities of the

302
00:18:24,467 --> 00:18:29,522
inputs that we've essentially given, or
adds additional information that

303
00:18:29,582 --> 00:18:31,262
changes the model behavior as well.

304
00:18:31,262 --> 00:18:35,512
And the third is basically
prompt drift, which is the updates

305
00:18:35,932 --> 00:18:37,002
of the model, essentially.

306
00:18:37,002 --> 00:18:41,622
And how does the retraining of the
model affect the performance of

307
00:18:41,652 --> 00:18:43,562
your entire infrastructure as well?

308
00:18:43,612 --> 00:18:49,082
For anybody who's building on closed
source models, OpenAI, Claude, Anthropic

309
00:18:49,082 --> 00:18:53,702
and all of these companies, they're
constantly using RLHF techniques

310
00:18:53,702 --> 00:18:55,702
to retrain the models substantially.

311
00:18:56,752 --> 00:18:59,512
So that does impact the model performance.

312
00:18:59,732 --> 00:19:02,652
I would say these three are core
things which are in the center.

313
00:19:02,682 --> 00:19:07,412
Anybody who's building with these models
needs to think: is my model safe?

314
00:19:07,782 --> 00:19:09,182
Is my model scalable?

315
00:19:09,282 --> 00:19:10,342
Is my model robust?

316
00:19:10,342 --> 00:19:14,302
And if you're not looking at those
properties, it's very hard to build a

317
00:19:14,652 --> 00:19:17,172
sustainable product around these models.

318
00:19:17,947 --> 00:19:20,157
That was a lot of information in one go.

319
00:19:20,187 --> 00:19:21,177
I've got questions.

320
00:19:21,227 --> 00:19:25,347
Imagine you're talking to a five-year-old
software engineer who has never done

321
00:19:25,387 --> 00:19:32,667
any AI, just basic software engineering
things, as five-year-olds do. How

322
00:19:32,877 --> 00:19:39,072
different really is it, the safety part
of it, compared to any other application?

323
00:19:39,122 --> 00:19:43,372
The few examples you gave, like using
an unsafe library coming from somewhere,

324
00:19:43,412 --> 00:19:46,802
every piece of software on earth is
going to have the same problem, right?

325
00:19:47,212 --> 00:19:52,922
What are the problems that are
actually unique to LLMs, from

326
00:19:52,922 --> 00:19:54,912
the safety perspective and why?

327
00:19:55,970 --> 00:20:01,270
This is one reason I think LLMOps is
closer to DevOps than it is to MLOps.

328
00:20:01,310 --> 00:20:06,110
Essentially, because DevOps is
built up around so much software,

329
00:20:06,110 --> 00:20:09,220
so many frameworks, so many
libraries exist out there, whereas

330
00:20:09,380 --> 00:20:13,190
in conventional machine learning
models, we were using scikit-learn.

331
00:20:13,460 --> 00:20:16,900
So there were very specific
libraries that were already tested.

332
00:20:17,105 --> 00:20:18,625
And we knew that these are secure.

333
00:20:18,645 --> 00:20:21,085
We were using TensorFlow,
PyTorch and all those ones.

334
00:20:21,845 --> 00:20:27,095
Now, because the open source community is
very similar to how the software community

335
00:20:27,095 --> 00:20:32,335
is, a lot of things do translate from
what DevOps engineers were doing, or where

336
00:20:32,335 --> 00:20:37,415
the focus of what conventional software
engineers were looking at versus what

337
00:20:37,415 --> 00:20:40,055
LLMOps engineers would be looking at.

338
00:20:40,385 --> 00:20:45,120
The key differences now
would be: anytime we're doing

339
00:20:45,350 --> 00:20:46,780
conventional software engineering,

340
00:20:47,160 --> 00:20:52,130
it's a rule-based system where we
define what our code is supposed to do.

341
00:20:52,540 --> 00:20:57,230
Now we're moving away from a
rule-based system, which means basically

342
00:20:57,370 --> 00:21:02,190
the model can create things that
are factually inaccurate as well.

343
00:21:02,490 --> 00:21:05,350
Now those are things that
you really need to cater for.

344
00:21:05,720 --> 00:21:08,790
So that's one of the big
things. First: how do you

345
00:21:08,790 --> 00:21:11,060
deal with biases in the data?

346
00:21:11,560 --> 00:21:15,220
Second thing is how do you deal with
factually inaccurate information?

347
00:21:15,770 --> 00:21:19,950
And so for a five-year-old, maybe it's not
that significant, but for anybody who's

348
00:21:19,950 --> 00:21:24,390
doing software engineering, how do you
make sure that you're not making decisions

349
00:21:24,520 --> 00:21:27,540
based on what the models are generating.

350
00:21:27,540 --> 00:21:33,610
For example, if the model says this is how
this is supposed to happen or for business

351
00:21:33,610 --> 00:21:38,980
executives, if it says, based on the data,
this is what the graph is looking like.

352
00:21:39,240 --> 00:21:43,680
And if that happens to be inaccurate,
we can't really rely on that to make

353
00:21:43,690 --> 00:21:47,040
further decisions, or on what strategic
decisions we should be making next.

354
00:21:48,210 --> 00:21:52,050
Now, these models are exposed to
that kind of risk, which is usually

355
00:21:52,050 --> 00:21:54,030
called the hallucination problem.

356
00:21:55,015 --> 00:21:55,375
Got it.

357
00:21:55,765 --> 00:22:00,075
I guess it gets much worse when you've
got things like autonomous agents, right?

358
00:22:00,075 --> 00:22:05,455
When people directly plug things that
have permissions to do things, into

359
00:22:05,455 --> 00:22:08,995
these LLMs, and we'll see how that goes.

360
00:22:09,045 --> 00:22:09,395
Okay.

361
00:22:09,505 --> 00:22:11,715
So I buy that argument.

362
00:22:11,785 --> 00:22:16,635
Going back to the scalability, that's
the bit that I don't think I fully

363
00:22:16,635 --> 00:22:18,345
understood when you were explaining.

364
00:22:18,435 --> 00:22:25,465
Why is it not the same as scaling
any other request response server?

365
00:22:26,115 --> 00:22:30,495
How is it different other than the
practical part of it being massive

366
00:22:30,505 --> 00:22:32,675
and requiring a lot of resources?

367
00:22:33,745 --> 00:22:39,745
Why is it harder to scale an LLM than
it is to scale any other application?

368
00:22:40,700 --> 00:22:44,850
At this point, you can probably say there
are three kinds of applications out there.

369
00:22:44,870 --> 00:22:46,780
One is conventional software piece, right?

370
00:22:47,220 --> 00:22:50,500
Anytime we're writing software,
we're trying to refactor, making it

371
00:22:50,580 --> 00:22:55,050
as small as possible or making sure
that we're defining rules on, this

372
00:22:55,220 --> 00:22:56,840
is what happens when you do this.

373
00:22:56,880 --> 00:22:58,490
This is what happens when you do this.

374
00:22:58,510 --> 00:23:00,070
That's how requests are processed.

375
00:23:00,500 --> 00:23:06,040
But with conventional machine learning models,
it's a different story:

376
00:23:06,050 --> 00:23:10,770
the applications they're used for are
entirely different. Mostly they're used

377
00:23:10,770 --> 00:23:17,755
for internal data capture, being used for
recommender systems or semantic analysis.

378
00:23:18,035 --> 00:23:20,865
The people who are interacting with
the model outputs are different.

379
00:23:21,375 --> 00:23:25,645
Now, because large language
models are customer-facing,

380
00:23:26,015 --> 00:23:29,835
that creates expectations:
people expect the inference

381
00:23:29,835 --> 00:23:31,445
speed to always be high.

382
00:23:32,285 --> 00:23:36,335
Now, with the inference speed, when
you have so much data that you need to

383
00:23:36,405 --> 00:23:41,465
retrieve or run an algorithm to create
new information based on whatever they've

384
00:23:41,515 --> 00:23:46,235
given, that is a very hard task. So with
conventional software engineering, we

385
00:23:46,235 --> 00:23:50,845
were writing breadth-first search
and all of those algorithms, but they were still

386
00:23:50,885 --> 00:23:56,755
implemented on small-scale data, so it
was still very simple to do compared

387
00:23:56,755 --> 00:23:59,375
to now, when we have these massive databases.

388
00:23:59,815 --> 00:24:04,070
Retrieving data and then
generating information, both in

389
00:24:04,070 --> 00:24:05,590
real time, is a very hard task.

390
00:24:05,660 --> 00:24:09,790
Making sure that you can maintain the
inference speed, making sure that you

391
00:24:09,790 --> 00:24:13,600
can maintain the latency, making sure that
you can meet the target, making sure

392
00:24:13,600 --> 00:24:16,400
that you can test other things as well.

393
00:24:16,470 --> 00:24:20,850
Before the model really passes
information to the user, there are guardrails

394
00:24:20,880 --> 00:24:24,030
being put in, which is there's one
additional layer that's put in, there's

395
00:24:24,060 --> 00:24:27,770
evaluations that are being put in as
well to make sure that the person isn't

396
00:24:27,770 --> 00:24:32,695
fiddling with the model or, it's not
giving you information that's wrong.

397
00:24:32,935 --> 00:24:36,265
First part is retrieving the data
and then generating information.

398
00:24:36,275 --> 00:24:39,175
The next part is making sure
that it passes through all of

399
00:24:39,225 --> 00:24:43,255
those layers as well and still
maintains the customer expectations.

400
00:24:43,285 --> 00:24:44,295
That's really hard.

401
00:24:45,015 --> 00:24:51,385
And also, when the demand does skyrocket,
or when they don't find the information,

402
00:24:51,485 --> 00:24:54,090
they can easily freeze up as well.

403
00:24:54,630 --> 00:24:59,030
So that becomes a very hard problem
to solve, because that means the

404
00:24:59,030 --> 00:25:01,340
performance would also degrade.

405
00:25:01,590 --> 00:25:07,330
Then the next question is basically: if
a lot of people are using the model for

406
00:25:07,330 --> 00:25:14,020
one kind of thing, making sure that the
model can still answer or doesn't really

407
00:25:14,280 --> 00:25:19,230
adapt to only those kind of problems
and can still go into the database

408
00:25:19,230 --> 00:25:23,580
and still look at a very different
problem and still perform well on that.

409
00:25:23,580 --> 00:25:24,320
That's hard.

410
00:25:24,790 --> 00:25:30,670
So the real challenges are basically
the service disruption, the availability

411
00:25:31,140 --> 00:25:35,650
and that is usually a little bit harder.

412
00:25:36,095 --> 00:25:41,175
Mainly because you need to have a
lot of, parallel nodes as well that are

413
00:25:41,175 --> 00:25:42,905
trying to interact with these models.

414
00:25:42,995 --> 00:25:46,555
And then for companies that are hosting
different large language models as well.

415
00:25:46,895 --> 00:25:51,085
No one large language model is optimal
for every single kind of problem.

416
00:25:51,085 --> 00:25:52,895
It may not be cost-optimal as well.

417
00:25:53,045 --> 00:25:57,045
So what is really happening is
there's a micro service kind

418
00:25:57,045 --> 00:25:58,895
of architecture, which was

419
00:25:59,160 --> 00:26:01,320
prominent in conventional
software engineering,

420
00:26:01,320 --> 00:26:04,840
but not so much in MLOps.

421
00:26:05,290 --> 00:26:08,910
What really happens over there is then
you're thinking about how am I doing

422
00:26:09,170 --> 00:26:13,080
parallel computing with all of these
nodes and clusters that I do have?

423
00:26:13,360 --> 00:26:15,130
How do I do horizontal scaling?

424
00:26:15,440 --> 00:26:18,660
How do I make sure that all of my
resources are being optimized, and

425
00:26:18,670 --> 00:26:23,190
one of my clusters isn't idle while
another one is being pushed to

426
00:26:23,210 --> 00:26:24,500
the maximum, to the point that it

427
00:26:24,755 --> 00:26:26,145
stalls at some point in time.

428
00:26:26,595 --> 00:26:30,225
So those are key problems
with these models now.
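
The balancing concern above (no cluster left idle while another is saturated) is commonly approached with least-loaded dispatch. The sketch below is a minimal illustration, not a description of any particular serving stack: the node names and the `LeastLoadedDispatcher` class are hypothetical, and it assumes the number of in-flight requests is a reasonable load signal, since LLM generations vary widely in length and plain round-robin can leave nodes unevenly loaded.

```python
class LeastLoadedDispatcher:
    """Hypothetical dispatcher: send each request to the node with the fewest in-flight requests."""

    def __init__(self, nodes: list[str]):
        # In-flight request count per node; dict order doubles as the tie-breaker.
        self.in_flight = {node: 0 for node in nodes}

    def acquire(self) -> str:
        # Pick the least-loaded node (first one in insertion order on ties).
        node = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[node] += 1
        return node

    def release(self, node: str) -> None:
        # Call when a generation finishes so the node's load drops again.
        self.in_flight[node] -= 1


dispatcher = LeastLoadedDispatcher(["gpu-a", "gpu-b"])
first = dispatcher.acquire()   # both idle, so the first node is chosen
second = dispatcher.acquire()  # the other node is now less loaded
```

A real deployment would track load across processes (in the load balancer or a shared store) and would likely weigh GPU memory and queued tokens, not just request counts.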

429
00:26:31,338 --> 00:26:31,708
So

430
00:26:31,830 --> 00:26:37,960
I get that, but I'm still not entirely
sure why it's any harder than any

431
00:26:37,990 --> 00:26:41,350
other piece of software, like all
the things that you just mentioned

432
00:26:41,350 --> 00:26:45,620
about scaling clusters and high
availability, high throughput, making

433
00:26:45,620 --> 00:26:47,050
sure that all those things happen.

434
00:26:47,790 --> 00:26:52,230
Those are problems that we've had for
decades and that, all the other

435
00:26:52,340 --> 00:26:54,230
systems have in place, right?

436
00:26:54,230 --> 00:26:57,860
If you look at any enterprise-ready
application, there are layers and

437
00:26:57,870 --> 00:27:00,260
layers of things and they keep working.

438
00:27:00,260 --> 00:27:03,460
So where is the actual
difference coming from?

439
00:27:03,470 --> 00:27:08,755
Is it because of the fact that your
query, your prompt, is nondeterministic

440
00:27:08,755 --> 00:27:14,445
in terms of resources and time
it takes to answer that query?

441
00:27:14,545 --> 00:27:19,265
Is that really the biggest
wrench in the works here?

442
00:27:20,525 --> 00:27:24,575
So the biggest wrench is
essentially just the size of

443
00:27:24,605 --> 00:27:26,665
the data that it needs to query.

444
00:27:27,225 --> 00:27:32,095
The size of the data has gone
from a few million parameters

445
00:27:32,155 --> 00:27:33,675
to now a trillion parameters.

446
00:27:33,675 --> 00:27:39,065
And every single time you need to look at
a trillion parameters of information and

447
00:27:39,075 --> 00:27:42,905
then generate information, making sure
that you're not overly relying on copying

448
00:27:42,905 --> 00:27:47,085
information from a single source, but
you're building up a response from all

449
00:27:47,125 --> 00:27:51,895
different sources of information that you
do have available, that is time-consuming.

450
00:27:53,535 --> 00:27:53,755
Got it.

451
00:27:53,965 --> 00:27:54,345
Okay.

452
00:27:54,485 --> 00:27:54,705
So,

453
00:27:54,735 --> 00:27:56,465
it is the unpredictable

454
00:27:57,900 --> 00:28:00,580
nature of it: you don't know how
much data you're going to have

455
00:28:00,580 --> 00:28:04,110
to pull in, basically. Okay.

456
00:28:04,210 --> 00:28:07,920
And then you talked a little
bit about the robustness.

457
00:28:08,650 --> 00:28:13,830
And if I understood correctly, you
said something about: as people

458
00:28:13,830 --> 00:28:16,750
use these models, they deteriorate?

459
00:28:17,405 --> 00:28:22,875
So any time people are interacting
with these models, what really happens

460
00:28:22,875 --> 00:28:27,915
is we're asking a certain kind of
questions over a period of time.

461
00:28:28,725 --> 00:28:34,345
As the model is answering those
particular kind of questions, it starts

462
00:28:35,065 --> 00:28:36,855
learning new information as well.

463
00:28:36,925 --> 00:28:40,695
And with a period of time, it
starts forgetting other information.

464
00:28:40,975 --> 00:28:44,645
Consider this: you basically started
in high school, you learned a couple of

465
00:28:45,125 --> 00:28:49,325
subjects, you learned social sciences,
you learned physics, you learned chemistry

466
00:28:49,325 --> 00:28:52,915
and everything, but now you're doing
software engineering, and that's

467
00:28:52,955 --> 00:28:54,485
the thing you're doing all day long.

468
00:28:55,005 --> 00:28:58,885
Now, if I was to ask you a chemistry
question, it would take you a very

469
00:28:58,885 --> 00:29:02,185
long time and you'll have to think
and you may not be able to answer

470
00:29:02,390 --> 00:29:06,500
accurately on a chemistry question,
compared to when you were exposed to

471
00:29:06,550 --> 00:29:08,610
that information on a daily basis.

472
00:29:09,020 --> 00:29:11,720
Now that's the same thing which is
happening with large language models

473
00:29:11,720 --> 00:29:15,900
as well, which is based on the kind
of interactions they're having with

474
00:29:16,000 --> 00:29:20,250
these models, based on the inputs that
they're getting from the users itself

475
00:29:20,280 --> 00:29:22,910
they can drift in a particular direction.

476
00:29:24,163 --> 00:29:24,393
You're

477
00:29:24,555 --> 00:29:26,055
completely blowing my mind.

478
00:29:26,075 --> 00:29:30,095
What I thought was happening is that
once I've got a model trained, let's

479
00:29:30,095 --> 00:29:33,175
say that I download Llama 3, right?

480
00:29:33,215 --> 00:29:35,735
And I run it on my computer.

481
00:29:36,715 --> 00:29:39,795
I thought that these were
static weights that didn't

482
00:29:40,305 --> 00:29:41,945
budge anymore, right?

483
00:29:41,985 --> 00:29:47,785
I was just passing a query through it,
getting some kind of output, and my model

484
00:29:47,785 --> 00:29:50,085
itself wasn't changing over time at all.

485
00:29:50,760 --> 00:29:56,280
That's if you're implementing the model
as is, but the moment you implement it

486
00:29:56,280 --> 00:30:00,720
in production, it changes because now
you're integrating real data sources as

487
00:30:00,720 --> 00:30:06,220
well, but also with Llama models as well,
which is: if we keep interacting with

488
00:30:06,220 --> 00:30:09,540
the model for a couple of hours.

489
00:30:09,810 --> 00:30:15,120
It will look at the queries that were
done previously to answer your questions

490
00:30:15,120 --> 00:30:19,900
quickly over that period of time, based
on the last interactions, essentially,

491
00:30:20,380 --> 00:30:22,310
there's a particular reason for that.

492
00:30:22,330 --> 00:30:25,180
It's basically the same thing
which is happening in our brains,

493
00:30:25,210 --> 00:30:27,970
which is the same neurons are
getting fired over and over again.

494
00:30:28,190 --> 00:30:32,990
As some information gets fired over
and over again, as some weights are being

495
00:30:32,990 --> 00:30:37,770
called over and over again, those weights
do get higher priority eventually.

496
00:30:37,910 --> 00:30:43,300
Okay, so why can't you just have
a static set of weights for this

497
00:30:43,300 --> 00:30:47,630
model and not adjust them so
that you don't have that problem?

498
00:30:48,360 --> 00:30:49,310
Why is it not enough?

499
00:30:49,845 --> 00:30:54,055
Because then it wouldn't be able to
do domain adaptation, which is, it may

500
00:30:54,115 --> 00:30:59,565
work fantastically well for the data
set that you've provided it with, but

501
00:30:59,605 --> 00:31:03,455
if you need to do something on top of
that, which is implemented for your

502
00:31:03,455 --> 00:31:06,445
use case, then it can't really do that.

503
00:31:06,445 --> 00:31:11,585
And then again, the big question around:
if we want it to behave in a certain

504
00:31:11,595 --> 00:31:15,785
way, we want it to answer questions
in a certain way, it wouldn't be able

505
00:31:15,785 --> 00:31:17,655
to have those capabilities either.

506
00:31:17,655 --> 00:31:21,285
So the whole RLHF thing,
where we teach the model,

507
00:31:21,305 --> 00:31:22,545
this is wrong, this is right.

508
00:31:22,815 --> 00:31:24,155
That doesn't really happen.

509
00:31:24,915 --> 00:31:27,205
So there's essentially
no learning happening.

510
00:31:27,205 --> 00:31:29,565
So the performance is static.

511
00:31:29,585 --> 00:31:33,965
It could be bad and it will
deteriorate over time just because

512
00:31:33,995 --> 00:31:38,275
you know that this model wouldn't be
able to generalize further for me.

513
00:31:38,485 --> 00:31:41,820
So it doesn't generalize
with you as a person.

514
00:31:42,425 --> 00:31:46,155
I'm asking because I thought that you
would just update that model with new

515
00:31:46,155 --> 00:31:51,195
data and have a fresh fine-tuned or
whatever updated version here and there.

516
00:31:51,755 --> 00:31:53,855
And you would just replace it.

517
00:31:54,355 --> 00:31:58,890
But if you're telling me that this is
how most people run these models, then

518
00:31:58,890 --> 00:32:00,885
I understand why this is so scary.

519
00:32:00,925 --> 00:32:05,420
Not only do you have this model, but
also people interacting with it;

520
00:32:05,690 --> 00:32:08,980
they can break it, they can find
a new way of getting around it,

521
00:32:09,005 --> 00:32:10,515
hacking the model as well.

522
00:32:10,595 --> 00:32:11,035
Yes.

523
00:32:11,300 --> 00:32:15,630
And by design, you want it to be malleable
and every conversation it has with

524
00:32:15,630 --> 00:32:17,700
someone is actually changing the model.

525
00:32:18,585 --> 00:32:20,015
That's like triple scary.

526
00:32:20,940 --> 00:32:24,740
That's essentially why you need a
new framework, or why you need a new

527
00:32:25,000 --> 00:32:31,835
field. That was the core inspiration
for me: okay, this field has gotten a

528
00:32:31,835 --> 00:32:34,425
little bit harder than it used to be.

529
00:32:35,750 --> 00:32:36,120
Okay.

530
00:32:36,120 --> 00:32:42,100
So not to ask you for spoilers or
anything, but what can you do about that?

531
00:32:42,240 --> 00:32:45,690
What's your book going to introduce
to make this stuff better?

532
00:32:46,323 --> 00:32:52,063
I don't think I can make the stuff
better, but if you can measure

533
00:32:52,063 --> 00:32:54,863
something, then you can improve it.

534
00:32:55,043 --> 00:32:57,573
Or you can see if something works,

535
00:32:57,993 --> 00:33:00,263
or if what's happening is an outlier as well.

536
00:33:00,623 --> 00:33:04,833
So what my book really does is give
you ways to measure things, which is

537
00:33:05,163 --> 00:33:09,833
instead of just thinking about security,
'okay, I need to do X, Y, Z, blah, blah,

538
00:33:09,863 --> 00:33:13,743
blah, things', giving you a systematic
framework to think about evaluations,

539
00:33:13,743 --> 00:33:18,958
which is, instead of implementing X
framework or Y framework, which is let's

540
00:33:18,958 --> 00:33:23,538
say, instead of implementing just ROUGE
or BLEU score or anything that comes out

541
00:33:23,558 --> 00:33:27,768
tomorrow in the market, you really need
to understand what am I essentially doing?

542
00:33:28,178 --> 00:33:30,228
Why are these scores really helpful?

543
00:33:30,418 --> 00:33:34,558
What are the limitations of these
ones? Where do they essentially fail?

544
00:33:34,888 --> 00:33:37,318
What are the new things
that can be implemented?

545
00:33:37,338 --> 00:33:40,028
What are the properties that
those new things need to have?

546
00:33:40,798 --> 00:33:41,948
So I'm more

547
00:33:42,773 --> 00:33:46,953
building the field from that first
principles thing, which is understanding

548
00:33:47,623 --> 00:33:52,763
what you really need. And for a lot of
things that I'm introducing in the book,

549
00:33:52,773 --> 00:33:56,023
there isn't really a framework, there
isn't really a technology out there.

550
00:33:56,133 --> 00:34:00,193
And for a lot of things I describe, there could be
software built around it.

551
00:34:00,453 --> 00:34:01,233
Nobody has.

552
00:34:02,628 --> 00:34:03,178
Okay.

553
00:34:03,458 --> 00:34:05,328
So that sounds like a good first step.

554
00:34:06,708 --> 00:34:16,613
Can I ask you for a nutshell version
of what the life cycle of an LLM, like

555
00:34:16,613 --> 00:34:21,893
a modern one that you would see in
production somewhere right now, typically

556
00:34:21,893 --> 00:34:25,553
looks like? Because I'm just realizing
I have holes in my understanding.

557
00:34:25,553 --> 00:34:27,723
It just blew my mind
about the context drift.

558
00:34:28,563 --> 00:34:33,503
So can you walk me through what
happens from the moment that a company

559
00:34:33,503 --> 00:34:38,013
decides, 'okay, we need a model to
do this because we really want our

560
00:34:38,033 --> 00:34:44,893
customers to talk to something online'? How do
you add the domain knowledge to it?

561
00:34:44,913 --> 00:34:48,483
How do you evaluate it, and how do
you integrate and then deploy

562
00:34:48,483 --> 00:34:49,773
and monitor the whole thing?

563
00:34:50,713 --> 00:34:55,393
Let me be very precise in saying this,
which is the first step for anybody

564
00:34:55,423 --> 00:34:59,843
to implement these models is to use a toy
model or use something which already

565
00:35:00,023 --> 00:35:07,423
exists and implement it as is and build
evaluation metrics around your problem.

566
00:35:08,213 --> 00:35:13,123
So instead of trying to fine-tune your
model, or instead of giving it new

567
00:35:13,153 --> 00:35:15,233
data, just implement the model as is.

568
00:35:15,623 --> 00:35:20,373
Use ChatGPT or something, and build
evaluation metrics, which is: what

569
00:35:20,433 --> 00:35:22,243
am I trying to measure around it?

570
00:35:22,993 --> 00:35:25,543
How is the model performing
on these kinds of tasks?

571
00:35:25,553 --> 00:35:27,823
So breaking those things
down is the first step.
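
The "implement the model as is, then measure" step can start as a very small harness. A minimal sketch, assuming a hypothetical `evaluate` helper, a model exposed as a plain prompt-to-text function (a stub below, where a real setup would call a hosted model), and a deliberately crude keyword-match metric standing in for whatever your task actually needs to measure.

```python
from typing import Callable


def evaluate(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose expected keyword appears in the model output."""
    if not cases:
        return 0.0
    hits = 0
    for prompt, expected in cases:
        output = model(prompt)
        # Case-insensitive containment: crude, but task-specific and easy to inspect.
        if expected.lower() in output.lower():
            hits += 1
    return hits / len(cases)


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for an off-the-shelf model used as is.
    return "Our refund window is 30 days." if "refund" in prompt else "I don't know."


cases = [("What is the refund policy?", "30 days"), ("Who is the CEO?", "Jane")]
score = evaluate(stub_model, cases)  # one of two cases passes
```

Failing cases (like the second one here) are the "holes" that later justify adding company-specific data.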

572
00:35:29,123 --> 00:35:33,533
Then it gets a little bit more intricate
than that, which is once you realize

573
00:35:33,553 --> 00:35:39,148
these are the holes, or this is the
data that I needed for the model to

574
00:35:39,158 --> 00:35:42,328
be able to answer, which is now I
need the model to be able to answer

575
00:35:42,328 --> 00:35:47,198
questions about my company particularly,
or about my product specifically.

576
00:35:47,518 --> 00:35:50,658
Now you're going into data
engineering, which is now you're

577
00:35:50,658 --> 00:35:54,943
thinking, what is the additional data
I can provide to the model itself?

578
00:35:55,983 --> 00:36:00,313
And once you've done that then there's
the whole pipeline of data engineering

579
00:36:00,323 --> 00:36:04,913
that goes in which is now you need to
think about how do you manage the noise?

580
00:36:04,953 --> 00:36:06,503
How do you augment the data?

581
00:36:06,503 --> 00:36:08,253
How are you tokenizing the data?

582
00:36:08,583 --> 00:36:12,773
How are you making sure that there's no
bias or toxicity in the data as well?

583
00:36:12,823 --> 00:36:18,693
And how do you make sure that the model
doesn't really memorize something?

584
00:36:18,963 --> 00:36:24,193
So the way models memorize information
is that some of the information

585
00:36:24,773 --> 00:36:26,463
occurs quite a lot of times.

586
00:36:26,493 --> 00:36:29,733
Addressing that is essentially
called data deduplication.

587
00:36:29,983 --> 00:36:33,673
So making sure that there's no
duplication in the data itself.
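
A minimal sketch of exact deduplication, assuming documents arrive as plain strings; the `normalize` and `deduplicate` helpers are hypothetical. It only catches copies that are identical after lowercasing and collapsing whitespace; near-duplicate detection (for example MinHash-based) is common in practice but not shown.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash the same.
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing SHA-256 hashes of normalized text."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique
```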

588
00:36:33,823 --> 00:36:37,053
How do you sanitize the data, which
is making sure that any user

589
00:36:37,193 --> 00:36:41,623
information or other private information is
removed from the data that you're

590
00:36:41,623 --> 00:36:43,213
providing to the model itself.
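
A minimal sanitization sketch, assuming simple regex redaction is acceptable as a first pass. The patterns and the `redact` helper are hypothetical and deliberately narrow (emails and phone-like digit runs only); production PII scrubbing usually combines broader rules with named-entity detection.

```python
import re

# Illustrative patterns only; they will miss many real-world PII formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each match with a placeholder such as [EMAIL] before the text reaches the model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```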

591
00:36:43,613 --> 00:36:49,673
So then, once you have a set of
evaluation metrics, the next step,

592
00:36:49,973 --> 00:36:53,973
which is the next stage for the
company to go into, is to implement the

593
00:36:53,973 --> 00:36:58,903
data engineering pipeline, then use the
same model on it, and then evaluate.

594
00:37:00,523 --> 00:37:05,148
Once you've done evaluation on that
one, then the next step is letting

595
00:37:05,178 --> 00:37:06,628
people interact with the model.

596
00:37:06,668 --> 00:37:11,318
But before that, set up orchestration,
deployment, and monitoring solutions on it

597
00:37:11,638 --> 00:37:17,048
so that you can measure what are
the interactions people are having

598
00:37:17,058 --> 00:37:18,948
with these models, essentially.

599
00:37:19,173 --> 00:37:19,703
As well.

600
00:37:20,593 --> 00:37:23,943
So if something goes wrong on
security, you can catch things

601
00:37:23,943 --> 00:37:25,483
quickly and turn things off, right?

602
00:37:26,053 --> 00:37:29,943
Or if there are a lot of people who
are interacting with the model, you

603
00:37:29,943 --> 00:37:34,683
can say next time, okay, I need to
allocate X, Y, Z number of resources,

604
00:37:34,943 --> 00:37:37,733
or these are the kind of interactions
people are having with the model.

605
00:37:37,733 --> 00:37:42,683
Essentially, once you've gone through
stage two, now the stage three, the full

606
00:37:42,733 --> 00:37:46,858
pipeline is essentially you're doing
data engineering, then you have, an

607
00:37:47,368 --> 00:37:51,488
LLM router which chooses the best base
model or the foundation model for you.

608
00:37:51,498 --> 00:37:54,648
That really depends on the
kind of prompt as well.

609
00:37:55,208 --> 00:37:58,628
Different prompts can use
different kinds of models.

610
00:37:58,678 --> 00:38:03,048
Let's say if the person is asking an
algorithmic question, then ideally a

611
00:38:03,048 --> 00:38:06,738
model which is trained on mathematical
information would be much better.

612
00:38:07,198 --> 00:38:11,858
And the other question is: you don't
always need to use the expensive model.

613
00:38:11,858 --> 00:38:17,558
Sometimes you can get away with providing
more generalized information,

614
00:38:17,558 --> 00:38:20,948
which is: if the person is asking a very
simple question, you don't need

615
00:38:20,948 --> 00:38:25,928
to use ChatGPT, you want to have a
system that automatically sees that

616
00:38:25,968 --> 00:38:30,448
prompt and says, 'I think for this one,
I can run inference on Llama 2 instead;

617
00:38:30,678 --> 00:38:34,118
I think for this kind of prompt, I
can run inference on such-and-such model.'
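
A rule-based router along the lines described could look like the sketch below. Everything in it is an assumption for illustration: the model names are placeholders, and the keyword list and word-count threshold are arbitrary stand-ins for whatever signals a real router would use (simple substring matching will also misfire on some words).

```python
from dataclasses import dataclass


@dataclass
class Route:
    model: str
    reason: str


# Hypothetical model names; substitute whatever your deployment actually hosts.
CHEAP_MODEL = "small-open-model"
MATH_MODEL = "math-tuned-model"
PREMIUM_MODEL = "large-general-model"

MATH_HINTS = ("algorithm", "prove", "complexity", "integral", "equation")


def route(prompt: str, max_cheap_words: int = 30) -> Route:
    """Math-flavored prompts go to a math model, short simple ones to a cheap model."""
    text = prompt.lower()
    if any(hint in text for hint in MATH_HINTS):
        return Route(MATH_MODEL, "math/algorithmic keywords")
    if len(text.split()) <= max_cheap_words:
        return Route(CHEAP_MODEL, "short prompt")
    return Route(PREMIUM_MODEL, "default")
```

Returning the reason alongside the model name makes it easy to log why each prompt went where it did.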

618
00:38:35,278 --> 00:38:40,168
The next step after that, once
you've done all of that, is doing

619
00:38:40,198 --> 00:38:41,918
domain adaptation on the model.

620
00:38:41,978 --> 00:38:46,088
Now that can be done in a lot of ways,
you can implement prompt engineering

621
00:38:46,108 --> 00:38:52,808
pipelines using frameworks like DSPy,
or you can implement RAG pipelines to

622
00:38:52,858 --> 00:38:57,763
introduce more information to the model,
without having to retrain the model, or

623
00:38:57,763 --> 00:39:01,653
you can do fine-tuning, and you essentially
do fine-tuning when you want to change

624
00:39:01,663 --> 00:39:07,023
the behavior of the model, or how it
essentially provides information for you.

625
00:39:07,253 --> 00:39:10,643
So prompting is more
like us putting a wrapper.

626
00:39:11,123 --> 00:39:16,453
It's very similar to if we say we
want the input to be shaped like this.

627
00:39:16,453 --> 00:39:20,213
When you're doing structural
changes to the input, that's when

628
00:39:20,213 --> 00:39:21,443
you're doing prompt engineering.

629
00:39:22,053 --> 00:39:26,563
But the moment you say: 'I want the
input structure to be changed right

630
00:39:26,603 --> 00:39:31,073
now, I want the model to process this
information in a different way', then

631
00:39:31,143 --> 00:39:32,723
you're essentially doing fine tuning.

632
00:39:33,593 --> 00:39:38,673
So: data engineering, then implementing
an LLM router, then doing some sort

633
00:39:38,673 --> 00:39:43,913
of domain adaptation on it, then
evaluation and orchestration as well.

634
00:39:43,973 --> 00:39:46,343
Orchestration is more like
the piece of how do you tie

635
00:39:46,343 --> 00:39:47,693
different software components.

636
00:39:47,703 --> 00:39:49,963
So how are you doing CI/CD on it?

637
00:39:50,433 --> 00:39:54,483
You're optimizing for things
over there to be able to

638
00:39:54,543 --> 00:39:56,723
reduce the inference latency.

639
00:39:56,723 --> 00:40:02,653
Then the next step is doing security
and reliability engineering, which I've

640
00:40:02,663 --> 00:40:06,553
not really seen a lot of companies do,
but the companies that are working

641
00:40:06,553 --> 00:40:10,473
in banking have already started working
very heavily on it because they had the

642
00:40:10,473 --> 00:40:15,803
existing infrastructure where they were
doing extensive security, reliability,

643
00:40:15,803 --> 00:40:19,403
engineering, then a few other ones
were doing it, which is the big tech

644
00:40:19,413 --> 00:40:23,183
companies, but the more generalized
normal companies weren't doing it.

645
00:40:23,653 --> 00:40:26,803
But now that has become
one of the core stages.

646
00:40:27,163 --> 00:40:30,333
The next step is basically
doing deployment and monitoring.

647
00:40:30,703 --> 00:40:34,733
Once you've done all of that, and the
deployment and monitoring is done, now

648
00:40:34,783 --> 00:40:37,103
the end user is
interacting with the model.

649
00:40:37,633 --> 00:40:40,993
So when the end user is interacting
with the model, you're learning

650
00:40:41,033 --> 00:40:44,993
things because you've implemented
monitoring solutions on it.

651
00:40:45,233 --> 00:40:47,933
Now you're making additional
changes on security as well.

652
00:40:48,303 --> 00:40:53,403
You're learning from the data, using
the customer interaction data, giving

653
00:40:53,403 --> 00:40:55,243
it back to the database as well.

654
00:40:55,323 --> 00:40:59,230
So that step gets
connected back, so there's a loop, a

655
00:40:59,230 --> 00:41:02,983
data flywheel that goes back into
the data engineering stage itself.

656
00:41:03,983 --> 00:41:04,843
A few questions.

657
00:41:05,593 --> 00:41:06,973
The router...

658
00:41:07,643 --> 00:41:13,613
It's an interesting one, because in my
mind, if I talk to different models with

659
00:41:13,613 --> 00:41:18,503
every query or every follow-up,
I might get different behaviors, right?

660
00:41:19,178 --> 00:41:20,378
Isn't that the problem?

661
00:41:20,598 --> 00:41:26,038
If you sometimes route to a cheap
funky model, because you want to

662
00:41:26,038 --> 00:41:29,408
save some money, and sometimes it
goes to ChatGPT, the quality of my

663
00:41:29,408 --> 00:41:31,448
responses might vary significantly.

664
00:41:32,268 --> 00:41:36,648
Is there a good way to work around
that or it's just how it is?

665
00:41:37,255 --> 00:41:41,375
I think as long as your infrastructure
is monitoring, which is this output came

666
00:41:41,385 --> 00:41:45,965
from this model, it's actually
ideal because then you can compare

667
00:41:45,965 --> 00:41:50,145
the performance of different models on
the kinds of queries as well, and pick

668
00:41:50,155 --> 00:41:56,275
which prompts, or even which models,
you should be using, in order to replace

669
00:41:56,585 --> 00:42:00,940
the use of a particular model within
your router solution itself as well.
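
Recording which model produced each output is what makes that comparison possible. A minimal sketch, assuming each response receives a numeric quality score from some evaluation step; the `RoutingLog` class and the model names in the usage lines are hypothetical.

```python
from collections import defaultdict
from statistics import mean


class RoutingLog:
    """Record which model produced each response plus a quality score, then compare models."""

    def __init__(self) -> None:
        self.scores: defaultdict[str, list[float]] = defaultdict(list)

    def record(self, model: str, score: float) -> None:
        self.scores[model].append(score)

    def summary(self) -> dict[str, float]:
        # Mean score per model; a real system would also slice by query type and track cost.
        return {model: mean(values) for model, values in self.scores.items()}

    def best_model(self) -> str:
        means = self.summary()
        return max(means, key=means.get)


log = RoutingLog()
log.record("small-open-model", 0.5)
log.record("small-open-model", 1.0)
log.record("large-general-model", 0.9)
```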

670
00:42:02,050 --> 00:42:04,980
And what does a router like
this actually look like?

671
00:42:05,010 --> 00:42:07,260
Is that a deterministic algorithm?

672
00:42:07,260 --> 00:42:08,430
Or is it another model?

673
00:42:08,430 --> 00:42:10,240
Is it like turtles all the way down?

674
00:42:10,290 --> 00:42:14,040
I've come across, I think
probably two companies that

675
00:42:14,070 --> 00:42:16,310
have built a semantic router.

676
00:42:16,410 --> 00:42:20,680
They're looking at the semantics of the
prompt itself and, based on resource

677
00:42:20,680 --> 00:42:24,430
limits set by the client itself, which
could be like the company itself.

678
00:42:24,780 --> 00:42:27,500
They're picking up a particular
model at that point in time.
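
A deterministic semantic router of the kind described might be sketched like this. Assumptions: the route table, model names, and per-call costs are hypothetical, and a bag-of-words vector stands in for a real sentence-embedding model so the example runs without dependencies. Given the same prompt and budget, it always returns the same model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real sentence-embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical routes: (model name, cost per call, example prompts describing the route).
ROUTES = [
    ("code-model", 3.0, ["fix this python bug", "write a function in code"]),
    ("chat-model", 1.0, ["hello how are you", "what are your opening hours"]),
]


def semantic_route(prompt: str, budget_per_call: float, fallback: str = "chat-model") -> str:
    """Pick the most similar route whose cost fits the client's budget."""
    query = embed(prompt)
    best, best_score = fallback, 0.0
    for model, cost, examples in ROUTES:
        if cost > budget_per_call:
            continue  # the client's resource limit rules this model out
        score = max(cosine(query, embed(example)) for example in examples)
        if score > best_score:
            best, best_score = model, score
    return best
```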

679
00:42:28,590 --> 00:42:31,000
So those are very deterministic solutions.

680
00:42:31,450 --> 00:42:36,010
I've not really seen non-deterministic
solutions put into play where you could

681
00:42:36,040 --> 00:42:40,480
actually use a large language model
as a routing solution, or, say, using

682
00:42:40,480 --> 00:42:44,100
a decision tree as an LLM router.

683
00:42:44,100 --> 00:42:46,910
So I've not really seen those
kind of implementations yet.

684
00:42:49,385 --> 00:42:50,601
What about the evaluation?

685
00:42:50,611 --> 00:42:54,791
So that sounds straightforward,
but in practice, how do you

686
00:42:54,791 --> 00:42:57,561
evaluate freestyle text?

687
00:42:57,751 --> 00:43:01,641
Do you get people to look at the responses
and compare, 'oh, I like this one better',

688
00:43:01,641 --> 00:43:03,021
like the Chatbot Arena?

689
00:43:03,421 --> 00:43:08,801
Or are there more scientific ways
of comparing different models?

690
00:43:09,720 --> 00:43:12,760
There are more scientific ways of
comparing different models, because

691
00:43:13,130 --> 00:43:14,740
you're looking at so many things.

692
00:43:14,760 --> 00:43:19,070
You're looking at whether the model is
engaging, whether the model

693
00:43:19,330 --> 00:43:22,170
is aware of that particular domain.

694
00:43:22,180 --> 00:43:24,550
Is the model really good
at question answering?

695
00:43:24,600 --> 00:43:30,450
Is the model good at recognizing when
it's giving a response that's off?

696
00:43:30,770 --> 00:43:35,150
So often, picking the right model is
a little bit harder for that reason.

697
00:43:35,150 --> 00:43:38,660
But essentially when you're
building an evaluation pipeline

698
00:43:38,780 --> 00:43:42,610
for yourself, think about what
your model is essentially doing.

699
00:43:42,650 --> 00:43:46,500
Are you building a model that is
heavily focused on retrieval only?

700
00:43:46,920 --> 00:43:50,700
Or are you building a model that's very
heavily focused on generation only?

701
00:43:51,130 --> 00:43:56,580
Both problems can be broken down:
retrieval needs to have its own metrics.

702
00:43:56,920 --> 00:44:01,180
These can be context recall,
context precision, basic

703
00:44:01,180 --> 00:44:02,810
recall, basic precision as well.
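
The retrieval metrics named here can be illustrated with a simplified, set-based sketch. It assumes you already know which chunks are relevant for each query. Evaluation libraries such as Ragas define context precision and recall differently (rank-aware and often LLM-judged), so treat these as illustrative definitions only.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for chunk in retrieved if chunk in relevant) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant chunks that the retriever returned (0.0 if none are relevant)."""
    if not relevant:
        return 0.0
    return len(relevant.intersection(retrieved)) / len(relevant)


retrieved = ["doc1", "doc4", "doc7", "doc9"]
relevant = {"doc1", "doc2", "doc7"}
print(context_precision(retrieved, relevant))  # 2 of 4 retrieved chunks are relevant
print(context_recall(retrieved, relevant))     # 2 of 3 relevant chunks were retrieved
```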

704
00:44:03,100 --> 00:44:09,335
And for the more generative use cases, you
need to have different metrics as well.

705
00:44:09,585 --> 00:44:15,675
So the metrics for the generative
solutions or to test the generative

706
00:44:15,845 --> 00:44:20,365
performance, you have n-gram metrics,
which are the BLEU scores, the ROUGE scores

707
00:44:20,365 --> 00:44:25,805
that people used to implement in the
conventional NLP models; then you have

708
00:44:25,955 --> 00:44:32,075
SemScore, which is basically looking at
the semantic similarity of the model's output, measured with

709
00:44:32,075 --> 00:44:33,895
a base transformer model, essentially.
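
The n-gram idea behind BLEU and ROUGE can be sketched for a single n-gram order. This is a simplification that assumes whitespace tokenization and a single reference. Full BLEU combines several orders with a geometric mean and a brevity penalty, and real ROUGE implementations add options such as stemming.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(candidate: str, reference: str, n: int = 1) -> tuple[float, float]:
    """Return (clipped precision, recall) of candidate n-grams against one reference.

    Precision is BLEU-style modified precision for a single n (no brevity
    penalty, no geometric mean across orders); recall is ROUGE-N for one reference.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped counts: min(count in cand, count in ref)
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


print(ngram_overlap("the cat sat on the mat", "the cat is on the mat"))
print(ngram_overlap("the cat sat on the mat", "the cat is on the mat", n=2))
```

In this example the candidate shares five of six unigrams with the reference but only three of five bigrams. This shows how quickly word-overlap scores drop when the phrasing differs even slightly.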

710
00:44:34,155 --> 00:44:36,675
So BERTScore, SemScore, MoverScore:

711
00:44:36,735 --> 00:44:38,875
these are essentially
called similarity scores.
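
Similarity scores such as SemScore compare embeddings of the model's output and a reference answer, typically with cosine similarity. The sketch below shows only that final step. The embedding vectors are hypothetical placeholders for what a transformer encoder would produce. BERTScore and MoverScore work differently: they match token-level embeddings rather than comparing a single sentence vector.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical embeddings; in practice these would come from a transformer encoder.
reference_embedding = [0.2, 0.7, 0.1]
output_embedding = [0.25, 0.6, 0.2]
print(round(cosine_similarity(reference_embedding, output_embedding), 3))
```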

712
00:44:38,876 --> 00:44:42,275
And then there is LLM-based
scoring as well.

713
00:44:42,605 --> 00:44:44,585
So there are three different categories.

714
00:44:44,645 --> 00:44:44,795
If

715
00:44:45,175 --> 00:44:46,385
somebody wants to learn more,

716
00:44:46,385 --> 00:44:51,305
I'll leave it for the listeners: I
have a talk on this particular topic

717
00:44:51,755 --> 00:44:54,425
that I did for an O'Reilly Superstream.

718
00:44:54,535 --> 00:44:58,035
So that will give you a really
good framework to think about this:

719
00:44:58,630 --> 00:45:02,540
how to do evaluation, how to
think about it super systematically,

720
00:45:02,550 --> 00:45:06,850
where this is the actual number that I'm
supposed to get, which is, if it's above 0.5,

721
00:45:06,890 --> 00:45:09,650
if this number is above 0.7,

722
00:45:09,651 --> 00:45:10,860
then I need to optimize.

723
00:45:11,791 --> 00:45:12,171
Got it.

724
00:45:12,831 --> 00:45:15,751
Abi, do you think we could make
it a little bit more concrete and

725
00:45:15,761 --> 00:45:18,051
go through this with some examples?

726
00:45:18,091 --> 00:45:21,101
Imagine that you own Reddit, right?

727
00:45:21,271 --> 00:45:24,791
You've got all these people talking
about all the different topics, and

728
00:45:24,841 --> 00:45:27,231
they tend to be useful in some domains.

729
00:45:27,951 --> 00:45:32,261
And let's say that you wanted to
build a model that you can chat with,

730
00:45:32,271 --> 00:45:36,981
that basically knows all the things
that people on Reddit talk about.

731
00:45:37,541 --> 00:45:41,521
And if you wanted to build
a proof of concept to get a model

732
00:45:41,521 --> 00:45:45,721
that can answer queries about
that, how would you go about that?

733
00:45:48,000 --> 00:45:54,280
So, very simply, I would use a similar
model, because now we're looking at

734
00:45:54,950 --> 00:45:58,330
Reddit conversations specifically, right?

735
00:45:58,550 --> 00:46:01,730
So what that essentially
is, is basically some

736
00:46:02,335 --> 00:46:06,545
internet website that has a lot of
information, which has a lot of textual

737
00:46:06,545 --> 00:46:11,515
information, by people on a lot of
different topics and a lot of different

738
00:46:11,565 --> 00:46:14,945
languages as well, though I'm not
entirely sure about the language part.

739
00:46:15,435 --> 00:46:16,635
So what I would do is

740
00:46:17,080 --> 00:46:22,300
look at Hugging Face for models that
are trained on conversational data,

741
00:46:22,350 --> 00:46:26,650
or Stack Exchange kind of data, where
people are answering questions.

742
00:46:27,120 --> 00:46:30,630
So ideally, a model that is trained
on that kind of information

743
00:46:30,720 --> 00:46:32,210
would be my base model.

744
00:46:32,720 --> 00:46:37,020
Then the next step would be scraping
data from Reddit, essentially.

745
00:46:37,420 --> 00:46:40,450
So that would be the next step:
building my own dataset

746
00:46:40,510 --> 00:46:44,760
pipeline from Reddit, essentially,
and doing fine-tuning with that.

747
00:46:45,250 --> 00:46:49,530
So those would be the first
two steps, and then the whole

748
00:46:49,570 --> 00:46:53,620
evaluation, security, and all of
those things will always be consistent

749
00:46:53,670 --> 00:46:55,280
with all of the models, essentially.
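
The "dataset pipeline, then fine-tuning" step could start with something like the following sketch. It turns scraped threads into prompt/response pairs and writes them as JSON Lines. The thread structure, the field names, and the `min_score` filter are assumptions for illustration. They are not Reddit's actual data format or any specific fine-tuning library's schema.

```python
import json


def threads_to_examples(threads: list[dict], min_score: int = 5) -> list[dict]:
    """Turn scraped threads into prompt/response pairs for supervised fine-tuning.

    Assumes (hypothetically) each thread looks like
    {"title": str, "body": str, "comments": [{"text": str, "score": int}, ...]}.
    """
    examples = []
    for thread in threads:
        prompt = f"{thread['title']}\n\n{thread['body']}".strip()
        # Keep only well-received answers as training targets.
        good = [c for c in thread.get("comments", []) if c["score"] >= min_score]
        for comment in good:
            examples.append({"prompt": prompt, "response": comment["text"].strip()})
    return examples


def write_jsonl(examples: list[dict], path: str) -> None:
    # One JSON object per line, a common input format for fine-tuning jobs.
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")


threads = [
    {"title": "Best way to learn Rust?", "body": "Coming from Python.",
     "comments": [{"text": "Read the Rust Book first.", "score": 42},
                  {"text": "lol", "score": 1}]},
]
examples = threads_to_examples(threads)
print(json.dumps(examples[0]))
```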

750
00:46:55,911 --> 00:46:56,261
Got it.

751
00:46:56,271 --> 00:47:01,241
So in theory you could take Llama
and then fine-tune it on all of

752
00:47:01,261 --> 00:47:06,951
Reddit's data, and hopefully it would
give you something to start with, and

753
00:47:06,951 --> 00:47:11,131
then you would have to worry about
evaluating it and all the other things.

754
00:47:13,501 --> 00:47:13,731
All right.

755
00:47:13,741 --> 00:47:21,856
So I think that probably gives our
listeners enough to eagerly await

756
00:47:21,856 --> 00:47:27,146
your book now and wonder when they're
going to be able to actually read the

757
00:47:27,146 --> 00:47:29,626
whole thing or maybe buy it off Amazon.

758
00:47:30,096 --> 00:47:33,066
Is there an ETA at the
moment that we can give them?

759
00:47:33,871 --> 00:47:38,141
The early release should happen sometime
next month: we're already

760
00:47:38,331 --> 00:47:43,071
in May, so it should happen sometime in
June. The whole book is supposed to

761
00:47:43,081 --> 00:47:44,880
be available by the end of the year.

762
00:47:45,691 --> 00:47:46,191
Awesome.

763
00:47:47,271 --> 00:47:47,611
All right.

764
00:47:47,701 --> 00:47:51,681
And before I let you off the hook, I
think you might have seen it coming.

765
00:47:51,681 --> 00:47:55,601
I'm going to ask you for some
predictions, obviously, with all

766
00:47:55,601 --> 00:47:57,971
the caveats, how difficult it is.

767
00:47:58,021 --> 00:48:02,401
And previous performance is
not a guarantee of future gains.

768
00:48:02,501 --> 00:48:04,191
Where do you see all of this going?

769
00:48:04,801 --> 00:48:11,331
I see more people using generative
models compared to the number of people

770
00:48:11,331 --> 00:48:15,531
who were using them before. One of
the big shifts that is going to

771
00:48:15,531 --> 00:48:19,231
happen is in the productivity
that people are getting from these models.

772
00:48:19,241 --> 00:48:20,791
So it could be developers.

773
00:48:20,791 --> 00:48:22,601
It could be people who
are doing copywriting.

774
00:48:23,131 --> 00:48:24,961
So companies are getting smaller.

775
00:48:25,576 --> 00:48:27,456
And they will continue to get smaller.

776
00:48:27,806 --> 00:48:32,556
The number of companies that were
working with external people or external

777
00:48:32,786 --> 00:48:36,546
audits is going to get smaller as
well. I think going into the future,

778
00:48:36,596 --> 00:48:40,306
we will be seeing that shift toward,
you could say, a creator economy.

779
00:48:40,306 --> 00:48:43,856
I'm not entirely sure what would be the
right word in the specific scenarios,

780
00:48:43,876 --> 00:48:45,756
which is every person is a company.

781
00:48:46,046 --> 00:48:51,635
So now, instead of a company
being 500, 800,

782
00:48:51,736 --> 00:48:56,796
or 55,000 employees, companies will
certainly get much, much, much smaller

783
00:48:57,436 --> 00:49:01,886
because one person is going to be able
to do a lot, and there's a lot of stuff

784
00:49:01,886 --> 00:49:03,936
that would be automated essentially.

785
00:49:04,416 --> 00:49:10,216
Is that along the lines of what Altman was
saying about how he's expecting a

786
00:49:10,276 --> 00:49:16,166
single-person unicorn company very soon
because of the increased productivity?

787
00:49:18,815 --> 00:49:23,625
I think I would agree with that, and,
very importantly, this is something

788
00:49:23,625 --> 00:49:28,975
I've mentioned in chapter one
of my book as well: how big

789
00:49:29,075 --> 00:49:33,085
the shift essentially is. There were
a couple of surveys that were done.

790
00:49:33,545 --> 00:49:38,415
And it wouldn't be wrong to say that
within the next five years, essentially

791
00:49:38,415 --> 00:49:44,351
28% of jobs, at least in some professions,
would be eliminated, and they may be

792
00:49:44,391 --> 00:49:49,001
eliminated in the sense that those
people become unemployed for a period

793
00:49:49,001 --> 00:49:54,121
of time because now three people
are able to do five people's tasks.

794
00:49:55,851 --> 00:49:59,621
Again, because they've gained more
productivity, I don't think people will

795
00:49:59,841 --> 00:50:03,731
be unemployed for long; there will be
more and more companies, essentially.

796
00:50:03,733 --> 00:50:07,763
Yeah, I think the one thing that I always
wonder about is: I remember as a kid

797
00:50:07,793 --> 00:50:12,703
reading all these predictions about how
all these increases in productivity would

798
00:50:12,703 --> 00:50:18,983
mean that people would work less,
like a couple of days a week, and

799
00:50:19,013 --> 00:50:20,883
they will just have all this free time.

800
00:50:20,883 --> 00:50:25,833
And people are worrying about how
that's going to affect an average

801
00:50:25,833 --> 00:50:27,753
person having so much free time

802
00:50:28,416 --> 00:50:32,586
That's a question one of my friends
asked as well: what do you think

803
00:50:32,626 --> 00:50:35,816
people would do when full automation
really happens? And I don't think

804
00:50:35,816 --> 00:50:40,606
there will ever be full automation; there
need to be monitoring systems that are

805
00:50:40,616 --> 00:50:46,336
always put into play. Monitoring systems
can be automated, but they still need to

806
00:50:46,346 --> 00:50:50,786
be fine-tuned, and all of that
is still going to be done by humans.

807
00:50:50,796 --> 00:50:54,736
So you could say humans are
transitioning from becoming

808
00:50:54,746 --> 00:50:56,936
workers to becoming managers.

809
00:50:58,081 --> 00:50:58,431
Yeah.

810
00:50:59,411 --> 00:51:03,241
I'm still working probably a
similar amount of time, but in

811
00:51:03,241 --> 00:51:05,986
a slightly more productive way.

812
00:51:06,486 --> 00:51:06,776
Yeah.

813
00:51:06,776 --> 00:51:11,636
I think we had this concept of silent
promotion that we were talking about

814
00:51:11,656 --> 00:51:16,606
on one of the previous episodes: that
overnight, everybody who works with code

815
00:51:16,636 --> 00:51:22,006
basically went from individual contributor to
effectively engineering manager with

816
00:51:22,336 --> 00:51:27,826
junior-equivalent software engineers
at their disposal, with tools like

817
00:51:27,826 --> 00:51:30,596
Copilot and just chatting to ChatGPT.

818
00:51:31,286 --> 00:51:36,456
I have friends who are VCs who are
now trying to say, instead of trying

819
00:51:36,486 --> 00:51:42,066
to train an associate right now, to
teach them how to look for deals

820
00:51:42,066 --> 00:51:45,756
or how to compile information from
different datasets, which could be

821
00:51:45,796 --> 00:51:47,276
GitHub, which could be CrunchBase.

822
00:51:47,686 --> 00:51:49,146
Why not use a model instead?

823
00:51:49,146 --> 00:51:51,146
And they're instead spending

824
00:51:51,531 --> 00:51:57,321
50 to 60K on ChatGPT as compared to
hiring a person for that task, essentially.

825
00:51:57,801 --> 00:52:01,221
So people need to be
more autonomously driven.

826
00:52:01,321 --> 00:52:06,191
And the people who aren't, I think
they may have a problem very soon.

827
00:52:08,396 --> 00:52:11,316
Speaking of that, that billboard,
'Still hiring humans?'

828
00:52:11,516 --> 00:52:12,576
Have you seen that one?

829
00:52:14,526 --> 00:52:14,766
Yeah.

830
00:52:14,766 --> 00:52:17,856
It's one of those companies,
what is it called?

831
00:52:17,896 --> 00:52:18,496
The one that...

832
00:52:18,796 --> 00:52:24,006
There's the telephone AI
where you can call a number.

833
00:52:24,406 --> 00:52:28,616
Effectively, the billboard was this
massive phone number to call,

834
00:52:28,656 --> 00:52:32,816
asking whether you're still hiring
humans, and people are calling it.

835
00:52:32,816 --> 00:52:36,506
And apparently it can handle
a million concurrent phone calls or

836
00:52:36,516 --> 00:52:37,936
some ridiculous stuff like that.

837
00:52:37,956 --> 00:52:44,386
And it's convincingly replacing, like,
the receptionist or the booking

838
00:52:44,486 --> 00:52:45,756
conversations that you had before.

839
00:52:46,276 --> 00:52:50,356
It reminds me of that demo
from Google years ago. I'm

840
00:52:50,366 --> 00:52:53,976
forgetting what it was called, Duo
or something. When they had the demo, it

841
00:52:53,996 --> 00:52:58,826
was making a reservation, and then it
never really worked as well as the demo.

842
00:52:59,436 --> 00:53:03,586
So we're effectively reaching
that moment now, just

843
00:53:03,726 --> 00:53:04,996
with different companies doing it,

844
00:53:05,768 --> 00:53:10,708
Maybe this is a realization I
have constantly because I have ADHD,

845
00:53:11,068 --> 00:53:14,128
but we're interacting with
so much software or so much

846
00:53:14,128 --> 00:53:15,838
information, which is isolated.

847
00:53:15,838 --> 00:53:18,898
And what we're essentially doing
is trying to remember one thing

848
00:53:18,968 --> 00:53:20,208
and implement another thing.

849
00:53:20,498 --> 00:53:24,988
So we need systems that can interact
with all of these systems and

850
00:53:25,068 --> 00:53:27,488
be more like assistants for us.

851
00:53:27,488 --> 00:53:30,838
And that's where a lot of people are
trying to build up agents as well.

852
00:53:31,358 --> 00:53:36,958
So from isolated software, we're
going towards a system where our

853
00:53:37,238 --> 00:53:41,828
software is getting linked; as in,
it's becoming an ecosystem as well,

854
00:53:42,398 --> 00:53:46,228
that is able to communicate and
anticipate our requirements.

855
00:53:46,618 --> 00:53:50,298
But the downsides of that are
still to be predicted, which

856
00:53:50,298 --> 00:53:53,058
is: what happens if it goes wrong?

857
00:53:53,518 --> 00:53:56,378
What happens if somebody
hacks into the system?

858
00:53:56,418 --> 00:53:59,763
The risk of deploying such
systems is really high.

859
00:54:00,013 --> 00:54:04,283
So those are all technical problems
that would need to be solved, and

860
00:54:04,283 --> 00:54:09,563
for that particular reason, I
think the field of safety, which is

861
00:54:09,563 --> 00:54:12,383
people who are working in LLMSecOps.

862
00:54:12,728 --> 00:54:16,918
And the field of evaluation, which is
people who are doing evaluation and

863
00:54:16,938 --> 00:54:21,818
monitoring, are going to be some of
the most important jobs as compared

864
00:54:21,818 --> 00:54:25,588
to people doing fine-tuning and all of
those things. Those will continue

865
00:54:25,708 --> 00:54:31,188
to be important, but the models that
we get from other companies will

866
00:54:31,458 --> 00:54:37,688
eventually, with time, become good
enough that we may not need

867
00:54:37,718 --> 00:54:39,588
to do a lot of those things manually.

868
00:54:39,638 --> 00:54:43,588
A lot of work of a machine
learning engineer or data scientist

869
00:54:43,738 --> 00:54:45,188
will get automated as well.

870
00:54:46,183 --> 00:54:50,623
Do you worry about other things that
might go wrong with all of this?

871
00:54:50,673 --> 00:54:53,863
I don't think that many people
are actually worried about,

872
00:54:53,923 --> 00:54:56,513
Skynet materializing tomorrow.

873
00:54:57,223 --> 00:55:03,573
But are there things that you're
realistically concerned about in a short,

874
00:55:03,573 --> 00:55:05,993
maybe two-to-five-year time horizon?

875
00:55:07,418 --> 00:55:13,458
Yeah, one of the things that does
concern me is how these models are

876
00:55:13,488 --> 00:55:18,988
being used by kids, and the
kind of risk that generative AI does

877
00:55:19,008 --> 00:55:23,538
pose to elderly people who
don't really realize the difference

878
00:55:23,968 --> 00:55:27,708
between something being generated versus
something being true, or whether they should

879
00:55:27,758 --> 00:55:30,288
rely on it to some extent or not.

880
00:55:30,768 --> 00:55:36,508
I think the whole spamming industry
got so big, or the whole business of stealing

881
00:55:36,568 --> 00:55:41,298
people's credit card information got
so big, precisely because people need

882
00:55:41,298 --> 00:55:44,388
to stay in touch with the technology, and
the people who are more vulnerable

883
00:55:45,923 --> 00:55:49,663
are getting attacked, and they're
the people who are most at risk.

884
00:55:49,903 --> 00:55:54,063
So what really concerns me is not people
who are data scientists or machine

885
00:55:54,063 --> 00:55:58,243
learning engineers and their jobs going
away. The people I'm most concerned about right

886
00:55:58,243 --> 00:56:03,453
now are the people who are vulnerable,
so kids and elderly people who will give a

887
00:56:03,453 --> 00:56:09,113
lot of information to ChatGPT: 'Hey ChatGPT,
can you look at my medical details and see

888
00:56:09,123 --> 00:56:11,143
what problem I may be having?'

889
00:56:11,143 --> 00:56:16,143
And my parents are heavily using ChatGPT
as well, but they don't really realize a

890
00:56:16,143 --> 00:56:19,563
lot of information they're giving
to the system can be hacked very

891
00:56:19,563 --> 00:56:22,573
easily, and there can be phishing attacks.

892
00:56:22,593 --> 00:56:25,163
There can be all of those attacks as well,

893
00:56:25,183 --> 00:56:25,843
eventually.

894
00:56:27,603 --> 00:56:33,193
Yeah, and I think the scale is what
scares me the most about it, right?

895
00:56:33,193 --> 00:56:36,733
The fact that you can do
it at a massive scale.

896
00:56:36,743 --> 00:56:41,223
There have always been scammers
calling the elderly and scamming

897
00:56:41,243 --> 00:56:42,513
them out of their money.

898
00:56:43,163 --> 00:56:48,173
But now that you can automate it and you
can scale it up, you could conceivably

899
00:56:48,363 --> 00:56:50,063
just make it a massive problem.

900
00:56:51,623 --> 00:56:54,263
And the second bit of that problem is...

901
00:56:54,513 --> 00:56:59,093
when you steal all of that data, you're
stealing how the person is interacting

902
00:56:59,103 --> 00:57:03,773
because large language models are so
good at impersonating or trying to

903
00:57:04,213 --> 00:57:08,733
learn how a person structures their
question or answers their question.

904
00:57:08,763 --> 00:57:12,203
And the same is happening with
audio speech synthesis as well:

905
00:57:12,203 --> 00:57:18,098
the models are getting much better
at learning the intonations

906
00:57:18,098 --> 00:57:22,468
or different tonal characteristics of
different people and adapting to

907
00:57:22,468 --> 00:57:28,028
them. That does expose a lot of risk because
it becomes so easy to impersonate and

908
00:57:28,098 --> 00:57:31,378
spread misinformation, or to be able to

909
00:57:31,838 --> 00:57:38,238
hurt somebody, if 'hurt' is the right word
or the right concern in that particular

910
00:57:38,268 --> 00:57:42,788
scenario: it can impersonate
anybody and ask for certain information.

911
00:57:42,788 --> 00:57:46,588
It can interact with your child
and there's a lot of information

912
00:57:46,588 --> 00:57:49,278
as well because people are
interacting with these models every

913
00:57:49,278 --> 00:57:50,688
single minute of the day as well.

914
00:57:50,918 --> 00:57:54,898
And it will only grow with
all of these systems. For example,

915
00:57:54,968 --> 00:57:59,718
Google is now integrating their
AI systems into Google Docs.

916
00:58:00,138 --> 00:58:02,028
So ChatGPT was already there.

917
00:58:02,038 --> 00:58:05,278
Now, Instagram might
very soon integrate this.

918
00:58:06,548 --> 00:58:11,838
I don't think there will be a world where
we can escape generative models as such,

919
00:58:12,408 --> 00:58:16,578
and the more we have conversations with
them, the more they are learning about

920
00:58:16,578 --> 00:58:21,718
our personalities and about everything
we're doing on the internet, essentially.

921
00:58:21,726 --> 00:58:21,986
Okay.

922
00:58:21,986 --> 00:58:24,416
I'm going to ask you
for one more prediction.

923
00:58:24,836 --> 00:58:26,846
And then I promise I'll
let you off the hook.

924
00:58:26,946 --> 00:58:33,576
Today, it's all about OpenAI this,
OpenAI that. We also saw that memo

925
00:58:33,846 --> 00:58:36,966
about Google having no moat a year ago.

926
00:58:37,876 --> 00:58:40,996
You obviously are deep in the industry.

927
00:58:41,386 --> 00:58:46,716
Where do you expect to see the different
companies that we're used to seeing: your

928
00:58:46,736 --> 00:58:51,446
Googles of the world that don't seem to be
doing that well with AI, despite being

929
00:58:51,446 --> 00:58:53,416
there at the forefront not that long ago,

930
00:58:54,106 --> 00:58:58,166
or the different startups that didn't
exist a few years ago and now

931
00:58:58,166 --> 00:59:01,006
they're doing exceptional things.

932
00:59:01,926 --> 00:59:04,526
I'm thinking about places like Midjourney.

933
00:59:04,976 --> 00:59:07,856
Where would you pay attention to the most?

934
00:59:07,866 --> 00:59:10,806
Where do you expect to see
the good stuff coming from?

935
00:59:12,676 --> 00:59:17,596
So I would say it will change for
the companies that were able to be

936
00:59:17,596 --> 00:59:22,366
monopolies: now it will be very hard
to be a monopoly that easily

937
00:59:22,966 --> 00:59:25,026
just by trying to build software.

938
00:59:25,436 --> 00:59:29,346
So by acquisition, yes, you can
be a monopoly, by trying

939
00:59:29,366 --> 00:59:31,926
to acquire everybody, which is
essentially what Google was doing.

940
00:59:31,926 --> 00:59:34,356
So a lot of people do think
Google essentially built its business

941
00:59:34,356 --> 00:59:35,406
by building products.

942
00:59:35,746 --> 00:59:38,596
No, they were essentially acquiring
all the small companies that

943
00:59:38,596 --> 00:59:39,976
were building excellent products

944
00:59:40,926 --> 00:59:45,766
before they became big. And that's a world
we're moving further into, because the

945
00:59:45,776 --> 00:59:50,926
bigger companies do have near-infinite
resources, compute resources as well,

946
00:59:50,986 --> 00:59:53,156
to be able to control the ecosystem.

947
00:59:53,206 --> 00:59:55,906
So I would say we will more likely see

948
00:59:56,211 --> 00:59:59,751
more monopolies, but those
wouldn't be monopolies because

949
00:59:59,751 --> 01:00:01,481
they have an excellent product.

950
01:00:01,531 --> 01:00:06,101
Those would be monopolies because they
have more access to information, and they

951
01:00:06,111 --> 01:00:09,111
have a higher number of resources out there.

952
01:00:09,711 --> 01:00:14,511
As for the number of small companies, yes,
there will be many, but I'd easily

953
01:00:14,511 --> 01:00:19,471
assume there will still be tons
of companies who make exits as compared to

954
01:00:20,641 --> 01:00:24,071
becoming the victim in every hype cycle.

955
01:00:24,261 --> 01:00:27,281
And I would say we're going through
a hype cycle right now where

956
01:00:27,651 --> 01:00:31,731
there's far too much paranoia and
there's far too much excitement.

957
01:00:31,851 --> 01:00:37,121
There's very little realism around
the business value being derived

958
01:00:37,121 --> 01:00:38,901
out of these models essentially.

959
01:00:39,301 --> 01:00:43,171
So in this hype cycle, there are always
a lot of companies that get created.

960
01:00:43,811 --> 01:00:49,311
Two years from now, there's a very
good chance at least seven out of

961
01:00:49,311 --> 01:00:50,751
ten of those companies will die.

962
01:00:51,751 --> 01:00:52,331
Okay.

963
01:00:52,451 --> 01:00:54,321
And on that optimistic note,

964
01:00:56,651 --> 01:00:58,381
we're going to wrap up the episode.

965
01:00:58,531 --> 01:01:01,461
My guest, once again,
everybody, was Abi Aryan.

966
01:01:01,541 --> 01:01:03,971
You can find her at abiaryan

967
01:01:04,021 --> 01:01:04,481
.com.

968
01:01:04,661 --> 01:01:06,191
Is that the best place to find you?

969
01:01:07,556 --> 01:01:12,576
Yep, so that's one place where you find
all the information or where I'm giving

970
01:01:12,626 --> 01:01:17,926
talks, because essentially that's where
I'm presenting bits of information from

971
01:01:17,936 --> 01:01:20,056
my book and testing out my material.

972
01:01:21,326 --> 01:01:25,506
So it's the best place to find information
about me or to find my social media,

973
01:01:25,506 --> 01:01:30,556
even if the links change. But otherwise,
I'm @goabiaryan on Twitter, on

974
01:01:30,596 --> 01:01:32,506
Threads, on Instagram, on LinkedIn.

975
01:01:32,876 --> 01:01:34,886
So I use the same username everywhere.

976
01:01:35,356 --> 01:01:36,186
You can find me.

977
01:01:36,656 --> 01:01:37,186
There you go.

978
01:01:37,246 --> 01:01:42,286
Abi is omnipresent, always watching
you on every platform and the

979
01:01:42,296 --> 01:01:47,246
book, once again, is called "LLM Ops,
Managing Large Language Models in

980
01:01:47,246 --> 01:01:49,436
Production", published by O'Reilly.

981
01:01:49,896 --> 01:01:50,876
Thank you so much, Abi.

982
01:01:50,936 --> 01:01:51,616
Thank you for coming.

983
01:01:53,066 --> 01:01:54,196
Thank you so much.