1
00:00:00,000 --> 00:00:03,110
Miko Pawlikowski: I'm Miko Pawlikowski, and this is Hockey Stick.

2
00:00:06,490 --> 00:00:08,740
Starting with generative AI can be daunting.

3
00:00:08,970 --> 00:00:12,239
There's a lot of hype, a lot of development, and a lot of change, daily.

4
00:00:12,730 --> 00:00:22,099
It's easy enough to launch ChatGPT and ask for a poem on how Vim is superior to Emacs, but to get value from it professionally requires a bit more skill.

5
00:00:22,660 --> 00:00:29,095
Today, I'm joined by Amit Bahree, the author of "Generative AI in Action", a brand new book published by Manning.

6
00:00:29,525 --> 00:00:39,575
Amit is a principal group technical program manager at Microsoft, where he leads the engineering
team that builds the next generation of AI products and services on the Azure AI platform.

7
00:00:40,125 --> 00:00:48,465
He has over 25 years of experience in technology and product development, including the artificial intelligence and cloud platforms fields.

8
00:00:49,235 --> 00:00:52,555
And yes, you will learn what his mom thinks about ChatGPT.

9
00:00:53,264 --> 00:00:56,554
Welcome to this episode, and thank you for flying Hockey Stick.

10
00:00:57,066 --> 00:00:58,996
Let's start right away.

11
00:00:59,096 --> 00:01:00,716
Why did you write the book?

12
00:01:01,206 --> 00:01:13,905
Amit Bahree: Being in the AI platform team at Microsoft, one of my roles, which more or less became the day job
over the last year and a half, was meeting with a lot of our customers, which are generally large enterprises, where

13
00:01:13,905 --> 00:01:23,136
everybody wanted to know, 'How do I use Gen AI?' It obviously took over the world. As I joke, my mom's a ChatGPT expert.

14
00:01:23,266 --> 00:01:24,666
Miko Pawlikowski: Oh yeah, I bet she is.

15
00:01:26,066 --> 00:01:33,726
Amit Bahree: And I basically, at the end of the day, got tired of answering and guiding on the same thing again and again across multiple customers.

16
00:01:34,206 --> 00:01:42,221
So I said, what if this could be put down on paper and they could just learn by themselves, rather than us being the bottleneck in many ways, right?

17
00:01:42,221 --> 00:01:46,461
So, in full transparency, it was a selfish exercise.

18
00:01:46,571 --> 00:01:54,311
So that I don't have to repeat myself again and again, I could just point them to it and say, 'Hey, go read this, and that'll at least give you a jumpstart.'

19
00:01:55,436 --> 00:01:56,266
Miko Pawlikowski: Yeah, exactly.

20
00:01:56,266 --> 00:01:57,766
Read the book.

21
00:01:58,366 --> 00:01:59,026
I love that.

22
00:01:59,136 --> 00:02:02,316
That's a completely valid, perfect origin story.

23
00:02:02,586 --> 00:02:07,226
So you mentioned that your mom is a generative AI expert, so I guess we'll interview her next time.

24
00:02:07,226 --> 00:02:11,636
But for you, what was your moment?

25
00:02:11,756 --> 00:02:13,686
When did you decide to go into AI?

26
00:02:13,786 --> 00:02:16,076
Obviously you've been in it for a while.

27
00:02:16,546 --> 00:02:19,606
It wasn't as hot as it is right now back then.

28
00:02:19,946 --> 00:02:23,106
Can you tell us a little bit about your story and how you ended up doing

29
00:02:23,116 --> 00:02:23,776
what you're doing?

30
00:02:24,194 --> 00:02:26,114
Amit Bahree: I am actually not a data scientist.

31
00:02:26,134 --> 00:02:27,674
I'm not a machine learning engineer.

32
00:02:28,349 --> 00:02:34,109
I know how to build models, but that's not what I live and breathe and dream up in the middle of the night, as I know many of my colleagues do.

33
00:02:34,209 --> 00:02:35,109
That's their passion.

34
00:02:35,729 --> 00:02:51,099
In my previous role before Microsoft, one of the things I was learning was emerging technologies: understanding, from a
technical point of view, what they are, how they work, and how they could be used, mostly in the context of an enterprise setting.

35
00:02:51,269 --> 00:02:56,519
And one of the technologies among a few was AI, a few years ago.

36
00:02:56,569 --> 00:03:01,479
My role of looking at emerging tech is how I got into AI.

37
00:03:01,559 --> 00:03:07,769
Of course, Gen AI or these underlying architecture principles that power these things today didn't exist.

38
00:03:08,599 --> 00:03:09,779
But I was quite fascinated.

39
00:03:09,779 --> 00:03:17,309
It was still a side job, in the sense that it was one of a few areas of emerging technologies to go dig deep into.

40
00:03:17,824 --> 00:03:23,414
And then as that started getting more traction, I was the one-eyed king in the kingdom of the blind.

41
00:03:23,484 --> 00:03:27,014
Because I knew more than the others did; that doesn't mean I know the most.

42
00:03:27,014 --> 00:03:28,594
And then I was stuck with that.

43
00:03:29,784 --> 00:03:31,734
And then grew into that and got fascinated.

44
00:03:32,326 --> 00:03:33,736
Miko Pawlikowski: As they say, 'the rest is history'.

45
00:03:34,914 --> 00:03:35,784
Amit Bahree: It's still early days.

46
00:03:37,779 --> 00:03:45,479
Miko Pawlikowski: So what does a principal group technical program manager do? That's a mouthful. Is that how you introduce yourself at parties?

47
00:03:45,794 --> 00:03:46,604
Amit Bahree: No.

48
00:03:47,604 --> 00:03:49,884
Microsoft likes long names and titles.

49
00:03:49,934 --> 00:03:50,324
Titles

50
00:03:50,324 --> 00:03:59,004
being an aside, I basically have officially two day jobs, unofficially three day jobs. So I sit in what we call the AI platform team.

51
00:03:59,094 --> 00:04:04,854
We are the product team that builds all the AI products that power other products or our end customers.

52
00:04:04,854 --> 00:04:08,224
I have formally two buckets of responsibilities.

53
00:04:08,934 --> 00:04:20,074
Microsoft, our leadership goals, we sign large contracts with customers, within which we promise them either new or better AI features.

54
00:04:20,094 --> 00:04:27,154
It could be brand new things that we're building with them or for them, or it could be improving existing features.

55
00:04:27,164 --> 00:04:33,084
So once we sign those contracts, those land on my plate to go deliver from a platform team perspective.

56
00:04:33,084 --> 00:04:37,004
So I'm responsible for a lot of the custom engineering on the platform, which is this.

57
00:04:37,774 --> 00:04:40,044
That's my first bucket of responsibilities.

58
00:04:40,144 --> 00:04:45,724
My second bucket of responsibilities is whatever we do custom in the first, make sure it's in the platform.

59
00:04:46,554 --> 00:04:48,694
Because if you keep being custom, then there's no platform left.

60
00:04:49,044 --> 00:05:04,574
So the way I want you and the listeners who will get to this to think about it is, these large deals that we sign are
the catalyst for us to go do things in the platform that we are already thinking about, but that maybe aren't prioritized enough.

61
00:05:04,574 --> 00:05:08,544
So they are a forcing function to go improve the platform at the end of the day.

62
00:05:09,014 --> 00:05:13,604
And that helps not just that one specific customer, but all the rest of them as well.

63
00:05:14,229 --> 00:05:29,044
And then my third, unofficial one is anything and everything related to Azure OpenAI coming from our CEO and what we
call our SLT, which is the CEO and his direct reports, in the context of customers, where it's top of mind for many.

64
00:05:30,144 --> 00:05:35,204
For many folks, their understanding is varied, which somewhat ties back to the genesis of the book.

65
00:05:35,844 --> 00:05:47,344
So when Satya meets other CEOs and they have a question, or they're not happy about something,
or they need guidance, those get sent over with, 'Here's the team that's going to go help you.'

66
00:05:47,444 --> 00:05:54,204
And so then I go in and from an engineering point of view, support, see what they need or what they want.

67
00:05:54,514 --> 00:05:56,394
So that's my day job, right?

68
00:05:56,394 --> 00:05:57,474
So custom engineering.

69
00:05:57,979 --> 00:06:02,789
And then supporting Azure OpenAI-related things from our leadership team.

70
00:06:03,306 --> 00:06:06,086
Miko Pawlikowski: And then there's your fourth job, which is writing books.

71
00:06:07,281 --> 00:06:20,411
Amit Bahree: That, indeed. That is also a moment of insanity in some ways, but yes, that is the graveyard shift,
as I call it, because it's after the work is done, which is never done, these days at least. So yes.

72
00:06:20,806 --> 00:06:21,476
Miko Pawlikowski: Of course.

73
00:06:22,926 --> 00:06:32,376
I have to ask you, obviously, not that long ago, there was this entire drama of Sam Altman being fired, and then rehired, and all of that.

74
00:06:33,166 --> 00:06:36,786
And a lot of people were wondering a lot of things.

75
00:06:36,836 --> 00:06:41,616
Satya was quite prominent during that entire conversation.

76
00:06:42,256 --> 00:06:44,296
What's your take on what happened?

77
00:06:44,969 --> 00:06:45,869
Amit Bahree: A couple of things.

78
00:06:45,949 --> 00:06:52,959
We were learning along with the rest of the folks on Twitter or Reddit or wherever one follows things, right?

79
00:06:53,009 --> 00:06:59,239
The conversations that Satya and Sam were having were above my pay grade, just to be black and white about it.

80
00:06:59,679 --> 00:07:03,265
So we were following along and listening along just like the rest of the world.

81
00:07:03,315 --> 00:07:06,285
I think the one difference is, we had a little bit in the machinery.

82
00:07:06,715 --> 00:07:13,585
Obviously in our team, we do work from an engineering perspective closely with OpenAI and they're a massive partner to us.

83
00:07:13,585 --> 00:07:21,135
So I think in some cases, maybe we are a little more empathetic, I would say, because it's a little closer to home.

84
00:07:21,135 --> 00:07:25,006
And it's one big virtual team, loosely speaking; that's how to think about it.

85
00:07:26,103 --> 00:07:38,563
Miko Pawlikowski: So there was one particular thing that I think is interesting, and it might be that people are
just reading way too much into that, but I think Satya went and said something along the lines of, 'don't you worry,

86
00:07:38,573 --> 00:07:45,383
even if OpenAI stops existing tomorrow, we're basically well positioned to continue the innovation' and all of that.

87
00:07:45,703 --> 00:07:50,533
And a lot of people took it as saying, okay, they basically bought themselves OpenAI.

88
00:07:50,603 --> 00:07:52,123
Is that roughly what's happening?

89
00:07:52,320 --> 00:07:53,070
Amit Bahree: Couple of things.

90
00:07:53,280 --> 00:07:55,850
One is now I'm not a Microsoft spokesman.

91
00:07:55,880 --> 00:07:57,340
I'm just talking on my behalf.

92
00:07:57,400 --> 00:07:58,560
We don't own OpenAI.

93
00:08:00,160 --> 00:08:01,220
I don't think that is correct.

94
00:08:01,220 --> 00:08:02,850
I think people are reading too much into it.

95
00:08:03,650 --> 00:08:11,690
I think the thing I want folks to understand is, Microsoft and Microsoft Research investments in AI go back over 30 years.

96
00:08:12,420 --> 00:08:14,675
So it's not that we've just woken up today,

97
00:08:14,925 --> 00:08:18,615
or woken up a few years ago and said, 'Look, this is the thing to go into'.

98
00:08:19,145 --> 00:08:24,245
I think the difference really is my mom didn't know about it, nor did she care.

99
00:08:24,875 --> 00:08:25,885
Now she does.

100
00:08:26,135 --> 00:08:30,105
So I think, where we're coming from, in some ways it's not new.

101
00:08:30,105 --> 00:08:41,210
And it's just become more in the limelight and people are becoming more aware, but we've been at it for
a while, and both from a research perspective, products perspective, it's just more in the limelight now.

102
00:08:41,950 --> 00:08:42,330
Miko Pawlikowski: Okay.

103
00:08:42,330 --> 00:08:45,670
Let's leave Microsoft alone and talk a little bit closer to your book.

104
00:08:45,670 --> 00:08:49,240
So one of the questions that I keep asking everybody is

105
00:08:49,975 --> 00:08:54,855
their reason for thinking that GenAI is such a massive deal, right?

106
00:08:54,855 --> 00:08:59,165
Why is it such a big deal, and why, again, your mom, why does she know about it now

107
00:08:59,215 --> 00:09:01,935
and she didn't before? I don't think she knew about BERT,

108
00:09:02,015 --> 00:09:09,110
I suspect, but she does know about ChatGPT, and there's a good chance she's using ChatGPT, which is next level.

109
00:09:09,890 --> 00:09:13,280
And what do you think was so special recently?

110
00:09:13,280 --> 00:09:20,180
What's the hockey stick moment, from your perspective? What changed so that it became a household name?

111
00:09:20,725 --> 00:09:29,105
Amit Bahree: It was ChatGPT itself that made it a household name, and as we
all know, and as perhaps most people understand, the roots of ChatGPT were a demo.

112
00:09:29,655 --> 00:09:31,475
It wasn't meant to be where it is right now.

113
00:09:32,175 --> 00:09:43,235
And the fact that one doesn't have to know BERT or any of the other sort of technical
mumbo jumbo, and I can just talk to it, I can just use it just as an end user.

114
00:09:43,845 --> 00:09:45,845
I think the simplicity is the power of it.

115
00:09:46,365 --> 00:10:02,145
And there's the breadth of language understanding it can do, versus what we now call traditional AI, which is a very odd name, by
the way, in the first place. But the old AI, the pre-Gen AI, which again is not old, it's very much valid today, of course,

116
00:10:02,750 --> 00:10:06,080
was very task-specific, where you go deep in a certain task.

117
00:10:06,100 --> 00:10:13,280
So if you're in a company, in an enterprise doing a certain thing, using that, you understand it, you get its value, you know why it's powerful.

118
00:10:13,750 --> 00:10:21,520
But you can't have a generic, free-ranging, wider set of conversations and thoughts around it.

119
00:10:21,520 --> 00:10:29,470
So if you take a previous chatbot, for example, which is not powered by GenAI, and Miko goes and says, 'Hey, I'm hungry,'

120
00:10:31,815 --> 00:10:33,855
it won't know what to do. 'I'm sorry.'

121
00:10:34,205 --> 00:10:43,065
Whereas these things understand, they adapt, so I think the simplicity from a usage perspective is the power.

122
00:10:43,665 --> 00:10:47,385
And that's why the likes of my mom and others in the world are talking about it, right?

123
00:10:47,405 --> 00:10:53,325
Because it's not technical mumbo jumbo that a handful of people understand and you geek out in the corner.

124
00:10:53,475 --> 00:10:54,115
I can just use it.

125
00:10:56,970 --> 00:10:57,990
Miko Pawlikowski: Are you behind this comparison?

126
00:10:57,990 --> 00:11:05,190
This is the iPhone moment for artificial intelligence in general, and large language models in particular?

127
00:11:05,261 --> 00:11:13,661
Amit Bahree: Is it the iPhone, the original one, or the one which got 3G support, or when the App Store came up? Is it that one?

128
00:11:13,661 --> 00:11:15,151
There are some variants out there, right?

129
00:11:15,191 --> 00:11:21,461
But I look at it even simpler, because I think the iPhone is still a very consumer thing, at least.

130
00:11:21,651 --> 00:11:23,321
My world is very enterprise.

131
00:11:23,371 --> 00:11:24,731
Consumer is one side of the house.

132
00:11:24,731 --> 00:11:29,091
Enterprise is a very different kettle of fish in the sense of the problems and what they're trying to solve.

133
00:11:29,311 --> 00:11:34,061
So I think if you look at a consumer sense, like my mom, that is an iPhone sort of comparison moment.

134
00:11:34,441 --> 00:11:38,951
Miko Pawlikowski: If you go to Manning.com, you can actually browse portions of the book for free.

135
00:11:38,951 --> 00:11:44,661
So if you're listening along to that, go to Manning.com, find the book, and look for figure

136
00:11:44,661 --> 00:11:45,071
1.1.

137
00:11:45,071 --> 00:11:49,791
It's a graph that Amit took from OurWorldInData.org.

138
00:11:49,791 --> 00:11:55,781
And it's called 'Language and image recognition capabilities of AI systems have improved rapidly'.

139
00:11:56,151 --> 00:12:09,831
And it's basically plotting performance against the human benchmark, on a scale that goes from minus 100, meaning that
it's pretty bad, all the way to zero, where it's comparable, I think, or maybe equivalent to human.

140
00:12:09,881 --> 00:12:23,031
And for everybody who's listening to this as a podcast and not seeing this on video, it's showing different
machine learning and AI trends. It's got handwriting recognition, speech recognition, image recognition,

141
00:12:23,391 --> 00:12:28,711
and then it's got reading comprehension and language understanding. And what's mind-blowing to me,

142
00:12:28,711 --> 00:12:32,291
and I suspect this is why you chose this particular graph, is that

143
00:12:32,636 --> 00:12:37,566
we've got the handwriting and the speech recognition that kind of go slowly, looking linear.

144
00:12:37,956 --> 00:12:47,676
There was a little bit of progress, and then somewhere in the mid-2010s, it just goes out of control and goes all the way up to very good results.

145
00:12:47,676 --> 00:12:52,456
And then in 2016, I think, the reading comprehension starts on the graph.

146
00:12:52,936 --> 00:12:59,096
It's basically an arrow going straight up. Same for language understanding.

147
00:12:59,096 --> 00:13:00,276
This is within two years.

148
00:13:00,286 --> 00:13:05,646
It goes from nothing literally to basically comparable to human performance.

149
00:13:05,696 --> 00:13:07,026
Why did it happen then?

150
00:13:07,076 --> 00:13:09,996
What needed to happen? This is not even a hockey stick.

151
00:13:09,996 --> 00:13:12,896
This is just like a right angle here.

152
00:13:14,196 --> 00:13:15,106
How do you explain that?

153
00:13:15,331 --> 00:13:15,581
Amit Bahree: True.

154
00:13:16,561 --> 00:13:18,101
I actually never thought of the right angle.

155
00:13:18,501 --> 00:13:21,461
I think it's three things coming together, right?

156
00:13:21,461 --> 00:13:28,091
So one is aspects of AI and the research behind it have gotten better in that time frame, right?

157
00:13:28,091 --> 00:13:34,311
So we started getting deep learning, transformers I don't think quite existed at that point in time.

158
00:13:34,761 --> 00:13:40,141
So, fundamental architecture changes or improvements from a model perspective, model architecture.

159
00:13:40,141 --> 00:13:40,911
So I think that's one.

160
00:13:41,501 --> 00:13:48,251
But I think crucially, maybe equally, maybe more crucially is availability of data at the scale you need.

161
00:13:48,856 --> 00:13:54,026
And then also compute, most specifically GPUs, to train and crunch through these.

162
00:13:54,386 --> 00:13:57,656
I think that it's that perfect storm of those three things coming together.

163
00:13:58,266 --> 00:14:01,316
If one of them didn't happen as much, it would be still slower.

164
00:14:01,356 --> 00:14:05,946
And that's why you see the linear progression in the others versus I don't know, is that a rocket thing?

165
00:14:05,986 --> 00:14:07,096
Miko Pawlikowski: Basically vertical.

166
00:14:08,268 --> 00:14:11,428
Amit Bahree: So I think it's those three sort of things coming together.

167
00:14:11,648 --> 00:14:14,898
I personally believe, I don't think anything was planned or orchestrated.

168
00:14:14,898 --> 00:14:25,638
I think it's one of those happy accidents: how GPUs work, and the floating-point math
they need to do for graphics, which is gaming, is the same thing that AI models need to do.

169
00:14:26,038 --> 00:14:34,888
We as humans started spitting out more data, maybe thanks to social, thanks to actually iPhones and other smartphones and devices and whatnot.

170
00:14:34,928 --> 00:14:40,438
And then cloud capabilities in the context of GPUs and compute improved.

171
00:14:40,788 --> 00:14:46,923
I guess there's a fourth one, which is inherent, but a lot of system engineering things started coming online, right?

172
00:14:46,933 --> 00:14:48,443
How do you run these?

173
00:14:48,623 --> 00:14:51,203
Because it's not like running them on one GPU, for example.

174
00:14:51,203 --> 00:14:52,773
You need clusters of machines.

175
00:14:52,843 --> 00:15:01,463
So there's a fair amount of systems engineering, in the sense of reliability, resilience, and so on, under the covers that has to make it all happen.

176
00:15:01,493 --> 00:15:02,483
Otherwise it won't run.

177
00:15:02,713 --> 00:15:04,093
Lots of physics and computer science.

178
00:15:04,093 --> 00:15:05,973
I keep saying that to my team, for example.

179
00:15:06,023 --> 00:15:14,913
So I think that's maybe a fourth dimension, which most people don't talk about, but I think
those are the things that perhaps enabled a bunch of this to go where we are right now.

180
00:15:16,433 --> 00:15:26,323
Miko Pawlikowski: There's another interesting reference that you have. It's called
'A Survey of Large Language Models', and somehow I missed it until I found it in your book.

181
00:15:26,323 --> 00:15:27,513
So thank you for that.

182
00:15:28,133 --> 00:15:31,983
And I think page nine is where I found figure three.

183
00:15:32,458 --> 00:15:35,543
It's going to be very difficult to describe verbally, but

184
00:15:36,023 --> 00:15:42,663
imagine a little anthill with a bunch of ants in it, swarming.

185
00:15:42,743 --> 00:15:44,893
And each one of those ants is basically a model.

186
00:15:45,553 --> 00:15:53,453
And the figure is making a distinction between the ones that are basically open source, publicly available, and the ones that are closed source, and it's only

187
00:15:53,803 --> 00:15:57,853
graphing up to GPT-4 and Llama 2.

188
00:15:57,873 --> 00:16:00,283
So there's way more than that.

189
00:16:00,903 --> 00:16:05,933
I think at some point I saw that Hugging Face had a hundred thousand models uploaded to it.

190
00:16:05,933 --> 00:16:12,853
And I suspect after Llama 3, it's probably doubled since. It gives you a little bit of a perspective.

191
00:16:12,873 --> 00:16:16,723
It's not just ChatGPT and it's certainly not just OpenAI.

192
00:16:16,743 --> 00:16:19,973
And it shows you how much variety there is.

193
00:16:20,013 --> 00:16:23,603
And, frankly, I've been looking at these things for a while now.

194
00:16:23,653 --> 00:16:30,723
And still, there's probably half of this graph that I haven't even heard of, let alone tried.

195
00:16:30,753 --> 00:16:34,263
I keep using this phrase, 'Cambrian explosion', but it really does feel like that.

196
00:16:34,568 --> 00:16:39,238
They're just crawling out of every rock and hole, which is amazing.

197
00:16:39,258 --> 00:16:44,268
This is such an exciting time to be alive; I think that's the right way of putting it.

198
00:16:44,998 --> 00:16:47,988
Why did you choose that figure for your book?

199
00:16:48,311 --> 00:16:54,131
Amit Bahree: I had two lines of thought when I originally decided this would be the right one.

200
00:16:54,521 --> 00:17:01,221
I think one of them is what you touched on: yes, OpenAI and ChatGPT have the world's attention.

201
00:17:01,811 --> 00:17:05,871
But there's a lot of other innovation, a lot of other companies, a lot of other stuff going on as well.

202
00:17:05,871 --> 00:17:07,171
It's not only that.

203
00:17:07,621 --> 00:17:13,361
So I think it is more about awareness in that sense, because the book also is in my personal capacity.

204
00:17:13,371 --> 00:17:16,711
It's not a Microsoft-sponsored or a Microsoft book, right?

205
00:17:16,711 --> 00:17:23,731
So in that sense, I felt I would be doing a disservice if I didn't make folks, at least aware, because you just know what you know.

206
00:17:23,911 --> 00:17:26,271
So I think that was my one aspect.

207
00:17:26,541 --> 00:17:28,981
I think the second aspect was also

208
00:17:28,981 --> 00:17:36,071
showing lineage, because a lot of these models are complex as base models to train.

209
00:17:36,151 --> 00:17:48,081
They're super expensive, both in the sense of data gathering, cleaning it up, actual
training costs, and so on and so forth, which many don't really have the appetite for,

210
00:17:48,121 --> 00:17:50,541
or the ability, resources-wise, to do.

211
00:17:51,036 --> 00:18:01,336
So what I also wanted to show was, at the end of the day, it's still only a handful of base models that are further trained or fine-tuned and derived from.

212
00:18:02,056 --> 00:18:06,516
So it's a lineage aspect I also wanted to show, because that gets lost in the noise as well.

213
00:18:07,076 --> 00:18:17,166
And again, the framing of the book is mostly in enterprises, so if you're in an enterprise
setting, you just need to know the roots of the model you're using and the lineage it has.

214
00:18:17,961 --> 00:18:21,611
So you can make an informed decision whether that's the right thing or not.

215
00:18:22,596 --> 00:18:27,216
Miko Pawlikowski: Speaking of which, that reminds me, Phi-3 was released last week.

216
00:18:27,746 --> 00:18:32,276
It seems to be punching above its weight category quite heavily.

217
00:18:33,006 --> 00:18:35,731
Were you involved in any capacity in that project?

218
00:18:36,356 --> 00:18:37,516
Amit Bahree: In a minor way.

219
00:18:37,516 --> 00:18:44,006
So if you go read the technical paper, I'm one of the 70-some people listed on that.

220
00:18:44,036 --> 00:18:45,026
It's a team sport.

221
00:18:45,476 --> 00:18:52,056
So the team that built the SLM, Phi-3, is originally from our platform team.

222
00:18:52,531 --> 00:18:57,961
They've been moved out of that into the new GenAI team we've recently formed and publicly announced.

223
00:18:58,351 --> 00:19:00,061
So we work very closely with the team.

224
00:19:00,161 --> 00:19:07,861
Even though I have roots in applied research, I don't think I can take credit to say I built the model, but I've been involved with it for sure.

225
00:19:08,141 --> 00:19:08,951
Miko Pawlikowski: You're on the paper.

226
00:19:08,951 --> 00:19:12,651
That means you built it; you can claim that.

227
00:19:13,096 --> 00:19:20,746
Amit Bahree: I think Sebastian and the others have been very kind where some of us
have been involved in providing feedback and input and guidance and what have you.

228
00:19:21,046 --> 00:19:24,526
I think they've been quite kind and then they've done the right thing.

229
00:19:24,526 --> 00:19:27,286
But that doesn't mean I can take full credit.

230
00:19:27,416 --> 00:19:29,691
The way I think of it is, it takes a village.

231
00:19:30,071 --> 00:19:31,991
Each village needs an idiot, and that's me.

232
00:19:31,991 --> 00:19:32,741
It's an important role.

233
00:19:32,741 --> 00:19:33,611
Somebody has to do it.

234
00:19:34,626 --> 00:19:34,986
Miko Pawlikowski: Oh, wow.

235
00:19:34,986 --> 00:19:36,406
That is a lot of authors.

236
00:19:36,426 --> 00:19:39,771
I just opened it, and the paper was released three days ago,

237
00:19:40,176 --> 00:19:45,326
it looks like, and it is an impressive number of people working on that.

238
00:19:45,656 --> 00:19:46,986
I've been reading people's opinions.

239
00:19:46,986 --> 00:19:48,396
I haven't actually read the paper.

240
00:19:48,446 --> 00:19:52,646
So I don't know how it explains how it's possibly this good.

241
00:19:53,046 --> 00:19:54,646
It happened a few days,

242
00:19:54,646 --> 00:19:57,356
or was it a week, after Llama 3 was released?

243
00:19:57,551 --> 00:19:57,941
Amit Bahree: Roughly.

244
00:19:58,241 --> 00:20:04,571
Miko Pawlikowski: The main selling point there being that they trained it on 15 trillion tokens or some ridiculous number like that.

245
00:20:04,591 --> 00:20:06,341
And they were surprised that it kept getting better.

246
00:20:06,861 --> 00:20:11,141
Sounds like this one was trained on a much smaller corpus of text.

247
00:20:11,691 --> 00:20:14,661
How do you explain why it's so good?

248
00:20:15,796 --> 00:20:17,256
Amit Bahree: So there are two things here.

249
00:20:17,256 --> 00:20:20,826
I think, and it's in the paper, it's 3 trillion tokens.

250
00:20:21,296 --> 00:20:27,356
Again, this is a genesis from Phi-2, which is a genesis from Phi-1, which is a genesis from Orca 2.

251
00:20:27,356 --> 00:20:28,886
Those are all research models.

252
00:20:29,126 --> 00:20:38,196
One of the things we've come around to seeing, in the context of these new categories of small language models, is that highly curated data sets are better.

253
00:20:38,486 --> 00:20:44,196
So one reason why you see Phi-2 and Phi-3 doing so much better

254
00:20:44,196 --> 00:20:50,196
relative to bigger models is because a good chunk of the data is highly curated.

255
00:20:50,216 --> 00:20:52,306
There's two aspects to it, which we also publish.

256
00:20:52,306 --> 00:20:57,616
So there's this other paper, 'Textbooks Are All You Need', if you or your readers have seen it.

257
00:20:57,936 --> 00:21:05,991
So basically a good portion of the corpus is high quality textbooks as input into the model to train on.

258
00:21:06,421 --> 00:21:19,041
And then the second aspect of data is not Common Crawl, sucking stuff off the web, but again,
highly curated web data, or a very small subset of the web data, combined with the textbooks.

259
00:21:19,051 --> 00:21:24,611
That is also an interesting direction where research is going now, to say that for these smaller models,

260
00:21:25,821 --> 00:21:31,421
higher-quality data sets do carry a lot of weight.

261
00:21:31,913 --> 00:21:34,093
And that's really a lot of what you're seeing.

262
00:21:35,098 --> 00:21:39,498
Miko Pawlikowski: So when people say curated, does it mean an army of humans

263
00:21:39,888 --> 00:21:45,208
selecting, reading, annotating and discarding low-quality stuff?

264
00:21:45,208 --> 00:21:53,028
Or is there another model that does that work to pre-select, and it's models all the way down?

265
00:21:53,731 --> 00:21:58,451
Amit Bahree: It's not an army of humans because that's not scalable and doable at, you can do it as a one

266
00:21:58,556 --> 00:21:59,446
Miko Pawlikowski: trillion tokens.

267
00:21:59,446 --> 00:21:59,656
Yeah.

268
00:22:00,171 --> 00:22:02,941
Amit Bahree: Yes, you can do it as a one off maybe, but, especially.

269
00:22:03,721 --> 00:22:05,251
Phi-3 is a product.

270
00:22:05,621 --> 00:22:17,091
Phi-2 was a research model. Two different things. And from our perspective, the minute we're saying it's a product
and we release it to production, it has to go through the right rigour and cycles from a Microsoft perspective.

271
00:22:17,211 --> 00:22:21,041
That means we have to support it for a number of years.

272
00:22:21,101 --> 00:22:23,201
We have customers who are gonna use it and so on.

273
00:22:23,201 --> 00:22:26,031
So we can't just publish it with an army of people.

274
00:22:26,061 --> 00:22:27,351
'Cause that doesn't really scale.

275
00:22:27,351 --> 00:22:41,771
So there are other models helping. When you ask how it's curated, at least in the context of this, it is
synthetic data generated using GPT-4, but then humans are involved to make sure that it is curated.

276
00:22:41,771 --> 00:22:43,661
Again, it's not an army of people, but it's

277
00:22:44,136 --> 00:22:47,396
machinery, evaluations and so on, machinery running to

278
00:22:47,546 --> 00:22:49,836
Miko Pawlikowski: So this is synthetic data we're talking about.

279
00:22:49,856 --> 00:22:52,186
It's literally all generated by ChatGPT,

280
00:22:52,621 --> 00:22:55,831
Amit Bahree: Most, most is generated by GPT-4.

281
00:22:56,141 --> 00:23:03,021
Miko Pawlikowski: So that always makes me wonder, if we train things on data coming out of a model.

282
00:23:03,781 --> 00:23:19,251
I'm obviously no expert on this, but intuitively it seems to me that data generated by GPT-4, or any model
really, is going to have certain attributes that don't necessarily represent the web. Is that not a problem?

283
00:23:20,093 --> 00:23:20,703
Amit Bahree: Yes and no.

284
00:23:20,703 --> 00:23:27,863
I think one shouldn't be using the output of another model as your general data input only.

285
00:23:27,913 --> 00:23:32,253
I think you have to look at it in certain domains and the specifics of what you're trying to do.

286
00:23:32,283 --> 00:23:34,433
And then in that context, it would be okay.

287
00:23:34,433 --> 00:23:36,743
But that's where the human aspect also comes.

288
00:23:36,743 --> 00:23:38,783
You have to make sure evaluations are right.

289
00:23:38,793 --> 00:23:40,433
Cause guess what?

290
00:23:40,433 --> 00:23:44,803
The old school garbage in garbage out is still very much valid.

291
00:23:45,283 --> 00:23:47,733
But I think your intuition is correct in this sense.

292
00:23:48,858 --> 00:23:59,308
One shouldn't think of it as, 'hey, I can go use an LLM, spit it out, and then use that to go train my own model', in the breadth, in the broad sense of it.

293
00:23:59,838 --> 00:24:13,488
But you'll also hear of more recent papers coming, and more recent news where, in general, this is not Phi-3, but in general,
we have reached the point where we are sucking in all of the available Internet that is reachable or that one is allowed to reach.

294
00:24:14,048 --> 00:24:21,248
And to train the models more and more, we are also then complementing it with synthetic data, which other AI is generating.

295
00:24:21,888 --> 00:24:35,168
So I think you have to put it back in terms of which aspects your existing model is not doing great on, evaluating those, and
then using that as a basis to strengthen that dimension, rather than just a more horizontal, generic approach, if that makes sense.

296
00:24:35,588 --> 00:24:36,568
Miko Pawlikowski: Yeah, it certainly does.

297
00:24:36,568 --> 00:24:46,758
And I think what I appreciated, I actually haven't seen the shortcut, the abbreviation SLMs for small language models until I opened your

298
00:24:47,018 --> 00:24:47,468
Oh, okay.

299
00:24:47,538 --> 00:24:54,028
which is an indication of just how much focus we put on LLMs, the large language models.

300
00:24:54,548 --> 00:25:10,458
And, I think that, to me, at least, I don't know if it's just like the part of me that loves running things on Raspberry Pis and gets excited
about the possibility of actually running a decent enough model that I can speak to that actually runs on my phone or something like that.

301
00:25:10,458 --> 00:25:18,758
So, 3 billion parameters: does that mean, roughly, that with 4-bit quantization we can run it on effectively any phone at this stage?

302
00:25:18,818 --> 00:25:21,378
Like it's going to need maybe a couple of gigs

303
00:25:21,476 --> 00:25:22,316
Amit Bahree: everyone's asking that.

304
00:25:22,706 --> 00:25:30,416
So, on a certain profile, I think we talk about an iPhone 14 with a Bionic processor.

305
00:25:31,006 --> 00:25:31,876
You can run it.

306
00:25:32,216 --> 00:25:35,766
It can do a certain number of tokens per minute sort of generations.

307
00:25:36,316 --> 00:25:36,716
I think.

308
00:25:37,141 --> 00:25:50,531
For Miko or Amit to be able to run it on a phone as an experiment, or what
have you, is one thing; the ability to run it at scale for a production deployment is a different thing.

309
00:25:51,171 --> 00:26:00,631
So yes, these are small language models, and we do believe this: LLMs became a lot of hype after ChatGPT.

310
00:26:00,811 --> 00:26:02,461
Some is good, some is not so good.

311
00:26:02,701 --> 00:26:06,026
SLMs will be the next set, in the context of the hype.

312
00:26:06,471 --> 00:26:14,421
But as I go to remind many of the folks I talk to, it's a small language model in relation to a large language model

313
00:26:15,831 --> 00:26:23,591
At the end of the day, I think it's 2.8 or 3.8 billion, or whatever parameter count we have on the mini one, because this is Phi-3 mini.

314
00:26:24,206 --> 00:26:25,626
It's also a family of Phi models.

315
00:26:25,636 --> 00:26:40,016
This is the smallest of the ones that should be coming out, and the paper touches on the others. At the end of the day, three billion
parameters, or whatever the exact number is, isn't small. Just from a computer science perspective, it is still a pretty big, complex thing.

316
00:26:40,906 --> 00:26:41,236
Yes.

317
00:26:41,256 --> 00:26:47,596
Compared to hundreds of billions of parameters, it is small, but it is not small.

318
00:26:47,876 --> 00:26:49,496
I think I have to go remind people that.

319
00:26:49,561 --> 00:27:00,861
In relation, or relative to an LLM, yes, it's small, but by itself, it is still pretty complex
and beefy in the sense of compute requirements and GPU requirements and, what it needs.

320
00:27:01,541 --> 00:27:12,181
It doesn't mean you'll go off and deploy a bunch of these on your Raspberry Pi with inference in milliseconds and whatnot.

321
00:27:12,371 --> 00:27:23,391
Miko Pawlikowski: One of the reasons I'm asking this is, I don't know if
you followed the launch of Humane AI, that little gadget that kind of looks like something from Star Trek,

322
00:27:23,951 --> 00:27:29,051
and it looks like it hasn't been particularly well received because it's a little slow and a little clunky.

323
00:27:29,051 --> 00:27:30,411
I think I watched a

324
00:27:31,086 --> 00:27:44,808
YouTube review of that and I think they basically destroyed it a little bit by showing just how long you have to wait
because it's effectively just uploading it to a cloud somewhere and then downloading the response and it's just not there.

325
00:27:45,348 --> 00:27:51,208
And, with Phi-3 and like the smaller models, all of a sudden everybody's thinking the same thing.

326
00:27:51,238 --> 00:27:53,188
Can we make it native?

327
00:27:53,218 --> 00:28:00,298
I think Apple announced some things about how they're going to work on making sure that the hardware in the newer iPhones

328
00:28:00,653 --> 00:28:03,123
is able to run this stuff at reasonable speed.

329
00:28:03,153 --> 00:28:07,833
And this feels like it would be another hockey stick moment for these things.

330
00:28:07,833 --> 00:28:12,063
There are small language models where Siri doesn't suck, "Hey Google",

331
00:28:13,093 --> 00:28:14,483
"Okay Google" works.

332
00:28:14,543 --> 00:28:17,423
And Alexa actually listens to me, that kind of stuff.

333
00:28:17,693 --> 00:28:22,303
Do you think that was one of the motivations for the smaller model?

334
00:28:22,873 --> 00:28:25,923
Amit Bahree: The premise you're touching on was one of the motivations.

335
00:28:25,923 --> 00:28:33,343
So if I rewind for a second, for a large language model, I go back, again, if you cut through the hype for a second, to the laws of physics and computer science.

336
00:28:33,583 --> 00:28:39,903
These large language models are enormously complex and need a lot of compute resources to run.

337
00:28:40,313 --> 00:28:52,533
And like any developer, programmer, or computer scientist will tell you, by the laws of physics, scale means complexity, which means latency: I have to process more things.

338
00:28:52,933 --> 00:28:58,163
It takes time to get results back, and there is no way to cut those corners at the end of the day.

339
00:28:58,913 --> 00:29:03,663
So where you're seeing latency or things are slower, it's because of that.

340
00:29:03,743 --> 00:29:11,963
From our perspective, there's also another dimension: running this at cloud scale, at Azure level, globally across hundreds of data centers and what have you.

341
00:29:11,963 --> 00:29:24,393
That's not simple or cheap. So if we can reduce our costs to run this at scale, we can make sure the service is cheaper for our customers as well.

342
00:29:24,898 --> 00:29:29,138
I think this is also where, we as humans are awesome and we forget things.

343
00:29:30,018 --> 00:29:33,098
Because many of these models are exposed as an API.

344
00:29:33,868 --> 00:29:43,868
We, at least developers for sure, have the expectation that it's an API call, so I'm going to get my response back in milliseconds and what have you,

345
00:29:44,413 --> 00:29:45,913
because that's what we have been used to.

346
00:29:46,233 --> 00:29:53,993
The difference is, yes, it's an API call, but the machinery that's running behind it, including the models themselves, is super complex.

347
00:29:54,753 --> 00:29:56,663
and when things are slow, we get unhappy.

348
00:29:56,673 --> 00:29:59,113
So I think there also needs to be a mindset shift.

349
00:29:59,123 --> 00:30:07,293
So if you package all of this up, that's a big motivation: in some cases, a small language model would make more sense.

350
00:30:07,393 --> 00:30:09,463
But I also want to outline this.

351
00:30:10,248 --> 00:30:13,428
It doesn't have the same power as a large language model.

352
00:30:13,428 --> 00:30:16,558
I see a lot of comparisons to the bigger models and all, which is good.

353
00:30:16,588 --> 00:30:21,168
It's early days, but at the end of the day, it's not an apples-to-apples comparison.

354
00:30:21,178 --> 00:30:30,303
For example, a lot of people, including me, have been guilty of just using GPT-4 as a knowledge database.
More and more people, instead of googling or binging or whatever you do, just ask the thing.

355
00:30:30,463 --> 00:30:32,633
So you're using it as a big, fancy database.

356
00:30:33,433 --> 00:30:45,083
So if I put that in the sense of world knowledge, and again, it's not factually correct to say it has
the world's knowledge, it only has the publicly accessible knowledge as of its training cutoff, but ignoring that

357
00:30:45,093 --> 00:30:51,948
point, the small language models will not have that because they've not been trained on that volume of data.

358
00:30:52,168 --> 00:31:01,968
So I think the other dimension is, whilst the compute profile we've been talking about is one, you have to think of the SLMs in the right use case.

359
00:31:01,968 --> 00:31:02,968
What am I trying to do?

360
00:31:03,398 --> 00:31:08,208
If I'm trying to understand an entity in a workflow, I can use a small language model.

361
00:31:08,278 --> 00:31:11,818
I don't need the power of these large language models necessarily.

362
00:31:12,228 --> 00:31:22,018
Equally, if there are different languages that one has to use, not English, for example,
a small language model may not be as powerful or as good as a large language model.

363
00:31:22,098 --> 00:31:25,018
So the way we should think about it is they shouldn't be competing.

364
00:31:25,038 --> 00:31:26,258
They're complementing each other.

365
00:31:27,008 --> 00:31:32,458
In what you're trying to solve, use the right model at the right step, because the beauty, again, is that they're an API call.

366
00:31:32,798 --> 00:31:38,158
So it's not that if you're developing an application, you're stuck with one thing for the whole duration.

367
00:31:38,168 --> 00:31:41,208
You can choose at the right step for the right thing, for the right power.

368
00:31:41,208 --> 00:31:41,758
So I use.

369
00:31:42,158 --> 00:31:55,418
often with my teams and others, the analogy: if a GPT, or pick your model, is like a Ferrari, then if you're going
racing, you need a Ferrari. But if an SLM is like a Honda, and by the way, pick your brand,

370
00:31:55,988 --> 00:32:01,708
and you're stuck in morning rush hour traffic, then a Honda is better. You pick the right thing for the right purpose.

371
00:32:01,768 --> 00:32:03,188
Is this really what I'm getting into?

372
00:32:03,588 --> 00:32:04,858
I would show them the compute profile.

373
00:32:06,403 --> 00:32:07,423
Miko Pawlikowski: I completely agree.

374
00:32:07,483 --> 00:32:15,793
And I think these are separate use cases where I just want my "Okay Google" and my Siri to not suck so much.

375
00:32:15,803 --> 00:32:21,723
I want it to understand what I mean half of the time and not have to say the thing three times.

376
00:32:22,218 --> 00:32:28,508
Not to wonder every time, what did I say differently now that it didn't catch the song that I wanted to play kind of thing.

377
00:32:28,518 --> 00:32:33,488
And that would already be like a big improvement for me, just interacting with that thing.

378
00:32:33,828 --> 00:32:52,728
When you were saying all those things, I was wondering whether there is a certain minimal level, a certain number of
tokens per second, that will feel to most humans like real time, which is really what we're talking about here, and beyond that point it probably doesn't matter.

379
00:32:53,148 --> 00:32:58,228
If you can't read it faster than it's being produced, you're not going to have that feeling of slowness.

380
00:32:58,228 --> 00:33:00,378
And there are some interesting.

381
00:33:00,788 --> 00:33:14,738
things like Groq, the one with a Q at the end, I think they're suing Elon Musk over that, that has some
dedicated hardware, and I saw some demo where it was doing something ridiculous, like 800 tokens a second on Llama 3.

382
00:33:14,758 --> 00:33:16,688
So was it 70B or something?

383
00:33:17,438 --> 00:33:24,478
Is it not just a matter of waiting 5-10 years for the dedicated hardware to get cheap and plentiful enough?

384
00:33:24,478 --> 00:33:28,268
And it won't be so much of an issue?

385
00:33:29,086 --> 00:33:32,546
Amit Bahree: That's the whole story of computing history, if you go look at it, right?

386
00:33:32,971 --> 00:33:33,411
Miko Pawlikowski: Yeah.

387
00:33:34,146 --> 00:33:40,046
Amit Bahree: As hardware improves, but I think we also have to put it in the context of the scale of use.

388
00:33:40,336 --> 00:33:53,166
For example, if you have access to a data center with hundreds of GPUs of today's best of breed, let's say,
and there's nobody else on it, it won't feel slow to you. It'll be like, what's everyone complaining about?

389
00:33:53,586 --> 00:33:58,756
But if in the same data center, you have 4000 other users concurrently coming, it's a different story.

390
00:33:58,806 --> 00:34:13,716
I think I also have to remind people that when you are doing comparisons or setting expectations, you have to think in terms of the load, the
traffic, how much. Now, for us as a cloud provider, that's a lot of our headache, with a lot of customers saying, why do you think I'm paying you?

391
00:34:14,946 --> 00:34:18,216
But I also go back: yes, and the laws of physics don't change either.

392
00:34:18,246 --> 00:34:21,986
But overall, I think Nvidia announced a whole bunch of new stuff

393
00:34:22,221 --> 00:34:26,671
at their conference quite recently, network speeds are improving or have been.

394
00:34:26,671 --> 00:34:34,761
So if you step back for a second, I think, just history of computing has been that hardware scales up and improves and helps the software.

395
00:34:35,111 --> 00:34:40,351
I think the one thing that I'm not, personally speaking, I can't predict the future.

396
00:34:40,381 --> 00:34:44,761
The one thing is, this is one of those, back to your hockey stick point.

397
00:34:44,801 --> 00:34:47,381
The scale is almost at a global level.

398
00:34:47,451 --> 00:34:54,291
Okay, it's not every human on the planet using ChatGPT or LLMs in some fashion, but quite a big percentage of people are.

399
00:34:54,401 --> 00:34:58,601
In some manner, on a daily basis, for some people, it is eight hours a day.

400
00:34:59,431 --> 00:35:03,791
My mom, maybe once every other, once a week, or whatever it is she does.

401
00:35:04,171 --> 00:35:10,551
But the breadth of humans using it is much broader than it ever has been.

402
00:35:10,931 --> 00:35:28,271
So with that context, even as the underlying systems and hardware improve, I think the perception of whether it is
actually improving may be slower than perhaps in the past, where it was still more niche, if that makes sense.

403
00:35:29,046 --> 00:35:29,626
Miko Pawlikowski: It does.

404
00:35:29,626 --> 00:35:38,846
And I think to follow the train of thought that you started here, the potential is probably higher just because it's so much more intuitive.

405
00:35:38,846 --> 00:35:40,116
You just talk to it.

406
00:35:40,786 --> 00:35:43,836
My mom, when she needs to install an app, it's a whole thing.

407
00:35:43,876 --> 00:35:44,886
It takes a while.

408
00:35:44,926 --> 00:35:46,206
She needs to get used to it.

409
00:35:46,216 --> 00:35:49,937
She needs to get comfortable with it, needs to remember the password.

410
00:35:49,937 --> 00:35:51,257
There might be another PIN.

411
00:35:51,267 --> 00:36:06,912
It's a whole thing, but once she gets some kind of interface that's built into her phone, or whatever, where she
can just talk to it, that kind of clears a lot of barriers, and a lot of people are picturing this future where your

412
00:36:06,912 --> 00:36:13,174
phone is slowly turning into effectively a listening device from Star Trek, and it's just doing what you want it to do.

413
00:36:13,224 --> 00:36:15,134
And maybe integrates with all the apps.

414
00:36:15,684 --> 00:36:17,344
I ordered that Rabbit R1.

415
00:36:17,364 --> 00:36:18,264
I'm still waiting.

416
00:36:18,284 --> 00:36:22,853
I don't know when the delivery is supposed to be, but that's one of the visions of the future right there.

417
00:36:22,934 --> 00:36:31,924
You just talk to it and the model does things on your behalf, goes to these dodgy apps and clicks things, and you don't have to worry about that.

418
00:36:31,924 --> 00:36:33,044
And you don't have a learning curve.

419
00:36:33,804 --> 00:36:38,074
And I think that's a vision of the future that excites a lot of people.

420
00:36:38,914 --> 00:36:44,724
And, I suspect we might see something like that in the near future because I don't see any roadblocks for

421
00:36:44,849 --> 00:36:46,929
Amit Bahree: No, I actually argue the other way.

422
00:36:47,189 --> 00:36:49,939
I see it's actually already happening now.

423
00:36:49,959 --> 00:36:51,469
And I can give you two real examples.

424
00:36:51,479 --> 00:36:53,599
For example, I'm originally from India.

425
00:36:54,099 --> 00:37:03,399
And in India, as much as the country's made progress, there's still a decent percentage of the population that is not very literate.

426
00:37:03,579 --> 00:37:09,149
Either they haven't finished school, or they dropped out early, or they've actually not gone to school.

427
00:37:09,634 --> 00:37:17,874
Now, it may be a small percentage at a country level, but if it's a country with 1.4 billion, a small percentage in absolute numbers is still a big number.

428
00:37:18,304 --> 00:37:20,693
a chunk of humanity.

429
00:37:21,263 --> 00:37:34,363
And in that, we're seeing, for many people who are not comfortable reading or writing, some of
the cheaper devices they have, it's not an iPhone or an Android phone, but they have speech.

430
00:37:34,373 --> 00:37:35,693
So they print, there's a big mic.

431
00:37:36,348 --> 00:37:51,298
In the middle of the phone, they can press that and talk to it in natural language, in their own language. They're asking questions and
talking to it, and, as it happens, in some of these cases it's some of our speech AI that is understanding it and then responding back.

432
00:37:51,418 --> 00:37:58,658
So it's lowering the barrier and opening up this to a broader segment, which in the past was not possible.

433
00:37:59,213 --> 00:38:03,423
So that's one example, because they don't need to know the language to go type it in or what have you.

434
00:38:03,423 --> 00:38:05,913
They can just talk to it the way they normally would.

435
00:38:06,293 --> 00:38:22,158
And then the second one was actually more of a ChatGPT example, which I think Microsoft also published, where it's plugging in
different languages, again, in rural areas in India, for farmers. India, for those who don't know, is not like

436
00:38:22,158 --> 00:38:22,308
the U.S.

437
00:38:22,318 --> 00:38:27,678
and others, where you have big farms with hundreds and thousands of hectares or acres.

438
00:38:27,708 --> 00:38:29,348
They're usually small farms.

439
00:38:29,738 --> 00:38:31,448
Usually it's the family that owns it.

440
00:38:31,828 --> 00:38:39,138
And they don't really have the muscle, at an individual level, to go understand pricing and markets and what's happening as they want to go sell their

441
00:38:39,623 --> 00:38:41,073
grain or whatever they're growing.

442
00:38:41,283 --> 00:38:48,983
So in that sense, what we talked about as democratizing was how they're using ChatGPT to get, basically, real-time market information.

443
00:38:48,983 --> 00:38:59,263
So they're empowered to go make a better decision, which until now was impossible because
you need a computer, you need a modem, or you need to be online and those are the barriers.

444
00:38:59,263 --> 00:39:03,743
And it's like, they don't know how to use it, or in the language that they understand.

445
00:39:04,298 --> 00:39:23,138
So these are actually happening today, in production, so to speak, live, and the way we want to think about it is as democratizing AI. When I go
back to how you started asking me the question, when we started talking, if I go back to my mom, or the example you used with your mom, of the barrier of a new

446
00:39:23,188 --> 00:39:31,788
app or a new interface, if we free them up or make it easier in many ways, those are the democratizing elements that are happening. It is not only about

447
00:39:32,188 --> 00:39:36,188
how literate you are or not, but it's easing barriers, basically.

448
00:39:36,268 --> 00:39:38,588
So of course it doesn't do everything.

449
00:39:39,358 --> 00:39:47,958
It doesn't mean all barriers are gone, but we see a lot of real examples, day-to-day things that people are using it for, which is absolutely fascinating.

450
00:39:48,408 --> 00:40:05,778
Miko Pawlikowski: There was this very popular demo of, I think it's called Bland AI, where they had a billboard with a phone number to call, and you can have
thousands and thousands of parallel conversations with an AI to do things like booking, and basically get a first-line human kind of experience, really.

451
00:40:06,518 --> 00:40:09,058
And the demos were amazing.

452
00:40:09,158 --> 00:40:16,178
And there are, like, a million startups doing things around that at the moment. It also obviously has a dark side, right?

453
00:40:16,208 --> 00:40:28,258
People are worried about what it means. Can you go and sway an election now by just calling
everybody in the US and telling them something that they want to hear, with a personalized message?

454
00:40:28,808 --> 00:40:39,318
It is a brave new world, a weird world that we're entering here, where you could always technically go and call everybody in the

455
00:40:39,318 --> 00:40:41,358
U.S., but it would take a while.

456
00:40:43,668 --> 00:40:49,648
Now, with those things, maybe you can do it convincingly in a shorter period of time, and maybe not that expensively.

457
00:40:49,708 --> 00:40:50,878
Does that scare you?

458
00:40:51,591 --> 00:40:52,161
Amit Bahree: Yes and no.

459
00:40:52,161 --> 00:40:55,901
I think that's true with any aspect of humanity or technology.

460
00:40:55,901 --> 00:40:57,991
You can use it for good, you can not use it for good.

461
00:40:58,211 --> 00:40:59,561
And it's a choice you have to make.

462
00:40:59,571 --> 00:41:00,701
So I think that's sort of one.

463
00:41:00,701 --> 00:41:06,591
So in that dimension, it's not something new that we haven't been doing.

464
00:41:06,591 --> 00:41:10,961
I think what is new or what is more dangerous, if that's the word I want.

465
00:41:10,981 --> 00:41:13,521
I'm not sure if that's the word I want, but I can't think of a better one.

466
00:41:13,761 --> 00:41:17,611
More concerning is how easy it is.

467
00:41:17,671 --> 00:41:20,281
And what are the things to watch out for?

468
00:41:20,291 --> 00:41:21,701
How do you know what's true or not?

469
00:41:21,761 --> 00:41:29,221
So I think there are, of course, dimensions to it where we as humans have to recalibrate ourselves on: do I trust it or not?

470
00:41:29,511 --> 00:41:31,961
For example, robocalling has been around for decades.

471
00:41:32,001 --> 00:41:34,541
The fact that I can cheaply call everyone is not the problem.

472
00:41:34,923 --> 00:41:40,838
Now it may sound like Amit or Miko is calling, whereas in the past you knew it was not Amit or Miko calling.

473
00:41:41,298 --> 00:41:44,878
I think those are really the things to think about and worry about.

474
00:41:44,878 --> 00:41:56,418
The way I reposition it as well, from a Microsoft perspective, and I also have a whole
chapter in the book on that, is that there are new emerging threats from a security perspective.

475
00:41:56,418 --> 00:42:06,428
So if you think of the traditional security aspects of your application or dev stack, the
way we're putting it is: look, there are additional new security threats you have to go think about.

476
00:42:07,228 --> 00:42:18,978
And it's easy to get wrapped up in all the negatives, but if you step back and say, look, there were paradigm shifts as
you went from client-server, two-tier applications, and I'm going to show my age now, to distributed applications and then

477
00:42:18,978 --> 00:42:25,643
to web applications. There was a lot of goodness, but then it also opened up exposure to a different threat vector.

478
00:42:26,043 --> 00:42:27,413
The surface area was different.

479
00:42:27,643 --> 00:42:31,793
In some cases it was broader, in other cases it was actually contracted.

480
00:42:32,373 --> 00:42:34,433
And in that sense, this is no different.

481
00:42:34,443 --> 00:42:41,413
There are new emerging threats you have to think about and be cognizant of, and then also understand what the risk of that is.

482
00:42:41,913 --> 00:42:44,873
And sure, a threat could happen, but how often will it happen?

483
00:42:45,043 --> 00:42:46,123
And how do I mitigate that?

484
00:42:46,163 --> 00:42:48,033
You're not going to solve 100 percent of everything.

485
00:42:48,488 --> 00:42:53,128
But you have to then hone it back down into what's your use case, how you're thinking about it, and so on.

486
00:42:53,768 --> 00:43:03,008
So either ignoring it, which is not good, or putting your head in the sand like it's all doom, neither of those dimensions is going to be helpful.

487
00:43:03,028 --> 00:43:08,208
So I think part of it is understanding that, yes, there is a new set of threats that are emerging.

488
00:43:08,918 --> 00:43:10,208
Be aware of those.

489
00:43:10,518 --> 00:43:12,118
How do you solve for those?

490
00:43:12,378 --> 00:43:13,808
How do you manage those?

491
00:43:13,868 --> 00:43:17,688
And then in the context of a use case, in the context of how you're using it.

492
00:43:18,638 --> 00:43:21,558
Miko Pawlikowski: It's a little bit like passwords, isn't it?

493
00:43:21,908 --> 00:43:30,578
We rely on the fact that it's not practical for someone to go and brute force your password because it would take a thousand years.

494
00:43:31,358 --> 00:43:45,833
And if someone goes and figures out how to make a computer that goes around the limitations of
physics and can do it a thousand times faster, all of a sudden a lot of passwords would be useless.

495
00:43:46,363 --> 00:43:48,253
And I think it's a little bit like that, right?

496
00:43:48,273 --> 00:43:55,973
We got a technology that made things possible that we were relying on just not being practical from a time and cost perspective.

497
00:43:56,623 --> 00:43:57,833
And now we have to deal with that.

498
00:43:57,863 --> 00:44:00,693
And the genie's out of the bottle, as they say, I think.

499
00:44:00,733 --> 00:44:02,333
And the cat's out of the bag.

500
00:44:04,836 --> 00:44:05,766
Amit Bahree: That's a great analogy.

501
00:44:05,766 --> 00:44:06,476
I actually like that.

502
00:44:06,476 --> 00:44:07,896
I'm going to steal that in other places.

503
00:44:07,896 --> 00:44:10,506
But you're right, like there was a time where we didn't need passwords.

504
00:44:10,736 --> 00:44:11,596
It wasn't a problem.

505
00:44:12,056 --> 00:44:15,386
And then there was a time where we needed passwords, but it was simple passwords.

506
00:44:15,696 --> 00:44:18,716
You could do hello1234 or password1 or what have you.

507
00:44:19,076 --> 00:44:22,296
And then there was a time where, okay, it needed to be a little more complex.

508
00:44:22,326 --> 00:44:25,276
And now you can find these, buy these on the dark web and all.

509
00:44:25,276 --> 00:44:29,476
And hence you need more complex passwords.

510
00:44:29,576 --> 00:44:32,246
My one PSA is: please use a password manager.

511
00:44:32,806 --> 00:44:37,866
10 years, 15 years ago, if you were chatting, the concept of a password manager would be so alien.

512
00:44:38,646 --> 00:44:42,806
And here now, I'm sure as you do, I do tech support for my family.

513
00:44:43,646 --> 00:44:44,696
Unpaid, of course.

514
00:44:44,876 --> 00:44:47,066
My answer to everything is: go use the password manager.

515
00:44:47,096 --> 00:44:48,176
Here's how you set it up.

516
00:44:48,176 --> 00:44:49,916
And here's why you shouldn't reuse passwords.

517
00:44:49,916 --> 00:44:52,986
And let the thing do the heavy lifting for you.

518
00:44:53,046 --> 00:44:54,446
But you save it, right?

519
00:44:54,466 --> 00:44:56,016
With your master password and whatnot.

520
00:44:56,596 --> 00:44:58,976
I think it's, yeah, it's the same analogy in that sense.

521
00:44:59,046 --> 00:45:01,926
It goes back to this: your threat vectors change, society changes.

522
00:45:01,946 --> 00:45:04,976
Things are changing, and part of it is adapting.

523
00:45:05,036 --> 00:45:05,736
Some is good.

524
00:45:05,776 --> 00:45:06,426
Some is not good.

525
00:45:07,291 --> 00:45:09,051
Miko Pawlikowski: Let's circle back to your book.

526
00:45:09,371 --> 00:45:12,411
Ultimately, that's how I learned about you existing.

527
00:45:13,256 --> 00:45:22,576
So as I was reading it, and anybody who's interested can go and pick it up on manning.com, it's a very practical guide.

528
00:45:22,646 --> 00:45:26,716
It's called "Generative AI in Action" for a reason.

529
00:45:26,726 --> 00:45:27,136
There is

530
00:45:27,676 --> 00:45:32,006
little time spent on the underlying details.

531
00:45:32,006 --> 00:45:49,536
There is obviously the intro that covers everything that you would expect in terms of what generative AI is, the architecture at a
high level, what it means, references, an overview of LLMs, transformers, smaller language models, that kind of stuff.

532
00:45:49,576 --> 00:46:02,181
And then it turns into basically a guide to show you what's possible with it, show you how you can go and call some API and get magic text being generated.

533
00:46:02,411 --> 00:46:12,241
It shows you how to generate pictures, shows you how to generate other things like music, video, I think briefly code, all that kind of stuff.

534
00:46:12,241 --> 00:46:25,171
So I'm picturing this really as a kind of guide that you get yourself when you want to get into this without
wasting any time on things that are not necessary for your journey. It will get you from zero to one on that.

535
00:46:25,721 --> 00:46:27,531
Is that an accurate description?

536
00:46:27,681 --> 00:46:30,221
Am I doing a good marketing pitch here?

537
00:46:30,396 --> 00:46:30,886
Amit Bahree: Mostly.

538
00:46:31,116 --> 00:46:31,626
Yeah.

539
00:46:31,961 --> 00:46:32,541
Miko Pawlikowski: Mostly.

540
00:46:34,556 --> 00:46:35,576
Amit Bahree: I'm not in marketing.

541
00:46:35,696 --> 00:46:35,896
Yes.

542
00:46:35,936 --> 00:46:37,536
I think that is an accurate description.

543
00:46:37,536 --> 00:46:45,386
I think the emphasis is on the "in Action" part. The premise of this is, you want to go build an app, and right now.

544
00:46:45,386 --> 00:46:55,226
I go back to my year and a half of conversations from CEOs down across the Fortune 500 or whatever, which is our, from a work point of view, right?

545
00:46:55,226 --> 00:47:11,126
A lot of these large enterprises, but this is not about just large enterprises, it's about if you're a company, you have a
set of products you want to improve or make new, how do I use this GenAI and ChatGPT and LLMs and everyone's heard about it.

546
00:47:11,626 --> 00:47:13,596
And they don't know where to start or how to start.

547
00:47:14,226 --> 00:47:16,816
So that's really what I was trying to do, right?

548
00:47:16,816 --> 00:47:19,376
There's broadly speaking three parts to the book.

549
00:47:19,396 --> 00:47:29,076
The first part is introductions and because you just know what you know, I can't just go dig
into things without giving you some context and basis on what's possible, what's not possible.

550
00:47:29,556 --> 00:47:31,276
and that's the first part you're touching on.

551
00:47:32,126 --> 00:47:36,196
What I stay away from is it's not a science research book.

552
00:47:36,246 --> 00:47:43,116
I link to papers where there are people who generally are curious or they want to go deeper.

553
00:47:43,536 --> 00:47:53,666
So we leave those crumb trails in a way saying, if you want to dig more in your own time kind of a
thing, here's the things you can go read up and then that'll expose you to more dimensions, right?

554
00:47:53,666 --> 00:48:07,186
So it's not a science book, techie book in that sense, because at least in an enterprise setting, most
developers and CTOs and CIOs or CEOs, they want to see like, how is it going to solve my business problem?

555
00:48:07,876 --> 00:48:08,926
How do I do it?

556
00:48:09,606 --> 00:48:19,486
Some are interested in the science and the depth, but most just want to know at a high
level, how it works deep enough, but not in the guts at least on the AI side of the science.

557
00:48:20,036 --> 00:48:35,586
So we leave the breadcrumbs and the trails pointing to papers where people can go deeper should they want to, but if you're a developer and
you can use a set of APIs and SDKs, this is really for you. And the way we say it is, because these LLMs at least are exposed as an API,

558
00:48:36,151 --> 00:48:40,981
you really don't need to know any of the AI sort of mumbo jumbo; any developer can pick it up easily.

559
00:48:41,001 --> 00:48:43,211
So that's certainly how I was trying to position it.

560
00:48:43,791 --> 00:48:49,081
Part one is getting you a sense of the world from a technical perspective, but not go super deep.

561
00:48:49,601 --> 00:49:01,008
And then part two and part three is where we start going deeper on, okay, how do
I use this in my production, solving my business problem, what I'm trying to do.

562
00:49:01,101 --> 00:49:07,661
Miko Pawlikowski: Making it very applicable. For example, at some point the book is talking about image generation.

563
00:49:08,176 --> 00:49:28,206
And there is a short description of generative adversarial networks, and it doesn't include Ian Goodfellow getting drunk and going to a fellow
student's graduation and then arguing with them and then going home and implementing a proof of concept algorithm to prove the other people wrong.

564
00:49:28,316 --> 00:49:30,736
And the next day discovering it's actually working.

565
00:49:31,286 --> 00:49:33,296
It's giving you the kind of applicable.

566
00:49:34,021 --> 00:49:43,841
This is used for scenarios where the data is complex and diverse, requiring realism, suitable for high quality images, data augmentation, style transfer.

567
00:49:43,841 --> 00:49:51,841
So it's prescriptive in a way, I would say; you give people what they need to get cracking with it.

568
00:49:51,901 --> 00:50:05,091
Speaking of which, let's talk a little bit about the images, because you do cover a few interesting things like
the VAE, the GANs, diffusion, vision transformers, to give people a sneak peek of what they're going to expect.

569
00:50:05,151 --> 00:50:12,041
Can you talk about why they're interesting and why they might be something that you should be paying attention to?

570
00:50:12,671 --> 00:50:13,891
What are the breakthroughs?

571
00:50:14,649 --> 00:50:21,249
Amit Bahree: I think one aspect is that ChatGPT and the LLMs, which is just the language part, are taking the hype.

572
00:50:21,249 --> 00:50:26,799
And I think most people understand that there's a different set of tech, related but different, for images, right?

573
00:50:26,849 --> 00:50:41,479
And image understanding, image editing. The power of it, on one hand: wherever you go, on whichever social
thingy, when Stable Diffusion came out, a lot of creativity on the image generation was there, but in a social

574
00:50:41,489 --> 00:50:50,209
setting. The thing really is, how do you expand that in a corporate application setting, and what can you do?

575
00:50:50,379 --> 00:50:54,569
One is, it's fun and wonderful in a personal, social setting.

576
00:50:55,109 --> 00:51:01,909
But then how do I transfer that and then which area do I use in a work setting?

577
00:51:02,519 --> 00:51:06,009
It doesn't even have to be work; each of these techniques has its own power.

578
00:51:06,059 --> 00:51:17,254
I think most people don't really care or maybe nor should they care, but in some cases where it would
matter, it's good to know what the underlying tech is, so I know what to ignore versus not to ignore.

579
00:51:17,404 --> 00:51:20,024
Because, again, the hype wraps up a lot of this.

580
00:51:20,024 --> 00:51:28,104
If you come back to it, it's more of helping people ground themselves a little, because at the end of the day, the tech is still the tech, right?

581
00:51:28,164 --> 00:51:31,954
What it is meant to do and how it is meant to do it doesn't fundamentally move.

582
00:51:31,964 --> 00:51:38,424
If you're trying to solve for one set of images, one kind of thing, diffusion models would be great for that set of categories.

583
00:51:38,424 --> 00:51:40,014
And now there's multiple diffusion models.

584
00:51:40,014 --> 00:51:41,374
You can go pick which one you want.

585
00:51:41,949 --> 00:51:43,119
versus a transformer model.

586
00:51:43,129 --> 00:51:53,169
So again, we don't go deep; I have a few diagrams and images to outline at a high level how
these work, because there are papers on each topic, and you can go read hundreds of them.

587
00:51:53,849 --> 00:51:57,659
but the intention is just to know, look, there's different buckets and categories.

588
00:51:57,729 --> 00:51:59,089
Each has its own strengths.

589
00:52:00,379 --> 00:52:04,549
And whatever you're trying to solve for, just make sure you connect those dots.

590
00:52:05,029 --> 00:52:10,149
I guess the other analogy is, if you're writing a book, Word is easier than Notepad kind of a thing, right?

591
00:52:10,319 --> 00:52:18,089
Miko Pawlikowski: I was a little surprised to see a prompt engineering chapter, but I guess it makes perfect sense.

592
00:52:18,119 --> 00:52:20,449
You need a little bit of basics.

593
00:52:20,499 --> 00:52:22,659
What was your thinking with that chapter?

594
00:52:22,989 --> 00:52:25,049
What was the goal you wanted to achieve with it?

595
00:52:26,771 --> 00:52:30,621
Amit Bahree: In the context of LLMs, prompt engineering is pretty crucial.

596
00:52:31,461 --> 00:52:34,651
it is how you steer the model fundamentally in many ways.

597
00:52:35,111 --> 00:52:38,271
the beauty of it is half art and half science.

598
00:52:39,096 --> 00:52:41,736
The frustration of it is it is half art and half science.

599
00:52:42,926 --> 00:52:49,816
but, fundamentally, at least with today's technology of where things are, prompt engineering is quite crucial.

600
00:52:50,366 --> 00:52:58,616
And the way we also tell many of our customers and I tell is look, you have to start thinking about prompts as your IP

601
00:52:59,526 --> 00:53:01,686
in many ways, and I'm not talking about simple prompts.

602
00:53:01,686 --> 00:53:04,886
Like, in the book, I use simple prompts to make the point.

603
00:53:04,896 --> 00:53:09,586
So 'tell me a story about a panda' is not really IP in the context of a prompt.

604
00:53:10,166 --> 00:53:14,326
and then prompts are also closely tied to how a model understands it.

605
00:53:14,886 --> 00:53:33,111
So again, outside of simple prompts, when you're using this concept of RAG, for example, as you start using a specific model or a
family of models, which are closely related, you start picking up nuances on how the model's interpreting things and working with things and so on.

606
00:53:33,141 --> 00:53:36,261
And then you're tweaking your prompts along with that, right?

607
00:53:36,261 --> 00:53:37,621
So it's cohesive together.

608
00:53:38,161 --> 00:53:44,091
and that intuition as you learn is also part of your IP and how you want to think about prompt engineering.

609
00:53:44,291 --> 00:53:45,401
That also means

610
00:53:46,101 --> 00:53:47,461
there are no universal prompts.

611
00:53:47,501 --> 00:53:50,991
Again, outside of the simple ones, I'm not talking about the simple, straightforward prompts.

612
00:53:51,381 --> 00:54:02,581
So you should not, or one should not, just say: if I'm, let's say, using GPT-3, 4, whichever, the same
prompts, the complex ones, I can pick up and expect to work on, let's say, LLAMA or something else.

613
00:54:02,781 --> 00:54:07,101
They will work, but do they work at the same level and the same evaluation, the same criteria?

614
00:54:07,111 --> 00:54:11,221
Probably not, because they are quite tied into how the model behaves.

615
00:54:11,271 --> 00:54:12,271
This is very loose, right?

616
00:54:12,271 --> 00:54:16,101
It's not a scientific thing, prompt engineering is quite crucial.

617
00:54:16,171 --> 00:54:21,561
Even though you are calling an API, how you're talking to the model is through those.

618
00:54:21,971 --> 00:54:28,341
So I think it's worth spending time to understand what these are and how they work.

619
00:54:28,361 --> 00:54:30,061
There's a lot of hype around prompts as well.

620
00:54:30,561 --> 00:54:31,841
I would say don't believe all of it.

621
00:54:32,501 --> 00:54:37,591
The one final point I want to make on it is that prompts are also one of the new threat vectors.

622
00:54:37,611 --> 00:54:42,281
So I touch on it in a later chapter, I touch a little bit on prompt injection in the chapter you've seen.

623
00:54:42,771 --> 00:54:45,381
But we go a little deeper in one of the later chapters.

624
00:54:45,761 --> 00:54:48,851
But prompt injection, as an example, is one of the new threat vectors.

625
00:54:48,851 --> 00:54:49,761
It's not the only one.

626
00:54:50,161 --> 00:54:55,101
So again, understand that as well; prompts in today's world get quite crucial.

627
00:54:55,541 --> 00:54:58,421
At the end of the day, it's how we, in quotes, talk to the model.

628
00:54:58,437 --> 00:54:59,227
Miko Pawlikowski: That makes sense.

629
00:55:00,247 --> 00:55:08,897
Prompt engineering might be getting a bit of a bad rap just because of how many
people are walking around saying that they have the ultimate prompt, stuff like that.

630
00:55:08,897 --> 00:55:14,057
But at the end of the day, you do need to learn how to talk to these things.

631
00:55:14,967 --> 00:55:17,357
And it is one of the biggest frustrations.

632
00:55:17,367 --> 00:55:25,017
It's almost like you're talking to a cat sometimes: it can suddenly freak out and do something very weird at a moment's notice.

633
00:55:25,017 --> 00:55:27,497
And there is little you can do to prevent that.

634
00:55:28,422 --> 00:55:28,942
Amit Bahree: Yeah.

635
00:55:28,972 --> 00:55:35,472
And so the way we call it, or at least I call it, is: you have to think of prompts, when you're talking to the model, as a parent.

636
00:55:35,602 --> 00:55:41,082
So for those who have had children or who have toddlers right now, it is what we call parentology.

637
00:55:42,452 --> 00:55:46,542
Somebody said this to me in one of my meetings and I loved it and I stole it from them.

638
00:55:47,062 --> 00:55:51,852
So if you're a toddler, your memory retention is lower.

639
00:55:52,332 --> 00:55:54,182
So very often you have to keep repeating.

640
00:55:54,192 --> 00:55:57,542
It's the classic "don't stick your finger in the wall socket" kind of thing.

641
00:55:58,312 --> 00:56:00,942
Saying it one time doesn't help, you have to keep repeating.

642
00:56:01,422 --> 00:56:08,882
The way I want, generally speaking, folks to think is, your model's like a toddler, you have to keep repeating, keep thinking about it, right?

643
00:56:08,932 --> 00:56:17,277
And as silly as it may sound, it's basic stuff. For example, one of the side effects is what's called hallucinations, where it's non-grounded.

644
00:56:17,277 --> 00:56:20,527
So you will get responses back which are made up and not factual.

645
00:56:21,117 --> 00:56:22,947
That may be okay in one dimension,

646
00:56:22,997 --> 00:56:33,037
if you are writing a creative story. It may not be okay in another dimension, where in a
business setting, you're answering things based on some policy or information or what have you.

647
00:56:33,897 --> 00:56:47,837
So in the prompt you include simple things like "do not make up any information, only answer
from this"; you would think that would be obvious. So your intuition of a cat is not very far off.

648
00:56:48,767 --> 00:56:50,287
Miko Pawlikowski: that's an interesting comparison.

649
00:56:51,177 --> 00:56:52,207
Let's do one more.

650
00:56:52,417 --> 00:57:09,635
You talk about RAG in your book, and I think a lot of people have heard the term and know that it has something to
do with getting fresher data. Can you give us an explain-it-to-a-five-year-old version of what that is and how it works?

651
00:57:10,875 --> 00:57:15,335
Amit Bahree: I should open ChatGPT on the other screen and say, 'explain RAG for a 5 year old in summary'.

652
00:57:16,625 --> 00:57:19,305
RAG is Retrieve, Augment, Generate, right?

653
00:57:19,305 --> 00:57:26,335
So the technique originally came from Meta, Facebook, as a research paper.

654
00:57:26,785 --> 00:57:34,935
But fundamentally, it is crucial when you are using large language models, specifically in the context of a company or a business or what have you.

655
00:57:36,500 --> 00:57:47,160
Basically, it is also a little clunky right now, but what it does is, as the name suggests, the
model that you're using just knows what it knows, what it's been trained on, which is public data.

656
00:57:47,550 --> 00:57:48,050
That's one.

657
00:57:48,500 --> 00:57:51,450
And then as with these things, there's a training cutoff, right?

658
00:57:51,480 --> 00:57:54,240
At some point, you say, okay, I'm done collecting data.

659
00:57:54,680 --> 00:58:04,510
I need to go off for a few weeks or a few months or whatever it is and go train this thing and then spit out a
model and go through a bunch more other alignment and this and that, and then eventually have a model available.

660
00:58:05,260 --> 00:58:16,100
So online, when you go and see a lot of people using RAG to get fresh information, which is post-training data, that is an absolutely valid use case.

661
00:58:16,490 --> 00:58:20,420
For many others, the other thing is my proprietary information.

662
00:58:20,600 --> 00:58:31,480
So especially in a company setting, your proprietary internal information, corporate knowledge, the model doesn't know because it's never seen it.

663
00:58:32,410 --> 00:58:35,340
In fact, if it does know that, then fundamentally there's a different problem.

664
00:58:35,970 --> 00:58:37,110
Because it shouldn't know that.

665
00:58:37,620 --> 00:58:49,320
for your business workflow, you often need to bring in your internal proprietary knowledge, whether it's
a CRM or a database or an ERP, or you're solving a ticket or what have you, depending on the use case.

666
00:58:49,800 --> 00:58:56,840
The only way you can bring in the knowledge is through this technique of RAG, so retrieve, augment, generate.

667
00:58:57,230 --> 00:59:05,590
Retrieve means I'm retrieving the information, which could be from my corporate enterprise systems, or Google or Bing to get fresher information.

668
00:59:06,110 --> 00:59:10,220
I'm augmenting it in my prompt, which goes back to prompt engineering.

669
00:59:10,220 --> 00:59:14,100
And then based on that, I'm saying please generate, or whatever I'm trying to do.

670
00:59:14,120 --> 00:59:19,860
Generation could be a summary, or entity extraction, depending on whatever I'm trying to do.

671
00:59:20,300 --> 00:59:21,480
But that's what RAG is doing.

672
00:59:21,840 --> 00:59:25,620
It also gets clunky, by the way, because it is the first generation.

673
00:59:25,650 --> 00:59:27,790
I do expect things to keep improving on that.

674
00:59:27,790 --> 00:59:47,490
You can talk about the complexities of RAG, but if I have to get proprietary in-house information, if I have to get fresher information, the only way I
can do that is through RAG, without retraining a whole model, which in theory is an option but practically, for I guess 99% of people, is not an option.

675
00:59:47,490 --> 00:59:49,050
I don't know if that was for a five year old, but

676
00:59:50,640 --> 00:59:56,320
Miko Pawlikowski: Yeah, that might have been a six and a half, maybe even seven, but we'll let it

677
00:59:56,580 --> 00:59:57,000
Amit Bahree: Thank you.

678
00:59:57,080 --> 00:59:57,380
Miko Pawlikowski: time.

679
00:59:59,830 --> 01:00:00,260
Okay.

680
01:00:00,360 --> 01:00:16,425
So this is basically what you're going to see if you look at the early access version of the book. It tells me
that there are six more chapters coming very soon, chapters 8-13, that cover things like more on

681
01:00:16,425 --> 01:00:25,465
RAG, tailoring models, application architecture for GenAI apps, and then evaluation and ethical GenAI.

682
01:00:25,485 --> 01:00:30,765
I think at some point we're going to have to get you back to talk about the rest of the book.

683
01:00:31,295 --> 01:00:36,475
But before I let you go, I wanted to ask you for a few predictions.

684
01:00:37,260 --> 01:00:42,220
From where you stand, where are we going to see the next evolutions and breakthroughs?

685
01:00:42,290 --> 01:00:44,270
Amit Bahree: One is multimodality,

686
01:00:47,130 --> 01:00:53,460
which basically means a lot of people today, when they're using GenAI and the likes of ChatGPT, are primarily in one mode, i.e.

687
01:00:53,480 --> 01:00:54,550
language, text.

688
01:00:55,030 --> 01:01:01,815
But I do expect multimodality where I'm starting to combine language, images, text, video, and what have you, together.

689
01:01:02,155 --> 01:01:03,875
Not just generation, but input.

690
01:01:03,885 --> 01:01:05,515
We already are seeing that, by the way.

691
01:01:05,515 --> 01:01:10,365
That's already here today, like GPT-4V, which is vision, being one example of that.

692
01:01:10,705 --> 01:01:14,525
But more and more multimodality, because our real world is that as well, right?

693
01:01:14,525 --> 01:01:16,045
So that's one I see happening.

694
01:01:16,455 --> 01:01:19,655
I do see SLMs accelerating more, as we touched on.

695
01:01:20,375 --> 01:01:22,755
Again, they're not better, they're different.

696
01:01:23,255 --> 01:01:27,225
there's times you need one, and there's times you need the other, and there's times you need both.

697
01:01:28,040 --> 01:01:34,460
But, I do see more and more on that front because for many use cases, I need simple things.

698
01:01:34,460 --> 01:01:35,870
I don't need all the other power.

699
01:01:36,320 --> 01:01:37,890
so I do see that accelerating a lot.

700
01:01:37,890 --> 01:01:54,370
And then a third dimension I also see is the underlying systems engineering improving, be it cost-effectiveness,
from how much hardware and GPUs I need to run it, to latency around it, things like memory profile and so on and so forth.

701
01:01:54,700 --> 01:02:04,860
So I do see those three, and I guess I want to sneak in a fourth one, which is all of the
responsible AI aspects, which is one of the later chapters. We touched on the likes of prompt engineering.

702
01:02:04,860 --> 01:02:09,600
I know I talked a little bit on hallucinations, but the new harmful things one can do.

703
01:02:10,315 --> 01:02:12,025
That's also a cat and mouse thing.

704
01:02:12,025 --> 01:02:15,455
I do see more research breakthroughs

705
01:02:15,652 --> 01:02:21,182
Miko Pawlikowski: Do you expect we're still going to be doing transformers a year or two or three from now?

706
01:02:22,502 --> 01:02:25,642
Do you think it was big enough of a breakthrough that it's going to stay?

707
01:02:25,692 --> 01:02:31,552
Amit Bahree: I honestly don't know. What I can tell you is, it's what everybody's doing at the moment, which is not going away anytime soon.

708
01:02:31,732 --> 01:02:32,902
That's one side of it.

709
01:02:33,362 --> 01:02:34,192
Having said that,

710
01:02:34,737 --> 01:02:40,277
I think it's also pushing a lot of other areas around it where we can do things better.

711
01:02:40,317 --> 01:02:47,017
for example, we didn't touch on it, but each model has this concept of what we call a context window.

712
01:02:47,077 --> 01:02:52,247
How big can my prompt be, in reality how many tokens can it be, and how much can it send back?

713
01:02:52,737 --> 01:02:59,777
So on one hand a lot of people get happy: hey, if I have more tokens, my context window is longer.

714
01:02:59,787 --> 01:03:05,877
It means I can stuff in more things, I can ask you more things, or I can generate more things. On one hand, that's good.

715
01:03:05,947 --> 01:03:07,145
People get happy about it.

716
01:03:07,145 --> 01:03:21,217
What I then have to come and remind them is that each increase in token length is a quadratic
increase in compute. So doubling it is four times as costly in the sense of compute profile.

717
01:03:21,217 --> 01:03:25,497
So just having a longer context window isn't necessarily good.

718
01:03:26,287 --> 01:03:29,237
So there is research going on now to say, how can we do that?

719
01:03:29,237 --> 01:03:33,877
How can we, with derivatives of the transformer architecture, increase

720
01:03:34,362 --> 01:03:38,582
the token windows without having a quadratic increase on the compute profile.

721
01:03:38,942 --> 01:03:43,002
and that ties back to the attention mechanics of how the transformer architecture works.

722
01:03:43,062 --> 01:03:45,622
The way I would say it: it's the first one which has reached this scale.

723
01:03:45,662 --> 01:03:50,852
And then now there's other research that's happening to make those profiles better.

724
01:03:51,732 --> 01:03:52,952
Will there be another big thing

725
01:03:54,062 --> 01:03:57,722
which is better than this? I'm sure. In the sense of humanity, absolutely.

726
01:03:58,115 --> 01:04:04,195
Miko Pawlikowski: plus like you alluded to, there is a lot of value to being the first thing that's good enough, right?

727
01:04:04,845 --> 01:04:14,735
When we look at how technology works, it's better to have something that's good enough today than to have the perfect or ideal solution much later.

728
01:04:15,325 --> 01:04:17,555
And typically there is enough momentum

729
01:04:17,995 --> 01:04:23,955
by the time the better thing comes that it might not be as attractive as one would think.

730
01:04:24,795 --> 01:04:33,625
I think there was the paper from Google talking about basically a method of achieving
infinite attention span, or was that some other research that you're alluding to?

731
01:04:33,992 --> 01:04:36,042
Amit Bahree: There's a few papers.

732
01:04:36,062 --> 01:04:40,012
So there's one, in fact, Microsoft has on, like, how can I do 2 million tokens.

733
01:04:40,188 --> 01:04:41,178
that's one example.

734
01:04:41,418 --> 01:04:45,798
There's another one which is research going on called Ring Attention, which is different.

735
01:04:46,218 --> 01:04:48,718
I can't remember, I think it was Google?

736
01:04:48,958 --> 01:04:50,378
I can't recall off the top of my head.

737
01:04:50,468 --> 01:04:53,038
So there's multitudes of things going on

738
01:04:53,728 --> 01:04:58,168
in parallel, like active research on how do we look at this differently.

739
01:04:58,208 --> 01:04:59,888
And then, that's just a context window.

740
01:04:59,888 --> 01:05:04,558
There's other things, for example, like when we touched on RAG, I said, it's a clunky way of doing things.

741
01:05:04,668 --> 01:05:06,248
we didn't go deep in it, but it's a clunky way.

742
01:05:06,248 --> 01:05:10,988
So there's other things happening, like graph, can I do graph with RAGs, and so on and so forth.

743
01:05:11,058 --> 01:05:17,418
it's not only in one dimension, I was using one of these as an example, but across multitudes of dimensions.

744
01:05:17,848 --> 01:05:20,698
There is active research going on to improve those.

745
01:05:20,728 --> 01:05:29,688
And as that starts formulating, because look, research is one thing; getting something as a product that is deployable and running,

746
01:05:30,143 --> 01:05:38,233
that you can use consistently, is a whole separate sort of scale, with its own separate complexities.

747
01:05:38,633 --> 01:05:42,443
but across these multiple dimensions as they come together, it'll just suddenly get improved.

748
01:05:42,463 --> 01:05:50,653
To your point, this is like version one, and it's a mad race across the board from academia to commercial and whatnot.

749
01:05:50,653 --> 01:05:55,203
So it'll just be improving is how I see it.

750
01:05:55,568 --> 01:06:00,098
Miko Pawlikowski: Do you see the open models eventually prevailing and taking over?

751
01:06:00,748 --> 01:06:01,688
there's a lot of talk.

752
01:06:01,708 --> 01:06:04,208
Obviously people are excited about LLAMA-3.

753
01:06:04,228 --> 01:06:14,138
I think a lot of people call it GPT-4 class, a comparable model that's effectively free to use.

754
01:06:15,148 --> 01:06:20,838
And obviously Microsoft is doing their own research and releasing Phi-3 in the open as well.

755
01:06:21,158 --> 01:06:25,328
Do you see these models eventually becoming the de facto standard?

756
01:06:26,418 --> 01:06:28,178
Amit Bahree: they certainly have a place, for sure.

757
01:06:28,218 --> 01:06:30,538
I think, there's no question about that.

758
01:06:30,618 --> 01:06:32,578
I don't know if they're the de facto standard or not.

759
01:06:32,578 --> 01:06:41,638
I think what the challenge would come down to is, at the end of the day, with the current state of technology, training a model is super expensive.

760
01:06:42,508 --> 01:06:44,158
There is no shortcut around that.

761
01:06:44,598 --> 01:06:57,833
So even if you have an open source model, in the near term there's only a handful of companies who
have the technical know-how, have the sort of muscle, have the compute profile to be able to do that.

762
01:06:58,593 --> 01:07:04,873
so more and more until again, unless, as there's more fundamental breakthroughs to open that up more.

763
01:07:05,468 --> 01:07:14,468
a lot of the open source, it's back to that, the ant analogy used for, with the, one of the diagrams we have from the paper in the book.

764
01:07:14,898 --> 01:07:19,528
The part I was trying to show there also is that the roots are very few models that others are derived from.

765
01:07:19,528 --> 01:07:31,348
So I think while there's a lot happening, at the end of the day there'll be just a
handful of people who are publishing those and exposing those, that others are deriving from.

766
01:07:32,008 --> 01:07:38,168
So until that happens, as in fundamental breakthroughs from a cost point of view, where it becomes cheaper

767
01:07:38,218 --> 01:07:46,193
and doesn't need hundreds of thousands of, whatever it is, GPUs, plus I don't know how many billions of tokens of data, to train them.

768
01:07:47,270 --> 01:07:47,800
Miko Pawlikowski: Oh, don't worry.

769
01:07:47,800 --> 01:07:47,880
I

770
01:07:47,968 --> 01:07:48,388
Amit Bahree: it won't be

771
01:07:48,523 --> 01:07:51,843
Miko Pawlikowski: my crypto mining farm in my garage.

772
01:07:54,418 --> 01:07:55,288
Amit Bahree: There you go.

773
01:07:55,388 --> 01:07:56,718
That is one way to do it.

774
01:07:56,768 --> 01:08:02,478
I think open source will still be constrained by a few source models, which they go derive from.

775
01:08:03,018 --> 01:08:19,708
But the other way: if you just look at the last one year, 12 months, which is nothing in the sense of humanity and technology, just see
how much progress and how much improvement the models have made across both dimensions, whether they're open source or closed source or what have you.

776
01:08:20,078 --> 01:08:21,178
It is fascinating.

777
01:08:21,188 --> 01:08:25,118
And the fact that there's literally new models every day is a good thing, but also not a good thing.

778
01:08:25,528 --> 01:08:27,408
So it has to stabilize to some extent.

779
01:08:27,428 --> 01:08:28,298
At some point it will.

780
01:08:28,828 --> 01:08:31,458
but I think the open source community is absolutely critical.

781
01:08:31,818 --> 01:08:44,128
On the flip side, a lot of research breakthroughs also are coming from the research labs, where there's, at
the end of the day, deeper pockets and muscle, in the sense of financial and compute and data as well.

782
01:08:44,238 --> 01:08:57,228
It's a fascinating world we are in, which was one of your opening statements, because at
least for a geek and somebody in the industry, the moments like this that one gets are few and far between.

783
01:08:57,238 --> 01:08:58,938
So it's absolutely fascinating.

784
01:08:59,660 --> 01:09:06,460
Miko Pawlikowski: Yeah, we'll be sitting down with the grandchildren saying, ah, I remember in my day

785
01:09:08,475 --> 01:09:10,755
They released the first capable.

786
01:09:10,935 --> 01:09:11,775
Amit Bahree: they're like, what?

787
01:09:11,775 --> 01:09:14,005
you use hundreds of GPUs and all this stuff?

788
01:09:14,005 --> 01:09:14,255
Why?

789
01:09:14,305 --> 01:09:17,415
I can just run it on my phone or whatever the phone looks like.

790
01:09:17,575 --> 01:09:17,635
I don't know.

791
01:09:18,495 --> 01:09:19,275
Miko Pawlikowski: Yeah, exactly.

792
01:09:19,275 --> 01:09:21,475
You were so wasteful back in the day.

793
01:09:21,505 --> 01:09:23,155
Really very clunky

794
01:09:24,730 --> 01:09:25,310
Amit Bahree: That's right.

795
01:09:25,360 --> 01:09:37,640
Miko Pawlikowski: Well, we're going to have to wait a little bit until that materializes, but I completely
agree. It's a very interesting time to be alive, and I'm certainly grateful that I get to experience that.

796
01:09:38,370 --> 01:09:40,910
Amit, it's been a pleasure to host you.

797
01:09:41,010 --> 01:09:42,220
Thank you so much for coming.

798
01:09:42,235 --> 01:09:43,105
Amit Bahree: Thank you for having me.