1
00:00:00,240 --> 00:00:04,260
I'm Miko Pawlikowski
and this is HockeyStick.

2
00:00:04,475 --> 00:00:08,940
Generative AI is on everyone's mind.

3
00:00:09,230 --> 00:00:13,430
From essays to photorealistic
pictures to high-quality videos, it

4
00:00:13,430 --> 00:00:17,489
has changed the way we think about
creativity and intelligence forever.

5
00:00:17,700 --> 00:00:22,009
If the AI won't steal your job,
but somebody using AI will, then

6
00:00:22,009 --> 00:00:25,040
the best defense is to learn
how this technology works ASAP.

7
00:00:25,470 --> 00:00:29,800
Today, I'm bringing you Mark Liu, the
author of Learn Generative AI with

8
00:00:29,820 --> 00:00:33,860
PyTorch, a tenured finance professor
and the founding director of the Master

9
00:00:33,860 --> 00:00:38,250
of Science in Finance program at the
University of Kentucky and a veteran

10
00:00:38,279 --> 00:00:40,470
coder with over 20 years of experience.

11
00:00:40,699 --> 00:00:44,490
In this conversation, we'll talk about
learning through doing, how everybody can

12
00:00:44,499 --> 00:00:48,560
build generative AI models, the various
breakthroughs that allowed for the current

13
00:00:48,589 --> 00:00:52,789
AI explosion to take place, and make
some wild predictions about the future.

14
00:00:53,570 --> 00:00:55,859
Welcome to this episode and please enjoy.

15
00:00:56,090 --> 00:00:56,840
How are you doing today?

16
00:00:58,170 --> 00:00:58,860
Pretty good.

17
00:00:59,019 --> 00:00:59,840
Thank you, Miko.

18
00:01:00,010 --> 00:01:00,970
Glad to be here.

19
00:01:01,229 --> 00:01:02,630
Yeah, I'm very excited.

20
00:01:02,700 --> 00:01:05,869
Not only because I'm hoping to learn
so many interesting things from

21
00:01:05,869 --> 00:01:10,540
your book, but also because I'm very
curious, how does somebody who's a

22
00:01:10,540 --> 00:01:14,210
founding director of a master of science
in finance and a tenured professor

23
00:01:14,210 --> 00:01:17,160
in finance, decide to go into AI.

24
00:01:17,700 --> 00:01:19,070
Tell us a little bit about your story.

25
00:01:19,237 --> 00:01:26,787
It goes back to, like, five years ago.
In 2017, our department wanted to launch

26
00:01:27,227 --> 00:01:29,507
a Master of Science in Finance program.

27
00:01:30,227 --> 00:01:34,247
And at that point, I'd been
tenured for about five years.

28
00:01:34,982 --> 00:01:39,352
I was always very adventurous,
trying to do new things.

29
00:01:39,762 --> 00:01:45,832
I was appointed the founding
director to start an academic

30
00:01:45,962 --> 00:01:47,662
graduate program from scratch.

31
00:01:48,722 --> 00:01:51,552
And I was very much into it.

32
00:01:52,022 --> 00:01:53,422
It was a lot of work.

33
00:01:53,877 --> 00:01:55,727
But I thoroughly enjoyed it.

34
00:01:55,727 --> 00:01:59,907
So our program launched in fall of 2017.

35
00:02:00,447 --> 00:02:01,707
And it's a one year program.

36
00:02:02,897 --> 00:02:09,337
At the end of 2017, we started
to place our students.

37
00:02:09,507 --> 00:02:16,567
The very first year, we had 30 students
in the program, which is a great number.

38
00:02:17,267 --> 00:02:22,757
And I talked to many employers,
many companies, trying to

39
00:02:22,797 --> 00:02:25,157
place our MS Finance students.

40
00:02:25,597 --> 00:02:28,937
I heard the same thing again and again.

41
00:02:29,587 --> 00:02:35,162
They told me that they want somebody
who not only knows finance, but also

42
00:02:35,162 --> 00:02:41,422
knows coding, programming, analytics. And
the number one programming language

43
00:02:41,652 --> 00:02:48,942
in finance is Python. I've been
doing programming for many years.

44
00:02:48,992 --> 00:02:53,782
So that was mainly statistical
software to run regressions

45
00:02:53,822 --> 00:02:55,302
for the finance research.

46
00:02:56,032 --> 00:03:04,062
And then I had to learn Python from
scratch in order to teach my students.

47
00:03:04,112 --> 00:03:10,947
And it turns out that Python is a very
user-friendly programming language,

48
00:03:11,227 --> 00:03:20,937
so even if you never programmed
before, you can guess what a block

49
00:03:20,937 --> 00:03:23,487
of code is trying to accomplish.

50
00:03:24,117 --> 00:03:33,067
I started to run Python workshops for
MS Finance students, and gradually I

51
00:03:33,377 --> 00:03:43,047
accumulated a lot of teaching notes and
I also had to convince my students to

52
00:03:43,047 --> 00:03:47,697
use Python, because some of the students
said that, "I can do everything in

53
00:03:47,697 --> 00:03:49,927
Excel, why should I learn Python", right?

54
00:03:50,297 --> 00:03:54,917
And then I told them that, Excel is
not exactly a programming language,

55
00:03:55,337 --> 00:04:00,697
and you do need a programming
language in order to automate things

56
00:04:01,037 --> 00:04:05,557
to make things more convenient, for
bigger programs, that kind of stuff.

57
00:04:05,857 --> 00:04:13,527
So what I did was I started to create
fun projects in finance, like speech

58
00:04:13,577 --> 00:04:15,767
recognition and text to speech.

59
00:04:16,097 --> 00:04:23,017
So one example would be, I added those
features to a finance calculator.

60
00:04:23,127 --> 00:04:28,727
What you can do is that you can actually
speak to a computer, and ask the

61
00:04:28,727 --> 00:04:31,777
computer to do a finance calculation.

62
00:04:31,957 --> 00:04:38,187
You can tell the program in a human
voice, "What is the present value of

63
00:04:38,187 --> 00:04:44,317
$1000 in five years?" And then the
program will do the calculation and

64
00:04:44,377 --> 00:04:47,697
tell you the answer in a human voice.

65
00:04:48,177 --> 00:04:50,887
And then that caught
the students' attention.

66
00:04:51,177 --> 00:04:54,257
So I started to do those
kinds of applications.

67
00:04:54,647 --> 00:04:58,657
And then after a year or so,
I had plenty of projects.

68
00:04:59,027 --> 00:05:03,327
And then some students told me
"you should write a book about it".

69
00:05:03,587 --> 00:05:10,232
So I started to send the manuscript to
No Starch Press to publish the book.

70
00:05:10,602 --> 00:05:16,122
The moment my colleagues, or my students,
or a lot of my friends, even my family

71
00:05:16,122 --> 00:05:21,792
members heard that I was writing a
programming book in Python about

72
00:05:21,822 --> 00:05:27,552
speech recognition and text to
speech, their first reaction was, "I

73
00:05:27,582 --> 00:05:29,242
thought you were a finance professor."

74
00:05:29,572 --> 00:05:31,842
That question came up again and again.

75
00:05:32,172 --> 00:05:40,892
And then I gave them a famous quote by
a chief risk officer from Deutsche Bank.

76
00:05:41,382 --> 00:05:45,472
"Banks are essentially
technology firms now".

77
00:05:46,402 --> 00:05:53,602
So there is a lot of truth in that because
in order to be in the field of finance,

78
00:05:53,622 --> 00:05:59,972
you need to know a lot of technology, know
programming, know analytics, and so forth.

79
00:06:00,712 --> 00:06:03,432
So that was my first book.

80
00:06:03,852 --> 00:06:06,292
In 2020... it was finally published in 2021.

81
00:06:07,102 --> 00:06:11,552
So I think I signed a
contract with them in 2019.

82
00:06:12,172 --> 00:06:14,912
And then after that, I

83
00:06:14,912 --> 00:06:18,652
started to teach a course
in the MS finance program.

84
00:06:19,242 --> 00:06:22,322
So it's called Python
Predictive Analytics.

85
00:06:22,622 --> 00:06:25,602
So you use Python to do machine learning models

86
00:06:26,092 --> 00:06:33,132
for business analytics, and I started to
teach students a lot of machine learning

87
00:06:33,162 --> 00:06:36,452
models, including deep neural networks.

88
00:06:37,112 --> 00:06:41,632
And then, again, I
accumulated a lot of notes.

89
00:06:42,482 --> 00:06:43,522
And then,

90
00:06:43,582 --> 00:06:51,232
I came across a video from DeepMind,
showing how you can actually play

91
00:06:51,582 --> 00:06:59,322
Atari games like Breakout by
training a computer program to play

92
00:06:59,322 --> 00:07:02,602
the game at a superhuman level.

93
00:07:02,972 --> 00:07:08,612
So what happened was, not only did the
computer program learn to play

94
00:07:08,632 --> 00:07:15,682
the game, it actually figured out
a way to score very efficiently, a

95
00:07:15,682 --> 00:07:18,672
way human beings didn't know before.

96
00:07:18,822 --> 00:07:23,652
So you dig a tunnel at the side
of the wall, and then you send

97
00:07:23,692 --> 00:07:27,982
the ball to the back of the wall
to score very efficiently.

98
00:07:28,362 --> 00:07:31,842
When I saw that video,
I was completely amazed.

99
00:07:32,392 --> 00:07:35,522
I told myself, "I gotta
figure out how this works".

100
00:07:35,962 --> 00:07:42,212
I spent several months experimenting
with different kinds of programs,

101
00:07:42,212 --> 00:07:44,202
trying to figure out how it works.

102
00:07:45,062 --> 00:07:47,852
And eventually I figured it out.

103
00:07:48,722 --> 00:07:51,492
And that became my second book.

104
00:07:51,812 --> 00:07:54,182
It's Machine Learning, Animated.

105
00:07:54,582 --> 00:07:59,072
So it was published with
CRC Press last year.

106
00:08:00,552 --> 00:08:09,407
And then, recently, once ChatGPT was
out, generative AI was very popular.

107
00:08:09,507 --> 00:08:11,397
I was very curious.

108
00:08:12,162 --> 00:08:17,542
I was trying to figure out how
exactly a large language model

109
00:08:17,552 --> 00:08:25,552
works, and how a computer program
can understand the human language.

110
00:08:26,072 --> 00:08:29,022
I spent a lot of time
trying to figure it out.

111
00:08:29,492 --> 00:08:31,742
Before, I was actually using TensorFlow.

112
00:08:31,762 --> 00:08:37,502
It worked pretty well for me with
Atari games and so on and so forth.

113
00:08:37,892 --> 00:08:42,492
Apparently it's not great
in terms of GPU training.

114
00:08:42,852 --> 00:08:47,052
You can do GPU training,
but there is an overhead.

115
00:08:47,102 --> 00:08:51,322
So you have to program everything
on the CPU and then send it to the GPU,

116
00:08:52,367 --> 00:08:54,517
do the calculation, and then send it back.

117
00:08:55,077 --> 00:08:56,687
The overhead is just too much.

118
00:08:56,697 --> 00:08:59,967
So it ended up not very fast.

119
00:09:00,427 --> 00:09:06,277
Then I learned another AI
framework called PyTorch.

120
00:09:06,387 --> 00:09:17,547
You can explicitly send a tensor to the GPU to
do the calculation and so on and so forth.

121
00:09:17,547 --> 00:09:23,387
It's a little more complicated than
TensorFlow because you do have to send

122
00:09:23,397 --> 00:09:28,097
something to the GPU and then get it back.

123
00:09:28,317 --> 00:09:32,787
So in terms of coding, you have
to do slightly more work, but in

124
00:09:32,787 --> 00:09:35,417
terms of performance, it's amazing.

125
00:09:35,797 --> 00:09:39,827
So I get to train models

126
00:09:40,962 --> 00:09:46,322
7 to 10 times faster
compared to CPU training.

127
00:09:46,822 --> 00:09:51,502
All those large language models,
they have billions or hundreds

128
00:09:51,502 --> 00:09:53,762
of billions of parameters, right?

129
00:09:54,122 --> 00:09:56,432
So the speed is crucial.

130
00:09:57,042 --> 00:10:01,302
Right now, I'm, like, training a
model with millions of parameters,

131
00:10:01,382 --> 00:10:02,242
which is fine.

132
00:10:02,462 --> 00:10:08,012
So for even larger kinds of language
models, in my third book, which

133
00:10:08,012 --> 00:10:10,387
is with Manning Publications.

134
00:10:10,827 --> 00:10:16,097
So in this book, I'm doing
generative AI with PyTorch.

135
00:10:16,417 --> 00:10:21,577
The reason I switched to PyTorch
is because of the dynamic computational

136
00:10:21,587 --> 00:10:25,097
graph, and then the GPU training.

137
00:10:25,177 --> 00:10:28,247
I can train most models
in a matter of minutes.

138
00:10:28,687 --> 00:10:32,047
Sometimes I get larger
ones, maybe a couple of hours.

139
00:10:32,427 --> 00:10:33,117
That's it.

140
00:10:33,147 --> 00:10:37,507
I can see the model in action
and then I can tune the model.

141
00:10:37,687 --> 00:10:39,227
So that's the third book.

142
00:10:39,237 --> 00:10:44,907
So let me conclude by quickly summarizing
what I'm doing in the third book.

143
00:10:45,417 --> 00:10:48,337
The name, I think you just
mentioned at the beginning.

144
00:10:48,727 --> 00:10:51,897
Learn Generative AI with PyTorch.

145
00:10:52,417 --> 00:11:00,387
Readers learn to create generative
AI models from scratch, to create

146
00:11:00,387 --> 00:11:08,797
different content like images, shapes,
numbers, text, music, sound, and so forth,

147
00:11:09,197 --> 00:11:13,237
all with PyTorch and deep learning models.

148
00:11:13,697 --> 00:11:15,187
And in particular,

149
00:11:15,747 --> 00:11:18,147
readers learn how to create

150
00:11:18,877 --> 00:11:27,397
a ChatGPT-style transformer from
scratch, and then in particular, I teach

151
00:11:27,567 --> 00:11:36,867
readers how to create a GPT-2 XL with
1.5B parameters. Of course, with 1.5

152
00:11:37,027 --> 00:11:39,667
billion parameters, it's
very hard to train, right?

153
00:11:39,667 --> 00:11:40,647
It's very slow, number one.

154
00:11:40,647 --> 00:11:47,927
Number two, GPT-2 was trained with huge
amounts of data, and regular readers don't

155
00:11:47,987 --> 00:11:50,917
have access to this training data, right?

156
00:11:51,277 --> 00:12:01,377
But I also teach readers how to
extract the pre-trained weights from

157
00:12:01,397 --> 00:12:11,477
OpenAI and then you load those weights
into the GPT-2 model you created from

158
00:12:11,487 --> 00:12:15,097
scratch, and start to generate the text.

159
00:12:15,507 --> 00:12:23,547
So the text you generate is very
coherent, without grammar errors.

160
00:12:23,837 --> 00:12:29,967
It's amazing. Of course, it's not
as powerful as ChatGPT or GPT-4, but

161
00:12:30,077 --> 00:12:38,837
a normal person without access to super
computing facilities, without access

162
00:12:38,837 --> 00:12:45,667
to large amounts of training data, can
create a ChatGPT-style deep neural network

163
00:12:45,707 --> 00:12:53,097
from scratch and use it to generate
text and generate lifelike music.

164
00:12:53,287 --> 00:12:54,117
It's amazing.

165
00:12:54,237 --> 00:12:55,517
And that's the text part.

166
00:12:55,607 --> 00:12:59,437
On the image part, you can
create, like, a color image.

167
00:12:59,852 --> 00:13:04,012
You can also convert a horse to a zebra.

168
00:13:04,222 --> 00:13:08,822
You can convert blonde hair
to black hair in images.

169
00:13:09,042 --> 00:13:14,292
You can add or remove glasses
in images and so forth.

170
00:13:14,302 --> 00:13:16,852
So the whole experience is amazing.

171
00:13:16,922 --> 00:13:20,382
It worked better than anticipated.

172
00:13:20,812 --> 00:13:22,812
And that whole experience

173
00:13:23,372 --> 00:13:29,582
reminded me of a famous quote,
"technology advanced enough is

174
00:13:29,612 --> 00:13:32,262
indistinguishable from magic".

175
00:13:32,462 --> 00:13:33,902
The whole thing is really magic.

176
00:13:34,342 --> 00:13:36,372
That's my long answer to your question.

177
00:13:37,047 --> 00:13:37,817
Thank you for that.

178
00:13:37,867 --> 00:13:41,937
Just for anybody who's not familiar
with Manning, the book is currently

179
00:13:41,937 --> 00:13:43,997
available in what's called MEAP.

180
00:13:44,047 --> 00:13:48,547
That stands for Manning Early Access
Program. You can read the chapters

181
00:13:48,617 --> 00:13:51,127
as they are produced by Mark.

182
00:13:51,497 --> 00:13:55,647
So at the moment there are five chapters
that are available, but I'm being told

183
00:13:55,677 --> 00:13:58,447
that 11 will be coming very soon.

184
00:13:58,902 --> 00:14:02,402
And the estimated time for the
whole book to be available is May

185
00:14:02,402 --> 00:14:07,272
2024, so for anybody who's eager
and who might be thinking that the

186
00:14:07,282 --> 00:14:10,632
book is not finished yet, you can
actually start reading it right now.

187
00:14:12,282 --> 00:14:16,602
Speaking of the magic and the building
from scratch, I think what I liked the

188
00:14:16,602 --> 00:14:20,512
most about your book, and what initially
attracted me to actually go and read it,

189
00:14:21,352 --> 00:14:22,912
is that 'build from scratch' thing.

190
00:14:22,922 --> 00:14:27,862
And I love that you used Richard
Feynman's philosophy, the quote, "What

191
00:14:27,862 --> 00:14:30,092
I cannot create, I do not understand".

192
00:14:30,722 --> 00:14:33,852
I think that's a very
good motto to live by.

193
00:14:34,812 --> 00:14:39,542
It's absolutely great that you
take us on this journey to build

194
00:14:39,562 --> 00:14:43,682
things up, even though I've only
read the five chapters so far.

195
00:14:44,822 --> 00:14:47,872
All of a sudden, with ChatGPT,
everybody started talking

196
00:14:47,872 --> 00:14:49,692
about this and this explosion.

197
00:14:50,132 --> 00:14:53,692
What were some other moments, other
than ChatGPT, where you realized,

198
00:14:53,722 --> 00:14:55,552
Oh man, this is going to blow up.

199
00:14:55,552 --> 00:14:58,612
This is going to be
massive with generative AI.

200
00:14:58,612 --> 00:15:03,312
I believe you mentioned the Writers
Guild of America versus AI story.

201
00:15:03,392 --> 00:15:04,902
Can we talk about that for a minute?

202
00:15:06,077 --> 00:15:11,717
Before I answer that question, I encourage
you to read my chapter one for free;

203
00:15:11,757 --> 00:15:13,607
you don't even have to buy my book.

204
00:15:13,667 --> 00:15:15,557
Manning has a great feature.

205
00:15:15,607 --> 00:15:22,517
If you go to manning.com and if you
look for my book, Learn Generative

206
00:15:22,517 --> 00:15:24,527
AI with PyTorch, you can find it.

207
00:15:24,957 --> 00:15:30,727
I have a fairly long chapter one
summarizing the state of the art

208
00:15:30,837 --> 00:15:34,607
in generative AI and also what
I've been doing in the book.

209
00:15:35,227 --> 00:15:38,697
As for what Miko talked about, the
Writers Guild of America.

210
00:15:39,207 --> 00:15:42,827
So a few months ago, they
negotiated with big firms

211
00:15:43,477 --> 00:15:46,627
about the threat of AI.

212
00:15:47,097 --> 00:15:54,382
And as a result, there's a contract
to limit how much AI you can use

213
00:15:54,432 --> 00:16:00,492
in writing, in production, in order
to protect the jobs of the writers.

214
00:16:00,912 --> 00:16:06,852
And this is just one example
of the disruptive power of AI

215
00:16:06,902 --> 00:16:09,272
in many different industries.

216
00:16:09,602 --> 00:16:17,512
Writers are just one example, and
it threatens many other industries.

217
00:16:17,672 --> 00:16:25,572
Another example is Chegg, which
is an online educational platform.

218
00:16:25,952 --> 00:16:31,467
So college students go there to
get tutoring service and so forth,

219
00:16:31,717 --> 00:16:37,077
and with ChatGPT, actually, their
business model is threatened, right?

220
00:16:37,077 --> 00:16:42,122
I think in the month after the
release of ChatGPT, their stock

221
00:16:42,122 --> 00:16:45,112
price plunged by almost 40%.

222
00:16:45,562 --> 00:16:49,502
So that's how serious the competition is.

223
00:16:49,802 --> 00:16:51,872
Those are just a couple of examples.

224
00:16:51,952 --> 00:16:57,662
The potential of generative AI is huge,
but at the same time, if you don't

225
00:16:57,662 --> 00:17:00,032
catch up with the trend, there is

226
00:17:00,112 --> 00:17:04,757
a risk that your job
might be replaced by AI.

227
00:17:05,387 --> 00:17:06,947
There is an interesting quote.

228
00:17:07,337 --> 00:17:08,987
I think there is a lot of truth.

229
00:17:09,842 --> 00:17:12,942
It says, "AI will not take your job;

230
00:17:13,692 --> 00:17:15,692
somebody using AI will."

231
00:17:15,942 --> 00:17:17,812
So I think there is a
lot of truth in that.

232
00:17:17,822 --> 00:17:23,902
So in order to avoid being
replaced by AI, I think the best

233
00:17:23,902 --> 00:17:26,022
strategy is to get in the game.

234
00:17:26,592 --> 00:17:33,462
To learn about generative AI, to protect
yourself in terms of future careers.

235
00:17:33,962 --> 00:17:34,882
So that's

236
00:17:34,932 --> 00:17:38,292
the big motivation behind my books.

237
00:17:38,392 --> 00:17:42,652
The main motivation, of course,
is intellectual curiosity.

238
00:17:42,702 --> 00:17:45,012
I'm by nature a very curious person.

239
00:17:45,102 --> 00:17:52,602
So when I saw that ChatGPT
works like magic, I really wanted

240
00:17:52,607 --> 00:17:53,747
to get to the bottom of it.

241
00:17:54,267 --> 00:17:56,437
And I was trying to
figure out how it works.

242
00:17:57,147 --> 00:17:58,477
So that's the main reason.

243
00:17:58,487 --> 00:18:01,587
But at the same time, I'm
trying to teach my students

244
00:18:02,452 --> 00:18:08,492
programming skills, machine learning
skills, AI skills, generative AI skills in

245
00:18:08,492 --> 00:18:12,322
order to prepare them for the job market.

246
00:18:12,712 --> 00:18:18,032
So that, in the future, their
skill sets will not be outdated.

247
00:18:18,082 --> 00:18:22,722
That's my second motivation
for writing the books.

248
00:18:24,318 --> 00:18:29,588
Do you buy into this comparison
that AI is like personal computers?

249
00:18:30,697 --> 00:18:34,597
And that a lot of people were
worried about how personal computers

250
00:18:34,627 --> 00:18:37,637
were going to just remove jobs.

251
00:18:37,687 --> 00:18:43,327
But what ended up happening was, some
small portion of jobs was eliminated,

252
00:18:43,327 --> 00:18:48,467
but most of the jobs were modified
and became about operating computers.

253
00:18:49,107 --> 00:18:52,787
Do you think that's the most apt
comparison of what we're likely to

254
00:18:52,817 --> 00:18:54,987
experience with AI in the coming years?

255
00:18:55,960 --> 00:19:02,850
The future is hard to predict, but
personally, I think, most likely, that's

256
00:19:03,430 --> 00:19:05,240
what's going to happen in the near future.

257
00:19:05,580 --> 00:19:12,700
With generative AI, you can actually
use it to increase your productivity,

258
00:19:13,040 --> 00:19:15,190
to have more job opportunities.

259
00:19:15,630 --> 00:19:20,960
On the other hand, if you basically
completely stay away from it, your

260
00:19:20,960 --> 00:19:25,560
skill sets might be outdated. But at
the same time, I think technology

261
00:19:25,560 --> 00:19:32,250
will make all this AI stuff more
accessible to most people, right?

262
00:19:32,490 --> 00:19:36,970
You don't necessarily have
to be a programmer, so one

263
00:19:36,975 --> 00:19:38,930
example is Midjourney, right?

264
00:19:39,220 --> 00:19:44,400
You can actually just go to a browser
and then you can use Midjourney

265
00:19:44,420 --> 00:19:52,500
or DALL-E 2, DALL-E 3, or whatever
to create very fancy images.

266
00:19:52,860 --> 00:19:55,560
You can use a text prompt to create

267
00:19:55,560 --> 00:19:59,230
an image of what you meant. You
don't have to draw it yourself.

268
00:19:59,250 --> 00:20:01,320
In that sense, I'm optimistic.

269
00:20:01,460 --> 00:20:08,260
I think for most people, generative
AI will be a very valuable tool

270
00:20:08,860 --> 00:20:11,260
to increase their productivity.

271
00:20:11,680 --> 00:20:14,710
As long as you keep
up with the technology.

272
00:20:14,710 --> 00:20:18,420
I'm glad you mentioned Midjourney because
I think for me personally, that was where

273
00:20:18,420 --> 00:20:22,450
I realized: 'okay, this is the hockeystick
moment' because I remember the little

274
00:20:22,770 --> 00:20:25,550
tiny, blurry pictures from the GAN paper.

275
00:20:26,270 --> 00:20:30,420
And then all of a sudden I saw some
pictures that were generated by

276
00:20:30,420 --> 00:20:34,820
Midjourney, and I went and I tried
it myself and, it was more or less able

277
00:20:34,820 --> 00:20:40,440
to produce almost everything I threw at
it, other than some particular types of

278
00:20:40,440 --> 00:20:42,510
dinosaurs that it just didn't recognize.

279
00:20:44,020 --> 00:20:46,520
That was like the one thing I
knew, 'okay, they didn't train

280
00:20:46,520 --> 00:20:47,860
it on that kind of dinosaur'.

281
00:20:47,860 --> 00:20:51,450
But, that was definitely one of
those moments where I realized, wow.

282
00:20:51,920 --> 00:20:55,510
And the other is, I think, I live in
London, one way or another, you end

283
00:20:55,510 --> 00:20:59,630
up using the tube a lot, and, usually
you're annoyed at people who, play

284
00:20:59,630 --> 00:21:01,570
some music on like public transport.

285
00:21:01,580 --> 00:21:05,980
And then, at some point I realized
that I was getting annoyed at people

286
00:21:06,010 --> 00:21:11,980
talking about generative AI, on the
public transport and making noise.

287
00:21:12,370 --> 00:21:16,600
And that's when you realize that, 'okay,
so this has now gone mainstream,

288
00:21:16,600 --> 00:21:18,020
and everybody's talking about that'.

289
00:21:18,030 --> 00:21:23,310
But let's talk a little bit about
the actual underlying breakthroughs

290
00:21:23,370 --> 00:21:26,130
that brought us to where we are.

291
00:21:26,130 --> 00:21:31,330
And in particular, I'm thinking
about GANs, the generative adversarial

292
00:21:31,330 --> 00:21:34,280
networks, and transformers and diffusion.

293
00:21:34,960 --> 00:21:36,150
Where should we start?

294
00:21:36,430 --> 00:21:40,520
What's the first important breakthrough
that everybody should know about?

295
00:21:41,130 --> 00:21:47,470
I think all the generative AI models
in my book are deep neural networks.

296
00:21:47,960 --> 00:21:50,580
Machine learning is a very wide field.

297
00:21:50,900 --> 00:21:55,650
There are many traditional machine
learning models, random forests,

298
00:21:56,260 --> 00:22:02,290
linear regressions, this and that,
but about 20 years ago, deep neural

299
00:22:02,290 --> 00:22:04,080
networks became very powerful.

300
00:22:04,890 --> 00:22:13,250
One great thing about neural networks
is that you can scale them, and a deep neural

301
00:22:13,250 --> 00:22:22,000
network can approximate any relationship,
even if we human beings don't know what's

302
00:22:22,000 --> 00:22:28,740
the exact relationship, as long as you
create a large enough model to capture it.

303
00:22:29,150 --> 00:22:32,390
So that's the foundation from 20 years ago.

304
00:22:32,720 --> 00:22:36,300
And then over the past 20
years or so, many people made

305
00:22:36,800 --> 00:22:43,690
breakthroughs in the deep learning field. And
then, let's talk about something like ChatGPT.

306
00:22:43,710 --> 00:22:51,590
Okay, so ChatGPT is a huge deep neural
network trained on huge amounts of data.

307
00:22:52,610 --> 00:22:58,150
And before that, state-of-the-art
natural language processing models

308
00:22:58,200 --> 00:23:00,840
were recurrent neural networks.

309
00:23:01,240 --> 00:23:06,560
So how it worked was, it
progresses along the timeline.

310
00:23:06,640 --> 00:23:10,920
Let's say you have a sentence
like "this is a sentence", right?

311
00:23:11,190 --> 00:23:13,730
So you have like four words
in the sentence, right?

312
00:23:13,950 --> 00:23:20,400
The model uses the first word, "this",
to predict the second word, "is", and then

313
00:23:20,470 --> 00:23:25,250
it uses the first two words to predict
the third word, and so on and so forth.

314
00:23:26,200 --> 00:23:31,200
It worked to some degree, but
it's very slow because you have

315
00:23:31,210 --> 00:23:34,920
to predict one word at a time.

316
00:23:35,760 --> 00:23:40,560
And then in 2017, there
was a huge breakthrough.

317
00:23:41,010 --> 00:23:42,090
There's a paper

318
00:23:42,265 --> 00:23:46,735
called "Attention Is All You Need"
by a group of Google scholars,

319
00:23:47,345 --> 00:23:54,365
and they used a different mechanism
to capture the relationship of

320
00:23:54,375 --> 00:23:55,915
different words in a sentence.

321
00:23:56,515 --> 00:24:05,025
So it's called the attention mechanism, and
it's much more effective. On top of that,

322
00:24:05,505 --> 00:24:07,750
it's not sequential.

323
00:24:07,770 --> 00:24:11,710
Which means one word
can pay attention to all

324
00:24:11,760 --> 00:24:13,410
other words at the same time.

325
00:24:14,080 --> 00:24:17,490
And this allows for parallel training.

326
00:24:18,080 --> 00:24:20,320
And this has huge implications.

327
00:24:20,370 --> 00:24:26,010
Number one, it works better in terms
of capturing long-term relationships

328
00:24:26,315 --> 00:24:30,775
between different words in a sentence so
that you can understand the meaning of

329
00:24:30,775 --> 00:24:33,025
a long sentence, long text, number one.

330
00:24:33,025 --> 00:24:38,265
Number two, because of the non-sequential
nature of the attention mechanism,

331
00:24:39,085 --> 00:24:40,955
you can use parallel training.

332
00:24:41,005 --> 00:24:45,555
You can train the same model
on many different devices.

333
00:24:46,135 --> 00:24:48,105
This makes training much faster.

334
00:24:48,545 --> 00:24:53,935
And this also allows you to
train the model on more data.

335
00:24:54,425 --> 00:25:00,775
That's why ChatGPT became so powerful,
because, you can train them much faster,

336
00:25:00,805 --> 00:25:02,845
and then you can train them on more data.

337
00:25:02,925 --> 00:25:08,275
On top of that, the mechanism works
much better than recurrent neural

338
00:25:08,275 --> 00:25:14,515
networks, because it can capture really
long term relationships in a sequence,

339
00:25:14,875 --> 00:25:16,975
like, a text is a sequence, right?

340
00:25:17,025 --> 00:25:23,175
That propelled OpenAI to have
all these models, including ChatGPT.

341
00:25:23,575 --> 00:25:31,135
Now let's go to the recent development,
the text-to-image transformers.

342
00:25:31,235 --> 00:25:36,518
This is a new innovation in transformer
models called multimodal models.

343
00:25:36,568 --> 00:25:41,538
The original transformer model, "Attention
Is All You Need", which powers

344
00:25:41,538 --> 00:25:44,308
ChatGPT, only uses text, right?

345
00:25:44,308 --> 00:25:49,728
So the input is a sequence of text, the
output is also a sequence of text, but

346
00:25:49,768 --> 00:25:55,778
with multimodal models, the input and
output can be in different formats, right?

347
00:25:55,798 --> 00:26:02,358
With DALL-E 2 and DALL-E 3, the input is text and
the output is an image, right?

348
00:26:02,388 --> 00:26:04,598
You can have different inputs and outputs.

349
00:26:04,628 --> 00:26:08,813
You can have audio, you can have video,
Sora has videos, that kind of stuff.

350
00:26:08,893 --> 00:26:12,053
But let's talk about what
is the underlying mechanism

351
00:26:12,053 --> 00:26:14,078
behind multimodal models.

352
00:26:14,448 --> 00:26:19,783
With DALL-E 2 and DALL-E 3, it has something
to do with diffusion models.

353
00:26:20,033 --> 00:26:25,463
So I think you mentioned that, at first
the generated image is very grainy, right?

354
00:26:25,743 --> 00:26:31,543
Diffusion models add
noise to an image gradually.

355
00:26:31,743 --> 00:26:34,333
Let's say there are, like, 1,000 time steps.

356
00:26:34,553 --> 00:26:40,543
And then at each time step, you can
actually add a little bit of noise

357
00:26:40,543 --> 00:26:48,313
to the image, and gradually you have
1,000 different images, and each one

358
00:26:48,673 --> 00:26:54,823
becomes progressively noisier and at
the end, it becomes completely noisy.

359
00:26:56,093 --> 00:27:00,683
And then what you can do is that
you can give those images to a

360
00:27:00,683 --> 00:27:06,133
machine learning model and you can
train the model to remove those

361
00:27:06,233 --> 00:27:09,043
noises, progressively, step by step.
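
The noising procedure described here can be sketched in plain Python. This is a minimal illustration, not code from the book: the function names, the linear noise schedule, and its start and end values (1e-4 to 0.02) are assumptions chosen for the example, and the "image" is just a short list of pixel values.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise levels (assumed linear schedule, for illustration)."""
    return [
        beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        for t in range(num_steps)
    ]


def forward_diffusion(pixels, num_steps=1000, seed=0):
    """Return one progressively noisier copy of `pixels` per time step."""
    rng = random.Random(seed)
    noisy = list(pixels)
    trajectory = []
    for beta in linear_betas(num_steps):
        keep = math.sqrt(1.0 - beta)   # how much of the current image survives
        scale = math.sqrt(beta)        # how much fresh Gaussian noise is mixed in
        noisy = [keep * x + scale * rng.gauss(0.0, 1.0) for x in noisy]
        trajectory.append(noisy)
    return trajectory


def remaining_signal(num_steps=1000):
    """Fraction of the original pixel amplitude left after all steps."""
    frac = 1.0
    for beta in linear_betas(num_steps):
        frac *= math.sqrt(1.0 - beta)
    return frac


trajectory = forward_diffusion([0.9, 0.1, 0.5, 0.7])
```

Under these assumed settings, almost none of the original signal is left at the last step, which matches the "completely noisy" end state. A denoising model would then be trained on pairs taken from `trajectory` to undo one step at a time.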

362
00:27:09,253 --> 00:27:15,163
That's how DALL-E and all
those text-to-image models work.

363
00:27:15,393 --> 00:27:21,383
The first step is that you use a text prompt
to generate a very grainy image, and

364
00:27:21,423 --> 00:27:26,963
then after that you use a model which
is very much like a diffusion model.

365
00:27:27,293 --> 00:27:33,073
You will progressively refine those
models so that you turn a very grainy

366
00:27:33,073 --> 00:27:36,303
image into a high resolution image.

367
00:27:36,453 --> 00:27:44,058
That's why, when you enter, like, a
shorter prompt, DALL-E 2 can

368
00:27:44,058 --> 00:27:46,128
give you a higher resolution image.

369
00:27:46,668 --> 00:27:51,618
capturing what you are trying
to produce in the text prompt.

370
00:27:51,628 --> 00:27:54,718
So that's actually chapter 14 of my book.

371
00:27:54,918 --> 00:27:58,068
I'm going to talk about how you
can add a little bit of noise to

372
00:27:58,068 --> 00:28:00,538
the image, one step at a time.

373
00:28:00,538 --> 00:28:06,848
And then you can use those images to
train the model to remove the noise step

374
00:28:06,848 --> 00:28:15,748
by step, progressively, very much like
DALL-E 2 trying to make the image clearer

375
00:28:15,748 --> 00:28:18,248
and clearer step by step progressively.

376
00:28:22,458 --> 00:28:26,628
Generative adversarial networks,
which was an interesting

377
00:28:26,628 --> 00:28:28,898
development from Ian Goodfellow.

378
00:28:29,158 --> 00:28:32,278
How does that fit into the rest
of what you just described?

379
00:28:33,053 --> 00:28:39,123
Generative adversarial networks,
so they're great at generating

380
00:28:39,273 --> 00:28:41,883
different forms of content.

381
00:28:42,703 --> 00:28:48,063
A lot of times when readers learn
something, if you give them the end

382
00:28:48,173 --> 00:28:50,653
product, it's too complicated, right?

383
00:28:50,663 --> 00:28:53,563
So they may get frustrated
and they just give up.

384
00:28:54,473 --> 00:29:03,303
As an author, my job is to make sure
that readers stay engaged throughout

385
00:29:03,333 --> 00:29:09,353
the book and never get tired, never
get frustrated, and gradually learn and

386
00:29:09,463 --> 00:29:15,403
finally learn to do the state of the
art machine learning models, generative AI

387
00:29:15,403 --> 00:29:21,953
models like ChatGPT-style transformers, to
generate the text and the audio, right?

388
00:29:22,023 --> 00:29:24,243
So what is the idea behind GANs?

389
00:29:24,763 --> 00:29:25,933
You have two networks.

390
00:29:25,983 --> 00:29:28,383
One is a generator network.

391
00:29:28,433 --> 00:29:34,193
The other one is a discriminator network.
So the job of the generator is to try

392
00:29:34,373 --> 00:29:39,363
to generate a piece of work similar
to that from the training data set.

393
00:29:39,413 --> 00:29:43,718
Let's use a grayscale
image as an example, right?

394
00:29:43,958 --> 00:29:47,548
You have a training dataset
of grayscale images of,

395
00:29:47,598 --> 00:29:50,068
handwritten digits, like 0 to 9.

396
00:29:50,268 --> 00:29:52,938
And then, those are the real images.

397
00:29:53,118 --> 00:29:59,853
And then you will ask the generator
to generate something similar to

398
00:29:59,853 --> 00:30:04,728
that, so that it can pass as real
in front of the discriminator.

399
00:30:05,678 --> 00:30:09,288
Before you train the model,
the generator is terrible.

400
00:30:09,708 --> 00:30:14,858
So whatever the generator generates
is completely, like, gibberish.

401
00:30:14,878 --> 00:30:18,148
It's like a snowflake on a
screen, that kind of stuff.

402
00:30:19,188 --> 00:30:22,568
But this is where training comes in.

403
00:30:23,068 --> 00:30:27,068
You will have a training loop,
and then, in each iteration,

404
00:30:27,108 --> 00:30:32,788
you will ask the generator to
generate a bunch of fake images.

405
00:30:33,258 --> 00:30:39,788
At the same time, you also have a bunch
of real images from the training set and

406
00:30:39,818 --> 00:30:46,338
you give all those to the discriminator
and ask the discriminator to determine

407
00:30:47,188 --> 00:30:50,738
whether each image is real or fake.

408
00:30:51,058 --> 00:30:58,628
And then the generator's job is
trying to create an image so that the

409
00:30:58,628 --> 00:31:01,838
discriminator would think it's real.

410
00:31:02,088 --> 00:31:04,178
That's the generator's objective.

411
00:31:04,538 --> 00:31:08,898
So therefore you have a loss function,
and then you train the model.

412
00:31:09,378 --> 00:31:15,658
You gradually fine-tune the model
parameters so that in the next

413
00:31:15,658 --> 00:31:22,828
iteration, whatever image is generated
by the generator will have a higher

414
00:31:22,838 --> 00:31:26,098
probability of passing as real.

415
00:31:26,868 --> 00:31:31,968
And then you do this again and again,
you can do thousands of iterations.

416
00:31:32,508 --> 00:31:38,168
And if you do that long enough,
then eventually the generator will

417
00:31:38,188 --> 00:31:44,098
be able to create an image identical
to the image from the training set.

418
00:31:44,108 --> 00:31:50,448
So that's how a GAN works: you have a
zero-sum game, you have a competitive

419
00:31:50,488 --> 00:31:54,898
kind of two networks competing
with each other, trying to outsmart

420
00:31:54,898 --> 00:31:59,018
each other and eventually, the
generator gets better and better.
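
The training loop described above can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the book's code: the layer sizes, learning rates, and the `sample_real` stand-in for a batch of real training images are all hypothetical, and tiny random vectors replace the handwritten-digit images.

```python
import torch
from torch import nn

torch.manual_seed(0)

NOISE_DIM, DATA_DIM = 8, 16  # hypothetical sizes for the sketch

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 32), nn.ReLU(), nn.Linear(32, DATA_DIM)
)
discriminator = nn.Sequential(
    nn.Linear(DATA_DIM, 32), nn.ReLU(), nn.Linear(32, 1)
)

loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)


def sample_real(batch_size):
    # Stand-in for a batch of real (flattened) training images.
    return torch.randn(batch_size, DATA_DIM) * 0.1 + 1.0


def train_step(batch_size=64):
    real = sample_real(batch_size)
    fake = generator(torch.randn(batch_size, NOISE_DIM))
    ones = torch.ones(batch_size, 1)
    zeros = torch.zeros(batch_size, 1)

    # Discriminator update: label real samples 1 and fake samples 0.
    # detach() keeps this update from changing the generator.
    d_loss = loss_fn(discriminator(real), ones) + loss_fn(
        discriminator(fake.detach()), zeros
    )
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: it is rewarded when fakes are classified as real.
    g_loss = loss_fn(discriminator(fake), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()


for step in range(200):
    d_loss, g_loss = train_step()
```

The two optimizers are what make this a two-player game: each step improves the discriminator on the current fakes, then improves the generator against the updated discriminator.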

421
00:31:59,178 --> 00:32:05,683
So that's the idea behind GANs,
it's a revolutionary idea.

422
00:32:05,833 --> 00:32:12,353
In 2014, 2015, Ian Goodfellow and
his co-authors proposed the model.

423
00:32:12,653 --> 00:32:19,243
A great thing about the model is it can
generate different content: numbers,

424
00:32:19,713 --> 00:32:24,013
images, shapes, even
music, so on and so forth.

425
00:32:25,083 --> 00:32:28,343
I love this idea because on
top of that, you've got this

426
00:32:28,603 --> 00:32:30,193
built-in target point, right?

427
00:32:30,223 --> 00:32:33,793
When your discriminator can
no longer discriminate between

428
00:32:33,793 --> 00:32:34,703
what you're generating.

429
00:32:34,703 --> 00:32:36,813
That's when you're finished; it's not arbitrary.

430
00:32:36,823 --> 00:32:37,593
You've got that.

431
00:32:37,593 --> 00:32:40,813
And the other reason why I love that
is that it's got this anecdote attached

432
00:32:40,823 --> 00:32:45,933
to it that, legend has it, it was
written one evening, when Ian was

433
00:32:45,933 --> 00:32:49,953
celebrating in a pub. I think someone
was graduating, some fellow student.

434
00:32:51,148 --> 00:32:55,158
And they were discussing a problem where
they wanted to generate some pictures.

435
00:32:55,658 --> 00:32:59,558
And he came up with this idea that,
'oh, what you're suggesting is too

436
00:32:59,558 --> 00:33:03,548
complicated and you should put
two networks against each other'.

437
00:33:04,028 --> 00:33:04,758
And they laughed.

438
00:33:05,708 --> 00:33:08,298
He went home and, still slightly drunk,

439
00:33:08,378 --> 00:33:10,298
he wrote a proof of concept of that.

440
00:33:10,928 --> 00:33:13,268
And then it turned out that
it actually worked out.

441
00:33:13,318 --> 00:33:17,618
I think in one of the interviews
later, he said that if he wasn't drunk,

442
00:33:17,618 --> 00:33:20,448
he probably wouldn't have done it
because it sounded like a silly idea.

443
00:33:22,788 --> 00:33:23,188
Okay.

444
00:33:23,508 --> 00:33:24,278
Yeah, that's right.

445
00:33:24,278 --> 00:33:24,508
Yeah.

446
00:33:24,508 --> 00:33:26,678
How random some of those things are.

447
00:33:27,228 --> 00:33:28,898
How weird and unpredictable.

448
00:33:28,988 --> 00:33:33,978
And I think one of the things I
wanted to ask you about is also

449
00:33:34,303 --> 00:33:38,563
what made all of those kind of
recent breakthroughs possible?

450
00:33:38,783 --> 00:33:39,623
What was missing?

451
00:33:39,623 --> 00:33:43,283
Because we've had neural networks
since, what, the '80s or something like that.

452
00:33:43,313 --> 00:33:48,353
All of a sudden, it looks like in the
last few years, or maybe last decade or

453
00:33:48,353 --> 00:33:52,733
so, it was just like one breakthrough
after another breakthrough just dropping.

454
00:33:52,733 --> 00:33:57,743
And if you try to keep up with
currently written papers on AI,

455
00:33:57,743 --> 00:33:59,243
there's just so many of them.

456
00:33:59,768 --> 00:34:03,938
And it looks like every other day, there's
something super interesting that's been

457
00:34:03,938 --> 00:34:09,738
developed and it's literally hard to
keep up just with other people's ideas.

458
00:34:10,168 --> 00:34:15,558
What do you think enabled this kind
of explosion in the recent years?

459
00:34:16,921 --> 00:34:21,631
Actually, neural networks were
proposed even earlier than the 1980s.

460
00:34:21,691 --> 00:34:28,021
I think in the 1960s, researchers proposed
artificial neural networks, basically

461
00:34:28,121 --> 00:34:34,771
modeled after the human brain. The idea was
a great one, but at that point, we

462
00:34:34,831 --> 00:34:43,041
didn't have the hardware to support it.
And then, starting in the 1990s, early 2000s,

463
00:34:43,996 --> 00:34:47,356
the hardware became much
more powerful, number one.

464
00:34:47,886 --> 00:34:54,096
Number two: there was more research,
more breakthroughs in the research

465
00:34:54,096 --> 00:34:56,346
field of artificial neural networks.

466
00:34:56,856 --> 00:35:03,535
So one example is LeCun's
Convolutional Neural Networks.

467
00:35:04,605 --> 00:35:08,665
Most neural networks are fully
connected, dense neural networks,

468
00:35:08,685 --> 00:35:13,485
which means a neuron in the previous
layer is connected to all the neurons

469
00:35:13,525 --> 00:35:16,875
in the next layer, and it works great.

470
00:35:17,705 --> 00:35:22,865
Except that once your model becomes
larger, the number of parameters

471
00:35:23,325 --> 00:35:26,925
grows exponentially, and then it's
very hard to train it, right?

472
00:35:26,945 --> 00:35:27,995
So that's a problem.

473
00:35:28,465 --> 00:35:34,225
With convolutional neural networks,
you localize the weights, okay?

474
00:35:34,445 --> 00:35:40,240
You have a filter, and then the weights
in the filter are fixed when you move

475
00:35:40,280 --> 00:35:48,790
the filter over an image, and this
greatly reduces the number of parameters.
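
A quick parameter count makes the comparison concrete. The sketch below is plain arithmetic, not code from the book; the 28 x 28 grayscale image size and the choice of 32 filters of size 3 x 3 are assumptions picked for illustration.

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with square filters.

    The count does not depend on the image size, because the same
    filter weights are reused at every position of the image.
    """
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


pixels = 28 * 28                      # assumed grayscale image, flattened
dense = dense_params(pixels, pixels)  # every pixel connected to every output
conv = conv2d_params(1, 32, 3)        # 32 filters of size 3 x 3, one input channel

print(dense)  # 615440
print(conv)   # 320
```

For a dense layer the count scales with the product of the two layer widths, so it climbs quickly as images get larger, while the convolutional layer's count stays the same for any image size.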

476
00:35:49,290 --> 00:35:52,500
It makes computer vision
much more efficient.

477
00:35:52,790 --> 00:35:58,710
Because of that, in the early 2000s, there
were a lot of breakthroughs in computer

478
00:35:58,710 --> 00:36:05,410
vision, in convolutional neural networks,
and I think that's a huge breakthrough.

479
00:36:05,740 --> 00:36:06,740
And then

480
00:36:07,800 --> 00:36:11,540
after that, you also have GPU training.

481
00:36:11,930 --> 00:36:16,540
GPU training became very popular
in the past maybe 10 years or so.

482
00:36:16,850 --> 00:36:22,931
And that is a huge game changer, because
as deep neural networks became larger

483
00:36:22,931 --> 00:36:30,671
and larger, it's very hard to train
them without extra help, right?

484
00:36:30,721 --> 00:36:32,231
When you train on a CPU,

485
00:36:32,411 --> 00:36:35,750
the CPU is a general-purpose
kind of processor.

486
00:36:36,030 --> 00:36:38,060
You have to do many things on it.

487
00:36:38,430 --> 00:36:40,690
But a GPU is specialized.

488
00:36:40,925 --> 00:36:43,895
So you can do machine
learning jobs much faster.

489
00:36:44,915 --> 00:36:47,435
And of course, we also have more and more

490
00:36:47,990 --> 00:36:53,260
training data available, and
that also is necessary for

491
00:36:53,610 --> 00:36:55,380
large language models to work.

492
00:36:55,650 --> 00:37:01,490
It takes time, but I think, in the past
20 years or so, we suddenly had

493
00:37:01,730 --> 00:37:04,890
everything come together to make it work.

494
00:37:04,940 --> 00:37:09,550
Basically, we've got gamers to
thank for the breakthroughs in AI

495
00:37:10,060 --> 00:37:14,770
because of the graphics cards, the
GPUs that they requested, right?

496
00:37:15,360 --> 00:37:19,990
You have a very good point. I
think the GPU was originally designed

497
00:37:20,050 --> 00:37:21,420
for gaming purposes, right?

498
00:37:22,210 --> 00:37:26,620
And then suddenly right now, it has
a completely different purpose. And

499
00:37:26,690 --> 00:37:33,970
I have several GPUs at home, not very
powerful, but I think they're powerful enough

500
00:37:33,980 --> 00:37:36,910
for me to experiment on different models.

501
00:37:37,010 --> 00:37:39,640
They cost maybe several hundred
dollars, a thousand dollars.

502
00:37:40,140 --> 00:37:41,430
I have three of them.

503
00:37:41,960 --> 00:37:43,500
Two of them are from my son.

504
00:37:43,520 --> 00:37:45,160
My son was playing video games.

505
00:37:45,590 --> 00:37:48,340
And then now he doesn't use
those computers anymore.

506
00:37:48,340 --> 00:37:49,940
And then he just gave them to me.

507
00:37:49,940 --> 00:37:53,720
And then I just simply take
them out and use them for my own use.

508
00:37:53,780 --> 00:37:56,120
But the cost is not that much.

509
00:37:56,170 --> 00:37:59,310
The cost is not that much unless
you go for like the top of the line

510
00:37:59,310 --> 00:38:05,620
80 gig ones, which are very hard to
come by and also quite expensive.

511
00:38:06,420 --> 00:38:08,590
Yeah, so thank you gamers.

512
00:38:08,650 --> 00:38:13,090
Thank you for enabling the
AI revolution in many ways.

513
00:38:13,090 --> 00:38:15,730
it goes back to what I was
saying about how random some

514
00:38:15,760 --> 00:38:17,190
of these things seem to be.

515
00:38:17,790 --> 00:38:19,780
So where do you think we're heading?

516
00:38:19,870 --> 00:38:24,390
Like you said, the future is notoriously
difficult to predict, obviously.

517
00:38:24,930 --> 00:38:30,380
But if you were still going to
venture and make a guess that will

518
00:38:30,380 --> 00:38:33,880
probably prove completely wrong a
few years down the line, where do you

519
00:38:33,880 --> 00:38:35,360
think we're heading with all of this?

520
00:38:36,628 --> 00:38:43,278
If I had to venture a guess,
large language models will become even

521
00:38:43,278 --> 00:38:49,188
more powerful in the near future, not
only in terms of generating cohesive

522
00:38:49,788 --> 00:38:57,208
text, but also generating images,
generating videos. And also, multimodal

523
00:38:57,208 --> 00:38:59,218
models will become very popular.

524
00:38:59,268 --> 00:39:04,018
Okay, you can generate not only
images, text, you can also generate

525
00:39:04,018 --> 00:39:06,108
audio, video, sound, and so forth.

526
00:39:06,748 --> 00:39:11,293
Other than that, I think it really
depends on which breakthrough will

527
00:39:11,293 --> 00:39:13,483
come through in the near future.

528
00:39:13,893 --> 00:39:18,303
And you never know, maybe just one
day suddenly there is a huge breakthrough, and

529
00:39:18,303 --> 00:39:24,453
then it'll completely change the
landscape of AI, just like what

530
00:39:24,453 --> 00:39:26,703
ChatGPT did a couple years ago, right?

531
00:39:26,703 --> 00:39:29,463
The future is very exciting,
but at the same time, like you

532
00:39:29,463 --> 00:39:30,963
said, it's very hard to predict.

533
00:39:31,443 --> 00:39:37,778
But, I think right now is a very
fortunate time, a very exciting

534
00:39:38,048 --> 00:39:41,558
time for tech enthusiasts.

535
00:39:41,988 --> 00:39:47,263
For anybody who is passionate about
AI, about technology, it's very exciting.

536
00:39:47,890 --> 00:39:49,520
So two follow up questions then.

537
00:39:49,620 --> 00:39:54,530
One: like anything else,
there are these fashion waves

538
00:39:54,570 --> 00:39:56,590
that kind of come and go.

539
00:39:57,140 --> 00:40:00,870
And AI is now the latest hottest thing.

540
00:40:00,880 --> 00:40:04,260
So all the VCs, everybody's
throwing money at it.

541
00:40:05,210 --> 00:40:08,700
But at some point people will probably
move on to the next thing, just like

542
00:40:08,700 --> 00:40:12,930
they did with crypto and smartphones and
internet and whatever else before, right?

543
00:40:13,600 --> 00:40:18,930
So I'm wondering, where do you think
we are in that, hype cycle, and what's

544
00:40:18,930 --> 00:40:24,675
going to happen when all of a sudden
slapping AI-first on your startup, no

545
00:40:24,675 --> 00:40:27,225
longer makes sure that you get funding?

546
00:40:27,645 --> 00:40:29,515
So that's question number one, follow up.

547
00:40:30,425 --> 00:40:35,095
And then the second question is, if
you were to plot a graph of how you

548
00:40:35,095 --> 00:40:39,455
expect large language models
to continue developing, I think we

549
00:40:39,455 --> 00:40:42,715
can all agree that there was some
kind of, like, very exponential growth

550
00:40:42,715 --> 00:40:46,995
where somebody figured out, ChatGPT
or one of those massive models.

551
00:40:47,495 --> 00:40:52,775
If you throw enough data at it, and
you massage it for long enough, you

552
00:40:52,775 --> 00:40:56,795
can create this impression of, 'oh,
this is magic, how on earth is that

553
00:40:56,795 --> 00:41:00,515
even happening?' But then, at some
point it has to plateau, right?

554
00:41:00,615 --> 00:41:05,045
It's not possible for it to go, at
that kind of speed, into the sky.

555
00:41:05,650 --> 00:41:06,210
Feeling.

556
00:41:06,910 --> 00:41:09,520
Again, it's hard to predict the sense.

557
00:41:09,950 --> 00:41:15,570
Of course, all the usual disclaimers
about predictions, but what's your take

558
00:41:15,570 --> 00:41:19,350
on what it means about us as humans?

559
00:41:19,820 --> 00:41:26,780
Does it mean that what we cherish
as one of the unique capabilities

560
00:41:26,800 --> 00:41:28,930
of humans, human intelligence,

561
00:41:29,905 --> 00:41:34,755
is not actually all that unique?
Because it's hard to not have this

562
00:41:34,755 --> 00:41:37,785
feeling when you talk to one of
those big large language models

563
00:41:37,785 --> 00:41:42,115
and, during the time it doesn't go
haywire and start behaving weird,

564
00:41:42,145 --> 00:41:44,165
but on the times where it works well.

565
00:41:44,590 --> 00:41:48,690
It's really hard to not have this
impression that you're talking to somebody

566
00:41:48,690 --> 00:41:51,150
with some amount of intelligence.

567
00:41:51,510 --> 00:41:56,700
So does it mean that we're all some
kind of statistical models and the

568
00:41:56,800 --> 00:42:01,740
intelligence that we demonstrate
is also an emergent property?

569
00:42:01,830 --> 00:42:03,080
What's your take on that?

570
00:42:04,110 --> 00:42:07,780
I don't think many people
in the world right now have a

571
00:42:07,800 --> 00:42:09,450
good answer to that question.

572
00:42:09,501 --> 00:42:14,041
That said, I do want to point
out that there are many people

573
00:42:14,041 --> 00:42:17,291
right now who have concerns about AI

574
00:42:17,891 --> 00:42:24,821
because of the potential damage it can
do. So it's all about the objective

575
00:42:24,831 --> 00:42:32,501
function. So if you give a task to the
model in terms of the loss function,

576
00:42:32,521 --> 00:42:38,291
and then you can just try it again and
again, and eventually it will become

577
00:42:38,291 --> 00:42:43,961
very good at whatever objective you
want the model to do. So that is good,

578
00:42:44,001 --> 00:42:48,381
but at the same time, it can be bad.
The AI may not even know it, right?

579
00:42:48,381 --> 00:42:51,591
It's just trying to
accomplish a certain goal.

580
00:42:51,791 --> 00:42:55,951
It just happens that a human being
is standing in the way of that goal.

581
00:42:56,121 --> 00:43:00,098
So in that sense, I do think that
human beings need to be careful.

582
00:43:00,138 --> 00:43:06,688
I think AI needs to be
regulated to some degree.

583
00:43:07,198 --> 00:43:11,158
We cannot let it do whatever it wants.

584
00:43:11,158 --> 00:43:13,568
It may have serious

585
00:43:14,158 --> 00:43:16,878
negative consequences for human beings.

586
00:43:17,345 --> 00:43:22,135
I think that a lot of what you just
described has been the main kind of

587
00:43:22,135 --> 00:43:27,525
concern for everybody making sci-fi
movies, from the Terminator and Skynet.

588
00:43:27,595 --> 00:43:32,515
And I certainly get that, but I
think I'm probably more worried about,

589
00:43:32,565 --> 00:43:37,555
going back to what we said about how you
won't be losing your job to AI, you'll

590
00:43:37,555 --> 00:43:42,595
be losing your job to someone using
an AI. I think this probably applies

591
00:43:42,605 --> 00:43:49,495
here too: as an
enabler, it scales up the amount of

592
00:43:49,495 --> 00:43:57,535
damage that a nefarious party can actually
produce by using it to bad ends.

593
00:43:57,575 --> 00:44:02,465
A lot of the security that we
rely on is practical, right?

594
00:44:02,525 --> 00:44:07,405
Like for example, all the encryption
keys that we use for everything are secure only

595
00:44:07,405 --> 00:44:11,635
because it would be computationally too
expensive to actually figure that out.

596
00:44:11,675 --> 00:44:16,905
But then when you've got tools like
this, it's easy to be scared about the

597
00:44:16,935 --> 00:44:22,150
possibility of it figuring that out and
making things possible that previously

598
00:44:22,450 --> 00:44:28,130
weren't. So I think I'm more worried
about that scenario, where someone uses

599
00:44:28,170 --> 00:44:33,580
the AI to bad ends and it enables them
to do more damage than they would be

600
00:44:33,790 --> 00:44:35,710
able to do with traditional methods.

601
00:44:36,390 --> 00:44:41,200
Even at the current stage, if
AI falls into the wrong hands,

602
00:44:41,220 --> 00:44:43,840
it can do a lot of damage.

603
00:44:43,900 --> 00:44:49,300
Not that catastrophic, but it can do a
lot of damage to a lot of families, right?

604
00:44:49,300 --> 00:44:55,050
I think there were, like, stories about
people using generative AI to create

605
00:44:55,100 --> 00:45:01,280
a fake phone call to their parents and
demand ransom money. So I think it

606
00:45:01,280 --> 00:45:07,590
causes financial damage and also a lot
of emotional distress. Like fake news,

607
00:45:08,625 --> 00:45:12,775
fake video, a lot of deepfake
stuff. So even at this stage I

608
00:45:12,825 --> 00:45:16,765
think it can do a lot of harm
if it falls into the wrong hands.

609
00:45:16,928 --> 00:45:19,298
Yeah, that's a very good
example of the call.

610
00:45:19,308 --> 00:45:24,198
Like, you can technically go and call
people and scam them, and people do that,

611
00:45:24,518 --> 00:45:27,838
but there is a limit to how many people
you can physically call in a day.

612
00:45:28,628 --> 00:45:33,463
If, on the other hand, you have a powerful
enough AI, you can scale it up and

613
00:45:33,463 --> 00:45:37,708
you can probably call everybody in the
United States a certain number of times.

614
00:45:39,003 --> 00:45:39,243
That's

615
00:45:39,368 --> 00:45:45,033
Are you concerned about AI
involvement in the upcoming election?

616
00:45:46,073 --> 00:45:51,743
So we have to be careful, but I think
so far the impact is limited.

617
00:45:52,033 --> 00:45:56,833
But at the same time, I think all
the parties, politicians need to

618
00:45:57,243 --> 00:45:59,613
pay attention to generative AI

619
00:45:59,998 --> 00:46:05,328
because of what it can do:
fake news and so forth.

620
00:46:06,028 --> 00:46:10,458
Imagine you are running a
political campaign, right?

621
00:46:10,478 --> 00:46:17,618
You must get to know analytics, how
AI can influence your campaign either

622
00:46:17,808 --> 00:46:25,558
positively or negatively. If your team
can utilize AI to strengthen

623
00:46:25,558 --> 00:46:30,858
your position legally, you're in a
very good position, it can help you,

624
00:46:30,898 --> 00:46:36,178
but on the other hand, if you're not
careful, your opponents or somebody can

625
00:46:36,178 --> 00:46:43,658
use deepfakes to disrupt your campaign
or your cause. That's why I think AI

626
00:46:43,688 --> 00:46:46,968
is so powerful and also so widespread.

627
00:46:47,288 --> 00:46:54,388
It affects every single industry in the
economy, not just a few isolated sectors.

628
00:46:54,758 --> 00:46:56,738
That's very unique

629
00:46:57,098 --> 00:46:57,838
about AI.

630
00:46:58,543 --> 00:47:04,243
Did you hear about the Elon Musk lawsuit
against OpenAI from a few days ago?

631
00:47:04,943 --> 00:47:08,683
Obviously OpenAI initially started
as an alternative to the big

632
00:47:08,733 --> 00:47:12,813
companies, and the massive labs
like Google, Facebook and so on.

633
00:47:13,503 --> 00:47:18,073
And their pitch and the initial
mission statement was to

634
00:47:18,183 --> 00:47:20,393
release everything open source.

635
00:47:20,473 --> 00:47:22,153
Hence the name OpenAI.

636
00:47:23,178 --> 00:47:28,158
And then somewhere along the way, that
turned and it's currently a for-profit,

637
00:47:28,618 --> 00:47:33,598
closed-source company, worth, what,
under a hundred billion at the moment.

638
00:47:34,128 --> 00:47:37,548
We're recording this on March
the 4th. A few days ago,

639
00:47:37,628 --> 00:47:43,898
Elon Musk opened this lawsuit, where he
alleges that he was basically scammed

640
00:47:43,998 --> 00:47:49,368
because they turned the company around
and they went against the initial mission.

641
00:47:49,428 --> 00:47:53,208
And I think the opinions on the
internet vary from 'okay, this is

642
00:47:53,578 --> 00:47:59,638
jealousy', because he's jealous
of the success that OpenAI has seen,

643
00:48:00,263 --> 00:48:03,313
to 'okay, this is a nice publicity stunt.

644
00:48:03,353 --> 00:48:06,503
He probably has a point, but
this is probably not going

645
00:48:06,503 --> 00:48:07,833
to stand up in court'.

646
00:48:08,483 --> 00:48:13,173
And I'm trying to make sense of how
much of that is actually valid and

647
00:48:13,173 --> 00:48:18,293
how much I should be worried about
OpenAI being at the forefront of

648
00:48:18,293 --> 00:48:20,793
this as a big closed-source company.

649
00:48:21,313 --> 00:48:26,673
I also heard that, many years ago, when
Elon Musk and Sam Altman co-founded

650
00:48:26,693 --> 00:48:33,403
OpenAI, their objective was
a nonprofit organization. Given the

651
00:48:33,428 --> 00:48:41,838
competition from other big players in
the industry, I think OpenAI was under

652
00:48:41,888 --> 00:48:51,388
pressure to commercialize ChatGPT, and this
may go against the original objective. So

653
00:48:51,758 --> 00:48:53,908
I can see the argument from both sides.

654
00:48:53,958 --> 00:48:59,388
On the one hand, we have to be careful,
like we just discussed, about the use of

655
00:49:00,178 --> 00:49:07,618
AI that may lead to the end of humanity
as we know it, if we're not careful.

656
00:49:07,618 --> 00:49:11,018
But at the same time, if
we use that properly,

657
00:49:11,543 --> 00:49:18,613
it can be a great tool. That's why there
is such a great market for generative AI.

658
00:49:18,663 --> 00:49:24,013
So I think there is some tension within
the company, so you have different views.

659
00:49:24,203 --> 00:49:28,783
That's why, I think, a few months
ago, within several days, Altman

660
00:49:29,013 --> 00:49:32,613
was fired and then got hired
back and so on and so forth.

661
00:49:33,083 --> 00:49:37,233
In the background, I think it's really
just those two forces at play, so

662
00:49:37,253 --> 00:49:44,153
one force wants to make sure that AI
does not go out of control and harm human

663
00:49:44,153 --> 00:49:50,923
beings, and at the same time, there is
huge pressure from industry peers to

664
00:49:51,418 --> 00:49:56,638
commercialize those applications to make
profits. Actually, I'm glad that Elon

665
00:49:56,638 --> 00:50:02,918
Musk actually filed the lawsuit, in the
sense that it may, swing the pendulum

666
00:50:02,938 --> 00:50:07,408
to the other side. So eventually,
I think, the view that we should

667
00:50:07,413 --> 00:50:13,348
commercialize and make money out of it, I
think that kind of view prevailed, right?

668
00:50:13,348 --> 00:50:20,738
That's why Sam Altman got hired back,
but that can go too far, because in the

669
00:50:20,758 --> 00:50:29,018
process of competition, making profits,
you may sacrifice security. So I think

670
00:50:29,918 --> 00:50:36,538
the lawsuit by Elon Musk can potentially
put the original mission in check,

671
00:50:36,858 --> 00:50:44,688
so to speak, and maybe force OpenAI
and other tech companies to think

672
00:50:44,708 --> 00:50:49,433
more about guardrails around
AI to make sure it doesn't go out

673
00:50:49,433 --> 00:50:53,153
of control and harm human beings.

674
00:50:53,153 --> 00:50:59,083
Time will tell if anything comes out
of it other than one billionaire

675
00:50:59,123 --> 00:51:01,653
being upset at another, but we'll see.

676
00:51:02,693 --> 00:51:06,033
So I'm going to ask you for one
more prediction, and this time

677
00:51:06,053 --> 00:51:07,543
a little bit more down-to-earth.

678
00:51:08,343 --> 00:51:08,723
PyTorch.

679
00:51:09,593 --> 00:51:14,433
It appears to be still on the rise
and it appears to be the kind of

680
00:51:14,503 --> 00:51:17,443
go-to option for any new papers.

681
00:51:18,013 --> 00:51:21,013
TensorFlow seems to be
stagnating a little bit.

682
00:51:22,093 --> 00:51:25,483
You talked a little bit about
the advantages of PyTorch and

683
00:51:25,483 --> 00:51:26,923
why you chose it for your book.

684
00:51:27,023 --> 00:51:31,133
And I'm wondering, do you see this
being like the prevailing platform?

685
00:51:31,693 --> 00:51:36,433
Because I think that the main kind
of breakthroughs for PyTorch were, you

686
00:51:36,433 --> 00:51:41,463
mentioned the GPU support, obviously,
and also the built-in backpropagation,

687
00:51:41,573 --> 00:51:45,963
right, the autograd. Now, the other
frameworks also provide autograd.

688
00:51:46,443 --> 00:51:50,013
So I guess they're closing the
gap a little bit in that respect. If

689
00:51:50,013 --> 00:51:54,063
you were to venture one more crazy
prediction, would you see PyTorch

690
00:51:54,283 --> 00:51:55,833
leading the way going forward?

691
00:51:56,648 --> 00:51:59,208
Are you going to update your
book in a couple of years to

692
00:51:59,238 --> 00:52:01,508
port it to some other framework?

693
00:52:02,613 --> 00:52:09,743
I think PyTorch is going to
prevail in the near future.

694
00:52:09,743 --> 00:52:13,297
So I mentioned this in my book.

695
00:52:13,297 --> 00:52:20,633
So what PyTorch does is use a dynamic
computational graph, which means it

696
00:52:20,633 --> 00:52:26,503
creates the computational graph on the fly,
so it's faster, it's more flexible.
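
What "on the fly" means can be shown with a small PyTorch sketch. It is an illustration only, not code from the book: the function name `forward`, the tensor shapes, and the rule for choosing the number of steps are made up for the example.

```python
import torch

torch.manual_seed(0)


def forward(x, w):
    # Ordinary Python control flow decides, at run time, how many times
    # the same weight matrix is applied. The graph that autograd records
    # therefore differs from one call to the next.
    steps = 3 if x.sum() > 0 else 1
    h = x
    for _ in range(steps):
        h = torch.tanh(h @ w)
    return h.sum(), steps


w = torch.randn(4, 4, requires_grad=True)

loss, steps = forward(torch.ones(1, 4), w)
loss.backward()  # gradients flow through exactly the ops that ran

_, other_steps = forward(-torch.ones(1, 4), w)
```

Here the first call records three matrix multiplications and the second records one, with no separate graph-definition step in between.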

697
00:52:26,973 --> 00:52:31,593
TensorFlow uses a static
computational graph.

698
00:52:31,643 --> 00:52:33,253
So it's slower.

699
00:52:33,793 --> 00:52:35,993
So that's the main difference.

700
00:52:36,063 --> 00:52:40,243
And it affects the
training speed greatly.

701
00:52:40,533 --> 00:52:46,103
So in TensorFlow, you don't really have
to worry about which device you can use.

702
00:52:46,443 --> 00:52:50,873
It's all done in the backend
automatically by TensorFlow.

703
00:52:51,083 --> 00:52:56,853
But at a cost. If you have an industry-scale
model, and then you have a lot

704
00:52:56,853 --> 00:53:04,503
of GPU and you do a huge calculation
Maybe the overhead is neglectable.

705
00:53:04,523 --> 00:53:11,433
It doesn't affect things much, but for a lot
of researchers it makes a huge difference

706
00:53:11,453 --> 00:53:17,393
because we're already working with a lot
of toy models, not huge ones. So if you

707
00:53:17,393 --> 00:53:23,023
use PyTorch, there is a little bit
of inconvenience in the sense that you

708
00:53:23,023 --> 00:53:31,303
have to specify whether to move this
tensor to the GPU, and then once you are

709
00:53:31,303 --> 00:53:34,323
done with it, you have to get it back.

710
00:53:35,723 --> 00:53:39,613
But the benefit is huge
because it greatly

711
00:53:40,783 --> 00:53:42,963
increases the training speed.

712
00:53:43,353 --> 00:53:50,373
I think at least for small
players, regular readers, and also

713
00:53:50,393 --> 00:53:53,463
for researchers around the world.

714
00:53:53,523 --> 00:53:56,493
I think PyTorch is much more convenient.

715
00:53:56,693 --> 00:53:57,733
It's much faster.

716
00:53:58,193 --> 00:54:03,383
And certain large corporations,
they may not care that much.

717
00:54:04,803 --> 00:54:09,793
For regular people, PyTorch is much
more convenient, it's much faster,

718
00:54:09,993 --> 00:54:12,753
and in the near term it may win out.

719
00:54:13,393 --> 00:54:18,208
For anybody listening to this, I know
that if I hadn't read your book

720
00:54:18,208 --> 00:54:20,038
before, I would probably be on

721
00:54:20,248 --> 00:54:21,528
manning.com, looking at it.

722
00:54:21,588 --> 00:54:26,248
And then at some point I would reach
chapter 4, where you're walking

723
00:54:26,398 --> 00:54:32,568
us through building a network that
does generation of anime faces.

724
00:54:33,908 --> 00:54:36,358
Which I thought was a pretty cool example.

725
00:54:37,663 --> 00:54:42,503
Can you give us a taste, for
anybody who's going to be doing that?

726
00:54:42,603 --> 00:54:45,563
What's the training going to look like?

727
00:54:45,713 --> 00:54:49,463
What data we're going to use, how
we're going to implement a network.

728
00:54:49,463 --> 00:54:54,523
And then in terms of training, what
kind of hardware you need for the

729
00:54:54,523 --> 00:54:59,293
training to be quick, how much
time you need for that.

730
00:54:59,423 --> 00:55:03,083
Give us an idea whether this is something
that someone who is comfortable with

731
00:55:03,083 --> 00:55:08,753
Python can just pick up on a Sunday, on a
random weekend and go through, or whether

732
00:55:08,753 --> 00:55:10,823
there's any extra prep that's needed.

733
00:55:11,813 --> 00:55:18,922
In order to train a GAN model
to produce color images

734
00:55:18,952 --> 00:55:24,367
of anime faces, obviously you
need the training data, right?

735
00:55:24,417 --> 00:55:30,527
The research community has
a lot of human-created data

736
00:55:30,667 --> 00:55:33,467
for us to experiment on.

737
00:55:33,887 --> 00:55:38,367
So you can actually go to a
website, download the anime faces.

738
00:55:38,952 --> 00:55:44,942
I think tens of thousands of
them, and then you need to

739
00:55:44,952 --> 00:55:47,902
create two neural networks.

740
00:55:48,132 --> 00:55:54,212
One is the generator, one is the
discriminator, and the generator is

741
00:55:54,282 --> 00:56:01,867
trying to create an image that can pass
as real in front of the discriminator.

742
00:56:02,357 --> 00:56:07,357
You just train the model for many
rounds, and then eventually you will

743
00:56:07,357 --> 00:56:13,307
see that the generator is able to
generate an anime face, which is very

744
00:56:13,307 --> 00:56:15,857
much like the ones from the training set.
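
The generator-versus-discriminator game described here can be sketched with plain numbers. This is a minimal illustration, not the book's code: it assumes the discriminator outputs a probability that an image is real and that both networks are scored with binary cross-entropy, a common choice for GANs. The function names `bce`, `discriminator_loss`, and `generator_loss` are made up for this sketch.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def discriminator_loss(d_real, d_fake):
    # The discriminator wants real images scored near 1 and fakes near 0.
    return bce(d_real, 1) + bce(d_fake, 0)


def generator_loss(d_fake):
    # The generator wants its fakes to be scored as real (label 1).
    return bce(d_fake, 1)


# Early in training the discriminator easily spots fakes (score 0.1);
# later the fakes pass as real much more often (score 0.9).
early = generator_loss(0.1)
late = generator_loss(0.9)
print(f"generator loss early: {early:.3f}, late: {late:.3f}")
```

The generator's loss shrinks as its images fool the discriminator more often, which is the sense in which the two networks are trained against each other over many rounds.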

745
00:56:16,347 --> 00:56:21,257
I want to mention that in order
to generate color images of

746
00:56:21,337 --> 00:56:25,297
human faces, you do need to
use convolutional neural networks

747
00:56:25,317 --> 00:56:27,197
because, we mentioned this earlier.

748
00:56:27,197 --> 00:56:30,797
So if you use fully connected
dense neural networks,

749
00:56:31,097 --> 00:56:35,797
there are just too many parameters,
and then the training will be too slow.

750
00:56:36,067 --> 00:56:40,057
So on the other hand, if you
use the convolutional neural

751
00:56:40,057 --> 00:56:42,255
networks, you localize the weights.

752
00:56:42,255 --> 00:56:47,077
So the weights will stay the
same in a filter and then you

753
00:56:47,077 --> 00:56:49,487
move the filter around the image.

754
00:56:49,667 --> 00:56:54,117
So that's a way of greatly reducing the
number of parameters in the model and

755
00:56:54,147 --> 00:56:56,927
make the model training much faster.
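
As a rough illustration of the parameter savings described here, the arithmetic below compares one fully connected layer with one convolutional layer. The 64x64 RGB image size, the 64 output units or filters, and the 4x4 kernel are assumptions chosen for the example; they are not taken from the book.

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel.

    The count does not depend on the image size, because the same
    filter weights are reused at every position in the image.
    """
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Assumed example: a 64x64 color image feeding 64 units / 64 filters.
height, width, channels = 64, 64, 3
dense = dense_params(height * width * channels, 64)
conv = conv2d_params(channels, 64, kernel_size=4)

print(f"dense layer: {dense:,} parameters")
print(f"conv layer:  {conv:,} parameters")
```

Under these assumptions the dense layer needs roughly 250 times as many parameters as the convolutional one, and the gap widens as the image resolution grows.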

756
00:56:57,887 --> 00:57:00,997
This is on the software
side, on the training side.

757
00:57:01,357 --> 00:57:12,022
In terms of hardware, I trained it
using a GeForce RTX 2060 GPU.

758
00:57:12,342 --> 00:57:15,192
I think right now the cost is
three or four hundred bucks.

759
00:57:15,212 --> 00:57:19,612
It's not that expensive. You can
easily buy it, or if you have an older

760
00:57:19,652 --> 00:57:23,872
gaming computer, you can just grab
it and then put it in your computer.

761
00:57:24,102 --> 00:57:27,332
It's very easy to do, you don't
really need a lot of knowledge

762
00:57:27,332 --> 00:57:29,352
about computer hardware to do it.

763
00:57:29,432 --> 00:57:33,993
Nowadays, computers are very user friendly.
You can just pop it open and change

764
00:57:34,073 --> 00:57:35,403
parts very fast, that kind of stuff.

765
00:57:35,883 --> 00:57:40,081
So it took me like 30 minutes
to an hour to train the model.

766
00:57:40,331 --> 00:57:41,461
So it's very fast.

767
00:57:42,651 --> 00:57:49,091
However, if you don't really want to
bother with the GPU, you can train

768
00:57:49,091 --> 00:57:55,271
the same model with the CPU, and
what you can do is, you can simply

769
00:57:55,311 --> 00:58:00,486
leave your computer on all night. It
may take five, six, or seven hours,

770
00:58:00,756 --> 00:58:02,946
but it can be easily done overnight.

771
00:58:02,966 --> 00:58:09,556
You just leave the program on, go to
sleep, next morning, you see the result.

772
00:58:09,576 --> 00:58:13,085
So in that sense, computationally,
it's not that costly.

773
00:58:14,295 --> 00:58:22,330
I think the most complicated model would
be chapter six, where you have to convert

774
00:58:22,330 --> 00:58:25,990
a horse image into a zebra image.

775
00:58:26,370 --> 00:58:31,850
It's called CycleGAN. And then you have to
convert blonde hair to black hair

776
00:58:31,880 --> 00:58:34,700
in images or black hair to blonde hair,

777
00:58:35,020 --> 00:58:37,630
Those kinds of models
are a little bit more

778
00:58:38,020 --> 00:58:42,740
time consuming, because you are
using higher resolution, number one.

779
00:58:42,740 --> 00:58:48,920
Number two, you are actually training
two generators and two discriminators.

780
00:58:49,090 --> 00:58:55,580
Okay, so how CycleGAN works is that
you have two generators. Let's use a horse

781
00:58:55,581 --> 00:59:00,080
and a zebra as the example, how to convert
a horse image to a zebra image, right?

782
00:59:00,240 --> 00:59:01,490
So you have two generators.

783
00:59:01,685 --> 00:59:07,925
One generator is called a horse generator,
the other one is called a zebra generator.

784
00:59:08,885 --> 00:59:15,535
So what the horse generator does is
that it takes in a zebra image

785
00:59:15,845 --> 00:59:18,805
and converts it into a horse image.

786
00:59:19,275 --> 00:59:25,565
And then what the zebra generator
does is that it will take a horse

787
00:59:25,570 --> 00:59:27,575
image and convert it into a zebra.

788
00:59:28,385 --> 00:59:30,755
And then you also have two discriminators.

789
00:59:30,885 --> 00:59:38,265
The horse discriminator will tell whether
an image is a horse image or not, and

790
00:59:38,265 --> 00:59:45,185
then the zebra discriminator will tell
if an image is a zebra image or not.

791
00:59:45,365 --> 00:59:51,810
And then CycleGAN has another
element: the loss function has a

792
00:59:51,860 --> 00:59:53,940
component called cycle loss.

793
00:59:54,020 --> 00:59:54,740
So what do you do?

794
00:59:54,870 --> 00:59:57,830
So I think the idea is really ingenious.

795
00:59:57,860 --> 01:00:03,505
That's why I mentioned that with the right
loss function you can achieve anything.

796
01:00:03,605 --> 01:00:06,385
Originally you have a horse image, right?

797
01:00:06,575 --> 01:00:15,501
And then you give that image to a zebra
generator to create a fake zebra image.

798
01:00:16,241 --> 01:00:16,531
Okay.

799
01:00:16,591 --> 01:00:25,551
Now, you will use that fake zebra image
as input to the horse generator, and ask

800
01:00:25,591 --> 01:00:35,181
the horse generator to convert the fake
zebra image into a fake horse image.

801
01:00:35,301 --> 01:00:41,931
Now here is the key: if both
generators do their job right, then

802
01:00:42,891 --> 01:00:49,711
the fake horse image you got will
be identical to the original horse

803
01:00:49,711 --> 01:00:52,141
image. So that's called a CycleGAN.

804
01:00:52,164 --> 01:00:56,444
Cycle loss is trying to
minimize the loss between

805
01:00:58,194 --> 01:01:04,044
the original horse image and the
fake horse image after a round trip.

806
01:01:04,944 --> 01:01:10,484
That's a very powerful tool because
that forces the model, both models,

807
01:01:10,554 --> 01:01:17,399
both the zebra generator and the horse
generator to generate realistic images.
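
The round trip described above can be sketched without any neural networks. This is a toy illustration under stated assumptions: images are flat lists of pixel values, the three "generators" are hypothetical stand-in functions rather than trained models, and the cycle loss is taken as the mean absolute (L1) difference between the original and the round-trip image, which is the form used in the original CycleGAN formulation.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical stand-ins for the two trained generators.
def zebra_generator(horse_image):
    return [p + 0.5 for p in horse_image]


def horse_generator(zebra_image):
    return [p - 0.5 for p in zebra_image]


def sloppy_horse_generator(zebra_image):
    # Does not undo the zebra generator exactly.
    return [p - 0.25 for p in zebra_image]


def cycle_loss(image, forward, backward):
    """Horse -> fake zebra -> fake horse, compared with the original."""
    return l1_loss(image, backward(forward(image)))


real_horse = [0.25, 0.5, 0.75, 1.0]
perfect = cycle_loss(real_horse, zebra_generator, horse_generator)
sloppy = cycle_loss(real_horse, zebra_generator, sloppy_horse_generator)
print(perfect, sloppy)
```

When the two generators undo each other, the round-trip loss is zero; any mismatch shows up as a positive penalty that training pushes down. In a full CycleGAN the same term is also computed for the zebra-to-horse-to-zebra direction and added to the adversarial losses.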

808
01:01:17,849 --> 01:01:22,949
So since your show is called
HockeyStick, I think that's like, when

809
01:01:22,949 --> 01:01:27,289
I was trying to experiment with
different models, I think that is

810
01:01:27,369 --> 01:01:29,479
pretty much like a hockeystick moment.

811
01:01:29,539 --> 01:01:34,929
When I saw that, I was like, this
cycle loss is really

812
01:01:34,969 --> 01:01:43,124
ingenious because that component in
the loss function is crucial for you to

813
01:01:43,124 --> 01:01:48,494
successfully convert a horse image into
a zebra and a zebra image into a horse.

814
01:01:49,034 --> 01:01:54,694
When I saw that I was completely
amazed not just by how well the model

815
01:01:54,694 --> 01:02:01,824
works, but also by the ingenious
mechanism devised by the researchers.

816
01:02:01,834 --> 01:02:05,094
Again, there are tons of smart
people in the profession.

817
01:02:05,354 --> 01:02:11,239
So sometimes I see what they are
doing, and once I understand what they

818
01:02:11,239 --> 01:02:13,709
are doing, I'm completely amazed.

819
01:02:13,709 --> 01:02:18,379
I say, this method is amazing, the
author must be a genius. I think there

820
01:02:18,479 --> 01:02:20,349
are tons of geniuses in our profession.

821
01:02:22,344 --> 01:02:23,264
Love that story.

822
01:02:23,294 --> 01:02:27,614
And also FYI, I'm totally
stealing the quote from you:

823
01:02:27,724 --> 01:02:29,894
"With the right loss function,

824
01:02:30,014 --> 01:02:31,124
you can achieve anything."

825
01:02:31,124 --> 01:02:32,614
I think this should go on a T-shirt.

826
01:02:34,199 --> 01:02:34,839
That's right, yeah.

827
01:02:35,319 --> 01:02:37,459
With the right loss function,
you can achieve anything.

828
01:02:38,529 --> 01:02:43,194
That's my belief. The concept of
the loss function is very powerful.

829
01:02:44,059 --> 01:02:47,639
So loss function is another way of
saying the objective function, right?

830
01:02:47,639 --> 01:02:51,819
You are telling the model what to
achieve, what to do. It's very powerful.

831
01:02:52,284 --> 01:02:59,484
Yeah, I think what keeps striking me is
that once you go and look into these ideas,

832
01:02:59,744 --> 01:03:04,284
they're not actually that complicated,
there's not too much magic in it, but

833
01:03:04,324 --> 01:03:09,224
to come up with that idea initially,
be the first one to propose that it

834
01:03:09,224 --> 01:03:11,974
does require a certain level of genius.

835
01:03:12,544 --> 01:03:16,664
So I think, probably decades from
now, kids will be learning a lot

836
01:03:16,664 --> 01:03:20,274
of that stuff in primary school
or early in their education.

837
01:03:20,864 --> 01:03:26,074
And it just feels like we're really
experiencing some kind of breakthrough

838
01:03:26,114 --> 01:03:28,934
in this profession, a hockeystick moment.

839
01:03:30,387 --> 01:03:31,037
Absolutely.

840
01:03:31,037 --> 01:03:34,667
It's good that a lot of smart
researchers are working in the field.

841
01:03:34,997 --> 01:03:39,177
And sometimes when you get
stuck on a question, you may

842
01:03:39,297 --> 01:03:40,997
work on it for years, right?

843
01:03:40,997 --> 01:03:46,207
Without any breakthrough, and then
suddenly, last year, like a strong line,

844
01:03:46,627 --> 01:03:51,687
year after year, suddenly, there is an
aha moment, and then you figure out the

845
01:03:51,707 --> 01:03:54,667
way to tackle the problem and it worked.

846
01:03:54,957 --> 01:03:59,712
And then that method may
become revolutionary; it may

847
01:03:59,972 --> 01:04:01,582
completely change the field.

848
01:04:03,102 --> 01:04:05,372
You're about to finish your book.

849
01:04:05,912 --> 01:04:08,672
Is there anything that you
would do differently if you

850
01:04:08,672 --> 01:04:10,702
were starting to write it today?

851
01:04:11,722 --> 01:04:13,632
Would you make any different choices?

852
01:04:15,096 --> 01:04:16,236
Good question.

853
01:04:16,336 --> 01:04:20,156
I don't think there are
many things I would change.

854
01:04:20,476 --> 01:04:24,636
The reason is because even though
it's a new book, actually I have been

855
01:04:24,636 --> 01:04:34,486
working on it for a couple of years now.
I had a GitHub repository before

856
01:04:34,546 --> 01:04:41,536
I submitted a proposal to Manning, so
it's my way of working things out.

857
01:04:41,566 --> 01:04:46,766
A couple of years ago I started to
use PyTorch for machine learning

858
01:04:46,766 --> 01:04:49,021
models and I started to get into

859
01:04:50,151 --> 01:04:59,301
generative AI, and then I started to
use PyTorch to generate shapes, images,

860
01:04:59,521 --> 01:05:04,641
and then eventually I got into natural
language processing, large language

861
01:05:04,641 --> 01:05:07,571
models, and then I had a lot of projects

862
01:05:08,411 --> 01:05:16,161
on my computer. Writing a book is my
way of organizing things, of thinking things

863
01:05:16,161 --> 01:05:20,331
through to make sure everything works out.

864
01:05:21,051 --> 01:05:25,931
But I knew that, in order to
write a compelling proposal,

865
01:05:26,951 --> 01:05:30,521
I needed to first prepare well, right?

866
01:05:30,521 --> 01:05:35,346
Especially, there are not too many
good publishers out there, so you only

867
01:05:35,346 --> 01:05:38,986
have one shot with a good publisher.

868
01:05:38,986 --> 01:05:42,786
Manning is one of
the great publishers.

869
01:05:42,846 --> 01:05:48,246
Over the years I've read many books from
Manning, and I really enjoyed their books,

870
01:05:48,856 --> 01:05:57,316
and I knew that I needed to write a
good proposal in order to work it out.

871
01:05:57,406 --> 01:05:59,026
I didn't want to lose the chance.

872
01:05:59,606 --> 01:06:05,986
So what I did was, in the
summer, I spent several months to

873
01:06:06,026 --> 01:06:10,556
create a huge GitHub repository.

874
01:06:11,006 --> 01:06:17,156
So I laid out all the chapters
initially, like the first draft, and

875
01:06:17,156 --> 01:06:24,636
it had 17 chapters, and for each chapter
I used a Jupyter notebook to explain

876
01:06:24,726 --> 01:06:27,666
everything to the best of my ability.

877
01:06:27,866 --> 01:06:29,236
All the code is there.

878
01:06:29,446 --> 01:06:31,596
So it's pretty much like a book.

879
01:06:33,056 --> 01:06:40,206
Once I had that, I spent
another month to convert it

880
01:06:40,206 --> 01:06:43,326
into an actual book, a PDF file.

881
01:06:43,686 --> 01:06:45,846
A lot of tech people use LaTeX.

882
01:06:45,856 --> 01:06:49,896
LaTeX is a word processing
software, right?

883
01:06:49,946 --> 01:06:54,846
Especially if you have a lot of math, you
can actually generate beautiful

884
01:06:54,946 --> 01:06:59,266
equations. My book has some
equations, some math, but not a whole lot.

885
01:06:59,736 --> 01:07:06,261
But it forced me to go through
everything one more time, in

886
01:07:06,291 --> 01:07:13,181
the process of converting the
GitHub repository into a PDF file.

887
01:07:14,041 --> 01:07:16,471
I spent a lot of months
converting everything.

888
01:07:17,061 --> 01:07:22,211
And also it looks beautiful
because it's exactly like a book.

889
01:07:22,421 --> 01:07:27,921
You have a template, you have a cover,
you have a table of contents, you have each

890
01:07:27,921 --> 01:07:33,271
chapter, the section numbers, the
section titles, the subsections, and so

891
01:07:33,271 --> 01:07:37,931
forth. You have images. In short, it's
pretty much like a book to be published.

892
01:07:37,951 --> 01:07:44,556
And then I sent that to Manning in
the summer, along with the PDF file,

893
01:07:44,596 --> 01:07:51,196
along with the proposal file, and
then I had a link to the GitHub page.

894
01:07:51,646 --> 01:07:57,566
And then what Manning did was send
the book proposal to more than

895
01:07:57,566 --> 01:07:59,716
10 reviewers in the profession.

896
01:07:59,726 --> 01:08:07,196
The reviewers are all data scientists,
people who know AI in the profession,

897
01:08:07,486 --> 01:08:12,926
and they gave comments on whether this
book should be published. And then they

898
01:08:12,936 --> 01:08:15,241
gave a lot of very valuable feedback.

899
01:08:16,056 --> 01:08:21,466
The feedback was very positive, partly
because it's a hot topic, partly because

900
01:08:21,466 --> 01:08:23,966
I spent a lot of time preparing it, right?

901
01:08:24,296 --> 01:08:27,226
But I did receive a lot of good feedback.

902
01:08:27,256 --> 01:08:32,676
To answer your question: because I
have been through several rounds

903
01:08:32,936 --> 01:08:38,636
now, there's not much I would change,
because I have already incorporated

904
01:08:38,836 --> 01:08:45,256
some great feedback from
about 12 reviewers on the proposal.

905
01:08:45,706 --> 01:08:46,006
Fair enough.

906
01:08:46,454 --> 01:08:48,454
How many copies have you sold so far?

907
01:08:49,596 --> 01:08:51,946
It's already sold more
than a thousand copies.

908
01:08:52,046 --> 01:08:56,556
I think its daily high was 58.

909
01:08:56,806 --> 01:09:05,526
So it says a lot about the demand for
generative AI. And if you look at

910
01:09:05,576 --> 01:09:13,276
the top 10 from the Manning website every
week, you will see generative AI is hot.

911
01:09:13,876 --> 01:09:14,956
A lot of demand.

912
01:09:14,956 --> 01:09:18,726
And another trend is Python, PyTorch.

913
01:09:18,796 --> 01:09:24,066
I think a lot of people
are switching to PyTorch, and I

914
01:09:24,066 --> 01:09:27,316
think there is a book from Manning
called "Deep Learning with PyTorch".

915
01:09:27,316 --> 01:09:29,246
It's selling very well.

916
01:09:29,526 --> 01:09:33,476
And then there's another book called
"Large Language Models from Scratch".

917
01:09:33,496 --> 01:09:37,616
Actually, that book is also
using PyTorch just as I do.

918
01:09:37,776 --> 01:09:41,261
But that one just focuses
on large language models, whereas

919
01:09:41,261 --> 01:09:46,591
my book focuses on many different
kinds of content, like large language models,

920
01:09:47,031 --> 01:09:54,341
music, images, shapes, numbers.
And then another thing I want to

921
01:09:54,391 --> 01:10:00,771
mention is that I did spend a lot
of time thinking about how to help

922
01:10:01,576 --> 01:10:05,846
readers learn progressively, step by step.

923
01:10:06,396 --> 01:10:09,966
Chapter one, of course, is an
overview of the book, of the

924
01:10:10,106 --> 01:10:14,236
generative AI landscape, and what
the book is trying to accomplish.

925
01:10:14,726 --> 01:10:18,406
Chapter two is deep
learning with PyTorch.

926
01:10:18,616 --> 01:10:20,276
So even if readers

927
01:10:21,261 --> 01:10:24,441
have no background using PyTorch,

928
01:10:24,821 --> 01:10:29,011
after reading chapter two, they
will be able to use PyTorch to

929
01:10:29,011 --> 01:10:31,021
create deep learning models.

930
01:10:31,301 --> 01:10:35,306
From A to Z, you
can do the whole thing.

931
01:10:35,396 --> 01:10:35,816
Okay?

932
01:10:36,146 --> 01:10:37,436
So that's very important.

933
01:10:37,586 --> 01:10:41,136
And then chapter three, we get into GANs.

934
01:10:41,316 --> 01:10:45,186
So you will use GANs to
generate numbers and shapes.

935
01:10:45,426 --> 01:10:47,696
So the models are very simple.

936
01:10:47,696 --> 01:10:51,626
You only have two or three
layers of neurons in those models.

937
01:10:51,636 --> 01:10:53,986
So therefore, it's very
easy to understand.

938
01:10:54,246 --> 01:10:59,326
It's easy to create, and the
training takes a matter of minutes.

939
01:10:59,656 --> 01:11:03,786
Readers will not get frustrated
because everything is so simple.

940
01:11:04,186 --> 01:11:05,906
And then in chapter four,

941
01:11:07,086 --> 01:11:08,266
I kick things up a notch.

942
01:11:09,546 --> 01:11:17,476
So instead of using fully connected
dense layers, I use convolutional layers

943
01:11:17,886 --> 01:11:20,766
that are needed for image processing.

944
01:11:20,836 --> 01:11:26,196
If you want to create
high-resolution color images, fully

945
01:11:26,236 --> 01:11:29,756
connected dense layers won't work.
They may work, but it's very slow.

946
01:11:30,196 --> 01:11:33,336
On the other hand, if you use
convolutional layers, it's much faster

947
01:11:33,376 --> 01:11:39,767
because you use filters that move around
the image, and then you just train

948
01:11:39,797 --> 01:11:42,578
the weights in the filter itself.

949
01:11:42,578 --> 01:11:46,948
So that's much more efficient,
that kind of stuff. And then

950
01:11:46,988 --> 01:11:52,848
people learn to use convolutional
layers in chapter four to generate

951
01:11:52,868 --> 01:11:59,128
color images. And then in chapter
five I kick things up another level.

952
01:11:59,758 --> 01:12:06,468
So readers learn to select
characteristics in images. You can

953
01:12:06,468 --> 01:12:12,348
choose to generate an image with
eyeglasses or without eyeglasses. You

954
01:12:12,348 --> 01:12:17,028
can transition from an image with
glasses to an image without glasses.

955
01:12:17,558 --> 01:12:22,178
So all that arithmetic kind of
stuff. And then chapter six is not out

956
01:12:22,178 --> 01:12:27,498
yet, but it will cover CycleGAN, which is
computationally costly, for the

957
01:12:27,498 --> 01:12:31,948
reason I just mentioned: it
has two generators, two discriminators.

958
01:12:32,638 --> 01:12:38,208
And then chapter seven is about
variational autoencoders.

959
01:12:38,208 --> 01:12:40,208
That's a different model from GANs.

960
01:12:40,438 --> 01:12:45,328
That is important because it has
an encoder-decoder architecture.

961
01:12:45,548 --> 01:12:47,248
We see it's very common

962
01:12:47,493 --> 01:12:54,093
in machine learning models. For example,
ChatGPT is a decoder-only model, and the

963
01:12:54,133 --> 01:13:00,823
original transformer paper, "Attention Is
All You Need", has an encoder part

964
01:13:00,913 --> 01:13:02,693
and a decoder part, that kind of stuff.
965
01:13:02,963 --> 01:13:08,283
And then after that, I get into
transformers, natural language processing,

966
01:13:08,303 --> 01:13:15,713
how to do tokenization, how to create
a transformer from scratch, including

967
01:13:15,723 --> 01:13:21,543
a ChatGPT-style one. You can create
a GPT from scratch, you can train it.

968
01:13:21,603 --> 01:13:25,523
I saw that you have several
posts on LinkedIn about how to

969
01:13:25,523 --> 01:13:27,293
create a GPT from scratch, right?

970
01:13:27,343 --> 01:13:33,523
My book does exactly that in chapter
10, how to create a GPT from scratch.

971
01:13:33,563 --> 01:13:39,653
And then chapter 11 is how to create a
small GPT from scratch and then train it

972
01:13:39,838 --> 01:13:40,858
to generate text.

973
01:13:41,078 --> 01:13:46,218
Its focus is not mainly on creating,
but on training a GPT from scratch.

974
01:13:46,248 --> 01:13:47,908
Of course, it's much smaller.

975
01:13:47,908 --> 01:13:50,278
It only has 5 million parameters.

976
01:13:50,618 --> 01:13:53,458
But you learn how to train
a model from scratch.

977
01:13:53,878 --> 01:14:00,088
And after that it's music generation, and
then different models, and then how you

978
01:14:00,118 --> 01:14:06,268
can use LangChain to chain together
different large language models.

979
01:14:06,278 --> 01:14:08,368
So that's the whole book.

980
01:14:08,398 --> 01:14:11,648
It's been a real pleasure to talk to you.

981
01:14:11,698 --> 01:14:14,968
I'm personally super excited,
can't wait until the rest of

982
01:14:14,968 --> 01:14:16,438
the chapters become available.

983
01:14:16,438 --> 01:14:21,878
So, you know, hurry up.
Before I let you go,

984
01:14:22,378 --> 01:14:28,533
I'm curious whether you have your next
idea for your next book already in

985
01:14:28,533 --> 01:14:33,063
mind or whether you're going to take
a small break before book number four.

986
01:14:33,768 --> 01:14:38,538
So far I'm very busy with
writing the current book.

987
01:14:39,048 --> 01:14:41,768
I do get ideas from time to time.

988
01:14:42,218 --> 01:14:48,573
One example is, I think this
text-to-image, like a

989
01:14:48,573 --> 01:14:51,573
multimodal model thing, is amazing.

990
01:14:51,833 --> 01:14:56,143
I think there could be another
book there, just focused purely

991
01:14:56,183 --> 01:15:03,303
on diffusion models and multimodal
transformers, how to convert text

992
01:15:03,303 --> 01:15:06,323
to image, or convert text to video.

993
01:15:06,373 --> 01:15:07,763
There could be a book there.

994
01:15:07,813 --> 01:15:12,363
I thought about it, but I didn't spend
a lot of time on it because I'm busy

995
01:15:12,673 --> 01:15:18,823
writing the current book. And the other
idea I thought about, so this is

996
01:15:18,823 --> 01:15:21,783
also related to multimodal models.

997
01:15:21,843 --> 01:15:24,223
My first book is called
"Make Python Talk", right?

998
01:15:24,393 --> 01:15:30,843
But it's actually using Google
API to do the actual speech

999
01:15:30,843 --> 01:15:32,453
recognition, text to speech.

1000
01:15:32,843 --> 01:15:35,533
I don't do any machine learning part.

1001
01:15:35,783 --> 01:15:40,813
So I just use the Google API to do
all the heavy lifting. But there are

1002
01:15:41,093 --> 01:15:43,103
open source models out there.

1003
01:15:43,103 --> 01:15:45,203
You can actually train a model

1004
01:15:45,648 --> 01:15:50,988
to do speech recognition. So that's
actually a multimodal model, right?

1005
01:15:50,988 --> 01:15:56,498
Because in speech recognition, basically the
input is audio, output is text, right?

1006
01:15:56,808 --> 01:16:00,068
And then you can also do text to speech.

1007
01:16:00,138 --> 01:16:03,298
That can be another interesting project.

1008
01:16:03,308 --> 01:16:06,658
I have some ideas on how they
work, but I do have to spend

1009
01:16:06,758 --> 01:16:10,208
a lot of time to experiment.

1010
01:16:10,568 --> 01:16:15,998
So I would say in another two or
three years, I may venture into

1011
01:16:15,998 --> 01:16:19,008
one of those ideas and maybe
write another book about it.

1012
01:16:19,868 --> 01:16:20,308
Awesome.

1013
01:16:20,328 --> 01:16:23,418
You're going to have one reader
already interested in that.

1014
01:16:23,428 --> 01:16:24,918
So definitely go for it.

1015
01:16:25,440 --> 01:16:31,760
Okay, let me ask you then, which idea do
you like better, the speech recognition

1016
01:16:31,770 --> 01:16:38,150
model, or just a book about text
to image, multimodal transformers?

1017
01:16:38,590 --> 01:16:39,690
Which idea do you like better?

1018
01:16:40,096 --> 01:16:45,046
I've been meaning to properly
read the Whisper paper.

1019
01:16:45,506 --> 01:16:50,946
So I think speech recognition
is actually a pretty good use

1020
01:16:51,116 --> 01:16:54,306
case, and I would definitely
be interested in reading that.

1021
01:16:55,033 --> 01:16:55,583
Good to know.

1022
01:16:55,603 --> 01:16:57,683
I may put more emphasis on that project.

1023
01:16:58,308 --> 01:16:58,768
Awesome.

1024
01:16:58,978 --> 01:16:59,528
Thanks for the feedback.

1025
01:17:00,348 --> 01:17:00,708
All right.

1026
01:17:00,818 --> 01:17:01,678
Thank you so much.

1027
01:17:01,728 --> 01:17:05,078
It's been a pleasure, and hopefully I'll
get you next time with your next book.

1028
01:17:05,558 --> 01:17:06,208
Thanks a lot.

1029
01:17:07,368 --> 01:17:07,978
Thank you.