1 00:00:00,240 --> 00:00:04,260 I'm Miko Pawlikowski and this is HockeyStick. 2 00:00:04,475 --> 00:00:08,940 Generative AI is on everyone's mind. 3 00:00:09,230 --> 00:00:13,430 From essays to photorealistic pictures to high quality videos it 4 00:00:13,430 --> 00:00:17,489 has changed the way we think about creativity and intelligence forever. 5 00:00:17,700 --> 00:00:22,009 If the AI won't steal your job, but somebody using AI will, then 6 00:00:22,009 --> 00:00:25,040 the best defense is to learn how this technology works ASAP. 7 00:00:25,470 --> 00:00:29,800 Today, I'm bringing you Mark Liu, the author of Learn Generative AI with 8 00:00:29,820 --> 00:00:33,860 PyTorch, a tenured finance professor and the founding director of the Master 9 00:00:33,860 --> 00:00:38,250 of Science in Finance program at the University of Kentucky and a veteran 10 00:00:38,279 --> 00:00:40,470 coder with over 20 years of experience. 11 00:00:40,699 --> 00:00:44,490 In this conversation, we'll talk about learning through doing, how everybody can 12 00:00:44,499 --> 00:00:48,560 build generative AI models, the various breakthroughs that allowed for the current 13 00:00:48,589 --> 00:00:52,789 AI explosion to take place, and make some wild predictions about the future. 14 00:00:53,570 --> 00:00:55,859 Welcome to this episode and please enjoy. 15 00:00:56,090 --> 00:00:56,840 How are you doing today? 16 00:00:58,170 --> 00:00:58,860 Pretty good. 17 00:00:59,019 --> 00:00:59,840 Thank you Miko. 18 00:01:00,010 --> 00:01:00,970 glad to be here. 19 00:01:01,229 --> 00:01:02,630 Yeah, I'm very excited. 20 00:01:02,700 --> 00:01:05,869 not only because I'm hoping to learn so many interesting things from 21 00:01:05,869 --> 00:01:10,540 your book, but also because I'm very curious, how does somebody who's a 22 00:01:10,540 --> 00:01:14,210 founding director of a master of science in finance and a tenured professor 23 00:01:14,210 --> 00:01:17,160 in finance, decide to go into AI. 
24 00:01:17,700 --> 00:01:19,070 Tell us a little bit about your story. 25 00:01:19,237 --> 00:01:26,787 it goes back to, like five years ago, in 2017, our department wanted to launch 26 00:01:27,227 --> 00:01:29,507 a Master of Science in Finance program. 27 00:01:30,227 --> 00:01:34,247 And it is that point, I've been tenured for about five years. 28 00:01:34,982 --> 00:01:39,352 I was always, very adventurous, trying to do new things. 29 00:01:39,762 --> 00:01:45,832 I was appointed the founding director to start an academic 30 00:01:45,962 --> 00:01:47,662 graduate program from scratch. 31 00:01:48,722 --> 00:01:51,552 And, I was very much into it. 32 00:01:52,022 --> 00:01:53,422 it was a lot of work. 33 00:01:53,877 --> 00:01:55,727 But I thoroughly enjoyed it. 34 00:01:55,727 --> 00:01:59,907 So our program launched in fall of 2017. 35 00:02:00,447 --> 00:02:01,707 And it's a one year program. 36 00:02:02,897 --> 00:02:09,337 at the end of 2017, We started to, place our students. 37 00:02:09,507 --> 00:02:16,567 the very first year we had 30 students in the program, which is a great number. 38 00:02:17,267 --> 00:02:22,757 And, I talked to many employers, many companies, trying to 39 00:02:22,797 --> 00:02:25,157 place our MS Finance students. 40 00:02:25,597 --> 00:02:28,937 I heard the same thing again and again. 41 00:02:29,587 --> 00:02:35,162 they told me that they want somebody who not only knows finance, but also 42 00:02:35,162 --> 00:02:41,422 knows coding programming analytics and the number one programming language 43 00:02:41,652 --> 00:02:48,942 in finance is Python and I've been doing programming for many years 44 00:02:48,992 --> 00:02:53,782 So those are mainly, statistical, software to run regression 45 00:02:53,822 --> 00:02:55,302 for the finance research. 46 00:02:56,032 --> 00:03:04,062 And then I had to learn Python from scratch in order to teach my students. 
47 00:03:04,112 --> 00:03:10,947 And it turns out that Python is a very user-friendly programming language, 48 00:03:11,227 --> 00:03:20,937 so even if you never programmed before, you can guess what a block 49 00:03:20,937 --> 00:03:23,487 of code is trying to accomplish. 50 00:03:24,117 --> 00:03:33,067 I started to run Python workshops to MS finance students and gradually I 51 00:03:33,377 --> 00:03:43,047 accumulated a lot of teaching notes and I also had to convince my students to 52 00:03:43,047 --> 00:03:47,697 use Python, because some of the students said that, "I can do everything in 53 00:03:47,697 --> 00:03:49,927 Excel, why should I learn Python", right? 54 00:03:50,297 --> 00:03:54,917 And then I told them that, Excel is not exactly a programming language, 55 00:03:55,337 --> 00:04:00,697 and you do need a programming language in order to automate things 56 00:04:01,037 --> 00:04:05,557 to make sense, more convenient, the bigger programs, that kind of stuff. 57 00:04:05,857 --> 00:04:13,527 So what I did was I started to create fun projects in finance, like speech 58 00:04:13,577 --> 00:04:15,767 recognition and text to speech. 59 00:04:16,097 --> 00:04:23,017 So one example would be I add those features to a finance calculator. 60 00:04:23,127 --> 00:04:28,727 what you can do is that you can actually speak to a computer, and ask the 61 00:04:28,727 --> 00:04:31,777 computer to do a finance calculation. 62 00:04:31,957 --> 00:04:38,187 you can tell the program in a human voice "what is the present value of 63 00:04:38,187 --> 00:04:44,317 $1000 in five years", And then the program will do the calculation and 64 00:04:44,377 --> 00:04:47,697 tell you the answer in a human voice. 65 00:04:48,177 --> 00:04:50,887 and then that caught a student's attention. 66 00:04:51,177 --> 00:04:54,257 So I started to do those kind of applications. 67 00:04:54,647 --> 00:04:58,657 And then after a year or so, I had plenty of projects. 
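The present-value question in the spoken example comes down to a single formula, PV = FV / (1 + r)^n. Here is a minimal sketch of that calculation only, not the speech-enabled calculator from the book. The function name `present_value` and the 5% discount rate are assumptions for illustration; the conversation does not state a rate.

```python
def present_value(future_value, annual_rate, years):
    """Discount a single future cash flow back to today."""
    return future_value / (1 + annual_rate) ** years


# Assumed inputs: $1000 received in five years, discounted at 5% per year.
pv = present_value(1000, 0.05, 5)
print(f"Present value: ${pv:.2f}")
```

A voice front end would only need to turn the spoken amount, rate, and horizon into these three arguments.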
68 00:04:59,027 --> 00:05:03,327 And then some students told me "you should write a book about it". 69 00:05:03,587 --> 00:05:10,232 So I started to send the manuscript to No Starch Press to publish the book. 70 00:05:10,602 --> 00:05:16,122 The moment my colleagues, or my students, or a lot of my friends, even my family 71 00:05:16,122 --> 00:05:21,792 members, heard that I'm writing a programming book in Python about 72 00:05:21,822 --> 00:05:27,552 speech recognition and text to speech, their first reaction was, "I 73 00:05:27,582 --> 00:05:29,242 thought you were a finance professor". 74 00:05:29,572 --> 00:05:31,842 That question came up again and again. 75 00:05:32,172 --> 00:05:40,892 And then I gave them a famous quote by a chief risk officer from Deutsche Bank: 76 00:05:41,382 --> 00:05:45,472 "banks are essentially technology firms now". 77 00:05:46,402 --> 00:05:53,602 So there is a lot of truth in that because in order to be in the field of finance, 78 00:05:53,622 --> 00:05:59,972 you need to know a lot of technology, know programming, know analytics and so forth. 79 00:06:00,712 --> 00:06:03,432 So that was my first book. 80 00:06:03,852 --> 00:06:06,292 In 2020, it's finally published in 2021. 81 00:06:07,102 --> 00:06:11,552 So I think I signed a contract with them in 2019. 82 00:06:12,172 --> 00:06:14,912 And then after that, I 83 00:06:14,912 --> 00:06:18,652 started to teach a course in the MS finance program. 84 00:06:19,242 --> 00:06:22,322 It's called Python Predictive Analytics. 85 00:06:22,622 --> 00:06:25,602 So we use Python to build machine learning models 86 00:06:26,092 --> 00:06:33,132 for business analytics, and I started to teach students a lot of machine learning 87 00:06:33,162 --> 00:06:36,452 models, including deep neural networks. 88 00:06:37,112 --> 00:06:41,632 And then, again, I accumulated a lot of notes. 
89 00:06:42,482 --> 00:06:43,522 And then, 90 00:06:43,582 --> 00:06:51,232 I came across a video from DeepMind, showing how you can actually play 91 00:06:51,582 --> 00:06:59,322 Atari games like, Breakout, by training a computer program to play 92 00:06:59,322 --> 00:07:02,602 the game, at a superhuman level. 93 00:07:02,972 --> 00:07:08,612 So what happened was, not only the computer program learned,, To play 94 00:07:08,632 --> 00:07:15,682 the game, it actually figured out a way to score very efficiently, a 95 00:07:15,682 --> 00:07:18,672 way human beings didn't know before. 96 00:07:18,822 --> 00:07:23,652 So you, dig a tunnel at the side of the wall, and then you send 97 00:07:23,692 --> 00:07:27,982 the ball to the back of the wall to score it very efficiently. 98 00:07:28,362 --> 00:07:31,842 When I saw that video, I was completely amazed. 99 00:07:32,392 --> 00:07:35,522 I told myself, "I gotta figure out how this works". 100 00:07:35,962 --> 00:07:42,212 I spent several months experimented with different kind of programs, 101 00:07:42,212 --> 00:07:44,202 trying to figure out how it works. 102 00:07:45,062 --> 00:07:47,852 And eventually I figured it out. 103 00:07:48,722 --> 00:07:51,492 And that became my second book. 104 00:07:51,812 --> 00:07:54,182 it's machine learning animated. 105 00:07:54,582 --> 00:07:59,072 So it's published with CRC Press, last year. 106 00:08:00,552 --> 00:08:09,407 And then, recently, once, ChatGPT was out, generative AI was very popular. 107 00:08:09,507 --> 00:08:11,397 I was very curious. 108 00:08:12,162 --> 00:08:17,542 I was trying to figure out how exactly a large language model 109 00:08:17,552 --> 00:08:25,552 works, and how a computer program can understand the human language. 110 00:08:26,072 --> 00:08:29,022 I spend a lot of time trying to figure it out. 111 00:08:29,492 --> 00:08:31,742 Before I was actually using TensorFlow. 
112 00:08:31,762 --> 00:08:37,502 It worked pretty well for me with Atari games and so on and so forth. 113 00:08:37,892 --> 00:08:42,492 Apparently it's not great in terms of GPU training. 114 00:08:42,852 --> 00:08:47,052 You can do GPU training, but there is an overhead. 115 00:08:47,102 --> 00:08:51,322 So you have to program everything in CPU and then send it to the GPU, 116 00:08:52,367 --> 00:08:54,517 do the calculation and then send it back. 117 00:08:55,077 --> 00:08:56,687 The overhead is just too much. 118 00:08:56,697 --> 00:08:59,967 So it ended up not very fast. 119 00:09:00,427 --> 00:09:06,277 Then I learned another AI framework called PyTorch. 120 00:09:06,387 --> 00:09:17,547 You can explicitly send a tensor to the GPU to do the calculation and so on and so forth. 121 00:09:17,547 --> 00:09:23,387 It's a little more complicated than TensorFlow because you do have to send 122 00:09:23,397 --> 00:09:28,097 something to the GPU and then get it back. 123 00:09:28,317 --> 00:09:32,787 So in terms of coding, you have to do slightly more work, but in 124 00:09:32,787 --> 00:09:35,417 terms of performance, it's amazing. 125 00:09:35,797 --> 00:09:39,827 So I get to train models 126 00:09:40,962 --> 00:09:46,322 7 to 10 times faster compared to CPU training. 127 00:09:46,822 --> 00:09:51,502 As for all those large language models, they have billions or hundreds 128 00:09:51,502 --> 00:09:53,762 of billions of parameters, right? 129 00:09:54,122 --> 00:09:56,432 So the speed is crucial. 130 00:09:57,042 --> 00:10:01,302 Right now, I'm training a model with millions of parameters, 131 00:10:01,382 --> 00:10:02,242 which is fine. 132 00:10:02,462 --> 00:10:08,012 So for even larger kinds of language models, in my third book, which 133 00:10:08,012 --> 00:10:10,387 is with Manning Publications. 134 00:10:10,827 --> 00:10:16,097 So in this book, I'm doing generative AI with PyTorch. 
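The "send a tensor to the GPU and get it back" workflow described above can be sketched in a few lines of PyTorch. This is an illustrative sketch under assumptions, not code from the book: the tensor shapes and variable names are made up, and the code falls back to the CPU when no CUDA device is present.

```python
import torch

# Use the GPU when one is available; otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Tensors are created on the CPU by default.
x = torch.randn(4, 3)
w = torch.randn(3, 2)

# Explicitly move the tensors to the chosen device.
x_dev = x.to(device)
w_dev = w.to(device)

# The matrix multiplication runs on whichever device holds the tensors.
y_dev = x_dev @ w_dev

# Bring the result back to the CPU, for example to print or plot it.
y = y_dev.cpu()
print(y.shape, y.device)
```

Both operands of an operation have to live on the same device, which is why every tensor (and, in a real training loop, the model) is moved explicitly.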
135 00:10:16,417 --> 00:10:21,577 the reason I switched to PyTorch is because of dynamic, computing, 136 00:10:21,587 --> 00:10:25,097 graph, and then, the GPU training. 137 00:10:25,177 --> 00:10:28,247 I can train most models in a matter of minutes. 138 00:10:28,687 --> 00:10:32,047 sometimes I get a larger ones, maybe a couple of hours. 139 00:10:32,427 --> 00:10:33,117 That's it. 140 00:10:33,147 --> 00:10:37,507 I can see the model in action and then I can tune the model 141 00:10:37,687 --> 00:10:39,227 so that's the third book. 142 00:10:39,237 --> 00:10:44,907 So let me, conclude by quickly summarizing what I'm doing in the third book. 143 00:10:45,417 --> 00:10:48,337 the name, I think you just mentioned at the beginning. 144 00:10:48,727 --> 00:10:51,897 Learn Generative AI with PyTorch. 145 00:10:52,417 --> 00:11:00,387 Readers learn to create generative AI models from scratch, to create the 146 00:11:00,387 --> 00:11:08,797 different contents like, images, shapes, numbers, text, music, sound, so forth, 147 00:11:09,197 --> 00:11:13,237 all with PyTorch and deep learning models. 148 00:11:13,697 --> 00:11:15,187 And in particular, 149 00:11:15,747 --> 00:11:18,147 readers learn how to create. 150 00:11:18,877 --> 00:11:27,397 a ChatGPT-style transformer from scratch, and then in particular, I teach 151 00:11:27,567 --> 00:11:36,867 readers how to create a GPT-2 XL with 1.5B parameters Of course, with 1. 152 00:11:37,027 --> 00:11:39,667 5 billion parameters, it's very hard to train, right? 153 00:11:39,667 --> 00:11:40,647 It's very slow, number one. 154 00:11:40,647 --> 00:11:47,927 Number two, GPT-2 was trained with huge amounts of data, and regular readers don't 155 00:11:47,987 --> 00:11:50,917 have access to this training data, right? 
156 00:11:51,277 --> 00:12:01,377 but, I also teach readers how to extract the pre trained weights from 157 00:12:01,397 --> 00:12:11,477 OpenAI and then you load those weights into the GPT-2 model you created from 158 00:12:11,487 --> 00:12:15,097 scratch, and start to generate the text. 159 00:12:15,507 --> 00:12:23,547 So the text you generate is very coherent without grammar errors, 160 00:12:23,837 --> 00:12:29,967 it's amazing, of course it's not as Powerful as ChatGPT GPT-4, but 161 00:12:30,077 --> 00:12:38,837 a normal person without access to super computing facilities, without access 162 00:12:38,837 --> 00:12:45,667 to larger amounts of training data can create a ChatGPT-style deep neural network 163 00:12:45,707 --> 00:12:53,097 from scratch, and use it to generate a text and generate a lifelike music. 164 00:12:53,287 --> 00:12:54,117 It's amazing. 165 00:12:54,237 --> 00:12:55,517 And that's the text part. 166 00:12:55,607 --> 00:12:59,437 on the image part, you can create like a color image. 167 00:12:59,852 --> 00:13:04,012 You can also convert a horse to a zebra. 168 00:13:04,222 --> 00:13:08,822 You can convert blonde hair to black hair in images. 169 00:13:09,042 --> 00:13:14,292 You can add or remove glasses in images and so forth. 170 00:13:14,302 --> 00:13:16,852 So the whole experience is amazing. 171 00:13:16,922 --> 00:13:20,382 it worked better than anticipated. 172 00:13:20,812 --> 00:13:22,812 And that's a whole experience. 173 00:13:23,372 --> 00:13:29,582 Reminded me of famous quote, "technology advanced enough is 174 00:13:29,612 --> 00:13:32,262 indistinguishable from magic". 175 00:13:32,462 --> 00:13:33,902 The whole thing is really magic. 176 00:13:34,342 --> 00:13:36,372 That's my long answer to your question. 177 00:13:37,047 --> 00:13:37,817 Thank you for that. 
178 00:13:37,867 --> 00:13:41,937 Just for anybody who's not familiar with Manning, the book is currently 179 00:13:41,937 --> 00:13:43,997 available in what's called MEAP. 180 00:13:44,047 --> 00:13:48,547 That's for Manning Early Access Program, you can read the chapters 181 00:13:48,617 --> 00:13:51,127 as they are produced by Mark. 182 00:13:51,497 --> 00:13:55,647 So at the moment there are five chapters that are available, but I'm being told 183 00:13:55,677 --> 00:13:58,447 that 11 will be coming very soon. 184 00:13:58,902 --> 00:14:02,402 And the estimated time for the whole book to be available is May 185 00:14:02,402 --> 00:14:07,272 2024, so for anybody who's eager and who might be thinking that the 186 00:14:07,282 --> 00:14:10,632 book is not finished yet, you can actually start reading it right now. 187 00:14:12,282 --> 00:14:16,602 Speaking of the magic and the building from scratch, I think what I liked the 188 00:14:16,602 --> 00:14:20,512 most about your book, and what initially attracted me to actually go and read it, 189 00:14:21,352 --> 00:14:22,912 is that 'build from scratch' thing. 190 00:14:22,922 --> 00:14:27,862 And I love that you used Richard Feynman's philosophy, the quote, "What 191 00:14:27,862 --> 00:14:30,092 I cannot create, I do not understand". 192 00:14:30,722 --> 00:14:33,852 I think that's a very good motto to live by. 193 00:14:34,812 --> 00:14:39,542 It's absolutely great that you take us on this journey to build 194 00:14:39,562 --> 00:14:43,682 things up, even though I've only read the five chapters so far. 195 00:14:44,822 --> 00:14:47,872 All of a sudden with ChatGPT, everybody started talking 196 00:14:47,872 --> 00:14:49,692 about this and this explosion. 197 00:14:50,132 --> 00:14:53,692 What were some other moments, other than ChatGPT, where you realized, 198 00:14:53,722 --> 00:14:55,552 'Oh man, this is going to blow up. 199 00:14:55,552 --> 00:14:58,612 This is going to be massive with generative AI'? 
200 00:14:58,612 --> 00:15:03,312 I believe you mentioned the Writers Guild of America versus AI story. 201 00:15:03,392 --> 00:15:04,902 Can we talk about that for a minute? 202 00:15:06,077 --> 00:15:11,717 Before I answer that question, I encourage you to read my chapter one for free, 203 00:15:11,757 --> 00:15:13,607 even if you don't buy my book. 204 00:15:13,667 --> 00:15:15,557 Manning has a great feature. 205 00:15:15,607 --> 00:15:22,517 If you go to manning.com and if you look for my book, Learn Generative 206 00:15:22,517 --> 00:15:24,527 AI with PyTorch, you can find it. 207 00:15:24,957 --> 00:15:30,727 I have a fairly long chapter one summarizing the state of the art 208 00:15:30,837 --> 00:15:34,607 in generative AI and also what I've been doing in the book. 209 00:15:35,227 --> 00:15:38,697 Back to what Miko talked about, the Writers Guild of America. 210 00:15:39,207 --> 00:15:42,827 So a few months ago, they negotiated with big firms 211 00:15:43,477 --> 00:15:46,627 about the threat of AI. 212 00:15:47,097 --> 00:15:54,382 And as a result, there's a contract to limit how much AI you can use 213 00:15:54,432 --> 00:16:00,492 in writing, in production, in order to protect the jobs of the writers. 214 00:16:00,912 --> 00:16:06,852 And this is just one example of the disruptive power of AI 215 00:16:06,902 --> 00:16:09,272 in many different industries. 216 00:16:09,602 --> 00:16:17,512 Writers are just one example, and it threatens many other industries. 217 00:16:17,672 --> 00:16:25,572 Another example is Chegg, which is an online educational platform. 218 00:16:25,952 --> 00:16:31,467 So college students go there to get tutoring service and so forth, 219 00:16:31,717 --> 00:16:37,077 and with ChatGPT their business model is actually threatened, right? 220 00:16:37,077 --> 00:16:42,122 I think, in the month after the release of ChatGPT, their stock 221 00:16:42,122 --> 00:16:45,112 price plunged by almost 40%. 
222 00:16:45,562 --> 00:16:49,502 So that's how serious the competition is. 223 00:16:49,802 --> 00:16:51,872 Those are just a couple of examples. 224 00:16:51,952 --> 00:16:57,662 The potential of generative AI is huge, but at the same time, if you don't 225 00:16:57,662 --> 00:17:00,032 catch up with the trend, there is 226 00:17:00,112 --> 00:17:04,757 a risk that your job might be replaced by AI. 227 00:17:05,387 --> 00:17:06,947 There is an interesting quote. 228 00:17:07,337 --> 00:17:08,987 I think there is a lot of truth. 229 00:17:09,842 --> 00:17:12,942 It says that "AI will not take your job. 230 00:17:13,692 --> 00:17:15,692 Somebody using AI will". 231 00:17:15,942 --> 00:17:17,812 So I think there is a lot of truth in that. 232 00:17:17,822 --> 00:17:23,902 So in order to avoid being replaced by AI, I think the best 233 00:17:23,902 --> 00:17:26,022 strategy is to get in the game, 234 00:17:26,592 --> 00:17:33,462 to learn about generative AI, to protect yourself in terms of future careers. 235 00:17:33,962 --> 00:17:34,882 So that's 236 00:17:34,932 --> 00:17:38,292 the big motivation behind my books. 237 00:17:38,392 --> 00:17:42,652 The main motivation, of course, is intellectual curiosity. 238 00:17:42,702 --> 00:17:45,012 I'm by nature a very curious person. 239 00:17:45,102 --> 00:17:52,602 So when I saw ChatGPT work like magic, I really wanted 240 00:17:52,607 --> 00:17:53,747 to get to the bottom of it 241 00:17:54,267 --> 00:17:56,437 and try to figure out how it works. 242 00:17:57,147 --> 00:17:58,477 So that's the main reason. 243 00:17:58,487 --> 00:18:01,587 But at the same time, I'm trying to teach my students 244 00:18:02,452 --> 00:18:08,492 programming skills, machine learning skills, AI skills, generative AI skills in 245 00:18:08,492 --> 00:18:12,322 order to prepare them for the job market, 246 00:18:12,712 --> 00:18:18,032 so that, in the future, their skill sets will not be outdated. 
247 00:18:18,082 --> 00:18:22,722 That's my second motivation for writing the books. 248 00:18:24,318 --> 00:18:29,588 Do you buy into this comparison that AI is like personal computers? 249 00:18:30,697 --> 00:18:34,597 And that a lot of people were worried about how personal computers 250 00:18:34,627 --> 00:18:37,637 were going to just remove jobs. 251 00:18:37,687 --> 00:18:43,327 But what ended up happening was, some small portion of jobs was eliminated, 252 00:18:43,327 --> 00:18:48,467 but most of the jobs were modified, and became operating computers. 253 00:18:49,107 --> 00:18:52,787 Do you think that's the most apt comparison of what we're likely to 254 00:18:52,817 --> 00:18:54,987 experience with AI in the coming years? 255 00:18:55,960 --> 00:19:02,850 The future is hard to predict, but personally, I think, most likely, that's 256 00:19:03,430 --> 00:19:05,240 what's going to happen in the near future. 257 00:19:05,580 --> 00:19:12,700 With generative AI, you can actually use it to increase your productivity, 258 00:19:13,040 --> 00:19:15,190 to have more job opportunities. 259 00:19:15,630 --> 00:19:20,960 On the other hand, if you completely stay away from it, your 260 00:19:20,960 --> 00:19:25,560 skill sets might be outdated. But at the same time, I think technology 261 00:19:25,560 --> 00:19:32,250 will make all this AI stuff more accessible to most people, right? 262 00:19:32,490 --> 00:19:36,970 You don't necessarily have to be a programmer. So one 263 00:19:36,975 --> 00:19:38,930 example is Midjourney, right? 264 00:19:39,220 --> 00:19:44,400 You can actually just go to a browser and then you can use Midjourney 265 00:19:44,420 --> 00:19:52,500 or DALL-E 2, DALL-E 3, or whatever to create very fancy images. 266 00:19:52,860 --> 00:19:55,560 You can use a text prompt to create 
267 00:19:55,560 --> 00:19:59,230 an image of what you meant, you don't have to draw yourself, 268 00:19:59,250 --> 00:20:01,320 in that sense, I'm optimistic. 269 00:20:01,460 --> 00:20:08,260 I think for most people, generative AI will be a very valuable tool 270 00:20:08,860 --> 00:20:11,260 to increase their productivity. 271 00:20:11,680 --> 00:20:14,710 as long as, you keep up with the technology, 272 00:20:14,710 --> 00:20:18,420 I'm glad you mentioned Midjourney because I think for me personally, that was where 273 00:20:18,420 --> 00:20:22,450 I realized: 'okay, this is the hockeystick moment' because I remember the little 274 00:20:22,770 --> 00:20:25,550 tiny pictures, blurry from the GAN paper. 275 00:20:26,270 --> 00:20:30,420 and then all of a sudden I saw some pictures that were generated by 276 00:20:30,420 --> 00:20:34,820 Midjourney and I went and I, I tried it myself and, it was more or less able 277 00:20:34,820 --> 00:20:40,440 to produce almost everything I threw at it, other than some particular types of 278 00:20:40,440 --> 00:20:42,510 dinosaurs that just didn't recognize. 279 00:20:44,020 --> 00:20:46,520 That was like the one thing I knew, 'okay, they didn't train 280 00:20:46,520 --> 00:20:47,860 it on that kind of dinosaur'. 281 00:20:47,860 --> 00:20:51,450 But, that was definitely one of those moments where I realized, wow. 282 00:20:51,920 --> 00:20:55,510 And the other is, I think, I live in London, one way or another, you end 283 00:20:55,510 --> 00:20:59,630 up using the tube a lot, and, usually you're annoyed at people who, play 284 00:20:59,630 --> 00:21:01,570 some music on like public transport. 285 00:21:01,580 --> 00:21:05,980 And then, at some point I realized that I was getting annoyed at people 286 00:21:06,010 --> 00:21:11,980 talking about generative AI, on the public transport and making noise. 
287 00:21:12,370 --> 00:21:16,600 And that's when you realize that, 'okay, so this has now gone mainstream 288 00:21:16,600 --> 00:21:18,020 and everybody's talking about that'. 289 00:21:18,030 --> 00:21:23,310 But let's talk a little bit about the actual underlying breakthroughs 290 00:21:23,370 --> 00:21:26,130 that brought us to where we are. 291 00:21:26,130 --> 00:21:31,330 And, in particular, I'm thinking about GANs, the generative adversarial 292 00:21:31,330 --> 00:21:34,280 networks, and transformers and diffusion. 293 00:21:34,960 --> 00:21:36,150 Where should we start? 294 00:21:36,430 --> 00:21:40,520 What's the first important breakthrough that everybody should know about? 295 00:21:41,130 --> 00:21:47,470 I think all the generative AI models in my book are deep neural networks. 296 00:21:47,960 --> 00:21:50,580 Machine learning is a very wide field. 297 00:21:50,900 --> 00:21:55,650 There are many traditional machine learning models, random forest, 298 00:21:56,260 --> 00:22:02,290 linear regressions, this and that, but about 20 years ago, deep neural 299 00:22:02,290 --> 00:22:04,080 networks became very powerful. 300 00:22:04,890 --> 00:22:13,250 One great thing about neural networks is that you can scale them, and a deep neural 301 00:22:13,250 --> 00:22:22,000 network can approximate any relationship, even if we human beings don't know what 302 00:22:22,000 --> 00:22:28,740 the exact relationship is, as long as you create a large enough model to capture it. 303 00:22:29,150 --> 00:22:32,390 So that's the foundation 20 years ago. 304 00:22:32,720 --> 00:22:36,300 And then over the past 20 years or so, there were many 305 00:22:36,800 --> 00:22:43,690 breakthroughs in the deep learning field, and then, let's talk about something like ChatGPT. 306 00:22:43,710 --> 00:22:51,590 Okay, so ChatGPT is a huge deep neural network trained on huge amounts of data. 
307 00:22:52,610 --> 00:22:58,150 And before that, state-of-the-art natural language processing models 308 00:22:58,200 --> 00:23:00,840 were recurrent neural networks. 309 00:23:01,240 --> 00:23:06,560 So how it works is that it progresses along the timeline. 310 00:23:06,640 --> 00:23:10,920 Let's say you have a sentence like, "this is a sentence", right? 311 00:23:11,190 --> 00:23:13,730 So you have four words in the sentence, right? 312 00:23:13,950 --> 00:23:20,400 The model uses the first word, "this", to predict the second word, "is", and then 313 00:23:20,470 --> 00:23:25,250 it uses the first two words to predict the third word, and so on and so forth. 314 00:23:26,200 --> 00:23:31,200 It worked to some degree, but it's very slow because you have 315 00:23:31,210 --> 00:23:34,920 to predict one word at a time. 316 00:23:35,760 --> 00:23:40,560 And then in 2017, there was a huge breakthrough. 317 00:23:41,010 --> 00:23:42,090 There's a paper 318 00:23:42,265 --> 00:23:46,735 called "Attention Is All You Need" by a group of Google scholars, 319 00:23:47,345 --> 00:23:54,365 and they used a different mechanism to capture the relationship of 320 00:23:54,375 --> 00:23:55,915 different words in a sentence. 321 00:23:56,515 --> 00:24:05,025 So it's called the attention mechanism and it's much more effective. On top of that, 322 00:24:05,505 --> 00:24:07,750 it's not sequential. 323 00:24:07,770 --> 00:24:11,710 So which means one word can pay attention to all 324 00:24:11,760 --> 00:24:13,410 other words at the same time. 325 00:24:14,080 --> 00:24:17,490 And this allows for parallel training. 326 00:24:18,080 --> 00:24:20,320 And this has huge implications. 327 00:24:20,370 --> 00:24:26,010 Number one, it works better in terms of capturing long-term relationships 328 00:24:26,315 --> 00:24:30,775 between different words in a sentence so that you can understand the meaning of 329 00:24:30,775 --> 00:24:33,025 a long sentence, long text, number one. 
330 00:24:33,025 --> 00:24:38,265 Number two, because of the non-sequential nature of the attention mechanism, 331 00:24:39,085 --> 00:24:40,955 you can use parallel training. 332 00:24:41,005 --> 00:24:45,555 You can train the same model on many different devices. 333 00:24:46,135 --> 00:24:48,105 This makes training much faster. 334 00:24:48,545 --> 00:24:53,935 And this also allows you to train the model on more data. 335 00:24:54,425 --> 00:25:00,775 That's why ChatGPT became so powerful, because you can train them much faster, 336 00:25:00,805 --> 00:25:02,845 and then you can train them on more data. 337 00:25:02,925 --> 00:25:08,275 On top of that, the mechanism works much better than recurrent neural 338 00:25:08,275 --> 00:25:14,515 networks, because it can capture really long-term relationships in a sequence, 339 00:25:14,875 --> 00:25:16,975 as a text is a sequence, right? 340 00:25:17,025 --> 00:25:23,175 That propelled OpenAI to have all these models, including ChatGPT. 341 00:25:23,575 --> 00:25:31,135 Now let's go to the recent development, the text-to-image transformers. 342 00:25:31,235 --> 00:25:36,518 This is a new innovation in transformer models called multimodal models. 343 00:25:36,568 --> 00:25:41,538 The original transformer model, "Attention Is All You Need", which powers 344 00:25:41,538 --> 00:25:44,308 ChatGPT, only uses text, right? 345 00:25:44,308 --> 00:25:49,728 So the input is a sequence of text, the output is also a sequence of text, but 346 00:25:49,768 --> 00:25:55,778 with the multimodal models, the input and output can be different formats, right? 347 00:25:55,798 --> 00:26:02,358 With DALL-E 2 and DALL-E 3, the input is a text and the output is an image, right? 348 00:26:02,388 --> 00:26:04,598 You can have different inputs and outputs. 349 00:26:04,628 --> 00:26:08,813 You can have audio, you can have video, Sora has videos, that kind of stuff. 
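To make the attention mechanism discussed above concrete, here is a minimal, dependency-free sketch of scaled dot-product self-attention. It is a simplification under stated assumptions, not the book's implementation: queries, keys, and values are all the raw input vectors (a real transformer applies learned projections first), there is no causal mask, and the two-dimensional toy embeddings for "this is a sentence" are invented.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Let every position attend to every position at once."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity between this word and every word, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted average of all the value vectors.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, vectors))
            for j in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy embeddings, one per word of "this is a sentence".
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
outputs, weights = self_attention(tokens)
print(weights[0])
```

No iteration of the outer loop depends on another, which is the non-sequential property that allows parallel training. A recurrent network, by contrast, must finish one position before starting the next.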
350 00:26:08,893 --> 00:26:12,053 But let's talk about what is the underlying mechanism 351 00:26:12,053 --> 00:26:14,078 behind multimodal models. 352 00:26:14,448 --> 00:26:19,783 For DALL-E 2 and DALL-E 3, it has something to do with diffusion models. 353 00:26:20,033 --> 00:26:25,463 So I think you mentioned that, at first the generated image is very grainy, right? 354 00:26:25,743 --> 00:26:31,543 The diffusion models add noise to an image gradually. 355 00:26:31,743 --> 00:26:34,333 Let's say there are 1000 time steps. 356 00:26:34,553 --> 00:26:40,543 And then at each time step, you can actually add a little bit of noise 357 00:26:40,543 --> 00:26:48,313 to the image and gradually you have 1000 different images and each one 358 00:26:48,673 --> 00:26:54,823 becomes progressively noisier and at the end, it becomes completely noisy. 359 00:26:56,093 --> 00:27:00,683 And then what you can do is that you can give those images to a 360 00:27:00,683 --> 00:27:06,133 machine learning model and you can train the model to remove that 361 00:27:06,233 --> 00:27:09,043 noise progressively, step by step. 362 00:27:09,253 --> 00:27:15,163 That's how DALL-E and all those text-to-image models work. 363 00:27:15,393 --> 00:27:21,383 The first step is that you use a text prompt to generate a very grainy image, and 364 00:27:21,423 --> 00:27:26,963 then after that you use a model which is very much like a diffusion model. 365 00:27:27,293 --> 00:27:33,073 You will progressively refine those images so that you turn a very grainy 366 00:27:33,073 --> 00:27:36,303 image into a high resolution image. 367 00:27:36,453 --> 00:27:44,058 That's why, when you enter a short prompt, DALL-E 2 can 368 00:27:44,058 --> 00:27:46,128 give you a high resolution image 369 00:27:46,668 --> 00:27:51,618 capturing what you are trying to produce in the text prompt. 370 00:27:51,628 --> 00:27:54,718 So that's actually chapter 14 of my book. 
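The "add a little bit of noise at each of 1000 time steps" procedure can be sketched without any libraries. This sketch assumes the common Gaussian update x_t = sqrt(1 - beta) * x_(t-1) + sqrt(beta) * noise with a constant beta of 0.01. The conversation does not specify a noise schedule, so that update rule, the value of beta, and the toy 64-pixel "image" are all assumptions for illustration.

```python
import math
import random


def add_noise_step(pixels, beta, rng):
    """One forward step: shrink the signal slightly and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]


def forward_diffusion(pixels, num_steps=1000, beta=0.01, seed=0):
    """Return the clean image followed by num_steps progressively noisier copies."""
    rng = random.Random(seed)
    trajectory = [pixels]
    for _ in range(num_steps):
        trajectory.append(add_noise_step(trajectory[-1], beta, rng))
    return trajectory


# Hypothetical flattened 8x8 grayscale image with every pixel set to 0.9.
image = [0.9] * 64
steps = forward_diffusion(image)
print(len(steps), steps[1][:3], steps[-1][:3])
```

The noisy copies are the training material: a denoising model is shown a noisy image together with its time step and learns to predict the noise that was added, so it can later run the process in reverse.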
371 00:27:54,918 --> 00:27:58,068 I'm going to talk about how you can add a little bit of noise to 372 00:27:58,068 --> 00:28:00,538 the image, one step at a time. 373 00:28:00,538 --> 00:28:06,848 And then you can use those, images to train the model to remove the noise step 374 00:28:06,848 --> 00:28:15,748 by step progressively, and very much like, DALL-E 2 trying to, make the image clearer 375 00:28:15,748 --> 00:28:18,248 and clearer step by step progressively. 376 00:28:22,458 --> 00:28:26,628 Generative adversarial networks, which was an interesting 377 00:28:26,628 --> 00:28:28,898 development, from Ian Goodfellow. 378 00:28:29,158 --> 00:28:32,278 How does that fit into the rest of what you just described? 379 00:28:33,053 --> 00:28:39,123 Generative Adversarial, Networks, so it's great at generating 380 00:28:39,273 --> 00:28:41,883 different forms, of content. 381 00:28:42,703 --> 00:28:48,063 a lot of times when readers learn something, if you give them the end 382 00:28:48,173 --> 00:28:50,653 product, it's too complicated, right? 383 00:28:50,663 --> 00:28:53,563 So they may get frustrated and they just give up. 384 00:28:54,473 --> 00:29:03,303 as an author, my job is how to make sure that readers stay engaged throughout 385 00:29:03,333 --> 00:29:09,353 the book and never get tired, never get frustrated, and gradually learn and 386 00:29:09,463 --> 00:29:15,403 finally learn to do the state of the art machine learning models generally by 387 00:29:15,403 --> 00:29:21,953 models like ChatGPT-style transformer to generate the text and the audio, right? 388 00:29:22,023 --> 00:29:24,243 So what is the idea behind the GANs? 389 00:29:24,763 --> 00:29:25,933 You have two networks. 390 00:29:25,983 --> 00:29:28,383 One is a generator network. 
391 00:29:28,433 --> 00:29:34,193 The other one is a discriminator network, so the job of the generator is trying 392 00:29:34,373 --> 00:29:39,363 to generate a piece of work similar to that from the training data set. 393 00:29:39,413 --> 00:29:43,718 let's use a grayscale image as an example, right? 394 00:29:43,958 --> 00:29:47,548 you have a training dataset of grayscale images of, 395 00:29:47,598 --> 00:29:50,068 handwritten digits, like 0 to 9. 396 00:29:50,268 --> 00:29:52,938 And then, those are the real images. 397 00:29:53,118 --> 00:29:59,853 And then you will ask the generator to generate something similar to 398 00:29:59,853 --> 00:30:04,728 that, so that it can pass as real in front of the discriminator. 399 00:30:05,678 --> 00:30:09,288 before you train the model, the generator is terrible. 400 00:30:09,708 --> 00:30:14,858 So whatever the generator generated, completely like gibberish. 401 00:30:14,878 --> 00:30:18,148 it's like a snowflake on a screen, that kind of stuff. 402 00:30:19,188 --> 00:30:22,568 But, this is where training, comes in. 403 00:30:23,068 --> 00:30:27,068 you will have a training loop, and then, in each iteration, 404 00:30:27,108 --> 00:30:32,788 you will ask the generator to generate a bunch of fake images. 405 00:30:33,258 --> 00:30:39,788 At the same time, you also have a bunch of real images from the training set and 406 00:30:39,818 --> 00:30:46,338 you give all those to the discriminator and ask the discriminator to determine 407 00:30:47,188 --> 00:30:50,738 whether each image is real or fake 408 00:30:51,058 --> 00:30:58,628 And then the generator's job is trying to create an image so that the 409 00:30:58,628 --> 00:31:01,838 discriminator would think it's real. 410 00:31:02,088 --> 00:31:04,178 that's the generator's objective. 411 00:31:04,538 --> 00:31:08,898 So therefore you have a loss function, and then you train the model. 
412 00:31:09,378 --> 00:31:15,658 You gradually fine tune the model parameters so that in the next 413 00:31:15,658 --> 00:31:22,828 iteration, whatever image generated by the generator will have a higher 414 00:31:22,838 --> 00:31:26,098 probability of passing as real. 415 00:31:26,868 --> 00:31:31,968 And then you do this again and again, you can do the thousands of iterations. 416 00:31:32,508 --> 00:31:38,168 And, if you do that, long enough, then eventually the generator will 417 00:31:38,188 --> 00:31:44,098 be able to create an image identical to the image from the training set. 418 00:31:44,108 --> 00:31:50,448 So that's how GAN works you have a zero sum game, you have a competitive 419 00:31:50,488 --> 00:31:54,898 kind of two networks competing with each other, trying to outsmart 420 00:31:54,898 --> 00:31:59,018 each other and eventually, the generator gets better and better. 421 00:31:59,178 --> 00:32:05,683 So that's the idea behind GANs, it's a revolutionary idea. 422 00:32:05,833 --> 00:32:12,353 in 2014, 2015, Ian Goodfellow and his co authors proposed the model. 423 00:32:12,653 --> 00:32:19,243 a great thing about the model is it can generate different content: numbers. 424 00:32:19,713 --> 00:32:24,013 Images, shapes, even music, so on and so forth. 425 00:32:25,083 --> 00:32:28,343 I love this idea because on top of that, you've got this 426 00:32:28,603 --> 00:32:30,193 built-in target point, right? 427 00:32:30,223 --> 00:32:33,793 When your discriminator can no longer discriminate between 428 00:32:33,793 --> 00:32:34,703 what you're generating. 429 00:32:34,703 --> 00:32:36,813 when you're finished, it's not arbitrary. 430 00:32:36,823 --> 00:32:37,593 You've got that. 
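Illustrative sketch, not part of the conversation: the alternating training loop described above can be written in a few lines of PyTorch. To stay small, this version assumes a toy "training set" of single numbers near 4.0 instead of images; the network sizes, learning rates, and step count are arbitrary assumptions, and nothing here is claimed to match the code in the book.

```python
import torch
from torch import nn

torch.manual_seed(0)

def real_batch(n):
    # Toy stand-in for real training data: numbers clustered around 4.0
    return 4.0 + 0.5 * torch.randn(n, 1)

# Generator maps random noise to a sample; discriminator scores a sample
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

batch = 64
real_label = torch.ones(batch, 1)
fake_label = torch.zeros(batch, 1)

for step in range(200):
    real = real_batch(batch)
    fake = generator(torch.randn(batch, 8))

    # Discriminator step: learn to call real samples 1 and fake samples 0.
    # detach() keeps this update from reaching the generator's weights.
    d_loss = (loss_fn(discriminator(real), real_label)
              + loss_fn(discriminator(fake.detach()), fake_label))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: its loss is low when the discriminator says "real"
    g_loss = loss_fn(discriminator(fake), real_label)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

print(f"d_loss={d_loss.item():.3f}  g_loss={g_loss.item():.3f}")
```

The two optimizers are the "two networks competing" in code form: each step nudges one network while the other is held fixed.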
431 00:32:37,593 --> 00:32:40,813 And the other reason why I love that is that it's got this anecdote attached 432 00:32:40,823 --> 00:32:45,933 to it that, legend has it, it was written one evening, when Ian was 433 00:32:45,933 --> 00:32:49,953 celebrating in a pub I think someone was graduating, some fellow students. 434 00:32:51,148 --> 00:32:55,158 And, they were discussing a problem when they wanted to generate some pictures. 435 00:32:55,658 --> 00:32:59,558 And he came up with this idea that, 'oh, what you're suggesting is too 436 00:32:59,558 --> 00:33:03,548 complicated and you should, put two networks against each other'. 437 00:33:04,028 --> 00:33:04,758 And they laughed. 438 00:33:05,708 --> 00:33:08,298 he went home and, still slightly drunk. 439 00:33:08,378 --> 00:33:10,298 he wrote a proof of concept of that. 440 00:33:10,928 --> 00:33:13,268 And then turned out, that it actually worked out. 441 00:33:13,318 --> 00:33:17,618 I think in one of the interviews later, he said that if he wasn't drunk, 442 00:33:17,618 --> 00:33:20,448 he probably wouldn't have done it because it sounded like a silly idea. 443 00:33:22,788 --> 00:33:23,188 Okay. 444 00:33:23,508 --> 00:33:24,278 Yeah, that's right. 445 00:33:24,278 --> 00:33:24,508 Yeah. 446 00:33:24,508 --> 00:33:26,678 how random some of those things are. 447 00:33:27,228 --> 00:33:28,898 How, weird and unpredicted. 448 00:33:28,988 --> 00:33:33,978 And I think one of the things I wanted to ask you about is also 449 00:33:34,303 --> 00:33:38,563 what made all of those kind of recent breakthroughs possible? 450 00:33:38,783 --> 00:33:39,623 what was missing? 451 00:33:39,623 --> 00:33:43,283 Because we've had the neural network since what the 80s or something like that. 452 00:33:43,313 --> 00:33:48,353 all of a sudden, it looks like in the last few years, or maybe last decade or 453 00:33:48,353 --> 00:33:52,733 so, it was just like one breakthrough after another breakthrough just dropping. 
454 00:33:52,733 --> 00:33:57,743 And if you try to keep up with currently written papers on AI, 455 00:33:57,743 --> 00:33:59,243 there's just so many of them. 456 00:33:59,768 --> 00:34:03,938 And it looks like every other day, there's something super interesting that's been 457 00:34:03,938 --> 00:34:09,738 developed and it's literally hard to keep up just with other people's ideas. 458 00:34:10,168 --> 00:34:15,558 What do you think enabled this kind of explosion in the recent years? 459 00:34:16,921 --> 00:34:21,631 actually, like a neural networks was proposed even earlier than 1980s. 460 00:34:21,691 --> 00:34:28,021 I think in 1960s, researchers proposed artificial neural networks, basically 461 00:34:28,121 --> 00:34:34,771 modeled after human brain, The idea was a great one, but at that point, we 462 00:34:34,831 --> 00:34:43,041 didn't have the, hardware to support it, And then started in 1990s, early 2000s. 463 00:34:43,996 --> 00:34:47,356 The hardware becomes much more powerful, number one. 464 00:34:47,886 --> 00:34:54,096 Number two: there was more research, more breakthroughs in the research 465 00:34:54,096 --> 00:34:56,346 field of, artificial neural networks. 466 00:34:56,856 --> 00:35:03,535 so one example is, LeCun's, uh, Convolutional Neural Networks. 467 00:35:04,605 --> 00:35:08,665 most neural networks are fully connected, dense neural networks, 468 00:35:08,685 --> 00:35:13,485 which means, a neuron in the previous layer is connected to all the neurons 469 00:35:13,525 --> 00:35:16,875 in the next layer, and it works great. 470 00:35:17,705 --> 00:35:22,865 Except that once your model becomes larger, the number of parameters, 471 00:35:23,325 --> 00:35:26,925 grow exponentially, and then it's very hard to train it, right? 472 00:35:26,945 --> 00:35:27,995 So that's a problem. 473 00:35:28,465 --> 00:35:34,225 convolutional neural networks is, you localize the weights, okay? 
474 00:35:34,445 --> 00:35:40,240 You have a filter, and the weights in the filter are fixed when you move 475 00:35:40,280 --> 00:35:48,790 the filter over an image, and this greatly reduces the number of parameters. 476 00:35:49,290 --> 00:35:52,500 It makes computer vision much more efficient. 477 00:35:52,790 --> 00:35:58,710 Because of that, in the early 2000s, there were a lot of breakthroughs in computer 478 00:35:58,710 --> 00:36:05,410 vision, in convolutional neural networks, and I think that's a huge breakthrough. 479 00:36:05,740 --> 00:36:06,740 And then 480 00:36:07,800 --> 00:36:11,540 after that, you also have GPU training. 481 00:36:11,930 --> 00:36:16,540 GPU training became very popular in the past maybe 10 years or so. 482 00:36:16,850 --> 00:36:22,931 And that is a huge game changer because as deep neural networks became larger 483 00:36:22,931 --> 00:36:30,671 and larger, it's very hard to train them without extra help, right? 484 00:36:30,721 --> 00:36:32,231 When you train on CPU, 485 00:36:32,411 --> 00:36:35,750 CPU is a general purpose kind of processor. 486 00:36:36,030 --> 00:36:38,060 You have to do many things on it. 487 00:36:38,430 --> 00:36:40,690 But GPU is specialized. 488 00:36:40,925 --> 00:36:43,895 So you can do machine learning jobs much faster. 489 00:36:44,915 --> 00:36:47,435 And of course, we also have more and more 490 00:36:47,990 --> 00:36:53,260 training data available, and that also is necessary for 491 00:36:53,610 --> 00:36:55,380 large language models to work. 492 00:36:55,650 --> 00:37:01,490 It takes time, but I think, the past 20 years or so, we suddenly have 493 00:37:01,730 --> 00:37:04,890 everything come together to make it work. 494 00:37:04,940 --> 00:37:09,550 Basically, we've got gamers to thank for their breakthroughs in AI 495 00:37:10,060 --> 00:37:14,770 because of the graphic cards, the GPUs that they requested, right?
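Illustrative sketch, not part of the conversation: the earlier point that a convolutional filter reuses the same weights across the whole image can be made concrete with a parameter count. The 64x64 RGB image size, the 64 filters, and the 3x3 kernel are assumptions picked for the arithmetic, and the two helper functions are hypothetical names.

```python
def dense_params(n_inputs, n_outputs):
    # Fully connected layer: one weight per (input, output) pair,
    # plus one bias per output
    return n_inputs * n_outputs + n_outputs

def conv_params(in_channels, out_channels, kernel_size):
    # Convolutional layer: one kernel_size x kernel_size filter per
    # (input channel, output channel) pair, plus one bias per output channel.
    # The count does not depend on the image height or width.
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

height = width = 64           # assumed image size
channels = 3                  # RGB
pixels = height * width * channels

dense = dense_params(pixels, pixels)  # dense layer with as many outputs as inputs
conv = conv_params(channels, 64, 3)   # 64 filters of size 3x3

print(f"dense layer parameters: {dense:,}")
print(f"conv layer parameters:  {conv:,}")
```

Under these assumptions the dense layer needs about 151 million parameters while the convolutional layer needs 1,792. The two layers do not produce the same kind of output, so this is a scale comparison rather than a like-for-like swap.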
496 00:37:15,360 --> 00:37:19,990 you have a very good point, I think GPU was originally designed 497 00:37:20,050 --> 00:37:21,420 for gaming purpose, right? 498 00:37:22,210 --> 00:37:26,620 And then suddenly right now, it has a completely different purpose, And 499 00:37:26,690 --> 00:37:33,970 I have several GPUs at home, not very powerful I think it's powerful enough 500 00:37:33,980 --> 00:37:36,910 for me to experiment on different models. 501 00:37:37,010 --> 00:37:39,640 It costs maybe several hundred dollars, thousand dollars. 502 00:37:40,140 --> 00:37:41,430 I have three of them. 503 00:37:41,960 --> 00:37:43,500 Two of them are from my son. 504 00:37:43,520 --> 00:37:45,160 My son was playing video games. 505 00:37:45,590 --> 00:37:48,340 And then now he doesn't use those computers anymore. 506 00:37:48,340 --> 00:37:49,940 And then he just gave it to me. 507 00:37:49,940 --> 00:37:53,720 And then I just simply take them out and use it for my own, 508 00:37:53,780 --> 00:37:56,120 But the cost is not that much. 509 00:37:56,170 --> 00:37:59,310 the cost is not that much unless you go for like the top of the line 510 00:37:59,310 --> 00:38:05,620 80 gig ones, which are very hard to come by and also quite expensive. 511 00:38:06,420 --> 00:38:08,590 Yeah, so thank you gamers. 512 00:38:08,650 --> 00:38:13,090 Thank you for enabling the AI revolution in many ways. 513 00:38:13,090 --> 00:38:15,730 it goes back to what I was saying about how random some 514 00:38:15,760 --> 00:38:17,190 of these things seem to be. 515 00:38:17,790 --> 00:38:19,780 so where do you think, we're heading? 516 00:38:19,870 --> 00:38:24,390 Like you said, the future is notoriously difficult to predict, obviously. 
517 00:38:24,930 --> 00:38:30,380 But, if you were still going to venture and make a guess, that will 518 00:38:30,380 --> 00:38:33,880 probably prove completely wrong a few years down the line, where do you 519 00:38:33,880 --> 00:38:35,360 think we're heading with all of this? 520 00:38:36,628 --> 00:38:43,278 if I had to venture to guess The large language models will become even 521 00:38:43,278 --> 00:38:49,188 more powerful in the near future, not only in terms of generating, cohesive 522 00:38:49,788 --> 00:38:57,208 text, but also generating images, generating videos and also Multimodal 523 00:38:57,208 --> 00:38:59,218 models will become very popular. 524 00:38:59,268 --> 00:39:04,018 Okay, you can generate not only images, text, you can also generate 525 00:39:04,018 --> 00:39:06,108 audio, video, sound, and so forth. 526 00:39:06,748 --> 00:39:11,293 other than that, I think, it really depends on, which breakthrough will 527 00:39:11,293 --> 00:39:13,483 come through in the near future. 528 00:39:13,893 --> 00:39:18,303 And you never know if there's just one day suddenly is huge breakthrough, and 529 00:39:18,303 --> 00:39:24,453 then they'll completely, change the landscape of ai, just like what the 530 00:39:24,453 --> 00:39:26,703 ChatGPT did a couple years ago, right? 531 00:39:26,703 --> 00:39:29,463 the future is very exciting, but at the same time, like you 532 00:39:29,463 --> 00:39:30,963 said, it's very hard to predict. 533 00:39:31,443 --> 00:39:37,778 But, I think right now is a very fortunate time, a very exciting 534 00:39:38,048 --> 00:39:41,558 time for, tech enthusiasts. 535 00:39:41,988 --> 00:39:47,263 for anybody who is passionate about ai, about technology, is very exciting. 536 00:39:47,890 --> 00:39:49,520 So two follow up questions then. 537 00:39:49,620 --> 00:39:54,530 one it's, like anything else, there are these fashion waves 538 00:39:54,570 --> 00:39:56,590 that kind of, come and go. 
539 00:39:57,140 --> 00:40:00,870 and AI is now the latest hottest thing. 540 00:40:00,880 --> 00:40:04,260 So all the VCs, everybody's throwing money at it. 541 00:40:05,210 --> 00:40:08,700 But at some point people will probably move on to the next thing, just like 542 00:40:08,700 --> 00:40:12,930 they did with crypto and smartphones and internet and whatever else before, right? 543 00:40:13,600 --> 00:40:18,930 So I'm wondering, where do you think we are in that, hype cycle, and what's 544 00:40:18,930 --> 00:40:24,675 going to happen when all of a sudden slapping AI-first on your startup, no 545 00:40:24,675 --> 00:40:27,225 longer make sure that you get funding. 546 00:40:27,645 --> 00:40:29,515 So that's question number one, follow up. 547 00:40:30,425 --> 00:40:35,095 and then the second question is, if you were to plot, a graph of how you 548 00:40:35,095 --> 00:40:39,455 expect, the large language models to continue developing, I think we 549 00:40:39,455 --> 00:40:42,715 can all agree that there are some kind of like very exponential growth 550 00:40:42,715 --> 00:40:46,995 where somebody figured out, ChatGPT or one of those massive models. 551 00:40:47,495 --> 00:40:52,775 If you throw enough data at it, and you massage it for long enough, you 552 00:40:52,775 --> 00:40:56,795 can create this impression of, 'oh, this is magic, how on earth is that 553 00:40:56,795 --> 00:41:00,515 even happening?' But then, at some point it has to plateau, right? 554 00:41:00,615 --> 00:41:05,045 it's not possible for it to go, at that kind of speed, into the sky. 555 00:41:05,650 --> 00:41:06,210 Feeling. 556 00:41:06,910 --> 00:41:09,520 Again, it's hard to predict the sense. 557 00:41:09,950 --> 00:41:15,570 course, all the usual disclaimers about predictions, but what's your take 558 00:41:15,570 --> 00:41:19,350 on what it means about us as humans? 
559 00:41:19,820 --> 00:41:26,780 Does it mean that what we cherish as one of the unique capabilities 560 00:41:26,800 --> 00:41:28,930 of humans, the human intelligence, 561 00:41:29,905 --> 00:41:34,755 it's not actually all that unique? Because it's hard to not have this 562 00:41:34,755 --> 00:41:37,785 feeling when you talk to one of those big large language models 563 00:41:37,785 --> 00:41:42,115 and, not the times it goes haywire and starts behaving weird, 564 00:41:42,145 --> 00:41:44,165 but the times where it works well. 565 00:41:44,590 --> 00:41:48,690 It's really hard to not have this impression that you're talking to somebody 566 00:41:48,690 --> 00:41:51,150 with some amount of intelligence to it. 567 00:41:51,510 --> 00:41:56,700 So does it mean that we're all some kind of statistical models and the 568 00:41:56,800 --> 00:42:01,740 intelligence that we demonstrate is also an emergent property? 569 00:42:01,830 --> 00:42:03,080 What's your take on that? 570 00:42:04,110 --> 00:42:07,780 I don't think many people in the world right now have a 571 00:42:07,800 --> 00:42:09,450 good answer to that question. 572 00:42:09,501 --> 00:42:14,041 That said, I do want to point out that many people 573 00:42:14,041 --> 00:42:17,291 right now have concerns about AI 574 00:42:17,891 --> 00:42:24,821 because of the potential damage it can do. So it's all about the objective 575 00:42:24,831 --> 00:42:32,501 function. So if you give a task to the model, in terms of the loss function, 576 00:42:32,521 --> 00:42:38,291 and then you can just try it again and again, and eventually it will become 577 00:42:38,291 --> 00:42:43,961 very good at whatever objective you want the model to do. So that is good, 578 00:42:44,001 --> 00:42:48,381 but at the same time, it can be bad. The AI may not even know it, right? 579 00:42:48,381 --> 00:42:51,591 It's just trying to accomplish a certain goal.
580 00:42:51,791 --> 00:42:55,951 It just happens that a human being is standing in the way of that goal. 581 00:42:56,121 --> 00:43:00,098 so in that sense, I do think that, Human beings need to be careful. 582 00:43:00,138 --> 00:43:06,688 I think like AI needs to be, regulated in to some degree. 583 00:43:07,198 --> 00:43:11,158 we cannot let it to, do whatever it wants. 584 00:43:11,158 --> 00:43:13,568 It may have serious. 585 00:43:14,158 --> 00:43:16,878 negative consequences to human beings, 586 00:43:17,345 --> 00:43:22,135 I think that a lot of what you just described has been the main kind of 587 00:43:22,135 --> 00:43:27,525 concern for everybody making sci-fi movies from the Terminator and Skynet 588 00:43:27,595 --> 00:43:32,515 And, I certainly get that, but I think I'm probably more worried about. 589 00:43:32,565 --> 00:43:37,555 going back to what we said about, you won't be losing your job to AI, you'll 590 00:43:37,555 --> 00:43:42,595 be losing your job to someone using, an AI, I think this probably applies 591 00:43:42,605 --> 00:43:49,495 here too, that you can just do, as an enabler, it scales up the amount of 592 00:43:49,495 --> 00:43:57,535 damage that, nefarious party can actually, produce, because using that to bad ends. 593 00:43:57,575 --> 00:44:02,465 a lot of the security that we rely on is practical, right? 594 00:44:02,525 --> 00:44:07,405 Like for example, all the encryption keys that we use for everything are, only 595 00:44:07,405 --> 00:44:11,635 because it would be computationally too expensive to actually figure that out. 
596 00:44:11,675 --> 00:44:16,905 But then when you've got tools like this, it's easy to be scared about the 597 00:44:16,935 --> 00:44:22,150 possibility of that figuring out, and making things possible, that previously 598 00:44:22,450 --> 00:44:28,130 weren't, so I think I'm more worried about that scenario, where someone uses 599 00:44:28,170 --> 00:44:33,580 the AI to bad ends and it enables them to do more damage that they would be 600 00:44:33,790 --> 00:44:35,710 able to do with traditional methods, 601 00:44:36,390 --> 00:44:41,200 even in the current stage, if AI falls into the wrong hands, 602 00:44:41,220 --> 00:44:43,840 it can do a lot of damage. 603 00:44:43,900 --> 00:44:49,300 not that catastrophic, but it can do a lot of damage to a lot of families, right? 604 00:44:49,300 --> 00:44:55,050 I think, There were like stories about, people use the generative AI to create 605 00:44:55,100 --> 00:45:01,280 a fake phone call to their parents and, demand a ransom money so I think it 606 00:45:01,280 --> 00:45:07,590 causes, financial damage and also a lot of emotional distress, like fake news. 607 00:45:08,625 --> 00:45:12,775 Fake video, a lot of deep fake stuff, so even at this stage I 608 00:45:12,825 --> 00:45:16,765 think you can do a lot of harm if you fall into the wrong hands 609 00:45:16,928 --> 00:45:19,298 Yeah, that's a very good example of the call. 610 00:45:19,308 --> 00:45:24,198 Like you can technically go and call people and scam and, people do that, 611 00:45:24,518 --> 00:45:27,838 but there is a limit to how many people you can physically call in a day. 612 00:45:28,628 --> 00:45:33,463 If on the other hand, you have a powerful enough AI, you can scale it up and 613 00:45:33,463 --> 00:45:37,708 you can probably call everybody in the United States, a certain amount of times. 614 00:45:39,003 --> 00:45:39,243 That's 615 00:45:39,368 --> 00:45:45,033 you concerned about the AI involvement in the upcoming election. 
616 00:45:46,073 --> 00:45:51,743 so we have to be careful, but I think so far the impact that it's limited. 617 00:45:52,033 --> 00:45:56,833 but at the same time, I think all the parties, politicians need to 618 00:45:57,243 --> 00:45:59,613 pay attention to generative AI. 619 00:45:59,998 --> 00:46:05,328 Because of what it can do, fake news and so forth. 620 00:46:06,028 --> 00:46:10,458 imagine you are running a political campaign, right? 621 00:46:10,478 --> 00:46:17,618 You must, get to know, analytics, how AI can influence your campaign either 622 00:46:17,808 --> 00:46:25,558 positively or negatively, if your team can utilize AI, uh, to, Strengthen 623 00:46:25,558 --> 00:46:30,858 your position legally, you're in a very good, position, it can help you, 624 00:46:30,898 --> 00:46:36,178 but on the other hand, if you're not careful, your opponents or somebody can 625 00:46:36,178 --> 00:46:43,658 use deepfake to disrupt your campaign for your cause that's why I think AI 626 00:46:43,688 --> 00:46:46,968 is so powerful and also so widespread. 627 00:46:47,288 --> 00:46:54,388 It affects every single industry in the economy, not just a few isolated sectors. 628 00:46:54,758 --> 00:46:56,738 that's very unique. 629 00:46:57,098 --> 00:46:57,838 About AI. 630 00:46:58,543 --> 00:47:04,243 Did you hear about the Elon Musk lawsuit against OpenAI from a few days ago? 631 00:47:04,943 --> 00:47:08,683 obviously OpenAI initially started as an alternative to the big 632 00:47:08,733 --> 00:47:12,813 companies, and the massive labs like Google, Facebook and so on. 633 00:47:13,503 --> 00:47:18,073 And their pitch and the initial mission statement was to 634 00:47:18,183 --> 00:47:20,393 release everything open source. 635 00:47:20,473 --> 00:47:22,153 Now, hence the name OpenAI. 
636 00:47:23,178 --> 00:47:28,158 And then somewhere along the way, that turned and it's currently a for-profit, 637 00:47:28,618 --> 00:47:33,598 closed source company, worth, what, under a hundred billion at the moment. 638 00:47:34,128 --> 00:47:37,548 We're recording this on March the 4th; a few days ago, 639 00:47:37,628 --> 00:47:43,898 Elon Musk opened this lawsuit, where he alleges that he was basically scammed 640 00:47:43,998 --> 00:47:49,368 because they turned the company around and they went against the initial mission. 641 00:47:49,428 --> 00:47:53,208 And I think the opinions on the internet vary from, 'okay, this is 642 00:47:53,578 --> 00:47:59,638 jealousy', because he's jealous of the success that OpenAI has seen, 643 00:48:00,263 --> 00:48:03,313 to, 'okay, this is a nice publicity stunt. 644 00:48:03,353 --> 00:48:06,503 He probably has a point, but this is probably not going 645 00:48:06,503 --> 00:48:07,833 to stand in court'. 646 00:48:08,483 --> 00:48:13,173 And I'm trying to make sense of how much of that is actually valid and 647 00:48:13,173 --> 00:48:18,293 how much I should be worried about OpenAI being at the forefront of 648 00:48:18,293 --> 00:48:20,793 this, a big closed source company. 649 00:48:21,313 --> 00:48:26,673 I also heard that, many years ago when Elon Musk and Sam Altman co-founded 650 00:48:26,693 --> 00:48:33,403 OpenAI, their objective was a nonprofit organization. Given the 651 00:48:33,428 --> 00:48:41,838 competition from other big players in the industry, I think OpenAI was under 652 00:48:41,888 --> 00:48:51,388 pressure to commercialize ChatGPT and this may go against the original objective, so 653 00:48:51,758 --> 00:48:53,908 I can see the argument from both sides.
654 00:48:53,958 --> 00:48:59,388 on the one hand, we have to be careful like we just discussed about the use of, 655 00:49:00,178 --> 00:49:07,618 AI that may lead to, the end of humanity as we know it, if we're not careful. 656 00:49:07,618 --> 00:49:11,018 But at the same time, if we use that properly, I. 657 00:49:11,543 --> 00:49:18,613 It can be a great tool, that's why there is such a great market for, generative AI, 658 00:49:18,663 --> 00:49:24,013 so I think there is some tension, within the company, so you have different views. 659 00:49:24,203 --> 00:49:28,783 that's why, I think, a few months ago, within several days, Altman 660 00:49:29,013 --> 00:49:32,613 was fired and then get hired back and so on and so forth. 661 00:49:33,083 --> 00:49:37,233 in the background, I think it's really just those two forces at play, so 662 00:49:37,253 --> 00:49:44,153 the force wants to make sure that, AI does not go out of control, harm human 663 00:49:44,153 --> 00:49:50,923 beings and at the same time, there is huge pressure from, industry peers to 664 00:49:51,418 --> 00:49:56,638 Commercialize those applications to make profits, Actually I'm glad that, Elon 665 00:49:56,638 --> 00:50:02,918 Musk actually made the lawsuit in the sense that it may, swing the pendulum 666 00:50:02,938 --> 00:50:07,408 to the other side so eventually what I think, uh, the view that we should 667 00:50:07,413 --> 00:50:13,348 commercialize and make money out of it, I think that kind of view prevailed, right? 668 00:50:13,348 --> 00:50:20,738 that's why Sam Altman got hired back, but that can go too far, because, in the 669 00:50:20,758 --> 00:50:29,018 process of competition, making profits, you may sacrifice security, so I think, 670 00:50:29,918 --> 00:50:36,538 the lawsuit by Elon Musk can potentially put the original mission in check. 
671 00:50:36,858 --> 00:50:44,688 So to speak, and maybe, force OpenAI and other tech companies to think 672 00:50:44,708 --> 00:50:49,433 more about, guardrails around, AI to make sure It doesn't go out 673 00:50:49,433 --> 00:50:53,153 of control and harm human beings, 674 00:50:53,153 --> 00:50:59,083 time will tell if anything comes out of it other than, one billionaire 675 00:50:59,123 --> 00:51:01,653 being upset at another, but we'll see. 676 00:51:02,693 --> 00:51:06,033 So I'm going to ask you for one more prediction, and this time 677 00:51:06,053 --> 00:51:07,543 a little bit more down-to-earth. 678 00:51:08,343 --> 00:51:08,723 Pytorch. 679 00:51:09,593 --> 00:51:14,433 It appears to be still on the rise and, it appears to be the kind of 680 00:51:14,503 --> 00:51:17,443 go-to option for any new papers. 681 00:51:18,013 --> 00:51:21,013 TensorFlow seems to be, stagnating a little bit. 682 00:51:22,093 --> 00:51:25,483 you talked a little bit about the advantages of PyTorch and 683 00:51:25,483 --> 00:51:26,923 why you chose it for your book. 684 00:51:27,023 --> 00:51:31,133 and, I'm wondering, do you see this being like the prevailing platform? 685 00:51:31,693 --> 00:51:36,433 because now I think that the main kind of breakthroughs for Pytorch was, you 686 00:51:36,433 --> 00:51:41,463 mentioned the GPU support, obviously, and also the built in, backpropagation, 687 00:51:41,573 --> 00:51:45,963 right, the autograd now, the other frameworks also provide the autograd. 688 00:51:46,443 --> 00:51:50,013 so I guess they're closing up the gap a little bit in that respect, if 689 00:51:50,013 --> 00:51:54,063 you were to venture one more crazy prediction, would you see Pytorch 690 00:51:54,283 --> 00:51:55,833 leading the way going forward? 691 00:51:56,648 --> 00:51:59,208 Are you going to update your book in a couple of years to 692 00:51:59,238 --> 00:52:01,508 port it to some other framework? 
693 00:52:02,613 --> 00:52:09,743 I think PyTorch is going to prevail in the near future. 694 00:52:09,743 --> 00:52:13,297 So I mentioned this in my book. 695 00:52:13,297 --> 00:52:20,633 So what PyTorch does is use a dynamic computational graph, which means it 696 00:52:20,633 --> 00:52:26,503 creates the computational graph on the fly so that it's faster, it's more flexible. 697 00:52:26,973 --> 00:52:31,593 TensorFlow is using a static computational graph, 698 00:52:31,643 --> 00:52:33,253 so it's slower. 699 00:52:33,793 --> 00:52:35,993 So that's the main difference. 700 00:52:36,063 --> 00:52:40,243 And it affects the training speed greatly. 701 00:52:40,533 --> 00:52:46,103 So in TensorFlow, you don't really have to worry about which device you can use. 702 00:52:46,443 --> 00:52:50,873 It's all done at the backend automatically by TensorFlow, 703 00:52:51,083 --> 00:52:56,853 but at a cost. If you have industry-scale models, and then you have a lot 704 00:52:56,853 --> 00:53:04,503 of GPUs and you do a huge calculation, maybe the overhead is negligible. 705 00:53:04,523 --> 00:53:11,433 It doesn't affect things much, but for a lot of researchers it makes a huge difference 706 00:53:11,453 --> 00:53:17,393 because we are already working with a lot of toy models, not huge. Therefore, if you 707 00:53:17,393 --> 00:53:23,023 use PyTorch, there is a little bit of inconvenience in the sense that you 708 00:53:23,023 --> 00:53:31,303 have to specify whether to move this tensor to GPU, and then once you are 709 00:53:31,303 --> 00:53:34,323 done with it, you have to get it back. 710 00:53:35,723 --> 00:53:39,613 But the benefit is huge because it greatly 711 00:53:40,783 --> 00:53:42,963 increases the training speed. 712 00:53:43,353 --> 00:53:50,373 I think, at least for small players, regular readers, and also 713 00:53:50,393 --> 00:53:53,463 for researchers around the world, 714 00:53:53,523 --> 00:53:56,493 I think PyTorch is much more convenient.
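Illustrative sketch, not part of the conversation: the explicit device handling mentioned above (move the tensor to the GPU, then get the result back) looks roughly like this in PyTorch. The tiny linear model and the batch shape are placeholders chosen for illustration; the snippet falls back to the CPU when no CUDA device is present, so it runs on any machine.

```python
import torch

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)  # move the model's weights to the device

batch = torch.randn(8, 4)   # tensors are created on the CPU by default
batch = batch.to(device)    # explicit move to the chosen device

with torch.no_grad():       # inference only, no gradients needed
    output = model(batch)   # computed on the chosen device

result = output.cpu()       # bring the result back to the CPU
print(result.shape, result.device)
```

Model and data have to live on the same device; mixing a CPU tensor with a GPU model raises an error, which is the small inconvenience being described.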
715 00:53:56,693 --> 00:53:57,733 It's much faster. 716 00:53:58,193 --> 00:54:03,383 And certain large corporations, they may not care that much. 717 00:54:04,803 --> 00:54:09,793 For regular people PyTorch is much more convenient, it's much faster 718 00:54:09,993 --> 00:54:12,753 and in the near term it may win out. 719 00:54:13,393 --> 00:54:18,208 For anybody listening to this, I know that if I hadn't read your book 720 00:54:18,208 --> 00:54:20,038 before, I would probably be on manning.com, 721 00:54:20,248 --> 00:54:21,528 looking at it. 722 00:54:21,588 --> 00:54:26,248 And then at some point I would reach chapter 4, where you're walking 723 00:54:26,398 --> 00:54:32,568 us through building a network that does generation of anime faces, 724 00:54:33,908 --> 00:54:36,358 which I thought was a pretty cool example. 725 00:54:37,663 --> 00:54:42,503 Can you give us a taste, for anybody who's going to be doing that? 726 00:54:42,603 --> 00:54:45,563 What's the training gonna look like? 727 00:54:45,713 --> 00:54:49,463 What data we're going to use, how we're going to implement a network. 728 00:54:49,463 --> 00:54:54,523 And then in terms of training, what kind of hardware you need for the 729 00:54:54,523 --> 00:54:59,293 training to be quick, how much time you need to see for that. 730 00:54:59,423 --> 00:55:03,083 Give us an idea whether this is something that someone who is comfortable with 731 00:55:03,083 --> 00:55:08,753 Python can just pick up on a Sunday, on a random weekend and go through, or whether 732 00:55:08,753 --> 00:55:10,823 there's any extra prep that's needed. 733 00:55:11,813 --> 00:55:18,922 In order to train a GAN model to produce color images 734 00:55:18,952 --> 00:55:24,367 of anime faces, obviously you need the training data, right? 735 00:55:24,417 --> 00:55:30,527 The research community has a lot of human-created data 736 00:55:30,667 --> 00:55:33,467 for us to experiment on.
737 00:55:33,887 --> 00:55:38,367 So you can actually go to a website and download the anime faces, 738 00:55:38,952 --> 00:55:44,942 I think tens of thousands of them, and then you need to 739 00:55:44,952 --> 00:55:47,902 create two neural networks. 740 00:55:48,132 --> 00:55:54,212 One is the generator, one is the discriminator, and the generator is 741 00:55:54,282 --> 00:56:01,867 trying to create an image that can pass as real in front of the discriminator. 742 00:56:02,357 --> 00:56:07,357 You just train the model for many rounds, and then eventually you will 743 00:56:07,357 --> 00:56:13,307 see that the generator is able to generate an anime face which is very 744 00:56:13,307 --> 00:56:15,857 much like the ones from the training set. 745 00:56:16,347 --> 00:56:21,257 I want to mention that in order to generate color images of 746 00:56:21,337 --> 00:56:25,297 human faces, you do need to use convolutional neural networks, 747 00:56:25,317 --> 00:56:27,197 because, as we mentioned earlier, 748 00:56:27,197 --> 00:56:30,797 if you use fully connected, dense neural networks, 749 00:56:31,097 --> 00:56:35,797 there are just too many parameters, and then the training will be too slow. 750 00:56:36,067 --> 00:56:40,057 On the other hand, if you use convolutional neural 751 00:56:40,057 --> 00:56:42,255 networks, you localize the weights. 752 00:56:42,255 --> 00:56:47,077 So the weights will stay the same in a filter, and then you 753 00:56:47,077 --> 00:56:49,487 move the filter around the image. 754 00:56:49,667 --> 00:56:54,117 So that's a way of greatly reducing the number of parameters in the model and 755 00:56:54,147 --> 00:56:56,927 making the model training much faster. 756 00:56:57,887 --> 00:57:00,997 This is on the software side, on the training side. 757 00:57:01,357 --> 00:57:12,022 In terms of hardware, I trained it using a GeForce RTX 2060 GPU. 758 00:57:12,342 --> 00:57:15,192 I think right now the cost is three or four hundred bucks.
759 00:57:15,212 --> 00:57:19,612 It's not that expensive. You can easily buy it, or if you have an older 760 00:57:19,652 --> 00:57:23,872 gaming computer, you can just grab it and then put it in your computer. 761 00:57:24,102 --> 00:57:27,332 It's very easy to do, you don't really need a lot of knowledge 762 00:57:27,332 --> 00:57:29,352 about computer hardware to do it. 763 00:57:29,432 --> 00:57:33,993 Nowadays, computers are very user friendly. You can just pop it open and change 764 00:57:34,073 --> 00:57:35,403 parts very fast, that kind of stuff. 765 00:57:35,883 --> 00:57:40,081 So it took me like 30 minutes to an hour to train the model. 766 00:57:40,331 --> 00:57:41,461 So it's very fast. 767 00:57:42,651 --> 00:57:49,091 However, if you don't really want to bother with the GPU, you can train 768 00:57:49,091 --> 00:57:55,271 the same model with the CPU, and what you can do is simply 769 00:57:55,311 --> 00:58:00,486 leave your computer on all night. It may take five, six or seven hours, 770 00:58:00,756 --> 00:58:02,946 but it can be easily done overnight. 771 00:58:02,966 --> 00:58:09,556 You just leave the program on, go to sleep, and next morning you see the result. 772 00:58:09,576 --> 00:58:13,085 So in that sense, computationally, it's not that costly. 773 00:58:14,295 --> 00:58:22,330 I think the most complicated model would be in chapter six, where you have to convert 774 00:58:22,330 --> 00:58:25,990 a horse image into a zebra image. 775 00:58:26,370 --> 00:58:31,850 It's called CycleGAN, and then you have to convert blond hair to black hair 776 00:58:31,880 --> 00:58:34,700 in images, or black hair to blond hair. 777 00:58:35,020 --> 00:58:37,630 Those kinds of models are a little bit more 778 00:58:38,020 --> 00:58:42,740 time consuming, because you are using higher resolution, number one. 779 00:58:42,740 --> 00:58:48,920 Number two, you are actually training two generators and two discriminators.
780 00:58:49,090 --> 00:58:55,580 Okay, so how CycleGAN works is that you have two generators. Let's use a horse 781 00:58:55,581 --> 00:59:00,080 and a zebra as the example: how to convert a horse image to a zebra image, right? 782 00:59:00,240 --> 00:59:01,490 So you have two generators. 783 00:59:01,685 --> 00:59:07,925 One generator is called a horse generator, the other one is called a zebra generator. 784 00:59:08,885 --> 00:59:15,535 So what the horse generator does is that it takes in a zebra image 785 00:59:15,845 --> 00:59:18,805 and converts it into a horse image. 786 00:59:19,275 --> 00:59:25,565 And then what the zebra generator does is that it will take a horse 787 00:59:25,570 --> 00:59:27,575 image and convert it into a zebra. 788 00:59:28,385 --> 00:59:30,755 And then you also have two discriminators. 789 00:59:30,885 --> 00:59:38,265 The horse discriminator will tell whether an image is a horse image or not, and 790 00:59:38,265 --> 00:59:45,185 then the zebra discriminator will tell if an image is a zebra image or not. 791 00:59:45,365 --> 00:59:51,810 And then CycleGAN has another element: the loss function has a 792 00:59:51,860 --> 00:59:53,940 component called a cycle loss. 793 00:59:54,020 --> 00:59:54,740 So what do you do? 794 00:59:54,870 --> 00:59:57,830 So I think the idea is really ingenious. 795 00:59:57,860 --> 01:00:03,505 That's why I mentioned that with the right loss function you can achieve anything. 796 01:00:03,605 --> 01:00:06,385 Originally you have a horse image, right? 797 01:00:06,575 --> 01:00:15,501 And then you give that image to the zebra generator to create a fake zebra image. 798 01:00:16,241 --> 01:00:16,531 Okay. 799 01:00:16,591 --> 01:00:25,551 Now, you will use that fake zebra image as input to the horse generator, and ask 800 01:00:25,591 --> 01:00:35,181 the horse generator to convert the fake zebra image into a fake horse image.
801 01:00:35,301 --> 01:00:41,931 Now here is the key: if both generators do their job right, then 802 01:00:42,891 --> 01:00:49,711 the fake horse image you got will be identical to the original horse 803 01:00:49,711 --> 01:00:52,141 image. So that's called a CycleGAN. 804 01:00:52,164 --> 01:00:56,444 The cycle loss is trying to minimize the loss between 805 01:00:58,194 --> 01:01:04,044 the original horse image and the fake horse image after a round trip. 806 01:01:04,944 --> 01:01:10,484 That's a very powerful tool because that forces the model, both models, 807 01:01:10,554 --> 01:01:17,399 both the zebra generator and the horse generator, to generate realistic images. 808 01:01:17,849 --> 01:01:22,949 So since your show is called HockeyStick, I think when 809 01:01:22,949 --> 01:01:27,289 I was trying to experiment with the different models, that was 810 01:01:27,369 --> 01:01:29,479 pretty much like a hockey stick moment. 811 01:01:29,539 --> 01:01:34,929 When I saw that, I was like, this cycle loss is really 812 01:01:34,969 --> 01:01:43,124 ingenious, because that component in the loss function is crucial for you to 813 01:01:43,124 --> 01:01:48,494 successfully convert a horse image into a zebra and a zebra image into a horse. 814 01:01:49,034 --> 01:01:54,694 When I saw that I was completely amazed, not just by how well the model 815 01:01:54,694 --> 01:02:01,824 works, but also by the ingenious mechanism devised by the researchers. 816 01:02:01,834 --> 01:02:05,094 Again, there are tons of smart people in the profession. 817 01:02:05,354 --> 01:02:11,239 So sometimes I see what they are doing, and once I understand what they 818 01:02:11,239 --> 01:02:13,709 are doing, I am completely amazed. 819 01:02:13,709 --> 01:02:18,379 I say, this method is amazing, the author must be a genius. I think there 820 01:02:18,479 --> 01:02:20,349 are tons of geniuses in our profession. 821 01:02:22,344 --> 01:02:23,264 Love that story.
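The round trip described above (horse, to fake zebra, back to fake horse) and the cycle loss that compares the result with the original can be sketched in a few lines. This is only a toy illustration under loud assumptions: plain Python lists stand in for image tensors, the two "generators" are hypothetical one-line functions rather than neural networks, and mean absolute difference is used as one common choice of distance.

```python
# Toy cycle-consistency loss. Lists of floats stand in for images, and the
# generators are hypothetical stand-ins, not trained networks.

def zebra_generator(horse_image):
    """Hypothetical horse -> zebra mapping: shift every pixel up by 0.5."""
    return [pixel + 0.5 for pixel in horse_image]

def horse_generator(zebra_image):
    """Hypothetical zebra -> horse mapping: shift every pixel down by 0.5."""
    return [pixel - 0.5 for pixel in zebra_image]

def sloppy_horse_generator(zebra_image):
    """A worse zebra -> horse mapping that does not fully undo the shift."""
    return [pixel - 0.25 for pixel in zebra_image]

def cycle_loss(original, reconstructed):
    """Mean absolute difference between an image and its round-trip copy."""
    total = sum(abs(a - b) for a, b in zip(original, reconstructed))
    return total / len(original)

real_horse = [0.0, 0.25, 0.5, 1.0]
fake_zebra = zebra_generator(real_horse)            # horse -> fake zebra

# When the two generators invert each other, the round trip is lossless.
good_loss = cycle_loss(real_horse, horse_generator(fake_zebra))

# When they do not, the cycle loss is positive and training would push it down.
sloppy_loss = cycle_loss(real_horse, sloppy_horse_generator(fake_zebra))

print(good_loss, sloppy_loss)
```

In a real CycleGAN this term is added to the usual adversarial losses from the two discriminators, so the generators are pushed both to fool the discriminators and to stay invertible.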
822 01:02:23,294 --> 01:02:27,614 And also FYI, I'm totally stealing the quote from you: 823 01:02:27,724 --> 01:02:29,894 with the right loss function, 824 01:02:30,014 --> 01:02:31,124 you can achieve anything. 825 01:02:31,124 --> 01:02:32,614 I think this should go on a T-shirt. 826 01:02:34,199 --> 01:02:34,839 That's right, yeah. 827 01:02:35,319 --> 01:02:37,459 With the right loss function, you can achieve anything. 828 01:02:38,529 --> 01:02:43,194 That's my belief. The concept of the loss function is very powerful. 829 01:02:44,059 --> 01:02:47,639 So the loss function is another way of saying the objective function, right? 830 01:02:47,639 --> 01:02:51,819 You are telling the model what to achieve, what to do. It's very powerful. 831 01:02:52,284 --> 01:02:59,484 Yeah, I think what keeps striking me is that once you go and look into these ideas, 832 01:02:59,744 --> 01:03:04,284 they're not actually that complicated, there's not too much magic in it, but 833 01:03:04,324 --> 01:03:09,224 to come up with that idea initially, to be the first one to propose that, it 834 01:03:09,224 --> 01:03:11,974 does require a certain level of genius. 835 01:03:12,544 --> 01:03:16,664 So I think, probably decades from now, kids will be learning a lot 836 01:03:16,664 --> 01:03:20,274 of that stuff in primary school or early in their education. 837 01:03:20,864 --> 01:03:26,074 And it just feels like we're really experiencing some kind of breakthrough 838 01:03:26,114 --> 01:03:28,934 in this profession, a hockey stick moment. 839 01:03:30,387 --> 01:03:31,037 Absolutely. 840 01:03:31,037 --> 01:03:34,667 It's good that a lot of smart researchers are working in the field. 841 01:03:34,997 --> 01:03:39,177 And sometimes when you get stuck on a question, you may 842 01:03:39,297 --> 01:03:40,997 work on it for years, right?
843 01:03:40,997 --> 01:03:46,207 Without any breakthrough, and then suddenly, last year, like a strong line, 844 01:03:46,627 --> 01:03:51,687 year after year, suddenly, there is an aha moment, and then you figure out the 845 01:03:51,707 --> 01:03:54,667 way to tackle the problem and it works. 846 01:03:54,957 --> 01:03:59,712 And then that method may become revolutionary, it may 847 01:03:59,972 --> 01:04:01,582 completely change the field. 848 01:04:03,102 --> 01:04:05,372 You're about to finish your book. 849 01:04:05,912 --> 01:04:08,672 Is there anything that you would do differently if you 850 01:04:08,672 --> 01:04:10,702 were starting to write it today? 851 01:04:11,722 --> 01:04:13,632 Would you make any different choices? 852 01:04:15,096 --> 01:04:16,236 Good question. 853 01:04:16,336 --> 01:04:20,156 I don't think there are many things I would change. 854 01:04:20,476 --> 01:04:24,636 The reason is that even though it's a new book, I have actually been 855 01:04:24,636 --> 01:04:34,486 working on it for a couple of years now. I had a GitHub repository before 856 01:04:34,546 --> 01:04:41,536 I submitted a proposal to Manning, so it's my way of working things out. 857 01:04:41,566 --> 01:04:46,766 A couple of years ago I started to use PyTorch for machine learning 858 01:04:46,766 --> 01:04:49,021 models and I started to get into 859 01:04:50,151 --> 01:04:59,301 generative AI, and then I started to use PyTorch to generate shapes, images, 860 01:04:59,521 --> 01:05:04,641 and then eventually I got into natural language processing, large language 861 01:05:04,641 --> 01:05:07,571 models, and then I had a lot of projects 862 01:05:08,411 --> 01:05:16,161 on my computer. Writing a book is my way of organizing things, to think things 863 01:05:16,161 --> 01:05:20,331 through, to make sure everything works out. 864 01:05:21,051 --> 01:05:25,931 But I know that, in order to write a compelling proposal,
865 01:05:26,951 --> 01:05:30,521 I need to first prepare well, right? 866 01:05:30,521 --> 01:05:35,346 Especially since there are not too many good publishers out there, so you only 867 01:05:35,346 --> 01:05:38,986 have one shot with a good publisher. 868 01:05:38,986 --> 01:05:42,786 Manning is one of the great publishers. 869 01:05:42,846 --> 01:05:48,246 Over the years I've read many books from Manning and I really enjoyed their books, 870 01:05:48,856 --> 01:05:57,316 and I knew that I needed to write a good proposal in order to work it out. 871 01:05:57,406 --> 01:05:59,026 I didn't want to lose the chance. 872 01:05:59,606 --> 01:06:05,986 So what I did was, in the summer, I spent several months to 873 01:06:06,026 --> 01:06:10,556 create a huge GitHub repository. 874 01:06:11,006 --> 01:06:17,156 So I laid out all the chapters initially, like the first draft, and 875 01:06:17,156 --> 01:06:24,636 it had 17 chapters, and for each chapter I used a Jupyter notebook to explain 876 01:06:24,726 --> 01:06:27,666 everything to the best of my ability. 877 01:06:27,866 --> 01:06:29,236 All the code is there. 878 01:06:29,446 --> 01:06:31,596 So it's pretty much like a book. 879 01:06:33,056 --> 01:06:40,206 Once I had that, I spent another month to convert it 880 01:06:40,206 --> 01:06:43,326 into an actual book, a PDF file. 881 01:06:43,686 --> 01:06:45,846 A lot of tech people use LaTeX. 882 01:06:45,856 --> 01:06:49,896 LaTeX is a word processing software, right? 883 01:06:49,946 --> 01:06:54,846 Especially if you have a lot of math, you can actually generate beautiful 884 01:06:54,946 --> 01:06:59,266 equations. My book has some equations, some math, but not a whole lot. 885 01:06:59,736 --> 01:07:06,261 But it forced me to go through everything one more time, in 886 01:07:06,291 --> 01:07:13,181 the process of converting the GitHub repository into a PDF file. 887 01:07:14,041 --> 01:07:16,471 I spent a lot of months converting everything.
888 01:07:17,061 --> 01:07:22,211 And also it looks beautiful because it looks exactly like a book. 889 01:07:22,421 --> 01:07:27,921 You have a template, you have a cover, you have a table of contents, you have each 890 01:07:27,921 --> 01:07:33,271 chapter, the section number, the section title, the subsections, and so 891 01:07:33,271 --> 01:07:37,931 forth. You have images. In short, it's pretty much like a book to be published. 892 01:07:37,951 --> 01:07:44,556 And then I sent that to Manning in the summer, along with the PDF file, 893 01:07:44,596 --> 01:07:51,196 along with the proposal file, and I had a link to the GitHub page. 894 01:07:51,646 --> 01:07:57,566 And then what Manning did was send the book proposal to more than 895 01:07:57,566 --> 01:07:59,716 10 reviewers in the profession. 896 01:07:59,726 --> 01:08:07,196 The reviewers are all data scientists, people who know AI in the profession, 897 01:08:07,486 --> 01:08:12,926 and they gave comments on whether this book should be published. And then they 898 01:08:12,936 --> 01:08:15,241 gave a lot of very valuable feedback. 899 01:08:16,056 --> 01:08:21,466 The feedback was very positive, partly because it's a hot topic, partly because 900 01:08:21,466 --> 01:08:23,966 I spent a lot of time preparing it, right? 901 01:08:24,296 --> 01:08:27,226 But I did receive a lot of good feedback. 902 01:08:27,256 --> 01:08:32,676 So to answer your question: because I have been through several rounds 903 01:08:32,936 --> 01:08:38,636 now, there's not much I would change, because I have already incorporated 904 01:08:38,836 --> 01:08:45,256 some great feedback from about 12 reviewers on the proposal. 905 01:08:45,706 --> 01:08:46,006 Fair enough. 906 01:08:46,454 --> 01:08:48,454 How many copies have you sold so far? 907 01:08:49,596 --> 01:08:51,946 It's already sold more than a thousand copies. 908 01:08:52,046 --> 01:08:56,556 I think its daily high was 58.
909 01:08:56,806 --> 01:09:05,526 So it says a lot about the demand for generative AI, and if you look at 910 01:09:05,576 --> 01:09:13,276 the top 10 from the Manning website every week, you will see generative AI is hot. 911 01:09:13,876 --> 01:09:14,956 A lot of demand. 912 01:09:14,956 --> 01:09:18,726 And another trend is Python and PyTorch. 913 01:09:18,796 --> 01:09:24,066 I think a lot of people are switching to PyTorch, and I 914 01:09:24,066 --> 01:09:27,316 think there is a book from Manning called "Deep Learning with PyTorch". 915 01:09:27,316 --> 01:09:29,246 It's selling very well. 916 01:09:29,526 --> 01:09:33,476 And then there's another book called "Large Language Models from Scratch". 917 01:09:33,496 --> 01:09:37,616 Actually, that book is also using PyTorch, just as I do. 918 01:09:37,776 --> 01:09:41,261 But that one just focuses on large language models, whereas 919 01:09:41,261 --> 01:09:46,591 my book focuses on many different kinds of content, like large language models, 920 01:09:47,031 --> 01:09:54,341 music, images, shapes, numbers. And then another thing I want to 921 01:09:54,391 --> 01:10:00,771 mention is that I did spend a lot of time thinking about how to help 922 01:10:01,576 --> 01:10:05,846 readers learn progressively, step by step. 923 01:10:06,396 --> 01:10:09,966 Chapter one, of course, is an overview of the book, of the 924 01:10:10,106 --> 01:10:14,236 generative AI landscape, and what the book is trying to accomplish. 925 01:10:14,726 --> 01:10:18,406 Chapter two is deep learning with PyTorch. 926 01:10:18,616 --> 01:10:20,276 So even if readers 927 01:10:21,261 --> 01:10:24,441 have no background using PyTorch, 928 01:10:24,821 --> 01:10:29,011 after reading chapter two, they will be able to use PyTorch to 929 01:10:29,011 --> 01:10:31,021 create deep learning models. 930 01:10:31,301 --> 01:10:35,306 From A to Z, you can do the whole thing. 931 01:10:35,396 --> 01:10:35,816 Okay?
932 01:10:36,146 --> 01:10:37,436 So that's very important. 933 01:10:37,586 --> 01:10:41,136 And then chapter three, we get into GANs. 934 01:10:41,316 --> 01:10:45,186 So you will use GANs to generate numbers and shapes. 935 01:10:45,426 --> 01:10:47,696 So the models are very simple. 936 01:10:47,696 --> 01:10:51,626 You only have two or three layers of neurons in those models. 937 01:10:51,636 --> 01:10:53,986 So therefore, it's very easy to understand. 938 01:10:54,246 --> 01:10:59,326 It's easy to create, and the training takes a matter of minutes. 939 01:10:59,656 --> 01:11:03,786 Readers will not get frustrated because everything is so simple. 940 01:11:04,186 --> 01:11:05,906 And then in chapter four, 941 01:11:07,086 --> 01:11:08,266 I kick things up a notch. 942 01:11:09,546 --> 01:11:17,476 So instead of using fully connected dense layers, I use convolutional layers. 943 01:11:17,886 --> 01:11:20,766 That's needed for image processing. 944 01:11:20,836 --> 01:11:26,196 If you want to create high resolution color images, fully 945 01:11:26,236 --> 01:11:29,756 connected dense layers won't work. They may work, but it's very slow. 946 01:11:30,196 --> 01:11:33,336 On the other hand, if you use convolutional layers, it's much faster 947 01:11:33,376 --> 01:11:39,767 because you use filters that move around the image, and then you just train 948 01:11:39,797 --> 01:11:42,578 the weights in the filter itself. 949 01:11:42,578 --> 01:11:46,948 So that's much more efficient, that kind of stuff. And so 950 01:11:46,988 --> 01:11:52,848 people learn to use the convolutional layers in chapter four to generate 951 01:11:52,868 --> 01:11:59,128 the color images, and then in chapter five I kick things up another level.
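To put rough numbers on the "just train the weights in the filter itself" point, here is a back-of-the-envelope parameter count. It is a sketch under assumed sizes, not figures from the book: a 64x64 color image, a dense layer with 1,024 outputs, and a convolutional layer with 64 filters of size 4x4 are all hypothetical choices for illustration.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolutional layer with square filters.

    The count does not depend on the image size, because the same filter
    weights are reused at every position in the image.
    """
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

# Assumed sizes, chosen only for illustration.
pixels = 3 * 64 * 64                    # a flattened 64x64 color image
dense = dense_params(pixels, 1024)      # dense layer with 1,024 outputs
conv = conv_params(3, 64, 4)            # 64 filters of size 4x4 over 3 channels

print(pixels, dense, conv)
```

With these assumed sizes the dense layer needs about 12.6 million parameters while the convolutional layer needs a few thousand, which is the gap the weight sharing buys.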
952 01:11:59,758 --> 01:12:06,468 So readers learn to select characteristics in images. You can 953 01:12:06,468 --> 01:12:12,348 choose to generate an image with eyeglasses or without eyeglasses. You 954 01:12:12,348 --> 01:12:17,028 can transition from an image with glasses to an image without glasses. 955 01:12:17,558 --> 01:12:22,178 So all that arithmetic kind of stuff. And then chapter six is not out 956 01:12:22,178 --> 01:12:27,498 yet, but it will do the CycleGAN, which is computationally costly for the 957 01:12:27,498 --> 01:12:31,948 reason I just mentioned: it has two generators and two discriminators. 958 01:12:32,638 --> 01:12:38,208 And then chapter seven is about variational autoencoders. 959 01:12:38,208 --> 01:12:40,208 That's a different model from a GAN. 960 01:12:40,438 --> 01:12:45,328 That is important, because it has an encoder-decoder architecture. 961 01:12:45,548 --> 01:12:47,248 We see that it's very common 962 01:12:47,493 --> 01:12:54,093 in machine learning models. For example, ChatGPT is a decoder-only model; the 963 01:12:54,133 --> 01:13:00,823 original transformer paper, "Attention Is All You Need", has an encoder part 964 01:13:00,913 --> 01:13:02,693 and a decoder part, that kind of stuff. 965 01:13:02,963 --> 01:13:08,283 And then after that, I get into transformers, natural language processing, 966 01:13:08,303 --> 01:13:15,713 how to do tokenization, how to create a transformer from scratch, including 967 01:13:15,723 --> 01:13:21,543 a ChatGPT-style one. You can create a GPT from scratch, you can train it. 968 01:13:21,603 --> 01:13:25,523 I saw that you have several posts on LinkedIn about how to 969 01:13:25,523 --> 01:13:27,293 create a GPT from scratch, right? 970 01:13:27,343 --> 01:13:33,523 My book does exactly that in chapter 10, how to create a GPT from scratch. 971 01:13:33,563 --> 01:13:39,653 And then chapter 11 is how to create a small GPT from scratch and then train it
972 01:13:39,838 --> 01:13:40,858 To generate text. 973 01:13:41,078 --> 01:13:46,218 Its focus is not mainly on creating, but on training a GPT from scratch. 974 01:13:46,248 --> 01:13:47,908 Of course, it's much smaller. 975 01:13:47,908 --> 01:13:50,278 It only has 5 million parameters. 976 01:13:50,618 --> 01:13:53,458 But you learn how to train a model from scratch. 977 01:13:53,878 --> 01:14:00,088 And after that it's music generation, and then different models, and then how you 978 01:14:00,118 --> 01:14:06,268 can use LangChain to chain together different large language models. 979 01:14:06,278 --> 01:14:08,368 So that's the whole book. 980 01:14:08,398 --> 01:14:11,648 It's been a real pleasure to talk to you. 981 01:14:11,698 --> 01:14:14,968 I'm personally super excited, can't wait until the rest of 982 01:14:14,968 --> 01:14:16,438 the chapters become available. 983 01:14:16,438 --> 01:14:21,878 So, you know, hurry up. Before I let you go, 984 01:14:22,378 --> 01:14:28,533 I'm curious whether you have your next idea for your next book already in 985 01:14:28,533 --> 01:14:33,063 mind or whether you're going to take a small break before book number four. 986 01:14:33,768 --> 01:14:38,538 So far I'm very busy with writing the current book. 987 01:14:39,048 --> 01:14:41,768 I do get ideas from time to time. 988 01:14:42,218 --> 01:14:48,573 One example is, I think this text-to-image, 989 01:14:48,573 --> 01:14:51,573 multimodal model thing is amazing. 990 01:14:51,833 --> 01:14:56,143 I think there could be another book there, just focused purely 991 01:14:56,183 --> 01:15:03,303 on diffusion models and multimodal transformers, how to convert text 992 01:15:03,303 --> 01:15:06,323 to image, or convert text to video. 993 01:15:06,373 --> 01:15:07,763 There could be a book there.
994 01:15:07,813 --> 01:15:12,363 I thought about it, but I didn't spend a lot of time on it because I'm busy 995 01:15:12,673 --> 01:15:18,823 writing the current book. And the other idea I thought about, this is 996 01:15:18,823 --> 01:15:21,783 also related to multimodal models. 997 01:15:21,843 --> 01:15:24,223 My first book is called Make Python Talk, right? 998 01:15:24,393 --> 01:15:30,843 But it's actually using the Google API to do the actual speech 999 01:15:30,843 --> 01:15:32,453 recognition and text to speech. 1000 01:15:32,843 --> 01:15:35,533 I don't do any of the machine learning part. 1001 01:15:35,783 --> 01:15:40,813 So I just use the Google API to do all the heavy lifting. But there are 1002 01:15:41,093 --> 01:15:43,103 open source models out there. 1003 01:15:43,103 --> 01:15:45,203 You can actually train a model 1004 01:15:45,648 --> 01:15:50,988 to do speech recognition, so that's actually a multimodal model, right? 1005 01:15:50,988 --> 01:15:56,498 Because with speech recognition, basically the input is audio and the output is text, right? 1006 01:15:56,808 --> 01:16:00,068 And then you can also do text to speech. 1007 01:16:00,138 --> 01:16:03,298 That can be another interesting project. 1008 01:16:03,308 --> 01:16:06,658 I have some ideas on how they work, but I do have to spend 1009 01:16:06,758 --> 01:16:10,208 a lot of time to experiment. 1010 01:16:10,568 --> 01:16:15,998 So I would say in another two or three years, I may venture into 1011 01:16:15,998 --> 01:16:19,008 one of those ideas and maybe write another book about it. 1012 01:16:19,868 --> 01:16:20,308 Awesome. 1013 01:16:20,328 --> 01:16:23,418 You're going to have one reader already interested in that. 1014 01:16:23,428 --> 01:16:24,918 So definitely go for it.
1015 01:16:25,440 --> 01:16:31,760 Okay, let me ask you then, which idea do you like better, the speech recognition 1016 01:16:31,770 --> 01:16:38,150 model or just a book about text to image, multimodal transformers? 1017 01:16:38,590 --> 01:16:39,690 Which idea do you like better? 1018 01:16:40,096 --> 01:16:45,046 I've been meaning to properly read the Whisper paper. 1019 01:16:45,506 --> 01:16:50,946 So I think speech recognition is actually a pretty good use 1020 01:16:51,116 --> 01:16:54,306 case, and I would definitely be interested in reading that. 1021 01:16:55,033 --> 01:16:55,583 Good to know. 1022 01:16:55,603 --> 01:16:57,683 I may put more emphasis on that project. 1023 01:16:58,308 --> 01:16:58,768 Awesome. 1024 01:16:58,978 --> 01:16:59,528 the feedback. 1025 01:17:00,348 --> 01:17:00,708 All right. 1026 01:17:00,818 --> 01:17:01,678 Thank you so much. 1027 01:17:01,728 --> 01:17:05,078 It's been a pleasure, and hopefully I'll get you next time with your next book. 1028 01:17:05,558 --> 01:17:06,208 Thanks a lot. 1029 01:17:07,368 --> 01:17:07,978 Thank you.