1 00:00:00,000 --> 00:00:03,110 Miko Pawlikowski: I'm Miko Pawlikowski, and this is Hockey Stick. 2 00:00:06,490 --> 00:00:08,740 Starting with generative AI can be daunting. 3 00:00:08,970 --> 00:00:12,239 There's a lot of hype, a lot of development, and a lot of change, daily. 4 00:00:12,730 --> 00:00:22,099 It's easy enough to launch ChatGPT and ask for a poem on how Vim is superior to Emacs, but to get value from it professionally requires a bit more skill. 5 00:00:22,660 --> 00:00:29,095 Today, I'm joined by Amit Bahree, the author of "Generative AI in Action", a brand new book published by Manning. 6 00:00:29,525 --> 00:00:39,575 Amit is a principal group technical program manager at Microsoft, where he leads the engineering team that builds the next generation of AI products and services on the Azure AI platform. 7 00:00:40,125 --> 00:00:48,465 He has over 25 years of experience in technology and product development, including the artificial intelligence and cloud platforms fields. 8 00:00:49,235 --> 00:00:52,555 And yes, you will learn what his mom thinks about ChatGPT. 9 00:00:53,264 --> 00:00:56,554 Welcome to this episode and thank you for flying Hockey Stick. 10 00:00:57,066 --> 00:00:58,996 Let's start right away. 11 00:00:59,096 --> 00:01:00,716 Why did you write the book? 12 00:01:01,206 --> 00:01:13,905 Amit Bahree: Being in the AI platform team at Microsoft, one of my roles, which more or less became the day job over the last year and a half, was meeting with a lot of our customers, which are generally large enterprises, where 13 00:01:13,905 --> 00:01:23,136 everybody wanted to know how to use Gen AI, which obviously took over the world. As I joke, my mom's a ChatGPT expert. 14 00:01:23,266 --> 00:01:24,666 Miko Pawlikowski: Oh yeah, I bet she is. 15 00:01:26,066 --> 00:01:33,726 Amit Bahree: And I basically, at the end of the day, got tired of answering and guiding the same thing again and again across multiple customers. 
16 00:01:34,206 --> 00:01:42,221 So I said, what if this could be put down on paper and they could just learn by themselves, rather than us being the bottleneck in many ways, right? 17 00:01:42,221 --> 00:01:46,461 So, in full transparency, it was a selfish exercise, 18 00:01:46,571 --> 00:01:54,311 so I don't have to repeat myself again and again. I could just point them to it and say, 'Hey, go read this, and that'll at least give you a jumpstart.' 19 00:01:55,436 --> 00:01:56,266 Miko Pawlikowski: Yeah, exactly. 20 00:01:56,266 --> 00:01:57,766 Read the book. 21 00:01:58,366 --> 00:01:59,026 I love that. 22 00:01:59,136 --> 00:02:02,316 That's a completely valid, perfect origin story. 23 00:02:02,586 --> 00:02:07,226 So you mentioned that your mom is a generative AI expert, so I guess we'll interview her next time. 24 00:02:07,226 --> 00:02:11,636 But for you, what was your moment? 25 00:02:11,756 --> 00:02:13,686 When did you decide to go into AI? 26 00:02:13,786 --> 00:02:16,076 Obviously you've been in it for a while. 27 00:02:16,546 --> 00:02:19,606 It wasn't as hot as it is right now back then. 28 00:02:19,946 --> 00:02:23,106 Can you tell us a little bit about your story and how you ended up doing 29 00:02:23,116 --> 00:02:23,776 what you're doing? 30 00:02:24,194 --> 00:02:26,114 Amit Bahree: I am actually not a data scientist. 31 00:02:26,134 --> 00:02:27,674 I'm not a machine learning engineer. 32 00:02:28,349 --> 00:02:34,109 I know how to build models, but that's not what I live and breathe and dream up in the middle of the night, as I know many of my colleagues do. 33 00:02:34,209 --> 00:02:35,109 That's their passion. 34 00:02:35,729 --> 00:02:51,099 In my previous role before Microsoft, one of the things I was learning about was emerging technologies: understanding, from a technical point of view, what they are, how they work, and how they could be used, mostly in the context of an enterprise setting. 
35 00:02:51,269 --> 00:02:56,519 And one of the technologies among a few was AI, a few years ago. 36 00:02:56,569 --> 00:03:01,479 My role of looking at emerging tech is how I got into AI. 37 00:03:01,559 --> 00:03:07,769 Of course, Gen AI or these underlying architecture principles that power these things today didn't exist. 38 00:03:08,599 --> 00:03:09,779 But I was quite fascinated. 39 00:03:09,779 --> 00:03:17,309 It was still a side job, in the sense that it was one of a few areas of emerging technologies to go dig deep into. 40 00:03:17,824 --> 00:03:23,414 And then as that started getting more traction, I was the one-eyed king in the kingdom of the blind. 41 00:03:23,484 --> 00:03:27,014 Because I knew more than the others did; that doesn't mean I know the most. 42 00:03:27,014 --> 00:03:28,594 And then I was stuck with that. 43 00:03:29,784 --> 00:03:31,734 And then grew into that and got fascinated. 44 00:03:32,326 --> 00:03:33,736 Miko Pawlikowski: As they say, 'the rest is history'. 45 00:03:34,914 --> 00:03:35,784 Amit Bahree: It's still early days. 46 00:03:37,779 --> 00:03:45,479 Miko Pawlikowski: So what does a principal group technical program manager do? That's a mouthful. Is that how you introduce yourself at parties? 47 00:03:45,794 --> 00:03:46,604 Amit Bahree: No. 48 00:03:47,604 --> 00:03:49,884 Microsoft likes long names and titles. 49 00:03:49,934 --> 00:03:50,324 Titles 50 00:03:50,324 --> 00:03:59,004 being aside, I basically have officially two day jobs, unofficially three day jobs. So I sit in what we call the AI platform team. 51 00:03:59,094 --> 00:04:04,854 We are the product team that builds all the AI products that power other products or our end customers. 52 00:04:04,854 --> 00:04:08,224 I have formally two buckets of responsibilities. 53 00:04:08,934 --> 00:04:20,074 Microsoft, our leadership goals, we sign large contracts with customers, within which we promise them either new or better AI features. 
54 00:04:20,094 --> 00:04:27,154 it could be brand new things that we're building with them or for them, or it could be improving existing features. 55 00:04:27,164 --> 00:04:33,084 So once we sign those contracts, those land on my plate to go deliver from a platform team perspective. 56 00:04:33,084 --> 00:04:37,004 So I'm responsible for a lot of the custom engineering on the platform, which is this. 57 00:04:37,774 --> 00:04:40,044 That's my first bucket of responsibilities. 58 00:04:40,144 --> 00:04:45,724 My second bucket of responsibilities is whatever we do custom in the first, make sure it's in the platform. 59 00:04:46,554 --> 00:04:48,694 Because if you keep being custom, then there's no platform left. 60 00:04:49,044 --> 00:05:04,574 So the way I want you and the listeners who will get to this to think about it is, these large deals that we sign are the catalyst for us to go do things in the platform that we already are thinking, maybe it's not prioritized enough. 61 00:05:04,574 --> 00:05:08,544 so they are a forcing function to go improve the platform at the end of the day. 62 00:05:09,014 --> 00:05:13,604 and that helps not just that one specific customer, but all the rest of them as well. 63 00:05:14,229 --> 00:05:29,044 And then my third unofficial one is anything and everything related to, Azure OpenAI coming from our, CEO and what we call our SLT, which is CEO and his, direct reports, in the context of customers where, it's a top of mind for many and. 64 00:05:30,144 --> 00:05:35,204 For many folks, their understanding is varied, which somewhat ties back to the genesis of the book. 65 00:05:35,844 --> 00:05:47,344 so when Satya meets other, CEOs and they have a question or they're not happy about something or they need guidance, those get sent over and say, here's the team is going to go help you. 66 00:05:47,444 --> 00:05:54,204 And so then I go in and from an engineering point of view, support, see what they need or what they want. 
67 00:05:54,514 --> 00:05:56,394 So those are, that's my day job, right? 68 00:05:56,394 --> 00:05:57,474 So custom engineering. 69 00:05:57,979 --> 00:06:02,789 And then supporting, Azure OpenAI related things, from our leadership team. 70 00:06:03,306 --> 00:06:06,086 Miko Pawlikowski: and then there's your fourth job, which is writing books. 71 00:06:07,281 --> 00:06:20,411 Amit Bahree: That, indeed, that is also a moment of insanity in some ways, but yes, that is the graveyard shift, as I call it, because, it's after the work is done, and, which is never done, these days at least, so yes. 72 00:06:20,806 --> 00:06:21,476 Miko Pawlikowski: Of course. 73 00:06:22,926 --> 00:06:32,376 I have to ask you, obviously, not that long ago, there was this entire drama of, Sam Altman being fired, and then rehired, and all of that. 74 00:06:33,166 --> 00:06:36,786 And a lot of people were wondering a lot of things. 75 00:06:36,836 --> 00:06:41,616 Satya was quite prominent during that entire conversation. 76 00:06:42,256 --> 00:06:44,296 What's your take on what happened? 77 00:06:44,969 --> 00:06:45,869 Amit Bahree: couple of things. 78 00:06:45,949 --> 00:06:52,959 we were learning along with the rest of the folks on Twitter or Reddit or wherever one follows things, right? 79 00:06:53,009 --> 00:06:59,239 the conversations that, Satya and Sam were having was above my pay grade, just to be black and white about it. 80 00:06:59,679 --> 00:07:03,265 So we were following along and listening along just like rest of the world. 81 00:07:03,315 --> 00:07:06,285 I think the one difference is, we had a little bit in the machinery. 82 00:07:06,715 --> 00:07:13,585 Obviously in our team, we do work from an engineering perspective closely with OpenAI and they're a massive partner to us. 83 00:07:13,585 --> 00:07:21,135 So I think in some cases, maybe we are a little more empathetic, I would say, because it's a little more closer to home. 
84 00:07:21,135 --> 00:07:25,006 And, it's one big virtual team is loosely speaking, how to think about it. 85 00:07:26,103 --> 00:07:38,563 Miko Pawlikowski: So there was one particular thing that I think it's interesting and, It might be that people are just reading way too much into that, but I think Satya went and said something along the lines of, 'don't you worry, 86 00:07:38,573 --> 00:07:45,383 even if OpenAI stops existing tomorrow, we're basically well positioned to continue, the innovation' and all of that. 87 00:07:45,703 --> 00:07:50,533 And a lot of people took it as saying, okay, they basically bought themselves OpenAI. 88 00:07:50,603 --> 00:07:52,123 is that roughly what's happening? 89 00:07:52,320 --> 00:07:53,070 Amit Bahree: Couple of things. 90 00:07:53,280 --> 00:07:55,850 One is now I'm not a Microsoft spokesman. 91 00:07:55,880 --> 00:07:57,340 I'm just talking on my behalf. 92 00:07:57,400 --> 00:07:58,560 we don't own OpenAI. 93 00:08:00,160 --> 00:08:01,220 I don't think that is correct. 94 00:08:01,220 --> 00:08:02,850 I think people are reading too much into it. 95 00:08:03,650 --> 00:08:11,690 I think the thing I want the folks to understand is, Microsoft and Microsoft research investments in AI have been over 30 years. 96 00:08:12,420 --> 00:08:14,675 So it's not just today we've woken up. 97 00:08:14,925 --> 00:08:18,615 Or a few years ago, we've woken up and say, 'look, this is the thing to go in'. 98 00:08:19,145 --> 00:08:24,245 I think the difference really is my mom didn't know about it, nor did she care. 99 00:08:24,875 --> 00:08:25,885 now she does. 100 00:08:26,135 --> 00:08:30,105 so I think, where we're coming from in some ways it's not new. 101 00:08:30,105 --> 00:08:41,210 And it's just become more in the limelight and people are becoming more aware, but We've been at it for a while, and both from a research perspective, products perspective, it's just more in the limelight now. 
102 00:08:41,950 --> 00:08:42,330 Miko Pawlikowski: Okay. 103 00:08:42,330 --> 00:08:45,670 let's leave Microsoft alone and talk a little bit closer to your book. 104 00:08:45,670 --> 00:08:49,240 So one of the questions that I keep asking everybody is. 105 00:08:49,975 --> 00:08:54,855 Their reason to think why GenAI is such a massive deal, right? 106 00:08:54,855 --> 00:08:59,165 Why is it such a big deal and why again, your mom, why does she know about it now? 107 00:08:59,215 --> 00:09:01,935 And she didn't before, I don't think she knew about BERT. 108 00:09:02,015 --> 00:09:09,110 I suspect, but she does know about ChatGPT and there's a good chance she's using ChatGPT, which is, next level. 109 00:09:09,890 --> 00:09:13,280 And, what do you think was so special recently? 110 00:09:13,280 --> 00:09:20,180 What's the like hockey stick moment from your perspective of what's changed that it became, a household name? 111 00:09:20,725 --> 00:09:29,105 Amit Bahree: it was ChatGPT itself that changed to make it a household name, and as we all know and perhaps understand what most people is, the roots of ChatGPT was a demo. 112 00:09:29,655 --> 00:09:31,475 It wasn't meant to where it is right now. 113 00:09:32,175 --> 00:09:43,235 And the fact that one doesn't have to know BERT or any of the other sort of technical mumbo jumbo, and I can just talk to it, I can just use it just as an end user. 114 00:09:43,845 --> 00:09:45,845 I think the simplicity is the power of it. 115 00:09:46,365 --> 00:10:02,145 And the breadth of what a language understanding, it can do versus as we call it now, the traditional AI, which is very odd, by the way, in the first place, but, in the old AI, the pre gen AI, which is not old again, it's very much valid, of course, today. 116 00:10:02,750 --> 00:10:06,080 was very task specific, where you go deep in a certain task. 
117 00:10:06,100 --> 00:10:13,280 So if you're in a company, in an enterprise doing a certain thing, using that, you understand it, you get its value, you know why it's powerful. 118 00:10:13,750 --> 00:10:21,520 But you can't have a generic, free ranging, wider set of, conversations and thoughts, around it. 119 00:10:21,520 --> 00:10:29,470 So if you take a previous chatbot, for example, which is not powered by GenAI, and you say, and if Miko goes and says, hey, I'm hungry, 120 00:10:31,815 --> 00:10:33,855 It won't know what to do, I'm sorry. 121 00:10:34,205 --> 00:10:43,065 Whereas these things understand, they adapt, so I think the simplicity from a using perspective is the power. 122 00:10:43,665 --> 00:10:47,385 And that's why the likes of my mom and others in the world is talking about it, right? 123 00:10:47,405 --> 00:10:53,325 Because it's not technical mumbo jumbo that a handful of people understand and you geek out in the corner. 124 00:10:53,475 --> 00:10:54,115 I can just use it. 125 00:10:56,970 --> 00:10:57,990 Miko Pawlikowski: you behind this comparison? 126 00:10:57,990 --> 00:11:05,190 This is the iPhone moment for, artificial intelligence in general, in particular, like large language models? 127 00:11:05,261 --> 00:11:13,661 Amit Bahree: is it the iPhone, the original one, or which was the one which got the 3G support, or when the App Store came up, is it that one? 128 00:11:13,661 --> 00:11:15,151 It's some variants out there, right? 129 00:11:15,191 --> 00:11:21,461 But, I look at it even simpler, because I think the, iPhone is still a very consumer thing at least. 130 00:11:21,651 --> 00:11:23,321 My world is very enterprising. 131 00:11:23,371 --> 00:11:24,731 Consumer is one side of the house. 132 00:11:24,731 --> 00:11:29,091 Enterprise is a very different kettle of fish in the sense of the problems and what they're trying to solve. 
133 00:11:29,311 --> 00:11:34,061 So I think if you look at a consumer sense, like my mom, that is an iPhone sort of comparison moment. 134 00:11:34,441 --> 00:11:38,951 Miko Pawlikowski: If you go to Manning.com, you can actually browse portions of the book for free. 135 00:11:38,951 --> 00:11:44,661 So if you're listening along to that, go to Manning.com, find the book, and look for figure 136 00:11:44,661 --> 00:11:45,071 1.1. 137 00:11:45,071 --> 00:11:49,791 It's a graph that Amit took from ourworldindata.org. 138 00:11:49,791 --> 00:11:55,781 And it's called 'Language and image recognition capabilities of AI systems have improved rapidly'. 139 00:11:56,151 --> 00:12:09,831 And it's basically plotting the human performance benchmark, which goes from minus 100, meaning that it's pretty bad, and goes all the way to zero, where it's comparable, I think, or maybe equivalent to human. 140 00:12:09,881 --> 00:12:23,031 And for everybody who's listening to that as a podcast and not seeing this on video, it's showing different machine learning and AI trends. It's got handwriting recognition, speech recognition, image recognition, 141 00:12:23,391 --> 00:12:28,711 and then it's got the reading comprehension and language understanding. And what's mind-blowing to me, 142 00:12:28,711 --> 00:12:32,291 and I suspect this is why you chose this particular graph, is that 143 00:12:32,636 --> 00:12:37,566 we've got the handwriting and the speech recognition that kind of goes slowly, looks linear. 144 00:12:37,956 --> 00:12:47,676 There was a little bit of progress and then somewhere in the mid 2010s, it just goes out of control and it goes all the way up to very good results. 145 00:12:47,676 --> 00:12:52,456 And then 2016, I think, on the graph starts the reading comprehension. 146 00:12:52,936 --> 00:12:59,096 It's basically an arrow going straight up. Same for language understanding. 147 00:12:59,096 --> 00:13:00,276 This is within two years. 
148 00:13:00,286 --> 00:13:05,646 It goes from nothing literally to basically comparable to human performance. 149 00:13:05,696 --> 00:13:07,026 why did it happen then? 150 00:13:07,076 --> 00:13:09,996 What needed to happen this is not even a hockey stick. 151 00:13:09,996 --> 00:13:12,896 This is just like the right angle here. 152 00:13:14,196 --> 00:13:15,106 How do you explain that? 153 00:13:15,331 --> 00:13:15,581 Amit Bahree: true. 154 00:13:16,561 --> 00:13:18,101 I actually never thought of the right angle. 155 00:13:18,501 --> 00:13:21,461 I think it's, it's three things coming together, right? 156 00:13:21,461 --> 00:13:28,091 So one is aspects of AI and the research behind it have gotten better in that time frame, right? 157 00:13:28,091 --> 00:13:34,311 So we started getting deep learning, transformers I don't think quite existed at that point in time. 158 00:13:34,761 --> 00:13:40,141 so fundamental architecture changes, or improvements, from a model perspective, model architecture. 159 00:13:40,141 --> 00:13:40,911 So I think that's one. 160 00:13:41,501 --> 00:13:48,251 But I think crucially, maybe equally, maybe more crucially is availability of data at the scale you need. 161 00:13:48,856 --> 00:13:54,026 And then also compute most specific GPUs to, train and crunch through these. 162 00:13:54,386 --> 00:13:57,656 I think that it's that perfect storm of those three things coming together. 163 00:13:58,266 --> 00:14:01,316 if one of them didn't happen as much, it would be still slower. 164 00:14:01,356 --> 00:14:05,946 And that's why you see the linear progression in the others versus I don't know, is that a rocket thing? 165 00:14:05,986 --> 00:14:07,096 Miko Pawlikowski: basically vertical. 166 00:14:08,268 --> 00:14:11,428 Amit Bahree: so I think it's those three sort of things coming together. 167 00:14:11,648 --> 00:14:14,898 I personally believe, I don't think anything was planned or orchestrated. 
168 00:14:14,898 --> 00:14:25,638 I think it's one of those happy accidents, how GPUs work and the number, the floating points it needs to do for graphics, which is gaming, is the same thing that AI models need to do. 169 00:14:26,038 --> 00:14:34,888 We as humans started spitting on more data, maybe thanks to social, thanks to actually iPhones and other smartphones and devices and whatnot. 170 00:14:34,928 --> 00:14:40,438 And then, cloud capabilities in the context of GPUs and compute, improved. 171 00:14:40,788 --> 00:14:46,923 I guess there's a fourth one, which is inherent, but A lot of system engineering things started coming online, right? 172 00:14:46,933 --> 00:14:48,443 How do you run these? 173 00:14:48,623 --> 00:14:51,203 Because it's not like running them on one GPU, for example. 174 00:14:51,203 --> 00:14:52,773 You need clusters of machines. 175 00:14:52,843 --> 00:15:01,463 So there's a fair amount of systems engineering, in the sense of reliability, resilience, and so on, under the covers that, that has to make it all happen. 176 00:15:01,493 --> 00:15:02,483 Otherwise it won't run. 177 00:15:02,713 --> 00:15:04,093 Lots of Physics and computer science. 178 00:15:04,093 --> 00:15:05,973 I keep saying that to my team, for example. 179 00:15:06,023 --> 00:15:14,913 so I think that's maybe a fourth dimension, which most people don't talk about, but, I think those are the things that perhaps enabled a bunch of this to go where we are right now. 180 00:15:16,433 --> 00:15:26,323 Miko Pawlikowski: There's another interesting reference that you have, it's called a survey of large language models and somehow I missed that I found it in your book. 181 00:15:26,323 --> 00:15:27,513 So thank you for that. 182 00:15:28,133 --> 00:15:31,983 and I think, page nine is where I found the figure three. 183 00:15:32,458 --> 00:15:35,543 it's going to be very difficult to describe on a verbal way, but. 
184 00:15:36,023 --> 00:15:42,663 Imagine like a little anthill with a bunch of ants in it, swarming. 185 00:15:42,743 --> 00:15:44,893 And each one of those ants is basically a model. 186 00:15:45,553 --> 00:15:53,453 And the figure is making a distinction between the ones that are basically open source, publicly available, and the ones that are closed source, and it's only 187 00:15:53,803 --> 00:15:57,853 graphing up to GPT-4 and LLAMA-2. 188 00:15:57,873 --> 00:16:00,283 So there's way more of that. 189 00:16:00,903 --> 00:16:05,933 I think at some point I saw that Hugging Face had a hundred thousand models uploaded to it. 190 00:16:05,933 --> 00:16:12,853 And I suspect after LLAMA-3 it's probably doubled since. It gives you a little bit of perspective. 191 00:16:12,873 --> 00:16:16,723 It's not just ChatGPT and it's certainly not just OpenAI. 192 00:16:16,743 --> 00:16:19,973 And it shows you how much variety there is. 193 00:16:20,013 --> 00:16:23,603 And, frankly, I've been looking at these things for a while now. 194 00:16:23,653 --> 00:16:30,723 And still, there's probably half of this graph that I haven't even heard of, let alone tried. 195 00:16:30,753 --> 00:16:34,263 I keep using this phrase, 'Cambrian explosion', but it really does feel like that. 196 00:16:34,568 --> 00:16:39,238 They're just crawling out of every rock and hole, which is amazing. 197 00:16:39,258 --> 00:16:44,268 This is such an exciting time to be alive, if that's the right way of putting it. 198 00:16:44,998 --> 00:16:47,988 Why did you choose that figure for your book? 199 00:16:48,311 --> 00:16:54,131 Amit Bahree: I had two schools of thought when I originally said this would be a right one. 200 00:16:54,521 --> 00:17:01,221 I think one of them is what you touched on: yes, OpenAI and ChatGPT have the world's attention, 
201 00:17:01,811 --> 00:17:05,871 but there's a lot of other innovation, a lot of other companies, a lot of other stuff going on as well. 202 00:17:05,871 --> 00:17:07,171 It's not only that. 203 00:17:07,621 --> 00:17:13,361 so I think it is more of awareness in that sense, because the book also is in my personal capacity. 204 00:17:13,371 --> 00:17:16,711 It's not a Microsoft-sponsored or a Microsoft book, right? 205 00:17:16,711 --> 00:17:23,731 So in that sense, I felt I would be doing a disservice if I didn't make folks, at least aware, because you just know what you know. 206 00:17:23,911 --> 00:17:26,271 So I think that was my one aspect. 207 00:17:26,541 --> 00:17:28,981 I think the second aspect was also. 208 00:17:28,981 --> 00:17:36,071 Showing lineage because a lot of these models are complex, as base models of training. 209 00:17:36,151 --> 00:17:48,081 they're super expensive, both in the sense of data gathering, cleaning it up, actual training costs, and so on and so forth, which many don't really have the appetite. 210 00:17:48,121 --> 00:17:50,541 or have the ability resources-wise to do that. 211 00:17:51,036 --> 00:18:01,336 So what I also wanted to show was, at the end of the day, it's still only a handful of base models that are further trained or fine-tuned and derived from. 212 00:18:02,056 --> 00:18:06,516 so it's a lineage aspect also I wanted to, because that gets lost in the noise as well. 213 00:18:07,076 --> 00:18:17,166 and again, the framing of the book is mostly in enterprises, so if you're in an enterprise setting, you just need to know the roots of the model you're using and the lineage it has. 214 00:18:17,961 --> 00:18:21,611 So you can make an informed decision if that's the right thing or not the right. 215 00:18:22,596 --> 00:18:27,216 Miko Pawlikowski: Speaking of which, that reminds me, Phi-3 released last week. 216 00:18:27,746 --> 00:18:32,276 It seems to be punching above its, weight category, quite heavily. 
217 00:18:33,006 --> 00:18:35,731 Were you involved in any capacity in that project? 218 00:18:36,356 --> 00:18:37,516 Amit Bahree: In a minor way. 219 00:18:37,516 --> 00:18:44,006 So if you go read the technical paper, I'm one of the 70-some people listed on that. 220 00:18:44,036 --> 00:18:45,026 It's a team sport. 221 00:18:45,476 --> 00:18:52,056 So the team that built the SLM, Phi-3, is originally from our platform team. 222 00:18:52,531 --> 00:18:57,961 They've been moved out of that into the new GenAI team we've recently formed and publicly announced. 223 00:18:58,351 --> 00:19:00,061 So we work very closely with the team. 224 00:19:00,161 --> 00:19:07,861 Even though I have roots in applied research, I don't think I can take credit to say I built the model, but I've been involved with it for sure. 225 00:19:08,141 --> 00:19:08,951 Miko Pawlikowski: You're on the paper. 226 00:19:08,951 --> 00:19:12,651 That means you built it. You can claim that. 227 00:19:13,096 --> 00:19:20,746 Amit Bahree: I think Sebastian and the others have been very kind where some of us have been involved in providing feedback and input and guidance and what have you. 228 00:19:21,046 --> 00:19:24,526 I think they've been quite kind and then they've done the right thing. 229 00:19:24,526 --> 00:19:27,286 But that doesn't mean I can take full credit. 230 00:19:27,416 --> 00:19:29,691 The way I think of it is, it takes a village. 231 00:19:30,071 --> 00:19:31,991 Each village needs an idiot, and that's me. 232 00:19:31,991 --> 00:19:32,741 It's an important role. 233 00:19:32,741 --> 00:19:33,611 Somebody has to do it. 234 00:19:34,626 --> 00:19:34,986 Miko Pawlikowski: Oh, wow. 235 00:19:34,986 --> 00:19:36,406 That is a lot of authors. 236 00:19:36,426 --> 00:19:39,771 I just opened it, and the paper was released three days ago, 237 00:19:40,176 --> 00:19:45,326 it looks like, and it is an impressive number of people working on that. 
238 00:19:45,656 --> 00:19:46,986 I've been reading people's opinions. 239 00:19:46,986 --> 00:19:48,396 I haven't actually read the paper. 240 00:19:48,446 --> 00:19:52,646 so I don't know how it explains how it's possibly this good. 241 00:19:53,046 --> 00:19:54,646 it happened a few days. 242 00:19:54,646 --> 00:19:57,356 Was it a week after LLAMA-3 was 243 00:19:57,551 --> 00:19:57,941 . Amit Bahree: Roughly. 244 00:19:58,241 --> 00:20:04,571 Miko Pawlikowski: the main selling point being that they trained it on 15 trillion tokens or some ridiculous number like that. 245 00:20:04,591 --> 00:20:06,341 And they were surprised that it kept getting better. 246 00:20:06,861 --> 00:20:11,141 Sounds like this one was, trained on a much smaller corpus of text. 247 00:20:11,691 --> 00:20:14,661 how do you explain, why it's so good? 248 00:20:15,796 --> 00:20:17,256 Amit Bahree: so there's two things here. 249 00:20:17,256 --> 00:20:20,826 I think it's, and it's in the paper, it's 3 trillion tokens. 250 00:20:21,296 --> 00:20:27,356 I think the, again, this is a genesis from Phi-2, which is a genesis from Phi-1, which is a genesis from ORCA2. 251 00:20:27,356 --> 00:20:28,886 Those are all research models. 252 00:20:29,126 --> 00:20:38,196 one of the things we've come around to seeing is in the context of these new categories of small language models, is highly curated data sets is better. 253 00:20:38,486 --> 00:20:44,196 so one reason why you see Phi-2 and Phi-3 doing so much better. 254 00:20:44,196 --> 00:20:50,196 Relative to, bigger models is because, a good chunk of the data is highly curated. 255 00:20:50,216 --> 00:20:52,306 There's two aspects to it, which we also publish. 256 00:20:52,306 --> 00:20:57,616 So there's this other paper, this textbooks is all you need if you or your readers have seen it. 257 00:20:57,936 --> 00:21:05,991 So basically a good portion of the corpus is high quality textbooks as input into the model to train on. 
258 00:21:06,421 --> 00:21:19,041 And then the second aspect of, data is not common crawl sucking stuff off the web, but again, highly curated, web data, or a very small subset of the web data, combined with the, Textbooks. 259 00:21:19,051 --> 00:21:24,611 that is also an interesting research thing where it's going now to say is like for these smaller models. 260 00:21:25,821 --> 00:21:31,421 How much higher quality data sets does carry a lot of weight. 261 00:21:31,913 --> 00:21:34,093 and that's really a lot of what you're seeing. 262 00:21:35,098 --> 00:21:39,498 Miko Pawlikowski: So When people say curated, does it mean an army of humans? 263 00:21:39,888 --> 00:21:45,208 Like selecting, reading that and annotating and like discarding low quality stuff. 264 00:21:45,208 --> 00:21:53,028 Or is there like another model that does that work to pre select and it's models all the way down 265 00:21:53,731 --> 00:21:58,451 Amit Bahree: It's not an army of humans because that's not scalable and doable at, you can do it as a one 266 00:21:58,556 --> 00:21:59,446 Miko Pawlikowski: trillion tokens. 267 00:21:59,446 --> 00:21:59,656 Yeah. 268 00:22:00,171 --> 00:22:02,941 Amit Bahree: Yes, you can do it as a one off maybe, but, especially. 269 00:22:03,721 --> 00:22:05,251 Phi-3 is a product. 270 00:22:05,621 --> 00:22:17,091 Phi-2 was a research model, two different things and from our perspective, the minute we're saying it's a product, we release it to production, it has to go through the right rigour and cycles from a Microsoft perspective. 271 00:22:17,211 --> 00:22:21,041 That means, We have to support it for a number of years. 272 00:22:21,101 --> 00:22:23,201 We have customers who are gonna use it and so on. 273 00:22:23,201 --> 00:22:26,031 So we can't just publish it with an army of people. 274 00:22:26,061 --> 00:22:27,351 'cause that doesn't really scale. 
275 00:22:27,351 --> 00:22:41,771 So there is other models helping when you say how do you create it, at least in the context of this, it is synthetic data generated using, GPT-4, but then the humans are involved to make sure that, it is curated. 276 00:22:41,771 --> 00:22:43,661 Again, it's not an army of people, but it's. 277 00:22:44,136 --> 00:22:47,396 machinery evaluations and so on, machinery running to 278 00:22:47,546 --> 00:22:49,836 Miko Pawlikowski: So this is synthetic data we're talking about. 279 00:22:49,856 --> 00:22:52,186 It's literally all generated by ChatGPT, 280 00:22:52,621 --> 00:22:55,831 Amit Bahree: most, most is generated by GPT-4. 281 00:22:56,141 --> 00:23:03,021 Miko Pawlikowski: So that always makes me wonder, if we train things on data coming out of a model. 282 00:23:03,781 --> 00:23:19,251 I'm, obviously no expert on this, but intuitively it seems to me like that data generated by GPT-4, any model really it's going to have certain attributes to it that don't necessarily represent, the web, is that not a problem? 283 00:23:20,093 --> 00:23:20,703 Amit Bahree: Yes and no. 284 00:23:20,703 --> 00:23:27,863 I think one shouldn't be using the output of another model as your general data input only. 285 00:23:27,913 --> 00:23:32,253 I think you have to look at it in certain domains and specific of what you're trying to do. 286 00:23:32,283 --> 00:23:34,433 And then in that context, it would be okay. 287 00:23:34,433 --> 00:23:36,743 But that's where the human aspect also comes. 288 00:23:36,743 --> 00:23:38,783 You have to make sure evaluations are right. 289 00:23:38,793 --> 00:23:40,433 Cause guess what? 290 00:23:40,433 --> 00:23:44,803 The old school garbage in garbage out is still very much valid. 291 00:23:45,283 --> 00:23:47,733 but I think your intuition is correct in the sense. 
292 00:23:48,858 --> 00:23:59,308 One shouldn't think of it as, 'hey, I can go use an LLM, spit it out, and then use that to go train my own model', in the breadth, in the broad sense of it. 293 00:23:59,838 --> 00:24:13,488 but you'll also hear of more recent papers coming, and more recent news where, in general, this is not Phi-3, but in general, we have reached the points where we are sucking in all of the available Internet that one's reachable or allowed to reach. 294 00:24:14,048 --> 00:24:21,248 And to train the models more and more, we are also then complementing it with the synthetic data, which, other AI is, generating, 295 00:24:21,888 --> 00:24:35,168 So I think you have to go put it back in which aspects of your existing models not doing great on, evaluating those, and then using that as a basis to strengthen that dimension, rather than just a more horizontal generic, if that makes sense. 296 00:24:35,588 --> 00:24:36,568 Miko Pawlikowski: Yeah, It certainly does. 297 00:24:36,568 --> 00:24:46,758 And I think what I appreciated, I actually haven't seen the shortcut, the abbreviation SLMs for small language models until I opened your 298 00:24:47,018 --> 00:24:47,468 Oh, okay. 299 00:24:47,538 --> 00:24:54,028 which is, an indication of just how much focus we put on LLMs, the large language models. 300 00:24:54,548 --> 00:25:10,458 And, I think that, to me, at least, I don't know if it's just like the part of me that loves running things on Raspberry Pis and gets excited about the possibility of actually running a decent enough model that I can speak to that actually runs on my phone or something like that. 301 00:25:10,458 --> 00:25:18,758 so 3 billion parameters, does that mean roughly with 4 bit quantization that we can run it on effectively any phone at this stage? 302 00:25:18,818 --> 00:25:21,378 Like it's going to need maybe a couple of gigs 303 00:25:21,476 --> 00:25:22,316 Amit Bahree: everyone's asking that. 
304 00:25:22,706 --> 00:25:30,416 so on a certain profile, so I think we talk about an iPhone 14 with a Bionic processor. 305 00:25:31,006 --> 00:25:31,876 You can run it. 306 00:25:32,216 --> 00:25:35,766 It can do a certain number of tokens per minute sort of generations. 307 00:25:36,316 --> 00:25:36,716 I think. 308 00:25:37,141 --> 00:25:50,531 To be able to go run it for Miko or Amit as a, I can run it on a phone and as an experiment, what have you is one thing, versus the ability to run it at scale for a production deployment is a different thing. 309 00:25:51,171 --> 00:26:00,631 So yes, these are small language models and we do believe how LLMs after ChatGPT became a lot of hype. 310 00:26:00,811 --> 00:26:02,461 Some is good, some is not so good. 311 00:26:02,701 --> 00:26:06,026 SLMs will be the next set, in the context of the hype. 312 00:26:06,471 --> 00:26:14,421 But as I go to remind many of the folks I talk to, it's a small language model in relation to a large language model 313 00:26:15,831 --> 00:26:23,591 at the end of the day, I think 2.8 or 3.8 or whatever parameter we have on the mini one, because this is Phi mini, Phi-3 mini. 314 00:26:24,206 --> 00:26:25,626 It's also a family of Phi models. 315 00:26:25,636 --> 00:26:40,016 This is the smallest of the ones that should be coming out and the paper touches on the others, at the end of the day, three billion parameter or whatever the exact number isn't small just from a computer science perspective, it is still a pretty big, complex thing. 316 00:26:40,906 --> 00:26:41,236 Yes. 317 00:26:41,256 --> 00:26:47,596 Compared to hundreds of billions of parameters, it is small, but it is not small. 318 00:26:47,876 --> 00:26:49,496 I think I have to go remind people that. 319 00:26:49,561 --> 00:27:00,861 In relation, or relative to an LLM, yes, it's small, but by itself, it is still pretty complex and beefy in the sense of compute requirements and GPU requirements and, what it needs. 
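The "couple of gigs" estimate from the question can be checked with a rough back-of-the-envelope sketch. It counts only the memory needed to hold the weights; the 3.8 billion parameter figure is an assumption taken from the larger of the two numbers mentioned above, and the function name is hypothetical. Activations, the KV cache, and runtime overhead are ignored, so real usage would be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Ignores activations, KV cache, and runtime overhead (assumption).
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Assumed parameter count for the "mini" model discussed in the conversation.
PARAMS = 3.8e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions, 4-bit weights come to a little under 2 GB, which is consistent with the guess in the question, and also with the point that this is still a sizeable workload for a phone.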
320 00:27:01,541 --> 00:27:12,181 It doesn't mean you'll go off and deploy a bunch of these on your Raspberry Pi with inference in milliseconds and whatnot. 321 00:27:12,371 --> 00:27:23,391 Miko Pawlikowski: One of the reasons I am asking this is because, I don't know if you followed the launch of Humane AI, that little gadget that kind of looks like something from Star Trek. 322 00:27:23,951 --> 00:27:29,051 and it looks like it hasn't been particularly well received because it's a little slow and a little clunky. 323 00:27:29,051 --> 00:27:30,411 I think I watched a 324 00:27:31,086 --> 00:27:44,808 YouTube review of that and I think they basically destroyed it a little bit by showing just how long you have to wait because it's effectively just uploading it to a cloud somewhere and then downloading the response and it's just not there. 325 00:27:45,348 --> 00:27:51,208 And, with Phi-3 and like the smaller models, all of a sudden everybody's thinking the same thing. 326 00:27:51,238 --> 00:27:53,188 Can we make it native? 327 00:27:53,218 --> 00:28:00,298 I think Apple announced some things about how they're going to work on making sure that the hardware in the newer iPhones 328 00:28:00,653 --> 00:28:03,123 is able to run this stuff at reasonable speed. 329 00:28:03,153 --> 00:28:07,833 And this feels like this would be another hockey stick moment for these things. 330 00:28:07,833 --> 00:28:12,063 Those small language models where Siri doesn't suck, "Hey Google", 331 00:28:13,093 --> 00:28:14,483 "Okay Google" works. 332 00:28:14,543 --> 00:28:17,423 And Alexa actually listens to me, that kind of stuff. 333 00:28:17,693 --> 00:28:22,303 do you think that was one of the motivations for the smaller model? 334 00:28:22,873 --> 00:28:25,923 Amit Bahree: the premise you're touching on was one of the motivations.
335 00:28:25,923 --> 00:28:33,343 So if I rewind for a second, for a large language model, I go back, Again, if you cut through the hype for a second, laws of Physics and computer science. 336 00:28:33,583 --> 00:28:39,903 For these large language models, enormously complex, needs a lot of compute resources to run. 337 00:28:40,313 --> 00:28:52,533 And like any developer, programmer, computer scientist will tell you, laws of Physics, the scale means complexity, means latency, I have to process more things. 338 00:28:52,933 --> 00:28:58,163 It takes time to get results back and there is no ways to cut those corners at the end of the day. 339 00:28:58,913 --> 00:29:03,663 So where you're seeing latency or things are slower, it's because of that. 340 00:29:03,743 --> 00:29:11,963 from our perspective, there's also another dimension to run this at cloud at Azure level globally across the hundreds of data centers and what have you. 341 00:29:11,963 --> 00:29:24,393 that's not simple or cheap, So if we can reduce our costs to run this at scale, we can make sure the service is cheaper for our customers as well. 342 00:29:24,898 --> 00:29:29,138 I think this is also where, we as humans are awesome and we forget things. 343 00:29:30,018 --> 00:29:33,098 Because many of these models are exposed as an API. 344 00:29:33,868 --> 00:29:43,868 we as, at least developers for sure, have the expectation that it's an API call, so I'm going to get my response back in, milliseconds and what have you. 345 00:29:44,413 --> 00:29:45,913 because that's what we have been used to. 346 00:29:46,233 --> 00:29:53,993 The difference is, yes, it's an API call, but the machinery that's running behind, including the models itself, is super complex. 347 00:29:54,753 --> 00:29:56,663 and when things are slow, we get unhappy. 348 00:29:56,673 --> 00:29:59,113 So I think that also needs to be a mentorship. 
349 00:29:59,123 --> 00:30:07,293 So if you package all of this up, that's a big motivation for why, in some cases, a small language model would make more sense. 350 00:30:07,393 --> 00:30:09,463 But I also want to outline this. 351 00:30:10,248 --> 00:30:13,428 It doesn't have the same power as a large language model. 352 00:30:13,428 --> 00:30:16,558 I see a lot of comparisons to the bigger models and all, which is good. 353 00:30:16,588 --> 00:30:21,168 It's early days, but at the end of the day, not an apples-to-apples comparison. 354 00:30:21,178 --> 00:30:30,303 For example, a lot of people, including me, have been guilty of just using GPT-4 as a knowledge database. More and more people, instead of googling or binging or whatever you do, just ask the thing. 355 00:30:30,463 --> 00:30:32,633 So you're using it as a big, fancy database. 356 00:30:33,433 --> 00:30:45,083 So if I put that in the sense of the world knowledge, again, that's not factually correct, since it doesn't have the world knowledge, it only has the publicly accessible knowledge as of its training cutoff, but ignoring that 357 00:30:45,093 --> 00:30:51,948 point, the small language models will not have that because they've not been trained on that volume of data. 358 00:30:52,168 --> 00:31:01,968 So I think the other dimension is, whilst the compute profile, what we've been talking about, is one, you have to think of the SLMs in the right use case. 359 00:31:01,968 --> 00:31:02,968 What am I trying to do? 360 00:31:03,398 --> 00:31:08,208 If I'm trying to understand an entity in a workflow, I can use a small language model. 361 00:31:08,278 --> 00:31:11,818 I don't need the power of these large language models necessarily. 362 00:31:12,228 --> 00:31:22,018 equally if there's different languages that one has to use, not English, for example, a small language model may not be as powerful or as good as a large language model.
363 00:31:22,098 --> 00:31:25,018 So the way we should think about it is they shouldn't be competing. 364 00:31:25,038 --> 00:31:26,258 They're complementing each other. 365 00:31:27,008 --> 00:31:32,458 in what you're trying to solve at the right step, use the right model because the beauty again is they're an API call. 366 00:31:32,798 --> 00:31:38,158 So it's not that if you're developing an application, you're stuck with one thing for the whole duration. 367 00:31:38,168 --> 00:31:41,208 You can choose at the right step for the right thing, for the right power. 368 00:31:41,208 --> 00:31:41,758 So I use, 369 00:31:42,158 --> 00:31:55,418 often with my teams and others, the analogy: if a GPT, or pick your model, is like a Ferrari, if you're going racing, you need a Ferrari, but if an SLM is like a Honda, and by the way, pick your brand, 370 00:31:55,988 --> 00:32:01,708 and you're stuck in morning rush hour traffic, then a Honda is better. You pick the right thing for the right purpose. 371 00:32:01,768 --> 00:32:03,188 Is this really what I'm getting into? 372 00:32:03,588 --> 00:32:04,858 I would show them the compute profile. 373 00:32:06,403 --> 00:32:07,423 Miko Pawlikowski: I completely agree. 374 00:32:07,483 --> 00:32:15,793 And I think these are separate use cases where I just want my okay Google and my Siri to not suck so much. 375 00:32:15,803 --> 00:32:21,723 I want it to understand what I mean half of the time and not have to say the thing three times. 376 00:32:22,218 --> 00:32:28,508 Not to wonder every time, what did I say differently now that it didn't catch the song that I wanted to play kind of thing. 377 00:32:28,518 --> 00:32:33,488 And that would already be like a big improvement for me, just interacting with that thing.
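The idea of choosing the right model at each step of an application, rather than committing to one model for everything, can be sketched roughly as follows. This is only an illustration: the `Step` fields, the routing rule, and the model labels are all hypothetical, and a real application would call whichever model APIs it actually has access to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a workflow (hypothetical structure for illustration)."""
    name: str
    needs_world_knowledge: bool = False  # broad factual recall
    non_english: bool = False            # languages a small model may handle poorly


def pick_model(step: Step) -> str:
    """Return a placeholder label for the model to call at this step.

    Assumed rule of thumb from the discussion: narrow tasks such as entity
    understanding go to a small model; broad knowledge or non-English work
    goes to a large one.
    """
    if step.needs_world_knowledge or step.non_english:
        return "large-model"
    return "small-model"


workflow = [
    Step("extract_entity"),
    Step("answer_open_question", needs_world_knowledge=True),
    Step("reply_in_hindi", non_english=True),
]

for step in workflow:
    print(f"{step.name}: {pick_model(step)}")
```

Because each model sits behind an API call, swapping the label returned for a given step does not require restructuring the rest of the application.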
378 00:32:33,828 --> 00:32:52,728 when you were saying all those things, I was wondering whether there is a certain kind of minimal level where it will be, a certain number of tokens per second that will feel to most humans as real time is really what we're talking about here and beyond that point, probably doesn't matter. 379 00:32:53,148 --> 00:32:58,228 if you can't read it faster than it's being produced, you're not going to have that feeling of slowness. 380 00:32:58,228 --> 00:33:00,378 And there are some interesting 381 00:33:00,788 --> 00:33:14,738 things like Groq, the one with a Q at the end, I think they're suing Elon Musk over that, that has some dedicated hardware, and I saw some demo, it was doing something ridiculous, like 800 tokens a second on Llama 3. 382 00:33:14,758 --> 00:33:16,688 So was it 70B or something? 383 00:33:17,438 --> 00:33:24,478 Is it not just a matter of waiting like 5-10 years for the dedicated hardware to get cheap and plentiful enough, 384 00:33:24,478 --> 00:33:28,268 and it won't be so much of an issue? 385 00:33:29,086 --> 00:33:32,546 Amit Bahree: that's the whole story of computing history, if you go look at it, right? 386 00:33:32,971 --> 00:33:33,411 Miko Pawlikowski: Yeah. 387 00:33:34,146 --> 00:33:40,046 Amit Bahree: As hardware improves, but I think we also have to put it in the context of the scale of use. 388 00:33:40,336 --> 00:33:53,166 For example, if you have access to a data center with hundreds of GPUs of today's best in breed, let's say, and there's nobody else in, it won't feel slow to you, it'll be like, what's everyone complaining about? 389 00:33:53,586 --> 00:33:58,756 But if in the same data center, you have 4000 other users concurrently coming, it's a different story.
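The two points above, a generation speed beyond which output feels real time and the effect of concurrent users, can be put together in a small arithmetic sketch. Everything numeric here is an assumption for illustration: a reading speed of 250 words per minute, about 1.3 tokens per word, and throughput that is split evenly across users, which is a deliberate oversimplification of how serving systems batch requests.

```python
def reading_tokens_per_second(words_per_minute: float = 250.0,
                              tokens_per_word: float = 1.3) -> float:
    """Rough rate at which a person consumes tokens while reading (assumed figures)."""
    return words_per_minute * tokens_per_word / 60.0


def per_user_tokens_per_second(total_tokens_per_second: float,
                               concurrent_users: int) -> float:
    """Naive even split of total throughput across users (simplifying assumption)."""
    return total_tokens_per_second / max(concurrent_users, 1)


def feels_real_time(total_tokens_per_second: float, concurrent_users: int) -> bool:
    """True if each user receives tokens at least as fast as they can read them."""
    per_user = per_user_tokens_per_second(total_tokens_per_second, concurrent_users)
    return per_user >= reading_tokens_per_second()


print(f"reading rate: ~{reading_tokens_per_second():.1f} tokens/s")
for users in (1, 100, 4000):
    print(f"{users} users sharing 800 tokens/s -> real time: {feels_real_time(800, users)}")
```

With these assumed numbers, a single user at 800 tokens per second is far beyond reading speed, while the same throughput shared by thousands of users falls well below it, which is the "different story" described above.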
390 00:33:58,806 --> 00:34:13,716 I think I have to also remind people when you are doing comparisons or the expectations you have to think of in the sense of the load, the traffic, how much, now, we as a cloud provider, that's a lot of our headache and a lot of customers saying like, why do you think I'm paying you? 391 00:34:14,946 --> 00:34:18,216 but I also go back, yes, and laws of Physics don't change as well. 392 00:34:18,246 --> 00:34:21,986 but overall, I think, Nvidia announced a whole bunch of new stuff. 393 00:34:22,221 --> 00:34:26,671 at their conference quite recently, network speeds are improving or have been. 394 00:34:26,671 --> 00:34:34,761 So if you step back for a second, I think, just history of computing has been that hardware scales up and improves and helps the software. 395 00:34:35,111 --> 00:34:40,351 I think the one thing that I'm not, personally speaking, I can't predict the future. 396 00:34:40,381 --> 00:34:44,761 The one thing is, This is one of those back to your hockey stick points. 397 00:34:44,801 --> 00:34:47,381 The scale is almost at a global level. 398 00:34:47,451 --> 00:34:54,291 okay, it's not every human on the planet using ChatGPT or LLMs in some fashion, but quite a big percentage of people are. 399 00:34:54,401 --> 00:34:58,601 In some manner, on a daily basis, for some people, it is eight hours a day. 400 00:34:59,431 --> 00:35:03,791 My mom may be once every other, once a week or whatever it is she does. 401 00:35:04,171 --> 00:35:10,551 but the breadth of humans using it is much more broader than it ever has been. 402 00:35:10,931 --> 00:35:28,271 So with that context, even as other underlying system and hardware improves, I think the perception of, is it actually getting improving, maybe slower than perhaps in the past where it was still more niche, if that makes sense. 403 00:35:29,046 --> 00:35:29,626 Miko Pawlikowski: It does. 
404 00:35:29,626 --> 00:35:38,846 And I think to follow the train of thought that you started here, the potential is probably higher just because it's so much more intuitive. 405 00:35:38,846 --> 00:35:40,116 You just talk to it. 406 00:35:40,786 --> 00:35:43,836 my mom when she needs to install an app, it's a whole thing. 407 00:35:43,876 --> 00:35:44,886 It takes a while. 408 00:35:44,926 --> 00:35:46,206 She needs to get used to it. 409 00:35:46,216 --> 00:35:49,937 She needs to get comfortable with it, needs to remember the password. 410 00:35:49,937 --> 00:35:51,257 There might be another pin. 411 00:35:51,267 --> 00:36:06,912 it's a whole thing, but, once she gets some kind of interface that's built into her phone, or whatever, where she can just talk to it, that kind of clears a lot of barriers and, a lot of people are picturing this future where your 412 00:36:06,912 --> 00:36:13,174 phone is slowly turning into effectively a listening device from Star Trek and it's just doing what you want it to do. 413 00:36:13,224 --> 00:36:15,134 And maybe integrates with all the apps. 414 00:36:15,684 --> 00:36:17,344 I ordered that Rabbit R1. 415 00:36:17,364 --> 00:36:18,264 I'm still waiting. 416 00:36:18,284 --> 00:36:22,853 I don't know where the delivery is supposed to be, but, that's one of the visions of the future right there. 417 00:36:22,934 --> 00:36:31,924 You just talk to it and the model does things on your behalf, goes to these dodgy apps and clicks things, and you don't have to worry about that. 418 00:36:31,924 --> 00:36:33,044 And you don't have a learning curve. 419 00:36:33,804 --> 00:36:38,074 And I think that's a vision of the future that excites a lot of people. 420 00:36:38,914 --> 00:36:44,724 And, I suspect we might see something like that in the near future because I don't see any roadblocks for 421 00:36:44,849 --> 00:36:46,929 Amit Bahree: No, I actually argue the other way.
422 00:36:47,189 --> 00:36:49,939 I see it actually is already happening now. 423 00:36:49,959 --> 00:36:51,469 And I can give you two, real examples. 424 00:36:51,479 --> 00:36:53,599 for example, I'm originally from India. 425 00:36:54,099 --> 00:37:03,399 And in India, as much as the country's made progress, there's still a, decent percentage of the population who is not very literate. 426 00:37:03,579 --> 00:37:09,149 Either they haven't finished school, or they dropped out early, or they've actually not gone to school. 427 00:37:09,634 --> 00:37:17,874 Now, it may be a small percentage at a country level, but if it's a country with 1.4 billion, a small percentage in absolute numbers is still a big number. 428 00:37:18,304 --> 00:37:20,693 a chunk of humanity. 429 00:37:21,263 --> 00:37:34,363 And in that, we're seeing, for many people who are not comfortable reading or writing, some of the cheaper devices they have, it's not an iPhone or an Android phone, but they have a speech. 430 00:37:34,373 --> 00:37:35,693 So they print, there's a big mic. 431 00:37:36,348 --> 00:37:51,298 In the middle of the phone, they can press that and talk to it and actually in natural language, in their language, they're asking questions and talking to it and that is, as it happens in some of these cases, it's out of some of our speech AI, which is understanding it and then responding back. 432 00:37:51,418 --> 00:37:58,658 So it's lowering the barrier and opening up this to a broader segment, which in the past was not possible. 433 00:37:59,213 --> 00:38:03,423 so that's one example, because they don't need to know the language to go type it in or what have you. 434 00:38:03,423 --> 00:38:05,913 They can just talk to it normally how they would talk to it. 
435 00:38:06,293 --> 00:38:22,158 And then the second one was actually more of a ChatGPT example, which I think Microsoft also published, where it's plugging in different languages for, again, rural areas in India, for farmers. India, for those who don't know, is not like the U. 436 00:38:22,158 --> 00:38:22,308 S. 437 00:38:22,318 --> 00:38:27,678 and others, where you have big farms with, hundreds and thousands of hectares or acres. 438 00:38:27,708 --> 00:38:29,348 They're usually small farms. 439 00:38:29,738 --> 00:38:31,448 Usually it's, the family which owns it. 440 00:38:31,828 --> 00:38:39,138 And they don't really have the muscle at an individual level, to go understand pricing and markets and what's happening as they want to go sell their 441 00:38:39,623 --> 00:38:41,073 grain or whatever they're growing. 442 00:38:41,283 --> 00:38:48,983 So in that sense, what we talk about as democratizing is how they're using ChatGPT to actually get basically real-time market information. 443 00:38:48,983 --> 00:38:59,263 So they're empowered to go make a better decision, which until now was impossible because you need a computer, you need a modem, or you need to be online and those are the barriers. 444 00:38:59,263 --> 00:39:03,743 And it's like, they don't know how to use it, or in the language that they understand. 445 00:39:04,298 --> 00:39:23,138 So these are actually happening today, like in production, so to speak, live, and the way we want to think about it is as democratizing AI, which is when I go back to how you started asking me the question, when we started talking, if I go back to my mom's or the example you used with your mom of the barrier of a new 446 00:39:23,188 --> 00:39:31,788 app or a new interface, if we free them up or make it easier in many ways, those are the democratizing elements that are happening. It's not only about
447 00:39:32,188 --> 00:39:36,188 how literate you are or not, but it's the, it's easing barriers basically. 448 00:39:36,268 --> 00:39:38,588 So of course it doesn't do everything. 449 00:39:39,358 --> 00:39:47,958 It doesn't mean all barriers are gone, but we see a lot of real examples, day to day life things that people are using it, which is absolutely fascinating. 450 00:39:48,408 --> 00:40:05,778 Miko Pawlikowski: there was this very popular demo of, I think it's called bland AI where they had a billboard with a phone number to call and you can have thousands and thousands of parallel conversations with an AI to do things like booking and, Basically get like a first line human in a, experience really. 451 00:40:06,518 --> 00:40:09,058 And the demos were amazing. 452 00:40:09,158 --> 00:40:16,178 And there's, like a million startups doing things around that at the moment, it also obviously has the dark side, right? 453 00:40:16,208 --> 00:40:28,258 Where people are worried that what does it mean, Can you go and sway an election now by just calling everybody in the US and telling them something that they want to hear and, personalize the message. 454 00:40:28,808 --> 00:40:39,318 It is a brave new world, a weird world that we're entering here, Where some things that, you could always technically go and call everybody in the U. 455 00:40:39,318 --> 00:40:41,358 S., but it would take a while. 456 00:40:43,668 --> 00:40:49,648 now with those things, maybe you can do it convincingly in a shorter period of time, And maybe not that expensively. 457 00:40:49,708 --> 00:40:50,878 does that scare you? 458 00:40:51,591 --> 00:40:52,161 Amit Bahree: yes and no. 459 00:40:52,161 --> 00:40:55,901 I think that's true with any aspect of humanity or technology. 460 00:40:55,901 --> 00:40:57,991 You can use it for good, you can not use it for good. 461 00:40:58,211 --> 00:40:59,561 And it's a choice you have to make. 
462 00:40:59,571 --> 00:41:00,701 So I think that's sort of one. 463 00:41:00,701 --> 00:41:06,591 So in that dimension, it's not something new that we haven't been doing. 464 00:41:06,591 --> 00:41:10,961 I think what is new or what is more dangerous, if that's the word I want. 465 00:41:10,981 --> 00:41:13,521 I'm not sure if that's the word I want, but I can't think of a better one. 466 00:41:13,761 --> 00:41:17,611 More concerning, is how easy it is. 467 00:41:17,671 --> 00:41:20,281 And unless What things to watch out for? 468 00:41:20,291 --> 00:41:21,701 How do you know what's true or not? 469 00:41:21,761 --> 00:41:29,221 So I think there's of course dimensions into it where we as humans have to recalibrate ourselves on, do I trust it or not? 470 00:41:29,511 --> 00:41:31,961 For example, Robocalling has been around for decades. 471 00:41:32,001 --> 00:41:34,541 The fact that I can cheaply call everyone is not the problem. 472 00:41:34,923 --> 00:41:40,838 now it is, it may sound like Amit or Miko's calling, which in the past You know it's not Amit or Miko calling. 473 00:41:41,298 --> 00:41:44,878 I think that's the really, things to think about and worry about. 474 00:41:44,878 --> 00:41:56,418 The way I reposition it as well, from a Microsoft perspective, and also I have a whole chapter in the book on that, is, the, there is new emerging threats from a security. 475 00:41:56,418 --> 00:42:06,428 So if you think of a traditional security aspect of your application or developer, DevStack, The way we're saying is look, there's additional new security threats you have to go think about. 
476 00:42:07,228 --> 00:42:18,978 And it's easy to get wrapped up in all the negative, but if you step back and say, look, as there were paradigm shifts, as you went from client server two tier, and I'm going to show my age now, applications to distributed applications and then 477 00:42:18,978 --> 00:42:25,643 to web applications, there's a lot of goodness, but then it also opened up the exposure to a different threat vector. 478 00:42:26,043 --> 00:42:27,413 The surface area was different. 479 00:42:27,643 --> 00:42:31,793 In some cases it was broader, in other cases it was actually contracted. 480 00:42:32,373 --> 00:42:34,433 And in that sense, this is no different. 481 00:42:34,443 --> 00:42:41,413 There are new emerging threats you have to think about and be cognizant of, and then also understand what is the risk of that. 482 00:42:41,913 --> 00:42:44,873 And sure, a threat could happen, but how often will it happen? 483 00:42:45,043 --> 00:42:46,123 And how do I mitigate that? 484 00:42:46,163 --> 00:42:48,033 You're not going to solve 100 percent of everything. 485 00:42:48,488 --> 00:42:53,128 But you have to then hone it back down into what's your use case, how you're thinking about it, and so on. 486 00:42:53,768 --> 00:43:03,008 so instead of, either ignoring it, which is not good, or putting your head in the sand like it's all doom, neither of those dimensions are going to be helpful. 487 00:43:03,028 --> 00:43:08,208 So I think part of it is understanding that, yes, there is a new set of threats that are emerging. 488 00:43:08,918 --> 00:43:10,208 Be aware of those. 489 00:43:10,518 --> 00:43:12,118 How do you solve for those? 490 00:43:12,378 --> 00:43:13,808 How do you manage those? 491 00:43:13,868 --> 00:43:17,688 And then in the context of a use case, in the context of how you're using it. 492 00:43:18,638 --> 00:43:21,558 Miko Pawlikowski: It's a little bit like passwords, isn't it?
493 00:43:21,908 --> 00:43:30,578 We rely on the fact that it's not practical for someone to go and brute force your password because it would take a thousand years. 494 00:43:31,358 --> 00:43:45,833 And if someone goes and figures out how to make a computer that goes around the limitations of Physics and can do it a thousand times faster, all of a sudden a lot of passwords would be useless. 495 00:43:46,363 --> 00:43:48,253 And I think it's a little bit like that, right? 496 00:43:48,273 --> 00:43:55,973 we got a technology that made things possible now, that we're relying on them just not being practical from, time and cost perspective. 497 00:43:56,623 --> 00:43:57,833 And now we have to deal with that. 498 00:43:57,863 --> 00:44:00,693 And the genie's out of the bottle, as they say, I think. 499 00:44:00,733 --> 00:44:02,333 and the cat's out of the bag. 500 00:44:04,836 --> 00:44:05,766 Amit Bahree: That's a great analogy. 501 00:44:05,766 --> 00:44:06,476 I actually like that. 502 00:44:06,476 --> 00:44:07,896 I'm going to steal that in other places. 503 00:44:07,896 --> 00:44:10,506 But you're right, like there was a time where we didn't need passwords. 504 00:44:10,736 --> 00:44:11,596 It wasn't a problem. 505 00:44:12,056 --> 00:44:15,386 And then there was a time where we needed passwords, but it was simple passwords. 506 00:44:15,696 --> 00:44:18,716 You could do hello1234 or password1 or what have you. 507 00:44:19,076 --> 00:44:22,296 And then it was like, time where, okay, it needs to be a little more complex. 508 00:44:22,326 --> 00:44:25,276 And now you can find these, buy these on the dark web and all. 509 00:44:25,276 --> 00:44:29,476 and hence you need more complex passwords. 510 00:44:29,576 --> 00:44:32,246 my one PSA is please use a password manager. 511 00:44:32,806 --> 00:44:37,866 10 years, 15 years ago, if you were chatting, the concept of a password manager would be so alien. 
512 00:44:38,646 --> 00:44:42,806 And here now, I'm sure as you do, I do tech support for my family. 513 00:44:43,646 --> 00:44:44,696 Unpaid, of course. 514 00:44:44,876 --> 00:44:47,066 my everything is go use the password manager. 515 00:44:47,096 --> 00:44:48,176 Here's how you set it up. 516 00:44:48,176 --> 00:44:49,916 And why you shouldn't reuse passwords. 517 00:44:49,916 --> 00:44:52,986 And let the thing do the heavy lifting for you. 518 00:44:53,046 --> 00:44:54,446 But you save it, right? 519 00:44:54,466 --> 00:44:56,016 With your master password and whatnot. 520 00:44:56,596 --> 00:44:58,976 I think it's, yeah, it's the same analogy in that sense. 521 00:44:59,046 --> 00:45:01,926 it goes back to your threat vectors changing, society changing. 522 00:45:01,946 --> 00:45:04,976 Things are changing and, part of it is adapting. 523 00:45:05,036 --> 00:45:05,736 Some is good. 524 00:45:05,776 --> 00:45:06,426 Some is not good. 525 00:45:07,291 --> 00:45:09,051 Miko Pawlikowski: Let's circle back to your book. 526 00:45:09,371 --> 00:45:12,411 ultimately, that's how I learned about you existing. 527 00:45:13,256 --> 00:45:22,576 So as I was reading it, for anybody who's, interested in, go and pick it up on, manning.com, it's a very practical guide. 528 00:45:22,646 --> 00:45:26,716 It's called "Generative AI in action" for a reason. 529 00:45:26,726 --> 00:45:27,136 There is 530 00:45:27,676 --> 00:45:32,006 little time spent on the underlying details. 531 00:45:32,006 --> 00:45:49,536 There is obviously the intro that covers everything that you would expect in terms of what is generative AI, the architecture, high level, what it means, references, overview of LLMs, transformer, smaller language models, that kind of stuff. 532 00:45:49,576 --> 00:46:02,181 And then it turns into basically a guide to show you what's possible with it, show you how you can go and call some API and get magic text being generated.
533 00:46:02,411 --> 00:46:12,241 It shows you how to generate pictures, shows you how to generate other things like music, video, I think briefly code, all that kind of stuff. 534 00:46:12,241 --> 00:46:25,171 So I'm picturing this really as a kind of guide that you get yourself when you want to get into this without wasting any time on things that are not necessary for your journey. It will get you from zero to one on that. 535 00:46:25,721 --> 00:46:27,531 is that an accurate description? 536 00:46:27,681 --> 00:46:30,221 Am I doing a good, marketing pitch here? 537 00:46:30,396 --> 00:46:30,886 Amit Bahree: Mostly. 538 00:46:31,116 --> 00:46:31,626 Yeah. 539 00:46:31,961 --> 00:46:32,541 Miko Pawlikowski: Mostly. 540 00:46:34,556 --> 00:46:35,576 Amit Bahree: I'm not in marketing. 541 00:46:35,696 --> 00:46:35,896 Yes. 542 00:46:35,936 --> 00:46:37,536 I think that is an accurate description. 543 00:46:37,536 --> 00:46:45,386 I think the emphasis is on the "in Action" part, the premise of this is, you want to go build an app, right now. 544 00:46:45,386 --> 00:46:55,226 I go back to my year and a half of conversations from CEOs down across the Fortune 500 or whatever, which is our, from a work point of view, right? 545 00:46:55,226 --> 00:47:11,126 A lot of these large enterprises, but this is not about just large enterprises, it's about if you're a company, you have a set of products you want to improve or make new, how do I use this GenAI and ChatGPT and LLMs and everyone's heard about it. 546 00:47:11,626 --> 00:47:13,596 And they don't know where to start or how to start. 547 00:47:14,226 --> 00:47:16,816 So that's really what I was trying to do, right? 548 00:47:16,816 --> 00:47:19,376 There's broadly speaking three parts to the book. 549 00:47:19,396 --> 00:47:29,076 The first part is introductions and because you just know what you know, I can't just go dig into things without giving you some context and basis on what's possible, what's not possible.
550 00:47:29,556 --> 00:47:31,276 and that's the first part you're touching on. 551 00:47:32,126 --> 00:47:36,196 What I stay away from is it's not a science research book. 552 00:47:36,246 --> 00:47:43,116 I link to papers where there are people who generally are curious or they want to go deeper. 553 00:47:43,536 --> 00:47:53,666 So we leave those crumb trails in a way saying, if you want to dig more in your own time kind of a thing, here's the things you can go read up and then that'll expose you to more dimensions, right? 554 00:47:53,666 --> 00:48:07,186 So it's not a science book, techie book in that sense, because At least in an enterprise setting, most developers and CTOs and CIOs or CEOs, they want to see like, how is it going to solve my business problem? 555 00:48:07,876 --> 00:48:08,926 How do I do it? 556 00:48:09,606 --> 00:48:19,486 Some are interested in the science and the depth, but most just want to know at a high level, how it works deep enough, but not in the guts at least on the AI side of the science. 557 00:48:20,036 --> 00:48:35,586 so we leave the breadcrumbs and the trails pointing to papers where people can go deeper should they want to, but If you're a developer and you can use a set of APIs and SDKs, that is really for you and the way we say is because these, at least these LLMs are exposed as an API. 558 00:48:36,151 --> 00:48:40,981 You really don't need to know any of the AI sort of mumbo jumbo, any developer can pick it up easily. 559 00:48:41,001 --> 00:48:43,211 So that's certainly why I was trying to position it. 560 00:48:43,791 --> 00:48:49,081 Part one is getting you a sense of the world from a technical perspective, but not go super deep. 561 00:48:49,601 --> 00:49:01,008 And then part two and part three is where we start going deeper on, okay, how do I use this in my production, solving my business problem, what I'm trying to do. 
562 00:49:01,101 --> 00:49:07,661 Miko Pawlikowski: Making it very applicable. For example, at some point, the book is talking about image generation. 563 00:49:08,176 --> 00:49:28,206 And there is a short description of generative adversarial networks, and it doesn't include Ian Goodfellow getting drunk and going to a fellow student's graduation and then arguing with them and then going home and implementing a proof of concept algorithm to prove the other people wrong. 564 00:49:28,316 --> 00:49:30,736 And, next day, discovering it's actually working. 565 00:49:31,286 --> 00:49:33,296 It's giving you the applicable kind of summary: 566 00:49:34,021 --> 00:49:43,841 this is used for scenarios where the data is complex and diverse, requiring realism, suitable for high quality images, data augmentation, style transfer. 567 00:49:43,841 --> 00:49:51,841 So it's prescriptive in a way, I would say. You give people what they need to get cracking with it. 568 00:49:51,901 --> 00:50:05,091 Speaking of which, let's talk a little bit about the images because you do cover a few interesting things like the VAE, the GANs, diffusion, vision transformers, to give people a sneak peek of what they're going to expect. 569 00:50:05,151 --> 00:50:12,041 Can you talk about why they're interesting and why they might be something that you should be paying attention to? 570 00:50:12,671 --> 00:50:13,891 What are the breakthroughs? 571 00:50:14,649 --> 00:50:21,249 Amit Bahree: I think one aspect is where ChatGPT and the LLMs, which is just the language part, are taking the hype. 572 00:50:21,249 --> 00:50:26,799 And I think most people understand that there's a different set of tech, related but different, on images, right?
573 00:50:26,849 --> 00:50:41,479 and image understanding, image editing, the power of it on one hand, wherever you go on whichever social thingy with stable diffusion came out, lot of creativity on the image generation was there, but in a social 574 00:50:41,489 --> 00:50:50,209 setting, the thing really is, how do you expand that in a corporate, application setting and what can you do? 575 00:50:50,379 --> 00:50:54,569 It's one is like fun and wonderful in a personal social setting. 576 00:50:55,109 --> 00:51:01,909 But then how do I transfer that and then which area do I use in a work setting? 577 00:51:02,519 --> 00:51:06,009 Not even have to be work, each of these techniques have their own power. 578 00:51:06,059 --> 00:51:17,254 I think most people don't really care or maybe nor should they care, but in some cases where it would matter It's good to know what is the underlying tech, so I know what to ignore versus not to ignore. 579 00:51:17,404 --> 00:51:20,024 Because, again, the hype wraps up a lot of this. 580 00:51:20,024 --> 00:51:28,104 If you come back to it, it's more of helping people ground themselves a little, because at the end of the day, the tech is still the tech, right? 581 00:51:28,164 --> 00:51:31,954 What it is meant to do and how it is meant to do doesn't fundamentally move. 582 00:51:31,964 --> 00:51:38,424 if you're trying to solve one set of images, one kind of things, like diffusion models would be great for, that set of categories. 583 00:51:38,424 --> 00:51:40,014 And now there's multiple diffusion models. 584 00:51:40,014 --> 00:51:41,374 You can go pick which one you want. 585 00:51:41,949 --> 00:51:43,119 versus a transformer model. 586 00:51:43,129 --> 00:51:53,169 So again, we don't go, I have a few diagrams and images to, outline at a high level how these are, because there's papers on each topic, you can go read like hundreds of them. 
587 00:51:53,849 --> 00:51:57,659 But the intention is just to know, look, there's different buckets and categories. 588 00:51:57,729 --> 00:51:59,089 Each has its own strengths. 589 00:52:00,379 --> 00:52:04,549 And for what you're trying to solve, just make sure you connect those dots. 590 00:52:05,029 --> 00:52:10,149 I guess the other analogy is if you're writing a book, Word is easier than Notepad kind of a thing, right? 591 00:52:10,319 --> 00:52:18,089 Miko Pawlikowski: I was a little surprised to see a prompt engineering chapter, but I guess it makes perfect sense. 592 00:52:18,119 --> 00:52:20,449 You need a little bit of basics. 593 00:52:20,499 --> 00:52:22,659 What was your thinking with that chapter? 594 00:52:22,989 --> 00:52:25,049 What was the goal you wanted to achieve with it? 595 00:52:26,771 --> 00:52:30,621 Amit Bahree: In the context of LLMs, prompt engineering is pretty crucial. 596 00:52:31,461 --> 00:52:34,651 It is how you steer the model fundamentally in many ways. 597 00:52:35,111 --> 00:52:38,271 The beauty of it is it is half art and half science. 598 00:52:39,096 --> 00:52:41,736 The frustration of it is it is half art and half science. 599 00:52:42,926 --> 00:52:49,816 But, fundamentally, at least with today's technology of where things are, prompt engineering is quite crucial. 600 00:52:50,366 --> 00:52:58,616 And the way we also tell many of our customers, and I tell, is: look, you have to start thinking about prompts as your IP 601 00:52:59,526 --> 00:53:01,686 in many ways, and I'm not talking about simple prompts. 602 00:53:01,686 --> 00:53:04,886 Like, in the book, I use simple prompts to make the point. 603 00:53:04,896 --> 00:53:09,586 So 'tell me a story about a panda' is not really IP in the context of a prompt. 604 00:53:10,166 --> 00:53:14,326 And then prompts are also closely tied to how a model understands it.
605 00:53:14,886 --> 00:53:33,111 So again, outside of simple prompts, when you're using this concept of RAG, for example, as you start using a specific model or a family of models, which are closely related, you start picking up nuances on how the model's interpreting things and working with things and so on. 606 00:53:33,141 --> 00:53:36,261 And then you're tweaking your prompts along with that, right? 607 00:53:36,261 --> 00:53:37,621 So it's cohesive together. 608 00:53:38,161 --> 00:53:44,091 And that intuition as you learn is also part of your IP and how you want to think about prompt engineering. 609 00:53:44,291 --> 00:53:45,401 That also means 610 00:53:46,101 --> 00:53:47,461 there are no universal prompts. 611 00:53:47,501 --> 00:53:50,991 Again, outside of the simple ones, I'm not talking about the simple, straightforward prompts. 612 00:53:51,381 --> 00:54:02,581 So you should not, or one should not, just say, if I'm, let's say, using GPT-3, 4, whichever, the same prompts, which are complex ones, I can pick up and expect to work on, let's say, Llama or something else. 613 00:54:02,781 --> 00:54:07,101 They will work, but do they work at the same level and the same evaluation, the same criteria? 614 00:54:07,111 --> 00:54:11,221 Probably not, because they are quite tied into how the model behaves. 615 00:54:11,271 --> 00:54:12,271 This is very loose, right? 616 00:54:12,271 --> 00:54:16,101 It's not a scientific thing, but prompt engineering is quite crucial. 617 00:54:16,171 --> 00:54:21,561 It is how we talk. Even though you are calling an API, how you're talking to the model is through those. 618 00:54:21,971 --> 00:54:28,341 So I think it's worth spending time to understand what these are, how they work. 619 00:54:28,361 --> 00:54:30,061 There's a lot of hype around prompts as well. 620 00:54:30,561 --> 00:54:31,841 I would say don't believe all of it.
621 00:54:32,501 --> 00:54:37,591 The one final point I want to make on it is prompts are also one of the new threat vectors. 622 00:54:37,611 --> 00:54:42,281 So I touch on it in a later chapter. I touch a little bit on prompt injection in the chapter you've seen, 623 00:54:42,771 --> 00:54:45,381 but we go a little deeper in one of the later chapters. 624 00:54:45,761 --> 00:54:48,851 But prompt injection, as an example, is one of the new threat vectors. 625 00:54:48,851 --> 00:54:49,761 It's not the only one. 626 00:54:50,161 --> 00:54:55,101 So again, understanding that as well, but prompts, in today's world, get quite crucial. 627 00:54:55,541 --> 00:54:58,421 At the end of the day, it's how we, in quotes, talk to the model. 628 00:54:58,437 --> 00:54:59,227 Miko Pawlikowski: That makes sense. 629 00:55:00,247 --> 00:55:08,897 Prompt engineering might be getting a little bit of a bad rap just because of how many people are walking around saying that they have the ultimate prompt, stuff like that. 630 00:55:08,897 --> 00:55:14,057 But at the end of the day, you do need to learn how to talk to these things. 631 00:55:14,967 --> 00:55:17,357 And it is one of the biggest frustrations. 632 00:55:17,367 --> 00:55:25,017 It's almost like you're talking to a cat sometimes; it can suddenly freak out and do something very weird at a moment's notice. 633 00:55:25,017 --> 00:55:27,497 And there is little you can do to prevent that. 634 00:55:28,422 --> 00:55:28,942 Amit Bahree: Yeah. 635 00:55:28,972 --> 00:55:35,472 And so what we call it, or at least I call it, is you have to think of prompts when you're talking to the model as a parent. 636 00:55:35,602 --> 00:55:41,082 So for those who have had children or who have toddlers right now, it is what we call parentology. 637 00:55:42,452 --> 00:55:46,542 Somebody said this to me in one of my meetings and I loved it and I stole it from them. 638 00:55:47,062 --> 00:55:51,852 So if you're a toddler, your memory retention is lower.
639 00:55:52,332 --> 00:55:54,182 So very often you have to keep repeating. 640 00:55:54,192 --> 00:55:57,542 It's the classic "don't stick your finger in the wall socket" kind of a thing. 641 00:55:58,312 --> 00:56:00,942 Saying it one time doesn't help, you have to keep repeating. 642 00:56:01,422 --> 00:56:08,882 The way I want, generally speaking, folks to think is, your model's like a toddler, you have to keep repeating, keep thinking about it, right? 643 00:56:08,932 --> 00:56:17,277 And as silly as it may sound, it's like basic stuff. For example, one of the side effects is what's called hallucinations, where it's not grounded. 644 00:56:17,277 --> 00:56:20,527 So you will get responses back which are made up and not factual. 645 00:56:21,117 --> 00:56:22,947 That may be okay in one dimension, 646 00:56:22,997 --> 00:56:33,037 if you are writing a creative story. It may not be okay in another dimension where, in a business setting, you're answering things based on some policy or information or what have you. 647 00:56:33,897 --> 00:56:47,837 So in the prompt you put simple things like "do not make up any information, only answer from this". You would think that would be obvious. So your intuition of a cat is not very far off. 648 00:56:48,767 --> 00:56:50,287 Miko Pawlikowski: That's an interesting comparison. 649 00:56:51,177 --> 00:56:52,207 Let's do one more. 650 00:56:52,417 --> 00:57:09,635 You talk about RAG in your book and I think a lot of people have heard the term and know that it has something to do with getting fresher data. Can you give us an explanation-for-a-five-year-old version of what that is and how it works? 651 00:57:10,875 --> 00:57:15,335 Amit Bahree: I should open ChatGPT on the other screen and say, 'explain RAG for a 5 year old in summary'. 652 00:57:16,625 --> 00:57:19,305 RAG is Retrieve, Augment, Generate, right? 653 00:57:19,305 --> 00:57:26,335 So the technique originally came from Meta, Facebook, as the research paper.
654 00:57:26,785 --> 00:57:34,935 But fundamentally, it is crucial when you are using large language models, specifically in the context of a company or a business or what have you. 655 00:57:36,500 --> 00:57:47,160 Basically, it is also a little clunky right now, but what it does is, as the name suggests, the model that you're using just knows what it knows, what it's been trained on, which is public data. 656 00:57:47,550 --> 00:57:48,050 That's one. 657 00:57:48,500 --> 00:57:51,450 And then as with these things, there's a training cutoff, right? 658 00:57:51,480 --> 00:57:54,240 At some point, you say, okay, I'm done collecting data. 659 00:57:54,680 --> 00:58:04,510 I need to go off for a few weeks or a few months or whatever it is and go train this thing and then spit out a model and go through a bunch more other alignment and this and that, and then eventually have a model available. 660 00:58:05,260 --> 00:58:16,100 So Online, when you go and see, a lot of people using RAG to get fresh information, which is post training data, that is absolutely valid use case. 661 00:58:16,490 --> 00:58:20,420 For many others, the other thing is my proprietary information. 662 00:58:20,600 --> 00:58:31,480 So especially in a company setting, your proprietary internal information, corporate knowledge, the model doesn't know because it's never seen it. 663 00:58:32,410 --> 00:58:35,340 In fact, if it does know that, then fundamentally there's a different problem. 664 00:58:35,970 --> 00:58:37,110 Because it shouldn't know that. 665 00:58:37,620 --> 00:58:49,320 for your business workflow, you often need to bring in your internal proprietary knowledge, whether it's a CRM or a database or an ERP, or you're solving a ticket or what have you, depending on the use case. 666 00:58:49,800 --> 00:58:56,840 The only way the knowledge, you can bring in the knowledge is through This technique of RAG, so retrieve, augment, generate. 
667 00:58:57,230 --> 00:59:05,590 Retrieve means I'm retrieving the information, which could be from my corporate enterprise systems, or, Google or Bing to get more fresh information. 668 00:59:06,110 --> 00:59:10,220 I'm augmenting it in my prompt, which goes back to prompt engineering. 669 00:59:10,220 --> 00:59:14,100 And then based on that, I'm saying please generate, or whatever I'm trying to do. 670 00:59:14,120 --> 00:59:19,860 Generation could be a summary, or entity extraction, or depending on, whatever I'm trying to do. 671 00:59:20,300 --> 00:59:21,480 But that's what RAG is doing. 672 00:59:21,840 --> 00:59:25,620 It also gets clunky, by the way, because, it is the first generation. 673 00:59:25,650 --> 00:59:27,790 I do expect things to get improving in that. 674 00:59:27,790 --> 00:59:47,490 you talk about complexities of RAG, but if I have to get proprietary in-house information, if I have to get more fresher information, the only ways I can do that is through RAG, without retraining a whole model, which, in theory, is an option practically for, I guess 99% of people is not an option. 675 00:59:47,490 --> 00:59:49,050 I don't know if that was for a five year old, but 676 00:59:50,640 --> 00:59:56,320 Miko Pawlikowski: Yeah, that might have been a six and a half, maybe even seven, but I let, we'll let it 677 00:59:56,580 --> 00:59:57,000 Amit Bahree: Thank you. 678 00:59:57,080 --> 00:59:57,380 Miko Pawlikowski: time. 679 00:59:59,830 --> 01:00:00,260 Okay. 680 01:00:00,360 --> 01:00:16,425 so this is basically what, you're going to see if you look at, the early access version of the book, tells me that there is six more chapters coming very soon and there's a chapters 8-13 that cover things like, More on 681 01:00:16,425 --> 01:00:25,465 RAG, telling models, application architecture for GenAI apps, can they have evaluation and ethical on GenAI. 
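The retrieve, augment, generate steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the documents, the word-overlap ranking, and the `call_model` parameter are all hypothetical stand-ins. A real system would more likely retrieve with a search index or embeddings and pass the prompt to an actual LLM API.

```python
import re

# Hypothetical in-house documents the model was never trained on.
DOCUMENTS = [
    "Refund policy: customers can return hardware within 30 days of delivery.",
    "Travel policy: economy class is required for flights under six hours.",
    "Support hours: the help desk is staffed from 08:00 to 18:00 on weekdays.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Retrieve: rank documents by how many words they share with the question.

    Word overlap is an assumption made to keep the sketch dependency-free.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def augment(question, passages):
    """Augment: put the retrieved passages into the prompt, with grounding rules."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer only from the context below. "
        "Do not make up any information.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


def generate(prompt, call_model):
    """Generate: hand the augmented prompt to a model.

    call_model is a hypothetical callable standing in for a real LLM API call.
    """
    return call_model(prompt)


if __name__ == "__main__":
    question = "How many days do customers have to return hardware?"
    prompt = augment(question, retrieve(question, DOCUMENTS))
    print(prompt)
```

The grounding sentence in `augment` is the same kind of "do not make up any information, only answer from this" instruction mentioned in the prompt engineering discussion.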
682 01:00:25,485 --> 01:00:30,765 I, I think at some point we're going to have to get you back to talk about the rest of the book. 683 01:00:31,295 --> 01:00:36,475 but before I let you go, I wanted to, ask you for a few predictions. 684 01:00:37,260 --> 01:00:42,220 from where you stand, where, are we going to see the next evolutions and breakthrough, 685 01:00:42,290 --> 01:00:44,270 Amit Bahree: one is , Multimodality, 686 01:00:47,130 --> 01:00:53,460 which basically, a lot of people today, primarily when they're using GenAI and the likes of ChatGPT is in one mode, i.e. 687 01:00:53,480 --> 01:00:54,550 language, text. 688 01:00:55,030 --> 01:01:01,815 But I do expect multimodality where I'm starting to combine language, images, text, video, and what have you, together. 689 01:01:02,155 --> 01:01:03,875 Not just generation, but input. 690 01:01:03,885 --> 01:01:05,515 We already are seeing that, by the way. 691 01:01:05,515 --> 01:01:10,365 That's already here today, like GPT V, which is vision, being one example of that. 692 01:01:10,705 --> 01:01:14,525 But more and more multimodality, because our real world is that as well, right? 693 01:01:14,525 --> 01:01:16,045 So I see one and that happening. 694 01:01:16,455 --> 01:01:19,655 I do see SLMs to accelerate more, as we touched on. 695 01:01:20,375 --> 01:01:22,755 Again, they're not better, they're different. 696 01:01:23,255 --> 01:01:27,225 there's times you need one, and there's times you need the other, and there's times you need both. 697 01:01:28,040 --> 01:01:34,460 But, I do see more and more on that front because for many use cases, I need simple things. 698 01:01:34,460 --> 01:01:35,870 I don't need all the other power. 699 01:01:36,320 --> 01:01:37,890 so I do see that accelerating a lot. 
700 01:01:37,890 --> 01:01:54,370 And then I also see a third dimension is the, underlying, systems engineering things improving to be it cost effective from how much hardware and GPUs I need to run it to, latency around it, things like memory profile and so on and so forth. 701 01:01:54,700 --> 01:02:04,860 so I do see those sort of three and I guess I want to sneak in a fourth one, which is also all of the responsible AI aspects, which is one of the later chapters, we touched on the likes of prompt engineering. 702 01:02:04,860 --> 01:02:09,600 I know I talked a little bit on hallucinations, but the new harmful things one can do. 703 01:02:10,315 --> 01:02:12,025 That's also a cat and mouse thing. 704 01:02:12,025 --> 01:02:15,455 I do see more research breakthroughs 705 01:02:15,652 --> 01:02:21,182 Miko Pawlikowski: Do you expect we're still going to be doing transformers a year or two or three from now? 706 01:02:22,502 --> 01:02:25,642 Do you think it was big enough of a breakthrough that it's going to stay 707 01:02:25,692 --> 01:02:31,552 Amit Bahree: I honestly don't know what I can tell you is it's what everybody's doing at the moment, which is not going away anytime soon. 708 01:02:31,732 --> 01:02:32,902 That's one side of it. 709 01:02:33,362 --> 01:02:34,192 Having said that. 710 01:02:34,737 --> 01:02:40,277 I think it's also pushing a lot of other areas around it where we can do things better. 711 01:02:40,317 --> 01:02:47,017 for example, we didn't touch on it, but each model has this concept of what we call a context window. 712 01:02:47,077 --> 01:02:52,247 How much, how big my prompt can be and in reality how many tokens can it be and how much can it send back? 713 01:02:52,737 --> 01:02:59,777 So on one hand a lot of people get happy Hey, if I have a longer token, my context window is longer. 714 01:02:59,787 --> 01:03:05,877 It means I can stuff in more things I can ask you more things or I can generate more things On one hand, that's good. 
715 01:03:05,947 --> 01:03:07,145 People get happy about it. 716 01:03:07,145 --> 01:03:21,217 What I then have to come and remind them is that compute grows quadratically with the token length. So doubling it is four times as costly in the sense of compute profile. 717 01:03:21,217 --> 01:03:25,497 So just having a longer token context window isn't necessarily good. 718 01:03:26,287 --> 01:03:29,237 So there is research going on now to say, how can we do that? 719 01:03:29,237 --> 01:03:33,877 How can we, with derivatives of the transformer architecture, increase 720 01:03:34,362 --> 01:03:38,582 the token windows without having a quadratic increase on the compute profile? 721 01:03:38,942 --> 01:03:43,002 And that ties back to the attention mechanics of how the transformer architecture works. 722 01:03:43,062 --> 01:03:45,622 The way I would say it: it's the first one which has reached the scale. 723 01:03:45,662 --> 01:03:50,852 And then now there's other research that's happening to make those profiles better. 724 01:03:51,732 --> 01:03:52,952 Will there be another big one? 725 01:03:54,062 --> 01:03:57,722 One which is better than this? I'm sure, in the sense of humanity, absolutely. 726 01:03:58,115 --> 01:04:04,195 Miko Pawlikowski: Plus, like you alluded to, there is a lot of value to being the first thing that's good enough, right? 727 01:04:04,845 --> 01:04:14,735 When we look at how technology works, it's better to have something that's good enough today than to have the perfect or ideal solution much later. 728 01:04:15,325 --> 01:04:17,555 And typically there is enough momentum 729 01:04:17,995 --> 01:04:23,955 by the time the better thing comes that it might not be as attractive as one would think. 730 01:04:24,795 --> 01:04:33,625 I think there was the paper from Google talking about basically a method of achieving infinite attention span, or was that some other research that you're alluding to?
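The quadratic cost of longer context windows mentioned above can be made concrete with a little arithmetic. The sketch below assumes standard full self-attention, where every token is compared with every other token, and counts only those pairwise comparisons. It ignores the parts of a transformer that grow linearly with length and any attention optimizations, so treat it as a rough illustration rather than a cost model. The function names are made up for this example.

```python
def attention_score_entries(context_length):
    """Query-key pairs scored by one full self-attention layer: n * n."""
    return context_length * context_length


def relative_attention_cost(old_length, new_length):
    """How many times more pairs are scored after changing the context length."""
    return attention_score_entries(new_length) / attention_score_entries(old_length)


if __name__ == "__main__":
    # Doubling the window quadruples the pairwise work.
    print(relative_attention_cost(4096, 8192))
    # A ten times longer window means a hundred times the pairwise work.
    print(relative_attention_cost(1000, 10000))
```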
731 01:04:33,992 --> 01:04:36,042 Amit Bahree: There, there's a few, there's a few papers. 732 01:04:36,062 --> 01:04:40,012 So there's one, in fact, Microsoft has on, like, how can I do 2 million tokens. 733 01:04:40,188 --> 01:04:41,178 that's one example. 734 01:04:41,418 --> 01:04:45,798 There's another one which is research going on called Ring Attention, which is different. 735 01:04:46,218 --> 01:04:48,718 I can't remember, I think it was Google? 736 01:04:48,958 --> 01:04:50,378 I can't recall off the top of my head. 737 01:04:50,468 --> 01:04:53,038 so there's multitudes of things going on. 738 01:04:53,728 --> 01:04:58,168 In parallel, like active research, on, how do we look at this differently. 739 01:04:58,208 --> 01:04:59,888 And then, that's just a context window. 740 01:04:59,888 --> 01:05:04,558 There's other things, for example, like when we touched on RAG, I said, it's a clunky way of doing things. 741 01:05:04,668 --> 01:05:06,248 we didn't go deep in it, but it's a clunky way. 742 01:05:06,248 --> 01:05:10,988 So there's other things happening, like graph, can I do graph with RAGs, and so on and so forth. 743 01:05:11,058 --> 01:05:17,418 it's not only in one dimension, I was using one of these as an example, but across multitudes of dimensions. 744 01:05:17,848 --> 01:05:20,698 There is active research going on to improve those. 745 01:05:20,728 --> 01:05:29,688 And as you start, as that starts formulating, cause look, research is one, getting something as a product that is deployable and running. 746 01:05:30,143 --> 01:05:38,233 That you can consistently, is a whole separate sort of scale and of its own separate complexities. 747 01:05:38,633 --> 01:05:42,443 but across these multiple dimensions as they come together, it'll just suddenly get improved. 748 01:05:42,463 --> 01:05:50,653 To your point, this is like version one, and it's a mad race across the board from academia to, commercial and whatnot. 
749 01:05:50,653 --> 01:05:55,203 So it'll just be improving is how I see it. 750 01:05:55,568 --> 01:06:00,098 Miko Pawlikowski: the open models eventually prevailing and taking over? 751 01:06:00,748 --> 01:06:01,688 there's a lot of talk. 752 01:06:01,708 --> 01:06:04,208 Obviously people are excited about LLAMA-3. 753 01:06:04,228 --> 01:06:14,138 That's I think a lot of people call it GPT-4 class, comparable, a model that's effectively free to use. 754 01:06:15,148 --> 01:06:20,838 And, obviously Microsoft doing their own, research and releasing the Phi-3, in the open as well. 755 01:06:21,158 --> 01:06:25,328 Do you see this models eventually becoming the defacto standard? 756 01:06:26,418 --> 01:06:28,178 Amit Bahree: they certainly have a place, for sure. 757 01:06:28,218 --> 01:06:30,538 I think, there's no question about that. 758 01:06:30,618 --> 01:06:32,578 I don't know if they're the de facto standard or not. 759 01:06:32,578 --> 01:06:41,638 I think the challenge would come down to is at the end of the day, with the current state of technology, training a model is super expensive. 760 01:06:42,508 --> 01:06:44,158 There is no shortcut around that. 761 01:06:44,598 --> 01:06:57,833 So even if you have an open source model in the sort of near term, there's only a handful of companies who have the technical know how have the sort of the muscle, have the compute profile to be able to do that. 762 01:06:58,593 --> 01:07:04,873 so more and more until again, unless, as there's more fundamental breakthroughs to open that up more. 763 01:07:05,468 --> 01:07:14,468 a lot of the open source, it's back to that, the ant analogy used for, with the, one of the diagrams we have from the paper in the book. 764 01:07:14,898 --> 01:07:19,528 the part I was trying to show there also is the roots are very few models that are derived from that. 
765 01:07:19,528 --> 01:07:31,348 So I think while there's a lot happening, at the end of the day there'll be just a handful of people who are publishing those and exposing those, that others are deriving from. 766 01:07:32,008 --> 01:07:38,168 So until that happens, as in fundamental breakthroughs from a cost point of view, where it becomes cheaper 767 01:07:38,218 --> 01:07:46,193 and it doesn't need hundreds of thousands of whatever it is, GPUs, plus I don't know how many billions of tokens of data, to train them. 768 01:07:47,270 --> 01:07:47,800 Miko Pawlikowski: Oh, don't worry. 769 01:07:47,800 --> 01:07:47,880 I 770 01:07:47,968 --> 01:07:48,388 Amit Bahree: it won't be 771 01:07:48,523 --> 01:07:51,843 Miko Pawlikowski: my crypto mining farm in my garage. 772 01:07:54,418 --> 01:07:55,288 Amit Bahree: There you go. 773 01:07:55,388 --> 01:07:56,718 That is one way to do it. 774 01:07:56,768 --> 01:08:02,478 I think open source will still be constrained with a few source models, which they go derive from. 775 01:08:03,018 --> 01:08:19,708 But, the other way, if you just look in the last one year, 12 months, which is nothing in the sense of humanity and technology, just see how much progress and how much improvement the models have made across both dimensions, whether they're open source or closed source or what have you. 776 01:08:20,078 --> 01:08:21,178 It is fascinating. 777 01:08:21,188 --> 01:08:25,118 And the fact that there's literally new models every day is a good thing, but also not a good thing. 778 01:08:25,528 --> 01:08:27,408 So it has to stabilize to some extent. 779 01:08:27,428 --> 01:08:28,298 At some point it will. 780 01:08:28,828 --> 01:08:31,458 But I think the open source community is absolutely critical.
781 01:08:31,818 --> 01:08:44,128 On the flip side, a lot of research breakthroughs also are coming from the research labs where there's, at the end of the day, deeper pockets and muscle, in the sense of financial and compute and data as well. 782 01:08:44,238 --> 01:08:57,228 It's a fascinating world we are in, which is one of your opening statements, because at least for a geek and somebody in the industry, these moments are few and far between. 783 01:08:57,238 --> 01:08:58,938 So it's absolutely fascinating. 784 01:08:59,660 --> 01:09:06,460 Miko Pawlikowski: Yeah, we'll be sitting down with the grandchildren saying, ah, I remember in my day 785 01:09:08,475 --> 01:09:10,755 they released the first capable. 786 01:09:10,935 --> 01:09:11,775 Amit Bahree: They're like, what? 787 01:09:11,775 --> 01:09:14,005 You used hundreds of GPUs and all this stuff? 788 01:09:14,005 --> 01:09:14,255 Why? 789 01:09:14,305 --> 01:09:17,415 I can just run it on my phone or whatever the phone looks like. 790 01:09:17,575 --> 01:09:17,635 I don't know. 791 01:09:18,495 --> 01:09:19,275 Miko Pawlikowski: Yeah, exactly. 792 01:09:19,275 --> 01:09:21,475 You were so wasteful back in the day. 793 01:09:21,505 --> 01:09:23,155 Really very clunky. 794 01:09:24,730 --> 01:09:25,310 Amit Bahree: That's right. 795 01:09:25,360 --> 01:09:37,640 Miko Pawlikowski: Well, we're going to have to wait a little bit until that materializes, but I completely agree. It's a very interesting time to be alive, and I'm certainly grateful that I get to experience that. 796 01:09:38,370 --> 01:09:40,910 Amit, it's been a pleasure to host you. 797 01:09:41,010 --> 01:09:42,220 Thank you so much for coming. 798 01:09:42,235 --> 01:09:43,105 Amit Bahree: Thank you for having me.