1 00:00:00,030 --> 00:00:05,700 I'm Miko Pawlikowski, and this is HockeyStick. 2 00:00:06,370 --> 00:00:11,190 Today, we're talking about LLMOps and how it differs from MLOps, 3 00:00:11,209 --> 00:00:13,479 MLE, and other cool acronyms. 4 00:00:13,640 --> 00:00:14,980 Do we need a new discipline? 5 00:00:15,230 --> 00:00:18,709 How different is it really to work with large language models compared 6 00:00:18,710 --> 00:00:20,069 to any other piece of software? 7 00:00:20,430 --> 00:00:22,700 Why do models deteriorate over time? 8 00:00:22,990 --> 00:00:27,720 I'm joined by Abi Aryan, the author of LLMOps: Managing Large Language 9 00:00:27,740 --> 00:00:32,740 Models in Production, as well as What Is LLMOps?, published by O'Reilly. 10 00:00:33,235 --> 00:00:36,175 Abi is a founder at Abide AI. 11 00:00:36,714 --> 00:00:39,960 Welcome to this episode and thank you for flying HockeyStick. 12 00:00:40,916 --> 00:00:45,426 LLMOps versus MLOps versus MLE. 13 00:00:46,376 --> 00:00:48,666 Can you tell me what's the difference between the three of them? 14 00:00:49,646 --> 00:00:54,706 So in very simple words, I think MLOps versus LLMOps, 15 00:00:54,726 --> 00:00:55,706 those are frameworks. 16 00:00:55,776 --> 00:00:59,516 Machine learning engineering is a discipline or an engineering 17 00:00:59,556 --> 00:01:00,586 practice, I would say. 18 00:01:00,946 --> 00:01:03,416 So it's more like a role or a practice. 19 00:01:03,696 --> 00:01:06,566 I would keep that separate from both of those ones. 20 00:01:06,856 --> 00:01:10,776 But let me define the difference between MLOps versus LLMOps. 21 00:01:11,226 --> 00:01:14,356 So most of the conventional machine learning models that we have seen till 22 00:01:14,376 --> 00:01:19,246 date were discriminative models, which is, they were very predictable in the sense 23 00:01:19,246 --> 00:01:21,476 that they were making their inferences. 
24 00:01:22,236 --> 00:01:25,116 The models that we are working with right now, large language models, 25 00:01:25,216 --> 00:01:26,646 they are generative in nature. 26 00:01:27,066 --> 00:01:32,586 So one of the core differences between MLOps versus LLMOps is what 27 00:01:32,586 --> 00:01:34,286 kind of model are we working with? 28 00:01:34,316 --> 00:01:37,056 Are we working with a discriminative model or are we 29 00:01:37,096 --> 00:01:38,696 working with a generative model? 30 00:01:39,536 --> 00:01:44,366 The big difference really happens because when we're talking about 31 00:01:44,406 --> 00:01:49,276 generative models, they're not really generating things at the same scale as 32 00:01:49,456 --> 00:01:51,126 conventional machine learning models. 33 00:01:51,466 --> 00:01:56,276 The size is much, much bigger, because they need a lot of information to be able 34 00:01:56,276 --> 00:01:58,501 to create more information themselves. 35 00:01:58,901 --> 00:02:03,581 There are two big problems: the first is evaluation, and the second is basically 36 00:02:03,591 --> 00:02:05,631 the scale or the size of the models. 37 00:02:05,691 --> 00:02:08,841 With conventional machine learning models, a lot of focus was 38 00:02:09,221 --> 00:02:12,291 'let's collect the data', then 'let's do feature engineering'. 39 00:02:12,341 --> 00:02:17,431 It was very much experimental; we were trying to fit a model to a very specific 40 00:02:17,431 --> 00:02:21,501 task. But large language models are task agnostic, which is, they're more generalized 41 00:02:21,541 --> 00:02:28,411 models. There's a shift from building task specific software to building task 42 00:02:28,411 --> 00:02:32,011 agnostic software, and that's where large language models come into play. 43 00:02:32,021 --> 00:02:35,611 Anytime you're building any sort of unbounded solution, that 44 00:02:35,621 --> 00:02:37,161 comes with its own challenges. 
45 00:02:37,661 --> 00:02:43,591 So I would say large language model operations is inspired by MLOps, which is 46 00:02:43,611 --> 00:02:46,921 because it shares some of the same things. 47 00:02:47,761 --> 00:02:50,561 There's maybe some stuff that we are doing in the engineering. 48 00:02:50,571 --> 00:02:54,051 Yes, we're still fine tuning the products, even though the fine tuning we're doing 49 00:02:54,051 --> 00:02:59,541 is very different: we can't really afford to update all of the weights of the model. 50 00:02:59,601 --> 00:03:03,606 So we're using approaches that only update some of the weights, or we 51 00:03:03,606 --> 00:03:06,456 are using other techniques, for example, prompt engineering, which 52 00:03:06,466 --> 00:03:11,576 is new with these models specifically because, again, updating all of the 53 00:03:11,576 --> 00:03:15,006 weights of the model during the fine tuning or the training process, 54 00:03:15,096 --> 00:03:16,906 if we are to call fine tuning 55 00:03:17,141 --> 00:03:18,831 very similar to training itself, 56 00:03:19,371 --> 00:03:20,301 it's very costly. 57 00:03:20,846 --> 00:03:21,136 Okay. 58 00:03:21,136 --> 00:03:22,516 So you're blowing my mind a little bit. 59 00:03:22,556 --> 00:03:29,646 I thought the answer was LLMOps is a niche within MLOps, and just leave it at that. 60 00:03:29,646 --> 00:03:31,986 But sounds like there is more to that. 61 00:03:32,286 --> 00:03:33,716 How much of that is fashion? 62 00:03:33,926 --> 00:03:37,856 People who work on the fancy new LLMs who don't want to be called the old 63 00:03:37,856 --> 00:03:40,036 fashioned machine learning engineers? 64 00:03:40,781 --> 00:03:45,491 It's a complicated thing, which is, I don't think you essentially 65 00:03:45,491 --> 00:03:47,291 ever work on a technology. 66 00:03:47,291 --> 00:03:50,511 There are people who work on technology for the sake of technology. 
67 00:03:50,511 --> 00:03:53,491 I would call those people researchers, which is ML scientists, 68 00:03:53,541 --> 00:03:55,011 essentially are those kind of people. 69 00:03:55,011 --> 00:03:59,516 But then the next step is people who are working on technology 70 00:03:59,526 --> 00:04:01,066 because it's solving a problem. 71 00:04:01,076 --> 00:04:05,226 So whether it's using a very simple decision tree or whether it's still 72 00:04:05,226 --> 00:04:10,646 using CatBoost or XGBoost, I don't think there needs to be much difference 73 00:04:10,646 --> 00:04:15,226 in terms of how people approach these kind of technologies itself, because 74 00:04:15,496 --> 00:04:19,466 the focus people need to have is: 'This is the problem we are trying to solve. 75 00:04:19,466 --> 00:04:21,386 What kind of problem is it? 76 00:04:21,596 --> 00:04:25,306 Is it a discriminative problem or is it a generative problem? 77 00:04:25,546 --> 00:04:29,566 If it's a generative problem, yes, I'm going to implement this technology.' But 78 00:04:29,571 --> 00:04:34,866 it doesn't really mean that both of those two fields are in competition with each 79 00:04:34,866 --> 00:04:36,666 other; I think they complement each other. 80 00:04:37,166 --> 00:04:43,286 You picked LLMOps as the topic of your next book, and for anybody listening 81 00:04:43,286 --> 00:04:48,706 to this, the book will be very soon available in preview early access. 82 00:04:49,396 --> 00:04:53,946 It's called "LLMOps: Managing Large Language Models in Production", 83 00:04:54,691 --> 00:04:58,061 which takes quite a bit to pronounce all of this together. 84 00:04:58,461 --> 00:05:01,501 Tell us a little bit about how you came up with the topic 85 00:05:01,501 --> 00:05:04,351 of the book, the origin story. 86 00:05:04,351 --> 00:05:06,631 I always wanted to write a technical book. 87 00:05:06,981 --> 00:05:12,081 Manning approached me back in 2018 to write a book on interpretability. 
88 00:05:12,086 --> 00:05:15,026 I didn't feel like I was ready to write a book back then. 89 00:05:15,726 --> 00:05:19,206 But the person who was the assistant acquisition editor, which is 90 00:05:19,206 --> 00:05:22,266 the person who reached out, is essentially my acquisition editor. 91 00:05:22,266 --> 00:05:24,066 So she's now at O'Reilly. 92 00:05:24,996 --> 00:05:29,346 It's a very small place eventually. In terms of what inspired me to write 93 00:05:29,346 --> 00:05:33,106 a book, especially on this topic and not pick up any other topic per se, 94 00:05:33,786 --> 00:05:38,726 is basically seeing that shift, which is, as the scale is increasing with 95 00:05:38,736 --> 00:05:42,886 these models, yes, they're not really good at doing discriminative tasks 96 00:05:42,946 --> 00:05:47,736 right now, but eventually, because these are generalized models, we'll be 97 00:05:47,826 --> 00:05:52,256 eventually expanding on the capabilities of these models, but these models 98 00:05:52,256 --> 00:05:54,136 are not getting smaller anytime soon. 99 00:05:55,171 --> 00:05:59,381 Because of the scale of these models, there will be a few questions for people 100 00:05:59,421 --> 00:06:03,871 to be asking, because we're interacting so closely with these models as compared 101 00:06:03,871 --> 00:06:08,851 to before. Earlier, majorly the people who were interacting directly 102 00:06:08,871 --> 00:06:11,811 with the model were machine learning engineers and data scientists. 103 00:06:12,181 --> 00:06:17,031 Now, these are in the form of chatbots, which the entire user base is 104 00:06:17,821 --> 00:06:20,636 interacting with, and people are playing with it, trying to hack it. 105 00:06:20,996 --> 00:06:25,696 So while the field of security operations wasn't super relevant for a lot of 106 00:06:25,736 --> 00:06:30,806 other companies, now that has become the main center of this show in a way. 
107 00:06:30,876 --> 00:06:35,326 Everybody can build a large language model and that's one of the core 108 00:06:35,326 --> 00:06:41,086 differences: the focus in MLOps was more like, how do we build a model? 109 00:06:41,156 --> 00:06:44,166 How do we host and deploy it in production, which is how 110 00:06:44,166 --> 00:06:45,886 do we self serve these models. 111 00:06:46,346 --> 00:06:51,206 Now the focus has shifted. It's so damn easy to build a large language model 112 00:06:51,246 --> 00:06:54,576 for your particular application; you may not have to build it from scratch. 113 00:06:54,606 --> 00:06:56,826 You don't need to train a model from scratch. 114 00:06:57,116 --> 00:06:58,666 You can put wrappers around it. 115 00:06:58,696 --> 00:07:00,266 You can put guardrails around it. 116 00:07:00,606 --> 00:07:01,946 You can still fine tune it. 117 00:07:01,976 --> 00:07:06,956 You can integrate it with a RAG system and use it for your particular use case. 118 00:07:07,246 --> 00:07:12,936 So for me, understanding the fact that there's a big market of people who were 119 00:07:12,946 --> 00:07:17,776 software engineers, who didn't really have access to machine learning systems 120 00:07:17,826 --> 00:07:21,286 because they didn't have the skill set. Machine learning has always been 121 00:07:21,286 --> 00:07:25,736 posed like, 'Oh my God, you need to know linear algebra to understand how these 122 00:07:25,756 --> 00:07:32,221 models work', to now, where they can just give the API key and implement a machine 123 00:07:32,231 --> 00:07:36,701 learning model itself. So that ease of use means the entire software engineering 124 00:07:36,711 --> 00:07:41,691 community or anybody who can code will now be able to build or host their own 125 00:07:41,711 --> 00:07:45,411 machine learning model, and in this case, specifically, it will be large language 126 00:07:45,441 --> 00:07:48,001 models, but I recognized that shift. 
127 00:07:48,001 --> 00:07:50,501 And I was like, this is a substantial shift. 128 00:07:50,661 --> 00:07:55,511 This is not just, this technology is limited to this very set of people. 129 00:07:56,031 --> 00:07:58,051 Now, so many people can use it. 130 00:07:58,091 --> 00:07:59,761 So many people can build on it. 131 00:08:00,561 --> 00:08:05,701 And the fact that the market has expanded and also the fact that these models do 132 00:08:05,701 --> 00:08:08,271 present additional challenges as well. 133 00:08:08,451 --> 00:08:11,101 This is the right point for us to write a book. 134 00:08:11,116 --> 00:08:15,416 Because it's a way bigger market than before. 135 00:08:16,154 --> 00:08:20,974 There's something really scary about putting a model in production and letting 136 00:08:21,034 --> 00:08:25,244 clients talk to it, when you never know what it's going to do for sure. 137 00:08:25,244 --> 00:08:25,274 Okay. 138 00:08:25,989 --> 00:08:28,339 And I understand the shift that you're describing. 139 00:08:28,339 --> 00:08:30,724 So who came up with the term LLMOps? 140 00:08:30,774 --> 00:08:33,644 Basically, when I sent my proposal, I used that term. 141 00:08:34,044 --> 00:08:40,264 It was in February last year when I sent my proposal, and I used that 142 00:08:40,264 --> 00:08:44,574 term. The proposal was sent to a couple of reviewers who were like, 'Oh, we 143 00:08:44,574 --> 00:08:46,194 don't think it's going to stick'. 144 00:08:46,194 --> 00:08:51,284 And then eventually, I think Weights and Biases came up with their own blog post, 145 00:08:51,284 --> 00:08:52,744 which is what's really the difference. 146 00:08:52,784 --> 00:08:55,944 Then Arize came up with their own blog post, as in what's the 147 00:08:55,944 --> 00:08:57,564 difference between LLMOps and MLOps. 148 00:08:58,084 --> 00:09:01,964 And eventually, everybody was like, 'Oh my God, this term is sticking'. 
149 00:09:02,264 --> 00:09:06,084 And by then we had already signed up the contract with my editor. 150 00:09:06,084 --> 00:09:10,544 She took a gamble on me, which is, I said, this is going to stick because it's 151 00:09:10,544 --> 00:09:16,904 a substantial shift in what we're trying to do. In MLOps, the focus was different. 152 00:09:17,024 --> 00:09:18,524 Here, the focus is different. 153 00:09:18,844 --> 00:09:22,644 The amount of outages, the amount of reliability issues, the amount of 154 00:09:23,014 --> 00:09:27,304 unreliability of these models is way higher than the conventional machine 155 00:09:27,304 --> 00:09:28,994 learning models that we were using. 156 00:09:29,364 --> 00:09:33,454 There are very few people who were doing distributed training, who 157 00:09:33,514 --> 00:09:36,014 understand that scope of problems. 158 00:09:36,514 --> 00:09:40,704 The engineering was not really done at the scale that is being done 159 00:09:40,754 --> 00:09:42,324 right now for large language models. 160 00:09:42,614 --> 00:09:48,414 I feel like this is going to be a big thing where there needs to be education. 161 00:09:48,734 --> 00:09:53,294 And I basically went in to create that education in this space. 162 00:09:53,854 --> 00:09:58,664 If I was to start in the field today as a 17 or a 19 year old, 163 00:09:59,424 --> 00:10:01,214 what would I want to learn? 164 00:10:01,324 --> 00:10:04,794 I come from like a background in maths and computer science and statistics. 165 00:10:05,194 --> 00:10:08,634 So I don't want people to feel like, 'oh, that's a barrier for 166 00:10:08,634 --> 00:10:09,944 me to get into machine learning'. 167 00:10:09,944 --> 00:10:11,614 No, that's not really a barrier. 168 00:10:13,154 --> 00:10:13,534 Right? 169 00:10:14,144 --> 00:10:19,424 And so for anybody who is, like we said, the book is still a little bit out. 170 00:10:19,484 --> 00:10:22,754 It's coming soon, but it's not available today. 
171 00:10:23,119 --> 00:10:26,419 What is available is that new report that you authored. 172 00:10:26,569 --> 00:10:27,999 What is LLMOps? 173 00:10:28,529 --> 00:10:29,059 What's that? 174 00:10:29,069 --> 00:10:33,269 Basically to prepare people to start using that term to make sure 175 00:10:33,269 --> 00:10:37,009 that everybody's on the same page: 'okay, guys, we're doing LLMOps. 176 00:10:37,009 --> 00:10:37,839 This is the term. 177 00:10:37,869 --> 00:10:38,779 Let's go with it'. 178 00:10:39,426 --> 00:10:44,736 So the reason the report came out was because we got very critical reviews 179 00:10:44,776 --> 00:10:49,146 from a lot of people early last year, from people who were saying LLMs are not going 180 00:10:49,146 --> 00:10:51,566 to stick, LLMOps is not going to stick. 181 00:10:51,566 --> 00:10:56,156 So we were like, let's at least tell people what it is, and then if there's 182 00:10:56,156 --> 00:10:57,536 enough interest, we'll write the book. 183 00:10:57,556 --> 00:11:00,546 Though we had signed contracts for both of the things, but we were like, 184 00:11:00,596 --> 00:11:04,486 let's test out if people really understand what's the difference. 185 00:11:04,536 --> 00:11:09,276 Once people know why this is substantial, then we can take them 186 00:11:09,276 --> 00:11:11,396 to, how are you supposed to do it? 187 00:11:12,189 --> 00:11:17,509 What the report essentially does is what I would probably say we're a little bit 188 00:11:17,999 --> 00:11:22,079 late in the market where it has already stuck, which is people have already 189 00:11:22,079 --> 00:11:26,049 understood, there's a shift in terms of companies that are building their 190 00:11:26,079 --> 00:11:29,819 own large language model or generative AI teams, if I can use that word. 191 00:11:30,339 --> 00:11:32,809 The implementation has already started. 192 00:11:32,809 --> 00:11:35,089 They've already started looking at the issues. 
193 00:11:35,139 --> 00:11:38,659 They've started realizing that they need a new discipline. 194 00:11:39,019 --> 00:11:43,979 So I'm having talks with a lot of companies in a consulting capacity 195 00:11:44,409 --> 00:11:47,879 that are trying to figure out how to build a specialized engineering 196 00:11:47,899 --> 00:11:49,259 practice around these models. 197 00:11:49,259 --> 00:11:53,009 What would the shift look like when it comes to these models? 198 00:11:53,289 --> 00:11:55,289 What would the team structure look like? 199 00:11:55,309 --> 00:11:56,979 What would the metrics look like? 200 00:11:57,319 --> 00:12:00,869 What are the key expectations that they can get? 201 00:12:00,899 --> 00:12:05,059 And, if you want to keep investing in the space, then how do we 202 00:12:05,059 --> 00:12:06,709 justify that investment as well? 203 00:12:07,579 --> 00:12:11,569 How do we make sure we tie these models with our KPIs now, given 204 00:12:11,569 --> 00:12:16,409 the fact these models are still a little bit unpredictable, or a few 205 00:12:16,409 --> 00:12:17,949 people would call it unpredictable. 206 00:12:18,499 --> 00:12:21,419 I don't particularly think they're unpredictable. 207 00:12:21,739 --> 00:12:22,849 They still exist. 208 00:12:23,249 --> 00:12:28,279 Any inference that is being made does exist, and there are probably the space of 209 00:12:28,279 --> 00:12:30,889 the input data that you're providing to. 
210 00:12:30,979 --> 00:12:35,149 So to me, while they're still a very probabilistic model, they're still 211 00:12:35,239 --> 00:12:40,279 a little bit, like, untameable, if I can say it in that sense, which is, it's 212 00:12:40,279 --> 00:12:45,619 very hard to predict if the model goes off, and it's not because essentially the 213 00:12:45,629 --> 00:12:49,559 model is built that way, but it's because of the number of people interacting with 214 00:12:49,559 --> 00:12:54,479 it, and the way the models are being structured is basically to help the user. 215 00:12:54,809 --> 00:12:57,239 There are so many people trying to hack the solutions. 216 00:12:57,309 --> 00:13:00,959 Basically you're building a product for your enemies, essentially. 217 00:13:01,039 --> 00:13:01,479 Okay. 218 00:13:01,579 --> 00:13:02,879 Naming is hard. 219 00:13:02,889 --> 00:13:07,789 It's probably the hardest problem in computer science, but we've got a term. 220 00:13:07,939 --> 00:13:11,509 I think at this stage we understand what it means. 221 00:13:12,289 --> 00:13:17,499 There's a report in case you want to prove to somebody, 'Hey, LLMOps means this'. 222 00:13:17,499 --> 00:13:18,749 You can just point them to that. 223 00:13:19,159 --> 00:13:20,479 And the book is coming out soon. 224 00:13:20,479 --> 00:13:27,644 So let's talk a little bit about what LLMOps really is in practice. 225 00:13:27,644 --> 00:13:30,754 And I'm browsing through your report right now. 226 00:13:30,754 --> 00:13:36,264 And I see things like safety, scalability, robustness, the LLM lifecycle. 227 00:13:37,064 --> 00:13:38,724 Let's talk about these things a little bit. 228 00:13:38,744 --> 00:13:39,754 Where should we start? 229 00:13:39,814 --> 00:13:44,294 What's the most painful part of running LLMs today? 
230 00:13:45,779 --> 00:13:50,419 So I would say the three goals are where we should ideally start this, 231 00:13:50,419 --> 00:13:55,064 which is why do we need this field, or why do we need this new practice? 232 00:13:55,734 --> 00:13:59,074 The first thing is essentially safety, which is making sure that 233 00:13:59,164 --> 00:14:00,944 the model is playing by the rules. 234 00:14:01,344 --> 00:14:05,904 Because, again, it's not just machine learning engineers trying 235 00:14:05,904 --> 00:14:07,274 to build on these models today. 236 00:14:07,314 --> 00:14:08,304 It's software engineers. 237 00:14:08,634 --> 00:14:10,454 It's a lot of other people as well. 238 00:14:10,724 --> 00:14:15,424 There needs to be a new playbook for people who are working with these models. 239 00:14:15,804 --> 00:14:20,159 Because, again, the models do pose a lot of risk, which is yes, 240 00:14:20,159 --> 00:14:21,939 there's operational risk as well. 241 00:14:22,719 --> 00:14:26,489 But a lot of risk that people don't really understand, it's very easy to 242 00:14:26,509 --> 00:14:30,489 integrate code, integrate libraries, but a lot of people don't really 243 00:14:30,499 --> 00:14:35,329 think about supply chain risk, which is, if I'm using a package from some 244 00:14:35,649 --> 00:14:38,609 website, is the package secure enough? 245 00:14:38,844 --> 00:14:43,654 How do I make sure that I'm not installing malware on my system? 246 00:14:44,214 --> 00:14:47,644 Those things are not really well understood, which is the entire 247 00:14:47,644 --> 00:14:52,294 field of cyber security and security operations was isolated from practice. 248 00:14:52,294 --> 00:14:57,109 And now that has to become very key integrated into this. 
249 00:14:57,599 --> 00:15:03,409 The second thing I would say is scalability, which is basically 250 00:15:03,409 --> 00:15:08,329 making sure that the model does scale to the number of people that are 251 00:15:08,499 --> 00:15:09,979 interacting with the model as well. 252 00:15:10,239 --> 00:15:15,659 We're essentially going from where maybe a couple of people were interacting 253 00:15:15,669 --> 00:15:19,389 with these models to a large number of people interacting by the minute, 254 00:15:19,439 --> 00:15:23,609 which is, you're not going to OpenAI's ChatGPT to write one thing, right? 255 00:15:23,959 --> 00:15:28,529 You're having a conversation, which may take about five, 10, 15, 20 minutes, 256 00:15:28,819 --> 00:15:31,039 and there are varying workloads as well. 257 00:15:31,039 --> 00:15:34,229 And there are varying workloads from different locations. 258 00:15:34,959 --> 00:15:39,599 So we need to think about how do we make sure that the latency is fine? 259 00:15:39,869 --> 00:15:44,449 How do we make sure that the models are able to deal with the traffic if it's 260 00:15:44,979 --> 00:15:50,179 usual or unusual, and how do we build an architecture around making sure 261 00:15:50,199 --> 00:15:56,849 that the model can serve and can adapt to those requirements is the central 262 00:15:56,849 --> 00:16:01,209 thing, but also with the part that these models are so huge, inferencing 263 00:16:01,449 --> 00:16:03,459 that every single time does cost you. 264 00:16:03,459 --> 00:16:04,879 So how do we do caching? 265 00:16:04,989 --> 00:16:06,659 How do we do load testing? 266 00:16:06,669 --> 00:16:08,419 How do we do performance testing? 267 00:16:08,429 --> 00:16:13,169 All of those questions become central that weren't really central before. 268 00:16:13,619 --> 00:16:15,839 Then the next part is basically robustness. 269 00:16:16,089 --> 00:16:18,039 And by robust, I mean the model keeps 
270 00:16:18,404 --> 00:16:22,734 having the same kind of reactions, which is, conventionally we used to 271 00:16:22,734 --> 00:16:26,464 call it reproducibility, which is you can reproduce what was already there. 272 00:16:27,084 --> 00:16:30,424 Robustness is a little bit different, since a lot of people are building 273 00:16:30,444 --> 00:16:33,384 on closed source, a lot of people are building on open source, but 274 00:16:33,404 --> 00:16:38,104 the model's behavior changes with how many people are interacting. 275 00:16:38,134 --> 00:16:39,944 It's getting a lot of live data as well. 276 00:16:40,384 --> 00:16:43,784 So there's some kind of model degradation that happens with time. 277 00:16:44,094 --> 00:16:47,844 Also, every single time the model gets updated as well, the behavior changes. 278 00:16:47,854 --> 00:16:52,224 So the entire prompt pipeline that you built up can break easily, which is a 279 00:16:52,224 --> 00:16:58,549 lot of companies eventually realized, mid last year, that they built up these very 280 00:16:58,569 --> 00:17:03,309 intricate prompt pipelines, and eventually OpenAI does one update and those prompt 281 00:17:03,399 --> 00:17:05,049 pipelines don't really work anymore. 282 00:17:05,049 --> 00:17:10,189 So how do you build a system that keeps on being predictable in that scenario, 283 00:17:10,189 --> 00:17:14,617 which is, any sort of infrastructure that you build on top of the model, 284 00:17:14,667 --> 00:17:18,537 it doesn't need to be rebuilt for every single iteration or every single 285 00:17:18,577 --> 00:17:22,237 time you're moving from OpenAI to, let's say, Claude or to some other model 286 00:17:22,297 --> 00:17:27,577 as well, because you need to keep improving and making sure that you're 287 00:17:27,577 --> 00:17:28,977 working with the new data as well. 
288 00:17:29,007 --> 00:17:33,067 So the three questions that come over there are the questions of data drift, 289 00:17:33,807 --> 00:17:39,297 which is based on how the input changes over time, which is basically how many 290 00:17:39,297 --> 00:17:40,877 people are interacting with the model. 291 00:17:41,237 --> 00:17:44,947 And that causes one of the shifts in the model behavior. 292 00:17:45,227 --> 00:17:49,357 The second is concept drift, which is, every single time, there's new 293 00:17:49,357 --> 00:17:50,757 information that comes out there. 294 00:17:50,757 --> 00:17:55,597 So a good example to give over there would be Corona used to be a beer brand. 295 00:17:55,867 --> 00:18:02,347 So any models that were built up till, let's say, about 2019, 2020, understood 296 00:18:02,437 --> 00:18:06,197 Corona as like a beer, so it would always reference an answer in that 297 00:18:06,197 --> 00:18:10,177 perspective. The models that are being built now understand that it could be 298 00:18:10,207 --> 00:18:13,417 a beer or it could be the virus thing. 299 00:18:14,767 --> 00:18:16,647 So that is essentially concept drift. 300 00:18:16,747 --> 00:18:19,897 The prime minister, the president, or any new information that comes 301 00:18:19,897 --> 00:18:24,467 up changes the behavioral, functional capabilities of the 302 00:18:24,467 --> 00:18:29,522 inputs that we've essentially given, or adds additional information that 303 00:18:29,582 --> 00:18:31,262 changes the model behavior as well. 304 00:18:31,262 --> 00:18:35,512 And the third is basically prompt drift, which is the updates 305 00:18:35,932 --> 00:18:37,002 of the model, essentially. 306 00:18:37,002 --> 00:18:41,622 And how does the retraining of the model affect the performance of 307 00:18:41,652 --> 00:18:43,562 your entire infrastructure as well? 
308 00:18:43,612 --> 00:18:49,082 For anybody who's building on closed source models, OpenAI, Claude, Anthropic 309 00:18:49,082 --> 00:18:53,702 and all of these companies, they're constantly using RLHF techniques 310 00:18:53,702 --> 00:18:55,702 to retrain the models substantially. 311 00:18:56,752 --> 00:18:59,512 So that does impact the model performance. 312 00:18:59,732 --> 00:19:02,652 I would say these three are core things which are in the center. 313 00:19:02,682 --> 00:19:07,412 Anybody who's building with these models needs to think of: is my model safe? 314 00:19:07,782 --> 00:19:09,182 Is my model scalable? 315 00:19:09,282 --> 00:19:10,342 Is my model robust? 316 00:19:10,342 --> 00:19:14,302 And if you're not looking at those properties, it's very hard to build a 317 00:19:14,652 --> 00:19:17,172 sustainable product around these models. 318 00:19:17,947 --> 00:19:20,157 That was a lot of information in one go. 319 00:19:20,187 --> 00:19:21,177 I've got questions. 320 00:19:21,227 --> 00:19:25,347 Imagine you're talking to a five year old software engineer who has never done 321 00:19:25,387 --> 00:19:32,667 any AI, just basic software engineering things, as five year olds do. How 322 00:19:32,877 --> 00:19:39,072 different really is it, the safety part of it, compared to any other application? 323 00:19:39,122 --> 00:19:43,372 The few examples you gave, like using an unsafe library coming from somewhere, 324 00:19:43,412 --> 00:19:46,802 every piece of software on earth is going to have the same problem, right? 325 00:19:47,212 --> 00:19:52,922 What are the problems that are actually unique to LLMs, from 326 00:19:52,922 --> 00:19:54,912 the safety perspective and why? 327 00:19:55,970 --> 00:20:01,270 This is one reason I think LLMOps is closer to DevOps than it is to MLOps. 
328 00:20:01,310 --> 00:20:06,110 Essentially, because DevOps is built up around so much software, 329 00:20:06,110 --> 00:20:09,220 so many frameworks, so many libraries exist out there, whereas 330 00:20:09,380 --> 00:20:13,190 in conventional machine learning models, we were using scikit-learn. 331 00:20:13,460 --> 00:20:16,900 So there were very specific libraries that were already tested. 332 00:20:17,105 --> 00:20:18,625 And we knew that these are secure. 333 00:20:18,645 --> 00:20:21,085 We were using TensorFlow, PyTorch and all those ones. 334 00:20:21,845 --> 00:20:27,095 Now, because the open source community is very similar to how the software community 335 00:20:27,095 --> 00:20:32,335 is, so a lot of things do translate from what DevOps engineers were doing or where 336 00:20:32,335 --> 00:20:37,415 the focus of what conventional software engineers were looking at versus what 337 00:20:37,415 --> 00:20:40,055 LLMOps engineers would be looking at. 338 00:20:40,385 --> 00:20:45,120 The key differences now would be: anytime we're doing 339 00:20:45,350 --> 00:20:46,780 conventional software engineering, 340 00:20:47,160 --> 00:20:52,130 it's a rule based system where we define what our code is supposed to do. 341 00:20:52,540 --> 00:20:57,230 Now we're moving away from a rule based system, which means, basically, 342 00:20:57,370 --> 00:21:02,190 the model can create things that are factually inaccurate as well. 343 00:21:02,490 --> 00:21:05,350 Now those are things that you really need to cater for. 344 00:21:05,720 --> 00:21:08,790 So that's one of the big things, which is A: how do you 345 00:21:08,790 --> 00:21:11,060 deal with biases in the data? 346 00:21:11,560 --> 00:21:15,220 Second thing is how do you deal with factually inaccurate information? 
347 00:21:15,770 --> 00:21:19,950 And so for a five year old, maybe it's not that significant, but for anybody who's 348 00:21:19,950 --> 00:21:24,390 doing software engineering, how do you make sure that you're not making decisions 349 00:21:24,520 --> 00:21:27,540 based on what the models are generating? 350 00:21:27,540 --> 00:21:33,610 For example, if the model says this is how this is supposed to happen or for business 351 00:21:33,610 --> 00:21:38,980 executives, if it says, based on the data, this is what the graph is looking like. 352 00:21:39,240 --> 00:21:43,680 And if that happens to be inaccurate, we can't really rely on that to make 353 00:21:43,690 --> 00:21:47,040 further decisions on where the strategic decisions we should be making next. 354 00:21:48,210 --> 00:21:52,050 Now then, models are exposed to that kind of risk, which is usually 355 00:21:52,050 --> 00:21:54,030 called the hallucination problem. 356 00:21:55,015 --> 00:21:55,375 Got it. 357 00:21:55,765 --> 00:22:00,075 I guess it gets much worse when you've got things like autonomous agents, right? 358 00:22:00,075 --> 00:22:05,455 When people directly plug things that have permissions to do things into 359 00:22:05,455 --> 00:22:08,995 these LLMs and we'll see how that goes. 360 00:22:09,045 --> 00:22:09,395 Okay. 361 00:22:09,505 --> 00:22:11,715 So I buy that argument. 362 00:22:11,785 --> 00:22:16,635 Going back to the scalability, that's the bit that I don't think I fully 363 00:22:16,635 --> 00:22:18,345 understood when you were explaining. 364 00:22:18,435 --> 00:22:25,465 Why is it not the same as scaling any other request response server? 365 00:22:26,115 --> 00:22:30,495 How is it different other than the practical part of it being massive 366 00:22:30,505 --> 00:22:32,675 and requiring a lot of resources? 367 00:22:33,745 --> 00:22:39,745 Why is it harder to scale an LLM than it is to scale any other application? 
368 00:22:40,700 --> 00:22:44,850 At this point, you can probably say there are three kinds of applications out there. 369 00:22:44,870 --> 00:22:46,780 One is conventional software piece, right? 370 00:22:47,220 --> 00:22:50,500 Anytime we're writing software, we're trying to refactor, making it 371 00:22:50,580 --> 00:22:55,050 as small as possible or making sure that we're defining rules on, this 372 00:22:55,220 --> 00:22:56,840 is what happens when you do this. 373 00:22:56,880 --> 00:22:58,490 This is what happens when you do this. 374 00:22:58,510 --> 00:23:00,070 That's how requests are processed. 375 00:23:00,500 --> 00:23:06,040 But conventional machine learning models, the way that they are working, is, 376 00:23:06,050 --> 00:23:10,770 the applications they're used for are entirely different, mostly they're used 377 00:23:10,770 --> 00:23:17,755 for internal data capture, being used for recommender systems or semantic analysis. 378 00:23:18,035 --> 00:23:20,865 The people who are interacting with the model outputs are different. 379 00:23:21,375 --> 00:23:25,645 Now, because large language models are customer facing, 380 00:23:26,015 --> 00:23:29,835 that necessitates some expectations that people have, the inference 381 00:23:29,835 --> 00:23:31,445 speed is always going to be high. 382 00:23:32,285 --> 00:23:36,335 Now, with the inference speed, when you have so much data that you need to 383 00:23:36,405 --> 00:23:41,465 retrieve or run an algorithm to create new information based on whatever they've 384 00:23:41,515 --> 00:23:46,235 given, that is a very hard task, so with conventional software engineering, we 385 00:23:46,235 --> 00:23:50,845 were writing those breadth-first search, all of those algorithms, they were still 386 00:23:50,885 --> 00:23:56,755 implemented on small-scale data, it was still very simple to do as compared 387 00:23:56,755 --> 00:23:59,375 to now we have these massive databases.
388 00:23:59,815 --> 00:24:04,070 Retrieving data and then generating information, both in 389 00:24:04,070 --> 00:24:05,590 real time, is a very hard task. 390 00:24:05,660 --> 00:24:09,790 Making sure that you can maintain the inference speed, making sure that you 391 00:24:09,790 --> 00:24:13,600 can maintain the latency, making sure that you can meet the target, making sure 392 00:24:13,600 --> 00:24:16,400 that you can test other things as well. 393 00:24:16,470 --> 00:24:20,850 Before the model really passes information to the user, there are guardrails 394 00:24:20,880 --> 00:24:24,030 being put in, which is there's one additional layer that's put in, there's 395 00:24:24,060 --> 00:24:27,770 evaluations that are being put in as well to make sure that the person isn't 396 00:24:27,770 --> 00:24:32,695 fiddling with the model or it's not giving you information that's wrong. 397 00:24:32,935 --> 00:24:36,265 First part is retrieving the data and then generating information. 398 00:24:36,275 --> 00:24:39,175 The next part is making sure that it passes through all of 399 00:24:39,225 --> 00:24:43,255 those layers as well, then still maintains the customer expectations. 400 00:24:43,285 --> 00:24:44,295 That's really hard. 401 00:24:45,015 --> 00:24:51,385 And also, when the demand does skyrocket, or when they don't find an information, 402 00:24:51,485 --> 00:24:54,090 they can easily freeze up as well. 403 00:24:54,630 --> 00:24:59,030 So that becomes a very hard problem to solve, because that means the 404 00:24:59,030 --> 00:25:01,340 performance would also degrade.
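The "one additional layer" before a response reaches the user could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `guarded_generate`, `no_injection`, `not_empty` and the fallback message are hypothetical names invented here, the checks are deliberately naive, and `generate` stands in for whatever model call a real system would make. The latency is measured across generation plus checks, since every extra layer eats into the same response-time budget.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

# A check receives the prompt and the draft answer and returns True if both look acceptable.
Check = Callable[[str, str], bool]


@dataclass
class GuardedResponse:
    text: str
    blocked: bool
    latency_ms: float


def no_injection(prompt: str, draft: str) -> bool:
    # Hypothetical, very naive input guardrail: reject one obvious jailbreak phrase.
    return "ignore previous instructions" not in prompt.lower()


def not_empty(prompt: str, draft: str) -> bool:
    # Hypothetical output guardrail: the model must have produced some text.
    return bool(draft.strip())


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    checks: List[Check],
    fallback: str = "Sorry, I can't help with that.",
) -> GuardedResponse:
    start = time.perf_counter()
    draft = generate(prompt)  # stand-in for the real model call
    blocked = not all(check(prompt, draft) for check in checks)
    latency_ms = (time.perf_counter() - start) * 1000
    return GuardedResponse(fallback if blocked else draft, blocked, latency_ms)
```

In a real deployment the checks would more likely be classifiers or policy services than string matches, but the shape stays the same: the draft only reaches the user if every layer passes, and the total time is what the customer experiences.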
405 00:25:01,590 --> 00:25:07,330 then the next question is basically If a lot of people are using the model for 406 00:25:07,330 --> 00:25:14,020 one kind of things, making sure that the model can still answer or doesn't really 407 00:25:14,280 --> 00:25:19,230 adapt to only those kind of problems and can still go into the database 408 00:25:19,230 --> 00:25:23,580 and still look at, A very different problem and still perform well on that. 409 00:25:23,580 --> 00:25:24,320 that's hard. 410 00:25:24,790 --> 00:25:30,670 So the real challenges are basically the service disruption, the availability 411 00:25:31,140 --> 00:25:35,650 and that is usually a little bit harder. 412 00:25:36,095 --> 00:25:41,175 majorly because, you need to have a lot of, parallel nodes as well that are 413 00:25:41,175 --> 00:25:42,905 trying to interact with these models. 414 00:25:42,995 --> 00:25:46,555 And then for companies that are hosting different large language models as well. 415 00:25:46,895 --> 00:25:51,085 no one large language model is optimal for every single kind of problem. 416 00:25:51,085 --> 00:25:52,895 it may not be cost optimal as well. 417 00:25:53,045 --> 00:25:57,045 So what is really happening is there's a micro service kind 418 00:25:57,045 --> 00:25:58,895 of architecture, which was. 419 00:25:59,160 --> 00:26:01,320 Prominent in conventional software engineering. 420 00:26:01,320 --> 00:26:04,840 Do not so much in LLMOps. 421 00:26:05,290 --> 00:26:08,910 What really happens over there is then you're thinking about how am I doing 422 00:26:09,170 --> 00:26:13,080 parallel computing with all of these nodes and clusters that I do have? 423 00:26:13,360 --> 00:26:15,130 How do I do horizontal scaling? 
424 00:26:15,440 --> 00:26:18,660 How do I make sure that all of my resources are being optimized, and 425 00:26:18,670 --> 00:26:23,190 one of my clusters isn't done while one is being really optimized to 426 00:26:23,210 --> 00:26:24,500 the maximum, you know, that is. 427 00:26:24,755 --> 00:26:26,145 stopping at one point in time. 428 00:26:26,595 --> 00:26:30,225 So those are key problems with these models now. 429 00:26:31,338 --> 00:26:31,708 So 430 00:26:31,830 --> 00:26:37,960 I get that, but I'm still not entirely sure why it's any harder than any 431 00:26:37,990 --> 00:26:41,350 other piece of software, like all the things that you just mentioned 432 00:26:41,350 --> 00:26:45,620 about scaling clusters and high availability, high throughput, making 433 00:26:45,620 --> 00:26:47,050 sure that all those things happen. 434 00:26:47,790 --> 00:26:52,230 That's problems that we've had for decades and that, all the other 435 00:26:52,340 --> 00:26:54,230 systems have in place, right? 436 00:26:54,230 --> 00:26:57,860 If you look at any enterprise ready application, there are layers and 437 00:26:57,870 --> 00:27:00,260 layers of things and they keep working. 438 00:27:00,260 --> 00:27:03,460 So where is the actual difference coming from? 439 00:27:03,470 --> 00:27:08,755 Is it because of the fact that your query, your prompt, it's indeterministic 440 00:27:08,755 --> 00:27:14,445 in terms of resources and time it takes to answer that query? 441 00:27:14,545 --> 00:27:19,265 Is that really the biggest wrench in the works here? 442 00:27:20,525 --> 00:27:24,575 so the biggest wrench is essentially just the size of 443 00:27:24,605 --> 00:27:26,665 the data that it needs to query. 444 00:27:27,225 --> 00:27:32,095 the size of the data has gone from, a few million parameters 445 00:27:32,155 --> 00:27:33,675 to now a trillion parameters. 
446 00:27:33,675 --> 00:27:39,065 And every single time you need to look at a trillion parameters of information and 447 00:27:39,075 --> 00:27:42,905 then generate information, making sure that you're not overly relying on copying 448 00:27:42,905 --> 00:27:47,085 information from a single source, but you're building up a response from all 449 00:27:47,125 --> 00:27:51,895 different sources of information that you do have available that is time consuming. 450 00:27:53,535 --> 00:27:53,755 Got it. 451 00:27:53,965 --> 00:27:54,345 Okay. 452 00:27:54,485 --> 00:27:54,705 So, 453 00:27:54,735 --> 00:27:56,465 it is the unpredictable. 454 00:27:57,900 --> 00:28:00,580 Nature of you don't know how much data you're going to have 455 00:28:00,580 --> 00:28:04,110 to pull in basically Okay. 456 00:28:04,210 --> 00:28:07,920 And then you talked a little bit about the robustness. 457 00:28:08,650 --> 00:28:13,830 And if I understood correctly, you said something about: as people 458 00:28:13,830 --> 00:28:16,750 use this models, they deteriorate? 459 00:28:17,405 --> 00:28:22,875 so any single time people are interacting with these models, what really happens 460 00:28:22,875 --> 00:28:27,915 is we're asking a certain kind of questions over a period of time. 461 00:28:28,725 --> 00:28:34,345 as the model is answering those particular kind of questions, it starts 462 00:28:35,065 --> 00:28:36,855 learning new information as well. 463 00:28:36,925 --> 00:28:40,695 And with a period of time, it starts forgetting other information. 464 00:28:40,975 --> 00:28:44,645 consider this, you basically started in high school, you learned a couple of 465 00:28:45,125 --> 00:28:49,325 subjects, you learned social sciences, you learned physics, you learned chemistry 466 00:28:49,325 --> 00:28:52,915 and everything, but now you're doing software engineering, which is that's 467 00:28:52,955 --> 00:28:54,485 the thing you're doing all day long. 
468 00:28:55,005 --> 00:28:58,885 Now, if I was to ask you a chemistry question, it would take you a very 469 00:28:58,885 --> 00:29:02,185 long time and you'll have to think and you may not be able to answer 470 00:29:02,390 --> 00:29:06,500 accurately on a chemistry question, compared to when you were exposed to 471 00:29:06,550 --> 00:29:08,610 that information on a daily basis. 472 00:29:09,020 --> 00:29:11,720 Now that's the same thing which is happening with large language models 473 00:29:11,720 --> 00:29:15,900 as well, which is based on the kind of interactions they're having with 474 00:29:16,000 --> 00:29:20,250 these models, based on the inputs that they're getting from the users themselves, 475 00:29:20,280 --> 00:29:22,910 they can drift in a particular direction. 476 00:29:24,163 --> 00:29:24,393 You're 477 00:29:24,555 --> 00:29:26,055 completely blowing my mind. 478 00:29:26,075 --> 00:29:30,095 What I thought was happening is that once I've got a model trained, let's 479 00:29:30,095 --> 00:29:33,175 say that I download Llama 3, right? 480 00:29:33,215 --> 00:29:35,735 And I run it on my computer. 481 00:29:36,715 --> 00:29:39,795 I thought that these were static weights that didn't 482 00:29:40,305 --> 00:29:41,945 budge anymore, right? 483 00:29:41,985 --> 00:29:47,785 I was just sending a query through it, getting some kind of output, and my model 484 00:29:47,785 --> 00:29:50,085 itself wasn't changing over time at all. 485 00:29:50,760 --> 00:29:56,280 That's if you're implementing the model as is, but the moment you implement it 486 00:29:56,280 --> 00:30:00,720 in production, it changes because now you're integrating real data sources as 487 00:30:00,720 --> 00:30:06,220 well, but also with Llama models as well, which is if we keep interacting with 488 00:30:06,220 --> 00:30:09,540 the model for a couple of hours of time.
489 00:30:09,810 --> 00:30:15,120 It will look at the queries that were done previously to answer your questions 490 00:30:15,120 --> 00:30:19,900 quickly over that period of time, based on the last interactions, essentially, 491 00:30:20,380 --> 00:30:22,310 there's a particular reason for that. 492 00:30:22,330 --> 00:30:25,180 It's basically the same thing which is happening in our brains, 493 00:30:25,210 --> 00:30:27,970 which is the same neurons are getting fired over and over again. 494 00:30:28,190 --> 00:30:32,990 As some information gets fired over and over again, as some weights are being 495 00:30:32,990 --> 00:30:37,770 called over and over again, those weights do get higher priority eventually. 496 00:30:37,910 --> 00:30:43,300 Okay, so why can't you just have a static set of weights for this 497 00:30:43,300 --> 00:30:47,630 model and not adjust them so that you don't have that problem? 498 00:30:48,360 --> 00:30:49,310 Why is it not enough? 499 00:30:49,845 --> 00:30:54,055 Because then it wouldn't be able to do domain adaptation, which is, it may 500 00:30:54,115 --> 00:30:59,565 work fantastically well for the dataset that you've provided it with, but 501 00:30:59,605 --> 00:31:03,455 if you need to do something on top of that, which is implemented for your 502 00:31:03,455 --> 00:31:06,445 use case, then it can't really do that. 503 00:31:06,445 --> 00:31:11,585 And then again, the big question around: if we want it to behave in a certain 504 00:31:11,595 --> 00:31:15,785 way, we want it to answer questions in a certain way, it wouldn't be able 505 00:31:15,785 --> 00:31:17,655 to have those capabilities either. 506 00:31:17,655 --> 00:31:21,285 So the whole RLHF thing where we teach the model, 507 00:31:21,305 --> 00:31:22,545 this is wrong, this is right. 508 00:31:22,815 --> 00:31:24,155 That doesn't really happen. 509 00:31:24,915 --> 00:31:27,205 So there's essentially no learning happening.
510 00:31:27,205 --> 00:31:29,565 So the performance is static. 511 00:31:29,585 --> 00:31:33,965 It could be bad and it will deteriorate over time just because 512 00:31:33,995 --> 00:31:38,275 you know that this model wouldn't be able to generalize further for me. 513 00:31:38,485 --> 00:31:41,820 So it doesn't generalize with you as a person. 514 00:31:42,425 --> 00:31:46,155 I'm asking because I thought that you would just update that model with new 515 00:31:46,155 --> 00:31:51,195 data and have a fresh fine-tuned or whatever updated version here and there. 516 00:31:51,755 --> 00:31:53,855 And you would just replace it. 517 00:31:54,355 --> 00:31:58,890 But if you're telling me that this is how most people run these models, then 518 00:31:58,890 --> 00:32:00,885 I understand why this is so scary. 519 00:32:00,925 --> 00:32:05,420 Not only do you have this model, but also people interacting with it, 520 00:32:05,690 --> 00:32:08,980 they can break it, they can find a new way of going around or 521 00:32:09,005 --> 00:32:10,515 hacking the model as well. 522 00:32:10,595 --> 00:32:11,035 Yes. 523 00:32:11,300 --> 00:32:15,630 And by design, you want it to be malleable and every conversation it has with 524 00:32:15,630 --> 00:32:17,700 someone is actually changing the model. 525 00:32:18,585 --> 00:32:20,015 That's like triple scary. 526 00:32:20,940 --> 00:32:24,740 That's essentially why you need a new framework or why you need a new 527 00:32:25,000 --> 00:32:31,835 field. That was the core inspiration for me, as in: okay, this field has gotten a 528 00:32:31,835 --> 00:32:34,425 little bit harder than it used to be. 529 00:32:35,750 --> 00:32:36,120 Okay. 530 00:32:36,120 --> 00:32:42,100 So not to ask you for spoilers or anything, but what can you do about that? 531 00:32:42,240 --> 00:32:45,690 What's your book going to introduce to make this stuff better?
532 00:32:46,323 --> 00:32:52,063 I don't think I can make the stuff better, but, if you can measure 533 00:32:52,063 --> 00:32:54,863 something, then you can improve it. 534 00:32:55,043 --> 00:32:57,573 Or you can see if something works, 535 00:32:57,993 --> 00:33:00,263 or if what's happening is an outlier as well. 536 00:33:00,623 --> 00:33:04,833 So what my book really does is give you ways to measure things, which is 537 00:33:05,163 --> 00:33:09,833 instead of just thinking about security, 'okay, I need to do X, Y, Z, blah, blah, 538 00:33:09,863 --> 00:33:13,743 blah, things', giving you a systematic framework to think about evaluations, 539 00:33:13,743 --> 00:33:18,958 which is, instead of implementing X framework or Y framework, which is let's 540 00:33:18,958 --> 00:33:23,538 say, instead of implementing just ROUGE or BLEU score or anything that comes out 541 00:33:23,558 --> 00:33:27,768 tomorrow in the market, you really need to understand what am I essentially doing? 542 00:33:28,178 --> 00:33:30,228 Why are these scores really helpful? 543 00:33:30,418 --> 00:33:34,558 What are the limitations of these ones, where do they essentially fail? 544 00:33:34,888 --> 00:33:37,318 What are the new things that can be implemented? 545 00:33:37,338 --> 00:33:40,028 What are the properties that those new things need to have? 546 00:33:40,798 --> 00:33:41,948 So I'm more 547 00:33:42,773 --> 00:33:46,953 building the field from that first-principles thing, which is understanding 548 00:33:47,623 --> 00:33:52,763 what do you really need and for a lot of things that I'm introducing in the book, 549 00:33:52,773 --> 00:33:56,023 there isn't really a framework, there isn't really a technology out there. 550 00:33:56,133 --> 00:34:00,193 And a lot of things I say, there can be a software that can be built around it. 551 00:34:00,453 --> 00:34:01,233 Nobody has. 552 00:34:02,628 --> 00:34:03,178 Okay.
553 00:34:03,458 --> 00:34:05,328 So that sounds like a good first step. 554 00:34:06,708 --> 00:34:16,613 Can I ask you for like a nutshell version of what a life cycle of an LLM, like 555 00:34:16,613 --> 00:34:21,893 a modern one that you would see in production somewhere right now, typically 556 00:34:21,893 --> 00:34:25,553 looks like, because I'm just realizing I have holes in my understanding. 557 00:34:25,553 --> 00:34:27,723 It just blew my mind about the context drift. 558 00:34:28,563 --> 00:34:33,503 So can you walk me through what happens from the moment that a company 559 00:34:33,503 --> 00:34:38,013 decides, 'okay, we need a model to do this because we really want our 560 00:34:38,033 --> 00:34:44,893 customers to talk to something online', to how you add the domain knowledge to it. 561 00:34:44,913 --> 00:34:48,483 How do you evaluate it and how you integrate and then deploy 562 00:34:48,483 --> 00:34:49,773 and monitor the whole thing? 563 00:34:50,713 --> 00:34:55,393 Let me be very precise in saying this, which is the first step for anybody 564 00:34:55,423 --> 00:34:59,843 to implement these models is use a toy model or use something which already 565 00:35:00,023 --> 00:35:07,423 exists and implement it as is and build evaluation metrics around your problem. 566 00:35:08,213 --> 00:35:13,123 So instead of trying to fine-tune your model, or instead of giving it new 567 00:35:13,153 --> 00:35:15,233 data, just implement the model as is. 568 00:35:15,623 --> 00:35:20,373 Use ChatGPT or something, and build evaluation metrics, which is what 569 00:35:20,433 --> 00:35:22,243 was I trying to measure around it? 570 00:35:22,993 --> 00:35:25,543 How is the model performing on these kind of tasks? 571 00:35:25,553 --> 00:35:27,823 So breaking those things down is the first step.
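That first step, running an existing model as-is and measuring it against your own problem, might be sketched in Python like this. Everything here is an assumption made for illustration: `evaluate_baseline`, the `must_mention` keyword check and the test-case layout are hypothetical, and `model` is any callable that turns a prompt into text (in practice a thin wrapper around a hosted model).

```python
from typing import Callable, Dict, List, Tuple


def evaluate_baseline(
    model: Callable[[str], str],
    cases: List[Dict],
) -> Tuple[float, List[Dict]]:
    """Run an unmodified model over hand-written test cases and report a pass rate.

    Each case is a dict with a "prompt" and a "must_mention" list of keywords
    that an acceptable answer has to contain (a crude, hypothetical criterion).
    """
    results = []
    for case in cases:
        answer = model(case["prompt"])
        passed = all(k.lower() in answer.lower() for k in case["must_mention"])
        results.append({"prompt": case["prompt"], "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return pass_rate, results
```

The failed cases are the useful output: they show which questions the off-the-shelf model cannot answer, which is what tells you what extra data or adaptation is worth paying for later.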
572 00:35:29,123 --> 00:35:33,533 Then it gets a little bit more intricate than that, which is once you realize 573 00:35:33,553 --> 00:35:39,148 these are the holes, or this is the data that I needed for the model to 574 00:35:39,158 --> 00:35:42,328 be able to answer, which is now I need the model to be able to answer 575 00:35:42,328 --> 00:35:47,198 questions about my company particularly, or about my product specifically. 576 00:35:47,518 --> 00:35:50,658 Now you're going into data engineering, which is now you're 577 00:35:50,658 --> 00:35:54,943 thinking, what is the additional data I can provide to the model itself? 578 00:35:55,983 --> 00:36:00,313 And once you've done that then there's the whole pipeline of data engineering 579 00:36:00,323 --> 00:36:04,913 that goes in which is now you need to think about how do you manage the noise? 580 00:36:04,953 --> 00:36:06,503 How do you augment the data? 581 00:36:06,503 --> 00:36:08,253 How are you tokenizing the data? 582 00:36:08,583 --> 00:36:12,773 How are you making sure that there's no bias or toxicity in the data as well? 583 00:36:12,823 --> 00:36:18,693 And how do you make sure that the model doesn't really memorize something? 584 00:36:18,963 --> 00:36:24,193 So the way models memorize information is because some of the information 585 00:36:24,773 --> 00:36:26,463 occurs quite a lot of times. 586 00:36:26,493 --> 00:36:29,733 So fixing that is essentially called data deduplication. 587 00:36:29,983 --> 00:36:33,673 So making sure that there's no duplication in the data itself. 588 00:36:33,823 --> 00:36:37,053 How do you sanitize the data, which is making sure that there's no user 589 00:36:37,193 --> 00:36:41,623 information or any private information left in the data that you're 590 00:36:41,623 --> 00:36:43,213 providing to the model itself. 591 00:36:43,613 --> 00:36:49,673 So then once you have a set of evaluation metrics, then the next step.
592 00:36:49,973 --> 00:36:53,973 Which is the next stage for the company to go in, is implement the 593 00:36:53,973 --> 00:36:58,903 data engineering pipeline, then use the same model on it and then evaluation. 594 00:37:00,523 --> 00:37:05,148 Once you've done evaluation on that one, then the next step is letting 595 00:37:05,178 --> 00:37:06,628 people interact with the model. 596 00:37:06,668 --> 00:37:11,318 But before that, set up orchestration deployment monitoring solutions on it 597 00:37:11,638 --> 00:37:17,048 so that if, you can measure what are the interactions people are having 598 00:37:17,058 --> 00:37:18,948 with these models, essentially. 599 00:37:19,173 --> 00:37:19,703 As well. 600 00:37:20,593 --> 00:37:23,943 So if something goes wrong on security, you can catch things 601 00:37:23,943 --> 00:37:25,483 quickly and turn things off, right? 602 00:37:26,053 --> 00:37:29,943 Or if there are a lot of people who are interacting with the model, you 603 00:37:29,943 --> 00:37:34,683 can serve next time, okay, I need to allocate X, Y, Z number of resources, 604 00:37:34,943 --> 00:37:37,733 or these are the kind of interactions people are having with the model. 605 00:37:37,733 --> 00:37:42,683 Essentially, once you've gone through stage two, now the stage three, the full 606 00:37:42,733 --> 00:37:46,858 pipeline is essentially you're doing data engineering, then you have, an 607 00:37:47,368 --> 00:37:51,488 LLM router which chooses the best base model or the foundation model for you. 608 00:37:51,498 --> 00:37:54,648 That really depends on the kind of prompt as well. 609 00:37:55,208 --> 00:37:58,628 different prompts can use different kinds of models. 610 00:37:58,678 --> 00:38:03,048 let's say if the person is asking an algorithmic question, then ideally a 611 00:38:03,048 --> 00:38:06,738 model which is trained on mathematical information would be much better. 
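A router of the kind described here could be sketched in Python as a small deterministic function. This is a toy under loud assumptions: the model names, the relative costs, the keyword list and the `max_cost` ceiling are all invented for illustration, and a real semantic router would classify prompts with embeddings or a trained classifier rather than substring matching.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ModelOption:
    name: str                 # hypothetical model identifier
    cost_per_call: float      # hypothetical relative cost
    good_at: FrozenSet[str]   # topics this model is assumed to handle well


MODELS: List[ModelOption] = [
    ModelOption("small-general", 0.1, frozenset({"general"})),
    ModelOption("math-tuned", 0.5, frozenset({"math"})),
    ModelOption("large-general", 1.0, frozenset({"general", "math"})),
]

# Naive stand-in for semantic classification of the prompt.
MATH_HINTS = ("algorithm", "complexity", "prove", "equation", "integral")


def classify(prompt: str) -> str:
    lowered = prompt.lower()
    return "math" if any(hint in lowered for hint in MATH_HINTS) else "general"


def route(prompt: str, max_cost: float) -> ModelOption:
    """Pick the cheapest model that suits the prompt and fits the cost ceiling."""
    topic = classify(prompt)
    affordable = [m for m in MODELS if m.cost_per_call <= max_cost]
    if not affordable:
        raise ValueError("no model fits the cost ceiling")
    suited = [m for m in affordable if topic in m.good_at]
    # Fall back to any affordable model when no specialist fits the budget.
    return min(suited or affordable, key=lambda m: m.cost_per_call)
```

Because the same prompt and the same limits always select the same model, a router like this is easy to test and audit, at the price of a fairly blunt notion of what a prompt is "about".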
612 00:38:07,198 --> 00:38:11,858 And plus the other question is you don't always need to use the expensive model. 613 00:38:11,858 --> 00:38:17,558 Sometimes you can get away with providing more generalized information, 614 00:38:17,558 --> 00:38:20,948 which is if the person is asking a very simplistic question, you don't need 615 00:38:20,948 --> 00:38:25,928 to use ChatGPT, you want to have a system that automatically sees that 616 00:38:25,968 --> 00:38:30,448 prompt and says, I think for this one, I can inference on Llama 2 instead, 617 00:38:30,678 --> 00:38:34,118 I think for this kind of prompt, I can inference on so and so model. 618 00:38:35,278 --> 00:38:40,168 One step next after that, once you've done all of that, is doing 619 00:38:40,198 --> 00:38:41,918 domain adaptation on the model. 620 00:38:41,978 --> 00:38:46,088 Now that can be done in a lot of ways, you can implement prompt engineering 621 00:38:46,108 --> 00:38:52,808 pipelines using frameworks like DSPy, or you can implement RAG pipelines to 622 00:38:52,858 --> 00:38:57,763 introduce more information to the model, without having to retrain the model, or 623 00:38:57,763 --> 00:39:01,653 you can do fine-tuning and you essentially do fine-tuning when you want to change 624 00:39:01,663 --> 00:39:07,023 the behavior of the model or how it essentially provides information for you. 625 00:39:07,253 --> 00:39:10,643 So prompt engineering is more like us putting a wrapper. 626 00:39:11,123 --> 00:39:16,453 It's very similar to if we say we want the input to be shaped like this. 627 00:39:16,453 --> 00:39:20,213 When you're doing structural changes to the input, that's when 628 00:39:20,213 --> 00:39:21,443 you're doing prompt engineering.
629 00:39:22,053 --> 00:39:26,563 But the moment you say: 'I want the input structure to be changed right 630 00:39:26,603 --> 00:39:31,073 now, I want the model to process this information in a different way', then 631 00:39:31,143 --> 00:39:32,723 you're essentially doing fine-tuning. 632 00:39:33,593 --> 00:39:38,673 So data engineering, then implementing an LLM router, then doing some sort 633 00:39:38,673 --> 00:39:43,913 of domain adaptation on it, then evaluation and orchestration as well. 634 00:39:43,973 --> 00:39:46,343 Orchestration is more like the piece of how do you tie 635 00:39:46,343 --> 00:39:47,693 different software components together. 636 00:39:47,703 --> 00:39:49,963 So how are you doing CI/CD on it? 637 00:39:50,433 --> 00:39:54,483 You're optimizing for things over there to be able to now 638 00:39:54,543 --> 00:39:56,723 reduce the inference latency. 639 00:39:56,723 --> 00:40:02,653 Then the next step is doing security and reliability engineering, which I've 640 00:40:02,663 --> 00:40:06,553 not really seen a lot of companies do, but the companies that are working 641 00:40:06,553 --> 00:40:10,473 in banking have already started working very heavily on it because they had the 642 00:40:10,473 --> 00:40:15,803 existing infrastructure where they were doing extensive security and reliability 643 00:40:15,803 --> 00:40:19,403 engineering, then a few other ones were doing it, which is the big tech 644 00:40:19,413 --> 00:40:23,183 companies, but the more generalized normal companies weren't doing it. 645 00:40:23,653 --> 00:40:26,803 But now that has become one of the core stages. 646 00:40:27,163 --> 00:40:30,333 The next step is basically doing deployment and monitoring. 647 00:40:30,703 --> 00:40:34,733 Once you've done all of that, and the deployment and monitoring is done, now 648 00:40:34,783 --> 00:40:37,103 the end user is interacting with the model.
649 00:40:37,633 --> 00:40:40,993 So when the end user is interacting with the model, you're learning 650 00:40:41,033 --> 00:40:44,993 things because you've implemented monitoring solutions on it. 651 00:40:45,233 --> 00:40:47,933 Now you're making additional changes on security as well. 652 00:40:48,303 --> 00:40:53,403 You're learning from the data, using the customer interaction data, giving 653 00:40:53,403 --> 00:40:55,243 it back to the database as well. 654 00:40:55,323 --> 00:40:59,230 So there's that step which gets associated, so there's a loop of 655 00:40:59,230 --> 00:41:02,983 data flywheel that goes back into the engineering stage itself. 656 00:41:03,983 --> 00:41:04,843 A few questions. 657 00:41:05,593 --> 00:41:06,973 The router... 658 00:41:07,643 --> 00:41:13,613 it's an interesting one, because in my mind, if I talk to different models with 659 00:41:13,613 --> 00:41:18,503 my every query or my every follow up, I might get different behaviors, right? 660 00:41:19,178 --> 00:41:20,378 Isn't that the problem? 661 00:41:20,598 --> 00:41:26,038 If you sometimes route to a cheap funky model, because you want to 662 00:41:26,038 --> 00:41:29,408 save some money and sometimes it goes to ChatGPT, the quality of my 663 00:41:29,408 --> 00:41:31,448 responses might vary significantly. 664 00:41:32,268 --> 00:41:36,648 Is there a good way to work around that or it's just how it is? 665 00:41:37,255 --> 00:41:41,375 I think as long as your infrastructure is monitoring, which is this output came 666 00:41:41,385 --> 00:41:45,965 from this model, it's actually ideal because then you can compare 667 00:41:45,965 --> 00:41:50,145 the performance of different models on the kind of queries as well and pick 668 00:41:50,155 --> 00:41:56,275 which prompts or even pick which models should you be using in vain to subside 669 00:41:56,585 --> 00:42:00,940 the use of a particular model within your router solution itself as well.
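Recording which model produced which output, and then comparing models per kind of query, needs very little machinery. The Python sketch below assumes each logged interaction already carries a `model` name, a `query_type` label and a numeric quality `score`; those field names and the scoring itself are hypothetical and would come from whatever evaluation the team has in place.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def summarize_by_model(records: List[Dict]) -> Dict[Tuple[str, str], float]:
    """Average the quality score for each (model, query_type) pair."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["model"], record["query_type"])].append(record["score"])
    return {key: mean(scores) for key, scores in grouped.items()}


def best_model_for(records: List[Dict], query_type: str) -> str:
    """Return the model with the highest average score on one kind of query."""
    summary = summarize_by_model(records)
    candidates = {m: s for (m, q), s in summary.items() if q == query_type}
    if not candidates:
        raise ValueError(f"no records for query type {query_type!r}")
    return max(candidates, key=candidates.get)
```

With a table like this, a routing rule can be revisited on evidence: if the cheaper model scores about as well as the expensive one on a given kind of query, that traffic can be moved over.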
670 00:42:02,050 --> 00:42:04,980 And what does a router like this actually look like? 671 00:42:05,010 --> 00:42:07,260 Is that a deterministic algorithm? 672 00:42:07,260 --> 00:42:08,430 Or is it another model? 673 00:42:08,430 --> 00:42:10,240 Is it like turtles all the way down? 674 00:42:10,290 --> 00:42:14,040 I've come across, I think probably two companies that 675 00:42:14,070 --> 00:42:16,310 have built, a semantic router. 676 00:42:16,410 --> 00:42:20,680 they're looking at the semantics of the prompt itself and, based on resource 677 00:42:20,680 --> 00:42:24,430 limits set by the client itself, which could be like the company itself. 678 00:42:24,780 --> 00:42:27,500 They're picking up a particular model at that point in time. 679 00:42:28,590 --> 00:42:31,000 So those are very deterministic solutions. 680 00:42:31,450 --> 00:42:36,010 I've not really seen non deterministic solutions put into play where you could 681 00:42:36,040 --> 00:42:40,480 actually use a large language model as a routing solution, Or like using 682 00:42:40,480 --> 00:42:44,100 a decision tree for, as an LLM router. 683 00:42:44,100 --> 00:42:46,910 So I've not really seen those kind of implementations yet. 684 00:42:49,385 --> 00:42:50,601 What about the evaluation? 685 00:42:50,611 --> 00:42:54,791 So that sounds straightforward, but in practice, how do you 686 00:42:54,791 --> 00:42:57,561 evaluate freestyle text? 687 00:42:57,751 --> 00:43:01,641 do you get people to look at the responses and compare 'oh, I like this one better. 688 00:43:01,641 --> 00:43:03,021 like the chatbot arena'. 689 00:43:03,421 --> 00:43:08,801 Or are there more, scientific ways of comparing, different models. 690 00:43:09,720 --> 00:43:12,760 there's more scientific way of comparing different models because 691 00:43:13,130 --> 00:43:14,740 you're looking at so many things. 
692 00:43:14,760 --> 00:43:19,070 You're looking at if the model is engaging, you're looking at if the model 693 00:43:19,330 --> 00:43:22,170 is aware about that particular domain. 694 00:43:22,180 --> 00:43:24,550 Is the model really good at question answering? 695 00:43:24,600 --> 00:43:30,450 Is the model good at recognizing when it's giving a response that's off as well? 696 00:43:30,770 --> 00:43:35,150 So often picking the right model is a little bit harder for that reason. 697 00:43:35,150 --> 00:43:38,660 But essentially when you're building an evaluation pipeline 698 00:43:38,780 --> 00:43:42,610 for yourself, think about what is your model essentially doing? 699 00:43:42,650 --> 00:43:46,500 Are you building a model that is heavily focused on retrieval only? 700 00:43:46,920 --> 00:43:50,700 Or are you building a model that's very heavily focused on generation only? 701 00:43:51,130 --> 00:43:56,580 Both problems can be broken down, which is retrieval needs to have its own metrics. 702 00:43:56,920 --> 00:44:01,180 These can be context recall, context precision, basic 703 00:44:01,180 --> 00:44:02,810 recall, basic precision as well. 704 00:44:03,100 --> 00:44:09,335 And for the more generative use cases, you need to have different metrics as well. 705 00:44:09,585 --> 00:44:15,675 So the metrics for the generative solutions or to test the generative 706 00:44:15,845 --> 00:44:20,365 performance you have n-gram metrics, which are the BLEU scores, the ROUGE scores 707 00:44:20,365 --> 00:44:25,805 that people used to implement in like the conventional NLP models, then you have 708 00:44:25,955 --> 00:44:32,075 SemScore, which is basically looking at the semantic similarity of the model with 709 00:44:32,075 --> 00:44:33,895 a base transformer model, essentially. 710 00:44:34,155 --> 00:44:36,675 So BERTScore, SemScore, MoverScore. 711 00:44:36,735 --> 00:44:38,875 These are essentially called similarity scores.
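The two simplest metric families mentioned here can be written down in a few lines of Python. What follows is a simplified illustration, not the official BLEU or ROUGE definitions: `precision_recall` is plain set-based precision and recall over retrieved document IDs, and `ngram_overlap` computes clipped n-gram precision and recall on whitespace tokens, without BLEU's brevity penalty or ROUGE's stemming. Both function names are made up for this sketch.

```python
from collections import Counter
from typing import List, Tuple


def precision_recall(retrieved: List[str], relevant: List[str]) -> Tuple[float, float]:
    """Basic retrieval metrics over document IDs."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def _ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(candidate: str, reference: str, n: int = 1) -> Tuple[float, float]:
    """Clipped n-gram precision (BLEU-like) and recall (ROUGE-like)."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & takes the minimum count per n-gram
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall
```

For instance, the candidate "the cat sat" against the reference "the cat sat on the mat" gets unigram precision 1.0 and recall 0.5: everything it says is in the reference, but it covers only half of it. That gap is one reason these surface metrics are usually read alongside the similarity scores rather than on their own.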
712 00:44:38,876 --> 00:44:42,275 And then there is LLM-based scoring as well. 713 00:44:42,605 --> 00:44:44,585 So there are three different categories. 714 00:44:44,645 --> 00:44:44,795 If 715 00:44:45,175 --> 00:44:46,385 somebody wants to learn, 716 00:44:46,385 --> 00:44:51,305 I'll leave it for the users, which is I have a talk on this thing, particularly, 717 00:44:51,755 --> 00:44:54,425 for I did an O'Reilly Superstream. 718 00:44:54,535 --> 00:44:58,035 So that will give you like a really good framework to think about this, 719 00:44:58,630 --> 00:45:02,540 which is how to do evaluation, how to think about it super systematically, 720 00:45:02,550 --> 00:45:06,850 where this is the actual number that I'm supposed to get, which is if it's above 0.5, 721 00:45:06,890 --> 00:45:09,650 if this number is above 0.7, 722 00:45:09,651 --> 00:45:10,860 then I need to optimize. 723 00:45:11,791 --> 00:45:12,171 Got it. 724 00:45:12,831 --> 00:45:15,751 Abi, do you think we could make it a little bit more concrete and 725 00:45:15,761 --> 00:45:18,051 go through this with some examples? 726 00:45:18,091 --> 00:45:21,101 Imagine that you own Reddit, right? 727 00:45:21,271 --> 00:45:24,791 You've got all these people talking about all the different topics and 728 00:45:24,841 --> 00:45:27,231 they tend to be useful in some domains. 729 00:45:27,951 --> 00:45:32,261 And let's say that you wanted to build a model that you can chat with, 730 00:45:32,271 --> 00:45:36,981 that basically knows all the things that people at Reddit talk about. 731 00:45:37,541 --> 00:45:41,521 And if you wanted to build like a proof of concept to get a model 732 00:45:41,521 --> 00:45:45,721 that can answer queries about that, how would you go about that? 733 00:45:48,000 --> 00:45:54,280 So very simple would be use a similar model, which is, now we're looking at 734 00:45:54,950 --> 00:45:58,330 Reddit conversations specifically, right?
735 00:45:58,550 --> 00:46:01,730 So what that essentially is, is basically some 736 00:46:02,335 --> 00:46:06,545 internet website that has a lot of information, which has a lot of textual 737 00:46:06,545 --> 00:46:11,515 information, by people on a lot of different topics and a lot of different 738 00:46:11,565 --> 00:46:14,945 languages as well, though I'm not entirely sure about the language part. 739 00:46:15,435 --> 00:46:16,635 So what I would do is 740 00:46:17,080 --> 00:46:22,300 look at Hugging Face for models that are trained on conversational data, 741 00:46:22,350 --> 00:46:26,650 or sub stack kind of data where people are answering questions. 742 00:46:27,120 --> 00:46:30,630 So ideally a model that is trained on that kind of information 743 00:46:30,720 --> 00:46:32,210 would be my base model. 744 00:46:32,720 --> 00:46:37,020 Then the next step would be scraping data from Reddit, essentially. 745 00:46:37,420 --> 00:46:40,450 So that would be the next step, which is building my own dataset 746 00:46:40,510 --> 00:46:44,760 pipeline from Reddit, essentially, and doing fine-tuning with that. 747 00:46:45,250 --> 00:46:49,530 So that would be the first two steps, and then the whole 748 00:46:49,570 --> 00:46:53,620 evaluation, security, and all of those things will always be consistent 749 00:46:53,670 --> 00:46:55,280 with all of the models, essentially. 750 00:46:55,911 --> 00:46:56,261 Got it. 751 00:46:56,271 --> 00:47:01,241 So in theory you could take a Llama and then fine-tune it on all of 752 00:47:01,261 --> 00:47:06,951 Reddit's data and hopefully it would give you something to start with, and 753 00:47:06,951 --> 00:47:11,131 then you would have to worry about evaluating it and all the other things. 754 00:47:13,501 --> 00:47:13,731 All right.
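The dataset-pipeline step described above could look roughly like the following Python sketch. It assumes the threads have already been collected into plain dictionaries; the field names (`title`, `body`, `comments`, `score`), the function names, and the prompt/response record layout are all hypothetical choices for illustration, not Reddit's API schema or the format required by any particular fine-tuning tool.

```python
import json


def thread_to_example(thread, min_score=1):
    """Pair a post with its highest-scoring comment, or return None."""
    comments = [
        c for c in thread.get("comments", [])
        if c["score"] >= min_score and c["body"].strip()
    ]
    if not comments:
        return None  # nothing usable to learn from in this thread
    best = max(comments, key=lambda c: c["score"])
    title = thread["title"].strip()
    body = thread.get("body", "").strip()
    prompt = f"{title}\n\n{body}".strip()
    return {"prompt": prompt, "response": best["body"].strip()}


def build_dataset(threads, min_score=1):
    """Convert threads into prompt/response pairs, skipping unusable ones."""
    examples = (thread_to_example(t, min_score) for t in threads)
    return [e for e in examples if e is not None]


def to_jsonl(examples):
    """Serialize examples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(e) for e in examples)


if __name__ == "__main__":
    sample = [{
        "title": "How do I learn Rust?",
        "body": "",
        "comments": [{"body": "Read the book.", "score": 10}],
    }]
    print(to_jsonl(build_dataset(sample)))
```

Picking only the top-scoring comment is one simple filtering choice; a real pipeline would also need deduplication, removal of personal data, and a held-out split for the evaluation step.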
755 00:47:13,741 --> 00:47:21,856 So I think that probably gives our listeners tonight enough to eagerly await 756 00:47:21,856 --> 00:47:27,146 your book now and wonder when they're going to be able to actually read the 757 00:47:27,146 --> 00:47:29,626 whole thing or maybe buy it off Amazon. 758 00:47:30,096 --> 00:47:33,066 Is there an ETA at the moment that we can give them? 759 00:47:33,871 --> 00:47:38,141 The early release would happen sometime next month, which is, we're already 760 00:47:38,331 --> 00:47:43,071 in May, so it should happen sometime in June. The whole book is supposed to 761 00:47:43,081 --> 00:47:44,880 be available by the end of the year. 762 00:47:45,691 --> 00:47:46,191 Awesome. 763 00:47:47,271 --> 00:47:47,611 All right. 764 00:47:47,701 --> 00:47:51,681 And before I let you off the hook, I think you might have seen it coming. 765 00:47:51,681 --> 00:47:55,601 I'm going to ask you for some predictions, obviously, with all 766 00:47:55,601 --> 00:47:57,971 the caveats, how difficult it is. 767 00:47:58,021 --> 00:48:02,401 And previous performance is not a guarantee of future gains. 768 00:48:02,501 --> 00:48:04,191 Where do you see all of this going? 769 00:48:04,801 --> 00:48:11,331 I see more people using generative models compared to the number of people 770 00:48:11,331 --> 00:48:15,531 who were using them before. One of the big shifts which is going to 771 00:48:15,531 --> 00:48:19,231 happen is the productivity that people are getting from these models. 772 00:48:19,241 --> 00:48:20,791 So it could be developers. 773 00:48:20,791 --> 00:48:22,601 It could be people who are doing copywriting. 774 00:48:23,131 --> 00:48:24,961 So companies are getting smaller. 775 00:48:25,576 --> 00:48:27,456 And they will continue to get smaller.
776 00:48:27,806 --> 00:48:32,556 The number of companies that were working with external people or external 777 00:48:32,786 --> 00:48:36,546 audits is going to get smaller as well. I think going into the future, 778 00:48:36,596 --> 00:48:40,306 we would be seeing that shift of, you could say, a creator economy. 779 00:48:40,306 --> 00:48:43,856 I'm not entirely sure what would be the right word in the specific scenarios, 780 00:48:43,876 --> 00:48:45,756 which is every person is a company. 781 00:48:46,046 --> 00:48:51,635 So now instead of every person being a company, a company being 500, 800, 782 00:48:51,736 --> 00:48:56,796 and 55,000 employees, they will get certainly much, much, much smaller 783 00:48:57,436 --> 00:49:01,886 because one person is going to be able to do a lot, and there's a lot of stuff 784 00:49:01,886 --> 00:49:03,936 that would be automated essentially. 785 00:49:04,416 --> 00:49:10,216 Along the lines of what Altman was saying about how he's expecting a 786 00:49:10,276 --> 00:49:16,166 unicorn single-person company very soon because of the increased productivity? 787 00:49:18,815 --> 00:49:23,625 I think I would agree with that and, very importantly, this is something which 788 00:49:23,625 --> 00:49:28,975 I've mentioned in chapter one of my book as well, which is how big 789 00:49:29,075 --> 00:49:33,085 is the shift essentially. There were a couple of surveys that were done. 790 00:49:33,545 --> 00:49:38,415 And it wouldn't be wrong to say that within the next five years, essentially 791 00:49:38,415 --> 00:49:44,351 28% of jobs, at least in some professions, would be eliminated, and they may be 792 00:49:44,391 --> 00:49:49,001 eliminated in the sense of those people become unemployed for a period 793 00:49:49,001 --> 00:49:54,121 of time because now three people are able to do five people's tasks.
794 00:49:55,851 --> 00:49:59,621 Because again, they've gained more productivity. I don't think people will 795 00:49:59,841 --> 00:50:03,731 be unemployed for long; there will be more and more companies essentially. 796 00:50:03,733 --> 00:50:07,763 Yeah, I think the one thing that I always wonder about is, I remember as a kid 797 00:50:07,793 --> 00:50:12,703 reading all these predictions about how all these increases in productivity will 798 00:50:12,703 --> 00:50:18,983 mean that people work less, like a couple of days a week, and 799 00:50:19,013 --> 00:50:20,883 they will just have all this free time. 800 00:50:20,883 --> 00:50:25,833 And people are worrying about how that's going to affect an average 801 00:50:25,833 --> 00:50:27,753 person having so much free time. 802 00:50:28,416 --> 00:50:32,586 That's a question one of my friends asked as well, which is what do you think 803 00:50:32,626 --> 00:50:35,816 people would do when full automation really happens? And I don't think 804 00:50:35,816 --> 00:50:40,606 there will ever be full automation. There need to be monitoring systems that are 805 00:50:40,616 --> 00:50:46,336 always put into play. Monitoring systems can be automated, they still need to 806 00:50:46,346 --> 00:50:50,786 be fine-tuned, but all of that is going to be still done by humans. 807 00:50:50,796 --> 00:50:54,736 So you could say humans are transitioning from being 808 00:50:54,746 --> 00:50:56,936 workers to being managers. 809 00:50:58,081 --> 00:50:58,431 Yeah. 810 00:50:59,411 --> 00:51:03,241 I'm still working probably a similar amount of time, but in 811 00:51:03,241 --> 00:51:05,986 a slightly more productive way. 812 00:51:06,486 --> 00:51:06,776 Yeah.
813 00:51:06,776 --> 00:51:11,636 I think we had this concept of silent promotion that we were talking about 814 00:51:11,656 --> 00:51:16,606 on one of the previous episodes, that overnight, everybody who works with code 815 00:51:16,636 --> 00:51:22,006 basically went from single contributor to effectively engineering manager with, 816 00:51:22,336 --> 00:51:27,826 like, junior-equivalent software engineers at their disposal with tools like 817 00:51:27,826 --> 00:51:30,596 Copilot and just chatting to ChatGPT. 818 00:51:31,286 --> 00:51:36,456 I have friends who are VCs who are now trying to say, instead of trying 819 00:51:36,486 --> 00:51:42,066 to train an associate right now, to teach them how to look for deals 820 00:51:42,066 --> 00:51:45,756 or how to compile information from different datasets, which could be 821 00:51:45,796 --> 00:51:47,276 GitHub, which could be Crunchbase, 822 00:51:47,686 --> 00:51:49,146 why not use a model instead? 823 00:51:49,146 --> 00:51:51,146 And they're instead spending 824 00:51:51,531 --> 00:51:57,321 50 to 60K on ChatGPT as compared to hiring a person for that essential task. 825 00:51:57,801 --> 00:52:01,221 So people need to be more autonomously driven, 826 00:52:01,321 --> 00:52:06,191 and the people who aren't, I think they may have a problem very soon. 827 00:52:08,396 --> 00:52:11,316 That billboard, 'Still hiring humans'. 828 00:52:11,516 --> 00:52:12,576 Have you seen that one? 829 00:52:14,526 --> 00:52:14,766 Yeah. 830 00:52:14,766 --> 00:52:17,856 One of those companies, what is it called? 831 00:52:17,896 --> 00:52:18,496 The one that... 832 00:52:18,796 --> 00:52:24,006 There's the telephone AI where you can call a number. 833 00:52:24,406 --> 00:52:28,616 Effectively, the billboard was this massive phone number to call and 834 00:52:28,656 --> 00:52:32,816 asking whether you're still hiring humans, and people are calling that.
835 00:52:32,816 --> 00:52:36,506 And apparently it can handle a million concurrent phone calls or 836 00:52:36,516 --> 00:52:37,936 some ridiculous stuff like that. 837 00:52:37,956 --> 00:52:44,386 And it's convincingly replacing the receptionist or booking 838 00:52:44,486 --> 00:52:45,756 conversations that you had before. 839 00:52:46,276 --> 00:52:50,356 I remember that demo from Google years ago, I'm 840 00:52:50,366 --> 00:52:53,976 forgetting what it was called, like Duo or something, when they had a demo, it 841 00:52:53,996 --> 00:52:58,826 was making a reservation, and then it never really worked as well as the demo. 842 00:52:59,436 --> 00:53:03,586 So we're effectively reaching that moment now, just 843 00:53:03,726 --> 00:53:04,996 with different companies doing it. 844 00:53:05,768 --> 00:53:10,708 Maybe this is a realization I do have constantly because I am ADHD. 845 00:53:11,068 --> 00:53:14,128 But we're interacting with so much software or so much 846 00:53:14,128 --> 00:53:15,838 information, which is isolated. 847 00:53:15,838 --> 00:53:18,898 And what we're essentially doing is trying to remember one thing 848 00:53:18,968 --> 00:53:20,208 and implement another thing. 849 00:53:20,498 --> 00:53:24,988 So we need systems that can interact with all of these systems and 850 00:53:25,068 --> 00:53:27,488 be more like assistants for us. 851 00:53:27,488 --> 00:53:30,838 And that's where a lot of people are trying to build agents as well. 852 00:53:31,358 --> 00:53:36,958 So from isolated software, we're going towards a system where our 853 00:53:37,238 --> 00:53:41,828 software is getting linked, as in, it's becoming an ecosystem as well 854 00:53:42,398 --> 00:53:46,228 that is able to communicate and anticipate our requirements. 855 00:53:46,618 --> 00:53:50,298 But the downsides of that are still to be predicted, which 856 00:53:50,298 --> 00:53:53,058 is, what happens if it goes off?
857 00:53:53,518 --> 00:53:56,378 What happens if somebody hacks into the system? 858 00:53:56,418 --> 00:53:59,763 The risk of deploying such systems is really high. 859 00:54:00,013 --> 00:54:04,283 So those are all technical problems that would need to be solved. 860 00:54:04,283 --> 00:54:09,563 For that particular reason, I think the field of safety, which is 861 00:54:09,563 --> 00:54:12,383 people who are working in LLMSecOps, 862 00:54:12,728 --> 00:54:16,918 and the field of evaluation, which is people who are doing evaluation and 863 00:54:16,938 --> 00:54:21,818 monitoring, are going to be some of the most important jobs as compared 864 00:54:21,818 --> 00:54:25,588 to people doing fine-tuning and all of those things. Those will continue 865 00:54:25,708 --> 00:54:31,188 to be important, but the models that we get from other companies will 866 00:54:31,458 --> 00:54:37,688 eventually, with time, become good enough as well, where we may not need 867 00:54:37,718 --> 00:54:39,588 to do a lot of those things manually. 868 00:54:39,638 --> 00:54:43,588 A lot of the work of a machine learning engineer or data scientist 869 00:54:43,738 --> 00:54:45,188 will get automated as well. 870 00:54:46,183 --> 00:54:50,623 Do you worry about other things that might go wrong with all of this? 871 00:54:50,673 --> 00:54:53,863 I don't think that many people are actually worried about 872 00:54:53,923 --> 00:54:56,513 Skynet materializing tomorrow. 873 00:54:57,223 --> 00:55:03,573 But are there things that you're realistically concerned about in a short, 874 00:55:03,573 --> 00:55:05,993 maybe two to five years, time horizon?
875 00:55:07,418 --> 00:55:13,458 Yeah, one of the things that does concern me is how these models are 876 00:55:13,488 --> 00:55:18,988 being used by kids, and the kind of risk that generative AI does 877 00:55:19,008 --> 00:55:23,538 pose to elderly people who don't really realize the difference 878 00:55:23,968 --> 00:55:27,708 between something being generated versus something being true, or should they 879 00:55:27,758 --> 00:55:30,288 rely on that to some extent or not? 880 00:55:30,768 --> 00:55:36,508 I think the whole spamming industry got so big, or the whole stealing 881 00:55:36,568 --> 00:55:41,298 people's credit card information got so big, precisely because people need 882 00:55:41,298 --> 00:55:44,388 to stay in touch with the technology, and the people who are more vulnerable 883 00:55:45,923 --> 00:55:49,663 are getting attacked, and they're the people who are most at risk. 884 00:55:49,903 --> 00:55:54,063 So what really concerns me is not people who are data scientists or machine 885 00:55:54,063 --> 00:55:58,243 learning engineers and their jobs going away. The people I'm concerned most about right 886 00:55:58,243 --> 00:56:03,453 now are the people who are vulnerable. So kids and elderly people who will give a 887 00:56:03,453 --> 00:56:09,113 lot of information to ChatGPT: 'Hey ChatGPT, can you look at my medical details and see 888 00:56:09,123 --> 00:56:11,143 what problem I may be having?' 889 00:56:11,143 --> 00:56:16,143 And my parents are heavily using ChatGPT as well, but they don't really realize a 890 00:56:16,143 --> 00:56:19,563 lot of the information they're giving into the system can be hacked very 891 00:56:19,563 --> 00:56:22,573 easily, and there can be phishing attacks. 892 00:56:22,593 --> 00:56:25,163 There can be all of those attacks as well, 893 00:56:25,183 --> 00:56:25,843 eventually. 894 00:56:27,603 --> 00:56:33,193 Yeah, and I think the scale is what scares me the most about it, right?
895 00:56:33,193 --> 00:56:36,733 The fact that you can do it at a massive scale. 896 00:56:36,743 --> 00:56:41,223 There have always been scammers calling the elderly and scamming 897 00:56:41,243 --> 00:56:42,513 them out of their money. 898 00:56:43,163 --> 00:56:48,173 But now that you can automate it and you can scale it up, you could conceivably 899 00:56:48,363 --> 00:56:50,063 just make it a massive problem. 900 00:56:51,623 --> 00:56:54,263 And the second bit of that problem is... 901 00:56:54,513 --> 00:56:59,093 when you steal all of that data, you're stealing how the person is interacting, 902 00:56:59,103 --> 00:57:03,773 because large language models are so good at impersonating or trying to 903 00:57:04,213 --> 00:57:08,733 learn how a person structures their question or answers their question. 904 00:57:08,763 --> 00:57:12,203 And the same is happening with audio speech synthesis as well, which 905 00:57:12,203 --> 00:57:18,098 is the models are getting much better at learning the intonations 906 00:57:18,098 --> 00:57:22,468 or different tonal capabilities of different people as well, and adapting to 907 00:57:22,468 --> 00:57:28,028 them does expose a lot of risk because it becomes so easy to impersonate and 908 00:57:28,098 --> 00:57:31,378 spread misinformation or to be able to 909 00:57:31,838 --> 00:57:38,238 hurt somebody, if hurt is the word or is a consideration in that particular 910 00:57:38,268 --> 00:57:42,788 scenario, which is, it can impersonate anybody and ask for certain information. 911 00:57:42,788 --> 00:57:46,588 It can interact with your child, and there's a lot of information 912 00:57:46,588 --> 00:57:49,278 as well because people are interacting with these models every 913 00:57:49,278 --> 00:57:50,688 single minute of the day as well.
914 00:57:50,918 --> 00:57:54,898 And we'll get more and more with all of the systems, which is 915 00:57:54,968 --> 00:57:59,718 Google is now integrating their AI systems into Google Docs. 916 00:58:00,138 --> 00:58:02,028 So ChatGPT was already there. 917 00:58:02,038 --> 00:58:05,278 Now, Instagram might very soon integrate this. 918 00:58:06,548 --> 00:58:11,838 I don't think there will be a world where we can escape generative models as such, 919 00:58:12,408 --> 00:58:16,578 and the more we have conversations with them, the more they are learning about 920 00:58:16,578 --> 00:58:21,718 our personalities and about everything we're doing on the internet, essentially. 921 00:58:21,726 --> 00:58:21,986 Okay. 922 00:58:21,986 --> 00:58:24,416 I'm going to ask you for one more prediction. 923 00:58:24,836 --> 00:58:26,846 And then I promise I'll let you off the hook. 924 00:58:26,946 --> 00:58:33,576 Today, it's all about OpenAI this, OpenAI that. We also have seen that memo 925 00:58:33,846 --> 00:58:36,966 about Google having no moat a year ago. 926 00:58:37,876 --> 00:58:40,996 You obviously are deep in the industry. 927 00:58:41,386 --> 00:58:46,716 Where do you expect to see the different companies that we're used to seeing, your 928 00:58:46,736 --> 00:58:51,446 Googles of the world that don't seem to be doing that well with the AI, despite being 929 00:58:51,446 --> 00:58:53,416 there at the forefront and that long ago? 930 00:58:54,106 --> 00:58:58,166 The different startups that didn't exist a few years ago and now 931 00:58:58,166 --> 00:59:01,006 they're doing exceptional things. 932 00:59:01,926 --> 00:59:04,526 I'm thinking about places like Midjourney. 933 00:59:04,976 --> 00:59:07,856 Where would you pay attention to the most? 934 00:59:07,866 --> 00:59:10,806 Where do you expect to see the good stuff coming from?
935 00:59:12,676 --> 00:59:17,596 So I would say it would change, which is, for the companies that were able to be 936 00:59:17,596 --> 00:59:22,366 monopolies, now it will be very hard to be a monopoly that easily 937 00:59:22,966 --> 00:59:25,026 by just trying to build software. 938 00:59:25,436 --> 00:59:29,346 So by acquisition, yes, you can be a monopoly, which is trying 939 00:59:29,366 --> 00:59:31,926 to acquire everybody, which is essentially what Google was doing. 940 00:59:31,926 --> 00:59:34,356 So a lot of people do think Google is essentially in the business 941 00:59:34,356 --> 00:59:35,406 of building products. 942 00:59:35,746 --> 00:59:38,596 No, they were essentially acquiring all the small companies that 943 00:59:38,596 --> 00:59:39,976 were building excellent products 944 00:59:40,926 --> 00:59:45,766 before they became big, and that's a world we're moving further into because the 945 00:59:45,776 --> 00:59:50,926 bigger companies do have the infinite resources, compute resources as well, 946 00:59:50,986 --> 00:59:53,156 to be able to control the ecosystem. 947 00:59:53,206 --> 00:59:55,906 So I would say we will more likely see 948 00:59:56,211 --> 00:59:59,751 more monopolies, but those wouldn't be monopolies because 949 00:59:59,751 --> 01:00:01,481 they have an excellent product. 950 01:00:01,531 --> 01:00:06,101 Those would be monopolies because they have more access to information, and they 951 01:00:06,111 --> 01:00:09,111 have a higher number of resources out there. 952 01:00:09,711 --> 01:00:14,511 The number of small companies, yes, there will be many, but I do easily 953 01:00:14,511 --> 01:00:19,471 assume there will still be tons of companies who make exits as compared to 954 01:00:20,641 --> 01:00:24,071 becoming the victim in every hype cycle.
955 01:00:24,261 --> 01:00:27,281 And I would say we're going through a hype cycle right now where 956 01:00:27,651 --> 01:00:31,731 there's far too much paranoia and there's far too much excitement. 957 01:00:31,851 --> 01:00:37,121 There's very little realism around the business value being derived 958 01:00:37,121 --> 01:00:38,901 out of these models essentially. 959 01:00:39,301 --> 01:00:43,171 So in this hype cycle, there are always a lot of companies that get created. 960 01:00:43,811 --> 01:00:49,311 Two years from now, there's a very good chance at least seven out of 961 01:00:49,311 --> 01:00:50,751 ten of those companies will die. 962 01:00:51,751 --> 01:00:52,331 Okay. 963 01:00:52,451 --> 01:00:54,321 And on that optimistic note, 964 01:00:56,651 --> 01:00:58,381 we're going to wrap up the episode. 965 01:00:58,531 --> 01:01:01,461 My guest, once again, everybody, was Abi Aryan. 966 01:01:01,541 --> 01:01:03,971 You can find her at 967 01:01:04,021 --> 01:01:04,481 abiaryan.com. 968 01:01:04,661 --> 01:01:06,191 Is that the best place to find you? 969 01:01:07,556 --> 01:01:12,576 Yep, so that's one place where you find all the information or where I'm giving 970 01:01:12,626 --> 01:01:17,926 talks, because essentially that's where I'm presenting bits of information from 971 01:01:17,936 --> 01:01:20,056 my book and testing out my material. 972 01:01:21,326 --> 01:01:25,506 So that's the best place to find information about me or to find my social media, 973 01:01:25,506 --> 01:01:30,556 in case the links change. But otherwise, I'm @goabiaryan on Twitter, on 974 01:01:30,596 --> 01:01:32,506 Threads, on Instagram, on LinkedIn. 975 01:01:32,876 --> 01:01:34,886 So I use the same username everywhere. 976 01:01:35,356 --> 01:01:36,186 You can find me. 977 01:01:36,656 --> 01:01:37,186 There you go.
978 01:01:37,246 --> 01:01:42,286 Abi is omnipresent, always watching you on every platform, and the 979 01:01:42,296 --> 01:01:47,246 book is once again called "LLM Ops, Managing Large Language Models in 980 01:01:47,246 --> 01:01:49,436 Production", published by O'Reilly. 981 01:01:49,896 --> 01:01:50,876 Thank you so much, Abi. 982 01:01:50,936 --> 01:01:51,616 Thank you for coming. 983 01:01:53,066 --> 01:01:54,196 Thank you so much.