1 00:00:00,300 --> 00:00:06,370 I'm Miko Pawlikowski and this is HockeyStick. 2 00:00:07,130 --> 00:00:09,590 Today we talk about algorithms in machine learning. 3 00:00:09,970 --> 00:00:14,229 I'm joined by Vadim Smolyakov, the author of "Machine Learning Algorithms 4 00:00:14,250 --> 00:00:18,820 in Depth" by Manning, a data scientist in the Enterprise and Security 5 00:00:18,820 --> 00:00:24,659 DI R&D team at Microsoft and a former PhD student in AI at MIT CSAIL. 6 00:00:25,190 --> 00:00:28,650 His job today is to simplify ML algorithms enough for me to understand. 7 00:00:29,190 --> 00:00:32,340 And if that wasn't hard enough, he's not allowed to use any pictures. 8 00:00:33,160 --> 00:00:37,009 Welcome to this episode and thank you for flying HockeyStick. 9 00:00:37,649 --> 00:00:41,720 I don't get to speak to a lot of people who have done the MIT CSAIL. 10 00:00:41,759 --> 00:00:47,430 It's like a mystical, legendary course at this stage with all the hype around AI. 11 00:00:48,200 --> 00:00:49,320 Maybe let's start there. 12 00:00:49,399 --> 00:00:50,180 How was it? 13 00:00:50,670 --> 00:00:51,560 How did you enjoy it? 14 00:00:52,050 --> 00:00:53,850 It was definitely an experience. 15 00:00:53,950 --> 00:00:58,729 I really liked the theoretical aspects of the content treatment. Right now 16 00:00:58,779 --> 00:01:03,579 there's a lot of news articles that come out about AI, and people try to 17 00:01:03,629 --> 00:01:09,379 catch up on the latest large language models. They really took over in the past 18 00:01:09,379 --> 00:01:15,769 few years, but what I think is really unique about MIT CSAIL is the 19 00:01:15,779 --> 00:01:20,899 theoretical treatment of the subject and really getting in depth and understanding, 20 00:01:20,939 --> 00:01:23,159 under the hood, how things work.
21 00:01:23,802 --> 00:01:27,172 It also seems to be like the who's who, a lot of names that are 22 00:01:27,172 --> 00:01:28,862 recognizable now went through that. 23 00:01:28,862 --> 00:01:32,232 So you focused on Bayesian inference. 24 00:01:32,262 --> 00:01:34,442 What was your thesis about? 25 00:01:35,077 --> 00:01:38,769 My focus was on Bayesian nonparametrics. 26 00:01:39,279 --> 00:01:44,009 And it's a very interesting set of models in which the parameters grow with data. 27 00:01:45,519 --> 00:01:50,589 So one example is Dirichlet process K-means, for example, where you 28 00:01:50,609 --> 00:01:55,899 are trying to classify a number of, let's say, species, and you don't 29 00:01:55,899 --> 00:01:57,559 know how many there are, right? 30 00:01:57,559 --> 00:02:00,809 So as you keep discovering new species, you add new clusters. 31 00:02:01,449 --> 00:02:05,369 And normally, for that to work, you need to set the number of clusters K. 32 00:02:06,339 --> 00:02:11,329 And with Dirichlet process K-means, this number of clusters is set automatically, 33 00:02:11,599 --> 00:02:14,469 which is like one of the main advantages of the algorithm. 34 00:02:15,309 --> 00:02:18,899 So Bayesian nonparametrics deals with models in which the number 35 00:02:18,899 --> 00:02:20,469 of parameters grows with data. 36 00:02:21,519 --> 00:02:26,589 It's a clever way of expanding the model size and capacity 37 00:02:26,589 --> 00:02:29,159 to fit the data available. 38 00:02:30,279 --> 00:02:33,899 And did you manage to bring that kind of research, expand on that, in 39 00:02:33,899 --> 00:02:35,559 what you do at Microsoft at the moment? 40 00:02:35,999 --> 00:02:41,759 Yeah, at Microsoft at the moment, I'm like a local ML expert on the security team. 41 00:02:41,840 --> 00:02:46,280 So the nature of machine learning problems switched from Bayesian inference to 42 00:02:46,280 --> 00:02:48,280 more anomaly detection type problems.
43 00:02:48,570 --> 00:02:54,100 Essentially, I worked at Microsoft on time series anomaly detection. 44 00:02:54,140 --> 00:02:58,840 I worked on support ticket classification and routing. 45 00:02:58,940 --> 00:03:00,550 I worked on hyper-personalization. 46 00:03:01,755 --> 00:03:04,775 And an LLM data copilot most recently. 47 00:03:04,875 --> 00:03:09,005 The principles carry over, but the Bayesian nonparametric nature 48 00:03:09,005 --> 00:03:12,815 of the work doesn't necessarily extend to my work right now. 49 00:03:14,210 --> 00:03:17,410 The way we actually met is through your book, 50 00:03:17,420 --> 00:03:20,220 Machine Learning Algorithms in Depth. 51 00:03:21,350 --> 00:03:25,680 Can you tell us a little bit about the origin story of your book? 52 00:03:25,910 --> 00:03:26,820 Why did you write it? 53 00:03:26,915 --> 00:03:30,065 I've always liked writing, even before graduate school. 54 00:03:30,165 --> 00:03:33,675 And to me, working on a machine learning project in grad school 55 00:03:33,675 --> 00:03:38,425 and then writing it up in eight pages of publication was not enough. 56 00:03:38,475 --> 00:03:39,545 I wanted to do more. 57 00:03:39,545 --> 00:03:43,345 I wanted to blog about the concepts I was learning. 58 00:03:43,375 --> 00:03:46,945 I wanted to have a journal. And then 59 00:03:47,785 --> 00:03:51,055 I'd done a number of blog posts and I realized that, again, 60 00:03:51,065 --> 00:03:52,495 even that wasn't enough for me. 61 00:03:52,495 --> 00:03:57,595 I wanted to compile them into a collection of algorithms, collection of books. 62 00:03:57,615 --> 00:04:03,685 And at the same time I was writing this library of algorithms as part 63 00:04:03,685 --> 00:04:06,855 of graduate studies, I was getting more experience and I figured, 64 00:04:07,395 --> 00:04:14,095 wouldn't it be nice one day to put all of this together in a format which 65 00:04:14,395 --> 00:04:16,495 would be accessible to a wide audience?
66 00:04:17,915 --> 00:04:21,645 And for me, I was personally transitioning my area of study 67 00:04:21,645 --> 00:04:25,535 from wireless communications, which I did during my master's, to 68 00:04:25,575 --> 00:04:27,915 more machine learning during my PhD. 69 00:04:29,010 --> 00:04:33,330 So I had to learn a lot of these concepts from scratch. 70 00:04:33,940 --> 00:04:36,620 So there was a steep learning curve and I figured if I can do 71 00:04:36,620 --> 00:04:38,300 it, then so can other people. 72 00:04:39,060 --> 00:04:43,470 And that was like a big motivation behind writing a book: to be able 73 00:04:43,500 --> 00:04:48,920 to teach these cool concepts that I learned in graduate school to a wide 74 00:04:48,990 --> 00:04:51,060 audience interested in the topic. 75 00:04:51,060 --> 00:04:54,250 So that sounds like a long time coming, right? 76 00:04:54,300 --> 00:04:58,390 From the moment you started writing these blog posts to a finished book. 77 00:04:58,390 --> 00:04:59,630 How many years did that take? 78 00:04:59,850 --> 00:05:05,470 I would say it was two years of just writing the book, but I also had a lot 79 00:05:05,470 --> 00:05:07,820 of materials prepared ahead of time, 80 00:05:08,080 --> 00:05:13,120 like code and some ideas on what to write about, which would take another two 81 00:05:13,140 --> 00:05:16,160 years just to compile everything together. 82 00:05:16,830 --> 00:05:20,060 Sounds a little bit like some of the authors I speak to, and some 83 00:05:20,060 --> 00:05:26,870 of my friends are of this camp, who basically like using writing as a 84 00:05:26,870 --> 00:05:28,650 tool to understand things better. 85 00:05:29,010 --> 00:05:32,010 There's this saying that if you can't explain something simply enough, 86 00:05:32,010 --> 00:05:33,390 you don't understand it well enough.
87 00:05:33,687 --> 00:05:38,537 Are you also of that mind, that writing a book is like the best way for yourself to 88 00:05:38,537 --> 00:05:42,897 organize this information in a way that you really can explain it to other people? 89 00:05:42,972 --> 00:05:43,772 Yeah, definitely. 90 00:05:43,772 --> 00:05:47,412 And it takes several passes, you know, like you have one kind 91 00:05:47,412 --> 00:05:50,852 of point of view of an algorithm and then you start writing it and it's, 92 00:05:50,852 --> 00:05:54,332 oh, there are actually concepts, like, I'm thinking of decision trees right now. 93 00:05:55,012 --> 00:05:57,822 There, you have certain exposure to decision trees at first 94 00:05:57,822 --> 00:05:59,672 as interpretable models, and then 95 00:06:00,762 --> 00:06:04,052 you realize that, hey, it's actually a recursive algorithm 96 00:06:04,052 --> 00:06:08,122 and you grow the trees recursively until they reach maximum depth. 97 00:06:08,952 --> 00:06:12,392 And all of these parameters, like max depth, they start making a lot of 98 00:06:12,392 --> 00:06:16,172 sense, and then you start thinking about, like, bias-variance trade-offs. 99 00:06:16,202 --> 00:06:20,262 And you really understand the algorithm in depth. 100 00:06:20,692 --> 00:06:24,042 Okay, so dear listeners, I think you know where this is going. 101 00:06:24,052 --> 00:06:27,782 Now we're going to try to go through some of those algorithms from the book and 102 00:06:27,792 --> 00:06:32,142 give you a sneak peek, enough to understand some of the things you might not know. 103 00:06:33,127 --> 00:06:39,197 And also enough to go and buy Vadim's book, obviously. But before we do that, 104 00:06:39,847 --> 00:06:41,897 so who's the target audience of the book? 105 00:06:41,907 --> 00:06:42,647 Who is it for? 106 00:06:42,647 --> 00:06:44,197 And who's it not for?
107 00:06:44,310 --> 00:06:50,150 I wanted to make the book intermediate level so that anyone who has some 108 00:06:50,150 --> 00:06:54,120 experience with machine learning could benefit from it, but also somebody 109 00:06:54,120 --> 00:06:57,200 who's new to machine learning will be able to pick up the concepts. 110 00:06:57,730 --> 00:07:03,230 So specifically, I'd say the audience could be graduate students, it could 111 00:07:03,240 --> 00:07:07,590 be undergraduate students who are interested in the topic, it could be 112 00:07:07,650 --> 00:07:10,710 people who are trying to get into the field of machine learning but are 113 00:07:10,730 --> 00:07:14,390 working in the industry right now as, let's say, a software developer. 114 00:07:15,095 --> 00:07:18,425 The book does derive algorithms from scratch. 115 00:07:18,425 --> 00:07:22,205 So there are some requirements in terms of mathematics that are good to know: 116 00:07:22,315 --> 00:07:25,295 linear algebra, probability, calculus. 117 00:07:26,225 --> 00:07:31,615 So I would say anyone with interest in machine learning should be 118 00:07:31,615 --> 00:07:33,095 able to benefit from this book. 119 00:07:34,355 --> 00:07:34,715 Okay. 120 00:07:34,755 --> 00:07:36,345 But who shouldn't read it? 121 00:07:36,465 --> 00:07:39,875 What kind of expectations would be misguided for 122 00:07:39,875 --> 00:07:42,065 people approaching your book? 123 00:07:42,125 --> 00:07:45,655 The book is written in depth for somebody who's interested in 124 00:07:45,975 --> 00:07:50,755 understanding the algorithms from scratch, how they work under the hood. 125 00:07:51,365 --> 00:07:55,225 So if you don't have interest in that, and you just want to use the libraries, 126 00:07:55,475 --> 00:08:01,015 import scikit-learn or Hugging Face, you wouldn't benefit as much from reading it. 127 00:08:01,020 --> 00:08:01,660 Fair enough. 128 00:08:02,030 --> 00:08:04,980 So, with that warning ahead of us.
129 00:08:05,030 --> 00:08:08,040 Imagine that you're speaking to a five-year-old software 130 00:08:08,040 --> 00:08:12,040 engineer, which we basically are right now. Where should we start? 131 00:08:12,200 --> 00:08:14,950 What's the first example that you cover in the book? 132 00:08:15,020 --> 00:08:17,850 Does it have Bayesian next to its name? 133 00:08:19,035 --> 00:08:22,175 Yeah, the first thing I talk about is the Bayesian worldview. 134 00:08:22,945 --> 00:08:26,655 And basically, it's a way to view the world in which you start 135 00:08:26,655 --> 00:08:28,565 with some prior knowledge, right? 136 00:08:28,605 --> 00:08:30,455 Bayesians, they talk a lot about priors. 137 00:08:30,495 --> 00:08:35,115 You start with a prior knowledge of a particular aspect of the world. 138 00:08:35,165 --> 00:08:37,605 The world is too complex to have priors over everything. 139 00:08:37,625 --> 00:08:40,095 So you typically try to model a particular problem. 140 00:08:40,095 --> 00:08:41,565 You start with a prior knowledge. 141 00:08:42,425 --> 00:08:47,095 And then as you observe data, you update that prior knowledge 142 00:08:47,115 --> 00:08:48,825 into what's called the posterior. 143 00:08:49,275 --> 00:08:53,545 It's a probability of parameters given the data, right? 144 00:08:53,545 --> 00:08:58,665 So as you're observing data, you're evolving your understanding of the world into 145 00:08:59,155 --> 00:09:01,675 something new, a posterior distribution. 146 00:09:02,295 --> 00:09:05,235 So what's an example of an algorithm like that? 147 00:09:06,047 --> 00:09:08,767 It could be anything you import from scikit-learn 148 00:09:09,327 --> 00:09:12,217 that is an example of an algorithm of this nature. 149 00:09:12,227 --> 00:09:16,827 So, let's say, anything with a graphical model too. A Gaussian 150 00:09:16,847 --> 00:09:18,437 mixture model is an example of that.
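The prior-to-posterior update described above can be sketched with the simplest conjugate pair, a Beta prior over the success probability of a coin-like event. This is an illustrative assumption, not an example taken from the book: the function names `update_beta` and `beta_mean` and the observation list are made up for the sketch.

```python
from fractions import Fraction


def update_beta(alpha, beta, observations):
    """Conjugate Beta-Bernoulli update.

    Each observed 1 (success) adds one to alpha, each observed 0 (failure)
    adds one to beta. The result parameterizes the posterior distribution.
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return Fraction(alpha, alpha + beta)


# Hypothetical data: Beta(1, 1) is a flat prior over the unknown probability.
prior = (1, 1)
data = [1, 0, 1, 1, 0, 1, 1, 1]
posterior = update_beta(*prior, data)

print("prior mean:", beta_mean(*prior))
print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

The posterior from one batch of data can be passed back in as the prior for the next batch, which is the "evolving understanding" the speaker describes.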
151 00:09:19,067 --> 00:09:23,677 In a Gaussian mixture model, you're modeling the distribution of 152 00:09:23,697 --> 00:09:29,527 data points using a mixture or a collection of Gaussian distributions. 153 00:09:30,377 --> 00:09:34,817 So essentially, the model is a scaled sum of Gaussians that are 154 00:09:34,817 --> 00:09:37,877 parametrized by a mean and covariance. 155 00:09:39,037 --> 00:09:42,157 And the idea is to learn the mean and the covariance matrix, 156 00:09:42,447 --> 00:09:44,277 and the mixture proportions, 157 00:09:45,417 --> 00:09:46,717 from the data itself. 158 00:09:47,747 --> 00:09:50,037 So there are several algorithms for learning it. 159 00:09:50,077 --> 00:09:52,967 One popular algorithm is the EM algorithm. 160 00:09:53,787 --> 00:09:55,427 I talk about it in the book. 161 00:09:55,437 --> 00:10:03,357 But the idea is to be able to describe the data in this kind of 162 00:10:03,367 --> 00:10:06,307 form of Gaussians really closely, 163 00:10:07,402 --> 00:10:10,662 in a way that maximizes the likelihood of data. 164 00:10:11,382 --> 00:10:16,262 We may start off with a knowledge that all data is distributed as 165 00:10:16,262 --> 00:10:19,492 a uniform distribution. 166 00:10:19,992 --> 00:10:23,212 And as we observe more points, we update that knowledge; 167 00:10:23,362 --> 00:10:27,542 we evolve the shape of a uniform into a distribution that actually 168 00:10:28,562 --> 00:10:30,632 covers the points in a close way. 169 00:10:31,587 --> 00:10:35,697 So that would be one example of how the Bayesian approach applies here. 170 00:10:36,475 --> 00:10:39,415 So it sounds like basically some kind of iterative process where you're 171 00:10:39,415 --> 00:10:45,885 taking new data and nudging your parameters in the right direction 172 00:10:46,265 --> 00:10:51,355 to fit your new data more closely. So that's a worldview algorithm.
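To make the mixture-fitting idea concrete, here is a minimal sketch of EM for a two-component Gaussian mixture in one dimension, using only the standard library. It is a maximum-likelihood sketch under stated assumptions, not the book's implementation: the function name `fit_gmm_1d`, the initialization at the data extremes, the fixed iteration count, and the synthetic data are all choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    # Assumed initialization: means at the data extremes, broad shared variance.
    means = [min(xs), max(xs)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(2)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixture proportions, means, and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsed component
    return means, variances, weights


# Hypothetical synthetic data: two well-separated clusters.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
means, variances, weights = fit_gmm_1d(data)
print("means:", means)
print("variances:", variances)
print("weights:", weights)
```

Each pass alternates between soft-assigning points to components and re-estimating the parameters, which is the iterative nudging the host describes.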
173 00:10:51,445 --> 00:10:54,995 You also mentioned previously, when we were talking about your 174 00:10:54,995 --> 00:10:57,145 background, nonparametrics. 175 00:10:57,145 --> 00:10:59,795 Can you tell us a bit more about that? 176 00:11:00,047 --> 00:11:05,337 Bayesian nonparametric models are the ones where the number of parameters grows with 177 00:11:05,337 --> 00:11:07,407 the amount of data. 178 00:11:07,407 --> 00:11:11,977 The number of parameters automatically gets inferred from the data itself. 179 00:11:13,597 --> 00:11:18,217 One example, since we just talked about the Gaussian mixture model, is the Dirichlet 180 00:11:18,217 --> 00:11:24,117 process mixture model, which is an extension of Gaussian mixtures with a 181 00:11:24,347 --> 00:11:26,347 potentially infinite number of mixtures, 182 00:11:26,972 --> 00:11:32,082 obviously constrained towards the simplest model that describes the data 183 00:11:32,282 --> 00:11:34,682 best, like the Occam's razor principle. 184 00:11:35,462 --> 00:11:39,052 But, yeah, the Dirichlet process mixture model, where you don't know the number 185 00:11:39,052 --> 00:11:42,882 of clusters and you infer that from the data itself automatically. 186 00:11:44,382 --> 00:11:50,482 Yeah, so that's an example of a Bayesian nonparametric model and its main advantage. 187 00:11:51,452 --> 00:11:51,792 Okay. 188 00:11:52,122 --> 00:11:56,192 So that's a little bit more abstract than the previous example you gave of 189 00:11:56,222 --> 00:11:58,642 clustering various species of plants. 190 00:11:59,232 --> 00:12:05,992 Where does it have practical application in the day-to-day life of a developer? 191 00:12:06,702 --> 00:12:10,102 So clustering is a type of unsupervised learning where you 192 00:12:10,102 --> 00:12:13,762 are interested in understanding underlying patterns in data. 193 00:12:14,002 --> 00:12:16,052 A very kind of abstract notion. 194 00:12:16,072 --> 00:12:18,412 And the applications are
195 00:12:19,057 --> 00:12:19,927 many, right? 196 00:12:19,987 --> 00:12:25,077 So one example of clustering could be to detect anomalies. For instance, 197 00:12:25,097 --> 00:12:29,817 if you group data and there's an outlier, a point that's sufficiently 198 00:12:29,817 --> 00:12:34,427 far away from all the existing points, you could see that as an anomaly. 199 00:12:34,607 --> 00:12:40,397 And basically, that would be one application where clustering is important. 200 00:12:40,427 --> 00:12:44,997 Another application could be customer segmentation. 201 00:12:45,177 --> 00:12:51,667 You're interested in figuring out different cohorts of customers and their 202 00:12:52,207 --> 00:12:55,027 lifetime value for a particular product. 203 00:12:55,647 --> 00:13:00,597 The example I give in the book is that of classifying iris species. 204 00:13:01,337 --> 00:13:03,467 It's a classic machine learning data set. 205 00:13:03,547 --> 00:13:05,847 And, yeah, it's simple enough to understand. 206 00:13:05,887 --> 00:13:09,027 So it's a toy example, but the applications are numerous. 207 00:13:09,187 --> 00:13:13,187 A lot of people coming and listening to this, they come from a classical 208 00:13:13,187 --> 00:13:14,357 software engineering background. 209 00:13:14,377 --> 00:13:19,737 And when we say algos, they think about binary search and stuff like that, and 210 00:13:20,287 --> 00:13:26,007 chasing down the complexity and thinking about constraints and stuff like that. 211 00:13:26,682 --> 00:13:32,762 And then we've got the machine learning algorithms that somehow sound exotic. 212 00:13:32,792 --> 00:13:36,512 And with all the hype around AI, everybody's wondering, oh, should 213 00:13:36,512 --> 00:13:37,812 I be looking into more of that? 214 00:13:37,862 --> 00:13:41,792 You mentioned the Bayesian worldview and nonparametrics 215 00:13:41,842 --> 00:13:43,532 and a few applications of those.
216 00:13:43,872 --> 00:13:48,832 I wonder if this is like a representative sample of machine learning algorithms. 217 00:13:49,432 --> 00:13:53,602 And the second part of the question is, assuming that's the case, what other 218 00:13:53,622 --> 00:13:59,822 algorithms would you place firmly in this basic 101 machine learning algorithm 219 00:13:59,822 --> 00:14:01,682 set that everybody should be aware of? 220 00:14:01,947 --> 00:14:05,997 So first I want to make a distinction between kind of classical algorithms 221 00:14:06,017 --> 00:14:09,867 and machine learning algorithms. You have some sort of task that 222 00:14:09,867 --> 00:14:11,177 you're trying to solve, right? 223 00:14:11,187 --> 00:14:14,807 An algorithm is essentially a sequence of steps in solving that task. 224 00:14:15,477 --> 00:14:19,877 So an example could be, like you said, binary search over a sorted array. 225 00:14:20,387 --> 00:14:23,087 Or it could be sorting itself. 226 00:14:23,397 --> 00:14:28,307 And you're interested in runtime and memory complexity to characterize the 227 00:14:28,347 --> 00:14:33,827 algorithm, and run it in the fastest possible time using a small amount of memory. 228 00:14:34,507 --> 00:14:41,267 So for instance, comparison-based sorting has n log n runtime complexity. 229 00:14:41,337 --> 00:14:43,867 The same carries over to machine learning as well. 230 00:14:43,867 --> 00:14:48,387 But with the difference that in machine learning, you're given a 231 00:14:48,387 --> 00:14:53,337 collection of input-output pairs, and you try to learn the rules to map the 232 00:14:53,337 --> 00:14:55,607 inputs to the outputs during training. 233 00:14:56,417 --> 00:15:00,797 So instead of having a fixed set of instructions, quicksort, for example, 234 00:15:01,297 --> 00:15:08,707 if you're classifying points, then you are learning the classification 235 00:15:08,707 --> 00:15:10,497 boundaries between the existing points.
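The contrast drawn above can be sketched side by side: binary search runs a fixed set of instructions, while a one-dimensional classifier learns its boundary from labeled input-output pairs. The helper names `binary_search` and `learn_threshold` and the toy data are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


def binary_search(sorted_items, target):
    """Classical algorithm: the rules are fixed in advance, O(log n) lookups."""
    i = bisect_left(sorted_items, target)
    return i if i < len(sorted_items) and sorted_items[i] == target else -1


def learn_threshold(xs, ys):
    """Learned rule: choose the boundary t that minimizes training errors
    for the classifier 'predict 1 if x > t, else 0'.

    Candidate boundaries are midpoints between neighbouring distinct inputs.
    """
    points = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


# Fixed rules: nothing is learned from the data.
print(binary_search([1, 3, 5, 7], 5))

# Learned rule: the boundary comes from labeled examples (hypothetical data).
inputs = [1, 2, 3, 6, 7, 8]
labels = [0, 0, 0, 1, 1, 1]
threshold = learn_threshold(inputs, labels)
print("learned boundary:", threshold)
print("prediction for 5:", int(5 > threshold))
```

Changing the labeled examples changes the learned boundary, whereas the search procedure stays the same no matter what data it is run on.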
236 00:15:11,087 --> 00:15:14,607 So you're learning the rules when it comes to machine learning algorithms. 237 00:15:14,777 --> 00:15:19,967 And we did talk about nonparametrics and we talked about Bayesian algorithms. 238 00:15:20,507 --> 00:15:24,927 Actually, a lot of algorithms are derived from principles of applied probability, 239 00:15:25,967 --> 00:15:32,547 Bayes' rule. So examples include Naive Bayes, examples include mixture models. 240 00:15:33,577 --> 00:15:37,657 Some of the principles, like maximizing likelihood, are a common 241 00:15:37,657 --> 00:15:40,077 theme across a variety of algorithms, 242 00:15:40,197 --> 00:15:46,197 just like in deep learning, choosing the loss function is a common theme across 243 00:15:46,317 --> 00:15:48,320 a variety of deep learning models. 244 00:15:49,090 --> 00:15:49,790 So definitely, 245 00:15:49,890 --> 00:15:52,640 a big category of algorithms 246 00:15:52,650 --> 00:15:55,960 that we haven't touched upon yet is deep learning algorithms. 247 00:15:57,760 --> 00:16:02,540 And in general, to classify the algorithm types, they come in 248 00:16:02,540 --> 00:16:04,440 supervised and unsupervised fashion. 249 00:16:05,065 --> 00:16:11,445 So the supervised algorithms have a label associated with every example. 250 00:16:11,735 --> 00:16:17,615 So in other words, what the right answer looks like given the problem. And given 251 00:16:17,615 --> 00:16:23,145 enough of these right answers, the algorithm is learning how to create right 252 00:16:23,145 --> 00:16:27,415 answers by itself, through generalization. 253 00:16:28,055 --> 00:16:31,785 What I mean by that is, the goal of machine learning is to generalize to 254 00:16:31,815 --> 00:16:35,705 unseen data, to be able to demonstrate that something has been learned. 255 00:16:36,965 --> 00:16:39,915 So you mentioned supervised and unsupervised.
256 00:16:39,915 --> 00:16:44,525 And from what you said, I understand that supervised is basically some 257 00:16:44,525 --> 00:16:47,635 kind of underlying function that we're trying to approximate, right? 258 00:16:48,045 --> 00:16:48,745 So that 259 00:16:49,360 --> 00:16:55,120 as many unknown data points as possible land as close to where we would like 260 00:16:55,120 --> 00:16:57,890 them to, by giving it examples, right? 261 00:16:57,940 --> 00:17:03,420 By comparison, what does it actually mean when the algorithm is unsupervised? 262 00:17:04,415 --> 00:17:10,635 So when the algorithm is unsupervised, we don't have a label to learn 263 00:17:10,635 --> 00:17:16,695 from, but instead what we're interested in is understanding patterns in data. 264 00:17:17,255 --> 00:17:20,045 We're interested in making sense of a lot of data. 265 00:17:20,045 --> 00:17:23,475 Clustering is one example of unsupervised learning, where we group 266 00:17:23,475 --> 00:17:28,185 data into clusters and then try to make sense of each individual cluster 267 00:17:28,245 --> 00:17:30,385 or interpret it for our application. 268 00:17:30,970 --> 00:17:33,950 What would be some other examples of unsupervised? 269 00:17:34,545 --> 00:17:39,365 Another example that comes to mind is extracting features from data. 270 00:17:40,075 --> 00:17:44,095 Essentially, autoencoders, they take an input and they reconstruct 271 00:17:44,565 --> 00:17:49,760 an output from the input, but there is a bottleneck layer in between, which 272 00:17:50,190 --> 00:17:54,620 forces the autoencoder to learn a compressed representation of the input 273 00:17:54,620 --> 00:17:57,540 data before generating the output, right?
274 00:17:57,540 --> 00:18:03,000 So this bottleneck layer, which means that it has fewer parameters than the 275 00:18:03,010 --> 00:18:09,275 input, kind of forces the autoencoder to learn something useful about the data, 276 00:18:09,385 --> 00:18:12,495 and this could be used as a feature later on in a downstream algorithm. 277 00:18:13,240 --> 00:18:13,710 So 278 00:18:15,920 --> 00:18:18,190 this all makes sense at the high level, right? 279 00:18:18,280 --> 00:18:24,530 But I'm trying to come up with a more concrete example of the most basic version 280 00:18:24,580 --> 00:18:30,005 of an algorithm that you can have and to understand what it would look like, 281 00:18:30,425 --> 00:18:34,185 because like you said, the main difference is between something like binary 282 00:18:34,185 --> 00:18:39,855 search, where you've got a well-understood algorithm that's well analyzed, 283 00:18:40,910 --> 00:18:44,730 and then you just apply it to data and you get some output. 284 00:18:44,760 --> 00:18:48,880 Whereas in the machine learning algorithm world, you're doing kind of the opposite. 285 00:18:48,890 --> 00:18:53,740 You're learning the rules and trying to come up with the actual algorithm, 286 00:18:53,800 --> 00:18:54,430 really. 287 00:18:54,560 --> 00:18:56,170 I don't know if that's the right way of saying that. 288 00:18:56,220 --> 00:19:01,130 But that's the bit that you're trying to figure out, rather than just applying it. 289 00:19:01,970 --> 00:19:05,960 I'm trying to think, what's the simplest algorithm that we could maybe 290 00:19:05,960 --> 00:19:10,730 talk about in a little bit more detail, how it works, because, like I said, 291 00:19:10,740 --> 00:19:14,500 it's abstract and it might be a little bit hard to wrap your head around 292 00:19:14,540 --> 00:19:16,450 how that actually works in practice. 293 00:19:17,125 --> 00:19:20,065 In my book, I talk about a lot of different algorithms.
294 00:19:20,105 --> 00:19:22,285 We can touch on a number of different algorithms. 295 00:19:22,315 --> 00:19:24,295 But let's start with decision trees. 296 00:19:24,935 --> 00:19:26,685 I think they're widely used. 297 00:19:26,705 --> 00:19:28,765 They're interpretable models. 298 00:19:29,005 --> 00:19:33,205 Essentially, a decision tree learns to construct a sequence 299 00:19:33,205 --> 00:19:35,755 of if-else conditions, right? 300 00:19:36,230 --> 00:19:41,250 You could trace the reasoning behind the decision tree just 301 00:19:41,250 --> 00:19:45,500 by looking at how decisions are made through that if-else tree. 302 00:19:45,860 --> 00:19:51,150 For example, if you're applying for a loan and the loan gets rejected, 303 00:19:51,160 --> 00:19:56,800 then you could analyze this decision, why the loan got rejected, by looking 304 00:19:56,830 --> 00:20:00,040 at the decision tree and figuring out what branch of the decision tree 305 00:20:00,090 --> 00:20:02,730 was taken to lead to the outcome. 306 00:20:03,440 --> 00:20:07,900 And yeah, in some cases it's an important design choice to use an 307 00:20:07,900 --> 00:20:09,860 interpretable model like a decision tree. 308 00:20:10,530 --> 00:20:15,390 And the interpretability extends through an ensemble of these models. 309 00:20:16,260 --> 00:20:20,680 Like, random forest is an ensemble of decision trees, and we could extract 310 00:20:20,690 --> 00:20:23,340 feature importances from that. 311 00:20:23,850 --> 00:20:26,240 Let's talk about decision trees in detail. 312 00:20:26,880 --> 00:20:30,290 Essentially it's a greedy and recursive algorithm that starts 313 00:20:30,290 --> 00:20:31,890 with a certain depth of a tree. 314 00:20:32,315 --> 00:20:37,805 And it grows in depth on each iteration until the maximum depth is reached. 315 00:20:38,330 --> 00:20:41,820 It's trying to optimize the Gini index, which is a measure of impurity.
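As a rough illustration of the impurity measure just mentioned, the sketch below computes the Gini index of a set of labels and scores candidate splits on a single numeric feature by the weighted Gini of the two sides. It is a simplified, single-feature sketch, not the full CART procedure; the function names `gini` and `best_split` and the toy data are assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Greedy search over one feature for the split 'value <= threshold'.

    Returns (threshold, weighted_gini). The threshold is None when no
    candidate split improves on the impurity of the unsplit labels.
    """
    n = len(values)
    best = (None, gini(labels))
    points = sorted(set(values))
    for a, b in zip(points, points[1:]):
        t = (a + b) / 2  # midpoint between neighbouring distinct values
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: one feature, two classes.
feature = [1.0, 2.0, 3.0, 4.0]
classes = ["a", "a", "b", "b"]
print("impurity before split:", gini(classes))
print("best split:", best_split(feature, classes))
```

A tree builder would apply `best_split` to every feature at a node, keep the best one, and recurse on the two resulting subsets until the maximum depth is reached.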
316 00:20:42,880 --> 00:20:48,210 And we are, at each iteration, trying to understand how to 317 00:20:49,690 --> 00:20:57,790 divide our feature range into one that optimizes for the Gini index. 318 00:20:58,345 --> 00:21:02,945 And once we complete one level, we move on to the next level of the tree and so 319 00:21:02,945 --> 00:21:04,755 on until the maximum depth is reached. 320 00:21:05,115 --> 00:21:07,825 So it's a greedy algorithm and it's a recursive algorithm. 321 00:21:09,095 --> 00:21:12,615 And the one I'm talking about is called CART, C A R T. 322 00:21:12,752 --> 00:21:18,522 So is that a deterministic way of doing that? And the maximum depth 323 00:21:18,602 --> 00:21:23,782 you mentioned, is that an arbitrary decision, a hyperparameter effectively? 324 00:21:23,995 --> 00:21:26,545 The algorithm itself, being greedy, 325 00:21:26,555 --> 00:21:27,875 is deterministic. 326 00:21:27,935 --> 00:21:31,445 However, there's a way to introduce randomness, and this is what's done 327 00:21:31,445 --> 00:21:35,675 in random forest. You could introduce randomness in several ways. 328 00:21:35,705 --> 00:21:40,405 You could sample the features that you're evaluating at each iteration. 329 00:21:40,955 --> 00:21:42,865 That sampling introduces randomness. 330 00:21:43,375 --> 00:21:50,705 You could also run the algorithm on a subset of data, so that the data that the 331 00:21:50,705 --> 00:21:52,675 algorithm sees is different each time. 332 00:21:53,385 --> 00:21:57,135 And this is important because you are trying to reduce the 333 00:21:57,135 --> 00:22:00,985 variance of the algorithm. 334 00:22:01,610 --> 00:22:05,460 Basically, if you're working in the regression setting where you're trying 335 00:22:05,460 --> 00:22:11,320 to predict a continuous quantity using random forest, for example, so you have 336 00:22:11,320 --> 00:22:15,880 a random forest regressor, then you want to minimize the mean square error.
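The two sources of randomness described above, sampling the rows a tree sees and sampling the features it evaluates, can be sketched as follows. This is a minimal illustration under assumptions: bootstrap sampling with replacement is one common way to vary the data per tree, and the helper names, the toy rows, and the choice of two features per tree are all hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so each tree sees different data."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider when searching for a split."""
    return sorted(rng.sample(range(n_features), k))


# Hypothetical toy table: each row is (feature_0, feature_1, feature_2, label).
rows = [
    (5.1, 3.5, 1.4, "a"),
    (4.9, 3.0, 1.4, "a"),
    (6.3, 3.3, 6.0, "b"),
    (5.8, 2.7, 5.1, "b"),
]

rng = random.Random(42)
for tree_index in range(3):
    sample = bootstrap_sample(rows, rng)
    features = random_feature_subset(n_features=3, k=2, rng=rng)
    print(f"tree {tree_index}: features {features}, {len(set(sample))} distinct rows")
```

Each tree would then be grown on its own sample using only its own candidate features, which is what makes the individual trees differ from one another.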
337 00:22:17,440 --> 00:22:20,850 And mean square error could be written as bias squared plus variance. 338 00:22:21,490 --> 00:22:24,830 So to minimize mean square error, you want to minimize bias, and 339 00:22:24,870 --> 00:22:26,960 you want to minimize variance. 340 00:22:28,090 --> 00:22:33,000 The way to minimize variance is by taking an average of a large number of trees. 341 00:22:34,030 --> 00:22:37,160 And it's important to make sure that the trees are decorrelated, 342 00:22:37,720 --> 00:22:40,930 because this will help actually minimize the variance. 343 00:22:41,830 --> 00:22:46,340 And then injecting randomness into individual decision trees 344 00:22:46,630 --> 00:22:48,720 will help decorrelate them. 345 00:22:49,050 --> 00:22:51,860 So they're basically different-looking trees. 346 00:22:52,445 --> 00:22:56,705 So in a practical sense, let's go back to the example you 347 00:22:56,755 --> 00:23:04,065 mentioned, a decision tree to either grant or deny your request 348 00:23:04,065 --> 00:23:10,728 for a loan. Would that decision tree be recalculated on the fly, or do you run the 349 00:23:10,728 --> 00:23:16,018 algorithm once, you've got your current best model for deciding whether to 350 00:23:16,018 --> 00:23:19,878 give people loans, and you version that? 351 00:23:20,178 --> 00:23:23,528 Because I understand that the decision tree is the actual output of your 352 00:23:23,608 --> 00:23:25,218 machine learning algorithm, right? 353 00:23:25,358 --> 00:23:29,208 And then the decision tree is like an algorithm in itself, right? 354 00:23:29,958 --> 00:23:32,578 That you run to evaluate whether you give the loan or not. 355 00:23:33,048 --> 00:23:34,888 So how does that work in practice? 356 00:23:34,998 --> 00:23:38,308 I would say there are two different modes of machine learning algorithms. 357 00:23:38,308 --> 00:23:40,638 One is training and the other is testing.
358 00:23:41,278 --> 00:23:45,268 So in training, you're learning all the parameters that are learnable 359 00:23:45,418 --> 00:23:46,708 in the machine learning algorithm. 360 00:23:47,548 --> 00:23:49,868 And you need to have the right data for it. 361 00:23:49,938 --> 00:23:52,118 You need to have the labels in this case. 362 00:23:52,358 --> 00:23:55,718 While during testing, you fix the parameters that you've learned 363 00:23:56,088 --> 00:24:01,878 and you're focusing on prediction, meaning given new input data, what 364 00:24:01,908 --> 00:24:06,718 would the output be? Like, given a new customer with their own profile, 365 00:24:06,748 --> 00:24:09,948 what should the output be for that particular person? 366 00:24:10,065 --> 00:24:10,435 Okay. 367 00:24:10,565 --> 00:24:14,425 So what I'm picturing is like a massive database of, okay, this person 368 00:24:14,425 --> 00:24:16,005 with all the parameters about them. 369 00:24:16,005 --> 00:24:17,325 This is their business plan. 370 00:24:17,325 --> 00:24:19,655 These are their previous exits and stuff like that. 371 00:24:19,655 --> 00:24:24,005 And this is the amount they want and the decisions that were previously 372 00:24:24,005 --> 00:24:28,785 made by humans. You use that to somehow feed into the decision tree maker, 373 00:24:29,055 --> 00:24:30,645 is that the right way of saying that? 374 00:24:30,720 --> 00:24:31,030 Yeah. 375 00:24:31,975 --> 00:24:35,375 And it spits out a decision tree version 376 00:24:35,375 --> 00:24:35,530 1.7 377 00:24:36,270 --> 00:24:38,130 that you start running, right? 378 00:24:38,130 --> 00:24:39,114 Is that how it works? 379 00:24:39,834 --> 00:24:40,994 Yeah, I would imagine so.
380 00:24:42,014 --> 00:24:47,394 What are some of the difficulties in terms of actual software implementation 381 00:24:47,394 --> 00:24:53,014 of these things? Again, going back to binary search, we've got 382 00:24:53,044 --> 00:24:58,184 some people who thought about that, they came up with this optimized idea. 383 00:24:58,234 --> 00:25:02,704 Then we got a few people who sat down and optimized that for whatever hardware. 384 00:25:03,304 --> 00:25:06,984 And we've got a pretty speedy binary search or, I don't know, quicksort. 385 00:25:08,264 --> 00:25:11,864 These algorithms seem to be much more custom and much 386 00:25:11,884 --> 00:25:13,554 more aligned with the data. 387 00:25:14,354 --> 00:25:17,984 So what are some of the complications of that in terms 388 00:25:17,984 --> 00:25:19,464 of actually implementing this? 389 00:25:19,494 --> 00:25:23,314 Or maybe that's a completely wrong way of thinking about that, and 390 00:25:23,584 --> 00:25:24,864 if that's the case, just tell me. 391 00:25:25,124 --> 00:25:25,384 Yeah. 392 00:25:25,384 --> 00:25:28,684 When it comes to implementation, some of the computer science principles 393 00:25:28,684 --> 00:25:32,664 that you mentioned carry over, and I can talk about some of the 394 00:25:32,674 --> 00:25:37,014 computer science paradigms, like algorithmic paradigms, later as well. 395 00:25:37,814 --> 00:25:40,014 But yeah, it's a matter of getting it right. 396 00:25:40,334 --> 00:25:44,604 I think the correctness of the algorithm is very important. 397 00:25:45,654 --> 00:25:51,574 Computational complexity, like runtime and memory complexity, is also important. 398 00:25:52,629 --> 00:25:55,299 Being able to scale the algorithm is important. 399 00:25:55,399 --> 00:25:57,539 It's an important challenge.
400 00:25:58,759 --> 00:26:03,689 Some algorithms like random forests are more amenable to parallelization because 401 00:26:03,709 --> 00:26:08,629 the trees can be generated in parallel, whereas another ensemble, like boosted 402 00:26:08,639 --> 00:26:16,184 algorithms, works by sequentially fitting the residuals of trees. They 403 00:26:16,194 --> 00:26:20,934 work in a sequential manner, so they are less amenable to parallelization. 404 00:26:22,084 --> 00:26:26,474 I would say the number one challenge is to get the math right, and 405 00:26:26,474 --> 00:26:28,304 then to translate that math into code. 406 00:26:29,124 --> 00:26:33,444 And then from there on to have low computational and low memory complexity. 407 00:26:33,444 --> 00:26:36,514 So I guess it's not all that different after all. 408 00:26:37,244 --> 00:26:42,334 I'm guessing some of the things will be common. You mentioned the greedy aspect 409 00:26:42,384 --> 00:26:44,844 that comes from the classic algorithms. 410 00:26:45,414 --> 00:26:48,564 I'm guessing a lot of that will be dynamic programming, and you're probably 411 00:26:48,564 --> 00:26:53,244 going to apply all the usual tricks, like divide and conquer and stuff like that, 412 00:26:53,284 --> 00:27:00,164 wherever you can. But is there anything particularly common or unusual that 413 00:27:00,164 --> 00:27:04,964 you wouldn't be doing with classical algorithms that you do a lot in ML? 414 00:27:05,569 --> 00:27:09,819 There are different phases, like training and testing, right? 415 00:27:09,859 --> 00:27:12,279 Learning the parameters and predicting with them. 416 00:27:13,089 --> 00:27:18,409 The notion of learnable parameters themselves, I think, is a key difference. 417 00:27:18,829 --> 00:27:19,499 What's that? 418 00:27:19,519 --> 00:27:21,599 What are learnable parameters?
419 00:27:22,064 --> 00:27:25,692 Essentially, variables that you try to fit, variables that 420 00:27:25,692 --> 00:27:28,082 you try to optimize for the data. 421 00:27:28,082 --> 00:27:33,260 It's like room for growth or room for adaptability in the algorithm itself. 422 00:27:33,260 --> 00:27:35,960 Having an objective function is another key differentiator. 423 00:27:36,370 --> 00:27:40,740 A lot of Bayesian algorithms maximize the log likelihood or 424 00:27:40,740 --> 00:27:42,590 minimize the negative log likelihood. 425 00:27:42,700 --> 00:27:44,270 That's another difference. 426 00:27:44,320 --> 00:27:48,420 A methodology for learning these parameters would be another difference. 427 00:27:48,540 --> 00:27:52,690 For example, it could be backpropagation in deep learning, right? 428 00:27:52,830 --> 00:27:56,570 There's a methodology for learning the parameters of the model. 429 00:27:57,430 --> 00:28:01,360 Or it could be Bayes' rule as a way of updating the parameters 430 00:28:01,510 --> 00:28:03,360 in a graphical model, for example. 431 00:28:04,175 --> 00:28:05,065 That makes me think. 432 00:28:05,835 --> 00:28:10,505 So is it true that at the moment all of ML is being completely 433 00:28:10,545 --> 00:28:12,745 dominated by deep learning? 434 00:28:13,825 --> 00:28:17,575 And when people talk about ML, they basically talk about deep 435 00:28:17,575 --> 00:28:19,085 learning most of the time? 436 00:28:20,255 --> 00:28:22,525 Backpropagation and stuff like that has been a 437 00:28:23,110 --> 00:28:27,040 super hot topic because of the ChatGPTs of the world and stuff like that. 438 00:28:27,070 --> 00:28:31,720 And the rest is becoming a little bit less in fashion at the moment? 439 00:28:31,998 --> 00:28:32,998 It comes in waves.
440 00:28:33,158 --> 00:28:37,498 I tend to focus on fundamentals because fundamentals are never going to be 441 00:28:37,498 --> 00:28:42,988 out of fashion: a solid background in applied probability, calculus, linear 442 00:28:42,988 --> 00:28:48,798 algebra, Bayesian inference, deep learning. These are all going to be 443 00:28:48,798 --> 00:28:51,148 in fashion for a really long time. 444 00:28:51,888 --> 00:28:58,528 Definitely, large language models have shown so much growth in the past few years. 445 00:28:59,268 --> 00:29:03,698 And these are deep learning models, starting from natural language 446 00:29:03,758 --> 00:29:10,638 machine translation, encoder-decoder type architectures, and going to GPT-4 447 00:29:11,778 --> 00:29:17,748 and whatever the next GPT is, in size and in performance. 448 00:29:17,748 --> 00:29:19,138 It's interesting to 449 00:29:19,138 --> 00:29:19,728 think about it. 450 00:29:19,768 --> 00:29:23,098 I'm really happy that they took off at such speed and 451 00:29:23,108 --> 00:29:24,668 there's so much interest in AI. 452 00:29:24,778 --> 00:29:25,838 So what was that? 453 00:29:25,948 --> 00:29:31,118 2019 or something like that when the first version of ChatGPT came out, right? 454 00:29:31,188 --> 00:29:32,428 It's been a few years now. 455 00:29:33,508 --> 00:29:39,318 As someone who specializes in a lot of these fundamental algorithms and 456 00:29:39,348 --> 00:29:43,278 understands how they're derived and where they come from and their limitations, 457 00:29:43,878 --> 00:29:47,098 what do you think of all the hype that's currently flowing around, 458 00:29:47,158 --> 00:29:53,508 AGI being just around the corner and AI taking your job and all of that? 459 00:29:53,838 --> 00:29:56,018 I'm a believer in copilots. 460 00:29:56,028 --> 00:29:59,638 So I think AI is helping people with their job.
461 00:29:59,848 --> 00:30:02,958 I'm not sure if it's going to be taking over the job, but I'm also a big 462 00:30:02,958 --> 00:30:08,603 believer in automation, automation as a way of helping a developer deal with 463 00:30:08,603 --> 00:30:10,903 the less pleasant aspects of the job, right? 464 00:30:10,983 --> 00:30:13,393 If AI can do that, that's fantastic. 465 00:30:13,903 --> 00:30:17,033 But I think a lot of the planning and thinking is still up to the 466 00:30:17,033 --> 00:30:19,663 human, to reason, to decide. 467 00:30:19,953 --> 00:30:22,953 Yeah, I benefited a lot from copilots. 468 00:30:23,003 --> 00:30:28,933 They're really great at summarizing a lot of resources available online, and through 469 00:30:29,313 --> 00:30:31,543 retrieval-augmented generation systems 470 00:30:32,333 --> 00:30:33,733 you could accomplish a lot. 471 00:30:33,783 --> 00:30:35,593 I'm a big believer in copilots. 472 00:30:36,285 --> 00:30:39,745 I think this is probably something that might be getting a little bit 473 00:30:40,615 --> 00:30:45,285 of a bad rep, because everybody just wants the final step, right? 474 00:30:45,575 --> 00:30:47,885 It was the same thing with self-driving cars. 475 00:30:48,615 --> 00:30:53,855 My Tesla is driving itself pretty well, maybe 95% of the time, if I'm 476 00:30:53,855 --> 00:30:59,200 on like a longer route and I'm on the motorway or whatever. It's doing most 477 00:30:59,200 --> 00:31:02,870 of the work already pretty well. I'm still responsible for it and I have to 478 00:31:02,870 --> 00:31:06,200 look, but what everybody wants is the final step when you can just kick 479 00:31:06,200 --> 00:31:12,450 back and relax and not do any of that, and I think that's understandable.
480 00:31:12,460 --> 00:31:17,590 But at the same time, it makes the current intermediate step, the copilot 481 00:31:17,590 --> 00:31:21,720 situation, maybe sound a little bit less glamorous than it actually 482 00:31:21,740 --> 00:31:23,430 is, because it's already pretty cool. 483 00:31:24,400 --> 00:31:26,140 So totally agree with you on that. 484 00:31:26,160 --> 00:31:31,240 We've done one example, the decision tree. 485 00:31:32,020 --> 00:31:38,240 I wonder what would be your top three hall of fame machine learning algorithms. 486 00:31:38,270 --> 00:31:38,700 I 487 00:31:39,155 --> 00:31:42,555 saw your book, and there are some things that I keep seeing 488 00:31:42,595 --> 00:31:47,365 elsewhere, like Markov chains and Monte Carlo, stuff like that. 489 00:31:47,415 --> 00:31:50,695 There are some things that sound interesting, like genetic algorithms, 490 00:31:50,695 --> 00:31:52,375 and I wonder what that actually means. 491 00:31:52,375 --> 00:31:56,965 But if you were to give us your top three favorite Hall of Fame 492 00:31:57,005 --> 00:32:01,655 algorithms and tell us a little bit about how they work, high level again, for 493 00:32:01,715 --> 00:32:03,285 a five year old software engineer. 494 00:32:03,830 --> 00:32:05,070 What will be your selection? 495 00:32:05,120 --> 00:32:06,050 What's on that menu? 496 00:32:06,235 --> 00:32:09,135 I definitely have to mention that one of them would be a Markov chain 497 00:32:09,135 --> 00:32:10,645 Monte Carlo type algorithm. 498 00:32:10,695 --> 00:32:15,285 So what a Markov chain is, essentially, is a sequence of random variables where 499 00:32:15,335 --> 00:32:18,015 the future is independent of the past given the present.
500 00:32:18,065 --> 00:32:22,795 So the future state of the random variable only depends on the present state, which 501 00:32:22,795 --> 00:32:26,425 reminds me of a quote that it doesn't really matter where you're coming from, 502 00:32:26,455 --> 00:32:30,325 all that really matters is where you're going. So, Markov chain Monte Carlo: 503 00:32:30,415 --> 00:32:35,015 one of my favorite algorithms in that area is the Metropolis-Hastings algorithm. 504 00:32:35,715 --> 00:32:39,315 And the idea there is you're after a posterior distribution. 505 00:32:39,765 --> 00:32:44,065 You want to draw samples from this posterior distribution. 506 00:32:44,095 --> 00:32:46,145 You want to study it, analyze it. 507 00:32:46,525 --> 00:32:48,855 The posterior is like the goal, the answer. 508 00:32:49,385 --> 00:32:53,265 But it's hard to sample from it, because in real 509 00:32:53,265 --> 00:32:54,965 life models it's complex. 510 00:32:55,665 --> 00:32:58,935 And what you do instead is you approximate it with something 511 00:32:58,935 --> 00:33:00,585 called a proposal distribution. 512 00:33:01,495 --> 00:33:04,405 And a proposal distribution is easier to sample from. 513 00:33:04,405 --> 00:33:08,905 So what happens is you draw samples from a proposal distribution, and then 514 00:33:08,905 --> 00:33:13,445 based on the Metropolis-Hastings ratio, you evaluate these samples, and you 515 00:33:13,445 --> 00:33:15,555 either accept them or reject them. 516 00:33:15,665 --> 00:33:17,850 You either take them or you drop them. 517 00:33:18,560 --> 00:33:20,810 And you repeat this process many times. 518 00:33:22,280 --> 00:33:26,270 So Metropolis-Hastings enables sampling from these high-dimensional 519 00:33:26,280 --> 00:33:31,960 distributions, and it's simple enough to implement from scratch. 520 00:33:31,990 --> 00:33:33,880 It's a great algorithm overall. 521 00:33:33,890 --> 00:33:35,800 There are various improvements on top of it.
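The propose, evaluate, accept-or-reject loop described above can be sketched in a few lines. This is a minimal illustration, not code from the book: the target (`log_target`, an unnormalized standard normal), the step size, and the function names are all assumptions made for the example. Because the Gaussian random-walk proposal is symmetric, the Hastings correction cancels and the ratio reduces to the plain Metropolis form.

```python
import math
import random


def log_target(x):
    # Hypothetical target: unnormalized log-density of a standard normal.
    return -0.5 * x * x


def metropolis_hastings(log_target, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis-Hastings in one dimension (illustrative sketch)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Draw a candidate from a symmetric Gaussian proposal centred on x.
        proposal = x + rng.gauss(0.0, step)
        # Log of the acceptance ratio; proposal terms cancel by symmetry.
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, ratio); otherwise keep the current state.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

Calling `metropolis_hastings(log_target, 20000)` returns the chain and the fraction of accepted proposals; the sample mean and variance should land near 0 and 1 for this assumed target.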
522 00:33:35,800 --> 00:33:38,870 It's definitely not the most efficient 523 00:33:40,605 --> 00:33:42,445 algorithm, but it's a really good one. 524 00:33:42,735 --> 00:33:45,105 That's why I bring it up. 525 00:33:45,135 --> 00:33:48,215 But how do you come up with this proposal distribution? 526 00:33:48,605 --> 00:33:52,245 Proposals are something that's easier to sample from. 527 00:33:52,245 --> 00:33:55,935 So it could be a Gaussian with a certain mean and covariance, like a multivariate 528 00:33:55,975 --> 00:33:57,985 Gaussian in high-dimensional problems. 529 00:33:58,870 --> 00:34:03,750 Typically you want to have a high acceptance ratio, so 530 00:34:03,750 --> 00:34:07,340 the closer your proposal is to the actual target distribution, the 531 00:34:07,370 --> 00:34:09,540 target posterior, the better. 532 00:34:09,590 --> 00:34:15,680 So you're trying to estimate, based on domain knowledge or 533 00:34:15,680 --> 00:34:19,960 otherwise, the proximity, how close you can get to the target. 534 00:34:20,011 --> 00:34:20,421 I see. 535 00:34:20,861 --> 00:34:23,601 Because I keep thinking the classical way about it. 536 00:34:23,611 --> 00:34:26,981 So it's not like one of those algorithms where you just have 537 00:34:26,981 --> 00:34:31,541 the steps. There's a step which is basically: suggest a reasonable 538 00:34:32,601 --> 00:34:34,911 distribution that approximates it 539 00:34:35,206 --> 00:34:38,826 with something that's well known, and look at the data and come up with 540 00:34:38,876 --> 00:34:40,656 something that should be reasonable. 541 00:34:40,686 --> 00:34:44,766 And then you use Metropolis-Hastings to evaluate, basically. 542 00:34:44,943 --> 00:34:46,613 It's much more artisanal, right?
543 00:34:46,623 --> 00:34:51,223 There's always this step of staring at the data and looking and coming 544 00:34:51,223 --> 00:34:54,778 up with a mix of your experience and creativity to come up with 545 00:34:54,778 --> 00:34:56,218 something that sounds about right. 546 00:34:56,968 --> 00:34:57,788 Which is scary 547 00:34:58,108 --> 00:35:01,068 for someone who comes from the very exact world of algorithms. 548 00:35:01,108 --> 00:35:02,808 This is scary stuff. 549 00:35:02,808 --> 00:35:05,888 All right, cool. 550 00:35:05,918 --> 00:35:10,218 So Metropolis-Hastings, is that two names? 551 00:35:10,298 --> 00:35:14,008 I believe it's named after the inventors of the algorithm. 552 00:35:14,248 --> 00:35:15,748 So that's an interesting approach. 553 00:35:15,748 --> 00:35:18,708 What will be number two of your top three? 554 00:35:18,916 --> 00:35:24,066 I would pick approximate nearest neighbors because of its popularity 555 00:35:24,076 --> 00:35:29,026 in current retrieval-augmented generation systems. They're used 556 00:35:30,071 --> 00:35:30,641 everywhere. 557 00:35:31,411 --> 00:35:35,991 Essentially, approximate nearest neighbors is an improvement on k-nearest neighbors. 558 00:35:36,041 --> 00:35:41,531 With k-nearest neighbors, if you're given a query point, you want to compute the 559 00:35:41,531 --> 00:35:46,711 distance between that query point and all the other points in the training data set. 560 00:35:46,761 --> 00:35:50,451 You want to compute the distances and then sort these distances and then 561 00:35:50,481 --> 00:35:53,221 select the top k closest points. 562 00:35:53,971 --> 00:35:57,341 So this is a highly computationally intensive operation. 563 00:35:57,391 --> 00:36:01,371 First of all, you have to compute n d-dimensional distances. 564 00:36:01,431 --> 00:36:06,251 Then you have to sort, which is n log n, and select the top k.
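To make the cost of the exact search concrete, here is a minimal brute-force sketch of the steps just listed: compute every distance, sort, take the top k. The function name `knn_brute_force` and the tuple-based point format are assumptions for illustration only.

```python
import math


def knn_brute_force(points, query, k):
    """Exact k-nearest neighbours by exhaustive search (illustrative sketch)."""
    # Step 1: one distance per stored point, O(n * d) work in total.
    distances = [(math.dist(p, query), i) for i, p in enumerate(points)]
    # Step 2: sort all n distances, O(n log n).
    distances.sort()
    # Step 3: keep the indices of the k closest points.
    return [i for _, i in distances[:k]]
```

Every query touches every stored point, which is the cost that approximate methods try to avoid by searching only a small part of the data.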
565 00:36:06,771 --> 00:36:09,831 So approximate nearest neighbors is a way to get around it. 566 00:36:10,851 --> 00:36:15,356 And there are three approximate nearest neighbor flavors that I could talk 567 00:36:15,366 --> 00:36:22,226 about. One is tree-based ANN. Essentially, what you do with tree-based ANN is 568 00:36:23,466 --> 00:36:29,253 you divide up the space into regions, and each leaf in the tree is a region. 569 00:36:29,253 --> 00:36:31,596 So one example is k-d trees. 570 00:36:31,616 --> 00:36:34,376 And then based on the problem, is it a classification 571 00:36:34,376 --> 00:36:35,926 problem or a regression problem, 572 00:36:36,496 --> 00:36:41,386 you can compute the final answer either by taking a majority vote 573 00:36:41,776 --> 00:36:46,346 for classification or taking an average of the points in the 574 00:36:46,356 --> 00:36:48,136 region for a regression problem. 575 00:36:48,801 --> 00:36:53,211 Can we take a quick detour, because those are words that have meaning, 576 00:36:53,211 --> 00:36:58,291 but they probably have a more particular meaning in the machine learning context. 577 00:36:58,291 --> 00:37:02,311 So could you quickly tell us what's the difference between regression and 578 00:37:02,491 --> 00:37:04,431 classification types of algorithms? 579 00:37:04,613 --> 00:37:08,473 So in regression, you're interested in estimating a continuous quantity, 580 00:37:09,438 --> 00:37:14,668 so a real value, such as, let's say, a stock price. In classification, 581 00:37:14,678 --> 00:37:18,368 you're interested in estimating a discrete quantity. 582 00:37:19,038 --> 00:37:23,258 For example, it could be a particular customer age group. 583 00:37:24,088 --> 00:37:27,808 So the difference is in the quantity you're estimating: for continuous, it's regression; 584 00:37:27,838 --> 00:37:29,328 for discrete, it's classification.
585 00:37:30,173 --> 00:37:36,043 Back to approximate nearest neighbors: with tree-based ANN, we have our 586 00:37:36,043 --> 00:37:38,943 space, which we divide into regions. 587 00:37:40,053 --> 00:37:43,633 And based on the points in each region and the task at hand, 588 00:37:43,643 --> 00:37:47,843 we either average the points if we're looking for a regression 589 00:37:48,693 --> 00:37:54,023 answer, or we take a majority vote, that is, looking at the class labels and taking 590 00:37:54,023 --> 00:37:59,553 the majority label as the answer, if the problem is a classification problem. 591 00:37:59,603 --> 00:38:00,403 Another 592 00:38:01,573 --> 00:38:05,083 example of approximate nearest neighbors is locality-sensitive hashing. 593 00:38:06,133 --> 00:38:11,423 What we do there is we essentially group points into buckets based 594 00:38:11,423 --> 00:38:13,223 on their proximity to each other. 595 00:38:13,953 --> 00:38:17,963 And instead of searching through all the points, we only look inside the 596 00:38:17,963 --> 00:38:20,813 bucket to find the k nearest neighbors. 597 00:38:21,603 --> 00:38:25,303 So this helps reduce computational complexity dramatically. 598 00:38:26,478 --> 00:38:29,648 So first cluster them using one of the other algorithms, and then 599 00:38:29,648 --> 00:38:31,208 you just look inside the cluster? 600 00:38:32,458 --> 00:38:34,718 That would be the third type, clustering. 601 00:38:35,448 --> 00:38:40,018 In the clustering sense, we cluster the points into clusters 602 00:38:40,018 --> 00:38:41,798 and only look inside the cluster. 603 00:38:42,303 --> 00:38:45,903 So the buckets we were talking about, how are they different from clusters? 604 00:38:46,003 --> 00:38:48,273 The buckets are formed in a slightly different way.
605 00:38:48,373 --> 00:38:52,993 Essentially, you can visualize it as points on a high-dimensional sphere, 606 00:38:53,433 --> 00:38:59,113 and you cut the space with hyperplanes, and points that are captured 607 00:38:59,113 --> 00:39:03,503 between the hyperplanes are placed into the same bucket. 608 00:39:03,513 --> 00:39:05,543 So they're based on locality there. 609 00:39:06,193 --> 00:39:10,143 Points that are closer together on that sphere get grouped into the same bucket. 610 00:39:11,246 --> 00:39:11,726 Okay. 611 00:39:12,016 --> 00:39:12,416 All right. 612 00:39:12,863 --> 00:39:17,133 So these three methods, tree-based ANN, locality-sensitive hashing 613 00:39:17,133 --> 00:39:23,283 ANN, and clustering-based ANN, help speed up exact k-NN. 614 00:39:23,283 --> 00:39:28,183 So what would be some of the applications of this approximate nearest neighbors? 615 00:39:28,253 --> 00:39:31,323 Where would we see that in practice, maybe in production? 616 00:39:32,273 --> 00:39:33,503 Can you give us an example? 617 00:39:34,036 --> 00:39:40,226 So in retrieval-augmented generation systems, you have a vector store, and 618 00:39:40,236 --> 00:39:44,466 you're interested in retrieving the closest, or in semantic search, for example, 619 00:39:44,466 --> 00:39:45,646 you're interested in retrieving the 620 00:39:46,176 --> 00:39:49,106 closest unit from the vector store. 621 00:39:49,976 --> 00:39:52,666 And like a bunch of dimensions. 622 00:39:52,666 --> 00:39:52,966 Yeah. 623 00:39:54,041 --> 00:39:57,321 So you can use this approximation to get something quicker, 624 00:39:57,831 --> 00:40:00,201 even if it's not exact, which is pretty cool. 625 00:40:00,371 --> 00:40:00,801 All right. 626 00:40:01,031 --> 00:40:04,891 So that was number two on your top three list. 627 00:40:05,581 --> 00:40:06,441 What's number three?
628 00:40:06,998 --> 00:40:12,308 I would say attention and transformers would be my third 629 00:40:12,478 --> 00:40:16,878 on the list of favorite algorithms. Self-attention methods, they 630 00:40:16,878 --> 00:40:19,228 really revolutionized the space. 631 00:40:20,538 --> 00:40:26,628 And the idea there is to attend to the context, 632 00:40:27,518 --> 00:40:30,188 originating from neural machine translation. 633 00:40:30,728 --> 00:40:35,478 If we are translating a target word to a different language, we need to 634 00:40:35,478 --> 00:40:38,058 understand the context around that word. 635 00:40:38,208 --> 00:40:42,008 We need to understand the whole sentence around it before we can translate a 636 00:40:42,008 --> 00:40:48,138 single word, and self-attention mechanisms enable us to do just that, and in a 637 00:40:48,208 --> 00:40:55,828 parallelized fashion. It could also be seen as a soft dictionary lookup with query, 638 00:40:55,828 --> 00:41:04,138 key, and value entries, in which the target word is a query and you're computing the inner 639 00:41:04,168 --> 00:41:08,478 product between the query and the key, and multiplying that by the value 640 00:41:08,478 --> 00:41:11,348 stored in that soft lookup dictionary. 641 00:41:12,168 --> 00:41:15,548 And that's how you get the famous formula involving 642 00:41:15,548 --> 00:41:16,758 those three variables. 643 00:41:16,758 --> 00:41:21,518 Essentially we're trying to understand the context and the contribution of 644 00:41:21,578 --> 00:41:25,378 every word in the sentence to the target word which we're translating. 645 00:41:26,268 --> 00:41:32,148 And we have learnable parameters, so we're keeping track of word 646 00:41:32,158 --> 00:41:36,108 embeddings and we're keeping track of word position. We have these learnable 647 00:41:36,138 --> 00:41:40,998 parameters, which help us find the closest match in the target language 648 00:41:40,998 --> 00:41:42,588 to the word which we're translating.
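The "soft dictionary lookup" can be illustrated with a small scaled dot-product attention sketch: each query is scored against every key by an inner product, the scores are turned into weights with a softmax, and the output is the weighted average of the values. This is a simplified illustration under stated assumptions: plain Python lists stand in for tensors, and the learned projection matrices, multiple heads, and causal masking of a real transformer are left out. The function names are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (illustrative sketch)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Inner product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        # Softmax turns the scores into lookup weights that sum to 1.
        weights = softmax(scores)
        # The output is the weight-averaged value vector: a "soft" lookup.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

A query that matches one key more strongly pulls the output toward that key's value, while a query that matches all keys equally returns the plain average of the values.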
649 00:41:42,598 --> 00:41:42,628 Okay. 650 00:41:45,008 --> 00:41:48,178 Okay, so I've got a sentence. 651 00:41:48,258 --> 00:41:48,938 I don't know. 652 00:41:49,028 --> 00:41:50,688 "I like cats." 653 00:41:51,638 --> 00:41:56,828 And we want to understand the "like", how is it connected to the "cats", right? 654 00:41:57,208 --> 00:41:57,598 Uh huh. 655 00:41:57,928 --> 00:42:02,828 So does it mean that we're calculating like the complete product of connections 656 00:42:02,828 --> 00:42:08,438 between all the pairs of words, embeddings or whatever is underlying there? 657 00:42:09,418 --> 00:42:10,158 How does it work? 658 00:42:10,178 --> 00:42:15,148 I do have a chapter in my book on self-attention and transformers. It 659 00:42:15,148 --> 00:42:19,068 comes back to the "Attention Is All You Need" architecture, the encoder-decoder 660 00:42:19,068 --> 00:42:21,498 architecture, the paper that first introduced 661 00:42:21,578 --> 00:42:21,728 it. 662 00:42:21,798 --> 00:42:23,088 Famous paper. 663 00:42:23,438 --> 00:42:26,078 Yeah, essentially we're predicting one word at a time 664 00:42:27,038 --> 00:42:29,628 in a masked, causal way, right? 665 00:42:29,758 --> 00:42:34,348 And we are looking at all the words that came before that word and 666 00:42:34,388 --> 00:42:39,378 figuring out the highest probability next word in our dictionary. 667 00:42:39,948 --> 00:42:43,478 And of course there are different varieties of architectures now when 668 00:42:43,478 --> 00:42:48,998 it comes to transformers. There's the decoder-only GPT family, then there are 669 00:42:49,008 --> 00:42:54,458 encoder-only models like BERT, then there are encoder-decoder architectures like T5. 670 00:42:54,538 --> 00:42:57,498 And they're suitable for different applications. 671 00:42:58,018 --> 00:43:00,858 GPT has been very popular when it comes to generative AI. 672 00:43:00,908 --> 00:43:04,518 We've got the top three from Vadim.
673 00:43:04,538 --> 00:43:08,598 For anybody else who is struggling a little bit like me 674 00:43:08,628 --> 00:43:12,998 to go through that, probably the best way is to go grab the book. 675 00:43:13,048 --> 00:43:15,138 The book is still in MEAP, right? 676 00:43:15,138 --> 00:43:17,068 The Manning Early Access Program. 677 00:43:17,128 --> 00:43:19,688 And I think I looked it up on the website. 678 00:43:20,133 --> 00:43:23,613 I think it said August this year for the final version. 679 00:43:23,683 --> 00:43:24,323 Is that right? 680 00:43:25,093 --> 00:43:29,813 Everything is written and finished. It's up to the production folks at Manning to 681 00:43:30,093 --> 00:43:32,573 actually have the print version ready. 682 00:43:32,603 --> 00:43:37,683 The PDF is available and all the contents are there right now. 683 00:43:38,553 --> 00:43:39,063 Got it. 684 00:43:39,083 --> 00:43:40,713 Manning, please hurry up. 685 00:43:41,343 --> 00:43:42,663 We want the book finished. 686 00:43:43,783 --> 00:43:45,343 What's next for you, Vadim? 687 00:43:45,403 --> 00:43:50,593 I've been thinking about maybe making an online course on machine learning topics. 688 00:43:51,723 --> 00:43:53,753 I'm exploring different media right now. 689 00:43:53,803 --> 00:43:55,123 Writing a book is one medium. 690 00:43:55,123 --> 00:43:57,153 I'm getting into YouTube a little bit more. 691 00:43:57,173 --> 00:43:59,453 I started posting content on YouTube. 692 00:44:00,223 --> 00:44:04,553 I also have an Instagram channel, at the life guide now, where I talk 693 00:44:04,563 --> 00:44:10,143 about inspirational, motivational content related to different quotes 694 00:44:10,143 --> 00:44:14,433 and different things that helped me grow and go through difficult periods, 695 00:44:14,593 --> 00:44:18,993 the kind of things that help me and things I want to share with the world. 696 00:44:19,923 --> 00:44:20,983 Yeah, growing those
697 00:44:22,003 --> 00:44:26,323 channels and maybe looking at online courses is my next step. 698 00:44:27,788 --> 00:44:32,638 Yeah, I think, to be honest with you, YouTube is probably my favorite way 699 00:44:32,638 --> 00:44:34,138 of learning things at the moment. 700 00:44:34,558 --> 00:44:37,798 It's basically got everything and anything that you need. 701 00:44:37,808 --> 00:44:41,818 And on any topic, really, you're going to find something, and on many 702 00:44:41,818 --> 00:44:45,058 topics, you're going to find so many different ways of explaining something. 703 00:44:45,778 --> 00:44:48,878 And it's a nice medium because it's so flexible, right? 704 00:44:48,938 --> 00:44:53,978 You can explain, you can show, you can give examples, you can demonstrate. 705 00:44:54,448 --> 00:44:55,028 It's amazing. 706 00:44:55,393 --> 00:45:00,533 If our civilization fails some thousand years from now, I hope that 707 00:45:00,543 --> 00:45:04,633 YouTube survives, because for the next one to pick it up, that's a lot of 708 00:45:04,653 --> 00:45:09,063 knowledge that's encoded in there, and in a very nice-to-consume way. 709 00:45:09,063 --> 00:45:11,743 I'm going to ask you, before I let you go, 710 00:45:12,193 --> 00:45:13,683 for some predictions. 711 00:45:13,973 --> 00:45:18,913 Given the crazy rate of acceleration in all the things, 712 00:45:18,963 --> 00:45:22,313 there seems to be an AI startup on every corner now. 713 00:45:22,923 --> 00:45:26,043 And they seem to be going almost as quickly as they're coming. 714 00:45:27,013 --> 00:45:31,623 Where do you think we're going to see the most development in the coming years? 715 00:45:32,173 --> 00:45:36,433 Where would you personally love to see development in the coming years? 716 00:45:36,778 --> 00:45:40,128 I actually recently attended a keynote at the machine learning and data science 717 00:45:40,128 --> 00:45:42,208 conference, MLADS, at Microsoft.
718 00:45:42,228 --> 00:45:47,748 And I was really inspired by these agents and autonomous thinking units 719 00:45:48,093 --> 00:45:52,373 as part of copilots and assistants, and there's so much 720 00:45:52,373 --> 00:45:55,783 room for growth in that space. 721 00:45:55,823 --> 00:45:59,933 There are different form factors. We're used to our phones and laptops, right? 722 00:45:59,933 --> 00:46:00,863 But imagine 723 00:46:01,408 --> 00:46:06,338 having a copilot that's not on your phone or on your laptop, but somebody 724 00:46:06,338 --> 00:46:10,958 who's portable, somebody who's with you, somebody who understands you 725 00:46:10,958 --> 00:46:16,538 really well and helps you do your tasks or helps you have a good time. 726 00:46:17,348 --> 00:46:20,558 Yeah, so different form factors, like a portable copilot, 727 00:46:20,708 --> 00:46:22,318 like a device that could 728 00:46:23,068 --> 00:46:28,238 be with you and learn from you and interact with you. 729 00:46:29,098 --> 00:46:36,008 So I think redesigning what we have today in terms of LLM 730 00:46:36,038 --> 00:46:41,218 agents or populations of agents, not just focused on language, but 731 00:46:41,298 --> 00:46:44,938 other types of agents, is going to be the next step forward. 732 00:46:45,911 --> 00:46:49,181 I would like to challenge you a little bit on that, because I was thinking 733 00:46:49,211 --> 00:46:53,831 like that initially when I was watching, for example, the Rabbit R1 keynote, 734 00:46:54,271 --> 00:46:57,931 and they were giving this demo of how you're just going to talk to it. 735 00:46:57,961 --> 00:47:02,701 And it's going to effectively use the UIs in various apps. 736 00:47:02,701 --> 00:47:04,281 And I was like, oh, that's a great idea. 737 00:47:04,281 --> 00:47:08,351 All these apps, they have weird UI things, and I don't want to click that. 738 00:47:08,351 --> 00:47:09,311 I don't want to learn it.
739 00:47:09,351 --> 00:47:11,871 I just wish it was automated. 740 00:47:12,611 --> 00:47:14,001 And there was also Humane AI. 741 00:47:15,936 --> 00:47:18,286 And they both seem to suck quite a lot. 742 00:47:18,346 --> 00:47:22,646 Like I watched some of the reviews, I even ordered the Rabbit R1, and it just 743 00:47:22,646 --> 00:47:27,136 doesn't seem to be working all that well. I think Humane AI was already talking 744 00:47:27,136 --> 00:47:30,916 about hoping to be acquired by someone who can take it in a better direction. 745 00:47:31,676 --> 00:47:36,476 And my thinking was actually, what is so wrong with the phones? 746 00:47:36,506 --> 00:47:40,376 There's a smartphone, it's already evolved and it's already got 747 00:47:40,876 --> 00:47:44,276 basically everything you need to run a reasonably sized model already. 748 00:47:44,931 --> 00:47:49,291 So why don't we just like the idea of having that naturally evolve 749 00:47:49,301 --> 00:47:52,111 to be more prominent in your phone? 750 00:47:52,141 --> 00:47:54,881 And why do we need a new device for that? 751 00:47:54,881 --> 00:47:56,321 What do you think about that? 752 00:47:56,516 --> 00:47:57,966 It has to make sense, right? 753 00:47:58,076 --> 00:48:03,336 If it's not working as expected, then people are not gonna buy it, right? 754 00:48:04,036 --> 00:48:07,906 But it has to add value to our lives. 755 00:48:07,906 --> 00:48:10,896 It could be the interaction with the device. 756 00:48:10,986 --> 00:48:15,646 Instead of clicking, you simply use eye-tracking software and you could click 757 00:48:15,656 --> 00:48:22,206 using your eyes, as an example. Something seamless, something that removes the 758 00:48:22,206 --> 00:48:28,036 bottlenecks. Instead of typing, of course, we have now all the interactions with 759 00:48:28,036 --> 00:48:33,386 our devices, but something that simplifies our lives. It has to have value. 760 00:48:34,353 --> 00:48:35,393 Yeah, that's for sure.
761 00:48:36,073 --> 00:48:40,108 I'm just wondering, there are a few startups now that are working 762 00:48:40,108 --> 00:48:41,828 on these humanoid robots, right? 763 00:48:41,828 --> 00:48:45,908 There's obviously the Tesla Optimus and a bunch of others. 764 00:48:45,938 --> 00:48:53,188 I think Unitree announced that you can now order their $16,000 mini, four or five 765 00:48:53,268 --> 00:48:58,338 feet tall humanoid, which is, I guess it's not mini anymore. 766 00:48:58,348 --> 00:49:06,128 That's pretty big, but I'm just wondering if I actually need that yet. 767 00:49:06,188 --> 00:49:06,888 Don't get me wrong. 768 00:49:06,918 --> 00:49:08,298 I would love to get one of these. 769 00:49:08,298 --> 00:49:11,668 And if I had 16 grand lying around that I had no use for, I 770 00:49:11,668 --> 00:49:12,958 would have ordered one already. 771 00:49:13,798 --> 00:49:19,668 But I do wonder whether that's literally around the corner or whether this is 772 00:49:19,668 --> 00:49:23,668 going to be another one of the self-driving car situations where it's 773 00:49:23,688 --> 00:49:27,608 been next year for a decade and a half at least now. Have you ordered one? 774 00:49:28,173 --> 00:49:30,893 No, we do have a robo vacuum though. 775 00:49:32,013 --> 00:49:32,163 Oh 776 00:49:32,213 --> 00:49:32,633 if there's 777 00:49:32,653 --> 00:49:32,713 a 778 00:49:32,913 --> 00:49:33,523 are awesome. 779 00:49:34,903 --> 00:49:38,183 If there's a way I could reduce the amount of chores I need to do, 780 00:49:38,203 --> 00:49:41,623 that could free up my time, but I know it's not an easy problem. 781 00:49:41,683 --> 00:49:45,933 Even things like grasping are not an easy problem for robotics. 782 00:49:45,933 --> 00:49:48,523 So, it might be a few more years. 783 00:49:48,523 --> 00:49:52,803 I'm glad that you're a fellow vacuum cleaner aficionado. 784 00:49:52,913 --> 00:49:53,933 I love mine.
785 00:49:54,273 --> 00:49:58,473 I upgraded last year to one that finally has the mop thing. 786 00:49:58,483 --> 00:50:03,283 It not only vacuums, but also mops the floor and cleans itself up and 787 00:50:03,293 --> 00:50:04,893 dries itself up and everything. 788 00:50:05,633 --> 00:50:08,443 And it's been easily one of the best investments that I've made. 789 00:50:08,823 --> 00:50:13,353 I did have to basically change my flat layout quite significantly. 790 00:50:13,483 --> 00:50:18,983 I got rid of all the carpets now, and I laid better flooring just so that I know 791 00:50:18,983 --> 00:50:22,663 that all of the floors can be mopped and cleaned by the robot, and it does 792 00:50:22,663 --> 00:50:24,523 it every day and I couldn't be happier. 793 00:50:25,063 --> 00:50:27,863 In that respect, robots, I'm looking forward. 794 00:50:28,953 --> 00:50:31,063 I can definitely bring one home. 795 00:50:31,063 --> 00:50:33,193 All right, Vadim, it's been a pleasure. 796 00:50:33,213 --> 00:50:35,703 That was probably the most challenging episode we've done, 797 00:50:36,213 --> 00:50:40,713 when you try to talk about algorithms without actually being able to show them 798 00:50:40,863 --> 00:50:48,263 and give an example or point to some code. What I'm hoping we achieved here was 799 00:50:48,263 --> 00:50:54,603 a high-level map that people can now go and look up in books like yours. Again, 800 00:50:54,613 --> 00:50:55,713 let me plug that. 801 00:50:55,773 --> 00:50:56,583 It's called 802 00:50:56,753 --> 00:50:59,233 "Machine Learning Algorithms in Depth" by Manning. 803 00:51:00,003 --> 00:51:02,383 My guest was Vadim Smolyakov. 804 00:51:02,403 --> 00:51:03,163 Vadim, thank you very much. 805 00:51:03,163 --> 00:51:03,783 I'll see you next time. 806 00:51:04,088 --> 00:51:04,468 Thank you.