1 00:00:00,250 --> 00:00:02,058 In this episode of Data Driven, 2 00:00:02,154 --> 00:00:03,886 Frank and Andy get back to the 3 00:00:03,908 --> 00:00:05,406 data engineering side of the 4 00:00:05,428 --> 00:00:07,274 equation by speaking with Saket 5 00:00:07,322 --> 00:00:09,802 Saurabh, cofounder of Nexla. 6 00:00:09,946 --> 00:00:12,222 Nexla specializes in tools for 7 00:00:12,276 --> 00:00:13,834 automating data engineering 8 00:00:13,882 --> 00:00:16,240 processes. Now onto the show. 9 00:00:20,770 --> 00:00:22,302 Hello and welcome to Data 10 00:00:22,356 --> 00:00:24,206 Driven, the podcast where we 11 00:00:24,228 --> 00:00:25,446 explore the emergent fields of 12 00:00:25,468 --> 00:00:27,110 data science, machine learning, 13 00:00:27,180 --> 00:00:28,630 and of course, the ever-present 14 00:00:28,700 --> 00:00:32,930 data engineering. This is season 15 00:00:33,010 --> 00:00:35,094 seven that we're now in and we 16 00:00:35,132 --> 00:00:38,178 are welcoming... Andy is shaking 17 00:00:38,194 --> 00:00:39,414 his head. If you're not watching 18 00:00:39,452 --> 00:00:41,142 the video, it is hard to believe 19 00:00:41,196 --> 00:00:42,362 that we hit seven 20 00:00:42,416 --> 00:00:44,506 seasons. But by the time this is 21 00:00:44,528 --> 00:00:45,754 launched you probably have heard 22 00:00:45,792 --> 00:00:47,738 our one or two shows where we 23 00:00:47,744 --> 00:00:49,754 did kind of delve in deep. So 24 00:00:49,792 --> 00:00:50,906 you're probably tired of hearing 25 00:00:50,938 --> 00:00:57,054 us bang on about that. 
I really 26 00:00:57,092 --> 00:00:58,334 like kind of kicking off this 27 00:00:58,372 --> 00:01:01,402 first guest interview for season 28 00:01:01,466 --> 00:01:07,442 seven with Saket Saurabh, who 29 00:01:07,576 --> 00:01:09,726 runs, who is cofounder and CEO 30 00:01:09,758 --> 00:01:11,246 of a company called Nexla, whose 31 00:01:11,278 --> 00:01:13,650 tagline is automation for data 32 00:01:13,720 --> 00:01:15,006 engineering. And if there's 33 00:01:15,038 --> 00:01:15,938 anything we've heard about in 34 00:01:15,944 --> 00:01:17,542 the last, say, six to ten 35 00:01:17,596 --> 00:01:19,030 months, it's all about 36 00:01:19,180 --> 00:01:20,562 automation this, automation 37 00:01:20,626 --> 00:01:22,582 that. Whether it's ChatGPT or 38 00:01:22,636 --> 00:01:24,694 any other kind of low code, no 39 00:01:24,732 --> 00:01:26,566 code, automation is all the 40 00:01:26,588 --> 00:01:31,946 rage. And he also, very much like 41 00:01:31,968 --> 00:01:33,930 a previous guest, has a cool 42 00:01:34,000 --> 00:01:35,946 vendor tag from Gartner. So 43 00:01:35,968 --> 00:01:37,466 we're going to talk to that. 44 00:01:37,568 --> 00:01:38,394 Welcome to the show, 45 00:01:38,432 --> 00:01:41,146 Saket. Thank you, Frank, and 46 00:01:41,168 --> 00:01:42,800 thank you, Andy. Good to be here. 47 00:01:44,130 --> 00:01:47,918 So very fascinated about kind of 48 00:01:47,924 --> 00:01:49,294 your story. In the virtual green 49 00:01:49,332 --> 00:01:51,086 room we were talking about. You 50 00:01:51,108 --> 00:01:52,954 used to write Linux drivers for 51 00:01:53,092 --> 00:01:56,674 video card manufacturers and we 52 00:01:56,712 --> 00:01:58,846 spent a few minutes on waxing 53 00:01:58,878 --> 00:02:01,554 poetic about how easy Linux has 54 00:02:01,592 --> 00:02:06,354 become. So what exactly does 55 00:02:06,392 --> 00:02:07,826 automation for data engineering 56 00:02:07,858 --> 00:02:09,126 mean to you? 
I think let's start 57 00:02:09,148 --> 00:02:09,720 there. 58 00:02:10,330 --> 00:02:12,920 Yeah, I think when we look at 59 00:02:13,930 --> 00:02:15,526 enterprises and companies out 60 00:02:15,548 --> 00:02:17,206 there with a lot more data, with 61 00:02:17,228 --> 00:02:18,406 a lot more people who need to 62 00:02:18,428 --> 00:02:20,026 use data, there are two ways you 63 00:02:20,048 --> 00:02:22,442 can achieve scale. One is 64 00:02:22,496 --> 00:02:23,786 through automation and the other 65 00:02:23,808 --> 00:02:25,594 is through collaboration and 66 00:02:25,632 --> 00:02:27,366 automation. Or achieving scale 67 00:02:27,398 --> 00:02:28,474 through automation means that 68 00:02:28,512 --> 00:02:30,794 the tasks that we do today, can 69 00:02:30,832 --> 00:02:32,954 they be automated, can they 70 00:02:32,992 --> 00:02:34,266 become more intelligent? So for 71 00:02:34,288 --> 00:02:35,726 example, if I had to create a 72 00:02:35,748 --> 00:02:37,502 data pipeline and I have to 73 00:02:37,556 --> 00:02:38,846 connect to a data system and 74 00:02:38,868 --> 00:02:40,398 read that data, process it, 75 00:02:40,484 --> 00:02:41,838 maybe transform that, push the 76 00:02:41,844 --> 00:02:43,486 data somewhere, let's say it 77 00:02:43,508 --> 00:02:45,954 takes me four weeks or six weeks 78 00:02:45,992 --> 00:02:47,918 to write that code, test it QA, 79 00:02:48,014 --> 00:02:49,134 take it to production. 80 00:02:49,262 --> 00:02:50,994 Automation would basically mean 81 00:02:51,032 --> 00:02:53,602 that can a lot of these things 82 00:02:53,656 --> 00:02:55,074 be done automatically and 83 00:02:55,112 --> 00:02:56,594 faster? So can I, for example, 84 00:02:56,712 --> 00:02:57,906 not have to write a connector? 85 00:02:57,938 --> 00:02:59,590 It can get auto generated there, 86 00:02:59,660 --> 00:03:01,606 right? 
Can I not have to write 87 00:03:01,788 --> 00:03:04,934 tests or error conditions and 88 00:03:04,972 --> 00:03:06,726 check for them because the 89 00:03:06,748 --> 00:03:08,262 system can look at the data, 90 00:03:08,316 --> 00:03:09,558 understand its properties and 91 00:03:09,564 --> 00:03:10,534 say, oh, this would be a good 92 00:03:10,572 --> 00:03:11,798 validation for this type of 93 00:03:11,804 --> 00:03:13,674 data, for example? Or if I had 94 00:03:13,712 --> 00:03:16,266 to process or run the same 95 00:03:16,288 --> 00:03:17,354 pipeline, but now the data 96 00:03:17,392 --> 00:03:18,998 volume has grown ten x, I don't 97 00:03:19,014 --> 00:03:20,278 have to go and do a whole bunch 98 00:03:20,294 --> 00:03:22,106 of engineering to manage that 99 00:03:22,128 --> 00:03:23,374 scale. The system can 100 00:03:23,412 --> 00:03:24,606 understand, oh, the scale is 101 00:03:24,628 --> 00:03:26,766 increasing. My bottleneck is in 102 00:03:26,788 --> 00:03:28,126 this part of the processing. Let 103 00:03:28,148 --> 00:03:29,566 me allocate more containers to 104 00:03:29,588 --> 00:03:31,294 that and just let it run 105 00:03:31,332 --> 00:03:33,634 smoothly. So automation is a lot 106 00:03:33,672 --> 00:03:35,186 about doing the same tasks that 107 00:03:35,208 --> 00:03:38,830 we do, but doing that faster. 108 00:03:38,910 --> 00:03:40,674 Why? Because something can 109 00:03:40,712 --> 00:03:42,562 figure out certain tasks, do it 110 00:03:42,616 --> 00:03:45,038 for us, create more reliability, 111 00:03:45,134 --> 00:03:46,382 create more repeatability, 112 00:03:46,446 --> 00:03:47,422 create better performance 113 00:03:47,486 --> 00:03:49,118 without us having to do that 114 00:03:49,144 --> 00:03:50,534 manual work. 
So when we go back 115 00:03:50,572 --> 00:03:51,974 into automation for data 116 00:03:52,012 --> 00:03:53,510 engineering and you understand 117 00:03:53,580 --> 00:03:54,902 that there is so much data 118 00:03:54,956 --> 00:03:56,646 engineering work to do, I think 119 00:03:56,668 --> 00:03:58,726 it's almost impossible for the 120 00:03:58,748 --> 00:03:59,994 data engineers out there to just 121 00:04:00,032 --> 00:04:01,466 support all that demand that 122 00:04:01,488 --> 00:04:03,660 they have. Automation for them 123 00:04:04,110 --> 00:04:05,366 is like something that helps 124 00:04:05,398 --> 00:04:06,598 them and supports them. And it's 125 00:04:06,614 --> 00:04:09,946 like a lot of easy use cases can 126 00:04:09,968 --> 00:04:11,226 be done automatically and 127 00:04:11,248 --> 00:04:12,378 quickly. And a lot of difficult 128 00:04:12,464 --> 00:04:14,510 use cases can have big chunks 129 00:04:15,010 --> 00:04:16,554 taken care of in various 130 00:04:16,602 --> 00:04:18,094 aspects. So that's kind of where 131 00:04:18,212 --> 00:04:19,706 that direction is and automation 132 00:04:19,738 --> 00:04:21,006 is one of the key parts to that 133 00:04:21,028 --> 00:04:21,790 scale. 134 00:04:22,530 --> 00:04:23,280 Interesting. 135 00:04:23,750 --> 00:04:26,594 Yeah. I really like your 136 00:04:26,632 --> 00:04:28,078 description of this. I'm 137 00:04:28,094 --> 00:04:29,186 wondering if it's okay if you 138 00:04:29,208 --> 00:04:31,358 share a little more detail. I've 139 00:04:31,454 --> 00:04:33,090 worked some with automating data 140 00:04:33,160 --> 00:04:36,854 engineering in the past and I 141 00:04:36,892 --> 00:04:40,802 find that it's very applicable 142 00:04:40,866 --> 00:04:42,342 when you're doing pretty much 143 00:04:42,396 --> 00:04:44,134 straight one to one type stuff 144 00:04:44,252 --> 00:04:47,254 and that's not throwing off on 145 00:04:47,292 --> 00:04:49,098 your product by any stretch. 
I 146 00:04:49,104 --> 00:04:50,106 don't know if you agree or 147 00:04:50,128 --> 00:04:51,562 disagree, but I think about 148 00:04:51,616 --> 00:04:54,842 staging data so I can pull data 149 00:04:54,896 --> 00:04:57,654 from extracts, text files, flat 150 00:04:57,702 --> 00:05:00,218 files and load that into some 151 00:05:00,304 --> 00:05:03,014 data store, usually a database. 152 00:05:03,142 --> 00:05:05,754 And once I get it there, I find 153 00:05:05,792 --> 00:05:06,878 a couple of things are true. And 154 00:05:06,884 --> 00:05:07,806 I may not pull it from an 155 00:05:07,828 --> 00:05:09,038 extract, I may pull it from the 156 00:05:09,044 --> 00:05:10,686 system of record. I want to get 157 00:05:10,708 --> 00:05:12,190 in, get the data and get out 158 00:05:12,260 --> 00:05:13,794 with doing as little harm as 159 00:05:13,832 --> 00:05:15,074 possible, stealing as little 160 00:05:15,112 --> 00:05:17,246 cycles. But once I get my copy 161 00:05:17,278 --> 00:05:19,534 of it, then I can start applying 162 00:05:19,582 --> 00:05:23,470 rules, looking for opportunities 163 00:05:23,550 --> 00:05:25,426 to apply strong data types and 164 00:05:25,448 --> 00:05:28,322 the like. And automation really 165 00:05:28,376 --> 00:05:30,930 works well there. Your product, 166 00:05:31,080 --> 00:05:33,094 I'm assuming, does that and 167 00:05:33,132 --> 00:05:34,438 does that part well? 168 00:05:34,604 --> 00:05:36,006 Yeah, absolutely. But there are 169 00:05:36,028 --> 00:05:37,846 also parts where you're getting 170 00:05:37,868 --> 00:05:39,398 that extract and there is a 171 00:05:39,404 --> 00:05:41,206 slight change in schema now and 172 00:05:41,308 --> 00:05:42,634 it's more or less the same. But 173 00:05:42,672 --> 00:05:44,666 can automation catch that for 174 00:05:44,688 --> 00:05:45,866 you and take care of a few of 175 00:05:45,888 --> 00:05:47,146 those things? 
Or do you have to 176 00:05:47,168 --> 00:05:49,386 go back and write that piece of 177 00:05:49,408 --> 00:05:50,806 code there? So there are many 178 00:05:50,848 --> 00:05:55,594 places where you can benefit 179 00:05:55,642 --> 00:05:56,814 from that. So there is what we 180 00:05:56,852 --> 00:05:58,206 call when we talk about 181 00:05:58,228 --> 00:06:00,154 automation, what is the driver 182 00:06:00,202 --> 00:06:01,486 of automation, what is the 183 00:06:01,508 --> 00:06:03,214 source of that? And we put that 184 00:06:03,252 --> 00:06:05,434 on an aspect of applying 185 00:06:05,482 --> 00:06:07,454 intelligence to the metadata. So 186 00:06:07,492 --> 00:06:08,946 when we look at the data and we 187 00:06:08,968 --> 00:06:10,354 understand that the metadata is 188 00:06:10,392 --> 00:06:11,966 actually things like the Schema 189 00:06:11,998 --> 00:06:13,746 or I have a price attribute in 190 00:06:13,768 --> 00:06:14,914 my extract, but this is the 191 00:06:14,952 --> 00:06:16,258 behavior of that attribute, this 192 00:06:16,264 --> 00:06:17,846 is how it looks like, these are 193 00:06:17,868 --> 00:06:19,094 the characteristics. And based 194 00:06:19,132 --> 00:06:21,126 on that metadata, can I apply a 195 00:06:21,148 --> 00:06:22,966 validation rule to it, for 196 00:06:22,988 --> 00:06:24,134 example, automatically without 197 00:06:24,172 --> 00:06:25,734 having to define that. And 198 00:06:25,772 --> 00:06:27,958 something does it for me that is 199 00:06:28,044 --> 00:06:29,590 bringing automation. So actually 200 00:06:29,660 --> 00:06:31,500 the roots of that come from 201 00:06:32,030 --> 00:06:33,466 there is so much information. 202 00:06:33,568 --> 00:06:35,638 For example, your data extract 203 00:06:35,734 --> 00:06:37,914 happens every day at 04:00 p.m. 204 00:06:37,952 --> 00:06:39,482 And you expect the finished data 205 00:06:39,536 --> 00:06:41,354 to be ready by five and on 206 00:06:41,392 --> 00:06:42,678 someday. 
It doesn't happen. It's 207 00:06:42,694 --> 00:06:44,366 06:00 and it's not there. So is 208 00:06:44,388 --> 00:06:45,566 there automation going and 209 00:06:45,588 --> 00:06:48,334 saying alert it didn't come 210 00:06:48,372 --> 00:06:49,566 through. Oh, by the way, the 211 00:06:49,588 --> 00:06:50,574 reason it didn't come through 212 00:06:50,612 --> 00:06:53,006 was that your stuff was all 213 00:06:53,028 --> 00:06:55,506 great except these 20 records in 214 00:06:55,528 --> 00:06:56,994 between completely threw it off 215 00:06:57,032 --> 00:06:59,102 because it was wrongly formatted 216 00:06:59,166 --> 00:07:02,674 or whatever. So stuff like that 217 00:07:02,712 --> 00:07:04,366 is where I think that automation 218 00:07:04,478 --> 00:07:07,186 really becomes an assist in 219 00:07:07,208 --> 00:07:08,470 this. So you know the business 220 00:07:08,540 --> 00:07:09,506 problem, you know what you're 221 00:07:09,538 --> 00:07:11,286 trying to do, this is how you 222 00:07:11,308 --> 00:07:12,002 get there faster. 223 00:07:12,066 --> 00:07:14,386 So I've heard that unexpected 224 00:07:14,418 --> 00:07:16,694 changes in formats, which I see 225 00:07:16,732 --> 00:07:18,758 it all the time because it's 226 00:07:18,774 --> 00:07:22,874 what I do. But I hear that 227 00:07:22,912 --> 00:07:25,114 addressed under the topic of 228 00:07:25,152 --> 00:07:27,562 Schema drift. And it can happen 229 00:07:27,616 --> 00:07:28,986 a couple of different ways. You 230 00:07:29,008 --> 00:07:31,822 can miss a tab or comma whatever 231 00:07:31,876 --> 00:07:33,518 delimiter you're using. You can 232 00:07:33,524 --> 00:07:34,846 either miss one or an extra one 233 00:07:34,868 --> 00:07:37,066 be inserted in the extract 234 00:07:37,098 --> 00:07:40,842 process. 
You can have missing 235 00:07:40,906 --> 00:07:42,922 files, completely missing files 236 00:07:42,986 --> 00:07:44,866 for a variety of reasons, some 237 00:07:44,888 --> 00:07:46,846 of which are legit, like you're 238 00:07:46,878 --> 00:07:49,154 doing an incremental extract and 239 00:07:49,192 --> 00:07:51,906 nothing changed and stuff like 240 00:07:51,928 --> 00:07:53,810 that. And I guess your product 241 00:07:53,880 --> 00:07:55,554 addresses that has rules for 242 00:07:55,592 --> 00:07:57,154 saying if this is missing, just 243 00:07:57,192 --> 00:07:59,006 keep going. It's a slowly 244 00:07:59,038 --> 00:08:00,238 changing dimension and if we 245 00:08:00,264 --> 00:08:02,374 miss ten in a row, it's no big 246 00:08:02,412 --> 00:08:03,094 deal. 247 00:08:03,292 --> 00:08:04,886 So what we do at a very high 248 00:08:04,908 --> 00:08:05,798 level, right, when we think 249 00:08:05,804 --> 00:08:06,966 about data engineering, one of 250 00:08:06,988 --> 00:08:08,354 the key problems that it solves 251 00:08:08,402 --> 00:08:10,546 is integrating data, getting 252 00:08:10,588 --> 00:08:12,426 data from point A to point B and 253 00:08:12,528 --> 00:08:13,866 making sure it's valid, it is 254 00:08:13,888 --> 00:08:15,306 trusted, it can be used by the 255 00:08:15,328 --> 00:08:17,034 downstream application. There is 256 00:08:17,072 --> 00:08:19,180 often an implicit contract that 257 00:08:20,110 --> 00:08:21,386 dashboard is relying on this 258 00:08:21,408 --> 00:08:23,194 sort of information. So what we 259 00:08:23,232 --> 00:08:24,782 do basically at a high level is 260 00:08:24,836 --> 00:08:27,534 one we are like hey, we can 261 00:08:27,572 --> 00:08:29,374 figure out how to connect to new 262 00:08:29,412 --> 00:08:30,926 systems. This is a part where we 263 00:08:30,948 --> 00:08:31,886 bring automation to the 264 00:08:31,908 --> 00:08:33,582 connector creation. 
So instead 265 00:08:33,636 --> 00:08:34,954 of writing code for connectors, 266 00:08:35,002 --> 00:08:36,318 we are able to generate most of 267 00:08:36,324 --> 00:08:37,646 the connectors out there. So 268 00:08:37,668 --> 00:08:39,010 that's one part. But when we 269 00:08:39,080 --> 00:08:40,322 scan the data, 270 00:08:40,376 --> 00:08:41,886 we do understand what the schema 271 00:08:41,918 --> 00:08:42,834 is and all of that, and we 272 00:08:42,872 --> 00:08:44,654 present that and automatically 273 00:08:44,702 --> 00:08:45,878 sort of package that into what 274 00:08:45,884 --> 00:08:47,382 we call a logical data 275 00:08:47,436 --> 00:08:50,198 product that becomes much more 276 00:08:50,364 --> 00:08:52,006 easily understandable by an 277 00:08:52,028 --> 00:08:54,200 average data user, a person who 278 00:08:54,890 --> 00:08:56,294 understands it. So in that 279 00:08:56,332 --> 00:08:58,454 process, in between that, yes, 280 00:08:58,492 --> 00:08:59,862 the schema drift is an important 281 00:08:59,916 --> 00:09:01,274 part, but it's not as 282 00:09:01,312 --> 00:09:02,474 straightforward because what 283 00:09:02,512 --> 00:09:04,362 happens is you're getting data 284 00:09:04,416 --> 00:09:06,026 with first name and last name 285 00:09:06,048 --> 00:09:07,238 and email address and suddenly 286 00:09:07,254 --> 00:09:09,174 you get data, maybe two records, 287 00:09:09,222 --> 00:09:11,146 which don't match that. Is that 288 00:09:11,168 --> 00:09:12,666 an error? Is that a change? Is 289 00:09:12,688 --> 00:09:14,506 that an evolution? You got first 290 00:09:14,528 --> 00:09:16,142 name, last name, email address 291 00:09:16,196 --> 00:09:17,086 and now you're also getting 292 00:09:17,108 --> 00:09:17,998 phone number. Well, it's a 293 00:09:18,004 --> 00:09:19,246 sparse schema potentially and 294 00:09:19,268 --> 00:09:20,574 it's a drifting schema. 
Well 295 00:09:20,612 --> 00:09:21,674 does it break the downstream 296 00:09:21,722 --> 00:09:24,066 contract because something got 297 00:09:24,088 --> 00:09:25,406 renamed or does it just simply 298 00:09:25,438 --> 00:09:26,786 add to that? So there are a few 299 00:09:26,808 --> 00:09:28,242 of those aspects. We do cover 300 00:09:28,296 --> 00:09:31,122 all of those by sort of saying 301 00:09:31,176 --> 00:09:35,474 that when I connect to a data 302 00:09:35,512 --> 00:09:36,482 system, I'm going to present 303 00:09:36,536 --> 00:09:38,226 that data in a certain sort of a 304 00:09:38,248 --> 00:09:39,926 data product view is we call it 305 00:09:39,948 --> 00:09:41,718 a logical data product. So 306 00:09:41,884 --> 00:09:43,222 here's a logical data product, 307 00:09:43,276 --> 00:09:44,758 this is all there is, these are 308 00:09:44,764 --> 00:09:45,958 the characteristics and stuff 309 00:09:46,044 --> 00:09:47,446 and you decide what you want to 310 00:09:47,468 --> 00:09:48,598 do with it and how you want to 311 00:09:48,604 --> 00:09:49,938 use it. But once you have a 312 00:09:49,964 --> 00:09:52,074 consumer for a data product, 313 00:09:52,192 --> 00:09:53,526 then it sort of implicitly 314 00:09:53,558 --> 00:09:55,114 creates a contract and we keep 315 00:09:55,152 --> 00:09:58,106 track of that. And there's some 316 00:09:58,128 --> 00:09:59,466 interesting concepts that we do 317 00:09:59,488 --> 00:10:00,906 there, which is you can take a 318 00:10:00,928 --> 00:10:02,106 data product and create some 319 00:10:02,128 --> 00:10:03,626 derivatives out of that. 
So you 320 00:10:03,648 --> 00:10:05,486 can say I have an orders or 321 00:10:05,508 --> 00:10:07,134 transactions and it has credit 322 00:10:07,172 --> 00:10:08,926 card number, I'll mask it and I 323 00:10:08,948 --> 00:10:10,286 have an order ID and I'll look 324 00:10:10,308 --> 00:10:11,982 up the items from a different 325 00:10:12,116 --> 00:10:13,762 entity and now I have a new data 326 00:10:13,816 --> 00:10:15,106 product which is enriched and 327 00:10:15,128 --> 00:10:17,058 which is maybe more PII-safe. So 328 00:10:17,144 --> 00:10:18,946 some interesting... You know what 329 00:10:18,968 --> 00:10:19,058 I 330 00:10:19,064 --> 00:10:21,394 like about that is we think 331 00:10:21,432 --> 00:10:23,550 about tools that visualize 332 00:10:23,630 --> 00:10:26,134 lineage, Atlas and Purview and 333 00:10:26,172 --> 00:10:27,846 tools like that, but those are 334 00:10:27,868 --> 00:10:30,294 very reactive, even though both 335 00:10:30,332 --> 00:10:31,986 Atlas and Microsoft's 336 00:10:32,018 --> 00:10:34,070 implementation of that, Purview, 337 00:10:35,130 --> 00:10:36,722 introduce automatic scans, 338 00:10:36,786 --> 00:10:39,202 automated scans and they manage 339 00:10:39,276 --> 00:10:40,522 schema drift to a certain 340 00:10:40,576 --> 00:10:44,086 extent. You know this if you've 341 00:10:44,118 --> 00:10:45,962 ever tried to code a tool to 342 00:10:46,016 --> 00:10:47,498 react to schema drift, then you 343 00:10:47,504 --> 00:10:50,954 know how complex that is. And I 344 00:10:50,992 --> 00:10:52,746 can't wait to see large language 345 00:10:52,778 --> 00:10:54,174 models integrated into that 346 00:10:54,212 --> 00:10:56,938 process because I suspect it'll 347 00:10:56,954 --> 00:10:58,506 do a much better job of managing 348 00:10:58,538 --> 00:11:01,006 that than trying to guess what 349 00:11:01,028 --> 00:11:02,318 this data type should be and 350 00:11:02,324 --> 00:11:03,586 such. 
But what I hear you 351 00:11:03,608 --> 00:11:05,198 describing in your contract 352 00:11:05,374 --> 00:11:08,146 sounds like a proactive piece to 353 00:11:08,168 --> 00:11:10,674 that where you meet with your 354 00:11:10,712 --> 00:11:13,566 client and you say, yeah, here's 355 00:11:13,758 --> 00:11:15,042 whatever you want to call it, 356 00:11:15,096 --> 00:11:16,566 our data dictionary, what have 357 00:11:16,588 --> 00:11:18,806 you, of all of our source data 358 00:11:18,908 --> 00:11:20,514 and our fields and columns, 359 00:11:20,642 --> 00:11:22,854 fields, columns, metadata for 360 00:11:22,892 --> 00:11:24,626 all of that. And then you're 361 00:11:24,658 --> 00:11:27,986 saying these particular fields 362 00:11:28,018 --> 00:11:29,198 are important because they're 363 00:11:29,234 --> 00:11:30,982 used downstream in these dozen 364 00:11:31,046 --> 00:11:32,746 reports at the very end of the 365 00:11:32,768 --> 00:11:34,842 process. But what you're saying 366 00:11:34,896 --> 00:11:37,386 is if another related field was 367 00:11:37,408 --> 00:11:39,898 to show up in that list, then 368 00:11:39,984 --> 00:11:41,710 you're going to be able to make 369 00:11:41,860 --> 00:11:44,302 an educated guess at whether 370 00:11:44,356 --> 00:11:46,302 that's schema drift or whether 371 00:11:46,356 --> 00:11:48,526 it's additional attributes. Is 372 00:11:48,548 --> 00:11:49,886 that what I'm understanding? We 373 00:11:49,908 --> 00:11:50,094 make 374 00:11:50,132 --> 00:11:51,934 an educated guess about: is this 375 00:11:51,972 --> 00:11:53,086 a one-off thing and we should 376 00:11:53,108 --> 00:11:55,694 treat it as an error? Hey, this 377 00:11:55,732 --> 00:11:56,746 record is an error, it didn't 378 00:11:56,778 --> 00:11:57,878 actually meet certain criteria, 379 00:11:57,914 --> 00:11:59,794 or this is a change that is 380 00:11:59,912 --> 00:12:01,314 showing up. 
And the way you do 381 00:12:01,352 --> 00:12:02,754 that is you observe that data 382 00:12:02,792 --> 00:12:04,210 over a certain period of time to 383 00:12:04,280 --> 00:12:06,146 make the determination and we do 384 00:12:06,168 --> 00:12:08,434 a certain level of drift 385 00:12:08,482 --> 00:12:10,374 analysis. And if the drift is 386 00:12:10,412 --> 00:12:12,006 very significant, then what we 387 00:12:12,028 --> 00:12:14,774 do is we today actually do not 388 00:12:14,812 --> 00:12:16,198 go and make assumptions on 389 00:12:16,284 --> 00:12:18,614 behalf of the user. We actually 390 00:12:18,652 --> 00:12:19,846 create a notification and saying 391 00:12:19,868 --> 00:12:21,686 we saw a significant change and 392 00:12:21,708 --> 00:12:23,114 we think this might be a new 393 00:12:23,152 --> 00:12:24,746 data product that we have 394 00:12:24,768 --> 00:12:26,346 detected. So we are connected to 395 00:12:26,368 --> 00:12:27,338 a store. So looking at the 396 00:12:27,344 --> 00:12:28,406 schemas and stuff and we're 397 00:12:28,438 --> 00:12:29,562 saying here's a data product, 398 00:12:29,616 --> 00:12:31,066 that data product is 399 00:12:31,248 --> 00:12:32,746 transactions and this is what it 400 00:12:32,768 --> 00:12:34,394 looks like. Oh, some came here 401 00:12:34,432 --> 00:12:35,758 and whatnot. But then at some 402 00:12:35,764 --> 00:12:36,814 point we may say, hey, this 403 00:12:36,852 --> 00:12:37,822 looks significantly different, 404 00:12:37,876 --> 00:12:39,886 would you like to consider this 405 00:12:39,908 --> 00:12:41,406 as a new entity? And then it 406 00:12:41,428 --> 00:12:42,686 sort of notifies the user and 407 00:12:42,708 --> 00:12:44,554 lets them do that. 
I think when 408 00:12:44,612 --> 00:12:46,274 building automation, at least my 409 00:12:46,312 --> 00:12:47,314 understanding is that it's very 410 00:12:47,352 --> 00:12:50,626 important to understand when to 411 00:12:50,648 --> 00:12:52,194 make a reasonable assumption and 412 00:12:52,232 --> 00:12:53,486 when to actually let the user 413 00:12:53,518 --> 00:12:55,074 decide. But even creating that 414 00:12:55,112 --> 00:12:56,562 workflow is a big value add 415 00:12:56,616 --> 00:12:57,646 because this is stuff that 416 00:12:57,688 --> 00:12:58,706 actually would have gotten 417 00:12:58,738 --> 00:13:01,414 missed maybe for a few days, but 418 00:13:01,452 --> 00:13:02,614 you are getting notified about 419 00:13:02,652 --> 00:13:04,662 that up front. So we try to be 420 00:13:04,716 --> 00:13:05,494 maybe a little bit more 421 00:13:05,532 --> 00:13:08,380 conservative, if anything, about 422 00:13:08,750 --> 00:13:10,218 making assumptions on behalf of 423 00:13:10,224 --> 00:13:11,882 the customer or the user because 424 00:13:12,016 --> 00:13:15,386 you go wrong and it's not fun at 425 00:13:15,408 --> 00:13:15,594 all. 426 00:13:15,632 --> 00:13:17,580 Yeah, I really like it. 427 00:13:20,990 --> 00:13:22,218 Now that's interesting. And I 428 00:13:22,224 --> 00:13:26,158 see your company has been around 429 00:13:26,244 --> 00:13:27,502 about seven years. 430 00:13:27,636 --> 00:13:28,320 Yeah. 431 00:13:29,490 --> 00:13:31,054 What has changed in those seven 432 00:13:31,092 --> 00:13:32,634 years? Because I think seven 433 00:13:32,692 --> 00:13:36,562 years ago AI was not on the top 434 00:13:36,616 --> 00:13:38,178 of everyone's lips, obviously. I 435 00:13:38,184 --> 00:13:40,990 think certainly since ChatGPT 436 00:13:41,070 --> 00:13:42,914 came out, right, it's a big part 437 00:13:42,952 --> 00:13:46,034 of the conversation. 
Have you 438 00:13:46,072 --> 00:13:47,382 also seen kind of the data 439 00:13:47,436 --> 00:13:48,806 engineering world be kind of 440 00:13:48,828 --> 00:13:52,360 touched by AI and if so, how? 441 00:13:53,290 --> 00:13:54,886 Not as much, but I think it is 442 00:13:54,908 --> 00:13:55,846 getting there. So when we 443 00:13:55,868 --> 00:13:58,246 started very early, as I said, 444 00:13:58,348 --> 00:13:59,654 seven years ago, we had actually 445 00:13:59,692 --> 00:14:03,254 come in with the question when 446 00:14:03,292 --> 00:14:04,426 starting the company was do we 447 00:14:04,448 --> 00:14:05,926 want to build an AI or a machine 448 00:14:05,958 --> 00:14:06,762 learning company? Because 449 00:14:06,816 --> 00:14:09,258 actually in 2016 it was hot, it 450 00:14:09,264 --> 00:14:10,378 was hype. There was a lot of 451 00:14:10,384 --> 00:14:11,226 like, oh, this is going to 452 00:14:11,248 --> 00:14:12,906 change the world. Always the 453 00:14:12,928 --> 00:14:14,394 hype is ahead of the reality. 454 00:14:14,442 --> 00:14:16,094 And it took a while before 455 00:14:16,292 --> 00:14:18,686 things like LLM came around and 456 00:14:18,708 --> 00:14:19,758 generative AI is starting to 457 00:14:19,764 --> 00:14:21,034 succeed. But even otherwise 458 00:14:21,082 --> 00:14:22,058 there are a lot of other AI 459 00:14:22,074 --> 00:14:22,926 initiatives that are still 460 00:14:22,948 --> 00:14:24,538 figuring their way out. But we 461 00:14:24,564 --> 00:14:25,858 were very clear that we want to 462 00:14:25,864 --> 00:14:27,586 build the technology for the 463 00:14:27,608 --> 00:14:29,202 users of data. 
We want to focus 464 00:14:29,256 --> 00:14:34,306 on users of data and allow the 465 00:14:34,328 --> 00:14:35,522 user of data to get the data 466 00:14:35,576 --> 00:14:36,994 wherever they need and do 467 00:14:37,032 --> 00:14:38,438 whatever they want to do with it 468 00:14:38,444 --> 00:14:39,686 in whatever tool they want to do 469 00:14:39,708 --> 00:14:41,254 with. So we came with that 470 00:14:41,292 --> 00:14:43,926 approach because we said that 471 00:14:43,948 --> 00:14:45,638 the user of data is not going to 472 00:14:45,644 --> 00:14:49,766 be necessarily very expert in 473 00:14:49,788 --> 00:14:51,266 the data system. So there's 474 00:14:51,298 --> 00:14:52,726 expertise in data, which is I 475 00:14:52,748 --> 00:14:53,818 understand what the data is, and 476 00:14:53,824 --> 00:14:54,634 there's expertise in data 477 00:14:54,672 --> 00:14:55,626 systems, which is with the more 478 00:14:55,648 --> 00:14:57,114 engineering side, we said the 479 00:14:57,152 --> 00:14:58,518 data user will not be an expert 480 00:14:58,534 --> 00:14:59,970 in data. So if you have a lot 481 00:14:59,980 --> 00:15:01,526 more variety of data that's 482 00:15:01,558 --> 00:15:02,954 coming, how do they use it? 483 00:15:02,992 --> 00:15:04,134 We'll create an abstraction. 484 00:15:04,182 --> 00:15:05,386 We'll create an abstraction that 485 00:15:05,408 --> 00:15:06,622 will give them a clean, 486 00:15:06,676 --> 00:15:07,726 consistent view of data no 487 00:15:07,748 --> 00:15:08,846 matter where it comes from. So 488 00:15:08,868 --> 00:15:09,838 now you're like, I don't care if 489 00:15:09,844 --> 00:15:11,486 the data was a stream or API or 490 00:15:11,508 --> 00:15:13,486 JSON or document or whatever, I 491 00:15:13,508 --> 00:15:14,606 have something consistent to 492 00:15:14,628 --> 00:15:16,606 work with. 
So we went in and 493 00:15:16,628 --> 00:15:18,014 looked at metadata and started 494 00:15:18,052 --> 00:15:19,534 to apply metadata intelligence 495 00:15:19,582 --> 00:15:21,346 to create that. I would say that 496 00:15:21,368 --> 00:15:22,578 when we were doing it, a lot of 497 00:15:22,584 --> 00:15:23,826 people early on were like, why 498 00:15:23,848 --> 00:15:25,186 are you doing that? Why not just 499 00:15:25,208 --> 00:15:26,494 create like this straightforward 500 00:15:26,542 --> 00:15:28,066 thing that everybody does? What 501 00:15:28,088 --> 00:15:29,398 has changed for us is in the 502 00:15:29,404 --> 00:15:31,254 last two years or so, our 503 00:15:31,292 --> 00:15:32,694 approach of creating this 504 00:15:32,732 --> 00:15:34,534 logical entity around data and 505 00:15:34,572 --> 00:15:36,726 using that started to catch on 506 00:15:36,748 --> 00:15:37,782 with the concept of data 507 00:15:37,836 --> 00:15:39,222 products. So where we were 508 00:15:39,276 --> 00:15:40,854 initially struggling to say what 509 00:15:40,892 --> 00:15:42,154 does this mean? And why is it 510 00:15:42,192 --> 00:15:44,282 valuable? Has suddenly become 511 00:15:44,336 --> 00:15:45,594 like, oh, it makes so much sense 512 00:15:45,632 --> 00:15:46,746 that you guys have done it this 513 00:15:46,768 --> 00:15:48,650 way. So that has certainly 514 00:15:48,720 --> 00:15:50,118 changed. I think that 515 00:15:50,224 --> 00:15:52,606 application of intelligence to 516 00:15:52,628 --> 00:15:55,150 the metadata itself to make data 517 00:15:55,220 --> 00:15:57,546 tasks themselves more automated 518 00:15:57,738 --> 00:16:01,166 is a very valid use case. The 519 00:16:01,188 --> 00:16:02,782 generative models are doing 520 00:16:02,836 --> 00:16:05,026 quite well in things like can 521 00:16:05,048 --> 00:16:06,434 you generate a description for 522 00:16:06,472 --> 00:16:07,746 this data if this data looks 523 00:16:07,768 --> 00:16:11,570 like this? 
There's also been 524 00:16:11,640 --> 00:16:12,978 some really interesting stuff in 525 00:16:12,984 --> 00:16:14,626 terms of generating code as far 526 00:16:14,648 --> 00:16:15,586 as data engineering is 527 00:16:15,608 --> 00:16:16,654 concerned. Hey, can you generate 528 00:16:16,702 --> 00:16:19,126 code that reads it from here 529 00:16:19,148 --> 00:16:20,646 and pushes it out there? So I 530 00:16:20,668 --> 00:16:21,846 think there is that happening as 531 00:16:21,868 --> 00:16:23,366 well. So there's a lot of, I 532 00:16:23,388 --> 00:16:26,454 think, places. Transforming data, 533 00:16:26,492 --> 00:16:28,134 for example: if the data looks 534 00:16:28,172 --> 00:16:29,222 like this and it has to become 535 00:16:29,276 --> 00:16:30,826 like that, can we figure out 536 00:16:30,848 --> 00:16:32,266 what is the logic in between? So 537 00:16:32,288 --> 00:16:35,286 I think I would say that it's 538 00:16:35,318 --> 00:16:36,474 good that we're moving in that 539 00:16:36,512 --> 00:16:39,226 direction because I believe that 540 00:16:39,248 --> 00:16:40,554 the number of things to do is so 541 00:16:40,592 --> 00:16:42,318 massive that some degree of 542 00:16:42,324 --> 00:16:43,710 automation is essential. 543 00:16:44,050 --> 00:16:45,806 I totally agree. And I was just 544 00:16:45,828 --> 00:16:47,550 scrolling, so apologies, Frank, 545 00:16:48,130 --> 00:16:50,478 scrolling around on Nexla.com 546 00:16:50,564 --> 00:16:51,710 and looking at your data 547 00:16:51,780 --> 00:16:54,786 operations. And first I want to 548 00:16:54,808 --> 00:16:56,770 commend you for Nexsets. 549 00:16:57,110 --> 00:17:00,386 That's a cool play. And some of 550 00:17:00,408 --> 00:17:02,580 the fields that you cover here 551 00:17:05,050 --> 00:17:06,886 in my career, I've been doing 552 00:17:06,908 --> 00:17:08,600 this for a long time.
I'm old, 553 00:17:09,130 --> 00:17:12,354 Saket, but Continuous Metadata 554 00:17:12,402 --> 00:17:15,766 Intelligence caught my eye. That 555 00:17:15,788 --> 00:17:17,798 is a very cool concept. And 556 00:17:17,814 --> 00:17:19,354 what you described here 557 00:17:19,392 --> 00:17:20,646 sounds like some stuff that I've 558 00:17:20,678 --> 00:17:23,734 done, but that's a much cooler 559 00:17:23,782 --> 00:17:28,502 name. Continuous Metadata, CMI, 560 00:17:28,566 --> 00:17:30,410 that just rolls off the tongue. 561 00:17:31,410 --> 00:17:34,154 The idea, the automated error 562 00:17:34,202 --> 00:17:35,498 management and quarantine. I'm 563 00:17:35,514 --> 00:17:36,286 just kind of going from the 564 00:17:36,308 --> 00:17:38,014 bottom up here on the page for 565 00:17:38,052 --> 00:17:41,022 data operations. Those are just 566 00:17:41,076 --> 00:17:44,434 key pieces of functionality that 567 00:17:44,632 --> 00:17:47,042 time and time again data 568 00:17:47,096 --> 00:17:50,162 warehouse ETL includes something 569 00:17:50,216 --> 00:17:51,646 like that. But everybody's 570 00:17:51,678 --> 00:17:56,786 rolling it from scratch. This is 571 00:17:56,808 --> 00:17:58,454 just very cool. I love this idea 572 00:17:58,492 --> 00:18:02,338 of abstracting that out and I'm 573 00:18:02,354 --> 00:18:03,286 just going to throw this out 574 00:18:03,308 --> 00:18:04,226 there. I'm going to be signing 575 00:18:04,258 --> 00:18:05,814 up for a demo. I want to see 576 00:18:05,852 --> 00:18:06,486 more. 577 00:18:06,668 --> 00:18:08,214 Oh absolutely. You're very 578 00:18:08,252 --> 00:18:09,766 welcome for that. And I think 579 00:18:09,788 --> 00:18:11,858 the other problem that it helped 580 00:18:11,874 --> 00:18:13,258 us solve, as I said, there are 581 00:18:13,264 --> 00:18:14,426 two ways to get scale in 582 00:18:14,448 --> 00:18:15,386 enterprise.
One is through 583 00:18:15,408 --> 00:18:16,426 automation and the other is 584 00:18:16,448 --> 00:18:18,074 through collaboration. By 585 00:18:18,112 --> 00:18:19,462 creating this abstracted entity 586 00:18:19,526 --> 00:18:21,210 we were able to say hey, this is 587 00:18:21,280 --> 00:18:23,054 a lot easier to understand as 588 00:18:23,092 --> 00:18:25,854 access control. I can be really 589 00:18:25,892 --> 00:18:27,230 good at connecting to the data 590 00:18:27,300 --> 00:18:29,694 from a transaction system and 591 00:18:29,732 --> 00:18:31,520 cleaning it up and 592 00:18:32,130 --> 00:18:33,566 applying some compliance to it. 593 00:18:33,588 --> 00:18:35,310 But then the output of my work, 594 00:18:35,380 --> 00:18:36,538 which is the cool thing about 595 00:18:36,564 --> 00:18:37,634 these Nexsets, or the data 596 00:18:37,672 --> 00:18:39,746 products, is you take one and 597 00:18:39,768 --> 00:18:41,106 you apply some operations; the 598 00:18:41,128 --> 00:18:42,386 output is another Nexset, 599 00:18:42,488 --> 00:18:44,270 which is identical in behavior 600 00:18:44,350 --> 00:18:46,818 and consistency, but it is a 601 00:18:46,824 --> 00:18:47,906 slightly different view of the 602 00:18:47,928 --> 00:18:48,882 data and you can give somebody 603 00:18:48,936 --> 00:18:50,166 else access to that and you can 604 00:18:50,188 --> 00:18:51,558 keep repeating that process. You 605 00:18:51,564 --> 00:18:53,398 can imagine in a large company 606 00:18:53,484 --> 00:18:54,966 people are finding these, 607 00:18:54,988 --> 00:18:55,766 they're creating their own 608 00:18:55,788 --> 00:18:57,014 variants and they're using it. 609 00:18:57,052 --> 00:18:58,146 But what they are doing becomes 610 00:18:58,178 --> 00:18:59,526 an input to somebody else and 611 00:18:59,548 --> 00:19:00,678 they can go take it out of 612 00:19:00,684 --> 00:19:02,282 there.
And what we did a month 613 00:19:02,336 --> 00:19:04,006 back was introduce the concept 614 00:19:04,038 --> 00:19:05,690 of you can take all of these 615 00:19:05,760 --> 00:19:07,574 logical data products and Nexsets 616 00:19:07,622 --> 00:19:08,906 that we are creating and 617 00:19:08,928 --> 00:19:10,186 put them into a 618 00:19:10,208 --> 00:19:11,578 marketplace that is internal to 619 00:19:11,584 --> 00:19:12,762 the company. You can allow 620 00:19:12,816 --> 00:19:15,006 people to go find it and request to 621 00:19:15,028 --> 00:19:16,014 get that. You're not really 622 00:19:16,052 --> 00:19:17,198 buying; you're saying, hey, can I 623 00:19:17,204 --> 00:19:18,800 get access to that? And there is 624 00:19:19,330 --> 00:19:21,614 a mechanism to approve and give 625 00:19:21,652 --> 00:19:23,374 people access to that. Now the 626 00:19:23,412 --> 00:19:24,466 interesting thing is that these 627 00:19:24,488 --> 00:19:27,154 are all, very importantly, these 628 00:19:27,192 --> 00:19:28,482 Nexsets or data products are 629 00:19:28,536 --> 00:19:30,114 logical entities. They're not 630 00:19:30,152 --> 00:19:31,506 making copies of data. So 631 00:19:31,528 --> 00:19:33,074 they're bringing the same sort 632 00:19:33,112 --> 00:19:34,226 of benefits that 633 00:19:34,328 --> 00:19:35,682 containerization, for example, 634 00:19:35,736 --> 00:19:37,106 has brought on the compute side: 635 00:19:37,128 --> 00:19:38,046 you have an abstracted 636 00:19:38,078 --> 00:19:39,666 entity, it's consistent, you 637 00:19:39,688 --> 00:19:40,694 don't have to worry about what 638 00:19:40,732 --> 00:19:42,086 was under the hood. Where did it 639 00:19:42,108 --> 00:19:43,926 come from, was it XML data, was 640 00:19:43,948 --> 00:19:45,942 it CSV? Now you have something 641 00:19:45,996 --> 00:19:47,526 consistent to work with and it 642 00:19:47,548 --> 00:19:48,566 opens up a whole bunch of 643 00:19:48,588 --> 00:19:50,022 interfaces.
I can take that data 644 00:19:50,076 --> 00:19:51,418 product, that Nexset, and say, 645 00:19:51,504 --> 00:19:52,938 I would like this data in a 646 00:19:52,944 --> 00:19:55,066 warehouse or I would like this 647 00:19:55,088 --> 00:19:57,030 data as an API. And you realize 648 00:19:57,110 --> 00:19:59,114 that the same entity can have 649 00:19:59,152 --> 00:20:00,906 benefits for different users and 650 00:20:00,928 --> 00:20:01,946 they can approach it in 651 00:20:01,968 --> 00:20:03,418 different ways. So we think that 652 00:20:03,424 --> 00:20:04,234 is what is bringing 653 00:20:04,282 --> 00:20:05,486 collaboration. So when you bring 654 00:20:05,508 --> 00:20:07,246 together automation on one hand, 655 00:20:07,348 --> 00:20:08,446 collaboration on the other, then 656 00:20:08,468 --> 00:20:10,126 you really get the benefits of 657 00:20:10,148 --> 00:20:12,302 scale from both technology and 658 00:20:12,356 --> 00:20:12,960 process. 659 00:20:13,350 --> 00:20:16,402 I absolutely love you. Now I get 660 00:20:16,456 --> 00:20:18,420 why you keep saying data product 661 00:20:18,790 --> 00:20:21,090 and it makes perfect sense now. 662 00:20:21,240 --> 00:20:23,650 You're creating a very 663 00:20:23,720 --> 00:20:26,006 interesting, almost an 664 00:20:26,028 --> 00:20:29,414 integration layer in between the 665 00:20:29,452 --> 00:20:33,286 idea of containerized for code 666 00:20:33,468 --> 00:20:35,718 and you're containerizing data. 667 00:20:35,804 --> 00:20:37,046 And that's what I believe your 668 00:20:37,068 --> 00:20:39,558 data product represents. And now 669 00:20:39,644 --> 00:20:40,934 I'm really interested in that 670 00:20:40,972 --> 00:20:41,654 demo.
671 00:20:41,852 --> 00:20:44,646 Well, especially if my recent 672 00:20:44,678 --> 00:20:46,746 forays into OpenShift and kind 673 00:20:46,768 --> 00:20:48,746 of what Containerization has 674 00:20:48,768 --> 00:20:51,166 done for developers, I think 675 00:20:51,188 --> 00:20:52,382 it's only a matter of time 676 00:20:52,436 --> 00:20:53,838 before containerization kind of 677 00:20:53,844 --> 00:20:56,906 hits the data world. And there's 678 00:20:56,938 --> 00:21:02,702 something I want to point out is 679 00:21:02,756 --> 00:21:05,566 that it was very smart, I think, 680 00:21:05,588 --> 00:21:09,614 of you to focus on the data 681 00:21:09,652 --> 00:21:10,786 engineering side, right. Because 682 00:21:10,808 --> 00:21:12,866 AI is the hype machine, right. I 683 00:21:12,888 --> 00:21:14,098 fully admit that I say this as a 684 00:21:14,104 --> 00:21:16,326 data scientist, right? But one 685 00:21:16,348 --> 00:21:17,814 of the things that we've kind of 686 00:21:17,852 --> 00:21:20,678 discovered, both Andy and I, and 687 00:21:20,764 --> 00:21:22,418 in both our professional careers 688 00:21:22,514 --> 00:21:25,366 is that data scientists will 689 00:21:25,388 --> 00:21:27,254 tend to brush aside the simple 690 00:21:27,372 --> 00:21:30,470 basics of it's. Five words, 691 00:21:30,540 --> 00:21:31,802 right? First you get the data, 692 00:21:31,856 --> 00:21:33,820 or first we got the data. Right. 693 00:21:34,190 --> 00:21:36,794 Behind that is months of work. 694 00:21:36,832 --> 00:21:38,218 It's orders of magnitude of 695 00:21:38,224 --> 00:21:39,558 work. In fact, one of the spiels 696 00:21:39,574 --> 00:21:42,958 I have now is kind of like the 697 00:21:42,964 --> 00:21:44,602 idea of rock stars and roadies, 698 00:21:44,746 --> 00:21:46,430 right? 
And for every rock star 699 00:21:46,500 --> 00:21:48,286 there's an army of 700 00:21:48,308 --> 00:21:50,714 roadies that set up the lights, 701 00:21:50,762 --> 00:21:54,158 move the chairs around, 702 00:21:54,244 --> 00:21:55,742 set up the equipment, and 703 00:21:55,796 --> 00:21:56,946 manage the sound. So I think 704 00:21:56,968 --> 00:21:58,738 that the data engineering, I 705 00:21:58,744 --> 00:22:00,034 think, is one of those things 706 00:22:00,072 --> 00:22:02,018 that I think has not... it's like 707 00:22:02,024 --> 00:22:04,066 the Rodney Dangerfield of kind 708 00:22:04,088 --> 00:22:06,406 of the data world where it 709 00:22:06,428 --> 00:22:07,782 didn't get any respect up until 710 00:22:07,836 --> 00:22:11,366 lately because ChatGPT will get 711 00:22:11,388 --> 00:22:12,870 all the headlines, right? But 712 00:22:12,940 --> 00:22:14,166 think about the data that went 713 00:22:14,188 --> 00:22:16,982 into it. I've heard numbers, 714 00:22:17,116 --> 00:22:18,486 billions and trillions of 715 00:22:18,508 --> 00:22:21,786 parameters for four. I think 716 00:22:21,808 --> 00:22:23,226 it's smart that you picked that 717 00:22:23,248 --> 00:22:24,154 and I think that actually worked 718 00:22:24,192 --> 00:22:25,418 out pretty well for you. 719 00:22:25,584 --> 00:22:27,340 Actually what worked, 720 00:22:27,790 --> 00:22:30,826 fortunately for us, was that I 721 00:22:30,848 --> 00:22:32,170 actually really looked at 722 00:22:32,240 --> 00:22:35,078 machine learning as a way to... I'd 723 00:22:35,094 --> 00:22:36,210 been an entrepreneur before. I'd 724 00:22:36,230 --> 00:22:37,866 built a company. I really enjoyed 725 00:22:37,978 --> 00:22:39,386 sort of building the data aspect 726 00:22:39,418 --> 00:22:40,526 of it. I had built a company in 727 00:22:40,548 --> 00:22:41,790 the advertising, ad tech space, 728 00:22:41,860 --> 00:22:42,986 built one of the earliest mobile 729 00:22:43,018 --> 00:22:45,342 ad servers.
We became part of 730 00:22:45,396 --> 00:22:46,786 the largest ad exchange at the 731 00:22:46,808 --> 00:22:48,098 time outside of Google. So we 732 00:22:48,104 --> 00:22:49,742 were processing over 300 billion 733 00:22:49,806 --> 00:22:52,206 records a day, and my co-founder 734 00:22:52,238 --> 00:22:52,994 here was running that 735 00:22:53,032 --> 00:22:54,402 infrastructure. And we're like, 736 00:22:54,536 --> 00:22:55,906 at some level, to be candid, I 737 00:22:55,928 --> 00:22:56,946 didn't really enjoy being in 738 00:22:56,968 --> 00:22:58,546 advertising, but I did like the 739 00:22:58,568 --> 00:22:59,698 sort of data challenges that 740 00:22:59,704 --> 00:23:01,062 were there. So we were looking 741 00:23:01,116 --> 00:23:02,486 to figure out what is next, 742 00:23:02,508 --> 00:23:03,286 because we had taken that 743 00:23:03,308 --> 00:23:04,326 company public and all that 744 00:23:04,348 --> 00:23:05,846 stuff, and we moved on. I was 745 00:23:05,868 --> 00:23:07,046 like, okay, what is the next 746 00:23:07,068 --> 00:23:08,614 thing you want to work on? And I 747 00:23:08,652 --> 00:23:09,862 really seriously looked at 748 00:23:09,916 --> 00:23:10,982 building something because 749 00:23:11,036 --> 00:23:12,522 machine learning and AI was a hot 750 00:23:12,576 --> 00:23:14,906 topic even in 2015, 2016. But 751 00:23:14,928 --> 00:23:15,866 when I looked at it, I 752 00:23:15,888 --> 00:23:17,594 understood that at some level, 753 00:23:17,712 --> 00:23:19,354 it is so specific to that 754 00:23:19,392 --> 00:23:21,018 business and that company and 755 00:23:21,024 --> 00:23:22,726 that problem. It almost becomes 756 00:23:22,758 --> 00:23:24,014 consultative. And I was like, 757 00:23:24,052 --> 00:23:25,418 how do you platform it?
It's 758 00:23:25,434 --> 00:23:27,422 hard to platform because even 759 00:23:27,556 --> 00:23:28,682 you can take two retail 760 00:23:28,746 --> 00:23:30,074 companies and they're solving 761 00:23:30,122 --> 00:23:31,166 maybe the same problem of 762 00:23:31,188 --> 00:23:32,638 recommending products to people 763 00:23:32,724 --> 00:23:34,174 and their models will be very 764 00:23:34,212 --> 00:23:35,546 different. And what you do there 765 00:23:35,588 --> 00:23:38,386 is not easily translatable. So I 766 00:23:38,408 --> 00:23:39,954 hesitated for that reason. And 767 00:23:39,992 --> 00:23:41,374 when I look back, in hindsight, 768 00:23:41,422 --> 00:23:43,042 almost some of the major 769 00:23:43,096 --> 00:23:44,034 companies that came out on 770 00:23:44,072 --> 00:23:44,894 machine Learning, ultimately, 771 00:23:44,942 --> 00:23:46,146 when you look deeper into it, 772 00:23:46,168 --> 00:23:47,534 there are large professional 773 00:23:47,582 --> 00:23:49,766 services organizations under the 774 00:23:49,788 --> 00:23:52,598 hood. And that's what gave us a 775 00:23:52,604 --> 00:23:53,526 hesitation. I'm more of an 776 00:23:53,548 --> 00:23:54,726 engineer. I like to build a 777 00:23:54,748 --> 00:23:56,406 platform and make it once and 778 00:23:56,428 --> 00:23:58,054 let people use it. And the data 779 00:23:58,092 --> 00:23:59,494 engineering part of it is what 780 00:23:59,532 --> 00:24:01,878 looked like that thing. But I 781 00:24:01,884 --> 00:24:02,714 don't know if you have read this 782 00:24:02,752 --> 00:24:04,794 paper called The Hidden Debt in 783 00:24:04,832 --> 00:24:06,378 Machine Learning Systems. It's a 784 00:24:06,384 --> 00:24:07,930 Google paper and it actually 785 00:24:08,000 --> 00:24:09,722 talks about there's a very cool 786 00:24:09,776 --> 00:24:10,938 diagram in there. 
I don't know 787 00:24:10,944 --> 00:24:12,460 if you can screen share here, 788 00:24:13,390 --> 00:24:15,386 but actually it's a diagram 789 00:24:15,418 --> 00:24:16,414 which shows that in all these 790 00:24:16,452 --> 00:24:18,106 different boxes, there's a tiny 791 00:24:18,138 --> 00:24:19,278 box called Machine Learning, and 792 00:24:19,284 --> 00:24:20,606 there's huge boxes around it 793 00:24:20,628 --> 00:24:22,560 which are all essentially data. 794 00:24:22,930 --> 00:24:24,590 I think it's a 2016 paper. 795 00:24:24,660 --> 00:24:27,618 If I'm not, I'm looking it up. 796 00:24:27,784 --> 00:24:29,474 Yeah. Hidden technical debt in 797 00:24:29,512 --> 00:24:31,554 machine learning systems is the 798 00:24:31,592 --> 00:24:35,398 paper. And there's a diagram on 799 00:24:35,404 --> 00:24:39,174 the third page which shows that. 800 00:24:39,212 --> 00:24:42,230 But it is interesting that even 801 00:24:42,300 --> 00:24:44,086 at that time, people who were 802 00:24:44,108 --> 00:24:46,422 working on these things were 803 00:24:46,476 --> 00:24:48,810 seeing the same pattern. 804 00:24:50,110 --> 00:24:52,810 Yes, I have seen this diagram. 805 00:24:57,230 --> 00:24:59,146 This is something that comes up 806 00:24:59,168 --> 00:25:00,586 in my day job quite a bit, where 807 00:25:00,608 --> 00:25:03,218 we talk about how the machine 808 00:25:03,254 --> 00:25:04,670 learning is only one part of it, 809 00:25:04,740 --> 00:25:07,886 right? There's a whole lot that 810 00:25:07,908 --> 00:25:09,760 has to go into that. So yeah, 811 00:25:10,370 --> 00:25:11,966 and that's a good point. And I 812 00:25:11,988 --> 00:25:13,726 think that everybody wants to be 813 00:25:13,748 --> 00:25:15,182 the rock star, right? 
Everybody 814 00:25:15,236 --> 00:25:16,514 wants to have their name up on 815 00:25:16,552 --> 00:25:19,202 lights, but the amount of people 816 00:25:19,256 --> 00:25:20,594 that goes to make that rock star 817 00:25:20,632 --> 00:25:22,818 look good, there's a lot of 818 00:25:22,824 --> 00:25:25,966 opportunity in there. And he's 819 00:25:25,998 --> 00:25:28,706 been a guest on the show. He's 820 00:25:28,738 --> 00:25:30,066 famous within a certain Internet 821 00:25:30,098 --> 00:25:32,040 circle. John Lee Dumas. Right? 822 00:25:32,410 --> 00:25:34,934 He has a phrase where I like 823 00:25:34,972 --> 00:25:37,046 boring boring is good because no 824 00:25:37,068 --> 00:25:38,746 one is competing to do the 825 00:25:38,768 --> 00:25:40,442 boring stuff. Not that data 826 00:25:40,496 --> 00:25:42,778 engineering is boring. I want to 827 00:25:42,784 --> 00:25:45,994 head off the hate mail right 828 00:25:46,032 --> 00:25:48,298 there. But no, I mean it's one 829 00:25:48,304 --> 00:25:51,782 of those things where there's 830 00:25:51,846 --> 00:25:53,498 enormous opportunity. If you 831 00:25:53,504 --> 00:25:57,326 look that box, it's one part of 832 00:25:57,348 --> 00:25:59,358 the the whole operation and. 833 00:25:59,444 --> 00:26:01,470 There'S a tiny little ML code 834 00:26:01,540 --> 00:26:01,966 box. 835 00:26:02,068 --> 00:26:03,978 In the it takes out all the 836 00:26:04,004 --> 00:26:05,780 oxygen in the room. But 837 00:26:06,150 --> 00:26:07,666 realistically, in order to have 838 00:26:07,688 --> 00:26:08,866 that little box, you need to 839 00:26:08,888 --> 00:26:11,602 stand on a lot of other 840 00:26:11,656 --> 00:26:15,874 operations. Right. Andy and I 841 00:26:15,912 --> 00:26:18,194 kind of had this back and forth 842 00:26:18,242 --> 00:26:20,246 and obviously data science is 843 00:26:20,268 --> 00:26:21,506 important, blah, blah, blah. 
AI 844 00:26:21,538 --> 00:26:22,374 is important, machine learning 845 00:26:22,412 --> 00:26:24,566 is important, but it stands on 846 00:26:24,588 --> 00:26:28,150 the shoulders of giants. Another 847 00:26:28,300 --> 00:26:30,520 analogy I've seen where it shows 848 00:26:31,310 --> 00:26:32,666 like a rocket, right, and a 849 00:26:32,688 --> 00:26:34,006 little capsule that holds 850 00:26:34,038 --> 00:26:36,426 people. But it's sitting on top 851 00:26:36,448 --> 00:26:38,186 of a massive rocket, which of 852 00:26:38,208 --> 00:26:39,866 course has the launch pad and 853 00:26:39,888 --> 00:26:41,820 all the other accessories to it. 854 00:26:42,830 --> 00:26:44,046 That's another way to look at 855 00:26:44,068 --> 00:26:44,158 it. 856 00:26:44,164 --> 00:26:44,286 Right. 857 00:26:44,308 --> 00:26:47,066 It is crucial. And I think it's 858 00:26:47,098 --> 00:26:49,182 a shame that we kind of... not we, 859 00:26:49,236 --> 00:26:50,634 but it doesn't get the attention 860 00:26:50,682 --> 00:26:51,754 that it deserves. 861 00:26:51,882 --> 00:26:54,990 Yeah, I think data engineering 862 00:26:55,330 --> 00:26:58,030 is complex. It is also painful. 863 00:26:58,110 --> 00:26:59,458 It is also something that has to 864 00:26:59,464 --> 00:27:00,814 be done at a massive, massive 865 00:27:00,862 --> 00:27:02,690 scale. And it's challenging. But 866 00:27:02,760 --> 00:27:03,986 remember, it's a means to the 867 00:27:04,008 --> 00:27:05,806 end and people get fascinated 868 00:27:05,838 --> 00:27:07,058 about that end that happens 869 00:27:07,144 --> 00:27:08,534 eventually, but the means to the 870 00:27:08,572 --> 00:27:10,246 end takes a lot of work. And 871 00:27:10,268 --> 00:27:11,574 sometimes, to be candid, it 872 00:27:11,612 --> 00:27:13,960 becomes thankless work.
Because 873 00:27:15,210 --> 00:27:16,674 why are we such a big proponent 874 00:27:16,722 --> 00:27:18,054 of bringing automation into data 875 00:27:18,092 --> 00:27:19,026 engineering work? You cannot 876 00:27:19,058 --> 00:27:19,814 automate all of data 877 00:27:19,852 --> 00:27:21,014 engineering, just to be clear. 878 00:27:21,132 --> 00:27:22,186 But when you bring in 879 00:27:22,208 --> 00:27:23,306 automation, you're saying that 880 00:27:23,328 --> 00:27:24,746 if I'm running... literally, I have 881 00:27:24,768 --> 00:27:25,866 customers running thousands of 882 00:27:25,888 --> 00:27:27,306 pipelines in our system, for 883 00:27:27,328 --> 00:27:29,818 example. And you don't want to 884 00:27:29,824 --> 00:27:30,966 be woken up on that Saturday 885 00:27:30,998 --> 00:27:32,154 night because one of those is 886 00:27:32,192 --> 00:27:33,826 not working. You want automation 887 00:27:33,878 --> 00:27:35,054 in there. Right? You want that 888 00:27:35,092 --> 00:27:36,446 system to work for you. 889 00:27:36,468 --> 00:27:38,702 Otherwise all you get in data 890 00:27:38,756 --> 00:27:40,526 engineering is a lot of 891 00:27:40,548 --> 00:27:42,494 work to do and complaints when 892 00:27:42,532 --> 00:27:44,606 it doesn't work. But it is a 893 00:27:44,628 --> 00:27:46,734 very key need, then. So I do 894 00:27:46,772 --> 00:27:48,146 celebrate the work that they do. 895 00:27:48,168 --> 00:27:49,378 And if you look at OpenAI, for 896 00:27:49,384 --> 00:27:50,434 example, because it's such a hot 897 00:27:50,472 --> 00:27:51,966 company right now, that billion 898 00:27:51,998 --> 00:27:54,002 dollar plus in funding, very few 899 00:27:54,056 --> 00:27:55,746 companies could have done what 900 00:27:55,768 --> 00:27:57,026 they have done because they had 901 00:27:57,048 --> 00:27:58,418 that kind of money okay. 902 00:27:58,504 --> 00:27:58,754 Right. 903 00:27:58,792 --> 00:28:01,046 To begin with.
But I bet if you 904 00:28:01,068 --> 00:28:02,214 look at how that money got 905 00:28:02,252 --> 00:28:04,166 spent, I'm sure a big chunk is 906 00:28:04,188 --> 00:28:06,342 in pipelines because of data and 907 00:28:06,396 --> 00:28:07,766 processing that and moving that 908 00:28:07,788 --> 00:28:09,786 around and we don't talk about 909 00:28:09,808 --> 00:28:12,374 it. But the reason that ChatGPT 910 00:28:12,422 --> 00:28:14,170 works so well is because it can 911 00:28:14,320 --> 00:28:17,866 look into all of that data and 912 00:28:18,048 --> 00:28:19,466 talk intelligently about it, 913 00:28:19,488 --> 00:28:20,060 right? 914 00:28:20,610 --> 00:28:23,886 No, absolutely. I don't know 915 00:28:23,908 --> 00:28:25,438 when this switch happened, but 916 00:28:25,524 --> 00:28:28,174 in terms of staffing, a 917 00:28:28,212 --> 00:28:29,646 previous job, at the end of 918 00:28:29,668 --> 00:28:33,774 2021, they wanted to do 919 00:28:33,812 --> 00:28:34,878 something. Can't say what it 920 00:28:34,884 --> 00:28:36,146 was, but they wanted to do 921 00:28:36,168 --> 00:28:37,460 something. And they said, oh, 922 00:28:38,870 --> 00:28:40,546 it'll be challenging to find the 923 00:28:40,568 --> 00:28:42,530 data scientist for this job. And 924 00:28:42,680 --> 00:28:43,938 my manager and I kind of looked at each 925 00:28:43,944 --> 00:28:45,346 other. Actually, at this point 926 00:28:45,368 --> 00:28:46,514 in time, it's going to be a lot 927 00:28:46,552 --> 00:28:47,774 harder to find the data engineers 928 00:28:47,822 --> 00:28:48,918 that you're going to need for 929 00:28:48,924 --> 00:28:49,046 that. 930 00:28:49,068 --> 00:28:49,206 Right. 931 00:28:49,228 --> 00:28:50,774 Because they only really needed 932 00:28:50,812 --> 00:28:52,534 two data scientists.
But just 933 00:28:52,572 --> 00:28:54,034 based on the aggressive 934 00:28:54,082 --> 00:28:54,966 thing that they were trying to 935 00:28:54,988 --> 00:28:59,014 build, they would need, I would 936 00:28:59,212 --> 00:29:00,598 just spitball and I would say a 937 00:29:00,604 --> 00:29:01,990 dozen data engineers. 938 00:29:02,350 --> 00:29:04,038 But from a technology provider 939 00:29:04,054 --> 00:29:05,238 and a tool provider perspective, 940 00:29:05,254 --> 00:29:06,170 I would say the interesting 941 00:29:06,240 --> 00:29:07,386 thing about data engineering is 942 00:29:07,408 --> 00:29:09,674 it is very complex, but the 943 00:29:09,712 --> 00:29:11,094 challenges are very consistent. 944 00:29:11,142 --> 00:29:12,894 I can look at our customers in 945 00:29:12,932 --> 00:29:14,686 retail, like a Bed Bath, or 946 00:29:14,708 --> 00:29:16,494 Forever 21, or in delivery like 947 00:29:16,532 --> 00:29:19,198 DoorDash, or in pharma like 948 00:29:19,284 --> 00:29:21,594 Janssen, J&J, or in financial 949 00:29:21,642 --> 00:29:23,982 services, or in cybersecurity, 950 00:29:24,126 --> 00:29:26,114 all of them, the challenges at 951 00:29:26,152 --> 00:29:27,666 a fundamental level are very 952 00:29:27,768 --> 00:29:29,838 similar, which is large amounts 953 00:29:29,854 --> 00:29:31,940 of diverse heterogeneous data. 954 00:29:32,310 --> 00:29:33,938 Being able to take that, process 955 00:29:34,024 --> 00:29:35,534 that, do that reliably, detect 956 00:29:35,582 --> 00:29:37,378 the issues, all of that stuff. 957 00:29:37,464 --> 00:29:40,158 Have data quality monitoring, 958 00:29:40,334 --> 00:29:42,280 make that data usable by people, 959 00:29:42,890 --> 00:29:44,454 scaling. All of those are very 960 00:29:44,492 --> 00:29:46,614 similar.
Which means that it 961 00:29:46,652 --> 00:29:47,926 fits very nicely into the 962 00:29:47,948 --> 00:29:49,530 traditional sort of problems 963 00:29:49,600 --> 00:29:51,142 that can be solved by software, 964 00:29:51,206 --> 00:29:52,474 problems that can be solved by 965 00:29:52,512 --> 00:29:55,420 automation sort of model. Right. 966 00:29:56,350 --> 00:29:58,586 So I think that is definitely a 967 00:29:58,608 --> 00:29:59,946 part where the challenges are 968 00:29:59,968 --> 00:30:02,122 not very unique to a certain 969 00:30:02,176 --> 00:30:03,118 problem. And of course, there 970 00:30:03,124 --> 00:30:04,686 are unique flavors to it. Some 971 00:30:04,708 --> 00:30:05,806 have real-time data, some have 972 00:30:05,828 --> 00:30:07,406 data from devices, some have 973 00:30:07,428 --> 00:30:09,726 data from legacy systems and so 974 00:30:09,748 --> 00:30:11,280 on. But yeah, there is 975 00:30:12,130 --> 00:30:13,238 structure. 976 00:30:13,434 --> 00:30:14,014 Excellent. 977 00:30:14,062 --> 00:30:14,660 Cool. 978 00:30:15,510 --> 00:30:17,966 Very cool. All right, I'm 979 00:30:17,998 --> 00:30:19,678 tweeting about Nexla. 980 00:30:19,854 --> 00:30:20,242 Cool. 981 00:30:20,296 --> 00:30:20,962 Thank you. 982 00:30:21,096 --> 00:30:22,020 Right now. 983 00:30:24,150 --> 00:30:25,846 And for all the stalkers, we are 984 00:30:25,868 --> 00:30:31,574 recording this on April 12, just 985 00:30:31,612 --> 00:30:33,058 FYI. So if you look at Andy's 986 00:30:33,074 --> 00:30:33,878 feed and you're like, where is 987 00:30:33,884 --> 00:30:37,000 the tweet? You have to go back. 988 00:30:38,250 --> 00:30:40,246 So at this point in the show, we 989 00:30:40,268 --> 00:30:41,370 want to switch to kind of the 990 00:30:41,440 --> 00:30:43,642 pre-canned questions. And given 991 00:30:43,696 --> 00:30:46,282 your background, the first one, 992 00:30:46,336 --> 00:30:47,178 I really have to know the 993 00:30:47,184 --> 00:30:48,522 answer, right?
We always ask, 994 00:30:48,656 --> 00:30:50,042 how did you find your way into 995 00:30:50,096 --> 00:30:52,410 data? Did the data life find you 996 00:30:52,560 --> 00:30:54,160 or did you find data? 997 00:30:54,530 --> 00:30:55,582 I think it was happening 998 00:30:55,636 --> 00:30:57,934 together, right? I guess so. My 999 00:30:57,972 --> 00:30:59,886 decision to start a company, and 1000 00:30:59,988 --> 00:31:01,066 I mentioned to you guys, I'd 1001 00:31:01,098 --> 00:31:03,386 been at Nvidia, on the compute 1002 00:31:03,418 --> 00:31:05,614 side of the world, really, and 1003 00:31:05,732 --> 00:31:07,178 at some point I decided to start 1004 00:31:07,204 --> 00:31:08,866 my own company. And when I was 1005 00:31:08,888 --> 00:31:10,818 looking to do that in 2009, I 1006 00:31:10,824 --> 00:31:13,794 was like, where do I go? Build a 1007 00:31:13,832 --> 00:31:15,186 platform, if you will. And I 1008 00:31:15,208 --> 00:31:16,846 felt that at the time, in 2009, 1009 00:31:16,888 --> 00:31:17,958 apps were new and I'm going to 1010 00:31:17,964 --> 00:31:19,238 build a monetization platform 1011 00:31:19,324 --> 00:31:21,106 for app developers in the mobile 1012 00:31:21,138 --> 00:31:24,838 space. That whole approach about 1013 00:31:25,004 --> 00:31:26,614 building something around ad 1014 00:31:26,652 --> 00:31:28,006 servers and one of the early ad 1015 00:31:28,028 --> 00:31:30,326 servers in the mobile space, you 1016 00:31:30,348 --> 00:31:31,942 realize that it's a very data 1017 00:31:31,996 --> 00:31:33,642 driven world in advertising. And 1018 00:31:33,696 --> 00:31:35,306 there is a reason why a lot of 1019 00:31:35,328 --> 00:31:36,538 the data innovation, I would 1020 00:31:36,544 --> 00:31:37,926 say, if you trace its roots, 1021 00:31:38,038 --> 00:31:39,354 come from advertising. 
Whether 1022 00:31:39,392 --> 00:31:40,658 it was Yahoo, whether it was 1023 00:31:40,684 --> 00:31:42,330 Google, whether it was Facebook, 1024 00:31:43,310 --> 00:31:44,666 what were these guys doing with 1025 00:31:44,688 --> 00:31:45,726 data in the first place? They 1026 00:31:45,748 --> 00:31:47,374 were dealing with a huge number 1027 00:31:47,412 --> 00:31:48,826 of people visiting those pages 1028 00:31:48,858 --> 00:31:50,238 and clicking on those ads, and 1029 00:31:50,244 --> 00:31:51,518 they had to really figure out 1030 00:31:51,604 --> 00:31:53,006 how to show the performance and 1031 00:31:53,028 --> 00:31:54,094 say, which ad should you spend 1032 00:31:54,132 --> 00:31:55,406 more money in, where should you 1033 00:31:55,428 --> 00:31:57,166 not? And a lot of machine 1034 00:31:57,198 --> 00:31:58,546 learning systems that we built 1035 00:31:58,648 --> 00:32:00,610 early on back in 2011, twelve 1036 00:32:00,680 --> 00:32:02,510 actually were for that purpose. 1037 00:32:02,670 --> 00:32:04,226 We ran one of the largest ad 1038 00:32:04,248 --> 00:32:06,354 auction systems at the time, and 1039 00:32:06,472 --> 00:32:07,774 if you're running an automated 1040 00:32:07,822 --> 00:32:09,442 ad auction with 15 billion 1041 00:32:09,506 --> 00:32:11,734 auctions happening and 300 1042 00:32:11,772 --> 00:32:13,878 billion bids on that, you have a 1043 00:32:13,884 --> 00:32:15,430 lot of data. But you can also 1044 00:32:15,580 --> 00:32:17,366 figure out that, hey, based on 1045 00:32:17,388 --> 00:32:20,594 certain patterns, I can decide 1046 00:32:20,642 --> 00:32:21,978 who to invite for an auction, I 1047 00:32:21,984 --> 00:32:23,274 can decide what the floor price 1048 00:32:23,312 --> 00:32:24,634 should be. And those were all 1049 00:32:24,672 --> 00:32:25,866 machine learning systems. 
So I 1050 00:32:25,888 --> 00:32:28,314 ended up actually building this 1051 00:32:28,352 --> 00:32:30,586 advertising technology and 1052 00:32:30,608 --> 00:32:32,906 system in those days to solve 1053 00:32:33,018 --> 00:32:34,606 that developer problem of, like, 1054 00:32:34,628 --> 00:32:36,654 I'm building apps, how do I 1055 00:32:36,692 --> 00:32:38,606 monetize it? But realizing that 1056 00:32:38,628 --> 00:32:40,400 a whole chunk of it was data, 1057 00:32:41,330 --> 00:32:42,814 and a lot of the data 1058 00:32:42,852 --> 00:32:45,110 stuff that we did was pre-Kafka, 1059 00:32:45,290 --> 00:32:47,186 even when big data Hadoop was 1060 00:32:47,208 --> 00:32:48,706 relatively early at the time, so 1061 00:32:48,728 --> 00:32:50,606 the technologies were limited. 1062 00:32:50,798 --> 00:32:52,498 Did a lot of homegrown stuff at 1063 00:32:52,504 --> 00:32:54,786 the time, but realized that this 1064 00:32:54,808 --> 00:32:56,994 is a massive problem. And I 1065 00:32:57,032 --> 00:32:59,638 think data and I sort of met in 1066 00:32:59,644 --> 00:33:00,966 that time, but I'm coming from 1067 00:33:00,988 --> 00:33:02,680 this compute land. I'm building 1068 00:33:03,050 --> 00:33:04,626 software for embedded systems 1069 00:33:04,658 --> 00:33:06,354 where you are trained to squeeze 1070 00:33:06,402 --> 00:33:09,014 every single kilobyte and every 1071 00:33:09,052 --> 00:33:10,406 single ounce of performance in 1072 00:33:10,428 --> 00:33:11,418 some of these systems. Like I 1073 00:33:11,424 --> 00:33:12,362 mentioned, I was building 1074 00:33:12,416 --> 00:33:13,494 software for the PlayStation 1075 00:33:13,542 --> 00:33:16,058 3 when I was at Nvidia. And 1076 00:33:16,064 --> 00:33:17,386 you come from that mindset of 1077 00:33:17,408 --> 00:33:19,386 high performance, squeezing the 1078 00:33:19,408 --> 00:33:20,906 most out of the system, and you see 1079 00:33:20,928 --> 00:33:22,506 the data challenges.
And I 1080 00:33:22,528 --> 00:33:24,226 thought it sort of intersected 1081 00:33:24,278 --> 00:33:25,886 nicely for me in sort of a 1082 00:33:25,908 --> 00:33:28,138 developer or a product approach. 1083 00:33:28,314 --> 00:33:29,934 And then we said, well, more 1084 00:33:29,972 --> 00:33:31,310 people need to be using data. 1085 00:33:31,380 --> 00:33:32,286 That's where the world is 1086 00:33:32,308 --> 00:33:33,518 headed. Everybody is going to 1087 00:33:33,524 --> 00:33:35,022 become a data user. I could see 1088 00:33:35,076 --> 00:33:37,442 my second grade kid at the time, 1089 00:33:37,496 --> 00:33:39,266 and they do simple survey in the 1090 00:33:39,288 --> 00:33:40,946 class, and they created a 1091 00:33:40,968 --> 00:33:42,034 histogram like, okay, 1092 00:33:42,072 --> 00:33:43,682 everybody's going to be data 1093 00:33:43,736 --> 00:33:45,474 user. How do you really get 1094 00:33:45,512 --> 00:33:47,138 there? Not everybody's going to 1095 00:33:47,144 --> 00:33:48,706 be technical and engineer. So 1096 00:33:48,728 --> 00:33:52,134 that is kind of where my sort of 1097 00:33:52,172 --> 00:33:54,054 direct experience in the data 1098 00:33:54,092 --> 00:33:55,126 world started to happen, is 1099 00:33:55,148 --> 00:33:56,134 like, we want to solve that 1100 00:33:56,172 --> 00:33:57,254 problem. We want to make it 1101 00:33:57,292 --> 00:33:58,822 possible for anybody to use 1102 00:33:58,876 --> 00:34:00,486 data. And what is standing in 1103 00:34:00,508 --> 00:34:01,526 their way is that data is 1104 00:34:01,548 --> 00:34:03,514 complex. It's everywhere. It is 1105 00:34:03,552 --> 00:34:05,162 hard to work with. Only 1106 00:34:05,216 --> 00:34:06,682 developers are able to do that. 1107 00:34:06,736 --> 00:34:08,154 So we're going to automate that. 
1108 00:34:08,192 --> 00:34:09,306 We're going to bring that and 1109 00:34:09,328 --> 00:34:10,986 present that data to this user 1110 00:34:11,098 --> 00:34:12,286 and they'll be able to use it 1111 00:34:12,308 --> 00:34:14,206 wherever they want. And that was 1112 00:34:14,228 --> 00:34:15,582 the driver for me. 1113 00:34:15,716 --> 00:34:20,606 Wow, that's fascinating. Our 1114 00:34:20,628 --> 00:34:22,334 next question is what's your 1115 00:34:22,372 --> 00:34:24,162 favorite part of your current 1116 00:34:24,216 --> 00:34:24,820 job? 1117 00:34:26,070 --> 00:34:30,194 I think the CEO job and the co 1118 00:34:30,232 --> 00:34:33,326 founder job is a new challenge, 1119 00:34:33,358 --> 00:34:36,514 I would say every day and all 1120 00:34:36,552 --> 00:34:38,046 sorts of unexpected things. I'm 1121 00:34:38,078 --> 00:34:39,286 still a very product person at 1122 00:34:39,308 --> 00:34:41,174 heart, so it's like those things 1123 00:34:41,212 --> 00:34:44,166 are always fun to look at. But I 1124 00:34:44,188 --> 00:34:46,246 would say no day is similar to 1125 00:34:46,268 --> 00:34:47,606 the last one is the best part of 1126 00:34:47,628 --> 00:34:49,974 it. I think about a month back, 1127 00:34:50,092 --> 00:34:51,226 all of a sudden we were like, 1128 00:34:51,248 --> 00:34:52,518 oh, the bank that we are banking 1129 00:34:52,534 --> 00:34:53,978 with is going under and what do 1130 00:34:53,984 --> 00:34:59,674 you do? Did I go in or did any 1131 00:34:59,712 --> 00:35:01,050 of those CEOs go in on that 1132 00:35:01,120 --> 00:35:02,506 Thursday or Friday morning to 1133 00:35:02,528 --> 00:35:03,578 say, oh, this is what I'm going 1134 00:35:03,584 --> 00:35:04,718 to do. I'm going to spend the 1135 00:35:04,724 --> 00:35:06,286 whole weekend figuring out if we 1136 00:35:06,308 --> 00:35:07,518 have any money as a company or 1137 00:35:07,524 --> 00:35:10,094 we're out. 
All of a sudden the 1138 00:35:10,132 --> 00:35:11,518 rug pulled underneath us. So I 1139 00:35:11,524 --> 00:35:13,610 think that's the challenge. But 1140 00:35:13,620 --> 00:35:17,394 that's also the fun of this 1141 00:35:17,432 --> 00:35:19,506 particular role. I do enjoy the 1142 00:35:19,528 --> 00:35:20,594 thought that we are doing some 1143 00:35:20,632 --> 00:35:23,506 very cool stuff in data. And the 1144 00:35:23,528 --> 00:35:24,386 number one source of 1145 00:35:24,408 --> 00:35:25,746 satisfaction for me is when our 1146 00:35:25,768 --> 00:35:27,480 customers come to us and say, 1147 00:35:29,050 --> 00:35:31,298 this user of ours in this pharma 1148 00:35:31,314 --> 00:35:32,486 industry, they're like, you know 1149 00:35:32,508 --> 00:35:34,150 what, this data that we're using 1150 00:35:34,220 --> 00:35:36,482 with you guys, it was processing 1151 00:35:36,546 --> 00:35:38,486 in multiple hours and now it 1152 00:35:38,508 --> 00:35:39,718 happens in nine minutes. I was 1153 00:35:39,724 --> 00:35:42,474 like, wow, my goodness. And when 1154 00:35:42,512 --> 00:35:43,526 I hear those kind of stories, 1155 00:35:43,558 --> 00:35:44,682 I'm like, okay, we are actually 1156 00:35:44,736 --> 00:35:46,266 delivering value. I don't know 1157 00:35:46,288 --> 00:35:48,586 how to make a medicine or a 1158 00:35:48,608 --> 00:35:50,170 medical device, but these guys 1159 00:35:50,240 --> 00:35:51,934 who know how to do that, we are 1160 00:35:51,972 --> 00:35:53,486 somehow enabling them to do 1161 00:35:53,508 --> 00:35:55,566 their job better. And that is, I 1162 00:35:55,588 --> 00:35:56,574 think, ultimately the 1163 00:35:56,612 --> 00:35:57,822 satisfaction of the work. 1164 00:35:57,876 --> 00:35:58,574 Right? 1165 00:35:58,772 --> 00:35:59,886 Very cool. 1166 00:36:00,068 --> 00:36:03,120 Interesting. So we have three 1167 00:36:04,370 --> 00:36:06,226 complete the sentences. 
And the 1168 00:36:06,248 --> 00:36:07,554 first one is when I'm not 1169 00:36:07,592 --> 00:36:09,730 working, I enjoy blank. 1170 00:36:10,950 --> 00:36:13,074 So many things. I love road 1171 00:36:13,112 --> 00:36:14,386 biking. I do that a lot right 1172 00:36:14,408 --> 00:36:17,940 now, but I love snowboarding and 1173 00:36:19,210 --> 00:36:21,218 a bunch of activities. I don't 1174 00:36:21,234 --> 00:36:22,726 do flying anymore actively, but 1175 00:36:22,748 --> 00:36:24,102 that's another one I would do. 1176 00:36:24,236 --> 00:36:27,110 Nice. Very cool. Our next 1177 00:36:27,180 --> 00:36:29,034 complete the sentence. I think 1178 00:36:29,072 --> 00:36:30,554 the coolest thing in technology 1179 00:36:30,672 --> 00:36:32,570 today is blank. 1180 00:36:36,750 --> 00:36:38,314 I would have to say that 1181 00:36:38,432 --> 00:36:40,026 Generative AI is certainly one 1182 00:36:40,048 --> 00:36:41,322 of the coolest things out there. 1183 00:36:41,376 --> 00:36:42,686 I think I'm still trying to 1184 00:36:42,708 --> 00:36:44,042 understand from a technical 1185 00:36:44,106 --> 00:36:45,902 engineering perspective, like 1186 00:36:46,036 --> 00:36:47,966 the ins and outs of it, but it 1187 00:36:47,988 --> 00:36:49,214 is fascinating. It is also 1188 00:36:49,252 --> 00:36:52,154 scary, to be honest. I wouldn't 1189 00:36:52,202 --> 00:36:53,300 deny that either. 1190 00:36:53,670 --> 00:36:55,634 Well, it's funny you mentioned 1191 00:36:55,672 --> 00:36:57,586 that because I was thinking on 1192 00:36:57,608 --> 00:37:00,100 that earlier today. Even. And 1193 00:37:01,110 --> 00:37:03,282 the whole idea, the moving 1194 00:37:03,336 --> 00:37:05,654 parts, when you start thinking 1195 00:37:05,692 --> 00:37:07,686 about the image generation, just 1196 00:37:07,708 --> 00:37:10,646 take a subset and you think 1197 00:37:10,668 --> 00:37:12,182 about what goes into that. 
You 1198 00:37:12,236 --> 00:37:14,534 describe something, so it has to 1199 00:37:14,572 --> 00:37:16,150 understand what you describe, 1200 00:37:16,810 --> 00:37:19,366 and that has that LLM component 1201 00:37:19,398 --> 00:37:22,874 to it. Right. And then it 1202 00:37:22,912 --> 00:37:24,394 interprets that in such a way 1203 00:37:24,432 --> 00:37:27,866 and then probably tokenizes it, 1204 00:37:27,888 --> 00:37:29,194 and then it generates this 1205 00:37:29,232 --> 00:37:32,238 image. And I was reading a 1206 00:37:32,244 --> 00:37:35,726 blurb, a quote from someone at 1207 00:37:35,748 --> 00:37:38,014 Nvidia today, and that's what 1208 00:37:38,052 --> 00:37:40,814 kind of got me off doing 1209 00:37:40,852 --> 00:37:42,926 Billable work, mind you, and 1210 00:37:43,108 --> 00:37:44,890 running down the rabbit hole. 1211 00:37:44,970 --> 00:37:47,074 And I think it was a guy from 1212 00:37:47,112 --> 00:37:48,626 Nvidia. And if it wasn't, then I 1213 00:37:48,648 --> 00:37:50,354 apologize. But it was a person 1214 00:37:50,392 --> 00:37:52,414 at Nvidia who made the statement 1215 00:37:52,542 --> 00:37:56,614 that we're approaching that 1216 00:37:56,652 --> 00:37:57,750 point where we're no longer 1217 00:37:57,820 --> 00:38:01,074 rendering the pixels we're 1218 00:38:01,122 --> 00:38:03,862 generating at the pixel level. 1219 00:38:03,916 --> 00:38:05,762 So now they're rendering 1220 00:38:05,906 --> 00:38:08,982 splotches of it. Yes, generated, 1221 00:38:09,046 --> 00:38:11,290 but probably pulled in from 1222 00:38:11,360 --> 00:38:12,634 someplace based on the 1223 00:38:12,672 --> 00:38:15,286 description. A tokenized image 1224 00:38:15,318 --> 00:38:17,110 from a tokenized description. 1225 00:38:17,270 --> 00:38:18,442 But they're talking about 1226 00:38:18,496 --> 00:38:20,618 generating the pixels suck in. 1227 00:38:20,784 --> 00:38:21,258 Wow. 
1228 00:38:21,344 --> 00:38:22,474 I would like to understand how 1229 00:38:22,512 --> 00:38:23,686 lighting, because lighting has 1230 00:38:23,728 --> 00:38:25,006 always been the biggest part in 1231 00:38:25,028 --> 00:38:27,534 doing this, right? And how that 1232 00:38:27,572 --> 00:38:28,606 applies to it. But I would say 1233 00:38:28,628 --> 00:38:30,174 I used to have 1234 00:38:30,212 --> 00:38:31,326 unconditional love for 1235 00:38:31,348 --> 00:38:33,054 technology innovation ten 1236 00:38:33,092 --> 00:38:34,414 years ago. Everything that is 1237 00:38:34,452 --> 00:38:36,114 faster is always 1238 00:38:36,152 --> 00:38:37,940 better. But I would say that 1239 00:38:38,470 --> 00:38:41,182 post social media and YouTube 1240 00:38:41,246 --> 00:38:42,898 and all of that stuff, I have become 1241 00:38:42,984 --> 00:38:44,226 a little bit concerned that we 1242 00:38:44,248 --> 00:38:45,634 really have to understand what 1243 00:38:45,672 --> 00:38:47,378 this technology is going to do. 1244 00:38:47,464 --> 00:38:48,854 And nothing scares me more than 1245 00:38:48,892 --> 00:38:50,902 that about generative AI. It's like, 1246 00:38:51,036 --> 00:38:52,566 okay, it is a cool piece of 1247 00:38:52,588 --> 00:38:55,186 innovation, but unlike a faster 1248 00:38:55,218 --> 00:38:56,326 chip, which was almost a no 1249 00:38:56,348 --> 00:38:57,478 brainer, I think now the 1250 00:38:57,484 --> 00:38:59,466 question is like, oh, okay, what 1251 00:38:59,488 --> 00:39:00,758 is it going to do that we can't 1252 00:39:00,774 --> 00:39:02,058 even think about today? 1253 00:39:02,224 --> 00:39:04,362 Yeah, I'm with you. And I get 1254 00:39:04,416 --> 00:39:08,218 the hesitancy and the thinking 1255 00:39:08,384 --> 00:39:10,330 part of our population calling 1256 00:39:10,400 --> 00:39:12,494 for a moratorium, kind of a six 1257 00:39:12,532 --> 00:39:15,374 month pause.
Knowing what I know 1258 00:39:15,412 --> 00:39:17,326 about geeks and engineers, even 1259 00:39:17,428 --> 00:39:18,574 knowing what I know about me, 1260 00:39:18,612 --> 00:39:20,446 that's not going to happen. I 1261 00:39:20,468 --> 00:39:21,920 found a quote. It's from 1262 00:39:22,370 --> 00:39:25,760 digitalnative.substack.com, and 1263 00:39:26,930 --> 00:39:28,386 here's a quote from it: "In 1264 00:39:28,408 --> 00:39:31,010 a talk with Sequoia last week, 1265 00:39:31,160 --> 00:39:34,500 Nvidia CEO Jensen Huang said, 1266 00:39:34,870 --> 00:39:36,534 'Every single pixel will be 1267 00:39:36,572 --> 00:39:38,514 generated soon. Not rendered, 1268 00:39:38,642 --> 00:39:41,318 generated.'" And that was from, I 1269 00:39:41,324 --> 00:39:43,766 think, at the time of this 1270 00:39:43,788 --> 00:39:44,806 recording, it's either the 1271 00:39:44,828 --> 00:39:46,614 latest from Digital Native or 1272 00:39:46,732 --> 00:39:48,550 next to the latest. 1273 00:39:50,250 --> 00:39:51,706 And he's a very smart guy, and 1274 00:39:51,728 --> 00:39:53,514 he knows the stuff better than 1275 00:39:53,552 --> 00:39:55,126 anybody out there. So if he's 1276 00:39:55,158 --> 00:39:56,762 saying it, I believe it. Okay? 1277 00:39:56,816 --> 00:39:56,986 Yeah. 1278 00:39:57,008 --> 00:39:59,738 But I don't know if everybody 1279 00:39:59,824 --> 00:40:02,400 gets how big of a leap that is 1280 00:40:02,850 --> 00:40:05,182 to go from what we were doing 1281 00:40:05,236 --> 00:40:10,478 before to generating pixels. I 1282 00:40:10,484 --> 00:40:12,478 don't know. Maybe I'm making too 1283 00:40:12,484 --> 00:40:13,754 much out of it, but it boggles 1284 00:40:13,802 --> 00:40:14,510 my mind. 1285 00:40:14,660 --> 00:40:16,046 At least the game logic and 1286 00:40:16,068 --> 00:40:17,266 stuff is going there. I was 1287 00:40:17,288 --> 00:40:18,546 reading something about how they 1288 00:40:18,568 --> 00:40:20,946 put these AI agents in the game.
1289 00:40:20,968 --> 00:40:22,146 In this paper that came out, I 1290 00:40:22,168 --> 00:40:23,202 think, two or three days back, 1291 00:40:23,256 --> 00:40:25,346 they figured out how to do a 1292 00:40:25,368 --> 00:40:26,706 Valentine's Day party in there 1293 00:40:26,728 --> 00:40:27,954 and invite people, and all that 1294 00:40:27,992 --> 00:40:29,734 sort of social mechanics were 1295 00:40:29,772 --> 00:40:30,598 happening in that sort of 1296 00:40:30,604 --> 00:40:31,526 generative way. So they're not 1297 00:40:31,548 --> 00:40:34,150 pre-programming these. Goodness. 1298 00:40:34,810 --> 00:40:36,182 That was also sort of crazy 1299 00:40:36,236 --> 00:40:37,126 fascinating, because when it 1300 00:40:37,148 --> 00:40:38,486 comes to game development and 1301 00:40:38,508 --> 00:40:39,766 the gaming experience, you can 1302 00:40:39,788 --> 00:40:42,718 have a truly sort of a multi-plot 1303 00:40:42,754 --> 00:40:44,218 storyline of any sort and you 1304 00:40:44,224 --> 00:40:45,366 don't know how everybody's 1305 00:40:45,398 --> 00:40:46,266 gaming experience is going to be 1306 00:40:46,288 --> 00:40:47,386 different. I mean, there was a 1307 00:40:47,408 --> 00:40:49,290 time when you would 1308 00:40:49,360 --> 00:40:50,506 code those things and make 1309 00:40:50,528 --> 00:40:51,242 people have a different 1310 00:40:51,296 --> 00:40:51,862 experience. 1311 00:40:52,016 --> 00:40:54,206 Right. One more thing and then 1312 00:40:54,228 --> 00:40:55,822 we'll shut up. Do you remember 1313 00:40:55,876 --> 00:40:58,254 that Black Mirror episode where 1314 00:40:58,372 --> 00:41:00,894 the lady's social score was just 1315 00:41:00,932 --> 00:41:02,494 crashing as she went to some 1316 00:41:02,532 --> 00:41:04,674 gathering and by the time she 1317 00:41:04,712 --> 00:41:06,338 got there, she couldn't get in 1318 00:41:06,504 --> 00:41:07,666 because she didn't have a high 1319 00:41:07,688 --> 00:41:11,426 enough social score?
It's like 1320 00:41:11,528 --> 00:41:13,314 we're getting to more and more 1321 00:41:13,352 --> 00:41:16,174 to that point in some place. 1322 00:41:16,232 --> 00:41:16,982 Of the world that already 1323 00:41:17,036 --> 00:41:17,750 exists. 1324 00:41:22,330 --> 00:41:23,778 Your next fill in the blank. I'm 1325 00:41:23,794 --> 00:41:24,214 sorry. 1326 00:41:24,332 --> 00:41:26,710 Sure. No, it's all good stuff. 1327 00:41:26,860 --> 00:41:28,278 That's what makes this podcast I 1328 00:41:28,284 --> 00:41:29,298 mean, dare I say, it makes this 1329 00:41:29,324 --> 00:41:30,230 podcast look cool. But that's 1330 00:41:30,230 --> 00:41:30,794 what makes this field 1331 00:41:30,832 --> 00:41:32,234 interesting. Right. It's not 1332 00:41:32,272 --> 00:41:33,574 just about the bits anymore. 1333 00:41:33,622 --> 00:41:36,378 Right. There's social 1334 00:41:36,464 --> 00:41:38,586 connotations. Now, I used to 1335 00:41:38,608 --> 00:41:40,954 also be unquestioned fan of 1336 00:41:40,992 --> 00:41:42,294 technology. It makes everything 1337 00:41:42,352 --> 00:41:43,406 better, it's going to solve the 1338 00:41:43,428 --> 00:41:45,374 problems. And here we are some 1339 00:41:45,412 --> 00:41:47,166 ten years later. It's like we 1340 00:41:47,188 --> 00:41:48,158 actually created a whole bunch 1341 00:41:48,164 --> 00:41:53,218 of new problems. And I don't 1342 00:41:53,224 --> 00:41:55,186 know, is that maturity? I'm ten 1343 00:41:55,208 --> 00:41:57,620 years older, or is that kind of 1344 00:41:58,550 --> 00:42:01,554 the state of the technology that 1345 00:42:01,592 --> 00:42:05,234 we have created? No, I'll let 1346 00:42:05,272 --> 00:42:06,546 the philosophers debate that 1347 00:42:06,568 --> 00:42:07,140 one. 
1348 00:42:08,070 --> 00:42:09,494 I think it's evolution because 1349 00:42:09,532 --> 00:42:10,994 we went from building the basic 1350 00:42:11,042 --> 00:42:12,006 infrastructure, which is like 1351 00:42:12,028 --> 00:42:13,526 chips and compute and stuff, and 1352 00:42:13,548 --> 00:42:15,586 those things are, but we didn't 1353 00:42:15,618 --> 00:42:16,998 see that end application to the 1354 00:42:17,004 --> 00:42:18,166 level that we are seeing now. So 1355 00:42:18,188 --> 00:42:20,454 it's better building technology. 1356 00:42:20,572 --> 00:42:21,866 Bricks, cement, all that is 1357 00:42:21,888 --> 00:42:22,986 good, but suddenly cities are 1358 00:42:23,008 --> 00:42:24,794 built that look crazy and 1359 00:42:24,832 --> 00:42:25,914 whatever, and you're like, oh, 1360 00:42:25,952 --> 00:42:27,114 is that what we're trying to do 1361 00:42:27,152 --> 00:42:30,666 here? So I think it's just 1362 00:42:30,688 --> 00:42:31,854 applying technology to every 1363 00:42:31,892 --> 00:42:32,800 kind of problem. 1364 00:42:33,730 --> 00:42:35,886 Yeah, no, absolutely. All right. 1365 00:42:35,908 --> 00:42:37,454 And our third and final complete 1366 00:42:37,492 --> 00:42:39,326 the sentence: I look forward to 1367 00:42:39,348 --> 00:42:40,766 the day when I can use technology 1368 00:42:40,868 --> 00:42:42,030 to blank. 1369 00:42:46,050 --> 00:42:49,266 I think that I would say, like, 1370 00:42:49,288 --> 00:42:50,498 the self-driving car. I think it 1371 00:42:50,504 --> 00:42:51,826 would save me a good bunch of 1372 00:42:51,848 --> 00:42:53,054 time if it really becomes 1373 00:42:53,102 --> 00:42:54,286 reliable. I don't know if I'll 1374 00:42:54,318 --> 00:42:55,380 trust it, but. 1375 00:42:57,370 --> 00:42:59,010 Why is it that all the engineers 1376 00:42:59,090 --> 00:43:00,722 are suspicious of self-driving 1377 00:43:00,786 --> 00:43:01,510 cars? 1378 00:43:04,170 --> 00:43:05,686 I'll be candid with you.
At some 1379 00:43:05,708 --> 00:43:06,578 level I feel like when I'm 1380 00:43:06,594 --> 00:43:08,134 driving the car, it's not that 1381 00:43:08,172 --> 00:43:09,298 much work. And I'm sitting 1382 00:43:09,314 --> 00:43:10,746 there, I might as well just have 1383 00:43:10,768 --> 00:43:11,786 my hand on, because most of the 1384 00:43:11,808 --> 00:43:13,786 stuff has gotten automated, like 1385 00:43:13,968 --> 00:43:16,022 adaptive cruise and lane 1386 00:43:16,086 --> 00:43:17,706 management and stuff. So, yeah, 1387 00:43:17,808 --> 00:43:18,602 not that much work. 1388 00:43:18,656 --> 00:43:22,302 But yeah, adaptive cruise 1389 00:43:22,356 --> 00:43:25,886 control is not self-driving in 1390 00:43:25,908 --> 00:43:27,358 the purest sense, but I will 1391 00:43:27,364 --> 00:43:29,902 tell you, I can't live without 1392 00:43:29,956 --> 00:43:32,894 it. Now, my car, this is first 1393 00:43:32,932 --> 00:43:34,350 world problems, right? My car 1394 00:43:34,420 --> 00:43:38,226 will brake all the way to 0. 1395 00:43:38,248 --> 00:43:39,602 My wife's car will stop at around 1396 00:43:39,656 --> 00:43:42,766 22. When I'm stuck in a traffic 1397 00:43:42,798 --> 00:43:44,338 jam in my car, I wouldn't say 1398 00:43:44,344 --> 00:43:46,226 it's no big deal, but I can just 1399 00:43:46,248 --> 00:43:48,294 kind of sit back and let the car 1400 00:43:48,332 --> 00:43:50,546 handle the braking. In my wife's 1401 00:43:50,578 --> 00:43:52,294 car, when it goes below a certain 1402 00:43:52,332 --> 00:43:53,846 speed, now I'm on the hook. It 1403 00:43:53,868 --> 00:43:55,366 actually is annoying, which is 1404 00:43:55,388 --> 00:44:01,850 pretty funny. But next question. 1405 00:44:01,920 --> 00:44:02,582 Andy? 1406 00:44:02,726 --> 00:44:04,586 I'm in. Yeah, I was going to 1407 00:44:04,608 --> 00:44:05,654 interrupt you there, Frank.
1408 00:44:05,702 --> 00:44:07,386 Before I yell at you to put 1409 00:44:07,408 --> 00:44:09,180 down the shovel and climb out, 1410 00:44:12,770 --> 00:44:14,142 share something different about 1411 00:44:14,196 --> 00:44:16,298 yourself, Saket. But we remind 1412 00:44:16,394 --> 00:44:19,806 our interviewees that it's a 1413 00:44:19,828 --> 00:44:21,758 family podcast, so we want to 1414 00:44:21,764 --> 00:44:24,158 keep our clean rating. So 1415 00:44:24,324 --> 00:44:25,710 something different about you. 1416 00:44:25,780 --> 00:44:27,380 And you already mentioned a few. 1417 00:44:28,790 --> 00:44:32,526 Yeah, from a family perspective, 1418 00:44:32,638 --> 00:44:34,594 I'm a dad of three kids, two 1419 00:44:34,632 --> 00:44:37,766 boys who are 13 and ten and a 1420 00:44:37,788 --> 00:44:41,046 daughter who is six. And there 1421 00:44:41,068 --> 00:44:43,622 is definitely a joy to seeing 1422 00:44:43,676 --> 00:44:45,526 all of that happen. So, yeah, 1423 00:44:45,708 --> 00:44:47,126 I'm a pretty regular individual 1424 00:44:47,228 --> 00:44:49,900 in that way, but definitely 1425 00:44:51,150 --> 00:44:54,762 bitten by the desire to build. 1426 00:44:54,816 --> 00:44:56,266 And I'm like, I see a problem, I 1427 00:44:56,288 --> 00:44:57,526 have to solve it. And that's 1428 00:44:57,558 --> 00:44:59,162 kind of what landed me in this 1429 00:44:59,296 --> 00:45:00,958 entrepreneur boat. Okay. 1430 00:45:01,044 --> 00:45:02,720 Yeah, that's awesome. 1431 00:45:04,930 --> 00:45:09,470 Awesome. And the next question, 1432 00:45:09,540 --> 00:45:10,622 although technically speaking, 1433 00:45:10,676 --> 00:45:11,966 it's out of order, so I need to 1434 00:45:11,988 --> 00:45:14,802 fix that. Audible sponsors Data 1435 00:45:14,856 --> 00:45:17,342 Driven. Do you do audiobooks? 1436 00:45:17,486 --> 00:45:19,586 And if so, can you recommend a 1437 00:45:19,608 --> 00:45:21,266 good one?
And if you don't do 1438 00:45:21,288 --> 00:45:23,394 audiobooks, any book 1439 00:45:23,432 --> 00:45:24,740 recommendation will do. 1440 00:45:25,670 --> 00:45:27,458 I do actually love them. That's 1441 00:45:27,474 --> 00:45:28,886 where the driving part comes in, 1442 00:45:28,908 --> 00:45:30,486 right? I mean, the way to 1443 00:45:30,668 --> 00:45:32,226 make the most of your driving 1444 00:45:32,258 --> 00:45:35,366 time is to be listening to a 1445 00:45:35,388 --> 00:45:38,742 book. I do a lot of technical 1446 00:45:38,806 --> 00:45:40,666 stuff, actually. I think the 1447 00:45:40,688 --> 00:45:45,180 most recent book that I was 1448 00:45:45,630 --> 00:45:48,986 listening to was... I'm just 1449 00:45:49,008 --> 00:45:50,082 looking back at my bookshelf 1450 00:45:50,086 --> 00:45:51,114 because I always get a physical 1451 00:45:51,162 --> 00:45:52,240 copy as well. 1452 00:45:54,290 --> 00:45:56,000 I do the same thing. 1453 00:45:56,610 --> 00:46:01,754 Yeah, I think my most recent 1454 00:46:01,802 --> 00:46:05,278 book that I really enjoyed was, I 1455 00:46:05,284 --> 00:46:06,978 think, the book called Sapiens. I 1456 00:46:06,984 --> 00:46:08,114 don't know if you have read it. Yes. 1457 00:46:08,152 --> 00:46:09,250 Oh, I've heard of that. 1458 00:46:09,320 --> 00:46:11,650 Yeah, it is an older book, but 1459 00:46:11,800 --> 00:46:14,210 yeah, I got to it more recently. 1460 00:46:14,710 --> 00:46:15,526 I've got it. 1461 00:46:15,548 --> 00:46:18,514 Haven't read it yet. That author 1462 00:46:18,562 --> 00:46:20,246 wrote something else that I 1463 00:46:20,268 --> 00:46:24,054 read. Was it Guns, Germs, and 1464 00:46:24,092 --> 00:46:25,720 Steel or something like that? 1465 00:46:28,570 --> 00:46:30,038 That's an amazing book. It's a 1466 00:46:30,044 --> 00:46:31,354 difficult read, I would say. 1467 00:46:31,392 --> 00:46:32,362 That's why it is.
1468 00:46:32,496 --> 00:46:34,474 We can sit back and listen to 1469 00:46:34,512 --> 00:46:34,714 it. 1470 00:46:34,752 --> 00:46:37,210 Yeah, crank that dude up to 1.25 1471 00:46:37,280 --> 00:46:38,330 and let it go. 1472 00:46:38,480 --> 00:46:41,342 Yeah, exactly. I actually just 1473 00:46:41,396 --> 00:46:43,934 finished The Wolf of Wall 1474 00:46:43,972 --> 00:46:46,906 Street, the abridged version, 1475 00:46:47,018 --> 00:46:49,566 and I'd seen the movie, and I 1476 00:46:49,588 --> 00:46:50,734 follow a lot of the other things 1477 00:46:50,772 --> 00:46:52,766 that Jordan Belfort had kind of 1478 00:46:52,788 --> 00:46:55,026 done. But after listening to the 1479 00:46:55,048 --> 00:46:56,366 story, there's a lot that didn't 1480 00:46:56,398 --> 00:46:58,322 make it into the movie. And the 1481 00:46:58,376 --> 00:47:00,354 impression I'm left with is that 1482 00:47:00,552 --> 00:47:02,274 truth really is stranger than 1483 00:47:02,312 --> 00:47:06,082 fiction. I think they didn't put 1484 00:47:06,136 --> 00:47:07,686 some of that stuff in because no 1485 00:47:07,708 --> 00:47:11,526 one would have believed it. And 1486 00:47:11,548 --> 00:47:12,726 if you've seen the movie, it's a 1487 00:47:12,748 --> 00:47:14,214 pretty wild story anyway. So 1488 00:47:14,252 --> 00:47:16,182 there's, like, even crazier stuff 1489 00:47:16,236 --> 00:47:16,758 in there. 1490 00:47:16,844 --> 00:47:17,762 Oh, my goodness. 1491 00:47:17,906 --> 00:47:19,770 Yeah. So it's hard to imagine. 1492 00:47:21,150 --> 00:47:21,658 There's a 1493 00:47:21,664 --> 00:47:23,334 second edition of The Black Swan 1494 00:47:23,382 --> 00:47:23,978 out. 1495 00:47:24,144 --> 00:47:24,906 Oh, really? 1496 00:47:25,008 --> 00:47:26,474 Yeah, it's got some extra stuff 1497 00:47:26,512 --> 00:47:29,066 in it.
There's definitely an 1498 00:47:29,088 --> 00:47:30,366 appendix, I think a brand new 1499 00:47:30,388 --> 00:47:31,918 appendix at the end where he 1500 00:47:31,924 --> 00:47:33,214 talks about implications and 1501 00:47:33,252 --> 00:47:35,694 applications. But there's new 1502 00:47:35,732 --> 00:47:36,846 stuff all the way, even in the 1503 00:47:36,868 --> 00:47:39,054 introduction. He added some. 1504 00:47:39,092 --> 00:47:43,086 Nassim Nicholas Taleb, and I 1505 00:47:43,108 --> 00:47:44,334 believe it's the first book of his 1506 00:47:44,372 --> 00:47:47,780 Incerto. So very interesting 1507 00:47:48,230 --> 00:47:49,540 listening to that. 1508 00:47:50,870 --> 00:47:52,798 I have to pick up the new Black 1509 00:47:52,814 --> 00:47:53,746 Swan. That'll be interesting. 1510 00:47:53,848 --> 00:47:55,846 Yeah. As a product person, I 1511 00:47:55,868 --> 00:47:58,006 actually love also listening to 1512 00:47:58,028 --> 00:48:00,278 a lot of these books on the 1513 00:48:00,284 --> 00:48:01,846 business side of things. So I 1514 00:48:01,868 --> 00:48:02,902 was listening to this book 1515 00:48:02,956 --> 00:48:04,662 called The Ultimate Sales 1516 00:48:04,716 --> 00:48:08,822 Machine maybe a couple of months 1517 00:48:08,876 --> 00:48:10,354 back. And what I really enjoyed 1518 00:48:10,402 --> 00:48:12,066 was that while you're listening 1519 00:48:12,098 --> 00:48:13,466 to these books in the car and 1520 00:48:13,488 --> 00:48:14,906 you have this free time, which 1521 00:48:14,928 --> 00:48:16,186 is so hard to get, by the way, 1522 00:48:16,208 --> 00:48:19,034 these days, that gives me the 1523 00:48:19,072 --> 00:48:21,994 space to think about things and 1524 00:48:22,032 --> 00:48:23,306 reconstruct my ideas and 1525 00:48:23,328 --> 00:48:24,606 thoughts.
Most of the ideas that 1526 00:48:24,628 --> 00:48:26,506 I get about maybe writing a blog 1527 00:48:26,538 --> 00:48:28,062 post and stuff like that happen 1528 00:48:28,196 --> 00:48:29,786 typically in the car, listening 1529 00:48:29,818 --> 00:48:31,006 to something that triggers some 1530 00:48:31,028 --> 00:48:32,994 idea in my head and do that 1531 00:48:33,032 --> 00:48:34,146 because otherwise the whole day 1532 00:48:34,168 --> 00:48:35,774 is like meetings and chasing 1533 00:48:35,822 --> 00:48:36,866 stuff and whatever. 1534 00:48:37,048 --> 00:48:37,780 Sure. 1535 00:48:39,030 --> 00:48:43,586 Awesome. That's a good point. 1536 00:48:43,688 --> 00:48:46,482 It's interesting how it always 1537 00:48:46,536 --> 00:48:47,626 good to stretch your brain, 1538 00:48:47,678 --> 00:48:49,960 because our brain is pretty much 1539 00:48:50,970 --> 00:48:53,558 I'm not physically fit, but I 1540 00:48:53,564 --> 00:48:54,678 like to think my brain is, at 1541 00:48:54,684 --> 00:48:57,320 least. It's just good to kind of 1542 00:48:57,690 --> 00:48:59,266 I get a lot out of just walking 1543 00:48:59,298 --> 00:49:00,774 around or driving and kind of 1544 00:49:00,812 --> 00:49:02,006 thinking about things and 1545 00:49:02,028 --> 00:49:03,226 listening to books that are kind 1546 00:49:03,248 --> 00:49:05,146 of outside my norm. Right. Which 1547 00:49:05,168 --> 00:49:06,266 is probably why I picked up The 1548 00:49:06,288 --> 00:49:08,086 Wolf of Wall Street because I'd 1549 00:49:08,118 --> 00:49:09,206 been doing so many technical 1550 00:49:09,238 --> 00:49:10,358 books and kind of sales books, 1551 00:49:10,374 --> 00:49:12,394 and I was like, Let me check 1552 00:49:12,432 --> 00:49:13,200 that out. 
1553 00:49:14,610 --> 00:49:16,414 Sometimes reading something or 1554 00:49:16,532 --> 00:49:17,694 listening to something in one 1555 00:49:17,732 --> 00:49:18,718 context, which is completely 1556 00:49:18,804 --> 00:49:20,026 different, suddenly starts 1557 00:49:20,058 --> 00:49:21,086 connecting to all the stuff that 1558 00:49:21,108 --> 00:49:22,880 we are doing, like, day to day. 1559 00:49:24,150 --> 00:49:25,666 Absolutely. Yeah. 1560 00:49:25,688 --> 00:49:26,980 I totally get that. 1561 00:49:28,630 --> 00:49:33,362 Well, cool. So with that, is 1562 00:49:33,416 --> 00:49:36,146 there anywhere people can 1563 00:49:36,168 --> 00:49:38,580 find you? I know at nexla.com. 1564 00:49:39,270 --> 00:49:41,682 Yeah. nexla.com. And on LinkedIn, 1565 00:49:41,746 --> 00:49:43,846 actually, I do engage with a lot 1566 00:49:43,868 --> 00:49:45,174 of folks in conversations over 1567 00:49:45,212 --> 00:49:46,134 there as well. 1568 00:49:46,252 --> 00:49:47,170 Excellent. 1569 00:49:47,330 --> 00:49:48,806 Awesome. Well, thank you for 1570 00:49:48,828 --> 00:49:50,006 being on the show. And we'll let 1571 00:49:50,028 --> 00:49:52,454 Bailey wrap us up now. 1572 00:49:52,492 --> 00:49:54,566 That was some show. Is it me, or 1573 00:49:54,588 --> 00:49:56,406 are the shows getting better? It 1574 00:49:56,428 --> 00:49:58,006 could be my bias that leads me 1575 00:49:58,028 --> 00:49:59,686 to say that. But I figured I 1576 00:49:59,708 --> 00:50:01,546 would ask to get more input. 1577 00:50:01,738 --> 00:50:03,982 After all, what's an AI without 1578 00:50:04,036 --> 00:50:06,042 good input and a feedback loop? 1579 00:50:06,186 --> 00:50:07,966 Speaking of feedback, have you 1580 00:50:07,988 --> 00:50:09,594 checked out Data Driven Magazine 1581 00:50:09,642 --> 00:50:09,740 yet?