1 00:00:00,240 --> 00:00:03,919 Hello, listeners. And welcome back to another thrilling episode 2 00:00:03,919 --> 00:00:07,680 of Data Driven. In today's episode, we delve deep into the 3 00:00:07,680 --> 00:00:11,519 fascinating and, let's be honest, slightly terrifying world of 4 00:00:11,519 --> 00:00:14,985 generative AI and security risks. Joining us is 5 00:00:14,985 --> 00:00:18,605 Niv Braun, cofounder and CEO of Noma Security, 6 00:00:18,904 --> 00:00:22,425 who's on the front lines of keeping your AI-driven projects safe from 7 00:00:22,425 --> 00:00:26,220 digital mischief. So grab a cuppa and let's get data 8 00:00:31,259 --> 00:00:35,100 driven. Well, hello, and welcome back to Data Driven, the podcast where we explore 9 00:00:35,100 --> 00:00:38,460 the emergent fields of AI, data science, and, of course, data 10 00:00:38,460 --> 00:00:41,875 engineering. Speaking of data engineering, my favoritest data 11 00:00:41,875 --> 00:00:45,714 engineer in the world can't make it today. But we 12 00:00:45,714 --> 00:00:49,335 have an exciting conversation queued up with Niv Braun, 13 00:00:49,635 --> 00:00:53,220 who is the cofounder and CEO of Noma. Noma 14 00:00:53,220 --> 00:00:56,840 is a security firm that focuses on, effectively, 15 00:00:57,460 --> 00:01:01,140 he'll describe it more eloquently than I can, but effectively thinks about 16 00:01:01,140 --> 00:01:04,740 security in the context of data and AI across the 17 00:01:04,740 --> 00:01:08,185 entire life cycle. Welcome to the show, Niv. Hey, 18 00:01:08,185 --> 00:01:11,865 Frank. Happy to hear you, bro. Yeah. It's good to have 19 00:01:11,865 --> 00:01:15,625 you. And security is one of those things that I've been thinking about more 20 00:01:15,625 --> 00:01:19,085 lately. Right? So my background was as a software engineer and, 21 00:01:19,225 --> 00:01:22,970 you know, software engineers historically have not thought of 22 00:01:22,970 --> 00:01:26,810 security. 
Then I made the transition into data engineering and data 23 00:01:26,810 --> 00:01:30,490 science, and, traditionally, security is not really at top 24 00:01:30,490 --> 00:01:34,090 of mind, for them either. Now I 25 00:01:34,090 --> 00:01:37,854 kinda look at this, and I kinda look at the landscape that we're in where 26 00:01:37,854 --> 00:01:40,354 enterprises are deploying LLMs, 27 00:01:41,455 --> 00:01:44,915 generative AI solutions, on top of the predictive AI solutions, 28 00:01:45,935 --> 00:01:49,760 fast and furiously, and not thinking about 29 00:01:49,760 --> 00:01:53,520 security ramifications. So what are your what's your take on 30 00:01:53,520 --> 00:01:57,360 that? 100% agree. I think that, it's 31 00:01:57,360 --> 00:02:00,820 even like the the the the current, like, timing is even more fascinating 32 00:02:01,120 --> 00:02:04,725 than the than just, like, a new technology. Because exactly like you said, like, 33 00:02:04,725 --> 00:02:08,565 Frank, like, we all like the data practitioners. We all know that, like, security is 34 00:02:08,565 --> 00:02:11,445 not, like, our top priority. And by the way, like, by, like, like, this is, 35 00:02:11,445 --> 00:02:14,405 like, how it should be. Like, we are focusing on the business and, like, drive, 36 00:02:14,405 --> 00:02:18,165 like, drive, like, the business forward. And this is why we're, like, this is 37 00:02:18,165 --> 00:02:21,840 what we're paid for. The problem is that 38 00:02:22,140 --> 00:02:25,820 because we're not, like, in this kind of, like, mindset, we also, like, like 39 00:02:25,820 --> 00:02:29,660 any technologies in the company, also, like, create some risk. What we see right 40 00:02:29,660 --> 00:02:33,260 now is the LLM drive, which is pretty cool, is that for the 41 00:02:33,260 --> 00:02:37,085 first time, the security teams started to put 42 00:02:37,085 --> 00:02:40,765 the focus and, like, the spotlight on the data and AI teams. 
Because until 43 00:02:40,765 --> 00:02:44,525 now, let's be honest, they were focusing only on the software developers and 44 00:02:44,525 --> 00:02:48,205 their SDLC and the CI/CD and all these areas. Like, we were, 45 00:02:48,205 --> 00:02:51,300 like, you know, like, in the shadow. And we were, like, able, like, to act 46 00:02:51,300 --> 00:02:55,000 exactly, like, completely freely as we wanted. 47 00:02:55,620 --> 00:02:58,819 But now when, like, the security teams start, like, to put the spotlight on the 48 00:02:58,819 --> 00:03:02,355 data and AI teams, what they understand is that it's not 49 00:03:02,355 --> 00:03:06,115 only this new kind of LLM threats, but also all 50 00:03:06,115 --> 00:03:09,655 the basic principles of security are not implemented 51 00:03:09,955 --> 00:03:13,795 in the data engineering and the data science teams. Nobody, like, scans all the 52 00:03:13,795 --> 00:03:17,610 code in our notebooks, for example, unlike the software developers that, like, all 53 00:03:17,610 --> 00:03:21,209 their code is being scanned. Nobody helps us to 54 00:03:21,209 --> 00:03:24,730 find misconfigurations in our data pipelines or our 55 00:03:24,730 --> 00:03:28,269 MLOps tools or our AI platforms, like Databricks, for example. 56 00:03:28,505 --> 00:03:31,805 Like, nobody, like, provides us this ability to find it easily, 57 00:03:32,185 --> 00:03:35,945 unlike, again, the software developers, who receive all this coverage 58 00:03:35,945 --> 00:03:39,485 and everything. Like, the moment that they have, like, the smallest misconfiguration 59 00:03:39,945 --> 00:03:43,480 in their SCM or their CI/CD, they 60 00:03:43,480 --> 00:03:47,320 will immediately, like, receive, like, a notification, like, 61 00:03:47,320 --> 00:03:51,080 helping them exactly, like, how to secure it.
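The point above about nobody scanning the code in notebooks can be made concrete with a small sketch. It assumes a notebook in the standard `.ipynb` JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`); the two regex rules and the function name `scan_notebook` are hypothetical illustrations, not the rule set of any real scanner or of the product discussed in the episode.

```python
import json
import re

# Hypothetical rules for illustration only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "hardcoded_credential": re.compile(
        r"(?i)(password|secret|token|api_key)\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_notebook(notebook_json: str) -> list[tuple[int, str]]:
    """Return (cell_index, rule_name) for every code cell that trips a rule."""
    notebook = json.loads(notebook_json)
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # Markdown and raw cells are skipped; only executable code is checked.
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(source):
                findings.append((index, rule_name))
    return findings
```

Calling `scan_notebook(open("analysis.ipynb").read())` (file name assumed) returns the indices of cells worth a second look; pattern matching of this kind only catches the obvious cases and says nothing about what the code actually does.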
And also eventually, 62 00:03:51,080 --> 00:03:54,855 like, in the runtime, in the software life cycle, in 63 00:03:54,855 --> 00:03:58,535 classic, like, software applications, we also have a lot of API security and web 64 00:03:58,535 --> 00:04:02,295 application firewall tools that help us to protect the application in the 65 00:04:02,295 --> 00:04:06,010 runtime. But now specifically in LLM, this is, like, very 66 00:04:06,010 --> 00:04:09,630 related also, like, to what you said. Like, there are new kinds of adversarial attacks, 67 00:04:10,090 --> 00:04:13,790 all the prompt injection and model jailbreak and stuff like that. 68 00:04:13,850 --> 00:04:17,610 And, again, nobody, like, helps us to protect against it, like, in real time. And 69 00:04:17,610 --> 00:04:20,565 I think that this is, like, one of, like, the main shifts that we see 70 00:04:20,565 --> 00:04:24,264 today in this area. We understand that the spotlight 71 00:04:24,405 --> 00:04:27,604 moved to the data and AI teams, but we need to make sure that we 72 00:04:27,604 --> 00:04:31,125 do, like, both. Like, we start with, like, the new kind of, like, 73 00:04:31,125 --> 00:04:34,824 trendy, like, risks that we want to make sure that we are protected from. 74 00:04:35,099 --> 00:04:38,699 But also that for the first time, after a lot of years, we're 75 00:04:38,699 --> 00:04:42,060 starting also, like, to implement the basic security measures 76 00:04:42,060 --> 00:04:45,900 needed in our area. But the most important thing, of course, 77 00:04:45,900 --> 00:04:49,335 is to continue and, like, do it without slowing us down.
Like, we need to 78 00:04:49,335 --> 00:04:53,175 make sure that, like, everything, like, all the different, like, security measures that 79 00:04:53,175 --> 00:04:57,015 we take still provide us the ability to move fast, to enable 80 00:04:57,015 --> 00:05:00,775 the data science and the data engineering teams to 81 00:05:00,775 --> 00:05:04,315 continue and, like, innovate, but in a secure way. 82 00:05:04,990 --> 00:05:08,830 You know, that's a good point because I never thought about scanning a notebook for 83 00:05:08,830 --> 00:05:12,430 errors. Right? Shame on me. Right? Like for code 84 00:05:12,430 --> 00:05:16,030 security. I mean, not errors, but, you know, security vulnerabilities. That's not something 85 00:05:16,030 --> 00:05:19,469 that I have seen done in practice. I mean, the 86 00:05:19,469 --> 00:05:23,125 closest I've seen where security has been an issue for 87 00:05:23,125 --> 00:05:25,065 anyone in this space is, 88 00:05:27,205 --> 00:05:30,645 basically using protected, you know, Python 89 00:05:30,645 --> 00:05:34,405 libraries, right, or Python library repos, right, where those 90 00:05:34,405 --> 00:05:38,080 are scanned by, I forget the name of the third party that'll do it, where 91 00:05:38,080 --> 00:05:41,919 you just basically point your Python instance to there. Yeah. Because 92 00:05:41,919 --> 00:05:45,599 I also think that... Internal Artifactory. Yes, exactly. So 93 00:05:45,599 --> 00:05:49,405 like, what, because I often 94 00:05:49,405 --> 00:05:51,104 wonder, you know, people just like to install. 95 00:05:53,645 --> 00:05:57,245 God only knows what's in there. I can tell that, like, it already, like, happens. 96 00:05:57,245 --> 00:06:00,625 Like, I don't know if you heard, but for example, like, 97 00:06:01,324 --> 00:06:04,900 like, pretty recently, PyTorch, for example. Right. 98 00:06:04,900 --> 00:06:08,740 PyTorch that we all know was compromised. We all know and love.
Well, most people 99 00:06:08,740 --> 00:06:12,260 love. It was compromised. Like, in a specific version of PyTorch, a 100 00:06:12,260 --> 00:06:15,794 malicious actor succeeded in putting some 101 00:06:15,794 --> 00:06:17,815 code inside that basically 102 00:06:19,395 --> 00:06:22,755 collected all the secrets and tokens that you have in the 103 00:06:22,755 --> 00:06:26,435 environment and sent them over DNS. Now we all 104 00:06:26,435 --> 00:06:30,180 know, like, how many downloads, like, PyTorch has. 105 00:06:30,500 --> 00:06:34,340 And most times, PyTorch is downloaded, like, to 106 00:06:34,340 --> 00:06:37,720 all these different, like, notebooks, wherever they may be, Jupyter, 107 00:06:38,100 --> 00:06:41,320 SageMaker, Databricks, like, we all use them. 108 00:06:41,620 --> 00:06:44,100 And I can tell that, like, it caused a lot of, like, 109 00:06:44,100 --> 00:06:47,895 problems. I can tell, like, firsthand, like, we saw, like, a lot 110 00:06:47,895 --> 00:06:51,275 of organizations that were compromised because of this attack. 111 00:06:52,375 --> 00:06:55,175 And it happens all the time. And by the way, if you mentioned, for example, 112 00:06:55,175 --> 00:06:58,955 like, if you already, like, touched the point of open source, 113 00:06:59,700 --> 00:07:03,540 now you have also Hugging Face, which is a completely different area. Now it's 114 00:07:03,540 --> 00:07:06,920 not only open source packages. It's all these different open source 115 00:07:07,060 --> 00:07:10,760 Hugging Face models and Hugging Face datasets. And there, 116 00:07:10,820 --> 00:07:14,440 all these internal Artifactories are completely useless because they don't even 117 00:07:14,445 --> 00:07:18,045 scan these models. It's completely different technology, completely different, like, 118 00:07:18,045 --> 00:07:21,485 heuristics in order to find it.
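One basic control that applies to packages and model files alike is checking a downloaded artifact against a digest recorded when it was reviewed. The sketch below is an illustration under assumptions, not something described in the episode: the function names are hypothetical, and a matching digest only shows the file is the one that was pinned, not that the file is benign.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte model files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: str, pinned_hex_digest: str) -> bool:
    """Compare the file's SHA-256 with a digest recorded at review time."""
    return hmac.compare_digest(sha256_of_file(path), pinned_hex_digest.lower())
```

For Python packages, pip offers the same idea natively through `pip install --require-hashes -r requirements.txt`, which refuses any distribution whose hash is not listed in the requirements file.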
And, therefore, you start to 119 00:07:21,485 --> 00:07:25,325 see kind of, like, trends for the attackers. They started to 120 00:07:25,325 --> 00:07:28,920 upload a lot of backdoored and a lot of malicious models 121 00:07:29,160 --> 00:07:32,920 into Hugging Face. I can tell you, like, we personally, we already, 122 00:07:32,920 --> 00:07:36,680 like, detected, I think, almost, like, 100 backdoored 123 00:07:36,760 --> 00:07:40,360 or malicious models on Hugging Face because it's a wild 124 00:07:40,360 --> 00:07:44,205 west. Right. Because how do you... because these models, first off, 125 00:07:44,205 --> 00:07:47,885 they're physically large files. Right? So that's a factor. 126 00:07:47,885 --> 00:07:51,485 Right? I don't know how Hugging Face makes money. I'd be 127 00:07:51,485 --> 00:07:55,325 curious to have someone on the show talk about that. But, you know, 128 00:07:55,325 --> 00:07:59,169 they're doing the service. And, how would you even scan? I 129 00:07:59,169 --> 00:08:02,849 mean, that's a good question. Right? What types of vulnerabilities have you 130 00:08:02,849 --> 00:08:06,689 found so far? And how does one even scan, like, a 131 00:08:06,689 --> 00:08:10,405 safetensors or GGUF file? Like, how do you... what's 132 00:08:10,405 --> 00:08:14,025 that look like? Right? Obviously, I'm pretty sure, you know, McAfee 133 00:08:14,085 --> 00:08:17,485 antivirus doesn't have a thing for that. But, like... Exactly. 134 00:08:17,925 --> 00:08:21,125 But, how do you even do that? I'm just curious. Yeah. So this is, like, 135 00:08:21,125 --> 00:08:24,410 exactly, like, the problem. Like, in the models, like, it's 136 00:08:24,410 --> 00:08:28,250 even, like, more... like, the risk there, like, is 137 00:08:28,250 --> 00:08:32,090 more, like, clear, because as you know, a lot of the time, like, 138 00:08:32,090 --> 00:08:35,610 these models in Hugging Face are even, like, in pickle.
And, like, pickle is, like, 139 00:08:35,610 --> 00:08:39,325 by design, like, an insecure, like, file. And so... 140 00:08:39,885 --> 00:08:43,265 It's a binary dump, right, of, like, the memory space. Yeah. Like, in the deserialization 141 00:08:43,485 --> 00:08:46,705 process, like, basically, you can, like, put, like, any kind of, like, malicious 142 00:08:47,245 --> 00:08:50,780 action that you'd like, that, like, the attacker can. So we see, 143 00:08:50,780 --> 00:08:54,560 like, different attacks. Like, most of the attacks come today, like, from pickle files. 144 00:08:54,780 --> 00:08:58,220 Some also, like, not even, like, in the deserialization process, but also, like, in the 145 00:08:58,220 --> 00:09:01,735 model code itself. For example, like, if you ask 146 00:09:01,735 --> 00:09:05,475 for a specific example, like, I'll share something that we 147 00:09:05,475 --> 00:09:09,155 detected, like, recently. We found, like, a very, 148 00:09:09,155 --> 00:09:12,915 let's say, popular open source LLM model that we all 149 00:09:12,915 --> 00:09:16,275 know. But we know that, like, it has a lot of, like, different 150 00:09:16,275 --> 00:09:19,415 versions. And one of the versions was actually from an attacker 151 00:09:19,635 --> 00:09:23,290 that took the original model, wrapped it up with a few 152 00:09:23,290 --> 00:09:26,810 lines of code in the model, and what they did is that every 153 00:09:26,810 --> 00:09:30,110 input to the model and every output from the model 154 00:09:30,410 --> 00:09:33,405 was also sent to the attacker, which basically 155 00:09:34,105 --> 00:09:37,705 just received full visibility and observability into all the 156 00:09:37,705 --> 00:09:41,385 runtime applications in production. So, like, all the organizations that, 157 00:09:41,385 --> 00:09:45,000 like, use this model.
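The remark that pickle is insecure by design can be shown in a few lines. This is a harmless sketch: the class name `Payload` is hypothetical, and `eval("6 * 7")` stands in for whatever an attacker would actually run (a shell command, a network call). Nothing here reproduces the specific malicious models mentioned in the conversation.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a tampered object inside a model file."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        # An attacker would name a far more dangerous callable here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading never constructs a Payload; it simply calls the callable named above.
result = pickle.loads(blob)
print(result)  # 42
```

The person who runs `pickle.loads` gets no prompt and no warning, which is why loading an untrusted pickle-based checkpoint amounts to running untrusted code.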
And performance wise, the 158 00:09:45,000 --> 00:09:48,600 data scientists, of course, they cannot, like, detect it because performance 159 00:09:48,600 --> 00:09:52,440 wise, it worked perfectly because it took the original model. So nothing to be 160 00:09:52,440 --> 00:09:56,040 suspicious about. If we want the data 161 00:09:56,040 --> 00:09:59,875 scientists, for every new open source model that they like, like 162 00:09:59,875 --> 00:10:03,715 in Hugging Face, to start, like, to open, like, these files and the binaries and, 163 00:10:03,715 --> 00:10:07,335 like, to start, like, looking, like, with their own hands, manually, 164 00:10:07,475 --> 00:10:11,130 for, like, risk... First, like, of course, 165 00:10:11,130 --> 00:10:14,410 like, we understand that this is not their expertise and, like, 166 00:10:14,570 --> 00:10:17,950 like, we want to be secured, but, like, even, like, worse, 167 00:10:18,329 --> 00:10:22,089 they'll just spend all their time on security. And I think that 168 00:10:22,089 --> 00:10:25,134 this is, like, the worst stuff. Actually, it's not the worst. I think that, like, 169 00:10:25,134 --> 00:10:28,495 the worst, and this is also, like, something that, like, I saw recently in several 170 00:10:28,495 --> 00:10:32,274 organizations, is just, like, to block everything. Organizations 171 00:10:32,495 --> 00:10:36,334 that, like, understand, okay, Hugging Face models, it's, like, truly, like, an 172 00:10:36,334 --> 00:10:40,000 insecure area. Let's block it. Let's say, like, to 173 00:10:40,000 --> 00:10:43,760 all the data scientists in the organization, you're not allowed to use Hugging Face models. I 174 00:10:43,760 --> 00:10:46,980 think this is, like, the worst. That seems like a mistake because 175 00:10:48,000 --> 00:10:51,440 the people are gonna find a way. Well, one, you can't stop the 176 00:10:51,440 --> 00:10:54,260 signal. Right? That was a line from a movie.
177 00:10:55,345 --> 00:10:58,865 Kudos if people know what movie that is. 178 00:10:58,865 --> 00:11:02,465 But, you know, if you block Hugging Face, people are gonna find a way 179 00:11:02,465 --> 00:11:06,305 around that. They're gonna put it on a thumb drive at 180 00:11:06,305 --> 00:11:10,010 home and then bring it in. 100 percent. This is, by the way, also, like, 181 00:11:10,010 --> 00:11:13,690 what you see, like, with this kind of, like, internal Artifactory. You see that, like, 182 00:11:13,690 --> 00:11:16,890 once you create for the R&D, or create for 183 00:11:16,890 --> 00:11:20,650 the developers or for the data scientists, some level of, like, 184 00:11:20,650 --> 00:11:24,345 friction, they will just find a way to, like, bypass 185 00:11:24,345 --> 00:11:27,325 it and to lower this friction. 186 00:11:28,585 --> 00:11:31,005 Right. So a couple of questions. 187 00:11:32,505 --> 00:11:35,725 One, I've seen improper naming. 188 00:11:36,320 --> 00:11:40,020 Not improper naming, but basically using names, 189 00:11:40,080 --> 00:11:43,760 like, that look similar to what they should be. Yeah. Typosquatting. 190 00:11:43,760 --> 00:11:47,600 Typosquatting. That's it. I've seen that, which is kind of, I guess, 191 00:11:47,600 --> 00:11:50,500 kind of, you know, the dollar store approach. But also, 192 00:11:53,045 --> 00:11:56,725 if you wanted to look through these model files, as 193 00:11:56,725 --> 00:11:59,605 far as I know, they're just... I just looked at them. I just see binary 194 00:11:59,605 --> 00:12:02,644 stuff. Like, how would you look for malicious code in there? Because I think you're 195 00:12:02,644 --> 00:12:06,485 right. That's not a skill set the average AI engineer or data scientist 196 00:12:06,485 --> 00:12:10,190 would have. Yeah.
So, basically, like, you need, like, to manually kind of, like, 197 00:12:10,190 --> 00:12:13,630 parse it because, like, you have, of course, like, the binary file, but most 198 00:12:13,630 --> 00:12:17,470 times, it's not only, like, the binary file. You'll look for, like, the code 199 00:12:17,470 --> 00:12:21,045 file that, like, runs the model, and you'll look for, like, in 200 00:12:21,045 --> 00:12:24,805 case it's, like, pickle, like, the deserialization process, that you can, like, 201 00:12:24,965 --> 00:12:28,725 parse and then, like, see, like, the code there. But then you 202 00:12:28,725 --> 00:12:31,845 need also, like, you know, like, you have, like, two phases. First, you need to 203 00:12:31,845 --> 00:12:34,890 parse it, you know, like, to see, like, the code, but then you need 204 00:12:34,890 --> 00:12:38,649 also, like, to be able to read code and to understand which 205 00:12:38,649 --> 00:12:42,089 one is valid and which one is malicious, which is also, like, completely, like, you 206 00:12:42,089 --> 00:12:45,529 know, like, you need expertise in this area. If you see bash 207 00:12:45,529 --> 00:12:49,065 commands, is it okay or not? Do you see access to the 208 00:12:49,065 --> 00:12:52,825 Internet? Okay or not? Like, you need, like, to have, like, 209 00:12:52,825 --> 00:12:56,505 some, like, detectors in there that know how to do it, like, built 210 00:12:56,505 --> 00:13:00,265 by experts or something. So how would you even detect 211 00:13:00,265 --> 00:13:03,470 that if you found it? Like, how was this found? Was this just somebody looking 212 00:13:03,470 --> 00:13:07,230 in network packets? Or, like, how was it discovered? I'm 213 00:13:07,230 --> 00:13:10,530 just curious. Yeah. This specifically was, like, by our 214 00:13:10,750 --> 00:13:14,394 security research team. Okay. Yeah.
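The "parse first, then judge" approach described above can be sketched for the pickle case with the standard library's `pickletools`, which walks the opcode stream without executing it. Everything specific here is an assumption for illustration: the module blocklist is a tiny hypothetical sample, the function names are invented, and the string-tracking heuristic for `STACK_GLOBAL` can be evaded (for example through memoized strings), so this is not how any particular commercial detector works.

```python
import pickletools

# Hypothetical, deliberately short blocklist; real detectors use richer rules.
SUSPICIOUS_MODULES = {
    "os", "posix", "nt", "subprocess", "socket", "sys", "builtins", "__builtin__",
}


def imported_globals(data: bytes) -> list[str]:
    """List 'module.name' references in a pickle stream without loading it."""
    found = []
    recent_strings = []
    for opcode, arg, _position in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols carry "module name" as one space-separated argument.
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+ pushes module and name as the two preceding strings.
            if len(recent_strings) >= 2:
                found.append(f"{recent_strings[-2]}.{recent_strings[-1]}")
            else:
                found.append("<unresolved>")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


def flag_suspicious(data: bytes) -> list[str]:
    """Return the references whose top-level module is on the blocklist."""
    return [
        reference
        for reference in imported_globals(data)
        if reference == "<unresolved>"
        or reference.split(".", 1)[0] in SUSPICIOUS_MODULES
    ]
```

A file whose only contents are numbers, strings and containers produces an empty list; a reference to `builtins.eval` or `os.system` is the kind of finding that then needs the second phase the speaker describes, a person or rule deciding whether it is legitimate.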
That, like, looks a 215 00:13:14,394 --> 00:13:17,514 lot, like, all the time, like, you know, at all these different kinds 216 00:13:17,514 --> 00:13:20,954 of, like, open source and third party models in order to help 217 00:13:20,954 --> 00:13:24,735 our users to make sure that, like, everything that they use 218 00:13:25,035 --> 00:13:28,870 is valid. And again, most importantly, without slowing 219 00:13:28,870 --> 00:13:32,230 them down. They can just, like, download and, like, run, like, with everything 220 00:13:32,230 --> 00:13:35,769 that they want. And in case we see something that is 221 00:13:36,230 --> 00:13:39,750 suspicious, we know how to detect it and to help them 222 00:13:39,750 --> 00:13:42,475 to secure it. Interesting. Interesting. 223 00:13:43,415 --> 00:13:47,115 Because I know a lot of people, you know, they've been downloading 224 00:13:47,175 --> 00:13:50,935 these models from Hugging Face. And just taking it on 225 00:13:50,935 --> 00:13:54,535 faith, and I've heard that these things don't call 226 00:13:54,535 --> 00:13:58,370 out to the Internet. Mhmm. And I fell into that. And then 227 00:13:58,370 --> 00:14:02,150 I kinda had this moment of paranoia where I'm like, how do I know? 228 00:14:02,370 --> 00:14:05,730 I mean, I'm just a humble data scientist. Right? Like, 229 00:14:05,730 --> 00:14:08,530 so the only way I would think about it would be to have a firewall 230 00:14:08,530 --> 00:14:12,214 rule that would block network traffic going out from that box. 231 00:14:12,515 --> 00:14:16,035 And I'm sure there's probably workarounds to that too. I mean, are these 232 00:14:16,035 --> 00:14:19,415 attacks that sophisticated yet? 233 00:14:20,515 --> 00:14:24,149 Yeah. Yeah.
And, like, also, like, most times, like, the data 234 00:14:24,149 --> 00:14:27,910 scientists, like, they don't want, like, to permanently, like, close, like, the Internet, like, 235 00:14:27,910 --> 00:14:31,670 the outbound, because also, like, the application needs it. And also, like, 236 00:14:31,670 --> 00:14:35,029 you know, like, in order, like, to download, like, the dependencies and the models, 237 00:14:35,029 --> 00:14:38,805 you need it. So most times, like, just, like, to block the Internet, it doesn't solve 238 00:14:38,805 --> 00:14:42,404 everything. It was, like, more, like, in the past that everything was, like, network based 239 00:14:42,404 --> 00:14:46,165 only. Today, when you have, like, also, like, the applicative layer here, 240 00:14:46,165 --> 00:14:47,705 it's, like, a bit more sophisticated. 241 00:14:50,325 --> 00:14:53,690 But yeah. Wow. So 242 00:14:54,310 --> 00:14:57,830 the safetensors format, as I understand it, 243 00:14:57,830 --> 00:15:01,190 you know, you basically digitally sign, or somebody 244 00:15:01,190 --> 00:15:05,030 digitally signs, the contents of it. Is 245 00:15:05,030 --> 00:15:08,445 that a correct understanding? Yeah. So 246 00:15:08,525 --> 00:15:12,285 like, in general, first thing, of course, is that, like, safetensors is, like, 247 00:15:12,285 --> 00:15:15,965 much more secure. Okay. Already, like, by design, and as long 248 00:15:15,965 --> 00:15:19,769 as we as the industry will go, like, more and more, like, down 249 00:15:19,769 --> 00:15:22,750 this road, because today, like, we still see, like, tons of, like, pickles. 250 00:15:23,449 --> 00:15:27,290 But as long as we progress, like, all as an industry, we'll already, 251 00:15:27,290 --> 00:15:31,050 like, be, like, in a bit better situation. It's not 252 00:15:31,050 --> 00:15:34,685 perfect, of course. We still see some issues.
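To see why safetensors is considered safer by design, it helps to look at the layout: as the published format description has it, a file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a UTF-8 JSON header that maps tensor names to `dtype`, `shape` and `data_offsets`, followed by raw tensor bytes. Reading it is parsing, not code execution. The layout itself does not appear to carry a digital signature, so integrity checks would have to come from outside the file (a checksum or a signed manifest). The sketch below uses only the standard library; the function name is hypothetical.

```python
import json
import struct


def read_safetensors_header(data: bytes) -> dict:
    """Parse only the JSON header of a safetensors byte string."""
    if len(data) < 8:
        raise ValueError("too short to be a safetensors file")
    # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
    (header_length,) = struct.unpack("<Q", data[:8])
    header_bytes = data[8 : 8 + header_length]
    if len(header_bytes) != header_length:
        raise ValueError("declared header length exceeds file size")
    # The header is plain JSON: tensor name -> dtype, shape, data_offsets.
    return json.loads(header_bytes.decode("utf-8"))
```

In practice one would read just the first bytes of the file rather than all of it, and a real loader should also bound `header_length` before allocating anything based on it.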
And, of course, organizations still 253 00:15:34,685 --> 00:15:38,245 need, like, to have some security measures and processes 254 00:15:38,245 --> 00:15:41,025 to make sure that, like, they're aware of what, 255 00:15:42,445 --> 00:15:46,205 like, Hugging Face models they are using. But I think that it's already, like, 256 00:15:46,205 --> 00:15:49,840 going to be a bit better. I can tell you something that, actually, 257 00:15:49,840 --> 00:15:53,140 like, recently one of our partners told me, 258 00:15:53,760 --> 00:15:57,440 which was pretty cool, very similar to what you said, that you 259 00:15:57,440 --> 00:16:00,180 start, like, to feel a lot of concern about this area. 260 00:16:01,245 --> 00:16:04,764 A VP of data science at a very big, like, 261 00:16:04,764 --> 00:16:08,605 Fortune 500, like, very big, like, corporate. And they're kind 262 00:16:08,605 --> 00:16:11,884 of, like, the head of, like, all the data science, like, groups there. And 263 00:16:11,884 --> 00:16:15,665 they told me, you know, Niv, I already know 264 00:16:16,250 --> 00:16:19,850 what I'm going to be fired for, like, in the next, like, 265 00:16:19,850 --> 00:16:23,690 24 months, and it's going to be about that. I know for sure, like, we're 266 00:16:23,690 --> 00:16:27,529 using, like, so many, like, Hugging Face models. I know for sure that this is, 267 00:16:27,529 --> 00:16:30,490 like, the reason that I'm going to be fired, like, one day. Because today, like, 268 00:16:30,490 --> 00:16:34,274 we're using it, like, freely. We are also, like, very creative. We're not, like, 269 00:16:34,274 --> 00:16:37,795 only using, like, the most popular Llama model, but, like, we try to, 270 00:16:37,795 --> 00:16:41,635 like, take advantage of this great advantage of the platform, which is, like, 271 00:16:41,635 --> 00:16:45,360 the amount and the diversity of the models that you have there.
But I have 272 00:16:45,360 --> 00:16:49,040 no doubt that we create so many risks that we're just, 273 00:16:49,040 --> 00:16:52,879 like, not exposed to yet, that I'm going to pay for it, 274 00:16:52,879 --> 00:16:56,560 like, with my head. So 275 00:16:56,560 --> 00:17:00,145 it's pretty cool because it's not always that you see 276 00:17:01,005 --> 00:17:04,684 R&D and business owners that are so concerned 277 00:17:04,684 --> 00:17:08,525 about security even before the security team arrives 278 00:17:08,525 --> 00:17:12,330 at their door. But they're already aware of this risk. And it's something that 279 00:17:12,330 --> 00:17:15,770 we start, like, to see more and more because, you know, it's just, like, 280 00:17:15,770 --> 00:17:19,310 too obvious. Like, the window is open and everybody sees it. 281 00:17:20,010 --> 00:17:23,530 Yeah. I would suppose that, in a very kinda strange way, 282 00:17:23,530 --> 00:17:26,615 that's a bit of progress, right, where people think about security beforehand. 283 00:17:27,155 --> 00:17:30,455 Like, even if they don't know... I mean, I think this VP, 284 00:17:32,115 --> 00:17:35,955 you know, is pretty spot on. Like, what concerns me about the widespread 285 00:17:35,955 --> 00:17:39,635 adoption of these models, and particularly Hugging Face, though that's no knock on Hugging 286 00:17:39,635 --> 00:17:43,310 Face. I think wherever you get your models... Mhmm. 287 00:17:45,050 --> 00:17:48,670 I mean, we just don't know. And these things are just complicated. 288 00:17:48,730 --> 00:17:52,170 Right? I mean, they are by design complicated, with billions of 289 00:17:52,170 --> 00:17:55,955 parameters. In some cases, I guess, trillions.
But also, you 290 00:17:55,955 --> 00:17:59,715 know, they have this ability to... even 291 00:17:59,715 --> 00:18:03,395 if everything worked out well, even assuming everything is 292 00:18:03,395 --> 00:18:06,855 fine, right, in terms of the operationalization of these things, 293 00:18:08,470 --> 00:18:11,830 there's still the chance that the model itself and its 294 00:18:11,830 --> 00:18:15,450 training was poisoned. So, like, 295 00:18:15,590 --> 00:18:18,550 I mean, like, there's just so many... because my wife works in 296 00:18:18,550 --> 00:18:22,070 IT security, and I was all excited. It was about a year and a half 297 00:18:22,070 --> 00:18:25,125 ago. I was talking to her about LLMs and stuff like that 298 00:18:25,825 --> 00:18:29,365 and ChatGPT and those types of things. 299 00:18:30,705 --> 00:18:33,665 And I was like, oh, well, you take all this data and you train a 300 00:18:33,665 --> 00:18:36,960 model and you distill down this graph and this and this. And then she's 301 00:18:36,960 --> 00:18:40,760 like, that sounds like a big attack surface to me. Yeah. 302 00:18:40,760 --> 00:18:44,480 And I was like... Like, data poisoning is the classic one, and data 303 00:18:44,480 --> 00:18:47,840 poisoning can be, like, on two levels: either, like, someone, 304 00:18:47,840 --> 00:18:51,175 like, poisoning your data or, exactly what you say, 305 00:18:51,395 --> 00:18:55,235 somebody just, like, this way, like, 306 00:18:55,235 --> 00:18:58,835 creates a backdoor in third party models and open source 307 00:18:58,835 --> 00:19:02,190 models that then, like, everybody downloads. Right. 308 00:19:02,250 --> 00:19:05,390 Right. And we wouldn't know, like, what's 309 00:19:05,850 --> 00:19:09,390 the... I mean, the defense against that seems very 310 00:19:09,610 --> 00:19:12,990 intricate. Not impossible, but very delicate and intricate.
311 00:19:13,915 --> 00:19:17,675 So in classic application security, there is a 312 00:19:17,675 --> 00:19:21,515 great practice called SBOM. SBOM is a software 313 00:19:21,515 --> 00:19:25,195 bill of materials. Basically, it means that you get, like, in a 314 00:19:25,195 --> 00:19:28,760 specific format, kind of, like, visibility into all the different 315 00:19:28,760 --> 00:19:32,360 software components that build your application. One of the things that 316 00:19:32,360 --> 00:19:35,960 now we're also, like, part of building is an 317 00:19:35,960 --> 00:19:39,420 official framework of OWASP, the nonprofit organization, 318 00:19:40,040 --> 00:19:43,865 around security of AI and machine learning. And 319 00:19:44,325 --> 00:19:47,945 what you have there is, for the first time, you have, like, a double layer 320 00:19:48,164 --> 00:19:51,865 of visibility. The first one is just, like, to understand 321 00:19:52,565 --> 00:19:56,300 what models I'm even using in the organization. Everything, like, 322 00:19:56,300 --> 00:19:59,980 what models, like, are included in my application. It can be open 323 00:19:59,980 --> 00:20:03,740 source models. It can be self developed models. Also, by the way, not 324 00:20:03,740 --> 00:20:07,120 only LLM, of course, also, like, vision, NLP, like, everything else. 325 00:20:07,420 --> 00:20:11,245 And also third party models that are embedded as part of the application. They 326 00:20:11,245 --> 00:20:14,545 are not open... no, they are not open source. For example, 327 00:20:15,085 --> 00:20:18,785 if a software engineer adds an API call as part of the application 328 00:20:19,565 --> 00:20:23,025 to OpenAI, in this way, they embed 329 00:20:25,040 --> 00:20:28,320 an LLM as part of the application.
This is also like one of, like, the models 330 00:20:28,320 --> 00:20:32,000 that you are using, but you you you want to know this is all my 331 00:20:32,000 --> 00:20:35,460 AI and model inventory that I'm using as part of the application. 332 00:20:36,000 --> 00:20:39,845 And in addition to that, you have even the deeper context there, which 333 00:20:39,845 --> 00:20:43,684 is also like what you referred to. It's not only this is 334 00:20:43,684 --> 00:20:47,285 the list of the models that I'm using, but for each one, you want to 335 00:20:47,285 --> 00:20:51,100 understand on what dataset it was trained, what data 336 00:20:51,100 --> 00:20:54,299 maybe also like it has access to in case it's in production, let's say, with 337 00:20:54,299 --> 00:20:57,679 a RAG architecture. You want to understand, like, the deep context 338 00:20:58,059 --> 00:21:01,659 of all these, like, models, what I'm using, but also, like, what 339 00:21:01,659 --> 00:21:05,095 happens, like, in this specific, like, model. Sometimes 340 00:21:05,395 --> 00:21:08,915 it's, as you said, to understand what data was trained on a 341 00:21:08,915 --> 00:21:12,595 model before, like, I'm starting, like, to use it from a third party; a 342 00:21:12,595 --> 00:21:16,295 lot of the time it's even, like, internally in the organization. 343 00:21:16,830 --> 00:21:19,809 Because once we start to train a lot of models, 344 00:21:20,750 --> 00:21:23,970 we want to make sure that we don't violate 345 00:21:24,590 --> 00:21:28,270 any policy that we have in the organization, whether it's for compliance or 346 00:21:28,270 --> 00:21:32,115 security.
For example, one of the things that, like, we are like, I keep, like, 347 00:21:32,115 --> 00:21:35,875 hearing a lot of the time from, from security and legal and privacy 348 00:21:35,875 --> 00:21:39,715 teams is that, look, we instruct all the 349 00:21:39,715 --> 00:21:43,015 organization not to train any sensitive 350 00:21:43,075 --> 00:21:46,900 data, PII, PCI, PHI, any other sensitive 351 00:21:46,900 --> 00:21:50,520 information on our models. But beyond instructing 352 00:21:50,660 --> 00:21:54,420 it and speaking about it, nobody knows if it 353 00:21:54,420 --> 00:21:57,780 happens. And we don't provide also our data 354 00:21:57,780 --> 00:22:01,404 teams tools that will help them to 355 00:22:01,404 --> 00:22:05,245 detect it in case it, like, it happens, like like, not on purpose. For 356 00:22:05,245 --> 00:22:08,924 example, I can tell you, like, one of the things that we saw very 357 00:22:08,924 --> 00:22:11,985 recently. Big organization, a huge fintech company, 358 00:22:12,764 --> 00:22:16,340 a data scientist unintentionally trained all the 359 00:22:16,340 --> 00:22:20,100 transactions of the application on one of the models. Now it's 360 00:22:20,100 --> 00:22:23,540 a, like, crazy big violation there of, like, compliance and 361 00:22:23,540 --> 00:22:27,300 security. The data scientist did this unintentionally. They 362 00:22:27,300 --> 00:22:31,134 truly, like, didn't know it. If they had something that, like, would help them, like, 363 00:22:31,134 --> 00:22:34,894 the basic visibility that you mentioned before, it will truly, like, help them to 364 00:22:34,894 --> 00:22:38,434 start, like, to continue, like, innovate and just, like, in case something like bad happens, 365 00:22:38,495 --> 00:22:41,934 to be alerted to that.
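The kind of basic detection described here could look roughly like the sketch below: a scan of candidate training rows for values that look sensitive before they reach a training job. This is only an illustration under stated assumptions. The pattern set, the function name `find_sensitive_columns`, and the sample rows are all hypothetical and were not mentioned in the conversation; a real detector would need far broader coverage than three regular expressions.

```python
import re

# Hypothetical, deliberately simple patterns (assumption: regex matching is
# good enough for a first pass; production detectors are far more thorough).
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def find_sensitive_columns(rows):
    """Return {column: sorted pattern names} for columns with at least one hit."""
    hits = {}
    for row in rows:
        for column, value in row.items():
            text = str(value)
            for name, pattern in SENSITIVE_PATTERNS.items():
                if pattern.search(text):
                    hits.setdefault(column, set()).add(name)
    return {column: sorted(names) for column, names in hits.items()}


# Hypothetical sample of rows about to be used for training.
sample_rows = [
    {"amount": 12.5, "memo": "refund to jane@example.com", "account": "4111 1111 1111 1111"},
    {"amount": 80.0, "memo": "monthly fee", "account": "n/a"},
]

flagged = find_sensitive_columns(sample_rows)
print(flagged)
```

A check like this could run as a gate in front of a training pipeline, alerting the data scientist instead of blocking the work outright, which matches the "keep innovating, but get alerted" idea in the discussion.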
And so I see that, like, the the data training 366 00:22:41,934 --> 00:22:45,740 is also, like, a very, very important point also internally and not 367 00:22:45,740 --> 00:22:49,420 only the external data trained on the external models that we're embedding and 368 00:22:49,420 --> 00:22:53,180 downloading. So you mentioned, OWASP. So just 369 00:22:53,180 --> 00:22:55,980 for the benefit of folks who may not know, because most of our listeners are 370 00:22:55,980 --> 00:22:59,815 either data engineers, data scientists. What is OWASP? And what is the 371 00:22:59,895 --> 00:23:03,414 I think it's the OWASP Top 10? Yeah. So 372 00:23:03,414 --> 00:23:07,255 OWASP in general, it's an amazing organization that, 373 00:23:07,495 --> 00:23:10,554 is like a nonprofit one that helps basically, 374 00:23:12,360 --> 00:23:16,040 we combine a lot of people together, gather together in order to make 375 00:23:16,040 --> 00:23:19,720 sure that all our industry is much more secured with a lot of 376 00:23:19,720 --> 00:23:23,320 different security initiatives in a lot of different aspects, mainly of like product 377 00:23:23,320 --> 00:23:26,895 security, but not only. Product security is like application 378 00:23:26,895 --> 00:23:28,515 security. It's building security. 379 00:23:30,495 --> 00:23:34,095 Specifically in OWASP, you have several different types of 380 00:23:34,095 --> 00:23:37,875 projects. So for example, one type of project is the OWASP Top 10, 381 00:23:38,410 --> 00:23:41,150 Top 10, that basically takes different areas 382 00:23:42,090 --> 00:23:45,850 and defines the top ten risks in this specific area. So it can be Top 383 00:23:45,850 --> 00:23:48,670 10 for API, Top 10 for 384 00:23:50,730 --> 00:23:54,190 CI/CD. And now there is also like Top 10 for LLM. 385 00:23:55,345 --> 00:23:58,965 Additional frameworks, like, there are a lot of like different tools.
Specifically, 386 00:23:59,424 --> 00:24:02,965 if someone wants to understand a bit more about like the wide 387 00:24:03,745 --> 00:24:07,205 landscape and the risk around AI and machine learning, 388 00:24:08,309 --> 00:24:12,150 the framework that I would like recommend on, highly recommend on, is 389 00:24:12,150 --> 00:24:15,510 amazing and very comprehensive called the OWASP AI 390 00:24:15,510 --> 00:24:19,130 Exchange. A group of people, again, gathered together, 391 00:24:19,910 --> 00:24:23,495 that covered not only LLM, but all the basic 392 00:24:23,635 --> 00:24:27,335 principles and risk in data pipelines and MLOps 393 00:24:27,475 --> 00:24:31,075 and start from the building and up to the runtime and start from the 394 00:24:31,075 --> 00:24:34,135 classic machine learning and up to Gen AI, very comprehensive, 395 00:24:34,675 --> 00:24:38,310 very also practical, which is very important and 396 00:24:38,310 --> 00:24:42,010 speaks in both language, on both languages. On one hand, 397 00:24:42,310 --> 00:24:46,090 of course, security, but on the other, also like very oriented 398 00:24:46,550 --> 00:24:49,210 for data and machine learning and AI practitioners. 399 00:24:50,390 --> 00:24:51,610 Interesting. Interesting. 400 00:24:54,125 --> 00:24:57,905 What what do you see 401 00:24:58,045 --> 00:25:00,845 well, here's what I mean, I'll have a lot of questions, but one of them 402 00:25:00,845 --> 00:25:04,605 is, do you think the 0 what do you think the 403 00:25:04,605 --> 00:25:08,430 0 trust approach is a good starting point? I don't think 404 00:25:08,430 --> 00:25:11,810 it's the answer here like it is kinda everywhere else. But do you think that, 405 00:25:13,790 --> 00:25:16,370 that type of philosophy of don't trust anything? 406 00:25:17,150 --> 00:25:20,904 Right? 
Kind of like, I mean, is that because you you mentioned this 407 00:25:20,904 --> 00:25:24,664 early when I talked about network firewalls, right, where the old approach of thing 408 00:25:24,664 --> 00:25:28,424 is just pull the plug or set up rules. And that used 409 00:25:28,424 --> 00:25:31,870 to work, but there's plenty of other ways around it, Both I think 410 00:25:31,870 --> 00:25:35,549 kind of low skill, mid skill, and certainly high skill 411 00:25:35,549 --> 00:25:39,230 ways around that. What do you you mean then 0 412 00:25:39,230 --> 00:25:42,990 trust is meant to address that. What are your thoughts on like I 413 00:25:42,990 --> 00:25:46,049 mean, is that the pro is that the mindset that either 414 00:25:49,155 --> 00:25:52,875 security folks in this space would have to take on? Like, it's more 415 00:25:52,995 --> 00:25:56,515 if they well, they probably already have. Right? Yeah. I think you're, 416 00:25:56,515 --> 00:25:59,735 like, I think you're actually, like, the the you you you perfectly 417 00:26:00,115 --> 00:26:03,610 defined it because I believe that 0 Trust is exactly like you say, it's kind 418 00:26:03,610 --> 00:26:07,210 of like a, like, kind of like a mindset. It's not like a very 419 00:26:07,210 --> 00:26:11,050 accurate, like, technical approach, but it's kind of 420 00:26:11,050 --> 00:26:14,750 like more like a a philosophy with some level of implementation. 421 00:26:17,535 --> 00:26:21,055 I believe that, like, the right mindset and, like, the right framework to look 422 00:26:21,055 --> 00:26:24,435 on a on a security for AI and, like, all the building 423 00:26:24,815 --> 00:26:28,495 and also, like, the runtime is basically to take all the 424 00:26:28,495 --> 00:26:32,200 different principles that we are all already aware 425 00:26:32,419 --> 00:26:36,200 of. 
Like we are all, like I'm saying, like the security industry, 426 00:26:36,500 --> 00:26:39,720 we are all already aware of on classic software development, 427 00:26:40,340 --> 00:26:44,019 building and runtime, and to implement it on the 428 00:26:44,019 --> 00:26:47,595 data and AI lifecycle. For example, if we mentioned, like, 429 00:26:47,595 --> 00:26:51,435 code scanning, so code scanning the notebooks, we mentioned open source, 430 00:26:51,435 --> 00:26:55,215 so checking all the all the Hugging Face models. But it's not only that. 431 00:26:55,355 --> 00:26:58,875 For example, one of the things that, like, we see, a lot of attacks that 432 00:26:58,875 --> 00:27:02,610 we, like, we had recently in the security area are around the 433 00:27:02,610 --> 00:27:06,130 CI/CD. A few years ago, there was a big attack called 434 00:27:06,130 --> 00:27:09,970 SolarWinds, that basically, yeah, so you know it 435 00:27:09,970 --> 00:27:13,650 perfectly, just for the audience that, like, are not familiar with the specific details 436 00:27:13,650 --> 00:27:17,245 in, like, very high level: attackers exploited 437 00:27:17,245 --> 00:27:20,925 misconfigurations in CI/CD tools. And this is 438 00:27:20,925 --> 00:27:24,605 basically how they succeeded, like, to start, like, this whole huge attack and 439 00:27:24,605 --> 00:27:28,385 breach. Now one of the things that, like, it taught us all as an industry 440 00:27:28,890 --> 00:27:32,570 is that until now we were focusing on, like, securing only 441 00:27:32,570 --> 00:27:36,250 our code. Now we understand that the code is not enough. We need to make 442 00:27:36,250 --> 00:27:39,930 sure that the building tools are also well configured. So 443 00:27:39,930 --> 00:27:42,505 we start, like, to see a lot of, like, tools that help us to make 444 00:27:42,505 --> 00:27:46,265 sure that we don't have misconfigurations in the CI/CD and the SCMs and all 445 00:27:46,265 --> 00:27:50,105 these different kind of tools.
But when we are going to our domain, when 446 00:27:50,105 --> 00:27:53,785 we go to the data and AI teams, as we know, we just use a different 447 00:27:53,785 --> 00:27:57,620 stack. We use all these data pipelines and model 448 00:27:57,620 --> 00:28:01,460 registries and MLOps tools and platforms like Databricks and Domino 449 00:28:01,460 --> 00:28:05,060 and Snowflake and stuff like that. The configuration, as we know, is 450 00:28:05,060 --> 00:28:08,900 not like neverwhere. Most times, it's even wider. This is why it's 451 00:28:08,900 --> 00:28:12,595 not managed by DevOps. It's managed by us, by the data teams. It's managed by 452 00:28:12,595 --> 00:28:16,435 MLOps teams, by data infra, by data platform. And we're doing a 453 00:28:16,435 --> 00:28:20,275 lot like, a great job in order to optimize all the configuration for the 454 00:28:20,275 --> 00:28:23,850 product. We're not security experts. We don't want to be 455 00:28:23,850 --> 00:28:26,970 security experts and, like, start, like, to spend a lot of time in that. But 456 00:28:26,970 --> 00:28:30,670 nobody else just like to very easily find all these different kind of misconfigurations. 457 00:28:31,370 --> 00:28:35,210 And this is also a threat and, like, attack vector that we 458 00:28:35,210 --> 00:28:38,010 started, like, to see a lot in the field today. I can tell you that, 459 00:28:38,010 --> 00:28:40,575 like, we see tons of attacks around 460 00:28:41,914 --> 00:28:45,615 different misconfigurations in tools like Airflow and Databricks 461 00:28:45,755 --> 00:28:49,115 and stuff like that. And I think this is also like a very, very important, 462 00:28:49,115 --> 00:28:52,890 like, mindset, like, to be in. And in addition to that, of course, we have 463 00:28:52,890 --> 00:28:56,650 all the all the runtime and all the adversarial attacks there.
464 00:28:56,650 --> 00:29:00,030 There are specifically, if I mentioned the 465 00:29:00,330 --> 00:29:03,950 OWASP AI Exchange, so OWASP AI Exchange covers everything. 466 00:29:04,409 --> 00:29:07,815 The OWASP Top 10 for LLM specifically is more 467 00:29:08,474 --> 00:29:10,934 covering this LLM, like, 468 00:29:12,115 --> 00:29:15,794 specific risk. And then you have, like, all the adversarial attacks, like prompt injection 469 00:29:15,794 --> 00:29:19,174 and model jailbreak and model denial of service, model denial of wallet, 470 00:29:19,634 --> 00:29:23,450 etcetera. So basically, the mindset should 471 00:29:23,450 --> 00:29:26,890 be we already know security very well. We already have, like, these 472 00:29:26,890 --> 00:29:30,250 principles. Until now, we just haven't 473 00:29:30,250 --> 00:29:34,030 implemented them on the data and AI teams, 474 00:29:34,575 --> 00:29:38,174 tools, and technology. And this is exactly what we start, like, to 475 00:29:38,335 --> 00:29:41,054 what we, like, need, like, to start to do. And this is what we see 476 00:29:41,054 --> 00:29:44,255 also that, like, you know, like, now we have no reason. Like, we all see, 477 00:29:44,255 --> 00:29:47,715 like, these different kind of attacks. So we start to see that all the organizations 478 00:29:47,855 --> 00:29:50,250 were, like, starting to to already, like, walk the walk. 479 00:29:52,010 --> 00:29:55,610 Wow. Yeah. I I often wonder too, like, what you 480 00:29:55,610 --> 00:29:59,130 mentioned the pipelines being a vulnerability or an 481 00:29:59,130 --> 00:30:02,350 attack surface. Right? Like, or a potential vulnerability. 482 00:30:03,184 --> 00:30:06,725 I often wonder now, like, when, you know, we're looking at agentic 483 00:30:06,785 --> 00:30:10,325 AI, right, where these things aren't just LLMs, 484 00:30:10,465 --> 00:30:14,165 right, producing text or going through these materials.
485 00:30:14,385 --> 00:30:17,550 We're giving them, you know, abilities, 486 00:30:18,090 --> 00:30:21,610 right, to influence pipelines, right, to to or to 487 00:30:21,610 --> 00:30:25,390 whatever. Right? Like, that just seems to me like a giant 488 00:30:26,410 --> 00:30:30,090 security risk. I mean, telling someone you know, there's there's multiple ways to 489 00:30:30,090 --> 00:30:33,665 break an LLM. Right? Like, obviously, there's the the the $1 Chevy 490 00:30:33,665 --> 00:30:37,105 Tahoe. Right? Where the guy did that. Right? Pretty low tech 491 00:30:37,105 --> 00:30:39,765 approach, pretty brute force ish. 492 00:30:40,865 --> 00:30:43,685 But I often wonder, like, well, what 493 00:30:46,290 --> 00:30:49,990 what sorts of things are agentic systems gonna open up? 494 00:30:50,130 --> 00:30:53,650 Like, what does that look like? I think that this is exactly like where we 495 00:30:53,650 --> 00:30:57,170 we will start, like, to see, like, the very big LLM, 496 00:30:58,535 --> 00:31:01,975 breaches, that we'll have. I believe that, by the 497 00:31:01,975 --> 00:31:05,495 way, my belief is that the the how does the 498 00:31:05,495 --> 00:31:09,335 attack start will still be, like, in a lot of cases, 499 00:31:09,335 --> 00:31:12,855 very similar to what we see today.
But the impact of the 500 00:31:12,855 --> 00:31:16,639 attack will be much, much, much, much, much higher because now like the 501 00:31:16,639 --> 00:31:19,700 model cannot only like, promise you a 502 00:31:20,799 --> 00:31:24,179 $1 car, but you can throw, like, I already like 503 00:31:24,399 --> 00:31:27,919 send the order, can send the car to you, can like book your 504 00:31:27,919 --> 00:31:31,735 hotel, can do like everything there, can share with you, like, the data 505 00:31:31,735 --> 00:31:35,335 of maybe, like, other customers in the application because it is, 506 00:31:35,335 --> 00:31:39,175 like, a RAG architecture, and it is also, like, different, like, tools 507 00:31:39,175 --> 00:31:42,935 that provide it the ability to maybe even, like, write different code 508 00:31:42,935 --> 00:31:46,710 to the application. And then it might also like start like different types 509 00:31:46,710 --> 00:31:50,310 of remote code execution. As long as we are going to 510 00:31:50,310 --> 00:31:53,990 provide to these LLMs more privilege, more access, 511 00:31:53,990 --> 00:31:57,794 more tools, more abilities, the impact of the risk 512 00:31:57,794 --> 00:32:01,475 that they will be able, like, to cause will be much higher. I still 513 00:32:01,475 --> 00:32:05,154 believe again that the attack vectors are going to start from more or less, 514 00:32:05,154 --> 00:32:08,455 like, the same areas, like prompt injection and model jailbreak, 515 00:32:08,755 --> 00:32:12,274 but they they eventually, like, the outcome of these attacks will be much 516 00:32:12,274 --> 00:32:15,690 higher. I could see that. Because we're giving them 517 00:32:15,690 --> 00:32:19,450 actuators, so to speak. Right? Like we're not we're we're 518 00:32:19,450 --> 00:32:23,290 giving them agency. Right?
Like where they could actually do real damage as 519 00:32:23,290 --> 00:32:26,910 opposed to because one thing in saying you're gonna give somebody 520 00:32:27,365 --> 00:32:31,125 a $1 Chevy Tahoe. It's quite another to actually place the order, 521 00:32:31,125 --> 00:32:34,965 sign off on the invoice, and then ship it. Right? Yep. And what 522 00:32:34,965 --> 00:32:37,845 if you'll do, like I don't know. Like, you'll you'll start, like, to see it 523 00:32:37,845 --> 00:32:41,550 also, like, in banks and in investments. They will start, like, to transfer 524 00:32:41,550 --> 00:32:45,150 your money. They will start, like, to invest, like, to buy stock. They will like, 525 00:32:45,150 --> 00:32:48,830 the the the the amount of, like, potential impact here is, like, a 526 00:32:48,830 --> 00:32:52,510 crazy high. I believe, by the way, that eventually, this is going to be one 527 00:32:52,510 --> 00:32:56,095 of the things that, like, we'll see also, like, slow down the adoption, not 528 00:32:56,095 --> 00:32:59,695 less than the than the technology or, like, finding, like, the 529 00:32:59,695 --> 00:33:03,135 right use case. Yeah. No. I could see 530 00:33:03,135 --> 00:33:06,675 that. I I just think that we're just setting, as an industry. 531 00:33:07,380 --> 00:33:11,140 We're setting ourselves up for a huge exploit that we 532 00:33:11,140 --> 00:33:13,640 haven't figured out is already there yet. 533 00:33:14,500 --> 00:33:18,180 And so so what what 534 00:33:18,180 --> 00:33:21,895 can AI engineers, data scientists, 535 00:33:21,895 --> 00:33:24,795 data engineers do today to make things 536 00:33:25,575 --> 00:33:28,615 better? I know we can't fix it because we don't know what's we really don't 537 00:33:28,615 --> 00:33:32,455 know what's broken. I think that's one of the frustrating and kind of fun things 538 00:33:32,455 --> 00:33:36,200 about security work is, like, it's not that there's no vulnerabilities. 
539 00:33:36,260 --> 00:33:40,100 You haven't discovered any vulnerabilities yet. Right? There are no unknown there are 540 00:33:40,100 --> 00:33:43,800 always un there are always unknown unknowns. 541 00:33:44,020 --> 00:33:47,720 But if you have an unknown unknown or a known thing, 542 00:33:47,905 --> 00:33:50,945 you can you can say that you pretty much figured that out. But there's this 543 00:33:50,945 --> 00:33:53,845 whole aspect, which I don't think data scientists 544 00:33:54,865 --> 00:33:58,225 fully appreciate. I think they can understand the concept of the unknown 545 00:33:58,225 --> 00:34:01,985 unknowns. But in terms of the consequences of it, I don't 546 00:34:01,985 --> 00:34:05,399 think I think it's gonna take one major SolarWinds-style 547 00:34:06,100 --> 00:34:09,719 issue or CrowdStrike-style issue to make people conscious 548 00:34:10,179 --> 00:34:14,020 of of that. But how do 549 00:34:14,020 --> 00:34:17,824 we how do we prepare ourselves? Right? You can't 550 00:34:17,824 --> 00:34:21,505 stop the hurricane, but you can board up your windows. Right? Like, you 551 00:34:21,505 --> 00:34:25,344 know, how do you Yeah. I and I totally 552 00:34:25,344 --> 00:34:29,030 agree that, like, what's going to, like, shake everybody 553 00:34:29,650 --> 00:34:33,090 will be, like, the the first SolarWinds or, like, the 554 00:34:33,090 --> 00:34:36,610 Log4j attack that we see, like, in these areas. I think that, 555 00:34:36,610 --> 00:34:40,390 like, I think that you broke it down very well 556 00:34:40,449 --> 00:34:43,844 and that we need to relate to both categories. 557 00:34:44,545 --> 00:34:47,285 First is, like, the known, 558 00:34:47,985 --> 00:34:51,585 which already, like, exist. Like, we know that, like, you know, like, 559 00:34:52,545 --> 00:34:55,364 we see that as scientists. Like, we are not a scientist.
560 00:34:57,580 --> 00:35:01,100 And we see that one of the the things that, like, we see 561 00:35:01,100 --> 00:35:04,000 in in in our code compared to software developers 562 00:35:04,940 --> 00:35:08,375 is that we don't give a 563 00:35:08,375 --> 00:35:12,075 tip on, like, everything, around security. 564 00:35:12,135 --> 00:35:15,655 Like, you'll see, like, tons of exposed secrets in plain 565 00:35:15,655 --> 00:35:19,275 text. You'll see tons of, like, test and, like, the sensitive data 566 00:35:19,494 --> 00:35:22,555 just like playing. And, like, it's state, like, exposed, like, in the notebooks. 567 00:35:23,140 --> 00:35:26,820 You'll see that we download, like, any dependencies without, like, like, 568 00:35:26,820 --> 00:35:30,500 even, like, think about it. Even so that, like, yeah, it looks like maybe, like, 569 00:35:30,500 --> 00:35:34,280 a bit suspicious and stuff like that. So it's it's far 570 00:35:34,420 --> 00:35:37,700 from from the basic. Let's make sure that, like, what we know that is not 571 00:35:37,700 --> 00:35:41,435 best practice, just, like, start, like, to implement it. And 572 00:35:41,435 --> 00:35:45,275 then regarding the unknown unknown, so, of 573 00:35:45,275 --> 00:35:48,155 course, like, you don't know how to handle it. I think that, like, as you 574 00:35:48,155 --> 00:35:51,915 as you said, you can start to prepare yourself. How do how do you 575 00:35:51,915 --> 00:35:55,410 prepare yourself in security? It's basically to be very 576 00:35:55,410 --> 00:35:58,850 organized and to to make sure that you have, like, the right visibility and 577 00:35:58,850 --> 00:36:02,609 governance. As long as you have, for example, like, you know how to build, 578 00:36:02,609 --> 00:36:06,290 like, your your AI or the machine learning BOM.
You know all the 579 00:36:06,290 --> 00:36:10,125 different, like, models that are built or embedded as part of the application, 580 00:36:10,125 --> 00:36:12,625 and you have, like, the right lineage, which one 581 00:36:13,805 --> 00:36:17,265 was trained on which dataset, etcetera. 582 00:36:18,045 --> 00:36:21,565 Once, for example, that now let's say we'll continue with the 583 00:36:21,565 --> 00:36:25,160 examples of of Hugging Face. Like, a new Hugging Face 584 00:36:25,160 --> 00:36:29,000 model is is is now, like, published as a like, someone, 585 00:36:29,000 --> 00:36:32,599 like, found that it's, like, malicious. You because you prepared 586 00:36:32,599 --> 00:36:36,359 yourself and you have, like, the right visibility, you are able to go 587 00:36:36,359 --> 00:36:40,105 and very easily search exactly, like, if you use it and 588 00:36:40,105 --> 00:36:43,865 where you use it in all your organization. And this is also 589 00:36:43,865 --> 00:36:47,464 because you prepare yourself. This is exactly what happened, like, in 590 00:36:47,464 --> 00:36:51,145 Log4j. In Log4j, it was like a dependency that 591 00:36:51,145 --> 00:36:54,730 was found to be critically vulnerable. And a lot of 592 00:36:54,730 --> 00:36:58,350 organizations, what they spent, like, most of the time on is to try to understand 593 00:36:58,890 --> 00:37:02,730 where they even use this Log4j. And they seem that, like, if you prepare 594 00:37:02,730 --> 00:37:06,575 yourself, you are like, if you are organizing everything, you'll already 595 00:37:06,575 --> 00:37:10,174 be very, very, like, ready for the for the 596 00:37:10,174 --> 00:37:13,934 attack of, like, the unknown unknown. And, of course, everything 597 00:37:13,934 --> 00:37:17,474 in addition to to, you know, like, learning and, like, educating 598 00:37:17,535 --> 00:37:21,230 yourself. If you start, like, to understand, you'll go 599 00:37:21,230 --> 00:37:24,990 to, I don't know, Databricks, for example.
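The inventory-plus-lineage idea described above can be sketched as a small in-memory model list that is searched when a model is reported as malicious. This is a minimal illustration, not a standard AI-BOM format: the `ModelRecord` fields, the helper names, and every model, application, and dataset name below are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    name: str      # e.g. a model hub repo id or an internal model name (hypothetical)
    source: str    # "open_source", "self_developed", or "third_party_api"
    used_by: list = field(default_factory=list)     # applications embedding the model
    trained_on: list = field(default_factory=list)  # dataset lineage, when known


def find_usage(inventory, flagged_name):
    """Return the applications that embed a model reported as malicious."""
    apps = set()
    for record in inventory:
        if record.name == flagged_name:
            apps.update(record.used_by)
    return sorted(apps)


def models_trained_on(inventory, dataset):
    """Return the names of models whose recorded lineage includes the dataset."""
    return [record.name for record in inventory if dataset in record.trained_on]


# Hypothetical inventory covering the three kinds of models mentioned.
inventory = [
    ModelRecord("example-org/sentiment-model", "open_source", ["support-bot"], ["public-reviews"]),
    ModelRecord("fraud-scorer-v2", "self_developed", ["payments-api"], ["transactions-2023"]),
    ModelRecord("hosted-llm", "third_party_api", ["support-bot", "docs-search"]),
]

print(find_usage(inventory, "example-org/sentiment-model"))
print(models_trained_on(inventory, "transactions-2023"))
```

With records like these, answering "do we use the flagged model, and where?" becomes a lookup rather than an organization-wide hunt, which is the lesson drawn from Log4j in the conversation.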
A lot of people use Databricks. You'll 600 00:37:24,990 --> 00:37:27,790 go and, like, start, like, to see what are, like, the best practices of how 601 00:37:27,790 --> 00:37:31,550 to, like, configure your Databricks environments and what are, like, the best practices 602 00:37:31,550 --> 00:37:34,775 there. It's something that you can, like, find very easily, like, on the Internet. You 603 00:37:34,775 --> 00:37:37,194 don't need, like, to to do it, like, from scratch. 604 00:37:38,615 --> 00:37:42,055 But I'll say that, like, you you know, like, it's still, like, when we are 605 00:37:42,055 --> 00:37:45,655 aware of that, it's not still, like, the the top of our mind as the 606 00:37:45,655 --> 00:37:49,400 data practitioner to start looking, like, in our free time for this 607 00:37:49,400 --> 00:37:53,240 kind of concept. Right. I mean, that's a good point. 608 00:37:53,240 --> 00:37:55,980 Right? The fundamentals are still fundamental. Right? 609 00:37:57,240 --> 00:38:01,025 You know, making sure, you know, you track what 610 00:38:01,025 --> 00:38:04,465 your dependencies are. Right? So that way, if there's a breach in a Hugging Face 611 00:38:04,465 --> 00:38:08,225 model, like you said, you'll know right away whether or not it 612 00:38:08,225 --> 00:38:11,905 impacts you or not. Also too, I think you're 613 00:38:11,905 --> 00:38:15,445 right. This isn't top of mind for AI practitioners. Right? 614 00:38:17,320 --> 00:38:21,079 Even when I code, like, an app, my met 615 00:38:21,240 --> 00:38:24,140 my thought process is very different than when I'm in a notebook. 616 00:38:24,760 --> 00:38:28,060 Mhmm. It's just different wiring. 617 00:38:28,395 --> 00:38:32,075 Yep. And by the way, it's kind of like, it's kind of 618 00:38:32,075 --> 00:38:35,055 like a paradox because most times on the notebooks, 619 00:38:35,915 --> 00:38:39,355 we are connected to much more sensitive information than on our 620 00:38:39,355 --> 00:38:43,040 IDE. Right. No.
Exactly. So 621 00:38:43,040 --> 00:38:46,720 it's kind of it's like the worst, one of the worst case 622 00:38:46,720 --> 00:38:50,480 scenarios. Right? And and you're right. Like, people wanna work with real 623 00:38:50,480 --> 00:38:54,080 data, and they they just assume that if they're on a system that's 624 00:38:54,080 --> 00:38:57,470 secured and internal, they 625 00:38:58,695 --> 00:39:02,215 they, they don't have to worry about such things, 626 00:39:02,215 --> 00:39:06,055 which I think you're right. Like, with these systems that have access to 627 00:39:06,055 --> 00:39:09,895 sensitive data, these pipelines, I mean, it's one of those 628 00:39:09,895 --> 00:39:13,690 things where we need to start thinking about this. And what would you do 629 00:39:13,690 --> 00:39:17,289 you think that there's a, like, a career path for, like, an AI security engineer? 630 00:39:17,289 --> 00:39:20,829 Right? So it's not just a security engineer, like, in a traditional 631 00:39:20,890 --> 00:39:24,589 sense. Right? But also a someone who specializes 632 00:39:24,809 --> 00:39:28,635 in AI related issues. You think that's a growth industries? I 633 00:39:28,635 --> 00:39:31,035 have, like, no doubt that we are going to like to see more. Like, we 634 00:39:31,035 --> 00:39:34,715 already see these kind of practitioners in the field. I have no 635 00:39:34,715 --> 00:39:38,555 doubt that it's going, to be more and more frequent. And in 636 00:39:38,555 --> 00:39:41,680 addition to that, I believe that, like, even in the future, it's it's going to 637 00:39:41,680 --> 00:39:45,440 be even, like, several different, like, roles. For example, one of the 638 00:39:45,440 --> 00:39:48,800 things that, like, a lot of people that we work also, like, very closely with 639 00:39:48,800 --> 00:39:52,560 are AI red teaming. Right. It's not even, 640 00:39:52,560 --> 00:39:56,365 like, just like a AI security engineer, like, general one. 
Specifically around, 641 00:39:56,365 --> 00:39:59,425 like, red teaming because all these kinds of adversarial 642 00:39:59,725 --> 00:40:03,105 attacks on models are very different, require 643 00:40:03,485 --> 00:40:07,245 different techniques, different tactics. And the red teamers are the 644 00:40:07,245 --> 00:40:10,550 ones that, like, to, like, learning all these different 645 00:40:10,930 --> 00:40:14,470 types of adversarial attacks and how to, like, check your model, 646 00:40:15,170 --> 00:40:18,850 in your organization. And by the way, specifically in this 647 00:40:18,850 --> 00:40:22,275 area, I do feel that it's kind of, like, top priority and 648 00:40:22,275 --> 00:40:25,795 like top of mind also for the data science 649 00:40:25,795 --> 00:40:29,015 team. Like you do see that on LLMs, 650 00:40:29,795 --> 00:40:33,155 once they are deployed into production, the data 651 00:40:33,155 --> 00:40:36,615 scientists, they are kind of like understand that there are a lot of risk there 652 00:40:36,770 --> 00:40:40,450 and they are starting, like, to take also, like, responsibility even completely, like, regard 653 00:40:40,690 --> 00:40:44,390 regardless of the security team to make sure that, like, we we 654 00:40:44,609 --> 00:40:48,130 we reduce some of the risk there. Now the risk is not only 655 00:40:48,130 --> 00:40:51,825 security. The first thing is security, like, to try and, like, make sure 656 00:40:51,825 --> 00:40:55,585 that you are secured from all these different adversarial attacks or that you know how 657 00:40:55,585 --> 00:40:59,345 to detect sensitive data leakage, for example, as part of the response and stuff 658 00:40:59,345 --> 00:41:03,025 like that. In addition to that, it's also a lot of time 659 00:41:03,025 --> 00:41:06,800 like safety risks.
You want to make sure that once you deploy an LLM into 660 00:41:06,800 --> 00:41:10,480 production, your model doesn't give any financial advice to your 661 00:41:10,480 --> 00:41:14,100 customers, doesn't give any health advice in case it's not your business. 662 00:41:14,560 --> 00:41:18,395 So you then have, like, these kinds of responsibility, for example, like in the 663 00:41:18,395 --> 00:41:22,235 Chevy example that you gave, that you just, like, you don't just, release 664 00:41:22,235 --> 00:41:25,915 free cars or flights or books or a tail off, like, anything 665 00:41:25,915 --> 00:41:29,455 like that. So I think that because the 666 00:41:29,595 --> 00:41:33,090 the the the amount of potential risks is 667 00:41:33,090 --> 00:41:36,690 so high on the runtime. In this area, I 668 00:41:36,690 --> 00:41:40,450 believe that, like, the data scientists already understood that this is, like, 669 00:41:40,450 --> 00:41:44,130 under their responsibility. They see it also as part of, 670 00:41:44,130 --> 00:41:47,695 like, being a professional data scientist. If I 671 00:41:47,695 --> 00:41:51,095 deploy this model, it has, like, a lot of, like, accuracy, but, 672 00:41:51,095 --> 00:41:54,155 like, it creates all these different kinds of risk. 673 00:41:54,855 --> 00:41:58,615 I would define myself as not a super professional data 674 00:41:58,615 --> 00:42:01,990 scientist, unlike on the supply chain, unlike in the 675 00:42:01,990 --> 00:42:05,670 notebooks that if I code a code that is not secure, I wouldn't say that, 676 00:42:05,670 --> 00:42:08,950 like, it's not professional. I would say that, like, it's okay. You're just, like, focusing 677 00:42:08,950 --> 00:42:12,230 on the business.
So I do believe that we start, like, to seeing this shift 678 00:42:12,230 --> 00:42:15,925 also, like, in the mindset of the data scientist because of the risk of 679 00:42:15,925 --> 00:42:19,525 the Gen AI, but now it's also, like, like, a move 680 00:42:19,525 --> 00:42:23,204 to to all the the development and the building practices 681 00:42:23,204 --> 00:42:26,964 that we have. Yeah. And I think data 682 00:42:26,964 --> 00:42:30,480 scientists are acutely aware that LLMs 683 00:42:31,500 --> 00:42:35,340 are just taking they mean, we talk we we call it hallucinating when 684 00:42:35,340 --> 00:42:39,020 they get things wrong. But realistically, they're 685 00:42:39,020 --> 00:42:42,780 always hallucinating to a very real degree. Right? It's just they 686 00:42:42,780 --> 00:42:46,285 happen to be correct. And what these things are doing 687 00:42:46,664 --> 00:42:49,964 under the hood is they are looking for patterns of words. 688 00:42:50,825 --> 00:42:54,204 Sometimes those patterns of words are wrong, obviously wrong. 689 00:42:54,505 --> 00:42:57,325 And sometimes they may give out sensitive information 690 00:42:58,480 --> 00:43:01,920 inadvertently. So I can talk at least at least there's some common sense out there 691 00:43:01,920 --> 00:43:05,680 when they when they do realize these things are higher risk than I think 692 00:43:05,680 --> 00:43:09,360 we've been led to believe. Yeah. Actually, I love this this finish. They are, 693 00:43:09,360 --> 00:43:13,200 like, hallucinating, like, all this time. Sometimes they really find it 694 00:43:13,200 --> 00:43:16,955 as wrong. Like, they do the same thing as always. Right. 695 00:43:16,955 --> 00:43:20,795 Right. The they don't know they're hallucinating because they're just operating normally. 
696 00:43:20,795 --> 00:43:24,235 And so when they go in a different direction and I've noticed 697 00:43:24,235 --> 00:43:27,970 that, you know, kinda like a little bit of, you you know, off by a 698 00:43:27,970 --> 00:43:31,410 little bit, and then then then it generates an off by a little bit, off 699 00:43:31,410 --> 00:43:35,250 by a little bit. I ran an experiment with a hallucination, and 700 00:43:35,250 --> 00:43:37,970 I ran it through a bunch of models and each one 701 00:43:37,970 --> 00:43:41,270 of them didn't do any fact checking, which I mean, realistically, 702 00:43:42,245 --> 00:43:45,605 I wouldn't expect that. Right? In the future, I think that'll be kind of table 703 00:43:45,605 --> 00:43:49,445 stakes. But, you know, it would just go through. So 704 00:43:49,445 --> 00:43:53,205 I took a hallucination, fed it through NotebookLM, which then 705 00:43:53,205 --> 00:43:56,760 created even more hallucinations. Right? So it took this little 706 00:43:56,760 --> 00:44:00,380 genesis of something that was wrong and then made it even crazier wrong, 707 00:44:01,160 --> 00:44:04,760 which I think is an interesting kinda statement and and and 708 00:44:04,760 --> 00:44:07,900 also is a risk. Right? Like hallucination on top, compounding 709 00:44:08,475 --> 00:44:12,235 other hallucinations. And I don't think we've really seen that yet because we've 710 00:44:12,235 --> 00:44:15,995 only really seen for the most part, I've only seen one kind 711 00:44:15,995 --> 00:44:19,675 of model in production. But if you have these models that will kinda work together 712 00:44:19,675 --> 00:44:23,320 as agents or, you know, whether they're agents 713 00:44:23,320 --> 00:44:27,160 that do things or agents that it's different LLM discrete LLMs that talk 714 00:44:27,160 --> 00:44:30,760 to one another. They can get things wrong and make things worse.
I mean, I 715 00:44:30,760 --> 00:44:34,200 haven't I think it's too soon to tell either way, honestly. Yeah. But, like, the 716 00:44:34,280 --> 00:44:37,714 like, theoretically, like, it makes a lot of sense. I think in general, like, we 717 00:44:37,714 --> 00:44:41,255 don't see, like, a lot like, we hear a lot about Gen AI. 718 00:44:41,555 --> 00:44:45,394 I think that, like, the level of adoption and the amount 719 00:44:45,394 --> 00:44:49,070 of business use cases that, like, businesses 720 00:44:49,290 --> 00:44:52,910 found are not that high yet. I think that, like, the 721 00:44:53,210 --> 00:44:56,830 most of the usage today is done by, like, consumers, like, 722 00:44:57,690 --> 00:45:01,530 like, directly, like, from, from the foundation model providers, like OpenAI and stuff 723 00:45:01,530 --> 00:45:05,315 like that for day to day, like, jobs, like, you know, 724 00:45:05,315 --> 00:45:08,375 like, reviewing mails and stuff like that. 725 00:45:09,395 --> 00:45:12,994 The the big businesses are still trying to find these 726 00:45:12,994 --> 00:45:16,750 different, like, use cases. I do believe that the that the 727 00:45:16,750 --> 00:45:20,430 agents are going, like, to open a lot of different use cases 728 00:45:20,430 --> 00:45:24,270 around it. Right. Right. I could I could see that. And 729 00:45:24,270 --> 00:45:28,030 I think I think it's just too soon to make a statement 730 00:45:28,030 --> 00:45:31,665 either way. But I think grounding yourself in the fundamentals 731 00:45:31,724 --> 00:45:34,785 is probably always a good idea. Mhmm. 732 00:45:35,405 --> 00:45:38,765 And probably a good a good 733 00:45:38,765 --> 00:45:42,600 approach. So so tell me about Noma. What is is it Noma? I 734 00:45:42,600 --> 00:45:46,280 I wanna make sure I pronounce it. Noma. Okay. Noma. Security. What does 735 00:45:46,280 --> 00:45:50,040 Noma do? Is it a security firm that focuses on this space?
You 736 00:45:50,040 --> 00:45:53,500 mentioned red teaming. Is that is that a service you offer? 737 00:45:53,960 --> 00:45:57,615 Yeah. So Noma basically is an like, our name is Noma 738 00:45:57,615 --> 00:46:01,375 Security. The domain is Noma dot security. So it's Oh, okay. Sorry about 739 00:46:01,375 --> 00:46:05,135 that. No. No. We're good. So, so, yeah, 740 00:46:05,135 --> 00:46:08,190 what we do is, like, secure the entire data and AI life cycle. 741 00:46:09,150 --> 00:46:12,589 Basically means that we truly, like, cover it end to end. Like, we enable, like, 742 00:46:12,589 --> 00:46:16,430 the data teams and the machine learning and the AI teams, to continue and 743 00:46:16,430 --> 00:46:20,030 innovate while we are securing them without 744 00:46:20,030 --> 00:46:23,725 slowing down. And this is like the the like, we are built from, like, 745 00:46:23,725 --> 00:46:27,105 data practitioners, like, the company. So this is, like, our main focus, 746 00:46:27,405 --> 00:46:31,085 meaning that we start, like, from the building phase. So if we 747 00:46:31,085 --> 00:46:34,765 said, like, notebooks and Hugging Face models and all these different stuff and the 748 00:46:34,765 --> 00:46:38,400 misconfigurations are on all the different stack and all the MLOps 749 00:46:38,400 --> 00:46:42,080 tools and AI platforms and data pipelines and stuff like that. So we are 750 00:46:42,080 --> 00:46:45,520 connected seamlessly in the background, and, 751 00:46:45,840 --> 00:46:49,060 basically assist the the data teams to to work securely, 752 00:46:49,760 --> 00:46:52,395 without changing anything in the workflows. 753 00:46:53,494 --> 00:46:56,795 And then also, like, we provide, as you said, the red teaming.
754 00:46:57,255 --> 00:47:00,934 Before you're deploying the model into production, you want to 755 00:47:00,934 --> 00:47:04,450 understand what is the level of, of 756 00:47:04,450 --> 00:47:08,290 robustness and security that the like, that your model has. And 757 00:47:08,290 --> 00:47:11,890 what we do is we had, like, a big research team that, 758 00:47:11,890 --> 00:47:15,490 like, builds, simulated, thousands of different 759 00:47:15,490 --> 00:47:19,185 attacks. And then we dynamically start to run all these attacks against 760 00:47:19,185 --> 00:47:22,485 your models, showing you exactly, like, what kind of, like, tactics 761 00:47:23,025 --> 00:47:26,705 and techniques your model is vulnerable to, and exactly 762 00:47:26,705 --> 00:47:30,145 also how to mitigate and improve it to be more 763 00:47:30,145 --> 00:47:33,940 robust. And then the third part is also the runtime. 764 00:47:34,560 --> 00:47:38,080 We are mapping, we're scanning all the prompts and all the 765 00:47:38,080 --> 00:47:41,840 responses in real time, making sure that you don't 766 00:47:41,840 --> 00:47:45,375 have any risk on both sides. The security, we are detecting all these 767 00:47:45,375 --> 00:47:49,135 different kind of, like, a host and a little, like, adversarial attacks, prompt 768 00:47:49,135 --> 00:47:52,735 injection, model jailbreak, etcetera. We check also the responses for 769 00:47:52,735 --> 00:47:56,415 sensitive data leakage and stuff like that. But in addition, also the 770 00:47:56,415 --> 00:48:00,180 safety. We see a lot of organizations that the data scientists, as we 771 00:48:00,180 --> 00:48:03,540 said, they understand the risk of deploying 772 00:48:03,540 --> 00:48:07,140 models into production. And this is why not even, like, the security, but more like 773 00:48:07,140 --> 00:48:10,680 the the Chevy example and, like, the the health advice and stuff like that.
774 00:48:10,895 --> 00:48:14,335 So they built for their own, model 775 00:48:14,335 --> 00:48:18,095 guardrails in order to make sure that they are, like, controlling what 776 00:48:18,095 --> 00:48:21,694 are, like, the topics that the model is be able like, is allowed or 777 00:48:21,694 --> 00:48:25,330 disallowed to communicate about. And what we do is basically to save 778 00:48:25,330 --> 00:48:29,010 them also like this time. We also provide them, like, all this 779 00:48:29,010 --> 00:48:32,770 runtime protection already, like, as a service. You can define exactly what kind 780 00:48:32,770 --> 00:48:36,610 of, like, detectors and in native language, what kind of, like, policies you want 781 00:48:36,610 --> 00:48:40,244 to make sure that are enforced. And then we also, like, protect it in the 782 00:48:40,244 --> 00:48:43,685 run time. So, basically, we just, like, cover you, like, end to end, start from 783 00:48:43,685 --> 00:48:46,905 the building and up to the run time. It starts from the classic data engineering 784 00:48:47,205 --> 00:48:50,984 pipelines and machine learning and up to gen AI. Interesting. Interesting. 785 00:48:51,125 --> 00:48:54,540 It sounds like something I think is totally, I think, a 786 00:48:54,540 --> 00:48:58,380 needed needed service and and skill 787 00:48:58,380 --> 00:49:01,980 set. Because you're right. Like, I mean, there's just so many risks 788 00:49:01,980 --> 00:49:05,500 here, and the hype around Gen 789 00:49:05,500 --> 00:49:09,055 AI is so over the top. 790 00:49:10,234 --> 00:49:13,055 It is gonna be revolutionary, but 791 00:49:13,915 --> 00:49:16,954 maybe not in the way you think. Right? And I always call back to the 792 00:49:16,954 --> 00:49:20,700 early days of the dotcom. Right? Where it was pets.com. There was, 793 00:49:20,859 --> 00:49:24,700 you know, this.com, that, you know, like all these crazy things. 
But the 794 00:49:24,700 --> 00:49:28,300 real quote unquote winner of, you know, 795 00:49:28,300 --> 00:49:31,599 .com was some guy in Seattle selling books. 796 00:49:31,980 --> 00:49:35,660 Mhmm. Right? No one no one I mean, selling books. Like, really? 797 00:49:35,660 --> 00:49:39,395 Like, not, you know, and it's 798 00:49:39,395 --> 00:49:43,155 it's interesting to see how I think I 799 00:49:43,155 --> 00:49:46,675 think that the the obvious use case for chat for for 800 00:49:46,675 --> 00:49:50,355 LLMs thus far has been chatbots. Right? Customer 801 00:49:50,355 --> 00:49:54,010 service type things. I think that's really only the 802 00:49:54,010 --> 00:49:57,690 the the the the the surface of it. I think for me, what 803 00:49:57,690 --> 00:50:01,530 I've seen is most impactful is the ability for natural language 804 00:50:01,530 --> 00:50:05,290 understanding and their ability to understand what's happening in a in 805 00:50:05,290 --> 00:50:08,655 a block of text. And I think 806 00:50:08,655 --> 00:50:12,415 that that has enormous potential. I 807 00:50:12,415 --> 00:50:16,255 agree. A lot of risks too. Right? Because what if, you know, what if 808 00:50:16,255 --> 00:50:19,695 I I mean, to your point. Right? You wanna make sure these things stay on 809 00:50:19,695 --> 00:50:23,099 topic. Right? Like, I don't if I'm talking to a 810 00:50:23,099 --> 00:50:26,880 financial services chatbot and I say, hey, I have 811 00:50:27,420 --> 00:50:29,839 my my leg kinda hurts. Right? 812 00:50:31,180 --> 00:50:34,940 It's, you know, the risk of moving into health care, like, it's just kind 813 00:50:34,940 --> 00:50:38,755 of, I don't how mature are those guardrails? Because I've 814 00:50:38,755 --> 00:50:42,515 not really seen a good implementation of 815 00:50:42,515 --> 00:50:46,275 it yet. Yeah. 
So, you know, like, I 816 00:50:46,275 --> 00:50:49,955 don't want to give ourselves, like, a compliment, but, 817 00:50:51,230 --> 00:50:54,670 we Oh, you guys are pretty good at it? Yeah. Like, we're pretty good. Like, 818 00:50:54,670 --> 00:50:57,970 we work, like, you know, like, with Fortune 500, with Fortune 100. 819 00:50:58,670 --> 00:51:02,349 Not in vain. But, yeah, I believe that in general, specifically, like, when we speak 820 00:51:02,349 --> 00:51:06,125 more, like, on the guardrail side, I see that the most important thing is 821 00:51:06,125 --> 00:51:09,965 to make sure that it's, it's building the 822 00:51:09,965 --> 00:51:13,565 right architecture to be very flexible and easily 823 00:51:13,565 --> 00:51:17,325 configured for the organization because eventually, like, you know, like, each 824 00:51:17,325 --> 00:51:20,900 organization has completely different needs, completely different 825 00:51:20,900 --> 00:51:24,599 context to the calls, like, in their customers, internally to their employees. 826 00:51:25,460 --> 00:51:29,240 So everything should should be, like, very easily configured, but very flexible. 827 00:51:30,180 --> 00:51:33,755 Interesting. Interesting. I wanna I I could talk for 828 00:51:33,755 --> 00:51:36,894 another hour or two with you because this is this is a fascinating space. 829 00:51:38,315 --> 00:51:42,154 Where can folks find out more about Noma and you? I think it's Noma 830 00:51:42,154 --> 00:51:45,970 dot security? Yeah. Noma dot security. Can't believe that's now 831 00:51:45,970 --> 00:51:47,510 a top-level domain, but, 832 00:51:50,770 --> 00:51:54,610 and, any any, Noma dot 833 00:51:54,610 --> 00:51:58,130 security, you're on LinkedIn, and, anything 834 00:51:58,130 --> 00:52:01,275 else you you'd like the folks to find out more? 835 00:52:02,615 --> 00:52:06,135 No. I had, like, a great time speaking with you, Frank. Great. 836 00:52:06,135 --> 00:52:09,815 Likewise.
And for the listeners out there, if you're a little bit 837 00:52:09,815 --> 00:52:13,529 scared and a little bit paranoid about generative AI and LLMs, 838 00:52:13,670 --> 00:52:16,549 then I think we had a good conversation. Because I think we need a little 839 00:52:16,549 --> 00:52:19,910 bit of that fear in the back of our heads to guide us and 840 00:52:19,910 --> 00:52:23,589 maybe think about security issues. A 841 00:52:23,589 --> 00:52:26,309 little bit of thought ahead of time will probably save you a lot of problems 842 00:52:26,309 --> 00:52:29,995 later. And want to lose some. That's 843 00:52:29,995 --> 00:52:33,055 that's all I got, and we'll let the nice British AI, 844 00:52:33,915 --> 00:52:37,755 Bailey finish the show. Well, that wraps up another 845 00:52:37,755 --> 00:52:41,540 eye opening episode of Data Driven. A big thank you to Niv 846 00:52:41,540 --> 00:52:45,320 Braun for sharing his expertise on the critical intersection of AI, 847 00:52:45,700 --> 00:52:49,540 security, and innovation. If today's conversation didn't make 848 00:52:49,540 --> 00:52:52,900 you double check your data pipelines or rethink your Hugging Face 849 00:52:52,900 --> 00:52:56,535 downloads, well, you're braver than I am. As always, 850 00:52:56,595 --> 00:53:00,435 I'm Bailey, your semi sentient MC, reminding you that while 851 00:53:00,435 --> 00:53:04,215 AI might be clever, it's never too clever for a security breach. 852 00:53:04,435 --> 00:53:08,195 Until next time, stay curious, stay secure, and 853 00:53:08,195 --> 00:53:10,258 stay data driven. Cheerio.