1
00:00:00,240 --> 00:00:03,919
Hello, listeners. And welcome back to another thrilling episode

2
00:00:03,919 --> 00:00:07,680
of Data Driven. In today's episode, we delve deep into the

3
00:00:07,680 --> 00:00:11,519
fascinating and, let's be honest, slightly terrifying world of

4
00:00:11,519 --> 00:00:14,985
generative AI and security risks. Joining us is

5
00:00:14,985 --> 00:00:18,605
Niv Braun, cofounder and CEO of Noma Security,

6
00:00:18,904 --> 00:00:22,425
who's on the front lines of keeping your AI-driven projects safe from

7
00:00:22,425 --> 00:00:26,220
digital mischief. So grab a cuppa and let's get data

8
00:00:31,259 --> 00:00:35,100
driven. Well, hello, and welcome back to Data Driven, the podcast where we explore

9
00:00:35,100 --> 00:00:38,460
the emergent fields of AI, data science, and, of course, data

10
00:00:38,460 --> 00:00:41,875
engineering. Speaking of data engineering, my favoritest data

11
00:00:41,875 --> 00:00:45,714
engineer in the world can't make it today. But we

12
00:00:45,714 --> 00:00:49,335
have an exciting conversation queued up with Niv Braun,

13
00:00:49,635 --> 00:00:53,220
who is the cofounder and CEO of Noma. Noma

14
00:00:53,220 --> 00:00:56,840
is a security firm that focuses on, effectively,

15
00:00:57,460 --> 00:01:01,140
well, he'll describe it more eloquently than I can, but effectively thinks about

16
00:01:01,140 --> 00:01:04,740
security in the context of data and AI across the

17
00:01:04,740 --> 00:01:08,185
entire life cycle. Welcome to the show, Niv. Hey,

18
00:01:08,185 --> 00:01:11,865
Frank. Happy to be here, bro. Yeah. It's good to have

19
00:01:11,865 --> 00:01:15,625
you. And security is one of those things I've been thinking about more

20
00:01:15,625 --> 00:01:19,085
lately. Right? So my background was a software engineer and,

21
00:01:19,225 --> 00:01:22,970
you know, software engineers historically have not thought of

22
00:01:22,970 --> 00:01:26,810
security. Then I made the transition into data engineering and data

23
00:01:26,810 --> 00:01:30,490
science, and, traditionally, security is not really top

24
00:01:30,490 --> 00:01:34,090
of mind for them either. Now I

25
00:01:34,090 --> 00:01:37,854
kinda look at this, and I kinda look at the landscape that we're in where

26
00:01:37,854 --> 00:01:40,354
enterprises are deploying LLMs,

27
00:01:41,455 --> 00:01:44,915
generative AI solutions, on top of the predictive AI solutions,

28
00:01:45,935 --> 00:01:49,760
fast and furiously, and not thinking about

29
00:01:49,760 --> 00:01:53,520
security ramifications. So what's your take on

30
00:01:53,520 --> 00:01:57,360
that? 100% agree. I think that the

31
00:01:57,360 --> 00:02:00,820
current timing is even more fascinating

32
00:02:01,120 --> 00:02:04,725
than just a new technology. Because, exactly like you said,

33
00:02:04,725 --> 00:02:08,565
Frank, we data practitioners all know that security is

34
00:02:08,565 --> 00:02:11,445
not our top priority. And by the way, this is

35
00:02:11,445 --> 00:02:14,405
how it should be. We are focusing on the business and on driving

36
00:02:14,405 --> 00:02:18,165
the business forward. This is

37
00:02:18,165 --> 00:02:21,840
what we're paid for. The problem is that

38
00:02:22,140 --> 00:02:25,820
because we're not in this kind of mindset, we also, like

39
00:02:25,820 --> 00:02:29,660
any technology in the company, create some risk. What we see right

40
00:02:29,660 --> 00:02:33,260
now with the LLM drive, which is pretty cool, is that for the

41
00:02:33,260 --> 00:02:37,085
first time, the security teams started to put

42
00:02:37,085 --> 00:02:40,765
the focus and the spotlight on the data and AI teams. Because until

43
00:02:40,765 --> 00:02:44,525
now, let's be honest, they were focusing only on the software developers and

44
00:02:44,525 --> 00:02:48,205
their SDLC and the CI/CD and all these areas. We were,

45
00:02:48,205 --> 00:02:51,300
you know, in the shadow. And we were able to act

46
00:02:51,300 --> 00:02:55,000
completely freely, exactly as we wanted.

47
00:02:55,620 --> 00:02:58,819
But now, when the security teams start to put the spotlight on the

48
00:02:58,819 --> 00:03:02,355
data and AI teams, what they understand is that it's not

49
00:03:02,355 --> 00:03:06,115
only these new kinds of LLM threats, but also all

50
00:03:06,115 --> 00:03:09,655
the basic principles of security are not implemented

51
00:03:09,955 --> 00:03:13,795
in the data engineering and data science teams. Nobody scans the

52
00:03:13,795 --> 00:03:17,610
code in our notebooks, for example, unlike the software developers, whose

53
00:03:17,610 --> 00:03:21,209
code is all scanned. Nobody helps us

54
00:03:21,209 --> 00:03:24,730
find misconfigurations in our data pipelines or our

55
00:03:24,730 --> 00:03:28,269
MLOps tools or our AI platforms, like Databricks, for example.

56
00:03:28,505 --> 00:03:31,805
Nobody provides us the ability to find them easily,

57
00:03:32,185 --> 00:03:35,945
unlike, again, the software developers, who receive all this coverage

58
00:03:35,945 --> 00:03:39,485
and everything. The moment they have even the smallest misconfiguration

59
00:03:39,945 --> 00:03:43,480
in their SCM or their CI/CD, they

60
00:03:43,480 --> 00:03:47,320
will immediately receive a notification

61
00:03:47,320 --> 00:03:51,080
telling them exactly how to secure it. And eventually,

62
00:03:51,080 --> 00:03:54,855
at runtime, in the software life cycle, in

63
00:03:54,855 --> 00:03:58,535
classic software applications, we also have a lot of API security and web

64
00:03:58,535 --> 00:04:02,295
application firewall tools that help us protect the application at

65
00:04:02,295 --> 00:04:06,010
runtime. But now, specifically with LLMs, and this is very

66
00:04:06,010 --> 00:04:09,630
related to what you said, there are new kinds of adversarial attacks,

67
00:04:10,090 --> 00:04:13,790
prompt injection, model jailbreaks, and stuff like that.

68
00:04:13,850 --> 00:04:17,610
And, again, nobody is protecting against them in real time. And

69
00:04:17,610 --> 00:04:20,565
I think this is one of the main shifts that we see

70
00:04:20,565 --> 00:04:24,264
today in this area. We understand that the spotlight

71
00:04:24,405 --> 00:04:27,604
moved to the data and AI teams, but we need to make sure that we

72
00:04:27,604 --> 00:04:31,125
do both. We start with the new, kind of

73
00:04:31,125 --> 00:04:34,824
trendy risks that we want to make sure we are protected from.

74
00:04:35,099 --> 00:04:38,699
But also, for the first time in many years, we're

75
00:04:38,699 --> 00:04:42,060
also starting to implement the basic security measures

76
00:04:42,060 --> 00:04:45,900
needed in our area. But the most important thing, of course,

77
00:04:45,900 --> 00:04:49,335
is to continue and do it without slowing us down. We need to

78
00:04:49,335 --> 00:04:53,175
make sure that all the different security measures that

79
00:04:53,175 --> 00:04:57,015
we take still provide us the ability to move fast, to enable

80
00:04:57,015 --> 00:05:00,775
the data science and data engineering teams to

81
00:05:00,775 --> 00:05:04,315
continue to innovate, but in a secure way.

82
00:05:04,990 --> 00:05:08,830
You know, that's a good point because I never thought about scanning a notebook for

83
00:05:08,830 --> 00:05:12,430
errors. Right? Shame on me. Right? Like for code

84
00:05:12,430 --> 00:05:16,030
security I mean, not errors, but, you know, security vulnerabilities. That's not something

85
00:05:16,030 --> 00:05:19,469
that I have seen done in practice. I mean, the

86
00:05:19,469 --> 00:05:23,125
closest I've seen where security has been an issue for

87
00:05:23,125 --> 00:05:25,065
anyone in this space is,

88
00:05:27,205 --> 00:05:30,645
basically using protected, you know, Python

89
00:05:30,645 --> 00:05:34,405
libraries, right, or Python library repos, where those

90
00:05:34,405 --> 00:05:38,080
are scanned by, I forget the name of the third party that'll do it, where

91
00:05:38,080 --> 00:05:41,919
you just basically say you point your Python instance to there. Yeah. Because

92
00:05:41,919 --> 00:05:45,599
I also think that... An internal Artifactory? Yes, exactly. So

93
00:05:45,599 --> 00:05:49,405
like, because I often

94
00:05:49,405 --> 00:05:51,104
wonder, you know, people just like to install.

95
00:05:53,645 --> 00:05:57,245
God only knows what's in there. I can tell you that it already happens.

96
00:05:57,245 --> 00:06:00,625
I don't know if you heard, but for example,

97
00:06:01,324 --> 00:06:04,900
pretty recently, PyTorch. Right.

98
00:06:04,900 --> 00:06:08,740
PyTorch, which we all know, was compromised. We all know and love. Well, most people

99
00:06:08,740 --> 00:06:12,260
love. It was compromised. In a specific version of PyTorch, a

100
00:06:12,260 --> 00:06:15,794
malicious actor succeeded in putting some

101
00:06:15,794 --> 00:06:17,815
code inside that basically,

102
00:06:19,395 --> 00:06:22,755
collected all the secrets and tokens that you have in the

103
00:06:22,755 --> 00:06:26,435
environment and sent them out over DNS. Now we all

104
00:06:26,435 --> 00:06:30,180
know how many downloads PyTorch has.

105
00:06:30,500 --> 00:06:34,340
And most of the time, PyTorch is downloaded to

106
00:06:34,340 --> 00:06:37,720
all these different notebooks, wherever they are: Jupyter,

107
00:06:38,100 --> 00:06:41,320
SageMaker, Databricks. We all use them.

108
00:06:41,620 --> 00:06:44,100
And I can tell you that it caused a lot of

109
00:06:44,100 --> 00:06:47,895
problems. I can tell you firsthand, we saw a lot

110
00:06:47,895 --> 00:06:51,275
of organizations that were compromised because of this attack.

111
00:06:52,375 --> 00:06:55,175
And it happens all the time. And by the way, since you

112
00:06:55,175 --> 00:06:58,955
already touched on open source,

113
00:06:59,700 --> 00:07:03,540
now you also have Hugging Face, which is a completely different area. Now it's

114
00:07:03,540 --> 00:07:06,920
not only open source packages. It's all these different open source

115
00:07:07,060 --> 00:07:10,760
Hugging Face models and Hugging Face datasets. And there,

116
00:07:10,820 --> 00:07:14,440
all these internal Artifactories are completely useless because they don't even

117
00:07:14,445 --> 00:07:18,045
scan these models. It's a completely different technology, completely different

118
00:07:18,045 --> 00:07:21,485
heuristics in order to find it. And, therefore, you start to

119
00:07:21,485 --> 00:07:25,325
see trends among attackers. They started to

120
00:07:25,325 --> 00:07:28,920
upload a lot of backdoored and a lot of malicious models

121
00:07:29,160 --> 00:07:32,920
into Hugging Face. I can tell you that we personally have already

122
00:07:32,920 --> 00:07:36,680
detected, I think, almost 100 backdoored

123
00:07:36,760 --> 00:07:40,360
or malicious models on Hugging Face, because it's the Wild

124
00:07:40,360 --> 00:07:44,205
West. Right. Because these models, first off,

125
00:07:44,205 --> 00:07:47,885
they're physically large files. Right? So that's a factor.

126
00:07:47,885 --> 00:07:51,485
Right? I don't know how Hugging Face makes money. I'd be

127
00:07:51,485 --> 00:07:55,325
curious to have someone on the show talk about that. But, you know,

128
00:07:55,325 --> 00:07:59,169
they're doing a service. And how would you even scan? I

129
00:07:59,169 --> 00:08:02,849
mean, that's a good question. Right? What types of vulnerabilities have you

130
00:08:02,849 --> 00:08:06,689
found so far? And how does one even scan, like, a

131
00:08:06,689 --> 00:08:10,405
safetensors or GGUF file? Like, what does

132
00:08:10,405 --> 00:08:14,025
that look like? Right? Obviously, I'm pretty sure, you know, McAfee

133
00:08:14,085 --> 00:08:17,485
antivirus doesn't have a thing for that. But, like Exactly.

134
00:08:17,925 --> 00:08:21,125
But how do you even do that? I'm just curious. Yeah. So this is

135
00:08:21,125 --> 00:08:24,410
exactly the problem. In the models, the

136
00:08:24,410 --> 00:08:28,250
risk is even

137
00:08:28,250 --> 00:08:32,090
clearer, because, as you know, a lot of the time

138
00:08:32,090 --> 00:08:35,610
these models on Hugging Face are in pickle format. And pickle is

139
00:08:35,610 --> 00:08:39,325
by design an insecure file format. And so

140
00:08:39,885 --> 00:08:43,265
A binary dump, right, of the memory space. Yeah. In the deserialization

141
00:08:43,485 --> 00:08:46,705
process, basically, you can put any kind of malicious

142
00:08:47,245 --> 00:08:50,780
action you like, or rather the attacker can. So we see

143
00:08:50,780 --> 00:08:54,560
different attacks. Most of the attacks today come from pickle files.

144
00:08:54,780 --> 00:08:58,220
Some are not even in the deserialization process, but in the

145
00:08:58,220 --> 00:09:01,735
model code itself. For example, if you ask

146
00:09:01,735 --> 00:09:05,475
for a specific example, I can share something that we

147
00:09:05,475 --> 00:09:09,155
detected recently. We found a very,

148
00:09:09,155 --> 00:09:12,915
let's say, popular open source LLM that we all

149
00:09:12,915 --> 00:09:16,275
know. And it has a lot of different

150
00:09:16,275 --> 00:09:19,415
versions. And one of the versions was actually a docker

151
00:09:19,635 --> 00:09:23,290
that took the original model, wrapped it up with a few

152
00:09:23,290 --> 00:09:26,810
lines of code in the model. What they did is that every

153
00:09:26,810 --> 00:09:30,110
input to the model and every output from the model

154
00:09:30,410 --> 00:09:33,405
was also sent to the attacker, who basically

155
00:09:34,105 --> 00:09:37,705
received full visibility and observability into all the

156
00:09:37,705 --> 00:09:41,385
runtime applications in production of all the organizations that

157
00:09:41,385 --> 00:09:45,000
used this model. And performance-wise, the

158
00:09:45,000 --> 00:09:48,600
data scientists, of course, could not detect it, because

159
00:09:48,600 --> 00:09:52,440
performance-wise, it worked perfectly, because it took the original model. So nothing to be

160
00:09:52,440 --> 00:09:56,040
suspicious about. If we wanted the data

161
00:09:56,040 --> 00:09:59,875
scientists, for every new open source model that they like

162
00:09:59,875 --> 00:10:03,715
on Hugging Face, to start opening these files and the binaries and

163
00:10:03,715 --> 00:10:07,335
looking through them with their own hands, manually,

164
00:10:07,475 --> 00:10:11,130
for risks, first of all, of course,

165
00:10:11,130 --> 00:10:14,410
we understand that this is not their expertise, and

166
00:10:14,570 --> 00:10:17,950
we want to be secure, but, even worse,

167
00:10:18,329 --> 00:10:22,089
they would just spend all their time on security. And I think that

168
00:10:22,089 --> 00:10:25,134
this is the worst. Actually, it's not the worst. I think that

169
00:10:25,134 --> 00:10:28,495
the worst, and this is also something that I saw recently in several

170
00:10:28,495 --> 00:10:32,274
organizations, is just to block everything. Organizations

171
00:10:32,495 --> 00:10:36,334
that understand, okay, Hugging Face models, it's truly an

172
00:10:36,334 --> 00:10:40,000
insecure area. Let's block it. Let's say to

173
00:10:40,000 --> 00:10:43,760
all the data scientists in the organization, you're not allowed to use Hugging Face models. I

174
00:10:43,760 --> 00:10:46,980
think this is the worst. That seems like a mistake because

175
00:10:48,000 --> 00:10:51,440
people are gonna find a way. Well, one, you can't stop the

176
00:10:51,440 --> 00:10:54,260
signal. Right? That was a line from a movie.

177
00:10:55,345 --> 00:10:58,865
Kudos if people know what movie that is.

178
00:10:58,865 --> 00:11:02,465
But, you know, if you block Hugging Face, people are gonna find a way

179
00:11:02,465 --> 00:11:06,305
around that. They're gonna put it on a thumb drive at

180
00:11:06,305 --> 00:11:10,010
home and then bring it in. 100 percent. This is, by the way, also

181
00:11:10,010 --> 00:11:13,690
what you see with this kind of internal Artifactory. You see that

182
00:11:13,690 --> 00:11:16,890
once you create, for R&D, for

183
00:11:16,890 --> 00:11:20,650
the developers, or for the data scientists, some level of

184
00:11:20,650 --> 00:11:24,345
friction, they will just find a way to bypass

185
00:11:24,345 --> 00:11:27,325
it and lower this friction.

186
00:11:28,585 --> 00:11:31,005
Right. So, a couple of questions.

187
00:11:32,505 --> 00:11:35,725
One, I've seen improper naming.

188
00:11:36,320 --> 00:11:40,020
Not improper naming, but basically using names

189
00:11:40,080 --> 00:11:43,760
that look similar to what they should be. Yeah. Typosquatting.

190
00:11:43,760 --> 00:11:47,600
Typosquatting. That's it. I've seen that, which is, I guess,

191
00:11:47,600 --> 00:11:50,500
kind of a dollar store approach. But also,

192
00:11:53,045 --> 00:11:56,725
if you wanted to look through these model files, as

193
00:11:56,725 --> 00:11:59,605
far as I know, when I look at them, I just see binary

194
00:11:59,605 --> 00:12:02,644
stuff. Like, how would you look for malicious code in there? Because I think you're

195
00:12:02,644 --> 00:12:06,485
right. That's not a skill set the average AI engineer or data scientist

196
00:12:06,485 --> 00:12:10,190
would have. Yeah. So, basically, you need to manually

197
00:12:10,190 --> 00:12:13,630
parse it, because you have, of course, the binary file, but most

198
00:12:13,630 --> 00:12:17,470
of the time, it's not only the binary file. You look for the code

199
00:12:17,470 --> 00:12:21,045
file that runs the model, and you look for, in

200
00:12:21,045 --> 00:12:24,805
case it's pickle, the deserialization process, which you can

201
00:12:24,965 --> 00:12:28,725
parse and then see the code there. But then you

202
00:12:28,725 --> 00:12:31,845
have two phases. First, you need

203
00:12:31,845 --> 00:12:34,890
to parse it, to see the code, but then you also need

204
00:12:34,890 --> 00:12:38,649
to be able to read code and understand which

205
00:12:38,649 --> 00:12:42,089
one is valid and which one is malicious, which, you

206
00:12:42,089 --> 00:12:45,529
know, requires expertise in this area. If you see bash

207
00:12:45,529 --> 00:12:49,065
commands, is that okay or not? If you see access to the

208
00:12:49,065 --> 00:12:52,825
Internet, is that okay or not? You need to have

209
00:12:52,825 --> 00:12:56,505
some detectors in there that know how to do it, built

210
00:12:56,505 --> 00:13:00,265
by experts or something. So how would you even detect

211
00:13:00,265 --> 00:13:03,470
that if you found it? Like, how was this found? Was this just somebody looking

212
00:13:03,470 --> 00:13:07,230
in network packets? Or how was it discovered? I'm

213
00:13:07,230 --> 00:13:10,530
just curious. Yeah. This one specifically was found by our

214
00:13:10,750 --> 00:13:14,394
security research team. Okay. Yeah. They look, a

215
00:13:14,394 --> 00:13:17,514
lot, all the time, at all these different kinds

216
00:13:17,514 --> 00:13:20,954
of open source and third-party models in order to help

217
00:13:20,954 --> 00:13:24,735
our users make sure that everything they use

218
00:13:25,035 --> 00:13:28,870
is valid. And again, most importantly, without slowing

219
00:13:28,870 --> 00:13:32,230
them down. They can just download and run with everything

220
00:13:32,230 --> 00:13:35,769
that they want. And in case we see something that is

221
00:13:36,230 --> 00:13:39,750
suspicious, we know how to detect it and help them

222
00:13:39,750 --> 00:13:42,475
to secure it. Interesting. Interesting.

223
00:13:43,415 --> 00:13:47,115
Because I know a lot of people, you know, they've been downloading

224
00:13:47,175 --> 00:13:50,935
these models from Hugging Face. And just taking it on

225
00:13:50,935 --> 00:13:54,535
faith, and I've heard that these things don't call

226
00:13:54,535 --> 00:13:58,370
out to the Internet. Mhmm. And I fell into that. And then

227
00:13:58,370 --> 00:14:02,150
I kinda had this moment of paranoia where I'm like, how do I know?

228
00:14:02,370 --> 00:14:05,730
I mean, I'm just a humble data scientist. Right? Like,

229
00:14:05,730 --> 00:14:08,530
so the only way I would think about it would be to have a firewall

230
00:14:08,530 --> 00:14:12,214
rule that would block outbound network traffic from that box.

231
00:14:12,515 --> 00:14:16,035
And I'm sure there are probably workarounds to that too. I mean, are

232
00:14:16,035 --> 00:14:19,415
these attacks that sophisticated yet?

233
00:14:20,515 --> 00:14:24,149
Yeah. Yeah. And also, most of the time, the data

234
00:14:24,149 --> 00:14:27,910
scientists don't want to permanently close off the Internet,

235
00:14:27,910 --> 00:14:31,670
the outbound traffic, because the application also needs it. And also, you

236
00:14:31,670 --> 00:14:35,029
know, they need it in order to download the dependencies and the models

237
00:14:35,029 --> 00:14:38,805
they need. So most of the time, just blocking the Internet doesn't solve

238
00:14:38,805 --> 00:14:42,404
everything. That was more the case in the past, when everything was network-based

239
00:14:42,404 --> 00:14:46,165
only. Today, you also have the application layer here, so

240
00:14:46,165 --> 00:14:47,705
it's a bit more sophisticated.

241
00:14:50,325 --> 00:14:53,690
But yeah. Wow. So

242
00:14:54,310 --> 00:14:57,830
the safetensors format, as I understand it, what you

243
00:14:57,830 --> 00:15:01,190
know, you basically digitally sign or somebody

244
00:15:01,190 --> 00:15:05,030
digitally signs the contents of it. Is that

245
00:15:05,030 --> 00:15:08,445
a correct understanding? Yeah. So, like,

246
00:15:08,525 --> 00:15:12,285
in general, the first thing, of course, is that safetensors is

247
00:15:12,285 --> 00:15:15,965
much more secure. Okay. Already by design. And as an industry,

248
00:15:15,965 --> 00:15:19,769
we're going more and more down

249
00:15:19,769 --> 00:15:22,750
this road, though today we still see tons of pickles.

250
00:15:23,449 --> 00:15:27,290
But as we progress as an industry, we'll already

251
00:15:27,290 --> 00:15:31,050
be in a somewhat better situation. It's not

252
00:15:31,050 --> 00:15:34,685
perfect, of course. We still see some issues. And, of course, organizations still

253
00:15:34,685 --> 00:15:38,245
need to have some security measures and processes

254
00:15:38,245 --> 00:15:41,025
to make sure that they're aware of which

255
00:15:42,445 --> 00:15:46,205
Hugging Face models they're using. But I think that it's already

256
00:15:46,205 --> 00:15:49,840
going to be a bit better. I can tell you something that, actually,

257
00:15:49,840 --> 00:15:53,140
recently, one of our partners told me,

258
00:15:53,760 --> 00:15:57,440
which was pretty cool, very similar to what you said that you

259
00:15:57,440 --> 00:16:00,180
started to feel a lot of concern about this area.

260
00:16:01,245 --> 00:16:04,764
The VP of data science of a very big,

261
00:16:04,764 --> 00:16:08,605
Fortune 500, very big corporation. Kind

262
00:16:08,605 --> 00:16:11,884
of the head of all the data science groups there. And

263
00:16:11,884 --> 00:16:15,665
they told me, you know, Niv, I already know

264
00:16:16,250 --> 00:16:19,850
what I'm going to be fired over in the next

265
00:16:19,850 --> 00:16:23,690
24 months, and it's going to be this. I know for sure, we're

266
00:16:23,690 --> 00:16:27,529
using so many Hugging Face models. I know for sure that this is

267
00:16:27,529 --> 00:16:30,490
the reason that I'm going to be fired one day. Because today

268
00:16:30,490 --> 00:16:34,274
we're using it freely. We are also very creative. We're not

269
00:16:34,274 --> 00:16:37,795
only using the most popular Llama model, but we're trying to

270
00:16:37,795 --> 00:16:41,635
take advantage of the great strength of the platform, which is

271
00:16:41,635 --> 00:16:45,360
the number and the diversity of models that you have there. But I have

272
00:16:45,360 --> 00:16:49,040
no doubt that we are creating so many risks that just

273
00:16:49,040 --> 00:16:52,879
haven't been exposed yet, and that I'm going to pay for it

274
00:16:52,879 --> 00:16:56,560
with my head. So it's

275
00:16:56,560 --> 00:17:00,145
pretty cool, because it's not always that you see

276
00:17:01,005 --> 00:17:04,684
R&D and business owners that are so concerned

277
00:17:04,684 --> 00:17:08,525
about security even before the security team arrives

278
00:17:08,525 --> 00:17:12,330
to them. But they're already aware of this risk. And it's something that

279
00:17:12,330 --> 00:17:15,770
we are starting to see more and more because, you know, it's

280
00:17:15,770 --> 00:17:19,310
too obvious. The window is open and everybody sees it.

281
00:17:20,010 --> 00:17:23,530
Yeah. I would suppose that's, in a very kinda strange way,

282
00:17:23,530 --> 00:17:26,615
progress, right, where people think about security beforehand.

283
00:17:27,155 --> 00:17:30,455
Even if they don't know... I mean, I think this VP,

284
00:17:32,115 --> 00:17:35,955
you know, is pretty spot on. Like, what concerns me about the widespread

285
00:17:35,955 --> 00:17:39,635
adoption of these models, and particularly Hugging Face, and this is no knock on Hugging

286
00:17:39,635 --> 00:17:43,310
Face. I think wherever you get your models... Mhmm.

287
00:17:45,050 --> 00:17:48,670
I mean, we just don't know. And these things are just complicated.

288
00:17:48,730 --> 00:17:52,170
Right? I mean, they are by design complicated, with billions of

289
00:17:52,170 --> 00:17:55,955
parameters. In some cases, I guess, trillions. But also, you

290
00:17:55,955 --> 00:17:59,715
know, they have this ability to even

291
00:17:59,715 --> 00:18:03,395
even if everything worked out well, even assuming everything is

292
00:18:03,395 --> 00:18:06,855
fine, right, in terms of the operationalization of these things,

293
00:18:08,470 --> 00:18:11,830
there's still the chance that the model itself and its

294
00:18:11,830 --> 00:18:15,450
training was poisoned. So, like,

295
00:18:15,590 --> 00:18:18,550
I mean, there's just so many... Because my wife works in

296
00:18:18,550 --> 00:18:22,070
IT security, and I was all excited. It was about a year and a half

297
00:18:22,070 --> 00:18:25,125
ago. I was talking to her about LLMs and stuff like that

298
00:18:25,825 --> 00:18:29,365
and ChatGPT and those types of things.

299
00:18:30,705 --> 00:18:33,665
And I was like, oh, well, you take all this data and you train a

300
00:18:33,665 --> 00:18:36,960
model and you distill down this graph and this and this. And then she's

301
00:18:36,960 --> 00:18:40,760
like, that sounds like a big attack surface to me. Yeah.

302
00:18:40,760 --> 00:18:44,480
And I was like... Like, data poisoning is the classic one, and data

303
00:18:44,480 --> 00:18:47,840
poisoning can be at two levels: either someone

304
00:18:47,840 --> 00:18:51,175
poisoning your data or, exactly what you say,

305
00:18:51,395 --> 00:18:55,235
somebody who, this way,

306
00:18:55,235 --> 00:18:58,835
creates backdoors in third-party models and open source

307
00:18:58,835 --> 00:19:02,190
models that then, like, everybody downloads. Right.

308
00:19:02,250 --> 00:19:05,390
Right. And we wouldn't know, like, what's

309
00:19:05,850 --> 00:19:09,390
the I mean, the defense against that seems very

310
00:19:09,610 --> 00:19:12,990
intricate. Not impossible, but very delicate and intricate.

311
00:19:13,915 --> 00:19:17,675
So in classic application security, there is a

312
00:19:17,675 --> 00:19:21,515
great practice called SBOM. SBOM is a software

313
00:19:21,515 --> 00:19:25,195
bill of materials. Basically, it means that you get, in a

314
00:19:25,195 --> 00:19:28,760
specific format, visibility into all the different

315
00:19:28,760 --> 00:19:32,360
software components that build your application. One of the things that

316
00:19:32,360 --> 00:19:35,960
now we're also part of building is an

317
00:19:35,960 --> 00:19:39,420
official framework of OWASP, the nonprofit organization

318
00:19:40,040 --> 00:19:43,865
around security of AI and machine learning. And

319
00:19:44,325 --> 00:19:47,945
what you have there, for the first time, is a double layer

320
00:19:48,164 --> 00:19:51,865
of visibility. The first one is just to understand

321
00:19:52,565 --> 00:19:56,300
what models I'm even using in the organization. Everything, like,

322
00:19:56,300 --> 00:19:59,980
what models are included in my application. It can be open

323
00:19:59,980 --> 00:20:03,740
source models. It can be self-developed models. Also, by the way, not

324
00:20:03,740 --> 00:20:07,120
only LLMs, of course, but also vision, NLP, everything else.

325
00:20:07,420 --> 00:20:11,245
And also third party models that are embedded as part of the application, they

326
00:20:11,245 --> 00:20:14,545
which are not open source. For example,

327
00:20:15,085 --> 00:20:18,785
if a software engineer adds an API call as part of the application

328
00:20:19,565 --> 00:20:23,025
to OpenAI, in this way, they embed

329
00:20:25,040 --> 00:20:28,320
an LLM as part of the application. This is also one of the models

330
00:20:28,320 --> 00:20:32,000
that you are using, and you want to know: this is all my

331
00:20:32,000 --> 00:20:35,460
AI and model inventory that I'm using as part of the application.

332
00:20:36,000 --> 00:20:39,845
And in addition to that, you have even the deeper context there, which

333
00:20:39,845 --> 00:20:43,684
is also like what you referred to. It's not only this is

334
00:20:43,684 --> 00:20:47,285
the list of the models that I'm using, but for each one, you want to

335
00:20:47,285 --> 00:20:51,100
understand on what dataset it was trained, what data

336
00:20:51,100 --> 00:20:54,299
it may also have access to in case it's in production, let's say, with

337
00:20:54,299 --> 00:20:57,679
RAG architecture. You want to understand the deep context

338
00:20:58,059 --> 00:21:01,659
of all these models, what I'm using, but also what

339
00:21:01,659 --> 00:21:05,095
happens in each specific model. Sometimes

340
00:21:05,395 --> 00:21:08,915
it's, as you said, to understand what data a

341
00:21:08,915 --> 00:21:12,595
model was trained on before I start using it from a third party, but a

342
00:21:12,595 --> 00:21:16,295
lot of the time it's even internally in the organization.

343
00:21:16,830 --> 00:21:19,809
Because once we start to train a lot of models,

344
00:21:20,750 --> 00:21:23,970
we want to make sure that we don't violate

345
00:21:24,590 --> 00:21:28,270
any policy that we have in the organization, whether it's for compliance or

346
00:21:28,270 --> 00:21:32,115
security. For example, one of the things that I keep

347
00:21:32,115 --> 00:21:35,875
hearing a lot from security, legal, and privacy

348
00:21:35,875 --> 00:21:39,715
teams is that, look, we instruct all the

349
00:21:39,715 --> 00:21:43,015
organization not to train any sensitive

350
00:21:43,075 --> 00:21:46,900
data, PII, PCI, PHI, any other sensitive

351
00:21:46,900 --> 00:21:50,520
information on our models. But apart from instructing

352
00:21:50,660 --> 00:21:54,420
it and talking about it, nobody knows if it

353
00:21:54,420 --> 00:21:57,780
happens. And we also don't provide our data

354
00:21:57,780 --> 00:22:01,404
teams tools that will help them to

355
00:22:01,404 --> 00:22:05,245
detect it in case it happens by accident. For

356
00:22:05,245 --> 00:22:08,924
example, I can tell you about one of the things that we saw very

357
00:22:08,924 --> 00:22:11,985
recently. A big organization, a huge fintech company,

358
00:22:12,764 --> 00:22:16,340
where a data scientist unintentionally trained all the

359
00:22:16,340 --> 00:22:20,100
transactions of the application on one of the models. Now it's

360
00:22:20,100 --> 00:22:23,540
a crazy big violation of compliance and

361
00:22:23,540 --> 00:22:27,300
security. The data scientist did this unintentionally. They

362
00:22:27,300 --> 00:22:31,134
truly didn't know it. If they had something that would help them,

363
00:22:31,134 --> 00:22:34,894
the basic visibility that you mentioned before, it would truly help them to

364
00:22:34,894 --> 00:22:38,434
continue to innovate and, in case something bad happens,

365
00:22:38,495 --> 00:22:41,934
to be alerted to it. And so I see that the data training

366
00:22:41,934 --> 00:22:45,740
is also a very, very important point internally, and not

367
00:22:45,740 --> 00:22:49,420
only for the data used to train the external models that we're embedding and

368
00:22:49,420 --> 00:22:53,180
downloading. So you mentioned OWASP. So just

369
00:22:53,180 --> 00:22:55,980
for the benefit of folks who may not know, because most of our listeners are

370
00:22:55,980 --> 00:22:59,815
either data engineers or data scientists, what is OWASP? And what is the

371
00:22:59,895 --> 00:23:03,414
I think it's the OWASP Top 10? Yeah. So

372
00:23:03,414 --> 00:23:07,255
OWASP in general is an amazing organization that

373
00:23:07,495 --> 00:23:10,554
is a nonprofit that basically

374
00:23:12,360 --> 00:23:16,040
brings a lot of people together in order to make

375
00:23:16,040 --> 00:23:19,720
sure that our whole industry is much more secure, with a lot of

376
00:23:19,720 --> 00:23:23,320
different security initiatives in a lot of different aspects, mainly product

377
00:23:23,320 --> 00:23:26,895
security, but not only that. Product security is application

378
00:23:26,895 --> 00:23:28,515
security. It's building security.

379
00:23:30,495 --> 00:23:34,095
Specifically in OWASP, you have several different types of

380
00:23:34,095 --> 00:23:37,875
projects. So for example, one type of project is the OWASP

381
00:23:38,410 --> 00:23:41,150
Top 10, which basically takes different areas

382
00:23:42,090 --> 00:23:45,850
and defines the top ten risks in each specific area. So it can be top

383
00:23:45,850 --> 00:23:48,670
ten for APIs, top ten for

384
00:23:50,730 --> 00:23:54,190
CI/CD. And now there is also a top ten for LLMs.

385
00:23:55,345 --> 00:23:58,965
In addition, there are a lot of different frameworks and tools. Specifically,

386
00:23:59,424 --> 00:24:02,965
if someone wants to understand a bit more about the wider

387
00:24:03,745 --> 00:24:07,205
landscape and the risk around AI and machine learning,

388
00:24:08,309 --> 00:24:12,150
the framework that I would highly recommend, which is

389
00:24:12,150 --> 00:24:15,510
amazing and very comprehensive, is called the OWASP AI

390
00:24:15,510 --> 00:24:19,130
Exchange. A group of people, again, gathered together,

391
00:24:19,910 --> 00:24:23,495
that covers not only LLMs, but all the basic

392
00:24:23,635 --> 00:24:27,335
principles and risks in data pipelines and MLOps,

393
00:24:27,475 --> 00:24:31,075
starting from build time up to runtime, and from

394
00:24:31,075 --> 00:24:34,135
classic machine learning up to Gen AI. Very comprehensive,

395
00:24:34,675 --> 00:24:38,310
very also practical, which is very important and

396
00:24:38,310 --> 00:24:42,010
speaks both languages. On one hand,

397
00:24:42,310 --> 00:24:46,090
of course, security, but on the other, it's also very oriented

398
00:24:46,550 --> 00:24:49,210
toward data, machine learning, and AI practitioners.

399
00:24:50,390 --> 00:24:51,610
Interesting. Interesting.

400
00:24:54,125 --> 00:24:57,905
What do you see

401
00:24:58,045 --> 00:25:00,845
well, I have a lot of questions, but one of them

402
00:25:00,845 --> 00:25:04,605
is, do you think the

403
00:25:04,605 --> 00:25:08,430
zero trust approach is a good starting point? I don't think

404
00:25:08,430 --> 00:25:11,810
it's the answer here like it is kinda everywhere else. But do you think that,

405
00:25:13,790 --> 00:25:16,370
that type of philosophy of don't trust anything?

406
00:25:17,150 --> 00:25:20,904
Right? I mean, you mentioned this

407
00:25:20,904 --> 00:25:24,664
earlier when I talked about network firewalls, right, where the old approach

408
00:25:24,664 --> 00:25:28,424
was just to pull the plug or set up rules. And that used

409
00:25:28,424 --> 00:25:31,870
to work, but there are plenty of other ways around it, both, I think,

410
00:25:31,870 --> 00:25:35,549
kind of low skill, mid skill, and certainly high skill

411
00:25:35,549 --> 00:25:39,230
ways around that. And I mean, zero

412
00:25:39,230 --> 00:25:42,990
trust is meant to address that. What are your thoughts? I

413
00:25:42,990 --> 00:25:46,049
mean, is that the mindset that

414
00:25:49,155 --> 00:25:52,875
security folks in this space would have to take on? Like, it's more

415
00:25:52,995 --> 00:25:56,515
if they... well, they probably already have. Right? Yeah. I think

416
00:25:56,515 --> 00:25:59,735
you actually perfectly

417
00:26:00,115 --> 00:26:03,610
defined it, because I believe that Zero Trust is exactly like you say, it's kind

418
00:26:03,610 --> 00:26:07,210
of a mindset. It's not a very

419
00:26:07,210 --> 00:26:11,050
precise technical approach, but it's kind of

420
00:26:11,050 --> 00:26:14,750
more of a philosophy with some level of implementation.

421
00:26:17,535 --> 00:26:21,055
I believe that the right mindset and the right framework to look

422
00:26:21,055 --> 00:26:24,435
at security for AI, across the whole build

423
00:26:24,815 --> 00:26:28,495
and runtime, is basically to take all the

424
00:26:28,495 --> 00:26:32,200
different principles that we are all already aware

425
00:26:32,419 --> 00:26:36,200
of. And by we, I mean the security industry,

426
00:26:36,500 --> 00:26:39,720
we are all already aware of on classic software development,

427
00:26:40,340 --> 00:26:44,019
building and runtime, and to implement them on the

428
00:26:44,019 --> 00:26:47,595
data and AI lifecycle. For example, we mentioned

429
00:26:47,595 --> 00:26:51,435
code scanning, so scanning the notebooks; we mentioned open source,

430
00:26:51,435 --> 00:26:55,215
so checking all the Hugging Face models. But it's not only that.

431
00:26:55,355 --> 00:26:58,875
For example, one of the things that we see: a lot of attacks that

432
00:26:58,875 --> 00:27:02,610
we had recently in the security area are around the

433
00:27:02,610 --> 00:27:06,130
CI/CD. A few years ago, there was a big attack called

434
00:27:06,130 --> 00:27:09,970
SolarWinds, that basically, yeah, so you know it

435
00:27:09,970 --> 00:27:13,650
perfectly, but just for the audience not familiar with the specific details,

436
00:27:13,650 --> 00:27:17,245
at a very high level, attackers exploited

437
00:27:17,245 --> 00:27:20,925
misconfigurations in CI/CD tools. And this is

438
00:27:20,925 --> 00:27:24,605
basically how they succeeded in starting this whole huge attack and

439
00:27:24,605 --> 00:27:28,385
breach. Now one of the things that it taught us all as an industry

440
00:27:28,890 --> 00:27:32,570
is that until then we were focusing on securing only

441
00:27:32,570 --> 00:27:36,250
our code. Now we understand that the code is not enough. We need to make

442
00:27:36,250 --> 00:27:39,930
sure that the building tools are also well configured. So

443
00:27:39,930 --> 00:27:42,505
we started to see a lot of tools that help us make

444
00:27:42,505 --> 00:27:46,265
sure that we don't have misconfigurations in the CI/CD and the SCMs and all

445
00:27:46,265 --> 00:27:50,105
these different kinds of tools. But when we go to our domain, when

446
00:27:50,105 --> 00:27:53,785
we go to the data and AI teams, as we know, we use a different

447
00:27:53,785 --> 00:27:57,620
stack. We use all these data pipelines and model

448
00:27:57,620 --> 00:28:01,460
registries and MLOps tools and platforms like Databricks and Domino

449
00:28:01,460 --> 00:28:05,060
and Snowflake and stuff like that. The configuration, as we know, is

450
00:28:05,060 --> 00:28:08,900
not any narrower. Most of the time, it's even wider. This is why it's

451
00:28:08,900 --> 00:28:12,595
not managed by DevOps. It's managed by us, by the data teams. It's managed by

452
00:28:12,595 --> 00:28:16,435
MLOps teams, by data infra, by data platform. And we're doing a

453
00:28:16,435 --> 00:28:20,275
great job of optimizing all the configuration for the

454
00:28:20,275 --> 00:28:23,850
product. We're not security experts. We don't want to be

455
00:28:23,850 --> 00:28:26,970
security experts and start spending a lot of time on that. But

456
00:28:26,970 --> 00:28:30,670
there's nothing that helps us easily find all these different kinds of misconfigurations.

457
00:28:31,370 --> 00:28:35,210
And this is also a threat and, like, attack vector that we

458
00:28:35,210 --> 00:28:38,010
started to see a lot in the field today. I can tell you that

459
00:28:38,010 --> 00:28:40,575
we see tons of attacks around

460
00:28:41,914 --> 00:28:45,615
different misconfigurations in tools like Airflow and Databricks

461
00:28:45,755 --> 00:28:49,115
and stuff like that. And I think this is also like a very, very important,

462
00:28:49,115 --> 00:28:52,890
mindset to be in. And in addition to that, of course, we have

463
00:28:52,890 --> 00:28:56,650
all the runtime and all the adversarial attacks there.

464
00:28:56,650 --> 00:29:00,030
Specifically, as I mentioned with the

465
00:29:00,330 --> 00:29:03,950
OWASP AI Exchange, the OWASP AI Exchange covers everything.

466
00:29:04,409 --> 00:29:07,815
The OWASP Top 10 for LLMs specifically is more

467
00:29:08,474 --> 00:29:10,934
focused on LLM

468
00:29:12,115 --> 00:29:15,794
specific risks. And then you have all the adversarial attacks, like prompt injection

469
00:29:15,794 --> 00:29:19,174
and model jailbreak, model denial of service, model denial of wallet,

470
00:29:19,634 --> 00:29:23,450
etcetera. So basically, the mindset should

471
00:29:23,450 --> 00:29:26,890
be: we already know security very well. We already have these

472
00:29:26,890 --> 00:29:30,250
principles. Until now, we just haven't

473
00:29:30,250 --> 00:29:34,030
implemented them on the data and AI teams,

474
00:29:34,575 --> 00:29:38,174
tools, and technology. And this is exactly what we

475
00:29:38,335 --> 00:29:41,054
need to start doing. And this is what we see

476
00:29:41,054 --> 00:29:44,255
also: now we have no excuse. We all see

477
00:29:44,255 --> 00:29:47,715
these different kinds of attacks. So we start to see that organizations

478
00:29:47,855 --> 00:29:50,250
are already starting to walk the walk.

479
00:29:52,010 --> 00:29:55,610
Wow. Yeah. I often wonder too, like, you

480
00:29:55,610 --> 00:29:59,130
mentioned the pipelines being a vulnerability or an

481
00:29:59,130 --> 00:30:02,350
attack surface. Right? Like, or a potential vulnerability.

482
00:30:03,184 --> 00:30:06,725
I often wonder now, like, when, you know, we're looking at agentic

483
00:30:06,785 --> 00:30:10,325
AI, right, where these things aren't just LLMs,

484
00:30:10,465 --> 00:30:14,165
right, producing text or going through these materials.

485
00:30:14,385 --> 00:30:17,550
We're giving them, you know, abilities,

486
00:30:18,090 --> 00:30:21,610
right, to influence pipelines, or to do

487
00:30:21,610 --> 00:30:25,390
whatever. Right? Like, that just seems to me like a giant

488
00:30:26,410 --> 00:30:30,090
security risk. I mean, you know, there are multiple ways to

489
00:30:30,090 --> 00:30:33,665
break an LLM. Right? Like, obviously, there's the $1 Chevy

490
00:30:33,665 --> 00:30:37,105
Tahoe. Right? Where the guy did that. Right? Pretty low tech

491
00:30:37,105 --> 00:30:39,765
approach, pretty brute force ish.

492
00:30:40,865 --> 00:30:43,685
But I often wonder, like, well, what

493
00:30:46,290 --> 00:30:49,990
what sorts of things are agentic systems gonna open up?

494
00:30:50,130 --> 00:30:53,650
Like, what does that look like? I think that this is exactly like where we

495
00:30:53,650 --> 00:30:57,170
we will start, like, to see, like, the very big LLM,

496
00:30:58,535 --> 00:31:01,975
breaches, that we'll have. I believe that, by the

497
00:31:01,975 --> 00:31:05,495
way, that the way the

498
00:31:05,495 --> 00:31:09,335
attack starts will still be, in a lot of cases,

499
00:31:09,335 --> 00:31:12,855
very similar to what we see today. But the impact of the

500
00:31:12,855 --> 00:31:16,639
attack will be much, much higher, because now the

501
00:31:16,639 --> 00:31:19,700
model can not only promise you a

502
00:31:20,799 --> 00:31:24,179
$1 car, but it can already

503
00:31:24,399 --> 00:31:27,919
send the order, send the car to you, book your

504
00:31:27,919 --> 00:31:31,735
hotel, do everything there, share with you the data

505
00:31:31,735 --> 00:31:35,335
of other customers in the application, because it has

506
00:31:35,335 --> 00:31:39,175
a RAG architecture, and it also has different tools

507
00:31:39,175 --> 00:31:42,935
that give it the ability to maybe even write code

508
00:31:42,935 --> 00:31:46,710
to the application. And then it might also trigger different types

509
00:31:46,710 --> 00:31:50,310
of remote code execution. As long as we are going to

510
00:31:50,310 --> 00:31:53,990
give these LLMs more privileges, more access,

511
00:31:53,990 --> 00:31:57,794
more tools, more abilities, the impact of the risk

512
00:31:57,794 --> 00:32:01,475
that they will be able to cause will be much higher. I still

513
00:32:01,475 --> 00:32:05,154
believe again that the attack vectors are going to start from more or less

514
00:32:05,154 --> 00:32:08,455
the same areas, like prompt injection and model jailbreak,

515
00:32:08,755 --> 00:32:12,274
but eventually the outcome of these attacks will be much

516
00:32:12,274 --> 00:32:15,690
higher. I could see that. Because we're giving them

517
00:32:15,690 --> 00:32:19,450
actuators, so to speak. Right? Like, we're

518
00:32:19,450 --> 00:32:23,290
giving them agency. Right? Like where they could actually do real damage as

519
00:32:23,290 --> 00:32:26,910
opposed to... it's one thing to say you're gonna give somebody

520
00:32:27,365 --> 00:32:31,125
a $1 Chevy Tahoe. It's quite another to actually place the order,

521
00:32:31,125 --> 00:32:34,965
sign off on the invoice, and then ship it. Right? Yep. And what

522
00:32:34,965 --> 00:32:37,845
if... I don't know. You'll start to see it

523
00:32:37,845 --> 00:32:41,550
also in banks and in investments. They will start to transfer

524
00:32:41,550 --> 00:32:45,150
your money. They will start to invest, to buy stock. The

525
00:32:45,150 --> 00:32:48,830
amount of potential impact here is

526
00:32:48,830 --> 00:32:52,510
crazy high. I believe, by the way, that eventually, this is going to be one

527
00:32:52,510 --> 00:32:56,095
of the things that we'll see slowing down adoption, no

528
00:32:56,095 --> 00:32:59,695
less than the technology or finding the

529
00:32:59,695 --> 00:33:03,135
right use case. Yeah. No. I could see

530
00:33:03,135 --> 00:33:06,675
that. I just think that, as an industry...

531
00:33:07,380 --> 00:33:11,140
We're setting ourselves up for a huge exploit that we

532
00:33:11,140 --> 00:33:13,640
haven't figured out is already there yet.

533
00:33:14,500 --> 00:33:18,180
And so what

534
00:33:18,180 --> 00:33:21,895
can AI engineers, data scientists,

535
00:33:21,895 --> 00:33:24,795
data engineers do today to make things

536
00:33:25,575 --> 00:33:28,615
better? I know we can't fix it because we really don't

537
00:33:28,615 --> 00:33:32,455
know what's broken. I think that's one of the frustrating and kind of fun things

538
00:33:32,455 --> 00:33:36,200
about security work: it's not that there are no vulnerabilities.

539
00:33:36,260 --> 00:33:40,100
You just haven't discovered any vulnerabilities yet. Right? There are

540
00:33:40,100 --> 00:33:43,800
always unknown unknowns.

541
00:33:44,020 --> 00:33:47,720
But if you have a known unknown or a known thing,

542
00:33:47,905 --> 00:33:50,945
you can you can say that you pretty much figured that out. But there's this

543
00:33:50,945 --> 00:33:53,845
whole aspect, which I don't think data scientists

544
00:33:54,865 --> 00:33:58,225
fully appreciate. I think they can understand the concept of the unknown

545
00:33:58,225 --> 00:34:01,985
unknowns. But in terms of the consequences of it, I don't

546
00:34:01,985 --> 00:34:05,399
think... I think it's gonna take one major SolarWinds-style

547
00:34:06,100 --> 00:34:09,719
issue or CrowdStrike-style issue to make people conscious

548
00:34:10,179 --> 00:34:14,020
of that. But how do

549
00:34:14,020 --> 00:34:17,824
we prepare ourselves? Right? You can't

550
00:34:17,824 --> 00:34:21,505
stop the hurricane, but you can board up your windows. Right? Like, you

551
00:34:21,505 --> 00:34:25,344
know, how do you... Yeah. I totally

552
00:34:25,344 --> 00:34:29,030
agree that what's going to shake everybody

553
00:34:29,650 --> 00:34:33,090
will be the first SolarWinds or the Log4j

554
00:34:33,090 --> 00:34:36,610
attack that we see in these areas. I think that,

555
00:34:36,610 --> 00:34:40,390
I think that you broke it down very well

556
00:34:40,449 --> 00:34:43,844
and that we need to relate to both categories.

557
00:34:44,545 --> 00:34:47,285
First is the known,

558
00:34:47,985 --> 00:34:51,585
which already exists. We know that, you know,

559
00:34:52,545 --> 00:34:55,364
we see that as scientists. Like, we are not a scientist.

560
00:34:57,580 --> 00:35:01,100
And one of the things that we see

561
00:35:01,100 --> 00:35:04,000
in our code compared to software developers'

562
00:35:04,940 --> 00:35:08,375
is that we don't give a

563
00:35:08,375 --> 00:35:12,075
thought to anything around security.

564
00:35:12,135 --> 00:35:15,655
You'll see tons of exposed secrets in plain

565
00:35:15,655 --> 00:35:19,275
text. You'll see tons of test and sensitive data

566
00:35:19,494 --> 00:35:22,555
just lying there, exposed in the notebooks.

567
00:35:23,140 --> 00:35:26,820
You'll see that we download any dependencies without

568
00:35:26,820 --> 00:35:30,500
even thinking about it, even when it looks maybe

569
00:35:30,500 --> 00:35:34,280
a bit suspicious and stuff like that. So we're far

570
00:35:34,420 --> 00:35:37,700
from the basics. Let's make sure that what we know is

571
00:35:37,700 --> 00:35:41,435
best practice, we just start to implement. And

572
00:35:41,435 --> 00:35:45,275
then regarding the unknown unknowns, of

573
00:35:45,275 --> 00:35:48,155
course, you don't know how to handle them. I think that, as you

574
00:35:48,155 --> 00:35:51,915
said, you can start to prepare yourself. How do you

575
00:35:51,915 --> 00:35:55,410
prepare yourself in security? It's basically to be very

576
00:35:55,410 --> 00:35:58,850
organized and to make sure that you have the right visibility and

577
00:35:58,850 --> 00:36:02,609
governance. For example, as long as you know how to build

578
00:36:02,609 --> 00:36:06,290
your AI or machine learning BOM, you know all the

579
00:36:06,290 --> 00:36:10,125
different, like, models that are built or embedded as part of the application,

580
00:36:10,125 --> 00:36:12,625
and you have the right lineage, which one

581
00:36:13,805 --> 00:36:17,265
was trained on which dataset, etcetera.

582
00:36:18,045 --> 00:36:21,565
Then, for example, let's continue with the

583
00:36:21,565 --> 00:36:25,160
example of Hugging Face. Let's say a Hugging Face

584
00:36:25,160 --> 00:36:29,000
model is now flagged, like, someone

585
00:36:29,000 --> 00:36:32,599
found that it's malicious. Because you prepared

586
00:36:32,599 --> 00:36:36,359
yourself and you have the right visibility, you are able to go

587
00:36:36,359 --> 00:36:40,105
and very easily search exactly whether you use it and

588
00:36:40,105 --> 00:36:43,865
where you use it in all your organization. And this is also

589
00:36:43,865 --> 00:36:47,464
because you prepared yourself. This is exactly what happened with Log4j.

590
00:36:47,464 --> 00:36:51,145
In Log4j, it was a dependency that

591
00:36:51,145 --> 00:36:54,730
was found to be critically vulnerable. And a lot of

592
00:36:54,730 --> 00:36:58,350
organizations spent most of their time trying to understand

593
00:36:58,890 --> 00:37:02,730
where they even used Log4j. And we see that if you prepare

594
00:37:02,730 --> 00:37:06,575
yourself, if you organize everything, you'll already

595
00:37:06,575 --> 00:37:10,174
be very, very ready for the

596
00:37:10,174 --> 00:37:13,934
attack of the unknown unknown. And, of course, all this is

597
00:37:13,934 --> 00:37:17,474
in addition to learning and educating

598
00:37:17,535 --> 00:37:21,230
yourself. If you start to understand, you'll go

599
00:37:21,230 --> 00:37:24,990
to, I don't know, Databricks, for example. A lot of people use Databricks. You'll

600
00:37:24,990 --> 00:37:27,790
go and start to see what the best practices are for how

601
00:37:27,790 --> 00:37:31,550
to configure your Databricks environments and what the best practices are

602
00:37:31,550 --> 00:37:34,775
there. It's something that you can find very easily on the Internet. You

603
00:37:34,775 --> 00:37:37,194
don't need to do it from scratch.

604
00:37:38,615 --> 00:37:42,055
But I'll say that, you know, even when we are

605
00:37:42,055 --> 00:37:45,655
aware of that, it's still not top of mind for us as

606
00:37:45,655 --> 00:37:49,400
data practitioners to start looking, in our free time, for this

607
00:37:49,400 --> 00:37:53,240
kind of concept. Right. I mean, that's a good point.

608
00:37:53,240 --> 00:37:55,980
Right? The fundamentals are still fundamental. Right?

609
00:37:57,240 --> 00:38:01,025
You know, making sure, you know, you track what

610
00:38:01,025 --> 00:38:04,465
your dependencies are. Right? So that way, if there's a breach in a Hugging Face

611
00:38:04,465 --> 00:38:08,225
model, like you said, you'll know right away whether or not it

612
00:38:08,225 --> 00:38:11,905
impacts you or not. Also, I think you're

613
00:38:11,905 --> 00:38:15,445
right. This isn't top of mind for AI practitioners. Right?

614
00:38:17,320 --> 00:38:21,079
Even when I code an app, my

615
00:38:21,240 --> 00:38:24,140
thought process is very different than when I'm in a notebook.

616
00:38:24,760 --> 00:38:28,060
Mhmm. It's just different wiring.

617
00:38:28,395 --> 00:38:32,075
Yep. And by the way, it's kind of

618
00:38:32,075 --> 00:38:35,055
like a paradox, because most of the time in the notebooks,

619
00:38:35,915 --> 00:38:39,355
we are connected to much more sensitive information than on our

620
00:38:39,355 --> 00:38:43,040
IDE. Right. No. Exactly. So

621
00:38:43,040 --> 00:38:46,720
it's like one of the worst-case

622
00:38:46,720 --> 00:38:50,480
scenarios. Right? And and you're right. Like, people wanna work with real

623
00:38:50,480 --> 00:38:54,080
data, and they just assume that if they're on a system that's

624
00:38:54,080 --> 00:38:57,470
secure and internal, they

625
00:38:58,695 --> 00:39:02,215
don't have to worry about such things,

626
00:39:02,215 --> 00:39:06,055
which I think you're right. Like, with these systems that have access to

627
00:39:06,055 --> 00:39:09,895
sensitive data, these pipelines, I mean, it's one of those

628
00:39:09,895 --> 00:39:13,690
things where we need to start thinking about this. And do

629
00:39:13,690 --> 00:39:17,289
you think that there's a career path for an AI security engineer?

630
00:39:17,289 --> 00:39:20,829
Right? So it's not just a security engineer, like, in a traditional

631
00:39:20,890 --> 00:39:24,589
sense. Right? But someone who specializes

632
00:39:24,809 --> 00:39:28,635
in AI-related issues. You think that's a growth industry? I

633
00:39:28,635 --> 00:39:31,035
have no doubt that we are going to see more. Like, we

634
00:39:31,035 --> 00:39:34,715
already see these kind of practitioners in the field. I have no

635
00:39:34,715 --> 00:39:38,555
doubt that it's going to be more and more frequent. And in

636
00:39:38,555 --> 00:39:41,680
addition to that, I believe that in the future, it's going to

637
00:39:41,680 --> 00:39:45,440
be several different roles. For example, a lot of the

638
00:39:45,440 --> 00:39:48,800
people that we work very closely with

639
00:39:48,800 --> 00:39:52,560
are AI red teamers. Right. It's not even

640
00:39:52,560 --> 00:39:56,365
just a general AI security engineer. Specifically around

641
00:39:56,365 --> 00:39:59,425
red teaming, because all these kinds of adversarial

642
00:39:59,725 --> 00:40:03,105
attacks on models are very different and require

643
00:40:03,485 --> 00:40:07,245
different techniques, different tactics. And the red teamers are the

644
00:40:07,245 --> 00:40:10,550
ones that are learning all these different

645
00:40:10,930 --> 00:40:14,470
types of adversarial attacks and how to check your models

646
00:40:15,170 --> 00:40:18,850
in your organization. And by the way, specifically in this

647
00:40:18,850 --> 00:40:22,275
area, I do feel that it's kind of, like, top priority and

648
00:40:22,275 --> 00:40:25,795
like top of mind also for the data science

649
00:40:25,795 --> 00:40:29,015
team. You do see that with LLMs,

650
00:40:29,795 --> 00:40:33,155
once they are deployed into production, the data

651
00:40:33,155 --> 00:40:36,615
scientists kind of understand that there is a lot of risk there

652
00:40:36,770 --> 00:40:40,450
and they are starting to take responsibility, completely

653
00:40:40,690 --> 00:40:44,390
regardless of the security team, to make sure that we

654
00:40:44,609 --> 00:40:48,130
reduce some of the risk there. Now the risk is not only

655
00:40:48,130 --> 00:40:51,825
security. The first thing is security: to try and make sure

656
00:40:51,825 --> 00:40:55,585
that you are secured from all these different adversarial attacks or that you know how

657
00:40:55,585 --> 00:40:59,345
to detect sensitive data leakage, for example, as part of the response and stuff

658
00:40:59,345 --> 00:41:03,025
like that. In addition to that, it's also a lot of the time

659
00:41:03,025 --> 00:41:06,800
safety risks. You want to make sure that once you deploy an LLM into

660
00:41:06,800 --> 00:41:10,480
production, your model doesn't give any financial advice to your

661
00:41:10,480 --> 00:41:14,100
customers, doesn't give any health advice in case it's not your business.

662
00:41:14,560 --> 00:41:18,395
So you then have, like, these kinds of responsibility, or example, like in the

663
00:41:18,395 --> 00:41:22,235
Chevy example that you gave, that you just, like, you don't just, release

664
00:41:22,235 --> 00:41:25,915
free cars or flights or books or a tail off, like, anything

665
00:41:25,915 --> 00:41:29,455
like that. So I think that because the

666
00:41:29,595 --> 00:41:33,090
the the the amount of potential risks are

667
00:41:33,090 --> 00:41:36,690
so high on the run time. In this area, I

668
00:41:36,690 --> 00:41:40,450
believe that, like, the data scientists already understood that this is, like,

669
00:41:40,450 --> 00:41:44,130
under their responsibility. They see it also as part of,

670
00:41:44,130 --> 00:41:47,695
like, being a professional data scientist. If I

671
00:41:47,695 --> 00:41:51,095
deploy this model, it has, like, a lot of, like, accuracy, but,

672
00:41:51,095 --> 00:41:54,155
like, it creates all these different kinds of risk.

673
00:41:54,855 --> 00:41:58,615
I would define myself as not a super professional data

674
00:41:58,615 --> 00:42:01,990
scientist, unlike on the supply chain, unlike in the

675
00:42:01,990 --> 00:42:05,670
notebooks that if I code a code that is not secure, I wouldn't say that,

676
00:42:05,670 --> 00:42:08,950
like, it's not professional. I would say that, like, it's okay. You're just, like, focusing

677
00:42:08,950 --> 00:42:12,230
on the business. So I do believe that we start, like, to seeing this shift

678
00:42:12,230 --> 00:42:15,925
also, like, in the mindset of the data scientist because of the risk of

679
00:42:15,925 --> 00:42:19,525
the Gen AI, but now it's also, like, like, a move

680
00:42:19,525 --> 00:42:23,204
to to all the the development and the building practices

681
00:42:23,204 --> 00:42:26,964
that we have. Yeah. And I think data

682
00:42:26,964 --> 00:42:30,480
scientists are acutely aware that LLMs

683
00:42:31,500 --> 00:42:35,340
are just taking they mean, we talk we we call it hallucinating when

684
00:42:35,340 --> 00:42:39,020
they get things wrong. But realistically, they're

685
00:42:39,020 --> 00:42:42,780
always hallucinating to a very real degree. Right? It's just they

686
00:42:42,780 --> 00:42:46,285
happen to be correct. And what these things are doing

687
00:42:46,664 --> 00:42:49,964
under the hood is they are looking for patterns of words.

688
00:42:50,825 --> 00:42:54,204
Sometimes those patterns of words are wrong, obviously wrong.

689
00:42:54,505 --> 00:42:57,325
And sometimes they may give out sensitive information

690
00:42:58,480 --> 00:43:01,920
inadvertently. So I think at least, at least there's some common sense out there

691
00:43:01,920 --> 00:43:05,680
when they when they do realize these things are higher risk than I think

692
00:43:05,680 --> 00:43:09,360
we've been led to believe. Yeah. Actually, I love this this finish. They are,

693
00:43:09,360 --> 00:43:13,200
like, hallucinating, like, all this time. Sometimes they really find it

694
00:43:13,200 --> 00:43:16,955
as wrong. Like, they do the same thing as always. Right.

695
00:43:16,955 --> 00:43:20,795
Right. The they don't know they're hallucinating because they're just operating normally.

696
00:43:20,795 --> 00:43:24,235
And so when they go in a different direction and I've noticed

697
00:43:24,235 --> 00:43:27,970
that, you know, kinda like a little bit of, you you know, off by a

698
00:43:27,970 --> 00:43:31,410
little bit, and then then then it generates an off by a little bit, off

699
00:43:31,410 --> 00:43:35,250
by a little bit. I ran an experiment with a hallucination, and

700
00:43:35,250 --> 00:43:37,970
I read it through I ran it through a bunch of models and each one

701
00:43:37,970 --> 00:43:41,270
of them didn't do any fact checking, which I mean, realistically,

702
00:43:42,245 --> 00:43:45,605
I wouldn't expect that. Right? In the future, I think that'll be kind of table

703
00:43:45,605 --> 00:43:49,445
stakes. But, you know, it would just go through. So

704
00:43:49,445 --> 00:43:53,205
I took a hallucination, fed it through NotebookLM, which then

705
00:43:53,205 --> 00:43:56,760
created even more hallucinations. Right? So it took this little

706
00:43:56,760 --> 00:44:00,380
genesis of something that was wrong and then made it even crazier wrong,

707
00:44:01,160 --> 00:44:04,760
which I think is an interesting kinda statement and and and

708
00:44:04,760 --> 00:44:07,900
also is a risk. Right? Like hallucination on top, compounding

709
00:44:08,475 --> 00:44:12,235
other hallucinations. And I don't think we've really seen that yet because we've

710
00:44:12,235 --> 00:44:15,995
only really seen for the most part, I've only seen one kind

711
00:44:15,995 --> 00:44:19,675
of model in production. But if you have these models that will kinda work together

712
00:44:19,675 --> 00:44:23,320
as agents or, you know, whether they're agents

713
00:44:23,320 --> 00:44:27,160
that do things or agents that it's different LLM discrete LLMs that talk

714
00:44:27,160 --> 00:44:30,760
to one another. They can get things wrong and make things worse. I mean, I

715
00:44:30,760 --> 00:44:34,200
haven't I think it's too soon to tell either way, honestly. Yeah. But, like, the

716
00:44:34,280 --> 00:44:37,714
like, theoretically, like, it makes a lot of sense. I think in general, like, we

717
00:44:37,714 --> 00:44:41,255
don't see, like, a lot like, we hear a lot about Gen AI.

718
00:44:41,555 --> 00:44:45,394
I think that, like, the level of adoption and the amount

719
00:44:45,394 --> 00:44:49,070
of business use cases that, like, businesses

720
00:44:49,290 --> 00:44:52,910
found are not that high yet. I think that, like, the

721
00:44:53,210 --> 00:44:56,830
most of the usage today is done by, like, consumers, like,

722
00:44:57,690 --> 00:45:01,530
like, directly, like, from, from the foundation model providers, like OpenAI and stuff

723
00:45:01,530 --> 00:45:05,315
like that for day to day, like, jobs, like, you know,

724
00:45:05,315 --> 00:45:08,375
like, reviewing mails and stuff like that.

725
00:45:09,395 --> 00:45:12,994
The the big businesses are still trying to find these

726
00:45:12,994 --> 00:45:16,750
different, like, use cases. I do believe that the that the

727
00:45:16,750 --> 00:45:20,430
agents are going, like, to open a lot of different use cases

728
00:45:20,430 --> 00:45:24,270
around it. Right. Right. I could I could see that. And

729
00:45:24,270 --> 00:45:28,030
I think I think it's just too soon to make a statement

730
00:45:28,030 --> 00:45:31,665
either way. But I think grounding yourself in the fundamentals

731
00:45:31,724 --> 00:45:34,785
is probably always a good idea. Mhmm.

732
00:45:35,405 --> 00:45:38,765
And probably a good a good

733
00:45:38,765 --> 00:45:42,600
approach. So so tell me about NOMA. What is is it NOMA? I

734
00:45:42,600 --> 00:45:46,280
I don't wanna make sure I pronounce it. NOMA. Okay. NOMA. Security. What does

735
00:45:46,280 --> 00:45:50,040
NOMA do? Is it security firms that focus on this space? You

736
00:45:50,040 --> 00:45:53,500
mentioned red teaming. Is that is that a sir service you offer?

737
00:45:53,960 --> 00:45:57,615
Yeah. So NOMA basically is an like, our name is Noma

738
00:45:57,615 --> 00:46:01,375
Security. The domain is Noma dot security. So it's Oh, okay. Sorry about

739
00:46:01,375 --> 00:46:05,135
that. No. No. We're good. So, so, yeah,

740
00:46:05,135 --> 00:46:08,190
what we do is, like, secure the entire data and AI life cycle.

741
00:46:09,150 --> 00:46:12,589
Basically means that we truly, like, cover it end to end. Like, we enable, like,

742
00:46:12,589 --> 00:46:16,430
the data teams and the machine learning and the AI teams, to continue and

743
00:46:16,430 --> 00:46:20,030
innovate while we are securing them without

744
00:46:20,030 --> 00:46:23,725
slowing down. And this is like the the like, we are built from, like,

745
00:46:23,725 --> 00:46:27,105
data practitioner, like, the company. So this is, like, our main focus,

746
00:46:27,405 --> 00:46:31,085
meaning that we start, like, from the building phase. So if we

747
00:46:31,085 --> 00:46:34,765
said, like, notebooks and Hugging Face models and all these different stuff and the

748
00:46:34,765 --> 00:46:38,400
misconfigurations on all the different stack and all the MLOps

749
00:46:38,400 --> 00:46:42,080
tools and AI platforms and data pipelines and stuff like that. So we are

750
00:46:42,080 --> 00:46:45,520
connected seamlessly on the background, and,

751
00:46:45,840 --> 00:46:49,060
basically assist the the data teams to to work securely,

752
00:46:49,760 --> 00:46:52,395
without changing changing anything in the workflows.

753
00:46:53,494 --> 00:46:56,795
And then also, like, we provide, as you said, the red teaming.

754
00:46:57,255 --> 00:47:00,934
Before you're deploying the model into production, you want to

755
00:47:00,934 --> 00:47:04,450
understand what is the level of, of

756
00:47:04,450 --> 00:47:08,290
robustness and security that the like, that your model has. And

757
00:47:08,290 --> 00:47:11,890
what we do is we had, like, a big research team that,

758
00:47:11,890 --> 00:47:15,490
like, builds, simulated, thousands of different

759
00:47:15,490 --> 00:47:19,185
attacks. And then we dynamically start to run all these attacks against

760
00:47:19,185 --> 00:47:22,485
your models, showing you exactly, like, what kind of, like, tactics

761
00:47:23,025 --> 00:47:26,705
and techniques your model is vulnerable to, and exactly

762
00:47:26,705 --> 00:47:30,145
also how to mitigate and improve it to be more

763
00:47:30,145 --> 00:47:33,940
robust. And then the third part is also the runtime.

764
00:47:34,560 --> 00:47:38,080
We are mapping, we're scanning all the prompts and all the

765
00:47:38,080 --> 00:47:41,840
responses in real time, making sure that you don't

766
00:47:41,840 --> 00:47:45,375
have any risk on both sides. The security, we are detecting all these

767
00:47:45,375 --> 00:47:49,135
different kinds of, like, adversarial attacks: prompt

768
00:47:49,135 --> 00:47:52,735
injection, model jailbreak, etcetera. We check also the responses for

769
00:47:52,735 --> 00:47:56,415
sensitive data leakage and stuff like that. But in addition, also the

770
00:47:56,415 --> 00:48:00,180
safety. We see a lot of organizations that the data scientists, as we

771
00:48:00,180 --> 00:48:03,540
said, they understand the risk of deploying

772
00:48:03,540 --> 00:48:07,140
models into production. And this is why not even, like, the security, but more like

773
00:48:07,140 --> 00:48:10,680
the the Chevy example and, like, the the health advice and stuff like that.

774
00:48:10,895 --> 00:48:14,335
So they built for their own, model

775
00:48:14,335 --> 00:48:18,095
guardrails in order to make sure that they are, like, controlling what

776
00:48:18,095 --> 00:48:21,694
are, like, the topics that the model is be able like, is allowed or

777
00:48:21,694 --> 00:48:25,330
disallowed to communicate about. And what we do is basically to save

778
00:48:25,330 --> 00:48:29,010
them also like this time. We also provide them, like, all this

779
00:48:29,010 --> 00:48:32,770
runtime protection already, like, as a service. You can define exactly what kind

780
00:48:32,770 --> 00:48:36,610
of, like, detectors and in natural language, what kind of, like, policies you want

781
00:48:36,610 --> 00:48:40,244
to make sure that are enforced. And then we also, like, protect it in the

782
00:48:40,244 --> 00:48:43,685
run time. So, basically, we just, like, cover you, like, end to end, start from

783
00:48:43,685 --> 00:48:46,905
the building and up to the run time. It starts from the classic data engineering

784
00:48:47,205 --> 00:48:50,984
pipelines and machine learning and up to Gen AI. Interesting. Interesting.

785
00:48:51,125 --> 00:48:54,540
It sounds like something I think is totally, I think, a

786
00:48:54,540 --> 00:48:58,380
needed needed service and and skill

787
00:48:58,380 --> 00:49:01,980
set. Because you're right. Like, I mean, there's just so many risks

788
00:49:01,980 --> 00:49:05,500
here, and the hype around Gen

789
00:49:05,500 --> 00:49:09,055
AI is so over the top.

790
00:49:10,234 --> 00:49:13,055
It is gonna be revolutionary, but

791
00:49:13,915 --> 00:49:16,954
maybe not in the way you think. Right? And I always call back to the

792
00:49:16,954 --> 00:49:20,700
early days of the dotcom. Right? Where it was pets.com. There was,

793
00:49:20,859 --> 00:49:24,700
you know, this.com, that, you know, like all these crazy things. But the

794
00:49:24,700 --> 00:49:28,300
real quote unquote winner of, you know,

795
00:49:28,300 --> 00:49:31,599
.com was some guy in Seattle selling books.

796
00:49:31,980 --> 00:49:35,660
Mhmm. Right? No one no one I mean, selling books. Like, really?

797
00:49:35,660 --> 00:49:39,395
Like, not, you know, and it's

798
00:49:39,395 --> 00:49:43,155
it's interesting to see how I think I

799
00:49:43,155 --> 00:49:46,675
think that the the obvious use case for chat for for

800
00:49:46,675 --> 00:49:50,355
LLMs thus far has been chatbots. Right? Customer

801
00:49:50,355 --> 00:49:54,010
service type things. I think that's really only the

802
00:49:54,010 --> 00:49:57,690
the the the the the surface of it. I think for me, what

803
00:49:57,690 --> 00:50:01,530
I've seen is most impactful is the ability for natural language

804
00:50:01,530 --> 00:50:05,290
understanding and their ability to understand what's happening in a in

805
00:50:05,290 --> 00:50:08,655
a block of text. And I think

806
00:50:08,655 --> 00:50:12,415
that that has enormous potential. I

807
00:50:12,415 --> 00:50:16,255
agree. A lot of risks too. Right? Because what if, you know, what if

808
00:50:16,255 --> 00:50:19,695
I I mean, to your point. Right? You wanna make sure these things stay on

809
00:50:19,695 --> 00:50:23,099
topic. Right? Like, I don't if I'm talking to a

810
00:50:23,099 --> 00:50:26,880
financial services chatbot and I say, hey, I have

811
00:50:27,420 --> 00:50:29,839
my my leg kinda hurts. Right?

812
00:50:31,180 --> 00:50:34,940
It's, you know, the risk of moving into health care, like, it's just kind

813
00:50:34,940 --> 00:50:38,755
of, I don't know, how mature are those guardrails? Because I've

814
00:50:38,755 --> 00:50:42,515
not really seen a good implementation of

815
00:50:42,515 --> 00:50:46,275
it yet. Yeah. So, you know, like, I

816
00:50:46,275 --> 00:50:49,955
don't want to to give ourself, like, a compliment, but,

817
00:50:51,230 --> 00:50:54,670
we Oh, you guys are pretty good at it? Yeah. Like, we're pretty good. Like,

818
00:50:54,670 --> 00:50:57,970
we were, like, you know, like, with Fortune 500, with Fortune 100.

819
00:50:58,670 --> 00:51:02,349
Not in vain. But, yeah, I believe that in general, specifically, like, when we speak

820
00:51:02,349 --> 00:51:06,125
more, like, on the guardrail side, I see that the most important thing is

821
00:51:06,125 --> 00:51:09,965
to make sure that it's, it's building the

822
00:51:09,965 --> 00:51:13,565
right architecture to be very flexible and easily

823
00:51:13,565 --> 00:51:17,325
configure for the organization because eventually, like, you know, like, each

824
00:51:17,325 --> 00:51:20,900
organization is completely different needs, completely different

825
00:51:20,900 --> 00:51:24,599
context to the calls, like, in their customers, internally to their employees.

826
00:51:25,460 --> 00:51:29,240
So everything should should be, like, very easily configured, but very flexible.

827
00:51:30,180 --> 00:51:33,755
Interesting. Interesting. I wanna I I could talk for

828
00:51:33,755 --> 00:51:36,894
another hour or 2 with you because this is this is a fascinating space.

829
00:51:38,315 --> 00:51:42,154
Where can folks find out more about Noma and you? I you think it's Noma

830
00:51:42,154 --> 00:51:45,970
dot security? Yeah. Noma dot security. Can't believe that's now

831
00:51:45,970 --> 00:51:47,510
a top-level domain, but,

832
00:51:50,770 --> 00:51:54,610
and, any any, NOMA dot

833
00:51:54,610 --> 00:51:58,130
security, you're on LinkedIn, and, anything

834
00:51:58,130 --> 00:52:01,275
else you you'd like the folks to find out more?

835
00:52:02,615 --> 00:52:06,135
No. I had, like, a great time speaking with you, Frank. Great.

836
00:52:06,135 --> 00:52:09,815
Likewise. And for the listeners out there, if you're a little bit

837
00:52:09,815 --> 00:52:13,529
scared and a little bit paranoid about generative AI and LLMs,

838
00:52:13,670 --> 00:52:16,549
then I think we had a good conversation. Because I think we need a little

839
00:52:16,549 --> 00:52:19,910
bit of that fear in the back of our heads to guide us and

840
00:52:19,910 --> 00:52:23,589
maybe think about security issues. A

841
00:52:23,589 --> 00:52:26,309
little bit of thought ahead of time will probably save you a lot of problems

842
00:52:26,309 --> 00:52:29,995
later. And want to lose some. That's

843
00:52:29,995 --> 00:52:33,055
that's all I got, and we'll let the nice British AI,

844
00:52:33,915 --> 00:52:37,755
Bailey finish the show. Well, that wraps up another

845
00:52:37,755 --> 00:52:41,540
eye-opening episode of Data Driven. A big thank you to Niv

846
00:52:41,540 --> 00:52:45,320
Braun for sharing his expertise on the critical intersection of AI,

847
00:52:45,700 --> 00:52:49,540
security, and innovation. If today's conversation didn't make

848
00:52:49,540 --> 00:52:52,900
you double check your data pipelines or rethink your Hugging Face

849
00:52:52,900 --> 00:52:56,535
downloads, well, you're braver than I am. As always,

850
00:52:56,595 --> 00:53:00,435
I'm Bailey, your semi-sentient MC, reminding you that while

851
00:53:00,435 --> 00:53:04,215
AI might be clever, it's never too clever for a security breach.

852
00:53:04,435 --> 00:53:08,195
Until next time, stay curious, stay secure, and

853
00:53:08,195 --> 00:53:10,258
stay data driven. Cheerio.