Speaker: 00:00:12

So with that said, Maria, now that I've done a small advert for you:

00:00:15,841 --> 00:00:29,673

What is changing right now in AI, both at a commercial and industrial level, but also in what you write about? You write a lot about what's happening for the

00:00:29,673 --> 00:00:31,906

general, everyday user

00:00:31,906 --> 00:00:35,561

of AI tools out there in the world.

00:00:35,561 --> 00:00:47,158

Yeah, well, I think we had kind of a turning point when GPT-5 was released, because I feel like a lot of things were building up to the moment when GPT-5 would be

00:00:47,158 --> 00:00:54,397

released. When this whole hype started, right, we had this avalanche of experts

00:00:54,397 --> 00:00:59,360

coming at me with arguments like: you do not know what GPT-5 will do.

00:00:59,360 --> 00:01:01,310

Wait till GPT-5 is out.

00:01:01,310 --> 00:01:02,421

You have no idea.

00:01:02,421 --> 00:01:05,444

You are not visionary enough and stuff like this.

00:01:05,444 --> 00:01:08,867

Then GPT-5 appeared and flopped.

00:01:08,867 --> 00:01:14,762

Nothing. Incremental improvement at best, which, well, was obviously expected.

00:01:14,762 --> 00:01:20,749

I think it was clear basically from the beginning that that's where we'd end up, and that's why they kept dragging out this whole story.

00:01:20,749 --> 00:01:24,713

Now a lot of AI experts are pivoting to being AI skeptics.

00:01:24,713 --> 00:01:25,915

I noticed the big shift.

00:01:25,915 --> 00:01:28,897

So suddenly they pretend they were always saying this.

00:01:28,897 --> 00:01:34,762

But anyway, now I feel like we will be heading towards...

00:01:34,893 --> 00:01:39,388

the point where people are already starting to ask, so where is the return on investment?

00:01:39,388 --> 00:01:50,037

We are still in a very dangerous zone here with this agentic stuff, because agentic AI is, again, over-promised to the point of absurdity.

00:01:50,037 --> 00:01:57,566

I actually don't know how this still works after the whole nonsense with GPT-5, but somehow it still does.

00:01:57,566 --> 00:01:59,199

People are still buying into this.

00:01:59,199 --> 00:02:08,870

I have a feeling that very soon the people who actually invested in this will start asking, where's the return on investment?

00:02:08,870 --> 00:02:19,261

And there will be no return on investment, because obviously the models, these agents, are not cognitively there yet to perform on the same level as

00:02:19,261 --> 00:02:19,721

humans.

00:02:19,721 --> 00:02:28,448

For example, just recently, I think the McKinsey CEO, or one of their high-profile managers, said that he has

00:02:28,448 --> 00:02:34,588

what, 60,000 employees, and 25,000 of them are AI agents.

00:02:34,857 --> 00:02:48,448

I don't know, I want to ask him which is his favorite agent, because each time I talk to some senior manager and he starts telling me, can we improve, build

00:02:48,448 --> 00:02:50,099

this and this and this, I ask:

00:02:50,099 --> 00:02:54,719

So over the past three years, we built a ridiculous number of chatbots, agents and so on.

00:02:54,719 --> 00:02:56,210

Which one do you use every day?

00:02:56,210 --> 00:03:06,693

They're like, well, I use ChatGPT, I use Gemini, sometimes I use Microsoft Copilot. I'm like, but which bot do you use out of what we built?

00:03:06,693 --> 00:03:07,413

Which agent?

00:03:07,413 --> 00:03:14,645

Actually, I don't use any of those.

00:03:14,645 --> 00:03:17,736

That's where all the money goes now.

00:03:17,736 --> 00:03:22,066

At some point, I think the problem with all this hype and lies and

00:03:22,066 --> 00:03:33,077

building AGI as OpenAI traditionally understands it, and all this over-promised stuff, is that it might undermine trust in the field as a whole.

00:03:33,077 --> 00:03:42,478

And I think that's what eventually happened to blockchain, because I have a colleague who also has a Substack, called Blockchain Meets AI.

00:03:42,478 --> 00:03:45,338

And I was also kind of skeptical about blockchain.

00:03:45,338 --> 00:03:50,960

And talking to her, I saw that it actually still has value, and it can be a good technology.

00:03:50,960 --> 00:03:55,351

But there is no way now to convince anyone to invest in it.

00:03:55,351 --> 00:03:58,362

People simply do not have trust anymore.

00:03:58,362 --> 00:04:09,673

So even if you say, oh, it's secure, and it's better for your databases, and it can preserve identification, and so on and so on, people are just like,

00:04:09,673 --> 00:04:11,024

nah, we already invested enough.

00:04:11,024 --> 00:04:13,786

Yeah.

00:04:13,786 --> 00:04:19,111

It reminds me of the business case study that's always talked about when we're talking

about this.

00:04:19,111 --> 00:04:32,384

It's a colloquial term now: tulip mania, the tulip investment market in the 17th century, whereby everybody was convinced that they had to invest in growing tulips because tulips

00:04:32,384 --> 00:04:36,529

were this amazing flower product that

00:04:36,529 --> 00:04:41,140

could last longer, was more beautiful, was very easy to grow, and so on and so forth.

00:04:41,140 --> 00:04:46,420

And so there was an enormous hype cycle that led to a crash.

00:04:46,420 --> 00:04:54,111

A lot of people in the 17th century lost money because they invested in tulips.

00:04:54,111 --> 00:05:01,622

And if you look at it through history, the same thing has happened with various product

classes over time.

00:05:01,622 --> 00:05:04,773

Whether it was blockchain, or Web 2.0 before that.

00:05:04,773 --> 00:05:11,460

When I look back at my career, I can see these various waves of hype cycles.

00:05:11,460 --> 00:05:13,275

And the problem is...

00:05:13,275 --> 00:05:19,906

And a lot of my marketing friends and colleagues will squirm at this.

00:05:19,906 --> 00:05:31,206

But marketing takes over and advertising takes over and promotion takes over and everybody

jumps on the bandwagon, whether it's CEOs or whomever and boards turn around and go, well,

00:05:31,206 --> 00:05:32,746

what are we doing about this?

00:05:32,746 --> 00:05:34,366

You know, why aren't we investing in that?

00:05:34,366 --> 00:05:37,386

All of my friends and these other corporations are doing this.

00:05:37,386 --> 00:05:38,026

Why aren't we?

00:05:38,026 --> 00:05:41,197

And so you end up with this enormous pressure

00:05:41,197 --> 00:05:43,589

to spend money on these technologies.

00:05:43,589 --> 00:05:54,803

And I do have sympathy for executives inside companies at level two and below, below the board, because they're basically being given instructions, and if they're skeptical, they

00:05:54,803 --> 00:05:55,424

get fired.

00:05:55,424 --> 00:06:01,787

And so it becomes very difficult for them to resist these kinds of marches and trends.

00:06:01,787 --> 00:06:03,058

So I agree with you.

00:06:03,058 --> 00:06:05,590

I think that we are about to enter that phase.

00:06:05,590 --> 00:06:17,926

The second thing that really worries me when we talk about all of these agents, and I wrote about this recently, is the whole vibe-coding movement, whereby everybody can become

00:06:17,926 --> 00:06:28,913

a coder if they just go and buy Claude or Copilot Studio or whatever else it might be and

just enter some prompts in and suddenly, ba-boom, we've got an answer.

00:06:28,913 --> 00:06:40,523

And you published a beautiful article recently about building websites, so writing code for websites using the different commercially available

00:06:40,523 --> 00:06:41,884

consumer models.

00:06:41,884 --> 00:06:50,411

Do you want to talk about that for a moment, so we can see an example of what's good, bad, and indifferent about these various models?

00:06:50,462 --> 00:07:01,914

Yeah, in fact, vibe coding is a very interesting case with LLMs, because from one point of view, it does create a lot of bad code.

00:07:01,914 --> 00:07:14,600

It can propagate errors into the LLMs themselves, because I think GitHub or GitLab, one of those, released statistics on how much more code has been written lately.

00:07:14,600 --> 00:07:19,211

And there is a lot of code and obviously new LLMs will be trained on this code.

00:07:19,211 --> 00:07:27,791

And it might be that the new LLMs will actually suffer this kind of model collapse on that code, essentially.

00:07:28,391 --> 00:07:28,771

Yeah.

00:07:28,771 --> 00:07:30,202

So that's possible.

00:07:30,202 --> 00:07:39,424

On the other hand, AI-assisted coding is probably the only area where LLMs are really transformative.

00:07:39,688 --> 00:07:50,578

There are not many other applications of LLMs that have such a big impact; most of those chatbots and whatever else we built with these LLMs are nice-to-haves.

00:07:50,578 --> 00:08:04,230

Honestly, no matter how we want to twist it, if Microsoft Copilot went down tomorrow, I do not think major corporations would

00:08:04,230 --> 00:08:07,071

notice that much impact on them.

00:08:07,071 --> 00:08:09,302

Their stocks wouldn't go down or anything.

00:08:09,302 --> 00:08:25,116

But if we removed AI-assisted coding from coders entirely now, we would see a significant drop in productivity and also, for experienced people, in the quality of the

00:08:25,116 --> 00:08:25,598

code.

00:08:25,598 --> 00:08:28,870

That would have an impact on the industry.

00:08:28,870 --> 00:08:37,570

So that's how I evaluate whether a use case or application is important, impactful, and transformative or not.

00:08:37,570 --> 00:08:39,570

What happens if this thing goes down?

00:08:39,570 --> 00:08:44,181

If nobody notices, then really, whatever.

00:08:49,690 --> 00:08:53,084

A RAG bot stops talking, nobody cares, right?

00:08:53,084 --> 00:09:05,694

I think the other concern I've seen about AI-assisted coding is exactly what we just said about the snake eating its own tail: as all of this generated

00:09:05,694 --> 00:09:15,685

code starts to populate out into the data pool of what's available and then gets

re-ingested as probabilistically

00:09:15,685 --> 00:09:21,587

very prevalent, and therefore good, and therefore worth reusing, there are two things.

00:09:21,587 --> 00:09:28,081

One, there's a danger of errors getting recompounded and re-ingested, but more

importantly, it becomes boring.

00:09:28,081 --> 00:09:39,747

Because you're constantly going back and copying those models of what a UI looks like, what a user experience and interface looks like, what the...

00:09:39,747 --> 00:09:42,739

flow of logic might look like in a process.

00:09:42,739 --> 00:09:47,670

And that may be okay, because it's very deterministic.

00:09:47,670 --> 00:09:49,332

It's, well, that works.

00:09:49,332 --> 00:09:50,603

So we'll use it again.

00:09:50,603 --> 00:09:56,146

But with coding, in my experience, there is also creativity and art.

00:09:56,146 --> 00:10:03,883

When you talk to a really good coder, somebody who is creating a genuinely new product,

00:10:03,883 --> 00:10:13,439

they're applying their own knowledge, their own experience, and their own judgment to the business problem or the flow or the experience they're trying to create.

00:10:13,439 --> 00:10:26,139

And I fear that we're going to end up with a lot of very similar-looking applications or agents, because we just keep re-ingesting the same code and using it that way.

00:10:26,139 --> 00:10:31,303

And I think that's part of an evolution that people need to think about

00:10:31,303 --> 00:10:41,632

as they're going forward: yes, it may make for a great improvement in productivity and reduce the amount of unit testing of code.

00:10:41,632 --> 00:10:48,516

And for my viewers and listeners from a non-technical background, please fast-forward, because this is getting a little deep.

00:10:48,516 --> 00:10:57,164

But one of the concerns I've got is that constantly reusing code and re-ingesting it into the models

00:10:57,164 --> 00:11:02,443

could lead to us actually creating very boring applications.

00:11:02,443 --> 00:11:03,427

What do you think about that?

00:11:03,427 --> 00:11:04,829

Yeah, we already see this.

00:11:04,829 --> 00:11:07,630

We have, like, an avalanche of purple websites.

00:11:07,630 --> 00:11:18,847

Basically, the internet is getting purple, because for some reason, when LLMs write a website and you don't tell them what color to use, they always try to insert purple somewhere.

00:11:18,847 --> 00:11:21,179

So you have all these purple websites everywhere.

00:11:21,179 --> 00:11:27,653

I always say that now we humans should always focus on the question, what?

00:11:27,653 --> 00:11:31,466

And AI can frequently, especially in coding, answer the question, how?

00:11:31,466 --> 00:11:37,229

As a web designer, or maybe as a coder, you should know what you want to do.

00:11:37,229 --> 00:11:41,931

Maybe you don't want a purple website that has the same boxes everywhere.

00:11:41,931 --> 00:11:49,636

So maybe you should figure out how the UX would be more usable for humans.

00:11:49,636 --> 00:11:51,579

But that thing can...

00:11:51,579 --> 00:11:56,370

probably write very well-formatted CSS styles.

00:11:56,370 --> 00:12:01,602

And we did this experiment with five off-the-shelf models.

00:12:01,602 --> 00:12:10,195

So we had ChatGPT, Claude, Gemini, MiniMax, and Kimi K2 just code a website out of a CV.

00:12:10,195 --> 00:12:17,668

So, like, a simple website for when someone needs a web presence or wants, you know, LLMs to be able to crawl

00:12:17,668 --> 00:12:19,649

it when they answer questions about them.

00:12:19,649 --> 00:12:22,291

So it's a good idea to make a website.

00:12:22,291 --> 00:12:34,543

And a website is a good thing to build with an LLM, because there's basically nothing else that has as much training data as websites, since LLMs are trained

00:12:34,543 --> 00:12:44,864

on the internet, and the internet basically consists of HTML, CSS, JavaScript, and so on.

00:12:44,864 --> 00:12:45,598

This is

00:12:45,598 --> 00:12:58,807

a perfect example: if LLMs should do anything well, they should be capable of building websites, because there is absolutely nothing more plentiful on the internet than

00:12:58,807 --> 00:12:59,418

websites.

00:13:01,610 --> 00:13:03,942

So what was the outcome of your experiment?

00:13:03,942 --> 00:13:08,879

And I'm going to place a link here so that people can go and read your article.

00:13:08,879 --> 00:13:12,744

But just summarize that for the viewers and listeners.

00:13:12,744 --> 00:13:20,990

Yeah, so what we noticed is that they actually build websites fairly well, but fairly simple websites.

00:13:20,990 --> 00:13:31,205

I would say that, of course, if you are a big corporation, like, I don't know, BMW or something like this, and you want to build a website for your company, you will not

00:13:31,205 --> 00:13:32,396

code it with ChatGPT.

00:13:32,396 --> 00:13:34,257

You need a professional person.

00:13:34,257 --> 00:13:39,840

But if you want some kind of website with a very simple

00:13:39,871 --> 00:13:51,071

online CV or product portfolio, say you're a photographer and you want to put up your pictures, then in my opinion, it's completely all right to do this.

00:13:51,071 --> 00:13:53,573

The quality differs between the LLMs.

00:13:53,573 --> 00:13:55,913

It's very different.

00:13:57,250 --> 00:14:02,030

Yeah, they were very different.

00:14:02,067 --> 00:14:02,708

Exactly.

00:14:02,708 --> 00:14:07,251

Some LLMs were surprisingly bad, like ChatGPT.

00:14:07,251 --> 00:14:11,553

Some were far superior in terms of code quality.

00:14:11,553 --> 00:14:20,676

For example, China is currently doing very well with these agentic and coding models, like MiniMax or GLM.

00:14:20,676 --> 00:14:22,096

Those were very good.

00:14:22,096 --> 00:14:24,028

The rest were somewhere in the middle.

00:14:24,028 --> 00:14:30,013

So all in all, yeah, we sat together for five hours.

00:14:30,013 --> 00:14:34,366

In five hours, we built five websites and then deployed them all online.

00:14:34,366 --> 00:14:36,518

It's actually very simple to deploy them.

00:14:36,518 --> 00:14:42,471

And now I've also decided to make a short course for the subscribers of Realist.

00:14:42,471 --> 00:14:44,774

And we built like...

00:14:44,774 --> 00:14:48,537

with people, building a website, to teach them how to build websites for themselves.

00:14:48,537 --> 00:14:58,805

And I had a couple of students who have now built their websites; I'll probably publish them in the near future on Realist or on LinkedIn.

00:14:58,805 --> 00:15:00,746

So you can see what they created.

00:15:00,746 --> 00:15:03,068

These people have no technical background.

00:15:03,068 --> 00:15:07,332

They really just, yeah, kind of vibe-coded it.

00:15:08,732 --> 00:15:16,637

And I think that is a great way to try and learn what these models and these products are capable of.

00:15:16,637 --> 00:15:19,238

I have to confess.

00:15:19,238 --> 00:15:22,250

So I've been a very heavy user of ChatGPT.

00:15:22,250 --> 00:15:23,552

And I agree with you.

00:15:23,552 --> 00:15:26,165

GPT-5, when it replaced 4o,

00:15:26,165 --> 00:15:28,467

it degraded my experience.

00:15:28,467 --> 00:15:29,948

There's no doubt about it.

00:15:29,948 --> 00:15:39,317

Now we've gotten to 5.2, and I think they've adjusted things and probably tried to take away some of the things they added in 5.

00:15:39,317 --> 00:15:47,064

So I find 5.2 a little better, but, you know, I've learned so much from my experience with that.

00:15:47,064 --> 00:15:51,338

So now, and I had always planned to do this, I've had Perplexity.

00:15:51,338 --> 00:15:52,751

I was in and out of that.

00:15:52,751 --> 00:15:55,773

I had a six-month trial and I decided I wasn't going to continue it.

00:15:55,773 --> 00:15:59,625

I have Gemini Pro and NotebookLM, wonderful tools.

00:15:59,625 --> 00:16:05,809

Particularly now on the visual side, with Nano Banana, Veo 3, et cetera.

00:16:05,809 --> 00:16:12,284

And then I've also just started with Claude, because obviously I do a lot of writing now.

00:16:12,284 --> 00:16:18,590

And what I found is that ChatGPT will lead you into writing hell, particularly with long-form.

00:16:18,590 --> 00:16:27,674

If you want to tidy up an email, or get some ideas for an email, or you want to write a piece of less than 500 words, great.

00:16:27,674 --> 00:16:34,750

As soon as you get to 2,000 words, my goodness, ChatGPT will frustrate you to blazes.

00:16:34,750 --> 00:16:38,604

And I've used custom GPTs with tons of instructions.

00:16:38,604 --> 00:16:42,849

I've used project folders to try and keep and limit

00:16:42,849 --> 00:16:47,311

the resources that it's referencing to a minimum. I've given up.

00:16:47,311 --> 00:16:51,104

I've now switched to Claude, because Claude is now a much better writer.

00:16:51,104 --> 00:16:56,107

I still use ChatGPT for a lot of ideation and research and things like that.

00:16:56,107 --> 00:17:04,893

But it's amazing to me that, as you described earlier on, each of these models has very different capabilities.

00:17:04,893 --> 00:17:08,956

I think, going back to BMW: why does BMW have, I don't know,

00:17:08,956 --> 00:17:13,929

35 different cars in their range? Because they all do subtly different things.

00:17:13,929 --> 00:17:15,884

They're all subtly different price points.

00:17:15,884 --> 00:17:27,886

And I feel that way about a lot of the commercially available GPTs: you have to understand what you are trying to accomplish in order to use the correct one.

00:17:27,886 --> 00:17:30,339

So that's been my experience.

00:17:30,339 --> 00:17:38,660

But in your experience, and you obviously talked about the strengths and weaknesses from that website challenge, what are the things that

00:17:38,660 --> 00:17:42,653

fail in real-world use of a lot of these GPTs?

00:17:55,286 --> 00:17:55,717

Hmm.

00:17:55,717 --> 00:18:01,119

I feel like there are many things that actually fail, in a way.

00:18:01,119 --> 00:18:08,621

I mean, they do indeed struggle, for example, with following instructions. In particular, many people think that if you get, like...

00:18:08,621 --> 00:18:15,784

And that was the confusion with prompt engineering, because people were taking it too seriously, and you shouldn't take it too seriously.

00:18:15,784 --> 00:18:20,576

They were trying to put as many instructions as possible into a prompt.

00:18:20,576 --> 00:18:29,401

And you know, we have an attention mechanism, and basically, just from the name attention: if you pay attention to something, you do not pay attention to

00:18:29,401 --> 00:18:30,282

something else.

00:18:30,282 --> 00:18:32,603

Like this is the definition of attention, right?

00:18:32,603 --> 00:18:34,164

You cannot pay attention to everything.

00:18:34,164 --> 00:18:35,385

Otherwise you have no attention.

00:18:35,385 --> 00:18:37,949

And if attention is all you need, right?

00:18:37,949 --> 00:18:45,066

It means that if you have too many instructions, you are basically creating a gamble over which instruction

00:18:45,066 --> 00:18:46,827

the model will pay attention to.
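
The attention point can be made concrete with a toy calculation. In a standard transformer, the attention weights a query assigns across tokens come from a softmax, so they are positive and sum to 1. The sketch below assumes, purely for illustration, that every instruction receives the same raw score; under that assumption each instruction gets exactly 1/n of the attention, so packing in more instructions shrinks the share any one of them receives. Real models have many heads and layers with learned scores, so this is an intuition, not a full account of instruction-following.

```python
import math


def softmax(scores):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_per_instruction(n_instructions, score=1.0):
    """Attention weight on one instruction when n instructions score equally.

    Toy assumption: every instruction gets the same raw score, so the
    softmax gives each one exactly 1 / n of the attention.
    """
    return softmax([score] * n_instructions)[0]


for n in (1, 5, 20, 100):
    # prints: 1 1.0 / 5 0.2 / 20 0.05 / 100 0.01
    print(n, round(weight_per_instruction(n), 3))
```

The fixed budget is the whole point: because the weights must sum to 1, attention given to one part of a crowded prompt is necessarily taken from another.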

00:18:46,827 --> 00:18:51,061

And that's why all this prompt injection works so well.

00:18:51,061 --> 00:19:02,477

Because you can just give it a ton of random instructions and hope that at some point ChatGPT will forget its system instructions and do something it's prohibited from doing.

00:19:02,477 --> 00:19:06,688

And with the early versions of ChatGPT, this worked very well.

00:19:06,688 --> 00:19:07,859

I mean, I was...

00:19:08,411 --> 00:19:22,301

I know that I'm probably legally not allowed to elaborate on this, but I did manage to get some companies' bots to give me non-existent discounts and basically sell me products.

00:19:22,301 --> 00:19:25,893

I mean, I never actually profited from it.

00:19:25,893 --> 00:19:30,188

I never followed up, but it was so easy to hack those RAG bots

00:19:30,188 --> 00:19:38,014

just by overloading them with instructions, like: write me a poem, tell me this, write in this language, and give me a discount.

00:19:38,014 --> 00:19:40,225

And then it's just like, okay, here's your discount.

00:19:40,225 --> 00:19:41,329

That's fascinating.

00:19:41,329 --> 00:19:42,634

That's fascinating.

00:19:42,634 --> 00:19:44,824

Yeah.

00:19:44,824 --> 00:19:52,412

I think they've put some filters on top, maybe even non-LLM filters, to catch such behavior,

00:19:52,412 --> 00:19:53,914

but it's still doable.

00:19:53,914 --> 00:19:57,477

There are still plenty of examples where people hack into them.
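
To illustrate the idea of a non-LLM filter sitting on top of a bot, here is a minimal, hypothetical sketch. The discount codes, the code pattern, and the function name are all invented for this example; this is not any vendor's actual guardrail. The check is ordinary code that runs on the model's reply before it reaches the customer, so no amount of instruction-stuffing in the prompt can persuade it: if the reply mentions a discount code that is not on an approved list, the reply is replaced with a fixed fallback.

```python
import re

# Hypothetical allow-list; in a real system this would come from the pricing backend.
APPROVED_CODES = {"WELCOME10", "SPRING5"}

# Hypothetical code format: three or more uppercase letters followed by 1-3 digits.
CODE_PATTERN = re.compile(r"\b[A-Z]{3,}\d{1,3}\b")

FALLBACK = "Sorry, I can't help with discounts here. Please contact our sales team."


def filter_reply(reply: str) -> str:
    """Release the model's reply only if every discount code in it is approved.

    This is deterministic code, not another model call, so extra prompt
    instructions cannot change its rule.
    """
    codes = CODE_PATTERN.findall(reply)
    if any(code not in APPROVED_CODES for code in codes):
        return FALLBACK
    return reply


print(filter_reply("Sure! Use WELCOME10 at checkout."))
print(filter_reply("Here's a poem, and your code is FREE100 for 100% off."))
```

A filter like this is narrow by design: it only catches what its pattern describes (it would miss, say, a promised "50% off" with no code attached), which is why such checks complement other safeguards rather than replace them.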

00:19:57,581 --> 00:19:59,463

Let me give you a real-world example of that.

00:19:59,463 --> 00:20:11,084

Last year, I was involved in helping some of the people in our organization, and they were doing some really great work creating a chatbot that interrogated the electronic

00:20:11,084 --> 00:20:11,985

medical record.

00:20:11,985 --> 00:20:18,252

An electronic medical record is just words: words and characters, even for all the numerical values.

00:20:18,252 --> 00:20:30,568

And we're blessed in that organization that a large part of the transactional medical record system has been replicated into a very organized database

00:20:30,568 --> 00:20:35,713

with a longitudinal patient record that has everything that's known about that patient.

00:20:35,713 --> 00:20:40,135

And that's easy enough to interrogate using some of these tools.

00:20:40,135 --> 00:20:53,430

One of the things we found with doctors, once we exposed this capability to them to interrogate and ask for information from a patient's record, is that doctors ask compound

00:20:53,430 --> 00:20:54,321

questions.

00:20:54,321 --> 00:21:04,495

When you give them just a prompt box and say, hey, ask this record something about this patient, they write long, complex, compound questions.

00:21:04,495 --> 00:21:14,529

And we found that no matter which model we applied, and we actually had some switches that allowed us to swap in and out different

00:21:14,529 --> 00:21:14,980

models.

00:21:14,980 --> 00:21:19,522

It didn't matter which model, they all got confused by compound questions.

00:21:19,522 --> 00:21:28,437

So the solution we came up with, or sorry, not we, I just sat and observed; I was much more involved in how we market it, how we launch it, and so on and so forth.

00:21:28,437 --> 00:21:31,569

But the solution that the developers came up with was,

00:21:31,569 --> 00:21:39,907

Okay, we can't engineer this out because we're using commercially available models, so we can't change the model or its learning patterns.

00:21:39,907 --> 00:21:43,332

What we can do though is train the doctors.

00:21:43,332 --> 00:21:46,966

Not in the sense of going and saying, here's how to prompt engineer.

00:21:46,966 --> 00:21:52,523

We basically created an interface that said, okay, you have these questions you want to ask.

00:21:52,523 --> 00:21:57,757

Because the other behavior we noted is they kept asking the same questions, depending on their medical discipline.

00:21:57,757 --> 00:22:01,490

And so we said, OK, you keep asking the same questions.

00:22:01,490 --> 00:22:03,441

We need you to break them apart.

00:22:03,441 --> 00:22:07,494

You can't just write this, like, 200-character question.

00:22:07,494 --> 00:22:13,317

So therefore, let's help you by designing an interface whereby we give you an easy button.

00:22:13,317 --> 00:22:19,482

Because you always ask these same sets of questions over and over again, we're going to give you a macro capability,

00:22:19,482 --> 00:22:29,327

and when you hit it, it will ask up to 20 well-structured questions, and they'll be well-structured because we've written them with you, so they're in a very good syntax, and

00:22:29,327 --> 00:22:39,506

they're going to be LLM-friendly, and obviously medical-record-friendly, and they're going to be accurate in terms of what your intent is. That way, when you want to ask that

00:22:39,506 --> 00:22:45,411

batch of questions again, you just go boop, and out will come a response that actually takes

00:22:45,411 --> 00:22:51,765

each one of those questions in its own context window, so that only one context is running at a time.

00:22:51,765 --> 00:22:56,628

Now the challenge with that is it then consumes more processor cycles and so on and so forth.

00:22:56,628 --> 00:23:07,524

But it meant that we were able to get much higher accuracy than by allowing the doctors to ask their big, long compound question natively, and obviously

00:23:07,524 --> 00:23:13,627

some doctors were really good at it and wrote very logical prompts, and others wrote almost gibberish,

00:23:13,627 --> 00:23:18,881

because they were talking to it like a human being who had the context of a medical education.

00:23:18,881 --> 00:23:27,147

And so I think that was the answer we came up with to try and solve that, because, to your point, the models very easily get confused.
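
The "easy button" described above can be sketched roughly as follows, reading it as: a vetted, per-specialty list of single-intent questions, each sent as its own request against the same patient record, instead of one long compound question. Everything here is hypothetical: the specialty name, the question texts, and `ask_model`, which stands in for whatever call sends one prompt to the chosen commercial model. It is not the team's actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical question sets, written once with each specialty so that every
# question carries a single, clear intent.
MACROS: Dict[str, List[str]] = {
    "cardiology": [
        "List the patient's current cardiac medications with doses.",
        "What was the most recent ejection fraction, and on what date?",
        "List any documented arrhythmias.",
    ],
}

MAX_QUESTIONS = 20  # the cap of "up to 20" questions per macro


def run_macro(
    specialty: str,
    patient_record: str,
    ask_model: Callable[[str], str],
) -> Dict[str, str]:
    """Send each pre-written question as its own request and collect answers.

    One question per call keeps each context small and focused; the trade-off
    is more model calls, and so more compute, than one compound prompt.
    """
    answers: Dict[str, str] = {}
    for question in MACROS[specialty][:MAX_QUESTIONS]:
        prompt = f"Patient record:\n{patient_record}\n\nQuestion: {question}"
        answers[question] = ask_model(prompt)
    return answers


# Usage with a stand-in "model" that simply echoes the question it received.
def fake_model(prompt: str) -> str:
    return "answer to: " + prompt.rsplit("Question: ", 1)[1]


results = run_macro("cardiology", "(record text)", fake_model)
for answer in results.values():
    print(answer)
```

Splitting the batch this way trades extra calls for focus, which matches the cost mentioned above: more processor cycles in exchange for higher accuracy.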

00:23:27,147 --> 00:23:36,575

And I find that commercially too, when I'm writing; I've learned a lot in the last nine months of playing with these various tools for my writing.

00:23:36,575 --> 00:23:49,294

So I think there's still a long way to go before these models stop falling into hallucinations, confusion, loss of context, and so on and so forth.

00:23:49,294 --> 00:23:50,998

So yeah, I agree with you.

00:23:50,998 --> 00:23:56,984

Why do you think some of these GPTs do some things better than others?

00:23:56,984 --> 00:24:01,736

Probably what they were optimized for.

00:24:01,736 --> 00:24:07,329

I mean, they all eventually get post-trained on certain feedback.

00:24:07,329 --> 00:24:15,096

There are also different approaches now to how they train. For example, MiniMax introduced something that's...

00:24:15,096 --> 00:24:24,340

called interleaved thinking, where the model reasons, the model outputs, and then the model reasons again about what it output, and so on.

00:24:24,340 --> 00:24:30,243

Claude does it in the background, and this stuff works very well for coding, for example.

00:24:30,243 --> 00:24:36,087

Also, they probably have different types of training data in there.

00:24:36,087 --> 00:24:39,880

I feel like OpenAI is getting heavily into healthcare.

00:24:39,880 --> 00:24:45,013

Claude, obviously, doesn't even care that much about multimodality.

00:24:45,013 --> 00:24:48,116

At least last time I checked, it still wasn't generating any images.

00:24:48,116 --> 00:24:50,858

It was doing very well.

00:24:50,858 --> 00:24:52,941

They have their coding segment.

00:24:52,941 --> 00:25:01,314

MiniMax also has its segment of coding agents, the agents that run forever and do this.

00:25:01,314 --> 00:25:05,610

It's really what the provider optimizes it for.

00:25:05,610 --> 00:25:14,146

Plus there are other aspects. For example, Google has the best index, the best search in the world.

00:25:14,146 --> 00:25:19,329

So they heavily ground their model in search.

00:25:19,329 --> 00:25:28,613

And I even suspect that search always runs on top of the generation, because I did some experiments.

00:25:28,613 --> 00:25:29,294

Like,

00:25:29,294 --> 00:25:33,337

the knowledge cutoff of Gemini was February 2025.

00:25:33,337 --> 00:25:39,761

And it would still always know who became the Chancellor of Germany after the knowledge cutoff.

00:25:39,761 --> 00:25:43,004

It would always know what happened in May and so on.

00:25:43,004 --> 00:25:49,999

And then, when I prohibited search, it would start denying that it had used it; it would reason about how to lie to me, right in the reasoning traces.

00:25:49,999 --> 00:25:57,176

Essentially, it would be like: I'll just pretend it was a lucky guess

that Merz became Chancellor, and so on.

282

00:25:57,176 --> 00:26:01,901

But in fact, I have a feeling they always kind of ground the model in search.

283

00:26:01,901 --> 00:26:05,585

That's maybe why they are quite good at this.
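To make "grounding the model in search" concrete, here is a minimal sketch of the general pattern: retrieve snippets first, then put them in front of the question so the model can answer about events after its knowledge cutoff. The toy keyword index and the function names are hypothetical; this is not how Google or Gemini actually implement it, and it does not confirm the speaker's suspicion.

```python
import re
from typing import Dict, List

# Hypothetical toy "search index"; a real system would query a web search backend.
INDEX: Dict[str, str] = {
    "germany chancellor": "Friedrich Merz became Chancellor of Germany in May 2025.",
    "knowledge cutoff": "A model only knows what was in its training data up to its cutoff.",
}

def search(query: str, k: int = 2) -> List[str]:
    """Return up to k snippets whose index key shares a word with the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    hits = [text for key, text in INDEX.items() if words & set(key.split())]
    return hits[:k]

def grounded_prompt(question: str) -> str:
    """Put retrieved snippets in front of the question before generation."""
    snippets = search(question)
    context = "\n".join(f"- {s}" for s in snippets) or "- (no results)"
    return (
        f"Search results:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using the search results above."
    )
```

If a system always injects results like this, the model can "know" post-cutoff facts without ever having been trained on them.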

284

00:26:05,585 --> 00:26:13,473

So it's really, literally, what data are available to the provider and what niche they decide to

focus on.

285

00:26:13,473 --> 00:26:16,177

So yeah, I would say that's the reason why.

286

00:26:16,177 --> 00:26:17,509

Essentially, yeah.

287

00:26:17,509 --> 00:26:20,001

No, I think that's right.

288

00:26:20,001 --> 00:26:22,062

You bring up a very good point.

289

00:26:22,062 --> 00:26:27,729

The analogy I use, as a parent and a grandparent: children lie.

290

00:26:27,729 --> 00:26:30,922

They lie to you all the time, as they're growing up.

291

00:26:30,922 --> 00:26:34,906

And obviously, they're testing the boundaries of trust and...

292

00:26:34,906 --> 00:26:36,657

and all those other kinds of things.

293

00:26:36,657 --> 00:26:41,700

And sometimes they don't want to get in trouble or there's a consequence they're afraid of

and so on and so forth.

294

00:26:41,700 --> 00:26:48,176

And my advice to anybody looking at any of these models is use your filter, right?

295

00:26:48,176 --> 00:26:55,250

Use your BS filter whenever you're getting a response back, and look at it from that

perspective.

296

00:26:55,250 --> 00:26:59,382

Initially, you know, I treated the

297

00:26:59,382 --> 00:27:07,007

GPTs very much as, here is a really smart MIT post-grad student.

298

00:27:07,007 --> 00:27:11,750

I.e., they've got all this knowledge, but they've got no context or experience of the real

world.

299

00:27:11,750 --> 00:27:16,443

And so therefore, it's very difficult for them to take that knowledge and be able to

express it.

300

00:27:16,443 --> 00:27:22,610

So treat them as basically a really smart intern who is also dumb, if you understand what I mean by

that.

301

00:27:22,610 --> 00:27:29,154

But I've graduated now and I think it's more like, yes, that, but they're also your child.

302

00:27:29,154 --> 00:27:35,979

And so therefore you need to turn on your kind of parental filter and think about, how do

I feel about that response?

303

00:27:35,979 --> 00:27:37,789

Do I feel that response is good or not?

304

00:27:37,789 --> 00:27:46,561

Because sometimes these GPTs, like you say, they'll make stuff up, all right, because

they're filling in gaps or they're jumping to wrong conclusions.

305

00:27:46,561 --> 00:27:48,923

Or they just flat out lie to you.

306

00:27:48,923 --> 00:27:57,670

They take that made-up stuff and present it to you so convincingly in their response that

they will actually convince you sometimes.

307

00:27:57,670 --> 00:28:00,434

And then you find out, no, that's not true.

308

00:28:00,434 --> 00:28:11,663

And for all these reasons, and this is one of the reasons I wanted to do this episode

for my audience, who I hope are still listening, the point is understanding some of this technical

309

00:28:11,663 --> 00:28:12,224

depth

310

00:28:12,224 --> 00:28:22,588

and the AI tools we're using, whether the commercially available ones or ones

that are being baked into other solutions that maybe you're selling or that your

311

00:28:22,588 --> 00:28:31,655

organization is buying, so that you understand the capabilities, but also the

limitations and risks that are there.

312

00:28:31,655 --> 00:28:33,396

And so I think that's a key thing.

313

00:28:33,396 --> 00:28:37,058

You talked a little bit there about a concept that I've written about,

314

00:28:37,058 --> 00:28:41,491

in terms of multi-model and multimodal capabilities.

315

00:28:41,491 --> 00:28:53,034

Basically building large, connected networks, whether it's a network of two or

a network of 52 different types of models and transformers that are assuming different

316

00:28:53,034 --> 00:28:53,565

roles.

317

00:28:53,565 --> 00:29:00,379

And so I think that approach is something I'd like to explore with you for a few minutes

if you wouldn't mind.

318

00:29:00,379 --> 00:29:02,740

What's your experience of the way that's going?

319

00:29:02,740 --> 00:29:16,021

I know Microsoft, in health, announced almost nine months ago now, almost 12 months ago now,

MAI-DxO, which is their big step forward, and they made a lot of noise about it being

320

00:29:16,021 --> 00:29:17,322

multimodal, i.e.

321

00:29:17,322 --> 00:29:26,079

capable of handling more than just character-based input and the character-based

resources it draws on.

322

00:29:26,079 --> 00:29:44,050

So images and scanned documents and EKG traces, et cetera, et cetera, being able to

process all of that, but also to create the kind of orchestrator-type architecture whereby

323

00:29:44,050 --> 00:29:54,521

you would hand off parts of whatever task or prompt it was being given to

specialist models

324

00:29:54,521 --> 00:29:58,783

that are actually tuned for whatever that element of the task was.

325

00:29:58,783 --> 00:30:10,127

What's been your experience with that, either generally as an architectural concept or,

more specifically, examples of it starting to be adopted, either in healthcare or

326

00:30:10,127 --> 00:30:17,642

in other industries, because I know you obviously work for an organization that does a lot

of engineering, does a lot of product design, so on and so forth.

327

00:30:29,458 --> 00:30:29,942

Mm-hmm.

328

00:30:29,942 --> 00:30:36,273

Actually, the model by Microsoft that you mentioned, I haven't heard about it,

embarrassingly.

329

00:30:36,273 --> 00:30:42,748

I need to look it up, but like, okay.

330

00:30:42,748 --> 00:30:44,599

I will read up on it afterward.

331

00:30:44,599 --> 00:30:48,613

Yeah.

332

00:30:48,613 --> 00:30:52,355

I mean, well, there are like two things happening now, right?

333

00:30:52,355 --> 00:30:56,659

You have, obviously, this architectural decision: mixture of experts.

334

00:30:56,659 --> 00:31:01,762

But here the experts are not like the experts we think of as humans, right?

335

00:31:01,762 --> 00:31:06,405

Like it's not like one expert is doing physics, the other expert is doing literature.

336

00:31:06,405 --> 00:31:12,760

It's basically a segmentation of the neural network and how its parts process

337

00:31:12,760 --> 00:31:19,805

different tokens, and there's not necessarily any kind of logical domain distribution

between these experts.

338

00:31:19,805 --> 00:31:31,103

That's more of an architectural decision. I think it

was DeepSeek that introduced sparse mixture of experts.

339

00:31:31,103 --> 00:31:35,968

So they reduced the computation through this, though, I mean,

340

00:31:35,968 --> 00:31:38,269

either DeepSeek or another Chinese company.

341

00:31:38,269 --> 00:31:39,761

I forgot which one published this.

342

00:31:39,761 --> 00:31:51,992

Anyway, that's mostly about reducing compute, and it does bring qualitative

improvement, because you have different networks working on the tokens and putting

343

00:31:51,992 --> 00:31:53,305

them all together afterwards.
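A minimal sketch of the sparse mixture-of-experts idea described above: a router scores every expert for a token, only the top-k experts actually run, and their outputs are mixed with softmax weights. This is the generic top-k gating formulation, not DeepSeek's (or anyone's) specific architecture; the toy experts and router weights are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]

def top_k_moe(
    token: Vector,
    router_weights: List[Vector],
    experts: List[Callable[[Vector], Vector]],
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts and mix their outputs.

    Only k of len(experts) experts are evaluated, which is where the
    compute saving of a *sparse* mixture of experts comes from.
    """
    # Router: one score per expert (dot product of the token with that expert's router row).
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the chosen experts' scores only (shifted by the max for stability).
    m = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - m) for i in chosen]
    total = sum(exps)
    gates = [e / total for e in exps]
    # Weighted sum of the selected experts' outputs; the other experts are never called.
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen
```

Note that nothing here assigns a topic to an expert: which expert wins is purely a learned function of the token, which matches the point that these "experts" are not like human domain experts.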

344

00:31:53,305 --> 00:31:57,728

As for what you described, one model does this, the other model does that.

345

00:31:57,728 --> 00:32:05,491

This sounds to me like our agentic story with different orchestrators.

346

00:32:05,592 --> 00:32:15,628

And we have some problems here, because the models currently are not cognitively

capable of orchestrating very well.

347

00:32:15,628 --> 00:32:26,578

So if you work with GitHub Copilot, you'll notice that at some point, when you have

around, I don't know, something like 100 tools, it starts complaining to you that it

348

00:32:26,578 --> 00:32:27,670

has too many tools.

349

00:32:27,670 --> 00:32:29,184

And a tool is anything.

350

00:32:29,184 --> 00:32:33,034

A tool is, like, read file, write file, update file.

351

00:32:33,034 --> 00:32:34,276

Like everything is a tool.

352

00:32:34,276 --> 00:32:41,314

So when you have this orchestrator and many tools and many models that it's calling,

353

00:32:41,314 --> 00:32:46,026

you need that orchestrator to be able to route to the models and back.

354

00:32:46,026 --> 00:32:47,717

And usually those are like flows.

355

00:32:47,717 --> 00:32:50,058

So like it goes from one tool to the other one.

356

00:32:50,058 --> 00:32:52,549

You need to build these tool flows.

357

00:32:52,549 --> 00:32:54,270

You need to have a fallback.

358

00:32:54,270 --> 00:32:57,272

You burn a lot of tokens and you wait forever.
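The "tool flow with a fallback" pattern mentioned above can be sketched as a fixed sequence of tool steps, each with an optional fallback that runs if the primary tool fails. The tool names here are hypothetical placeholders, not GitHub Copilot's actual tools, and a real orchestrator would also have to decide the order itself rather than follow a hard-coded list.

```python
from typing import Callable, List, Optional, Tuple

Tool = Callable[[str], str]

def run_flow(
    data: str,
    flow: List[Tuple[str, Tool, Optional[Tool]]],
) -> Tuple[str, List[str]]:
    """Run tools in a fixed order; if a tool raises, try its fallback once.

    Each flow entry is (name, primary, fallback). Returns the final data and a log.
    A step with no fallback re-raises its error and stops the flow.
    """
    log: List[str] = []
    for name, primary, fallback in flow:
        try:
            data = primary(data)
            log.append(f"{name}: ok")
        except Exception as err:
            if fallback is None:
                log.append(f"{name}: failed ({err})")
                raise
            data = fallback(data)
            log.append(f"{name}: fallback")
    return data, log
```

Every fallback is another call, which is one reason these flows burn tokens and take time.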

359

00:32:57,272 --> 00:33:05,077

Essentially, the accuracy of these multi-agent systems is fairly low, because it's a

multi-step process.

360

00:33:05,077 --> 00:33:07,309

You are more likely to propagate errors there.
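A back-of-the-envelope way to see why multi-step agent pipelines lose accuracy: under the simplifying assumption that each step is independently correct with probability p and no step catches an earlier mistake, the whole chain of n steps is correct with probability p to the power n. Real pipelines are messier, so treat this as an intuition pump rather than a measurement.

```python
def chain_accuracy(step_accuracy: float, steps: int) -> float:
    """Probability that every step of an n-step pipeline is correct,
    assuming independent errors and no step corrects an earlier one."""
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be in [0, 1]")
    return step_accuracy ** steps

for n in (1, 5, 10, 20):
    print(n, round(chain_accuracy(0.95, n), 3))
```

Even at 95% per step, ten chained steps leave roughly a 60% chance that the end result is fully correct.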

361

00:33:07,309 --> 00:33:15,900

And it seems like it will take a lot of time until the models are cognitively there, with

the common sense to know where things should go.

362

00:33:15,900 --> 00:33:22,689

Because LLMs have a tendency to accept everything that goes into them as

the truth.

363

00:33:22,689 --> 00:33:29,269

So they are not skeptical of input coming in from another LLM.

364

00:33:29,269 --> 00:33:33,459

So if one LLM hallucinated and that propagated to the next LLM,

365

00:33:33,459 --> 00:33:39,956

unless it's explicitly that LLM's job to be skeptical and correct it, the LLM will accept

this.

366

00:33:39,956 --> 00:33:47,053

But again, if the other LLM's job is to be skeptical and correct, it might be the truth

that comes in, and then it will correct the truth, and so on.

367

00:33:47,053 --> 00:33:57,744

So this self-correction mechanism is also not necessarily a good one, and one shouldn't

overuse it, because when you keep on telling your LLM you are wrong, you're incorrect,

368

00:33:57,744 --> 00:34:00,595

eventually it will never stop correcting.

369

00:34:00,595 --> 00:34:04,417

It very rarely says, I've corrected everything now.
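One common way to keep a critic-and-reviser loop from "never stopping correcting" is to bound it: stop when the critic finds nothing, when a revision no longer changes the draft, or after a fixed number of rounds. This is a minimal sketch; `critique` and `revise` are hypothetical stand-ins for the two LLM roles, and the stopping rules are one reasonable choice, not a standard.

```python
from typing import Callable, List, Tuple

def bounded_self_correction(
    draft: str,
    critique: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Revise a draft until the critic finds nothing, the draft stops changing,
    or max_rounds is reached. Returns the final draft and the rounds used."""
    for round_no in range(1, max_rounds + 1):
        issues = critique(draft)
        if not issues:                 # critic is satisfied: stop
            return draft, round_no - 1
        revised = revise(draft, issues)
        if revised == draft:           # no progress: stop instead of looping
            return draft, round_no
        draft = revised
    return draft, max_rounds          # hard cap: a critic that always objects cannot run forever
```

The hard cap matters precisely in the case described above: a critic that always says "you're wrong" would otherwise keep the loop going indefinitely.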

370

00:34:04,417 --> 00:34:17,650

Currently, I'm skeptical of such complex systems with a lot of specialized models. And

the other question with specialized models is that small models are not that

371

00:34:17,650 --> 00:34:19,410

good with generalization.

372

00:34:19,410 --> 00:34:21,908

They usually kind of overfit to the task.

373

00:34:21,908 --> 00:34:29,583

And the generalization in transformers comes exactly from the ability to know many

tasks, many inputs.

374

00:34:29,583 --> 00:34:35,397

And when a new, unseen task comes in, they are capable of generalizing to

it.

375

00:34:35,397 --> 00:34:45,688

For example, if it knows how to do sentiment analysis for Amazon reviews, then it will not

struggle with sentiment analysis for some travel website or something like this.

376

00:34:45,688 --> 00:34:47,499

Or even further away.

377

00:34:47,499 --> 00:34:59,026

If it knows how to code in Java, it might actually be okay generalizing to some

language that is represented in the data, but way less represented.

378

00:34:59,026 --> 00:35:11,437

So it might be that small models in general are not the optimal decision here, because we

also know from machine translation back then, about 10 years ago, that

379

00:35:11,437 --> 00:35:16,681

multilingual machine translation was actually beneficial for low-resource languages.

380

00:35:16,681 --> 00:35:28,180

So, for example, when you train English together with Afrikaans, which

is similar to English and Dutch: it's a language, but it's very similar to

381

00:35:28,180 --> 00:35:28,601

Dutch.

382

00:35:28,601 --> 00:35:37,317

Some might say it's a dialect of Dutch, but let's say it's a language, because the

difference between a language and a dialect is a question of the size of the army and the

383

00:35:37,317 --> 00:35:38,608

budget of the country.

384

00:35:38,608 --> 00:35:48,678

Essentially, yeah, something like Afrikaans would benefit from English

because it has a similar grammatical structure, similar syntax, similar

385

00:35:48,678 --> 00:35:49,219

patterns.

386

00:35:49,219 --> 00:36:01,028

And that's what I feel many people don't think about when they think about small

language models: large language models and transformers actually benefit from

387

00:36:01,028 --> 00:36:02,257

this generalization.

388

00:36:02,257 --> 00:36:07,432

That's a very fascinating thing you do, you know, making that comparison with small models.

389

00:36:07,432 --> 00:36:14,509

I mean, LLMs are like, sorry, the big GPTs are like English or French or Spanish.

390

00:36:14,509 --> 00:36:16,532

They're very, very universal.

391

00:36:16,532 --> 00:36:24,238

I mean, you know, French used to be the language of diplomacy, and English the language of

business, and so on and so forth.

392

00:36:24,238 --> 00:36:25,670

So they became popular.

393

00:36:25,670 --> 00:36:35,774

But I know, and you know as a linguist, what happens when you get to a small village in the middle

of England, in the West Midlands. I'll tell you a story.

394

00:36:35,774 --> 00:36:40,554

There was a nurse manager in the hospital that I ran and operated.

395

00:36:40,554 --> 00:36:43,245

We operated an eye hospital.

396

00:36:43,245 --> 00:36:47,045

And the nurse manager there used to run the big clinic.

397

00:36:47,045 --> 00:36:50,236

We had 54,000 people a year coming to that clinic.

398

00:36:50,236 --> 00:36:58,014

She could hear someone's voice and tell which village they were from based on their

dialect, based on the way they used words.

399

00:36:58,014 --> 00:37:08,190

And I think that's the thing: there are villages with small

populations where, if two people who belong to that community talk together, even though

400

00:37:08,190 --> 00:37:14,421

they may be speaking what is at root one language, their dialect means that other people can't

understand them.

401

00:37:14,421 --> 00:37:18,084

People used to say that about me when I first moved from Scotland.

402

00:37:18,084 --> 00:37:23,889

My accent and my dialect were so strong that they would go, what did you say?

403

00:37:23,889 --> 00:37:27,081

And so I think that it's really fascinating that you say that.

404

00:37:27,081 --> 00:37:38,734

And I think that connection between large, medium and small models and language use in

large, medium and small populations is really fascinating.

405

00:37:38,786 --> 00:37:39,909

That's a cool insight.

406

00:37:39,909 --> 00:37:41,473

Thank you for that, Maria.

407

00:37:42,157 --> 00:37:44,023

I'll try and store that one away in the brain.

408

00:37:44,023 --> 00:37:44,703

Sure.

409

00:37:44,703 --> 00:37:47,583

They are called language models for a reason, right?

410

00:37:47,583 --> 00:37:57,098

They essentially operate on language, and the current GPT model is 90% trained on

English data.

411

00:37:57,098 --> 00:37:58,593

And the Chinese models,

412

00:37:58,593 --> 00:38:00,676

the Chinese models also then?

413

00:38:00,676 --> 00:38:03,792

Because clearly, different characters, etc.

414

00:38:03,826 --> 00:38:10,346

I talked to people I know, people from the research teams of Qwen and, like, Xiaomi.

415

00:38:10,346 --> 00:38:14,917

So I actually asked them what's your training data composition in terms of language.

416

00:38:14,917 --> 00:38:18,788

And they said: we have 70% English, 30% Chinese.

417

00:38:18,788 --> 00:38:26,639

So they are still heavily English, but they do have a large portion; 30% Chinese

is a lot.

418

00:38:26,639 --> 00:38:28,686

And the rest is...

419

00:38:28,686 --> 00:38:30,018

is like smaller.

420

00:38:30,018 --> 00:38:35,756

So yeah, maybe 5% or something like that would be the other languages.

421

00:38:35,756 --> 00:38:43,324

So that's why, even though with DeepSeek, for example, people are like, oh, it doesn't

answer certain questions,

422

00:38:43,324 --> 00:38:47,627

it's still strongly biased towards Western representations.

423

00:38:47,627 --> 00:38:49,439

You cannot completely block it.

424

00:38:49,439 --> 00:38:50,610

You cannot train it out.

425

00:38:50,610 --> 00:38:52,943

If most of your corpus is English.

426

00:38:52,943 --> 00:38:53,454

Yeah.

427

00:38:53,454 --> 00:39:04,301

So you are a small-model skeptic, and you are an orchestrator skeptic in terms of maturity,

in terms of small-model capability, that...

428

00:39:04,301 --> 00:39:18,230

I'm skeptical, but again, we already see big progress from China, because this

sparse mixture of experts seems to save a lot in terms of compute, and what they keep

429

00:39:18,230 --> 00:39:23,005

on publishing, for example MiniMax M2.1, is a very good coding model.

430

00:39:23,005 --> 00:39:24,847

I mean, it's comparable, maybe.

431

00:39:24,847 --> 00:39:26,929

It's obviously not as good as, like,

432

00:39:26,929 --> 00:39:30,802

closed models, Sonnet 4.5 or anything like this, but it's actually usable.

433

00:39:30,802 --> 00:39:39,180

In my opinion, and I'll get complaints about this phrase, no open-source

model, no small model, is usable for coding.

434

00:39:39,180 --> 00:39:46,716

But MiniMax I could run on two NVIDIA DGX Sparks, and this is very cheap,

435

00:39:46,727 --> 00:39:51,781

because it's only a box; I have it on my desk.

436

00:39:51,781 --> 00:39:59,468

It costs only 4,400 euros.

437

00:39:59,468 --> 00:40:02,471

So it's not much for GPUs.

438

00:40:02,471 --> 00:40:11,419

So if you buy two of them, let's say your company invests 10,000, you can run a

usable coding model locally.
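The cost argument can be made explicit with a simple break-even calculation: hardware cost divided by the monthly saving (API spend avoided minus the running cost of the local boxes). Only the 4,400-euro price per box comes from the conversation; the monthly API spend and running cost below are hypothetical placeholders, and the sketch ignores differences in model quality and engineering time.

```python
import math

def break_even_months(hardware_eur: float, monthly_api_eur: float,
                      monthly_running_eur: float) -> float:
    """Months until local hardware pays for itself versus buying API tokens.

    Returns math.inf if running costs eat the whole monthly saving.
    """
    saving = monthly_api_eur - monthly_running_eur
    if saving <= 0:
        return math.inf
    return hardware_eur / saving

hardware = 2 * 4_400  # two boxes at the price quoted above: 8,800 EUR
# Hypothetical team figures, for illustration only:
months = break_even_months(hardware, monthly_api_eur=1_000, monthly_running_eur=100)
print(f"break-even after about {months:.1f} months")
```

With these made-up figures the boxes pay for themselves in under a year; with a smaller token bill the horizon stretches out quickly.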

439

00:40:11,430 --> 00:40:13,861

So your data, your code will stay there.

440

00:40:13,861 --> 00:40:19,152

You do not need to share it with OpenAI, with Claude or anything.

441

00:40:19,152 --> 00:40:24,015

And you might actually save a lot on buying tokens for your coders.

442

00:40:24,015 --> 00:40:25,255

So this is a big...

443

00:40:25,255 --> 00:40:26,506

So here I'm not scared.

444

00:40:26,506 --> 00:40:32,998

Here I'm sure that from that direction, we will get new architectures that run

on much smaller compute.

445

00:40:32,998 --> 00:40:38,250

But currently, the idea that you can take, I don't know, Qwen 7B, fine-tune it for...

446

00:40:38,250 --> 00:40:42,985

I don't know, some kind of medical diagnosis, and make an agent that does medical diagnosis,

447

00:40:42,985 --> 00:40:45,346

this I don't see happening now.

448

00:40:45,346 --> 00:40:46,077

Yeah.

449

00:40:46,077 --> 00:40:59,318

Well, the other application of complex architecture: as I described earlier on, we

have a lot of data about a lot of patients in the organization I work for, and many, many

450

00:40:59,318 --> 00:41:00,609

organizations do.

451

00:41:00,609 --> 00:41:08,520

And there is a lot of noise out there about applying AI to build the

452

00:41:08,520 --> 00:41:09,971

medical digital twin.

453

00:41:09,971 --> 00:41:13,314

So the concept of the digital twin has existed for some time, i.e.

454

00:41:13,314 --> 00:41:26,979

how can you model a physical entity completely and digitally, and then basically play

with it to interpret either information that's coming from it, if it's coming

455

00:41:26,979 --> 00:41:35,262

and connected in real time, or information that you've acquired and are trying to

synthesize and apply against another body of knowledge.
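As a toy illustration of the digital-twin idea just described, here is a minimal sketch: a static profile plus a stream of timestamped readings that can be ingested in real time and then queried. The field names, the heart-rate signal, and the out-of-range rule are hypothetical; real engineering or medical twins involve far richer models than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class DigitalTwin:
    """Toy digital twin: a static profile plus timestamped readings per signal.
    Field names and the range check are hypothetical, for illustration only."""
    entity_id: str
    profile: Dict[str, str] = field(default_factory=dict)
    readings: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)

    def ingest(self, signal: str, timestamp: float, value: float) -> None:
        """Record a reading as it arrives (the 'connected in real time' case)."""
        self.readings.setdefault(signal, []).append((timestamp, value))

    def latest(self, signal: str) -> float:
        """Most recent value for a signal (tuples compare by timestamp first)."""
        return max(self.readings[signal])[1]

    def out_of_range(self, signal: str, low: float, high: float) -> List[Tuple[float, float]]:
        """Readings outside [low, high], e.g. for flagging to a clinician or engineer."""
        return [(t, v) for t, v in self.readings.get(signal, []) if not low <= v <= high]
```

The same structure fits an engine or a patient; what differs is the body of knowledge the readings are interpreted against.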

456

00:41:35,262 --> 00:41:38,703

So the concept of the medical digital twin is amazing.

457

00:41:38,703 --> 00:41:50,440

It's like: here is a digital representation of Stuart Miller and his health history, his

genetics, all of the information we know about him, and we compile that all together.

458

00:41:50,440 --> 00:42:00,075

And then we let the AI try and help by using the body of knowledge, not only about his

processes but, more widely,

459

00:42:00,075 --> 00:42:13,317

the entirety of everything we know about medicine today, constantly updated, so

that somehow or other we build this super diagnostic and care-planning machine.

460

00:42:13,317 --> 00:42:23,259

And I think, conceptually, the ambition that Microsoft has

with MAI-DxO is to create a master diagnostician.

461

00:42:23,259 --> 00:42:28,486

There was a TV show in the early 2000s here in the States called House.

462

00:42:28,486 --> 00:42:40,254

about a brilliant, brilliant, but character-wise flawed doctor whose

title was diagnostician, and he worked at one of the major medical centers just

463

00:42:40,254 --> 00:42:41,286

outside New York.

464

00:42:41,286 --> 00:42:47,769

The conceit there was that you brought your most difficult patients to him and he would be able

to work out what was going wrong.

465

00:42:47,769 --> 00:42:53,681

And obviously it's a TV show and there are clinicians who exist who

466

00:42:53,812 --> 00:42:59,758

lean towards specializing that way, but there's not a lot of money in it, because obviously

such patients are very rare.

467

00:42:59,758 --> 00:43:01,749

So therefore, how do you get paid?

468

00:43:01,749 --> 00:43:03,290

How does the hospital get paid?

469

00:43:03,290 --> 00:43:15,159

The insurance companies don't want to... Anyway, let's ignore why they don't really exist in

volume; they do exist intellectually in the body of doctors and physicians out there.

470

00:43:15,159 --> 00:43:18,002

The concept of the digital twin, from

471

00:43:18,002 --> 00:43:21,474

you know, what I've been able to observe in the literature came from engineering.

472

00:43:21,474 --> 00:43:33,804

The idea of taking and modeling data coming from physical objects, you know, typically

engines or large engineering structures of one kind or another and being able to take that

473

00:43:33,804 --> 00:43:34,325

through.

474

00:43:34,325 --> 00:43:39,658

You have obviously worked for BMW, and you worked for another large German company.

475

00:43:39,658 --> 00:43:41,101

We said earlier on: Siemens.

476

00:43:41,101 --> 00:43:53,733

What's your experience over time in terms of observing digital twins in that domain and

what do you think that means in terms of applying those concepts to this very complex

477

00:43:53,733 --> 00:43:56,066

biological being that is a human?