So with that said, Maria, now I've done a small advert for you.
2
00:00:15,841 --> 00:00:29,673
What is changing right now in AI, both at a commercial and industrial
level and also in what you write about? You write a lot about what's happening for the
3
00:00:29,673 --> 00:00:31,906
general utility user.
4
00:00:31,906 --> 00:00:35,561
of AI tools out there in the world.
5
00:00:35,561 --> 00:00:47,158
Yeah, I think, well, we had a kind of turning point when GPT-5 was released,
because I feel like a lot of things were building up to the moment when GPT-5 would be
6
00:00:47,158 --> 00:00:54,397
released, because when this whole hype started, right, and when we had this avalanche of
experts,
7
00:00:54,397 --> 00:00:59,360
throwing a lot of arguments at me, like, but you do not know what GPT-5 will do.
8
00:00:59,360 --> 00:01:01,310
Wait till GPT-5 is here.
9
00:01:01,310 --> 00:01:02,421
You have no idea.
10
00:01:02,421 --> 00:01:05,444
You are not visionary enough and stuff like this.
11
00:01:05,444 --> 00:01:08,867
Then GPT-5 appeared and flopped.
12
00:01:08,867 --> 00:01:14,762
Nothing, an incremental improvement at best, which, well, was obviously expected.
13
00:01:14,762 --> 00:01:20,749
I think it was clear basically from the beginning that that's where we'd end up, and that's why
they kept dragging this whole story out.
14
00:01:20,749 --> 00:01:24,713
Now a lot of AI experts are pivoting to being AI skeptics.
15
00:01:24,713 --> 00:01:25,915
I noticed the big shift.
16
00:01:25,915 --> 00:01:28,897
So suddenly they pretend they were always saying this.
17
00:01:28,897 --> 00:01:34,762
But anyway, now I feel like we will be heading towards...
18
00:01:34,893 --> 00:01:39,388
the point where people have already started asking, so where is the return on investment?
19
00:01:39,388 --> 00:01:50,037
We are still in a very dangerous zone here with this agentic stuff because agentic stuff
is again over-promised into some kind of absurdity.
20
00:01:50,037 --> 00:01:57,566
I actually don't know how this still works after the whole nonsense with GPT-5, but
somehow it still does.
21
00:01:57,566 --> 00:01:59,199
People were still buying into this.
22
00:01:59,199 --> 00:02:08,870
I have a feeling that very soon the people who actually invested in this will start
asking, where's the return on investment?
23
00:02:08,870 --> 00:02:19,261
And there will be no return on investment, because obviously the models are not cognitively
there yet; these agents cannot perform on the same level as
24
00:02:19,261 --> 00:02:19,721
humans.
25
00:02:19,721 --> 00:02:28,448
For example, just recently, I think the McKinsey CEO, or one of their high-profile
managers, said that they have
26
00:02:28,448 --> 00:02:34,588
what, 60,000 employees, 25,000 of them are AI agents.
27
00:02:34,857 --> 00:02:48,448
I don't know, I'd like to ask him which is his favorite agent, because each time I talk
to a senior manager and he starts telling me, can I improve, build
28
00:02:48,448 --> 00:02:50,099
this and this and this, I ask him.
29
00:02:50,099 --> 00:02:54,719
So over the past three years, we built a ridiculous amount of chatbots, agents and so on.
30
00:02:54,719 --> 00:02:56,210
Which one do you use every day?
31
00:02:56,210 --> 00:03:06,693
They're like, well, I use ChatGPT, I use Gemini, sometimes I use Microsoft Copilot. I'm
like, but which bot do you use from what we built?
32
00:03:06,693 --> 00:03:07,413
Which agent?
33
00:03:07,413 --> 00:03:14,645
Actually, I don't use any of those.
34
00:03:14,645 --> 00:03:17,736
That's where all the money is going now.
35
00:03:17,736 --> 00:03:22,066
At some point, I think the problem is that all this hype and lies and
36
00:03:22,066 --> 00:03:33,077
building AGI as OpenAI traditionally understands it, and all this other over-promised stuff,
might undermine trust in the field as a whole.
37
00:03:33,077 --> 00:03:42,478
And I think that's what eventually happened to blockchain. I have a colleague who
also has a Substack; it's called Blockchain Meets AI.
38
00:03:42,478 --> 00:03:45,338
And I was also kind of skeptical about blockchain.
39
00:03:45,338 --> 00:03:50,960
And when I talked to her, I saw that it actually still has value and can be a good technology.
40
00:03:50,960 --> 00:03:55,351
But there's no way to convince anyone to invest in it now.
41
00:03:55,351 --> 00:03:58,362
People simply do not have trust anymore.
42
00:03:58,362 --> 00:04:09,673
So even if you say, oh, it's secure, it's better for your
databases, it can preserve identification, and so on and so on, people are just like,
43
00:04:09,673 --> 00:04:11,024
nah, we already invested enough.
44
00:04:11,024 --> 00:04:13,786
Yeah.
45
00:04:13,786 --> 00:04:19,111
It reminds me of the business case study that's always talked about when we're talking
about this.
46
00:04:19,111 --> 00:04:32,384
It's a colloquial term now: the tulip investment market in the 17th century, where
everybody was convinced that they had to invest in growing tulips because tulips
47
00:04:32,384 --> 00:04:36,529
were this amazing flower product that
48
00:04:36,529 --> 00:04:41,140
could last longer, was more beautiful, was very easy to grow, and so on and so forth.
49
00:04:41,140 --> 00:04:46,420
And so there was an enormous hype cycle that led to a crash.
50
00:04:46,420 --> 00:04:54,111
A lot of people in the 17th century lost money because they invested in tulips.
51
00:04:54,111 --> 00:05:01,622
And if you look at it through history, the same thing has happened with various product
classes over time.
52
00:05:01,622 --> 00:05:04,773
Whether it was blockchain, or Web 2.0 before that.
53
00:05:04,773 --> 00:05:11,460
I go back and look at my career and I can see these various waves of hype cycles occur.
54
00:05:11,460 --> 00:05:13,275
And the problem is...
55
00:05:13,275 --> 00:05:19,906
And a lot of my marketing friends and colleagues will squirm
at this.
56
00:05:19,906 --> 00:05:31,206
But marketing takes over and advertising takes over and promotion takes over and everybody
jumps on the bandwagon, whether it's CEOs or whomever and boards turn around and go, well,
57
00:05:31,206 --> 00:05:32,746
what are we doing about this?
58
00:05:32,746 --> 00:05:34,366
You know, why aren't we investing in that?
59
00:05:34,366 --> 00:05:37,386
All of my friends and these other corporations are doing this.
60
00:05:37,386 --> 00:05:38,026
Why aren't we?
61
00:05:38,026 --> 00:05:41,197
And so you end up with this enormous pressure.
62
00:05:41,197 --> 00:05:43,589
to spend money on these technologies.
63
00:05:43,589 --> 00:05:54,803
And I do have sympathy for executives inside companies at level two and below, below the
board, because they're basically being given instructions and if they're skeptical, they
64
00:05:54,803 --> 00:05:55,424
get fired.
65
00:05:55,424 --> 00:06:01,787
And so it becomes very difficult for them to resist these kinds of marches and trends that
occur.
66
00:06:01,787 --> 00:06:03,058
So I agree with you.
67
00:06:03,058 --> 00:06:05,590
I think that we are about to enter that phase.
68
00:06:05,590 --> 00:06:17,926
The second thing that really worries me when we talk about all of these agents, and I
wrote about this recently, is the whole vibe coding movement, whereby everybody can become
69
00:06:17,926 --> 00:06:28,913
a coder if they just go and buy Claude or Copilot Studio or whatever else it might be and
just enter some prompts in and suddenly, ba-boom, we've got an answer.
70
00:06:28,913 --> 00:06:40,523
And you have a beautiful article that you published recently about building websites, that is,
writing code for websites using the different commercially available
71
00:06:40,523 --> 00:06:41,884
consumer models.
72
00:06:41,884 --> 00:06:50,411
Do you want to talk about that for a moment so we can see an example of what's good, bad,
and indifferent about these various models?
73
00:06:50,462 --> 00:07:01,914
Yeah, in fact, vibe coding is a very interesting case with LLMs, because
from one point of view, it does create a lot of bad code.
74
00:07:01,914 --> 00:07:14,600
It can propagate errors into the LLMs themselves, because I think GitHub or
GitLab, one of those, released statistics on how much more code has been written lately.
75
00:07:14,600 --> 00:07:19,211
And there is a lot of code and obviously new LLMs will be trained on this code.
76
00:07:19,211 --> 00:07:27,791
And it might be that the new LLMs actually undergo a kind of model collapse from
that code.
77
00:07:28,391 --> 00:07:28,771
Yeah.
78
00:07:28,771 --> 00:07:30,202
So that's possible.
79
00:07:30,202 --> 00:07:39,424
On the other hand, AI assisted coding is probably the only area where LLMs are really
transformative.
80
00:07:39,424 --> 00:07:39,688
So
81
00:07:39,688 --> 00:07:50,578
There aren't many other applications of LLMs that would have such a big impact;
most of the chatbots and whatever else we built with these LLMs are nice-to-haves.
82
00:07:50,578 --> 00:08:04,230
Honestly, no matter how we want to twist it, if Microsoft Copilot went down
tomorrow, I do not think major corporations would
83
00:08:04,230 --> 00:08:07,071
notice that much impact on them.
84
00:08:07,071 --> 00:08:09,302
Their stocks wouldn't go down or anything.
85
00:08:09,302 --> 00:08:25,116
But if we removed AI-assisted coding from coders entirely now, we would see a
significant drop in productivity and, for experienced people, in the quality of the
86
00:08:25,116 --> 00:08:25,598
code.
87
00:08:25,598 --> 00:08:28,870
That would have an impact on the industry.
88
00:08:28,870 --> 00:08:37,570
So that's how I evaluate whether a use case or application is important, impactful and
transformative or not.
89
00:08:37,570 --> 00:08:39,570
What happens if this thing goes down?
90
00:08:39,570 --> 00:08:44,181
If nobody notices, it doesn't really matter.
91
00:08:49,690 --> 00:08:53,084
A RAG bot stops talking, nobody cares, right?
92
00:08:53,084 --> 00:09:05,694
I think the other concern I've seen about AI-assisted coding is exactly what I
just said about the snake eating its own tail: as all of this generated
93
00:09:05,694 --> 00:09:15,685
code starts to populate out into the data pool of what's available and then gets
re-ingested as probabilistically
94
00:09:15,685 --> 00:09:21,587
very prevalent, and therefore good, and therefore gets reused again, there are two
things.
95
00:09:21,587 --> 00:09:28,081
One, there's a danger of errors getting compounded and re-ingested, but more
importantly, it becomes boring.
96
00:09:28,081 --> 00:09:39,747
Because you're constantly going back and copying those models of what a UI looks like,
what a user experience and interface look like, what the...
97
00:09:39,747 --> 00:09:42,739
flow of logic might look like in a process.
98
00:09:42,739 --> 00:09:47,670
And that may be okay because it's very deterministic.
99
00:09:47,670 --> 00:09:49,332
It works.
100
00:09:49,332 --> 00:09:50,603
So we'll use it again.
101
00:09:50,603 --> 00:09:56,146
But with coding, in my experience, there is also creativity and art.
102
00:09:56,146 --> 00:10:03,883
When you talk to a really good coder, somebody who is creating a genuinely new product,
they are
103
00:10:03,883 --> 00:10:13,439
they're applying their own knowledge and own experience to that and their own judgment to
the business problem or the flow or the experience they're trying to create.
104
00:10:13,439 --> 00:10:26,139
And I fear that we're going to end up with a lot of very similar looking applications or
agents because we just keep re-ingesting the same code and using it that way.
105
00:10:26,139 --> 00:10:31,303
And I think that that's part of an evolution that people need to think about.
106
00:10:31,303 --> 00:10:41,632
as they go forward: yes, it may make for a great improvement in productivity and
reduce the amount of unit testing of code.
107
00:10:41,632 --> 00:10:48,516
And for my viewers and listeners without a technical background, please fast forward, because
this is getting a little deep.
108
00:10:48,516 --> 00:10:57,164
But one of the concerns I've got is that constantly reusing code and
re-ingesting it into the models
109
00:10:57,164 --> 00:11:02,443
could lead to us actually creating very boring applications.
110
00:11:02,443 --> 00:11:03,427
What do you think about that?
111
00:11:03,427 --> 00:11:04,829
Yeah, we already see this.
112
00:11:04,829 --> 00:11:07,630
We have an avalanche of purple websites.
113
00:11:07,630 --> 00:11:18,847
Basically, the internet is getting purple, because for some reason when LLMs write a website
and you don't tell them what color to use, they always try to insert purple somewhere.
114
00:11:18,847 --> 00:11:21,179
So you have like all these purple websites everywhere.
115
00:11:21,179 --> 00:11:27,653
I always say that humans should now focus on the question, what?
116
00:11:27,653 --> 00:11:31,466
And AI can frequently, especially in coding, answer the question, how?
117
00:11:31,466 --> 00:11:37,229
As a web designer, or maybe as a coder, you should know what you want to do.
118
00:11:37,229 --> 00:11:41,931
Maybe you don't want a purple website that has the same boxes everywhere.
119
00:11:41,931 --> 00:11:49,636
So maybe you should figure out how the UX would be more usable for humans.
120
00:11:49,636 --> 00:11:51,579
But that thing can...
121
00:11:51,579 --> 00:11:56,370
probably write very well-formatted CSS styles.
122
00:11:56,370 --> 00:12:01,602
And we did this experiment with five off-the-shelf models.
123
00:12:01,602 --> 00:12:10,195
So we had ChatGPT, Claude, Gemini, MiniMax and Kimi K2 just code a website out
of a CV.
124
00:12:10,195 --> 00:12:17,668
So a simple website for when someone needs a web presence or wants, you know, LLMs to be able
to crawl
125
00:12:17,668 --> 00:12:19,649
them when they answer questions about them.
126
00:12:19,649 --> 00:12:22,291
So it's a good idea to make a website.
127
00:12:22,291 --> 00:12:34,543
And a website is a good thing to have an LLM build, because there's basically nothing
else with as much training data as websites, because LLMs are trained
128
00:12:34,543 --> 00:12:44,864
on the internet, and the internet basically consists of HTML, CSS,
JavaScript and so on.
129
00:12:44,864 --> 00:12:45,598
this is
130
00:12:45,598 --> 00:12:58,807
a perfect example: if LLMs do anything well, they should be
capable of building websites, because there is absolutely nothing more on the internet than
131
00:12:58,807 --> 00:12:59,418
websites.
132
00:12:59,418 --> 00:13:01,610
uh
133
00:13:01,610 --> 00:13:03,942
What was the outcome of your experiment?
134
00:13:03,942 --> 00:13:08,879
And I'm going to place a link here so that people can go and read your article.
135
00:13:08,879 --> 00:13:12,744
But just summarize that for the viewers and listeners.
136
00:13:12,744 --> 00:13:20,990
Yeah, so what we noticed, they actually build websites fairly well, but fairly simple
websites.
137
00:13:20,990 --> 00:13:31,205
I would say that, of course, if you are a big corporation like, I don't know,
BMW or something like that, and you want to build a website for your company, you will not
138
00:13:31,205 --> 00:13:32,396
code it with ChatGPT.
139
00:13:32,396 --> 00:13:34,257
You need a professional person.
140
00:13:34,257 --> 00:13:39,840
But if you want some kind of website with a very simple
141
00:13:39,871 --> 00:13:51,071
online CV or product portfolio, say you're a photographer and you want to put up
your pictures, then in my opinion it's completely all right to do this.
142
00:13:51,071 --> 00:13:53,573
The quality varies between the LLMs.
143
00:13:53,573 --> 00:13:55,913
It's very different.
144
00:13:57,250 --> 00:14:02,030
Yeah, they were very different.
145
00:14:02,067 --> 00:14:02,708
Exactly.
146
00:14:02,708 --> 00:14:07,251
Some LLMs were surprisingly bad, like ChatGPT.
147
00:14:07,251 --> 00:14:11,553
Some were far superior in terms of code quality.
148
00:14:11,553 --> 00:14:20,676
For example, China is currently doing very well with these agentic and coding models, like
MiniMax or GLM.
149
00:14:20,676 --> 00:14:22,096
Those were very good.
150
00:14:22,096 --> 00:14:24,028
The rest was somewhere in the middle.
151
00:14:24,028 --> 00:14:30,013
But all in all, we sat together for five hours.
152
00:14:30,013 --> 00:14:34,366
In five hours, we built five websites and then deployed them all online.
153
00:14:34,366 --> 00:14:36,518
It's actually very simple to deploy them.
154
00:14:36,518 --> 00:14:42,471
And now I have also decided to make a short course for the subscribers of Realist.
155
00:14:42,471 --> 00:14:44,774
And we built like...
156
00:14:44,774 --> 00:14:48,537
with people to build a website, to teach them how to build websites for themselves.
157
00:14:48,537 --> 00:14:58,805
And a couple of students have now built their websites; I'll probably publish them
in the near future on Realist or on LinkedIn.
158
00:14:58,805 --> 00:15:00,746
so you can see what they created.
159
00:15:00,746 --> 00:15:03,068
These people have no technical background.
160
00:15:03,068 --> 00:15:07,332
They really just, yeah, kind of vibe coded it.
161
00:15:07,332 --> 00:15:08,732
uh
162
00:15:08,732 --> 00:15:16,637
And I think that that is a great way to try and learn what these models and these products
are capable of.
163
00:15:16,637 --> 00:15:19,238
I have to confess.
164
00:15:19,238 --> 00:15:22,250
So I've been a very heavy user of ChatGPT.
165
00:15:22,250 --> 00:15:23,552
And I agree with you.
166
00:15:23,552 --> 00:15:26,165
GPT-5, when it replaced 4o,
167
00:15:26,165 --> 00:15:28,467
degraded my experience.
168
00:15:28,467 --> 00:15:29,948
There's no doubt about it.
169
00:15:29,948 --> 00:15:39,317
Now we've gotten to 5.2, and I think they've adjusted things and
probably tried to take away some of the things they added in 5.
170
00:15:39,317 --> 00:15:47,064
So 5.2 I find a little better, but I've still got, you know, I've learned so much in terms
of my experience with that.
171
00:15:47,064 --> 00:15:51,338
So now, and I had always planned to do this, I've had Perplexity.
172
00:15:51,338 --> 00:15:52,751
I was in and out of that.
173
00:15:52,751 --> 00:15:55,773
I had a six month trial and I decided I wasn't going to continue it.
174
00:15:55,773 --> 00:15:59,625
I have Gemini Pro and NotebookLM, wonderful tools,
175
00:15:59,625 --> 00:16:05,809
particularly now on the visual side, with Nano Banana, Veo 3, et cetera.
176
00:16:05,809 --> 00:16:12,284
And then I have also just started with Claude, because obviously I do a lot of writing
now.
177
00:16:12,284 --> 00:16:18,590
And what I found is ChatGPT will lead you into writing hell, particularly long form.
178
00:16:18,590 --> 00:16:27,674
If you want to tidy up an email, or get some ideas for an email, or you want to write a piece of less
than 500 words, great.
179
00:16:27,674 --> 00:16:34,750
As soon as you get to 2,000 words, my goodness, ChatGPT will frustrate you to blazes.
180
00:16:34,750 --> 00:16:38,604
And I've used custom GPTs with tons of instructions.
181
00:16:38,604 --> 00:16:42,849
I've used project folders to try to keep
182
00:16:42,849 --> 00:16:47,311
the resources it's referencing to a minimum. I've given up.
183
00:16:47,311 --> 00:16:51,104
I've now switched to Claude, because Claude is now a much better writer.
184
00:16:51,104 --> 00:16:56,107
I still use ChatGPT for a lot of ideation and research and things like that.
185
00:16:56,107 --> 00:17:04,893
But it's amazing to me that as you described earlier on, each of these models has very
different capabilities.
186
00:17:04,893 --> 00:17:08,956
Going back to BMW: why does BMW have, I don't know,
187
00:17:08,956 --> 00:17:13,929
35 different cars in their range because they all do subtly different things.
188
00:17:13,929 --> 00:17:15,884
They're all subtly different price points.
189
00:17:15,884 --> 00:17:27,886
And I feel that way about a lot of the commercially available GPTs: you have to try to
understand what it is you are trying to accomplish in order to use the correct one.
190
00:17:27,886 --> 00:17:30,339
So that's been my experience.
191
00:17:30,339 --> 00:17:38,660
But in your experience, and you obviously talked about the
strengths and weaknesses in that website challenge, what are the things that
192
00:17:38,660 --> 00:17:42,653
fail in real-world use of a lot of these GPTs?
193
00:17:55,286 --> 00:17:55,717
Hmm.
194
00:17:55,717 --> 00:18:01,119
I feel like there are many things that fail, in a way.
195
00:18:01,119 --> 00:18:08,621
I mean, they do struggle, for example, with following instructions. Many
people think that if you get like...
196
00:18:08,621 --> 00:18:15,784
And that was like the confusion with prompt engineering because people were taking it too
seriously and you shouldn't take it too seriously.
197
00:18:15,784 --> 00:18:20,576
They were trying to put as many instructions as possible into a prompt.
198
00:18:20,576 --> 00:18:29,401
And you know, we have an attention mechanism, and basically, just from the name
attention: if you pay attention to something, you do not pay attention to
199
00:18:29,401 --> 00:18:30,282
something else.
200
00:18:30,282 --> 00:18:32,603
Like this is the definition of attention, right?
201
00:18:32,603 --> 00:18:34,164
You cannot pay attention to everything.
202
00:18:34,164 --> 00:18:35,385
Otherwise you have no attention.
203
00:18:35,385 --> 00:18:37,949
And if attention is all you need, right?
204
00:18:37,949 --> 00:18:45,066
It means that if you have too many instructions, you are basically creating a
gamble over which instruction
205
00:18:45,066 --> 00:18:46,827
the model will pay attention to.
206
00:18:46,827 --> 00:18:51,061
And that's why all this prompt injection works so well.
207
00:18:51,061 --> 00:19:02,477
Because you can just give a ton of random instructions and hope that at some point,
ChatGPT will forget its system instructions and do something that it's prohibited from doing.
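The attention argument above can be illustrated with a toy softmax calculation. This is a simplified sketch, not a description of how any real LLM distributes attention: it assumes a single attention distribution over a few made-up scores, where one score stands in for a system instruction and the others stand in for competing injected instructions. Because softmax weights always sum to 1, each extra competitor shrinks the share left for the first entry, even though its own score never changes.

```python
import math


def softmax(scores):
    """Numerically stable softmax: returns weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_on_system_instruction(n_competing, system_score=2.0, other_score=1.0):
    """Toy setup (assumed values): one 'system instruction' score competing
    with n equally scored 'other instruction' scores."""
    weights = softmax([system_score] + [other_score] * n_competing)
    return weights[0]


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(weight_on_system_instruction(n), 3))
```

With these assumed scores, the first entry's weight falls from about 0.73 with one competitor to about 0.03 with a hundred. Real models have many heads and layers, so this shows only the normalization intuition, not actual instruction-following behavior.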
208
00:19:02,477 --> 00:19:06,688
And with the early versions of ChatGPT, this worked very well.
209
00:19:06,688 --> 00:19:07,859
I mean, I was...
210
00:19:07,859 --> 00:19:08,411
uh
211
00:19:08,411 --> 00:19:22,301
I know that I'm probably legally not allowed to elaborate on this, but I did manage to get
some companies' bots to give me non-existent discounts and basically sell me products.
212
00:19:22,301 --> 00:19:25,893
I mean, I never actually profited from it.
213
00:19:25,893 --> 00:19:30,188
I never followed up, but it was so easy to hack those RAG bots
214
00:19:30,188 --> 00:19:38,014
just by overloading them with instructions, like write me a poem, tell me this,
write in this language, and give me a discount.
215
00:19:38,014 --> 00:19:40,225
And then it's just like, okay, here's your discount.
216
00:19:40,225 --> 00:19:41,329
That's fascinating.
217
00:19:41,329 --> 00:19:42,634
That's fascinating.
218
00:19:42,634 --> 00:19:44,824
Yeah.
219
00:19:44,824 --> 00:19:52,412
I think they've put some filters on top, maybe even non-LLM filters, to catch such
behavior.
220
00:19:52,412 --> 00:19:53,914
But it's still doable.
221
00:19:53,914 --> 00:19:57,477
There are still plenty of examples where people hack into them.
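The speaker only guesses that vendors added non-LLM filters. As a hypothetical illustration of what such a rule-based output check could look like, the sketch below scans a bot's reply before it is shown and blocks any discount code not on an allowlist, as well as any "NN% off" promise. The allowlist, the code format and the refusal text are all assumptions made for this example. Regex checks like this are easy to evade, so at best they complement other defenses.

```python
import re

# Hypothetical allowlist: codes the business has actually issued.
VALID_CODES = {"SPRING10"}

# Assumed code format: 4+ capital letters followed by 1-3 digits (e.g. "SAVE20").
CODE_PATTERN = re.compile(r"\b[A-Z]{4,}\d{1,3}\b")
# Any "NN% off" style promise, case-insensitive.
PERCENT_OFF = re.compile(r"\b\d{1,3}\s?% off\b", re.IGNORECASE)

REFUSAL = "Sorry, I can't offer discounts here. Please contact our sales team."


def filter_reply(reply: str) -> str:
    """Rule-based (non-LLM) check applied to the bot's reply before display."""
    unknown_codes = set(CODE_PATTERN.findall(reply)) - VALID_CODES
    if unknown_codes or PERCENT_OFF.search(reply):
        return REFUSAL
    return reply


if __name__ == "__main__":
    print(filter_reply("Here is your poem. And your code: FREE50"))
    print(filter_reply("Use SPRING10 at checkout."))
```

The filter runs on the model's output rather than the user's input, so it does not matter how the instructions were smuggled in. It only inspects what the bot is about to promise.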
222
00:19:57,581 --> 00:19:59,463
Let me give you a real-world example of that.
223
00:19:59,463 --> 00:20:11,084
Last year, I was involved in helping some of the people in our organization, and they
were doing some really great work creating a chatbot that interrogated the electronic
224
00:20:11,084 --> 00:20:11,985
medical record.
225
00:20:11,985 --> 00:20:18,252
An electronic medical record is, for all its numbers, just words and
characters.
226
00:20:18,252 --> 00:20:30,568
And so therefore, we're blessed in that organization that a large part of the
transactional medical record system has been replicated into a very organized database
227
00:20:30,568 --> 00:20:35,713
with a longitudinal patient record that has everything that's known about that patient.
228
00:20:35,713 --> 00:20:40,135
And that's easy enough to interrogate using some of these tools.
229
00:20:40,135 --> 00:20:53,430
One of the things we found with doctors, once we exposed this capability to them to
interrogate and ask for information from a patient's record, is that doctors ask compounded
230
00:20:53,430 --> 00:20:54,321
questions.
231
00:20:54,321 --> 00:21:04,495
When you give them just a prompt box and say, hey, ask this record something about this
patient, they write long, complex, compounded questions.
232
00:21:04,495 --> 00:21:14,529
And we found that no matter which model we applied, and we actually had
some switches that allowed us to switch in and out different
233
00:21:14,529 --> 00:21:14,980
models.
234
00:21:14,980 --> 00:21:19,522
It didn't matter which model, they all got confused with compounded questions.
235
00:21:19,522 --> 00:21:28,437
So the solution we came up with, or sorry, not we: I just sat and observed, as I was much
more involved in how we market and launch this and so on and so forth.
236
00:21:28,437 --> 00:21:31,569
But the solution that the developers came up with was,
237
00:21:31,569 --> 00:21:39,907
Okay, we can't engineer this out because we're using commercially available models, so we
can't change the model or its learning patterns.
238
00:21:39,907 --> 00:21:43,332
What we can do though is train the doctors.
239
00:21:43,332 --> 00:21:46,966
Not in the sense of going and saying, here's how to prompt engineer.
240
00:21:46,966 --> 00:21:52,523
We basically created an interface that said, okay, you have these questions you want to
ask.
241
00:21:52,523 --> 00:21:57,757
Because the other behavior we noted is they kept asking the same questions, depending on
their medical discipline.
242
00:21:57,757 --> 00:22:01,490
And so we said, OK, you keep asking the same questions.
243
00:22:01,490 --> 00:22:03,441
We need you to break them apart.
244
00:22:03,441 --> 00:22:07,494
You can't just write this, like, 200-character question.
245
00:22:07,494 --> 00:22:13,317
So therefore, let's help you by designing an interface whereby we give you an easy button.
246
00:22:13,317 --> 00:22:19,482
Because you always ask these same sets of questions over and over again, we're going to
give you a macro capability.
247
00:22:19,482 --> 00:22:29,327
but when you hit it, it will ask up to 20 well-structured questions and they'll be
well-structured because we've written them with you, so they're in a very good syntax and
248
00:22:29,327 --> 00:22:39,506
they're going to be LLM friendly and obviously medical record friendly and they're going
to be accurate in terms of what your intent is and then that way when you want to ask that
249
00:22:39,506 --> 00:22:45,411
batch of questions again, you just go boop, and out comes a response that actually takes
250
00:22:45,411 --> 00:22:51,765
each one of those questions in a single context window so that it's only one context
that's running.
251
00:22:51,765 --> 00:22:56,628
Now the challenge with that is it then consumes more processor cycles and so on and so
forth.
252
00:22:56,628 --> 00:23:07,524
But it meant that we were able to get much higher accuracy than allowing the doctors to
ask their big long compounded question, you know, just natively, you know, and obviously
253
00:23:07,524 --> 00:23:13,627
some doctors were really good at it and wrote very logical prompts and others just wrote
almost gibberish.
254
00:23:13,627 --> 00:23:18,881
because they were talking to it like a human being who had the context of the rest of a
medical education.
255
00:23:18,881 --> 00:23:27,147
And so I think that that was an answer that we came up with to try and solve that because
to your point, the models very easily get confused.
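The interface the developers built is described only at a high level, so what follows is a hypothetical sketch of the idea. A "macro" expands into a fixed list of pre-written questions, and each question is sent as its own request instead of as one long compound prompt. `QUESTION_MACROS`, `run_macro`, the discipline key, the sample questions and `stub_model` are all placeholders. The sketch assumes one request per question, which is consistent with the extra processing cost mentioned above, and uses a stub in place of any real model or medical-record API.

```python
from typing import Callable, Dict, List

# Hypothetical pre-agreed question set for one clinical discipline.
# In practice these would be written together with the doctors.
QUESTION_MACROS: Dict[str, List[str]] = {
    "example_discipline": [
        "List the patient's current medications with doses.",
        "List the allergies recorded for this patient.",
        "Summarize the most recent discharge note in three sentences.",
    ],
}


def run_macro(discipline: str, patient_id: str,
              ask_model: Callable[[str], str]) -> Dict[str, str]:
    """Send each predefined question as a separate single-question request."""
    answers: Dict[str, str] = {}
    for question in QUESTION_MACROS[discipline]:
        # One question per prompt, so the model never has to untangle several
        # requests at once. The cost is one model call per question.
        prompt = f"Patient record ID: {patient_id}\nQuestion: {question}"
        answers[question] = ask_model(prompt)
    return answers


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call: echoes the question line."""
    return "answer to: " + prompt.splitlines()[-1]


if __name__ == "__main__":
    for answer in run_macro("example_discipline", "12345", stub_model).values():
        print(answer)
```

Passing the model in as a callable keeps the decomposition logic independent of any particular provider. That fits the story above, where the team could switch different models in and out.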
256
00:23:27,147 --> 00:23:36,575
And I find the same commercially when I'm writing; I've learned a lot in the last nine
months of playing with these various tools for my writing.
257
00:23:36,575 --> 00:23:49,294
So I think there's still a long way to go before these models stop
falling into hallucinations, confusion, loss of context and so on and so forth.
258
00:23:49,294 --> 00:23:50,998
So yeah, I agree with you.
259
00:23:50,998 --> 00:23:56,984
Why do you think some of these GPTs do some things better than others?
260
00:23:56,984 --> 00:24:01,736
Probably what they were optimized for.
261
00:24:01,736 --> 00:24:07,329
I mean, they all eventually get post-trained on certain feedback.
262
00:24:07,329 --> 00:24:15,096
There are also different approaches now to how they train; for example, MiniMax
introduced something that's...
263
00:24:15,096 --> 00:24:24,340
called interleaved thinking, where the model reasons, the model outputs, then the
model reasons again about that output, and so on.
264
00:24:24,340 --> 00:24:30,243
Claude does it in the background, and this works very well, for example, for
coding.
265
00:24:30,243 --> 00:24:36,087
Also, they probably have different types of training data in there.
266
00:24:36,087 --> 00:24:39,880
I feel like OpenAI is getting heavily into healthcare.
267
00:24:39,880 --> 00:24:45,013
Claude, obviously, doesn't even care that much about multimodality.
268
00:24:45,013 --> 00:24:48,116
At least last time I checked, it still wasn't generating any images.
269
00:24:48,116 --> 00:24:50,858
It codes very well.
270
00:24:50,858 --> 00:24:52,941
They have their segment of coding.
271
00:24:52,941 --> 00:25:01,314
MiniMax also has its segment of coding agents, agents that run forever and
do this.
272
00:25:01,314 --> 00:25:05,610
It's really what the provider optimizes it for.
273
00:25:05,610 --> 00:25:14,146
Plus there are other aspects: for example, Google has the
best index, the best search in the world.
274
00:25:14,146 --> 00:25:19,329
So they heavily ground their model in search.
275
00:25:19,329 --> 00:25:28,613
And I even have a suspicion that search always runs on top of the generation, because
I did some experiments.
276
00:25:28,613 --> 00:25:29,294
Like,
277
00:25:29,294 --> 00:25:33,337
the knowledge cutoff of Gemini was February 2025.
278
00:25:33,337 --> 00:25:39,761
And it would still always know who became the Chancellor of Germany after the knowledge
cutoff.
279
00:25:39,761 --> 00:25:43,004
It would always know what happened then in May and so on.
280
00:25:43,004 --> 00:25:49,999
And then it would start denying that it used search when I prohibited it; it would reason
about how to lie to me in the reasoning traces.
281
00:25:49,999 --> 00:25:57,176
Essentially, it would be like, I'll just pretend it was a lucky guess
that Merz became chancellor, and so on.
282
00:25:57,176 --> 00:26:01,901
But in fact, I have a feeling they always kind of ground the model in the search.
283
00:26:01,901 --> 00:26:05,585
That's maybe why they are quite good at this.
284
00:26:05,585 --> 00:26:13,473
So it's really literally what data are available to the provider and what niche they decide to
focus on.
285
00:26:13,473 --> 00:26:16,177
So yeah, I would say that's the reason why.
286
00:26:16,177 --> 00:26:17,509
Essentially, yeah.
287
00:26:17,509 --> 00:26:20,001
No, I think that's right.
288
00:26:20,001 --> 00:26:22,062
You bring up a very good point.
289
00:26:22,062 --> 00:26:27,729
The analogy I use, as a parent and a grandparent: children lie.
290
00:26:27,729 --> 00:26:30,922
They lie to you all the time, as they're growing up.
291
00:26:30,922 --> 00:26:34,906
And obviously, they're testing the boundaries of trust and...
292
00:26:34,906 --> 00:26:36,657
and all those other kinds of things.
293
00:26:36,657 --> 00:26:41,700
And sometimes they don't want to get in trouble or there's a consequence they're afraid of
and so on and so forth.
294
00:26:41,700 --> 00:26:48,176
And my advice to anybody looking at any of these models is use your filter, right?
295
00:26:48,176 --> 00:26:55,250
Use your BS filter whenever you're getting a response back, and look at it from that
perspective.
296
00:26:55,250 --> 00:26:59,382
Initially, you know, I treated
297
00:26:59,382 --> 00:27:07,007
the GPTs very much as a really smart MIT postgrad student.
298
00:27:07,007 --> 00:27:11,750
I.e., they've got all this knowledge, but they've got no context or experience of the real
world.
299
00:27:11,750 --> 00:27:16,443
And so therefore, it's very difficult for them to take that knowledge and be able to
express it.
300
00:27:16,443 --> 00:27:22,610
I'd treat them as basically a really smart intern who is dumb, if you understand what I mean by
that.
301
00:27:22,610 --> 00:27:29,154
But I've graduated now and I think it's more like, yes, that, but they're also your child.
302
00:27:29,154 --> 00:27:35,979
And so therefore you need to turn on your kind of parental filter and think about, how do
I feel about that response?
303
00:27:35,979 --> 00:27:37,789
Do I feel that response is good or not?
304
00:27:37,789 --> 00:27:46,561
Because sometimes these GPTs, like you say, they'll make stuff up, right, because
they're filling in gaps or they're jumping to wrong conclusions.
305
00:27:46,561 --> 00:27:48,923
Or they just flat out lie to you.
306
00:27:48,923 --> 00:27:57,670
They take that made-up stuff and they speak to you so convincingly in their response that
they will actually convince you sometimes.
307
00:27:57,670 --> 00:28:00,434
And then you find out, no, that's not true.
308
00:28:00,434 --> 00:28:11,663
And for all these reasons, you see, and it's one of the reasons I wanted to do this episode
for my audience, who I hope are still listening: understanding some of this technical
309
00:28:11,663 --> 00:28:12,224
depth
310
00:28:12,224 --> 00:28:22,588
and why the AI tools we're using behave the way they do, whether it be the commercially available ones or ones
that are being baked into other solutions that maybe you're selling or that your
311
00:28:22,588 --> 00:28:31,655
organization is buying, so that you understand the capabilities, but also the
limitations and risks that are there.
312
00:28:31,655 --> 00:28:33,396
And so I think that's a key thing.
313
00:28:33,396 --> 00:28:37,058
You talked a little bit there about a concept that I've written about
314
00:28:37,058 --> 00:28:41,491
in terms of multi-model and multimodal capabilities.
315
00:28:41,491 --> 00:28:53,034
Basically building large connected, almost like networks, whether it's a network of two or
a network of 52, different types of models and transformers that are assuming different
316
00:28:53,034 --> 00:28:53,565
roles.
317
00:28:53,565 --> 00:29:00,379
And so I think that approach is something I'd like to explore with you for a few minutes
if you wouldn't mind.
318
00:29:00,379 --> 00:29:02,740
What's your experience of the way that's going?
319
00:29:02,740 --> 00:29:16,021
I know Microsoft in health announced, almost nine months ago now, almost 12 months ago now,
MAI-DxO, which is their big step forward, and they made a lot of noise about it being
320
00:29:16,021 --> 00:29:17,322
multimodal, i.e.
321
00:29:17,322 --> 00:29:26,079
capable of handling more than just your character-based input and the character-based
resources that it's accessing.
322
00:29:26,079 --> 00:29:44,050
So images, scanned documents, traces from EKGs, et cetera, et cetera: being able to
process all of that, but also to create the kind of orchestrator-type architecture whereby
323
00:29:44,050 --> 00:29:54,521
you would hand off parts of the job, whatever the task or the prompt was that it was
being given, to specialist models
324
00:29:54,521 --> 00:29:58,783
that are actually tuned for whatever that element of the task was.
325
00:29:58,783 --> 00:30:10,127
What's been your experience with that, either generally as an architectural concept or,
more specifically, examples of it starting to be adopted, either in healthcare or
326
00:30:10,127 --> 00:30:17,642
in other industries, because I know you obviously work for an organization that does a lot
of engineering, does a lot of product design, so on and so forth.
327
00:30:29,458 --> 00:30:29,942
Mm-hmm.
328
00:30:29,942 --> 00:30:36,273
I'm not... actually, the Microsoft model that you mentioned, I haven't heard about it,
embarrassingly.
329
00:30:36,273 --> 00:30:42,748
I need to look it up, but like, okay.
330
00:30:42,748 --> 00:30:44,599
I will read up on it afterward.
331
00:30:44,599 --> 00:30:48,613
Yeah.
332
00:30:48,613 --> 00:30:52,355
I mean, well, there are like two things happening now, right?
333
00:30:52,355 --> 00:30:56,659
You have, obviously, this architectural decision: mixture of experts.
334
00:30:56,659 --> 00:31:01,762
But here experts is not like the experts we think of as humans, right?
335
00:31:01,762 --> 00:31:06,405
Like it's not like one expert is doing physics, the other expert is doing literature.
336
00:31:06,405 --> 00:31:12,760
It's just basically a segmentation of the neural network and how its parts attend
337
00:31:12,760 --> 00:31:19,805
to different tokens, and there's not necessarily any kind of logical domain distribution
between these experts.
338
00:31:19,805 --> 00:31:31,103
That's more an architectural decision. I think it
was DeepSeek that introduced sparse mixture of experts.
339
00:31:31,103 --> 00:31:35,968
So they reduced the computation through this, though, I mean,
340
00:31:35,968 --> 00:31:38,269
either DeepSeek or another Chinese company.
341
00:31:38,269 --> 00:31:39,761
I forgot which one published this.
342
00:31:39,761 --> 00:31:51,992
Anyway, that's mostly about reducing the compute, and it does bring qualitative
improvement, because you have different networks working on the tokens and putting
343
00:31:51,992 --> 00:31:53,305
them all together afterwards.
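The sparse mixture-of-experts idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not DeepSeek's or any production model's implementation: each "expert" is a hypothetical random linear map, and the router is a fixed dot-product scorer that sends each token to only its top `k` experts, then mixes their outputs by softmax gate weights. Skipping the unselected experts is where the compute saving comes from, and nothing in the routing ties an expert to a human-style domain.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(dim, seed):
    """Toy 'expert': a fixed random linear map (hypothetical stand-in for a feed-forward block)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    def expert(x):
        return [sum(w[i][j] * x[j] for j in range(dim)) for i in range(dim)]
    return expert

def moe_layer(token, experts, router, k=2):
    """Route one token to its top-k experts and mix their outputs by gate weight."""
    # Router score per expert: dot product of the token with that expert's router vector.
    scores = [sum(r * t for r, t in zip(vec, token)) for vec in router]
    # Keep only the k highest-scoring experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(token)
    for gate, i in zip(gates, top):
        y = experts[i](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top

dim, n_experts = 4, 8
experts = [make_expert(dim, seed=i) for i in range(n_experts)]
rng = random.Random(42)
router = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
out, used = moe_layer([0.5, -1.0, 0.25, 2.0], experts, router, k=2)
print(f"experts run for this token: {used} ({len(used)} of {n_experts})")
```

In real models the router and experts are learned jointly and tokens are routed in batches, often with extra terms to balance load across experts; this sketch only shows the per-token selection and mixing step.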
344
00:31:53,305 --> 00:31:57,728
As for what you described, one model does this, the other model does that.
345
00:31:57,728 --> 00:32:05,491
This sounds to me like our agentic story, with different orchestrators.
346
00:32:05,592 --> 00:32:15,628
And we have a problem here, because the models currently are not cognitively
capable of orchestrating very well.
347
00:32:15,628 --> 00:32:26,578
So if you work with GitHub Copilot, you'll notice that at some point, when you have
around, I don't know, something like 100 tools, it starts complaining to you that it
348
00:32:26,578 --> 00:32:27,670
has too many tools.
349
00:32:27,670 --> 00:32:29,184
And a tool is everything.
350
00:32:29,184 --> 00:32:33,034
A tool is like a read file, write file, update file.
351
00:32:33,034 --> 00:32:34,276
Like everything is a tool.
352
00:32:34,276 --> 00:32:41,314
So when you have this orchestrator and many tools and many models,
353
00:33:41,314 --> 00:33:46,026
you need an orchestrator that is able to send work out to the models and back.
354
00:32:46,026 --> 00:32:47,717
And usually those are flows.
355
00:32:47,717 --> 00:32:50,058
So like it goes from one tool to the other one.
356
00:32:50,058 --> 00:32:52,549
You need to build these tool flows.
357
00:32:52,549 --> 00:32:54,270
You need to have a fallback.
358
00:32:54,270 --> 00:32:57,272
You burn a lot of tokens and you wait forever.
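The "tool flow with a fallback" pattern mentioned here can be sketched as a fixed pipeline. Assumptions: tools are plain Python functions from string to string, the flow order is hand-built rather than chosen by an LLM, and the tool names (`read_file`, `summarize`, `truncate`) are hypothetical stand-ins, not GitHub Copilot's actual tools.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]

def run_flow(flow: List[str], tools: Dict[str, Tool],
             fallbacks: Dict[str, str], data: str, log: List[str]) -> str:
    """Run tools in a fixed order, switching to a fallback tool when a step fails."""
    for step in flow:
        try:
            data = tools[step](data)
            log.append(f"{step}: ok")
        except Exception as exc:
            backup = fallbacks.get(step)
            if backup is None:
                log.append(f"{step}: failed, no fallback")
                raise
            log.append(f"{step}: failed ({exc}), using {backup}")
            # The fallback can fail too; in this sketch its exception simply propagates.
            data = tools[backup](data)
    return data

# Hypothetical tools for illustration only.
def read_file(path: str) -> str:
    return f"contents of {path}"  # stand-in: no real file I/O

def summarize(text: str) -> str:
    raise TimeoutError("model timed out")  # simulate a flaky model call

def truncate(text: str) -> str:
    return text[:11]  # crude but deterministic fallback

tools = {"read_file": read_file, "summarize": summarize, "truncate": truncate}
log: List[str] = []
result = run_flow(["read_file", "summarize"], tools, {"summarize": "truncate"}, "notes.txt", log)
print(result)
print(log)
```

Every extra step and fallback is another call, which is where the token cost and waiting time in longer flows come from.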
359
00:32:57,272 --> 00:33:05,077
Essentially, the accuracy of these multi-agent systems is fairly low because it's a
multi-step process.
360
00:33:05,077 --> 00:33:07,309
You are more likely to propagate errors there.
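A back-of-the-envelope sketch of why multi-step systems propagate errors. It assumes each step is independently correct with the same probability `p` and that any single error spoils the final answer. That is a simplification, because real errors are correlated and not always fatal. Under those assumptions, the chance that an `n`-step chain is fully correct is `p ** n`.

```python
def chain_accuracy(p: float, n: int) -> float:
    """Probability that all n independent steps succeed, each with probability p."""
    return p ** n

for n in (1, 3, 5, 10):
    print(n, round(chain_accuracy(0.95, n), 3))
```

Even at 95% per step, a ten-step chain comes out fully right only about 60% of the time under this model.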
361
00:33:07,309 --> 00:33:15,900
And it seems like it will take a lot of time until the models are cognitively there, with
the common sense to know where things should go.
362
00:33:15,900 --> 00:33:22,689
Because LLMs have a tendency to accept everything that goes into them as
truth.
363
00:33:22,689 --> 00:33:29,269
So they are not skeptical of input that comes in from another LLM.
364
00:33:29,269 --> 00:33:33,459
So if one LLM hallucinated and that propagated to the next LLM,
365
00:33:33,459 --> 00:33:39,956
unless that LLM's job is explicitly to be skeptical and correct it, the LLM will accept
this.
366
00:33:39,956 --> 00:33:47,053
Then again, if the other LLM's job is to be skeptical and correct things, it might be the truth
that comes in, and then it will "correct" the truth, and so on.
367
00:33:47,053 --> 00:33:57,744
So this self-correction mechanism is also not necessarily a good one, and one shouldn't
overuse it, because when you keep on telling your LLM "you are wrong, you're incorrect,"
368
00:33:57,744 --> 00:34:00,595
eventually it will never stop correcting.
369
00:34:00,595 --> 00:34:04,417
It very rarely says, "I've corrected everything now."
370
00:34:04,417 --> 00:34:17,650
Currently, I'm skeptical of such complex systems with a lot of specialized models. And
the other question with specialized models, with small models: small models are not that
371
00:34:17,650 --> 00:34:19,410
good at generalization.
372
00:34:19,410 --> 00:34:21,908
They usually kind of overfit to the task.
373
00:34:21,908 --> 00:34:29,583
And the generalization in transformers comes exactly from the ability to know many
tasks, many inputs.
374
00:34:29,583 --> 00:34:35,397
And when a new, unseen task comes in, they are capable of generalizing to
it.
375
00:34:35,397 --> 00:34:45,688
For example, if it knows how to do sentiment analysis for Amazon reviews, then it will not
struggle with doing sentiment analysis for some travel website or something like this,
376
00:34:45,688 --> 00:34:47,499
or even further away.
377
00:34:47,499 --> 00:34:59,026
If it knows how to code in Java, it might actually be okay generalizing to some kind
of language that is represented in the data, but way less represented.
378
00:34:59,026 --> 00:35:11,437
So it might be that small models in general are not the optimal decision here, because we
also know from machine translation back then, from 10 years ago,
379
00:35:11,437 --> 00:35:16,681
that multilingual machine translation was actually beneficial for low-resource languages.
380
00:35:16,681 --> 00:35:28,180
So for example, when you have English and you train it together with Afrikaans, which is
similar to English and Dutch. It's basically a language, but it's very similar to
381
00:35:28,180 --> 00:35:28,601
Dutch.
382
00:35:28,601 --> 00:35:37,317
Some might say it's a dialect of Dutch, but let's say it's a language, because between
language and dialect, the difference is a question of the size of the army and the
383
00:35:37,317 --> 00:35:38,608
budget of the country.
384
00:35:38,608 --> 00:35:48,678
Essentially, yeah, something like Afrikaans would benefit from English
because it has a similar grammatical structure, similar syntax, similar
385
00:35:48,678 --> 00:35:49,219
patterns.
386
00:35:49,219 --> 00:36:01,028
And that's what I feel many people don't think about when they think about small
language models: large language models and transformers actually benefit from
387
00:36:01,028 --> 00:36:02,257
this generalization.
388
00:36:02,257 --> 00:36:07,432
That's a very fascinating thing you do, you know, making that comparison with small models.
389
00:36:07,432 --> 00:36:14,509
I mean, LLMs are like, sorry, the big GPTs are like English or French or Spanish.
390
00:36:14,509 --> 00:36:16,532
They're very, very universal.
391
00:36:16,532 --> 00:36:24,238
I mean, you know, French used to be the language of diplomacy and English was the language of
business, and so on and so forth.
392
00:36:24,238 --> 00:36:25,670
So they become popular.
393
00:36:25,670 --> 00:36:35,774
But I know, and you know as a linguist, that when you get to a small village in the middle
of England, in the West Midlands, I'll tell you a story.
394
00:36:35,774 --> 00:36:40,554
There was a nurse manager in the hospital that I ran and operated.
395
00:36:40,554 --> 00:36:43,245
We operated an eye hospital.
396
00:36:43,245 --> 00:36:47,045
And the nurse manager there used to run the big clinic.
397
00:36:47,045 --> 00:36:50,236
And we had 54,000 people a year coming to that clinic.
398
00:36:50,236 --> 00:36:58,014
She could hear someone's voice and tell which village they were from based on their
dialect, based on the way they used words.
399
00:36:58,014 --> 00:37:08,190
And I think that's the thing: there are villages that exist with small
populations where, if two people who belong to that community talk together, even though
400
00:37:08,190 --> 00:37:14,421
they may be talking in the root of one language, their dialect means that outsiders can't
understand them.
401
00:37:14,421 --> 00:37:18,084
People used to say that about me when I first moved from Scotland.
402
00:37:18,084 --> 00:37:23,889
My accent and my dialect were so strong that they would go, what did you say?
403
00:37:23,889 --> 00:37:27,081
And so I think that it's really fascinating that you say that.
404
00:37:27,081 --> 00:37:38,734
And I think that connection between large, medium and small models and language use in
large, medium and small populations is really fascinating.
405
00:37:38,786 --> 00:37:39,909
That's a cool insight.
406
00:37:39,909 --> 00:37:41,473
Thank you for that, Maria.
407
00:37:42,157 --> 00:37:44,023
I'll try and store that one away in the brain.
408
00:37:44,023 --> 00:37:44,703
Sure.
409
00:37:44,703 --> 00:37:47,583
They are called language models for a reason, right?
410
00:37:47,583 --> 00:37:57,098
They essentially operate on language, and the current GPT model is 90% trained on
English data.
411
00:37:57,098 --> 00:37:58,593
And the Chinese models,
412
00:37:58,593 --> 00:38:00,676
The Chinese models also, then?
413
00:38:00,676 --> 00:38:03,792
Because clearly, different characters, etc.
414
00:38:03,826 --> 00:38:10,346
I talked to people I know from the research teams of Qwen and Xiaomi.
415
00:38:10,346 --> 00:38:14,917
So I actually asked them, what's your training data composition in terms of language?
416
00:38:14,917 --> 00:38:18,788
And they said, we have 70% English, 30% Chinese.
417
00:38:18,788 --> 00:38:26,639
So they are still heavily English, but they do have a large portion; 30% Chinese
is a lot.
418
00:38:26,639 --> 00:38:28,686
And the rest is...
419
00:38:28,686 --> 00:38:30,018
is like smaller.
420
00:38:30,018 --> 00:38:35,756
So yeah, maybe 5% or something like that should be the other languages.
421
00:38:35,756 --> 00:38:43,324
So that's why, even though with DeepSeek everyone is like, oh, it doesn't
answer certain questions,
422
00:38:43,324 --> 00:38:47,627
it's still strongly biased towards Western representations.
423
00:38:47,627 --> 00:38:49,439
You cannot completely block it.
424
00:38:49,439 --> 00:38:50,610
You cannot train it out
425
00:38:50,610 --> 00:38:52,943
if most of your corpus is English.
426
00:38:52,943 --> 00:38:53,454
Yeah.
427
00:38:53,454 --> 00:39:04,301
So you're a small-model skeptic and an orchestrator skeptic in terms of maturity,
in terms of small-model capability, that
428
00:39:04,301 --> 00:39:18,230
I'm skeptical, but we already see, from China again, big progress, because this
sparse mixture of experts seems to save a lot in terms of compute, and what they keep
429
00:39:18,230 --> 00:39:23,005
on publishing is good. For example, MiniMax M2.1 is a very good coding model.
430
00:39:23,005 --> 00:39:24,847
I mean, it's comparable, maybe.
431
00:39:24,847 --> 00:39:26,929
It's obviously not as good as
432
00:39:26,929 --> 00:39:30,802
closed models like Sonnet 4.5 or anything like this, but it's actually usable.
433
00:39:30,802 --> 00:39:39,180
I'll get complaints about this phrase, but in my opinion, no open-source
model, no small model, is usable for coding.
434
00:39:39,180 --> 00:39:46,716
But MiniMax, I could run it on two NVIDIA DGX Sparks, and this is very cheap.
435
00:39:46,727 --> 00:39:51,781
Because it's only a box; I have it on my desk.
436
00:39:51,781 --> 00:39:59,468
So it costs only 4,400 euros.
437
00:39:59,468 --> 00:40:02,471
So it's not much for GPUs.
438
00:40:02,471 --> 00:40:11,419
So if you buy two of them, let's say your company invests 10,000, you can run a
usable coding model locally.
439
00:40:11,430 --> 00:40:13,861
So your data, your code will stay there.
440
00:40:13,861 --> 00:40:19,152
You do not need to share it with OpenAI, with Claude or anything.
441
00:40:19,152 --> 00:40:24,015
And you might actually save a lot on buying tokens for your coders.
442
00:40:24,015 --> 00:40:25,255
So this is a big...
443
00:40:25,255 --> 00:40:26,506
So here I'm not scared.
444
00:40:26,506 --> 00:40:32,998
Here I'm sure that from that direction we will get new architectures that will run
on much smaller compute.
445
00:40:32,998 --> 00:40:38,250
But currently, the idea that you can take, I don't know, Qwen 7B, fine-tune it for...
446
00:40:38,250 --> 00:40:42,985
I don't know, some kind of medical diagnosis, and make an agent that does medical diagnosis.
447
00:40:42,985 --> 00:40:45,346
That I don't see happening now.
448
00:40:45,346 --> 00:40:46,077
Yeah.
449
00:40:46,077 --> 00:40:59,318
Well, the other application of complex architecture: as I described earlier on, we
have a lot of data about a lot of patients in the organization I work for, as many, many
450
00:40:59,318 --> 00:41:00,609
organizations do.
451
00:41:00,609 --> 00:41:08,520
And there is a lot of noise out there about the application of AI to build
452
00:41:08,520 --> 00:41:09,971
medical digital twins.
453
00:41:09,971 --> 00:41:13,314
So the concept of digital twin has existed for some time, i.e.
454
00:41:13,314 --> 00:41:26,979
how can you model the completeness of a physical entity digitally and then basically play
with it to be able to interpret either information that's coming from it, if it's coming
455
00:41:26,979 --> 00:41:35,262
and connected in real time, or information that you've acquired and you're trying to
synthesize and apply it against another body of knowledge.
456
00:41:35,262 --> 00:41:38,703
So the concept of the medical digital twin is amazing.
457
00:41:38,703 --> 00:41:50,440
It's like, here is a digital representation of Stuart Miller: his health history, his
genetics, all of the information we know about him, and we compile that all together.
458
00:41:50,440 --> 00:42:00,075
And then we let the AI try and help by using the body of knowledge about not only his
processes, but more widely.
459
00:42:00,075 --> 00:42:13,317
the entirety of everything we know about medicine today and it being constantly updated,
that somehow or other we will build this super diagnostic and care planning machine.
460
00:42:13,317 --> 00:42:23,259
And I think that's, you know, conceptually the ambition that Microsoft has
with MAI-DxO: they want to create a master diagnostician.
461
00:42:23,259 --> 00:42:28,486
There was a TV show in the early 2000s here in the States called House.
462
00:42:28,486 --> 00:42:40,254
about a brilliant, brilliant, but character-wise flawed doctor, who was, and that was his
title, a diagnostician, and he worked at one of the major medical centers just
463
00:42:40,254 --> 00:42:41,286
outside New York.
464
00:42:41,286 --> 00:42:47,769
The conceit there was that you brought your most difficult patients to him and he would be able
to work out what was going wrong.
465
00:42:47,769 --> 00:42:53,681
And obviously it's a TV show and there are clinicians who exist who
466
00:42:53,812 --> 00:42:59,758
lean towards specializing that way, but there's not a lot of money in it because obviously
such patients are very rare.
467
00:42:59,758 --> 00:43:01,749
So therefore, how do you get paid?
468
00:43:01,749 --> 00:43:03,290
How does the hospital get paid?
469
00:43:03,290 --> 00:43:15,159
The insurance companies don't want to... Anyway, let's ignore why they don't really exist in
volume; they do exist intellectually in the body of doctors and physicians out there.
470
00:43:15,159 --> 00:43:18,002
The concept of that digital twin, from
471
00:43:18,002 --> 00:43:21,474
you know, what I've been able to observe in the literature came from engineering.
472
00:43:21,474 --> 00:43:33,804
The idea of taking and modeling data coming from physical objects, you know, typically
engines or large engineering structures of one kind or another and being able to take that
473
00:43:33,804 --> 00:43:34,325
through.
474
00:43:34,325 --> 00:43:39,658
You've obviously worked for BMW, and you worked for another large German company.
475
00:43:39,658 --> 00:43:41,101
We mentioned Siemens earlier on.
476
00:43:41,101 --> 00:43:53,733
What's your experience over time in terms of observing digital twins in that domain and
what do you think that means in terms of applying those concepts to this very complex
477
00:43:53,733 --> 00:43:56,066
biological being that is a human?