Welcome back to another riveting episode of Data Driven.
Speaker:Joining us today, lakeside and positively glowing from his
Speaker:Appalachian retreat, is Frank. Meanwhile, the
Speaker:always astute and ever energetic Andy is here to keep us
Speaker:grounded. But enough about us. Today, we have
Speaker:a true luminary in the field of AI, someone who's blending the worlds
Speaker:of academia and enterprise with seamless finesse. He's an
Speaker:associate professor at the Technion, has published over 100
Speaker:research papers on automated speech recognition, and is the chief
Speaker:scientist at aiOla. Please welcome doctor Yossi
Speaker:Keshet or as he's known to his friends, Yossi.
Speaker:Alright. Hello, and welcome to Data Driven, the podcast where we explore the
Speaker:emergent fields of artificial intelligence, data science, and,
Speaker:and, of course, data engineering, without which the whole world would probably stop turning.
Speaker:And you know, data engineering is important. That's
Speaker:basically it. Still working on that that that revamped
Speaker:monologue, for, for season 8, Andy. Were
Speaker:you on vacation? You're on vacation. I am on vacation. And
Speaker:for those of you who can't see on camera who are not who are
Speaker:listening, not watching, I am literally lakeside,
Speaker:in the foothills. Well, not the foothills. We are actually in the Appalachian Mountains. Or
Speaker:is it Appalachian? I've heard both. I never
Speaker:got a clear read on it. I say either. So, you know. When I say either.
Speaker:Yeah. Yeah. Yeah. Yeah. Yeah. So I am in Deep Creek Lake,
Speaker:Maryland, which is kind of like, Maryland doesn't really have a Panhandle
Speaker:per se, but if it did, this is what it would be.
Speaker:I think I'm probably 5 miles from West Virginia and about
Speaker:20 miles from Pennsylvania. So it's kind of like this quiet
Speaker:little corner of the state.
Speaker:And I've been, you know, reading and studying
Speaker:today. I hit day 600 consecutive on Pluralsight. Nice.
Speaker:We're recording this June 17th. And how are
Speaker:things with you, Andy? Things are good. I'm gonna throw out a plug for
Speaker:data driven media dot tv because Frank mentioned.
Speaker:If you're listening, he while he was mentioning that, he was
Speaker:actually panning the camera over to the lake. But if
Speaker:you're, subscribing to data driven media dot tv, you get
Speaker:to see us. You get to see the video, and you
Speaker:can see, for instance, that I am wearing the, my data is the
Speaker:new oil t shirt, which you can pick up. I'm just full of
Speaker:sponsor stuff today. I'm just doing... Well, it's all
Speaker:self sponsored. And, honestly, we really need to get better at that. Right? We have
Speaker:data channel dot tv. For listeners to the show, I will give
Speaker:a preview: Data Driven Academy is launching soon. You have
Speaker:a course coming up at the end of the month. Actually, yeah, it's Fabric.
Speaker:We're recording this on the 17th; it's the 24th
Speaker:of June. But I'm also doing 2 more at
Speaker:near the ends of July and August. And in addition
Speaker:to that, while we're shameless plugging away here,
Speaker:before we get to our very interesting guest, now I'm also bringing
Speaker:back my Day of Azure Data Factory, which was wildly
Speaker:popular. I delivered it at a couple of conferences,
Speaker:international conferences, in '22 and '23. And,
Speaker:yeah, let's see if people are interested. What do you do
Speaker:Friday afternoons, Andy? Oh, there's this thing, Frank. Thanks for
Speaker:mentioning that. Totally free. We're trying to get better at this. That's
Speaker:all. We do. Yeah. Data engineering Fridays. And if you go to data engineering
Speaker:fridays.com, you can learn more about that. Frank, you're doing a lot
Speaker:of stuff with I noticed with using the, encore
Speaker:replay feature in Restream. And it's
Speaker:right you you shared that with me. I started doing that with data engineering
Speaker:Fridays as well. But great a great way to,
Speaker:you know, to get your message out there. And, you
Speaker:know, I had no idea replays would help. But my gosh,
Speaker:they really have. It's just a matter of hitting the... I
Speaker:can't even talk... the algorithm the right way. Yeah. And Yeah. You know,
Speaker:maybe we can get the so I think it's a good segue, for our
Speaker:guest. Doctor Yossi, Keshet. He's the chief
Speaker:scientist at aiOla, an AI powered tech
Speaker:company that automates business workflows
Speaker:by capturing spoken data. Yossi is also
Speaker:an associate professor at the Faculty of Electrical and Computer
Speaker:Engineering at the Technion in Israel.
Speaker:Yossi is an award winning scholar and has published over 100 research
Speaker:papers about automated speech recognition and speech
Speaker:synthesis. Welcome to the show, Yossi. Hi.
Speaker:Nice to be here. Thank you for having me. Hey. No problem. No
Speaker:problem. We are very excited to have you. And, you're not just an
Speaker:academic, but you've also proven yourself in actual enterprise,
Speaker:which sounds really bad as I say it out loud, but I think you know
Speaker:it was meant as a compliment.
Speaker:But, so what is aiOla?
Speaker:Can you tell me a little bit about that? Because I'm curious about that and
Speaker:and and workflows
Speaker:around spoken data. So
Speaker:aiOla is a company that is aimed to target
Speaker:the, you know, the very basic and foundational
Speaker:industries. Maybe, if I
Speaker:may, let's start with the general scene of
Speaker:automatic speech recognition now, and then you will understand where aiOla stands, because we
Speaker:have OpenAI now and everything is like, you
Speaker:could say, we solved the AI problem. But it's not like that.
Speaker:So we are in amazing shape in
Speaker:terms of automatic speech recognition. We have a paper that shows
Speaker:that Whisper, the model from OpenAI, is as good as humans in
Speaker:detecting and transcribing language when we speak about
Speaker:American English, with noise, without noise, and
Speaker:also L2 speakers, that is,
Speaker:non-native speakers of the
Speaker:language. And the results show that Whisper, the
Speaker:OpenAI model, is the same as human listeners. And that is
Speaker:the main thing. But the thing is that
Speaker:when you come to industries, usually they have jargon, they have special words.
Speaker:And those words are either rare in
Speaker:the language or they are non-words.
Speaker:It's like, I don't know, when I'm a medical doctor and would like
Speaker:to perform a surgery and I would like to transcribe what I'm saying during
Speaker:the surgery, there are words which are not
Speaker:often used or which are non-English words. And
Speaker:in that case, those automatic speech recognizers don't
Speaker:work at all. They don't detect those words. And at aiOla, this
Speaker:is our target: to take those words, which are actually the most important words. Those
Speaker:are the jargon of the industry, of the facility.
Speaker:So the goal is to help those industries come
Speaker:up with automatic speech recognition for
Speaker:reporting, for transcribing speech.
Speaker:I have a question. When you say automatic, what what makes it automatic? Is
Speaker:it just kinda, what exactly does that mean?
Speaker:So automatic speech recognition today works very,
Speaker:very similar to the way ChatGPT works.
Speaker:ChatGPT works on a model called a transformer. It's a deep
Speaker:learning architecture, which has a
Speaker:history based on previous recurrent architectures.
Speaker:And it can predict, as we all know, it can
Speaker:predict text amazingly. In automatic
Speaker:speech recognition, it's almost the same thing, but there is another
Speaker:component to this transformer, which is called the encoder.
Speaker:This part takes the speech and actually transforms it into
Speaker:a great representation that can be used
Speaker:with the other side, with
Speaker:this, let's call it, GPT. Together, they can
Speaker:transcribe speech, as I described, in a very good
Speaker:way, as good as humans in some cases.
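For readers who want to try the encoder-decoder setup described here, below is a minimal sketch using the Hugging Face transformers pipeline with an open Whisper checkpoint. The model size and the audio file name are placeholder assumptions, and this is a generic transcription example, not aiOla's system.

```python
# Minimal sketch: transcribing audio with an encoder-decoder (Whisper-style) ASR model.
# Assumes the "transformers" library is installed and that "speech.wav" exists locally;
# the checkpoint name is an illustrative choice, not an endorsement of a specific model.
from transformers import pipeline

# The encoder turns raw audio into a learned representation; the GPT-like decoder
# then predicts the transcript token by token, as described in the conversation.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = asr("speech.wav")
print(result["text"])
```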
Speaker:I've been messing around with the app that's on the phone
Speaker:for ChatGPT, and
Speaker:I use the voice interaction feature. It is
Speaker:amazingly good at getting rid of the umms, the ahs,
Speaker:the scatterbrained thoughts that I sometimes have when I talk to it.
Speaker:Like, it can really distill a lot of
Speaker:things. I'm impressed with it. It's really gotten... the last time I
Speaker:did anything serious with speech recognition was probably maybe 4 years
Speaker:ago, and it's really improved. I mean, orders of magnitude more
Speaker:than I thought. It's almost at Star Trek level. You
Speaker:know? I'm not sure
Speaker:in those cases. It depends on the company, if it's Apple or
Speaker:Google. And I'm not sure which, they don't declare
Speaker:which models they use. I think, personally, they don't use Whisper or
Speaker:the latest models that we have for automatic speech recognition,
Speaker:that is, transcribing speech. And the goal is a little bit different
Speaker:in the in the phone. You actually want to maybe Right. Make,
Speaker:make notes, send an email, send a text message,
Speaker:and maybe the vocabulary the vocabulary is less
Speaker:less defined. There is another problem with
Speaker:the phones. Oh, no. Go ahead. I want to call my
Speaker:friend. His name is Xi, and
Speaker:the last name is Chung. How do you pronounce it?
Speaker:What do you do with that? Am I gonna say 'hee' or 'chee' or
Speaker:so there is a there is a problem of proper name and how do you
Speaker:define them. And this is a completely different problem. It's still an open problem, and
Speaker:the goal is a little bit different. So
Speaker:when we're assessing the quality of those models, it's
Speaker:a little bit different than the assessment of just spoken language
Speaker:like what we do now. No. I mean, that's a great point. I mean, my
Speaker:last name has, you know, technically is Lavin.
Speaker:But, you know, growing up for for reasons many,
Speaker:big and small, it became Lavinia. And like, so, like,
Speaker:the phone, depending on if it's Android or Apple,
Speaker:it gets confused pretty easily.
Speaker:And that is an interesting point. Some names, Andy is lucky to have an
Speaker:easy name for the, the system.
Speaker:But not everybody does. So I understand that. Sure.
Speaker:I also wanna double click on American
Speaker:English. You you you said that a bunch of times. Like, is there is there
Speaker:an inherent bias in these model trainings because these are done by American
Speaker:companies? Yes. There is. Okay. The
Speaker:data is mostly American English. The research institutes
Speaker:are mostly American. So maybe I don't know
Speaker:if you'd call it inherent or implicit bias, but there is a
Speaker:bias, definitely.
Speaker:We are investigating, by the way, the intelligibility
Speaker:of speech in some cases. What is the intelligibility
Speaker:for an American listener versus the intelligibility for
Speaker:myself, who is not an American listener but knows English?
Speaker:What is the best, quote unquote, speaker? What is the best
Speaker:listener? How can we transfer those
Speaker:to a speech recognizer? How can we transfer those to assessing the
Speaker:quality of speech? What does it mean about the pathologies in
Speaker:speech? And this is ongoing research in
Speaker:this field. Interesting.
Speaker:I I often wonder, like, you know, what it's not just English.
Speaker:Right? Like, you know, if you listen to Spanish, like, there's different dialects of
Speaker:Spanish. Right? Even even German. You know, I'm sure
Speaker:there's, you know, plenty of dialects of all these languages and,
Speaker:like, how do you do the training of a
Speaker:model where it can get to be as good at
Speaker:understanding dialect X versus dialect Y versus, you know,
Speaker:the base language, the base standard? I don't know. That's
Speaker:fascinating. It seems like it could be an endless loop of, like,
Speaker:training. It it is. Indeed, it
Speaker:is. And when we train, there is another thing. So I'm
Speaker:working on deep learning and AI, and what we found out is
Speaker:that it may be the case that if you train
Speaker:on 1 language, a huge amount of data from 1 language, let's say
Speaker:American English, but then train on less data in Spanish,
Speaker:you actually get some advantage from the training on
Speaker:the American English. So, again, in this modern Whisper from
Speaker:OpenAI, most of the data is American English, but,
Speaker:actually, other languages are really great.
Speaker:Again, Spanish is amazing. So maybe, like
Speaker:humans, as we learn more and more languages, it's easier
Speaker:for us. This is a very interesting point.
Speaker:No. That's an interesting idea because I know, like, I never
Speaker:understood American English grammar, American or otherwise,
Speaker:until I studied a foreign language. And then when I studied it, it was German.
Speaker:And, you know, German kept a lot of the archaic things that
Speaker:are in English and
Speaker:continued to keep them important. Like in English, you know, who
Speaker:and whom used to confuse the you know what out of me.
Speaker:Right? But when I when I learned in German about different cases and things
Speaker:like that, I was like, oh, that's why it is. Right? So,
Speaker:like, all these things that just like you said, like, learning another
Speaker:having more data or data from another point of view, I suppose,
Speaker:or another way to look at the world help me look at my world
Speaker:a little better. Maybe maybe that's how
Speaker:AI will work too. I don't know.
Speaker:Maybe. We don't know. We actually have a guess about that,
Speaker:because those networks actually solve an optimization problem,
Speaker:a mathematical optimization problem. It's a problem
Speaker:that we define with an equation, and we need to have
Speaker:a computer run and solve it. The equation is
Speaker:over a training set of examples. So it's, 1
Speaker:person said this, another person said something else.
Speaker:And what happens is that when, again, when we have
Speaker:a large amount of data,
Speaker:it seems that those networks get to an amazing place.
Speaker:So this algorithm, this Whisper or other
Speaker:algorithms, it's really from recent years, like the last 2, 3 years.
Speaker:That's it. They perform amazingly,
Speaker:with the
Speaker:same mechanism, not with the same amount of data.
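To make the "equation over a training set of examples" concrete, training such a network typically means minimizing an average loss over labeled examples, roughly of the form below; this is a generic empirical-risk sketch, not the exact objective of Whisper or any particular model.

```latex
\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\bigl(f_{\theta}(x_i),\, y_i\bigr)
```

Here $x_i$ is an input (for example, audio), $y_i$ the target transcript, $f_{\theta}$ the network with parameters $\theta$, and $\mathcal{L}$ a loss such as cross-entropy.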
Speaker:Yeah. That's the
Speaker:fascinating aspect of all of this. It's just that some of these things just seem
Speaker:some problems seem harder than they ought to be,
Speaker:and then some solutions to problems seem way more effective than they
Speaker:ought to be. It's also interesting to say,
Speaker:it's always the case... so Whisper, OpenAI's Whisper, was trained
Speaker:on 600,000 hours of speech. But this is
Speaker:way, way much more than just a kid learning a language.
Speaker:A kid learning a language is exposed to way fewer hours of
Speaker:speech, less accurate, less
Speaker:coherent. And this is something
Speaker:Noam Chomsky raised years ago, like, 50 years ago.
Speaker:And it's still an open question, like, whether we can make those
Speaker:systems work better if we know the language.
Speaker:I guess you learned German faster than any
Speaker:machine that works today.
Speaker:That's yeah. It's it's and I'm glad you mentioned Noam
Speaker:Chomsky because that kinda was like so for those who don't know, Noam
Speaker:Chomsky is, among other things, a noted linguist scholar.
Speaker:I highly recommend you do a search on him because that's a that's a
Speaker:good Wikipedia rabbit hole to fall into. But,
Speaker:how much does linguistics come up in this? Right? Because I think
Speaker:what's fascinating about this field for me is a lot
Speaker:of... my grandfather, my great grandfather,
Speaker:was a linguistics professor. And, you know, as the
Speaker:family lore goes, I never met him. He died a decade or 2 before I was
Speaker:born. He spoke, like, 12 languages. He was a professor of, like, 5
Speaker:or 6. And, you know, a lot of people in my family
Speaker:seem to have on that side of the family seem to be gifted in language.
Speaker:And 1 of the fields I was tempted to to study in
Speaker:university was linguistics. And I just find
Speaker:it interesting how
Speaker:the Venn diagram now is much larger
Speaker:than it used to be in terms of linguistics and computer science.
Speaker:So what are your thoughts? Like, how much does it come up? Like,
Speaker:if you have a
Speaker:company like aiOla, right, how many people are, you know, honest to
Speaker:goodness linguists versus computer scientists and AI engineers?
Speaker:So there are no linguists there. Oh,
Speaker:really? Okay. There are no linguists. But I have to tell you, there was
Speaker:a professor called Fred, Frederick Jelinek. He was the
Speaker:head of language research at Johns Hopkins University
Speaker:in Baltimore. He was amazing. He was 1 of the smartest
Speaker:people on earth. He
Speaker:developed many of the speech recognition algorithms. And he said,
Speaker:every time I fire a linguist, the performance of the speech recognizer goes
Speaker:up.
Speaker:And this is, this is embarrassing. But I
Speaker:myself, 1st, really like
Speaker:linguistics. I really like cognitive sciences, and I really
Speaker:try to combine it with my work. But it's really
Speaker:amazing that all those AI systems
Speaker:don't have any of that. So you don't train ChatGPT
Speaker:on what is a noun, what is a verb, what is anything. You don't train
Speaker:speech models that this is the...
Speaker:you don't use linguists. You don't say this is
Speaker:the prominent word, this is the end of the sentence. It just happens
Speaker:from a huge amount of data. And
Speaker:this is interesting. This somehow contradicts Noam Chomsky, who said that
Speaker:there is a universal grammar. There is,
Speaker:we are born innate with language. There is
Speaker:maybe some black box in our brain which
Speaker:is tuned to learn a language. And
Speaker:we are not sure about that. There is no direct proof whether it's correct or
Speaker:not. We are born with language. As humans, we're
Speaker:born with language. This is part of our human being.
Speaker:We are not born with written language. Written language was invented.
Speaker:Spoken language is something like, a zebra
Speaker:has stripes. This is our nature, and this is
Speaker:interesting. This is not happening in
Speaker:AI. The best successes didn't have linguists; they don't have any
Speaker:restriction on what should be said or not.
Speaker:Maybe maybe AI will be a tool to somehow
Speaker:make the linguist research more effective and
Speaker:try to understand what happened in the brain, what happened in the cognition part.
Speaker:But I would like to tell you about other research we are preparing here, which
Speaker:is really amazing. 1 of the things is that we have,
Speaker:so there is this ChatGPT. It's a language model.
Speaker:We also have something in the brain. It's also a neural network.
Speaker:And when we try to compare them, there is a huge
Speaker:correlation between what happens in the artificial neural
Speaker:network of GPT and the
Speaker:biological neural network in the brain. And it was
Speaker:shown several years ago, and here we
Speaker:show it again with the most modern
Speaker:automatic speech recognizers. So this is
Speaker:a phenomenal correlation between the artificial and the
Speaker:neural mechanisms. I was gonna ask about that
Speaker:because I'm I'm familiar with, you know, at least the abstracts of
Speaker:the research, from a few years ago and now. And
Speaker:I was curious if there had been any new correlations
Speaker:or, you know, or new research, new connections that have been made
Speaker:between machines learning languages
Speaker:and the way our brains work. It sounds like
Speaker:that's true.
Speaker:So we just initiated
Speaker:research here in my lab about that. There were
Speaker:some French researchers, mainly King
Speaker:and his colleagues at Meta, and
Speaker:I forgot the university in France. They
Speaker:showed that there are those correlations. They showed simple correlations, and
Speaker:they showed it with an LLM, with a language model. What we show is a little bit
Speaker:different. We show correlation with automatic speech
Speaker:recognition. So we put people under fMRI, under MRI.
Speaker:We scan their brain at some
Speaker:resolution, and we try to find correlation with their brain activity
Speaker:during reading and during speaking aloud,
Speaker:and ask what is the correlation with the best model we know for
Speaker:speech recognition. And there are correlations.
Speaker:I have to say that there is a mechanism in the transformer, this
Speaker:architecture of neural network, there is a mechanism called attention. This
Speaker:mechanism allows those models to have the connections between
Speaker:words and other words. So, 'I'm eating an
Speaker:apple. It was delicious.' The 'it' refers to the apple.
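Below is a minimal numpy sketch of the scaled dot-product attention being described, where each word's query is scored against every other word's key; the toy token vectors are invented purely for illustration and are not taken from any real model.

```python
# Minimal sketch of scaled dot-product attention, using only numpy.
# Real models learn the query/key/value projections during training;
# here the "token embeddings" are just random toy values.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each query scores every key; softmax turns the scores into weights that say
    # how strongly one word (e.g. "it") attends to another (e.g. "apple").
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))  # 5 toy tokens, embedding size 8
output, attn = scaled_dot_product_attention(tokens, tokens, tokens)
print(attn.round(2))  # row i shows how much token i attends to every other token
```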
Speaker:Okay? So there is an attention mechanism. This is what makes those
Speaker:models amazing. So there is an attention mechanism, I guess, in the
Speaker:brain. So we try to correlate this attention mechanism in
Speaker:the models and compare it to the activity in the brain. We don't have
Speaker:results yet, but it seems promising. And we also ask
Speaker:another question. What if you don't read aloud? What if you do
Speaker:silent reading? What if you have dyslexia? What if you have
Speaker:another type of pathology? What
Speaker:are the correlations then? So this is fascinating. And
Speaker:there is correlation. I still don't know what's going to happen
Speaker:with that. I don't know about the pathologies yet, but it's unbelievable, the
Speaker:correlation. That is really exciting,
Speaker:especially when you're examining things like dyslexia,
Speaker:which is considered, you know, not normal,
Speaker:or maybe that's not the right term for it, but a
Speaker:challenge at a minimum. The cool the cool kids call that neurodivergent
Speaker:now. I think Neurodivergent. Thank you, Frank. So when you're studying, you
Speaker:know, when you're studying that sort of thing, I'm wondering if there's a place for
Speaker:that in the artificial.
Speaker:I'm curious. What what do you mean? Can you
Speaker:So, yeah, is there is is there any benefit
Speaker:to, I say, transferring the thought processes
Speaker:of people who are neurodivergent and and automating that
Speaker:and making that part of the, you know,
Speaker:the the language model or or speech recognition?
Speaker:Yeah. I think so. I think so. 1st, it's a tool
Speaker:to analyze what happens in the
Speaker:brain. Yeah. What happens...
Speaker:but it's very difficult. We don't have any debugger for
Speaker:the brain. We don't see the code of the brain. We don't see that this
Speaker:function doesn't work. And most of the work
Speaker:is to design the experiment, and
Speaker:it's really amazing. In our design, we have the
Speaker:same... so as I told you, I'm asking people to read aloud
Speaker:and compare it to what automatic speech recognition
Speaker:is supposed to do. But I'm
Speaker:also asking people to read silently, and then I follow
Speaker:their eyes. I have a machine that follows their eyes, and
Speaker:I know where, like, I
Speaker:track their eyes and I see which word they are reading
Speaker:now. And I can use that to follow
Speaker:what they read. But in order to operate that on a speech
Speaker:recognizer model, I need the speech. So during the design of
Speaker:the experiment, I need artificial speech or I need them to read aloud
Speaker:afterwards. It's a big question
Speaker:how to do that properly and how to
Speaker:make things happen, but definitely working with
Speaker:people with problems, first, to help them,
Speaker:and second, to understand them, and 3rd, to maybe
Speaker:understand the brain and make AI better.
Speaker:I also think, like, stroke victims, right, could benefit down the line
Speaker:from a better understanding of lang language models. Right? Like, maybe there would be some
Speaker:kind of therapy that could be directed to that. I think I think it's
Speaker:fascinating. I always love those fields where they touch upon more than 1 thing.
Speaker:Right? This isn't just math. This isn't just computer science. Like, it's linguistics. But,
Speaker:you know, it's a little bit of everything. It's like a giant, like, pot of
Speaker:stew that you just throw a bunch of stuff in, and it all kind of
Speaker:mixes. And, like, it's kind of like, almost like intellectual gumbo,
Speaker:I guess, would be the word. Right? But,
Speaker:what,
Speaker:what drove you to make your
Speaker:company? Like, what was the driving force to
Speaker:say, hey. You know, we have
Speaker:I remember many, many years ago in an office, and you would always see
Speaker:doctors talking into these little, like, miniature recorders.
Speaker:Right? In the olden days, they would go off to
Speaker:some data center somewhere, and somebody would... not a data center, but, like,
Speaker:some typing center, call center, where people would
Speaker:transcribe that. You know, obviously, that is now an artifact of
Speaker:the past as these models have gotten better.
Speaker:What what was the goal in in in, your
Speaker:company to say we can do this better? What what was the the that breakthrough
Speaker:moment of, like, here's here's what the industry already does. Here's how we can do
Speaker:it better. So,
Speaker:we all know ChatGPT, and it influences our lives. Now,
Speaker:instead of Google, we search with GPT, and it's amazing. It's unbelievable.
Speaker:So I thought, what about the very fundamental industries? What
Speaker:about,
Speaker:like, when you check an airplane, you
Speaker:use a special jargon. You cannot touch anything. You cannot
Speaker:leave even a pen there because otherwise the plane wouldn't be
Speaker:valid for flight. What about industries like the food
Speaker:industries, when you need to report the process? You
Speaker:have gloves, you cannot touch an iPad, you can barely
Speaker:write. And what about other industries,
Speaker:like maybe chip technology, when you make nanotechnologies and
Speaker:when you make chips, you make, you know,
Speaker:silicon chips and silicon
Speaker:wafers. So you are covered all over.
Speaker:You are with gloves. You need to report the process. All
Speaker:those industries have special jargons. They use special
Speaker:terms to describe what they're doing. They don't have access
Speaker:to write something,
Speaker:and they are very limited in the way they report. And on the other
Speaker:end, we had speech recognition, but speech recognition doesn't work on
Speaker:those jargon words. Those jargon words are actually the
Speaker:most important to those industries, and this was the goal for
Speaker:aiOla. So what we do is we operate
Speaker:automatic speech recognition, the best automatic speech recognition,
Speaker:but we also operate something else, something called keyword spotting.
Speaker:It's another deep network, which is focused
Speaker:on detecting only the jargon words. So you can define those jargon
Speaker:words in advance. You don't need to train on them. You can just
Speaker:define them, and they all work together. They work as a
Speaker:complementary couple to make a
Speaker:very robust prediction, and we can detect those
Speaker:jargon words and do the reporting on the
Speaker:process just by speaking. So it
Speaker:can be used in any industry,
Speaker:any industry that doesn't
Speaker:have access to the most modern AI systems, where the speech
Speaker:recognizer wouldn't work. They have problems, like,
Speaker:writing and formulating their reports.
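As a rough illustration of the general idea of biasing a transcript toward a predefined jargon list, here is a toy sketch; it is not aiOla's keyword-spotting system, and the jargon terms, example sentence, and similarity cutoff are all invented for the example.

```python
# Toy sketch only: nudge a general-purpose ASR transcript toward a predefined
# jargon vocabulary by fuzzy-matching each transcribed word against the list.
# This is NOT how aiOla's keyword spotting works; it only illustrates the idea
# that domain terms can be supplied up front, without retraining a model.
import difflib

JARGON = ["stent", "angioplasty", "fibrinogen"]  # hypothetical domain terms

def bias_toward_jargon(transcript: str, vocabulary: list[str], cutoff: float = 0.75) -> str:
    corrected = []
    for word in transcript.split():
        # Snap near-miss transcriptions (e.g. "stint") to the closest jargon term.
        match = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

print(bias_toward_jargon("place the stint before the angioplastie", JARGON))
```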
Speaker:Yeah. So I'm curious how those work together. You mentioned
Speaker:that you've got the speech recognizer. You've got the keyword,
Speaker:engine. Are they 2 separate engines that are just always running
Speaker:maybe agents, running at the same time or are
Speaker:they encapsulated, say, is the speech
Speaker:recognizer does the speech recognizer have a, you know, a
Speaker:subset or a a function built into it to do the
Speaker:keyword recognition? So just to
Speaker:be sure, those keywords in some industries are
Speaker:not English words. So it can be a word which nobody
Speaker:knows about. It was not shown on
Speaker:the Internet, like, ChatGPT is trained on data from the
Speaker:Internet, and there are some words that are not there. This is
Speaker:your proprietary company. You have invented a word to
Speaker:describe what is this part of the engine. So,
Speaker:yeah, so we have this keyword spotting. It
Speaker:is trained to detect keywords in general. They are defined
Speaker:by text and it operates. We have 2 modes of operation. 1 of them
Speaker:works on the encoder part of
Speaker:the automatic speech recognition, and then it guides
Speaker:the speech recognition towards the correct
Speaker:transcription. And there is another mode, which is
Speaker:our own encoder, our own representation of
Speaker:speech, and then it also guides the automatic speech
Speaker:recognition to a better location and to detect those
Speaker:words. And, actually, we can show that you can combine...
Speaker:any word can be from different languages, and we can
Speaker:detect them, like, almost 100% correctly, those jargon
Speaker:words. That was... sorry. Go ahead.
Speaker:No. No. No. Sorry. That no. That's okay. That that makes perfect
Speaker:sense now, what you just said about the languages using
Speaker:multiple languages, you know, English plus all of the
Speaker:other languages because sometimes
Speaker:people will struggle if they're an English as a second
Speaker:language speaker. They'll struggle to find the right
Speaker:English word, and they'll substitute a word from their native language.
Speaker:And in other cases, they'll be perhaps teaching
Speaker:on a topic, and they may revert back
Speaker:to an older language, Greek, Latin, something
Speaker:like that. That may be part of the, the
Speaker:lecture or, you know, I could see that in
Speaker:medicine. I could see it in, you know, all all sorts
Speaker:of literature studies. I could see a lot of that. And that
Speaker:that kinda clicked for me as you were saying that that makes sense that you
Speaker:would have additional languages. Yeah. I also wonder about, like,
Speaker:conversational contexts. Right? Like, you know, Spanglish is a
Speaker:thing. Franglais is the French and
Speaker:English kinda mashed together, and I know that with other languages,
Speaker:whenever you have 2 groups of people kinda come together, like, you know, there's always
Speaker:some kind of weird mix of language that kinda
Speaker:just evolves, either naturally or forced. I mean, that's Right. That's another
Speaker:debate. Are you thinking Belter Creole? I know we're Belter, you know, I
Speaker:wasn't going there, but that's an excellent example.
Speaker:So, Yossi looks very confused. So there's a series of
Speaker:books, called The Expanse. It was an excellent TV show
Speaker:for about 6 seasons, and it's basically set, 2,
Speaker:300 years in the future.
Speaker:And as humans colonize the asteroid belt,
Speaker:people from all over the world kinda all end up living
Speaker:together. So, like, the Belter Creole language is a
Speaker:creole of, you know, literally dozens of languages. Right?
Speaker:So, like, it'll switch from, you know, Hindi to Arabic to
Speaker:English to French to... there's even some German in there. I've heard some of that.
Speaker:Like, there are these kind of weird mixes of things. Right? So they'll
Speaker:say the word for the Belter people, like,
Speaker:people who live in the Belt, is Beltalowda. Belt obviously comes from, you
Speaker:know, the asteroid belt, English. Lowda, I think, is a Hindi term. I
Speaker:think. Don't hate on me in the comments. Don't hate on me in the comments.
Speaker:But I know wallah is a Hindi term. Right? So
Speaker:they'll, you know, when they talk to people who live on Earth or
Speaker:Mars, they refer to them as well wallahs, gravity well
Speaker:wallahs. Right? Like, and I only know wallah because
Speaker:of dish wallahs, and Wired Magazine did a whole story about dish wallahs in
Speaker:the nineties. Anyway, but I mean, I think, like, you know, I
Speaker:I suppose that approach could work for something like a creole. Right? Like, we have
Speaker:multiple languages kinda mixed together. Or is that not really a
Speaker:massive business case?
Speaker:Creole is really complicated. It's a language. It's like a
Speaker:real language, and it's complicated. The more
Speaker:delicate case of that is what we call, in research, code switching. When
Speaker:I'm Right. When I speak Hebrew, for example, I don't have a
Speaker:word for, you know, the Internet router. So I say 'router' in
Speaker:English. Or I say 'email' or I will say,
Speaker:I don't know, there are so many words in English, especially
Speaker:in technology, that you use worldwide in other languages, and this
Speaker:is code switching. There is another case. I think Andy pointed it
Speaker:out, that sometimes when you are stressed,
Speaker:or let's say your L1 is Spanish but your L2 is American
Speaker:English, or you're bilingual, and sometimes when you are
Speaker:stressed, you just switch the 1
Speaker:word, and this is an amazing phenomenon. This is research with Tamar Gollan
Speaker:from UC San Diego and Matt Goldrick from Northwestern
Speaker:University. And we provide, again, a mechanism to detect
Speaker:that and to do research on that. And the key question is,
Speaker:like, why do you do that? Why and when do you do that? Is
Speaker:it stress? What is the state
Speaker:describing those? Are you gonna say it the American
Speaker:way, or the Spanish word, or is it gonna be vice
Speaker:versa? And this is really interesting.
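As a trivially naive illustration of tagging which language each word in a mixed utterance belongs to, the sketch below uses only character script (Hebrew vs. Latin); real code-switching detection of the kind described here is far more sophisticated, and the example words are invented.

```python
# Naive sketch: tag each word of a mixed Hebrew/English utterance by its character
# script. Real code-switching detection is much more involved; this only shows the
# shape of the task. The example words below are made up.
def tag_language(word: str) -> str:
    if any("\u0590" <= ch <= "\u05FF" for ch in word):  # Hebrew Unicode block
        return "he"
    if any(ch.isascii() and ch.isalpha() for ch in word):
        return "en"
    return "other"

# Roughly: "send me an email about the router", with the tech terms left in English.
words = ["שלח", "לי", "email", "על", "router"]
print([(w, tag_language(w)) for w in words])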
Speaker:It's not my field of research. I just know how to detect them
Speaker:and, and Interesting. To detect them really well,
Speaker:but I don't know why it happens and what is the mechanism
Speaker:behind that. I could definitely see,
Speaker:the opportunity with starting with being
Speaker:able to detect, you know, these I
Speaker:don't I don't know the right word for them. I'll I'll call them modes. You
Speaker:know, a mode of speech where someone is mixing 2
Speaker:languages. And I'm sure those vary.
Speaker:So Like when I go Jersey on you. Right? That's we we
Speaker:can't we can't say any more about that, Frank. We're trying to keep our
Speaker:clean rating. But yes. Exactly. But,
Speaker:that's sorry. Inside joke. But the,
Speaker:but, yeah, I could see modes of speaking where someone who is
Speaker:more familiar with English as a second language.
Speaker:And and they've still you know, of course, they know their native language. They'll always
Speaker:know that. But as they I don't I don't wanna use the wrong word
Speaker:here, but I'm thinking experience is probably the best word is they get more
Speaker:experience, gain more experience with their second language.
Speaker:They may switch words less or switch languages
Speaker:less. And detecting that, I think, is the
Speaker:is key. I understand now more about what what you're doing, what
Speaker:you're accomplishing. And that that's the
Speaker:very first step to then being able to produce speech
Speaker:in those different modes. And that would be a
Speaker:fascinating, you know, a fascinating accomplishment.
Speaker:If you do that... The more we can have machines
Speaker:speak to us in the language that we're most familiar with, that,
Speaker:of course, you know, is almost there now, mostly
Speaker:there right now, but have it be able to to speak to us in these
Speaker:different modes where we where the machine switches where it's
Speaker:back to our first language, you know, based
Speaker:on some algorithmic calculation. That sounds
Speaker:fascinating. Yeah. It is.
Speaker:I'm not sure we are there yet. It's we have a long way to go
Speaker:there. But, Sure. Yeah. Makes
Speaker:sense. Fascinating. Well, this is how it starts, though. Right?
Speaker:This is fascinating. This is, yeah, this is,
Speaker:somehow there is an elephant in the room. We may have to say
Speaker:something about AI and regulation and what happens now.
Speaker:And, if I may, I would like to say something about this, because I have
Speaker:a totally different point of view about that.
Speaker:Please. So everybody is speaking about
Speaker:regulation, and how it might be a catastrophic situation
Speaker:if those machines are connected
Speaker:together and they start to train themselves. They try to
Speaker:build a meta architecture and try to train themselves,
Speaker:and then they come up with something which is better than humans. Some people
Speaker:call it the singularity point. So this is frightening. They're smarter
Speaker:than us. Maybe they're gonna kill us all. And
Speaker:people speak about regulation now, and there are
Speaker:several institutes in Europe and in the US
Speaker:trying to tackle that. And that
Speaker:is amazing. That is really important, but I think we missed something here.
Speaker:And I'll tell you why. So there is a book. It's here.
Speaker:You know, Isaac Asimov, I, Robot. You probably
Speaker:know it. So, like, the first page of this book is the 3
Speaker:laws of robotics. A robot may not injure a
Speaker:human being or, through inaction, allow a human being to come to harm.
Speaker:A robot must obey orders, and so on. So let's say
Speaker:we have the regulation. AI cannot hurt humans. Okay?
Speaker:But that isn't enough. It's not good enough, because if the AI is smart
Speaker:enough, it will
Speaker:show us humans that it really obeys
Speaker:the laws, but it wouldn't. And this is frightening.
Speaker:And here I suggest to look a little bit at human morality
Speaker:and why humans have laws. So we need to
Speaker:think about, if I may, the
Speaker:human psychology. In human psychology, we have a mechanism to obey law.
Speaker:It's called the superego. It was defined by
Speaker:Freud. So we have a mechanism that if
Speaker:we don't obey a law, we feel either
Speaker:guilt or fear. And this mechanism was evolutionary.
Speaker:So if we have a group of monkeys, they obey
Speaker:the alpha monkey because they're frightened of him. They have some kind of
Speaker:primitive superego. We obey the law because either we're frightened of the
Speaker:police or we feel the guilt.
Speaker:It's like
Speaker:those experiments that show that somebody
Speaker:left something on the table, and we don't take it because we feel guilt or
Speaker:we feel something. So this mechanism, what
Speaker:I claim, should be transferred to the
Speaker:AI machine. This should be the regulation. So what is a superego? The superego
Speaker:is an infrastructure for being moral,
Speaker:and we need a digital version of that. This is the regulation we
Speaker:need. We need the infrastructure to be moral in the machine. And what
Speaker:does it mean? So superego means it's a little bit like
Speaker:self harm, if I may. It's like we feel guilt. We feel something bad if
Speaker:we do something not okay, if we don't obey the law.
Speaker:So it's like self destruction for the AI machine. So the AI machine,
Speaker:if it doesn't obey the law, should feel something. It
Speaker:cannot feel, so, right, it will destruct itself. So this is my
Speaker:claim. This is a book I'm writing, and this is something very fundamental.
Speaker:We all speak about this regulation, but I think it
Speaker:doesn't help just to do standard
Speaker:regulation. And if I may say another thing, the last thing is that
Speaker:if you read I, Robot carefully, so
Speaker:there are several short stories there, and he speaks about robots that
Speaker:obey the law. And if you look carefully at those robots that
Speaker:obey the law,
Speaker:all of them have a superego. They feel guilt.
Speaker:The first story is about a robot that plays with a girl,
Speaker:and he feels guilty about winning all the time. So he lets her win.
Speaker:So he feels guilt. It means that it has a superego.
Speaker:And then he feels frightened of the mother of the girl. And it's
Speaker:really amazing. So I think, so in
Speaker:this book I'm trying to describe the psychological concept of the superego
Speaker:and then describe why it needs to be moral and how we can
Speaker:find a way to put it in regulation, like the infrastructure
Speaker:itself and not just laws.
Speaker:That is a very interesting problem you're trying to solve.
Speaker:Very important problem at that. Agreed. And
Speaker:culturally speaking, in the US, we have a saying that you
Speaker:cannot legislate morality, where
Speaker:legislate and regulate would be, you know,
Speaker:synonyms. Exactly. Right? So Right. Right. And legal code
Speaker:is code. I
Speaker:definitely get what you're what you're saying. And I think it's super
Speaker:important. You mentioned you were writing a book about this. Now
Speaker:now now you have to tell me more because I wanna read this book.
Speaker:Same. I'm in the process of looking
Speaker:for an agent, and it's complicated. It's supposed
Speaker:to be a popular book trying to explain the psychology of Freud,
Speaker:what is the superego, the ego, and the id,
Speaker:and then describe what is the pathology. So we all have a pathology. So
Speaker:you have the pathology of, it's called,
Speaker:the criminal personality disorder. This
Speaker:person will not have a superego. It's like Richard the
Speaker:Third from Shakespeare. He didn't have a superego. He killed
Speaker:his family and didn't feel guilt. So this is what's
Speaker:going to happen with those machines. And then I
Speaker:give some literature examples of
Speaker:what is a superego, like from Crime and
Speaker:Punishment, that the guy killed the
Speaker:old lady, but nobody
Speaker:caught him killing the lady. He murdered her. Nobody caught him, but he
Speaker:still feels guilt. So he has a very big
Speaker:superego. And then I describe what happens in
Speaker:other moral theories of human beings, all of them connected to the
Speaker:superego. And then I try to describe a little bit how machine
Speaker:learning is trained, again, solving an optimization problem. And then I try
Speaker:to describe how we can do a superego, how we can have
Speaker:a digital superego, if we can. No.
Speaker:It's like you're giving it a conscience of of sorts. Exactly.
Speaker:Yeah. And I I just wanted to, to add, we
Speaker:may be able to help you. Maybe not find an
Speaker:agent, but find a publisher. Both Frank and I are
Speaker:published. And we, you know, we know
Speaker:Andy's got a lot of connections in publishing. Well That would be
Speaker:great. I am not, I just wrote a lot of books
Speaker:for different publishing houses, and I know some people that if
Speaker:they can't help you directly, they can probably point you to someone who
Speaker:can. And, again, I am wholly motivated by wanting to
Speaker:read this book. Same. Like, I think it's important
Speaker:because I live in the Washington DC area. Right?
Speaker:So so, like, there's a lot of people there who they're policy
Speaker:makers. Right? Like, and they just assume
Speaker:and I think a lot of humans fall for this. Right? You you see this
Speaker:when the European Union passed their AI regulation act.
Speaker:They assume that regulation's gonna solve all their problems.
Speaker:And I think regulations prove that 1 of the fundamental forces
Speaker:in the universe is is unintended consequences.
Speaker:And, you know, when you regulate something, you don't end
Speaker:the problem. You change the way people will route around it. Right? Like,
Speaker:and I think a good example of this in AI is the movie Megan, which
Speaker:I don't know if you've seen, or M3GAN, I'm not sure how to pronounce
Speaker:it, where I think she was about to torture...
Speaker:she was... I don't wanna give the plot away, but the robot
Speaker:child, like Chucky, kinda goes evil. Like, this is the
Speaker:basic kind of plot line, and the person who created her
Speaker:was like, you can't kill me because it's against your programming. And she goes, oh, I
Speaker:said nothing about killing you. I was gonna put you in a coma, and you'll
Speaker:live, you know, however many years. Like, it was just... I mean,
Speaker:that's a great example of, like, you know, don't kill. Right? Seems like a
Speaker:pretty reasonable instruction to give a robot, particularly a child's toy.
Speaker:Don't kill anyone. But, you know, she realized, like, well, kill
Speaker:equals death. So if I don't kill you, if I just hospitalize you or
Speaker:incapacitate you, that doesn't conflict with rule number 1.
Speaker:Right? Which I think is no. Obviously, as, you
Speaker:know, humans, we're like, well, it's not really the spirit of the
Speaker:law, or the rule. But clearly,
Speaker:the robot or the AI in this case, kind of figured it
Speaker:out. Like, I don't know. I think you're right. Like and any regulations like that
Speaker:too. Right? How many loopholes do people discover, whether it's
Speaker:tax laws or, you know, this. It's like, well, technically, it's
Speaker:legal. Is it actually, you know,
Speaker:what the law intended? No. Like, it's Yeah. You need
Speaker:almost something like a nuance engine,
Speaker:you see. Yeah. To get
Speaker:the machine to interpret
Speaker:the laws. And that's... I've read Asimov as well,
Speaker:big fan. And that's what happens downstream of
Speaker:the 3 laws as they begin to fail, because the
Speaker:robots are doing exactly what they're programmed to
Speaker:do. And they're
Speaker:finding ways that, in our opinion, human opinion,
Speaker:circumvent the 3 laws, but really don't
Speaker:break the robot's programming. And it's all about, you know,
Speaker:how do you define harm? Like, Frank's example is a great, you know,
Speaker:great example of that. So, yeah,
Speaker:fascinating stuff. Yeah. We gotta Awesome stuff. We gotta help you write this
Speaker:book. I wanna read this book. Yeah. I want to raise
Speaker:another point, the opposite of the point that you raised. Like, what happens with
Speaker:the autonomous car, for example. People say,
Speaker:let's focus on autonomous cars. So there will be an
Speaker:autonomous car. Who is responsible for a car accident?
Speaker:Accidentally, somebody was killed. You are the
Speaker:owner. Somebody is the owner of the car. He sits
Speaker:there. He bought the car, but the car killed
Speaker:somebody. So
Speaker:who... this is an open problem. This is, again, a
Speaker:moral problem. So what I suggest here,
Speaker:maybe it will take time,
Speaker:I guess, is maybe the car, if we can build the
Speaker:superego, the mechanism for morality, you know, just
Speaker:the infrastructure for morality, it can take the
Speaker:morality of the human. And if somehow it
Speaker:inherits the driver's morality, you
Speaker:can blame the driver. I'll give you another example, which will be much
Speaker:more maybe concrete. So we say now that there will be a ChatGPT for
Speaker:every person, for every laptop and iPhone and whatever.
Speaker:You will have your own GPT with your own life, that follows
Speaker:your own history. And the
Speaker:discussion with this GPT will be very personalized and
Speaker:very helpful. What happens in that case? So in that
Speaker:case, if this GPT
Speaker:will take your responsibilities and morality, somehow we
Speaker:can copy your morality and it will be part of it. So if you're moral, it
Speaker:will be moral. If you're not, it's not, but this is
Speaker:your responsibility as a human. And I think this
Speaker:is the way to go with that. We need just the infrastructure and not
Speaker:the law. Anybody can define the law, and anybody
Speaker:can break the law. We just need the infrastructure so that
Speaker:at least the machine knows that it broke the law.
Speaker:And this is really important. I think
Speaker:Oh, I totally agree. Totally agree. Well, we're
Speaker:gosh. We're coming up on time, Frank. Yeah. This was
Speaker:awesome. So we'll just any
Speaker:book recommendations? Obviously, I, Robot, I think, would be good reading
Speaker:in this space. You also mentioned Shakespeare too,
Speaker:Richard the 3rd. So... Indeed, a book
Speaker:which I'm reading now is
Speaker:Vernon Subutex. It's
Speaker:amazing. It's amazing. It's 3 books, and it actually
Speaker:discusses whatever is not AI, anything which cannot be solved with
Speaker:AI. It speaks about a person who has a vinyl shop,
Speaker:a shop to sell vinyl, and then CDs came, and now he cannot sell
Speaker:anything. So this shop is closed, and then he
Speaker:tries to somehow manage, but he ends up on the street. He's, like,
Speaker:homeless, and he meets many people. And, like,
Speaker:every chapter is a different person or
Speaker:a pair of people, and it's really
Speaker:fascinating. It's all those things that you cannot solve with AI. It's all
Speaker:the human interaction, the very, very basic human interaction. Amazing.
Speaker:It won the Booker Prize in 2018.
Speaker:Nice. Where can folks find out more about
Speaker:you? So I have a website
Speaker:under Joseph Keshet, and they
Speaker:can find me there. Excellent.
Speaker:Any parting thoughts, Andy? No. Just great great
Speaker:interview. I appreciate that. 1, I would ask if you'd repeat the name of
Speaker:the book you just mentioned with the different stories.
Speaker:What's the name of that book? It's not... it's a single
Speaker:story. It's by Despentes,
Speaker:Vernon Subutex. It's from French. Oh, okay.
Speaker:Amazing. Amazing. Amazing. Awesome. Excellent. That's it. That's
Speaker:it for me. But that's great talk. Thank you. Excellent talk. Thank you.
Speaker:And we'll let Bailey finish the show. Well, folks, that brings us to the end
Speaker:of another enlightening episode of data driven. We've
Speaker:navigated the fascinating intricacies of automatic speech
Speaker:recognition, explored the moral quandaries of AI, and
Speaker:pondered the future of technology with none other than 1 of the best minds
Speaker:in the field, doctor Yossi Keshet. Remember, if you
Speaker:enjoyed today's conversation, don't forget to subscribe to data
Speaker:driven media TV for exclusive video content.
Speaker:You can also grab some fantastic merch like the my data is the
Speaker:new oil t shirt Andy's sporting today. And while Frank is
Speaker:basking in the Appalachian sunshine, you can bet we're already cooking up the
Speaker:next episode to keep your data driven minds engaged and entertained.
Speaker:Until next time, stay curious, stay informed, and
Speaker:always keep questioning. Cheerio.