Speaker:

Welcome back to another riveting episode of Data Driven.

Speaker:

Joining us today, lakeside and positively glowing from his

Speaker:

Appalachian retreat, is Frank. Meanwhile, the

Speaker:

always astute and ever energetic Andy is here to keep us

Speaker:

grounded. But enough about us. Today, we have

Speaker:

a true luminary in the field of AI, someone who's blending the worlds

Speaker:

of academia and enterprise with seamless finesse. He's an

Speaker:

associate professor at the Technion, has published over 100

Speaker:

research papers on automated speech recognition, and is the chief

Speaker:

scientist at AIOLA. Please welcome doctor Yossi

Speaker:

Keshet or as he's known to his friends, Yossi.

Speaker:

Alright. Hello, and welcome to Data Driven, the podcast where we explore the

Speaker:

emergent fields of artificial intelligence, data science, and,

Speaker:

and, of course, data engineering, without which the whole world would probably stop turning.

Speaker:

And you know, data engineering is important. That's

Speaker:

basically it. Still working on that that that revamped

Speaker:

monologue, for, for season 8, Andy. Were

Speaker:

you on vacation? You're on vacation. I am on vacation. And

Speaker:

for those of you who can't see on camera who are not who are

Speaker:

listening, not watching, I am literally lakeside,

Speaker:

in the foothills. Well, not the foothills. We are actually in the Appalachian Mountains. Or

Speaker:

is it Appalachian? I I never I I've heard of those. I I never

Speaker:

got a clear read on it. Say either. So, you know When I say either.

Speaker:

Yeah. Yeah. Yeah. Yeah. Yeah. So I am in Deep Creek Lake,

Speaker:

Maryland, which is kind of like, Maryland doesn't really have a Panhandle

Speaker:

per se, but if it did, it would be this is what this would be.

Speaker:

I probably think I'm 5 miles from West Virginia and about

Speaker:

20 miles from Pennsylvania. So it's kind of like this quiet

Speaker:

little corner of the state.

Speaker:

And I've been, you know, reading and studying

Speaker:

today. I hit day 600 consecutive on Pluralsight. Nice.

Speaker:

So recording this June 17th. And, how

Speaker:

are things with you, Andy? Things are good. I'm gonna throw out a plug for

Speaker:

data driven media dot tv because Frank mentioned it.

Speaker:

If you're listening, he while he was mentioning that, he was

Speaker:

actually panning the camera over to the lake. But if

Speaker:

you're, subscribing to data driven media dot tv, you get

Speaker:

to see us. You get to see the video, and you

Speaker:

can see, for instance, that I am wearing the, my data is the

Speaker:

new oil t shirt, which you can pick up. I'm just full of

Speaker:

sponsor stuff today. I'm just doing Well, it's self out. It's

Speaker:

self sponsored. And, honestly, we really need to get better at that. Right? We have

Speaker:

data channel. Tv. There is a for listeners to the show, I will give

Speaker:

a preview. Data Driven Academy is launching soon. You have

Speaker:

a course coming up at the end of the month. Actually, yeah, it's Fabric.

Speaker:

We're recording this on the 17th. It's the 24th

Speaker:

of June, but I'm also doing 2 more at

Speaker:

near the ends of July August. And in addition

Speaker:

to that, while we're shameless plugging away here,

Speaker:

before we get to our very interesting guest, now I'm also bringing

Speaker:

back my Day of Azure Data Factory, which was wildly

Speaker:

popular. I delivered it at a couple of, conferences,

Speaker:

international conferences in '22, '23. And,

Speaker:

yeah. Let's see if people are interested. What do you do

Speaker:

Friday afternoons, Andy? Oh, there's this thing, Frank. Thanks for

Speaker:

mentioning that. Totally free. We we gotta we're trying to get better at this. That's

Speaker:

all. We do. Yeah. Data engineering Fridays. And if you go to data engineering

Speaker:

fridays.com, you can learn more about that. Frank, you're doing a lot

Speaker:

of stuff with I noticed with using the, encore

Speaker:

replay feature in Restream. And it's

Speaker:

right you you shared that with me. I started doing that with data engineering

Speaker:

Fridays as well. But great a great way to,

Speaker:

you know, to get your message out there. And, you

Speaker:

know, I I had no idea replays would help. But my gosh.

Speaker:

They really have. It's just a matter of just hitting the echo of I

Speaker:

can't even talk. Algorithm the right way. Yeah. And Yeah. You know,

Speaker:

maybe we can get the so I think it's a good segue, for our

Speaker:

guest. Doctor Yossi, Keshet. He's the chief

Speaker:

scientist at AIOLA, an AI powered tech

Speaker:

company that automates business workflows

Speaker:

by capturing spoken data. Yossi is also

Speaker:

an associate professor at the Faculty of Electrical and Computer

Speaker:

Engineering at the Technion in Israel.

Speaker:

Yossi is an award winning scholar and has published over a 100 research

Speaker:

papers about automated speech recognition and speech

Speaker:

synthesis. Welcome to the show, Yossi. Hi.

Speaker:

Nice for having me. Thank you for having me. Hey. No problem. No

Speaker:

problem. We are very excited to have you. And, you're not just an

Speaker:

academic, but you've also proven yourself in in actual enterprise. So

Speaker:

which sounds really bad as I say that out loud, but I think you knew

Speaker:

there was a compliment.

Speaker:

But, so what is AIOLA?

Speaker:

Can you tell me a little bit about that? Because I'm curious about that and

Speaker:

and and workflows

Speaker:

around spoken data. So

Speaker:

AIOLA is a company that aims to target

Speaker:

the, you know, the very basic and foundational

Speaker:

industries. Maybe if I

Speaker:

may, let's start with the general scene of

Speaker:

automatic speech recognition now, and then you will understand where AIOLA stands, because we

Speaker:

have now OpenAI and everything, and it's like you

Speaker:

can say we solved the AI problem. But it's not like that.

Speaker:

So we are in an amazing shape in

Speaker:

terms of automatic speech recognition. So we we have a paper that shows

Speaker:

that Whisper, the model from OpenAI, is as good as humans in

Speaker:

detecting and transcribing language when we speak about

Speaker:

American English with noise, without noise, and

Speaker:

also, L2 speakers. That is,

Speaker:

non-native American speakers of the

Speaker:

language. And the results are that Whisper, the

Speaker:

OpenAI model, is the same as human listeners. And that is

Speaker:

the main thing. But the thing is that

Speaker:

when you come to industries, usually they have jargon, they have special words.

Speaker:

And those words are either rare in

Speaker:

their language or they are not words

Speaker:

at all. It's like, I don't know, when I'm a medical doctor and would like

Speaker:

to make a surgery and I would like to transcribe what I'm saying during

Speaker:

the surgery. There are words there which are not

Speaker:

often used or which are non-English words. And

Speaker:

in that case, those automatic speech recognizers don't

Speaker:

work at all. They don't detect those words. And in AIOLA, this

Speaker:

is our target to take those words, which are actually the most important words. Those

Speaker:

are the jargon of the of the industry of the of the facility.

Speaker:

So the goal is to help those industries to come

Speaker:

up with the automatic speech recognition for

Speaker:

reporting for transcribing speech.
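The failure mode described here, where a general-purpose recognizer snaps unfamiliar jargon onto the nearest common word, can be illustrated at a toy text level. This is only a sketch: the vocabulary and the jargon word below are invented, and real recognizers work on audio and subword units rather than snapping whole words by edit distance.

```python
# Toy illustration of the closed-vocabulary problem: a recognizer that
# can only output words it knows will rewrite unseen jargon into the
# nearest common word. The vocabulary and jargon are invented examples.

VOCAB = ["pilot", "tube", "check", "the", "valve"]

def edit_distance(a, b):
    # Classic Levenshtein distance, computed with a single rolling row.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def decode(word, vocab=VOCAB):
    """Snap a word to the closest in-vocabulary word."""
    return min(vocab, key=lambda v: edit_distance(word, v))

print(decode("check"))  # common words survive: "check"
print(decode("pitot"))  # jargon is silently replaced: "pilot"
```

The jargon term is exactly the word the report needed, and it is gone, which is the gap the guest describes.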

Speaker:

I have a question. When you say automatic, what what makes it automatic? Is

Speaker:

it just kinda, what exactly does that mean?

Speaker:

So automatic speech recognition today works very similar

Speaker:

very, very similar to the way ChatGPT works.

Speaker:

ChatGPT works on a model called a transformer. It's a deep

Speaker:

learning architecture, which has, a

Speaker:

history based on previous recurrent architectures.

Speaker:

And it can predict, as as we all know, it can

Speaker:

predict text amazingly. In speech recognition, automatic

Speaker:

speech recognition, it's almost the same thing, but there is another

Speaker:

component to this

Speaker:

transformer, which is called an encoder.

Speaker:

This part takes the speech and actually transforms it into

Speaker:

a great representation that can be used

Speaker:

with, let's call it, the other side, with

Speaker:

this GPT together. Together, they can

Speaker:

transcribe speech, as I described, in a very good

Speaker:

way, as good as humans in some

Speaker:

cases. I will say, like,

Speaker:

I've been messing around with the app that's on the phone,

Speaker:

for ChatGPT, and,

Speaker:

I use the the voice interaction feature. It is

Speaker:

amazingly good at getting rid of the umms, the ahs,

Speaker:

the scatterbrain thoughts that I sometimes have when I talk to it.

Speaker:

Like, it it could kinda really distill a lot of

Speaker:

things. Like, I'm impressed with it. The last time I

Speaker:

did anything serious with speech recognition was probably, like, maybe 4 years

Speaker:

ago, and it's really improved. Like, I mean, orders of magnitude better

Speaker:

than I thought. I mean, it's it's it's it's almost at Star Trek level. You

Speaker:

know? I'm not sure

Speaker:

in those. It depends on the company, if it's Apple or

Speaker:

Google. And they don't declare

Speaker:

which models they use. I think, personally, they don't use this Whisper or

Speaker:

the latest model that we have for automatic speech recognition that

Speaker:

is transcribing speech. And the goal is a little bit different

Speaker:

in the in the phone. You actually want to maybe Right. Make,

Speaker:

make notes, send an email, send a text message,

Speaker:

and maybe the vocabulary is

Speaker:

less defined. There is another problem with

Speaker:

the phones. Oh, no. Go ahead. I want to call my

Speaker:

friend. His name is Xi, and

Speaker:

the last name is Chung. How do you pronounce it?

Speaker:

What what do you do with that? I'm gonna say he or chi or

Speaker:

so there is a problem of proper names and how do you

Speaker:

define them. And this is a completely different problem. It's still an open problem, and

Speaker:

the goal is a little bit different. So

Speaker:

when we assess the quality of those models, it's

Speaker:

a little bit different than the assessment of just spoken language

Speaker:

like what we do now. No. I mean, that's a great point. I mean, my

Speaker:

last name has, you know, technically is Lavin.

Speaker:

But, you know, growing up for for reasons many,

Speaker:

big and small, it became Lavinia. And like, so, like,

Speaker:

the phone, depending on if it's Android or an Apple, it

Speaker:

will get confused pretty easily.

Speaker:

And that is an interesting point. Some names, Andy is lucky to have an

Speaker:

easy name for the, the system.

Speaker:

But not everybody does. So I understand that. Sure.

Speaker:

I also wanna double click on American

Speaker:

English. You said that a bunch of times. Like, is there

Speaker:

an inherent bias in these model trainings because these are done by American

Speaker:

companies? Yes. There is. Okay. The

Speaker:

data is mostly American English. The research institutes

Speaker:

are mostly American. So that's the reason maybe I don't know

Speaker:

if you'd call it inherent or implicit bias, but there is a

Speaker:

bias, definitely.

Speaker:

We are investigating, by the way, the intelligibility

Speaker:

of speech in some cases. And what is the intelligibility of

Speaker:

an American listener versus the intelligibility of

Speaker:

myself, who am not an American listener, but I know English.

Speaker:

What is the best, quote-unquote, speaker? What is the best

Speaker:

listener? How can we transform those

Speaker:

to a speech recognizer? How can we transfer those to assessing the

Speaker:

quality of speech? What does it mean? What does it mean about the pathologies in

Speaker:

speech? And this is ongoing research on

Speaker:

this on this field. Interesting.

Speaker:

I I often wonder, like, you know, what it's not just English.

Speaker:

Right? Like, you know, if you listen to Spanish, like, there's different dialects of

Speaker:

Spanish. Right? Even even German. You know, I'm sure

Speaker:

there's, you know, plenty of dialects of all these languages and,

Speaker:

like, how do you do the training of a

Speaker:

model where it can get to be as good at

Speaker:

understanding x and x versus x and y versus, you know,

Speaker:

the base language, the base standard. I don't know. That's

Speaker:

fascinating. It seems like it seems like it could be an endless loop of, like,

Speaker:

training. It it is. Indeed, it

Speaker:

is. And when we train, there is another thing. So I'm

Speaker:

working on deep learning and AI. And what we found out is

Speaker:

that it may be the case that if you train

Speaker:

on 1 language, a huge amount of data from 1 language, let's say

Speaker:

American English, but then train on less data on Spanish,

Speaker:

you actually get some advantage of training

Speaker:

from the American English. So, again, in this modern Whisper of

Speaker:

OpenAI, most of the data is American English, but,

Speaker:

actually, other languages are really great.

Speaker:

Again, Spanish is amazing. So maybe like

Speaker:

humans as we learn more and more languages, it's easier

Speaker:

for us. This is a very interesting point.

Speaker:

No. That's an interesting idea because I know, like, I never

Speaker:

understood American English grammar, American or otherwise,

Speaker:

until I studied a foreign language. And then when I studied it, it was German.

Speaker:

And, you know, German kept a lot of the archaic things that

Speaker:

are in English and

Speaker:

continued to keep them important. Like in English, you know, who

Speaker:

and whom used to confuse the you know what out of me.

Speaker:

Right? But when I when I learned in German about different cases and things

Speaker:

like that, I was like, oh, that's why it is. Right? So,

Speaker:

like, all these things that just like you said, like, learning another

Speaker:

having more data or data from another point of view, I suppose,

Speaker:

or another way to look at the world helped me look at my world

Speaker:

a little better. Maybe maybe that's how

Speaker:

AI will work too. I don't know.

Speaker:

Maybe. We don't know. We we actually have a guess about that

Speaker:

because those networks actually solve an optimization problem,

Speaker:

a mathematical optimization problem. It's a problem

Speaker:

that we define with an equation, and we need to have

Speaker:

a computer running and solve it. The equation is

Speaker:

over a training set of examples. So

Speaker:

1 person says this, another person says something else.

Speaker:

And what happened is that when, again, when we have

Speaker:

a large amount of data,

Speaker:

it seems that those those networks get to an amazing place.

Speaker:

So this algorithm, this Whisper or other

Speaker:

algorithms, it's really from the recent years, like 2, 3 years.

Speaker:

That's it. They perform amazingly,

Speaker:

with the

Speaker:

same mechanism, not with the same amount of

Speaker:

data. Yeah. That's the

Speaker:

fascinating aspect of all of this. It's just that some of these things just seem

Speaker:

some problems seem harder than they ought to be,

Speaker:

and then some solutions to problems seem way more effective than they

Speaker:

ought to be. It's interesting also to say

Speaker:

that Whisper, OpenAI Whisper, was trained

Speaker:

on 600,000 hours of speech. But this is

Speaker:

way, way more than just a kid learning a language.

Speaker:

A kid learning a language is exposed to way fewer hours of

Speaker:

speech, less accurate, less

Speaker:

coherent. And this is something,

Speaker:

Noam Chomsky raised years ago, like, 50 years ago.

Speaker:

And it's still an open question. Like, if we can make those

Speaker:

systems work better if we know the language.

Speaker:

I guess you learn German faster than any

Speaker:

machine that works today.

Speaker:

That's yeah. It's it's and I'm glad you mentioned Noam

Speaker:

Chomsky because that kinda was like so for those who don't know, Noam

Speaker:

Chomsky is, among other things, a noted linguist scholar.

Speaker:

I highly recommend you do a search on him because that's a that's a

Speaker:

good Wikipedia rabbit hole to fall into. But,

Speaker:

how much does linguistics come up in this? Right? Because I think

Speaker:

what's fascinating about this field for me is a lot

Speaker:

of, my grandfather, my great grandfather

Speaker:

was a linguistics professor. And, you know, as the

Speaker:

family lore goes, I never met him. He died a decade or 2 before I was

Speaker:

born. He spoke, like, 12 languages. He was a professor of, like, 5

Speaker:

or 6. And, you know, a lot of people in my family

Speaker:

seem to have on that side of the family seem to be gifted in language.

Speaker:

And 1 of the fields I was tempted to to study in

Speaker:

university was linguistics. And I just find

Speaker:

it interesting how there's

Speaker:

now a Venn diagram that is much larger

Speaker:

than it used to be in terms of linguistics and computer science.

Speaker:

So what are your thoughts on that? Like,

Speaker:

if you have a

Speaker:

company like AIOLA. Right? Like, how many people are, you know, honest to

Speaker:

goodness, linguists versus computer scientists and and AI engineers?

Speaker:

So there are no linguists there. Oh,

Speaker:

really? Okay. There are no linguists. But I have to tell you, so there was

Speaker:

a professor called Frederick Jelinek. He was the

Speaker:

head of language research at Johns Hopkins University

Speaker:

in Baltimore. He was amazing. He was 1 of the smartest,

Speaker:

people on earth. He

Speaker:

developed many of the speech recognition algorithms. He said,

Speaker:

every time I fire a linguist, the performance of the speech recognizer goes

Speaker:

up.

Speaker:

And this is, this is embarrassing. But I

Speaker:

myself, 1st, really like

Speaker:

linguistics. I really like cognitive sciences, and I really

Speaker:

try to combine it with with my work. But it's really

Speaker:

amazing that all those AI systems

Speaker:

don't have any of that. So you don't train ChatGPT

Speaker:

on what is a noun, what is a verb, what is anything. You don't train

Speaker:

speech recognition that

Speaker:

you don't use linguists. You don't say this is

Speaker:

the prominent word, this is the end of the sentence. It just happened

Speaker:

by a huge amount of data. And

Speaker:

this is interesting. This somehow contradicts Noam Chomsky, who said that

Speaker:

there is a universal grammar. There is a

Speaker:

we are born innate with language. There is

Speaker:

maybe some black box in our brain which

Speaker:

is tuned to learn a language. And,

Speaker:

we are not sure about that. There is no direct proof if it's correct or

Speaker:

not. We are born with language. As humans, we're

Speaker:

born with language. We this is part of our, human being.

Speaker:

We are not born with written language. So written language was invented.

Speaker:

The spoken language is something like how a zebra

Speaker:

has stripes. This is this is our nature, and this is

Speaker:

interesting. This is not happening in

Speaker:

AI. The best successes didn't have linguists. They don't have any

Speaker:

restriction on what should be said or not.

Speaker:

Maybe AI will be a tool to somehow

Speaker:

make the linguistics research more effective and

Speaker:

try to understand what happened in the brain, what happened in the cognition part.

Speaker:

But I would like to tell you about another research project we are preparing here, which

Speaker:

is really amazing. 1 of the things is that we have

Speaker:

so there is this ChatGPT. It's a language model.

Speaker:

We also have something in the brain. It's also neural network.

Speaker:

And when we try to compare them, there is a huge

Speaker:

correlation between what happened in the artificial neural

Speaker:

network of GPT and the neural

Speaker:

biological neural network in the brain. And, it was

Speaker:

shown, several years ago, and here we

Speaker:

show it again with the most modern

Speaker:

automatic speech recognizers. So this is

Speaker:

a phenomenal correlation between the artificial and the

Speaker:

neural mechanisms. I was gonna ask about that

Speaker:

because I'm I'm familiar with, you know, at least the abstracts of

Speaker:

the research, from a few years ago and now. And

Speaker:

I was curious if there had been any new correlations

Speaker:

or, you know, or new research, new connections that have been made

Speaker:

between machines learning languages

Speaker:

and the way our brains work. It sounds like

Speaker:

that's true.

Speaker:

So we just initiated

Speaker:

research here in my lab about that. There were

Speaker:

some French guys, King

Speaker:

and his colleagues at Meta. And

Speaker:

I forgot the university in France. So they

Speaker:

show that there are those correlations. They show simple correlation.

Speaker:

They show it with an LLM, with a language model. What we show is a little bit

Speaker:

different. We show correlation with automatic speech

Speaker:

recognition. So we put people under fMRI, under MRI.

Speaker:

We scan their brain at some

Speaker:

resolution, and we try to find correlation with their brain activity

Speaker:

during reading and during speaking aloud,

Speaker:

and ask what is the correlation with the best model we know for

Speaker:

speech recognition. And there are correlations.

Speaker:

I have to say that there is a mechanism in the transformer, this

Speaker:

architecture of neural network. There is a mechanism called attention. This

Speaker:

mechanism allows those models to have the connection between

Speaker:

words and themselves. So, I'm eating an

Speaker:

apple. It was delicious. So it refers to the apple.
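The attention idea in the apple example can be sketched with the core computation, scaled dot-product attention. The tiny 2-dimensional embeddings below are hand-made purely for illustration; real transformers learn high-dimensional embeddings (and the query, key, and value projections) from data.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_weights(query, keys):
    # Scaled dot-product attention: the query is compared against
    # every key, and softmax turns the scores into weights that sum to 1.
    d = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / d for key in keys]
    return softmax(scores)

# Hand-made toy embeddings: "it" is placed near "apple", so attention
# links the pronoun back to the noun it refers to.
words = ["I'm", "eating", "an", "apple", "it"]
vecs = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.1], [0.0, 1.0], [0.1, 0.9]]

w = attention_weights(vecs[words.index("it")], vecs)
print(words[w.index(max(w))])  # "apple" gets the largest weight
```

With these hand-set vectors, the "it" query scores highest against the "apple" key, which is the connection-between-words behavior described above.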

Speaker:

Okay? So there is an attention mechanism. This is what makes those

Speaker:

models amazing. So there is an attention mechanism, I guess, in the

Speaker:

brain. So we try to correlate this attention mechanism in

Speaker:

the models and compare it to the activity in the brain. We don't have

Speaker:

results yet, but it seems promising. And we also ask

Speaker:

another question. What if you don't read aloud? What if you read

Speaker:

like silent reading? What if you have dyslexia? What if you have,

Speaker:

other types of pathology? What

Speaker:

what are the correlations then? So this is fascinating. And

Speaker:

there is correlation. I still don't know what's going to happen

Speaker:

with that. But I know the pathologies, and it's unbelievable, the

Speaker:

correlation. That that is really exciting,

Speaker:

especially when you're examining things like dyslexia,

Speaker:

which is considered, you know, not normal,

Speaker:

or maybe that's not the right term for it, but a

Speaker:

challenge at a minimum. The cool the cool kids call that neurodivergent

Speaker:

now. I think Neurodivergent. Thank you, Frank. So when you're studying, you

Speaker:

know, when you're studying that sort of thing, I'm wondering if there's a place for

Speaker:

that in the artificial.

Speaker:

I'm curious. What what do you mean? Can you

Speaker:

So, yeah, is there is is there any benefit

Speaker:

to, I say, transferring the thought processes

Speaker:

of people who are neurodivergent and and automating that

Speaker:

and making that part of the, you know,

Speaker:

the the language model or or speech recognition?

Speaker:

Yeah. I think so. 1st, it's a tool

Speaker:

to analyze what happened in the

Speaker:

brain. Yeah. What happened

Speaker:

but it's very difficult. So we don't have any debugger for

Speaker:

the brain. We don't see the code of the brain. We don't see that this

Speaker:

function doesn't work. And most of the work

Speaker:

is to design the experiment and

Speaker:

and it's really amazing. In our design, we have the

Speaker:

same. So as I told you, I'm asking people to read aloud

Speaker:

and compare it to what automatic speech recognition

Speaker:

is supposed to do. But I'm

Speaker:

also asking people to read silently, and then I follow

Speaker:

their eyes. I have a machine that follows their eyes, and

Speaker:

I know where, like, I

Speaker:

track their eyes and I see which word they are reading

Speaker:

now. And I can use that to follow

Speaker:

what what they read. But in order to operate that on a speech

Speaker:

recognizer model, I need the speech. So during the design of

Speaker:

the experiment, I need artificial speech or I need them to read aloud

Speaker:

afterwards. It's a big question

Speaker:

how to do that properly and how to

Speaker:

make things happen, but definitely working with

Speaker:

people with problems first to help them.

Speaker:

And second, to understand them. And 3rd, to maybe

Speaker:

understand the brain and make AI better.

Speaker:

I also think, like, stroke victims, right, could benefit down the line

Speaker:

from a better understanding of language models. Right? Like, maybe there would be some

Speaker:

kind of therapy that could be directed to that. I think I think it's

Speaker:

fascinating. I always love those fields where they touch upon more than 1 thing.

Speaker:

Right? This isn't just math. This isn't just computer science. Like, it's linguistics. But,

Speaker:

you know, it's a little bit of everything. It's like a giant, like, pot of

Speaker:

stew that you just throw a bunch of stuff in, and it all kind of

Speaker:

mixes. And, like, it's kind of like, almost like intellectual gumbo,

Speaker:

I guess, would be the word. Right? But,

Speaker:

what what,

Speaker:

what drove you to make, your your your

Speaker:

your company? Like, what what was the driving force to

Speaker:

say, hey. You know, we have

Speaker:

I remember many, many years ago in an office, and you would always see

Speaker:

doctors talking into these little, like, miniature recorders.

Speaker:

Right? In the olden days, they would go off to

Speaker:

some data center somewhere and somebody would not data center, but, like,

Speaker:

some typing center, call center where people would

Speaker:

transcribe that. You know, obviously, that is now an artifact of

Speaker:

the past as these models have gotten better.

Speaker:

What what was the goal in in in, your

Speaker:

company to say we can do this better? What what was the the that breakthrough

Speaker:

moment of, like, here's here's what the industry already does. Here's how we can do

Speaker:

it better. So there is

Speaker:

so we all know ChatGPT, and it influences our life. Now

Speaker:

instead of Google, we search with GPT, and it's amazing. It's unbelievable.

Speaker:

So I thought, what about the very fundamental industries? What

Speaker:

about,

Speaker:

like, when you check an airplane, you

Speaker:

use a special jargon. You cannot touch anything. You cannot

Speaker:

leave even a pen there because otherwise the plane wouldn't be

Speaker:

valid for flight. What about industries like the food

Speaker:

industries when you need to report, the process? You

Speaker:

have gloves, you cannot touch an iPad, you can barely

Speaker:

write. And what about, other industries

Speaker:

like, maybe the chip technology, when you make nanotechnologies and

Speaker:

when you make chips, you make, you know,

Speaker:

silicon chips and silicon

Speaker:

first. So you are in coveralls.

Speaker:

You are with gloves. You need to report the process. All

Speaker:

those industries have special jargons. They use special

Speaker:

terms to describe what they're doing. They don't have access to

Speaker:

to write something,

Speaker:

and they are very limited in the way they provide it. And on the other

Speaker:

end, we had speech recognition, but speech recognition doesn't work on

Speaker:

those jargon words. Those jargon words are actually the

Speaker:

most important to those industries, and this was the goal for

Speaker:

AIOLA. So what we do is we operate

Speaker:

automatic speech recognition, the best automatic speech recognition,

Speaker:

but we also operate something else. We also operate something called keyword spotting.

Speaker:

It's another deep network, which is focused

Speaker:

on detecting only the jargon words. So you can define those jargon

Speaker:

words in advance. You don't need to train them. You can

Speaker:

define them, and they all work together. They work, like, as a

Speaker:

complementary couple to make a

Speaker:

very robust prediction, and we can detect those,

Speaker:

jargon words and make reporting on the

Speaker:

process just by speaking. So it

Speaker:

can be used in any industries,

Speaker:

any, industry that doesn't

Speaker:

have access to the most modern AI system, the speech

Speaker:

recognizer wouldn't work there. They have problems, like,

Speaker:

writing and formulating their reports.
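The reporting idea, defining jargon terms in advance and spotting them in what a worker says, can be sketched at a text level. The jargon list and the transcript below are invented examples; AIOLA's actual keyword spotter is a deep network that works on the audio itself, not on a finished transcript.

```python
import re

# Hypothetical jargon an inspector might define in advance
# (invented terms, purely for illustration).
JARGON = {"flap actuator", "pitot tube", "torque check"}

def spot_keywords(transcript, jargon):
    """Return which predefined jargon terms appear in a spoken report."""
    found = []
    lowered = transcript.lower()
    for term in sorted(jargon):
        # Word boundaries keep "tube" from matching inside "tuber", etc.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append(term)
    return found

report = spot_keywords(
    "Completed torque check on the left flap actuator, no issues.",
    JARGON,
)
print(report)  # ['flap actuator', 'torque check']
```

The spotted terms are exactly the fields a structured report needs, which is why detecting them reliably matters more than transcribing every filler word.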

Speaker:

Yeah. So I'm curious how those work together. You mentioned

Speaker:

that you've got the speech recognizer. You've got the keyword,

Speaker:

engine. Are they 2 separate engines that are just always running

Speaker:

maybe agents, running at the same time or are

Speaker:

they encapsulated, say, is the speech

Speaker:

recognizer does the speech recognizer have a, you know, a

Speaker:

subset or a function built into it to do the

Speaker:

keyword recognition? So just to

Speaker:

be sure, those keywords in some industries are

Speaker:

not English words. So it can be a word which nobody

Speaker:

knows about. It was not shown

Speaker:

on the Internet, like what ChatGPT is trained on, the data over the

Speaker:

Internet. There are some words that are not there. This is

Speaker:

proprietary to your company. You have invented a word to

Speaker:

describe this part of the engine. So

Speaker:

Yeah. So we have this keyword spotting. It

Speaker:

is trained to detect keywords in general. They are defined

Speaker:

by text and it operates. We have 2 modes of operation. 1 of them

Speaker:

works on the encoder part of

Speaker:

the automatic speech recognition, and then it guides

Speaker:

the speech recognition towards the correct

Speaker:

transcription. And there is another mode, which is,

Speaker:

our self, encode our self representation of

Speaker:

speech, and then it also guides the automatic speech

Speaker:

recognition to a better, location and to detect those

Speaker:

words. And, actually, we can show that you can buy combine

Speaker:

any word can be from different languages, and we can

Speaker:

detect them, like, almost 100% correct, those jargon

Speaker:

words. That was that was going sorry. Go ahead.
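
Yossi doesn't spell out the decoder internals here, but the "guiding" he describes, where user-defined keywords steer the recognizer toward the correct transcription, resembles what the ASR literature calls contextual biasing or shallow fusion. A toy sketch of that idea follows; the tokens, probabilities, and boost value are illustrative, not Aiola's actual system:

```python
import math

def biased_beam_search(step_logprobs, keywords, beam=3, boost=2.0):
    """Toy word-level beam search with keyword boosting ('shallow fusion').

    step_logprobs: one dict per time step mapping candidate word -> log-prob.
    keywords: user-defined jargon terms; emitting one adds `boost` to a
    hypothesis score, steering decoding toward rare in-domain words.
    """
    hyps = [([], 0.0)]  # (word sequence, cumulative score)
    for step in step_logprobs:
        candidates = []
        for words, score in hyps:
            for word, lp in step.items():
                bonus = boost if word in keywords else 0.0
                candidates.append((words + [word], score + lp + bonus))
        candidates.sort(key=lambda h: -h[1])
        hyps = candidates[:beam]  # keep the `beam` best hypotheses
    return hyps[0][0]

# A rare jargon word loses to a common near-homophone without biasing...
steps = [{"the": math.log(0.9), "a": math.log(0.1)},
         {"rotor": math.log(0.7), "router": math.log(0.3)}]
print(biased_beam_search(steps, keywords=set()))       # ['the', 'rotor']
# ...but wins once it is declared as a keyword, with no retraining.
print(biased_beam_search(steps, keywords={"router"}))  # ['the', 'router']
```

Note that the keyword list is plain text: changing the set of jargon words requires no retraining, which matches the "define them in advance" behavior described above.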

Speaker:

No. No. Sorry. That's okay. That makes perfect

Speaker:

sense now, what you just said about using

Speaker:

multiple languages, you know, English plus all of the

Speaker:

other languages, because sometimes

Speaker:

people will struggle if they're English-as-a-second-language

Speaker:

speakers. They'll struggle to find the right

Speaker:

English word, and they'll substitute a word from their native language.

Speaker:

And in other cases, they'll perhaps be teaching

Speaker:

on a topic, and they may revert

Speaker:

to an older language, Greek, Latin, something

Speaker:

like that, that may be part of the

Speaker:

lecture. Or, you know, I could see that in

Speaker:

medicine. I could see it in, you know, all sorts

Speaker:

of literature studies. I could see a lot of that. And that

Speaker:

kinda clicked for me as you were saying it. It makes sense that you

Speaker:

would have additional languages. Yeah. I also wonder, like, in

Speaker:

conversational contexts also. Right? Like, you know, Spanglish is a

Speaker:

thing. Franglais is the French and

Speaker:

English kinda mashed together. And, you know,

Speaker:

whenever you have 2 groups of people kinda come together, like, there's always

Speaker:

some kind of weird mix of language that

Speaker:

just evolves, either naturally or forced. I mean, that's Right. That's another

Speaker:

debate. Are you thinking Belter Creole? You know, I

Speaker:

wasn't going there, but that's an excellent example.

Speaker:

So, Yossi looks very confused. So there's a series of

Speaker:

books called The Expanse. It was an excellent TV show

Speaker:

for about 6 seasons, and it's basically set 200 to

Speaker:

300 years in the future.

Speaker:

And as humans colonize the asteroid belt,

Speaker:

people from all over the world kinda all end up living

Speaker:

together. So the Belter Creole language is a

Speaker:

creole of, you know, literally dozens of languages. Right?

Speaker:

So, like, it'll switch from, you know, Hindi to Arabic to

Speaker:

English to French to, there's even some German in there. I've heard some of that.

Speaker:

Like, there are these kind of weird mixes of things. Right? So they'll

Speaker:

say the word for the Belter people, like,

Speaker:

people who live in the Belt, is Beltalowda. Belt obviously comes from, you

Speaker:

know, the asteroid belt, English. Lowda, I think, is a Hindi term. I

Speaker:

think. Don't hate on me in the comments.

Speaker:

But I know wallah is a Hindi term. Right? So

Speaker:

when they talk to people who live on Earth or

Speaker:

Mars, they refer to them as well wallahs, gravity-well

Speaker:

wallahs. Right? Like, and I only know wallah because

Speaker:

of dish wallahs, and Wired Magazine did a whole story about dish wallahs in

Speaker:

the nineties. Anyway, I mean, I think, like, you know, I

Speaker:

suppose that approach could work for something like a creole. Right? Like, we have

Speaker:

multiple languages kinda mixed together. Or is that not really a

Speaker:

massive business case?

Speaker:

Creole is really complicated. It's a language, like a

Speaker:

real language, and it's complicated. The more

Speaker:

delicate case of that is what we call in research code-switching. When

Speaker:

Right. When I speak Hebrew, for example, I don't have a

Speaker:

word for, you know, the Internet router, so I say the router

Speaker:

in English. Or I'll say email, or I will say,

Speaker:

I don't know. There are so many words in English, used especially

Speaker:

in technology, that you use worldwide in other languages, and this

Speaker:

is code-switching. There is another case. I think Andy pointed it

Speaker:

out, that sometimes when you are stressed,

Speaker:

or let's say your L1 is Spanish but your L2 is American

Speaker:

English, or you're bilingual. And sometimes when you are

Speaker:

stressed, you just switch the one

Speaker:

word, and this is an amazing phenomenon. This is research with Tamar Gollan

Speaker:

from UC San Diego and Matt Goldrick from Northwestern

Speaker:

University. And I provide, again, a mechanism to detect

Speaker:

that and to do research on that. And the key question is,

Speaker:

like, why do you do that? And when do you do that? Is

Speaker:

it stress? What is the state

Speaker:

driving those switches? Are you going to say it the American

Speaker:

way, or with the Spanish word, or is it gonna be vice

Speaker:

versa? And this is really interesting.

Speaker:

It's not my field of research. I just know how to detect them

Speaker:

and, Interesting. To detect them really well,

Speaker:

but I don't know why it happens and what is the mechanism

Speaker:

behind that. I could definitely see

Speaker:

the opportunity, starting with being

Speaker:

able to detect, you know, these, I

Speaker:

don't know the right word for them, I'll call them modes. You

Speaker:

know, a mode of speech where someone is mixing 2

Speaker:

languages. And I'm sure those vary.

Speaker:

Like when I go Jersey on you. Right? We

Speaker:

can't say any more about that, Frank. We're trying to keep our

Speaker:

clean rating. But yes. Exactly. But,

Speaker:

sorry. Inside joke. But the,

Speaker:

but, yeah, I could see modes of speaking where someone is

Speaker:

more familiar with English as a second language.

Speaker:

And, of course, they know their native language. They'll always

Speaker:

know that. But as they, I don't wanna use the wrong word

Speaker:

here, but I'm thinking experience is probably the best word, as they get more

Speaker:

experience, gain more experience with their second language,

Speaker:

they may switch words less or switch languages

Speaker:

less. And detecting that, I think, is

Speaker:

key. I understand more now about what you're doing, what

Speaker:

you're accomplishing. And that's the

Speaker:

very first step to then being able to produce speech

Speaker:

in those different modes. And that would be a

Speaker:

fascinating, you know, a fascinating accomplishment.

Speaker:

The more we can have machines

Speaker:

speak to us in the language that we're most familiar with, that,

Speaker:

of course, you know, is almost there now, mostly

Speaker:

there right now. But to have it be able to speak to us in these

Speaker:

different modes, where the machine switches

Speaker:

back to our first language, you know, based

Speaker:

on some algorithmic calculation, that sounds

Speaker:

fascinating. Yeah. It is.

Speaker:

I'm not sure we are there yet. We have a long way to go

Speaker:

there. But, Sure. Yeah. Makes

Speaker:

sense. Fascinating. Well, this is how it starts, though. Right?

Speaker:

This is fascinating. This is, yeah, this is,

Speaker:

Somehow there is an elephant in the room. We may have to say

Speaker:

something about AI and regulation and what happens now.

Speaker:

And, if I may, I would like to say something about this, because I have

Speaker:

a totally different point of view about that.

Speaker:

Please. So everybody is speaking about

Speaker:

regulation, and it might be a catastrophic situation

Speaker:

if those machines are connected

Speaker:

together and they start to train themselves. They try to

Speaker:

build a meta-architecture and try to train themselves,

Speaker:

and then they come up with something which is better than humans. Some people

Speaker:

call it the singularity point. So this is frightening. They're smarter

Speaker:

than us. Maybe they're gonna kill us all. And

Speaker:

people speak about regulation now, and there are

Speaker:

several institutes in Europe and in the US

Speaker:

trying to tackle that. And that

Speaker:

is amazing. That is really important, but I think we missed something here.

Speaker:

And I'll tell you why. So there is a book. It's here.

Speaker:

You know, Isaac Asimov, I, Robot. You probably

Speaker:

know it. So, like, the first page of this book is the 3

Speaker:

laws of robotics. A robot may not injure a

Speaker:

human being or, through inaction, allow a human being to come to harm.

Speaker:

A robot must obey orders, and so on. So let's say

Speaker:

we have the regulation: AI cannot hurt humans. Okay?

Speaker:

But that isn't enough. It's not good enough, because if the AI is smart

Speaker:

enough, it will not do that. I mean, it will

Speaker:

show us humans that it really obeys the

Speaker:

laws, but it wouldn't. And this is frightening.

Speaker:

And here I suggest looking a little bit at human morality

Speaker:

and why humans have laws. So we need to

Speaker:

think about, if I may,

Speaker:

human psychology. In human psychology, we have a mechanism to obey laws.

Speaker:

It's called the superego. It was defined by

Speaker:

Freud. So we have a mechanism such that if we

Speaker:

don't obey a law, we feel either

Speaker:

guilt or fear. And this mechanism is evolutionary.

Speaker:

So you have a group of monkeys. They obey

Speaker:

the alpha monkey because they're frightened of him. They have some kind of

Speaker:

primitive superego. We obey the law because either we're frightened of the

Speaker:

police or we feel guilt.

Speaker:

It's like

Speaker:

those experiments that show that when somebody

Speaker:

left something on the table, we don't take it because we feel guilt or

Speaker:

we feel something. So this mechanism, what

Speaker:

I claim, is that it should be transferred to the

Speaker:

AI machine. This should be the regulation. So what is the superego? The superego

Speaker:

is an infrastructure for being moral,

Speaker:

and we need a digital version of that. This is the regulation we

Speaker:

need. We need the infrastructure for machines to be moral. And what

Speaker:

does it mean? So the superego is a little bit like

Speaker:

self-harm, if I may. We feel guilt. We feel something bad if

Speaker:

we do something that's not okay, if we don't obey the law.

Speaker:

So it's like self-destruction for the AI machine. The AI machine,

Speaker:

if it doesn't obey the law, should feel something. It

Speaker:

cannot feel, so, Right, it will destroy itself. So this is my

Speaker:

claim. This is a book I'm writing, and this is something very fundamental.
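
The "digital superego" Yossi proposes can be caricatured in a few lines: an internal monitor that accumulates "guilt" when an action violates a declared norm and disables the agent once guilt crosses a threshold. This sketch is purely illustrative; the norms, weights, and shutdown rule are hypothetical, not anything from his book:

```python
class DigitalSuperego:
    """Illustrative monitor: accumulate 'guilt' for norm violations and
    disable the agent once guilt crosses a threshold (self-destruction
    rather than a merely logged warning)."""

    def __init__(self, norms, threshold=1.0):
        self.norms = norms          # list of (violation predicate, guilt weight)
        self.guilt = 0.0
        self.threshold = threshold
        self.alive = True

    def review(self, action):
        """Judge an action; return whether the agent may keep operating."""
        if not self.alive:
            raise RuntimeError("agent already disabled by its superego")
        for violates, weight in self.norms:
            if violates(action):
                self.guilt += weight      # the machine "feels" the violation
        if self.guilt >= self.threshold:
            self.alive = False            # the self-destruction Yossi describes
        return self.alive

# One hypothetical norm: never take a harmful action.
ego = DigitalSuperego(norms=[(lambda a: a == "harm_human", 1.0)])
print(ego.review("fetch_coffee"))   # True: no guilt accrued
print(ego.review("harm_human"))     # False: guilt hit the threshold, agent halts
```

The point of the sketch is the placement: the guilt mechanism lives inside the machine as infrastructure, rather than as an external law the machine could pretend to obey.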

Speaker:

We all speak about this regulation, but I think

Speaker:

it doesn't help just to do standard

Speaker:

regulation. And if I may say another thing, the last thing is that

Speaker:

if you read I, Robot carefully,

Speaker:

there are several short stories there, and he speaks about robots that

Speaker:

obey the law. And if you look carefully at those robots that

Speaker:

obey the law,

Speaker:

all of them have a superego. They feel guilt.

Speaker:

The first story is about a robot that plays with a girl,

Speaker:

and he feels guilty about winning all the time, so he lets her win.

Speaker:

So he feels guilt. It means that it has a superego.

Speaker:

And then he feels frightened of the mother of the girl. And it's

Speaker:

really amazing. So in

Speaker:

this book I'm trying to describe the psychological concept of the superego

Speaker:

and then describe why it needs to be there and how we can

Speaker:

find a way to put it in regulation, like the infrastructure

Speaker:

itself and not just laws.

Speaker:

That is a very interesting problem you're trying to solve.

Speaker:

A very important problem at that. Agreed. And

Speaker:

culturally speaking, in the US, we have a saying that you

Speaker:

cannot legislate morality, and

Speaker:

legislate and regulate would be, you know,

Speaker:

synonyms. Exactly. Right? So Right. Right. And legal code

Speaker:

is code. I

Speaker:

definitely get what you're saying. And I think it's super

Speaker:

important. You mentioned you were writing a book about this. Now

Speaker:

you have to tell me more, because I wanna read this book.

Speaker:

Same. I'm in the process of looking

Speaker:

for an agent, and it's complicated. It's supposed

Speaker:

to be a popular book trying to explain the psychology of Freud:

Speaker:

What the superego, the ego, and the id are,

Speaker:

and then describe the pathologies. So we all have a pathology. So

Speaker:

you have the pathology called

Speaker:

criminal personality disorder. This

Speaker:

person will not have a superego. It's like Richard the

Speaker:

Third from Shakespeare. He didn't have a superego. He killed

Speaker:

his family and didn't feel guilt. Is this what's

Speaker:

going to happen with those machines? And then I

Speaker:

give some literature examples of

Speaker:

what a superego is, like from Crime and

Speaker:

Punishment: the guy killed the

Speaker:

old lady. He murdered her. Nobody

Speaker:

caught him killing the lady, but he

Speaker:

still feels guilt. So he has a very big

Speaker:

superego. And then I describe what happens in

Speaker:

other moral theories of human beings, all of them connected to the

Speaker:

superego. And then I try to describe a little bit how machine

Speaker:

learning is trained, again, solving an optimization problem. And then I try

Speaker:

to describe how we can do a superego, how we can have

Speaker:

a digital superego, if we can.

Speaker:

It's like you're giving it a conscience of sorts. Exactly.

Speaker:

Yeah. And I just wanted to add, we

Speaker:

may be able to help you. Maybe not find an

Speaker:

agent, but find a publisher. Both Frank and I are

Speaker:

published. And, you know, Andy's got a lot of

Speaker:

connections in publishing. Well That would be

Speaker:

great. I am not, I've just written a lot of books

Speaker:

for different publishing houses, and I know some people that, if

Speaker:

they can't help you directly, they can probably point you to someone who

Speaker:

can. And, again, I am wholly motivated by wanting to

Speaker:

read this book. Same. Like, I think it's important,

Speaker:

because I live in the Washington DC area. Right?

Speaker:

So, like, there's a lot of people there who are policy

Speaker:

makers. Right? And they just assume,

Speaker:

and I think a lot of humans fall for this. Right? You see this

Speaker:

when the European Union passed their AI regulation act.

Speaker:

They assume that regulation's gonna solve all their problems.

Speaker:

And I think regulations prove that 1 of the fundamental forces

Speaker:

in the universe is unintended consequences.

Speaker:

And, you know, when you regulate something, you don't end

Speaker:

the problem. You change the way people will route around it. Right? Like,

Speaker:

I think a good example of this in AI is the movie Megan, or M3GAN, which

Speaker:

I don't know if you've seen. I'm not sure how to pronounce

Speaker:

it. Where, I think she was about to torture,

Speaker:

I don't wanna give the plot away, but the robot

Speaker:

child kinda goes evil, like Chucky. This is the

Speaker:

basic plot line, and the person who created her

Speaker:

was like, you can't kill me because it's against your programming. She goes, oh, I

Speaker:

said nothing about killing you. I was gonna put you in a coma, and you'll

Speaker:

live, you know, however many years. Like, I mean,

Speaker:

that's a great example of, like, you know, don't kill. Right? Seems like a

Speaker:

pretty reasonable instruction to give a robot, particularly a child's toy:

Speaker:

don't kill anyone. But, you know, she realized, like, well, kill

Speaker:

equals death. So if I don't kill you, if I just hospitalize you or

Speaker:

incapacitate you, that doesn't conflict with rule number 1.

Speaker:

Right? Obviously, as, you

Speaker:

know, humans, we're like, well, it's not really the spirit of the

Speaker:

law, or the rule. But clearly,

Speaker:

the robot, or the AI in this case, kind of figured it

Speaker:

out. Like, I don't know. I think you're right. And any regulation is like that

Speaker:

too. Right? How many loopholes do people discover, whether it's

Speaker:

tax laws or, you know, this? It's like, well, technically, it's

Speaker:

legal. Is it actually, you know,

Speaker:

what the law intended? No. Like, it's Yeah. You need

Speaker:

almost something like a nuance engine,

Speaker:

you see, to Yeah. To get the

Speaker:

machine to interpret

Speaker:

the laws. And I've read Asimov as well,

Speaker:

big fan. And that's what happens downstream of

Speaker:

the 3 laws as they begin to fail, because the

Speaker:

robots are doing exactly what they're programmed to

Speaker:

do. And they're

Speaker:

finding ways that, in our opinion, human opinion,

Speaker:

circumvent the 3 laws but really don't

Speaker:

break the robots' programming. And it's all about, you know,

Speaker:

how do you define harm? Like, Frank's example is a great, you know,

Speaker:

great example of that. So, yeah,

Speaker:

fascinating stuff. Yeah. Awesome stuff. We gotta help you write this

Speaker:

book. I wanna read this book. Yeah. I want to raise

Speaker:

another point, the opposite of the point that you raised. Like, what happens with

Speaker:

autonomous cars, for example. People say,

Speaker:

let's focus on autonomous cars. So there will be

Speaker:

autonomous cars. Who is responsible for a car accident?

Speaker:

Accidentally, somebody was killed. You are the

Speaker:

owner. Somebody is the owner of the car. He sits

Speaker:

there. He bought the car, but the car killed

Speaker:

somebody. So

Speaker:

who is responsible? This is an open problem. This is, again, a

Speaker:

moral problem. So what I suggest here,

Speaker:

maybe it will take time,

Speaker:

I guess, is that if the car can have the

Speaker:

superego, the mechanism for morality, you know, just

Speaker:

the infrastructure for morality, it can take on the

Speaker:

morality of the human. And if it somehow

Speaker:

inherits the driver's morality, you

Speaker:

can blame the driver. I'll give you another example, which will maybe be much

Speaker:

more concrete. So we say now that there will be a ChatGPT for

Speaker:

every person, for every laptop and iPhone and whatever.

Speaker:

You will have your own GPT that follows

Speaker:

your own life, your own history. And the

Speaker:

discussion with this GPT will be very personalized and

Speaker:

very helpful. What happens in that case? So in that

Speaker:

case, if this GPT

Speaker:

takes on your responsibilities and morality, somehow we

Speaker:

can copy your morality into it. So if you're moral, it

Speaker:

will be moral. If you're not, it's not, but this is

Speaker:

your responsibility as a human. And I think this

Speaker:

is the way to go with that. We need just the infrastructure and not

Speaker:

the law. Anybody can define the law, and anybody

Speaker:

can break the law. We just need the infrastructure so that

Speaker:

at least the machine knows that it broke the law.

Speaker:

And this is really important, I think.

Speaker:

Oh, I totally agree. Totally agree. Well, we're,

Speaker:

gosh, we're coming up on time, Frank. Yeah. This was

Speaker:

awesome. So we'll just ask: any

Speaker:

book recommendations? Obviously, I, Robot, I think, would be good reading

Speaker:

in this space. You also mentioned Shakespeare too,

Speaker:

Richard the 3rd. So, Andy, I can add a book

Speaker:

which I'm reading now, which is

Speaker:

Vernon Subutex. It's

Speaker:

amazing. It's amazing. It's 3 books, and it actually

Speaker:

discusses whatever is not AI, anything which cannot be solved with

Speaker:

AI. It speaks about a person who has a vinyl shop,

Speaker:

a shop to sell vinyl. And then CDs arrive, and now he cannot sell

Speaker:

anything. So this shop is closed, and then he

Speaker:

tries to somehow manage, but he ends up on the street. He's, like,

Speaker:

homeless, and he meets many people. And, like,

Speaker:

every chapter is a different person or

Speaker:

a pair of people, and it's really

Speaker:

fascinating. It's all those things that you cannot solve with AI. It's all

Speaker:

the human interaction, the very, very basic human interaction. Amazing.

Speaker:

It was shortlisted for the Man Booker International Prize in 2018.

Speaker:

Nice. Where can folks find out more about

Speaker:

you? So I have a website

Speaker:

under Joseph Keshet, and they

Speaker:

can find me there. Excellent.

Speaker:

Any parting thoughts, Andy? No, just a great

Speaker:

interview. I appreciate that. 1 thing I would ask: could you repeat the name of

Speaker:

the book you just mentioned with the different stories?

Speaker:

What's the name of that book? It's a single

Speaker:

story. It's called

Speaker:

Vernon Subutex. It's from French. Oh, okay.

Speaker:

Amazing. Awesome. Excellent. That's it. That's

Speaker:

it for me. That was a great talk. Thank you. Excellent talk. Thank you.

Speaker:

And we'll let Bailey finish the show. Well, folks, that brings us to the end

Speaker:

of another enlightening episode of Data Driven. We've

Speaker:

navigated the fascinating intricacies of automatic speech

Speaker:

recognition, explored the moral quandaries of AI, and

Speaker:

pondered the future of technology with none other than 1 of the best minds

Speaker:

in the field, doctor Yossi Keshet. Remember, if you

Speaker:

enjoyed today's conversation, don't forget to subscribe to Data

Speaker:

Driven Media TV for exclusive video content.

Speaker:

You can also grab some fantastic merch, like the "My Data Is the

Speaker:

New Oil" t-shirt Andy's sporting today. And while Frank is

Speaker:

basking in the Appalachian sunshine, you can bet we're already cooking up the

Speaker:

next episode to keep your data-driven minds engaged and entertained.

Speaker:

Until next time, stay curious, stay informed, and

Speaker:

always keep questioning. Cheerio.