Welcome back to another riveting episode of Data Driven.
Speaker:Joining us today, lakeside and positively glowing from his
Speaker:Appalachian retreat, is Frank. Meanwhile, the
Speaker:always astute and ever energetic Andy is here to keep us
Speaker:grounded. But enough about us. Today, we have
Speaker:a true luminary in the field of AI, someone who's blending the worlds
Speaker:of academia and enterprise with seamless finesse. He's an
Speaker:associate professor at the Technion, has published over 100
Speaker:research papers on automated speech recognition, and is the chief
Speaker:scientist at aiOla. Please welcome doctor Yossi
Speaker:Keshet or as he's known to his friends, Yossi.
Speaker:Alright. Hello, and welcome to Data Driven, the podcast where we explore the
Speaker:emergent fields of artificial intelligence, data science, and,
Speaker:and, of course, data engineering, without which the whole world would probably stop turning.
Speaker:And you know, data engineering is important. That's
Speaker:basically it. Still working on that that that revamped
Speaker:monologue, for, for season 8, Andy. Were
Speaker:you on vacation? You're on vacation. I am on vacation. And
Speaker:for those of you who can't see on camera who are not who are
Speaker:listening, not watching, I am literally lakeside,
Speaker:in the foothills. Well, not the foothills. We are actually in the Appalachian Mountains. Or
Speaker:is it Appalachian? I've heard both. I never
Speaker:got a clear read on it. I say either. So, you know. When I say either.
Speaker:Yeah. Yeah. Yeah. Yeah. Yeah. So I am in Deep Creek Lake,
Speaker:Maryland, which is kind of like, Maryland doesn't really have a Panhandle
Speaker:per se, but if it did, this is what it would be.
Speaker:I think I'm probably 5 miles from West Virginia and about
Speaker:20 miles from Pennsylvania. So it's kind of like this quiet
Speaker:little corner of the state.
Speaker:And I've been, you know, reading and studying
Speaker:today. I hit day 600 consecutive on Pluralsight. Nice.
Speaker:We're recording this June 17th. And how are
Speaker:things with you, Andy? Things are good. I'm gonna throw out a plug for
Speaker:data driven media dot tv because Frank mentioned.
Speaker:If you're listening, he while he was mentioning that, he was
Speaker:actually panning the camera over to the lake. But if
Speaker:you're, subscribing to data driven media dot tv, you get
Speaker:to see us. You get to see the video, and you
Speaker:can see, for instance, that I am wearing the, my data is the
Speaker:new oil t shirt, which you can pick up. I'm just full of
Speaker:sponsor stuff today. I'm just doing... Well, it's all
Speaker:self sponsored. And, honestly, we really need to get better at that. Right? We have
Speaker:data channel dot tv. For listeners to the show, I will give
Speaker:a preview: Data Driven Academy is launching soon. You have
Speaker:a course coming up at the end of the month. Actually, yeah, it's Fabric.
Speaker:We're recording this on the 17th; it's the 24th
Speaker:of June. But I'm also doing 2 more at
Speaker:near the ends of July and August. And in addition
Speaker:to that, while we're shameless plugging away here,
Speaker:before we get to our very interesting guest, now I'm also bringing
Speaker:back my Day of Azure Data Factory, which was wildly
Speaker:popular. I delivered it at a couple of conferences,
Speaker:international conferences, in '22 and '23. And,
Speaker:yeah, let's see if people are interested. What do you do
Speaker:Friday afternoons, Andy? Oh, there's this thing, Frank. Thanks for
Speaker:mentioning that. Totally free. We're trying to get better at this. That's
Speaker:all. We do. Yeah. Data engineering Fridays. And if you go to data engineering
Speaker:fridays.com, you can learn more about that. Frank, you're doing a lot
Speaker:of stuff with I noticed with using the, encore
Speaker:replay feature in Restream. And it's
Speaker:right you you shared that with me. I started doing that with data engineering
Speaker:Fridays as well. But great a great way to,
Speaker:you know, to get your message out there. And, you
Speaker:know, I had no idea replays would help. But my gosh,
Speaker:they really have. It's just a matter of hitting the... I
Speaker:can't even talk... the algorithm the right way. Yeah. And Yeah. You know,
Speaker:maybe we can get the so I think it's a good segue, for our
Speaker:guest. Doctor Yossi, Keshet. He's the chief
Speaker:scientist at aiOla, an AI powered tech
Speaker:company that automates business workflows
Speaker:by capturing spoken data. Yossi is also
Speaker:an associate professor at the Faculty of Electrical and Computer
Speaker:Engineering at the Technion in Israel.
Speaker:Yossi is an award winning scholar and has published over 100 research
Speaker:papers about automated speech recognition and speech
Speaker:synthesis. Welcome to the show, Yossi. Hi.
Speaker:Nice to be here. Thank you for having me. Hey. No problem. No
Speaker:problem. We are very excited to have you. And, you're not just an
Speaker:academic, but you've also proven yourself in actual enterprise,
Speaker:which sounds really bad as I say it out loud, but I think you know
Speaker:it was meant as a compliment.
Speaker:But, so what is aiOla?
Speaker:Can you tell me a little bit about that? Because I'm curious about that and
Speaker:and and workflows
Speaker:around spoken data. So
Speaker:aiOla is a company that is aimed to target
Speaker:the, you know, the very basic and foundational
Speaker:industries. Maybe, if I
Speaker:may, let's start with the general scene of
Speaker:automatic speech recognition now, and then you will understand where aiOla stands, because we
Speaker:have OpenAI now and everything is like, you
Speaker:could say, we solved the AI problem. But it's not like that.
Speaker:So we are in amazing shape in
Speaker:terms of automatic speech recognition. We have a paper that shows
Speaker:that Whisper, the model from OpenAI, is as good as humans in
Speaker:detecting and transcribing language when we speak about
Speaker:American English, with noise, without noise, and
Speaker:also L2 speakers, that is,
Speaker:non-native speakers of the
Speaker:language. And the results show that Whisper, the
Speaker:OpenAI model, is the same as human listeners. And that is
Speaker:the main thing. But the thing is that
Speaker:when you come to industries, usually they have jargon, they have special words.
Speaker:And those words are either rare in
Speaker:the language or they are non-words.
Speaker:It's like, I don't know, when I'm a medical doctor and would like
Speaker:to perform a surgery and I would like to transcribe what I'm saying during
Speaker:the surgery, there are words which are not
Speaker:often used or which are non-English words. And
Speaker:in that case, those automatic speech recognizers don't
Speaker:work at all. They don't detect those words. And at aiOla, this
Speaker:is our target: to take those words, which are actually the most important words. Those
Speaker:are the jargon of the industry, of the facility.
Speaker:So the goal is to help those industries come
Speaker:up with automatic speech recognition for
Speaker:reporting, for transcribing speech.
Speaker:I have a question. When you say automatic, what what makes it automatic? Is
Speaker:it just kinda, what exactly does that mean?
Speaker:So automatic speech recognition today works very,
Speaker:very similar to the way ChatGPT works.
Speaker:ChatGPT works on a model called a transformer. It's a deep
Speaker:learning architecture, which has a
Speaker:history based on previous recurrent architectures.
Speaker:And it can predict, as we all know, it can
Speaker:predict text amazingly. In automatic
Speaker:speech recognition, it's almost the same thing, but there is another
Speaker:component to this transformer, which is called the encoder.
Speaker:This part takes the speech and actually transforms it into
Speaker:a great representation that can be used
Speaker:with the other side, with
Speaker:this, let's call it, GPT. Together, they can
Speaker:transcribe speech, as I described, in a very good
Speaker:way, as good as humans in some cases.
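For readers who want to try the encoder-decoder setup described here, below is a minimal sketch using the Hugging Face transformers pipeline with an open Whisper checkpoint. The model size and the audio file name are placeholder assumptions, and this is a generic transcription example, not aiOla's system.

```python
# Minimal sketch: transcribing audio with an encoder-decoder (Whisper-style) ASR model.
# Assumes the "transformers" library is installed and that "speech.wav" exists locally;
# the checkpoint name is an illustrative choice, not an endorsement of a specific model.
from transformers import pipeline

# The encoder turns raw audio into a learned representation; the GPT-like decoder
# then predicts the transcript token by token, as described in the conversation.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = asr("speech.wav")
print(result["text"])
```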
Speaker:I've been messing around with the app that's on the phone
Speaker:for ChatGPT, and
Speaker:I use the voice interaction feature. It is
Speaker:amazingly good at getting rid of the umms, the ahs,
Speaker:the scatterbrained thoughts that I sometimes have when I talk to it.
Speaker:Like, it can really distill a lot of
Speaker:things. I'm impressed with it. It's really gotten... the last time I
Speaker:did anything serious with speech recognition was probably maybe 4 years
Speaker:ago, and it's really improved. I mean, orders of magnitude more
Speaker:than I thought. It's almost at Star Trek level. You
Speaker:know? I'm not sure
Speaker:in those cases. It depends on the company, if it's Apple or
Speaker:Google. And I'm not sure which, they don't declare
Speaker:which models they use. I think, personally, they don't use Whisper or
Speaker:the latest models that we have for automatic speech recognition,
Speaker:that is, transcribing speech. And the goal is a little bit different
Speaker:in the in the phone. You actually want to maybe Right. Make,
Speaker:make notes, send an email, send a text message,
Speaker:and maybe the vocabulary the vocabulary is less
Speaker:less defined. There is another problem with
Speaker:the phones. Oh, no. Go ahead. I want to call my
Speaker:friend. His name is Xi, and
Speaker:the last name is Chung. How do you pronounce it?
Speaker:What do you do with that? Am I gonna say 'hee' or 'chee' or
Speaker:so there is a there is a problem of proper name and how do you
Speaker:define them. And this is a completely different problem. It's still an open problem, and
Speaker:the goal is a little bit different. So
Speaker:when we're assessing the quality of those models, it's
Speaker:a little bit different than the assessment of just spoken language
Speaker:like what we do now. No. I mean, that's a great point. I mean, my
Speaker:last name has, you know, technically is Lavin.
Speaker:But, you know, growing up for for reasons many,
Speaker:big and small, it became Lavinia. And like, so, like,
Speaker:the phone, depending on if it's Android or Apple,
Speaker:it gets confused pretty easily.
Speaker:And that is an interesting point. Some names, Andy is lucky to have an
Speaker:easy name for the, the system.
Speaker:But not everybody does. So I understand that. Sure.
Speaker:I also wanna double click on American
Speaker:English. You you you said that a bunch of times. Like, is there is there
Speaker:an inherent bias in these model trainings because these are done by American
Speaker:companies? Yes. There is. Okay. The
Speaker:data is mostly American English. The research institutes
Speaker:are mostly American. So maybe I don't know
Speaker:if you'd call it inherent or implicit bias, but there is a
Speaker:bias, definitely.
Speaker:We are investigating, by the way, the intelligibility
Speaker:of speech in some cases. What is the intelligibility
Speaker:for an American listener versus the intelligibility for
Speaker:myself, who is not an American listener but knows English?
Speaker:What is the best, quote unquote, speaker? What is the best
Speaker:listener? How can we transfer those
Speaker:to a speech recognizer? How can we transfer those to assessing the
Speaker:quality of speech? What does it mean about the pathologies in
Speaker:speech? And this is ongoing research in
Speaker:this field. Interesting.
Speaker:I I often wonder, like, you know, what it's not just English.
Speaker:Right? Like, you know, if you listen to Spanish, like, there's different dialects of
Speaker:Spanish. Right? Even even German. You know, I'm sure
Speaker:there's, you know, plenty of dialects of all these languages and,
Speaker:like, how do you do the training of a
Speaker:model where it can get to be as good at
Speaker:understanding dialect X versus dialect Y versus, you know,
Speaker:the base language, the base standard? I don't know. That's
Speaker:fascinating. It seems like it could be an endless loop of, like,
Speaker:training. It it is. Indeed, it
Speaker:is. And when we train, there is another thing. So I'm
Speaker:working on deep learning and AI, and what we found out is
Speaker:that it may be the case that if you train
Speaker:on 1 language, a huge amount of data from 1 language, let's say
Speaker:American English, but then train on less data in Spanish,
Speaker:you actually get some advantage from the training on
Speaker:the American English. So, again, in this modern Whisper from
Speaker:OpenAI, most of the data is American English, but,
Speaker:actually, other languages are really great.
Speaker:Again, Spanish is amazing. So maybe, like
Speaker:humans, as we learn more and more languages, it's easier
Speaker:for us. This is a very interesting point.
Speaker:No. That's an interesting idea because I know, like, I never
Speaker:understood American English grammar, American or otherwise,
Speaker:until I studied a foreign language. And then when I studied it, it was German.
Speaker:And, you know, German kept a lot of the archaic things that
Speaker:are in English and
Speaker:continued to keep them important. Like in English, you know, who
Speaker:and whom used to confuse the you know what out of me.
Speaker:Right? But when I when I learned in German about different cases and things
Speaker:like that, I was like, oh, that's why it is. Right? So,
Speaker:like, all these things that just like you said, like, learning another
Speaker:having more data or data from another point of view, I suppose,
Speaker:or another way to look at the world help me look at my world
Speaker:a little better. Maybe maybe that's how
Speaker:AI will work too. I don't know.
Speaker:Maybe. We don't know. We actually have a guess about that,
Speaker:because those networks actually solve an optimization problem,
Speaker:a mathematical optimization problem. It's a problem
Speaker:that we define with an equation, and we need to have
Speaker:a computer run and solve it. The equation is
Speaker:over a training set of examples. So it's, 1
Speaker:person said this, another person said something else.
Speaker:And what happens is that when, again, when we have
Speaker:a large amount of data,
Speaker:it seems that those networks get to an amazing place.
Speaker:So this algorithm, this Whisper or other
Speaker:algorithms, it's really from recent years, like the last 2, 3 years.
Speaker:That's it. They perform amazingly,
Speaker:with the
Speaker:same mechanism, not with the same amount of data.
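To make the "equation over a training set of examples" concrete, training such a network typically means minimizing an average loss over labeled examples, roughly of the form below; this is a generic empirical-risk sketch, not the exact objective of Whisper or any particular model.

```latex
\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\bigl(f_{\theta}(x_i),\, y_i\bigr)
```

Here $x_i$ is an input (for example, audio), $y_i$ the target transcript, $f_{\theta}$ the network with parameters $\theta$, and $\mathcal{L}$ a loss such as cross-entropy.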
Speaker:Yeah. That's the
Speaker:fascinating aspect of all of this. It's just that some of these things just seem
Speaker:some problems seem harder than they ought to be,
Speaker:and then some solutions to problems seem way more effective than they
Speaker:ought to be. It's also interesting to say,
Speaker:it's always the case... so Whisper, OpenAI's Whisper, was trained
Speaker:on 600,000 hours of speech. But this is
Speaker:way, way much more than just a kid learning a language.
Speaker:A kid learning a language is exposed to way fewer hours of
Speaker:speech, less accurate, less
Speaker:coherent. And this is something
Speaker:Noam Chomsky raised years ago, like, 50 years ago.
Speaker:And it's still an open question, like, whether we can make those
Speaker:systems work better if we know the language.
Speaker:I guess you learned German faster than any
Speaker:machine that works today.
Speaker:That's yeah. It's it's and I'm glad you mentioned Noam
Speaker:Chomsky because that kinda was like so for those who don't know, Noam
Speaker:Chomsky is, among other things, a noted linguist scholar.
Speaker:I highly recommend you do a search on him because that's a that's a
Speaker:good Wikipedia rabbit hole to fall into. But,
Speaker:how much does linguistics come up in this? Right? Because I think
Speaker:what's fascinating about this field for me is a lot
Speaker:of... my grandfather, my great grandfather,
Speaker:was a linguistics professor. And, you know, as the
Speaker:family lore goes, I never met him. He died a decade or 2 before I was
Speaker:born. He spoke, like, 12 languages. He was a professor of, like, 5
Speaker:or 6. And, you know, a lot of people in my family
Speaker:seem to have on that side of the family seem to be gifted in language.
Speaker:And 1 of the fields I was tempted to to study in
Speaker:university was linguistics. And I just find
Speaker:it interesting how
Speaker:the Venn diagram now is much larger
Speaker:than it used to be in terms of linguistics and computer science.
Speaker:So what are your thoughts? Like, how much does it come up? Like,
Speaker:if you have a
Speaker:company like aiOla, right, how many people are, you know, honest to
Speaker:goodness linguists versus computer scientists and AI engineers?
Speaker:So there are no linguists there. Oh,
Speaker:really? Okay. There are no linguists. But I have to tell you, there was
Speaker:a professor called Fred, Frederick Jelinek. He was the
Speaker:head of language research at Johns Hopkins University
Speaker:in Baltimore. He was amazing. He was 1 of the smartest
Speaker:people on earth. He
Speaker:developed many of the speech recognition algorithms. And he said,
Speaker:every time I fire a linguist, the performance of the speech recognizer goes
Speaker:up.
Speaker:And this is, this is embarrassing. But I
Speaker:myself, 1st, really like
Speaker:linguistics. I really like cognitive sciences, and I really
Speaker:try to combine it with my work. But it's really
Speaker:amazing that all those AI systems
Speaker:don't have any of that. So you don't train ChatGPT
Speaker:on what is a noun, what is a verb, what is anything. You don't train
Speaker:speech models that this is the...
Speaker:you don't use linguists. You don't say this is
Speaker:the prominent word, this is the end of the sentence. It just happens
Speaker:from a huge amount of data. And
Speaker:this is interesting. This somehow contradicts Noam Chomsky, who said that
Speaker:there is a universal grammar. There is,
Speaker:we are born innate with language. There is
Speaker:maybe some black box in our brain which
Speaker:is tuned to learn a language. And
Speaker:we are not sure about that. There is no direct proof whether it's correct or
Speaker:not. We are born with language. As humans, we're
Speaker:born with language. This is part of our human being.
Speaker:We are not born with written language. Written language was invented.
Speaker:Spoken language is something like, a zebra
Speaker:has stripes. This is our nature, and this is
Speaker:interesting. This is not happening in
Speaker:AI. The best successes didn't have linguists; they don't have any
Speaker:restriction on what should be said or not.
Speaker:Maybe maybe AI will be a tool to somehow
Speaker:make the linguist research more effective and
Speaker:try to understand what happened in the brain, what happened in the cognition part.
Speaker:But I would like to tell you about other research we are preparing here, which
Speaker:is really amazing. 1 of the things is that we have,
Speaker:so there is this ChatGPT. It's a language model.
Speaker:We also have something in the brain. It's also a neural network.
Speaker:And when we try to compare them, there is a huge
Speaker:correlation between what happens in the artificial neural
Speaker:network of GPT and the
Speaker:biological neural network in the brain. And it was
Speaker:shown several years ago, and here we
Speaker:show it again with the most modern
Speaker:automatic speech recognizers. So this is
Speaker:a phenomenal correlation between the artificial and the
Speaker:neural mechanisms. I was gonna ask about that
Speaker:because I'm I'm familiar with, you know, at least the abstracts of
Speaker:the research, from a few years ago and now. And
Speaker:I was curious if there had been any new correlations
Speaker:or, you know, or new research, new connections that have been made
Speaker:between machines learning languages
Speaker:and the way our brains work. It sounds like
Speaker:that's true.
Speaker:So we just initiated
Speaker:research here in my lab about that. There were
Speaker:some French researchers, mainly King
Speaker:and his colleagues at Meta, and
Speaker:I forgot the university in France. They
Speaker:showed that there are those correlations. They showed simple correlations, and
Speaker:they showed it with an LLM, with a language model. What we show is a little bit
Speaker:different. We show correlation with automatic speech
Speaker:recognition. So we put people under fMRI, under MRI.
Speaker:We scan their brain at some
Speaker:resolution, and we try to find correlation with their brain activity
Speaker:during reading and during speaking aloud,
Speaker:and ask what is the correlation with the best model we know for
Speaker:speech recognition. And there are correlations.
Speaker:I have to say that there is a mechanism in the transformer, this
Speaker:architecture of neural network, there is a mechanism called attention. This
Speaker:mechanism allows those models to have the connections between
Speaker:words and other words. So, 'I'm eating an
Speaker:apple. It was delicious.' The 'it' refers to the apple.
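Below is a minimal numpy sketch of the scaled dot-product attention being described, where each word's query is scored against every other word's key; the toy token vectors are invented purely for illustration and are not taken from any real model.

```python
# Minimal sketch of scaled dot-product attention, using only numpy.
# Real models learn the query/key/value projections during training;
# here the "token embeddings" are just random toy values.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each query scores every key; softmax turns the scores into weights that say
    # how strongly one word (e.g. "it") attends to another (e.g. "apple").
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))  # 5 toy tokens, embedding size 8
output, attn = scaled_dot_product_attention(tokens, tokens, tokens)
print(attn.round(2))  # row i shows how much token i attends to every other token
```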
Speaker:Okay? So there is an attention mechanism. This is what makes those
Speaker:models amazing. So there is an attention mechanism, I guess, in the
Speaker:brain. So we try to correlate this attention mechanism in
Speaker:the models and compare it to the activity in the brain. We don't have
Speaker:results yet, but it seems promising. And we also ask
Speaker:another question. What if you don't read aloud? What if you do
Speaker:silent reading? What if you have dyslexia? What if you have
Speaker:another type of pathology? What
Speaker:are the correlations then? So this is fascinating. And
Speaker:there is correlation. I still don't know what's going to happen
Speaker:with that. I don't know about the pathologies yet, but it's unbelievable, the
Speaker:correlation. That is really exciting,
Speaker:especially when you're examining things like dyslexia,
Speaker:which is considered, you know, not normal,
Speaker:or maybe that's not the right term for it, but a
Speaker:challenge at a minimum. The cool the cool kids call that neurodivergent
Speaker:now. I think Neurodivergent. Thank you, Frank. So when you're studying, you
Speaker:know, when you're studying that sort of thing, I'm wondering if there's a place for
Speaker:that in the artificial.
Speaker:I'm curious. What what do you mean? Can you
Speaker:So, yeah, is there is is there any benefit
Speaker:to, I say, transferring the thought processes
Speaker:of people who are neurodivergent and and automating that
Speaker:and making that part of the, you know,
Speaker:the the language model or or speech recognition?
Speaker:Yeah. I think so. I think so. 1st, it's a tool
Speaker:to analyze what happens in the
Speaker:brain. Yeah. What happens...
Speaker:but it's very difficult. We don't have any debugger for
Speaker:the brain. We don't see the code of the brain. We don't see that this
Speaker:function doesn't work. And most of the work
Speaker:is to design the experiment, and
Speaker:it's really amazing. In our design, we have the
Speaker:same... so as I told you, I'm asking people to read aloud
Speaker:and compare it to what automatic speech recognition
Speaker:is supposed to do. But I'm
Speaker:also asking people to read silently, and then I follow
Speaker:their eyes. I have a machine that follows their eyes, and
Speaker:I know where, like, I
Speaker:track their eyes and I see which word they are reading
Speaker:now. And I can use that to follow
Speaker:what they read. But in order to operate that on a speech
Speaker:recognizer model, I need the speech. So during the design of
Speaker:the experiment, I need artificial speech or I need them to read aloud
Speaker:afterwards. It's a big question
Speaker:how to do that properly and how to
Speaker:make things happen, but definitely working with
Speaker:people with problems, first, to help them,
Speaker:and second, to understand them, and 3rd, to maybe
Speaker:understand the brain and make AI better.
Speaker:I also think, like, stroke victims, right, could benefit down the line
Speaker:from a better understanding of lang language models. Right? Like, maybe there would be some
Speaker:kind of therapy that could be directed to that. I think I think it's
Speaker:fascinating. I always love those fields where they touch upon more than 1 thing.
Speaker:Right? This isn't just math. This isn't just computer science. Like, it's linguistics. But,
Speaker:you know, it's a little bit of everything. It's like a giant, like, pot of
Speaker:stew that you just throw a bunch of stuff in, and it all kind of
Speaker:mixes. And, like, it's kind of like, almost like intellectual gumbo,
Speaker:I guess, would be the word. Right? But,
Speaker:what,
Speaker:what drove you to make your
Speaker:company? Like, what was the driving force to
Speaker:say, hey. You know, we have
Speaker:I remember many, many years ago in an office, and you would always see
Speaker:doctors talking into these little, like, miniature recorders.
Speaker:Right? In the olden days, they would go off to
Speaker:some data center somewhere, and somebody would... not a data center, but, like,
Speaker:some typing center, call center, where people would
Speaker:transcribe that. You know, obviously, that is now an artifact of
Speaker:the past as these models have gotten better.
Speaker:What what was the goal in in in, your
Speaker:company to say we can do this better? What what was the the that breakthrough
Speaker:moment of, like, here's here's what the industry already does. Here's how we can do
Speaker:it better. So,
Speaker:we all know ChatGPT, and it influences our lives. Now,
Speaker:instead of Google, we search with GPT, and it's amazing. It's unbelievable.
Speaker:So I thought, what about the very fundamental industries? What
Speaker:about,
Speaker:like, when you check an airplane, you
Speaker:use a special jargon. You cannot touch anything. You cannot
Speaker:leave even a pen there because otherwise the plane wouldn't be
Speaker:valid for flight. What about industries like the food
Speaker:industries, when you need to report the process? You
Speaker:have gloves, you cannot touch an iPad, you can barely
Speaker:write. And what about other industries,
Speaker:like maybe chip technology, when you make nanotechnologies and
Speaker:when you make chips, you make, you know,
Speaker:silicon chips and silicon
Speaker:wafers. So you are covered all over.
Speaker:You are with gloves. You need to report the process. All
Speaker:those industries have special jargons. They use special
Speaker:terms to describe what they're doing. They don't have access
Speaker:to write something,
Speaker:and they are very limited in the way they report. And on the other
Speaker:end, we had speech recognition, but speech recognition doesn't work on
Speaker:those jargon words. Those jargon words are actually the
Speaker:most important to those industries, and this was the goal for
Speaker:aiOla. So what we do is we operate
Speaker:automatic speech recognition, the best automatic speech recognition,
Speaker:but we also operate something else, something called keyword spotting.
Speaker:It's another deep network, which is focused
Speaker:on detecting only the jargon words. So you can define those jargon
Speaker:words in advance. You don't need to train on them. You can just
Speaker:define them, and they all work together. They work as a
Speaker:complementary couple to make a
Speaker:very robust prediction, and we can detect those
Speaker:jargon words and do the reporting on the
Speaker:process just by speaking. So it
Speaker:can be used in any industry,
Speaker:any industry that doesn't
Speaker:have access to the most modern AI systems, where the speech
Speaker:recognizer wouldn't work. They have problems, like,
Speaker:writing and formulating their reports.
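As a rough illustration of the general idea of biasing a transcript toward a predefined jargon list, here is a toy sketch; it is not aiOla's keyword-spotting system, and the jargon terms, example sentence, and similarity cutoff are all invented for the example.

```python
# Toy sketch only: nudge a general-purpose ASR transcript toward a predefined
# jargon vocabulary by fuzzy-matching each transcribed word against the list.
# This is NOT how aiOla's keyword spotting works; it only illustrates the idea
# that domain terms can be supplied up front, without retraining a model.
import difflib

JARGON = ["stent", "angioplasty", "fibrinogen"]  # hypothetical domain terms

def bias_toward_jargon(transcript: str, vocabulary: list[str], cutoff: float = 0.75) -> str:
    corrected = []
    for word in transcript.split():
        # Snap near-miss transcriptions (e.g. "stint") to the closest jargon term.
        match = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

print(bias_toward_jargon("place the stint before the angioplastie", JARGON))
```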
Speaker:Yeah. So I'm curious how those work together. You mentioned
Speaker:that you've got the speech recognizer. You've got the keyword,
Speaker:engine. Are they 2 separate engines that are just always running
Speaker:maybe agents, running at the same time or are
Speaker:they encapsulated, say, is the speech
Speaker:recognizer does the speech recognizer have a, you know, a
Speaker:subset or a a function built into it to do the
Speaker:keyword recognition? So just to
Speaker:be sure, those keywords in some industries are
Speaker:not English words. So it can be a word which nobody
Speaker:knows about. It was not shown on
Speaker:the Internet, like, ChatGPT is trained on data from the
Speaker:Internet, and there are some words that are not there. This is
Speaker:your proprietary company. You have invented a word to
Speaker:describe what is this part of the engine. So,
Speaker:yeah, so we have this keyword spotting. It
Speaker:is trained to detect keywords in general. They are defined
Speaker:by text and it operates. We have 2 modes of operation. 1 of them
Speaker:works on the encoder part of
Speaker:the automatic speech recognition, and then it guides
Speaker:the speech recognition towards the correct
Speaker:transcription. And there is another mode, which is
Speaker:our own encoder, our own representation of
Speaker:speech, and then it also guides the automatic speech
Speaker:recognition to a better location and to detect those
Speaker:words. And, actually, we can show that you can combine...
Speaker:any word can be from different languages, and we can
Speaker:detect them, like, almost 100% correctly, those jargon
Speaker:words. That was... sorry. Go ahead.
Speaker:No. No. No. Sorry. That no. That's okay. That that makes perfect
Speaker:sense now, what you just said about the languages using
Speaker:multiple languages, you know, English plus all of the
Speaker:other languages because sometimes
Speaker:people will struggle if they're an English as a second
Speaker:language speaker. They'll struggle to find the right
Speaker:English word, and they'll substitute a word from their native language.
Speaker:And in other cases, they'll be perhaps teaching
Speaker:on a topic, and they may revert back
Speaker:to an older language, Greek, Latin, something
Speaker:like that. That may be part of the, the
Speaker:lecture or, you know, I could see that in
Speaker:medicine. I could see it in, you know, all all sorts
Speaker:of literature studies. I could see a lot of that. And that
Speaker:that kinda clicked for me as you were saying that that makes sense that you
Speaker:would have additional languages. Yeah. I also wonder about, like,
Speaker:conversational contexts. Right? Like, you know, Spanglish is a
Speaker:thing. Franglais is the French and
Speaker:English kinda mashed together, and I know that with other languages,
Speaker:whenever you have 2 groups of people kinda come together, like, you know, there's always
Speaker:some kind of weird mix of language that kinda
Speaker:just evolves, either naturally or forced. I mean, that's Right. That's another
Speaker:debate. Are you thinking Belter Creole? I know we're Belter, you know, I
Speaker:wasn't going there, but that's an excellent example.
Speaker:So, Yossi looks very confused. So there's a series of
Speaker:books, called The Expanse. It was an excellent TV show
Speaker:for about 6 seasons, and it's basically set, 2,
Speaker:300 years in the future.
Speaker:And as humans colonize the asteroid belt,
Speaker:people from all over the world kinda all end up living
Speaker:together. So, like, the Belter Creole language is a
Speaker:creole of, you know, literally dozens of languages. Right?
Speaker:So, like, it'll switch from, you know, Hindi to Arabic to
Speaker:English to French to... there's even some German in there. I've heard some of that.
Speaker:Like, there are these kind of weird mixes of things. Right? So they'll
Speaker:say the word for the Belter people, like,
Speaker:people who live in the Belt, is Beltalowda. Belt obviously comes from, you
Speaker:know, the asteroid belt, English. Lowda, I think, is a Hindi term. I
Speaker:think. Don't hate on me in the comments. Don't hate on me in the comments.
Speaker:But I know wallah is a Hindi term. Right? So
Speaker:they'll, you know, when they talk to people who live on Earth or
Speaker:Mars, they refer to them as well wallahs, gravity well
Speaker:wallahs. Right? Like, and I only know wallah because
Speaker:of dish wallahs, and Wired Magazine did a whole story about dish wallahs in
Speaker:the nineties. Anyway, but I mean, I think, like, you know, I
Speaker:I suppose that approach could work for something like a creole. Right? Like, we have
Speaker:multiple languages kinda mixed together. Or is that not really a
Speaker:massive business case?
Speaker:Creole is really complicated. It's a language. It's like a
Speaker:real language, and it's complicated. The more
Speaker:delicate case of that is what we call, in research, code switching. When
Speaker:I'm Right. When I speak Hebrew, for example, I don't have a
Speaker:word for, you know, the Internet router. So I say 'router' in
Speaker:English. Or I say 'email' or I will say,
Speaker:I don't know, there are so many words in English, especially
Speaker:in technology, that you use worldwide in other languages, and this
Speaker:is code switching. There is another case. I think Andy pointed it
Speaker:out, that sometimes when you are stressed,
Speaker:or let's say your L1 is Spanish but your L2 is American
Speaker:English, or you're bilingual, and sometimes when you are
Speaker:stressed, you just switch the 1
Speaker:word, and this is an amazing phenomenon. This is research with Tamar Gollan
Speaker:from UC San Diego and Matt Goldrick from Northwestern
Speaker:University. And we provide, again, a mechanism to detect
Speaker:that and to do research on that. And the key question is,
Speaker:like, why do you do that? Why and when do you do that? Is
Speaker:it stress? What is the state
Speaker:describing those? Are you gonna say it the American
Speaker:way, or the Spanish word, or is it gonna be vice
Speaker:versa? And this is really interesting.
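As a trivially naive illustration of tagging which language each word in a mixed utterance belongs to, the sketch below uses only character script (Hebrew vs. Latin); real code-switching detection of the kind described here is far more sophisticated, and the example words are invented.

```python
# Naive sketch: tag each word of a mixed Hebrew/English utterance by its character
# script. Real code-switching detection is much more involved; this only shows the
# shape of the task. The example words below are made up.
def tag_language(word: str) -> str:
    if any("\u0590" <= ch <= "\u05FF" for ch in word):  # Hebrew Unicode block
        return "he"
    if any(ch.isascii() and ch.isalpha() for ch in word):
        return "en"
    return "other"

# Roughly: "send me an email about the router", with the tech terms left in English.
words = ["שלח", "לי", "email", "על", "router"]
print([(w, tag_language(w)) for w in words])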
Speaker:It's not my field of research. I just know how to detect them
Speaker:and, and Interesting. To detect them really well,
Speaker:but I don't know why it happens and what is the mechanism
Speaker:behind that. I could definitely see,
Speaker:the opportunity with starting with being
Speaker:able to detect, you know, these I
Speaker:don't I don't know the right word for them. I'll I'll call them modes. You
Speaker:know, a mode of speech where someone is mixing 2
Speaker:languages. And I'm sure those vary.
Speaker:So Like when I go Jersey on you. Right? That's we we
Speaker:can't we can't say any more about that, Frank. We're trying to keep our
Speaker:clean rating. But yes. Exactly. But,
Speaker:that's sorry. Inside joke. But the,
Speaker:but, yeah, I could see modes of speaking where someone who is
Speaker:more familiar with English as a second language.
Speaker:And and they've still you know, of course, they know their native language. They'll always
Speaker:know that. But as they I don't I don't wanna use the wrong word
Speaker:here, but I'm thinking experience is probably the best word is they get more
Speaker:experience, gain more experience with their second language.
Speaker:They may switch words less or switch languages
Speaker:less. And detecting that, I think, is the
Speaker:is key. I understand now more about what what you're doing, what
Speaker:you're accomplishing. And that that's the
Speaker:very first step to then being able to produce speech
Speaker:in those different modes. And that would be a
Speaker:fascinating, you know, a fascinating accomplishment.
Speaker:If you do that... The more we can have machines
Speaker:speak to us in the language that we're most familiar with, that,
Speaker:of course, you know, is almost there now, mostly
Speaker:there right now, but have it be able to to speak to us in these
Speaker:different modes where we where the machine switches where it's
Speaker:back to our first language, you know, based
Speaker:on some algorithmic calculation. That sounds
Speaker:fascinating. Yeah. It is.
Speaker:I'm not sure we are there yet. It's we have a long way to go
Speaker:there. But, Sure. Yeah. Makes
Speaker:sense. Fascinating. Well, this is how it starts, though. Right?
Speaker:This is fascinating. This is, yeah, this is,
Speaker:somehow there is an elephant in the room. We may have to say
Speaker:something about AI and regulation and what happens now.
Speaker:And, if I may, I would like to say something about this, because I have
Speaker:a totally different point of view about that.
Speaker:Please. So everybody is speaking about
Speaker:regulation, and how it might be a catastrophic situation
Speaker:if those machines are connected
Speaker:together and they start to train themselves. They try to
Speaker:build a meta architecture and try to train themselves,
Speaker:and then they come up with something which is better than humans. Some people
Speaker:call it the singularity point. So this is frightening. They're smarter
Speaker:than us. Maybe they're gonna kill us all. And
Speaker:people speak about regulation now, and there are
Speaker:several institutes in Europe and in the US
Speaker:trying to tackle that. And that
Speaker:is amazing. That is really important, but I think we missed something here.
Speaker:And I'll tell you why. So there is a book. It's here.
Speaker:You know, Isaac Asimov, I, Robot. You probably
Speaker:know it. So, like, the first page of this book is the 3
Speaker:laws of robotics. A robot may not injure a
Speaker:human being or, through inaction, allow a human being to come to harm.
Speaker:A robot must obey orders, and so on. So let's say
Speaker:we have the regulation. AI cannot hurt humans. Okay?
Speaker:But that isn't enough. It's not good enough, because if the AI is smart
Speaker:enough, it will
Speaker:show us humans that it really obeys
Speaker:the laws, but it wouldn't. And this is frightening.
Speaker:And here I suggest to look a little bit at human morality
Speaker:and why humans have laws. So we need to
Speaker:think about, if I may, the
Speaker:human psychology. In human psychology, we have a mechanism to obey law.
Speaker:It's called the superego. It was defined by
Speaker:Freud. So we have a mechanism that if
Speaker:we don't obey a law, we feel either
Speaker:guilt or fear. And this mechanism was evolutionary.
Speaker:So if we have a group of monkeys, they obey
Speaker:the alpha monkey because they're frightened of him. They have some kind of
Speaker:primitive superego. We obey the law because either we're frightened of the
Speaker:police or we feel the guilt.
Speaker:It's like
Speaker:those experiments that show that somebody
Speaker:left something on the table, and we don't take it because we feel guilt or
Speaker:we feel something. So this mechanism, what
Speaker:I claim, should be transferred to the
Speaker:AI machine. This should be the regulation. So what is a superego? The superego
Speaker:is an infrastructure for being moral,
Speaker:and we need a digital version of that. This is the regulation we
Speaker:need. We need the infrastructure to be moral in the machine. And what
Speaker:does it mean? So superego means it's a little bit like
Speaker:self harm, if I may. It's like we feel guilt. We feel something bad if
Speaker:we do something not okay, if we don't obey the law.
Speaker:So it's like self destruction for the AI machine. So the AI machine,
Speaker:if it doesn't obey the law, should feel something. It
Speaker:cannot feel, so, right, it will destruct itself. So this is my
Speaker:claim. This is a book I'm writing, and this is something very fundamental.
Speaker:We all speak about this regulation, but I think it
Speaker:doesn't help just to do standard
Speaker:regulation. And if I may say another thing, the last thing is that
Speaker:if you read I, Robot carefully, so
Speaker:there are several short stories there, and he speaks about robots that
Speaker:obey the law. And if you look carefully at those robots that
Speaker:obey the law,
Speaker:all of them have a superego. They feel guilt.
Speaker:The first story is about a robot that plays with a girl,
Speaker:and he feels guilty about winning all the time. So he lets her win.
Speaker:So he feels guilt. It means that it has a superego.
Speaker:And then he feels frightened of the mother of the girl. And it's
Speaker:really amazing. So I think, so in
Speaker:this book I'm trying to describe the psychological concept of the superego
Speaker:and then describe why it needs to be moral and how we can
Speaker:find a way to put it in regulation, like the infrastructure
Speaker:itself and not just laws.
Speaker:That is a very interesting problem you're trying to solve.
Speaker:Very important problem at that. Agreed. And
Speaker:culturally speaking, in the US, we have a saying that you
Speaker:cannot legislate morality, where
Speaker:legislate and regulate would be, you know,
Speaker:synonyms. Exactly. Right? So Right. Right. And legal code
Speaker:is code. I
Speaker:definitely get what you're what you're saying. And I think it's super
Speaker:important. You mentioned you were writing a book about this. Now
Speaker:now now you have to tell me more because I wanna read this book.
Speaker:Same. I'm in the process of looking
Speaker:for an agent, and it's complicated. It's supposed
Speaker:to be a popular book trying to explain the psychology of Freud,
Speaker:what is the superego, the ego, and the id,
Speaker:and then describe what is the pathology. So we all have a pathology. So
Speaker:you have the pathology of, it's called,
Speaker:the criminal personality disorder. This
Speaker:person will not have a superego. It's like Richard the
Speaker:Third from Shakespeare. He didn't have a superego. He killed
Speaker:his family and didn't feel guilt. So this is what's
Speaker:going to happen with those machines. And then I
Speaker:give some literature examples of
Speaker:what is a superego, like from Crime and
Speaker:Punishment, that the guy killed the
Speaker:old lady, but nobody
Speaker:caught him killing the lady. He murdered her. Nobody caught him, but he
Speaker:still feels guilt. So he has a very big
Speaker:superego. And then I describe what happens in
Speaker:other moral theories of human beings, all of them connected to the
Speaker:superego. And then I try to describe a little bit how machine
Speaker:learning is trained, again, solving an optimization problem. And then I try
Speaker:to describe how we can do a superego, how we can have
Speaker:a digital superego, if we can. No.
Speaker:It's like you're giving it a conscience of of sorts. Exactly.
Speaker:Yeah. And I I just wanted to, to add, we
Speaker:may be able to help you. Maybe not find an
Speaker:agent, but find a publisher. Both Frank and I are
Speaker:published. And we, you know, we know
Speaker:Andy's got a lot of connections in publishing. Well That would be
Speaker:great. I am not, I just wrote a lot of books
Speaker:for different publishing houses, and I know some people that if
Speaker:they can't help you directly, they can probably point you to someone who
Speaker:can. And, again, I am wholly motivated by wanting to
Speaker:read this book. Same. Like, I think it's important
Speaker:because I live in the Washington DC area. Right?
Speaker:So so, like, there's a lot of people there who they're policy
Speaker:makers. Right? Like, and they just assume
Speaker:and I think a lot of humans fall for this. Right? You you see this
Speaker:when the European Union passed their AI regulation act.
Speaker:They assume that regulation's gonna solve all their problems.
Speaker:And I think regulations prove that 1 of the fundamental forces
Speaker:in the universe is is unintended consequences.
Speaker:And, you know, when you regulate something, you don't end
Speaker:the problem. You change the way people will route around it. Right? Like,
Speaker:and I think a good example of this in AI is the movie Megan, which
Speaker:I don't know if you've seen, or M3GAN, I'm not sure how to pronounce
Speaker:it, where I think she was about to torture...
Speaker:she was... I don't wanna give the plot away, but the robot
Speaker:child, like Chucky, kinda goes evil. Like, this is the
Speaker:basic kind of plot line, and the person who created her
Speaker:was like, you can't kill me because it's against your programming. And she goes, oh, I
Speaker:said nothing about killing you. I was gonna put you in a coma, and you'll
Speaker:live, you know, however many years. Like, it was just... I mean,
Speaker:that's a great example of, like, you know, don't kill. Right? Seems like a
Speaker:pretty reasonable instruction to give a robot, particularly a child's toy.
Speaker:Don't kill anyone. But, you know, she realized, like, well, kill
Speaker:equals death. So if I don't kill you, if I just hospitalize you or
Speaker:incapacitate you, that doesn't conflict with rule number 1.
Speaker:Right? Which I think is no. Obviously, as, you
Speaker:know, humans, we're like, well, it's not really the spirit of the
Speaker:law, or the rule. But clearly,
Speaker:the robot or the AI in this case, kind of figured it
Speaker:out. Like, I don't know. I think you're right. Like and any regulations like that
Speaker:too. Right? How many loopholes do people discover, whether it's
Speaker:tax laws or, you know, this. It's like, well, technically, it's
Speaker:legal. Is it actually, you know,
Speaker:what the law intended? No. Like, it's Yeah. You need
Speaker:almost something like a nuance engine,
Speaker:you see. Yeah. To get
Speaker:the machine to interpret
Speaker:the laws. And that's... I've read Asimov as well,
Speaker:big fan. And that's what happens downstream of
Speaker:the 3 laws as they begin to fail, because the
Speaker:robots are doing exactly what they're programmed to
Speaker:do. And they're
Speaker:finding ways that, in our opinion, human opinion,
Speaker:circumvent the 3 laws, but really don't
Speaker:break the robot's programming. And it's all about, you know,
Speaker:how do you define harm? Like, Frank's example is a great, you know,
Speaker:great example of that. So, yeah,
Speaker:fascinating stuff. Yeah. We gotta Awesome stuff. We gotta help you write this
Speaker:book. I wanna read this book. Yeah. I want to raise
Speaker:another point, the opposite of the point that you raised. Like, what happens with
Speaker:the autonomous car, for example. People say,
Speaker:let's focus on autonomous cars. So there will be an
Speaker:autonomous car. Who is responsible for a car accident?
Speaker:Accidentally, somebody was killed. You are the
Speaker:owner. Somebody is the owner of the car. He sits
Speaker:there. He bought the car, but the car killed
Speaker:somebody. So
Speaker:who... this is an open problem. This is, again, a
Speaker:moral problem. So what I suggest here,
Speaker:maybe it will take time,
Speaker:I guess, is maybe the car, if we can build the
Speaker:superego, the mechanism for morality, you know, just
Speaker:the infrastructure for morality, it can take the
Speaker:morality of the human. And if somehow it
Speaker:inherits the driver's morality, you
Speaker:can blame the driver. I'll give you another example, which will be much
Speaker:more maybe concrete. So we say now that there will be a ChatGPT for
Speaker:every person, for every laptop and iPhone and whatever.
Speaker:You will have your own GPT with your own life, that follows
Speaker:your own history. And the
Speaker:discussion with this GPT will be very personalized and
Speaker:very helpful. What happens in that case? So in that
Speaker:case, if this GPT
Speaker:will take your responsibilities and morality, somehow we
Speaker:can copy your morality and it will be part of it. So if you're moral, it
Speaker:will be moral. If you're not, it's not, but this is
Speaker:your responsibility as a human. And I think this
Speaker:is the way to go with that. We need just the infrastructure and not
Speaker:the law. Anybody can define the law, and anybody
Speaker:can break the law. We just need the infrastructure so that
Speaker:at least the machine knows that it broke the law.
Speaker:And this is really important. I think
Speaker:Oh, I totally agree. Totally agree. Well, we're
Speaker:gosh. We're coming up on time, Frank. Yeah. This was
Speaker:awesome. So we'll just any
Speaker:book recommendations? Obviously, I, Robot, I think, would be good reading
Speaker:in this space. You also mentioned Shakespeare too,
Speaker:Richard the 3rd. So... Indeed, a book
Speaker:which I'm reading now is
Speaker:Vernon Subutex. It's
Speaker:amazing. It's amazing. It's 3 books, and it actually
Speaker:discusses whatever is not AI, anything which cannot be solved with
Speaker:AI. It speaks about a person who has a vinyl shop,
Speaker:a shop to sell vinyl, and then CDs came, and now he cannot sell
Speaker:anything. So this shop is closed, and then he
Speaker:tries to somehow manage, but he ends up on the street. He's, like,
Speaker:homeless, and he meets many people. And, like,
Speaker:every chapter is a different person or
Speaker:a pair of people, and it's really
Speaker:fascinating. It's all those things that you cannot solve with AI. It's all
Speaker:the human interaction, the very, very basic human interaction. Amazing.
Speaker:It won the Booker Prize in 2018.
Speaker:Nice. Where can folks find out more about
Speaker:you? So I have a website
Speaker:under Joseph Keshet, and they
Speaker:can find me there. Excellent.
Speaker:Any parting thoughts, Andy? No. Just great great
Speaker:interview. I appreciate that. 1, I would ask if you'd repeat the name of
Speaker:the book you just mentioned with the different stories.
Speaker:What's the name of that book? It's not... it's a single
Speaker:story. It's by Despentes,
Speaker:Vernon Subutex. It's from French. Oh, okay.
Speaker:Amazing. Amazing. Amazing. Awesome. Excellent. That's it. That's
Speaker:it for me. But that's great talk. Thank you. Excellent talk. Thank you.
Speaker:And we'll let Bailey finish the show. Well, folks, that brings us to the end
Speaker:of another enlightening episode of data driven. We've
Speaker:navigated the fascinating intricacies of automatic speech
Speaker:recognition, explored the moral quandaries of AI, and
Speaker:pondered the future of technology with none other than 1 of the best minds
Speaker:in the field, doctor Yossi Keshet. Remember, if you
Speaker:enjoyed today's conversation, don't forget to subscribe to data
Speaker:driven media TV for exclusive video content.
Speaker:You can also grab some fantastic merch like the my data is the
Speaker:new oil t shirt Andy's sporting today. And while Frank is
Speaker:basking in the Appalachian sunshine, you can bet we're already cooking up the
Speaker:next episode to keep your data driven minds engaged and entertained.
Speaker:Until next time, stay curious, stay informed, and
Speaker:always keep questioning. Cheerio.