Welcome to Impact Quantum, the show that peeks under the hood of
Speaker:quantum computing to reveal what's emerging and why it
Speaker:matters. Today's episode is an absolute masterclass
Speaker:in quantum inspired data science, with a guest who
Speaker:quite frankly makes the rest of us feel like we're still stuck figuring
Speaker:out long division. Joining Frank and Candace is Dr.
Speaker:Marvin Weinstein, emeritus at Stanford University,
Speaker:bona fide particle physicist and co-creator of
Speaker:dynamic quantum clustering, a method that
Speaker:sounds like science fiction, but delivers real-world,
Speaker:potentially life-saving insight. Marvin takes us on
Speaker:a thrilling journey through brain cancer research, data
Speaker:agnosticism, and how a physicist wandered into
Speaker:biology and found patterns that even seasoned
Speaker:researchers had missed. This isn't quantum computing per
Speaker:se, it's quantum-mechanics-inspired analysis applied with
Speaker:surgical precision minus the surgical gloves.
Speaker:Whether you're a curious technologist or just here for the
Speaker:intellectual thrill ride, this one is for you. And
Speaker:no, you don't need a PhD to follow along. Just
Speaker:curiosity and perhaps a cup of strong tea. This episode
Speaker:is rated 5 Schrödingers. So buckle up and let's get
Speaker:into it.
Speaker:Hello and welcome back to Impact Quantum, the podcast where we explore the
Speaker:emergent field of quantum computing and
Speaker:the upcoming ecosystem that is going to spread around it.
Speaker:So you don't need to be a quantum physicist, but you do need to be
Speaker:curious about quantum computing. And with me today,
Speaker:as always, is the most quantum curious person I know, Candace Gillhoolley.
Speaker:How's it going, Candace? It's great. Thank you so much for asking. I'm really
Speaker:excited about today. Yeah. So I think today we actually have
Speaker:an honest-to-goodness physicist here
Speaker:as a guest. Absolutely. Amongst the many things that
Speaker:he's doing, I can say that he is a particle physicist at Stanford
Speaker:University as well as
Speaker:the CSO and co-founder of
Speaker:Quantum Insights Incorporated. He's got a
Speaker:lot of experience and a lot of great knowledge to share
Speaker:with our audience. I think everyone's going to find him as fascinating as I do.
Speaker:Cool, I hope.
Speaker:But I am a genuine quantum mechanic. That's right. There you
Speaker:go. So please welcome everybody, Marvin
Speaker:Weinstein to the show. How's it going? It's
Speaker:going well. As I was telling Candace, you got me in a very
Speaker:excited state today, so I hope I'm coherent.
Speaker:Awesome. Yeah. In the virtual green room, you had said you kind of
Speaker:uncovered something very interesting. So we can start there if you
Speaker:like. Well, yeah, I mean,
Speaker:basically the thing we were talking about
Speaker:during the previous interview was a tool that was developed, and that
Speaker:the company was founded for, to apply to
Speaker:various problems. And it's called dynamic quantum clustering.
Speaker:And it differs from other clustering
Speaker:algorithms, other data mining tools, in that it
Speaker:is completely unbiased. You can take a first look at data while
Speaker:making no assumptions about whether there is anything to be found in the data,
Speaker:without cleaning the data or
Speaker:labeling it in any manner, shape or form. You just look at the raw
Speaker:data. So
Speaker:for personal history reasons, I mean, Candace was telling me about somebody
Speaker:she knew whose partner or father had
Speaker:died of a brain tumor. But
Speaker:my first wife also died of glioblastoma.
Speaker:So when Quantum Insights decided
Speaker:to close its doors, I was sitting with all of this data from the Cancer
Speaker:Genome Atlas, including all of its glioma
Speaker:data. That means all low grade gliomas and
Speaker:glioblastoma data. So I had RNA sequencing data
Speaker:for all of those tumors. And basically
Speaker:I decided first thing I want to do with it is take a look
Speaker:at it and see if there's anything to see in that
Speaker:data. I mean, people have looked. This data set's old,
Speaker:so it's been around for a long time and has been heavily studied.
Speaker:People were totally sure that everything that there
Speaker:was to be extracted from that data set had been extracted from
Speaker:that data set. And so basically
Speaker:I said, well, nobody's looked with our tool.
Speaker:And so what did I do? Well, the first thing was, as
Speaker:I promised you, I simply loaded up the data. I did
Speaker:restrict the gene expression from the 60,000 genes
Speaker:that it comes with down to what everybody believes is
Speaker:the 20,000 so called protein coding genes.
Speaker:Not all of them code for proteins, but they're the list
Speaker:that various tools
Speaker:restrict to. So I wanted to stay within what other people were doing.
Speaker:So I looked at those 20,000 genes. Well, that's a lot of data. I mean
Speaker:that's a lot of noise. It's actually not a lot of data. It's only 600.
Speaker:I mean that's always a misconception. People say biologists have huge data
Speaker:sets. They really don't. I mean, for example, all of the
Speaker:cancer data is 692
Speaker:tumors, brain cancers.
Speaker:That's not a big data set. There are only 692 pieces of
Speaker:information. I don't care that there's 20,000 genes
Speaker:because all you're seeing is the effect of 692
Speaker:combinations of the expression levels for those genes. The whole
Speaker:data set can be reproduced from those 692 pieces
Speaker:of information. So
Speaker:it's not like a physics data set, which has millions of
Speaker:samples and stuff to look at. This is a biology data
Speaker:set, typically restricted to a specific disease.
Speaker:It's not huge. What it is, is
Speaker:complicated and really hard to see what's going
Speaker:on because there's so much noise.
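The claim that 692 tumors amount to only 692 pieces of information is a statement about matrix rank: a matrix with n rows has rank at most n, so a thin singular value decomposition (SVD) rebuilds it exactly from at most n components, however many gene columns there are. A minimal NumPy sketch, using random log-normal values and scaled-down, hypothetical sizes in place of the real tumor-by-gene matrix:

```python
import numpy as np

# Toy stand-in for a tumor-by-gene matrix: few samples, many features.
# Sizes are hypothetical and scaled down so the example runs quickly.
rng = np.random.default_rng(0)
n_samples, n_genes = 69, 2000
X = rng.lognormal(mean=1.0, sigma=1.0, size=(n_samples, n_genes))

# Thin SVD: X = U @ diag(s) @ Vt, with at most n_samples singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
rank = int(np.sum(s > s[0] * 1e-10))

# Rebuild the full matrix from those n_samples components only.
X_rebuilt = (U * s) @ Vt
print(rank, np.allclose(X, X_rebuilt))
```

Nothing here depends on the data being biological; it only shows why adding more genes does not add more independent samples.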
Speaker:So the first thing I did, as I said, was restrict
Speaker:the raw data. It's a matrix after all, rows
Speaker:and columns, okay? Every row is the expression level
Speaker:for 20,000 genes. I'm rounding the numbers off. You don't
Speaker:want the 20,312 all the time.
Speaker:So it's that. And there are
Speaker:692 rows in all. You feed that into DQC, which is
Speaker:made to ingest that quickly, you
Speaker:simply run the first analysis, and surprise. The first thing you see
Speaker:is there's a whopping big signal.
Speaker:In fact that data, raw, unprocessed,
Speaker:unlabeled, untreated in any way, no
Speaker:training set, separates into two clusters. One
Speaker:very large cluster which is mostly the lower grade
Speaker:gliomas, and another cluster which is
Speaker:almost all of the
Speaker:glioblastomas. Well, that's pretty
Speaker:cool. There's already a signal. It's not the best classification
Speaker:in the world. Maybe it's very good, it's competitive.
Speaker:But DQC has a standard trick which is
Speaker:you can pick out a smaller number of
Speaker:genes to look at, in this case a smaller number of features in the
Speaker:fancy language which
Speaker:give the same information. And so first run at that
Speaker:produced 544 genes and exactly
Speaker:the same picture. So I didn't have to look at 20,000; I
Speaker:had to look at 544, which were doing most of the heavy
Speaker:lifting and produced the same two clusters, the
Speaker:same not wonderful, but pretty good, classification
Speaker:scheme. Then there's another DQC based trick
Speaker:which is using the information in the two clusters, now
Speaker:I can order the genes that I'm looking at, the
Speaker:544, in order of their
Speaker:importance to the signal.
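The episode does not spell out how DQC orders the genes by importance, so the following is only a stand-in under that caveat: given the two cluster labels, score each gene by the absolute Welch t-statistic between the clusters and sort. The function name and the synthetic data (30 planted informative genes among 530) are hypothetical.

```python
import numpy as np

def rank_features_by_separation(X, labels):
    """Return column indices of X sorted by |Welch t| between clusters 0 and 1.

    Hypothetical stand-in for DQC's own importance ordering.
    """
    a, b = X[labels == 0], X[labels == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    std_err = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = diff / std_err
    return np.argsort(-np.abs(t)), t

rng = np.random.default_rng(1)
n_per_cluster, n_informative, n_noise = 100, 30, 500
labels = np.repeat([0, 1], n_per_cluster)
X = rng.normal(size=(2 * n_per_cluster, n_informative + n_noise))
X[labels == 1, :n_informative] += 3.0  # only the first 30 columns carry signal

order, t = rank_features_by_separation(X, labels)
# Take the top-k subsets, as in the 10, 20, 30, ... gene sweep described next.
for k in (10, 20, 30):
    X_top = X[:, order[:k]]
    print(k, X_top.shape)
```

In the workflow described in the episode, each top-k subset would then be re-clustered and the classification compared; that evaluation step is not reproduced here.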
Speaker:Then I looked at the first 10 genes, the first 20 genes, the first
Speaker:30 genes, and I did those analyses over and over. Each time I did
Speaker:it, starting from 10, I got pretty good
Speaker:classifications. 20 made it better, 30 made it better
Speaker:until I got up to 90 genes and then at
Speaker:100, 110, 120, everything
Speaker:stopped getting better and started to get worse. Interesting. So the
Speaker:cutoff was interesting. I wanted to look at the 90-gene signal
Speaker:because the cleanest information was going to be in the 90-gene
Speaker:signal. Did that, and sure enough, I find
Speaker:four clusters. So what are the four clusters? Three of
Speaker:those clusters are all low grade gliomas.
Speaker:100% low grade gliomas. They
Speaker:capture all of the low grade gliomas
Speaker:except for four tumors. The
Speaker:fourth cluster is all of the glioblastomas and
Speaker:those four that were not captured. Now remember,
Speaker:there were 692 tumors, I was missing four.
Speaker:So when you look at them plotted in the space,
Speaker:let's call it PCA space. You like that word? And it is the
Speaker:PCA space for the tumor expressions. Those four
Speaker:lie right next to the gliomas, whereas all the other data lie far away from
Speaker:the gliomas. Just for folks that may not know
Speaker:what PCA is, is it Principal Component
Speaker:Analysis? Principal component, yeah. PCA
Speaker:is a way of rotating the data so that
Speaker:the dimension of the data in which
Speaker:the data is most spread out is the first dimension. The dimension
Speaker:in which the data is next most spread out is the second.
Speaker:It tends, if you're lucky, in low dimensions
Speaker:to show you what you need to see in order to
Speaker:try to do clustering. Because most clustering algorithms
Speaker:deteriorate rapidly as the dimension of the data goes
Speaker:up, people like to do a hard dimensional reduction, as they
Speaker:call it, to two or three PCA directions
Speaker:and then try to cluster based on what they see there.
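The PCA description above translates into a few lines of NumPy: center the data, take an SVD, and the right singular vectors are the directions ordered from most to least spread out. A minimal sketch on a made-up, stretched 3-D cloud:

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the directions of largest variance."""
    Xc = X - X.mean(axis=0)                     # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = s**2 / (len(X) - 1)    # sorted largest first
    scores = Xc @ Vt[:n_components].T           # coordinates in PCA space
    return scores, explained_variance

rng = np.random.default_rng(2)
# Toy cloud spread mostly along the first axis, least along the third.
X = rng.normal(size=(500, 3)) * np.array([10.0, 3.0, 0.5])
scores, ev = pca(X, n_components=2)
print(scores.shape, ev.round(2))
```

The `ev` array is the spectrum one would inspect when deciding how many PCA directions to keep.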
Speaker:There are algorithms which work in higher dimension,
Speaker:but still there are things they struggle with.
Speaker:DQC doesn't really care. It doesn't start with a hard dimensional
Speaker:reduction. It simply works
Speaker:with whatever is showing the most information. If it's 6,
Speaker:if it's 10, if it's 16, if it's 50, that's fine, I
Speaker:don't care. I'll work in that. The only impact
Speaker:and price I pay is the time it takes to run the algorithm.
Speaker:And so the usual trick is you work in the lowest number
Speaker:of dimensions that appear to be noise free,
Speaker:which you can tell by looking at the spectrum that you see in PCA,
Speaker:and then work your way up to twice that number of
Speaker:dimensions and look again and if you see the same information,
Speaker:well, it's quicker to run everything in the lower dimension, but
Speaker:you don't stop if you see a change. So
Speaker:at any rate, granted, I got these four clusters.
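The rule of thumb described above (find how many PCA dimensions rise above the noise, then rerun in twice as many) could be sketched as follows. Where the noise ends is an assumption here: this version swaps in the Marchenko-Pastur upper edge for pure-noise eigenvalues, assumes the noise variance is known, and adds an arbitrary 20% safety margin. None of those choices come from the episode, and the function name is hypothetical.

```python
import numpy as np

def signal_dimensions(X, noise_var=1.0, margin=1.2):
    """Count PCA eigenvalues above a Marchenko-Pastur noise edge.

    Assumes i.i.d. noise of known variance; returns (k, 2 * k) so the
    analysis can be repeated in twice as many dimensions as a check.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.svd(Xc, compute_uv=False) ** 2 / (n - 1)
    mp_edge = noise_var * (1 + np.sqrt(p / n)) ** 2   # largest pure-noise eigenvalue
    k = int(np.sum(eigvals > margin * mp_edge))
    return k, 2 * k

# Synthetic test data: 3 strong directions hidden in 50-D unit-variance noise.
rng = np.random.default_rng(3)
n, p, true_k = 500, 50, 3
loadings = np.linalg.qr(rng.normal(size=(p, true_k)))[0].T  # 3 orthonormal rows
X = 5.0 * rng.normal(size=(n, true_k)) @ loadings + rng.normal(size=(n, p))
print(signal_dimensions(X))
```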
Speaker:Now you notice the only misclassification out of
Speaker:692 tumors is four tumors.
Speaker:So considerably less than 1%
Speaker:failure. So you're doing close to, like,
Speaker:1/7 of 1%, would you say? Yeah, yeah, yeah.
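For reference, the overall error rate behind the "less than 1%" figure is simple arithmetic on the numbers quoted: four misfiled tumors out of 692. This says nothing about per-class rates, which depend on how many tumors fall in each class.

```python
misclassified, total = 4, 692
error_rate = misclassified / total
print(f"error rate {error_rate:.2%}, accuracy {1 - error_rate:.2%}")
# error rate 0.58%, accuracy 99.42%
```

That is a little over half of one percent, consistent with "considerably less than 1%".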
Speaker:So I mean, I'm trying to quote the worst possible
Speaker:statistic that I can imagine, but less than 1% we can all
Speaker:agree on. So that would put it at over 99%. If I tell you
Speaker:from this analysis you have a low grade glioma, I'm
Speaker:100% accurate. Right. If I tell you you have a
Speaker:glioblastoma, I might be as much as
Speaker:2.5% inaccurate. On just the glioma question,
Speaker:that's better than world class. The current state
Speaker:of the art is closer to, like, 80/20. I've been trying to
Speaker:say how we compare, and I haven't succeeded yet. My
Speaker:collaborators, one bioinformaticist at
Speaker:Wisconsin and a cancer doc at Stanford,
Speaker:are going to have to help me with that. In searching the literature, what
Speaker:I find are statements like: most schemes for
Speaker:doing this, unsupervised from
Speaker:the raw data and then moving on to an internal
Speaker:analysis, still working from the raw data, get
Speaker:what they call an area under the curve, so the likelihood you're right,
Speaker:of 70 to 80%.
Speaker:Interesting. So we're not talking anything like the same. There
Speaker:are some special biomarkers. If they're found
Speaker:on a glioblastoma, then people are pretty sure
Speaker:it's a glioblastoma at maybe the 1% level.
Speaker:Okay, but separating
Speaker:glioblastoma from low grade gliomas, blind,
Speaker:they're nowhere near that good.
Speaker:So at any rate,
Speaker:that's what I found. I now have the world's best classifier.
Speaker:With 90 genes, I plot the gene expression levels
Speaker:for each one of those clusters. And for most of the genes,
Speaker:I see that the genes fall into one of two categories. The
Speaker:expectation for the expression of that gene
Speaker:either goes systematically up through
Speaker:the four clusters, going from the lowest grade glioma to the
Speaker:glioblastoma, or systematically goes down. That's
Speaker:what you want to see. Those genes are involved in what's happening,
Speaker:but there's still 544 genes.
Speaker:And I can't see the forest for the trees.
Speaker:Interesting. So does this inform treatment options?
Speaker:Well, that's the problem. Treatment options, or at
Speaker:least my understanding. So remember, I have to be
Speaker:very upfront. I'm not a biologist. Right. Everything
Speaker:you will hear me talk about, I learned by looking at this data. I have
Speaker:no formal training in biology whatsoever,
Speaker:so you're dealing with a novice. It reminds me of
Speaker:the original Star Trek show, where Bones would always say, I'm a doctor, not an
Speaker:engineer. And you're like, I'm a physicist, not an engineer. I mean, not a doctor, you
Speaker:know. So what initially inspired you to take
Speaker:all of your quantum mechanics knowledge and
Speaker:apply it to biological data? Yeah, so remember I'm the
Speaker:co-inventor of this algorithm. The other inventor is David
Speaker:Horn, Tel Aviv University, a frequent visitor to
Speaker:SLAC. We collaborated on many physics
Speaker:papers. And SLAC is not the
Speaker:messaging app. It's the Stanford Linear Accelerator. Is that right?
Speaker:Not quite. It used to be called the Stanford Linear Accelerator
Speaker:Center. So I'll tell you the story out of school, which
Speaker:reflects wonderfully on the DOE. At some point the
Speaker:DOE wanted to put its name on everything
Speaker:and trademark it.
Speaker:Well, SLAC said, or rather Stanford said, you can't trademark
Speaker:the Stanford Linear Accelerator Center. It's us, right?
Speaker:We run the place. So the DOE made
Speaker:SLAC change its name to SLAC, S-L-A-C,
Speaker:and call it the SLAC National Accelerator Laboratory. So I guess
Speaker:as an abbreviation we're now SNAL, not SLAC.
Speaker:SLAC sounds better. I grew up with it as SLAC for
Speaker:42 years. To hell with the DOE. I don't intend to
Speaker:listen to what they want. But it is now officially the SLAC
Speaker:National Accelerator Laboratory. Right. So at any rate,
Speaker:David Horn came into my office, and life went on as usual. He said, oh, I
Speaker:have something interesting to show you. Because he had kind of left high energy
Speaker:physics about eight years earlier and was looking
Speaker:into data mining. And he said there's this cool idea
Speaker:that grows out of
Speaker:something done by somebody called Emanuel Parzen, called the Parzen
Speaker:estimator. And I figured out I should think about it as a
Speaker:quantum potential. I already was very
Speaker:suspicious. It sounded like a very strange idea.
Speaker:And so we did our usual thing. We stood at the blackboard and
Speaker:yelled at one another for three or four hours. And
Speaker:then we came to a meeting of the minds and said, this really isn't a
Speaker:stupid idea. It's kind of cute. And
Speaker:you know, David showed me some simple problems
Speaker:having to do with classifying crabs. It's a standard
Speaker:old problem that people did,
Speaker:and it seemed to be very interesting.
Speaker:He said, but the problem is... Well,
Speaker:the idea behind it is very simple. You take
Speaker:all the data, you create a function. The property of this
Speaker:function is that wherever there's more data
Speaker:than in the surroundings, there should be a peak. And
Speaker:wherever there's less data, there should be a valley. The problem is,
Speaker:of course, the way you create that function is very sensitive to a parameter that
Speaker:you introduce. Okay, I don't want to get too
Speaker:messy in this. It's all published, so it can be
Speaker:read, but the sensitivity is hard to
Speaker:deal with. So what we finally
Speaker:understood was if we treated this... So this
Speaker:is just a professional deformation.
Speaker:Because we're particle physicists and quantum mechanics,
Speaker:we think of everything as having something to do with particle physics
Speaker:and quantum mechanics. This problem has nothing to do with particle
Speaker:physics or quantum mechanics, except we want to get rid of the sensitivity of
Speaker:that function. And we said, if you think of this as the solution to
Speaker:a problem in quantum mechanics, that problem has a
Speaker:term having to do with the particles moving around and another
Speaker:one having to do with the landscape it finds itself in.
Speaker:That's called a potential function. It turns out
Speaker:that potential function always has sharper
Speaker:features, more pronounced dips,
Speaker:than the solution has peaks,
Speaker:and turns out to be much less sensitive to the parameter that goes into
Speaker:building that function. So, literally, by
Speaker:asking what problem this picture is
Speaker:the solution to, which turns out to be
Speaker:trivial to solve for, that is, what potential function,
Speaker:you get a sharper picture. And the sensitivity of that potential function to the parameter used
Speaker:to build what's called the kernel function
Speaker:goes down by a factor of 10. So you have a pretty
Speaker:unique answer. It's easy to arrive at. You don't have to be careful about
Speaker:picking your parameters. The problem is if
Speaker:you think of this as things living. So we have these
Speaker:valleys now where the bulk of the heavy
Speaker:concentration of data is, or we have stream
Speaker:beds, but the data is up along the
Speaker:walls as well as being down in the
Speaker:valley. So the question is, which data belongs to which
Speaker:valley, which stream bed, et cetera.
Speaker:And so you want to move the points down the sides of the valley
Speaker:and have them collect in whatever structure is at the bottom.
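The construction described above can be sketched in NumPy. The Parzen function psi is a sum of Gaussians of width sigma centred on the data points; treating psi as the ground state of the Schrödinger equation -(sigma^2/2) lap(psi) + V psi = E psi and solving for V gives V - E = -d/2 + sum_i r_i^2 exp(-r_i^2 / 2 sigma^2) / (2 sigma^2 psi), where r_i is the distance to data point i and d is the number of dimensions. This follows the published quantum clustering formulation as far as it can be reconstructed here; the function names, fixing E so that the smallest V over the data is zero, and the toy 1-D data are assumptions of the sketch.

```python
import numpy as np

def parzen(points, data, sigma):
    """Parzen estimator: sum of Gaussians centred on the data points."""
    sq = np.sum((points[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma**2)).sum(axis=1)

def _v_minus_e(points, data, sigma):
    d = data.shape[1]
    sq = np.sum((points[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    # Subtracting each row's smallest squared distance before exponentiating
    # avoids underflow; the ratio below is unchanged by that common factor.
    w = np.exp(-(sq - sq.min(axis=1, keepdims=True)) / (2 * sigma**2))
    return (sq * w).sum(axis=1) / (2 * sigma**2 * w.sum(axis=1)) - d / 2

def quantum_potential(points, data, sigma):
    """Potential V whose Schrodinger ground state is the Parzen function.

    E is chosen (an assumption of this sketch) so that min V over the data is 0.
    """
    return _v_minus_e(points, data, sigma) - _v_minus_e(data, data, sigma).min()

rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(-3.0, 0.3, 50), rng.normal(3.0, 0.3, 50)])[:, None]
probe = np.array([[0.0], [3.0]])   # between the clusters, and at one cluster centre
psi = parzen(probe, data, sigma=1.0)
V = quantum_potential(probe, data, sigma=1.0)
print(psi, V)
```

The Parzen function peaks where data is dense, while the potential has its valleys there.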
Speaker:Well, people try that. In fact, my
Speaker:colleague had been trying it. And as the dimension goes up,
Speaker:for reasons we understand, that
Speaker:surface becomes rippled just due to noise.
Speaker:And so basically, if you try to just move things
Speaker:down using ordinary calculus, what's called gradient descent,
Speaker:you're just moving points in the direction of the slope. They get
Speaker:stuck in the ripples. Oh, I see. Because
Speaker:it can't find the global minimum. It can't find the important
Speaker:minimum. Right. If things move according to quantum
Speaker:mechanics, all bets are off. It's a much nicer
Speaker:story. And it's the uncertainty principle, which made the
Speaker:solution wider to begin with. So we're going to
Speaker:exploit the uncertainty principle. If I move points
Speaker:according to the laws of quantum mechanics, the first thing is, unlike
Speaker:gradient descent, the quantum wave function extends out to
Speaker:where the valley starts going up again.
Speaker:So points automatically start to slow down as they
Speaker:reach the minimum. And they don't overshoot and
Speaker:rattle around, they just stop because they see now
Speaker:equal influence from both walls and therefore no
Speaker:force. Also they don't see ripples because
Speaker:the uncertainty principle allows for quantum tunneling
Speaker:and they simply go through those tiny ripples or ride above them.
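The contrast with ordinary gradient descent is easy to reproduce in one dimension. In this toy example (not from the episode), the valley (x - 3)^2 / 2 has its bottom near x = 3, but small ripples 0.3 sin(20x) add steep local dips; plain gradient descent started at x = 6 settles into a ripple close to where it started. The step size and iteration count are arbitrary choices.

```python
import numpy as np

def rippled_valley_grad(x):
    """Derivative of the toy potential (x - 3)**2 / 2 + 0.3 * sin(20 x)."""
    return (x - 3.0) + 6.0 * np.cos(20.0 * x)

def gradient_descent(grad, x0, step=0.01, n_steps=5000):
    """Move a point downhill along the local slope only."""
    x = x0
    for _ in range(n_steps):
        x = x - step * grad(x)
    return x

x_final = gradient_descent(rippled_valley_grad, x0=6.0)
print(f"stopped at x = {x_final:.3f}; the valley bottom is near x = 3")
```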
Speaker:So as a way of making the data move
Speaker:and find the minima in the function in any number of
Speaker:dimensions and as a way of speeding up the
Speaker:analysis, because quantum evolution is done by matrix
Speaker:multiplication, it's enormously
Speaker:parallelizable. Didn't say that very well.
Speaker:Parallelizable. You get a very quick algorithm
Speaker:that uses physics principles, but to solve a non-physics
Speaker:problem: just getting the points efficiently down to the bottom.
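A toy version of moving a point according to quantum mechanics can be written with plain matrix operations. This is not DQC itself, which works with many points in many dimensions; it is a 1-D grid sketch under simplifying assumptions: the Hamiltonian H = -(1/2m) d^2/dx^2 + V uses a finite-difference Laplacian, the potential is a made-up rippled valley 0.5(x - 3)^2 + 0.3 sin(20x), and exp(-iHt) is applied through the eigenvectors of H. A Gaussian wave packet starts at x = 6 and its expected position <x> is tracked. Because the packet is wider than a single ripple, the ripples largely average out of <x> in this toy.

```python
import numpy as np

def expected_positions(V, x, x0, width, mass, times):
    """Evolve a Gaussian wave packet in potential V on a uniform 1-D grid.

    Returns <x>(t) for each t in `times`. H uses a finite-difference
    Laplacian (zero boundary conditions); exp(-iHt) is applied in the
    eigenbasis of H, so each time is just a matrix-vector multiplication.
    """
    dx = x[1] - x[0]
    n = len(x)
    laplacian = (np.diag(np.ones(n - 1), -1) - 2 * np.eye(n)
                 + np.diag(np.ones(n - 1), 1)) / dx**2
    H = -laplacian / (2 * mass) + np.diag(V)
    energies, modes = np.linalg.eigh(H)

    psi0 = np.exp(-((x - x0) ** 2) / (2 * width**2)).astype(complex)
    psi0 /= np.linalg.norm(psi0)
    coeffs = modes.T @ psi0                  # components along eigenstates of H

    result = []
    for t in times:
        psi_t = modes @ (np.exp(-1j * energies * t) * coeffs)
        prob = np.abs(psi_t) ** 2
        result.append(np.sum(x * prob) / np.sum(prob))
    return np.array(result)

x = np.linspace(-3.0, 9.0, 601)
V = 0.5 * (x - 3.0) ** 2 + 0.3 * np.sin(20.0 * x)   # hypothetical rippled valley
mean_x = expected_positions(V, x, x0=6.0, width=0.5, mass=1.0, times=[0.0, 0.5, 1.0])
print(np.round(mean_x, 2))
```

Over longer times this toy packet keeps oscillating in the nearly harmonic valley; the sketch only illustrates the initial downhill motion past the ripples.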
Speaker:If there is a riverbed, that tells
Speaker:you something about the data. It says there's some one-parameter
Speaker:thing, some regression on the data that you can do, something
Speaker:that's very extended. It's a huge discovery.
Speaker:It's much better than finding simple clusters.
Speaker:But that's what this does. So DQC has
Speaker:advantages. One, it doesn't require training sets.
Speaker:So it's great for biology data, because annotated
Speaker:training sets that are really good are hard to come by.
Speaker:So this is interesting and what does DQC stand for?
Speaker:Dynamic quantum clustering, meaning we're using quantum mechanics
Speaker:to find the minimum. Now, do you need a quantum computer to do this,
Speaker:or is this just an algorithm? Interesting. I told you I'm here under
Speaker:false pretenses. You asked me here to talk about quantum
Speaker:computing. And I told you I don't do quantum computing. I'm talking about using
Speaker:quantum mechanics to run on an ordinary
Speaker:computer. Could it run on a quantum computer? Yes, if
Speaker:they were really as fast and as good as they say they're going to be,
Speaker:it would be even better, because it could handle bigger problems. I'm
Speaker:focusing on biology, by the way, but this algorithm is data
Speaker:agnostic, right? It's not talking about
Speaker:biology per se, it doesn't care. It just says that
Speaker:there's something interesting. Data is not distributed with
Speaker:equal density every place. Things that are more like one
Speaker:another tend to be located in a more dense region.
Speaker:Okay, so this has been applied to many things. It's been
Speaker:applied to
Speaker:finding radioactive sources in the city of Chicago hidden
Speaker:in a building. There's a paper that I wrote on
Speaker:that. It's been applied to, and I guess
Speaker:there's no paper on this, but it was a problem I did for somebody,
Speaker:finding
Speaker:tanks in the desert that have been camouflaged, painted,
Speaker:same thing, using the data from a
Speaker:multispectral, hyperspectral camera.
Speaker:So it doesn't care what the data is. It's data agnostic,
Speaker:it's feature agnostic. It is
Speaker:unsupervised completely. That doesn't mean that you don't use the results
Speaker:of a previous analysis to now supervise the next analysis
Speaker:based on what you learned. You do do that.
Speaker:But at any rate, that was it. So what's now
Speaker:going on is, as I said, we have the world's best
Speaker:classifier. But I don't know how to tell you what the
Speaker:best drug for your tumor, the one that's most likely
Speaker:to work on, the biology that's happening now, should
Speaker:be. And that's why I need to go find a biologist and they're
Speaker:not so great at doing it either. So witness how
Speaker:many people go through many, many failed drugs. Yeah,
Speaker:well, precision medicine is definitely, you know, one
Speaker:of the biggest outcomes of using
Speaker:this type of clustering that
Speaker:we can create. I mean, there's just
Speaker:so much out there that needs this type of, you know,
Speaker:this type of... It's still in its infancy. It's got a ways to go to
Speaker:be precision medicine. Where do you... Oh, go ahead.
Speaker:Oh, please, don't let me interrupt. What did you have to say? I was going to say,
Speaker:based on dynamic clustering, quantum clustering,
Speaker:you know, where do you see it evolving in the...
Speaker:So I'll finish telling you this story about why I'm excited because
Speaker:I think it's evolving to a really... I've
Speaker:seen something today I never thought I would
Speaker:see. So last night it showed up at 10 o'clock
Speaker:in the evening and I'm still digesting what I saw.
Speaker:What I show you, you should take with a grain of salt. But there's no
Speaker:question, there's zero chance that I'm wrong in terms
Speaker:of what you'll see. Okay, so the way
Speaker:docs, or cancer researchers, like to look at the problem
Speaker:is they talk about so-called
Speaker:biological pathways. Biological
Speaker:pathways are sets of genes
Speaker:which carry out some process. In the end, all processes
Speaker:are making proteins, but we're not looking at the proteins being
Speaker:made, but we know these sets of genes are
Speaker:functioning together to produce an interesting
Speaker:output. So if I can
Speaker:take the information I have and find
Speaker:a way of saying, oh, so in fact what I'm
Speaker:seeing is actually predicted by the following
Speaker:set of genes. And I can assign
Speaker:meaningful coordinates to each tumor
Speaker:based on where they are and what that set of
Speaker:genes is doing together. I mean, biospace:
Speaker:a point in that space, depending on how many
Speaker:things I'm producing. One axis in biospace
Speaker:represents a process which is
Speaker:happening in the patient where a bunch of genes are
Speaker:telling me something, not one. And
Speaker:that bunch of genes I can look at and ask what are their
Speaker:properties? What are their common properties?
Speaker:So I will share something with
Speaker:you. So at any rate, did that.
Speaker:Okay. Went to biospace using DQC
Speaker:methods again. Remember I told you I had four clusters. So
Speaker:there are six pairs of clusters which
Speaker:differ in how the genes are being expressed in those clusters.
Speaker:So I can find the most, the list of the most important ones between
Speaker:1 and 2, 1 and 3, 1 and 4, 2 and 3,
Speaker:2 and 4, 3 and 4. So
Speaker:six possible axes
Speaker:in biospace, the sets of genes that are most important.
Speaker:And then using those axes which go from minus something
Speaker:to plus something, I can assign a coordinate to every one of the
Speaker:tumors. So I have points in a six dimensional space.
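With four clusters there are 4 x 3 / 2 = 6 unordered pairs, hence six axes. A sketch of building one gene set per pair, assuming (this is not the guest's stated method) that genes are ranked by the absolute difference of the two cluster means divided by their pooled standard deviation; the function name, `top_k`, and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np

def pairwise_gene_sets(X, labels, top_k):
    """For every pair of clusters, return the top_k genes separating that pair."""
    axes = {}
    for a, b in combinations(np.unique(labels), 2):
        xa, xb = X[labels == a], X[labels == b]
        pooled_sd = np.sqrt((xa.var(axis=0, ddof=1) + xb.var(axis=0, ddof=1)) / 2)
        score = np.abs(xa.mean(axis=0) - xb.mean(axis=0)) / pooled_sd
        axes[(int(a), int(b))] = np.argsort(-score)[:top_k]
    return axes

rng = np.random.default_rng(6)
labels = np.repeat([1, 2, 3, 4], 50)
X = rng.normal(size=(200, 100))
for c in (1, 2, 3, 4):
    X[labels == c, c - 1] += 5.0      # synthetic: cluster c over-expresses gene c - 1

axes = pairwise_gene_sets(X, labels, top_k=2)
print(list(axes))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```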
Speaker:Okay. The way that's done,
Speaker:it's done in a way such that zero on
Speaker:that axis means that, for that set of genes,
Speaker:that point is consistent with the average value of
Speaker:all of the genes in that set.
Speaker:Plus means you are moving
Speaker:x standard deviations away from
Speaker:being at the average expression. So I don't need to know what the
Speaker:normal expression of a gene is. That's always one of the
Speaker:problems. You rarely have data for normal
Speaker:cells of the same type as the tumor.
Speaker:And so you don't know where to set your zeros. Here I'm doing it by
Speaker:the average, and I'm saying how many standard deviations out am
Speaker:I one way, and how far out am I the other way?
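The coordinate described here is essentially a z-score: for each gene in an axis's set, subtract that gene's average over all tumors and divide by its standard deviation, so 0 means average expression and +1 means one standard deviation above it. How the per-gene z-scores are combined into one number is not stated, so this sketch assumes a plain mean, with a hypothetical `signs` array that flips genes expected to go down along the axis.

```python
import numpy as np

def biospace_coordinate(X, gene_idx, signs):
    """One coordinate per sample (row of X) for a given set of gene columns."""
    sub = X[:, gene_idx]
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0, ddof=1)  # 0 = average expression
    return (z * signs).mean(axis=1)                          # mean signed z-score

rng = np.random.default_rng(7)
X = rng.lognormal(mean=2.0, sigma=0.5, size=(50, 10))  # made-up expression values
coords = biospace_coordinate(X, gene_idx=[0, 1, 2], signs=np.array([1, -1, 1]))
print(coords.shape, coords[:3].round(2))
```

Because the zero point is the average over tumors rather than a normal-tissue baseline, no normal-cell reference is needed.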
Speaker:And so you plot the same tumors.
Speaker:Now I have to remember what I do. I go to Share,
Speaker:share the screen. So you plot the same set of
Speaker:tumors. Now you see my background. Yes. And I am going to
Speaker:switch over to the computer in my basement and show you a fun thing.
Speaker:So the axes you see here are
Speaker:DQC's plotting of
Speaker:the cancers in a six dimensional biospace.
Speaker:But I want you to see, blues are glioblastomas,
Speaker:reds are the lowest grade gliomas,
Speaker:magentas are the next lowest grade gliomas.
Speaker:And the golds
Speaker:are closest to the glioblastomas.
Speaker:Interesting. Now I told you this is an animation. We're going to start the points.
Speaker:This is how DQC works. Okay. So we're moving the points
Speaker:downhill.
Speaker:You like that? Yeah. So it's all. What's
Speaker:happening, they're all converging into one like a
Speaker:regression, right? Right. There's a one-dimensional
Speaker:shape. The healthiest tumors, they're not
Speaker:healthy, but they're the healthiest. They're the least awful.
Speaker:Yeah, the least awful. So if I look at this...
Speaker:We already saw that when we analyzed the... So the colors
Speaker:here are the clusters that I discovered
Speaker:in RNA sequencing space in what we call
Speaker:gene space.
Speaker:They've just been arranged in a line from best to
Speaker:worst. So blue is the worst. The
Speaker:glioblastomas over here. Okay. Okay. So
Speaker:for those listening, don't worry, we're gonna link in the show notes to a video
Speaker:representation of this. Interesting. At
Speaker:any rate, this is what you see.
Speaker:So the. This is the plot in bio space. Now that's very
Speaker:interesting because these have the dimensions of the bio coordinates
Speaker:and those coordinates have a meaning.
Speaker:Okay. In fact, I'll tell you what the meaning is. And this is based
Speaker:upon a set of data of patients.
Speaker:Yes, this is 692 patients.
Speaker:That data was submitted to the Cancer Genome Atlas. Okay. Oh, so this
Speaker:is open source data that you're pulling. This is absolutely open source.
Speaker:What my company did when we existed, because we had various
Speaker:projects going and things like this, we downloaded all of
Speaker:that data for the RNA sequencing data and as much
Speaker:as we could find about each of those tumors, which was
Speaker:not a hell of a lot. But there's something. It's a good
Speaker:database. As I said, it's been studied for years and years and years.
Speaker:So this is results obtained by starting from no information
Speaker:and just relooking at the brain cancer data
Speaker:and saying, people have been studying this forever. Did they ever
Speaker:find anything like this? And the answer is no.
Speaker:This has never been discovered. This has never been discussed. So
Speaker:using traditional analytical approaches,
Speaker:you could not get at this information
Speaker:without doing the dynamic...
Speaker:Because you make a lot of assumptions about what you're supposed to look at. You
Speaker:make a lot of assumptions about how you filter the data.
Speaker:You end up throwing the baby out with the bathwater.
Speaker:Go ahead. No, you also said you didn't do any cleanup of the data. Like,
Speaker:that's just the... No. Well, I mean, they've obviously cleaned it up at
Speaker:some level. But we're not doing the post-processing
Speaker:cleanup that people normally do, where they filter out
Speaker:genes, where they say this gene should be expressed at
Speaker:least at this level, and all genes that aren't expressed at that
Speaker:level, we're throwing out of the data set. Okay.
Speaker:If I see a difference between two
Speaker:clusters and the genes are expressed differently in the two clusters,
Speaker:but what they call the fold change isn't big enough,
Speaker:I'm throwing it out of the data. Well, you can imagine,
Speaker:if there's hidden information in the data and you're busy throwing things
Speaker:away, the chance you throw the baby out with the bathwater
Speaker:is very high. Exactly. And
Speaker:that's exactly what this shows. The benefit of going in
Speaker:unbiased, unfiltered,
Speaker:completely agnostic. Look to see if there's a signal first.
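A small illustration of how the usual hard filters can discard a real signal. The gene below is synthetic and the thresholds (a minimum mean expression of 5 and a minimum |log2 fold change| of 1) are illustrative, not taken from any specific pipeline: its expression is low and its fold change small, yet it separates the two clusters very consistently.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
# Hypothetical gene: low expression, small but very consistent shift.
a = rng.normal(loc=2.0, scale=0.05, size=n)   # cluster A
b = rng.normal(loc=2.6, scale=0.05, size=n)   # cluster B

log2_fold_change = np.log2(b.mean() / a.mean())
t_stat = (b.mean() - a.mean()) / np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)

# Typical hard filters (thresholds are illustrative assumptions)
passes_expression = max(a.mean(), b.mean()) >= 5.0
passes_fold_change = abs(log2_fold_change) >= 1.0
print(passes_expression, passes_fold_change, round(float(t_stat), 1))
```

Both filters reject the gene even though its t-statistic is enormous.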
Speaker:And then when you see the signal, which I did. So stage one is,
Speaker:wow, there's a signal. Stage two, what is making the
Speaker:signal? DQC is built for solving those
Speaker:problems. Right. So basically, and
Speaker:that's where it differs from AI, okay, AI needs training
Speaker:sets for the most part. There are
Speaker:versions of AI now that claim not to, and
Speaker:they're real. They make up data in order to train
Speaker:the model, because there aren't enough training sets.
Speaker:So what you do instead is you make up artificial data and then try
Speaker:to teach it to reconstruct the real data,
Speaker:okay, by picking the parameters in the artificial data,
Speaker:and then you try to classify existing data.
Speaker:But it's a different story here.
Speaker:Everything is understood. The algorithm is totally
Speaker:prescriptive. I know exactly what's going on.
Speaker:There's no mystery. Once I found something,
Speaker:we ended up, as I just showed you, with this concept of
Speaker:biospace. It turns out the idea
Speaker:to look at it this way came from the literature, from what people were
Speaker:talking about as latent coordinates in the data.
Speaker:So there are people doing AI that say, oh, I'm going to keep feeding
Speaker:AI from this and AI is going to reduce my problem to
Speaker:some low dimensional manifold and I'll call that a latent
Speaker:coordinate picture. But then I'm faced with the problem. I don't really know
Speaker:what the coordinates mean. I am busy trying to interpret
Speaker:them and I certainly don't know how to exploit them.
Speaker:It's different here. Right. So we started with no training data.
Speaker:What am I looking at here? I shouldn't be showing
Speaker:you this, but my collaborators say I can show it to you.
Speaker:So here are the axes. So what do
Speaker:you know from these axes? Well, the
Speaker:genes on this axis have, as I
Speaker:say, tumor-associated fibroblast activation,
Speaker:there's immune checkpoint gene
Speaker:signaling, chemokine-driven inflammation, the pathways that are
Speaker:being recruited for this or that. Basically
Speaker:here it says if you want to
Speaker:change overexpression or under expression, you want to look at
Speaker:the drugs which do the following thing.
Speaker:There's one such description for every one of the six axes;
Speaker:they have a meaning. And so if I simply look at the
Speaker:coordinates in biospace and see along which of these
Speaker:axes the biggest signal lies,
Speaker:that's the first set of drugs you try on the tumor.
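The selection rule just described (find the biospace axis along which a tumor's signal is largest, start with the drugs associated with that axis, then fall back to the next axis) can be sketched in a few lines of Python. This is a hedged illustration, not the method's actual code: it assumes a tumor is already represented by six biospace coordinates, and the axis names, drug-class labels, coordinate values, and the use of absolute value as "signal size" are all hypothetical stand-ins.

```python
from typing import List, Sequence, Tuple

# Hypothetical labels: the real six biospace axes and their associated drug
# classes are not specified in this conversation.
AXIS_TO_DRUG_CLASS = {
    "axis_1": "drug class 1",
    "axis_2": "drug class 2",
    "axis_3": "drug class 3",
    "axis_4": "drug class 4",
    "axis_5": "drug class 5",
    "axis_6": "drug class 6",
}
AXES = list(AXIS_TO_DRUG_CLASS)  # dicts preserve insertion order


def rank_axes(coords: Sequence[float]) -> List[Tuple[str, float]]:
    """Order axes by the size of the tumor's coordinate along each one.

    Assumption: |coordinate| measures signal strength; the sign is kept so a
    reader can tell over- from under-expression of that axis's processes.
    """
    if len(coords) != len(AXES):
        raise ValueError(f"expected {len(AXES)} coordinates, got {len(coords)}")
    pairs = list(zip(AXES, coords))
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


def treatment_order(coords: Sequence[float]) -> List[str]:
    """Drug classes to try first, second, ... following the axis ranking."""
    return [AXIS_TO_DRUG_CLASS[axis] for axis, _ in rank_axes(coords)]


# A made-up tumor placed in biospace.
tumor = [0.2, -1.5, 0.7, 0.1, 2.3, -0.4]
for axis, value in rank_axes(tumor):
    direction = "over" if value > 0 else "under"
    print(f"{axis}: {value:+.1f} ({direction}-expressed)")
print(treatment_order(tumor)[:2])  # first and second choices
```

The same ordering also gives the "drop down to the next most likely" fallback mentioned later in the conversation: if the first drug class fails, move to the next entry in the list.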
Speaker:So by looking at the tumor in biospace, and at how it
Speaker:evolves in biospace,
Speaker:that's what this is, right? The evolution of the tumor
Speaker:in biospace. Every one of these points, after all, is a
Speaker:snapshot in time of the tumor at that point.
Speaker:What this suggests is it's a continuous
Speaker:evolution to glioblastoma through these
Speaker:biological processes, and how they change.
Speaker:So basically, what
Speaker:have I learned? God is showing me, or biology is
Speaker:showing me how the tumors evolved in
Speaker:time.
Speaker:Interesting. So do they all start out
Speaker:as... Can you show that image again,
Speaker:the one where they're all on the same plane? This is the
Speaker:snapshot and survival time for the patient, because that's what
Speaker:is changing along this curve. We already saw that. Oh, I see. So
Speaker:these had different survival times. So these are all
Speaker:tumors. I don't know where healthy is.
Speaker:Okay. So not everybody starts out, for example,
Speaker:you know, in the red. And then basically
Speaker:they probably do. Okay.
Speaker:Glioblastomas, if you look at them in terms of their gene
Speaker:expression patterns, are a mess. Okay.
Speaker:They've undergone many mutations to get where they are. And the more mutations,
Speaker:that's the different colors, basically, and they change. Okay. So
Speaker:everyone starts out maybe with the red, but not everybody
Speaker:goes all the way to the blue. Purple. Right. And they probably. Everybody probably
Speaker:starts out to the left of the red. Right. Because these
Speaker:tumors probably form at the single cell or small number of
Speaker:cell levels, okay, and take 10 years to grow
Speaker:before they first show up and can be seen. Okay. So
Speaker:we don't have examples of the earliest
Speaker:version. That's the beauty of it, what's blowing me away. Yeah.
Speaker:I don't need to know any of this. I don't
Speaker:have to know. I only need the gene expression pattern
Speaker:and I only needed the information about survival time to
Speaker:interpret the axes. Everything else came after
Speaker:I found the axes, when I had to interrogate
Speaker:pathway databases to find out what they do.
Speaker:And truth be told, I asked an AI to give me the
Speaker:information about that, because it's a pain in the ass to
Speaker:go through those things yourself. And I just
Speaker:wanted to know what I might see. This is not to be taken
Speaker:seriously, okay, because my
Speaker:biologist and my doctor friend are going to have to do the job
Speaker:of vetting these interpretations. I only trust
Speaker:AIs a little
Speaker:bit. It's sort of fun to do that. Okay.
Speaker:But what I wanted to give you was a feeling
Speaker:for the difference between biospace information
Speaker:and simple single gene information. Okay.
Speaker:And it's awesome what the difference is. And
Speaker:it's awesome that there's a progression in
Speaker:biological processes that lead you to
Speaker:glioblastoma. I can't tell
Speaker:you this actually represents evolution,
Speaker:but if it looks like evolution and it smells like
Speaker:evolution and it quacks like evolution, it's
Speaker:evolution, okay? I mean, that's just my feeling.
Speaker:Now, I've already given you all the caveats: I don't know any biology,
Speaker:I do know a lot of physics, and I do know how DQC works.
Speaker:Okay. I know that better than anybody. But this
Speaker:business that you can take the information that you learned in
Speaker:the genetic, right, in the single-gene
Speaker:basis, and convert it to a biological
Speaker:process basis, and learn entirely new things
Speaker:more suited to advising doctors who are treating
Speaker:cancer patients... Because I can take a new
Speaker:tumor sample and put it on that plot, see where it
Speaker:is, see what its axis definition is,
Speaker:and see what the likely best drug is to start with. And
Speaker:then if that doesn't work, drop down to the next most likely, then the next
Speaker:most likely. So
Speaker:basically, that's sort of it. We can actually stop sharing
Speaker:now. He says I can stop sharing.
Speaker:Okay, great. So you know why I'm
Speaker:in this befuddled state at the moment? Because I am still
Speaker:absorbing what this is telling me. I certainly never expected
Speaker:when I thought of trying that because people talked about these latent
Speaker:variables and hidden dimensions, hidden coordinates,
Speaker:and described ways that might work, I didn't see any
Speaker:examples actually worked out. This is the
Speaker:story from beginning to end
Speaker:genetic coordinates to discovery
Speaker:to the world's best classifier to changing that into
Speaker:bio coordinates discovered from the genetic side.
Speaker:The treatment options, a tool for helping doctors treat,
Speaker:for suggesting to cancer researchers new experiments to
Speaker:do to verify what they're seeing on this.
Speaker:Lots of suggestions. I alone, with no knowledge, can think
Speaker:of 10 things people should explore based on this. And
Speaker:drug companies want to know what the next set of things
Speaker:to target should be for a given disease.
Speaker:Wow. I think that's pretty cool. That is
Speaker:impressive. So it really is. You know,
Speaker:DQC is telling you: the data is whispering
Speaker:to you, and I'm the tool
Speaker:that'll teach you how to listen.
Speaker:That's the way I feel about it. Since it's my baby, it's grown
Speaker:up, I really think it's grown up and
Speaker:I'm very impressed with where it got. So you're getting me in my
Speaker:very biased statement for it. Oh, we can tell it's super, super humble. But
Speaker:no, it's really. It's really exciting. But also to see where it can
Speaker:be taken from there, you know, like, this is just the beginning.
Speaker:There's so much. We're just scratching the surface. In the first place, those
Speaker:axes could be improved because there's more than one
Speaker:set of genes that give similar information.
Speaker:How to exploit it, how to do the bench experiments.
Speaker:That's not me. I don't know that stuff. And I'm
Speaker:83. I'm not ready to start learning how to be a bench
Speaker:biologist. Okay. But
Speaker:it's just so cool. I mean, you know, it's
Speaker:like you've seen the underbelly of what's happening in the biology.
Speaker:At any rate, I don't know if you agree with me, but I think it's
Speaker:really cool. No, that is really cool.
Speaker:There's a lot to take in. I'm sorry about
Speaker:that. No, no, I mean, you know, we have a scale system for these
Speaker:shows, right? Like five... What is it, the five Schrodinger?
Speaker:Schrodingers, yeah. So we rate from zero to five Schrodingers. This is definitely gonna
Speaker:be a good five Schrodinger show. Like, and I was able to follow on because
Speaker:I was a data scientist before this. So, like, when you
Speaker:said PCA, I at least knew what you were referring to.
Speaker:But this show is really geared towards the
Speaker:quantum curious. Some of them will be data scientists, some will be
Speaker:marketers, some will be, you know, kind of traditional software engineers,
Speaker:et cetera, et cetera. Marketers. Right. Because it's our thesis that
Speaker:when the quantum computing ecosystem comes around, and
Speaker:indeed, I think what you've proven today is you don't really need
Speaker:quantum computing to take advantage of the
Speaker:innovations in quantum science. Right. Like. Right.
Speaker:I think that was an assumption I think Candace and I had. I don't want
Speaker:to speak for Candace, but I know I certainly did. But I know that there
Speaker:is a field called quantum-inspired algorithms, which is probably
Speaker:where this falls. Yeah,
Speaker:but it's just exciting
Speaker:that innovation like this can come about in such a way that
Speaker:it's going to improve people's lives. What you've discovered is
Speaker:I'm not a biologist or a doctor, but I would imagine that a doctor or
Speaker:pharmaceutical researcher would look at that and say, oh, you know what this means? This
Speaker:means xyz. I hope so. I mean, I mean
Speaker:I'm pretty much at the limit of what I can do even
Speaker:with collaborators on our own. The, the point
Speaker:we're writing the paper now. This is, I've already blown
Speaker:my collaborators out of the water because this was discovered last night and they
Speaker:don't know about it yet. I have their permission to talk
Speaker:about it though, so that's cool. It's,
Speaker:you know, so I'm glad you liked it, and the five-Schrodinger
Speaker:level, because I'm only here because you guys refused my
Speaker:statement: I don't know anything about quantum. Well, I do know something about quantum
Speaker:computing, but I am not a quantum computer person, and
Speaker:so I didn't belong on your show, but you kept refusing to let me off.
Speaker:But I mean, I think it's important that people think about like this is not,
Speaker:I think one of the things that obviously you're, you're, you're a great
Speaker:presenter and great teacher of these very
Speaker:complicated topics, but you've also figured something out. Plus, I also think it's
Speaker:important for people to realize that quantum physics and research in that
Speaker:space is already improving people's lives or at
Speaker:least already showing fruits of that. And
Speaker:I think that your research kind of shows that. It's like, you know, you don't
Speaker:have the billionaires facing off over, you know,
Speaker:Jensen saying it's going to take 20 years, Bill Gates saying it's going to take
Speaker:less. Right. I mean, this is pretty much done in a basement,
Speaker:and the data is free. So I think the other lesson here
Speaker:is we have a wealth of data that's under explored
Speaker:because looking at it in an unbiased fashion hasn't been done.
Speaker:Right. So I have lots more diseases I want to look at
Speaker:and I have all this TCGA data for
Speaker:pancreatic cancer and various other
Speaker:cancers, and so
Speaker:it's sort of fun, right? I like how you kind
Speaker:of mix, you know, I know you say you weren't appropriate, but I think you
Speaker:were totally appropriate for the show and you've got the physics
Speaker:background, when you're talking about quantum clustering
Speaker:and why it's affecting the biological side,
Speaker:giving us biological data that we're able to move forward with
Speaker:potentially for precision medicine. I love the
Speaker:bridges that are being created all over
Speaker:the place here that you're not just kind of stuck in one thing thinking you
Speaker:can only do one thing because you have a certain amount of knowledge but how
Speaker:you've bridged that to bring in all of this
Speaker:biological data information, I think it's
Speaker:fantastic. I'm very happy that you came and you joined us today.
Speaker:I learned. I'm glad I didn't bore you and I hope I didn't get too
Speaker:far into the weeds, which my wife accuses me of doing all the time.
Speaker:Mine too.
Speaker:Where can people find out more about you and what you're up to?
Speaker:Me and what I'm up to? Well, I'm on LinkedIn. People contact
Speaker:me through LinkedIn all the time.
Speaker:I have a long history and you know, if you go look at
Speaker:the archives, the physics archives: Phys. Rev., Phys. Rev. A,
Speaker:Phys. Rev. B. I mean, my past history is a little
Speaker:eclectic, even in physics, which I attribute to having a
Speaker:short attention span. But I started in
Speaker:particle physics, in phenomenology, which means looking at data,
Speaker:trying to understand what it's telling me. I moved into
Speaker:pure abstract particle physics
Speaker:and then I went into what's called lattice field theory and lattice gauge
Speaker:theory, which is trying to learn stuff from
Speaker:how to say this... I didn't expect to talk about this. So,
Speaker:so let's talk about how we do physics, which is another
Speaker:totally off topic thing. And you may be running out of time. I don't know.
Speaker:You tell me when I have to shut up.
Speaker:But the story is
Speaker:physicists are smart, but there are very few problems we know how to solve
Speaker:exactly. Only a handful.
Speaker:Everything else is done by a process we call perturbation theory.
Speaker:Mathematicians also call it perturbation theory. You say, well, this
Speaker:problem that I know how to solve exactly kind of looks
Speaker:a little bit like this other problem, but with some
Speaker:modifications. So let me add the modifications to the problem
Speaker:and try to calculate corrections to the answer
Speaker:based on the modifications. So I have the original problem,
Speaker:set up with the forces involved, and I change those
Speaker:forces a little bit. And then I calculate
Speaker:perturbatively what's happening. People do it in
Speaker:celestial physics all the time. I have this
Speaker:planet moving around the sun in an elliptical orbit. Oh well, but
Speaker:there's the moon. So how does that affect the orbit?
Speaker:Well, I can't solve that problem. That's already a three-body problem,
Speaker:and there's no exact solution to the three-body problem. By the time
Speaker:it's also got Mars and Jupiter and
Speaker:Saturn and Pluto and Mercury in the problem.
Speaker:I can't plot orbits. But people do it all the time.
Speaker:NASA plots orbits. How do they do it? They calculate
Speaker:the original orbits and they start calculating the effects of Mars
Speaker:and this and that on that orbit, because we know what those
Speaker:forces are if Mars is on its orbit. And through
Speaker:successive corrections, successive iterations, you're
Speaker:able to make the small perturbations in the orbit that get the answer
Speaker:right for you and eventually lets you send something to the moon
Speaker:and not miss. Okay,
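The logic of "start from a problem you can solve exactly, then add small corrections" can be seen on a toy equation instead of a real orbit. Take x = 1 + eps*x^2: with eps = 0 the answer is exactly x = 1, and the eps*x^2 term plays the role of the small modification (like the moon's pull on a planet's orbit). Expanding in powers of eps gives x = 1 + eps + 2*eps^2 + 5*eps^3 + 14*eps^4 + ..., whose coefficients happen to be the Catalan numbers. The Python sketch below is an illustration chosen here, not anything used in the episode: it adds the corrections order by order and compares each partial sum with the exact root (1 - sqrt(1 - 4*eps)) / (2*eps).

```python
import math


def exact_root(eps: float) -> float:
    """Root of x = 1 + eps*x**2 that tends to x = 1 as eps -> 0 (needs 0 < eps <= 1/4)."""
    return (1 - math.sqrt(1 - 4 * eps)) / (2 * eps)


def catalan(n: int) -> int:
    """n-th Catalan number: the coefficient of eps**n in the perturbation series."""
    return math.comb(2 * n, n) // (n + 1)


def perturbative_root(eps: float, order: int) -> float:
    """Unperturbed answer (x = 1) plus corrections up to eps**order."""
    return sum(catalan(n) * eps**n for n in range(order + 1))


eps = 0.1  # a "small" modification
exact = exact_root(eps)
errors = []
for order in range(5):
    approx = perturbative_root(eps, order)
    errors.append(abs(approx - exact))
    print(f"order {order}: x = {approx:.6f}, error = {errors[-1]:.2e}")
print(f"exact:   x = {exact:.6f}")
```

Each extra order shrinks the error, which is the successive-corrections pattern; it works only because eps is small. For eps larger than 1/4 the series stops converging, loosely analogous to the strong-coupling problems where perturbation theory breaks down.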
Speaker:So perturbation theory is what we use. But what is perturbation
Speaker:theory based on? I have a solution I know how to get
Speaker:exactly, and I know how to make small corrections
Speaker:to that solution. And then I can describe all kinds of
Speaker:crap. So, for example,
Speaker:condensed matter physics talks about matter.
Speaker:So I ask you, has anybody ever proved that the table you're
Speaker:sitting at exists?
Speaker:Is there such a thing as a table made out of wood? In fact,
Speaker:is there such a thing as wood? The answer is no.
Speaker:I use wood to build houses. I use engineering
Speaker:principles to calculate the stress and load on a beam.
Speaker:How the hell do I do that if I don't know wood exists?
Speaker:I describe wood, I assume it exists, I
Speaker:characterize it in terms of a bunch of properties,
Speaker:and then I can, based on that, make small correction
Speaker:calculations again to see how the wood behaves
Speaker:when I stand on it. But I have to start from the
Speaker:assumption it exists and that there are properties
Speaker:I can measure for it, and make predictions based on that.
Speaker:But the first-principles thing, that wood exists? No way.
Speaker:Nobody solved that problem. Okay? So I was
Speaker:very interested in that because that's sort of a first principles problem,
Speaker:right? It's very philosophical, isn't it? It's where the, a
Speaker:hard science like physics kind of meets up against.
Speaker:Oh, we meet up against soft stuff all the time and
Speaker:we fail to solve the problem. But that's okay.
Speaker:At any rate,
Speaker:I was always interested. After many years in
Speaker:phenomenology, and papers
Speaker:published in phenomenology and things like that, I was
Speaker:getting into field theory and
Speaker:trying to understand from first principles how to solve hard problems
Speaker:like quantum chromodynamics.
Speaker:That intrigued me because we're kind of using up this
Speaker:perturbation theory paradigm, okay? It's very
Speaker:useful, it's very good. But we're already running into lots of
Speaker:problems where it doesn't work. We don't know a problem that's
Speaker:approximately like the problem we want to solve.
Speaker:So how do you solve it? So I got involved in that. I got involved
Speaker:in what's called lattice field theory. And then I said, but how am I going
Speaker:to know I'm right? Because
Speaker:I could be wrong in pushing my answer in the one
Speaker:known direction. There have got to be other problems,
Speaker:but there's only one quantum chromodynamics. It's the one we
Speaker:live with, it's the one we're made of.
Speaker:So I don't know if I'm cheating or not, but there's
Speaker:lots of condensed matter problems and they all have different
Speaker:answers and many of them are strong coupling problems and you
Speaker:can't treat them perturbatively. So take the same methods,
Speaker:change your field, go look at condensed matter, and see if you can
Speaker:develop techniques to do that. I did that for
Speaker:a long time, and then developed some methods, and then,
Speaker:oh, David Horn came into my office and I said,
Speaker:oh, this looks interesting. So I can't stay
Speaker:in one area. Now, to me it makes sense why I'm changing; to other
Speaker:people, it looks like I have no attention span. So that's
Speaker:okay because I do this for me. And
Speaker:so as long as I see the thread, I'm happy. But that's how I'm here.
Speaker:I'm now in biology, quote unquote. But we're
Speaker:glad. We're glad that you're here. Glad that we got to learn
Speaker:a bunch of stuff today. I think it's going to be really
Speaker:exciting to unpack it and to
Speaker:have you back, because you are just a...
Speaker:You guys, I'm going to bore you. No, I don't feel bored. I
Speaker:mean, I'm more fascinated. I'm confused about some things, but,
Speaker:like, I'm also fascinated, too, and we want to be respectful of your
Speaker:time and. But we'd love to have you back on the show.
Speaker:I'm sitting in my office. I have
Speaker:nothing on until 5 o'clock this evening. Awesome. We'll
Speaker:definitely have you come back then because again,
Speaker:it's just really great information. It's important, it's exciting. I think it's very
Speaker:exciting. So, unfortunately, we have a little limitation,
Speaker:so. Yeah, but definitely. And so folks can
Speaker:reach out to you on LinkedIn and engage with you directly, if you're cool with
Speaker:that and let your AI. I don't promise to
Speaker:answer everybody, and if they're a crackpot,
Speaker:I don't promise to be polite. There you go. That's fair.
Speaker:I'm liking that. I like that. Let our
Speaker:AI finish the show. And that wraps this quantum
Speaker:odyssey on Impact Quantum. A massive thank you to Dr.
Speaker:Marvin Weinstein for taking us deep into the fractal jungle of
Speaker:biology, data science and quantum mechanics with
Speaker:only his brain, DQC and a suspiciously
Speaker:underutilized basement server farm. From classifying
Speaker:glioblastomas with 99% accuracy to uncovering
Speaker:bio-coordinates that could revolutionize precision
Speaker:medicine, Marvin reminded us that sometimes the biggest
Speaker:scientific breakthroughs don't require a billion dollar
Speaker:lab, just a stubborn physicist, open source data,
Speaker:and the audacity to ask what if? If you enjoyed
Speaker:this episode, and really, how could you not? Be sure to
Speaker:subscribe, share and let your fellow Quantum Curious friends
Speaker:know. And as always, check the show notes for links to
Speaker:Marvin's work, ways to connect, and possibly a
Speaker:diagram that will make your head spin just a little less.
Speaker:Until next time, stay curious, stay entangled,
Speaker:and remember, just because you can't observe the quantum doesn't mean it's
Speaker:not observing you. Cheers.