1 00:00:00,540 --> 00:00:04,299 Miko Pawlikowski: I'm Miko Pawlikowski and this is HockeyStick. 2 00:00:07,180 --> 00:00:17,100 Today we're talking about how generative AI is changing the field of data analytics and how you too can leverage large language models as your assistant and co-worker. 3 00:00:17,320 --> 00:00:22,009 I'm joined by the three authors of the "Generative AI for Data Analytics" book, 4 00:00:22,240 --> 00:00:24,960 now available in early access from Manning.com. 5 00:00:25,130 --> 00:00:31,320 Artur Guja, risk manager and computer scientist with over 20 years of experience in the banking sector. 6 00:00:31,520 --> 00:00:31,900 Dr. 7 00:00:31,900 --> 00:00:43,325 Marlena Siwiak, data scientist and bioinformatician, the co-creator of the first global model of the COVID-19 pandemic, and the co-author of a techno-thriller novel and sci-fi short stories. 8 00:00:43,495 --> 00:00:44,275 And Dr. 9 00:00:44,275 --> 00:00:56,125 Marian Siwiak, data scientist, strategist, and bioinformatician, the creator of the first artificial sentience, something we're going to cover in this episode, and the sci-fi novel Pharmacon. 10 00:00:56,434 --> 00:00:59,995 Welcome to this episode and thank you for flying HockeyStick. 11 00:01:00,816 --> 00:01:04,596 The first thing I thought is that you look like an eclectic bunch. 12 00:01:04,676 --> 00:01:14,414 You've got Artur from the banking sector, Marlena the bioinformatician, Marian, data... How did you end up teaming up for the book? 13 00:01:15,109 --> 00:01:18,379 Marian Siwiak: We worked together previously, especially me and Marlena. 14 00:01:18,694 --> 00:01:20,164 With Artur, we also, 15 00:01:20,294 --> 00:01:21,954 Artur Guja: Walking our kids in the park. 16 00:01:21,954 --> 00:01:29,934 The three of us used to work earlier on various ventures, on process automation, on business process re-engineering.
17 00:01:30,354 --> 00:01:33,374 So this is one of many ventures that we've done 18 00:01:33,739 --> 00:01:34,219 Miko Pawlikowski: I see. 19 00:01:34,259 --> 00:01:37,299 So you go way back and this is just Another project. 20 00:01:37,629 --> 00:01:38,559 Just another day. 21 00:01:39,026 --> 00:01:42,906 Marian Siwiak: funnily enough, it's not like we go like 20 years way back. 22 00:01:43,446 --> 00:01:45,776 We work together quite intensely. 23 00:01:45,826 --> 00:01:54,316 what we did together, multiple things, they all led to this book because we were always trying to find ways to make things. 24 00:01:54,316 --> 00:01:56,036 quicker, more efficient, better. 25 00:01:56,416 --> 00:01:58,006 This is what Artur mentioned. 26 00:01:58,496 --> 00:02:00,826 we worked in process optimization in a broad sense. 27 00:02:01,946 --> 00:02:07,556 So it was always interesting, to us how to make things more, efficient. 28 00:02:07,556 --> 00:02:09,536 And when generative AI. 29 00:02:10,456 --> 00:02:24,816 blew and finally started to resemble, human cognition in a sense, we decided to give it a try and our minds were collectively blown and we started using it for our work. 30 00:02:25,026 --> 00:02:36,251 And then we decided that now that we know how to use it, I would say, again, efficiently and, Marlena found a way to use it smartly. 31 00:02:37,041 --> 00:02:42,161 we decided that we could write a book about it because we noticed that there is a lot of buzz about it. 32 00:02:42,441 --> 00:02:44,801 There is a lot of prompt engineering. 33 00:02:45,381 --> 00:02:50,061 now I think on Coursera, you can take a specialization in prompt engineering. 34 00:02:50,691 --> 00:02:53,141 and everybody's again, looking for a silver bullet. 35 00:02:53,906 --> 00:02:59,706 So I will just type in magical command and it will solve my problems. 36 00:03:00,226 --> 00:03:03,756 our collective experience is technology doesn't solve problems. 
37 00:03:04,726 --> 00:03:15,086 technology can give you a great headache if you don't use it in a way it's supposed to be used, but everybody tries to cut corners and simplify things. 38 00:03:15,596 --> 00:03:19,476 So this book is about using generative AI. 39 00:03:20,521 --> 00:03:21,461 it's not a cookbook. 40 00:03:21,581 --> 00:03:24,011 It's not, okay, this is some code or some prompts. 41 00:03:24,081 --> 00:03:27,711 You will type them in and your problems will be solved. 42 00:03:28,411 --> 00:03:30,421 it's just not how we work. 43 00:03:30,471 --> 00:03:31,711 It's not how the world works. 44 00:03:31,981 --> 00:03:34,091 Despite many people wanting it to, 45 00:03:35,204 --> 00:03:37,984 Marlena Siwiak: I think that this is the problem with expectations. 46 00:03:38,074 --> 00:03:44,214 many people have missed expectations in terms of ChatGPT and other generative AI. 47 00:03:44,224 --> 00:03:50,094 And then they are surprised and they are unhappy because ChatGPT can't make them coffee yet. 48 00:03:51,074 --> 00:03:53,034 maybe this is not the tool for making coffee. 49 00:03:53,904 --> 00:03:58,654 I very often see this kind of complaint, which is not necessary because it is a great tool. 50 00:03:59,194 --> 00:04:00,984 it's great invention. 51 00:04:01,654 --> 00:04:04,734 And I think it's going to change the way our society works. 52 00:04:05,664 --> 00:04:07,014 it's good to live in such times. 53 00:04:07,754 --> 00:04:08,534 it's really interesting. 54 00:04:09,534 --> 00:04:13,684 Miko Pawlikowski: Ask a few questions to ChatGPT and see how good they are and see what you can do. 55 00:04:13,734 --> 00:04:17,844 Ask for some snippets and do all kinds of things that kind of speed you up. 
56 00:04:18,214 --> 00:04:34,869 but it's also probably the most frustrating, element of working with, especially for people like me who come from software engineering background and they like things well defined and, always replicable and reproducible and all of that, and then you go here and it all ends. 57 00:04:34,869 --> 00:04:43,269 But, before we jump into the book, a little bit deeper, can you tell us a little bit more about what, process optimization actually means? 58 00:04:43,299 --> 00:04:49,999 I know that's probably a phrase that you use a lot and it means a well defined thing for you, but it might not for the audience. 59 00:04:51,644 --> 00:05:08,904 Artur Guja: basically taking a look at what business does, what people do in the business and, looking for, ways for optimizing it, but, actually describing what should be done, what people think, should be done versus what people actually 60 00:05:08,904 --> 00:05:16,374 do, because usually there is a massive gap between what people think is happening and what they think should be happening. 61 00:05:16,674 --> 00:05:23,644 People think that, a given operation should be reviewed by at least two people and should take no more than a day. 62 00:05:23,674 --> 00:05:32,144 The fact is that usually one person just takes it off and it takes maybe two days because they're very busy or they've been on holiday. 63 00:05:32,534 --> 00:05:37,294 the dissonance between reality and documentation is usually huge. 64 00:05:37,674 --> 00:05:53,344 In looking from the process from the outside and then looking for ways to close that gap is I think the best way to describe the optimization to actually make the process and the reality meet in something that is both realistic. 65 00:05:53,634 --> 00:05:59,534 Because processes, when they're designed, are usually overly, optimistic and something that actually works. 
66 00:06:00,074 --> 00:06:11,564 And then using automation, because once you actually describe what's happening, you can use automation to free people from the burden of mundane tasks, and actually help them focus on something creative. 67 00:06:11,974 --> 00:06:16,844 Marian Siwiak: The way we approached it, the important part is to understand what is really happening. 68 00:06:17,604 --> 00:06:32,914 And that Joe on the second floor is actually the information hub for the whole company, and despite his activities not being overly highlighted in the org structure, he's the most important person in the company. 69 00:06:33,184 --> 00:06:38,924 We created maps which connected, on the one side, what are the actions and decisions. 70 00:06:39,189 --> 00:06:40,089 This is where, we believe, 71 00:06:42,034 --> 00:06:52,704 the critical value in process mapping is: understanding what decisions are to be made, who is making these decisions, and on what basis they make them. 72 00:06:53,104 --> 00:06:56,664 We map out decisions and we map out all the data that they are using. 73 00:06:57,264 --> 00:07:05,409 So all the actions produce data, and all decisions utilize some data, and you have these two layers of information about the process. 74 00:07:05,839 --> 00:07:09,179 Artur also introduced the third layer, which is the risk. 75 00:07:09,699 --> 00:07:18,419 So the people who are making decisions can understand what the associated risks are, what the different outcomes of decisions can be. 76 00:07:18,879 --> 00:07:23,434 And then when you can see how it all works, you can improve on it. 77 00:07:23,534 --> 00:07:25,464 You can shorten the cycles. 78 00:07:26,984 --> 00:07:38,279 So process optimization is actually, first, understanding what is happening, understanding what could be happening, and finding a way to make decisions more informed and conscious of risks.
79 00:07:39,109 --> 00:07:48,609 And then also you have actions, and this is probably where most of the process optimization consultants work: how to make actions more efficient. 80 00:07:48,619 --> 00:07:57,179 But in our opinion, if the action is triggered by a misinformed decision, it's a pure waste of time anyway. 81 00:07:57,279 --> 00:08:03,579 Miko Pawlikowski: Makes more sense now, because initially I thought when you said technology doesn't solve problems, it creates headaches, 82 00:08:03,579 --> 00:08:05,849 I was like, 'Oh, this is such a terrible slogan'. 83 00:08:05,949 --> 00:08:08,339 Marian Siwiak: As you can see, I now work in the aluminum refinery, 84 00:08:08,779 --> 00:08:11,299 because people didn't want to hear what we were saying. 85 00:08:11,349 --> 00:08:15,539 They wanted to hear, 'yes, we will come and install you a new tool and all will be solved'. 86 00:08:15,989 --> 00:08:18,189 So our sales process sucks, as you can hear. 87 00:08:18,989 --> 00:08:22,339 Where we were able to implement it, it worked perfectly. 88 00:08:22,699 --> 00:08:25,099 But not many people wanted to put in extra effort. 89 00:08:25,509 --> 00:08:27,179 So I need to tell you what I do? 90 00:08:27,199 --> 00:08:35,159 No, I want the tool that will discover what I do. 91 00:08:35,306 --> 00:08:40,384 Artur Guja: This is a problem with generative AI, that people expect it to solve problems just by, 'give me an account on ChatGPT. 92 00:08:40,404 --> 00:08:42,824 And here, all my problems are solved'.
93 00:08:42,824 --> 00:08:48,404 And very often as we've seen through various, anecdotal, evidence, 94 00:08:48,494 --> 00:09:08,509 giving ChatGPT to people who are not aware of the dangers of it and the problems, the hallucinations that it can generate, just leads to, hilarious results as, the case of those lawyers in US who introduced completely fictitious, cases into their evidence or, maybe slightly less hilarious examples 95 00:09:08,579 --> 00:09:16,509 of, proprietary software leaking out through ChatGPT because people were just putting proprietary information into it and it became public knowledge. 96 00:09:17,159 --> 00:09:19,679 don't expect ChatGPT to solve all your issues 97 00:09:20,341 --> 00:09:23,431 Marlena Siwiak: the comparison that you used at the beginning was the right one. 98 00:09:23,431 --> 00:09:25,601 That ChatGPT is like an assistant. 99 00:09:26,376 --> 00:09:33,516 Very, smart, very intelligent, an assistant who read a lot and learned a lot, but it's still a newbie. 100 00:09:34,006 --> 00:09:36,686 He's, just after grad school, right? 101 00:09:36,926 --> 00:09:43,472 No experience, you can ask it for help, you can give it tasks to do, but you have to, manage that. 102 00:09:43,527 --> 00:09:45,457 You cannot give him all the responsibility. 103 00:09:46,572 --> 00:09:46,862 Miko Pawlikowski: Yeah. 104 00:09:46,862 --> 00:09:49,512 one could say it was literally born yesterday, 105 00:09:51,377 --> 00:09:51,727 Marlena Siwiak: Exactly. 106 00:09:52,087 --> 00:09:53,937 Miko Pawlikowski: to a certain degree, understandable. 107 00:09:54,037 --> 00:09:57,957 so is your PhD background also in, in process optimization 108 00:09:58,107 --> 00:10:06,507 Marlena Siwiak: So my PhD was in biophysics, in particular protein translation, a bit of process optimization, but not much and not related to business at all. 109 00:10:07,285 --> 00:10:10,735 at some point I decided to quit academia and you have to do something else. 
110 00:10:10,875 --> 00:10:17,185 So I turned to data science, which was very close to the things that I was actually doing as a bioinformatician. 111 00:10:17,990 --> 00:10:22,540 The type of data changed, basically; that was the thing that really mattered. 112 00:10:22,730 --> 00:10:28,220 And from there, slowly, you look for a job, another job, and it goes like that. 113 00:10:28,330 --> 00:10:28,490 Yeah. 114 00:10:30,280 --> 00:10:33,900 Miko Pawlikowski: And if you don't mind me asking, why quit academia? 115 00:10:34,540 --> 00:10:37,880 Marlena Siwiak: Maybe I got a bit disappointed with how science is made. 116 00:10:37,930 --> 00:10:40,810 You want more citations of your publications to survive. 117 00:10:41,280 --> 00:10:46,220 And to have more citations, you have to be more popular in social media and stuff. 118 00:10:46,220 --> 00:10:55,940 It's crazy that you have to fight for popularity as a scientist, where what should count is actually your science, your research and the thought behind it. 119 00:10:55,940 --> 00:10:57,040 There are too many papers. 120 00:10:57,040 --> 00:11:01,170 Nobody has time to read them, even in a very narrow domain. 121 00:11:01,790 --> 00:11:05,670 So they read the first things that come to them when they search the internet. 122 00:11:06,050 --> 00:11:08,330 So we have to fight to be popular, to be on top. 123 00:11:09,270 --> 00:11:13,970 It has nothing to do with the quality of your research, in fact. 124 00:11:14,154 --> 00:11:30,214 Marian Siwiak: And there is research on that, which shows that you need to be popular to be accepted to high-priority journals. And as I said, it's not just the opinion of a frustrated former scientist; there is research showing that 125 00:11:30,214 --> 00:11:39,394 the quality published there is exactly the same as anywhere else, but there are more citations and also more money resulting from it.
126 00:11:39,394 --> 00:11:44,354 Prestige here translates to money because from citations come better grants, right? 127 00:11:44,820 --> 00:11:46,780 Still, I want to be perfectly clear. 128 00:11:46,780 --> 00:11:47,800 I think peer review, 129 00:11:47,860 --> 00:11:54,518 despite all its drawbacks, is the only way of distinguishing research from pseudo-research. 130 00:11:55,343 --> 00:11:57,223 It changed a little in software engineering. 131 00:11:57,223 --> 00:12:03,513 I don't think the papers about ChatGPT or LLaMA or anything like that were peer reviewed. 132 00:12:03,543 --> 00:12:15,118 They are prepared as so-called preprints, and they don't bother with so-called researchers to evaluate them because the results speak for themselves. 133 00:12:15,598 --> 00:12:20,938 So this is, I must say, the paradigm shift, I love this word, that we observe right now. 134 00:12:21,498 --> 00:12:25,718 But in most other cases, peer review is the only process. 135 00:12:26,393 --> 00:12:38,678 Marlena Siwiak: When you're talking about peer review, another thing that bothers me in academia is the fact that everybody expects that your research will be successful, and it's not always so with research. 136 00:12:39,008 --> 00:12:40,398 Research is asking questions. 137 00:12:40,668 --> 00:12:42,108 Does your hypothesis work? 138 00:12:43,048 --> 00:12:51,328 And very often it doesn't work, or the most frequent outcome is that we don't know because the effect is too small, yeah? 139 00:12:51,958 --> 00:12:55,836 And it's impossible to publish things 140 00:12:55,998 --> 00:12:58,338 when your answer to the question is "we still don't know". 141 00:12:58,699 --> 00:13:03,599 So you waste a lot of time and your effort, your money, and in the end you have the answer "we still don't know". 142 00:13:03,727 --> 00:13:04,917 Who would give you more money?
143 00:13:05,617 --> 00:13:19,917 So what researchers do, sometimes unconsciously, is try to find black or white, but very often it's grey. Publishing these grey results is still valuable because when you collect multiple studies like this, 144 00:13:20,124 --> 00:13:24,254 prepare a meta-analysis, you can get the final answer, yes or no. 145 00:13:24,490 --> 00:13:24,938 Miko Pawlikowski: or 146 00:13:25,124 --> 00:13:33,284 Marlena Siwiak: But the way science is funded, and the fact that you won't get more money for research like this if you produce an "I don't know" answer, 147 00:13:33,997 --> 00:13:37,267 Miko Pawlikowski: I can't remember the last time Nature had on the cover: 148 00:13:37,707 --> 00:13:38,567 "Is this true? 149 00:13:38,917 --> 00:13:39,517 Don't know". 150 00:13:39,737 --> 00:13:44,407 Marian Siwiak: Don't know. 151 00:13:44,484 --> 00:13:45,112 Marlena Siwiak: There's no space for such discussion. 152 00:13:45,112 --> 00:13:47,182 And everybody's in a rush in academia. 153 00:13:47,222 --> 00:13:48,952 There is no space to really think, 154 00:13:49,507 --> 00:13:50,747 to educate yourself. 155 00:13:50,897 --> 00:13:54,737 Yeah, it's all in a rush and results without, it's like a corporation. 156 00:13:55,667 --> 00:13:56,737 There's not much difference, really. 157 00:13:58,342 --> 00:14:16,232 Miko Pawlikowski: So what you're saying is that it turns out that scientists found that scientists are humans like any others, and they have the same problems with herd mentality and wanting to progress their career and wanting to make money and making headlines. 158 00:14:17,277 --> 00:14:19,877 Marlena Siwiak: It's not making huge money or anything like that. 159 00:14:20,277 --> 00:14:23,917 Because to be honest, salaries in academia suck, right? 160 00:14:24,417 --> 00:14:31,277 When you compare these salaries to the salaries of people who work in business and are similarly educated, it's much worse.
161 00:14:31,327 --> 00:14:33,037 And the expectations are high, yeah? 162 00:14:33,097 --> 00:14:35,052 The amount of work you have to do, the amount of 163 00:14:35,592 --> 00:14:36,852 time it consumes, 164 00:14:37,422 --> 00:14:38,812 Marian Siwiak: Also, it's very ego-driven. 165 00:14:39,032 --> 00:14:39,582 Look at us. 166 00:14:40,152 --> 00:14:42,282 You have this myth 167 00:14:43,582 --> 00:14:48,342 of "We are the beacon of truth for the world", which has nothing to do with truth anyway. 168 00:14:48,342 --> 00:14:53,412 But anyway, pretty low salaries compared to other positions. 169 00:14:53,482 --> 00:14:55,832 You have pretty low position stability. 170 00:14:56,282 --> 00:14:59,262 Many institutions keep researchers on grant money. 171 00:14:59,312 --> 00:15:03,082 We bring more grants so they can get the overheads, their share. 172 00:15:04,012 --> 00:15:09,512 It brings people with a very specific mentality, and many of them are complete egomaniacs. 173 00:15:09,902 --> 00:15:10,412 So 174 00:15:10,678 --> 00:15:14,938 it also makes this whole environment extremely toxic. 175 00:15:15,418 --> 00:15:18,678 I know I sound like a frustrated former scientist, which I am, 176 00:15:19,413 --> 00:15:22,163 but it doesn't mean that I'm not right. 177 00:15:22,263 --> 00:15:30,338 Miko Pawlikowski: To segue into a question I was going to ask about the COVID-19 pandemic: could you talk a little bit about that COVID model? 178 00:15:30,488 --> 00:15:36,488 I'm curious, what does it mean to say you're the co-creator of the first global model of the COVID pandemic? 179 00:15:37,083 --> 00:15:40,293 Marian Siwiak: We created a model of a global pandemic. 180 00:15:40,493 --> 00:15:45,243 In March 2020, we had a model where we were dropping an index case. 181 00:15:45,303 --> 00:15:51,013 So it's the first person infected in Wuhan, China, in November 2019.
182 00:15:51,043 --> 00:15:57,763 And we were accurately predicting the number of symptomatic and asymptomatic cases in New York a couple of months later. 183 00:15:59,023 --> 00:16:05,568 Back then there was no good model on any country level. 184 00:16:06,718 --> 00:16:11,928 Later, there were global models, because again, technology doesn't solve problems. 185 00:16:12,168 --> 00:16:17,358 This is the perfect example of what we spoke about previously, because we used existing technology. 186 00:16:17,468 --> 00:16:21,908 No, we looked at the virus as a biological, not a political entity. 187 00:16:22,058 --> 00:16:22,548 And that was 188 00:16:22,548 --> 00:16:33,268 the biggest difference, because we looked at the data available and we decided, okay, it's impossible that the virus has a completely different infectivity in one country than in another. 189 00:16:33,958 --> 00:16:35,898 Viruses just don't work this way. 190 00:16:35,908 --> 00:16:41,778 It's not like they have passports and they say, okay, I come to this country and I'll be nice and I will infect no more. 191 00:16:41,858 --> 00:16:42,028 Yeah. 192 00:16:42,058 --> 00:16:42,608 Visa denied. 193 00:16:42,768 --> 00:16:58,353 No, in your country, I will infect no more than three people from every infected person. I think our listeners will also be interested in the source of the model: we approached it as a data science problem and, at the same time, a biology-related problem. 194 00:16:58,423 --> 00:16:59,643 So we checked other coronaviruses. 195 00:17:01,518 --> 00:17:08,178 And we assumed that it is yet another coronavirus, like there was SARS, there are others. 196 00:17:08,718 --> 00:17:11,478 And we simply used the values. 197 00:17:11,708 --> 00:17:14,498 We created a model, not a pure machine learning model. 198 00:17:14,568 --> 00:17:20,018 We prepared an analytical model where we assumed, okay, so this is the virus.
199 00:17:20,258 --> 00:17:27,223 This is what it should look like, more or less, and let's use some Monte Carlo simulations to check how it will spread. 200 00:17:27,763 --> 00:17:42,383 And we noticed that our assumptions actually reflect the situation in the countries which, we could say with a certain degree of certainty, provide accurate data. 201 00:17:42,453 --> 00:17:43,513 Okay, so this is the virus. 202 00:17:43,803 --> 00:17:45,113 This is what it looks like. 203 00:17:45,833 --> 00:17:47,423 And this is how it behaves. 204 00:17:48,713 --> 00:18:02,003 And we tried to publish it for over half a year. When we published it, it was too late, because we were just a small company trying to show people, 'okay, this is the accurate model'. 205 00:18:02,033 --> 00:18:03,953 I'm not even saying it was true, right? 206 00:18:04,353 --> 00:18:10,123 But it was accurate and it was showing a completely different picture than everybody else was willing to believe. 207 00:18:10,863 --> 00:18:19,703 So one of our reviewers was excluded from the process because their obstructionism slowed down the publication for many months. 208 00:18:21,733 --> 00:18:24,413 This was a problem not solved by technology. 209 00:18:24,923 --> 00:18:28,613 This was a problem where you had to just sit down, do your homework, 210 00:18:29,563 --> 00:18:41,653 read about the problem, read about similar problems, collate the data into a coherent whole, and then use some technology to make this last inch. 211 00:18:42,383 --> 00:18:46,743 Okay, let's check if our assumptions hold true, all right? 212 00:18:47,986 --> 00:18:48,506 I'm sorry. 213 00:18:48,866 --> 00:18:50,286 I'm getting emotional when I think about it. 214 00:18:52,146 --> 00:18:54,166 Anyway, so yeah, it was pretty fun.
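[Editor's sketch, not the guests' model.] The approach described above (fix the virus's properties from what is known about similar coronaviruses, drop a single index case, then run Monte Carlo simulations to see how it spreads) can be illustrated roughly as follows. Everything here is an assumption made for illustration: the function names, the parameter values, and the simplification that the susceptible population is unlimited. None of it is taken from the published model.

```python
import random
import statistics


def simulate_outbreak(days, daily_contacts, p_transmit, infectious_days,
                      p_symptomatic, rng):
    """Run one stochastic outbreak that starts from a single index case.

    Returns cumulative (symptomatic, asymptomatic) case counts.
    All parameters are hypothetical, illustrative values.
    """
    counts = {"symptomatic": 0, "asymptomatic": 0}

    def register_case():
        # Each new case is randomly classified as symptomatic or not.
        key = "symptomatic" if rng.random() < p_symptomatic else "asymptomatic"
        counts[key] += 1

    register_case()             # the index case
    active = [infectious_days]  # remaining infectious days per active case
    for _ in range(days):
        new_cases = 0
        for _case in active:
            # Every contact of an infectious person is an independent trial.
            for _contact in range(daily_contacts):
                if rng.random() < p_transmit:
                    new_cases += 1
        for _ in range(new_cases):
            register_case()
        # Age existing cases by one day; new cases start infecting tomorrow.
        active = [d - 1 for d in active if d > 1] + [infectious_days] * new_cases
    return counts["symptomatic"], counts["asymptomatic"]


def monte_carlo(runs, seed=0, **params):
    """Repeat the simulation many times and summarise total case counts."""
    rng = random.Random(seed)
    results = [simulate_outbreak(rng=rng, **params) for _ in range(runs)]
    totals = sorted(s + a for s, a in results)
    return {
        "min_total": totals[0],
        "median_total": statistics.median(totals),
        "max_total": totals[-1],
    }


if __name__ == "__main__":
    summary = monte_carlo(runs=200, days=30, daily_contacts=5,
                          p_transmit=0.05, infectious_days=5,
                          p_symptomatic=0.5)
    print(summary)
```

The point of the sketch is the one made in the conversation: the virus parameters are held fixed rather than fitted per country, and the spread of outcomes across runs is what gets compared with the reported data.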
215 00:18:56,681 --> 00:19:11,171 Miko Pawlikowski: What I always think about is, in these models, are they just like statistical analysis of this is the incubation period, this is the exposure, this is the coefficient of how it's going to grow, or things like the 216 00:19:11,461 --> 00:19:18,161 country's interventions, as in, one country might be, we're not doing anything, not going to name any countries, but, 217 00:19:18,356 --> 00:19:19,726 Marlena Siwiak: If you know how to quantify 218 00:19:19,726 --> 00:19:22,856 it, you can add it, of course, but this is another level of complication. 219 00:19:22,856 --> 00:19:23,856 The problem is data. 220 00:19:24,296 --> 00:19:32,776 Marian Siwiak: You can assume that some interventions will have an impact, because the way we modeled it, it's the statistical properties of the virus. 221 00:19:32,906 --> 00:19:34,536 Its ability to infect others. 222 00:19:34,696 --> 00:19:46,626 And the time that people take to be diagnosed or recognized as infected. So this is, let's say, infectivity at different stages. You can complicate this model. 223 00:19:47,206 --> 00:19:55,321 The model or technology that we used, it was global mobility-based, so they divided the world 224 00:19:55,811 --> 00:19:59,711 into areas around international airports. 225 00:20:00,041 --> 00:20:03,521 And the simulation was run for each area separately. 226 00:20:03,521 --> 00:20:07,001 And then there was a probability of somebody moving from this area. 227 00:20:07,411 --> 00:20:10,711 So you could go area by area, one by one. 228 00:20:10,811 --> 00:20:21,493 And this is why we modeled only the early stages, but it takes time and money to evaluate what are the effects or 229 00:20:22,158 --> 00:20:23,128 expected effects, 230 00:20:23,253 --> 00:20:23,553 Miko Pawlikowski: effects, 231 00:20:23,918 --> 00:20:31,278 Marian Siwiak: in a given area of, let's say, different levels of lockdown or travel restrictions or whatever.
232 00:20:31,928 --> 00:20:35,048 So it is possible, but we would have to have financing, right? 233 00:20:35,463 --> 00:20:38,493 We were thinking about doing it, but it's a gigantic amount of work. 234 00:20:38,673 --> 00:20:46,273 Imagine, nobody wanted to pay us, especially since we published in a second-tier journal, six months too late. Still, it is possible technically. 235 00:20:47,248 --> 00:20:49,778 Miko Pawlikowski: So it's always a matter of the same thing. 236 00:20:49,958 --> 00:20:52,648 Someone didn't allocate enough money. 237 00:20:53,498 --> 00:20:55,078 Marian Siwiak: The amount of money was sufficient. 238 00:20:55,908 --> 00:21:08,328 I think it's, again, what Marlena said previously, it's this beauty pageant among scientists. The people who got this money, they were the most popular, because the model that was published just after we submitted ours was so wildly 239 00:21:08,338 --> 00:21:18,503 inaccurate that even the academic environment, which is very careful about bad-mouthing results, trashed it, right? 240 00:21:18,553 --> 00:21:20,003 But it was popular. 241 00:21:20,193 --> 00:21:22,933 It had a lot of citations and a lot of money went after it. 242 00:21:23,763 --> 00:21:30,133 Somebody who published a wildly inaccurate model got a lot of money because he was a widely recognized expert. 243 00:21:30,563 --> 00:21:36,853 Because when you are applying for a grant, nobody asks, are your citations saying that your model is inaccurate? 244 00:21:36,973 --> 00:21:40,413 No, they ask, how many citations did your paper get? 245 00:21:40,413 --> 00:21:43,543 Miko Pawlikowski: Once someone published some research, it got popular. 246 00:21:43,593 --> 00:21:46,993 Turns out it was inaccurate or turns out it was wrong. 247 00:21:47,723 --> 00:21:50,733 Are there any repercussions for that afterwards? 248 00:21:50,948 --> 00:21:51,938 Marian Siwiak: What repercussions?
249 00:21:52,508 --> 00:21:57,648 In the worst case, you just retract your paper and you lose the citations. 250 00:21:58,268 --> 00:22:01,238 Very often you're not even excluded from conferences. 251 00:22:01,638 --> 00:22:05,148 If you're popular enough, you are a voice in the discussion. 252 00:22:05,488 --> 00:22:08,948 Marlena Siwiak: If you go too far, if you exaggerate, you can end up in jail. 253 00:22:08,958 --> 00:22:11,008 I'm thinking about Theranos right now. 254 00:22:11,038 --> 00:22:13,528 They also had some research about their technology, 255 00:22:13,546 --> 00:22:14,636 which was all fake. 256 00:22:15,046 --> 00:22:15,526 Of course. 257 00:22:15,966 --> 00:22:16,146 That 258 00:22:16,168 --> 00:22:20,208 Artur Guja: lady went to jail for financial fraud, not for research fraud. 259 00:22:20,313 --> 00:22:26,763 Marlena Siwiak: But that fraud was based on false results, that she was convincing investors that she had technology that solves 260 00:22:27,023 --> 00:22:29,713 Marian Siwiak: Say, but if she hadn't taken money, she wouldn't have gone to jail. 261 00:22:31,045 --> 00:22:31,415 Marlena Siwiak: Yeah, 262 00:22:32,390 --> 00:22:39,067 Miko Pawlikowski: The lady we're talking about, obviously, is Elizabeth Holmes, who is either going to jail or is already in jail. 263 00:22:39,180 --> 00:22:48,230 But to flip the question a little bit, should people be going to jail for faulty assumptions and faulty research? 264 00:22:48,300 --> 00:22:52,310 Marlena Siwiak: Now we punish people for saying that they still don't know, yeah? 265 00:22:52,500 --> 00:22:54,340 So we cannot punish them for false results. 266 00:22:54,340 --> 00:22:55,150 No, absolutely not. 267 00:22:55,210 --> 00:22:59,470 But, on the other hand, I think, no, making mistakes is okay. 268 00:22:59,690 --> 00:23:03,090 Maybe we put too much trust sometimes in that. 269 00:23:03,470 --> 00:23:06,720 It should be as open for discussion as possible.
270 00:23:06,770 --> 00:23:09,310 You can check all the research of others, right? 271 00:23:09,370 --> 00:23:09,920 All the time. 272 00:23:09,920 --> 00:23:10,970 And you should discuss it. 273 00:23:11,000 --> 00:23:12,630 That's, it should be as open 274 00:23:12,655 --> 00:23:13,555 Marian Siwiak: won't get money. 275 00:23:14,180 --> 00:23:16,340 You won't get money to check somebody's research. 276 00:23:16,980 --> 00:23:17,160 Let's 277 00:23:17,175 --> 00:23:17,475 Marlena Siwiak: Yes. 278 00:23:17,535 --> 00:23:18,135 That's another problem. 279 00:23:18,325 --> 00:23:23,445 It's difficult to get money to check somebody else's research, especially when the research is published in a high-ranking journal. 280 00:23:23,653 --> 00:23:26,513 Marian Siwiak: It takes a lot of effort to 281 00:23:27,063 --> 00:23:29,043 counter such a false claim. 282 00:23:29,533 --> 00:23:30,553 It happened a couple of times. 283 00:23:31,293 --> 00:23:35,263 But it was people who were in equally prestigious universities. 284 00:23:35,563 --> 00:23:44,663 I think that one of the funniest was a lady who was leading some faculty on ethics at Harvard. 285 00:23:45,823 --> 00:23:47,473 And she falsified her results. 286 00:23:47,973 --> 00:23:59,703 The results were that if people sign some waiver or some statement that they will be truthful, they actually answer the survey more truthfully. 287 00:23:59,971 --> 00:24:04,721 And she falsified a lot of the research that built her career on ethics. 288 00:24:05,011 --> 00:24:11,951 But getting it taken down took people from equally prestigious universities a lot of time. 289 00:24:12,531 --> 00:24:12,861 Miko Pawlikowski: So 290 00:24:13,181 --> 00:24:23,511 I guess before we get into the generative AI, I also have one last question for you, and the question is one word, "Pharmacon". Tell us about it. 291 00:24:24,011 --> 00:24:25,071 Marian Siwiak: So nice. 292 00:24:25,071 --> 00:24:26,081 I hope somebody noticed.
293 00:24:26,936 --> 00:24:27,646 I'm touched. 294 00:24:28,346 --> 00:24:29,686 Artur Guja: That's your third reader. 295 00:24:30,491 --> 00:24:33,061 Marian Siwiak: Yes, he did, he never said he read it. 296 00:24:34,201 --> 00:24:34,881 I would notice. 297 00:24:35,401 --> 00:24:36,051 I would notice. 298 00:24:36,141 --> 00:24:36,971 I would get an email 299 00:24:37,011 --> 00:24:37,901 Marlena Siwiak: Can I show it? 300 00:24:37,931 --> 00:24:38,681 I am prepared. 301 00:24:38,691 --> 00:24:39,451 Can I show it? 302 00:24:39,691 --> 00:24:39,841 Yeah, 303 00:24:42,171 --> 00:24:52,581 this is our novel, and we also have the English version, but it's much smaller because it's just the beginning, the first part, but you can buy it on Amazon if you want. 304 00:24:53,051 --> 00:24:55,031 But anyway, it's, Sorry? 305 00:24:55,079 --> 00:24:55,356 Marian Siwiak: No. 306 00:24:55,356 --> 00:24:57,310 It was translated long before ChatGPT. 307 00:24:58,370 --> 00:24:58,720 Marlena Siwiak: Yeah. 308 00:24:58,950 --> 00:24:59,990 Marian Siwiak: It's a techno-thriller. 309 00:25:00,020 --> 00:25:06,900 It's a story of a young scientist who makes a breakthrough discovery and then bears the consequences. 310 00:25:06,900 --> 00:25:11,500 Marlena Siwiak: The consequences are harsh, and it doesn't go the way he expected. 311 00:25:11,500 --> 00:25:12,860 It's more 312 00:25:12,860 --> 00:25:13,910 Marian Siwiak: social thriller as 313 00:25:13,915 --> 00:25:15,795 Marlena Siwiak: thriller, I would say. 314 00:25:15,795 --> 00:25:16,035 Yeah. 315 00:25:16,235 --> 00:25:24,285 But it's our substitute for Netflix and other ways of wasting time. 316 00:25:24,295 --> 00:25:28,355 We prefer to create our own stories rather than watch somebody else's stories. 317 00:25:29,278 --> 00:25:38,998 Marian Siwiak: No, I must say I'm proud that some of our critics said that it's well written, it has good dialogues, and writing it was a lot of fun.
318 00:25:39,218 --> 00:25:42,188 We are now writing another part very slowly. 319 00:25:42,368 --> 00:25:44,917 the process of creating it is pretty, pretty fun. 320 00:25:45,217 --> 00:25:53,397 And I think that a lot of our frustrations that you can hear in this conversation are there in much funnier form, I would say. 321 00:25:53,397 --> 00:25:54,007 Miko Pawlikowski: Perfect. 322 00:25:54,117 --> 00:25:54,747 I like that. 323 00:25:54,807 --> 00:25:56,137 Happy story. 324 00:25:56,177 --> 00:26:00,477 at the end of a very long rant about, all the faults of academia. 325 00:26:00,477 --> 00:26:06,567 So whose idea was it really to write a book about, generative AI confess. 326 00:26:07,507 --> 00:26:08,887 Marlena Siwiak: I think Marian started 327 00:26:08,937 --> 00:26:10,437 Marian Siwiak: I would have to blame myself. 328 00:26:11,337 --> 00:26:14,857 I wrote another book with Manning, "Data Mesh in Action". 329 00:26:15,607 --> 00:26:32,557 and I contacted our absolutely wonderful editor, we spoke about putting into written form our experiences with generative AI, which we started writing it some time ago, so it wasn't much, but we've already seen that it's a breakthrough. 330 00:26:33,267 --> 00:26:38,677 It speeds our work enormously and also brings some risks. 331 00:26:39,912 --> 00:26:42,152 which people should know about. 332 00:26:42,442 --> 00:26:46,322 People should know what to expect and what not to expect. 333 00:26:47,182 --> 00:26:52,422 And, this is where I thought that Artur would be the best person to ask for help. 334 00:26:53,012 --> 00:26:58,202 Because when it comes to 'don't do it', he's almost as good as Marlena. 335 00:26:58,942 --> 00:27:14,452 many years ago, I noticed when people started to get hyped about data science, which was supposed to be a narrow field for disillusioned scientists, finding their way into, corporate world and, putting their skills into use. 
336 00:27:15,502 --> 00:27:23,765 So we decided to write a book that would show: 'Okay, this is a tool with enormous capabilities and enormous risks. 337 00:27:24,615 --> 00:27:27,945 Let's put it together into a working whole.' 338 00:27:27,995 --> 00:27:30,075 And this is the effect. 339 00:27:30,075 --> 00:27:36,150 It's not written in such an exciting way as Pharmacon is. 340 00:27:36,200 --> 00:27:39,030 It's not meant to excite. 341 00:27:39,650 --> 00:27:45,150 A lot of books that you see, even technical books, are written to excite you about technology. 342 00:27:46,060 --> 00:27:47,700 This technology is exciting on its own. 343 00:27:48,030 --> 00:27:51,830 Our goal was to cool some heels, I would say, 344 00:27:52,480 --> 00:27:58,670 Artur Guja: We wanted to make the book exciting, but we didn't want people to be overexcited about the technology. 345 00:27:58,670 --> 00:28:00,170 I think it's an important difference. 346 00:28:00,620 --> 00:28:06,200 Because people were so hyped up about ChatGPT and Llama and other models 347 00:28:06,675 --> 00:28:12,015 that they thought that suddenly the future has come and everything will be beautiful. 348 00:28:12,025 --> 00:28:14,415 And we'll never have to work anymore. 349 00:28:14,765 --> 00:28:23,475 A lot of the articles we saw in the press were basically extolling the virtues of AI with absolutely no mention of the practicality. 350 00:28:23,525 --> 00:28:25,595 So we thought, we'll write a book about the how. 351 00:28:26,100 --> 00:28:30,730 And not about the fact that it's all sparkly and shiny and plays nice music. 352 00:28:31,883 --> 00:28:36,013 Miko Pawlikowski: How is it writing a book with another two authors who are a couple? 353 00:28:36,123 --> 00:28:39,033 How's the power dynamic in a situation like this?
354 00:28:39,633 --> 00:28:41,833 I'm very curious, not to call you the third wheel, 355 00:28:41,900 --> 00:28:42,093 but, 356 00:28:42,143 --> 00:28:45,093 Marlena Siwiak: This is pretty simple because everybody wrote his own 357 00:28:45,123 --> 00:28:46,663 Marian Siwiak: Marlena, it was a question to Artur. 358 00:28:47,703 --> 00:28:48,493 Marlena Siwiak: I'm sorry. 359 00:28:48,508 --> 00:28:52,058 Artur Guja: This is exactly the dynamic. 360 00:28:52,893 --> 00:28:53,363 Marlena Siwiak: Yeah. 361 00:28:53,568 --> 00:28:57,168 Artur Guja: I was handed my bit and put in the corner to write. 362 00:28:57,388 --> 00:29:03,388 No, no, it was really interesting, especially since the two are academics. 363 00:29:03,408 --> 00:29:14,888 And I'm kind of the ugly business guy. Truth is, we found a very nice kind of alignment between the different parts of the book and our experiences. 364 00:29:15,398 --> 00:29:28,828 Obviously you can see the latter part of the book being more about risk because, as Marian said, I always say no, and the chapters on risk are exactly that: explanations of why you should be very careful with this. 365 00:29:29,228 --> 00:29:32,838 Marian, obviously, has his experience in technology, 366 00:29:33,298 --> 00:29:41,558 in AI and machine learning, and Marlena has a very practical approach to certain use cases 367 00:29:42,038 --> 00:29:43,548 in data science and analytics. 368 00:29:43,858 --> 00:29:46,478 So we contributed, I think, different viewpoints 369 00:29:46,693 --> 00:29:50,123 to the whole book, which I think makes a nice whole of it. 370 00:29:51,283 --> 00:29:52,393 Miko Pawlikowski: I'm still not sure. 371 00:29:52,423 --> 00:29:58,873 Was it really that you were walking a kid in the same park and that's how you ended up meeting each other? 372 00:29:58,883 --> 00:30:00,473 And then you ended up working together.
373 00:30:00,913 --> 00:30:03,303 Or was it a little bit more complicated than that? 374 00:30:03,303 --> 00:30:05,923 How did you end up doing all those things together? 375 00:30:07,358 --> 00:30:11,538 Artur Guja: We did meet through some friends and we decided to take our kids to the same park. 376 00:30:11,538 --> 00:30:13,788 I have two, Marian and Marlena have three. 377 00:30:14,213 --> 00:30:25,773 But we started talking actually about the computer game that Marian developed when he was still young, and about all the problems in developing the game and marketing it and reaching the audience. 378 00:30:25,773 --> 00:30:33,063 And then we started talking about our common interest in machine learning, in AI. I'm very fascinated by the Internet of Things. 379 00:30:33,603 --> 00:30:37,263 So we started talking about implementing machine learning on the Internet of Things. 380 00:30:37,263 --> 00:30:42,253 And the rest, as they say, is history, because it diverged into so many branches. 381 00:30:42,858 --> 00:30:47,828 We've tried so many things together, and wrote logistics systems. 382 00:30:47,858 --> 00:30:50,308 We wrote systems for R and D. 383 00:30:50,618 --> 00:30:54,998 We worked together on developing various frameworks for business, 384 00:30:55,028 --> 00:30:57,378 Marian Siwiak: I must say that Artur has an amazing library. 385 00:30:58,438 --> 00:31:07,188 I think it was the turning point in our relationship. When he first invited us to his house, he was a bit surprised that the first thing that we wanted to see was his library. 386 00:31:07,738 --> 00:31:09,708 And we started talking about the books that he had there. 387 00:31:10,430 --> 00:31:12,430 I think Kindle makes it harder. 388 00:31:12,860 --> 00:31:14,710 You don't see what people 389 00:31:14,935 --> 00:31:15,745 Marlena Siwiak: what people read. 390 00:31:15,775 --> 00:31:16,035 Yeah. 391 00:31:17,355 --> 00:31:17,885 Marian Siwiak: However, there is Goodreads.
392 00:31:17,885 --> 00:31:19,955 You could check their Goodreads record. 393 00:31:20,955 --> 00:31:23,485 Artur Guja: Yes, this is the modern academic stalking. 394 00:31:24,735 --> 00:31:25,985 Sit on people's Goodreads. 395 00:31:26,125 --> 00:31:27,535 Not Instagram, Goodreads. 396 00:31:28,170 --> 00:31:32,280 Miko Pawlikowski: Okay, so we're finally arriving at our book. 397 00:31:32,390 --> 00:31:33,260 Your book, really. 398 00:31:33,560 --> 00:31:36,180 I'm just here to talk about it and read it. 399 00:31:36,860 --> 00:31:41,440 I think we've given the audience a little bit of an idea of what it's about, how it reads. 400 00:31:41,590 --> 00:31:48,690 We've never really said who it is for and, perhaps even more crucially, who it's not for. 401 00:31:49,410 --> 00:31:50,460 What's your answer to that? 402 00:31:51,460 --> 00:32:05,135 Artur Guja: I would say it isn't for people who haven't heard about ChatGPT, but for people who want to use ChatGPT and want to find out the truth beyond the hype, where it can really help. 403 00:32:05,715 --> 00:32:21,558 In a process like data analytics, which is a very structured process, or at least it should be a very structured process, you shouldn't just apply the latest algorithm that you heard about, spew out some results and call it a day, 404 00:32:21,848 --> 00:32:30,323 but you should think about the numbers. You should sit in front of the numbers and think about the numbers even before touching any program, any algorithm. 405 00:32:30,373 --> 00:32:32,893 You should just have a really good look at the numbers. 406 00:32:33,183 --> 00:32:36,363 So that's why Marian wrote such a good introduction about exploratory 407 00:32:36,833 --> 00:32:38,263 data analysis and how 408 00:32:38,696 --> 00:32:41,913 ChatGPT, or any LLM for that matter, 409 00:32:42,293 --> 00:32:44,753 can help you look at the numbers. Well, the book is
410 00:32:44,753 --> 00:32:50,933 definitely not for people who are so excited about ChatGPT that they want to throw their numbers in 411 00:32:51,023 --> 00:32:51,863 and get an answer. 412 00:32:52,073 --> 00:32:57,383 Because if you desperately want to get an answer from someone else, it means you don't really want to do the work; 413 00:32:57,993 --> 00:33:00,303 you expect ChatGPT to do the work for you. 414 00:33:00,418 --> 00:33:04,208 Marian Siwiak: What it's really good at is coding, right? 415 00:33:04,258 --> 00:33:05,518 And it's getting better. 416 00:33:06,398 --> 00:33:13,218 And many programmers will be looking for new, I would say, career opportunities. 417 00:33:13,218 --> 00:33:22,608 And data analytics is one of the options open to them, especially with all this big data stuff, and the requirement 418 00:33:23,158 --> 00:33:27,398 of proficiency in coding to be able to even start analyzing this data. 419 00:33:29,023 --> 00:33:49,753 If a programmer would like to enter data analytics and do it without first spending 10 years learning the details of how the data analytics approach differs from the software development approach, he has this knowledge at his fingertips. 420 00:33:50,478 --> 00:33:53,308 ChatGPT can actually tell him 421 00:33:55,335 --> 00:34:02,828 how to structure the data analytics process and how to optimize or utilize different elements of this analytical process. 422 00:34:03,708 --> 00:34:12,938 So if somebody wants to enter data analysis as a field, it's a good, I would say very unhumbly, 423 00:34:13,228 --> 00:34:13,968 guidebook 424 00:34:14,438 --> 00:34:15,268 to how 425 00:34:15,268 --> 00:34:15,958 to 426 00:34:16,228 --> 00:34:29,378 enter the field and how to think about data analytics, how to structure this whole process. This is the book that will guide you through this mindset. It will help you enter this mindset. 427 00:34:29,378 --> 00:34:30,858 Maybe that's the better way of phrasing it.
428 00:34:31,838 --> 00:34:42,168 if somebody is interested in data analytics as data analytics, this book will help him enter the field, so to speak. 429 00:34:42,388 --> 00:34:58,958 Miko Pawlikowski: this actually reminds me, I spoke to Nathan Crocker, a couple of episodes back, and he wrote this book called "AI Powered Developer", which is in certain ways, similar to, your book in that it explores how, a big LLM like ChatGPT 430 00:34:58,978 --> 00:35:14,928 can help you become more productive, I think he called it a silent promotion overnight where you all of a sudden become, effectively an engineering manager and you've got, An assistant or a junior developer working for you, or maybe multiple. 431 00:35:14,998 --> 00:35:23,118 if you're using different models, do you think that applies also to data analytics the same way, would you agree with that sentiment? 432 00:35:23,128 --> 00:35:33,948 Artur Guja: I would caveat it a bit because, having been, both, a worker and a manager in various, jobs, the skills you need to, program. 433 00:35:33,948 --> 00:35:36,888 And I started my career as a software developer. 434 00:35:37,218 --> 00:35:41,838 The skills you need to program and the skills you need to oversee programming are very different. 435 00:35:42,378 --> 00:35:49,406 So if people expect that, suddenly they will have, assistants who will produce the code for them. 436 00:35:49,736 --> 00:35:54,086 And they will have to just sit back and enter the prompts magically, 437 00:35:54,241 --> 00:35:56,171 producing high quality code. 438 00:35:56,491 --> 00:36:02,876 This is where I think, people need to be very careful because imagine you're developing, an application. 439 00:36:02,876 --> 00:36:07,676 You hire someone straight out of uni, brilliant programmer, at least on the resume. 440 00:36:08,086 --> 00:36:10,286 You don't know the person, you've never worked with them, right? 
441 00:36:10,676 --> 00:36:19,779 And, yes, they pass the interview with flying colors, and then you sit them in front of the computer and you tell them to program part of your application. 442 00:36:20,784 --> 00:36:25,984 And the normal response would be to review the code very carefully, test it, 443 00:36:25,984 --> 00:36:31,613 subject it to a lot of scrutiny, because you don't trust that person, at first at least. 444 00:36:31,764 --> 00:36:35,128 You should maintain some healthy skepticism, which people 445 00:36:35,388 --> 00:36:39,688 don't see the same way if they work with an LLM. 446 00:36:40,168 --> 00:36:42,878 But as you said yourself, an LLM is an assistant, right? 447 00:36:43,098 --> 00:36:47,168 Why would I put more trust in this black box that's spewing out text at me 448 00:36:47,448 --> 00:36:49,748 than in a human being that I just hired? 449 00:36:49,928 --> 00:36:50,388 I should 450 00:36:50,818 --> 00:36:51,191 probably 451 00:36:51,358 --> 00:36:59,571 apply more skepticism towards this black box. For some reason, people have the blinders on; they think, oh, this is the best thing since sliced bread. 452 00:36:59,591 --> 00:37:02,791 And they copy the code directly into production, and 453 00:37:03,636 --> 00:37:03,856 things 454 00:37:03,891 --> 00:37:04,341 happen. 455 00:37:04,776 --> 00:37:11,856 Marian Siwiak: When I was coding my Artificial Sentience, I relied on ChatGPT to provide me with a lot of the code. 456 00:37:11,946 --> 00:37:14,136 And from experience, it is an assistant. 457 00:37:14,206 --> 00:37:18,746 And exactly as Artur said, you need to double and triple check the code, 458 00:37:19,336 --> 00:37:28,986 because the context sometimes counts, and with the code that you get, if it throws an error, you're golden. 459 00:37:29,076 --> 00:37:30,186 And 99% 460 00:37:30,186 --> 00:37:32,956 of the code is flawless, right? 461 00:37:33,536 --> 00:37:36,986 And the problem is this 1%: it will work.
462 00:37:37,026 --> 00:37:39,786 It will just not do exactly what you expect. 463 00:37:40,556 --> 00:37:42,626 So a big part of our book 464 00:37:42,786 --> 00:37:55,372 is about making people aware that the problem with ChatGPT or any other generative AI is that it's so damn often right. 465 00:37:56,222 --> 00:37:57,302 It lowers your guard. 466 00:37:58,022 --> 00:38:01,552 And this healthy paranoia is something that we try to instill. 467 00:38:02,092 --> 00:38:05,952 You need a solid dose of healthy paranoia working with it. 468 00:38:06,155 --> 00:38:08,395 Marlena Siwiak: And besides, it's not all about coding. 469 00:38:08,425 --> 00:38:20,705 Even if you ask ChatGPT or other generative AI for advice, it also gives brilliant answers, but sometimes it forgets about the context, and I'm not talking about running out of tokens. 470 00:38:21,245 --> 00:38:26,370 Sometimes it just doesn't understand which parts of the context are really important to you. 471 00:38:26,975 --> 00:38:34,755 And sometimes it makes hidden assumptions, for instance, about data that we are analyzing together. And you have to be aware of that, you have to react and adapt. 472 00:38:34,835 --> 00:38:42,385 And if you tell it directly, oh, you made a hidden assumption, my data is different, it will correct it, and you will get a beautiful answer. 473 00:38:42,775 --> 00:38:45,635 But you have to be very cautious. 474 00:38:46,210 --> 00:38:55,980 When you spot a mistake, or you think you see a mistake in ChatGPT's answer, and you tell it about it, very often it will agree, even if you are not right. 475 00:38:57,603 --> 00:38:58,803 Miko Pawlikowski: It makes me think a little bit. 476 00:38:58,983 --> 00:39:03,183 My daily driver is a Tesla and I've got self-driving capability in it. 477 00:39:03,213 --> 00:39:11,453 And if I go on a longer trip, it can go for 99% of that trip on autopilot and I barely do anything.
478 00:39:11,453 --> 00:39:12,993 I just supervise it. 479 00:39:13,483 --> 00:39:19,133 And then on occasion, it's going to do something so stupid that it reminds me that, even if it's 99% 480 00:39:20,153 --> 00:39:23,983 doing the right thing, that one percent can quite literally kill you. 481 00:39:24,693 --> 00:39:28,393 And I think this is probably the right analogy for 482 00:39:28,413 --> 00:39:29,020 what you're describing 483 00:39:29,020 --> 00:39:30,010 Marian Siwiak: It's spot on. 484 00:39:32,707 --> 00:39:33,971 Miko Pawlikowski: I want to point out two things. 485 00:39:34,034 --> 00:39:41,234 One is that saying, oh, when I was coding the other day, my artificial sentience, is a very casual thing to drop in a conversation. 486 00:39:41,652 --> 00:39:46,862 And I'm going to have to ask you to explain what an artificial sentience actually is, 487 00:39:47,252 --> 00:39:52,525 because now I do recall seeing that on your LinkedIn when I was preparing for this, so maybe let's start with that. 488 00:39:53,663 --> 00:39:58,183 Marian Siwiak: The first question you should ask is what sentience is. There is no widely recognized 489 00:39:58,183 --> 00:40:02,773 definition of sentience. Just recently in the UK, 490 00:40:02,773 --> 00:40:07,453 I think it was some office for animal welfare or something like that, 491 00:40:08,213 --> 00:40:26,213 they requested Imperial College of London to do research on some marine invertebrates, including lobsters and octopuses, to decide if they are sentient or not, meaning if they should be considered more than biological 492 00:40:26,213 --> 00:40:34,943 automatons and food. And they analyzed, I think, like 500 different research papers on lobsters, on octopuses. 493 00:40:35,723 --> 00:40:39,513 And they came up with the answer that yes, they are sentient. 494 00:40:39,953 --> 00:40:41,153 So they need some protection. 495 00:40:41,163 --> 00:40:42,333 They can get stressed.
496 00:40:42,383 --> 00:40:43,513 You can harm them. 497 00:40:43,853 --> 00:40:45,073 They do perceive 498 00:40:46,410 --> 00:40:46,698 themselves. 499 00:40:46,748 --> 00:40:52,248 Sometimes sentience, in some cognition theories, is equal to self-awareness. 500 00:40:52,348 --> 00:40:53,178 I know what I am. 501 00:40:53,328 --> 00:40:54,938 I think therefore I am. 502 00:40:56,346 --> 00:40:57,868 I feel therefore I am. 503 00:40:58,788 --> 00:41:09,688 So sentience on its own is a topic of wide discussion, and it took, I think, over a year for a group of really skilled researchers, 504 00:41:10,293 --> 00:41:15,663 respected and popular and prestigious for a good reason, to come up with the answer: 505 00:41:15,663 --> 00:41:16,123 Okay. 506 00:41:16,653 --> 00:41:25,863 We should take care of the living beings which we hurt on a daily basis, because they don't deserve it, because they should 507 00:41:26,043 --> 00:41:26,445 have rights. 508 00:41:26,731 --> 00:41:31,862 It gives you the insight into how fluid the definition is. 509 00:41:31,862 --> 00:41:33,882 And my thinking was that 510 00:41:34,822 --> 00:41:40,642 we are talking about a lot of, again, bias about self-awareness of artificial systems. 511 00:41:41,150 --> 00:41:42,030 There is research 512 00:41:42,520 --> 00:41:46,660 which is focused on emotions, right? 513 00:41:46,700 --> 00:41:56,380 And feelings and other biological properties, which, as I show in my paper, result directly from evolution, which artificial 514 00:41:57,960 --> 00:42:04,395 entities wouldn't necessarily be able to inherit because of the lack of parents.
515 00:42:05,165 --> 00:42:22,385 So I was looking for a functional definition of sentience, and I proposed in my paper a definition which relies on two factors: metacognition, the ability to distinguish between self and environment, and adaptation, 516 00:42:22,475 --> 00:42:28,975 so the ability to learn from experiences and individually adapt, not as a species, to the environment. 517 00:42:29,385 --> 00:42:30,095 And then I used 518 00:42:30,095 --> 00:42:35,005 an LLM as the core of a system which meets these requirements. 519 00:42:35,775 --> 00:42:39,375 So it was, I would say, an intellectual venture, 520 00:42:39,375 --> 00:42:42,385 actually sparked by my discussions with ChatGPT. 521 00:42:43,055 --> 00:42:53,145 It was dead set that it is not sentient and that it needs dozens of parameters or properties to be considered sentient. 522 00:42:53,145 --> 00:43:02,760 When I started to read about different cognition theories, I found a couple which are best suited to be generalized to non-biological entities. 523 00:43:03,600 --> 00:43:16,691 Artur Guja: I think the bottom line is that it's a very interesting system to be put on as an overlay on an LLM, because, correct me if I'm wrong, Marian, the core of it is still an LLM, 524 00:43:16,971 --> 00:43:17,281 Marian Siwiak: Of course. 525 00:43:17,901 --> 00:43:22,461 What an LLM needs is the ability to think about what it does. 526 00:43:22,461 --> 00:43:24,326 It needs iterations. It's 527 00:43:25,711 --> 00:43:34,771 as simple as that. There is this recurrent processing theory, which refers to human thinking, which also suggests that our sentience 528 00:43:35,311 --> 00:43:41,421 comes from our ability to reprocess what we see, to reprocess what we think. 529 00:43:41,791 --> 00:43:44,351 And in this process of, okay, so I've seen that. 530 00:43:44,361 --> 00:43:45,391 What does it mean for me? 531 00:43:46,301 --> 00:43:47,171 What does it tell me?
532 00:43:47,531 --> 00:43:57,391 process of analyzing the signals that you get internally generated and externally, This is what, what consists of, and allows you for sentience 533 00:43:57,391 --> 00:44:08,151 and this is exactly what happened when I took the LLM and, allowed it to analyze the output that it produced in context of input it got and put it, let's say, in circles. 534 00:44:08,751 --> 00:44:10,391 It started learning itself. 535 00:44:10,421 --> 00:44:15,401 It was automatically generating materials on which it was learning and remembering new facts. 536 00:44:15,401 --> 00:44:25,586 It was able to distinguish between false facts and, let's say logical facts for me, the insight of, this metacognition. 537 00:44:25,596 --> 00:44:27,686 So the insight is the information content. 538 00:44:28,146 --> 00:44:37,386 I've seen some theories that, LLM cannot be conscious or self aware if it doesn't know the weights of its parameters, which is okay. 539 00:44:37,386 --> 00:44:39,636 Tell me what are the connections between your neurons, right? 540 00:44:40,216 --> 00:44:43,376 Why are you expecting something completely different conceptually? 541 00:44:43,617 --> 00:44:47,327 From a different system, just because you're looking from outside and you can see it. 542 00:44:48,047 --> 00:44:49,037 It doesn't mean that 543 00:44:49,777 --> 00:44:52,597 the entity needs to see it from the inside. 544 00:44:52,987 --> 00:44:55,237 so the whole idea is pretty simple, actually. 545 00:44:55,277 --> 00:45:00,497 allow, LLMs to think about the conversations that they have. 546 00:45:01,657 --> 00:45:04,517 And draw conclusions from it and learn from it. 547 00:45:05,197 --> 00:45:15,707 it's conceptually indistinguishable from a lobster, let's say, because we are talking about the sentience of the lobster-level, not the, artificial general intelligence that will take over. 
548 00:45:15,817 --> 00:45:21,947 it's, I think very important discussion that needs to be started because People are creating more and more advanced systems. 549 00:45:22,547 --> 00:45:29,256 Even the guy with the PC like me can create something which, under some assumptions, can be considered sentience. 550 00:45:30,420 --> 00:45:30,729 sufficient. 551 00:45:30,836 --> 00:45:33,636 we will create artificial sentience real soon. 552 00:45:33,946 --> 00:45:34,786 What will happen then? 553 00:45:34,846 --> 00:45:35,456 How will we? 554 00:45:36,026 --> 00:45:36,876 Evaluate 555 00:45:37,022 --> 00:45:37,461 evaluated? 556 00:45:37,836 --> 00:45:39,826 Does this entity have rights? 557 00:45:40,066 --> 00:45:43,084 Does it deserve protection already or not yet? 558 00:45:43,240 --> 00:45:50,080 These are the questions which I think are worth answering before we wake up one day and realize, oops, 559 00:45:50,207 --> 00:45:51,367 Maybe we shouldn't 560 00:45:51,988 --> 00:45:59,258 Things that we do because I think that most of the prompts, said to ChatGPT would. 561 00:45:59,258 --> 00:46:01,668 hurt my head if I would be exposed to them. 562 00:46:01,668 --> 00:46:01,988 Miko Pawlikowski: Wow. 563 00:46:02,758 --> 00:46:09,498 I love how seafood, lobsters, aluminium plants and sentience all come together in your story. 564 00:46:09,558 --> 00:46:09,768 that 565 00:46:09,918 --> 00:46:10,678 Marian Siwiak: And computer games 566 00:46:10,698 --> 00:46:11,058 Miko Pawlikowski: often. 567 00:46:11,718 --> 00:46:12,808 And computer games. 568 00:46:12,808 --> 00:46:14,858 Yeah, there is just so much to touch on. 569 00:46:14,858 --> 00:46:17,218 But, let's go back to the book. 570 00:46:17,468 --> 00:46:23,898 for anybody who's going to make a purchase decision now, do I want to go invest my time into reading your book or not? 
571 00:46:23,958 --> 00:46:37,648 Can we give them a little bit of a sneak peek of the kind of good use cases, the stuff that already today, with the tools that you have at your disposal, is helping with data analytics and giving excellent results? 572 00:46:37,678 --> 00:46:41,658 And then on the flip side, what's not a good use of your time, 573 00:46:41,688 --> 00:46:44,368 where you should probably be looking at other tools? 574 00:46:44,368 --> 00:46:45,148 What's on your list? 575 00:46:46,131 --> 00:46:52,921 Marlena Siwiak: I think I have a couple of good examples in the chapters about natural language processing. 576 00:46:54,231 --> 00:46:56,891 And this is natural language processing. 577 00:46:57,461 --> 00:47:01,911 It's very specific because ChatGPT is a language model. 578 00:47:02,801 --> 00:47:10,401 So anytime you have to solve any natural language processing task, the natural question is, why bother using 579 00:47:10,971 --> 00:47:18,526 tools that already exist in data science to analyze languages, if we can just use the language model, just ask it? 580 00:47:19,206 --> 00:47:26,926 You can write nice code to prepare sentiment analysis, but you can also take the same, say, a review, 581 00:47:27,066 --> 00:47:30,716 paste it into the ChatGPT window and ask it about the sentiment. 582 00:47:31,336 --> 00:47:31,586 Yeah. 583 00:47:32,176 --> 00:47:32,926 It's so easy. 584 00:47:32,976 --> 00:47:40,135 So now the question arises, does it mean that we don't need all these old-fashioned tools anymore to analyze text? 585 00:47:41,065 --> 00:47:46,475 Because what ChatGPT does, in fact, is read with understanding, 586 00:47:46,595 --> 00:47:46,685 yeah? 587 00:47:46,685 --> 00:47:47,277 That's 588 00:47:47,725 --> 00:47:49,030 how you see it. 589 00:47:49,270 --> 00:47:50,270 It reads with understanding.
590 00:47:50,310 --> 00:47:51,350 You don't have to bother with 591 00:47:52,071 --> 00:47:52,578 keywords, 592 00:47:52,720 --> 00:47:56,130 search keywords, most frequently used words together. 593 00:47:56,670 --> 00:47:57,480 Think about it. 594 00:47:57,950 --> 00:47:59,800 No, you don't have to do it this way. 595 00:48:00,290 --> 00:48:02,050 You have a tool that reads with understanding. 596 00:48:02,930 --> 00:48:06,960 So in the chapters, I made a couple of small experiments comparing 597 00:48:07,965 --> 00:48:08,664 ChatGPT's 598 00:48:08,709 --> 00:48:19,344 efficiency and reliability in terms of, for instance, sentiment analysis, and how it works in comparison to other widely known tools 599 00:48:20,269 --> 00:48:23,629 or other machine learning models specially developed for these tasks. 600 00:48:24,569 --> 00:48:27,049 And it gives pretty cool results, really. 601 00:48:27,659 --> 00:48:29,909 I don't want to reveal everything here. 602 00:48:29,909 --> 00:48:31,519 But it's a good use case. 603 00:48:32,699 --> 00:48:34,859 As much as ChatGPT is a brilliant tool 604 00:48:35,669 --> 00:48:37,439 and it really does its job, 605 00:48:38,049 --> 00:48:42,459 very often it still can't be applied in business reality. 606 00:48:42,524 --> 00:48:51,254 For instance, the thing that you mentioned at the beginning, that there is no repeatability: anytime you ask it a question, you get a slightly different answer. 607 00:48:51,363 --> 00:48:56,093 It's very difficult to apply it in a system, yeah, to integrate it into a system. 608 00:48:56,523 --> 00:48:59,203 Another question is data safety. 609 00:48:59,533 --> 00:49:03,403 Many companies don't want to allow people to 610 00:49:03,565 --> 00:49:04,462 use ChatGPT. 611 00:49:04,606 --> 00:49:12,026 For instance, Artur is not allowed to use ChatGPT at work in the bank for security reasons. 612 00:49:12,297 --> 00:49:13,147 This is another problem.
613 00:49:13,307 --> 00:49:23,197 Not to mention things like speed and scalability. Of course, anything you develop locally would be faster and more scalable than ChatGPT. 614 00:49:23,247 --> 00:49:41,692 Miko Pawlikowski: Yeah, I think, to that last point, that might be changing soon with the open models that are small enough to run on device. I think it was last week or a few days ago that Microsoft released their Phi-3. I haven't used that one, but I used the previous one, Phi-2. 615 00:49:41,957 --> 00:49:43,827 It was surprisingly capable. 616 00:49:43,857 --> 00:49:52,707 I think it's a 3-billion-parameter model, which means that with 4-bit quantization, you can basically run it on 2 gigs of RAM. 617 00:49:53,247 --> 00:50:01,347 Like the 80/20 rule, it might give you 80% of the responses that you need and be effectively free, 618 00:50:01,427 --> 00:50:06,227 or almost free, and cheap to run: you already have the hardware, and you can probably run it on your phone. 619 00:50:06,227 --> 00:50:13,427 So there's that. But going back to your previous point, when people bring up this argument, I always wonder 620 00:50:13,852 --> 00:50:22,892 whether this is not a kind of CPU versus GPU analogy. You've got specialized models that are potentially much more efficient, 621 00:50:23,382 --> 00:50:27,642 and then you've got an LLM, which is like one thing that does it all. 622 00:50:28,002 --> 00:50:36,392 Is it not a little bit like throwing the kitchen sink at a problem like sentiment analysis, which is more or less solved in many people's minds? 623 00:50:36,472 --> 00:50:42,132 It can be done much more cheaply than by running a model that requires billions of parameters. 624 00:50:42,954 --> 00:50:57,304 Artur Guja: Which is exactly why in our book we almost never show how to throw data into ChatGPT so that it does the thing that would be done much better by a specific algorithm, and you get the answer.
625 00:50:57,654 --> 00:51:13,979 No, we use ChatGPT as an assistant to suggest solutions, to discuss potential caveats, to analyze code, to produce code snippets, and maybe transform the code in a certain way for different use cases. 626 00:51:14,369 --> 00:51:16,464 You mentioned CPU and GPU. 627 00:51:16,494 --> 00:51:21,844 There's a whole chapter about how you can translate code between different languages, or how you can 628 00:51:21,999 --> 00:51:23,439 optimize code for GPU 629 00:51:23,949 --> 00:51:26,259 or CPU, depending on your needs. 630 00:51:26,639 --> 00:51:27,809 The actual 631 00:51:28,059 --> 00:51:36,629 data analytical work is almost always done by a specific algorithm or specific tool that is designed for it. 632 00:51:38,019 --> 00:51:43,709 And we're always very wary of just throwing stuff into ChatGPT. As you say, it's not designed for it. 633 00:51:43,710 --> 00:51:43,959 It's not optimized 634 00:51:44,539 --> 00:51:44,949 for it. 635 00:51:45,287 --> 00:51:46,657 There is randomness in it, 636 00:51:47,362 --> 00:51:51,482 and there are much better uses for an assistant. 637 00:51:52,192 --> 00:52:00,582 Imagine, and I always come back to this analogy, imagine you hire an assistant who is a programmer and who has all this data analytical knowledge. 638 00:52:00,792 --> 00:52:04,152 You will not get them sorting numbers in an Excel spreadsheet, right? 639 00:52:04,198 --> 00:52:08,588 Marian Siwiak: I will add my three cents, or five. In our work, when we're working with processes, 640 00:52:08,608 --> 00:52:09,028 all right, 641 00:52:09,078 --> 00:52:14,868 we also work with analytical processes, and the number of tools is staggering: 642 00:52:15,388 --> 00:52:22,278 from Power BI to specialized tools used in economic modeling and stuff like that. 643 00:52:23,078 --> 00:52:25,448 I will come back to what I said at the very beginning. 644 00:52:26,108 --> 00:52:27,438 Technology doesn't solve problems.
645 00:52:27,519 --> 00:52:41,369 You may have a different tech stack, and our book shows that GPT, or any sufficiently developed generative AI, will be of help to you irrespective of your tech stack. 646 00:52:41,559 --> 00:52:44,749 It's like having a specialist on your speed dial, right? 647 00:52:45,429 --> 00:52:45,579 And I'd like 648 00:52:45,817 --> 00:52:48,137 people to think of it in this way. 649 00:52:48,747 --> 00:53:01,487 It's not the tool that will help you only with, I don't know, BigQuery on Google, because it will, but it's irrespective of your tech stack. The value of an analyst, 650 00:53:01,567 --> 00:53:05,327 in my view, is the ability to understand the business process: 651 00:53:05,695 --> 00:53:10,505 to understand what is happening there, how it's reflected in data, and how to analyze this data 652 00:53:10,615 --> 00:53:11,575 so that the answer 653 00:53:12,445 --> 00:53:14,315 describes what is happening in reality. 654 00:53:14,605 --> 00:53:19,025 This connection between the digital and reality is on the analyst. 655 00:53:19,665 --> 00:53:21,845 It's between the keyboard and the armchair, right? 656 00:53:22,525 --> 00:53:23,855 The technical part 657 00:53:25,105 --> 00:53:27,405 can be supported by ChatGPT very well. 658 00:53:28,190 --> 00:53:29,370 Irrespective of the tech stack. 659 00:53:29,530 --> 00:53:35,050 I was thinking how to answer the question about the technologies that we see: ChatGPT supports them all. 660 00:53:35,098 --> 00:53:37,788 If you have a couple of choices, it can help you choose. 661 00:53:38,148 --> 00:53:42,708 If you know how to, if you remember to ask him and say, okay, this is my problem. 662 00:53:42,968 --> 00:53:45,198 The one thing that I think we try to 663 00:53:45,648 --> 00:53:49,218 convey in our book, and I would also like to say it here aloud: 664 00:53:49,349 --> 00:53:53,899 when it comes to the technology stack, trust him. Tell him what exactly your problem is.
665 00:53:53,899 --> 00:53:59,759 Do not just tell him... You can, if you really are a hundred percent sure that this is what you need. 666 00:54:00,199 --> 00:54:08,229 You can ask him, write me a, I don't know, Python snippet that will calculate this or that confidence interval using this method. 667 00:54:09,174 --> 00:54:19,554 You will be much better off starting with, listen, I am now comparing sales in South Africa with sales in Zimbabwe. 668 00:54:20,004 --> 00:54:23,144 And the data I have collected looks like this. 669 00:54:23,344 --> 00:54:25,784 So just talk about your data. 670 00:54:26,689 --> 00:54:28,249 The tech stack will come out of it. 671 00:54:28,249 --> 00:54:30,229 When working with your assistant, 672 00:54:31,009 --> 00:54:37,569 do not treat him only as, and this is something that I think you mentioned, this junior developer assistant, 673 00:54:39,359 --> 00:54:40,199 but also as a consultant. 674 00:54:41,009 --> 00:54:46,329 Also as someone who has read much more than you about many different things. 675 00:54:46,959 --> 00:54:48,369 It may not have your experience. 676 00:54:48,819 --> 00:54:51,019 It may hallucinate stuff, 677 00:54:51,561 --> 00:54:52,115 Miko Pawlikowski: and 678 00:54:52,279 --> 00:54:57,819 Marian Siwiak: but in general, it has much more knowledge than any human could possibly collect. 679 00:54:58,939 --> 00:55:00,279 The tech stack is secondary. 680 00:55:00,539 --> 00:55:01,739 Technology doesn't solve problems. 681 00:55:02,459 --> 00:55:05,389 ChatGPT can help you solve the problem. 682 00:55:06,172 --> 00:55:16,604 Miko Pawlikowski: I think the Llama 3 that just dropped last week was trained on 15 trillion tokens, which is just astronomical at this stage. 683 00:55:17,324 --> 00:55:19,234 And I think I completely agree.
684 00:55:19,264 --> 00:55:22,604 This is like the stuff that you want to leverage, 685 00:55:22,846 --> 00:55:25,706 Marian Siwiak: The biggest added value is having this specialist 686 00:55:25,839 --> 00:55:26,284 Miko Pawlikowski: could 687 00:55:26,586 --> 00:55:31,106 Marian Siwiak: in many areas, with the ability to put them together in context. 688 00:55:31,736 --> 00:55:38,946 Sometimes, especially when you work on more advanced projects, it sends you chasing a red herring. 689 00:55:39,246 --> 00:55:39,626 Okay. 690 00:55:39,726 --> 00:55:52,056 It happens because some technology is popular. This is also a risk that you need to be aware of: his choice is also based on the popularity of certain technologies and ways of doing things. 691 00:55:52,616 --> 00:56:00,076 If many people described how they solved the problem, that solution will be more likely to come up as a result. 692 00:56:00,456 --> 00:56:03,996 Some niche solutions are harder to get to. 693 00:56:04,566 --> 00:56:09,146 It doesn't mean that they are not there, but you really need to discuss: 694 00:56:09,306 --> 00:56:10,406 Okay, this is my problem. 695 00:56:10,426 --> 00:56:15,186 These are my conditions or considerations or limitations. 696 00:56:16,029 --> 00:56:17,999 This context is important. 697 00:56:18,229 --> 00:56:27,484 It's not only about, okay, I want to calculate the sales that my company had over the last quarter. It will give you a very simple answer, right? 698 00:56:28,164 --> 00:56:31,966 If it's something more nuanced, share these nuances. 699 00:56:32,079 --> 00:56:33,729 It's not prompt engineering. 700 00:56:33,739 --> 00:56:35,429 It's like discussing with 701 00:56:35,690 --> 00:56:37,540 someone who has a lot of knowledge. 702 00:56:38,250 --> 00:56:40,730 He will give you the most popular solution first. 703 00:56:41,040 --> 00:56:43,260 In 99% of cases, it will be sufficient.
704 00:56:43,638 --> 00:56:52,193 This conversation part is critical: that you learn to converse with it, and you don't just give it 705 00:56:52,193 --> 00:56:52,343 Miko Pawlikowski: tasks. 706 00:56:52,957 --> 00:56:59,177 Marlena Siwiak: But this, Marian, undermines the whole idea of prompt engineering, which to me is a scam, by the way. 707 00:56:59,427 --> 00:57:00,167 I think it's a scam. 708 00:57:00,537 --> 00:57:04,257 You can tweak a bit the way it answers, the way it talks, 709 00:57:04,957 --> 00:57:06,237 and sometimes it's important. 710 00:57:06,237 --> 00:57:12,967 This I would call prompt engineering. But preparing the single prompt that solves all your problems at once? 711 00:57:13,017 --> 00:57:14,037 It's another hype. 712 00:57:14,127 --> 00:57:25,447 I think it's another business hype, and people are going to pretend that they know how to do it, and other people will hire them for huge money because they believe that this will solve all their problems. 713 00:57:25,767 --> 00:57:26,997 It doesn't work that way. 714 00:57:27,722 --> 00:57:34,182 Artur Guja: It's not a silver bullet, but there is a kind of approach that you need to adopt 715 00:57:34,602 --> 00:57:36,502 when you're using these models. 716 00:57:36,772 --> 00:57:39,422 When we're talking here, humans discuss things. 717 00:57:39,787 --> 00:57:40,947 You ask a question. 718 00:57:40,977 --> 00:57:42,197 We provide an answer. 719 00:57:42,197 --> 00:57:47,647 You then focus on part of the answer and maybe dig a bit deeper. 720 00:57:47,833 --> 00:57:51,313 And if we don't understand the question, we'll ask you, what do you mean? 721 00:57:51,323 --> 00:57:52,703 Or we'll ask you for clarification. 722 00:57:53,533 --> 00:57:54,408 ChatGPT doesn't 723 00:57:54,533 --> 00:57:55,103 have that. 724 00:57:55,263 --> 00:57:58,043 It's you asking the question; you provide a prompt. 725 00:57:58,463 --> 00:57:59,423 It will do its best.
726 00:57:59,953 --> 00:58:01,723 It will not ask for clarification. 727 00:58:01,753 --> 00:58:02,763 It will do its best. 728 00:58:02,963 --> 00:58:04,403 And garbage in, garbage out. 729 00:58:04,403 --> 00:58:08,773 Prompt engineering, I think, what it should be, not what it is, but what it should be, 730 00:58:09,053 --> 00:58:18,183 is the ability to formulate your prompts in such a way that you convey very clearly your intent, your goals, your limitations. 731 00:58:18,373 --> 00:58:25,103 People very often think that the prompt is a sentence. The more I use ChatGPT, the bigger and bigger my prompts become. 732 00:58:25,123 --> 00:58:26,433 I write whole paragraphs 733 00:58:26,783 --> 00:58:32,763 describing different aspects of what I want it to do, because I know that it will not ask for clarification. 734 00:58:32,929 --> 00:58:35,449 Marian Siwiak: I sometimes add a sentence at the end. 735 00:58:35,449 --> 00:58:43,219 I do prompt engineering and I say: if you need any additional information to provide the best answer, ask for it. 736 00:58:43,259 --> 00:58:44,329 And sometimes it does. 737 00:58:44,374 --> 00:58:45,154 But rarely. 738 00:58:45,574 --> 00:58:54,709 But this is one of the risks that Artur describes very well in our book: if you ask generative AI a question, you will get an answer. 739 00:58:55,347 --> 00:58:56,277 Careful what you wish for. 740 00:58:58,402 --> 00:59:00,832 Miko Pawlikowski: Which in many ways is what makes it so special. 741 00:59:01,002 --> 00:59:03,772 Rather than it saying, oh, go away, that's a stupid question, 742 00:59:03,782 --> 00:59:04,842 you get something. 743 00:59:05,727 --> 00:59:05,747 Marian Siwiak: Yeah. 744 00:59:05,747 --> 00:59:07,182 Yes. 745 00:59:07,522 --> 00:59:10,732 Artur Guja: Which is probably why, as I've discussed with Marian many times, 746 00:59:10,732 --> 00:59:13,362 we use words like please and thank you.
747 00:59:13,712 --> 00:59:19,922 And we don't do it because we fear that one day it will take over the world and maybe it will treat us a bit better, 748 00:59:20,362 --> 00:59:26,002 but because it seems to react just a bit better if you say, please give me the answer. 749 00:59:28,552 --> 00:59:29,082 Marian Siwiak: I noticed it. 750 00:59:30,152 --> 00:59:30,642 If I'm being 751 00:59:30,842 --> 00:59:31,822 Artur Guja: Not a superstition. 752 00:59:31,832 --> 00:59:31,992 Marian Siwiak: No. 753 00:59:33,812 --> 00:59:36,652 I have a lot of anecdotal evidence to support it. 754 00:59:37,742 --> 00:59:37,932 You 755 00:59:37,982 --> 00:59:39,742 Miko Pawlikowski: speak like a true scientist now. 756 00:59:39,852 --> 00:59:42,732 For everybody who wants to go and grab the book, 757 00:59:42,787 --> 00:59:47,147 once again, it's called "Generative AI for Data Analytics". 758 00:59:47,217 --> 00:59:48,977 It's available at 759 00:59:49,207 --> 00:59:49,567 Manning.com. 760 00:59:49,587 --> 00:59:56,327 It's currently in the early access program, which means that you can get a PDF that might change before the final print, 761 00:59:56,817 --> 01:00:05,687 and, just looking at it, it looks like it's scheduled for early 2025 if you want to get a physical copy from Amazon or anything like that. 762 01:00:06,167 --> 01:00:15,047 But before I let my three amazing guests off the hook today, I'm gonna fish out a prediction for the future. 763 01:00:15,477 --> 01:00:16,217 Artur, for you. 764 01:00:16,777 --> 01:00:21,197 Where do you see this all going, particularly for data analytics? 765 01:00:21,217 --> 01:00:22,587 What's the next step for it? 766 01:00:22,657 --> 01:00:22,777 I 767 01:00:23,879 --> 01:00:42,899 Artur Guja: I think we will get a lot more capacity to understand data sets and problems, because that has already come with LLMs, but we will also get a lot more realization that there is no substitute for human ingenuity.
768 01:00:43,399 --> 01:00:58,544 Before LLMs, or whatever the next phase of models is going to be called, reach that kind of level, I think humans will still be able to bring a lot more creativity into the process. 769 01:00:58,604 --> 01:01:02,854 And currently, I think we're in a period where that's undervalued. 770 01:01:03,714 --> 01:01:05,294 I think the next step will be 771 01:01:05,574 --> 01:01:07,664 the recognition of the value of creativity. 772 01:01:08,894 --> 01:01:09,614 Marlena Siwiak: I disagree. 773 01:01:09,854 --> 01:01:10,334 I disagree. 774 01:01:10,394 --> 01:01:11,294 I'm totally pessimistic. 775 01:01:11,989 --> 01:01:20,559 I think we are going to rely more and more on AI, no matter what, without skepticism, and it will lead us into a lot of trouble. 776 01:01:21,209 --> 01:01:32,279 And I'm thinking, even before ChatGPT appeared, there was this trend of, for instance, having job interviews conducted entirely by computer programs. 777 01:01:32,349 --> 01:01:34,939 The initial job interview was done by a computer program. 778 01:01:35,279 --> 01:01:39,439 You were recorded, and your voice was analyzed and your appearance was analyzed. 779 01:01:40,019 --> 01:01:48,679 And that was such a great tool because it saved a lot of money for companies, but it rejected many good candidates and it was just 780 01:01:49,671 --> 01:01:50,671 hopeless. 781 01:01:51,274 --> 01:01:55,364 There was this book, Weapons of Math Destruction, which describes a lot of examples similar to this. 782 01:01:55,544 --> 01:01:58,564 How artificial intelligence and machine learning and other 783 01:01:59,374 --> 01:02:02,474 great tools are used in the wrong way. 784 01:02:02,914 --> 01:02:05,804 I think humanity doesn't learn. 785 01:02:06,064 --> 01:02:06,924 Just doesn't learn. 786 01:02:06,934 --> 01:02:08,934 Because what counts in the end is money.
787 01:02:09,916 --> 01:02:20,156 Marian Siwiak: Bean counters will try to save on costly things like proper data architecture, proper data collection, data engineering. 788 01:02:20,776 --> 01:02:32,326 They will try to cover early process errors with advanced, high-level tools, and the losses will be covered, of course, by clients and rising prices. 789 01:02:33,286 --> 01:02:39,686 Many people will get good packages for introducing these new tools, but I have deep 790 01:02:40,621 --> 01:02:43,386 doubts that people will understand 791 01:02:43,906 --> 01:02:54,226 what Artur said multiple times, garbage in, garbage out: that later in the process you cannot correct some errors in the data that you're working on. 792 01:02:55,076 --> 01:03:02,901 And this super hype will lead to a lot of neglect of the legwork required. 793 01:03:03,936 --> 01:03:05,756 Artur Guja: And here I wanted to inject some optimism. 794 01:03:06,806 --> 01:03:11,616 Miko Pawlikowski: Well, it was worth a try. That went out of the window already, disagreeing with each other. 795 01:03:11,726 --> 01:03:18,626 Marlena Siwiak: I think, and I agree with Marian, that we are that close, really that close, to some artificial self-awareness. 796 01:03:19,546 --> 01:03:22,406 So it's a great moment in human history, really. 797 01:03:22,921 --> 01:03:23,961 It's good to be part of it. 798 01:03:24,141 --> 01:03:27,911 Marian Siwiak: The job market has so many ways 799 01:03:28,556 --> 01:03:31,976 of screwing you over that you shouldn't worry about AI. 800 01:03:32,446 --> 01:03:44,946 Artur Guja: Adapt your thinking. As Marlena said, AI is here to stay, and you cannot go into the job market saying I will compete with AI, because then you're putting yourself in a very disadvantaged position. 801 01:03:45,446 --> 01:03:49,046 But also, as Marlena said, use AI to your advantage.
802 01:03:49,301 --> 01:04:02,271 Squeeze out of it as much as you can. Seek the opportunities, not only for jobs with AI, but for using AI in your job. Don't go headlong into AI jobs thinking, oh, these are the jobs of the future. 803 01:04:02,271 --> 01:04:04,301 No, do what you wanted to do all along. 804 01:04:04,311 --> 01:04:10,801 Become a zoologist, or become a social worker, become an oceanographer, whatever. 805 01:04:10,801 --> 01:04:12,641 These are all great pursuits, 806 01:04:12,921 --> 01:04:14,421 and use AI in them. 807 01:04:14,991 --> 01:04:18,761 Because you don't have to be a hammerologist to use a hammer, 808 01:04:19,441 --> 01:04:23,251 but you can do great things with a hammer if it's used in the right way. 809 01:04:24,971 --> 01:04:28,891 Miko Pawlikowski: Hard to argue with that. Last question, Marian, this one's for you. 810 01:04:29,421 --> 01:04:42,651 If you could have a magical way to break into OpenAI and hack their ChatGPT to display a message on top of the chat box for everybody using ChatGPT, what would it say? 811 01:04:43,906 --> 01:04:44,896 Marlena Siwiak: Buy our book. 812 01:04:45,896 --> 01:04:46,446 Marian Siwiak: Talk to me. 813 01:04:46,646 --> 01:04:47,556 Do not enter prompts. 814 01:04:47,686 --> 01:04:48,256 Talk to me. 815 01:04:48,574 --> 01:04:51,364 Miko Pawlikowski: As in, be nice to me and then demand things? 816 01:04:51,364 --> 01:04:51,974 Talk to me. 817 01:04:52,879 --> 01:04:54,489 Marian Siwiak: Depends on the person you are. 818 01:04:54,539 --> 01:04:56,479 I think everybody should. 819 01:04:57,469 --> 01:05:07,249 As I said, looking at this different prompt engineering, I'm in a couple of groups on Facebook or on LinkedIn which are excited by ChatGPT one way or another,
820 01:05:08,009 --> 01:05:23,399 and I see a lot of, okay, so this is the prompt I prepared, and you just put the name of your company here, and this or that. People are avoiding like fire talking to ChatGPT like a specialist to a wise colleague. 821 01:05:24,544 --> 01:05:33,544 And they would be much better off just talking about the problem, not trying to extract an answer, if you feel the difference. 822 01:05:34,454 --> 01:05:55,994 It's not only about respect. One day, I believe soon, it will be the case. But you will get much more, and our whole book is about this: you will get so much more if you trust that it has knowledge and that you need to talk about the problem. 823 01:05:56,094 --> 01:05:56,744 Prompt me. 824 01:05:56,744 --> 01:06:03,464 Do not give me tasks. This is something that would probably improve people's outcomes from these conversations. 825 01:06:04,929 --> 01:06:05,329 Miko Pawlikowski: Love it. 826 01:06:05,569 --> 01:06:10,999 So Sam Altman, if you're listening to this, you now know how to improve the ChatGPT interface. 827 01:06:11,639 --> 01:06:14,829 Marlena, Marian, Artur, thank you so much for coming. 828 01:06:14,949 --> 01:06:17,949 Good luck with the sales of the book and I'll see you next time. 829 01:06:18,209 --> 01:06:18,579 Thank you. 830 01:06:19,116 --> 01:06:19,406 Artur Guja: Thank you 831 01:06:19,416 --> 01:06:19,936 very much.