Hello, ladies. Welcome to this week's episode of the Hourly to Exit podcast. I have a very special guest today: my law school classmate, Joy Butler. Joy, welcome, and thank you so much for joining us.

Thank you, Aaron. I am honored to have been asked to be a guest.

Well, we're very excited to have you, because AI could not be more top of mind for this audience. As someone who has written and spoken extensively about AI, you're exactly who I wanted to have on to go deep. So before we get started, would you introduce yourself to the audience?

Sure. As you already shared, I am an attorney, and in my law firm practice I provide product counsel services. That essentially means I provide a combination of strategic and legal advice to companies that are going into new lines of business, launching new products or new features of existing products, or forming strategic partnerships. I come to that from two areas of law where I have in-depth knowledge. The first is the technology side, where I have helped structure probably literally over 1,000 contracts over the course of my career: all the contracts one would need when doing business online and in digital technology, including end user license agreements and terms and conditions. The other prong of my in-depth legal knowledge concerns entertainment and copyright, and this is where you and I overlap quite a bit. I work on a lot of creative content contracts, advise companies on protecting their copyrights and trademarks, and work with companies that want to use someone else's content, so I do a lot of work in the rights clearance area.

To give your audience a little more of a flavor of the types of projects I work on, most of them are in the digital technology and entertainment space. A couple of examples: helping an entertainment social media network launch; working with an e-commerce retail site that was incorporating a lot of album cover art and original artwork; and an ad-supported stock simulation game. And here's one that may resonate with your audience: helping a professional in the finance area take a niche financial service he was offering and convert it into an online software-as-a-service product. So that is me in a nutshell.

Awesome. I think I can even tell you the day I first heard about AI.
Where were you when you first heard about it? What was the context, and what were your initial thoughts?

I don't remember the first time I heard about ChatGPT. That may be what you're referring to.

Yeah.

But actually, within my practice, I have for quite some time been experimenting with taking some of my knowledge and developing it into digital tools, making it more accessible to people. As you know, I've written a couple of books on my areas of in-depth knowledge, so one of the things I've been experimenting with is offering some of that knowledge in a digital format. One experiment I believe I shared with you was a contest and promotion tool, which asked a number of questions and then gave you a checklist of the legal questions you might ask before going forward with a contest or promotion. I actually used a tool that a lot of attorneys use: an interview construction tool targeted to the legal space, called Docassemble. It's open source, and I spent a little bit of time tinkering with it. That's a long way to answer your question, but I was familiar with automation and artificial intelligence through that process.
But ChatGPT probably came to my attention around the same time it came to everyone else's. I kept hearing about it, and...

Right. Right. I guess I'd heard about it too, but it was just noise to me, kind of like blockchain or crypto: I don't need to know that, I don't want to know it. Until finally I could no longer ignore it, which was during an MCLE program where I needed to get some credits so I wasn't delinquent. I'm listening to this one about AI, they were talking about ChatGPT in particular and describing what you could do with it, and they had these samples. And I'm like, it can do what? So while I'm still in the session (it was just online; God forbid I go someplace in person), I'm on my computer doing stuff with it, and I'm like, oh my God, this is bad. That was February 2023, and that was my initiation. The last year has been a fire hose of information and changes.

So I think ChatGPT is the AOL of our times.
It's a technology that's been around for a while, but we finally have an application that has made it much more accessible and user friendly for a much wider group of people.

Yeah. When you think about it, artificial intelligence has been around a while. We've always had autocorrect and things like that, and those were all artificial intelligence, right? Things like Alexa and Siri are versions of it. We just didn't think of them the way we think of AI now.

Exactly. It's been around for a while. We just finally got a killer app in ChatGPT.

Right. Awesome. A lot of the questions I get are around where this data is coming from. What is the black box of generative AI in particular, and what do I need to worry about? Are they taking my prompts, and what are they doing with them? Or a client is signing an agreement to use a contract review AI: what are the issues with using one of those? Everybody has questions about what happens when I use AI, what I need to worry about, where the data comes from, and what my exposure is.
So I'd like to start from the top. I think most of the audience is familiar with AI, but let's talk about what training data is. Where does the AI get its information from? How does it get in there? Let's just start there, generally.

Yeah. When we talk about AI models and some of the copyright and licensing issues, at least for generative AI, there are two categories. The first category is the input, and the second category is the output. When you mention training material, you're talking about the first category, input. There has been a lot of controversy over whether the training material required to train these models can be used without permission. That's because of what the foundation models do. By foundation models, I mean the maybe eight or ten models out there that take literally millions of pieces of content into their black box and analyze them so they can function as general-use large language models. What many of these models do is source that data from anywhere they can, including scraping the Internet for millions and millions of pieces of data.
So, as I said, there's been a lot of controversy around whether permission is required for them to do that. What many of these models are relying on now is an argument that their use of that material as training material qualifies as a fair use under the Copyright Act. I believe there are a number of factors that will gradually push these AI foundation models toward licensing that material. One of them is that a number of lawsuits have been filed against them, charging them with copyright infringement and other related infractions over their use of this material. All of those suits are still pending, and they may take a very long time to play out, but I think we're going to see progress toward more licensing before then. That's because people are very anxious to use generative AI, but before they use it, they want some comfort that their use of that material is not going to subject them to a copyright infringement or other claim.
So in order to make their customers comfortable that they can use this material without taking on any legal liability, we are seeing more and more of these AI companies gradually move toward licensing the content.

I want to follow up on that, but first I'm going to step back for a second, because you said large language models, and then we have machine learning, and we have generative AI. Are those synonyms, or are they all different elements?

Okay, I'm not the expert here, but I'll share my understanding. The large language models are the general models that can produce the generated output. That means they take all of the input, all of that training material, and analyze it to see how each data point relates to the other data points. So when you ask one to produce something, it is estimating what word should come next, or what should come next in a particular paragraph, which is why it needs so much training material from which to learn.

Got it. Okay. Now, going back to what you said about the move toward licensing, because users of AI want to know they're not going to get sued if they use the output.
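A brief aside on the "what word should come next" idea described above: the toy bigram model below is a minimal sketch of that principle in Python, not how any real large language model is built. It assumes a tiny made-up training text and simply counts which word follows which; real models use neural networks trained on vastly more data. The function names `train_bigrams` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigrams(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    follows: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with the word immediately after it.
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the word most often seen after `word`, or None if `word` was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical training text.
    training_text = [
        "the court approved the settlement",
        "the court rejected the settlement",
        "the settlement was approved",
    ]
    model = train_bigrams(training_text)
    print(predict_next(model, "the"))         # settlement (seen 3 times vs. court 2)
    print(predict_next(model, "settlement"))  # was
```

The more training text such a model sees, the better its counts reflect real usage, which loosely mirrors why large models need so much training material.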
What does that mean for all of the data that has already been scraped from the Internet and all these places? Aren't the data sets, and our use of AI as it is, almost too big to fail? What could happen with these lawsuits right now if there are billions of pieces of, let's say, pirated information in, say, OpenAI's ChatGPT training data set? What could the possible remedy be if they lose?

Okay, I do want to separate this into two categories again, because when we talk about infringement, there are two separate questions. The first question is whether the process of AI companies taking in data as training material and using it to train their model is copyright infringement. That's one question. The second question is, if you as a user of these models use them to produce generated content, is there any legal liability for you? Now, one could imagine circumstances where the models' use of training data is considered a fair use, but the way you've used the model in creating output is infringing or violating in some way.
204 00:12:41,861 --> 00:12:45,299 I'm not saying that scenario has actually come up or may come up 205 00:12:45,409 --> 00:12:50,183 often, but 1 can imagine a set of circumstances where that might be true. 206 00:12:50,708 --> 00:12:53,901 So, back to your original question, where is all this going? 207 00:12:53,921 --> 00:12:55,431 What are the potential remedies? 208 00:12:55,628 --> 00:13:02,028 well, 1 remedy with respect to these lawsuits is that they will settle with 209 00:13:02,058 --> 00:13:06,441 a lot of these companies because the companies that have sued them have been 210 00:13:06,891 --> 00:13:12,321 the largest companies with the most resources and very large organizations. 211 00:13:12,514 --> 00:13:18,221 Like the author's guild, so they may settle, come to some agreement 212 00:13:18,301 --> 00:13:21,251 on what a settlement fee should be. 213 00:13:21,461 --> 00:13:25,394 And it's also possible that part of their settlement might be a 214 00:13:25,404 --> 00:13:27,624 licensing agreement going forward. 215 00:13:27,818 --> 00:13:33,241 that resolves matters for, the large organizations that have sued and. 216 00:13:33,403 --> 00:13:37,899 The large private companies, if it's an organization or association, representing 217 00:13:37,909 --> 00:13:43,126 much smaller players, it remains to be seen how much might flow to them 218 00:13:43,283 --> 00:13:45,533 as part of any judicial settlement. 219 00:13:45,719 --> 00:13:50,789 It may be that as opposed to a private settlement, we might get 220 00:13:50,799 --> 00:13:53,623 some sort of a judicial settlement. 221 00:13:53,719 --> 00:13:54,156 I think. 222 00:13:54,319 --> 00:13:59,824 It's perhaps less likely, but it might be one of the outcomes and that 223 00:13:59,824 --> 00:14:04,034 might be a settlement like something that was proposed in Google Books. 224 00:14:04,189 --> 00:14:10,771 Now, the Google Books lawsuit, if anyone remembers his lawsuit from 2015. 
225 00:14:10,949 --> 00:14:14,903 This is the lawsuit that came out of Google Books starting its program 226 00:14:14,903 --> 00:14:18,956 where it digitized millions of books and use them and still uses 227 00:14:18,956 --> 00:14:23,336 them today to give a snippet of books in response to our search. 228 00:14:23,549 --> 00:14:27,969 So that is one of the cases on which a lot of these AI model companies 229 00:14:27,969 --> 00:14:33,309 rely when they argue that their use of the training material is a fair use. 230 00:14:33,596 --> 00:14:39,533 For those who remember, the Google Books case initially, tried to resolve 231 00:14:39,533 --> 00:14:44,381 itself, via a judicial settlement agreement that would have permitted 232 00:14:44,411 --> 00:14:49,014 the snippets of those books and allowed the digitization, but that judicial 233 00:14:49,024 --> 00:14:53,079 settlement, or the private settlement that was proposed, went to court. 234 00:14:53,084 --> 00:14:59,141 Very much beyond just providing snippets, which is, one of the reasons 235 00:14:59,191 --> 00:15:05,508 that it was ultimately, not approved by the court and kept going on and 236 00:15:05,558 --> 00:15:08,661 ultimately said, okay, well, we're stripping out all this information. 237 00:15:09,426 --> 00:15:11,066 you try to do in the settlement. 238 00:15:11,198 --> 00:15:15,923 But, as consolation, we decided Google books that your use is a fair use. 239 00:15:16,128 --> 00:15:21,933 So, it might be that some of the parties, try to move in that direction of some 240 00:15:21,933 --> 00:15:27,333 type of a settlement that encompasses both small and larger players. 241 00:15:27,536 --> 00:15:31,369 some of the other types of resolutions that have been thrown 242 00:15:31,369 --> 00:15:34,026 out include kind of a collective. 243 00:15:34,306 --> 00:15:39,896 that would be parallel to, the way we collect, public performance 244 00:15:39,916 --> 00:15:41,706 royalties in the music industry. 
245 00:15:41,866 --> 00:15:48,173 So, for example, when a song is performed on the radio, all songwriters, 246 00:15:48,223 --> 00:15:51,879 receive some income from their songs they've written being played. 247 00:15:51,929 --> 00:15:52,199 Well. 248 00:15:52,568 --> 00:15:56,721 the radio station is not going out and I'm entering into license agreements 249 00:15:56,721 --> 00:15:58,141 with the millions of songwriters. 250 00:15:58,303 --> 00:16:03,769 They have collected is in the case of music, as cap and BMI and couple others. 251 00:16:03,884 --> 00:16:08,508 that, have these collective agreements where they issue blanket licenses. 252 00:16:08,678 --> 00:16:12,981 So something like that has been proposed, potentially for, 253 00:16:13,071 --> 00:16:14,601 the training material space. 254 00:16:14,631 --> 00:16:19,378 So that brings in both, rights owners with very large catalogs and rights 255 00:16:19,388 --> 00:16:21,858 owners with very small catalogs. 256 00:16:22,081 --> 00:16:27,368 The copyright office had a comment period where it asked a bunch of industry 257 00:16:27,368 --> 00:16:31,251 players what they thought of this and most of the people who commented were 258 00:16:31,251 --> 00:16:37,821 very much in favor with, direct licensing, or perhaps even aggregated licensing 259 00:16:38,021 --> 00:16:43,578 now that may be in part because that's where, larger companies are going to 260 00:16:43,748 --> 00:16:45,948 get kind of the premium licensing. 261 00:16:46,193 --> 00:16:50,766 Um, because the direct licenses we've seen, today have been between, you 262 00:16:50,803 --> 00:16:55,206 AI model companies and very large organizations for millions of dollars. 263 00:16:55,376 --> 00:17:00,682 just like the, what we're talking about in the collective licensing, example, 264 00:17:00,812 --> 00:17:05,795 those very large companies are not going to enter into license agreements 265 00:17:05,805 --> 00:17:08,465 with, millions of small players. 
So there will need to be some balancing so that smaller players can participate too. One potential way this might be addressed is through aggregators. One aggregator we have right now is the Copyright Clearance Center, which is aggregating scientific papers for use as training material. That allows smaller rights owners to participate in having their material used, and being paid for it, as training material, if that's what they choose to do. Another example I've seen come forward in this space is a startup called Dappier. It is dedicated to giving smaller rights owners the opportunity to be part of training material, and to making that training material more accessible both to the large AI models and to smaller AI companies that might have fewer resources and can't compete when license agreements are going for millions and millions of dollars.

Yeah. It sounds like this would all have to be prospective. If the AI companies have been scraping the Internet for we don't know how long, is it even possible to distinguish one piece of data in the data set from another? I don't know how you would compensate for all the information that's already in there and parcel out payments, whatever fraction of a penny I might get for something. And going forward, if you are a small content creator, your everyday content creator like the audience here, it would then be on you to make sure your content is registered somewhere, so you'd be part of some aggregator that has a license and is getting paid by the AI companies?

Okay, there are several issues in that question. Let's start with the first part, where you talk about provenance: what was the source of the data, and is it even traceable? This is one of the pain points, and it is also where the analysis of whether your output subjects you to any type of liability comes in. Let's back up for a second. If you are any type of content creator and you are trying to determine whether the content you've created violates any rights, you need to know its source, right? So if that provenance is not available for the output, you, as the person using the generative AI, can't even do that analysis.
So that's part of the pressure on the AI model companies: not just waiting for these lawsuits to play out, but making their potential customers comfortable that they can use the company's AI models to produce output they can then actually use. Part of doing that is knowing the provenance. Now, the extent to which they currently do that, I don't know. AI model companies have often been quite opaque and not very transparent about how the sausage is made. On the outside, though, as another example of where the industry is going, another organization has cropped up in this space called Fairly Trained, which offers certification for AI model companies that have produced their models relying solely on an authorized data set. The theory is that if you are a company that wants to leverage AI, you can take more comfort in knowing you're relying on an AI company that is Fairly Trained. The last time I checked, only a few dozen companies had that certification, but that may grow.

So maybe enterprise users would go for the Fairly Trained type, because, frankly, they're much more concerned than most everyday users about the quality of that output.
330 00:21:22,849 --> 00:21:26,626 it seems like if they're using it, to create public facing 331 00:21:26,719 --> 00:21:30,916 materials, they would want that fairly trained data set behind it. 332 00:21:31,139 --> 00:21:33,469 they did, they also give reps and warranties when you go through them 333 00:21:33,631 --> 00:21:35,612 regarding the quality of the output. 334 00:21:35,821 --> 00:21:39,929 Do they give representations and warranties fairly trained provided 335 00:21:39,974 --> 00:21:44,491 anyone who well, fairly trained or someone who has licensed their 336 00:21:44,501 --> 00:21:49,804 data from fairly trained would they then in their terms of use have. 337 00:21:49,919 --> 00:21:51,137 Represent fairly, 338 00:21:51,167 --> 00:21:53,244 trained doesn't license data. 339 00:21:53,324 --> 00:21:56,224 Fairly trained is a certification program. 340 00:21:56,421 --> 00:22:04,027 So if an AI company, wants this certification to show everyone, that. 341 00:22:04,201 --> 00:22:07,777 They have relied on an authorized data set, then this is a 342 00:22:07,777 --> 00:22:09,947 certification that they can apply for. 343 00:22:10,097 --> 00:22:10,607 Got it. 344 00:22:10,637 --> 00:22:11,077 Okay. 345 00:22:11,271 --> 00:22:15,601 because I believe that there are some platforms that do provide 346 00:22:15,601 --> 00:22:19,522 indemnification, although they have a bunch of provisos, where's that going? 347 00:22:19,562 --> 00:22:20,902 So that users feel more. 348 00:22:21,077 --> 00:22:22,421 Comfort there, right? 349 00:22:22,461 --> 00:22:27,224 So, I mean, I think that's part of their responding to, this pain point 350 00:22:27,397 --> 00:22:31,907 of needing to make their customers more comfortable with using their product. 351 00:22:32,081 --> 00:22:37,271 they are, providing certain indemnifications, it remains to be seen, 352 00:22:37,531 --> 00:22:44,354 how effective those indemnifications would be if a customer were actually sued. 
353 00:22:44,537 --> 00:22:51,701 And as you mentioned, they do have, a lot of exclusions, personally, I think that is 354 00:22:51,771 --> 00:22:57,851 just sort of an intermediate stop gap and they are going to be pushed more towards, 355 00:22:57,951 --> 00:23:01,221 More licensing of their data sets. 356 00:23:01,397 --> 00:23:05,384 and I would say, while we wait for this to play out, I mean, as you 357 00:23:05,384 --> 00:23:09,417 know, this lawsuit could take and probably will take a very long time. 358 00:23:09,534 --> 00:23:12,941 the Google books case, for example, on which the AI companies are 359 00:23:12,941 --> 00:23:18,831 relying to 10 years before finally reaching, that conclusion that, Google 360 00:23:18,951 --> 00:23:21,041 books digitization was a fair use. 361 00:23:21,306 --> 00:23:25,862 So I would say in the interim, AI companies, and those. 362 00:23:26,396 --> 00:23:33,706 producing AI models should look more to, using, authorized data sets or 363 00:23:33,706 --> 00:23:38,727 construction of their models and authorized data sets with a traceable 364 00:23:38,811 --> 00:23:44,637 provenance so that, their customers, when using the, output or wanting 365 00:23:44,637 --> 00:23:47,163 to put the output into use, can. 366 00:23:47,260 --> 00:23:51,122 Know what the source is and do that analysis of is this 367 00:23:51,193 --> 00:23:52,438 violating any copyright? 368 00:23:52,438 --> 00:23:56,418 Is this violating any right of publicity or trademark or anything else? 369 00:23:57,149 --> 00:24:01,885 I would say for the companies that want to leverage a I, when you're looking for 370 00:24:01,925 --> 00:24:07,785 partners, you do want to look at partners who are using authorized data sets. 371 00:24:08,094 --> 00:24:13,306 Right now, what I see is that a lot of companies, brands. 
372 00:24:13,797 --> 00:24:19,611 companies in the film and television industry are actually leveraging AI, 373 00:24:19,801 --> 00:24:26,144 and it is being leveraged, but they're using it for a first draft or a proof 374 00:24:26,214 --> 00:24:31,204 of concept, for things that are iterative and need to be turned around 375 00:24:31,264 --> 00:24:34,802 very quickly. But they're not using it 376 00:24:34,954 --> 00:24:42,666 as part of the final consumer-facing output, just due to those copyright 377 00:24:42,696 --> 00:24:47,566 reasons: both the reasons we just discussed, fear of having any type 378 00:24:47,566 --> 00:24:53,869 of legal liability, but also because there are limitations on the degree to 379 00:24:53,869 --> 00:24:59,369 which you can protect output that's generated by artificial intelligence. 380 00:24:59,616 --> 00:25:01,506 Right, so when they have an 381 00:25:01,872 --> 00:25:03,256 authorized data set, 382 00:25:03,442 --> 00:25:07,569 does the output come with footnotes? What does that look like? 383 00:25:07,579 --> 00:25:07,979 Do you know? 384 00:25:07,989 --> 00:25:09,532 Have you seen what that looks like? 385 00:25:09,623 --> 00:25:11,804 Does it tell us what the sources are? 386 00:25:12,008 --> 00:25:13,098 Like, does it identify them? 387 00:25:13,098 --> 00:25:13,388 Yeah. 388 00:25:13,688 --> 00:25:15,008 Oh, oh, oh, I see. 389 00:25:15,008 --> 00:25:15,818 When it's authorized? 390 00:25:15,818 --> 00:25:16,458 Right. 391 00:25:16,618 --> 00:25:21,208 To my knowledge, it is not coming with anything. 392 00:25:21,218 --> 00:25:23,898 And you're talking about the Fairly Trained component, right? 393 00:25:23,898 --> 00:25:26,514 Yeah, it is not coming with anything. 394 00:25:26,674 --> 00:25:31,451 But that does need to be a path toward which we're traveling.
395 00:25:31,628 --> 00:25:36,874 And there has been a lot of conversation about that in this space, that 396 00:25:36,954 --> 00:25:42,708 it needs to be watermarked, needs to be traced in terms of: what was the source? 397 00:25:42,748 --> 00:25:46,113 What did you rely on to do that? 398 00:25:46,331 --> 00:25:46,700 Yeah. 399 00:25:46,920 --> 00:25:50,643 As far as the magic that happens inside of a generative AI 400 00:25:50,716 --> 00:25:53,103 platform, do we know what that is? 401 00:25:53,103 --> 00:25:56,146 Or is that kind of the trade secret of each company? 402 00:25:56,313 --> 00:25:59,216 Or is there general technology that makes the magic happen? 403 00:26:00,316 --> 00:26:06,753 I am not the expert on the technology inside of the AI models. 404 00:26:06,926 --> 00:26:08,436 I can share what I know. 405 00:26:08,490 --> 00:26:14,070 In part, it does depend on the approach that they've used, whether it's supervised 406 00:26:14,070 --> 00:26:16,460 learning or unsupervised learning, 407 00:26:16,643 --> 00:26:21,513 which, to make it very simple, depends on how much you assisted the machine. 408 00:26:21,513 --> 00:26:26,270 Like, did you label things and tell it, this is a dog and this is a cat, 409 00:26:26,397 --> 00:26:29,730 or did you just give it, like, millions of pictures and kind of let 410 00:26:29,760 --> 00:26:34,987 it figure it out? When you let it figure it out, when it's unsupervised, 411 00:26:35,148 --> 00:26:40,197 it is more of a black box in terms of how it got to that answer, 412 00:26:40,301 --> 00:26:43,051 which brings up all sorts of other 413 00:26:43,258 --> 00:26:45,238 societal issues, right? 414 00:26:45,598 --> 00:26:47,068 I think it's a time to play it about 415 00:26:47,068 --> 00:26:48,398 a, uh, interesting. 416 00:26:48,601 --> 00:26:53,644 Can we wrap up with some best practices just for your everyday kind of ChatGPT, 417 00:26:53,780 --> 00:26:55,808 Gemini, or whatever the Google one is,
418 00:26:55,838 --> 00:26:59,841 user? Like, when they're using it for this audience, the expertise-based 419 00:26:59,841 --> 00:27:03,751 businesses, maybe they're using it to create first drafts or to help them with 420 00:27:03,761 --> 00:27:07,751 social media posts or something like that. Just some general best practices. 421 00:27:08,021 --> 00:27:15,558 Sure. You want to be circumspect about any confidential or proprietary 422 00:27:15,588 --> 00:27:21,276 information you include in a prompt; you may want to anonymize it. 423 00:27:21,455 --> 00:27:26,737 You need to keep in mind that whatever output you get from the 424 00:27:26,777 --> 00:27:31,443 AI model may not be eligible for copyright protection. If this is 425 00:27:31,595 --> 00:27:38,465 something, material or output, that you are passing on to a client or to a 426 00:27:38,465 --> 00:27:45,625 customer, you may need to disclose that use. And you have to make sure that 427 00:27:45,625 --> 00:27:51,442 you're using the output... depending on the extent to which you're using it. Are 428 00:27:51,442 --> 00:27:57,218 you using it just for a little bit of assistance in modifying a few sentences? 429 00:27:57,218 --> 00:28:02,075 Or are you actually producing images with it or producing an entire report with it? 430 00:28:02,075 --> 00:28:07,205 You're going to want to make sure that your procedures for using generative 431 00:28:07,235 --> 00:28:12,212 AI are consistent with the contract that you have with your customer. 432 00:28:12,434 --> 00:28:17,780 If you want to know whether or not your material, your prompts, are being 433 00:28:17,964 --> 00:28:25,257 incorporated into the training data and being used to further train that AI model, 434 00:28:25,400 --> 00:28:28,060 take a look at the terms and conditions.
435 00:28:28,224 --> 00:28:33,940 To give you an example, for ChatGPT, if you're using the free model 436 00:28:34,107 --> 00:28:36,807 and it is recording your history 437 00:28:36,957 --> 00:28:40,040 of your prompts, then the prompts that you put in 438 00:28:40,040 --> 00:28:45,377 there are subject to being included as part of future training data. 439 00:28:45,932 --> 00:28:46,332 Yeah. 440 00:28:46,432 --> 00:28:49,255 So if, along the left-hand side there, you can scroll 441 00:28:49,255 --> 00:28:51,485 through and see all your chats... 442 00:28:51,485 --> 00:28:51,825 I have. 443 00:28:51,995 --> 00:28:54,145 That means it is going into the training data? 444 00:28:54,185 --> 00:28:55,015 That is excellent. 445 00:28:55,332 --> 00:28:58,652 I can't say with certainty that it is going into the training data, 446 00:28:58,815 --> 00:29:01,735 but I would say it is susceptible to being used. 447 00:29:02,710 --> 00:29:06,190 It's that they have not provided you, ChatGPT has not provided 448 00:29:06,200 --> 00:29:09,820 you, any representation that they will not use it for training. 449 00:29:09,940 --> 00:29:10,180 Right. 450 00:29:10,365 --> 00:29:10,680 Very good. 451 00:29:10,680 --> 00:29:12,195 Thank you for making that distinction. 452 00:29:12,397 --> 00:29:13,203 Thank you for this. 453 00:29:13,357 --> 00:29:18,650 This podcast is to help create a society and an economy 454 00:29:18,740 --> 00:29:19,800 that works for more of us. 455 00:29:20,177 --> 00:29:26,954 So I love to ask my guests if there is an organization or a person who is doing the 456 00:29:26,997 --> 00:29:31,000 good and hard work to help make an economy that works for more of us. Is there 457 00:29:31,000 --> 00:29:32,100 one that you'd like to share with us? 458 00:29:32,555 --> 00:29:37,269 Sure. I really like organizations whose mission it is to 459 00:29:37,279 --> 00:29:39,129 bridge the digital divide.
460 00:29:39,334 --> 00:29:44,695 And one of my favorites is Girls Who Code, which has, as part of its 461 00:29:44,705 --> 00:29:46,989 mission, introducing more women 462 00:29:47,129 --> 00:29:51,949 into the technology field. And that's very apropos to our conversation 463 00:29:51,949 --> 00:29:58,352 today, because as part of making AI beneficial for all humankind, we 464 00:29:58,362 --> 00:30:01,806 really do need a diverse perspective. 465 00:30:02,272 --> 00:30:06,559 Yeah, I mean, we know that, just when we talk about the training data sets, like 466 00:30:06,559 --> 00:30:08,479 what data is going in there, obviously 467 00:30:08,564 --> 00:30:11,744 the output is only as diverse as the input, right? 468 00:30:11,861 --> 00:30:13,337 And how it's being trained. 469 00:30:13,337 --> 00:30:17,274 And I know that has come up in a number of controversial ways as well, 470 00:30:17,274 --> 00:30:20,524 whether something's leaning this way or that way, but we definitely want to make 471 00:30:20,524 --> 00:30:23,764 sure everyone has a voice in the future. 472 00:30:23,774 --> 00:30:24,734 Thank you for that one. 473 00:30:24,849 --> 00:30:29,387 And we will put that in the show notes along with how people can reach you. 474 00:30:29,487 --> 00:30:30,537 Where do you hang out, Joy? 475 00:30:30,547 --> 00:30:32,427 And how can people get in touch with you to find out more? 476 00:30:32,561 --> 00:30:38,114 Sure. So I'm always happy to chat with people doing innovative 477 00:30:38,114 --> 00:30:43,204 things with technology, especially in the digital technology, online and 478 00:30:43,244 --> 00:30:47,587 entertainment space. So they can find me through my website, which is www. 479 00:30:47,627 --> 00:30:48,297 joybutler. 480 00:30:49,417 --> 00:30:49,887 com. 481 00:30:49,907 --> 00:30:50,767 And I'm also 482 00:30:50,907 --> 00:30:53,237 on LinkedIn as Joy Butler. 483 00:30:53,421 --> 00:30:53,604 Awesome.
484 00:30:53,924 --> 00:30:55,564 Well, thank you so much. 485 00:30:55,735 --> 00:30:59,730 And yes, everyone, please follow Joy, and let us know if you have 486 00:30:59,740 --> 00:31:01,090 any other questions about AI. 487 00:31:01,360 --> 00:31:03,810 I know it's constantly evolving. 488 00:31:03,820 --> 00:31:06,550 There's always going to be something new, and we can continue 489 00:31:06,550 --> 00:31:08,300 this conversation in the future. 490 00:31:08,491 --> 00:31:09,380 Thanks again, Joy. 491 00:31:09,571 --> 00:31:10,150 Thank you.