Amy Qi: Hello, I'm Amy from McGill University, and with me is Jay Kerr-Wilson, a partner in Fasken's Ottawa office and head of the firm's copyright practice. Welcome to the first episode in our series, Perspectives, AI and Copyright: exploring the legal issues posed by the adoption of generative artificial intelligence systems. In today's episode, we're exploring the legal debate surrounding the use of copyright-protected content to train AI models, and how courts and governments are responding to the conflicts between AI developers and content creators and owners. For context, so we're all on the same page: what exactly is machine learning and generative AI?

Jay Kerr-Wilson: Thanks, Amy. Artificial intelligence is an umbrella term for technology that's designed to perform a human task: a computer system or machine doing something that, until then, humans had done. We've had AI in its simplest form for a long time; stoplights and calculators are both examples of very simple AI that have been around for years. Machine learning is a more complex form of artificial intelligence, and it focuses on teaching algorithms to learn from data without being explicitly programmed. You set up a system that can analyze data, learn patterns, and then predict behaviour or future events based on the patterns it has learned. A really simple example of machine learning that everyone is familiar with: if you have a Spotify or Netflix account, it recommends new shows for you to watch or new music for you to listen to based on your prior consumption. The AI has kept track of everything you've consumed on the platform, and it says, "Amy's very interested in historical drama," and then it uses that to predict other titles in that genre that you might also enjoy.
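To make that kind of pattern-based prediction concrete, here is a minimal sketch in Python of a genre-counting recommender over a toy catalogue. The catalogue, the viewing history, and the `recommend` helper are illustrative assumptions, not any real platform's system; production recommenders use far richer signals and models.

```python
from collections import Counter

# Toy catalogue mapping titles to genres (an illustrative assumption,
# not any real platform's data model).
CATALOGUE = {
    "The Crown": "historical drama",
    "Wolf Hall": "historical drama",
    "Chernobyl": "historical drama",
    "Stranger Things": "sci-fi",
    "Black Mirror": "sci-fi",
}

def recommend(watched: list[str], top_n: int = 2) -> list[str]:
    """Suggest unseen titles from the genre that dominates the viewing history."""
    # "Learn" the user's preferred genre from past behaviour.
    genre_counts = Counter(CATALOGUE[t] for t in watched if t in CATALOGUE)
    if not genre_counts:
        return []
    favourite = genre_counts.most_common(1)[0][0]
    # Predict: other titles of that genre the user has not seen yet.
    return [t for t, g in CATALOGUE.items()
            if g == favourite and t not in watched][:top_n]

print(recommend(["The Crown", "Wolf Hall", "Stranger Things"]))
# -> ['Chernobyl']
```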
Jay Kerr-Wilson: The game changer we've seen in the last few years is what's referred to as generative AI; ChatGPT, for example. What's different about generative AI is that where machine learning could apply a rule to a set of data and then predict outcomes, generative AI is able to take a huge amount of data, learn the rules itself, and then use those rules to generate brand new data. So ChatGPT, by reviewing huge numbers of examples of written documents, is able to learn how documents are constructed, how humans write documents, and then in fact write its own documents by applying what it has learned and creating something brand new. That's the evolution from artificial intelligence, through machine learning, to generative AI, which is what we're talking about today.

Amy Qi: You say that machine learning involves teaching algorithms to learn from data. How specifically does generative AI use this data to train models?

Jay Kerr-Wilson: A word of caution: I'm not a computer scientist, so this is going to be in very simple terms. What large language models, which are text-based generative AI systems like ChatGPT, are able to do is analyse hundreds of millions of documents, examples of written material, and break those documents down into very small fragments of text called tokens. These tokens are then assigned numerical values. The system learns the relationships between tokens, but at a huge scale, hundreds of millions of times: what words tend to follow this combination of words? What letters tend to follow these combinations of letters? It runs through the data, compares what it thinks it has learned to its data set, which is the works it's learning from, makes adjustments, and then does it again and makes more adjustments. By processing this huge amount of information over and over again, through many, many iterations, the AI system is, in effect, able to learn how to mimic human-generated content. It's able to predict what a human might write in response to a given prompt. These systems are all driven by prompts that a user gives them, asking them to write a specific piece of text, and the system can then mimic what it expects a human would write when given that same prompt.
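As a rough illustration of the token-prediction idea described above, here is a minimal sketch of a bigram model in Python: it breaks a toy corpus into tokens, assigns each token a numerical ID, counts which token tends to follow which, and then predicts a continuation from a prompt. The corpus, the whitespace tokenizer, and the `predict` helper are illustrative assumptions; real large language models use learned subword tokenizers and billions of neural-network parameters adjusted over many training iterations, not simple counts.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for "hundreds of millions of documents".
corpus = "the court found the use fair and the court found the dealing fair"

# 1. Break the text down into small fragments ("tokens"); here, whitespace words.
tokens = corpus.split()

# 2. Assign each distinct token a numerical value (an ID).
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
ids = [vocab[tok] for tok in tokens]

# 3. Learn relationships between tokens: count which token follows which.
follows = defaultdict(Counter)
for cur, nxt in zip(ids, ids[1:]):
    follows[cur][nxt] += 1

# 4. Predict: starting from a prompt token, repeatedly emit the likeliest next token.
id_to_tok = {i: t for t, i in vocab.items()}

def predict(prompt: str, length: int = 3) -> str:
    out, cur = [prompt], vocab[prompt]
    for _ in range(length):
        if not follows[cur]:
            break
        cur = follows[cur].most_common(1)[0][0]
        out.append(id_to_tok[cur])
    return " ".join(out)

print(predict("the"))  # -> "the court found the"
```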
Amy Qi: Great, I think that makes sense. Pivoting to the intellectual property landscape: what does existing copyright law in Canada look like in the context of AI and generative AI?

Jay Kerr-Wilson: The reason the discussion about generative AI has become so entangled with copyright, not only in Canada but around the world, is that to train large language models, you have to provide the system with access to hundreds of millions of copies of works. And any time you're making a copy of something, you're engaging the copyright law of the jurisdiction you're in. ChatGPT, for example, has gone out and basically scoured the internet, websites, bulletin boards, looking for articles, posts, social media content, and anything involving written language. It has scraped all that content and ingested it into the training set it then uses for the training process I described. All of that material, or virtually all of it, will be protected by copyright; it's material that's owned by somebody, whoever the author or owner is. And in most cases, the people developing these AI systems, these large language models, have not asked for permission to use the content. They've simply scraped it and then used it to train the AI model. So now we're at a stage where the various groups of owners of this content, whether they're authors of books, people who take photographs, or people who create artistic images, are all starting to challenge the fact that their work has been used in the training of these systems without their permission.

Canada's Copyright Act has not yet been amended specifically to deal with generative AI, but copyright law, like all laws, applies to AI. Where copying is taking place, the Copyright Act applies to those copies. In Canada, as in most other jurisdictions, it's an infringement of copyright to make a copy of a work without the owner's consent unless there's an applicable exception, and we'll talk about the exceptions a little later. But the starting presumption is that unless the people developing these AI systems can establish that an exception to copyright covers the training of AI, they are potentially liable for infringing copyright in a large amount of written material.
Why this is such a challenge for policymakers and for the industry is that if an AI system is trained using hundreds of millions of documents owned by millions of people, it's hard to conceive of a licensing system in which the copyright owners would be paid anything more than fractions of a penny per work without imposing such a huge cost on AI developers that AI development becomes impossible. At the scale of hundreds of millions of documents, it's really hard to figure out a price for this activity that compensates authors with more than a few pennies but still makes the training of AI models financially feasible. That's the tension and the debate going on right now.

Amy Qi: I see. Has there been any government response to address some of these issues raised by AI?

Jay Kerr-Wilson: Yes, and it's interesting. In Canada, there was an initial consultation on whether the Copyright Act should be amended to respond to the development of artificial intelligence. Stakeholders filed their submissions and the government reviewed them; this was five or six years ago. The government came out and said, we don't think it's necessary to amend the Copyright Act right now, it's not an urgent situation, and we're going to keep monitoring it. Then, shortly after that, came the release of ChatGPT and a whole range of other AI systems, both text-based and image-based, and they changed the game very quickly. We started to see a lot of development in the use of AI and a lot of attention being paid to it by copyright owners. So the government launched a second consultation specifically to deal with generative AI and the new developments. Predictably, people from the technology sector, people involved in the development or use of AI systems, suggested that the Copyright Act should be amended to include an exception to copyright, often referred to as a text and data mining exception. An exception means that if you make a particular use of copyright material, that use is not an infringement of copyright; a lot of the time there are conditions attached.
The idea would be that if you're using material you've scraped from the internet to train a large language model, you would not need the consent of the people who own the copyright in that material. Understandably, the creative industries and people who own copyright in material available over the internet took a very different view, and they advocated that there should not be any exception that applies to the development of AI systems. They said that these companies had raised billions of dollars in investment to develop their AI and were starting to generate huge amounts of revenue from licensing their AI systems, and that it wasn't fair that these systems were being built on the backs of creators who were receiving no compensation for the use of their works. That's where we are right now in Canada: we've had this consultation, and the government has received all of these submissions from these various perspectives. But we had a federal election and we have a new government, and obviously the new government is facing a lot of other priorities on the trade front and in relations with our largest neighbour. At some point, though, I think they're going to have to come back, take another look at the development of AI, and try to come up with a policy that will then be reflected in amendments to the Copyright Act.

Amy Qi: I see. You mentioned earlier that there are some exceptions to copyright infringement. In Canada, there's a fair dealing exception to copyright for research. What does this exception entail?

Jay Kerr-Wilson: Section 29 of the Copyright Act is what's known as the fair dealing exception. People may also have heard the term fair use: in the United States the equivalent doctrine is referred to as fair use, while in Canada we refer to it as fair dealing. Fair dealing applies to a very specific list of purposes, such as research, education, parody and satire, and news reporting. If you're using copyright material for one of these specific purposes and your use is fair, then you don't need the permission of the copyright owner and you don't need to pay compensation. One of those specific purposes is fair dealing for the purpose of research.
So the argument would be that training large language models and other types of generative AI systems is research, and that you could therefore use copyright material for the training of AI as long as the use was fair. Whether or not a use is fair is a very fact-specific examination that courts go through in infringement proceedings where fair dealing has been raised as a defence. They will look at the purpose of the dealing: is it for research, or for some other purpose? The amount of the dealing: are you dealing with the whole work, or just part of it? The nature of the work: is it already widely available, or is it confidential? And one of the key factors in the debate around AI is the effect of the dealing on the work: if you're engaged in fair dealing for the purpose of research, does that dealing diminish the economic value of the work to the owner? A number of cases have been started in Canada, but we haven't had any decisions yet. That's how fair dealing for the purpose of research might apply.

Another exception that I think courts will want to look at in the context of generative AI is Canada's specific exception covering temporary reproductions for technological processes. If you have to make a very brief copy of a work for a temporary purpose, in order to facilitate some technological process, and once that purpose has been fulfilled the copy is destroyed, that's another exception to copyright that could apply to the use of copyright materials to train AI systems.

Amy Qi: You mentioned that in the US, this concept of fair dealing is called fair use. Section 107 of the US Copyright Act outlines four factors for determining fair use. Can you briefly explain those criteria as well?

Jay Kerr-Wilson: Sure. There is a lot of overlap between the fair use analysis in the US and the fair dealing analysis in Canada, but there are some differences. One of the big differences is that in the US, if you want to make fair use of somebody else's work, your use has to be what's known as transformative: you have to use the work to, in effect, create something new.
That's a requirement of the use being fair in the US. In Canada, we don't have that same factor in our fair dealing analysis; whether or not the dealing is transformative isn't something the courts will look at. But in many other ways the two analyses are very similar. Courts in the US will look at whether the use has a prejudicial impact on the market value of the copyrighted work, what the nature of the work is, and whether the proposed fair use is for a commercial or non-commercial purpose. So it's similar, but the big difference is the US requirement that the use be transformative.

Amy Qi: Okay, great. Now that we understand some of the technical and legal background associated with AI training data, I think it's interesting to see how the theory is applied in court. Recently, there have been several landmark cases and court decisions pertaining to the use of copyrighted material to train AI models. Two notable cases are the lawsuits that Anthropic and Meta are facing in the US District Court in San Francisco. Starting with Anthropic, could you give us a summary of the context for this case?

Jay Kerr-Wilson: Sure. Anthropic is an AI company that has developed a family of large language models named Claude, which works similarly to ChatGPT. In order to train its large language model, Anthropic really focused on using books in its training set. It had determined that books, by their very nature, provide much better training content than blog posts, articles, or other much smaller pieces of text: books tend to be better written and more complex, and they express more comprehensive thoughts. So if you want to train a system to predict and produce high-value literary content, the idea was that books provide the ideal training set. Anthropic built its training data set from two sources. It purchased a large number of copyrighted books, scanned them into digital form, and fed them into its training set; these were books it had bought and paid for. But it was also alleged that Anthropic had downloaded millions of copies of copyrighted books from what are known as pirate sites on the internet: large files containing millions and millions of digital copies of books for which the publishers and authors had never given their consent.
Anthropic used both of these sources of books and literary material to train its Claude large language model. The lawsuit was what's known as a class action: a class action lawsuit is one in which a single plaintiff represents a whole class of similarly situated plaintiffs. In this case, an author was suing Anthropic in California for the unauthorised use of his books, but his litigation would also cover all authors in the class whose books had been used by Anthropic without consent.

Amy Qi: And what was the court's decision in this case?

Jay Kerr-Wilson: There were a couple of different layers to the court's decision, and it turned on the fact that there were two sources of data: the books Anthropic had bought, and the scanned digital copies it had acquired without the authors' consent. It also mattered that Anthropic, in addition to training its large language model Claude, was building a central library of as many literary works as it could; it wanted both to house a persistent central library of books and to train its large language model. On the training data, the court did the fair use analysis and said that training large language models and other AI systems was absolutely transformative. It found that there was no impact on the commercial value of the works used in training, because Anthropic had put safeguards in place: if you're using Claude, you can't ask it to simply reproduce one of the books in the training data. Claude was designed not to reproduce verbatim a book that had been included in its system; it would only generate brand new content. So the court concluded that Claude isn't going to produce copies of The Catcher in the Rye that would then compete with the original Catcher in the Rye included in the training set. Based on its fair use analysis, the court concluded that training LLMs in the way Anthropic was doing it was fair use, and therefore not an infringement of copyright.
On the issue of maintaining the large, persistent digital library of books, the court said Anthropic had a right to keep digital copies of the books it had legitimately purchased and scanned, because it owned those books, but it did not have a right to keep the pirated copies of books in that collection. So the court ultimately found that there was liability for the reproduction of the copyright material in the persistent library, but that fair use covered the training, which was the big takeaway from the case.

Amy Qi: I know you've already briefly touched on this, but just to get into more specifics: how were the four factors of American fair use law that we talked about earlier evaluated in this case?

Jay Kerr-Wilson: Sure, just to go through the list. The purpose and character of the use in this case was the training, using the copies of the books to train a large language model, and the court had no trouble finding that the purpose was transformative, because of the process of breaking the works down into small fragments and the fact that something new is produced at the end. The fact that the authors' works were creative worked against Anthropic on the nature of the works. But on the amount of the use, the court actually found in Anthropic's favour, because one of the things about training large language models is that you want them to produce content that is as free of bias and as accurate as possible, and you want to avoid AI systems that hallucinate. To generate the best possible outcomes, you need as much material from as many different sources as you can get. So in this case, the court said the amount of material Anthropic was using worked in favour of fair use, because it made for better outcomes. And on the effect on the market, the court found in favour of Anthropic, because the plaintiffs could not establish that Anthropic was producing copies that competed directly with the works in the training set.

Amy Qi: Very recently, Anthropic announced that it has reached a preliminary settlement in this class action lawsuit, in a court filing on August 26th, 2025. What does this mean, and what are the impacts of the settlement?

Jay Kerr-Wilson: As I said, although Anthropic was successful on the fair use question for the training, it was not successful on the large collection, the library it had built.
Because of the number of individual books in that library for which there was potential liability, Anthropic was facing a huge potential financial hit from damages. The settlement was a way to avoid having to go through the process of assessing those damages at trial; it puts the question of damages to an end.

Amy Qi: Right. The other case we mentioned earlier involves Meta, and I think it serves as an interesting comparison to the Anthropic one, because both occurred in the same court and the facts are similar. Could you give a summary of the context for this Meta case?

Jay Kerr-Wilson: Sure. Meta, as most people know, is the company that operates social media platforms such as Facebook and Instagram, but it's also developing its own large language models, called Llama. Similar to the Anthropic case, Meta had used a very large volume of books for which it had not obtained the consent of the publishers or the authors in order to train its models. Again, it was a class action: a representative plaintiff author represented a class of authors, and they claimed copyright infringement because of the copies used to train the large language model.

Amy Qi: And what was the court's decision in this case?

Jay Kerr-Wilson: This is an interesting outcome. Meta had brought a motion for summary judgement, and the court granted Meta's motion, so Meta ultimately won the case. But what was really interesting was that the court did the same kind of fair use analysis the court in Anthropic had done and came to almost the opposite result, largely on the question of the impact of large language models on the market for published books. Why this is particularly interesting is that it's the exact same court, the same US district court in California, with different judges, and the decisions were issued just a few days apart. The judge in the Meta decision, in fact, explicitly references and disagrees with the decision of the same court in the Anthropic case.
On the question of the effect of large language models on the copyright works, the books used in training, the court in the Anthropic case said that Claude isn't producing copies of the books, so it's not competing with the books in the training set: it's not reproducing those books, it's producing new texts. There's no effect on the value of The Catcher in the Rye when the large language model produces brand new books or written texts that aren't The Catcher in the Rye. So it took a very narrow view of the impact on the market for books. The court in the Meta decision took a very different, much broader view of the impact of large language models on the publishing industry in general. That court was very concerned that because large language models can crank out thousands or millions of books very quickly and very cheaply, based on what they have learned from published works, they could overwhelm the publishing market, so that human authors will have a hard time making a living, because they will be competing against all of these mass-produced AI works. So it's a very different analysis: a narrow perspective asking whether the individual author will be harmed, versus a very broad, policy-based perspective in the Meta court that says the human element of the publishing industry is in peril if AI is allowed to run unchecked. We haven't seen the end of this debate in the US, and very similar class action lawsuits have been commenced in Canada. We don't have any decisions in Canada yet, but the court battle over the use of AI is just beginning; it's far from over.

Amy Qi: You mentioned Canada there at the end. How do you think these cases could be evaluated if they were transposed into a Canadian copyright law context?

Jay Kerr-Wilson: It's interesting, because Canada's fair dealing analysis tends to produce results that are more user-friendly than those in the United States. We've had decisions of the Supreme Court of Canada saying that fair dealing isn't just a technical exception to copyright, it's actually a user's right. That's how fair dealing has been construed in Canada, and the court has told us that copyright has to be analysed as a balance among the interests of the user, the public interest, and the interests of the owner.
So although we haven't seen any cases yet, and I think there are good arguments on both sides, my prediction would be that a Canadian court's analysis would likely be closer to the Anthropic analysis: it would take a narrow view of the impact of the use or dealing on the particular work at issue, and not take the broad, policy-based approach that the court did in the Meta case.

Amy Qi: That makes sense. Just to end off the episode with a more general question: where do we go from here?

Jay Kerr-Wilson: These issues need to be resolved. I think everyone understands that the generative AI genie is out of the bottle, and it's not going back in. All industries are starting to adopt AI solutions and AI technologies, so there needs to be some sort of resolution to these issues, and it's going to have to be a policy-based resolution. Ultimately, I think there will have to be legislative amendments. We may get some court decisions before that happens, and oftentimes court decisions will inform what governments do with legislative amendments. The challenge is that it takes a long time for these cases to go through the court system, and a long time for governments to pass legislation amending the Copyright Act to deal with these emerging technologies, and the technology is moving really fast. So even if we figured out a way to resolve the issues confronting us today, three years from now we're probably going to have a whole different set of challenges that will also have to be addressed.

Amy Qi: Okay, I think that's a good note to end on. Thank you, Jay, for your insight on generative AI and the legal issues associated with it. As we've seen, there's a lot of nuance involved, but I think you really helped clarify the legal debate over the potential copyright infringement involved in AI training data. Thank you for listening to this episode of Perspectives, AI and Copyright, and make sure to tune in for the next one.