Copyright Law, Fair Use, and the Future of AI
The AI giants are going to be writing some big checks. But they may never be big enough.
Sree’s newsletter is produced with Zach Peterson (@zachprague). Digimentors Tech Tip from Robert S. Anthony (@newyorkbob). Our sponsorship kit. The image above “AI eating copyright” was generated on Jasper.ai.
🗞 @Sree’s NYT Readalong: Our most recent episode was w/ TikTok star Kelsey Russell, who is getting young people to — gasp! — read newspapers! Watch the recording here. You’ll find three years’ worth of archives at this link (we’ve been reading the paper aloud on social for 8+ years now!). The Readalong is sponsored by Muck Rack. Interested in sponsorship opportunities? Email sree@digimentors.group and neil@digimentors.group.
🤖 For just $10, you can buy the video and slides from my “Non-Scary Guide to AI” workshop and all-star panel with experts Aimee Rinehart, Senior Product Manager AI Strategy for The Associated Press; and Dr. Borhane Blili-Hamelin, AI Risk and Vulnerability Alliance, here: https://digimentors.gumroad.com/l/aipanel. One cool part is that it has Purchasing Power Parity pricing, so it adjusts automatically to the country you live in. eg: Sweden: $8; Italy: $6.70; Singapore: $6.60; UAE: $6.10; India: $4; South Africa: $4. TESTIMONIAL: "What an excellent class... thank you! Generative AI is an unbelievably exciting and terrifying subject. Looking forward to participating in future classes." — Jill Davison, global comms executive. I am doing these workshops around the country and abroad as well as by Zoom, customized for each audience. If you'd like to discuss organizing one, please LMK at sree@digimentors.group
***
THE DECODER PODCAST is one of the best places to go to understand AI. Host Nilay Patel, co-founder and EIC of The Verge and a former lawyer, has always been one of the first people I turn to for perspective on the tech universe. A recent episode with The Verge features editor Sarah Jeong, also a former lawyer, is all about copyright law, the AI industry, and the ins and outs of the colossal confrontation brewing between the people who create things and the companies whose machines are learning from them.
The show starts off perfectly: basically, the lawyers think the new wave of copyright suits against AI companies could actually signal an extinction-level event for AI, but the AI CEOs don't. According to both Patel and Jeong, the CEOs are of the view that money will end up solving this problem. This may be true, but I think the idea that a few million dollars here and there will be the solution is completely misplaced.
It all comes down to fair use. From The Verge:
Fair use is written right into the Copyright Act, and it says that certain kinds of copies are okay. Since the law can’t predict what everyone might want to do, it has a four-factor test written into it that courts can use to determine if a copy is fair use.
But the legal system is not deterministic or predictable. Any court gets to run that test any way they want, and one court’s fair use determination isn’t actually precedent for the next court.
That means fair use is a very vibes-based situation[…]
It’s a coin flip on a case-by-case basis. In December, The New York Times joined the fray when it filed suit against OpenAI and Microsoft for copyright infringement. The NYT claims in the suit that OpenAI is essentially stealing the NYT’s work to train its AI models, generating billions of dollars in revenue for OpenAI and a net loss for NYT journalists and shareholders alike. This is essentially the first major case in this space (though some comedians and authors had sued OpenAI earlier), and there are untold billions of dollars at stake.
Here is a pdf of the full filing. We’d love to hear from any attorneys with some thoughts on the whole thing—feel free to comment or send an email!
Here’s an illustrative excerpt:
Publicly, Defendants [eds: OpenAI and Microsoft] insist that their conduct is protected as “fair use” because their unlicensed use of copyrighted content to train GenAI models serves a new “transformative” purpose. But there is nothing “transformative” about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it. Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use.
The law does not permit the kind of systematic and competitive infringement that Defendants have committed.
For comparison’s sake, here’s what the AI companies have been saying to date. An excerpt from public comments submitted to the U.S. Copyright Office:
Google:
If training could be accomplished without the creation of copies, there would be no copyright questions here. Indeed, that act of “knowledge harvesting,” to use the Court’s metaphor from Harper & Row, like the act of reading a book and learning the facts and ideas within it, would not only be non-infringing, it would further the very purpose of copyright law. The mere fact that, as a technological matter, copies need to be made to extract those ideas and facts from copyrighted works should not alter that result.
StabilityAI:
A range of jurisdictions including Singapore, Japan, the European Union, the Republic of Korea, Taiwan, Malaysia, and Israel have reformed their copyright laws to create safe harbors for AI training that achieve similar effects to fair use. In the United Kingdom, the Government Chief Scientific Advisor has recommended that “if the government’s aim is to promote an innovative AI industry in the UK, it should enable mining of available data, text, and images (the input) and utilise [sic] existing protections of copyright and IP law on the output of AI.”
Andreessen Horowitz:
Over the last decade or more, there has been an enormous amount of investment—billions and billions of dollars—in the development of AI technologies, premised on an understanding that, under current copyright law, any copying necessary to extract statistical facts is permitted. A change in this regime will significantly disrupt settled expectations in this area. Those expectations have been a critical factor in the enormous investment of private capital into U.S.-based AI companies which, in turn, has made the U.S. a global leader in AI. Undermining those expectations will jeopardize future investment, along with U.S. economic competitiveness and national security.
This won’t be a slam-dunk case, and the next major one like it won’t be either. The Supreme Court has yet to weigh in, of course, which makes for a very interesting discussion point in the Decoder episode.
The NYT is asking for “billions of dollars” in compensation, and I support them wholeheartedly, but I would take it one step further. The companies, unions and others filing similar suits should not just take the cash that will inevitably be on the table. The big media companies have done this dance with Big Tech’s next big thing before, and one hopes that they learned the appropriate lessons from the rise of the social media age. I’m sure many of the people in the upper echelons of the AI startup-verse arrived after some sort of tenure with one of the social media giants, so there should be some familiar faces.
For a decade, media companies rode the coattails of social media companies that, in the end, paid very little to kill off small and mid-sized media outlets. Sure, there were “pivots to video,” and then the “pivot to live video”, and then the pivot to algorithmic feeds that were trained on years of personal data willingly handed over by billions of people around the world. Given that data transfer and the amount of data it encompasses, it’s hard not to see that AI—powerful AI—was inevitable.
I hope that instead of taking a check for what at the time feels like an almost impossible amount of money, the media companies and creative artist community more widely get meaningful stakes in these companies and the opportunity to share in that financial success.
I always thought it was crazy that, during the steel company bankruptcies of the 1990s and airline bankruptcies in the decade or so that followed the 9/11 attacks, the U.S. government canceled debts, negotiated buyouts, and pumped cash into entire industries, but it never took permanent stakes in that recovery. Companies in the creative industries lost even more by not getting stakes in the social media companies that were slowly killing them, and they shouldn’t let it happen again.
There’s no reason the New York Times should settle for anything less than significant rolling payments from, and a meaningful equity stake in, OpenAI.
What’s more (and likely a topic for a later edition of this newsletter), unions have a chance here at a real reawakening in the U.S. A big reason there aren’t more suits like this is the lack of potential filers who can afford the legal fees necessary to settle these matters in court. Collective action is the only thing that will get the journalists at the NYT, the writers of the next great Netflix series, and anyone else who brings that touch of creativity only a human can, what they deserve, and that process should have started yesterday.
These world-beating AI models should be powerful enough to figure out how much of each creator’s work they are using as learning input. Maybe we should put them to use to make sure that the people (real, actual human beings) who write the first draft of history, document corruption, and…paint lovely landscapes, get the compensation they deserve for the consumption of their work, whether by human or machine.
Interestingly, there is a toggle in the Substack settings to opt out of AI training (the default setting is to allow this data transfer). The little info blurb tells you all you need to know, emphasis mine:
This setting indicates to AI tools like ChatGPT and Google Bard that their models should not be trained on your published content. This will only apply to AI tools which respect this setting, and blocking training may limit your publication's discoverability in tools and search engines that return AI-generated results.
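Under the hood, opt-outs like this typically come down to robots.txt: OpenAI and Google both document crawler tokens (GPTBot and Google-Extended, respectively) that publishers can block to keep their content out of model training. As a rough illustration, here is how Python's standard `urllib.robotparser` reads such rules; the robots.txt text below is hypothetical, and, as the Substack blurb notes, it only binds crawlers that choose to respect it.

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks the documented AI-training crawlers
# (OpenAI's GPTBot and Google's Google-Extended) while allowing other bots.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Parse the rules and check which crawlers may fetch a given page.
rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True
```

The key point, echoed in the blurb above, is that this is an honor system: a crawler that ignores robots.txt fetches the page anyway, which is exactly why the legal questions in the NYT suit matter.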
It’s time to get ahead of AI regulation—all facets of it—before it’s too late.
— Sree
Twitter | Instagram | LinkedIn | YouTube | Threads
DIGIMENTORS TECH TIP | AI & Fashion: Get In Before You’re Left Out
By Robert S. Anthony
Each week, veteran tech journalist Bob Anthony shares a tech tip you don’t want to miss. Follow him @newyorkbob.
If you’ve been waiting for the day when artificial intelligence affects the way you live or work, guess what: That day passed a long time ago. But AI may soon affect the way you look as it expands into a new area: fashion.
Instead of spending hours at a sketchpad or computer brainstorming ideas, what if you could generate the next big thing in fashion by plugging words or “prompts” into a generative AI platform like ChatGPT and letting it do the work? That’s already happening.
But AI comes with a lot of open questions. Whose photos, sketches, ideas and concepts went into the databases generating these images? Who owns the copyrights? Are people of color reasonably represented among those creating these AI databases or have they been left behind?
At a recent New York panel of AI experts titled “The Future in Black Presents: Tech Couture,” the key message was simple: creative Black fashion designers will be left behind if they don’t learn to use AI tools now. The good news is that it’s not too late; AI-powered fashion design is still in its infancy.
“We’re going to get erased if we don’t get our voices in,” said Joy Fennell, founder of The Future in Black. “You have to train your own AI model to create output that’s all yours.”
Currently, Black workers have little representation at some of the major companies building tomorrow’s AI. According to Statista, Black employees held only 4.1 per cent of tech jobs at Google in 2023, and according to Microsoft’s Global Diversity & Inclusion Report for 2023, Black employees accounted for 6.7 per cent of its workforce.
Opé, an AI fashion designer and stylist, said that by understanding AI engines, she can generate images that reflect her own tastes, not some programmer’s. “I could make the images [of people] look like what I wanted them to look like and clothes I wanted them to wear,” she said.
So how easy is it to generate a classy image? In a demonstration, digital creator Tristan King entered the words “pearl,” “afro,” “love,” and “business man” into the Lexica Art generative AI Android app, and within moments it came back with glamorous images that could easily have graced a book cover or art poster.
With all this computer-generated output, is there still room for human-powered creativity? The answer is yes, said Leighton McDonald, an experience designer and producer, if you become expert at tailoring your prompts for the AI platform you’re using.
So, is it too late for minorities to have their voices heard in the development of AI? The short answer from the panelists was “no.”
“The time to get in is now. Grow while it’s still low water,” said Opé. “Learn what you’re drawn to and take advantage of the tech.”
Did we miss anything? Make a mistake? Do you have an idea for anything we’re up to? Let’s collaborate! sree@sree.net and please connect w/ me: Twitter | Instagram | LinkedIn | YouTube / Threads