A while back, I wrote about some of my thoughts on generative AI around the copyright issues. Not much has changed since then, but a new article (Academic authors 'shocked' after Taylor & Francis sells access to their research to Microsoft AI) is definitely stirring up all sorts of concerns by academic authors. The basics of that article are that Taylor & Francis sold access to authors' research to Microsoft for AI development without informing the authors, sparking significant concern among academics and the Society of Authors about transparency, consent, and the implications for authors' rights and future earnings.
The stir can be seen as both valid and redundant. Two folks’ points stick out to me in this regard.
My friend and colleague, Lauren Barbeau, co-author of Critical Teaching Behaviors: Defining, Documenting, and Discussing Good Teaching had the following to say on LinkedIn:
I want to highlight the specific lines that caught my attention:
“I'm still sorting out my reaction to this news. How will AI's decontextualization of my work affect its productive use? How might it lead to misinterpretations and misrepresentations of my research? How will it affect future research we conduct related to that original project? Will AI properly attribute the information it shares? Academic publishing is not a money maker for most authors anyway, but how will it affect royalties?”
I think that’s a reasonable concern and challenge for scholars everywhere—how will their work, when influential, be acknowledged in the realm of LLMs. It brings me back to the way that LLM might make some very specific people’s work more invisible. That’s concerning because in so much of academic, citation is the coin of the realm.
And yet, Martin Eve, open access and humanities scholar (and also on my dissertation committee) explained on his site that while people are griping and threatening to boycott—they also need to read their contracts:
“The point is: the license here permits the Publisher (Routledge) to distribute a copy in any format to anybody. It doesn’t say ‘distribute a copy in any format worldwide for the purpose of human reading only’ but only, paraphrased, ‘to make a copy available [to anybody they choose] in any format worldwide’.”
It’s definitely the moment that many who have been following copyright and scholarly publishing practices over the last decade or long feel like collectively refraining, “It’s in the syllabus.”
Of course, I had a hot-take that was probably less diplomatic than Dr. Eve:
There are generations of research literature that publishers will continue to repackage, repurpose, and resell back to higher education or other industries. There's not much that can be done about it until scholars collectively rethink their relationship with publishers or society collectively rethinks copyright. Not sure either are going to happen.
Given the conversation, I’d decided to do a couple research insights on articles focused on copyright. I’ve updated the prompt (see the end of the piece to see it) and abbreviated the summaries but link to the full chatlog.
Mantegna, M. (2024). ARTificial: Why Copyright Is Not the Right Policy Tool to Deal with Generative AI. The Yale Law Journal Forum.
Generative AI summary
The rapid evolution of GAI has turned theoretical questions about artificial creativity into pressing legal and ethical issues. Mantegna introduces key questions surrounding compensation for authors, fair use, originality of AI outputs, and the potential imbalance in the public domain caused by AI-generated content.
Mantegna delves into the philosophical and legal dimensions of creativity, arguing that copyright law's traditional notions of authorship and originality are insufficient to address AI-generated works. The essay explores the ethical implications of GAI, including issues of consent, attribution, and compensation, and critiques the conflation of unethical behavior with illegality in public discourse.
Mantegna contends that expanding copyright law to cover AI-generated content could have negative consequences for innovation and cultural access. The essay calls for a holistic policy approach that considers ethical principles and broader societal impacts, rather than relying solely on copyright law.
Weaknesses, Limitations, and Blindspots:
One notable limitation in Mantegna's essay is the lack of detailed exploration of alternative legal frameworks that could more effectively address the challenges posed by GAI. While the essay critiques the applicability of copyright law, it does not provide a comprehensive analysis of potential solutions such as data privacy laws, AI-specific regulations, or other intellectual property regimes. Additionally, the essay's focus on the ethical dimensions of AI-generated content may overshadow practical considerations for implementation and enforcement of proposed policies. Finally, while the essay acknowledges the complexity of AI technologies, it could benefit from a more detailed examination of specific case studies or legal precedents to illustrate its arguments.
Key Applications and Considerations:
For scholars and students, Mantegna's essay offers a critical perspective on the intersection of AI and copyright law, encouraging a reevaluation of traditional legal frameworks in light of emerging technologies. The essay's emphasis on ethical considerations and societal impacts provides a holistic view that extends beyond legal doctrine. This approach is particularly valuable for interdisciplinary research, as it highlights the need for collaboration between legal scholars, ethicists, technologists, and policymakers. Additionally, the essay's critique of copyright expansion serves as a cautionary tale for those advocating for legal reforms, underscoring the importance of considering unintended consequences.
My Commentary
Mantegna is saying what so many of us realize but also offering a more critical thought about what does that mean afterward. Copyright isn't a great tool--it was antiquated before GenAI--doubly so now. But it's unclear where to go with it because so many traditional institutions use it as their metaphorical hammer to all things they claim to be nails (e.g. anything in the realm of intellectual property). This article calls out some of the usual suspects in sussing out the issues but am intrigued that the new prompt I'm using was able to articulate the notion of limited frameworks, legal precedence, or or solution. Still, it's a strong piece to help think through the complicated issues surrounding Gen AI and copyright.
(Note: I had the pleasure to work with the author when we were both at the Berkman Klein Center for Internet and Society--that didn't drive my decision to use her work, but it did mean I was more likely to come in contact with her work).
Lemley, M. A. (2024). How Generative AI Turns Copyright Law Upside Down. Columbia Science & Technology Law Review, 25, 190-212.
Generative AI summary
Mark A. Lemley’s article, "How Generative AI Turns Copyright Law Upside Down," provides an in-depth analysis of the challenges that generative artificial intelligence (AI) poses to traditional copyright law. Lemley argues that the rise of generative AI necessitates a fundamental shift in how we understand and enforce copyright protection. Central to his thesis are two main points: the disruption of the idea-expression dichotomy and the substantial similarity test for infringement.
Lemley posits that the traditional model of copyright law, which separates the protectable expression of ideas from the non-protectable ideas themselves, is increasingly untenable in the context of generative AI. In a world where AI can generate creative content from prompts, the creativity lies not in the generated content but in the formulation of the prompts. This shift challenges the core of copyright law, which has always protected the expression rather than the process of creation. Lemley illustrates this with examples where human input into AI systems, such as prompt engineering, is argued to be the new locus of creativity deserving protection. However, this new form of creativity, being fundamentally different from traditional forms, does not align neatly with existing legal doctrines.
Moreover, Lemley addresses the substantial similarity test, a cornerstone of copyright infringement analysis. He argues that in the context of AI-generated works, this test becomes problematic. Since AI can produce highly similar outputs from similar prompts, the traditional method of inferring copying from similarity and access is less effective. This necessitates a rethinking of how courts determine infringement, potentially requiring new methods to assess the originality and creative input of the prompts rather than the end products themselves.
Despite the robustness of his arguments, the article has several limitations and blind spots. One significant weakness is its heavy reliance on theoretical and legal frameworks without sufficient empirical data to back the claims. While Lemley provides numerous legal precedents and hypothetical scenarios, the lack of concrete examples from ongoing litigation involving AI-generated works leaves a gap in practical applicability. Additionally, the article does not fully explore the international dimensions of copyright law and how different jurisdictions might adapt to the challenges posed by generative AI. This is crucial given the global nature of both AI development and content distribution.
For scholars and students, Lemley’s article is essential reading as it highlights the pressing need to reconsider foundational legal concepts in light of technological advancements. It underscores the importance of understanding the intersection of technology and law, particularly how legal doctrines may evolve to accommodate new forms of creativity. Furthermore, the article encourages legal scholars to engage in interdisciplinary research, combining insights from law, computer science, and economics to develop comprehensive solutions to these emerging issues.
In terms of applications, this article serves as a critical resource for those involved in intellectual property law, offering a framework for analyzing and addressing the legal challenges of AI-generated content. It also provides a basis for policymakers to consider new regulations that balance the protection of human creativity with the advancement of AI technologies.
My Commentary
I appreciate how Lemley digs into this transition in Gen AI from outcome to process. I think that too has a lot of bigger questions and considerations for academia. First, process is incredibly important in the production of research but largely forgotten once it is done. Sure, it's addressed in "methods" but in much research that I've seen, the methods section is pretty small in contrast to results and discussion. But furthermore, it makes me think about the slow shift I've seen in some areas of teaching and learning away from product (e.g. paper, exam) to process-focused (captured through scaffolded assignments, reflection, service-learning and other practices). The critiques are interesting for two reasons. For one, it makes me think that this might be a great way of getting feedback on the gaps in one's research article before sending it off--especially if you can have it target particular disciplines or even journals in terms of its focused feedback. Second, I wonder about the usefulness of international consideration. That is, is Lemley speaking globally or within the specific context of the US?
Islam, G., & Greenwood, M. (2024). Generative Artificial Intelligence as Hypercommons: Ethics of Authorship and Ownership. Journal of Business Ethics, 1-5.
Generative AI summary
The article titled "Generative Artificial Intelligence as Hypercommons: Ethics of Authorship and Ownership" by Gazi Islam and Michelle Greenwood presents a multifaceted discussion on the ethical implications of generative AI (GenAI) in the realm of academic authorship and ownership. The authors propose the concept of a "hypercommons," which refers to the extensive, collectively produced digital inputs that GenAI relies on. This notion of hypercommons brings to the fore the ethical challenges posed by the exploitation of shared digital resources in the creation of academic content, thereby impacting the moral agency traditionally associated with scholarly work.
The authors construct their arguments by first defining the hypercommons and its significance in the digital age. They highlight how digital platforms and social media have contributed to a vast pool of data that GenAI utilizes to generate content. This exploitation of collective inputs for private gain, especially by large corporations, raises critical questions about ownership, authorship, and the moral agency of scholars who use these tools. The article also discusses the paradox of academic authorship, which, while promoting individual recognition and accountability, is inherently a part of a broader collective intellectual effort. The tension between individual and collective contributions is exacerbated by GenAI, which can blur the lines of authorship and accountability.
One of the primary weaknesses of the article is its somewhat abstract treatment of the hypercommons concept. While the authors effectively introduce the idea, the practical implications and specific examples of how GenAI disrupts traditional notions of authorship are less thoroughly explored. Additionally, the article could benefit from a more detailed analysis of potential solutions or frameworks for addressing the ethical challenges identified. The discussion on the two possible responses—reinforcing individual responsibility or rethinking the social organization of scholarship—remains somewhat speculative and lacks concrete recommendations for implementation.
For scholars and students, this article provides several key applications and considerations. Firstly, it underscores the importance of understanding the ethical dimensions of using GenAI in academic work, particularly in terms of attribution and accountability. Scholars must be mindful of the sources of their data and the potential implications of integrating AI-generated content into their work. Secondly, the article prompts a critical examination of the existing academic reward systems, which may need to adapt to recognize collective contributions more effectively. Finally, it encourages a dialogue on the governance of digital commons and the need for more democratic and transparent management of the hypercommons to prevent exploitation.
In light of the arguments presented in the article, scholars should contemplate several critical questions as they navigate their academic careers. How can they ensure that their use of GenAI tools aligns with ethical standards of authorship and accountability? What mechanisms can be developed to fairly attribute and reward both individual and collective contributions in a hypercommons environment? How might the erosion of traditional authorship norms affect the integrity and credibility of academic scholarship? What roles do academic institutions and publishers play in addressing the ethical challenges posed by GenAI, and how can they support scholars in this evolving landscape? Lastly, how can the academic community collaboratively develop governance structures for the hypercommons that promote fairness and transparency?
My Commentary
I found this idea of the hypercommons appealing and the idea that it has resonance in scholarship reaffirms to my own research. What I heard from many scholars in my dissertation study is the idea that all research is built upon and interconnected with other research. Therefore, intellectual ownership is a tricky concept--particularly in scholarly spaces because, without access to prior research, one's research is unattainable or untenable (that is, you can't build out knowledge without knowledge to build from). That collective idea is cool to consider but hard to think about in the context of academia's legitimacy-granting practices (e.g. citations to just impact for tenure, grants, awards, etc).
Meta-commentary
These 3 pieces are food for thought but still leave more to consider. In some ways, they fit together well. Lemley explains what has changed, Mantegna telps us why copyright isn't the right tool, and Islam & Greenwood help us to think differently about what it is when we use generative AI. And I think that's as far as things can go. All of these are pushing against traditional copyright but it feels like it will be incredibly hard to fundamentally alter copyright or to come up with a new framework to understand works created from these new tools of amalgamation. They're all trying to answer that question: How do we get there from here? I don’t believe they help us solve the issue posed at the start of this piece. Yet, I think we need more thinking in that space and people with some desire to rethink how we’ve been doing this whole copyright approach.
Appendix
Prompt used
You're a legal scholar who is an expert in generative artificial intelligence. Review the attached article and provide the following:
1. A comprehensive description of the arguments made and how they are constructed.
2. Weaknesses, limitations, or blindspots that the article evidently has.
3. Key applications and considerations of this article for scholars and students.
4. 3-5 Questions that one would want to be thinking about for their academic career as a result of this article. Do not do bulletpoints. Use paragraphs.
Do not do bulletpoints. Use paragraphs.
AI+Edu=Simplified by Lance Eaton is licensed under Attribution-ShareAlike 4.0 International
I enjoyed your introduction, exposition and clarification of the issues surrounding IP in academia. We have clearly arrived at a digital liminal space where traditional solutions no longer apply.
I think the main problem in academia is attribution - for both personal ego but also research network reasons. But when an LLM searches and generates an answer, the probability next-word-generation calculus seems to make attribution problematic ... at least for current LLM iterations.
But I would love to see LLMs be able to identify the best info (defined as coming from the most credible sources, especially scholarly platforms), like some kind of hypercommons to democratize the best knowledge.
This seems to be the current problem with Perplexity who cannot afford access to the pay walls guarding high quality info, and what they are left with is scraping social media like reddit and LinkedIn articles.
Maybe the coffers of Microsoft are deep enough the democratize high quality knowledge and make it accessible in a hypercommons. I think most scholars would appreciate this as long as they get some form of attribution.
Thanks for the nod! I've been thinking a lot about this whole issue. People immediately told me to sue, and I said we have no grounds because we don't hold copyright, exactly as you and others noted. While we as authors don't have any control over what the publisher does with our work in terms of distribution, it feels like a betrayal to not even give notice that they were going to do this.
Following up on your ideas, I've been thinking about how this event can and should change academic publishing. I can imagine some independent academic presses popping up to "combat" this trend. But I can also see how this could positively impact things like open access journal publications and self-published print-on-demand books. I could even see higher ed developing rigorous standards of peer evaluation for self-published books. I'm hopeful that this will shake up the iron grip publishing companies have on academics who need to get tenure or promotion. Just some thoughts I've been mulling over.