AI Audiobooks & Learning
Thinking about how we avoid letting the perfect be the enemy of the decent...
Folks who know me know I am an audiobook fanatic. I’ve been listening to them for 30 years and have been reviewing them for publications and serving as a judge for the Audies for over 20 years. I’ve written about them on my other blog for years. A while back, I even discussed audiobooks and AI here and was featured in an Inside Higher Ed piece about AI audiobooks and university presses.
Since then, I have had the opportunity to listen to a few audiobooks with AI narrators and figured I would talk a bit about them here, both for folks who haven’t tried AI audiobooks yet and to spark conversation among those who have.
Types of AI Audiobooks
First, I want to discuss the two types of AI audiobooks that I’ve encountered. The first type is the “Virtual Voice” audiobook, which platforms like Audible make available by the tens of thousands. In fact, the flooding of Audible with Virtual Voice titles is comparable only to the flooding of Audible with books created by AI (for instance, “Barrett Williams” and ChatGPT have over 4,000 books on the platform). Of course, these aren’t just on Audible. Hoopla has hundreds of readily available entries labeled “synthesized voice” and others with the narrator listed as “TBD,” which I can only assume are AI voices.
Side note: if you want to find audiobooks on Audible that are not Virtual Voice, audiobook narrator Phil Gilbert shared this resource with me on how to filter out Virtual Voice. Basically, when you search on Audible, just include the following (without the quotes): “-virtual_voice”.
These audiobooks are put onto platforms to be sold, borrowed, and presented to consumers as publisher- and platform-endorsed products. Granted, Audible has made many Virtual Voice audiobooks available for free with a Plus membership. My guess is that they’re waiting to see the level of interest before figuring out the right way to charge for them: a price point lower than for audiobooks with human narrators but still enough to make a profit.
Audible and Hoopla are not the only ones exploring this. Project Gutenberg also partnered with Microsoft to make thousands of audiobooks with AI-narrated voices available for free. You can check out the collection on Spotify or on this website.
Recently, I tried a different type of AI-narrated audiobook: one generated on the fly rather than produced in advance. A few months ago, Eleven Labs, the company creating AI voice avatars, released an app called Eleven Labs Reader. This app allows users to import text (webpages, PDFs, EPUBs), select from a range of pre-created voices, and then have the text read aloud with some degree of inflection and emphasis.
One of its features: if your screen is on, it highlights the paragraph it is reading and spotlights the specific word being spoken. Like other audiobook apps, it also lets you adjust the reading speed to your liking.
In some ways, Virtual Voice audiobooks and readers like Eleven Labs feel like something that has long existed: text-to-speech. Yet in both cases, the sound quality, pacing, and so on were reasonable. Not great, but a reasonable improvement over the text-to-speech I’ve listened to over the years.
Review of AI Audiobooks
So how were the experiences of listening to AI-generated audiobooks?
For this, I’ll be largely talking about nonfiction audiobooks. There are plenty of fiction audiobooks narrated by virtual voice in mysteries & thrillers, sci-fi & fantasy, romance, and yes, even erotica. I will probably try the AI narrations for fiction (more on that later), but for now, this discussion is on nonfiction.
I should make a distinction here too about my listening habits. I typically listen to audiobooks at 1.75X speed for fiction and 2.0X for nonfiction. I’ve found that this pace keeps my attention (or rather, demands it, so my mind is much less likely to stray). These days, listening at regular speed is a challenge even with human narrators.
So, listening to the AI narrations at 2.0X speed, I found they did a good-enough job. The faster pacing made the voice sound smoother. At that pace, it had just enough tone and emphasis not to distract me, though it was not particularly striking. I was always aware that it was an AI narrator, but I also wasn’t constantly focused on it. It performed no feats where I felt like, “yes, they nailed that complicated passage.” Yet it read whole sentences coherently rather than word by word. That is, the voice had a consistent, light level of emphasis that signaled how all the words in a sentence were connected.
Yet at regular speed, I was painfully aware that it was an AI voice without any life to it. It did not seem to have advanced much beyond previous text-to-speech tools and certainly did not sound as good as some of the voices we’ve heard from Google’s NotebookLM or the voices available in ChatGPT.
And, of course, it was still clunky with elements that are pretty common and easy for a human reader to navigate. For instance, in one audiobook, the AI narrator kept pronouncing 501(c)(3) in a way that differs distinctly from how we typically say it. In general conversation, we say “5-0-1-C-3” with everything strung together as seemingly one word. Yet the narration read it as “5-0-1, C, 3,” with distinct pauses between the 1 and the C and between the C and the 3.
It also did not handle bullet points or other formatting well. It would default to reading the formatting aloud (“bullet point”), even when it was just a series of punctuation marks (one book could have been at least 30 minutes shorter without all the “underscore”s pronounced throughout its many lists). I imagine folks who use screen readers encounter this all over the internet, so it is nothing new. Yet for supposedly consumer-facing products, it seems like an oversight, especially when you’re the major audiobook platform and have put 50,000 of these audiobooks out there.
More important, it didn’t really distinguish when people were speaking, which matters even in nonfiction for understanding whether something is a quote or the author’s own prose. It had no consistency or style, which is part of what I and others love about audiobooks. It repeated the words but did not “read” them.
Overall, I’m not going to rule out listening to AI audiobooks, but it’s not going to be my preferred method. It’s still a bit too flat and needs more nuance in interpreting the text than an AI tool is likely to provide. Many texts need more depth of narration to actually capture the words on the page, and AI narrators can’t do that yet unless an editor is actively adjusting and refining the voice. Of course, having someone do that would undercut the very reason Audible and other entities are interested in this modality. Mass-producing narrations without direction and editing is clearly the point, but direction and editing are also what make listeners excited to listen rather than just listening.
My Concern About AI Audiobooks
I do have some concerns about AI audiobooks in their current form. Sure, I know the argument that “the technology will get better; this is the worst it will be,” but honestly, what is out there is not that much better than the text-to-speech quality we had before GenAI, and it’s still glitchy.
The voices aren’t lively or dynamic, and I think that’s a problem on lots of levels. We take in lots of information with our ears; it’s a sense we cannot turn off. We’re continually scanning our surroundings with sound, whether we know it or not. We know that hearing human voices can be incredibly important. And we know that some texts are as important (or more important) to hear as they are to read.
AI audiobooks will help make more of the textual world available aurally. Yet they will also be inferior to human narrators. And, if the past is prologue, the automated version will be either cheaper or just as expensive, but also more ubiquitous. More audiobooks, but also lower-quality audiobooks. And as we see time and again, those with fewer resources will get the inferior product.
I don’t want to play “what about the children” because there’s already plenty of that out there. But I am curious and concerned about the impact of mass exposure to flatter voices. While I’m talking about audiobooks, this is likely to happen in lots of places where a lesser-quality AI voice is cheaper and more efficient. What does having flat voices everywhere mean for how we learn to listen and make sense of things? Of course, this could be me making an argument akin to folks who insist that vinyl or tape recordings are better than MP3s when it comes to music, where I literally can’t hear a difference.
The Case for AI Audiobooks
Now, we know I love audiobooks. I’ve listened to thousands of audiobooks and reviewed hundreds across different publications. Over my career, I’ve gotten to be a judge for the Audies for over 20 years. I’ve gotten to interview dozens of audiobook narrators and am an absolute fanboy of many of them (Stefan Rudnicki could read references from a Works Cited and I’d listen to every single word). I’ve spent years converting folks to listening to audiobooks because they have been so life-changing for me.
And yet, I’m going to make the case for AI audiobooks to exist.
Let me start with a wishlist. I have a wishlist of 100+ books: books that I think are important for me to read and learn from, and books that I know I will never get to unless they are turned into audiobooks. Specifically, these books will never be made into audiobooks with a human narrator. In part, I know this because they’ve been on my “audiobook wishlist” for a decade or more. Many of them are also academic texts or much less popular books that will never make it onto any publisher’s priority list. They’re not that important to the world, even if I deem them important.
I’m a pragmatist in this regard. Some audiobooks are definitely about the listening experience; they are meant to be cherished. But more often, it’s about the content, not the performance. It’s the same reason I use different modalities of reading (physical books, tablet reading, phone reading): it’s whatever is available to access the content. And after listening to a couple of AI audiobooks, I have to say AI narrators do a good-enough job.
Of course, there is an argument that AI will take jobs from artists like narrators. That is, companies and self-publishers will opt to use an AI narrator and therefore, there will be fewer paying opportunities and those will inevitably go to the more established and popular narrators (the rich get richer).
And there’s truth in that, but in the context of the world we live in, it also becomes an anti-accessibility argument. Amazing entities like Learning Ally (formerly Recording for the Blind & Dyslexic) and the National Library Service for the Blind and Print Disabled do important work to make books accessible to those with visual disabilities, and yet they are never going to be able to keep up with the millions of books published each year, never mind the backlog of millions of books that have never been converted to audio.
Much of the advancement of audiobooks and their availability has come from commercial enterprises and technological advances (indeed, audiobooks have become significantly cheaper over the last 30 years and exponentially more available). And yes, they still cover only a fraction of the books out there. If we want those books to be available to all readers, especially those for whom audio is a more successful medium for learning and engaging with texts, holding onto the purity test of “it must be a ‘real voice’” means vast amounts of knowledge and ideas will continue to be denied to people.
The argument that the artist must be preserved because human narration is better, and that we should therefore shun, shame, or dismiss the use of AI narrators, basically says it must be first class or no class: that the inferiority of a second-class product is such an affront that there is no point to the venture. But, of course, there is. For me, making the world more legible feels more important in this moment. If we keep waiting until all audiobooks are narrated with human voices, we’re never going to achieve that, for a lot of different (and problematic) reasons.
But I also get it: leaning all the way into AI audiobooks feels like a slippery slope. I just haven’t seen any viable solutions for access over my 30 years of listening and 20 years of actively following the industry.
(Inconclusive) Conclusion
Ok, so this is a Substack supposedly about AI and education, and here I am rambling about audiobooks. But for me, audiobooks are a microcosm of education. They have been a tremendous source of formal and informal learning over the decades. They also offer a helpful lens for thinking about accessibility.
Of course, while I was thinking about AI audiobooks and writing this piece, I was also finishing Audrey Watters’s Teaching Machines: The History of Personalized Learning, a critical history of the automated teaching machines created in the mid-20th century that were going to revolutionize education (according to scientists like B. F. Skinner and to businesses). Watters, who has recently returned to writing about educational technology, has consistently been a thoughtful and critical voice. In many ways, this book feels like a primer for the AI education conversation, much as Brian Merchant’s Blood in the Machine does; Merchant continues to write about the impact of technology, particularly Silicon Valley, on society on his blog.
They both offer great critiques of embracing technology carelessly while also demonstrating that these technologies are not neutral and come with real costs. But neither of them, nor any of the other critiques out there, seems to address the fact that these technologies also help people at a scale previously unimaginable. They are imperfect and create problems, but so did what we had before these tools.
And so while I, too, am still very skeptical, exploring AI audiobooks does make me think about where else in education we are letting the perfect be the enemy of the decent. Does outright rejecting any place where AI augments, changes, or stands in for teachers make total sense, especially in circumstances where the choice is either nothing or an AI option?
It’s a challenging question because I also know how easy it can be to then decide that the decent option is good enough for everyone. So that’s what I’m grappling with today. If you have thoughts or angles that might help square this, I’d love to hear them!
Upcoming Events
I’ve got a couple more public events coming up that I figured I’d also share!
AI to Open Educational Resources: Enhancing Learning and Solving Ethical Challenges – March 12, 17, 20, and 26 from 12pm-1:30pm ET. This mini-course will be offered through EDUCAUSE and will include live sessions where we delve into the relationship between GenAI and OER and the challenges it raises.
This is a 2-week asynchronous program from EDUCAUSE with synchronous drop-in sessions. Each offering is facilitated by either me or Judy Lewandowski. These programs are really good for folks still getting situated with GenAI. Below are the sessions that I’m facilitating:
NERCOMP Annual Conference: My colleague and friend Antonia Levy and I are leading a facilitated discussion on Actionable Insights for Generative AI, Accessibility, and Open Educational Resources on April 1, 2025, from 11:30AM–12:15PM. If you’re attending the conference, we hope to see you!
AI+Edu=Simplified by Lance Eaton is licensed under Attribution-ShareAlike 4.0 International
Good overview, Lance. (I didn't know you were a big audiobook fan. Did you ever narrate for Librivox?)
Your call for creating more audiobooks is an important one.
I've been using Speechify to listen to web content. The fee is modest ($35 a year, I think), but it's well worth it for the ability to look away from the screen for a while and still listen to articles.