Ebooks and Text-To-Speech Technology: A Legal Perspective

This article, from Charles A. Gaglia and Thomas R. DeSimone of The Legal Intelligencer, originally appeared on Law.com’s Legal Technology blog on 6/30/09.

Amazon’s recent foray into the electronic book business can be described in no other way than as a resounding success. In a short period of time, Amazon’s Kindle has done for the electronic book what Apple’s iPod did for electronic music: that is, make it easily accessible, downloadable and, most importantly, cool. However, Amazon’s attempts to find new ways to exploit this medium and enhance the reading experience have met with their fair share of controversy.

The Kindle 2 recently hit the market, and it included a new feature that had the publishing industry up in arms and threatening suit. This feature is commonly referred to as "text to speech," but according to representatives for the publishing industry and the Authors Guild, it may represent the beginning of the end for the burgeoning audio book market, in addition to constituting a blatant violation of existing copyright law. From a copyright point of view, does text-to-speech technology require a license? And should publishers be legitimately concerned about the demise of the audio book?

What exactly is an e-book? Quite simply, it is nothing more than an electronic version of a traditional paper copy of a book. An e-book is usually in some type of computer-readable format (such as DOC, PDF, etc.) and can be read on any type of electronic device capable of displaying that particular file type. E-books have been around for quite some time but have had limited appeal because many people prefer the portability and ease of use of traditional printed media to being tethered to a computer screen. Keenly aware of these shortcomings, several manufacturers attempted to develop dedicated hardware devices that would emulate the traditional book-reading experience while at the same time providing many advantages only possible with e-book technology, such as storage of hundreds or thousands of books on a single device and instant access to titles via downloading.

Sony was an early entrant into the field with its LIBRIe device, which never really found an audience. Sony tried again more recently with its PRS-500, which experienced moderate success, but has been largely overshadowed by the popularity of Amazon’s Kindle device. Unlike Sony’s PRS-500 reader, the Kindle does not need to be coupled to a computer in order to download titles. It uses Amazon’s wireless Whispernet (provided by Sprint) in order to download any available title from Amazon’s e-book library, wirelessly, on demand. However, the most controversial feature of the Kindle was introduced to the public when Amazon released the second-generation device, known as the Kindle 2. This device incorporated text-to-speech technology, which, at the press of a button, allows the Kindle to read the e-book.

Much like the e-book, text-to-speech technology is not something entirely new. In fact, the first computer-based text-to-speech system was completed in 1968. Text-to-speech software enables a computer to convert text characters into audible, intelligible words by virtue of the computer’s internal synthesizer. If one wants to get an idea of what typical text to speech carried out by a computer sounds like, it may be instructive to listen to any interview given by world-renowned physicist Stephen Hawking, who communicates with the aid of a computer because of severe paralysis brought on by the ravages of Lou Gehrig’s Disease. The technology continues to improve, and many who have heard the Kindle 2 in action have remarked on the quality and clarity of the Kindle 2’s electronic "voice." However, text-to-speech technology continues to be hampered by the software’s inability to convey emotion and to handle heteronyms, which are words that are spelled the same, but pronounced differently (e.g., "bow" as the front of a ship versus "bow," which is used to fire arrows). Considering these significant shortcomings, should the publishing industry be legitimately concerned that text to speech may replace audio books created by professional voice actors? The answer to this question is important, as it relates directly to whether text-to-speech technology is a permitted use of computer-stored text under U.S. copyright law.

As a result of protests made by the publishing industry and the Authors Guild that Amazon had not negotiated for the text-to-speech rights, Amazon elected to disable the feature at any publisher’s request, effectively forestalling any threatened litigation for the time being. In a press release announcing the compromise, Amazon steadfastly maintained its original stance that its text-to-speech feature was in fact a permitted use of computer text under its current license. In an opinion piece published in the Feb. 25 issue of The New York Times, Roy Blount Jr., president of the Authors Guild, stressed the importance and value of protecting audio rights and the continued success of the audio book market. His argument was primarily economic in nature, stressing that authors be adequately compensated for their creative works and any derivative rights that may flow from them. But the piece is noticeably devoid of any legal support for the contention that text-to-speech technology is violative of U.S. copyright law. Blount concludes by noting that while parents need not fear any legal repercussions for reading bedtime stories aloud to their children, performing the same act with the Kindle’s text-to-speech function is another matter. He fails, however, to explain the distinction.

Under the 1976 Copyright Act, copyright protection may extend to any work of authorship. Among the works that are subject to protection are literary, musical, dramatic, choreographic, graphic, audiovisual and architectural works as well as sound recordings. In order to be eligible for copyright protection, the work must be "fixed in a tangible medium of expression." With respect to e-books, the underlying text itself is clearly subject to protection, in that the e-book text is fixed as an electronic file on the Kindle’s internal memory. (This assumes, of course, that the underlying e-book is still subject to copyright protection, and that the work has not passed into the public domain.)

However, the situation is not so simple when one considers that when a Kindle user activates the text-to-speech feature, there is no fixation of anything into a tangible medium. In fact, after the software completes the process of converting text into audible sound waves, and those waves have reverberated throughout the listener’s immediate vicinity, there is nothing tangible that remains. With respect to audio books, there is fixation, in that the sound waves of the author or professional reader’s voice are affixed to a compact disc, or more recently in the form of an electronic MPEG file affixed to the hard drive of a user’s iPod. But nothing similar exists with respect to text to speech.

Read the rest of the article on Law.com’s Legal Technology blog.