[EAS] Google Book Search Settlement

Peter J. Kindlmann pjk at design.eng.yale.edu
Mon Nov 3 20:50:08 EST 2008

Dear Colleagues -

One more item in the general area of "Scholarship and the Data 
Deluge." You may have heard about the recent settlement between 
Google and various publishers' associations. I'd read several short 
reports, but tonight ran across the article below in TidBITS 
<http://www.tidbits.com/>, a long-running quality Mac newsletter.

The issues in the settlement are not simple enough to ever allow a 
concise description, but this article, and its links, offer very good 
coverage of the background and of the settlement, its advantages to 
authors and readers, and the inevitabilities of digital publishing.


"Last year's words belong to last year's language,
next year's words await another voice."  --T. S. Eliot

"Outside of a dog a book is man's best friend.
Inside of a dog it's too dark to read."
      --Groucho Marx

Authors and Publishers Settle with Google Book Search
   by Glenn Fleishman <glenn at tidbits.com>
   article link: <http://db.tidbits.com/article/9837>

   Google wants to index all knowledge, and it thought that scanning a
   few tens of millions of books might be a good addition to the
   compendium of billions of Web pages, PDFs, and Word documents it
   already offers. The only trouble? Most of the books it wanted to
   scan are still under copyright protection. This caused the
   Association of American Publishers (AAP), the Authors Guild, and
   other organizations to gnash their teeth - and file lawsuits.

   Last week, Google and a host of these complainants agreed to a
   settlement that a court must still approve. Google will contribute
   piles of cash - $125 million - to settle outstanding issues and fund
   a new copyright clearinghouse that will enable authors and
   publishers to receive funds for online viewing of works.


   The settlement also clears the way for far greater access to
   _orphaned works_: books (and other material) that remain protected
   by copyright, but which are out of print or out of production, and
   largely unavailable even through lending libraries.

   Unlike the outcome of many lawsuits about copyright and access, this
   settlement could be a big win for authors, publishers, readers, and
   libraries. Could such a thing be possible?

   (Full disclosure: I am a member of the Authors Guild. Although I did
   not support the particular form of the Authors Guild lawsuit,
   neither did I cancel my membership as a result of the legal action.)

   [Editors' disclosure: With our Take Control hats on, we've worked
   with Google Book Search for years, and it pains me to say that the
   experience has been nothing but frustrating, with literally months
   of delay between uploading a fully searchable PDF - no need to scan
   anything - and having it posted. Plus, although Google's support
   people responded quickly to our queries, they were universally
   useless at addressing any complaints, such as posting delays or the
   existence of guaranteed broken links to Amazon.com for our titles,
   given the fact that Amazon doesn't resell our ebooks. I certainly
   hope that the settlement will mean increased exposure for Google
   Book Search and our content, and additional sales. -Adam]

**The Backstory** -- After a couple years of prep work, Google
   announced in 2004 its Google Print program, later renamed Google
   Book Search, as well as its Library Project, the controversial part.


   Google started partnering with major publishers first, followed by
   smaller houses - a total of 20,000 so far - to make their books
   available in some form online.

   Google's bigger objective was to partner with major academic
   libraries around the world, scan books using high-speed techniques
   it had invented, and use optical character recognition (OCR)
   technology to turn the scans into searchable text.

   Google Book Search made it possible for anyone to search the
   contents of any scanned book and, depending on the copyright status
   of the book and other factors, view or even download some or all
   pages. (Microsoft started two similar programs that avoided many
   copyright issues, but the company shut those projects down in May.)


   This behavior rankled many: Google claimed the right to scan
   copyright-protected books on the grounds that the company wasn't
   per se distributing the books, even though it had full digital
   copies. Google maintained - in a rough approximation - that because
   it was working under contract with libraries that owned physical
   copies of the books, making archival digital copies was perfectly
   legitimate, as was turning the copyrighted works into text and
   images that weren't revealed in whole on the Web.

   The various parties aligned against Google disagreed, and filed suit
   in 2005.

**The Variety of Works under Discussion** -- Part of what publishers
   and the Authors Guild found problematic, and part of how the
   settlement was designed, centers on separating works into three
   categories: public domain, in copyright/out of print, and in
   copyright/in print.

* Public domain works are no longer covered by copyright, and may be
   used in essentially any form and any fashion. Many publishers,
   notably Dover, reprint public-domain works in various forms and
   compendiums. Copyright holders can also release all rights on works
   they control, placing a creation in the public domain. Google Book
   Search makes the full text available, including for download.


* Books that are in copyright, but out of print, are often called
   _orphan works_. This category covers books that are no longer
   stocked or available from the commercial book trade, but which
   remain under copyright. The copyright may be owned by a living
   person or his or her estate, by a trust, by a publisher, or by a
   company. Orphaned works make writers cry, because their hard-wrung
   prose - fiction or non-fiction or reference - is unavailable, even
   if the market desires it, because the economics of print publishing
   have until recently put their children in the gutter. Google Book
   Search makes the full text searchable, with snippets of context

* Active books are in copyright and in print. Books that are actively
   sold by publishers through booksellers or directly, even if they're
   30, 40, or 70 years old, fit in this category. Publishers often
   refer to their frontlist, books that are relatively new and actively
   promoted, and their backlist, titles still in stock and available,
   and which may even sell well, but which aren't promoted. The same
   searching and results are allowed as with out of print titles. (By
   the way, Amazon's special-order books program, launched at the same
   time as the bookseller's overall store in 1995, was the first simple
   way to obtain in-print books that weren't routinely stocked by
   either bookstores or book distributors. Prior to Amazon, special
   order books required time and effort on the part of a bookseller,
   and were often regarded as a giant pain to fulfill.)

   These three categories raise the question: what's covered under
   copyright, anyway? I'm glad you asked.

**Copyright's Increasing Longevity** -- Copyright law in the United
   States has been tweaked quite a bit since the right was granted in
   the Constitution, and because of this, there's quite a bit of
   complexity involved. The U.S. Copyright Office has a brief
   explanation, as well as a more extended discussion of terms.


   If I can try to boil the discussion down for published works
   copyrighted in the United States:

* Everything copyrighted - registered with the Copyright Office -
   before 1922 is in the public domain.

* Nearly everything registered as under copyright starting in 1922 was
   under copyright initially for a term of 28 years, which could be
   renewed on the 28th anniversary through the Copyright Office for
   another 28 years.

* Works registered starting 01-Jan-50 are grandfathered through a
   variety of rules to extend their copyright with no renewal being
   required. There are a lot of niceties involved, but this is the
   general rule.

* Any work copyrighted from 01-Jan-78 on is under copyright protection
   the moment it's created for the author's life plus 70 years, or for
   95 years from publication for works owned by a company - so-called
   "work for hire," in which a work was created by a statutorily
   defined employee of a firm or institution, or for which copyright
   has been transferred by the individual or people involved to a
   company. No registration is required, but it ensures both proof of
   ownership and the maximum statutory damages (treble!) for
   successful proof of violation. (Before the Sonny Bono Copyright Term
   Extension Act of 1998, the duration was 50 years following death or
   75 years for works for hire. The act is also pejoratively known as
   the Mickey Mouse Protection Act, because Mickey's appearance in
   Steamboat Willie would have entered the public domain in 2000.)
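
   The four rules above can be sketched as a small function. This is
   only an illustration of the simplified summary given here, not of
   copyright law itself: the function name, its parameters, and the
   wording of its return values are hypothetical, and the cutoff dates
   are taken exactly as listed above.

```python
from datetime import date

# Cutoff dates exactly as given in the simplified list above.
PUBLIC_DOMAIN_BEFORE_YEAR = 1922
NO_RENEWAL_NEEDED_FROM = date(1950, 1, 1)
AUTOMATIC_FROM = date(1978, 1, 1)


def copyright_status(registered: date, renewed: bool = False) -> str:
    """Classify a published U.S. work using only the simplified rules above.

    `renewed` says whether a renewal was filed at the end of the first
    28-year term; it matters only for works registered from 1922
    through 1949.
    """
    if registered.year < PUBLIC_DOMAIN_BEFORE_YEAR:
        return "public domain"
    if registered < NO_RENEWAL_NEEDED_FROM:
        # Initial 28-year term; protection continued only if renewed.
        return "in copyright (renewed)" if renewed else "public domain (not renewed)"
    if registered < AUTOMATIC_FROM:
        return "in copyright (extended, no renewal required)"
    return "in copyright (life plus 70, or 95 years for work for hire)"


for when, was_renewed in [(date(1910, 6, 1), False),
                          (date(1935, 6, 1), False),
                          (date(1935, 6, 1), True),
                          (date(1962, 6, 1), False),
                          (date(1985, 6, 1), False)]:
    print(when.year, was_renewed, "->", copyright_status(when, was_renewed))
```

   The second branch is the one that matters for the discussion that
   follows: it is the only case in which the answer depends on whether
   someone filed paperwork.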


   A lot more explanation, which I'll avoid here, is necessary for
   rules surrounding other countries' copyright regulations prior to
   general international agreement in the 1970s about copyright terms,
   and rules in the United States for anonymous, pseudonymous, and
   unpublished works.

   If you read this carefully, you'll notice a gap. If a work was
   registered in 1922 or later but before 1950, it would wind up in
   the public domain if a renewal notice were not filed. It's unclear how
   many hundreds of thousands or millions of works may have fallen into
   that gap.

   But you can see that there's a giant divide. Before 1922,
   essentially everything is in the public domain. After 1922, nothing
   that anyone paid attention to is.

**Fair Use** -- Copyright law contains a giant set of exemptions that
   are supposed to balance the U.S. Constitution's language against the
   public good. Article 1, Section 8, states that Congress shall have
   power "To promote the Progress of Science and useful Arts, by
   securing for limited Times to Authors and Inventors the exclusive
   Right to their respective Writings and Discoveries."

   Many arguments have been made about what "limited Times" means -
   Stanford law professor Larry Lessig argued the Sonny Bono Act all
   the way to the Supreme Court - but the idea that copyright is
   intended not solely for the benefit of "authors and inventors" but
   for society as a whole should be undisputed. (If you've followed the
   actions of the movie and recording industries, and legislative
   efforts to support their actions, you might believe that copyright
   is all about ownership, not public good.)


   In that spirit, Congress defined exceptions to copyright, including
   fair use, which have further been refined by practice and the
   courts. There's a quadripartite test when a claimed fair use is
   examined: the commercial nature or lack thereof; the kind of work
   involved; the quantity of work used in relation to the original; the
   effect on the market of the original work. The test doesn't require
   every element to be met; rather, each part is weighed against the
   whole. (You can read about this in more depth at the Copyright


   Google has argued that its efforts at scanning copyrighted books and
   making them available for search with only snippets of results meet
   the smell test: that Google was making no specific commercial return
   on its book search (in fact, it was investing tens of millions in
   its scanning efforts with libraries), that the works were
   intended for public distribution, that snippets were infinitesimal
   parts of books, and that the search giant was stimulating demand for
   the books it provided results against. Google provided links to
   purchase the books, and could thus track sales, too.

   The Authors Guild, among others, stated that the mere act of
   creating electronic editions that were stored and distributed
   required permission from copyright holders, to say nothing of
   displaying the results. With a little programming work, an
   interested party could extract passages or entire books, too.

   Without being a lawyer specializing in this area, I believe it was
   and remains impossible to determine whether Google or its one-time
   opponents would have prevailed. A ruling clearly would have created
   a new sub-area of law, either affirming, denying, or making far
   more complicated the notion that creating and owning copies of
   copyrighted works is a de facto violation of the law.

   But these one-time opponents are now at least somewhat supportive of
   Google's efforts. What changed? Quite a lot, and in ways from which
   all parties, and we readers, stand to benefit.

**Out-of-Print Books and Book Rights Registry** -- The settlement
   opens the way to allowing vastly improved availability of
   in-copyright books by separating out-of-print and in-print books
   into their respective categories, and collecting fees for all
   snippet displays, page reading, and page printing.

   Publishers, authors, and other copyright holders will be able to
   opt out of having out-of-print books included; by default, all
   out-of-print books will be available. For in-print books, the
   reverse holds: those who own the rights must opt in. This allows
   all of Google's existing partners to continue what they're doing,
   and publishers to experiment by adding specific titles or simply
   adding their entire catalogs.
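
   The two defaults are mirror images of each other, which a short
   sketch makes plain. The function and parameter names here are
   hypothetical, and the logic covers only the opt-in and opt-out
   behavior as described in this article, not the settlement's full
   terms.

```python
from typing import Optional


def is_included(in_print: bool, holder_choice: Optional[bool] = None) -> bool:
    """Return True if a book would be offered under the defaults above.

    holder_choice: True  = rights holder asked for inclusion,
                   False = rights holder asked for exclusion,
                   None  = rights holder has taken no action.
    """
    if holder_choice is not None:
        return holder_choice  # an explicit choice always wins
    # No action taken: out-of-print books default in,
    # in-print books default out.
    return not in_print


print(is_included(in_print=False))                       # out of print, no action
print(is_included(in_print=False, holder_choice=False))  # out of print, opted out
print(is_included(in_print=True))                        # in print, no action
print(is_included(in_print=True, holder_choice=True))    # in print, opted in
```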

   If I read the settlement right, publishers who do not opt in to
   allow in-print titles to be included by Google will simply have
   their works removed if available or not added in the future. (A
   complete set of links to resources can be found at the Authors Guild


   Where this agreement goes far beyond Google's current program,
   making it a win for Google, is that Google will now be able to
   provide not just snippet results, but entire pages or books (for
   viewing and printing).

   Google will collect the fees and pass them on to the Book Rights
   Registry, which will be run by a board of authors and publishers,
   and be founded with $34.5 million of the $125 million settlement that
   Google has agreed to pay - without admitting that any of Google's
   claims are invalid.

   Authors and publishers win by suddenly having a mechanism to
   disseminate electronic editions while collecting fees for
   per-snippet display, per-page viewing, and per-page printing.
   Google has agreed to a
   63-37 split in favor of the copyright holder.
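
   To make the split concrete, here is a minimal sketch of the
   arithmetic. The fee amounts are invented for illustration, the
   function name is hypothetical, and rounding the copyright holder's
   share to the nearest cent is an assumption; the article says nothing
   about how fractions of a cent, or any registry costs, are handled.

```python
from decimal import Decimal

# 63 percent to the copyright holder, per the split described above.
HOLDER_SHARE = Decimal("0.63")
ONE_CENT = Decimal("0.01")


def split_fee(fee: Decimal) -> tuple:
    """Divide a collected fee into (copyright holder, Google) portions."""
    holder = (fee * HOLDER_SHARE).quantize(ONE_CENT)  # assumed rounding
    google = fee - holder  # Google keeps whatever remains
    return holder, google


for fee in (Decimal("1.00"), Decimal("10.00"), Decimal("250.00")):
    holder, google = split_fee(fee)
    print(f"fee ${fee}: holder ${holder}, Google ${google}")
```

   Computing Google's portion as the remainder, rather than as 37
   percent rounded separately, guarantees the two amounts always add
   back up to the fee collected.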

   The public wins because the settlement calls for a free subscription
   license for "designated" computers at all U.S. public and academic
   libraries - a miserly one per public library building, or one per
   either 4,000 or 10,000 students, depending on the institution type.
   Google has also agreed to pay all printing royalty fees for 5 years
   or up to $3 million, whichever comes first, for these qualifying

   Other institutions can pay for overarching printing and reading
   licenses, and public libraries can upgrade to fuller licenses, too.
   Without knowing what these more extensive subscriptions cost, it's
   hard to know whether public libraries will be able to afford them.
   Wade Roush of Xconomy, from whose writing I learned about the limits
   on free library access, is down on the whole deal, partly due to the
   scale of free access and partly due to the default pricing that
   Google will set on out-of-print, in-copyright books.


   Anyone who researches a topic should benefit from the availability
   of out-of-print works, as they comprise many millions of titles that
   are rarely available in wide circulation. Ten libraries around the
   world might have a particular book you need, but that doesn't mean
   you can gain access to it.

   Google has also agreed to pay legal fees, and at least $45 million
   to copyright holders whose works were scanned before a certain date
   connected to the lawsuit.

   Now, of course, not all publishers or copyright holders are
   represented by the parties involved, and some may choose to sue
   separately in the future. The court might also require the parties
   to appear in court, although courts prefer settlements.

   The only fly in the ointment is that copyright holders of
   out-of-print but in-copyright works are being de facto opted in to
   having their works available by virtue of this settlement, even if
   they're not party to it. That should fly, because most of these
   creators or owners can get no value out of their works at present,
   and few people complain about receiving additional compensation.
   Further, the creation of a clearinghouse gives a kind of imprimatur,
   allowing a party that represents authors and publishers to make sure
   out-of-print works see life again.

   There was the notable case in the music world of James Carter, a
   former convict whose voice was recorded on a chain gang in 1959 by
   pioneering folk music collector Alan Lomax. In 2002, the song he
   sang, "Po' Lazarus," was used in the opening of the movie "O
   Brother, Where Art Thou?" The soundtrack sold 4 million copies.

   Carter, who left prison in 1967 and had led a quiet life since, was
   tracked down after months of research by the Lomax archives, and
   presented with a $20,000 check; he received $100,000 by his death in


**Avoiding Collision with the Future** -- I'm a writer. I make my
   living by sitting down and typing, as I am now. The notion of Google
   appropriating my words without my permission or acknowledgment
   always bothered me, even though I also accepted that there was a
   fine chance that the company was operating within the legal
   constraints of copyright law.

   I similarly was troubled by the Authors Guild partnering with what
   is often its natural enemy, the AAP, in trying to prevent Google
   from related activities, some of which seemed to benefit me and
   authors, and others of which did not. (For instance, the AAP at
   times has suggested that public libraries should pay fees to
   publishers when they lend works. While this is the case in EU
   nations, authors generally don't believe that publishers would pass
   along these fees to authors; that's separate from the seemingly
   un-American idea that public libraries pay royalties!)


   This reconciliation doesn't solve all issues, but by providing a
   broader marketplace it makes it much more likely that independent
   authors and publishers will survive and even thrive, while also
   making human knowledge more widely available. While the ease of
   access to
   publicly promulgated information, like Web pages, has increased,
   trends seemed to suggest that books would go down the path that
   movies are still taking and music is slowly escaping from: being
   available only in highly restricted ways that interfere with
   technological progress.

   With this new agreement in place, it's possible that you could
   publish a book, distribute it entirely through Google Book Search,
   and earn some money - maybe even a lot of money if the book goes
   viral - and bypass publishers entirely. That was the promise of the
   Internet music, blog, and podcast revolutions, too. While it hasn't
   come true for everyone, it's certain that many more voices are being
   heard by many more people around the world. And that's a good thing.
