[EAS] Google Book Search Settlement
Peter J. Kindlmann
pjk at design.eng.yale.edu
Mon Nov 3 20:50:08 EST 2008
Dear Colleagues -
One more item in the general area of "Scholarship and the Data
Deluge." You may have heard about the recent settlement between
Google and various publishers' associations. I'd read several short
reports, but tonight ran across the article below in TidBITS
<http://www.tidbits.com/>, a long-running quality Mac newsletter.
The issues in the settlement are not simple enough to ever allow a
concise description, but this article, and its links, offer very good
coverage of the background and of the settlement, its advantages to
authors and readers, and the inevitabilities of digital publishing.
--PJK
"Last year's words belong to last year's language,
next year's words await another voice." --T. S. Eliot
"Outside of a dog a book is man's best friend.
Inside of a dog it's too dark to read."
--Groucho Marx
-----------------------------------------------------
Authors and Publishers Settle with Google Book Search
-----------------------------------------------------
by Glenn Fleishman <glenn at tidbits.com>
article link: <http://db.tidbits.com/article/9837>
Google wants to index all knowledge, and it thought that scanning a
few tens of millions of books might be a good addition to the
compendium of billions of Web pages, PDFs, and Word documents it
already offers. The only trouble? Most of the books it wanted to
scan are still under copyright protection. This caused the
Association of American Publishers (AAP), the Authors Guild, and
other organizations to gnash their teeth - and file lawsuits.
Last week, Google and a host of these complainants agreed to a
settlement that a court must still approve. Google will contribute
piles of cash - $125 million - to settle outstanding issues and fund
a new copyright clearinghouse that will enable authors and
publishers to receive funds for online viewing of works.
<http://books.google.com/booksrightsholders/>
The settlement also clears the way for far greater access to
_orphaned works_: books (and other material) that remain protected
by copyright, but which are out of print or out of production, and
largely unavailable even through lending libraries.
Unlike the outcome of many lawsuits about copyright and access, this
settlement could be a big win for authors, publishers, readers, and
libraries. Could such a thing be possible?
(Full disclosure: I am a member of the Authors Guild. Although I did
not support the particular form of the Authors Guild lawsuit,
neither did I cancel my membership as a result of the legal action.)
[Editors' disclosure: With our Take Control hats on, we've worked
with Google Book Search for years, and it pains me to say that the
experience has been nothing but frustrating, with literally months
of delay between uploading a fully searchable PDF - no need to scan
anything - and having it posted. Plus, although Google's support
people responded quickly to our queries, they were universally
useless at addressing any complaints, such as posting delays or the
existence of guaranteed broken links to Amazon.com for our titles,
given the fact that Amazon doesn't resell our ebooks. I certainly
hope that the settlement will mean increased exposure for Google
Book Search and our content, and additional sales. -Adam]
**The Backstory** -- After a couple years of prep work, Google
announced in 2004 its Google Print program, later renamed Google
Book Search, as well as its Library Project, the controversial part.
<http://books.google.com/googlebooks/newsviews/history.html>
Google started partnering with major publishers first, followed by
smaller houses - a total of 20,000 so far - to make their books
available in some form online.
Google's bigger objective was to partner with major academic
libraries around the world, scan books using high-speed techniques
it had invented, and use optical character recognition (OCR)
technology to turn the scans into searchable text.
Google Book Search made it possible for anyone to search the
contents of any scanned book and, depending on the copyright status
of the book and other factors, view or even download some or all
pages. (Microsoft started two similar programs which avoided many
copyright issues, but the company shut those projects down in May
2008.)
<http://www.libraryjournal.com/article/CA6564275.html>
This behavior rankled many: Google claimed the right to scan
copyright-protected books on the grounds that the company wasn't per
se distributing the books, even though it had full digital copies.
Google maintained - in a rough approximation - that because it was
working under contract with libraries that owned physical copies of
the books, making archival digital copies was perfectly
legitimate, as was turning the copyrighted works into text and
images that weren't revealed in whole on the Web.
The various parties aligned against Google disagreed, and filed suit
in 2005.
**The Variety of Works under Discussion** -- Part of what publishers
and the Authors Guild found problematic, and part of how the
settlement was designed, centers on separating works into three
categories: public domain, in copyright/out of print, and in
copyright/in print.
* Public domain works are no longer covered by copyright, and may be
used in essentially any form and any fashion. Many publishers,
notably Dover, reprint public-domain works in various forms and
compendiums. Copyright holders can also release all rights on works
they control, placing a creation in the public domain. Google Book
Search makes the full text available, including for download.
<http://store.doverpublications.com/by-subject-literature-dover-thrift-editions.html>
* Books that are in copyright, but out of print, are often called
_orphan works_. This category covers books that are no longer
stocked or available from the commercial book trade, but which
remain under copyright. The copyright may be owned by a living
person or his or her estate, by a trust, by a publisher, or by a
company. Orphaned works make writers cry, because their hard-wrung
prose - fiction or non-fiction or reference - is unavailable, even
if the market desires it, because the economics of print publishing
have until recently put their children in the gutter. Google Book
Search makes the full text searchable, with snippets of context
presented.
* Active books are in copyright and in print. Books that are actively
sold by publishers through booksellers or directly, even if they're
30, 40, or 70 years old, fit in this category. Publishers often
refer to their frontlist, books that are relatively new and actively
promoted, and their backlist, titles still in stock and available,
and which may even sell well, but which aren't promoted. The same
searching and results are allowed as with out of print titles. (By
the way, Amazon's special-order books program, launched at the same
time as the bookseller's overall store in 1995, was the first simple
way to obtain in-print books that weren't routinely stocked by
either bookstores or book distributors. Prior to Amazon, special
order books required time and effort on the part of a bookseller,
and were often regarded as a giant pain to fulfill.)
These three categories raise the question: what's covered under
copyright, anyway? I'm glad you asked.
**Copyright's Increasing Longevity** -- Copyright law in the United
States has been tweaked quite a bit since the right was granted in
the Constitution, and because of this, there's quite a bit of
complexity involved. The U.S. Copyright Office has a brief
explanation, as well as a more extended discussion of terms.
<http://www.copyright.gov/help/faq/faq-duration.html#duration>
<http://www.copyright.gov/circs/circ15a.html#works>
If I can try to boil the discussion down for published works
copyrighted in the United States:
* Everything copyrighted - registered with the Copyright Office -
before 1922 is in the public domain.
* Nearly everything registered as under copyright starting in 1922 was
under copyright initially for a term of 28 years, which could be
renewed on the 28th anniversary through the Copyright Office for
another 28 years.
* Works registered starting 01-Jan-50 are grandfathered through a
variety of rules to extend their copyright with no renewal being
required. There are a lot of niceties involved, but this is the
general rule.
* Any work copyrighted from 01-Jan-78 on is under copyright protection
the moment it's created for the author's life plus 70 years, or for
95 years from publication for works owned by a company - so-called
"work for hire," in which a work was created by a statutorily
defined employee of a firm or institution, or for which copyright
has been transferred by the individual or people involved to a
company. No registration is required, but registering provides both
proof of ownership and the maximum statutory damages (treble!) for
successful proof of violation. (Before the Sonny Bono Copyright Term
Extension Act of 1998, the duration was 50 years following death or
75 years for works for hire. This was also pejoratively known as the
Mickey Mouse Protection Act, because Mickey's appearance in
Steamboat Willie would have entered the public domain in 2000.)
<http://en.wikipedia.org/wiki/Sonny_Bono_Copyright_Term_Extension_Act>
A lot more explanation, which I'll avoid here, is necessary for
rules surrounding other countries' copyright regulations prior to
general international agreement in the 1970s about copyright terms,
and rules in the United States for anonymous, pseudonymous, and
unpublished works.
If you read this carefully, you'll notice a gap. If a work was
registered starting in 1922 and before 1950, it would wind up in the
public domain if a renewal notice were not filed. It's unclear how
many hundreds of thousands or millions of works may have fallen into
that gap.
But you can see that there's a giant divide. Before 1922,
essentially everything is in the public domain. After 1922, nothing
that anyone paid attention to is.
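As a rough sketch only, the simplified rules above can be written as
a small Python function. The function name, its parameters, and the
returned labels are illustrative assumptions; the year boundaries are
taken directly from the summary in this section, and none of this
captures the niceties of actual copyright law.

```python
from typing import Optional


def copyright_status(registered: int,
                     renewed: bool = False,
                     work_for_hire: bool = False,
                     author_death: Optional[int] = None) -> str:
    """Classify a U.S. published work per the simplified summary above.

    Hypothetical helper for illustration only; not legal advice.
    """
    if registered < 1922:
        return "public domain"
    if registered < 1950:
        # Initial 28-year term; the work fell into the public domain
        # unless a renewal was filed. The summary treats renewed works
        # as still protected.
        return "in copyright (renewed)" if renewed else "public domain (not renewed)"
    if registered < 1978:
        # Grandfathered: copyright extended with no renewal required.
        return "in copyright"
    # 1978 onward: author's life plus 70 years, or 95 years from
    # publication for works for hire.
    if work_for_hire:
        return f"in copyright through {registered + 95}"
    if author_death is None:
        return "in copyright (author's life plus 70 years)"
    return f"in copyright through {author_death + 70}"


print(copyright_status(1915))
print(copyright_status(1935))
print(copyright_status(1935, renewed=True))
print(copyright_status(1980, work_for_hire=True))
```

The 1922-1949 branch is where the gap described above shows up: the
answer depends entirely on whether a renewal was filed.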
**Fair Use** -- Copyright law contains a giant set of exemptions that
are supposed to balance the U.S. Constitution's language against the
public good. Article 1, Section 8, states that Congress shall have
power "To promote the Progress of Science and useful Arts, by
securing for limited Times to Authors and Inventors the exclusive
Right to their respective Writings and Discoveries."
Many arguments have been made about what limited times means -
Stanford law professor Larry Lessig argued the Sonny Bono Act all
the way to the Supreme Court - but the idea that copyright is
intended not solely for the benefit of "authors and inventors" but
for society as a whole should be undisputed. (If you've followed the
actions of the movie and recording industries, and legislative
efforts to support their actions, you might believe that copyright
is all about ownership, not public good.)
<http://www.lessig.org/blog/eldredcc/>
<http://wiki.lessig.org/index.php/Against_perpetual_copyright>
In that spirit, Congress defined exceptions to copyright, including
fair use, which have further been refined by practice and the
courts. There's a quadripartite test when a claimed fair use is
examined: the commercial nature or lack thereof; the kind of work
involved; the quantity of work used in relation to the original; the
effect on the market of the original work. The test doesn't require
every element to be met, but rather that each part be weighed against
the whole. (You can read about this in more depth at the Copyright
Office.)
<http://www.copyright.gov/fls/fl102.html>
Google has argued that its efforts at scanning copyrighted books and
making them available for search with only snippets of results meet
the smell test: that Google was making no specific commercial return
on its book search (in fact, it was investing tens of millions in its
library-scanning efforts), that the works were intended for public
distribution, that snippets were infinitesimal parts of books, and
that the search giant was stimulating demand for the books it
provided results against. Google provided links to purchase the
books, and could thus track sales, too.
The Authors Guild, among others, stated that the mere act of
creating, storing, and distributing electronic editions required
permission from copyright holders, to say nothing of displaying the
results. With a little programming work, an interested party could
extract passages or entire books, too.
Without being a lawyer specializing in this area, I believe it was
and remains impossible to determine whether Google or its one-time
opponents would have prevailed. A ruling clearly would have created
a new sub-area of law, either affirming, denying, or making far more
complicated the notion that creating and owning copies of
copyrighted works is a de facto violation of the law.
But these one-time opponents are now at least somewhat supportive of
Google's efforts. What changed? Quite a lot, and in ways from which
all parties, and we readers, stand to benefit.
**Out-of-Print Books and Book Rights Registry** -- The settlement
opens the way to allowing vastly improved availability of
in-copyright books by separating out-of-print and in-print books
into their respective categories, and collecting fees for all
snippet displays, page reading, and page printing.
Publishers, authors, and other copyright holders will be able to
opt out of having out-of-print books included; by default, all
out-of-print books will be available. For in-print books, those who
own the rights must opt in. This allows
all of Google's existing partners to continue what they're doing,
and publishers to experiment by adding specific titles or simply
adding their entire catalogs.
If I read the settlement right, publishers who do not opt in to
allow in-print titles to be included by Google will simply have
their works removed if available or not added in the future. (A
complete set of links to resources can be found at the Authors Guild
site.)
<http://authorsguild.org/advocacy/articles/settlement-resources.html>
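The opt-out and opt-in defaults described above can be sketched as a
single decision function. This is only my reading of the article's
description, not the settlement's actual language; the function and
parameter names are hypothetical, and the function applies only to
in-copyright books.

```python
def included_in_program(in_print: bool,
                        opted_in: bool = False,
                        opted_out: bool = False) -> bool:
    """Return True if an in-copyright book would be included.

    Hypothetical helper for illustration only.
    """
    if in_print:
        # In-print books are excluded unless the rights holder opts in.
        return opted_in
    # Out-of-print books are included unless the rights holder opts out.
    return not opted_out


print(included_in_program(in_print=False))                  # True
print(included_in_program(in_print=False, opted_out=True))  # False
print(included_in_program(in_print=True))                   # False
print(included_in_program(in_print=True, opted_in=True))    # True
```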
Where this agreement goes far beyond Google's current program,
making it a win for Google, is that Google will now be able to
provide not just snippet results, but entire pages or books (for
viewing and printing).
Google would collect the fees and pass them on to the Book Rights
Registry, which will be run by a board of authors and publishers,
and be founded with $34.5 million of a $125 million settlement that
Google has agreed to pay - without admitting that any of Google's
claims are invalid.
Authors and publishers win by suddenly having a mechanism to
disseminate electronic editions while collecting fees for
per-snippet display, per-page viewing, and per-page printing. Google
has agreed to a 63-37 split in favor of the copyright holder.
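To make the 63-37 split concrete, here is a minimal arithmetic sketch
in Python. The function name and the choice to round the copyright
holder's share down to a whole cent are assumptions for illustration;
the article does not say how fractional amounts would be handled.

```python
from typing import Tuple


def split_fee(fee_cents: int) -> Tuple[int, int]:
    """Split a fee (in cents) 63-37 in favor of the copyright holder.

    Hypothetical helper; the rounding rule is an assumption.
    """
    holder = fee_cents * 63 // 100   # copyright holder's 63 percent
    google = fee_cents - holder      # Google keeps the remainder
    return holder, google


print(split_fee(1000))  # (630, 370): a $10.00 fee yields $6.30 and $3.70
```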
The public wins because the settlement calls for a free subscription
license for "designated" computers at all U.S. public and academic
libraries - a miserly 1 per public library building or either 1 per
4,000 or 1 per 10,000 students, depending on the institution type.
Google has also agreed to pay all printing royalty fees for 5 years
or up to $3 million, whichever comes first, for these qualifying
locations.
Other institutions can pay for overarching printing and reading
licenses, and public libraries can upgrade to fuller licenses, too.
Without knowing what these more extensive subscriptions cost, it's
hard to know whether public libraries will be able to afford them.
Wade Roush of Xconomy, from whose writing I learned about the limits
on free library access, is down on the whole deal, partly due to the
scale of free access and partly due to the default pricing that
Google will set on out-of-print, in-copyright books.
<http://www.xconomy.com/boston/2008/10/31/in-google-book-search-settlement-readers-lose/>
Anyone who researches a topic should benefit from the availability
of out-of-print works, as they comprise many millions of titles that
are rarely available in wide circulation. Ten libraries around the
world might have a particular book you need, but that doesn't mean
you can gain access to it.
Google has also agreed to pay legal fees, and at least $45 million
to copyright holders whose works were scanned before a certain date
connected to the lawsuit.
Now, of course, not all publishers or copyright holders are
represented by the parties involved, and some may choose to sue
separately in the future. The court might also require the parties
to appear in court, although courts prefer settlements.
The only fly in the ointment is that copyright holders of
out-of-print but in-copyright works are being de facto opted in to
having their works available by virtue of this settlement, even if
they're not party to it. That should fly, because most of these
creators or owners can get no value out of their works at present,
and few people complain about receiving additional compensation.
Further, the creation of a clearinghouse gives a kind of imprimatur,
allowing a party that represents authors and publishers to make sure
out-of-print works see life again.
There was the notable case in the music world of James Carter, a
former convict whose voice was recorded on a chain gang in 1959 by
pioneering folk music collector Alan Lomax. In 2002, the song he
sang, "Po' Lazarus," was used in the opening of the movie "O
Brother, Where Art Thou?" The soundtrack sold 4 million copies.
Carter, who left prison in 1967 and had led a quiet life since, was
tracked down after months of research by the Lomax archives, and
presented with a $20,000 check; he had received $100,000 by the time
of his death in 2003.
<http://articles.latimes.com/2003/dec/08/local/me-carter8>
**Avoiding Collision with the Future** -- I'm a writer. I make my
living by sitting down and typing, as I am now. The notion of Google
appropriating my words without my permission or acknowledgment
always bothered me, even though I also accepted that there was a
fine chance that the company was operating within the legal
constraints of copyright law.
I similarly was troubled by the Authors Guild partnering with what
is often its natural enemy, the AAP, in trying to prevent Google
from related activities, some of which seemed to benefit me and
authors, and others of which did not. (For instance, the AAP at
times has suggested that public libraries should pay fees to
publishers when they lend works. While this is the case in EU
nations, authors generally don't believe that publishers would pass
along these fees to authors; that's separate from the seemingly
un-American idea that public libraries pay royalties!)
<http://en.wikipedia.org/wiki/Directive_92/100/EEC>
This reconciliation doesn't solve all issues, but by providing a
broader marketplace, it makes it much more likely that independent
authors and publishers will survive and even thrive, while also
increasing the availability of human knowledge. While the ease of
access to
publicly promulgated information, like Web pages, has increased,
trends seemed to suggest that books would go down the path that
movies are still taking and music is slowly escaping from: being
available only in highly restricted ways that interfere with
technological progress.
With this new agreement in place, it's possible that you could
publish a book, distribute it entirely through Google Book Search,
and earn some money - maybe even a lot of money if the book goes
viral - and bypass publishers entirely. That was the promise of the
Internet music, blog, and podcast revolutions, too. While it hasn't
come true for everyone, it's certain that many more voices are being
heard by many more people around the world. And that's a good thing.