Fair Use is Fair
The argument that Google has made from the beginning of its book scanning project is that copying for the purpose of providing keyword access to full texts is fair use. They are fortunately able to cite case law to defend this, including case law allowing the copying of entire images by image search engines.
Among the reasons that they give for their fair use defense are:
1. Keyword search is not a substitute for the text itself. In fact, the copy of the text is necessary to provide a means for users to discover the existence of books and therefore for the books to fulfill their purpose of being read.
"Books exist to be read. Google Books exists to help readers find those books. Like a paper index or a card catalogue, it does not substitute for reading the books themselves..." (p. 2)
2. Google has elaborate protections in place to prevent users from reconstructing the text from its products. They reveal some of these protections, such as disabling snippet display for one instance of the keyword on each page, and disabling display of one page out of ten.
"One of the snippets on each page is blacklisted (meaning that it will not be shown). In addition, at least one out of ten entire pages in each book is blacklisted." (p. 10)3. No advertising appears on the GBS pages. This implies that Google is not making any money that could be claimed by authors as being theirs.
4. The Authors Guild has no proof of harm that has come from the digitization of the books. It is suggested that a thorough study might show that there have been gains rather than losses in terms of book sales. Even the Authors Guild (the Plaintiff in this case) advises authors to provide some of the text of their books (usually the first chapter) for browsing in online bookstores, and many rights holders participate voluntarily in Amazon's "Look inside" feature that shows considerably more than the disputed snippets that are displayed in GBS. And Google notes that 45,000 (!) publishers have signed up to have their in-print books searchable in GBS, with varying amounts of text available to the searcher prior to purchase. This makes the case that search and some text display is good for authors, not harmful.
5. Digital copies of books have never been "distributed to the public" (key wording in the copyright law). Only the libraries themselves that held the actual hard copies could receive a copy of the files resulting from the digitization.
Of course, all of this is done citing court cases in support of these arguments. The Authors Guild undoubtedly has counter-cases to present.
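To make the snippet protections in point 2 concrete, here is a minimal Python sketch. The brief does not say how Google chooses which snippets and pages to hide, so the hashing scheme and every name below (`blacklisted_pages`, `blacklisted_snippet`, `can_show`) are my own assumptions for illustration, not Google's method. The only thing taken from the brief is the rule itself: one snippet per page is never shown, and at least one page in ten is never shown.

```python
import hashlib


def _stable_int(*parts):
    """Hash the parts to an integer that is the same on every request."""
    key = "|".join(str(p) for p in parts).encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16)


def blacklisted_pages(book_id, page_count):
    """Hide one page in every block of ten (hypothetical selection rule)."""
    hidden = set()
    for start in range(0, page_count, 10):
        block = range(start, min(start + 10, page_count))
        hidden.add(block[_stable_int(book_id, "page", start) % len(block)])
    return hidden


def blacklisted_snippet(book_id, page, snippets_per_page):
    """Pick the one snippet position on a page that is never displayed."""
    return _stable_int(book_id, "snippet", page) % snippets_per_page


def can_show(book_id, page, snippet, page_count, snippets_per_page):
    """Return True if this snippet on this page may be displayed."""
    if page in blacklisted_pages(book_id, page_count):
        return False
    return snippet != blacklisted_snippet(book_id, page, snippets_per_page)
```

Because the choice is derived from a hash rather than made at random on each request, repeating a search cannot gradually reveal the hidden pieces, which is presumably the point of a blacklist.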
Libraries Under the Bus
One of the key copyright-related arguments that Google makes is that its full text search within books provides a public service and support of research that is unprecedented. In making these claims Google decided to particularly emphasize its superiority to library catalogs. (Google refers multiple times to "card catalogues" which seems oddly antiquated, but perhaps that was the intent.)
"The tool is not a substitute for the books themselves -- readers still must buy a book from a store or borrow it from a library to read it. Rather, Google Books is an important advance on the card-catalogue method of finding books. The advance is simply stated: unlike card catalogues, which are limited to a very small amount of bibliographic information, Google Books permits full-text search, identifying books that could never be found using even the most thorough card catalog." (p.1) [sic uses of "catalogue" and "catalog" in the same paragraph.]
"Google Books was born of the realization that much of the store of human knowledge lies in books on library shelves where it is very difficult to find....Despite the importance of this vast store of human knowledge, there exists no centralized way to search these texts to identify which might be germane to the interests of a particular reader." (p. 4)As a librarian, I have to say that this dismissal of the library as inadequate really hurts. Yet I believe that Google is expressing an opinion that is probably quite common among information searchers today. One could counter with many examples where the library catalog entry succeeds and GBS fails, but of course that wouldn't bolster Google's arguments here. A reasonable analysis would put the two methods (full text and standards-based metadata) as complementary.
Google also argues that it did not give copies of the digital files resulting from its scanning to the libraries. How this plays out is not only clever, but it shows some real foresight on Google's part. They developed a portal where the libraries could request that a copy of the files be made "on demand" for the library, and using an encryption specific to that library. The transmission of the files from Google to the libraries was then an act of the libraries, not of Google.
"Moreover, the undisputed facts show that it is the libraries that make the library copies, not Google, and that Google provides only a technological system that enables libraries to create digital copies of books in their collections. Under established Second Circuit precedent, Google cannot be held directly liable for infringement because Google itself has not engaged in any volitional act constituting distribution." (p. 33)Clearly, Google designed the system (with goes by the acronym "GRIN") with this in mind.
I don't mind this, but wish that Google hadn't included a dig at HathiTrust as part of this argument. The document would not have suffered, in my opinion, if Google had left the parenthetical phrase off of this sentence:
"No library may obtain a digital copy created from another library's book -- even if both libraries own identical copies of that book (although libraries may delegate that task to a technical service provider such as HathiTrust)." (p. 15)It's one thing to claim innocence, but another to point the finger at others.
Omissions
There are a few glaring omissions from the document, some of which would weaken Google's case.
There is no mention of the computational uses that can be made of the digital corpus, something that was a strong focus in the failed settlement between Google and the authors and publishers. I have no doubts that Google is currently engaged in research using this corpus -- I don't see how they could resist doing so. They do mention the "n-gram" feature briefly, but as this is based on what appears to be a simple use of term frequency, it may not attract the court's attention.
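For readers unfamiliar with the idea, counting n-grams amounts to little more than the following. This is a toy sketch of term frequency over word sequences, with a hypothetical `ngram_counts` function and naive whitespace tokenization; I am not claiming that this is how Google builds its n-gram data.

```python
from collections import Counter


def ngram_counts(text, n=2):
    """Count every run of n consecutive words in the text (illustrative only)."""
    words = text.lower().split()  # naive tokenization: split on whitespace
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
```

Run over a whole corpus and grouped by publication year, counts like these are enough to chart how often a phrase appears over time.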
In another omission, Google states that:
"Informed by the results of a search of that index, users can click on links in Google Books to locate a library from which to borrow those books ... " (p. 4)Google fails to state that this is not a service provided by Google but one provided by OCLC using exactly those card catalogues that Google finds so inadequate. Credit should be given where credit is due, but there is an important battle to be won.
Bottom Line
The ability to create full text searches of printed works (and other physical materials) is so important to research and learning -- and should be such an obvious modern approach to searching these materials -- that a win for Google is a win for us all. Although some aspects of this document shot arrows into my librarian-ly heart, I hope with all of that wounded heart that they prevail in this suit.
 This points to the ScribD site which unfortunately is now connected to Facebook and therefore is a huge privacy monster. The document should appear on the Public Index site shortly, with no login required.
 The term "product" could also be used to describe GBS.