Friday, November 28, 2008

OCLC Use Policy Details: Use and Transparency

An interesting aspect of this policy is that it is entirely about the use of WorldCat records. That may seem obvious from its title, but what I take from the policy language is that it covers all WorldCat records currently in existence, regardless of when they were created or which policy was in force when they were first used. Creating or updating a record happens at a particular time, while use is an ongoing activity. I'd like to cover some possible consequences of that.

Agreement to the policy

OCLC has stated that the Policy will go into effect in mid-February. It appears that current Members will be "grandfathered" in under the policy, their continued use of OCLC being their agreement to the terms. The Policy also covers Non-OCLC Members, who will not have made any agreement with OCLC, and I am hard pressed to understand why those organizations would abide by the terms of the Policy. 

Versioning and records already "in play"

Section E.7 says that OCLC can make changes to the policy, and that those changes will apply to use from that point on, essentially what is happening now with this Policy. Although they have agreed to place a version indication in the policy statement field in the WorldCat MARC records, I'm unclear as to what role that version would play. Instead, it seems to me that the policy implies that all WorldCat records will be covered by the current policy, whatever version that is. If this is not the case, then it isn't clear how the new policy can apply to records obtained from WorldCat before the Policy was in force. Yet this is exactly what is implied in the section on adding 996 fields on page 8 of the FAQ:
B. Retrospectively. For records that already exist in your local system, we encourage you to add the 996 field to WorldCat records transferred to others. Should you choose to use it, the field should have an explicit note like the examples below:

MARC:
996 $aOCLCWCRUP $iUse and transfer of this record is governed by the OCLC® Policy for Use and Transfer of WorldCat® Records. $uhttp://purl.org/oclc/wcrup/1.0
"Retrospectively" in this case means for records that were created before OCLC began adding 996 fields, and thus before the Policy goes into effect.

With this control over the use of all WorldCat records in existence, OCLC could become a highly disruptive force for anyone with ongoing relationships around bibliographic records. Because the policy could change again regarding records that have already been transmitted, anyone developing applications around use of WorldCat records is left with great uncertainty. Without a good survey of the OCLC record use landscape, it is hard to know how many organizations and uses could be affected, because we don't know all of the ways that organizations are transmitting, receiving and using WorldCat records. With a policy based on use, however, possessing WorldCat records is like holding a ticking time bomb, since you have no assurance that your use will be permitted in the future.
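For a library that does follow the FAQ's retrospective suggestion, adding the 996 note to WorldCat records it transfers might look something like the minimal Python sketch below. It assumes records have already been read into a simple dictionary of MARC tags to lists of fields, each field a list of (subfield code, value) pairs; that structure and the name add_policy_note are hypothetical, not the API of any MARC library. The subfield values are copied from the FAQ example, and a record that already carries an OCLCWCRUP note is returned unchanged.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified record: MARC tag -> list of fields,
# each field a list of (subfield code, value) pairs.
Subfields = List[Tuple[str, str]]
Record = Dict[str, List[Subfields]]

# Values copied from the FAQ's example 996 field.
POLICY_996: Subfields = [
    ("a", "OCLCWCRUP"),
    ("i", "Use and transfer of this record is governed by the OCLC® Policy "
          "for Use and Transfer of WorldCat® Records."),
    ("u", "http://purl.org/oclc/wcrup/1.0"),
]


def add_policy_note(record: Record) -> Record:
    """Return the record with the 996 policy note added.

    Records that already carry an OCLCWCRUP note are returned unchanged;
    otherwise a shallow copy is made so the caller's record is not modified.
    """
    for field in record.get("996", []):
        if ("a", "OCLCWCRUP") in field:
            return record
    updated = {tag: list(fields) for tag, fields in record.items()}
    updated.setdefault("996", []).append(list(POLICY_996))
    return updated


exported = add_policy_note({"245": [[("a", "Der Zauberberg")]]})
print(exported["996"][0][0])  # ('a', 'OCLCWCRUP')
```

Whether a given record should get the note at all is a separate question; as discussed below, a library's own original cataloging arguably should not carry it.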

Transparency

The "out" for all of these areas where it isn't clear what use is or is not allowed is to file a WorldCat Record Use Form with OCLC. OCLC will then determine if the use is allowed. Section E.6 says:
OCLC has the sole discretion to determine whether any Use and/or Transfer of WorldCat Records complies with this Policy.
If I were an OCLC Member organization, I would want this process to be as clearly defined and as transparent as possible, if for no other reason than to avoid any semblance of discrimination against parties making requests. For publicly funded libraries, participation in a process that even appears to some to exhibit prejudices could be a public relations disaster. The only way to demonstrate fairness is to have a process that is open and auditable. The same section says:
In the event OCLC identifies a Use and/or Transfer which does not comply with this Policy, OCLC shall notify the relevant OCLC Member(s) and/or Non-OCLC Member(s) and such parties agree to work with OCLC to resolve the noncompliance.
I would go further and ask for the development of a publicly available set of guidelines for use of the records, and a formal appeals process that has member input. 

OCLC Use Policy Details: Your Records

There has been a lot of excellent commentary about the proposed OCLC record use policy. What I want to do here is highlight a few details about the policy that I haven't seen discussed elsewhere. The first is...

Your original cataloging


There are two areas where it becomes important to identify "your records." The first is in section B.3 where "WorldCat record" is defined. In the final paragraph (top of p.2) it states:

An OCLC Member or Non-OCLC Member may Use or Transfer the following without complying with this policy: (i) a WorldCat Record designated in WorldCat as the Original Cataloging of the OCLC Member or Non-OCLC member...
In other words, your own original cataloging is not covered by this policy. That's good news, but the practical application of this may not be simple. The way to determine this is by reading the MARC 040 $a subfield, presuming that the system you used at the time set this correctly. There is also the fact that OCLC merges duplicate records, so two instances of original cataloging could become one in OCLC...
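To make the 040 $a check concrete, here is a minimal Python sketch. It assumes a record has already been parsed into a simple dictionary of MARC tags to lists of fields, each a list of (subfield code, value) pairs; that structure, the function names, and the symbol "XYZ" are all hypothetical. As noted above, the answer is only as good as what the cataloging system wrote into 040, and it cannot recover a second library's original cataloging that OCLC merged away as a duplicate.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified record: MARC tag -> list of fields,
# each field a list of (subfield code, value) pairs.
Record = Dict[str, List[List[Tuple[str, str]]]]


def original_cataloging_agency(record: Record) -> Optional[str]:
    """Return the first 040 $a (original cataloging agency), or None if there is none."""
    for field in record.get("040", []):
        for code, value in field:
            if code == "a":
                return value.strip()
    return None


def is_own_original_cataloging(record: Record, our_symbol: str) -> bool:
    """True if the 040 $a matches our OCLC symbol, compared case-insensitively.

    This trusts whatever the cataloging system recorded in 040 at the time.
    """
    agency = original_cataloging_agency(record)
    return agency is not None and agency.upper() == our_symbol.upper()


# Example with a made-up institution symbol "XYZ".
record = {"040": [[("a", "XYZ"), ("c", "XYZ")]], "245": [[("a", "Example title")]]}
print(is_own_original_cataloging(record, "xyz"))  # True
```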

Then there's the issue of how this affects down-stream users. For example, if Library A gives a copy of all of its original cataloging to Library B, and says: "no restraints on use," is Library B still held to the policy in terms of its use of WorldCat Records? According to the policy (E.5):

Regardless of the source from which WorldCat Records are received, Use and Transfer of WorldCat Records is authorized solely by OCLC pursuant to this Policy.
This seems to contradict the "your original cataloging is not covered" clause, although perhaps contract law deals with these kinds of apparent conflicts in some neat way. I would say that your original cataloging is not considered a WorldCat Record (as defined in the policy) except that the language of the exception refers to the original cataloging records as WorldCat records.

Also not clear is how this relates to the request to include the OCLC policy field in exported records. Although it isn't stated here, it would seem that original cataloging records should not contain the statement. (Those records could, however, be given a CC license by the originating library.)

Your holdings

Another key area relating to a library's own records is section D on the transfer of WorldCat Records. Section D.1.a states that libraries can transfer WorldCat records of their own holdings to other Members and Non-Members. Holdings is defined in the glossary as the OCLC institutional symbol on the record.

Section D.3 gives the logical converse of that: that to transfer WorldCat records that aren't of your own holdings, you must obtain permission from OCLC. This places restrictions on any institution that has received records from others, and could have implications for union and consortial catalogs. There isn't any mention of consortial agreements in the policy, yet many libraries already share their records in one or more such databases.
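Put together, sections B.3, D.1.a and D.3 amount to a three-way test on every record a library wants to pass along. The Python sketch below encodes my reading of that test, not anything OCLC has published. It assumes the original cataloging agency (for example the 040 $a) and the holdings symbols on the record are already known, which is exactly the information our data often lacks; the symbols "AAA", "BBB" and "CCC" are made up.

```python
from enum import Enum
from typing import Iterable


class TransferStatus(Enum):
    EXEMPT_ORIGINAL_CATALOGING = "own original cataloging: outside the policy (B.3)"
    OWN_HOLDINGS = "our symbol is on the record: may transfer (D.1.a)"
    NEEDS_PERMISSION = "someone else's record: ask OCLC first (D.3)"


def transfer_status(original_agency: str, holdings_symbols: Iterable[str],
                    our_symbol: str) -> TransferStatus:
    """Classify one record under a simplified reading of B.3, D.1.a and D.3.

    original_agency stands in for "designated in WorldCat as the Original
    Cataloging" (e.g. the 040 $a); holdings_symbols are the institution
    symbols on the record.
    """
    ours = our_symbol.upper()
    if original_agency.upper() == ours:
        return TransferStatus.EXEMPT_ORIGINAL_CATALOGING
    if ours in {symbol.upper() for symbol in holdings_symbols}:
        return TransferStatus.OWN_HOLDINGS
    return TransferStatus.NEEDS_PERMISSION


# A consortial catalog ("CCC") passing on a record cataloged by "AAA", held by "AAA" and "BBB":
print(transfer_status("AAA", ["AAA", "BBB"], "CCC").name)  # NEEDS_PERMISSION
```

The consortial case is the interesting one: a union catalog that holds none of the items itself falls straight through to the "ask OCLC" branch.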

-------

Even if we work out the conceptual issues, both of these pose some real challenges in implementation, since our bibliographic data today often defines neither the origin nor the source of the record clearly, especially data that is not transmitted in MARC format. I'm really not at all sure that we could actually do what the policy requires.

Saturday, November 22, 2008

More on Google/AAP

Here are some more bits and thoughts on the agreement between Google and the AAP.

Library Involvement

Some librarians were involved in the settlement talks. The only one I have found so far who has come out about this is Georgia Harper. The librarians were working under a non-disclosure agreement (NDA), and therefore will not be able to reveal any details of the discussions. I have heard statements from others who I believe were privy to the negotiations, and they all seem to feel that the outcome was better for libraries due to the involvement of members of our "class." (Note that Google and AAP had high-end lawyers arguing their side, and we had hard-working librarians. I don't know how many of "our" representatives were also lawyers, but you can just imagine how greatly out-gunned they were.) Unfortunately that doesn't change my mind about the bait and switch move.

Google Books as Library

Some have begun to refer to Google Books as a library. We have to do some serious thinking about what the Google Book database really is. To begin with, it's not a research collection, at least not at this point. It's really a somewhat odd, almost random bunch of book "stuff." As you know, neither Google nor the libraries are selecting particular books for digitization. This is a "mass digitization" project that starts at one end of a library and plows through blindly to the other end. Some libraries have limited Google to public domain works, so in terms of any area of study there is an artificial cut-off of knowledge. In addition, some libraries, notably the University of California, have been working with Google primarily to digitize books in their two storage facilities; that is, they have been digitizing the low-use books that were stored remotely.

So the main reason why Google Books is not a library is that it isn't what we would call a "collection." The books have not been chosen to support a particular discipline or research area. Yet it will become a de facto collection because people will begin using it for research. Thus "all human knowledge" becomes something more like the blind men and the elephant: research in online resources and research that uses print materials will get very different views of human knowledge. (This is not a new phenomenon. I wrote about this in terms of some early digital projects I was involved in.) One of the big gaps in Google Books will be current materials, those that are still in print. Google will need to convince the publishers that it can increase their revenue stream for current books in order to get them to participate.

Subscribing to Google Books: Just Say No?


Beyond the (undoubtedly hard-won by library representatives) single terminal access in each public library in the US, libraries will be asked to subscribe to the Google Book service in order to give their users access to the text of the books (not just the search capability). This is one of the more painful aspects of the agreement because it seems to ignore the public costs that went into the purchase, organization, and storage of those works by libraries. (I'm not including privately funded libraries here, but many of the participants are publicly funded.) The parallels with the OCLC mess are ironic: libraries paying for access to their own materials. So, couldn't the libraries just refuse to subscribe? Not really. Publicly funded libraries have a mission to provide access to the world's intellectual output in a way that best serves their users. When something new comes along -- films on DVD, music on CD, the Internet -- libraries must do what they can to make sure that their users are not informationally underprivileged. Google now has the largest body of digitized full text, and there will be a kind of "information arms race" as institutions work to make sure that their users can compete using these new resources.

The (Somewhat Hidden) Carrot

I can't imagine that anyone thought that libraries and Google were digitizing books primarily so that people could read what are essentially photographs of book pages on a computer screen. Google initially stated that they were only interested in searching the full text of books. While interesting in itself, keyword searching of rather poor OCR text is not a killer app. What we gain by having a large number of digitized books is a large corpus on which we can do computational research. We can experiment with ideas like: can we follow the flow of knowledge through these texts? Can we create topic maps of fields of study? Can we identify the seminal works in some area? The ability to do this research is included in the agreement (section 7.2(d), The Research Corpus). There will be two copies of this corpus allowed under the agreement, although I don't see any detail as to what the "corpus" will consist of. Will it just be a huge file of digitized books and OCR? Will it be a set of services?

I have suspected for a while that Google was already doing research on the digital files that it holds. It only makes sense. For academics in areas like statistics, computer science, and linguistics, this corpus opens up a whole range of possibilities for research; and research means grants, and grants mean jobs (or tenure, as the case may be). This will be a strong motivation for institutions to want to participate in the Google Book product. Research will NOT be limited to participants; others can request access. What I haven't yet found is anything about pricing for use of the research corpus, or whether being a participating library grants less expensive access for your institution. If the latter is the case, then one motivation for libraries to agree to allow Google to scan their books (at some continuing cost to the library) will be that it favors the institution's researchers in this new and exciting area. Full participant libraries (the ones that get to keep the digital copies of their works) can treat their own corpus as research fodder. The other costs of being a full participant are such that I'll still be surprised if any libraries go that route, but if they do I think that this "hidden carrot" will be a big part of it.

----

There's lots of good blogging going on out there on this topic. It needs a cumulative page to help people find the posts. Please tell me you have time to work on that, so I don't have to take it on! (Or that it exists already and I've missed it.) (The PureInformation Blog has a good list.)

Note: the Internet Archive/OCA may take this on. I'll post if/when they do.

Previous posts:

Friday, November 21, 2008

Fork WorldCat



Done in haste - hopefully someone can improve.

Also, as stimulus for those with better art skills:

Tuesday, November 18, 2008

Google Giveth ... and Taketh Away

Some additions, amendments.

The agreement between Google and the AAP is of great significance for libraries. It is also very long, written in "legalese", and contains conclusions of a lengthy negotiation without revealing the nature of the discussion. Given that many lawyers were involved, we may never get the back story of this historic settlement, yet it has the potential to change the landscape on rights, digitization, and libraries.

I am basing much of my analysis on the summary of the agreement produced by ARL. This unfortunately means that some errors may be introduced between their summary and my interpretation. I have gone to the original document to check some particulars, such as definitions, but much of that document goes unread for now.

Key Points

(... or, a summary of the summary)

  • The agreement is primarily about books that are presumed to be in copyright but which are no longer in print. In-print books continue to be managed directly by the rights holders, who can make agreements with Google (or anyone else) for uses of those items.

  • The agreement has some odd limitations that baffle me: it only covers books published in the US that have been registered with the Copyright Office. It does not include any books published after January 5, 2009. The settlement does, however, cover non-US books (e.g., books from Berne countries); I'm still unclear on the statement about registration for US books, but it was cited in the ARL document.

  • The agreement trades Google's liability for payment to rights holders. That is, as long as Google charges users for displays and copies, and passes 2/3 of those monies to the rights holders, Google is exempt from copyright infringement claims by rights owners. So users of the digital files will pay to keep Google legal.

  • The agreement does not answer the all-important question of whether scanning for the purposes of searching is an allowed use under copyright law.

  • The agreement flouts the concept of Fair Use by quantifying the amount of an in-copyright book that users can view for free ("20% of the text," "five adjacent pages," but not the final 5% of a fiction book, to keep the endings a surprise). The ARL document has Google saying that it will not interfere with fair use. I can't find that statement in the actual settlement. These quantities are contractual, and I'm assuming that the technology will not allow users to exercise their fair use rights, only what the contract allows.

  • Google will sell digital copies of in-copyright books to users, who will have perpetual access to the book online. Some printing will be allowed, but all printed pages will have a watermark that identifies the user. (I'm calling this "ratwear," software that rats you out.) Users will be able to make notes on the book's pages, but they will only be able to share those notes with other purchasers of the book. (Thus buying a Google book is like joining a secret reading club.) The settlement states that the watermark will identify either the user or other information "which could be used to identify the authorized user that printed the material or the access point from which the material was printed." (Agreement, p. 47)
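Because the free-preview limits are quantities rather than judgments, they are the kind of rule a system enforces mechanically. The Python sketch below is my guess at how the three numbers quoted above could be checked for one user's viewing; it is based only on the ARL summary as described here, not on the settlement text or anything Google has said about its software. The rounding choices (20% rounded down, the protected 5% rounded up) and the reading of "five adjacent pages" as "no run longer than five" are assumptions.

```python
from typing import Iterable


def preview_allowed(pages_viewed: Iterable[int], total_pages: int, is_fiction: bool) -> bool:
    """Check a set of viewed page numbers (1-based) against the quantified limits.

    Assumed rules: at most 20% of the pages, no run of more than five adjacent
    pages, and for fiction nothing from the final 5% of the book.
    """
    pages = sorted(set(pages_viewed))
    if any(p < 1 or p > total_pages for p in pages):
        return False
    # At most 20% of the book, rounded down to whole pages.
    if len(pages) > total_pages // 5:
        return False
    # Fiction: nothing from the final 5%, rounded up so at least one page is protected.
    if is_fiction:
        protected = -(-total_pages // 20)  # ceiling division
        if any(p > total_pages - protected for p in pages):
            return False
    # No run of more than five adjacent pages.
    run, previous = 0, None
    for p in pages:
        run = run + 1 if previous is not None and p == previous + 1 else 1
        if run > 5:
            return False
        previous = p
    return True


print(preview_allowed(range(1, 6), 200, False))  # True: five adjacent pages
print(preview_allowed(range(1, 7), 200, False))  # False: six adjacent pages
print(preview_allowed([195], 200, True))         # False: inside the protected ending
```

Note that nothing in such a check asks why the reader wants the pages, which is precisely the question fair use turns on.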

Key Points Relating to Libraries

This is the hard part for me. Hard in that it really hurts.

  • After digitizing books held in libraries, Google will then turn around and become a library vendor, supplying those same books back to libraries under Google's control. Each public library in the US will get a single "terminal" provided (and presumably controlled) by Google that allows users to view (but not copy and paste from) books in the Google database. Some printing is allowed, but there will be a per-page fee charged.

  • Libraries and institutions can also subscribe to all or part of the database of out of print books. Access is not perpetual, but limited to the life of the subscription.

  • There is verbiage about how users in these institutions can share their "annotations." In other words, if you take notes on your own, obviously those are yours. But if you use the capabilities of the system to make your notes in the system, you cannot share your own notes freely.

Now for the Clincher


... this is the pact with the devil.

  • A library can partner with Google for digitization of its collection and get the same release from liability that Google has. The library can keep copies of these digitized books; however, it must follow security standards set by Google and the AAP, submit its security plan for review, and allow yearly auditing. (The security measures are formidable and quite possibly not affordable for all but the wealthiest institutions. There are huge penalties, up to millions of dollars, for not getting security right.)

  • Libraries that make this pact with the devil are thereby allowed to preserve the files, print replacement copies for deteriorating books, and provide access for people with disabilities. Note that all of these uses by libraries are already allowed by copyright law.

  • The libraries that make this pact with the devil cannot let their users read the digitized books. Well, they can let them read up to five (5!) pages in any digitized book. Presumably if the library wants to provide other uses it must subscribe to Google's service. Libraries are expressly forbidden from using their copies of the books for interlibrary loan, e-reserves, or in course management systems.

... and if you refuse to negotiate with the devil...

  • Current Google library partners who do not choose to become party to this must delete all copies of digitizations of in-copyright works made by the Google project in order to obtain a release from liability. If they choose not to delete the copies, they are on their own in terms of liability for the in-copyright books that Google did digitize (and Google knows exactly which books are involved.)

  • Even if a library was only allowing Google to digitize public domain works, it must destroy all of its copies to get a release from liability, in case it misjudged the copyright status of one of those books.
In other words, this agreement is making the assumption that if anyone sues Google for copyright infringement, the library will be a party to that suit.

They say that "the devil is in the details." In this case that is not true: the devil is right up front, in the main message. That message is that Google has reached an agreement with the publishers and is selling out the libraries that it has been working with. The deal that Google and the libraries had was that in exchange for working with Google to digitize books in their collections, the libraries received a copy of the digital file. After that, it was up to the libraries to do the right thing based on their understanding of copyright law. Participating with Google has been an expensive proposition for the libraries in terms of their own staff time and in the development of digital storage facilities. Part of the appeal of working with Google was the assumption that partnering with the search giant gave the entire project clout and provided some protection for the libraries. With Google and the AAP now in cahoots, the libraries must join them or try to stand alone in an unclear legal situation, one that Google invited the libraries into in the first place.

This is classic bait and switch. And it is bait and switch with powerful commercial interests against public institutions. There is no question about it...

THIS IS EVIL

Note: I've added more comment and info in the comments area as things pop up. So read on....

Tuesday, November 11, 2008

The Importance of FRBR Expression

Most of the talk about "FRBR-ization" (a terrible misnomer, but now common terminology) is about creating clusters of records that represent the same work. In fact, I'm of the opinion that the work level is of interest only to a few (for example, literary critics) -- what most users would like to see is the expression level. The expression is also the level that is needed for the various efforts to associate copyright information with bibliographic data.

In many cases, the work and expression are one and the same because the item has only been issued in one expression. For those, the distinction isn't of consequence.

Where there is more than one expression for the work, those expressions tend to take particular forms, at least for books: new editions, mainly for non-fiction; and translations. In both of these cases, I maintain that the expression level is what users want, not the work. (Non-book experts: does this carry through to other formats?)

My usual example of a translated work is Thomas Mann's Der Zauberberg. According to the cataloging rules, the work's title is Der Zauberberg, while expressions in our libraries may have the title in the language of the translation, e.g. The magic mountain. A FRBR-based work display would be something like:

Mann, Thomas
Der Zauberberg. 1924

This would be the work entry into The magic mountain for users in English-language catalogs, and I assume that many of those users would not recognize the German language title, nor want to go through this level to reach the translated version that they seek.

WorldCat has finessed this by keeping the translations separate -- in other words, WorldCat responds to a search with FRBR expression-level records. And I think this is more user-friendly than the work-level record would be.
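One way to get the behavior I'm arguing for is to keep the work as the grouping but lead the display with an expression in the user's language whenever one exists. The Python sketch below illustrates that idea with the Mann example; the classes and the function name are hypothetical, it is not how WorldCat (or any FRBR implementation I know of) is built, and it ignores the real problem of choosing among several translations in the same language.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Expression:
    title: str
    language: str  # MARC language code, e.g. "eng", "ger"


@dataclass
class Work:
    creator: str
    title: str  # the work title under the cataloging rules, in the original language
    year: int
    expressions: List[Expression] = field(default_factory=list)


def entry_for_user(work: Work, user_language: str) -> str:
    """Lead with an expression in the user's language if there is one; otherwise show the work."""
    for expression in work.expressions:
        if expression.language == user_language:
            return f"{work.creator}\n{expression.title}"
    return f"{work.creator}\n{work.title}. {work.year}"


zauberberg = Work("Mann, Thomas", "Der Zauberberg", 1924,
                  [Expression("Der Zauberberg", "ger"),
                   Expression("The magic mountain", "eng")])
print(entry_for_user(zauberberg, "eng"))
```

An English-language catalog would then show "Mann, Thomas / The magic mountain" as the entry, while a catalog with no translation in its users' language would fall back to the work display.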

The other case, that of editions, also argues for the importance of the FRBR expression level, but the user needs may be different. Here the work level will be recognizable to the user, but the information about which is the latest edition/expression needs to be very clear so that the user does not mistakenly select an item that has been replaced or updated by a later edition. Take the Dewey decimal classification and relative index, a work with many editions: WorldCat shows a single edition on its 'work' page, listed as "Ed. 20," "1989," and I assumed that it was the latest. In fact it isn't; there is a 22nd edition from 2003. Users would only find this by going to what seems to be the expression level, where all editions are listed.

This shows how hard it is to create a single grouping for all records that serve the users' needs.
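For the edition case, the minimum a display needs is to know which edition is the latest and to say so. A small Python sketch of that, using only the two Dewey editions mentioned above, might look like the following. The Edition class and the function names are hypothetical, and reducing an edition statement to a number with a regular expression is a naive, English-only heuristic; real edition statements are far messier.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edition:
    number: int
    year: int


def parse_edition_number(statement: str) -> int:
    """Reduce an edition statement such as "Ed. 20" to 20 (naive heuristic)."""
    match = re.search(r"\d+", statement)
    if match is None:
        raise ValueError(f"no edition number in {statement!r}")
    return int(match.group())


def latest_edition(editions: List[Edition]) -> Edition:
    return max(editions, key=lambda e: (e.number, e.year))


def edition_labels(editions: List[Edition]) -> List[str]:
    """Label every edition, newest first, flagging all but the latest as superseded."""
    latest = latest_edition(editions)
    return [
        f"Ed. {e.number}, {e.year}"
        + ("" if e == latest else f" (superseded by ed. {latest.number})")
        for e in sorted(editions, key=lambda e: e.number, reverse=True)
    ]


ddc = [Edition(parse_edition_number("Ed. 20"), 1989), Edition(22, 2003)]
print(edition_labels(ddc))
# ['Ed. 22, 2003', 'Ed. 20, 1989 (superseded by ed. 22)']
```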

Meanwhile, I have another project that will be attempting to connect copyright information to bibliographic items, including linking to entries in the renewal database. Oddly enough, RDA lists the "copyright notice" element as being at the manifestation level, which seems wrong to me. Copyright is determined on the expression, at least for the two cases I have mentioned so far: each translation receives its own copyright, as does each distinct edition. That these may be republished in a variety of manifestations (hard back, paperback, large print, etc.) does not change their copyright status.

We cannot, however, link copyright information to works. There is no copyright in Der Zauberberg or in the Decimal Classification as a work; copyright will instead be on each expression. So for the purposes of linking to copyright information, it seems that we would ideally have a way to group items by expression. If not, then the only proper link would be on the manifestation, even though that means some repetition. What will make all of this difficult is that we won't often have a date that we can associate with the expression, only with the manifestation, and that isn't necessarily the copyright date. (Except when it is, of course. You librarians reading this know what I mean.)
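In data terms, what I'm describing is copyright information keyed on the expression, with the manifestation as a fallback when we can't tell which expression an item embodies. The Python sketch below shows that lookup order. The identifiers and the expression key "captive-moncrieff" are hypothetical; the renewal ID R176423 is the one the Stanford database gives for the Scott Moncrieff translation of Proust's The captive. Assigning expression keys reliably is the hard part, and nothing here solves it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Manifestation:
    identifier: str                # hypothetical local identifier
    expression_key: Optional[str]  # None when we can't tell which expression it embodies


def group_by_expression(items: List[Manifestation]) -> Dict[Optional[str], List[Manifestation]]:
    groups: Dict[Optional[str], List[Manifestation]] = defaultdict(list)
    for item in items:
        groups[item.expression_key].append(item)
    return dict(groups)


def copyright_link(item: Manifestation,
                   by_expression: Dict[str, str],
                   by_manifestation: Dict[str, str]) -> Optional[str]:
    """Prefer copyright information linked to the expression; otherwise fall back
    to a link made directly on the manifestation (with the repetition that implies)."""
    if item.expression_key is not None and item.expression_key in by_expression:
        return by_expression[item.expression_key]
    return by_manifestation.get(item.identifier)


# Two printings of the same translation, plus one item we can't place.
items = [Manifestation("m1", "captive-moncrieff"),
         Manifestation("m2", "captive-moncrieff"),
         Manifestation("m3", None)]
renewals = {"captive-moncrieff": "R176423"}
print(len(group_by_expression(items)["captive-moncrieff"]))  # 2
print(copyright_link(items[1], renewals, {}))                # R176423
print(copyright_link(items[2], renewals, {}))                # None
```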

It still baffles me that we don't include a transcription of the copyright statement on the book or item when we create library bibliographic data, considering how useful that could be. Yet, when I proposed the copyright statement field for the MARC record there was great opposition. Some things I just don't get.

Monday, November 03, 2008

Determining Copyright Status

Among the many interesting bits in the Google/AAP agreement is Section E which essentially lays out in detail what steps Google must take to determine if an item is or is not in the public domain. As we know, this is not easy. The agreement states that two people must view the title page of the work (yes, it says "two people") to determine if the item has a copyright notice, and to check the place of publication. To determine if copyright has been renewed, "Google shall search either the United States Copyright Renewal Records or a copy thereof." If a renewal record isn't found, and the work has a copyright date before 1964, then it is presumed to be in the public domain.
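Reduced to its decision rule, the renewal step is simple; the difficulty is all in the inputs, as the searches below show. Here is a minimal Python sketch of the rule as I've summarized it, plus the two-reviewer requirement. The class, field and function names are hypothetical, returning None for "needs a closer look" is my own framing, and the other checks in Section E (what the presence or absence of a notice and the place of publication imply) are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TitlePageReview:
    reviewer: str
    has_notice: bool
    place_of_publication: str
    copyright_year: Optional[int]


def _observations(review: TitlePageReview) -> Tuple[bool, str, Optional[int]]:
    return (review.has_notice, review.place_of_publication, review.copyright_year)


def presumed_public_domain(first: TitlePageReview, second: TitlePageReview,
                           renewal_found: bool) -> Optional[bool]:
    """True/False under the renewal rule, or None when the case needs a closer look.

    Only the renewal rule is modeled: a copyright date before 1964 and no
    renewal record found means the work is presumed to be in the public domain.
    """
    if first.reviewer == second.reviewer:
        return None  # the agreement asks for two people
    if _observations(first) != _observations(second):
        return None  # the two reviewers disagree about the title page
    year = first.copyright_year
    if year is None:
        return None
    return year < 1964 and not renewal_found


# Hypothetical reviews of a book with a 1949 copyright date and no renewal found.
first = TitlePageReview("reviewer-1", True, "US", 1949)
second = TitlePageReview("reviewer-2", True, "US", 1949)
print(presumed_public_domain(first, second, renewal_found=False))  # True
```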

I decided to try this out, at least the part about checking the renewal. I did my searches in two databases: Stanford's and Rutgers'.

I happen to have a copy of Orwell's 1984 with detailed copyright notices. It lists the first copyright as 1949, by Harcourt, Brace and Jovanovich, Inc. It then says "Copyright renewed 1977 by Sonia Brownell Orwell." It also includes "Copyright 1984 by Virgin Cinema Films Limited" although I must say that I'm not sure why that latter copyright notice is in the book.

A search on '1984' in the Rutgers' database yields no hits, but using the author's name I find 37 items, of which one reads:
AUTH: George Orwell, translation: Amelie Audiberti. NM: translation.
TITL: 1984.
ODAT: 1Jul50; DREG: 7Nov77 RREG: R678090. RCLM: AFO-2377. Amelie Audiberti, nee Elisabeth Savane (A)
A search in the Stanford database gets me:
Title    1984 NM: translation
Author George Orwell, translation: Amelie Audiberti
Registration Date 1Jul50
Renewal Date 7Nov77
Registration Number AFO-2377
Renewal Id R678090
Renewing Entity Amelie Audiberti, nee Elisabeth Savane (A)
Both of these seem to be for the same item, and it's a translation of the book 1984. The renewal listed in the book for the English text is not in the databases. The instructions to Google say nothing about taking renewal dates from the book, so this one would appear to be in the public domain by the agreement's criteria.
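To give a sense of what automated matching between the two databases involves, here is a rough Python sketch that parses the two Orwell results above into dictionaries and compares their renewal IDs. The record layouts are read off the search results as displayed; the tag names (TITL, RREG, etc.) and labels are taken from those screens, not from any documented export format, and the function names are hypothetical. Matching on the renewal ID only confirms that both databases describe the same renewal, that of the French translation; it says nothing about the 1977 renewal printed in my copy.

```python
import re
from typing import Dict, Set

# Four-letter tags such as TITL, ODAT, RREG, as they appear in the Rutgers results.
RUTGERS_TAG = re.compile(r"\b([A-Z]{4}):\s*")
# Field labels as they appear in the Stanford results.
STANFORD_LABELS = ("Title", "Author", "Registration Date", "Renewal Date",
                   "Registration Number", "Renewal Id", "Renewing Entity")
RENEWAL_ID = re.compile(r"\bR\d+")


def parse_rutgers(text: str) -> Dict[str, str]:
    # With one capture group, re.split returns [prefix, tag1, value1, tag2, value2, ...].
    parts = RUTGERS_TAG.split(text)
    return {tag: value.strip(" ;\n") for tag, value in zip(parts[1::2], parts[2::2])}


def parse_stanford(text: str) -> Dict[str, str]:
    record: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        for label in STANFORD_LABELS:
            if line.startswith(label):
                record[label] = line[len(label):].strip()
                break
    return record


def renewal_ids(value: str) -> Set[str]:
    return set(RENEWAL_ID.findall(value))


rutgers = parse_rutgers(
    "AUTH: George Orwell, translation: Amelie Audiberti. NM: translation.\n"
    "TITL: 1984.\n"
    "ODAT: 1Jul50; DREG: 7Nov77 RREG: R678090. RCLM: AFO-2377. "
    "Amelie Audiberti, nee Elisabeth Savane (A)"
)
stanford = parse_stanford(
    "Title    1984 NM: translation\n"
    "Renewal Date 7Nov77\n"
    "Renewal Id R678090\n"
)
print(renewal_ids(rutgers["RREG"]) & renewal_ids(stanford["Renewal Id"]))  # {'R678090'}
```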

Picking up another book of the right age, I have Proust's "The Captive" in the Modern Library edition, the "C. K. Scott Moncrieff" translation, with "Copyright, 1929, by Random House, Inc." on the title page.

In Stanford's database I get:

Title    The captive. Translated by C. K. Scott Monorieff
Author PROUST, MARCEL
Registration Date 27Jun29
Renewal Date 7Sep56
Registration Number A9965
Renewal Id R176423
Renewing Entity Random House, Inc. (PWH)

In Rutgers I get:
CLNA: RANDOM HOUSE, INC.
TITL: The captive.
XREF: Proust, Marcel.
Unfortunately, this latter doesn't include a date, so I'm not sure that this record provides sufficient information. Fortunately, the Stanford database gives more information. Unfortunately, the Stanford record gives the title and what we librarians would call the "statement of responsibility" in the same field, and misspells the name of the translator. This may make it more difficult for any automated matching of the records. (I am assuming that Google will be doing automated matching, not hand searching of the database. That may be a mistaken assumption, especially since they have agreed that two humans will view the title page.)
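If the matching is automated, the Proust record shows two things the software has to cope with: a title field that also holds the statement of responsibility, and a misspelled name. The Python sketch below splits the title at "Translated by" and compares names with difflib.SequenceMatcher from the standard library. Both the splitting rule and the idea of a similarity threshold are hypothetical heuristics for this one example, not anything the agreement or either database prescribes.

```python
import re
from difflib import SequenceMatcher
from typing import Tuple

TRANSLATED_BY = re.compile(r"\.\s*Translated by\s+", flags=re.IGNORECASE)


def split_title_field(field: str) -> Tuple[str, str]:
    """Split a combined title field into (title, translator) at a 'Translated by' phrase."""
    match = TRANSLATED_BY.search(field)
    if match is None:
        return field.strip().rstrip("."), ""
    return field[:match.start()].strip(), field[match.end():].strip().rstrip(".")


def _letters_only(name: str) -> str:
    return re.sub(r"[^a-z]", "", name.lower())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1 after dropping case, spaces and punctuation."""
    return SequenceMatcher(None, _letters_only(a), _letters_only(b)).ratio()


title, translator = split_title_field("The captive. Translated by C. K. Scott Monorieff")
print(title.lower() == "the captive")                              # True
print(name_similarity(translator, "C. K. Scott Moncrieff") > 0.9)  # True
```

Any threshold loose enough to accept "Monorieff" will sooner or later accept a genuinely different name, which is one more reason to doubt that a one-sentence search instruction is enough.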

This next (and last) one is an especially interesting case. I have a copy of Rebecca West's "Black Lamb and Grey Falcon: A Journey through Yugoslavia" printed by Penguin books in 1994. It gives the copyright date as "1940, 1941" and the renewal date as "1968, 1969", both under the name of Rebecca West.

A search on the title in Rutgers' database gets me these three records:

CLNA: WEST, ROBERT.
TITL: Black lamb and grey falcon. (In Atlantic monthly, Feb.-May 1941)
ODAT: 21Jan41 OREG: B482882; 19Feb41 RREG: Rebecca West ; 12Aug68; R441634-441631.

CLNA: WEST, REBECCA.
TITL: Black lamb and grey falcon; a journey through Yugoslavia. Pub. serially in the Atlantic monthly, Dec. 17, 1940-Apr. 17, 1941. NM: additions.
ODAT: 20Oct41; A158501 RREG: Rebecca West ; 10Jan69; R453530.

CLNA: WEST PUB. CO.
TITL: Black lamb and grey falcon. (In The Atlantic monthly, Jan. 1941)
ODAT: 20Dec40; B479489 RREG: Rebecca West ; 2Jan68; R426137.

As you can tell, some part of the book was originally published in the Atlantic Monthly as a serial. From these records it's difficult to tell exactly which issues of the monthly it was included in, and the "Claimants" are all different. In the Stanford database it's a bit clearer. There are five records; four are duplicates for the original articles in the Atlantic Monthly and one more called "Additions." Each of the four duplicate records is like this one:
Title    Black lamb and grey falcon. (In Atlantic monthly, Feb.-May 1941)
Author WEST, REBECCA.
Registration Date 21Jan41, 19Feb41,21Mar41 21Apr41
Renewal Date 12Aug68
Registration Number B482882, B488595, , B492319,, B495868
Renewal Id R441633
Renewing Entity Rebecca West (A)
I suppose that the four renewal records are one for each item in the Atlantic Monthly, but they each have the same information. Only the fifth record, the one for "additions," includes the subtitle that appears on the book. The presence of the article records is puzzling because Stanford claims to have included only records for the renewal of books. In fact, it is easy to find records for articles in the database, so it's probably best to assume that the database covers text in general.
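Even pulling the basic values out of a record like this takes some care: the registration numbers have doubled and missing separators, and the dates use a two-digit year. The Python sketch below extracts both with regular expressions. The patterns cover only the forms seen in these records (including a hyphenated number like AFO-2377 from the Orwell record) and are assumptions, not a specification of either database. The century is set explicitly because Python's datetime.strptime with %y maps 00-68 to the 2000s, so "41" would come out as 2041.

```python
import re
from datetime import date
from typing import List

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}
DATE_TOKEN = re.compile(r"(\d{1,2})([A-Z][a-z]{2})(\d{2})")
REGISTRATION = re.compile(r"\b[A-Z]{1,3}-?\d+\b")


def parse_dates(value: str, century: int = 1900) -> List[date]:
    """Parse tokens like '21Jan41' or '1Jul50', whatever separators sit between them.

    Two-digit years are placed in `century` explicitly instead of relying on %y.
    """
    return [date(century + int(yy), MONTHS[mon], int(dd))
            for dd, mon, yy in DATE_TOKEN.findall(value) if mon in MONTHS]


def parse_registration_numbers(value: str) -> List[str]:
    """Pull out numbers like B482882 or A158501, ignoring doubled or missing commas."""
    return REGISTRATION.findall(value)


print(parse_registration_numbers("B482882, B488595, , B492319,, B495868"))
# ['B482882', 'B488595', 'B492319', 'B495868']
print(parse_dates("21Jan41, 19Feb41,21Mar41 21Apr41")[-1])  # 1941-04-21
```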

Even for the human searcher, it may be difficult to connect the book and the records because there is nothing in the book itself to indicate that it was previously published in a journal. In fact, the introduction merely mentions that the book itself was first published in two volumes in 1941.

The book was published in two volumes because it is nearly 1200 pages long. The archives of the Atlantic Monthly list the four articles with this same title as 24, 24, 26, and 24 pages long, respectively, or 98 pages in all. It's hard to see how those articles, as copyrighted, could be the same as a 1200-page book. That leaves only the record labeled "Additions," which has the same subtitle as the book:

Title   Black lamb and grey falcon; a journey through Yugoslavia.
Pub. serially in the Atlantic monthly, Dec. 17, 1940-Apr. 17, 1941.
NM: additions
Author WEST, REBECCA
Registration Date 20Oct41
Renewal Date 10Jan69
Registration Number A158501
Renewal Id R453530
Renewing Entity Rebecca West (A)
Again, the title field contains quite a bit of information beyond the title, and it just isn't crystal clear to me that this record is for the book and not for the articles. If it is for the book, then the idea that 1200 pages were published serially over four journal issues is quite a stretch. Moreover, the Monthly's archive gives the issue dates as Jan., Feb., Apr., and May 1941.

Underlying the agreement's requirement that, to determine if copyright has been renewed, "Google shall search either the United States Copyright Renewal Records or a copy thereof" lies a great deal more complexity than that one sentence implies. It makes me wonder whether the AAP's negotiators are fully aware of how inaccurate the results might be. (An example: the author field in a record for an article by George Orwell reads: "Author George Orwell. U. S. ed. pub. as Shooting an elephant, 26Oct50, A49135".) If they are aware of it, then I must commend them for taking the practical path and allowing Google to make books available based on this evidence. If a copyright holder notifies Google that a book has been determined to be public domain in error, Google is obliged to change the status of the work from public domain to "in copyright," but is not held liable for infringement if the steps for determining public domain were followed and documented as laid out in the agreement.

It will be hard to detect, however, cases where Google errs on the side of copyright and lists as in copyright works that are actually in the public domain. Copyright holders can be expected to make sure that their works are properly protected, but works in the public domain have no rights holder to monitor their status, and no one is assigned to protect the public interest.

One other caveat, which appears in Section E, is:
Any determination by Google that a work is a Public Domain Book is solely for the purposes of Section 3.2(d)(v) and is not to be relied on or invoked for any other purposes, including determining whether a work is in fact in the public domain under the Copyright Act.
Basically, this means that just because Google determines that a book is in the public domain doesn't mean that's the legal status of the book. It also means that the rest of us can't use the excuse: "But Google says it's in the public domain." I have not heard whether Google will make the documentation of its copyright search available, and it's that documentation that has the real value. It's kind of like algebra: the answer is important, but what really matters is how you got the answer.

[Note: keep an eye on the Open Library and Creative Commons for some work on copyright determination that will be openly accessible.]

Google/AAP settlement

This Google/AAP settlement has hit my brain like a steel ball in a pinball machine, careening around and setting off bells and lights in all directions. In other words, where do I start?

From my reading of the FAQ (not the full 140+ page document), it seems to go like this:

Google makes a copy of a book.
Google lets people search on words in the book.
Google lets people pay to see the book, perhaps buy the book, with some money going to the rights holder.
Google manages all of this with a registry of rights.

Now, replace the word "Google" above with "Kinko's."

Next, replace the word "Google" above with "A library."

TILT! If Google is allowed to do this, shouldn't anyone be allowed to do it? Is Jeff Bezos kicking himself right now for playing by the rules? Did Google win by going ahead and doing what no one else dared to do? Can they, like Microsoft, flout the law because they can buy their way out of any legal pickle?


Ping! Next thought: we already have vendors of e-books who provide this service for libraries. They serve up digital, encoded versions of the books, not scans of pages. These digital books often have some very useful features, such as allowing the user to make notes, copy quotes of a certain length, create bookmarks, etc. The current Google Books offering is very feature-poor. Also, because it is based on scans, the text does not reflow to fit the screen. The OCR is too poor to be useful to the sight-impaired. And if they sell books, what will the format be?


TILT! Will it even be legal for a publicly funded library to provide Google Books if they aren't ADA-compliant?


Ping! This one I have to quote:

"Public libraries are eligible to receive one free Public Access Service license for a computer located on-site at each of their library buildings in the United States. Public libraries will also be able to purchase a subscription which would allow them to offer access on additional terminals within the library building and would eliminate the requirement of a per page printing fee. Higher education institutions will also be eligible to receive free Public Access Service licenses for on-site computers, the exact number of which will depend on the number of students enrolled."


TILT! Were any public libraries asked about this? Does anyone have an idea what it will cost them to 1) manage this limited access and pay-per-page printing, and 2) obtain more licenses when demand rises? Remember when public libraries had only one machine hooked up to the Internet? Is this the free taste that leads to the Google Books habit?


Ping! The e-book vendors provide only books for which they have an agreement with the publishers, so no orphan works are included. So will Google's niche consist mainly of providing access to orphan works? Or will the current e-book vendors be forced out of the market because Google's total base is larger, even though its product may be inferior?


Ping! We already have a licensor of rights, the Copyright Clearance Center, founded with the support of the very folks (the AAP) who have now agreed to create another organization, funded initially by Google and dedicated solely to licensing Google-held content.


TILT! Google Books gets its own licensing service, its own storefront... can anyone compete with that? And what happens to anything that Google doesn't have?


Ping! It looks like Google will collect fees on all books that are not in the public domain. This means that users will pay to view orphan works, even though a vast number of them are actually in the public domain. Unclaimed fees will go to pay for the licensing service. Thus, users will be paying for the service itself, and will be paying to view books they should be able to access freely and for free.


Ping! We have a copyright office run by the US government. I'm beginning to wonder what that Copyright Office does, however, since we now have two non-profit organizations in the business of managing rights, plus others getting into the game, such as OCLC with its rights assessment registry, and folks like Creative Commons. Shouldn't the Copyright Office be the go-to place to find out who owns the rights to a work? Shouldn't we be scanning the documents held by the Copyright Office that tell us who has rights? (Note: the famed renewal database is actually a scan of the INDEX to the copyright renewal documents, not the full information about renewal.) Even if we had access to every copyright registration document in the Copyright Office, would we know who owns various rights? I think not. And how much of this will change with the Google opt-in system? I get the feeling that we'll maybe resolve some small percentage of rights questions, somewhere on the order of 2-5%. And it will, in the end, all be paid for by readers, or by libraries on behalf of readers.


TILT! Rights holders can opt out of the Google Books database. If (when) Google has the monopoly on books online, opt-out will be a nifty form of censorship. Actually, censorship aimed directly at Google will be a nifty form of censorship.


GAME OVER. All your book belong to us.