Friday, February 26, 2010

Yet more OCLC

I have in hand a letter from Clifford H. Haka, Director of the Michigan State University Libraries, addressed to "ILL Partners" and dated February 24, 2010. The letter is a response to Larry Alford's document in my previous post. I will try to represent the facts he presents here as accurately as possible, and to distinguish those from my own opinions.

FACTS (from the letter)

MSU libraries chose to move their cataloging from OCLC to SkyRiver in a cost-saving effort. They expect to save about $80,000 per year. Because MSU uses OCLC for ILL, they intended to pay to have their records loaded into OCLC. The OCLC service charge list gives the price for this service as $0.23 per record.

However, when MSU requested the upload service, OCLC offered them a price of $54,000 for five months (presumably end of fiscal year?), which would amount to $74,000 per year for 26,000 records, or $2.85 per record. (Some of this would be offset by cataloging credits.)
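To make the size of the gap concrete, here is a minimal sketch of the arithmetic, using only the annual figures quoted above ($74,000, 26,000 records, $0.23 list price). The variable names are my own, and the sketch ignores the cataloging credits mentioned in the letter, since no amount is given for them.

```python
# Figures as reported in the post (taken from the MSU letter).
LIST_PRICE_PER_RECORD = 0.23   # dollars, from the OCLC service charge list
QUOTED_ANNUAL_COST = 74_000    # dollars per year, as quoted to MSU
RECORDS_PER_YEAR = 26_000      # records MSU expected to upload per year

# Effective per-record price implied by the quote.
quoted_per_record = QUOTED_ANNUAL_COST / RECORDS_PER_YEAR

# What the same upload would cost at the published list price.
list_price_total = LIST_PRICE_PER_RECORD * RECORDS_PER_YEAR

# How many times the list price the quote works out to.
markup = quoted_per_record / LIST_PRICE_PER_RECORD

print(f"Quoted price per record: ${quoted_per_record:.2f}")
print(f"Annual cost at list price: ${list_price_total:,.0f}")
print(f"Quote is roughly {markup:.0f}x the list price")
```

On these numbers the quote comes to about $2.85 per record, against roughly $5,980 per year if the same records were loaded at the list price.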

MSU has decided that they cannot afford this, and therefore will not be uploading current cataloging into OCLC. Haka says: "While we will continue with OCLC for ILL, I regret that our newer holdings will not be available for others to consult."

Now My Take

I find it astonishing that any corporation would choose to punish customers rather than to work to win them back. I also find it astonishing that OCLC is willing to keep current customers through threats and fear. Essentially, MSU is being made an example: if you move your cataloging to a competitor, we'll cut you out of OCLC services. This is a lesson for anyone else thinking of moving to SkyRiver or some other service.

As Haka points out in his letter, the OCLC database has a huge number of records that were not created through OCLC cataloging services. When the RLIN cataloging service still existed, many libraries that did their cataloging in RLIN uploaded those records to OCLC so that they could use the OCLC ILL service. They paid an amount similar to the $0.23 that Haka quoted from the current price list. This ability to upload (economically, I should add) is directly in support of the stated goal of maintaining WorldCat's value as a union catalog. The more complete the catalog, the more value it has for services like ILL, resource sharing, and collection development. Yet it is OCLC's action that is devaluing WorldCat by deliberately setting an upload price that MSU obviously cannot support economically. This tells me that the real issue is not the "value of WorldCat" but the revenue that OCLC receives from cataloging.

Business 101 would tell you that the existence of a competitor brings prices down in the sector. If you can't meet your competitor's price, then you can try to keep your customers through a superior product and better services, but for some customers price will be the main factor. If someone else can provide the same service at a better price, your customers will go there.

It seems to me, and Haka alludes to this, that OCLC's reliance on cataloging revenue may be in trouble, not just because of SkyRiver but also because of the Internet: it is now very easy for anyone to store and move metadata on the public Internet. The number of sites dedicated to the same materials that one finds in libraries is increasing rapidly. We have Amazon, Google Books, LibraryThing, Open Library, IMDB, and on and on. They all have metadata describing the things in their focus. It's not the same as library metadata, but the library catalog is no longer, and not by any means, an exclusive source of description for books, films, or music.

What OCLC has that is unique is not just the quantity of metadata but the library holdings information. And they seem to be aware of this as they load in both records and holdings from many libraries that do not do their cataloging on OCLC. OCLC's value is in the whole package, but it still relies on cataloging as its primary revenue (although shrinking as a percentage of the total income, as you can see in their annual reports).

The services, like ILL, that OCLC provides for libraries are incredibly valuable and it would be a great detriment to the library community to lose them. It does appear, however, that there has been a shift in the marketplace; a shift that has nothing to do with library loyalty to the OCLC collective, but one of changing technology and economics. OCLC is trying to push water upriver, when it should be seeking a new balance in its revenue stream. Instead, OCLC is making a real mess of its relationship with its members -- first with the horribly botched record use policy (which isn't going to solve this problem anyway), and now with acting punitively toward members who make the kinds of economic decisions that we all make every day. I believe the "collective" can be saved, but only if OCLC decides to work with, not against, its members.

More thoughts (added later)

I realize now that I have many other questions about record loading on OCLC. For example, many libraries get some of their records from their book vendors, and those do get loaded into OCLC. Is that charged as cataloging, or as record loading? Are there different fees for loading records if you are doing your cataloging on OCLC vs. if you are not? Are there "load only" libraries who load their records in order to participate in ILL and other services? If so, what are they charged for record loading?

I say this because it makes sense to me that libraries that do not do their cataloging on OCLC would be encouraged to load their records so that they can participate in other services. It also makes sense that the price for this would be commensurate with that of adding your holdings online (or maybe a bit cheaper if it's more economical for OCLC to batch load rather than provide cataloging online). In fact, what difference does it make how you get your records into OCLC? The most important thing is that your records are there as part of WorldCat.

What the MSU letter tells me is that the OCLC economics are such that cataloging on OCLC is paying for other services, like record uploads, which may be under-priced. A different upload charge for non-cataloging libraries makes sense, and if that's the case then OCLC needs to make that clear. However, it wouldn't surprise me if that made alternative cataloging services unmarketable, because as the MSU case shows, the total for cataloging elsewhere plus loading on OCLC would favor doing cataloging on OCLC. This makes perfect sense to me, but it appears that members haven't been informed of this pricing practice. Really, a little more transparency about pricing could go a long way toward avoiding situations like the MSU one.

Thursday, February 25, 2010

OCLC again

Someone slipped me Larry Alford's letter to OCLC members. This is the worst piece of "argument by innuendo" that I have ever seen. The members deserve better, much better.

I am pretty much unable to discern the message in these four pages of insinuations and scores of questions. The document is entirely devoid of facts or information. Still, I'm going to attempt to extract some sense out of it.

First, it's all about threats to WorldCat, in particular as libraries turn to other sources of bibliographic records. What these threats are should be easily quantifiable, but Alford doesn't provide us with any figures. Here's the information that is needed if one wants to make an assessment of the situation:
  1. Are member libraries adding fewer records to WorldCat? How many fewer, and what is the actual loss of revenue to OCLC? Has anyone interviewed them to ask why?
  2. Are former member libraries leaving WorldCat for other services? How many, and what is the actual loss of revenue to OCLC?
  3. What does OCLC charge for its various services? There is no information on the web site, and I've heard it said that contracts between OCLC and libraries are confidential. This makes it very hard to have a discussion about costs and how costs are affecting OCLC's services in the market. Alford makes reference to "alternate service providers" (*cough* SkyRiver) but makes no comparison of costs or services.

There are, of course, a number of red herrings in the text. I say "of course" because it is in the nature of this kind of emotional plea to bring up unsupported statements. As an example, he states that he has asked a series of questions, like
Should the OCLC cooperative create and support software that provides quality control and the ability to make global changes as librarians create new subject headings and revise authority records?

and ends with
I am pleased to note that the response of almost everyone to whom I have posed these questions has been a universal and enthusiastic "yes."

But let's look at those questions. He asks about "supporting" CONSER, NACO and BIBCO without saying the nature or cost of that support. Maybe there is something to think about there. He asks if OCLC should continue maintaining the Dewey classification. Well, what does it cost OCLC, and what revenue does it bring in? And would there be another venue for the community to maintain DDC if members decide that it's not a good activity for OCLC?

He also asks, rhetorically, whether it is better to have a single database for bibliographic and holdings information or
... is it preferable to sequentially search dozens or even hundreds of catalogs around the world to try to find that particular book or article that a researcher needs?

He should know that there are other options, but this document is not about facts but persuasion.

Often I am unclear about what he is alluding to. On page three he says that there are libraries who are doing their cataloging elsewhere but "still want to participate in the resource sharing made possible by WorldCat." I don't know what resource sharing he means, but as far as I know anything beyond a search in the open WorldCat database is done for a fee. Is he complaining that some libraries do not contribute records to WorldCat but subscribe to other services? That sounds like a revenue stream to me. He refers to these libraries as consuming more value than they return, but I don't know what the unit of the "value" is. As a matter of fact, throughout the document there are references to value that sometimes seem to be about OCLC's revenue, and at other times seem to be about the completeness of WorldCat. Mixing these two up in the discussion is not helpful, not at all.

The purpose of the mailing that this document was attached to was to let OCLC members know that a new, revised policy will soon be sent to OCLC's Council and Board of Trustees, and eventually to all members. If the policy was developed in the same kind of information vacuum that this document exhibits, I have little hope that it will be any better than the original policy that began this round of member dissatisfaction.

Monday, February 22, 2010

Shameless Self-Promotion

The American Library Association has published two reports that I prepared on metadata and the semantic web.

Report 1 is called: Understanding the Semantic Web: Bibliographic Data and Metadata. This is a broad overview of new concepts, aimed especially at those who are new to the semantic web and to web-based metadata. (Note: To understand the diagrams, you will need a copy of the Errata page, since a key set of the diagrams was borked.)

Report 2 has the catchy title of: RDA Vocabularies for a Twenty-First-Century Data Environment. This builds on the first report, and gives more information about building semantic web vocabularies. This report is for all of you who are wondering what on earth it is that Diane Hillmann and I keep going on about when we talk about registering RDA for semantic web use. It is not overly technical so anyone who reads through Report 1 should be able to understand the general direction that we are advocating.

Feel free to ask questions, make comments, argue with me, or tell me why I'm wrong. I don't claim to have the final answers, and want very much to have a dialog about these concepts that will lead us to a new interesting place to be in library technology.

Sunday, February 21, 2010

Trust and the Settlement

In the week leading up to the hearing (Feb. 18, 2010) in New York in Judge Chin's court on the proposed settlement between the AAP/AG and Google, many parties weighed in with formal documents as well as informal ones. While few if any of these produced new information for the judge, they do reveal the different points of view of the parties involved.

One of these revelatory pieces is a blog post by the University of California's Ivy Anderson. Anderson has been involved in the negotiations with Google probably from the very beginning of UC's involvement. Her post attempts to counter the criticism of the settlement as well as many fears that have been expressed, with an emphasis on academia and academic libraries. For example, Anderson cites checks and balances on pricing that should prevent price gouging, as well as the possibility for the participating libraries to negotiate prices with Google.

I find two fundamental flaws in her arguments. The first is that she speaks from the perspective of a participating library, that is, a library that is able to negotiate directly with Google because of its position as a provider of books to be scanned. I have no doubt that this is a comfortable position for UC and for the other participating libraries, but they are small in number, especially compared to the total number of libraries and institutions that will be affected by the Google Book Search product. And of course their position is diametrically opposed to that of the general public, who have no voice in any of this project.

It doesn't surprise me that Anderson and others in similar positions have positive feelings about the settlement: they have been able to negotiate with Google and to make their needs known. Undoubtedly they have received some concessions. I also have no doubt that Google has been gracious and helpful. For all of the rest of us, however, the entire process has been a black box. We are being asked to trust the participating libraries, and to trust their trust in Google, even though the needs of the participating libraries, all of whom are large research libraries, are almost certainly not the same as our own.

The second flaw that I see is Anderson's focus on Google as decision-maker. My reading of the composition of the governing body (should the settlement be approved) is that it will solely represent rights holders. It will set prices and even must approve Google's products. I find it interesting that we all (and Anderson included) tend to refer to this as the "Google settlement" -- but Google is the weak party in this particular situation. Remember that Google is the defendant, and that the mere act of settling is an admission of defeat. The libraries have hitched their wagon to the loser in this case. That can't be a good position.

I must say that I am much more afraid, if that's the right word, of the power that could be wielded by the AAP/AG should the settlement be approved. Google has many kind words to say about libraries. The AAP, however, has made it clear that they consider many library uses of materials to be infringements:
We also had significant concerns with respect to the digital copies that Google was providing to libraries. Libraries might use significant portions, or all, of the contents of books on such copies for a range of purposes that publishers would not regard as permitted by the Copyright Act, including uses in classroom, “e-reserve” access to students and faculty via institutional servers and lending digital copies to other libraries. Libraries might have raised fair use defenses in an attempt to justify such activities. We might also have been faced with sovereign immunity defenses by state institutions. In addition, we were concerned about how the libraries could maintain the security of these digital copies. Security breaches might result in broad copying, uploading, downloading, and display of copyrighted works. (Statement of Richard Sarnoff, for the AAP board, p. 3)

The interesting upshot of this entire settlement process is that by digitizing the contents of libraries and managing those digital copies through contracts, the publishers could finally get the kind of control over library uses that they would have liked to have over the paper books held in libraries. They would like to have controls over inter-library loan, classroom use, and reserves, but they cannot exercise such controls in the analog world. Publishers have argued since the very early days of digital documents that all lending of digital documents is the making of a copy, and therefore is not allowed by copyright law.

As a matter of fact, right on page one of the Plaintiff's statement for the judge, among the bullet points describing the main achievements of the settlement, is this one:
Limits library uses of digital copies of Rightsholders’ works.

Perhaps it has been naive of me to see this settlement as being about Google's commercialization of the world of books. It is possible that the more pertinent end result could be a renewed control of books and their uses by the publisher community. Attempts to modify copyright law to cover digital resources have failed, and the rights of the public in relation to those resources are as yet unclear. This has left a gap that the AAP/AG settlement exploits fully.

OK, now I'm afraid!

Friday, February 05, 2010

DOJ: "A Bridge Too Far"

How long has it been since you read something that came from a government agency and thought: "Wow! Brilliant!" Kudos to the Department of Justice for their Statement of Interest in the AAP/AG v. Google suit. Summed up, in their words:
In general, the project is a "good thing" -
Breathing life into millions of works that are now effectively dormant, allowing users to search the text of millions of books at no cost, creating a rights registry, and enhancing the accessibility of such works for the disabled and others are all worthy objectives.

However, the settlement goes beyond the original dispute, and is trying to use class action to create a new market that is unrelated to the copyright-related lawsuit -

Although the United States believes the parties have approached this effort in good faith and the ASA is more circumscribed in its sweep than the original Proposed Settlement, the ASA suffers from the same core problem as the original agreement: it is an attempt to use the class action mechanism to implement forward-looking business arrangements that go far beyond the dispute before the Court in this litigation. As a consequence, the ASA purports to grant legal rights that are difficult to square with the core principle of the Copyright Act that copyright owners generally control whether and how to exploit their works during the term of copyright. Those rights, in turn, confer significant and possibly anticompetitive advantages on a single entity – Google.

Not only that, but the DOJ seems to lend some weight to the "fair use" defense originally claimed by Google (and by the participating libraries) -
There has not been – and simply could not be – any allegation in this litigation that Google has sold full access to works for which it lacks the right to do so, or even that such activity was threatened. Indeed, selling such access would have been legally indefensible, and thus would have been at odds with Google’s entire pre-settlement book search strategy, which was premised upon staying within colorable “fair use” grounds. With very good reason, therefore, Google consciously avoided creating precisely the factual predicate that might support the settlement of book- and subscription-selling claims. The business models that the ASA authorizes therefore relate to activities in which Google never engaged or threatened to engage, and thus claims of copyright infringement that could not have been brought.

The anti-trust issues brought up by the suit are unchanged in this amended settlement agreement. This leaves the judge in an even tougher spot than he seemed to be in before: if he decides that the suit is a valid class-action then he has to address the anti-trust issues. However, I have seen no clear description anywhere of how those could be addressed, so the judge is being asked to be very clever indeed -
Finally, the United States recognizes that if, as discussed supra, class representatives lack the power under Rule 23 to grant Google the power to exploit broadly the digital rights of class members to sell books, create subscription libraries, etc., then neither the class representatives nor Google possesses the power to authorize such activity by third parties. However, if the Court determines that the class representatives possess such rights as to Google, then the Court should carefully examine whether there exists a means for rival distributors to access orphan and rights-uncertain works consistent with Rule 23.

The DOJ suggests the following:
  1. Some issues could be resolved by turning the "opt out" into "opt in" for rights holders. (That would essentially be exactly what we have today under copyright law.)
  2. A "waiting period" before Google can make use of out-of-print works, to give rights holders a chance to surface. (This option seems to contradict #1.)
  3. More effort should go into finding rights holders.
  4. A periodic reassessment of the marketplace for the out-of-print works (which, because of exposure, could have changed in market value).

The big question is: Is this the death knell for the settlement? And if so, where do we go next? I predict that if the settlement is rejected we will have orphan works legislation sooner rather than later, since this suit has clearly highlighted the need for such legislation. The copyright violation lawsuit against Google, however, remains. I fear that the settlement has poisoned the air for a fair use decision. We've seen the sausage being made, and it will be harder than ever to approach this project with an open and fair mind.

What can be done? Well, in France, when faced with a take-over of their cultural heritage by Google (their words, not mine), the government responded by giving libraries a large sum so that they can do the digitizing themselves; a kind of "by the people, for the people" digitization project. Is it too much to hope that could happen here?