Thursday, March 26, 2009

LC discovers infinity

If you were at ALA Midwinter in Denver (January 2009), you may have been in one of the meetings where the Library of Congress announced its intention to atone for the fiasco. In case you missed it: Ed Summers of LC created an online version of the Library of Congress Subject Headings authority records, re-organized as a SKOS vocabulary and available for linking on the open Web. The site had been available for about six months (beginning in May 2008) when Ed was asked by his employer to take it down on December 18, 2008. This was in spite of the fact that the data had been out there long enough to attract a number of users, and that the removal broke existing systems that had been developed around the data.

[Note: the site has since been reborn, hosted by Talis.]
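For readers who haven't seen SKOS: each subject heading becomes a concept with its own URI, and that URI is what makes the heading linkable on the open Web. The minimal Python sketch below writes one such concept out as Turtle using only the standard library. The base URI `http://example.org/lcsh/`, the identifiers, and the sample heading are hypothetical and purely illustrative, not taken from LCSH or from Ed's site; only the SKOS namespace and properties (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`) are real.

```python
from dataclasses import dataclass, field

# The SKOS namespace is real; everything under BASE is hypothetical.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
BASE = "http://example.org/lcsh/"


@dataclass
class Heading:
    ident: str                                      # hypothetical local identifier
    pref_label: str                                 # the authorized heading
    alt_labels: list = field(default_factory=list)  # "used for" references
    broader: list = field(default_factory=list)     # identifiers of broader headings


def escape(text: str) -> str:
    # Escape backslashes and double quotes for a Turtle string literal.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(h: Heading) -> str:
    """Serialize one heading as a SKOS concept in Turtle."""
    lines = [f"<{BASE}{h.ident}> a skos:Concept ;",
             f'    skos:prefLabel "{escape(h.pref_label)}"@en']
    for alt in h.alt_labels:
        lines[-1] += " ;"  # each further predicate needs a ';' on the line before
        lines.append(f'    skos:altLabel "{escape(alt)}"@en')
    for ident in h.broader:
        lines[-1] += " ;"
        lines.append(f"    skos:broader <{BASE}{ident}>")
    lines[-1] += " ."  # close the statement
    return "\n".join([SKOS_PREFIX, ""] + lines)


# Illustrative heading; labels and identifiers are made up for this sketch.
cooking = Heading("h1", "Cooking", alt_labels=["Cookery"], broader=["h2"])
print(to_turtle(cooking))
```

In a mapping like this, the "used for" references of an authority record land in `skos:altLabel` and broader-term references in `skos:broader`. Because the broader term is itself a URI, another service can follow the link instead of matching on a text string.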

The outcry in the community was strong, including a reply to Ed's blog post by Sir Web himself, Tim Berners-Lee. The Library of Congress must have been suitably embarrassed.

Hence the announcement at Midwinter that LC not only understands the value of linked open access to LCSH, but also recognizes that all of the vocabularies managed by LC -- from the name authorities to the lists of document types, languages, locations, etc. -- need to be openly available in a format suitable for inclusion in Web services. LC has created a web site to host these vocabularies. On that site they say:
Initially, within 6 to 8 weeks, the Library of Congress will release its first offering: the Library of Congress Subject Headings. This will be an almost verbatim re-release of the system and content once found at the popular prototype service.
They also say:
We aim to make resources available on this site within 6-8 weeks. Check this site regularly for more updates as we continue to develop this service!
The page is dated 1/22/09. By my calculations, 9 weeks have passed. OK, that's only one week past their stated deadline. But nothing on the page has changed, and no resources have been made available. An "almost verbatim" re-release of the prototype should not be too hard, given that Ed wrote code for it that he has made publicly available.

But even today, the promised service is 6-8 weeks away. It may stay that way for a long time. Maybe even forever.

Why does this matter? It matters because the availability of these vocabularies is essential for the library world to move forward. Some of us have been asking LC to put the vocabularies online in a machine-actionable format for a very long time. The Dublin Core community worked with LC to create a machine-actionable and URI-identified version of the MARC role terms as early as 2005. You can't find this linked from any of the MARC documentation. Some of us brought up the topic ad nauseam at MARBI meetings, but to no avail. Now LC seems to have "gotten it" conceptually, but they have yet to show us that they can deliver.

I may seem undeservedly impatient on this score, but we haven't been waiting for this for 9 weeks: we've been waiting for years. And quite honestly, this is not rocket science, nor is LC without guidance on how to manage this data. In fact, they could use the NSDL Metadata Registry or, if they insist on hosting this themselves, the Registry's source code, which is available. If LC does not prove to us soon that it can perform this necessary function, I feel that we are quite justified in going forward without them: registering the vocabularies where they can be used and managed by anyone who needs them, and proceeding with a transformation of library data that will meet 21st-century needs.

Friday, March 06, 2009

Un-uniform titles

The Open Library will soon be revealing its first attempt to bring together all of the many published books that represent the same work. It's been a fascinating exercise; sometimes very satisfying and other times terribly frustrating. If I hadn't already been convinced that we will need to change our data practices if we want to implement FRBR, this experience would have convinced me.

One of the problems that we ran into was one that Thom Hickey at OCLC had already reported in his blog post: uniform titles (MARC 240) are both necessary for the identification of works and a hindrance. The uniform title, which is being called the 'work title' in RDA (see Chapter 6), actually serves two (possibly three) different functions, and unfortunately this is not being fixed in RDA.

The first function of the work title is to bring together the different expressions of a work. This is most obvious for works that have been issued under different titles (the various Hamlets over time) and for works that have been translated (which also includes Hamlet). In these cases, the work gets a unifying 'work title,' and this work title helps create work views in bibliographic databases.
Shakespeare, William, 1564-1616
The tragedy of Hamlet Prince of Denmark, as is now acted by Her majesties Servants.

Shakespeare, William, 1564-1616
The tragicall historie of Hamlet, Prince of Denmarke.

Shakespeare, William, 1564-1616
William Shakespeare's Hamlet, Prince of Denmark.
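Here is a deliberately simplified Python sketch of how a work title lets a system gather those three manifestations. It assumes each record carries the work title 'Hamlet' in a hypothetical `work_title` field (standing in for MARC 240) and keys works on author heading plus work title. It is not the Open Library's actual algorithm, which has to cope with much messier data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    author: str                # author heading (MARC 100)
    title: str                 # transcribed title (MARC 245), shortened below
    work_title: Optional[str]  # uniform/work title (MARC 240), if recorded


def work_key(rec: Record) -> tuple:
    # Prefer the work title; fall back to the transcribed title if none exists.
    title = rec.work_title or rec.title
    return (rec.author, title.casefold())


def group_by_work(records):
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec)].append(rec)
    return dict(works)


SHAKESPEARE = "Shakespeare, William, 1564-1616"
hamlets = [
    Record(SHAKESPEARE, "The tragedy of Hamlet Prince of Denmark", "Hamlet"),
    Record(SHAKESPEARE, "The tragicall historie of Hamlet, Prince of Denmarke", "Hamlet"),
    Record(SHAKESPEARE, "William Shakespeare's Hamlet, Prince of Denmark", "Hamlet"),
]

for key, recs in group_by_work(hamlets).items():
    print(key, len(recs))  # prints ('Shakespeare, William, 1564-1616', 'hamlet') 3
```

Without the work title, the three transcribed titles would produce three different keys, which is exactly why the element is needed.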
The second function performed by the uniform title field is to give works a collective title. These are titles like 'Essays' or 'Works.' Such a title is given to a grouping of works, not to a single work. It's a kind of superset of works, and the same collective title can be given to different selections of an author's works. This uniform title does not help gather and display the FRBR work level, and it isn't useful for user displays because the grouping title is so broad and vague. It would probably be useful as a genre for retrieval, but it's not a good way to organize works. In particular, you wouldn't want to present these to users as the same work:
Bacon, Francis, 1561-1626.
The essayes or counsels, ciuill and morall, of Francis Lo. Verulam, Viscount St. Alban

Bacon, Francis, 1561-1626.
The essaies of Sr Francis Bacon knight, the Kings Atturney Generall. His Religious meditations. Places of perswasion and disswasion. Seene and allowed
These belong to the genre 'essays,' and a genre data element is commonly used as a facet in systems that support faceting. But the genre should not be confused with the work title, as it is here.
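A sketch of the alternative, again in Python with hypothetical field names: 'Essays' is recorded as a genre facet value rather than as a work title, so the two Bacon records are counted together in the facet but remain distinct works. The `genres` field, and leaving `work_title` empty for these records, are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    author: str
    title: str                        # transcribed title, shortened below
    work_title: Optional[str] = None  # a true single-work title, if any
    genres: tuple = ()                # facet values such as "Essays"


def genre_facet(records):
    # Count records per genre value, as a faceted display would.
    return Counter(g for rec in records for g in rec.genres)


def distinct_works(records):
    # Key each record on its own work title (or transcribed title),
    # so sharing a genre never merges two different works.
    return {(r.author, (r.work_title or r.title).casefold()) for r in records}


BACON = "Bacon, Francis, 1561-1626."
bacon = [
    Record(BACON, "The essayes or counsels, ciuill and morall", genres=("Essays",)),
    Record(BACON, "The essaies of Sr Francis Bacon knight", genres=("Essays",)),
]

print(genre_facet(bacon))          # Counter({'Essays': 2})
print(len(distinct_works(bacon)))  # 2
```

Had 'Essays' been used as the work key instead, both records would collapse into a single key, which is precisely the merge that shouldn't be shown to users.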

The possible third function relates to the additions to the uniform title, data that really should be handled elsewhere in the record. For example:
Hamlet. French
Hamlet. German
Hamlet. Italian
Languages and dates get all mixed in with the title of the work when the title is a 'heading' in the bibliographic record. As with the use of the uniform title for genre, systems today can provide this kind of organization, if it is desired, from data in the record, and can use it for a variety of purposes such as selection or grouping. There is absolutely no need to tack this data onto the work title now that records are no longer being placed in linear catalogs.
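To show why the additions belong in their own elements, here is a small Python sketch that pulls the language back out of headings like 'Hamlet. French'. It assumes a hypothetical, deliberately tiny list of language names and treats the last '. '-separated segment as a possible language addition. Real uniform titles also carry dates and other additions, so string parsing like this is fragile; that fragility is the argument for recording the language as separate data in the first place.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, tiny list for illustration; a real system would draw on a
# controlled language vocabulary rather than hard-coded names.
KNOWN_LANGUAGES = {"English", "French", "German", "Italian"}


@dataclass(frozen=True)
class WorkHeading:
    work_title: str
    language: Optional[str]  # kept as its own element, not part of the title


def split_heading(heading: str) -> WorkHeading:
    """Split e.g. 'Hamlet. French' into a work title and a language element."""
    title, sep, last = heading.rpartition(". ")
    candidate = last.strip().rstrip(".")
    if sep and candidate in KNOWN_LANGUAGES:
        return WorkHeading(title, candidate)
    # No recognizable language addition: the whole heading is the work title.
    return WorkHeading(heading, None)


headings = ["Hamlet. French", "Hamlet. German", "Hamlet. Italian", "Hamlet"]
parsed = [split_heading(h) for h in headings]

print({p.work_title for p in parsed})                    # {'Hamlet'}
print(sorted(p.language for p in parsed if p.language))  # ['French', 'German', 'Italian']
```

With the language held separately, grouping on `work_title` gathers all the translations under one work, and the language becomes just another facet.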

I'm aware that I haven't expounded on the uses of uniform titles in music cataloging. The uniform titles in music cataloging are fascinating constructs for the arrangement of musical pieces, but they are not work titles. I haven't had any experience trying to create work views of music; however, I'm sure that's a very interesting problem, one that I hope someone else will solve and share with the rest of us.

We need a work title if we are to follow the bibliographic concepts in FRBR. One of the big problems with the data we create today is that so many data elements are performing multiple functions that may be clear to humans but aren't coded in such a way as to be clear for machine processing. That this same mistake is being made in RDA, which is supposed to be based on FRBR, shows that we still aren't designing our data for machine processing. In this day and age, that is pretty sad.