Friday, August 24, 2007

Information architects v. librarians

I have discovered a key difference between information architects and librarians: information architects write books that you can find at a bookstore; librarians write books that you can only find (at best!) at the ALA store twice a year, assuming you attend ALA meetings.

There is meaning behind this statement beyond book distribution. It has to do with the insularity of the library world and our tendency to only speak to each other. It also reveals an underlying assumption that what we know and what we think isn't of interest to anyone outside of our profession. At least, I hope that's the reason, because another possibility, which would be even worse, would be that we don't think that anyone outside the library world is worth speaking to. That would be truly tragic.

Tuesday, August 14, 2007


It had been announced a while back that folks from a Danish standards body were proposing an ISO standard for an XML version of ISO 2709, which is the ISO standard for what we think of as MARC. I couldn't figure out at the time why an ISO standard was needed since we have MARCXML. I found the draft of the ISO standard (ISO/DIS 25577) online, and learned some important things.

To begin with, I have never seen a copy of ISO 2709, even though the standard is referenced in just about every document that relates to the MARC format. In fact, you often see references to "Z39.2, also known as ISO 2709." Z39.2 is available from the NISO web site, and is the basis for what those of us in the U.S. think of as MARC. So I assumed that ISO 2709 was essentially the same as Z39.2. It turns out that there are some differences that are evidenced in this new standard. They may just be differences in terminology, but here's what shows up in ISO 25577:
  • the "Leader" is called "record label" in ISO 2709
  • the "control fields" (those beginning with "00") are called the "identifier field" and "reference fields" in ISO 2709
  • what we call "variable" fields in Z39.2 are called "data fields" in ISO 2709
I agree that these may be minor differences, but now I have to go back and try to fix the Wikipedia article on ISO 2709. And I have no idea if there are other differences that didn't show up in this particular standards document. I am really annoyed -- no, more than annoyed -- that ISO standards are not open. (And if anyone wants to violate copyright and license and send me a copy of 2709, I will not tell anyone it was you.)

OK, over that hump: MarcXchange (ISO 25577) is an XML format for ISO 2709. MARCXML is an XML format for MARC21. The difference is that ISO 25577 is much broader than MARCXML. Tags can be anything from 001 to 999 and 00A to ZZZ. And you can have up to nine indicators on a field.

The significance? Well, since you are creating records in XML, certain limitations in the ISO 2709 format do not exist (like field lengths). And you don't have the limitations of MARC21, like limiting tags to 000-999 or having exactly two indicators on every variable field. In this schema, you could create an instance that has no indicators on some fields, and the fields that have indicators wouldn't need to have the same number of them. Think of all of those fields where both indicators have been used and you'd like to add another one. (I don't have the schema in a machine-readable format, but it looks like indicators are limited to one character. I'd love to see that changed so you could have multi-character indicators -- hey, why not?)
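
To make the difference concrete, here is a minimal sketch in Python of the two sets of rules as I have described them. It is not taken from the ISO 25577 schema itself. The function names are my own, and the tag rule (three characters from 0-9 and A-Z, excluding "000") is an assumption based on my reading of "001 to 999 and 00A to ZZZ"; the actual schema may be stricter or looser.

```python
import re
from typing import Sequence

# Assumption: a MarcXchange tag is three characters from 0-9 and A-Z,
# excluding "000". This is inferred from the description above, not
# copied from the ISO 25577 schema.
BROAD_TAG = re.compile(r"[0-9A-Z]{3}")
MARC21_TAG = re.compile(r"[0-9]{3}")
MAX_INDICATORS = 9


def is_marc21_tag(tag: str) -> bool:
    """MARC21-style rule: exactly three digits, 000-999."""
    return MARC21_TAG.fullmatch(tag) is not None


def is_marcxchange_tag(tag: str) -> bool:
    """Broader rule as described above: 001-999 and 00A-ZZZ."""
    return BROAD_TAG.fullmatch(tag) is not None and tag != "000"


def valid_indicators(indicators: Sequence[str]) -> bool:
    """Zero to nine indicators on a field, each a single character."""
    return len(indicators) <= MAX_INDICATORS and all(
        len(ind) == 1 for ind in indicators
    )
```

Under these assumptions a tag like "00A" or "ZZZ" passes the broader check but fails the MARC21 one, and a field can carry no indicators at all, or three, or nine.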

No, I'm not advocating that we drop MARC21 for MarcXchange, but could we at least brainstorm on whether MarcXchange could help us out in expanding our bibliographic record where it's needed? No, you couldn't round-trip it, but eventually we have to move forward and quit circling back. Would something like this help us out?

Thursday, August 09, 2007

Wish list: ONIX records

There are things that I wish existed, but don't, so I'm going to start posting my wishlist here, one piece at a time. Some of these things might not be possible for various reasons, and some may already exist but I'm just not aware of them (but I hope you'll clue me in). For those that could be done, let's talk about how we could make them happen.

The first one that I'm posting is a desire for a database of available ONIX records.

A few years ago I looked at some ONIX records that were being created for e-books and I have to say that they were so poor as to be almost unusable. Recently I've been reviewing some ONIX records received at the Internet Archive for the OpenLibrary project. There are only about a half dozen publishers represented there, but it's obvious to me that they are producing useful data. The basic bibliographic data is there, plus there is data that fits into the "book promotion" realm: blurbs, author bios, subject categories. This is data that is sent to online booksellers and to bookstores. It would be useful for libraries and for anyone else keeping data about books. But I don't know of anyone who is aggregating it, much less making it public.

What we need is:
  • a database that receives ONIX feeds
  • that keeps the records up to date
  • that has a Z39.50 capability and an API for retrieving data
  • that can output in a couple of different common formats
It seems that this could be a great companion to CoverThing, a project proposed by LibraryThing creator Tim Spalding (and perhaps in the works?). In any case, it's like there's a bunch of bibliographic data that is being created and then flushed down the drain. Let's find a way to save it and use it. (And I sure hope the publishers feel this way, too.)
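
As a very rough illustration of the wish list above, here is a toy sketch in Python of a store that receives records, keeps only the newest one per ISBN, and hands them back in a couple of formats. Everything in it is hypothetical: the BookRecord fields and the RecordStore class are names I made up, it does not parse real ONIX, and a real service would need persistence, a Z39.50 front end, and a proper API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BookRecord:
    """Hypothetical, heavily simplified stand-in for an ONIX record."""
    isbn: str
    title: str
    publisher: str
    blurb: str = ""
    sent_date: str = ""  # ISO date (YYYY-MM-DD), so plain string comparison is chronological


class RecordStore:
    """In-memory store keyed by ISBN; a newer record replaces an older one."""

    def __init__(self):
        self._records = {}

    def receive(self, record):
        """Keep the record only if it is newer than the one already held."""
        current = self._records.get(record.isbn)
        if current is None or record.sent_date > current.sent_date:
            self._records[record.isbn] = record
            return True
        return False

    def get(self, isbn, fmt="json"):
        """Return the record as JSON or as a one-line citation, or None if absent."""
        record = self._records.get(isbn)
        if record is None:
            return None
        if fmt == "json":
            return json.dumps(asdict(record), sort_keys=True)
        if fmt == "text":
            return f"{record.title} ({record.publisher}), ISBN {record.isbn}"
        raise ValueError(f"unsupported format: {fmt}")
```

The point of the sketch is only that the core idea is small: accept feeds, prefer the most recent record, and offer more than one output format.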

Wednesday, August 01, 2007

Deceptive Copyright Notices

I have often pointed out some of the deceptive copyright notices that libraries and archives put on materials, such as the many statements on digitized public domain materials that tell users that they cannot make copies of the digital file without the permission of the holding library. (Yes, there is debate as to whether that constitutes a license and its agreement, but let's not go there for the moment.) I also have some wonderful examples of real-life copyright notices that are questionable at best, such as this notice which appears on the back cover of a... blank book:

Now an organization called the Computer and Communications Industry Association (CCIA) has filed a complaint with the FTC stating that the NFL, NBC, DreamWorks, Harcourt, and others are misrepresenting the rights of consumers through their copyright notices. They do have some delightfully egregious examples in their document, and the web site allows you to view the video-related ones, such as the NFL's statement that any "account of the game without permission is prohibited." Wonderfully, they posted those clips on YouTube. Included in the complaint are those "FBI" warnings at the beginning of DVDs. There actually are aficionados of the FBI warning screens and their variations over time (blue phase, green phase) as well as numerous parodies like this one.

At the meeting on copyright at the University of Maryland, Fred von Lohmann of the EFF (whose talk was outstanding, and sadly is not available online even though it was webcast) showed a video with a modified FBI warning that says:

WARNING. Federal law allows citizens to reproduce, distribute, or exhibit portions of copyright motion pictures, video tapes, or video discs under certain circumstances without authorization of the copyright holder. This infringement of copyright is called "Fair use" and is allowed for purposes of criticism, news reporting, teaching, and parody.

This perfectly conveys the message that seems to be sought by this complaint, which is to point out that the truth is very different from the messages that we see every day.

The complaint talks about the "chilling effect" of the false statements about copyright. I think there's also a numbing effect -- the ridiculousness of the claims means that we just ignore them all, and leads folks to see copyright itself as ridiculous.

The complaint's "Request for Relief" is mainly a call for the FTC to make the companies stop making these false and misleading statements about user rights. Like the "punishment" meted out to the tobacco companies, the complaint also calls for the offending companies to be required to engage in some honest consumer education about copyright. (Are pigs flying yet?) There's another relief requested that I probably shouldn't point out because I suspect it's the real payoff:

Order the Rights-holder Corporations to forebear from attempting to force consumers into waiving their rights through contractual instruments, including contracts of adhesion.

The FTC action would only be against the companies named in the complaint, but if it were to become common practice, libraries and archives would be among those who have to clean up their act when it comes to statements about user rights.

So who or what is this CCIA? The list of members includes Google, Oracle, Microsoft, Sun, Fujitsu, Intuit, and many others. I admit that confuses me -- these are not the organizations that I would expect to engage in a campaign of this nature. The web site claims that the organization has existed for three decades, and it appears to be primarily a lobbying organization for "policy and legislation" on Capitol Hill. That part makes sense, but I'm baffled by their campaign for public rights. If they are trustworthy in this endeavor, I would like to see them prevail. But there's that "if" that nags at me.