Digitization and the Catalog

I have just posted the preprint of my current column for the Journal of Academic Librarianship, titled "Mass Digitization of Books." It takes about 4-6 months for the columns to be published, and as I read over this one I can see that things have already changed. For example, when I wrote the column, Google was not yet allowing the download of its public domain books.

I should have included one more very important issue in the article, but it hadn't occurred to me at the time: the effect of this mass digitization on our catalogs. The cataloging rules require that the digital copy be represented in the catalog with its own record. This means that a library that undertakes a mass digitization project on its book collection faces doubling the number of book records in its catalog. Leaving aside the issues of user display for now, and assuming that the creation of the records requires very little human intervention, we can probably still expect a significant cost in storage space (albeit cheap these days), the size of backups, the time to load and index all of those records, and general overhead in the underlying database.

This brings up the issue of creating catalog entries that represent "multiple versions," that is, having a single record that contains the information for all of the different formats in which the book is available -- regular print, e-book version, digitized copy, large print. There are good arguments both for and against, and it's a complex discussion, but I'll just say that I am convinced that we could structure our catalog records in a way that would make this work.
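
As a rough illustration of the "multiple versions" idea, here is a minimal Python sketch of one bibliographic description carrying a list of the formats in which the book is available. This is not MARC or any real cataloging format; the class names, field names, and sample values are all hypothetical and exist only to show the shape of the structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    """One format in which the book is available (hypothetical fields)."""
    format: str    # e.g. "print", "digitized copy", "large print"
    location: str  # shelf mark or URL; illustrative placeholder only


@dataclass
class CatalogRecord:
    """A single description shared by every version of the book."""
    title: str
    author: str
    versions: List[Version] = field(default_factory=list)

    def add_version(self, format: str, location: str) -> None:
        # Adding a format extends this record instead of creating a new one.
        self.versions.append(Version(format, location))

    def formats(self) -> List[str]:
        return [v.format for v in self.versions]


# One record, two versions: the catalog still holds a single entry.
record = CatalogRecord(title="Example Title", author="Example Author")
record.add_version("print", "stacks, shelf 12")
record.add_version("digitized copy", "https://example.org/scan/1")
catalog = [record]
```

Under this assumed structure, digitizing a book adds one small entry to an existing record rather than a second full record, which is the saving the single-record approach is after. How such a record should be displayed to users is a separate question that the sketch does not address.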

Section 108, oh my!

The Library of Congress Study Group on Section 108 (of Title 17, the US copyright law) has issued a "notice of a public roundtable with a request for comments" in the ever-popular Federal Register. Which we all read daily, right? (I checked - no RSS feed that I could find, thank you very much.)

I have only read through the section on Topic A (there's also a Topic B), but I don't think I can go any further. This is about the worst mish-mash I have ever seen. If this is intended to clarify things, we are in deep doo-doo. (Believe me, I'm trying hard not to sound any less professional than that.)

OK, first, Section 108 is the section of the US copyright law with exceptions for libraries. In essence, Section 108 allows libraries to make copies of items that are still under copyright in certain prescribed cases. The Library of Congress formed a group to study Section 108 and make recommendations on how to update it for the digital environment. The group has been meeting, behind closed doors, for over a year. The group consists of lawyers, librarians, publishers, and lawyers. Oh, I said that, didn't I? They have held public meetings and have issued a document outlining what they see as the issues. This most recent call is proof that the study group is getting absolutely nowhere.

The subsections of Section 108 under question in this "notice" are the two that allow copying for lending, both within the library and over interlibrary loan. Because the study group's meetings are not open to the public, and because this is a highly political issue, the notice asks many questions that are suspiciously leading but there is no clue as to WHOSE issue it is. There also isn't much to explain the assumptions about technology that are behind some of the questions, so I often find myself unable to understand WHY a certain question is being asked.

That said, here are some examples of what I think are very strange statements and questions:

  • There is a great deal of concern about users receiving a copy of an item from a library through Interlibrary Loan without going through their own library. In other words, direct user borrowing. This violates what someone sees as the "natural friction" of ILL:

    it was presumed that users had to go to their local library to make an interlibrary loan request. ... for any user electronically to request free copies from any library from their desks, that natural friction would break down, as would the balance originally struck by the provision.

    Now this is just weird. Essentially they are implying that ILL was ok, even digital delivery, as long as it was inefficient and costly. If it becomes efficient, then it's just too much, and competes with sales. (I don't really see a difference between a user sending a request to their own library for an ILL rather than directly to the lending library -- except for the cost to the local library to pass the request through. And if that becomes efficient enough, the user won't even know how many middle-men there are in her request.)

  • Question 1:

    How can copyright law better facilitate the ability of libraries and archives to make copies for users in the digital environment without unduly interfering with the interests of rightsholders?

    What? Isn't this exactly what the study group has been discussing for 18 months? Now they put out a public notice asking the rest of us to answer the question? Haven't they at least worked it out to a set of choices or options? What have they been doing?

  • Question 3 (and Question 4 is very similar):

    How prevalent is library and archives use of subsection (d) for direct copies for their own users? For interlibrary loan copies? How would usage be affected if digital reproduction and/or delivery were explicitly permitted?

    Uh, isn't this something that someone should study? I mean, this is not something you ask people's (even educated) opinions on -- you've got to get facts and figures. It would be very interesting to know how much digital copying and delivery does go on in libraries. Without that information, we're just jabbering into the wind here, aren't we?

  • Question 5:

    ... should there be any conditions on digital distribution that would prevent users from further copying or distributing the materials for downstream use?

    Well, there are conditions, and they are called copyright law. And of course they deter more than they prevent, but this really seems to be a silly question.

    Should persistent identifiers on digital copies be required?

    I wonder what they think that identifiers will accomplish? Do they see them as acting like watermarks, that would identify whose digital copy it is?

  • Question 7:

    Should subsections (d) and (e) be amended to clarify that interlibrary loan transactions of digital copies require the mediation of a library or archives on both ends, and to not permit direct electronic requests from, and/or delivery to, the user from another library or archives?

OK, I'll stop here. As I have said, these statements and questions are so odd that I have no idea what happened in that closed room but it was weird.

The keyboard

I spend a lot of time each day "working the keyboard." It's easy to take it for granted; I learned to touch type in junior high school when the ability to type with speed and accuracy was part of a common job description. Little did we know at the time that we were heading into a future when everyone typed, and that typing would no longer be considered a special skill. (Nor would it be considered something "girly".)

There has been some questioning of the keyboard in the form of criticism of the QWERTY design. I tried switching to a Dvorak keyboard for a while, but didn't have the patience to work up to an approximation of the unconscious ease with which I type today. Recent ads I've seen are touting voice recognition as the replacement for typing, but I don't want to say all of my thoughts out loud, and in most offices with open or cubicled designs voice recognition would lead to cacophony. No, I'm happy to type, I just want it to be more efficient.

What I haven't seen questioned, yet it must have occurred to someone, is why we are still typing every letter when software could fill in or complete most words for us. Remember the ads that used to be on the back of magazines: "if u cn rd ths u cn gt a gd jb"? That's how I'd like to type. Yes, I can add those into my MS Word autocorrect, and I have placed a select number of long words I hate to type into the list. But we know that our language is very predictable and we should be able to take advantage of that. There are interesting IM keyboard options like T9 Word -- although obviously, the IM vocabulary doesn't need a large dictionary behind it. Open Office tries to help out by auto-completing words as you type, but this is useless for a touch typist because you have to 1) watch the screen (I often type while staring into space) and 2) take your fingers off their normal home row positions to hit the enter key. The Open Office method might work with a re-organized keyboard with a special key that means "go for it" when the screen shows the correct word, but I still think that would be slower than touch typing.
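
To show that the "if u cn rd ths" style of typing is at least mechanically plausible, here is a minimal Python sketch that expands vowel-dropped abbreviations back into words. It is only an illustration under stated assumptions: the tiny word list, the `shortcuts` table, and the function names are all made up for this example, and the rule "the first word in the list wins" stands in for the frequency or context ranking a real system would need.

```python
VOWELS = set("aeiou")


def skeleton(word):
    """Keep the first letter, then drop every vowel after it."""
    word = word.lower()
    return word[:1] + "".join(ch for ch in word[1:] if ch not in VOWELS)


def build_index(words, shortcuts=None):
    """Map each skeleton to a word; earlier words win on collisions.

    The word list is assumed to be ordered from most to least common.
    `shortcuts` holds hand-made abbreviations that are not skeletons.
    """
    index = {}
    for word in words:
        index.setdefault(skeleton(word), word)
    index.update(shortcuts or {})
    return index


def expand(text, index):
    """Replace each typed token with its expansion, if one is known."""
    return " ".join(index.get(token, token) for token in text.split())


# Hypothetical toy dictionary, just large enough for the old magazine ad.
words = ["if", "you", "can", "read", "this", "get", "a", "good", "job"]
index = build_index(words, shortcuts={"u": "you"})
sentence = expand("if u cn rd ths u cn gt a gd jb", index)
```

With this toy dictionary, `sentence` comes out as "if you can read this you can get a good job". The weakness is obvious as soon as the dictionary grows: "rd" could just as well be "road", "ride", or "red", so a usable version would have to rank candidates by how likely they are in context, and the typist would still need some way to correct a wrong guess without looking at the screen.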

A neighbor of mine is a court reporter. She has the chorded court reporter "typewriter" which today hooks into a computer that auto-translates from the shorthand coming out of the device to words. The output isn't perfect, but it's good enough to be used in a courtroom in real time to feed the text to lawyers. That shows me that it can be done. Yes, of course, we'd all have to learn something new. But upcoming generations would benefit from a better solution to getting words onto a screen.