Tuesday, May 31, 2011

All the ____ in the world

"All the ___ in the world"
"Every ____ ever created"
"World's largest ____ "
"Repository of all knowledge in ____"

There's something compelling about completeness, about the idea that you could gather ALL of something, anything, together into a single system or database or even, as in the ancient library of Alexandria, physical space. Perhaps it's because we want the satisfaction of being finished. Perhaps it's something primitive in our brain stems that has the evolutionary advantage of keeping us from declaring victory with a job half done. (Well, at least some of us.) To be sure, setting your goal to gather all of something means you don't have to make awkward choices about what to gather/keep and what to discard. The indiscriminate everything may be the easier target.

Worldcat has 229,322,364 bibliographic records.
OpenLibrary has over 20 million records and 1.7 million fulltext books.
LibraryThing has records for 6,102,788 unique works.
If you read one book a week for 60 years, you will have read 3,120 books. If you read one book a day for that same length of time, you will have read 21,900 (not counting leap years).
The trick, obviously, is to discover the set of books, articles, etc., that will enhance your brief time on this planet. To do this, we search in these large databases. By having such large databases to search we are increasing our odds of finding everything in the world about our topic. Of course, we probably do not want everything in the world about our topic; we want the right books (articles, etc.) for us.
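The reading arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes 52 weeks and 365 days per year, and the function name `books_read` is made up for illustration.

```python
WEEKS_PER_YEAR = 52
DAYS_PER_YEAR = 365  # leap days ignored, as in the text


def books_read(per_year: int, years: int = 60) -> int:
    """Total books read at a steady pace of `per_year` books each year."""
    return per_year * years


weekly = books_read(WEEKS_PER_YEAR)   # one book a week
daily = books_read(DAYS_PER_YEAR)     # one book a day

worldcat_records = 229_322_364  # WorldCat figure quoted above

print(weekly, daily)
# Share of the catalog that even a book-a-day reader could get through
print(f"{daily / worldcat_records:.4%}")
```

Even at a book a day, a lifetime of reading covers roughly one hundredth of one percent of the records in the largest catalog.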

There are some downsides to this everything approach, not surprisingly. The first is that any search in a large database retrieves an unwieldy, if not unusably large, set of results. For this reason, many user interfaces give us ways to reduce the set using additional searches, often in the form of facets. Yet even then one is likely to be overwhelmed.

Everything includes key works and the odd bits and pieces of dubious repute and utility. Retrieving everything places a great burden on the user to sort out the wheat from the chaff. This is especially difficult when you are investigating an area where you are not an expert. Ranking may highlight the most popular items but those may not be what you are seeking. In fact, they may be items that you have retrieved before, even multiple times, because every search begins with a tabula rasa.

Another downside is that although computers are more powerful than ever and storage space is inexpensive, these large databases tend to collapse under the demands of just a few complex queries. Because of this, what users can and cannot do is controlled by the user interface, which protects the system by steering users to safe functions. Users often can create their own lists, add tags, and even make changes to the underlying data. But they cannot reorder the retrieved set by an arbitrary data element, they cannot compare their retrieved set against items they have already saved or seen, and they cannot run analyses like topic maps on their retrieved set to better understand what is there.
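Two of the missing functions named above, reordering by an arbitrary data element and filtering out items already seen, are easy to sketch once the retrieved set is in the user's hands. The record layout (plain dictionaries with an `id` key) and the function names `unseen` and `reorder` are assumptions for illustration, not the API of any real catalog.

```python
from operator import itemgetter


def unseen(results, seen_ids):
    """Keep only records whose id is not in the user's seen/saved set."""
    return [r for r in results if r["id"] not in seen_ids]


def reorder(results, field, reverse=False):
    """Sort by any field; records that lack the field are placed last."""
    present = [r for r in results if r.get(field) is not None]
    missing = [r for r in results if r.get(field) is None]
    return sorted(present, key=itemgetter(field), reverse=reverse) + missing


# Made-up sample records standing in for a retrieved set
results = [
    {"id": "a1", "title": "Example A", "year": 1999},
    {"id": "b2", "title": "Example B", "year": 1945},
    {"id": "c3", "title": "Example C"},  # no year recorded
]
seen = {"a1"}  # ids the user has already saved or viewed

fresh = reorder(unseen(results, seen), "year")
print([r["id"] for r in fresh])
```

Neither operation is expensive on a set of a few hundred records; the cost only becomes a problem when every user runs it against the whole warehouse.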

I conclude, therefore, that what would be useful would be to treat these large databases as warehouses or raw materials, and provide software that allows users to select from these to create a personal database. This personal database software would resemble, ta da!, Vannevar Bush's Memex, a combination database and information use system. I can see it having components that are analogous to some systems we already have.
The personal database would be able to interact with the world of raw material and with other databases. I can imagine functions like: "get me all of the books and articles from this item's bibliography." Or: "compare my library to The Definitive Bibliography of [some topic]." Or: "Check my library and tell me if there are new editions to any of my books." In other words, it's not enough to search and get; in fact, searching and getting should be the least of what we are able to do.
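Requests like "compare my library to a bibliography" and "tell me if there are new editions" reduce to simple set and dictionary operations once both sides share identifiers. The sketch below assumes each work has a stable identifier and an integer edition number; the identifiers, the sample data, and the function names are all hypothetical.

```python
def compare(library: set[str], bibliography: set[str]) -> dict[str, set[str]]:
    """Split a bibliography into works already owned and works still missing."""
    return {
        "owned": library & bibliography,
        "missing": bibliography - library,
    }


def newer_editions(mine: dict[str, int], latest: dict[str, int]) -> dict[str, int]:
    """Return works for which a later edition exists than the one I hold."""
    return {
        work: latest[work]
        for work, edition in mine.items()
        if latest.get(work, edition) > edition
    }


# Made-up data: work identifier -> edition number held
my_library = {"work:1": 1, "work:2": 3, "work:3": 2}
bibliography = {"work:2", "work:3", "work:9"}
latest_known = {"work:1": 2, "work:2": 3}

report = compare(set(my_library), bibliography)
print(sorted(report["missing"]))
print(newer_editions(my_library, latest_known))
```

The hard part in practice is the assumption itself: agreeing on identifiers across catalogs so that "the same work" can be recognized at all.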

There are a whole lot of resource management functions that a student or researcher could find useful because within a selected set there is still much to discover. These smaller, personal databases should also be able to interact with each other, doing comparisons and cross-database queries. We should be able to make notes and create relationships and share them (a Memex feature). The personal database should be associated with a person, not a particular library or institution, and must work across institutions and services. I can't imagine what it must be like today to graduate and to lose not only the privileged access that members of institutions enjoy but also the entire personal space that one has created while attached to that institution.

In short, it's not about the STUFF, it's about the services. It doesn't matter how much STUFF you have; it's what people can DO with it. Verb, not noun. Quality, not quantity.

3 comments:

Joseph J. Esposito said...

This is perfectly on target. And I am pleased to learn that the digital age is 42.

Alain Pierrot said...

"Everything", but "more than someone/anybody else" should also be considered — with the same scope and ranking effects.

Thanks for — once more — a very useful and intuitive post.

Anonymous said...

But without the stuff, you can't have services. I submit they are both important.