Sunday, December 18, 2016

Transparency of judgment

The Guardian, and others, have discovered that when querying Google for "did the Holocaust really happen", the top response is a Holocaust denier site. They mistakenly think that the solution is to lower the ranking of that site.

The real solution, however, is different. It begins with the very concept of the "top site" from searches. What does "top site" really mean? It means something like "the site most often pointed to by other sites that are most often pointed to." It means "popular" -- but by an unexamined measure. Google's algorithm doesn't distinguish fact from fiction, or scientific from nutty, or even academically viable from warm and fuzzy. Fan sites compete with the list of publications of a Nobel prize-winning physicist. Well, except that they probably don't, because it would be odd for the same search terms to pull up both, but nothing in the ranking itself makes that distinction.
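To make "pointed to by other sites that are most often pointed to" concrete, here is a minimal sketch of a link-based score in the spirit of the published PageRank idea. It is not Google's actual ranking, which is a trade secret. The site names, the link graph, the damping value of 0.85, and the function name `link_rank` are all invented for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by inbound links, weighted by the score of the linking page.

    `links` maps each page to the list of pages it links to (hypothetical data).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score


# Hypothetical link graph: nothing here records whether a page is accurate.
links = {
    "fan-site": ["popular-blog"],
    "popular-blog": ["fan-site", "denier-site"],
    "forum": ["denier-site", "popular-blog"],
    "denier-site": ["popular-blog"],
    "physics-bibliography": [],
}

scores = link_rank(links)
for page, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {value:.3f}")
```

The only input is who links to whom. A well-linked page outranks an unlinked bibliography whatever either one says, which is the sense in which the measure is "popular" but unexamined.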

The primary problem with Google's result, however, is that it hides the relationships that the algorithm itself uses in the ranking. You get something ranked #1 but you have no idea how Google arrived at that ranking; that's a trade secret. By not giving the user any information on what lies behind the ranking of that specific page, Google eliminates the user's ability to make an informed judgment about the source. This informed judgment is not only about the inherent quality of the information in the ranked site, but also about its position in the complex social interactions surrounding knowledge creation itself.

This is true not only for Holocaust denial but for every single site on the web. It is also true for every document that is on library shelves or servers. It is not sufficient to look at any cultural artifact as an isolated case, because there are no isolated cases. It is all about context, and the threads of history and thought that surround the ideas presented in the document.

There is an interesting project of the Wikimedia Foundation called "WikiCite." The goal of that project is to make sure that specific facts culled from Wikipedia into the Wikidata project all have citations that support the facts. If you've done any work on Wikipedia you know that all statements of fact in all articles must come from reliable third-party sources. These citations allow one to discover the background for the information in Wikipedia, and to use that to decide for oneself if the information in the article is reliable, and also to know what points of view are represented. A map of the data that leads to a web site's ranking on Google would serve a similar function.

Another interesting project is CiTO, the Citation Typing Ontology. This is aimed at scholarly works, and it is a vocabulary that would allow authors to do more than just cite a work: they could give a more specific meaning to the citation, such as "disputes", "extends", "gives support to". A citation index could then categorize citations, rather than just counting them, so that you could see who the deniers of the deniers are as well as the supporters. This brings us a small step, but a step, closer to a knowledge map.
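As a rough illustration of what categorizing citations rather than counting them might look like, here is a minimal sketch. The relation labels are the three quoted above; the work identifiers and the citation data are invented, and a real system would presumably record these relations with the ontology's own property identifiers rather than plain strings.

```python
from collections import Counter, defaultdict

# (citing work, relation, cited work): all entries are hypothetical.
citations = [
    ("paper-A", "disputes", "paper-X"),
    ("paper-B", "gives support to", "paper-X"),
    ("paper-C", "disputes", "paper-X"),
    ("paper-D", "extends", "paper-X"),
    ("paper-E", "disputes", "paper-A"),
]


def typed_index(citations):
    """Group citations by cited work, tallying each type of relation."""
    index = defaultdict(Counter)
    for citing, relation, cited in citations:
        index[cited][relation] += 1
    return index


index = typed_index(citations)

# A plain citation count sees four citations of paper-X and nothing more.
raw_count = sum(index["paper-X"].values())
print("paper-X raw count:", raw_count)

# The typed tally shows that half of them are disputes.
print("paper-X by type:", dict(index["paper-X"]))

# paper-A disputes paper-X and is itself disputed by paper-E.
print("paper-A by type:", dict(index["paper-A"]))
```

A raw count would make paper-X look well regarded. The typed tally shows the disagreement, and shows that one of the disputing papers is itself disputed.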

All judgments of importance or even relative position of information sources must be transparent. Anything else denies the value of careful thinking about our world. Google counts pages and pretends not to be passing judgment on information, but it operates under a false flag of neutrality that protects its bottom line. The rest of us need to do better.

Tuesday, December 13, 2016

All the (good) books

I have just re-read Ray Bradbury's Fahrenheit 451, and it was better than I had remembered. It holds up very well for a book first published in 1953. I was reading it as an example of book worship, as part of my investigation into examples of an irrational love of books. What became clear, however, is that this book does not describe an indiscriminate love, not at all.

I took note of the authors and the individual books that are actually mentioned in Fahrenheit 451. Here they are (hopefully a complete list):

Authors: Dante, Swift, Marcus Aurelius, Shakespeare, Plato, Milton, Sophocles, Thomas Hardy, Ortega y Gasset, Schweitzer, Einstein, Darwin, Gandhi, Gautama Buddha, Confucius, Thomas Love Peacock, Thomas Jefferson, Lincoln, Tom Paine, Machiavelli, Christ, Bertrand Russell.
Books: Little Black Sambo, Uncle Tom's Cabin, the Bible, Walden.
I suspect that by the criteria with which Bradbury chose his authors, he himself, merely an author of popular science fiction, would not have made his own list. Of the books, the first two were used to illustrate books that offended.
"Don't step on the toes of the dog lovers, the cat lovers, doctors, lawyers, merchants, chiefs, Mormons, Baptists, Unitarians, second-generation Chinese, Swedes, Italians, Germans, Texans, Brooklynites, Irishmen, people from Oregon or Mexico."
"Colored people don't like Little Black Sambo. Burn it. White people don't feel good about Uncle Tom's Cabin. Burn it. Someone's written a book on tobacco and cancers of the lungs? The cigarette people are weeping? Burn the book. Serenity, Montag."
The other two were examples of books that were being preserved.

Bradbury was a bit of a social curmudgeon, and in terms of books decidedly a traditionalist. He decried the dumbing down of American culture, with digests of books (perhaps prompted by Reader's Digest Condensed Books, a series that began in 1950), then "digest-digests, digest-digest-digests," then with books being reduced to one or two sentences, and television keeping people occupied but without any perceptible content. (Although he pre-invents a number of recognizable modern technologies, such as earbuds, he fails to anticipate the popularity of writers like George R. R. Martin and other writers of brick-sized tomes.)

Fahrenheit 451 is not a worship of books, but of their role in preserving a certain culture. The "book-people" who each had memorized a book or a chapter hoped to see those become the canon once the new "dark ages" had ended. This was not a preservation of all books but of a small selection of books. That is, of course, exactly what happened in the original dark ages, although the potential corpus then was much smaller: only those texts that had been carefully copied and preserved, and in small numbers, were available for distribution once printing technology became available. Those manuscripts were converted to printed texts, and the light came back on in Europe, albeit with some dark corners un-illuminated where texts had been lost.

Another interesting author on the topic of preservation, but less well-known, is Louis-Sébastien Mercier, writing in 1772 in his utopian novel of the future, Memoirs of the Year Two Thousand Five Hundred.* In his book he visits the King's Library in that year to find that there is only a small cabinet holding the entire book collection. He asks the librarians whether some great fire had destroyed the books, but they answered instead that it was a conscious selection.
"Nothing leads the mind farther astray than bad books; for the first notions being adopted without attention, the second become precipitate conclusion; and men thus go on from prejudice to prejudice, and from error to error. What remained for us to do, but to rebuild the structure of human knowledge?" (v. 2, p. 5)
The selection criteria eliminated commentaries ("works of envy or ignorance") but kept original works of discovery or philosophy. These people also saw a virtue in abridging works to save the time of the reader. Not all works that we would consider "classics" were retained:
"In the second division, appropriated to the Latin authors, I found Virgil, Pliny, and Titus Livy entire; but they had burned Lucretius, except some poetic passages, because his physics they found false, and his morals dangerous." (v. 2, p. 9)
In this case, books are selectively burned because they are considered inferior, a waste of the reader's time or tending to lead one in a less than moral direction. Although Mercier doesn't say so, he is implying a problem of information overload.

In Bradbury's book the goal was to empty the minds of the population, make them passive, not thinking. Mercier's world was gathering all of the best of human knowledge, perhaps even re-writing it, as Paul Otlet proposed. (More on him in a moment.) Mercier's year 2500 world eliminated all the works of commentary on other works, treating them like unimportant rantings on today's social networks. Bradbury also did not mention secondary sources; he names no authors of history (although we don't know how he thought of Bertrand Russell, as philosopher or also a historian) or works of literary criticism.

Both Bradbury and Mercier would be considered well-read. But we are all like the blind men and the elephant. We all operate based on the information we have. Bradbury and Mercier each had very different minds because they had been informed by what they had read. For the mind it is "you are what you see and read." Mercier could not have named Thoreau and Bradbury did not mention any French philosophers. Had they each saved a segment of the written output of history their choices would have been very different with little overlap, although they both explicitly retain Shakespeare. Their goals, however, run in parallel, and in both cases the goal is to preserve those works that merit preserving so that they can be read now and in the future.  

In another approach to culling the mass of books and other papers, Kurt Vonnegut, in his absurdist manner, addressed the problem as one of information overload:
"In the year Ten Million, according to Koradubian, there would be a tremendous house-cleaning. All records relating to the period between the death of Christ and the year One Million A.D. would be hauled to dumps and burned. This would be done, said Koradubian, because museums and archives would be crowding the living right off the earth. 
The million-year period to which the burned junk related would be summed up in history books in one sentence, according to Koradubian: Following the death of Jesus Christ, there was a period of readjustment that lasted for approximately one million years." (Sirens of Titan, p. 46)
While one hears often about a passion for books, some disciplines rely on other types of publications, such as journal articles and conference papers. The passion for books rarely includes these except occasionally by mistake, such as the bound journals that were scanned by Google in its wholesale digitization of library shelves, and the aficionados of non-books are generally limited to specific forms, such as comic books. In the late 19th and early 20th centuries, Belgian Paul Otlet, a fascinating obsessive whose lifetime and interests coincided with those of our own homegrown bibliographic obsessive, Melvil Dewey, began work leading to his creation of what was intended to be a universal bibliography that included both books and journal articles, as well as other publications. Otlet's project was aimed at all knowledge, not just that contained in books, and his organization solicited books and journals from European and North American learned societies, especially those operating in scientific areas. As befits a project with the grandiose goal of cataloging all of the world's information, Otlet named it the Mundaneum. Otlet represents another selection criterion, because his Mundaneum appears to have been limited to academic materials and serious works; at the least, there is no mention of fiction or poetry in what I have read on the topic.

Among Otlet's goals was to pull out information buried in books and bring related bits of information together. He called the result of this a Biblion. This Biblion sounds somewhat related to the abridgments and re-gatherings of information that Mercier describes in his book. It also sounds like what motivated the early encyclopedists. To Otlet, the book format was a barrier, since his goal was not the preservation of the volumes themselves but the creation of a centralized knowledge base.

So now we have a range of book preservation goals, from all the books to all the good books, and then to the useful information in books. Within the latter two we see that each selection represents a fairly limited viewpoint that would result in a loss of a large number of the books and other materials that are held in research libraries today. For those of us in libraries and archives, the need is to optimize quality without being arbitrary, and at the same time to serve a broad intellectual and creative base. We won't be as perfect as Otlet or as strict as the librarians in the year 2500, but hopefully our preservation practices will be more predictable than the individual choices made by Bradbury's "human books."

* In the original French, the title referred to the year 2440 ("L'An 2440, rêve s'il en fut jamais"). I have no idea why it was rounded up to 2500 in the English translation.

Works cited or used

Bradbury, Ray. Fahrenheit 451. New York: Ballantine, 1953.

Mercier, Louis-Sébastien. Memoirs of the Year Two Thousand Five Hundred. London: Printed for G. Robinson, 1772. (HathiTrust copy)

Vonnegut, Kurt. The Sirens of Titan. New York: Dial Press Trade Paperbacks, 2006.

Wright, Alex. Cataloging the World: Paul Otlet and the Birth of the Information Age. New York: Oxford University Press, 2014.