Friday, November 23, 2012

Fair Use(-ful)

The beauty and the aggravation of Fair Use in US copyright law is that one cannot pre-define particular uses as "fair." The countries that have, instead, the legal concept of "Fair Dealing" have an enumerated set of uses that are considered fair, although there is obviously still some need for interpretation. The advantage to Fair Use is that it can be re-interpreted with the times without the need for modification of the law. As new technologies come along, such as digitization of previously analog works, courts can make a decision based on the same four factors that have been used for earlier technologies. However, until such a decision is made in a court of law, it isn't possible to be sure whether a use is fair or not.

We have recently seen a court case that decided that HathiTrust's use of digitized books to provide an index to those books is fair. There is another court case that will decide a similar question regarding Google's digitization of books for its Google Book Search. Note, however, that even if both of these are determined to be fair use, each is a particular situation in a particular context. Both organizations have developed their services in an attempt to meet what they judged to be the letter of the law, and yet there is a considerable difference in the services they provide.

HathiTrust stores copies of digitized books from the collections of member libraries. In this case, HT is not itself doing the digitization but is storing files for books mostly digitized by Google. A search in the full text database of OCR'd page images returns, for in-copyright items, the page numbers on which the terms were found, and the number of hits found on each page. There are no snippets and no view of the text unless the text itself is deemed to be out of copyright.
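
To make that result format concrete, here is a minimal sketch of a search that reports only page numbers and hit counts and never returns any of the text. It is an illustration, not HathiTrust's actual code: the function name `page_hits` and the dictionary of page texts are assumptions made up for this example.

```python
import re

def page_hits(pages, term):
    """Return {page_number: hit_count} for pages that contain the term.

    `pages` is a hypothetical mapping of page number to OCR text.
    Only numbers come back; no text is ever exposed to the searcher.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = {}
    for number, text in pages.items():
        count = len(pattern.findall(text))
        if count:
            hits[number] = count
    return hits

sample_pages = {
    1: "Iron ore and iron tools.",
    2: "Nothing relevant here.",
    3: "She picked up the iron.",
}
result = page_hits(sample_pages, "iron")
```

The searcher learns where and how often the term occurs, but nothing about what surrounds it.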

Google has a different approach. To begin with, Google has performed mass digitization of books (estimated at about 20 million) without first obtaining permission from rights holders. So the Google case includes the act of digitization, whereas the HathiTrust case begins with digital files obtained from Google; the act of digitizing was therefore not a factor in the HathiTrust case. In terms of use of the digitized works, Google also provides keyword searching of the OCR'd digital images, but takes a different approach to the results viewable by searchers. Google provides short (about 3-5 lines) snippets that show the search terms in context on a page.
Google, however, places specific restrictions to avoid letting users "game" the search to gain access to enough of the text to substitute for actually acquiring access to the book. Here is how Google describes this in its recent legal response:
"The information that appears in Google Books does not substitute for reading the book. Google displays no more than three snippets from a book in response to a search query, even if the search term appears many times in the book. ... Google also prevents users from viewing a full page, or even several contiguous snippets, by displaying only one snippet per page in response to a given search and by 'blacklisting' (i.e. making unavailable for snippet view in response to any search) at least one snippet per page and one out of ten pages in a book." p. 8
Google also exempts some types of books, like reference works, cookbooks, and poetry, from snippet display entirely.
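
The restrictions quoted above can be read as a small set of filtering rules. The sketch below is one way to express them, and it is not Google's implementation: the function name, the `(page, position)` representation of a snippet, and the idea that the blocked pages and blocked snippets arrive as ready-made sets are all assumptions for illustration. The brief does not say how the blocked items are chosen, so that step is left out.

```python
MAX_SNIPPETS_PER_QUERY = 3  # "no more than three snippets" per the brief
EXEMPT_GENRES = {"reference", "cookbook", "poetry"}  # no snippet display at all

def visible_snippets(hits, blocked_pages, blocked_snippets, genre="general"):
    """Filter search hits down to the snippets a user may see.

    `hits` is a hypothetical list of (page, position) tuples in reading order.
    `blocked_pages` and `blocked_snippets` are assumed to be precomputed sets.
    """
    if genre in EXEMPT_GENRES:
        return []
    shown = []
    pages_used = set()
    for page, position in hits:
        # Blocked material is never shown, whatever the query.
        if page in blocked_pages or (page, position) in blocked_snippets:
            continue
        # Only one snippet per page for a given search.
        if page in pages_used:
            continue
        shown.append((page, position))
        pages_used.add(page)
        if len(shown) == MAX_SNIPPETS_PER_QUERY:
            break
    return shown

hits = [(1, 0), (1, 2), (2, 1), (10, 0), (11, 3), (12, 0), (13, 1)]
shown = visible_snippets(hits, blocked_pages={10}, blocked_snippets={(2, 1)})
```

In this example seven hits are reduced to three visible snippets, and the user has no way of knowing what was on the page or in the snippets that were withheld.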

The differences in the results returned by these two services reflect the differences in their contexts and their goals. HathiTrust has member institutions and their authorized users. The collection within HathiTrust reflects the holdings of the member institutions' libraries, which means that the authorized users should have access, either in their library or through inter-library loan, to the physical book that was scanned. The HathiTrust full text search is a search on the members' "stuff." The decision to give only page numbers makes some sense in this context, although providing snippets to scholars might have been acceptable to the judge. The return of page numbers and full word counts within pages reflects, IMO, the interest in quantitative analysis of term use. It also gives scholars some idea of the weight the term has within the text.

Google's situation is different. Google has no institutions, no members, no libraries; it provides its service to the general public (at least to the US public). There is no reason to assume that all of the members of that public will have access to the hard copy of any particular digitized book. Google seems to have decided that promoting its service as having primarily a marketing function, with the snippets as "teasers," would mollify the various intellectual property owners. In its brief of November 9, Google reiterates that it does not put advertising on the Google Book Search results pages, nor does Google make any money off of its referrals to book purchasing sites.

So here are two organizations that have bent over backwards to stay within what they deemed to be the boundaries of fair use, and they have done so in significantly different ways. This means that the fair use determination of each of these could have different outcomes, and each will provide different clues as to how fair use is viewed for digitized works.

It of course bears mentioning that both of these solutions create hurdles for users. The HathiTrust user who is searching on a term that could have more than one meaning ("iron," "dive," "foot") does not have any context to help her understand if the results are relevant. The Google user, on the other hand, gets some context but cannot see all of the results and therefore does not know if there are key retrievals among those that have been blocked algorithmically. A use that is "fair" within copyright law may not seem "fair" to the user who is doing research. It makes you wonder if our idea of "fair use" couldn't be extended to be fair but also "useful."

Thursday, November 01, 2012

Turing's Cathedral, or Women Disappear

"She features significantly in computing historian George Dyson's book, Turing's Cathedral: The Origins of the Digital Universe, ISBN 978-0375422775."
From the Wikipedia article for Klara Dan von Neumann

Unfortunately, she features significantly mainly as von Neumann's wife, even though she also was "a pioneer computer programmer," as per the Wikipedia article. In fact, of the 35 women whose names are in the book's index, 24 are in the book as wives, including Klara. Klara is the only one who gets a full bio and a fair amount of ink. Much of the ink comes from her unfinished memoirs about her life as von Neumann's wife. She was also one of the primary programmers working on the ENIAC, and Dyson's book names her, along with her husband, as one of the first three programmers of the ENIAC (p. 104). Her work, however, is described as "help," one of the ways that women's activities are diminished in importance (men "do," women "help"):
"'With the help of Klari von Neumann,' says Metropolis, 'plans were revised and completed and we undertook to implement them on the ENIAC...'" p. 194
Yet she obviously provided more than "help." In fact, she invented:
"'Your code was described and was impressive,' von Neumann wrote to Klari from Los Alamos, discussing whether a routine she had developed should be coded as software or hardwired into the machine. 'They claim now, however, that making one more, 'fixed,' function table is so little work, that they want to do it. It was decided that they will build one, with the order soldered in.'" (p. 195)
Of the other women mentioned, one is a secretary, the other the manager of the cafeteria. The saddest story is that of Bernetta Miller, the fifth licensed woman pilot in the US, who was a demonstration pilot for an airplane company, volunteered for duty in WWI and was wounded, then became secretary to the director of the Institute for Advanced Study in Princeton. In the Dyson book, she is mainly remembered for her memoranda about dining room accounting, and for being fired by Oppenheimer. (pp. 91-92)

There are eight women, other than Klara, who are in the book in their professional positions. Three of them are mentioned in a single sentence as "computers," that is, people (mainly women) who did the hard math by hand before the machine computers were up to the job. (I highly recommend the books by Grier in the bibliography if you wish to learn of the sophistication of methods that were developed by the "girls.")

One woman, Mina Rees, is named twice as someone who was written to:
"... Goldstine had written to Mina Rees of the Office of Naval Research." p. 147
"... 'The best chance for a real understanding of protein chemistry lies in the x-ray diffraction field,' he wrote to Mina Rees at the Office of Naval Research." p. 229
Later there is a quote from a report that states,
"... was informed by Dr. Mina Rees and Colonel Oscar Maier, representing the Office of Naval Research, and the Air Material Command, respectively..." p. 321
In themselves these quotes are not important, but this is one of the few professional women who get mentioned in the book, and this is all that is said about her. Dr. Mina Rees was an amazing character: "She earned her doctorate in 1931 with a thesis on 'Division algebras associated with an equation whose group has four generators,' published in the American Journal of Mathematics, Vol 54 (Jan. 1932), 51-65. Her advisor was Leonard Dickson." (Wikipedia article) At the time of these references she was head of the Mathematics Department at the Office of Naval Research.

There are some other minor mentions, like one of Meg Ryan in a parenthetical sentence about a named location that was later used in a movie, and one woman mathematician who was named with two male mathematicians in a single sentence. These obviously are not major characters in the book, and the book is wide-ranging, with everyone from Aldous Huxley to George Washington, who are also not major characters.

The real mystery woman is Hedvig Selberg.
"'... says Atle Selberg, whose wife, Hedi, was hired by von Neumann on September 29, 1950, and remained with the computer project until its termination in 1958.'" p. 152
Later we get a short bio of her: born in 1919 in Transylvania, graduated with a master's degree in mathematics at the head of her class, and was the only family member to survive Auschwitz. She came to the U.S. and was hired to work on the first computer project. She seems to have worked closely with a Martin Schwarzschild on a complex model of stellar evolution that related to the radiation effects of the bomb that was being designed. Schwarzschild went on to fame, as did Selberg's husband, a mathematician. Hedvig didn't even rate an obituary in the big newspapers (nor a Wikipedia article), although she is mentioned in her husband's obit where he first marries her, then she dies (in 1995) and he remarries.
"His first wife, Hedvig Liebermann, a researcher at the institute and Princeton's Plasma Physics Laboratory, died in 1995." (NYT Aug 17, 2007)
(Note: Mina Rees did get a NY Times obit.)

Among the other striking aspects of this treatment of women (and this book isn't by any means unusual in this respect) is that women tend not to exist until they marry a man of interest, and then suddenly they appear on the scene. Men, on the other hand, have parents and educations and often interesting stories that are told in the book, both as character building and as bona fides. It is therefore a bit of a shock to learn in some aside that the wife has a PhD in "trans-sonic aerodynamics," as is the case with Kathleen Booth. (p. 133)

Admittedly the opportunities for women in science were very limited in the period being discussed in this book, the 1950s. However, the role of a historian is to go beyond the period's view of itself and tease out a deeper meaning from the privileged position of hindsight. I have read other histories of computing that also failed to notice that there were women involved in the invention of this field, but this one has come out in 2012. Really, we didn't need another book on the topic written with male blinders. What a shame.

Suggested reading:
Noble, David F. The Religion of Technology: The Divinity of Man and the Spirit of Invention. 1st ed. New York: A.A. Knopf, 1997.
Mozans, H. J. Woman in Science; with an Introductory Chapter on Woman’s Long Struggle for Things of the Mind. New York: D. Appleton and Company, 1913.
Toole, Betty A., and Ada King Lovelace. Ada, the Enchantress of Numbers: A Selection from the Letters of Lord Byron’s Daughter and Her Description of the First Computer. 1st ed. Mill Valley, Calif.: Strawberry Press, 1992.
Grier, David Alan. When Computers Were Human. Princeton: Princeton University Press, 2005.