Google Announces Major Digitization Project

Buzz abounds about Google’s announcement of agreements with five leading research libraries to digitize their collections. Reporter Mistress Siobhan O'Neill offers this in-depth look at Google's library initiative.

For the serious or novice SCA researcher, this is incredibly exciting news. In the future, we may be able to directly access digitized images and full texts of rare materials. Until now, we could consult these only through secondary sources, personal visits, or library Resource Sharing departments, and only if the materials qualified for direct loan or for photocopying of the relevant sections.

The libraries involved in this innovative project are the New York Public Library, University of Michigan Library, Stanford University Libraries, Harvard University Library, and Oxford University’s Bodleian Library. Early reports suggested that the libraries' entire collections would be digitized, but this is not necessarily the case.

Copyright Issues Daunting but Surmountable

Initially, the goal is to digitize materials that are not covered by copyright restrictions: generally, materials published before 1923 in the U.S. or before 1920 in the U.K., with no copyright renewals. Full text and full images of these materials will be available.

The University of Michigan Library plans to digitize its entire collection of approximately seven million volumes, including both copyright-protected and public domain material. Only very limited information would be available for materials still under copyright. Depending on the material, you would get either a paragraph or two from the source along with information on how to obtain it from the institution, or a bibliographic citation and a link to the university holding the item. As copyright permission is secured, more of this material could be made available. However, with hundreds of thousands of publishers and authors involved, those arrangements will take a long time.

Sophisticated Search Features, Long-Term Horizon

A recent article in the Wall Street Journal (December 14, 2004, page D1) indicates in part:

Under the new program, consumers will be able to type key words into Google's main web search site just as they currently do. Links to portions of text from the library books will then show up in the results. The books-related results will be set apart at the top of the search-result page. When users click on a book-related result, they will see images of the relevant scanned pages with their search terms highlighted.

The article continues:

Starting [December 14, 2004], some of the books already digitized at the University of Michigan will go online. Piggybacking on a separate database, Google's service allows consumers to type in their ZIP Code and see if books that show up in their search results are owned by libraries near them.

Michael A. Keller, Stanford University's head librarian, said in a New York Times interview that "within two decades, most of the world's knowledge will be digitized and available, one hopes for free reading on the Internet, just as there is free reading in libraries today."

The University of Oxford plans to digitize more than one million of the copyright-free volumes in its Bodleian Library. The news announcement on the Library’s website states:

While the vast collections of unique, or especially rare, research materials in Oxford (manuscripts, archives, maps, and early printed books) are not included within the scope of the agreement with Google, the OULS 'Oxford Digital Library' initiative, which was launched in 2001, will continue with its in-house aim of digitizing as many as possible of the University's more 'high-value' library materials, on the basis of local demands and scholarly needs. But the ultimate objective is to ensure that these 'high-end' digital resources are made seamlessly searchable along with the many 'Google' copies of later printed materials, to provide Oxford library users with round-the-clock networked access to an electronic library of unparalleled quality and depth.

The New York Public Library is the only non-academic institution participating in this venture. Dr. Paul LeClerc, President of the NYPL, says in a press release that "the books that will be digitized in this pilot project were chosen based on three important criteria: they are all in the public domain, they are not too fragile to scan, and we know that they are of great interest to the public." The Google scanning operation will be located onsite at NYPL during the pilot project. LeClerc said the initiative is an important part of the library's mission of "making [the] collections democratically accessible to a global audience, free of charge." He added that the digitization project would have been cost-prohibitive without Google's assistance.

The project had been kept very quiet until many of the initial details could be worked out. Don’t look for a wealth of results immediately, because scanning all of the university libraries' books will take a number of years. The University of Michigan began scanning late this summer. It reports that only about 10,000 of its books have been scanned to date and that digitizing its entire collection will take approximately six years. Among the titles scanned thus far are Darwin and After Darwin by George John Romanes, originally published in 1892, and The Return of the Middle Class by John Corbin.

Advancing the State of the Art in Digitization

Larry Page, one of Google’s co-founders, is an alumnus of the University of Michigan. Page and fellow co-founder Sergey Brin both attended Stanford University as doctoral students. Page indicated that U-M was already digitizing some of its materials at a rate of about five thousand books a year. By comparison, Google’s digitization facilities will scan approximately five thousand books a day when running at full capacity. Coincidentally, while a student at U-M, Page experimented with an early digitization project using some of the library's materials.

Even current state-of-the-art commercial digitization facilities are not as gentle on fragile materials as Google's proprietary digitization process, which the company is keeping secret. Current digitization options still require books to be laid wide open, which puts strain on old spines and sewing. Google's process will take more hands-on care, yet it will still digitize materials at a much faster rate. Google plans to establish digitization facilities near each participating institution, so the materials can stay fairly close to the libraries that own them.

Libraries Worldwide Collaborate on Digital Archive

In another exciting development, this week the Library of Congress and a group of international libraries from the United States, Canada, Egypt, China and the Netherlands announced a plan to create a publicly available digital archive of one million books on the Internet. The group said it planned to have 70,000 volumes online by next April.

In a New York Times interview, Brewster Kahle, founder and president of the Internet Archive, a San Francisco-based digital library that is also trying to digitize existing print information, said, "Having the great libraries at your fingertips allows us to build on and create great works based on the work of others."

All of these efforts aim to digitize as much of the world’s accumulated knowledge as possible and to make it freely accessible to people all over the world. Such efforts could revolutionize access to resources as much as the microfilming of rare materials did in past decades.

Some people worried that mass digitization and wider availability of resources could reduce patron use of libraries, but so far this does not seem to be happening. Statistics have shown that library use actually increased with the advent of the Internet. Librarians expect to be called on even more often as these "new" patrons come to libraries looking for the physical volumes of materials they have seen online. Librarians can also apply their professional search expertise to tailor searches to patrons' needs.

Google is underwriting the entire cost of this project at no cost to the participating libraries. In return for permission to digitize the collections, Google will give each participating institution a database of its digitized holdings, which the library can then load into its own catalog.

Google’s agreements with these libraries are not exclusive, so other digitization services are free to negotiate deals of their own with these and other institutions. A great digitization race has begun, and Google has made a commanding start.


This article is Copyright © 2004 by Judith Kirk, and may not be republished in print or online without her express permission. URL (web) links to this page may, however, be forwarded freely to email lists or published on web sites. The direct link to this page is (this URL will continue to work indefinitely).