The role of the world’s leading universities in the preservation of knowledge in the digital age

Research Center for Future Information Technology (FIT) Building
Tsinghua University
Beijing, China

Wednesday, December 07, 2016

Introduction

Thank you very much, Professor Yin, for that kind introduction.

It is a great pleasure to be back in Beijing and back at Tsinghua University.

This is my eighth visit to China as president of Indiana University and my 10th overall.

I was here in 2011 for Tsinghua University’s Global Education Conference, which was part of Tsinghua’s centennial. Incidentally, Indiana University will be celebrating its bicentennial in 2020, and we are planning similar celebrations.

IU China Global Gateway Office

And of course, in 2014, I was here to dedicate the IU China Global Gateway Office, which we opened in the CERNET Tower in Tsinghua’s Science Park with the help of Tsinghua’s leaders, including Vice President and Provost Professor YUAN Si and Professor WU. The IU China Global Gateway Office now supports scholarly research and teaching, conferences and workshops, study abroad programs, distance learning initiatives, alumni engagement events, and much more. The office has welcomed nearly 1,000 attendees to 35 events, including a dozen conferences. Just this morning, the office hosted a symposium on the work of the late Elinor Ostrom, the recipient of the 2009 Nobel Prize in Economics, who was on the IU faculty for many years until her recent death. Professor Ostrom’s groundbreaking work on the governance of the commons has had a major influence around the world, including here in China, where Professor Ostrom made a number of visits. The IU China Global Gateway Office is also home to the Beijing office of IU’s Research Center for Chinese Politics and Business, led by IU Professor Joyce Man, who is also the academic director of the IU China Office.

IU and Tsinghua University: a longstanding, productive partnership

IU is also proud of its longstanding partnership with Tsinghua University.

The Indianapolis campus of Indiana University has had a partnership with Tsinghua’s Department of Automotive Engineering for nearly 20 years.

Beginning in 1997, a research partnership with Tsinghua’s Department of Automotive Engineering that focused on advanced hybrid and electric vehicles led to the establishment, in 2006, of the Transportation Active Safety Institute on our IUPUI campus in Indianapolis. TASI is a collaborative university, industry, and government consortium to facilitate research in advanced active safety systems and technologies as well as automated and autonomous vehicles. Faculty members from more than 10 departments and schools at IUPUI, IU Bloomington, and Purdue University are involved in TASI’s research activities. TASI has established academic partnerships with a number of leading academic automotive safety research centers in the U.S. and with Tsinghua. For the last 19 years, many faculty and students from Tsinghua have come to our school and research labs for short visits and extended stays to conduct collaborative research.

IU’s Kelley School of Business and Tsinghua’s School of Management are working together to establish a dual master of science degree in finance, which will further strengthen our partnership.

When I was here with a number of IU colleagues in 2014, we signed an agreement between IU’s Lilly Family School of Philanthropy—the first school of its kind in the world—and Tsinghua to help create Tsinghua’s Institute for Philanthropy, a research institute that studies the role of philanthropy and non-governmental organizations in China.

I am very pleased to be here again today to meet with the leaders, faculty, students, and staff of one of Indiana University’s strongest international academic partners.

Digitization and the preservation of knowledge

Today, I want to discuss what we are doing at Indiana University in the area of large-scale digitization and the preservation of knowledge. The knowledge produced by researchers in all fields at the world’s leading universities—whether in the sciences or the arts and humanities—plays a major role in economic competitiveness and in improving the lives of people around the world. But it is vitally important for this knowledge to be preserved so that it can be accessed, used, and improved upon.

For more than 25 centuries, the great universities of the world have always had three fundamental missions:

  • the creation of knowledge through research and innovation,
  • the dissemination of knowledge through education and learning, and
  • the preservation of knowledge.

We tend, these days, to mainly associate the first two of these missions with a university. However, the advent of the digital age, with the development of the Internet and the World Wide Web, is giving renewed and rapidly increasing focus to the importance of the third mission of a university—the preservation of knowledge. The digital age also allows us to think about it in completely new ways. And so, today, I want to focus on this mission in some detail. I will speak about a major initiative we have undertaken at Indiana University, but much of what I will talk about—particularly the importance of preservation of knowledge—is relevant to universities around the world.

Preservation was traditionally the role of libraries and museums

In the past, the preservation of knowledge—thought of in the broadest way to include not just material from books and journals, but collections of other objects—such as photos, paintings, prints, sculptures, cultural objects, sound recordings, video, film, and scientific data—was almost exclusively the responsibility of libraries and museums. It is this accumulated knowledge, in all its immensity and complexity, that provides the fundamental and essential foundation for the first two missions of a university—for research and for education. But access to this knowledge has often been place-dependent—in libraries and museums—and it has not been broadly accessible or shareable.

The Internet, the Web and digitization have changed all that. Suddenly, all knowledge, taken to include all human artifacts, even in this broad sense, is in principle digitizable in at least some form, and can be made accessible, shared, and transmitted over the Internet. It is only limited by data storage and the data transfer capacity of networks.

Some of you may recall that when I was at the Australian National University, I worked closely with many colleagues in the region to establish the Asia Pacific Advanced Network (APAN), and then led the NSF-funded TransPAC initiative that connected the United States to APAN. These high-speed networks, including, of course, CERNET, as well as their successors, now enable universities’ vast and growing digital collections to be shared in unprecedented ways by scholars around the globe.

Vast amounts of accumulated material can be digitized

Thus, vast amounts of material at Indiana University, which had been patiently accumulated and curated over decades, can, again in principle, be made instantly and inexpensively available in digital form at any time, not only to students, scholars and scientists throughout IU, but across the country and around the world. The key is the orderly digitization of this material, which can be vast in scope. The digitization and accessibility over the Internet of this type of material is now essential throughout the academy. There is no academic area, from anthropology to zoology, that has not, to greater or lesser degree, become highly digital. Data is being generated, collected, processed, analyzed, visualized, and stored in digital form. Simulations and modeling are being carried out completely digitally. And the historical and contemporary archives of nearly all areas of scholarship, certainly the main material, have been converted fully into digital form.

All of Indiana University’s library, museum, and other collections also represent the investment, over many decades, of the people of the State of Indiana, the United States government, foundations, and businesses in research and scholarship at IU. Many generous donors have also entrusted vital and irreplaceable collections to IU. And the new vast amounts of born-digital data being generated today represent their continuing investment. The digitization of these legacy collections ensures that all of this material will be made available to the broadest possible audience and that it is preserved forever. In this sense, it fully maximizes the value of all these collections to the IU community, our state, and the international research and scholarly community in the digital age.

It is also the collections of such objects, many of which will continue to evolve in size as will the scholarly dialog concerning them, that define the character, values and heritage of an institution like IU. These “assets” also provide a key element in institutional differentiation for us and they underpin some of our key academic strengths.

Digitization at Indiana University

IU has been a major national leader in large-scale and wide-ranging digitization projects for over 20 years.

  • The Variations Project in IU’s world-renowned Jacobs School of Music began in 1990, and in partnership with IBM and a number of foundations, developed an advanced digital music library to support instruction. IU’s Jacobs School of Music, incidentally, is generally regarded as the top-ranked music school in the United States. IU now has a world-renowned digital music capability that our students still use today to access over 20,000 digitized scores and audio recordings.
  • This success helped accelerate the IU Digital Library Program in the 1990s that pioneered and developed well over 50 highly unique, digitized collections that span areas from the "Chymistry of Isaac Newton" papers to the "Victorian Women Writers Project."
  • In 2004, when I was IU’s Vice President for Research and Vice President for Information Technology, I appointed a Cyberinfrastructure Research Taskforce to assess what scholars needed for data access and preservation. The report from that faculty taskforce continues to guide IU strategy to this day, and led to the establishment of the IU "Scholarly Data Archive" systems with over 42 petabytes of online storage for research data, one of the largest and most sophisticated at any university in the country.
  • In 2007, IU joined with a number of other universities in the Midwestern United States and Google to digitize millions of our book holdings as part of the Google Book Search Project.
  • And in 2008, IU co-founded—with the University of Michigan—The HathiTrust, which now has over 100 partners and is one of the most important digital libraries in existence.

IU's extensive holdings of rare audio, video and film

The digitization of text has, of course, been routine for many years.

But these collections represent only part of a huge spectrum of material from written material at one end, where the size of digitized documents might be measured in a few megabytes, to repositories of genomic or particle physics data, measured in petabytes. It also includes what are called time-based media objects—basically sound recordings, video recordings, and films, and it is the digitization of this sort of material on which I wish to concentrate. IU has an extensive range of extraordinarily rare, and in some cases, irreplaceable and unique collections in this area. These collections contain material from a wide range of areas in the humanities, the arts and music, the social sciences, and the health sciences—areas of great traditional strength at Indiana University.

A comprehensive study in 2009 estimated that there are well over half a million audio and video recordings and film reels—many of which are historically significant—on the IU Bloomington campus alone. There are another 100,000 or more objects of significance on the Indianapolis campus and on IU’s six regional campuses.

Degrading source materials

But nearly all of this vast amount of material was difficult to access. Much of it was recorded in formats that are now obsolete, and for which few playback devices remain in existence. And as is tragically too often the case, some of this material was at risk of deterioration or was already deteriorating. We had reached a point where we had to take immediate action or many of these precious objects—many potentially vital to scholarship and part of the heritage of IU—would be lost forever.

The media digitization and preservation initiative

So, in 2013, I announced that IU would establish the Indiana University Media Digitization and Preservation Initiative with total funding of $15 million over five years. The goal of this ongoing initiative is extremely ambitious—it is, in short, to digitize, preserve, and make universally available by IU’s 200th anniversary in 2020—consistent with copyright or other legal restrictions—all of the time-based media objects on all campuses of IU judged important by experts.

Digitization in process

This initiative is being carried out as part of a public/private partnership with a leading international company in the area, Memnon Archival Services, a division of Sony.

With so much to digitize, the monumental task of MDPI was divided between two teams. Memnon, with its industrial-scale digitization process, can process up to 600 recordings in a day at peak performance.

We also established a smaller IU Media Digitization Studio to work solely with problematic pieces—those that are delicate and labor intensive—to carefully capture and preserve as much as possible. The IU facility digitizes one fragile or problematic recording at a time, with one engineer working with it until it is done.
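To give a sense of scale, the short Python sketch below estimates how many working days the industrial process would need at its quoted peak rate of 600 recordings a day. The function name and the collection size passed to it are illustrative assumptions; the 500,000 figure simply rounds down the "well over half a million" estimate mentioned earlier, and real throughput will vary by format and condition.

```python
import math

PEAK_RECORDINGS_PER_DAY = 600  # peak rate quoted in the text


def days_at_peak(recordings: int, per_day: int = PEAK_RECORDINGS_PER_DAY) -> int:
    """Whole working days needed to digitize `recordings` at a constant daily rate."""
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return math.ceil(recordings / per_day)


# Illustrative input: roughly half a million recordings (an assumed round number).
estimated_days = days_at_peak(500_000)
print(estimated_days)
```

Even at peak performance this comes to well over two years of working days, which is why the one-at-a-time studio is reserved for the fragile items only.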

Early success

Earlier this year, we announced that the MDPI Initiative was well ahead of schedule and had digitized more than 100,000 audio and video recordings in just its first year.

Memnon has already digitized:

  • more than 30,000 long-playing record albums,
  • more than 14,000 digital audio tapes,
  • more than 15,000 Betacam video tapes, and
  • 10,000 CD-Rs.

It will also complete the digitization of:

  • more than 43,000 open-reel audio tapes by next fall, and
  • 44,000 78- and 45-RPM recordings by early next year, as well as tens of thousands of audiocassettes and video tapes.

And the smaller IU operation has already digitized hundreds of open-reel audio tapes, Betamax and 8mm videos, and will digitize thousands of lacquer and aluminum discs and wax cylinders in the coming years.

There is increasing interest in this area not only in academia, but commercially, as the gravity of the preservation situation with such material becomes more widely recognized. The MDPI initiative has made IU a leader in this field and has opened up many new opportunities for partnership and collaboration. In fact, we announced this summer that we were extending the use of the facilities created for the MDPI to enable Memnon to do digitization work for new clients including other universities, museums, and commercial broadcasters. This further strengthens IU’s position as a major center for high-volume media digitization and preservation work for scholarly and research collections.

While this digitization facility is impressive, it is only made possible through IU’s extensive cyberinfrastructure involving mass storage systems, data centers, and very fast networks. IU has approximately 40 petabytes of robotic tape-based storage for our Scholarly Data Archive. It is backed up in real time, as data is immediately replicated at both the Bloomington and Indianapolis Data Centers, which are 80 km apart. These are connected by a redundant 100 gigabits per second network, and even the local MDPI facility has a dedicated 10 Gbps fiber connection to the Data Center.
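The network figures above can be put in perspective with a little arithmetic. The following Python sketch computes ideal transfer times from the link speeds and archive size quoted in the text. It assumes decimal units (1 petabyte = 10^15 bytes) and ignores protocol overhead, tape latency, and contention, so actual times would be longer; the function and variable names are illustrative.

```python
PETABYTE = 1e15  # bytes, decimal units (an assumption)
TERABYTE = 1e12  # bytes, decimal units (an assumption)


def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Ideal time to move `num_bytes` over a link of `link_gbps` gigabits per second."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    bits = num_bytes * 8
    return bits / (link_gbps * 1e9)


# About 40 petabytes over the 100 Gbps link between the two data centers.
archive_days = transfer_seconds(40 * PETABYTE, 100) / 86_400

# One terabyte of newly digitized media over the facility's 10 Gbps connection.
terabyte_minutes = transfer_seconds(TERABYTE, 10) / 60

print(f"Full archive at 100 Gbps: about {archive_days:.0f} days")
print(f"1 TB at 10 Gbps: about {terabyte_minutes:.1f} minutes")
```

Under these idealized assumptions, a terabyte leaves the digitization facility in minutes, while copying the whole archive between data centers would still take more than a month. That gap is one reason continuous replication is preferable to periodic bulk copies.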

Access through IU Libraries for scholars

Transparent access to this remarkable collection of material is essential to the success of MDPI. Making it available online to current and future generations of scholars and researchers is just as important as preserving it. IU is using the Avalon Media System to make it easier for libraries and archives to provide that access. This open-source system has been co-developed by the libraries of Indiana University and Northwestern University, with support from the Institute of Museum and Library Services and the Andrew W. Mellon Foundation, and is based on widely-used digital repository technologies.

Together with existing tools, Avalon will enable members of the IU community, the public, and researchers around the world to discover, view, and listen to audio and video digitized from IU’s collections, with access granted based on the rights status of the recordings.

Curated collections available to scholars

Within IU, this initiative will be of great importance to all of our campuses and the many schools at IU by providing them with immediate Internet access to large amounts of material that is now almost inaccessible and under threat of disappearing forever.

The initiative also provides outstanding opportunities for education and research in IU’s School of Informatics and Computing and in IU’s new Media School. In addition, the extensive amounts of visual material will open up major new opportunities for film studies in the Media School and for the nationally acclaimed IU Cinema.

All of this leverages IU’s decades-long investment in information technology infrastructure—through storage, through supercomputing, through our hardened data centers, and our cutting-edge networks.

Related digitization initiatives: the Uffizi Gallery

In fact, these investments in information technology infrastructure have made a number of related initiatives possible.

Earlier this year, IU announced a cooperative agreement with the Uffizi Gallery in Florence, Italy, to carry out the 3-D digitization of the museum's entire collection of 1,250 pieces of irreplaceable classical Greek and Roman sculpture. The project between the Uffizi, one of the oldest and most renowned art museums in the world, and IU's Virtual World Heritage Laboratory is creating high-resolution 3-D digital models of the Uffizi sculptures and is making them freely available online. It will be completed by IU’s bicentennial in 2020.

The Uffizi project is being led by Bernard Frischer, IU professor of informatics, director of the university's Virtual World Heritage Laboratory, and one of the world's leading virtual archaeologists.

A number of sculptures from the Uffizi Gallery have already been digitized, and you can interact with 3D models of them on the website of the Virtual World Heritage Laboratory.

Video of 3D model of bust of Agrippina the Younger

This video shows one of the 3D models of the bust of Agrippina the Younger—the mother of the Emperor Nero. The video is based on capture of a real-time interaction with the 3D model on the web. It illustrates how the models can be used as design elements of webpages to illustrate catalogue entries, or to enhance scholarly papers about the work of art. In the case of this bust of Agrippina, you will note that there is so-called "tombstone" information on the webpage next to the interactive model. This gives the name of the work, information about the material from which it is made, its dimensions, its Uffizi museum catalogue number, and its condition.

Digitization master plan

The transformation of the third mission of universities—the preservation of knowledge—from the physical to the virtual world of digitization is both essential and irreversible.

But the success of these major projects at IU—MDPI and the Uffizi project—suggests an even bolder goal—to digitize all of our collections. We are now working to draw all of our digitization efforts together into a true university-wide strategy—IU’s Digitization Master Plan. The goal of this plan is to digitize and store, in some form, all of our existing collections judged by experts and scholars to be of lasting importance to research and scholarship, and to ensure the preservation of all new research and scholarship at IU that is born digital.

The development of this plan is well underway.

Jefferson on preservation

Now let me move on to the problem of ultra-long-term preservation. And let me illustrate the key idea here with a quote from Thomas Jefferson.

In a letter that he wrote in 1791, Jefferson, who would later become the third president of the United States, wrote about the importance of preservation.

He wrote: “…let us save what remains: not by vaults and locks which fence them from the public eye and use, in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.”1

Long-term preservation

The preservation mission of universities is foundational to furthering their missions of creating and disseminating knowledge. For millennia, universities have preserved vast collections of physical objects through floods, wars, and societal upheaval. Libraries have sometimes been destroyed, but through multiple copies of books, knowledge has been preserved for future generations.

We must now learn to do the same for the digital era. How can we be certain that today’s knowledge and scholarship, increasingly “born digital,” will be available to future generations? What if the technology melts down? Or the digital version becomes unreadable?

Long-term digital preservation

We have been working on the problems of ultra-long-term digital preservation and assessing the technical, legal, and operational challenges of digital preservation for the decades and centuries ahead. In 2008, I was part of a group of university presidents who asked if the digital copy could, in fact, become the preservation copy of record. That is, not a digital representation of a physical paper or book, but the preservation copy for the future.

That work advanced, and in 2012, I co-founded and served as the founding chair of the Digital Preservation Network or DPN—pronounced “Deepen.” DPN is not intended to provide access to digital content, as that will continue through the libraries and other evolving means. DPN is a fail-safe, dark archive that establishes replicas of digital content transported by fast computer networks, and provides essential legal agreements for access. DPN, and other efforts like it, will be essential to the preservation mission of great universities.

DPN architectural overview

Here you see an overview of the DPN’s architecture.

The initial implementation of DPN has a number of major storage nodes geographically distributed around the nation, with as much diversity of software and hardware as possible for resilience, acting as front doors to different types of digital data—for example, text, rich media, and large scientific data sets.

Fundamental to it is cooperative, independent development of the repositories. Each node replicates the data of each of the other nodes so that each node has a full copy of all the data. Heterogeneity among the different repositories is an essential part of creating digital resilience over time.
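The idea of every node holding a full copy can be illustrated with a small Python sketch. This is not DPN's actual software or protocol: the `Node` class, the function names, and the majority-vote repair rule are all hypothetical, chosen only to show how full replication plus checksum comparison lets a damaged copy be detected and restored from the others.

```python
import hashlib
from collections import Counter


class Node:
    """A hypothetical storage node holding objects in memory."""

    def __init__(self, name: str):
        self.name = name
        self.objects: dict[str, bytes] = {}

    def store(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data

    def digest(self, object_id: str) -> str:
        # SHA-256 checksum used as a fixity value for the stored copy.
        return hashlib.sha256(self.objects[object_id]).hexdigest()


def replicate_all(nodes: list[Node]) -> None:
    """Copy every object held by any node to every node that lacks it."""
    for source in nodes:
        for object_id, data in list(source.objects.items()):
            for target in nodes:
                if object_id not in target.objects:
                    target.store(object_id, data)


def audit_and_repair(nodes: list[Node], object_id: str) -> list[str]:
    """Compare checksums across nodes and overwrite copies that disagree
    with the majority. Returns the names of the nodes that were repaired."""
    digests = {node.name: node.digest(object_id) for node in nodes}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    good = next(n for n in nodes if digests[n.name] == majority_digest)
    repaired = []
    for node in nodes:
        if digests[node.name] != majority_digest:
            node.store(object_id, good.objects[object_id])
            repaired.append(node.name)
    return repaired


# Three illustrative nodes, each acting as the front door for one kind of content.
nodes = [Node("text"), Node("media"), Node("science")]
nodes[0].store("book-1", b"digitized pages")
nodes[1].store("tape-7", b"audio samples")
nodes[2].store("run-42", b"detector data")
replicate_all(nodes)

# Simulate silent corruption on one node, then detect and repair it.
nodes[1].store("book-1", b"digitized pagez")
repaired_nodes = audit_and_repair(nodes, "book-1")
print(repaired_nodes)
```

A majority vote is only one possible repair policy and needs at least three copies to be meaningful; a production archive would also record the original checksum at ingest rather than rely on agreement among replicas alone.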

Conclusion

Let me say just a few words in conclusion. Today, I have discussed the preservation of knowledge as a fundamental mission of the great universities of the world. In the 21st century, the long-term preservation of digital knowledge itself will ultimately rest on universities and on a strong global research and education network.

The great universities of the 21st century are made up of communities of scholars who contribute in transformative and innovative ways to the prosperity and progress of their nations and the world.

As all of us move forward in this new digital age, we must promote collaborative learning and research by sharing expertise, experience, and resources. We must continue to ensure that our faculty has access to the most advanced tools and facilities to support their research and scholarship. And we must ensure that the complete scholarly record is preserved for future generations.

Information technology specialists—like those of you here in this room—can make major contributions to these goals.

Thank you very much.

Source notes

  1. Thomas Jefferson, Letter to Ebenezer Hazard, 18 February, 1791, Web, Accessed December 1, 2016, URL: http://founders.archives.gov/documents/Jefferson/01-19-02-0059.