Reproducible Research and the Problems of Preserving Computer Code and Software

We collect and preserve a lot of the documentary evidence of science happening at UCSF — everything from lab notebooks to lab websites detailing research processes. We even hold tons and tons of data in our collections, mostly in physical form, as patient surveys or health records, or even raw data as it was initially recorded by hand in the lab.

But what about the products of contemporary science, where key digital elements such as computer code or software might be crucial to understanding the research? This is already presenting problems for research reproducibility. Think, for example, of a set of results that were obtained using a script written in the Python programming language. If you want to verify these results, are you able to view the source code that produced them? Are you able to execute that code on your own computer? Can you tell what each piece of the code does? Does the code rely on access to an external data set to work correctly, and can you access and assess that data set to test the code?
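To make those questions concrete, here is a minimal sketch of one habit that helps answer them: saving a small provenance record alongside a set of results. It is only an illustration, not a description of any particular lab's practice. The function names (`sha256_of_file`, `build_provenance_record`) and the file names in the demo are hypothetical; the record simply notes which script and data file were used, their checksums, and the Python version and platform the code ran on.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance_record(script_path, data_path):
    """Describe the code, data, and environment behind a set of results.

    Hypothetical helper: the field names are illustrative, not a standard.
    """
    return {
        "script": Path(script_path).name,
        "script_sha256": sha256_of_file(script_path),
        "data_file": Path(data_path).name,
        "data_sha256": sha256_of_file(data_path),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    # Demo with throwaway stand-in files so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "analysis.py"      # hypothetical script
        data = Path(workdir) / "survey_data.csv"    # hypothetical data set
        script.write_text("print('analysis goes here')\n")
        data.write_text("id,score\n1,4\n2,5\n")
        record = build_provenance_record(script, data)
        print(json.dumps(record, indent=2))
```

A record like this does not make code reproducible by itself, but it lets a later reader check whether the script and data they have in hand are the same ones that produced the published results.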

As we work more closely with our Data Science Initiative team on these issues, it has become clear that these are preservation questions as well. A critical understanding of the scientific past and present requires access to the primary source documentation of that research, including computer code and software. Being able to understand and interpret that code raises many of the same questions mentioned above: whether the code can be executed, whether each process in the code is documented, whether the necessary data is accessible, and so on.

To begin to address this, we are working with the Data Science team to assess researcher coding practices, a first step in understanding how the library can encourage better documentation and preservation of code in the service of reproducible research and the persistence of the scientific scholarly record. If you are a researcher who codes for your work, we want your feedback! Please consider attending one of our lunchtime listening sessions in the coming weeks: 4/20 from 12-1:30 pm at Mission Bay, and 4/27 from 12-1:30 pm at Parnassus. We will have an informal chat about research coding practices, discuss some of the issues we encounter as information professionals, and talk about what the library can do to help in these areas.

Join us as we make some inroads on this challenging information problem.