Archives as Data Research Guide Now Available!

To help researchers find and understand how to work with data from archival health sciences collections, we have compiled and published the Archives as Data research guide. “Archives as Data” refers to archival collection materials in digital form that can be shared, accessed, analyzed, and referenced as data. Using digital tools, researchers can work with archives as data to explore and evaluate characteristics of collection materials and analyze trends and connections within and across them.

AIDS History Project Collections document included in the No More Silence dataset with Python code used for analysis.

UCSF Archives and Special Collections makes data available from a number of our digital collections. In the guide, researchers will find information about accessing and using this data, as well as descriptions of the form and content it takes. You’ll also find a growing set of links to learning resources about various data analysis methods used to work with archives as data.

This new Archives as Data research guide provides researchers with a centralized resource hub with brief descriptions of collection materials as well as links to the datasets that have been prepared from them, including:

  • The No More Silence dataset, an aggregation of data from selected collections included in the AIDS History Project which range from the records of community activism groups to the papers of health researchers and journalists.
  • Data from the Industry Documents Library, comprising collections of documents from the tobacco, food, drug, fossil fuel, chemical, and opioid industries, all of which impact public health.
  • Selected datasets from the COVID Tracking Project, a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States, with data collected from March 2020-March 2021.
  • Data from digitized UCSF University Publications, from course catalogs to annual reports, newsletters, and more.

We look forward to updating the guide as more data from UCSF Archives and Special Collections becomes available, and anticipate expanding to include links to “archives as data” of interest for digital health humanities work made available by other institutions and organizations.

To learn more about how we are making archives as data available at UCSF, check out recordings and resources from our recent sessions on Finding and Exploring Archives as Data for Digital Health Humanities!

The Archives as Data Research Guide has been published as part of the UCSF Digital Health Humanities pilot program. Please reach out to the Digital Health Humanities Program Coordinator, Kathryn Stine, at kathryn.stine@ucsf.edu with any questions about DHH at UCSF. The UCSF Digital Health Humanities Pilot is funded by the Academic Senate Chancellor’s Fund via the Committee on Library and Scholarly Communication.

Digital Health Humanities: Showcasing “Archives as Data” for Analysis

UCSF Archives & Special Collections includes numerous digitized collections documenting health sciences topics ranging from institutional, community, and individual response to illness and disease to industry impacts on public health. We make many of these collections available as data that can be computationally analyzed for health sciences and humanities research.

Voyant Cirrus term frequency visualization generated from AIDS health crisis workshops file data, 1986 from the UCSF AIDS Health Project Records, UCSF Archives & Special Collections (data available in the No More Silence dataset).
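For readers curious about what sits behind a term frequency visualization like the one above, here is a minimal Python sketch of the underlying count. It is not the code Voyant uses: the stop-word list, the `term_frequencies` function name, and the sample text are all assumptions made up for illustration, and the sample text is not drawn from any collection.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "are"}


def term_frequencies(text, top_n=5):
    """Return the top_n most frequent terms in text, ignoring stop words."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


# Invented sample text for illustration only.
sample = (
    "Workshop on health and community support. "
    "Community volunteers lead the workshop. "
    "Health workers and community groups attend."
)

for term, count in term_frequencies(sample, top_n=3):
    print(term, count)
```

A word cloud then simply scales each term's display size by its count. Real analyses usually add steps this sketch leaves out, such as correcting OCR errors and grouping word variants.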

If you are curious about working with data from the UCSF Archives and Special Collections, the Digital Health Humanities (DHH) pilot program will showcase our “archives as data” throughout the month. In two upcoming sessions, we’ll provide an orientation to available data as well as methods for finding, accessing, and exploring these data resources:

Voyant Bubbleline term occurrence visualization generated from Letter from the FDA to Purdue re: new drug application for OxyContin Controlled-Release Tablets data, 1995 from the Kentucky Opioid Litigation Documents collection, UCSF Industry Documents Library (data available from item page link or as part of collection dataset).

Python for Data Analysis series workshops

DHH programming also continues to partner with the Data Science Initiative (DSI) to offer workshops on tools and methods well-suited to conducting research with “archives as data.” March workshops in the DSI Python for Data Analysis series will dig into text analysis using natural language processing and building machine learning models:

Through these workshops and selected companion follow-up sessions with troubleshooting and guided process walkthroughs, researchers can learn and practice data analysis techniques and get familiar with data from our collections. Check out the library’s events calendar to find and register for the latest offerings!

OpenRefine workshops

If you have data you’d like to work with but it needs tidying and preparation, attend a DSI OpenRefine workshop. This workshop will cover techniques for cleaning structured data, no programming required! There will be two OpenRefine sessions this month:

Previously-held DHH session slides, linked resources, and recordings are available on the CLE. There you will find materials from a Digital Health Humanities Overview session and recorded walkthroughs for Unix, Python, and Jupyter notebooks basics. Related resources will be updated on the CLE following DHH sessions.

Questions?

Please contact DHH Program Coordinator, Kathryn Stine, at kathryn.stine@ucsf.edu. The UCSF Digital Health Humanities Pilot is funded by the Academic Senate Chancellor’s Fund via the Committee on Library and Scholarly Communication.

How to Digitize 68,000 Pages of Documents

Guest post by Heather Wagner, Digitization Coordinator at UC Merced Library

For the Pioneering Child Studies project, the UC Merced Library’s Digital Curation and Scholarship unit was tasked with digitizing 68,000 pages of documents. So, how do we go about digitizing 68,000 pages of documents? With some help. That help comes from four undergraduate student assistants who play an important part in the digitization process.

The first part of the process is the actual digitization. Our undergraduate student assistants digitize materials on a variety of equipment. These include high speed document scanners and flatbed scanners for documents, book scanners for bound material, and cameras on stands for oversize or fragile materials.

Student Nicolas Fleming digitizing bound materials using a book scanner

Once the digitization is complete, the next step is quality checking. Students review each image in Adobe Bridge and zoom in to check for issues such as lines in scans or items out of focus. Some images may need minor editing, such as straightening and cropping, which is completed in Photoshop during the quality checking step. The quality checking step is time-consuming but necessary to ensure we are getting the best possible results from digitization.

Student Dathan Hansell quality checking digitized documents.

PDFs with optical character recognition (OCR) are created from the digitized image files so they are accessible to users. OCR makes the PDF document searchable. The PDF documents are then quality checked by the students, and the documents are then optimized. Optimizing the PDF files reduces their file size, which makes them better suited for web viewing. The files are then ready for uploading.

We appreciate the hard work of our undergraduate student assistants. We would not be able to complete digitization projects of this size without them.

Spotlight on Carlton B. Goodlett

Carlton Benjamin Goodlett, PhD, MD (1914-1997) was a San Francisco newspaper publisher, civil rights leader, and physician. He practiced medicine at Mount Zion Hospital (now known as UCSF Medical Center at Mount Zion) and, at that time, was one of only three Black doctors in the city.

His 1997 obituary in Synapse, UCSF’s student newspaper, enumerated his many accomplishments and commitment to social justice. Goodlett graduated magna cum laude from Howard University in 1935. At the age of 23 he received his doctorate in child psychology from the University of California, Berkeley, making him one of the first Black students to receive a PhD from the UC Berkeley Department of Psychology. He went on to receive his medical degree from Meharry Medical College in Nashville, Tennessee.

Goodlett’s legacy includes leading boycotts of businesses that discriminated against people of color and participating in student protests at San Francisco State University. He was also a co-founder of the San Francisco Young Democrats. According to the San Francisco Chronicle, “Until the emergence of the Black Panther Party in the late 1960s, Goodlett was the dominant figure in San Francisco’s civil rights movement in securing jobs for African Americans and appointments to important city commissions that blacks had never held.”

Another notable element of the Synapse article is a featured drawing of Dr. Goodlett by the American graphic artist Emory Douglas (b. 1943). Douglas was the minister of culture and revolutionary artist for the Black Panther Party. He designed the Party’s newspaper, The Black Panther, and was responsible for the publication’s iconic imagery.

For additional resources on Carlton B. Goodlett and Emory Douglas:

Dr. Leona Mayer Bayer Digital Collection Now Available

UCSF Archives and Special Collections is delighted to announce the publication of the Leona Mayer Bayer Correspondence digital collection on Calisphere. The digitization project is part of the NHPRC grant, Pioneering Child Studies: Digitizing and Providing Access to Collection of Women Physicians who Spearheaded Behavioral and Developmental Pediatrics. We worked in partnership with UC Merced Library’s Digital Assets Unit towards our goal of digitizing and publishing 68,000 pages from the collections of Drs. Hulda Evelyn Thelander, Helen Fahl Gofman, Selma Fraiberg, Leona Mayer Bayer, and Ms. Carol Hardgrove. To date we have digitized over 59,000 pages. Most digitized material is still undergoing quality assurance (QA) procedures. Here are some items we have digitized from the Dr. Leona Mayer Bayer collection.

Dr. Leona Mayer Bayer, 1956. Leona Mayer Bayer Correspondence box 1, folder 9

Dr. Leona Mayer Bayer

Dr. Leona Mayer Bayer received her MD from Stanford University Medical School in 1928. She worked with the Institute of Human Development in Berkeley and focused on child development, human growth, and the psychology of sick children. The collection consists of around 400 digitized pages and features Dr. Bayer’s professional correspondence. Items that may be of interest include her correspondence with Dr. Hilde Bruch and her acceptance remarks for the PSR Broadstreet Pump Award she received in March of 1987.

In the coming months we will digitize and publish our next four collections on Calisphere. Stay tuned for our next update.

Welcome Allison Tracy-Taylor, Oral History Archivist

We are excited to introduce Allison Tracy-Taylor who joins UCSF Archives & Special Collections as an Oral History Archivist. Allison will be leading the Oral History Program (OHP) supported by the Academic Senate Chancellor’s Fund and Committee on Library and Scholarly Communication that will enable the university to record and preserve diverse voices of the UCSF faculty sharing their stories in their own words and better shape the legacy they leave behind.

This program aims to better understand and share the history of health sciences education through recording, transcribing, and preserving oral histories with members of the UCSF teaching and research community and by making these oral histories available to the public. Through engagement with DEI leaders, the project will record their experiences and document efforts to address and remediate inequities in health, health care, and education. The Oral History Program will elevate the narratives, perspectives, and expertise of historically underrepresented populations in the education and research communities at UCSF. This one-of-a-kind public record will address “silences” and gaps in the existing historical narrative. Allison will collaborate with faculty to convene Oral History Advisory Committees at each of the schools to identify and develop the list of interviewees and perform outreach activities related to the program.

Allison Tracy-Taylor, Oral History Archivist

Collecting and preserving archival material that documents nuanced historical narratives and encourages contemporary conversations has been a major theme of Allison’s work. Most recently as an independent oral historian based in Sacramento, CA, Allison was the project lead for the California State Library website Voices of the Golden State, a curated collection of oral histories exploring many facets of California’s history. She also worked on multiple oral history projects, including a project on the history of the medical device technology industry in Silicon Valley for Stanford BioDesign, and the Documenting the Experiences of Mexican, Filipina, and Chicana Women in California Agriculture Oral History Project for the Center for Oral and Public History at Cal State Fullerton.  

Allison is passionate about supporting oral history practitioners and growing the field into an inclusive, equitable space. She served as the 2019-2020 President of the Oral History Association (OHA), as well as on the OHA’s Council for several years. While president she initiated the development of the OHA’s Guidelines for Social Justice Oral History Work, convened and served on the Independent Practitioners’ Task Force, which developed a robust toolkit for independent oral historians, and chaired a task force that developed remote interviewing guidelines during the COVID-19 pandemic. 

Community engagement and education have also been central to Allison’s work. Prior to going independent, Allison worked as the Oral History Administrator at the Kentucky Historical Society, overseeing the Kentucky Oral History Commission (KOHC), the only commission of its kind in the United States. She provided outreach, education, and technical support to oral history practitioners and programs throughout Kentucky. Allison was also the Oral Historian for the Stanford Historical Society, documenting Stanford University’s history through the stories of faculty and staff and serving as the program’s senior oral history mentor. 

Allison began her work in oral history at the University of Nevada Oral History Program (UNOHP), serving in multiple roles, including as an interviewer for a multi-year project on the history of women’s athletics at the University of Nevada, and an editor for the resulting book We Were All Athletes: Title IX and Women’s Athletics at the University of Nevada. In addition to an M.A. in Oral History from Columbia University, Allison holds a B.A. in Sociology and English Literature from the University of Nevada. 

In her free time, Allison enjoys hiking, reading, the distinct hobbies of collecting craft supplies and crafting, and baking. Though she will always be a Nevadan at heart, she has come to love the profound beauty of California. 

Launching the Digital Health Humanities Pilot

We are excited to launch digital health humanities pilot programming starting January 2023! Digital health humanities (DHH) is an emerging discipline that utilizes digital methods and resources to explore research questions investigating the human experience around health and illness. The Digital Health Humanities Pilot (DHHP) will facilitate new insights into historical health data. Participants will learn how to evaluate and integrate digital methods and “archives as data” into their research through a range of offerings and trainings.

Participants at the first workshop for the No More Silence project, a precursor to digital health humanities pilot programming

The programming from this pilot will bring a humanistic context to understanding institutional, personal and community responses to health issues, as well as social, cultural, political and economic impacts on individual and public health. The DHHP will offer researchers from all disciplines (including faculty, staff, and other learners) tailored workshops, classes, and skill-building sessions. Workshops will encourage the use of “archives as data” and utilize datasets from holdings within the UCSF Archives and Special Collections (including the AIDS History Project and Industry Documents Library, among others). Additionally, in spring 2023 we will be hosting the Digital Health Humanities Symposium. The symposium will provide space to consider theoretical issues central to this emerging field and highlight digital health humanities projects. More information on the symposium will be shared soon.

The UCSF Digital Health Humanities Pilot is funded by the Academic Senate Chancellor’s Fund via the Committee on Library and Scholarly Communication.

Register for an upcoming Digital Health Humanities overview session

Are you interested in learning how DHH can inform your research? We invite you to participate in our virtual session, Digital Health Humanities: An Overview of Methods, Tools, Archives, and Applications, Thursday, January 19, from 1 to 3 p.m. PT.

This session will include an orientation led by Digital Health Humanities Program Coordinator Kathryn Stine and Digital Archivist Charlie Macquarie. We will discuss various approaches in DHH research, including getting familiar with data analysis and programming skills, and will share an overview of the UCSF Library’s archival collections data available for research.

For questions about digital health humanities at UCSF, please contact Digital Health Humanities Program Coordinator, Kathryn Stine at kathryn.stine@ucsf.edu.

Register Now

Collaborating with the Data Science Initiative

The Data Science Initiative (DSI) is offering workshops in the coming months to support researchers interested in implementing DHH approaches. Follow-up sessions will be available for researchers to reinforce and contextualize programming foundations in practical application. Check out the upcoming sessions:

We invite you to check out the library’s events and classes calendar for upcoming DHHP (and related DSI) programming. If you are unable to attend any of the sessions listed above, we advise referring to the DSI Collaborative Learning Environment (CLE) (accessible with MyAccess credentials) for recordings and resources.



Alex Duryee Named New COVID Tracking Project Archive Lead

The UCSF Archives & Special Collections is delighted to welcome our new colleague, Alex Duryee, who took over from Kevin Miller as the COVID Tracking Project Archive Lead. The project team continues the work of preserving, providing online access to, and building educational resources for the organizational records and datasets of the COVID Tracking Project at The Atlantic (CTP).

Alex Duryee

Alex brings a background in metadata, digital archives, and archival access to the COVID Tracking Project Archive team. He holds a BA from The College of New Jersey and an MLIS from Rutgers University, and also serves as the Manager for Archival Metadata at the New York Public Library. In this position, he manages the Library’s archival metadata platforms and develops metadata policy for the Library’s archival collections. He also collaborates with staff across the organization to improve systems integrations and develop new methods for accessing and using archival materials. Alex also serves on the National Finding Aid Network (NAFAN) Technical Advisory Working Group, SAA’s Technical Subcommittee for Encoded Archival Standards, and as the chair of the SNAC (Social Networks and Archival Context) Technology & Infrastructure Working Group. He contributes to open-source projects such as ArchivesSpace and develops open-source metadata tools. In 2019, his team was awarded the C. F. W. Coker Award for Archival Description by the Society of American Archivists.

Alex’s background also includes experience as a freelance ArchivesSpace developer, a consultant with AVP, and a digital archives fellow with Rhizome.

Alex enjoys puzzles of all sorts (including metadata), board games, baking, and dancing.

Web-Archiving the UCSF Response to COVID-19

We’re excited to announce the publication of the UCSF COVID-19 Response Web-Archive. UCSF has historically been a “first responder” to a wide variety of public health emergencies. At the outset of the COVID-19 pandemic, UCSF archivists recognized that the evolving UCSF response to the situation would contain valuable information about this important, tragic, and devastating historical moment, and that documenting that response as it grew and changed would be a powerful historical record. And we were able to act quickly, because so much of the record is on the web.

Archives and Special Collections has been archiving websites for a long time — our oldest captures date back to 2007, which feels like another epoch in web-time (you can see all of our web-archives here: https://archive-it.org/organizations/986). To archive the web, we use specialized tools to take “captures” or “snapshots” of a certain web-page at a certain time, usually coming back to take a new capture at regular intervals. Because of this technique, web-archives are a valuable way to watch any given website evolve and change, and this documents something like a rapidly-evolving response to a global pandemic very well.
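To make the idea of dated captures concrete, here is a small Python sketch of how a viewer might pick which snapshot to show for a given day. It is an illustration only, not how Archive-It works internally: the capture dates are invented and `capture_as_of` is a hypothetical helper name.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical capture dates for one web page (invented, not real crawl data).
captures = [
    date(2020, 3, 1),
    date(2020, 3, 25),
    date(2020, 6, 25),
    date(2020, 9, 25),
]


def capture_as_of(capture_dates, when):
    """Return the most recent capture on or before `when`, or None if none exists."""
    ordered = sorted(capture_dates)
    # bisect_right gives the number of captures dated on or before `when`.
    i = bisect_right(ordered, when)
    return ordered[i - 1] if i else None


# Which snapshot shows the page as it looked on April 10, 2020?
print(capture_as_of(captures, date(2020, 4, 10)))
```

Stepping through the captures in date order is what lets a researcher watch a site change over time. The gaps between dates also show why capture frequency matters: anything published and removed between two captures is not recorded.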

Image of website of AIDS Research Institute's COVID-19 Task Force showing their March 25, 2020 update on the pandemic in California and San Francisco.
The March 25, 2020 update of the AIDS Research Institute’s COVID-19 task force. Note that at this time there were only 76 confirmed COVID cases and no deaths.

In documenting the UCSF response to COVID-19, however, we had to work much more quickly and in much greater volume than we are used to. As you likely remember, in the early days of the pandemic both the UCSF and the nationwide response were changing daily based on rapidly shifting information. Archives usually captures web-pages every 3 months or every 6 months, but upon embarking on this collection we realized that we needed to begin capturing certain websites every day. Additionally, UCSF has at any given time as many as 1000 different official websites (anything with ucsf.edu as the base domain), so knowing which of these contained COVID information and should be captured was difficult. To remedy this problem, archivists set up Google Alerts to notify us anytime something was published to a ucsf.edu domain that mentioned certain keywords identified as likely COVID-related.
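The selection rule described above (a ucsf.edu site that mentions a likely COVID-related keyword) can be sketched in a few lines of Python. This is only an illustration of the logic, not the alert setup the archivists used: the keyword list, the function names, and the example URLs are all assumptions, since the post does not list the actual terms.

```python
from urllib.parse import urlparse

# Hypothetical keyword list; the actual alert terms are not given in the post.
KEYWORDS = ("covid", "coronavirus", "sars-cov-2")


def is_ucsf_site(url):
    """True if the URL's host is ucsf.edu or any subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == "ucsf.edu" or host.endswith(".ucsf.edu")


def should_capture(url, page_text):
    """True if the page is on a ucsf.edu site and mentions any keyword."""
    text = page_text.lower()
    return is_ucsf_site(url) and any(k in text for k in KEYWORDS)


# Invented example pages for illustration only.
print(should_capture("https://example.ucsf.edu/news", "New COVID testing guidance"))
print(should_capture("https://example.ucsf.edu/news", "Library hours update"))
print(should_capture("https://notucsf.edu/news", "COVID update"))
```

Checking the host rather than searching the whole URL avoids false matches on look-alike domains. Simple substring matching will still over-collect and under-collect, which is one reason human review of the results remains useful.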

And this was only the official UCSF websites. We also wanted to document outside coverage of UCSF activities: things that appeared on news websites, blogs, and occasionally social media (though the latter is persistently difficult to capture, so download your Twitter archives, people!). We were able to use Google Alerts in a similar way to help alert us to these sites, but even more importantly we benefited from the immense assistance of the amazing Anirvan Chatterjee, Director of Data Strategy at the Clinical & Translational Science Institute. Anirvan reached out to us early in the pandemic with a list of sites he had collected that contained documentation of UCSF’s role in the pandemic response, and his human-curated list was immensely helpful. The proliferation of digital information makes human curation and metadata creation increasingly difficult in archival repositories, and having someone like Anirvan who was able to devote the time to it (most digital archivists aren’t able to devote such time, if you can believe it!) really improved the collection.

This collection is also important because it can be accessed both by a human browsing and by a computer doing computational research. We plan to use these materials to expand our work in digital health humanities as well as collections as data, as our newest colleague Kathryn Stine gets underway in her role coordinating these programs. Have a question about the COVID-19 web-archive collection? Want to use it in a computational project? Just love it? Get in touch!