COVID Tracking Project Records and Resources Now Available

This announcement is authored by COVID Tracking Project Archive Lead, Alex Duryee

The UCSF Library Archives and Special Collections is pleased to announce that the COVID Tracking Project (CTP) records are available online for research.  The CTP is a crowdsourced digital archive that was managed by a group of journalists at The Atlantic and approximately 500 volunteers who gathered, cataloged, and published state-level COVID-19 data over the first fifteen months of the pandemic. “The COVID Tracking Project was a remarkable and influential initiative — part citizen science, part journalism, part crisis response. I’m thrilled that UCSF Archives has acquired, processed, and made available the digital records of this unique organization,” said Amanda French, a digital archivist and key leader of the CTP at The Atlantic.

In addition to the CTP’s data products, this collection includes its data creation and quality records, organizational records, correspondence, and code repositories. Data from the collection has been cited in over 2,100 academic articles and by federal agencies like the Centers for Disease Control and Prevention.

Open records available

The finding aid on the Online Archive of California describes the entirety of the collection and includes all of the CTP records held by UCSF. Records range from data processing infrastructure and documentation to correspondence with state and territorial health departments, original COVID-19 data captures, and Slack discussions like #gratitude and #emoji-march-madness. A significant portion of the collection is restricted until 2102 to protect the privacy of CTP members. However, the open records are available digitally and on-site by appointment in the UCSF Library Archives and Special Collections reading room.

The final data products from the CTP are available on Dryad, in accordance with FAIR principles.

In addition to the final data sets, UCSF developed a tool for viewing the data as it changed over time. COVID-19 data was never static. Reporting schedules were often inconsistent around weekends and holidays, and data was either reported late or updated long after the initial release. States also continuously changed their data definitions throughout the pandemic. UCSF’s Data Explorer lets researchers view CTP’s data as it was updated, providing a deeper view of the topline numbers. Data Explorer includes references to original data sources (generally screenshots of websites and data files) and daily Slack discussions for each reporting source (available on-site at UCSF).

Oral histories and open source tools

Along with the collection’s files and data, the CTP records include oral histories created by the CTP as it came to a close in 2021.  These oral histories provide a human-centered perspective on the data, the organization, and the pandemic in the United States.  With permission from the interviewees, the oral histories are available via Calisphere.

The UCSF Archives and Special Collections also developed several open-source tools to aid in acquisition, preservation, and access to the CTP materials. CTP used platforms like GitHub, Instagram, and Twitter for public and internal communication.  These platforms do not always provide accessible tools for preserving data; thus, UCSF created tools to download posts and private messages and generate access versions in PDF.  These tools are available on GitHub for use in and development of digital archives.

Inspiring future research and education

This collection was designed in adherence to UCSF Library’s Archives as Data initiative and the broader Collections as Data movement. UCSF Archives and Special Collections developed multiple platforms and pathways to approach the collection.

This way, researchers across disciplines can discover and use the records in their work. Whether researchers approach it through an epidemiological, social science, or data science lens, CTP archive lead Alexander Duryee acknowledges the powerful insights this collection affords: “We believe that this collection will provide key context for the story of the pandemic and that researchers across disciplines will find it illuminating.” By cross-linking between the archival collection, oral histories, and data sets, the collection encourages deep exploration of the “whats” and “hows” of the CTP and its data.

The collection serves as the foundation of the Data Journalism Course In A Box (DJCB) project, which is building a data science curriculum around the CTP records to support journalism education.  The collection includes a comprehensive view of the data, from its initial publication on agency web pages through quality control and publication. Investigative reporter Tyler Dukes is developing the DJCB with the help of the UCSF team. The curriculum uses CTP data to illustrate to journalists how to work with and analyze real-world public health data and how to communicate complex topics to a broad audience.

Project team members

  • Tyler Dukes, data journalism consultant
  • Alexander Duryee, COVID Tracking Project archive lead
  • Edith Escobedo, UCSF project archivist
  • Polina Ilieva, UCSF Associate University Librarian for Collections and archivist
  • Charlie Macquarie, former UCSF digital archivist
  • Kevin Miller, former COVID Tracking Project archive lead

In addition, the team would like to thank the many collaborators across the University of California system and advisory board members for their contributions to this project.

Funding for The COVID Tracking Project Archive was provided by the Alfred P. Sloan Foundation (Sloan grant G-2022-17133).


Student Fellows Explore Machine Learning with UCSF Industry Documents Library and Data Science Initiative

The UCSF Industry Documents Library (IDL) and Data Science Initiative (DSI) teams are excited to be working with three Data Science Fellows this summer. The Data Science Fellows are part of a joint IDL-DSI project to explore machine learning technologies to create and enhance descriptive metadata for thousands of audio and video recordings in IDL’s archival collections.  This year’s summer program includes two junior fellows and one senior fellow.

Our junior fellows are tasked with manually assigning or improving metadata fields such as title, description, subject, and runtime for a selection of videos in IDL’s collection on the Internet Archive. This is a detailed and time-consuming task, which would be costly to perform for the entire collection. In contrast, our senior fellow is using transcriptions of the videos, which we have generated with Google’s AutoML tool, to explore different technologies to automatically extract the descriptive information. We’ll then compare the human-generated data with the machine-generated data to assess accuracy.  The hope is that IDL can develop a workflow for using machine learning to create or improve metadata for many other videos in our collections.

Our Junior Data Science Fellows are Bryce Quintos and Adam Silva. Bryce and Adam are both participating in the San Francisco Unified School District (SFUSD) Career Pathway Summer Fellowship Program. This six-week program provides opportunities for high school students to gain work experience in a variety of industries and to expand their learning and skills outside of the classroom. Bryce and Adam are learning about programming and creating transcriptions for selected audiovisual materials. The IDL thanks SFUSD and its partners for running this program and providing sponsorship support for our fellows.

Noel Salmeron is our Senior Data Science Fellow, participating through Life Science Cares Bay Area’s Project Onramp. Noel is using automated transcription tools to extract text from audiovisual files, run sentiment and topic analyses, and compare automated results to human transcription. Noel also provides guidance and mentoring to the Junior Fellows.

Our Fellows have shared a bit about themselves below. Please join us in recognizing Bryce, Adam, and Noel for their contributions to the UCSF Library this summer!

IDL-DSI Junior Data Science Fellow Bryce Quintos

Hi everyone! My name is Bryce Quintos and I am an incoming freshman at Boston University. I hope to major in biochemistry and work in the biotechnology and pharmaceutical field. As someone who is interested in medical research and science, I am incredibly honored for the opportunity to help organize the Industry Documents Library at UCSF this summer and learn more about computer programming. I can’t wait to meet all of you!

IDL-DSI Junior Data Science Fellow Adam Silva

Hi, my name is Adam Silva and I am a Junior Intern for the UCSF Library. Currently, I am 17 years old and I am going into my senior year at Abraham Lincoln High School in San Francisco. I am part of Lincoln High School’s Dragon Boat team and I am also a part of Boy Scout Troop 15 in San Francisco. My favorite activities include cooking, camping, hiking, and backpacking. My favorite thing that I did in Boy Scouts was backpacking through Rae Lakes for a week. I am excited to work as a Junior Intern this year because working online rather than in person is new to me. I look forward to working with other employees and gaining the experience of working in a group.

IDL-DSI Senior Data Science Fellow Noel Salmeron

My name is Noel Salmeron and I am a third-year data science major and education minor at UC Berkeley. I’m excited to work with everyone this summer and looking forward to contributing to the Industry Documents Library!