Open-source tools from The COVID Tracking Project Archive.

The COVID Tracking Project Archive faces an unusual challenge: preserving born-digital materials in formats that researchers will still be able to access far into the future. Platforms like Twitter, Instagram, and Slack constantly change their interfaces, which makes preservation difficult.

To make the work of our archive, and of other archivists, easier, the COVID Tracking Project is releasing several of the preservation tools we developed through our GitHub organization. They include:

  • Twitter Preserver – A tool that converts the ZIP archive a user downloads from Twitter into stable HTML files, covering Direct Messages as well as public Tweets and Favorites. View a preview of the output of this tool.
  • XLSX Bulk Converter – A Python script that bulk-converts Excel files into folders of CSV files, one file per worksheet.
  • Instagram Preserver – A tool that logs into Instagram and downloads all the feed data and images from another account. Because Instagram is particularly difficult to access without logging in, the tool uses an internal API to read that account's feed.

Our XLSX bulk converter was written by our amazing tech intern, Tracy Lee.
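
As a rough illustration of what a bulk XLSX-to-CSV conversion involves, here is a minimal sketch. It is not the released script. It assumes the third-party openpyxl library, one output folder per workbook named after the file, and one CSV per worksheet named after the sheet; the function names `convert_workbook` and `bulk_convert` are hypothetical.

```python
import csv
from pathlib import Path

from openpyxl import load_workbook  # third-party: pip install openpyxl


def convert_workbook(xlsx_path: Path, out_root: Path) -> Path:
    """Write each worksheet of one .xlsx file to its own CSV file.

    Hypothetical layout: out_root/<workbook name>/<sheet title>.csv
    """
    out_dir = out_root / xlsx_path.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    # read_only streams large files; data_only returns cached formula results
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        for ws in wb.worksheets:
            csv_path = out_dir / f"{ws.title}.csv"
            with csv_path.open("w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                for row in ws.iter_rows(values_only=True):
                    # Empty cells come back as None; write them as blanks
                    writer.writerow(["" if v is None else v for v in row])
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
    return out_dir


def bulk_convert(in_dir: Path, out_root: Path) -> list[Path]:
    """Convert every .xlsx file in in_dir and return the folders created."""
    return [convert_workbook(p, out_root) for p in sorted(in_dir.glob("*.xlsx"))]
```

Because the workbook is loaded with `data_only=True`, formula cells are written as their last cached values rather than as formulas; a file saved by a program that never calculated those values will produce blank cells instead.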

We believe tools like these offer a model for preserving and protecting information from proprietary, and sometimes fleeting, platforms for future researchers.

Support for development came from the Alfred P. Sloan Foundation.