How to Digitize 68,000 Pages of Documents

Guest post by Heather Wagner, Digitization Coordinator at UC Merced Library

For the Pioneering Child Studies project, the UC Merced Library’s Digital Curation and Scholarship unit was tasked with digitizing 68,000 pages of documents. So how do we go about digitizing that many pages? With some help. That help comes from four undergraduate student assistants, who play an important part in the digitization process.

The first part of the process is the actual digitization. Our undergraduate student assistants digitize materials on a variety of equipment: high-speed document scanners and flatbed scanners for documents, book scanners for bound material, and cameras on stands for oversize or fragile materials.

Student Nicolas Fleming digitizing bound materials using a book scanner

Once the digitization is complete, the next step is quality checking. Students review each image in Adobe Bridge, zooming in to check for issues such as lines in the scans or items that are out of focus. Some images need minor editing, such as straightening and cropping, which is done in Photoshop during this step. Quality checking is time-consuming but necessary to make sure we get the best possible results from digitization.

Student Dathan Hansell quality checking digitized documents

Next, PDFs with optical character recognition (OCR) are created from the digitized image files. OCR makes the text in each PDF searchable, which makes the documents more accessible to users. The students quality check the PDFs, and the files are then optimized. Optimizing the PDF files reduces their file size, which makes them better suited for web viewing. The files are then ready for uploading.
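With tens of thousands of pages, a small script can help confirm that every PDF has been optimized before uploading. The sketch below is only an illustration and not the tool we use: it assumes the original and optimized PDFs sit in two separate folders under the same file names (a hypothetical layout), and the function names are made up for this example. It lists PDFs that have no optimized copy yet and flags any whose optimized copy did not end up smaller.

```python
from pathlib import Path


def summarize_optimization(original_dir, optimized_dir):
    """Compare each original PDF with the optimized copy of the same name.

    Returns two lists: (filename, original_bytes, optimized_bytes) tuples for
    files found in both folders, and filenames with no optimized copy yet.
    The two-folder layout is an assumption made for this sketch.
    """
    original_dir = Path(original_dir)
    optimized_dir = Path(optimized_dir)
    compared = []
    missing = []
    for pdf in sorted(original_dir.glob("*.pdf")):
        optimized = optimized_dir / pdf.name
        if optimized.is_file():
            compared.append(
                (pdf.name, pdf.stat().st_size, optimized.stat().st_size)
            )
        else:
            missing.append(pdf.name)
    return compared, missing


def not_smaller(compared):
    """Return names whose optimized copy is not smaller than the original."""
    return [name for name, before, after in compared if after >= before]
```

Calling `summarize_optimization("pdf_original", "pdf_optimized")` (both folder names are placeholders) returns the two lists, and passing the first one to `not_smaller` gives the files worth a second look before upload.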

We appreciate the hard work of our undergraduate student assistants. We would not be able to complete digitization projects of this size without them.
