Technical and Digitization Options
Center for Rural Affairs Best Practices
Purpose
The purpose of this document is to offer guidance on the creation of a digital library collection at the Center for Rural Affairs (CFRA). The guidelines discuss digital library software as well as digital imaging recommendations. The recommendations and guidelines within this piece are not the final say and should be taken only as suggestions to be referred to on a case-by-case basis.
The guidelines were developed with the following aspects in mind:
- Ease of future implementation of the digital library on the CFRA’s server
- Increase accessibility and interoperability
- Ensure consistency and continuity
- Ensure crosswalk operations to allow for ease of data transfer from the test site to the CFRA’s server
- Provide a baseline for materials digitization process
Project Planning
The success of the CFRA digital library will require careful planning. The planning will reflect on how the digitization project will best serve the CFRA's strategic plan, technology, and project capacity.
General Principles to consider:
- Digital library software available
- Material scanned at the highest resolution appropriate to the original source form
- Efficient scanning practices to ensure effective use of digitization time and preservation of the original materials' condition
- Creation and storage of a master file
- Creation of backup files on a stable medium
- Creation of distribution files to ensure CFRA community access to materials
The software and equipment selection process is crucial to the success of the CFRA digital library. The material that follows provides recommendations for that process and includes examples used during the creation of the sample site.
Digital Library Software
The creation of a digital library requires digital library software to provide the structure of the library. Since the CFRA is a nonprofit institution, the project has focused on open-source digital library software options. Open-source software keeps initial and ongoing costs low by eliminating the yearly cost of proprietary systems.
For the future pursuit of a permanent digital library collection at the CFRA, our team suggests Greenstone. Greenstone is an open-source system that performed consistently in the research performed by Dion Hoe-Lian Goh and colleagues (Goh et al., 2006). The CFRA can use the features in Greenstone on the server already present on site at the center.
For the current scope of a sample site, our team will use Omeka and Omeka’s online hosting. The research presented by Andro et al. (2012) supports the idea that Omeka has many of the features present in Greenstone. The time constraints on the project have also led our team to use the more familiar platform.
Scanning Recommendations
The scanned images will need to meet the minimum standards set out in the Library of Congress's “Technical Standards for Digital Conversion Of Text and Graphic Materials” (Library of Congress, 2006).
OCR IMAGES
- Image Files should be created specifically for Optical Character Recognition software
- Image files should be created for each individual page of the newsletter
- The spatial resolution will be 600 pixels-per-inch (ppi) relative to the original newsletter
- Black & White (1 bit)
- TIF 6.0 uncompressed
- De-skew images with a skew of greater than 3 degrees
- Crop to edge of page
MASTER IMAGES
- Image files should be created for each individual page of the newsletter
- The spatial resolution will be 300 pixels-per-inch (ppi) relative to the original newsletter
- 24 bit color
- TIF 6.0 uncompressed
- De-skew images with a skew of greater than 3 degrees
- Crop to edge of page
- TIF headers will contain the title of the newsletter, the organization (Center for Rural Affairs), the date of publication (YYYY-MM), and the ISSN
JPG IMAGES
- Each TIFF file will be compressed using the JPG standard
- The spatial resolution will be 72 pixels-per-inch (ppi) relative to the original newsletter
- 24 bit color
- The compression ratio should produce the highest quality of image (Q >= 95)
THUMBNAIL IMAGES
- Each JPG image will be reduced to thumbnail size
- The spatial resolution will be 72 pixels-per-inch (ppi) relative to the original newsletter
- 24 bit color
- ¼ of the scale of the large JPG, or 153 x 198 pixels
A top-level directory named Newsletters will be created, with one folder for each newsletter issue. The name of each folder should consist of the 4-digit year, followed by an underscore, followed by the 2-digit month of the newsletter issue. For example, the newsletter for January 2010 will be placed in a folder named 2010_01. Each image will be placed in the folder named for the date of the newsletter it belongs to. Each folder will include the TIFF, JPG, and thumbnail JPG files as well as the TEI XML document.
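The folder naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name issue_folder and its root parameter are hypothetical and not part of the CFRA workflow.

```python
from pathlib import PurePosixPath


def issue_folder(year: int, month: int, root: str = "Newsletters") -> PurePosixPath:
    """Return the folder for one newsletter issue, e.g. Newsletters/2010_01.

    The function name and the `root` parameter are illustrative assumptions.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    # 4-digit year, underscore, 2-digit zero-padded month
    return PurePosixPath(root) / f"{year:04d}_{month:02d}"


print(issue_folder(2010, 1))  # Newsletters/2010_01
```

Zero-padding the month keeps the issue folders in chronological order when they are sorted by name.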
Each page of a newsletter should have two TIFFs. The first TIFF should be 600dpi in black and white. The filename for the TIFF should be of the form: ‘CFRA_Newsletter_[YYYY]_[MM]_[page #]_BW.tiff’ with [page #] being substituted with the integer number of the page scanned. For example, the first page of the newsletter issued in January 2010 in TIFF format will be named CFRA_Newsletter_2010_01_1_BW.tiff.
The second TIFF should be 300dpi with 24bit color. The filename for the TIFF should be of the form: ‘CFRA_Newsletter_[YYYY]_[MM]_[page #].tiff’ with [page #] being substituted with the integer number of the page scanned. For example, the first page of the newsletter issued in January 2010 in TIFF format will be named CFRA_Newsletter_2010_01_1.tiff. The final dimensions should be 2550 pixels x 3300 pixels and approximately 25MB.
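The stated dimensions and file size can be checked with a little arithmetic. The sketch below assumes a letter-size page of 8.5 x 11 inches, which is inferred from the pixel dimensions and not stated in the guideline. It counts only raw pixel data and ignores TIFF header overhead. The function names are illustrative.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Convert a page size in inches to pixel dimensions at a given resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw pixel data size in bytes for an uncompressed image (no header)."""
    return width_px * height_px * bits_per_pixel // 8


# Assumption: the newsletter page is letter size (8.5 x 11 inches).
width, height = pixel_dimensions(8.5, 11, 300)
size = uncompressed_bytes(width, height, 24)
print(width, height, size)  # 2550 3300 25245000
```

At 24 bits per pixel this comes to about 25.2 million bytes per page, which matches the approximate 25MB figure.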
Similarly each page should have a JPG, derived from the TIFF. The JPG should only be 72dpi with 24 bit color and a minimum compression of 80%. The filename for the JPG should be of the form: ‘CFRA_Newsletter_[YYYY]_[MM]_[page #].jpg’ with [page #] being substituted with the integer number of the page compressed. For example, the first page of the newsletter issued in January 2010 in JPG format will be named CFRA_Newsletter_2010_01_1.jpg. The final dimensions should be 612 x 792 pixels.
Lastly, each JPG image should also have a thumbnail image associated with it. The thumbnail should be ¼ of the scale of the large JPG with the same resolution. The filename for the thumbnail should be of the form: ‘CFRA_Newsletter_[YYYY]_[MM]_[page #]_THUMB.jpg’ with [page #] being substituted with the integer number of the page compressed. For example, the first page of the newsletter issued in January 2010 in JPG thumbnail format will be named CFRA_Newsletter_2010_01_1_THUMB.jpg. The final dimensions should be 153 x 198 pixels.
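The four filename patterns can be generated by one small helper, which may help keep names consistent during batch scanning. This is a minimal sketch that follows the worked examples in the text. The function name page_filename and the kind labels ("bw_tiff", "color_tiff", "jpg", "thumb") are hypothetical.

```python
# Suffix for each derivative, following the worked examples in the text.
# The dictionary keys are illustrative labels, not part of the guideline.
SUFFIXES = {
    "bw_tiff": "_BW.tiff",     # 600 dpi black-and-white TIFF for OCR
    "color_tiff": ".tiff",     # 300 dpi 24-bit color TIFF
    "jpg": ".jpg",             # 72 dpi JPG derived from the TIFF
    "thumb": "_THUMB.jpg",     # thumbnail at 1/4 the scale of the JPG
}


def page_filename(year: int, month: int, page: int, kind: str) -> str:
    """Build the filename for one scanned page and one derivative kind."""
    if kind not in SUFFIXES:
        raise ValueError(f"unknown kind {kind!r}; expected one of {sorted(SUFFIXES)}")
    return f"CFRA_Newsletter_{year:04d}_{month:02d}_{page}{SUFFIXES[kind]}"


for kind in SUFFIXES:
    print(page_filename(2010, 1, 1, kind))
```

For the first page of the January 2010 issue, this prints the four example names given above.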
Equipment Recommendations
The equipment recommendations for the CFRA are designed to meet the software needs listed above while providing the most cost-effective solutions.
Servers
Based on the research performed by Han, our team suggests the use of cloud computing resources. Cloud computing allows the CFRA to remove the operating cost of server maintenance. Cloud servers are also considered more secure in that the service is not tied to one physical location or one physical server, but is able to move from operational server to operational server with little to no downtime due to hardware malfunctions.
Scanners
For the format of material to be digitized through the CFRA, our team recommends that flatbed scanners be utilized for the best digitization results. A flatbed scanner gives the digitizer greater flexibility in material size and structure, which helps achieve the greatest clarity in imaging.
Works Cited
Andro, M., Asselin, E., & Maisonneuve, M. (2012). Digital libraries: Comparison of 10 software. Library Collections, Acquisitions, and Technical Services.
Cordell, R. (2011). New technologies to get your students engaged. Chronicle of Higher Education.
Goh, D. H. L., Chua, A., Khoo, D. A., Khoo, E. B. H., Mak, E. B. T., & Ng, M. W. M. (2006). A checklist for evaluating open source digital library software. Online Information Review, 30(4), 360-379.
Han, Y. (2013). On the clouds: a new way of computing. Information Technology and Libraries, 29(2), 87-92.
Harvey, R., & Bastian, J. A. (2012). Out of the classroom and into the laboratory: Teaching digital curation virtually and experientially. IFLA Journal, 38(1), 25-34.
Library of Congress. (2006). Technical Standards for Digital Conversion Of Text and Graphic Materials. Retrieved from http://memory.loc.gov/ammem/about/techStandards.pdf
Witten, I. H., Bainbridge, D., Paynter, G., & Boddie, S. (2002). Importing documents and metadata into digital libraries: Requirements analysis and an extensible architecture (pp. 390-405). Springer Berlin Heidelberg.