UA using Matrox Imaging OCR software to digitize 92,000 NASA film images

Caption: A typical film image from a Surveyor mission, with a CRT display (left) and associated data fields (right). This is the data that the University of Arizona is tasked with digitizing and storing. (Image credit: Matrox Imaging)

The University of Arizona's (UA; Tucson, AZ) Lunar and Planetary Laboratory (LPL) is home to the Space Imagery Center, a NASA Regional Planetary Image Facility. Founded in 1960, LPL was one of the few places engaged in studies of the solar system at that time. In 2015, NASA partnered with UA, providing funding to digitize the Surveyor moon lander film images and data that have been in storage since the 1960s. The goal is to create an archive for inclusion in the NASA Planetary Data System (PDS), a collection of data products from NASA planetary missions.

Between 1966 and 1968, the five successful Surveyor missions returned more than 92,000 individual images of the moon's surface. The film images were created by pointing a 70 mm film camera at a precision CRT display monitor and photographing the display onto special recording film. In the 50 years since, the computer files and videotape records have disappeared or become obsolete; the only existing copies of the images are the film rolls.

Many frames from the Surveyor missions had seemingly legible text, which the operators initially thought could easily be read by conventional optical character recognition (OCR) software. They soon discovered that the characters were built from a dot matrix, much like the output of old printers, using a 7 x 9 teletype-style character cell. This made it a challenge to find OCR software capable of accurately reading the text fields.
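To illustrate why dot-matrix text can trip up OCR that expects connected strokes, the following is a minimal sketch, not a description of the Matrox tool. It renders a hypothetical 7 x 9 glyph (a made-up "T", not taken from the Surveyor film) as isolated dots, counts the separate blobs, then counts again after a simple dilation that merges neighboring dots. The function names and the 3-pixel dot pitch are assumptions chosen for the example.

```python
# Hypothetical 7-wide by 9-tall dot-matrix "T" (illustrative only).
GLYPH_T = ["#######"] + ["...#..."] * 8


def render_dots(glyph, pitch=3):
    """Draw each '#' cell as one lit pixel, leaving dark gaps between dots."""
    height, width = len(glyph) * pitch, len(glyph[0]) * pitch
    image = [[0] * width for _ in range(height)]
    for r, row in enumerate(glyph):
        for c, cell in enumerate(row):
            if cell == "#":
                image[r * pitch + pitch // 2][c * pitch + pitch // 2] = 1
    return image


def dilate(image, radius=1):
    """Grow every lit pixel into a square of side 2 * radius + 1."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def count_blobs(image):
    """Count 8-connected groups of lit pixels with an iterative flood fill."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    blobs = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] and not seen[y][x]:
                blobs += 1
                seen[y][x] = True
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny in range(max(0, cy - 1), min(height, cy + 2)):
                        for nx in range(max(0, cx - 1), min(width, cx + 2)):
                            if image[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                stack.append((ny, nx))
    return blobs


raw = render_dots(GLYPH_T)
# The raw glyph is 15 separate dots; after dilation it is 1 merged shape.
print(count_blobs(raw), count_blobs(dilate(raw)))
```

A stroke-based recognizer would see the raw glyph as fifteen unrelated specks rather than one letter, which is one plausible reason general-purpose OCR struggled with these fields.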

This is where Matrox Imaging (Dorval, QC, Canada) comes in. The overall project involves creating a searchable archive that will outlast conventional physical media repositories. Given the possible long-term reference potential of the images and data, there is a need for careful and accurate treatment of the resources. The workflow was built around an image scanning system from Stokes Imaging, which captured between four and eight frames per minute as high-resolution .tif images.
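As a rough sense of scale, the capture rate quoted above can be turned into total scanner time. The sketch below assumes continuous scanning with no operator pauses, which understates the real effort, and treats 92,000 as a lower bound because the missions returned "more than" that many images. The function name is hypothetical.

```python
TOTAL_FRAMES = 92_000  # lower bound: the article says "more than 92,000"


def scan_hours(frames, frames_per_minute):
    """Hours of uninterrupted scanning at a steady capture rate."""
    return frames / frames_per_minute / 60


slow = scan_hours(TOTAL_FRAMES, 4)  # slowest quoted rate
fast = scan_hours(TOTAL_FRAMES, 8)  # fastest quoted rate
print(f"{fast:.0f} to {slow:.0f} hours of pure capture time")
```

Under these assumptions, capture alone takes roughly 190 to 380 hours before any manual intervention is counted.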

Operator interaction was intensive during the original scanning process. While the Stokes Imaging System was automated, the film itself was not uniform in spacing, indexing, exposure, or processing. Once the film was scanned, Adobe Photoshop and MATLAB software were used to pick out the details and create large composite mosaics from the image files. The process also required manual error checking, since the decoding of the dot-field data relied on calibration lookup tables created from the original 1966 pre-launch test data.

The project began in February 2015 with the assembly of the Stokes system, and the team continues to process, catalog, and data-mine the information contained within the images.

Even though there are sprocket perforations on the film stock, the original recording transport was sprocket-less, resulting in inconsistent frame spacing as well as frames drifting with respect to the edge perforations. The team at LPL was unable to determine a consistent film advance, and with each new roll of film, the spacing of the frames and the lateral positioning of the image shifted. As a result, the text appeared in different places from image to image, and some images were tainted with artifacts. Moreover, the data fields contain human-readable text (HRT) with a varying number of characters.

Matrox's solution, based on one of its efficient and accurate OCR software tools, addressed the problem of reading dot-matrix characters and reduced the time expenditure to a few minutes per roll.

The initial review of the Matrox OCR solution showed an almost perfect read from nearly 4,500 different image files. For example, for roll 1 of Mission 5, the Matrox OCR solution scanned 846 files and read 15,191 individual fields with 99.77% accuracy. Rolls 2 and 9 of Mission 5 were even better, yielding accuracy rates of 99.92% and 100%, respectively.
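The roll 1 figures can be unpacked with a little arithmetic. The sketch below assumes the 99.77% figure is a per-field accuracy rounded to two decimal places, which the article does not state explicitly, and searches for every whole-number misread count consistent with that rounding. The function name is hypothetical.

```python
FILES = 846
FIELDS = 15_191
REPORTED_PCT = 99.77  # assumed to be per-field accuracy, rounded to 2 decimals


def consistent_error_counts(fields, reported_pct, decimals=2):
    """Return every misread count whose rounded accuracy matches the report."""
    matches = []
    for errors in range(fields + 1):
        pct = round(100 * (fields - errors) / fields, decimals)
        if abs(pct - reported_pct) < 1e-9:
            matches.append(errors)
    return matches


errors = consistent_error_counts(FIELDS, REPORTED_PCT)
fields_per_file = FIELDS / FILES
print(errors, f"{fields_per_file:.1f} fields per file on average")
```

Under that assumption, the reported accuracy corresponds to 35 misread fields out of 15,191, with about 18 fields read per image file.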

To date, the Matrox OCR software has helped tackle data from Surveyor 5, and it is expected to prove a valuable tool during the cataloging and error checking of data from Surveyor 6 and 7, along with other mission materials from NASA projects and explorations.

