Analysis Of Historical Documents Through The Use Of Optical Character Recognition

Author(s)

Eleanor Burch

School Name

Governor's School for Science and Math

Grade Level

12th Grade

Presentation Topic

Math and Computer Science

Presentation Type

Mentored

Mentor

Dr. Saqib Bukhari; Knowledge Management, German Research Center for Artificial Intelligence

Oral Presentation Award

2nd Place

Abstract

As our world enters an electronic era, it has become important to preserve documents quickly and easily in an electronic format. The purpose of this project was to build upon a preexisting optical character recognition (OCR) system so that it could analyze and recognize the text in handwritten historical documents. The preexisting system, called OCRopus and created by researchers from the German Research Center for Artificial Intelligence in Kaiserslautern, Germany, was designed to recognize computer-generated documents that have a specific font and consistent spacing between words and characters. Historical documents, however, are handwritten, with varied spacing between words and characters, and contain characters that no longer exist in the modern alphabet. To examine handwritten documents, a program was written to divide lines of text into words. Individual characters can be recognized by finding the blank space between them, but the spacing between words varies, so the average spacing between words was computed and used to divide lines into words accurately. In addition, the grayscale images of text were binarized into black-and-white images in a way that eliminated as many random marks, or noise, on the page as possible.
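The abstract does not give the implementation, so the following is only a minimal sketch of how binarization and gap-based word splitting might look, not the project's actual code or any part of the OCRopus API. The function names (`binarize`, `blank_runs`, `split_words`), the fixed threshold of 128, and the rule that a gap wider than the mean gap width marks a word boundary are all assumptions made for illustration. The fixed threshold in particular does nothing to suppress noise, which the project's binarization step was designed to do.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Return a boolean array that is True where a pixel counts as ink.

    Assumption: dark ink on a light page, with a fixed global threshold.
    """
    return np.asarray(gray) < threshold


def blank_runs(ink):
    """Return (start, end) column ranges without ink, ignoring the margins."""
    has_ink = ink.any(axis=0)
    cols = np.flatnonzero(has_ink)
    if cols.size == 0:
        return []
    runs = []
    start = None
    for x in range(cols[0], cols[-1] + 1):
        if not has_ink[x]:
            if start is None:
                start = x          # a blank run begins here
        elif start is not None:
            runs.append((start, x))  # the blank run ended just before x
            start = None
    return runs


def split_words(ink):
    """Split one binarized text line into (start, end) column spans of words.

    Assumption: a gap wider than the mean gap width separates two words;
    narrower gaps are treated as spacing between characters.
    """
    has_ink = ink.any(axis=0)
    cols = np.flatnonzero(has_ink)
    if cols.size == 0:
        return []
    first, last = int(cols[0]), int(cols[-1]) + 1
    gaps = blank_runs(ink)
    if not gaps:
        return [(first, last)]
    mean_width = sum(end - start for start, end in gaps) / len(gaps)
    words = []
    left = first
    for start, end in gaps:
        if end - start > mean_width:
            words.append((left, start))
            left = end
    words.append((left, last))
    return words
```

Calling `split_words(binarize(line_image))` on a grayscale array holding a single text line returns one column span per detected word. A mean-based cutoff is the simplest reading of "average spacing"; real handwriting would likely need a more robust rule.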

Recommended Citation

Burch, Eleanor, "Analysis Of Historical Documents Through The Use Of Optical Character Recognition" (2016). South Carolina Junior Academy of Science. 85.
https://scholarexchange.furman.edu/scjas/2016/all/85

Location

Owens 207

Start Date

4-16-2016 9:15 AM
