Developing A Document Classifier Using A Part Of Speech Tagger

Emily Babb

School Name

Governor's School for Science and Math

Grade Level

12th Grade

Presentation Topic

Math and Computer Science

Presentation Type

Mentored

Mentor

Mentor: Dr. Rashid; Knowledge Management, German Research Center for Artificial Intelligence

Abstract

Natural language processing is a branch of artificial intelligence in which human language is interpreted and analyzed. Its techniques make it possible to summarize a document into a single paragraph, to translate text from one language to another, and to answer a posed question. The Natural Language Toolkit (NLTK) is a Python software library that offers helpful methods in this area of artificial intelligence. The overall goal of the research was to develop a classifier that could sort documents by type, such as email, essay, or joke, and by tone toward a subject, by tagging the words in each document with their respective parts of speech. As the research progressed, it became apparent that the NLTK part-of-speech tagger was not tagging with high accuracy. Therefore, I began to examine the NLTK part-of-speech tagger. Many documents of different types were tagged using NLTK. Those same documents were then manually tagged using a dictionary. The percent accuracy of the NLTK part-of-speech tagger was then determined, and steps were taken to improve the tagger, which was critical to the success of the classifier.
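The accuracy measurement described in the abstract can be illustrated with a minimal sketch. It assumes that both the automatic and the manual tagging are available as lists of (word, tag) pairs over the same tokens. The function name `tagging_accuracy` and the four-token sample are hypothetical and are not taken from the study. In practice the predicted tags might come from a call such as `nltk.pos_tag(nltk.word_tokenize(text))`, which is left out here so the example runs without NLTK installed.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, part-of-speech tag)


def tagging_accuracy(predicted: List[TaggedToken],
                     gold: List[TaggedToken]) -> float:
    """Return the percentage of tokens whose predicted tag matches the manual tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and manual taggings differ in length")
    if not gold:
        raise ValueError("no tokens to score")

    correct = 0
    for (p_word, p_tag), (g_word, g_tag) in zip(predicted, gold):
        # Both taggings must cover the same token sequence to be comparable.
        if p_word != g_word:
            raise ValueError(f"token mismatch: {p_word!r} vs {g_word!r}")
        if p_tag == g_tag:
            correct += 1
    return 100.0 * correct / len(gold)


# Illustrative data only (an assumption, not results from the study).
# Tags follow the Penn Treebank convention.
predicted = [("The", "DT"), ("duck", "NN"), ("can", "MD"), ("duck", "NN")]
gold = [("The", "DT"), ("duck", "NN"), ("can", "MD"), ("duck", "VB")]

accuracy = tagging_accuracy(predicted, gold)
print(f"{accuracy:.1f}%")  # 3 of 4 tags agree, so this prints 75.0%
```

Scoring per token rather than per document keeps a long document from being judged by a single wrong tag, and the token check guards against the two taggings having been produced from differently tokenized text.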

Recommended Citation

Babb, Emily, "Developing A Document Classifier Using A Part Of Speech Tagger" (2016). South Carolina Junior Academy of Science. 82.
https://scholarexchange.furman.edu/scjas/2016/all/82

Location

Owens 207

Start Date

4-16-2016 8:30 AM

