The Effect of Lexicon Based and Machine Learning Based Sentiment Analysis on the Accuracy of a Stock Market Prediction System

Author(s)

Christopher Li

School Name

Spring Valley High School

Grade Level

10th Grade

Presentation Topic

Computer Science

Presentation Type

Non-Mentored

Abstract

As social media and technology connect more of the world, sentiment analysis, the analysis of public opinion, has become an increasingly common form of data mining. This study examined whether sentiment analysis can increase the accuracy of stock market prediction systems. Two of the main methods of sentiment analysis were compared: the lexicon-based method and the machine learning-based method. It was hypothesized that machine learning-based sentiment analysis would benefit the prediction system the most. Over 45,000 Tweets about Apple, Google, and Microsoft were analyzed for sentiment with a lexicon program and a machine learning program. The independent variable was the set of features used to train the stock market prediction system, with three levels: past stock data only, past stock data with sentiment values from the lexicon program, and past stock data with sentiment values from the machine learning program. An ANOVA test revealed that the accuracies of the three prediction systems were significantly different, F(2, 89997) = 689.51, p < 0.00001. The Tukey post hoc test led to rejection of the hypothesis, indicating that the sentiment analysis methods worsened the accuracy of the prediction system. Although sentiment analysis did not benefit the stock market predictor in this study, many parameters could be changed in future work, such as the lexicon dictionary and the type of machine learning program used for sentiment analysis.
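
The reported statistic comes from comparing three groups of accuracy values, and a minimal sketch of that kind of comparison may help readers unfamiliar with ANOVA. The function name one_way_anova and the sample values below are hypothetical and are not the study's code or data; the sketch only shows how an F statistic and its two degrees of freedom are obtained from k groups and N total observations (k - 1 and N - k). Under that formula, the reported F(2, 89997) would correspond to three groups and 90,000 accuracy measurements in total.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = 0.0  # spread of the group means around the grand mean
    ss_within = 0.0   # spread of each observation around its own group mean
    for g in groups:
        m = mean(g)
        ss_between += len(g) * (m - grand_mean) ** 2
        ss_within += sum((x - m) ** 2 for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical accuracy samples, one list per feature set (NOT the study's data).
stocks_only = [0.62, 0.60, 0.61, 0.63]
stocks_plus_lexicon = [0.55, 0.57, 0.56, 0.54]
stocks_plus_ml = [0.58, 0.59, 0.57, 0.60]

f_stat, df_b, df_w = one_way_anova(
    [stocks_only, stocks_plus_lexicon, stocks_plus_ml]
)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

A significant F only indicates that at least one group mean differs; a post hoc procedure such as the Tukey test named in the abstract is what identifies which pairs of groups differ.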

Location

Furman Hall 109

Start Date

3-28-2020 9:00 AM

Presentation Format

Oral and Written

Group Project

No
