The Effect of Lexicon Based and Machine Learning Based Sentiment Analysis on the Accuracy of a Stock Market Prediction System
School Name
Spring Valley High School
Grade Level
10th Grade
Presentation Topic
Computer Science
Presentation Type
Non-Mentored
Abstract
In a world increasingly connected by social media and technology, sentiment analysis, the analysis of public opinion, has become a common form of data mining. This study examined whether sentiment analysis can increase the accuracy of stock market prediction systems. Two of the main methods of sentiment analysis were compared: the lexicon-based method and the machine-learning-based method. It was hypothesized that machine learning sentiment analysis would benefit the prediction system the most. Over 45,000 Tweets about Apple, Google, and Microsoft were analyzed for sentiment with a lexicon program and a machine learning program. The independent variable was the set of features used to train the stock market prediction system, with three levels: past stock data only, past stock data with sentiment values from the lexicon program, and past stock data with sentiment values from the machine learning program. An ANOVA test revealed that the accuracies of the three prediction systems were significantly different, F(2, 89997) = 689.51, p < 0.00001. A Tukey post hoc test did not support the hypothesis, indicating that the sentiment analysis methods worsened the accuracy of the prediction system. Although sentiment analysis did not benefit the stock market predictor in this study, many parameters could be changed in future studies, such as the lexicon dictionary and the type of machine learning program used for sentiment analysis.
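For readers unfamiliar with the statistic reported above, the following is a minimal sketch of a one-way ANOVA F computation in plain Python. It is not the author's code, and the accuracy values are made-up placeholders, not data from the study. Assuming a standard one-way ANOVA was used, the reported degrees of freedom (2 and 89997) correspond to three groups and 90,000 accuracy observations in total, since the between-group degrees of freedom are k - 1 and the within-group degrees of freedom are N - k.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical accuracies for the three feature sets (illustration only).
stocks_only = [0.62, 0.60, 0.61, 0.63]
stocks_plus_lexicon = [0.55, 0.56, 0.54, 0.57]
stocks_plus_ml = [0.57, 0.58, 0.56, 0.59]

f_stat, df_between, df_within = one_way_anova(
    [stocks_only, stocks_plus_lexicon, stocks_plus_ml]
)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")

# Total observations implied by the reported F(2, 89997): N = df_within + k.
implied_total = 89997 + (2 + 1)
```

A significant F value only indicates that at least one group mean differs; a post hoc test such as Tukey's is still needed to identify which pairs differ and in which direction.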
Recommended Citation
Li, Christopher, "The Effect of Lexicon Based and Machine Learning Based Sentiment Analysis on the Accuracy of a Stock Market Prediction System" (2020). South Carolina Junior Academy of Science. 91.
https://scholarexchange.furman.edu/scjas/2020/all/91
Location
Furman Hall 109
Start Date
3-28-2020 9:00 AM
Presentation Format
Oral and Written
Group Project
No