Predicting Secondary Structures Of Proteins Using Deep Learning And SVM

Author(s)

Stone Holt

School Name

South Carolina Governor's School for Science and Mathematics

Grade Level

12th Grade

Presentation Topic

Math and Computer Science

Presentation Type

Mentored

Mentor

Mentor: Feng Luo, School of Computing, Clemson University

Abstract

Proteins are biological macromolecules that perform most of the functions in living cells. To function correctly, a protein must fold into its secondary structure and then into its three-dimensional structure. The most common secondary structures in proteins are alpha-helices and beta-sheets. Computational prediction of protein secondary structure is important for understanding protein function. Many algorithms have attempted to predict the secondary structure of proteins; however, because of the complexity of the problem, current algorithms have not yet achieved high accuracy. In this study, we used Deep Learning and a Support Vector Machine (SVM) to predict the secondary structures of proteins. We built models from a training set and applied them to a testing set, then evaluated the accuracy and run time of both algorithms. The training set contained 10,000 amino acids and the testing set contained 3,400 amino acids. Several trials with different parameters (six for Deep Learning and eighty-eight for SVM) were run to find the best results for each method. With its best parameters, SVM proved to be both faster and more accurate. The highest accuracy with SVM was 71.44% with a run time of about six minutes, while the highest accuracy with Deep Learning was 65.15% with a run time of a little under nine minutes. These results indicate that SVM has an advantage over Deep Learning in this setting.
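The two quantities compared in the abstract, accuracy and run time, can be illustrated with a minimal sketch. It is not the code used in the study. It assumes that accuracy means the percentage of residues whose predicted label matches the true label, and that labels follow the common three-letter convention of H (helix), E (sheet), and C (other). The function names `per_residue_accuracy` and `timed` are hypothetical.

```python
import time


def per_residue_accuracy(predicted, actual):
    """Return the percentage of positions where the predicted label matches.

    Assumption: one label per amino acid, e.g. 'H', 'E', or 'C'.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot score an empty sequence")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * matches / len(actual)


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    true_labels = "HHHHCCEEEC"
    predicted_labels = "HHHCCCEEEE"
    score, seconds = timed(per_residue_accuracy, predicted_labels, true_labels)
    print(f"accuracy: {score:.2f}%  run time: {seconds:.6f} s")
```

In a real comparison, `timed` would wrap the training and prediction steps of each model so that both methods are measured the same way on the same testing set.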

Start Date

4-11-2015 10:00 AM

End Date

4-11-2015 10:15 AM
