Clustering Of Single Cell Using Locality Preserving Projection

Author(s)

Xiang Li

School Name

Governor's School for Science and Math

Grade Level

12th Grade

Presentation Topic

Math and Computer Science

Presentation Type

Mentored

Mentor

Dr. Luo, School of Computing, Clemson University

Abstract

Clustering is a technique for separating a collection of data into groups, or clusters, based on the attributes of the data points. Large datasets often contain unnecessary characteristics, referred to as noise, that outweigh the components that actually matter when clustering. K-means clustering is an unsupervised learning algorithm best known for its simple method of calculation. Because of that simplicity, however, it is sensitive to noise, and it is most effective on datasets with lower dimensionality. To improve the performance of k-means, a dataset can first be passed through a dimensionality-reduction algorithm. Locality Preserving Projection (LPP), one of the more widely accepted dimensionality-reduction algorithms, was applied to data from different cells to reduce the number of dimensions from thousands to tens, making the clustering process more efficient. The Adjusted Rand Index (ARI) was used to evaluate the accuracy of the clustering. ARI measures the similarity between two clusterings of the same data, so comparing a manually clustered set of data, used as a reference, with the clusters generated by k-means yields an accuracy score. A higher ARI score means the resulting clustering is closer to the reference clustering. Clustering was performed on both the unaltered and the dimensionality-reduced datasets. The results of each were compared with the manually created reference clusters, and ARI scores were calculated. The ARI of the LPP-processed data was considerably higher, and the processing time was significantly reduced.
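The ARI comparison described in the abstract can be illustrated with a short sketch. This is not the code used in the project; it is a minimal pure-Python version of the standard Adjusted Rand Index formula, and the function name `adjusted_rand_index` and the toy label lists are assumptions made only for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two clusterings of the same items.

    labels_true: reference (e.g. manually assigned) cluster labels.
    labels_pred: labels produced by a clustering algorithm such as k-means.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both label lists must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the clusterings trivially agree

    # Contingency counts: how many items share each (true, predicted) label pair.
    joint = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)

    # Number of item pairs placed together in both, or in each, clustering.
    index = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in true_sizes.values())
    sum_pred = sum(comb(c, 2) for c in pred_sizes.values())

    # Agreement expected by chance, and the maximum possible agreement.
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every item in a single cluster

    return (index - expected) / (max_index - expected)


# Hypothetical toy labels (not data from the study).
reference = [0, 0, 1, 1, 2, 2]
same_groups_renamed = [1, 1, 2, 2, 0, 0]
print(adjusted_rand_index(reference, same_groups_renamed))  # 1.0
```

Because the score depends only on which items are grouped together, renaming the clusters does not change it. A score of 1 indicates identical groupings, and a score near 0 indicates agreement no better than chance.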

Location

Owens 207

Start Date

4-16-2016 12:00 PM
