The Effect of Structural Variation on Gene Expression
School Name
Easley High School
Grade Level
12th Grade
Presentation Topic
Cell and Molecular Biology
Presentation Type
Non-Mentored
Abstract
Short-read RNA sequencing captures only fragments of gene transcripts. Full transcripts must therefore be reconstructed computationally, which limits what is known about transcript diversity. In contrast, long-read sequencing captures full-length RNA transcripts without reconstruction. Mahmoud et al. identified 389 medically relevant genes. From these, glucose-6-phosphate isomerase (GPI) on chromosome 19 and its neighboring genes (GARRE1, PDCD2L, and UBA2) were selected because of their links to genetic disorders. Across 130 haplotypes from 65 diverse individuals in the Human Genome Structural Variation Consortium (HGSVC), repeats were masked with RepeatMasker using the Dfam library. Gene and exon locations were determined with Ensembl (release 113). A custom exon library was created, retaining only exon hits with less than 5% divergence. For each haplotype, the intronic region between exons 9 and 10 of GPI was identified and analyzed with k-mers of 15 to 79 bases. The 64 unique k-mers were aligned with MUSCLE, producing a 50-base consensus sequence termed the dark region repeat consensus sequence. This sequence was used as an nhmmer query, and hits with E-values below 0.01 were retained. Multiple sequence alignment with Clustal Omega revealed eight network-based component consensus sequences (NCCs), which were used to reannotate the region and yielded 13 unique GPI dark region haplotypes. A key structural variant was a deletion in one individual's haplotype that resulted in the loss of a novel isoform. This study highlights the ability of long-read sequencing to uncover previously unknown transcript variation.
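The exon-filtering step (keeping only exon hits with less than 5% divergence) can be sketched as below. This assumes the exon hits were reported in RepeatMasker's standard whitespace-delimited .out table, where the second column is percent divergence; the function names and the exon labels such as `GPI_exon9` are hypothetical and not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RepeatHit:
    score: int          # Smith-Waterman score (column 1)
    perc_div: float     # percent divergence from the library sequence (column 2)
    query: str          # query sequence name, e.g. a haplotype contig
    start: int          # hit start in the query
    end: int            # hit end in the query
    strand: str         # "+" or "C" (complement)
    repeat_name: str    # matching library entry; exon names here are hypothetical


def parse_repeatmasker_out(lines: Iterable[str]) -> List[RepeatHit]:
    """Parse the whitespace-delimited table of a RepeatMasker .out file (assumed format)."""
    hits = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and header rows, whose first field is not a numeric score
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        hits.append(RepeatHit(
            score=int(fields[0]),
            perc_div=float(fields[1]),
            query=fields[4],
            start=int(fields[5]),
            end=int(fields[6]),
            strand=fields[8],
            repeat_name=fields[9],
        ))
    return hits


def low_divergence_hits(hits: List[RepeatHit], max_div: float = 5.0) -> List[RepeatHit]:
    """Keep only hits whose divergence is strictly below max_div percent."""
    return [h for h in hits if h.perc_div < max_div]


# Made-up rows in the .out layout, for illustration only
sample = [
    "   SW   perc perc perc  query  position in query  matching  repeat  position in repeat",
    "score   div. del. ins.  sequence  begin  end  (left)  repeat  class/family  begin  end  (left)  ID",
    "",
    "  1200   2.1  0.0  0.0  hap1  1001  1150  (500)  +  GPI_exon9  exon  1  150  (0)  1",
    "   800   7.4  1.0  0.5  hap1  2001  2100  (400)  C  GPI_exon10  exon  (0)  100  1  2",
]
kept = low_divergence_hits(parse_repeatmasker_out(sample))
print([h.repeat_name for h in kept])  # ['GPI_exon9']
```

The strict `<` comparison mirrors "less than 5% divergence"; a hit at exactly 5.0% would be dropped.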
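The abstract does not spell out how the intron between GPI exons 9 and 10 was "analyzed using k-mers (15-79 bases)". One plausible reading, offered only as an assumption, is a scan for k-mers that occur more than once in each haplotype's intron across every k in that range. The sketch below does that with a simple counter; the haplotype names and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def repeated_kmers(seq: str, k: int, min_count: int = 2) -> Set[str]:
    """Return every k-mer that occurs at least min_count times in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n >= min_count}


def collect_repeated_kmers(
    haplotypes: Dict[str, str], k_min: int = 15, k_max: int = 79, min_count: int = 2
) -> Set[str]:
    """Union of repeated k-mers over all haplotypes and every k in [k_min, k_max]."""
    found: Set[str] = set()
    for seq in haplotypes.values():
        for k in range(k_min, k_max + 1):
            found |= repeated_kmers(seq, k, min_count)
    return found


# Toy "introns" with hypothetical names; real inputs would be each haplotype's exon 9-10 intron
toy = {"hap1": "AAACCCAAACCC", "hap2": "GGGTTTGGGTTT"}
print(sorted(collect_repeated_kmers(toy, k_min=6, k_max=6)))  # ['AAACCC', 'GGGTTT']
```

Because every substring of a repeated k-mer is itself repeated, a real analysis would also need to collapse nested k-mers before counting how many unique ones remain; that step is left out of this sketch.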
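After the k-mers are aligned (the study used MUSCLE), a consensus must be read off the alignment. The abstract does not say which consensus rule produced the 50-base dark region repeat consensus, so the sketch below shows a generic majority-rule consensus as an illustrative assumption. It works on already aligned, equal-length strings with `-` as the gap character and skips columns where a gap is the most common symbol.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned: List[str], gap: str = "-", min_fraction: float = 0.5) -> str:
    """Majority-rule consensus of equal-length aligned sequences (generic sketch)."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned)
        symbol, n = counts.most_common(1)[0]
        # Drop columns where a gap wins or the top residue falls below the threshold
        if symbol == gap or n / len(aligned) < min_fraction:
            continue
        consensus.append(symbol)
    return "".join(consensus)


# Toy alignment, not data from the study
aligned = ["ACGT-A", "ACGTTA", "AC-T-A", "TCGT-A"]
print(majority_consensus(aligned))  # ACGTA
```

When two residues tie for the top count, `Counter.most_common` returns the one seen first, so a production version would want an explicit tie rule such as an IUPAC ambiguity code.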
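The nhmmer step keeps hits with E-values below 0.01. Assuming the search was run with nhmmer's `--tblout` option, each non-comment row has 16 whitespace-separated columns with the E-value in column 13, and the threshold can be applied as sketched below. The target and query names in the toy rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NhmmerHit:
    target: str
    ali_from: int
    ali_to: int
    strand: str
    evalue: float
    score: float


def parse_nhmmer_tblout(lines: Iterable[str], max_evalue: float = 0.01) -> List[NhmmerHit]:
    """Read nhmmer --tblout rows and keep hits with E-value strictly below max_evalue."""
    hits = []
    for line in lines:
        # Comment lines start with '#'; skip them and blank lines
        if not line.strip() or line.startswith("#"):
            continue
        # 16 columns; the last one (target description) may contain spaces
        fields = line.split(maxsplit=15)
        evalue = float(fields[12])
        if evalue < max_evalue:
            hits.append(NhmmerHit(
                target=fields[0],
                ali_from=int(fields[6]),
                ali_to=int(fields[7]),
                strand=fields[11],
                evalue=evalue,
                score=float(fields[13]),
            ))
    return hits


# Toy rows; on the minus strand nhmmer reports ali_from > ali_to
rows = [
    "# target name  accession  query name  accession  ...",
    "hap1_intron  -  dark_repeat  -  1  50  1201  1250  1195  1255  3000  +  2.3e-08  35.1  0.4  -",
    "hap1_intron  -  dark_repeat  -  3  48  2095  2050  2100  2040  3000  -  0.05  12.0  0.9  -",
]
kept = parse_nhmmer_tblout(rows)
print([(h.ali_from, h.ali_to, h.evalue) for h in kept])  # [(1201, 1250, 2.3e-08)]
```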
Recommended Citation
Rodriguez, Alejandra, "The Effect of Structural Variation on Gene Expression" (2025). South Carolina Junior Academy of Science. 31.
https://scholarexchange.furman.edu/scjas/2025/all/31
Location
PENNY 201
Start Date
4-5-2025 9:45 AM
Presentation Format
Oral and Written
Group Project
No