Furman University Scholar Exchange - South Carolina Junior Academy of Science: The Effect of Structural Variation on Gene Expression
 

The Effect of Structural Variation on Gene Expression

School Name

Easley High School

Grade Level

12th Grade

Presentation Topic

Cell and Molecular Biology

Presentation Type

Non-Mentored

Abstract

Short-read RNA sequencing captures only fragments of gene transcripts, so transcripts must be reconstructed computationally, which limits what is known about transcript diversity. In contrast, long-read sequencing provides full-length RNA transcripts without reconstruction. Mahmoud et al. identified 389 medically relevant genes; from these, glucose-6-phosphate isomerase (GPI) on chromosome 19 and its neighboring genes (GARRE1, PDCD2L, UBA2) were selected because of their links to genetic disorders. The analysis covered 130 haplotypes from 65 diverse individuals in the Human Genome Structural Variation Consortium (HGSVC). Repeat masking was performed using RepeatMasker with the Dfam library, and gene and exon locations were determined using Ensembl (release 113). A custom exon library was created, retaining only exon hits with less than 5% divergence. For each haplotype, the intronic region between exons 9 and 10 of GPI was identified and analyzed using k-mers (15-79 bases). The 64 unique k-mers were aligned with MUSCLE, producing a 50-base consensus sequence dubbed the dark region repeat consensus sequence. This sequence was searched against the region with nhmmer, retaining hits with E-values below 0.01. Multiple sequence alignment using Clustal Omega revealed eight network-based component consensus sequences (NCCs), which were used to reannotate the region, yielding 13 unique GPI dark region haplotypes. A key structural variant was a deletion in one individual's haplotype that led to the loss of a novel isoform. This study highlights the ability of long-read sequencing to uncover previously unknown transcript variation.
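The k-mer step above can be illustrated with a minimal sketch. The abstract does not say how the k-mers were chosen, so this example assumes that k-mers occurring more than once in the intronic sequence are the ones of interest. The function name `repeated_kmers` and the `min_count` threshold are hypothetical and not taken from the study; only the 15-79 base range comes from the abstract.

```python
from collections import Counter


def repeated_kmers(seq, k_min=15, k_max=79, min_count=2):
    """Return {kmer: count} for k-mers of length k_min..k_max seen at least min_count times.

    Assumption: "repeated" means occurring min_count or more times in seq;
    the study's actual selection criterion is not given in the abstract.
    """
    seq = seq.upper()
    repeated = {}
    for k in range(k_min, k_max + 1):
        if k > len(seq):
            break  # no windows of this length (or longer) fit in the sequence
        # Count every overlapping window of length k.
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        repeated.update({kmer: n for kmer, n in counts.items() if n >= min_count})
    return repeated


# Toy input: a made-up 20-base unit repeated three times (not real GPI sequence).
unit = "ACGTTGCAAGCTTAGGCTAA"
toy_intron = unit * 3
hits = repeated_kmers(toy_intron, k_min=15, k_max=20)
```

In a pipeline like the one described, the resulting unique k-mers would then be passed to a multiple sequence aligner to build a consensus; that step is not shown here.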

Location

PENNY 201

Start Date

4-5-2025 9:45 AM

Presentation Format

Oral and Written

Group Project

No
