Johns Hopkins Researchers Unveil AI Tool Splam for Gene Splicing Analysis

Edited by: Надежда Садикова

Johns Hopkins researchers have developed a powerful new AI tool called Splam that can identify where splicing occurs in genes. This advancement could assist scientists in analyzing genetic data with greater accuracy, offering new insights into gene function and the role of mutations in disease.

The results of this research are published in Genome Biology.

Splam recognizes splice sites, the boundaries where cells cut non-essential segments out of a gene's RNA copy. Locating these sites is a crucial step in assembling gene transcripts and identifying the functional parts of DNA, and it could help researchers explore potential links between mutations and disease.

“Precisely identifying splicing sites is key to understanding how cells interpret genetic instructions,” says co-lead author Kuan-Hao Chao, a doctoral student in the Whiting School of Engineering's Department of Computer Science, affiliated with the Center for Computational Biology (CCB). “Splam lets us analyze genetic data with accuracy and efficiency, showing how mutations affect our health and why the same gene can produce different proteins in different conditions.”

Chao's co-authors are his advisors, Steven Salzberg, the Bloomberg Distinguished Professor of Computational Biology and Genomics and director of the CCB, and Mihaela Pertea, an associate professor of biomedical engineering and genetic medicine; and Alan Mao, a fourth-year undergraduate majoring in biomedical engineering and computer science.

Cells rely on genes to guide their functions. Genes contain both useful instructions (exons) and non-essential segments (introns). Splicing is the process by which cells remove the introns and retain only the exons.
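As a rough illustration of what splicing does to a sequence, the following Python sketch removes introns at known coordinates and joins the remaining exons. The function name `splice`, the coordinate convention, and the toy sequence are assumptions made for this example; real cells locate these boundaries biochemically, and finding them computationally is the problem Splam addresses.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns and join the exons.

    Each intron is a (start, end) pair: 0-based, end-exclusive.
    This is an illustrative convention, not one taken from Splam.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        if not (cursor <= start < end <= len(pre_mrna)):
            raise ValueError(f"invalid or overlapping intron: {(start, end)}")
        exons.append(pre_mrna[cursor:start])  # keep the exon before this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # keep the final exon
    return "".join(exons)


# Toy sequence: exon + intron + exon. The intron starts with GT and ends
# with AG, the most common boundary pattern (written here as DNA letters).
exon1, intron, exon2 = "AAACAG", "GTAAGTCCCTTTAG", "GGGTTT"
pre_mrna = exon1 + intron + exon2

print(splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))]))
# prints AAACAGGGGTTT
```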

Recognizing splice sites computationally is crucial for accurately assembling gene transcripts from RNA sequencing data. RNA sequencing experiments measure gene expression levels under different conditions, showing whether a gene is active or inactive.

“For example, cancer researchers often use RNA sequencing techniques to compare gene expression in healthy versus cancerous cells,” explains Chao.

Identifying splice sites is also significant in genome annotation, which involves determining functional parts of DNA. Genetic testing services, such as those offered by 23andMe, utilize genome annotation to provide insights into ancestry, health risks, and genetic traits.

Compared to the state-of-the-art SpliceAI tool, the Hopkins team's Splam method uses a shorter DNA sequence window to predict RNA splice sites, making it more biologically realistic for research purposes. The Splam algorithm processes a DNA sequence of 800 nucleotides and outputs, for each base, the probability that it is a donor site, an acceptor site, or neither.
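To make the shape of that output concrete, here is a minimal Python sketch of a per-base, three-class prediction over an 800-nucleotide input. It is not Splam's model: `toy_scores` is a hypothetical stand-in that simply rewards GT and AG dinucleotides, where the real tool uses a trained deep neural network. Only the window length and the three output classes come from the description above.

```python
import math

WINDOW = 800                                # input length reported for Splam
CLASSES = ("donor", "acceptor", "neither")  # the three per-base outputs


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_scores(seq: str, i: int) -> list[float]:
    """Hypothetical stand-in for a trained network (NOT Splam's model).

    Rewards a GT starting at position i (donor-like) and an AG ending
    at position i (acceptor-like); everything else favors "neither".
    """
    donor = 2.0 if seq[i:i + 2] == "GT" else 0.0
    acceptor = 2.0 if i >= 1 and seq[i - 1:i + 1] == "AG" else 0.0
    return [donor, acceptor, 1.0]


def predict(seq: str) -> list[dict[str, float]]:
    """Return one {class: probability} mapping per base in the window."""
    if len(seq) != WINDOW:
        raise ValueError(f"expected {WINDOW} nucleotides, got {len(seq)}")
    seq = seq.upper()
    return [dict(zip(CLASSES, softmax(toy_scores(seq, i))))
            for i in range(WINDOW)]


# Usage: a filler sequence with a single GT placed at position 400.
sequence = "C" * 400 + "GT" + "C" * 398
probabilities = predict(sequence)
best = max(probabilities[400], key=probabilities[400].get)
print(best)  # prints donor
```

The softmax step is what guarantees that the three values at each position can be read as probabilities; the scoring rule in front of it is the part a trained model would replace.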

“Our algorithm attempts to recognize these donor/acceptor sites in pairs, similar to how a spliceosome molecular machine operates in cells,” states Chao.

The team developed their algorithm to recognize splice junctions within a window of 800 nucleotides, significantly smaller than the 10,000 nucleotides required by SpliceAI. Despite using far less sequence context, Splam achieves better splice junction recognition accuracy.
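One way to picture an 800-nucleotide input for a donor/acceptor pair is sketched below in Python. The layout is an assumption for illustration, not a detail given in this article: it takes 400 nucleotides centered on the donor position and 400 centered on the acceptor position, concatenates them, and pads with N where a window runs off the end of the sequence. The names `centered_window` and `junction_window` are hypothetical.

```python
def centered_window(genome: str, center: int, flank: int) -> str:
    """Return 2 * flank nucleotides around `center`, padded with N at edges."""
    left, right = center - flank, center + flank
    pad_left = "N" * max(0, -left)
    pad_right = "N" * max(0, right - len(genome))
    return pad_left + genome[max(0, left):min(len(genome), right)] + pad_right


def junction_window(genome: str, donor: int, acceptor: int,
                    flank: int = 200) -> str:
    """Build one model input for a candidate splice junction.

    Assumed layout (not from the article): donor window followed by
    acceptor window, each 2 * flank long, 800 nucleotides in total
    with the default flank of 200.
    """
    if not 0 <= donor < acceptor <= len(genome):
        raise ValueError("expected 0 <= donor < acceptor <= len(genome)")
    return (centered_window(genome, donor, flank)
            + centered_window(genome, acceptor, flank))


# Usage on a synthetic 2,000-nucleotide sequence.
genome = "ACGT" * 500
window = junction_window(genome, donor=300, acceptor=1500)
print(len(window))  # prints 800
```

Under this layout the input size stays fixed at 800 nucleotides regardless of how long the intron between the two sites is.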

After training their deep learning model on human DNA, the researchers tested it on the genomes of a chimpanzee, a mouse, and a flowering plant. Their experiments confirmed that Splam's design produced accurate results even on these more distantly related genomes, indicating that it learned essential splicing patterns shared across various species.

The team's next steps involve applying the model to more species and integrating it into existing RNA sequencing pipelines for practical use in transcriptome assembly. “Our method has immediate applications in improving transcriptome assembly and reducing splicing noise, making it valuable for a wide range of genomic studies,” concludes Chao.
