Nucleotide Transformer: building and evaluating robust foundation models for human genomics

H Dalla-Torre, L Gonzalez, J Mendoza-Revilla… - Nature …, 2024 - nature.com
The prediction of molecular phenotypes from DNA sequences remains a longstanding
challenge in genomics, often driven by limited annotated data and the inability to transfer …

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Y Ji, Z Zhou, H Liu, RV Davuluri - Bioinformatics, 2021 - academic.oup.com
Motivation: Deciphering the language of non-coding DNA is one of the fundamental
problems in genome research. Gene regulatory code is highly complex due to the existence …

Learning the regulatory code of gene expression

J Zrimec, F Buric, M Kokina, V Garcia… - Frontiers in Molecular …, 2021 - frontiersin.org
Data-driven machine learning is the method of choice for predicting molecular phenotypes
from nucleotide sequence, modeling gene expression events including protein-DNA …

Principles and correction of 5'-splice site selection

F Malard, CD Mackereth, S Campagne - RNA biology, 2022 - Taylor & Francis
In Eukarya, immature mRNA transcripts (pre-mRNA) often contain coding sequences, or
exons, interleaved by non-coding sequences, or introns. Introns are removed upon splicing …

Spliceator: multi-species splice site prediction using convolutional neural networks

N Scalzitti, A Kress, R Orhand, T Weber, L Moulinier… - BMC …, 2021 - Springer
Background: Ab initio prediction of splice sites is an essential step in eukaryotic genome
annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene …

pipeComp, a general framework for the evaluation of computational pipelines, reveals performant single cell RNA-seq preprocessing tools

PL Germain, A Sonrel, MD Robinson - Genome biology, 2020 - Springer
Abstract: We present pipeComp (https://github.com/plger/pipeComp), a flexible R framework
for pipeline comparison handling interactions between analysis steps and relying on multi …

Computational identification of N6-methyladenosine sites in multiple tissues of mammals

FY Dao, H Lv, YH Yang, H Zulfiqar, H Gao… - Computational and …, 2020 - Elsevier
Abstract: N6-methyladenosine (m6A) is the methylation of the adenosine at the nitrogen-6
position, which is the most abundant RNA methylation modification and involves a series of …

DL-m6A: Identification of N6-methyladenosine Sites in Mammals using deep learning based on different encoding schemes

MU Rehman, H Tayara… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
N6-methyladenosine (m6A) is a common post-transcriptional alteration that plays a critical
function in a variety of biological processes. Although experimental approaches for …

Neural network analysis

A Joshi, J Sasumana, NM Ray, V Kaushik - Advances in bioinformatics, 2021 - Springer
Neural networks play a very significant role when it comes to analysis of proteins and nucleic
acid sequences. Many of the pattern recognition software are based on neural networks for …

Progress and opportunities of foundation models in bioinformatics

Q Li, Z Hu, Y Wang, L Li, Y Fan, I King… - Briefings in …, 2024 - academic.oup.com
Bioinformatics has undergone a paradigm shift in artificial intelligence (AI), particularly
through foundation models (FMs), which address longstanding challenges in bioinformatics …