Emerging evidence for functional peptides encoded by short open reading frames

SJ Andrews, JA Rothnagel - Nature Reviews Genetics, 2014 - nature.com
Short open reading frames (sORFs) are a common feature of all genomes, but their coding
potential has mostly been disregarded, partly because of the difficulty in determining …

Pfam: the protein families database

RD Finn, A Bateman, J Clements, P Coggill… - Nucleic acids …, 2014 - academic.oup.com
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA
(http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually …

Learning the protein language: Evolution, structure, and function

T Bepler, B Berger - Cell systems, 2021 - cell.com
Language models have recently emerged as a powerful machine-learning approach for
distilling information from massive protein sequence databases. From readily available …

Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences

A Rives, J Meier, T Sercu, S Goyal… - Proceedings of the …, 2021 - National Acad Sciences
In the field of artificial intelligence, a combination of scale in data and model capacity
enabled by unsupervised learning has led to major advances in representation learning and …

Marine DNA viral macro- and microdiversity from pole to pole

AC Gregory, AA Zayed, N Conceição-Neto… - Cell, 2019 - cell.com
Microbes drive most ecosystems and are modulated by viruses that impact their lifespan,
gene flow, and metabolic outputs. However, ecosystem-level impacts of viral community …

A chromosome conformation capture ordered sequence of the barley genome

M Mascher, H Gundlach, A Himmelbach, S Beier… - Nature, 2017 - nature.com
Cereal grasses of the Triticeae tribe have been the major food source in temperate regions
since the dawn of agriculture. Their large genomes are characterized by a high content of …

ProGen: Language modeling for protein generation

A Madani, B McCann, N Naik, NS Keskar… - arXiv preprint arXiv …, 2020 - arxiv.org
Generative modeling for protein engineering is key to solving fundamental problems in
synthetic biology, medicine, and material science. We pose protein engineering as an …

Anaerobic methane oxidation coupled to manganese reduction by members of the Methanoperedenaceae

AO Leu, C Cai, SJ McIlroy, G Southam… - The ISME …, 2020 - academic.oup.com
Anaerobic oxidation of methane (AOM) is a major biological process that reduces global
methane emission to the atmosphere. Anaerobic methanotrophic archaea (ANME) mediate …

MUSCLE v5 enables improved estimates of phylogenetic tree confidence by ensemble bootstrapping

RC Edgar - bioRxiv, 2021 - biorxiv.org
Phylogenetic tree confidence is often estimated from a multiple sequence alignment (MSA)
using the Felsenstein bootstrap heuristic. However, this does not account for systematic …

Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool

EY Chen, CM Tan, Y Kou, Q Duan, Z Wang… - BMC …, 2013 - Springer
Background: System-wide profiling of genes and proteins in mammalian cells produce lists of
differentially expressed genes/proteins that need to be further analyzed for their collective …