Using deep learning to annotate the protein universe

ML Bileschi, D Belanger, DH Bryant, T Sanderson… - Nature …, 2022 - nature.com
Understanding the relationship between amino acid sequence and protein function is a long-
standing challenge with far-reaching scientific and translational implications. State-of-the-art …

High speed BLASTN: an accelerated MegaBLAST search tool

Y Chen, W Ye, Y Zhang, Y Xu - Nucleic acids research, 2015 - academic.oup.com
Sequence alignment is a long-standing problem in bioinformatics. The Basic Local
Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools …

Computational biology in the 21st century: Scaling with compressive algorithms

B Berger, NM Daniels, YW Yu - Communications of the ACM, 2016 - dl.acm.org
Review article in Communications of the ACM, August 2016, vol. 59, no. 8, p. 72.
DOI:10.1145/2957324 …

Sketching and sublinear data structures in genomics

G Marçais, B Solomon, R Patro… - Annual Review of …, 2019 - annualreviews.org
Large-scale genomics demands computational methods that scale sublinearly with the
growth of data. We review several data structures and sketching techniques that have been …

Levenshtein distance, sequence comparison and biological database search

B Berger, MS Waterman, YW Yu - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Levenshtein edit distance has played a central role, both past and present, in sequence
alignment in particular and biological database similarity search in general. We start our …

Fast search of thousands of short-read sequencing experiments

B Solomon, C Kingsford - Nature biotechnology, 2016 - nature.com
The amount of sequence information in public repositories is growing at a rapid rate.
Although these data are likely to contain clinically important information that has not yet …

Mantis: a fast, small, and exact large-scale sequence-search index

P Pandey, F Almodaresi, MA Bender, M Ferdman… - Cell systems, 2018 - cell.com
Sequence-level searches on large collections of RNA sequencing experiments, such as the
NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the …

A survey on data compression methods for biological sequences

M Hosseini, D Pratas, AJ Pinho - Information, 2016 - mdpi.com
The ever-increasing growth of the production of high-throughput sequencing data poses a
serious challenge to the storage, processing and transmission of these data. As frequently …

Single cell genomics reveals viruses consumed by marine protists

JM Brown, JM Labonté, J Brown, NR Record… - Frontiers in …, 2020 - frontiersin.org
The predominant model of the role of viruses in the marine trophic web is that of the “viral
shunt,” where viral infection funnels a substantial fraction of the microbial primary and …

Improved search of large transcriptomic sequencing databases using split sequence bloom trees

B Solomon, C Kingsford - Journal of Computational Biology, 2018 - liebertpub.com
Enormous databases of short-read RNA-seq experiments such as the NIH Sequencing
Read Archive are now available. These databases could answer many questions about …