Ultrafast clustering algorithms for metagenomic sequence analysis

W Li, L Fu, B Niu, S Wu, J Wooley - Briefings in bioinformatics, 2012 - academic.oup.com
The rapid advances of high-throughput sequencing technologies dramatically prompted
metagenomic studies of microbial communities that exist at various environments …

Large differences in gene expression responses to drought and heat stress between elite barley cultivar Scarlett and a Spanish landrace

CP Cantalapiedra, MJ García-Pereira… - Frontiers in plant …, 2017 - frontiersin.org
Drought causes important losses in crop production every season. Improvement for drought
tolerance could take advantage of the diversity held in germplasm collections, much of …

A bioinformatician's guide to the forefront of suffix array construction algorithms

AMS Shrestha, MC Frith, P Horton - Briefings in bioinformatics, 2014 - academic.oup.com
The suffix array and its variants are text-indexing data structures that have become
indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an …

gsufsort: constructing suffix arrays, LCP arrays and BWTs for string collections

FA Louza, GP Telles, S Gog, N Prezza… - Algorithms for Molecular …, 2020 - Springer
Background: The construction of a suffix array for a collection of strings is a fundamental task
in Bioinformatics and in many other applications that process strings. Related data …

Hadooping the genome: The impact of big data tools on biology

H Stevens - BioSocieties, 2016 - Springer
This essay examines the consequences of the so-called 'big data' technologies in
biomedicine. Analyzing algorithms and data structures used by biologists can provide …

Scalable and Versatile k-mer Indexing for High-Throughput Sequencing Data

N Välimäki, E Rivals - International Symposium on Bioinformatics …, 2013 - Springer
Philippe et al. (2011) proposed a data structure called Gk arrays for indexing and
querying large collections of high-throughput sequencing data in main-memory. The data …

Diagaf: A more accurate and efficient pre-alignment filter for sequence alignment

C Yu, Y Zhao, C Zhao, H Ma… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Sequence alignment is an essential step in computational genomics. More accurate and
efficient sequence pre-alignment methods that run before conducting expensive …

An overview of string processing applications to data analytics

H Koponen, N Mhaskar… - 2021 Reconciling Data …, 2021 - ieeexplore.ieee.org
Data analytics may conveniently be divided into four stages: preparation, preprocessing,
analysis, and post-processing. Especially in the second and third of these, where the data is …

Efficient soft relational clustering based on randomized search applied to selection of bio-basis for amino acid sequence analysis

MA Mahfouz, MA Ismail - 2012 Seventh International …, 2012 - ieeexplore.ieee.org
Protein sequence clustering is a process that aims to identify sets of homologous proteins in
a protein database. In this paper, two efficient soft c-medoids clustering algorithms for …

Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems

R Kobus - 2023 - openscience.ub.uni-mainz.de
A wide range of bioinformatics applications have to deal with a continuously growing
amount of data generated by high-throughput sequencing techniques. Exclusively CPU …