Creating and using minimizer sketches in computational genomics

H Zheng, G Marçais, C Kingsford - Journal of Computational …, 2023 - liebertpub.com
Processing large data sets has become an essential part of computational genomics.
Greatly increased availability of sequence data from multiple sources has fueled …

High-quality haplotype-resolved genome assembly of cultivated octoploid strawberry

J Mao, Y Wang, B Wang, J Li, C Zhang… - Horticulture …, 2023 - academic.oup.com
Cultivated strawberry (Fragaria × ananassa), a perennial herb belonging to the family
Rosaceae, is a complex octoploid with high heterozygosity at most loci. However, there is no …

Ten quick tips for bioinformatics analyses using an Apache Spark distributed computing environment

D Chicco, U Ferraro Petrillo… - PLOS Computational …, 2023 - journals.plos.org
Some scientific studies involve huge amounts of bioinformatics data that cannot be analyzed
on personal computers usually employed by researchers for day-to-day activities but rather …

Efficient minimizer orders for large values of k using minimum decycling sets

D Pellow, L Pu, B Ekim, L Kotlar, B Berger… - Genome …, 2023 - genome.cshlp.org
Minimizers are ubiquitously used in data structures and algorithms for efficient searching,
mapping, and indexing of high-throughput DNA sequencing data. Minimizer schemes select …

Framing Apache Spark in life sciences

A Manconi, M Gnocchi, L Milanesi, O Marullo… - Heliyon, 2023 - cell.com
Advances in high-throughput and digital technologies have required the adoption of big data
for handling complex tasks in life sciences. However, the drift to big data led researchers to …

How to optimally sample a sequence for rapid analysis

MC Frith, J Shaw, JL Spouge - Bioinformatics, 2023 - academic.oup.com
Motivation: We face an increasing flood of genetic sequence data, from diverse sources,
requiring rapid computational analysis. Rapid analysis can be achieved by sampling a …

Sequence-specific minimizers via polar sets

H Zheng, C Kingsford, G Marçais - Bioinformatics, 2021 - academic.oup.com
Motivation: Minimizers are efficient methods to sample k-mers from genomic sequences that
unconditionally preserve sufficiently long matches between sequences. Well-established …

Evolution and related pathogenic genes of Pseudodiploöspora longispora on Morchella based on genomic characterization and comparative genomic analysis

J Xie, X Liu, Z Qin, S Mei, E Tarafder, C Li, X Zeng… - Scientific Reports, 2024 - nature.com
True morels (Morchella) are globally renowned medicinal and edible mushrooms. White
mold disease caused by fungi is the main disease of Morchella, which has the …

Survey: time-series data preprocessing: a survey and an empirical analysis

A Tawakuli, B Havers, V Gulisano, D Kaiser… - Journal of Engineering …, 2024 - Elsevier
Data are naturally collected in their raw state and must undergo a series of preprocessing
steps to obtain data in their input state for Artificial Intelligence (AI) and other applications …

Bidirectional String Anchors for Improved Text Indexing and Top-K Similarity Search

G Loukides, SP Pissis… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The minimizers sampling mechanism is a popular mechanism for string sampling. However,
minimizers sampling mechanisms lack good guarantees on the expected size of their …