Bioinformatics applications on Apache Spark

R Guo, Y Zhao, Q Zou, X Fang, S Peng - GigaScience, 2018 - academic.oup.com
With the rapid development of next-generation sequencing technology, ever-increasing
quantities of genomic data pose a tremendous challenge to data processing. Therefore …

Recommendations for performance optimizations when using GATK3.8 and GATK4

JR Heldenbrand, S Baheti, MA Bockol, TM Drucker… - BMC …, 2019 - Springer
Background: Use of the Genome Analysis Toolkit (GATK) continues to be the
standard practice in genomic variant calling in both research and the clinic. Recently the …

elPrep 4: A multithreaded framework for sequence analysis

C Herzeel, P Costanza, D Decap, J Fostier… - PLoS …, 2019 - journals.plos.org
We present elPrep 4, a reimplementation from scratch of the elPrep framework for
processing sequence alignment map files in the Go programming language. elPrep 4 …

BigFiRSt: a software program using big data technique for mining simple sequence repeats from large-scale sequencing data

J Chen, F Li, M Wang, J Li, TT Marquez-Lago… - Frontiers in big …, 2022 - frontiersin.org
Background: Simple Sequence Repeats (SSRs) are short tandem repeats of nucleotide
sequences. It has been shown that SSRs are associated with human diseases and are of …

SparkGA2: production-quality memory-efficient Apache Spark based genome analysis framework

H Mushtaq, N Ahmed, Z Al-Ars - PLoS ONE, 2019 - journals.plos.org
Due to the rapid decrease in the cost of NGS (Next Generation Sequencing), interest has
increased in using data generated from NGS to diagnose genetic diseases. However, the …

The utilization of perspective quantum technologies in biomedicine

PA Tarasov, EA Isaev, AA Grigoriev… - Journal of Physics …, 2020 - iopscience.iop.org
Currently, there is a widespread introduction of quantum technologies in human activity. The
prospects of quantum technologies use for the needs of biomedicine are considered. The …

A fast and scalable workflow for SNPs detection in genome sequences using Hadoop MapReduce

M Tahir, M Sardaraz - Genes, 2020 - mdpi.com
Next generation sequencing (NGS) technologies produce a huge amount of biological data,
which poses various issues such as requirements of high processing time and large …

Review of Classification and Feature Selection Methods for Genome‐Wide Association SNP for Breast Cancer

LR Sujithra, A Kuntha - Artificial Intelligence for Sustainable …, 2023 - Wiley Online Library
Cancer is a complicated disease with many molecular changes driven by hereditary,
environmental, and lifestyle factors. Cancer cells develop abnormalities that alter the cells …

ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark

A Xiao, Z Wu, S Dong - BMC Bioinformatics, 2019 - Springer
Background: The advance of next-generation sequencing enables higher throughput at
lower cost, and as the basis of high-throughput sequencing data analysis, variant calling is …

xGAP: a Python-based efficient, modular, extensible and fault-tolerant genomic analysis pipeline for variant discovery

A Gorla, B Jew, L Zhang, JH Sul - Bioinformatics, 2021 - academic.oup.com
Motivation: Since the first human genome was sequenced in 2001, there has been a rapid
growth in the number of bioinformatic methods to process and analyze next-generation …