Data-driven algorithm design

MF Balcan - arXiv preprint arXiv:2011.07177, 2020 - arxiv.org
Data-driven algorithm design is an important aspect of modern data science and algorithm
design. Rather than using off-the-shelf algorithms that only have worst-case performance …

How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design

MF Balcan, D DeBlasio, T Dick, C Kingsford… - Proceedings of the 53rd …, 2021 - dl.acm.org
Algorithms often have tunable parameters that impact performance metrics such as runtime
and solution quality. For many algorithms used in practice, no parameter settings admit …

Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization

S Krieger, J Kececioglu - Bioinformatics, 2020 - academic.oup.com
Motivation Protein secondary structure prediction is a fundamental precursor to many
bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary …

How much data is sufficient to learn high-performing algorithms?

MF Balcan, D DeBlasio, T Dick, C Kingsford… - not applicable …, 2019 - par.nsf.gov
Algorithms—for example for scientific analysis—typically have tunable parameters that
significantly influence computational efficiency and solution quality. If a parameter setting …

Athena: automated tuning of k-mer based genomic error correction algorithms using language models

M Abdallah, A Mahgoub, H Ahmed, S Chaterji - Scientific reports, 2019 - nature.com
The performance of most error-correction (EC) algorithms that operate on genomics reads is
dependent on the proper choice of their configuration parameters, such as the value of k in k …

Learning to optimize computational resources: Frugal training with generalization guarantees

MF Balcan, T Sandholm, E Vitercik - … of the AAAI Conference on Artificial …, 2020 - aaai.org
Algorithms typically come with tunable parameters that have a considerable impact on the
computational resources they consume. Too often, practitioners must hand-tune the …

Automating parameter selection to avoid implausible biological pathway models

CS Magnano, A Gitter - NPJ systems biology and applications, 2021 - nature.com
A common way to integrate and analyze large amounts of biological “omic” data is through
pathway reconstruction: using condition-specific omic data to create a subnetwork of a …

Faster algorithms for learning to link, align sequences, and price two-part tariffs

MF Balcan, C Seiler, D Sharma - arXiv preprint arXiv:2204.03569, 2022 - arxiv.org
Data-driven algorithm configuration is a promising, learning-based approach for beyond
worst-case analysis of algorithms with tunable parameters. An important open problem is the …

Highlights from the tenth ISCB student council symposium 2014

F Rahman, K Wilkins, A Jacobsen, A Junge, E Vicedo… - BMC …, 2015 - Springer
This report summarizes the scientific content and activities of the annual symposium
organized by the Student Council of the International Society for Computational Biology …

More accurate transcript assembly via parameter advising

D DeBlasio, K Kim, C Kingsford - Journal of Computational Biology, 2020 - liebertpub.com
Computational tools used for genomic analyses are becoming more accurate but also
increasingly sophisticated and complex. This introduces a new problem in that these pieces …