GHOST: adjusting the decision threshold to handle imbalanced data in machine learning

C Esposito, GA Landrum, N Schneider… - Journal of Chemical …, 2021 - ACS Publications
Machine learning classifiers trained on class imbalanced data are prone to overpredict the
majority class. This leads to a larger misclassification rate for the minority class, which in …

Structure–activity relationship-based chemical classification of highly imbalanced Tox21 datasets

G Idakwo, S Thangapandian, J Luttrell, Y Li… - Journal of …, 2020 - Springer
The specificity of toxicant-target biomolecule interactions lends to the very imbalanced
nature of many toxicity datasets, causing poor performance in Structure–Activity …

Deep learning-based imbalanced data classification for drug discovery

S Korkmaz - Journal of chemical information and modeling, 2020 - ACS Publications
Drug discovery studies have become increasingly expensive and time-consuming
processes. In the early phase of drug discovery studies, an extensive search has been …

AugLiChem: data augmentation library of chemical structures for machine learning

R Magar, Y Wang, C Lorsung, C Liang… - Machine Learning …, 2022 - iopscience.iop.org
Machine learning (ML) has demonstrated the promise for accurate and efficient
property prediction of molecules and crystalline materials. To develop highly accurate ML …

Tapping on the black box: how is the scoring power of a machine-learning scoring function dependent on the training set?

M Su, G Feng, Z Liu, Y Li, R Wang - Journal of chemical …, 2020 - ACS Publications
In recent years, protein–ligand interaction scoring functions derived through machine-learning
are repeatedly reported to outperform conventional scoring functions. However …

MoleculeNet: a benchmark for molecular machine learning

Z Wu, B Ramsundar, EN Feinberg, J Gomes… - Chemical …, 2018 - pubs.rsc.org
Molecular machine learning has been maturing rapidly over the last few years. Improved
methods and the presence of larger datasets have enabled machine learning algorithms to …

Most ligand-based classification benchmarks reward memorization rather than generalization

I Wallach, A Heifets - Journal of chemical information and …, 2018 - ACS Publications
Undetected overfitting can occur when there are significant redundancies between training
and validation data. We describe AVE, a new measure of training–validation redundancy for …

Prediction is a balancing act: Importance of sampling methods to balance sensitivity and specificity of predictive models based on imbalanced chemical data …

P Banerjee, FO Dehnbostel, R Preissner - Frontiers in chemistry, 2018 - frontiersin.org
Increase in the number of new chemicals synthesized in past decades has resulted in
constant growth in the development and application of computational models for prediction …

A systematic study of key elements underlying molecular property prediction

J Deng, Z Yang, H Wang, I Ojima, D Samaras… - Nature …, 2023 - nature.com
Artificial intelligence (AI) has been widely applied in drug discovery with a major task as
molecular property prediction. Despite booming techniques in molecular representation …

Molecular property prediction: recent trends in the era of artificial intelligence

J Shen, CA Nicolaou - Drug Discovery Today: Technologies, 2019 - Elsevier
Artificial intelligence (AI) has become a powerful tool in many fields, including drug
discovery. Among various AI applications, molecular property prediction can have more …