The relative importance of domain applicability metrics for estimating prediction errors in QSAR varies with training set diversity

RP Sheridan - Journal of Chemical Information and Modeling, 2015 - ACS Publications
In QSAR, a statistical model is generated from a training set of molecules (represented by
chemical descriptors) and their biological activities (an “activity model”). The aim of the field …

Using random forest to model the domain applicability of another random forest model

RP Sheridan - Journal of Chemical Information and Modeling, 2013 - ACS Publications
In QSAR, a statistical model is generated from a training set of molecules (represented by
chemical descriptors) and their biological activities. We will call this traditional type of QSAR …

Three useful dimensions for domain applicability in QSAR models using random forest

RP Sheridan - Journal of Chemical Information and Modeling, 2012 - ACS Publications
One popular metric for estimating the accuracy of prospective quantitative structure–activity
relationship (QSAR) predictions is based on the similarity of the compound being predicted …

Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR

RP Sheridan, BP Feuston, VN Maiorov… - Journal of Chemical …, 2004 - ACS Publications
How well can a QSAR model predict the activity of a molecule not in the training set used to
create the model? A set of retrospective cross-validation experiments using 20 diverse in …
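
The similarity-based reliability metric discussed in this entry and in the 2012 entry above can be illustrated as the highest similarity between a query molecule and any training-set molecule. The sketch below is a generic illustration, not the descriptors or code used in the paper: it assumes fingerprints are given as Python sets of "on" bit indices and uses the Tanimoto coefficient; the function names are hypothetical.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention: two empty fingerprints get similarity 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)  # |A n B| / |A u B|


def nearest_neighbor_similarity(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity of the query to any training-set fingerprint."""
    return max((tanimoto(query, t) for t in training), default=0.0)


training_set = [{1, 2, 3, 4}, {2, 3, 5}, {7, 8, 9}]
print(nearest_neighbor_similarity({1, 2, 3}, training_set))  # 0.75
```

A low nearest-neighbor similarity would flag a prediction as lying farther from the training data; any cutoff for "reliable" is a modeling choice not specified here.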

Assessing model fit by cross-validation

DM Hawkins, SC Basak, D Mills - Journal of Chemical Information …, 2003 - ACS Publications
When QSAR models are fitted, it is important to validate any fitted model to check that it is
plausible that its predictions will carry over to fresh data not used in the model fitting …
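
As a rough illustration of checking a fitted model by cross-validation, the sketch below computes a k-fold cross-validated q² = 1 - PRESS / SS_total for a one-descriptor least-squares line. It is a minimal, assumption-laden example (single descriptor, deterministic interleaved folds, hypothetical function names), not the procedure from the paper.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training fold needs at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_validated_q2(xs: Sequence[float], ys: Sequence[float], k: int = 5) -> float:
    """k-fold cross-validated q^2: 1 - PRESS / total sum of squares about the mean."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    press = 0.0
    for fold in range(k):
        # Interleaved fold assignment: sample i is held out in fold i % k.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        intercept, slope = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Accumulate squared errors on molecules the model did not see.
        press += sum((ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx)
    my = sum(ys) / n
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_total


xs = [float(x) for x in range(10)]
print(cross_validated_q2(xs, [2 * x + 1 for x in xs]))  # close to 1.0 for exactly linear data
```

In practice folds are usually assigned at random (and often repeated); q² can be negative when held-out predictions are worse than predicting the mean.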

Predicting the predictability: a unified approach to the applicability domain problem of QSAR models

D Horvath, G Marcou, A Varnek - Journal of Chemical Information …, 2009 - ACS Publications
The present work proposes a unified conceptual framework to describe and quantify the
important issue of the Applicability Domains (AD) of Quantitative Structure–Activity …

Assessment of machine learning reliability methods for quantifying the applicability domain of QSAR regression models

M Toplak, R Močnik, M Polajnar… - Journal of Chemical …, 2014 - ACS Publications
The vastness of chemical space and the relatively small coverage by experimental data
recording molecular properties require us to identify subspaces, or domains, for which we …

Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions

F Sahigara, D Ballabio, R Todeschini… - Journal of …, 2013 - Springer
Background With the growing popularity of using QSAR predictions towards regulatory
purposes, such predictive models are now required to be strictly validated, an essential …

Reliably assessing prediction reliability for high dimensional QSAR data

J Huang, X Fan - Molecular Diversity, 2013 - Springer
Predictability and prediction reliability are of utmost importance to characterize a good
Quantitative structure–activity relationships (QSAR) model. However, validation methods are …

Data set modelability by QSAR

A Golbraikh, E Muratov, D Fourches… - Journal of Chemical …, 2014 - ACS Publications
We introduce a simple MODelability Index (MODI) that estimates the feasibility of obtaining
predictive QSAR models (correct classification rate above 0.7) for a binary data set of …