Using machine learning to predict antimicrobial minimum inhibitory concentrations and associated genomic features for nontyphoidal Salmonella

M Nguyen, SW Long, PF McDermott, RJ Olsen… - BioRxiv, 2018 - biorxiv.org
Nontyphoidal Salmonella species are the leading bacterial cause of food-borne disease in the United States. Whole genome sequences and paired antimicrobial susceptibility data are available for Salmonella strains because of surveillance efforts by public health agencies. In this study, a collection of 5,278 nontyphoidal Salmonella genomes, collected over 15 years in the United States, was used to generate XGBoost-based machine learning models for predicting minimum inhibitory concentrations (MICs) for 15 antibiotics. The MIC prediction models have average accuracies of 95-96% within ±1 two-fold dilution step and can predict MICs with no a priori information about the underlying gene content or resistance phenotypes of the strains. By selecting diverse genomes for training sets, we show that highly accurate MIC prediction models can be generated with fewer than 500 genomes. We also show that our approach for predicting MICs is stable over time despite annual fluctuations in antimicrobial resistance gene content in the sampled genomes. Finally, using feature selection, we explore the important genomic regions identified by the models for predicting MICs. To date, this is one of the largest MIC modeling studies to be published. Our strategy for developing whole genome sequence-based models for surveillance and clinical diagnostics can be readily applied to other important human pathogens.
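
The accuracy criterion quoted in the abstract, agreement within ±1 two-fold dilution, can be illustrated with a small sketch. The abstract does not say exactly how the authors computed it, so the version below is an assumption: a prediction counts as correct when its log2 distance from the measured MIC is at most one. The function name `within_one_dilution_accuracy` and the MIC values are hypothetical and are not taken from the study.

```python
import math


def within_one_dilution_accuracy(measured_mics, predicted_mics, tolerance=1.0):
    """Return the fraction of predictions within `tolerance` two-fold
    dilution steps of the measured MIC (assumed definition, see lead-in)."""
    if len(measured_mics) != len(predicted_mics):
        raise ValueError("measured and predicted lists must have equal length")
    if not measured_mics:
        raise ValueError("at least one MIC pair is required")

    hits = 0
    for measured, predicted in zip(measured_mics, predicted_mics):
        # One two-fold dilution step corresponds to a difference of 1 on the log2 scale.
        steps_apart = abs(math.log2(predicted) - math.log2(measured))
        if steps_apart <= tolerance + 1e-9:  # small slack for floating-point error
            hits += 1
    return hits / len(measured_mics)


# Hypothetical MICs in micrograms per millilitre, for illustration only.
measured = [0.5, 1.0, 2.0, 8.0, 32.0]
predicted = [0.5, 2.0, 8.0, 4.0, 32.0]

accuracy = within_one_dilution_accuracy(measured, predicted)
print(f"Accuracy within +/-1 two-fold dilution: {accuracy:.2f}")
```

In this toy data set, only the third pair (2.0 measured against 8.0 predicted) is two dilution steps apart, so four of the five predictions count as correct.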