An Extensive Overview of Feature Representation Techniques for Molecule Classification

N Pavlidis, CC Nikolaidis, V Perifanis… - Proceedings of the 27th …, 2023 - dl.acm.org
Proceedings of the 27th Pan-Hellenic Conference on Progress in Computing and …, 2023 - dl.acm.org
The application of Machine Learning (ML) algorithms in biomedical engineering, and specifically to molecular data, has gained much attention in recent years. Accurate prediction from molecular data is directly linked to many open problems, such as drug discovery, disease prediction and treatment optimization. However, finding the most appropriate method for transforming molecules into feature-ready inputs for an ML algorithm is a challenging task. Despite the numerous featurizers, i.e., algorithms that transform molecules into features, there is a lack of comprehensive analysis comparing their impact on model accuracy and efficiency in downstream tasks. In this study, we evaluate ten featurizers and five ML models on classification tasks. In addition, we explore the differences between the two main categories of featurization techniques, i.e., graph and linear-form representations. Our results show that the selection of an appropriate featurizer is model- and application-specific. We demonstrate that combining a linear molecular representation with a conventional ML algorithm can yield better predictive performance than more complex and sophisticated graph-based representations.
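As a rough illustration of what "a linear molecular representation combined with a conventional ML algorithm" can look like, the sketch below featurizes SMILES strings as character n-gram counts and classifies them with a nearest-centroid rule. This is not one of the ten featurizers or five models evaluated in the study (the abstract does not name them). The n-gram featurizer, the nearest-centroid classifier, the toy molecules, and the "contains an aromatic ring" label are all assumptions chosen only to keep the example small and dependency-free.

```python
from collections import Counter


def ngrams(smiles, sizes=(1, 2)):
    """Return all character n-grams of a SMILES string for the given sizes."""
    return [smiles[i:i + n] for n in sizes for i in range(len(smiles) - n + 1)]


def build_vocab(smiles_list):
    """Map every n-gram seen in the training molecules to a column index."""
    grams = sorted({g for s in smiles_list for g in ngrams(s)})
    return {g: i for i, g in enumerate(grams)}


def featurize(smiles, vocab):
    """Linear featurizer (illustrative): n-gram counts; unseen n-grams are ignored."""
    vec = [0.0] * len(vocab)
    for gram, count in Counter(ngrams(smiles)).items():
        if gram in vocab:
            vec[vocab[gram]] = float(count)
    return vec


def fit_centroids(X, y):
    """Conventional classifier (illustrative): one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(x, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))


# Toy training set (assumption): label 1 = contains an aromatic ring, 0 = does not.
train_smiles = [
    "c1ccccc1", "Cc1ccccc1", "Oc1ccccc1", "c1ccncc1",  # benzene, toluene, phenol, pyridine
    "CCO", "CCCCCC", "CC(=O)O", "CCC",                 # ethanol, hexane, acetic acid, propane
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

vocab = build_vocab(train_smiles)
centroids = fit_centroids([featurize(s, vocab) for s in train_smiles], train_labels)

for query in ["Nc1ccccc1", "CCCC"]:  # aniline, butane
    print(query, predict(featurize(query, vocab), centroids))
```

A graph-based featurizer would instead encode atoms as nodes and bonds as edges, which typically calls for a cheminformatics toolkit and a graph neural network. The string-based route above needs neither, which is part of the efficiency trade-off the abstract refers to.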
ACM Digital Library