Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data.

MCSF Costa, JV de Araújo Oliveira, WMC Silva, R Sen, J Fallmann, PF Stadler
Bioinformatics, 2021 - scitepress.org
Abstract
Machine learning (ML) methods are often used to identify members of non-coding RNA classes such as microRNAs or snoRNAs. However, ML methods have not been successfully used for homology search tasks. A systematic evaluation of ML in homology search requires large, controlled test sets with known ground truth, and thus methods to construct large, realistic artificial data sets. Here we describe a method for producing arbitrarily large and diverse sets of snoRNA sequences based on artificial evolution. These are then used to evaluate supervised ML methods (Support Vector Machine, Artificial Neural Network, and Random Forest) for snoRNA detection in a chordate genome. Our results indicate that ML approaches can also be competitive for homology search.
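The abstract does not spell out how the artificial evolution works, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a toy model in which a seed sequence accumulates random point substitutions over several generations while a set of conserved positions (here a C box matching RUGAUGA and a D box CUGA) is left untouched. The seed sequence, the function names, and the default rate and generation count are all hypothetical.

```python
import random

ALPHABET = "ACGU"

def mutate(seq, rate, protected, rng):
    """Return a copy of seq with random substitutions outside protected positions."""
    out = []
    for i, base in enumerate(seq):
        if i not in protected and rng.random() < rate:
            # Replace the base with one of the three other nucleotides.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)

def evolve(seed, generations, rate, protected, rng):
    """Apply mutate() repeatedly to simulate one diverging lineage."""
    seq = seed
    for _ in range(generations):
        seq = mutate(seq, rate, protected, rng)
    return seq

def make_dataset(seed, n, generations=20, rate=0.02, protected=frozenset(), rng_seed=0):
    """Generate n independent artificial descendants of seed (assumed toy model)."""
    rng = random.Random(rng_seed)
    return [evolve(seed, generations, rate, protected, rng) for _ in range(n)]

# Hypothetical toy seed: C box at positions 2-8, D box at positions 20-23.
SEED = "GGAUGAUGACCAAAUUUGGGCUGACC"
PROTECTED = frozenset(range(2, 9)) | frozenset(range(20, 24))

dataset = make_dataset(SEED, n=5, protected=PROTECTED)
for s in dataset:
    print(s)
```

Because every sequence descends from a known seed, the ground truth label of each generated example is known by construction, which is what makes such sets usable for controlled evaluation of classifiers.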