Authors
Akash Nag, Sunil Karforma
Publication date
2017/7/7
Journal
Indian Journal of Science and Technology
Volume
10
Issue
25
Description
Objectives
We present an algorithm to quickly identify conserved patterns from a set of aligned protein sequences.
Method
Using contribution statistics, the proposed method identifies a motif that describes the given set of sequences. It is flexible enough to identify variable-length wildcard regions and to define motif elements from regions whose amino acids share similar physicochemical properties. We compare its performance against other well-known motif-discovery algorithms on three datasets: snake toxins, insulin proteins, and methylated-DNA-protein-cysteine methyltransferase (MGMT) active-site enzymes.
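To make the idea of a motif with wildcard regions and property-based elements concrete, here is a minimal sketch. It is not the authors' algorithm: it swaps their contribution statistics for a plain per-column frequency threshold, and the residue groups, the `threshold` parameter, the function names, and the PROSITE-like output syntax are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical physicochemical groups (an assumption, not taken from the paper).
GROUPS = [set("ILVMA"), set("FWY"), set("KRH"), set("DE"), set("STNQ")]


def classify(column, threshold):
    """Return a motif element for one alignment column, or None for a wildcard."""
    if "-" in column:
        return None  # any gap makes the column part of a wildcard region
    residue, count = Counter(column).most_common(1)[0]
    if count / len(column) >= threshold:
        return residue  # strongly conserved single residue
    seen = set(column)
    for group in GROUPS:
        if seen <= group:
            # all residues share one property group: emit a residue class
            return "[" + "".join(sorted(seen)) + "]"
    return None


def wildcard(lo, hi):
    """Format a wildcard run spanning between lo and hi positions."""
    if lo == hi:
        return "x" if lo == 1 else f"x({lo})"
    return f"x({lo},{hi})"


def build_motif(alignment, threshold=0.8):
    """Build a PROSITE-like motif string from equal-length aligned sequences."""
    elements = []
    lo = hi = 0  # minimum and maximum length of the pending wildcard run
    for column in zip(*alignment):
        element = classify(column, threshold)
        if element is None:
            hi += 1
            if "-" not in column:
                lo += 1  # gapped columns can be skipped, so only hi grows
            continue
        if hi:
            elements.append(wildcard(lo, hi))
            lo = hi = 0
        elements.append(element)
    if hi:
        elements.append(wildcard(lo, hi))
    return "-".join(elements)


print(build_motif(["ACDKL", "ACEKI", "AC-RV"]))
```

In this toy alignment the gapped third column becomes a variable-length wildcard, while the last two columns collapse into residue classes because their residues fall within a single assumed group.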
Findings
When tested on 91 neurotoxin protein sequences from 45 species of elapid snakes, the algorithm generated a motif with 97% precision. The motifs generated for the insulin family and the MGMT family of proteins had 92% and 96.5% precision, respectively.
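The abstract does not say how precision was measured. Assuming the standard definition, precision = TP / (TP + FP), a motif's precision could be estimated as sketched below. The function name, the use of a regular expression to stand in for the motif, and the toy sequences are all hypothetical.

```python
import re


def motif_precision(pattern, family, non_family):
    """Precision of a regex motif: true hits divided by all hits.

    `family` holds sequences that should match (hits count as true positives);
    `non_family` holds sequences that should not (hits count as false positives).
    """
    regex = re.compile(pattern)
    true_pos = sum(1 for seq in family if regex.search(seq))
    false_pos = sum(1 for seq in non_family if regex.search(seq))
    if true_pos + false_pos == 0:
        return 0.0  # the motif matched nothing, so precision is undefined; report 0
    return true_pos / (true_pos + false_pos)


# Toy data, invented purely for illustration.
score = motif_precision("AC.[KR]", ["ACDK", "ACER", "GGGG"], ["ACGK", "TTTT"])
print(f"{score:.3f}")
```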
Novelty
Our algorithm is fast and efficient, on average outperforms commonly used motif-generation algorithms in accuracy, and, unlike some other algorithms, always reports a motif.