Engineering Proteins Using Statistical Models of Coevolutionary Sequence Information

JC Dinan, JW McCormick… - Cold Spring Harbor …, 2024 - cshperspectives.cshlp.org
Cold Spring Harbor Perspectives in Biology, 2024
Homologous protein sequences are wonderfully diverse, indicating many possible evolutionary “solutions” to the encoding of function. Consequently, one can construct statistical models of protein sequence by analyzing amino acid frequency across a large multiple sequence alignment. A central premise is that covariance between amino acid positions reflects coevolution due to a shared functional or biophysical constraint. In this review, we describe the implementation and discuss the advantages, limitations, and recent progress on two coevolution-based modeling approaches: (1) Potts models of protein sequence (direct coupling analysis [DCA]-like), and (2) the statistical coupling analysis (SCA). Each approach detects interesting features of protein sequence and structure—the former emphasizes local physical contacts throughout the structure, while the latter identifies larger evolutionarily coupled networks of residues. Recent advances in large-scale gene synthesis and high-throughput functional selection now motivate additional work to benchmark model performance across quantitative function prediction and de novo design tasks.
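The premise stated in the abstract, that covariance between alignment positions can signal coevolution, can be illustrated with a minimal sketch. The code below is not the DCA or SCA procedure reviewed in the article: it computes only raw single-site frequencies and pairwise covariances, and it omits sequence reweighting, pseudocounts, Potts model inference, and conservation weighting. The toy alignment and all function names are hypothetical, chosen for illustration only.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(msa):
    """Encode an alignment (equal-length strings) as an (N, L, q) array."""
    n, length = len(msa), len(msa[0])
    x = np.zeros((n, length, len(ALPHABET)))
    for s, seq in enumerate(msa):
        for i, aa in enumerate(seq):
            x[s, i, INDEX[aa]] = 1.0
    return x


def site_frequencies(x):
    """f_i(a): fraction of sequences carrying amino acid a at position i."""
    return x.mean(axis=0)


def pair_covariance(x):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b), shape (L, q, L, q)."""
    n = x.shape[0]
    f_i = site_frequencies(x)
    f_ij = np.einsum("sia,sjb->iajb", x, x) / n
    return f_ij - np.einsum("ia,jb->iajb", f_i, f_i)


def coupling_strength(c):
    """Collapse each (q, q) covariance block to its Frobenius norm."""
    return np.sqrt((c ** 2).sum(axis=(1, 3)))


# Hypothetical toy alignment: positions 0 and 2 always change together,
# while position 1 varies almost independently of both.
toy_msa = ["AKD", "AKD", "ARD", "GKE", "GRE", "GRE"]

strength = coupling_strength(pair_covariance(one_hot(toy_msa)))
print(np.round(strength, 3))
```

In this toy case the entry for positions 0 and 2 is larger than the entry for positions 0 and 1, which is the kind of signal both modeling approaches start from before applying their own corrections.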