Deep semantic protein representation for annotation, discovery, and engineering

AS Schwartz, GJ Hannum, ZR Dwiel, ME Smoot, AR Grant, JM Knight, SA Becker, JR Eads…
BioRxiv, 2018 - biorxiv.org
Abstract
Computational assignment of function to proteins with no known homologs is still an unsolved problem. We have created a novel, function-based approach to protein annotation and discovery called D-SPACE (Deep Semantic Protein Annotation Classification and Exploration), consisting of a multi-task, multi-label deep neural network trained on over 70 million proteins. Distinct from homology- and motif-based methods, D-SPACE encodes proteins in high-dimensional representations (embeddings), allowing the accurate assignment of over 180,000 labels for 13 distinct tasks. The embedding representation enables fast searches for functionally related proteins, including homologs undetectable by traditional approaches. D-SPACE annotates all 109 million proteins in UniProt in under 35 hours on a single computer and searches all of them in seconds. D-SPACE further quantifies the relative functional effect of mutations, facilitating rapid in silico mutagenesis for protein engineering applications. D-SPACE incorporates protein annotation, search, and other exploratory efforts into a single cohesive model.