Guiding questions to avoid data leakage in biological machine learning applications

J Bernett, DB Blumenthal, DG Grimm, F Haselbeck… - Nature …, 2024 - nature.com
Abstract: Machine learning methods for extracting patterns from high-dimensional data are
very important in the biological sciences. However, in certain cases, real-world applications …

A first computational frame for recognizing heparin-binding protein

W Zhu, SS Yuan, J Li, CB Huang, H Lin, B Liao - Diagnostics, 2023 - mdpi.com
Heparin-binding protein (HBP) is a cationic antibacterial protein derived from multinuclear
neutrophils and an important biomarker of infectious diseases. The correct identification of …

A statistical analysis of the sequence and structure of thermophilic and non-thermophilic proteins

Z Ahmed, H Zulfiqar, L Tang, H Lin - International Journal of Molecular …, 2022 - mdpi.com
Thermophilic proteins have various practical applications in theoretical research and in
industry. In recent years, the demand for thermophilic proteins on an industrial scale has …

Empirical comparison and recent advances of computational prediction of hormone binding proteins using machine learning methods

H Zulfiqar, Z Guo, BK Grace-Mercure, ZY Zhang… - Computational and …, 2023 - Elsevier
Hormone binding proteins (HBPs) belong to the group of soluble carrier proteins. These
proteins selectively and non-covalently interact with hormones and promote growth …

DeepTP: a deep learning model for thermophilic protein prediction

J Zhao, W Yan, Y Yang - International Journal of Molecular Sciences, 2023 - mdpi.com
Thermophilic proteins have important value in the fields of biopharmaceuticals and enzyme
engineering. Most existing thermophilic protein prediction models are based on traditional …

Superior protein thermophilicity prediction with protein language model embeddings

F Haselbeck, M John, Y Zhang, J Pirnay… - NAR Genomics and …, 2023 - academic.oup.com
Protein thermostability is important in many areas of biotechnology, including enzyme
engineering and protein-hybrid optoelectronics. Ever-growing protein databases and …

TemStaPro: protein thermostability prediction using sequence representations from protein language models

I Pudžiuvelytė, K Olechnovič, E Godliauskaite… - …, 2024 - academic.oup.com
Motivation: Reliable prediction of protein thermostability from its sequence is valuable for
both academic and industrial research. This prediction problem can be tackled using …

Identification of thermophilic proteins based on sequence-based bidirectional representations from transformer-embedding features

H Pei, J Li, S Ma, J Jiang, M Li, Q Zou, Z Lv - Applied Sciences, 2023 - mdpi.com
Thermophilic proteins have great potential to be utilized as biocatalysts in biotechnology.
Machine learning algorithms are gaining increasing use in identifying such enzymes …

Discrimination of psychrophilic enzymes using machine learning algorithms with amino acid composition descriptor

A Huang, F Lu, F Liu - Frontiers in Microbiology, 2023 - frontiersin.org
Introduction: Psychrophilic enzymes are a class of macromolecules with high catalytic activity
at low temperatures. Cold-active enzymes possessing eco-friendly and cost-effective …

PreDBP-PLMs: prediction of DNA-binding proteins based on pre-trained protein language models and convolutional neural networks

D Qi, C Song, T Liu - Analytical Biochemistry, 2024 - Elsevier
The recognition of DNA-binding proteins (DBPs) is the crucial step to understanding their
roles in various biological processes such as genetic regulation, gene expression, cell cycle …