Preprint FZJ-2024-06604

TransMEP: Transfer learning on large protein language models to predict mutation effects of proteins from a small known dataset


2024

bioRxiv, 23 pp. [10.1101/2024.01.12.575432]



Abstract: Machine learning-guided optimization has become a driving force for recent improvements in protein engineering. In addition, new protein language models are learning the grammar of evolutionarily occurring sequences at large scales. This work combines both approaches to predict mutational effects in support of protein engineering. To this end, an easy-to-use software tool called TransMEP is developed, using transfer learning by feature extraction with Gaussian process regression. A large collection of datasets is used to evaluate its prediction quality, which scales with the size of the training set, and to show its improvements over previous fine-tuning approaches. Wet-lab studies are simulated to evaluate the use of mutation effect prediction models for protein engineering. These simulations show that TransMEP finds the best-performing mutants within a limited study budget by considering the trade-off between exploration and exploitation.
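
The approach named in the abstract, feature extraction followed by Gaussian process regression, can be illustrated with a minimal, dependency-free sketch. This is not TransMEP's code or API. Every name below is hypothetical: `embed` is a one-hot placeholder standing in for protein language model embeddings, the RBF kernel and its length scale are assumptions, and the upper-confidence-bound score in `ucb` is only one common way to trade off exploration against exploitation (the abstract does not say which criterion TransMEP uses). The toy sequences and measurements are made up for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def embed(seq):
    """Placeholder features: flattened one-hot encoding of the sequence.
    Assumption: a real pipeline would use language-model embeddings here."""
    return [1.0 if aa == ref else 0.0 for aa in seq for ref in AMINO_ACIDS]

def rbf(x, y, length_scale=2.0):
    """RBF kernel between two feature vectors (kernel choice is an assumption)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def solve_lower(low, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y

def solve_upper_t(low, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_fit(features, targets, noise=1e-3):
    """Fit a GP with a constant mean (the training average) and fixed noise."""
    n = len(features)
    k = [[rbf(features[i], features[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    mean_y = sum(targets) / n
    alpha = solve_upper_t(low, solve_lower(low, [t - mean_y for t in targets]))
    return {"X": features, "L": low, "alpha": alpha, "mean": mean_y}

def gp_predict(model, x):
    """Posterior mean and standard deviation of the latent function at x."""
    k_star = [rbf(x, xi) for xi in model["X"]]
    mean = model["mean"] + sum(k * a for k, a in zip(k_star, model["alpha"]))
    v = solve_lower(model["L"], k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)  # clip rounding noise
    return mean, math.sqrt(var)

def ucb(model, x, beta=1.0):
    """Upper confidence bound: larger beta favors exploring uncertain variants."""
    mean, std = gp_predict(model, x)
    return mean + beta * std

if __name__ == "__main__":
    # Made-up toy measurements for three variants of a four-residue sequence.
    measured = {"ACDA": 1.0, "ACDG": 1.4, "AKDA": 0.6}
    model = gp_fit([embed(s) for s in measured], list(measured.values()))
    candidates = ["ACEG", "AKDG", "WCDG"]
    ranked = sorted(candidates, key=lambda s: ucb(model, embed(s)), reverse=True)
    print(ranked)
```

Under these assumptions, the pretrained feature extractor stays frozen and only the Gaussian process is fitted to the small labeled dataset. The predictive standard deviation is what makes a budget-aware selection rule possible: setting `beta=0` ranks candidates purely by predicted effect, while larger values give more weight to variants the model is uncertain about.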


Contributing Institute(s):
  1. Strukturbiochemie (IBI-7)
Research Program(s):
  1. 5241 - Molecular Information Processing in Cellular Systems (POF4-524)

Appears in the scientific report 2024

The record appears in these collections:
Institute Collections > IBI > IBI-7
Document types > Reports > Preprints
Workflow collections > Public records
Publications database

 Record created 2024-12-03, last modified 2024-12-12

