Preprint FZJ-2024-06604

TransMEP: Transfer learning on large protein language models to predict mutation effects of proteins from a small known dataset


2024

bioRxiv, 23 pp. [10.1101/2024.01.12.575432]


Abstract: Machine learning-guided optimization has become a driving force behind recent advances in protein engineering. At the same time, new protein language models are learning the grammar of evolutionarily occurring sequences at large scale. This work combines both approaches to predict mutational effects in support of protein engineering. To this end, an easy-to-use software tool called TransMEP is developed, which applies transfer learning by feature extraction combined with Gaussian process regression. A large collection of datasets is used to evaluate its prediction quality, which improves as the training set grows, and to show its improvements over previous fine-tuning approaches. Wet-lab studies are simulated to evaluate how mutation effect prediction models can be used in protein engineering. These simulations show that TransMEP finds the best-performing mutants within a limited study budget by balancing exploration and exploitation.
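The abstract names three ingredients: frozen language-model embeddings used as features (transfer learning by feature extraction), Gaussian process regression on those features, and candidate selection that trades off exploration against exploitation. The sketch below illustrates that general pattern under stated assumptions; it is not TransMEP's code. The embeddings are random stand-ins for mean-pooled protein language model outputs, the RBF kernel and its hyperparameters are arbitrary choices, and the upper-confidence-bound (UCB) rule is one common acquisition strategy swapped in for illustration, since the abstract does not specify TransMEP's kernel or selection rule. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """RBF (squared-exponential) kernel between the rows of a and b."""
    # Pairwise squared Euclidean distances, clipped at 0 to absorb round-off.
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_fit_predict(x_train, y_train, x_test, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Exact GP regression: posterior mean and std of the latent function at x_test."""
    y_mean = y_train.mean()  # constant prior mean = mean of observed labels
    k = rbf_kernel(x_train, x_train, lengthscale, variance) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    k_star = rbf_kernel(x_train, x_test, lengthscale, variance)  # (n_train, n_test)
    mean = y_mean + k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    var = variance - np.sum(v**2, axis=0)  # prior variance minus explained part
    return mean, np.sqrt(np.maximum(var, 0.0))


def ucb_select(mean, std, batch_size, beta=2.0):
    """Pick the batch_size candidates with the highest mean + beta * std.

    beta = 0 is pure exploitation; larger beta favours uncertain variants.
    """
    scores = mean + beta * std
    return np.argsort(-scores)[:batch_size]


# Hypothetical data: 200 variants with 16-dim embeddings (real pLM embeddings
# are much wider) and a synthetic "fitness" used only for illustration.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 16))
true_effect = emb[:, 0] - 0.5 * emb[:, 1] ** 2

# Simulated study: 20 variants already measured, the rest are candidates.
labeled = rng.choice(200, size=20, replace=False)
unlabeled = np.setdiff1d(np.arange(200), labeled)
mean, std = gp_fit_predict(emb[labeled], true_effect[labeled], emb[unlabeled], lengthscale=4.0)
next_batch = unlabeled[ucb_select(mean, std, batch_size=10)]
```

In a simulated wet-lab loop, the variants in `next_batch` would be "measured" (their synthetic labels revealed), appended to the labeled set, and the model refit. Because the features come from a frozen model, only the small GP has to be refit each round, and its predictive standard deviation supplies the uncertainty estimate that an exploration term needs.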


Contributing Institute(s):
  1. Strukturbiochemie (IBI-7)
Research Program(s):
  1. 5241 - Molecular Information Processing in Cellular Systems (POF4-524)

Appears in the scientific report 2024

The record appears in these collections:
Institute collections > IBI > IBI-7
Document types > Reports > Preprints
Workflow collections > Public records
Publication database

Record created on 2024-12-03, last modified on 2024-12-12


Full text (PDF): restricted access