001     1033763
005     20241212210725.0
024 7 _ |a 10.1101/2024.01.12.575432
|2 doi
037 _ _ |a FZJ-2024-06604
100 1 _ |a Hoffbauer, Tilman
|0 P:(DE-HGF)0
|b 0
245 _ _ |a TransMEP: Transfer learning on large protein language models to predict mutation effects of proteins from a small known dataset
260 _ _ |c 2024
336 7 _ |a Preprint
|b preprint
|m preprint
|0 PUB:(DE-HGF)25
|s 1733997316_5472
|2 PUB:(DE-HGF)
336 7 _ |a WORKING_PAPER
|2 ORCID
336 7 _ |a Electronic Article
|0 28
|2 EndNote
336 7 _ |a preprint
|2 DRIVER
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a Output Types/Working Paper
|2 DataCite
520 _ _ |a Machine learning-guided optimization has become a driving force for recent improvements in protein engineering. In addition, new protein language models are learning the grammar of evolutionarily occurring sequences at large scales. This work combines both approaches to make predictions about mutational effects that support protein engineering. To this end, an easy-to-use software tool called TransMEP is developed using transfer learning by feature extraction with Gaussian process regression. A large collection of datasets is used to evaluate its quality, which scales with the size of the training set, and to show its improvements over previous fine-tuning approaches. Wet-lab studies are simulated to evaluate the use of mutation effect prediction models for protein engineering. These simulations show that TransMEP finds the best-performing mutants with a limited study budget by considering the trade-off between exploration and exploitation.
536 _ _ |a 5241 - Molecular Information Processing in Cellular Systems (POF4-524)
|0 G:(DE-HGF)POF4-5241
|c POF4-524
|f POF IV
|x 0
588 _ _ |a Dataset connected to CrossRef
700 1 _ |a Strodel, Birgit
|0 P:(DE-Juel1)132024
|b 1
|e Corresponding author
|u fzj
773 _ _ |a 10.1101/2024.01.12.575432
|p 23
|t bioRxiv
|y 2024
856 4 _ |u https://juser.fz-juelich.de/record/1033763/files/bioRxiv_2024.01.12.575432v1.full-2.pdf
|y Restricted
909 C O |o oai:juser.fz-juelich.de:1033763
|p VDB
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)132024
913 1 _ |a DE-HGF
|b Key Technologies
|l Natural, Artificial and Cognitive Information Processing
|1 G:(DE-HGF)POF4-520
|0 G:(DE-HGF)POF4-524
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Molecular and Cellular Information Processing
|9 G:(DE-HGF)POF4-5241
|x 0
914 1 _ |y 2024
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)IBI-7-20200312
|k IBI-7
|l Strukturbiochemie
|x 0
980 _ _ |a preprint
980 _ _ |a VDB
980 _ _ |a I:(DE-Juel1)IBI-7-20200312
980 _ _ |a UNRESTRICTED

