001033763 001__ 1033763
001033763 005__ 20241212210725.0
001033763 0247_ $$2doi$$a10.1101/2024.01.12.575432
001033763 037__ $$aFZJ-2024-06604
001033763 1001_ $$0P:(DE-HGF)0$$aHoffbauer, Tilman$$b0
001033763 245__ $$aTransMEP: Transfer learning on large protein language models to predict mutation effects of proteins from a small known dataset
001033763 260__ $$c2024
001033763 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1733997316_5472
001033763 3367_ $$2ORCID$$aWORKING_PAPER
001033763 3367_ $$028$$2EndNote$$aElectronic Article
001033763 3367_ $$2DRIVER$$apreprint
001033763 3367_ $$2BibTeX$$aARTICLE
001033763 3367_ $$2DataCite$$aOutput Types/Working Paper
001033763 520__ $$aMachine learning-guided optimization has become a driving force for recent improvements in protein engineering. In addition, new protein language models are learning the grammar of evolutionarily occurring sequences at large scales. This work combines both approaches to make predictions about mutational effects that support protein engineering. To this end, an easy-to-use software tool called TransMEP is developed using transfer learning by feature extraction with Gaussian process regression. A large collection of datasets is used to evaluate its quality, which scales with the size of the training set, and to show its improvements over previous fine-tuning approaches. Wet-lab studies are simulated to evaluate the use of mutation effect prediction models for protein engineering. This showed that TransMEP finds the best performing mutants with a limited study budget by considering the trade-off between exploration and exploitation.
001033763 536__ $$0G:(DE-HGF)POF4-5241$$a5241 - Molecular Information Processing in Cellular Systems (POF4-524)$$cPOF4-524$$fPOF IV$$x0
001033763 588__ $$aDataset connected to CrossRef
001033763 7001_ $$0P:(DE-Juel1)132024$$aStrodel, Birgit$$b1$$eCorresponding author$$ufzj
001033763 773__ $$a10.1101/2024.01.12.575432$$p23$$tbioRxiv$$y2024
001033763 8564_ $$uhttps://juser.fz-juelich.de/record/1033763/files/bioRxiv_2024.01.12.575432v1.full-2.pdf$$yRestricted
001033763 909CO $$ooai:juser.fz-juelich.de:1033763$$pVDB
001033763 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132024$$aForschungszentrum Jülich$$b1$$kFZJ
001033763 9131_ $$0G:(DE-HGF)POF4-524$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5241$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vMolecular and Cellular Information Processing$$x0
001033763 9141_ $$y2024
001033763 920__ $$lyes
001033763 9201_ $$0I:(DE-Juel1)IBI-7-20200312$$kIBI-7$$lStrukturbiochemie$$x0
001033763 980__ $$apreprint
001033763 980__ $$aVDB
001033763 980__ $$aI:(DE-Juel1)IBI-7-20200312
001033763 980__ $$aUNRESTRICTED