001     1037654
005     20250203124502.0
024 7 _ |a 10.1038/s43588-024-00627-2
|2 doi
024 7 _ |a 10.34734/FZJ-2025-00819
|2 datacite_doi
024 7 _ |a 38730184
|2 pmid
024 7 _ |a WOS:001220857400002
|2 WOS
037 _ _ |a FZJ-2025-00819
082 _ _ |a 004
100 1 _ |a Siebenmorgen, Till
|0 0009-0008-5160-8100
|b 0
245 _ _ |a MISATO: machine learning dataset of protein–ligand complexes for structure-based drug discovery
260 _ _ |a London
|c 2024
|b Nature Research
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1737441865_21954
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
520 _ _ |a Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule–ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein–ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein–ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
536 _ _ |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5112
|c POF4-511
|f POF IV
|x 0
588 _ _ |a Dataset connected to CrossRef, Journals: juser.fz-juelich.de
700 1 _ |a Menezes, Filipe
|0 0000-0002-7630-5447
|b 1
700 1 _ |a Benassou, Sabrina
|0 P:(DE-Juel1)192312
|b 2
|u fzj
700 1 _ |a Merdivan, Erinc
|0 P:(DE-HGF)0
|b 3
700 1 _ |a Didi, Kieran
|0 0000-0001-6839-3320
|b 4
700 1 _ |a Mourão, André Santos Dias
|0 P:(DE-HGF)0
|b 5
700 1 _ |a Kitel, Radosław
|0 P:(DE-HGF)0
|b 6
700 1 _ |a Liò, Pietro
|0 P:(DE-HGF)0
|b 7
700 1 _ |a Kesselheim, Stefan
|0 P:(DE-Juel1)185654
|b 8
700 1 _ |a Piraud, Marie
|0 P:(DE-HGF)0
|b 9
700 1 _ |a Theis, Fabian J.
|0 0000-0002-2419-1943
|b 10
700 1 _ |a Sattler, Michael
|0 0000-0002-1594-0527
|b 11
700 1 _ |a Popowicz, Grzegorz M.
|0 0000-0003-2818-7498
|b 12
|e Corresponding author
773 _ _ |a 10.1038/s43588-024-00627-2
|g Vol. 4, no. 5, p. 367 - 378
|0 PERI:(DE-600)3029424-1
|n 5
|p 367 - 378
|t Nature computational science
|v 4
|y 2024
|x 2662-8457
856 4 _ |u https://juser.fz-juelich.de/record/1037654/files/s43588-024-00627-2.pdf
|y OpenAccess
909 C O |o oai:juser.fz-juelich.de:1037654
|p openaire
|p open_access
|p VDB
|p driver
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 2
|6 P:(DE-Juel1)192312
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 8
|6 P:(DE-Juel1)185654
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5112
|x 0
914 1 _ |y 2024
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
|d 2024-12-13
915 _ _ |a Creative Commons Attribution CC BY 4.0
|0 LIC:(DE-HGF)CCBY4
|2 HGFVOC
915 _ _ |a JCR
|0 StatID:(DE-HGF)0100
|2 StatID
|b NAT COMPUT SCI : 2022
|d 2024-12-13
915 _ _ |a WoS
|0 StatID:(DE-HGF)0112
|2 StatID
|b Emerging Sources Citation Index
|d 2024-12-13
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
|d 2024-12-13
915 _ _ |a DEAL Nature
|0 StatID:(DE-HGF)3003
|2 StatID
|d 2024-12-13
|w ger
915 _ _ |a IF >= 10
|0 StatID:(DE-HGF)9910
|2 StatID
|b NAT COMPUT SCI : 2022
|d 2024-12-13
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
|d 2024-12-13
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Clarivate Analytics Master Journal List
|d 2024-12-13
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 1 _ |a FullTexts
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406

