001050455 001__ 1050455
001050455 005__ 20260121204316.0
001050455 0247_ $$2doi$$a10.1038/s43588-025-00854-1
001050455 0247_ $$2datacite_doi$$a10.34734/FZJ-2026-00225
001050455 037__ $$aFZJ-2026-00225
001050455 082__ $$a004
001050455 1001_ $$0P:(DE-Juel1)194421$$aLeroux, Nathan$$b0$$ufzj
001050455 245__ $$aAnalog in-memory computing attention mechanism for fast and energy-efficient large language models
001050455 260__ $$aLondon$$bNature Research$$c2025
001050455 3367_ $$2DRIVER$$aarticle
001050455 3367_ $$2DataCite$$aOutput Types/Journal article
001050455 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1768997788_9544
001050455 3367_ $$2BibTeX$$aARTICLE
001050455 3367_ $$2ORCID$$aJOURNAL_ARTICLE
001050455 3367_ $$00$$2EndNote$$aJournal Article
001050455 520__ $$aTransformer networks, driven by self-attention, are central to large language models. In generative transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, graphics processing unit (GPU)-stored projections must be loaded into static random-access memory for each new generation step, causing latency and energy bottlenecks. Here we present a custom self-attention in-memory computing architecture based on emerging charge-based memories called gain cells, which can be efficiently written to store new tokens during sequence generation and enable the parallel analog dot-product computation required for self-attention. However, the analog gain-cell circuits introduce non-idealities and constraints preventing the direct mapping of pre-trained models. To circumvent this problem, we design an initialization algorithm achieving text-processing performance comparable to GPT-2 without training from scratch. Our architecture reduces attention latency and energy consumption by up to two and four orders of magnitude, respectively, compared with GPUs, marking a substantial step toward ultrafast, low-power generative transformers.
001050455 536__ $$0G:(DE-HGF)POF4-5234$$a5234 - Emerging NC Architectures (POF4-523)$$cPOF4-523$$fPOF IV$$x0
001050455 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
001050455 7001_ $$0P:(DE-Juel1)192242$$aManea, Paul$$b1$$eCorresponding author$$ufzj
001050455 7001_ $$0P:(DE-Juel1)198888$$aSudarshan, Chirag$$b2$$ufzj
001050455 7001_ $$0P:(DE-Juel1)190112$$aFinkbeiner, Jan$$b3
001050455 7001_ $$0P:(DE-Juel1)174486$$aSiegel, Sebastian$$b4$$ufzj
001050455 7001_ $$0P:(DE-Juel1)188145$$aStrachan, John Paul$$b5$$ufzj
001050455 7001_ $$0P:(DE-Juel1)188273$$aNeftci, Emre$$b6
001050455 770__ $$aNeuromorphic Hardware and Computing 2024
001050455 773__ $$0PERI:(DE-600)3029424-1$$a10.1038/s43588-025-00854-1$$gVol. 5, no. 9, p. 813 - 824$$n9$$p813 - 824$$tNature computational science$$v5$$x2662-8457$$y2025
001050455 8564_ $$uhttps://juser.fz-juelich.de/record/1050455/files/s43588-025-00854-1-1.pdf$$yOpenAccess
001050455 909CO $$ooai:juser.fz-juelich.de:1050455$$popenaire$$popen_access$$pVDB$$pdriver$$pdnbdelivery
001050455 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)194421$$aForschungszentrum Jülich$$b0$$kFZJ
001050455 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)192242$$aForschungszentrum Jülich$$b1$$kFZJ
001050455 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)198888$$aForschungszentrum Jülich$$b2$$kFZJ
001050455 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)190112$$aForschungszentrum Jülich$$b3$$kFZJ
001050455 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)174486$$aForschungszentrum Jülich$$b4$$kFZJ
001050455 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188145$$aForschungszentrum Jülich$$b5$$kFZJ
001050455 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188273$$aForschungszentrum Jülich$$b6$$kFZJ
001050455 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5234$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x0
001050455 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2024-12-13
001050455 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
001050455 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bNAT COMPUT SCI : 2022$$d2024-12-13
001050455 915__ $$0StatID:(DE-HGF)0112$$2StatID$$aWoS$$bEmerging Sources Citation Index$$d2024-12-13
001050455 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2024-12-13
001050455 915__ $$0StatID:(DE-HGF)3003$$2StatID$$aDEAL Nature$$d2024-12-13$$wger
001050455 915__ $$0StatID:(DE-HGF)9910$$2StatID$$aIF >= 10$$bNAT COMPUT SCI : 2022$$d2024-12-13
001050455 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001050455 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2024-12-13
001050455 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2024-12-13
001050455 920__ $$lyes
001050455 9201_ $$0I:(DE-Juel1)PGI-14-20210412$$kPGI-14$$lNeuromorphic Compute Nodes$$x0
001050455 9201_ $$0I:(DE-Juel1)PGI-15-20210701$$kPGI-15$$lNeuromorphic Software Eco System$$x1
001050455 980__ $$ajournal
001050455 980__ $$aVDB
001050455 980__ $$aUNRESTRICTED
001050455 980__ $$aI:(DE-Juel1)PGI-14-20210412
001050455 980__ $$aI:(DE-Juel1)PGI-15-20210701
001050455 9801_ $$aFullTexts