| 001 | 1050455 | ||
| 005 | 20260121204316.0 | ||
| 024 | 7 | _ | |a 10.1038/s43588-025-00854-1 |2 doi |
| 024 | 7 | _ | |a 10.34734/FZJ-2026-00225 |2 datacite_doi |
| 037 | _ | _ | |a FZJ-2026-00225 |
| 082 | _ | _ | |a 004 |
| 100 | 1 | _ | |a Leroux, Nathan |0 P:(DE-Juel1)194421 |b 0 |u fzj |
| 245 | _ | _ | |a Analog in-memory computing attention mechanism for fast and energy-efficient large language models |
| 260 | _ | _ | |a London |c 2025 |b Nature Research |
| 336 | 7 | _ | |a article |2 DRIVER |
| 336 | 7 | _ | |a Output Types/Journal article |2 DataCite |
| 336 | 7 | _ | |a Journal Article |b journal |m journal |0 PUB:(DE-HGF)16 |s 1768997788_9544 |2 PUB:(DE-HGF) |
| 336 | 7 | _ | |a ARTICLE |2 BibTeX |
| 336 | 7 | _ | |a JOURNAL_ARTICLE |2 ORCID |
| 336 | 7 | _ | |a Journal Article |0 0 |2 EndNote |
| 520 | _ | _ | |a Transformer networks, driven by self-attention, are central to large language models. In generative transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, graphics processing unit (GPU)-stored projections must be loaded into static random-access memory for each new generation step, causing latency and energy bottlenecks. Here we present a custom self-attention in-memory computing architecture based on emerging charge-based memories called gain cells, which can be efficiently written to store new tokens during sequence generation and enable the parallel analog dot-product computation required for self-attention. However, the analog gain-cell circuits introduce non-idealities and constraints preventing the direct mapping of pre-trained models. To circumvent this problem, we design an initialization algorithm achieving text-processing performance comparable to GPT-2 without training from scratch. Our architecture reduces attention latency and energy consumption by up to two and four orders of magnitude, respectively, compared with GPUs, marking a substantial step toward ultrafast, low-power generative transformers. |
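The cached-projection attention step this abstract refers to can be illustrated with a minimal sketch (NumPy, single head; function and variable names are hypothetical and this is not the authors' gain-cell implementation):

import numpy as np

def cached_attention_step(x_t, W_q, W_k, W_v, k_cache, v_cache):
    # One generation step of single-head self-attention with a KV cache.
    # x_t: (d,) embedding of the newest token
    # W_q, W_k, W_v: (d, d_head) projection matrices
    # k_cache, v_cache: lists of (d_head,) projections of past tokens
    q = x_t @ W_q                      # query for the new token
    k_cache.append(x_t @ W_k)          # new key projection, written once
    v_cache.append(x_t @ W_v)          # new value projection, written once
    K = np.stack(k_cache)              # (t, d_head): on a GPU this cache is
    V = np.stack(v_cache)              # reloaded from memory at every step
    scores = K @ q / np.sqrt(q.size)   # parallel dot products over all cached tokens
    w = np.exp(scores - scores.max())
    w /= w.sum()                       # softmax over past positions
    return w @ V                       # attention output for this step

Reloading the growing K and V matrices at every step is the data-movement bottleneck the abstract describes; the gain-cell array instead holds the stored projections in place and performs the K·q and w·V dot products in analog.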
| 536 | _ | _ | |a 5234 - Emerging NC Architectures (POF4-523) |0 G:(DE-HGF)POF4-5234 |c POF4-523 |f POF IV |x 0 |
| 588 | _ | _ | |a Dataset connected to CrossRef, Journals: juser.fz-juelich.de |
| 700 | 1 | _ | |a Manea, Paul |0 P:(DE-Juel1)192242 |b 1 |e Corresponding author |u fzj |
| 700 | 1 | _ | |a Sudarshan, Chirag |0 P:(DE-Juel1)198888 |b 2 |u fzj |
| 700 | 1 | _ | |a Finkbeiner, Jan |0 P:(DE-Juel1)190112 |b 3 |
| 700 | 1 | _ | |a Siegel, Sebastian |0 P:(DE-Juel1)174486 |b 4 |u fzj |
| 700 | 1 | _ | |a Strachan, John Paul |0 P:(DE-Juel1)188145 |b 5 |u fzj |
| 700 | 1 | _ | |a Neftci, Emre |0 P:(DE-Juel1)188273 |b 6 |
| 770 | _ | _ | |a Neuromorphic Hardware and Computing 2024 |
| 773 | _ | _ | |a 10.1038/s43588-025-00854-1 |g Vol. 5, no. 9, p. 813 - 824 |0 PERI:(DE-600)3029424-1 |n 9 |p 813 - 824 |t Nature computational science |v 5 |y 2025 |x 2662-8457 |
| 856 | 4 | _ | |u https://juser.fz-juelich.de/record/1050455/files/s43588-025-00854-1-1.pdf |y OpenAccess |
| 909 | C | O | |o oai:juser.fz-juelich.de:1050455 |p openaire |p open_access |p VDB |p driver |p dnbdelivery |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 0 |6 P:(DE-Juel1)194421 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 1 |6 P:(DE-Juel1)192242 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 2 |6 P:(DE-Juel1)198888 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 3 |6 P:(DE-Juel1)190112 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 4 |6 P:(DE-Juel1)174486 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 5 |6 P:(DE-Juel1)188145 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 6 |6 P:(DE-Juel1)188273 |
| 913 | 1 | _ | |a DE-HGF |b Key Technologies |l Natural, Artificial and Cognitive Information Processing |1 G:(DE-HGF)POF4-520 |0 G:(DE-HGF)POF4-523 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Neuromorphic Computing and Network Dynamics |9 G:(DE-HGF)POF4-5234 |x 0 |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0200 |2 StatID |b SCOPUS |d 2024-12-13 |
| 915 | _ | _ | |a Creative Commons Attribution CC BY 4.0 |0 LIC:(DE-HGF)CCBY4 |2 HGFVOC |
| 915 | _ | _ | |a JCR |0 StatID:(DE-HGF)0100 |2 StatID |b NAT COMPUT SCI : 2022 |d 2024-12-13 |
| 915 | _ | _ | |a WoS |0 StatID:(DE-HGF)0112 |2 StatID |b Emerging Sources Citation Index |d 2024-12-13 |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0150 |2 StatID |b Web of Science Core Collection |d 2024-12-13 |
| 915 | _ | _ | |a DEAL Nature |0 StatID:(DE-HGF)3003 |2 StatID |d 2024-12-13 |w ger |
| 915 | _ | _ | |a IF >= 10 |0 StatID:(DE-HGF)9910 |2 StatID |b NAT COMPUT SCI : 2022 |d 2024-12-13 |
| 915 | _ | _ | |a OpenAccess |0 StatID:(DE-HGF)0510 |2 StatID |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0300 |2 StatID |b Medline |d 2024-12-13 |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0199 |2 StatID |b Clarivate Analytics Master Journal List |d 2024-12-13 |
| 920 | _ | _ | |l yes |
| 920 | 1 | _ | |0 I:(DE-Juel1)PGI-14-20210412 |k PGI-14 |l Neuromorphic Compute Nodes |x 0 |
| 920 | 1 | _ | |0 I:(DE-Juel1)PGI-15-20210701 |k PGI-15 |l Neuromorphic Software Eco System |x 1 |
| 980 | _ | _ | |a journal |
| 980 | _ | _ | |a VDB |
| 980 | _ | _ | |a UNRESTRICTED |
| 980 | _ | _ | |a I:(DE-Juel1)PGI-14-20210412 |
| 980 | _ | _ | |a I:(DE-Juel1)PGI-15-20210701 |
| 980 | 1 | _ | |a FullTexts |