Journal Article FZJ-2026-00225

Analog in-memory computing attention mechanism for fast and energy-efficient large language models


2025
Nature Research, London

Nature Computational Science 5(9), 813-824 (2025) [10.1038/s43588-025-00854-1] special issue: "Neuromorphic Hardware and Computing 2024"


Please use a persistent id in citations: doi:10.1038/s43588-025-00854-1

Abstract: Transformer networks, driven by self-attention, are central to large language models. In generative transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, graphics processing unit (GPU)-stored projections must be loaded into static random-access memory for each new generation step, causing latency and energy bottlenecks. Here we present a custom self-attention in-memory computing architecture based on emerging charge-based memories called gain cells, which can be efficiently written to store new tokens during sequence generation and enable parallel analog dot-product computation required for self-attention. However, the analog gain-cell circuits introduce non-idealities and constraints preventing the direct mapping of pre-trained models. To circumvent this problem, we design an initialization algorithm achieving text-processing performance comparable to GPT-2 without training from scratch. Our architecture reduces attention latency and energy consumption by up to two and four orders of magnitude, respectively, compared with GPUs, marking a substantial step toward ultrafast, low-power generative transformers.
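The "cache memory" the abstract refers to is the standard key-value (KV) cache of generative transformers: at each decoding step only the new token's key and value projections are computed and stored, while the query is compared against all cached keys via dot products. The sketch below is a minimal software illustration of that pattern, not the paper's gain-cell hardware or algorithm; NumPy, single-head attention, and the function name attention_step are assumptions made for this example.

    import numpy as np

    def attention_step(x_t, W_q, W_k, W_v, k_cache, v_cache):
        """One decoding step of single-head self-attention with a KV cache.

        x_t: current token embedding, shape (d_model,)
        k_cache, v_cache: Python lists holding key/value projections of all
        previous tokens. Only the new projections are computed here; reusing
        the cached ones is what avoids recomputation at each time step.
        """
        q = x_t @ W_q                             # query for the current token
        k_cache.append(x_t @ W_k)                 # write new key projection to the cache
        v_cache.append(x_t @ W_v)                 # write new value projection to the cache
        K = np.stack(k_cache)                     # (t, d_head): all stored keys
        V = np.stack(v_cache)                     # (t, d_head): all stored values
        scores = K @ q / np.sqrt(q.shape[-1])     # dot products with every cached key
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                  # softmax over past tokens
        return weights @ V                        # attention output for this step

    # Example usage (shapes are illustrative): feed tokens one at a time and
    # keep k_cache/v_cache across calls instead of recomputing K and V.
    # rng = np.random.default_rng(0); d = 64
    # W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
    # k_cache, v_cache = [], []
    # out = attention_step(rng.normal(size=d), W_q, W_k, W_v, k_cache, v_cache)

In the paper's architecture, the cache writes and the K @ q / weights @ V dot products above are the operations mapped onto analog gain-cell arrays instead of being executed on a GPU.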

Classification:

Contributing Institute(s):
  1. Neuromorphic Compute Nodes (PGI-14)
  2. Neuromorphic Software Eco System (PGI-15)
Research Program(s):
  1. 5234 - Emerging NC Architectures (POF4-523)

Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; OpenAccess ; Clarivate Analytics Master Journal List ; DEAL Nature ; Emerging Sources Citation Index ; IF >= 10 ; JCR ; SCOPUS ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Institute Collections > PGI > PGI-15
Institute Collections > PGI > PGI-14
Workflow collections > Public records
Publications database
Open Access

 Record created 2026-01-12, last modified 2026-01-21

