Preprint FZJ-2025-01113

Analog In-Memory Computing Attention Mechanism for Fast and Energy-Efficient Large Language Models


2024
arXiv

arXiv:2409.19315 [doi:10.48550/arXiv.2409.19315]


Abstract: Transformer networks, driven by self-attention, are central to Large Language Models. In generative Transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, GPU-stored projections must be loaded into SRAM for each new generation step, causing latency and energy bottlenecks. We present a custom self-attention in-memory computing architecture based on emerging charge-based memories called gain cells, which can be efficiently written to store new tokens during sequence generation and enable the parallel analog dot-product computation required for self-attention. However, the analog gain-cell circuits introduce non-idealities and constraints that prevent the direct mapping of pre-trained models. To circumvent this problem, we design an initialization algorithm achieving text-processing performance comparable to GPT-2 without training from scratch. Our architecture reduces attention latency and energy consumption by up to two and five orders of magnitude, respectively, compared to GPUs, marking a significant step toward ultra-fast, low-power generative Transformers.
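For context, the bottleneck the abstract describes is the per-step reuse of cached key/value projections in generative self-attention. The NumPy sketch below is illustrative only (it is not taken from the paper, and all names are placeholders): it shows one decoding step of single-head attention with a KV cache, where the two dot-product stages marked in the comments are the operations the gain-cell arrays are described as evaluating in parallel in the analog domain.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def generate_step(x_t, W_q, W_k, W_v, k_cache, v_cache):
        """One decoding step of single-head self-attention with a KV cache.

        x_t     : (d_model,) embedding of the newly generated token
        W_q/k/v : (d_model, d_head) projection matrices
        k_cache : list of (d_head,) key projections of previous tokens
        v_cache : list of (d_head,) value projections of previous tokens
        """
        q = x_t @ W_q                  # query for the new token
        k_cache.append(x_t @ W_k)      # write the new key into the cache
        v_cache.append(x_t @ W_v)      # write the new value into the cache

        K = np.stack(k_cache)          # (t, d_head)
        V = np.stack(v_cache)          # (t, d_head)

        # The two dot-product stages below (q @ K^T and scores @ V) are the
        # operations that the proposed gain-cell arrays would compute in
        # parallel in the analog domain, instead of reloading the cache
        # from GPU memory into SRAM at every step.
        scores = softmax(q @ K.T / np.sqrt(K.shape[-1]))
        return scores @ V              # attention output for this step

In the architecture summarized above, the cache-append operations correspond to writing new token projections into the gain-cell memory, while the two matrix products are performed in place within the memory arrays, which is where the reported latency and energy savings over GPU execution originate.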

Keyword(s): Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI) ; Hardware Architecture (cs.AR) ; Emerging Technologies (cs.ET) ; FOS: Computer and information sciences


Contributing Institute(s):
  1. Neuromorphic Software Eco System (PGI-15)
  2. Neuromorphic Compute Nodes (PGI-14)
Research Program(s):
  1. 5234 - Emerging NC Architectures (POF4-523)
  2. BMBF 16ME0400 - Joint project: Neuro-inspired artificial intelligence technologies for the electronics of the future - NEUROTEC II (16ME0400)
  3. BMBF 03ZU1106CA - NeuroSys: Algorithm-Hardware Co-Design (Project C) - A (03ZU1106CA)
  4. BMBF 03ZU1106CB - NeuroSys: Algorithm-Hardware Co-Design (Project C) - B (03ZU1106CB)

Appears in the scientific report 2024
Database coverage:
OpenAccess

The record appears in these collections:
Institute Collections > PGI > PGI-15
Institute Collections > PGI > PGI-14
Document types > Reports > Preprints
Workflow collections > Public records
Publications database
Open Access

 Record created 2025-01-24, last modified 2025-02-03

