001     1038064
005     20250203103308.0
024 7 _ |a 10.48550/arXiv.2409.19315
|2 doi
024 7 _ |a 10.34734/FZJ-2025-01113
|2 datacite_doi
037 _ _ |a FZJ-2025-01113
100 1 _ |a Leroux, Nathan
|0 P:(DE-Juel1)194421
|b 0
|e Corresponding author
|u fzj
245 _ _ |a Analog In-Memory Computing Attention Mechanism for Fast and Energy-Efficient Large Language Models
260 _ _ |c 2024
|b arXiv
336 7 _ |a Preprint
|b preprint
|m preprint
|0 PUB:(DE-HGF)25
|s 1738249459_31339
|2 PUB:(DE-HGF)
336 7 _ |a WORKING_PAPER
|2 ORCID
336 7 _ |a Electronic Article
|0 28
|2 EndNote
336 7 _ |a preprint
|2 DRIVER
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a Output Types/Working Paper
|2 DataCite
520 _ _ |a Transformer networks, driven by self-attention, are central to Large Language Models. In generative Transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, GPU-stored projections must be loaded into SRAM for each new generation step, causing latency and energy bottlenecks. We present a custom self-attention in-memory computing architecture based on emerging charge-based memories called gain cells, which can be efficiently written to store new tokens during sequence generation and which enable the parallel analog dot-product computation required for self-attention. However, the analog gain cell circuits introduce non-idealities and constraints that prevent the direct mapping of pre-trained models. To circumvent this problem, we design an initialization algorithm that achieves text processing performance comparable to GPT-2 without training from scratch. Our architecture reduces attention latency and energy consumption by up to two and five orders of magnitude, respectively, compared to GPUs, marking a significant step toward ultra-fast, low-power generative Transformers.
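A minimal illustrative sketch (not part of this record) of the cached self-attention step the abstract describes, assuming standard scaled dot-product attention; all names are hypothetical, and the analog gain-cell arrays would perform the dot products shown here in place of GPU memory loads:

    import numpy as np

    def attend_step(q, k_new, v_new, k_cache, v_cache):
        # q, k_new, v_new: (d,) projections of the current token.
        # k_cache, v_cache: (t, d) projections of all previous tokens --
        # the data a GPU must reload into SRAM at every generation step,
        # and what the gain-cell arrays would instead hold in place.
        k_cache = np.vstack([k_cache, k_new])   # write the new token once
        v_cache = np.vstack([v_cache, v_new])
        d = q.shape[-1]
        scores = k_cache @ q / np.sqrt(d)       # first analog dot product
        w = np.exp(scores - scores.max())
        w /= w.sum()                            # softmax over the sequence
        out = w @ v_cache                       # second analog dot product
        return out, k_cache, v_cache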
536 _ _ |a 5234 - Emerging NC Architectures (POF4-523)
|0 G:(DE-HGF)POF4-5234
|c POF4-523
|f POF IV
|x 0
536 _ _ |a BMBF 16ME0400 - Joint project: Neuro-inspired artificial intelligence technologies for the electronics of the future - NEUROTEC II - (16ME0400)
|0 G:(BMBF)16ME0400
|c 16ME0400
|x 1
536 _ _ |a BMBF 03ZU1106CA - NeuroSys: Algorithm-Hardware Co-Design (Project C) - A (03ZU1106CA)
|0 G:(BMBF)03ZU1106CA
|c 03ZU1106CA
|x 2
536 _ _ |a BMBF 03ZU1106CB - NeuroSys: Algorithm-Hardware Co-Design (Project C) - B (BMBF-03ZU1106CB)
|0 G:(DE-Juel1)BMBF-03ZU1106CB
|c BMBF-03ZU1106CB
|x 3
588 _ _ |a Dataset connected to DataCite
650 _ 7 |a Neural and Evolutionary Computing (cs.NE)
|2 Other
650 _ 7 |a Artificial Intelligence (cs.AI)
|2 Other
650 _ 7 |a Hardware Architecture (cs.AR)
|2 Other
650 _ 7 |a Emerging Technologies (cs.ET)
|2 Other
650 _ 7 |a FOS: Computer and information sciences
|2 Other
700 1 _ |a Manea, Paul-Philipp
|0 P:(DE-Juel1)192242
|b 1
|e Corresponding author
|u fzj
700 1 _ |a Sudarshan, Chirag
|0 P:(DE-Juel1)198888
|b 2
|u fzj
700 1 _ |a Finkbeiner, Jan
|0 P:(DE-Juel1)190112
|b 3
|u fzj
700 1 _ |a Siegel, Sebastian
|0 P:(DE-Juel1)174486
|b 4
|u fzj
700 1 _ |a Strachan, John Paul
|0 P:(DE-Juel1)188145
|b 5
|u fzj
700 1 _ |a Neftci, Emre
|0 P:(DE-Juel1)188273
|b 6
|u fzj
773 _ _ |a 10.48550/arXiv.2409.19315
856 4 _ |u https://doi.org/10.48550/arXiv.2409.19315
856 4 _ |u https://juser.fz-juelich.de/record/1038064/files/Analog%20In-Memory%20Computing%20Attention%20Mechanism%20for%20Fast%20and%20Energy-Efficient%20Large%20Language%20Models.pdf
|y OpenAccess
909 C O |o oai:juser.fz-juelich.de:1038064
|p openaire
|p open_access
|p VDB
|p driver
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)194421
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)192242
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 2
|6 P:(DE-Juel1)198888
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)190112
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 4
|6 P:(DE-Juel1)174486
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 5
|6 P:(DE-Juel1)188145
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 6
|6 P:(DE-Juel1)188273
913 1 _ |a DE-HGF
|b Key Technologies
|l Natural, Artificial and Cognitive Information Processing
|1 G:(DE-HGF)POF4-520
|0 G:(DE-HGF)POF4-523
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Neuromorphic Computing and Network Dynamics
|9 G:(DE-HGF)POF4-5234
|x 0
914 1 _ |y 2024
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)PGI-15-20210701
|k PGI-15
|l Neuromorphic Software Eco System
|x 0
920 1 _ |0 I:(DE-Juel1)PGI-14-20210412
|k PGI-14
|l Neuromorphic Compute Nodes
|x 1
980 _ _ |a preprint
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)PGI-15-20210701
980 _ _ |a I:(DE-Juel1)PGI-14-20210412
980 1 _ |a FullTexts

