001 | 1037904 | ||
005 | 20250203103256.0 | ||
024 | 7 | _ | |a 10.34734/FZJ-2025-01042 |2 datacite_doi |
037 | _ | _ | |a FZJ-2025-01042 |
100 | 1 | _ | |a Finkbeiner, Jan Robert |0 P:(DE-Juel1)190112 |b 0 |u fzj |
245 | _ | _ | |a On-Chip Learning via Transformer In-Context Learning |
260 | _ | _ | |c 2024 |
336 | 7 | _ | |a Preprint |b preprint |m preprint |0 PUB:(DE-HGF)25 |s 1738239208_31383 |2 PUB:(DE-HGF) |
336 | 7 | _ | |a WORKING_PAPER |2 ORCID |
336 | 7 | _ | |a Electronic Article |0 28 |2 EndNote |
336 | 7 | _ | |a preprint |2 DRIVER |
336 | 7 | _ | |a ARTICLE |2 BibTeX |
336 | 7 | _ | |a Output Types/Working Paper |2 DataCite |
520 | _ | _ | |a Autoregressive decoder-only transformers have become key components of scalable sequence processing and generation models. However, the transformer's self-attention mechanism requires transferring prior token projections from main memory at each time step (token), which severely limits its performance on conventional processors. Self-attention can be viewed as a dynamic feed-forward layer whose weight matrix depends on the input sequence, similar to the result of local synaptic plasticity. Using this insight, we present a neuromorphic decoder-only transformer model that utilizes an on-chip plasticity processor to compute self-attention. Interestingly, the training of transformers enables them to "learn" the input context during inference. We demonstrate this in-context learning ability of transformers on the Loihi 2 processor by solving a few-shot classification problem. With this, we emphasize the importance of pretrained models, especially their ability to find simple, local, backpropagation-free learning rules that enable on-chip learning and adaptation in a hardware-friendly manner. |
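[Editor's note] The abstract's view of self-attention as a dynamic feed-forward layer updated by a local, plasticity-like rule can be illustrated with a minimal sketch. The snippet below is not the authors' Loihi 2 implementation; it assumes a linear (softmax-free) fast-weight formulation of causal attention, with made-up dimensions and a ReLU feature map, purely to show how the attention "weight matrix" can be grown token by token via local outer-product updates.

# Minimal sketch (assumption, not the paper's kernel): causal self-attention
# computed as a Hebbian-style outer-product update on a fast-weight matrix.
import numpy as np

def fast_weight_attention(x, Wq, Wk, Wv):
    """x: (T, d_in) token embeddings; Wq, Wk, Wv: (d_in, d) projections."""
    T = x.shape[0]
    d = Wq.shape[1]
    W = np.zeros((d, d))          # fast weights, built from local updates only
    z = np.zeros(d)               # running normalizer
    out = np.zeros((T, d))
    for t in range(T):            # one local update per token; no replay of past tokens
        q, k, v = x[t] @ Wq, x[t] @ Wk, x[t] @ Wv
        q, k = np.maximum(q, 0.0), np.maximum(k, 0.0)  # positive feature map (assumption)
        W += np.outer(v, k)       # plasticity-like outer-product update
        z += k
        out[t] = (W @ q) / (z @ q + 1e-9)
    return out

# Toy usage with random data
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(fast_weight_attention(x, Wq, Wk, Wv).shape)  # (5, 4)

Because the per-token update touches only the current key/value projections, this form avoids re-reading all prior token projections from main memory, which is the bottleneck the abstract describes for conventional processors.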
536 | _ | _ | |a 5234 - Emerging NC Architectures (POF4-523) |0 G:(DE-HGF)POF4-5234 |c POF4-523 |f POF IV |x 0 |
536 | _ | _ | |a BMBF 03ZU1106CA - NeuroSys: Algorithm-Hardware Co-Design (Projekt C) - A (03ZU1106CA) |0 G:(BMBF)03ZU1106CA |c 03ZU1106CA |x 1 |
536 | _ | _ | |a BMBF 03ZU1106CB - NeuroSys: Algorithm-Hardware Co-Design (Projekt C) - B (BMBF-03ZU1106CB) |0 G:(DE-Juel1)BMBF-03ZU1106CB |c BMBF-03ZU1106CB |x 2 |
700 | 1 | _ | |a Neftci, Emre |0 P:(DE-Juel1)188273 |b 1 |u fzj |
856 | 4 | _ | |u https://arxiv.org/abs/2410.08711 |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1037904/files/arxiv_On-Chip%20Learning%20via%20Transformer%20In-Context%20Learning.pdf |y OpenAccess |
909 | C | O | |o oai:juser.fz-juelich.de:1037904 |p openaire |p open_access |p VDB |p driver |p dnbdelivery |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 0 |6 P:(DE-Juel1)190112 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 1 |6 P:(DE-Juel1)188273 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Natural, Artificial and Cognitive Information Processing |1 G:(DE-HGF)POF4-520 |0 G:(DE-HGF)POF4-523 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Neuromorphic Computing and Network Dynamics |9 G:(DE-HGF)POF4-5234 |x 0 |
914 | 1 | _ | |y 2024 |
915 | _ | _ | |a OpenAccess |0 StatID:(DE-HGF)0510 |2 StatID |
920 | _ | _ | |l yes |
920 | 1 | _ | |0 I:(DE-Juel1)PGI-15-20210701 |k PGI-15 |l Neuromorphic Software Eco System |x 0 |
980 | 1 | _ | |a FullTexts |
980 | _ | _ | |a preprint |
980 | _ | _ | |a VDB |
980 | _ | _ | |a UNRESTRICTED |
980 | _ | _ | |a I:(DE-Juel1)PGI-15-20210701 |