001037904 001__ 1037904
001037904 005__ 20250203103256.0
001037904 0247_ $$2datacite_doi$$a10.34734/FZJ-2025-01042
001037904 037__ $$aFZJ-2025-01042
001037904 1001_ $$0P:(DE-Juel1)190112$$aFinkbeiner, Jan Robert$$b0$$ufzj
001037904 245__ $$aOn-Chip Learning via Transformer In-Context Learning
001037904 260__ $$c2024
001037904 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1738239208_31383
001037904 3367_ $$2ORCID$$aWORKING_PAPER
001037904 3367_ $$028$$2EndNote$$aElectronic Article
001037904 3367_ $$2DRIVER$$apreprint
001037904 3367_ $$2BibTeX$$aARTICLE
001037904 3367_ $$2DataCite$$aOutput Types/Working Paper
001037904 520__ $$aAutoregressive decoder-only transformers have become key components of scalable sequence processing and generation models. However, the transformer's self-attention mechanism requires transferring prior token projections from main memory at each time step (token), severely limiting performance on conventional processors. Self-attention can be viewed as a dynamic feed-forward layer whose weight matrix depends on the input sequence, much like the result of local synaptic plasticity. Using this insight, we present a neuromorphic decoder-only transformer model that utilizes an on-chip plasticity processor to compute self-attention. Interestingly, training transformers enables them to "learn" the input context during inference. We demonstrate this in-context learning ability of transformers on the Loihi 2 processor by solving a few-shot classification problem. With this, we emphasize the importance of pretrained models, and in particular their ability to yield simple, local, backpropagation-free learning rules that enable on-chip learning and adaptation in a hardware-friendly manner.
001037904 536__ $$0G:(DE-HGF)POF4-5234$$a5234 - Emerging NC Architectures (POF4-523)$$cPOF4-523$$fPOF IV$$x0
001037904 536__ $$0G:(BMBF)03ZU1106CA$$aBMBF 03ZU1106CA - NeuroSys: Algorithm-Hardware Co-Design (Projekt C) - A (03ZU1106CA)$$c03ZU1106CA$$x1
001037904 536__ $$0G:(DE-Juel1)BMBF-03ZU1106CB$$aBMBF 03ZU1106CB - NeuroSys: Algorithm-Hardware Co-Design (Projekt C) - B (BMBF-03ZU1106CB)$$cBMBF-03ZU1106CB$$x2
001037904 7001_ $$0P:(DE-Juel1)188273$$aNeftci, Emre$$b1$$ufzj
001037904 8564_ $$uhttps://arxiv.org/abs/2410.08711
001037904 8564_ $$uhttps://juser.fz-juelich.de/record/1037904/files/arxiv_On-Chip%20Learning%20via%20Transformer%20In-Context%20Learning.pdf$$yOpenAccess
001037904 909CO $$ooai:juser.fz-juelich.de:1037904$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
001037904 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)190112$$aForschungszentrum Jülich$$b0$$kFZJ
001037904 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188273$$aForschungszentrum Jülich$$b1$$kFZJ
001037904 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5234$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x0
001037904 9141_ $$y2024
001037904 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001037904 920__ $$lyes
001037904 9201_ $$0I:(DE-Juel1)PGI-15-20210701$$kPGI-15$$lNeuromorphic Software Eco System$$x0
001037904 9801_ $$aFullTexts
001037904 980__ $$apreprint
001037904 980__ $$aVDB
001037904 980__ $$aUNRESTRICTED
001037904 980__ $$aI:(DE-Juel1)PGI-15-20210701
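Note on the abstract (field 520): the claim that self-attention can be viewed as a dynamic feed-forward layer whose weight matrix is built from the input sequence can be illustrated with a minimal sketch. This is an assumption-laden simplification using an unnormalized, linear (no softmax) causal attention; the paper's actual Loihi 2 formulation is not reproduced here, and all variable names are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

d = 8   # head dimension (illustrative)
T = 16  # sequence length (illustrative)
x = rng.standard_normal((T, d))  # stand-in token embeddings

# Random projections standing in for trained query/key/value parameters.
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv

# View 1: causal linear attention computed the usual way, by revisiting
# all prior key/value projections at every time step.
out_attn = np.zeros_like(x)
for t in range(T):
    scores = q[t] @ k[: t + 1].T        # unnormalized, no softmax
    out_attn[t] = scores @ v[: t + 1]

# View 2: the same computation as a "dynamic feed-forward layer". A fast
# weight matrix W is grown by a local, Hebbian-style outer-product update
# (value x key) at each token, then applied to the current query.
W = np.zeros((d, d))
out_fast = np.zeros_like(x)
for t in range(T):
    W += np.outer(v[t], k[t])           # local plasticity-like update
    out_fast[t] = W @ q[t]              # dynamic feed-forward pass

assert np.allclose(out_attn, out_fast)  # the two views agree

The equivalence holds because the accumulated matrix W at step t equals the sum of outer products v_i k_i^T over i <= t, so W q_t reproduces the attention sum over prior tokens without re-reading them from memory.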