001037902 001__ 1037902
001037902 005__ 20250203103256.0
001037902 037__ $$aFZJ-2025-01040
001037902 1001_ $$0P:(DE-Juel1)190112$$aFinkbeiner, Jan$$b0$$ufzj
001037902 1112_ $$aAAAI Conference on Artificial Intelligence$$cVancouver$$d2024-02-27 - 2024-03-04$$wCanada
001037902 245__ $$aHarnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models
001037902 260__ $$c2024
001037902 300__ $$a11996-12005
001037902 3367_ $$2ORCID$$aCONFERENCE_PAPER
001037902 3367_ $$033$$2EndNote$$aConference Paper
001037902 3367_ $$2BibTeX$$aINPROCEEDINGS
001037902 3367_ $$2DRIVER$$aconferenceObject
001037902 3367_ $$2DataCite$$aOutput Types/Conference Paper
001037902 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1738231095_11970
001037902 520__ $$aCurrent AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), that excel at accelerating parallel workloads and dense vector-matrix multiplications. Potentially more efficient neural network models utilizing sparsity and recurrence cannot leverage the full power of SIMD processors and are thus at a severe disadvantage compared to today’s prominent parallel architectures like Transformers and CNNs, thereby hindering the path towards more sustainable AI. To overcome this limitation, we explore sparse and recurrent model training on a massively parallel multiple instruction multiple data (MIMD) architecture with distributed local memory. We implement a training routine based on backpropagation through time (BPTT) for the brain-inspired class of Spiking Neural Networks (SNNs) that feature binary sparse activations. We observe a massive advantage in using sparse activation tensors with a MIMD processor, the Intelligence Processing Unit (IPU), compared to GPUs. On training workloads, our results demonstrate 5-10× throughput gains compared to A100 GPUs and up to 38× gains for higher levels of activation sparsity, without a significant slowdown in training convergence or reduction in final model performance. Furthermore, our results show highly promising trends for both single- and multi-IPU configurations as we scale up to larger model sizes. Our work paves the way towards more efficient, non-standard models via AI training hardware beyond GPUs, and competitive large-scale SNN models.
001037902 536__ $$0G:(DE-HGF)POF4-5234$$a5234 - Emerging NC Architectures (POF4-523)$$cPOF4-523$$fPOF IV$$x0
001037902 536__ $$0G:(BMBF)03ZU1106CA$$aBMBF 03ZU1106CA - NeuroSys: Algorithm-Hardware Co-Design (Projekt C) - A (03ZU1106CA)$$c03ZU1106CA$$x1
001037902 536__ $$0G:(DE-Juel1)BMBF-03ZU1106CB$$aBMBF 03ZU1106CB - NeuroSys: Algorithm-Hardware Co-Design (Projekt C) - B (BMBF-03ZU1106CB)$$cBMBF-03ZU1106CB$$x2
001037902 7001_ $$0P:(DE-HGF)0$$aGmeinder, Thomas$$b1
001037902 7001_ $$0P:(DE-HGF)0$$aPupilli, Mark$$b2
001037902 7001_ $$0P:(DE-HGF)0$$aTitterton, Alexander$$b3
001037902 7001_ $$0P:(DE-Juel1)188273$$aNeftci, Emre$$b4$$ufzj
001037902 8564_ $$uhttps://juser.fz-juelich.de/record/1037902/files/AAAI_Harnessing%20Manycore%20Processors%20with%20Distributed%20Memory%20for%20Accelerated%20Training%20of%20Sparse%20and%20Recurrent%20Models.pdf$$yRestricted
001037902 909CO $$ooai:juser.fz-juelich.de:1037902$$pVDB
001037902 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)190112$$aForschungszentrum Jülich$$b0$$kFZJ
001037902 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188273$$aForschungszentrum Jülich$$b4$$kFZJ
001037902 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5234$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x0
001037902 9141_ $$y2024
001037902 9201_ $$0I:(DE-Juel1)PGI-15-20210701$$kPGI-15$$lNeuromorphic Software Eco System$$x0
001037902 980__ $$acontrib
001037902 980__ $$aVDB
001037902 980__ $$aI:(DE-Juel1)PGI-15-20210701
001037902 980__ $$aUNRESTRICTED