001022263 001__ 1022263
001022263 005__ 20250203103347.0
001022263 0247_ $$2doi$$a10.48550/arXiv.2311.04386
001022263 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-01381
001022263 037__ $$aFZJ-2024-01381
001022263 041__ $$aEnglish
001022263 088__ $$2arXiv$$aarXiv:2311.04386
001022263 1001_ $$0P:(DE-Juel1)190112$$aFinkbeiner, Jan Robert$$b0$$eCorresponding author
001022263 245__ $$aHarnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models
001022263 260__ $$barXiv$$c2023
001022263 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1714552617_3947
001022263 3367_ $$2ORCID$$aWORKING_PAPER
001022263 3367_ $$028$$2EndNote$$aElectronic Article
001022263 3367_ $$2DRIVER$$apreprint
001022263 3367_ $$2BibTeX$$aARTICLE
001022263 3367_ $$2DataCite$$aOutput Types/Working Paper
001022263 520__ $$aCurrent AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), which excel at accelerating parallel workloads and dense vector-matrix multiplications. Potentially more efficient neural network models utilizing sparsity and recurrence cannot leverage the full power of SIMD processors and are thus at a severe disadvantage compared to today's prominent parallel architectures like Transformers and CNNs, thereby hindering the path towards more sustainable AI. To overcome this limitation, we explore sparse and recurrent model training on a massively parallel multiple instruction multiple data (MIMD) architecture with distributed local memory. We implement a training routine based on backpropagation through time (BPTT) for the brain-inspired class of Spiking Neural Networks (SNNs) that feature binary sparse activations. We observe a massive advantage in using sparse activation tensors with a MIMD processor, the Intelligence Processing Unit (IPU), compared to GPUs. On training workloads, our results demonstrate 5-10x throughput gains compared to A100 GPUs and up to 38x gains for higher levels of activation sparsity, without a significant slowdown in training convergence or reduction in final model performance. Furthermore, our results show highly promising trends for both single- and multi-IPU configurations as we scale up to larger model sizes. Our work paves the way towards more efficient, non-standard models via AI training hardware beyond GPUs, and competitive large-scale SNN models.
001022263 536__ $$0G:(DE-HGF)POF4-5234$$a5234 - Emerging NC Architectures (POF4-523)$$cPOF4-523$$fPOF IV$$x0
001022263 536__ $$0G:(DE-Juel1)BMBF-03ZU1106CB$$aBMBF 03ZU1106CB - NeuroSys: Algorithm-Hardware Co-Design (Projekt C) - B (BMBF-03ZU1106CB)$$cBMBF-03ZU1106CB$$x1
001022263 588__ $$aDataset connected to DataCite
001022263 650_7 $$2Other$$aNeural and Evolutionary Computing (cs.NE)
001022263 650_7 $$2Other$$aArtificial Intelligence (cs.AI)
001022263 650_7 $$2Other$$aFOS: Computer and information sciences
001022263 7001_ $$0P:(DE-HGF)0$$aGmeinder, Thomas$$b1
001022263 7001_ $$0P:(DE-HGF)0$$aPupilli, Mark$$b2
001022263 7001_ $$0P:(DE-HGF)0$$aTitterton, Alexander$$b3
001022263 7001_ $$0P:(DE-Juel1)188273$$aNeftci, Emre$$b4
001022263 773__ $$a10.48550/arXiv.2311.04386$$y2023
001022263 8564_ $$uhttps://juser.fz-juelich.de/record/1022263/files/finkbeiner23_arxiv_harnessing_maycore_processors.pdf$$yOpenAccess
001022263 8564_ $$uhttps://juser.fz-juelich.de/record/1022263/files/finkbeiner23_arxiv_harnessing_maycore_processors.gif?subformat=icon$$xicon$$yOpenAccess
001022263 8564_ $$uhttps://juser.fz-juelich.de/record/1022263/files/finkbeiner23_arxiv_harnessing_maycore_processors.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
001022263 8564_ $$uhttps://juser.fz-juelich.de/record/1022263/files/finkbeiner23_arxiv_harnessing_maycore_processors.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
001022263 8564_ $$uhttps://juser.fz-juelich.de/record/1022263/files/finkbeiner23_arxiv_harnessing_maycore_processors.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
001022263 909CO $$ooai:juser.fz-juelich.de:1022263$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
001022263 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001022263 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)190112$$aForschungszentrum Jülich$$b0$$kFZJ
001022263 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188273$$aForschungszentrum Jülich$$b4$$kFZJ
001022263 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5234$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x0
001022263 9141_ $$y2024
001022263 920__ $$lyes
001022263 9201_ $$0I:(DE-Juel1)PGI-15-20210701$$kPGI-15$$lNeuromorphic Software Eco System$$x0
001022263 980__ $$apreprint
001022263 980__ $$aVDB
001022263 980__ $$aUNRESTRICTED
001022263 980__ $$aI:(DE-Juel1)PGI-15-20210701
001022263 9801_ $$aFullTexts