001037668 001__ 1037668
001037668 005__ 20250220092006.0
001037668 0247_ $$2doi$$a10.48550/arXiv.2412.20215
001037668 0247_ $$2datacite_doi$$a10.34734/FZJ-2025-00833
001037668 037__ $$aFZJ-2025-00833
001037668 1001_ $$0P:(DE-Juel1)174486$$aSiegel, Sebastian$$b0$$eCorresponding author$$ufzj
001037668 245__ $$aIMSSA: Deploying modern state-space models on memristive in-memory compute hardware
001037668 260__ $$barXiv$$c2024
001037668 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1738828598_24855
001037668 3367_ $$2ORCID$$aWORKING_PAPER
001037668 3367_ $$028$$2EndNote$$aElectronic Article
001037668 3367_ $$2DRIVER$$apreprint
001037668 3367_ $$2BibTeX$$aARTICLE
001037668 3367_ $$2DataCite$$aOutput Types/Working Paper
001037668 520__ $$aProcessing long temporal sequences is a key challenge in deep learning. In recent years, Transformers have become state-of-the-art for this task, but suffer from excessive memory requirements due to the need to explicitly store the sequences. To address this issue, structured state-space sequential (S4) models recently emerged, offering a fixed memory state while still enabling the processing of very long sequence contexts. The recurrent linear update of the state in these models makes them highly efficient on modern graphics processing units (GPUs) by unrolling the recurrence into a convolution. However, this approach demands significant memory and massively parallel computation, which is only available on the latest GPUs. In this work, we aim to bring the power of S4 models to edge hardware by significantly reducing the size and computational demand of an S4D model through quantization-aware training, even achieving ternary weights for a simple real-world task. To this end, we extend conventional quantization-aware training to tailor it for analog in-memory compute hardware. We then demonstrate the deployment of recurrent S4D kernels on memristive crossbar arrays, enabling their computation in an in-memory compute fashion. To our knowledge, this is the first implementation of S4 kernels on in-memory compute hardware.
001037668 536__ $$0G:(DE-HGF)POF4-5234$$a5234 - Emerging NC Architectures (POF4-523)$$cPOF4-523$$fPOF IV$$x0
001037668 588__ $$aDataset connected to DataCite
001037668 650_7 $$2Other$$aMachine Learning (cs.LG)
001037668 650_7 $$2Other$$aHardware Architecture (cs.AR)
001037668 650_7 $$2Other$$aFOS: Computer and information sciences
001037668 7001_ $$0P:(DE-Juel1)192385$$aYang, Ming-Jay$$b1$$ufzj
001037668 7001_ $$0P:(DE-Juel1)188145$$aStrachan, John Paul$$b2$$ufzj
001037668 773__ $$a10.48550/arXiv.2412.20215
001037668 8564_ $$uhttps://arxiv.org/abs/2412.20215
001037668 8564_ $$uhttps://juser.fz-juelich.de/record/1037668/files/Toward_memristive_SSM_deployment-1.pdf$$yOpenAccess
001037668 909CO $$ooai:juser.fz-juelich.de:1037668$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
001037668 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)174486$$aForschungszentrum Jülich$$b0$$kFZJ
001037668 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)192385$$aForschungszentrum Jülich$$b1$$kFZJ
001037668 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188145$$aForschungszentrum Jülich$$b2$$kFZJ
001037668 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5234$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x0
001037668 9141_ $$y2024
001037668 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001037668 920__ $$lyes
001037668 9201_ $$0I:(DE-Juel1)PGI-14-20210412$$kPGI-14$$lNeuromorphic Compute Nodes$$x0
001037668 980__ $$apreprint
001037668 980__ $$aVDB
001037668 980__ $$aUNRESTRICTED
001037668 980__ $$aI:(DE-Juel1)PGI-14-20210412
001037668 9801_ $$aFullTexts