001     1037668
005     20250220092006.0
024 7 _ |a 10.48550/arXiv.2412.20215
|2 doi
024 7 _ |a 10.34734/FZJ-2025-00833
|2 datacite_doi
037 _ _ |a FZJ-2025-00833
100 1 _ |a Siegel, Sebastian
|0 P:(DE-Juel1)174486
|b 0
|e Corresponding author
|u fzj
245 _ _ |a IMSSA: Deploying modern state-space models on memristive in-memory compute hardware
260 _ _ |c 2024
|b arXiv
336 7 _ |a Preprint
|b preprint
|m preprint
|0 PUB:(DE-HGF)25
|s 1738828598_24855
|2 PUB:(DE-HGF)
336 7 _ |a WORKING_PAPER
|2 ORCID
336 7 _ |a Electronic Article
|0 28
|2 EndNote
336 7 _ |a preprint
|2 DRIVER
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a Output Types/Working Paper
|2 DataCite
520 _ _ |a Processing long temporal sequences is a key challenge in deep learning. In recent years, Transformers have become the state of the art for this task, but they suffer from excessive memory requirements because the sequences must be stored explicitly. To address this issue, structured state-space sequence (S4) models recently emerged, offering a fixed-size memory state while still enabling the processing of very long sequence contexts. The recurrent linear update of the state in these models makes them highly efficient on modern graphics processing units (GPUs) by unrolling the recurrence into a convolution. However, this approach demands significant memory and massively parallel computation, which is only available on the latest GPUs. In this work, we aim to bring the power of S4 models to edge hardware by significantly reducing the size and computational demand of an S4D model through quantization-aware training, even achieving ternary weights for a simple real-world task. To this end, we extend conventional quantization-aware training to tailor it to analog in-memory compute hardware. We then demonstrate the deployment of recurrent S4D kernels on memristive crossbar arrays, enabling their computation in an in-memory compute fashion. To our knowledge, this is the first implementation of S4 kernels on in-memory compute hardware.
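520 _ _ |a The recurrence-to-convolution unrolling mentioned above is a standard property of linear state-space models: with x_k = A x_{k-1} + B u_k and y_k = C x_k, the output equals a convolution of the input with the kernel K_j = C A^j B. The following sketch (illustrative only, with assumed shapes and names; not the authors' implementation) verifies this equivalence numerically for a diagonal, S4D-style state matrix:

    # Minimal sketch: recurrent vs. convolutional evaluation of a linear SSM.
    # All names and dimensions are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    N, L = 4, 16                              # state size, sequence length
    A = np.diag(rng.uniform(0.1, 0.9, N))     # diagonal state matrix (S4D-style)
    B = rng.standard_normal((N, 1))
    C = rng.standard_normal((1, N))
    u = rng.standard_normal(L)                # input sequence

    # Recurrent evaluation: constant memory in the sequence length.
    x = np.zeros((N, 1))
    y_rec = np.empty(L)
    for k in range(L):
        x = A @ x + B * u[k]
        y_rec[k] = (C @ x).item()

    # Convolutional evaluation: precompute kernel K_j = C A^j B, then convolve.
    K = np.array([(C @ np.linalg.matrix_power(A, j) @ B).item() for j in range(L)])
    y_conv = np.convolve(u, K)[:L]

    assert np.allclose(y_rec, y_conv)         # both views give the same output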
536 _ _ |a 5234 - Emerging NC Architectures (POF4-523)
|0 G:(DE-HGF)POF4-5234
|c POF4-523
|f POF IV
|x 0
588 _ _ |a Dataset connected to DataCite
650 _ 7 |a Machine Learning (cs.LG)
|2 Other
650 _ 7 |a Hardware Architecture (cs.AR)
|2 Other
650 _ 7 |a FOS: Computer and information sciences
|2 Other
700 1 _ |a Yang, Ming-Jay
|0 P:(DE-Juel1)192385
|b 1
|u fzj
700 1 _ |a Strachan, John Paul
|0 P:(DE-Juel1)188145
|b 2
|u fzj
773 _ _ |a 10.48550/arXiv.2412.20215
856 4 _ |u https://arxiv.org/abs/2412.20215
856 4 _ |u https://juser.fz-juelich.de/record/1037668/files/Toward_memristive_SSM_deployment-1.pdf
|y OpenAccess
909 C O |o oai:juser.fz-juelich.de:1037668
|p openaire
|p open_access
|p VDB
|p driver
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)174486
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)192385
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 2
|6 P:(DE-Juel1)188145
913 1 _ |a DE-HGF
|b Key Technologies
|l Natural, Artificial and Cognitive Information Processing
|1 G:(DE-HGF)POF4-520
|0 G:(DE-HGF)POF4-523
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Neuromorphic Computing and Network Dynamics
|9 G:(DE-HGF)POF4-5234
|x 0
914 1 _ |y 2024
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)PGI-14-20210412
|k PGI-14
|l Neuromorphic Compute Nodes
|x 0
980 _ _ |a preprint
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)PGI-14-20210412
980 1 _ |a FullTexts

