001034062 001__ 1034062
001034062 005__ 20250203103147.0
001034062 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-06883
001034062 037__ $$aFZJ-2024-06883
001034062 1001_ $$0P:(DE-Juel1)192254$$aPenke, Carolin$$b0$$eCorresponding author$$ufzj
001034062 1112_ $$aRWTH Aachen SFB 1481 Colloquium$$cAachen$$d2024-12-11 - 2024-12-11$$wGermany
001034062 245__ $$aEfficient Computation of Low-Rank Representations to Reduce Memory Requirements in Deep Learning$$f2024-09-11
001034062 260__ $$c2024
001034062 3367_ $$033$$2EndNote$$aConference Paper
001034062 3367_ $$2DataCite$$aOther
001034062 3367_ $$2BibTeX$$aINPROCEEDINGS
001034062 3367_ $$2ORCID$$aLECTURE_SPEECH
001034062 3367_ $$0PUB:(DE-HGF)31$$2PUB:(DE-HGF)$$aTalk (non-conference)$$btalk$$mtalk$$s1736497335_6155$$xInvited
001034062 3367_ $$2DINI$$aOther
001034062 520__ $$aComputing an orthogonal basis that approximates the range or corange of a matrix is a ubiquitous problem in computational science and engineering. In numerous applications, a rapid decay of singular values permits the use of such bases to approximate a linear operator by restricting it to low-rank subspaces, thereby significantly reducing computational and storage demands. A powerful approach for constructing a basis with a specified rank or approximation tolerance is the (adaptive) randomized range finder. In this talk, we introduce a novel variant of this algorithm, based on the blocked Householder QR decomposition, optimized for modern GPU accelerators. This development is motivated by its potential to substantially lower memory requirements during the training of deep neural networks such as transformers. We discuss the GaLore (Gradient Low-Rank Projection) training framework and demonstrate how the randomized range finder can be employed to derive low-rank representations of optimizer states. We conclude by outlining avenues for future research.
001034062 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001034062 536__ $$0G:(DE-Juel-1)68GX21007F$$aOpenGPT-X - Aufbau eines Gaia-X Knotens für große KI-Sprachmodelle und innovative Sprachapplikations-Services; Teilvorhaben: Optimierung und Skalierung auf großen HPC-Systemen (68GX21007F)$$c68GX21007F$$x1
001034062 8564_ $$uhttps://juser.fz-juelich.de/record/1034062/files/LowRankRepresentationsDL.pdf$$yOpenAccess
001034062 909CO $$ooai:juser.fz-juelich.de:1034062$$pdriver$$pVDB$$popen_access$$popenaire
001034062 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)192254$$aForschungszentrum Jülich$$b0$$kFZJ
001034062 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001034062 9141_ $$y2024
001034062 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001034062 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Centre$$x0
001034062 9801_ $$aFullTexts
001034062 980__ $$atalk
001034062 980__ $$aVDB
001034062 980__ $$aUNRESTRICTED
001034062 980__ $$aI:(DE-Juel1)JSC-20090406