001034068 001__ 1034068
001034068 005__ 20250203103429.0
001034068 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-06889
001034068 037__ $$aFZJ-2024-06889
001034068 1001_ $$0P:(DE-Juel1)192254$$aPenke, Carolin$$b0$$eCorresponding author$$ufzj
001034068 1112_ $$aLoRAINNe’24: workshop on LOw-Rank Approximations and their Interactions with Neural NEtworks$$cNancy$$d2024-11-26 - 2024-11-27$$gLoRAINNe’24$$wFrance
001034068 245__ $$aEfficient Computation of Low-Rank Representations to Reduce Memory Requirements in LLM Training
001034068 260__ $$c2024
001034068 3367_ $$033$$2EndNote$$aConference Paper
001034068 3367_ $$2DataCite$$aOther
001034068 3367_ $$2BibTeX$$aINPROCEEDINGS
001034068 3367_ $$2DRIVER$$aconferenceObject
001034068 3367_ $$2ORCID$$aLECTURE_SPEECH
001034068 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1736500131_6151$$xInvited
001034068 520__ $$aThe OpenGPT-X project represents one of Europe’s pioneering publicly funded efforts in the domain of large language models (LLMs), covering the entire lifecycle from pre-training foundational models to fine-tuning and practical application development. To maximize the efficiency of training on High Performance Computing (HPC) resources, strategies aimed at reducing computational and memory demands are being explored. A promising avenue exploits the low-rank structure of gradients, as done in the LoRA or GaLore frameworks, the latter of which relies on the computation of dominant low-rank subspaces during training. The randomized range finder algorithm provides a more efficient alternative to computing a full singular value decomposition (SVD). We introduce a novel variant of the range finder, based on the blocked Householder QR decomposition, optimized for modern GPU accelerators.
001034068 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001034068 536__ $$0G:(DE-Juel-1)68GX21007F$$aOpenGPT-X - Aufbau eines Gaia-X Knotens für große KI-Sprachmodelle und innovative Sprachapplikations-Services; Teilvorhaben: Optimierung und Skalierung auf großen HPC-Systemen (68GX21007F)$$c68GX21007F$$x1
001034068 8564_ $$uhttps://juser.fz-juelich.de/record/1034068/files/Penke_LowRankRepresentationsToReduceMemoryInLLMs.pdf$$yOpenAccess
001034068 909CO $$ooai:juser.fz-juelich.de:1034068$$pdriver$$pVDB$$popen_access$$popenaire
001034068 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)192254$$aForschungszentrum Jülich$$b0$$kFZJ
001034068 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001034068 9141_ $$y2024
001034068 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001034068 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Centre$$x0
001034068 9801_ $$aFullTexts
001034068 980__ $$aconf
001034068 980__ $$aVDB
001034068 980__ $$aUNRESTRICTED
001034068 980__ $$aI:(DE-Juel1)JSC-20090406