001 | 1034062 | ||
005 | 20250203103147.0 | ||
024 | 7 | _ | |a 10.34734/FZJ-2024-06883 |2 datacite_doi |
037 | _ | _ | |a FZJ-2024-06883 |
100 | 1 | _ | |a Penke, Carolin |0 P:(DE-Juel1)192254 |b 0 |e Corresponding author |u fzj |
111 | 2 | _ | |a RWTH Aachen SFB 1481 Colloquium |c Aachen |d 2024-12-11 - 2024-12-11 |w Germany |
245 | _ | _ | |a Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in Deep Learning |f 2024-09-11 - |
260 | _ | _ | |c 2024 |
336 | 7 | _ | |a Conference Paper |0 33 |2 EndNote |
336 | 7 | _ | |a Other |2 DataCite |
336 | 7 | _ | |a INPROCEEDINGS |2 BibTeX |
336 | 7 | _ | |a LECTURE_SPEECH |2 ORCID |
336 | 7 | _ | |a Talk (non-conference) |b talk |m talk |0 PUB:(DE-HGF)31 |s 1736497335_6155 |2 PUB:(DE-HGF) |x Invited |
336 | 7 | _ | |a Other |2 DINI |
520 | _ | _ | |a Computing an orthogonal basis that approximates the range or corange of a matrix is a ubiquitous problem in computational science and engineering. In numerous applications, a rapid decay of singular values permits the use of such bases to approximate a linear operator by restricting it to low-rank subspaces, thereby significantly reducing computational and storage demands. A powerful approach for constructing a basis with a specified rank or approximation tolerance is the (adaptive) randomized range finder. In this talk, we introduce a novel variant of this algorithm, based on the blocked Householder QR decomposition, optimized for modern GPU accelerators. This development is motivated by its potential to substantially lower memory requirements during the training of deep neural networks such as transformers. We discuss the GaLore (Gradient Low-Rank Projection) training framework, and demonstrate how the randomized range finder can be employed to derive low-rank representations of optimizer states. Further potential avenues for future research are discussed. |
536 | _ | _ | |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) |0 G:(DE-HGF)POF4-5112 |c POF4-511 |f POF IV |x 0 |
536 | _ | _ | |a OpenGPT-X - Aufbau eines Gaia-X Knotens für große KI-Sprachmodelle und innovative Sprachapplikations-Services; Teilvorhaben: Optimierung und Skalierung auf großen HPC-Systemen (68GX21007F) |0 G:(DE-Juel-1)68GX21007F |c 68GX21007F |x 1 |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1034062/files/LowRankRepresentationsDL.pdf |y OpenAccess |
909 | C | O | |o oai:juser.fz-juelich.de:1034062 |p openaire |p open_access |p VDB |p driver |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 0 |6 P:(DE-Juel1)192254 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-511 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Enabling Computational- & Data-Intensive Science and Engineering |9 G:(DE-HGF)POF4-5112 |x 0 |
914 | 1 | _ | |y 2024 |
915 | _ | _ | |a OpenAccess |0 StatID:(DE-HGF)0510 |2 StatID |
920 | 1 | _ | |0 I:(DE-Juel1)JSC-20090406 |k JSC |l Jülich Supercomputing Center |x 0 |
980 | 1 | _ | |a FullTexts |
980 | _ | _ | |a talk |
980 | _ | _ | |a VDB |
980 | _ | _ | |a UNRESTRICTED |
980 | _ | _ | |a I:(DE-Juel1)JSC-20090406 |
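
Note on the abstract (field 520): it names the randomized range finder as a way to build an orthogonal basis that approximates the range of a matrix whose singular values decay rapidly. As a rough illustration only, the following is a minimal NumPy sketch of the basic fixed-rank variant. It is not the blocked Householder QR variant for GPUs introduced in the talk. The function name, the `oversample` parameter, and the synthetic test matrix are assumptions made for this example.

```python
import numpy as np

def randomized_range_finder(A, rank, oversample=10, seed=0):
    """Return Q with orthonormal columns such that A is approximately Q @ (Q.T @ A).

    Basic fixed-rank variant; `oversample` is a hypothetical convenience
    parameter adding extra sample columns beyond the target rank.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)
    omega = rng.standard_normal((n, k))  # Gaussian test matrix
    Y = A @ omega                        # random samples of the range of A
    Q, _ = np.linalg.qr(Y)               # reduced QR: orthonormal basis of the samples
    return Q

# Synthetic matrix with rapidly decaying singular values (illustrative data only).
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((200, 50)))
V, _ = np.linalg.qr(rng.standard_normal((100, 50)))
s = 2.0 ** -np.arange(50)
A = (U * s) @ V.T

Q = randomized_range_finder(A, rank=10)
B = Q.T @ A  # low-rank representation: 20 x 100 instead of 200 x 100
rel_err = np.linalg.norm(A - Q @ B) / np.linalg.norm(A)
```

Here `B` holds the matrix restricted to the subspace spanned by `Q`, so only `Q` and `B` need to be stored. In a gradient-projection setting such as the one the abstract describes, `A` would presumably play the role of a gradient matrix, though the details of the talk's method are not given in this record.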