001 | 1021206 | ||
005 | 20250903202255.0 | ||
024 | 7 | _ | |a 10.34734/FZJ-2024-00647 |2 datacite_doi |
037 | _ | _ | |a FZJ-2024-00647 |
041 | _ | _ | |a English |
100 | 1 | _ | |a Quercia, Alessio |0 P:(DE-Juel1)188471 |b 0 |e Corresponding author |u fzj |
111 | 2 | _ | |a International Conference on Data Mining |g ICDM2023 |c Shanghai |d 2023-12-01 - 2023-12-04 |w People's Republic of China |
245 | _ | _ | |a SGD Biased towards Early Important Samples for Efficient Training |
260 | _ | _ | |c 2023 |
336 | 7 | _ | |a Conference Paper |0 33 |2 EndNote |
336 | 7 | _ | |a Other |2 DataCite |
336 | 7 | _ | |a INPROCEEDINGS |2 BibTeX |
336 | 7 | _ | |a conferenceObject |2 DRIVER |
336 | 7 | _ | |a LECTURE_SPEECH |2 ORCID |
336 | 7 | _ | |a Conference Presentation |b conf |m conf |0 PUB:(DE-HGF)6 |s 1756892510_26549 |2 PUB:(DE-HGF) |x After Call |
520 | _ | _ | |a In deep learning, using larger training datasets usually leads to more accurate models. However, simply adding more but redundant data may be inefficient, as some training samples are more informative than others. We propose to bias SGD (Stochastic Gradient Descent) towards samples that are found to be more important after a few training epochs, by sampling them more often for the rest of training. In contrast to the state of the art, our approach requires less computational overhead to estimate sample importance: it computes estimates once during training using the prediction probabilities, and does not require that training be restarted. In the experimental evaluation, we see that our learning technique trains faster than state-of-the-art methods and can achieve higher test accuracy, especially when datasets are not well balanced. Lastly, results suggest that our approach has intrinsic balancing properties. Code is available at https://github.com/AlessioQuercia/sgd_biased. |
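
The abstract above describes the approach only at a high level. The following is a minimal PyTorch-style sketch of how such importance-biased sampling could look; it is an illustrative assumption, not the authors' actual implementation (see the linked repository for that). The names (estimate_importance, warmup_epochs) and the 1 - p(true class) importance score are hypothetical choices consistent with the abstract's description of estimating importance once from prediction probabilities after a few epochs and then sampling important examples more often.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler


def estimate_importance(model, dataset, device="cpu"):
    """Score each sample once from the model's prediction probabilities.

    Here, importance is taken as 1 - p(true class): low-confidence samples
    are treated as more important (an assumed, illustrative choice).
    """
    model.eval()
    loader = DataLoader(dataset, batch_size=256, shuffle=False)
    scores = []
    with torch.no_grad():
        for x, y in loader:
            probs = torch.softmax(model(x.to(device)), dim=1)
            p_true = probs[torch.arange(len(y)), y.to(device)]
            scores.append(1.0 - p_true.cpu())
    return torch.cat(scores)


def train(model, dataset, optimizer, loss_fn,
          epochs=100, warmup_epochs=5, device="cpu"):
    # Phase 1: ordinary uniform sampling for a few warm-up epochs.
    loader = DataLoader(dataset, batch_size=128, shuffle=True)
    for epoch in range(epochs):
        if epoch == warmup_epochs:
            # Phase 2: estimate importance once, then bias sampling
            # towards high-importance samples for the rest of training.
            weights = estimate_importance(model, dataset, device)
            sampler = WeightedRandomSampler(weights,
                                            num_samples=len(dataset),
                                            replacement=True)
            loader = DataLoader(dataset, batch_size=128, sampler=sampler)
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()
```

Because the importance scores are computed a single time during training (at the end of the warm-up phase) rather than repeatedly or via a separate pre-training run, the extra cost over plain SGD is one forward pass over the dataset, which matches the low-overhead property claimed in the abstract.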
536 | _ | _ | |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) |0 G:(DE-HGF)POF4-5112 |c POF4-511 |f POF IV |x 0 |
536 | _ | _ | |a HDS LEE - Helmholtz School for Data Science in Life, Earth and Energy (HDS LEE) (HDS-LEE-20190612) |0 G:(DE-Juel1)HDS-LEE-20190612 |c HDS-LEE-20190612 |x 1 |
700 | 1 | _ | |a Morrison, Abigail |0 P:(DE-Juel1)151166 |b 1 |u fzj |
700 | 1 | _ | |a Scharr, Hanno |0 P:(DE-Juel1)129394 |b 2 |u fzj |
700 | 1 | _ | |a Assent, Ira |0 P:(DE-Juel1)188313 |b 3 |u fzj |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.pdf |y OpenAccess |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.gif?subformat=icon |x icon |y OpenAccess |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.jpg?subformat=icon-1440 |x icon-1440 |y OpenAccess |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.jpg?subformat=icon-180 |x icon-180 |y OpenAccess |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.jpg?subformat=icon-640 |x icon-640 |y OpenAccess |
909 | C | O | |o oai:juser.fz-juelich.de:1021206 |p openaire |p open_access |p VDB |p driver |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 0 |6 P:(DE-Juel1)188471 |
910 | 1 | _ | |a RWTH Aachen |0 I:(DE-588b)36225-6 |k RWTH |b 0 |6 P:(DE-Juel1)188471 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 1 |6 P:(DE-Juel1)151166 |
910 | 1 | _ | |a RWTH Aachen |0 I:(DE-588b)36225-6 |k RWTH |b 1 |6 P:(DE-Juel1)151166 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 2 |6 P:(DE-Juel1)129394 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 3 |6 P:(DE-Juel1)188313 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-511 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Enabling Computational- & Data-Intensive Science and Engineering |9 G:(DE-HGF)POF4-5112 |x 0 |
914 | 1 | _ | |y 2023 |
915 | _ | _ | |a OpenAccess |0 StatID:(DE-HGF)0510 |2 StatID |
920 | _ | _ | |l yes |
920 | 1 | _ | |0 I:(DE-Juel1)IAS-8-20210421 |k IAS-8 |l Data Analytics and Machine Learning |x 0 |
920 | 1 | _ | |0 I:(DE-Juel1)IAS-6-20130828 |k IAS-6 |l Computational and Systems Neuroscience |x 1 |
980 | _ | _ | |a conf |
980 | _ | _ | |a VDB |
980 | _ | _ | |a I:(DE-Juel1)IAS-8-20210421 |
980 | _ | _ | |a I:(DE-Juel1)IAS-6-20130828 |
980 | _ | _ | |a UNRESTRICTED |
980 | 1 | _ | |a FullTexts |