001021206 001__ 1021206
001021206 005__ 20250903202255.0
001021206 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-00647
001021206 037__ $$aFZJ-2024-00647
001021206 041__ $$aEnglish
001021206 1001_ $$0P:(DE-Juel1)188471$$aQuercia, Alessio$$b0$$eCorresponding author$$ufzj
001021206 1112_ $$aInternational Conference on Data Mining$$cShanghai$$d2023-12-01 - 2023-12-04$$gICDM2023$$wPeoples R China
001021206 245__ $$aSGD Biased towards Early Important Samples for Efficient Training
001021206 260__ $$c2023
001021206 3367_ $$033$$2EndNote$$aConference Paper
001021206 3367_ $$2DataCite$$aOther
001021206 3367_ $$2BibTeX$$aINPROCEEDINGS
001021206 3367_ $$2DRIVER$$aconferenceObject
001021206 3367_ $$2ORCID$$aLECTURE_SPEECH
001021206 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1756892510_26549$$xAfter Call
001021206 520__ $$aIn deep learning, using larger training datasets usually leads to more accurate models. However, simply adding more but redundant data may be inefficient, as some training samples may be more informative than others. We propose to bias SGD (Stochastic Gradient Descent) towards samples that are found to be more important after a few training epochs, by sampling them more often for the rest of training. In contrast to the state of the art, our approach requires less computational overhead to estimate sample importance, as it computes estimates once during training using the prediction probabilities, and it does not require that training be restarted. In the experimental evaluation, we see that our learning technique trains faster than state-of-the-art methods and can achieve higher test accuracy, especially when datasets are not well balanced. Lastly, results suggest that our approach has intrinsic balancing properties. Code is available at https://github.com/AlessioQuercia/sgd_biased.
001021206 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001021206 536__ $$0G:(DE-Juel1)HDS-LEE-20190612$$aHDS LEE - Helmholtz School for Data Science in Life, Earth and Energy (HDS LEE) (HDS-LEE-20190612)$$cHDS-LEE-20190612$$x1
001021206 7001_ $$0P:(DE-Juel1)151166$$aMorrison, Abigail$$b1$$ufzj
001021206 7001_ $$0P:(DE-Juel1)129394$$aScharr, Hanno$$b2$$ufzj
001021206 7001_ $$0P:(DE-Juel1)188313$$aAssent, Ira$$b3$$ufzj
001021206 8564_ $$uhttps://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.pdf$$yOpenAccess
001021206 8564_ $$uhttps://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.gif?subformat=icon$$xicon$$yOpenAccess
001021206 8564_ $$uhttps://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
001021206 8564_ $$uhttps://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
001021206 8564_ $$uhttps://juser.fz-juelich.de/record/1021206/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
001021206 909CO $$ooai:juser.fz-juelich.de:1021206$$popenaire$$popen_access$$pVDB$$pdriver
001021206 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188471$$aForschungszentrum Jülich$$b0$$kFZJ
001021206 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-Juel1)188471$$aRWTH Aachen$$b0$$kRWTH
001021206 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)151166$$aForschungszentrum Jülich$$b1$$kFZJ
001021206 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-Juel1)151166$$aRWTH Aachen$$b1$$kRWTH
001021206 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)129394$$aForschungszentrum Jülich$$b2$$kFZJ
001021206 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188313$$aForschungszentrum Jülich$$b3$$kFZJ
001021206 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001021206 9141_ $$y2023
001021206 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001021206 920__ $$lyes
001021206 9201_ $$0I:(DE-Juel1)IAS-8-20210421$$kIAS-8$$lData Analytics and Machine Learning$$x0
001021206 9201_ $$0I:(DE-Juel1)IAS-6-20130828$$kIAS-6$$lComputational and Systems Neuroscience$$x1
001021206 980__ $$aconf
001021206 980__ $$aVDB
001021206 980__ $$aI:(DE-Juel1)IAS-8-20210421
001021206 980__ $$aI:(DE-Juel1)IAS-6-20130828
001021206 980__ $$aUNRESTRICTED
001021206 9801_ $$aFullTexts