SGD Biased towards Early Important Samples for Efficient Training

Quercia, Alessio; Morrison, Abigail; Assent, Ira; Scharr, Hanno
001022039 001__ 1022039
001022039 005__ 20250903202255.0
001022039 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-01176
001022039 037__ $$aFZJ-2024-01176
001022039 1001_ $$0P:(DE-Juel1)188471$$aQuercia, Alessio$$b0$$eCorresponding author$$ufzj
001022039 1112_ $$aInternational Conference on Data Mining$$cShanghai$$d2023-12-01 - 2023-12-04$$wPeoples R China
001022039 245__ $$aSGD Biased towards Early Important Samples for Efficient Training
001022039 260__ $$c2023
001022039 300__ $$aTBA
001022039 3367_ $$2ORCID$$aCONFERENCE_PAPER
001022039 3367_ $$033$$2EndNote$$aConference Paper
001022039 3367_ $$2BibTeX$$aINPROCEEDINGS
001022039 3367_ $$2DRIVER$$aconferenceObject
001022039 3367_ $$2DataCite$$aOutput Types/Conference Paper
001022039 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1756892575_17983
001022039 520__ $$aIn deep learning, using larger training datasets usually leads to more accurate models. However, simply adding more but redundant data may be inefficient, as some training samples may be more informative than others. We propose to bias SGD (Stochastic Gradient Descent) towards samples that are found to be more important after a few training epochs, by sampling them more often for the rest of training. In contrast to the state of the art, our approach requires less computational overhead to estimate sample importance, as it computes estimates once during training using the prediction probabilities, and does not require that training be restarted. In the experimental evaluation, we see that our learning technique trains faster than the state of the art and can achieve higher test accuracy, especially when datasets are not well balanced. Lastly, results suggest that our approach has intrinsic balancing properties. Code is available at https://github.com/AlessioQuercia/sgd biased.
001022039 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001022039 536__ $$0G:(DE-HGF)POF4-5232$$a5232 - Computational Principles (POF4-523)$$cPOF4-523$$fPOF IV$$x1
001022039 536__ $$0G:(DE-HGF)POF4-5234$$a5234 - Emerging NC Architectures (POF4-523)$$cPOF4-523$$fPOF IV$$x2
001022039 536__ $$0G:(DE-HGF)POF4-510$$a510 - Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action (POF4-500)$$cPOF4-500$$fPOF IV$$x3
001022039 536__ $$0G:(DE-Juel1)HDS-LEE-20190612$$aHDS LEE - Helmholtz School for Data Science in Life, Earth and Energy (HDS LEE) (HDS-LEE-20190612)$$cHDS-LEE-20190612$$x4
001022039 7001_ $$0P:(DE-Juel1)151166$$aMorrison, Abigail$$b1$$ufzj
001022039 7001_ $$0P:(DE-Juel1)129394$$aScharr, Hanno$$b2$$ufzj
001022039 7001_ $$0P:(DE-Juel1)188313$$aAssent, Ira$$b3$$ufzj
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.pdf$$yOpenAccess
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.pdf$$yRestricted
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.gif?subformat=icon$$xicon$$yOpenAccess
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.gif?subformat=icon$$xicon$$yRestricted
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.jpg?subformat=icon-1440$$xicon-1440$$yRestricted
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.jpg?subformat=icon-180$$xicon-180$$yRestricted
001022039 8564_ $$uhttps://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.jpg?subformat=icon-640$$xicon-640$$yRestricted
001022039 909CO $$ooai:juser.fz-juelich.de:1022039$$popenaire$$popen_access$$pVDB$$pdriver$$pdnbdelivery
001022039 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188471$$aForschungszentrum Jülich$$b0$$kFZJ
001022039 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)151166$$aForschungszentrum Jülich$$b1$$kFZJ
001022039 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)129394$$aForschungszentrum Jülich$$b2$$kFZJ
001022039 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188313$$aForschungszentrum Jülich$$b3$$kFZJ
001022039 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001022039 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5232$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x1
001022039 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5234$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x2
001022039 9131_ $$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$x3
001022039 9141_ $$y2023
001022039 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001022039 920__ $$lyes
001022039 9201_ $$0I:(DE-Juel1)IAS-8-20210421$$kIAS-8$$lDatenanalyse und Maschinenlernen$$x0
001022039 9201_ $$0I:(DE-Juel1)IAS-6-20130828$$kIAS-6$$lComputational and Systems Neuroscience$$x1
001022039 980__ $$acontrib
001022039 980__ $$aVDB
001022039 980__ $$aI:(DE-Juel1)IAS-8-20210421
001022039 980__ $$aI:(DE-Juel1)IAS-6-20130828
001022039 980__ $$aUNRESTRICTED
001022039 9801_ $$aFullTexts
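
The abstract in field 520 describes estimating sample importance once, from prediction probabilities after a few epochs, and then sampling important samples more often. The following is only a minimal sketch of that general idea, not the authors' implementation: the importance score (one minus the probability assigned to the true class), the `smoothing` term, and the function names `importance_from_probs` and `biased_batches` are all assumptions made for illustration.

```python
import random


def importance_from_probs(true_class_probs, smoothing=0.05):
    """Map per-sample true-class probabilities to sampling weights.

    Assumption: a sample is treated as more important when the model
    assigns a low probability to its true class. The smoothing term keeps
    well-learned samples from being dropped entirely.
    """
    raw = [(1.0 - p) + smoothing for p in true_class_probs]
    total = sum(raw)
    return [r / total for r in raw]


def biased_batches(weights, batch_size, num_batches, seed=0):
    """Yield mini-batches of sample indices drawn with replacement,
    in proportion to the fixed importance weights."""
    rng = random.Random(seed)
    indices = list(range(len(weights)))
    for _ in range(num_batches):
        yield rng.choices(indices, weights=weights, k=batch_size)


# Hypothetical true-class probabilities recorded after a few warm-up epochs.
probs = [0.99, 0.95, 0.40, 0.10]
weights = importance_from_probs(probs)  # computed once, then reused
batches = list(biased_batches(weights, batch_size=2, num_batches=3))
```

In this sketch the weights are computed a single time and reused for every later batch, which mirrors the abstract's point that importance is estimated once and training is not restarted.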