001 | 1022039 | ||
005 | 20250903202255.0 | ||
024 | 7 | _ | |a 10.34734/FZJ-2024-01176 |2 datacite_doi |
037 | _ | _ | |a FZJ-2024-01176 |
100 | 1 | _ | |a Quercia, Alessio |0 P:(DE-Juel1)188471 |b 0 |e Corresponding author |u fzj |
111 | 2 | _ | |a International Conference on Data Mining |c Shanghai |d 2023-12-01 - 2023-12-04 |w Peoples R China |
245 | _ | _ | |a SGD Biased towards Early Important Samples for Efficient Training |
260 | _ | _ | |c 2023 |
300 | _ | _ | |a TBA |
336 | 7 | _ | |a CONFERENCE_PAPER |2 ORCID |
336 | 7 | _ | |a Conference Paper |0 33 |2 EndNote |
336 | 7 | _ | |a INPROCEEDINGS |2 BibTeX |
336 | 7 | _ | |a conferenceObject |2 DRIVER |
336 | 7 | _ | |a Output Types/Conference Paper |2 DataCite |
336 | 7 | _ | |a Contribution to a conference proceedings |b contrib |m contrib |0 PUB:(DE-HGF)8 |s 1756892575_17983 |2 PUB:(DE-HGF) |
520 | _ | _ | |a In deep learning, using larger training datasets usually leads to more accurate models. However, simply adding more, but redundant, data may be inefficient, as some training samples may be more informative than others. We propose to bias SGD (Stochastic Gradient Descent) towards samples that are found to be more important after a few training epochs, by sampling them more often for the rest of training. In contrast to state-of-the-art methods, our approach requires less computational overhead to estimate sample importance, as it computes the estimates once during training using the prediction probabilities, and it does not require training to be restarted. In the experimental evaluation, our learning technique trains faster than state-of-the-art methods and can achieve higher test accuracy, especially when datasets are not well balanced. Lastly, results suggest that our approach has intrinsic balancing properties. Code is available at https://github.com/AlessioQuercia/sgd biased. |
536 | _ | _ | |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) |0 G:(DE-HGF)POF4-5112 |c POF4-511 |f POF IV |x 0 |
536 | _ | _ | |a 5232 - Computational Principles (POF4-523) |0 G:(DE-HGF)POF4-5232 |c POF4-523 |f POF IV |x 1 |
536 | _ | _ | |a 5234 - Emerging NC Architectures (POF4-523) |0 G:(DE-HGF)POF4-5234 |c POF4-523 |f POF IV |x 2 |
536 | _ | _ | |a 510 - Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action (POF4-500) |0 G:(DE-HGF)POF4-510 |c POF4-500 |f POF IV |x 3 |
536 | _ | _ | |a HDS LEE - Helmholtz School for Data Science in Life, Earth and Energy (HDS LEE) (HDS-LEE-20190612) |0 G:(DE-Juel1)HDS-LEE-20190612 |c HDS-LEE-20190612 |x 4 |
700 | 1 | _ | |a Morrison, Abigail |0 P:(DE-Juel1)151166 |b 1 |u fzj |
700 | 1 | _ | |a Scharr, Hanno |0 P:(DE-Juel1)129394 |b 2 |u fzj |
700 | 1 | _ | |a Assent, Ira |0 P:(DE-Juel1)188313 |b 3 |u fzj |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.pdf |y OpenAccess |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.pdf |y Restricted |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.gif?subformat=icon |x icon |y OpenAccess |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.jpg?subformat=icon-1440 |x icon-1440 |y OpenAccess |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.jpg?subformat=icon-180 |x icon-180 |y OpenAccess |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD%20Biased%20towards%20Early%20Important%20Samples%20for%20Efficient%20Training-2.jpg?subformat=icon-640 |x icon-640 |y OpenAccess |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.gif?subformat=icon |x icon |y Restricted |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.jpg?subformat=icon-1440 |x icon-1440 |y Restricted |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.jpg?subformat=icon-180 |x icon-180 |y Restricted |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1022039/files/SGD_Biased_towards_Early_Important_Samples_for_Efficient_Training_ICDM_CopyrightNotice.jpg?subformat=icon-640 |x icon-640 |y Restricted |
909 | C | O | |o oai:juser.fz-juelich.de:1022039 |p openaire |p open_access |p VDB |p driver |p dnbdelivery |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 0 |6 P:(DE-Juel1)188471 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 1 |6 P:(DE-Juel1)151166 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 2 |6 P:(DE-Juel1)129394 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 3 |6 P:(DE-Juel1)188313 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-511 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Enabling Computational- & Data-Intensive Science and Engineering |9 G:(DE-HGF)POF4-5112 |x 0 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Natural, Artificial and Cognitive Information Processing |1 G:(DE-HGF)POF4-520 |0 G:(DE-HGF)POF4-523 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Neuromorphic Computing and Network Dynamics |9 G:(DE-HGF)POF4-5232 |x 1 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Natural, Artificial and Cognitive Information Processing |1 G:(DE-HGF)POF4-520 |0 G:(DE-HGF)POF4-523 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Neuromorphic Computing and Network Dynamics |9 G:(DE-HGF)POF4-5234 |x 2 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |x 3 |
914 | 1 | _ | |y 2023 |
915 | _ | _ | |a OpenAccess |0 StatID:(DE-HGF)0510 |2 StatID |
920 | _ | _ | |l yes |
920 | 1 | _ | |0 I:(DE-Juel1)IAS-8-20210421 |k IAS-8 |l Datenanalyse und Maschinenlernen |x 0 |
920 | 1 | _ | |0 I:(DE-Juel1)IAS-6-20130828 |k IAS-6 |l Computational and Systems Neuroscience |x 1 |
980 | _ | _ | |a contrib |
980 | _ | _ | |a VDB |
980 | _ | _ | |a I:(DE-Juel1)IAS-8-20210421 |
980 | _ | _ | |a I:(DE-Juel1)IAS-6-20130828 |
980 | _ | _ | |a UNRESTRICTED |
980 | 1 | _ | |a FullTexts |
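The abstract (field 520) describes biasing SGD towards samples judged important after a few epochs, with importance estimated once from prediction probabilities. The sketch below illustrates that general idea only; it is not the authors' implementation, which is in their linked repository. It assumes a hypothetical importance score of `1 - p(true class) + eps`, recorded once after a warm-up phase, and assumes samples are then drawn with replacement in proportion to that score. The function names `importance_from_probs`, `biased_epoch_indices`, and `biased_minibatches` and the `eps` floor are illustrative choices, not taken from the paper.

```python
import random


def importance_from_probs(true_class_probs, eps=0.01):
    """Hypothetical importance score, computed once after warm-up epochs.

    true_class_probs[i] is the model's predicted probability for sample i's
    true label. Samples the model is less sure about get a larger weight.
    eps is an assumed floor so that every sample can still be drawn.
    """
    return [(1.0 - p) + eps for p in true_class_probs]


def biased_epoch_indices(weights, n_draws, rng):
    """Draw sample indices with replacement, proportionally to their weights."""
    return rng.choices(range(len(weights)), weights=weights, k=n_draws)


def biased_minibatches(weights, batch_size, rng):
    """Yield one epoch of mini-batches (lists of indices) from the biased sampler."""
    n = len(weights)
    indices = biased_epoch_indices(weights, n, rng)
    for start in range(0, n, batch_size):
        yield indices[start:start + batch_size]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy probabilities for the true class of four training samples.
    probs = [0.99, 0.95, 0.40, 0.10]
    weights = importance_from_probs(probs)
    for batch in biased_minibatches(weights, batch_size=2, rng=rng):
        print(batch)  # each SGD step would train on these indices
```

Because the weights are fixed after a single estimation pass, the sampling cost for the rest of training is just a weighted draw, which matches the abstract's point about low overhead; a real training loop would feed these index batches to its data loader in place of uniform shuffling.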