SGD Biased towards Early Important Samples for Efficient Training

Quercia, Alessio; Morrison, Abigail; Assent, Ira; Scharr, Hanno
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Quercia:1022039,
      author       = {Quercia, Alessio and Morrison, Abigail and Scharr, Hanno
                      and Assent, Ira},
      title        = {{SGD} {B}iased towards {E}arly {I}mportant {S}amples for
                      {E}fficient {T}raining},
      reportid     = {FZJ-2024-01176},
      pages        = {TBA},
      year         = {2023},
      abstract     = {In deep learning, using larger training datasets usually
                      leads to more accurate models. However, simply adding more
                      but redundant data may be inefficient, as some training
                      samples may be more informative than others. We propose to
                      bias SGD (Stochastic Gradient Descent) towards samples that
                      are found to be more important after a few training epochs,
                      by sampling them more often for the rest of training. In
                      contrast to state-of-the-art, our approach requires less
                      computational overhead to estimate sample importance, as
                      it computes estimates once during training using the
                      prediction probabilities, and does not require that training
                      be restarted. In the experimental evaluation, we see that
                      our learning technique trains faster than state-of-the-art
                      and can achieve higher test accuracy, especially when
                      datasets are not well balanced. Lastly, results suggest that
                      our approach has intrinsic balancing properties. Code is
                      available at https://github.com/AlessioQuercia/sgd biased.},
      month        = {Dec},
      date         = {2023-12-01},
      organization = {International Conference on Data
                      Mining, Shanghai (Peoples R China), 1
                      Dec 2023 - 4 Dec 2023},
      cin          = {IAS-8 / IAS-6},
      cid          = {I:(DE-Juel1)IAS-8-20210421 / I:(DE-Juel1)IAS-6-20130828},
      pnm          = {5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs)
                      and Research Groups (POF4-511) / 5232 - Computational
                      Principles (POF4-523) / 5234 - Emerging NC Architectures
                      (POF4-523) / 510 - Engineering Digital Futures –
                      Supercomputing, Data Management and Information Security for
                      Knowledge and Action (POF4-500) / HDS LEE - Helmholtz School
                      for Data Science in Life, Earth and Energy (HDS LEE)
                      (HDS-LEE-20190612)},
      pid          = {G:(DE-HGF)POF4-5112 / G:(DE-HGF)POF4-5232 /
                      G:(DE-HGF)POF4-5234 / G:(DE-HGF)POF4-510 /
                      G:(DE-Juel1)HDS-LEE-20190612},
      typ          = {PUB:(DE-HGF)8},
      doi          = {10.34734/FZJ-2024-01176},
      url          = {https://juser.fz-juelich.de/record/1022039},
}