SGD Biased towards Early Important Samples for Efficient Training

Quercia, Alessio; Morrison, Abigail; Scharr, Hanno; Assent, Ira
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Quercia:1021206,
      author       = {Quercia, Alessio and Morrison, Abigail and Scharr, Hanno
                      and Assent, Ira},
      title        = {{SGD} {B}iased towards {E}arly {I}mportant {S}amples for
                      {E}fficient {T}raining},
      reportid     = {FZJ-2024-00647},
      year         = {2023},
      abstract     = {In deep learning, using larger training datasets usually
                      leads to more accurate models. However, simply adding more
                      but redundant data may be inefficient, as some training
                      samples may be more informative than others. We propose to
                      bias SGD (Stochastic Gradient Descent) towards samples that
                      are found to be more important after a few training epochs,
                      by sampling them more often for the rest of training. In
                      contrast to state-of-the-art, our approach requires less
                      computational overhead to estimate sample importance, as it
                      computes estimates once during training using the prediction
                      probabilities, and does not require that training be
                      restarted. In the experimental evaluation, we see that our
                      learning technique trains faster than state-of-the-art and
                      can achieve higher test accuracy, especially when datasets
                      are not well balanced. Lastly, results suggest that our
                      approach has intrinsic balancing properties. Code is
                      available at https://github.com/AlessioQuercia/sgd_biased.},
      month        = {Dec},
      date         = {2023-12-01},
      organization = {International Conference on Data
                      Mining, Shanghai (People's Republic of China), 1
                      Dec 2023 - 4 Dec 2023},
      subtyp       = {After Call},
      cin          = {IAS-8 / IAS-6},
      cid          = {I:(DE-Juel1)IAS-8-20210421 / I:(DE-Juel1)IAS-6-20130828},
      pnm          = {5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs)
                      and Research Groups (POF4-511) / HDS LEE - Helmholtz School
                      for Data Science in Life, Earth and Energy (HDS LEE)
                      (HDS-LEE-20190612)},
      pid          = {G:(DE-HGF)POF4-5112 / G:(DE-Juel1)HDS-LEE-20190612},
      typ          = {PUB:(DE-HGF)6},
      doi          = {10.34734/FZJ-2024-00647},
      url          = {https://juser.fz-juelich.de/record/1021206},
}