Obstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning

Pleines, Marco; Preuss, Mike; Zimmer, Frank; Jitsev, Jenia
doi:10.1109/CoG47356.2020.9231802
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Pleines:890092,
      author       = {Pleines, Marco and Jitsev, Jenia and Preuss, Mike and
                      Zimmer, Frank},
      title        = {{O}bstacle {T}ower {W}ithout {H}uman {D}emonstrations:
                      {H}ow {F}ar a {D}eep {F}eed-{F}orward {N}etwork {G}oes with
                      {R}einforcement {L}earning},
      publisher    = {IEEE},
      reportid     = {FZJ-2021-00681},
      isbn         = {978-1-7281-4533-4},
      pages        = {447--454},
      year         = {2020},
      comment      = {2020 IEEE Conference on Games (CoG) : [Proceedings] - IEEE,
                      2020},
      booktitle    = {2020 IEEE Conference on Games (CoG) : [Proceedings] - IEEE,
                      2020},
      abstract     = {The Obstacle Tower Challenge is the task to master a
                      procedurally generated chain of levels that subsequently get
                      harder to complete. Whereas the most top performing entries
                      of last year's competition used human demonstrations or
                      reward shaping to learn how to cope with the challenge, we
                      present an approach that performed competitively (placed
                      7th) but starts completely from scratch by means of Deep
                      Reinforcement Learning with a relatively simple feed-forward
                      deep network structure. We especially look at the
                      generalization performance of the taken approach concerning
                      different seeds and various visual themes that have become
                      available after the competition, and investigate where the
                      agent fails and why. Note that our approach does not possess
                      a short-term memory like employing recurrent hidden states.
                      With this work, we hope to contribute to a better
                      understanding of what is possible with a relatively simple,
                      flexible solution that can be applied to learning in
                      environments featuring complex 3D visual input where the
                      abstract task structure itself is still fairly simple.},
      month        = {Aug},
      date         = {2020-08-24},
      organization = {2020 IEEE Conference on Games (CoG), Osaka (Japan), 24 Aug
                      2020 - 27 Aug 2020},
      cin          = {JSC},
      cid          = {I:(DE-Juel1)JSC-20090406},
      pnm          = {512 - Data-Intensive Science and Federated Computing
                      (POF3-512)},
      pid          = {G:(DE-HGF)POF3-512},
      typ          = {PUB:(DE-HGF)8 / PUB:(DE-HGF)7},
      UT           = {WOS:000632592300058},
      doi          = {10.1109/CoG47356.2020.9231802},
      url          = {https://juser.fz-juelich.de/record/890092},
}