000890092 001__ 890092
000890092 005__ 20231116095325.0
000890092 020__ $$a978-1-7281-4533-4
000890092 0247_ $$2doi$$a10.1109/CoG47356.2020.9231802
000890092 0247_ $$2Handle$$a2128/26998
000890092 0247_ $$2WOS$$aWOS:000632592300058
000890092 037__ $$aFZJ-2021-00681
000890092 1001_ $$0P:(DE-HGF)0$$aPleines, Marco$$b0$$eCorresponding author
000890092 1112_ $$a2020 IEEE Conference on Games (CoG)$$cOsaka$$d2020-08-24 - 2020-08-27$$wJapan
000890092 245__ $$aObstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning
000890092 260__ $$bIEEE$$c2020
000890092 29510 $$a2020 IEEE Conference on Games (CoG) : [Proceedings] - IEEE, 2020
000890092 300__ $$a447 - 454
000890092 3367_ $$2ORCID$$aCONFERENCE_PAPER
000890092 3367_ $$033$$2EndNote$$aConference Paper
000890092 3367_ $$2BibTeX$$aINPROCEEDINGS
000890092 3367_ $$2DRIVER$$aconferenceObject
000890092 3367_ $$2DataCite$$aOutput Types/Conference Paper
000890092 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1611566175_25895
000890092 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
000890092 520__ $$aThe Obstacle Tower Challenge is the task of mastering a procedurally generated chain of levels that progressively become harder to complete. Whereas the top-performing entries of last year's competition used human demonstrations or reward shaping to learn how to cope with the challenge, we present an approach that performed competitively (placed 7th) but starts completely from scratch by means of Deep Reinforcement Learning with a relatively simple feed-forward deep network structure. We especially examine the generalization performance of our approach with respect to different seeds and the various visual themes that have become available after the competition, and investigate where the agent fails and why. Note that our approach does not possess a short-term memory, such as recurrent hidden states. With this work, we hope to contribute to a better understanding of what is possible with a relatively simple, flexible solution that can be applied to learning in environments featuring complex 3D visual input where the abstract task structure itself is still fairly simple.
000890092 536__ $$0G:(DE-HGF)POF3-512$$a512 - Data-Intensive Science and Federated Computing (POF3-512)$$cPOF3-512$$fPOF III$$x0
000890092 588__ $$aDataset connected to CrossRef Conference
000890092 7001_ $$0P:(DE-Juel1)158080$$aJitsev, Jenia$$b1$$eCorresponding author$$ufzj
000890092 7001_ $$0P:(DE-HGF)0$$aPreuss, Mike$$b2
000890092 7001_ $$0P:(DE-HGF)0$$aZimmer, Frank$$b3
000890092 773__ $$a10.1109/CoG47356.2020.9231802
000890092 8564_ $$uhttps://juser.fz-juelich.de/record/890092/files/Obstacle%20tower%20without%20human%20demonstrations%20How%20far%20a%20deep%20feed-forward%20network%20goes%20with%20reinforcement%20learning%20-%202020%20-%20Pleines%20et%20al..pdf$$yOpenAccess
000890092 909CO $$ooai:juser.fz-juelich.de:890092$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000890092 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)158080$$aForschungszentrum Jülich$$b1$$kFZJ
000890092 9131_ $$0G:(DE-HGF)POF3-512$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vData-Intensive Science and Federated Computing$$x0
000890092 9141_ $$y2020
000890092 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000890092 920__ $$lyes
000890092 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000890092 980__ $$acontrib
000890092 980__ $$aVDB
000890092 980__ $$aUNRESTRICTED
000890092 980__ $$acontb
000890092 980__ $$aI:(DE-Juel1)JSC-20090406
000890092 9801_ $$aFullTexts